Classification Imbalanced Random Forest Src Learner
mlr_learners_classif.imbalanced_rfsrc.Rd
Imbalanced random forest for classification between two classes.
Calls randomForestSRC::imbalanced.rfsrc() from randomForestSRC.
Meta Information
Task type: “classif”
Predict Types: “response”, “prob”
Feature Types: “logical”, “integer”, “numeric”, “factor”, “ordered”
Required Packages: mlr3, randomForestSRC
Parameters
Id | Type | Default | Levels | Range |
ntree | integer | 500 | | \([1, \infty)\) |
method | character | rfq | rfq, brf, standard | - |
block.size | integer | 10 | | \([1, \infty)\) |
fast | logical | FALSE | TRUE, FALSE | - |
ratio | numeric | - | | \([0, 1]\) |
mtry | integer | - | | \([1, \infty)\) |
mtry.ratio | numeric | - | | \([0, 1]\) |
nodesize | integer | 15 | | \([1, \infty)\) |
nodedepth | integer | - | | \([1, \infty)\) |
splitrule | character | gini | gini, auc, entropy | - |
nsplit | integer | 10 | | \([0, \infty)\) |
importance | character | FALSE | FALSE, TRUE, none, permute, random, anti | - |
bootstrap | character | by.root | by.root, by.node, none, by.user | - |
samptype | character | swor | swor, swr | - |
samp | untyped | - | | - |
membership | logical | FALSE | TRUE, FALSE | - |
sampsize | untyped | - | | - |
sampsize.ratio | numeric | - | | \([0, 1]\) |
na.action | character | na.omit | na.omit, na.impute | - |
nimpute | integer | 1 | | \([1, \infty)\) |
ntime | integer | - | | \([1, \infty)\) |
cause | integer | - | | \([1, \infty)\) |
proximity | character | FALSE | FALSE, TRUE, inbag, oob, all | - |
distance | character | FALSE | FALSE, TRUE, inbag, oob, all | - |
forest.wt | character | FALSE | FALSE, TRUE, inbag, oob, all | - |
xvar.wt | untyped | - | | - |
split.wt | untyped | - | | - |
forest | logical | TRUE | TRUE, FALSE | - |
var.used | character | FALSE | FALSE, all.trees, by.tree | - |
split.depth | character | FALSE | FALSE, all.trees, by.tree | - |
seed | integer | - | | \((-\infty, -1]\) |
do.trace | logical | FALSE | TRUE, FALSE | - |
statistics | logical | FALSE | TRUE, FALSE | - |
get.tree | untyped | - | | - |
outcome | character | train | train, test | - |
ptn.count | integer | 0 | | \([0, \infty)\) |
cores | integer | 1 | | \([1, \infty)\) |
save.memory | logical | FALSE | TRUE, FALSE | - |
perf.type | character | - | gmean, misclass, brier, none | - |
case.depth | logical | FALSE | TRUE, FALSE | - |
Custom mlr3 parameters
mtry: This hyperparameter can alternatively be set via the added hyperparameter mtry.ratio as mtry = max(ceiling(mtry.ratio * n_features), 1). Note that mtry and mtry.ratio are mutually exclusive.
sampsize: This hyperparameter can alternatively be set via the added hyperparameter sampsize.ratio as sampsize = max(ceiling(sampsize.ratio * n_obs), 1). Note that sampsize and sampsize.ratio are mutually exclusive.
cores: This value is set as the option rf.cores during training and is set to 1 by default.
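The ratio-to-count conversion described for mtry.ratio and sampsize.ratio can be illustrated with a small base-R sketch. The helper ratio_to_count() is hypothetical and only mirrors the formula stated above; it is not a function of mlr3 or randomForestSRC. The counts of 60 features and 139 training rows are taken from the sonar example on this page.

```r
# Hypothetical helper: mirrors max(ceiling(ratio * n), 1) from the text above.
# It is an illustration only, not part of mlr3 or randomForestSRC.
ratio_to_count = function(ratio, n) {
  stopifnot(is.numeric(ratio), length(ratio) == 1, ratio >= 0, ratio <= 1)
  max(ceiling(ratio * n), 1)
}

n_features = 60   # features in the sonar task
n_obs = 139       # training rows in the example on this page

# mtry.ratio = 0.25 on 60 features
mtry_value = ratio_to_count(0.25, n_features)

# sampsize.ratio = 0.5 on 139 rows; ceiling() rounds 69.5 up
sampsize_value = ratio_to_count(0.5, n_obs)

# A ratio of 0 still yields at least 1 because of the outer max()
floor_value = ratio_to_count(0, n_features)
```

Under these assumptions, mtry.ratio = 0.25 would correspond to mtry = 15 and sampsize.ratio = 0.5 to sampsize = 70. Because each pair is mutually exclusive, only one of the two forms should be set on the learner.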
References
O’Brien R, Ishwaran H (2019). “A random forests quantile classifier for class imbalanced data.” Pattern Recognition, 90, 232–249. doi:10.1016/j.patcog.2019.01.036.
Chen C, Liaw A, Breiman L (2004). “Using Random Forest to Learn Imbalanced Data.” University of California, Berkeley.
See also
as.data.table(mlr_learners) for a table of available Learners in the running session (depending on the loaded packages).
Chapter in the mlr3book: https://mlr3book.mlr-org.com/basics.html#learners
mlr3learners for a selection of recommended learners.
mlr3cluster for unsupervised clustering learners.
mlr3pipelines to combine learners with pre- and postprocessing steps.
mlr3tuning for tuning of hyperparameters, mlr3tuningspaces for established default tuning spaces.
Super classes
mlr3::Learner -> mlr3::LearnerClassif -> LearnerClassifImbalancedRandomForestSRC
Methods
Method importance()
The importance scores are extracted from the slot importance.
Returns
Named numeric().
Examples
# Define the Learner
learner = mlr3::lrn("classif.imbalanced_rfsrc", importance = "TRUE")
print(learner)
#> <LearnerClassifImbalancedRandomForestSRC:classif.imbalanced_rfsrc>: Imbalanced Random Forest
#> * Model: -
#> * Parameters: importance=TRUE
#> * Packages: mlr3, randomForestSRC
#> * Predict Types: [response], prob
#> * Feature Types: logical, integer, numeric, factor, ordered
#> * Properties: importance, missings, oob_error, twoclass, weights
# Define a Task
task = mlr3::tsk("sonar")
# Create train and test set
ids = mlr3::partition(task)
# Train the learner on the training ids
learner$train(task, row_ids = ids$train)
print(learner$model)
#> Sample size: 139
#> Frequency of class labels: 70, 69
#> Number of trees: 3000
#> Forest terminal node size: 1
#> Average no. of terminal nodes: 17.1973
#> No. of variables tried at each split: 8
#> Total no. of variables: 60
#> Resampling used to grow trees: swor
#> Resample size used to grow trees: 88
#> Analysis: RFQ
#> Family: class
#> Splitting rule: auc *random*
#> Number of random split points: 10
#> Imbalanced ratio: 1.0145
#> (OOB) Brier score: 0.13562034
#> (OOB) Normalized Brier score: 0.54248135
#> (OOB) AUC: 0.9320911
#> (OOB) Log-loss: 0.43508752
#> (OOB) PR-AUC: 0.92624941
#> (OOB) G-mean: 0.84024053
#> (OOB) Requested performance error: 0.15975947
#>
#> Confusion matrix:
#>
#> predicted
#> observed M R class.error
#> M 62 8 0.1143
#> R 14 55 0.2029
#>
#> (OOB) Misclassification rate: 0.1582734
print(learner$importance())
#> V52 V12 V9 V17 V4
#> 0.0246319238 0.0232369613 0.0154185672 0.0144151338 0.0144151338
#> V43 V44 V55 V51 V10
#> 0.0144151338 0.0144151338 0.0144151338 0.0101640742 0.0076735902
#> V11 V18 V23 V58 V15
#> 0.0076735902 0.0076735902 0.0076735902 0.0076735902 0.0068036790
#> V16 V19 V22 V24 V25
#> 0.0068036790 0.0068036790 0.0068036790 0.0068036790 0.0068036790
#> V26 V3 V30 V41 V45
#> 0.0068036790 0.0068036790 0.0068036790 0.0068036790 0.0068036790
#> V46 V47 V49 V50 V60
#> 0.0068036790 0.0068036790 0.0068036790 0.0068036790 0.0068036790
#> V7 V36 V27 V48 V1
#> 0.0068036790 0.0022205779 0.0009861981 0.0009861981 0.0000000000
#> V13 V14 V2 V20 V29
#> 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
#> V32 V33 V34 V35 V38
#> 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
#> V39 V40 V42 V5 V53
#> 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
#> V54 V56 V59 V6 V8
#> 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
#> V57 V37 V31 V28 V21
#> -0.0007388897 -0.0067490283 -0.0076041417 -0.0144142484 -0.0151406869
# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)
# Score the predictions
predictions$score()
#> classif.ce
#> 0.1014493
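The out-of-bag figures in the printed model can be checked by hand from its confusion matrix. The following base-R sketch assumes the usual definition of the G-mean as the geometric mean of the class-wise recalls. The counts are copied from the output above, and the variable names are illustrative.

```r
# Out-of-bag confusion matrix copied from the printed model above
# (rows = observed class, columns = predicted class)
conf = matrix(c(62, 8, 14, 55), nrow = 2, byrow = TRUE,
  dimnames = list(observed = c("M", "R"), predicted = c("M", "R")))

# Recall per observed class: correct predictions divided by row totals
class_recall = diag(conf) / rowSums(conf)

# Per-class error, as shown in the class.error column
class_error = 1 - class_recall

# G-mean: geometric mean of the two class-wise recalls (assumed definition)
gmean = sqrt(prod(class_recall))

# Overall misclassification rate: share of off-diagonal counts
misclass = 1 - sum(diag(conf)) / sum(conf)
```

With these counts, the G-mean evaluates to about 0.8402 and the misclassification rate to about 0.1583, matching the "(OOB) G-mean" and "(OOB) Misclassification rate" lines. The "Requested performance error" line appears to equal 1 minus the G-mean in this run.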