Imbalanced Random Forest for classification between two classes. Calls randomForestSRC::imbalanced.rfsrc() from package randomForestSRC.

Dictionary

This Learner can be instantiated via lrn():

lrn("classif.imbalanced_rfsrc")

Meta Information

  • Task type: “classif”

  • Predict Types: “response”, “prob”

  • Feature Types: “logical”, “integer”, “numeric”, “factor”, “ordered”

  • Required Packages: mlr3, randomForestSRC

Parameters

Id             | Type      | Default | Levels                                   | Range
ntree          | integer   | 500     | -                                        | \([1, \infty)\)
method         | character | rfq     | rfq, brf, standard                       | -
block.size     | integer   | 10      | -                                        | \([1, \infty)\)
fast           | logical   | FALSE   | TRUE, FALSE                              | -
ratio          | numeric   | -       | -                                        | \([0, 1]\)
mtry           | integer   | -       | -                                        | \([1, \infty)\)
mtry.ratio     | numeric   | -       | -                                        | \([0, 1]\)
nodesize       | integer   | 15      | -                                        | \([1, \infty)\)
nodedepth      | integer   | -       | -                                        | \([1, \infty)\)
splitrule      | character | gini    | gini, auc, entropy                       | -
nsplit         | integer   | 10      | -                                        | \([0, \infty)\)
importance     | character | FALSE   | FALSE, TRUE, none, permute, random, anti | -
bootstrap      | character | by.root | by.root, by.node, none, by.user          | -
samptype       | character | swor    | swor, swr                                | -
samp           | untyped   | -       | -                                        | -
membership     | logical   | FALSE   | TRUE, FALSE                              | -
sampsize       | untyped   | -       | -                                        | -
sampsize.ratio | numeric   | -       | -                                        | \([0, 1]\)
na.action      | character | na.omit | na.omit, na.impute                       | -
nimpute        | integer   | 1       | -                                        | \([1, \infty)\)
ntime          | integer   | -       | -                                        | \([1, \infty)\)
cause          | integer   | -       | -                                        | \([1, \infty)\)
proximity      | character | FALSE   | FALSE, TRUE, inbag, oob, all             | -
distance       | character | FALSE   | FALSE, TRUE, inbag, oob, all             | -
forest.wt      | character | FALSE   | FALSE, TRUE, inbag, oob, all             | -
xvar.wt        | untyped   | -       | -                                        | -
split.wt       | untyped   | -       | -                                        | -
forest         | logical   | TRUE    | TRUE, FALSE                              | -
var.used       | character | FALSE   | FALSE, all.trees, by.tree                | -
split.depth    | character | FALSE   | FALSE, all.trees, by.tree                | -
seed           | integer   | -       | -                                        | \((-\infty, -1]\)
do.trace       | logical   | FALSE   | TRUE, FALSE                              | -
statistics     | logical   | FALSE   | TRUE, FALSE                              | -
get.tree       | untyped   | -       | -                                        | -
outcome        | character | train   | train, test                              | -
ptn.count      | integer   | 0       | -                                        | \([0, \infty)\)
cores          | integer   | 1       | -                                        | \([1, \infty)\)
save.memory    | logical   | FALSE   | TRUE, FALSE                              | -
perf.type      | character | -       | gmean, misclass, brier, none             | -
case.depth     | logical   | FALSE   | TRUE, FALSE                              | -

Custom mlr3 parameters

  • mtry: This hyperparameter can alternatively be set via the added hyperparameter mtry.ratio as mtry = max(ceiling(mtry.ratio * n_features), 1). Note that mtry and mtry.ratio are mutually exclusive.

  • sampsize: This hyperparameter can alternatively be set via the added hyperparameter sampsize.ratio as sampsize = max(ceiling(sampsize.ratio * n_obs), 1). Note that sampsize and sampsize.ratio are mutually exclusive.

  • cores: This value is set as the option rf.cores during training and is set to 1 by default.
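The two ratio conversions above can be sketched in plain R. The helper names below are illustrative only, not part of the mlr3 API; mlr3 applies these formulas internally when the learner is trained.

```r
# Illustrative helpers mirroring the documented conversion formulas;
# these function names do not exist in mlr3 itself.
mtry_from_ratio <- function(mtry.ratio, n_features) {
  max(ceiling(mtry.ratio * n_features), 1)
}
sampsize_from_ratio <- function(sampsize.ratio, n_obs) {
  max(ceiling(sampsize.ratio * n_obs), 1)
}

# For a task like sonar (60 features, 208 observations):
mtry_from_ratio(0.5, 60)        # 30
sampsize_from_ratio(0.632, 208) # 132
```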

References

O’Brien R, Ishwaran H (2019). “A random forests quantile classifier for class imbalanced data.” Pattern Recognition, 90, 232–249. doi:10.1016/j.patcog.2019.01.036.

Chen C, Liaw A, Breiman L (2004). “Using Random Forest to Learn Imbalanced Data.” Technical report, University of California, Berkeley.

See also

Author

Liana Harutyunyan

Super classes

mlr3::Learner -> mlr3::LearnerClassif -> LearnerClassifImbalancedRandomForestSRC

Methods

Inherited methods


Method new()

Creates a new instance of this R6 class.


Method importance()

The importance scores are extracted from the slot importance.

Usage

LearnerClassifImbalancedRandomForestSRC$importance()

Returns

Named numeric().


Method selected_features()

Selected features are extracted from the model slot var.used, which is only populated when the hyperparameter var.used is set (e.g. var.used = "all.trees").

Usage

LearnerClassifImbalancedRandomForestSRC$selected_features()

Returns

character().


Method oob_error()

OOB error extracted from the model slot err.rate.

Usage

LearnerClassifImbalancedRandomForestSRC$oob_error()

Returns

numeric().


Method clone()

The objects of this class are cloneable with this method.

Usage

LearnerClassifImbalancedRandomForestSRC$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

# Define the Learner
learner = mlr3::lrn("classif.imbalanced_rfsrc", importance = "TRUE")
print(learner)
#> <LearnerClassifImbalancedRandomForestSRC:classif.imbalanced_rfsrc>: Imbalanced Random Forest
#> * Model: -
#> * Parameters: importance=TRUE
#> * Packages: mlr3, randomForestSRC
#> * Predict Types:  [response], prob
#> * Feature Types: logical, integer, numeric, factor, ordered
#> * Properties: importance, missings, oob_error, twoclass, weights

# Define a Task
task = mlr3::tsk("sonar")
# Create train and test set
ids = mlr3::partition(task)

# Train the learner on the training ids
learner$train(task, row_ids = ids$train)

print(learner$model)
#>                          Sample size: 139
#>            Frequency of class labels: 70, 69
#>                      Number of trees: 3000
#>            Forest terminal node size: 1
#>        Average no. of terminal nodes: 17.1973
#> No. of variables tried at each split: 8
#>               Total no. of variables: 60
#>        Resampling used to grow trees: swor
#>     Resample size used to grow trees: 88
#>                             Analysis: RFQ
#>                               Family: class
#>                       Splitting rule: auc *random*
#>        Number of random split points: 10
#>                     Imbalanced ratio: 1.0145
#>                    (OOB) Brier score: 0.13562034
#>         (OOB) Normalized Brier score: 0.54248135
#>                            (OOB) AUC: 0.9320911
#>                       (OOB) Log-loss: 0.43508752
#>                         (OOB) PR-AUC: 0.92624941
#>                         (OOB) G-mean: 0.84024053
#>    (OOB) Requested performance error: 0.15975947
#> 
#> Confusion matrix:
#> 
#>           predicted
#>   observed  M  R class.error
#>          M 62  8      0.1143
#>          R 14 55      0.2029
#> 
#>       (OOB) Misclassification rate: 0.1582734
print(learner$importance())
#>           V52           V12            V9           V17            V4 
#>  0.0246319238  0.0232369613  0.0154185672  0.0144151338  0.0144151338 
#>           V43           V44           V55           V51           V10 
#>  0.0144151338  0.0144151338  0.0144151338  0.0101640742  0.0076735902 
#>           V11           V18           V23           V58           V15 
#>  0.0076735902  0.0076735902  0.0076735902  0.0076735902  0.0068036790 
#>           V16           V19           V22           V24           V25 
#>  0.0068036790  0.0068036790  0.0068036790  0.0068036790  0.0068036790 
#>           V26            V3           V30           V41           V45 
#>  0.0068036790  0.0068036790  0.0068036790  0.0068036790  0.0068036790 
#>           V46           V47           V49           V50           V60 
#>  0.0068036790  0.0068036790  0.0068036790  0.0068036790  0.0068036790 
#>            V7           V36           V27           V48            V1 
#>  0.0068036790  0.0022205779  0.0009861981  0.0009861981  0.0000000000 
#>           V13           V14            V2           V20           V29 
#>  0.0000000000  0.0000000000  0.0000000000  0.0000000000  0.0000000000 
#>           V32           V33           V34           V35           V38 
#>  0.0000000000  0.0000000000  0.0000000000  0.0000000000  0.0000000000 
#>           V39           V40           V42            V5           V53 
#>  0.0000000000  0.0000000000  0.0000000000  0.0000000000  0.0000000000 
#>           V54           V56           V59            V6            V8 
#>  0.0000000000  0.0000000000  0.0000000000  0.0000000000  0.0000000000 
#>           V57           V37           V31           V28           V21 
#> -0.0007388897 -0.0067490283 -0.0076041417 -0.0144142484 -0.0151406869 
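As a sanity check (not part of the learner API), the (OOB) G-mean reported in the model summary can be recomputed by hand from the confusion matrix: it is the geometric mean of the per-class OOB accuracies.

```r
# Recompute the OOB G-mean from the confusion matrix printed above:
# class M: 62 of 70 correct; class R: 55 of 69 correct
sqrt((62 / 70) * (55 / 69))
#> [1] 0.8402405
```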

# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)

# Score the predictions
predictions$score()
#> classif.ce 
#>  0.1014493
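By default, partition() keeps two thirds of the 208 sonar rows for training (139, matching the sample size printed above) and 69 for testing, so this error rate corresponds to 7 misclassified test observations; a quick arithmetic check:

```r
# classif.ce is the misclassification rate on the 69 held-out rows
7 / 69
#> [1] 0.1014493
```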