Random forest for classification. Calls randomForestSRC::rfsrc() from package randomForestSRC.

Dictionary

This Learner can be instantiated via lrn():

lrn("classif.rfsrc")

Meta Information

  • Task type: “classif”

  • Predict Types: “response”, “prob”

  • Feature Types: “logical”, “integer”, “numeric”, “factor”

  • Required Packages: mlr3, mlr3extralearners, randomForestSRC

Parameters

| Id | Type | Default | Levels | Range |
|---|---|---|---|---|
| ntree | integer | 500 | - | \([1, \infty)\) |
| mtry | integer | - | - | \([1, \infty)\) |
| mtry.ratio | numeric | - | - | \([0, 1]\) |
| nodesize | integer | 15 | - | \([1, \infty)\) |
| nodedepth | integer | - | - | \([1, \infty)\) |
| splitrule | character | gini | gini, auc, entropy | - |
| nsplit | integer | 10 | - | \([0, \infty)\) |
| importance | character | FALSE | FALSE, TRUE, none, permute, random, anti | - |
| block.size | integer | 10 | - | \([1, \infty)\) |
| bootstrap | character | by.root | by.root, by.node, none, by.user | - |
| samptype | character | swor | swor, swr | - |
| samp | untyped | - | - | - |
| membership | logical | FALSE | TRUE, FALSE | - |
| sampsize | untyped | - | - | - |
| sampsize.ratio | numeric | - | - | \([0, 1]\) |
| na.action | character | na.omit | na.omit, na.impute | - |
| nimpute | integer | 1 | - | \([1, \infty)\) |
| proximity | character | FALSE | FALSE, TRUE, inbag, oob, all | - |
| distance | character | FALSE | FALSE, TRUE, inbag, oob, all | - |
| forest.wt | character | FALSE | FALSE, TRUE, inbag, oob, all | - |
| xvar.wt | untyped | - | - | - |
| split.wt | untyped | - | - | - |
| forest | logical | TRUE | TRUE, FALSE | - |
| var.used | character | FALSE | FALSE, all.trees | - |
| split.depth | character | FALSE | FALSE, all.trees, by.tree | - |
| seed | integer | - | - | \((-\infty, -1]\) |
| do.trace | logical | FALSE | TRUE, FALSE | - |
| get.tree | untyped | - | - | - |
| outcome | character | train | train, test | - |
| ptn.count | integer | 0 | - | \([0, \infty)\) |
| cores | integer | 1 | - | \([1, \infty)\) |
| save.memory | logical | FALSE | TRUE, FALSE | - |
| perf.type | character | - | gmean, misclass, brier, none | - |
| case.depth | logical | FALSE | TRUE, FALSE | - |
| marginal.xvar | untyped | NULL | - | - |

Custom mlr3 parameters

  • mtry: This hyperparameter can alternatively be set via the added hyperparameter mtry.ratio as mtry = max(ceiling(mtry.ratio * n_features), 1). Note that mtry and mtry.ratio are mutually exclusive.

  • sampsize: This hyperparameter can alternatively be set via the added hyperparameter sampsize.ratio as sampsize = max(ceiling(sampsize.ratio * n_obs), 1). Note that sampsize and sampsize.ratio are mutually exclusive.

  • cores: This value is set as the option rf.cores during training and is set to 1 by default.
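The ratio-based alternatives above can be sketched as follows (parameter values are chosen purely for illustration; the resolved values follow the formulas stated in the bullets):

```r
library(mlr3)
library(mlr3extralearners)

# tsk("sonar") has 60 features, so mtry.ratio = 0.5 resolves to
# mtry = max(ceiling(0.5 * 60), 1) = 30 at train time.
# mtry.ratio is mutually exclusive with mtry, and
# sampsize.ratio is mutually exclusive with sampsize.
learner = lrn("classif.rfsrc",
  mtry.ratio     = 0.5,
  sampsize.ratio = 0.8
)
learner$train(tsk("sonar"))
```

Setting both `mtry` and `mtry.ratio` (or both `sampsize` and `sampsize.ratio`) raises an error, since each pair is mutually exclusive.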

References

Breiman, Leo (2001). “Random Forests.” Machine Learning, 45(1), 5–32. ISSN 1573-0565, doi:10.1023/A:1010933404324.

Author

RaphaelS1

Super classes

mlr3::Learner -> mlr3::LearnerClassif -> LearnerClassifRandomForestSRC

Methods

Inherited methods


Method new()

Creates a new instance of this R6 class.


Method importance()

The importance scores are extracted from the slot importance of the fitted model; the scores for importance type 'all' are returned.

Usage

LearnerClassifRandomForestSRC$importance()

Returns

Named numeric().


Method selected_features()

Selected features are extracted from the model slot var.used.

Note: Due to a known issue in randomForestSRC, enabling var.used = "all.trees" causes prediction to fail. Therefore, this setting should be used exclusively for feature selection purposes and not when prediction is required.

Usage

LearnerClassifRandomForestSRC$selected_features()

Returns

character().
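Given the note above, a hedged sketch of using this method purely for feature selection (the learner is not used for prediction afterwards):

```r
library(mlr3)
library(mlr3extralearners)

task = tsk("sonar")

# var.used = "all.trees" records which variables were used for splitting;
# prediction with this learner then fails (see note above), so train a
# separate learner if predictions are also needed.
learner = lrn("classif.rfsrc", var.used = "all.trees")
learner$train(task)

# Character vector of features used in at least one split
selected = learner$selected_features()
```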


Method oob_error()

OOB error extracted from the model slot err.rate.

Usage

LearnerClassifRandomForestSRC$oob_error()

Returns

numeric().


Method clone()

The objects of this class are cloneable with this method.

Usage

LearnerClassifRandomForestSRC$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

# Define the Learner
learner = lrn("classif.rfsrc", importance = "TRUE")
print(learner)
#> 
#> ── <LearnerClassifRandomForestSRC> (classif.rfsrc): Random Forest ──────────────
#> • Model: -
#> • Parameters: importance=TRUE
#> • Packages: mlr3, mlr3extralearners, and randomForestSRC
#> • Predict Types: [response] and prob
#> • Feature Types: logical, integer, numeric, and factor
#> • Encapsulation: none (fallback: -)
#> • Properties: importance, missings, multiclass, oob_error, selected_features,
#> twoclass, and weights
#> • Other settings: use_weights = 'use', predict_raw = 'FALSE'

# Define a Task
task = tsk("sonar")
# Create train and test set
ids = partition(task)

# Train the learner on the training ids
learner$train(task, row_ids = ids$train)

print(learner$model)
#>                          Sample size: 139
#>            Frequency of class labels: M=71, R=68
#>                      Number of trees: 500
#>            Forest terminal node size: 1
#>        Average no. of terminal nodes: 17
#> No. of variables tried at each split: 8
#>               Total no. of variables: 60
#>        Resampling used to grow trees: swor
#>     Resample size used to grow trees: 88
#>                             Analysis: RF-C
#>                               Family: class
#>                       Splitting rule: gini *random*
#>        Number of random split points: 10
#>                     Imbalanced ratio: 1.0441
#>                    (OOB) Brier score: 0.13821178
#>         (OOB) Normalized Brier score: 0.55284713
#>                            (OOB) AUC: 0.90461889
#>                       (OOB) Log-loss: 0.43536668
#>                         (OOB) PR-AUC: 0.90555245
#>                         (OOB) G-mean: 0.7851142
#>    (OOB) Requested performance error: 0.20863309, 0.12676056, 0.29411765
#> 
#> Confusion matrix:
#> 
#>           predicted
#>   observed  M  R class.error
#>          M 62  9      0.1268
#>          R 20 48      0.2941
#> 
#>       (OOB) Misclassification rate: 0.2086331
#> 
#> Random-classifier baselines (uniform):
#>    Brier: 0.25   Normalized Brier: 1   Log-loss: 0.69314718
print(learner$importance())
#>            V9           V12           V11           V48           V46 
#>  0.0591728660  0.0480935526  0.0391452987  0.0329641839  0.0281574232 
#>           V49           V10           V36           V52           V21 
#>  0.0274978561  0.0240553024  0.0232326426  0.0221486561  0.0220769943 
#>           V16           V17           V47           V18           V22 
#>  0.0201184013  0.0199195416  0.0198353469  0.0186306710  0.0163277318 
#>           V51           V13            V5           V28           V23 
#>  0.0154222487  0.0146953351  0.0144722776  0.0125477887  0.0119782600 
#>            V8           V45           V20           V31           V37 
#>  0.0110968148  0.0109841681  0.0108106288  0.0098981083  0.0097688741 
#>           V15           V39           V44            V4           V40 
#>  0.0096243512  0.0094071543  0.0091169984  0.0081499202  0.0079949577 
#>           V14            V3           V26           V38           V32 
#>  0.0078404624  0.0064353294  0.0064088442  0.0063951840  0.0062546110 
#>           V41            V2           V29           V53           V27 
#>  0.0059106681  0.0058037914  0.0055832032  0.0052635866  0.0052173882 
#>           V25           V30           V35           V19           V34 
#>  0.0051058188  0.0046486976  0.0042267988  0.0039289246  0.0039026056 
#>           V33           V56           V43            V6            V1 
#>  0.0035051279  0.0029273814  0.0029048620  0.0023242409  0.0017717020 
#>           V24           V58           V55           V57           V42 
#>  0.0017648538  0.0014410300  0.0012957712  0.0010370592  0.0007601007 
#>           V50           V60            V7           V54           V59 
#>  0.0002835716 -0.0004229885 -0.0004585715 -0.0004668857 -0.0023319105 

# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)

# Score the predictions
predictions$score()
#> classif.ce 
#>  0.1304348
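Since the learner also supports the "prob" predict type (see Meta Information), probability predictions can be scored with a probability-based measure such as AUC; a minimal sketch:

```r
library(mlr3)
library(mlr3extralearners)

task = tsk("sonar")
ids = partition(task)

# Request probability predictions instead of hard class labels
learner = lrn("classif.rfsrc", predict_type = "prob")
learner$train(task, row_ids = ids$train)

predictions = learner$predict(task, row_ids = ids$test)
predictions$score(msr("classif.auc"))
```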