Classification Random Forest SRC Learner
Source: R/learner_randomForestSRC_classif_rfsrc.R
mlr_learners_classif.rfsrc.Rd
Random forest for classification.
Calls randomForestSRC::rfsrc() from randomForestSRC.
Meta Information
Task type: “classif”
Predict Types: “response”, “prob”
Feature Types: “logical”, “integer”, “numeric”, “factor”
Required Packages: mlr3, mlr3extralearners, randomForestSRC
Parameters
| Id | Type | Default | Levels | Range |
| ntree | integer | 500 | | \([1, \infty)\) |
| mtry | integer | - | | \([1, \infty)\) |
| mtry.ratio | numeric | - | | \([0, 1]\) |
| nodesize | integer | 15 | | \([1, \infty)\) |
| nodedepth | integer | - | | \([1, \infty)\) |
| splitrule | character | gini | gini, auc, entropy | - |
| nsplit | integer | 10 | | \([0, \infty)\) |
| importance | character | FALSE | FALSE, TRUE, none, permute, random, anti | - |
| block.size | integer | 10 | | \([1, \infty)\) |
| bootstrap | character | by.root | by.root, by.node, none, by.user | - |
| samptype | character | swor | swor, swr | - |
| samp | untyped | - | | - |
| membership | logical | FALSE | TRUE, FALSE | - |
| sampsize | untyped | - | | - |
| sampsize.ratio | numeric | - | | \([0, 1]\) |
| na.action | character | na.omit | na.omit, na.impute | - |
| nimpute | integer | 1 | | \([1, \infty)\) |
| proximity | character | FALSE | FALSE, TRUE, inbag, oob, all | - |
| distance | character | FALSE | FALSE, TRUE, inbag, oob, all | - |
| forest.wt | character | FALSE | FALSE, TRUE, inbag, oob, all | - |
| xvar.wt | untyped | - | | - |
| split.wt | untyped | - | | - |
| forest | logical | TRUE | TRUE, FALSE | - |
| var.used | character | FALSE | FALSE, all.trees | - |
| split.depth | character | FALSE | FALSE, all.trees, by.tree | - |
| seed | integer | - | | \((-\infty, -1]\) |
| do.trace | logical | FALSE | TRUE, FALSE | - |
| get.tree | untyped | - | | - |
| outcome | character | train | train, test | - |
| ptn.count | integer | 0 | | \([0, \infty)\) |
| cores | integer | 1 | | \([1, \infty)\) |
| save.memory | logical | FALSE | TRUE, FALSE | - |
| perf.type | character | - | gmean, misclass, brier, none | - |
| case.depth | logical | FALSE | TRUE, FALSE | - |
| marginal.xvar | untyped | NULL | | - |
Custom mlr3 parameters
mtry: This hyperparameter can alternatively be set via the added hyperparameter mtry.ratio as mtry = max(ceiling(mtry.ratio * n_features), 1). Note that mtry and mtry.ratio are mutually exclusive.
sampsize: This hyperparameter can alternatively be set via the added hyperparameter sampsize.ratio as sampsize = max(ceiling(sampsize.ratio * n_obs), 1). Note that sampsize and sampsize.ratio are mutually exclusive.
cores: This value is set as the option rf.cores during training and is set to 1 by default.
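The ratio-to-count conversion described above can be reproduced in base R. This is a minimal sketch, not the learner's internal code: the helper name ratio_to_count is hypothetical, and the feature and observation counts (60 and 139) are borrowed from the sonar example in the Examples section of this page.

```r
# Hypothetical helper mirroring the documented formula
# max(ceiling(ratio * n), 1); it is not part of mlr3extralearners.
ratio_to_count = function(ratio, n) {
  stopifnot(ratio >= 0, ratio <= 1, n >= 1)
  max(ceiling(ratio * n), 1)
}

n_features = 60  # number of features in the sonar task
n_obs = 139      # training rows in the example on this page

mtry = ratio_to_count(0.25, n_features)     # 0.25 * 60 = 15
sampsize = ratio_to_count(0.5, n_obs)       # ceiling(69.5) = 70
mtry_floor = ratio_to_count(0, n_features)  # the lower bound of 1 applies
```

With the learner itself, the equivalent would be to pass mtry.ratio = 0.25 in place of mtry, since the two are mutually exclusive.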
References
Breiman, Leo (2001). “Random Forests.” Machine Learning, 45(1), 5–32. ISSN 1573-0565. doi:10.1023/A:1010933404324 .
See also
as.data.table(mlr_learners) for a table of available Learners in the running session (depending on the loaded packages).
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html#sec-learners
mlr3learners for a selection of recommended learners.
mlr3cluster for unsupervised clustering learners.
mlr3pipelines to combine learners with pre- and postprocessing steps.
mlr3tuning for tuning of hyperparameters, mlr3tuningspaces for established default tuning spaces.
Super classes
mlr3::Learner -> mlr3::LearnerClassif -> LearnerClassifRandomForestSRC
Methods
LearnerClassifRandomForestSRC$importance()
The importance scores are extracted from the model slot importance; the values
for 'all' (overall, rather than class-specific) are returned.
Returns
Named numeric().
LearnerClassifRandomForestSRC$selected_features()
Selected features are extracted from the model slot var.used.
Note: Due to a known issue in randomForestSRC, enabling var.used = "all.trees"
causes prediction to fail. Therefore, this setting should be used exclusively
for feature selection purposes and not when prediction is required.
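A minimal sketch of the feature-selection-only workflow described in the note, assuming mlr3, mlr3extralearners, and randomForestSRC are installed. The sonar task is chosen only for illustration, and predict() is deliberately not called on this learner.

```r
library(mlr3)
library(mlr3extralearners)

# Record variable usage across all trees; per the note above, a learner
# configured this way should not be used for prediction.
learner = lrn("classif.rfsrc", var.used = "all.trees")
task = tsk("sonar")
learner$train(task)

# Names of the features selected according to the var.used slot
selected = learner$selected_features()
print(selected)
```

If predictions are also needed, a second learner without var.used can be trained on the same task.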
Examples
library(mlr3)
library(mlr3extralearners)
# Define the Learner
learner = lrn("classif.rfsrc", importance = "TRUE")
print(learner)
#>
#> ── <LearnerClassifRandomForestSRC> (classif.rfsrc): Random Forest ──────────────
#> • Model: -
#> • Parameters: importance=TRUE
#> • Packages: mlr3, mlr3extralearners, and randomForestSRC
#> • Predict Types: [response] and prob
#> • Feature Types: logical, integer, numeric, and factor
#> • Encapsulation: none (fallback: -)
#> • Properties: importance, missings, multiclass, oob_error, selected_features,
#> twoclass, and weights
#> • Other settings: use_weights = 'use', predict_raw = 'FALSE'
# Define a Task
task = tsk("sonar")
# Create train and test set
ids = partition(task)
# Train the learner on the training ids
learner$train(task, row_ids = ids$train)
print(learner$model)
#> Sample size: 139
#> Frequency of class labels: M=73, R=66
#> Number of trees: 500
#> Forest terminal node size: 1
#> Average no. of terminal nodes: 16.778
#> No. of variables tried at each split: 8
#> Total no. of variables: 60
#> Resampling used to grow trees: swor
#> Resample size used to grow trees: 88
#> Analysis: RF-C
#> Family: class
#> Splitting rule: gini *random*
#> Number of random split points: 10
#> Imbalanced ratio: 1.1061
#> (OOB) Brier score: 0.13805003
#> (OOB) Normalized Brier score: 0.55220011
#> (OOB) AUC: 0.90971357
#> (OOB) Log-loss: 0.43658389
#> (OOB) PR-AUC: 0.90224393
#> (OOB) G-mean: 0.79224133
#> (OOB) Requested performance error: 0.20143885, 0.1369863, 0.27272727
#>
#> Confusion matrix:
#>
#> predicted
#> observed M R class.error
#> M 63 10 0.1370
#> R 18 48 0.2727
#>
#> (OOB) Misclassification rate: 0.2014388
#>
#> Random-classifier baselines (uniform):
#> Brier: 0.25 Normalized Brier: 1 Log-loss: 0.69314718
print(learner$importance())
#> V11 V10 V12 V9 V36
#> 9.104496e-02 5.002222e-02 4.960539e-02 3.050644e-02 2.863931e-02
#> V48 V51 V28 V17 V45
#> 2.783171e-02 2.262111e-02 2.247842e-02 2.138015e-02 1.978581e-02
#> V8 V27 V13 V39 V18
#> 1.802014e-02 1.760007e-02 1.585464e-02 1.571938e-02 1.570030e-02
#> V6 V5 V30 V35 V52
#> 1.540693e-02 1.452612e-02 1.353451e-02 1.252646e-02 1.222836e-02
#> V16 V19 V46 V32 V47
#> 1.222377e-02 1.178500e-02 1.177774e-02 1.174463e-02 1.092390e-02
#> V33 V37 V1 V34 V29
#> 1.079690e-02 1.016751e-02 9.610001e-03 9.300966e-03 8.875921e-03
#> V44 V21 V49 V4 V15
#> 7.409488e-03 7.278236e-03 7.226348e-03 6.989485e-03 6.680434e-03
#> V54 V40 V60 V20 V14
#> 5.977187e-03 5.810945e-03 5.666405e-03 5.661770e-03 5.392952e-03
#> V56 V22 V42 V43 V25
#> 5.113906e-03 5.093682e-03 5.090679e-03 4.800775e-03 4.502305e-03
#> V3 V38 V26 V58 V7
#> 4.381942e-03 4.352381e-03 4.227415e-03 3.942928e-03 3.934559e-03
#> V53 V31 V59 V23 V55
#> 3.357772e-03 3.348173e-03 3.054196e-03 2.624813e-03 1.445448e-03
#> V24 V57 V2 V41 V50
#> 1.170903e-03 1.147874e-03 5.807670e-04 9.583024e-06 -1.502324e-04
# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)
# Score the predictions
predictions$score()
#> classif.ce
#> 0.1594203
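The example above uses the default response predict type. As a hedged sketch, the prob predict type listed under Meta Information can be requested when the learner is constructed. The choice of msr("classif.auc") as the measure and the seed value are illustrative assumptions, not part of the original example.

```r
library(mlr3)
library(mlr3extralearners)

set.seed(1)  # arbitrary seed so the train/test split is repeatable

# Request class probabilities instead of hard labels
learner = lrn("classif.rfsrc", predict_type = "prob")
task = tsk("sonar")
ids = partition(task)

learner$train(task, row_ids = ids$train)
predictions = learner$predict(task, row_ids = ids$test)

# Matrix of per-class probabilities, one row per test observation
head(predictions$prob)

# Probability-based measures such as AUC can now be computed
auc = predictions$score(msr("classif.auc"))
print(auc)
```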