Classification Random Forest SRC Learner
Source: R/learner_randomForestSRC_classif_rfsrc.R
Random forest for classification.
Calls randomForestSRC::rfsrc() from randomForestSRC.
Meta Information
Task type: “classif”
Predict Types: “response”, “prob”
Feature Types: “logical”, “integer”, “numeric”, “factor”
Required Packages: mlr3, mlr3extralearners, randomForestSRC
Parameters
Id | Type | Default | Levels | Range
ntree | integer | 500 | | \([1, \infty)\)
mtry | integer | - | | \([1, \infty)\)
mtry.ratio | numeric | - | | \([0, 1]\)
nodesize | integer | 15 | | \([1, \infty)\)
nodedepth | integer | - | | \([1, \infty)\)
splitrule | character | gini | gini, auc, entropy | -
nsplit | integer | 10 | | \([0, \infty)\)
importance | character | FALSE | FALSE, TRUE, none, permute, random, anti | -
block.size | integer | 10 | | \([1, \infty)\)
bootstrap | character | by.root | by.root, by.node, none, by.user | -
samptype | character | swor | swor, swr | -
samp | untyped | - | | -
membership | logical | FALSE | TRUE, FALSE | -
sampsize | untyped | - | | -
sampsize.ratio | numeric | - | | \([0, 1]\)
na.action | character | na.omit | na.omit, na.impute | -
nimpute | integer | 1 | | \([1, \infty)\)
proximity | character | FALSE | FALSE, TRUE, inbag, oob, all | -
distance | character | FALSE | FALSE, TRUE, inbag, oob, all | -
forest.wt | character | FALSE | FALSE, TRUE, inbag, oob, all | -
xvar.wt | untyped | - | | -
split.wt | untyped | - | | -
forest | logical | TRUE | TRUE, FALSE | -
var.used | character | FALSE | FALSE, all.trees | -
split.depth | character | FALSE | FALSE, all.trees, by.tree | -
seed | integer | - | | \((-\infty, -1]\)
do.trace | logical | FALSE | TRUE, FALSE | -
get.tree | untyped | - | | -
outcome | character | train | train, test | -
ptn.count | integer | 0 | | \([0, \infty)\)
cores | integer | 1 | | \([1, \infty)\)
save.memory | logical | FALSE | TRUE, FALSE | -
perf.type | character | - | gmean, misclass, brier, none | -
case.depth | logical | FALSE | TRUE, FALSE | -
marginal.xvar | untyped | NULL | | -
Custom mlr3 parameters
mtry: This hyperparameter can alternatively be set via the added hyperparameter mtry.ratio as mtry = max(ceiling(mtry.ratio * n_features), 1). Note that mtry and mtry.ratio are mutually exclusive.
sampsize: This hyperparameter can alternatively be set via the added hyperparameter sampsize.ratio as sampsize = max(ceiling(sampsize.ratio * n_obs), 1). Note that sampsize and sampsize.ratio are mutually exclusive.
cores: This value is set as the option rf.cores during training and is set to 1 by default.
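The two ratio formulas above can be checked by hand. The following is a minimal base R sketch, not the learner's internal code: ratio_to_count() is a hypothetical helper introduced only for illustration, and the feature and observation counts are taken from the sonar example output further down this page (60 features, 139 training rows).

```r
# Hypothetical helper (not part of mlr3extralearners): applies the documented
# rule count = max(ceiling(ratio * n), 1).
ratio_to_count = function(ratio, n) {
  stopifnot(ratio >= 0, ratio <= 1, n >= 1)
  max(ceiling(ratio * n), 1)
}

n_features = 60  # number of features in the sonar task
n_obs = 139      # training rows in the example on this page

mtry = ratio_to_count(0.25, n_features)      # 0.25 * 60 = 15
sampsize = ratio_to_count(0.5, n_obs)        # ceiling(69.5) = 70
floor_case = ratio_to_count(0, n_features)   # a ratio of 0 still yields 1

print(c(mtry = mtry, sampsize = sampsize, floor_case = floor_case))
```

The max(..., 1) term is what keeps a very small ratio from producing a count of zero. With the learner itself, the ratio would be passed as a hyperparameter, for example lrn("classif.rfsrc", mtry.ratio = 0.25), and mtry must then be left unset.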
References
Breiman, Leo (2001). “Random Forests.” Machine Learning, 45(1), 5–32. ISSN 1573-0565, doi:10.1023/A:1010933404324.
See also
as.data.table(mlr_learners) for a table of available Learners in the running session (depending on the loaded packages).
Chapter in the mlr3book: https://mlr3book.mlr-org.com/basics.html#learners
mlr3learners for a selection of recommended learners.
mlr3cluster for unsupervised clustering learners.
mlr3pipelines to combine learners with pre- and postprocessing steps.
mlr3tuning for tuning of hyperparameters, mlr3tuningspaces for established default tuning spaces.
Super classes
mlr3::Learner -> mlr3::LearnerClassif -> LearnerClassifRandomForestSRC
Methods
Method importance()
The importance scores are extracted from the model slot importance and returned for the 'all' column, i.e. the overall importance rather than the class-specific values.
Returns
Named numeric().
Method selected_features()
Selected features are extracted from the model slot var.used.
Note: Due to a known issue in randomForestSRC, enabling var.used = "all.trees" causes prediction to fail. Therefore, this setting should be used exclusively for feature selection purposes and not when prediction is required.
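One way to follow the note above is to keep feature selection and prediction in two separate learners. The sketch below is an illustration under assumptions rather than a recommended workflow: it assumes selected_features() returns a character vector of feature names, and the variable names selector, predictor and reduced are made up for this example.

```r
library(mlr3)
library(mlr3extralearners)

task = tsk("sonar")

# Learner used only to record which variables the forest split on.
# It is never asked to predict, in line with the note above.
selector = lrn("classif.rfsrc", var.used = "all.trees")
selector$train(task)
features = selector$selected_features()

# Separate learner with the default var.used setting, trained on a copy of
# the task that is restricted to the selected features.
predictor = lrn("classif.rfsrc")
reduced = task$clone()$select(features)
predictor$train(reduced)
```

Cloning the task before select() leaves the original task untouched, so it can still be used with the full feature set elsewhere.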
Examples
# Define the Learner
learner = lrn("classif.rfsrc", importance = "TRUE")
print(learner)
#>
#> ── <LearnerClassifRandomForestSRC> (classif.rfsrc): Random Forest ──────────────
#> • Model: -
#> • Parameters: importance=TRUE
#> • Packages: mlr3, mlr3extralearners, and randomForestSRC
#> • Predict Types: [response] and prob
#> • Feature Types: logical, integer, numeric, and factor
#> • Encapsulation: none (fallback: -)
#> • Properties: importance, missings, multiclass, oob_error, selected_features,
#> twoclass, and weights
#> • Other settings: use_weights = 'use'
# Define a Task
task = tsk("sonar")
# Create train and test set
ids = partition(task)
# Train the learner on the training ids
learner$train(task, row_ids = ids$train)
print(learner$model)
#> Sample size: 139
#> Frequency of class labels: M=75, R=64
#> Number of trees: 500
#> Forest terminal node size: 1
#> Average no. of terminal nodes: 17.168
#> No. of variables tried at each split: 8
#> Total no. of variables: 60
#> Resampling used to grow trees: swor
#> Resample size used to grow trees: 88
#> Analysis: RF-C
#> Family: class
#> Splitting rule: gini *random*
#> Number of random split points: 10
#> Imbalanced ratio: 1.1719
#> (OOB) Brier score: 0.13649055
#> (OOB) Normalized Brier score: 0.54596218
#> (OOB) AUC: 0.92260417
#> (OOB) Log-loss: 0.43221282
#> (OOB) PR-AUC: 0.90786716
#> (OOB) G-mean: 0.73908727
#> (OOB) Requested performance error: 0.23021583, 0.08, 0.40625
#>
#> Confusion matrix:
#>
#> predicted
#> observed M R class.error
#> M 69 6 0.0800
#> R 26 38 0.4062
#>
#> (OOB) Misclassification rate: 0.2302158
#>
#> Random-classifier baselines (uniform):
#> Brier: 0.25 Normalized Brier: 1 Log-loss: 0.69314718
print(learner$importance())
#> V12 V9 V11 V10 V36 V48
#> 0.0640557480 0.0566155973 0.0557530585 0.0358038146 0.0294256574 0.0272924846
#> V49 V17 V5 V47 V27 V16
#> 0.0253902455 0.0247943570 0.0235911553 0.0228401597 0.0218924515 0.0215296448
#> V28 V52 V15 V37 V45 V6
#> 0.0212257422 0.0202662771 0.0196168969 0.0188036324 0.0171366104 0.0170399295
#> V51 V46 V24 V18 V23 V44
#> 0.0170309863 0.0152328387 0.0142766627 0.0141253941 0.0132207155 0.0126381838
#> V59 V13 V31 V25 V14 V22
#> 0.0125123199 0.0123664954 0.0121991451 0.0113663907 0.0106317692 0.0104656202
#> V58 V53 V39 V38 V35 V19
#> 0.0101803811 0.0100480210 0.0100403462 0.0097340048 0.0097001322 0.0094561415
#> V4 V26 V60 V20 V2 V3
#> 0.0093660343 0.0092179023 0.0088455420 0.0087532288 0.0085833525 0.0073910004
#> V34 V40 V32 V30 V8 V57
#> 0.0071176074 0.0066768577 0.0062680244 0.0062442700 0.0055465278 0.0055443948
#> V1 V21 V54 V33 V43 V41
#> 0.0052207192 0.0050630309 0.0046478376 0.0046430484 0.0045017133 0.0043746676
#> V55 V50 V29 V42 V7 V56
#> 0.0041955008 0.0040610565 0.0036295588 0.0033578693 0.0022059080 0.0009933635
# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)
# Score the predictions
predictions$score()
#> classif.ce
#> 0.115942
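The example above uses the default predict type "response". Since the learner also lists "prob" as a predict type, a variant that requests class probabilities might look like the following sketch. The choice of classif.auc as the measure is an assumption made for illustration; no output is shown because the values depend on the random partition and the forest.

```r
library(mlr3)
library(mlr3extralearners)

task = tsk("sonar")
ids = partition(task)

# Request class probabilities instead of hard labels
learner = lrn("classif.rfsrc", predict_type = "prob")
learner$train(task, row_ids = ids$train)
predictions = learner$predict(task, row_ids = ids$test)

# Matrix with one probability column per class
head(predictions$prob)

# A probability-based measure (illustrative choice)
predictions$score(msr("classif.auc"))
```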