Survival Random Forest SRC Learner
Source: R/learner_randomForestSRC_surv_rfsrc.R
mlr_learners_surv.rfsrc.Rd
Random survival forest.
Calls randomForestSRC::rfsrc() from randomForestSRC.
Prediction types
This learner returns two prediction types:
- distr: a survival matrix in two dimensions, where observations are represented in rows and (unique event) time points in columns. Calculated using the internal randomForestSRC::predict.rfsrc() function.
- crank: the expected mortality using mlr3proba::surv_return().
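The layout of the distr matrix can be pictured with a toy example in base R. The values, the row names, and the time points below are made up for illustration; this is only a sketch of the shape described above, not output of the learner.

```r
# Toy survival matrix: 2 observations (rows) x 3 unique event times (columns).
# All numbers are invented for illustration.
surv = matrix(
  c(0.95, 0.80, 0.60,
    0.90, 0.65, 0.30),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("obs1", "obs2"), c("5", "8", "12"))
)

# Each row is one observation's survival curve, so it should be
# non-increasing from left to right.
non_increasing = all(apply(surv, 1, function(s) all(diff(s) <= 0)))

print(surv)
print(non_increasing)
```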
Meta Information
- Task type: “surv” 
- Predict Types: “crank”, “distr” 
- Feature Types: “logical”, “integer”, “numeric”, “factor” 
- Required Packages: mlr3, mlr3proba, mlr3extralearners, randomForestSRC 
Parameters
| Id | Type | Default | Levels | Range |
| ntree | integer | 500 | | \([1, \infty)\) |
| mtry | integer | - | | \([1, \infty)\) |
| mtry.ratio | numeric | - | | \([0, 1]\) |
| nodesize | integer | 15 | | \([1, \infty)\) |
| nodedepth | integer | - | | \([1, \infty)\) |
| splitrule | character | logrank | logrank, bs.gradient | - |
| nsplit | integer | 10 | | \([0, \infty)\) |
| importance | character | FALSE | FALSE, TRUE, none, anti, permute, random | - |
| block.size | integer | 10 | | \([1, \infty)\) |
| bootstrap | character | by.root | by.root, by.node, none, by.user | - |
| samptype | character | swor | swor, swr | - |
| samp | untyped | - | | - |
| membership | logical | FALSE | TRUE, FALSE | - |
| sampsize | untyped | - | | - |
| sampsize.ratio | numeric | - | | \([0, 1]\) |
| na.action | character | na.omit | na.omit, na.impute | - |
| nimpute | integer | 1 | | \([1, \infty)\) |
| ntime | integer | 150 | | \([0, \infty)\) |
| proximity | character | FALSE | FALSE, TRUE, inbag, oob, all | - |
| distance | character | FALSE | FALSE, TRUE, inbag, oob, all | - |
| forest.wt | character | FALSE | FALSE, TRUE, inbag, oob, all | - |
| xvar.wt | untyped | - | | - |
| split.wt | untyped | - | | - |
| forest | logical | TRUE | TRUE, FALSE | - |
| var.used | character | FALSE | FALSE, all.trees | - |
| split.depth | character | FALSE | FALSE, all.trees, by.tree | - |
| seed | integer | - | | \((-\infty, -1]\) |
| do.trace | logical | FALSE | TRUE, FALSE | - |
| get.tree | untyped | - | | - |
| outcome | character | train | train, test | - |
| ptn.count | integer | 0 | | \([0, \infty)\) |
| estimator | character | nelson | nelson, kaplan | - |
| cores | integer | 1 | | \([1, \infty)\) |
| save.memory | logical | FALSE | TRUE, FALSE | - |
| perf.type | character | - | none | - |
| case.depth | logical | FALSE | TRUE, FALSE | - |
| marginal.xvar | untyped | NULL | | - |
Custom mlr3 parameters
- estimator: Hidden parameter that controls the type of estimator used to derive the survival function during prediction. The default value is "chf", which uses a bootstrapped Nelson-Aalen estimator for the cumulative hazard function \(H(t)\) (Ishwaran, 2008), from which we calculate \(S(t) = \exp(-H(t))\), whereas "surv" uses a bootstrapped Kaplan-Meier estimator to directly estimate \(S(t)\).
- mtry: This hyperparameter can alternatively be set via the added hyperparameter mtry.ratio as mtry = max(ceiling(mtry.ratio * n_features), 1). Note that mtry and mtry.ratio are mutually exclusive.
- sampsize: This hyperparameter can alternatively be set via the added hyperparameter sampsize.ratio as sampsize = max(ceiling(sampsize.ratio * n_obs), 1). Note that sampsize and sampsize.ratio are mutually exclusive.
- cores: This value is set as the option rf.cores during training and is set to 1 by default.
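The arithmetic behind the entries above can be sketched in base R. The helper ratio_to_count() and the cumulative hazard values are hypothetical illustrations, not part of mlr3extralearners or randomForestSRC. The first part mirrors the documented formula shared by mtry.ratio and sampsize.ratio, the second the relation \(S(t) = \exp(-H(t))\).

```r
# Hypothetical helper mirroring the documented formula
# max(ceiling(ratio * n), 1); it is not a function of any package.
ratio_to_count = function(ratio, n) {
  stopifnot(ratio >= 0, ratio <= 1, n >= 1)
  max(ceiling(ratio * n), 1)
}

mtry     = ratio_to_count(0.5, 6)    # mtry.ratio = 0.5, 6 features      -> 3
sampsize = ratio_to_count(0.5, 670)  # sampsize.ratio = 0.5, 670 rows    -> 335
mtry_min = ratio_to_count(0, 6)      # the max(..., 1) guard kicks in    -> 1

# Made-up cumulative hazard values H(t) at three time points,
# converted to survival probabilities via S(t) = exp(-H(t)).
H = c(0.10, 0.25, 0.60)
S = exp(-H)

print(c(mtry = mtry, sampsize = sampsize, mtry_min = mtry_min))
print(round(S, 3))
```

The max(..., 1) guard matters only for very small ratios, where the rounded product would otherwise be zero.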
Initial parameter values
- ntime: Number of time points to coerce the observed event times for use in the estimated survival function during prediction. We changed the default value of 150 to 0 in order to be in line with other random survival forest learners and use all the unique event times from the train set.
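What "all the unique event times from the train set" refers to can be illustrated in base R. The vectors time and status below are made-up training outcomes (status 1 = event, 0 = censored), and the snippet only sketches the idea of the time grid; it is not the code that randomForestSRC runs internally.

```r
# Made-up training outcomes: follow-up time and event indicator
time   = c(5, 8, 8, 12, 20, 20, 31)
status = c(1, 0, 1,  1,  0,  1,  1)

# Unique event times: keep observed events only, then deduplicate and sort
event_times = sort(unique(time[status == 1]))
print(event_times)
#> [1]  5  8 12 20 31
```

Under this reading, a forest trained with ntime = 0 on such data would be expected to evaluate the survival function at these five time points rather than on a coarser grid.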
References
Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS (2008). “Random survival forests.” The Annals of Applied Statistics, 2(3). doi:10.1214/08-aoas169, https://doi.org/10.1214/08-aoas169.
Breiman, Leo (2001). “Random Forests.” Machine Learning, 45(1), 5–32. ISSN 1573-0565, doi:10.1023/A:1010933404324.
See also
- as.data.table(mlr_learners) for a table of available Learners in the running session (depending on the loaded packages).
- Chapter in the mlr3book: https://mlr3book.mlr-org.com/basics.html#learners 
- mlr3learners for a selection of recommended learners. 
- mlr3cluster for unsupervised clustering learners. 
- mlr3pipelines to combine learners with pre- and postprocessing steps. 
- mlr3tuning for tuning of hyperparameters, mlr3tuningspaces for established default tuning spaces. 
Super classes
mlr3::Learner -> mlr3proba::LearnerSurv -> LearnerSurvRandomForestSRC
Methods
Method importance()
The importance scores are extracted from the model slot importance.
Returns
Named numeric().
Method selected_features()
Selected features are extracted from the model slot var.used.
Note: Due to a known issue in randomForestSRC, enabling var.used = "all.trees"
causes prediction to fail. Therefore, this setting should be used exclusively
for feature selection purposes and not when prediction is required.
Examples
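# Load the required packages (see Meta Information)
library(mlr3)
library(mlr3proba)
library(mlr3extralearners)
# Define the Learner
learner = lrn("surv.rfsrc", importance = "TRUE")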
print(learner)
#> 
#> ── <LearnerSurvRandomForestSRC> (surv.rfsrc): Random Survival Forests ──────────
#> • Model: -
#> • Parameters: importance=TRUE, ntime=0
#> • Packages: mlr3, mlr3proba, mlr3extralearners, and randomForestSRC
#> • Predict Types: [crank] and distr
#> • Feature Types: logical, integer, numeric, and factor
#> • Encapsulation: none (fallback: -)
#> • Properties: importance, missings, oob_error, selected_features, and weights
#> • Other settings: use_weights = 'use'
# Define a Task
task = tsk("grace")
# Create train and test set
ids = partition(task)
# Train the learner on the training ids
learner$train(task, row_ids = ids$train)
print(learner$model)
#>                          Sample size: 670
#>                     Number of deaths: 215
#>                      Number of trees: 500
#>            Forest terminal node size: 15
#>        Average no. of terminal nodes: 27.704
#> No. of variables tried at each split: 3
#>               Total no. of variables: 6
#>        Resampling used to grow trees: swor
#>     Resample size used to grow trees: 423
#>                             Analysis: RSF
#>                               Family: surv
#>                       Splitting rule: logrank *random*
#>        Number of random split points: 10
#>                           (OOB) CRPS: 16.30692282
#>              (OOB) standardized CRPS: 0.09318242
#>    (OOB) Requested performance error: 0.1717055
#> 
print(learner$importance())
#>  revascdays      revasc         age       sysbp         los    stchange 
#> 0.437517400 0.296680657 0.112274433 0.054220399 0.052355207 0.003602335 
# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)
# Score the predictions
predictions$score()
#> surv.cindex 
#>   0.8386391