Random Forest Competing Risks Learner
Source: R/learner_randomForestSRC_cmprsk_rfsrc.R
mlr_learners_cmprsk.rfsrc.Rd
Random survival forests for competing risks.
Calls randomForestSRC::rfsrc() from randomForestSRC.
Meta Information
Task type: “cmprsk”
Predict Types: “cif”
Feature Types: “logical”, “integer”, “numeric”, “factor”
Required Packages: mlr3, mlr3cmprsk, mlr3extralearners, randomForestSRC
Parameters
| Id | Type | Default | Levels | Range |
| ntree | integer | 500 | | \([1, \infty)\) |
| mtry | integer | - | | \([1, \infty)\) |
| mtry.ratio | numeric | - | | \([0, 1]\) |
| nodesize | integer | 15 | | \([1, \infty)\) |
| nodedepth | integer | - | | \([1, \infty)\) |
| splitrule | character | logrankCR | logrankCR, logrank | - |
| nsplit | integer | 10 | | \([0, \infty)\) |
| importance | character | FALSE | FALSE, TRUE, none, anti, permute, random | - |
| block.size | integer | 10 | | \([1, \infty)\) |
| bootstrap | character | by.root | by.root, by.node, none, by.user | - |
| samptype | character | swor | swor, swr | - |
| samp | untyped | - | | - |
| membership | logical | FALSE | TRUE, FALSE | - |
| sampsize | untyped | - | | - |
| sampsize.ratio | numeric | - | | \([0, 1]\) |
| na.action | character | na.omit | na.omit, na.impute | - |
| nimpute | integer | 1 | | \([1, \infty)\) |
| ntime | integer | 150 | | \([0, \infty)\) |
| cause | untyped | - | | - |
| proximity | character | FALSE | FALSE, TRUE, inbag, oob, all | - |
| distance | character | FALSE | FALSE, TRUE, inbag, oob, all | - |
| forest.wt | character | FALSE | FALSE, TRUE, inbag, oob, all | - |
| xvar.wt | untyped | - | | - |
| split.wt | untyped | - | | - |
| forest | logical | TRUE | TRUE, FALSE | - |
| var.used | character | FALSE | FALSE, all.trees | - |
| split.depth | character | FALSE | FALSE, all.trees, by.tree | - |
| seed | integer | - | | \((-\infty, -1]\) |
| do.trace | logical | FALSE | TRUE, FALSE | - |
| get.tree | untyped | - | | - |
| outcome | character | train | train, test | - |
| ptn.count | integer | 0 | | \([0, \infty)\) |
| cores | integer | 1 | | \([1, \infty)\) |
| save.memory | logical | FALSE | TRUE, FALSE | - |
| perf.type | character | - | none | - |
| case.depth | logical | FALSE | TRUE, FALSE | - |
| marginal.xvar | untyped | NULL | | - |
Initial parameter values
ntime: Number of time points to which the observed event times are coerced for use in the estimated cumulative incidence functions during prediction. We changed the default value from 150 to 0, meaning we now use all the unique event times from the training set across all competing causes.
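As a rough illustration of what ntime = 0 implies, the base R sketch below collects the unique event times across all causes from a toy data set. The vectors `time` and `status` are hypothetical, and coding censored rows as 0 is an assumption of this sketch; it is not the learner's internal code.

```r
# Toy competing-risks data (hypothetical values; 0 = censored, 1 and 2 = causes)
time   = c(5, 3, 8, 3, 10, 7)
status = c(1, 2, 0, 1, 2, 0)

# Unique event times across all competing causes, censored rows excluded
event_times = sort(unique(time[status != 0]))
print(event_times)
```

With ntime = 0, a grid of this kind (rather than a fixed grid of 150 points) would be the set of time points at which the cumulative incidence functions are evaluated.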
Custom mlr3 parameters
mtry: This hyperparameter can alternatively be set via the added hyperparameter mtry.ratio as mtry = max(ceiling(mtry.ratio * n_features), 1). Note that mtry and mtry.ratio are mutually exclusive.
sampsize: This hyperparameter can alternatively be set via the added hyperparameter sampsize.ratio as sampsize = max(ceiling(sampsize.ratio * n_obs), 1). Note that sampsize and sampsize.ratio are mutually exclusive.
cores: This value is set as the option rf.cores during training and is set to 1 by default.
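The ratio formulas above can be checked with a few lines of base R. The helper `ratio_to_count()` is a hypothetical name introduced only for this sketch and is not part of any package; the feature count and sample size are taken from the pbc example further down the page.

```r
# Hypothetical helper mirroring the documented formula; not part of any package
ratio_to_count = function(ratio, n) {
  max(ceiling(ratio * n), 1)
}

n_features = 17  # total number of variables in the pbc example
n_obs = 184      # training sample size in the pbc example

mtry = ratio_to_count(0.3, n_features)      # ceiling(5.1) = 6
sampsize = ratio_to_count(0.5, n_obs)       # ceiling(92) = 92
mtry_floor = ratio_to_count(0, n_features)  # ceiling(0) = 0, raised to 1

print(c(mtry = mtry, sampsize = sampsize, mtry_floor = mtry_floor))
```

The outer max(..., 1) guarantees that at least one feature or one observation is used, even when the ratio is 0.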
References
Ishwaran, H., Gerds, T. A., Kogalur, U. B., Moore, R. D., Gange, S. J., Lau, B. M. (2014). “Random survival forests for competing risks.” Biostatistics, 15(4), 757–773. https://doi.org/10.1093/BIOSTATISTICS/KXU010.
See also
as.data.table(mlr_learners) for a table of available Learners in the running session (depending on the loaded packages).
Chapter in the mlr3book: https://mlr3book.mlr-org.com/basics.html#learners
mlr3learners for a selection of recommended learners.
mlr3cluster for unsupervised clustering learners.
mlr3pipelines to combine learners with pre- and postprocessing steps.
mlr3tuning for tuning of hyperparameters, mlr3tuningspaces for established default tuning spaces.
Super classes
mlr3::Learner -> mlr3cmprsk::LearnerCompRisks -> LearnerCompRisksRandomForestSRC
Methods
Method importance()
The importance scores are extracted from the model slot importance and
are cause-specific.
Returns
Named numeric().
Method selected_features()
Selected features are extracted from the model slot var.used.
Note: Due to a known issue in randomForestSRC, enabling var.used = "all.trees"
causes prediction to fail. Therefore, this setting should be used exclusively
for feature selection purposes and not when prediction is required.
Method oob_error()
Extracts the out-of-bag (OOB) cumulative incidence function (CIF) error
from the model's err.rate slot.
If cause = "mean" (default), the function returns a weighted average
of the cause-specific OOB errors, where the weights correspond to the
observed proportion of events for each cause in the training data.
Arguments
cause: Integer (event type) or "mean" (default). Use a specific event type to retrieve its OOB error, or "mean" to compute the weighted average across causes.
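The weighting described for cause = "mean" can be sketched in base R. The numbers below are copied from the printed model in the Examples section (events per cause and the two cause-specific OOB errors). The variable names are illustrative, and this is an assumption about how the average is formed based on the description above, not the method's actual source code.

```r
# Values copied from the printed model in the Examples section
n_events = c("1" = 12, "2" = 74)                  # events per cause
oob_err  = c("1" = 0.23297665, "2" = 0.16906124)  # cause-specific OOB errors

# Weights: observed proportion of events for each cause
w = n_events / sum(n_events)

# Weighted average of the cause-specific OOB errors
oob_mean = sum(w * oob_err)
print(oob_mean)
```

Under these assumptions the result agrees with the value 0.1779797 that learner$oob_error() prints in the Examples section.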
Examples
# Define the Learner
learner = lrn("cmprsk.rfsrc", importance = "TRUE")
print(learner)
#>
#> ── <LearnerCompRisksRandomForestSRC> (cmprsk.rfsrc): Competing Risk Survival For
#> • Model: -
#> • Parameters: importance=TRUE, ntime=0, cores=1
#> • Packages: mlr3, mlr3cmprsk, mlr3extralearners, and randomForestSRC
#> • Predict Types: [cif]
#> • Feature Types: logical, integer, numeric, and factor
#> • Encapsulation: none (fallback: -)
#> • Properties: importance, missings, oob_error, selected_features, and weights
#> • Other settings: use_weights = 'use'
# Define a Task
task = tsk("pbc")
# Stratification based on event
task$set_col_roles(cols = "status", add_to = "stratum")
# Create train and test set
ids = partition(task)
# Train the learner on the training ids
learner$train(task, row_ids = ids$train)
print(learner$model)
#> Sample size: 184
#> Number of events: 1=12, 2=74
#> Number of trees: 500
#> Forest terminal node size: 15
#> Average no. of terminal nodes: 9.184
#> No. of variables tried at each split: 5
#> Total no. of variables: 17
#> Resampling used to grow trees: swor
#> Resample size used to grow trees: 116
#> Analysis: RSF
#> Family: surv-CR
#> Splitting rule: logrankCR *random*
#> Number of random split points: 10
#> (OOB) Requested performance error: 0.23297665, 0.16906124
#>
print(learner$importance(cause = 1)) # VIMP for cause = 1
#> bili copper chol protime edema age
#> 0.300295376 0.114653340 0.062367330 0.057372859 0.049580303 0.039922886
#> sex ascites ast stage hepato spiders
#> 0.036516979 0.029129541 0.014952769 0.010430054 0.004984822 0.004905700
#> trig trt albumin alk.phos platelet
#> -0.001178774 -0.001842672 -0.002866438 -0.004129264 -0.006712285
print(learner$importance(cause = 2)) # VIMP for cause = 2
#> bili ascites edema copper albumin
#> 2.190077e-01 1.449591e-01 1.152786e-01 7.682099e-02 4.322796e-02
#> age protime chol platelet stage
#> 4.142208e-02 3.391157e-02 2.981750e-02 1.607354e-02 8.540843e-03
#> ast sex trig alk.phos spiders
#> 7.937134e-03 6.180943e-03 3.247560e-03 1.486223e-03 -8.327894e-05
#> hepato trt
#> -2.452516e-04 -2.969326e-04
print(learner$oob_error()) # weighted-mean across causes
#> [1] 0.1779797
# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)
# Score the predictions
predictions$score()
#> cmprsk.auc
#> 0.8239207