
eXtreme Gradient Boosting regression using an Accelerated Failure Time objective. Calls xgboost::xgb.train() from package xgboost with objective set to survival:aft and eval_metric to aft-nloglik.

Note

To compute on GPUs, you first need to compile xgboost yourself and link against CUDA. See https://xgboost.readthedocs.io/en/stable/build.html#building-with-gpu-support.
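
Assuming such a GPU-enabled build, GPU training can then be requested through the learner's hyperparameters. The snippet below is only a sketch: depending on the installed xgboost version, either device = "cuda" (newer releases) or tree_method = "gpu_hist" (older releases) selects the GPU backend; both parameters appear in the parameter table below and are passed through to xgboost::xgb.train().

# sketch: requires an xgboost build compiled with CUDA support
learner = lrn("surv.xgboost.aft", tree_method = "hist", device = "cuda")
# older xgboost versions instead use:
# learner = lrn("surv.xgboost.aft", tree_method = "gpu_hist")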

Prediction types

This learner returns three prediction types (a short access example follows the list):

  1. response: the estimated survival time \(T\) for each test observation.

  2. lp: a vector of linear predictors (relative risk scores), one per observation, estimated as \(-\log(T)\). Higher survival time denotes lower risk.

  3. crank: same as lp.
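
As a minimal sketch of accessing these predictions, assuming a trained learner and a test set as in the Examples section below:

pred = learner$predict(task, row_ids = ids$test)
head(pred$response)  # estimated survival times T
head(pred$lp)        # linear predictors, -log(T)
head(pred$crank)     # identical to lp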

Early stopping

Early stopping can be used to find the optimal number of boosting rounds. The early_stopping_set parameter controls which data set is used to monitor performance. By default, early_stopping_set = "none", which disables early stopping. Set early_stopping_set = "test" to monitor the performance of the model on the test set while training. The test set for early stopping can be set with the "test" row role in the mlr3::Task. Additionally, early_stopping_rounds sets the number of rounds within which the performance must improve, and nrounds sets the maximum number of boosting rounds. During resampling, the test set is automatically taken from the mlr3::Resampling. Note that using the test set for early stopping can potentially bias the performance scores.
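
A minimal sketch of one possible configuration is shown below; it assumes that the validation data is selected via the learner's validate field (documented under Active bindings), while older versions instead use the early_stopping_set parameter described above.

task = tsk("grace")
learner = lrn("surv.xgboost.aft",
  nrounds = 500,               # maximum number of boosting rounds
  early_stopping_rounds = 10   # stop if performance does not improve for 10 rounds
)
learner$validate = 0.2         # hold out 20% of the training data for validation
learner$train(task)
learner$internal_tuned_values  # early-stopped number of boosting rounds, if available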

Dictionary

This Learner can be instantiated via lrn():

lrn("surv.xgboost.aft")

Meta Information

  • Task type: "surv"

  • Predict Types: "crank", "lp", "response"

  • Feature Types: "integer", "numeric"

  • Required Packages: mlr3, mlr3proba, mlr3extralearners, xgboost

Parameters

Id | Type | Default | Levels | Range
aft_loss_distribution | character | normal | normal, logistic, extreme | -
aft_loss_distribution_scale | numeric | - | | \((-\infty, \infty)\)
alpha | numeric | 0 | | \([0, \infty)\)
base_score | numeric | 0.5 | | \((-\infty, \infty)\)
booster | character | gbtree | gbtree, gblinear, dart | -
callbacks | untyped | NULL | | -
seed | integer | - | | \((-\infty, \infty)\)
colsample_bylevel | numeric | 1 | | \([0, 1]\)
colsample_bynode | numeric | 1 | | \([0, 1]\)
colsample_bytree | numeric | 1 | | \([0, 1]\)
disable_default_eval_metric | logical | FALSE | TRUE, FALSE | -
evals | untyped | NULL | | -
early_stopping_rounds | integer | NULL | | \([1, \infty)\)
learning_rate | numeric | 0.3 | | \([0, 1]\)
feature_selector | character | cyclic | cyclic, shuffle, random, greedy, thrifty | -
gamma | numeric | 0 | | \([0, \infty)\)
grow_policy | character | depthwise | depthwise, lossguide | -
interaction_constraints | untyped | - | | -
iterationrange | untyped | - | | -
lambda | numeric | 1 | | \([0, \infty)\)
max_bin | integer | 256 | | \([2, \infty)\)
max_delta_step | numeric | 0 | | \([0, \infty)\)
max_depth | integer | 6 | | \([0, \infty)\)
max_leaves | integer | 0 | | \([0, \infty)\)
maximize | logical | NULL | TRUE, FALSE | -
min_child_weight | numeric | 1 | | \([0, \infty)\)
monotone_constraints | integer | 0 | | \([-1, 1]\)
normalize_type | character | tree | tree, forest | -
nrounds | integer | - | | \([1, \infty)\)
nthread | integer | - | | \([1, \infty)\)
num_parallel_tree | integer | 1 | | \([1, \infty)\)
one_drop | logical | FALSE | TRUE, FALSE | -
print_every_n | integer | 1 | | \([1, \infty)\)
rate_drop | numeric | 0 | | \([0, 1]\)
refresh_leaf | logical | TRUE | TRUE, FALSE | -
sampling_method | character | uniform | uniform, gradient_based | -
sample_type | character | uniform | uniform, weighted | -
save_name | untyped | - | | -
save_period | integer | - | | \([0, \infty)\)
scale_pos_weight | numeric | 1 | | \((-\infty, \infty)\)
seed_per_iteration | logical | FALSE | TRUE, FALSE | -
skip_drop | numeric | 0 | | \([0, 1]\)
use_rmm | logical | FALSE | TRUE, FALSE | -
max_cached_hist_node | integer | NULL | | \((-\infty, \infty)\)
extmem_single_page | logical | FALSE | TRUE, FALSE | -
max_cat_to_onehot | integer | 4 | | \((-\infty, \infty)\)
max_cat_threshold | integer | 64 | | \((-\infty, \infty)\)
subsample | numeric | 1 | | \([0, 1]\)
top_k | integer | 0 | | \([0, \infty)\)
tree_method | character | auto | auto, exact, approx, hist, gpu_hist | -
updater | untyped | - | | -
verbose | integer | - | | \([0, 2]\)
verbosity | integer | - | | \([0, 2]\)
xgb_model | untyped | - | | -
device | untyped | - | | -
missing | numeric | NA | | \((-\infty, \infty)\)
validate_features | logical | TRUE | TRUE, FALSE | -

Initial parameter values

  • nrounds is initialized to 1000.

  • nthread is initialized to 1 to avoid conflicts with parallelization via future.

  • verbose and verbosity are both initialized to 0.
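
These initializations can be overridden when constructing the learner; the values below are only illustrative:

learner = lrn("surv.xgboost.aft", nrounds = 100, nthread = 4, verbose = 1)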

References

Chen, Tianqi, Guestrin, Carlos (2016). “XGBoost: A Scalable Tree Boosting System.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794. ACM. doi:10.1145/2939672.2939785.

Barnwal, Avinash, Cho, Hyunsu, Hocking, Toby (2022). “Survival Regression with Accelerated Failure Time Model in XGBoost.” Journal of Computational and Graphical Statistics. ISSN 1537-2715, doi:10.1080/10618600.2022.2067548.

See also

Author

bblodfon

Super classes

mlr3::Learner -> mlr3proba::LearnerSurv -> LearnerSurvXgboostAFT

Active bindings

internal_valid_scores

The last observation of the validation scores for all metrics. Extracted from model$evaluation_log.

internal_tuned_values

Returns the early stopped iterations if early_stopping_rounds was set during training.

validate

How to construct the internal validation data. This parameter can be either NULL, a ratio, "test", or "predefined".

model

(any)
The fitted model. Only available after $train() has been called.

Methods

Inherited methods


Method new()

Creates a new instance of this R6 class.

Usage

LearnerSurvXgboostAFT$new()


Method importance()

The importance scores are calculated with xgboost::xgb.importance().

Usage

LearnerSurvXgboostAFT$importance()

Returns

Named numeric().


Method clone()

The objects of this class are cloneable with this method.

Usage

LearnerSurvXgboostAFT$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

# Define the Learner
learner = lrn("surv.xgboost.aft")
print(learner)
#> 
#> ── <LearnerSurvXgboostAFT> (surv.xgboost.aft): Extreme Gradient Boosting AFT ───
#> • Model: -
#> • Parameters: nrounds=1000, nthread=1, verbose=0, verbosity=0
#> • Validate: NULL
#> • Packages: mlr3, mlr3proba, mlr3extralearners, and xgboost
#> • Predict Types: [crank], lp, and response
#> • Feature Types: integer and numeric
#> • Encapsulation: none (fallback: -)
#> • Properties: importance, internal_tuning, missings, validation, and weights
#> • Other settings: use_weights = 'use'

# Define a Task
task = tsk("grace")

# Create train and test set
ids = partition(task)

# Train the learner on the training ids
learner$train(task, row_ids = ids$train)

print(learner$model)
#> ##### xgb.Booster
#> call:
#>   xgboost::xgb.train(params = pv[names(pv) %in% formalArgs(xgboost::xgb.params)], 
#>     data = data, nrounds = pv$nrounds, evals = pv$evals, verbose = pv$verbose, 
#>     print_every_n = pv$print_every_n, early_stopping_rounds = pv$early_stopping_rounds, 
#>     maximize = pv$maximize, save_period = pv$save_period, save_name = pv$save_name, 
#>     callbacks = pv$callbacks %??% list())
#> # of features: 6 
#> # of rounds:  1000 
print(learner$importance())
#>  revascdays      revasc         los       sysbp         age    stchange 
#> 0.333152092 0.319069397 0.138087356 0.114430717 0.087400950 0.007859488 

# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)

# Score the predictions
predictions$score()
#> surv.cindex 
#>   0.8323826