
eXtreme Gradient Boosting regression using a Cox Proportional Hazards objective. Calls xgboost::xgb.train() from package xgboost with objective set to survival:cox and eval_metric to cox-nloglik.
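
For orientation, a minimal standalone sketch of such a call using xgboost directly (not the learner's internal code): with objective survival:cox, xgboost expects the survival time as the label, where negative values mark right-censored observations.

library(xgboost)

set.seed(1)
x = matrix(rnorm(300), ncol = 3)
time = rexp(100, rate = exp(0.5 * x[, 1]))
status = rbinom(100, 1, 0.7)              # 1 = event, 0 = right censored
label = ifelse(status == 1, time, -time)  # negative times mark censored observations

dtrain = xgboost::xgb.DMatrix(data = x, label = label)
booster = xgboost::xgb.train(
  params = list(objective = "survival:cox", eval_metric = "cox-nloglik"),
  data = dtrain,
  nrounds = 50
)
head(predict(booster, xgboost::xgb.DMatrix(x)))  # relative risk scores (exp of the linear predictor)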

Note

To compute on GPUs, you first need to compile xgboost yourself and link against CUDA. See https://xgboost.readthedocs.io/en/stable/build.html#building-with-gpu-support.
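
Assuming such a GPU-enabled build is installed (and the packages are loaded as in the Examples below), GPU training can be requested through the learner's hyperparameters; which of the two spellings applies depends on the installed xgboost version.

# Newer xgboost versions: select the device explicitly
learner_gpu = lrn("surv.xgboost.cox", tree_method = "hist", device = "cuda")

# Older xgboost versions: use the dedicated GPU tree method
learner_gpu = lrn("surv.xgboost.cox", tree_method = "gpu_hist")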

Prediction types

Three types of prediction are returned for this learner (a short access sketch follows the list):

  1. lp: a vector of linear predictors (relative risk scores), one per observation.

  2. crank: same as lp.

  3. distr: a survival matrix in two dimensions, where observations are represented in rows and time points in columns. By default, the Breslow estimator is used via mlr3proba::breslow().
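
A minimal sketch of accessing the three prediction types from the prediction object (assuming mlr3proba and mlr3extralearners are loaded, as in the Examples below):

task = tsk("grace")
learner = lrn("surv.xgboost.cox", nrounds = 50)
learner$train(task)
pred = learner$predict(task)

head(pred$lp)     # linear predictors (relative risk scores)
head(pred$crank)  # identical to lp for this learner
pred$distr        # survival matrix distribution; e.g. pred$distr$survival(c(30, 90))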

Initial parameter values

  • nrounds is initialized to 1000.

  • nthread is initialized to 1 to avoid conflicts with parallelization via future.

  • verbose and verbosity are both initialized to 0.
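
These initial values are ordinary hyperparameters and can be overridden at construction or later via the parameter set; a minimal sketch (assuming the packages are loaded as in the Examples below):

learner = lrn("surv.xgboost.cox", nrounds = 200, nthread = 2)
learner$param_set$set_values(verbose = 1)
learner$param_set$values[c("nrounds", "nthread", "verbose")]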

Dictionary

This Learner can be instantiated via lrn():

lrn("surv.xgboost.cox")

Meta Information

  • Task type: "surv"

  • Predict Types: "crank", "distr", "lp"

  • Feature Types: "integer", "numeric"

  • Required Packages: mlr3, mlr3proba, mlr3extralearners, xgboost

Parameters

| Id | Type | Default | Levels | Range |
|----|------|---------|--------|-------|
| alpha | numeric | 0 | | \([0, \infty)\) |
| base_score | numeric | 0.5 | | \((-\infty, \infty)\) |
| booster | character | gbtree | gbtree, gblinear, dart | - |
| callbacks | untyped | list() | | - |
| seed | integer | - | | \((-\infty, \infty)\) |
| colsample_bylevel | numeric | 1 | | \([0, 1]\) |
| colsample_bynode | numeric | 1 | | \([0, 1]\) |
| colsample_bytree | numeric | 1 | | \([0, 1]\) |
| disable_default_eval_metric | logical | FALSE | TRUE, FALSE | - |
| early_stopping_rounds | integer | NULL | | \([1, \infty)\) |
| evals | untyped | NULL | | - |
| learning_rate | numeric | 0.3 | | \([0, 1]\) |
| feature_selector | character | cyclic | cyclic, shuffle, random, greedy, thrifty | - |
| gamma | numeric | 0 | | \([0, \infty)\) |
| grow_policy | character | depthwise | depthwise, lossguide | - |
| interaction_constraints | untyped | - | | - |
| iterationrange | untyped | - | | - |
| lambda | numeric | 1 | | \([0, \infty)\) |
| max_bin | integer | 256 | | \([2, \infty)\) |
| max_delta_step | numeric | 0 | | \([0, \infty)\) |
| max_depth | integer | 6 | | \([0, \infty)\) |
| max_leaves | integer | 0 | | \([0, \infty)\) |
| maximize | logical | NULL | TRUE, FALSE | - |
| min_child_weight | numeric | 1 | | \([0, \infty)\) |
| monotone_constraints | integer | 0 | | \([-1, 1]\) |
| normalize_type | character | tree | tree, forest | - |
| nrounds | integer | - | | \([1, \infty)\) |
| nthread | integer | - | | \([1, \infty)\) |
| num_parallel_tree | integer | 1 | | \([1, \infty)\) |
| one_drop | logical | FALSE | TRUE, FALSE | - |
| print_every_n | integer | 1 | | \([1, \infty)\) |
| rate_drop | numeric | 0 | | \([0, 1]\) |
| refresh_leaf | logical | TRUE | TRUE, FALSE | - |
| sampling_method | character | uniform | uniform, gradient_based | - |
| sample_type | character | uniform | uniform, weighted | - |
| save_name | untyped | - | | - |
| save_period | integer | - | | \([0, \infty)\) |
| scale_pos_weight | numeric | 1 | | \((-\infty, \infty)\) |
| seed_per_iteration | logical | FALSE | TRUE, FALSE | - |
| skip_drop | numeric | 0 | | \([0, 1]\) |
| use_rmm | logical | FALSE | TRUE, FALSE | - |
| max_cached_hist_node | integer | NULL | | \((-\infty, \infty)\) |
| extmem_single_page | logical | FALSE | TRUE, FALSE | - |
| max_cat_to_onehot | integer | 4 | | \((-\infty, \infty)\) |
| max_cat_threshold | integer | 64 | | \((-\infty, \infty)\) |
| subsample | numeric | 1 | | \([0, 1]\) |
| top_k | integer | 0 | | \([0, \infty)\) |
| tree_method | character | auto | auto, exact, approx, hist, gpu_hist | - |
| updater | untyped | - | | - |
| verbose | integer | - | | \([0, 2]\) |
| verbosity | integer | - | | \([0, 2]\) |
| xgb_model | untyped | - | | - |
| device | untyped | - | | - |
| missing | numeric | NA | | \((-\infty, \infty)\) |
| validate_features | logical | TRUE | TRUE, FALSE | - |

Early stopping

Early stopping can be used to find the optimal number of boosting rounds. The early_stopping_set parameter controls which set is used to monitor the performance. By default, early_stopping_set = "none", which disables early stopping. Set early_stopping_set = "test" to monitor the performance of the model on the test set while training. The test set for early stopping can be set with the "test" row role in the mlr3::Task. Additionally, set early_stopping_rounds to the number of rounds within which the performance must improve, and nrounds to the maximum number of boosting rounds. During resampling, the test set is automatically taken from the mlr3::Resampling. Note that using the test set for early stopping can potentially bias the performance scores.
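
A hedged sketch of this workflow, assuming that early_stopping_set and the "test" row role are available in exactly this form in the installed mlr3 and mlr3extralearners versions (and that the packages are loaded as in the Examples below):

task = tsk("grace")
split = partition(task)
task$set_row_roles(split$test, roles = "test")  # rows used only to monitor performance

learner = lrn("surv.xgboost.cox",
  nrounds = 1000,               # maximum number of boosting rounds
  early_stopping_rounds = 25,   # stop if no improvement within 25 rounds
  early_stopping_set = "test"
)
learner$train(task)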

References

Chen, Tianqi, Guestrin, Carlos (2016). “XGBoost: A Scalable Tree Boosting System.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794. ACM. doi:10.1145/2939672.2939785.

See also

Author

bblodfon

Super classes

mlr3::Learner -> mlr3proba::LearnerSurv -> LearnerSurvXgboostCox

Active bindings

internal_valid_scores

(named list() or NULL) Validation metrics extracted from the xgboost model's evaluation_log. If early stopping is enabled, the scores correspond to the best_iteration selected by early stopping. Otherwise, the scores are taken from the final boosting round (nrounds).

internal_tuned_values

(named list() or NULL) If early stopping is activated, this returns a list with the early stopped iterations (nrounds), which is extracted from the best iteration of the model. Otherwise NULL.

validate

How to construct the internal validation data. This parameter can be either NULL, a ratio in (0, 1), "test", or "predefined".

model

(any)
The fitted model. Only available after $train() has been called.
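
A hedged sketch of how these fields interact, assuming the learner's internal validation interface behaves as for other mlr3 learners with the "validation" and "internal_tuning" properties (packages loaded as in the Examples below):

learner = lrn("surv.xgboost.cox", nrounds = 500, early_stopping_rounds = 20)
learner$validate = 0.3          # hold out 30% of the training data for validation
learner$train(tsk("grace"))

learner$internal_valid_scores   # e.g. cox-nloglik on the validation data
learner$internal_tuned_values   # e.g. list(nrounds = <best boosting round>)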

Methods

Inherited methods


Method new()

Creates a new instance of this R6 class.

Usage

LearnerSurvXgboostCox$new()

Method importance()

The importance scores are calculated with xgboost::xgb.importance().

Usage

LearnerSurvXgboostCox$importance()

Returns

Named numeric().


Method clone()

The objects of this class are cloneable with this method.

Usage

LearnerSurvXgboostCox$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

# Define the Learner
learner = lrn("surv.xgboost.cox")
print(learner)
#> 
#> ── <LearnerSurvXgboostCox> (surv.xgboost.cox): Extreme Gradient Boosting Cox ───
#> • Model: -
#> • Parameters: nrounds=1000, nthread=1, verbose=0, verbosity=0
#> • Validate: NULL
#> • Packages: mlr3, mlr3proba, mlr3extralearners, and xgboost
#> • Predict Types: [crank], distr, and lp
#> • Feature Types: integer and numeric
#> • Encapsulation: none (fallback: -)
#> • Properties: importance, internal_tuning, missings, validation, and weights
#> • Other settings: use_weights = 'use'

# Define a Task
task = tsk("grace")

# Create train and test set
ids = partition(task)

# Train the learner on the training ids
learner$train(task, row_ids = ids$train)

print(learner$model)
#> ##### xgb.Booster
#> call:
#>   xgboost::xgb.train(params = pv[names(pv) %in% formalArgs(xgboost::xgb.params)], 
#>     data = data, nrounds = pv$nrounds, evals = pv$evals, verbose = pv$verbose, 
#>     print_every_n = pv$print_every_n, early_stopping_rounds = pv$early_stopping_rounds, 
#>     maximize = pv$maximize, save_period = pv$save_period, save_name = pv$save_name, 
#>     callbacks = pv$callbacks %??% list())
#> # of features: 6 
#> # of rounds:  1000 
#> callbacks:
#>    lp_train, times, status 
print(learner$importance())
#>      sysbp revascdays        los     revasc        age   stchange 
#> 0.48659951 0.23907252 0.09871223 0.08322981 0.08089301 0.01149292 

# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)

# Score the predictions
predictions$score()
#> surv.cindex 
#>   0.8682978
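
Because a distr prediction is also returned, a distribution-based measure such as the integrated Graf (Brier) score could be computed as well (a hedged sketch, output not shown):

predictions$score(msr("surv.graf"))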