Extreme Gradient Boosting Cox Survival Learner
Source: R/learner_xgboost_surv_xgboost_cox.R
mlr_learners_surv.xgboost.cox.Rd
eXtreme Gradient Boosting regression using a Cox Proportional Hazards objective.
Calls xgboost::xgb.train() from package xgboost with objective
set to survival:cox and eval_metric to cox-nloglik.
Note
To compute on GPUs, you first need to compile xgboost yourself and link against CUDA. See https://xgboost.readthedocs.io/en/stable/build.html#building-with-gpu-support.
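With such a build in place, GPU training can be requested through the tree_method parameter listed under Parameters below. A minimal sketch (only meaningful with a CUDA-enabled xgboost installation):
library(mlr3extralearners)
# requires an xgboost build compiled with GPU support (see link above)
learner = lrn("surv.xgboost.cox", tree_method = "gpu_hist")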
Prediction types
Three types of prediction are returned for this learner:
- lp: a vector of linear predictors (relative risk scores), one per observation.
- crank: same as lp.
- distr: a survival matrix in two dimensions, where observations are represented in rows and time points in columns. By default, the Breslow estimator is used via mlr3proba::breslow().
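A minimal sketch of how these prediction types can be accessed after training (nrounds is lowered here only to keep the example fast; mlr3proba must be installed for the distr composition):
library(mlr3extralearners)
library(mlr3proba)
task = tsk("grace")
learner = lrn("surv.xgboost.cox", nrounds = 100)
learner$train(task)
p = learner$predict(task)
head(p$lp)                # linear predictors (relative risk scores)
all.equal(p$crank, p$lp)  # crank is identical to lp
p$distr                   # survival distributions composed via the Breslow estimator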
Saving this learner
In order to save a LearnerSurvXgboostCox for later usage,
it is necessary to call the $marshal() method on the Learner
before writing it to disk, as the object will otherwise not be saved correctly.
After loading a marshaled LearnerSurvXgboostCox into R again,
you then need to call $unmarshal() to transform it into a usable state.
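A minimal sketch of that workflow (the file path is only an example):
library(mlr3extralearners)
library(mlr3proba)
learner = lrn("surv.xgboost.cox", nrounds = 100)
learner$train(tsk("grace"))
learner$marshal()                    # make the internal xgboost model safe to serialize
path = tempfile(fileext = ".rds")    # example location
saveRDS(learner, path)
learner = readRDS(path)
learner$unmarshal()                  # restore a usable model
learner$predict(tsk("grace"))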
Initial parameter values
- nrounds is initialized to 1000.
- nthread is initialized to 1 to avoid conflicts with parallelization via future.
- verbose is initialized to 0.
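These values are only used when the parameters are not set explicitly; they can be overridden at construction, for example:
library(mlr3extralearners)
learner = lrn("surv.xgboost.cox", nrounds = 500, eta = 0.05, nthread = 2)
learner$param_set$values[c("nrounds", "eta", "nthread")]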
Meta Information
Task type: “surv”
Predict Types: “crank”, “distr”, “lp”
Feature Types: “integer”, “numeric”
Required Packages: mlr3, mlr3proba, mlr3extralearners, xgboost
Parameters
| Id | Type | Default | Levels | Range |
|---|---|---|---|---|
| alpha | numeric | 0 | - | \([0, \infty)\) |
| base_score | numeric | 0.5 | - | \((-\infty, \infty)\) |
| booster | character | gbtree | gbtree, gblinear, dart | - |
| callbacks | untyped | list() | - | - |
| colsample_bylevel | numeric | 1 | - | \([0, 1]\) |
| colsample_bynode | numeric | 1 | - | \([0, 1]\) |
| colsample_bytree | numeric | 1 | - | \([0, 1]\) |
| disable_default_eval_metric | logical | FALSE | TRUE, FALSE | - |
| early_stopping_rounds | integer | NULL | - | \([1, \infty)\) |
| eta | numeric | 0.3 | - | \([0, 1]\) |
| feature_selector | character | cyclic | cyclic, shuffle, random, greedy, thrifty | - |
| feval | untyped | NULL | - | - |
| gamma | numeric | 0 | - | \([0, \infty)\) |
| grow_policy | character | depthwise | depthwise, lossguide | - |
| interaction_constraints | untyped | - | - | - |
| iterationrange | untyped | - | - | - |
| lambda | numeric | 1 | - | \([0, \infty)\) |
| lambda_bias | numeric | 0 | - | \([0, \infty)\) |
| max_bin | integer | 256 | - | \([2, \infty)\) |
| max_delta_step | numeric | 0 | - | \([0, \infty)\) |
| max_depth | integer | 6 | - | \([0, \infty)\) |
| max_leaves | integer | 0 | - | \([0, \infty)\) |
| maximize | logical | NULL | TRUE, FALSE | - |
| min_child_weight | numeric | 1 | - | \([0, \infty)\) |
| missing | numeric | NA | - | \((-\infty, \infty)\) |
| monotone_constraints | integer | 0 | - | \([-1, 1]\) |
| normalize_type | character | tree | tree, forest | - |
| nrounds | integer | - | - | \([1, \infty)\) |
| nthread | integer | 1 | - | \([1, \infty)\) |
| num_parallel_tree | integer | 1 | - | \([1, \infty)\) |
| one_drop | logical | FALSE | TRUE, FALSE | - |
| print_every_n | integer | 1 | - | \([1, \infty)\) |
| process_type | character | default | default, update | - |
| rate_drop | numeric | 0 | - | \([0, 1]\) |
| refresh_leaf | logical | TRUE | TRUE, FALSE | - |
| sampling_method | character | uniform | uniform, gradient_based | - |
| sample_type | character | uniform | uniform, weighted | - |
| save_name | untyped | - | - | - |
| save_period | integer | - | - | \([0, \infty)\) |
| scale_pos_weight | numeric | 1 | - | \((-\infty, \infty)\) |
| seed_per_iteration | logical | FALSE | TRUE, FALSE | - |
| skip_drop | numeric | 0 | - | \([0, 1]\) |
| strict_shape | logical | FALSE | TRUE, FALSE | - |
| subsample | numeric | 1 | - | \([0, 1]\) |
| top_k | integer | 0 | - | \([0, \infty)\) |
| tree_method | character | auto | auto, exact, approx, hist, gpu_hist | - |
| tweedie_variance_power | numeric | 1.5 | - | \([1, 2]\) |
| updater | untyped | - | - | - |
| verbose | integer | 1 | - | \([0, 2]\) |
| watchlist | untyped | NULL | - | - |
| xgb_model | untyped | - | - | - |
| device | untyped | - | - | - |
Early stopping
Early stopping can be used to find the optimal number of boosting rounds.
The early_stopping_set parameter controls which set is used to monitor the
performance.
By default, early_stopping_set = "none" which disables early stopping.
Set early_stopping_set = "test" to monitor the performance of the model on
the test set while training.
The test set for early stopping can be set with the "test" row role in the
mlr3::Task.
Additionally, set early_stopping_rounds to the number of rounds within which the
performance must improve, and nrounds to the maximum number of boosting rounds.
While resampling, the test set is automatically applied from the mlr3::Resampling.
Note that using the test set for early stopping can potentially bias the
performance scores.
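A minimal sketch of early stopping via the validate field and the internal tuning bindings documented under Active bindings below (parameter and field availability may differ between versions of mlr3extralearners):
library(mlr3extralearners)
library(mlr3proba)
learner = lrn("surv.xgboost.cox",
  nrounds = 1000,             # maximum number of boosting rounds
  early_stopping_rounds = 10  # stop after 10 rounds without improvement
)
learner$validate = 0.2        # hold out 20% of the training data for validation
learner$train(tsk("grace"))
learner$internal_tuned_values # early stopped number of boosting rounds
learner$internal_valid_scores # last validation score (cox-nloglik)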
References
Chen, Tianqi, Guestrin, Carlos (2016). “XGBoost: A scalable tree boosting system.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794. ACM. doi:10.1145/2939672.2939785.
See also
- as.data.table(mlr_learners) for a table of available Learners in the running session (depending on the loaded packages).
- Chapter in the mlr3book: https://mlr3book.mlr-org.com/basics.html#learners
- mlr3learners for a selection of recommended learners.
- mlr3cluster for unsupervised clustering learners.
- mlr3pipelines to combine learners with pre- and postprocessing steps.
- mlr3tuning for tuning of hyperparameters, mlr3tuningspaces for established default tuning spaces.
Super classes
mlr3::Learner -> mlr3proba::LearnerSurv -> LearnerSurvXgboostCox
Active bindings
- internal_valid_scores: The last observation of the validation scores for all metrics. Extracted from model$evaluation_log.
- internal_tuned_values: Returns the early stopped iterations if early_stopping_rounds was set during training.
- validate: How to construct the internal validation data. This parameter can be either NULL, a ratio, "test", or "predefined".
- marshaled: (logical(1)) Whether the learner has been marshaled.
Methods
Method importance()
The importance scores are calculated with xgboost::xgb.importance().
Returns
Named numeric().
Method marshal()
Marshal the learner's model.
Arguments
... (any)
Additional arguments passed to mlr3::marshal_model().
Method unmarshal()
Unmarshal the learner's model.
Arguments
... (any)
Additional arguments passed to mlr3::unmarshal_model().
Examples
# Define the Learner
learner = lrn("surv.xgboost.cox")
print(learner)
#>
#> ── <LearnerSurvXgboostCox> (surv.xgboost.cox): Extreme Gradient Boosting Cox ───
#> • Model: -
#> • Parameters: nrounds=1000, nthread=1, verbose=0
#> • Validate: NULL
#> • Packages: mlr3, mlr3proba, mlr3extralearners, and xgboost
#> • Predict Types: [crank], distr, and lp
#> • Feature Types: integer and numeric
#> • Encapsulation: none (fallback: -)
#> • Properties: importance, internal_tuning, marshal, missings, validation, and
#> weights
#> • Other settings: use_weights = 'use'
# Define a Task
task = tsk("grace")
# Create train and test set
ids = partition(task)
# Train the learner on the training ids
learner$train(task, row_ids = ids$train)
print(learner$model)
#> $model
#> ##### xgb.Booster
#> raw: 882.4 Kb
#> call:
#> xgboost::xgb.train(data = data, nrounds = 1000L, verbose = 0L,
#> nthread = 1L, objective = "survival:cox", eval_metric = "cox-nloglik")
#> params (as set within xgb.train):
#> nthread = "1", objective = "survival:cox", eval_metric = "cox-nloglik", validate_parameters = "TRUE"
#> xgb.attributes:
#> niter
#> # of features: 6
#> niter: 1000
#> nfeatures : 6
#>
#> $train_data
#> xgb.DMatrix dim: 670 x 6 info: label colnames: yes
#>
#> attr(,"class")
#> [1] "xgboost_cox_model"
print(learner$importance())
#> revascdays los revasc age sysbp stchange
#> 0.63790737 0.15864047 0.10020505 0.04905887 0.04734136 0.00684688
# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)
# Score the predictions
predictions$score()
#> surv.cindex
#> 0.826735
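Beyond the concordance index, the distr prediction can be scored as well, for example with the Integrated Brier Score from mlr3proba (output omitted here):
# Score the distr prediction with the Integrated Brier Score
predictions$score(msr("surv.graf"))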