Extreme Gradient Boosting Survival Learner
mlr_learners_surv.xgboost.Rd
eXtreme Gradient Boosting survival model.
Calls xgboost::xgb.train() from package xgboost.
Note: We strongly advise using the separate Cox and AFT xgboost survival learners, since they represent two very distinct survival modeling methods and offer more prediction types than this learner. This learner will be deprecated in the future.
Note
To compute on GPUs, you first need to compile xgboost yourself and link against CUDA. See https://xgboost.readthedocs.io/en/stable/build.html#building-with-gpu-support.
Initial parameter values
nrounds is initialized to 1.
nthread is initialized to 1 to avoid conflicts with parallelization via future.
verbose is initialized to 0.
objective is initialized to survival:cox for survival analysis.
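These values can be overridden at construction, for example (a minimal sketch; the chosen values are illustrative, the parameter names are listed in the table below):
learner = mlr3::lrn("surv.xgboost",
  nrounds = 50,                     # more boosting rounds than the initialized value of 1
  objective = "survival:aft",       # switch from the Cox to the AFT objective
  aft_loss_distribution = "normal"  # AFT-specific loss distribution
)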
Dictionary
This Learner can be instantiated via the dictionary mlr_learners or with the associated sugar function lrn()
:
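lrn("surv.xgboost")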
Meta Information
Task type: “surv”
Predict Types: “crank”, “lp”
Feature Types: “integer”, “numeric”
Required Packages: mlr3, mlr3proba, mlr3extralearners, xgboost
Parameters
Id | Type | Default | Levels | Range |
aft_loss_distribution | character | normal | normal, logistic, extreme | - |
aft_loss_distribution_scale | numeric | - | - | \((-\infty, \infty)\) |
alpha | numeric | 0 | - | \([0, \infty)\) |
base_score | numeric | 0.5 | - | \((-\infty, \infty)\) |
booster | character | gbtree | gbtree, gblinear, dart | - |
callbacks | untyped | list() | - | - |
colsample_bylevel | numeric | 1 | - | \([0, 1]\) |
colsample_bynode | numeric | 1 | - | \([0, 1]\) |
colsample_bytree | numeric | 1 | - | \([0, 1]\) |
disable_default_eval_metric | logical | FALSE | TRUE, FALSE | - |
early_stopping_rounds | integer | NULL | - | \([1, \infty)\) |
early_stopping_set | character | none | none, train, test | - |
eta | numeric | 0.3 | - | \([0, 1]\) |
feature_selector | character | cyclic | cyclic, shuffle, random, greedy, thrifty | - |
feval | untyped | NULL | - | - |
gamma | numeric | 0 | - | \([0, \infty)\) |
grow_policy | character | depthwise | depthwise, lossguide | - |
interaction_constraints | untyped | - | - | - |
iterationrange | untyped | - | - | - |
lambda | numeric | 1 | - | \([0, \infty)\) |
lambda_bias | numeric | 0 | - | \([0, \infty)\) |
max_bin | integer | 256 | - | \([2, \infty)\) |
max_delta_step | numeric | 0 | - | \([0, \infty)\) |
max_depth | integer | 6 | - | \([0, \infty)\) |
max_leaves | integer | 0 | - | \([0, \infty)\) |
maximize | logical | NULL | TRUE, FALSE | - |
min_child_weight | numeric | 1 | - | \([0, \infty)\) |
missing | numeric | NA | - | \((-\infty, \infty)\) |
monotone_constraints | integer | 0 | - | \([-1, 1]\) |
normalize_type | character | tree | tree, forest | - |
nrounds | integer | - | - | \([1, \infty)\) |
nthread | integer | 1 | - | \([1, \infty)\) |
num_parallel_tree | integer | 1 | - | \([1, \infty)\) |
objective | character | survival:cox | survival:cox, survival:aft | - |
one_drop | logical | FALSE | TRUE, FALSE | - |
print_every_n | integer | 1 | - | \([1, \infty)\) |
process_type | character | default | default, update | - |
rate_drop | numeric | 0 | - | \([0, 1]\) |
refresh_leaf | logical | TRUE | TRUE, FALSE | - |
sampling_method | character | uniform | uniform, gradient_based | - |
sample_type | character | uniform | uniform, weighted | - |
save_name | untyped | - | - | - |
save_period | integer | - | - | \([0, \infty)\) |
scale_pos_weight | numeric | 1 | - | \((-\infty, \infty)\) |
seed_per_iteration | logical | FALSE | TRUE, FALSE | - |
skip_drop | numeric | 0 | - | \([0, 1]\) |
strict_shape | logical | FALSE | TRUE, FALSE | - |
subsample | numeric | 1 | - | \([0, 1]\) |
top_k | integer | 0 | - | \([0, \infty)\) |
tree_method | character | auto | auto, exact, approx, hist, gpu_hist | - |
tweedie_variance_power | numeric | 1.5 | - | \([1, 2]\) |
updater | untyped | - | - | - |
verbose | integer | 1 | - | \([0, 2]\) |
watchlist | untyped | NULL | - | - |
xgb_model | untyped | - | - | - |
device | untyped | - | - | - |
Early stopping
Early stopping can be used to find the optimal number of boosting rounds.
The early_stopping_set parameter controls which set is used to monitor the performance.
By default, early_stopping_set = "none", which disables early stopping.
Set early_stopping_set = "test" to monitor the performance of the model on the test set while training.
The test set for early stopping can be set with the "test" row role in the mlr3::Task.
Additionally, set the number of rounds after which training stops if the performance does not improve with early_stopping_rounds, and the maximum number of boosting rounds with nrounds.
While resampling, the test set is automatically taken from the mlr3::Resampling.
Note that using the test set for early stopping can potentially bias the performance scores.
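A minimal sketch of early stopping on a held-out test set (the task, the split, and the availability of the "test" row role in the installed mlr3 version are assumptions; the parameter names are from the table above):
task = mlr3::tsk("rats")$select(c("litter", "rx"))  # integer features only; "sex" is a factor and unsupported
split = mlr3::partition(task, ratio = 0.8)          # assumption: illustrative 80/20 split
task$set_row_roles(split$test, roles = "test")      # mark held-out rows with the "test" row role
learner = mlr3::lrn("surv.xgboost",
  nrounds = 100,                # upper bound on boosting rounds
  early_stopping_set = "test",  # monitor performance on the "test" rows
  early_stopping_rounds = 10    # stop if no improvement for 10 rounds
)
learner$train(task)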
References
Chen, Tianqi, Guestrin, Carlos (2016). “XGBoost: A Scalable Tree Boosting System.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785--794. ACM. doi:10.1145/2939672.2939785.
See also
as.data.table(mlr_learners) for a table of available Learners in the running session (depending on the loaded packages).
Chapter in the mlr3book: https://mlr3book.mlr-org.com/basics.html#learners
mlr3learners for a selection of recommended learners.
mlr3cluster for unsupervised clustering learners.
mlr3pipelines to combine learners with pre- and postprocessing steps.
mlr3tuning for tuning of hyperparameters, mlr3tuningspaces for established default tuning spaces.
Super classes
mlr3::Learner
-> mlr3proba::LearnerSurv
-> LearnerSurvXgboost
Methods
Method importance()
The importance scores are calculated with xgboost::xgb.importance()
.
Returns
Named numeric()
.
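For example (a minimal sketch; the task choice and nrounds value are illustrative):
task = mlr3::tsk("rats")$select(c("litter", "rx"))  # keep integer features only; factors are not supported
learner = mlr3::lrn("surv.xgboost", nrounds = 10)
learner$train(task)
learner$importance()  # named numeric vector of feature importance scores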
Examples
learner = mlr3::lrn("surv.xgboost")
#> Warning: 'surv.xgboost' will be deprecated in the future. Use 'surv.xgboost.cox' or 'surv.xgboost.aft' learners instead.
print(learner)
#> <LearnerSurvXgboost:surv.xgboost>: Gradient Boosting
#> * Model: -
#> * Parameters: nrounds=1, nthread=1, verbose=0, early_stopping_set=none
#> * Packages: mlr3, mlr3proba, mlr3extralearners, xgboost
#> * Predict Types: [crank], lp
#> * Feature Types: integer, numeric
#> * Properties: importance, missings, weights
# available parameters:
learner$param_set$ids()
#> [1] "aft_loss_distribution" "aft_loss_distribution_scale"
#> [3] "alpha" "base_score"
#> [5] "booster" "callbacks"
#> [7] "colsample_bylevel" "colsample_bynode"
#> [9] "colsample_bytree" "disable_default_eval_metric"
#> [11] "early_stopping_rounds" "early_stopping_set"
#> [13] "eta" "feature_selector"
#> [15] "feval" "gamma"
#> [17] "grow_policy" "interaction_constraints"
#> [19] "iterationrange" "lambda"
#> [21] "lambda_bias" "max_bin"
#> [23] "max_delta_step" "max_depth"
#> [25] "max_leaves" "maximize"
#> [27] "min_child_weight" "missing"
#> [29] "monotone_constraints" "normalize_type"
#> [31] "nrounds" "nthread"
#> [33] "num_parallel_tree" "objective"
#> [35] "one_drop" "print_every_n"
#> [37] "process_type" "rate_drop"
#> [39] "refresh_leaf" "sampling_method"
#> [41] "sample_type" "save_name"
#> [43] "save_period" "scale_pos_weight"
#> [45] "seed_per_iteration" "skip_drop"
#> [47] "strict_shape" "subsample"
#> [49] "top_k" "tree_method"
#> [51] "tweedie_variance_power" "updater"
#> [53] "verbose" "watchlist"
#> [55] "xgb_model" "device"
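A train/predict continuation of the example above (a minimal sketch; the task choice is illustrative):
# keep only integer features, since this learner does not support factors
task = mlr3::tsk("rats")$select(c("litter", "rx"))
learner$train(task)
prediction = learner$predict(task)
print(prediction)  # contains the "crank" and "lp" prediction types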