Boosted Generalized Linear Survival Learner
mlr_learners_surv.glmboost.Rd
Fits a generalized linear survival model using a boosting algorithm.
Calls mboost::glmboost()
from mboost.
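For orientation, a minimal sketch of the direct mboost call that this learner wraps, using the veteran dataset from the survival package (dataset and column names are illustrative assumptions, not part of this learner's interface):

library(mboost)
library(survival)

# boosted Cox model, analogous to the learner's default family = "coxph"
fit = glmboost(
  Surv(time, status) ~ age + karno + diagtime,
  data = veteran,
  family = CoxPH(),
  control = boost_control(mstop = 100, nu = 0.1)  # the learner's defaults
)
coef(fit)  # coefficients at the final boosting iteration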
Meta Information
Task type: “surv”
Predict Types: “crank”, “distr”, “lp”
Feature Types: “logical”, “integer”, “numeric”, “factor”
Required Packages: mlr3, mlr3proba, mlr3extralearners, mboost, pracma
Parameters
Id | Type | Default | Levels | Range
offset | numeric | - | - | \((-\infty, \infty)\)
family | character | coxph | coxph, weibull, loglog, lognormal, gehan, cindex, custom | -
custom.family | untyped | - | - | -
nuirange | untyped | c(0, 100) | - | -
center | logical | TRUE | TRUE, FALSE | -
mstop | integer | 100 | - | \([0, \infty)\)
nu | numeric | 0.1 | - | \([0, 1]\)
risk | character | inbag | inbag, oobag, none | -
oobweights | untyped | NULL | - | -
stopintern | logical | FALSE | TRUE, FALSE | -
trace | logical | FALSE | TRUE, FALSE | -
sigma | numeric | 0.1 | - | \([0, 1]\)
ipcw | untyped | 1 | - | -
na.action | untyped | stats::na.omit | - | -
contrasts.arg | untyped | - | - | -
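Hyperparameters can be set at construction or changed afterwards through the learner's parameter set. A minimal sketch (the values below are illustrative, not tuned):

library(mlr3)
library(mlr3proba)
library(mlr3extralearners)

# set hyperparameters at construction ...
learner = mlr3::lrn("surv.glmboost", family = "weibull", mstop = 200)

# ... or update them later via the parameter set
learner$param_set$set_values(nu = 0.05, trace = TRUE)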
Prediction types
This learner returns two to three prediction types:
lp
: a vector containing the linear predictors (relative risk scores), where each score corresponds to a specific test observation. Calculated using mboost::predict.glmboost(). If the family parameter is not "coxph", -lp is returned, since non-coxph families represent AFT-style distributions where lower lp values indicate higher risk.
crank
: same as lp.
distr
: a survival matrix in two dimensions, where observations are represented in rows and time points in columns. Calculated using mboost::survFit(). This prediction type is present only when the family parameter is "coxph" (the default). The Breslow estimator is used to compute the baseline hazard.
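A minimal sketch of accessing the individual prediction types, assuming the rats task shipped with mlr3proba (the survival() accessor comes from distr6):

library(mlr3)
library(mlr3proba)
library(mlr3extralearners)

task = mlr3::tsk("rats")
learner = mlr3::lrn("surv.glmboost")  # default family = "coxph", so distr is returned
learner$train(task)
p = learner$predict(task)

head(p$lp)     # linear predictors: higher values indicate higher risk under coxph
head(p$crank)  # identical to lp for this learner
p$distr$survival(c(50, 100))  # survival probabilities at two illustrative time points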
References
Bühlmann, Peter, Yu, Bin (2003). "Boosting with the L2 loss: regression and classification." Journal of the American Statistical Association, 98(462), 324–339.
See also
as.data.table(mlr_learners) for a table of available Learners in the running session (depending on the loaded packages).
Chapter in the mlr3book: https://mlr3book.mlr-org.com/basics.html#learners
mlr3learners for a selection of recommended learners.
mlr3cluster for unsupervised clustering learners.
mlr3pipelines to combine learners with pre- and postprocessing steps.
mlr3tuning for tuning of hyperparameters, mlr3tuningspaces for established default tuning spaces.
Super classes
mlr3::Learner
-> mlr3proba::LearnerSurv
-> LearnerSurvGLMBoost
Methods
Method importance()
Importance scores are extracted with the function mboost::varimp()
and
represent a feature's individual contribution to the risk reduction per
boosting step of the fitted model.
The higher the risk reduction, the larger the feature importance.
Note: Importance is supported only for datasets with numeric
features, as the presence of factors with multiple levels makes it
difficult to get the original feature names.
Returns
Named numeric().
Method selected_features()
Selected features are extracted with the function mboost::coef.glmboost(),
which by default returns the features with non-zero coefficients at the
final boosting iteration.
Note: Selected features can be retrieved only for datasets with
numeric
features, as the presence of factors with multiple levels makes
it difficult to get the original feature names.
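A minimal sketch of both methods, assuming the grace task from mlr3proba (all of its features are numeric, so importance() and selected_features() both apply):

library(mlr3)
library(mlr3proba)
library(mlr3extralearners)

task = mlr3::tsk("grace")
learner = mlr3::lrn("surv.glmboost")
learner$train(task)

learner$importance()         # risk reduction per feature
learner$selected_features()  # features with non-zero coefficients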
Examples
# Define the Learner
learner = mlr3::lrn("surv.glmboost")
print(learner)
#> <LearnerSurvGLMBoost:surv.glmboost>: Boosted Generalized Linear Model
#> * Model: -
#> * Parameters: family=coxph
#> * Packages: mlr3, mlr3proba, mlr3extralearners, mboost, pracma
#> * Predict Types: [crank], distr, lp
#> * Feature Types: logical, integer, numeric, factor
#> * Properties: importance, selected_features, weights
# Define a Task
task = mlr3::tsk("grace")
# Create train and test set
ids = mlr3::partition(task)
# Train the learner on the training ids
learner$train(task, row_ids = ids$train)
#> Warning: model with centered covariates does not contain intercept
print(learner$model)
#>
#> Generalized Linear Models Fitted via Gradient Boosting
#>
#> Call:
#> glmboost.matrix(x = as.matrix(task$data(cols = task$feature_names)), y = task$truth(), family = family)
#>
#>
#> Cox Partial Likelihood
#>
#> Loss function:
#>
#> Number of boosting iterations: mstop = 100
#> Step size: 0.1
#> Offset: 0
#>
#> Coefficients:
#> age revasc revascdays
#> 0.017849006 -0.944969504 -0.007633255
#> attr(,"offset")
#> [1] 0
#>
print(learner$importance())
#> revascdays revasc age los stchange sysbp
#> 0.06936821 0.05656239 0.03622781 0.00000000 0.00000000 0.00000000
# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)
# Score the predictions
predictions$score()
#> surv.cindex
#> 0.8048688
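The distr prediction can also be evaluated with a distribution-based measure; a hedged extension of the example above, assuming mlr3proba's Integrated Graf (Brier) score:

# score the distr prediction instead of the default concordance index
predictions$score(mlr3::msr("surv.graf"))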