Skip to contents

Gradient boosting algorithm. Calls lightgbm::lightgbm() from lightgbm. The list of parameters can be found here and in the documentation of lightgbm::lgb.train().

Dictionary

This Learner can be instantiated via lrn():

lrn("classif.lightgbm")

Meta Information

  • Task type: “classif”

  • Predict Types: “response”, “prob”

  • Feature Types: “logical”, “integer”, “numeric”, “factor”

  • Required Packages: mlr3, mlr3extralearners, lightgbm

Parameters

IdTypeDefaultLevelsRange
objectivecharacter-binary, multiclass, multiclassova-
evaluntyped--
verboseinteger1\((-\infty, \infty)\)
recordlogicalTRUETRUE, FALSE-
eval_freqinteger1\([1, \infty)\)
callbacksuntyped--
reset_datalogicalFALSETRUE, FALSE-
boostingcharactergbdtgbdt, rf, dart, goss-
linear_treelogicalFALSETRUE, FALSE-
learning_ratenumeric0.1\([0, \infty)\)
num_leavesinteger31\([1, 131072]\)
tree_learnercharacterserialserial, feature, data, voting-
num_threadsinteger0\([0, \infty)\)
device_typecharactercpucpu, gpu-
seedinteger-\((-\infty, \infty)\)
deterministiclogicalFALSETRUE, FALSE-
data_sample_strategycharacterbaggingbagging, goss-
force_col_wiselogicalFALSETRUE, FALSE-
force_row_wiselogicalFALSETRUE, FALSE-
histogram_pool_sizenumeric-1\((-\infty, \infty)\)
max_depthinteger-1\((-\infty, \infty)\)
min_data_in_leafinteger20\([0, \infty)\)
min_sum_hessian_in_leafnumeric0.001\([0, \infty)\)
bagging_fractionnumeric1\([0, 1]\)
pos_bagging_fractionnumeric1\([0, 1]\)
neg_bagging_fractionnumeric1\([0, 1]\)
bagging_freqinteger0\([0, \infty)\)
bagging_seedinteger3\((-\infty, \infty)\)
bagging_by_querylogicalFALSETRUE, FALSE-
feature_fractionnumeric1\([0, 1]\)
feature_fraction_bynodenumeric1\([0, 1]\)
feature_fraction_seedinteger2\((-\infty, \infty)\)
extra_treeslogicalFALSETRUE, FALSE-
extra_seedinteger6\((-\infty, \infty)\)
max_delta_stepnumeric0\((-\infty, \infty)\)
lambda_l1numeric0\([0, \infty)\)
lambda_l2numeric0\([0, \infty)\)
linear_lambdanumeric0\([0, \infty)\)
min_gain_to_splitnumeric0\([0, \infty)\)
drop_ratenumeric0.1\([0, 1]\)
max_dropinteger50\((-\infty, \infty)\)
skip_dropnumeric0.5\([0, 1]\)
xgboost_dart_modelogicalFALSETRUE, FALSE-
uniform_droplogicalFALSETRUE, FALSE-
drop_seedinteger4\((-\infty, \infty)\)
top_ratenumeric0.2\([0, 1]\)
other_ratenumeric0.1\([0, 1]\)
min_data_per_groupinteger100\([1, \infty)\)
max_cat_thresholdinteger32\([1, \infty)\)
cat_l2numeric10\([0, \infty)\)
cat_smoothnumeric10\([0, \infty)\)
max_cat_to_onehotinteger4\([1, \infty)\)
top_kinteger20\([1, \infty)\)
monotone_constraintsuntypedNULL-
monotone_constraints_methodcharacterbasicbasic, intermediate, advanced-
monotone_penaltynumeric0\([0, \infty)\)
feature_contriuntypedNULL-
forcedsplits_filenameuntyped""-
refit_decay_ratenumeric0.9\([0, 1]\)
cegb_tradeoffnumeric1\([0, \infty)\)
cegb_penalty_splitnumeric0\([0, \infty)\)
cegb_penalty_feature_lazyuntyped--
cegb_penalty_feature_coupleduntyped--
path_smoothnumeric0\([0, \infty)\)
interaction_constraintsuntyped--
use_quantized_gradlogicalTRUETRUE, FALSE-
num_grad_quant_binsinteger4\((-\infty, \infty)\)
quant_train_renew_leaflogicalFALSETRUE, FALSE-
stochastic_roundinglogicalTRUETRUE, FALSE-
serializablelogicalTRUETRUE, FALSE-
max_bininteger255\([2, \infty)\)
max_bin_by_featureuntypedNULL-
min_data_in_bininteger3\([1, \infty)\)
bin_construct_sample_cntinteger200000\([1, \infty)\)
data_random_seedinteger1\((-\infty, \infty)\)
is_enable_sparselogicalTRUETRUE, FALSE-
enable_bundlelogicalTRUETRUE, FALSE-
use_missinglogicalTRUETRUE, FALSE-
zero_as_missinglogicalFALSETRUE, FALSE-
feature_pre_filterlogicalTRUETRUE, FALSE-
pre_partitionlogicalFALSETRUE, FALSE-
two_roundlogicalFALSETRUE, FALSE-
forcedbins_filenameuntyped""-
is_unbalancelogicalFALSETRUE, FALSE-
scale_pos_weightnumeric1\([0, \infty)\)
sigmoidnumeric1\([0, \infty)\)
boost_from_averagelogicalTRUETRUE, FALSE-
eval_atuntyped1:5-
multi_error_top_kinteger1\([1, \infty)\)
auc_mu_weightsuntypedNULL-
num_machinesinteger1\([1, \infty)\)
local_listen_portinteger12400\([1, \infty)\)
time_outinteger120\([1, \infty)\)
machinesuntyped""-
gpu_platform_idinteger-1\((-\infty, \infty)\)
gpu_device_idinteger-1\((-\infty, \infty)\)
gpu_use_dplogicalFALSETRUE, FALSE-
num_gpuinteger1\([1, \infty)\)
start_iteration_predictinteger0\((-\infty, \infty)\)
num_iteration_predictinteger-1\((-\infty, \infty)\)
pred_early_stoplogicalFALSETRUE, FALSE-
pred_early_stop_freqinteger10\((-\infty, \infty)\)
pred_early_stop_marginnumeric10\((-\infty, \infty)\)
num_iterationsinteger100\([1, \infty)\)
early_stopping_roundsinteger-\([1, \infty)\)
early_stopping_min_deltanumeric-\([0, \infty)\)
first_metric_onlylogicalFALSETRUE, FALSE-

Initial parameter values

  • num_threads:

    • Actual default: 0L

    • Initial value: 1L

    • Reason for change: Prevents accidental conflicts with future.

  • verbose:

    • Actual default: 1L

    • Initial value: -1L

    • Reason for change: Prevents accidental conflicts with mlr messaging system.

  • objective:

    • Depends on the task: if binary classification, then this parameter is set to "binary", otherwise "multiclasss" and cannot be changed.

Custom mlr3 parameters

  • num_class: This parameter is automatically inferred for multiclass tasks and does not have to be set.

Early Stopping and Validation

Early stopping can be used to find the optimal number of boosting rounds. Set early_stopping_rounds to an integer value to monitor the performance of the model on the validation set while training. For information on how to configure the validation set, see the Validation section of mlr3::Learner. The internal validation measure can be set the eval parameter which should be a list of mlr3::Measures, functions, or strings for the internal lightgbm measures. If first_metric_only = FALSE (default), the learner stops when any metric fails to improve.

References

Ke, Guolin, Meng, Qi, Finley, Thomas, Wang, Taifeng, Chen, Wei, Ma, Weidong, Ye, Qiwei, Liu, Tie-Yan (2017). “Lightgbm: A highly efficient gradient boosting decision tree.” Advances in neural information processing systems, 30.

See also

Author

kapsner

Super classes

mlr3::Learner -> mlr3::LearnerClassif -> LearnerClassifLightGBM

Active bindings

internal_valid_scores

The last observation of the validation scores for all metrics. Extracted from model$evaluation_log

internal_tuned_values

Returns the early stopped iterations if early_stopping_rounds was set during training.

validate

How to construct the internal validation data. This parameter can be either NULL, a ratio, "test", or "predefined".

Methods

Inherited methods


Method new()

Creates a new instance of this R6 class.


Method importance()

The importance scores are extracted from lbg.importance.

Usage

LearnerClassifLightGBM$importance()

Returns

Named numeric().


Method clone()

The objects of this class are cloneable with this method.

Usage

LearnerClassifLightGBM$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

# Define the Learner
learner = mlr3::lrn("classif.lightgbm")
print(learner)
#> <LearnerClassifLightGBM:classif.lightgbm>: Gradient Boosting
#> * Model: -
#> * Parameters: verbose=-1, num_threads=1
#> * Validate: NULL
#> * Packages: mlr3, mlr3extralearners, lightgbm
#> * Predict Types:  response, [prob]
#> * Feature Types: logical, integer, numeric, factor
#> * Properties: hotstart_forward, importance, internal_tuning, missings,
#>   multiclass, twoclass, validation, weights

# Define a Task
task = mlr3::tsk("sonar")

# Create train and test set
ids = mlr3::partition(task)

# Train the learner on the training ids
learner$train(task, row_ids = ids$train)

print(learner$model)
#> LightGBM Model (100 trees)
#> Objective: binary
#> Fitted to dataset with 60 columns
print(learner$importance())
#>          V45          V11          V21          V12          V43          V27 
#> 9.030398e-02 7.806441e-02 7.202907e-02 6.938734e-02 6.815579e-02 6.165502e-02 
#>          V49          V36          V37          V58          V16          V20 
#> 5.983758e-02 4.672284e-02 4.063901e-02 4.018031e-02 3.773376e-02 3.592796e-02 
#>          V48           V4          V46          V51           V9          V23 
#> 2.896876e-02 2.474185e-02 2.282995e-02 2.127623e-02 2.018166e-02 1.963736e-02 
#>          V60          V50          V28          V31          V59          V29 
#> 1.923485e-02 1.392519e-02 1.343221e-02 1.189798e-02 1.184072e-02 9.084860e-03 
#>          V38          V34          V39          V57           V1          V55 
#> 8.589386e-03 8.172659e-03 6.846225e-03 6.816742e-03 5.308793e-03 4.503854e-03 
#>          V10          V42           V6          V18          V17          V33 
#> 4.396706e-03 4.024956e-03 3.484041e-03 3.471210e-03 3.437801e-03 2.923449e-03 
#>          V44          V26          V53          V40          V24          V13 
#> 2.727881e-03 2.480190e-03 2.152688e-03 2.118121e-03 1.881159e-03 1.597926e-03 
#>           V5          V35          V30          V47          V14           V2 
#> 1.419532e-03 1.392442e-03 1.275271e-03 1.075330e-03 6.686351e-04 5.917142e-04 
#>           V8          V19          V54          V15           V3 
#> 2.799310e-04 2.577071e-04 1.759263e-04 1.436892e-04 9.734881e-05 

# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)

# Score the predictions
predictions$score()
#> classif.ce 
#> 0.08695652