Gradient boosting algorithm. Calls lightgbm::lightgbm() from lightgbm. The list of parameters can be found in the LightGBM parameter documentation and in the documentation of lightgbm::lgb.train().

Dictionary

This Learner can be instantiated via lrn():

lrn("classif.lightgbm")

Meta Information

  • Task type: “classif”

  • Predict Types: “response”, “prob”

  • Feature Types: “logical”, “integer”, “numeric”, “factor”

  • Required Packages: mlr3, mlr3extralearners, lightgbm
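As a brief illustration of the predict types listed above, the following sketch requests class probabilities explicitly. It assumes that mlr3, mlr3extralearners and lightgbm are installed; the sonar task is used only because it also appears in the Examples on this page.

```r
library(mlr3)
library(mlr3extralearners)

# Request class probabilities explicitly
learner = lrn("classif.lightgbm", predict_type = "prob")

# Small binary classification task shipped with mlr3
task = tsk("sonar")
ids = partition(task)

learner$train(task, row_ids = ids$train)
pred = learner$predict(task, row_ids = ids$test)

# Matrix with one probability column per class
head(pred$prob)
```

With predict_type = "prob" the prediction object still carries a response column, derived from the class with the highest probability.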

Parameters

Id | Type | Default | Levels | Range
objective | character | - | binary, multiclass, multiclassova | -
eval | untyped | - | - | -
verbose | integer | 1 | - | \((-\infty, \infty)\)
record | logical | TRUE | TRUE, FALSE | -
eval_freq | integer | 1 | - | \([1, \infty)\)
callbacks | untyped | - | - | -
reset_data | logical | FALSE | TRUE, FALSE | -
boosting | character | gbdt | gbdt, rf, dart, goss | -
linear_tree | logical | FALSE | TRUE, FALSE | -
learning_rate | numeric | 0.1 | - | \([0, \infty)\)
num_leaves | integer | 31 | - | \([1, 131072]\)
tree_learner | character | serial | serial, feature, data, voting | -
num_threads | integer | 0 | - | \([0, \infty)\)
device_type | character | cpu | cpu, gpu | -
seed | integer | - | - | \((-\infty, \infty)\)
deterministic | logical | FALSE | TRUE, FALSE | -
data_sample_strategy | character | bagging | bagging, goss | -
force_col_wise | logical | FALSE | TRUE, FALSE | -
force_row_wise | logical | FALSE | TRUE, FALSE | -
histogram_pool_size | numeric | -1 | - | \((-\infty, \infty)\)
max_depth | integer | -1 | - | \((-\infty, \infty)\)
min_data_in_leaf | integer | 20 | - | \([0, \infty)\)
min_sum_hessian_in_leaf | numeric | 0.001 | - | \([0, \infty)\)
bagging_fraction | numeric | 1 | - | \([0, 1]\)
pos_bagging_fraction | numeric | 1 | - | \([0, 1]\)
neg_bagging_fraction | numeric | 1 | - | \([0, 1]\)
bagging_freq | integer | 0 | - | \([0, \infty)\)
bagging_seed | integer | 3 | - | \((-\infty, \infty)\)
bagging_by_query | logical | FALSE | TRUE, FALSE | -
feature_fraction | numeric | 1 | - | \([0, 1]\)
feature_fraction_bynode | numeric | 1 | - | \([0, 1]\)
feature_fraction_seed | integer | 2 | - | \((-\infty, \infty)\)
extra_trees | logical | FALSE | TRUE, FALSE | -
extra_seed | integer | 6 | - | \((-\infty, \infty)\)
max_delta_step | numeric | 0 | - | \((-\infty, \infty)\)
lambda_l1 | numeric | 0 | - | \([0, \infty)\)
lambda_l2 | numeric | 0 | - | \([0, \infty)\)
linear_lambda | numeric | 0 | - | \([0, \infty)\)
min_gain_to_split | numeric | 0 | - | \([0, \infty)\)
drop_rate | numeric | 0.1 | - | \([0, 1]\)
max_drop | integer | 50 | - | \((-\infty, \infty)\)
skip_drop | numeric | 0.5 | - | \([0, 1]\)
xgboost_dart_mode | logical | FALSE | TRUE, FALSE | -
uniform_drop | logical | FALSE | TRUE, FALSE | -
drop_seed | integer | 4 | - | \((-\infty, \infty)\)
top_rate | numeric | 0.2 | - | \([0, 1]\)
other_rate | numeric | 0.1 | - | \([0, 1]\)
min_data_per_group | integer | 100 | - | \([1, \infty)\)
max_cat_threshold | integer | 32 | - | \([1, \infty)\)
cat_l2 | numeric | 10 | - | \([0, \infty)\)
cat_smooth | numeric | 10 | - | \([0, \infty)\)
max_cat_to_onehot | integer | 4 | - | \([1, \infty)\)
top_k | integer | 20 | - | \([1, \infty)\)
monotone_constraints | untyped | NULL | - | -
monotone_constraints_method | character | basic | basic, intermediate, advanced | -
monotone_penalty | numeric | 0 | - | \([0, \infty)\)
feature_contri | untyped | NULL | - | -
forcedsplits_filename | untyped | "" | - | -
refit_decay_rate | numeric | 0.9 | - | \([0, 1]\)
cegb_tradeoff | numeric | 1 | - | \([0, \infty)\)
cegb_penalty_split | numeric | 0 | - | \([0, \infty)\)
cegb_penalty_feature_lazy | untyped | - | - | -
cegb_penalty_feature_coupled | untyped | - | - | -
path_smooth | numeric | 0 | - | \([0, \infty)\)
interaction_constraints | untyped | - | - | -
use_quantized_grad | logical | TRUE | TRUE, FALSE | -
num_grad_quant_bins | integer | 4 | - | \((-\infty, \infty)\)
quant_train_renew_leaf | logical | FALSE | TRUE, FALSE | -
stochastic_rounding | logical | TRUE | TRUE, FALSE | -
serializable | logical | TRUE | TRUE, FALSE | -
max_bin | integer | 255 | - | \([2, \infty)\)
max_bin_by_feature | untyped | NULL | - | -
min_data_in_bin | integer | 3 | - | \([1, \infty)\)
bin_construct_sample_cnt | integer | 200000 | - | \([1, \infty)\)
data_random_seed | integer | 1 | - | \((-\infty, \infty)\)
is_enable_sparse | logical | TRUE | TRUE, FALSE | -
enable_bundle | logical | TRUE | TRUE, FALSE | -
use_missing | logical | TRUE | TRUE, FALSE | -
zero_as_missing | logical | FALSE | TRUE, FALSE | -
feature_pre_filter | logical | TRUE | TRUE, FALSE | -
pre_partition | logical | FALSE | TRUE, FALSE | -
two_round | logical | FALSE | TRUE, FALSE | -
forcedbins_filename | untyped | "" | - | -
is_unbalance | logical | FALSE | TRUE, FALSE | -
scale_pos_weight | numeric | 1 | - | \([0, \infty)\)
sigmoid | numeric | 1 | - | \([0, \infty)\)
boost_from_average | logical | TRUE | TRUE, FALSE | -
eval_at | untyped | 1:5 | - | -
multi_error_top_k | integer | 1 | - | \([1, \infty)\)
auc_mu_weights | untyped | NULL | - | -
num_machines | integer | 1 | - | \([1, \infty)\)
local_listen_port | integer | 12400 | - | \([1, \infty)\)
time_out | integer | 120 | - | \([1, \infty)\)
machines | untyped | "" | - | -
gpu_platform_id | integer | -1 | - | \((-\infty, \infty)\)
gpu_device_id | integer | -1 | - | \((-\infty, \infty)\)
gpu_use_dp | logical | FALSE | TRUE, FALSE | -
num_gpu | integer | 1 | - | \([1, \infty)\)
start_iteration_predict | integer | 0 | - | \((-\infty, \infty)\)
num_iteration_predict | integer | -1 | - | \((-\infty, \infty)\)
pred_early_stop | logical | FALSE | TRUE, FALSE | -
pred_early_stop_freq | integer | 10 | - | \((-\infty, \infty)\)
pred_early_stop_margin | numeric | 10 | - | \((-\infty, \infty)\)
num_iterations | integer | 100 | - | \([1, \infty)\)
early_stopping_rounds | integer | - | - | \([1, \infty)\)
early_stopping_min_delta | numeric | - | - | \([0, \infty)\)
first_metric_only | logical | FALSE | TRUE, FALSE | -

Initial parameter values

  • num_threads:

    • Actual default: 0L

    • Initial value: 1L

    • Reason for change: Prevents accidental conflicts with future.

  • verbose:

    • Actual default: 1L

    • Initial value: -1L

    • Reason for change: Prevents accidental conflicts with mlr messaging system.

  • objective:

    • Depends on the task: if binary classification, then this parameter is set to "binary", otherwise to "multiclass", and it cannot be changed.
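The initial values listed above can be inspected and overridden through the learner's parameter set. The following is a minimal sketch, assuming the packages are installed; the chosen values (two threads, verbosity 1) are arbitrary and serve only as an illustration.

```r
library(mlr3)
library(mlr3extralearners)

learner = lrn("classif.lightgbm")

# Initial values set by the learner at construction
learner$param_set$values$num_threads
learner$param_set$values$verbose

# Override them explicitly (illustrative values only)
learner$param_set$set_values(num_threads = 2L, verbose = 1L)
```

Raising num_threads may interact with parallelization through future, which is the reason the learner starts with a single thread.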

Custom mlr3 parameters

  • num_class: This parameter is automatically inferred for multiclass tasks and does not have to be set.

Early Stopping and Validation

Early stopping can be used to find the optimal number of boosting rounds. Set early_stopping_rounds to an integer value to monitor the performance of the model on the validation set while training. For information on how to configure the validation set, see the Validation section of mlr3::Learner. The internal validation measure can be set via the eval parameter, which should be a list of mlr3::Measures, functions, or strings for the internal lightgbm measures. If first_metric_only = FALSE (default), the learner stops when any metric fails to improve.
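The configuration described above might look as follows. This is a sketch rather than a recommended setup: the validation ratio of 0.2, the limit of 200 iterations and the patience of 10 rounds are arbitrary choices, and eval is left unset, which the sketch assumes makes lightgbm fall back to its default metric for the objective.

```r
library(mlr3)
library(mlr3extralearners)

task = tsk("sonar")

learner = lrn("classif.lightgbm",
  num_iterations = 200L,       # upper bound on boosting rounds
  early_stopping_rounds = 10L  # stop after 10 rounds without improvement
)

# Hold out 20% of the training data as the internal validation set
learner$validate = 0.2

learner$train(task)

# Validation scores at the last recorded iteration
learner$internal_valid_scores

# Number of boosting rounds selected by early stopping
learner$internal_tuned_values
```

Both active bindings are described in the Active bindings section; they are only populated after training with a validation set.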

References

Ke, Guolin, Meng, Qi, Finley, Thomas, Wang, Taifeng, Chen, Wei, Ma, Weidong, Ye, Qiwei, Liu, Tie-Yan (2017). “Lightgbm: A highly efficient gradient boosting decision tree.” Advances in neural information processing systems, 30.

Author

kapsner

Super classes

mlr3::Learner -> mlr3::LearnerClassif -> LearnerClassifLightGBM

Active bindings

internal_valid_scores

The last observation of the validation scores for all metrics, extracted from model$evaluation_log.

internal_tuned_values

Returns the number of iterations selected by early stopping if early_stopping_rounds was set during training.

validate

How to construct the internal validation data. This parameter can be either NULL, a ratio, "test", or "predefined".

Methods


Method new()

Creates a new instance of this R6 class.


Method importance()

The importance scores are extracted from lgb.importance.

Usage

LearnerClassifLightGBM$importance()

Returns

Named numeric().


Method clone()

The objects of this class are cloneable with this method.

Usage

LearnerClassifLightGBM$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

# Define the Learner
learner = lrn("classif.lightgbm")
print(learner)
#> 
#> ── <LearnerClassifLightGBM> (classif.lightgbm): Gradient Boosting ──────────────
#> • Model: -
#> • Parameters: verbose=-1, num_threads=1
#> • Validate: NULL
#> • Packages: mlr3, mlr3extralearners, and lightgbm
#> • Predict Types: response and [prob]
#> • Feature Types: logical, integer, numeric, and factor
#> • Encapsulation: none (fallback: -)
#> • Properties: hotstart_forward, importance, internal_tuning, missings,
#> multiclass, twoclass, validation, and weights
#> • Other settings: use_weights = 'use'

# Define a Task
task = tsk("sonar")

# Create train and test set
ids = partition(task)

# Train the learner on the training ids
learner$train(task, row_ids = ids$train)

print(learner$model)
#> LightGBM Model (100 trees)
#> Objective: binary
#> Fitted to dataset with 60 columns
print(learner$importance())
#>          V12          V27          V51          V37          V31           V4 
#> 1.182117e-01 7.812965e-02 7.490624e-02 7.268383e-02 6.631697e-02 6.063617e-02 
#>          V11          V45          V36          V23           V9          V28 
#> 5.944957e-02 5.347561e-02 4.790533e-02 4.054334e-02 3.656935e-02 3.604497e-02 
#>          V39          V49           V5          V20          V16          V48 
#> 2.846182e-02 2.242024e-02 2.173387e-02 1.658120e-02 1.553595e-02 1.382644e-02 
#>          V52          V59          V18          V21          V26          V10 
#> 1.377641e-02 1.302099e-02 1.234837e-02 1.149071e-02 1.010776e-02 9.998819e-03 
#>          V25          V43          V42          V40          V41          V32 
#> 8.850534e-03 7.810513e-03 7.606878e-03 5.168267e-03 4.372790e-03 4.108626e-03 
#>          V15          V29          V17          V58          V34          V55 
#> 4.010461e-03 3.995902e-03 3.305969e-03 2.665297e-03 2.451488e-03 1.779241e-03 
#>          V19          V24          V13          V57          V47          V38 
#> 1.721134e-03 1.475334e-03 1.136977e-03 8.909032e-04 7.563557e-04 7.527825e-04 
#>          V60          V33          V14          V44          V46          V54 
#> 7.332086e-04 5.962927e-04 5.128650e-04 4.484831e-04 2.398830e-04 2.364535e-04 
#>          V22          V56 
#> 1.620055e-04 3.607685e-05 

# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)

# Score the predictions
predictions$score()
#> classif.ce 
#>  0.1449275