Gradient boosting algorithm. Calls lightgbm::lightgbm() from lightgbm. The full list of parameters can be found in the LightGBM parameter documentation and in the documentation of lightgbm::lgb.train().

Dictionary

This Learner can be instantiated via lrn():

lrn("classif.lightgbm")

Meta Information

  • Task type: “classif”

  • Predict Types: “response”, “prob”

  • Feature Types: “logical”, “integer”, “numeric”, “factor”

  • Required Packages: mlr3, mlr3extralearners, lightgbm
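
As a minimal sketch of how the meta information above appears on the learner object (assuming mlr3, mlr3extralearners, and lightgbm are installed), the predict type can be chosen at construction and the supported types read from the learner's fields:

```r
library(mlr3)
library(mlr3extralearners)

# Request class probabilities instead of the default "response"
learner = lrn("classif.lightgbm", predict_type = "prob")

# Inspect the meta information stored on the learner
learner$task_type
learner$predict_types
learner$feature_types
learner$packages
```

Setting predict_type = "prob" is only needed when probability-based measures are to be computed on the predictions.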

Parameters

Id | Type | Default | Levels | Range
objective | character | - | binary, multiclass, multiclassova | -
eval | untyped | - |  | -
verbose | integer | 1 |  | \((-\infty, \infty)\)
record | logical | TRUE | TRUE, FALSE | -
eval_freq | integer | 1 |  | \([1, \infty)\)
callbacks | untyped | - |  | -
reset_data | logical | FALSE | TRUE, FALSE | -
boosting | character | gbdt | gbdt, rf, dart, goss | -
linear_tree | logical | FALSE | TRUE, FALSE | -
learning_rate | numeric | 0.1 |  | \([0, \infty)\)
num_leaves | integer | 31 |  | \([1, 131072]\)
tree_learner | character | serial | serial, feature, data, voting | -
num_threads | integer | 0 |  | \([0, \infty)\)
device_type | character | cpu | cpu, gpu | -
seed | integer | - |  | \((-\infty, \infty)\)
deterministic | logical | FALSE | TRUE, FALSE | -
data_sample_strategy | character | bagging | bagging, goss | -
force_col_wise | logical | FALSE | TRUE, FALSE | -
force_row_wise | logical | FALSE | TRUE, FALSE | -
histogram_pool_size | numeric | -1 |  | \((-\infty, \infty)\)
max_depth | integer | -1 |  | \((-\infty, \infty)\)
min_data_in_leaf | integer | 20 |  | \([0, \infty)\)
min_sum_hessian_in_leaf | numeric | 0.001 |  | \([0, \infty)\)
bagging_fraction | numeric | 1 |  | \([0, 1]\)
pos_bagging_fraction | numeric | 1 |  | \([0, 1]\)
neg_bagging_fraction | numeric | 1 |  | \([0, 1]\)
bagging_freq | integer | 0 |  | \([0, \infty)\)
bagging_seed | integer | 3 |  | \((-\infty, \infty)\)
bagging_by_query | logical | FALSE | TRUE, FALSE | -
feature_fraction | numeric | 1 |  | \([0, 1]\)
feature_fraction_bynode | numeric | 1 |  | \([0, 1]\)
feature_fraction_seed | integer | 2 |  | \((-\infty, \infty)\)
extra_trees | logical | FALSE | TRUE, FALSE | -
extra_seed | integer | 6 |  | \((-\infty, \infty)\)
max_delta_step | numeric | 0 |  | \((-\infty, \infty)\)
lambda_l1 | numeric | 0 |  | \([0, \infty)\)
lambda_l2 | numeric | 0 |  | \([0, \infty)\)
linear_lambda | numeric | 0 |  | \([0, \infty)\)
min_gain_to_split | numeric | 0 |  | \([0, \infty)\)
drop_rate | numeric | 0.1 |  | \([0, 1]\)
max_drop | integer | 50 |  | \((-\infty, \infty)\)
skip_drop | numeric | 0.5 |  | \([0, 1]\)
xgboost_dart_mode | logical | FALSE | TRUE, FALSE | -
uniform_drop | logical | FALSE | TRUE, FALSE | -
drop_seed | integer | 4 |  | \((-\infty, \infty)\)
top_rate | numeric | 0.2 |  | \([0, 1]\)
other_rate | numeric | 0.1 |  | \([0, 1]\)
min_data_per_group | integer | 100 |  | \([1, \infty)\)
max_cat_threshold | integer | 32 |  | \([1, \infty)\)
cat_l2 | numeric | 10 |  | \([0, \infty)\)
cat_smooth | numeric | 10 |  | \([0, \infty)\)
max_cat_to_onehot | integer | 4 |  | \([1, \infty)\)
top_k | integer | 20 |  | \([1, \infty)\)
monotone_constraints | untyped | NULL |  | -
monotone_constraints_method | character | basic | basic, intermediate, advanced | -
monotone_penalty | numeric | 0 |  | \([0, \infty)\)
feature_contri | untyped | NULL |  | -
forcedsplits_filename | untyped | "" |  | -
refit_decay_rate | numeric | 0.9 |  | \([0, 1]\)
cegb_tradeoff | numeric | 1 |  | \([0, \infty)\)
cegb_penalty_split | numeric | 0 |  | \([0, \infty)\)
cegb_penalty_feature_lazy | untyped | - |  | -
cegb_penalty_feature_coupled | untyped | - |  | -
path_smooth | numeric | 0 |  | \([0, \infty)\)
interaction_constraints | untyped | - |  | -
use_quantized_grad | logical | TRUE | TRUE, FALSE | -
num_grad_quant_bins | integer | 4 |  | \((-\infty, \infty)\)
quant_train_renew_leaf | logical | FALSE | TRUE, FALSE | -
stochastic_rounding | logical | TRUE | TRUE, FALSE | -
serializable | logical | TRUE | TRUE, FALSE | -
max_bin | integer | 255 |  | \([2, \infty)\)
max_bin_by_feature | untyped | NULL |  | -
min_data_in_bin | integer | 3 |  | \([1, \infty)\)
bin_construct_sample_cnt | integer | 200000 |  | \([1, \infty)\)
data_random_seed | integer | 1 |  | \((-\infty, \infty)\)
is_enable_sparse | logical | TRUE | TRUE, FALSE | -
enable_bundle | logical | TRUE | TRUE, FALSE | -
use_missing | logical | TRUE | TRUE, FALSE | -
zero_as_missing | logical | FALSE | TRUE, FALSE | -
feature_pre_filter | logical | TRUE | TRUE, FALSE | -
pre_partition | logical | FALSE | TRUE, FALSE | -
two_round | logical | FALSE | TRUE, FALSE | -
forcedbins_filename | untyped | "" |  | -
is_unbalance | logical | FALSE | TRUE, FALSE | -
scale_pos_weight | numeric | 1 |  | \([0, \infty)\)
sigmoid | numeric | 1 |  | \([0, \infty)\)
boost_from_average | logical | TRUE | TRUE, FALSE | -
eval_at | untyped | 1:5 |  | -
multi_error_top_k | integer | 1 |  | \([1, \infty)\)
auc_mu_weights | untyped | NULL |  | -
num_machines | integer | 1 |  | \([1, \infty)\)
local_listen_port | integer | 12400 |  | \([1, \infty)\)
time_out | integer | 120 |  | \([1, \infty)\)
machines | untyped | "" |  | -
gpu_platform_id | integer | -1 |  | \((-\infty, \infty)\)
gpu_device_id | integer | -1 |  | \((-\infty, \infty)\)
gpu_use_dp | logical | FALSE | TRUE, FALSE | -
num_gpu | integer | 1 |  | \([1, \infty)\)
start_iteration_predict | integer | 0 |  | \((-\infty, \infty)\)
num_iteration_predict | integer | -1 |  | \((-\infty, \infty)\)
pred_early_stop | logical | FALSE | TRUE, FALSE | -
pred_early_stop_freq | integer | 10 |  | \((-\infty, \infty)\)
pred_early_stop_margin | numeric | 10 |  | \((-\infty, \infty)\)
num_iterations | integer | 100 |  | \([1, \infty)\)
early_stopping_rounds | integer | - |  | \([1, \infty)\)
early_stopping_min_delta | numeric | - |  | \([0, \infty)\)
first_metric_only | logical | FALSE | TRUE, FALSE | -

Initial parameter values

  • num_threads:

    • Actual default: 0L

    • Initial value: 1L

    • Reason for change: Prevents accidental conflicts with future.

  • verbose:

    • Actual default: 1L

    • Initial value: -1L

    • Reason for change: Prevents accidental conflicts with mlr messaging system.

  • objective:

    • Depends on the task: for binary classification this parameter is set to "binary", otherwise to "multiclass", and it cannot be changed.
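
The initial values listed above can be inspected and overridden through the learner's parameter set. The following is a minimal sketch, assuming mlr3, mlr3extralearners, and lightgbm are installed; the values chosen for the override are purely illustrative:

```r
library(mlr3)
library(mlr3extralearners)

learner = lrn("classif.lightgbm")

# Values set by the learner at construction (these differ from lightgbm's own defaults)
learner$param_set$values

# Override them explicitly, here two threads and lightgbm's default verbosity
learner$param_set$set_values(num_threads = 2L, verbose = 1L)
learner$param_set$values
```

Raising num_threads while also parallelizing through future can oversubscribe the available cores, which is the conflict the initial value of 1L is meant to avoid.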

Custom mlr3 parameters

  • num_class: This parameter is automatically inferred for multiclass tasks and does not have to be set.
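
A short sketch of the multiclass case, assuming the three-class iris task that ships with mlr3; neither objective nor num_class is supplied by the user, and the reduced num_iterations is only there to keep the example fast:

```r
library(mlr3)
library(mlr3extralearners)

# iris has three classes, so this is a multiclass task
task = tsk("iris")

# Neither objective nor num_class is set here
learner = lrn("classif.lightgbm", num_iterations = 20L)
learner$train(task)

# The printed model summary reports the objective used for training
print(learner$model)
```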

Early Stopping and Validation

Early stopping can be used to find the optimal number of boosting rounds. Set early_stopping_rounds to an integer value to monitor the performance of the model on the validation set while training. For information on how to configure the validation set, see the Validation section of mlr3::Learner. The internal validation measure can be set via the eval parameter, which should be a list of mlr3::Measures, functions, or strings naming internal lightgbm measures. If first_metric_only = FALSE (default), the learner stops when any metric fails to improve.
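
The following sketch illustrates one possible early stopping setup. It assumes that 20% of the training data is held out for validation via validate = 0.2 and that, with eval left unset, lightgbm falls back to its default metric for the objective; the values 500L and 10L are illustrative only:

```r
library(mlr3)
library(mlr3extralearners)

task = tsk("sonar")

learner = lrn("classif.lightgbm",
  num_iterations = 500L,         # upper bound on the number of boosting rounds
  early_stopping_rounds = 10L,   # stop after 10 rounds without improvement
  validate = 0.2                 # hold out 20% of the training data for validation
)

learner$train(task)

# Validation scores recorded at the end of training
learner$internal_valid_scores

# Number of boosting rounds selected by early stopping
learner$internal_tuned_values
```

The value in internal_tuned_values can then be used as num_iterations when refitting the learner on the full data without a validation split.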

References

Ke, Guolin, Meng, Qi, Finley, Thomas, Wang, Taifeng, Chen, Wei, Ma, Weidong, Ye, Qiwei, Liu, Tie-Yan (2017). “Lightgbm: A highly efficient gradient boosting decision tree.” Advances in neural information processing systems, 30.

Author

kapsner

Super classes

mlr3::Learner -> mlr3::LearnerClassif -> LearnerClassifLightGBM

Active bindings

internal_valid_scores

The last recorded validation score for each metric, extracted from model$evaluation_log.

internal_tuned_values

Returns the early stopped iterations if early_stopping_rounds was set during training.

validate

How to construct the internal validation data. This parameter can be either NULL, a ratio, "test", or "predefined".

Methods


Method new()

Creates a new instance of this R6 class.


Method importance()

The importance scores are extracted from lightgbm::lgb.importance().

Usage

LearnerClassifLightGBM$importance()

Returns

Named numeric().


Method clone()

The objects of this class are cloneable with this method.

Usage

LearnerClassifLightGBM$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

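# Load the packages that provide lrn(), tsk(), partition(), and the learner
library(mlr3)
library(mlr3extralearners)

# Define the Learner
learner = lrn("classif.lightgbm")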
print(learner)
#> 
#> ── <LearnerClassifLightGBM> (classif.lightgbm): Gradient Boosting ──────────────
#> • Model: -
#> • Parameters: verbose=-1, num_threads=1
#> • Validate: NULL
#> • Packages: mlr3, mlr3extralearners, and lightgbm
#> • Predict Types: response and [prob]
#> • Feature Types: logical, integer, numeric, and factor
#> • Encapsulation: none (fallback: -)
#> • Properties: hotstart_forward, importance, internal_tuning, missings,
#> multiclass, twoclass, validation, and weights
#> • Other settings: use_weights = 'use'

# Define a Task
task = tsk("sonar")

# Create train and test set
ids = partition(task)

# Train the learner on the training ids
learner$train(task, row_ids = ids$train)

print(learner$model)
#> LightGBM Model (100 trees)
#> Objective: binary
#> Fitted to dataset with 60 columns
print(learner$importance())
#>          V12          V11          V27          V48          V45          V20 
#> 0.1882232414 0.1621443056 0.0765553899 0.0652644699 0.0652109437 0.0560577978 
#>          V37          V36          V43           V4          V32          V31 
#> 0.0493789080 0.0344313343 0.0302753603 0.0302704521 0.0190239109 0.0188723900 
#>          V28          V17          V42          V29          V40          V13 
#> 0.0184020778 0.0172530324 0.0131092626 0.0115965938 0.0112179333 0.0110560728 
#>          V50          V21          V60          V59          V49          V16 
#> 0.0103258536 0.0100351616 0.0095258192 0.0091633202 0.0084637919 0.0065811530 
#>          V54          V44          V52          V23          V19          V39 
#> 0.0061666964 0.0056135975 0.0054890668 0.0054849157 0.0053591494 0.0050755151 
#>          V30          V10          V26          V47          V24          V51 
#> 0.0050140823 0.0049788552 0.0045148886 0.0035329386 0.0029123606 0.0021853262 
#>           V9          V57          V14          V34           V8          V41 
#> 0.0019233848 0.0018309115 0.0014018845 0.0013873094 0.0009488221 0.0009280215 
#>           V2          V33           V3          V56          V55          V18 
#> 0.0008930356 0.0006058255 0.0005680104 0.0003284821 0.0001908720 0.0001137413 
#>          V58 
#> 0.0001137307 

# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)

# Score the predictions
predictions$score()
#> classif.ce 
#>  0.2463768