Skip to contents

Gradient boosting algorithm. Calls lightgbm::lightgbm() from lightgbm. The list of parameters can be found here and in the documentation of lightgbm::lgb.train().

Dictionary

This Learner can be instantiated via lrn():

lrn("classif.lightgbm")

Meta Information

  • Task type: “classif”

  • Predict Types: “response”, “prob”

  • Feature Types: “logical”, “integer”, “numeric”, “factor”

  • Required Packages: mlr3, mlr3extralearners, lightgbm

Parameters

IdTypeDefaultLevelsRange
objectivecharacter-binary, multiclass, multiclassova-
evaluntyped--
verboseinteger1\((-\infty, \infty)\)
recordlogicalTRUETRUE, FALSE-
eval_freqinteger1\([1, \infty)\)
callbacksuntyped--
reset_datalogicalFALSETRUE, FALSE-
boostingcharactergbdtgbdt, rf, dart, goss-
linear_treelogicalFALSETRUE, FALSE-
learning_ratenumeric0.1\([0, \infty)\)
num_leavesinteger31\([1, 131072]\)
tree_learnercharacterserialserial, feature, data, voting-
num_threadsinteger0\([0, \infty)\)
device_typecharactercpucpu, gpu-
seedinteger-\((-\infty, \infty)\)
deterministiclogicalFALSETRUE, FALSE-
data_sample_strategycharacterbaggingbagging, goss-
force_col_wiselogicalFALSETRUE, FALSE-
force_row_wiselogicalFALSETRUE, FALSE-
histogram_pool_sizenumeric-1\((-\infty, \infty)\)
max_depthinteger-1\((-\infty, \infty)\)
min_data_in_leafinteger20\([0, \infty)\)
min_sum_hessian_in_leafnumeric0.001\([0, \infty)\)
bagging_fractionnumeric1\([0, 1]\)
pos_bagging_fractionnumeric1\([0, 1]\)
neg_bagging_fractionnumeric1\([0, 1]\)
bagging_freqinteger0\([0, \infty)\)
bagging_seedinteger3\((-\infty, \infty)\)
bagging_by_querylogicalFALSETRUE, FALSE-
feature_fractionnumeric1\([0, 1]\)
feature_fraction_bynodenumeric1\([0, 1]\)
feature_fraction_seedinteger2\((-\infty, \infty)\)
extra_treeslogicalFALSETRUE, FALSE-
extra_seedinteger6\((-\infty, \infty)\)
max_delta_stepnumeric0\((-\infty, \infty)\)
lambda_l1numeric0\([0, \infty)\)
lambda_l2numeric0\([0, \infty)\)
linear_lambdanumeric0\([0, \infty)\)
min_gain_to_splitnumeric0\([0, \infty)\)
drop_ratenumeric0.1\([0, 1]\)
max_dropinteger50\((-\infty, \infty)\)
skip_dropnumeric0.5\([0, 1]\)
xgboost_dart_modelogicalFALSETRUE, FALSE-
uniform_droplogicalFALSETRUE, FALSE-
drop_seedinteger4\((-\infty, \infty)\)
top_ratenumeric0.2\([0, 1]\)
other_ratenumeric0.1\([0, 1]\)
min_data_per_groupinteger100\([1, \infty)\)
max_cat_thresholdinteger32\([1, \infty)\)
cat_l2numeric10\([0, \infty)\)
cat_smoothnumeric10\([0, \infty)\)
max_cat_to_onehotinteger4\([1, \infty)\)
top_kinteger20\([1, \infty)\)
monotone_constraintsuntypedNULL-
monotone_constraints_methodcharacterbasicbasic, intermediate, advanced-
monotone_penaltynumeric0\([0, \infty)\)
feature_contriuntypedNULL-
forcedsplits_filenameuntyped""-
refit_decay_ratenumeric0.9\([0, 1]\)
cegb_tradeoffnumeric1\([0, \infty)\)
cegb_penalty_splitnumeric0\([0, \infty)\)
cegb_penalty_feature_lazyuntyped--
cegb_penalty_feature_coupleduntyped--
path_smoothnumeric0\([0, \infty)\)
interaction_constraintsuntyped--
use_quantized_gradlogicalTRUETRUE, FALSE-
num_grad_quant_binsinteger4\((-\infty, \infty)\)
quant_train_renew_leaflogicalFALSETRUE, FALSE-
stochastic_roundinglogicalTRUETRUE, FALSE-
serializablelogicalTRUETRUE, FALSE-
max_bininteger255\([2, \infty)\)
max_bin_by_featureuntypedNULL-
min_data_in_bininteger3\([1, \infty)\)
bin_construct_sample_cntinteger200000\([1, \infty)\)
data_random_seedinteger1\((-\infty, \infty)\)
is_enable_sparselogicalTRUETRUE, FALSE-
enable_bundlelogicalTRUETRUE, FALSE-
use_missinglogicalTRUETRUE, FALSE-
zero_as_missinglogicalFALSETRUE, FALSE-
feature_pre_filterlogicalTRUETRUE, FALSE-
pre_partitionlogicalFALSETRUE, FALSE-
two_roundlogicalFALSETRUE, FALSE-
forcedbins_filenameuntyped""-
is_unbalancelogicalFALSETRUE, FALSE-
scale_pos_weightnumeric1\([0, \infty)\)
sigmoidnumeric1\([0, \infty)\)
boost_from_averagelogicalTRUETRUE, FALSE-
eval_atuntyped1:5-
multi_error_top_kinteger1\([1, \infty)\)
auc_mu_weightsuntypedNULL-
num_machinesinteger1\([1, \infty)\)
local_listen_portinteger12400\([1, \infty)\)
time_outinteger120\([1, \infty)\)
machinesuntyped""-
gpu_platform_idinteger-1\((-\infty, \infty)\)
gpu_device_idinteger-1\((-\infty, \infty)\)
gpu_device_id_listuntypedNULL-
gpu_use_dplogicalFALSETRUE, FALSE-
num_gpuinteger1\([1, \infty)\)
start_iteration_predictinteger0\((-\infty, \infty)\)
num_iteration_predictinteger-1\((-\infty, \infty)\)
pred_early_stoplogicalFALSETRUE, FALSE-
pred_early_stop_freqinteger10\((-\infty, \infty)\)
pred_early_stop_marginnumeric10\((-\infty, \infty)\)
num_iterationsinteger100\([1, \infty)\)
early_stopping_roundsinteger-\([1, \infty)\)
early_stopping_min_deltanumeric-\([0, \infty)\)
first_metric_onlylogicalFALSETRUE, FALSE-

Initial parameter values

  • num_threads:

    • Actual default: 0L

    • Initial value: 1L

    • Reason for change: Prevents accidental conflicts with future.

  • verbose:

    • Actual default: 1L

    • Initial value: -1L

    • Reason for change: Prevents accidental conflicts with mlr messaging system.

  • objective:

    • Depends on the task: if binary classification, then this parameter is set to "binary", otherwise "multiclasss" and cannot be changed.

Custom mlr3 parameters

  • num_class: This parameter is automatically inferred for multiclass tasks and does not have to be set.

Early Stopping and Validation

Early stopping can be used to find the optimal number of boosting rounds. Set early_stopping_rounds to an integer value to monitor the performance of the model on the validation set while training. For information on how to configure the validation set, see the Validation section of mlr3::Learner. The internal validation measure can be set the eval parameter which should be a list of mlr3::Measures, functions, or strings for the internal lightgbm measures. If first_metric_only = FALSE (default), the learner stops when any metric fails to improve.

References

Ke, Guolin, Meng, Qi, Finley, Thomas, Wang, Taifeng, Chen, Wei, Ma, Weidong, Ye, Qiwei, Liu, Tie-Yan (2017). “Lightgbm: A highly efficient gradient boosting decision tree.” Advances in neural information processing systems, 30.

See also

Author

kapsner

Super classes

mlr3::Learner -> mlr3::LearnerClassif -> LearnerClassifLightGBM

Active bindings

internal_valid_scores

The last observation of the validation scores for all metrics. Extracted from model$evaluation_log

internal_tuned_values

Returns the early stopped iterations if early_stopping_rounds was set during training.

validate

How to construct the internal validation data. This parameter can be either NULL, a ratio, "test", or "predefined".

Methods

Inherited methods


Method new()

Creates a new instance of this R6 class.


Method importance()

The importance scores are extracted from lbg.importance.

Usage

LearnerClassifLightGBM$importance()

Returns

Named numeric().


Method clone()

The objects of this class are cloneable with this method.

Usage

LearnerClassifLightGBM$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

# Define the Learner
learner = lrn("classif.lightgbm")
print(learner)
#> 
#> ── <LearnerClassifLightGBM> (classif.lightgbm): Gradient Boosting ──────────────
#> • Model: -
#> • Parameters: verbose=-1, num_threads=1
#> • Validate: NULL
#> • Packages: mlr3, mlr3extralearners, and lightgbm
#> • Predict Types: response and [prob]
#> • Feature Types: logical, integer, numeric, and factor
#> • Encapsulation: none (fallback: -)
#> • Properties: hotstart_forward, importance, internal_tuning, missings,
#> multiclass, twoclass, validation, and weights
#> • Other settings: use_weights = 'use', predict_raw = 'FALSE'

# Define a Task
task = tsk("sonar")

# Create train and test set
ids = partition(task)

# Train the learner on the training ids
learner$train(task, row_ids = ids$train)

print(learner$model)
#> LightGBM Model (100 trees)
#> Objective: binary
#> Fitted to dataset with 60 columns
print(learner$importance())
#>          V11          V16           V9          V51          V21          V52 
#> 0.1282709410 0.0786211061 0.0697479378 0.0638330903 0.0540878169 0.0458941228 
#>          V47          V45           V4          V36          V48          V31 
#> 0.0401018293 0.0386424900 0.0372561965 0.0365420615 0.0360441297 0.0329605549 
#>          V23          V55          V12          V27          V28          V37 
#> 0.0329279686 0.0312960213 0.0288737585 0.0288541613 0.0209999100 0.0192610042 
#>          V17          V46           V3          V26          V54          V33 
#> 0.0182009455 0.0169404468 0.0138533227 0.0136395533 0.0129265043 0.0098754436 
#>          V29          V42          V58           V7          V49          V10 
#> 0.0083749946 0.0078957832 0.0069992482 0.0069782879 0.0061043328 0.0059789705 
#>          V60           V6          V13           V8          V32          V57 
#> 0.0045084737 0.0036760174 0.0035541151 0.0032910833 0.0030722670 0.0029493271 
#>          V24          V20          V50          V40          V59          V35 
#> 0.0028438988 0.0026259865 0.0024451727 0.0023645674 0.0022515076 0.0020794003 
#>          V38           V1          V44          V18          V14          V19 
#> 0.0018586518 0.0018307706 0.0016661580 0.0016127648 0.0015174008 0.0009145429 
#>          V53          V43          V22          V39           V5          V41 
#> 0.0006679626 0.0005259241 0.0004904522 0.0003992146 0.0003063967 0.0002684425 
#>          V56          V34 
#> 0.0001804733 0.0001160935 

# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)

# Score the predictions
predictions$score()
#> classif.ce 
#>  0.1014493