Gradient boosting algorithm. Calls lightgbm::lightgbm() from package lightgbm.

Dictionary

This Learner can be instantiated via the dictionary mlr_learners or with the associated sugar function lrn():

mlr_learners$get("regr.lightgbm")
lrn("regr.lightgbm")

Meta Information

  • Task type: “regr”

  • Predict Types: “response”

  • Feature Types: “logical”, “integer”, “numeric”, “factor”

  • Required Packages: mlr3, mlr3extralearners, lightgbm

Parameters

| Id | Type | Default | Levels | Range |
|---|---|---|---|---|
| nrounds | integer | 5 | - | \([1, \infty)\) |
| objective | character | regression | regression, regression_l1, huber, fair, poisson, quantile, mape, gamma, tweedie | - |
| metric | character | "" | "", None, l1, l2, rmse, quantile, mape, huber, fair, poisson, ... | - |
| custom_eval | untyped | - | - | - |
| verbose | integer | 1 | - | \((-\infty, \infty)\) |
| record | logical | TRUE | TRUE, FALSE | - |
| eval_freq | integer | 1 | - | \([1, \infty)\) |
| init_model | untyped | - | - | - |
| early_stopping_rounds | integer | - | - | \([1, \infty)\) |
| early_stopping_split | numeric | 0 | - | \([0, 1]\) |
| callbacks | untyped | - | - | - |
| reset_data | logical | FALSE | TRUE, FALSE | - |
| categorical_feature | untyped | - | - | - |
| convert_categorical | logical | TRUE | TRUE, FALSE | - |
| boosting | character | gbdt | gbdt, rf, dart, goss | - |
| linear_tree | logical | FALSE | TRUE, FALSE | - |
| num_iterations | integer | 100 | - | \([0, \infty)\) |
| learning_rate | numeric | 0.1 | - | \([0, \infty)\) |
| num_leaves | integer | 31 | - | \([1, 131072]\) |
| tree_learner | character | serial | serial, feature, data, voting | - |
| num_threads | integer | 0 | - | \([0, \infty)\) |
| device_type | character | cpu | cpu, gpu | - |
| seed | integer | - | - | \((-\infty, \infty)\) |
| deterministic | logical | FALSE | TRUE, FALSE | - |
| force_col_wise | logical | FALSE | TRUE, FALSE | - |
| force_row_wise | logical | FALSE | TRUE, FALSE | - |
| histogram_pool_size | integer | -1 | - | \((-\infty, \infty)\) |
| max_depth | integer | -1 | - | \((-\infty, \infty)\) |
| min_data_in_leaf | integer | 20 | - | \([0, \infty)\) |
| min_sum_hessian_in_leaf | numeric | 0.001 | - | \([0, \infty)\) |
| bagging_fraction | numeric | 1 | - | \([0, 1]\) |
| bagging_freq | integer | 0 | - | \([0, \infty)\) |
| bagging_seed | integer | 3 | - | \((-\infty, \infty)\) |
| feature_fraction | numeric | 1 | - | \([0, 1]\) |
| feature_fraction_bynode | numeric | 1 | - | \([0, 1]\) |
| feature_fraction_seed | integer | 2 | - | \((-\infty, \infty)\) |
| extra_trees | logical | FALSE | TRUE, FALSE | - |
| extra_seed | integer | 6 | - | \((-\infty, \infty)\) |
| first_metric_only | logical | FALSE | TRUE, FALSE | - |
| max_delta_step | numeric | 0 | - | \((-\infty, \infty)\) |
| lambda_l1 | numeric | 0 | - | \([0, \infty)\) |
| lambda_l2 | numeric | 0 | - | \([0, \infty)\) |
| linear_lambda | numeric | 0 | - | \([0, \infty)\) |
| min_gain_to_split | numeric | 0 | - | \([0, \infty)\) |
| drop_rate | numeric | 0.1 | - | \([0, 1]\) |
| max_drop | integer | 50 | - | \((-\infty, \infty)\) |
| skip_drop | numeric | 0.5 | - | \([0, 1]\) |
| xgboost_dart_mode | logical | FALSE | TRUE, FALSE | - |
| uniform_drop | logical | FALSE | TRUE, FALSE | - |
| drop_seed | integer | 4 | - | \((-\infty, \infty)\) |
| top_rate | numeric | 0.2 | - | \([0, 1]\) |
| other_rate | numeric | 0.1 | - | \([0, 1]\) |
| min_data_per_group | integer | 100 | - | \([1, \infty)\) |
| max_cat_threshold | integer | 32 | - | \([1, \infty)\) |
| cat_l2 | numeric | 10 | - | \([0, \infty)\) |
| cat_smooth | numeric | 10 | - | \([0, \infty)\) |
| max_cat_to_onehot | integer | 4 | - | \([1, \infty)\) |
| top_k | integer | 20 | - | \([1, \infty)\) |
| monotone_constraints | untyped | - | - | - |
| monotone_constraints_method | character | basic | basic, intermediate, advanced | - |
| monotone_penalty | numeric | 0 | - | \([0, \infty)\) |
| feature_contri | untyped | - | - | - |
| forcedsplits_filename | untyped | - | - | - |
| refit_decay_rate | numeric | 0.9 | - | \([0, 1]\) |
| cegb_tradeoff | numeric | 1 | - | \([0, \infty)\) |
| cegb_penalty_split | numeric | 0 | - | \([0, \infty)\) |
| cegb_penalty_feature_lazy | untyped | - | - | - |
| cegb_penalty_feature_coupled | untyped | - | - | - |
| path_smooth | numeric | 0 | - | \([0, \infty)\) |
| interaction_constraints | untyped | - | - | - |
| input_model | untyped | - | - | - |
| output_model | untyped | LightGBM_model.txt | - | - |
| saved_feature_importance_type | integer | 0 | - | \([0, 1]\) |
| snapshot_freq | integer | -1 | - | \((-\infty, \infty)\) |
| max_bin | integer | 255 | - | \([2, \infty)\) |
| max_bin_by_feature | untyped | - | - | - |
| min_data_in_bin | integer | 3 | - | \([1, \infty)\) |
| bin_construct_sample_cnt | integer | 200000 | - | \([1, \infty)\) |
| data_random_seed | integer | 1 | - | \((-\infty, \infty)\) |
| is_enable_sparse | logical | TRUE | TRUE, FALSE | - |
| enable_bundle | logical | TRUE | TRUE, FALSE | - |
| use_missing | logical | TRUE | TRUE, FALSE | - |
| zero_as_missing | logical | FALSE | TRUE, FALSE | - |
| feature_pre_filter | logical | TRUE | TRUE, FALSE | - |
| pre_partition | logical | FALSE | TRUE, FALSE | - |
| two_round | logical | FALSE | TRUE, FALSE | - |
| header | logical | FALSE | TRUE, FALSE | - |
| group_column | untyped | - | - | - |
| forcedbins_filename | untyped | - | - | - |
| save_binary | logical | FALSE | TRUE, FALSE | - |
| boost_from_average | logical | TRUE | TRUE, FALSE | - |
| reg_sqrt | logical | FALSE | TRUE, FALSE | - |
| alpha | numeric | 0.9 | - | \([0, \infty)\) |
| fair_c | numeric | 1 | - | \([0, \infty)\) |
| poisson_max_delta_step | numeric | 0.7 | - | \([0, \infty)\) |
| tweedie_variance_power | numeric | 1.5 | - | \([1, 2]\) |
| metric_freq | integer | 1 | - | \([1, \infty)\) |
| is_provide_training_metric | logical | FALSE | TRUE, FALSE | - |
| num_machines | integer | 1 | - | \([1, \infty)\) |
| local_listen_port | integer | 12400 | - | \([1, \infty)\) |
| time_out | integer | 120 | - | \([1, \infty)\) |
| machine_list_filename | untyped | - | - | - |
| machines | untyped | - | - | - |
| gpu_platform_id | integer | -1 | - | \((-\infty, \infty)\) |
| gpu_device_id | integer | -1 | - | \((-\infty, \infty)\) |
| gpu_use_dp | logical | FALSE | TRUE, FALSE | - |
| num_gpu | integer | 1 | - | \([1, \infty)\) |
| start_iteration | integer | 0 | - | \((-\infty, \infty)\) |
| num_iteration | integer | -1 | - | \((-\infty, \infty)\) |
| pred_early_stop | logical | FALSE | TRUE, FALSE | - |
| pred_early_stop_freq | integer | 10 | - | \((-\infty, \infty)\) |
| pred_early_stop_margin | numeric | 10 | - | \((-\infty, \infty)\) |
| output_result | untyped | LightGBM_predict_result.txt | - | - |
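Parameters can be set at construction or changed later via the learner's param_set. A minimal sketch; the hyperparameter values below are illustrative, not tuned recommendations:

```r
library(mlr3)
library(mlr3extralearners)

# set hyperparameters at construction ...
learner = lrn("regr.lightgbm",
  num_iterations = 200,
  learning_rate = 0.05,
  num_leaves = 63
)

# ... or modify them on the existing learner
learner$param_set$values$max_depth = 8
```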

Parameter Changes

  • num_threads:

    • Actual default: 0L

    • Adjusted default: 1L

    • Reason for change: Prevents accidental conflicts with the parallelization package future.

  • verbose:

    • Actual default: 1L

    • Adjusted default: -1L

    • Reason for change: Prevents accidental conflicts with the mlr3 logging system.

  • convert_categorical: Additional parameter. If this parameter is set to TRUE (default), all factor and logical columns are converted to integers and the parameter categorical_feature of lightgbm is set to those columns.
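With the default convert_categorical = TRUE, tasks containing factor or logical features can be passed directly; the conversion to integers and the setting of categorical_feature happen internally. A sketch with a hand-made toy task (data and column names are illustrative):

```r
library(mlr3)
library(mlr3extralearners)

# a toy regression task with one numeric and one factor feature
dat = data.frame(
  y  = rnorm(100),
  x1 = rnorm(100),
  x2 = factor(sample(c("a", "b", "c"), 100, replace = TRUE))
)
task = as_task_regr(dat, target = "y")

# x2 is converted to integer and declared as a categorical feature
# to lightgbm behind the scenes
learner = lrn("regr.lightgbm")
learner$train(task)
```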

  • early_stopping_split: Additional parameter. Instead of providing the data that is used for early stopping explicitly, the parameter early_stopping_split determines the proportion of the training data that is used for early stopping. Here, stratification on the target variable is used if there is no grouping variable, as one cannot simultaneously stratify and group.
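A sketch of internal early stopping under these assumptions: 20% of the training data is held out as the validation split, and training stops once the evaluation metric has not improved for 10 consecutive rounds.

```r
library(mlr3)
library(mlr3extralearners)

learner = lrn("regr.lightgbm",
  num_iterations = 1000,          # upper bound on boosting rounds
  early_stopping_rounds = 10,     # stop after 10 rounds without improvement
  early_stopping_split = 0.2      # hold out 20% of training data internally
)
learner$train(tsk("mtcars"))
```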

References

Ke, Guolin, Meng, Qi, Finley, Thomas, Wang, Taifeng, Chen, Wei, Ma, Weidong, Ye, Qiwei, Liu, Tie-Yan (2017). “LightGBM: A Highly Efficient Gradient Boosting Decision Tree.” Advances in Neural Information Processing Systems, 30.

See also

Author

kapsner

Super classes

mlr3::Learner -> mlr3::LearnerRegr -> LearnerRegrLightGBM

Methods

Inherited methods


Method new()

Creates a new instance of this R6 class.

Usage

LearnerRegrLightGBM$new()


Method importance()

The importance scores are extracted from lightgbm::lgb.importance().

Usage

LearnerRegrLightGBM$importance()

Returns

Named numeric().
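A usage sketch: the method requires a trained model and returns one score per feature.

```r
library(mlr3)
library(mlr3extralearners)

learner = lrn("regr.lightgbm")
learner$train(tsk("mtcars"))

# named numeric vector of importance scores, one entry per feature
learner$importance()
```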


Method clone()

The objects of this class are cloneable with this method.

Usage

LearnerRegrLightGBM$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

learner = mlr3::lrn("regr.lightgbm")
print(learner)
#> <LearnerRegrLightGBM:regr.lightgbm>: Gradient Boosting
#> * Model: -
#> * Parameters: num_threads=1, verbose=-1, objective=regression,
#>   convert_categorical=TRUE
#> * Packages: mlr3, mlr3extralearners, lightgbm
#> * Predict Type: response
#> * Feature types: numeric, integer, logical, factor
#> * Properties: importance, missings, weights

# available parameters:
learner$param_set$ids()
#>   [1] "nrounds"                       "objective"                    
#>   [3] "metric"                        "custom_eval"                  
#>   [5] "verbose"                       "record"                       
#>   [7] "eval_freq"                     "init_model"                   
#>   [9] "early_stopping_rounds"         "early_stopping_split"         
#>  [11] "callbacks"                     "reset_data"                   
#>  [13] "categorical_feature"           "convert_categorical"          
#>  [15] "boosting"                      "linear_tree"                  
#>  [17] "num_iterations"                "learning_rate"                
#>  [19] "num_leaves"                    "tree_learner"                 
#>  [21] "num_threads"                   "device_type"                  
#>  [23] "seed"                          "deterministic"                
#>  [25] "force_col_wise"                "force_row_wise"               
#>  [27] "histogram_pool_size"           "max_depth"                    
#>  [29] "min_data_in_leaf"              "min_sum_hessian_in_leaf"      
#>  [31] "bagging_fraction"              "bagging_freq"                 
#>  [33] "bagging_seed"                  "feature_fraction"             
#>  [35] "feature_fraction_bynode"       "feature_fraction_seed"        
#>  [37] "extra_trees"                   "extra_seed"                   
#>  [39] "first_metric_only"             "max_delta_step"               
#>  [41] "lambda_l1"                     "lambda_l2"                    
#>  [43] "linear_lambda"                 "min_gain_to_split"            
#>  [45] "drop_rate"                     "max_drop"                     
#>  [47] "skip_drop"                     "xgboost_dart_mode"            
#>  [49] "uniform_drop"                  "drop_seed"                    
#>  [51] "top_rate"                      "other_rate"                   
#>  [53] "min_data_per_group"            "max_cat_threshold"            
#>  [55] "cat_l2"                        "cat_smooth"                   
#>  [57] "max_cat_to_onehot"             "top_k"                        
#>  [59] "monotone_constraints"          "monotone_constraints_method"  
#>  [61] "monotone_penalty"              "feature_contri"               
#>  [63] "forcedsplits_filename"         "refit_decay_rate"             
#>  [65] "cegb_tradeoff"                 "cegb_penalty_split"           
#>  [67] "cegb_penalty_feature_lazy"     "cegb_penalty_feature_coupled" 
#>  [69] "path_smooth"                   "interaction_constraints"      
#>  [71] "input_model"                   "output_model"                 
#>  [73] "saved_feature_importance_type" "snapshot_freq"                
#>  [75] "max_bin"                       "max_bin_by_feature"           
#>  [77] "min_data_in_bin"               "bin_construct_sample_cnt"     
#>  [79] "data_random_seed"              "is_enable_sparse"             
#>  [81] "enable_bundle"                 "use_missing"                  
#>  [83] "zero_as_missing"               "feature_pre_filter"           
#>  [85] "pre_partition"                 "two_round"                    
#>  [87] "header"                        "group_column"                 
#>  [89] "forcedbins_filename"           "save_binary"                  
#>  [91] "boost_from_average"            "reg_sqrt"                     
#>  [93] "alpha"                         "fair_c"                       
#>  [95] "poisson_max_delta_step"        "tweedie_variance_power"       
#>  [97] "metric_freq"                   "is_provide_training_metric"   
#>  [99] "num_machines"                  "local_listen_port"            
#> [101] "time_out"                      "machine_list_filename"        
#> [103] "machines"                      "gpu_platform_id"              
#> [105] "gpu_device_id"                 "gpu_use_dp"                   
#> [107] "num_gpu"                       "start_iteration"              
#> [109] "num_iteration"                 "pred_early_stop"              
#> [111] "pred_early_stop_freq"          "pred_early_stop_margin"       
#> [113] "output_result"