Gradient Boosted Decision Trees Regression Learner
Gradient boosting algorithm that also supports categorical data.
Calls catboost::catboost.train() from package 'catboost'.
Dictionary
This Learner can be instantiated via the dictionary mlr_learners or with the associated sugar function lrn():

mlr_learners$get("regr.catboost")
lrn("regr.catboost")
Meta Information
Task type: “regr”
Predict Types: “response”
Feature Types: “numeric”, “factor”, “ordered”
Required Packages: mlr3, mlr3extralearners, catboost
Parameters
Id | Type | Default | Levels | Range |
loss_function | character | RMSE | MAE, MAPE, Poisson, Quantile, RMSE, LogLinQuantile, Lq, Huber, Expectile, Tweedie | - |
iterations | integer | 1000 | - | \([1, \infty)\) |
learning_rate | numeric | 0.03 | - | \([0.001, 1]\) |
random_seed | integer | 0 | - | \([0, \infty)\) |
l2_leaf_reg | numeric | 3 | - | \([0, \infty)\) |
bootstrap_type | character | - | Bayesian, Bernoulli, MVS, Poisson, No | - |
bagging_temperature | numeric | 1 | - | \([0, \infty)\) |
subsample | numeric | - | - | \([0, 1]\) |
sampling_frequency | character | PerTreeLevel | PerTree, PerTreeLevel | - |
sampling_unit | character | Object | Object, Group | - |
mvs_reg | numeric | - | - | \([0, \infty)\) |
random_strength | numeric | 1 | - | \([0, \infty)\) |
depth | integer | 6 | - | \([1, 16]\) |
grow_policy | character | SymmetricTree | SymmetricTree, Depthwise, Lossguide | - |
min_data_in_leaf | integer | 1 | - | \([1, \infty)\) |
max_leaves | integer | 31 | - | \([1, \infty)\) |
has_time | logical | FALSE | TRUE, FALSE | - |
rsm | numeric | 1 | - | \([0.001, 1]\) |
nan_mode | character | Min | Min, Max | - |
fold_permutation_block | integer | - | - | \([1, 256]\) |
leaf_estimation_method | character | - | Newton, Gradient, Exact | - |
leaf_estimation_iterations | integer | - | - | \([1, \infty)\) |
leaf_estimation_backtracking | character | AnyImprovement | No, AnyImprovement, Armijo | - |
fold_len_multiplier | numeric | 2 | - | \([1.001, \infty)\) |
approx_on_full_history | logical | TRUE | TRUE, FALSE | - |
boosting_type | character | - | Ordered, Plain | - |
boost_from_average | logical | - | TRUE, FALSE | - |
langevin | logical | FALSE | TRUE, FALSE | - |
diffusion_temperature | numeric | 10000 | - | \([0, \infty)\) |
score_function | character | Cosine | Cosine, L2, NewtonCosine, NewtonL2 | - |
monotone_constraints | untyped | - | - | - |
feature_weights | untyped | - | - | - |
first_feature_use_penalties | untyped | - | - | - |
penalties_coefficient | numeric | 1 | - | \([0, \infty)\) |
per_object_feature_penalties | untyped | - | - | - |
model_shrink_rate | numeric | - | - | \((-\infty, \infty)\) |
model_shrink_mode | character | - | Constant, Decreasing | - |
target_border | numeric | - | - | \((-\infty, \infty)\) |
border_count | integer | - | - | \([1, 65535]\) |
feature_border_type | character | GreedyLogSum | Median, Uniform, UniformAndQuantiles, MaxLogSum, MinEntropy, GreedyLogSum | - |
per_float_feature_quantization | untyped | - | - | - |
thread_count | integer | 1 | - | \([-1, \infty)\) |
task_type | character | CPU | CPU, GPU | - |
devices | untyped | - | - | - |
logging_level | character | Silent | Silent, Verbose, Info, Debug | - |
metric_period | integer | 1 | - | \([1, \infty)\) |
train_dir | untyped | catboost_info | - | - |
model_size_reg | numeric | 0.5 | - | \([0, 1]\) |
allow_writing_files | logical | FALSE | TRUE, FALSE | - |
save_snapshot | logical | FALSE | TRUE, FALSE | - |
snapshot_file | untyped | - | - | - |
snapshot_interval | integer | 600 | - | \([1, \infty)\) |
simple_ctr | untyped | - | - | - |
combinations_ctr | untyped | - | - | - |
ctr_target_border_count | integer | - | - | \([1, 255]\) |
counter_calc_method | character | Full | SkipTest, Full | - |
max_ctr_complexity | integer | - | - | \([1, \infty)\) |
ctr_leaf_count_limit | integer | - | - | \([1, \infty)\) |
store_all_simple_ctr | logical | FALSE | TRUE, FALSE | - |
final_ctr_computation_mode | character | Default | Default, Skip | - |
verbose | logical | FALSE | TRUE, FALSE | - |
ntree_start | integer | 0 | - | \([0, \infty)\) |
ntree_end | integer | 0 | - | \([0, \infty)\) |
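Hyperparameters from this table can be set at construction or changed later via the learner's parameter set. A minimal sketch (the parameter values are illustrative only):

learner = mlr3::lrn("regr.catboost", depth = 8, learning_rate = 0.1)
# update further values on the existing learner
learner$param_set$set_values(iterations = 500)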
Installation
The easiest way to install catboost is with the helper function install_catboost().
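A minimal sketch, assuming mlr3extralearners is already installed (catboost itself is not on CRAN, so the helper downloads a prebuilt binary):

library(mlr3extralearners)
# install the catboost R package required by this learner
install_catboost()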
Custom mlr3 defaults
logging_level
Actual default: "Verbose"
Adjusted default: "Silent"
Reason for change: consistent with other mlr3 learners
thread_count
Actual default: -1
Adjusted default: 1
Reason for change: consistent with other mlr3 learners
allow_writing_files
Actual default: TRUE
Adjusted default: FALSE
Reason for change: consistent with other mlr3 learners
save_snapshot
Actual default: TRUE
Adjusted default: FALSE
Reason for change: consistent with other mlr3 learners
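If catboost's upstream behaviour is preferred, these adjusted defaults can simply be overridden when constructing the learner; a small sketch:

# restore the upstream catboost defaults for logging, threading and snapshots
learner = mlr3::lrn("regr.catboost",
  logging_level = "Verbose",
  thread_count = -1,
  allow_writing_files = TRUE,
  save_snapshot = TRUE
)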
References
Dorogush, Veronika A, Ershov, Vasily, Gulin, Andrey (2018). “CatBoost: gradient boosting with categorical features support.” arXiv preprint arXiv:1810.11363.
See also
as.data.table(mlr_learners) for a table of available Learners in the running session (depending on the loaded packages).
Chapter in the mlr3book: https://mlr3book.mlr-org.com/basics.html#learners
mlr3learners for a selection of recommended learners.
mlr3cluster for unsupervised clustering learners.
mlr3pipelines to combine learners with pre- and postprocessing steps.
mlr3tuning for tuning of hyperparameters, mlr3tuningspaces for established default tuning spaces.
Super classes
mlr3::Learner -> mlr3::LearnerRegr -> LearnerRegrCatboost
Methods
Method importance()
The importance scores are calculated with catboost.get_feature_importance, setting type = "FeatureImportance", and are returned for all features.
Returns
Named numeric().
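A minimal usage sketch, assuming the catboost package is installed ("mtcars" is just a convenient built-in regression task):

task = mlr3::tsk("mtcars")
learner = mlr3::lrn("regr.catboost", iterations = 50)
learner$train(task)
# named numeric vector with one importance score per feature
learner$importance()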
Examples
learner = mlr3::lrn("regr.catboost")
print(learner)
#> <LearnerRegrCatboost:regr.catboost>: Gradient Boosting
#> * Model: -
#> * Parameters: loss_function=RMSE, logging_level=Silent, thread_count=1,
#> allow_writing_files=FALSE, save_snapshot=FALSE
#> * Packages: mlr3, mlr3extralearners, catboost
#> * Predict Types: [response]
#> * Feature Types: numeric, factor, ordered
#> * Properties: importance, missings, weights
# available parameters:
learner$param_set$ids()
#> [1] "loss_function" "iterations"
#> [3] "learning_rate" "random_seed"
#> [5] "l2_leaf_reg" "bootstrap_type"
#> [7] "bagging_temperature" "subsample"
#> [9] "sampling_frequency" "sampling_unit"
#> [11] "mvs_reg" "random_strength"
#> [13] "depth" "grow_policy"
#> [15] "min_data_in_leaf" "max_leaves"
#> [17] "has_time" "rsm"
#> [19] "nan_mode" "fold_permutation_block"
#> [21] "leaf_estimation_method" "leaf_estimation_iterations"
#> [23] "leaf_estimation_backtracking" "fold_len_multiplier"
#> [25] "approx_on_full_history" "boosting_type"
#> [27] "boost_from_average" "langevin"
#> [29] "diffusion_temperature" "score_function"
#> [31] "monotone_constraints" "feature_weights"
#> [33] "first_feature_use_penalties" "penalties_coefficient"
#> [35] "per_object_feature_penalties" "model_shrink_rate"
#> [37] "model_shrink_mode" "target_border"
#> [39] "border_count" "feature_border_type"
#> [41] "per_float_feature_quantization" "thread_count"
#> [43] "task_type" "devices"
#> [45] "logging_level" "metric_period"
#> [47] "train_dir" "model_size_reg"
#> [49] "allow_writing_files" "save_snapshot"
#> [51] "snapshot_file" "snapshot_interval"
#> [53] "simple_ctr" "combinations_ctr"
#> [55] "ctr_target_border_count" "counter_calc_method"
#> [57] "max_ctr_complexity" "ctr_leaf_count_limit"
#> [59] "store_all_simple_ctr" "final_ctr_computation_mode"
#> [61] "verbose" "ntree_start"
#> [63] "ntree_end"
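A further usage sketch, not part of the rendered example output above and assuming the catboost package is installed:

task = mlr3::tsk("mtcars")
split = mlr3::partition(task)
learner = mlr3::lrn("regr.catboost", iterations = 100)
learner$train(task, row_ids = split$train)
prediction = learner$predict(task, row_ids = split$test)
# evaluate with root mean squared error
prediction$score(mlr3::msr("regr.rmse"))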