Gradient boosting algorithm that also supports categorical data. Calls catboost::catboost.train() from package 'catboost'.

Dictionary

This Learner can be instantiated via the dictionary mlr_learners or with the associated sugar function lrn():

mlr_learners$get("classif.catboost")
lrn("classif.catboost")

Meta Information

  • Task type: “classif”

  • Predict Types: “response”, “prob”

  • Feature Types: “numeric”, “factor”, “ordered”

  • Required Packages: mlr3, mlr3extralearners, catboost
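
The meta information above can be exercised with a short training run. This is a minimal sketch, assuming mlr3, mlr3extralearners, and catboost are installed. The "sonar" task and the 70/30 split are illustrative choices that do not come from this page.

```r
library(mlr3)
library(mlr3extralearners)

# Built-in two-class task with numeric features (illustrative choice)
task = tsk("sonar")

# Request class probabilities instead of the default "response"
learner = lrn("classif.catboost", predict_type = "prob")

# Hold out part of the data for evaluation (ratio is an arbitrary choice)
split = partition(task, ratio = 0.7)
learner$train(task, row_ids = split$train)

# Predict on the held-out rows and inspect the results
prediction = learner$predict(task, row_ids = split$test)
head(prediction$prob)                  # one probability column per class
prediction$score(msr("classif.acc"))   # accuracy on the held-out rows
```

With predict_type left at "response", prediction$prob would not be populated; only the predicted class labels are returned.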

Parameters

| Id | Type | Default | Levels | Range |
|---|---|---|---|---|
| loss_function_twoclass | character | Logloss | Logloss, CrossEntropy | - |
| loss_function_multiclass | character | MultiClass | MultiClass, MultiClassOneVsAll | - |
| iterations | integer | 1000 | | \([1, \infty)\) |
| learning_rate | numeric | 0.03 | | \([0.001, 1]\) |
| random_seed | integer | 0 | | \([0, \infty)\) |
| l2_leaf_reg | numeric | 3 | | \([0, \infty)\) |
| bootstrap_type | character | - | Bayesian, Bernoulli, MVS, Poisson, No | - |
| bagging_temperature | numeric | 1 | | \([0, \infty)\) |
| subsample | numeric | - | | \([0, 1]\) |
| sampling_frequency | character | PerTreeLevel | PerTree, PerTreeLevel | - |
| sampling_unit | character | Object | Object, Group | - |
| mvs_reg | numeric | - | | \([0, \infty)\) |
| random_strength | numeric | 1 | | \([0, \infty)\) |
| depth | integer | 6 | | \([1, 16]\) |
| grow_policy | character | SymmetricTree | SymmetricTree, Depthwise, Lossguide | - |
| min_data_in_leaf | integer | 1 | | \([1, \infty)\) |
| max_leaves | integer | 31 | | \([1, \infty)\) |
| ignored_features | untyped | | | - |
| one_hot_max_size | untyped | FALSE | | - |
| has_time | logical | FALSE | TRUE, FALSE | - |
| rsm | numeric | 1 | | \([0.001, 1]\) |
| nan_mode | character | Min | Min, Max | - |
| fold_permutation_block | integer | - | | \([1, 256]\) |
| leaf_estimation_method | character | - | Newton, Gradient, Exact | - |
| leaf_estimation_iterations | integer | - | | \([1, \infty)\) |
| leaf_estimation_backtracking | character | AnyImprovement | No, AnyImprovement, Armijo | - |
| fold_len_multiplier | numeric | 2 | | \([1.001, \infty)\) |
| approx_on_full_history | logical | TRUE | TRUE, FALSE | - |
| class_weights | untyped | - | | - |
| auto_class_weights | character | None | None, Balanced, SqrtBalanced | - |
| boosting_type | character | - | Ordered, Plain | - |
| boost_from_average | logical | - | TRUE, FALSE | - |
| langevin | logical | FALSE | TRUE, FALSE | - |
| diffusion_temperature | numeric | 10000 | | \([0, \infty)\) |
| score_function | character | Cosine | Cosine, L2, NewtonCosine, NewtonL2 | - |
| monotone_constraints | untyped | - | | - |
| feature_weights | untyped | - | | - |
| first_feature_use_penalties | untyped | - | | - |
| penalties_coefficient | numeric | 1 | | \([0, \infty)\) |
| per_object_feature_penalties | untyped | - | | - |
| model_shrink_rate | numeric | - | | \((-\infty, \infty)\) |
| model_shrink_mode | character | - | Constant, Decreasing | - |
| target_border | numeric | - | | \((-\infty, \infty)\) |
| border_count | integer | - | | \([1, 65535]\) |
| feature_border_type | character | GreedyLogSum | Median, Uniform, UniformAndQuantiles, MaxLogSum, MinEntropy, GreedyLogSum | - |
| per_float_feature_quantization | untyped | - | | - |
| classes_count | integer | - | | \([1, \infty)\) |
| thread_count | integer | 1 | | \([-1, \infty)\) |
| task_type | character | CPU | CPU, GPU | - |
| devices | untyped | - | | - |
| logging_level | character | Silent | Silent, Verbose, Info, Debug | - |
| metric_period | integer | 1 | | \([1, \infty)\) |
| train_dir | untyped | catboost_info | | - |
| model_size_reg | numeric | 0.5 | | \([0, 1]\) |
| allow_writing_files | logical | FALSE | TRUE, FALSE | - |
| save_snapshot | logical | FALSE | TRUE, FALSE | - |
| snapshot_file | untyped | - | | - |
| snapshot_interval | integer | 600 | | \([1, \infty)\) |
| simple_ctr | untyped | - | | - |
| combinations_ctr | untyped | - | | - |
| ctr_target_border_count | integer | - | | \([1, 255]\) |
| counter_calc_method | character | Full | SkipTest, Full | - |
| max_ctr_complexity | integer | - | | \([1, \infty)\) |
| ctr_leaf_count_limit | integer | - | | \([1, \infty)\) |
| store_all_simple_ctr | logical | FALSE | TRUE, FALSE | - |
| final_ctr_computation_mode | character | Default | Default, Skip | - |
| verbose | logical | FALSE | TRUE, FALSE | - |
| ntree_start | integer | 0 | | \([0, \infty)\) |
| ntree_end | integer | 0 | | \([0, \infty)\) |
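
The parameters listed above can be set when the learner is constructed or changed later through its parameter set. The following is a minimal sketch, assuming mlr3 and mlr3extralearners are installed. The specific values are arbitrary picks from within the documented ranges, not recommendations.

```r
library(mlr3)
library(mlr3extralearners)

# Hyperparameters can be passed at construction ...
learner = lrn("classif.catboost",
  iterations = 200,      # range [1, Inf)
  learning_rate = 0.1,   # range [0.001, 1]
  depth = 4              # range [1, 16]
)

# ... or changed afterwards through the parameter set
learner$param_set$values$l2_leaf_reg = 5

# Inspect the values that are currently set
learner$param_set$values[c("iterations", "learning_rate", "depth", "l2_leaf_reg")]
```

Values outside the documented range or level set are expected to be rejected by the parameter set when they are assigned.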

Initial parameter values

  • logging_level:

    • Actual default: "Verbose"

    • Adjusted default: "Silent"

    • Reason for change: consistent with other mlr3 learners

  • thread_count:

    • Actual default: -1

    • Adjusted default: 1

    • Reason for change: consistent with other mlr3 learners

  • allow_writing_files:

    • Actual default: TRUE

    • Adjusted default: FALSE

    • Reason for change: consistent with other mlr3 learners

  • save_snapshot:

    • Actual default: TRUE

    • Adjusted default: FALSE

    • Reason for change: consistent with other mlr3 learners
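
The adjusted defaults above are ordinary parameter values, so they can be inspected and overridden like any other. This is a minimal sketch, assuming mlr3 and mlr3extralearners are installed. Which parameters to switch back to the upstream values is an illustrative choice.

```r
library(mlr3)
library(mlr3extralearners)

learner = lrn("classif.catboost")

# Values that are set when the learner is constructed
adjusted = c("logging_level", "thread_count", "allow_writing_files", "save_snapshot")
learner$param_set$values[adjusted]

# Switch selected parameters back to the upstream defaults listed above
learner$param_set$values$thread_count = -1
learner$param_set$values$logging_level = "Verbose"

learner$param_set$values[adjusted]
```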

References

Dorogush, Veronika A, Ershov, Vasily, Gulin, Andrey (2018). “CatBoost: gradient boosting with categorical features support.” arXiv preprint arXiv:1810.11363.

Author

sumny

Super classes

mlr3::Learner -> mlr3::LearnerClassif -> LearnerClassifCatboost

Methods

Method new()

Create a LearnerClassifCatboost object.


Method importance()

The importance scores are calculated using catboost.get_feature_importance with type = "FeatureImportance" and are returned for all features.

Usage

LearnerClassifCatboost$importance()

Returns

Named numeric().
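
A minimal sketch of calling importance() on a trained learner, assuming mlr3, mlr3extralearners, and catboost are installed. The "sonar" task and the reduced iterations value are illustrative choices to keep the run short.

```r
library(mlr3)
library(mlr3extralearners)

task = tsk("sonar")

# Fewer boosting iterations than the default, only to keep the example fast
learner = lrn("classif.catboost", iterations = 50)
learner$train(task)

# Named numeric vector with one score per feature
imp = learner$importance()

# Show the five features with the highest scores
head(sort(imp, decreasing = TRUE), 5)
```

The learner has to be trained first, since the scores are derived from the fitted model.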


Method clone()

The objects of this class are cloneable with this method.

Usage

LearnerClassifCatboost$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.
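
A minimal sketch of why a deep clone is usually the one to use when the copy will be modified, assuming mlr3 and mlr3extralearners are installed. The parameter being changed is an arbitrary choice.

```r
library(mlr3)
library(mlr3extralearners)

learner = lrn("classif.catboost")

# A deep clone also copies nested objects such as the parameter set
copy = learner$clone(deep = TRUE)
copy$param_set$values$iterations = 10

# The change is visible on the copy only
copy$param_set$values$iterations
is.null(learner$param_set$values$iterations)
```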

Examples

learner = mlr3::lrn("classif.catboost")
print(learner)
#> <LearnerClassifCatboost:classif.catboost>: Gradient Boosting
#> * Model: -
#> * Parameters: loss_function_twoclass=Logloss,
#>   loss_function_multiclass=MultiClass, logging_level=Silent,
#>   thread_count=1, allow_writing_files=FALSE, save_snapshot=FALSE
#> * Packages: mlr3, mlr3extralearners, catboost
#> * Predict Types:  [response], prob
#> * Feature Types: numeric, factor, ordered
#> * Properties: importance, missings, multiclass, twoclass, weights

# available parameters:
learner$param_set$ids()
#>  [1] "loss_function_twoclass"         "loss_function_multiclass"      
#>  [3] "iterations"                     "learning_rate"                 
#>  [5] "random_seed"                    "l2_leaf_reg"                   
#>  [7] "bootstrap_type"                 "bagging_temperature"           
#>  [9] "subsample"                      "sampling_frequency"            
#> [11] "sampling_unit"                  "mvs_reg"                       
#> [13] "random_strength"                "depth"                         
#> [15] "grow_policy"                    "min_data_in_leaf"              
#> [17] "max_leaves"                     "ignored_features"              
#> [19] "one_hot_max_size"               "has_time"                      
#> [21] "rsm"                            "nan_mode"                      
#> [23] "fold_permutation_block"         "leaf_estimation_method"        
#> [25] "leaf_estimation_iterations"     "leaf_estimation_backtracking"  
#> [27] "fold_len_multiplier"            "approx_on_full_history"        
#> [29] "class_weights"                  "auto_class_weights"            
#> [31] "boosting_type"                  "boost_from_average"            
#> [33] "langevin"                       "diffusion_temperature"         
#> [35] "score_function"                 "monotone_constraints"          
#> [37] "feature_weights"                "first_feature_use_penalties"   
#> [39] "penalties_coefficient"          "per_object_feature_penalties"  
#> [41] "model_shrink_rate"              "model_shrink_mode"             
#> [43] "target_border"                  "border_count"                  
#> [45] "feature_border_type"            "per_float_feature_quantization"
#> [47] "classes_count"                  "thread_count"                  
#> [49] "task_type"                      "devices"                       
#> [51] "logging_level"                  "metric_period"                 
#> [53] "train_dir"                      "model_size_reg"                
#> [55] "allow_writing_files"            "save_snapshot"                 
#> [57] "snapshot_file"                  "snapshot_interval"             
#> [59] "simple_ctr"                     "combinations_ctr"              
#> [61] "ctr_target_border_count"        "counter_calc_method"           
#> [63] "max_ctr_complexity"             "ctr_leaf_count_limit"          
#> [65] "store_all_simple_ctr"           "final_ctr_computation_mode"    
#> [67] "verbose"                        "ntree_start"                   
#> [69] "ntree_end"