Gradient boosting algorithm that also supports categorical data. Calls catboost::catboost.train() from package 'catboost'.

Dictionary

This Learner can be instantiated via lrn():

lrn("classif.catboost")

Meta Information

  • Task type: “classif”

  • Predict Types: “response”, “prob”

  • Feature Types: “numeric”, “factor”, “ordered”

  • Required Packages: mlr3, mlr3extralearners, catboost

Parameters

Id | Type | Default | Levels | Range
---|------|---------|--------|------
loss_function_twoclass | character | Logloss | Logloss, CrossEntropy | -
loss_function_multiclass | character | MultiClass | MultiClass, MultiClassOneVsAll | -
learning_rate | numeric | 0.03 | - | \([0.001, 1]\)
random_seed | integer | 0 | - | \([0, \infty)\)
l2_leaf_reg | numeric | 3 | - | \([0, \infty)\)
bootstrap_type | character | - | Bayesian, Bernoulli, MVS, Poisson, No | -
bagging_temperature | numeric | 1 | - | \([0, \infty)\)
subsample | numeric | - | - | \([0, 1]\)
sampling_frequency | character | PerTreeLevel | PerTree, PerTreeLevel | -
sampling_unit | character | Object | Object, Group | -
mvs_reg | numeric | - | - | \([0, \infty)\)
random_strength | numeric | 1 | - | \([0, \infty)\)
depth | integer | 6 | - | \([1, 16]\)
grow_policy | character | SymmetricTree | SymmetricTree, Depthwise, Lossguide | -
min_data_in_leaf | integer | 1 | - | \([1, \infty)\)
max_leaves | integer | 31 | - | \([1, \infty)\)
ignored_features | untyped | NULL | - | -
one_hot_max_size | untyped | FALSE | - | -
has_time | logical | FALSE | TRUE, FALSE | -
rsm | numeric | 1 | - | \([0.001, 1]\)
nan_mode | character | Min | Min, Max | -
fold_permutation_block | integer | - | - | \([1, 256]\)
leaf_estimation_method | character | - | Newton, Gradient, Exact | -
leaf_estimation_iterations | integer | - | - | \([1, \infty)\)
leaf_estimation_backtracking | character | AnyImprovement | No, AnyImprovement, Armijo | -
fold_len_multiplier | numeric | 2 | - | \([1.001, \infty)\)
approx_on_full_history | logical | TRUE | TRUE, FALSE | -
class_weights | untyped | - | - | -
auto_class_weights | character | None | None, Balanced, SqrtBalanced | -
boosting_type | character | - | Ordered, Plain | -
boost_from_average | logical | - | TRUE, FALSE | -
langevin | logical | FALSE | TRUE, FALSE | -
diffusion_temperature | numeric | 10000 | - | \([0, \infty)\)
score_function | character | Cosine | Cosine, L2, NewtonCosine, NewtonL2 | -
monotone_constraints | untyped | - | - | -
feature_weights | untyped | - | - | -
first_feature_use_penalties | untyped | - | - | -
penalties_coefficient | numeric | 1 | - | \([0, \infty)\)
per_object_feature_penalties | untyped | - | - | -
model_shrink_rate | numeric | - | - | \((-\infty, \infty)\)
model_shrink_mode | character | - | Constant, Decreasing | -
target_border | numeric | - | - | \((-\infty, \infty)\)
border_count | integer | - | - | \([1, 65535]\)
feature_border_type | character | GreedyLogSum | Median, Uniform, UniformAndQuantiles, MaxLogSum, MinEntropy, GreedyLogSum | -
per_float_feature_quantization | untyped | - | - | -
classes_count | integer | - | - | \([1, \infty)\)
thread_count | integer | 1 | - | \([-1, \infty)\)
task_type | character | CPU | CPU, GPU | -
devices | untyped | - | - | -
logging_level | character | Silent | Silent, Verbose, Info, Debug | -
metric_period | integer | 1 | - | \([1, \infty)\)
train_dir | untyped | "catboost_info" | - | -
model_size_reg | numeric | 0.5 | - | \([0, 1]\)
allow_writing_files | logical | FALSE | TRUE, FALSE | -
save_snapshot | logical | FALSE | TRUE, FALSE | -
snapshot_file | untyped | - | - | -
snapshot_interval | integer | 600 | - | \([1, \infty)\)
simple_ctr | untyped | - | - | -
combinations_ctr | untyped | - | - | -
ctr_target_border_count | integer | - | - | \([1, 255]\)
counter_calc_method | character | Full | SkipTest, Full | -
max_ctr_complexity | integer | - | - | \([1, \infty)\)
ctr_leaf_count_limit | integer | - | - | \([1, \infty)\)
store_all_simple_ctr | logical | FALSE | TRUE, FALSE | -
final_ctr_computation_mode | character | Default | Default, Skip | -
verbose | logical | FALSE | TRUE, FALSE | -
ntree_start | integer | 0 | - | \([0, \infty)\)
ntree_end | integer | 0 | - | \([0, \infty)\)
early_stopping_rounds | integer | - | - | \([1, \infty)\)
eval_metric | untyped | - | - | -
use_best_model | logical | - | TRUE, FALSE | -
iterations | integer | 1000 | - | \([1, \infty)\)

Initial parameter values

  • logging_level:

    • Actual default: "Verbose"

    • Adjusted default: "Silent"

    • Reason for change: consistent with other mlr3 learners

  • thread_count:

    • Actual default: -1

    • Adjusted default: 1

    • Reason for change: consistent with other mlr3 learners

  • allow_writing_files:

    • Actual default: TRUE

    • Adjusted default: FALSE

    • Reason for change: consistent with other mlr3 learners

  • save_snapshot:

    • Actual default: TRUE

    • Adjusted default: FALSE

    • Reason for change: consistent with other mlr3 learners
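
If the upstream catboost behaviour is preferred, the adjusted defaults can be reverted by setting the parameters explicitly at construction. A minimal sketch:

```r
library(mlr3)
library(mlr3extralearners)

# restore catboost's own defaults for logging, threading, and file output
learner = lrn("classif.catboost",
  logging_level = "Verbose",   # upstream default: progress output during training
  thread_count = -1,           # upstream default: use all available cores
  allow_writing_files = TRUE,
  save_snapshot = TRUE
)
```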

Early stopping

Early stopping can be used to find the optimal number of boosting rounds. Set early_stopping_rounds to an integer value to monitor the performance of the model on the validation set while training. For information on how to configure the validation set, see the Validation section of mlr3::Learner.
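
As a sketch of such a configuration (the specific values, e.g. a validation ratio of 0.2 and eval_metric = "Logloss", are illustrative choices, not defaults):

```r
library(mlr3)
library(mlr3extralearners)

learner = lrn("classif.catboost",
  iterations = 500,
  early_stopping_rounds = 10,   # stop if no improvement for 10 rounds
  eval_metric = "Logloss"
)

# hold out 20% of the training data as the internal validation set
set_validate(learner, validate = 0.2)

task = tsk("sonar")
learner$train(task)

# iteration at which training stopped
learner$internal_tuned_values

# validation score of the final model
learner$internal_valid_scores
```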

References

Dorogush, Anna Veronika, Ershov, Vasily, Gulin, Andrey (2018). “CatBoost: gradient boosting with categorical features support.” arXiv preprint arXiv:1810.11363.

Author

sumny

Super classes

mlr3::Learner -> mlr3::LearnerClassif -> LearnerClassifCatboost

Active bindings

internal_valid_scores

The last observation of the validation scores for all metrics, extracted from model$evaluation_log.

internal_tuned_values

Returns the early stopped iterations if early_stopping_rounds was set during training.

validate

How to construct the internal validation data. This parameter can be either NULL, a ratio, "test", or "predefined".
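
For illustration, the three non-NULL settings might be configured as follows (a sketch; see the Validation section of mlr3::Learner for details):

```r
library(mlr3)
library(mlr3extralearners)

learner = lrn("classif.catboost")

learner$validate = 0.3           # ratio: hold out 30% of the training data
learner$validate = "test"        # use the test set (only inside resample()/benchmark())
learner$validate = "predefined"  # use the task's predefined $internal_valid_task
```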

Methods

Inherited methods


Method new()

Create a LearnerClassifCatboost object.


Method importance()

The importance scores are calculated with catboost::catboost.get_feature_importance(), setting type = "FeatureImportance"; scores are returned for all features.

Usage

LearnerClassifCatboost$importance()

Returns

Named numeric().


Method clone()

The objects of this class are cloneable with this method.

Usage

LearnerClassifCatboost$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

# Define the Learner
learner = lrn("classif.catboost",
  iterations = 100)

print(learner)
#> 
#> ── <LearnerClassifCatboost> (classif.catboost): Gradient Boosting ──────────────
#> • Model: -
#> • Parameters: loss_function_twoclass=Logloss,
#> loss_function_multiclass=MultiClass, thread_count=1, logging_level=Silent,
#> allow_writing_files=FALSE, save_snapshot=FALSE, iterations=100
#> • Validate: NULL
#> • Packages: mlr3, mlr3extralearners, and catboost
#> • Predict Types: [response] and prob
#> • Feature Types: numeric, factor, and ordered
#> • Encapsulation: none (fallback: -)
#> • Properties: importance, internal_tuning, missings, multiclass, twoclass,
#> validation, and weights
#> • Other settings: use_weights = 'use'

# Define a Task
task = tsk("sonar")

# Create train and test set
ids = partition(task)

# Train the learner on the training ids
learner$train(task, row_ids = ids$train)

print(learner$model)
#> CatBoost model (100 trees)
#> Loss function: Logloss
#> Fit to 60 feature(s)
print(learner$importance())
#>       V11       V12       V48       V51        V9       V49       V59       V21 
#> 7.6546477 6.9653648 6.1634126 4.1633041 3.3778780 2.8893515 2.7717928 2.6912525 
#>       V46        V4       V37       V52       V26       V31       V10       V36 
#> 2.6868270 2.6715206 2.3193679 2.2108360 2.1593159 2.1060094 1.9789006 1.8433505 
#>       V23       V45       V27       V38       V17       V28       V20       V55 
#> 1.8168854 1.7288877 1.7279226 1.6954539 1.6865815 1.5928695 1.5700380 1.5478907 
#>       V54       V13       V44        V7       V19       V30        V8        V6 
#> 1.4031872 1.3555040 1.3366502 1.2634338 1.2532397 1.1914064 1.1810551 1.1713524 
#>       V57       V34       V15       V29       V24       V32       V58       V33 
#> 1.1453927 1.0975232 1.0912341 1.0713949 1.0630458 1.0002467 0.9960581 0.9465447 
#>       V47        V1       V25       V16       V39       V22       V40        V2 
#> 0.9442042 0.9113658 0.8960506 0.8852707 0.8813629 0.8600599 0.8596925 0.7792917 
#>       V43        V3       V18       V35       V53       V60       V41       V14 
#> 0.7420614 0.7072267 0.6852019 0.6520146 0.6312928 0.5955998 0.4985880 0.4980606 
#>       V56        V5       V42       V50 
#> 0.4671941 0.3840224 0.2909601 0.2435489 

# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)

# Score the predictions
predictions$score()
#> classif.ce 
#> 0.08695652