
Gradient boosting algorithm that also supports categorical data. Calls catboost::catboost.train() from package 'catboost'.

Dictionary

This Learner can be instantiated via lrn():

lrn("classif.catboost")

Meta Information

  • Task type: “classif”

  • Predict Types: “response”, “prob”

  • Feature Types: “numeric”, “factor”, “ordered”

  • Required Packages: mlr3, mlr3extralearners, catboost
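The meta information above can also be read from the learner object itself. The following is a minimal sketch, assuming mlr3, mlr3extralearners and catboost are installed. It uses the german_credit task that ships with mlr3, which is not mentioned on this page. That task was chosen because it contains factor features, which are handed to catboost without manual encoding. The sketch requests probability predictions, and iterations = 100 is an arbitrary choice to keep it fast.

```r
library(mlr3)
library(mlr3extralearners)

# Request probabilities instead of hard class labels
learner = lrn("classif.catboost", predict_type = "prob", iterations = 100)

# Supported predict types and feature types, as listed under Meta Information
learner$predict_types
learner$feature_types

# german_credit is a binary task with many factor features
task = tsk("german_credit")
split = partition(task)
learner$train(task, row_ids = split$train)

# One probability column per class level
prediction = learner$predict(task, row_ids = split$test)
head(prediction$prob)
prediction$score(msr("classif.auc"))
```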

Parameters

Id | Type | Default | Levels | Range
-- | ---- | ------- | ------ | -----
loss_function_twoclass | character | Logloss | Logloss, CrossEntropy | -
loss_function_multiclass | character | MultiClass | MultiClass, MultiClassOneVsAll | -
learning_rate | numeric | 0.03 |  | [0.001, 1]
random_seed | integer | 0 |  | [0, ∞)
l2_leaf_reg | numeric | 3 |  | [0, ∞)
bootstrap_type | character | - | Bayesian, Bernoulli, MVS, Poisson, No | -
bagging_temperature | numeric | 1 |  | [0, ∞)
subsample | numeric | - |  | [0, 1]
sampling_frequency | character | PerTreeLevel | PerTree, PerTreeLevel | -
sampling_unit | character | Object | Object, Group | -
mvs_reg | numeric | - |  | [0, ∞)
random_strength | numeric | 1 |  | [0, ∞)
depth | integer | 6 |  | [1, 16]
grow_policy | character | SymmetricTree | SymmetricTree, Depthwise, Lossguide | -
min_data_in_leaf | integer | 1 |  | [1, ∞)
max_leaves | integer | 31 |  | [1, ∞)
ignored_features | untyped | NULL |  | -
one_hot_max_size | untyped | FALSE |  | -
has_time | logical | FALSE | TRUE, FALSE | -
rsm | numeric | 1 |  | [0.001, 1]
nan_mode | character | Min | Min, Max | -
fold_permutation_block | integer | - |  | [1, 256]
leaf_estimation_method | character | - | Newton, Gradient, Exact | -
leaf_estimation_iterations | integer | - |  | [1, ∞)
leaf_estimation_backtracking | character | AnyImprovement | No, AnyImprovement, Armijo | -
fold_len_multiplier | numeric | 2 |  | [1.001, ∞)
approx_on_full_history | logical | TRUE | TRUE, FALSE | -
class_weights | untyped | - |  | -
auto_class_weights | character | None | None, Balanced, SqrtBalanced | -
boosting_type | character | - | Ordered, Plain | -
boost_from_average | logical | - | TRUE, FALSE | -
langevin | logical | FALSE | TRUE, FALSE | -
diffusion_temperature | numeric | 10000 |  | [0, ∞)
score_function | character | Cosine | Cosine, L2, NewtonCosine, NewtonL2 | -
monotone_constraints | untyped | - |  | -
feature_weights | untyped | - |  | -
first_feature_use_penalties | untyped | - |  | -
penalties_coefficient | numeric | 1 |  | [0, ∞)
per_object_feature_penalties | untyped | - |  | -
model_shrink_rate | numeric | - |  | (-∞, ∞)
model_shrink_mode | character | - | Constant, Decreasing | -
target_border | numeric | - |  | (-∞, ∞)
border_count | integer | - |  | [1, 65535]
feature_border_type | character | GreedyLogSum | Median, Uniform, UniformAndQuantiles, MaxLogSum, MinEntropy, GreedyLogSum | -
per_float_feature_quantization | untyped | - |  | -
classes_count | integer | - |  | [1, ∞)
thread_count | integer | 1 |  | [-1, ∞)
task_type | character | CPU | CPU, GPU | -
devices | untyped | - |  | -
logging_level | character | Silent | Silent, Verbose, Info, Debug | -
metric_period | integer | 1 |  | [1, ∞)
train_dir | untyped | "catboost_info" |  | -
model_size_reg | numeric | 0.5 |  | [0, 1]
allow_writing_files | logical | FALSE | TRUE, FALSE | -
save_snapshot | logical | FALSE | TRUE, FALSE | -
snapshot_file | untyped | - |  | -
snapshot_interval | integer | 600 |  | [1, ∞)
simple_ctr | untyped | - |  | -
combinations_ctr | untyped | - |  | -
ctr_target_border_count | integer | - |  | [1, 255]
counter_calc_method | character | Full | SkipTest, Full | -
max_ctr_complexity | integer | - |  | [1, ∞)
ctr_leaf_count_limit | integer | - |  | [1, ∞)
store_all_simple_ctr | logical | FALSE | TRUE, FALSE | -
final_ctr_computation_mode | character | Default | Default, Skip | -
verbose | logical | FALSE | TRUE, FALSE | -
ntree_start | integer | 0 |  | [0, ∞)
ntree_end | integer | 0 |  | [0, ∞)
early_stopping_rounds | integer | - |  | [1, ∞)
eval_metric | untyped | - |  | -
use_best_model | logical | - | TRUE, FALSE | -
iterations | integer | 1000 |  | [1, ∞)
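Any parameter from the table can be set when the learner is constructed or changed afterwards. This is a minimal sketch, assuming the packages listed under Meta Information are installed. The specific values (500 iterations, learning rate 0.05, depth 4, l2_leaf_reg 5, balanced class weights) are arbitrary illustrations, not recommendations.

```r
library(mlr3)
library(mlr3extralearners)

# Hyperparameters from the table can be passed directly to lrn()
learner = lrn("classif.catboost",
  iterations = 500,
  learning_rate = 0.05,
  depth = 4
)

# ...or set later through the learner's parameter set
learner$param_set$values$l2_leaf_reg = 5
learner$param_set$values$auto_class_weights = "Balanced"

# Values outside the listed ranges or levels should be rejected, e.g.
# learner$param_set$values$depth = 20 fails because depth is limited to [1, 16]

# All explicitly set values, including the adjusted initial values
learner$param_set$values
```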

Initial parameter values

  • logging_level:

    • Actual default: "Verbose"

    • Adjusted default: "Silent"

    • Reason for change: consistent with other mlr3 learners

  • thread_count:

    • Actual default: -1

    • Adjusted default: 1

    • Reason for change: consistent with other mlr3 learners

  • allow_writing_files:

    • Actual default: TRUE

    • Adjusted default: FALSE

    • Reason for change: consistent with other mlr3 learners

  • save_snapshot:

    • Actual default: TRUE

    • Adjusted default: FALSE

    • Reason for change: consistent with other mlr3 learners
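If catboost's own behaviour is preferred, the adjusted values can simply be overwritten. A minimal sketch, assuming the packages listed under Meta Information are installed. In catboost, thread_count = -1 means that all available CPU cores are used. allow_writing_files and save_snapshot are deliberately left at FALSE here so that nothing is written to train_dir.

```r
library(mlr3)
library(mlr3extralearners)

# Revert two adjusted defaults to catboost's own values:
# thread_count = -1 uses all available cores, "Verbose" prints training progress
learner = lrn("classif.catboost", thread_count = -1, logging_level = "Verbose")

# The remaining adjusted values (no file output) are kept
learner$param_set$values[c("thread_count", "logging_level",
  "allow_writing_files", "save_snapshot")]
```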

Early stopping

Early stopping can be used to find the optimal number of boosting rounds. Set early_stopping_rounds to an integer value to monitor the model's performance on the validation set during training. Training then stops once the validation metric has not improved for that many consecutive rounds. For information on how to configure the validation set, see the Validation section of mlr3::Learner.
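A minimal sketch of early stopping on the sonar task used in the Examples section. It assumes a recent mlr3 release that provides the validate field. The values of 20 rounds and a 30% validation ratio are arbitrary choices for illustration. After training, the active bindings internal_tuned_values and internal_valid_scores report the number of iterations selected by early stopping and the final validation scores.

```r
library(mlr3)
library(mlr3extralearners)

task = tsk("sonar")

# Allow up to 1000 boosting rounds, but stop once the validation metric
# has not improved for 20 consecutive rounds
learner = lrn("classif.catboost", iterations = 1000, early_stopping_rounds = 20)

# Hold out 30% of the training data as the internal validation set
learner$validate = 0.3

learner$train(task)

# Iterations selected by early stopping
learner$internal_tuned_values

# Last observed validation scores
learner$internal_valid_scores
```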

References

Dorogush, Veronika A, Ershov, Vasily, Gulin, Andrey (2018). “CatBoost: gradient boosting with categorical features support.” arXiv preprint arXiv:1810.11363.

Author

sumny

Super classes

mlr3::Learner -> mlr3::LearnerClassif -> LearnerClassifCatboost

Active bindings

internal_valid_scores

The last observation of the validation scores for all metrics, extracted from model$evaluation_log.

internal_tuned_values

Returns the number of boosting iterations selected by early stopping, if early_stopping_rounds was set during training.

validate

How to construct the internal validation data. This parameter can be either NULL, a ratio, "test", or "predefined".

Methods


Method new()

Create a LearnerClassifCatboost object.


Method importance()

The importance scores are calculated with catboost::catboost.get_feature_importance(), setting type = "FeatureImportance", and are returned for all features.

Usage

LearnerClassifCatboost$importance()

Returns

Named numeric().


Method clone()

The objects of this class are cloneable with this method.

Usage

LearnerClassifCatboost$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

# Load mlr3 and the extra learners
library(mlr3)
library(mlr3extralearners)

# Define the Learner
learner = lrn("classif.catboost")
print(learner)
#> 
#> ── <LearnerClassifCatboost> (classif.catboost): Gradient Boosting ──────────────
#> • Model: -
#> • Parameters: loss_function_twoclass=Logloss,
#> loss_function_multiclass=MultiClass, thread_count=1, logging_level=Silent,
#> allow_writing_files=FALSE, save_snapshot=FALSE
#> • Validate: NULL
#> • Packages: mlr3, mlr3extralearners, and catboost
#> • Predict Types: [response] and prob
#> • Feature Types: numeric, factor, and ordered
#> • Encapsulation: none (fallback: -)
#> • Properties: importance, internal_tuning, missings, multiclass, twoclass,
#> validation, and weights
#> • Other settings: use_weights = 'use'

# Define a Task
task = tsk("sonar")

# Create train and test set
ids = partition(task)

# Train the learner on the training ids
learner$train(task, row_ids = ids$train)

print(learner$model)
#> CatBoost model (1000 trees)
#> Loss function: Logloss
#> Fit to 60 feature(s)
print(learner$importance())
#>       V11       V48       V12        V4       V49       V28       V36       V13 
#> 8.6530351 4.8288852 4.6486352 3.9059753 3.4610327 3.4059687 3.3820078 3.2854925 
#>        V9       V27       V16       V47       V20       V21       V39       V37 
#> 3.0793247 2.8053404 2.3330848 2.2871508 2.2618606 2.2109455 2.1880184 1.8830878 
#>       V17       V52       V43       V45       V29       V51       V46       V23 
#> 1.8262063 1.7430704 1.6197335 1.5914369 1.5497583 1.5464912 1.5370276 1.4563905 
#>       V26       V31       V59       V55       V10       V32       V15        V1 
#> 1.4321003 1.3605685 1.3594862 1.3479737 1.2957126 1.1471920 1.1325616 1.0749990 
#>       V53       V58       V14       V18       V60       V42       V33       V57 
#> 1.0720086 1.0517964 1.0397222 1.0181008 1.0176749 0.9792318 0.9174857 0.8861993 
#>       V40        V8       V30       V38        V6       V54       V25       V35 
#> 0.8830457 0.8799004 0.8771126 0.8692596 0.8597371 0.8410322 0.8393296 0.8088684 
#>       V19       V24       V44        V5        V2       V22       V50        V7 
#> 0.7668132 0.7639119 0.7580799 0.6909821 0.6894752 0.6825773 0.6000329 0.5558185 
#>       V34       V41       V56        V3 
#> 0.5184849 0.5107203 0.4968237 0.4852204 

# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)

# Score the predictions
predictions$score()
#> classif.ce 
#>  0.1014493