
Gradient boosting algorithm that also supports categorical data. Calls catboost::catboost.train() from package 'catboost'.

Dictionary

This Learner can be instantiated via lrn():

lrn("classif.catboost")

Meta Information

  • Task type: “classif”

  • Predict Types: “response”, “prob”

  • Feature Types: “numeric”, “factor”, “ordered”

  • Required Packages: mlr3, mlr3extralearners, catboost

Parameters

| Id | Type | Default | Levels | Range |
| --- | --- | --- | --- | --- |
| loss_function_twoclass | character | Logloss | Logloss, CrossEntropy | - |
| loss_function_multiclass | character | MultiClass | MultiClass, MultiClassOneVsAll | - |
| learning_rate | numeric | 0.03 | - | \([0.001, 1]\) |
| random_seed | integer | 0 | - | \([0, \infty)\) |
| l2_leaf_reg | numeric | 3 | - | \([0, \infty)\) |
| bootstrap_type | character | - | Bayesian, Bernoulli, MVS, Poisson, No | - |
| bagging_temperature | numeric | 1 | - | \([0, \infty)\) |
| subsample | numeric | - | - | \([0, 1]\) |
| sampling_frequency | character | PerTreeLevel | PerTree, PerTreeLevel | - |
| sampling_unit | character | Object | Object, Group | - |
| mvs_reg | numeric | - | - | \([0, \infty)\) |
| random_strength | numeric | 1 | - | \([0, \infty)\) |
| depth | integer | 6 | - | \([1, 16]\) |
| grow_policy | character | SymmetricTree | SymmetricTree, Depthwise, Lossguide | - |
| min_data_in_leaf | integer | 1 | - | \([1, \infty)\) |
| max_leaves | integer | 31 | - | \([1, \infty)\) |
| ignored_features | untyped | NULL | - | - |
| one_hot_max_size | untyped | FALSE | - | - |
| has_time | logical | FALSE | TRUE, FALSE | - |
| rsm | numeric | 1 | - | \([0.001, 1]\) |
| nan_mode | character | Min | Min, Max | - |
| fold_permutation_block | integer | - | - | \([1, 256]\) |
| leaf_estimation_method | character | - | Newton, Gradient, Exact | - |
| leaf_estimation_iterations | integer | - | - | \([1, \infty)\) |
| leaf_estimation_backtracking | character | AnyImprovement | No, AnyImprovement, Armijo | - |
| fold_len_multiplier | numeric | 2 | - | \([1.001, \infty)\) |
| approx_on_full_history | logical | TRUE | TRUE, FALSE | - |
| class_weights | untyped | - | - | - |
| auto_class_weights | character | None | None, Balanced, SqrtBalanced | - |
| boosting_type | character | - | Ordered, Plain | - |
| boost_from_average | logical | - | TRUE, FALSE | - |
| langevin | logical | FALSE | TRUE, FALSE | - |
| diffusion_temperature | numeric | 10000 | - | \([0, \infty)\) |
| score_function | character | Cosine | Cosine, L2, NewtonCosine, NewtonL2 | - |
| monotone_constraints | untyped | - | - | - |
| feature_weights | untyped | - | - | - |
| first_feature_use_penalties | untyped | - | - | - |
| penalties_coefficient | numeric | 1 | - | \([0, \infty)\) |
| per_object_feature_penalties | untyped | - | - | - |
| model_shrink_rate | numeric | - | - | \((-\infty, \infty)\) |
| model_shrink_mode | character | - | Constant, Decreasing | - |
| target_border | numeric | - | - | \((-\infty, \infty)\) |
| border_count | integer | - | - | \([1, 65535]\) |
| feature_border_type | character | GreedyLogSum | Median, Uniform, UniformAndQuantiles, MaxLogSum, MinEntropy, GreedyLogSum | - |
| per_float_feature_quantization | untyped | - | - | - |
| classes_count | integer | - | - | \([1, \infty)\) |
| thread_count | integer | 1 | - | \([-1, \infty)\) |
| task_type | character | CPU | CPU, GPU | - |
| devices | untyped | - | - | - |
| logging_level | character | Silent | Silent, Verbose, Info, Debug | - |
| metric_period | integer | 1 | - | \([1, \infty)\) |
| train_dir | untyped | "catboost_info" | - | - |
| model_size_reg | numeric | 0.5 | - | \([0, 1]\) |
| allow_writing_files | logical | FALSE | TRUE, FALSE | - |
| save_snapshot | logical | FALSE | TRUE, FALSE | - |
| snapshot_file | untyped | - | - | - |
| snapshot_interval | integer | 600 | - | \([1, \infty)\) |
| simple_ctr | untyped | - | - | - |
| combinations_ctr | untyped | - | - | - |
| ctr_target_border_count | integer | - | - | \([1, 255]\) |
| counter_calc_method | character | Full | SkipTest, Full | - |
| max_ctr_complexity | integer | - | - | \([1, \infty)\) |
| ctr_leaf_count_limit | integer | - | - | \([1, \infty)\) |
| store_all_simple_ctr | logical | FALSE | TRUE, FALSE | - |
| final_ctr_computation_mode | character | Default | Default, Skip | - |
| verbose | logical | FALSE | TRUE, FALSE | - |
| ntree_start | integer | 0 | - | \([0, \infty)\) |
| ntree_end | integer | 0 | - | \([0, \infty)\) |
| early_stopping_rounds | integer | - | - | \([1, \infty)\) |
| eval_metric | untyped | - | - | - |
| use_best_model | logical | - | TRUE, FALSE | - |
| iterations | integer | 1000 | - | \([1, \infty)\) |
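
Most of these settings map directly onto arguments of catboost::catboost.train(). A brief sketch of configuring a few of them (the values here are illustrative, not tuned recommendations):

learner = lrn("classif.catboost",
  depth = 8,
  learning_rate = 0.1,
  l2_leaf_reg = 5,
  iterations = 500)

# Hyperparameters can also be changed after construction via the parameter set:
learner$param_set$set_values(bootstrap_type = "Bernoulli", subsample = 0.8)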

Initial parameter values

  • logging_level:

    • Actual default: "Verbose"

    • Adjusted default: "Silent"

    • Reason for change: consistent with other mlr3 learners

  • thread_count:

    • Actual default: -1

    • Adjusted default: 1

    • Reason for change: consistent with other mlr3 learners

  • allow_writing_files:

    • Actual default: TRUE

    • Adjusted default: FALSE

    • Reason for change: consistent with other mlr3 learners

  • save_snapshot:

    • Actual default: TRUE

    • Adjusted default: FALSE

    • Reason for change: consistent with other mlr3 learners
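
If the upstream catboost behavior is preferred, these adjusted defaults can be reverted at construction. A minimal sketch (note that snapshotting writes to disk, so save_snapshot = TRUE also requires allow_writing_files = TRUE):

learner = lrn("classif.catboost",
  logging_level = "Verbose",
  thread_count = -1,          # catboost interprets -1 as all available cores
  allow_writing_files = TRUE,
  save_snapshot = TRUE)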

Early stopping

Early stopping can be used to find the optimal number of boosting rounds. Set early_stopping_rounds to an integer value to monitor the performance of the model on the validation set while training. For information on how to configure the validation set, see the Validation section of mlr3::Learner.
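
A minimal sketch, assuming the sonar task and an internal validation split of 20% of the training data (all values illustrative):

learner = lrn("classif.catboost",
  iterations = 1000,
  early_stopping_rounds = 10,
  use_best_model = TRUE)

# Hold out 20% of the training data as the internal validation set
learner$validate = 0.2

task = tsk("sonar")
learner$train(task)

# Early-stopped iteration count and final validation scores
learner$internal_tuned_values
learner$internal_valid_scores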

References

Dorogush, Anna Veronika, Ershov, Vasily, Gulin, Andrey (2018). “CatBoost: gradient boosting with categorical features support.” arXiv preprint arXiv:1810.11363.

Author

sumny

Super classes

mlr3::Learner -> mlr3::LearnerClassif -> LearnerClassifCatboost

Active bindings

internal_valid_scores

The last observation of the validation scores for all metrics, extracted from model$evaluation_log.

internal_tuned_values

Returns the early-stopped number of boosting iterations if early_stopping_rounds was set during training.

validate

How to construct the internal validation data. This parameter can be either NULL, a ratio, "test", or "predefined".
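
A short sketch of the accepted values, assuming a constructed learner:

learner$validate = NULL          # no internal validation (default)
learner$validate = 0.3           # hold out 30% of the training data
learner$validate = "test"        # use the resampling's test set as validation data
learner$validate = "predefined"  # use the task's predefined internal validation task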

Methods



Method new()

Create a LearnerClassifCatboost object.

Usage

LearnerClassifCatboost$new()


Method importance()

The importance scores are calculated using catboost::catboost.get_feature_importance() with type = "FeatureImportance", and are returned for all features.

Usage

LearnerClassifCatboost$importance()

Returns

Named numeric().


Method clone()

The objects of this class are cloneable with this method.

Usage

LearnerClassifCatboost$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

# Define the Learner
learner = lrn("classif.catboost",
  iterations = 100)

print(learner)
#> 
#> ── <LearnerClassifCatboost> (classif.catboost): Gradient Boosting ──────────────
#> • Model: -
#> • Parameters: loss_function_twoclass=Logloss,
#> loss_function_multiclass=MultiClass, thread_count=1, logging_level=Silent,
#> allow_writing_files=FALSE, save_snapshot=FALSE, iterations=100
#> • Validate: NULL
#> • Packages: mlr3, mlr3extralearners, and catboost
#> • Predict Types: [response] and prob
#> • Feature Types: numeric, factor, and ordered
#> • Encapsulation: none (fallback: -)
#> • Properties: importance, internal_tuning, missings, multiclass, twoclass,
#> validation, and weights
#> • Other settings: use_weights = 'use'

# Define a Task
task = tsk("sonar")

# Create train and test set
ids = partition(task)

# Train the learner on the training ids
learner$train(task, row_ids = ids$train)

print(learner$model)
#> CatBoost model (100 trees)
#> Loss function: Logloss
#> Fit to 60 feature(s)
print(learner$importance())
#>         V12         V11         V23         V37         V36          V4 
#> 10.34162242  9.32964726  6.07366885  5.09459775  4.22878178  2.82856489 
#>          V5         V16         V27         V31         V49          V9 
#>  2.78382568  2.64067348  2.63456842  2.57117812  2.42556326  2.25608288 
#>         V17         V59         V54         V51         V26         V46 
#>  2.20232790  2.04867034  1.89768167  1.76066318  1.71865031  1.69456006 
#>         V13         V10         V15         V48         V45         V28 
#>  1.69121358  1.68651193  1.67417535  1.62989035  1.60893194  1.48865759 
#>         V52         V43          V6         V38         V22         V21 
#>  1.48719026  1.46921980  1.18806508  1.14148780  1.12970057  1.11443621 
#>         V24         V44         V41         V39         V18         V29 
#>  1.06417438  1.01538794  1.00718961  0.96707343  0.93358056  0.88155963 
#>         V60         V47         V50          V1         V58         V14 
#>  0.87109631  0.85653284  0.83197596  0.77995533  0.77945387  0.76296876 
#>         V32          V7         V30         V20         V40         V53 
#>  0.68179931  0.66527427  0.63163293  0.61220142  0.55714482  0.52245120 
#>         V19         V33         V57          V3          V8         V25 
#>  0.50598989  0.48155724  0.44536719  0.41424528  0.36108806  0.34374462 
#>         V35          V2         V42         V55         V56         V34 
#>  0.31058020  0.27247967  0.20936065  0.15864558  0.13610017  0.09858018 

# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)

# Score the predictions
predictions$score()
#> classif.ce 
#>  0.1884058