Gradient Boosted Decision Trees Classification Learner

Gradient boosting algorithm that also supports categorical data. Calls catboost::catboost.train() from package 'catboost'.

Dictionary

This Learner can be instantiated via lrn():

lrn("classif.catboost")

Meta Information

Task type: “classif”
Predict Types: “response”, “prob”
Feature Types: “numeric”, “factor”, “ordered”
Required Packages: mlr3, mlr3extralearners, catboost
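
The two predict types listed above can be switched at construction time. The following is a minimal sketch, assuming that mlr3, mlr3extralearners and catboost are installed; the task ("sonar") and the value iterations = 50 are chosen only for illustration.

```r
library(mlr3)
library(mlr3extralearners)

# Request class probabilities instead of the default "response" predictions
learner = lrn("classif.catboost", iterations = 50, predict_type = "prob")

# Illustrative binary classification task shipped with mlr3
task = tsk("sonar")
ids = partition(task)

learner$train(task, row_ids = ids$train)
prediction = learner$predict(task, row_ids = ids$test)

# Matrix of predicted probabilities with one column per class
head(prediction$prob)

# Probability-based measures such as the AUC can now be computed
prediction$score(msr("classif.auc"))
```

With predict_type = "response", only the predicted class labels are available and prediction$prob is not populated.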

Parameters

Id	Type	Default	Levels	Range
loss_function_twoclass	character	Logloss	Logloss, CrossEntropy	-
loss_function_multiclass	character	MultiClass	MultiClass, MultiClassOneVsAll	-
learning_rate	numeric	0.03		$[0.001, 1]$
random_seed	integer	0		$[0, \infty)$
l2_leaf_reg	numeric	3		$[0, \infty)$
bootstrap_type	character	-	Bayesian, Bernoulli, MVS, Poisson, No	-
bagging_temperature	numeric	1		$[0, \infty)$
subsample	numeric	-		$[0, 1]$
sampling_frequency	character	PerTreeLevel	PerTree, PerTreeLevel	-
sampling_unit	character	Object	Object, Group	-
mvs_reg	numeric	-		$[0, \infty)$
random_strength	numeric	1		$[0, \infty)$
depth	integer	6		$[1, 16]$
grow_policy	character	SymmetricTree	SymmetricTree, Depthwise, Lossguide	-
min_data_in_leaf	integer	1		$[1, \infty)$
max_leaves	integer	31		$[1, \infty)$
ignored_features	untyped	NULL		-
one_hot_max_size	untyped	FALSE		-
has_time	logical	FALSE	TRUE, FALSE	-
rsm	numeric	1		$[0.001, 1]$
nan_mode	character	Min	Min, Max	-
fold_permutation_block	integer	-		$[1, 256]$
leaf_estimation_method	character	-	Newton, Gradient, Exact	-
leaf_estimation_iterations	integer	-		$[1, \infty)$
leaf_estimation_backtracking	character	AnyImprovement	No, AnyImprovement, Armijo	-
fold_len_multiplier	numeric	2		$[1.001, \infty)$
approx_on_full_history	logical	TRUE	TRUE, FALSE	-
class_weights	untyped	-		-
auto_class_weights	character	None	None, Balanced, SqrtBalanced	-
boosting_type	character	-	Ordered, Plain	-
boost_from_average	logical	-	TRUE, FALSE	-
langevin	logical	FALSE	TRUE, FALSE	-
diffusion_temperature	numeric	10000		$[0, \infty)$
score_function	character	Cosine	Cosine, L2, NewtonCosine, NewtonL2	-
monotone_constraints	untyped	-		-
feature_weights	untyped	-		-
first_feature_use_penalties	untyped	-		-
penalties_coefficient	numeric	1		$[0, \infty)$
per_object_feature_penalties	untyped	-		-
model_shrink_rate	numeric	-		$(-\infty, \infty)$
model_shrink_mode	character	-	Constant, Decreasing	-
target_border	numeric	-		$(-\infty, \infty)$
border_count	integer	-		$[1, 65535]$
feature_border_type	character	GreedyLogSum	Median, Uniform, UniformAndQuantiles, MaxLogSum, MinEntropy, GreedyLogSum	-
per_float_feature_quantization	untyped	-		-
classes_count	integer	-		$[1, \infty)$
thread_count	integer	1		$[-1, \infty)$
task_type	character	CPU	CPU, GPU	-
devices	untyped	-		-
logging_level	character	Silent	Silent, Verbose, Info, Debug	-
metric_period	integer	1		$[1, \infty)$
train_dir	untyped	"catboost_info"		-
model_size_reg	numeric	0.5		$[0, 1]$
allow_writing_files	logical	FALSE	TRUE, FALSE	-
save_snapshot	logical	FALSE	TRUE, FALSE	-
snapshot_file	untyped	-		-
snapshot_interval	integer	600		$[1, \infty)$
simple_ctr	untyped	-		-
combinations_ctr	untyped	-		-
ctr_target_border_count	integer	-		$[1, 255]$
counter_calc_method	character	Full	SkipTest, Full	-
max_ctr_complexity	integer	-		$[1, \infty)$
ctr_leaf_count_limit	integer	-		$[1, \infty)$
store_all_simple_ctr	logical	FALSE	TRUE, FALSE	-
final_ctr_computation_mode	character	Default	Default, Skip	-
verbose	logical	FALSE	TRUE, FALSE	-
ntree_start	integer	0		$[0, \infty)$
ntree_end	integer	0		$[0, \infty)$
early_stopping_rounds	integer	-		$[1, \infty)$
eval_metric	untyped	-		-
use_best_model	logical	-	TRUE, FALSE	-
iterations	integer	1000		$[1, \infty)$

Installation

See https://catboost.ai/en/docs/concepts/r-installation.

Initial parameter values

logging_level:
- Actual default: "Verbose"
- Adjusted default: "Silent"
- Reason for change: consistent with other mlr3 learners
thread_count:
- Actual default: -1
- Adjusted default: 1
- Reason for change: consistent with other mlr3 learners
allow_writing_files:
- Actual default: TRUE
- Adjusted default: FALSE
- Reason for change: consistent with other mlr3 learners
save_snapshot:
- Actual default: TRUE
- Adjusted default: FALSE
- Reason for change: consistent with other mlr3 learners
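
The adjusted defaults listed above are ordinary parameter values and can be inspected or overridden by the user. A short sketch, assuming a paradox version that provides set_values(); the value 4L for thread_count is only an example.

```r
library(mlr3)
library(mlr3extralearners)

learner = lrn("classif.catboost")

# Inspect the values the learner sets on construction
learner$param_set$values[c("logging_level", "thread_count")]

# Override the adjusted defaults, e.g. to use more threads and see training logs
learner$param_set$set_values(thread_count = 4L, logging_level = "Verbose")
learner$param_set$values[c("logging_level", "thread_count")]
```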

Early stopping

Early stopping can be used to find the optimal number of boosting rounds. Set early_stopping_rounds to an integer value to monitor the performance of the model on the validation set while training. For information on how to configure the validation set, see the Validation section of mlr3::Learner.
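
The setup described above might look as follows. This is a sketch under stated assumptions: the validation ratio of 0.3, the patience of 10 rounds and the "sonar" task are illustrative choices, not recommendations.

```r
library(mlr3)
library(mlr3extralearners)

# Upper bound on boosting rounds; training stops early if the validation
# score has not improved for 10 consecutive rounds
learner = lrn("classif.catboost",
  iterations = 500,
  early_stopping_rounds = 10)

# Hold out 30% of the training data as the internal validation set
learner$validate = 0.3

task = tsk("sonar")
learner$train(task)

# Number of iterations selected by early stopping
learner$internal_tuned_values

# Validation scores at the last recorded iteration
learner$internal_valid_scores
```

Other accepted values for validate, such as "test" or "predefined", are described in the Validation section of mlr3::Learner.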

References

Dorogush, Veronika A, Ershov, Vasily, Gulin, Andrey (2018). “CatBoost: gradient boosting with categorical features support.” arXiv preprint arXiv:1810.11363.

Author

sumny

Super classes

mlr3::Learner -> mlr3::LearnerClassif -> LearnerClassifCatboost

Active bindings

internal_valid_scores: The last observation of the validation scores for all metrics. Extracted from model$evaluation_log.
internal_tuned_values: Returns the number of iterations selected by early stopping if early_stopping_rounds was set during training.
validate: How to construct the internal validation data. This parameter can be either NULL, a ratio, "test", or "predefined".

Methods

Method `new()`

Create a LearnerClassifCatboost object.

Usage

LearnerClassifCatboost$new()

Method `importance()`

The importance scores are calculated using catboost.get_feature_importance with type = "FeatureImportance" and are returned for all features.

Usage

LearnerClassifCatboost$importance()

Returns

Named numeric().

Method `clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClassifCatboost$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples

# Define the Learner
learner = mlr3::lrn("classif.catboost",
  iterations = 100)

print(learner)
#> 
#> ── <LearnerClassifCatboost> (classif.catboost): Gradient Boosting ──────────────
#> • Model: -
#> • Parameters: loss_function_twoclass=Logloss,
#> loss_function_multiclass=MultiClass, thread_count=1, logging_level=Silent,
#> allow_writing_files=FALSE, save_snapshot=FALSE, iterations=100
#> • Validate: NULL
#> • Packages: mlr3, mlr3extralearners, and catboost
#> • Predict Types: [response] and prob
#> • Feature Types: numeric, factor, and ordered
#> • Encapsulation: none (fallback: -)
#> • Properties: importance, internal_tuning, missings, multiclass, twoclass,
#> validation, and weights
#> • Other settings: use_weights = 'use'

# Define a Task
task = mlr3::tsk("sonar")

# Create train and test set
ids = mlr3::partition(task)

# Train the learner on the training ids
learner$train(task, row_ids = ids$train)

print(learner$model)
#> CatBoost model (100 trees)
#> Loss function: Logloss
#> Fit to 60 feature(s)
print(learner$importance())
#>        V11        V12        V36        V21        V27        V37         V9 
#> 11.5607056  6.7811584  4.4546517  4.2292058  3.4351623  3.3053435  2.7695125 
#>        V45        V54        V43        V52        V18        V10        V20 
#>  2.6690557  2.4210825  2.4144560  2.1401042  2.0756787  2.0028152  1.8518178 
#>        V17        V46        V59        V31        V49        V39        V13 
#>  1.8344482  1.8232116  1.7453405  1.6969428  1.5849752  1.5775737  1.5487924 
#>        V23        V55         V6        V48        V44        V26        V47 
#>  1.4654665  1.4614116  1.4490613  1.4130349  1.4003827  1.3959912  1.3492427 
#>         V8         V4        V34        V35        V32        V16        V28 
#>  1.3212656  1.2919139  1.2641775  1.2620243  1.2098881  1.1770722  1.1226135 
#>        V19         V7        V53        V33        V22        V14        V15 
#>  1.1068656  1.1060228  1.1023659  1.0898664  1.0798700  1.0442254  0.9052875 
#>        V29        V40        V50        V24         V1        V38        V42 
#>  0.8225845  0.7745600  0.7551145  0.7420124  0.7267056  0.7023792  0.6605845 
#>        V56         V3         V5        V60        V51         V2        V25 
#>  0.5997941  0.5968487  0.5758441  0.5457311  0.4948560  0.4809281  0.4443273 
#>        V57        V58        V30        V41 
#>  0.3504080  0.3163114  0.2904782  0.1804482 

# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)

# Score the predictions
predictions$score()
#> classif.ce 
#>  0.1884058
