This vignette lists some common issues that one should pay attention to when implementing a Learner.

Ordering Features

When implementing the private $.predict() method, it is a good idea to retrieve the columns in the same order as they were during $.train(). While this does not matter in most cases, we have encountered examples where it was necessary to ensure that the learner worked as expected. For this purpose, the helper function ordered_features() exists, which is also used in the learner template.
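To illustrate why the column order can matter, the following is a minimal base R sketch, not the implementation of ordered_features(). The functions train_sketch() and predict_sketch() are hypothetical stand-ins for the private $.train() and $.predict() methods. The model is fitted on a plain matrix, so its coefficients are tied to column positions and the prediction step has to restore the training order.

```r
# Hypothetical stand-in for a private $.train(): fits least squares on a
# plain matrix, so the coefficients are tied to column positions only.
train_sketch = function(data, target) {
  features = setdiff(names(data), target)
  x = cbind(1, as.matrix(data[, features, drop = FALSE]))
  list(
    beta = qr.solve(x, data[[target]]),
    feature_order = features  # remembered for prediction
  )
}

# Hypothetical stand-in for a private $.predict(): restores the training
# column order before the positional matrix product.
predict_sketch = function(model, newdata) {
  x = cbind(1, as.matrix(newdata[, model$feature_order, drop = FALSE]))
  drop(x %*% model$beta)
}

model = train_sketch(mtcars[, c("mpg", "wt", "hp")], target = "mpg")

# Same features, once in training order and once shuffled.
pred_original = predict_sketch(model, mtcars[, c("wt", "hp")])
pred_shuffled = predict_sketch(model, mtcars[, c("hp", "wt")])
```

Both calls give the same predictions because predict_sketch() selects the columns by the stored names. Without that reordering step, the shuffled data would silently be multiplied with the wrong coefficients.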

Accessing Internals From $state

Sometimes, one needs additional information that is not stored in the machine learning model itself. In such cases, one might be tempted to access data from the $state of a Learner beyond $state$model. However, $state is a rather internal data structure that might change in the future, and one should not rely on it. Furthermore, the $state that is created when a learner is resampled is not the same as the one created by a manual $train(). If you absolutely need additional information, you can make the private $.train() method return a list() with the actual model and additional metadata, so that both are available during $.predict(). In such cases, it is a good idea to first consult with the maintainer of mlr3extralearners to check whether this is really necessary.
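A minimal sketch of this pattern could look as follows, assuming that the mlr3 and R6 packages are installed. The class LearnerRegrSketch, its id regr.sketch and the target_range metadata are hypothetical and exist only for illustration. The private $.train() method returns a list() with the fitted model and the metadata, and $.predict() reads both from self$model instead of reaching into other parts of $state.

```r
library(mlr3)

# Hypothetical learner for illustration only; not part of mlr3extralearners.
LearnerRegrSketch = R6::R6Class("LearnerRegrSketch",
  inherit = LearnerRegr,
  public = list(
    initialize = function() {
      super$initialize(
        id = "regr.sketch",
        feature_types = c("integer", "numeric"),
        predict_types = "response"
      )
    }
  ),
  private = list(
    .train = function(task) {
      form = stats::reformulate(task$feature_names, response = task$target_names)
      # Return the fitted model together with the metadata needed later.
      list(
        fit = stats::lm(form, data = task$data()),
        target_range = range(task$truth())
      )
    },
    .predict = function(task) {
      # self$model holds exactly what $.train() returned.
      stored = self$model
      newdata = task$data(cols = task$feature_names)
      response = predict(stored$fit, newdata = newdata)
      # Use the metadata: clamp predictions to the target range seen in training.
      lower = stored$target_range[1]
      upper = stored$target_range[2]
      list(response = unname(pmin(pmax(response, lower), upper)))
    }
  )
)

task = tsk("mtcars")
learner = LearnerRegrSketch$new()
learner$train(task)
prediction = learner$predict(task)
```

Because the metadata is part of the returned model, it is available in the same way after a manual $train() and during resampling.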

Default vs. Initial Parameters

When creating a parameter (via p_dbl() and friends), there are two similar arguments that can be specified, namely init and default. On the one hand, the argument default should describe the default value that the upstream package uses when no specific value is set. E.g., if one were to connect the linear model to mlr3, the default for the parameter singular.ok can be accessed via formals(lm)$singular.ok. Note that these default values do not set any specific parameter values in the $param_set of the Learner. On the other hand, the argument init describes the value to which the parameter should be initialized (so that it is then also accessible via learner$param_set$values$<id>). By default, a parameter is not initialized to any value, which means that the default behavior of the upstream function is used.
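The difference can be sketched with a small ParamSet, assuming a version of paradox that supports the init argument. The parameter singular.ok mirrors the lm() example from the text, while the parameter threads is hypothetical and only serves to show an initialized value.

```r
library(paradox)

param_set = ps(
  # default only documents the upstream behavior; no value is set.
  singular.ok = p_lgl(default = TRUE, tags = "train"),
  # init sets the value at construction; threads is a hypothetical parameter.
  threads = p_int(lower = 1L, init = 1L, tags = "train")
)

# The annotated default matches the one of the upstream function.
upstream_default = formals(lm)$singular.ok

defaults = param_set$default  # contains singular.ok, but not threads
values = param_set$values     # contains threads, but not singular.ok
```

Only threads ends up in the values of the parameter set. For singular.ok, nothing is passed on, so lm() would fall back to its own default.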

Complex Defaults

When annotating the default value for a parameter (argument default) in the ParamSet, there are cases where defaults are complex expressions such as sample.int(10000L). In such cases, it is fine not to specify any default value for the parameter, which paradox then interprets as having a complex default.