Common Issues
This vignette lists some common issues that one should pay attention to when implementing a Learner.
Ordering Features
When implementing the private $.predict() method, it is a good idea to retrieve the columns in the same order as they were during $.train(). While this does not matter in most cases, we have encountered examples where it was necessary to ensure that the learner worked as expected. The ordered_features helper function exists for this purpose and is also used in the learner template.
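Why column order can matter is easiest to see with a model that consumes features by position. The following is a minimal base R sketch of the idea only; it is not the implementation of ordered_features, and the toy model, its weights, and the function predict_positional are hypothetical names introduced for illustration.

```r
# Toy training data; the column order here defines the "training order".
train_data = data.frame(age = c(21, 35), income = c(1000, 2500), height = c(170, 182))

# Hypothetical "model" that depends on column positions, not column names.
fit = list(feature_names = names(train_data), weights = c(1, 0.001, 0.01))

predict_positional = function(fit, newdata) {
  # Multiplies the columns by the weights in positional order.
  as.numeric(as.matrix(newdata) %*% fit$weights)
}

# New data arrives with the same columns in a different order.
newdata = data.frame(height = 175, age = 30, income = 2000)

# Restore the training order before predicting.
newdata_ordered = newdata[, fit$feature_names, drop = FALSE]

pred_unordered = predict_positional(fit, newdata)          # weights hit the wrong columns
pred_ordered = predict_positional(fit, newdata_ordered)    # weights hit the intended columns
```

In an actual learner, the reordering step would be delegated to the helper instead of being written by hand.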
Accessing Internals From $state
Sometimes, one needs additional information that is not stored in the machine learning model itself. In such cases, one might be tempted to access data from the $state of a Learner beyond $state$model. However, $state is a rather internal data structure that might change in the future, and one should not rely on it. Furthermore, the $state that is created when a learner is passed to resample() is not the same as the one created by a manual $train() call. If you absolutely need additional information, you can make the private $.train() method return a list() with the actual model and additional metadata so that both are available during $.predict(). In such cases, it is a good idea to first consult with the maintainer of mlr3extralearners on whether this is really necessary.
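The pattern of returning the model together with metadata can be sketched in base R as follows. The functions train_impl and predict_impl are hypothetical stand-ins for the private $.train() and $.predict() methods, and target_range is an invented piece of metadata; a real learner would define these as R6 methods and read the stored list from its model field.

```r
# Hypothetical stand-in for the private $.train() method.
train_impl = function(data, target) {
  features = setdiff(names(data), target)
  fit = lm(reformulate(features, response = target), data = data)
  # Return the fitted model together with the metadata needed at predict time.
  list(model = fit, target_range = range(data[[target]]))
}

# Hypothetical stand-in for the private $.predict() method.
# `stored` plays the role of whatever $.train() returned.
predict_impl = function(stored, newdata) {
  pred = predict(stored$model, newdata = newdata)
  # Use the stored metadata: clip predictions to the target range seen in training.
  pmin(pmax(pred, stored$target_range[1]), stored$target_range[2])
}

stored = train_impl(mtcars, target = "mpg")
preds = predict_impl(stored, mtcars[1:5, ])
```

Because everything needed for prediction travels inside the returned list, nothing has to be read from other parts of $state.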
Default vs. Initial Parameters
When creating a parameter (via p_dbl() and friends), there are two similar arguments that can be specified, namely init and default. On the one hand, the argument default should describe the default value that the upstream package uses when no specific value is set. E.g., if one were to connect the linear model to mlr3, the default for the parameter singular.ok can be accessed via formals(lm)$singular.ok. Note that these default values do not set any parameter values in the $param_set of the Learner. On the other hand, the init argument describes the value to which the parameter should be initialized (so that it is then also accessible via learner$param_set$values$<id>). By default, a parameter is not initialized to any value, which means that the upstream default behavior is used.
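The difference between the two arguments can be seen in a small parameter set. This sketch assumes a paradox version in which the domain constructors accept an init argument; the parameter tol and its values are chosen for illustration and are not taken from an existing learner.

```r
library(paradox)

param_set = ps(
  # default only documents what the upstream function does when nothing is set.
  singular.ok = p_lgl(default = formals(lm)$singular.ok),
  # init additionally sets the value in the parameter set (illustrative values).
  tol = p_dbl(lower = 0, default = 1e-7, init = 1e-5)
)

# Only the initialized parameter shows up among the set values.
param_set$values

# The documented defaults are kept separately.
param_set$default
```

Here singular.ok stays unset, so the upstream default applies, while tol is actively passed with its initial value.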
Complex Defaults
When annotating the default value for a parameter (argument default) in the ParamSet, there are cases where the default is a complex expression such as sample.int(10000L). In such cases, it is ok to not specify any default value for the parameter, which paradox then interprets as having a complex default.
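A minimal sketch of such a parameter, assuming a hypothetical integer parameter named seed whose upstream default is an expression:

```r
library(paradox)

param_set = ps(
  # The upstream default is an expression, so the default argument is omitted.
  seed = p_int()
)

# The parameter has neither a recorded default nor a set value.
has_default = "seed" %in% names(param_set$default)
has_value = "seed" %in% names(param_set$values)
```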