3.2 Tuning Search Spaces
When running an optimization, it is important to inform the tuning algorithm about which hyperparameters are valid, i.e. their names, types, and valid ranges.
All this information is communicated with objects of the class ParamSet, which is defined in paradox.
While it is possible to create ParamSet objects using their $new constructor, it is shorter and more readable to use the ps shortcut, which will be presented here.
For an in-depth description of paradox and its classes, see the paradox chapter.
Note that ParamSet objects exist in two contexts.
First, ParamSet objects are used to define the space of valid parameter settings for a learner (and other objects).
Second, they are used to define a search space for tuning.
We are mainly interested in the latter.
As an example, consider the minsplit parameter of the classif.rpart Learner.
The ParamSet associated with the learner has a lower but no upper bound.
For tuning, however, both a lower and an upper bound must be given, because tuning search spaces need to be bounded.
For Learner or PipeOp objects, typically “unbounded” ParamSets are used.
Here, however, we will mainly focus on creating “bounded” ParamSets that can be used for tuning.
See the in-depth paradox chapter for more details on using ParamSets to define parameter ranges for use cases besides tuning.
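The bounds discussion above can be checked directly on the learner's own ParamSet. The following is a minimal sketch, assuming mlr3 is installed; $lower and $upper are named vectors of per-parameter bounds on the ParamSet.

```r
library("mlr3")

learner = lrn("classif.rpart")
# minsplit has a finite lower bound but an infinite upper bound,
# so the learner's own ParamSet is not directly usable as a tuning
# search space: an upper bound for minsplit has to be supplied first
learner$param_set$lower[["minsplit"]]  # 1
learner$param_set$upper[["minsplit"]]  # Inf
```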
3.2.1 Creating ParamSets
An empty ParamSet – not yet very useful – can be constructed using just the ps call:
library("paradox")
search_space = ps()
print(search_space)
## <ParamSet>
## Empty.
ps takes named Domain arguments that are turned into parameters. A possible search space for the "classif.svm" learner could for example be:
search_space = ps(
  cost = p_dbl(lower = 0.1, upper = 10),
  kernel = p_fct(levels = c("polynomial", "radial"))
)
print(search_space)
## <ParamSet>
## id class lower upper levels default value
## 1: cost ParamDbl 0.1 10 <NoDefault[3]>
## 2: kernel ParamFct NA NA polynomial,radial <NoDefault[3]>
There are five domain constructors that each produce a parameter when given to ps:
Constructor | Description | Is bounded? | Underlying Class |
---|---|---|---|
p_dbl | Real valued parameter (“double”) | When upper and lower are given | ParamDbl |
p_int | Integer parameter | When upper and lower are given | ParamInt |
p_fct | Discrete valued parameter (“factor”) | Always | ParamFct |
p_lgl | Logical / Boolean parameter | Always | ParamLgl |
p_uty | Untyped parameter | Never | ParamUty |
These domain constructors each take some of the following arguments:

- lower, upper: lower and upper bound of numerical parameters (p_dbl and p_int). These need to be given to get bounded parameter spaces valid for tuning.
- levels: Allowed categorical values for p_fct parameters. Required argument for p_fct. See below for more details on this parameter.
- trafo: transformation function, see below.
- depends: dependencies, see below.
- tags: Further information about a parameter, used for example by the hyperband tuner.
- default: Value corresponding to default behavior when the parameter is not given. Not used for tuning search spaces.
- special_vals: Valid values besides the normally accepted values for a parameter. Not used for tuning search spaces.
- custom_check: Function that checks whether a value given to p_uty is valid. Not used for tuning search spaces.
The lower, upper, and levels arguments are always at the first (or, for upper, the second) position of the respective constructors, so it is preferred to omit their names when defining a ParamSet, for improved conciseness:
search_space = ps(cost = p_dbl(0.1, 10), kernel = p_fct(c("polynomial", "radial")))
3.2.2 Transformations (trafo)
We can use the paradox function generate_design_grid to look at the values that would be evaluated by grid search.
(We are using rbindlist() here because the result of $transpose() is a list that is harder to read. If we did not use $transpose(), on the other hand, the transformations that we investigate here would not be applied.)
library("data.table")
rbindlist(generate_design_grid(search_space, 3)$transpose())
## cost kernel
## 1: 0.10 polynomial
## 2: 0.10 radial
## 3: 5.05 polynomial
## 4: 5.05 radial
## 5: 10.00 polynomial
## 6: 10.00 radial
We notice that the cost parameter is taken on a linear scale.
We assume, however, that the difference in cost between 0.1 and 1 should have a similar effect as the difference between 1 and 10.
Therefore it makes more sense to tune it on a logarithmic scale.
This is done by using a transformation (trafo): a function that is applied to a parameter value after it has been sampled by the tuner.
We can tune cost on a logarithmic scale by sampling on the linear scale [-1, 1] and computing 10^x from that value.
search_space = ps(
  cost = p_dbl(-1, 1, trafo = function(x) 10^x),
  kernel = p_fct(c("polynomial", "radial"))
)
rbindlist(generate_design_grid(search_space, 3)$transpose())
## cost kernel
## 1: 0.1 polynomial
## 2: 0.1 radial
## 3: 1.0 polynomial
## 4: 1.0 radial
## 5: 10.0 polynomial
## 6: 10.0 radial
It is even possible to attach a transformation to the ParamSet as a whole, which is executed after the individual parameters' transformations have been performed.
It is given through the .extra_trafo argument and should be a function with arguments x and param_set that takes the list of parameter values in x and returns a modified list.
This transformation can access all parameter values of an evaluation and modify them with interactions.
It is even possible to add or remove parameters.
(The following is a bit of a silly example.)
search_space = ps(
  cost = p_dbl(-1, 1, trafo = function(x) 10^x),
  kernel = p_fct(c("polynomial", "radial")),
  .extra_trafo = function(x, param_set) {
    if (x$kernel == "polynomial") {
      x$cost = x$cost * 2
    }
    x
  }
)
rbindlist(generate_design_grid(search_space, 3)$transpose())
## cost kernel
## 1: 0.2 polynomial
## 2: 0.1 radial
## 3: 2.0 polynomial
## 4: 1.0 radial
## 5: 20.0 polynomial
## 6: 10.0 radial
The available types of search space parameters are limited: continuous, integer, discrete, and logical scalars.
There are many machine learning algorithms, however, that take parameters of other types, for example vectors or functions.
These cannot be defined in a search space ParamSet, and they are often given as ParamUty in the Learner's ParamSet.
When tuning over such hyperparameters, it is necessary to perform a transformation that changes the type of the parameter.
An example is the class.weights parameter of the SVM, which takes a named vector of class weights with one entry for each target class.
A trafo that tunes class.weights for the tsk("spam") dataset could be:
search_space = ps(
  class.weights = p_dbl(0.1, 0.9, trafo = function(x) c(spam = x, nonspam = 1 - x))
)
generate_design_grid(search_space, 3)$transpose()
## [[1]]
## [[1]]$class.weights
## spam nonspam
## 0.1 0.9
##
##
## [[2]]
## [[2]]$class.weights
## spam nonspam
## 0.5 0.5
##
##
## [[3]]
## [[3]]$class.weights
## spam nonspam
## 0.9 0.1
(We are omitting rbindlist() in this example because it would break the vector-valued return elements.)
3.2.3 Automatic Factor Level Transformation
A common use case is the necessity to specify a list of values that should all be tried (or sampled from).
It may be that a hyperparameter accepts function objects as values and a certain list of functions should be tried.
Or it may be that a choice of special numeric values should be tried.
For this, the p_fct constructor's levels argument may be something other than a character vector.
If, for example, only the values 0.1, 3, and 10 should be tried for the cost parameter, even when doing random search, then the following search space achieves that:
search_space = ps(
  cost = p_fct(c(0.1, 3, 10)),
  kernel = p_fct(c("polynomial", "radial"))
)
rbindlist(generate_design_grid(search_space, 3)$transpose())
## cost kernel
## 1: 0.1 polynomial
## 2: 0.1 radial
## 3: 3.0 polynomial
## 4: 3.0 radial
## 5: 10.0 polynomial
## 6: 10.0 radial
This is equivalent to the following:
search_space = ps(
  cost = p_fct(c("0.1", "3", "10"),
    trafo = function(x) list(`0.1` = 0.1, `3` = 3, `10` = 10)[[x]]),
  kernel = p_fct(c("polynomial", "radial"))
)
rbindlist(generate_design_grid(search_space, 3)$transpose())
## cost kernel
## 1: 0.1 polynomial
## 2: 0.1 radial
## 3: 3.0 polynomial
## 4: 3.0 radial
## 5: 10.0 polynomial
## 6: 10.0 radial
This may seem silly, but it makes sense when considering that factorial tuning parameters are always character values:
search_space = ps(
  cost = p_fct(c(0.1, 3, 10)),
  kernel = p_fct(c("polynomial", "radial"))
)
typeof(search_space$params$cost$levels)
## [1] "character"
Be aware, however, that this results in an “unordered” hyperparameter.
Tuning algorithms that make use of the ordering information of parameters, like genetic algorithms or model based optimization, will perform worse when this is done.
For these algorithms, it may make more sense to define a p_dbl or p_int with a more fitting trafo.
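One way to sketch such an ordered alternative, assuming paradox is installed: instead of a factor over the three cost values, sample an integer index and map it to the (sorted) candidate values in a trafo. The cost_values vector and its name are our own illustration, not from the original text; the trafo relies on cost_values being visible in its enclosing environment.

```r
library("paradox")

# candidate values for cost, in increasing order
cost_values = c(0.1, 3, 10)

search_space = ps(
  # the tuner sees an ordered integer parameter 1..3;
  # the trafo maps each index to the actual cost value
  cost = p_int(1, 3, trafo = function(x) cost_values[x]),
  kernel = p_fct(c("polynomial", "radial"))
)

rbindlist(generate_design_grid(search_space, 3)$transpose())
```

The evaluated configurations are the same as with the p_fct version, but algorithms that exploit parameter ordering can now treat neighboring indices as neighboring cost values.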
The class.weights case from above can also be implemented like this, if there are only a few candidate class.weights vectors that should be tried.
Note that the levels argument of p_fct must be named if there is no easy way for as.character() to create names:
search_space = ps(
  class.weights = p_fct(
    list(
      candidate_a = c(spam = 0.5, nonspam = 0.5),
      candidate_b = c(spam = 0.3, nonspam = 0.7)
    )
  )
)
generate_design_grid(search_space)$transpose()
## [[1]]
## [[1]]$class.weights
## spam nonspam
## 0.5 0.5
##
##
## [[2]]
## [[2]]$class.weights
## spam nonspam
## 0.3 0.7
3.2.4 Parameter Dependencies (depends)
Some parameters are only relevant when another parameter has a certain value, or one of several values.
The SVM, for example, has the degree parameter that is only valid when kernel is "polynomial".
This can be specified using the depends argument.
It is an expression that must involve other parameters and be of the form <param> == <scalar>, <param> %in% <vector>, or multiple of these conditions chained by &&.
To tune the degree parameter, one would need to do the following:
search_space = ps(
  cost = p_dbl(-1, 1, trafo = function(x) 10^x),
  kernel = p_fct(c("polynomial", "radial")),
  degree = p_int(1, 3, depends = kernel == "polynomial")
)
rbindlist(generate_design_grid(search_space, 3)$transpose(), fill = TRUE)
## cost kernel degree
## 1: 0.1 polynomial 1
## 2: 0.1 polynomial 2
## 3: 0.1 polynomial 3
## 4: 0.1 radial NA
## 5: 1.0 polynomial 1
## 6: 1.0 polynomial 2
## 7: 1.0 polynomial 3
## 8: 1.0 radial NA
## 9: 10.0 polynomial 1
## 10: 10.0 polynomial 2
## 11: 10.0 polynomial 3
## 12: 10.0 radial NA
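The other dependency forms can be sketched similarly, assuming paradox is installed. The parameter names and bounds below are illustrative only, not taken from a real learner:

```r
library("paradox")

search_space = ps(
  kernel = p_fct(c("polynomial", "radial", "sigmoid")),
  shrinking = p_lgl(),
  # %in% form: gamma is only relevant for several of the kernels
  gamma = p_dbl(0.01, 1, depends = kernel %in% c("radial", "sigmoid")),
  # chained form: only relevant when both conditions hold
  cache_mb = p_int(16, 1024,
    depends = kernel == "radial" && shrinking == TRUE)
)

# the registered dependencies can be inspected via $deps
search_space$deps
```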
3.2.5 Creating Tuning ParamSets from other ParamSets
Having to define a tuning ParamSet for a Learner that already has parameter set information may seem unnecessarily tedious, and there is indeed a way to create tuning ParamSets from a Learner's ParamSet, making use of as much information as is already available.
This is done by setting values of a Learner's ParamSet to so-called TuneTokens, constructed with a to_tune call.
This is done in the same way that other hyperparameters are set to specific values, and can be understood as tagging the hyperparameters for later tuning.
The resulting ParamSet used for tuning can be retrieved using the $search_space() method.
library("mlr3learners")
learner = lrn("classif.svm")
learner$param_set$values$kernel = "polynomial" # for example
learner$param_set$values$degree = to_tune(lower = 1, upper = 3)
print(learner$param_set$search_space())
## <ParamSet>
## id class lower upper levels default value
## 1: degree ParamInt 1 3 <NoDefault[3]>
rbindlist(generate_design_grid(learner$param_set$search_space(), 3)$transpose())
## degree
## 1: 1
## 2: 2
## 3: 3
It is possible to omit lower here, because it can be inferred from the lower bound of the degree parameter itself.
For parameters whose ranges are already bounded, it is possible to not give any bounds at all.
An example is the logical shrinking hyperparameter:
learner$param_set$values$shrinking = to_tune()
print(learner$param_set$search_space())
## <ParamSet>
## id class lower upper levels default value
## 1: degree ParamInt 1 3 <NoDefault[3]>
## 2: shrinking ParamLgl NA NA TRUE,FALSE TRUE
rbindlist(generate_design_grid(learner$param_set$search_space(), 3)$transpose())
## degree shrinking
## 1: 1 TRUE
## 2: 1 FALSE
## 3: 2 TRUE
## 4: 2 FALSE
## 5: 3 TRUE
## 6: 3 FALSE
to_tune can also be constructed with a Domain object, i.e. something constructed with a p_*** call.
This way it is possible to tune continuous parameters over discrete values, or to give trafos or dependencies.
One could, for example, tune the cost as above over three given special values, and introduce a dependency of shrinking on it.
Note that to_tune(<levels>) is a short form of to_tune(p_fct(<levels>)).
(When introducing the dependency, we need to use the cost value from before the implicit trafo, which is the name, or as.character(), of the respective value, here "val2"!)
learner$param_set$values$type = "C-classification" # needs to be set because of a bug in paradox
learner$param_set$values$cost = to_tune(c(val1 = 0.3, val2 = 0.7))
learner$param_set$values$shrinking = to_tune(p_lgl(depends = cost == "val2"))
print(learner$param_set$search_space())
## <ParamSet>
## id class lower upper levels default parents value
## 1: cost ParamFct NA NA val1,val2 <NoDefault[3]>
## 2: degree ParamInt 1 3 <NoDefault[3]>
## 3: shrinking ParamLgl NA NA TRUE,FALSE <NoDefault[3]> cost
## Trafo is set.
rbindlist(generate_design_grid(learner$param_set$search_space(), 3)$transpose(), fill = TRUE)
## degree cost shrinking
## 1: 1 0.3 NA
## 2: 1 0.7 TRUE
## 3: 1 0.7 FALSE
## 4: 2 0.3 NA
## 5: 2 0.7 TRUE
## 6: 2 0.7 FALSE
## 7: 3 0.3 NA
## 8: 3 0.7 TRUE
## 9: 3 0.7 FALSE
The $search_space() method picks up dependencies from the underlying ParamSet automatically.
So if kernel is tuned, then degree automatically gets its dependency on it, without us having to specify that.
(Here we reset cost and shrinking to NULL for the sake of clarity of the generated output.)
learner$param_set$values$cost = NULL
learner$param_set$values$shrinking = NULL
learner$param_set$values$kernel = to_tune(c("polynomial", "radial"))
print(learner$param_set$search_space())
## <ParamSet>
## id class lower upper levels default parents value
## 1: degree ParamInt 1 3 <NoDefault[3]> kernel
## 2: kernel ParamFct NA NA polynomial,radial <NoDefault[3]>
rbindlist(generate_design_grid(learner$param_set$search_space(), 3)$transpose(), fill = TRUE)
## kernel degree
## 1: polynomial 1
## 2: polynomial 2
## 3: polynomial 3
## 4: radial NA
It is even possible to define a whole ParamSet that gets tuned over for a single parameter.
This may be especially useful for vector hyperparameters that should be searched along multiple dimensions.
This ParamSet must, however, have an .extra_trafo that returns a list with a single element, because it corresponds to a single hyperparameter that is being tuned.
Suppose the class.weights hyperparameter should be tuned along two dimensions:
learner$param_set$values$class.weights = to_tune(
  ps(spam = p_dbl(0.1, 0.9), nonspam = p_dbl(0.1, 0.9),
    .extra_trafo = function(x, param_set) list(c(spam = x$spam, nonspam = x$nonspam))
  ))
head(generate_design_grid(learner$param_set$search_space(), 3)$transpose(), 3)
## [[1]]
## [[1]]$kernel
## [1] "polynomial"
##
## [[1]]$degree
## [1] 1
##
## [[1]]$class.weights
## spam nonspam
## 0.1 0.1
##
##
## [[2]]
## [[2]]$kernel
## [1] "polynomial"
##
## [[2]]$degree
## [1] 1
##
## [[2]]$class.weights
## spam nonspam
## 0.1 0.5
##
##
## [[3]]
## [[3]]$kernel
## [1] "polynomial"
##
## [[3]]$degree
## [1] 1
##
## [[3]]$class.weights
## spam nonspam
## 0.1 0.9