3.2 Tuning Search Spaces

When running an optimization, it is important to inform the tuning algorithm about what hyperparameters are valid. Here the names, types, and valid ranges of each hyperparameter are important. All this information is communicated with objects of the class ParamSet, which is defined in paradox. While it is possible to create ParamSet-objects using its $new-constructor, it is much shorter and readable to use the ps-shortcut, which will be presented here. For an in-depth description of paradox and its classes, see the paradox chapter.

Note, that ParamSet objects exist in two contexts. First, ParamSet-objects are used to define the space of valid parameter setting for a learner (and other objects). Second, they are used to define a search space for tuning. We are mainly interested in the latter. For an example we can consider the minsplit parameter of the classif.rpart Learner. The ParamSet associated with the learner has a lower but no upper bound. However, for tuning the value, a lower and upper bound must be given because tuning search spaces need to be bounded. For Learner or PipeOp objects, typically “unbounded” ParamSets are used. Here, however, we will mainly focus on creating “bounded” ParamSets that can be used for tuning. See the in-depth paradox chapter for more details on using ParamSets to define parameter ranges for use-cases besides tuning.

3.2.1 Creating ParamSets

An empty ParamSet – not yet very useful – can be constructed using just the ps call:

library("paradox")
search_space = ps()
print(search_space)
## <ParamSet>
## Empty.

ps takes named Domain arguments that are turned into parameters. A possible search space for the "classif.svm" learner could for example be:

search_space = ps(
  cost = p_dbl(lower = 0.1, upper = 10),
  kernel = p_fct(levels = c("polynomial", "radial"))
)
print(search_space)
## <ParamSet>
##        id    class lower upper            levels        default value
## 1:   cost ParamDbl   0.1    10                   <NoDefault[3]>      
## 2: kernel ParamFct    NA    NA polynomial,radial <NoDefault[3]>

There are five domain constructors that produce a parameters when given to ps:

Constructor Description Is bounded? Underlying Class
p_dbl Real valued parameter (“double”) When upper and lower are given ParamDbl
p_int Integer parameter When upper and lower are given ParamInt
p_fct Discrete valued parameter (“factor”) Always ParamFct
p_lgl Logical / Boolean parameter Always ParamLgl
p_uty Untyped parameter Never ParamUty

These domain constructors each take some of the following arguments:

  • lower, upper: lower and upper bound of numerical parameters (p_dbl and p_int). These need to be given to get bounded parameter spaces valid for tuning.
  • levels: Allowed categorical values for p_fct parameters. Required argument for p_fct. See below for more details on this parameter.
  • trafo: transformation function, see below.
  • depends: dependencies, see below.
  • tags: Further information about a parameter, used for example by the hyperband tuner.
  • default: Value corresponding to default behavior when the parameter is not given. Not used for tuning search spaces.
  • special_vals: Valid values besides the normally accepted values for a parameter. Not used for tuning search spaces.
  • custom_check: Function that checks whether a value given to p_uty is valid. Not used for tuning search spaces.

The lower, upper, or levels parameters are always at the first (or second, for upper) position of the respective constructors, so it is preferred to omit them when defining a ParamSet, for improved conciseness:

search_space = ps(cost = p_dbl(0.1, 10), kernel = p_fct(c("polynomial", "radial")))

3.2.2 Transformations (trafo)

We can use the paradox function generate_design_grid to look at the values that would be evaluated by grid search. (We are using rbindlist() here because the result of $transpose() is a list that is harder to read. If we didn’t use $transpose(), on the other hand, the transformations that we investigate here are not applied.)

library("data.table")
rbindlist(generate_design_grid(search_space, 3)$transpose())
##     cost     kernel
## 1:  0.10 polynomial
## 2:  0.10     radial
## 3:  5.05 polynomial
## 4:  5.05     radial
## 5: 10.00 polynomial
## 6: 10.00     radial

We notice that the cost parameter is taken on a linear scale. We assume, however, that the difference of cost between 0.1 and 1 should have a similar effect as the difference between 1 and 10. Therefore it makes more sense to tune it on a logarithmic scale. This is done by using a transformation (trafo). This is a function that is applied to a parameter after it has been sampled by the tuner. We can tune cost on a logarithmic scale by sampling on the linear scale [-1, 1] and computing 10^x from that value.

search_space = ps(
  cost = p_dbl(-1, 1, trafo = function(x) 10^x),
  kernel = p_fct(c("polynomial", "radial"))
)
rbindlist(generate_design_grid(search_space, 3)$transpose())
##    cost     kernel
## 1:  0.1 polynomial
## 2:  0.1     radial
## 3:  1.0 polynomial
## 4:  1.0     radial
## 5: 10.0 polynomial
## 6: 10.0     radial

It is even possible to attach another transformation to the ParamSet as a whole that gets executed after individual parameter’s transformations were performed. It is given through the .extra_trafo argument and should be a function with parameters x and param_set that takes a list of parameter values in x and returns a modified list. This transformation can access all parameter values of an evaluation and modify them with interactions. It is even possible to add or remove parameters. (The following is a bit of a silly example.)

search_space = ps(
  cost = p_dbl(-1, 1, trafo = function(x) 10^x),
  kernel = p_fct(c("polynomial", "radial")),
  .extra_trafo = function(x, param_set) {
    if (x$kernel == "polynomial") {
      x$cost = x$cost * 2
    }
    x
  }
)
rbindlist(generate_design_grid(search_space, 3)$transpose())
##    cost     kernel
## 1:  0.2 polynomial
## 2:  0.1     radial
## 3:  2.0 polynomial
## 4:  1.0     radial
## 5: 20.0 polynomial
## 6: 10.0     radial

The available types of search space parameters are limited: continuous, integer, discrete, and logical scalars. There are many machine learning algorithms, however, that take parameters of other types, for example vectors or functions. These can not be defined in a search space ParamSet, and they are often given as ParamUty in the Learner’s ParamSet. When trying to tune over these hyperparameters, it is necessary to perform a Transformation that changes the type of a parameter.

An example is the class.weights parameter of the SVM, which takes a named vector of class weights with one entry for each target class. The trafo that would tune class.weights for the tsk("spam") dataset could be:

search_space = ps(
  class.weights = p_dbl(0.1, 0.9, trafo = function(x) c(spam = x, nonspam = 1 - x))
)
generate_design_grid(search_space, 3)$transpose()
## [[1]]
## [[1]]$class.weights
##    spam nonspam 
##     0.1     0.9 
## 
## 
## [[2]]
## [[2]]$class.weights
##    spam nonspam 
##     0.5     0.5 
## 
## 
## [[3]]
## [[3]]$class.weights
##    spam nonspam 
##     0.9     0.1

(We are omitting rbindlist() in this example because it breaks the vector valued return elements.)

3.2.3 Automatic Factor Level Transformation

A common use-case is the necessity to specify a list of values that should all be tried (or sampled from). It may be the case that a hyperparameter accepts function objects as values and a certain list of functions should be tried. Or it may be that a choice of special numeric values should be tried. For this, the p_fct constructor’s level argument may be a value that is not a character vector, but something else. If, for example, only the values 0.1, 3, and 10 should be tried for the cost parameter, even when doing random search, then the following search space would achieve that:

search_space = ps(
  cost = p_fct(c(0.1, 3, 10)),
  kernel = p_fct(c("polynomial", "radial"))
)
rbindlist(generate_design_grid(search_space, 3)$transpose())
##    cost     kernel
## 1:  0.1 polynomial
## 2:  0.1     radial
## 3:  3.0 polynomial
## 4:  3.0     radial
## 5: 10.0 polynomial
## 6: 10.0     radial

This is equivalent to the following:

search_space = ps(
  cost = p_fct(c("0.1", "3", "10"),
    trafo = function(x) list(`0.1` = 0.1, `3` = 3, `10` = 10)[[x]]),
  kernel = p_fct(c("polynomial", "radial"))
)
rbindlist(generate_design_grid(search_space, 3)$transpose())
##    cost     kernel
## 1:  0.1 polynomial
## 2:  0.1     radial
## 3:  3.0 polynomial
## 4:  3.0     radial
## 5: 10.0 polynomial
## 6: 10.0     radial

This may seem silly, but makes sense when considering that factorial tuning parameters are always character values:

search_space = ps(
  cost = p_fct(c(0.1, 3, 10)),
  kernel = p_fct(c("polynomial", "radial"))
)
typeof(search_space$params$cost$levels)
## [1] "character"

Be aware that this results in an “unordered” hyperparameter, however. Tuning algorithms that make use of ordering information of parameters, like genetic algorithms or model based optimization, will perform worse when this is done. For these algorithms, it may make more sense to define a p_dbl or p_int with a more fitting trafo.

The class.weights case from above can also be implemented like this, if there are only a few candidates of class.weights vectors that should be tried. Note that the levels argument of p_fct must be named if there is no easy way for as.character() to create names:

search_space = ps(
  class.weights = p_fct(
    list(
      candidate_a = c(spam = 0.5, nonspam = 0.5),
      candidate_b = c(spam = 0.3, nonspam = 0.7)
    )
  )
)
generate_design_grid(search_space)$transpose()
## [[1]]
## [[1]]$class.weights
##    spam nonspam 
##     0.5     0.5 
## 
## 
## [[2]]
## [[2]]$class.weights
##    spam nonspam 
##     0.3     0.7

3.2.4 Parameter Dependencies (depends)

Some parameters are only relevant when another parameter has a certain value, or one of several values. The SVM, for example, has the degree parameter that is only valid when kernel is "polynomial". This can be specified using the depends argument. It is an expression that must involve other parameters and be of the form <param> == <scalar>, <param> %in% <vector>, or multiple of these chained by &&. To tune the degree parameter, one would need to do the following:

search_space = ps(
  cost = p_dbl(-1, 1, trafo = function(x) 10^x),
  kernel = p_fct(c("polynomial", "radial")),
  degree = p_int(1, 3, depends = kernel == "polynomial")
)
rbindlist(generate_design_grid(search_space, 3)$transpose(), fill = TRUE)
##     cost     kernel degree
##  1:  0.1 polynomial      1
##  2:  0.1 polynomial      2
##  3:  0.1 polynomial      3
##  4:  0.1     radial     NA
##  5:  1.0 polynomial      1
##  6:  1.0 polynomial      2
##  7:  1.0 polynomial      3
##  8:  1.0     radial     NA
##  9: 10.0 polynomial      1
## 10: 10.0 polynomial      2
## 11: 10.0 polynomial      3
## 12: 10.0     radial     NA

3.2.5 Creating Tuning ParamSets from other ParamSets

Having to define a tuning ParamSet for a Learner that already has parameter set information may seem unnecessarily tedious, and there is indeed a way to create tuning ParamSets from a Learner’s ParamSet, making use of as much information as already available.

This is done by setting values of a Learner’s ParamSet to so-called TuneTokens, constructed with a to_tune call. This can be done in the same way that other hyperparameters are set to specific values. It can be understood as the hyperparameters being tagged for later tuning. The resulting ParamSet used for tuning can be retrieved using the $search_space() method.

library("mlr3learners")
learner = lrn("classif.svm")
learner$param_set$values$kernel = "polynomial"  # for example
learner$param_set$values$degree = to_tune(lower = 1, upper = 3)

print(learner$param_set$search_space())
## <ParamSet>
##        id    class lower upper levels        default value
## 1: degree ParamInt     1     3        <NoDefault[3]>
rbindlist(generate_design_grid(learner$param_set$search_space(), 3)$transpose())
##    degree
## 1:      1
## 2:      2
## 3:      3

It is possible to omit lower here, because it can be inferred from the lower bound of the degree parameter itself. For other parameters, that are already bounded, it is possible to not give any bounds at all, because their ranges are already bounded. An example is the logical shrinking hyperparameter:

learner$param_set$values$shrinking = to_tune()

print(learner$param_set$search_space())
## <ParamSet>
##           id    class lower upper      levels        default value
## 1:    degree ParamInt     1     3             <NoDefault[3]>      
## 2: shrinking ParamLgl    NA    NA  TRUE,FALSE           TRUE
rbindlist(generate_design_grid(learner$param_set$search_space(), 3)$transpose())
##    degree shrinking
## 1:      1      TRUE
## 2:      1     FALSE
## 3:      2      TRUE
## 4:      2     FALSE
## 5:      3      TRUE
## 6:      3     FALSE

to_tune can also be constructed with a Domain object, i.e. something constructed with a p_*** call. This way it is possible to tune continuous parameters with discrete values, or to give trafos or dependencies. One could, for example, tune the cost as above on three given special values, and introduce a dependency of shrinking on it. Notice that a short form for to_tune(<levels>) is a short form of to_tune(p_fct(<levels>)). (When introducing the dependency, we need to use the degree value from before the implicit trafo, which is the name or as.character() of the respective value, here "val2"!)

learner$param_set$values$type = "C-classification"  # needs to be set because of a bug in paradox
learner$param_set$values$cost = to_tune(c(val1 = 0.3, val2 = 0.7))
learner$param_set$values$shrinking = to_tune(p_lgl(depends = cost == "val2"))

print(learner$param_set$search_space())
## <ParamSet>
##           id    class lower upper      levels        default parents value
## 1:      cost ParamFct    NA    NA   val1,val2 <NoDefault[3]>              
## 2:    degree ParamInt     1     3             <NoDefault[3]>              
## 3: shrinking ParamLgl    NA    NA  TRUE,FALSE <NoDefault[3]>    cost      
## Trafo is set.
rbindlist(generate_design_grid(learner$param_set$search_space(), 3)$transpose(), fill = TRUE)
##    degree cost shrinking
## 1:      1  0.3        NA
## 2:      1  0.7      TRUE
## 3:      1  0.7     FALSE
## 4:      2  0.3        NA
## 5:      2  0.7      TRUE
## 6:      2  0.7     FALSE
## 7:      3  0.3        NA
## 8:      3  0.7      TRUE
## 9:      3  0.7     FALSE

The search_space() picks up dependencies fromt the underlying ParamSet automatically. So if the kernel is tuned, then degree automatically gets the dependency on it, without us having to specify that. (Here we reset cost and shrinking to NULL for the sake of clarity of the generated output.)

learner$param_set$values$cost = NULL
learner$param_set$values$shrinking = NULL
learner$param_set$values$kernel = to_tune(c("polynomial", "radial"))

print(learner$param_set$search_space())
## <ParamSet>
##        id    class lower upper            levels        default parents value
## 1: degree ParamInt     1     3                   <NoDefault[3]>  kernel      
## 2: kernel ParamFct    NA    NA polynomial,radial <NoDefault[3]>
rbindlist(generate_design_grid(learner$param_set$search_space(), 3)$transpose(), fill = TRUE)
##        kernel degree
## 1: polynomial      1
## 2: polynomial      2
## 3: polynomial      3
## 4:     radial     NA

It is even possible to define whole ParamSets that get tuned over for a single parameter. This may be especially useful for vector hyperparameters that should be searched along multiple dimensions. This ParamSet must, however, have an .extra_trafo that returns a list with a single element, because it corresponds to a single hyperparameter that is being tuned. Suppose the class.weights hyperparameter should be tuned along two dimensions:

learner$param_set$values$class.weights = to_tune(
  ps(spam = p_dbl(0.1, 0.9), nonspam = p_dbl(0.1, 0.9),
    .extra_trafo = function(x, param_set) list(c(spam = x$spam, nonspam = x$nonspam))
  ))
head(generate_design_grid(learner$param_set$search_space(), 3)$transpose(), 3)
## [[1]]
## [[1]]$kernel
## [1] "polynomial"
## 
## [[1]]$degree
## [1] 1
## 
## [[1]]$class.weights
##    spam nonspam 
##     0.1     0.1 
## 
## 
## [[2]]
## [[2]]$kernel
## [1] "polynomial"
## 
## [[2]]$degree
## [1] 1
## 
## [[2]]$class.weights
##    spam nonspam 
##     0.1     0.5 
## 
## 
## [[3]]
## [[3]]$kernel
## [1] "polynomial"
## 
## [[3]]$degree
## [1] 1
## 
## [[3]]$class.weights
##    spam nonspam 
##     0.1     0.9