8.1 Hyperparameter Tuning

Hyperparameter tuning is supported via the extension package mlr3tuning. At the heart of mlr3tuning are the R6 class mlr3tuning::PerformanceEvaluator and the Tuner* classes. They store the settings, perform the tuning and save the results.

8.1.1 The Performance Evaluator class

The mlr3tuning::PerformanceEvaluator class requires the following inputs from the user:

  • Task
  • Learner
  • Resampling
  • Parameter Set

It is similar to resample and benchmark, with the additional requirement of a “Parameter Set” (paradox::ParamSet) specifying the hyperparameters of the given learner that should be optimized.

An exemplary definition could look as follows:

task = mlr3::mlr_tasks$get("iris")
learner = mlr3::mlr_learners$get("classif.rpart")
resampling = mlr3::mlr_resamplings$get("holdout")
measures = mlr3::mlr_measures$mget("classif.ce")
task$measures = measures
param_set = paradox::ParamSet$new(params = list(
  paradox::ParamDbl$new("cp", lower = 0.001, upper = 0.1),
  paradox::ParamInt$new("minsplit", lower = 1, upper = 10)))

pe = PerformanceEvaluator$new(
  task = task,
  learner = learner,
  resampling = resampling,
  param_set = param_set
)

Evaluation of Single Parameter Settings

Using the method .$eval(), the mlr3tuning::PerformanceEvaluator is able to tune a specific set of hyperparameters on the given inputs. The parameters have to be handed over wrapped in a data.table:

pe$eval(data.table::data.table(cp = 0.05, minsplit = 5))

The results are stored in a BenchmarkResult class within the pe object. Note that this is the “bare bone” concept of using hyperparameters during Resampling. Usually you want to optimize the parameters in an automated fashion.
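You can call .$eval() repeatedly with different candidate settings; a sketch assuming the API shown above, where each call records its resampling result in the BenchmarkResult stored in pe$bmr:

```r
# Evaluate two further candidate settings on the same task/learner/resampling
pe$eval(data.table::data.table(cp = 0.01, minsplit = 3))
pe$eval(data.table::data.table(cp = 0.10, minsplit = 7))

# Inspect the accumulated BenchmarkResult
print(pe$bmr)
```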

8.1.2 Tuning Hyperparameter Spaces

Most often you do not want to evaluate fixed hyperparameter settings one by one, but rather optimize the outcome over different hyperparameter choices in an automated way.

To achieve this, we need a definition of the search space that should be optimized. Let's again use the space we defined above.

paradox::ParamSet$new(params = list(
  paradox::ParamDbl$new("cp", lower = 0.001, upper = 0.1),
  paradox::ParamInt$new("minsplit", lower = 1, upper = 10)))

To start the tuning, we still need to select how the optimization should take place; in other words, we need to choose the optimization algorithm.

The following algorithms are currently implemented in mlr3tuning:

  • Grid Search (mlr3tuning::TunerGridSearch)
  • Random Search (mlr3tuning::TunerRandomSearch) (Bergstra and Bengio 2012)

In this example we will use a simple “Grid Search”. Since we have only numeric parameters and have specified lower and upper bounds for the search space, mlr3tuning::TunerGridSearch will create an equidistant grid. By default it uses ten steps per parameter; this can be changed via the resolution argument. In this example we use 15 steps and create a new mlr3tuning::TunerGridSearch object from the mlr3tuning::PerformanceEvaluator pe and the resolution.

tuner_gs = TunerGridSearch$new(pe, resolution = 15)
## Error in assert_r6(terminator, "Terminator"): argument "terminator" is missing, with no default

Oh! The error message tells us that we need to specify an additional argument called terminator.

8.1.3 Defining the Terminator

What is a “Terminator”? The mlr3tuning::Terminator defines when the tuning should be stopped. Common criteria include:

  • Terminate after a given number of evaluations (mlr3tuning::TerminatorEvaluations)
  • Terminate after a given runtime (mlr3tuning::TerminatorRuntime)
  • Combine multiple terminators (mlr3tuning::TerminatorMultiplexer)

Often a single termination criterion is not sufficient. For example, you will not know beforehand whether all of your evaluations will finish within a given amount of time; this highly depends on the Learner and the paradox::ParamSet given. Still, you might not want to exceed a certain tuning time for each learner. In this case, it makes sense to combine both criteria using mlr3tuning::TerminatorMultiplexer. Tuning will then stop as soon as one Terminator signals to be finished.

In the following example we create two terminators and then combine them into one:

tr = TerminatorRuntime$new(max_time = 5, units = "secs")
te = TerminatorEvaluations$new(max_evaluations = 50)

tm = TerminatorMultiplexer$new(list(te, tr))
tm
## TerminatorEvaluations with 50 remaining evaluations
## TerminatorRuntime with 5.000000 remaining secs.

8.1.4 Executing the Tuning

Now that we have all required inputs (paradox::ParamSet, mlr3tuning::Terminator and the optimization algorithm), we can perform the hyperparameter tuning.

The first step is to create the respective “Tuner” class, here mlr3tuning::TunerGridSearch.

tuner_gs = TunerGridSearch$new(pe = pe, terminator = tm,
  resolution = 15)
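To get a feeling for what resolution = 15 means, the resulting grid can be previewed with paradox::generate_design_grid; a sketch (note that the integer parameter minsplit has only 10 distinct values, so its effective resolution is capped there):

```r
# Rebuild the parameter set from above
ps = paradox::ParamSet$new(params = list(
  paradox::ParamDbl$new("cp", lower = 0.001, upper = 0.1),
  paradox::ParamInt$new("minsplit", lower = 1, upper = 10)))

# Cross product of up to 15 equidistant values per parameter
grid = paradox::generate_design_grid(ps, resolution = 15)

# The candidate configurations as a data.table
head(grid$data)
nrow(grid$data)
```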

After it has been initialized, we can call its member function .$tune() to run the tuning.

tuner_gs$tune()

.$tune() simply performs a benchmark on the parameter values generated by the tuner and writes the results into a BenchmarkResult object which is stored in field .$bmr of the mlr3tuning::PerformanceEvaluator object that we passed to it.

8.1.5 Inspecting Results

During the .$tune() call not only the BenchmarkResult output was written to the .$bmr slot of the mlr3tuning::PerformanceEvaluator but also the mlr3tuning::Terminator got updated.

We can take a look by directly printing the mlr3tuning::Terminator object:

print(tm)
## TerminatorEvaluations with 0 remaining evaluations
## TerminatorRuntime with 0.490250 remaining secs.

We can easily see that all evaluations were executed before the time limit kicked in.

Now let’s take a closer look at the actual tuning result. It can be queried using .$tune_result() from the respective mlr3tuning::Tuner class that generated it. Internally, the function extracts the data from the BenchmarkResult that was generated during tuning and stored in .$pe$bmr.

tuner_gs$tune_result()
## $performance
## classif.ce 
##          0 
## 
## $values
## $values$xval
## [1] 0
## 
## $values$cp
## [1] 0.008071
## 
## $values$minsplit
## [1] 9

It returns the scored performance and the values of the optimized hyperparameters. Note that each measure “knows” if it was minimized or maximized during tuning:

task$measures$classif.ce$minimize
## [1] TRUE

A summary of the BenchmarkResult created by the tuning can be queried using the .$aggregated() function of the Tuner class.

tuner_gs$aggregated()
##                  hash  resample_result          task               learner
##   1: 103e96aa81af459f <ResampleResult> <TaskClassif> <LearnerClassifRpart>
##   2: 4367043f57ea7b96 <ResampleResult> <TaskClassif> <LearnerClassifRpart>
##   3: 321a930e3b2a6c43 <ResampleResult> <TaskClassif> <LearnerClassifRpart>
##   4: 2ce6e2416aa8c1d6 <ResampleResult> <TaskClassif> <LearnerClassifRpart>
##   5: d5e96963c6aa2e18 <ResampleResult> <TaskClassif> <LearnerClassifRpart>
##  ---                                                                      
## 147: 6341551eb4ab8c33 <ResampleResult> <TaskClassif> <LearnerClassifRpart>
## 148: 0bb96c127cc557a4 <ResampleResult> <TaskClassif> <LearnerClassifRpart>
## 149: dbe910fe1bac4e07 <ResampleResult> <TaskClassif> <LearnerClassifRpart>
## 150: d5ae6f9d2aa90d7d <ResampleResult> <TaskClassif> <LearnerClassifRpart>
## 151: 0aa93a9354c4b3f4 <ResampleResult> <TaskClassif> <LearnerClassifRpart>
##      task_id       learner_id resampling_id classif.ce xval    cp minsplit
##   1:    iris   classif.rpart1       holdout       0.08    0 0.050        5
##   2:    iris   classif.rpart2       holdout       0.06    0 0.001        1
##   3:    iris   classif.rpart3       holdout       0.08    0 0.001        2
##   4:    iris   classif.rpart4       holdout       0.06    0 0.001        3
##   5:    iris   classif.rpart5       holdout       0.08    0 0.001        4
##  ---                                                                      
## 147:    iris classif.rpart147       holdout       0.06    0 0.100        6
## 148:    iris classif.rpart148       holdout       0.06    0 0.100        7
## 149:    iris classif.rpart149       holdout       0.08    0 0.100        8
## 150:    iris classif.rpart150       holdout       0.04    0 0.100        9
## 151:    iris classif.rpart151       holdout       0.12    0 0.100       10
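Since .$aggregated() returns a data.table, the result can also be inspected manually, for example by sorting by the (minimized) misclassification error:

```r
agg = tuner_gs$aggregated()

# Best configurations first; classif.ce is minimized
head(agg[order(classif.ce), list(classif.ce, cp, minsplit)], 3)
```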

Now the optimized hyperparameters can be used to create a new Learner and train it on the full dataset.

task = mlr3::mlr_tasks$get("iris")
learner = mlr3::mlr_learners$get("classif.rpart",
  param_vals = list(
    xval = tuner_gs$tune_result()$values$xval,
    cp = tuner_gs$tune_result()$values$cp,
    minsplit = tuner_gs$tune_result()$values$minsplit)
)

e = Experiment$new(task = task, learner = learner)
e$train()
## <Experiment> [trained]:
##  + Task: iris
##  + Learner: classif.rpart
##  + Model: [rpart]
##  - Predictions: [missing]
##  - Performance: [missing]

8.1.6 Automating the Tuning

The steps shown above can be executed in a more convenient way using the mlr3tuning::AutoTuner class.

This class gathers all the steps from above into a single call and uses the optimized hyperparameters from the tuning to create a new learner.

Requirements:

  • Task
  • Learner
  • Resampling
  • Measure
  • Parameter Set
  • Terminator
  • Tuning method
  • Tuning settings (optional)

task = mlr3::mlr_tasks$get("iris")
learner = mlr3::mlr_learners$get("classif.rpart")
resampling = mlr3::mlr_resamplings$get("holdout")
measures = mlr3::mlr_measures$mget("classif.ce")
task$measures = measures
param_set = paradox::ParamSet$new(
  params = list(paradox::ParamDbl$new("cp", lower = 0.001, upper = 0.1)))
terminator = TerminatorEvaluations$new(5)

at = mlr3tuning::AutoTuner$new(learner, resampling, param_set, terminator, 
  tuner = TunerGridSearch, tuner_settings = list(resolution = 10L))

at$train(task)
## <AutoTuner:autotuner>
## Parameters: cp=0.034
## Packages: rpart
## Predict Type: response
## Feature types: logical, integer, numeric, character, factor, ordered
## Properties: importance, missings, multiclass, selected_features,
##   twoclass, weights
at$learner
## <LearnerClassifRpart:classif.rpart>
## Parameters: cp=0.034
## Packages: rpart
## Predict Type: response
## Feature types: logical, integer, numeric, character, factor, ordered
## Properties: importance, missings, multiclass, selected_features,
##   twoclass, weights
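Because the mlr3tuning::AutoTuner behaves like a regular Learner, it can in principle be combined with mlr3::resample to estimate the performance of the whole tuning procedure (nested resampling). A hedged sketch, assuming a freshly constructed AutoTuner at:

```r
# Outer resampling loop; the tuning happens inside each training split
outer = mlr3::mlr_resamplings$get("cv")
rr = mlr3::resample(task, at, outer)

# Performance estimate that accounts for the tuning itself
print(rr)
```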

8.1.7 Summary

References

Bergstra, James, and Yoshua Bengio. 2012. “Random Search for Hyper-Parameter Optimization.” J. Mach. Learn. Res. 13 (February): 281–305.