8.5 Nested Resampling

8.5.1 Introduction

In order to obtain unbiased performance estimates for a learner, all parts of the model building (preprocessing and model selection steps) should be included in the resampling, i.e., repeated for every pair of training/test data. For steps that themselves require resampling, like hyperparameter tuning or feature selection (via the wrapper approach), this results in two nested resampling loops.

The graphic above illustrates nested resampling for parameter tuning with 3-fold cross-validation in the outer and 4-fold cross-validation in the inner loop.

In the outer resampling loop, we have three pairs of training/test sets. Parameter tuning is performed on each of these outer training sets, thereby executing the inner resampling loop. This yields one set of selected hyperparameters per outer training set. The learner is then fitted on each outer training set using the corresponding selected hyperparameters, and its performance is evaluated on the corresponding outer test set.
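
To make the two nested loops concrete, below is a minimal base R sketch of nested resampling on the iris data, tuning the cp parameter of a classification tree with rpart (3 outer folds, 4 inner folds, a small grid over cp). It only illustrates the structure that mlr3 automates for you and is not mlr3 code.

library(rpart)

# misclassification error of a fitted tree on a data set
misclass = function(model, data) {
  mean(predict(model, newdata = data, type = "class") != data$Species)
}

set.seed(1)
folds_outer = sample(rep(1:3, length.out = nrow(iris)))  # 3 outer folds
cp_grid = c(0.001, 0.01, 0.1)                            # small search space
outer_scores = numeric(3)

for (i in 1:3) {
  train = iris[folds_outer != i, ]
  test  = iris[folds_outer == i, ]

  # inner loop: 4-fold cross-validation on the outer training set to select cp
  folds_inner = sample(rep(1:4, length.out = nrow(train)))
  inner_error = sapply(cp_grid, function(cp) {
    mean(sapply(1:4, function(j) {
      fit = rpart(Species ~ ., data = train[folds_inner != j, ], cp = cp)
      misclass(fit, train[folds_inner == j, ])
    }))
  })
  best_cp = cp_grid[which.min(inner_error)]

  # refit on the complete outer training set with the selected cp and
  # evaluate once on the untouched outer test set
  fit = rpart(Species ~ ., data = train, cp = best_cp)
  outer_scores[i] = misclass(fit, test)
}

mean(outer_scores)  # performance estimate of the tuned learner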

In mlr3, you can get nested resampling for free without programming any looping by using the mlr3tuning::AutoTuner class. This works as follows:

  1. Generate a wrapped Learner via class mlr3tuning::AutoTuner or mlr3featsel::AutoSelect (not yet implemented).
  2. Specify all required settings - see section “Automating the Tuning” for help.
  3. Call function resample() or benchmark() with the created Learner.

You can freely combine different inner and outer resampling strategies.

A common setup is prediction and performance evaluation on a fixed outer test set. This can be achieved by passing the holdout resampling strategy (mlr_resamplings$get("holdout")) as the outer resampling instance to either resample() or benchmark().

The inner resampling strategy could be cross-validation (mlr_resamplings$get("cv")), as the sizes of the outer training sets might differ. By default, the inner resampling is instantiated once for every outer training set.
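
A minimal sketch of this combination is shown below; the two objects only indicate which strategy goes where (the inner resampling is handed to the AutoTuner, the outer one to resample() or benchmark(), as in the next subsection).

resampling_inner = mlr3::mlr_resamplings$get("cv")      # passed to the AutoTuner
resampling_outer = mlr3::mlr_resamplings$get("holdout") # passed to resample() or benchmark()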

Nested resampling is computationally expensive. For this reason, the examples shown below use relatively small search spaces and a low number of resampling iterations; in practice, you normally have to increase both. As this is computationally intensive, you might want to have a look at the section on parallelization.
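
mlr3 supports parallelizing resampling iterations via the future framework. A minimal sketch (assuming the future package is installed) is to register a parallel plan before calling resample() or benchmark():

future::plan("multisession")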

8.5.2 Execution

To optimize hyperparameters or conduct feature selection in a nested resampling, you need to create learners using either the mlr3tuning::AutoTuner or the mlr3featsel::AutoSelect (not yet implemented) class.

We use the example from section “Automating the Tuning” and pipe the resulting learner into a resample() call.

task = mlr3::mlr_tasks$get("iris")
learner = mlr3::mlr_learners$get("classif.rpart")
# inner resampling: executed on each outer training set during tuning
resampling = mlr3::mlr_resamplings$get("holdout")
measures = mlr3::mlr_measures$mget("classif.ce")
task$measures = measures
# search space: the complexity parameter cp of the classification tree
param_set = paradox::ParamSet$new(
  params = list(paradox::ParamDbl$new("cp", lower = 0.001, upper = 0.1)))
# stop the inner tuning after 5 evaluations
terminator = mlr3tuning::TerminatorEvaluations$new(5)

at = mlr3tuning::AutoTuner$new(learner, resampling, param_set, terminator,
  tuner = mlr3tuning::TunerGridSearch, tuner_settings = list(resolution = 10L))

Now construct the resample() call:

# outer resampling: 3-fold cross-validation
resampling_outer = mlr3::mlr_resamplings$get("cv3")

rr = resample(task = task, learner = at, resampling = resampling_outer, 
  measures = task$measures)

8.5.3 Evaluation

With the created ResampleResult, we can now inspect the executed experiments more closely. See also the section on Resampling for more detailed information about ResampleResult objects.

For example, we can query the aggregated performance result:

rr$aggregated
## classif.ce 
##       0.06

Check for any errors in the folds during execution:

rr$errors
## [1] FALSE FALSE FALSE

Or take a look at the predictions:

rr$prediction
## <PredictionClassif> for 150 observations:
##      row_id     truth  response
##   1:     16    setosa    setosa
##   2:     22    setosa    setosa
##   3:     24    setosa    setosa
##  ---                           
## 148:    138 virginica virginica
## 149:    139 virginica virginica
## 150:    143 virginica virginica