# 3 Model Optimization

Model Tuning

Machine learning algorithms have default values set for their hyperparameters. Nevertheless, these hyperparameters often need to be changed by the user to achieve optimal performance on the given dataset. Manual selection of hyperparameter values is not recommended, as this approach rarely leads to the best performance. Instead, the choice of hyperparameter values should be backed by data-driven optimization (= tuning). In order to tune a machine learning algorithm, one has to specify (1) the search space, (2) the optimization algorithm (aka tuning method), (3) an evaluation method, i.e., a resampling strategy, and (4) a performance measure.

In summary, the sub-chapter on tuning illustrates how to specify these four components and how to run the tuning.

This sub-chapter also requires the package mlr3tuning, an extension package which supports hyperparameter tuning.

Feature Selection

The second part of this chapter explains feature selection, also known as variable selection. Feature selection is the process of finding a subset of relevant features of the data. Some reasons to perform feature selection are to:

• enhance the interpretability of the model,
• speed up model fitting or
• improve the learner performance by reducing noise in the data.

In this book we focus mainly on the last aspect. Different approaches exist to identify the relevant features. In the sub-chapter on feature selection, we emphasize three methods:

• Filter algorithms select features independently of the learner according to a score.
• Variable importance filters select features that are important according to a learner.
• Wrapper methods iteratively select features to optimize a performance measure.

Note that filters do not require a learner. Variable importance filters require a learner that can calculate feature importance values once it is trained. The obtained importance values can be used to subset the data, which can then be used to train a learner. Wrapper methods can be used with any learner but need to train the learner multiple times.
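As a rough illustration of the filter idea, the following base R sketch scores each feature independently of any learner and keeps the top-scoring ones. Everything in it is an assumption made for illustration: the built-in mtcars data, the absolute Pearson correlation as score, and the cutoff of three features. It is not one of the filters provided by the mlr3 ecosystem.

```r
# Toy filter: rank features by a learner-independent score.
# Assumptions (not from the text): mtcars data, target "mpg",
# absolute Pearson correlation as score, keep the top 3 features.
data = mtcars
target = "mpg"
features = setdiff(names(data), target)

# One score per feature, computed without training any model
scores = vapply(
  features,
  function(f) abs(cor(data[[f]], data[[target]])),
  numeric(1)
)

# Sort descending and keep the best-scoring features
ranked = sort(scores, decreasing = TRUE)
selected = names(ranked)[1:3]

# The reduced data can then be handed to any learner
data_subset = data[, c(selected, target)]
print(ranked)
print(selected)
```

A wrapper method would instead wrap the model fit itself in a loop and compare the resulting performance for different feature subsets, which is why it is more expensive.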

Nested Resampling

In order to get a good estimate of generalization performance and avoid data leakage, both an outer (performance) and an inner (tuning/feature selection) resampling process are necessary.

This sub-chapter will provide instructions on how to implement nested resampling, accounting for both inner and outer resampling in mlr3.

## 3.1 Hyperparameter Tuning

Hyperparameters are second-order parameters of machine learning models that, while often not explicitly optimized during the model estimation process, can have an important impact on the outcome and predictive performance of a model. Typically, hyperparameters are fixed before training a model. However, because the output of a model can be sensitive to the specification of hyperparameters, it is often recommended to make an informed decision about which hyperparameter settings may yield better model performance. In many cases, hyperparameter settings may be chosen a priori, but it can be advantageous to try different settings before fitting your model on the training data. This process is often called model ‘tuning’.

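Hyperparameter tuning is supported via the mlr3tuning extension package.

At the heart of mlr3tuning are the R6 classes covered in the next two sub-sections: the TuningInstance* classes, which describe the tuning problem and store its results, and the Tuner class, which implements the tuning algorithm.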

### 3.1.1 The TuningInstance* Classes

The following sub-section examines the optimization of a simple classification tree on the Pima Indian Diabetes data set.

library("mlr3verse")
task = tsk("pima")
print(task)
## <TaskClassif:pima> (768 x 9)
## * Target: diabetes
## * Properties: twoclass
## * Features (8):
##   - dbl (8): age, glucose, insulin, mass, pedigree, pregnant, pressure,
##     triceps

We use the classification tree from rpart and choose a subset of the hyperparameters we want to tune. This is often referred to as the “tuning space”.

learner = lrn("classif.rpart")
learner$param_set
## <ParamSet>
##                 id    class lower upper nlevels        default value
##  1:             cp ParamDbl     0     1     Inf           0.01
##  2:     keep_model ParamLgl    NA    NA       2          FALSE
##  3:     maxcompete ParamInt     0   Inf     Inf              4
##  4:       maxdepth ParamInt     1    30      30             30
##  5:   maxsurrogate ParamInt     0   Inf     Inf              5
##  6:      minbucket ParamInt     1   Inf     Inf <NoDefault[3]>
##  7:       minsplit ParamInt     1   Inf     Inf             20
##  8: surrogatestyle ParamInt     0     1       2              0
##  9:   usesurrogate ParamInt     0     2       3              2
## 10:           xval ParamInt     0   Inf     Inf             10     0

Here, we opt to tune two parameters:

• The complexity cp
• The termination criterion minsplit

The tuning space needs to be bounded, therefore one has to set lower and upper bounds:

search_space = ps(
  cp = p_dbl(lower = 0.001, upper = 0.1),
  minsplit = p_int(lower = 1, upper = 10)
)
search_space
## <ParamSet>
##          id    class lower upper nlevels        default value
## 1:       cp ParamDbl 0.001   0.1     Inf <NoDefault[3]>
## 2: minsplit ParamInt 1.000  10.0      10 <NoDefault[3]>

Next, we need to specify how to evaluate the performance. For this, we need to choose a resampling strategy and a performance measure.

hout = rsmp("holdout")
measure = msr("classif.ce")

Finally, one has to select the budget available to solve this tuning instance. This is done by selecting one of the available Terminators. For this short introduction, we specify a budget of 20 evaluations and then put everything together into a TuningInstanceSingleCrit:

library("mlr3tuning")
## Loading required package: paradox
evals20 = trm("evals", n_evals = 20)

instance = TuningInstanceSingleCrit$new(
  task = task,
  learner = learner,
  resampling = hout,
  measure = measure,
  search_space = search_space,
  terminator = evals20
)
instance
## <TuningInstanceSingleCrit>
## * State:  Not optimized
## * Objective: <ObjectiveTuning:classif.rpart_on_pima>
## * Search Space:
## <ParamSet>
##          id    class lower upper nlevels        default value
## 1:       cp ParamDbl 0.001   0.1     Inf <NoDefault[3]>
## 2: minsplit ParamInt 1.000  10.0      10 <NoDefault[3]>
## * Terminator: <TerminatorEvals>
## * Terminated: FALSE
## * Archive:
## <ArchiveTuning>
## Null data.table (0 rows and 0 cols)
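A fixed number of evaluations is only one way to express the budget. The sketch below constructs a few alternative terminators from the bbotk package, which mlr3tuning builds on. The keys and arguments shown reflect our understanding of bbotk and the concrete values are arbitrary, so it is worth checking the keys listed by your installed version before relying on them.

```r
library("bbotk")

# Stop after 20 evaluated configurations (the budget used in this section)
evals20 = trm("evals", n_evals = 20)

# Stop after roughly 60 seconds of optimization (value chosen arbitrarily)
run_time_60 = trm("run_time", secs = 60)

# Stop when the best score has not improved over the last 5 evaluations
stagnation = trm("stagnation", iters = 5, threshold = 0)

# Combine terminators: stop as soon as any one of them fires
combined = trm("combo",
  terminators = list(evals20, run_time_60, stagnation),
  any = TRUE
)

# List the keys of all terminators available in the installed version
mlr_terminators$keys()
```

Any of these objects could be passed as the terminator argument of the tuning instance in place of evals20.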

To start the tuning, we still need to select how the optimization should take place. In other words, we need to choose the optimization algorithm via the Tuner class.

### 3.1.2 The Tuner Class

mlr3tuning implements several tuning algorithms, among them grid search and random search, both of which are used in this chapter.

In this example, we will use a simple grid search with a grid resolution of 5.

tuner = tnr("grid_search", resolution = 5)

Since we have only numeric parameters, TunerGridSearch will create an equidistant grid between the respective upper and lower bounds. As we have two hyperparameters with a resolution of 5, the two-dimensional grid consists of $$5^2 = 25$$ configurations. Each configuration serves as a hyperparameter setting for the previously defined Learner which is then fitted on the task using the provided Resampling. All configurations will be examined by the tuner (in a random order), until either all configurations are evaluated or the Terminator signals that the budget is exhausted.
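To make the size of the grid concrete, the following base R sketch rebuilds it by hand. It is only an approximation of what the tuner does internally: the cp values follow from an equidistant sequence, whereas the integer values for minsplit are simply copied from the tuning log of this example, because the rounding rule used for integer parameters is not derived here.

```r
# Equidistant grid for the real-valued parameter cp
cp_grid = seq(from = 0.001, to = 0.1, length.out = 5)

# Integer grid for minsplit. These five values are copied from the tuning
# log of this example; how the tuner rounds to integers is not derived here.
minsplit_grid = c(1L, 3L, 5L, 8L, 10L)

# Cartesian product of both axes: every cp value with every minsplit value
grid = expand.grid(cp = cp_grid, minsplit = minsplit_grid)

nrow(grid)  # 5 * 5 configurations
print(cp_grid)
```

With a budget of 20 evaluations, only a subset of these rows can be evaluated before the Terminator stops the search.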

### 3.1.3 Triggering the Tuning

To start the tuning, we simply pass the TuningInstanceSingleCrit to the $optimize() method of the initialized Tuner. The tuner proceeds as follows:

1. The Tuner proposes at least one hyperparameter configuration (the Tuner may propose multiple points to improve parallelization, which can be controlled via the setting batch_size).
2. For each configuration, the given Learner is fitted on the Task using the provided Resampling. All evaluations are stored in the archive of the TuningInstanceSingleCrit.
3. The Terminator is queried if the budget is exhausted. If the budget is not exhausted, restart with 1) until it is.
4. Determine the configuration with the best observed performance.
5. Store the best configurations as result in the instance object. The best hyperparameter settings ($result_learner_param_vals) and the corresponding measured performance ($result_y) can be accessed from the instance.

tuner$optimize(instance)
## INFO  [14:21:46.366] [bbotk] Starting to optimize 2 parameter(s) with '<OptimizerGridSearch>' and '<TerminatorEvals> [n_evals=20, k=0]'
## INFO  [14:21:46.402] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:46.579] [bbotk] Result of batch 1:
## INFO  [14:21:46.581] [bbotk]      cp minsplit classif.ce runtime_learners
## INFO  [14:21:46.581] [bbotk]  0.0505       10       0.25            0.017
## INFO  [14:21:46.581] [bbotk]                                 uhash
## INFO  [14:21:46.583] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:46.685] [bbotk] Result of batch 2:
## INFO  [14:21:46.687] [bbotk]      cp minsplit classif.ce runtime_learners
## INFO  [14:21:46.687] [bbotk]  0.0505        1       0.25            0.009
## INFO  [14:21:46.687] [bbotk]                                 uhash
## INFO  [14:21:46.689] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:46.779] [bbotk] Result of batch 3:
## INFO  [14:21:46.781] [bbotk]      cp minsplit classif.ce runtime_learners
## INFO  [14:21:46.781] [bbotk]  0.0505        3       0.25            0.009
## INFO  [14:21:46.781] [bbotk]                                 uhash
## INFO  [14:21:46.781] [bbotk]  7e16fb27-e9ee-4242-a5cb-24ab96448c54
## INFO  [14:21:46.782] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:46.878] [bbotk] Result of batch 4:
## INFO  [14:21:46.879] [bbotk]       cp minsplit classif.ce runtime_learners
## INFO  [14:21:46.879] [bbotk]  0.02575        8     0.2148            0.009
## INFO  [14:21:46.879] [bbotk]                                 uhash
## INFO  [14:21:46.879] [bbotk]  14dd5c15-2660-403c-bd45-930364d7b8e0
## INFO  [14:21:46.881] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:46.971] [bbotk] Result of batch 5:
## INFO  [14:21:46.973] [bbotk]       cp minsplit classif.ce runtime_learners
## INFO  [14:21:46.973] [bbotk]  0.07525        5       0.25             0.01
## INFO  [14:21:46.973] [bbotk]                                 uhash
## INFO  [14:21:46.973] [bbotk]  0c498a40-916e-4407-a57e-6ac55ca26140
## INFO  [14:21:46.975] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:47.068] [bbotk] Result of batch 6:
## INFO  [14:21:47.070] [bbotk]       cp minsplit classif.ce runtime_learners
## INFO  [14:21:47.070] [bbotk]  0.07525        8       0.25            0.009
## INFO  [14:21:47.070] [bbotk]                                 uhash
## INFO  [14:21:47.070] [bbotk]  76b60a6b-26f0-485e-8193-4bf34e23fb6f
## INFO  [14:21:47.078] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:47.164] [bbotk] Result of batch 7:
## INFO  [14:21:47.166] [bbotk]   cp minsplit classif.ce runtime_learners                                uhash
## INFO  [14:21:47.166] [bbotk]  0.1        5       0.25            0.009 fd978152-4f4f-4e57-98e7-b262c98332ca
## INFO  [14:21:47.167] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:47.263] [bbotk] Result of batch 8:
## INFO  [14:21:47.265] [bbotk]     cp minsplit classif.ce runtime_learners
## INFO  [14:21:47.265] [bbotk]  0.001        1     0.3008            0.011
## INFO  [14:21:47.265] [bbotk]                                 uhash
## INFO  [14:21:47.265] [bbotk]  547394f0-657f-452d-8801-6bba95e6d7bf
## INFO  [14:21:47.266] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:47.443] [bbotk] Result of batch 9:
## INFO  [14:21:47.445] [bbotk]     cp minsplit classif.ce runtime_learners
## INFO  [14:21:47.445] [bbotk]  0.001        8     0.3008            0.011
## INFO  [14:21:47.445] [bbotk]                                 uhash
## INFO  [14:21:47.445] [bbotk]  45ddb76e-9907-454d-a18b-6cc16be789fa
## INFO  [14:21:47.447] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:47.542] [bbotk] Result of batch 10:
## INFO  [14:21:47.544] [bbotk]       cp minsplit classif.ce runtime_learners
## INFO  [14:21:47.544] [bbotk]  0.07525        1       0.25             0.01
## INFO  [14:21:47.544] [bbotk]                                 uhash
## INFO  [14:21:47.544] [bbotk]  d52c48ac-10fa-4f62-b948-4328c68c5790
## INFO  [14:21:47.546] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:47.636] [bbotk] Result of batch 11:
## INFO  [14:21:47.638] [bbotk]   cp minsplit classif.ce runtime_learners                                uhash
## INFO  [14:21:47.638] [bbotk]  0.1        3       0.25            0.015 9fca5392-f1cc-4ff1-a337-79d0b543bb6c
## INFO  [14:21:47.639] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:47.727] [bbotk] Result of batch 12:
## INFO  [14:21:47.728] [bbotk]      cp minsplit classif.ce runtime_learners
## INFO  [14:21:47.728] [bbotk]  0.0505        5       0.25            0.009
## INFO  [14:21:47.728] [bbotk]                                 uhash
## INFO  [14:21:47.728] [bbotk]  2b70991a-1b2b-415b-a6bf-3b8f15f671a1
## INFO  [14:21:47.730] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:47.822] [bbotk] Result of batch 13:
## INFO  [14:21:47.824] [bbotk]     cp minsplit classif.ce runtime_learners
## INFO  [14:21:47.824] [bbotk]  0.001        5      0.293            0.011
## INFO  [14:21:47.824] [bbotk]                                 uhash
## INFO  [14:21:47.824] [bbotk]  1c10594d-82c7-4740-aa7d-26aff0525cc0
## INFO  [14:21:47.825] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:47.911] [bbotk] Result of batch 14:
## INFO  [14:21:47.913] [bbotk]       cp minsplit classif.ce runtime_learners
## INFO  [14:21:47.913] [bbotk]  0.02575        5     0.2148            0.008
## INFO  [14:21:47.913] [bbotk]                                 uhash
## INFO  [14:21:47.913] [bbotk]  dd97cfc0-1c3e-4a53-afa7-4b7499ba37fc
## INFO  [14:21:47.915] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:48.011] [bbotk] Result of batch 15:
## INFO  [14:21:48.013] [bbotk]       cp minsplit classif.ce runtime_learners
## INFO  [14:21:48.013] [bbotk]  0.02575        3     0.2148            0.009
## INFO  [14:21:48.013] [bbotk]                                 uhash
## INFO  [14:21:48.013] [bbotk]  423587ef-fdc5-4a67-80fa-01d2c856a679
## INFO  [14:21:48.014] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:48.104] [bbotk] Result of batch 16:
## INFO  [14:21:48.105] [bbotk]       cp minsplit classif.ce runtime_learners
## INFO  [14:21:48.105] [bbotk]  0.07525       10       0.25            0.009
## INFO  [14:21:48.105] [bbotk]                                 uhash
## INFO  [14:21:48.105] [bbotk]  8abdbab9-914c-4850-8a6d-888d26852a19
## INFO  [14:21:48.107] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:48.201] [bbotk] Result of batch 17:
## INFO  [14:21:48.203] [bbotk]   cp minsplit classif.ce runtime_learners                                uhash
## INFO  [14:21:48.203] [bbotk]  0.1        1       0.25            0.009 371d7ab6-69bf-4fe5-ba1f-b49468d4c8d0
## INFO  [14:21:48.205] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:48.295] [bbotk] Result of batch 18:
## INFO  [14:21:48.297] [bbotk]       cp minsplit classif.ce runtime_learners
## INFO  [14:21:48.297] [bbotk]  0.07525        3       0.25            0.009
## INFO  [14:21:48.297] [bbotk]                                 uhash
## INFO  [14:21:48.297] [bbotk]  0ff48422-8552-490b-b5f4-60c65ae3c295
## INFO  [14:21:48.299] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:48.398] [bbotk] Result of batch 19:
## INFO  [14:21:48.400] [bbotk]     cp minsplit classif.ce runtime_learners
## INFO  [14:21:48.400] [bbotk]  0.001        3     0.2969            0.012
## INFO  [14:21:48.400] [bbotk]                                 uhash
## INFO  [14:21:48.402] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:48.492] [bbotk] Result of batch 20:
## INFO  [14:21:48.494] [bbotk]       cp minsplit classif.ce runtime_learners
## INFO  [14:21:48.494] [bbotk]  0.02575       10     0.2148            0.009
## INFO  [14:21:48.494] [bbotk]                                 uhash
## INFO  [14:21:48.494] [bbotk]  1ac1273f-90b5-41ce-8672-4d673169185d
## INFO  [14:21:48.499] [bbotk] Finished optimizing after 20 evaluation(s)
## INFO  [14:21:48.500] [bbotk] Result:
## INFO  [14:21:48.502] [bbotk]       cp minsplit learner_param_vals  x_domain classif.ce
## INFO  [14:21:48.502] [bbotk]  0.02575        8          <list[3]> <list[2]>     0.2148
##         cp minsplit learner_param_vals  x_domain classif.ce
## 1: 0.02575        8          <list[3]> <list[2]>     0.2148
instance$result_learner_param_vals
## $xval
## [1] 0
##
## $cp
## [1] 0.02575
##
## $minsplit
## [1] 8
instance$result_y
## classif.ce
##     0.2148

One can investigate all resamplings which were undertaken, as they are stored in the archive of the TuningInstanceSingleCrit and can be accessed by using as.data.table():

as.data.table(instance$archive)
##          cp minsplit classif.ce x_domain_cp x_domain_minsplit runtime_learners
##  1: 0.05050       10     0.2500     0.05050                10            0.017
##  2: 0.05050        1     0.2500     0.05050                 1            0.009
##  3: 0.05050        3     0.2500     0.05050                 3            0.009
##  4: 0.02575        8     0.2148     0.02575                 8            0.009
##  5: 0.07525        5     0.2500     0.07525                 5            0.010
##  6: 0.07525        8     0.2500     0.07525                 8            0.009
##  7: 0.10000        5     0.2500     0.10000                 5            0.009
##  8: 0.00100        1     0.3008     0.00100                 1            0.011
##  9: 0.00100        8     0.3008     0.00100                 8            0.011
## 10: 0.07525        1     0.2500     0.07525                 1            0.010
## 11: 0.10000        3     0.2500     0.10000                 3            0.015
## 12: 0.05050        5     0.2500     0.05050                 5            0.009
## 13: 0.00100        5     0.2930     0.00100                 5            0.011
## 14: 0.02575        5     0.2148     0.02575                 5            0.008
## 15: 0.02575        3     0.2148     0.02575                 3            0.009
## 16: 0.07525       10     0.2500     0.07525                10            0.009
## 17: 0.10000        1     0.2500     0.10000                 1            0.009
## 18: 0.07525        3     0.2500     0.07525                 3            0.009
## 19: 0.00100        3     0.2969     0.00100                 3            0.012
## 20: 0.02575       10     0.2148     0.02575                10            0.009
##               timestamp batch_nr      resample_result
##  1: 2021-09-19 14:21:46        1 <ResampleResult[20]>
##  2: 2021-09-19 14:21:46        2 <ResampleResult[20]>
##  3: 2021-09-19 14:21:46        3 <ResampleResult[20]>
##  4: 2021-09-19 14:21:46        4 <ResampleResult[20]>
##  5: 2021-09-19 14:21:46        5 <ResampleResult[20]>
##  6: 2021-09-19 14:21:47        6 <ResampleResult[20]>
##  7: 2021-09-19 14:21:47        7 <ResampleResult[20]>
##  8: 2021-09-19 14:21:47        8 <ResampleResult[20]>
##  9: 2021-09-19 14:21:47        9 <ResampleResult[20]>
## 10: 2021-09-19 14:21:47       10 <ResampleResult[20]>
## 11: 2021-09-19 14:21:47       11 <ResampleResult[20]>
## 12: 2021-09-19 14:21:47       12 <ResampleResult[20]>
## 13: 2021-09-19 14:21:47       13 <ResampleResult[20]>
## 14: 2021-09-19 14:21:47       14 <ResampleResult[20]>
## 15: 2021-09-19 14:21:48       15 <ResampleResult[20]>
## 16: 2021-09-19 14:21:48       16 <ResampleResult[20]>
## 17: 2021-09-19 14:21:48       17 <ResampleResult[20]>
## 18: 2021-09-19 14:21:48       18 <ResampleResult[20]>
## 19: 2021-09-19 14:21:48       19 <ResampleResult[20]>
## 20: 2021-09-19 14:21:48       20 <ResampleResult[20]>

In sum, the grid search evaluated 20/25 different configurations of the grid in a random order before the Terminator stopped the tuning.
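The propose, evaluate, and terminate cycle described in the numbered steps of this sub-section can be mimicked in a few lines of base R. This is a conceptual sketch, not the implementation used by mlr3tuning: the evaluate() function is a hypothetical stand-in that returns a made-up error instead of fitting a learner, and the grid values are the ones seen in the archive.

```r
set.seed(1)

# Candidate configurations: a 5 x 5 grid, visited in random order
grid = expand.grid(
  cp = seq(0.001, 0.1, length.out = 5),
  minsplit = c(1L, 3L, 5L, 8L, 10L)
)
grid = grid[sample(nrow(grid)), ]

# Hypothetical stand-in for "fit the learner and score the resampling".
# A real tuner would train the learner here; this toy function only
# returns a made-up error that is smallest near cp = 0.03.
evaluate = function(cp, minsplit) {
  0.2 + 10 * (cp - 0.03)^2 + 0.001 * abs(minsplit - 6)
}

budget = 20             # plays the role of the Terminator
archive = data.frame()  # plays the role of the instance archive

for (i in seq_len(nrow(grid))) {
  # Step 3: ask whether the budget is exhausted before proposing more
  if (nrow(archive) >= budget) break
  # Step 1: propose one configuration (batch size 1)
  config = grid[i, ]
  # Step 2: evaluate it and store the result in the archive
  config$error = evaluate(config$cp, config$minsplit)
  archive = rbind(archive, config)
}

# Steps 4 and 5: pick the configuration with the best observed score
best = archive[which.min(archive$error), ]
print(nrow(archive))
print(best)
```

As in the real run, the loop stops after 20 of the 25 grid points, so the best configuration is only the best among those actually evaluated.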

The associated resampling iterations can be accessed in the BenchmarkResult:

instance$archive$benchmark_result
## <BenchmarkResult> of 20 rows with 20 resampling runs
##  nr task_id    learner_id resampling_id iters warnings errors
##   1    pima classif.rpart       holdout     1        0      0
##   2    pima classif.rpart       holdout     1        0      0
##   3    pima classif.rpart       holdout     1        0      0
##   4    pima classif.rpart       holdout     1        0      0
##   5    pima classif.rpart       holdout     1        0      0
##   6    pima classif.rpart       holdout     1        0      0
##   7    pima classif.rpart       holdout     1        0      0
##   8    pima classif.rpart       holdout     1        0      0
##   9    pima classif.rpart       holdout     1        0      0
##  10    pima classif.rpart       holdout     1        0      0
##  11    pima classif.rpart       holdout     1        0      0
##  12    pima classif.rpart       holdout     1        0      0
##  13    pima classif.rpart       holdout     1        0      0
##  14    pima classif.rpart       holdout     1        0      0
##  15    pima classif.rpart       holdout     1        0      0
##  16    pima classif.rpart       holdout     1        0      0
##  17    pima classif.rpart       holdout     1        0      0
##  18    pima classif.rpart       holdout     1        0      0
##  19    pima classif.rpart       holdout     1        0      0
##  20    pima classif.rpart       holdout     1        0      0

The uhash column links the resampling iterations to the evaluated configurations stored in instance$archive$data. This makes it possible, for example, to score the included ResampleResults on a different measure.

instance$archive$benchmark_result$score(msr("classif.acc"))
##                                    uhash nr              task task_id
##  1: 063add42-bb3b-4e95-bea0-5949ff3db42c  1 <TaskClassif[47]>    pima
##  2: f7a9dbb3-adc0-47f7-9f7f-bce6a868901d  2 <TaskClassif[47]>    pima
##  3: 7e16fb27-e9ee-4242-a5cb-24ab96448c54  3 <TaskClassif[47]>    pima
##  4: 14dd5c15-2660-403c-bd45-930364d7b8e0  4 <TaskClassif[47]>    pima
##  5: 0c498a40-916e-4407-a57e-6ac55ca26140  5 <TaskClassif[47]>    pima
##  6: 76b60a6b-26f0-485e-8193-4bf34e23fb6f  6 <TaskClassif[47]>    pima
##  7: fd978152-4f4f-4e57-98e7-b262c98332ca  7 <TaskClassif[47]>    pima
##  8: 547394f0-657f-452d-8801-6bba95e6d7bf  8 <TaskClassif[47]>    pima
##  9: 45ddb76e-9907-454d-a18b-6cc16be789fa  9 <TaskClassif[47]>    pima
## 10: d52c48ac-10fa-4f62-b948-4328c68c5790 10 <TaskClassif[47]>    pima
## 11: 9fca5392-f1cc-4ff1-a337-79d0b543bb6c 11 <TaskClassif[47]>    pima
## 12: 2b70991a-1b2b-415b-a6bf-3b8f15f671a1 12 <TaskClassif[47]>    pima
## 13: 1c10594d-82c7-4740-aa7d-26aff0525cc0 13 <TaskClassif[47]>    pima
## 14: dd97cfc0-1c3e-4a53-afa7-4b7499ba37fc 14 <TaskClassif[47]>    pima
## 15: 423587ef-fdc5-4a67-80fa-01d2c856a679 15 <TaskClassif[47]>    pima
## 16: 8abdbab9-914c-4850-8a6d-888d26852a19 16 <TaskClassif[47]>    pima
## 17: 371d7ab6-69bf-4fe5-ba1f-b49468d4c8d0 17 <TaskClassif[47]>    pima
## 18: 0ff48422-8552-490b-b5f4-60c65ae3c295 18 <TaskClassif[47]>    pima
## 19: eb6ead2b-d1c5-4428-bcf3-dbc91a8df1a7 19 <TaskClassif[47]>    pima
## 20: 1ac1273f-90b5-41ce-8672-4d673169185d 20 <TaskClassif[47]>    pima
##                       learner    learner_id              resampling
##  1: <LearnerClassifRpart[36]> classif.rpart <ResamplingHoldout[19]>
##  2: <LearnerClassifRpart[36]> classif.rpart <ResamplingHoldout[19]>
##  3: <LearnerClassifRpart[36]> classif.rpart <ResamplingHoldout[19]>
##  4: <LearnerClassifRpart[36]> classif.rpart <ResamplingHoldout[19]>
##  5: <LearnerClassifRpart[36]> classif.rpart <ResamplingHoldout[19]>
##  6: <LearnerClassifRpart[36]> classif.rpart <ResamplingHoldout[19]>
##  7: <LearnerClassifRpart[36]> classif.rpart <ResamplingHoldout[19]>
##  8: <LearnerClassifRpart[36]> classif.rpart <ResamplingHoldout[19]>
##  9: <LearnerClassifRpart[36]> classif.rpart <ResamplingHoldout[19]>
## 10: <LearnerClassifRpart[36]> classif.rpart <ResamplingHoldout[19]>
## 11: <LearnerClassifRpart[36]> classif.rpart <ResamplingHoldout[19]>
## 12: <LearnerClassifRpart[36]> classif.rpart <ResamplingHoldout[19]>
## 13: <LearnerClassifRpart[36]> classif.rpart <ResamplingHoldout[19]>
## 14: <LearnerClassifRpart[36]> classif.rpart <ResamplingHoldout[19]>
## 15: <LearnerClassifRpart[36]> classif.rpart <ResamplingHoldout[19]>
## 16: <LearnerClassifRpart[36]> classif.rpart <ResamplingHoldout[19]>
## 17: <LearnerClassifRpart[36]> classif.rpart <ResamplingHoldout[19]>
## 18: <LearnerClassifRpart[36]> classif.rpart <ResamplingHoldout[19]>
## 19: <LearnerClassifRpart[36]> classif.rpart <ResamplingHoldout[19]>
## 20: <LearnerClassifRpart[36]> classif.rpart <ResamplingHoldout[19]>
##     resampling_id iteration              prediction classif.acc
##  1:       holdout         1 <PredictionClassif[19]>      0.7500
##  2:       holdout         1 <PredictionClassif[19]>      0.7500
##  3:       holdout         1 <PredictionClassif[19]>      0.7500
##  4:       holdout         1 <PredictionClassif[19]>      0.7852
##  5:       holdout         1 <PredictionClassif[19]>      0.7500
##  6:       holdout         1 <PredictionClassif[19]>      0.7500
##  7:       holdout         1 <PredictionClassif[19]>      0.7500
##  8:       holdout         1 <PredictionClassif[19]>      0.6992
##  9:       holdout         1 <PredictionClassif[19]>      0.6992
## 10:       holdout         1 <PredictionClassif[19]>      0.7500
## 11:       holdout         1 <PredictionClassif[19]>      0.7500
## 12:       holdout         1 <PredictionClassif[19]>      0.7500
## 13:       holdout         1 <PredictionClassif[19]>      0.7070
## 14:       holdout         1 <PredictionClassif[19]>      0.7852
## 15:       holdout         1 <PredictionClassif[19]>      0.7852
## 16:       holdout         1 <PredictionClassif[19]>      0.7500
## 17:       holdout         1 <PredictionClassif[19]>      0.7500
## 18:       holdout         1 <PredictionClassif[19]>      0.7500
## 19:       holdout         1 <PredictionClassif[19]>      0.7031
## 20:       holdout         1 <PredictionClassif[19]>      0.7852

Now we can take the previously created Learner, set the optimized hyperparameters and train it on the full dataset.

learner$param_set$values = instance$result_learner_param_vals
learner$train(task)

The trained model can now be used to make a prediction on external data. Note that predicting on observations present in the task should be avoided. The model has seen these observations already during tuning and therefore results would be statistically biased. Hence, the resulting performance measure would be over-optimistic. Instead, to get statistically unbiased performance estimates for the current task, nested resampling is required.

### 3.1.4 Automating the Tuning

The AutoTuner wraps a learner and augments it with an automatic tuning for a given set of hyperparameters. Because the AutoTuner itself inherits from the Learner base class, it can be used like any other learner. Analogously to the previous subsection, a new classification tree learner is created. This classification tree learner automatically tunes the parameters cp and minsplit using an inner resampling (holdout). We create a terminator which allows 10 evaluations, and use a simple random search as tuning algorithm:

learner = lrn("classif.rpart")
search_space = ps(
  cp = p_dbl(lower = 0.001, upper = 0.1),
  minsplit = p_int(lower = 1, upper = 10)
)
terminator = trm("evals", n_evals = 10)
tuner = tnr("random_search")

at = AutoTuner$new(
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  search_space = search_space,
  terminator = terminator,
  tuner = tuner
)
at
## <AutoTuner:classif.rpart.tuned>
## * Model: -
## * Search Space:
## <ParamSet>
##          id    class lower upper nlevels        default value
## 1:       cp ParamDbl 0.001   0.1     Inf <NoDefault[3]>
## 2: minsplit ParamInt 1.000  10.0      10 <NoDefault[3]>
## * Packages: rpart
## * Predict Type: response
## * Feature Types: logical, integer, numeric, factor, ordered
## * Properties: importance, missings, multiclass, selected_features,
##   twoclass, weights

We can now use the learner like any other learner, calling the $train() and $predict() methods.

at$train(task)
## INFO  [14:21:48.916] [bbotk] Starting to optimize 2 parameter(s) with '<OptimizerRandomSearch>' and '<TerminatorEvals> [n_evals=10, k=0]'
## INFO  [14:21:48.933] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:49.021] [bbotk] Result of batch 1:
## INFO  [14:21:49.022] [bbotk]      cp minsplit classif.ce runtime_learners
## INFO  [14:21:49.022] [bbotk]  0.0591        8     0.3516            0.008
## INFO  [14:21:49.022] [bbotk]                                 uhash
## INFO  [14:21:49.022] [bbotk]  f09c031c-c221-46a0-b2b2-af4beb536abc
## INFO  [14:21:49.026] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:49.112] [bbotk] Result of batch 2:
## INFO  [14:21:49.114] [bbotk]       cp minsplit classif.ce runtime_learners
## INFO  [14:21:49.114] [bbotk]  0.05978        6     0.3516            0.008
## INFO  [14:21:49.114] [bbotk]                                 uhash
## INFO  [14:21:49.114] [bbotk]  2ae1bd8f-7d7d-4519-bc47-701895da0872
## INFO  [14:21:49.118] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:49.213] [bbotk] Result of batch 3:
## INFO  [14:21:49.215] [bbotk]       cp minsplit classif.ce runtime_learners
## INFO  [14:21:49.215] [bbotk]  0.03989       10     0.3516            0.009
## INFO  [14:21:49.215] [bbotk]                                 uhash
## INFO  [14:21:49.215] [bbotk]  078b74b8-8b69-4a13-bd02-b38bc82d08a2
## INFO  [14:21:49.219] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:49.315] [bbotk] Result of batch 4:
## INFO  [14:21:49.317] [bbotk]       cp minsplit classif.ce runtime_learners
## INFO  [14:21:49.317] [bbotk]  0.07781        3     0.3516             0.01
## INFO  [14:21:49.317] [bbotk]                                 uhash
## INFO  [14:21:49.317] [bbotk]  565ea92e-e4db-41f1-ad4b-a05925a6031b
## INFO  [14:21:49.321] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:49.409] [bbotk] Result of batch 5:
## INFO  [14:21:49.410] [bbotk]        cp minsplit classif.ce runtime_learners
## INFO  [14:21:49.410] [bbotk]  0.007905        8     0.2812            0.009
## INFO  [14:21:49.410] [bbotk]                                 uhash
## INFO  [14:21:49.410] [bbotk]  71151613-012f-4f77-92c2-53ec265eafc5
## INFO  [14:21:49.413] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:49.510] [bbotk] Result of batch 6:
## INFO  [14:21:49.512] [bbotk]       cp minsplit classif.ce runtime_learners
## INFO  [14:21:49.512] [bbotk]  0.05173        7     0.3516            0.008
## INFO  [14:21:49.512] [bbotk]                                 uhash
## INFO  [14:21:49.512] [bbotk]  a6cccc55-eccd-4a8e-892d-863120934abc
## INFO  [14:21:49.516] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:49.600] [bbotk] Result of batch 7:
## INFO  [14:21:49.602] [bbotk]       cp minsplit classif.ce runtime_learners
## INFO  [14:21:49.602] [bbotk]  0.09692        4     0.3516            0.009
## INFO  [14:21:49.602] [bbotk]                                 uhash
## INFO  [14:21:49.602] [bbotk]  1dba9a91-a010-4199-a9ec-b66fcc8fbb67
## INFO  [14:21:49.605] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:49.704] [bbotk] Result of batch 8:
## INFO  [14:21:49.706] [bbotk]       cp minsplit classif.ce runtime_learners
## INFO  [14:21:49.706] [bbotk]  0.09489        6     0.3516            0.009
## INFO  [14:21:49.706] [bbotk]                                 uhash
## INFO  [14:21:49.706] [bbotk]  61219708-190e-48a0-95ee-7a0ed55f1513
## INFO  [14:21:49.709] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:49.799] [bbotk] Result of batch 9:
## INFO  [14:21:49.801] [bbotk]       cp minsplit classif.ce runtime_learners
## INFO  [14:21:49.801] [bbotk]  0.04475        8     0.3516            0.009
## INFO  [14:21:49.801] [bbotk]                                 uhash
## INFO  [14:21:49.801] [bbotk]  d088af4c-dd07-434c-974c-6c0cf1206fc1
## INFO  [14:21:49.806] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:49.909] [bbotk] Result of batch 10:
## INFO  [14:21:49.911] [bbotk]       cp minsplit classif.ce runtime_learners
## INFO  [14:21:49.911] [bbotk]  0.04027        6     0.3516             0.01
## INFO  [14:21:49.911] [bbotk]                                 uhash
## INFO  [14:21:49.911] [bbotk]  fe49da15-d129-476d-b994-a031e4d390a2
## INFO  [14:21:49.919] [bbotk] Finished optimizing after 10 evaluation(s)
## INFO  [14:21:49.919] [bbotk] Result:
## INFO  [14:21:49.921] [bbotk]        cp minsplit learner_param_vals  x_domain classif.ce
## INFO  [14:21:49.921] [bbotk]  0.007905        8          <list[3]> <list[2]>     0.2812

We can also pass it to resample() and benchmark(). This is called nested resampling, which is discussed in the next chapter.
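To give a first impression of what passing the AutoTuner to resample() looks like, here is a minimal sketch. It reuses the AutoTuner configuration from this sub-section; the 3-fold cross-validation for the outer loop and the silencing of the log output are our own illustrative choices, not settings prescribed by the text.

```r
library("mlr3verse")
library("mlr3tuning")

# Quieten the per-batch log output (optional)
lgr::get_logger("bbotk")$set_threshold("warn")
lgr::get_logger("mlr3")$set_threshold("warn")

task = tsk("pima")

# Inner loop: the AutoTuner tunes cp and minsplit on a holdout split
at = AutoTuner$new(
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  search_space = ps(
    cp = p_dbl(lower = 0.001, upper = 0.1),
    minsplit = p_int(lower = 1, upper = 10)
  ),
  terminator = trm("evals", n_evals = 10),
  tuner = tnr("random_search")
)

# Outer loop: 3-fold cross-validation (fold count is an arbitrary choice)
outer = rsmp("cv", folds = 3)
rr = resample(task, at, outer)

# Aggregated outer performance estimate
rr$aggregate(msr("classif.ce"))
```

Each outer fold runs a complete tuning on its training portion, so the cost is roughly the number of outer folds times the inner tuning budget.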
## 3.2 Tuning Search Spaces

When running an optimization, it is important to inform the tuning algorithm about what hyperparameters are valid. Here the names, types, and valid ranges of each hyperparameter are important. All this information is communicated with objects of the class ParamSet, which is defined in paradox. While it is possible to create ParamSet objects using the $new constructor, it is much shorter and more readable to use the ps shortcut, which will be presented here. For an in-depth description of paradox and its classes, see the paradox chapter.

Note that ParamSet objects exist in two contexts. First, ParamSet objects are used to define the space of valid parameter settings for a learner (and other objects). Second, they are used to define a search space for tuning. We are mainly interested in the latter. As an example, consider the minsplit parameter of the classif.rpart Learner. The ParamSet associated with the learner has a lower but no upper bound. However, for tuning the value, a lower and upper bound must be given because tuning search spaces need to be bounded. For Learner or PipeOp objects, typically “unbounded” ParamSets are used. Here, however, we will mainly focus on creating “bounded” ParamSets that can be used for tuning. See the in-depth paradox chapter for more details on using ParamSets to define parameter ranges for use-cases besides tuning.
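The difference between the two contexts can be inspected directly. The short sketch below compares the bounds of minsplit in the learner's own ParamSet with those of a search space. It assumes that the lower and upper fields of a ParamSet return named vectors, as they do in the paradox versions we are aware of.

```r
library("mlr3verse")

learner = lrn("classif.rpart")

# Bounds of minsplit in the learner's own ParamSet: no finite upper bound
learner$param_set$lower[["minsplit"]]
learner$param_set$upper[["minsplit"]]

# A search space for tuning has to supply both bounds explicitly;
# the range 1 to 10 is the one used earlier in this chapter
search_space = ps(minsplit = p_int(lower = 1, upper = 10))
search_space$lower[["minsplit"]]
search_space$upper[["minsplit"]]
```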

### 3.2.1 Creating ParamSets

An empty ParamSet – not yet very useful – can be constructed using just the ps call:

library("mlr3verse")

search_space = ps()
print(search_space)
## <ParamSet>
## Empty.

ps takes named Domain arguments that are turned into parameters. A possible search space for the "classif.svm" learner could for example be:

search_space = ps(
cost = p_dbl(lower = 0.1, upper = 10),
kernel = p_fct(levels = c("polynomial", "radial"))
)
print(search_space)
## <ParamSet>
##        id    class lower upper nlevels        default value
## 1:   cost ParamDbl   0.1    10     Inf <NoDefault[3]>
## 2: kernel ParamFct    NA    NA       2 <NoDefault[3]>

There are five domain constructors that produce a parameter when given to ps:

| Constructor | Description | Is bounded? | Underlying Class |
|---|---|---|---|
| p_dbl | Real valued parameter (“double”) | When upper and lower are given | ParamDbl |
| p_int | Integer parameter | When upper and lower are given | ParamInt |
| p_fct | Discrete valued parameter (“factor”) | Always | ParamFct |
| p_lgl | Logical / Boolean parameter | Always | ParamLgl |
| p_uty | Untyped parameter | Never | ParamUty |

These domain constructors each take some of the following arguments:

• lower, upper: lower and upper bound of numerical parameters (p_dbl and p_int). These need to be given to get bounded parameter spaces valid for tuning.
• levels: Allowed categorical values for p_fct parameters. Required argument for p_fct. See below for more details on this parameter.
• trafo: transformation function, see below.
• depends: dependencies, see below.
• tags: Further information about a parameter, used for example by the hyperband tuner.
• default: Value corresponding to default behavior when the parameter is not given. Not used for tuning search spaces.
• special_vals: Valid values besides the normally accepted values for a parameter. Not used for tuning search spaces.
• custom_check: Function that checks whether a value given to p_uty is valid. Not used for tuning search spaces.

The lower, upper, or levels arguments are always at the first (or second, for upper) position of the respective constructors, so their names are usually omitted when defining a ParamSet, for improved conciseness:

search_space = ps(cost = p_dbl(0.1, 10), kernel = p_fct(c("polynomial", "radial")))

### 3.2.2 Transformations (trafo)

We can use the paradox function generate_design_grid to look at the values that would be evaluated by grid search. (We are using rbindlist() here because the result of $transpose() is a list that is harder to read. If we didn’t use $transpose(), on the other hand, the transformations that we investigate here are not applied.)

library("data.table")
rbindlist(generate_design_grid(search_space, 3)$transpose())
##     cost     kernel
## 1:  0.10 polynomial
## 2:  0.10     radial
## 3:  5.05 polynomial
## 4:  5.05     radial
## 5: 10.00 polynomial
## 6: 10.00     radial

We notice that the cost parameter is taken on a linear scale. We assume, however, that the difference of cost between 0.1 and 1 should have a similar effect as the difference between 1 and 10. Therefore it makes more sense to tune it on a logarithmic scale. This is done by using a transformation (trafo). This is a function that is applied to a parameter after it has been sampled by the tuner. We can tune cost on a logarithmic scale by sampling on the linear scale [-1, 1] and computing 10^x from that value.

search_space = ps(
  cost = p_dbl(-1, 1, trafo = function(x) 10^x),
  kernel = p_fct(c("polynomial", "radial"))
)
rbindlist(generate_design_grid(search_space, 3)$transpose())
##    cost     kernel
## 1:  0.1 polynomial
## 2:  0.1     radial
## 3:  1.0 polynomial
## 4:  1.0     radial
## 5: 10.0 polynomial
## 6: 10.0     radial
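
To see why the transformed grid is evenly spaced on a logarithmic scale, the same three points can be recomputed in base R without paradox. This is only a minimal sketch; the names linear_grid and log_grid are illustrative and not part of any package.

```r
# Three equally spaced points on the original (linear) scale of cost.
linear_grid = seq(0.1, 10, length.out = 3)

# Three equally spaced points on [-1, 1], mapped through the trafo 10^x.
trafo = function(x) 10^x
log_grid = trafo(seq(-1, 1, length.out = 3))

print(linear_grid) # 0.1, 5.05 and 10
print(log_grid)    # 0.1, 1 and 10

# On the log grid, neighbouring points differ by a constant factor of 10.
print(log_grid[-1] / log_grid[-length(log_grid)])
```

The linear grid spends two of its three points above 5, whereas the transformed grid covers each order of magnitude once.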

It is even possible to attach another transformation to the ParamSet as a whole that gets executed after individual parameter’s transformations were performed. It is given through the .extra_trafo argument and should be a function with parameters x and param_set that takes a list of parameter values in x and returns a modified list. This transformation can access all parameter values of an evaluation and modify them with interactions. It is even possible to add or remove parameters. (The following is a bit of a silly example.)

search_space = ps(
  cost = p_dbl(-1, 1, trafo = function(x) 10^x),
  kernel = p_fct(c("polynomial", "radial")),
  .extra_trafo = function(x, param_set) {
    # double the cost for the polynomial kernel only
    if (x$kernel == "polynomial") {
      x$cost = x$cost * 2
    }
    x
  }
)
rbindlist(generate_design_grid(search_space, 3)$transpose())
##    cost     kernel
## 1:  0.2 polynomial
## 2:  0.1     radial
## 3:  2.0 polynomial
## 4:  1.0     radial
## 5: 20.0 polynomial
## 6: 10.0     radial
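
The order in which the two kinds of transformation act can be mimicked in base R. The sketch below is not how paradox implements trafos internally; apply_trafos, param_trafos and extra_trafo are hypothetical names that only show that the per-parameter trafo runs first and the set-level trafo sees its result.

```r
# Per-parameter trafos, applied first (here only cost has one).
param_trafos = list(cost = function(x) 10^x)

# Set-level trafo, applied afterwards to the whole configuration.
extra_trafo = function(x) {
  if (x$kernel == "polynomial") {
    x$cost = x$cost * 2
  }
  x
}

# Hypothetical helper: apply both steps to one sampled configuration.
apply_trafos = function(x) {
  for (id in names(param_trafos)) {
    x[[id]] = param_trafos[[id]](x[[id]])
  }
  extra_trafo(x)
}

poly = apply_trafos(list(cost = 0, kernel = "polynomial"))
radial = apply_trafos(list(cost = 0, kernel = "radial"))
print(poly$cost)   # 2, i.e. 10^0 doubled
print(radial$cost) # 1, i.e. 10^0 unchanged
```

These two values correspond to the middle grid point (cost sampled at 0) in the output above.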

The available types of search space parameters are limited: continuous, integer, discrete, and logical scalars. There are many machine learning algorithms, however, that take parameters of other types, for example vectors or functions. These cannot be defined in a search space ParamSet, and they are often given as ParamUty in the Learner’s ParamSet. When trying to tune over these hyperparameters, it is necessary to perform a transformation that changes the type of a parameter.

An example is the class.weights parameter of the SVM, which takes a named vector of class weights with one entry for each target class. The trafo that would tune class.weights for the tsk("spam") dataset could be:

search_space = ps(
class.weights = p_dbl(0.1, 0.9, trafo = function(x) c(spam = x, nonspam = 1 - x))
)
generate_design_grid(search_space, 3)$transpose()
## [[1]]
## [[1]]$class.weights
##    spam nonspam
##     0.1     0.9
##
##
## [[2]]
## [[2]]$class.weights
##    spam nonspam
##     0.5     0.5
##
##
## [[3]]
## [[3]]$class.weights
##    spam nonspam
##     0.9     0.1

(We are omitting rbindlist() in this example because it breaks the vector valued return elements.)

### 3.2.3 Automatic Factor Level Transformation

A common use-case is the necessity to specify a list of values that should all be tried (or sampled from). It may be the case that a hyperparameter accepts function objects as values and a certain list of functions should be tried. Or it may be that a choice of special numeric values should be tried. For this, the p_fct constructor’s levels argument may be a value that is not a character vector, but something else. If, for example, only the values 0.1, 3, and 10 should be tried for the cost parameter, even when doing random search, then the following search space would achieve that:

search_space = ps(
  cost = p_fct(c(0.1, 3, 10)),
  kernel = p_fct(c("polynomial", "radial"))
)
rbindlist(generate_design_grid(search_space, 3)$transpose())
##    cost     kernel
## 1:  0.1 polynomial
## 2:  0.1     radial
## 3:  3.0 polynomial
## 4:  3.0     radial
## 5: 10.0 polynomial
## 6: 10.0     radial

This is equivalent to the following:

search_space = ps(
  cost = p_fct(c("0.1", "3", "10"),
    trafo = function(x) list(`0.1` = 0.1, `3` = 3, `10` = 10)[[x]]),
  kernel = p_fct(c("polynomial", "radial"))
)
rbindlist(generate_design_grid(search_space, 3)$transpose())
##    cost     kernel
## 1:  0.1 polynomial
## 2:  0.1     radial
## 3:  3.0 polynomial
## 4:  3.0     radial
## 5: 10.0 polynomial
## 6: 10.0     radial

This may seem silly, but makes sense when considering that factorial tuning parameters are always character values:

search_space = ps(
  cost = p_fct(c(0.1, 3, 10)),
  kernel = p_fct(c("polynomial", "radial"))
)
typeof(search_space$params$cost$levels)
## [1] "character"

Be aware that this results in an “unordered” hyperparameter, however. Tuning algorithms that make use of ordering information of parameters, like genetic algorithms or model based optimization, will perform worse when this is done. For these algorithms, it may make more sense to define a p_dbl or p_int with a more fitting trafo.

The class.weights case from above can also be implemented like this, if there are only a few candidates of class.weights vectors that should be tried. Note that the levels argument of p_fct must be named if there is no easy way for as.character() to create names:

search_space = ps(
  class.weights = p_fct(
    list(
      candidate_a = c(spam = 0.5, nonspam = 0.5),
      candidate_b = c(spam = 0.3, nonspam = 0.7)
    )
  )
)
generate_design_grid(search_space)$transpose()
## [[1]]
## [[1]]$class.weights
##    spam nonspam
##     0.5     0.5
##
##
## [[2]]
## [[2]]$class.weights
##    spam nonspam
##     0.3     0.7
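
Conceptually, the automatic factor level transformation is a lookup from a character level to the stored value. The base R sketch below illustrates that idea only; it does not reproduce paradox internals, and level_names and lookup are illustrative names.

```r
# Candidate values, keyed by the character level the tuner would see.
candidates = list(
  candidate_a = c(spam = 0.5, nonspam = 0.5),
  candidate_b = c(spam = 0.3, nonspam = 0.7)
)

# The levels are just the names; they carry no ordering information.
level_names = names(candidates)

# The implicit trafo maps a level name back to the stored value.
lookup = function(level) candidates[[level]]

print(level_names)
print(lookup("candidate_b"))
```

Because the tuner only ever sees the names, it cannot tell that one candidate is "between" two others, which is why such parameters are treated as unordered.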

### 3.2.4 Parameter Dependencies (depends)

Some parameters are only relevant when another parameter has a certain value, or one of several values. The SVM, for example, has the degree parameter that is only valid when kernel is "polynomial". This can be specified using the depends argument. It is an expression that must involve other parameters and be of the form <param> == <scalar>, <param> %in% <vector>, or multiple of these chained by &&. To tune the degree parameter, one would need to do the following:

search_space = ps(
  cost = p_dbl(-1, 1, trafo = function(x) 10^x),
  kernel = p_fct(c("polynomial", "radial")),
  degree = p_int(1, 3, depends = kernel == "polynomial")
)
rbindlist(generate_design_grid(search_space, 3)$transpose(), fill = TRUE)
##     cost     kernel degree
##  1:  0.1 polynomial      1
##  2:  0.1 polynomial      2
##  3:  0.1 polynomial      3
##  4:  0.1     radial     NA
##  5:  1.0 polynomial      1
##  6:  1.0 polynomial      2
##  7:  1.0 polynomial      3
##  8:  1.0     radial     NA
##  9: 10.0 polynomial      1
## 10: 10.0 polynomial      2
## 11: 10.0 polynomial      3
## 12: 10.0     radial     NA

### 3.2.5 Creating Tuning ParamSets from other ParamSets

Having to define a tuning ParamSet for a Learner that already has parameter set information may seem unnecessarily tedious, and there is indeed a way to create tuning ParamSets from a Learner’s ParamSet, making use of as much information as already available. This is done by setting values of a Learner’s ParamSet to so-called TuneTokens, constructed with a to_tune call. This can be done in the same way that other hyperparameters are set to specific values. It can be understood as the hyperparameters being tagged for later tuning. The resulting ParamSet used for tuning can be retrieved using the $search_space() method.

learner = lrn("classif.svm")
learner$param_set$values$kernel = "polynomial" # for example
learner$param_set$values$degree = to_tune(lower = 1, upper = 3)

print(learner$param_set$search_space())
## <ParamSet>
##        id    class lower upper nlevels        default value
## 1: degree ParamInt     1     3       3 <NoDefault[3]>
rbindlist(generate_design_grid(learner$param_set$search_space(), 3)$transpose())
##    degree
## 1:      1
## 2:      2
## 3:      3

It is possible to omit lower here, because it can be inferred from the lower bound of the degree parameter itself. For other parameters that are already bounded, no bounds need to be given at all. An example is the logical shrinking hyperparameter:

learner$param_set$values$shrinking = to_tune()

print(learner$param_set$search_space())
## <ParamSet>
##           id    class lower upper nlevels        default value
## 1:    degree ParamInt     1     3       3 <NoDefault[3]>
## 2: shrinking ParamLgl    NA    NA       2           TRUE
rbindlist(generate_design_grid(learner$param_set$search_space(), 3)$transpose())
##    degree shrinking
## 1:      1      TRUE
## 2:      1     FALSE
## 3:      2      TRUE
## 4:      2     FALSE
## 5:      3      TRUE
## 6:      3     FALSE

to_tune can also be constructed with a Domain object, i.e. something constructed with a p_*** call. This way it is possible to tune continuous parameters with discrete values, or to give trafos or dependencies. One could, for example, tune the cost on a few given special values, and introduce a dependency of shrinking on it. Notice that to_tune(<levels>) is a short form of to_tune(p_fct(<levels>)). (When introducing the dependency, we need to use the cost value from before the implicit trafo, which is the name or as.character() of the respective value, here "val2"!)

learner$param_set$values$type = "C-classification"  # needs to be set because of a bug in paradox
learner$param_set$values$cost = to_tune(c(val1 = 0.3, val2 = 0.7))
learner$param_set$values$shrinking = to_tune(p_lgl(depends = cost == "val2"))

print(learner$param_set$search_space())
## <ParamSet>
##           id    class lower upper nlevels        default parents value
## 1:      cost ParamFct    NA    NA       2 <NoDefault[3]>
## 2:    degree ParamInt     1     3       3 <NoDefault[3]>
## 3: shrinking ParamLgl    NA    NA       2 <NoDefault[3]>    cost
## Trafo is set.
rbindlist(generate_design_grid(learner$param_set$search_space(), 3)$transpose(), fill = TRUE)
##    degree cost shrinking
## 1:      1  0.3        NA
## 2:      1  0.7      TRUE
## 3:      1  0.7     FALSE
## 4:      2  0.3        NA
## 5:      2  0.7      TRUE
## 6:      2  0.7     FALSE
## 7:      3  0.3        NA
## 8:      3  0.7      TRUE
## 9:      3  0.7     FALSE

The search_space() picks up dependencies from the underlying ParamSet automatically. So if the kernel is tuned, then degree automatically gets the dependency on it, without us having to specify that. (Here we reset cost and shrinking to NULL for the sake of clarity of the generated output.)

learner$param_set$values$cost = NULL
learner$param_set$values$shrinking = NULL
learner$param_set$values$kernel = to_tune(c("polynomial", "radial"))

print(learner$param_set$search_space())
## <ParamSet>
##        id    class lower upper nlevels        default parents value
## 1: degree ParamInt     1     3       3 <NoDefault[3]>  kernel
## 2: kernel ParamFct    NA    NA       2 <NoDefault[3]>
rbindlist(generate_design_grid(learner$param_set$search_space(), 3)$transpose(), fill = TRUE)
##        kernel degree
## 1: polynomial      1
## 2: polynomial      2
## 3: polynomial      3
## 4:     radial     NA

It is even possible to define whole ParamSets that get tuned over for a single parameter. This may be especially useful for vector hyperparameters that should be searched along multiple dimensions. This ParamSet must, however, have an .extra_trafo that returns a list with a single element, because it corresponds to a single hyperparameter that is being tuned. Suppose the class.weights hyperparameter should be tuned along two dimensions:

learner$param_set$values$class.weights = to_tune(
ps(spam = p_dbl(0.1, 0.9), nonspam = p_dbl(0.1, 0.9),
.extra_trafo = function(x, param_set) list(c(spam = x$spam, nonspam = x$nonspam))
))
head(generate_design_grid(learner$param_set$search_space(), 3)$transpose(), 3)
## [[1]]
## [[1]]$kernel
## [1] "polynomial"
##
## [[1]]$degree
## [1] 1
##
## [[1]]$class.weights
##    spam nonspam
##     0.1     0.1
##
##
## [[2]]
## [[2]]$kernel
## [1] "polynomial"
##
## [[2]]$degree
## [1] 1
##
## [[2]]$class.weights
##    spam nonspam
##     0.1     0.5
##
##
## [[3]]
## [[3]]$kernel
## [1] "polynomial"
##
## [[3]]$degree
## [1] 1
##
## [[3]]$class.weights
##    spam nonspam
##     0.1     0.9

## 3.3 Nested Resampling

Evaluating a machine learning model often requires an additional layer of resampling when hyperparameters or features have to be selected. Nested resampling separates these model selection steps from the process of estimating the performance of the model. If the same data is used for the model selection steps and the evaluation of the model itself, the resulting performance estimate might be severely biased. One reason is that the repeated evaluation of the model on the test data could leak information about its structure into the model, which results in over-optimistic performance estimates. Keep in mind that nested resampling is a statistical procedure to estimate the predictive performance of the model trained on the full dataset. It is not a procedure to select optimal hyperparameters. The resampling produces many hyperparameter configurations, which should not be used to construct a final model.

The graphic above illustrates nested resampling for hyperparameter tuning with 3-fold cross-validation in the outer and 4-fold cross-validation in the inner loop.

In the outer resampling loop, we have three pairs of training/test sets. On each of these outer training sets parameter tuning is done, thereby executing the inner resampling loop. This way, we get one set of selected hyperparameters for each outer training set. Then the learner is fitted on each outer training set using the corresponding selected hyperparameters. Subsequently, we can evaluate the performance of the learner on the outer test sets. The aggregated performance on the outer test sets is the unbiased performance estimate of the model.
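
The control flow of the two loops can be sketched in base R without mlr3. The example below is an illustration under stated assumptions, not the mlr3 implementation: it uses the built-in mtcars data, treats the polynomial degree of a linear model as the only hyperparameter, and the helper names make_folds and mse are hypothetical.

```r
set.seed(1)
dat = mtcars[, c("mpg", "hp")]
degrees = 1:3 # candidate hyperparameter values

# Assign each row index in idx to one of k folds at random.
make_folds = function(idx, k) split(sample(idx), rep_len(seq_len(k), length(idx)))

# Mean squared error of a polynomial fit of the given degree.
mse = function(train, test, degree) {
  fit = lm(mpg ~ poly(hp, degree), data = dat[train, ])
  mean((dat$mpg[test] - predict(fit, dat[test, ]))^2)
}

outer_folds = make_folds(seq_len(nrow(dat)), 3)
outer_scores = numeric(length(outer_folds))
selected = integer(length(outer_folds))

for (i in seq_along(outer_folds)) {
  outer_test = outer_folds[[i]]
  outer_train = setdiff(seq_len(nrow(dat)), outer_test)

  # Inner loop: tune the degree using the outer training set only.
  inner_folds = make_folds(outer_train, 4)
  inner_cv = sapply(degrees, function(d) {
    mean(sapply(inner_folds, function(inner_test) {
      mse(setdiff(outer_train, inner_test), inner_test, d)
    }))
  })
  selected[i] = degrees[which.min(inner_cv)]

  # Refit on the full outer training set and score on the outer test set.
  outer_scores[i] = mse(outer_train, outer_test, selected[i])
}

print(selected)           # one selected degree per outer fold
print(mean(outer_scores)) # aggregated outer performance estimate
```

The outer test rows never enter the inner loop, which is what keeps the aggregated outer score free of the selection bias described above.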

### 3.3.1 Execution

The previous section examined the optimization of a simple classification tree on the mlr_tasks_pima. We continue the example and estimate the predictive performance of the model with nested resampling.

We use holdout in the inner resampling loop. The AutoTuner executes the hyperparameter tuning and is stopped after 5 evaluations. The hyperparameter configurations are proposed by grid search.

library("mlr3verse")

learner = lrn("classif.rpart")
resampling = rsmp("holdout")
measure = msr("classif.ce")
search_space = ps(cp = p_dbl(lower = 0.001, upper = 0.1))
terminator = trm("evals", n_evals = 5)
tuner = tnr("grid_search", resolution = 10)

at = AutoTuner$new(learner, resampling, measure, terminator, tuner, search_space)

A 3-fold cross-validation is used in the outer resampling loop. On each of the three outer train sets hyperparameter tuning is done and we receive three optimized hyperparameter configurations. To execute the nested resampling, we pass the AutoTuner to the resample() function. We have to set store_models = TRUE because we need the AutoTuner models to investigate the inner tuning.

task = tsk("pima")
outer_resampling = rsmp("cv", folds = 3)

rr = resample(task, at, outer_resampling, store_models = TRUE)
## INFO  [14:21:58.716] [bbotk] Starting to optimize 1 parameter(s) with '<OptimizerGridSearch>' and '<TerminatorEvals> [n_evals=5, k=0]'
## INFO  [14:21:58.750] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:58.869] [bbotk] Result of batch 1:
## INFO  [14:21:58.872] [bbotk]     cp classif.ce runtime_learners                                uhash
## INFO  [14:21:58.872] [bbotk]  0.078     0.2222            0.016 13a32d6a-557c-42f7-91b0-e374e1a35a9e
## INFO  [14:21:58.874] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:58.965] [bbotk] Result of batch 2:
## INFO  [14:21:58.967] [bbotk]     cp classif.ce runtime_learners                                uhash
## INFO  [14:21:58.967] [bbotk]  0.045     0.2222            0.009 df035515-620e-446b-af89-ffe383d6bea7
## INFO  [14:21:58.969] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:59.067] [bbotk] Result of batch 3:
## INFO  [14:21:59.069] [bbotk]     cp classif.ce runtime_learners                                uhash
## INFO  [14:21:59.069] [bbotk]  0.067     0.2222            0.008 537451b7-d14d-4df6-882f-f14dfcff50d7
## INFO  [14:21:59.070] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:59.167] [bbotk] Result of batch 4:
## INFO  [14:21:59.168] [bbotk]     cp classif.ce runtime_learners                                uhash
## INFO  [14:21:59.168] [bbotk]    0.1     0.2222            0.009 861bc132-6bab-4059-848c-f10d88958511
## INFO  [14:21:59.170] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:59.264] [bbotk] Result of batch 5:
## INFO  [14:21:59.266] [bbotk]     cp classif.ce runtime_learners                                uhash
## INFO  [14:21:59.266] [bbotk]  0.034     0.2222            0.009 1ed587be-73cd-423c-8f27-1e32c0cd23b4
## INFO  [14:21:59.271] [bbotk] Finished optimizing after 5 evaluation(s)
## INFO  [14:21:59.272] [bbotk] Result:
## INFO  [14:21:59.273] [bbotk]     cp learner_param_vals  x_domain classif.ce
## INFO  [14:21:59.273] [bbotk]  0.078          <list[2]> <list[1]>     0.2222
## INFO  [14:21:59.327] [bbotk] Starting to optimize 1 parameter(s) with '<OptimizerGridSearch>' and '<TerminatorEvals> [n_evals=5, k=0]'
## INFO  [14:21:59.330] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:59.419] [bbotk] Result of batch 1:
## INFO  [14:21:59.421] [bbotk]     cp classif.ce runtime_learners                                uhash
## INFO  [14:21:59.421] [bbotk]    0.1     0.2164            0.008 0ba63400-f9da-4d7d-bb1a-888abc563da6
## INFO  [14:21:59.423] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:59.512] [bbotk] Result of batch 2:
## INFO  [14:21:59.514] [bbotk]     cp classif.ce runtime_learners                                uhash
## INFO  [14:21:59.514] [bbotk]  0.001     0.2281            0.008 a8301cbd-5425-4165-ba51-979d8c000b57
## INFO  [14:21:59.516] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:59.682] [bbotk] Result of batch 3:
## INFO  [14:21:59.684] [bbotk]     cp classif.ce runtime_learners                                uhash
## INFO  [14:21:59.684] [bbotk]  0.034     0.2164            0.008 ebf8a9e0-ef92-4c64-aad4-df53382fe50c
## INFO  [14:21:59.685] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:59.768] [bbotk] Result of batch 4:
## INFO  [14:21:59.770] [bbotk]     cp classif.ce runtime_learners                                uhash
## INFO  [14:21:59.770] [bbotk]  0.067     0.2164            0.007 33fe6d5f-dc72-4b64-a111-2064e650d9fd
## INFO  [14:21:59.771] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:21:59.859] [bbotk] Result of batch 5:
## INFO  [14:21:59.860] [bbotk]     cp classif.ce runtime_learners                                uhash
## INFO  [14:21:59.860] [bbotk]  0.012     0.2281            0.008 ba0a1087-2e5f-4101-bfb5-6b963386d7e9
## INFO  [14:21:59.865] [bbotk] Finished optimizing after 5 evaluation(s)
## INFO  [14:21:59.865] [bbotk] Result:
## INFO  [14:21:59.866] [bbotk]     cp learner_param_vals  x_domain classif.ce
## INFO  [14:21:59.866] [bbotk]    0.1          <list[2]> <list[1]>     0.2164
## INFO  [14:21:59.918] [bbotk] Starting to optimize 1 parameter(s) with '<OptimizerGridSearch>' and '<TerminatorEvals> [n_evals=5, k=0]'
## INFO  [14:21:59.921] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:22:00.002] [bbotk] Result of batch 1:
## INFO  [14:22:00.003] [bbotk]     cp classif.ce runtime_learners                                uhash
## INFO  [14:22:00.003] [bbotk]  0.089     0.2398            0.009 c0465599-de27-4deb-95da-696fc130bb55
## INFO  [14:22:00.004] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:22:00.097] [bbotk] Result of batch 2:
## INFO  [14:22:00.099] [bbotk]     cp classif.ce runtime_learners                                uhash
## INFO  [14:22:00.099] [bbotk]  0.067     0.2398            0.009 c658cf05-3ef1-4777-8a81-8861e6a81741
## INFO  [14:22:00.100] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:22:00.184] [bbotk] Result of batch 3:
## INFO  [14:22:00.185] [bbotk]     cp classif.ce runtime_learners                                uhash
## INFO  [14:22:00.185] [bbotk]    0.1     0.2398            0.009 075a5757-79fe-49e0-bbbb-e81e828d3339
## INFO  [14:22:00.187] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:22:00.276] [bbotk] Result of batch 4:
## INFO  [14:22:00.278] [bbotk]     cp classif.ce runtime_learners                                uhash
## INFO  [14:22:00.278] [bbotk]  0.045     0.2398            0.008 927a988a-a41d-4b26-8e4e-3e97226722b1
## INFO  [14:22:00.279] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:22:00.365] [bbotk] Result of batch 5:
## INFO  [14:22:00.367] [bbotk]     cp classif.ce runtime_learners                                uhash
## INFO  [14:22:00.367] [bbotk]  0.001     0.2573            0.009 5c1e2a7a-c7b2-4bde-bac6-22f766ed71de
## INFO  [14:22:00.371] [bbotk] Finished optimizing after 5 evaluation(s)
## INFO  [14:22:00.372] [bbotk] Result:
## INFO  [14:22:00.373] [bbotk]     cp learner_param_vals  x_domain classif.ce
## INFO  [14:22:00.373] [bbotk]  0.089          <list[2]> <list[1]>     0.2398

You can freely combine different inner and outer resampling strategies.

Nested resampling is not restricted to hyperparameter tuning. You can swap the AutoTuner for an AutoFSelector and estimate the performance of a model which is fitted on an optimized feature subset.

### 3.3.2 Evaluation

With the created ResampleResult we can now inspect the executed resampling iterations more closely. See the section on Resampling for more detailed information about ResampleResult objects.

We check the inner tuning results for stable hyperparameters. This means that the selected hyperparameters should not vary too much. We might observe unstable models in this example because the small data set and the low number of resampling iterations might introduce too much randomness. Usually, we aim for the selection of stable hyperparameters for all outer training sets.

extract_inner_tuning_results(rr)
##    iteration    cp classif.ce learner_param_vals  x_domain task_id
## 1:         1 0.078     0.2222          <list[2]> <list[1]>    pima
## 2:         2 0.089     0.2398          <list[2]> <list[1]>    pima
## 3:         3 0.100     0.2164          <list[2]> <list[1]>    pima
##             learner_id resampling_id
## 1: classif.rpart.tuned            cv
## 2: classif.rpart.tuned            cv
## 3: classif.rpart.tuned            cv

Next, we want to compare the predictive performances estimated on the outer resampling to the inner resampling. Significantly lower predictive performances on the outer resampling indicate that the models with the optimized hyperparameters overfit the data.

rr$score()
##                 task task_id         learner          learner_id
## 1: <TaskClassif[47]>    pima <AutoTuner[40]> classif.rpart.tuned
## 2: <TaskClassif[47]>    pima <AutoTuner[40]> classif.rpart.tuned
## 3: <TaskClassif[47]>    pima <AutoTuner[40]> classif.rpart.tuned
##            resampling resampling_id iteration              prediction
## 1: <ResamplingCV[19]>            cv         1 <PredictionClassif[19]>
## 2: <ResamplingCV[19]>            cv         2 <PredictionClassif[19]>
## 3: <ResamplingCV[19]>            cv         3 <PredictionClassif[19]>
##    classif.ce
## 1:     0.2422
## 2:     0.2617
## 3:     0.2969
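
The comparison between inner and outer estimates can be made explicit with a few lines of base R. The numbers below are copied by hand from the two printed tables in this section; the variable names are illustrative.

```r
# Values copied from the printed output (iterations 1 to 3).
inner_ce = c(0.2222, 0.2398, 0.2164) # extract_inner_tuning_results(rr)
outer_ce = c(0.2422, 0.2617, 0.2969) # rr$score()

# Positive gaps mean the inner estimate was more optimistic.
gap = outer_ce - inner_ce
print(round(gap, 4))

# The mean over the outer folds is the nested resampling estimate.
print(round(mean(outer_ce), 4)) # 0.2669
```

In all three iterations the outer error is higher than the inner one, which is the expected direction of the optimism introduced by tuning.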

The aggregated performance of all outer resampling iterations is essentially the unbiased performance of the model with the optimal hyperparameters found by grid search.

rr$aggregate()
## classif.ce
##     0.2669

Note that nested resampling is computationally expensive. For this reason we use a relatively small number of hyperparameter configurations and a low number of resampling iterations in this example. In practice, you normally have to increase both. As this is computationally intensive you might want to have a look at the section on Parallelization.

### 3.3.3 Final Model

We can use the AutoTuner to tune the hyperparameters of our learner and fit the final model on the full data set.

at$train(task)
## INFO  [14:22:00.713] [bbotk] Starting to optimize 1 parameter(s) with '<OptimizerGridSearch>' and '<TerminatorEvals> [n_evals=5, k=0]'
## INFO  [14:22:00.716] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:22:00.803] [bbotk] Result of batch 1:
## INFO  [14:22:00.805] [bbotk]     cp classif.ce runtime_learners                                uhash
## INFO  [14:22:00.805] [bbotk]  0.056     0.2812            0.008 a123ea38-5ad1-4a96-b065-5d5e754a7ec8
## INFO  [14:22:00.806] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:22:00.889] [bbotk] Result of batch 2:
## INFO  [14:22:00.891] [bbotk]     cp classif.ce runtime_learners                                uhash
## INFO  [14:22:00.891] [bbotk]  0.089     0.2812            0.009 31a8f982-2ff9-4c4f-b889-9fe2d7498b62
## INFO  [14:22:00.892] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:22:00.986] [bbotk] Result of batch 3:
## INFO  [14:22:00.988] [bbotk]     cp classif.ce runtime_learners                                uhash
## INFO  [14:22:00.988] [bbotk]  0.034     0.2852            0.016 2009121b-27a7-4e5d-b7d2-3173dc7b2998
## INFO  [14:22:00.989] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:22:01.078] [bbotk] Result of batch 4:
## INFO  [14:22:01.080] [bbotk]     cp classif.ce runtime_learners                                uhash
## INFO  [14:22:01.080] [bbotk]  0.023     0.3008            0.009 6e896ee0-8d32-429f-8d9c-17d038aa935e
## INFO  [14:22:01.081] [bbotk] Evaluating 1 configuration(s)
## INFO  [14:22:01.172] [bbotk] Result of batch 5:
## INFO  [14:22:01.173] [bbotk]     cp classif.ce runtime_learners                                uhash
## INFO  [14:22:01.173] [bbotk]  0.078     0.2812            0.015 f8b82e08-86ce-4621-ba71-f0d5af22911b
## INFO  [14:22:01.178] [bbotk] Finished optimizing after 5 evaluation(s)
## INFO  [14:22:01.178] [bbotk] Result:
## INFO  [14:22:01.180] [bbotk]     cp learner_param_vals  x_domain classif.ce
## INFO  [14:22:01.180] [bbotk]  0.056          <list[2]> <list[1]>     0.2812

The trained model can now be used to make predictions on new data. A common mistake is to report the performance estimated on the resampling sets on which the tuning was performed (at$tuning_result$classif.ce) as the model’s performance. Instead, we report the performance estimated with nested resampling as the performance of the model.

## 3.4 Tuning with Hyperband

Besides the more traditional tuning methods, the ecosystem around mlr3 offers another procedure for hyperparameter optimization called Hyperband implemented in the mlr3hyperband package.

Hyperband is a budget-oriented procedure, weeding out suboptimal performing configurations early on during a partially sequential training process, increasing tuning efficiency as a consequence. For this, a combination of incremental resource allocation and early stopping is used: As optimization progresses, computational resources are increased for more promising configurations, while less promising ones are terminated early.

To give an introductory analogy, imagine two horse trainers are given eight untrained horses. Both trainers want to win the upcoming race, but they are only given 32 units of food. Given that each horse can be fed up to 8 units of food (“maximum budget” per horse), there is not enough food for all the horses. It is critical to identify the most promising horses early, and give them enough food to improve. So, the trainers need to develop a strategy to split up the food in the best possible way.

The first trainer is very optimistic and wants to explore the full capabilities of a horse, because he does not want to pass a judgment on a horse’s performance unless it has been fully trained. So, he divides his budget by the maximum amount he can give to a horse (let’s say eight, so $$32 / 8 = 4$$) and randomly picks four horses; his budget simply is not enough to fully train more. Those four horses are then trained to their full capabilities, while the rest are set free. This way, the trainer is confident about choosing the best out of the four trained horses, but he might have overlooked the horse with the highest potential since he only focused on half of them.

The other trainer is more creative and develops a different strategy. He thinks that if a horse is not performing well at the beginning, it will also not improve after further training. Based on this assumption, he decides to give one unit of food to each horse and observes how they develop. After the initial food is consumed, he checks their performance and kicks the slowest half out of his training regime. Then, he increases the available food for the remaining horses and trains them further until the food is consumed again, only to kick out the worst half once more. He repeats this until the one remaining horse gets the rest of the food. This means only one horse is fully trained, but on the flip side, he was able to start training with all eight horses.

On race day, all the horses are put on the starting line. But which trainer will have the winning horse? The one who tried to train a maximum number of horses to their fullest? Or the other one, who made assumptions about the training progress of his horses? What the training phases might look like is visualized in Figure 3.1.

Hyperband works very similarly in some ways, but differently in others. It is not embodied by one of the trainers in our analogy, but rather by the person who would pay them. Hyperband consists of several brackets, each bracket corresponding to a trainer, and we do not care about horses but about hyperparameter configurations of a machine learning algorithm. The budget is not in terms of food, but in terms of a hyperparameter of the learner that scales in some way with the computational effort. An example is the number of epochs we train a neural network, or the number of iterations in boosting. Furthermore, there are not only two brackets (or trainers), but several, each placed at a unique spot between fully exploring later training stages and being extremely selective, which amounts to greater exploration of early training stages. The level of selection aggressiveness is handled by a user-defined parameter called $$\eta$$. So, $$1/\eta$$ is the fraction of configurations remaining after a bracket removes its worst performing ones, but $$\eta$$ is also the factor by which the budget is increased for the next stage. Because the maximum budget per configuration that makes sense differs between scenarios, the user also has to set this as the $$R$$ parameter. No further parameters are required for Hyperband: the full required budget across all brackets is indirectly given by $$(\lfloor \log_{\eta}{R} \rfloor + 1)^2 * R$$. To give an idea of what a full bracket layout might look like for a specific $$R$$ and $$\eta$$, a quick overview is given in the following table.

Table 3.1: Hyperband layout for $$\eta = 2$$ and $$R = 8$$, consisting of four brackets, with $$n$$ as the number of active configurations.

| bracket | stage | budget | n |
|---------|-------|--------|---|
| 1       | 1     | 1      | 8 |
| 1       | 2     | 2      | 4 |
| 1       | 3     | 4      | 2 |
| 1       | 4     | 8      | 1 |
| 2       | 1     | 2      | 6 |
| 2       | 2     | 4      | 3 |
| 2       | 3     | 8      | 1 |
| 3       | 1     | 4      | 4 |
| 3       | 2     | 8      | 2 |
| 4       | 1     | 8      | 4 |
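The layout in Table 3.1 can be reproduced with a few lines of base R. The following is a minimal sketch of the bracket schedule as described in the original Hyperband paper, not the code used inside mlr3hyperband; the function name `hyperband_layout` and its column names are chosen here purely for illustration.

```r
# Sketch of the Hyperband bracket schedule (helper name is hypothetical).
hyperband_layout = function(R, eta) {
  # index of the most aggressive bracket; the small offset guards against
  # floating point error in the logarithm
  s_max = floor(log(R, base = eta) + 1e-8)
  brackets = lapply(s_max:0, function(s) {
    # number of configurations and budget at the first stage of this bracket
    n_start = ceiling((s_max + 1) / (s + 1) * eta^s)
    r_start = R / eta^s
    i = 0:s
    data.frame(
      bracket = s_max - s + 1,
      stage   = i + 1,
      budget  = r_start * eta^i,       # budget grows by the factor eta
      n       = floor(n_start / eta^i) # a fraction 1/eta survives each stage
    )
  })
  do.call(rbind, brackets)
}

layout = hyperband_layout(R = 8, eta = 2)
print(layout, row.names = FALSE)
```

Calling the function with other values of `R` and `eta` shows how a larger $$\eta$$ leads to fewer brackets and more aggressive discarding.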

Of course, early termination based on a performance criterion may be disadvantageous if it is done too aggressively. For a learner whose estimated performance jumps radically during training, the best configurations may get canceled too early, simply because they do not improve quickly enough compared to others. In other words, it is often unclear beforehand whether having a large number of configurations $$n$$ that get aggressively discarded early is better than having a high budget per configuration, $$B/n$$, where $$B$$ is the total budget. The resulting tradeoff is called the “$$n$$ versus $$B/n$$ problem”.

To create a balance between selection based on early training performance and exploration of training performance in later stages, $$\lfloor \log_{\eta}{R} \rfloor + 1$$ brackets are constructed, each with a differently sized set of configurations. Thus, some brackets contain more configurations with a small initial budget. In these, many are discarded after having been trained for only a short time, corresponding to the selective trainer in our horse analogy. Others are constructed with fewer configurations, where discarding only takes place after a significant amount of budget has been consumed. The last bracket usually discards nothing, but also starts with only very few configurations; this corresponds to the trainer who explores later training stages. The former corresponds to high $$n$$, the latter to high $$B/n$$. Even though different brackets are initialized with a different number of configurations and different initial budget sizes, each bracket is assigned (approximately) the same budget $$(\lfloor \log_{\eta}{R} \rfloor + 1) \cdot R$$.
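As a quick arithmetic check of the two budget formulas, the sketch below hard-codes the values of Table 3.1 ($$\eta = 2$$, $$R = 8$$) and sums the budget consumed in each bracket. The variable names are illustrative only.

```r
eta = 2
R = 8

# values copied from Table 3.1
layout = data.frame(
  bracket = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4),
  budget  = c(1, 2, 4, 8, 2, 4, 8, 4, 8, 8),
  n       = c(8, 4, 2, 1, 6, 3, 1, 4, 2, 4)
)

# budget consumed per bracket: sum over its stages of n * budget
per_bracket = tapply(layout$n * layout$budget, layout$bracket, sum)

# small offset guards against floating point error in the logarithm
n_brackets = floor(log(R, base = eta) + 1e-8) + 1
budget_per_bracket = n_brackets * R  # (floor(log_eta(R)) + 1) * R
budget_total = n_brackets^2 * R      # (floor(log_eta(R)) + 1)^2 * R

print(per_bracket)
print(c(per_bracket = budget_per_bracket, total = budget_total))
```

For this particular layout every bracket consumes the same budget and the sum over all brackets equals the total given by the formula. For other values of $$\eta$$ and $$R$$, rounding of the configuration counts can make the per-bracket budget only approximately equal.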

The configurations at the start of each bracket are initialized by random, often uniform, sampling. Note that currently all configurations are trained completely from the beginning, so no online updates of models from stage to stage take place.

To identify the budget for evaluating Hyperband, the user has to specify explicitly which hyperparameter of the learner influences the budget by tagging a single hyperparameter in the ParamSet with tags = "budget", as in the following snippet:

library("mlr3verse")

# Hyperparameter subset of XGBoost
search_space = ps(
  nrounds = p_int(lower = 1, upper = 16, tags = "budget"),
  booster = p_fct(levels = c("gbtree", "gblinear", "dart"))
)

Thanks to the broad ecosystem of the mlr3verse, a learner does not need a natural budget parameter. A typical example is decision trees. By using subsampling as a preprocessing step with mlr3pipelines, we can work around the missing budget parameter.

set.seed(123)

# extend "classif.rpart" with "subsampling" as preprocessing step
ll = po("subsample") %>>% lrn("classif.rpart")

# extend hyperparameters of "classif.rpart" with subsampling fraction as budget
search_space = ps(
  classif.rpart.cp = p_dbl(lower = 0.001, upper = 0.1),
  classif.rpart.minsplit = p_int(lower = 1, upper = 10),
  subsample.frac = p_dbl(lower = 0.1, upper = 1, tags = "budget")
)
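To see why the subsampling fraction behaves like a budget, the following base R sketch mimics the idea outside of mlr3pipelines. It is only an illustration: the helper `subsample_rows` is hypothetical, and rounding the number of kept rows is an assumption made here, not a statement about how the "subsample" operator is implemented.

```r
set.seed(123)

# hypothetical helper: keep a random fraction of the rows of a data.frame
subsample_rows = function(data, frac) {
  n_keep = max(1L, as.integer(round(nrow(data) * frac)))
  data[sample.int(nrow(data), n_keep), , drop = FALSE]
}

# pretend that 100 of the 150 iris rows are available for training
train = iris[sample.int(nrow(iris), 100), ]

# number of training rows left for a few example fractions
fracs = c(0.1, 0.5, 1)
sizes = vapply(fracs, function(f) nrow(subsample_rows(train, f)), integer(1))
print(data.frame(frac = fracs, rows = sizes))
```

A smaller fraction leaves fewer rows to train on, so fitting is cheaper, but the resulting performance estimate is also noisier.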

We can now plug the new learner with the extended hyperparameter set into a TuningInstanceSingleCrit in the usual way. Hyperband terminates on its own once all of its brackets are evaluated, so a Terminator in the tuning instance only acts as an upper bound. It should be set to a low value only if one is unsure how long Hyperband will take to finish under the given settings.

instance = TuningInstanceSingleCrit$new(
  task = tsk("iris"),
  learner = ll,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  terminator = trm("none"), # hyperband terminates itself
  search_space = search_space
)

Now, we initialize a new instance of the mlr3hyperband::mlr_tuners_hyperband class and start tuning with it.

library("mlr3hyperband")
## Loading required package: mlr3tuning
## Loading required package: paradox
tuner = tnr("hyperband", eta = 3)

# reduce logging output
lgr::get_logger("bbotk")$set_threshold("warn")

tuner$optimize(instance)
##    classif.rpart.cp classif.rpart.minsplit subsample.frac learner_param_vals
## 1:          0.07348                      5         0.1111          <list[6]>
##     x_domain classif.ce
## 1: <list[3]>       0.02

To receive the results of each sampled configuration, we simply run the following snippet.

as.data.table(instance$archive)[, c(
  "subsample.frac",
  "classif.rpart.cp",
  "classif.rpart.minsplit",
  "classif.ce"
), with = FALSE]
##     subsample.frac classif.rpart.cp classif.rpart.minsplit classif.ce
##  1:         0.1111          0.02533                      3       0.04
##  2:         0.1111          0.07348                      5       0.02
##  3:         0.1111          0.08490                      3       0.02
##  4:         0.1111          0.05026                      6       0.02
##  5:         0.1111          0.03940                      4       0.02
##  6:         0.1111          0.02540                      7       0.42
##  7:         0.1111          0.01200                      4       0.14
##  8:         0.1111          0.03961                      4       0.02
##  9:         0.1111          0.05762                      6       0.02
## 10:         0.3333          0.07348                      5       0.06
## 11:         0.3333          0.08490                      3       0.04
## 12:         0.3333          0.05026                      6       0.06
## 13:         1.0000          0.08490                      3       0.04
## 14:         0.3333          0.08650                      6       0.02
## 15:         0.3333          0.07491                      9       0.06
## 16:         0.3333          0.06716                      6       0.04
## 17:         0.3333          0.06218                      9       0.08
## 18:         0.3333          0.03785                      4       0.06
## 19:         1.0000          0.08650                      6       0.04
## 20:         1.0000          0.02724                     10       0.04
## 21:         1.0000          0.05689                      3       0.04
## 22:         1.0000          0.09141                      4       0.04
##     subsample.frac classif.rpart.cp classif.rpart.minsplit classif.ce

You can access the best found configuration through the instance object.

instance$result
##    classif.rpart.cp classif.rpart.minsplit subsample.frac learner_param_vals
## 1:          0.07348                      5         0.1111          <list[6]>
##     x_domain classif.ce
## 1: <list[3]>       0.02
instance$result_learner_param_vals
## $subsample.frac
## [1] 0.1111
##
## $subsample.stratify
## [1] FALSE
##
## $subsample.replace
## [1] FALSE
##
## $classif.rpart.xval
## [1] 0
##
## $classif.rpart.cp
## [1] 0.07348
##
## $classif.rpart.minsplit
## [1] 5
instance$result_y
## classif.ce
##       0.02

If you are familiar with the original paper, you may have wondered how we just used Hyperband with a parameter ranging from 0.1 to 1.0. The answer is the internal rescaling of the budget parameter. mlr3hyperband automatically divides the budget parameter's boundaries by its lower bound, ending up with a budget range that again starts at 1, as in the original algorithm. If we want an overview of the bracket layout Hyperband created and how the rescaling in each bracket worked, we can print a compact table with this information.

unique(as.data.table(instance$archive)[, .(bracket, bracket_stage, budget_scaled, budget_real, n_configs)])
##    bracket bracket_stage budget_scaled budget_real n_configs
## 1:       2             0         1.111      0.1111         9
## 2:       2             1         3.333      0.3333         3
## 3:       2             2        10.000      1.0000         1
## 4:       1             0         3.333      0.3333         5
## 5:       1             1        10.000      1.0000         1
## 6:       0             0        10.000      1.0000         3
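The budgets in this table can be retraced by hand. The sketch below assumes, as described above, that the budget bounds are divided by the lower bound and that the stage budgets within the rescaled range grow by the factor $$\eta$$. It is not taken from the mlr3hyperband source, and all variable names are illustrative.

```r
lower = 0.1  # lower bound of subsample.frac
upper = 1    # upper bound of subsample.frac
eta = 3

# rescaled maximum budget R, so that the budget range starts at 1
R = upper / lower

# index of the most aggressive bracket; the small offset guards against
# floating point error in the logarithm
s_max = floor(log(R, base = eta) + 1e-8)

# stage budgets on the rescaled and on the original scale
budget_scaled = R / eta^(s_max:0)
budget_real = budget_scaled * lower

print(data.frame(
  bracket_stage = 0:s_max,
  budget_scaled = round(budget_scaled, 3),
  budget_real   = round(budget_real, 4)
))
```

The three values correspond to the stages of bracket 2 in the table; brackets 1 and 0 start at a later point of the same budget sequence.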

Traditionally, Hyperband uses uniform sampling to draw the configurations at the start of each bracket. But it is also possible to define a custom Sampler for each hyperparameter.

search_space = ps(
  nrounds = p_int(lower = 1, upper = 16, tags = "budget"),
  eta = p_dbl(lower = 0, upper = 1),
  booster = p_fct(levels = c("gbtree", "gblinear", "dart"))
)

instance = TuningInstanceSingleCrit$new(
  task = tsk("iris"),
  learner = lrn("classif.xgboost"),
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  terminator = trm("none"), # hyperband terminates itself
  search_space = search_space
)

# beta distribution with alpha = 2 and beta = 5
# categorical distribution with custom probabilities
sampler = SamplerJointIndep$new(list(
  Sampler1DRfun$new(search_space$params$eta, function(n) rbeta(n, 2, 5)),
  Sampler1DCateg$new(search_space$params$booster, prob = c(0.2, 0.3, 0.5))
))
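To get a feeling for what these two samplers draw, the underlying distributions can be inspected directly in base R without any tuning. This sketch only uses `rbeta()` and `sample()`; the sample size and the variable names are arbitrary choices for illustration.

```r
set.seed(123)
n = 10000

# draws for eta: Beta(2, 5) concentrates its mass on small values in (0, 1)
eta_draws = rbeta(n, 2, 5)

# draws for booster: categorical distribution with custom probabilities
booster_levels = c("gbtree", "gblinear", "dart")
probs = c(0.2, 0.3, 0.5)
booster_draws = sample(booster_levels, n, replace = TRUE, prob = probs)
freq = prop.table(table(factor(booster_draws, levels = booster_levels)))

# the empirical mean should be close to alpha / (alpha + beta) = 2 / 7
print(mean(eta_draws))
# the empirical frequencies should be close to the chosen probabilities
print(freq)
```

Compared with uniform sampling, about half of the sampled configurations would use the dart booster, and small values of eta are favoured.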

Then, the defined sampler has to be passed as an argument when the tuner is created. Afterwards, tuning can proceed as usual.

tuner = tnr("hyperband", eta = 2, sampler = sampler)
set.seed(123)
tuner$optimize(instance)
## [14:22:09] WARNING: amalgamation/../src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'multi:softprob' was changed from 'merror' to 'mlogloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
##    nrounds    eta booster learner_param_vals  x_domain classif.ce
## 1:       1 0.2415    dart          <list[5]> <list[3]>       0.04
instance$result
##    nrounds    eta booster learner_param_vals  x_domain classif.ce
## 1:       1 0.2415    dart          <list[5]> <list[3]>       0.04

Furthermore, we extended the original algorithm to make it possible to use mlr3hyperband for multi-objective optimization. To do this, simply specify more measures in the TuningInstanceMultiCrit and run the rest as usual.

instance = TuningInstanceMultiCrit$new(
  task = tsk("pima"),
  learner = lrn("classif.xgboost"),
  resampling = rsmp("holdout"),
  measures = msrs(c("classif.tpr", "classif.fpr")),
  terminator = trm("none"), # hyperband terminates itself
  search_space = search_space
)

tuner = tnr("hyperband", eta = 4)
tuner$optimize(instance)
## [14:22:20] WARNING: amalgamation/../src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
## [14:22:20] WARNING: amalgamation/../src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
## [14:22:20] WARNING: amalgamation/../src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
##     nrounds     eta  booster learner_param_vals  x_domain classif.tpr
##  1:       1 0.34093 gblinear          <list[5]> <list[3]>      0.0000
##  2:       1 0.50700 gblinear          <list[5]> <list[3]>      0.0000
##  3:       1 0.77369 gblinear          <list[5]> <list[3]>      0.0000
##  4:       1 0.55967 gblinear          <list[5]> <list[3]>      0.0000
##  5:       1 0.46843 gblinear          <list[5]> <list[3]>      0.0000
##  6:       1 0.27922 gblinear          <list[5]> <list[3]>      0.0000
##  7:       1 0.79274 gblinear          <list[5]> <list[3]>      0.0000
##  8:       1 0.58557 gblinear          <list[5]> <list[3]>      0.0000
##  9:       1 0.39476 gblinear          <list[5]> <list[3]>      0.0000
## 10:       1 0.58145 gblinear          <list[5]> <list[3]>      0.0000
## 11:      16 0.01919   gbtree          <list[5]> <list[3]>      0.6374
## 12:       4 0.55531     dart          <list[5]> <list[3]>      0.5824
## 13:       4 0.65920   gbtree          <list[5]> <list[3]>      0.6044
## 14:      16 0.13587     dart          <list[5]> <list[3]>      0.6264
##     classif.fpr
##  1:      0.0000
##  2:      0.0000
##  3:      0.0000
##  4:      0.0000
##  5:      0.0000
##  6:      0.0000
##  7:      0.0000
##  8:      0.0000
##  9:      0.0000
## 10:      0.0000
## 11:      0.2364
## 12:      0.1576
## 13:      0.1758
## 14:      0.1818

Now the result is not a single best configuration but an estimated Pareto front. All red points are not dominated by another parameter configuration regarding their fpr and tpr performance measures.

instance$result
##     nrounds     eta  booster learner_param_vals  x_domain classif.tpr
##  1:       1 0.34093 gblinear          <list[5]> <list[3]>      0.0000
##  2:       1 0.50700 gblinear          <list[5]> <list[3]>      0.0000
##  3:       1 0.77369 gblinear          <list[5]> <list[3]>      0.0000
##  4:       1 0.55967 gblinear          <list[5]> <list[3]>      0.0000
##  5:       1 0.46843 gblinear          <list[5]> <list[3]>      0.0000
##  6:       1 0.27922 gblinear          <list[5]> <list[3]>      0.0000
##  7:       1 0.79274 gblinear          <list[5]> <list[3]>      0.0000
##  8:       1 0.58557 gblinear          <list[5]> <list[3]>      0.0000
##  9:       1 0.39476 gblinear          <list[5]> <list[3]>      0.0000
## 10:       1 0.58145 gblinear          <list[5]> <list[3]>      0.0000
## 11:      16 0.01919   gbtree          <list[5]> <list[3]>      0.6374
## 12:       4 0.55531     dart          <list[5]> <list[3]>      0.5824
## 13:       4 0.65920   gbtree          <list[5]> <list[3]>      0.6044
## 14:      16 0.13587     dart          <list[5]> <list[3]>      0.6264
##     classif.fpr
##  1:      0.0000
##  2:      0.0000
##  3:      0.0000
##  4:      0.0000
##  5:      0.0000
##  6:      0.0000
##  7:      0.0000
##  8:      0.0000
##  9:      0.0000
## 10:      0.0000
## 11:      0.2364
## 12:      0.1576
## 13:      0.1758
## 14:      0.1818

plot(classif.tpr~classif.fpr, instance$archive$data)
points(classif.tpr~classif.fpr, instance$result, col = "red")
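To make the notion of "not dominated" concrete, the following base R sketch marks the Pareto-optimal points among a handful of (tpr, fpr) pairs. It is only an illustration: the helper `is_pareto_optimal()` is hypothetical and is not part of mlr3tuning or bbotk, and the fifth point is made up so that at least one dominated configuration appears.

```r
# Hypothetical helper (not part of mlr3tuning or bbotk).
# Returns TRUE for every point that no other point dominates,
# where tpr is maximized and fpr is minimized.
is_pareto_optimal = function(tpr, fpr) {
  vapply(seq_along(tpr), function(i) {
    at_least_as_good = tpr >= tpr[i] & fpr <= fpr[i]
    strictly_better = tpr > tpr[i] | fpr < fpr[i]
    !any(at_least_as_good & strictly_better)
  }, logical(1))
}

# Four (tpr, fpr) pairs taken from the result table, plus one made-up
# point (0.50, 0.20) that is dominated by (0.5824, 0.1576).
tpr = c(0.6374, 0.5824, 0.6044, 0.6264, 0.50)
fpr = c(0.2364, 0.1576, 0.1758, 0.1818, 0.20)

front = is_pareto_optimal(tpr, fpr)
print(data.frame(tpr, fpr, front))
```

The same rule explains why the gblinear rows with tpr and fpr both equal to 0 are part of the result: no other configuration reaches a lower fpr, so none of them is dominated.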

## 3.5 Feature Selection / Filtering

Often, data sets include a large number of features. The technique of extracting a subset of relevant features is called “feature selection”.

The objective of feature selection is to fit a model on a sparse subset of the available features that is most suitable for the prediction problem. Feature selection can enhance the interpretability of the model, speed up the learning process and improve the learner performance. Different approaches exist to identify the relevant features. Two approaches are emphasized in the literature: filtering, and feature subset selection, which is often referred to as wrapper methods.

What are the differences?

• Filtering: An external algorithm computes a rank of the features (e.g. based on the correlation to the response). Then, features are subsetted by a certain criterion, e.g. an absolute number or a percentage of the number of variables. The selected features will then be used to fit a model (with optional hyperparameters selected by tuning). This calculation is usually cheaper than “feature subset selection” in terms of computation time. All filters are available through the mlr3filters package.

• Wrapper Methods: Here, no ranking of features is done. Instead, an optimization algorithm selects a subset of the features, evaluates the set by calculating the resampled predictive performance, and then proposes a new set of features (or terminates). A simple example is sequential forward selection. This method is usually computationally very intensive as a lot of models are fitted. Also, strictly speaking, all these models would need to be tuned before the performance is estimated. This would require an additional nested level in a CV setting. After all of these steps have been undertaken, a model is fitted again on the final set of selected features (with optional hyperparameters selected by tuning). Wrapper methods are implemented in the mlr3fselect package.

• Embedded Methods: Many learners internally select a subset of the features which they find helpful for prediction. These subsets can usually be queried, as the following example demonstrates:

library("mlr3verse")

# iris task, matching the features shown in the output below
task = tsk("iris")
learner = lrn("classif.rpart")

# ensure that the learner selects features
stopifnot("selected_features" %in% learner$properties)

# fit a simple classification tree
learner = learner$train(task)

# extract all features used in the classification tree:
learner$selected_features()
## [1] "Petal.Length" "Petal.Width"

There are also ensemble filters built upon the idea of stacking single filter methods. These are not yet implemented.

### 3.5.1 Filters

Filter methods assign an importance value to each feature. Based on these values the features can be ranked. Thereafter, we are able to select a feature subset. There is a list of all implemented filter methods in the Appendix.

### 3.5.2 Calculating filter values

Currently, only classification and regression tasks are supported.

The first step is to create a new R object using the class of the desired filter method. Similar to other instances in mlr3, these are registered in a dictionary (mlr_filters) with an associated shortcut function flt(). Each object of class Filter has a .$calculate() method which computes the filter values and ranks them in descending order.

filter = flt("jmim")

filter$calculate(task)

as.data.table(filter)
##         feature  score
## 1:  Petal.Width 1.0000
## 2: Sepal.Length 0.6667
## 3: Petal.Length 0.3333
## 4:  Sepal.Width 0.0000

Some filters support changing specific hyperparameters. This is similar to setting hyperparameters of a Learner using .$param_set$values:

filter_cor = flt("correlation")
filter_cor$param_set
## <ParamSet>
##        id    class lower upper nlevels    default value
## 1:    use ParamFct    NA    NA       5 everything
## 2: method ParamFct    NA    NA       3    pearson
# change parameter 'method'
filter_cor$param_set$values = list(method = "spearman")
filter_cor$param_set
## <ParamSet>
##        id    class lower upper nlevels    default    value
## 1:    use ParamFct    NA    NA       5 everything
## 2: method ParamFct    NA    NA       3    pearson spearman

### 3.5.3 Variable Importance Filters

All learners with the property “importance” come with integrated feature selection methods.

You can find a list of all learners with this property in the Appendix.

For some learners the desired filter method needs to be set during learner creation. For example, learner classif.ranger comes with multiple integrated methods, cf. the help page of ranger::ranger(). To use method “impurity”, you need to set the filter method during construction.

lrn = lrn("classif.ranger", importance = "impurity")

Now you can use the FilterImportance filter class for algorithm-embedded methods:

task = tsk("iris")
filter = flt("importance", learner = lrn)
filter$calculate(task)
head(as.data.table(filter), 3)
##         feature score
## 1: Petal.Length 45.21
## 2:  Petal.Width 42.65
## 3: Sepal.Length  9.29
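The filter workflow described in this section (score every feature, rank the scores, keep a subset) can also be sketched without any extension package. The example below is a minimal base R illustration under stated assumptions: it uses the built-in mtcars data with mpg as an arbitrarily chosen target, an absolute Pearson correlation as the score, and a cut-off of three features. It does not call mlr3filters and is not meant to reproduce its internals.

```r
# Illustration in base R only; mlr3filters is not used here.
data = mtcars
target = "mpg"
features = setdiff(names(data), target)

# Filter score: absolute Pearson correlation between each feature and the target.
scores = vapply(features, function(f) abs(cor(data[[f]], data[[target]])), numeric(1))

# Rank the scores in descending order.
scores = sort(scores, decreasing = TRUE)

# Keep an absolute number of features (here the top 3).
selected = names(scores)[1:3]
print(round(scores, 3))
print(selected)

# The reduced feature set can then be passed to any model.
model = lm(reformulate(selected, response = target), data = data)
```

Choosing a percentage of the features instead of an absolute number only changes the line that builds `selected`.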

### 3.5.4 Wrapper Methods

Wrapper feature selection is supported via the mlr3fselect extension package. At the heart of mlr3fselect are the R6 classes:

### 3.5.5 The FSelectInstance Classes

The following sub-section examines the feature selection on the Pima data set which is used to predict whether or not a patient has diabetes.

task = tsk("pima")
print(task)
## <TaskClassif:pima> (768 x 9)
## * Target: diabetes
## * Properties: twoclass
## * Features (8):
##   - dbl (8): age, glucose, insulin, mass, pedigree, pregnant, pressure,
##     triceps

We use the classification tree from rpart.

learner = lrn("classif.rpart")

Next, we need to specify how to evaluate the performance of the feature subsets. For this, we need to choose a resampling strategy and a performance measure.

hout = rsmp("holdout")
measure = msr("classif.ce")

Finally, one has to choose the available budget for the feature selection. This is done by selecting one of the available Terminators:

For this short introduction, we specify a budget of 20 evaluations and then put everything together into a FSelectInstanceSingleCrit:

evals20 = trm("evals", n_evals = 20)

instance = FSelectInstanceSingleCrit$new(
  task = task,
  learner = learner,
  resampling = hout,
  measure = measure,
  terminator = evals20
)
instance
## <FSelectInstanceSingleCrit>
## * State: Not optimized
## * Objective: <ObjectiveFSelect:classif.rpart_on_pima>
## * Search Space:
## <ParamSet>
##          id    class lower upper nlevels        default value
## 1:      age ParamLgl    NA    NA       2 <NoDefault[3]>
## 2:  glucose ParamLgl    NA    NA       2 <NoDefault[3]>
## 3:  insulin ParamLgl    NA    NA       2 <NoDefault[3]>
## 4:     mass ParamLgl    NA    NA       2 <NoDefault[3]>
## 5: pedigree ParamLgl    NA    NA       2 <NoDefault[3]>
## 6: pregnant ParamLgl    NA    NA       2 <NoDefault[3]>
## 7: pressure ParamLgl    NA    NA       2 <NoDefault[3]>
## 8:  triceps ParamLgl    NA    NA       2 <NoDefault[3]>
## * Terminator: <TerminatorEvals>
## * Terminated: FALSE
## * Archive:
## <ArchiveFSelect>
## Null data.table (0 rows and 0 cols)

To start the feature selection, we still need to select an algorithm. Algorithms are defined via the FSelector class.

### 3.5.6 The FSelector Class

The following algorithms are currently implemented in mlr3fselect:

In this example, we will use a simple random search and retrieve it from the dictionary mlr_fselectors with the fs() function:

fselector = fs("random_search")

### 3.5.7 Triggering the Tuning

To start the feature selection, we simply pass the FSelectInstanceSingleCrit to the $optimize() method of the initialized FSelector. The algorithm proceeds as follows:

1. The FSelector proposes at least one feature subset and may propose multiple subsets to improve parallelization (which can be controlled via the setting batch_size).
2. For each feature subset, the given Learner is fitted on the Task using the provided Resampling. All evaluations are stored in the archive of the FSelectInstanceSingleCrit.
3. The Terminator is queried if the budget is exhausted. If the budget is not exhausted, restart with step 1 until it is.
4. Determine the feature subset with the best observed performance.
5. Store the best feature subset as the result in the instance object. The best feature subset ($result_feature_set) and the corresponding measured performance ($result_y) can be accessed from the instance.

# reduce logging output
lgr::get_logger("bbotk")$set_threshold("warn")

fselector$optimize(instance)
##     age glucose insulin mass pedigree pregnant pressure triceps
## 1: TRUE    TRUE    TRUE TRUE     TRUE     TRUE     TRUE    TRUE
##                                          features classif.ce
## 1: age,glucose,insulin,mass,pedigree,pregnant,...      0.207
instance$result_feature_set
## [1] "age"      "glucose"  "insulin"  "mass"     "pedigree" "pregnant" "pressure"
## [8] "triceps"
instance$result_y
## classif.ce
##      0.207
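The five steps of the optimization loop listed earlier in this subsection can be imitated in a few lines of base R. The sketch below is a stand-in under stated assumptions, not the mlr3fselect implementation: it uses the built-in mtcars data, a linear model, a single holdout split, mean squared error as the measure, and an arbitrary pool of six candidate features. The names `evaluate`, `budget` and `batch_size` are chosen for illustration only.

```r
set.seed(1)
data = mtcars
target = "mpg"
features = c("cyl", "disp", "hp", "drat", "wt", "qsec")  # arbitrary candidate pool

# Holdout split: two thirds of the rows for training, the rest for testing.
train_idx = sample(nrow(data), size = floor(2 / 3 * nrow(data)))

# Fit a linear model on the training rows and return the test-set MSE.
evaluate = function(subset) {
  model = lm(reformulate(subset, response = target), data = data[train_idx, ])
  pred = predict(model, newdata = data[-train_idx, ])
  mean((data[-train_idx, target] - pred)^2)
}

budget = 20      # plays the role of the terminator (20 evaluations)
batch_size = 4   # number of subsets proposed per batch
archive_sets = list()
archive_mse = numeric(0)

while (length(archive_mse) < budget) {          # step 3: check the budget
  for (b in seq_len(batch_size)) {
    # step 1: propose a random, non-empty feature subset
    mask = runif(length(features)) < 0.5
    if (!any(mask)) mask[sample(length(features), 1)] = TRUE
    subset = features[mask]
    # step 2: evaluate the subset and store the result in the archive
    archive_sets[[length(archive_sets) + 1]] = subset
    archive_mse = c(archive_mse, evaluate(subset))
  }
}

# steps 4 and 5: pick and store the best observed subset
best_set = archive_sets[[which.min(archive_mse)]]
best_mse = min(archive_mse)
print(best_set)
print(best_mse)
```

Because the subsets are drawn at random, the same subset can be evaluated more than once, which is also visible in the archive of the mlr3fselect run shown below.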

One can investigate all resamplings which were undertaken, as they are stored in the archive of the FSelectInstanceSingleCrit and can be accessed by using as.data.table():

as.data.table(instance$archive)
##       age glucose insulin  mass pedigree pregnant pressure triceps classif.ce
##  1:  TRUE   FALSE   FALSE FALSE    FALSE    FALSE    FALSE   FALSE     0.3242
##  2:  TRUE    TRUE    TRUE FALSE     TRUE     TRUE     TRUE    TRUE     0.2227
##  3: FALSE   FALSE   FALSE FALSE    FALSE     TRUE    FALSE   FALSE     0.3477
##  4: FALSE    TRUE    TRUE  TRUE     TRUE     TRUE    FALSE    TRUE     0.2734
##  5: FALSE    TRUE   FALSE FALSE     TRUE    FALSE    FALSE   FALSE     0.2617
##  6:  TRUE    TRUE    TRUE  TRUE    FALSE     TRUE     TRUE    TRUE     0.2188
##  7:  TRUE   FALSE   FALSE FALSE    FALSE     TRUE    FALSE   FALSE     0.3203
##  8:  TRUE   FALSE    TRUE FALSE     TRUE     TRUE    FALSE   FALSE     0.3125
##  9: FALSE    TRUE   FALSE  TRUE    FALSE     TRUE     TRUE   FALSE     0.2500
## 10:  TRUE   FALSE   FALSE FALSE    FALSE    FALSE     TRUE   FALSE     0.3672
## 11: FALSE   FALSE   FALSE  TRUE    FALSE    FALSE    FALSE    TRUE     0.3945
## 12:  TRUE   FALSE    TRUE  TRUE    FALSE     TRUE    FALSE    TRUE     0.3008
## 13:  TRUE    TRUE   FALSE  TRUE    FALSE     TRUE    FALSE    TRUE     0.2344
## 14: FALSE   FALSE   FALSE FALSE    FALSE     TRUE    FALSE   FALSE     0.3477
## 15:  TRUE    TRUE    TRUE FALSE     TRUE     TRUE     TRUE   FALSE     0.2227
## 16:  TRUE    TRUE    TRUE  TRUE     TRUE     TRUE     TRUE    TRUE     0.2070
## 17: FALSE    TRUE   FALSE FALSE     TRUE     TRUE    FALSE   FALSE     0.2461
## 18:  TRUE    TRUE    TRUE  TRUE     TRUE     TRUE     TRUE    TRUE     0.2070
## 19:  TRUE   FALSE    TRUE  TRUE     TRUE    FALSE     TRUE    TRUE     0.3086
## 20:  TRUE    TRUE   FALSE FALSE     TRUE     TRUE     TRUE    TRUE     0.2266
##     runtime_learners           timestamp batch_nr      resample_result
##  1:            0.070 2021-09-19 14:22:30        1 <ResampleResult[20]>
##  2:            0.056 2021-09-19 14:22:30        2 <ResampleResult[20]>
##  3:            0.053 2021-09-19 14:22:30        3 <ResampleResult[20]>
##  4:            0.060 2021-09-19 14:22:31        4 <ResampleResult[20]>
##  5:            0.062 2021-09-19 14:22:31        5 <ResampleResult[20]>
##  6:            0.063 2021-09-19 14:22:31        6 <ResampleResult[20]>
##  7:            0.054 2021-09-19 14:22:31        7 <ResampleResult[20]>
##  8:            0.055 2021-09-19 14:22:31        8 <ResampleResult[20]>
##  9:            0.062 2021-09-19 14:22:32        9 <ResampleResult[20]>
## 10:            0.065 2021-09-19 14:22:32       10 <ResampleResult[20]>
## 11:            0.067 2021-09-19 14:22:32       11 <ResampleResult[20]>
## 12:            0.069 2021-09-19 14:22:32       12 <ResampleResult[20]>
## 13:            0.061 2021-09-19 14:22:33       13 <ResampleResult[20]>
## 14:            0.053 2021-09-19 14:22:33       14 <ResampleResult[20]>
## 15:            0.057 2021-09-19 14:22:33       15 <ResampleResult[20]>
## 16:            0.073 2021-09-19 14:22:33       16 <ResampleResult[20]>
## 17:            0.058 2021-09-19 14:22:33       17 <ResampleResult[20]>
## 18:            0.062 2021-09-19 14:22:34       18 <ResampleResult[20]>
## 19:            0.071 2021-09-19 14:22:34       19 <ResampleResult[20]>
## 20:            0.062 2021-09-19 14:22:34       20 <ResampleResult[20]>

The associated resampling iterations can be accessed in the BenchmarkResult:

instance$archive$benchmark_result$data
## Warning: '.__BenchmarkResult__data' is deprecated.
## See help("Deprecated")
## <ResultData>
##   Public:
##     as_data_table: function (view = NULL, reassemble_learners = TRUE, convert_predictions = TRUE,
##     clone: function (deep = FALSE)
##     combine: function (rdata)
##     data: list
##     initialize: function (data = NULL, store_backends = TRUE)
##     iterations: function (view = NULL)
##     learners: function (view = NULL, states = TRUE, reassemble = TRUE)
##     logs: function (view = NULL, condition)
##     prediction: function (view = NULL, predict_sets = "test")
##     predictions: function (view = NULL, predict_sets = "test")
##     resamplings: function (view = NULL)
##     sweep: function ()
##     tasks: function (view = NULL)
##     uhashes: function (view = NULL)
##   Private:
##     deep_clone: function (name, value)
##     get_view_index: function (view)

The uhash column links the resampling iterations to the evaluated feature subsets stored in instance$archive$data(). This makes it possible, for example, to score the included ResampleResults on a different measure.

Now the optimized feature subset can be used to subset the task and fit the model on all observations.

task$select(instance$result_feature_set)
learner$train(task)

The trained model can now be used to make a prediction on external data. Note that predicting on observations present in the task should be avoided. The model has seen these observations already during feature selection and therefore results would be statistically biased. Hence, the resulting performance measure would be over-optimistic. Instead, to get statistically unbiased performance estimates for the current task, nested resampling is required.

### 3.5.8 Automating the Feature Selection

The AutoFSelector wraps a learner and augments it with an automatic feature selection for a given task. Because the AutoFSelector itself inherits from the Learner base class, it can be used like any other learner. Analogously to the previous subsection, a new classification tree learner is created. This classification tree learner automatically starts a feature selection on the given task using an inner resampling (holdout). We create a terminator which allows 10 evaluations and use a simple random search as the feature selection algorithm:

learner = lrn("classif.rpart")
terminator = trm("evals", n_evals = 10)
fselector = fs("random_search")

at = AutoFSelector$new(
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  terminator = terminator,
  fselector = fselector
)
at
## <AutoFSelector:classif.rpart.fselector>
## * Model: -
## * Parameters: xval=0
## * Packages: rpart
## * Predict Type: response
## * Feature types: logical, integer, numeric, factor, ordered
## * Properties: importance, missings, multiclass, selected_features,
##   twoclass, weights

We can now use the learner like any other learner, calling the $train() and $predict() methods. This time, however, we pass it to benchmark() to compare the optimized feature subset to the complete feature set. This way, the AutoFSelector will do its resampling for feature selection on the training set of the respective split of the outer resampling. The learner then makes predictions on the test set of the outer resampling. This yields unbiased performance measures, as the observations in the test set have not been used during feature selection or fitting of the respective learner. This is called nested resampling.

To compare the optimized feature subset with the complete feature set, we can use benchmark():

grid = benchmark_grid(
  task = tsk("pima"),
  learner = list(at, lrn("classif.rpart")),
  resampling = rsmp("cv", folds = 3)
)

bmr = benchmark(grid, store_models = TRUE)
bmr$aggregate(msrs(c("classif.ce", "time_train")))
##    nr      resample_result task_id              learner_id resampling_id iters
## 1:  1 <ResampleResult[20]>    pima classif.rpart.fselector            cv     3
## 2:  2 <ResampleResult[20]>    pima           classif.rpart            cv     3
##    classif.ce time_train
## 1:     0.2513          0
## 2:     0.2552          0

Note that we do not expect any significant differences since we only evaluated a small fraction of the possible feature subsets.
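The structure of nested resampling described in this subsection can be sketched in base R as well. The example below is an illustration under stated assumptions rather than what AutoFSelector and benchmark() do internally: it uses the built-in mtcars data, a linear model, mean squared error, an arbitrary pool of six candidate features, an inner holdout split with a random search of 10 evaluations, and an outer 3-fold cross-validation. The helper names `holdout_mse` and `select_features` are hypothetical.

```r
set.seed(42)
data = mtcars
target = "mpg"
features = c("cyl", "disp", "hp", "drat", "wt", "qsec")  # arbitrary candidate pool

# Mean squared error of a linear model fitted on `train` and scored on `test`.
holdout_mse = function(subset, train, test) {
  model = lm(reformulate(subset, response = target), data = train)
  mean((test[[target]] - predict(model, newdata = test))^2)
}

# Inner loop: random-search feature selection that only sees the rows in `train`.
select_features = function(train, n_evals = 10) {
  inner_idx = sample(nrow(train), size = floor(2 / 3 * nrow(train)))
  best = list(subset = features, mse = Inf)
  for (i in seq_len(n_evals)) {
    subset = features[runif(length(features)) < 0.5]
    if (length(subset) == 0) next  # an empty proposal still uses up one evaluation
    mse = holdout_mse(subset, train[inner_idx, ], train[-inner_idx, ])
    if (mse < best$mse) best = list(subset = subset, mse = mse)
  }
  best$subset
}

# Outer loop: 3-fold cross-validation. The test fold is never passed to the
# feature selection, so the outer estimate is not biased by the selection.
folds = sample(rep(1:3, length.out = nrow(data)))
outer_mse = vapply(1:3, function(k) {
  train = data[folds != k, ]
  test = data[folds == k, ]
  subset = select_features(train)
  holdout_mse(subset, train, test)
}, numeric(1))

print(outer_mse)
print(mean(outer_mse))
```

Each outer fold may end up with a different feature subset. The aggregated outer score therefore estimates the performance of the whole procedure (selection plus fitting), not of one particular subset.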