3.1 Hyperparameter Tuning
Tuning is built around two main classes:

- TuningInstance: This class describes the tuning problem and stores the results.
- Tuner: This class is the base class for implementations of tuning algorithms.
The following subsection examines the optimization of a simple classification tree on the
Pima Indian Diabetes data set.
We use the classification tree from rpart and choose a subset of the hyperparameters we want to tune. This is often referred to as the “tuning space”.
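The objects task, learner, hout, and measure are referenced when constructing the tuning instance below; a minimal setup sketch, consistent with the shortcuts used in the rest of this section:

```r
library(mlr3)

task = tsk("pima")              # Pima Indian Diabetes classification task
learner = lrn("classif.rpart")  # classification tree from rpart
hout = rsmp("holdout")          # holdout resampling for evaluating configurations
measure = msr("classif.ce")     # classification error as tuning measure
```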
Here, we opt to tune two hyperparameters: the complexity parameter cp and the termination criterion minsplit.
As the tuning space has to be bounded, one needs to set lower and upper bounds:
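A sketch of the corresponding search space, matching the bounds printed in the tuning instance below:

```r
library(paradox)

# search space: cp on [0.001, 0.1], minsplit on {1, ..., 10}
tune_ps = ParamSet$new(list(
  ParamDbl$new("cp", lower = 0.001, upper = 0.1),
  ParamInt$new("minsplit", lower = 1, upper = 10)
))
```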
Finally, one has to select the available budget to solve this tuning instance.
This is done by selecting one of the available Terminators:

- Terminate after a given time (TerminatorClockTime)
- Terminate after a given number of iterations (TerminatorEvals)
- Terminate after a specific performance is reached (TerminatorPerfReached)
- Terminate when tuning does not improve (TerminatorStagnation)
- A combination of the above in an ALL or ANY fashion, using TerminatorCombo
For this short introduction, we grant a budget of 20 evaluations and then put everything together into a TuningInstance:
```r
library(mlr3tuning)

evals20 = term("evals", n_evals = 20)

instance = TuningInstance$new(
  task = task,
  learner = learner,
  resampling = hout,
  measures = measure,
  param_set = tune_ps,
  terminator = evals20
)
print(instance)
## <TuningInstance>
## * Task: <TaskClassif:pima>
## * Learner: <LearnerClassifRpart:classif.rpart>
## * Measures: classif.ce
## * Resampling: <ResamplingHoldout>
## * Terminator: <TerminatorEvals>
## * bm_args: list()
## ParamSet:
##          id    class lower upper levels     default value
## 1:       cp ParamDbl 0.001   0.1        <NoDefault>
## 2: minsplit ParamInt 1.000  10.0        <NoDefault>
## Archive:
## Empty data.table (0 rows and 11 cols): nr,batch_nr,resample_result,task_id,learner_id,resampling_id...
```
To start the tuning, we still need to select how the optimization should take place.
In other words, we need to choose the optimization algorithm via the Tuner class.
The following algorithms are currently implemented in mlr3tuning:
- Grid Search (TunerGridSearch)
- Random Search (TunerRandomSearch) (Bergstra and Bengio 2012)
- Generalized Simulated Annealing (TunerGenSA)
In this example, we will use a simple grid search with a grid resolution of 5:
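A sketch of constructing this tuner via the tnr() shortcut; the "grid_search" key and resolution argument are assumed here, analogous to the tnr("random_search") call used later in this section:

```r
# grid search tuner with 5 grid points per parameter
tuner = tnr("grid_search", resolution = 5)
```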
Since we have only numeric parameters,
TunerGridSearch will create a grid of equally-sized steps between the respective upper and lower bounds.
As we have two hyperparameters with a resolution of 5, the two-dimensional grid consists of \(5^2 = 25\) configurations.
Each configuration serves as a hyperparameter setting for the classification tree and triggers a holdout validation on the task.
All configurations will be examined by the tuner (in a random order), until either all configurations are evaluated or the
Terminator signals that the budget is exhausted.
3.1.3 Triggering the Tuning
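To trigger the tuning, the TuningInstance is passed to the $tune() method of the Tuner; a minimal sketch, assuming the grid search tuner created above and the mlr3tuning interface used throughout this section:

```r
# start the optimization loop described below
tuner$tune(instance)
```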
The tuning works as follows:

1. The Tuner proposes at least one hyperparameter configuration (the Tuner may propose multiple points to improve parallelization, which can be controlled via the setting batch_size).
2. For each configuration, a Learner is fitted on the Task using the provided Resampling. The results are combined with other results from previous iterations into a single BenchmarkResult.
3. The Terminator is queried if the budget is exhausted. If the budget is not exhausted, restart with 1) until it is.
4. Determine the configuration with the best observed performance.
5. Return a named list with the hyperparameter settings ("values") and the corresponding measured performance ("performance").
One can investigate all resamplings that were undertaken using the $archive() method of the TuningInstance.
Here, we just extract the performance values and the hyperparameters:
```r
instance$archive(unnest = "params")[, c("cp", "minsplit", "classif.ce")]
##          cp minsplit classif.ce
##  1: 0.00100       10     0.2500
##  2: 0.00100        5     0.2773
##  3: 0.02575        8     0.2422
##  4: 0.07525       10     0.2656
##  5: 0.05050        1     0.2344
##  6: 0.07525        8     0.2656
##  7: 0.10000        3     0.2656
##  8: 0.07525        1     0.2656
##  9: 0.10000       10     0.2656
## 10: 0.00100        3     0.2656
## 11: 0.05050       10     0.2344
## 12: 0.10000        5     0.2656
## 13: 0.05050        8     0.2344
## 14: 0.02575        3     0.2422
## 15: 0.05050        3     0.2344
## 16: 0.00100        1     0.2734
## 17: 0.07525        3     0.2656
## 18: 0.02575        1     0.2422
## 19: 0.10000        8     0.2656
## 20: 0.00100        8     0.2500
```
In sum, the grid search evaluated 20/25 different configurations of the grid in a random order before the
Terminator stopped the tuning.
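To deploy the best configuration found, one can set it on the learner and retrain on the full task; a sketch, assuming the instance exposes the best configuration via $result$params:

```r
# assumption: instance$result holds the best configuration after tuning
learner$param_set$values = instance$result$params
learner$train(task)
```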
The trained model could now be used to make a prediction on external data.
Note that predicting on observations present in the task is statistically biased and should be avoided.
The model has already seen these observations during the tuning process.
Hence, the resulting performance measure would be over-optimistic.
Instead, to get unbiased performance estimates for the current task, nested resampling is required.
3.1.4 Automating the Tuning
The AutoTuner wraps a learner and augments it with automatic tuning for a given set of hyperparameters.
As the AutoTuner itself inherits from the Learner base class, it can be used like any other learner.
Analogously to the previous subsection, a new classification tree learner is created.
This classification tree learner automatically tunes the parameters cp and minsplit using an inner resampling (holdout).
We create a terminator which allows 10 evaluations, and use a simple random search as tuning algorithm:
```r
library(paradox)
library(mlr3tuning)

learner = lrn("classif.rpart")
resampling = rsmp("holdout")
measures = msr("classif.ce")
tune_ps = ParamSet$new(list(
  ParamDbl$new("cp", lower = 0.001, upper = 0.1),
  ParamInt$new("minsplit", lower = 1, upper = 10)
))
terminator = term("evals", n_evals = 10)
tuner = tnr("random_search")

at = AutoTuner$new(
  learner = learner,
  resampling = resampling,
  measures = measures,
  tune_ps = tune_ps,
  terminator = terminator,
  tuner = tuner
)
at
## <AutoTuner:classif.rpart.tuned>
## * Model: -
## * Parameters: xval=0
## * Packages: rpart
## * Predict Type: response
## * Feature types: logical, integer, numeric, factor, ordered
## * Properties: importance, missings, multiclass, selected_features,
##   twoclass, weights
```
We can now use the learner like any other learner, calling the $train() and $predict() methods.
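Direct use would look like the sketch below; note that, as discussed above, evaluating predictions on the same observations used for tuning is biased, so this only illustrates the API:

```r
at$train(task)                 # runs the inner tuning, then refits with the best configuration
prediction = at$predict(task)  # illustration only: these observations were seen during tuning
```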
This time, however, we pass it to benchmark() to compare the tuner to a classification tree without tuning.
This way, the
AutoTuner will do its resampling for tuning on the training set of the respective split of the outer resampling.
The learner then predicts using the test set of the outer resampling.
This yields unbiased performance measures, as the observations in the test set have not been used during tuning or fitting of the respective learner.
This is called nested resampling.
To compare the tuned learner with the learner using its default hyperparameters, we can use benchmark_grid() and benchmark():
```r
grid = benchmark_grid(
  task = tsk("pima"),
  learner = list(at, lrn("classif.rpart")),
  resampling = rsmp("cv", folds = 3)
)
bmr = benchmark(grid)
bmr$aggregate(measures)
##    nr  resample_result task_id          learner_id resampling_id iters
## 1:  1 <ResampleResult>    pima classif.rpart.tuned            cv     3
## 2:  2 <ResampleResult>    pima       classif.rpart            cv     3
##    classif.ce
## 1:     0.2578
## 2:     0.2422
```
Note that we do not expect any differences compared to the non-tuned approach for multiple reasons:
- the task is too easy
- the task is rather small, and thus prone to overfitting
- the tuning budget (10 evaluations) is small
- rpart does not benefit that much from tuning
Bergstra, James, and Yoshua Bengio. 2012. “Random Search for Hyper-Parameter Optimization.” Journal of Machine Learning Research 13: 281–305.