3 Model Optimization
Machine learning algorithms have default values set for their hyperparameters. Irrespective, these hyperparameters need to be changed by the user to achieve optimal performance on the given dataset. A manual selection of hyperparameter values is not recommended as this approach rarely leads to the best performance. To substantiate the validity of the selected hyperparameters (= tuning), data-driven optimization is recommended. In order to tune a machine learning algorithm, one has to specify (1) the search space, (2) the optimization algorithm (aka tuning method), (3) an evaluation method, i.e., a resampling strategy and (4) a performance measure.
In summary, the sub-chapter on tuning illustrates how to:
- undertake empirically sound hyperparameter selection
- select the optimizing algorithm
- write out search spaces concisely
- trigger the tuning
- automate tuning
This sub-chapter also requires the package mlr3tuning, an extension package which supports hyperparameter tuning.
The second part of this chapter explains feature selection, also known as variable selection. Feature selection is the process of finding a subset of relevant features of the data. Some of the reasons to perform the selection:
- enhance the interpretability of the model,
- speed up model fitting or
- improve the learner performance by reducing noise in the data.
In this book we focus mainly on the last aspect. Different approaches exist to identify the relevant features. In the sub-chapter on feature selection, we emphasize three methods:
- Filter algorithms select features independently of the learner according to a score.
- Variable importance filters select features that are important according to a learner.
- Wrapper methods iteratively select features to optimize a performance measure.
Note, that filters do not require a learner. Variable importance filters require a learner that can calculate feature importance values once it is trained. The obtained importance values can be used to subset the data, which can then be used to train a learner. Wrapper methods can be used with any learner but need to train the learner multiple times.
In order to get a good estimate of generalization performance and avoid data leakage, both an outer (performance) and an inner (tuning/feature selection) resampling process are necessary. The following features are discussed in this chapter:
- Inner and outer resampling strategies in nested resampling
- The execution of nested resampling
- The evaluation of executed resampling iterations
This sub-chapter will provide instructions on how to implement nested resampling, accounting for both inner and outer resampling in mlr3.