3 Model Optimization
Machine learning algorithms ship with default values for their hyperparameters. These defaults rarely suit every dataset, so the user usually has to change them to achieve optimal performance on the data at hand. Selecting hyperparameter values manually is not recommended, as this approach rarely leads to optimal performance. Instead, the choice of hyperparameter values should be substantiated by data-driven optimization (= tuning). To tune a machine learning algorithm, one has to specify (1) the search space, (2) the optimization algorithm (also known as the tuning method), and (3) an evaluation method, i.e., a resampling strategy and a performance measure.
In summary, the sub-chapter on tuning illustrates how to:
- undertake empirically sound hyperparameter selection
- select the optimization algorithm
- trigger the tuning
- automate tuning
This sub-chapter requires the package mlr3tuning, an extension package which supports hyperparameter tuning.
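The three ingredients of tuning named above can be made concrete with a small, language-agnostic sketch. It is written in plain Python rather than mlr3, and everything in it is an assumption chosen for illustration: the toy data, the k-nearest-neighbour learner, and the helper names `make_data`, `knn_predict`, `mse`, `random_search`, and `holdout_mse`. It uses random search as the optimization algorithm and a single holdout split with mean squared error as the evaluation method.

```python
import random


def make_data(n, seed):
    """Toy regression data: y = x^2 plus Gaussian noise (illustration only)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        rows.append((x, x * x + rng.gauss(0.0, 0.1)))
    return rows


def knn_predict(train, x, k):
    """Predict y at x as the mean target of the k nearest training points."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(train, test, k):
    """Performance measure: mean squared error on the test rows."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in test) / len(test)


# (1) Search space: the hyperparameter k may take any integer from 1 to 30.
search_space = {"k": list(range(1, 31))}


# (2) Optimization algorithm: random search with a fixed evaluation budget.
def random_search(space, evaluate, n_evals, seed):
    rng = random.Random(seed)
    best = None
    for _ in range(n_evals):
        config = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(config)
        if best is None or score < best[1]:
            best = (config, score)
    return best


# (3) Evaluation method: a holdout split (resampling) scored by MSE (measure).
data = make_data(150, seed=1)
train, valid = data[:100], data[100:]


def holdout_mse(config):
    return mse(train, valid, config["k"])


best_config, best_score = random_search(search_space, holdout_mse, n_evals=10, seed=2)
print(best_config, best_score)
```

Swapping any one ingredient, for example a finer search space, a different search strategy, or cross-validation instead of a holdout split, leaves the other two untouched, which is why they are specified separately.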
The second part of this chapter explains feature selection. The objective of feature selection is to fit a sparse model that depends only on a suitable subset of the available features. Feature selection can enhance the interpretability of the model, speed up model fitting, and improve learner performance by reducing noise in the data. Different approaches exist to identify the relevant features. In the sub-chapter on feature selection, three approaches are emphasized:
- Feature selection using filter algorithms
- Feature selection via variable importance filters
- Feature selection by employing so-called wrapper methods
A fourth approach, feature selection via ensemble filters, is introduced subsequently.
The implementation of all four approaches in mlr3 is showcased using the extension-package
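As a rough illustration of the filter idea, the following sketch scores every feature independently of any learner and keeps the highest-scoring ones. It is plain Python, not mlr3 code; the absolute Pearson correlation as the filter score, the synthetic data, and the helper names `pearson`, `filter_scores`, and `select_top` are all assumptions made for this example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def filter_scores(features, target):
    """Score each feature by its absolute correlation with the target."""
    return {name: abs(pearson(column, target)) for name, column in features.items()}


def select_top(scores, n_keep):
    """Keep the names of the n_keep highest-scoring features."""
    return sorted(scores, key=scores.get, reverse=True)[:n_keep]


# Synthetic data: two informative features and two pure-noise features.
rng = random.Random(0)
n_rows = 500
features = {
    "signal_a": [rng.gauss(0.0, 1.0) for _ in range(n_rows)],
    "signal_b": [rng.gauss(0.0, 1.0) for _ in range(n_rows)],
    "noise_a": [rng.gauss(0.0, 1.0) for _ in range(n_rows)],
    "noise_b": [rng.gauss(0.0, 1.0) for _ in range(n_rows)],
}
target = [
    2.0 * a - 1.5 * b + rng.gauss(0.0, 0.5)
    for a, b in zip(features["signal_a"], features["signal_b"])
]

scores = filter_scores(features, target)
selected = select_top(scores, n_keep=2)
print(scores)
print(selected)
```

A wrapper method would differ in one key respect: instead of scoring features in isolation, it would repeatedly fit a learner on candidate feature subsets and compare their resampled performance.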
The third part of this chapter covers nested resampling. To obtain a good estimate of generalization performance and avoid data leakage, both an outer (performance) and an inner (tuning/feature selection) resampling process are necessary. The following topics are discussed:
- Inner and outer resampling strategies in nested resampling
- The execution of nested resampling
- The evaluation of executed resampling iterations
The sub-chapter on nested resampling provides instructions on how to implement nested resampling in mlr3, accounting for both inner and outer resampling.
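The interplay of the two resampling loops can be sketched as follows. This is a conceptual example in plain Python, not mlr3 code, and it rests on assumptions made for illustration: toy data, a k-nearest-neighbour learner, 3-fold cross-validation in both loops, and the helper names `make_data`, `knn_predict`, `mse`, `kfold`, `tune`, and `nested_resampling`. The point to note is that tuning only ever sees the outer training set, so the outer test set stays untouched until the final evaluation.

```python
import random


def make_data(n, seed):
    """Toy regression data: y = x^2 plus Gaussian noise (illustration only)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        rows.append((x, x * x + rng.gauss(0.0, 0.1)))
    return rows


def knn_predict(train, x, k):
    """Predict y at x as the mean target of the k nearest training points."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(train, test, k):
    """Mean squared error of a k-NN model fitted on train, scored on test."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in test) / len(test)


def kfold(rows, n_folds):
    """Yield (train, test) pairs; fold i holds out every n_folds-th row from i."""
    for i in range(n_folds):
        test = rows[i::n_folds]
        train = [row for j, row in enumerate(rows) if j % n_folds != i]
        yield train, test


def tune(rows, candidates, n_folds):
    """Inner resampling: return the k with the lowest cross-validated MSE."""
    def cv_error(k):
        errors = [mse(tr, te, k) for tr, te in kfold(rows, n_folds)]
        return sum(errors) / len(errors)
    return min(candidates, key=cv_error)


def nested_resampling(rows, candidates, outer_folds, inner_folds):
    """Outer resampling: tune on each outer training set, score on its test set."""
    results = []
    for outer_train, outer_test in kfold(rows, outer_folds):
        best_k = tune(outer_train, candidates, inner_folds)  # never sees outer_test
        results.append((best_k, mse(outer_train, outer_test, best_k)))
    return results


data = make_data(90, seed=3)
candidates = [1, 3, 5, 9, 15]
results = nested_resampling(data, candidates, outer_folds=3, inner_folds=3)
estimate = sum(error for _, error in results) / len(results)
print(results)
print(estimate)
```

The selected `k` may differ between outer folds. The aggregated outer error estimates the performance of the whole procedure (learner plus tuning), not of one particular hyperparameter value.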