4 Model Optimization

Model Tuning

Even though machine learning algorithms have default values set for their hyperparameters, these need to be changed by the user to achieve optimal performance on the given dataset. A manual selection of hyperparameter values is not recommended as this approach rarely leads to an optimal performance. This is why a data-driven optimization of hyperparameters (= tuning) should be conducted. In order to tune a machine learning algorithm, one has to specify:

  • the search space
  • the optimization algorithm (aka tuning method)
  • an evaluation method, i.e., a resampling strategy and a performance measure

In summary, the sub-chapter on tuning illustrates hyperparameter selection, how to pick an optimizing algorithm and how to automate tuning using mlr3.

Feature Selection

The second part of this chapter explains “feature selection”. The objective of feature selection is to fit the sparse dependent of a model on a subset of available data features in the most suitable manner. Feature selection can enhance the interpretability of the model, speed up model fitting and improve the learner performance by reducing noise in the data. Different approaches exist to identify the relevant features.

The sub-chapter on feature selection introduces two distinct approaches:

  • feature selection using filter algorithms
  • feature selection using so called “wrapper methods”

Nested Resampling

In order to get a good estimate of generalization performance and avoid data leakage, both an outer (performance) and an inner (tuning/feature selection) resampling process are necessary.

The third sub-section of this chapter will provide instructions on how to implement nested resampling in mlr3, accounting for both inner and outer resampling.