3 Model Optimization

Model Tuning

Machine learning algorithms ship with default values for their hyperparameters, but these usually need to be adapted by the user to achieve good performance on a given dataset. Selecting hyperparameter values manually is not recommended, as this approach rarely leads to optimal performance. Instead, hyperparameters should be optimized in a data-driven way, a process known as tuning. In order to tune a machine learning algorithm, one has to specify:

  • the search space
  • the optimization algorithm (aka tuning method)
  • an evaluation method, i.e., a resampling strategy and a performance measure

In summary, the sub-chapter on tuning illustrates how to select hyperparameters, how to pick an optimization algorithm, and how to automate tuning using mlr3. This sub-chapter requires mlr3tuning, an extension package that supports hyperparameter tuning.
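The three ingredients listed above can be sketched as follows. This is a minimal illustration rather than the sub-chapter's own example: it assumes a recent mlr3tuning release that provides `ti()` and `to_tune()`, and the task (`sonar`), the learner (`classif.rpart`), the range for `cp`, and the budget of 20 evaluations are arbitrary choices made for the sake of the example.

```r
library(mlr3)
library(mlr3tuning)

set.seed(1)

# Illustrative task and learner (assumptions, not prescribed by the text)
task = tsk("sonar")

# Search space: tune the complexity parameter cp on a log scale
learner = lrn("classif.rpart", cp = to_tune(1e-4, 1e-1, logscale = TRUE))

# Evaluation method: 3-fold cross-validation and classification error,
# with a budget of 20 evaluated configurations
instance = ti(
  task = task,
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measures = msr("classif.ce"),
  terminator = trm("evals", n_evals = 20)
)

# Optimization algorithm: random search
tuner = tnr("random_search")
tuner$optimize(instance)

# Best configuration found and its estimated error
instance$result
```

Swapping `tnr("random_search")` for another tuner, or changing the resampling and measure, leaves the rest of the code unchanged, which is why the three components are specified separately.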

Feature Selection

The second part of this chapter explains feature selection. The objective of feature selection is to fit a sparse model that depends only on a suitable subset of the available features. Feature selection can enhance the interpretability of the model, speed up model fitting, and improve learner performance by reducing noise in the data. Different approaches exist to identify the relevant features.

In the sub-chapter on feature selection, two approaches are emphasized:

A third approach, feature selection via ensemble filters, is introduced subsequently. The implementation of all three approaches in mlr3 is showcased using the extension package mlr3filters.
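As a rough idea of what filter-based feature selection with mlr3filters looks like, the sketch below scores every feature and keeps the highest-ranked ones. The choice of the `anova` filter, the `sonar` task, and the cutoff of 10 features are assumptions made for illustration only.

```r
library(mlr3)
library(mlr3filters)

# Illustrative task with 60 numeric features (an assumption for this sketch)
task = tsk("sonar")

# Score each feature with an ANOVA F-test filter
filter = flt("anova")
filter$calculate(task)

# Scores are sorted in decreasing order; inspect the ranking
head(as.data.table(filter), 10)

# Keep only the 10 highest-scoring features in the task
top_features = head(names(filter$scores), 10)
task$select(top_features)
task$feature_names
```

A learner trained on the reduced task then sees only the selected features, which is where the gains in fitting time and interpretability come from.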

Nested Resampling

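In order to get a good estimate of generalization performance and avoid data leakage, both an outer (performance estimation) and an inner (tuning/feature selection) resampling process are necessary. Otherwise, the same data would be used to select hyperparameters or features and to evaluate the result, which makes the performance estimate optimistically biased.

The sub-section on nested resampling provides instructions on how to implement nested resampling in mlr3, accounting for both inner and outer resampling.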
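The interplay of inner and outer resampling can be sketched as follows. This is a hedged illustration, assuming a recent mlr3tuning release in which `auto_tuner()` accepts the arguments shown; the task, learner, fold counts, and evaluation budget are arbitrary choices.

```r
library(mlr3)
library(mlr3tuning)

set.seed(1)

# Illustrative task (an assumption for this sketch)
task = tsk("sonar")

# Inner resampling: the learner tunes cp by 3-fold cross-validation
# on whatever training data it receives
at = auto_tuner(
  tuner = tnr("random_search"),
  learner = lrn("classif.rpart", cp = to_tune(1e-4, 1e-1, logscale = TRUE)),
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.ce"),
  terminator = trm("evals", n_evals = 10)
)

# Outer resampling: 5-fold cross-validation of the self-tuning learner;
# each outer test set is never seen during tuning
rr = resample(task, at, rsmp("cv", folds = 5))

# Estimate of generalization error
rr$aggregate(msr("classif.ce"))
```

Because tuning happens inside each outer training set, the aggregated score estimates the performance of the whole procedure (tuning plus fitting) rather than of one fixed hyperparameter configuration.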