Model Optimization

In machine learning, when you are dissatisfied with the performance of a model, you might ask yourself how best to improve it. This chapter covers two common approaches, tuning the hyperparameters of the learning algorithm and selecting a relevant subset of features, and explains why nested resampling is needed to evaluate the result without bias.

Model Tuning

Machine learning algorithms have default values set for their hyperparameters. In many cases, these hyperparameters need to be changed by the user to achieve optimal performance on the given dataset. While you can certainly search for better hyperparameter settings manually, we do not recommend this approach, as it is tedious and rarely leads to the best performance. Fortunately, the mlr3 ecosystem provides packages and tools for automated tuning. In order to tune a machine learning algorithm, you have to specify:

  1. the search space,
  2. the optimization algorithm (i.e. tuning method),
  3. an evaluation method (i.e. a resampling strategy), and
  4. a performance measure.

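To make these four components concrete, the following sketch constructs each of them with the mlr3 ecosystem. The decision-tree hyperparameters, their ranges, and the particular tuner, resampling strategy, and measure are illustrative choices, assuming recent versions of mlr3tuning and paradox.

```r
library(mlr3)
library(mlr3tuning)
library(paradox)

# 1. search space: which hyperparameters to tune, and over which ranges
#    (here: two hyperparameters of a decision tree, chosen for illustration)
search_space = ps(
  cp       = p_dbl(lower = 0.001, upper = 0.1),
  minsplit = p_int(lower = 1, upper = 20)
)

# 2. optimization algorithm (tuning method)
tuner = tnr("random_search")

# 3. evaluation method (resampling strategy)
resampling = rsmp("cv", folds = 3)

# 4. performance measure
measure = msr("classif.ce")
```
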
In the tuning part, we will take a closer look at each of these steps.

We will use the mlr3tuning package, which supports common tuning operations.
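
As a minimal sketch of how the four components fit together, recent mlr3tuning releases provide a tune() convenience function that accepts them all at once; here the search space is attached directly to the learner via to_tune() tokens, and the task, learner, and tuning budget are illustrative choices.

```r
library(mlr3)
library(mlr3tuning)

# Tune the complexity parameter of a decision tree on the built-in penguins
# task with random search, evaluated by 3-fold CV and classification error.
instance = tune(
  tuner      = tnr("random_search"),
  task       = tsk("penguins"),
  learner    = lrn("classif.rpart", cp = to_tune(0.001, 0.1)),
  resampling = rsmp("cv", folds = 3),
  measures   = msr("classif.ce"),
  term_evals = 20
)

instance$result  # best hyperparameter configuration found
```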

Feature Selection

Tuning the hyperparameters is only one way of improving the performance of your model. The second part of this chapter explains feature selection, also known as variable or descriptor selection. Feature selection is the process of finding the subset of features that is most relevant for the prediction task, or on which the learner achieves its highest performance. Apart from improving model performance, there are additional reasons to perform feature selection:

  • enhance the interpretability of the model,
  • speed up model fitting, or
  • eliminate the need to collect lots of expensive features.

Here, we mostly focus on feature selection as a means for improving model performance.

There are different approaches to identify the relevant features. In the feature selection part, we describe three methods:

  • Filter algorithms select features independently of the learner by scoring the different features.
  • Variable importance filters select features that are important according to the model induced by a learner.
  • Wrapper methods iteratively select features to optimize a performance measure, each time fitting and evaluating a model with a different subset of features.

Note that filters operate independently of learners. Variable importance filters rely on the learner to extract information on feature importance from a trained model, for example by inspecting a learned decision tree and returning the features that are used in its first few levels. The obtained importance values can then be used to subset the data before training a learner. Wrapper methods can be used with any learner, but they potentially need to train the learner many times, making them the most expensive approach. All three approaches are sketched below.
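
The example below is a sketch of all three approaches using the mlr3filters and mlr3fselect packages. The particular filters, learner, task, and resampling are illustrative choices, the information_gain filter additionally requires the FSelectorRcpp package, and the argument names assume recent package releases.

```r
library(mlr3)
library(mlr3filters)
library(mlr3fselect)

task = tsk("penguins")

# Filter: score features independently of any learner
filter = flt("information_gain")
filter$calculate(task)
as.data.table(filter)

# Variable importance filter: importance values are extracted from a model
# fitted by the learner (here a decision tree)
imp_filter = flt("importance", learner = lrn("classif.rpart"))
imp_filter$calculate(task)
as.data.table(imp_filter)

# Wrapper method: sequential forward selection, repeatedly fitting and
# evaluating the learner on different feature subsets
instance = fselect(
  fselector  = fs("sequential"),
  task       = task,
  learner    = lrn("classif.rpart"),
  resampling = rsmp("cv", folds = 3),
  measures   = msr("classif.ce")
)
instance$result_feature_set
```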

Nested Resampling

For hyperparameter tuning, ordinary resampling (e.g. cross-validation) is no longer sufficient to ensure an unbiased evaluation. Consider the following thought experiment to gain intuition for why this is the case. Suppose a learner has a hyperparameter that has no real effect on the fitted model but only introduces random noise into the predictions. When evaluating several values of this hyperparameter, one of them will appear to perform best purely by chance. This value will then be chosen as the best, even though the hyperparameter has no real effect. To detect this, a separate validation set is required; it will reveal that the “optimized” setting does not actually perform better than anything else.

We need nested resampling to ensure unbiased estimates of the generalization error during hyperparameter optimization, and this part of the chapter shows how to set it up.
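
The following sketch shows one way to set up nested resampling with mlr3tuning, assuming a recent release: an AutoTuner performs the tuning in an inner cross-validation, and an ordinary resample() call provides the outer loop. The task, learner, search range, and fold numbers are illustrative choices.

```r
library(mlr3)
library(mlr3tuning)

# Inner loop: the AutoTuner tunes the tree's complexity parameter by grid
# search, evaluating each candidate with a 3-fold cross-validation.
at = auto_tuner(
  tuner      = tnr("grid_search", resolution = 5),
  learner    = lrn("classif.rpart", cp = to_tune(0.001, 0.1)),
  resampling = rsmp("cv", folds = 3),
  measure    = msr("classif.ce")
)

# Outer loop: the AutoTuner behaves like an ordinary learner, so a second,
# outer cross-validation yields an unbiased estimate of its generalization
# error; the outer test sets are never seen during tuning.
rr = resample(tsk("penguins"), at, rsmp("cv", folds = 3))
rr$aggregate(msr("classif.ce"))
```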