3 mlr3 Basics

This chapter will teach you the essential building blocks, R6 classes and operations of mlr3. These include creating supervised machine learning tasks such as classification and regression, training models and making predictions, and cross-validating and benchmarking different models.

A typical (simple) machine-learning workflow looks like this:

Data is usually split into training and test portions. Using a learning algorithm, we induce a model on the training data, label the test instances with that model, and compare the true test labels to the predicted labels. We obtain a scalar numeric score which quantifies the predictive power of our learner on the given data.
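The workflow described above can be sketched in a few lines of mlr3 code. This is a minimal illustration rather than a recommended setup: the built-in `iris` task, the `classif.rpart` learner, the accuracy measure and the 80/20 split are all assumptions chosen for the example.

```r
library(mlr3)

set.seed(1)  # make the random split reproducible

# Assumption: the built-in "iris" task and a decision tree learner
# stand in for "the data" and "a learning algorithm".
task = tsk("iris")
learner = lrn("classif.rpart")

# Split the row ids into training and test portions (80/20, an arbitrary choice).
train_ids = sample(task$row_ids, size = floor(0.8 * task$nrow))
test_ids = setdiff(task$row_ids, train_ids)

# Induce a model on the training data only.
learner$train(task, row_ids = train_ids)

# Label the held-out test instances with the trained model.
prediction = learner$predict(task, row_ids = test_ids)

# Compare true and predicted labels; accuracy yields a single numeric score.
score = prediction$score(msr("classif.acc"))
print(score)
```

Because the split is random, the resulting score depends on the seed and on which rows end up in the test portion.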

Resampling repeats this whole process on different splits of the data and averages the test scores to reduce the variance of the performance estimate.
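As a rough sketch of this idea, the following code runs a 3-fold cross-validation with `resample()` and then averages the per-fold scores. The task, learner, measure and number of folds are again assumptions made only for illustration.

```r
library(mlr3)

set.seed(1)  # make the fold assignment reproducible

# Assumptions for this example: iris data, a decision tree, accuracy, 3 folds.
task = tsk("iris")
learner = lrn("classif.rpart")
resampling = rsmp("cv", folds = 3)
measure = msr("classif.acc")

# Repeat train/predict once per fold.
rr = resample(task, learner, resampling)

# One test score per resampling iteration.
fold_scores = rr$score(measure)
print(fold_scores$classif.acc)

# Aggregate the per-fold scores into a single performance estimate.
mean_acc = rr$aggregate(measure)
print(mean_acc)
```

With the default settings of this measure, the aggregated value is the mean of the individual fold scores.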

The mlr3 package provides R6 classes for these essential building blocks:

  • A task wraps the data and stores additional information about it.
  • A learner interfaces with R’s many ML algorithms, provides train and predict operations on data, and allows setting and querying hyperparameters.
  • A measure is a mapping from test labels and predicted labels on a test set to a numeric score.
  • A resampling specifies, through index sets, the (repeated) splitting of the data into training and test sets.
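To make the four building blocks more concrete, the sketch below creates one object of each kind and inspects a few of its fields. The specific choices (the `iris` data frame, the `cp` hyperparameter value, the accuracy measure and a holdout split with ratio 0.8) are assumptions for illustration only.

```r
library(mlr3)

set.seed(1)  # make the holdout split reproducible

# Task: wraps the data and stores extra information such as the target column.
task = as_task_classif(iris, target = "Species")
print(task$nrow)
print(task$feature_names)
print(task$target_names)

# Learner: hyperparameters can be set at construction and queried afterwards.
learner = lrn("classif.rpart", cp = 0.05)
print(learner$param_set$values$cp)

# Measure: maps true and predicted labels to a numeric score.
measure = msr("classif.acc")
print(measure$minimize)  # whether lower values indicate better performance

# Resampling: after instantiation it holds index sets for train and test rows.
resampling = rsmp("holdout", ratio = 0.8)
resampling$instantiate(task)
train_ids = resampling$train_set(1)
test_ids = resampling$test_set(1)
print(length(train_ids))
print(length(test_ids))
```

Note that a resampling object only defines a splitting strategy until `$instantiate()` is called on a task; the concrete row indices exist only afterwards.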

You will later learn how to extend learners into full pipelines and how to tune hyperparameters, but for now we will stick to this simple workflow.