2 Fundamentals
Natalie Foss
University of Wyoming
Lars Kotthoff
University of Wyoming
In this chapter, we will introduce the mlr3
objects and corresponding R6
classes that implement the essential building blocks of machine learning (ML). These building blocks include the data (and the methods of creating training and test sets), the ML algorithm (and its training and prediction process), the configuration of a ML algorithm through its hyperparameters, and evaluation measures to assess the quality of predictions.
In the simplest definition, machine learning is the process of using computer models to learn relationships from data. Supervised learning is a subfield of ML in which datasets consist of observations (rows in tabular data) that are labeled, which means that each data point includes features (columns in tabular data) and a quantity that we are trying to predict, also called a target. A classic example might be trying to predict a car’s miles per gallon (the target) based on properties (the features) such as horsepower and the number of gears (we will return to this particular example later). In mlr3
, we refer to datasets, and their associated metadata as tasks (Section 2.1). The term ‘tasks’ is used to refer to the ML task (i.e., mathematical problem) that we are trying to solve. Tasks are defined by the features used for prediction and the targets to predict, so there can be multiple tasks associated with any given dataset. For example, predicting miles per gallon (mpg) from horsepower is one task, predicting horsepower from miles per gallon is another task, and predicting number of gears from model is yet another task, and so on.
Supervised learning can be further divided into regression – which is prediction of numeric target values, e.g. predicting a car’s mpg – and classification – which is prediction of categorical values/labels, e.g., predicting a car’s model. Other tasks are also encompassed by supervised learning, and these are returned to in Chapter 8, we will also consider unsupervised learning tasks in that chapter. For any supervised learning task, the goal is to build a model that captures the relationship between features and target and often to train a model to be able to make predictions for new and previously unseen data. A model is formally a mapping from a feature vector to predictions, such models are induced by passing training data to machine learning algorithms, including decision trees, support vector machines, neural networks, and many more. ML algorithms are called learners in mlr3
(Section 2.2) as given data, they learn models. Each learner has a parameterized space that potential models are drawn from and during the training process, these parameters are fitted to best match the data. For example, the parameters could be the weights given to individual features when training a linear regression model. During training, all ML algorithms are fitted/trained by optimizing a lossfunction that quantifies the mismatch between ground truth target values in the training data and the predictions of the model.
For a model to be most useful, it should generalize beyond the training data to make ‘good’ predictions (Section 2.2.2) on new and previously ‘unseen’ (by the model) data. The simplest way to determine if a model will generalize to make ‘good’ predictions for new data, is to split data into training data and test data – where the model is trained on the training data and then the separate test data is used to evaluate models in an unbiased way by assessing to what extent the model has learned the true relationships that underlie the data (Chapter 3). This evaluation procedure estimates a model’s generalization error, i.e., how well we expect the model to perform in general. There are many ways to evaluate models (Chapter 3) and to split data for estimating generalization error (Section 3.2).
This brief overview of ML provides the basic knowledge required to use software in mlr3
and is summarized in Figure 2.1. In the rest of this book we will also provide introductions to methodology when relevant and in Chapter 8 we will also provide introduction to applications in other tasks. For texts about ML, including detailed methodology and underpinnings of different algorithms, we recommend Hastie, Friedman, and Tibshirani (2001), James et al. (2013), and Bishop (2006).
In the next few sections we will look at the building blocks of mlr3
using regression as an example, we will then consider how to extend this to classification in Section 2.5, for other tasks see Chapter 8.
2.1 Tasks
Tasks are objects that contain the (usually tabular) data and additional metadata that define a ML problem. The metadata contain, for example, the name of the target feature for supervised ML problems. This information is used automatically by operations that can be performed on a task so that for example the user does not have to specify the prediction target every time a model is trained.
2.1.1 Constructing Tasks
mlr_tasks
<DictionaryTask> with 20 stored values
Keys: bike_sharing, boston_housing, breast_cancer, german_credit, ilpd,
iris, kc_housing, moneyball, mtcars, optdigits, penguins,
penguins_simple, pima, ruspini, sonar, spam, titanic, usarrests,
wine, zoo
To get a task from the dictionary, use the tsk()
function and assign the return value to a new variable. Below we retrieve the task mlr_tasks_mtcars
, which uses the datasets::mtcars
dataset:
task_mtcars = tsk("mtcars")
task_mtcars
<TaskRegr:mtcars> (32 x 11): Motor Trends
* Target: mpg
* Properties: 
* Features (10):
 dbl (10): am, carb, cyl, disp, drat, gear, hp, qsec, vs, wt
Usually in R, the help pages of functions can be queried with ?
. The same is true of R6 classes, so if you want to find the help page of the mtcars
task you could use ?mlr_tasks_mtcars
. We have also added a $help()
method to many of our classes, which allows you to access the help page of a class from any instance of that class, for example: tsk("mtcars")$help()
.
Many object names in mlr3
are standardized according to the convention: mlr_<types>_<key>
. Where <types>
will be tasks
, learners
, measures
, and others to be covered later in the book; and <key>
refers to the ID of the object. To simplify the process of constructing objects, you only need to know the object key and the sugar function for construction.
For example: mlr_tasks_mtcars
becomes tsk("mtcars")
;mlr_learners_regr.rpart
becomes lrn("regr.rpart")
; and mlr_measures_regr.mse
becomes msr("regr.mse")
.
To create your own regression task, you will need to construct a new instance of the TaskRegr
. The simplest way to do this is with the function as_task_regr()
to convert a data.frame
type object to a regression task, specifying the target feature by passing this to the target
argument. By example, we will imagine that mtcars
was not already available as a predefined task in mlr3
. In the code below we load the datasets::mtcars
dataset, print its properties, subset the data to only include columns "mpg"
, "cyl"
, "disp"
, print the modified data’s properties, and then setup a regression task called "cars"
(id = "cars"
) in which we will try to predict miles per gallon (target = "mpg"
) from number of cylinders ("cyl"
) and displacement ("disp"
):
library(mlr3)
data("mtcars", package = "datasets")
mtcars_subset = subset(mtcars, select = c("mpg", "cyl", "disp"))
str(mtcars_subset)
'data.frame': 32 obs. of 3 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
task_mtcars = as_task_regr(mtcars_subset, target = "mpg", id = "cars")
The data can be in any tabular format, e.g. a data.frame()
, data.table()
, or tibble()
. The target
argument specifies the prediction target column. The id
argument is optional and specifies an identifier for the task that is used in plots and summaries; if omitted the variable name of the data will be used as the id
.
As many ML models do not work properly with arbitrary UTF8 names^{1}, mlr3
defaults to throw an error if any of the column names passed to as_task_regr()
(and other task constructors) contain a nonASCII character or do not comply with R’s variable naming scheme. Therefore, we recommend converting names with make.names()
if possible, but if not then you can bypass this check inr mlr3
by setting options(mlr3.allow_utf8_names = TRUE)
(but do not be surprised if an underlying package implementation throws up a related error).
Printing a task provides a short summary, in this case we can see the task has 32 observations and 3 columns (32 x 3), of which mpg
is the target, there are no special properties, and there are 2 features stored in doubleprecision floating point format.
task_mtcars
<TaskRegr:cars> (32 x 3)
* Target: mpg
* Properties: 
* Features (2):
 dbl (2): cyl, disp
We can plot the task using the mlr3viz
package, which gives a graphical summary of the distribution of the target and feature values:
2.1.2 Retrieving Data
We have looked at how to create tasks to store data and metadata, now we will look at how to retrieve the stored data.
Various fields can be used to retrieve metadata about a task. The dimensions, for example, can be retrieved using $nrow
and $ncol
:
c(task_mtcars$nrow, task_mtcars$ncol)
[1] 32 3
The names of the feature and target columns are stored in the $feature_names
and $target_names
slots, respectively.
c(Features = task_mtcars$feature_names, Target = task_mtcars$target_names)
Features1 Features2 Target
"cyl" "disp" "mpg"
While the columns of a task have unique character
valued names, their rows are identified by unique natural numbers, called row IDs. They can be accessed through the $row_ids
field:
head(task_mtcars$row_ids)
[1] 1 2 3 4 5 6
Row IDs are not used as features when training or predicting but are metadata that allows to access individual observations. Note that row IDs are not the same as row numbers. This is best demonstrated by example, below we create a regression task from random data, print the original row IDs, which correspond to row numbers 15, then we filter three rows (we will return to this method just below) and print the new row IDs, which no longer correspond to the row numbers.
task = as_task_regr(data.frame(x = runif(5), y = runif(5)), target = "y")
task$row_ids
[1] 1 2 3 4 5
task$filter(c(4, 1, 3))
task$row_ids
[1] 1 3 4
This design decision allows tasks and learners to transparently operate on real database management systems, where uniqueness is the only requirement for primary keys (and not the actual row ID value).
The data contained in a task can be accessed through $data()
, which returns a data.table
object. This method has optional rows
and cols
arguments to specify subsets of the data to retrieve.
# retrieve all data
task_mtcars$data()
mpg cyl disp
1: 21.0 6 160.0
2: 21.0 6 160.0
3: 22.8 4 108.0
4: 21.4 6 258.0
5: 18.7 8 360.0

28: 30.4 4 95.1
29: 15.8 8 351.0
30: 19.7 6 145.0
31: 15.0 8 301.0
32: 21.4 4 121.0
# retrieve data for rows with IDs 1, 5, and 10 and feature columns
task_mtcars$data(rows = c(1, 5, 10), cols = task_mtcars$feature_names)
cyl disp
1: 6 160.0
2: 8 360.0
3: 6 167.6
You can work with row numbers instead of row IDs by adding a step to extract the corresponding row ID:
# select the 2nd row of the task by extracting the second row_id:
task$data(rows = task$row_ids[2])
You can always use ‘standard’ R methods to extract summary data from a task, for example to summarize the underlying data:
summary(as.data.table(task_mtcars))
mpg cyl disp
Min. :10.40 Min. :4.000 Min. : 71.1
1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8
Median :19.20 Median :6.000 Median :196.3
Mean :20.09 Mean :6.188 Mean :230.7
3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0
Max. :33.90 Max. :8.000 Max. :472.0
2.1.3 Task Mutators
After a task has been created, you may want to perform operations on the task such as filtering down to subsets of rows and columns, which is often useful for manually creating train and test splits or to fit models on a subset of given features. Above we saw how to access subsets of the underlying dataset using $data()
, however this will not change the underlying task. Therefore, we provide mutators, which modify the given Task
in place, this can be seen in examples below.
Subsetting by features (columns) is possible with $select()
with the desired feature names passed as a character vector and subsetting by observations (rows) is performed with $filter()
by passing the row IDs as a numeric vector :
$select()
/$filter()
task_mtcars_small = tsk("mtcars") # initialize with the full task
task_mtcars_small$select(c("am", "carb")) # keep only these features
task_mtcars_small$filter(2:4) # keep only these rows
task_mtcars_small$data()
mpg am carb
1: 21.0 1 4
2: 22.8 1 1
3: 21.4 0 1
As R6
uses reference semantics (Section 1.8.2), you need to use $clone()
if you want to copy a task and then mutate it further:
# the wrong way
task_mtcars_small = tsk("mtcars")$filter(1:2)$select("cyl")
task_mtcars_wrong = task_mtcars_small
task_mtcars_wrong$data()
mpg cyl
1: 21 6
2: 21 6
task_mtcars_wrong$filter(1)
# original data affected
task_mtcars_small$data()
mpg cyl
1: 21 6
# the right way
task_mtcars_small = tsk("mtcars")$filter(1:2)$select("cyl")
task_mtcars_right = task_mtcars_small$clone()
task_mtcars_right$data()
mpg cyl
1: 21 6
2: 21 6
task_mtcars_right$filter(1)
# original data unaffected
task_mtcars_small$data()
mpg cyl
1: 21 6
2: 21 6
To add extra rows and columns to a task, you can use $rbind()
and $cbind()
respectively :
$rbind()
/$cbind()
task_mtcars_small$cbind( # add another column
data.frame(disp = c(150, 160))
)
task_mtcars_small$rbind( # add another row
data.frame(mpg = 23, cyl = 5, disp = 170)
)
task_mtcars_small$data()
mpg cyl disp
1: 21 6 150
2: 21 6 160
3: 23 5 170
2.2 Learners
Objects of class Learner
provide a unified interface to many popular ML algorithms in R. The mlr_learners
dictionary contains all the learners available in mlr3
, we will discuss the available learners in Section 2.7, for now we will just use a regression tree learner as an example to discuss the Learner
interface. As with tasks, you can access learners from the dictionary with a single sugar function, in this case lrn()
.
lrn("regr.rpart")
<LearnerRegrRpart:regr.rpart>: Regression Tree
* Model: 
* Parameters: xval=0
* Packages: mlr3, rpart
* Predict Types: [response]
* Feature Types: logical, integer, numeric, factor, ordered
* Properties: importance, missings, selected_features, weights
All Learner
objects include the following metadata, which can be seen in the output above:

$feature_types
: the type of features the learner can handle. 
$packages
: the packages required to be installed to use the learner. 
$properties
: special properties the model can handle, for example the “missings” properties means a model can handle missing data, and “importance” means it can compute the relative importance of each feature. 
$predict_types
: the types of prediction that the model can make (Section 2.2.2). 
$param_set
: the set of available hyperparameters (Section 2.2.3).
To run an ML experiment, learners pass through two stages (Figure 2.2):
 Training: A training
Task
is passed to the learner’s$train()
function which trains and stores a model, i.e., the learned relationship of the features to the target.  Predicting: New data, often a different partition of the original dataset, is passed to the
$predict()
method of the trained learner to predict the target values.
2.2.1 Training
In the simplest usecase, models are trained by passing a task to a learner with the $train()
method:
After training, the fitted model is stored in the $model
field for future inspection and prediction:
# inspect the trained model
learner_rpart$model
n= 32
node), split, n, deviance, yval
* denotes terminal node
1) root 32 1126.04700 20.09062
2) cyl>=5 21 198.47240 16.64762
4) hp>=192.5 7 28.82857 13.41429 *
5) hp< 192.5 14 59.87214 18.26429 *
3) cyl< 5 11 203.38550 26.66364 *
We see that the regression tree has identified features in the task that are predictive of the target (mpg
) and used them to partition observations. The textual representation of the model depends on the type of learner. For more information on any model see the learner help page, which can be accessed in the same way as tasks with the help()
field, e.g., learner_rpart$help()
.
2.2.1.1 Partitioning data
When performing simple examples to assess the quality of a model’s predictions, you will likely want to partition your dataset to get a fair and unbiased estimate of a model’s generalization error. In Chapter 3 we will look at resampling and benchmark experiments, which will go into more detail about performance estimation, for now we will just discuss the simplest method of splitting data using the partition()
function. This function randomly splits the given task into two disjoint sets: a training set (67% of the total data, the default) and a test set (33% of the total data, the data not part of the training set).
# changing default to a 70:30 train:test split
splits = partition(task_mtcars, ratio = 0.7)
splits
$train
[1] 1 3 4 5 8 10 21 25 32 6 7 11 14 15 16 17 22 24 31 18 19 20 26
$test
[1] 2 9 27 30 12 13 23 29 28
Now when training we will tell the model to only use the training data by passing the row IDs from partition
to the row_ids
argument of $train()
:
learner_rpart$train(task_mtcars, row_ids = splits$train)
Now we can use our trained learner to make predictions on new data.
2.2.2 Predicting
Predicting from trained models is as simple as passing your data to the $predict()
method of the trained Learner
.
Carrying straight on from our last example, we will call the $predict()
method from our trained learner and again will use the row_ids
argument, but this time to pass the IDs of our test set:
predictions = learner_rpart$predict(task_mtcars, row_ids = splits$test)
The $predict()
method returns an object inheriting from Prediction
, in this case PredictionRegr
as this is a regression task.
predictions
<PredictionRegr> for 9 observations:
row_ids truth response
2 21.0 16.2800
9 22.8 26.7625
27 26.0 26.7625

23 15.2 16.2800
29 15.8 16.2800
28 30.4 26.7625
The row_ids
column corresponds to the row IDs of the predicted observations. The truth
column contains the ground truth data, which the object extracts from the task, in this case: task_mtcars$truth(splits$test)
. Finally, the response
column contains the values predicted by the model. The Prediction
object can easily be converted into a data.table
or data.frame
using as.data.table()
/as.data.frame()
respectively.
All data in the above columns can be accessed directly, for example to get the first two predicted responses:
predictions$response[1:2]
[1] 16.2800 26.7625
Similarly to plotting Task
s, mlr3viz
provides an autoplot()
method for Prediction
objects.
library(mlr3viz)
predictions = learner_rpart$predict(task_mtcars, splits$test)
autoplot(predictions)
In the examples above we made predictions by passing a task to $predict()
, instead if you would rather pass a data.frame
type object directly then you can use $predict_newdata()
:
mtcars_new = data.table(cyl = c(5, 6), disp = c(100, 120),
hp = c(100, 150), drat = c(4, 3.9), wt = c(3.8, 4.1),
qsec = c(18, 19.5), vs = c(1, 0), am = c(1, 1),
gear = c(6, 4), carb = c(3, 5))
predictions = learner_rpart$predict_newdata(mtcars_new)
predictions
<PredictionRegr> for 2 observations:
row_ids truth response
1 NA 26.7625
2 NA 26.7625
Changing the Prediction Type
Whilst predicting a single numeric quantity is the most common prediction type in regression, it is not the only prediction type. Several regression models can also predict standard errors, which are computed during training. To predict these, the $predict_type
field of a LearnerRegr
must be changed from “response” (the default) to “se” before training, and most simply during construction. The rpart
learner we used above does not support predicting standard errors, so in the example below we will use a linear regression model implemented in mlr3learners::LearnerRegrLm
, note how the output now includes standard errors.
library(mlr3learners)
learner_lm = lrn("regr.lm", predict_type = "se")
learner_lm$train(task_mtcars, splits$train)
learner_lm$predict(task_mtcars, splits$test)
<PredictionRegr> for 9 observations:
row_ids truth response se
2 21.0 21.92034 1.193742
9 22.8 25.36346 1.346913
27 26.0 25.80819 1.219812

23 15.2 15.76983 1.349221
29 15.8 14.75022 1.056986
28 30.4 26.35487 1.145776
We will see prediction types playing an even bigger part in classification in Section 2.5.3.
The final step of the basic ML workflow (Figure 2.1) is to evaluate the quality of predictions to see if our trained model is any ‘good’. We will cover basic evaluation in Section 2.3 and then more advanced evaluation for data resampling and model comparison in Chapter 3. But first we will cover the final element that makes ML models powerful predictive tools, which is their hyperparameters, which give users control over the fitting and predicting process and when set correctly can result in more accurate models.
2.2.3 Hyperparameters
Learner
s encapsulate an ML algorithm and its hyperparameters, which are free parameters that can be set by the user to affect how the algorithm is run. Hyperparameters may affect how a model is trained or how it makes predictions and deciding which hyperparameters to set can require expert knowledge though often there is an element of trial and error. Hyperparameters are hugely important to a model performing well and therefore setting hyperparameters manually is rarely a good idea. In practice, automated hyperparameter optimization is more common, which we will return to in Chapter 4. For this chapter we will refer to manual setting of hyperparameters for the sake of brevity. We will first look at paradox
and ParamSet
objects which are used to store learner hyperparameters, and then we will look at getting and setting these values.
2.2.3.1 Paradox and parameter sets
We will continue our running example with a regression tree learner. To access the hyperparameters in the decision tree, we use $param_set
:
learner_rpart$param_set
<ParamSet>
id class lower upper nlevels default value
1: cp ParamDbl 0 1 Inf 0.01
2: keep_model ParamLgl NA NA 2 FALSE
3: maxcompete ParamInt 0 Inf Inf 4
4: maxdepth ParamInt 1 30 30 30
5: maxsurrogate ParamInt 0 Inf Inf 5
6: minbucket ParamInt 1 Inf Inf <NoDefault[3]>
7: minsplit ParamInt 1 Inf Inf 20
8: surrogatestyle ParamInt 0 1 2 0
9: usesurrogate ParamInt 0 2 3 2
10: xval ParamInt 0 Inf Inf 10 0
The output above is a paradox::ParamSet
object from the package paradox
. These objects provide information on hyperparameters including their name (id
), data types (class
), acceptable ranges for hyperparameter values (lower
, upper
), the number of levels possible if the data type is categorical (nlevels
), the default value from the underlying package (default
), and finally the set value if different from the default (value
). The second column references classes defined in paradox
that determine the class of the parameter and the possible values it can take. Table 2.1 lists the possible hyperparameter types, all of which inherit from paradox::Param
.
Hyperparameter Class  Description 

ParamDbl 
Realvalued (Numeric) Parameters 
ParamInt 
Integer Parameters 
ParamFct 
Categorical (Factor) Parameters 
ParamLgl 
Logical / Boolean Parameters 
ParamUty 
Untyped Parameters 
Let us carry on the example above and consider some specific hyperparameters. From the decision tree ParamSet
output we can infer the following:

cp
must be a “double” (ParamDbl
) taking values between 0 (lower
) and 1 (upper
) with a default of 0.01 (default
). 
keep_model
must be a “logical” (ParamLgl
) taking valuesTRUE
orFALSE
with defaultFALSE

xval
must be an “integer” (ParamInt
) taking values between 0 andInf
with a default of 10 and a set value of0
.
In rare cases (we try to minimize it as much as possible), we alter hyperparameter values in construction. When we do this the reason will always be given in the learner help page. In the case of regr.rpart
, we change the xval
default to 0
because xval
controls internal crossvalidations and if a user accidentally leaves this at 10 then model training can take a long time.
2.2.3.2 Getting and setting hyperparameter values
Now we have looked at how parameter sets are stored, we can now think about getting and setting parameters. Returning to our decision tree, say we are interested in growing a tree with depth 1, which means a tree where data is split once into two terminal nodes. From the parameter set output, we know that the maxdepth
parameter has a default of 30 and that it takes integer values.
There are a few different ways we could change this hyperparameter. The simplest way to set a hyperparameter is in construction of the learner by simply passing the hyperparameter name and new value to lrn()
:
learner_rpart = lrn("regr.rpart", maxdepth = 1)
We can view the set of nondefault hyperparameters (i.e., those changed by the user) by using $param_set$values
:
learner_rpart$param_set$values
$xval
[1] 0
$maxdepth
[1] 1
# depth 1
learner_rpart$train(tsk("mtcars"))$model
n= 32
node), split, n, deviance, yval
* denotes terminal node
1) root 32 1126.0470 20.09062
2) cyl>=5 21 198.4724 16.64762 *
3) cyl< 5 11 203.3855 26.66364 *
Now we can see that maxdepth = 1
(as we discussed above xval = 0
is changed in construction) and the constructed regression tree reflects this.
This values
field simply returns a list
of set hyperparameters, so another way to update hyperparameters is by updating an element in the list:
learner_rpart$param_set$values$maxdepth = 2
learner_rpart$param_set$values
$xval
[1] 0
$maxdepth
[1] 2
# depth 2
learner_rpart$train(tsk("mtcars"))$model
n= 32
node), split, n, deviance, yval
* denotes terminal node
1) root 32 1126.04700 20.09062
2) cyl>=5 21 198.47240 16.64762
4) hp>=192.5 7 28.82857 13.41429 *
5) hp< 192.5 14 59.87214 18.26429 *
3) cyl< 5 11 203.38550 26.66364 *
Finally, to set multiple values at once we recommend either setting these in construction or using $set_values
.
learner_rpart = lrn("regr.rpart", maxdepth = 3, xval = 1)
learner_rpart$param_set$values
$xval
[1] 1
$maxdepth
[1] 3
# or with set_values
learner_rpart$param_set$set_values(xval = 2, cp = 0.5)
learner_rpart$param_set$values
$xval
[1] 2
$maxdepth
[1] 3
$cp
[1] 0.5
As learner_rpart$param_set$values
returns a list
, some users may be tempted to set hyperparameters by passing a new list
to $values
– this would work but we do not recommend it. This is because passing a list
will wipe any existing hyperparameter values if they are not included in the list. So by example:
rpart_params = lrn("regr.rpart")
# values at construction
rpart_params$param_set$values
$xval
[1] 0
# passing maxdepth the wrong way
rpart_params$param_set$values = list(maxdepth = 1)
# we have removed xval by mistake
rpart_params$param_set$values
$maxdepth
[1] 1
# now with set_values
rpart_params = lrn("regr.rpart")
rpart_params$param_set$set_values(maxdepth = 1)
rpart_params$param_set$values
$xval
[1] 0
$maxdepth
[1] 1
All methods have safety checks to ensure your new values fall within the allowed parameter range:
lrn("regr.rpart", cp = 2, maxdepth = 2)
Error in self$assert(xs): Assertion on 'xs' failed: cp: Element 1 is not <= 1.
2.2.3.3 Parameter dependencies
More complex hyperparameter spaces may include parameter dependencies, which occur when setting a hyperparameter is conditional on the value of another hyperparameter, this is most important in the context of model tuning (Chapter 4). One such example is an SVM classifier, implemented in mlr3learners::LearnerClassifSVM
. The parameter set of this model has an additional column called ‘parents’, which tells us there are parameter dependencies in the learner.
lrn("classif.svm")$param_set
<ParamSet>
id class lower upper nlevels default parents
1: cachesize ParamDbl Inf Inf Inf 40
2: class.weights ParamUty NA NA Inf
3: coef0 ParamDbl Inf Inf Inf 0 kernel
4: cost ParamDbl 0 Inf Inf 1 type
5: cross ParamInt 0 Inf Inf 0
6: decision.values ParamLgl NA NA 2 FALSE
7: degree ParamInt 1 Inf Inf 3 kernel
8: epsilon ParamDbl 0 Inf Inf 0.1
9: fitted ParamLgl NA NA 2 TRUE
10: gamma ParamDbl 0 Inf Inf <NoDefault[3]> kernel
11: kernel ParamFct NA NA 4 radial
12: nu ParamDbl Inf Inf Inf 0.5 type
13: scale ParamUty NA NA Inf TRUE
14: shrinking ParamLgl NA NA 2 TRUE
15: tolerance ParamDbl 0 Inf Inf 0.001
16: type ParamFct NA NA 2 Cclassification
1 variable not shown: [value]
To view exactly what the dependency is we can use $deps
, this returns a data.table
which can queried in the usual way. So to see the dependencies of the SVM and to inspect the conditions we could do the following:
lrn("classif.svm")$param_set$deps
id on cond
1: cost type <CondEqual[9]>
2: nu type <CondEqual[9]>
3: degree kernel <CondEqual[9]>
4: coef0 kernel <CondAnyOf[9]>
5: gamma kernel <CondAnyOf[9]>
lrn("classif.svm")$param_set$deps[1, cond]
[[1]]
CondEqual: x = Cclassification
lrn("classif.svm")$param_set$deps[4, cond]
[[1]]
CondAnyOf: x ∈ {polynomial, sigmoid}
This tells us that the parameter cost
should only be set if the type
parameter is set to "Cclassification"
. Similarly, the coef0
parameter should be set only if "polynomial"
, "radial"
, or "sigmoid"
.
# errors as type is not Cclassification
lrn("classif.svm", type = "epsclassification", cost = 0.5)
Error in self$assert(xs): Assertion on 'xs' failed: type: Must be element of set {'Cclassification','nuclassification'}, but is 'epsclassification'.
# works because type is Cclassification
lrn("classif.svm", type = "Cclassification", cost = 0.5)
<LearnerClassifSVM:classif.svm>
* Model: 
* Parameters: type=Cclassification, cost=0.5
* Packages: mlr3, mlr3learners, e1071
* Predict Types: [response], prob
* Feature Types: logical, integer, numeric
* Properties: multiclass, twoclass
2.2.4 Baseline learners
Before we move onto learner evaluation, we will first highlight one particularly important class of learners that are useful in many aspects of ML. Contrary to expectations, these are actually the ‘bad’ or ‘weak’ learners known as baselines. Baselines are useful in model comparison (Chapter 3), as fallback learners (Section 4.7.1, Section 9.2.2), to be ‘composed’ into more complex models (Section 8.2.4), and can be used by sophisticated models (e.g., random forests) during training and/or prediction. For regression, we have implemented the baseline regr.featureless
, which always predicts the mean of the target of the training data:
# generate data
df = as_task_regr(data.frame(x = runif(1000), y = rnorm(1000, 2, 1)), target = "y")
lrn("regr.featureless")$train(df, 1:995)$predict(df, 996:1000)
<PredictionRegr> for 5 observations:
row_ids truth response
996 3.675092 1.977081
997 3.651012 1.977081
998 1.803173 1.977081
999 1.195544 1.977081
1000 1.861893 1.977081
It is good practice to test all new models against a baseline, and also to include baselines in experiments with multiple other models. In general, a model that does not outperform a baseline is a ‘bad’ model, on the other hand a model is not necessarily ‘good’ if it outperforms the baseline.
2.3 Evaluation
Perhaps the most important step of the ML workflow is evaluating model performance. Without this step, we would have no way to know if our trained model makes very accurate predictions, is worse than randomly guessing, or somewhere in between. We will continue with our decision tree example to establish if the quality of our predictions is ‘good’, first we will rerun the above code so it is easier to follow along.
2.3.1 Measures
Analogously to Task
s and Learner
s, the available measures in mlr3
are stored in a dictionary called mlr_measures
, which can be converted to a data.table
to view all available measures; we have a sugar function msr()
to simplify retrieving a measure for you and again you can use the $help()
method to find documentation for any measure.
as.data.table(mlr_measures)
key label task_type packages
1: aic Akaike Information Criterion <NA> mlr3
2: bic Bayesian Information Criterion <NA> mlr3
3: classif.acc Classification Accuracy classif mlr3,mlr3measures
4: classif.auc Area Under the ROC Curve classif mlr3,mlr3measures
5: classif.bacc Balanced Accuracy classif mlr3,mlr3measures

62: sim.jaccard Jaccard Similarity Index <NA> mlr3,mlr3measures
63: sim.phi Phi Coefficient Similarity <NA> mlr3,mlr3measures
64: time_both Elapsed Time <NA> mlr3
65: time_predict Elapsed Time <NA> mlr3
66: time_train Elapsed Time <NA> mlr3
2 variables not shown: [predict_type, task_properties]
All measures implemented in mlr3
are defined primarily by three components: 1) the function that defines the measure; 2) whether a lower or higher value is consider ‘good’; and 3) the range of possible values the measure can take. As well as these defining elements, other metadata are important to consider when selecting and using a Measure
, including if the measure has any special properties (e.g., requires training data), the type of predictions the measure can evaluate, and whether the measure has any ‘control parameters’. All this information is encapsulated in the Measure
object. By example let us consider the mean absolute error (regr.mae
).
measure = msr("regr.mae")
measure
<MeasureRegrSimple:regr.mae>: Mean Absolute Error
* Packages: mlr3, mlr3measures
* Range: [0, Inf]
* Minimize: TRUE
* Average: macro
* Parameters: list()
* Properties: 
* Predict type: response
This measure compares the absolute difference (‘error’) between true and predicted values: \(f(y, \hat{y}) =  y  \hat{y} \). Lower values are considered better (Minimize = TRUE
), which is intuitive as we would like the true values, \(y\), to be identical (or as close as possible) in value to the predicted values, \(\hat{y}\). Finally we can see that the range of possible values the learner can take is from \(0\) to \(\infty\) (Range: [0, Inf]
). The measure has no special properties (Properties: 
), it evaluates response
type predictions for regression models (Predict type: response
), and it has no control parameters (Parameters: list()
).
2.3.2 Scoring Prediction
s
All supervised learning measures compare the difference between predicted values and the ground truth. mlr3
simplifies the process of bringing these quantities together by storing the predictions and true outcomes in the Prediction
object as we have already seen.
predictions
<PredictionRegr> for 11 observations:
row_ids truth response
2 21.0 16.70000
8 24.4 26.81429
21 21.5 26.81429

31 15.0 16.70000
18 32.4 26.81429
26 27.3 26.81429
To actually calculate model performance, we simply call the $score()
method of a Prediction
object and pass as a single argument the measure (or measures passed as a list) that we want to compute. Less abstractly:
predictions$score(measure)
regr.mae
2.590909
Note that all task types have default measures that are used if the argument to $score()
is omitted, for regression this is the mean squared error (regr.mse
), which is the squared difference between true and predicted values: \(f(y, \hat{y}) = (y  \hat{y})^2\).
It is possible to calculate multiple measures at the same time by passing multiple measures to $score()
. For example, below we compute performance for mean squared error (regr.mse
) and mean absolute error (regr.mae
) – note we use msrs()
to load multiple measures at once.
2.3.3 Technical measures
mlr3
also provides measures that do not quantify the quality of the predictions of a model, but instead provide ‘meta’ information about the model, in particular we have implemented:

mlr_measures_time_train
– The time taken to train a model 
mlr_measures_time_predict
– The time taken for the model to make predictions 
mlr_measures_time_both
– The total time taken to train the model and then make predictions 
mlr_measures_selected_features
– The features selected by a model, which can only be used if the model has the ‘selected_features’ property.
So we can now score our decision tree to see how long it takes to train the model and then make predictions:
measures = msrs(c("time_train", "time_predict", "time_both"))
predictions$score(measures, learner = learner_rpart)
time_train time_predict time_both
0.004 0.003 0.007
Notice a few key properties of these measures:

time_both
is simply the sum oftime_train
andtime_predict
 We had to pass
learner = learner_rpart
to$score()
as these measures have therequires_learner
property:
msr("time_train")$properties
[1] "requires_learner"
 These can be used after model training and predicting because we automatically store model run times whenever
$train()
and$test()
are called, so the measures above are equivalent to:
The selected_features
measure calculates how many features were selected as important by the learner.
measure_sf = msr("selected_features")
measure_sf
<MeasureSelectedFeatures:selected_features>: Absolute or Relative Frequency of Selected Features
* Packages: mlr3
* Range: [0, Inf]
* Minimize: TRUE
* Average: macro
* Parameters: normalize=FALSE
* Properties: requires_task, requires_learner, requires_model
* Predict type: NA
We can see that this measure contains control parameters (Parameters: normalize=FALSE
), which are parameters that control how the measure is computed. As with hyperparameters these can be viewed with $param_set
:
measure_sf = msr("selected_features")
measure_sf$param_set
<ParamSet>
id class lower upper nlevels default value
1: normalize ParamLgl NA NA 2 FALSE FALSE
The normalize
hyperparameter specifies whether the returned number of selected features should be normalized by the total number of features, this is useful if you are comparing this value across tasks with differing number of features, so let us change the default to TRUE
and see how many (normalized) features our decision tree selected:
measure_sf$param_set$values$normalize = TRUE
predictions$score(measure_sf, task = task_mtcars, learner = learner_rpart)
selected_features
0.1
Note that we passed the task and learner as the measure has the requires_task
and requires_learner
property.
2.4 Our first regression experiment
Before we go on to look at how the building blocks of mlr3
extend to classification, we will take a brief pause to put together everything above in a short experiment. In this experiment we will compare the performance of a featureless regression learner to a decision tree with changed parameters.
library(mlr3)
set.seed(349)
# load and partition our task
task_bh = tsk("mtcars")
splits = partition(task)
# load featureless learner
featureless = lrn("regr.featureless")
# load decision tree with different hyperparameters
rpart = lrn("regr.rpart", cp = 0.2, maxdepth = 5)
# load MSE and MAE measures, and calculate time
measures = msrs(c("regr.mse", "regr.mae"))
# train learners
featureless$train(task, splits$train)
rpart$train(task, splits$train)
# make and score predictions
featureless$predict(task, splits$test)$score(measures)
regr.mse regr.mae
26.726772 4.512987
rpart$predict(task, splits$test)$score(measures)
regr.mse regr.mae
6.932709 2.206494
Before starting the experiment we load the mlr3
library and set a seed (in an exercise below you will be asked to think about why setting a seed is essential for reproducibility in this experiment). In this experiment we loaded the regression task mtcars
with tsk()
and then split this using partition
with the default 70/30 split. Next we loaded a featureless baseline learner (regr.featureless
) with the lrn()
function. Then loaded a decision tree (regr.rpart
) but changed the complexity parameter and max tree depth from their defaults. We then used msrs()
to load multiple measures at once, the mean squared error (MSE) (regr.mse
) and the mean absolute error (MAE) (regr.mae
). With all objects loaded we then train our models, passing the same training data to both. Finally we made predictions from our trained models and scored these, note how we use ‘method chaining’, which is an R6 technique to combine multiple methods (called with $()
) in a row on the same line. For both MSE and MAE, lower values are ‘better’ (Minimize: TRUE
) therefore we can conclude that the decision tree performs better than the featureless baseline as its MSE and MAE are both lower. In Section 3.3 we will see how to formalize comparison between models in a more efficient way using benchmark()
.
Now we have put everything together you may notice that our learners and measures both have the "regr."
prefix, which is a handy way of reminding us that we are working with a regression task and therefore must make use of learners and measures built for regression. In the next section, we will extend the building block of mlr3
to consider classification tasks, which make use of learners and measures with the "classif."
prefix.
2.5 Classification
Classification problems are ones in which a model tries to predict a discrete, categorical target, as opposed to a continuous, numeric quantity. For example, predicting the species of penguin from its physical characteristics would be a classification problem as there are only a finite number of species. mlr3
ensures that the interface for all tasks is as similar as possible (if not identical) and therefore we will not repeat any content from the previous section but will just focus on differences that make classification a unique ML problem. We will first demonstrate the similarities between regression and classification by performing an experiment very similar to the one in Section 2.4 using code that will now be familiar to you. We will then move to differences in tasks, learners and predictions, before looking at thresholding, which is a method specific to classification.
2.5.1 Our first classification experiment
The interface for classification tasks, learners, and measures, is identical to the regression setting, except the underlying objects inherit from TaskClassif
, LearnerClassif
, and MeasureClassif
, respectively.
We can therefore run a very similar experiment to the one above.
library(mlr3)
set.seed(349)
# load and partition our task
task_pen = tsk("penguins")
splits = partition(task_pen)
# load featureless learner
featureless = lrn("classif.featureless")
# load decision tree with different hyperparameters
rpart = lrn("classif.rpart", cp = 0.2, maxdepth = 5)
# load accuracy measure
measure = msr("classif.acc")
# train learners
featureless$train(task_pen, splits$train)
rpart$train(task_pen, splits$train)
# make and score predictions
featureless$predict(task_pen, splits$test)$score(measure)
classif.acc
0.4424779
rpart$predict(task_pen, splits$test)$score(measure)
classif.acc
0.9469027
In this experiment we loaded the predefined task mlr_tasks_penguins
, which is based on the palmerpenguins::penguins
dataset, then partitioned the data into training and test splits. We loaded the featureless classification baseline (which always predicts the most common class in the training data) and a classification decision tree, then the accuracy measure (sum of correct predictions divided by total number of predictions), trained our models then made predictions and scored them. In this experiment the decision tree is clearly the better performing model as it is vastly more accurate.
Now we have seen the similarities between classification and regression, we can turn to some key differences.
2.5.2 TaskClassif
Classification tasks, objects inheriting from TaskClassif
, are very similar to regression tasks, except the target variable is of type factor and will have a limited number of possible classes/categories that observations can fall into.
You can view the predefined classification tasks in mlr3
by filtering the mlr_tasks
dictionary, and you can create your own with as_task_classif
.
as.data.table(mlr_tasks)[task_type == "classif"]
key label task_type nrow
1: breast_cancer Wisconsin Breast Cancer classif 683
2: german_credit German Credit classif 1000
3: ilpd Indian Liver Patient Data classif 583
4: iris Iris Flowers classif 150
5: optdigits Optical Recognition of Handwritten Digits classif 5620
6: penguins Palmer Penguins classif 344
7: penguins_simple Simplified Palmer Penguins classif 333
8: pima Pima Indian Diabetes classif 768
9: sonar Sonar: Mines vs. Rocks classif 208
10: spam HP Spam Detection classif 4601
11: titanic Titanic classif 1309
12: wine Wine Regions classif 178
13: zoo Zoo Animals classif 101
9 variables not shown: [ncol, properties, lgl, int, dbl, chr, fct, ord, pxc]
as_task_classif(palmerpenguins::penguins, target = "species")
<TaskClassif:palmerpenguins::penguins> (344 x 8)
* Target: species
* Properties: multiclass
* Features (7):
 int (3): body_mass_g, flipper_length_mm, year
 dbl (2): bill_depth_mm, bill_length_mm
 fct (2): island, sex
There are two types of classification task supported in mlr3
: binary classification, in which the outcome can be one of two categories, and multiclass classification, where the outcome can be one of three or more categories.
The sonar
task (mlr_tasks_sonar
) is an example of a binary classification problem, as it has two targets, in mlr3
terminology it has the “twoclass” property:
task_sonar = tsk("sonar")
task_sonar
<TaskClassif:sonar> (208 x 61): Sonar: Mines vs. Rocks
* Target: Class
* Properties: twoclass
* Features (60):
 dbl (60): V1, V10, V11, V12, V13, V14, V15, V16, V17, V18, V19, V2,
V20, V21, V22, V23, V24, V25, V26, V27, V28, V29, V3, V30, V31,
V32, V33, V34, V35, V36, V37, V38, V39, V4, V40, V41, V42, V43,
V44, V45, V46, V47, V48, V49, V5, V50, V51, V52, V53, V54, V55,
V56, V57, V58, V59, V6, V60, V7, V8, V9
task_sonar$class_names
[1] "M" "R"
In contrast, penguins
(mlr_tasks_penguins
) is a multiclass problem as there are more than two species of penguins, in mlr3
terminology it has the “multiclass” property:
task_penguins = tsk("penguins")
task_penguins
<TaskClassif:penguins> (344 x 8): Palmer Penguins
* Target: species
* Properties: multiclass
* Features (7):
 int (3): body_mass, flipper_length, year
 dbl (2): bill_depth, bill_length
 fct (2): island, sex
task_penguins$class_names
[1] "Adelie" "Chinstrap" "Gentoo"
In mlr3
, the only difference between these is that binary classification tasks have an extra field called $positive
, which defines the ‘positive’ class. In binary classification, as there are only two possible class types, by convention one of these is known as the positive class and the other as the negative class; it is arbitrary which is which, though often the more ‘important’ class is set as the positive class. You can set the positive class during or after construction, if no positive class is specified then mlr3
assumes the first level in the target
column is the positive class, which can lead to misleading results, as shown in the example below.
# create a dataset with factor target
data = data.frame(x = runif(5), y = factor(c("neg", "pos", "neg", "neg", "pos")))
# specifying the positive class:
as_task_classif(data, target = "y", positive = "pos")$positive
[1] "pos"
# default is first class, which here is "neg"
classif_task = as_task_classif(data, target = "y")
classif_task$positive
[1] "neg"
# changing after construction
classif_task$positive = "pos"
classif_task$positive
[1] "pos"
Whilst the choice of positive and negative class is arbitrary, it is essential to ensuring results from models and performance measures are interpreted as expected – this is best demonstrated when we discuss thresholding (Section 2.5.4) and ROC metrics (Section 3.4).
Finally, plotting is possible with mlr3viz::autoplot.TaskClassif
, below we plot a comparison between the target column and features.
2.5.3 LearnerClassif
and MeasureClassif
Classification learners, which inherit from LearnerClassif
have the same interface as regression learners. However, a key difference is that the possible prediction types in classification are either "response"
– predicting an observation’s class (a penguins Species in our example) – or "prob"
– predicting the probability of an observation belonging to each class. In classification, the latter is more informative as it provides more information about the confidence of the predictions:
learner_rpart = lrn("classif.rpart", predict_type = "prob")
learner_rpart$train(task_penguins, splits$train)
predictions = learner_rpart$predict(task_penguins, splits$test)
predictions
<PredictionClassif> for 113 observations:
row_ids truth response prob.Adelie prob.Chinstrap prob.Gentoo
2 Adelie Adelie 0.97029703 0.02970297 0.00000000
4 Adelie Adelie 0.97029703 0.02970297 0.00000000
7 Adelie Adelie 0.97029703 0.02970297 0.00000000

338 Chinstrap Chinstrap 0.04651163 0.93023256 0.02325581
341 Chinstrap Adelie 0.97029703 0.02970297 0.00000000
344 Chinstrap Chinstrap 0.04651163 0.93023256 0.02325581
Notice how the predictions include the predicted probabilities for all three classes, as well as the response
, which (by default) is the class with the highest predicted probability.
The interface for classification measures, which are of class MeasureClassif
, is identical to regression measures. The key difference in usage is that predict types are more important to be aware of, to ensure you are evaluating the ‘correct’ predictions. To evaluate "response"
predictions, you will need measures with predict_type = "response"
, or to evaluate probability predictions you will require predict_type = "prob"
. The easiest way to find these measures is by filtering the mlr_measures
dictionary:
as.data.table(mlr_measures)[task_type == "classif" & predict_type == "prob" & task_properties != "twoclass"]
key label task_type
1: classif.logloss Log Loss classif
2: classif.mauc_au1p Weighted average 1 vs. 1 multiclass AUC classif
3: classif.mauc_au1u Average 1 vs. 1 multiclass AUC classif
4: classif.mauc_aunp Weighted average 1 vs. rest multiclass AUC classif
5: classif.mauc_aunu Average 1 vs. rest multiclass AUC classif
6: classif.mbrier Multiclass Brier Score classif
3 variables not shown: [packages, predict_type, task_properties]
We also filtered to remove any measures that have the “twoclass” property as this would conflict with our “multiclass” task. Now we can evaluate the quality of our probability predictions and response predictions simultaneously:
classif.mbrier classif.logloss classif.acc
0.1016821 0.2291407 0.9469027
The accuracy measure evaluates the "response"
predictions whereas the brier score (classif.mbrier
) (squared difference between predicted probabilities and the truth) and logloss (classif.logloss
) (negative logarithm of the predicted probability for the true class) are evaluating the probability predictions.
If no measure is passed to $score()
, the default classification error (classif.ce
) is calculated, which is the number of misclassifications divided by the number of predictions, i.e., 1  classif.acc
.
2.5.4 PredictionClassif
, Confusion Matrix, and Thresholding
PredictionClassif
objects have two important differences from the regression case. Firstly, the added field $confusion
, and secondly the added method $set_threshold()
.
Confusion matrix
A confusion matrix is a popular way to show the quality of classification (response) predictions in a more detailed fashion by seeing if a model is good at (mis)classifying observations in a particular class. For binary and multiclass classification, the confusion matrix is stored in the $confusion
field of the PredictionClassif
object:
predictions$confusion
truth
response Adelie Chinstrap Gentoo
Adelie 49 3 0
Chinstrap 1 18 1
Gentoo 0 1 40
The rows in a confusion matrix are the predicted class and the columns are the true class. All offdiagonal entries are incorrectly classified observations, and all diagonal entries are correctly classified. In this case, the classifier does fairly well classifying all penguins, but we could have found that it only classifies the Adelie species well but often conflates Chinstrap and Gentoo. You can visualize a confusion matrix with autoplot.PredictionClassif
.
autoplot(predictions)
If we take task_sonar$positive
(M
) to be the positive class then the confusion matrix corresponds to true positives (top left), false positives (top right), false negatives (bottom left), and true negatives (bottom right) (see Figure 3.10):
splits = partition(task_sonar)
learner_rpart$
train(task_sonar, splits$train)$
predict(task_sonar, splits$test)$
confusion
truth
response M R
M 29 10
R 8 22
We will return to the concept of binary (mis)classification in greater detail in Section 3.4.
Thresholding
The final big difference to discuss is thresholding. We saw previously that the response
prediction type by default is calculated as the class that has the highest predicted probability. For n
classes, with predicted probabilities \(p_1,...,p_n\), this is the same as saying response
= argmax\(\{p_1,...,p_n\}\). If the maximum probability is not unique, i.e., multiple classes are predicted to have the highest probability, then the response is chosen randomly from these. In binary classification this means that the positive class will be selected if the predicted class is greater than 50%, and the negative class otherwise.
This 50% value is known as the threshold and it can be useful to change this threshold if there is class imbalance (when one class is over or underrepresented in a dataset), or if there are different costs associated with classes, or simply if there is a preference to ‘over’predict one class. As an example, let us take the german_credit
task in which 700 customers have good credit and 300 have bad. Now we could easily build a model with 70% accuracy simply by always predicting a customer will have good credit:
library(mlr3viz)
task_credit = tsk("german_credit")
learn_featureless = lrn("classif.featureless", predict_type = "prob")
split = partition(task_credit)
learn_featureless$train(task_credit, split$train)
preds = learn_featureless$predict(task_credit, split$test)
preds$score(msr("classif.acc"))
classif.acc
0.7
autoplot(preds)
Whilst this model may appear ‘good’ on the surface, in fact it just ignores all ‘bad’ customers – this can create very big problems in healthcare and other settings where there are data biases, as well as for the insurance company if false positives cost more than false negatives (see Section 8.1 for costsensitive classification).
Thresholding allows classes to be selected with a lower probability threshold, so instead of predicting a customer has bad credit if P(good) < 50%, instead we might predict bad credit if P(good) < 70% – notice how we write this in terms of the positive class, which in this task is ‘good’. Let us see this in practice:
Whilst our model performs ‘worse’ overall, i.e. with lower accuracy, it is still a ‘better’ model as it more accurately captures the relationship between classes.
In the binary classification setting, $set_threshold
only requires one numeric argument, which corresponds with the threshold for the positive class – hence why it is essential to ensure the positive class is correctly set in your task.
In multiclass classification, thresholding works by first assigning a threshold to each of the n
classes, dividing the predicted probabilities for each class by these thresholds to return n
ratios, and then the class with the highest ratio is selected. By example say we are predicting if a new observation will be of class A, B, C, or D and we have predicted \(P(A = 0.2), P(B = 0.4), P(C = 0.1), P(D = 0.3)\). For now we will assume that the threshold for all classes is identical, note that it is arbitrary what thresholds are chosen if they are all identical so below we just use 1
:
A B C D
0.2 0.4 0.1 0.3
We would therefore predict our observation is of class B as this is the highest ratio. However, we could change our thresholds so that D has the lowest threshold and is therefore most likely to be predicted, A has the highest threshold, and B and C are equal:
thresholds = c(A = 0.5, B = 0.25, C = 0.25, D = 0.1)
probs/thresholds
A B C D
0.4 1.6 0.4 3.0
Now our observation will be predicted to be in class D.
In mlr3
, the same principle is followed with $set_threshold
by passing a named list. This is demonstrated below with mlr_tasks_zoo
. Before changing the thresholds, some classes are never predicted and some are overpredicted.
library(ggplot2)
library(patchwork)
task = tsk("zoo")
splits = partition(task)
learner = lrn("classif.rpart", predict_type = "prob")
learner$train(task, splits$train)
preds = learner$predict(task, splits$test)
before = autoplot(preds) + ggtitle("Default thresholds")
new_thresh = proportions(table(task$truth(splits$train)))
new_thresh
mammal bird reptile fish amphibian
0.40298507 0.19402985 0.04477612 0.13432836 0.04477612
insect mollusc.et.al
0.07462687 0.10447761
preds$set_threshold(new_thresh)
after = autoplot(preds) + ggtitle("Inverse weighting thresholds")
before + after + plot_layout(guides = "collect")
Again we see that the model better represents all classes after thresholding. In this example we set the new thresholds to be the proportions of each class in the training set, doing so, known as inverse weighting, effectively sets the thresholds as the inverse probability of occurring, this means that more common classes are will have higher thresholds and vice versa.
In Chapter 6 we will return to thresholding to see how to automatically choose and set thresholds,§ and in Section 8.1 we will look at costsensitive classification where each class has a different associated cost.
2.6 Task Column Roles
Now we have covered regression and classification, we can briefly return to tasks and in particular to column roles, which are used to customize tasks further. Column roles are used by Task
objects to define important metadata that can be used by learners and other objects to interact with the task. We have already seen some of these in action with targets and features. There are seven column roles available:

"feature"
: Features used for prediction. 
"target"
: Target variable to predict. 
"name"
: Row names/observation labels, formtcars
this is the"model"
column. 
"order"
: Variable(s) used to order data returned by$data()
; must be sortable withorder()
. 
"group"
: Variable used to keep observations together during resampling. 
"stratum"
: Variable(s) to stratify during resampling. 
"weight"
: Observation weights. Not more than one numeric column may have this role.
We have already seen how feature and targets work in Section 2.1, these are the only required column roles. In Section 3.2.5 we will have a look at the stratum
and group
column roles. So for now we will only look at order
, and weight
. We will not go into detail about name
, which is primarily used by plotting and will almost always be the rownames()
of the underlying data.
Column roles are updated using the $set_col_roles()
method. When we set the order
column role, the data is ordered according to that column(s), as in the following example.
df = data.frame(mtcars[1:2, ], idx = 2:1)
task_mtcars_order = as_task_regr(df, target = "mpg")
task_mtcars_order$data(ordered = TRUE)
mpg am carb cyl disp drat gear hp idx qsec vs wt
1: 21 1 4 6 160 3.9 4 110 2 16.46 0 2.620
2: 21 1 4 6 160 3.9 4 110 1 17.02 0 2.875
# order by "idx" column
task_mtcars_order$set_col_roles("idx", roles = "order")
task_mtcars_order$data(ordered = TRUE)
mpg am carb cyl disp drat gear hp qsec vs wt
1: 21 1 4 6 160 3.9 4 110 17.02 0 2.875
2: 21 1 4 6 160 3.9 4 110 16.46 0 2.620
In this example we can see that by setting "idx"
to have the order
column role, it is no longer displayed when we run $data()
but instead is used to order the observations according to its value. This demonstrates how the Task
object can hold metadata that is not passed to the learner.
The weights
column role is used to weight data points differently. One example of why we would do this is in classification tasks with severe class imbalance, weighting the minority class rows more heavily may improve the model’s performance on that class. For example in the breast_cancer
dataset, there are more instances of the benign tumors than malignant tumors, so if we want to better predict malignant tumors we could weight the data in favour of this class:
malignant benign
239 444
df = cancer_unweighted$data()
# adding a column where the weight is 2 when the class == "malignant", and 1 otherwise
df$weights = ifelse(df$class == "malignant", 2, 1)
cancer_weighted = as_task_classif(df, target = "class")
cancer_weighted$set_col_roles("weights", roles = "weight")
# compare weighted and unweighted predictions
split = partition(cancer_unweighted)
lrn_rf = lrn("classif.ranger")
lrn_rf$train(cancer_unweighted, split$train)$predict(cancer_unweighted, split$test)$score()
classif.ce
0.01769912
lrn_rf$train(cancer_weighted, split$train)$predict(cancer_weighted, split$test)$score()
classif.ce
0.008849558
In this example, weighting may marginally improve the model performance (see Chapter 3 for more thorough comparison methods). Not all models can handle weights in the data so it’s important to check a learner’s properties to make sure this column role is being used as expected. Furthermore, algorithms will make use of weights in different ways so it’s important to read the implementation’s documentation to understand how weights are being used.
2.7 Supported Algorithms
mlr3
supports many algorithms (some through multiple implementations) as Learner
s. These are primarily accessed through mlr3
, mlr3learners
and mlr3extralearners
package, however all packages that implement new tasks (Chapter 8) also include a handful of simple algorithms.
The list of learners included in mlr3
is deliberately small to avoid large sets of dependencies:
 Featureless learner (
regr.featureless
/classif.featureless
), which are implemented inmlr3
and are baseline learners used for model comparison or as fallback learners (Section 9.2.2). The former predicts the mean of the target values in the training set for all new observations, the latter predicts the most frequent label.  Debug learners (
regr.debug
/classif.debug
), which are implemented inmlr3
and used only to debug code (Section 9.2).  Classification and regression trees (CART) (
regr.rpart
/classif.rpart
).
The mlr3learners
package contains a selection of algorithms (and select implementations) chosen by the mlr team that we recommend as a good starting point for most experiments:
 Linear (
regr.lm
) and logistic (classif.log_reg
) regression.  Penalized Generalized Linear Models (
regr.glmnet
/classif.glmnet
) and with builtin optimization of the penalization parameter (regr.cv_glmnet
/classif.cv_glmnet`).  Weighted \(k\)Nearest Neighbors regression (
regr.kknn
/classif.kknn
).  Kriging / Gaussian Process Regression (
regr.km
).  Linear (
classif.lda
) and Quadratic (classif.qda
) Discriminant Analysis.  Naïve Bayes Classification (
classif.naive_bayes
).  SupportVector machines (
regr.svm
/classif.svm
).  Gradient Boosting (
regr.xgboost
/classif.xgboost
).  Random Forests for regression and classification (
regr.ranger
/classif.ranger
).
The majority of other learners are all in mlr3extralearners
. You can find an uptodate list of learners here: https://mlrorg.com/learners.html.
The dictionary mlr_learners
contains learners that are supported in loaded packages. You can list all learners by converting the mlr_learners
dictionary into a data.table
:
as.data.table(mlr_learners)
key label task_type
1: classif.AdaBoostM1 Adaptive Boosting classif
2: classif.C50 Treebased Model classif
3: classif.IBk Nearest Neighbour classif
4: classif.J48 Treebased Model classif
5: classif.JRip Propositional Rule Learner. classif

136: surv.priority_lasso Priority Lasso surv
137: surv.ranger Random Forest surv
138: surv.rfsrc Random Forest surv
139: surv.svm Support Vector Machine surv
140: surv.xgboost Gradient Boosting surv
4 variables not shown: [feature_types, packages, properties, predict_types]
The resulting data.table
contains a lot of metadata that is useful for identifying learners with particular properties. For example, we can list all learners that support regression problems:
as.data.table(mlr_learners)[task_type == "classif"]
key label task_type
1: classif.AdaBoostM1 Adaptive Boosting classif
2: classif.C50 Treebased Model classif
3: classif.IBk Nearest Neighbour classif
4: classif.J48 Treebased Model classif
5: classif.JRip Propositional Rule Learner. classif

41: classif.ranger <NA> classif
42: classif.rfsrc Random Forest classif
43: classif.rpart Classification Tree classif
44: classif.svm <NA> classif
45: classif.xgboost <NA> classif
4 variables not shown: [feature_types, packages, properties, predict_types]
We can filter by multiple conditions, for example to list all regression learners that and can predict standard errors:
as.data.table(mlr_learners)[task_type == "regr" &
sapply(predict_types, function(x) "se" %in% x)]
key label task_type
1: regr.debug Debug Learner for Regression regr
2: regr.earth Multivariate Adaptive Regression Splines regr
3: regr.featureless Featureless Regression Learner regr
4: regr.gam Generalized Additive Regression Model regr
5: regr.glm Generalized Linear Regression regr
6: regr.km <NA> regr
7: regr.lm <NA> regr
8: regr.mob Modelbased Recursive Partitioning regr
9: regr.ranger <NA> regr
4 variables not shown: [feature_types, packages, properties, predict_types]
2.8 Conclusion
In this chapter we covered the building blocks of mlr3
. We first introduced basic ML methodology and then showed how this is implemented in mlr3
We began by looking at the Task
class, which is used to define ML tasks or problems to solve. We then looked at the Learner
class, which encapsulates ML algorithms, hyperparameters, and other metainformation. Finally we consider how to evaluate ML models with objects from the Measure
class. After looking at regression implementations, we extended all the above to the classification setting, before finally looking at some extra details about tasks and the algorithms that are implemented across mlr3
. The rest of this book will build on the basic elements seen in this chapter, starting with more advanced model comparison methods in Chapter 3 before moving to improving model performance with automated hyperparameter tuning in Chapter 4. Table 2.2 summarizes the most important functions and methods seen in this chapter.
Underlying R6 Class  Constructor (if applicable)  Important methods 

Task 
tsk() /tsks() /as_task_X

$filter() /$data()

Learner 
lrn() /lrns()

$train() /$predict()

Prediction 
some_learner$predict() 
$score() 
Measure 
msr() /msrs()

2.9 Exercises
 Set the seed to
124
then train a classification tree model withclassif.rpart
and default hyperparameters on 80% of the data in the predefinedsonar
task. Evaluate the model’s performance with the classification error measure on the remaining data. Also think about why we need to set the seed in this example.  Calculate the true positive, false positive, true negative, and false negative rates of the predictions made by the model in exercise 1.
 Change the threshold of the model from exercise 1 such that the false positive rate is lower than the false negative rate. What is one reason you might do this in practice?