# Appendix A — Solutions to exercises

## A.1 Solutions to Chapter 2

- Use the built in
`sonar`

task and the`classif.rpart`

learner along with the partition function to train a model.

Once the model is trained, generate the predictions on the test set, define the performance measure (`classif.ce`

), and score the predictions.

```
classif.ce
0.2195122
```

- Generate a confusion matrix from the built in function.

Since the rows represent predictions (response) and the columns represent the ground truth values, the TP, FP, TN, and FN rates are as follows:

True Positive (TP) = 20

False Positive (FP) = 2

True Negative (TN) = 12

False Positive (FN) = 7

- Since in this case we want the model to predict the negative class more often, we will raise the threshold (note the
`predict_type`

for the learner must be`prob`

for this to work).

```
truth
response M R
M 14 4
R 8 15
```

One reason we might want the false positive rate to be lower than the false negative rate is if we felt it was worse for a positive prediction to be incorrect (meaning the true label was the negative label) than it was for a negative prediction to be incorrect (meaning the true label was the positive label).

## A.2 Solutions to Chapter 3

- Use the
`spam`

task and 5-fold cross-validation to benchmark Random Forest (`classif.ranger`

), Logistic Regression (`classif.log_reg`

), and XGBoost (`classif.xgboost`

) with regards to AUC. Which learner appears to do best? How confident are you in your conclusion? How would you improve upon this?

```
grid = benchmark_grid(
tasks = tsk("spam"),
learners = lrns(c("classif.ranger", "classif.log_reg", "classif.xgboost"), predict_type = "prob"),
resamplings = rsmp("cv", folds = 5)
)
bmr = benchmark(grid)
```

```
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
```

This is only a small example for a benchmark workflow, but without tuning (see Chapter 4), the results are naturally not suitable to make any broader statements about the superiority of either learner for this task.

- A colleague claims to have achieved a 93.1% classification accuracy using the
`classif.rpart`

learner on the`penguins_simple`

task. You want to reproduce their results and ask them about their resampling strategy. They said they used 3-fold cross-validation, and they assigned rows using the task’s`row_id`

modulo 3 to generate three evenly sized folds. Reproduce their results using the custom CV strategy.

## A.3 Solutions to Chapter 4

- Tune the
`mtry`

,`sample.fraction`

,`num.trees`

hyperparameters of a random forest model (`regr.ranger`

) on the`Motor Trend`

data set (`mtcars`

). Use a simple random search with 50 evaluations and select a suitable batch size. Evaluate with a 3-fold cross-validation and the root mean squared error.

```
set.seed(4)
learner = lrn("regr.ranger",
mtry.ratio = to_tune(0, 1),
sample.fraction = to_tune(1e-1, 1),
num.trees = to_tune(1, 2000)
)
instance = ti(
task = tsk("mtcars"),
learner = learner,
resampling = rsmp("cv", folds = 3),
measures = msr("regr.rmse"),
terminator = trm("evals", n_evals = 50)
)
tuner = tnr("random_search", batch_size = 10)
tuner$optimize(instance)
```

```
mtry.ratio sample.fraction num.trees learner_param_vals x_domain regr.rmse
1: 0.3558625 0.9108463 192 <list[4]> <list[3]> 2.757883
```

- Evaluate the performance of the model created in Question 1 with nested resampling. Use a holdout validation for the inner resampling and a 3-fold cross-validation for the outer resampling. Print the unbiased performance estimate of the model.

```
set.seed(4)
learner = lrn("regr.ranger",
mtry.ratio = to_tune(0, 1),
sample.fraction = to_tune(1e-1, 1),
num.trees = to_tune(1, 2000)
)
at = auto_tuner(
tuner = tnr("random_search", batch_size = 10),
learner = learner,
resampling = rsmp("holdout"),
measure = msr("regr.rmse"),
terminator = trm("evals", n_evals = 50)
)
task = tsk("mtcars")
outer_resampling = rsmp("cv", folds = 3)
rr = resample(task, at, outer_resampling, store_models = TRUE)
rr$aggregate()
```

```
regr.mse
12.16805
```

- Tune and benchmark an XGBoost model against a logistic regression
`Spam`

data set and determine which has the best Brier score. Use mlr3tuningspaces and nested resampling.

`Loading required package: mlr3tuning`

`Loading required package: paradox`

```
learner_xgboost = lts(lrn("classif.xgboost", predict_type = "prob"))
at_xgboost = auto_tuner(
tuner = tnr("random_search", batch_size = 1),
learner = learner_xgboost,
resampling = rsmp("cv", folds = 3),
measure = msr("classif.bbrier"),
term_evals = 2,
)
learner_logreg = lrn("classif.log_reg", predict_type = "prob")
at_logreg = auto_tuner(
tuner = tnr("random_search", batch_size = 1),
learner = learner_logreg,
resampling = rsmp("cv", folds = 3),
measure = msr("classif.bbrier"),
term_evals = 2,
)
task = tsk("spam")
outer_resampling = rsmp("cv", folds = 3)
design = benchmark_grid(
tasks = task,
learners = list(at_xgboost, at_logreg),
resamplings = outer_resampling
)
bmr = benchmark(design, store_models = TRUE)
```

```
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
```

```
<BenchmarkResult> of 6 rows with 2 resampling runs
nr task_id learner_id resampling_id iters warnings errors
1 spam classif.xgboost.tuned cv 3 0 0
2 spam classif.log_reg.tuned cv 3 0 0
```

## A.4 Solutions to Chapter 5

- Calculate a correlation filter on the
`Motor Trend`

data set (`mtcars`

).

```
library("mlr3verse")
filter = flt("correlation")
task = tsk("mtcars")
filter$calculate(task)
as.data.table(filter)
```

```
feature score
1: wt 0.8676594
2: cyl 0.8521620
3: disp 0.8475514
4: hp 0.7761684
5: drat 0.6811719
6: vs 0.6640389
7: am 0.5998324
8: carb 0.5509251
9: gear 0.4802848
10: qsec 0.4186840
```

- Use the filter from the first exercise to select the five best features in the
`mtcars`

data set.

`[1] "cyl" "disp" "drat" "hp" "wt" `

- Apply a backward selection to the
`penguins`

data set with a classification tree learner`"classif.rpart"`

and holdout resampling by the measure classification accuracy. Compare the results with those in Section 5.2.1.

```
Attaching package: 'mlr3fselect'
```

```
The following object is masked from 'package:mlr3tuning':
ContextEval
```

```
instance = fselect(
fselector = fs("sequential", strategy = "sbs"),
task = tsk("penguins"),
learner = lrn("classif.rpart"),
resampling = rsmp("holdout"),
measure = msr("classif.acc")
)
as.data.table(instance$result)[, .(bill_depth, bill_length, body_mass, classif.acc)]
```

```
bill_depth bill_length body_mass classif.acc
1: FALSE TRUE TRUE 0.9826087
```

`[1] "bill_length" "body_mass" "island" "sex" "year" `

Answer the following questions:

- Do the selected features differ?

Yes, the backward selection selects more features.

- Which feature selection method achieves a higher classification accuracy?

In this example, the backwards example performs slightly better, but this depends heavily on the random seed and could look different in another run.

- Are the accuracy values in b) directly comparable? If not, what has to be changed to make them comparable?

No, they are not comparable because the holdout sampling called with `rsmp("holdout")`

creates a different holdout set for the two runs. A fair comparison would create a single resampling instance and use it for both feature selections (see Chapter 3 for details):

```
resampling = rsmp("holdout")
resampling$instantiate(tsk("penguins"))
sfs = fselect(
fselector = fs("sequential", strategy = "sfs"),
task = tsk("penguins"),
learner = lrn("classif.rpart"),
resampling = resampling,
measure = msr("classif.acc")
)
sbs = fselect(
fselector = fs("sequential", strategy = "sbs"),
task = tsk("penguins"),
learner = lrn("classif.rpart"),
resampling = resampling,
measure = msr("classif.acc")
)
as.data.table(sfs$result)[, .(bill_depth, bill_length, body_mass, classif.acc)]
```

```
bill_depth bill_length body_mass classif.acc
1: TRUE TRUE FALSE 0.973913
```

```
bill_depth bill_length body_mass classif.acc
1: TRUE TRUE TRUE 0.973913
```

Alternatively, one could automate the feature selection and perform a benchmark between the two wrapped learners.

- Automate the feature selection as in Section 5.2.6 with the
`spam`

data set and a logistic regression learner (`"classif.log_reg"`

). Hint: Remember to call`library("mlr3learners")`

for the logistic regression learner.

```
library("mlr3fselect")
library("mlr3learners")
at = auto_fselector(
fselector = fs("random_search"),
learner = lrn("classif.log_reg"),
resampling = rsmp("holdout"),
measure = msr("classif.acc"),
terminator = trm("evals", n_evals = 50)
)
grid = benchmark_grid(
task = tsk("spam"),
learner = list(at, lrn("classif.log_reg")),
resampling = rsmp("cv", folds = 3)
)
bmr = benchmark(grid)
aggr = bmr$aggregate(msrs(c("classif.acc", "time_train")))
as.data.table(aggr)[, .(learner_id, classif.acc, time_train)]
```

```
learner_id classif.acc time_train
1: classif.log_reg.fselector 0.8778337 5.802333
2: classif.log_reg 0.9256695 0.246000
```

## A.5 Solutions to Chapter 6

## A.6 Solutions to Chapter 8

- Run a benchmark experiment on the
`german_credit`

task with algorithms:`featureless`

,`log_reg`

,`ranger`

. Tune the`featureless`

model using`tunetreshold`

and`learner_cv`

. Use 2-fold CV and evaluate with`msr("classif.costs", costs = costs)`

where you should make the parameter`costs`

so that the cost of a true positive is -10, the cost of a true negative is -1, the cost of a false positive is 2, and the cost of a false negative is 3. Use`set.seed(11)`

to make sure you get the same results as us. Are your results surprising?

```
library(mlr3verse)
set.seed(11)
costs = matrix(c(-10, 3, 2, -1), nrow = 2, dimnames =
list("Predicted Credit" = c("good", "bad"),
Truth = c("good", "bad")))
cost_measure = msr("classif.costs", costs = costs)
gr = po("learner_cv", lrn("classif.featureless", predict_type = "prob")) %>>%
po("tunethreshold", measure = cost_measure)
task = tsk("german_credit")
learners = list(as_learner(gr), lrn("classif.log_reg"), lrn("classif.ranger"))
bmr = benchmark(benchmark_grid(task, learners, rsmp("cv", folds = 2)))
bmr$aggregate(cost_measure)[, c(4, 7)]
```

```
learner_id classif.costs
1: classif.featureless.tunethreshold -6.400
2: classif.log_reg -5.420
3: classif.ranger -5.923
```

- Use the
`probregr`

pipeline to create a probabilistic regression model using: xgboost for the`response`

prediction, featureless learner for the`se`

prediction, and assuming an Cauchy distribution. Train and predict on`task = tsk("mtcars"); split = partition(task)`

. Evaluate your model with the logloss measure.

```
library(mlr3verse)
library(mlr3proba)
set.seed(11)
l = as_learner(ppl("probregr",
learner = lrn("regr.xgboost"),
learner_se = lrn("regr.featureless"),
dist = "Cauchy")
)
task = tsk("mtcars")
split = partition(task)
meas = msr("regr.logloss")
l$train(task, split$train)$predict(task, split$test)$score(meas)
```

```
regr.logloss
4.846335
```

- Train and predict a survival forest using
`rfsrc`

(from`mlr3extralearners`

). Run this experiment using`task = tsk("rats"); split = partition(task)`

. Evaluate your model with the RCLL measure.

```
library(mlr3verse)
library(mlr3proba)
library(mlr3extralearners)
set.seed(11)
task = tsk("rats")
split = partition(task)
lrn("surv.rfsrc")$
train(task, split$train)$
predict(task, split$test)$
score(msr("surv.rcll"))
```

```
surv.rcll
4.030926
```

- Estimate the density of the
`tsk("precip")`

data using`logspline`

(from`mlr3extralearners`

). Run this experiment using`task = tsk("precip"); split = partition(task)`

. Evaluate your model with the logloss measure.

```
library(mlr3verse)
library(mlr3proba)
library(mlr3extralearners)
set.seed(11)
task = tsk("precip")
split = partition(task)
lrn("dens.logspline")$
train(task, split$train)$
predict(task, split$test)$
score(msr("dens.logloss"))
```

```
dens.logloss
3.979233
```

- Run a benchmark clustering experiment on the
`wine`

dataset without a label column. Compare the performance of`k-means`

learner with`k`

equal to 2, 3 and 4 using the silhouette measure. Use insample resampling technique. What value of`k`

would you choose based on the silhouette scores?

```
library(mlr3)
library(mlr3cluster)
set.seed(11)
learners = list(
lrn("clust.kmeans", centers = 2L, id = "k-means, k=2"),
lrn("clust.kmeans", centers = 3L, id = "k-means, k=3"),
lrn("clust.kmeans", centers = 4L, id = "k-means, k=4")
)
task = as_task_clust(tsk("wine")$data()[, -1])
measure = msr("clust.silhouette")
bmr = benchmark(benchmark_grid(task, learners, rsmp("insample")))
bmr$aggregate(measure)[, c(4, 7)]
```

```
learner_id clust.silhouette
1: k-means, k=2 0.6568537
2: k-means, k=3 0.5711382
3: k-means, k=4 0.5605941
```

Based on the silhouette score, we can choose `k = 2`

.

- Run a (spatially) unbiased classification benchmark experiment on the
`ecuador`

task with a featureless learner and xgboost, evaluate with the binary Brier score.

You can use any resampling method from `mlr3spatiotempcv`

, in this solution we use 4-fold spatial envrionmental blocking.

```
Attaching package: 'mlr3spatiotempcv'
```

```
The following objects are masked from 'package:mlr3spatial':
as_task_classif_st, as_task_classif_st.data.frame,
as_task_classif_st.DataBackend, as_task_classif_st.sf,
as_task_classif_st.TaskClassifST, as_task_regr_st,
as_task_regr_st.data.frame, as_task_regr_st.DataBackend,
as_task_regr_st.sf, as_task_regr_st.TaskClassifST,
as_task_regr_st.TaskRegrST, TaskClassifST, TaskRegrST
```

```
set.seed(11)
learners = lrns(paste0("classif.", c("xgboost", "featureless")),
predict_type = "prob")
rsmp_sp = rsmp("spcv_env", folds = 4)
design = benchmark_grid(tsk("ecuador"), learners, rsmp_sp)
bmr = benchmark(design)
bmr$aggregate(msr("classif.bbrier"))[, c(4, 7)]
```

```
learner_id classif.bbrier
1: classif.xgboost 0.2302815
2: classif.featureless 0.3838972
```

## A.7 Solutions to Chapter 9

## A.8 Solutions to Chapter 10

- Prepare a
`mlr3`

regression task for`fifa`

data. Select only variables describing the age and skills of footballers. Train any predictive model for this task, e.g.`regr.ranger`

.

```
library("DALEX")
library("ggplot2")
data("fifa", package = "DALEX")
old_theme = set_theme_dalex("ema")
library("mlr3")
library("mlr3learners")
set.seed(1)
fifa20 <- fifa[,5:42]
task_fifa = as_task_regr(fifa20, target = "value_eur", id = "fifa20")
learner = lrn("regr.ranger")
learner$train(task_fifa)
learner$model
```

```
Ranger result
Call:
ranger::ranger(dependent.variable.name = task$target_names, data = task$data(), case.weights = task$weights$weight, num.threads = 1L)
Type: Regression
Number of trees: 500
Sample size: 5000
Number of independent variables: 37
Mtry: 6
Target node size: 5
Variable importance mode: none
Splitrule: variance
OOB prediction error (MSE): 1.022805e+13
R squared (OOB): 0.869943
```

- Use the permutation importance method to calculate variable importance ranking. Which variable is the most important? Is it surprising?

**With iml**

```
library(iml)
model = Predictor$new(learner,
data = fifa20,
y = fifa$value_eur)
effect = FeatureImp$new(model,
loss = "rmse")
effect$plot()
```

**With DALEX**

```
library("DALEX")
ranger_exp = DALEX::explain(learner,
data = fifa20,
y = fifa$value_eur,
label = "Fifa 2020",
verbose = FALSE)
ranger_effect = model_parts(ranger_exp, B = 5)
head(ranger_effect)
```

```
variable mean_dropout_loss label
1 _full_model_ 1402526 Fifa 2020
2 value_eur 1402526 Fifa 2020
3 weight_kg 1471865 Fifa 2020
4 goalkeeping_kicking 1472795 Fifa 2020
5 height_cm 1474859 Fifa 2020
6 movement_balance 1475618 Fifa 2020
```

- Use the Partial Dependence profile to draw the global behavior of the model for this variable. Is it aligned with your expectations?

**With iml**

```
num_features = c("movement_reactions", "skill_ball_control", "age")
effect = FeatureEffects$new(model)
plot(effect, features = num_features)
```

**With DALEX**

```
num_features = c("movement_reactions", "skill_ball_control", "age")
ranger_profiles = model_profile(ranger_exp, variables = num_features)
plot(ranger_profiles)
```

4 Choose one of the football players. You can choose some well-known striker (e.g. Robert Lewandowski) or a well-known goalkeeper (e.g. Manuel Neuer). The following tasks are worth repeating for several different choices.

- For the selected footballer, calculate and plot the Shapley values. Which variable is locally the most important and has the strongest influence on the valuation of the footballer?

**With iml**

**With DALEX**

```
ranger_shap = predict_parts(ranger_exp,
new_observation = player_1,
type = "shap", B = 1)
plot(ranger_shap, show_boxplots = FALSE)
```

- For the selected footballer, calculate the Ceteris Paribus / Individual Conditional Expectation profiles to draw the local behavior of the model for this variable. Is it different from the global behavior?

**With DALEX**