Appendix A — Solutions to exercises

A.1 Solutions to Chapter 2

  1. Use the built-in sonar task and the classif.rpart learner along with the partition function to train a model.
set.seed(124)
task = tsk("sonar")
learner = lrn("classif.rpart", predict_type = "prob")
splits = partition(task, ratio = 0.8)

learner$train(task, splits$train)

Once the model is trained, generate the predictions on the test set, define the performance measure (classif.ce), and score the predictions.

preds = learner$predict(task, splits$test)

measure = msr("classif.ce")
preds$score(measure)
classif.ce 
 0.2195122 
  2. Generate a confusion matrix using the built-in function.
preds$confusion
        truth
response  M  R
       M 20  7
       R  2 12

Since the rows represent predictions (response) and the columns represent the ground-truth values (with M as the positive class), the TP, FP, TN, and FN counts are as follows:

  • True Positive (TP) = 20

  • False Positive (FP) = 7

  • True Negative (TN) = 12

  • False Negative (FN) = 2

  3. Since in this case we want the model to predict the negative class more often, we will raise the threshold (note the predict_type for the learner must be prob for this to work).
# raise threshold from 0.5 default to 0.6
preds$set_threshold(0.6)

preds$confusion
        truth
response  M  R
       M 14  4
       R  8 15

One reason we might want the false positive rate to be lower than the false negative rate is that we consider an incorrect positive prediction (where the true label is negative) to be more costly than an incorrect negative prediction (where the true label is positive).
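To quantify this trade-off (assuming the preds object from above, with the raised threshold), the prediction can be scored with the false positive and false negative rate measures:

# FPR and FNR at the 0.6 threshold
preds$score(msrs(c("classif.fpr", "classif.fnr")))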

A.2 Solutions to Chapter 3

  1. Use the spam task and 5-fold cross-validation to benchmark Random Forest (classif.ranger), Logistic Regression (classif.log_reg), and XGBoost (classif.xgboost) with regard to AUC. Which learner appears to do best? How confident are you in your conclusion? How would you improve upon this?
grid = benchmark_grid(
  tasks = tsk("spam"),
  learners = lrns(c("classif.ranger", "classif.log_reg", "classif.xgboost"), predict_type = "prob"),
  resamplings = rsmp("cv", folds = 5)
)

bmr = benchmark(grid)
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred (repeated once per fold)
mlr3viz::autoplot(bmr, measure = msr("classif.auc"))

This is only a small example of a benchmark workflow, but without tuning (see Chapter 4), the results are not suitable for making any broader statements about the superiority of any of these learners for this task.
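To complement the plot with numbers, the AUC can also be aggregated per learner (using the bmr object from above). A more reliable comparison would tune the learners and could use repeated cross-validation, e.g. rsmp("repeated_cv", folds = 5, repeats = 10).

# mean AUC across the five folds, per learner
bmr$aggregate(msr("classif.auc"))[, .(learner_id, classif.auc)]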

  2. A colleague claims to have achieved a 93.1% classification accuracy using the classif.rpart learner on the penguins_simple task. You want to reproduce their results and ask them about their resampling strategy. They said they used 3-fold cross-validation, and they assigned rows using the task’s row_id modulo 3 to generate three evenly sized folds. Reproduce their results using the custom CV strategy.
task = tsk("penguins_simple")

resampling_customcv = rsmp("custom_cv")

resampling_customcv$instantiate(task = task, f = factor(task$row_ids %% 3))

rr = resample(
  task = task,
  learner = lrn("classif.rpart"),
  resampling = resampling_customcv
)

rr$aggregate(msr("classif.acc"))
classif.acc 
  0.9309309 
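As a quick sanity check (assuming the task object from above), the modulo assignment indeed produces three folds of equal size:

# fold sizes implied by row_id %% 3
table(task$row_ids %% 3)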

A.3 Solutions to Chapter 4

  1. Tune the mtry, sample.fraction, num.trees hyperparameters of a random forest model (regr.ranger) on the Motor Trend data set (mtcars). Use a simple random search with 50 evaluations and select a suitable batch size. Evaluate with a 3-fold cross-validation and the root mean squared error.
set.seed(4)
learner = lrn("regr.ranger",
  mtry.ratio      = to_tune(0, 1),
  sample.fraction = to_tune(1e-1, 1),
  num.trees       = to_tune(1, 2000)
)

instance = ti(
  task = tsk("mtcars"),
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measures = msr("regr.rmse"),
  terminator = trm("evals", n_evals = 50)
)

tuner = tnr("random_search", batch_size = 10)

tuner$optimize(instance)
   mtry.ratio sample.fraction num.trees learner_param_vals  x_domain regr.rmse
1:  0.3558625       0.9108463       192          <list[4]> <list[3]>  2.757883
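If the tuned configuration should be used for a final model, the selected hyperparameters can be set on a fresh learner and trained on the full task (a sketch using the instance from above):

learner_final = lrn("regr.ranger")
learner_final$param_set$values = instance$result_learner_param_vals
learner_final$train(tsk("mtcars"))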
  2. Evaluate the performance of the model created in Question 1 with nested resampling. Use a holdout validation for the inner resampling and a 3-fold cross-validation for the outer resampling. Print the unbiased performance estimate of the model.
set.seed(4)
learner = lrn("regr.ranger",
  mtry.ratio      = to_tune(0, 1),
  sample.fraction = to_tune(1e-1, 1),
  num.trees       = to_tune(1, 2000)
)

at = auto_tuner(
  tuner = tnr("random_search", batch_size = 10),
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("regr.rmse"),
  terminator = trm("evals", n_evals = 50)
)

task = tsk("mtcars")
outer_resampling = rsmp("cv", folds = 3)
rr = resample(task, at, outer_resampling, store_models = TRUE)

rr$aggregate()
regr.mse 
12.16805 
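Note that rr$aggregate() without an argument reports the default regression measure (regr.mse); the RMSE asked for in the exercise can be requested explicitly. Since store_models = TRUE was set, the configurations found in the inner tuning runs can also be inspected (a sketch using the objects from above):

# unbiased RMSE estimate over the outer folds
rr$aggregate(msr("regr.rmse"))

# hyperparameters selected in each of the three inner tuning runs
extract_inner_tuning_results(rr)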
  3. Tune and benchmark an XGBoost model against a logistic regression on the spam data set and determine which has the best Brier score. Use mlr3tuningspaces and nested resampling.
library(mlr3tuningspaces)
Loading required package: mlr3tuning
Loading required package: paradox
learner_xgboost = lts(lrn("classif.xgboost", predict_type = "prob"))

at_xgboost = auto_tuner(
  tuner = tnr("random_search", batch_size = 1),
  learner = learner_xgboost,
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.bbrier"),
  term_evals = 2
)

learner_logreg = lrn("classif.log_reg", predict_type = "prob")

at_logreg = auto_tuner(
  tuner = tnr("random_search", batch_size = 1),
  learner = learner_logreg,
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.bbrier"),
  term_evals = 2
)

task = tsk("spam")
outer_resampling = rsmp("cv", folds = 3)

design = benchmark_grid(
  tasks = task,
  learners = list(at_xgboost, at_logreg),
  resamplings = outer_resampling
)

bmr = benchmark(design, store_models = TRUE)
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred (repeated across folds)
bmr
<BenchmarkResult> of 6 rows with 2 resampling runs
 nr task_id            learner_id resampling_id iters warnings errors
  1    spam classif.xgboost.tuned            cv     3        0      0
  2    spam classif.log_reg.tuned            cv     3        0      0
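The exercise asks which learner achieves the better Brier score, so (using the bmr object from above) the outer-loop results should also be aggregated with the Brier measure:

bmr$aggregate(msr("classif.bbrier"))[, .(learner_id, classif.bbrier)]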

A.4 Solutions to Chapter 5

  1. Calculate a correlation filter on the Motor Trend data set (mtcars).
library("mlr3verse")
filter = flt("correlation")

task = tsk("mtcars")
filter$calculate(task)

as.data.table(filter)
    feature     score
 1:      wt 0.8676594
 2:     cyl 0.8521620
 3:    disp 0.8475514
 4:      hp 0.7761684
 5:    drat 0.6811719
 6:      vs 0.6640389
 7:      am 0.5998324
 8:    carb 0.5509251
 9:    gear 0.4802848
10:    qsec 0.4186840
  2. Use the filter from the first exercise to select the five best features in the mtcars data set.
keep = names(head(filter$scores, 5))
task$select(keep)
task$feature_names
[1] "cyl"  "disp" "drat" "hp"   "wt"  
  3. Apply backward selection to the penguins data set with a classification tree learner ("classif.rpart") and holdout resampling, evaluated by classification accuracy. Compare the results with those in Section 5.2.1.
library("mlr3fselect")

Attaching package: 'mlr3fselect'
The following object is masked from 'package:mlr3tuning':

    ContextEval
instance = fselect(
  fselector = fs("sequential", strategy = "sbs"),
  task =  tsk("penguins"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measure = msr("classif.acc")
)
as.data.table(instance$result)[, .(bill_depth, bill_length, body_mass, classif.acc)]
   bill_depth bill_length body_mass classif.acc
1:      FALSE        TRUE      TRUE   0.9826087
instance$result_feature_set
[1] "bill_length" "body_mass"   "island"      "sex"         "year"       

Answer the following questions:

  a) Do the selected features differ?

Yes, the backward selection selects more features.

  b) Which feature selection method achieves a higher classification accuracy?

In this example, the backward selection performs slightly better, but this depends heavily on the random seed and could look different in another run.

  c) Are the accuracy values in b) directly comparable? If not, what has to be changed to make them comparable?

No, they are not comparable because the holdout resampling created by rsmp("holdout") generates a different holdout set for each of the two runs. A fair comparison would create a single resampling instance and use it for both feature selections (see Chapter 3 for details):

resampling = rsmp("holdout")
resampling$instantiate(tsk("penguins"))

sfs = fselect(
  fselector = fs("sequential", strategy = "sfs"),
  task =  tsk("penguins"),
  learner = lrn("classif.rpart"),
  resampling = resampling,
  measure = msr("classif.acc")
)
sbs = fselect(
  fselector = fs("sequential", strategy = "sbs"),
  task =  tsk("penguins"),
  learner = lrn("classif.rpart"),
  resampling = resampling,
  measure = msr("classif.acc")
)
as.data.table(sfs$result)[, .(bill_depth, bill_length, body_mass, classif.acc)]
   bill_depth bill_length body_mass classif.acc
1:       TRUE        TRUE     FALSE    0.973913
as.data.table(sbs$result)[, .(bill_depth, bill_length, body_mass, classif.acc)]
   bill_depth bill_length body_mass classif.acc
1:       TRUE        TRUE      TRUE    0.973913

Alternatively, one could automate the feature selection and perform a benchmark between the two wrapped learners.
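A sketch of that alternative (assuming library(mlr3fselect) is attached as above): wrap each strategy in auto_fselector and benchmark the two wrapped learners on a common outer resampling.

afs_sfs = auto_fselector(
  fselector = fs("sequential", strategy = "sfs"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measure = msr("classif.acc"),
  terminator = trm("none")
)
afs_sbs = auto_fselector(
  fselector = fs("sequential", strategy = "sbs"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measure = msr("classif.acc"),
  terminator = trm("none")
)

design = benchmark_grid(tsk("penguins"), list(afs_sfs, afs_sbs), rsmp("cv", folds = 3))
bmr = benchmark(design)
bmr$aggregate(msr("classif.acc"))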

  4. Automate the feature selection as in Section 5.2.6 with the spam data set and a logistic regression learner ("classif.log_reg"). Hint: Remember to call library("mlr3learners") for the logistic regression learner.
library("mlr3fselect")
library("mlr3learners")

at = auto_fselector(
  fselector = fs("random_search"),
  learner = lrn("classif.log_reg"),
  resampling = rsmp("holdout"),
  measure = msr("classif.acc"),
  terminator = trm("evals", n_evals = 50)
)

grid = benchmark_grid(
  task = tsk("spam"),
  learner = list(at, lrn("classif.log_reg")),
  resampling = rsmp("cv", folds = 3)
)

bmr = benchmark(grid)

aggr = bmr$aggregate(msrs(c("classif.acc", "time_train")))
as.data.table(aggr)[, .(learner_id, classif.acc, time_train)]
                  learner_id classif.acc time_train
1: classif.log_reg.fselector   0.8778337   5.802333
2:           classif.log_reg   0.9256695   0.246000

A.5 Solutions to Chapter 6

A.6 Solutions to Chapter 8

  1. Run a benchmark experiment on the german_credit task with algorithms: featureless, log_reg, ranger. Tune the featureless model using tunethreshold and learner_cv. Use 2-fold CV and evaluate with msr("classif.costs", costs = costs) where you should make the parameter costs so that the cost of a true positive is -10, the cost of a true negative is -1, the cost of a false positive is 2, and the cost of a false negative is 3. Use set.seed(11) to make sure you get the same results as us. Are your results surprising?
library(mlr3verse)
set.seed(11)

costs = matrix(c(-10, 3, 2, -1), nrow = 2, dimnames =
  list("Predicted Credit" = c("good", "bad"),
    Truth = c("good", "bad")))
cost_measure = msr("classif.costs", costs = costs)

gr = po("learner_cv", lrn("classif.featureless", predict_type = "prob")) %>>%
  po("tunethreshold", measure = cost_measure)
task = tsk("german_credit")
learners = list(as_learner(gr), lrn("classif.log_reg"), lrn("classif.ranger"))
bmr = benchmark(benchmark_grid(task, learners, rsmp("cv", folds = 2)))
bmr$aggregate(cost_measure)[, c(4, 7)]
                          learner_id classif.costs
1: classif.featureless.tunethreshold        -6.400
2:                   classif.log_reg        -5.420
3:                    classif.ranger        -5.923
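The result that the threshold-tuned featureless learner achieves the lowest cost is less surprising than it first appears: true positives are strongly rewarded (-10), and "good" is the majority class, so predicting "good" for every observation already yields an average cost of 0.7 * (-10) + 0.3 * 2 = -6.4, which matches the benchmark result above. A quick check of the class balance (assuming the task from above):

# class balance of german_credit: "good" is the majority class (700 vs. 300)
table(tsk("german_credit")$truth())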
  2. Use the probregr pipeline to create a probabilistic regression model using: xgboost for the response prediction, a featureless learner for the se prediction, and assuming a Cauchy distribution. Train and predict on task = tsk("mtcars"); split = partition(task). Evaluate your model with the logloss measure.
library(mlr3verse)
library(mlr3proba)
set.seed(11)

l = as_learner(ppl("probregr",
  learner = lrn("regr.xgboost"),
  learner_se = lrn("regr.featureless"),
  dist = "Cauchy")
)

task = tsk("mtcars")
split = partition(task)
meas = msr("regr.logloss")
l$train(task, split$train)$predict(task, split$test)$score(meas)
regr.logloss 
    4.846335 
  3. Train and predict a survival forest using rfsrc (from mlr3extralearners). Run this experiment using task = tsk("rats"); split = partition(task). Evaluate your model with the RCLL measure.
library(mlr3verse)
library(mlr3proba)
library(mlr3extralearners)
set.seed(11)

task = tsk("rats")
split = partition(task)

lrn("surv.rfsrc")$
  train(task, split$train)$
  predict(task, split$test)$
  score(msr("surv.rcll"))
surv.rcll 
 4.030926 
  4. Estimate the density of the tsk("precip") data using logspline (from mlr3extralearners). Run this experiment using task = tsk("precip"); split = partition(task). Evaluate your model with the logloss measure.
library(mlr3verse)
library(mlr3proba)
library(mlr3extralearners)
set.seed(11)

task = tsk("precip")
split = partition(task)

lrn("dens.logspline")$
  train(task, split$train)$
  predict(task, split$test)$
  score(msr("dens.logloss"))
dens.logloss 
    3.979233 
  5. Run a benchmark clustering experiment on the wine dataset without a label column. Compare the performance of the k-means learner with k equal to 2, 3 and 4 using the silhouette measure. Use the insample resampling technique. What value of k would you choose based on the silhouette scores?
library(mlr3)
library(mlr3cluster)
set.seed(11)
learners = list(
  lrn("clust.kmeans", centers = 2L, id = "k-means, k=2"),
  lrn("clust.kmeans", centers = 3L, id = "k-means, k=3"),
  lrn("clust.kmeans", centers = 4L, id = "k-means, k=4")
)

task = as_task_clust(tsk("wine")$data()[, -1])
measure = msr("clust.silhouette")
bmr = benchmark(benchmark_grid(task, learners, rsmp("insample")))
bmr$aggregate(measure)[, c(4, 7)]
     learner_id clust.silhouette
1: k-means, k=2        0.6568537
2: k-means, k=3        0.5711382
3: k-means, k=4        0.5605941

Based on the silhouette score, we can choose k = 2.

  6. Run a (spatially) unbiased classification benchmark experiment on the ecuador task with a featureless learner and xgboost, evaluate with the binary Brier score.

You can use any resampling method from mlr3spatiotempcv; in this solution we use 4-fold spatial environmental blocking.

library(mlr3verse)
library(mlr3spatial)
library(mlr3spatiotempcv)

Attaching package: 'mlr3spatiotempcv'
The following objects are masked from 'package:mlr3spatial':

    as_task_classif_st, as_task_classif_st.data.frame,
    as_task_classif_st.DataBackend, as_task_classif_st.sf,
    as_task_classif_st.TaskClassifST, as_task_regr_st,
    as_task_regr_st.data.frame, as_task_regr_st.DataBackend,
    as_task_regr_st.sf, as_task_regr_st.TaskClassifST,
    as_task_regr_st.TaskRegrST, TaskClassifST, TaskRegrST
set.seed(11)
learners = lrns(paste0("classif.", c("xgboost", "featureless")),
  predict_type = "prob")
rsmp_sp = rsmp("spcv_env", folds = 4)
design = benchmark_grid(tsk("ecuador"), learners, rsmp_sp)
bmr = benchmark(design)
bmr$aggregate(msr("classif.bbrier"))[, c(4, 7)]
            learner_id classif.bbrier
1:     classif.xgboost      0.2302815
2: classif.featureless      0.3838972
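As a point of comparison (a sketch, reusing the learners from above), one could add a non-spatial 4-fold CV to the experiment; for spatially autocorrelated data such as ecuador, the random CV estimate is typically optimistically biased relative to the spatial one:

design_nsp = benchmark_grid(tsk("ecuador"), learners, rsmp("cv", folds = 4))
bmr_nsp = benchmark(design_nsp)
bmr_nsp$aggregate(msr("classif.bbrier"))[, c(4, 7)]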

A.7 Solutions to Chapter 9

A.8 Solutions to Chapter 10

  1. Prepare an mlr3 regression task for the fifa data. Select only variables describing the age and skills of footballers. Train any predictive model for this task, e.g. regr.ranger.
library("DALEX")
library("ggplot2")
data("fifa", package = "DALEX")
old_theme = set_theme_dalex("ema")

library("mlr3")
library("mlr3learners")
set.seed(1)

fifa20 <- fifa[,5:42]
task_fifa = as_task_regr(fifa20, target = "value_eur", id = "fifa20")

learner = lrn("regr.ranger")
learner$train(task_fifa)
learner$model
Ranger result

Call:
 ranger::ranger(dependent.variable.name = task$target_names, data = task$data(),      case.weights = task$weights$weight, num.threads = 1L) 

Type:                             Regression 
Number of trees:                  500 
Sample size:                      5000 
Number of independent variables:  37 
Mtry:                             6 
Target node size:                 5 
Variable importance mode:         none 
Splitrule:                        variance 
OOB prediction error (MSE):       1.022805e+13 
R squared (OOB):                  0.869943 
  2. Use the permutation importance method to calculate variable importance ranking. Which variable is the most important? Is it surprising?

With iml

library(iml)
model = Predictor$new(learner,
                data = fifa20,
                y = fifa$value_eur)

effect = FeatureImp$new(model,
                loss = "rmse")
effect$plot()

With DALEX

library("DALEX")
ranger_exp = DALEX::explain(learner,
  data = fifa20,
  y = fifa$value_eur,
  label = "Fifa 2020",
  verbose = FALSE)

ranger_effect = model_parts(ranger_exp, B = 5)
head(ranger_effect)
             variable mean_dropout_loss     label
1        _full_model_           1402526 Fifa 2020
2           value_eur           1402526 Fifa 2020
3           weight_kg           1471865 Fifa 2020
4 goalkeeping_kicking           1472795 Fifa 2020
5           height_cm           1474859 Fifa 2020
6    movement_balance           1475618 Fifa 2020
plot(ranger_effect)

  3. Use the Partial Dependence profile to draw the global behavior of the model for this variable. Is it aligned with your expectations?

With iml

num_features = c("movement_reactions", "skill_ball_control", "age")

effect = FeatureEffects$new(model)
plot(effect, features = num_features)

With DALEX

num_features = c("movement_reactions", "skill_ball_control", "age")

ranger_profiles = model_profile(ranger_exp, variables = num_features)
plot(ranger_profiles)

  4. Choose one of the football players. You can choose some well-known striker (e.g. Robert Lewandowski) or a well-known goalkeeper (e.g. Manuel Neuer). The following tasks are worth repeating for several different choices.

player_1 <- fifa["R. Lewandowski", 5:42]
  5. For the selected footballer, calculate and plot the Shapley values. Which variable is locally the most important and has the strongest influence on the valuation of the footballer?

With iml

shapley = Shapley$new(model, x.interest = player_1)
plot(shapley)

With DALEX

ranger_shap = predict_parts(ranger_exp,
             new_observation = player_1,
             type = "shap", B = 1)
plot(ranger_shap, show_boxplots = FALSE)

  6. For the selected footballer, calculate the Ceteris Paribus / Individual Conditional Expectation profiles to draw the local behavior of the model for this variable. Is it different from the global behavior?

With DALEX

num_features = c("movement_reactions", "skill_ball_control", "age")

ranger_ceteris = predict_profile(ranger_exp, player_1)
plot(ranger_ceteris, variables = num_features) +
  ggtitle("Ceteris paribus for R. Lewandowski", " ")

A.9 Solutions to Chapter 11