Appendix A — Solutions to exercises
A.1 Solutions to Chapter 2
- Use the built-in sonar task and the classif.rpart learner along with the partition() function to train a model. Once the model is trained, generate predictions on the test set, define the performance measure (classif.ce), and score the predictions.
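A minimal sketch of one possible solution; the seed and the split ratio are assumptions, so your score may differ from the one shown below:
set.seed(124)  # assumed seed, any value works
task = tsk("sonar")
learner = lrn("classif.rpart", predict_type = "prob")
# partition() creates train/test row ids; the 0.8 ratio is assumed
splits = partition(task, ratio = 0.8)
learner$train(task, splits$train)
prediction = learner$predict(task, splits$test)
# define the measure and score the prediction
measure = msr("classif.ce")
prediction$score(measure)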
classif.ce
0.2195122
- Generate a confusion matrix using the built-in function.
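With the prediction object from the previous exercise, a one-line sketch:
prediction$confusion  # rows: response, columns: truth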
Since the rows represent predictions (response) and the columns represent the ground truth values, the TP, FP, TN, and FN counts are as follows:
True Positive (TP) = 20
False Positive (FP) = 2
True Negative (TN) = 12
False Negative (FN) = 7
- Since in this case we want the model to predict the negative class more often, we will raise the threshold (note that the predict_type of the learner must be prob for this to work).
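A sketch of the thresholding step; the concrete threshold value of 0.6 is an assumption, and other values will change the matrix below:
# predicting the positive class M now requires a probability above 0.6
prediction$set_threshold(0.6)
prediction$confusion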
        truth
response  M  R
       M 14  4
       R  8 15
One reason we might want the false positive rate to be lower than the false negative rate is if we felt it was worse for a positive prediction to be incorrect (meaning the true label was the negative label) than it was for a negative prediction to be incorrect (meaning the true label was the positive label).
A.2 Solutions to Chapter 3
- Use the spam task and 5-fold cross-validation to benchmark a random forest (classif.ranger), logistic regression (classif.log_reg), and XGBoost (classif.xgboost) with regard to AUC. Which learner appears to do best? How confident are you in your conclusion? How would you improve upon this?
grid = benchmark_grid(
  tasks = tsk("spam"),
  learners = lrns(c("classif.ranger", "classif.log_reg", "classif.xgboost"),
    predict_type = "prob"),
  resamplings = rsmp("cv", folds = 5)
)
bmr = benchmark(grid)
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
(warning repeated for each of the 5 folds)
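To judge which learner appears to do best, aggregate the benchmark result with the AUC measure; a sketch (scores omitted here, as they vary with the random folds):
bmr$aggregate(msr("classif.auc"))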
This is only a small example of a benchmark workflow, but without tuning (see Chapter 4), the results are not suitable for making any broader statements about the superiority of any of these learners for this task.
- A colleague claims to have achieved a 93.1% classification accuracy using the classif.rpart learner on the penguins_simple task. You want to reproduce their results and ask them about their resampling strategy. They said they used 3-fold cross-validation, and they assigned rows using the task's row_id modulo 3 to generate three evenly sized folds. Reproduce their results using the custom CV strategy.
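The solution code is not shown in this extract; a sketch using a custom CV resampling, with the fold assignment following the colleague's modulo rule:
task = tsk("penguins_simple")
rsmp_custom = rsmp("custom_cv")
# a factor of row_id modulo 3 defines three evenly sized folds
rsmp_custom$instantiate(task, f = factor(task$row_ids %% 3))
rr = resample(task, lrn("classif.rpart"), rsmp_custom)
rr$aggregate(msr("classif.acc"))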
A.3 Solutions to Chapter 4
- Tune the mtry, sample.fraction, and num.trees hyperparameters of a random forest model (regr.ranger) on the Motor Trend data set (mtcars). Use a simple random search with 50 evaluations and select a suitable batch size. Evaluate with a 3-fold cross-validation and the root mean squared error.
set.seed(4)
# mtry is tuned on the ratio scale via the mtry.ratio hyperparameter
learner = lrn("regr.ranger",
  mtry.ratio = to_tune(0, 1),
  sample.fraction = to_tune(1e-1, 1),
  num.trees = to_tune(1, 2000)
)
instance = ti(
  task = tsk("mtcars"),
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measures = msr("regr.rmse"),
  terminator = trm("evals", n_evals = 50)
)
tuner = tnr("random_search", batch_size = 10)
tuner$optimize(instance)
mtry.ratio sample.fraction num.trees learner_param_vals x_domain regr.rmse
1: 0.3558625 0.9108463 192 <list[4]> <list[3]> 2.757883
- Evaluate the performance of the model created in Question 1 with nested resampling. Use a holdout validation for the inner resampling and a 3-fold cross-validation for the outer resampling. Print the unbiased performance estimate of the model.
set.seed(4)
learner = lrn("regr.ranger",
  mtry.ratio = to_tune(0, 1),
  sample.fraction = to_tune(1e-1, 1),
  num.trees = to_tune(1, 2000)
)
at = auto_tuner(
  tuner = tnr("random_search", batch_size = 10),
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("regr.rmse"),
  terminator = trm("evals", n_evals = 50)
)
task = tsk("mtcars")
outer_resampling = rsmp("cv", folds = 3)
rr = resample(task, at, outer_resampling, store_models = TRUE)
rr$aggregate()
regr.mse
12.16805
(Note that rr$aggregate() without an argument reports the default regression measure, regr.mse; pass msr("regr.rmse") to obtain the RMSE itself.)
- Tune and benchmark an XGBoost model against a logistic regression on the spam data set and determine which has the better Brier score. Use mlr3tuningspaces and nested resampling.
library(mlr3tuningspaces)
Loading required package: mlr3tuning
Loading required package: paradox
# lts() attaches the default tuning space from mlr3tuningspaces
learner_xgboost = lts(lrn("classif.xgboost", predict_type = "prob"))
at_xgboost = auto_tuner(
  tuner = tnr("random_search", batch_size = 1),
  learner = learner_xgboost,
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.bbrier"),
  term_evals = 2
)
learner_logreg = lrn("classif.log_reg", predict_type = "prob")
at_logreg = auto_tuner(
  tuner = tnr("random_search", batch_size = 1),
  learner = learner_logreg,
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.bbrier"),
  term_evals = 2
)
task = tsk("spam")
outer_resampling = rsmp("cv", folds = 3)
design = benchmark_grid(
tasks = task,
learners = list(at_xgboost, at_logreg),
resamplings = outer_resampling
)
bmr = benchmark(design, store_models = TRUE)
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
(warning repeated 21 times across the nested resampling iterations)
<BenchmarkResult> of 6 rows with 2 resampling runs
nr task_id learner_id resampling_id iters warnings errors
1 spam classif.xgboost.tuned cv 3 0 0
2 spam classif.log_reg.tuned cv 3 0 0
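Which model has the better Brier score can then be read off by aggregating the benchmark result with the same measure used for tuning; a sketch (output omitted, and the ranking may vary with the seed):
bmr$aggregate(msr("classif.bbrier"))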
A.4 Solutions to Chapter 5
- Calculate a correlation filter on the Motor Trend data set (mtcars).
library("mlr3verse")
filter = flt("correlation")
task = tsk("mtcars")
filter$calculate(task)
as.data.table(filter)
feature score
1: wt 0.8676594
2: cyl 0.8521620
3: disp 0.8475514
4: hp 0.7761684
5: drat 0.6811719
6: vs 0.6640389
7: am 0.5998324
8: carb 0.5509251
9: gear 0.4802848
10: qsec 0.4186840
- Use the filter from the first exercise to select the five best features in the mtcars data set.
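The selection code is not shown here; a sketch reusing the filter and task objects from the first exercise (taking head() of the sorted filter scores is an assumed approach):
# keep the five features with the highest correlation filter scores
keep = names(head(filter$scores, 5))
task$select(keep)
task$feature_names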
[1] "cyl" "disp" "drat" "hp" "wt"
- Apply a backward selection to the penguins data set with a classification tree learner ("classif.rpart") and holdout resampling, using classification accuracy as the measure. Compare the results with those in Section 5.2.1.
library(mlr3fselect)
Attaching package: 'mlr3fselect'
The following object is masked from 'package:mlr3tuning':
    ContextEval
instance = fselect(
  fselector = fs("sequential", strategy = "sbs"),
  task = tsk("penguins"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measure = msr("classif.acc")
)
as.data.table(instance$result)[, .(bill_depth, bill_length, body_mass, classif.acc)]
bill_depth bill_length body_mass classif.acc
1: FALSE TRUE TRUE 0.9826087
[1] "bill_length" "body_mass" "island" "sex" "year"
Answer the following questions:
- Do the selected features differ?
Yes, the backward selection selects more features.
- Which feature selection method achieves a higher classification accuracy?
In this example, the backward selection performs slightly better, but this depends heavily on the random seed and could look different in another run.
- Are the accuracy values in b) directly comparable? If not, what has to be changed to make them comparable?
No, they are not comparable because the holdout resampling invoked with rsmp("holdout") creates a different holdout set for each of the two runs. A fair comparison would create a single resampling instance and use it for both feature selections (see Chapter 3 for details):
resampling = rsmp("holdout")
resampling$instantiate(tsk("penguins"))
sfs = fselect(
fselector = fs("sequential", strategy = "sfs"),
task = tsk("penguins"),
learner = lrn("classif.rpart"),
resampling = resampling,
measure = msr("classif.acc")
)
sbs = fselect(
fselector = fs("sequential", strategy = "sbs"),
task = tsk("penguins"),
learner = lrn("classif.rpart"),
resampling = resampling,
measure = msr("classif.acc")
)
as.data.table(sfs$result)[, .(bill_depth, bill_length, body_mass, classif.acc)]
bill_depth bill_length body_mass classif.acc
1: TRUE TRUE FALSE 0.973913
as.data.table(sbs$result)[, .(bill_depth, bill_length, body_mass, classif.acc)]
bill_depth bill_length body_mass classif.acc
1: TRUE TRUE TRUE 0.973913
Alternatively, one could automate the feature selection and perform a benchmark between the two wrapped learners.
- Automate the feature selection as in Section 5.2.6 with the spam data set and a logistic regression learner ("classif.log_reg"). Hint: Remember to call library("mlr3learners") for the logistic regression learner.
library("mlr3fselect")
library("mlr3learners")
at = auto_fselector(
  fselector = fs("random_search"),
  learner = lrn("classif.log_reg"),
  resampling = rsmp("holdout"),
  measure = msr("classif.acc"),
  terminator = trm("evals", n_evals = 50)
)
grid = benchmark_grid(
  task = tsk("spam"),
  learner = list(at, lrn("classif.log_reg")),
  resampling = rsmp("cv", folds = 3)
)
bmr = benchmark(grid)
aggr = bmr$aggregate(msrs(c("classif.acc", "time_train")))
as.data.table(aggr)[, .(learner_id, classif.acc, time_train)]
learner_id classif.acc time_train
1: classif.log_reg.fselector 0.8778337 5.802333
2: classif.log_reg 0.9256695 0.246000
A.5 Solutions to Chapter 6
A.6 Solutions to Chapter 8
- Run a benchmark experiment on the german_credit task with the algorithms featureless, log_reg, and ranger. Tune the featureless model using tunethreshold and learner_cv. Use 2-fold CV and evaluate with msr("classif.costs", costs = costs), where you should construct the costs matrix so that the cost of a true positive is -10, the cost of a true negative is -1, the cost of a false positive is 2, and the cost of a false negative is 3. Use set.seed(11) to make sure you get the same results as us. Are your results surprising?
library(mlr3verse)
set.seed(11)
# the matrix is filled column-wise: TP = -10, FN = 3, FP = 2, TN = -1
costs = matrix(c(-10, 3, 2, -1), nrow = 2, dimnames = list(
  "Predicted Credit" = c("good", "bad"),
  Truth = c("good", "bad")
))
cost_measure = msr("classif.costs", costs = costs)
gr = po("learner_cv", lrn("classif.featureless", predict_type = "prob")) %>>%
  po("tunethreshold", measure = cost_measure)
task = tsk("german_credit")
learners = list(as_learner(gr), lrn("classif.log_reg"), lrn("classif.ranger"))
bmr = benchmark(benchmark_grid(task, learners, rsmp("cv", folds = 2)))
bmr$aggregate(cost_measure)[, c(4, 7)]
learner_id classif.costs
1: classif.featureless.tunethreshold -6.400
2: classif.log_reg -5.420
3: classif.ranger -5.923
The result may be surprising at first: the threshold-tuned featureless learner achieves the lowest (best) average cost. Since german_credit is 70% "good", always predicting "good" yields an average cost of 0.7 * (-10) + 0.3 * 2 = -6.4 per observation, and threshold tuning lets the featureless learner exploit exactly this.
- Use the probregr pipeline to create a probabilistic regression model using: xgboost for the response prediction, a featureless learner for the se prediction, and assuming a Cauchy distribution. Train and predict on task = tsk("mtcars"); split = partition(task). Evaluate your model with the logloss measure.
library(mlr3verse)
library(mlr3proba)
set.seed(11)
l = as_learner(ppl("probregr",
  learner = lrn("regr.xgboost"),
  learner_se = lrn("regr.featureless"),
  dist = "Cauchy"
))
task = tsk("mtcars")
split = partition(task)
meas = msr("regr.logloss")
l$train(task, split$train)$predict(task, split$test)$score(meas)
regr.logloss
4.846335
- Train and predict a survival forest using rfsrc (from mlr3extralearners). Run this experiment using task = tsk("rats"); split = partition(task). Evaluate your model with the RCLL measure.
library(mlr3verse)
library(mlr3proba)
library(mlr3extralearners)
set.seed(11)
task = tsk("rats")
split = partition(task)
lrn("surv.rfsrc")$
train(task, split$train)$
predict(task, split$test)$
score(msr("surv.rcll"))
surv.rcll
4.030926
- Estimate the density of the tsk("precip") data using logspline (from mlr3extralearners). Run this experiment using task = tsk("precip"); split = partition(task). Evaluate your model with the logloss measure.
library(mlr3verse)
library(mlr3proba)
library(mlr3extralearners)
set.seed(11)
task = tsk("precip")
split = partition(task)
lrn("dens.logspline")$
train(task, split$train)$
predict(task, split$test)$
score(msr("dens.logloss"))
dens.logloss
3.979233
- Run a benchmark clustering experiment on the wine dataset without a label column. Compare the performance of the k-means learner with k equal to 2, 3, and 4 using the silhouette measure. Use the insample resampling technique. What value of k would you choose based on the silhouette scores?
library(mlr3)
library(mlr3cluster)
set.seed(11)
learners = list(
  lrn("clust.kmeans", centers = 2L, id = "k-means, k=2"),
  lrn("clust.kmeans", centers = 3L, id = "k-means, k=3"),
  lrn("clust.kmeans", centers = 4L, id = "k-means, k=4")
)
task = as_task_clust(tsk("wine")$data()[, -1])
measure = msr("clust.silhouette")
bmr = benchmark(benchmark_grid(task, learners, rsmp("insample")))
bmr$aggregate(measure)[, c(4, 7)]
learner_id clust.silhouette
1: k-means, k=2 0.6568537
2: k-means, k=3 0.5711382
3: k-means, k=4 0.5605941
Based on the silhouette scores, we would choose k = 2.
- Run a (spatially) unbiased classification benchmark experiment on the ecuador task with a featureless learner and xgboost, and evaluate with the binary Brier score.
You can use any resampling method from mlr3spatiotempcv; in this solution we use 4-fold spatial environmental blocking.
library(mlr3spatiotempcv)
Attaching package: 'mlr3spatiotempcv'
The following objects are masked from 'package:mlr3spatial':
    as_task_classif_st, as_task_classif_st.data.frame,
    as_task_classif_st.DataBackend, as_task_classif_st.sf,
    as_task_classif_st.TaskClassifST, as_task_regr_st,
    as_task_regr_st.data.frame, as_task_regr_st.DataBackend,
    as_task_regr_st.sf, as_task_regr_st.TaskClassifST,
    as_task_regr_st.TaskRegrST, TaskClassifST, TaskRegrST
set.seed(11)
learners = lrns(paste0("classif.", c("xgboost", "featureless")),
  predict_type = "prob")
rsmp_sp = rsmp("spcv_env", folds = 4)
design = benchmark_grid(tsk("ecuador"), learners, rsmp_sp)
bmr = benchmark(design)
bmr$aggregate(msr("classif.bbrier"))[, c(4, 7)]
learner_id classif.bbrier
1: classif.xgboost 0.2302815
2: classif.featureless 0.3838972
A.7 Solutions to Chapter 9
A.8 Solutions to Chapter 10
- Prepare an mlr3 regression task for the fifa data. Select only variables describing the age and skills of footballers. Train any predictive model for this task, e.g. regr.ranger.
library("DALEX")
library("ggplot2")
data("fifa", package = "DALEX")
old_theme = set_theme_dalex("ema")
library("mlr3")
library("mlr3learners")
set.seed(1)
fifa20 <- fifa[,5:42]
task_fifa = as_task_regr(fifa20, target = "value_eur", id = "fifa20")
learner = lrn("regr.ranger")
learner$train(task_fifa)
learner$model
Ranger result
Call:
ranger::ranger(dependent.variable.name = task$target_names, data = task$data(), case.weights = task$weights$weight, num.threads = 1L)
Type: Regression
Number of trees: 500
Sample size: 5000
Number of independent variables: 37
Mtry: 6
Target node size: 5
Variable importance mode: none
Splitrule: variance
OOB prediction error (MSE): 1.022805e+13
R squared (OOB): 0.869943
- Use the permutation importance method to calculate variable importance ranking. Which variable is the most important? Is it surprising?
With iml
library(iml)
model = Predictor$new(learner, data = fifa20, y = fifa$value_eur)
effect = FeatureImp$new(model, loss = "rmse")
effect$plot()
With DALEX
library("DALEX")
ranger_exp = DALEX::explain(learner,
  data = fifa20,
  y = fifa$value_eur,
  label = "Fifa 2020",
  verbose = FALSE)
ranger_effect = model_parts(ranger_exp, B = 5)
head(ranger_effect)
variable mean_dropout_loss label
1 _full_model_ 1402526 Fifa 2020
2 value_eur 1402526 Fifa 2020
3 weight_kg 1471865 Fifa 2020
4 goalkeeping_kicking 1472795 Fifa 2020
5 height_cm 1474859 Fifa 2020
6 movement_balance 1475618 Fifa 2020
- Use the Partial Dependence profile to draw the global behavior of the model for this variable. Is it aligned with your expectations?
With iml
num_features = c("movement_reactions", "skill_ball_control", "age")
effect = FeatureEffects$new(model)
plot(effect, features = num_features)
With DALEX
num_features = c("movement_reactions", "skill_ball_control", "age")
ranger_profiles = model_profile(ranger_exp, variables = num_features)
plot(ranger_profiles)
- Choose one of the football players. You can choose some well-known striker (e.g. Robert Lewandowski) or a well-known goalkeeper (e.g. Manuel Neuer). The following tasks are worth repeating for several different choices.
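The code selecting the player is not shown in this extract; a sketch assuming Robert Lewandowski is chosen (the fifa data frame uses player names as row names):
player_1 = fifa20["R. Lewandowski", ]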
- For the selected footballer, calculate and plot the Shapley values. Which variable is locally the most important and has the strongest influence on the valuation of the footballer?
With iml
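The iml variant is missing in this extract; a sketch using iml's Shapley class, reusing the Predictor object model from the importance exercise and player_1 from above:
# Shapley values for a single observation
shapley = Shapley$new(model, x.interest = player_1)
plot(shapley)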
With DALEX
ranger_shap = predict_parts(ranger_exp,
  new_observation = player_1,
  type = "shap", B = 1)
plot(ranger_shap, show_boxplots = FALSE)
- For the selected footballer, calculate the Ceteris Paribus / Individual Conditional Expectation profiles to draw the local behavior of the model for this variable. Is it different from the global behavior?
With DALEX
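The extract ends before the code; a sketch with DALEX's predict_profile(), reusing ranger_exp and player_1 from above (the plotted variable is an assumed choice):
# Ceteris Paribus profiles for the selected player
ranger_ceteris = predict_profile(ranger_exp, new_observation = player_1)
plot(ranger_ceteris, variables = "movement_reactions")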