6.1 Benchmarking Exhaustive Designs

The benchmark() function accepts a design of tasks, learners, and resampling strategies as a data frame (or data.table), with one task/learner/resampling combination per row.

Here, we call benchmark() to evaluate two learners on a single task with a single holdout split:

library(data.table)
design = data.table(
  task = mlr_tasks$mget("iris"),
  learner = mlr_learners$mget(c("classif.rpart", "classif.featureless")),
  resampling = mlr_resamplings$mget("holdout")
)
print(design)
##             task                     learner
## 1: <TaskClassif>       <LearnerClassifRpart>
## 2: <TaskClassif> <LearnerClassifFeatureless>
##             resampling
## 1: <ResamplingHoldout>
## 2: <ResamplingHoldout>
bmr = benchmark(design)
## INFO [mlr3] Benchmarking 2 experiments
## INFO [mlr3] Running learner 'classif.rpart' on task 'iris' (iteration 1/1)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'iris' (iteration 1/1)'
## INFO [mlr3] Finished benchmark

Note that the holdout splits have been instantiated automatically for each row of the design. As a result, the rpart learner was trained on a different training set than the featureless learner. However, when comparing learners you usually want them to see exactly the same splits into training and test sets. To achieve this, instantiate the resampling strategy manually before creating the design, as sketched below.
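The following is a minimal sketch of this pattern, assuming the dictionary $get() accessor and the $instantiate() method of Resampling objects. Both rows of the design reference the same instantiated resampling, so both learners are evaluated on identical train/test splits:

task = mlr_tasks$get("iris")
learners = mlr_learners$mget(c("classif.rpart", "classif.featureless"))

# instantiate the holdout split once on the task ...
resampling = mlr_resamplings$get("holdout")
resampling$instantiate(task)

# ... and reuse the same instantiated object in every row of the design
design = data.table(
  task = list(task, task),
  learner = learners,
  resampling = list(resampling, resampling)
)
bmr = benchmark(design)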

While the interface of benchmark() allows full flexibility, the creation of such design tables can be tedious. Therefore, mlr3 provides a helper function to quickly generate design tables and instantiate resampling strategies in an exhaustive grid fashion: mlr3::expand_grid().

# get some example tasks
tasks = mlr_tasks$mget(c("pima", "sonar", "spam"))

# set measures for all tasks: accuracy (acc) and area under the curve (auc)
measures = mlr_measures$mget(c("classif.acc", "classif.auc"))
tasks = lapply(tasks, function(task) { task$measures = measures; task })

# get a featureless learner and a classification tree
learners = mlr_learners$mget(c("classif.featureless", "classif.rpart"))

# let the learners predict probabilities instead of class labels (required for AUC measure)
learners$classif.featureless$predict_type = "prob"
learners$classif.rpart$predict_type = "prob"

# compare via 10-fold cross-validation ("cv" uses 10 folds by default)
resamplings = mlr_resamplings$mget("cv")

# generate the exhaustive design and instantiate the resamplings
design = expand_grid(tasks, learners, resamplings)
print(design)
##             task                     learner
## 1: <TaskClassif> <LearnerClassifFeatureless>
## 2: <TaskClassif>       <LearnerClassifRpart>
## 3: <TaskClassif> <LearnerClassifFeatureless>
## 4: <TaskClassif>       <LearnerClassifRpart>
## 5: <TaskClassif> <LearnerClassifFeatureless>
## 6: <TaskClassif>       <LearnerClassifRpart>
##        resampling
## 1: <ResamplingCV>
## 2: <ResamplingCV>
## 3: <ResamplingCV>
## 4: <ResamplingCV>
## 5: <ResamplingCV>
## 6: <ResamplingCV>
bmr = benchmark(design)
## INFO [mlr3] Benchmarking 60 experiments
## INFO [mlr3] Running learner 'classif.featureless' on task 'pima' (iteration 1/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'pima' (iteration 2/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'pima' (iteration 3/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'pima' (iteration 4/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'pima' (iteration 5/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'pima' (iteration 6/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'pima' (iteration 7/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'pima' (iteration 8/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'pima' (iteration 9/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'pima' (iteration 10/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'pima' (iteration 1/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'pima' (iteration 2/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'pima' (iteration 3/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'pima' (iteration 4/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'pima' (iteration 5/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'pima' (iteration 6/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'pima' (iteration 7/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'pima' (iteration 8/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'pima' (iteration 9/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'pima' (iteration 10/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'sonar' (iteration 1/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'sonar' (iteration 2/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'sonar' (iteration 3/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'sonar' (iteration 4/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'sonar' (iteration 5/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'sonar' (iteration 6/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'sonar' (iteration 7/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'sonar' (iteration 8/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'sonar' (iteration 9/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'sonar' (iteration 10/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'sonar' (iteration 1/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'sonar' (iteration 2/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'sonar' (iteration 3/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'sonar' (iteration 4/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'sonar' (iteration 5/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'sonar' (iteration 6/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'sonar' (iteration 7/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'sonar' (iteration 8/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'sonar' (iteration 9/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'sonar' (iteration 10/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'spam' (iteration 1/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'spam' (iteration 2/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'spam' (iteration 3/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'spam' (iteration 4/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'spam' (iteration 5/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'spam' (iteration 6/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'spam' (iteration 7/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'spam' (iteration 8/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'spam' (iteration 9/10)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'spam' (iteration 10/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'spam' (iteration 1/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'spam' (iteration 2/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'spam' (iteration 3/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'spam' (iteration 4/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'spam' (iteration 5/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'spam' (iteration 6/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'spam' (iteration 7/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'spam' (iteration 8/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'spam' (iteration 9/10)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'spam' (iteration 10/10)'
## INFO [mlr3] Finished benchmark

The aggregated resampling results can be accessed with:

bmr$aggregated(objects = FALSE)
##                hash resampling_id task_id
## 1: 556e9c6c2cc60ff8            cv    pima
## 2: e5152a2b19c540cc            cv    pima
## 3: e94a5e8f7c2d27ee            cv   sonar
## 4: 7d6cf0500aba346b            cv   sonar
## 5: 76c3d2322d6d13aa            cv    spam
## 6: 7f572971e3439abb            cv    spam
##             learner_id classif.acc classif.auc
## 1: classif.featureless      0.6511      0.5000
## 2:       classif.rpart      0.7578      0.8037
## 3: classif.featureless      0.5343      0.5000
## 4:       classif.rpart      0.6731      0.7403
## 5: classif.featureless      0.6059      0.5000
## 6:       classif.rpart      0.8946      0.8971

We can aggregate the results further, e.g. if we are interested in which learner performed best across all tasks:

bmr$aggregated(objects = FALSE)[,
  list(acc = mean(classif.acc), auc = mean(classif.auc)), by = "learner_id"]
##             learner_id    acc    auc
## 1: classif.featureless 0.5971 0.5000
## 2:       classif.rpart 0.7751 0.8137

Unsurprisingly, the classification tree clearly outperformed the featureless baseline on both measures.
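To compare the learners per task instead of averaged over all tasks, we can reshape the aggregated table. A small data.table sketch, assuming the aggregated result is a regular data.table with the columns shown above:

aggr = bmr$aggregated(objects = FALSE)

# one row per task, one accuracy column per learner
dcast(aggr, task_id ~ learner_id, value.var = "classif.acc")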