5.3 Manual instantiation

If you want to compare multiple learners, you should evaluate them on the same resampling splits per task to reduce the variance of the performance estimation: differences between the learners then cannot stem from different random partitions of the data. Until now, we have only passed a resampling strategy to resample() without fixing the actual splits into training and test sets. Here, we instantiate the resampling manually:

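resampling = mlr_resamplings$get("cv", param_vals = list(folds = 3L)) # 3-fold cross-validation
resampling$instantiate(task) # fix the assignment of this task's rows to folds
resampling$iters # number of resampling iterations
## [1] 3
resampling$train_set(1) # row ids used for training in iteration 1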
##   [1]   1   5   8  11  17  25  27  30  31  33  34  37
##  [13]  41  42  44  47  54  56  63  74  75  84  86  89
##  [25]  90  91  92  97 101 102 103 108 114 118 121 122
##  [37] 123 128 129 130 131 132 133 136 137 140 142 146
##  [49] 147 150   2   4  14  15  22  29  35  43  45  46
##  [61]  50  51  53  59  61  62  64  66  67  68  69  70
##  [73]  73  76  77  78  79  80  81  83  88  93  95  98
##  [85]  99 100 105 109 110 111 116 119 120 124 126 127
##  [97] 134 141 145 149
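To make concrete what instantiating means, here is a minimal base-R sketch (not mlr3 code) that mimics it for 3-fold cross-validation: the random fold assignment is drawn once and stored, and the training set of iteration i is simply the complement of fold i. The function `instantiate_cv()` and its fields are hypothetical names chosen to echo `$iters`, `$train_set()` and `$test_set()`; mlr3's internal implementation may well differ.

```r
# Hypothetical helper, NOT part of mlr3: draws the fold assignment once
# and returns accessors that always refer to that stored assignment.
instantiate_cv <- function(n, folds = 3L) {
  # Give every row a fold id 1..folds in random order (equal sizes if n %% folds == 0)
  fold_id <- sample(rep(seq_len(folds), length.out = n))
  list(
    iters     = folds,
    test_set  = function(i) which(fold_id == i),  # rows held out in iteration i
    train_set = function(i) which(fold_id != i)   # all remaining rows
  )
}

set.seed(1)
cv <- instantiate_cv(nrow(iris), folds = 3L)
cv$iters                 # 3
length(cv$train_set(1))  # 100 of the 150 iris rows

# Training and test set of an iteration are disjoint and together cover all rows
stopifnot(length(intersect(cv$train_set(1), cv$test_set(1))) == 0L)
stopifnot(setequal(union(cv$train_set(1), cv$test_set(1)), seq_len(nrow(iris))))
```

Storing the assignment (rather than re-sampling inside `train_set()`) is what makes repeated calls return the same indices.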

If we now pass this instantiated object to resample(), the pre-computed training and test splits are used for both learners:

learner1 = mlr_learners$get("classif.rpart") # simple classification tree
learner2 = mlr_learners$get("classif.featureless") # featureless learner, predicts the majority class
rr1 = resample(task, learner1, resampling)
## INFO [mlr3] Running learner 'classif.rpart' on task 'iris' (iteration 1/3)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'iris' (iteration 2/3)'
## INFO [mlr3] Running learner 'classif.rpart' on task 'iris' (iteration 3/3)'
rr2 = resample(task, learner2, resampling)
## INFO [mlr3] Running learner 'classif.featureless' on task 'iris' (iteration 1/3)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'iris' (iteration 2/3)'
## INFO [mlr3] Running learner 'classif.featureless' on task 'iris' (iteration 3/3)'
setequal(rr1$experiment(1)$train_set, rr2$experiment(1)$train_set) # same training rows in iteration 1?
## [1] TRUE
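Why instantiate first? A small base-R sketch illustrates the point, assuming as a simplification that each resample() call with an uninstantiated resampling draws its own random partition: two independent draws almost never produce the same training sets, whereas reusing one stored assignment yields identical splits by construction. `draw_folds()` is a hypothetical helper, not an mlr3 function.

```r
# Hypothetical helper: one random assignment of n rows to `folds` folds
draw_folds <- function(n, folds = 3L) sample(rep(seq_len(folds), length.out = n))

n <- nrow(iris)
set.seed(42)
fixed <- draw_folds(n)                # "instantiated" once

train_learner1 <- which(fixed != 1)   # split used for the first learner
train_learner2 <- which(fixed != 1)   # same stored split reused for the second
setequal(train_learner1, train_learner2)       # TRUE by construction

redrawn <- draw_folds(n)              # an independent, fresh draw
setequal(train_learner1, which(redrawn != 1))  # almost surely FALSE
```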

We can also combine the two ResampleResult objects into a BenchmarkResult (see below for an introduction to simple benchmarking):

bmr = rr1$combine(rr2)
bmr$aggregated(objects = FALSE)
##                hash resampling_id task_id          learner_id classif.ce
## 1: eb4e163b9b1d51cc            cv    iris       classif.rpart       0.06
## 2: e1b3072b9c9d6114            cv    iris classif.featureless       0.76
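The per-iteration view behind such a table can be sketched in base R. Assuming that `classif.ce` is aggregated as the plain mean of the per-iteration classification errors, the code below scores two learners on one shared 3-fold split of iris, stacks their per-iteration results into one table (loosely analogous to `combine()`) and averages per learner. A nearest-centroid classifier stands in for the rpart tree, and `featureless()`, `centroid()` and `resample_ce()` are hypothetical helpers, so the numbers will not match the output above.

```r
set.seed(7)
x <- as.matrix(iris[, 1:4])
y <- iris$Species
fold_id <- sample(rep(1:3, length.out = nrow(iris)))  # one shared split

# Featureless learner: always predict the most frequent training class
featureless <- function(train) {
  tab <- table(y[train])
  majority <- names(tab)[which.max(tab)]
  function(test) factor(rep(majority, length(test)), levels = levels(y))
}

# Nearest-centroid learner (stand-in for rpart): closest class mean wins
centroid <- function(train) {
  # 3 x 4 matrix of class means, one row per species
  mu <- t(sapply(levels(y), function(k) colMeans(x[train[y[train] == k], , drop = FALSE])))
  function(test) {
    # Squared Euclidean distance of every test row to every class mean
    d <- sapply(rownames(mu), function(k) colSums((t(x[test, , drop = FALSE]) - mu[k, ])^2))
    factor(colnames(d)[apply(d, 1, which.min)], levels = levels(y))
  }
}

# Evaluate one learner on every iteration of the shared split
resample_ce <- function(learner_id, learner) {
  ce <- sapply(1:3, function(i) {
    test <- which(fold_id == i)
    predict_fn <- learner(which(fold_id != i))  # train on the other folds
    mean(predict_fn(test) != y[test])           # error of iteration i
  })
  data.frame(learner_id = learner_id, iteration = 1:3, classif.ce = ce)
}

# Stack both per-iteration tables, then average the error per learner
combined <- rbind(resample_ce("featureless", featureless),
                  resample_ce("centroid", centroid))
aggregated <- aggregate(classif.ce ~ learner_id, data = combined, FUN = mean)
aggregated
```

Because both learners see identical folds, the rows of `combined` can also be compared iteration by iteration, which is exactly the paired comparison that a shared instantiation makes possible.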