5.2 Instantiation

So far we have only set the stage and selected the resampling strategy. To actually perform the splitting, we need to apply these settings to a dataset. This can be done in two ways:

  1. Manually, by calling the .$instantiate() method of the resampling object with a Task:
resampling = mlr_resamplings$get("cv", param_vals = list(folds = 3L))
resampling$instantiate(task)
resampling$iters
## [1] 3
resampling$train_set(1)
##   [1]   3   5   7   9  10  12  16  23  26  30  35  36  37  38  42  43  49  52
##  [19]  54  55  63  65  66  67  68  70  71  73  74  78  79  80  83  87  90  98
##  [37]  99 103 109 112 114 116 123 124 126 129 130 137 145 149   1   6  11  13
##  [55]  15  17  22  24  25  28  31  34  39  40  41  44  45  47  53  56  58  59
##  [73]  62  64  85  88  93  96 101 105 106 107 108 110 111 113 115 119 120 122
##  [91] 128 131 132 133 134 136 141 142 144 147
  2. Automatically, by passing the resampling object to resample(). Here, the splitting is done within the resample() call based on the supplied Task.
learner1 = mlr_learners$get("classif.rpart") # simple classification tree
learner2 = mlr_learners$get("classif.featureless") # featureless learner, predicts the majority class
rr1 = resample(task, learner1, resampling)
rr2 = resample(task, learner2, resampling)

setequal(rr1$experiment(1)$train_set, rr2$experiment(1)$train_set)
## [1] TRUE

If you want to compare multiple learners, you should use the same instantiated resampling for each task (method 1) to reduce the variance of the performance estimation.
If you instead let each resample() call instantiate the resampling itself (method 2), the resampling splits will differ between the runs.
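
To see why sharing one instantiated split matters, the following is a minimal conceptual sketch in base R. It is not how mlr3 implements cross-validation internally; the helper instantiate_cv() and the variable names are hypothetical and only illustrate that "instantiating" means fixing which row ids fall into which fold.

```r
# Conceptual sketch in base R; instantiate_cv() is a hypothetical helper,
# not part of mlr3.
instantiate_cv = function(n, folds) {
  # shuffle the fold labels, then group the row ids by their label
  fold_of_row = sample(rep_len(seq_len(folds), n))
  split(seq_len(n), fold_of_row)
}

n = 150L  # number of rows, e.g. in the iris data

# Analogue of method 1: instantiate once, reuse the split for every learner
set.seed(1)
shared = instantiate_cv(n, folds = 3L)
train_a = setdiff(seq_len(n), shared[[1]])  # training rows seen by learner A
train_b = setdiff(seq_len(n), shared[[1]])  # training rows seen by learner B
same_when_shared = setequal(train_a, train_b)

# Analogue of instantiating separately per run: two independent draws
split_1 = instantiate_cv(n, folds = 3L)
split_2 = instantiate_cv(n, folds = 3L)
same_when_separate = setequal(split_1[[1]], split_2[[1]])
```

With a shared split, both learners are evaluated on identical train and test rows, so differences in performance are not confounded by differences in the partition. With separate draws, the first test fold will in practice contain different rows in each run.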

If your aim is to compare different Tasks, Learners or Resamplings, you are better off using the benchmark() function. It is essentially a wrapper around resample() that simplifies the handling of multiple settings.

If you discover this only after you’ve run multiple resample() calls, don’t worry: you can combine multiple ResampleResult objects into a single BenchmarkResult using the .$combine() method.

bmr = rr1$combine(rr2)
bmr$aggregated(objects = FALSE)
##                hash  resample_result task_id          learner_id resampling_id
## 1: 2de3c264b559f327 <ResampleResult>    iris       classif.rpart            cv
## 2: 7858635b4c3b525d <ResampleResult>    iris classif.featureless            cv
##    classif.ce
## 1:     0.0600
## 2:     0.7467