5.1 Settings

In this example we use the iris task and a simple classification tree (package rpart).

task = mlr_tasks$get("iris")
learner = mlr_learners$get("classif.rpart")

When performing resampling on a dataset, we first need to define which approach should be used. The resampling strategies of mlr3 can be queried using the .$keys() method of the mlr_resamplings dictionary.

mlr_resamplings$keys()
## [1] "bootstrap"   "custom"      "cv"          "cv3"         "holdout"    
## [6] "repeated_cv" "subsampling"

Additional resampling methods for special use cases will be available via extension packages, such as mlr3spatiotemporal for spatial data (still in development).

The experiment conducted in the train/predict/score chapter is equivalent to “holdout”, so let’s consider this one first.

resampling = mlr_resamplings$get("holdout")
print(resampling)
## <ResamplingHoldout> with 1 iterations
## Instantiated: FALSE
## Parameters: ratio=0.6667
## 
## Public: clone, duplicated_ids, format, hash, id, instance,
##   instantiate, is_instantiated, iters, param_set, task_hash, test_set,
##   train_set
print(resampling$param_set$values)
## $ratio
## [1] 0.6667

Note that the Instantiated field is set to FALSE. This means we have not actually applied the strategy to a dataset yet, but only performed a dry run. Applying the strategy to a dataset is covered in the next section, Instantiation.
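As a brief preview of instantiation, the object's Public slot above lists the instantiate(), train_set() and test_set() methods. A minimal sketch of how they fit together (the exact output depends on the random split):

```r
# apply the holdout strategy to the iris task
resampling$instantiate(task)
resampling$is_instantiated

# row ids of the single train/test split
train_ids = resampling$train_set(1)
test_ids = resampling$test_set(1)
length(train_ids)  # roughly 2/3 of the 150 rows
```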

By default we get a .66/.33 split of the data. There are two ways to change the ratio:

  1. Overwriting the slot in .$param_set$values using a named list:
resampling$param_set$values = list(ratio = 0.8)
  2. Specifying the resampling parameters directly during construction using the param_vals argument:
mlr_resamplings$get("holdout", param_vals = list(ratio = 0.8))
## <ResamplingHoldout> with 1 iterations
## Instantiated: FALSE
## Parameters: ratio=0.8
## 
## Public: clone, duplicated_ids, format, hash, id, instance,
##   instantiate, is_instantiated, iters, param_set, task_hash, test_set,
##   train_set
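Once configured, the resampling object can be combined with the task and learner defined at the beginning of this section. A minimal sketch, assuming the resample() function from mlr3 (covered in the following sections):

```r
rr = resample(task, learner, resampling)
print(rr)
```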