21  Parallelization

Parallelization refers to the process of running multiple jobs simultaneously. This can significantly reduce the elapsed computing time, although the total amount of computation stays the same or grows slightly. We distinguish between implicit parallelism and explicit parallelism.

21.1 Implicit Parallelization

We talk about implicit parallelization in this context if we call external code (i.e., code from foreign CRAN packages) which runs in parallel. Many machine learning algorithms can parallelize their model fit using threading, e.g. ranger or xgboost. Unfortunately, threading conflicts with certain parallel backends used during explicit parallelization, causing the system to be overutilized in the best case and causing hangs or segfaults in the worst case. For this reason, we introduced the convention that implicit parallelization is turned off in the defaults, but can be enabled again via a hyperparameter which is tagged with the label "threads".


learner = lrn("classif.ranger")
learner$param_set$ids(tags = "threads")
[1] "num.threads"

To enable parallelization for this learner, we can simply call the helper function set_threads():

# set to use 4 CPUs
set_threads(learner, n = 4)
* Model: -
* Parameters: num.threads=4
* Packages: mlr3, mlr3learners, ranger
* Predict Type: response
* Feature types: logical, integer, numeric, character, factor, ordered
* Properties: hotstart_backward, importance, multiclass, oob_error,
  twoclass, weights
# auto-detect cores on the local machine
set_threads(learner)
* Model: -
* Parameters: num.threads=2
* Packages: mlr3, mlr3learners, ranger
* Predict Type: response
* Feature types: logical, integer, numeric, character, factor, ordered
* Properties: hotstart_backward, importance, multiclass, oob_error,
  twoclass, weights

This also works for filters from mlr3filters and lists of objects, even if some objects do not support threading at all:

# retrieve 2 filters
# * variance filter with no support for threading
# * mrmr filter with threading support
filters = flts(c("variance", "mrmr"))

# set threads for all filters which support it
set_threads(filters, n = 4)
Task Types: NA
Task Properties: -
Packages: mlr3filters, stats
Feature types: integer, numeric

Task Types: classif, regr
Task Properties: -
Packages: mlr3filters, praznik
Feature types: integer, numeric, factor, ordered
# variance filter is unchanged
filters[[1]]$param_set
      id    class lower upper nlevels default value
1: na.rm ParamLgl    NA    NA       2    TRUE      
# mrmr now works in parallel with 4 cores
filters[[2]]$param_set
        id    class lower upper nlevels default value
1: threads ParamInt     0   Inf     Inf       0     4

21.2 Explicit Parallelization

We talk about explicit parallelization here if mlr3 starts the parallelization itself. The abstraction implemented in future is used to support a broad range of parallel backends. There are two use cases where mlr3 calls future: resample() and benchmark(). During resampling, all resampling iterations are independent of each other and can be executed in parallel. The same holds for benchmarking, where all combinations in the provided design are independent as well. These loops are performed by future using the parallel backend configured with future::plan(). Extension packages like mlr3tuning internally call benchmark() during tuning and thus work in parallel, too.
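To make the benchmarking case concrete, the following sketch counts how many independent jobs a small design contains. The choice of tasks, learners and a 3-fold cross-validation is an assumption made purely for illustration; any design built with benchmark_grid() could be inspected the same way.

```r
library(mlr3)

# illustrative design: 2 tasks x 2 learners, each with 3-fold cross-validation
design = benchmark_grid(
  tasks = tsks(c("spam", "sonar")),
  learners = lrns(c("classif.rpart", "classif.featureless")),
  resamplings = rsmp("cv", folds = 3)
)

# each row is one task/learner/resampling combination;
# each resampling iteration within a row is one independent job
n_jobs = sum(vapply(design$resampling, function(r) as.numeric(r$iters), numeric(1)))
n_jobs
```

Here the design has 4 rows with 3 iterations each, so benchmark(design) can distribute 12 jobs across the available workers.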

In this section, we will use the spam task and a simple classification tree to showcase the explicit parallelization. In this example, the future::multisession parallel backend is selected which should work on all systems.

# select the multisession backend
future::plan("multisession")

task = tsk("spam")
learner = lrn("classif.rpart")
resampling = rsmp("subsampling")

time = Sys.time()
resample(task, learner, resampling)
Sys.time() - time

By default, all CPUs of your machine are used unless you specify the argument workers in future::plan().
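As a minimal sketch of limiting the number of workers, the snippet below requests two workers and checks the result. The value 2 is an arbitrary choice for illustration; future::availableCores() and future::nbrOfWorkers() are helper functions of the future package.

```r
# number of cores future detects on this machine
future::availableCores()

# restrict the multisession backend to 2 workers (arbitrary illustrative value)
future::plan("multisession", workers = 2)
n_workers = future::nbrOfWorkers()
n_workers

# switch back to sequential execution when done
future::plan("sequential")
```

Resetting the plan to "sequential" at the end shuts down the background R sessions, which is useful in interactive work.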

On most systems you should see a decrease in the reported elapsed time compared to sequential execution, but in practice you cannot expect the runtime to shrink in proportion to the number of cores (Amdahl’s law). Depending on the parallel backend, the technical overhead for starting workers, communicating objects, sending back results and shutting down the workers can be quite large. Therefore, it is advised to only enable parallelization for resamplings where each iteration runs for at least several seconds.
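Amdahl's law can be sketched in a few lines of R. The helper amdahl_speedup() and the assumed parallelizable fraction of 90% are hypothetical and only serve as an illustration; the real fraction depends on the task, the learner and the backend overhead.

```r
# theoretical speedup for a job where a fraction p of the runtime
# can be parallelized over n cores (Amdahl's law)
amdahl_speedup = function(p, n) {
  1 / ((1 - p) + p / n)
}

cores = c(1, 2, 4, 8, 16)
speedup = amdahl_speedup(p = 0.9, n = cores)
round(speedup, 2)
```

Under this assumption, 16 cores yield a speedup of only about 6.4 rather than 16, and no number of cores can push the speedup beyond 1 / (1 - 0.9) = 10.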

If you are transitioning from mlr, you might be used to selecting different parallelization levels, e.g. for resampling, benchmarking or tuning. In mlr3 this is no longer required (except for nested resampling, briefly described in the following section). All kinds of events are rolled out on the same level. Therefore, there is no need to decide whether you want to parallelize the tuning OR the resampling.

Just lean back and let the machine do the work :-)


During tuning with mlr3tuning, you can often adjust the batch size of the Tuner, i.e., control how many hyperparameter configurations are evaluated in parallel. If you want full parallelization, make sure that the batch size multiplied by the number of (inner) resampling iterations is at least equal to the number of cores or workers.

In general, larger batches mean more parallelization, while smaller batches imply a more fine-grained checking of termination criteria. We default to a batch_size of 1 which ensures that all Terminators work as intended and you cannot exceed the computational budget.
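The rule of thumb for the batch size can be turned into a small calculation. The helper min_batch_size() and the numbers used (8 workers, 3 inner resampling iterations) are hypothetical; the tuner is only constructed if mlr3tuning is installed, and random search is just one example of a Tuner that accepts a batch_size.

```r
# smallest batch size such that batch_size * inner_iters >= n_workers
min_batch_size = function(n_workers, inner_iters) {
  as.integer(ceiling(n_workers / inner_iters))
}

# illustrative setup: 8 workers, 3-fold inner cross-validation
batch_size = min_batch_size(n_workers = 8, inner_iters = 3)
batch_size

# pass the value to a tuner, here random search from mlr3tuning
if (requireNamespace("mlr3tuning", quietly = TRUE)) {
  tuner = mlr3tuning::tnr("random_search", batch_size = batch_size)
}
```

With these numbers, each batch produces 3 * 3 = 9 jobs, enough to keep all 8 workers busy, at the cost of checking the termination criterion only after every third configuration.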

21.3 Nested Resampling Parallelization

Nested resampling results in two nested resampling loops. We can choose different parallelization backends for the inner and outer resampling loop, respectively. We just have to pass a list of future backends:

# Runs the outer loop in parallel and the inner loop sequentially
future::plan(list("multisession", "sequential"))
# Runs the outer loop sequentially and the inner loop in parallel
future::plan(list("sequential", "multisession"))

While nesting real parallelization backends is often unintended and causes unnecessary overhead, it is useful in some distributed computing setups. It can be achieved with future by forcing a fixed number of workers for each loop:

# Runs both loops in parallel
future::plan(list(future::tweak("multisession", workers = 2),
  future::tweak("multisession", workers = 4)))

This example would run on 8 cores (= 2 * 4) on the local machine. The vignette of the future package gives more insight into nested parallelization.