17  Modeling

The main purpose of a Graph is to build combined preprocessing and model fitting pipelines that can be used as mlr3 Learner.

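Conceptually, the process may be summarized as follows: a Graph chains preprocessing POs with a learner, and the resulting pipeline is trained and used for prediction as a whole.

In the following, we chain two preprocessing steps, mutate (creating new features) and filter (selecting features, here by variance).

Subsequently, one can chain a PO learner to train and predict on the modified dataset.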

mutate = po("mutate")
filter = po("filter",
  filter = mlr3filters::flt("variance"),
  param_vals = list(filter.frac = 0.5))

graph = mutate %>>%
  filter %>>%
  po("learner",
    learner = lrn("classif.rpart"))

So far, we have defined the main pipeline, which is stored in a Graph. Now we can train the pipeline and predict with it:

task = tsk("iris")
graph$train(task)
$classif.rpart.output
NULL
graph$predict(task)
$classif.rpart.output
<PredictionClassif> for 150 observations:
    row_ids     truth  response
          1    setosa    setosa
          2    setosa    setosa
          3    setosa    setosa
---                            
        148 virginica virginica
        149 virginica virginica
        150 virginica virginica
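
The objects returned by the trained pipeline can be inspected further. The following is a minimal sketch, assuming the same pipeline as above; the variable names prediction, train_error and kept are illustrative and do not appear in the original text. It scores the prediction and looks up which features the variance filter retained.

```r
library(mlr3)
library(mlr3pipelines)
library(mlr3filters)

# Rebuild the pipeline from the text: mutate -> variance filter -> rpart.
graph = po("mutate") %>>%
  po("filter",
    filter = flt("variance"),
    param_vals = list(filter.frac = 0.5)) %>>%
  po("learner", learner = lrn("classif.rpart"))

task = tsk("iris")
graph$train(task)

# $predict() returns a named list with one element per graph output;
# here there is a single output holding the Prediction object.
prediction = graph$predict(task)[[1]]

# Classification error on the training data (optimistic, since the same
# observations were used for training and prediction).
train_error = prediction$score(msr("classif.ce"))
print(train_error)

# After training, each PipeOp keeps its state; for the filter this
# includes the names of the features that were kept.
kept = graph$pipeops$variance$state$features
print(kept)
```

Because the error is computed on the training data, it should not be read as an estimate of generalization performance; resampling is the appropriate tool for that.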

Rather than calling $train() and $predict() manually, we can put the pipeline Graph into a GraphLearner object. A GraphLearner encapsulates the whole pipeline (including the preprocessing steps) and can be passed to resample() or benchmark(). If you are familiar with the old mlr package, this is the equivalent of all the make*Wrapper() functions. The encapsulated pipeline (here the Graph) must always produce a Prediction with its $predict() call, so it will probably contain at least one PipeOpLearner.

glrn = as_learner(graph)

This learner can be used for model fitting, resampling, benchmarking, and tuning:

cv3 = rsmp("cv", folds = 3)
resample(task, glrn, cv3)
INFO  [21:35:51.489] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 2/3) 
INFO  [21:35:51.638] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 3/3) 
INFO  [21:35:51.760] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 1/3) 
<ResampleResult> of 3 iterations
* Task: iris
* Learner: mutate.variance.classif.rpart
* Warnings: 0 in 0 iterations
* Errors: 0 in 0 iterations
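
Since a GraphLearner behaves like any other Learner, it can also be compared against other learners. The following is a rough sketch, assuming the pipeline from above; the names design, bmr and scores are illustrative. It benchmarks the pipeline against a plain decision tree and aggregates the classification error per learner.

```r
library(mlr3)
library(mlr3pipelines)
library(mlr3filters)

# Only log warnings and errors to keep the console output short.
lgr::get_logger("mlr3")$set_threshold("warn")

graph = po("mutate") %>>%
  po("filter",
    filter = flt("variance"),
    param_vals = list(filter.frac = 0.5)) %>>%
  po("learner", learner = lrn("classif.rpart"))
glrn = as_learner(graph)

# Compare the pipeline with a plain rpart learner on the same CV splits.
set.seed(1)
design = benchmark_grid(
  tasks = tsk("iris"),
  learners = list(glrn, lrn("classif.rpart")),
  resamplings = rsmp("cv", folds = 3)
)
bmr = benchmark(design)

# One row per learner with the mean classification error across folds.
scores = bmr$aggregate(msr("classif.ce"))
print(scores[, c("learner_id", "classif.ce")])
```

For a single learner, the same aggregation is available on the object returned by resample() via its $aggregate() method.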

17.1 Setting Hyperparameters

Individual POs offer hyperparameters through their $param_set slot, whose values can be read and written via $param_set$values (using the paradox package). The parameters of each PO are exposed by the Graph and, in turn, by the GraphLearner, prefixed with the ID of the PO (for example variance.filter.frac). This makes it possible not only to change the behavior of a Graph / GraphLearner and try different settings manually, but also to perform tuning using the mlr3tuning package.

glrn$param_set$values$variance.filter.frac = 0.25
cv3 = rsmp("cv", folds = 3)
resample(task, glrn, cv3)
INFO  [21:35:52.012] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 3/3) 
INFO  [21:35:52.124] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 1/3) 
INFO  [21:35:52.237] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 2/3) 
<ResampleResult> of 3 iterations
* Task: iris
* Learner: mutate.variance.classif.rpart
* Warnings: 0 in 0 iterations
* Errors: 0 in 0 iterations
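
To find out which hyperparameters a pipeline exposes, the parameter set of the GraphLearner can be listed. The following is a small sketch, assuming the pipeline from above; the variable ids and the choice of maxdepth as a second parameter are illustrative.

```r
library(mlr3)
library(mlr3pipelines)
library(mlr3filters)

graph = po("mutate") %>>%
  po("filter",
    filter = flt("variance"),
    param_vals = list(filter.frac = 0.5)) %>>%
  po("learner", learner = lrn("classif.rpart"))
glrn = as_learner(graph)

# All hyperparameters of the pipeline; each name is prefixed with the
# ID of the PipeOp it belongs to (mutate, variance, classif.rpart).
ids = glrn$param_set$ids()
print(ids)

# Set a filter parameter and a learner parameter through the GraphLearner.
glrn$param_set$values$variance.filter.frac = 0.25
glrn$param_set$values$classif.rpart.maxdepth = 3

# Show the values that are currently set.
print(glrn$param_set$values)
```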

17.2 Tuning

If you are unfamiliar with tuning in mlr3, we recommend taking a look at the section about tuning first. Here we define a ParamSet for the "rpart" learner and the "variance" filter, which should be optimized during the tuning process.

library("paradox")
ps = ps(
  classif.rpart.cp = p_dbl(lower = 0, upper = 0.05),
  variance.filter.frac = p_dbl(lower = 0.25, upper = 1)
)

After having defined the search space, we create a tuning instance and run a random search with 20 evaluations. For the inner resampling, we simply use holdout (a single split into train/test) to keep the runtimes reasonable.

library("mlr3tuning")
instance = TuningInstanceSingleCrit$new(
  task = task,
  learner = glrn,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  search_space = ps,
  terminator = trm("evals", n_evals = 20)
)
tuner = tnr("random_search")
tuner$optimize(instance)
INFO  [21:35:52.636] [bbotk] Starting to optimize 2 parameter(s) with '<OptimizerRandomSearch>' and '<TerminatorEvals> [n_evals=20, k=0]' 
INFO  [21:35:52.655] [bbotk] Evaluating 1 configuration(s) 
INFO  [21:35:52.706] [mlr3] Running benchmark with 1 resampling iterations 
INFO  [21:35:52.713] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 1/1) 
INFO  [21:35:52.822] [mlr3] Finished benchmark 
INFO  [21:35:52.864] [bbotk] Result of batch 1: 
INFO  [21:35:52.866] [bbotk]  classif.rpart.cp variance.filter.frac classif.ce warnings errors 
INFO  [21:35:52.866] [bbotk]        0.02823251            0.7778568       0.02        0      0 
INFO  [21:35:52.866] [bbotk]  runtime_learners                                uhash 
INFO  [21:35:52.866] [bbotk]             0.101 a312ef03-d056-44ee-9778-b7db2b942c3f 
INFO  [21:35:52.870] [bbotk] Evaluating 1 configuration(s) 
INFO  [21:35:52.908] [mlr3] Running benchmark with 1 resampling iterations 
INFO  [21:35:52.915] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 1/1) 
INFO  [21:35:53.034] [mlr3] Finished benchmark 
INFO  [21:35:53.070] [bbotk] Result of batch 2: 
INFO  [21:35:53.072] [bbotk]  classif.rpart.cp variance.filter.frac classif.ce warnings errors 
INFO  [21:35:53.072] [bbotk]        0.01596592            0.7346384       0.02        0      0 
INFO  [21:35:53.072] [bbotk]  runtime_learners                                uhash 
INFO  [21:35:53.072] [bbotk]             0.111 6f8c5594-c404-4ed4-aa54-42422d7abb90 
INFO  [21:35:53.077] [bbotk] Evaluating 1 configuration(s) 
INFO  [21:35:53.115] [mlr3] Running benchmark with 1 resampling iterations 
INFO  [21:35:53.122] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 1/1) 
INFO  [21:35:53.240] [mlr3] Finished benchmark 
INFO  [21:35:53.277] [bbotk] Result of batch 3: 
INFO  [21:35:53.279] [bbotk]  classif.rpart.cp variance.filter.frac classif.ce warnings errors 
INFO  [21:35:53.279] [bbotk]        0.03131324            0.8840266       0.02        0      0 
INFO  [21:35:53.279] [bbotk]  runtime_learners                                uhash 
INFO  [21:35:53.279] [bbotk]              0.11 d0e41a64-a54c-4508-a511-8eacc0bd2f5d 
INFO  [21:35:53.284] [bbotk] Evaluating 1 configuration(s) 
INFO  [21:35:53.323] [mlr3] Running benchmark with 1 resampling iterations 
INFO  [21:35:53.336] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 1/1) 
INFO  [21:35:53.451] [mlr3] Finished benchmark 
INFO  [21:35:53.488] [bbotk] Result of batch 4: 
INFO  [21:35:53.490] [bbotk]  classif.rpart.cp variance.filter.frac classif.ce warnings errors 
INFO  [21:35:53.490] [bbotk]        0.02831327            0.4730817       0.08        0      0 
INFO  [21:35:53.490] [bbotk]  runtime_learners                                uhash 
INFO  [21:35:53.490] [bbotk]             0.106 019bad7b-bbd8-437e-855f-50358d4229f6 
INFO  [21:35:53.494] [bbotk] Evaluating 1 configuration(s) 
INFO  [21:35:53.537] [mlr3] Running benchmark with 1 resampling iterations 
INFO  [21:35:53.544] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 1/1) 
INFO  [21:35:53.659] [mlr3] Finished benchmark 
INFO  [21:35:53.700] [bbotk] Result of batch 5: 
INFO  [21:35:53.702] [bbotk]  classif.rpart.cp variance.filter.frac classif.ce warnings errors 
INFO  [21:35:53.702] [bbotk]       0.005338823            0.4295558       0.08        0      0 
INFO  [21:35:53.702] [bbotk]  runtime_learners                                uhash 
INFO  [21:35:53.702] [bbotk]             0.107 5efa091f-24eb-4617-9575-900c354ef37d 
INFO  [21:35:53.707] [bbotk] Evaluating 1 configuration(s) 
INFO  [21:35:53.744] [mlr3] Running benchmark with 1 resampling iterations 
INFO  [21:35:53.751] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 1/1) 
INFO  [21:35:53.867] [mlr3] Finished benchmark 
INFO  [21:35:53.906] [bbotk] Result of batch 6: 
INFO  [21:35:53.908] [bbotk]  classif.rpart.cp variance.filter.frac classif.ce warnings errors 
INFO  [21:35:53.908] [bbotk]        0.03536951            0.4277043       0.08        0      0 
INFO  [21:35:53.908] [bbotk]  runtime_learners                                uhash 
INFO  [21:35:53.908] [bbotk]             0.109 9baf622a-5b72-4bb9-9f7e-febcd1886ef4 
INFO  [21:35:53.913] [bbotk] Evaluating 1 configuration(s) 
INFO  [21:35:53.953] [mlr3] Running benchmark with 1 resampling iterations 
INFO  [21:35:53.960] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 1/1) 
INFO  [21:35:54.080] [mlr3] Finished benchmark 
INFO  [21:35:54.120] [bbotk] Result of batch 7: 
INFO  [21:35:54.122] [bbotk]  classif.rpart.cp variance.filter.frac classif.ce warnings errors 
INFO  [21:35:54.122] [bbotk]        0.04395021            0.2684455       0.08        0      0 
INFO  [21:35:54.122] [bbotk]  runtime_learners                                uhash 
INFO  [21:35:54.122] [bbotk]             0.112 b88b9785-f22f-4ef5-9411-ff1da663f01d 
INFO  [21:35:54.127] [bbotk] Evaluating 1 configuration(s) 
INFO  [21:35:54.172] [mlr3] Running benchmark with 1 resampling iterations 
INFO  [21:35:54.180] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 1/1) 
INFO  [21:35:54.292] [mlr3] Finished benchmark 
INFO  [21:35:54.337] [bbotk] Result of batch 8: 
INFO  [21:35:54.340] [bbotk]  classif.rpart.cp variance.filter.frac classif.ce warnings errors 
INFO  [21:35:54.340] [bbotk]        0.03984967            0.6770771       0.02        0      0 
INFO  [21:35:54.340] [bbotk]  runtime_learners                                uhash 
INFO  [21:35:54.340] [bbotk]             0.105 9017fc50-2e6d-4508-a998-d9b672642f3a 
INFO  [21:35:54.344] [bbotk] Evaluating 1 configuration(s) 
INFO  [21:35:54.385] [mlr3] Running benchmark with 1 resampling iterations 
INFO  [21:35:54.392] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 1/1) 
INFO  [21:35:54.509] [mlr3] Finished benchmark 
INFO  [21:35:54.548] [bbotk] Result of batch 9: 
INFO  [21:35:54.550] [bbotk]  classif.rpart.cp variance.filter.frac classif.ce warnings errors 
INFO  [21:35:54.550] [bbotk]        0.01353304            0.3857355       0.08        0      0 
INFO  [21:35:54.550] [bbotk]  runtime_learners                                uhash 
INFO  [21:35:54.550] [bbotk]             0.108 23cf5d63-e06d-48ce-b7be-05ad103e07b7 
INFO  [21:35:54.554] [bbotk] Evaluating 1 configuration(s) 
INFO  [21:35:54.598] [mlr3] Running benchmark with 1 resampling iterations 
INFO  [21:35:54.605] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 1/1) 
INFO  [21:35:54.728] [mlr3] Finished benchmark 
INFO  [21:35:54.766] [bbotk] Result of batch 10: 
INFO  [21:35:54.769] [bbotk]  classif.rpart.cp variance.filter.frac classif.ce warnings errors 
INFO  [21:35:54.769] [bbotk]       0.009084786            0.5108698       0.08        0      0 
INFO  [21:35:54.769] [bbotk]  runtime_learners                                uhash 
INFO  [21:35:54.769] [bbotk]             0.108 6110888e-a91c-4f0d-b471-5fa0f5d40a0a 
INFO  [21:35:54.773] [bbotk] Evaluating 1 configuration(s) 
INFO  [21:35:54.825] [mlr3] Running benchmark with 1 resampling iterations 
INFO  [21:35:54.832] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 1/1) 
INFO  [21:35:54.952] [mlr3] Finished benchmark 
INFO  [21:35:54.991] [bbotk] Result of batch 11: 
INFO  [21:35:54.993] [bbotk]  classif.rpart.cp variance.filter.frac classif.ce warnings errors 
INFO  [21:35:54.993] [bbotk]        0.03182164            0.3101431       0.08        0      0 
INFO  [21:35:54.993] [bbotk]  runtime_learners                                uhash 
INFO  [21:35:54.993] [bbotk]             0.112 4e61bcd2-0f0f-43f9-83fb-efe25a1d36ef 
INFO  [21:35:54.998] [bbotk] Evaluating 1 configuration(s) 
INFO  [21:35:55.037] [mlr3] Running benchmark with 1 resampling iterations 
INFO  [21:35:55.044] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 1/1) 
INFO  [21:35:55.172] [mlr3] Finished benchmark 
INFO  [21:35:55.212] [bbotk] Result of batch 12: 
INFO  [21:35:55.215] [bbotk]  classif.rpart.cp variance.filter.frac classif.ce warnings errors 
INFO  [21:35:55.215] [bbotk]        0.01005369            0.4117315       0.08        0      0 
INFO  [21:35:55.215] [bbotk]  runtime_learners                                uhash 
INFO  [21:35:55.215] [bbotk]             0.121 37197c41-11e6-4dff-aa3b-5e19a7aa26ff 
INFO  [21:35:55.220] [bbotk] Evaluating 1 configuration(s) 
INFO  [21:35:55.267] [mlr3] Running benchmark with 1 resampling iterations 
INFO  [21:35:55.275] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 1/1) 
INFO  [21:35:55.396] [mlr3] Finished benchmark 
INFO  [21:35:55.441] [bbotk] Result of batch 13: 
INFO  [21:35:55.444] [bbotk]  classif.rpart.cp variance.filter.frac classif.ce warnings errors 
INFO  [21:35:55.444] [bbotk]       0.001490258            0.4092659       0.08        0      0 
INFO  [21:35:55.444] [bbotk]  runtime_learners                                uhash 
INFO  [21:35:55.444] [bbotk]             0.114 0442845d-efaf-4d17-9e82-f8b9cebf3239 
INFO  [21:35:55.448] [bbotk] Evaluating 1 configuration(s) 
INFO  [21:35:55.489] [mlr3] Running benchmark with 1 resampling iterations 
INFO  [21:35:55.496] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 1/1) 
INFO  [21:35:55.621] [mlr3] Finished benchmark 
INFO  [21:35:55.661] [bbotk] Result of batch 14: 
INFO  [21:35:55.664] [bbotk]  classif.rpart.cp variance.filter.frac classif.ce warnings errors 
INFO  [21:35:55.664] [bbotk]        0.04983528            0.3246353       0.08        0      0 
INFO  [21:35:55.664] [bbotk]  runtime_learners                                uhash 
INFO  [21:35:55.664] [bbotk]             0.116 1bad5079-2803-4bba-8231-2cc9d7c3afea 
INFO  [21:35:55.668] [bbotk] Evaluating 1 configuration(s) 
INFO  [21:35:55.748] [mlr3] Running benchmark with 1 resampling iterations 
INFO  [21:35:55.756] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 1/1) 
INFO  [21:35:55.870] [mlr3] Finished benchmark 
INFO  [21:35:55.909] [bbotk] Result of batch 15: 
INFO  [21:35:55.912] [bbotk]  classif.rpart.cp variance.filter.frac classif.ce warnings errors 
INFO  [21:35:55.912] [bbotk]         0.0116094            0.9499791       0.02        0      0 
INFO  [21:35:55.912] [bbotk]  runtime_learners                                uhash 
INFO  [21:35:55.912] [bbotk]             0.106 74905143-9b32-4241-9620-c8c76e61b933 
INFO  [21:35:55.921] [bbotk] Evaluating 1 configuration(s) 
INFO  [21:35:55.966] [mlr3] Running benchmark with 1 resampling iterations 
INFO  [21:35:55.973] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 1/1) 
INFO  [21:35:56.097] [mlr3] Finished benchmark 
INFO  [21:35:56.140] [bbotk] Result of batch 16: 
INFO  [21:35:56.143] [bbotk]  classif.rpart.cp variance.filter.frac classif.ce warnings errors 
INFO  [21:35:56.143] [bbotk]        0.02373585            0.7983819       0.02        0      0 
INFO  [21:35:56.143] [bbotk]  runtime_learners                                uhash 
INFO  [21:35:56.143] [bbotk]             0.115 13d772ae-c58d-4ca1-9162-978778ec5588 
INFO  [21:35:56.147] [bbotk] Evaluating 1 configuration(s) 
INFO  [21:35:56.187] [mlr3] Running benchmark with 1 resampling iterations 
INFO  [21:35:56.195] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 1/1) 
INFO  [21:35:56.321] [mlr3] Finished benchmark 
INFO  [21:35:56.362] [bbotk] Result of batch 17: 
INFO  [21:35:56.364] [bbotk]  classif.rpart.cp variance.filter.frac classif.ce warnings errors 
INFO  [21:35:56.364] [bbotk]         0.0287812            0.4055964       0.08        0      0 
INFO  [21:35:56.364] [bbotk]  runtime_learners                                uhash 
INFO  [21:35:56.364] [bbotk]             0.119 2d90f4c0-a9a2-46dd-bd21-efec2a95fb23 
INFO  [21:35:56.369] [bbotk] Evaluating 1 configuration(s) 
INFO  [21:35:56.409] [mlr3] Running benchmark with 1 resampling iterations 
INFO  [21:35:56.416] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 1/1) 
INFO  [21:35:56.535] [mlr3] Finished benchmark 
INFO  [21:35:56.572] [bbotk] Result of batch 18: 
INFO  [21:35:56.574] [bbotk]  classif.rpart.cp variance.filter.frac classif.ce warnings errors 
INFO  [21:35:56.574] [bbotk]       0.009408437            0.8256714       0.02        0      0 
INFO  [21:35:56.574] [bbotk]  runtime_learners                                uhash 
INFO  [21:35:56.574] [bbotk]             0.111 81557b50-bd9e-49eb-957a-65a58ea1a2cc 
INFO  [21:35:56.578] [bbotk] Evaluating 1 configuration(s) 
INFO  [21:35:56.627] [mlr3] Running benchmark with 1 resampling iterations 
INFO  [21:35:56.634] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 1/1) 
INFO  [21:35:56.747] [mlr3] Finished benchmark 
INFO  [21:35:56.795] [bbotk] Result of batch 19: 
INFO  [21:35:56.798] [bbotk]  classif.rpart.cp variance.filter.frac classif.ce warnings errors 
INFO  [21:35:56.798] [bbotk]        0.04700366            0.3028588       0.08        0      0 
INFO  [21:35:56.798] [bbotk]  runtime_learners                                uhash 
INFO  [21:35:56.798] [bbotk]             0.104 63494d62-034f-4c92-a34d-3abc25f97d83 
INFO  [21:35:56.803] [bbotk] Evaluating 1 configuration(s) 
INFO  [21:35:56.842] [mlr3] Running benchmark with 1 resampling iterations 
INFO  [21:35:56.849] [mlr3] Applying learner 'mutate.variance.classif.rpart' on task 'iris' (iter 1/1) 
INFO  [21:35:56.977] [mlr3] Finished benchmark 
INFO  [21:35:57.019] [bbotk] Result of batch 20: 
INFO  [21:35:57.021] [bbotk]  classif.rpart.cp variance.filter.frac classif.ce warnings errors 
INFO  [21:35:57.021] [bbotk]        0.04730462            0.6231299       0.08        0      0 
INFO  [21:35:57.021] [bbotk]  runtime_learners                                uhash 
INFO  [21:35:57.021] [bbotk]             0.118 f13d6743-1244-4f87-9593-7b9f8449ec81 
INFO  [21:35:57.035] [bbotk] Finished optimizing after 20 evaluation(s) 
INFO  [21:35:57.036] [bbotk] Result: 
INFO  [21:35:57.038] [bbotk]  classif.rpart.cp variance.filter.frac learner_param_vals  x_domain classif.ce 
INFO  [21:35:57.038] [bbotk]        0.02823251            0.7778568          <list[5]> <list[2]>       0.02 
   classif.rpart.cp variance.filter.frac learner_param_vals  x_domain
1:       0.02823251            0.7778568          <list[5]> <list[2]>
   classif.ce
1:       0.02

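The tuning result can be found in the respective result slots of the instance: $result_learner_param_vals holds the best hyperparameter configuration and $result_y the corresponding performance.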

instance$result_learner_param_vals
$mutate.mutation
list()

$mutate.delete_originals
[1] FALSE

$variance.filter.frac
[1] 0.7778568

$classif.rpart.xval
[1] 0

$classif.rpart.cp
[1] 0.02823251
instance$result_y
classif.ce 
      0.02
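
A common next step is to apply the tuned configuration to the GraphLearner and fit a final model on the full task. The following is a sketch under stated assumptions: it repeats the setup with only 5 evaluations to keep it quick, uses the TuningInstanceSingleCrit constructor as shown in the text (newer mlr3tuning releases may offer a different interface), and the names search_space and archive are illustrative.

```r
library(mlr3)
library(mlr3pipelines)
library(mlr3filters)
library(mlr3tuning)
library(paradox)

# Only log warnings and errors to keep the console output short.
lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")

task = tsk("iris")
graph = po("mutate") %>>%
  po("filter",
    filter = flt("variance"),
    param_vals = list(filter.frac = 0.5)) %>>%
  po("learner", learner = lrn("classif.rpart"))
glrn = as_learner(graph)

search_space = ps(
  classif.rpart.cp = p_dbl(lower = 0, upper = 0.05),
  variance.filter.frac = p_dbl(lower = 0.25, upper = 1)
)

set.seed(1)
instance = TuningInstanceSingleCrit$new(
  task = task,
  learner = glrn,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  search_space = search_space,
  terminator = trm("evals", n_evals = 5)  # fewer evaluations than in the text
)
tnr("random_search")$optimize(instance)

# All evaluated configurations, one row per evaluation.
archive = as.data.table(instance$archive)
print(archive[, c("classif.rpart.cp", "variance.filter.frac", "classif.ce")])

# Apply the best configuration and fit the final model on all data.
glrn$param_set$values = instance$result_learner_param_vals
glrn$train(task)
```

The final model is trained on the complete task, so its performance should be estimated separately, for example with nested resampling.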