## 5.4 Modeling

The main purpose of a `Graph` is to build combined preprocessing and model fitting pipelines that can be used as an mlr3 `Learner`. In the following, we chain two preprocessing steps:

* `mutate` (creation of a new feature)
* `filter` (filtering the dataset)

and then chain a `PipeOpLearner` to train and predict on the modified dataset.

```r
graph = mutate %>>% filter %>>%
  mlr_pipeops$get("learner", learner = mlr_learners$get("classif.rpart"))
```
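The `mutate` and `filter` objects are PipeOps assumed to have been constructed earlier (e.g. in the preceding section); this is not shown in this section, but a minimal sketch could look like the following. The use of `mlr3filters::FilterVariance` is an assumption based on the `variance.*` parameter names appearing in the output below; the exact package and constructor may differ in your version.

```r
# Sketch only: construct the two preprocessing PipeOps used above.
# FilterVariance is assumed from mlr3filters; filter.frac = 0.5 is illustrative.
mutate = mlr_pipeops$get("mutate")
filter = mlr_pipeops$get("filter",
  filter = mlr3filters::FilterVariance$new(),
  param_vals = list(filter.frac = 0.5))
```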

Up to this point, we have defined the main pipeline, stored as a `Graph`. Now we can train and predict with the pipeline.

```r
task = mlr_tasks$get("iris")
graph$train(task)
#> $classif.rpart.output
#> NULL
graph$predict(task)
#> $classif.rpart.output
#> <PredictionClassif> for 150 observations:
#>      row_id     truth  response
#>   1:      1    setosa    setosa
#>   2:      2    setosa    setosa
#>   3:      3    setosa    setosa
#> ---
#> 148:    148 virginica virginica
#> 149:    149 virginica virginica
#> 150:    150 virginica virginica
```

Rather than calling `$train()` and `$predict()` manually, we can put the pipeline `Graph` into a `GraphLearner` object. A `GraphLearner` encapsulates the whole pipeline (including the preprocessing steps) and can be put into `resample()` or `benchmark()`. If you are familiar with the old mlr package, this is the equivalent of all the `make*Wrapper()` functions. The pipeline being encapsulated (here `Graph`) must always produce a `Prediction` with its `$predict()` call, so it will probably contain at least one `PipeOpLearner`.
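The `glrn` object used below is such a `GraphLearner` wrapping the pipeline; its construction is not shown in this section, but a minimal sketch would be:

```r
# Sketch: wrap the pipeline Graph so it behaves like a regular mlr3 Learner.
glrn = GraphLearner$new(graph)
```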

```r
glrn$param_set$values$variance.filter.frac = 0.25
resample(task, glrn, "cv3")
#> <ResampleResult> of 3 iterations
#> Task: iris
#> Learner: mutate.variance.classif.rpart
```

### 5.4.2 Tuning

If you are not yet familiar with tuning in mlr3, we recommend taking a look at the section on tuning first. Here we define a `ParamSet` for the "rpart" learner and the "variance" filter which should be optimized during tuning.

```r
library("paradox")
ps = ParamSet$new(list(
  ParamDbl$new("classif.rpart.cp", lower = 0, upper = 0.05),
  ParamDbl$new("variance.filter.frac", lower = 0.25, upper = 1)
))
```

After defining the `PerformanceEvaluator`, we create a random search with 10 iterations. For the inner resampling, we simply use holdout (a single split into train/test) to keep the runtime reasonable.

```r
library("mlr3tuning")
pe = PerformanceEvaluator$new(task, glrn, "holdout", "classif.ce", ps)
tuner = TunerRandomSearch$new(pe, TerminatorEvaluations$new(10))
tuner$tune()
```

The tuning result can be inspected using the `$tune_result()` method.

```r
tuner$tune_result()
#> $performance
#> classif.ce
#>       0.08
#>
#> $values
#> $values$mutate.mutation
#> named list()
#>
#> $values$mutate.env
#> <environment: R_GlobalEnv>
#>
#> $values$mutate.delete_originals
#> [1] FALSE
#>
#> $values$variance.filter.frac
#> [1] 0.5693
#>
#> $values$variance.na.rm
#> [1] TRUE
#>
#> $values$classif.rpart.cp
#> [1] 0.02956
#>
#> $values$classif.rpart.xval
#> [1] 0
```
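To use the tuned configuration, the hyperparameter values found by the tuner can be assigned back to the `GraphLearner` before a final fit. This is a sketch under the assumption that the `$values` element of the tuning result can be assigned directly to the learner's `param_set$values` slot, which mirrors its structure:

```r
# Sketch: apply the best hyperparameters and retrain on the full task.
glrn$param_set$values = tuner$tune_result()$values
glrn$train(task)
```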