Welcome to the Machine Learning in R universe (mlr3verse)! Before we begin, make sure you have installed `mlr3` if you want to follow along. We recommend installing the complete `mlr3verse`, which will install all of the important packages.

``install.packages("mlr3verse")``

Or you can install just the base package:

``install.packages("mlr3")``

In our first example, we will show you some of the most basic functionality – training a model and making predictions.

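``````library(mlr3)
task = tsk("penguins")          # load the penguins classification task
split = partition(task)         # row ids for a train/test split
learner = lrn("classif.rpart")  # decision tree learner

learner$train(task, row_ids = split$train)
learner$model``````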
``````n= 231

node), split, n, loss, yval, (yprob)
* denotes terminal node

1) root 231 129 Adelie (0.441558442 0.199134199 0.359307359)
2) flipper_length< 207.5 145  44 Adelie (0.696551724 0.296551724 0.006896552)
4) bill_length< 44.65 100   2 Adelie (0.980000000 0.020000000 0.000000000) *
5) bill_length>=44.65 45   4 Chinstrap (0.066666667 0.911111111 0.022222222) *
3) flipper_length>=207.5 86   4 Gentoo (0.011627907 0.034883721 0.953488372) *``````
``````predictions = learner$predict(task, row_ids = split$test)
predictions``````
``````<PredictionClassif> for 113 observations:
row_ids     truth  response
---
343 Chinstrap    Gentoo
344 Chinstrap Chinstrap``````
``predictions$score(msr("classif.acc"))``
``````classif.acc
0.9380531 ``````

In this example, we trained a decision tree on a subset of the `penguins` dataset, made predictions on the rest of the data and then evaluated these with the accuracy measure. In Chapter 2 we will break this down in more detail.
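
To make the final step concrete, classification accuracy is the proportion of predictions that match the true labels. The sketch below computes it by hand in base R on two small made-up vectors; the names `truth` and `response` mirror the columns printed above, but the values are purely illustrative and are not taken from the model output.

```r
# Hypothetical labels for five penguins (illustrative values only)
truth = c("Adelie", "Adelie", "Gentoo", "Chinstrap", "Gentoo")
response = c("Adelie", "Chinstrap", "Gentoo", "Chinstrap", "Gentoo")

# Accuracy is the share of rows where prediction and truth agree
accuracy = mean(truth == response)
accuracy
```

Four of the five made-up predictions are correct, so this evaluates to 0.8.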

`mlr3` makes training and predicting easy, but it also allows us to perform very complex operations in just a few lines of code:

``````library(mlr3verse)
library(mlr3pipelines)
library(mlr3benchmark)

# two classification tasks to benchmark on
tasks = tsks(c("breast_cancer", "sonar"))
# random forest with the number of trees tuned by grid search
tuned_rf = auto_tuner(
  tnr("grid_search", resolution = 5),
  lrn("classif.ranger", num.trees = to_tune(200, 500)),
  rsmp("holdout")
)
tuned_rf = pipeline_robustify(NULL, tuned_rf, TRUE) %>>%
  po("learner", tuned_rf)
# stacked ensemble of a decision tree and k-nearest neighbours
stack_lrn = ppl(
  "stacking",
  base_learners = lrns(c("classif.rpart", "classif.kknn")),
  lrn("classif.log_reg"))
stack_lrn = pipeline_robustify(NULL, stack_lrn, TRUE) %>>%
  po("learner", stack_lrn)

learners = c(tuned_rf, stack_lrn)
bm = benchmark(benchmark_grid(tasks, learners, rsmp("holdout")))``````
``````bma = bm$aggregate(msr("classif.acc"))[, c("task_id", "learner_id",
  "classif.acc")]
bma$learner_id = rep(c("RF", "Stack"), 2)
bma``````
``````         task_id learner_id classif.acc
1: breast_cancer         RF   0.9605263
2: breast_cancer      Stack   0.9122807
3:         sonar         RF   0.7681159
4:         sonar      Stack   0.7101449``````
``as.BenchmarkAggr(bm)$friedman_test()``
``````
Friedman rank sum test

data:  ce and learner_id and task_id
Friedman chi-squared = 2, df = 1, p-value = 0.1573``````

In this (much more complex!) example we chose two tasks and two machine learning (ML) algorithms (“learners” in `mlr3` terms). We used automated tuning to optimize the number of trees in the random forest learner (Chapter 4) and an ML pipeline that imputes missing data, collapses factor levels, and creates stacked models (Chapter 6). We also showed basic features like loading learners (Chapter 2) and choosing resampling strategies for benchmarking (Chapter 3). Finally, we compared the performance of the models using the mean accuracy on the test set, and applied a statistical test to see if the learners performed significantly differently (they did not!).

You will learn how to do all this and more in this book. We will walk through the functionality offered by `mlr3` and the packages in the `mlr3verse` step by step. There are a few different ways you can use this book, which we will discuss next.

## How to use this book

The mlr3 ecosystem is the result of many years of methodological and applied research, together with continual improvements to the design and implementation of the packages. This book describes the resulting features of the `mlr3verse` and discusses best practices for ML, technical implementation details, extension guidelines, and in-depth considerations for optimizing ML. It is suitable for a wide range of readers and levels of ML expertise.

Chapter 1, Chapter 2, and Chapter 3 cover the basics of mlr3. These chapters are essential to understanding the core infrastructure of ML in mlr3. We recommend that all readers study these chapters to become familiar with basic mlr3 terminology, syntax, and style. Chapter 4, Chapter 5, and Chapter 6 contain more advanced implementation details and some ML theory. Chapter 8 delves into detail on domain-specific methods that are implemented in our extension packages. Readers may choose to selectively read sections in this chapter depending on their use cases (i.e., if they have domain-specific problems to tackle), or use them as introductions to new domains to explore. Chapter 9 contains technical implementation details that are essential reading for advanced users who require parallelisation, custom error handling, and fine control over hyperparameters and large databases. Chapter 10 discusses packages that can be integrated with mlr3 to provide model-agnostic interpretability methods. Finally, anyone who would like to contribute to our ecosystem should read Chapter 11.

Of course, you can also read the book from cover to cover. We have marked any section that contains complex technical information, and you may wish to skip these if you are only interested in basic functionality. Similarly, we have marked sections that are optional, such as parts that are more methodologically focused and do not discuss the software implementation. Readers who are interested in the more technical detail will likely want to pay attention to the tables at the end of each chapter that show the relationship between our S3 ‘sugar’ functions and the underlying R6 classes; this is explained in more detail in Chapter 1.
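
As a small preview of that relationship, the sketch below constructs the same decision tree learner in two ways: with the sugar function `lrn()` used earlier, and directly from its R6 class. It assumes `mlr3` is installed and that the class behind `"classif.rpart"` is exported as `LearnerClassifRpart`; treat it as an illustration only, not a substitute for the explanation in Chapter 1.

```r
library(mlr3)

# Sugar function: looks up the learner by its key
learner_sugar = lrn("classif.rpart")

# Underlying R6 class: construct the object directly
learner_r6 = LearnerClassifRpart$new()

# Both objects share the same class hierarchy
class(learner_sugar)
identical(class(learner_sugar), class(learner_r6))
```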

This book tries to follow the Diátaxis framework for documentation and so we include tutorials, how-to guides, API references, and explanations. This means that the conclusion of each chapter includes a short reference to the core functions learnt in the chapter, links to relevant posts in the mlr3gallery, and a few exercises that will cover content introduced in the chapter. You can find the solutions to these exercises in Appendix A.

Finally, if you want to reproduce any of the results in this book, note that we run `set.seed(<chapter number>)` at the start of each chapter, and that the output of `sessionInfo()` at the time of publication is printed in Appendix E.
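
The effect of seeding can be seen in a minimal base R sketch. The seed value `1` and the call to `sample()` are arbitrary choices for illustration; the point is only that resetting the same seed replays the same random numbers.

```r
# Seed with a chapter number (1 is used here purely as an example)
set.seed(1)
first_draw = sample(10)

# Resetting the same seed replays the same random numbers
set.seed(1)
second_draw = sample(10)

identical(first_draw, second_draw)

# Record the R version and loaded packages alongside any results
sessionInfo()
```

Exact agreement with published results can still depend on the R and package versions in use, which is why the session information is recorded as well.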

## Installation guidelines

All packages in the mlr3 ecosystem can be installed from GitHub and R-universe; the majority (but not all) packages can also be installed from CRAN. R-universe is an alternative package repository to CRAN. We recommend adding the mlr-org R-universe to your R options so that you can install all packages with `install.packages()` without having to worry about which package repository they come from. The code below tells R to look at both R-universe and CRAN when trying to install packages; R will always install the latest version of a package. To do this, install the `usethis` package and run the following:

``usethis::edit_r_profile()``

In the file that opens, add or change the `repos` argument in `options` so it looks something like this (you might need to add the full code block below or just edit the existing call to `options`).

``````options(repos = c(
mlrorg = "https://mlr-org.r-universe.dev",
CRAN = "https://cloud.r-project.org/"
))``````

Save the file, restart your R session, and you are ready to go!

``install.packages("mlr3verse")``
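
If installation cannot find a package, one quick check is whether the new repository is active in the current session. The sketch below only reads the `repos` option; it assumes the entry was named `mlrorg` as in the snippet above.

```r
# Inspect the repositories R will search, in order
repos = getOption("repos")
print(repos)

# TRUE once the mlr-org entry from the profile has been picked up
has_mlrorg = "mlrorg" %in% names(repos)
has_mlrorg
```

If this returns `FALSE`, the profile was probably not saved or the R session was not restarted.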

If you want the latest development versions of any of our packages, run

``remotes::install_github("mlr-org/{pkg}")``

with `{pkg}` replaced with the name of the package you want to install. You can see an up-to-date list of all our extension packages at https://github.com/mlr-org/mlr3/wiki/Extension-Packages.

## Citation info

Every package in the mlr3verse has its own citation details that can be found on the respective GitHub repository.

To reference this book please use:

``````Becker M, Binder M, Bischl B, Foss N, Kotthoff L, Lang M, Pfisterer F,
Reich N G, Richter J, Schratz P, Sonabend R, Pulatov D.
2023. "Preface". https://mlr3book.mlr-org.com.``````
``````@misc{mlr3book,
  title = {Preface},
  author = {Marc Becker and Martin Binder and Bernd Bischl and Natalie Foss and
    Lars Kotthoff and Michel Lang and Florian Pfisterer and Nicholas G. Reich and
    Jakob Richter and Patrick Schratz and Raphael Sonabend and Damir Pulatov},
  url = {https://mlr3book.mlr-org.com},
  year = {2023}
}``````

To reference the `mlr3` package, please cite our JOSS paper:

``````Lang M, Binder M, Richter J, Schratz P, Pfisterer F, Coors S, Au Q,
Casalicchio G, Kotthoff L, Bischl B (2019). “mlr3: A modern object-oriented
machine learning framework in R.” Journal of Open Source Software.
doi: 10.21105/joss.01903.

@Article{mlr3,
title = {{mlr3}: A modern object-oriented machine learning framework in {R}},
author = {Michel Lang and Martin Binder and Jakob Richter and Patrick Schratz and
Florian Pfisterer and Stefan Coors and Quay Au and Giuseppe Casalicchio and
Lars Kotthoff and Bernd Bischl},
journal = {Journal of Open Source Software},
year = {2019},
month = {dec},
doi = {10.21105/joss.01903},
url = {https://joss.theoj.org/papers/10.21105/joss.01903},
}``````

## mlr3book style guide

Throughout this book we will use our own style guide that can be found in the mlr3 wiki. Below are the most important style choices relevant to the book.

1. We always use `=` instead of `<-` for assignment.

2. Class names are in `UpperCamelCase`.

3. Function and method names are in `lower_snake_case`.

4. When referencing functions, we will only include the package prefix (e.g., `pkg::function`) for functions outside the mlr3 universe or when there may be ambiguity about which package the function lives in. Note you can use `environment(function)` to see which namespace a function is loaded from.

5. We denote packages, fields, methods, and functions as follows:

• `package` - With link (if online) to package CRAN, R-Universe, or GitHub page
• `package::function()` (for functions outside the mlr-org ecosystem)
• `function()` (for functions inside the mlr-org ecosystem) - With link to function documentation page
• `$field` for fields (data encapsulated in an R6 class)
• `$method()` for methods (functions encapsulated in an R6 class)
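
These conventions can be seen together in a short sketch. The class `PenguinCounter` and its members are hypothetical names invented for illustration; the example assumes the `R6` package is installed and is not taken from any mlr3 package.

```r
library(R6)

# Class names use UpperCamelCase
PenguinCounter = R6Class("PenguinCounter",
  public = list(
    # Field: data stored inside the object, accessed with $count
    count = 0,

    # Method: a function stored inside the object, called with $add_penguins()
    add_penguins = function(n = 1) {
      self$count = self$count + n
      invisible(self)
    }
  )
)

# Assignment uses `=`; function and method names use lower_snake_case
counter = PenguinCounter$new()
counter$add_penguins(3)
counter$count

# environment() shows which namespace a function is loaded from
sd_namespace = environmentName(environment(stats::sd))
sd_namespace
```

Here `counter$count` reads a field and `counter$add_penguins(3)` calls a method, which matches the `$field` and `$method()` notation used throughout the book.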