Applied Machine Learning Using mlr3 in R

Getting Started

Editors

Bernd Bischl, Raphael Sonabend, Lars Kotthoff, Michel Lang

Contributors

  • Marc Becker
  • Przemysław Biecek
  • Martin Binder
  • Bernd Bischl
  • Lukas Burk
  • Giuseppe Casalicchio
  • Susanne Dandl
  • Sebastian Fischer
  • Natalie Foss
  • Lars Kotthoff
  • Michel Lang
  • Florian Pfisterer
  • Damir Pulatov
  • Lennart Schneider
  • Patrick Schratz
  • Raphael Sonabend
  • Janek Thomas
  • Marvin N. Wright

Welcome to the Machine Learning in R universe. This is the online version of the print book Applied Machine Learning Using mlr3 in R, published by CRC Press. You can buy a copy of the book here; all profits from the book go to the mlr organisation to support future maintenance and development of the mlr universe. This book will teach you about the mlr3 universe of packages, from machine learning methodology to implementations of complex algorithmic pipelines.

We hope you enjoy reading our book and always welcome comments and feedback. If you notice any mistakes, we would appreciate it if you could open an issue in the mlr3book issue tracker.

Licensing

Code chunks in this book are licensed under MIT and all figures generated by code chunks are licensed under CC BY, which means you can copy, adapt, and redistribute this material in any way that you like as long as you reference this book (see citation information just below).

All other content (text, tables, figures not generated from code chunks, etc.) is licensed under CC BY-NC-SA 4.0, which means you can copy, redistribute, and adapt the material however you like, as long as you reference the book (see citation information below), you do not use any material for commercial purposes, and you distribute any adapted material under a CC BY-NC-SA 4.0 compatible license.

If you have any questions about licensing just open an issue and we will help you out.

Citation Information

Citation details of packages in the mlr3 ecosystem can be found in their respective GitHub repositories.
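For packages that are already installed, the same details can also be printed directly from an R session with the base citation() function; mlr3 below is just an example package name:

    # Print citation details for an installed package from the mlr3 ecosystem
    citation("mlr3")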

When citing this book, please cite chapters directly; citation details can be found at the end of each chapter. If you need to reference the full book, please use:

Bischl, B., Sonabend, R., Kotthoff, L., & Lang, M. (Eds.). (2024).
"Applied Machine Learning Using mlr3 in R". CRC Press. https://mlr3book.mlr-org.com

@book{Bischl2024,
    title = {Applied Machine Learning Using {m}lr3 in {R}},
    editor = {Bernd Bischl and Raphael Sonabend and Lars Kotthoff and Michel Lang},
    url = {https://mlr3book.mlr-org.com},
    year = {2024},
    isbn = {9781032507545},
    publisher = {CRC Press}
}

Overview

The mlr3 ecosystem is the result of many years of methodological and applied research. This book describes the resulting features and discusses best practices for ML, technical implementation details, and in-depth considerations for model optimization. It may be helpful both for practitioners who want to quickly apply machine learning (ML) algorithms and for researchers who want to implement, benchmark, and compare their new methods in a structured environment. While we hope this book is accessible to a wide range of readers and levels of ML expertise, we do assume that readers have taken at least an introductory ML course or have the equivalent expertise and some basic experience with R. A background in computer science or statistics is beneficial for understanding the advanced functionality described in the later chapters of this book, but not required. For a comprehensive introduction to ML for those new to the field, see James et al. (2014); for an introduction to data science in R, see Wickham and Grolemund (2017).

The book is split into the following four parts:

Part I: Fundamentals
In this part of the book, we will teach you the fundamentals of mlr3. This will give you a flavor of the building blocks of the mlr3 universe and the basic tools you will need to tackle most machine learning problems. We recommend that all readers study these chapters to become familiar with mlr3 terminology, syntax, and style. In Chapter 2, Data and Basic Modeling, we will cover the basic classes in mlr3, including Learner (machine learning implementations), Measure (performance metrics), and Task (machine learning task definitions). Chapter 3, Evaluation and Benchmarking, will take evaluation a step further to include discussions about resampling – robust strategies for measuring model performance – and benchmarking – experiments for comparing multiple models.
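As a minimal sketch of what these building blocks look like in code (the penguins task and the rpart-based tree learner are illustrative examples and assume the mlr3 and rpart packages are installed):

    library(mlr3)

    task = tsk("penguins")              # Task: predict penguin species
    learner = lrn("classif.rpart")      # Learner: a classification tree
    measure = msr("classif.acc")        # Measure: classification accuracy

    learner$train(task)                 # fit the model
    prediction = learner$predict(task)  # predict on the training data
    prediction$score(measure)           # evaluate the prediction

    # Resampling (covered in Chapter 3) follows the same pattern
    rr = resample(task, learner, rsmp("cv", folds = 3))
    rr$aggregate(measure)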

Part II: Tuning and Feature Selection
In this part of the book, we will look at more advanced methodology that is essential to developing powerful ML models with good predictive ability. Chapter 4, Hyperparameter Optimization, introduces hyperparameter optimization, the process of tuning model hyperparameters to obtain better model performance. Tuning is implemented via the mlr3tuning package, which also includes methods for automating complex tuning processes, including nested resampling. The performance of ML models can be improved not only by tuning hyperparameters but also by carefully selecting features. Chapter 6, Feature Selection, introduces feature selection with filters and wrappers implemented in mlr3filters and mlr3fselect. For readers interested in taking a deep dive into tuning, Chapter 5, Advanced Tuning Methods and Black Box Optimization, discusses advanced tuning methods including error handling, multi-objective tuning, and tuning with Hyperband and Bayesian optimization methods.
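As a brief illustration of the tuning interface, the sketch below marks a single hyperparameter of a classification tree for tuning and wraps the learner in an AutoTuner; the learner, search space, and evaluation budget are illustrative choices and assume the mlr3, mlr3tuning, and rpart packages are installed:

    library(mlr3)
    library(mlr3tuning)

    # Mark the complexity parameter of a classification tree for tuning
    learner = lrn("classif.rpart", cp = to_tune(0.001, 0.1))

    # An AutoTuner tunes the learner via cross-validation during training
    at = auto_tuner(
      tuner = tnr("random_search"),
      learner = learner,
      resampling = rsmp("cv", folds = 3),
      measure = msr("classif.ce"),
      term_evals = 20                   # illustrative evaluation budget
    )

    at$train(tsk("penguins"))
    at$tuning_result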

Part III: Pipelines and Preprocessing
In Part III we introduce mlr3pipelines, which allows users to implement complex ML workflows easily. In Chapter 7, Sequential Pipelines, we will show you how to build a pipeline out of discrete, configurable operations and how to treat complex pipelines as if they were any other machine learning model. In Chapter 8, Non-sequential Pipelines and Tuning, we will build on the previous chapter by introducing non-sequential pipelines, which can have multiple branches that carry out operations concurrently. We will also demonstrate how to tune pipelines, including how to tune which operations should be included in the pipeline. Finally, in Chapter 9, Preprocessing, we will put pipelines into practice by demonstrating how to solve common problems that occur when fitting ML models to messy data.
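As a small taste of the pipeline syntax, the sketch below chains two preprocessing operations with a learner and treats the result as a single model; the chosen operations and task are illustrative and assume the mlr3, mlr3pipelines, and rpart packages are installed:

    library(mlr3)
    library(mlr3pipelines)

    # Scale features, project onto principal components, then fit a tree
    graph = po("scale") %>>%
      po("pca") %>>%
      lrn("classif.rpart")

    # The whole pipeline behaves like any other learner
    graph_learner = as_learner(graph)
    graph_learner$train(tsk("sonar"))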

Part IV: Advanced Topics
In the final part of the book, we will look at advanced methodology and technical details. This part of the book is more theory-heavy in some sections to help ground the design and implementation decisions. We will begin by looking at the advanced technical details covered in Chapter 10, Advanced Technical Aspects of mlr3, which are essential reading for advanced users who require parallelization, custom error handling, or large databases. Chapter 11, Large-Scale Benchmarking, will build on all preceding chapters to introduce large-scale benchmarking experiments that compare many models, tasks, and measures, including how to make use of mlr3 extension packages for loading data, using high-performance computing clusters, and formally analyzing benchmark experiments with statistical methods. Chapter 12, Model Interpretation, will discuss different packages that are compatible with mlr3 and provide model-agnostic interpretability for feature importance and local explainability of individual predictions. Chapter 13, Beyond Regression and Classification, will then delve into domain-specific methods implemented in our extension packages, including survival analysis, density estimation, spatio-temporal analysis, and more. Readers may choose to selectively read sections in this chapter depending on their use case (i.e., if they have domain-specific problems to tackle), or to use these sections as introductions to new domains to explore. Finally, Chapter 14, Algorithmic Fairness, will introduce algorithmic fairness, which includes specialized measures and methods to identify and reduce algorithmic biases.
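To give a flavor of how benchmark experiments (Chapter 11) are set up, the short sketch below compares two learners on two tasks; the tasks, learners, and resampling strategy are illustrative and assume the mlr3 and rpart packages are installed:

    library(mlr3)

    # Optionally parallelize via the future framework (Chapter 10), e.g.:
    # future::plan("multisession")

    design = benchmark_grid(
      tasks = tsks(c("penguins", "sonar")),
      learners = lrns(c("classif.rpart", "classif.featureless")),
      resamplings = rsmp("cv", folds = 3)
    )

    bmr = benchmark(design)
    bmr$aggregate(msr("classif.acc"))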

Acknowledgments

As well as the editors and contributing authors, many others have contributed to this book. We would like to acknowledge Stefan Coors for creating many of the images in the book, as well as Daniel Saggau, Jakob Richter, and Marvin Böcker for contributions to earlier drafts of the book. We would also like to acknowledge the following organisations that supported various contributors: Munich Center for Machine Learning (MCML), National Science Foundation (NSF), and Mathematical Research Data Initiative (MaRDI).