2 Introduction and Overview

The mlr3 package and ecosystem provides a generic, object-oriented, and extensible framework for classification, regression, survival analysis, and other machine learning tasks for the R language. It does not implement any learners itself, but provides a unified interface to many existing learners in R. This unified interface allows to provide functionality to extend and combine existing learners, intelligently select and tune the most appropriate technique for a task, and perform large-scale comparisons that enable meta-learning. Examples of this advanced functionality include hyperparameter tuning, feature selection, and ensemble construction. Parallelization of many operations is natively supported.

Target Audience

mlr3 provides a domain-specific language for machine learning in R. We target both practitioners who want to quickly apply machine learning algorithms and researchers who want to implement, benchmark, and compare their new methods in a structured environment. The package is a complete rewrite of an earlier version of mlr that leverages many years of experience to provide a state-of-the-art system that is easy to use and extend. It is intended for users who have basic knowledge in machine learning and R and who are interested in complex projects and advanced functionality rather than one-liners for one specific thing.

Why a Rewrite?

mlr was first released to CRAN in 2013, with the core design and architecture dating back much further. Over time, the addition of many features has led to a considerably more complex design that made it harder to build, maintain, and extend than we had hoped for. With hindsight, we saw that some of the design and architecture changes in mlr made it difficult to support new features, in particular with respect to pipelines. Furthermore, the R ecosystem as well as helpful packages such as data.table have undergone major changes in the meantime. It would have been nearly impossible to integrate all these changes into the original design of mlr. Instead, we decided to start working on a reimplementation in 2018, which resulted in the first release of mlr3 on CRAN in July 2019. The new design and the integration of further and newly developed R packages (R6, future, data.table) makes mlr3 much easier to maintain and use, and more performant compared to mlr.

Design Principles

We follow the general design principles below in the implementation of the mlr3 package and ecosystem.

  • Backend over frontend. Most packages of the mlr3 ecosystem focus on processing and transforming data, applying machine learning algorithms, and computing results. We do not provide graphical user interfaces (GUIs) and visualizations of data and results is provided in extra packages.
  • Embrace R6 for a clean object-oriented design, object state-changes, and reference semantics.
  • Embrace data.table for fast and convenient data frame computations.
  • Unify container and result classes as much as possible and provide result data in data.tables. This considerably simplifies the API and allows easy selection and “split-apply-combine” (aggregation) operations. We combine data.table and R6 to place references to non-atomic and compound objects in tables and make heavy use of list columns.
  • Be light on dependencies. One of the main maintenance burdens for mlr was to keep up with changing learner interfaces and behaviour of the many packages it depended on. We require far fewer packages in mlr3 to make installation and maintenance easier.

mlr3 requires the following packages:

  • backports: Ensures backward compatibility with older R releases. Developed by members of the mlr team. No recursive dependencies.
  • checkmate: Fast argument checks. Developed by members of the mlr team. No extra recursive dependencies.
  • mlr3misc: Miscellaneous functions used in multiple mlr3 extension packages. Developed by the mlr team. No extra recursive dependencies.
  • paradox: Descriptions for parameters and parameter sets. Developed by the mlr team. No extra recursive dependencies.
  • R6: Reference class objects. No recursive dependencies.
  • data.table: Extension of R’s data.frame. No recursive dependencies.
  • digest: Hash digests. No recursive dependencies.
  • uuid: Create unique string identifiers. No recursive dependencies.
  • lgr: Logging facility. No extra recursive dependencies.
  • Metrics: Package which implements performance measures. No recursive dependencies.
  • mlbench: A collection of machine learning data sets. No dependencies.

mlr3 provides additional functionality through extra packages: