mlr3pipelines (Binder et al. 2021) is a dataflow programming toolkit. This chapter focuses on the user’s side of the package. A more in-depth and technically oriented guide can be found in the “In-depth look into mlr3pipelines” chapter.

Machine learning workflows can be written as directed “Graphs”/“Pipelines” that represent data flows between preprocessing, model fitting, and ensemble learning units in an expressive and intuitive language. We will most often use the term “Graph” in this manual, but it can be used interchangeably with “pipeline” or “workflow”.

Below you can examine an example of such a graph:
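In code, a small graph of this kind might look as follows. This is a minimal sketch rather than the chapter's own example: the choice of scaling, PCA, and a decision tree, as well as the `iris` task, are assumptions made purely for illustration, and the `rpart` package is assumed to be installed.

```r
library(mlr3)
library(mlr3pipelines)

# Chain three processing units with %>>%: scale the features, project them
# onto principal components, then fit a decision tree (illustrative choice).
graph = po("scale") %>>%
  po("pca") %>>%
  po("learner", lrn("classif.rpart"))

# Wrap the Graph so that it can be used like any other mlr3 learner.
glrn = as_learner(graph)

task = tsk("iris")
glrn$train(task)
prediction = glrn$predict(task)

# Accuracy on the training data; only meant to show that the pipeline runs.
prediction$score(msr("classif.acc"))
```

Data flows through the units from left to right, so the tree is fitted on the scaled and rotated features rather than on the raw columns.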

Single computational steps can be represented as so-called PipeOps, which can then be connected with directed edges in a Graph. The scope of mlr3pipelines is still growing. Currently supported features are:

Additionally, we implement several meta operators that can be used to construct powerful pipelines:

An extensive introduction to creating custom PipeOps (POs) can be found in the technical introduction.

Using methods from mlr3tuning, it is even possible to simultaneously optimize parameters of multiple processing units.
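Joint optimization is possible because a Graph exposes the parameters of all its units in a single parameter set. The sketch below only inspects and sets such parameters by hand; it does not call mlr3tuning. The pipeline itself (scaling, PCA, decision tree) and the chosen values are assumptions for illustration.

```r
library(mlr3)
library(mlr3pipelines)

# Illustrative pipeline; any other combination of PipeOps would work the same way.
graph = po("scale") %>>% po("pca") %>>% po("learner", lrn("classif.rpart"))

# One joint parameter set: each parameter id is prefixed with the id of the
# PipeOp it belongs to, for example "pca.rank." or "classif.rpart.cp".
graph$param_set$ids()

# Set parameters of two different processing units through the same interface.
graph$param_set$values$pca.rank. = 2
graph$param_set$values$classif.rpart.cp = 0.05
```

A tuner can search over these prefixed parameters in the same way as over the parameters of a single learner.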

A predecessor to this package is the mlrCPO package, which works with mlr 2.x. Other packages that provide, to varying degrees, preprocessing functionality or a domain-specific language for machine learning are:

An example of a pipeline that can be constructed using mlr3pipelines is depicted below: