2.3 Learners

Objects of class mlr3::Learner provide a unified interface to many popular machine learning algorithms in R. They consist of methods to train and predict a model for a mlr3::Task and provide meta-information about the learners, such as the hyperparameters you can set.
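As a minimal sketch of the train and predict methods mentioned above: the example assumes the "iris" task from the mlr_tasks dictionary and the rpart decision tree learner, and it predicts on the training data purely for illustration, not as a sound evaluation.

```r
library(mlr3)

# Assumption: the built-in "iris" task and the rpart classification tree.
task = mlr_tasks$get("iris")
learner = mlr_learners$get("classif.rpart")

# Fit the model; the fitted object is stored in learner$model.
learner$train(task)

# Predict on the same task (illustration only).
prediction = learner$predict(task)
head(prediction$response)
```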

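The mlr3 package itself ships with only a minimal set of classification and regression learners, such as the featureless baselines (classif.featureless, regr.featureless) and rpart decision trees (classif.rpart, regr.rpart), to avoid pulling in many dependencies.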

Some of the most popular learners are connected via the mlr3learners package:

  • (penalized) linear and logistic regression
  • k-Nearest Neighbors regression and classification
  • Linear and Quadratic Discriminant Analysis
  • Naive Bayes
  • Support-Vector machines
  • Gradient Boosting
  • Random Regression Forests and Random Classification Forests
  • Kriging

More learners are collected on GitHub in the mlr3learners organization. Their status is also listed on the wiki of the mlr3learners repository.

Every learner inherits from the base class Learner, which is specialized for regression as LearnerRegr and for classification as LearnerClassif. In contrast to a Task, creating a custom Learner is rarely necessary and is a more advanced topic. We therefore refer the reader to Section 6.1 and proceed with an overview of the interface of the learners that are already implemented.
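The class hierarchy can be checked directly on learner objects. The following is a small sketch that assumes the two rpart learners shipped with mlr3; the variable names are only illustrative.

```r
library(mlr3)

# Illustrative objects: one classification and one regression learner.
classif_learner = mlr_learners$get("classif.rpart")
regr_learner = mlr_learners$get("regr.rpart")

# The class vector lists the concrete class first, then its parents.
class(classif_learner)

# Both learners share the base class, but differ in their specialization.
inherits(classif_learner, "Learner")
inherits(classif_learner, "LearnerClassif")
inherits(regr_learner, "LearnerRegr")
```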

2.3.1 Predefined Learners

Similar to mlr_tasks, the Dictionary mlr_learners can be queried for available learners:

library(mlr3learners)
mlr_learners
## <DictionaryLearner> with 21 stored values
## Keys: classif.debug, classif.featureless, classif.glmnet, classif.kknn,
##   classif.lda, classif.log_reg, classif.naive_bayes, classif.qda,
##   classif.ranger, classif.rpart, classif.svm, classif.xgboost,
##   regr.featureless, regr.glmnet, regr.kknn, regr.km, regr.lm,
##   regr.ranger, regr.rpart, regr.svm, regr.xgboost
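Besides printing the dictionary, its keys can be queried programmatically. The sketch below loads only mlr3, so it sees just the learners shipped with that package; loading mlr3learners as above adds more keys.

```r
library(mlr3)

# All keys currently registered in the dictionary.
all_keys = mlr_learners$keys()

# Keys matching a regular expression, here all classification learners.
classif_keys = mlr_learners$keys("^classif")

# Check whether a specific learner is available.
mlr_learners$has("classif.rpart")
```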

Each learner has the following information:

  • feature_types: the type of features the learner can deal with.
  • packages: the packages required to train a model with this learner and make predictions.
  • properties: additional properties and capabilities. For example, a learner has the property “missings” if it can handle missing feature values, and “importance” if it computes the relative importance of the features and allows extracting it. A complete list of these is available in the mlr3 reference on regression learners and classification learners.
  • predict_types: possible prediction types. For example, a classification learner can predict labels (“response”) or probabilities (“prob”). For a complete list of possible predict types see the mlr3 reference.
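Each of the fields listed above can be read directly from a learner object. A short sketch, using the rpart classification learner as an example:

```r
library(mlr3)

# Retrieve the decision tree learner from the dictionary.
learner = mlr_learners$get("classif.rpart")

learner$feature_types  # feature types the learner can deal with
learner$packages       # packages required for training and prediction
learner$properties     # capabilities such as "missings" or "importance"
learner$predict_types  # prediction types the learner supports

# Check a single capability before relying on it.
"missings" %in% learner$properties
```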

For a tabular overview of integrated learners, see Section 10.1.

You can retrieve a specific learner by its id, which is listed among the keys of the dictionary:

learner = mlr_learners$get("classif.rpart")
print(learner)
## <LearnerClassifRpart:classif.rpart>
## * Model: -
## * Parameters: xval=0
## * Packages: rpart
## * Predict Type: response
## * Feature types: logical, integer, numeric, factor, ordered
## * Properties: importance, missings, multiclass, selected_features,
##   twoclass, weights

The field param_set stores a description of the hyperparameters the learner has, their ranges, defaults, and current values:

learner$param_set
## ParamSet: 
##              id    class lower upper levels default value
## 1:     minsplit ParamInt     1   Inf             20      
## 2:           cp ParamDbl     0     1           0.01      
## 3:   maxcompete ParamInt     0   Inf              4      
## 4: maxsurrogate ParamInt     0   Inf              5      
## 5:     maxdepth ParamInt     1    30             30      
## 6:         xval ParamInt     0   Inf             10     0
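The information in this printout can also be accessed programmatically. The sketch below assumes the ids() method and the default field of the ParamSet class from the paradox package; the exact set of hyperparameters may differ between package versions.

```r
library(mlr3)

learner = mlr_learners$get("classif.rpart")
ps = learner$param_set

# Names of all hyperparameters the learner knows about.
ps$ids()

# Named list of default values, e.g. the default of cp.
ps$default$cp

# Named list of the values that are currently set.
ps$values
```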

The set of current hyperparameter values is stored in the values field of the param_set field. You can change the current hyperparameter values by assigning a named list to this field:

learner$param_set$values = list(cp = 0.01, xval = 0)
learner
## <LearnerClassifRpart:classif.rpart>
## * Model: -
## * Parameters: cp=0.01, xval=0
## * Packages: rpart
## * Predict Type: response
## * Feature types: logical, integer, numeric, factor, ordered
## * Properties: importance, missings, multiclass, selected_features,
##   twoclass, weights

Note that this assignment overwrites all previously set hyperparameter values. If you only want to add or update individual hyperparameters, you can use mlr3misc::insert_named():

learner$param_set$values = mlr3misc::insert_named(
  learner$param_set$values,
  list(cp = 0.02, minsplit = 2)
)
learner
## <LearnerClassifRpart:classif.rpart>
## * Model: -
## * Parameters: cp=0.02, xval=0, minsplit=2
## * Packages: rpart
## * Predict Type: response
## * Feature types: logical, integer, numeric, factor, ordered
## * Properties: importance, missings, multiclass, selected_features,
##   twoclass, weights

This updates cp to 0.02, sets minsplit to 2 and keeps the previously set parameter xval.
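To see what insert_named() does in isolation, it can be applied to plain named lists. The sketch below also shows assignment of a single element of the values list as a possible alternative; the choice of maxdepth = 5L is purely illustrative.

```r
library(mlr3)
library(mlr3misc)

# On plain lists: existing names are overwritten, new names are added.
old = list(cp = 0.01, xval = 0)
new = insert_named(old, list(cp = 0.02, minsplit = 2))
str(new)

# The same update applied to a learner.
learner = mlr_learners$get("classif.rpart")
learner$param_set$values = insert_named(
  learner$param_set$values,
  list(cp = 0.02, minsplit = 2)
)

# Alternative for a single hyperparameter: assign one list element.
learner$param_set$values$maxdepth = 5L
learner$param_set$values
```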

Again, there is a shorter alternative to writing out the lengthy mlr_learners$get() call: the function lrn(). It additionally lets you construct a learner with specific hyperparameters or a different identifier in one go:

lrn("classif.rpart", id = "rp", cp = 0.001)
## <LearnerClassifRpart:rp>
## * Model: -
## * Parameters: xval=0, cp=0.001
## * Packages: rpart
## * Predict Type: response
## * Feature types: logical, integer, numeric, factor, ordered
## * Properties: importance, missings, multiclass, selected_features,
##   twoclass, weights

If you pass hyperparameters here, they are added to the learner's default parameter values in the same way insert_named() would add them.
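The effect can be verified by inspecting the constructed learner. The sketch below additionally assumes that lrn() accepts learner fields such as predict_type alongside hyperparameters, which is how the function is commonly used.

```r
library(mlr3)

# Construct a learner with a custom id and one hyperparameter.
learner = lrn("classif.rpart", id = "rp", cp = 0.001)
learner$id

# cp was added, while the initially set value of xval is kept.
learner$param_set$values

# Assumption: fields such as predict_type can be set in the same call.
learner_prob = lrn("classif.rpart", predict_type = "prob")
learner_prob$predict_type
```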

For further information on how to customize learners using mlr3, see the section on extending learners.