## 8.1 IML

Author: Shawn Storm

iml is an R package that interprets the behavior and explains the predictions of machine learning models. The functions provided in the iml package are model-agnostic, which gives the flexibility to use any machine learning model.

This chapter provides examples of how to use iml with mlr3. For more information, refer to the iml GitHub repository and the Interpretable Machine Learning book.

To understand what iml can offer, we start off with a thorough example. The goal of this example is to figure out the species of penguins given a set of features. The palmerpenguins::penguins data set will be used, which is an alternative to the iris data set. The penguins data set contains 8 variables of 344 penguins:

data("penguins", package = "palmerpenguins")
str(penguins)
## tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
##  $ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ island           : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ bill_length_mm   : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
##  $ bill_depth_mm    : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
##  $ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
##  $ body_mass_g      : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
##  $ sex              : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
##  $ year             : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...

To get started run:

library("iml")
library("mlr3")
library("mlr3learners")
set.seed(1)
penguins = na.omit(penguins)
task_peng = as_task_classif(penguins, target = "species")

penguins = na.omit(penguins) omits the 11 cases with missing values. If they are not omitted, the learner will throw an error for the data points that have NA for some features. An alternative that keeps those rows is sketched below.
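Instead of dropping incomplete rows, the missing values could also be imputed inside the learner. A minimal sketch, assuming mlr3pipelines is installed (the pipeline layout shown here is an illustration, not part of the original example):

# Sketch (assumes mlr3pipelines): impute missing values instead of dropping rows
library(mlr3pipelines)
graph = po("imputemedian") %>>%   # numeric features: fill NAs with the median
  po("imputemode") %>>%           # remaining (factor) features: fill NAs with the mode
  lrn("classif.ranger", predict_type = "prob")
learner_imputing = as_learner(graph)  # usable like any other mlr3 learner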

learner = lrn("classif.ranger")
learner$predict_type = "prob"
learner$train(task_peng)
learner$model
## Ranger result
##
## Call:
##  ranger::ranger(dependent.variable.name = task$target_names, data = task$data(),
##      probability = self$predict_type == "prob", case.weights = task$weights$weight,
##      num.threads = 1L)
##
## Type:                             Probability estimation
## Number of trees:                  500
## Sample size:                      333
## Number of independent variables:  7
## Mtry:                             2
## Target node size:                 10
## Variable importance mode:         none
## Splitrule:                        gini
## OOB prediction error (Brier s.):  0.0179
x = penguins[which(names(penguins) != "species")]
model = Predictor$new(learner, data = x, y = penguins$species)

As explained in Section 2.3, specific learners can be queried with mlr_learners. Section 2.5 recommends using predict_type = "prob" for some classifiers instead of directly predicting a label, which is what is done in this example. penguins[which(names(penguins) != "species")] selects all feature columns, and y is set to penguins$species. learner$train(task_peng) trains the model, and learner$model stores the model from the training command. Predictor holds the machine learning model and the data; all interpretation methods in iml need both to be wrapped in a Predictor object.
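Once wrapped, the Predictor can also produce predictions directly, which is a quick way to verify the setup (a brief aside, not part of the original workflow):

# Sanity check: predicted class probabilities for the first three penguins
model$predict(x[1:3, ])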

Next is the core functionality of iml. In this example, three separate interpretation methods will be used: FeatureEffects, Shapley, and FeatureImp.

### 8.1.2 FeatureEffects

In addition to the commands above, the following need to be run:

num_features = c("bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g", "year")
effect = FeatureEffects$new(model)
plot(effect, features = num_features)

effect stores the object from the FeatureEffects computation, and the results can then be plotted. In this example, all of the features provided by the penguins data set were used. All features except for year provide meaningful interpretable information. It should be clear why year does not provide anything of significance. bill_length_mm shows, for example, that when the bill length is smaller than roughly 40 mm, there is a high chance that the penguin is an Adelie.
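By default, FeatureEffects computes accumulated local effect (ALE) plots for every feature at once. To look at a single feature with a different method, the singular FeatureEffect class can be used; a brief sketch (the feature and the "pdp+ice" method are illustrative choices):

# Sketch: one feature in isolation, as partial dependence plus ICE curves
effect_bill = FeatureEffect$new(model, feature = "bill_length_mm", method = "pdp+ice")
effect_bill$plot()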
### 8.1.3 Shapley

x = penguins[which(names(penguins) != "species")]
model = Predictor$new(learner, data = penguins, y = "species")
x.interest = data.frame(penguins[1, ])
shapley = Shapley$new(model, x.interest = x.interest)
plot(shapley)

x.interest is the single observation to be explained, here the first penguin. The resulting plot shows how much each feature value contributes to the predicted probability of each class for this observation.
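Besides the plot, the computed Shapley values can be inspected directly through the object's $results field (shown here as a short aside):

# Per-feature, per-class contributions (phi) as a data.frame
head(shapley$results)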
### 8.1.4 FeatureImp

FeatureImp computes the importance of each feature by measuring how much the model's prediction error increases when the values of that feature are permuted; here the classification error ("ce") is used as the loss:

effect = FeatureImp$new(model, loss = "ce")
effect$plot(features = num_features)

FeatureImp shows the level of importance of the features when classifying the penguins. It is clear to see that bill_length_mm is of high importance, and one should concentrate on different boundaries of this feature when attempting to classify the three species.
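As with the other methods, the numeric values behind the plot are available in the object's $results field, e.g. to sort or filter them programmatically:

# Permutation importance estimates, one row per feature
effect$results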
### 8.1.5 Independent Test Data

It is also interesting to see how well the model performs on a test data set. For this section, exactly as recommended in Section 2.4, 80% of the penguin data set will be used for the training set and 20% for the test set:

train_set = sample(task_peng$nrow, 0.8 * task_peng$nrow)
test_set = setdiff(seq_len(task_peng$nrow), train_set)
learner$train(task_peng, row_ids = train_set)
prediction = learner$predict(task_peng, row_ids = test_set)
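To put a number on how well the retrained model does on the held-out rows, the prediction can be scored with a standard mlr3 measure (accuracy is one possible choice):

# Accuracy on the 20% test split
prediction$score(msr("classif.acc"))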

First, we compare the feature importance on the training and the test set.

# plot on training
model = Predictor$new(learner, data = penguins[train_set, ], y = "species")
effect = FeatureImp$new(model, loss = "ce")
plot_train = plot(effect, features = num_features)

# plot on test data
model = Predictor$new(learner, data = penguins[test_set, ], y = "species")
effect = FeatureImp$new(model, loss = "ce")
plot_test = plot(effect, features = num_features)

# combine into single plot
library(patchwork)
plot_train + plot_test

The FeatureImp results on the training set are very similar to those on the test set, which is expected. We follow a similar approach to compare the feature effects:

model = Predictor$new(learner, data = penguins[train_set, ], y = "species")
effect = FeatureEffects$new(model)
plot(effect, features = num_features)
model = Predictor$new(learner, data = penguins[test_set, ], y = "species")
effect = FeatureEffects$new(model)
plot(effect, features = num_features)

As is the case with FeatureImp, the test data results show either an over- or underestimate of feature importance / feature effects compared to the results on the entire penguin data set. This would be a good opportunity for the reader to attempt to resolve the estimation difference by varying the number of features and the amount of data used for both the test and train sets of FeatureImp and FeatureEffects. Be sure not to rerun the line train_set = sample(task_peng$nrow, 0.8 * task_peng$nrow), as it would randomly resample the data.
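If the split ever does need to be regenerated, fixing the seed immediately before the sampling call keeps it reproducible (mirroring the set.seed(1) used at the start of this example):

# Reproducible split: the same seed yields the same train/test rows every time
set.seed(1)
train_set = sample(task_peng$nrow, 0.8 * task_peng$nrow)
test_set = setdiff(seq_len(task_peng$nrow), train_set)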