## 8.1 IML

*Author: Shawn Storm*

iml is an R package that interprets the behavior and explains predictions of machine learning models. The functions provided in the iml package are model-agnostic which gives the flexibility to use any machine learning model.

This chapter provides examples of how to use `iml`

with `mlr3`

. For more information refer to the IML github and the IML book

### 8.1.1 Penguin Task

To understand what `iml`

can offer, we start off with a thorough example. The goal of this example is to figure out the species of penguins given a set of features. The `mlr3data::penguins`

data set from package mlr3data will be used which is an alternative to the `iris`

data set.
The `penguins`

data sets contains 8 variables of 344 penguins:

```
## 'data.frame': 344 obs. of 8 variables:
## $ species : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ island : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ bill_length : num 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
## $ bill_depth : num 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
## $ flipper_length: int 181 186 195 NA 193 190 181 195 193 190 ...
## $ body_mass : int 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
## $ sex : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
## $ year : int 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
```

To get started run:

```
penguins = na.omit(penguins)
task_peng = TaskClassif$new(id = "Species", backend = penguins, target = "species")
```

`penguins = na.omit(penguins)`

is to omit the 11 cases with missing values.
If not omitted, there will be an error when running the learner from the data points that have N/A for some features.

```
learner = mlr_learners$get("classif.ranger")
learner$predict_type = "prob"
learner$train(task_peng)
learner$model
```

```
## Ranger result
##
## Call:
## ranger::ranger(dependent.variable.name = task$target_names, data = task$data(), probability = self$predict_type == "prob", case.weights = task$weights$weight)
##
## Type: Probability estimation
## Number of trees: 500
## Sample size: 333
## Number of independent variables: 7
## Mtry: 2
## Target node size: 10
## Variable importance mode: none
## Splitrule: gini
## OOB prediction error (Brier s.): 0.0179
```

```
x = penguins[which(names(penguins) != "species")]
model = Predictor$new(learner, data = x, y = penguins$species)
```

As explained in Section 2.3, specific learners can be queried with `mlr_learners`

.
In Section 2.5 it is recommended for some classifiers to use the `predict_type`

as `prob`

instead of directly predicting a label.
This is what is done in this example.
`penguins[which(names(penguins) != "species")]`

is the data of all the features and `y`

will be the penguins`species`

.
`learner$train(task_peng)`

trains the model and `learner$model`

stores the model from the training command.
`Predictor`

holds the machine learning model and the data.
All interpretation methods in `iml`

need the machine learning model and the data to be wrapped in the `Predictor`

object.

Next is the core functionality of `iml`

. In this example three separate interpretation methods will be used: FeatureEffects, FeatureImp and Shapley

`FeatureEffects`

computes the effects for all given features on the model prediction. Different methods are implemented: Accumulated Local Effect (ALE) plots, Partial Dependence Plots (PDPs) and Individual Conditional Expectation (ICE) curves.`Shapley`

computes feature contributions for single predictions with the Shapley value – an approach from cooperative game theory (Shapley Value).`FeatureImp`

computes the importance of features by calculating the increase in the model’s prediction error after permuting the feature (more here).

### 8.1.2 FeatureEffects

In addition to the commands above the following two need to be ran:

```
effect = FeatureEffects$new(model)
plot(effect, features = c("bill_length", "bill_depth", "flipper_length", "body_mass", "body_mass", "year"))
```

`effect`

stores the object from the `FeatureEffect`

computation and the results can then be plotted. In this example, all of the features provided by the `penguins`

data set were used.

All features except for `year`

provide meaningful interpretable information. It should be clear why `year`

doesn’t provide anything of significance. `bill_length`

shows for example that when the bill length is smaller than roughly 40mm, there is a high chance that the penguin is an Adelie.

### 8.1.3 Shapley

```
x = penguins[which(names(penguins) != "species")]
model = Predictor$new(learner, data = penguins, y = "species")
x.interest = data.frame(penguins[1, ])
shapley = Shapley$new(model, x.interest = x.interest)
plot(shapley)
```

The \(\phi\) provides insight into the probability given the values on the vertical axis. For example, a penguin is less likely to be Gentoo if the bill_depth=18.7 is and much more likely to be Adelie than Chinstrap.

### 8.1.4 FeatureImp

```
effect = FeatureImp$new(model, loss = "ce")
effect$plot(features = c("bill_length","bill_depth", "flipper_length", "body_mass", "body_mass","year"))
```

`FeatureImp`

shows the level of importance of the features when classifying the penguins. It is clear to see that the `bill_length`

is of high importance and one should concentrate on different boundaries of this feature when attempting to classify the three species.

### 8.1.5 Independent Test Data

It is also interesting to see how well the model performs on a test data set. For this section, exactly as was recommended in Section 2.4, 80% of the penguin data set will be used for the training set and 20% for the test set:

```
train_set = sample(task_peng$nrow, 0.8 * task_peng$nrow)
test_set = setdiff(seq_len(task_peng$nrow), train_set)
learner$train(task_peng, row_ids = train_set)
prediction = learner$predict(task_peng, row_ids = test_set)
```

First, we compare the feature importance on training and test set

```
# plot on training
model = Predictor$new(learner, data = penguins[train_set,], y = "species")
effect = FeatureImp$new(model, loss = "ce" )
plot_train = plot(effect, features = c("bill_length", "bill_depth", "flipper_length", "body_mass", "body_mass", "year"))
# plot on test data
model = Predictor$new(learner, data = penguins[test_set, ], y = "species")
effect = FeatureImp$new(model, loss = "ce" )
plot_test = plot(effect, features = c("bill_length", "bill_depth", "flipper_length", "body_mass", "body_mass", "year"))
# combine into single plot
library(patchwork)
plot_train + plot_test
```

The results of the train set for `FeatureImp`

are very similar, which is expected.
We follow a similar approach to compare the feature effects:

```
model = Predictor$new(learner, data = penguins[train_set,], y = "species")
effect = FeatureEffects$new(model)
plot(effect, features = c("bill_length", "bill_depth", "flipper_length", "body_mass", "body_mass", "year"))
```

```
model = Predictor$new(learner, data = penguins[test_set,], y = "species")
effect = FeatureEffects$new(model)
plot(effect, features = c("bill_length", "bill_depth", "flipper_length", "body_mass", "body_mass", "year"))
```

As is the case with `FeatureImp`

, the test data results show either an over- or underestimate of feature importance / feature effects compared to the results where the entire penguin data set was used.
This would be a good opportunity for the reader to attempt to resolve the estimation by playing with the amount of features and the amount of data used for both the test and train data sets of `FeatureImp`

and `FeatureEffects`

.
Be sure to not change the line `train_set = sample(task_peng$nrow, 0.8 * task_peng$nrow)`

as it will randomly sample the data again.