Author: Shawn Storm
iml is an R package that interprets the behavior and explains predictions of machine learning models. The functions provided in the iml package are model-agnostic which gives the flexibility to use any machine learning model.
8.1.1 Penguin Task
To understand what
iml can offer, we start off with a thorough example. The goal of this example is to figure out the species of penguins given a set of features. The
mlr3data::penguins data set from package mlr3data will be used which is an alternative to the
iris data set.
penguins data sets contains 8 variables of 344 penguins:
## 'data.frame': 344 obs. of 8 variables: ## $ species : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ... ## $ island : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ... ## $ bill_length : num 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ... ## $ bill_depth : num 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ... ## $ flipper_length: int 181 186 195 NA 193 190 181 195 193 190 ... ## $ body_mass : int 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ... ## $ sex : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ... ## $ year : int 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
To get started run:
penguins = na.omit(penguins) is to omit the 11 cases with missing values.
If not omitted, there will be an error when running the learner from the data points that have N/A for some features.
## Ranger result ## ## Call: ## ranger::ranger(dependent.variable.name = task$target_names, data = task$data(), probability = self$predict_type == "prob", case.weights = task$weights$weight) ## ## Type: Probability estimation ## Number of trees: 500 ## Sample size: 333 ## Number of independent variables: 7 ## Mtry: 2 ## Target node size: 10 ## Variable importance mode: none ## Splitrule: gini ## OOB prediction error (Brier s.): 0.0179
As explained in Section 2.3, specific learners can be queried with
In Section 2.5 it is recommended for some classifiers to use the
prob instead of directly predicting a label.
This is what is done in this example.
penguins[which(names(penguins) != "species")] is the data of all the features and
y will be the penguins
learner$train(task_peng) trains the model and
learner$model stores the model from the training command.
Predictor holds the machine learning model and the data.
All interpretation methods in
iml need the machine learning model and the data to be wrapped in the
FeatureEffectscomputes the effects for all given features on the model prediction. Different methods are implemented: Accumulated Local Effect (ALE) plots, Partial Dependence Plots (PDPs) and Individual Conditional Expectation (ICE) curves.
Shapleycomputes feature contributions for single predictions with the Shapley value – an approach from cooperative game theory (Shapley Value).
FeatureImpcomputes the importance of features by calculating the increase in the model’s prediction error after permuting the feature (more here).
In addition to the commands above the following two need to be ran:
effect stores the object from the
FeatureEffect computation and the results can then be plotted. In this example, all of the features provided by the
penguins data set were used.
All features except for
year provide meaningful interpretable information. It should be clear why
year doesn’t provide anything of significance.
bill_length shows for example that when the bill length is smaller than roughly 40mm, there is a high chance that the penguin is an Adelie.
The \(\phi\) provides insight into the probability given the values on the vertical axis. For example, a penguin is less likely to be Gentoo if the bill_depth=18.7 is and much more likely to be Adelie than Chinstrap.
FeatureImp shows the level of importance of the features when classifying the penguins. It is clear to see that the
bill_length is of high importance and one should concentrate on different boundaries of this feature when attempting to classify the three species.
8.1.5 Independent Test Data
It is also interesting to see how well the model performs on a test data set. For this section, exactly as was recommended in Section 2.4, 80% of the penguin data set will be used for the training set and 20% for the test set:
First, we compare the feature importance on training and test set
# plot on training model = Predictor$new(learner, data = penguins[train_set,], y = "species") effect = FeatureImp$new(model, loss = "ce" ) plot_train = plot(effect, features = c("bill_length", "bill_depth", "flipper_length", "body_mass", "body_mass", "year")) # plot on test data model = Predictor$new(learner, data = penguins[test_set, ], y = "species") effect = FeatureImp$new(model, loss = "ce" ) plot_test = plot(effect, features = c("bill_length", "bill_depth", "flipper_length", "body_mass", "body_mass", "year")) # combine into single plot library(patchwork) plot_train + plot_test
The results of the train set for
FeatureImp are very similar, which is expected.
We follow a similar approach to compare the feature effects:
As is the case with
FeatureImp, the test data results show either an over- or underestimate of feature importance / feature effects compared to the results where the entire penguin data set was used.
This would be a good opportunity for the reader to attempt to resolve the estimation by playing with the amount of features and the amount of data used for both the test and train data sets of
Be sure to not change the line
train_set = sample(task_peng$nrow, 0.8 * task_peng$nrow) as it will randomly sample the data again.