In principle, all generic frameworks for model interpretation are applicable to models fitted with mlr3 by just extracting the fitted models from the trained learners.
However, two of the most popular frameworks, iml and DALEX, additionally come with some convenience for mlr3.
Author: Shawn Storm
iml is an R package that interprets the behavior and explains predictions of machine learning models. The functions provided in the iml package are model-agnostic, which gives the flexibility to use any machine learning model.
To understand what iml can offer, we start off with a thorough example. The goal of this example is to figure out the species of penguins given a set of features. The palmerpenguins::penguins data set will be used, which is an alternative to the iris data set. The penguins data set contains 8 variables of 344 penguins:
    ## tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
    ##  $ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
    ##  $ island           : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
    ##  $ bill_length_mm   : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
    ##  $ bill_depth_mm    : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
    ##  $ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
    ##  $ body_mass_g      : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
    ##  $ sex              : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
    ##  $ year             : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
To get started, the data set is cleaned before a task is created from it.
penguins = na.omit(penguins) omits the 11 cases with missing values.
If they are not omitted, training the learner fails with an error because some data points have NA for some features.
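A minimal sketch of this setup, assuming the mlr3, mlr3learners, iml and palmerpenguins packages are installed. The object name task_peng is taken from the training call used later in this section; the fixed seed is an assumption added only for reproducibility.

```r
library("mlr3")
library("mlr3learners")  # provides the classif.ranger learner
library("iml")

set.seed(1)  # assumption: any fixed seed works

# Drop the 11 rows that contain missing values
penguins = na.omit(palmerpenguins::penguins)

# Classification task with species as the target
task_peng = as_task_classif(penguins, target = "species")
task_peng
```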
    learner = lrn("classif.ranger")
    learner$predict_type = "prob"
    learner$train(task_peng)
    learner$model
    ## Ranger result
    ##
    ## Call:
    ##  ranger::ranger(dependent.variable.name = task$target_names, data = task$data(), probability = self$predict_type == "prob", case.weights = task$weights$weight, num.threads = 1L)
    ##
    ## Type:                             Probability estimation
    ## Number of trees:                  500
    ## Sample size:                      333
    ## Number of independent variables:  7
    ## Mtry:                             2
    ## Target node size:                 10
    ## Variable importance mode:         none
    ## Splitrule:                        gini
    ## OOB prediction error (Brier s.):  0.0179
As explained in Section 2.3, specific learners can be queried with lrn().
In Section 2.5 it is recommended for some classifiers to use the predict type prob instead of directly predicting a label.
This is what is done in this example.
penguins[which(names(penguins) != "species")] is the data of all the features, and y will be the penguins' species.
learner$train(task_peng) trains the model, and learner$model stores the model from the training command.
Predictor holds the machine learning model and the data.
All interpretation methods in iml need the machine learning model and the data to be wrapped in the Predictor object.
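A sketch of how the Predictor wrapper might be built for this example. The setup is repeated so that the snippet runs on its own; the variable names x and model are assumptions chosen to match the calls that follow.

```r
library("mlr3")
library("mlr3learners")  # provides the classif.ranger learner
library("iml")

# Setup repeated from the earlier steps
penguins = na.omit(palmerpenguins::penguins)
task_peng = as_task_classif(penguins, target = "species")
learner = lrn("classif.ranger", predict_type = "prob")
learner$train(task_peng)

# Features only; the target is passed separately as y
x = penguins[which(names(penguins) != "species")]
model = Predictor$new(learner, data = x, y = penguins$species)
```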
- FeatureEffects computes the effects for all given features on the model prediction. Different methods are implemented: Accumulated Local Effect (ALE) plots, Partial Dependence Plots (PDPs) and Individual Conditional Expectation (ICE) curves.
- Shapley computes feature contributions for single predictions with the Shapley value – an approach from cooperative game theory (Shapley Value).
- FeatureImp computes the importance of features by calculating the increase in the model’s prediction error after permuting the feature.
In addition to the commands above, the following two need to be run:
    num_features = c("bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g", "year")
    effect = FeatureEffects$new(model)
    plot(effect, features = num_features)
effect stores the object from the FeatureEffects computation and the results can then be plotted. In this example, all of the features provided by the penguins data set were used.
All features except for year provide meaningful, interpretable information. It should be clear why year doesn’t provide anything of significance.
bill_length_mm shows, for example, that when the bill length is smaller than roughly 40 mm, there is a high chance that the penguin is an Adelie.
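Shapley values for a single penguin can be computed along the following lines. This is a sketch rather than the chapter's exact code: the choice of the first penguin as the observation of interest and the variable names x.interest and shapley are assumptions.

```r
library("mlr3")
library("mlr3learners")  # provides the classif.ranger learner
library("iml")

# Setup repeated from the earlier steps
penguins = na.omit(palmerpenguins::penguins)
task_peng = as_task_classif(penguins, target = "species")
learner = lrn("classif.ranger", predict_type = "prob")
learner$train(task_peng)

x = penguins[which(names(penguins) != "species")]
model = Predictor$new(learner, data = x, y = penguins$species)

# Explain the prediction for one observation (here: the first penguin)
x.interest = data.frame(x[1, ])
shapley = Shapley$new(model, x.interest = x.interest)
plot(shapley)
```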
The \(\phi\) values provide insight into the probability given the feature values on the vertical axis. For example, a penguin is less likely to be Gentoo if bill_depth_mm = 18.7, and much more likely to be Adelie than Chinstrap.
    effect = FeatureImp$new(model, loss = "ce")
    effect$plot(features = num_features)
FeatureImp shows the level of importance of the features when classifying the penguins. It is clear that bill_length_mm is of high importance, and one should concentrate on different boundaries of this feature when attempting to classify the three species.
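To make the idea behind permutation importance concrete, here is a base R sketch. It is not iml's implementation: the linear model, the mtcars data, the mean squared error loss and the 20 repetitions are all assumptions chosen to keep the example small. Importance is reported as the ratio of the loss after shuffling a feature to the original loss, so values well above 1 indicate features the model relies on.

```r
set.seed(42)

# Stand-in model and data (assumption: any fitted model with a predict method works)
fit = lm(mpg ~ wt + hp + qsec, data = mtcars)
mse = function(model, data) mean((data$mpg - predict(model, data))^2)
base_loss = mse(fit, mtcars)

perm_importance = sapply(c("wt", "hp", "qsec"), function(feature) {
  losses = replicate(20, {
    shuffled = mtcars
    # Shuffling breaks the link between this feature and the target
    shuffled[[feature]] = sample(shuffled[[feature]])
    mse(fit, shuffled)
  })
  mean(losses) / base_loss
})

sort(perm_importance, decreasing = TRUE)
```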
It is also interesting to see how well the model performs on a test data set. For this section, exactly as was recommended in Section 2.4, 80% of the penguin data set will be used for the training set and 20% for the test set:
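A sketch of the 80/20 split, assuming the same task and learner as before. The train_set line is the one quoted later in this section; the test_set construction, the seed and the accuracy measure are assumptions.

```r
library("mlr3")
library("mlr3learners")  # provides the classif.ranger learner

# Setup repeated from the earlier steps
penguins = na.omit(palmerpenguins::penguins)
task_peng = as_task_classif(penguins, target = "species")
learner = lrn("classif.ranger", predict_type = "prob")

set.seed(1)  # assumption: fixed seed so the split is reproducible
train_set = sample(task_peng$nrow, 0.8 * task_peng$nrow)
test_set = setdiff(seq_len(task_peng$nrow), train_set)

# Fit on the training rows only, then predict the held-out rows
learner$train(task_peng, row_ids = train_set)
prediction = learner$predict(task_peng, row_ids = test_set)
prediction$score(msr("classif.acc"))
```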
First, we compare the feature importance on the training and test sets:

    # plot on training
    model = Predictor$new(learner, data = penguins[train_set, ], y = "species")
    effect = FeatureImp$new(model, loss = "ce")
    plot_train = plot(effect, features = num_features)

    # plot on test data
    model = Predictor$new(learner, data = penguins[test_set, ], y = "species")
    effect = FeatureImp$new(model, loss = "ce")
    plot_test = plot(effect, features = num_features)

    # combine into single plot
    library("patchwork")
    plot_train + plot_test
The FeatureImp results for the train set are very similar to those for the entire data set, which is expected.
We follow a similar approach to compare the feature effects:
    model = Predictor$new(learner, data = penguins[train_set, ], y = "species")
    effect = FeatureEffects$new(model)
    plot(effect, features = num_features)

    model = Predictor$new(learner, data = penguins[test_set, ], y = "species")
    effect = FeatureEffects$new(model)
    plot(effect, features = num_features)
As is the case with FeatureImp, the test data results show either an over- or underestimate of feature importance / feature effects compared to the results where the entire penguin data set was used.
This would be a good opportunity for the reader to try to improve the estimates by varying the number of features and the amount of data used for the test and train sets.
Be sure not to change the line train_set = sample(task_peng$nrow, 0.8 * task_peng$nrow), as it will randomly sample the data again.
The DALEX package X-rays any predictive model and helps to explore, explain and visualize its behaviour. The package implements a collection of methods for Explanatory Model Analysis. It is based on a unified grammar summarised in Figure 8.7.
In the following sections, we present the methods available in the DALEX package using a random forest model trained to predict the worth of football players on the FIFA 20 data. We show methods that analyze the model both at the level of a single prediction and at the global level, for the whole data set.
The structure of this chapter is the following:
- In Section 8.2.2 we introduce the FIFA 20 dataset and then in Section 8.2.3 we train a random regression forest using the ranger package.
- Section 8.2.4 introduces the general logic behind DALEX explainers.
- Section 8.2.5 introduces methods for dataset level model exploration.
- Section 8.2.6 introduces methods for instance-level model exploration.
Examples presented in this chapter are based on data retrieved from the FIFA video game. We will use the data scraped from the sofifa website. The raw data is available at kaggle. After some basic data cleaning, the processed data for the top 5000 football players is available in the DALEX package under the name fifa.
    ##                   value_eur age height_cm nationality attacking_crossing
    ## L. Messi           95500000  32       170   Argentina                 88
    ## Cristiano Ronaldo  58500000  34       187    Portugal                 84
For every player, we have 42 features available.
    ## [1] 5000   42
In the table below we overview these 42 features for three selected players.
One of the features, called value_eur, is the worth of the footballer in euros. In the next section, we will build a prediction model, which will estimate the worth of the player based on other player characteristics.
| |Lionel Messi|Cristiano Ronaldo|Neymar Junior|
|---|---|---|---|
|value_eur|95 500 000|58 500 000|105 500 000|
In order to get a more stable model we remove four variables, which leaves 38 columns.
The DALEX package works for any model regardless of its internal structure. Examples of how this package works are shown on a random forest model implemented in the ranger package.
We use the mlr3 package to build a predictive model.
First, let’s load the required packages.
Then we can define the regression task, the prediction of the value_eur variable:
    fifa_task <- as_task_regr(fifa, target = "value_eur")
Finally, we train mlr3’s ranger learner with 250 trees. Note that in this example, for brevity, we do not split the data into train and test sets. The model is built on the whole data.
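The training step might look like the sketch below, assuming the DALEX, mlr3 and mlr3learners packages are installed. The name fifa_ranger matches the explainer call used later. Note that the removal of the four variables mentioned above is not reproduced here, so this sketch trains on all columns.

```r
library("DALEX")         # ships the fifa data set
library("mlr3")
library("mlr3learners")  # provides the regr.ranger learner

data("fifa", package = "DALEX")

# Regression task: predict the player's worth
fifa_task <- as_task_regr(fifa, target = "value_eur")

# Random forest with 250 trees, fitted on the whole data
fifa_ranger <- lrn("regr.ranger", num.trees = 250)
fifa_ranger$train(fifa_task)
fifa_ranger
```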
    ## <LearnerRegrRanger:regr.ranger>
    ## * Model: ranger
    ## * Parameters: num.trees=250
    ## * Packages: ranger
    ## * Predict Type: response
    ## * Feature types: logical, integer, numeric, character, factor, ordered
    ## * Properties: importance, oob_error, weights
Working with explanations in the DALEX package always consists of three steps schematically shown in the pipe below.
    model %>%
      explain_mlr3(data = ..., y = ..., label = ...) %>%
      model_parts() %>%
      plot()
All functions in the DALEX package can work for models with any structure. This is possible because in the first step we create an adapter that allows the downstream functions to access the model in a consistent fashion. In general, such an adapter is created with the DALEX::explain.default() function, but for models created in the mlr3 package it is more convenient to use the DALEXtra::explain_mlr3() function.
Explanations are determined by the functions DALEX::model_parts(), DALEX::model_profile(), DALEX::predict_parts() and DALEX::predict_profile(). Each of these functions takes the model adapter as its first argument. The other arguments describe how the function works. We will present them in the following section.
Explanations can be visualized with the generic function plot() or summarised with the generic function print(). Each explanation is a data frame with an additional class attribute. The plot() function creates graphs using the ggplot2 package, so they can be easily modified with the usual ggplot2 functions.
We show this cascade of functions based on the FIFA example.
To get started with the exploration of the model behaviour we need to create an explainer.
The DALEX::explain.default function handles all types of predictive models. In the DALEXtra package there are generic versions for the most common ML frameworks. Among them, the DALEXtra::explain_mlr3() function works for mlr3 models.
This function performs a series of internal checks so the output is a bit verbose. Set the verbose = FALSE argument to make it less wordy.
    library("DALEX")
    library("DALEXtra")
    ranger_exp <- explain_mlr3(fifa_ranger,
      data = fifa,
      y = fifa$value_eur,
      label = "Ranger RF",
      colorize = FALSE)

    ## Preparation of a new explainer is initiated
    ##   -> model label       :  Ranger RF
    ##   -> data              :  5000  rows  38  cols
    ##   -> target variable   :  5000  values
    ##   -> predict function  :  yhat.LearnerRegr  will be used (  default  )
    ##   -> predicted values  :  No value for predict function target column. (  default  )
    ##   -> model_info        :  package mlr3 , ver. 0.12.0 , task regression (  default  )
    ##   -> predicted values  :  numerical, min =  466803 , mean =  7472317 , max =  88264233
    ##   -> residual function :  difference between y and yhat (  default  )
    ##   -> residuals         :  numerical, min =  -8558527 , mean =  970.2 , max =  17805600
    ##   A new explainer has been created!
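Dataset-level variable importance, of the kind summarised in the table that follows, can be obtained by passing the explainer to model_parts(), which measures the change in loss after permuting each variable. The sketch below repeats the model setup so that it runs on its own; the name fifa_vi matches the plotting call used later, and the four-variable removal described earlier is not reproduced, so the numbers will differ from those shown in this chapter.

```r
library("DALEX")
library("DALEXtra")
library("mlr3")
library("mlr3learners")  # provides the regr.ranger learner

# Setup repeated from the earlier steps
data("fifa", package = "DALEX")
fifa_task <- as_task_regr(fifa, target = "value_eur")
fifa_ranger <- lrn("regr.ranger", num.trees = 250)
fifa_ranger$train(fifa_task)

ranger_exp <- explain_mlr3(fifa_ranger,
  data = fifa,
  y = fifa$value_eur,
  label = "Ranger RF",
  verbose = FALSE)

# Permutation-based variable importance for the whole data set
fifa_vi <- model_parts(ranger_exp)
head(fifa_vi)
```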
    ##              variable mean_dropout_loss     label
    ## 1        _full_model_           1416582 Ranger RF
    ## 2           value_eur           1416582 Ranger RF
    ## 3           height_cm           1476876 Ranger RF
    ## 4 goalkeeping_kicking           1477638 Ranger RF
    ## 5           weight_kg           1482370 Ranger RF
    ## 6    movement_balance           1486915 Ranger RF
Results can be visualized with the generic plot(). The chart for all 38 variables would be unreadable, so with the max_vars argument, we limit the number of variables on the plot.
    plot(fifa_vi, max_vars = 12, show_boxplots = FALSE)
Once we know which variables are most important, we can use Partial Dependence Plots to show how the model, on average, changes with changes in selected variables. In this example, they show the average relation between the particular variables and players’ value.
    selected_variables <- c("age", "movement_reactions",
      "skill_ball_control", "skill_dribbling")
    fifa_pd <- model_profile(ranger_exp,
      variables = selected_variables)$agr_profiles
    fifa_pd
    ## Top profiles    :
    ##              _vname_   _label_ _x_  _yhat_ _ids_
    ## 1 skill_ball_control Ranger RF   5 6426909     0
    ## 2    skill_dribbling Ranger RF   7 6671599     0
    ## 3    skill_dribbling Ranger RF  11 6662567     0
    ## 4    skill_dribbling Ranger RF  12 6650203     0
    ## 5    skill_dribbling Ranger RF  13 6649258     0
    ## 6    skill_dribbling Ranger RF  14 6649165     0
Again, the result of the explanation can be presented with the generic function plot().

    library("ggplot2")
    plot(fifa_pd) +
      scale_y_continuous("Estimated value in Euro",
        labels = scales::dollar_format(suffix = "€", prefix = "")) +
      ggtitle("Partial Dependence profiles for selected variables")
The general trend for most player characteristics is the same: the higher the skills, the higher the player’s worth. The single exception is the variable age.
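The computation behind a partial dependence profile can be sketched in a few lines of base R. This is an illustration of the idea, not DALEX's implementation; the linear model, the mtcars data and the helper name partial_dependence are assumptions. For each grid value, the feature is set to that value in every row and the predictions are averaged.

```r
# Stand-in model and data (assumption: any model with a predict method works)
fit = lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical helper: average prediction when `feature` is forced to each grid value
partial_dependence = function(model, data, feature, grid) {
  sapply(grid, function(value) {
    modified = data
    modified[[feature]] = value  # same value for all observations
    mean(predict(model, modified))
  })
}

grid = seq(min(mtcars$wt), max(mtcars$wt), length.out = 10)
pd = partial_dependence(fit, mtcars, "wt", grid)
data.frame(wt = grid, mean_prediction = pd)
```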
Time to see how the model behaves for a single observation/player. This can be done for any player, but in this example we will use Cristiano Ronaldo.
predict_parts is an instance-level version of the model_parts function introduced in the previous section. For the background behind that method see the Introduction to Break Down.
    ronaldo <- fifa["Cristiano Ronaldo", ]
    ronaldo_bd_ranger <- predict_parts(ranger_exp,
      new_observation = ronaldo)
    head(ronaldo_bd_ranger)
    ##                                          contribution
    ## Ranger RF: intercept                          7472317
    ## Ranger RF: movement_reactions = 96           12189418
    ## Ranger RF: skill_ball_control = 92            5463949
    ## Ranger RF: attacking_finishing = 94           4643519
    ## Ranger RF: mentality_positioning = 95         5531779
    ## Ranger RF: skill_dribbling = 89               4102550
The generic plot() function shows the estimated contribution of variables to the final prediction.
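The idea behind a Break Down attribution can be illustrated in base R. This is a simplified sketch under stated assumptions, not DALEX's implementation: it uses a linear model on mtcars and a fixed variable order, whereas the real method chooses the order itself. Starting from the average prediction (the intercept), each variable in turn is fixed to the observation's value in every row, and the resulting change in the average prediction is that variable's contribution.

```r
# Stand-in model, data and observation (all assumptions for illustration)
fit = lm(mpg ~ wt + hp + qsec, data = mtcars)
observation = mtcars["Mazda RX4", ]
features = c("wt", "hp", "qsec")  # fixed order, chosen arbitrarily

modified = mtcars
running_mean = mean(predict(fit, modified))
intercept = running_mean
contributions = numeric(0)

for (feature in features) {
  # Fix this variable to the observation's value for all rows
  modified[[feature]] = observation[[feature]]
  new_mean = mean(predict(fit, modified))
  contributions[feature] = new_mean - running_mean
  running_mean = new_mean
}

# Intercept plus contributions recovers the prediction for the observation
c(intercept = intercept, contributions)
```

Because every model input has been fixed by the end of the loop, the intercept and the contributions add up to the model's prediction for the chosen observation.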
Cristiano is a striker, therefore characteristics that influence his worth are those related to attack, like
skill_dribbling. The only variable with negative attribution is
Another way to inspect the local behaviour of the model is to use SHapley Additive exPlanations (SHAP). It locally shows the contribution of variables to a single observation, just like Break Down.
    ronaldo_shap_ranger <- predict_parts(ranger_exp,
      new_observation = ronaldo,
      type = "shap")
    plot(ronaldo_shap_ranger) +
      scale_y_continuous("Estimated value in Euro",
        labels = scales::dollar_format(suffix = "€", prefix = ""))
In the previous section, we introduced a global explanation, Partial Dependence Plots. Ceteris Paribus is the instance-level version of that plot. It shows the response of the model for an observation when we change only one variable while the others stay unchanged. The blue dot stands for the original value.
    selected_variables <- c("age", "movement_reactions",
      "skill_ball_control", "skill_dribbling")
    ronaldo_cp_ranger <- predict_profile(ranger_exp, ronaldo,
      variables = selected_variables)
    plot(ronaldo_cp_ranger, variables = selected_variables) +
      scale_y_continuous("Estimated value of Cristiano Ronaldo",
        labels = scales::dollar_format(suffix = "€", prefix = ""))
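A Ceteris Paribus profile can also be computed by hand, which may help to see what the plot represents. The sketch below is base R only and is not DALEX's implementation; the linear model, the mtcars data and the chosen observation are assumptions. One observation is copied once per grid value, only the variable of interest is changed, and the model is asked for a prediction on each copy.

```r
# Stand-in model and observation (assumptions for illustration)
fit = lm(mpg ~ wt + hp, data = mtcars)
observation = mtcars["Mazda RX4", ]
grid = seq(min(mtcars$wt), max(mtcars$wt), length.out = 10)

# One copy of the observation per grid value; only wt varies
profile_data = observation[rep(1, length(grid)), ]
profile_data$wt = grid

cp_profile = data.frame(wt = grid, prediction = predict(fit, profile_data))
cp_profile
```

Averaging such profiles over all observations gives the Partial Dependence profile from the previous section.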