Abstract
The goal of this chapter is to present key methods that allow an in-depth analysis of a trained model. When using predictive models in practice, a high generalization performance alone is often not sufficient. In many applications, users want to gain insights into the inner workings of a model and, e.g., understand which features are important and how they influence the model’s predictions. For the end user, this knowledge allows better utilization of models in the decision-making process, e.g., by analyzing different possible decision options. In addition, if the model’s behavior turns out to be in line with the domain knowledge or the user’s intuition, then the user’s confidence in the model and its prediction will increase. For the modeler, an in-depth analysis of the model allows undesirable model behavior to be detected and corrected.
The increasing availability of data and software frameworks to create predictive models has allowed the widespread adoption of machine learning in many applications. However, high predictive performance of such models often comes at the cost of interpretability: Many trained models by default do not provide further insights into the model and are often too complex to be understood by humans, such that questions like “What are the most important features and how do they influence a prediction?” cannot be directly answered. This lack of explanations hurts trust and creates barriers to adopting predictive models, especially in critical areas where decisions affect human life, such as credit scoring or medical applications.
Interpretation methods are valuable from multiple perspectives:
To gain global insights into a model, e.g., to identify which features were overall most important.
To improve the model after flaws were identified (in the data or model), e.g., whether the model unexpectedly relies on a certain feature.
To understand and control individual predictions, e.g., to identify how a given prediction changes when changing the input.
For justification purposes or to assess fairness, e.g., to inspect whether the model adversely affects certain subpopulations or individuals. This point is not the subject of this chapter but will be discussed in detail in Section 12.1.
In this chapter, we focus on some important methods from the field of interpretable machine learning that can be applied post-hoc, i.e. after the model has been trained, and are model-agnostic, i.e. applicable to any model without any restriction to a specific model class. Specifically, we introduce methods implemented in the following three packages:
The iml and DALEX packages offer similar functionality, but they differ in design choices. Both iml and counterfactuals are based on the R6 class system, thus working with them is more similar in style to the mlr3 package. In contrast, DALEX is based on the S3 class system and focuses on comparing multiple predictive models, usually of different types, on the same plot. This is a model comparison-oriented approach, often referred to as Rashomon perspective. We will briefly introduce the available methods in the respective sections. Most of the methods are described in more detail in the introductory book about interpretable machine learning by Molnar (2022).
11.1 German Credit Classification Task
Throughout this chapter, we use the rchallenge::german data, which contains corrections proposed by Groemping (2019). The data includes 1000 credit applications from a regional bank in Germany with the aim of predicting creditworthiness, labeled as “good” (the positive class) and “bad” in the column credit_risk. For the sake of clarity and simplicity, we use only half of the 20 features:
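One way to construct this task with mlr3 is sketched below; the exact feature selection call is our own, chosen to match the feature list printed afterwards.
library("mlr3verse")
# Create the German credit task and keep only the 10 features used in this chapter
task = tsk("german_credit")
task$select(cols = c("age", "amount", "credit_history", "duration",
  "employment_duration", "other_debtors", "property", "purpose",
  "savings", "status"))
task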
<TaskClassif:german_credit> (1000 x 11): German Credit
* Target: credit_risk
* Properties: twoclass
* Features (10):
- fct (7): credit_history, employment_duration, other_debtors,
property, purpose, savings, status
- int (3): age, amount, duration
We want to fit a random forest model using mlr3. We use 2/3 of the data to train the model and the remaining 1/3 to analyze the trained model using different interpretation methods. A detailed discussion on which data set to use for interpretation is given in Section 11.5.
We fit a "classif.ranger" learner, i.e. a classification random forest, on the training data to predict the probability of being creditworthy (by setting predict_type = "prob"):
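A minimal sketch of this step is shown below; the seed is our own choice for reproducibility, while the object names split and learner match those used later in this chapter.
set.seed(1)  # assumed seed, not from the original text
# 2/3 of the observations for training, 1/3 for analyzing the model
split = partition(task, ratio = 2/3)
# Random forest with probability predictions
learner = lrn("classif.ranger", predict_type = "prob")
learner$train(task, row_ids = split$train)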
11.2 The iml Package
The iml package (Molnar, Bischl, and Casalicchio 2018) implements a variety of model-agnostic interpretation methods and is based on the R6 class system just like the mlr3 package. It provides a unified interface to the implemented methods and facilitates the analysis and interpretation of machine learning models. Below, we provide examples of how to use the iml package using the random forest model fitted above with the mlr3 package.
11.2.1 The Predictor Object
The iml package supports machine learning models (for classification or regression) fitted by any R package. To ensure that this works seamlessly, fitted models need to be wrapped in an iml::Predictor object to unify the input-output behavior of the trained models. An iml::Predictor object contains the prediction model as well as the data used for analyzing the model and producing the desired explanation. We construct the iml::Predictor object using the test data which will be used as the basis for the presented model interpretation methods:
library("iml")credit_x=task$data(rows =split$test, cols =task$feature_names)credit_y=task$data(rows =split$test, cols =task$target_names)predictor=Predictor$new(learner, data =credit_x, y =credit_y)
Tip
The constructor iml::Predictor$new() has an (optional) input argument predict.function which requires a function that predicts on new data. For models fitted with the packages mlr3, mlr or caret, a default predict.function is already implemented in the iml package and the predict.function argument is not required to be specified. For models fitted with other packages, the model-specific predict function of that package is used by default. Passing a custom predict.function to unify the output of the model-specific predict function might be necessary for some packages. This is especially needed if the model-specific predict function does not produce a vector of predictions (in case of regression tasks or classification tasks that predict discrete classes instead of probabilities) or a data.frame with as many columns as class labels (in case of classification tasks that predict a probability for each class label).
11.2.2 Feature Importance
When deploying a model in practice, it is often of interest which features contribute the most towards the predictive performance of the model. On the one hand, this is useful to better understand the problem at hand and the relationship between the features and the target to be predicted. On the other hand, it can help to identify irrelevant features and potentially remove them, using feature importance methods as the basis for feature filtering (see Section 5.1). In the context of this book, we use the term feature importance to describe global methods that calculate a single numeric score per feature, reflecting its importance regarding a certain quantity of interest such as the model performance. This score allows ranking the features according to their importance.
Tip
It should be noted that there are also other notions of feature importance not based on a performance measure such as the SHAP feature importance, which is based on Shapley values (see Section 11.2.5 and Lundberg, Erion, and Lee (2019a)) or the partial dependence-based feature importance introduced in Greenwell, Boehmke, and McCarthy (2018), which is based on the variance of a PD plot (see Section 11.2.3).
One of the most popular methods is the permutation feature importance (PFI), originally introduced by Breiman (2001a) for random forests and adapted by Fisher, Rudin, and Dominici (2019) as a model-agnostic feature importance measure, which they called model reliance. The PFI quantifies the importance of a feature with respect to the model performance (measured by a performance measure of interest). Specifically, the PFI measures the change in model performance before and after permuting a feature (the former refers to the original model performance). Permutation means that the observed feature values in a dataset are randomly shuffled. This destroys the dependency structure of the feature with the target variable and all other features while maintaining the marginal distribution of the feature. The intuition of the PFI is simple: if a feature is not important, permuting that feature should not affect the model performance; conversely, the larger the measured change in model performance, the more important the feature is deemed. The PFI therefore summarizes the change in model performance after destroying the information of each feature through permutation in a single numeric importance score per feature (reflecting either the difference or the ratio of the performance before and after permutation). In general, repeating the permutation process and aggregating the individual importance values is recommended because the results can be unreliable due to the randomness of the permutation process. The PFI method requires a labeled dataset (for which both the features and the true target are given), a trained model, and a performance measure to quantify the decrease in model performance after permutation.
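In formula form (our notation, not from the original text): let \(L\) denote the loss of the model on a labeled data set and \(L_{\text{perm}(j)}\) the loss after permuting feature \(j\). Then
\[
\mathrm{PFI}_j = \frac{L_{\text{perm}(j)}}{L} \qquad \text{or} \qquad \mathrm{PFI}_j = L_{\text{perm}(j)} - L,
\]
ideally averaged over several random permutations of feature \(j\).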
We now analyze which features were the most important ones according to the PFI for our fitted random forest model to classify creditworthiness. We initialize an iml::FeatureImp object with the model and the classification error ("ce") as the performance measure and visualize the results with the $plot() method.
importance = FeatureImp$new(predictor, loss = "ce")
importance$plot()
Permutation feature importance (PFI). Points indicate the median and bars the 5 % and 95 % quantiles of the PFI over 5 repetitions of the permutation process.
By default, iml::FeatureImp repeats the permutation 5 times, and in each repetition, the importance value corresponding to the change in the classification error is calculated. The $plot() method of the iml::FeatureImp object shows the median of the 5 resulting importance values (as a point) and the boundaries of the error bar in the plot refer to the 5 % and 95 % quantiles of the importance values.
Tip
The number of repetitions can be set with the input argument n.repetitions of iml::FeatureImp$new(). By default, iml::FeatureImp uses the ratio of the model performance before and after permutation as an importance value. Alternatively, the difference instead of the ratio of the two performance measures can be used by setting compare = "difference" in iml::FeatureImp$new().
We see that status is the most important feature. That is, if we permute the status column in the data, the classification error of our model increases by a factor of around 1.17.
11.2.3 Feature Effects
Feature effect methods describe how or to what extent a feature contributes towards the model predictions by analyzing how the predictions change when changing a feature. For example, these methods are useful to identify whether the model estimated a non-linear relationship between a feature of interest and the target. In general, we distinguish between local and global feature effect methods. Global feature effect methods refer to how a prediction changes on average when a feature is changed. In contrast, local feature effect methods address the question of how a single prediction of a considered observation changes when a feature value is changed. To a certain extent, local feature effect methods can also reveal interactions in the model that become visible when the local effects are heterogeneous, i.e., if changes in the local effect are different across the observations.
A popular global method to visualize feature effects is the partial dependence (PD) plot (Friedman 2001). It visualizes how the model predictions change on average if we vary the feature values of a certain feature of interest. Later, Goldstein et al. (2015) discovered that the global feature effect as visualized by a PD plot can be disaggregated into local feature effects associated with a single observation. Specifically, they introduced individual conditional expectation (ICE) curves, a visual tool for local feature effects, and showed that the PD plot is just the average of ICE curves. ICE curves display how the prediction of a single observation changes when varying a feature of interest while all other features stay constant (ceteris paribus). Hence, for each observation, the values of the feature of interest are replaced by multiple other values (e.g., on an equidistant grid of the feature’s value range) to inspect the changes in the model prediction. Specifically, an ICE curve visualizes on the x-axis the set of feature values used to replace the feature of interest and on the y-axis the prediction of the model after the original feature value of the considered observation has been replaced. Hence, each ICE curve is a local explanation that assesses the feature effect of a single observation on the model prediction. A heterogeneous shape of the ICE curves, i.e., the ICE curves are not parallel and might cross each other, indicates that the model may have estimated an interaction involving the considered feature.
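Formally (our notation): with \(\hat{f}\) the trained model, \(x_j\) the feature of interest, and \(x_{-j}^{(i)}\) the remaining feature values of observation \(i\), the ICE curve of observation \(i\) is the function \(x_j \mapsto \hat{f}(x_j, x_{-j}^{(i)})\), and the PD curve is the average of the \(n\) ICE curves:
\[
\hat{f}_j^{\mathrm{PD}}(x_j) = \frac{1}{n} \sum_{i=1}^{n} \hat{f}\bigl(x_j, x_{-j}^{(i)}\bigr).
\]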
Note
Feature effects are very similar to regression coefficients \(\beta\) in linear models, which offer interpretations such as “if you increase this feature by one unit, your prediction increases on average by \(\beta\) if all other features stay constant”. However, feature effects can convey not only linear effects but also more complex ones (similar to splines in generalized additive models) and can be applied to any type of predictive model.
Now, we inspect how the feature amount influences the predictions of the creditworthiness classification. For this purpose, we compute feature effects using PD plots and ICE curves. We generally recommend accompanying PD plots with ICE curves because showing the PD plot alone might obfuscate heterogeneous effects and interactions. To plot PD and ICE curves, we initialize an iml::FeatureEffect object with the feature name and select method = "pdp+ice". Again, we use the $plot() method to visualize the results:
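A minimal call producing the plot below (the grid size is left at its default; argument names follow the iml documentation):
effect = FeatureEffect$new(predictor, feature = "amount", method = "pdp+ice")
effect$plot()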
Partial dependence (PD) plot (yellow) and individual conditional expectation (ICE) curves (black) that show how the credit amount affects the predicted credit risk.
As we have a binary classification task at hand, we see PD and ICE plots for the predicted probabilities of both predicted classes separately. The plot shows that if the amount is smaller than roughly 10,000 DM (standing for Deutsche Mark, the currency in Germany before the Euro was introduced as cash in 2002), there is on average a high chance that the creditworthiness is good.
11.2.4 Surrogate Models
Interpretable models such as decision trees or linear models can be used as surrogate models to approximate or mimic a (often very complex) black-box model. Inspecting the surrogate model provides insights into the behavior of a black-box model. For example, the (learned) model structure of a decision tree, or the (learned) parameters (coefficients) of a linear model can be easily interpreted and provide information on how the features formed the prediction.
We differentiate between local surrogate models, which approximate a model locally around a specific data point of interest, and global surrogate models that approximate the model across the entire input space (Ribeiro, Singh, and Guestrin 2016; Molnar 2022). The surrogate model is usually trained on the same data used to train the black-box model or on data having the same distribution to ensure a representative input space. For local surrogate models, the data used to train the surrogate model is usually weighted according to the closeness to the data point of interest. In either case, the predictions obtained from the black-box model are used as the target to train a surrogate model that approximates the black-box model.
11.2.4.1 Global Surrogate Model
With iml::TreeSurrogate, we can fit a tree-based surrogate model, more specifically a partykit::ctree() model from the partykit package, to the predictions of our random forest model we want to analyze.
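A sketch of the corresponding call; the object name treesurrogate is reused further below, and maxdepth = 2 follows the text:
treesurrogate = TreeSurrogate$new(predictor, maxdepth = 2L)
treesurrogate$plot()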
Distribution of the predicted outcomes for each leaf node identified by the tree surrogate. The first two leaf nodes consist of applications with a positive balance of the checking account (status is either "0 <= ... < 200 DM", "... >= 200 DM" or "salary for at least 1 year") and either a duration of less than or equal to 36 months (first node) or more than 36 months (second node). The third and fourth nodes contain applicants that either have no checking account or a negative balance (status). For the third node the additional requirement was a duration of less than or equal to 21 months, for the fourth node a duration of more than 21 months.
As we have used maxdepth = 2, the surrogate tree performs two binary splits yielding 4 leaf nodes. iml::TreeSurrogate extracts the decision rules found by the tree surrogate (that lead to the leaf nodes), and the $plot() method visualizes the distribution of the predicted outcomes for each leaf node. For example, we can see that the leaf node shown in the top left bar chart only consists of creditworthy applicants and results from the split by duration <= 36 and by status being either 0<= ... < 200 DM or ... >= 200 DM / salary for at least 1 year (i.e., meaning that the balance of the credit applicant’s checking account is positive).
We can access the trained tree surrogate via the $tree field of the fitted iml::TreeSurrogate object and can, therefore, apply any functions provided by the partykit package to the model object, e.g., partykit::print.party() to show the performed splits:
[1] root
| [2] status in no checking account, ... < 0 DM
| | [3] duration <= 21: *
| | [4] duration > 21: *
| [5] status in 0<= ... < 200 DM, ... >= 200 DM / salary for at least 1 year
| | [6] duration <= 36: *
| | [7] duration > 36: *
Since the surrogate model only uses the predictions of the black-box model (here, the random forest model) and not the real outcomes of the underlying data, the conclusions drawn from the surrogate model do not apply generally, but only to the black-box model (if the approximation of the surrogate model is accurate).
To evaluate whether the surrogate model approximates the prediction model accurately, we can compare the predictions of the tree surrogate and the predictions of the black-box model. For example, we can produce a cross table comparing the predicted classes of the random forest and predicted classes of the surrogate tree:
pred_surrogate = treesurrogate$predict(credit_x, type = "class")
pred_rf = learner$predict_newdata(credit_x)$response
prop.table(table(pred_rf, pred_surrogate$.class))
pred_rf bad good
good 0.11515152 0.67575758
bad 0.13636364 0.07272727
Mostly, the black-box predicted class and the surrogate predicted classes overlap (with around 81% matching predictions).
11.2.4.2 Local Surrogate Model
In general, it can be very difficult to accurately approximate the black-box model with an interpretable one in the entire feature space. However, it is much simpler if we only focus on a small area in the feature space surrounding a specific point – the point of interest. For a local surrogate model, we conduct the following steps:
We obtain predictions from the black-box model for a given data set.
We weight the observations in this data set by their proximity to our point of interest.
We fit an interpretable model on the weighted data set using the predictions of the black-box model as the target.
We exploit the interpretable nature of the surrogate model to explain the prediction of our point of interest.
To illustrate this using the iml package, we first select a data point we want to explain. Here, we select an observation in the data set, let us call him Steve.
steve = credit_x[249, ]
steve
age amount credit_history duration
249 74 5129 critical account/other credits elsewhere 9
employment_duration other_debtors property purpose
249 >= 7 yrs none real estate car (new)
savings status
249 unknown/no savings account ... < 0 DM
For Steve, the model predicts the class bad with 54.6% probability:
predictor$predict(steve)
good bad
1 0.4537167 0.5462833
As a next step, we use iml::LocalModel, which fits a locally weighted linear regression model that explains why Steve was classified as having bad creditworthiness by inspecting the estimated coefficients. The underlying surrogate model is L1-penalized such that only a pre-defined number of features per class, say k, will have a non-zero coefficient. The default is k = 3L and shows the three most influential features.
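A minimal sketch of such a call (k = 3L is the default mentioned above; the object name local_surrogate is our own):
local_surrogate = LocalModel$new(predictor, x.interest = steve, k = 3L)
local_surrogate$results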
Note
The argument gower.power can be used to specify the size of the neighborhood for the local model (default is gower.power = 1). The smaller the value, the stronger the local model will focus on points closer to the point of interest (here: Steve). If the prediction of the local model and the prediction of the black box model greatly differ, you can change the value of gower.power and, e.g., increase the number of non-zero coefficients k for a more accurate local model.
From the table, we see that the three locally most influential features are status, amount, and credit history. A closer look at the coefficients reveals that all three features have a negative effect.
11.2.5 Shapley Values
Shapley values were originally developed in the context of cooperative game theory to study how the payout of a game can be fairly distributed among the players that form a team. This concept has been adapted for use in machine learning as a local interpretation method to explain the contributions of each input feature to the final model prediction of a single observation (Štrumbelj and Kononenko 2013). The analogy is as follows: In the context of machine learning, the cooperative game played by a set of players refers to the process of predicting by a set of features for a single observation whose prediction we want to explain. Hence, the features are considered to be the players. The total payout, which should be fairly distributed among the players, refers to the difference between the individual observation’s prediction and the mean prediction.
In essence, Shapley values aim to answer the question of how much each input feature contributed to the final prediction for a single observation (after subtracting the mean prediction). By assigning a value to each feature, we can gain insights into which features were the most important ones for the considered observation. Compared to the penalized linear model as a local surrogate model, Shapley values guarantee that the prediction is fairly distributed among the features.
Note
Shapley values are sometimes misinterpreted: The Shapley value of a feature does not display the difference of the predicted value after removing the feature from the model training. The exact Shapley value of a feature is calculated by considering all possible subsets of features and computing the difference in the model prediction with and without the feature of interest included. Hence, it refers to the average marginal contribution of a feature to the difference between the actual prediction and the mean prediction, given the current set of features.
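This average marginal contribution can be written formally as (our notation, not from the original text): for feature \(j\) among \(p\) features,
\[
\phi_j = \sum_{S \subseteq \{1, \ldots, p\} \setminus \{j\}} \frac{|S|! \,(p - |S| - 1)!}{p!} \bigl(v(S \cup \{j\}) - v(S)\bigr),
\]
where \(v(S)\) denotes the expected model prediction when only the features in \(S\) are fixed to the observed values of the observation of interest and all remaining features are marginalized out.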
With the help of iml::Shapley, we now generate Shapley values for Steve’s prediction. Again, the results can be visualized with the $plot() method.
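A minimal call (the default sample.size is used; names follow the iml documentation):
shapley = Shapley$new(predictor, x.interest = steve)
shapley$plot()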
Shapley values. The actual prediction displays the prediction of the model for the observation we are interested in, the average prediction displays the average prediction over the given test data set.
If we focus on the plot, the Shapley values (phi) of the features show us how to fairly distribute the difference between Steve’s probability of being creditworthy and the data set’s average probability among the given features. Steve’s duration of 9 months has the most positive effect on the probability of being creditworthy, increasing the predicted probability by about 5%. The credit history has the most negative effect, decreasing the predicted probability by more than 15%.
11.3 The counterfactuals Package
Counterfactual explanations try to identify the smallest possible changes to the input features of a given observation that would lead to a different prediction (Wachter, Mittelstadt, and Russell 2017). In other words, a counterfactual explanation provides an answer to the question: “What changes in the current feature values are necessary to achieve a different prediction?”. Such statements are valuable explanations because we can understand which features affect a prediction and what actions can be taken to obtain a different, more desirable prediction.
Counterfactual explanations can have many applications in different areas such as healthcare, finance, and criminal justice, where it may be important to understand how small changes in input features could affect the model’s prediction. For example, a counterfactual explanation could be used to suggest lifestyle changes to a patient to reduce their risk of developing a particular disease, or to suggest actions that would increase the chance of a credit being approved. For the rchallenge::german data, a counterfactual could be: “If the credit applicant had asked for a lower credit amount and duration, he would have been classified as a good credit risk with a probability of > 50%” (Figure 11.1). By revealing the feature changes that alter a decision, counterfactuals reveal which features are the key drivers of a decision.
Figure 11.1: Illustration of a counterfactual explanation. The brown dot displays a counterfactual for a given point (blue dot) which proposed decreasing the credit amount and duration such that the prediction changes from bad to good.
Many methods have been proposed in recent years to generate counterfactual explanations. These methods differ in which properties their generated counterfactuals target (for example, are the feature changes actionable?) and in how the counterfactuals are generated (for example, is a set of counterfactuals returned in a single run?). The simplest method to generate counterfactuals is the What-If approach (Wexler et al. 2019). For a data point whose prediction should be explained, the counterfactual is the closest data point of a given data set with the desired prediction.
Another method is the multi-objective counterfactuals method (MOC) introduced by Dandl et al. (2020). Compared to the What-If method, MOC can generate multiple counterfactuals that are not equal to observations in a given dataset but are artificially generated data points. The generation of counterfactuals is based on an optimization problem that aims for counterfactuals that
have the desired prediction,
are close to the observation of interest,
only require changes in a few features, and
originate from the same distribution as the observations in the given dataset.
All these four objectives are optimized simultaneously via a multi-objective optimization method.
Due to the variety of methods, counterfactual explanations were outsourced into a separate package, the counterfactuals package, instead of integrating these methods into the iml package. Currently, three methods are implemented in the R package – including What-If and MOC – but the R6-based interface makes it easy to add other counterfactual explanation methods in the future.
11.3.1 What-If Method
To illustrate the overall workflow of the package, we generate counterfactuals for Steve using the What-If approach (Wexler et al. 2019). The counterfactuals package relies on iml::Predictor() as a model wrapper and can, therefore, explain any prediction model fitted with the mlr3 package, including the random forest model we trained above.
predictor$predict(steve)
good
1 0.4537167
The random forest predicts a probability of 45.4% that Steve is creditworthy. The What-If method can be used to answer, e.g., how the features need to be changed such that Steve is classified as good with a probability of at least 60%. Since we have a classification model, we initialize a counterfactuals::WhatIfClassif() object. By calling $find_counterfactuals(), we generate a counterfactual for Steve, as sketched below.
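A sketch of the corresponding calls; the desired probability range [0.6, 1] follows the text, and the object name cfe is reused below:
library("counterfactuals")
whatif = WhatIfClassif$new(predictor, n_counterfactuals = 1L)
cfe = whatif$find_counterfactuals(steve,
  desired_class = "good", desired_prob = c(0.6, 1))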
cfe is a counterfactuals::Counterfactuals object which offers many visualization and evaluation methods. For example, $evaluate(show_diff = TRUE) shows how the features need to be changed.
age amount credit_history duration
1 -8 -3603 all credits at this bank paid back duly 3
employment_duration other_debtors property purpose savings
1 <NA> <NA> <NA> <NA> <NA>
status dist_x_interest no_changed dist_train dist_target
1 no checking account 0.242221 5 0 0
minimality
1 4
For a probability of at least 60% of being creditworthy, the age is reduced by 8 years and the amount is reduced by 3603 DM, while the duration is increased by 3 months. Furthermore, Steve’s credit history must be improved (he must have paid back all his credits at this bank duly, instead of having a critical account/other credits elsewhere), and his status must be "no checking account". For the employment duration, other debtors, property, purpose, and savings, no changes are required. Additional columns in cfe$evaluate() reveal the quality of the counterfactuals, for example, the number of required feature changes (no_changed) or the distance to the closest training data point (dist_train), which is 0 because the counterfactual is a training point.
11.3.2 MOC Method
Calling the MOC method is very similar to calling the What-If method. Instead of a counterfactuals::WhatIfClassif() object, we initialize a counterfactuals::MOCClassif() object. We set the epsilon parameter to 0 to penalize counterfactuals in the optimization process with predictions outside the desired range. With MOC, we can also prohibit changes in specific features, here age, via the fixed_features argument. For illustrative purposes, we let the multi-objective optimizer only run for 30 generations.
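A sketch of the corresponding calls; argument names follow the counterfactuals documentation, and the desired prediction is the same as for the What-If method above:
moc = MOCClassif$new(predictor, epsilon = 0, fixed_features = "age",
  n_generations = 30L)
cfe_multi = moc$find_counterfactuals(steve,
  desired_class = "good", desired_prob = c(0.6, 1))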
Since the multi-objective approach does not guarantee that all counterfactuals have the desired prediction, we remove all counterfactuals with predictions not equal to the desired prediction via the $subset_to_valid() method.
cfe_multi$subset_to_valid()
cfe_multi
5 Counterfactual(s)
Desired class: good
Desired predicted probability range: [0.6, 1]
Head:
age amount credit_history duration
1: 74 5129 all credits at this bank paid back duly 9
2: 74 5129 existing credits paid back duly till now 9
3: 74 5129 no credits taken/all credits paid back duly 9
6 variables not shown: [employment_duration, other_debtors, property, purpose, savings, status]
Overall, we generated 5 counterfactuals. Compared to the counterfactual of the What-If approach, MOC’s counterfactuals are artificially generated data points and are not necessarily equal to actual observations of the underlying data set. For a concise overview of the required feature changes, we can use the plot_freq_of_feature_changes() method. It visualizes the frequency of feature changes across all returned counterfactuals.
cfe_multi$plot_freq_of_feature_changes()
Relative frequency of feature changes.
We see that credit history and status are the only changed features. To see how the features have been changed, we can visualize the counterfactuals in a two-dimensional ICE plot, also called a surface plot, as sketched below.
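A minimal call (the feature names follow the plot described below):
cfe_multi$plot_surface(feature_names = c("status", "credit_history"))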
Two-dimensional individual conditional expectation plot for status and credit history.
The colors and contour lines indicate the predicted value of the model when credit history and status are varied while all other features are set to Steve’s values. The white point displays Steve, and the black points are the counterfactuals that only propose changes in these two features. We see that the proposed credit histories are better than or equal to Steve’s value of having a critical account/other credits elsewhere (changed to all credits at this bank paid back duly, existing credits paid back duly till now, or no credits taken/all credits paid back duly), and that the status is lowered once to no checking account, stays at … < 0 DM three times, and is increased once to … >= 200 DM / salary for at least 1 year.
11.4 The DALEX Package
The DALEX (Biecek 2018) package implements a similar set of methods as the iml package presented above, but the architecture of DALEX is oriented towards model comparison. The logic behind working with this package assumes that exploring models is an iterative process, and in successive iterations we want to compare different perspectives, including perspectives presented/learned by different models. This logic is commonly referred to as the Rashomon perspective, first described in the “Statistical Modeling: The Two Cultures” paper (Breiman 2001b) and more extensively developed and formalized as interactive explanatory model analysis (Baniecki, Parzych, and Biecek 2023).
You can use the DALEX package with any number of models built with the mlr3 package as well as with other frameworks in R. The counterpart in Python is the library dalex (Baniecki et al. 2021). In this chapter, we present code snippets and a general overview of this package. For illustrative purposes, we reuse the german_credit task built in Section 11.1 on the rchallenge::german data. Since we already presented the intuitions behind most of the methods in Section 11.2, we do not duplicate the descriptions below and focus on presenting the syntax and code snippets.
The analysis of a model is usually an interactive process starting with a shallow analysis – usually a single-number summary. Then in a series of subsequent steps, one can systematically deepen understanding of the model by exploring the importance of single variables or pairs of variables to an in-depth analysis of the relationship between selected variables to the model outcome. See Bücker et al. (2022) for a broader discussion of what the model exploration process looks like.
This explanatory model analysis (EMA) process can focus on a single observation, in which case we speak of local model analysis, or on a set of observations, in which case we speak of global model analysis. Below, we present these two scenarios in separate subsections. See Figure 11.2 for an overview of key functions that will be discussed. An in-depth description of this methodology can be found in the EMA book (Biecek and Burzykowski 2021).
Figure 11.2: Taxonomy of methods for model exploration presented in this chapter. The left part gives an overview of methods for global-level exploration, while the right part relates to local-level model exploration.
Predictive models in R have different internal structures. To be able to compare models created with different frameworks, an intermediate object – a wrapper – is needed to provide a consistent interface for accessing the model. Working with explanations in the DALEX package always starts with the creation of such a wrapper with the use of the DALEX::explain() function or its specialized variants. For models created with the mlr3 package, it is more convenient to use the dedicated DALEXtra::explain_mlr3() version. Let us use it to wrap the learner object.
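A sketch of this call is shown below. The target encoding (1 = "bad", 0 = "good") is an assumption on our part, chosen to be consistent with the mean predicted value and residuals in the output that follows; the label matches the output.
library("DALEX")
library("DALEXtra")
ranger_exp = explain_mlr3(learner,
  data  = credit_x,
  # assumed encoding: 1 = "bad", 0 = "good"
  y     = as.numeric(credit_y$credit_risk == "bad"),
  label = "Ranger Credit")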
Preparation of a new explainer is initiated
-> model label : Ranger Credit
-> data : 330 rows 10 cols
-> target variable : 330 values
-> predict function : yhat.LearnerClassif will be used ( default )
-> predicted values : No value for predict function target column. ( default )
-> model_info : package mlr3 , ver. 0.16.0 , task classification ( default )
-> predicted values : numerical, min = 0.005031746 , mean = 0.3008911 , max = 0.8951405
-> residual function : difference between y and yhat ( default )
-> residuals : numerical, min = -0.8951405 , mean = -0.0008911013 , max = 0.9577865
A new explainer has been created!
Tip
The DALEXtra::explain_mlr3() function performs a series of internal checks, so the output is a bit verbose. If any of the internal checks is alarming, the explain function output will show a yellow WARNING message. If any check does not pass, e.g., the predict function or the residuals cannot be calculated, a red ERROR message will appear. Set the verbose = FALSE argument to make the output less wordy.
ranger_exp
Model label: Ranger Credit
Model class: LearnerClassifRanger,LearnerClassif,Learner,R6
Data head :
age amount credit_history duration
1 35 6948 no credits taken/all credits paid back duly 36
2 28 1403 no credits taken/all credits paid back duly 15
employment_duration other_debtors property
1 1 <= ... < 4 yrs none building soc. savings agr. / life insurance
2 1 <= ... < 4 yrs none building soc. savings agr. / life insurance
purpose savings status
1 car (new) unknown/no savings account ... < 0 DM
2 others unknown/no savings account no checking account
The result of the explain function is an object of the class explainer in the S3 system. This object is a list containing at least: the model object, the dataset that will be used for the calculation of explanations, the predict function, the function that calculates residuals, the name/label of the model, and other additional information about the model. The generic print function for this object outputs three items: the name of the model (a label given by the user), information about the model’s classes (in this case, an object of the LearnerClassifRanger class which inherits from several broader classes), and the first few rows of the data used for the explanations.
11.4.2 Global level exploration
The global model analysis aims to understand how a model behaves on average on a set of observations. In the DALEX package, functions for global level analysis have names starting with the prefix model_.
11.4.2.1 Model Performance
As shown in Figure 11.2, the model exploration process starts by evaluating the performance of a model. This can be done with a variety of tools, but in the DALEX package the simplest is to use the DALEX::model_performance function. Since the explain function checks what type of task is being analyzed, it can select the appropriate performance measures for it. In our illustration, we have a binary classification model, so measures such as precision, recall, F1-score, accuracy, or AUC are calculated in the following snippet.
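A minimal sketch of this step; choosing geom = "roc" is our own decision to reproduce the ROC curve shown below:
ranger_performance = model_performance(ranger_exp)
plot(ranger_performance, geom = "roc")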
Graphical summary of model performance using a receiver operating characteristic (ROC) curve. The dashed gray line indicates the diagonal.
Tip
Various visual summaries may be selected with the geom argument. In credit risk, the LIFT curve is also a popular graphical summary.
11.4.2.2 Feature Importance
The process of model understanding usually starts with understanding which features are the most important. A popular technique for this is the permutation feature importance introduced in Section 11.2.2. More about this technique can be found in the chapter Variable-importance Measures of the EMA book.
The DALEX::model_parts() function calculates the importance of features and its results can be visualized with the generic plot() function.
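A minimal sketch with default settings; hiding the boxplots is an optional choice on our part:
ranger_effect = model_parts(ranger_exp)
plot(ranger_effect, show_boxplots = FALSE)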
Graphical summary of permutation importance of features. Each bar starts at the loss function for the original data and ends at the loss function for the data after permutation of a specific feature. The longer the bar, the larger the change in the loss function after permutation of the particular feature and therefore the more important the feature.
The bars start at the model loss (here 1 minus AUC) calculated for the selected data and end at the loss calculated for the data after the permutation of the selected feature. The more important the feature, the more the model loses after its permutation. The plot indicates that status is the most important feature.
Tip
The type argument in the model_parts function allows you to specify how the importance of the features is to be calculated, by the difference of the loss functions (type = "difference"), by the quotient (type = "ratio"), or without any transformation (type = "raw").
11.4.2.3 Feature Effects
Once we know which features are most important, we can use feature effect plots to understand their effects and to show how the model predictions, on average, change with changes in the selected features. The DALEX::model_profile() function calculates these effects and, by default, computes partial dependence profiles (see Section 11.2.3), as sketched below.
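A sketch of the call creating the ranger_profiles object plotted below; restricting the computation to the three numeric features shown in the figure is our own choice:
ranger_profiles = model_profile(ranger_exp,
  variables = c("age", "amount", "duration"))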
library("ggplot2")
plot(ranger_profiles) +
  theme(legend.position = "top") +
  ggtitle("Partial Dependence for Ranger Credit model", "")
Graphical summary of the model’s partial dependence profile for three selected variables (age, amount, duration). Each variable is shown in a separate panel. The jumpy response function is characteristic of a random forest model.
From the figure above, we can read the interesting information that the random forest model has learned a non-monotonic relationship for the features amount and age.
Tip
The type argument of the DALEX::model_profile() function also allows marginal profiles (with type = "conditional") and accumulated local profiles (with type = "accumulated") to be calculated.
11.4.3 Local level explanation
The local model analysis aims to understand how a model behaves for a single observation. In the DALEX package, functions for local analysis have names starting with the prefix predict_.
We will carry out the following examples using a sample credit application, labeled as Steve.
steve = credit_x[249, ]
steve
age amount credit_history duration
249 74 5129 critical account/other credits elsewhere 9
employment_duration other_debtors property purpose
249 >= 7 yrs none real estate car (new)
savings status
249 unknown/no savings account ... < 0 DM
11.4.3.1 Model Prediction
As shown in Figure 11.2, the local analysis starts with the calculation of a model prediction. It turns out that for the selected credit application, the chances are quite high that it will not be paid.
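A minimal call, using the generic predict() method for explainer objects:
predict(ranger_exp, steve)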
11.4.3.2 Break-down Plots
A popular technique for assessing the contributions of features to a model prediction is break-down (see the Break-down Plots for Additive Attributions chapter in the EMA book for more information about this method).
The DALEX::predict_parts() function calculates the attributions of features, and its results can be visualized with the generic plot() function.
ranger_attributions = predict_parts(ranger_exp, new_observation = steve)
plot(ranger_attributions) + ggtitle("Break Down for Steve")
Graphical summary of local attributions of features calculated by the break-down method. Positive attributions are shown in green and negative attributions in red. The violet bar corresponds to the model prediction for the explained observation and the dashed line corresponds to the average model prediction.
Looking at the plot above, we can see that the biggest contributors to Steve’s final prediction were the features credit_history and status.
Tip
The order argument allows you to indicate the selected order of the features. This is a useful option when the features have some relative conditional importance (e.g. pregnancy and sex).
11.4.3.3 Shapley Values
By far the most popular technique for local model exploration (Holzinger et al. 2022) is Shapley values introduced in Section 11.2.5. The most popular algorithm for estimating these values is the SHAP algorithm. A detailed description of the method and algorithm can be found in the chapter SHapley Additive exPlanations (SHAP) for Average Attributions of the EMA book.
The DALEX::predict_parts() function also calculates SHAP attributions; you just need to set type = "shap". Its results can be visualized with the generic plot() function.
ranger_shap = predict_parts(ranger_exp, new_observation = steve, type = "shap")
plot(ranger_shap, show_boxplots = FALSE) +
  ggtitle("Shapley values for Steve", "")
Graphical summary of local attributions of features calculated by the shap method. Positive attributions are shown in green and negative attributions in red.
The results for Break Down and SHAP methods are generally similar. Differences will emerge if there are many complex interactions in the model.
Tip
Shapley values can take a long time to compute. This process can be sped up at the expense of accuracy. The parameters B and N can be used to tune this trade-off, where N is the number of observations on which conditional expectation values are estimated (500 by default) and B is the number of random paths used to calculate Shapley values (25 by default).
11.4.3.4 Ceteris Paribus Effects
In the previous section, we introduced a global explanation: partial dependence plots. Ceteris paribus plots are the local version of that explanation. They show the response of a model when only one feature is changed while the others stay unchanged. Ceteris paribus profiles are also known as individual conditional expectation (ICE) curves. More on this technique can be found in Section 11.2.3 and the chapter Ceteris-paribus Profiles of the EMA book.
The function DALEX::predict_profile() calculates ceteris paribus profiles which can be visualized with the generic plot() function. The blue dot stands for the prediction for Steve.
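A sketch of the corresponding calls; restricting the plot to the three variables shown in the figure is our own choice:
ranger_ceteris = predict_profile(ranger_exp, steve)
plot(ranger_ceteris, variables = c("age", "amount", "duration")) +
  ggtitle("Ceteris paribus for Steve", "")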
Graphical summary of local model response profiles calculated by the ceteris-paribus method for three selected variables (age, amount, duration). Each variable is shown in a separate panel.
Tip
Ceteris paribus profiles are often plotted simultaneously for a set of observations. This helps to see potential deviations from the additivity of the model. To draw profiles for multiple observations, simply point to more rows in the data for the predict_profile function. Deviation from additivity is manifested by profiles that are not parallel.
11.5 Training or Test Data?
According to Chapter 3, performance evaluation should not be conducted on the training data but on unseen data to receive unbiased estimates for the performance. Similar considerations play a role in the choice of the underlying data set used for post-hoc interpretations of a model: Should model interpretation be based on training data or test data?
This question is especially relevant when we use a performance-based interpretation method such as the permutation feature importance (PFI, see also Section 11.2.2). For example, if the PFI is computed based on the training data, the reported feature importance values may overestimate the actual importance: the model may have learned to rely on specific features during training that may not be as important for generalization to new data. This comes from the fact that the performance measured on the training data is overly optimistic and permuting a feature would lead to a huge drop in the performance. In contrast, computing the PFI on the test data provides a more accurate estimate of the feature importance as the performance measured on test data is a more reliable estimate. Thus, for the same reasons performance evaluation of a model should be performed on an independent test set (see Chapter 3), it is also recommended to apply a performance-based interpretation method such as the PFI on an independent test set.
Note
For performance evaluation, it is generally recommended to use resampling with more than one iteration, e.g., k-fold cross-validation or (repeated) subsampling, instead of a single train-test split as performed by the hold-out method (see Chapter 3). However, for the sake of simplicity, we will use a single test data set when applying the interpretation methods to produce explanations throughout this chapter. Since produced explanations can be considered as statistical estimates, a higher number of resampling iterations can reduce the variance and result in a more reliable explanation (Molnar et al. 2021).
For prediction-based methods that do not require performance estimation such as ICE/PD (Section 11.2.3) or Shapley values (Section 11.2.5), the differences in interpretation between training and test data are less pronounced (Molnar et al. 2022).
11.6 Conclusions
In this chapter, we learned how to gain post-hoc insights into a model trained with mlr3 by using the most popular approaches from the field of interpretable machine learning. The methods are all model-agnostic such that they do not depend on specific model classes. We utilized three different packages: iml, counterfactuals and DALEX. iml and DALEX offer a wide range of (partly) overlapping methods, while the counterfactuals package focuses solely on counterfactual methods. We demonstrated on the rchallenge::german data set how these packages offer an in-depth analysis of a random forest model fitted with mlr3. In the following, we show some limitations of the presented methods.
If features are correlated, the insights from the interpretation methods should be treated with caution. Changing the feature values of an observation without taking the correlation with other features into account leads to unrealistic combinations of the feature values. Since such feature combinations are also unlikely part of the training data, the model will likely extrapolate in these areas (Molnar et al. 2022; Hooker and Mentch 2019). This distorts the interpretation of methods that are based on changing single feature values such as PFI, PD plots, and Shapley values. Alternative methods can help in these cases: conditional feature importance instead of PFI (Strobl et al. 2008; Watson and Wright 2021), accumulated local effect plots instead of PD plots (Apley and Zhu 2020), and the KernelSHAP method instead of Shapley values (Lundberg, Erion, and Lee 2019b).
The explanations derived from an interpretation method can also be ambiguous: a method can deliver multiple equally plausible but potentially contradicting explanations. This phenomenon is also called the Rashomon effect (Breiman 2001b). Differing hyperparameters can be one reason; for example, local surrogate models react very sensitively to changes in the underlying weighting scheme of observations. Even with fixed hyperparameters, the underlying data set or the initial seed can lead to disparate explanations (Molnar et al. 2022).
The rchallenge::german data is only a low-dimensional dataset with a limited number of observations. Applying interpretation methods off-the-shelf to higher-dimensional datasets is often not feasible due to the enormous computational costs. That is why more efficient methods have been proposed in recent years, e.g., Shapley value computations based on kernel-based estimators (SHAP). Another challenge is that the output generated for high-dimensional datasets can overwhelm users. If the features can be meaningfully grouped, grouped versions of the methods, e.g. the grouped feature importance proposed by Au et al. (2022), can be applied.
11.7 Exercises
Model explanation allows us to confront our expert knowledge related to the problem with the relations learned by the model. The following tasks are based on predictions of the value of football players based on data from the FIFA game. It is an approachable example, as most people have some intuition about how a footballer’s age or skill can affect their value. The latest FIFA statistics can be downloaded from kaggle.com, but one can also use the 2020 data available in the DALEX package (see the DALEX::fifa data set). The following exercises can be performed with both the iml and DALEX packages; we provide solutions for both.
Prepare an mlr3 regression task for the fifa data. Select only features describing the age and skills of footballers. Train a predictive model of your own choice on this task, e.g. regr.ranger, to predict the value of a footballer.
Use the permutation importance method to calculate feature importance ranking. Which feature is the most important? Do you find the results surprising?
Use the partial dependence plot/profile to draw the global behavior of the model for this feature. Is it aligned with your expectations?
Choose one of the football players, for example, a well-known striker (e.g., Robert Lewandowski) or a well-known goalkeeper (e.g., Manuel Neuer). The following tasks are worth repeating for several different choices.
For the selected footballer, calculate and plot the Shapley values. Which feature is locally the most important and has the strongest influence on the valuation of the footballer?
For the selected footballer, calculate the ceteris paribus profiles / individual conditional expectation curves to draw the local behavior of the model for this feature. Is it different from the global behavior?
Apley, Daniel W., and Jingyu Zhu. 2020. “Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models.” Journal of the Royal Statistical Society Series B: Statistical Methodology 82 (4): 1059–86. https://doi.org/10.1111/rssb.12377.
Au, Quay, Julia Herbinger, Clemens Stachl, Bernd Bischl, and Giuseppe Casalicchio. 2022. “Grouped Feature Importance and Combined Features Effect Plot.” Data Mining and Knowledge Discovery 36 (4): 1401–50. https://doi.org/10.1007/s10618-022-00840-5.
Baniecki, Hubert, and Przemyslaw Biecek. 2019. “modelStudio: Interactive Studio with Explanations for ML Predictive Models.” Journal of Open Source Software 4 (43): 1798. https://doi.org/10.21105/joss.01798.
Baniecki, Hubert, Wojciech Kretowicz, Piotr Piątyszek, Jakub Wiśniewski, and Przemysław Biecek. 2021. “dalex: Responsible Machine Learning with Interactive Explainability and Fairness in Python.” Journal of Machine Learning Research 22 (214): 1–7. http://jmlr.org/papers/v22/20-1473.html.
Baniecki, Hubert, Dariusz Parzych, and Przemyslaw Biecek. 2023. “The Grammar of Interactive Explanatory Model Analysis.” Data Mining and Knowledge Discovery. https://doi.org/10.1007/s10618-023-00924-w.
Biecek, Przemyslaw. 2018. “DALEX: Explainers for Complex Predictive Models in R.” Journal of Machine Learning Research 19 (84): 1–5. http://jmlr.org/papers/v19/18-416.html.
Biecek, Przemyslaw, and Tomasz Burzykowski. 2021. Explanatory Model Analysis. Chapman & Hall/CRC, New York. https://ema.drwhy.ai/.
Breiman, Leo. 2001a. “Random Forests.” Machine Learning 45 (1): 5–32. https://doi.org/10.1023/A:1010933404324.
———. 2001b. “Statistical Modeling: The Two Cultures (with Comments and a Rejoinder by the Author).” Statistical Science 16 (3). https://doi.org/10.1214/ss/1009213726.
Bücker, Michael, Gero Szepannek, Alicja Gosiewska, and Przemyslaw Biecek. 2022. “Transparency, Auditability, and Explainability of Machine Learning Models in Credit Scoring.” Journal of the Operational Research Society 73 (1): 70–90. https://doi.org/10.1080/01605682.2021.1922098.
Dandl, Susanne, Christoph Molnar, Martin Binder, and Bernd Bischl. 2020. “Multi-Objective Counterfactual Explanations.” In Parallel Problem Solving from Nature PPSN XVI, 448–69. Springer International Publishing. https://doi.org/10.1007/978-3-030-58112-1_31.
Fisher, Aaron, Cynthia Rudin, and Francesca Dominici. 2019. “All Models Are Wrong, but Many Are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously.” https://doi.org/10.48550/arxiv.1801.01489.
Friedman, Jerome H. 2001. “Greedy Function Approximation: A Gradient Boosting Machine.” The Annals of Statistics 29 (5). https://doi.org/10.1214/aos/1013203451.
Goldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. 2015. “Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” Journal of Computational and Graphical Statistics 24 (1): 44–65. https://doi.org/10.1080/10618600.2014.907095.
Greenwell, Brandon M, Bradley C Boehmke, and Andrew J McCarthy. 2018. “A Simple and Effective Model-Based Variable Importance Measure.” arXiv Preprint arXiv:1805.04755.
Groemping, Ulrike. 2019. “South German Credit Data: Correcting a Widely Used Data Set.” Rep. Math., Phys. Chem., Berlin, Germany, Tech. Rep 4: 2019.
Holzinger, Andreas, Anna Saranti, Christoph Molnar, Przemyslaw Biecek, and Wojciech Samek. 2022. “Explainable AI Methods - a Brief Overview.” International Workshop on Extending Explainable AI Beyond Deep Models and Classifiers, 13–38. https://doi.org/10.1007/978-3-031-04083-2_2.
Krzyziński, Mateusz, Mikołaj Spytek, Hubert Baniecki, and Przemysław Biecek. 2023. “SurvSHAP(t): Time-Dependent Explanations of Machine Learning Survival Models.” Knowledge-Based Systems 262: 110234. https://doi.org/10.1016/j.knosys.2022.110234.
Lundberg, Scott M., Gabriel G. Erion, and Su-In Lee. 2019a. “Consistent Individualized Feature Attribution for Tree Ensembles.” https://arxiv.org/abs/1802.03888.
Molnar, Christoph, Bernd Bischl, and Giuseppe Casalicchio. 2018. “iml: An R Package for Interpretable Machine Learning.” JOSS 3 (26): 786. https://doi.org/10.21105/joss.00786.
Molnar, Christoph, Timo Freiesleben, Gunnar König, Giuseppe Casalicchio, Marvin N Wright, and Bernd Bischl. 2021. “Relating the Partial Dependence Plot and Permutation Feature Importance to the Data Generating Process.” arXiv Preprint arXiv:2109.01433.
Molnar, Christoph, Gunnar König, Julia Herbinger, Timo Freiesleben, Susanne Dandl, Christian A. Scholbeck, Giuseppe Casalicchio, Moritz Grosse-Wentrup, and Bernd Bischl. 2022. “General Pitfalls of Model-Agnostic Interpretation Methods for Machine Learning Models.” In xxAI - Beyond Explainable AI: International Workshop, Held in Conjunction with ICML 2020, July 18, 2020, Vienna, Austria, Revised and Extended Papers, edited by Andreas Holzinger, Randy Goebel, Ruth Fong, Taesup Moon, Klaus-Robert Müller, and Wojciech Samek, 39–68. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-031-04083-2_4.
Ribeiro, Marco, Sameer Singh, and Carlos Guestrin. 2016. ““Why Should I Trust You?”: Explaining the Predictions of Any Classifier.” In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, 97–101. San Diego, California: Association for Computational Linguistics. https://doi.org/10.18653/v1/N16-3020.
Romaszko, Kamil, Magda Tatarynowicz, Mateusz Urbański, and Przemysław Biecek. 2019. “modelDown: Automated Website Generator with Interpretable Documentation for Predictive Machine Learning Models.”Journal of Open Source Software 4 (38): 1444. https://doi.org/10.21105/joss.01444.
Strobl, Carolin, Anne-Laure Boulesteix, Thomas Kneib, Thomas Augustin, and Achim Zeileis. 2008. “Conditional Variable Importance for Random Forests.”BMC Bioinformatics 9 (1). https://doi.org/10.1186/1471-2105-9-307.
Štrumbelj, Erik, and Igor Kononenko. 2013. “Explaining Prediction Models and Individual Predictions with Feature Contributions.”Knowledge and Information Systems 41 (3): 647–65. https://doi.org/10.1007/s10115-013-0679-x.
Wachter, Sandra, Brent Mittelstadt, and Chris Russell. 2017. “Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR.”SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3063289.
Watson, David S, and Marvin N Wright. 2021. “Testing Conditional Independence in Supervised Learning Algorithms.”Machine Learning 110 (8): 2107–29.
Wexler, James, Mahima Pushkarna, Tolga Bolukbasi, Martin Wattenberg, Fernanda Viégas, and Jimbo Wilson. 2019. “The What-If Tool: Interactive Probing of Machine Learning Models.”IEEE Transactions on Visualization and Computer Graphics 26 (1): 56–65. https://doi.org/10.1109/TVCG.2019.2934619.
Wiśniewski, Jakub, and Przemysław Biecek. 2022. “The r Journal: Fairmodels: A Flexible Tool for Bias Detection, Visualization, and Mitigation in Binary Classification Models.”The R Journal 14: 227–43. https://doi.org/10.32614/RJ-2022-019.
```{r interpretation-003}
library("mlr3")
task = tsk("german_credit")
task$select(cols = c("duration", "amount", "age", "status", "savings", "purpose",
  "credit_history", "property", "employment_duration", "other_debtors"))
task
```

We want to fit a random forest model using `r ref_pkg("mlr3")`. We use 2/3 of the data to train the model and the remaining 1/3 to analyze the trained model using different interpretation methods. A detailed discussion on which data set to use for interpretation is given in @sec-dataset.

```{r, interpretation--005-model}
library("mlr3learners")
set.seed(1L)
split = partition(task)
```
We fit a `"classif.ranger"` learner, i.e. a classification random forest, on the training data to predict the probability of being creditworthy (by setting `predict_type = "prob"`):

```{r, interpretation--0006-model}
learner = lrn("classif.ranger", predict_type = "prob")
learner$train(task, row_ids = split$train)
```

## The `iml` Package {#sec-iml}

The `r ref_pkg("iml")` package [@Molnar2018] implements a variety of model-agnostic interpretation methods and is based on the R6 class system, just like the `r ref_pkg("mlr3")` package. It provides a unified interface to the implemented methods and facilitates the analysis and interpretation of machine learning models. Below, we provide examples of how to use the `r ref_pkg("iml")` package with the random forest model fitted above with the `r ref_pkg("mlr3")` package.

### The Predictor Object

The `r ref_pkg("iml")` package supports machine learning models (for classification or regression) fitted by *any* R package. To ensure that this works seamlessly, fitted models need to be wrapped in an `r ref("iml::Predictor")` object to unify the input-output behavior of the trained models. An `r ref("iml::Predictor")` object contains the prediction model as well as the data used for analyzing the model and producing the desired explanation. We construct the `r ref("iml::Predictor")` object using the test data, which will be used as the basis for the presented model interpretation methods:

```{r iml-Predictor}
library("iml")
credit_x = task$data(rows = split$test, cols = task$feature_names)
credit_y = task$data(rows = split$test, cols = task$target_names)
predictor = Predictor$new(learner, data = credit_x, y = credit_y)
```

::: {.callout-tip}
The constructor `r ref("iml::Predictor", text = "iml::Predictor$new()")` has an (optional) input argument `predict.function` which requires a function that predicts on new data. For models fitted with the packages `r ref_pkg("mlr3")`, `r ref_pkg("mlr")` or `r ref_pkg("caret")`, a default `predict.function` is already implemented in the `r ref_pkg("iml")` package and the `predict.function` argument does not need to be specified. For models fitted with other packages, the model-specific `predict` function of that package is used by default. Passing a custom `predict.function` to unify the output of the model-specific `predict` function might be necessary for some packages. This is especially needed if the model-specific `predict` function does not produce a vector of predictions (in the case of regression tasks or classification tasks that predict discrete classes instead of probabilities) or a `data.frame` with as many columns as class labels (in the case of classification tasks that predict a probability for each class label).
:::
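As a minimal sketch of what such a wrapper can look like (not needed here, since `learner` is an `mlr3` model; the wrapper shown below is purely illustrative), a custom `predict.function` only has to return one probability column per class:

```{r, eval = FALSE}
# Hypothetical wrapper: turn the learner's probability matrix into a data.frame
# with one column per class, which is the format iml expects for classification.
custom_predict = function(model, newdata) {
  as.data.frame(model$predict_newdata(newdata)$prob)
}
predictor_custom = Predictor$new(learner, data = credit_x, y = credit_y,
  predict.function = custom_predict)
```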
### Feature Importance {#sec-feat-importance}

When deploying a model in practice, it is often of interest which features contribute the most towards the *predictive performance* of the model. On the one hand, this is useful to better understand the problem at hand and the relationship between the features and the target to be predicted. On the other hand, it can also be useful to identify irrelevant features and potentially remove them, using feature importance methods as the basis for feature filtering (see @sec-fs-filter). In the context of this book, we use the term `r index("feature importance")` to describe global methods that calculate a single score per feature reflecting its importance with regard to a certain quantity of interest, such as the model performance. The calculated feature importance is reported as a single numeric score per feature and allows ranking the features according to their importance.

::: {.callout-tip}
It should be noted that there are also other notions of feature importance that are not based on a performance measure, such as the SHAP feature importance, which is based on Shapley values (see @sec-shapley and @lundberg2019consistent), or the partial dependence-based feature importance introduced in @greenwell2018simple, which is based on the variance of a PD plot (see @sec-feature-effects).
:::

One of the most popular methods is the permutation feature importance (PFI), originally introduced by @breiman2001random for random forests and adapted by @Fisher2019pfi as a model-agnostic feature importance measure, which they called *model reliance*. The PFI quantifies the importance of a feature with regard to the model performance (measured by a performance measure of interest). Specifically, the PFI measures the change in the model performance before and after permuting a feature (the former refers to the original model performance). Permutation means that the observed feature values in a dataset are randomly shuffled. This destroys the dependency structure of the feature with the target variable and all other features while maintaining the marginal distribution of the feature. The intuition of PFI is simple: if a feature is not important, permuting it should not affect the model performance; the larger the measured change in model performance, the more important the feature is deemed.
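To make this intuition concrete, a single permutation step can be sketched by hand with the objects defined above (an illustrative sketch only; `r ref("iml::FeatureImp")`, used below, automates and repeats this process):

```{r, eval = FALSE}
# One hand-rolled PFI step for the feature "status", using the classification
# error as the performance measure (one permutation, no repetitions).
ce = function(data) {
  pred = learner$predict_newdata(data)$response
  mean(pred != credit_y$credit_risk)
}
permuted = data.table::copy(credit_x)
permuted$status = sample(permuted$status)   # destroy the feature's association with the target
c(original = ce(credit_x), permuted = ce(permuted))
```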
Therefore, the PFI summarizes the change in model performance after destroying the information of each feature through permutation in a single numeric feature importance score (which reflects either the difference or the ratio of the performance before and after permutation). In general, repeating the permutation process and aggregating the individual importance values is recommended, because the results can be unreliable due to the randomness of the permutation process. The PFI method requires a labeled dataset (for which both the features and the true target are given), a trained model, and a performance measure to quantify the change in model performance after permutation.

We now analyze which features were the most important ones according to the PFI for our fitted random forest model to classify creditworthiness. We initialize an `r ref("iml::FeatureImp")` object with the model and the classification error (`"ce"`) as the performance measure and visualize the results with the `$plot()` method.

```{r iml-007}
#| fig-height: 3
#| fig-cap: Permutation feature importance (PFI). Points indicate the median and bars the 5 % and 95 % quantiles of the PFI over 5 repetitions of the permutation process.
#| fig-alt: Plot of the permutation feature importance values per feature of the credit data set. The permutation process was repeated 5 times. The points on the plot indicate the median and bars indicate the 5 % and 95 % quantiles of the permutation feature importance over the 5 repetitions.
importance = FeatureImp$new(predictor, loss = "ce")
importance$plot()
```

By default, `r ref("iml::FeatureImp")` repeats the permutation 5 times, and in each repetition, the importance value corresponding to the change in the classification error is calculated. The `$plot()` method of the `r ref("iml::FeatureImp")` object shows the median of the 5 resulting importance values (as a point), and the boundaries of the error bar in the plot refer to the 5 % and 95 % quantiles of the importance values.

::: {.callout-tip}
The number of repetitions can be set with the input argument `n.repetitions` of `r ref("iml::FeatureImp", text = "iml::FeatureImp$new()")`. By default, `r ref("iml::FeatureImp")` uses the ratio of the model performance before and after permutation as the importance value. Alternatively, the difference of the two performance values can be used by setting `compare = "difference"` in `r ref("iml::FeatureImp", text = "iml::FeatureImp$new()")`.
:::

We see that `r importance$results$feature[1]` is the most important feature. That is, if we permute the `r importance$results$feature[1]` column in the data, the classification error of our model increases by a factor of around `r round(importance$results$importance[1], 2)`.

### Feature Effects {#sec-feature-effects}

`r index("Feature effect")` methods describe how or to what extent a feature contributes towards the *model predictions* by analyzing how the predictions change when the feature is changed. For example, these methods are useful to identify whether the model estimated a non-linear relationship between a feature of interest and the target.
In general, we distinguish between local and global feature effect methods. Global feature effect methods describe how a prediction changes *on average* when a feature is changed. In contrast, local feature effect methods address the question of how a *single* prediction of a considered observation changes when a feature value is changed. To a certain extent, local feature effect methods can also reveal interactions in the model: these become visible when the local effects are heterogeneous, i.e., when the changes in the local effect differ across observations.

A popular global method to visualize feature effects is the partial dependence (PD) plot [@Friedman2001pdp]. It visualizes how the model predictions change on average if we vary the values of a certain feature of interest. Later, @Goldstein2015ice showed that the global feature effect as visualized by a PD plot can be disaggregated into local feature effects associated with single observations. Specifically, they introduced individual conditional expectation (ICE) curves, a visual tool for local feature effects, and showed that the PD plot is just the average of the ICE curves. ICE curves display how the prediction of a *single* observation changes when varying a feature of interest while all other features stay constant (ceteris paribus). Hence, for each observation, the values of the feature of interest are replaced by multiple other values (e.g., on an equidistant grid over the feature's value range) to inspect the changes in the model prediction. Specifically, an ICE curve shows on the x-axis the set of feature values used to replace the feature of interest and on the y-axis the prediction of the model after the original feature value of the considered observation has been replaced. Hence, each ICE curve is a local explanation that assesses the feature effect of a *single* observation on the model prediction. A heterogeneous shape of the ICE curves, i.e., ICE curves that are not parallel and might cross each other, indicates that the model may have estimated an interaction involving the considered feature.
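Written down explicitly (a standard formulation following the references above, with $\hat{f}$ denoting the fitted model, $x_S$ the feature of interest, and $x_C^{(i)}$ the remaining feature values of observation $i$), the ICE curve of observation $i$ and the PD function are

$$
\hat{f}_S^{(i)}(x_S) = \hat{f}\big(x_S, x_C^{(i)}\big), \qquad
\hat{f}_S^{\mathrm{PD}}(x_S) = \frac{1}{n} \sum_{i=1}^{n} \hat{f}\big(x_S, x_C^{(i)}\big),
$$

so the PD curve is simply the pointwise average of the $n$ ICE curves.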
::: {.callout-note}
Feature effects are very similar to the regression coefficients $\beta$ in linear models, which offer interpretations such as "if you increase this feature by one unit, your prediction increases on average by $\beta$ if all other features stay constant". However, feature effects can convey not only linear effects but also more complex ones (similar to splines in generalized additive models) and can be applied to any type of predictive model.
:::

Now, we inspect how the feature `amount` influences the predictions of the creditworthiness classification. For this purpose, we compute feature effects using PD plots and ICE curves. We generally recommend accompanying PD plots with ICE curves because showing the PD plot alone might obfuscate heterogeneous effects and interactions. To plot the PD plot and ICE curves, we initialize an `r ref("iml::FeatureEffect")` object with the feature name and select `method = "pdp+ice"`. Again, we use the `$plot()` method to visualize the results:

```{r iml-pdp}
#| fig-height: 3
#| fig-cap: Partial dependence (PD) plot (yellow) and individual conditional expectation (ICE) curves (black) that show how the credit amount affects the predicted credit risk.
#| fig-alt: Feature effect plot that shows how the credit amount affects the prediction of being a good or bad risk. Individual conditional expectation curves on the plot highlight how changing the amount for individuals affects their risk prediction. Averaging these curves results in the partial dependence plot that is also shown.
effect = FeatureEffect$new(predictor, feature = "amount", method = "pdp+ice")
effect$plot()
```

As we have a binary classification task at hand, we see PD and ICE plots for the predicted probabilities of both classes separately. The plot shows that if the `amount` is smaller than roughly 10,000 DM (standing for Deutsche Mark, the currency in Germany before the Euro was introduced as cash in 2002), there is on average a high chance that the creditworthiness is `good`.
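If effect plots for several features are of interest at once, `r ref("iml::FeatureEffects")` (note the plural) computes them in a single call. A short sketch for three numeric features of the credit task:

```{r, eval = FALSE}
# Partial dependence plots for several features in one call.
effects = FeatureEffects$new(predictor,
  features = c("amount", "duration", "age"), method = "pdp")
effects$plot()
```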
### Surrogate Models

Interpretable models such as decision trees or linear models can be used as `r index("surrogate models")` to approximate or mimic an (often very complex) black-box model. Inspecting the surrogate model provides insights into the behavior of the black-box model. For example, the learned structure of a decision tree or the learned coefficients of a linear model can be easily interpreted and provide information on how the features formed the prediction. We differentiate between local surrogate models, which approximate a model locally around a specific data point of interest, and global surrogate models, which approximate the model across the entire input space [@Ribeiro2016lime; @Molnar2022]. The surrogate model is usually trained on the same data used to train the black-box model, or on data with the same distribution, to ensure a representative input space. For local surrogate models, the data used to train the surrogate model is additionally weighted according to its closeness to the data point of interest. In either case, the predictions obtained from the black-box model are used as the target to train the surrogate model that approximates the black-box model.

#### Global Surrogate Model

With `r ref("iml::TreeSurrogate")`, we can fit a tree-based surrogate model, more specifically a `r ref("partykit::ctree()")` model from the `r ref_pkg("partykit")` package, to the predictions of the random forest model we want to analyze.

```{r iml-globalsurrogate, message=FALSE}
#| fig-cap: Distribution of the predicted outcomes for each leaf node identified by the tree surrogate. The first two leaf nodes consist of applications with a positive balance of the checking account (`status` is either `"0 <= ... < 200 DM"`, `"... >= 200 DM"` or `"salary for at least 1 year"`) and either a duration of at most 36 months (first node) or of more than 36 months (second node). The third and fourth nodes contain applicants that either have no checking account or a negative balance (`status`). For the third node the additional requirement was a duration of at most 21 months, for the fourth node a duration of more than 21 months.
#| fig-alt: Distribution of the predicted outcomes for each leaf node identified by the tree surrogate. The first two leaf nodes consist of applications with a positive balance of the checking account (`status` is either `"0 <= ... < 200 DM"`, `"... >= 200 DM"` or `"salary for at least 1 year"`) and either a duration of at most 36 months (first node) or of more than 36 months (second node). The third and fourth nodes contain applicants that either have no checking account or a negative balance (`status`). For the third node the additional requirement was a duration of at most 21 months, for the fourth node a duration of more than 21 months.
treesurrogate = TreeSurrogate$new(predictor, maxdepth = 2L)
treesurrogate$plot()
```

As we have used `maxdepth = 2`, the surrogate tree performs two levels of binary splits, yielding four leaf nodes. `r ref("iml::TreeSurrogate")` extracts the decision rules found by the tree surrogate (that lead to the leaf nodes), and the `$plot()` method visualizes the distribution of the predicted outcomes for each leaf node. For example, we can see that the leaf node shown in the top left bar chart consists only of creditworthy applicants and results from the splits `duration <= 36` and `status` being either `0 <= ... < 200 DM` or `... >= 200 DM / salary for at least 1 year` (i.e., the balance of the credit applicant's checking account is positive).

We can access the trained tree surrogate via the `$tree` field of the fitted `r ref("iml::TreeSurrogate")` object and can, therefore, apply any function provided by the `r ref_pkg("partykit")` package to the model object, e.g., `r ref("partykit::print.party()")` to show the performed splits:

```{r iml-globalsurrogate-tree}
partykit::print.party(treesurrogate$tree)
```

Since the surrogate model only uses the predictions of the black-box model (here, the random forest) and not the real outcomes of the underlying data, the conclusions drawn from the surrogate model do not apply to the data in general, but only to the black-box model (provided the approximation of the surrogate model is accurate).
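A quick quantitative check of this approximation quality is the `$r.squared` field of the fitted `r ref("iml::TreeSurrogate")` object, which measures how well the surrogate tree reproduces the predictions of the black-box model (for multiclass tasks, it contains one value per class):

```{r, eval = FALSE}
# R squared of the surrogate w.r.t. the black-box predictions (1 = perfect fidelity).
treesurrogate$r.squared
```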
We can also evaluate how well the surrogate model approximates the prediction model by directly comparing the predictions of the tree surrogate with the predictions of the black-box model. For example, we can produce a cross table comparing the classes predicted by the random forest with the classes predicted by the surrogate tree:

```{r iml-crosstable}
pred_surrogate = treesurrogate$predict(credit_x, type = "class")
pred_rf = learner$predict_newdata(credit_x)$response
prop.table(table(pred_rf, pred_surrogate$.class))
mean(pred_rf == pred_surrogate$.class)
```

Mostly, the classes predicted by the black-box model and by the surrogate overlap (with around `r round(mean(pred_rf == pred_surrogate$.class)*100)`% matching predictions).

#### Local Surrogate Model

In general, it can be very difficult to accurately approximate the black-box model with an interpretable one across the entire feature space. However, it is much simpler if we only focus on a small area of the feature space surrounding a specific point -- the point of interest. For a local surrogate model, we conduct the following steps (a hand-rolled sketch follows the list):

1. We obtain predictions from the black-box model for a given data set.
2. We weight the observations in this data set by their proximity to our point of interest.
3. We fit an interpretable model on the weighted data set using the predictions of the black-box model as the target.
4. We exploit the interpretable nature of the surrogate model to explain the prediction of our point of interest.
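The following minimal sketch walks through these four steps by hand for an arbitrary point of interest (here simply the first row of the test data). The use of `gower::gower_dist()` and the squared-kernel weights are illustrative assumptions; `r ref("iml::LocalModel")`, used below, implements this properly with an L1-penalized model.

```{r, eval = FALSE}
x_interest = credit_x[1, ]                                     # point of interest
bb_pred = predictor$predict(credit_x)[["good"]]                # 1. black-box predictions
w = (1 - gower::gower_dist(x_interest, credit_x))^2            # 2. proximity weights
local_fit = lm(bb_pred ~ duration + amount + credit_history,   # 3. weighted interpretable model
  data = credit_x, weights = w)
coef(local_fit)                                                # 4. inspect the coefficients
```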
To illustrate these steps with the `r ref_pkg("iml")` package, we first select a data point whose prediction we want to explain. Here, we select an observation in the data set, let us call him Steve.

```{r steve, asis='results'}
steve = credit_x[249, ]
steve
```

For Steve, the model predicts the class `r names(which.max(predictor$predict(steve)))` with `r round(max(predictor$predict(steve))*100, 1)`% probability:

```{r steve2, asis='results'}
predictor$predict(steve)
```

As a next step, we use `r ref("iml::LocalModel")`, which fits a locally weighted linear regression model whose estimated coefficients explain why Steve was classified as having bad creditworthiness. The underlying surrogate model is L1-penalized such that only a pre-defined number of features per class, say `k`, will have a non-zero coefficient. The default is `k = 3L`, which shows the three most influential features.

::: {.callout-note}
The argument `gower.power` specifies the size of the neighborhood for the local model (default is `gower.power = 1`). The smaller the value, the more strongly the local model will focus on points closer to the point of interest (here: Steve). If the prediction of the local model and the prediction of the black-box model differ greatly, you can change the value of `gower.power` and, e.g., increase the number of non-zero coefficients `k` to obtain a more accurate local model.
:::

```{r iml-localsurrogate, message=FALSE}
predictor$class = "good"
localsurrogate = LocalModel$new(predictor, steve, gower.power = 0.01)
localsurrogate$results[, c("beta", "effect", "x.recoded")]
```

From the table, we see that the three *locally* most influential features are status, amount, and credit history. A closer look at the coefficients reveals that all three features have a negative effect.
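As the callout above suggests, it is worth checking how closely the local surrogate reproduces the black-box prediction for Steve before interpreting its coefficients. A quick sketch, assuming the objects defined above:

```{r, eval = FALSE}
localsurrogate$predict(steve)   # prediction of the local surrogate model
predictor$predict(steve)        # prediction of the random forest
```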
### Shapley Values {#sec-shapley}

`r index("Shapley values")` were originally developed in cooperative game theory to study how the payout of a game can be fairly distributed among the players that form a team. This concept has been adapted to machine learning as a local interpretation method that explains the contribution of each input feature to the final model prediction of a single observation [@Trumbelj2013Shapley]. The analogy is as follows: in the context of machine learning, the cooperative game corresponds to producing the prediction of the single observation we want to explain, and the features take the role of the players. The total payout, which should be fairly distributed among the players, is the difference between the individual observation's prediction and the mean prediction. In essence, Shapley values aim to answer the question of how much each input feature contributed to the final prediction of a single observation (after subtracting the mean prediction). By assigning a value to each feature, we gain insights into which features were the most important ones for the considered observation. Compared to the penalized linear model used as a local surrogate model above, Shapley values guarantee that the prediction is fairly distributed among the features.

::: {.callout-note}
Shapley values are sometimes misinterpreted: the Shapley value of a feature does *not* display the difference in the predicted value after removing the feature from the model training. The exact Shapley value of a feature is calculated by considering all possible subsets of features and computing the difference in the model prediction with and without the feature of interest included. Hence, it refers to the average marginal contribution of a feature to the difference between the actual prediction and the mean prediction, given the current set of features.
:::

With the help of `r ref("iml::Shapley")`, we now generate Shapley values for Steve's prediction. Again, the results can be visualized with the `$plot()` method.

```{r iml-006}
#| fig-height: 3
#| fig-cap: Shapley values. The actual prediction displays the prediction of the model for the observation we are interested in, the average prediction displays the average prediction over the given test data set.
#| fig-alt: Bar plot of Shapley values. Duration has the strongest positive Shapley value, followed by amount and purpose. Credit history has the most negative Shapley value, followed by status, property, age, savings, and employment duration. The header of the bar plot shows the predicted probability of being creditworthy of the observation we are interested in, which is 0.45. It also shows the average probability of being creditworthy of the underlying test data set, which is 0.7.
shapley = Shapley$new(predictor, x.interest = steve)
plot(shapley)
```

If we focus on the plot, the Shapley values (`phi`) of the features show us how to fairly distribute the difference between Steve's probability of being creditworthy and the data set's average probability of being creditworthy among the given features. Steve's duration of 9 months has the most positive effect on the probability of being creditworthy, increasing the predicted probability by about 5 percentage points. The most negative effect on the probability of being creditworthy comes from the credit history, which decreases the predicted probability by more than 15 percentage points.
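Because evaluating all feature subsets exactly is computationally infeasible, `r ref("iml::Shapley")` estimates the Shapley values by Monte Carlo sampling; the `sample.size` argument trades off stability against runtime. A short sketch with a larger sample:

```{r, eval = FALSE}
# More sampling iterations give more stable Shapley value estimates (but take longer).
shapley_stable = Shapley$new(predictor, x.interest = steve, sample.size = 200)
shapley_stable$plot()
```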
## The `counterfactuals` Package {#sec-counterfactuals}

Counterfactual explanations try to identify the smallest possible changes to the input features of a given observation that would lead to a different prediction [@Wachter2017]. In other words, a counterfactual explanation answers the question: "What changes in the current feature values are necessary to achieve a different prediction?". Such statements are valuable explanations because we can understand which features affect a prediction and what actions can be taken to obtain a different, more desirable prediction. Counterfactual explanations have many applications in areas such as healthcare, finance, and criminal justice, where it is important to understand how small changes in the input features could affect the model's prediction. For example, a counterfactual explanation could be used to suggest lifestyle changes to a patient to reduce their risk of developing a particular disease, or to suggest actions that would increase the chance of a credit being approved. For the `r ref("rchallenge::german")` data, a counterfactual could be: "If the credit applicant had asked for a lower credit amount and duration, they would have been classified as a good credit risk with a probability of \> 50 %" (@fig-counterfactuals-ill). By revealing the feature changes that alter a decision, counterfactuals show which features are the key drivers of that decision.

```{r interpretation-counterfactuals-fig, echo=FALSE}
#| label: fig-counterfactuals-ill
#| out-width: 50%
#| fig-cap: Illustration of a counterfactual explanation. The brown dot displays a counterfactual for a given point (blue dot), which proposes decreasing the credit amount and duration such that the prediction changes from bad to good.
#| fig-alt: Illustration of counterfactual explanations. Two dots are shown, one being the point whose prediction we want to explain and the other its counterfactual. The counterfactual proposes to decrease the credit amount and duration such that the point is classified as being a good credit risk.
knitr::include_graphics("Figures/counterfactuals.png")
```

Many methods have been proposed in recent years to generate counterfactual explanations. These methods differ in the properties they target for the generated counterfactuals (for example, whether the feature changes are actionable) and in how the counterfactuals are generated (for example, whether a set of counterfactuals is returned in a single run). The simplest method to generate counterfactuals is the What-If approach [@Wexler2019]. For a data point whose prediction should be explained, the counterfactual is the closest data point with the desired prediction in a given data set. Another method is the multi-objective counterfactuals method (MOC) introduced by @Dandl2020. Compared to the What-If method, MOC can generate multiple counterfactuals that are not restricted to observations in a given dataset but can be artificially generated data points. The generation of counterfactuals is framed as an optimization problem that aims for counterfactuals that

1) have the desired prediction,
2) are close to the observation of interest,
3) only require changes in a few features, and
4) originate from the same distribution as the observations in the given dataset.

All four objectives are optimized simultaneously via a multi-objective optimization method. Due to the variety of methods, counterfactual explanation methods were outsourced into a separate package, the `r ref_pkg("counterfactuals")` package, instead of being integrated into the `r ref_pkg("iml")` package. Currently, three methods are implemented in the R package -- including What-If and MOC -- but the R6-based interface makes it easy to add other counterfactual explanation methods in the future.

### What-If Method

To illustrate the overall workflow of the package, we generate counterfactuals for Steve using the What-If approach [@Wexler2019]. The `r ref_pkg("counterfactuals")` package relies on `r ref("iml::Predictor()")` as a model wrapper and can, therefore, explain any prediction model fitted with the `r ref_pkg("mlr3")` package, including the random forest model we trained above.

```{r interpretation-predictsteve}
predictor$predict(steve)
```

The random forest classifies Steve as being `r predictor$class` creditworthy with a probability of `r round(max(predictor$predict(steve))*100, 1)`%. The What-If method can be used to answer, e.g., how the features need to be changed such that Steve is classified as `r predictor$class` with a probability of at least 60 %.
Since we have a classification model, we initialize a `r ref("counterfactuals::WhatIfClassif()")` object. By calling `$find_counterfactuals()`, we generate a counterfactual for Steve.

```{r interpretation-whatif}
library("counterfactuals")
whatif = WhatIfClassif$new(predictor, n_counterfactuals = 1L)
cfe = whatif$find_counterfactuals(steve,
  desired_class = "good", desired_prob = c(0.6, 1))
```

`cfe` is a `r ref("counterfactuals::Counterfactuals")` object which offers many visualization and evaluation methods. For example, `$evaluate(show_diff = TRUE)` shows how the features need to be changed.

```{r interpretation-whatifevaluation, asis='results'}
data.frame(cfe$evaluate(show_diff = TRUE))
```

For a probability of at least 60 % of being creditworthy, the age is reduced by 8 years and the amount is reduced by 3603 DM, while the duration is increased by 3 months. Furthermore, Steve's credit history must improve (he must have paid back all his credits at this bank duly, instead of having a critical account/other credits elsewhere), and his status must be `"no checking account"`. For the employment duration, other debtors, property, purpose, and savings, no changes are required. Additional columns in `cfe$evaluate()` reveal the quality of the counterfactuals, for example, the number of required feature changes (`no_changed`) or the distance to the closest training data point (`dist_train`), which is 0 here because the counterfactual *is* an observed data point.
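The required changes can also be inspected graphically; for example, the `$plot_parallel()` method visualizes the original feature values of Steve together with those of the counterfactual. A short sketch:

```{r, eval = FALSE}
# Parallel plot of Steve (original) versus the generated counterfactual.
cfe$plot_parallel()
```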
### MOC Method

Calling the MOC method is very similar to calling the What-If method. Instead of a `r ref("counterfactuals::WhatIfClassif()")` object, we initialize a `r ref("counterfactuals::MOCClassif()")` object. We set the `epsilon` parameter to 0 to penalize counterfactuals whose predictions lie outside the desired range during the optimization process. With MOC, we can also prohibit changes in specific features, here age, via the `fixed_features` argument. For illustrative purposes, we let the multi-objective optimizer run for only 30 generations.

```{r interpretation-mocmulti, message=FALSE}
library("counterfactuals")
set.seed(123L)
moc = MOCClassif$new(predictor, epsilon = 0, n_generations = 30L,
  fixed_features = c("age"))
cfe_multi = moc$find_counterfactuals(steve,
  desired_class = "good", desired_prob = c(0.6, 1))
```

Since the multi-objective approach does not guarantee that all counterfactuals have the desired prediction, we remove all counterfactuals with predictions outside the desired range via the `$subset_to_valid()` method.

```{r interpretation-mocmulti-subset}
cfe_multi$subset_to_valid()
cfe_multi
```

Overall, we generated `r nrow(cfe_multi$data)` counterfactuals. In contrast to the counterfactual of the What-If approach, MOC's counterfactuals are artificially generated and are not necessarily equal to actual observations of the underlying data set. For a concise overview of the required feature changes, we can use the `$plot_freq_of_feature_changes()` method. It visualizes the frequency of feature changes across all returned counterfactuals.

```{r interpretation-mocfreq}
#| fig-height: 3.5
#| fig-cap: Relative frequency of feature changes.
#| fig-alt: Barplot of the relative frequency of feature changes of the counterfactuals found by MOC. For 80 % of the counterfactuals credit history was changed, for 40 % status.
cfe_multi$plot_freq_of_feature_changes()
```

We see that credit history and status are the only changed features. To see how the features have been changed, we can visualize the counterfactuals in a 2-dimensional ICE plot, also called a surface plot.

```{r interpretation-mocsurface}
#| fig-height: 3.5
#| fig-cap: 2-dimensional individual conditional expectation plot for status and credit history.
#| fig-alt: 2-dimensional individual conditional expectation plot for the features status (x-axis) and credit history (y-axis). Colors and contour lines indicate the predicted value of the model when credit history and status differ while all other features are set to the values of our point of interest, Steve. A white point displays the values of Steve, and the black points are the counterfactuals that only propose changes in the two features status and credit history.
cfe_multi$plot_surface(feature_names = c("status", "credit_history"))
```

The colors and contour lines indicate the predicted value of the model when credit history and status vary while all other features are set to the values of Steve. The white point displays Steve, and the black points are the counterfactuals that only propose changes in these two features. We see that the proposed credit histories are better than or equal to Steve's value of having a critical account/other credits elsewhere (changes to all credits at this bank paid back duly, existing credits paid back duly till now, or no credits taken/all credits paid back duly), and that the status is lowered to no checking account once, remains at ... < 0 DM three times, and is raised to ... >= 200 DM / salary for at least 1 year once.
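For a tabular view of the individual counterfactuals and their required changes, the same `$evaluate()` method as for the What-If counterfactual can be used. A short sketch:

```{r, eval = FALSE}
# Feature differences and quality measures for all MOC counterfactuals.
data.frame(cfe_multi$evaluate(show_diff = TRUE))
```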
## The `DALEX` Package {#sec-dalex}

The `r ref_pkg("DALEX")` [@Biecek2018] package implements a similar set of methods as the `r ref_pkg("iml")` package presented above, but the architecture of `r ref_pkg("DALEX")` is oriented towards model comparison. The logic behind working with this package assumes that the process of exploring models is an iterative one, and that in successive iterations we want to compare different perspectives, including perspectives presented/learned by different models. This logic is commonly referred to as the Rashomon perspective, first described in the "Statistical Modeling: The Two Cultures" paper [@Breiman2001] and more extensively developed and formalized as interactive explanatory model analysis [@Baniecki2023]. You can use the `r ref_pkg("DALEX")` package with any number of models built with the `r ref_pkg("mlr3")` package as well as with other frameworks in `R`. The counterpart in `Python` is the library `dalex` [@Baniecki2021].

In this chapter, we present code snippets and a general overview of this package. For illustrative purposes, we reuse the `german_credit` task built in @sec-credit-task on the `r ref("rchallenge::german")` data. Since we already presented the intuitions behind most of the methods in @sec-iml, we do not duplicate the descriptions below and focus on presenting the syntax and code snippets. Once you become familiar with the philosophy of working with the `r ref_pkg("DALEX")` package, you can also use other packages from this family, such as `r ref_pkg("fairmodels")` [@Wisniewski2022] for the detection and mitigation of biases, `r ref_pkg("modelStudio")` [@Baniecki2019] for interactive model exploration, `r ref_pkg("modelDown")` [@Romaszko2019] for the automatic generation of IML model documentation in the form of a report, `r ref_pkg("survex")` [@Krzyzinski2023] for the explanation of survival models, or `r ref_pkg("treeshap")` for the analysis of tree-based models.

### Explanatory model analysis {#sec-interpretability-architecture}

The analysis of a model is usually an interactive process, starting with a shallow analysis -- usually a single-number summary. In a series of subsequent steps, one can then systematically deepen the understanding of the model, from exploring the importance of single variables or pairs of variables to an in-depth analysis of the relationship between selected variables and the model outcome. See @Bucker2022 for a broader discussion of what the model exploration process looks like. This explanatory model analysis (EMA) process can focus on a single observation, in which case we speak of local model analysis, or on a set of observations, in which case we speak of global model analysis. Below, we present these two scenarios in separate subsections. See @fig-dalex-fig-plot-01 for an overview of the key functions that will be discussed. An in-depth description of this methodology can be found in the [EMA book](https://ema.drwhy.ai/) [@biecek_burzykowski_2021].

```{r interpretation-012, echo=FALSE}
#| label: fig-dalex-fig-plot-01
#| out-width: 92%
#| fig-cap: Taxonomy of methods for model exploration presented in this chapter. The left part gives an overview of methods for global-level exploration, while the right part relates to local-level model exploration.
#| fig-alt: Taxonomy of methods for model exploration presented in this chapter. The left part gives an overview of methods for global-level exploration, while the right part relates to local-level model exploration.
knitr::include_graphics("Figures/DALEX_ema_process.png")
```
```{r interpretation-019b, message=FALSE}
ranger_exp
```

The result of the `explain` function is an object of the S3 class `explainer`.
This object is a list containing, at least: the model object, the dataset that will be used to calculate explanations, the predict function, the function that calculates residuals, the name/label of the model, and additional information about the model.
The generic `print()` function for this object outputs three things: the name of the model (the label given by the user), information about the model's classes (in this case, an object of the `LearnerClassifRanger` class, which inherits from several broader classes), and the first few rows of the data used to compute the explanations.

### Global level exploration {#sec-interpretability-dataset-level}

The global model analysis aims to understand how a model behaves on average on a set of observations.
In the `r ref_pkg("DALEX")` package, functions for global level analysis have names starting with the prefix `model_`.

#### Model Performance

As shown in @fig-dalex-fig-plot-01, the model exploration process starts by evaluating the performance of a model.
This can be done with a variety of tools, but in the `r ref_pkg("DALEX")` package the simplest is to use the `r ref("DALEX::model_performance()")` function.
Since the `explain` function checks what type of task is being analyzed, it can select the appropriate performance measures for it.
In our illustration, we have a binary classification model, so measures such as precision, recall, F1-score, accuracy, and AUC are calculated in the following snippet.

```{r interpretation-020a}
#| fig-height: 6
#| fig-width: 5
#| out-width: 60%
perf_credit = model_performance(ranger_exp)
perf_credit
```

Each explanation can be drawn with the generic `plot()` function; for classification, the default graphical summary is the ROC curve.

```{r interpretation-020b}
#| fig-height: 6
#| fig-width: 5
#| out-width: 60%
#| fig-cap: Graphical summary of model performance using the receiver operating characteristic (ROC) curve. The dashed gray line indicates the diagonal.
#| fig-alt: Graphical summary of model performance using the receiver operating characteristic (ROC) curve. The dashed gray line indicates the diagonal.
library("ggplot2")
old_theme = set_theme_dalex("ema")
plot(perf_credit, geom = "roc")
```

::: {.callout-tip}
Various visual summaries can be selected with the `geom` argument.
In credit risk, the LIFT curve is also a popular graphical summary.
:::
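As an illustration, a LIFT curve for the same performance object could be drawn as in the following minimal sketch.

```{r interpretation-020c}
# LIFT curve for the same model; other geometries such as "gain" or
# "histogram" can be selected analogously via the geom argument.
plot(perf_credit, geom = "lift")
```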
#### Feature Importance

The process of model understanding usually starts with determining which features are the most important.
A popular technique for this is the permutation feature importance introduced in @sec-feat-importance.
More about this technique can be found in the chapter *Variable-importance Measures* of the [EMA book](https://ema.drwhy.ai/).

The `r ref("DALEX::model_parts()")` function calculates the importance of features and its results can be visualized with the generic `plot()` function.

```{r interpretation-021}
#| fig-height: 4
#| fig-width: 8
#| out-width: 90%
#| fig-cap: Graphical summary of permutation importance of features. Each bar starts at the loss function for the original data and ends at the loss function for the data after permutation of a specific feature. The longer the bar, the larger the change in the loss function after permutation of the particular feature and therefore the more important the feature.
#| fig-alt: Graphical summary of permutation importance of features. Each bar starts at the loss function for the original data and ends at the loss function for the data after permutation of a specific feature. The longer the bar, the larger the change in the loss function after permutation of the particular feature and therefore the more important the feature.
ranger_effect = model_parts(ranger_exp)
head(ranger_effect)
plot(ranger_effect, show_boxplots = FALSE)
```

Each bar starts at the model loss (here 1 minus AUC) calculated on the selected data and ends at the loss calculated on the data after permuting the respective feature.
The more important the feature, the more the model loses after its permutation.
The plot indicates that status is the most important feature.

::: {.callout-tip}
The `type` argument of the `model_parts` function allows you to specify how the importance of the features is calculated: as the difference of the loss functions (`type = "difference"`), as their quotient (`type = "ratio"`), or without any transformation (`type = "raw"`).
:::
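For example, a ratio-based importance could be computed as in the following minimal sketch, which reuses the explainer defined above.

```{r interpretation-021b}
# Permutation importance reported as the ratio of the loss after permutation
# to the loss of the full model.
ranger_effect_ratio = model_parts(ranger_exp, type = "ratio")
plot(ranger_effect_ratio, show_boxplots = FALSE)
```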
#### Feature Effects

Once we know which features are most important, we can use [feature effect plots](https://ema.drwhy.ai/partialDependenceProfiles.html) to understand their effect, i.e. how the model's predictions change, on average, with changes in the selected features.
The `r ref("DALEX::model_profile()")` function calculates feature effects and, by default, returns partial dependence profiles (see @sec-feature-effects).

```{r interpretation-024, warning=FALSE}
#| fig-height: 5
#| fig-width: 8
#| out-width: 90%
#| fig-cap: Graphical summary of the model's partial dependence profile for three selected variables (age, amount, duration). Each variable is shown in a separate panel. The jumpy response function is characteristic of a random forest model.
#| fig-alt: Graphical summary of the model's partial dependence profile for three selected variables (age, amount, duration). Each variable is shown in a separate panel. The jumpy response function is characteristic of a random forest model.
ranger_profiles = model_profile(ranger_exp)
ranger_profiles

plot(ranger_profiles) +
  theme(legend.position = "top") +
  ggtitle("Partial Dependence for Ranger Credit model", "")
```

From the figure above, we can read the interesting information that the random forest model has learned a non-monotonic relationship for the features `amount` and `age`.

::: {.callout-tip}
The `type` argument of the `r ref("DALEX::model_profile()")` function also allows *marginal profiles* (with `type = "conditional"`) and *accumulated local profiles* (with `type = "accumulated"`) to be calculated.
:::
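For example, accumulated local profiles could be computed as in the following minimal sketch; the restriction to the variables `age` and `amount` is purely illustrative.

```{r interpretation-024b}
# Accumulated local profiles for two selected features; type = "conditional"
# would yield marginal (conditional) profiles instead.
ranger_ale = model_profile(ranger_exp, variables = c("age", "amount"),
  type = "accumulated")
plot(ranger_ale)
```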
### Local level explanation {#sec-interpretability-instance-level}

The local model analysis aims to understand how a model behaves for a single observation.
In the `r ref_pkg("DALEX")` package, functions for local analysis have names starting with the prefix `predict_`.

We will carry out the following examples using a sample credit application, labeled as Steve.

```{r interpretation-025a}
steve = credit_x[249, ]
steve
```

#### Model Prediction

As shown in @fig-dalex-fig-plot-01, the local analysis starts with the calculation of a model prediction.
It turns out that for the selected credit application, the predicted probability that the credit will not be repaid is quite high.

```{r interpretation-025}
predict(ranger_exp, steve)
```

#### Break Down Values

A popular technique for assessing the contributions of features to a model prediction is break-down (see the chapter *Break-down Plots for Additive Attributions* in the [EMA book](https://ema.drwhy.ai) for more information about this method).

The `r ref("DALEX::predict_parts()")` function calculates the attributions of features and its results can be visualized with the generic `plot()` function.

```{r interpretation-027}
#| fig-height: 4.5
#| fig-width: 8
#| out-width: 90%
#| fig-cap: Graphical summary of local attributions of features calculated by the break-down method. Positive attributions are shown in green and negative attributions in red. The violet bar corresponds to the model prediction for the explained observation and the dashed line corresponds to the average model prediction.
#| fig-alt: Graphical summary of local attributions of features calculated by the break-down method. Positive attributions are shown in green and negative attributions in red. The violet bar corresponds to the model prediction for the explained observation and the dashed line corresponds to the average model prediction.
ranger_attributions = predict_parts(ranger_exp, new_observation = steve)
plot(ranger_attributions) + ggtitle("Break Down for Steve")
```

Looking at the plot above, we see that for Steve the biggest contributors to the final prediction were the features `credit_history` and `status`.

::: {.callout-tip}
The `order` argument allows you to specify a selected order of the features.
This is a useful option when the features have some relative conditional importance (e.g. pregnancy and sex).
:::

#### Shapley Values

By far the most popular technique for local model exploration [@Holzinger2022] are Shapley values, introduced in @sec-shapley.
The most popular algorithm for estimating these values is the SHAP algorithm.
A detailed description of the method and the algorithm can be found in the chapter *SHapley Additive exPlanations (SHAP) for Average Attributions* of the [EMA book](https://ema.drwhy.ai/).

The `r ref("DALEX::predict_parts()")` function also calculates SHAP attributions; you just need to set `type = "shap"`.
Its results can be visualized with the generic `plot()` function.

```{r interpretation-028}
#| fig-height: 4.5
#| fig-width: 8
#| out-width: 90%
#| fig-cap: Graphical summary of local attributions of features calculated by the SHAP method. Positive attributions are shown in green and negative attributions in red.
#| fig-alt: Graphical summary of local attributions of features calculated by the SHAP method. Positive attributions are shown in green and negative attributions in red.
ranger_shap = predict_parts(ranger_exp, new_observation = steve,
  type = "shap")
plot(ranger_shap, show_boxplots = FALSE) +
  ggtitle("Shapley values for Steve", "")
```

The results of the Break Down and SHAP methods are generally similar.
Differences emerge if there are many complex interactions in the model.

::: {.callout-tip}
Shapley values can take a long time to compute.
This process can be sped up at the expense of accuracy.
The parameters `B` and `N` can be used to tune this trade-off, where `N` is the number of observations on which conditional expected values are estimated (500 by default) and `B` is the number of random paths used to calculate the Shapley values (25 by default).
:::

#### Ceteris Paribus Effects

In the previous section, we introduced a global explanation -- partial dependence plots.
Ceteris paribus plots are the local version of that explanation.
They show the response of the model when only one feature is changed while all others stay unchanged.
Ceteris paribus profiles are also known as individual conditional expectation (ICE) curves.
More on this technique can be found in @sec-feature-effects and the chapter *Ceteris-paribus Profiles* of the [EMA book](https://ema.drwhy.ai/).

The `r ref("DALEX::predict_profile()")` function calculates ceteris paribus profiles, which can be visualized with the generic `plot()` function.
The blue dot stands for the prediction for Steve.

```{r interpretation-029, warning=FALSE}
#| fig-height: 5
#| fig-width: 8
#| out-width: 90%
#| fig-cap: Graphical summary of local model response profiles calculated by the ceteris-paribus method for three selected variables (age, amount, duration). Each variable is shown in a separate panel.
#| fig-alt: Graphical summary of local model response profiles calculated by the ceteris-paribus method for three selected variables (age, amount, duration). Each variable is shown in a separate panel.
ranger_ceteris = predict_profile(ranger_exp, steve)
plot(ranger_ceteris)
```

::: {.callout-tip}
Ceteris paribus profiles are often plotted simultaneously for a set of observations.
This helps to see potential deviations from the additivity of the model; such deviations manifest themselves as profiles that are not parallel.
To draw profiles for multiple observations, simply pass more rows of the data to the `predict_profile` function.
:::
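As an illustration, the following minimal sketch draws ceteris paribus profiles for Steve and a few other credit applications; the chosen row indices are purely illustrative. Roughly parallel curves would suggest that the model behaves approximately additively in this region.

```{r interpretation-029b}
# Ceteris paribus profiles for several observations at once; the selected
# rows are arbitrary examples from the credit data.
some_applications = credit_x[c(1, 50, 100, 249), ]
ranger_ceteris_multi = predict_profile(ranger_exp, some_applications)
plot(ranger_ceteris_multi, variables = c("age", "amount", "duration"))
```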
## Training or Test Data? {#sec-dataset}

As discussed in @sec-performance, performance evaluation should not be conducted on the training data but on unseen data in order to obtain unbiased estimates of the generalization performance.
Similar considerations play a role in the choice of the underlying dataset used for post-hoc interpretations of a model:
Should model interpretation be based on training data or test data?
This question is especially relevant when we use a performance-based interpretation method such as the permutation feature importance (PFI, see also @sec-feat-importance).
For example, if the PFI is computed on the training data, the reported feature importance values may overestimate the actual importance: the model may have learned to rely on specific features during training that are not as important for generalization to new data.
This is because the performance measured on the training data is overly optimistic, so permuting a feature leads to a large drop in performance.
In contrast, computing the PFI on the test data provides a more accurate estimate of the feature importance, as the performance measured on test data is a more reliable estimate.
Thus, for the same reasons that the performance evaluation of a model should be performed on an independent test set (see @sec-performance), it is also recommended to apply a performance-based interpretation method such as the PFI on an independent test set.

::: {.callout-note}
For performance evaluation, it is generally recommended to use resampling with more than one iteration, e.g., k-fold cross-validation or (repeated) subsampling, instead of a single train-test split as performed by the hold-out method (see @sec-performance).
<!-- However, not all resampling strategies can be combined with all interpretation methods. -->
However, for the sake of simplicity, we use a single test dataset when applying the interpretation methods throughout this chapter.
Since the produced explanations can be considered statistical estimates, a higher number of resampling iterations can reduce their variance and result in more reliable explanations [@molnar2021relating].
<!-- However, if the test data is used, the prediction performance is an estimate of the generalization performance and consequently the calculated feature importance reflect a feature's importance for good predictions on unseen data. -->
For prediction-based methods that do not require performance estimation, such as ICE/PD (@sec-feature-effects) or Shapley values (@sec-shapley), the differences in interpretation between training and test data are less pronounced [@Molnar2022pitfalls].
<!-- Generally, we recommend to use test data to. -->
<!-- TODO: reference missing!-->
:::
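To make the difference concrete, the following minimal sketch compares the PFI computed on training data with the PFI computed on held-out test data. It assumes the `task` and `learner` objects from earlier in this chapter; the split and all other object names are illustrative.

```{r interpretation-pfi-train-test}
# Sketch: permutation feature importance on training vs. test data.
# Assumes the task and learner objects from earlier in this chapter;
# the split and object names below are illustrative.
library("mlr3")
library("DALEX")
library("DALEXtra")

set.seed(1L)
split = partition(task, ratio = 2 / 3)
learner$train(task, row_ids = split$train)

x = task$data(cols = task$feature_names)
y = as.numeric(task$truth() == "bad")

exp_train = explain_mlr3(learner, data = x[split$train, ], y = y[split$train],
  label = "Ranger (train)", verbose = FALSE)
exp_test = explain_mlr3(learner, data = x[split$test, ], y = y[split$test],
  label = "Ranger (test)", verbose = FALSE)

# PFI values computed on the training data may overestimate a feature's
# importance compared to those computed on the test data.
plot(model_parts(exp_train), model_parts(exp_test))
```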
## Conclusions

In this chapter, we learned how to gain post-hoc insights into a model trained with `r ref_pkg("mlr3")` by using the most popular approaches from the field of interpretable machine learning.
The methods are all model-agnostic, i.e. they do not depend on a specific model class.
We utilized three different packages: `r ref_pkg("iml")`, `r ref_pkg("counterfactuals")` and `r ref_pkg("DALEX")`.
`iml` and `DALEX` offer a wide range of (partly) overlapping methods, while the `counterfactuals` package focuses solely on counterfactual methods.
We demonstrated on the `r ref("rchallenge::german")` dataset how these packages enable an in-depth analysis of a random forest model fitted with `r ref_pkg("mlr3")`.
In the following, we discuss some limitations of the presented methods.

If features are correlated, the insights from the interpretation methods should be treated with caution.
Changing the feature values of an observation without taking the correlation with other features into account leads to unrealistic combinations of feature values.
Since such feature combinations are unlikely to be part of the training data, the model will likely extrapolate in these areas [@Molnar2022pitfalls; @Hooker2019PleaseSP].
This distorts the interpretation of methods that are based on changing single feature values, such as PFI, PD plots, and Shapley values.
Alternative methods can help in these cases: conditional feature importance instead of PFI [@Strobl2008; @Watson2021], accumulated local effect plots instead of PD plots [@Apley2020], and the KernelSHAP method instead of Shapley values [@Lundberg2019].

The explanations derived from an interpretation method can also be ambiguous: a method can deliver multiple equally plausible but potentially contradicting explanations.
This phenomenon is also called the Rashomon effect [@Breiman2001].
Differing hyperparameters can be one reason; for example, local surrogate models react very sensitively to changes in the underlying weighting scheme of the observations.
Even with fixed hyperparameters, the underlying dataset or the initial seed can lead to disparate explanations [@Molnar2022pitfalls].

The `r ref("rchallenge::german")` data is only a low-dimensional dataset with a limited number of observations.
Applying interpretation methods off-the-shelf to higher-dimensional datasets is often not feasible due to the enormous computational cost.
This is why more efficient methods have been proposed in recent years, e.g., Shapley value computations based on kernel-based estimators (SHAP).
Another challenge is that the high-dimensional IML output generated for high-dimensional datasets can overwhelm users.
If the features can be meaningfully grouped, grouped versions of the methods, e.g. the grouped feature importance proposed by @Au2022, can be applied.

## Exercises

Model explanation allows us to confront our expert knowledge about the problem with the relations learned by the model.
The following tasks are based on predicting the value of football players using data from the FIFA game.
It is an accessible example, as most people have some intuition about how a footballer's age or skill can affect their value.
The latest FIFA statistics can be downloaded from [kaggle.com](https://www.kaggle.com/), but one can also use the 2020 data available in the `DALEX` package (see the `DALEX::fifa` dataset).
The following exercises can be performed with both the `iml` and the `DALEX` package; we provide solutions for both.

1. Prepare an `mlr3` regression task for the `fifa` data. Select only features describing the age and skills of footballers. Train a predictive model of your own choice on this task, e.g. `regr.ranger`, to predict the value of a footballer.
2. Use the permutation importance method to calculate the feature importance ranking. Which feature is the most important? Do you find the results surprising?
3. Use the partial dependence plot/profile to draw the global behavior of the model for this feature. Is it aligned with your expectations?
4. Choose one of the football players, for example, a well-known striker (e.g., Robert Lewandowski) or a well-known goalkeeper (e.g., Manuel Neuer). The following tasks are worth repeating for several different choices.
5. For the selected footballer, calculate and plot the Shapley values. Which feature is locally the most important and has the strongest influence on the valuation of the footballer?
6. For the selected footballer, calculate the ceteris paribus profiles / individual conditional expectation curves to draw the local behavior of the model for this feature. Is it different from the global behavior?