## 11.1 Cost-Sensitive Classification

Imagine you are an analyst for a big credit institution. Let’s also assume that a correct decision of the bank would result in 35% of the profit at the end of a specific period. A correct decision means that the bank predicts that a customer will pay their bills (hence would obtain a loan), and the customer indeed has good credit. On the other hand, a wrong decision means that the bank predicts that the customer’s credit is in good standing, but the opposite is true. This would result in a loss of 100% of the given loan.

Good Customer (truth) | Bad Customer (truth) | |
---|---|---|

Good Customer (predicted) | + 0.35 | - 1.0 |

Bad Customer (predicted) | 0 | 0 |

Expressed as costs (instead of profit), we can write down the cost-matrix as follows:

```
costs = matrix(c(-0.35, 0, 1, 0), nrow = 2)
dimnames(costs) = list(response = c("good", "bad"), truth = c("good", "bad"))
print(costs)
## truth
## response good bad
## good -0.35 1
## bad 0.00 0
```

An exemplary data set for such a problem is the `German Credit`

task:

```
library(mlr3)
task = mlr_tasks$get("german_credit")
table(task$truth())
##
## good bad
## 700 300
```

The data has 70% customers who are able to pay back their credit, and 30% bad customers who default on the debt. A manager, who doesn’t have any model, could decide to give either everybody a credit or to give nobody a credit. The resulting costs for the German credit data are:

```
# nobody:
(700 * costs[2, 1] + 300 * costs[2, 2]) / 1000
## [1] 0
# everybody
(700 * costs[1, 1] + 300 * costs[1, 2]) / 1000
## [1] 0.055
```

If the average loan is $20,000, the credit institute would lose more than one million dollar if it would grant everybody a credit:

```
# average profit * average loan * number of customers
0.055 * 20000 * 1000
## [1] 1100000
```

Our goal is to find a model which minimizes the costs (and thereby maximizes the expected profit).

### 11.1.1 A First Model

For our first model, we choose an ordinary logistic regression (implemented in the add-on package mlr3learners). We first create a classification task, then resample the model using a 10-fold cross validation and extract the resulting confusion matrix:

```
library(mlr3learners)
learner = mlr_learners$get("classif.log_reg")
rr = resample(task, learner, "cv")
confusion = rr$prediction$confusion
print(confusion)
## truth
## response good bad
## good 607 156
## bad 93 144
```

To calculate the average costs like above, we can simply multiply the elements of the confusion matrix with the elements of the previously introduced cost matrix, and sum the values of the resulting matrix:

```
avg_costs = sum(confusion * costs) / 1000
print(avg_costs)
## [1] -0.05645
```

With an average loan of $20,000, the logistic regression yields the following costs:

```
avg_costs * 20000 * 1000
## [1] -1129000
```

Instead of losing over $1,000,000, the credit institute now can expect a profit of more than $1,000,000.

### 11.1.2 Cost-sensitive Measure

Our natural next step would be to further improve the modeling step in order to maximize the profit.
For this purpose we first create a cost-sensitive classification measure which calculates the costs based on our cost matrix.
This allows us to conveniently quantify and compare modeling decisions.
Fortunately, there already is a predefined measure `Measure`

for this purpose: `MeasureClassifCosts`

:

```
cost_measure = MeasureClassifCosts$new("credit_costs", costs)
print(cost_measure)
## <MeasureClassifCosts:credit_costs>
## Packages: -
## Range: [-Inf, Inf]
## Minimize: TRUE
## Predict type: response
```

We set this measure as new default measure for our cost-sensitive task.

`task$measures = list(cost_measure)`

If we now call `resample()`

or `benchmark()`

, the cost-sensitive measures will be evaluated.
We compare the logistic regression to a simple featureless learner and to a random forest from package ranger :

```
learners = mlr_learners$mget(c("classif.log_reg", "classif.featureless", "classif.ranger"))
bmr = benchmark(expand_grid(task, learners, "cv"))
print(bmr)
## <BenchmarkResult> of 30 experiments in 3 resamplings:
## task learner resampling credit_costs
## german_credit classif.log_reg cv -0.05665
## german_credit classif.ranger cv -0.04865
## german_credit classif.featureless cv 0.05500
```

As expected, the featureless learner is performing comparably bad. The logistic regression and the random forest work equally well.

### 11.1.3 Thresholding

Although we now correctly evaluate the models in a cost-sensitive fashion, the models themselves are unaware of the classification costs. They assume the same costs for both wrong classification decisions (false positives and false negatives). Some learners natively support cost-sensitive classification (e.g., XXX). However, we will concentrate on a more generic approach which works for all models which can predict probabilities for class labels: thresholding.

Most learners can calculate the probability \(p\) for the positive class. If \(p\) exceeds the threshold \(0.5\), they predict the positive class, and the negative class otherwise.

For our binary classification case of the credit data, the we primarily want to minimize the errors where the model predicts “good”, but truth is “bad” (i.e., the number of false positives) as this is the more expensive error. If we now increase the threshold to values \(> 0.5\), we reduce the number of false negatives. Note that we increase the number of false positives simultaneously, or, in other words, we are trading false positives for false negatives.

```
# fit models with probability prediction
learner = mlr_learners$get("classif.log_reg", predict_type = "prob")
rr = resample(task, learner, "cv")
p = rr$prediction
print(p)
## <PredictionClassif> for 1000 observations:
## row_id truth response prob.good prob.bad
## 1: 7 good good 0.9364 0.06356
## 2: 12 bad bad 0.1563 0.84369
## 3: 43 good good 0.6406 0.35944
## ---
## 998: 963 good good 0.8204 0.17963
## 999: 976 good good 0.9167 0.08334
## 1000: 993 good good 0.8080 0.19201
# helper function to try different threshold values interactively
with_threshold = function(p, th) {
p$response = p$set_threshold(th)$response
list(confusion = p$confusion, costs = cost_measure$calculate(prediction = p))
}
with_threshold(p, 0.5)
## $confusion
## truth
## response good bad
## good 603 159
## bad 97 141
##
## $costs
## [1] -0.05205
with_threshold(p, 0.75)
## $confusion
## truth
## response good bad
## good 471 72
## bad 229 228
##
## $costs
## [1] -0.09285
with_threshold(p, 1.0)
## $confusion
## truth
## response good bad
## good 1 1
## bad 699 299
##
## $costs
## [1] 0.00065
# TODO: include plot of threshold vs performance
```

Instead of manually trying different threshold values, we here use `optimize()`

to find a good threshold value w.r.t. our performance measure:

```
# simple wrapper function which takes a threshold and returns the resulting model performance
# this wrapper is passed to optimize() to find its minimum for thresholds in [0.5, 1]
f = function(th) {
with_threshold(p, th)$costs
}
best = optimize(f, c(0.5, 1))
print(best)
## $minimum
## [1] 0.7661
##
## $objective
## [1] -0.0956
# optimized confusion matrix:
with_threshold(p, best$minimum)$confusion
## truth
## response good bad
## good 456 64
## bad 244 236
```

The function `optimize()`

is intended for unimodal functions and therefore may converge to a local optimum here. See below for better alternatives to find good threshold values.

:x ### Threshold Tuning

To be continued…

- threshold tuning as pipeline operator
- joint hyperparameter optimization