2.7 Binary classification

Classification problems with a target variable containing only two classes are called “binary.” For such binary target variables, you can specify the positive class within the classification task object during task creation. If not explicitly set during construction, the positive class defaults to the first level of the target variable.

library(mlr3)

# during construction
data("Sonar", package = "mlbench")
task = as_task_classif(Sonar, target = "Class", positive = "R")

# switch positive class to level 'M'
task$positive = "M"

2.7.1 ROC Curve and Thresholds

ROC analysis (ROC stands for “receiver operating characteristic”) is a subfield of machine learning that studies the evaluation of binary prediction systems. We saw earlier that the confusion matrix of a Prediction can be retrieved from its $confusion field:

learner = lrn("classif.rpart", predict_type = "prob")
pred = learner$train(task)$predict(task)
C = pred$confusion
print(C)
##         truth
## response  M  R
##        M 95 10
##        R 16 87

The confusion matrix contains the counts of correct and incorrect class assignments, grouped by class labels. The columns show the true (observed) labels and the rows show the predicted labels. The positive class is always the first row and column of the confusion matrix. Thus, the element \(C_{11}\) is the number of times our model predicted the positive class and was right about it. Analogously, the element \(C_{22}\) is the number of times our model predicted the negative class and was also right about it. These two diagonal elements are called True Positives (TP) and True Negatives (TN), respectively. The element \(C_{12}\) is the number of times we falsely predicted a positive label and is called False Positives (FP). The element \(C_{21}\) is the number of times we falsely predicted a negative label and is called False Negatives (FN).

We can now normalize the rows and columns of the confusion matrix to derive several informative metrics:

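  • True Positive Rate (TPR): What fraction of the actually positive observations did we predict as positive?
  • True Negative Rate (TNR): What fraction of the actually negative observations did we predict as negative?
  • False Positive Rate (FPR): What fraction of the actually negative observations did we predict as positive? This equals 1 − TNR.
  • Positive Predictive Value (PPV): If we predict positive, how likely is it that the observation is actually positive?
  • Negative Predictive Value (NPV): If we predict negative, how likely is it that the observation is actually negative?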

[Figure not reproduced here. Source: Wikipedia]
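As a minimal sketch, these metrics can be computed by hand in base R from the confusion matrix printed earlier. The matrix is re-entered manually here, and the variable names (tp, fp, fn, tn and so on) are illustrative assumptions, not part of mlr3.

```r
# Confusion matrix from above: rows = predicted label, columns = true label,
# positive class "M" in the first row and column (values typed in by hand).
C = matrix(c(95, 16, 10, 87), nrow = 2,
  dimnames = list(response = c("M", "R"), truth = c("M", "R")))

tp = C[1, 1]  # predicted positive, actually positive
fp = C[1, 2]  # predicted positive, actually negative
fn = C[2, 1]  # predicted negative, actually positive
tn = C[2, 2]  # predicted negative, actually negative

tpr = tp / (tp + fn)  # normalize within the "truth = positive" column
tnr = tn / (tn + fp)  # normalize within the "truth = negative" column
fpr = fp / (fp + tn)  # equals 1 - tnr
ppv = tp / (tp + fp)  # normalize within the "response = positive" row
npv = tn / (tn + fn)  # normalize within the "response = negative" row

round(c(tpr = tpr, tnr = tnr, fpr = fpr, ppv = ppv, npv = npv), 4)
```

The TPR and TNR obtained this way should agree with the values that the measures classif.tpr and classif.tnr report for the same prediction.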

It is difficult to achieve a high TPR and a low FPR at the same time, so the two are plotted against each other to construct the ROC curve. We characterize a classifier by its TPR and FPR values and plot it as a point in a coordinate system with the FPR on the x-axis and the TPR on the y-axis. The best classifier lies in the top-left corner. The worst classifiers lie on the diagonal: they produce random labels (with different proportions). If each positive \(x\) is randomly classified as “positive” with probability 25%, we get a TPR of 0.25. If each negative \(x\) is also randomly classified as “positive” with probability 25%, we get an FPR of 0.25. In practice, we should never obtain a classifier below the diagonal, because inverting its predicted labels maps the point (FPR, TPR) to (1 − FPR, 1 − TPR), which lies above the diagonal.
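The effect of inverting the labels can be checked on a tiny made-up example. The sketch below uses base R only; the helper roc_point() and the toy vectors are hypothetical and serve only to illustrate the argument.

```r
# Hypothetical helper: position of a labeling classifier in ROC space.
roc_point = function(truth, response, positive) {
  is_pos = truth == positive
  c(
    fpr = mean(response[!is_pos] == positive),  # negatives predicted as positive
    tpr = mean(response[is_pos] == positive)    # positives predicted as positive
  )
}

# Toy data: four positives ("M") followed by four negatives ("R").
truth    = c("M", "M", "M", "M", "R", "R", "R", "R")
response = c("M", "R", "R", "R", "M", "M", "M", "R")

# This classifier has TPR < FPR, so it lies below the diagonal.
below = roc_point(truth, response, positive = "M")

# Swapping every predicted label moves the point to (1 - FPR, 1 - TPR).
inverted = ifelse(response == "M", "R", "M")
above = roc_point(truth, inverted, positive = "M")

rbind(below, above)
```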

A scoring classifier is a model that produces scores or probabilities instead of discrete labels. To obtain probabilities from a learner in mlr3, you have to set predict_type = "prob" for a LearnerClassif. Whether a classifier can predict probabilities is given in its $predict_types field. Thresholding converts the predicted probabilities to labels: predict \(1\) (positive class) if \(\hat{f}(x) > \tau\), else predict \(0\). Normally, one would use \(\tau = 0.5\), but in imbalanced or cost-sensitive situations another threshold could be more suitable. After thresholding, any metric defined on labels can be used.
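The thresholding rule itself is a one-liner. The following base R sketch applies it to a handful of made-up probabilities for the positive class; the function threshold_labels() and its arguments are hypothetical names chosen for illustration, not mlr3 functions.

```r
# Hypothetical helper: convert probabilities for the positive class into labels.
# Predict the positive label if the probability exceeds tau, else the negative one.
threshold_labels = function(prob, tau = 0.5, positive = "M", negative = "R") {
  ifelse(prob > tau, positive, negative)
}

# Made-up predicted probabilities for the positive class "M".
prob_pos = c(0.9, 0.6, 0.45, 0.3, 0.1)

labels_default = threshold_labels(prob_pos)             # tau = 0.5
labels_lenient = threshold_labels(prob_pos, tau = 0.2)  # more positives predicted
```

Lowering tau can only turn negative predictions into positive ones, never the reverse, which is why the choice of threshold trades TPR against TNR.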

For mlr3 prediction objects, the ROC curve can easily be created with mlr3viz, which relies on the precrec package to calculate and plot ROC curves:

library(mlr3viz)

# TPR vs FPR / Sensitivity vs (1 - Specificity)
autoplot(pred, type = "roc")

# Precision vs Recall
autoplot(pred, type = "prc")
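To make the construction of the ROC curve concrete, the following base R sketch sweeps a few thresholds over made-up probabilities and records the resulting (FPR, TPR) pair for each. It only illustrates the idea and is not meant to reflect how precrec computes its curves; all data and names are assumptions.

```r
# Made-up true labels and predicted probabilities for the positive class "M".
truth    = c("M", "M", "R", "M", "R", "R")
prob_pos = c(0.9, 0.8, 0.7, 0.4, 0.3, 0.1)

# Thresholds from strict to lenient, chosen to fall between the probabilities.
taus = c(1, 0.85, 0.75, 0.5, 0.35, 0.2, 0)

# One row per threshold: threshold, FPR and TPR after thresholding.
roc = t(sapply(taus, function(tau) {
  pred_pos = prob_pos > tau
  c(
    tau = tau,
    fpr = mean(pred_pos[truth == "R"]),  # share of negatives predicted positive
    tpr = mean(pred_pos[truth == "M"])   # share of positives predicted positive
  )
}))

roc

# Connecting the points yields a step-shaped ROC curve.
plot(roc[, "fpr"], roc[, "tpr"], type = "s", xlab = "FPR", ylab = "TPR")
abline(0, 1, lty = 2)  # diagonal of random classifiers
```

As the threshold decreases, both FPR and TPR can only increase, so the curve runs from the bottom-left corner to the top-right corner.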

2.7.2 Threshold Tuning

Learners that can predict the probability of the positive class usually use a simple rule to determine the predicted class label: if the probability exceeds the threshold \(\tau = 0.5\), pick the positive label, and select the negative label otherwise. If the model is not well calibrated or the class labels are heavily imbalanced, selecting a different threshold can help to improve the predictive performance w.r.t. a chosen performance measure.

Here, we change the threshold to \(\tau = 0.2\), which improves the True Positive Rate (TPR). With the new threshold, more observations from the positive class are correctly classified with the positive label, but at the same time the True Negative Rate (TNR) decreases. Depending on the application, this may be a desired trade-off.

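measures = msrs(c("classif.tpr", "classif.tnr"))

# default threshold of 0.5
pred$confusion
##         truth
## response  M  R
##        M 95 10
##        R 16 87
pred$score(measures)
## classif.tpr classif.tnr 
##      0.8559      0.8969

# lower the threshold for the positive class to 0.2
pred$set_threshold(0.2)
pred$confusion
##         truth
## response   M   R
##        M 104  25
##        R   7  72
pred$score(measures)
## classif.tpr classif.tnr 
##      0.9369      0.7423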

Thresholds can also be tuned automatically with the mlr3pipelines package, using PipeOpTuneThreshold.
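Conceptually, tuning a threshold means searching for the value that maximizes a chosen measure. The base R sketch below does this with a simple grid search on made-up probabilities, using balanced accuracy (the mean of TPR and TNR) as an assumed objective. It is an illustration of the idea only and does not describe how PipeOpTuneThreshold is implemented; in practice the threshold should be chosen on data not used for the final evaluation.

```r
# Made-up true labels and predicted probabilities for the positive class "M".
truth    = c("M", "M", "R", "M", "M", "R", "R", "R")
prob_pos = c(0.95, 0.80, 0.70, 0.60, 0.35, 0.30, 0.20, 0.05)

# Assumed objective: balanced accuracy after thresholding at tau.
balanced_accuracy = function(tau) {
  pred_pos = prob_pos > tau
  tpr = mean(pred_pos[truth == "M"])
  tnr = mean(!pred_pos[truth == "R"])
  (tpr + tnr) / 2
}

# Candidate thresholds: midpoints between neighboring sorted probabilities.
p = sort(unique(prob_pos))
taus = (head(p, -1) + tail(p, -1)) / 2

scores = sapply(taus, balanced_accuracy)
best_tau = taus[which.max(scores)]

data.frame(tau = taus, balanced_accuracy = scores)
best_tau
```

On this toy data the search prefers a threshold below 0.5, because one positive observation receives a fairly low probability and can be recovered without misclassifying any additional negatives.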