# ROC Curve and Thresholds {#binary-roc}

```{r 03-perf-binary-001, include = FALSE}
library(mlr3)
library(mlr3book)
```

As we have seen before, binary classification is special because of the presence of a positive and a negative class, and of a threshold probability used to distinguish between the two.
ROC analysis, short for "receiver operating characteristic", applies specifically to this case and gives a better picture of the trade-offs involved in choosing between the two classes.
We saw earlier that one can retrieve the confusion matrix of a `r ref("Prediction")` by accessing the `$confusion` field:

```{r 03-perf-binary-002}
data("Sonar", package = "mlbench")

task = as_task_classif(Sonar, target = "Class", positive = "M")
learner = lrn("classif.rpart", predict_type = "prob")
pred = learner$train(task)$predict(task)

C = pred$confusion
print(C)
```

The upper left quadrant denotes the number of times our model predicted the positive class and was correct about it.
Similarly, the lower right quadrant denotes the number of times our model predicted the negative class and was also correct about it.
Together, the elements on the diagonal are called True Positives (TP) and True Negatives (TN).
The upper right quadrant denotes the number of times we falsely predicted a positive label, and is called False Positives (FP).
The lower left quadrant denotes the number of times we falsely predicted a negative label, and is called False Negatives (FN).

We can derive the following performance metrics from the confusion matrix:

* **True Positive Rate (TPR)**: How many of the actual positives did we predict as positive? TPR = TP / (TP + FN), also known as sensitivity or recall.
* **True Negative Rate (TNR)**: How many of the actual negatives did we predict as negative? TNR = TN / (TN + FP), also known as specificity.
* **Positive Predictive Value (PPV)**: If we predict positive, how likely is it a true positive? PPV = TP / (TP + FP), also known as precision.
* **Negative Predictive Value (NPV)**: If we predict negative, how likely is it a true negative? NPV = TN / (TN + FN).

A closely related quantity is the False Positive Rate, FPR = FP / (FP + TN) = 1 - TNR, i.e. the fraction of actual negatives that we incorrectly predict as positive.

It is difficult to achieve a high TPR and a low FPR at the same time.
We can characterize the behavior of a binary classifier for different thresholds by plotting the resulting TPR and FPR values; this is the ROC curve.
The best classifier lies in the top-left corner, where the TPR is 1 and the FPR is 0.
The worst classifiers lie on the diagonal.
Classifiers on the diagonal produce labels essentially at random (possibly with different proportions).
For example, if each positive $x$ is randomly classified as "positive" with probability 25%, we get a TPR of 0.25.
If we also assign each negative $x$ to "positive" with probability 25%, we get an FPR of 0.25.
In practice, we should never obtain a classifier below the diagonal: switching the predicted labels of such a classifier maps its ROC curve to the other side of the diagonal, so a curve below the diagonal indicates that the positive and negative class labels have most likely been switched.

For `r mlr3book::mlr_pkg("mlr3")` prediction objects, the ROC curve can easily be created with `r mlr3book::mlr_pkg("mlr3viz")`, which relies on the `r mlr3book::cran_pkg("precrec")` package to calculate and plot ROC curves:

```{r 03-perf-binary-003}
library("mlr3viz")

# TPR vs FPR / Sensitivity vs (1 - Specificity)
autoplot(pred, type = "roc")
```

We can also plot the precision-recall curve (PPV vs. TPR).
The main difference between ROC curves and precision-recall curves (PRC) is that the number of true negatives is not used for constructing a PRC.
PRCs are therefore preferred over ROC curves for imbalanced populations.

```{r 03-perf-binary-004}
# Precision vs Recall
autoplot(pred, type = "prc")
```
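To connect the definitions above with the confusion matrix, the rates can be computed directly from the entries of `C` and compared with the corresponding `mlr3` measures. The chunk below is a minimal illustration; it uses the `classif.tpr`, `classif.tnr`, `classif.fpr`, `classif.ppv` and `classif.npv` measure keys and assumes that `pred` and `C` from the chunks above are still available.

```{r 03-perf-binary-005}
# Rows of the confusion matrix are the predicted class, columns the true
# class, with the positive class ("M") listed first:
TP = C[1, 1]
FP = C[1, 2]
FN = C[2, 1]
TN = C[2, 2]

# Rates derived from the confusion matrix:
c(
  TPR = TP / (TP + FN),
  TNR = TN / (TN + FP),
  FPR = FP / (FP + TN),
  PPV = TP / (TP + FP),
  NPV = TN / (TN + FN)
)

# The same quantities are available as performance measures:
pred$score(msrs(c("classif.tpr", "classif.tnr", "classif.fpr",
  "classif.ppv", "classif.npv")))
```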
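The threshold separating the two classes is not fixed at 0.5: for predictions made with `predict_type = "prob"`, it can be changed afterwards with `$set_threshold()`. The following sketch re-scores TPR and FPR for a stricter and a more lenient threshold on the prediction object `pred` from above:

```{r 03-perf-binary-006}
measures = msrs(c("classif.tpr", "classif.fpr"))

# Require a predicted probability of at least 0.75 for the positive class:
# fewer positive predictions, so both TPR and FPR decrease (or stay the same).
pred$set_threshold(0.75)
pred$score(measures)

# A lenient threshold of 0.25 has the opposite effect.
pred$set_threshold(0.25)
pred$score(measures)

# Reset to the default threshold of 0.5.
pred$set_threshold(0.5)
pred$score(measures)
```

Each threshold corresponds to one point of the ROC curve plotted above; sweeping the threshold from 1 down to 0 traces out the whole curve for this single model.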