Balancing the trade-off between the decrease in revenue and the decrease in cost is an optimization problem: the classification threshold has to be adjusted to find the optimum.
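As a rough illustration of this trade-off, the sketch below searches over candidate thresholds for the one that maximizes net value. It assumes each loan has a predicted probability of being good, and it uses hypothetical constants (`GAIN_PER_GOOD_LOAN`, `LOSS_PER_DEFAULT`) that are not from the article. Raising the threshold approves fewer loans, which lowers both the interest earned and the losses from defaults.

```python
# Hypothetical per-loan economics (assumptions for illustration, not values from the article)
GAIN_PER_GOOD_LOAN = 100.0   # interest earned when a good ("Settled") loan is approved
LOSS_PER_DEFAULT = 500.0     # loss incurred when a "Past Due" loan is approved


def net_value(scores, labels, threshold):
    """Net value of approving every loan whose score (P(good)) is >= threshold.

    labels: 1 = Settled (good), 0 = Past Due (bad).
    """
    value = 0.0
    for score, label in zip(scores, labels):
        if score >= threshold:
            value += GAIN_PER_GOOD_LOAN if label == 1 else -LOSS_PER_DEFAULT
    return value


def best_threshold(scores, labels):
    """Grid search over the distinct scores; +inf stands for 'approve nothing'."""
    candidates = sorted(set(scores)) + [float("inf")]
    return max(candidates, key=lambda t: net_value(scores, labels, t))


# Toy example data (made up for illustration)
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(best_threshold(scores, labels), net_value(scores, labels, best_threshold(scores, labels)))
```

On this toy data the search picks 0.8, which approves three good loans and no defaults. With real data the optimum depends entirely on the actual gain and loss figures.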
If “Settled” is defined as positive (good) and “Past Due” as negative, then, following the confusion-matrix layout plotted in Figure 6, the four areas are True Positive (TP), False Positive (FP), False Negative (FN), and True Negative (TN). Consistent with the confusion matrices plotted in Figure 5, TP corresponds to the good loans that are correctly approved, and FP corresponds to the defaults that are missed. These two areas are the ones we care about most. To normalize the counts, two widely used metrics are defined: True Positive Rate (TPR) and False Positive Rate (FPR). Their equations are shown below:

$$\text{TPR} = \frac{TP}{TP + FN}, \qquad \text{FPR} = \frac{FP}{FP + TN}$$
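A minimal sketch of these two rates, assuming the model outputs a score (probability of being good) and a loan is predicted “Settled” when the score is at or above a threshold. The function names and the toy data are illustrative only.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FP, FN, TN with "Settled" (label 1) as the positive class."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted_good = score >= threshold
        if predicted_good and label == 1:
            tp += 1   # good loan correctly approved
        elif predicted_good and label == 0:
            fp += 1   # default missed
        elif label == 1:
            fn += 1   # good loan rejected
        else:
            tn += 1   # default correctly rejected
    return tp, fp, fn, tn


def tpr_fpr(scores, labels, threshold):
    """TPR = TP / (TP + FN), FPR = FP / (FP + TN); 0.0 if a class is absent."""
    tp, fp, fn, tn = confusion_counts(scores, labels, threshold)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Toy example data (made up for illustration)
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(confusion_counts(scores, labels, 0.5))  # (TP, FP, FN, TN)
print(tpr_fpr(scores, labels, 0.5))
```

At a threshold of 0.5 this toy set gives TP = 4, FP = 1, FN = 0, TN = 3, so TPR = 1.0 and FPR = 0.25.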
In this application, TPR is the hit rate of good loans, and it also represents the ability to generate income from loan interest; FPR is the miss rate of defaults, and it represents the chance of taking a loss.
The Receiver Operating Characteristic (ROC) curve is one of the most widely used plots for visualizing the performance of a classification model across all thresholds. Figure 7 (left) plots the ROC curve of the Random Forest model. The plot shows the relationship between TPR and FPR: as the threshold is lowered, both rates increase together from 0 to 1. A good classification model usually has its ROC curve above the red baseline, which represents a “random classifier”. The Area Under the Curve (AUC) is another metric for evaluating a classification model besides accuracy. The AUC of the Random Forest model is 0.82 out of 1, which is decent.
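To show how an ROC curve and its AUC can be computed, here is a plain-Python sketch: it sweeps every distinct score as a threshold, records (FPR, TPR) at each one, and integrates with the trapezoidal rule. It is an illustration on made-up data, not the code behind Figure 7; in practice a library routine such as scikit-learn's `roc_curve` and `roc_auc_score` would normally be used.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points, one per distinct threshold, from strictest to most lenient."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy example data (made up for illustration)
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(auc(roc_points(scores, labels)))
```

The random-classifier baseline is the diagonal from (0, 0) to (1, 1), whose area is 0.5; a model whose curve sits above the diagonal scores higher. The toy data here gives 0.9375, which says nothing about the Random Forest result.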
Although the ROC curve clearly shows the relationship between TPR and FPR, the threshold is only an implicit variable, so the optimization task cannot be done with the ROC curve alone. Therefore, another measurement that includes the threshold explicitly is introduced, as plotted in Figure 7 (right).
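The article does not spell out the exact measurement in Figure 7 (right), so the following is only one simple way, assumed here, to make the threshold explicit: tabulate TPR and FPR against a list of chosen thresholds, which could then be plotted with the threshold on the x-axis. The function name and data are hypothetical.

```python
def threshold_table(scores, labels, thresholds):
    """Return (threshold, TPR, FPR) rows so that the threshold is an explicit axis."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    rows = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        rows.append((t, tp / pos, fp / neg))
    return rows


# Toy example data (made up for illustration)
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
for t, tpr, fpr in threshold_table(scores, labels, [0.25, 0.5, 0.75]):
    print(f"threshold={t:.2f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

Both rates fall as the threshold rises, which is exactly the revenue-versus-loss tension that the threshold optimization has to resolve.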