Issue
While evaluating my machine learning model with cross-validation, I encountered an issue. I knew how to plot AUROC and the corresponding threshold for each fold in cross-validation, but I was unsure about plotting the mean AUROC and its corresponding threshold across all folds.
To work this out, I looked through related questions on Stack Overflow and found a corresponding solution; the original question is here: https://stackoverflow.com/questions/57708023/plotting-the-roc-curve-of-k-fold-cross-validation. Although I managed to generate the mean ROC, I had trouble accurately plotting the corresponding threshold. To address this, I added some code based on my own understanding, but I am not sure whether this approach is correct.
Additionally, I observed a discrepancy between the mean AUC calculated using np.mean() and the AUC value computed by sklearn.metrics on the mean curve. Consequently, I'm seeking guidance on which value is more accurate as a single AUC result. Below is the modified code after my adjustments.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
from sklearn.metrics import roc_curve, auc

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
cv = StratifiedKFold(n_splits=10)
classifier = SVC(kernel='sigmoid',probability=True,random_state=0)
tprs = []
aucs = []
optimal_thresholds = []
mean_fpr = np.linspace(0, 1, 100)
plt.figure(figsize=(10,10))
i = 0
for train, test in cv.split(X, y):
    probas_ = classifier.fit(X[train], y[train]).predict_proba(X[test])
    # Compute ROC curve and area under the curve
    fpr, tpr, thresholds = roc_curve(y[test], probas_[:, 1])
    # The code I added: optimal threshold for this fold (maximum of tpr - fpr)
    optimal_threshold_index = np.argmax(tpr - fpr)
    optimal_threshold = thresholds[optimal_threshold_index]
    optimal_thresholds.append(optimal_threshold)
    #
    # Interpolate this fold's TPR onto the common FPR axis
    tprs.append(np.interp(mean_fpr, fpr, tpr))
    tprs[-1][0] = 0.0
    roc_auc = auc(fpr, tpr)
    aucs.append(roc_auc)
    plt.plot(fpr, tpr, lw=1, alpha=0.3,
             label='ROC fold %d (AUC = %0.4f)' % (i, roc_auc))
    i += 1
plt.plot([0, 1], [0, 1], linestyle='--', lw=2, color='r',
         label='Chance', alpha=.8)
mean_tpr = np.mean(tprs, axis=0)
mean_tpr[-1] = 1.0
mean_auc = auc(mean_fpr, mean_tpr)
# The code I added:
np_mean_AUC = np.mean(aucs)
# print(f"np_mean_AUC={np_mean_AUC},mean_auc={mean_auc}")
#
std_auc = np.std(aucs)
plt.plot(mean_fpr, mean_tpr, color='b',
         label=r'Mean ROC (AUC = %0.4f $\pm$ %0.4f)' % (np_mean_AUC, std_auc),
         lw=2, alpha=.8)
# The code I added:
mean_optimal_threshold_index = np.argmax(mean_tpr-mean_fpr)
plt.annotate(f'Mean Optimal Threshold ({np.mean(optimal_thresholds):.2f})',
             xy=(mean_fpr[mean_optimal_threshold_index], mean_tpr[mean_optimal_threshold_index]),
             xytext=(5, -5),
             textcoords='offset points',
             arrowprops=dict(facecolor='red', arrowstyle='wedge,tail_width=0.7', shrinkA=0, shrinkB=10),
             color='red')
#
std_tpr = np.std(tprs, axis=0)
tprs_upper = np.minimum(mean_tpr + std_tpr, 1)
tprs_lower = np.maximum(mean_tpr - std_tpr, 0)
plt.fill_between(mean_fpr, tprs_lower, tprs_upper, color='grey', alpha=.2,
                 label=r'$\pm$ 1 std. dev.')
plt.xlim([-0.01, 1.01])
plt.ylim([-0.01, 1.01])
plt.xlabel('False Positive Rate',fontsize=18)
plt.ylabel('True Positive Rate',fontsize=18)
plt.title('Cross-Validation ROC of SVM',fontsize=18)
plt.legend(loc="lower right", prop={'size': 15})
plt.show()
The following is the output:
Please let me know if the changes I made in the code can accurately plot the ROC curve for cross-validation along with the corresponding thresholds, and if the labeled AUC values make sense.
Solution
To summarise what your code does:
- For each fold, you obtain an ROC curve, an AUC, and an optimal threshold.
- You interpolate each fold's ROC curve onto a common axis.
- After all splits are complete, you average the interpolated curves and compute the resulting curve's AUC and optimal threshold coordinates.
I haven't picked up on any bugs in your code. I think there's a slight discrepancy in how you report optimal_threshold, which I expand on at the end.
I observed a discrepancy between the mean AUC calculated using np.mean() and the AUC value computed by sklearn.metrics. Consequently, I'm seeking guidance on which value is more accurate to get a precise AUC result.
My view is that for the purpose of reporting a single performance score, averaging the 10 AUC scores is easier to interpret than the interpolation method. This is because you average together the scores from 10 actual trained models, providing a summary of performance across folds. If you do it the other way, where you first derive the mean ROC curve, then the area under that curve doesn't belong to any particular model, and is less easy to interpret as an aggregate score. Interpolating the curves onto a common axis is nevertheless a useful technique for zooming in on TPR-FPR characteristics at different thresholds.
In practice I think both metrics would be similar, and in this case they are very close. I don't think there's a definite right vs. wrong approach.
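To make the comparison concrete, here is a minimal sketch that repeats just enough of the fold loop to compute both summaries side by side. It reuses the same make_classification data and StratifiedKFold/SVC setup as your code; the variable names are illustrative, not taken from your script.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
from sklearn.metrics import roc_curve, auc

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
cv = StratifiedKFold(n_splits=10)
clf = SVC(kernel='sigmoid', probability=True, random_state=0)

mean_fpr = np.linspace(0, 1, 100)
tprs, aucs = [], []
for train, test in cv.split(X, y):
    probas_ = clf.fit(X[train], y[train]).predict_proba(X[test])
    fpr, tpr, _ = roc_curve(y[test], probas_[:, 1])
    aucs.append(auc(fpr, tpr))                  # per-fold AUC (one real trained model each)
    interp_tpr = np.interp(mean_fpr, fpr, tpr)  # this fold's curve on the common axis
    interp_tpr[0] = 0.0
    tprs.append(interp_tpr)

mean_tpr = np.mean(tprs, axis=0)
mean_tpr[-1] = 1.0

print("mean of per-fold AUCs:", np.mean(aucs))            # average of 10 model scores
print("AUC of mean ROC curve:", auc(mean_fpr, mean_tpr))  # area under the averaged curve

The two printed numbers should come out close to each other here, which is consistent with the observation above.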
Although I managed to generate the Mean ROC, I encountered challenges in accurately plotting the corresponding threshold.
In your code the mean ROC curve is plotted, but it is annotated with np.mean(optimal_thresholds) rather than with a threshold belonging to that mean curve itself. It's a bit like having one curve but annotating it with an aggregate measure derived from other curves. Suppose we instead interpolated each fold's thresholds onto a common axis and took the average threshold at each point; that still doesn't seem meaningful to me, because there can be a lot of variance in the threshold values used by sklearn for each fold:
This figure shows the thresholds returned by roc_curve for each fold, where the black line marks the position of the optimal point on the mean ROC. The average threshold doesn't seem like useful, actionable information to me.
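If you want something concrete to report, one option is to give the optimal point on the mean curve and the spread of per-fold optimal thresholds separately, rather than a single averaged threshold. A minimal sketch, assuming the variables mean_fpr, mean_tpr and optimal_thresholds from your code are still in scope:

# Assumes mean_fpr, mean_tpr and optimal_thresholds from the code above are in scope.
j = mean_tpr - mean_fpr            # Youden's J along the mean curve
best = np.argmax(j)
print(f"Optimal point on mean ROC: FPR={mean_fpr[best]:.3f}, TPR={mean_tpr[best]:.3f}")
print(f"Per-fold optimal thresholds: {np.round(optimal_thresholds, 3)}")
print(f"Threshold spread: {np.min(optimal_thresholds):.3f} to {np.max(optimal_thresholds):.3f}")

This keeps the geometric optimum of the averaged curve separate from the fold-specific thresholds, which is the distinction the figure above is meant to illustrate.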
Answered By - user3128