Issue
predict() and predict_proba() give different roc_auc_score values in Random Forest.
I understand that predict_proba() gives probabilities; in binary classification it gives two probabilities, one for each class. predict() gives the predicted class.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Using predict_proba()
rf = RandomForestClassifier(n_estimators=200, random_state=39)
rf.fit(X_train[['Cabin_mapped', 'Sex']], y_train)
# Make predictions on train and test set; keep only the positive-class column,
# since roc_auc_score expects a 1-D score array for a binary target
pred_train = rf.predict_proba(X_train[['Cabin_mapped', 'Sex']])[:, 1]
pred_test = rf.predict_proba(X_test[['Cabin_mapped', 'Sex']].fillna(0))[:, 1]
print('Train set')
print('Random Forests using predict_proba roc-auc: {}'.format(roc_auc_score(y_train, pred_train)))
print('Test set')
print('Random Forests using predict_proba roc-auc: {}'.format(roc_auc_score(y_test, pred_test)))

# Using predict()
# Note: 'Cabin_reduced' is not the column the model was fit on ('Cabin_mapped')
pred_train = rf.predict(X_train[['Cabin_reduced', 'Sex']])
pred_test = rf.predict(X_test[['Cabin_reduced', 'Sex']])
print('Train set')
print('Random Forests using predict roc-auc: {}'.format(roc_auc_score(y_train, pred_train)))
print('Test set')
print('Random Forests using predict roc-auc: {}'.format(roc_auc_score(y_test, pred_test)))
Train set Random Forests using predict_proba roc-auc: 0.8199550985878832
Test set Random Forests using predict_proba roc-auc: 0.8332142857142857
Train set Random Forests using predict roc-auc: 0.7779440793041364
Test set Random Forests using predict roc-auc: 0.7686904761904761
Solution
As you said, the predict function returns the prediction as a True/False value, whereas the predict_proba function returns probabilities, values between zero and one, and this is the reason for the difference.
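AUC means "area under the curve", and that area differs depending on whether the scores are hard 0/1 labels or continuous values: hard labels give the ROC curve a single threshold point, while probabilities give it one point per distinct score.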
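A minimal sketch of this point, assuming a synthetic dataset from make_classification (not the Titanic data from the question) and arbitrary hyperparameters. It counts how many points roc_curve produces when it is fed probabilities versus hard class labels.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary problem (an assumption, used only for illustration)
X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# ROC curve from continuous scores: one point per distinct probability
_, _, thr_proba = roc_curve(
    y_test, rf.predict_proba(X_test)[:, 1], drop_intermediate=False
)
# ROC curve from hard labels: only the two end points plus one threshold point
_, _, thr_labels = roc_curve(y_test, rf.predict(X_test), drop_intermediate=False)

n_points_proba = len(thr_proba)
n_points_labels = len(thr_labels)
print('points with predict_proba:', n_points_proba)
print('points with predict:', n_points_labels)
```

With hard labels the curve is just two straight segments through a single operating point, so its area generally differs from the area under the full curve.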
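As an intuition, imagine a single example that should be classified as False. If your classifier yields a probability of 0.7, the score still records that the model was not fully sure: it is 1.0-0.7=0.3 away from a confident True. If you used predict, the prediction is thresholded to True = 1.0, so that margin becomes 1.0-1.0=0.0 and the information is lost. Strictly speaking, ROC-AUC needs examples of both classes and measures how well the scores rank positives above negatives, so thresholding hurts it by turning many distinct scores into ties.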
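The effect of thresholding can be checked by hand on a tiny example. The four labels and probabilities below are made up for illustration, and the 0.5 cut-off is an assumption that mirrors what predict does for a binary classifier.

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]
# Hypothetical positive-class probabilities; every positive outranks every negative
proba = [0.1, 0.6, 0.7, 0.9]
# Hard labels at a 0.5 cut-off: [0, 1, 1, 1]
labels = [int(p >= 0.5) for p in proba]

auc_proba = roc_auc_score(y_true, proba)    # ranking is perfect
auc_labels = roc_auc_score(y_true, labels)  # the 0.6 negative now ties with both positives

print('roc-auc with probabilities:', auc_proba)
print('roc-auc with hard labels:', auc_labels)
```

Here the probabilities give 1.0, while the hard labels give 0.75, because each tie between a positive and a negative counts as half a correct ranking.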
Answered By - Jindřich