Issue
I need to learn how this function works for multilabel problems.
I tried to calculate accuracy myself to reproduce the same result, but I couldn't. How does it work?
There are 4 labels in this dataset; y_array is the true array and y_pred is the predicted array. Each y looks like [0,1,1,1], [1,0,0,0] ...
from sklearn.metrics import accuracy_score

tp = 0
tn = 0
fn = 0
fp = 0
for i in range(len(y_array)):
    for j in range(4):
        # True
        if (y_array[i][j] == 1) and (y_pred[i][j] == 1):
            tp = tp + 1
        elif (y_array[i][j] == 0) and (y_pred[i][j] == 0):
            tn = tn + 1
        # False
        elif (y_array[i][j] == 0) and (y_pred[i][j] == 1):
            fp = fp + 1  # predicted 1 where the true label is 0: false positive
        elif (y_array[i][j] == 1) and (y_pred[i][j] == 0):
            fn = fn + 1  # predicted 0 where the true label is 1: false negative
ac = (tp + tn) / (tp + tn + fp + fn)
print("Accuracy", ac)
print('Accuracy: {0}'.format(accuracy_score(y_array, y_pred)))
They are different from each other. How can I calculate accuracy or other metrics for this multilabel problem? Is it wrong to use sklearn's accuracy metric?
Accuracy 0.9068711367973193
Accuracy: 0.7134998676125521
Solution
As per scikit-learn documentation for accuracy_score:
for multilabel classification, this function computes subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true.
This means that each label vector will look something like [0,0,1,0] and will need an identical match to count as a single Positive (so y_pred will need to be [0,0,1,0] as well), and anything that isn't [0,0,1,0] will result in a single Negative.
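To see the subset-accuracy rule in action, here is a small self-contained sketch with made-up toy arrays (not your data):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# toy multilabel data, invented for illustration
y_true = np.array([[0, 0, 1, 0],
                   [1, 0, 0, 0]])
y_pred = np.array([[0, 0, 1, 0],   # exact match -> sample counts as correct
                   [1, 0, 1, 0]])  # one label wrong -> whole sample counts as wrong

print(accuracy_score(y_true, y_pred))  # 0.5: only 1 of 2 samples matches exactly
```

Even though 7 of the 8 individual labels are right, the second sample is not an exact row match, so it contributes nothing to subset accuracy.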
In your manual function, you count each partial match separately: if y_true is [0,0,1,0] and y_pred is [0,1,0,0], you count this as 2 True Negatives (positions 0 and 3), 1 False Positive (position 1) and 1 False Negative (position 2). With the formula you use for accuracy, this results in ac = (0+2)/(0+2+1+1), which gives 50% accuracy, while sklearn.metrics.accuracy_score will give 0%.
If you want to replicate scikit-learn's accuracy_score manually, you would need to compare every member of y_array[i] against y_pred[i] first, and only count the sample as correct when the whole row matches, instead of tallying individual positions as TP, TN, FP, FN.
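A minimal sketch of that per-sample check (the helper name subset_accuracy and the toy arrays are my own, not from scikit-learn):

```python
def subset_accuracy(y_true, y_pred):
    """Count a sample as correct only if every label position matches."""
    correct = sum(
        1 for t, p in zip(y_true, y_pred)
        if all(ti == pi for ti, pi in zip(t, p))
    )
    return correct / len(y_true)

# toy data: samples 0 and 2 match exactly, sample 1 differs in one position
y_true = [[0, 0, 1, 0], [1, 0, 0, 0], [0, 1, 1, 0]]
y_pred = [[0, 0, 1, 0], [1, 0, 1, 0], [0, 1, 1, 0]]
print(subset_accuracy(y_true, y_pred))  # 2/3 ~ 0.667
```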
However, seeing as you're dealing with multilabel classification, as per the link above, you might want to check out sklearn.metrics.jaccard_score, sklearn.metrics.hamming_loss or sklearn.metrics.zero_one_loss.
Answered By - dm2