Issue
I'm new to both Python and machine learning. For my project, I'm trying to apply the Naive Bayes algorithm to my dataset.
I can compute the accuracy, but when I try to compute precision and recall for the same predictions, the following error is thrown:
ValueError: Target is multiclass but average='binary'. Please choose another average setting.
Can anyone suggest how to proceed? I tried passing average='micro' to the precision and recall scores. It ran without errors, but it gives the same score for accuracy, precision, and recall.
My dataset:
train_data.csv:
review,label
Colors & clarity is superb,positive
Sadly the picture is not nearly as clear or bright as my 40 inch Samsung,negative
test_data.csv:
review,label
The picture is clear and beautiful,positive
Picture is not clear,negative
My Code:
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import confusion_matrix
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
def load_data(filename):
    reviews = list()
    labels = list()
    with open(filename) as file:
        file.readline()
        for line in file:
            line = line.strip().split(',')
            labels.append(line[1])
            reviews.append(line[0])
    return reviews, labels
X_train, y_train = load_data('/Users/abc/Sep_10/train_data.csv')
X_test, y_test = load_data('/Users/abc/Sep_10/test_data.csv')
vec = CountVectorizer()
X_train_transformed = vec.fit_transform(X_train)
X_test_transformed = vec.transform(X_test)
clf= MultinomialNB()
clf.fit(X_train_transformed, y_train)
score = clf.score(X_test_transformed, y_test)
print("score of Naive Bayes algo is :" , score)
y_pred = clf.predict(X_test_transformed)
print(confusion_matrix(y_test,y_pred))
print("Precision Score : ",precision_score(y_test,y_pred,pos_label='positive'))
print("Recall Score :" , recall_score(y_test, y_pred, pos_label='positive') )
Solution
You need to add the 'average' parameter. According to the documentation:
average : string, [None, ‘binary’ (default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’]
This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:
Do this:
print("Precision Score : ", precision_score(y_test, y_pred,
                                            pos_label='positive',
                                            average='micro'))
print("Recall Score : ", recall_score(y_test, y_pred,
                                      pos_label='positive',
                                      average='micro'))
Replace 'micro' with any one of the above options except 'binary'. Also, in the multiclass setting, there is no need to provide 'pos_label', as it will be ignored anyway.
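To see what the other settings return, here is a minimal sketch on a small three-class example. The labels below are invented for illustration and are not taken from the asker's dataset; only the scikit-learn calls themselves are real. It also prints the distinct labels, which is a quick way to check why a target is being treated as multiclass.

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical three-class labels, invented for illustration only.
y_true = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]

# More than two distinct labels means the target is multiclass.
labels = sorted(set(y_true))
print("Labels:", labels)

# average=None returns one score per class, in the order given by `labels`.
per_class_precision = precision_score(y_true, y_pred, labels=labels, average=None)
per_class_recall = recall_score(y_true, y_pred, labels=labels, average=None)
print("Per-class precision:", per_class_precision)
print("Per-class recall:", per_class_recall)

# 'macro' is the unweighted mean of the per-class scores;
# 'weighted' weights each class by its number of true instances.
macro_precision = precision_score(y_true, y_pred, average="macro")
weighted_precision = precision_score(y_true, y_pred, average="weighted")
print("Macro precision:", macro_precision)
print("Weighted precision:", weighted_precision)
```

If the per-class numbers are what you are after, average=None is probably the most informative option, since it does not collapse the classes into a single figure.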
Update for comment:
Yes, they can be equal. It's stated in the user guide:
Note that for “micro”-averaging in a multiclass setting with all labels included will produce equal precision, recall and F, while “weighted” averaging may produce an F-score that is not between precision and recall.
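The reason is that, when every sample has exactly one label, each misclassified sample counts once as a false positive (for the predicted class) and once as a false negative (for the true class). The pooled false-positive and false-negative totals are therefore equal, so micro-averaged precision and recall both reduce to the fraction of correct predictions. A small sketch with made-up labels (not the asker's data) illustrates this:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical three-class labels, invented for illustration only.
y_true = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]

accuracy = accuracy_score(y_true, y_pred)

# Micro-averaging pools true positives, false positives and false
# negatives over all classes before computing each score.
micro_precision = precision_score(y_true, y_pred, average="micro")
micro_recall = recall_score(y_true, y_pred, average="micro")
micro_f1 = f1_score(y_true, y_pred, average="micro")

print("Accuracy:", accuracy)
print("Micro precision:", micro_precision)
print("Micro recall:", micro_recall)
print("Micro F1:", micro_f1)
```

All four values come out the same here (4 of the 6 predictions are correct), which matches what the asker observed with average='micro'.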
Answered By - Vivek Kumar