Issue
I created a model for multiclass classification. Everything went well and I got a validation accuracy of 84%, but when I printed the classification report I got this warning:
UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Classification report:
              precision    recall  f1-score   support
           0       0.84      1.00      0.91     51890
           1       0.67      0.04      0.08      8706
           2       0.00      0.00      0.00      1605
    accuracy                           0.84     62201
   macro avg       0.50      0.35      0.33     62201
weighted avg       0.79      0.84      0.77     62201
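Precision for a class is the number of correct predictions of that class divided by the number of times the model predicted that class. If a class is never predicted, that is 0/0, which is what the warning calls "ill-defined". A minimal NumPy sketch of the computation; the helper name per_class_precision and the toy labels are hypothetical:

```python
import numpy as np

def per_class_precision(y_true, y_pred, n_classes):
    """Precision per class = true positives / number of times the class was predicted.
    Returns nan where the class was never predicted (0 / 0), the case that
    scikit-learn warns about and reports as 0.0 by default."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    precisions = []
    for c in range(n_classes):
        predicted_c = y_pred == c
        n_predicted = predicted_c.sum()
        if n_predicted == 0:
            precisions.append(float("nan"))  # ill-defined: nothing was predicted as c
        else:
            tp = np.sum(predicted_c & (y_true == c))
            precisions.append(float(tp / n_predicted))
    return precisions

# Hypothetical toy labels: class 2 exists in y_true but is never predicted
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 0, 1, 0]
print(per_class_precision(y_true, y_pred, 3))  # -> [0.6, 1.0, nan]
```

scikit-learn's classification_report accepts a zero_division argument (for example 0 or 1) that chooses the value reported in this case. It only silences the warning; it does not make the model predict class 2.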
Source code:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import classification_report
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten

# Load the data and drop the ID column
df = pd.read_csv('Crop_Agriculture_Data_2.csv')
df = df.drop('ID', axis=1)

# One-hot encode the categorical columns and replace the originals
cat_cols = ['Crop_Type', 'Soil_Type', 'Pesticide_Use_Category', 'Season']
dummies = pd.get_dummies(df[cat_cols], drop_first=True)
df = df.drop(cat_cols, axis=1)
df = pd.concat([df, dummies], axis=1)

# Encode the target: 0 = minimal, 1 = partial, 2 = significant damage
df['Crop_Damage'] = df['Crop_Damage'].map({'Minimal Damage': 0, 'Partial Damage': 1, 'Significant Damage': 2})
x = df.drop('Crop_Damage', axis=1).values
y = df.Crop_Damage.values

# Note: train_size=0.3 keeps 30% of the rows for training and 70% for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.3, random_state=101)

# Scale features to [0, 1], fitting the scaler on the training set only
mms = MinMaxScaler()
x_train = mms.fit_transform(x_train)
x_test = mms.transform(x_test)

# Small feed-forward network with a 3-way softmax output (one unit per class)
model = Sequential()
model.add(Flatten())
model.add(Dense(10, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(6, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(3, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=13)

# Predicted class = index of the highest softmax probability
pred = np.argmax(model.predict(x_test), axis=-1)
print(classification_report(y_test, pred))
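Separately from the warning: train_size=0.3 in the code above trains on 30% of the rows and tests on the remaining 70%, so the model sees less than half of the data, and the 62201 rows counted in the report's support are that 70% test portion. If a 70/30 train/test split was intended, test_size=0.3 is the parameter. A small sketch on hypothetical data comparing the two:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 rows, 2 features, a single dummy label
x = np.arange(200).reshape(100, 2)
y = np.zeros(100, dtype=int)

# train_size=0.3 (as in the question): about 30 rows for training, 70 for testing
x_tr, x_te, _, _ = train_test_split(x, y, train_size=0.3, random_state=101)
print(len(x_tr), len(x_te))

# test_size=0.3: about 70 rows for training, 30 for testing
x_tr2, x_te2, _, _ = train_test_split(x, y, test_size=0.3, random_state=101)
print(len(x_tr2), len(x_te2))
```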
I think it might be because most of the data is in one category, but I'm not sure. Is there anything I can do to solve this?
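The support column already hints at this: in the test split, class 0 has 51890 of 62201 samples (about 83%) and class 2 only 1605 (about 2.6%). One way to confirm the imbalance on the full label array is to count samples per class. A minimal sketch, using a hypothetical label array in place of the question's y:

```python
import numpy as np

# Hypothetical labels standing in for y (0 = minimal, 1 = partial, 2 = significant damage)
y = np.array([0] * 84 + [1] * 14 + [2] * 2)

counts = np.bincount(y, minlength=3)   # number of samples in each class
proportions = counts / counts.sum()    # fraction of the data in each class
for label, (n, p) in enumerate(zip(counts, proportions)):
    print(f"class {label}: {n} samples ({p:.1%})")
```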
Solution
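You don't want to just get rid of this warning: it tells you that class 2 never appears in your predictions. The model did not predict a single sample as class 2, so precision (and therefore F1) for that class is 0/0 and is reported as 0.0.
You have an imbalanced classification problem: class 2 has very few samples (1605 of 62201 in the test set, about 2.6%), so the model can reach 84% accuracy while mostly predicting class 0 and ignoring class 2. With a plain random split there is also no guarantee that each class keeps its proportion in the training and test sets.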
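Whether class 2 is missing from the training set or is present but simply never predicted can be checked directly by counting labels in each array. A minimal sketch; the helper name class_presence and the toy arrays are hypothetical stand-ins for y_train, y_test and pred from the question:

```python
import numpy as np

def class_presence(y_train, y_test, pred, n_classes=3):
    """Count how often each class occurs in the training labels, test labels and predictions."""
    return {
        "train": np.bincount(y_train, minlength=n_classes).tolist(),
        "test": np.bincount(y_test, minlength=n_classes).tolist(),
        "pred": np.bincount(pred, minlength=n_classes).tolist(),
    }

# Hypothetical arrays: class 2 is in both splits but never predicted
y_train = np.array([0, 0, 0, 1, 2])
y_test = np.array([0, 0, 1, 2])
pred = np.array([0, 0, 1, 0])
print(class_presence(y_train, y_test, pred))
# A 0 in "pred" for a class with a non-zero count in "test" is exactly
# the situation the UndefinedMetricWarning describes.
```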
I suggest two things:
StratifiedKFold (or stratify=y in train_test_split): when you split into training and test data, every class is represented in both, in the same proportion as in the full dataset.
Oversampling: you may need to adjust your data by randomly resampling the training dataset to duplicate examples from the minority classes.
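For a single train/test split, the stratify parameter of train_test_split keeps the class proportions; StratifiedKFold does the same for k-fold cross-validation. A minimal sketch on hypothetical random data where class 2 is only 5% of the labels:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

# Hypothetical imbalanced data: 100 rows, 4 features, class 2 is 5% of the labels
rng = np.random.default_rng(0)
x = rng.random((100, 4))
y = np.array([0] * 80 + [1] * 15 + [2] * 5)

# Single split: stratify=y keeps the class proportions in both parts
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, stratify=y, random_state=101)
print(np.bincount(y_train, minlength=3), np.bincount(y_test, minlength=3))

# Cross-validation: each fold's test part gets roughly the same class mix
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=101)
for train_idx, test_idx in skf.split(x, y):
    print(np.bincount(y[test_idx], minlength=3))
```

Stratification guarantees the minority class is represented on both sides of the split, but on its own it does not stop the model from favoring the majority class, which is why it is paired with oversampling.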
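Random oversampling can be done with plain NumPy by sampling minority-class rows with replacement until every class matches the largest one. A minimal sketch; the helper name random_oversample is hypothetical, and it should be applied to the training split only, so that duplicated rows never end up in the test set:

```python
import numpy as np

def random_oversample(x, y, random_state=0):
    """Randomly duplicate rows of the smaller classes until every class
    has as many rows as the largest class. Use on the training set only."""
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    indices = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        # draw extra rows of this class with replacement to reach the target size
        extra = rng.choice(idx, size=target - len(idx), replace=True)
        indices.append(np.concatenate([idx, extra]))
    indices = rng.permutation(np.concatenate(indices))  # shuffle the combined rows
    return x[indices], y[indices]

# Hypothetical training data: 8 rows of class 0, 3 of class 1, 1 of class 2
x_train = np.arange(24).reshape(12, 2)
y_train = np.array([0] * 8 + [1] * 3 + [2])
x_res, y_res = random_oversample(x_train, y_train)
print(np.bincount(y_res))  # -> [8 8 8]
```

In the question's code this would go right after train_test_split, e.g. x_train, y_train = random_oversample(x_train, y_train), before scaling and fitting. The imbalanced-learn package offers the same idea as RandomOverSampler.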
Answered By - Yefet