Issue
I am trying to use a new data set on a previously trained model to see how accurate the model is. I use the following code and receive the below error. Would another method solve this problem? thanks
import pandas as pd
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder
%matplotlib inline
df = pd.read_excel('xxxx.xlsx')
enc = LabelEncoder()
X = df[df.columns[1:]]
Y = df[df.columns[0]].values.ravel()
Y2 = enc.fit_transform(Y)
df.insert(0, "Unit Status", Y2, True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y2, random_state = 0, test_size = 0.25)
clf = LinearSVC(random_state=0,dual=False, tol=1e-5)
clf.fit(X, Y2)
Y_pred = clf.predict(X_test)
confusion_matrix(Y_test, Y_pred)
classifier_predictions = clf.predict(X_test)
print(accuracy_score(Y_test, classifier_predictions)*100)
df2 = pd.read_excel('xxxx_v2.xlsx')
y_pred=clf.predict(df2)
ValueError: could not convert string to float: '20-002'
Solution
The data in the new dataframe must all be floats or at least can be converted to float, the first and second columns have string data which cannot be converted to numbers, thus the model cannot train or predict on this data. from looking at the data, you could use labelEncoder
on the second column and decide whether or not to use OneHotEncoder
, but it looks to me that the first column doesn't contain categorical data. If the model needs the first column's data, then you need to convert it to numbers somehow, otherwise just drop the column.
Answered By - Mahmoud Youssef
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.