Issue
I am trying to run this machine learning program and I get the following error:
ValueError: X.shape[1] = 574 should be equal to 11, the number of features at training time
My Code:
from pylab import *
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
import numpy as np

X = list()
Y = list()
validationX = list()
validationY = list()

# Training data: semicolon-separated features, quality score in the last column
file = open('C:\\Users\\User\\Desktop\\csci4113\\project1\\whitewineTraining.txt', 'r')
for eachline in file:
    strArray = eachline.split(";")
    row = list()
    for i in range(len(strArray) - 1):
        row.append(float(strArray[i]))
    X.append(row)
    # Label a wine as 1 if its quality score is above 6, otherwise 0
    if (int(strArray[-1]) > 6):
        Y.append(1)
    else:
        Y.append(0)

# Validation data, same format
file2 = open('C:\\Users\\User\\Desktop\\csci4113\\project1\\whitewineValidation.txt', 'r')
for eachline in file2:
    strArray = eachline.split(";")
    row2 = list()
    for i in range(len(strArray) - 1):
        row2.append(float(strArray[i]))
    validationX.append(row2)
    if (int(strArray[-1]) > 6):
        validationY.append(1)
    else:
        validationY.append(0)

X = np.array(X)
print(X)
Y = np.array(Y)
print(Y)
validationX = np.array(validationX)
validationY = np.array(validationY)

clf = SVC()
clf.fit(X, Y)
result = clf.predict(validationX)
clf.score(result, validationY)  # this line raises the ValueError
The goal of the program is to build a model with fit() and then check it against the validation set (validationX and validationY) to see how well the machine learning model performs. Here is the rest of the console output; keep in mind that X is, confusingly, an 11x574 array!
[[ 7. 0.27 0.36 ..., 3. 0.45 8.8 ]
[ 6.3 0.3 0.34 ..., 3.3 0.49 9.5 ]
[ 8.1 0.28 0.4 ..., 3.26 0.44 10.1 ]
...,
[ 6.3 0.28 0.22 ..., 3. 0.33 10.6 ]
[ 7.4 0.16 0.33 ..., 3.04 0.68 10.5 ]
[ 8.4 0.27 0.3 ..., 2.89 0.3
11.46666667]]
[0 0 0 ..., 0 1 0]
C:\Users\User\Anaconda3\lib\site-packages\sklearn\utils\validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
DeprecationWarning)
Traceback (most recent call last):
File "<ipython-input-68-31c649fe24b3>", line 1, in <module>
runfile('C:/Users/User/Desktop/csci4113/project1/program1.py', wdir='C:/Users/User/Desktop/csci4113/project1')
File "C:\Users\User\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 714, in runfile
execfile(filename, namespace)
File "C:\Users\User\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 89, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/User/Desktop/csci4113/project1/program1.py", line 43, in <module>
clf.score(result, validationY)
File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\base.py", line 310, in score
return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\svm\base.py", line 568, in predict
y = super(BaseSVC, self).predict(X)
File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\svm\base.py", line 305, in predict
X = self._validate_for_predict(X)
File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\svm\base.py", line 474, in _validate_for_predict
(n_features, self.shape_fit_[1]))
ValueError: X.shape[1] = 574 should be equal to 11, the number of features at training time
Solution
You are simply passing the wrong object to the score function. The documentation clearly states:
score(X, y, sample_weight=None)
X : array-like, shape = (n_samples, n_features) Test samples.
but you pass the predictions instead, so
result = clf.predict(validationX)
clf.score(result, validationY)
is invalid; it should simply be
clf.score(validationX, validationY)
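To see where the 574 in the error comes from, here is a minimal NumPy-only sketch. The shapes are taken from the error message (11 features at training time, 574 validation rows); the zero-filled arrays are hypothetical stand-ins for the real data, and the reshape mimics what the DeprecationWarning in the output says the installed scikit-learn did with 1-D input.

```python
import numpy as np

# Hypothetical stand-ins: 574 validation rows, 11 features (from the error message)
n_val_samples, n_features = 574, 11
validationX = np.zeros((n_val_samples, n_features))

# predict() returns one label per sample: a 1-D array of length 574
result = np.zeros(n_val_samples, dtype=int)

# score() expects a 2-D (n_samples, n_features) array. Per the DeprecationWarning,
# the scikit-learn version in the traceback treated 1-D input as a single sample,
# which is equivalent to reshape(1, -1):
as_single_sample = result.reshape(1, -1)

print(validationX.shape)       # (574, 11): what score() expects
print(as_single_sample.shape)  # (1, 574): one "sample" with 574 "features"
```

So the fitted SVC, which was trained on 11 features, was handed one row with 574 columns. More recent scikit-learn releases no longer reshape 1-D input and raise an "Expected 2D array" ValueError instead.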
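What you tried to do would be fine with a metric function such as sklearn.metrics.accuracy_score, which takes the true labels and the predictions, but not with the classifier itself. A classifier's .score method calls .predict on its own (the traceback shows it computing accuracy_score(y, self.predict(X))), so you pass it the raw data as the argument.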
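A minimal sketch of the two equivalent ways to score, assuming a reasonably recent scikit-learn. The data is synthetic (make_classification with 11 features, a hypothetical stand-in for the wine files), and the 600/200 train/validation split is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

# Hypothetical stand-in for the wine data: 800 samples, 11 features, binary labels
X_all, Y_all = make_classification(n_samples=800, n_features=11, random_state=0)
X, Y = X_all[:600], Y_all[:600]
validationX, validationY = X_all[600:], Y_all[600:]

clf = SVC()
clf.fit(X, Y)

# Option 1: the classifier's score() takes raw samples and predicts internally
acc_method = clf.score(validationX, validationY)

# Option 2: predict explicitly, then pass the predictions to a metric function
result = clf.predict(validationX)
acc_metric = accuracy_score(validationY, result)

print(acc_method, acc_metric)  # the same accuracy, computed two ways
```

Option 2 is essentially what the original code was attempting; the only change is that result goes to accuracy_score (true labels first, predictions second) rather than to clf.score.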
Answered By - lejlot