Issue
I want to make a prediction using KNN, and I have the following code:
def knn(trainImages, trainLabels, testImages, testLabels):
    max = 0
    for i in range(len(trainImages)):
        if len(trainImages[i]) > max:
            max = len(trainImages[i])
    for i in range(len(trainImages)):
        aux = np.array(trainImages[i])
        aux.resize(max)
        trainImages[i] = aux
    max = 0
    for i in range(len(testImages)):
        if len(testImages[i]) > max:
            max = len(testImages[i])
    for i in range(len(testImages)):
        aux = np.array(testImages[i])
        aux.resize(max)
        testImages[i] = aux
    scaler = StandardScaler()
    scaler.fit(list(trainImages))
    trainImages = scaler.transform(list(trainImages))
    testImages = scaler.transform(list(testImages))
    classifier = KNeighborsClassifier(n_neighbors=5)
    classifier.fit(trainImages, trainLabels)
    pred = classifier.predict(testImages)
    print(classification_report(testLabels, pred))
I get an error at testImages = scaler.transform(list(testImages)). I understand that it is caused by a mismatch between the array sizes. How can I solve it?
Solution
A scaler in scikit-learn expects input of shape (n_samples, n_features).
If the second dimension differs between your train and test sets, this is not only an error in sklearn; it also makes no sense in theory. The n_features dimension of the train and test sets must be equal. The first dimension can differ, since it is the number of samples, and you can have any number of samples in each set.
When you execute scaler.transform(test), it expects test to have the same number of features as the data you passed to scaler.fit(train). So all your images should be the same size.
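In the question's code, the training and test sets are each padded to their own maximum length, so the two arrays can end up with different numbers of columns. The following is a minimal sketch of one way around that, assuming each image is already a 1-D vector and that zero-padding or truncating is acceptable for the data. pad_to_length is a hypothetical helper written for this illustration, not part of scikit-learn, and the toy vectors are made up:

```python
import numpy as np

def pad_to_length(vectors, length):
    """Zero-pad or truncate each 1-D vector to exactly `length` entries."""
    out = np.zeros((len(vectors), length), dtype=float)
    for row, vec in enumerate(vectors):
        vec = np.asarray(vec, dtype=float).ravel()[:length]
        out[row, :len(vec)] = vec
    return out

# Toy data standing in for flattened images of different sizes (assumption).
train_vectors = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
test_vectors = [[1, 2], [3, 4, 5, 6, 7]]

# Take the feature count from the training set only and reuse it for the test set.
n_features = max(len(v) for v in train_vectors)
train_matrix = pad_to_length(train_vectors, n_features)
test_matrix = pad_to_length(test_vectors, n_features)

print(train_matrix.shape, test_matrix.shape)
```

Both matrices now share the same second dimension, which is what scaler.transform checks. Padding and truncating change the image content, though, so resizing the images to a common size is usually the sounder fix.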
For example, if you have 100 images, the train_images shape should be something like (90, 224, 224, 3) and the test_images shape something like (10, 224, 224, 3) (only the first dimension differs).
So, try to resize your images like this:
import cv2
resized_image = cv2.resize(image, (224, 224))  # size is (width, height); don't include the channel dimension
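Once the images are resized, each one still has to be flattened into a single row, because the scaler and the KNN classifier work on 2-D (n_samples, n_features) input. A minimal sketch with NumPy, assuming the images are already stacked into one array; flatten_images is a hypothetical helper, and the random arrays (kept smaller than 224x224 so the example stays light) stand in for real resized images:

```python
import numpy as np

def flatten_images(images):
    """Reshape (n_samples, height, width, channels) into (n_samples, n_features)."""
    images = np.asarray(images)
    return images.reshape(len(images), -1)

# Random arrays standing in for already-resized images (assumption).
rng = np.random.default_rng(0)
train_images = rng.random((9, 8, 8, 3))
test_images = rng.random((1, 8, 8, 3))

train_features = flatten_images(train_images)
test_features = flatten_images(test_images)

# Same number of columns in both, different number of rows.
print(train_features.shape, test_features.shape)
```

The resulting arrays can then be passed to scaler.fit, scaler.transform, classifier.fit and classifier.predict as in the question.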
Answered By - Kaveh