Issue
Have:
- Dataset: 115 color images, 256x256 each; all photos belong to ONE class (a cartoon person).
- Classifiers: KNN and Random Forest Classifier.
Comment: I wanted to make a classifier that recognizes ONE cartoon person in a photo, so I collected a dataset, digitized it and passed it to the classifiers' fit method. At first I chose SGDClassifier, but it only works with two or more classes in the dataset, so I switched to KNN and Random Forest Classifier.
Problem: when I test my trained classifiers, I get a score of 1.0 on EVERY photo. I tested a photo of the target person, a photo of another object (another cartoon person) and a photo of a black screen, and they all got a score of 1.0.
Can somebody help me please? : ( I have been stuck on this for 2 days already and don't see a way to solve it by myself. I looked at many solutions, but none of them worked in my case.
Dataset:
- The shape of my dataset numpy array is (115, 196608), i.e. each image is flattened to 256*256*3 values, and (for example) one image in my dataset numpy array looks like this:
- The dataset is a 2D array, because the classifiers only take 1D or 2D arrays.
Code (not complete, just an example):

import cv2
import numpy

train_data_values = numpy.array([*115 photos*])
train_data_labels = numpy.array([*115 labels*])
# In fact, all my labels equal "1"; there is no other value.

# Trying KNN
from sklearn.neighbors import KNeighborsClassifier
KNN_clf = KNeighborsClassifier(**{'n_neighbors': 16, 'weights': 'distance'})
KNN_clf.fit(train_data_values, train_data_labels)
test_im = cv2.imread(DATASET_IMAGES_DIRECTORY + "\\test\\" + "test2.png")
KNN_clf.predict_proba(test_im.reshape(1, 3*256*256))
# Returns array([[1.]])

# Trying Random Forest Classifier
from sklearn.ensemble import RandomForestClassifier
RF_clf = RandomForestClassifier()
RF_clf.fit(train_data_values, train_data_labels)
test_im = cv2.imread(DATASET_IMAGES_DIRECTORY + "\\test\\" + "test.png")
RF_clf.predict_proba(test_im.reshape(1, 3*256*256))
# Returns array([[1.]])
Comment: I inspected the images in my numpy dataset, because I thought they might have been digitized badly, but no, each array can easily be converted back into its image. P.S. The parameters for the KNN classifier are arbitrary: I tried a grid search for the best parameters, but again got 1.0 scores everywhere.
Solution
All classifiers learn their scores from their training data. The scores of most classifiers (including random forest and KNN) have a probabilistic meaning: they are tuned to reflect the class distribution of the training data as well as possible.
So if your training data consists entirely of a single class, the classifier learns that any sample belongs to this class with 100% probability, and it will predict this class with absolute confidence. That is also why predict_proba returns a single column: there is only one class to assign probability to.
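The effect is easy to reproduce without any real images. The following is only a minimal sketch: the random 48-feature arrays are a made-up stand-in for the flattened photos, and the hyperparameters are arbitrary choices for the illustration, not the settings from the question.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Hypothetical stand-in for the flattened images: 20 samples, 48 features each.
X_train = rng.integers(0, 256, size=(20, 48)).astype(float)
y_train = np.ones(20, dtype=int)  # every label is 1, as in the question

knn = KNeighborsClassifier(n_neighbors=5, weights="distance").fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_train, y_train)

# Two inputs that have nothing to do with the training data.
black_image = np.zeros((1, 48))
noise_image = rng.integers(0, 256, size=(1, 48)).astype(float)

# Each model knows exactly one class, so predict_proba has a single column.
knn_scores = knn.predict_proba(np.vstack([black_image, noise_image]))
forest_scores = forest.predict_proba(np.vstack([black_image, noise_image]))

print("classes seen by KNN:", knn.classes_)
print("KNN scores:", knn_scores.ravel())
print("forest scores:", forest_scores.ravel())
```

Both models give every input the full probability mass for the only class they have seen, regardless of what the input looks like.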
The lesson: to use almost any classifier, you need at least two classes; otherwise the prediction will be more or less meaningless. My recommendation is to add negative samples, that is, samples without your target person, including:
- images of other characters from your cartoon and from other cartoons
- images with background only and no characters
- images of some non-animated objects
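Once negative samples are present, the scores start to carry information. Here is a rough sketch of the idea with synthetic data: the "bright" and "dark" feature vectors are invented placeholders for target and non-target images, and the label convention (1 = target person, 0 = anything else) is an assumption made for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical placeholders: "target" samples are bright, "negative" samples are dark.
X_positive = rng.normal(loc=200.0, scale=10.0, size=(20, 48))
X_negative = rng.normal(loc=50.0, scale=10.0, size=(20, 48))

X_train = np.vstack([X_positive, X_negative])
# Assumed label convention: 1 = target person, 0 = anything else.
y_train = np.concatenate([np.ones(20, dtype=int), np.zeros(20, dtype=int)])

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

# With two classes, predict_proba returns one column per class,
# ordered as in clf.classes_.
black_image = np.zeros((1, 48))
proba = clf.predict_proba(black_image)
target_column = list(clf.classes_).index(1)
target_score = proba[0, target_column]

print("classes:", clf.classes_)
print("probability that the black image shows the target:", target_score)
```

On real photos the classes will not separate this cleanly, so the score for a new image should be checked on a held-out test set that contains both positive and negative examples.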
There are a few exceptions, such as OneClassSVM, that are (presumably) capable of producing meaningful scores when trained on a single class. But you will never know whether they work adequately on your data until you test them with data from several different classes.
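For reference, a one-class model in scikit-learn is fitted without labels and answers "inlier or outlier" instead of returning class probabilities. This is only a sketch on synthetic data; the feature vectors and the values of nu and gamma are illustrative assumptions, not tuned settings for cartoon images.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical stand-in for the 115 target images: one tight cluster of samples.
X_target = rng.normal(loc=200.0, scale=10.0, size=(40, 48))

# OneClassSVM is fitted on the target class only; no labels are passed.
# nu roughly bounds the fraction of training samples treated as outliers.
detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)
detector.fit(X_target)

black_image = np.zeros((1, 48))

# predict returns +1 for inliers and -1 for outliers;
# decision_function gives a signed score (negative means outlier).
black_label = detector.predict(black_image)[0]
black_score = detector.decision_function(black_image)[0]

print("label for the black image:", black_label)
print("decision score for the black image:", black_score)
```

Raw pixel distances are a crude similarity measure for images, so even this kind of model still needs to be validated against real negative examples before its output can be trusted.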
Answered By - David Dale