Issue
I'm a beginner in machine learning, and I'm currently trying to predict the position of an object within an image from a dataset I created.
The dataset contains about 300 images in total across 2 classes (Ace and Two).
I built a CNN that predicts whether a card is an Ace or a Two with about 88% accuracy.
Since that was working well, I decided to try predicting the position of the card instead of its class. From the articles I read, I understood that all I had to do was take the same CNN I used to predict the class and replace its last layer with a Dense layer of 4 nodes. That's what I did, but apparently it isn't working.
Here is my model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Activation, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(64, (3, 3), input_shape=(150, 150, 1)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(32, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=2))
model.add(Dense(64))
model.add(Activation("relu"))
model.add(Flatten())
model.add(Dense(4))  # 4 outputs: x, y, width, height of the bounding box
model.compile(loss="mean_squared_error", optimizer="adam", metrics=[])
model.fit(X, y, batch_size=1, validation_split=0,
          epochs=30, verbose=1, callbacks=[TENSOR_BOARD])
What I feed to my model:
X: a grayscale image of 150x150 pixels, with each pixel rescaled to [0, 1].
y: smallest X coordinate, highest Y coordinate, width and height of the object (each of those values is in [0, 1]).
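For context, the normalized target values come from dividing pixel coordinates by the image size. A minimal sketch (the helper name and the example pixel values are assumptions for illustration, not from the original post):

```python
import numpy as np

def normalize_bbox(x_min, y_max, width, height, img_size=150):
    # Scale pixel-space bounding-box values into [0, 1] by dividing
    # by the image size (150 px here, matching the model's input shape).
    return np.array([x_min, y_max, width, height], dtype=np.float64) / img_size

# e.g. a 60x65-pixel box whose smallest X is 48 and highest Y is 58
target = normalize_bbox(48, 58, 60, 65)
```

With these inputs, `target` is approximately [0.32, 0.3867, 0.4, 0.4333], matching the shape of the expected outputs shown below.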
And here's an example of predictions it gives me:
[array([ 28.66145 , 41.278576, -9.568813, -13.520659], dtype=float32)]
but what I really wanted was:
[0.32, 0.38666666666666666, 0.4, 0.43333333333333335]
I knew something was wrong, so I decided to train and test my CNN on a single image (it should then overfit and predict the right bounding box for that one image if it worked). Even after overfitting on this single image, the predicted values were ridiculously high.
So my question is: what am I doing wrong?
EDIT 1
After trying @Matias's solution, which was to add a sigmoid activation function to the last layer, all of the output values are now in [0, 1].
But even with this, the model still produces bad outputs. For example, after training it for 10 epochs on the same single image, it predicted this:
[array([0.0000000e+00, 0.0000000e+00, 8.4378130e-18, 4.2288357e-07],dtype=float32)]
but what I expected was:
[0.2866666666666667, 0.31333333333333335, 0.44666666666666666, 0.5]
EDIT 2
Okay, after experimenting for quite a while, I've come to the conclusion that the problem is either my model (the way it is built) or the lack of training data.
But even if it were caused by a lack of training data, I should still have been able to overfit on 1 image and get the right prediction for that one image, right?
I created another post asking about this last question, since the original question has been answered and I don't want to completely re-edit this post; that would make the first answers pointless.
Solution
Since your targets (the Y values) are normalized to the [0, 1] range, the output of the model should match this range. For this you should use a sigmoid activation at the output layer, so the output is constrained to the [0, 1] range:
model.add(Dense(4, activation='sigmoid'))
Answered By - Dr. Snoopy