Issue
I have been trying to use a pre-trained model (XceptionNet) to get a feature vector for each input image in a classification task. But I am stuck, because model.predict() gives unreliable output: the vector for the same image changes when the dataset size changes.
In the following code, batch is the data containing the images. For each of these images I want a feature vector, which I obtain using the pre-trained model.
batch.shape
TensorShape([803, 800, 600, 3])
Just to make it clear that all the input images are different, here are a few of them displayed.
plt.imshow(batch[-23])
plt.figure()
plt.imshow(batch[-15])
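My model is the following:
import tensorflow as tf
from tensorflow.keras.applications import Xception
from tensorflow.keras.layers import GlobalAvgPool2D, Input
INPUT_SHAPE = (800, 600)
# Frozen ImageNet backbone without the classification head
model_xception = Xception(weights="imagenet", input_shape=(*INPUT_SHAPE, 3), include_top=False)
model_xception.trainable = False
inp = Input(shape=(*INPUT_SHAPE, 3))
out = model_xception(inp, training=False)  # run BatchNormalization in inference mode
output = GlobalAvgPool2D()(out)  # one feature vector per image
model = tf.keras.Model(inp, output, name='Xception-kPiece')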
The issue shows up in the following outputs:
model.predict(batch[-25:]) # prediction on the last 25 images
1/1 [==============================] - 1s 868ms/step
array([[4.99584060e-03, 4.25433293e-02, 9.93836671e-02, ...,
3.21301445e-03, 2.59823762e-02, 9.08260979e-03],
[2.50613055e-04, 1.18759666e-02, 0.00000000e+00, ...,
1.77203789e-02, 7.71604702e-02, 1.28602296e-01],
[3.41954082e-02, 1.82092339e-02, 5.07147610e-03, ...,
7.09404126e-02, 9.45318267e-02, 2.69510925e-01],
...,
[0.00000000e+00, 5.16504236e-03, 4.90547449e-04, ...,
4.62833559e-04, 9.43152513e-03, 1.17826145e-02],
[0.00000000e+00, 4.64747474e-03, 0.00000000e+00, ...,
1.21422185e-04, 4.47714329e-03, 1.92385539e-02],
[0.00000000e+00, 1.29655155e-03, 4.02751788e-02, ...,
0.00000000e+00, 0.00000000e+00, 3.20959717e-01]], dtype=float32)
model.predict(batch)[-25:] # prediction on entire dataset of 803 images and then extracting the vectors corresponding to the last 25 images
26/26 [==============================] - 34s 1s/step
array([[1.7320104e-05, 3.6561250e-04, 0.0000000e+00, ..., 0.0000000e+00,
3.5924271e-02, 0.0000000e+00],
[1.7320104e-05, 3.6561250e-04, 0.0000000e+00, ..., 0.0000000e+00,
3.5924271e-02, 0.0000000e+00],
[1.7320104e-05, 3.6561250e-04, 0.0000000e+00, ..., 0.0000000e+00,
3.5924271e-02, 0.0000000e+00],
...,
[1.7318112e-05, 3.6561041e-04, 0.0000000e+00, ..., 0.0000000e+00,
3.5924841e-02, 0.0000000e+00],
[1.7318112e-05, 3.6561041e-04, 0.0000000e+00, ..., 0.0000000e+00,
3.5924841e-02, 0.0000000e+00],
[1.7318112e-05, 3.6561041e-04, 0.0000000e+00, ..., 0.0000000e+00,
3.5924841e-02, 0.0000000e+00]], dtype=float32)
This behavior has two problems:
- The two outputs differ, even though the last 25 input images are the same.
- The output for every input image in the larger batch is essentially the same.
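Both symptoms can be checked programmatically instead of by eye. The following is a minimal sketch, not part of the original post: `check_batch_consistency` is a hypothetical helper name, and `predict_fn` stands for any callable that maps a batch of inputs to a 2-D feature array (for example a wrapper around `model.predict`).

```python
import numpy as np


def check_batch_consistency(predict_fn, data, tail=25, atol=1e-5):
    """Compare features for the last `tail` samples computed two ways.

    Returns (same_as_subset, rows_identical). Use tail > 1, otherwise
    rows_identical is trivially True.
    """
    # Features when only the last `tail` samples are passed in.
    subset_out = np.asarray(predict_fn(data[-tail:]))
    # Features for the same samples, sliced out of a full-dataset prediction.
    full_out = np.asarray(predict_fn(data))[-tail:]

    # Problem 1: the same images should give the same vectors either way.
    same_as_subset = bool(np.allclose(subset_out, full_out, atol=atol))
    # Problem 2: distinct images should not all collapse onto one vector.
    rows_identical = bool(np.allclose(full_out, full_out[0], atol=atol))
    return same_as_subset, rows_identical


# Hypothetical usage with the model from the question:
# ok, collapsed = check_batch_consistency(
#     lambda x: model.predict(x, verbose=0), batch, tail=25)
```

A healthy setup should report `(True, False)`; the behavior described here would report `(False, True)`.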
My take on the problem:
- I feel like the BatchNormalization layers are causing the issue. But what is the fix? I am calling model_xception with training=False and have also set model_xception.trainable = False, yet the output is still the same for all the inputs.
- The increase in the number of images in the batch is the problem.
- The issue is not limited to XceptionNet; it is evident for other models as well. I have also experimented with EfficientNetV2 models.
Can anyone help fix the bug?
Solution
The issue seems to appear because I am using tensorflow-macos, which has a major bug: predictions are wrong once the number of input images exceeds a particular value.
See the issue in action below:
- When 57 input images are used, the predictions differ from image to image and match those obtained with 56, ..., 1 input images (which is consistent behavior and as expected).
model.predict(batch[-57:])
1/1 [==============================] - 2s 2s/step
array([[0.00000000e+00, 2.56574154e-02, 1.79693177e-01, ...,
2.85670068e-03, 1.08444700e-02, 2.34257965e-03],
[0.00000000e+00, 1.28444552e-03, 0.00000000e+00, ...,
4.11680201e-03, 4.49061068e-03, 1.83695972e-01],
[0.00000000e+00, 2.29660165e-03, 7.84890354e-03, ...,
1.86224483e-04, 1.81426702e-03, 1.54079705e-01],
...,
[0.00000000e+00, 5.16504236e-03, 4.90547449e-04, ...,
4.62833559e-04, 9.43152513e-03, 1.17826145e-02],
[0.00000000e+00, 4.64747474e-03, 0.00000000e+00, ...,
1.21422185e-04, 4.47714329e-03, 1.92385539e-02],
[0.00000000e+00, 1.29655155e-03, 4.02751788e-02, ...,
0.00000000e+00, 0.00000000e+00, 3.20959717e-01]], dtype=float32)
model.predict(batch[-55:])
2/2 [==============================] - 2s 1s/step
array([[0.00000000e+00, 2.29660165e-03, 7.84890354e-03, ...,
1.86224483e-04, 1.81426702e-03, 1.54079705e-01],
[4.94572960e-05, 8.04292504e-04, 5.08825444e-02, ...,
4.58029518e-03, 2.09121332e-02, 5.57549708e-02],
[0.00000000e+00, 1.62312540e-03, 0.00000000e+00, ...,
4.35817856e-05, 2.16606092e-02, 1.30677417e-01],
...,
[0.00000000e+00, 5.16504236e-03, 4.90547449e-04, ...,
4.62833559e-04, 9.43152513e-03, 1.17826145e-02],
[0.00000000e+00, 4.64747474e-03, 0.00000000e+00, ...,
1.21422185e-04, 4.47714329e-03, 1.92385539e-02],
[0.00000000e+00, 1.29655155e-03, 4.02751788e-02, ...,
0.00000000e+00, 0.00000000e+00, 3.20959717e-01]], dtype=float32)
- But when the number of input images is changed to 58 or more, the issue mentioned above appears.
model.predict(batch[-58:])
1/1 [==============================] - 2s 2s/step
array([[5.3905282e-04, 2.8516021e-02, 1.2775734e-03, ..., 5.4674568e-03,
1.7451918e-02, 9.4717339e-02],
[0.0000000e+00, 2.8345605e-02, 1.2786543e-03, ..., 0.0000000e+00,
2.4870334e-03, 1.2716405e-01],
[4.3588653e-03, 8.2868971e-02, 1.8764129e-02, ..., 2.5320805e-03,
5.9973758e-02, 6.9927111e-02],
...,
[1.7320104e-05, 3.6561250e-04, 0.0000000e+00, ..., 0.0000000e+00,
3.5924271e-02, 0.0000000e+00],
[1.7320104e-05, 3.6561250e-04, 0.0000000e+00, ..., 0.0000000e+00,
3.5924271e-02, 0.0000000e+00],
[1.7320104e-05, 3.6561250e-04, 0.0000000e+00, ..., 0.0000000e+00,
3.5924271e-02, 0.0000000e+00]], dtype=float32)
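The threshold seen above (57 images fine, 58 broken) was found by hand. A rough sketch of how the search could be automated follows, under the assumption that a trusted `reference` array of features is available, for example computed one image at a time. `largest_consistent_size` is a hypothetical helper, and `predict_fn` is any callable returning a 2-D feature array.

```python
import numpy as np


def largest_consistent_size(predict_fn, data, reference, max_size=None, atol=1e-5):
    """Return the largest n such that, for every size from 1 to n,
    predict_fn(data[-k:]) matches reference[-k:]."""
    reference = np.asarray(reference)
    limit = len(data) if max_size is None else min(max_size, len(data))
    largest = 0
    for n in range(1, limit + 1):
        out = np.asarray(predict_fn(data[-n:]))
        if not np.allclose(out, reference[-n:], atol=atol):
            break  # first size at which the outputs go wrong
        largest = n
    return largest


# Hypothetical usage with the model from the question (slow: one call per size):
# predict = lambda x: model.predict(x, verbose=0)
# reference = np.concatenate([predict(batch[i:i + 1]) for i in range(len(batch))])
# print(largest_consistent_size(predict, batch, reference, max_size=64))
```

The linear scan is deliberately simple; the value it finds may well depend on image size, model, and machine, so it should be treated as an observation for one setup rather than a fixed limit.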
If anyone could suggest a fix or a workaround that still uses TensorFlow on a Mac, it would be really helpful.
There is also a GitHub issue for this, which is still not fixed.
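Since the outputs above stay correct when at most 57 images are passed in one call, one possible workaround is to predict in small chunks and concatenate the results. This is only a sketch under that assumption and has not been confirmed as a fix for the tensorflow-macos bug. `predict_in_chunks` is a hypothetical helper, and the default `chunk_size=32` is an arbitrary value below the observed threshold.

```python
import numpy as np


def predict_in_chunks(predict_fn, data, chunk_size=32):
    """Call predict_fn on consecutive slices of `data` and stack the results.

    predict_fn: callable mapping a batch of inputs to a 2-D feature array.
    chunk_size: maximum number of samples passed to predict_fn per call.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if len(data) == 0:
        raise ValueError("data is empty")

    outputs = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        outputs.append(np.asarray(predict_fn(chunk)))
    # Rows come back in the same order as the input samples.
    return np.concatenate(outputs, axis=0)


# Hypothetical usage with the model from the question:
# features = predict_in_chunks(lambda x: model.predict(x, verbose=0), batch)
```

Each call then stays in the range where the predictions were observed to be correct, at the cost of more `predict` calls.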
Answered By - Rishi Dey Chowdhury