Issue
I have trained a model to classify whether an image is right or wrong (just 2 classes), and I followed the Grad-CAM guide on the Keras website.
The input images are resized to (250, 250) and then normalized by dividing the image NumPy array by 255. This is then passed in to train the model.
The code is attached below. I am encountering the following error: Invalid reduction dimension (1 for input with 1 dimension(s) [Op:Mean]
Data
import cv2
import numpy as np

image = cv2.imread("/content/drive/MyDrive/SendO2/Train/correct/droidcam-20210128-152301.jpg")
image = cv2.resize(image, (250, 250))
image = image.astype('float32') / 255
image = np.expand_dims(image, axis=0)
Model
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
#Adding first convolutional layer
model.add(Conv2D(64, (3,3), activation="relu"))
#Adding maxpooling
model.add(MaxPooling2D((2,2)))
#Adding second convolutional layer and maxpooling
model.add(Conv2D(64, (3,3), activation="relu"))
model.add(MaxPooling2D((2,2)))
#Adding third convolutional layer and maxpooling
model.add(Conv2D(64, (3,3), activation="relu"))
model.add(MaxPooling2D((2,2)))
#Adding fourth convolutional layer and maxpooling
model.add(Conv2D(64, (3,3), activation="relu"))
model.add(MaxPooling2D((2,2)))
#Adding fifth convolutional layer and maxpooling
model.add(Conv2D(64, (3,3), activation="relu"))
model.add(MaxPooling2D((2,2)))
#Flattening the layers
model.add(Flatten())
model.add(Dense(128, input_shape = X.shape[1:], activation="relu"))
#Output Layer. Since the image is right/wrong, only 2 neurons are needed.
model.add(Dense(2, activation = "softmax"))
# model.add(Dense(2, activation = "sigmoid"))
model.compile(optimizer = "adam", loss = "sparse_categorical_crossentropy", metrics = ["accuracy"])
GradCAM
import tensorflow as tf
from tensorflow import keras

def get_img_array(img_path, size):
    # `img` is a PIL image of size 299x299
    img = keras.preprocessing.image.load_img(img_path, target_size=size)
    # `array` is a float32 Numpy array of shape (299, 299, 3)
    array = keras.preprocessing.image.img_to_array(img)
    # We add a dimension to transform our array into a "batch"
    # of size (1, 299, 299, 3)
    array = np.expand_dims(array, axis=0)
    print(array.shape)
    return array
def make_gradcam_heatmap(img_array, model, last_conv_layer_name, pred_index=None):
    # First, we create a model that maps the input image to the activations
    # of the last conv layer as well as the output predictions
    grad_model = tf.keras.models.Model(
        [model.inputs], [model.get_layer(last_conv_layer_name).output, model.output]
    )
    # Then, we compute the gradient of the top predicted class for our input image
    # with respect to the activations of the last conv layer
    with tf.GradientTape() as tape:
        last_conv_layer_output, preds = grad_model(img_array)
        if pred_index is None:
            pred_index = tf.argmax(preds[0])
        class_channel = preds[:, pred_index]
    # This is the gradient of the output neuron (top predicted or chosen)
    # with regard to the output feature map of the last conv layer
    grads = tape.gradient(class_channel, last_conv_layer_output)
    # This is a vector where each entry is the mean intensity of the gradient
    # over a specific feature map channel
    pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))
    # pooled_grads = tf.reduce_mean(grads)
    # We multiply each channel in the feature map array
    # by "how important this channel is" with regard to the top predicted class
    # then sum all the channels to obtain the heatmap class activation
    last_conv_layer_output = last_conv_layer_output[0]
    heatmap = last_conv_layer_output @ pooled_grads[..., tf.newaxis]
    heatmap = tf.squeeze(heatmap)
    # For visualization purpose, we will also normalize the heatmap between 0 & 1
    heatmap = tf.maximum(heatmap, 0) / tf.math.reduce_max(heatmap)
    return heatmap.numpy()
Adjusting parameters
img_size = (250, 250)
preprocess_input = keras.applications.xception.preprocess_input
decode_predictions = keras.applications.xception.decode_predictions
last_conv_layer_name = "dense_1"
# The local path to our target image
img_path = "/content/drive/MyDrive/SendO2/Train/correct/droidcam-20210128-152301.jpg"
display(Image(img_path))
Running them
# Prepare image
img_array = preprocess_input(get_img_array(img_path, size=img_size))
# Make model
model = model_builder(weights="imagenet")
# Remove last layer's softmax
model.layers[-1].activation = None
# Print what the top predicted class is
preds = model.predict(img_array)
print("Predicted:", decode_predictions(preds, top=1)[0])
# Generate class activation heatmap
heatmap = make_gradcam_heatmap(img_array, model, last_conv_layer_name)
# Display heatmap
plt.matshow(heatmap)
plt.show()
I would be really grateful if anyone could help me out here.
Solution
Here is a complete working demo. Like you, I will classify 2 classes with softmax and the sparse_categorical_crossentropy loss function. There are some issues in your code. Most importantly, you set last_conv_layer_name = "dense_1": Grad-CAM expects the output of a convolutional layer - a 4D tensor of shape (batch, height, width, channels) - so that the gradients can be averaged over the spatial axes, but a Dense layer's output has no spatial axes, which is why tf.reduce_mean(grads, axis=(0, 1, 2)) raises the Invalid reduction dimension error (a quick check for this is sketched right after the outline below). You also kept some Xception-specific pieces from the Keras tutorial (model_builder, decode_predictions) that do not apply to your own model. So I will write my own model; don't worry, it is very simple and similar to yours.
To make this broadly useful, I will answer end-to-end, and hopefully new visitors will find it helpful too. I will classify whether a digit is even or odd - a binary classification. Lastly, we will take a raw sample image, preprocess it, and compute a class activation map to see where the model pays attention. Here is the outline:
- Prepare 2 classes data set
- Build 2 class classifier and train it
- Find Grad-CAM
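But first, a quick way to see the layer problem in your own setup. This is a hypothetical check (it assumes your trained model is available as model): print each layer's name and output shape, and pick the last layer with a 4D output (batch, height, width, channels) as last_conv_layer_name - a Dense layer such as dense_1 will never qualify.
# Grad-CAM needs a layer whose output is a 4D feature map:
# (batch, height, width, channels); Dense layers have no spatial axes
for layer in model.layers:
    print(layer.name, layer.output_shape)

# pick the last layer with a 4D output automatically
last_conv_layer_name = next(layer.name for layer in reversed(model.layers)
                            if len(layer.output_shape) == 4)
print(last_conv_layer_name)
With that cleared up, here is the demo.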
Data Sets
We will use MNIST and modify it for binary classes - even number or odd number.
import tensorflow as tf
(x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()
# x set
x_train = tf.expand_dims(x_train, axis=-1)
x_train = tf.divide(x_train, 255)
x_train = tf.image.resize(x_train, [84,84])
# y set
# odd : 0 ; even : 1
y_train = (y_train % 2 == 0).astype(int)
print(x_train.shape, y_train.shape)
(60000, 84, 84, 1) (60000,)
Model
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten, Dropout,
                                     Dense, GlobalAveragePooling2D)
model = Sequential()
#Adding first convolutional layer
model.add(Conv2D(16, kernel_size=(3,3), input_shape = (84,84,1)))
model.add(Conv2D(32, kernel_size=(3,3), activation="relu"))
model.add(Conv2D(64, kernel_size=(3,3), activation="relu"))
model.add(Conv2D(128, kernel_size=(3,3), activation="relu"))
model.add(GlobalAveragePooling2D())
model.add(Dropout(0.5))
model.add(Dense(2, activation=tf.nn.softmax))
model.summary()
model.compile(optimizer = "adam",
              loss = "sparse_categorical_crossentropy",
              metrics = ["accuracy"])
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_16 (Conv2D) (None, 82, 82, 16) 160
_________________________________________________________________
conv2d_17 (Conv2D) (None, 80, 80, 32) 4640
_________________________________________________________________
conv2d_18 (Conv2D) (None, 78, 78, 64) 18496
_________________________________________________________________
conv2d_19 (Conv2D) (None, 76, 76, 128) 73856
_________________________________________________________________
global_average_pooling2d_3 ( (None, 128) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 128) 0
_________________________________________________________________
dense_2 (Dense) (None, 2) 258
=================================================================
Total params: 97,410
Trainable params: 97,410
Non-trainable params: 0
Note that for Grad-CAM we will be using layer conv2d_19 - right before the GAP layer. (You may choose any other layer that gives 2D feature maps; a quick check is sketched below.)
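If you want to double-check the choice - a quick sketch against the model defined above - confirm that conv2d_19 is the last layer producing a spatial feature map:
# conv2d_19 outputs a 4D feature map (None, 76, 76, 128), which Grad-CAM
# can pool over; the GAP output (None, 128) has no spatial axes
print(model.get_layer('conv2d_19').output_shape)
Ok, training the model: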
model.fit(x_train, y_train, epochs=20, batch_size=256, verbose=2)
Epoch 1/20
235/235 - 32s - loss: 0.5549 - accuracy: 0.7203
Epoch 2/20
235/235 - 29s - loss: 0.4586 - accuracy: 0.7890
Epoch 3/20
235/235 - 29s - loss: 0.4160 - accuracy: 0.8132
Epoch 4/20
235/235 - 29s - loss: 0.4000 - accuracy: 0.8226
Epoch 5/20
235/235 - 29s - loss: 0.3830 - accuracy: 0.8304
Epoch 6/20
235/235 - 29s - loss: 0.3690 - accuracy: 0.8367
Epoch 7/20
235/235 - 29s - loss: 0.3619 - accuracy: 0.8421
...
...
Grad-CAM
We will be using the exact same GradCAM class from my other answer, here - no differences.
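Since that class is defined in the linked answer, here is a minimal sketch so this post stays self-contained. It follows the widely used pyimagesearch-style Grad-CAM implementation and exposes the same compute_heatmap / overlay_heatmap interface used below; treat the details as my assumption about the linked code, not a verbatim copy.
import cv2
import numpy as np
import tensorflow as tf

class GradCAM:
    # minimal pyimagesearch-style Grad-CAM (an assumption about the
    # linked implementation, not a verbatim copy)
    def __init__(self, model, classIdx, layerName=None):
        self.model = model
        self.classIdx = classIdx
        # if no layer name is given, fall back to the last 4D-output layer
        self.layerName = layerName or self.find_target_layer()

    def find_target_layer(self):
        for layer in reversed(self.model.layers):
            if len(layer.output_shape) == 4:
                return layer.name
        raise ValueError("Could not find a 4D layer; Grad-CAM cannot be applied.")

    def compute_heatmap(self, image, eps=1e-8):
        # model that maps the input to (conv feature maps, predictions)
        gradModel = tf.keras.models.Model(
            inputs=[self.model.inputs],
            outputs=[self.model.get_layer(self.layerName).output,
                     self.model.output])
        with tf.GradientTape() as tape:
            inputs = tf.cast(image, tf.float32)
            convOutputs, predictions = gradModel(inputs)
            loss = predictions[:, self.classIdx]
        # gradient of the class score w.r.t. the conv feature maps
        grads = tape.gradient(loss, convOutputs)
        # guided gradients: keep only positive activations and gradients
        castConvOutputs = tf.cast(convOutputs > 0, tf.float32)
        castGrads = tf.cast(grads > 0, tf.float32)
        guidedGrads = castConvOutputs * castGrads * grads
        convOutputs, guidedGrads = convOutputs[0], guidedGrads[0]
        # channel weights = gradients averaged over the spatial axes
        weights = tf.reduce_mean(guidedGrads, axis=(0, 1))
        cam = tf.reduce_sum(tf.multiply(weights, convOutputs), axis=-1)
        # resize to the input size and normalize to [0, 255]
        w, h = image.shape[2], image.shape[1]
        heatmap = cv2.resize(cam.numpy(), (w, h))
        heatmap = (heatmap - heatmap.min()) / (heatmap.max() - heatmap.min() + eps)
        return (heatmap * 255).astype("uint8")

    def overlay_heatmap(self, heatmap, image, alpha=0.5,
                        colormap=cv2.COLORMAP_VIRIDIS):
        # colorize the heatmap and blend it with the original image
        heatmap = cv2.applyColorMap(heatmap, colormap)
        output = cv2.addWeighted(image, alpha, heatmap, 1 - alpha, 0)
        return heatmap, output
After that, we will load a sample image: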
import matplotlib.pyplot as plt
image = cv2.imread('/content/5.png', 0)
image = cv2.bitwise_not(image) # ATTENTION
image = cv2.resize(image, (84, 84))
# checking how it looks
plt.imshow(image, cmap="gray")
plt.show()
image = tf.expand_dims(image, axis=-1) # from 84 x 84 to 84 x 84 x 1
image = tf.divide(image, 255) # normalize
image = tf.reshape(image, [1, 84, 84, 1]) # reshape to add batch dimension
print(image.shape) # (1, 84, 84, 1)
Note: You may wonder why I am using image = cv2.bitwise_not(image). For me, the sample has a white background and a black foreground, which does not look like the data set (MNIST) the model was trained on. That's why I had to apply the bitwise_not operation to my sample - but you may not need to do this for your case. For more details, please see my other answer.
Ok, we know what this image is. I also added some lines right below the digit, to see what the model reacts to: them vs. the main ROI. Let's see what the model says - is it an even number (1) or an odd number (0)?
preds = model.predict(image)
i = np.argmax(preds[0])
i # 0 - great, the model correctly recognizes this as an odd number
Finding class activation maps
# `conv2d_19` - remember this, we talked about it earlier
icam = GradCAM(model, i, 'conv2d_19')
heatmap = icam.compute_heatmap(image)
heatmap = cv2.resize(heatmap, (84, 84))
image = cv2.imread('/content/5.png')
image = cv2.resize(image, (84, 84))
print(heatmap.shape, image.shape)
(heatmap, output) = icam.overlay_heatmap(heatmap, image, alpha=0.5)
Visualize
fig, ax = plt.subplots(1, 3)
fig.set_size_inches(20,20)
ax[0].imshow(heatmap)
ax[1].imshow(image)
ax[2].imshow(output)
Answered By - M.Innat