Issue
My question is more about how the algorithm works. I have successfully implemented EfficientNet integration and modeling for grayscale images, and now I want to understand why it works.
The most important aspect here is grayscale and its single channel. When I set channels=1, the algorithm doesn't work because, if I understood correctly, it was designed for 3-channel images. When I set channels=3, it works perfectly.
So my question is: when I set channels=3 and feed the model preprocessed images with channels=1, why does it still work?
Code for EfficientNetB5
import tensorflow as tf
from tensorflow.keras.layers import Input, Flatten
from tensorflow.keras.models import Sequential
# Variable assignments
num_classes = 9
img_height = 84
img_width = 112
channels = 3
batch_size = 32
# Make the input layer
new_input = Input(shape=(img_height, img_width, channels),
                  name='image_input')
# Download and use EfficientNetB5
tmp = tf.keras.applications.EfficientNetB5(include_top=False,
                                           weights='imagenet',
                                           input_tensor=new_input,
                                           pooling='max')
model = Sequential()
model.add(tmp)  # adding EfficientNetB5
model.add(Flatten())
...
Code for preprocessing into grayscale
from tensorflow.keras.preprocessing.image import ImageDataGenerator
data_generator = ImageDataGenerator(
    validation_split=0.2)
train_generator = data_generator.flow_from_directory(
    train_path,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    color_mode="grayscale",  # images are loaded with a single channel
    class_mode="categorical",
    subset="training")
Solution
I dug into what happens when you give grayscale images to EfficientNet models that have three-channel inputs. Here are the first layers of EfficientNetB5 with input_shape (128, 128, 3):
Layer (type)                      Output Shape           Param #   Connected to
==================================================================================================
input_7 (InputLayer)              [(None, 128, 128, 3)]  0         []
rescaling_7 (Rescaling)           (None, 128, 128, 3)    0         ['input_7[0][0]']
normalization_13 (Normalization)  (None, 128, 128, 3)    7         ['rescaling_7[0][0]']
tf.math.truediv_4 (TFOpLambda)    (None, 128, 128, 3)    0         ['normalization_13[0][0]']
stem_conv_pad (ZeroPadding2D)     (None, 129, 129, 3)    0         ['tf.math.truediv_4[0][0]']
And here are the output shapes of these same layers when the model is given a grayscale image as input:
input_7 (128, 128, 1)
rescaling_7 (128, 128, 1)
normalization_13 (128, 128, 3)
tf.math.truediv_4 (128, 128, 3)
stem_conv_pad (129, 129, 3)
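To make the shape trace above concrete, here is a minimal NumPy sketch that mimics the same steps on a single grayscale image. It is not the Keras implementation: the 1/255 rescaling factor and the per-channel std values are assumptions used as placeholders, the tf.math.truediv_4 step is left out because it does not change the shape, and splitting the padding as one row at the bottom and one column at the right is an assumed choice that reproduces the 129 x 129 size listed above.

```python
import numpy as np

# Hypothetical grayscale input, as produced by color_mode="grayscale": (batch, 128, 128, 1)
x_in = np.zeros((1, 128, 128, 1), dtype=np.float32)
print("input", x_in.shape[1:])

# rescaling_7: multiply by a scalar (1/255 is assumed), so the channel count stays 1
x_rescaled = x_in * (1.0 / 255.0)
print("rescaling", x_rescaled.shape[1:])

# normalization_13: per-channel statistics of shape (1, 1, 1, 3) broadcast against the input
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(1, 1, 1, 3)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(1, 1, 1, 3)  # placeholder values
x_norm = (x_rescaled - mean) / std
print("normalization", x_norm.shape[1:])

# stem_conv_pad: zero-pad one extra row (bottom) and one extra column (right), assumed split
x_pad = np.pad(x_norm, ((0, 0), (0, 1), (0, 1), (0, 0)))
print("stem_conv_pad", x_pad.shape[1:])
```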
As you can see, the number of channels of the output tensor switches from 1 to 3 at the normalization_13 layer, so let's look at what this layer actually does. The Normalization layer performs this operation on the input tensor (see https://www.tensorflow.org/api_docs/python/tf/keras/layers/Normalization):
(input_tensor - self.mean) / sqrt(self.var)
The number of channels changes during the subtraction, because self.mean looks like this:
<tf.Tensor: shape=(1, 1, 1, 3), dtype=float32, numpy=array([[[[0.485, 0.456, 0.406]]]], dtype=float32)>
So self.mean has three channels. When a three-channel tensor is subtracted from a one-channel tensor, broadcasting (the same rule NumPy uses) stretches the single channel to match, and the output looks like this: [firstTensor - secondTensorFirstChannel, firstTensor - secondTensorSecondChannel, firstTensor - secondTensorThirdChannel]. self.var has the same three-channel shape, so the division by sqrt(self.var) is then applied channel by channel.
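The broadcasting rule can be checked on a single pixel with NumPy. In this sketch the pixel value 0.5 is an arbitrary example; the mean is the self.mean tensor printed above.

```python
import numpy as np

# One rescaled grayscale pixel, laid out as (batch, height, width, channels) = (1, 1, 1, 1)
x = np.full((1, 1, 1, 1), 0.5, dtype=np.float32)  # 0.5 is an arbitrary example value

# self.mean from the Normalization layer: shape (1, 1, 1, 3)
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(1, 1, 1, 3)

# Broadcasting stretches the size-1 channel axis of x to size 3
diff = x - mean
print(diff.shape)  # (1, 1, 1, 3)

# Each output channel is the same pixel minus a different channel mean
print(diff.ravel())
```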
This is how the magic happens, and this is why the model can accept grayscale images as input!
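Because rescaling_7 keeps a single channel (so it is a scalar-style multiplication), the normalization output for a grayscale image is the same as for an RGB image whose three channels all hold the gray value. A small NumPy check of that claim for the Normalization step, using random placeholder images and placeholder variance values (assumptions, not the layer's real weights):

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical batch of rescaled grayscale images: (batch, height, width, 1)
gray = rng.random((2, 84, 112, 1), dtype=np.float32)

# Per-channel statistics with shape (1, 1, 1, 3): mean from self.mean above,
# variance built from commonly used ImageNet std values (placeholder assumption)
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(1, 1, 1, 3)
var = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(1, 1, 1, 3) ** 2

def normalize(x):
    """(x - mean) / sqrt(var), the Normalization formula quoted above."""
    return (x - mean) / np.sqrt(var)

# Path 1: feed the 1-channel tensor and let broadcasting create 3 channels
out_gray = normalize(gray)

# Path 2: explicitly copy the gray channel into R, G and B first
rgb = np.repeat(gray, 3, axis=-1)
out_rgb = normalize(rgb)

print(out_gray.shape, out_rgb.shape)
print(np.array_equal(out_gray, out_rgb))  # True
```

Since both paths give identical tensors, every layer after normalization_13 sees a normal 3-channel input, and the pretrained ImageNet convolutions are applied as if to an RGB image with equal channels.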
I have checked this with EfficientNetB5 and with EfficientNet B2V2. Even though they declare the Normalization layer differently, the process is the same. I suppose the same holds for the other EfficientNet models.
I hope it was clear enough!
Answered By - afm215