Issue
I wanted to see how dropout works, so I went into the layers.core module and changed the dropout call from in_train_phase to in_test_phase.
I'm not sure if my change is responsible for this odd dropout behaviour, so please bear with me.
With this change in mind, the following code snippet:
from keras.models import Model
from keras.layers import Dense, Dropout, Input
from keras import initializers
import numpy as np
import tensorflow as tf

x = np.ones((2, 2, 4))
# x[:, 1, :] = 1
print(x)

inputs = Input(name='atom_inputs', shape=(2, 4))
# Dense layer with all-ones weights and zero biases, so every unit outputs 4.
x1 = Dense(4, activation='linear',
           kernel_initializer=initializers.Ones(),
           bias_initializer='zeros')(inputs)
# noise_shape=(batch, 1, 4): the dropout mask is shared along the time dimension.
x1 = Dropout(0.5, noise_shape=(tf.shape(inputs)[0], 1, 4))(x1)
fmodel = Model(inputs, x1)
fmodel.compile(optimizer='sgd', loss='mse')
print(fmodel.predict(x))
will produce different predictions depending on the dropout rate.
e.g.:
Dropout(0.2)
[[[5. 5. 5. 5.]
[5. 5. 5. 5.]]
[[5. 0. 5. 0.]
[5. 0. 5. 0.]]]
Dropout(0.5)
[[[0. 0. 8. 8.]
[0. 0. 8. 8.]]
[[8. 0. 8. 8.]
[8. 0. 8. 8.]]]
Where am I going wrong? The dropout is applied to the dense layer's output, so it should only affect which neurons are switched off and on, not their values. Right?
Solution
This happens because when you use Dropout, you not only switch different neurons on and off, but the data are also scaled in order to compensate for the fact that the following layer could receive less signal due to part of the neurons being blacked out. This is called inverted dropout and you may read about it here.
So basically every output of your network is rescaled by a factor of 1 / (1 - p) to provide this compensation. This is why your outputs differ.
For Dropout(0.2) the compensation factor is 1 / (1 - 0.2) = 1.25, which gives 5 = 4 * 1.25, and for Dropout(0.5) it is 1 / (1 - 0.5) = 2, which gives 8 = 4 * 2.
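To make the arithmetic concrete, here is a minimal NumPy sketch of inverted dropout. The inverted_dropout helper and the constant activation value of 4 are illustrative assumptions chosen to mirror the example above, not Keras internals:

import numpy as np

def inverted_dropout(x, rate, rng=np.random.default_rng(0)):
    # Zero out activations with probability `rate` and rescale the survivors
    # by 1 / (1 - rate), which is what inverted dropout does at training time.
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob  # True = keep, False = drop
    return x * mask / keep_prob

# The Dense layer in the question outputs 4 for every unit (ones kernel, four ones as input).
activations = np.full((1, 4), 4.0)

print(inverted_dropout(activations, 0.2))  # surviving units become 4 * 1.25 = 5
print(inverted_dropout(activations, 0.5))  # surviving units become 4 * 2 = 8

In unmodified Keras the Dropout layer is wrapped in in_train_phase, so predict() would simply return the raw 4s; the rescaled values only appear here because of the in_test_phase change described in the question.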
Answered By - Marcin Możejko