Issue
When training my network I am occasionally met with the warning:
W0722 11:47:35.101842 140641577297728 optimizer_v2.py:928] Gradients does not exist for variables ['model/conv1d_x/Variable:0'] when minimizing the loss.
This happens sporadically at infrequent intervals (maybe once in every 20 successful steps). My model basically has two paths which join together with concatenations at various positions in the network. To illustrate this, here is a simplified example of what I mean.
class myModel(tf.keras.Model):
def __init__(self):
self.conv1 = Conv2D(32)
self.conv2 = Conv2D(32)
self.conv3 = Conv2D(16)
def call(self, inputs):
net1 = self.conv1(inputs)
net2 = self.conv2(inputs)
net = tf.concat([net1, net2], axis=2)
net = self.conv3(net)
end_points = tf.nn.softmax(net)
model = myModel()
with tf.GradientTape() as tape:
predicition = model(image)
loss = myloss(labels, prediction)
gradients = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
In reality my network is much larger, but the variables that generally don't have gradients tend to be the ones at the top of the network. Before each Conv2D
layer I also have a custom gradient. Sometimes when I the error appears I can notice that the gradient function for that layer has not been called.
My question is how can the gradient tape sometimes take what appears to be different paths when propagating backwards through my network. My secondary question, is this caused by having two separate routes through my network (i.e. conv1 AND conv2). Is there a fundamental flaw in this network architecture?
Ideally, could I define to the GradientTape()
that it must find the gradients for each of the top layers?
Solution
Revisiting this question, it is actually quite unhelpful and probably should have been down voted more! There are many scenarios where your gradient has invalid values in it. But ultimately, at some point in the gradient computation a NaN
value was created.
In my scenario I was using custom gradient op, and ultimately there was a bug in my gradient calculation code. This bug caused the NaN
under some circumstances.
If you are not using custom gradient ops, then likely you've either made a mistake in your network definition (e.g., disconnected variable as other answers suggest) or there is some issue with your data.
In summary, no one problem will cause this, it just an artefact from a) buggy gradient calculation, b) buggy network definition, c) issue with your data or d) anything else. There is no one solution for this question, it's just the result of an error somewhere else.
To directly answer my questions in the original post:
Q. How can the gradient tape sometimes take what appears to be different paths when propagating backwards through my network?
A. It doesn't, a bug in the input to the gradient function resulted in no gradients being calcucated for that layer.
Q. My secondary question, is this caused by having two separate routes through my network (i.e. conv1 AND conv2). Is there a fundamental flaw in this network architecture?
A. No, there is nothing wrong with this architecture.
Answered By - D.Griffiths
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.