Issue
I am trying to train a deep learning architecture, and the model trains fine. I test after each epoch. For the first 7 epochs the loss and accuracy look reasonable, but at epoch 8 the test loss becomes NaN. I have checked my data and it contains no NaNs. My test accuracy is also higher than my train accuracy, which is odd. The training set has 37646 samples and the test set has 18932, so that should be enough. Before becoming NaN, the test loss grew very large, around 1.6513713663602217e+30. This is really weird and I don't understand why it is happening. Any help or suggestion is much appreciated.
Solution
Assuming that a very high learning rate isn't the cause of the problem, you can clip your gradients before the update, using PyTorch's gradient clipping.
Example:
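optimizer.zero_grad()  # reset gradients left over from the previous step
loss, hidden = model(data, hidden, targets)  # forward pass; this model returns the loss directly
loss.backward()  # compute gradients
torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)  # rescale gradients so their total norm is at most clip_value
optimizer.step()  # apply the update using the clipped gradients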
This is the first thing to try when you have a NaN loss, provided of course that you have made sure you don't have NaNs elsewhere, e.g. in your input features. I have used gradient clipping in cases where increasing the learning rate caused NaNs but I still wanted to test a higher learning rate. Decreasing the learning rate could also solve your problem, but I'm guessing that you have already tried this.
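To rule out NaNs elsewhere, and to catch the point where the loss starts to blow up, a small check like the one below may help. It is only a sketch in plain Python: the helper name first_non_finite and the list of losses are made up for illustration, and it assumes you collect ordinary floats (for example via loss.item()).

```python
import math


def first_non_finite(values):
    """Return the index of the first NaN or infinite value, or None if all are finite."""
    for index, value in enumerate(values):
        if not math.isfinite(value):
            return index
    return None


# Hypothetical per-epoch test losses: the loss explodes, overflows, then becomes NaN.
test_losses = [0.69, 0.52, 0.48, 1.65e30, float("inf"), float("nan")]
bad_epoch = first_non_finite(test_losses)
print("first non-finite loss at index:", bad_epoch)
```

Note that a huge but finite value such as 1.65e30 passes this check, so it can also be worth flagging losses above some threshold you consider implausible.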
Empirically, I set clip_value = 5 most of the time, and then check its (usually insignificant) impact on performance. Feel free to experiment with different values.
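To get a feel for what clipping by norm does with a value like 5, here is a rough plain-Python sketch of the idea. It is not PyTorch's implementation: clip_by_global_norm is a hypothetical helper, gradients are represented as nested lists of floats instead of tensors, and the small eps term is an assumption added to avoid division by zero.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale all gradients so their combined L2 norm is at most max_norm.

    grads is a list of lists of floats, one inner list per parameter
    (a stand-in for tensors). Returns (clipped_grads, norm_before_clipping).
    """
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    # Only ever shrink: the scale factor is capped at 1.0.
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in vec] for vec in grads]
    return clipped, total_norm


# Exploding gradients with total norm 50 are scaled down to norm ~5.
grads = [[30.0, 40.0], [0.0]]
clipped, norm_before = clip_by_global_norm(grads, max_norm=5.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))

# Small gradients are left unchanged.
small, _ = clip_by_global_norm([[0.3, 0.4]], max_norm=5.0)
```

All gradients are multiplied by the same factor, so the update direction is preserved and only its length is limited. That is why a moderate threshold usually has little effect on normal training steps while still taming the occasional exploding one.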
Answered By - Alex Metsai