Issue
‘Gradient will not be updated but be accumulated, and updated every N rounds.’ I have a question about how the gradients are accumulated in the code snippet below: in every iteration of the loop a new gradient is computed by loss.backward() and stored internally, but is that internally stored gradient overwritten in the next iteration? How is the gradient summed up, and how is it then applied every N rounds?
for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                # Forward pass
    loss = loss_function(predictions, labels)  # Compute loss function
    loss = loss / accumulation_steps           # Normalize our loss (if averaged)
    loss.backward()                            # Backward pass
    if (i + 1) % accumulation_steps == 0:      # Wait for several backward steps
        optimizer.step()                       # Now we can do an optimizer step
        model.zero_grad()                      # Reset gradient tensors
Solution
The first time you call backward, the .grad attribute of the parameters of your model is updated from None to the gradients. If you do not reset the gradients to zero, future calls to .backward() will accumulate (i.e. add) gradients into this attribute (see the docs). When you call model.zero_grad() you are doing the reset.
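To see this accumulation concretely, here is a minimal sketch (not part of the original answer) that uses a single bare tensor in place of model parameters; the tensor name w and the dummy loss are made up for illustration:

import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
print(w.grad)        # None -- no backward pass has run yet

loss = (w * 3).sum()
loss.backward()
print(w.grad)        # tensor([3., 3.])

loss = (w * 3).sum()
loss.backward()      # gradients are added into the existing .grad
print(w.grad)        # tensor([6., 6.])

w.grad.zero_()       # what model.zero_grad() does for every parameter
print(w.grad)        # tensor([0., 0.])

In the training loop above, the same mechanism means the accumulated .grad after N backward passes is the sum of the per-batch gradients, which optimizer.step() then applies in one update.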
Answered By - iacolippo