Issue
import torch
import torch.optim as optim
import torch.nn as nn

input = torch.tensor([1., 2.], requires_grad=True)
sigmoid = nn.Sigmoid()
interm = sigmoid(input)
optimizer = optim.SGD([input], lr=1, momentum=0.9)

for epoch in range(5):
    optimizer.zero_grad()
    loss = torch.linalg.vector_norm(interm - torch.tensor([2., 2.]))
    print(epoch, loss, input, interm)
    loss.backward(retain_graph=True)
    optimizer.step()

print(interm.grad)
I created this simplified example with an input going through a sigmoid as an intermediate activation function. I am trying to find the input that results in interm = [2., 2.], but the gradients are not passing through. Does anyone know why?
Solution
Grads are computed for leaf tensors. In your example, input is a leaf tensor, while interm is not. When you try to access interm.grad, you should get the following warning message:
UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at aten/src/ATen/core/TensorBody.h:486.)
This is because grads are propagated back to the leaf tensor input, not to interm. You can add interm.retain_grad() if you want to get the grad for the interm variable.
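As a quick toy illustration of the leaf vs. non-leaf distinction (the values here are my own, not from the question):

import torch

x = torch.tensor([1., 2.], requires_grad=True)  # leaf tensor
y = torch.sigmoid(x)                            # non-leaf: produced by an op
y.retain_grad()                                 # opt in to keeping the non-leaf grad
y.sum().backward()
print(x.grad)  # populated: x is a leaf
print(y.grad)  # populated only because retain_grad() was called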
However, even if you did this, there is nothing in your example that would cause the value of interm to change. Each optimizer step changes the input value, but this does not result in interm being recomputed. If you want interm to be updated, you need to recompute it each iteration with the new input value, i.e.:
for epoch in range(5):
    optimizer.zero_grad()
    interm = sigmoid(input)
    interm.retain_grad()
    loss = torch.linalg.vector_norm(interm - torch.tensor([2., 2.]))
    print(epoch, loss, input, interm)
    loss.backward(retain_graph=True)
    optimizer.step()

print(interm.grad)
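For reference, here is a minimal self-contained sketch of that corrected loop (the restructuring and comments are mine, not part of the original answer). Because the graph is rebuilt from input on every iteration, retain_graph=True is no longer needed:

import torch
import torch.optim as optim
import torch.nn as nn

input = torch.tensor([1., 2.], requires_grad=True)
sigmoid = nn.Sigmoid()
optimizer = optim.SGD([input], lr=1, momentum=0.9)

for epoch in range(5):
    optimizer.zero_grad()
    interm = sigmoid(input)                 # recompute from the current input
    interm.retain_grad()                    # keep the grad of this non-leaf tensor
    loss = torch.linalg.vector_norm(interm - torch.tensor([2., 2.]))
    print(epoch, loss.item(), input.detach(), interm.detach())
    loss.backward()                         # fresh graph each step, so no retain_graph needed
    optimizer.step()

print(interm.grad)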
There's also a fundamental problem with what you are trying to do. You say you want the input that results in interm = [2., 2.]. However, you are computing interm = sigmoid(input), and the sigmoid function is bounded between (0, 1). There is no value of input that would result in interm = [2., 2.], because 2 is outside the range of the sigmoid function. If you ran your optimization loop indefinitely, you would get input = [inf, inf] and interm = [1., 1.].
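To make this concrete, here is a small sketch (the target values [0.8, 0.9] are my own choice for illustration): any reachable target must lie strictly inside (0, 1), and its exact preimage under sigmoid is given by the logit function, the inverse of sigmoid.

import torch

target = torch.tensor([0.8, 0.9])            # strictly inside (0, 1)
exact_input = torch.logit(target)            # inverse of sigmoid: log(p / (1 - p))
print(torch.sigmoid(exact_input))            # recovers tensor([0.8000, 0.9000])

print(torch.logit(torch.tensor([2., 2.])))   # tensor([nan, nan]): 2 is outside sigmoid's range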
Answered By - Karl