Issue
Sorry, I know questions of this sort have been asked a lot, but I still don't understand the behavior of autograd.
A simple example is below:
ce_loss = torch.nn.BCELoss()
par = torch.randn((1, n), requires_grad=True)
act = torch.nn.Sigmoid()
y_hat = []
for obs in data:
    y_hat.append(act(par @ obs))
loss = ce_loss(torch.tensor(y_hat, requires_grad=True), y)
loss.backward()
After calling backward(), par.grad is still None (even though par is a leaf tensor with requires_grad=True).
Any tips?
Solution
This happens because torch.tensor(...) creates a new leaf of the computational graph. By definition, the operations performed inside the call to torch.tensor are not recorded, in particular the computation involving the elements of par (so its gradients are never computed). Note that adding requires_grad=True doesn't change anything: it still creates a leaf (which will receive gradients of its own) that, by the very definition of a leaf, has no memory of the previous operations.
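To make this concrete, here is a minimal sketch (the variable names are mine, not from the question) showing that wrapping a tensor in torch.tensor copies its values into a fresh leaf with no grad_fn, so backward never reaches the original parameter. Recent PyTorch versions also emit a warning about copy-constructing from a tensor here.

import torch

x = torch.randn(3, requires_grad=True)
y = x * 2                                  # part of the graph: y.grad_fn is MulBackward0

z = torch.tensor(y, requires_grad=True)    # copies the values into a brand-new leaf
print(y.grad_fn)                           # <MulBackward0 object at ...>
print(z.grad_fn, z.is_leaf)                # None True -> the history of x is lost

z.sum().backward()
print(x.grad)                              # None: the gradient never reaches x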
I suggest another way to do the computation that avoids iterating over data and uses native batched operations instead:
import torch

batch_size, n = 8, 10  # or something else

# Random data and labels to reproduce the code
data = torch.randn((batch_size, n))
y = torch.rand((batch_size,))  # BCELoss expects targets in [0, 1]
y = y.unsqueeze(1)             # size (batch_size, 1)

ce_loss = torch.nn.BCELoss()
par = torch.randn((1, n), requires_grad=True)
act = torch.nn.Sigmoid()

y_hat = act(data @ par.T)  # compute all predictions in parallel, size (batch_size, 1)
loss = ce_loss(y_hat, y)   # automatically reduced to a scalar (mean)
loss.backward()
print(par.grad)            # no longer None!
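As an aside, if you prefer to keep the explicit loop, a common fix (a minimal sketch reusing data, y, par, act and ce_loss from above, not part of the original answer) is to combine the per-observation outputs with torch.stack instead of torch.tensor, since stacking keeps the results attached to the graph:

par.grad = None                            # reset the gradient accumulated above
y_hat = [act(par @ obs) for obs in data]   # each element still tracks par
y_hat = torch.stack(y_hat)                 # size (batch_size, 1), graph preserved
loss = ce_loss(y_hat, y)
loss.backward()
print(par.grad)                            # not None either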
Answered By - Valentin Goldité