Issue
I can see what the code below (from a video) is trying to do, but the sum in y = torch.sum(x**2) confuses me. With the sum operation, y becomes a tensor holding a single value. Since I understand .backward() as computing derivatives, why would we want to use sum and reduce y to one value?
import torch
import matplotlib.pyplot as plt
x = torch.linspace(-10.0,10.0,10, requires_grad=True)
Y = x**2
y = torch.sum(x**2)
y.backward()
plt.plot(x.detach().numpy(), Y.detach().numpy(), label="Y")
plt.plot(x.detach().numpy(), x.grad.detach().numpy(), label="derivatives")
plt.legend()
Solution
You can only compute partial derivatives of a scalar function. What backward() gives you is d loss / d parameter, and you expect a single gradient value per parameter/variable.
Had your loss function been a vector function, i.e., mapping multiple inputs to multiple outputs, you would have ended up with multiple gradients per parameter/variable.
Please see this answer for more information.
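To make this concrete, here is a minimal sketch (using the same x as in the question) showing that summing before calling backward() still leaves the element-wise derivative 2*x in x.grad, and that the same gradients can be obtained without sum by passing an explicit upstream-gradient tensor to backward() on the vector output:

import torch

x = torch.linspace(-10.0, 10.0, 10, requires_grad=True)

# Reduce to a scalar: y = sum_i x_i**2, so dy/dx_i = 2*x_i
y = torch.sum(x ** 2)
y.backward()
print(x.grad)                     # tensor of 2*x values

# Equivalent without sum: supply the upstream gradient explicitly
x.grad = None                     # reset accumulated gradients
Y = x ** 2                        # vector output
Y.backward(torch.ones_like(Y))    # vector-Jacobian product with a ones vector
print(x.grad)                     # again 2*x

Both calls produce the same x.grad because backward() on the summed scalar and backward(torch.ones_like(Y)) on the vector compute the same vector-Jacobian product.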
Answered By - Shai