Issue
I was following a series of tutorials on YouTube about deep learning, and I ran into a problem that really confuses me.
import torch

X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)

w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)

def forward(x):
    return w * x

def loss(y, y_predicted):
    return ((y - y_predicted) ** 2).mean()

print(f'Prediction before training: f(5) = {forward(5):.3f}')

learning_rate = 0.01
epoch = 20

for i in range(epoch):
    y_pred = forward(X)
    l = loss(Y, y_pred)
    l.backward()
    with torch.no_grad():
        w = w - learning_rate * w.grad
        # (w -= learning_rate * w.grad) would not cause an error in the following line
        w.grad.zero_()  # error: 'NoneType' object has no attribute 'zero_'
    if i % 1 == 0:
        print(f'weight : {w}, loss : {l}')
I really wonder what the difference is between "w = w - learning_rate * w.grad" and "w -= learning_rate * w.grad", because in my experience these two behave the same. Thanks!
Solution
As pointed out in the comment, the problem is in how PyTorch computes and stores gradients. In fact,

w -= learning_rate * w.grad

is an in-place operation: it mutates the existing tensor, so w keeps its original properties (requires_grad=True and its populated .grad). Usually in PyTorch we avoid in-place operations, as they may break the computational graph used by Autograd (see PyTorch Forum post).
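You can check this directly. Here is a minimal sketch (toy values and variable names are mine, not from the question) showing that the in-place update leaves w as the very same leaf tensor, with its .grad intact:

import torch

w = torch.tensor(0.0, requires_grad=True)
(w * 2).backward()            # populates w.grad with d(2w)/dw = 2

with torch.no_grad():
    original = id(w)
    w -= 0.01 * w.grad        # in-place: mutates the existing tensor
print(id(w) == original)      # True  -> still the same leaf tensor
print(w.requires_grad)        # True
print(w.grad)                 # tensor(2.) -> still populated
w.grad.zero_()                # no error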
But in your code, this:

w = w - learning_rate * w.grad

is not in-place. It builds a brand-new tensor and rebinds the name w to it, and because the assignment happens inside the torch.no_grad() block, that new tensor is created with requires_grad=False and its .grad is None. That is exactly why the following w.grad.zero_() fails with 'NoneType' object has no attribute 'zero_'.
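Keeping the in-place update therefore fixes the loop. Here is a corrected sketch of the question's code (same setup, with the weight update done in-place and the gradient reset each epoch):

import torch

X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)
learning_rate = 0.01

for i in range(20):
    y_pred = w * X                      # forward pass
    l = ((Y - y_pred) ** 2).mean()      # MSE loss
    l.backward()                        # accumulates dl/dw into w.grad
    with torch.no_grad():
        w -= learning_rate * w.grad     # in-place: w stays the same leaf tensor
    w.grad.zero_()                      # reset the gradient for the next epoch
    print(f'weight : {w.item():.3f}, loss : {l.item():.8f}')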
Answered By - Lucas D. Meier