Issue
My first training iteration in PyTorch runs successfully, but the second iteration fails. Below is my code and the error I get when I run it a second time.
import torch
import math

current_grad = 0
l_rate = 10**-4
x=torch.tensor([[1.0,2.,4.],[2.,3.,2.]])
y=torch.tensor([1.0,0.])
w=torch.tensor([.5,2.,1.], requires_grad=True)
# forward propagate
output = x @ w.T
y_pred = 1/(1+ math.e**-(output))
# objective function
loss = sum(y*(y_pred**.5) + ((1-y) * (1-y_pred)**.5)) / len(y_pred)
# now, get gradient over these layers
x.requires_grad =True
y.requires_grad =True
w.requires_grad =True
loss.backward()
# update only 1 set of weights here.
with torch.no_grad():
    w = w + (w.grad * l_rate)
I get an error at my loss.backward() line:
TypeError: unsupported operand type(s) for *: 'NoneType' and 'float'
How can I fix this so that my w.grad is not NoneType on the second time around?
Solution
The main issue with your code is that

w = w + (w.grad * l_rate)

assigns a new tensor to the name w. The result of that addition is a non-leaf tensor, so its .grad is never populated, which is why the second iteration fails. Instead you need to update w in place.
with torch.no_grad():
    w.copy_(w + w.grad * l_rate)
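The difference is easy to verify: rebinding produces a non-leaf tensor whose .grad stays None, while copy_ keeps the same leaf tensor, so its gradient survives for the next update. A minimal sketch:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
(w * 3).sum().backward()

# Rebinding: the new tensor is the result of an arithmetic op,
# so it is not a leaf and its .grad will never be populated.
w_rebound = w + w.grad * 0.1
print(w_rebound.is_leaf)  # False

# In-place update under no_grad keeps w as the same leaf tensor,
# so w.grad remains available for the next step.
with torch.no_grad():
    w.copy_(w + w.grad * 0.1)
print(w.is_leaf, w.grad is not None)  # True True
```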
Some other issues
- The learning rate is much too small for this problem.
- You need to zero the gradients of w after each step, since backward() accumulates gradients.
- Setting requires_grad for x and y is unnecessary, since you don't need the gradient of the loss w.r.t. these tensors.
- In your code, the value of "loss" is actually something you want to maximize, since your objective function is maximal when y = y_pred. Generally we wouldn't call this loss, because that would imply you want to minimize it (only a masochist wants to maximize their loss :P).
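The accumulation point is worth seeing directly: calling backward() a second time without zeroing adds to the stored gradient rather than replacing it. A minimal sketch:

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

(2 * w).sum().backward()
print(w.grad)  # tensor([2.])

# backward() adds to the existing .grad rather than overwriting it
(2 * w).sum().backward()
print(w.grad)  # tensor([4.])

# zeroing between steps restores the correct per-step gradient
w.grad.zero_()
(2 * w).sum().backward()
print(w.grad)  # tensor([2.])
```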
Correcting for these issues
import torch

l_rate = 0.1
x = torch.tensor([[1.0, 2., 4.], [2., 3., 2.]])
y = torch.tensor([1.0, 0.])
w = torch.tensor([.5, 2., 1.], requires_grad=True)

# training loop ...

# forward propagate
output = x @ w
y_pred = torch.sigmoid(output)

# objective function
objective = torch.mean(y * (y_pred**.5) + ((1 - y) * (1 - y_pred)**.5))

# compute gradient of objective w.r.t. w
objective.backward()

with torch.no_grad():
    # gradient ascent
    w.copy_(w + (w.grad * l_rate))
    # zero the gradients
    w.grad.zero_()
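Filling in the "training loop ..." placeholder, here is a runnable sketch of the full gradient-ascent loop (the step count of 100 is illustrative, not part of the original answer):

```python
import torch

l_rate = 0.1
x = torch.tensor([[1.0, 2., 4.], [2., 3., 2.]])
y = torch.tensor([1.0, 0.])
w = torch.tensor([.5, 2., 1.], requires_grad=True)

for step in range(100):
    y_pred = torch.sigmoid(x @ w)           # forward propagate
    objective = torch.mean(y * y_pred**.5 + (1 - y) * (1 - y_pred)**.5)
    objective.backward()                    # gradients land in w.grad
    with torch.no_grad():
        w.copy_(w + w.grad * l_rate)        # gradient ascent step
        w.grad.zero_()                      # reset for the next iteration

print(objective.item())  # the objective rises toward 1 as y_pred approaches y
```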
Answered By - jodag