Issue
We can get the loss of the last layer with loss = loss_fn(y_pred, y_true), which results in a loss: Tensor. We then call loss.backward() to do back-propagation, and after optimizer.step() we can see the updated model.parameters().
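For reference, here is a minimal sketch of that single-model step (the toy model, optimizer, loss function, and tensor shapes are assumptions made here, not part of the question):

import torch
from torch import nn

model = nn.Linear(10, 1)                              # toy model (assumption)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x, y_true = torch.rand(4, 10), torch.rand(4, 1)
y_pred = model(x)                                     # forward pass
loss = loss_fn(y_pred, y_true)                        # loss of the last layer
optimizer.zero_grad()
loss.backward()                                       # back-propagation
optimizer.step()                                      # model.parameters() are now updated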
Take the example below:

y = Model1(x)        # with optimizer1
z = Model2(y)        # with optimizer2
loss = loss_fn(z, z_true)
loss.backward()
optimizer2.step()    # update Model2 parameters
# in order to update Model1 parameters I think we should do
y.backward(gradient=the_output_gradient_from_Model2)
optimizer1.step()
How can I get the intermediate back-propagation result, i.e. the gradient with respect to an intermediate output, which would then be passed to y_pred.backward(gradient=grad)?
Update: The solution is to set requires_grad=True on the intermediate tensor and read its x.grad attribute after the backward pass. Thanks for the answers.
PS: The scenario is that I am doing federated learning, where the model is split into two parts. The first part takes the input and forwards it to the second part. The second part calculates the loss and back-propagates the gradient to the first part, so that the first part can take that gradient and do its own back-propagation. A sketch of this flow is shown below.
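A minimal sketch of that split-model flow (the two model parts, optimizers, loss function, and tensor shapes are placeholders invented here, not taken from the question):

import torch
from torch import nn

part1 = nn.Linear(10, 5)                            # first model part (assumption)
part2 = nn.Linear(5, 2)                             # second model part (assumption)
opt1 = torch.optim.SGD(part1.parameters(), lr=0.1)
opt2 = torch.optim.SGD(part2.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x, z_true = torch.rand(3, 10), torch.rand(3, 2)

# first part: forward only, send y to the second part
y = part1(x)

# second part: receives y as a leaf tensor that requires grad
y_received = y.detach().requires_grad_(True)
z = part2(y_received)
loss = loss_fn(z, z_true)
opt2.zero_grad()
loss.backward()                                     # fills part2 grads and y_received.grad
opt2.step()

# first part: continue back-propagation from y using the received gradient
opt1.zero_grad()
y.backward(gradient=y_received.grad)
opt1.step()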
Solution
I will assume you're referring to intermediate gradients when you say "loss of a specific layer".
You can access the gradient of the loss with respect to each layer's parameters through the grad
attribute on the parameters of your model which require gradient computation.
Here is a simplistic setup:
>>> import torch
>>> import torch.nn as nn
>>> f = nn.Sequential(
...     nn.Linear(10, 5),
...     nn.Linear(5, 2),
...     nn.Linear(2, 2, bias=False),
...     nn.Sigmoid())
>>> x = torch.rand(3, 10).requires_grad_(True)
>>> f(x).mean().backward()
Navigate through all the parameters per layer:
>>> for n, c in f.named_children():
... for p in c.parameters():
... print(f'<{n}>:{p.grad}')
<0>:tensor([[-0.0054, -0.0034, -0.0028, -0.0058, -0.0073, -0.0066, -0.0037, -0.0044,
-0.0035, -0.0051],
[ 0.0037, 0.0023, 0.0019, 0.0040, 0.0050, 0.0045, 0.0025, 0.0030,
0.0024, 0.0035],
[-0.0016, -0.0010, -0.0008, -0.0017, -0.0022, -0.0020, -0.0011, -0.0013,
-0.0010, -0.0015],
[ 0.0095, 0.0060, 0.0049, 0.0102, 0.0129, 0.0116, 0.0066, 0.0077,
0.0063, 0.0091],
[ 0.0005, 0.0003, 0.0002, 0.0005, 0.0006, 0.0006, 0.0003, 0.0004,
0.0003, 0.0004]])
<0>:tensor([-0.0090, 0.0062, -0.0027, 0.0160, 0.0008])
<1>:tensor([[-0.0035, 0.0035, -0.0026, -0.0106, -0.0002],
[-0.0020, 0.0020, -0.0015, -0.0061, -0.0001]])
<1>:tensor([-0.0289, -0.0166])
<2>:tensor([[0.0355, 0.0420],
[0.0354, 0.0418]])
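If what you need is the gradient flowing into an intermediate output rather than into the parameters, one standard PyTorch option (not shown above) is to call retain_grad() on that non-leaf tensor before the backward pass. Continuing the session above, with two hypothetical model parts g and h invented for illustration:

>>> g = nn.Linear(10, 5)        # hypothetical first part
>>> h = nn.Linear(5, 2)         # hypothetical second part
>>> x = torch.rand(3, 10)
>>> y = g(x)
>>> y.retain_grad()             # keep the grad of this non-leaf tensor
>>> h(y).mean().backward()
>>> y.grad.shape                # gradient of the loss w.r.t. y
torch.Size([3, 5])

This y.grad is exactly the tensor you would pass as y.backward(gradient=...) on the other side of a split model.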
Answered By - Ivan