Issue
Let's say I have a function Psi that takes a 3-dimensional vector u as input and returns a 4-dimensional vector. I would like to compute the gradient of each of the first three components of Psi w.r.t. the corresponding component of u:
import torch
u = torch.tensor([1.,2.,3.], requires_grad=True)
psi = torch.zeros(4)
psi[0] = 2*u[0]
psi[1] = 2*u[1]
psi[2] = 2*u[2]
psi[3] = torch.dot(u,u)
grad_Psi_0 = torch.autograd.grad(psi[0], u[0])
grad_Psi_1 = torch.autograd.grad(psi[1], u[1])
grad_Psi_2 = torch.autograd.grad(psi[2], u[2])
And I get the error that u[0], u[1], and u[2] are not used in the graph:
---> 19 grad_Psi_0 = torch.autograd.grad(psi[0], u[0])
20 grad_Psi_1 = torch.autograd.grad(psi[1], u[1])
21 grad_Psi_2 = torch.autograd.grad(psi[2], u[2])
File ~/.local/lib/python3.10/site-packages/torch/autograd/__init__.py:275, in grad(outputs, inputs, grad_outputs, retain_graph, create_graph, only_inputs, allow_unused, is_grads_batched)
273 return _vmap_internals._vmap(vjp, 0, 0, allow_none_pass_through=True)(grad_outputs)
274 else:
--> 275 return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
276 outputs, grad_outputs_, retain_graph, create_graph, inputs,
277 allow_unused, accumulate_grad=False)
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
This is strange, as the components of u are used to compute the components of psi, so it should be possible to compute derivatives of the components of psi w.r.t. the components of u. How can I fix this?
Answer: based on the answer from @Ivan below, note that for higher-order derivatives the component-wise calls to autograd.grad also require create_graph=True, otherwise the same error as described above occurs.
import torch
# psi[0:3] = grad(Phi) = [2*u0, 2*u1, 2*u2]
# psi[3]   = Phi = u0**2 + u1**2 + u2**2 = dot(u,u)
u = torch.tensor([1.,2.,3.], requires_grad=True)
psi = torch.zeros(4)
psi[0] = 2*u[0]
psi[1] = 2*u[1]
psi[2] = 2*u[2]
psi[3] = torch.dot(u,u)
print("u = ",u)
print("psi = ",psi)
grad_v_x = torch.autograd.grad(psi[0], u, retain_graph=True)[0]
print(grad_v_x)
grad_v_y = torch.autograd.grad(psi[1], u, retain_graph=True)[0]
print(grad_v_y)
grad_v_z = torch.autograd.grad(psi[2], u, retain_graph=True)[0]
print(grad_v_z)
div_v = grad_v_x[0] + grad_v_y[1] + grad_v_z[2]
# Divergence of the vector field psi[0:3] = [2*u0, 2*u1, 2*u2] w.r.t. [u0, u1, u2] = 2+2+2 = 6
print (div_v)
# laplace(psi[3]) = \partial_u0^2 psi[3] + \partial_u1^2 psi[3] + \partial_u2^2 psi[3]
# = \partial_u0 2*u0 + \partial_u1 2*u1 + \partial_u2 2*u2 = 2 + 2 + 2 = 6
d_phi_du = torch.autograd.grad(psi[3], u, create_graph=True, retain_graph=True)[0]
print(d_phi_du)
dd_phi_d2u0 = torch.autograd.grad(d_phi_du[0], u, retain_graph=True)[0]
dd_phi_d2u1 = torch.autograd.grad(d_phi_du[1], u, retain_graph=True)[0]
dd_phi_d2u2 = torch.autograd.grad(d_phi_du[2], u, retain_graph=True)[0]
laplace_phi = torch.dot(dd_phi_d2u0 + dd_phi_d2u1 + dd_phi_d2u2, torch.ones(3))
print(laplace_phi)
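For a cross-check (a sketch of mine, not part of the original question or answer): the same Laplacian can be obtained in a single call as the trace of the Hessian of Phi = dot(u, u), using torch.autograd.functional.hessian:
import torch

def phi(u):
    # Phi(u) = u0**2 + u1**2 + u2**2
    return torch.dot(u, u)

u = torch.tensor([1., 2., 3.])
H = torch.autograd.functional.hessian(phi, u)  # 3x3 Hessian, here 2*I
laplace_phi = torch.diagonal(H).sum()          # trace = 2 + 2 + 2 = 6
print(laplace_phi)                             # tensor(6.)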
Solution
The reason is that u[0] is actually a copy, so the tensor used on the following line:
psi[0] = 2*u[0]
is different from the one passed here:
grad_Psi_0 = torch.autograd.grad(psi[0], u[0])
which means they are not linked in the computation graph.
A possible solution is to assign u[0] to a separate variable so that the same tensor is used in both calls:
>>> u0 = u[0]
>>> psi[0] = 2*u0
>>> torch.autograd.grad(psi[0], u0)
(tensor(2.),)
Alternatively, you can call autograd.grad directly on u:
>>> psi[0] = 2*u[0]
>>> torch.autograd.grad(psi[0], u)
(tensor([2., 0., 0.]),)
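As a side note (a sketch of mine, not part of the original answer): if you need the gradient of every component of psi at once, torch.autograd.functional.jacobian computes the full Jacobian in one call, and the divergence from above is then the trace of its upper 3x3 block:
import torch

def psi_fn(u):
    # Same Psi as above: first three components are 2*u, last is dot(u, u)
    return torch.stack((2*u[0], 2*u[1], 2*u[2], torch.dot(u, u)))

u = torch.tensor([1., 2., 3.])
J = torch.autograd.functional.jacobian(psi_fn, u)  # shape (4, 3)
print(J[0])                                        # tensor([2., 0., 0.])
div_v = torch.diagonal(J[:3, :]).sum()             # 2 + 2 + 2 = 6
print(div_v)                                       # tensor(6.)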
Answered By - Ivan