Issue
I am trying to compute the gradient of a variable in PyTorch, but I get a RuntimeError saying that the shapes of the output and the grad must match. In my case, however, the output and the grad cannot have the same shape. Here is my code to reproduce the error:
import numpy as np
import torch
from torch.autograd import Variable as V
ne = 3
m, n = 79, 164
G = np.random.rand(m, n).astype(np.float64)
w = np.random.rand(n, n).astype(np.float64)
z = -np.random.rand(n).astype(np.float64)
G = V(torch.from_numpy(G))
w = V(torch.from_numpy(w))
z = V(torch.from_numpy(z), requires_grad=True)
e, v = torch.symeig(torch.diag(2 * z - torch.sum(w, dim=1)) + w, eigenvectors=True, upper=False)
ssev = torch.sum(torch.pow(e[-ne:] * v[:, -ne:], 2), dim=1)
out = torch.sum(torch.matmul(G, ssev.reshape((n, 1))))
out.backward(z)
print(z.grad)
The error message is: RuntimeError: Mismatch in shape: grad_output[0] has a shape of torch.Size([164]) and output[0] has a shape of torch.Size([])
A similar calculation works in TensorFlow, where I successfully get the gradient I want:
import numpy as np
import tensorflow as tf
m, n = 79, 164
G = np.random.rand(m, n).astype(np.float64)
w = np.random.rand(n, n).astype(np.float64)
z = -np.random.rand(n).astype(np.float64)
def tf_function(z, G, w, ne=3):
    e, v = tf.linalg.eigh(tf.linalg.diag(2 * z - tf.reduce_sum(w, 1)) + w)
    ssev = tf.reduce_sum(tf.square(e[-ne:] * v[:, -ne:]), 1)
    return tf.reduce_sum(tf.matmul(G, tf.expand_dims(ssev, 1)))
z, G, w = [tf.convert_to_tensor(_, dtype=tf.float64) for _ in (z, G, w)]
z = tf.Variable(z)
with tf.GradientTape() as g:
    g.watch(z)
    out = tf_function(z, G, w)
print(g.gradient(out, z).numpy())
My TensorFlow version is 2.0 and my PyTorch version is 1.14.0. I am using Python 3.6.9. In my opinion, computing gradients when the output and the variables have different shapes is perfectly reasonable, and I don't think I made a mistake. Can anyone help me with this problem? I would really appreciate it!
Solution
First of all, you don't need to create the arrays with NumPy and then convert them to Variable (which is deprecated, by the way); you can create the tensors directly, e.g. G = torch.rand(m, n) (add dtype=torch.float64 to keep double precision), etc. Second, when you write out.backward(z), you are passing z as the gradient of out, i.e. out.backward(gradient=z), probably due to the misconception that "out.backward(z) computes the gradient with respect to z, i.e. dout/dz". Instead, this argument is meant to be gradient = d[f(out)]/dout for some function f (e.g. a loss function), and it is the tensor used to compute the vector-Jacobian product dout/dz * df/dout. Therefore, you got the error because your out (and hence its gradient df/dout) is a scalar (a zero-dimensional tensor) while z is a tensor of size n, leading to a mismatch in shapes.
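To make the role of the gradient argument concrete, here is a minimal sketch on a toy function. The tensors z and v and the function out = z ** 2 are illustrative assumptions, not part of the original code. It shows that the argument is multiplied into the Jacobian, and that passing a size-n tensor for a scalar output triggers the same kind of shape mismatch.

```python
import torch

# Toy input (hypothetical values, chosen only for illustration).
z = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64, requires_grad=True)

# Non-scalar output: out_i = z_i ** 2, so the Jacobian dout/dz is diag(2 * z).
out = z ** 2

# v plays the role of df/dout for some downstream function f.
v = torch.tensor([1.0, 0.5, 0.0], dtype=torch.float64)
out.backward(gradient=v)

# z.grad now holds the vector-Jacobian product, which here is v * 2 * z.
vjp = z.grad.clone()
expected = v * 2 * z.detach()

# A scalar output combined with a size-3 gradient has mismatched shapes.
scalar_out = (z ** 2).sum()
try:
    scalar_out.backward(z.detach())
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True
```

In this sketch the first call succeeds because v has the same shape as out, while the second call fails because scalar_out is zero-dimensional.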
To fix the problem, as you have already figured out by yourself, just replace out.backward(z) with out.backward(), which is equivalent to out.backward(gradient=torch.tensor(1.)), since in your case out is a scalar and f(out) = out, so d[f(out)]/dout = d(out)/d(out) = tensor(1.). If your out were a non-scalar tensor, out.backward() would not work, and you would instead have to use out.backward(torch.ones(out.shape)) (again assuming that f(out) = out). In any case, if you need to pass gradient to out.backward(), make sure that it has the same shape as out.
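As a rough sketch of the fix applied to the computation from the question, the snippet below uses much smaller, arbitrarily chosen sizes and creates the tensors directly in PyTorch. It assumes a PyTorch release that provides torch.linalg.eigh, used here in place of torch.symeig with UPLO="L" standing in for upper=False. The helper forward and the vector-valued vec are hypothetical names introduced for illustration.

```python
import torch

torch.manual_seed(0)
ne, m, n = 3, 4, 6  # small sizes assumed for this sketch, not the original 79 x 164
G = torch.rand(m, n, dtype=torch.float64)
w = torch.rand(n, n, dtype=torch.float64)
z = (-torch.rand(n, dtype=torch.float64)).requires_grad_()

def forward(z):
    # Same computation as in the question, with eigh instead of symeig.
    a = torch.diag(2 * z - torch.sum(w, dim=1)) + w
    e, v = torch.linalg.eigh(a, UPLO="L")
    ssev = torch.sum(torch.pow(e[-ne:] * v[:, -ne:], 2), dim=1)
    return torch.sum(torch.matmul(G, ssev.reshape((n, 1))))

# Scalar output: backward() needs no argument.
out = forward(z)
out.backward()
grad_default = z.grad.clone()

# The same gradient with an explicit tensor of ones shaped like out.
z.grad = None
out = forward(z)
out.backward(gradient=torch.ones_like(out))
grad_explicit = z.grad.clone()

# Non-scalar output: backward() without an argument raises, so pass ones.
z.grad = None
vec = G @ (z ** 2)  # shape (m,)
try:
    vec.backward()
    implicit_failed = False
except RuntimeError:
    implicit_failed = True
vec.backward(torch.ones_like(vec))
grad_vec = z.grad.clone()
```

Using torch.ones_like(out) rather than torch.ones(out.shape) also matches the dtype and device of out, which matters here because the tensors are float64.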
Answered By - Andreas K.