Thursday, September 8, 2022

[FIXED] Pytorch backward does not compute the gradients for requested variables

Tags: machine-learning, python, pytorch, pytorch-lightning

Issue

I'm trying to train a resnet18 model in PyTorch (with PyTorch Lightning) using Virtual Adversarial Training. The computations required for this type of training include the gradient of D (i.e. the cross-entropy loss of the model) with respect to the tensor r.

This should, in theory, happen in the following code snippet:

def generic_step(self, train_batch, batch_idx, step_type):
    x, y = train_batch
    unlabeled_idx = y is None

    d = torch.rand(x.shape).to(x.device)
    d = d/(torch.norm(d) + 1e-8)

    pred_y = self.classifier(x)
    y[unlabeled_idx] = pred_y[unlabeled_idx]
    l = self.criterion(pred_y, y)
    R_adv = torch.zeros_like(x)
    for _ in range(self.ip):
        r = self.xi * d
        r.requires_grad = True
        pred_hat = self.classifier(x + r)
        # pred_hat = F.log_softmax(pred_hat, dim=1)
        D = self.criterion(pred_hat, pred_y)
        self.classifier.zero_grad()
        D.requires_grad=True
        D.backward()
        R_adv += self.eps * r.grad / (torch.norm(r.grad) + 1e-8)

    R_adv /= 32
    loss = l + R_adv * self.a
    loss.backward()
    self.accuracy[step_type] = self.acc_metric(torch.argmax(pred_y, 1), y)
    return loss

Here, to my understanding, r.grad should be the gradient of D with respect to r. However, the code throws this error at D.backward():

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

(Full traceback omitted: it is not helpful here, and the error is technically "solved" since I know its cause, explained just below.)

After some research and debugging, it seems that in this situation D.backward() only computes dD/dD, disregarding any earlier requires_grad=True. This is confirmed when I add D.requires_grad=True: I get D.grad=Tensor(1.,device='cuda:0') but r.grad=None.
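
The observation above can be reproduced in plain PyTorch whenever D carries no autograd graph. The sketch below is only an illustration: it uses torch.no_grad() to produce such a graph-less D, which is an assumption for demonstration purposes and not a diagnosis of the code in the question. The tensor values and the toy loss are hypothetical.

```python
import torch

torch.manual_seed(0)

# r is a leaf tensor that asks for gradients, as in the question.
r = torch.rand(3)
r.requires_grad = True

# Assumption for illustration: the forward pass runs with gradient
# recording disabled, so D is not connected to r in any graph.
with torch.no_grad():
    D = (r ** 2).sum()  # toy stand-in for the loss

print(D.requires_grad, D.grad_fn)  # D has no graph behind it

# Forcing requires_grad on D turns it into a fresh leaf tensor.
D.requires_grad = True
D.backward()

print(D.grad)  # only dD/dD is computed
print(r.grad)  # nothing flowed back to r
```

With no graph behind D, backward() has nowhere to propagate to, so only D itself receives a gradient while r.grad stays unset.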

Does anyone know why this may be happening?

Solution

In Lightning, .backward() and the optimizer step are handled under the hood. If you call them yourself, as in the code above, you interfere with Lightning, because it doesn't know that you already called backward.

You can enable manual optimization in the LightningModule:

def __init__(self):
    super().__init__()

    # put this in your init
    self.automatic_optimization = False

This tells Lightning that you are taking over the backward call, the optimizer step and zero grad yourself. Don't forget to add these calls to your code above. You can access the optimizer and scheduler in your training step like so:

def training_step(self, batch, batch_idx):

    optimizer = self.optimizers()
    scheduler = self.lr_schedulers()

    # do your training step
    # don't forget to call:
    # 1) backward 2) optimizer step 3) zero grad

Read more about manual optimization in the Lightning documentation.
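
As a rough illustration of the three steps named in the training step above, here is a minimal sketch in plain PyTorch rather than Lightning. The linear model, the random data and the SGD optimizer are hypothetical stand-ins. Inside a LightningModule with manual optimization, the optimizer would come from self.optimizers(), and the Lightning docs use self.manual_backward(loss) in place of loss.backward(). The sketch also shows one way to get the gradient of a loss with respect to a perturbation r when the graph is recorded, using torch.autograd.grad.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins for the classifier, optimizer and batch.
model = nn.Linear(4, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))

# Gradient of a loss with respect to a perturbation r.
# autograd.grad returns the gradient directly and leaves the
# parameters' .grad fields untouched.
r = (1e-2 * torch.rand_like(x)).requires_grad_(True)
D = F.cross_entropy(model(x + r), y)
(r_grad,) = torch.autograd.grad(D, r)

before = model.weight.detach().clone()

# The three manual steps from the comments above.
loss = F.cross_entropy(model(x), y)
loss.backward()        # 1) backward
optimizer.step()       # 2) optimizer step
optimizer.zero_grad()  # 3) zero grad

print(r_grad.shape)
```

Using torch.autograd.grad for the inner gradient is a design choice of this sketch: it avoids mixing the perturbation gradient with the parameter gradients that the later backward call accumulates.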

Answered By - awaelchli

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
