Sunday, October 3, 2021

[FIXED] PyTorch: "one of the variables needed for gradient computation has been modified by an inplace operation"

Labels: python, pytorch, recurrent-neural-network

Issue

I'm training a PyTorch RNN on a text file of song lyrics to predict the next character given a character.

Here's how my RNN is defined:


import torch
import torch.nn as nn
import torch.optim

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        
        self.hidden_size = hidden_size
        
        # from input, previous hidden state to new hidden state
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        
        # from input, previous hidden state to output
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        
        # softmax on output
        self.softmax = nn.LogSoftmax(dim = 1)
    
    def forward(self, input, hidden):
        
        combined = torch.cat((input, hidden), 1)
        
        #get new hidden state
        hidden = self.i2h(combined)
        
        #get output
        output = self.i2o(combined)
        
        #apply softmax
        output = self.softmax(output)
        return output, hidden
    
    def initHidden(self): 
        return torch.zeros(1, self.hidden_size)

rnn = RNN(input_size = num_chars, hidden_size = 200, output_size = num_chars)
criterion = nn.NLLLoss()

lr = 0.01
optimizer = torch.optim.AdamW(rnn.parameters(), lr = lr)

Here's my training function:

def train(train, target):
    
    hidden = rnn.initHidden()
    
    loss = 0
    
    for i in range(len(train)):
        
        optimizer.zero_grad()

        # get output, hidden state from rnn given input char, hidden state
        output, hidden = rnn(train[i].unsqueeze(0), hidden)

        # returns the index holding '1', identifying the index of the correct character
        target_class = (target[i] == 1).nonzero(as_tuple=True)[0]
        
        loss += criterion(output, target_class)
        
    
        loss.backward(retain_graph = True)
        optimizer.step()
        
        print("done " + str(i) + " loop")
    
    return output, loss.item() / train.size(0)

When I run my training function, I get this error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [274, 74]], which is output 0 of TBackward, is at version 5; expected version 3 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Interestingly, it makes it through two complete loops of the training function before giving me that error.

Now, when I remove the retain_graph = True from loss.backward(), I get this error:

RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.

It shouldn't be trying to go backward through the graph multiple times here. Perhaps the graph is not getting cleared between training loops?

Solution

The issue is that you are accumulating your loss values, and with them the computation graphs attached to them, in the variable loss, here:

    loss += criterion(output, target_class)

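In turn, this means that at every iteration backward() runs through the current loss and through every loss computed in earlier iterations. Those earlier graphs still reference the weights as they were when the forward pass ran, but optimizer.step() has since updated the weights in place, which is what the version mismatch in the error message reports. When you update the model after every character, as you do here, accumulating the loss tensor is not the right thing to do.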
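To see where the message itself comes from, here is a minimal sketch that is independent of the model in the question. The tensors w and x are hypothetical stand-ins for a weight and an activation, and the in-place add_ only imitates what an optimizer step does to a weight.

```python
import torch

# Hypothetical stand-ins: w plays the role of a model weight, x an activation.
w = torch.ones(3, requires_grad=True)
x = torch.ones(3, requires_grad=True)

# The multiplication saves w and x so that backward() can use them later.
y = (w * x).sum()

# An optimizer step updates weights in place; this line imitates that.
with torch.no_grad():
    w.add_(1.0)

# backward() now finds that a saved tensor changed after the forward pass.
try:
    y.backward()
    error_message = ""
except RuntimeError as err:
    error_message = str(err)

print(error_message)
```

The graph for y was built before w changed, so autograd refuses to compute gradients from stale values. The accumulated loss in the question ends up in the same situation once optimizer.step() has run.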

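A simple fix is to accumulate only the loss's underlying scalar value, obtained with loss.item(), rather than the tensor itself, and to backpropagate through the current loss tensor only. Note that the hidden state also carries its graph from one iteration to the next, so it has to be detached after each update; otherwise the next backward() would still reach into the previous step. Be aware that detaching at every step stops gradients from flowing back through the hidden state, so in this model the i2h layer receives no gradient; to train it, backpropagate over several steps at once.

hidden = rnn.initHidden()
total_loss = 0

for i in range(len(train)):
    optimizer.zero_grad()
    output, hidden = rnn(train[i].unsqueeze(0), hidden)
    target_class = (target[i] == 1).nonzero(as_tuple=True)[0]

    # backpropagate through the current step's loss only
    loss = criterion(output, target_class)
    loss.backward()
    optimizer.step()

    # cut the hidden state off from this step's graph so the next
    # backward() does not reach weights that step() just changed in place
    hidden = hidden.detach()

    # accumulate a plain Python float, not a tensor
    total_loss += loss.item()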
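The difference between accumulating the tensor and accumulating its value can be checked directly. The sketch below uses a hypothetical one-layer model and random input purely for illustration; only the last few lines matter.

```python
import torch
import torch.nn as nn

# Hypothetical tiny model and input, only there to produce a loss tensor.
layer = nn.Linear(4, 3)
criterion = nn.NLLLoss()
log_probs = torch.log_softmax(layer(torch.randn(1, 4)), dim=1)
loss = criterion(log_probs, torch.tensor([2]))

# Accumulating the tensor keeps the result attached to the graph.
running_tensor = 0
running_tensor = running_tensor + loss
tensor_keeps_graph = running_tensor.grad_fn is not None

# Accumulating loss.item() stores a plain Python float with no graph.
running_float = 0.0
running_float += loss.item()
float_is_plain = isinstance(running_float, float)

print(tensor_keeps_graph, float_is_plain)
```

A running total kept as a float is enough for logging and cannot keep old graphs alive.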

Since you update the model's parameters straight after backpropagation, you don't need to retain the graph in memory, so retain_graph = True can be dropped.
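If the goal is for gradients to flow through the hidden state as well, a common alternative is to update once per sequence instead of once per character: sum the loss tensor over the whole sequence, then call backward() and step() a single time. The following is a sketch under assumptions, not the code from the question: num_chars, the sequence length, the hidden size and the random one-hot data are made up, and train_sequence is a hypothetical name.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        output = self.softmax(self.i2o(combined))
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)

# Assumed toy data: a random sequence of one-hot encoded characters.
num_chars = 5
seq_len = 6
indices = torch.randint(0, num_chars, (seq_len + 1,))
one_hot = torch.eye(num_chars)[indices]
inputs, targets = one_hot[:-1], one_hot[1:]

rnn = RNN(input_size=num_chars, hidden_size=8, output_size=num_chars)
criterion = nn.NLLLoss()
optimizer = torch.optim.AdamW(rnn.parameters(), lr=0.01)

def train_sequence(inputs, targets):
    # A fresh hidden state means a fresh graph for every call.
    hidden = rnn.initHidden()
    optimizer.zero_grad()
    loss = 0.0
    for i in range(len(inputs)):
        output, hidden = rnn(inputs[i].unsqueeze(0), hidden)
        target_class = (targets[i] == 1).nonzero(as_tuple=True)[0]
        # Keep the tensor here: one graph spans the whole sequence.
        loss = loss + criterion(output, target_class)
    # One backward pass and one parameter update per sequence.
    loss.backward()
    optimizer.step()
    return loss.item() / len(inputs)

for epoch in range(3):
    print(epoch, train_sequence(inputs, targets))
```

Because the weights change only after backward() has finished, no saved tensor is stale when gradients are computed, and retain_graph is not needed. For long sequences, memory grows with the sequence length, so splitting the text into shorter chunks is a reasonable design choice.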

Answered By - Ivan

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
