Issue
I have the following NN module in PyTorch:
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.emb = nn.Embedding(num_embeddings=10000, embedding_dim=512)
        self.drop1 = nn.Dropout(p=0.25)
        self.lstm = nn.LSTM(input_size=512, hidden_size=32, num_layers=1)
        self.drop2 = nn.Dropout(p=0.25)
        self.dense = nn.Linear(32, 1)
        self.activ = nn.Sigmoid()

    def forward(self, x):
        t1 = self.emb(x)
        t2 = self.drop1(t1)
        outputs, (hidden, cell) = self.lstm(t2)
        t4 = self.drop2(outputs[:, -1, :])
        t5 = self.dense(t4)
        return self.activ(t5)
The training code is the following:
model = Model()
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters())
for epoch in range(3):
    outputs = model(torch.from_numpy(x_train))
    loss = criterion(torch.flatten(outputs).to(torch.float32),
                     torch.flatten(torch.from_numpy(y_train)).to(torch.float32))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
The code works fine when I drastically lower the dimensions of the different network layers (2 or 4 instead of 512 and 32, and so on). I did that to debug my implementation and make sure it works.
However, with the dimensions given in the code above, my laptop freezes (the mouse stops moving, nothing responds, and I had to unplug it and restart it). The same thing happens on Google Colab: an error occurs and the session resets.
I added print statements everywhere; the code seemingly stops at outputs = model(torch.from_numpy(x_train)), though I didn't check which step of the forward pass is responsible.
What is surprising is that the exact same model coded in TensorFlow/Keras works fine both on my laptop and on Google Colab.
What am I missing here? Thanks a lot!
I expect the training to work correctly.
Data download and processing
import tensorflow as tf
import numpy as np
data = tf.keras.datasets.imdb.load_data(num_words=10000)
train, test = data[0], data[1]
x_train, y_train = train[0], train[1]
x_test, y_test = test[0], test[1]
review_length = 500
from tensorflow.keras.preprocessing import sequence
x_train = sequence.pad_sequences(x_train, maxlen=review_length)
x_test = sequence.pad_sequences(x_test, maxlen=review_length)
The same model in TensorFlow/Keras that works fine
from tensorflow.keras.models import Sequential
model = Sequential()
model.add(tf.keras.layers.Embedding(input_dim=10000, output_dim=512, input_length=500))
model.add(tf.keras.layers.Dropout(rate=0.25))
model.add(tf.keras.layers.LSTM(units=32))
model.add(tf.keras.layers.Dropout(rate=0.25))
model.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
model.compile(optimizer=tf.keras.optimizers.Adam(), loss="binary_crossentropy", metrics=["accuracy"])
model.fit(np.asarray(x_train), y_train, epochs=3, batch_size=256, validation_split=0.2)
Solution
The problem is passing the whole training set to the model in a single call. With the 25,000 IMDB reviews padded to 500 tokens each, the embedding layer alone produces a 25000 x 500 x 512 float32 tensor, roughly 25 GB, which exhausts the machine's memory. The simple solution is batching, and that is why the Keras model works: model.fit applies batching automatically (here with batch_size=256), while in PyTorch batching has to be implemented manually.
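For reference, here is a minimal sketch of manual batching using the standard torch.utils.data utilities TensorDataset and DataLoader. The batch_size=256 mirrors the Keras fit call above; the .long() and .float() casts are assumptions about the dtypes of the arrays coming out of the Keras preprocessing (pad_sequences yields int32, while nn.Embedding indices and nn.BCELoss targets want int64 and float32 respectively):

import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader

# Wrap the NumPy arrays in a dataset: int64 indices for nn.Embedding,
# float32 targets for nn.BCELoss.
dataset = TensorDataset(torch.from_numpy(x_train).long(),
                        torch.from_numpy(y_train).float())
loader = DataLoader(dataset, batch_size=256, shuffle=True)

model = Model()
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters())

for epoch in range(3):
    for x_batch, y_batch in loader:
        outputs = model(x_batch)  # forward pass on one 256-sample batch
        loss = criterion(torch.flatten(outputs), y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

This way each forward pass only materializes a 256 x 500 x 512 activation tensor (about 0.26 GB) instead of one covering all 25,000 reviews at once.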
Answered By - Dr. Snoopy