Issue
I have the following NN module in PyTorch:
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.emb = nn.Embedding(num_embeddings=10000, embedding_dim=512)
        self.drop1 = nn.Dropout(p=0.25)
        self.lstm = nn.LSTM(input_size=512, hidden_size=32, num_layers=1)
        self.drop2 = nn.Dropout(p=0.25)
        self.dense = nn.Linear(32, 1)
        self.activ = nn.Sigmoid()

    def forward(self, x):
        t1 = self.emb(x)
        t2 = self.drop1(t1)
        outputs, (hidden, cell) = self.lstm(t2)
        t4 = self.drop2(outputs[:, -1, :])
        t5 = self.dense(t4)
        return self.activ(t5)
The training code is the following:
model = Model()
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters())
for epoch in range(3):
    outputs = model(torch.from_numpy(x_train))
    loss = criterion(torch.flatten(outputs).to(torch.float32),
                     torch.flatten(torch.from_numpy(y_train)).to(torch.float32))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
The code works fine when I drastically lower the dimensions of the different network layers (2 or 4 instead of 512 and 32, and so on). I did that to debug my implementation and make sure it works.
However, with the dimensions given in the code above, my laptop freezes (the mouse stops moving, nothing responds, and I had to unplug it and restart it). The same thing happens on Google Colab: an error occurs and the session resets.
I added print statements everywhere; the code seemingly stops at outputs = model(torch.from_numpy(x_train)), though I didn't check which step of the forward pass is responsible.
What is surprising is that the exact same model coded in TensorFlow/Keras works fine both on my laptop and on Google Colab.
What am I missing here? Thanks a lot!
I expect the training to work correctly.
Data download and processing
import tensorflow as tf
import numpy as np
data = tf.keras.datasets.imdb.load_data(num_words=10000)
train, test = data[0], data[1]
x_train, y_train = train[0], train[1]
x_test, y_test = test[0], test[1]
review_length = 500
from tensorflow.keras.preprocessing import sequence
x_train = sequence.pad_sequences(x_train, maxlen=review_length)
x_test = sequence.pad_sequences(x_test, maxlen=review_length)
The same model in TensorFlow/Keras that works fine
from tensorflow.keras.models import Sequential
model = Sequential()
model.add(tf.keras.layers.Embedding(input_dim=10000, output_dim=512, input_length=500))
model.add(tf.keras.layers.Dropout(rate=0.25))
model.add(tf.keras.layers.LSTM(units=32))
model.add(tf.keras.layers.Dropout(rate=0.25))
model.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
model.compile(optimizer=tf.keras.optimizers.Adam(), loss="binary_crossentropy", metrics=["accuracy"])
model.fit(np.asarray(x_train), y_train, epochs=3, batch_size=256, validation_split=0.2)
Solution
The problem is passing the whole training set to the model in a single call. With the 25,000 IMDB reviews padded to 500 tokens each, the embedding layer alone produces a 25000 x 500 x 512 float32 tensor, roughly 25 GB, which exhausts the machine's memory. The simple solution is batching, and that is why the Keras model works: model.fit applies batching automatically (here with batch_size=256), while in PyTorch batching has to be implemented manually.
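For reference, here is a minimal sketch of manual batching using the standard torch.utils.data utilities TensorDataset and DataLoader. The batch_size=256 mirrors the Keras fit call above; the .long() and .float() casts are assumptions about the dtypes of the arrays coming out of the Keras preprocessing (pad_sequences yields int32, while nn.Embedding indices and nn.BCELoss targets want int64 and float32 respectively):

import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader

# Wrap the NumPy arrays in a dataset: int64 indices for nn.Embedding,
# float32 targets for nn.BCELoss.
dataset = TensorDataset(torch.from_numpy(x_train).long(),
                        torch.from_numpy(y_train).float())
loader = DataLoader(dataset, batch_size=256, shuffle=True)

model = Model()
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters())

for epoch in range(3):
    for x_batch, y_batch in loader:
        outputs = model(x_batch)  # forward pass on one 256-sample batch
        loss = criterion(torch.flatten(outputs), y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

This way each forward pass only materializes a 256 x 500 x 512 activation tensor (about 0.26 GB) instead of one covering all 25,000 reviews at once.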
Answered By - Dr. Snoopy