Issue
I implemented a simple linear model in PyTorch and Keras to learn the basics of each library, setting up a one-layer model for linear regression. Both models seem to work and the loss decreases, but as the PyTorch model approaches the minimum its loss bounces around, whereas the Keras loss stays stable. The data I use is synthetic and has been tested for linearity.
Here is the Keras model:
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split

IceCream = pd.read_csv("IceCreamData.csv")
x_values = IceCream[["Temperature"]]
y_values = IceCream["Revenue"]
x_train, x_test, y_train, y_test = train_test_split(x_values, y_values, test_size=0.25)

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(units=1,
                                kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.01),
                                bias_initializer=tf.keras.initializers.Zeros()))
model.compile(optimizer=tf.keras.optimizers.Adam(0.01, epsilon=1e-07), loss='mean_squared_error')
model.fit(x_train, y_train, epochs=25, batch_size=1)
This gives the following training output:
Epoch 1/25
375/375 [==============================] - 0s 617us/step - loss: 261208.2969
Epoch 2/25
375/375 [==============================] - 0s 568us/step - loss: 192060.6094
Epoch 3/25
375/375 [==============================] - 0s 577us/step - loss: 137438.0000
(...)
Epoch 20/25
375/375 [==============================] - 0s 536us/step - loss: 667.3316
Epoch 21/25
375/375 [==============================] - 0s 535us/step - loss: 665.7455
Epoch 22/25
375/375 [==============================] - 0s 535us/step - loss: 666.8908
Epoch 23/25
375/375 [==============================] - 0s 577us/step - loss: 665.0857
Epoch 24/25
375/375 [==============================] - 0s 536us/step - loss: 662.0533
Epoch 25/25
375/375 [==============================] - 0s 534us/step - loss: 661.3047
Here is the PyTorch model:
import pandas as pd
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, random_split


class RegressionDataset(Dataset):
    def __init__(self, x, y):
        super().__init__()
        self.x = torch.from_numpy(x.astype("float32"))
        self.y = torch.from_numpy(y.astype("float32"))

    def __len__(self):
        return len(self.x)

    def __getitem__(self, index):
        return self.x[index], self.y[index].unsqueeze(0)


class LinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)
        self.loss_function = nn.MSELoss()
        self.optimizer_function = torch.optim.Adam(self.parameters(), lr=0.01, eps=1e-07)
        torch.nn.init.normal_(self.linear.weight, mean=0.0, std=1.0)

    def forward(self, inputs):
        return self.linear(inputs)

    def backward(self, train_loader, epoch, num_epochs):
        self.train()
        for x_values, y_values in train_loader:
            prediction = self.linear(x_values)
            loss = self.loss_function(prediction, y_values)
            loss.backward()
            self.optimizer_function.step()
            self.optimizer_function.zero_grad()
        print(f"Epoch [{epoch + 1:03}/{num_epochs:3}] | Train Loss: {loss.item():.4f}")

    def validate(self, val_loader):
        self.eval()
        with torch.no_grad():
            for inputs, targets in val_loader:
                outputs = self.linear(inputs)
                loss = self.loss_function(outputs, targets)
        print(f'Validation Loss: {loss.item():.4f}')


data = pd.read_csv("./IceCreamData.csv", delimiter=",")
x_values = data[["Temperature"]].to_numpy()
y_values = data["Revenue"].to_numpy()

dataset = RegressionDataset(x_values, y_values)
train_dataset, test_dataset = random_split(dataset, lengths=[0.75, 0.25])
train_loader = DataLoader(dataset=train_dataset, batch_size=1, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=1, shuffle=True)

model = LinearRegressionModel()
num_epochs = 25
for epoch in range(num_epochs):
    model.backward(train_loader, epoch, num_epochs)
    model.validate(test_loader)
which gives the following training results:
Epoch [001/ 25] | Train Loss: 248788.2500
Validation Loss: 257732.9062
Epoch [002/ 25] | Train Loss: 96519.7422
Validation Loss: 110466.8281
Epoch [003/ 25] | Train Loss: 76869.0547
Validation Loss: 178772.9375
(...)
Epoch [020/ 25] | Train Loss: 679.7694
Validation Loss: 1674.3351
Epoch [021/ 25] | Train Loss: 2065.5454
Validation Loss: 1177.6052
Epoch [022/ 25] | Train Loss: 269.6078
Validation Loss: 595.9854
Epoch [023/ 25] | Train Loss: 115.4116
Validation Loss: 0.1172
Epoch [024/ 25] | Train Loss: 2134.9248
Validation Loss: 9816.9375
Epoch [025/ 25] | Train Loss: 37.1115
Validation Loss: 2869.8569
At first I thought this was caused by different initial weight values, so I implemented weight initialization with an explicit standard deviation.
I also tried different learning rates, but that did not improve the results either. According to the documentation, the other parameters (momentum, betas, etc.) should be the same; only the epsilon differed, which I adjusted in the code.
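For reference, a minimal sketch of what mirroring the Keras initializer and Adam settings looks like on the PyTorch side (a sketch only, with values taken from the Keras model above):
# Sketch: PyTorch counterpart of the Keras initializers/optimizer shown earlier
import torch
import torch.nn as nn

linear = nn.Linear(1, 1)
nn.init.normal_(linear.weight, mean=0.0, std=0.01)  # like RandomNormal(stddev=0.01)
nn.init.zeros_(linear.bias)                         # like initializers.Zeros()
optimizer = torch.optim.Adam(linear.parameters(), lr=0.01, eps=1e-07)  # betas default to (0.9, 0.999) in both libraries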
Why does the loss of the PyTorch model bounce up and down as it approaches the minimum, while the Keras model's loss stays stable?
Solution
The loss value reported by fit in Keras is an average over the entire epoch; from the docs, the History.history attribute "is a record of training loss values and metrics values at successive epochs".
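For example, those per-epoch averages can be read from the History object that fit returns:
history = model.fit(x_train, y_train, epochs=25, batch_size=1)
print(history.history["loss"])  # one averaged loss value per epoch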
The torch code in your question prints out the loss of only a single batch (a single sample in this case). You can add up the loss over all batches, and then report the average. I did that and the loss declined more smoothly.
Small modifications made:
def backward(self, train_loader, epoch, num_epochs):
    self.train()
    cumulative_loss = 0  # accrue loss over all batches
    for x_values, y_values in train_loader:
        prediction = self.linear(x_values)
        loss = self.loss_function(prediction, y_values)
        loss.backward()
        self.optimizer_function.step()
        self.optimizer_function.zero_grad()
        cumulative_loss += loss.item()  # accumulate losses
    # report average over all batches
    print(f"Epoch [{epoch + 1:03}/{num_epochs:3}] | "
          f"Train Loss: {cumulative_loss / len(train_loader):.4f}")

def validate(self, val_loader):
    self.eval()
    loss = 0  # used to accumulate losses
    with torch.no_grad():
        for inputs, targets in val_loader:
            outputs = self.linear(inputs)
            loss += self.loss_function(outputs, targets).item()  # add up losses
    print(f'Validation Loss: {loss / len(val_loader):.4f}')  # report average
Alternatively, setting batch_size=len(test_dataset)
for the test loader should give you a smoother average for the validation losses (though it will be more memory-intensive and could be slower). This method won't work well for the training losses because you usually need a relatively small batch size for effective training.
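A minimal sketch of that alternative, reusing test_dataset from the code above:
# One batch containing the whole test split, so the printed validation
# loss is already an average over every test sample.
test_loader = DataLoader(dataset=test_dataset, batch_size=len(test_dataset), shuffle=False)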
Answered By - user3128