Issue
I have created a tiny dataset where an exact linear relationship holds. The code is as follows:
import numpy as np
def gen_data(n, k):
    np.random.seed(5711)
    beta = np.random.uniform(0, 1, size=(k, 1))
    print("beta is:", beta)
    X = np.random.normal(size=(n, k))
    y = X.dot(beta).reshape(-1, 1)
    D = np.concatenate([X, y], axis=1)
    return D.astype(np.float32)
Now I have fitted a PyTorch neural network with an SGD optimizer and MSE loss, and it converged approximately to the true values within 50 epochs with a learning rate of 1e-1.
I tried to set up exactly the same model in TensorFlow:
import keras.layers
from sklearn.model_selection import train_test_split
from keras.models import Sequential
import tensorflow as tf
n = 10
k = 2
X = gen_data(n, k)
D_train, D_test = train_test_split(X, test_size=0.2)
X_train, y_train = D_train[:,:k], D_train[:,k:]
X_test, y_test = D_test[:,:k], D_test[:,k:]
model = Sequential([keras.layers.Dense(1)])
model.compile(optimizer=tf.keras.optimizers.SGD(lr=1e-1), loss=tf.keras.losses.mean_squared_error)
model.fit(X_train, y_train, batch_size=64, epochs=50)
When I call model.get_weights(), it shows substantial differences from the true values, and the loss is still not even close to zero. I don't know why this model does not perform as well as the PyTorch model. Even disregarding the PyTorch model, shouldn't the network converge to the true values on this tiny toy dataset? What is my error in setting up the model?
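For reference, this is the kind of check I mean (a minimal sketch; it assumes the model above has been fitted and compares against the beta printed by gen_data):
# Dense(1) stores a kernel of shape (k, 1) and a bias of shape (1,)
w, b = model.get_weights()
print("fitted kernel:", w.ravel())  # should approach beta
print("fitted bias:", b)            # should approach 0, since y = X.dot(beta) has no intercept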
EDIT: Here is my full PyTorch code for comparison:
import torch
from torch.utils.data import DataLoader, Dataset, Sampler, SequentialSampler, RandomSampler
from torch import nn
from sklearn.model_selection import train_test_split
n = 10
k = 2
device = "cpu"
class Daten(Dataset):
    def __init__(self, df):
        self.df = df
        self.ycol = df.shape[1] - 1

    def __getitem__(self, index):
        return self.df[index, :self.ycol], self.df[index, self.ycol:]

    def __len__(self):
        return self.df.shape[0]

def split_into(D, batch_size=64, **kwargs):
    D_train, D_test = train_test_split(D, **kwargs)
    df_train, df_test = Daten(D_train), Daten(D_test)
    dl_train, dl_test = DataLoader(df_train, batch_size=batch_size), DataLoader(df_test, batch_size=batch_size)
    return dl_train, dl_test
D = gen_data(n, k)
dl_train, dl_test = split_into(D, test_size=0.2)
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Sequential(
            nn.Linear(k, 1)
        )

    def forward(self, x):
        ypred = self.linear(x)
        return ypred
model = NeuralNetwork().to(device)
print(model)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)
        print(y.shape)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

epochs = 50
for t in range(epochs):
    print(f"Epoch {t + 1}\n-------------------------------")
    train(dl_train, model, loss_fn, optimizer)
print("Done!")
EDIT:
I increased the number of epochs dramatically. After epochs=1000, the weights come close to the true values. So my best guess for the discrepancy is that TF applies some non-optimal initialization?
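If the initialization is the suspect, a quick way to check it (a sketch, assuming TF 2.x defaults) is to build the layer and print its weights before any training:
model = Sequential([keras.layers.Dense(1)])
model.build(input_shape=(None, k))  # create the variables without training
print(model.get_weights())          # default Dense init: glorot_uniform kernel, zero bias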
Solution
Your lr parameter for SGD is deprecated:
WARNING:absl: lr is deprecated in Keras optimizer, please use learning_rate or use the legacy optimizer, e.g., tf.keras.optimizers.legacy.SGD.
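The legacy-optimizer route mentioned in the warning would look roughly like this (just a sketch; renaming the argument, as below, is the simpler fix):
model.compile(optimizer=tf.keras.optimizers.legacy.SGD(learning_rate=1e-1), loss=tf.keras.losses.mean_squared_error)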
If I use
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-1), loss=tf.keras.losses.mean_squared_error)
then I get loss: 7.0588e-05 (without bias: loss: 2.0572e-08).
With my simple torch model, I got loss: 5.3355e-05 (without bias: loss: 5.3071e-09).
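The "without bias" numbers come from dropping the intercept term; a sketch of how that can be done (assuming use_bias=False / bias=False is how it was disabled):
# Keras: no intercept in the Dense layer
model = Sequential([keras.layers.Dense(1, use_bias=False)])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-1), loss=tf.keras.losses.mean_squared_error)
# PyTorch equivalent: nn.Linear(k, 1, bias=False)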
It's interesting that the bias plays a negative role here. I think the relation between X and y is too linear for the bias to be of any use, but the model tries to fit it anyway. If you add the line
y += np.random.rand(*y.shape)*0.2
to the data creation, then the model with bias will perform better for both torch and TF, as there is then an actual bias in the relation between X and y that the model can learn.
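For concreteness, a sketch of gen_data with that extra line (the uniform noise lies in [0, 0.2) and has mean 0.1, so it effectively adds a small intercept):
def gen_data(n, k):
    np.random.seed(5711)
    beta = np.random.uniform(0, 1, size=(k, 1))
    print("beta is:", beta)
    X = np.random.normal(size=(n, k))
    y = X.dot(beta).reshape(-1, 1)
    y += np.random.rand(*y.shape) * 0.2  # small positive noise: introduces an intercept to learn
    D = np.concatenate([X, y], axis=1)
    return D.astype(np.float32)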
Answered By - mhenning