Issue
I am using PyTorch to train a neural network to recognize digits from the MNIST database.
import torch
import torchvision
I'd like to implement a very simple design similar to what is shown in 3Blue1Brown's video series about neural networks. The following design in particular achieved an error rate of 1.6%.
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.layer1 = torch.nn.Linear(784, 800)
        self.layer2 = torch.nn.Linear(800, 10)

    def forward(self, x):
        x = torch.sigmoid(self.layer1(x))
        x = torch.sigmoid(self.layer2(x))
        return x
The data is gathered using torchvision and organised into mini-batches of 32 images each.
batch_size = 32
training_set = torchvision.datasets.MNIST("./", download=True, transform=torchvision.transforms.ToTensor())
training_loader = torch.utils.data.DataLoader(training_set, batch_size=batch_size)
I am using the mean squared error as a loss function and stochastic gradient descent with a learning rate of 0.001 as my optimization algorithm.
net = Net()
loss_function = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.001)
Finally the network gets trained and saved using the following code:
for images, labels in training_loader:
    optimizer.zero_grad()
    for i in range(batch_size):
        output = net(torch.flatten(images[i]))
        desired_output = torch.tensor([float(j == labels[i]) for j in range(10)])
        loss = loss_function(output, desired_output)
        loss.backward()
    optimizer.step()

torch.save(net.state_dict(), "./trained_net.pth")
However, here are the outputs of some test images:
tensor([0.0978, 0.1225, 0.1018, 0.0961, 0.1022, 0.0885, 0.1007, 0.1077, 0.0994,
0.1081], grad_fn=<SigmoidBackward>)
tensor([0.0978, 0.1180, 0.1001, 0.0929, 0.1006, 0.0893, 0.1010, 0.1051, 0.0978,
0.1067], grad_fn=<SigmoidBackward>)
tensor([0.0981, 0.1227, 0.1018, 0.0970, 0.0979, 0.0908, 0.1001, 0.1092, 0.1011,
0.1088], grad_fn=<SigmoidBackward>)
tensor([0.1061, 0.1149, 0.1037, 0.1001, 0.0957, 0.0919, 0.1044, 0.1022, 0.0997,
0.1052], grad_fn=<SigmoidBackward>)
tensor([0.0996, 0.1137, 0.1005, 0.0947, 0.0977, 0.0916, 0.1048, 0.1109, 0.1013,
0.1085], grad_fn=<SigmoidBackward>)
tensor([0.1008, 0.1154, 0.0986, 0.0996, 0.1031, 0.0952, 0.0995, 0.1063, 0.0982,
0.1094], grad_fn=<SigmoidBackward>)
tensor([0.0972, 0.1235, 0.1013, 0.0984, 0.0974, 0.0907, 0.1032, 0.1075, 0.1001,
0.1080], grad_fn=<SigmoidBackward>)
tensor([0.0929, 0.1258, 0.1016, 0.0978, 0.1006, 0.0889, 0.1001, 0.1068, 0.0986,
0.1024], grad_fn=<SigmoidBackward>)
tensor([0.0982, 0.1207, 0.1040, 0.0990, 0.0999, 0.0910, 0.0980, 0.1051, 0.1039,
0.1078], grad_fn=<SigmoidBackward>)
As you can see, the network seems to approach a state where its answer for every input is:
[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
This neural network is no better than random guessing. Where did I go wrong in my design or code?
Solution
Here are a few points that would be useful for you:
At first glance, your model is not learning, since its predictions are as good as random guesses. A first step would be to monitor your loss during training; note that you are training for only a single epoch. At the very least, you should also evaluate your model on unseen data:
validation_set = torchvision.datasets.MNIST("./", download=True, train=False, transform=torchvision.transforms.ToTensor())
validation_loader = torch.utils.data.DataLoader(validation_set, batch_size=32)
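As a minimal sketch of such an evaluation (the evaluate helper is a hypothetical name, and it assumes the revised model shown further down, which flattens its input and outputs one score per class):

def evaluate(net, loader):
    net.eval()  # switch off training-specific behaviour
    correct, total = 0, 0
    with torch.no_grad():  # no gradients needed for evaluation
        for images, labels in loader:
            output = net(images)
            predictions = output.argmax(dim=1)  # most likely class per image
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    net.train()
    return correct / total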
You are using an MSE loss (the L2 norm) for a classification task, which is not the right tool for this kind of problem. You could instead use the negative log-likelihood. PyTorch offers nn.CrossEntropyLoss, which combines a log-softmax and the negative log-likelihood loss in a single module. This change can be implemented by adding:

loss_function = torch.nn.CrossEntropyLoss()

and using the right target shapes when applying loss_function (see below). Since the loss function applies a log-softmax itself, you shouldn't have an activation function on your model's output.

You are using sigmoid as an activation function; intermediate non-linearities generally work better as ReLU (see related post). A sigmoid is better suited to a binary classification task. Again, since we are using nn.CrossEntropyLoss, the activation after layer2 has to be removed:

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.flatten = torch.nn.Flatten()
        self.layer1 = torch.nn.Linear(784, 800)
        self.layer2 = torch.nn.Linear(800, 10)

    def forward(self, x):
        x = self.flatten(x)
        x = torch.relu(self.layer1(x))
        x = self.layer2(x)  # raw logits: the loss applies log-softmax itself
        return x
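To illustrate the target shapes mentioned above: nn.CrossEntropyLoss expects raw logits of shape (batch_size, num_classes) and integer class labels of shape (batch_size,), not one-hot vectors. A quick sanity check, assuming the revised net and loss_function above:

images, labels = next(iter(training_loader))
print(images.shape)   # torch.Size([32, 1, 28, 28])
print(labels.shape)   # torch.Size([32]) -- integer class indices, no one-hot encoding
output = net(images)  # raw logits, shape [32, 10]
loss = loss_function(output, labels)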
A less crucial point is that you can run inference on a whole batch at once instead of looping through the batch one element at a time. A typical training loop for one epoch would look like:
for images, labels in training_loader:
    optimizer.zero_grad()
    output = net(images)
    loss = loss_function(output, labels)
    loss.backward()
    optimizer.step()
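To monitor the loss over time, as suggested above, the loop can be wrapped over several epochs and print the mean loss per epoch. A hedged sketch (the epoch count is an arbitrary choice for illustration, not from the original answer):

num_epochs = 5  # arbitrary choice for illustration
for epoch in range(num_epochs):
    running_loss = 0.0
    for images, labels in training_loader:
        optimizer.zero_grad()
        output = net(images)
        loss = loss_function(output, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    # len(training_loader) is the number of mini-batches per epoch
    print(f"epoch {epoch}: mean loss {running_loss / len(training_loader):.4f}")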
With these kinds of modifications, you can expect a validation accuracy of around 80% after a single epoch.
Answered By - Ivan