Issue
I am using PyTorch to train a neural network to recognize digits from the MNIST database.
import torch
import torchvision
I'd like to implement a very simple design similar to what is shown in 3Blue1Brown's video series about neural networks. The following design in particular achieved an error rate of 1.6%.
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.layer1 = torch.nn.Linear(784, 800)
        self.layer2 = torch.nn.Linear(800, 10)

    def forward(self, x):
        x = torch.sigmoid(self.layer1(x))
        x = torch.sigmoid(self.layer2(x))
        return x
The data is gathered using torchvision and organised into mini-batches of 32 images each.
batch_size = 32
training_set = torchvision.datasets.MNIST("./", download=True, transform=torchvision.transforms.ToTensor())
training_loader = torch.utils.data.DataLoader(training_set, batch_size=batch_size)
I am using the mean squared error as a loss function and stochastic gradient descent with a learning rate of 0.001 as my optimization algorithm.
net = Net()
loss_function = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.001)
Finally the network gets trained and saved using the following code:
for images, labels in training_loader:
    optimizer.zero_grad()
    for i in range(batch_size):
        output = net(torch.flatten(images[i]))
        desired_output = torch.tensor([float(j == labels[i]) for j in range(10)])
        loss = loss_function(output, desired_output)
        loss.backward()
    optimizer.step()

torch.save(net.state_dict(), "./trained_net.pth")
However, here are the outputs of some test images:
tensor([0.0978, 0.1225, 0.1018, 0.0961, 0.1022, 0.0885, 0.1007, 0.1077, 0.0994,
0.1081], grad_fn=<SigmoidBackward>)
tensor([0.0978, 0.1180, 0.1001, 0.0929, 0.1006, 0.0893, 0.1010, 0.1051, 0.0978,
0.1067], grad_fn=<SigmoidBackward>)
tensor([0.0981, 0.1227, 0.1018, 0.0970, 0.0979, 0.0908, 0.1001, 0.1092, 0.1011,
0.1088], grad_fn=<SigmoidBackward>)
tensor([0.1061, 0.1149, 0.1037, 0.1001, 0.0957, 0.0919, 0.1044, 0.1022, 0.0997,
0.1052], grad_fn=<SigmoidBackward>)
tensor([0.0996, 0.1137, 0.1005, 0.0947, 0.0977, 0.0916, 0.1048, 0.1109, 0.1013,
0.1085], grad_fn=<SigmoidBackward>)
tensor([0.1008, 0.1154, 0.0986, 0.0996, 0.1031, 0.0952, 0.0995, 0.1063, 0.0982,
0.1094], grad_fn=<SigmoidBackward>)
tensor([0.0972, 0.1235, 0.1013, 0.0984, 0.0974, 0.0907, 0.1032, 0.1075, 0.1001,
0.1080], grad_fn=<SigmoidBackward>)
tensor([0.0929, 0.1258, 0.1016, 0.0978, 0.1006, 0.0889, 0.1001, 0.1068, 0.0986,
0.1024], grad_fn=<SigmoidBackward>)
tensor([0.0982, 0.1207, 0.1040, 0.0990, 0.0999, 0.0910, 0.0980, 0.1051, 0.1039,
0.1078], grad_fn=<SigmoidBackward>)
As you can see, the network seems to approach a state where its answer for every input is:
[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
This neural network is no better than random guessing. Where did I go wrong in my design or code?
Solution
Here are a few points that would be useful for you:
At first glance, your model is not learning, since its predictions are as good as random guesses. A first step would be to monitor your loss during training; note that you are training for only a single epoch. At the very least, you should also evaluate your model on unseen data:
validation_set = torchvision.datasets.MNIST("./", download=True, train=False, transform=torchvision.transforms.ToTensor())
validation_loader = torch.utils.data.DataLoader(validation_set, batch_size=32)
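As a minimal sketch of such an evaluation (the evaluate helper is a hypothetical name, and it assumes the revised model shown further down, which flattens its input and outputs one score per class):

def evaluate(net, loader):
    net.eval()  # switch off training-specific behaviour
    correct, total = 0, 0
    with torch.no_grad():  # no gradients needed for evaluation
        for images, labels in loader:
            output = net(images)
            predictions = output.argmax(dim=1)  # most likely class per image
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    net.train()
    return correct / total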
You are using an MSE loss (the L2 norm) for a classification task, which is not the right tool for this kind of problem. You could instead use the negative log-likelihood. PyTorch offers nn.CrossEntropyLoss, which combines a log-softmax and the negative log-likelihood loss in a single module. This change can be implemented by adding:

loss_function = torch.nn.CrossEntropyLoss()

and using the right target shapes when applying loss_function (see below). Since the loss function applies a log-softmax itself, you shouldn't have an activation function on your model's output.

You are using sigmoid as an activation function; intermediate non-linearities generally work better as ReLU (see related post). A sigmoid is better suited to a binary classification task. Again, since we are using nn.CrossEntropyLoss, the activation after layer2 has to be removed:

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.flatten = torch.nn.Flatten()
        self.layer1 = torch.nn.Linear(784, 800)
        self.layer2 = torch.nn.Linear(800, 10)

    def forward(self, x):
        x = self.flatten(x)
        x = torch.relu(self.layer1(x))
        x = self.layer2(x)  # raw logits: the loss applies log-softmax itself
        return x
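To illustrate the target shapes mentioned above: nn.CrossEntropyLoss expects raw logits of shape (batch_size, num_classes) and integer class labels of shape (batch_size,), not one-hot vectors. A quick sanity check, assuming the revised net and loss_function above:

images, labels = next(iter(training_loader))
print(images.shape)   # torch.Size([32, 1, 28, 28])
print(labels.shape)   # torch.Size([32]) -- integer class indices, no one-hot encoding
output = net(images)  # raw logits, shape [32, 10]
loss = loss_function(output, labels)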
A less crucial point is that you can run inference on a whole batch at once instead of looping through the batch one element at a time. A typical training loop for one epoch would look like:
for images, labels in training_loader:
    optimizer.zero_grad()
    output = net(images)
    loss = loss_function(output, labels)
    loss.backward()
    optimizer.step()
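To monitor the loss over time, as suggested above, the loop can be wrapped over several epochs and print the mean loss per epoch. A hedged sketch (the epoch count is an arbitrary choice for illustration, not from the original answer):

num_epochs = 5  # arbitrary choice for illustration
for epoch in range(num_epochs):
    running_loss = 0.0
    for images, labels in training_loader:
        optimizer.zero_grad()
        output = net(images)
        loss = loss_function(output, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    # len(training_loader) is the number of mini-batches per epoch
    print(f"epoch {epoch}: mean loss {running_loss / len(training_loader):.4f}")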
With these kinds of modifications, you can expect a validation accuracy of around 80% after a single epoch.
Answered By - Ivan