Issue
I am learning to train a basic NN model for image classification. The error happens when I try to feed image data into the model. I understand that I should input the correct size of image data. My image data is 128*256 with 3 channels, there are 4 classes, and the batch size is 4. What I don't understand is where the size 113216 comes from. I checked all the related parameters and the image metadata, but didn't find a clue. Here is my code:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(3*128*256, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(4, 3*128*256)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
net = Net()
for epoch in range(2):  # loop over the dataset multiple times
    print('round start')
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        print(inputs.shape)
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
Thanks for your help!
Solution
Shapes
- Conv2d changes the width and height of the image when used without padding. Rule of thumb (if you want to keep the same image size with stride=1, the default): padding = kernel_size // 2
- You are changing the number of channels through the convolutions, while your linear layer still has 3 (the original input channel count) in its size for some reason?
- Use print(x.shape) after each step if you want to know how your tensor data is transformed! (The trace below walks through exactly that.)
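This is also exactly where 113216 comes from: after the second pooling step the tensor has shape (4, 16, 29, 61), which is 4 * 16 * 29 * 61 = 113216 elements, and that cannot be reshaped into (4, 3*128*256) = (4, 98304). A minimal shape trace, using the same layers as in the question:

import torch
import torch.nn as nn
import torch.nn.functional as F

conv1, conv2 = nn.Conv2d(3, 6, 5), nn.Conv2d(6, 16, 5)
pool = nn.MaxPool2d(2, 2)

x = torch.randn(4, 3, 128, 256)   # dummy batch shaped like the question's data
x = pool(F.relu(conv1(x)))
print(x.shape)                    # torch.Size([4, 6, 62, 126])
x = pool(F.relu(conv2(x)))
print(x.shape)                    # torch.Size([4, 16, 29, 61])
print(x.numel())                  # 113216 = 4 * 16 * 29 * 61
# x.view(4, 3 * 128 * 256) would need 4 * 98304 = 393216 elements,
# but only 113216 are available, hence the error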
Commented code
Fixed code with comments about shapes after each step:
import torch
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = torch.nn.Conv2d(3, 6, 5)
        self.pool = torch.nn.MaxPool2d(2, 2)
        self.conv2 = torch.nn.Conv2d(6, 16, 5)
        # Output shape from convolution is input shape to fc
        self.fc1 = torch.nn.Linear(16 * 29 * 61, 120)
        self.fc2 = torch.nn.Linear(120, 84)
        self.fc3 = torch.nn.Linear(84, 10)

    def forward(self, x):
        # In: (4, 3, 128, 256)
        x = F.relu(self.conv1(x))
        # (4, 6, 124, 252) because kernel_size=5 trims 2 pixels from each border
        x = self.pool(x)
        # (4, 6, 62, 126) because pooling halves the size
        x = F.relu(self.conv2(x))
        # (4, 16, 58, 122) same reason as above
        x = self.pool(x)
        # (4, 16, 29, 61) because pooling halves the size
        # Better to use torch.flatten(x, start_dim=1) so you don't have to hardcode the size here
        x = x.view(-1, 16 * 29 * 61)  # Use -1 to be batch size independent
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
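A quick sanity check with a dummy batch (shaped like the data from the question) confirms the forward pass now runs:

net = Net()
out = net(torch.randn(4, 3, 128, 256))
print(out.shape)  # torch.Size([4, 10])

Note that fc3 still outputs 10 logits, as in the original code; with 4 classes you would presumably want torch.nn.Linear(84, 4) instead.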
Other things that might help
- Try torch.nn.AdaptiveMaxPool2d(1) before ReLU; it will make your network width- and height-independent.
- Use flatten (or a torch.nn.Flatten() layer) after this pooling.
- If so, pass the num_channels set in the last convolution as in_features for nn.Linear (a sketch combining these ideas follows below).
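For completeness, here is one way those suggestions could fit together. This is only a sketch (the class name AdaptiveNet is mine, not from the answer); thanks to the adaptive pooling it accepts any input height and width:

import torch
import torch.nn.functional as F

class AdaptiveNet(torch.nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(3, 6, 5)
        self.pool = torch.nn.MaxPool2d(2, 2)
        self.conv2 = torch.nn.Conv2d(6, 16, 5)
        # Collapses each feature map to 1x1, whatever the input size
        self.adaptive_pool = torch.nn.AdaptiveMaxPool2d(1)
        self.flatten = torch.nn.Flatten()
        # in_features is just the channel count of the last convolution
        self.fc = torch.nn.Linear(16, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = F.relu(self.adaptive_pool(self.conv2(x)))  # pooling before ReLU
        x = self.flatten(x)                            # (batch, 16)
        return self.fc(x)

net = AdaptiveNet()
print(net(torch.randn(4, 3, 128, 256)).shape)  # torch.Size([4, 10])
print(net(torch.randn(4, 3, 99, 77)).shape)    # torch.Size([4, 10]) as well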
Answered By - Szymon Maszke