Issue
I'm trying to build a deep learning food classification model using the Food-101 dataset. I was able to implement it successfully using the following model, which works when all my images are resized to 32x32. However, I realised some of the images were almost incomprehensible at that size, so I increased it to 64x64 for all images. When I run my code with these larger images, it no longer works.
I believe the error has to do with how I've defined the model. I'm new to deep learning and would appreciate any help. If you need any further info, please comment below rather than taking down the post.
Model definition (uses convolutional layers and a residual block):
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleResidualBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, stride=1, padding=1)
        self.relu1 = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, stride=1, padding=1)
        self.relu2 = nn.ReLU()

    def forward(self, x):
        out = self.conv1(x)
        out = self.relu1(out)
        out = self.conv2(out)
        return self.relu2(out) + x  # ReLU can be applied before or after adding the input

def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                   # Generate predictions
        loss = F.cross_entropy(out, labels)  # Calculate loss
        return loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)                   # Generate predictions
        loss = F.cross_entropy(out, labels)  # Calculate loss
        acc = accuracy(out, labels)          # Calculate accuracy
        return {'val_loss': loss.detach(), 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()  # Combine losses
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()     # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], last_lr: {:.5f}, train_loss: {:.4f}, val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result['lrs'][-1], result['train_loss'], result['val_loss'], result['val_acc']))

def conv_block(in_channels, out_channels, pool=False):
    layers = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
              nn.BatchNorm2d(out_channels),
              nn.ReLU(inplace=True)]
    if pool: layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

class ResNet9(ImageClassificationBase):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.conv1 = conv_block(in_channels, 64)
        self.conv2 = conv_block(64, 128, pool=True)
        self.res1 = nn.Sequential(conv_block(128, 128), conv_block(128, 128))
        self.conv3 = conv_block(128, 256, pool=True)
        self.conv4 = conv_block(256, 512, pool=True)
        self.res2 = nn.Sequential(conv_block(512, 512), conv_block(512, 512))
        self.classifier = nn.Sequential(nn.MaxPool2d(4),
                                        nn.Flatten(),
                                        nn.Dropout(0.2),
                                        nn.Linear(512, num_classes))

    def forward(self, xb):
        out = self.conv1(xb)
        out = self.conv2(out)
        out = self.res1(out) + out
        out = self.conv3(out)
        out = self.conv4(out)
        out = self.res2(out) + out
        out = self.classifier(out)
        return out
Error I get when executing it:
RuntimeError: mat1 dim 1 must match mat2 dim 0
Solution
Welcome to Stack Overflow. In general, when you post your code and the error you are receiving, it is better to also provide a test case so that others can reproduce the error. In this case, I was able to identify the problem and build a test case myself.
"RuntimeError: mat1 dim 1 must match mat2 dim 0" this error sounded like a matrix multiplication error to me, where you multiply two matrices and dimensions don't match for multiplication. When I look at your code, only place I see that uses a matrix multiplication is that part:
self.classifier = nn.Sequential(nn.MaxPool2d(4),
                                nn.Flatten(),
                                nn.Dropout(0.2),
                                nn.Linear(512, num_classes))
A linear layer is just a basic matrix multiplication: out = input @ weight.T + bias. So it looks like the input reaching the linear layer and its weight matrix stop matching when you change the input image size:
model_resnet = ResNet9(3, 10)    # 3 input channels, 10 classes for the test
img = torch.rand(10, 3, 64, 64)  # batch of 10 random 64x64 RGB images
out = model_resnet(img)          # raises the RuntimeError above
The reason this happens is that you use MaxPool2d(4), which applies a 4x4 max-pooling filter over its input. If the input to the pooling is 4x4, this filter produces a 1x1 result; if it is 8x8, it produces 2x2. So when you increase your input from 32x32 to 64x64, the output of the max pooling doubles along each axis: the feature map entering the classifier grows from 512x1x1 to 512x2x2, which flattens to 2048 values instead of 512, so nn.Linear(512, num_classes) no longer fits the new dimension.
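To see this concretely, here is a quick shape trace (a sketch reusing the ResNet9 definition from the question) for a 64x64 input:

import torch
import torch.nn as nn

model = ResNet9(3, 10)
x = torch.rand(1, 3, 64, 64)  # one 64x64 RGB image
x = model.conv1(x)            # -> (1, 64, 64, 64)
x = model.conv2(x)            # pool -> (1, 128, 32, 32)
x = model.res1(x) + x         # -> (1, 128, 32, 32)
x = model.conv3(x)            # pool -> (1, 256, 16, 16)
x = model.conv4(x)            # pool -> (1, 512, 8, 8)
x = model.res2(x) + x         # -> (1, 512, 8, 8)
x = nn.MaxPool2d(4)(x)        # -> (1, 512, 2, 2), not 1x1 as with 32x32 inputs
x = nn.Flatten()(x)           # -> (1, 2048)
print(x.shape)                # nn.Linear(512, ...) expects 512 features, hence the error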
The solution is simple: use adaptive pooling. For example:
self.classifier = nn.Sequential(nn.AdaptiveAvgPool2d((1, 1)),
                                nn.Flatten(),
                                nn.Dropout(0.2),
                                nn.Linear(512, num_classes))
AdaptiveAvgPool2d applies an averaging filter over whatever input it receives and always produces a 1x1 result, regardless of the input dimensions. Essentially, if the input is 8x8 it applies an 8x8 averaging filter; if the input is 4x4 it applies a 4x4 one. With this simple change, you can use 32x32, 64x64, and even higher-resolution images.
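You can verify this with a small sketch that feeds feature maps of different spatial sizes through the adaptive pooling layer:

import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool2d((1, 1))
for size in (4, 8, 16):                # feature map sizes from 32x32, 64x64, 128x128 inputs
    x = torch.rand(1, 512, size, size)
    print(pool(x).shape)               # torch.Size([1, 512, 1, 1]) every time

Since the pooled output is always 512x1x1, the flattened vector entering nn.Linear(512, num_classes) is always 512 features, whatever the image size.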
Answered By - yutasrobot