Issue
I'm trying to build a deep learning food classification model using the Food-101 dataset. I was able to implement it successfully using the following model, which works when all my images are resized to 32x32. However, I realised some of the images were almost incomprehensible at that size, so I increased it to 64x64 for all images. When I run my code with these larger images, it no longer works.
I believe the error has to do with how I've defined the model. I'm new to deep learning and would appreciate any help. If you need any further info, please comment below rather than taking down the post.
Model definition (uses convolutional layers and a residual block):
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleResidualBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, stride=1, padding=1)
        self.relu1 = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, stride=1, padding=1)
        self.relu2 = nn.ReLU()

    def forward(self, x):
        out = self.conv1(x)
        out = self.relu1(out)
        out = self.conv2(out)
        return self.relu2(out) + x  # ReLU can be applied before or after adding the input

def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                   # Generate predictions
        loss = F.cross_entropy(out, labels)  # Calculate loss
        return loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)                   # Generate predictions
        loss = F.cross_entropy(out, labels)  # Calculate loss
        acc = accuracy(out, labels)          # Calculate accuracy
        return {'val_loss': loss.detach(), 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()  # Combine losses
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()     # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], last_lr: {:.5f}, train_loss: {:.4f}, val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result['lrs'][-1], result['train_loss'], result['val_loss'], result['val_acc']))

def conv_block(in_channels, out_channels, pool=False):
    layers = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
              nn.BatchNorm2d(out_channels),
              nn.ReLU(inplace=True)]
    if pool: layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

class ResNet9(ImageClassificationBase):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.conv1 = conv_block(in_channels, 64)
        self.conv2 = conv_block(64, 128, pool=True)
        self.res1 = nn.Sequential(conv_block(128, 128), conv_block(128, 128))
        self.conv3 = conv_block(128, 256, pool=True)
        self.conv4 = conv_block(256, 512, pool=True)
        self.res2 = nn.Sequential(conv_block(512, 512), conv_block(512, 512))
        self.classifier = nn.Sequential(nn.MaxPool2d(4),
                                        nn.Flatten(),
                                        nn.Dropout(0.2),
                                        nn.Linear(512, num_classes))

    def forward(self, xb):
        out = self.conv1(xb)
        out = self.conv2(out)
        out = self.res1(out) + out
        out = self.conv3(out)
        out = self.conv4(out)
        out = self.res2(out) + out
        out = self.classifier(out)
        return out
Error I get when executing it:
RuntimeError: mat1 dim 1 must match mat2 dim 0
Solution
Welcome to Stack Overflow. In general, when you post your code and the error you are receiving, it is better to also provide a test case so that others can reproduce the error. In this case, I was able to identify the problem and build a test case myself.
"RuntimeError: mat1 dim 1 must match mat2 dim 0" this error sounded like a matrix multiplication error to me, where you multiply two matrices and dimensions don't match for multiplication. When I look at your code, only place I see that uses a matrix multiplication is that part:
self.classifier = nn.Sequential(nn.MaxPool2d(4),
                                nn.Flatten(),
                                nn.Dropout(0.2),
                                nn.Linear(512, num_classes))
A linear layer is just a basic matrix multiplication: out = input @ weight.T + bias. So it looks like the input reaching the linear layer and its weight matrix stop matching when you change the input image size:
model_resnet = ResNet9(3, 10)    # 3 input channels, 10 classes for the test
img = torch.rand(10, 3, 64, 64)  # batch of 10 random 64x64 RGB images
out = model_resnet(img)          # raises the RuntimeError above
The reason this happens is that you use MaxPool2d(4), which applies a 4x4 max-pooling filter over its input. If the input to the pooling is 4x4, this filter produces a 1x1 result; if it is 8x8, it produces 2x2. So when you increase your input from 32x32 to 64x64, the output of the max pooling doubles along each axis: the feature map entering the classifier grows from 512x1x1 to 512x2x2, which flattens to 2048 values instead of 512, so nn.Linear(512, num_classes) no longer fits the new dimension.
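To see this concretely, here is a quick shape trace (a sketch reusing the ResNet9 definition from the question) for a 64x64 input:

import torch
import torch.nn as nn

model = ResNet9(3, 10)
x = torch.rand(1, 3, 64, 64)  # one 64x64 RGB image
x = model.conv1(x)            # -> (1, 64, 64, 64)
x = model.conv2(x)            # pool -> (1, 128, 32, 32)
x = model.res1(x) + x         # -> (1, 128, 32, 32)
x = model.conv3(x)            # pool -> (1, 256, 16, 16)
x = model.conv4(x)            # pool -> (1, 512, 8, 8)
x = model.res2(x) + x         # -> (1, 512, 8, 8)
x = nn.MaxPool2d(4)(x)        # -> (1, 512, 2, 2), not 1x1 as with 32x32 inputs
x = nn.Flatten()(x)           # -> (1, 2048)
print(x.shape)                # nn.Linear(512, ...) expects 512 features, hence the error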
The solution is simple: use adaptive pooling. For example:
self.classifier = nn.Sequential(nn.AdaptiveAvgPool2d((1, 1)),
                                nn.Flatten(),
                                nn.Dropout(0.2),
                                nn.Linear(512, num_classes))
AdaptiveAvgPool2d applies an averaging filter over whatever input it receives and always produces a 1x1 result, regardless of the input dimensions. Essentially, if the input is 8x8 it applies an 8x8 averaging filter; if the input is 4x4 it applies a 4x4 one. With this simple change, you can use 32x32, 64x64, and even higher-resolution images.
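You can verify this with a small sketch that feeds feature maps of different spatial sizes through the adaptive pooling layer:

import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool2d((1, 1))
for size in (4, 8, 16):                # feature map sizes from 32x32, 64x64, 128x128 inputs
    x = torch.rand(1, 512, size, size)
    print(pool(x).shape)               # torch.Size([1, 512, 1, 1]) every time

Since the pooled output is always 512x1x1, the flattened vector entering nn.Linear(512, num_classes) is always 512 features, whatever the image size.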
Answered By - yutasrobot