Monday, December 5, 2022

[FIXED] pytorch - RuntimeError: mat1 and mat2 shapes cannot be multiplied (256x512 and 256x512) for identical sizes

December 05, 2022 | conv-neural-network, python, pytorch

Issue

I'm trying to create a CNN that obtains at least 80% accuracy on CIFAR10 data in 20 epochs.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HWCNN(nn.Module):
    def __init__(self, num_channels, num_classes):
        super(HWCNN, self).__init__()
        self.conv1 = nn.Conv2d(num_channels, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, stride=1, padding=1)
        self.pool1 = nn.MaxPool2d(2)

        self.conv3 = nn.Conv2d(64, 128, 3, stride=1, padding=1)
        self.conv4 = nn.Conv2d(128, 128, 3, stride=1, padding=1)
        self.pool2 = nn.MaxPool2d(2)

        self.conv5 = nn.Conv2d(128, 256, 3, stride=1, padding=1)
        self.conv6 = nn.Conv2d(256, 256, 3, stride=1, padding=1)
        self.pool3 = nn.MaxPool2d(2)

        nn.Flatten()
        self.fc1 = nn.Linear(256*4*4, 512)
        self.fc2 = nn.Linear(256, 512)
        self.fc3 = nn.Linear(256, 10)

    def forward(self, X):
        x = F.relu(self.conv1(X))
        x = F.relu(self.conv2(x))
        x = self.pool1(x)
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        x = self.pool2(x)
        x = F.relu(self.conv5(x))
        x = F.relu(self.conv6(x))
        x = self.pool3(x)
        x = x.reshape(-1, 256*4*4)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

import torch.optim as optim

cuda = torch.device('cuda')
model = HWCNN(3, 10)
model.to(cuda)

optimizer = optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
epochs = 20

train_losses = []
valid_losses = []

best_valid_acc = 0

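# train(), validate(), accuracy(), training_stats(), train_loader and valid_loader are defined elsewhere (not shown)
for epoch in range(0, epochs):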
    
    print('Epoch number ', epoch + 1)
    
    train_loss = train(model, loss_fn, optimizer)
    train_losses.append(train_loss)
    train_accuracy = accuracy(model, train_loader)
    
    valid_loss = validate(model, loss_fn, optimizer)
    valid_losses.append(valid_loss)
    valid_accuracy = accuracy(model, valid_loader)
    if best_valid_acc < valid_accuracy:
        best_valid_acc = valid_accuracy
    
    training_stats(train_loss, train_accuracy, valid_loss, valid_accuracy)
print('Best validation accuracy', best_valid_acc)

If I run this, I get RuntimeError: mat1 and mat2 shapes cannot be multiplied (256x512 and 256x512). But the matrices have the same size, 256x512. Why is this happening? I also tried modifying the reshape arguments, but I can't get this to work. Any ideas? Thank you in advance!

Solution

Declaring FC layers works like nn.Linear(num_in_features, num_out_features); see the official nn.Linear documentation. In other words, you're telling fc1 to take 256*4*4 = 4096 input features and produce 512 output features. Then fc2 is declared to take 256 input features and produce 512 output features. This is a mismatch: you're telling fc2 to expect 256 inputs, but you actually pass it fc1's 512 outputs.

If you fix this mistake, you'll immediately see a new error: fc2 currently returns 512 output features, but then fc3 expects only 256 inputs.

You can fix both problems at once by replacing your line self.fc2 = nn.Linear(256, 512) with self.fc2 = nn.Linear(512, 256).
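The rule is simply that each layer's out_features must equal the next layer's in_features. As a small sketch, the helper below (find_linear_mismatches is a hypothetical name, not part of PyTorch) checks that rule using the in_features and out_features attributes that nn.Linear exposes. It assumes the FC layers run in order with nothing in between that changes the feature count (ReLU doesn't).

```python
import torch.nn as nn


def find_linear_mismatches(layers):
    """Return (index, out_features, next_in_features) for every adjacent pair
    of nn.Linear layers whose sizes don't chain together."""
    mismatches = []
    for i in range(len(layers) - 1):
        out_features = layers[i].out_features
        next_in = layers[i + 1].in_features
        if out_features != next_in:
            mismatches.append((i, out_features, next_in))
    return mismatches


# FC layers as declared in the question.
original = [nn.Linear(256 * 4 * 4, 512), nn.Linear(256, 512), nn.Linear(256, 10)]
# FC layers after changing fc2 to nn.Linear(512, 256).
fixed = [nn.Linear(256 * 4 * 4, 512), nn.Linear(512, 256), nn.Linear(256, 10)]

print(find_linear_mismatches(original))  # [(0, 512, 256), (1, 512, 256)]
print(find_linear_mismatches(fixed))     # []
```

The two entries for the original layers are exactly the two errors described above: fc1 to fc2, then fc2 to fc3.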

Edit: You should also remove the bare line x = self.fc2(x) that comes right before x = F.relu(self.fc2(x)). As written, your code applies fc2 twice in a row. I'm assuming that's just a typo, since two consecutive FC layers are no better than one if you don't put an activation function like ReLU in between. It is also the cause of the new error mentioned in your comment: after the fix, the first fc2 call outputs 256 features, and then the second fc2 call expects 512 inputs.
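Putting both fixes together, a minimal sketch of the corrected model might look like the following. It assumes CIFAR-10's 3x32x32 inputs and keeps the question's layer sizes. Beyond the two fixes, it (as a choice, not a requirement) reuses a single nn.MaxPool2d(2) for all three pooling steps, which is safe because max pooling has no parameters, and registers nn.Flatten() as self.flatten. In the question, the bare nn.Flatten() statement in __init__ creates a module and immediately discards it. nn.Flatten() also keeps the batch dimension, whereas reshape(-1, 256*4*4) would silently fold any leftover spatial size into the batch dimension.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HWCNN(nn.Module):
    def __init__(self, num_channels, num_classes):
        super().__init__()
        # padding=1 keeps H and W; each MaxPool2d(2) halves them: 32 -> 16 -> 8 -> 4.
        self.conv1 = nn.Conv2d(num_channels, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, 3, padding=1)
        self.conv4 = nn.Conv2d(128, 128, 3, padding=1)
        self.conv5 = nn.Conv2d(128, 256, 3, padding=1)
        self.conv6 = nn.Conv2d(256, 256, 3, padding=1)
        self.pool = nn.MaxPool2d(2)
        self.flatten = nn.Flatten()
        # Each layer's in_features equals the previous layer's out_features.
        self.fc1 = nn.Linear(256 * 4 * 4, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, num_classes)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(F.relu(self.conv2(x)))
        x = F.relu(self.conv3(x))
        x = self.pool(F.relu(self.conv4(x)))
        x = F.relu(self.conv5(x))
        x = self.pool(F.relu(self.conv6(x)))
        x = self.flatten(x)      # (batch, 256*4*4)
        x = F.relu(self.fc1(x))  # (batch, 512)
        x = F.relu(self.fc2(x))  # (batch, 256); fc2 is applied only once
        return self.fc3(x)       # (batch, num_classes)


model = HWCNN(3, 10)
out = model(torch.zeros(8, 3, 32, 32))  # a batch of 8 CIFAR-10-sized images
print(out.shape)  # torch.Size([8, 10])
```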

Bonus debugging tip: Try writing

print(model(torch.zeros((1, 3, 32, 32), device=cuda)))

at any point after you've defined model. (The leading 1 is a batch dimension, 32x32 is the CIFAR-10 image size, and device=cuda matches the device the model was moved to; a CPU tensor would raise a device-mismatch error instead.) That will produce an immediate error if you have dimension mismatches like the ones in this question. Then you can comment out the last several lines in forward() and see if the error goes away. This way you can quickly narrow down which layer is causing the dimension mismatch. Finding the bug becomes much easier once you know for sure exactly which line is causing the problem.
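As an alternative to commenting out lines, a forward hook can record each submodule's output shape, so the last recorded entry tells you where the forward pass stopped. This sketch uses nn.Module.register_forward_hook; trace_shapes and BuggyHead are hypothetical names for illustration. Only modules registered as attributes are traced (functional calls like F.relu are not), and a hook fires only after its module finishes, so the failing layer is the one that runs right after the last record.

```python
import torch
import torch.nn as nn


def trace_shapes(model, example):
    """Run example through model and record (name, output shape) for every
    leaf submodule that completes. Returns (records, error), where error is
    the RuntimeError raised by the forward pass, or None if it succeeded."""
    records = []

    def make_hook(name):
        def hook(module, inputs, output):
            records.append((name, tuple(output.shape)))
        return hook

    # Hook leaf modules only (Conv2d, Linear, ...), skipping containers.
    handles = [
        module.register_forward_hook(make_hook(name))
        for name, module in model.named_modules()
        if len(list(module.children())) == 0
    ]
    error = None
    try:
        with torch.no_grad():
            model(example)
    except RuntimeError as exc:
        error = exc
    finally:
        for handle in handles:
            handle.remove()  # leave the model without hooks afterwards
    return records, error


class BuggyHead(nn.Module):
    """Hypothetical stand-in for the question's FC layers, with fc2 declared as (256, 512)."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(256 * 4 * 4, 512)
        self.fc2 = nn.Linear(256, 512)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))


records, error = trace_shapes(BuggyHead(), torch.zeros(2, 256 * 4 * 4))
print(records)  # [('fc1', (2, 512))]: fc1 finished, so fc2 is the layer that failed
print(error)
```

For the question's model, the call would be something like trace_shapes(model, torch.zeros(1, 3, 32, 32, device=cuda)).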

Incidentally, the "shapes cannot be multiplied" error refers to matrix multiplication. If matrix A has dimensions m x n and matrix B has dimensions r x s, then the product AB is only defined if n = r. In the error message, n = 512 but r = 256, so the condition fails even though both matrices print as 256x512.
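To see where the two 256x512 matrices come from: nn.Linear stores its weight with shape (out_features, in_features) and computes x @ weight.T + bias. The sketch below assumes a batch size of 256, which the first 256 in the message suggests (the question doesn't state it). mat1 is then fc1's output for the batch, and mat2 is the transposed fc2 weight. They print identically, but only the inner pair of dimensions has to match.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

batch_size = 256                  # assumed; matches the first "256" in the error
fc2 = nn.Linear(256, 512)         # fc2 as declared in the question
x = torch.zeros(batch_size, 512)  # fc1's output: (batch, 512)

# nn.Linear computes x @ weight.T + bias, and weight has shape (out, in).
mat1 = x             # m x n = 256 x 512
mat2 = fc2.weight.T  # r x s = 256 x 512 (the weight itself is 512 x 256)
print(tuple(mat1.shape), tuple(mat2.shape))  # (256, 512) (256, 512)
print(mat1.shape[1] == mat2.shape[0])        # False: n = 512 but r = 256

raised = False
try:
    F.linear(x, fc2.weight, fc2.bias)  # same computation as fc2(x)
except RuntimeError as exc:
    raised = True
    print(exc)  # the shape-mismatch error from the question

fixed_fc2 = nn.Linear(512, 256)  # weight.T is now 512 x 256, so n = r = 512
print(fixed_fc2(x).shape)        # torch.Size([256, 256])
```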

Answered By - David Clyde

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
