Wednesday, June 15, 2022

[FIXED] How to train a network on images of different sizes in PyTorch

June 15, 2022 neural-network, python, pytorch

Issue

I am trying to feed a dataset of images to my neural network, and I get the error below. I don't know what might be causing it; all the images have different sizes. I have also tried changing the batch size and the kernels, but without success.

 File "c:\Users\david\Desktop\cs_agent\main.py", line 49, in <module>
    for i, data in enumerate(train_loader, 0):
  File "C:\Users\david\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 530, in __next__
    data = self._next_data()
  File "C:\Users\david\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 570, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "C:\Users\david\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\_utils\fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "C:\Users\david\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\_utils\collate.py", line 172, in default_collate
    return [default_collate(samples) for samples in transposed]  # Backwards compatibility.
  File "C:\Users\david\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\_utils\collate.py", line 172, in <listcomp>
    return [default_collate(samples) for samples in transposed]  # Backwards compatibility.
  File "C:\Users\david\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\_utils\collate.py", line 138, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [3, 300, 535] at entry 0 and [3, 1080, 1920] at entry 23
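The traceback shows where the failure happens: the DataLoader's default collate function calls torch.stack to merge the samples of a batch into one tensor, and torch.stack only works if every image has exactly the same shape. A minimal sketch reproducing this with a hypothetical in-memory dataset (VariableSizeImages and try_one_batch are illustration names, not part of the original code) that returns images with the two sizes from the error message:

```python
import torch
from torch.utils.data import DataLoader, Dataset


class VariableSizeImages(Dataset):
    """Hypothetical stand-in for csHeadBody: uint8 images of two different sizes."""

    def __init__(self):
        self.sizes = [(300, 535), (1080, 1920)]

    def __len__(self):
        return len(self.sizes)

    def __getitem__(self, idx):
        h, w = self.sizes[idx]
        image = torch.zeros(3, h, w, dtype=torch.uint8)
        return image, 0  # dummy label


def try_one_batch(batch_size):
    loader = DataLoader(VariableSizeImages(), batch_size=batch_size)
    try:
        images, labels = next(iter(loader))
        return tuple(images.shape)
    except RuntimeError as err:
        # default collate calls torch.stack, which needs identical shapes
        return f"RuntimeError: {err}"


if __name__ == "__main__":
    print(try_one_batch(batch_size=1))  # a single image always "stacks"
    print(try_one_batch(batch_size=2))  # [3, 300, 535] vs [3, 1080, 1920] fails
```

Changing the batch size or the kernels cannot fix this, because the error happens before the network ever sees the data; the images have to reach the collate step with equal sizes.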

This is my main file:


import numpy as np
import matplotlib.pyplot as plt
import torch
import dataset 
import os 
from torch.utils.data import  DataLoader
import torch.nn as nn

import torchvision
import check_device

import neural_network
import torch.optim as optim

EPS = 1.e-7
LR=0.5
WEIGHT_DECAY=0.5
batch_size =50
#DATA LOADING ###################################################################################################################



test_dataset =dataset.csHeadBody(csv_file="images\\test_labels.csv",root_dir="images\\test")
train_dataset =dataset.csHeadBody(csv_file="images\\train_labels.csv",root_dir="images\\train")
train_loader =DataLoader(dataset =train_dataset,batch_size=batch_size,shuffle=True)
test_loader =DataLoader(dataset=test_dataset,batch_size=batch_size,shuffle=True)




#DATA LOADING ###################################################################################################################END


#NEURAL NET #####################################################################################################################################################

net=neural_network.Net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)


#NEURAL NET END ######################################################################################



for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        # get the inputs; data is a list of [inputs, labels]
        print(data)
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

print('Finished Training')

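And this is my dataset file:

import os

import pandas as pd
from torch.utils.data import Dataset
from torchvision.io import read_image


class csHeadBody(Dataset):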
    def __init__(self, csv_file, root_dir, transform=None, target_transform=None):
        self.img_labels = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.img_labels)

    def __getitem__(self, idx):
        img_path = os.path.join(self.root_dir, self.img_labels.iloc[idx, 0])
        
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image, label

This is my neural network architecture:

import torch.nn.functional as F
import torch.nn as nn
import torch


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 535, 535)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

Solution

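You need to adjust the parameters of your convolutional and linear layers. For nn.Conv2d, the first argument is the number of input channels (3 for standard RGB images in conv1), the second is the number of output channels and the third is the kernel size. The output channels of one convolution must match the input channels of the next, so conv1 = nn.Conv2d(3, 535, 535) (535 output channels and a 535x535 kernel, larger than a 300-pixel-high image) cannot feed conv2 = nn.Conv2d(6, 16, 5). To make this clearer, I've used named arguments in the code below. It works for square input images of 224x224 pixels (the standard ImageNet size; adjust if needed). Since the DataLoader can only stack images of the same size into a batch, your images also have to be resized to this size (see the transforms further below). If you want code that is agnostic to the image size, you could use global average pooling (the mean of each channel of the last conv layer's output), although all images within one batch still need to have the same size. The net below supports both: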
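The 224x224 requirement comes from the layer shapes. For an unpadded convolution or pooling layer, the output side is floor((n + 2p - k) / s) + 1 for input side n, padding p, kernel size k and stride s. A small sketch to check the arithmetic behind fc_1 (the helper names are hypothetical; the pooling helper assumes stride equal to the kernel size, which is the nn.MaxPool2d default):

```python
def conv2d_out(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of nn.Conv2d (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def maxpool2d_out(size: int, kernel_size: int = 2) -> int:
    """Output side length of nn.MaxPool2d with stride == kernel_size, no padding."""
    return (size - kernel_size) // kernel_size + 1


side = 224
side = maxpool2d_out(conv2d_out(side, kernel_size=3))  # conv1 + pool: 222 -> 111
side = maxpool2d_out(conv2d_out(side, kernel_size=3))  # conv2 + pool: 109 -> 54
print(side, side * side * 64)  # 54 186624 -> in_features of fc_1

# The question's conv1 uses a 535x535 kernel: on a 300-pixel-high image the
# output height would be non-positive, so that layer cannot work at all.
print(conv2d_out(300, kernel_size=535))  # -234
```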

import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self, use_global_average_pooling: bool = False):
        super().__init__()
        self.use_global_average_pooling = use_global_average_pooling
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
        self.pool = nn.MaxPool2d(kernel_size=(2, 2))
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
        if use_global_average_pooling:
            self.fc_gap = nn.Linear(64, 10)
        else:
            self.fc_1 = nn.Linear(54 * 54 * 64, 84)  # 54 img side times 64 out channels from conv2
            self.fc_2 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # img side: (224 - 2) // 2 = 111
        x = self.pool(F.relu(self.conv2(x)))  # img side: (111 - 2) // 2 =  54
        if self.use_global_average_pooling:
            # global average pooling: mean over the spatial dimensions (height, width)
            x = x.mean(dim=(-1, -2))
            x = self.fc_gap(x)  # raw logits, no ReLU (nn.CrossEntropyLoss expects logits)
        else:  # use all features
            x = torch.flatten(x, 1)
            x = F.relu(self.fc_1(x))
            x = self.fc_2(x)
        return x
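A quick shape check, as a sketch: the class below repeats the architecture above (without a ReLU on the output of the pooling branch, so both branches return raw logits as nn.CrossEntropyLoss expects) and feeds it random tensors. The fully connected variant only accepts 224x224 inputs, while the global-average-pooling variant also accepts a 300x535 input. output_shape is a hypothetical helper for this check.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    # Same layers as the answer's Net, repeated so this snippet runs on its own
    def __init__(self, use_global_average_pooling: bool = False):
        super().__init__()
        self.use_global_average_pooling = use_global_average_pooling
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
        self.pool = nn.MaxPool2d(kernel_size=(2, 2))
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
        if use_global_average_pooling:
            self.fc_gap = nn.Linear(64, 10)
        else:
            self.fc_1 = nn.Linear(54 * 54 * 64, 84)
            self.fc_2 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        if self.use_global_average_pooling:
            return self.fc_gap(x.mean(dim=(-1, -2)))  # mean over height and width
        x = F.relu(self.fc_1(torch.flatten(x, 1)))
        return self.fc_2(x)


def output_shape(net, height, width, batch_size=2):
    # Run a random batch through the net without tracking gradients
    with torch.no_grad():
        return tuple(net(torch.randn(batch_size, 3, height, width)).shape)


print(output_shape(Net(), 224, 224))                                # (2, 10)
print(output_shape(Net(use_global_average_pooling=True), 224, 224))  # (2, 10)
print(output_shape(Net(use_global_average_pooling=True), 300, 535))  # (2, 10)
```

Even with global average pooling, one batch is a single tensor, so the images inside it must share a size; only different batches may differ.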

Additionally, the torchvision.io.read_image function used in your Dataset returns a uint8 tensor with integer values from 0 to 255. Your network needs floating point values, so you have to convert the result and divide it by 255 to get values in the [0, 1] range. Furthermore, neural networks work best with normalized inputs (subtracting the per-channel mean and then dividing by the per-channel standard deviation of your training dataset). I've added normalization to the image transforms below. For convenience, they use the ImageNet mean and standard deviation, which should work fine if your images are similar to ImageNet images (otherwise you can calculate them on your own images).
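If your images differ from ImageNet, you can compute the statistics yourself. A minimal sketch, assuming you pass in the uint8 CHW tensors returned by read_image for your training images only; channel_mean_std is a hypothetical helper, not a torchvision function. It accumulates per-channel sums, so images of different sizes never have to be stacked:

```python
import torch


def channel_mean_std(images):
    """Per-channel mean and std over all pixels of uint8 CHW RGB images of any size."""
    total = torch.zeros(3, dtype=torch.float64)
    total_sq = torch.zeros(3, dtype=torch.float64)
    n_pixels = 0
    for image in images:
        x = image.to(torch.float64) / 255.0  # same [0, 1] scale the network sees
        total += x.sum(dim=(1, 2))
        total_sq += (x ** 2).sum(dim=(1, 2))
        n_pixels += x.shape[1] * x.shape[2]
    mean = total / n_pixels
    # population variance E[x^2] - E[x]^2, clamped against tiny negative rounding errors
    std = (total_sq / n_pixels - mean ** 2).clamp(min=0).sqrt()
    return mean.tolist(), std.tolist()


# Usage: one black and one white image (sizes may differ)
black = torch.zeros(3, 2, 2, dtype=torch.uint8)
white = torch.full((3, 2, 2), 255, dtype=torch.uint8)
print(channel_mean_std([black, white]))  # means 0.5, stds 0.5 per channel
```

The two returned lists can be passed directly as mean= and std= to torchvision.transforms.Normalize.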

Note that resizing to a fixed 224x224 can distort your images, because it does not keep the original aspect ratio. Often this is no problem, but if it is, you can instead scale each image so that its longer side fits and then pad the remaining area with a constant color (e.g. black) to reach the required dimensions (torchvision also provides transforms for this, such as Pad).

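import torch
import torchvision

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]
transforms = torchvision.transforms.Compose([
    # resize first, so the remaining steps run on 224x224 instead of full-size images
    torchvision.transforms.Resize((224, 224)),
    # uint8 [0, 255] -> float32 [0, 1], i.e. the division by 255
    torchvision.transforms.ConvertImageDtype(torch.float32),
    torchvision.transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
])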

You might also need to adjust the code in your Dataset to load every image as a 3-channel RGB image (for example if some images have an alpha channel or are grayscale). This can be done like this:

image = read_image(img_path, mode=torchvision.io.image.ImageReadMode.RGB)

You can then initialise your Dataset using:

test_dataset = dataset.csHeadBody(csv_file="images\\test_labels.csv", root_dir="images\\test", transform=transforms)
train_dataset = dataset.csHeadBody(csv_file="images\\train_labels.csv", root_dir="images\\train", transform=transforms)

I haven't tested the code, let me know if it doesn't work!

Answered By - asdf

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
