Issue
I am trying to feed the Neural network dataset of images and I am getting this error I don't know what might be the cause as all the images have different sizes I have also tried to change batch sizes and kernels but I had no success with this.
File "c:\Users\david\Desktop\cs_agent\main.py", line 49, in <module>
for i, data in enumerate(train_loader, 0):
File "C:\Users\david\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 530, in __next__
data = self._next_data()
File "C:\Users\david\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 570, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "C:\Users\david\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\_utils\fetch.py", line 52, in fetch
return self.collate_fn(data)
File "C:\Users\david\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\_utils\collate.py", line 172, in default_collate
return [default_collate(samples) for samples in transposed] # Backwards compatibility.
File "C:\Users\david\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\_utils\collate.py", line 172, in <listcomp>
return [default_collate(samples) for samples in transposed] # Backwards compatibility.
File "C:\Users\david\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\_utils\collate.py", line 138, in default_collate
return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [3, 300, 535] at entry 0 and [3, 1080, 1920] at entry 23
this is my main file
import numpy as np
import matplotlib.pyplot as plt
import torch
import dataset
import os
from torch.utils.data import DataLoader
import torch.nn as nn
import torchvision
import check_device
import neural_network
import torch.optim as optim
EPS = 1.e-7
LR=0.5
WEIGHT_DECAY=0.5
batch_size =50
#DATA LOADING ###################################################################################################################
test_dataset =dataset.csHeadBody(csv_file="images\\test_labels.csv",root_dir="images\\test")
train_dataset =dataset.csHeadBody(csv_file="images\\train_labels.csv",root_dir="images\\train")
train_loader =DataLoader(dataset =train_dataset,batch_size=batch_size,shuffle=True)
test_loader =DataLoader(dataset=test_dataset,batch_size=batch_size,shuffle=True)
#DATA LOADING ###################################################################################################################END
#NEURAL NET #####################################################################################################################################################
net=neural_network.Net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
#NEURAL NET END ######################################################################################
for epoch in range(2): # loop over the dataset multiple times
running_loss = 0.0
for i, data in enumerate(train_loader, 0):
# get the inputs; data is a list of [inputs, labels]
print(data)
inputs, labels = data
# zero the parameter gradients
optimizer.zero_grad()
# forward + backward + optimize
outputs = net(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
# print statistics
running_loss += loss.item()
if i % 2000 == 1999: # print every 2000 mini-batches
print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
running_loss = 0.0
print('Finished Training')
and this is my dataset file
class csHeadBody(Dataset):
def __init__(self, csv_file, root_dir, transform=None, target_transform=None):
self.img_labels = pd.read_csv(csv_file)
self.root_dir = root_dir
self.transform = transform
self.target_transform = target_transform
def __len__(self):
return len(self.img_labels)
def __getitem__(self, idx):
img_path = os.path.join(self.root_dir, self.img_labels.iloc[idx, 0])
image = read_image(img_path)
label = self.img_labels.iloc[idx, 1]
if self.transform:
image = self.transform(image)
if self.target_transform:
label = self.target_transform(label)
return image, label
this is my neural network architecture
import torch.nn.functional as F
import torch.nn as nn
import torch
class Net(nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv2d(3, 535, 535)
self.pool = nn.MaxPool2d(2, 2)
self.conv2 = nn.Conv2d(6, 16, 5)
self.fc1 = nn.Linear(16 * 5 * 5, 120)
self.fc2 = nn.Linear(120, 84)
self.fc3 = nn.Linear(84, 10)
def forward(self, x):
x = self.pool(F.relu(self.conv1(x)))
x = self.pool(F.relu(self.conv2(x)))
x = torch.flatten(x, 1) # flatten all dimensions except batch
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
x = self.fc3(x)
return x
Solution
You need to adjust the parameters of your convolutional and linear layers. The first argument is the number of input channels (3 for standard RGB images in conv1
), then the number of output channels and then the convolution kernel size. To clarify, I've used named arguments in the code below. The code works for images of a square input size of 224x224
pixels (standard imagenet size, adjust if needed). If you want image size agnostic code you could use something like global average pooling (mean of each channel in the last conv layer). The net below supports both:
class Net(nn.Module):
def __init__(self, use_global_average_pooling: bool = False):
super().__init__()
self.use_global_average_pooling = use_global_average_pooling
self.conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
self.pool = nn.MaxPool2d(kernel_size=(2, 2))
self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
if use_global_average_pooling:
self.fc_gap = nn.Linear(64, 10)
else:
self.fc_1 = nn.Linear(54 * 54 * 64, 84) # 54 img side times 64 out channels from conv2
self.fc_2 = nn.Linear(84, 10)
def forward(self, x):
x = self.pool(F.relu(self.conv1(x))) # img side: (224 - 2) // 2 = 111
x = self.pool(F.relu(self.conv2(x))) # img side: (111 - 2) // 2 = 54
if self.use_global_average_pooling:
# mean for global average pooling (mean over channel dimension)
x = x.mean(dim=(-1, -2))
x = F.relu(self.fc_gap(x))
else: # use all features
x = torch.flatten(x, 1)
x = F.relu(self.fc_1(x))
x = self.fc_2(x)
return x
Additionally, the torchvision.io.read_image
function used in your Dataset
returns an uint8
tensor with integer values from 0 to 255. You'll want floating point values for your network, so you have to divide the result by 255
to get values in the [0, 1]
range. Furthermore, neural networks work best with normalized inputs (subtracting the mean and then dividing by the standard error of your training dataset). I've added normalization to the image transforms below. For convenience, it is using the imagenet mean and standard error, which should work fine if your images are similar to imagenet images (otherwise you can calculate them on your own images).
Note that the resizing might distort your images (doesn't keep the original aspect ratio). Often this is no problem, but if it is you might want to pad your images with a constant color (e.g. black) to resize them to the required dimensions (there are also transforms for this in the torchvision library).
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]
transforms = torchvision.transforms.Compose([
torchvision.transforms.Lambda(lambda x: x / 255.),
torchvision.transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
torchvision.transforms.Resize((224, 224)),
])
You might also need to adjust the code in your Dataset
to load images as an RGB image (if they also have an alpha channel). This can be done like this:
image = read_image(img_path, mode=torchvision.io.image.ImageReadMode.RGB)
You can then initialise your Dataset
using:
test_dataset = dataset.csHeadBody(csv_file="images\\test_labels.csv", root_dir="images\\test", transform=transforms)
train_dataset = dataset.csHeadBody(csv_file="images\\train_labels.csv", root_dir="images\\train", transform=transforms)
I haven't tested the code, let me know if it doesn't work!
Answered By - asdf
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.