Issue
I need to get the tensor sizes right, because I get the following error in the line loss = criterion(out, target):

Expected input batch_size (4200) to match target batch_size (64).

How can I solve this?

My output tensor has size ([4200, 2]) and my target tensor ([64, 2]). The use case is image classification with two classes. My batch size is 64 and the images are 180 x 115 px in grayscale. Please don't be confused: there are some 'break' statements to test the code in this early stage of development. I load four batches, so 256 images.
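For reference, the mismatch can be reproduced in isolation (the shapes are taken from my error message; I use plain index targets here just to show the check that fails):

import torch
import torch.nn.functional as F

out = torch.randn(4200, 2)                  # my model's output shape
target = torch.zeros(64, dtype=torch.long)  # one label per image in the batch
loss = F.nll_loss(out, target)              # ValueError: Expected input batch_size (4200) to match target batch_size (64).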
I load my images with this method:
import random
import torch
from PIL import Image
from torchvision import transforms

def dataPrep(list_of_data, data_path, category, quantity):
    global train_data
    target_list = []
    train_data_list = []
    transform = transforms.Compose([
        transforms.ToTensor(),
    ])
    len_data = len(train_data)
    print('Len_data: ', len_data)
    for item in list_of_data:
        f = random.choice(list_of_data)
        list_of_data.remove(f)
        print(data_path + f)
        try:
            img = Image.open(data_path + f)
        except:
            continue
        img_crop = img.crop((310, 60, 425, 240))  # 115 x 180 px region of interest
        img_tensor = transform(img_crop)
        print(img_tensor.size())
        train_data_list.append(img_tensor)
        isPseudo = 0
        isTrue = 1
        if category == True:
            target = [isPseudo, isTrue]
        else:
            isPseudo = 1
            isTrue = 0
            target = [isPseudo, isTrue]
        target_list.append(target)
        if len(train_data_list) >= 64:
            # a full batch of 64: stack it and start a new one
            train_data.append((torch.stack(train_data_list), target_list))
            train_data_list = []
            target_list = []
            if (len_data * 64 + quantity) <= len(train_data) * 64:
                break
    print(len(train_data) * 64)
    return list_of_data
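For context, a hypothetical pair of calls (the directory names here are made up for illustration) that produces the four batches of 64 mentioned above might look like:

import os

train_data = []  # global list of (image_batch, target_list) tuples filled by dataPrep

true_files = dataPrep(os.listdir('./data/true/'), './data/true/', True, 128)
pseudo_files = dataPrep(os.listdir('./data/pseudo/'), './data/pseudo/', False, 128)
# afterwards len(train_data) == 4, i.e. four batches of 64 = 256 images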
After loading the images, I create my model and optimizer:

import net
import torch.optim as optim

model = net.Netz()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.8)
My class 'Netz' looks like this:
import torch.nn as nn
import torch.nn.functional as F

class Netz(nn.Module):
    def __init__(self):
        super(Netz, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv_dropout = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 60)
        self.fc2 = nn.Linear(60, 2)

    def forward(self, x):
        x = self.conv1(x)
        x = F.max_pool2d(x, 2)
        x = F.relu(x)
        x = self.conv2(x)
        x = self.conv_dropout(x)
        x = F.max_pool2d(x, 2)
        x = F.relu(x)
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, -1)
Finally, I train my CNN:
import torch
import torch.nn.functional as F
from torch.autograd import Variable

def trainM(epoch):
    model.train()
    batch_id = 0
    for batch_id, (data, target) in enumerate(net.train_data):
        #data = data.cuda()
        #target = target.cuda()
        target = torch.Tensor(target[64 * batch_id:64 * (batch_id + 1)])  # shape ([64, 2])
        data = Variable(data)
        target = Variable(target)
        optimizer.zero_grad()
        out = model(data)
        criterion = F.nll_loss
        print('Size of out:', out.size())
        print('Size of target:', target.size())
        loss = criterion(out, target)
        loss.backward()
        optimizer.step()
        print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
            epoch, batch_id * len(data), len(net.train_data.dataset),
            100 * batch_id / len(net.train_data), loss.item()))
        batch_id += 1
        break

for item in range(0, 10):
    trainM(item)
    break
Solution
The main problem is in Netz, at x = x.view(-1, 320). You have a batch size of 64, 20 channels, and 42 x 25 height and width; reshaping that with (-1, 320) gives you 4200 by 320. I can suggest 3 possible options to preserve the batch size:

1. (What is generally done) Pad the input to become square and update the convolutional part so that its output before the FC layers has a high number of channels and a small height and width, for instance x.shape = (batch_size, 128, 2, 2). Then make fc1 = nn.Linear(512, 60) and, before applying it, do x = x.reshape(x.shape[0], -1). (Here, before applying fc1, you may do a 1x1 convolution.)

2. Make the number of channels at the end of the convolutions 1, i.e. get something like x.shape = (batch_size, 1, 42, 25), then adapt fc1 accordingly.

3. Do x = x.reshape(*x.shape[:2], -1), in other words preserve both the channel and the batch dimension, and add another FC layer fc_e = nn.Linear(20, 1) to compress your channels.
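To see where the 4200 comes from, here is a minimal runnable shape trace (the 180 x 115 input size is taken from the question):

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(64, 1, 180, 115)                      # one batch of grayscale crops
x = F.relu(F.max_pool2d(nn.Conv2d(1, 10, 5)(x), 2))   # -> (64, 10, 88, 55)
x = F.relu(F.max_pool2d(nn.Conv2d(10, 20, 5)(x), 2))  # -> (64, 20, 42, 25)
print(x.shape)                # torch.Size([64, 20, 42, 25])
print(x.view(-1, 320).shape)  # torch.Size([4200, 320]) -- the batch dimension is gone

Option 3 applied to your class looks like this: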
import torch.nn as nn
import torch.nn.functional as F

class Netz(nn.Module):
    def __init__(self):
        super(Netz, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv_dropout = nn.Dropout2d()
        self.fc1 = nn.Linear(1050, 60)  # 1050 = 42 * 25 spatial positions per channel
        self.fc2 = nn.Linear(60, 2)
        self.fce = nn.Linear(20, 1)     # compresses the 20 channels down to 1

    def forward(self, x):
        x = self.conv1(x)
        x = F.max_pool2d(x, 2)
        x = F.relu(x)
        x = self.conv2(x)
        x = self.conv_dropout(x)
        x = F.max_pool2d(x, 2)
        x = F.relu(x)
        x = x.reshape(x.shape[0], x.shape[1], -1)     # (64, 20, 1050): batch and channels preserved
        x = F.relu(self.fc1(x))                       # (64, 20, 60)
        x = self.fc2(x)                               # (64, 20, 2)
        x = self.fce(x.permute(0, 2, 1)).squeeze(-1)  # (64, 2, 20) -> (64, 2, 1) -> (64, 2)
        return F.log_softmax(x, -1)
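A quick sanity check with a dummy batch (shapes taken from the question) confirms that the batch dimension now survives:

import torch

model = Netz()
dummy = torch.randn(64, 1, 180, 115)  # 64 grayscale crops of 180 x 115 px
print(model(dummy).shape)             # torch.Size([64, 2]) -- matches the target batch size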
Keep in mind that there is a trade-off between the amount of information you want represented (should be high) and the number of inputs to the Linear layer (should not be that high). In the end it is a matter of how you choose to tackle the issue. The third option is the closest to your current code, but I recommend working out a model that follows the first approach.
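For what it's worth, here is a minimal sketch of that first approach; the class name NetzSquare and the layer sizes are illustrative assumptions, not part of the original answer. It pads the 115 px width up to 180 so the input is square, then halves the spatial size until the feature map is (batch_size, 128, 2, 2):

import torch
import torch.nn as nn
import torch.nn.functional as F

class NetzSquare(nn.Module):
    """Illustrative sketch of option 1: square input, many channels, tiny spatial map."""
    def __init__(self):
        super(NetzSquare, self).__init__()
        channels = [1, 16, 32, 64, 128, 128, 128]
        self.convs = nn.ModuleList(
            nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)
            for c_in, c_out in zip(channels, channels[1:])
        )
        self.fc1 = nn.Linear(128 * 2 * 2, 60)  # 512 inputs, as suggested above
        self.fc2 = nn.Linear(60, 2)

    def forward(self, x):
        x = F.pad(x, (32, 33, 0, 0))  # pad width 115 -> 180: the input becomes square
        for conv in self.convs:
            x = F.relu(F.max_pool2d(conv(x), 2))  # halve H and W at each block
        x = x.reshape(x.shape[0], -1)             # (batch_size, 512): batch size preserved
        x = F.relu(self.fc1(x))
        return F.log_softmax(self.fc2(x), -1)

# e.g. NetzSquare()(torch.randn(64, 1, 180, 115)).shape -> torch.Size([64, 2])

The six conv + pool blocks take the spatial size from 180 down to 2 (180 -> 90 -> 45 -> 22 -> 11 -> 5 -> 2), so the flattened feature vector has 128 * 2 * 2 = 512 entries regardless of the input batch size.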
Answered By - D. ACAR