I am building a multi-class Vision Transformer network. When I pass my values through my loss function, it always returns zero. My output layer consists of 37 Dense layers with a softmax unit on each of them. criterion is created with nn.CrossEntropyLoss(), and its output is 0.0 for every iteration. I am using a Colab notebook. The loss is accumulated like this:
for output, label in zip(iter(outputs_t), iter(labels_t)):
    loss += criterion(
        output,
        # reshape label from (Batch_Size) to (Batch_Size, 1)
        torch.reshape(label, (label.shape[0], 1)),
    )
I printed out the output and label for one iteration:
output: tensor([[0.1534],
        [0.7588]], device='cuda:0', grad_fn=<UnbindBackward0>)
label: tensor([[0.],
        [0.]], device='cuda:0')
My model:
import timm
import torch
import torch.nn as nn
import torch.nn.functional as F

class vit_large_patch16_224_multiTaskNet(nn.Module):
    def __init__(self, output_classes, frozen_feature_layers=False):
        super().__init__()
        # num_classes=0 drops timm's classification head; the backbone is kept whole
        # so that forward_features() still adds the CLS token and position embeddings
        # (an nn.Sequential over children() would silently skip both)
        self.features = timm.create_model('vit_large_patch16_224', pretrained=True, num_classes=0)
        self.is_frozen = frozen_feature_layers
        if frozen_feature_layers:
            self.freeze_feature_layers()
        # now let's add our new layers
        in_features = self.features.num_features
        # create more layers, play/experiment with them
        self.fc0 = nn.Linear(in_features, 512)
        self.bn_pu = nn.BatchNorm1d(512, eps=1e-5)
        self.output_modules = nn.ModuleList()
        for i in range(output_classes):
            # one raw logit per class (no softmax is applied in forward)
            self.output_modules.append(nn.Linear(512, 1))
        # initialize only the new fc layers to xavier; looping over self.modules()
        # would also overwrite the pretrained Linear layers inside the backbone
        for m in [self.fc0, *self.output_modules]:
            torch.nn.init.xavier_normal_(m.weight, gain=1)

    def forward(self, input_imgs):
        # shape (B, 1 + num_patches, in_features); index 0 is the CLS token
        output = self.features.forward_features(input_imgs)
        final_cls_token = output[:, 0]
        output = self.bn_pu(F.relu(self.fc0(final_cls_token)))
        output_list = list()
        for output_modul in self.output_modules:
            output_list.append(output_modul(output))
        # Convert list to tensor: (num_classes, B, 1) -> (B, num_classes, 1)
        output_tensor = torch.stack(output_list)
        output_tensor = torch.swapaxes(output_tensor, 0, 1)
        return output_tensor

    def _set_freeze_(self, status):
        for n, p in self.features.named_parameters():
            p.requires_grad = status
        # for m in self.features.children():
        #     for p in m.parameters():
        #         p.requires_grad = status

    def freeze_feature_layers(self):
        self._set_freeze_(False)
        self.is_frozen = True

    def unfreeze_feature_layers(self):
        self._set_freeze_(True)
        self.is_frozen = False
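A quick way to check what the head actually returns, without downloading the pretrained backbone, is to run the same head layers on random stand-in CLS features. This is a sketch: the batch size is hypothetical, and 1024 is assumed as the embedding width of ViT-Large.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes: batch of 2, ViT-Large embedding width 1024, 37 output heads
batch_size, in_features, num_classes = 2, 1024, 37

fc0 = nn.Linear(in_features, 512)
bn_pu = nn.BatchNorm1d(512, eps=1e-5)
output_modules = nn.ModuleList([nn.Linear(512, 1) for _ in range(num_classes)])

# Random stand-in for the CLS token features the backbone would produce
cls_token = torch.randn(batch_size, in_features)
hidden = bn_pu(F.relu(fc0(cls_token)))

# Each head yields (B, 1); stacking gives (num_classes, B, 1), swapping gives (B, num_classes, 1)
output_tensor = torch.swapaxes(torch.stack([head(hidden) for head in output_modules]), 0, 1)
print(output_tensor.shape)  # torch.Size([2, 37, 1])
```

Each of the 37 heads contributes a single column, so what reaches the loss function is one raw logit per class and per sample, not a probability distribution over classes.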
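Your model has one output unit per class, so you can consider your problem as c binary classification tasks done in parallel (where c is the total number of classes). This is also why nn.CrossEntropyLoss always returns 0.0 here: it applies a softmax over the class dimension, and each of your per-class outputs has only a single column, so the softmax is always 1, its log is always 0, and the loss is 0 whatever the label.
Let output_t be the logit tensor containing the values output by your model's last linear layer, and target the ground-truth tensor containing the true class states for each instance in the batch. You can then apply nn.BCEWithLogitsLoss, which applies the sigmoid internally (so feed it raw logits) and works with multi-dimensional tensors out of the box: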
With dummy inputs:
>>> output_t = torch.rand(47, 32, 1)
>>> target = torch.randint(0, 2, (47, 32, 1)).float()
Then initializing and calling the loss function:
>>> loss = nn.BCEWithLogitsLoss()
>>> loss(output_t, target)
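For comparison, here is a sketch of both losses on tensors shaped like this model's output. The sizes and random data are hypothetical, and it assumes PyTorch 1.10 or later, where nn.CrossEntropyLoss reads a floating-point target with the same shape as the input as class probabilities (which is what the reshaped (Batch_Size, 1) labels become).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: logits shaped like the model output (B, num_classes, 1)
# and 0/1 labels shaped (B, num_classes)
batch_size, num_classes = 4, 37
outputs_t = torch.randn(batch_size, num_classes, 1)
labels_t = torch.randint(0, 2, (batch_size, num_classes)).float()

# Original approach: one CrossEntropyLoss call per class head.
# Each call sees a single logit column, so every term is 0.
ce = nn.CrossEntropyLoss()
ce_loss = sum(
    ce(outputs_t[:, i], labels_t[:, i].reshape(-1, 1)) for i in range(num_classes)
)

# Suggested approach: one BCEWithLogitsLoss call over all heads at once.
# squeeze(-1) turns (B, num_classes, 1) into (B, num_classes) to match the labels.
bce = nn.BCEWithLogitsLoss()
bce_loss = bce(outputs_t.squeeze(-1), labels_t)

print("CrossEntropyLoss:", ce_loss.item())
print("BCEWithLogitsLoss:", bce_loss.item())
```

Because the model already returns one logit per class, a single BCEWithLogitsLoss call on the squeezed output can replace the per-class loop entirely.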
