Issue
I am trying to train a fairly simple 2-layer neural network for a multi-class classification task. I am using `CrossEntropyLoss`, and in my training loop, at the point where the loss is calculated, I get the following error:

ValueError: Expected target size (128, 4), got torch.Size([128])

My last layer is a softmax, so it outputs the probabilities of each of the 4 classes. My target values are a vector of dimension 128 (just the class indices). Am I initializing the `CrossEntropyLoss` object incorrectly?
I looked up existing posts, and this one seemed the most relevant:
https://discuss.pytorch.org/t/valueerror-expected-target-size-128-10000-got-torch-size-128-1/29424. However, if I had to squeeze my target values, how would that work? Right now they are just class values, e.g. [0 3 1 0]. Is that not how they are supposed to look? I would have thought the loss function takes the highest probability from the last layer and associates it with the appropriate class index.
Details:
- This is using PyTorch
- Python version is 3.7
- NN architecture is: embedding -> pool -> h1 -> relu -> h2 -> softmax
- Model Def (EDITED):
```python
self.embedding_layer = create_embedding_layer(embeddings)
self.pool = nn.MaxPool1d(1)
self.h1 = nn.Linear(embedding_dim, embedding_dim)
self.h2 = nn.Linear(embedding_dim, 4)
self.s = nn.Softmax(dim=2)
```

Forward pass:

```python
x = self.embedding_layer(x)  # (batch, seq_length, embedding_dim)
x = self.pool(x)             # kernel size 1: shape unchanged
x = self.h1(x)               # (batch, seq_length, embedding_dim)
x = F.relu(x)
x = self.h2(x)               # (batch, seq_length, 4)
x = self.s(x)                # softmax over dim=2, the class dimension
return x
```
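For reference, a self-contained version of the model that traces the shapes (the sizes are made up for illustration, and `nn.Embedding` stands in for `create_embedding_layer`):

```python
# Minimal shape trace; vocab_size/embedding_dim/seq_length are assumptions,
# and nn.Embedding is a stand-in for create_embedding_layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, embedding_dim, seq_length = 1000, 32, 10

embedding_layer = nn.Embedding(vocab_size, embedding_dim)
pool = nn.MaxPool1d(1)                       # kernel size 1: shape-preserving
h1 = nn.Linear(embedding_dim, embedding_dim)
h2 = nn.Linear(embedding_dim, 4)
s = nn.Softmax(dim=2)

x = torch.randint(0, vocab_size, (128, seq_length))  # (batch, seq_length)
x = embedding_layer(x)                       # (128, 10, 32)
x = pool(x)                                  # (128, 10, 32)
x = F.relu(h1(x))                            # (128, 10, 32)
x = s(h2(x))                                 # (128, 10, 4)
print(x.shape)                               # torch.Size([128, 10, 4])
```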
Solution
The issue is that the output of your model is a tensor shaped (batch, seq_length, n_classes). Each sequence element in each batch is a four-element tensor holding the predicted probability of each class (0, 1, 2, and 3). Your target tensor is shaped (batch,), which is usually the correct shape (you didn't use one-hot encodings). In this case, however, you need to provide a target for each of the sequence elements.
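You can reproduce the reported error with mock tensors (shapes assumed from your description): with a three-dimensional input, `nn.CrossEntropyLoss` reads dimension 1 as the class dimension, so a (128, 10, 4) output is interpreted as 10 classes with d_1 = 4, and a (128,)-shaped target no longer matches:

```python
>>> out = torch.rand(128, 10, 4)     # model output: (batch, seq_length, n_classes)
>>> y = torch.randint(0, 4, (128,))  # targets: (batch,)
>>> nn.CrossEntropyLoss()(out, y)
ValueError: Expected target size (128, 4), got torch.Size([128])
```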
Assuming the target is the same for each element of your sequence (this might not be true, though, and is entirely up to you to decide), you may repeat the targets seq_length times. `nn.CrossEntropyLoss` allows you to provide additional axes, but you have to follow a specific shape layout:
- Input: (N, C) where C = number of classes, or (N, C, d_1, d_2, ..., d_K) with K≥1 in the case of K-dimensional loss.
- Target: (N) where each value satisfies 0 ≤ targets[i] ≤ C−1, or (N, d_1, d_2, ..., d_K) with K≥1 in the case of K-dimensional loss.
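As a quick sanity check (mock tensors again, no extra dimensions), the plain (N, C) case works with a (N,)-shaped target exactly as you expected:

```python
>>> logits = torch.rand(128, 4)            # (N, C)
>>> targets = torch.randint(0, 4, (128,))  # (N,), values in [0, C-1]
>>> nn.CrossEntropyLoss()(logits, targets).shape
torch.Size([])
```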
In your case, C = 4, and seq_length (what you referred to as D) would be d_1.
```python
>>> seq_length = 10
>>> out = torch.rand(128, seq_length, 4)     # mock model output: (N, d_1, C)
>>> y = torch.randint(0, 4, (128,))          # mock targets: class indices in [0, C-1]
>>> criterion = nn.CrossEntropyLoss()
>>> out_perm = out.permute(0, 2, 1)
>>> out_perm.shape
torch.Size([128, 4, 10])                     # (N, C, d_1)
>>> y_rep = y[:, None].repeat(1, seq_length)
>>> y_rep.shape
torch.Size([128, 10])                        # (N, d_1)
```
Then call your loss function with `criterion(out_perm, y_rep)`.
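As a final sketch, the same fix inside a training step (mock tensors once more; one caveat worth noting is that `nn.CrossEntropyLoss` applies log-softmax internally, so it is normally given raw logits, i.e. the output of h2 before the explicit `nn.Softmax`):

```python
# Mock training step; the (128, 10, 4) output and the shared-per-position
# target are assumptions carried over from the example above.
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

out = torch.rand(128, 10, 4, requires_grad=True)     # model output: (N, d_1, C)
y = torch.randint(0, 4, (128,))                      # labels: (N,)

loss = criterion(out.permute(0, 2, 1),               # (N, C, d_1)
                 y[:, None].repeat(1, out.size(1)))  # (N, d_1)
loss.backward()
print(loss.item())
```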
Answered By - Ivan