Issue
I am trying to train a fairly simple 2-layer neural network for a multi-class classification task. I am using `CrossEntropyLoss`, and in my training loop, at the point where the loss is calculated, I get the following error:

ValueError: Expected target size (128, 4), got torch.Size([128])

My last layer is a softmax, so it outputs the probabilities of each of the 4 classes. My target values are a vector of dimension 128 (just the class indices). Am I initializing the `CrossEntropyLoss` object incorrectly?
I looked up existing posts, and this one seemed the most relevant:
https://discuss.pytorch.org/t/valueerror-expected-target-size-128-10000-got-torch-size-128-1/29424. However, if I had to squeeze my target values, how would that work? Right now they are just class values, e.g. [0 3 1 0]. Is that not how they are supposed to look? I would have thought the loss function takes the highest probability from the last layer and associates it with the appropriate class index.
Details:
- This is using PyTorch
- Python version is 3.7
- NN architecture is: embedding -> pool -> h1 -> relu -> h2 -> softmax
- Model Def (EDITED):
```python
self.embedding_layer = create_embedding_layer(embeddings)
self.pool = nn.MaxPool1d(1)
self.h1 = nn.Linear(embedding_dim, embedding_dim)
self.h2 = nn.Linear(embedding_dim, 4)
self.s = nn.Softmax(dim=2)
```

Forward pass:

```python
x = self.embedding_layer(x)  # (batch, seq_length, embedding_dim)
x = self.pool(x)             # kernel size 1: shape unchanged
x = self.h1(x)               # (batch, seq_length, embedding_dim)
x = F.relu(x)
x = self.h2(x)               # (batch, seq_length, 4)
x = self.s(x)                # softmax over dim=2, the class dimension
return x
```
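For reference, a self-contained version of the model that traces the shapes (the sizes are made up for illustration, and `nn.Embedding` stands in for `create_embedding_layer`):

```python
# Minimal shape trace; vocab_size/embedding_dim/seq_length are assumptions,
# and nn.Embedding is a stand-in for create_embedding_layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, embedding_dim, seq_length = 1000, 32, 10

embedding_layer = nn.Embedding(vocab_size, embedding_dim)
pool = nn.MaxPool1d(1)                       # kernel size 1: shape-preserving
h1 = nn.Linear(embedding_dim, embedding_dim)
h2 = nn.Linear(embedding_dim, 4)
s = nn.Softmax(dim=2)

x = torch.randint(0, vocab_size, (128, seq_length))  # (batch, seq_length)
x = embedding_layer(x)                       # (128, 10, 32)
x = pool(x)                                  # (128, 10, 32)
x = F.relu(h1(x))                            # (128, 10, 32)
x = s(h2(x))                                 # (128, 10, 4)
print(x.shape)                               # torch.Size([128, 10, 4])
```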
Solution
The issue is that the output of your model is a tensor shaped (batch, seq_length, n_classes). Each sequence element in each batch is a four-element tensor holding the predicted probability of each class (0, 1, 2, and 3). Your target tensor is shaped (batch,), which is usually the correct shape (you didn't use one-hot encodings). In this case, however, you need to provide a target for each of the sequence elements.
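You can reproduce the reported error with mock tensors (shapes assumed from your description): with a three-dimensional input, `nn.CrossEntropyLoss` reads dimension 1 as the class dimension, so a (128, 10, 4) output is interpreted as 10 classes with d_1 = 4, and a (128,)-shaped target no longer matches:

```python
>>> out = torch.rand(128, 10, 4)     # model output: (batch, seq_length, n_classes)
>>> y = torch.randint(0, 4, (128,))  # targets: (batch,)
>>> nn.CrossEntropyLoss()(out, y)
ValueError: Expected target size (128, 4), got torch.Size([128])
```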
Assuming the target is the same for each element of your sequence (this might not be true, though, and is entirely up to you to decide), you may repeat the targets seq_length times. `nn.CrossEntropyLoss` allows you to provide additional axes, but you have to follow a specific shape layout:
- Input: (N, C) where C = number of classes, or (N, C, d_1, d_2, ..., d_K) with K≥1 in the case of K-dimensional loss.
- Target: (N) where each value satisfies 0 ≤ targets[i] ≤ C−1, or (N, d_1, d_2, ..., d_K) with K≥1 in the case of K-dimensional loss.
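As a quick sanity check (mock tensors again, no extra dimensions), the plain (N, C) case works with a (N,)-shaped target exactly as you expected:

```python
>>> logits = torch.rand(128, 4)            # (N, C)
>>> targets = torch.randint(0, 4, (128,))  # (N,), values in [0, C-1]
>>> nn.CrossEntropyLoss()(logits, targets).shape
torch.Size([])
```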
In your case, C = 4, and seq_length (what you referred to as D) would be d_1.
```python
>>> seq_length = 10
>>> out = torch.rand(128, seq_length, 4)     # mock model output: (N, d_1, C)
>>> y = torch.randint(0, 4, (128,))          # mock targets: class indices in [0, C-1]
>>> criterion = nn.CrossEntropyLoss()
>>> out_perm = out.permute(0, 2, 1)
>>> out_perm.shape
torch.Size([128, 4, 10])                     # (N, C, d_1)
>>> y_rep = y[:, None].repeat(1, seq_length)
>>> y_rep.shape
torch.Size([128, 10])                        # (N, d_1)
```
Then call your loss function with `criterion(out_perm, y_rep)`.
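As a final sketch, the same fix inside a training step (mock tensors once more; one caveat worth noting is that `nn.CrossEntropyLoss` applies log-softmax internally, so it is normally given raw logits, i.e. the output of h2 before the explicit `nn.Softmax`):

```python
# Mock training step; the (128, 10, 4) output and the shared-per-position
# target are assumptions carried over from the example above.
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

out = torch.rand(128, 10, 4, requires_grad=True)     # model output: (N, d_1, C)
y = torch.randint(0, 4, (128,))                      # labels: (N,)

loss = criterion(out.permute(0, 2, 1),               # (N, C, d_1)
                 y[:, None].repeat(1, out.size(1)))  # (N, d_1)
loss.backward()
print(loss.item())
```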
Answered By - Ivan