Issue
I'm training a transformer model for text generation.
Let's assume:
vocab size = 100
embedding size = 50
max sequence length = 30
batch size = 32
loss = cross-entropy loss
The last layer in the model is a fully connected layer, mapping from shape [30, 32, 50] to [30, 32, 100].
The idea is that for each of the 30 positions along the first (sequence) dimension, I have a target vector I want to calculate the loss against. The issue is that, based on the docs, this loss only accepts two dimensions on the prediction and one on the target, so how can I fit my 3D prediction (and 2D target) into it?
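For reference, a minimal sketch of the setup (random tensors stand in for the actual model activations; the commented-out call is the one that fails):

import torch
import torch.nn as nn

vocab_size, embed_size, seq_len, batch_size = 100, 50, 30, 32

fc = nn.Linear(embed_size, vocab_size)                  # final fully connected layer
hidden = torch.randn(seq_len, batch_size, embed_size)   # [30, 32, 50]
logits = fc(hidden)                                      # [30, 32, 100]
targets = torch.randint(0, vocab_size, (seq_len, batch_size))  # [30, 32]

loss_fn = nn.CrossEntropyLoss()
# loss_fn(logits, targets)  # fails: the class dimension is not where the loss expects it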
Solution
Use torch.nn.BCELoss() (binary cross-entropy) instead. It expects the input and target to be the same size, but they can be any size, and the values should fall within the range [0, 1]. It computes binary cross-entropy loss element-wise.
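For illustration, a minimal sketch of that element-wise usage (the input must already be probabilities, e.g. after a sigmoid, and the target must have the same shape, with values in [0, 1]):

import torch
import torch.nn as nn

probs = torch.sigmoid(torch.randn(30, 32, 100))       # predictions squashed into [0, 1]
targets = torch.randint(0, 2, (30, 32, 100)).float()  # same shape as the input

loss = nn.BCELoss()(probs, targets)                   # element-wise binary cross-entropy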
EDIT: if you expect only one element from the vocab to be output, then you should use CrossEntropyLoss instead, and encode your labels as a 1D vector of class indices rather than a 2D one-hot vector (i.e., decode the one-hot encoding). BCE treats each element in the output for a single example as independent of the others, which is not a valid assumption for a multi-class problem. I originally misread the question and thought the final output was an embedding rather than an element from the vocabulary, hence my original suggestion.
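A minimal sketch of that corrected approach, assuming the targets are token indices of shape [30, 32]: flatten the sequence and batch dimensions so the logits become [N, C] and the targets [N].

import torch
import torch.nn as nn

vocab_size = 100
logits = torch.randn(30, 32, vocab_size)           # model output: [seq, batch, vocab]
targets = torch.randint(0, vocab_size, (30, 32))   # class indices, not one-hot

loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(logits.reshape(-1, vocab_size),     # [960, 100]
               targets.reshape(-1))                # [960]

If the labels are currently stored one-hot, targets = one_hot.argmax(dim=-1) recovers the index form.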
Answered By - DerekG