Issue
I understand how GRU works, but now I'm confused by the difference between "hidden" and "output" of GRU in PyTorch: is `output` just the hidden states of the GRU, or the hidden states after some transformation? If `output` is just the hidden states, why do we want both `output` and `h_n` as return values of `GRU.forward`, since we can just get `h_n` from the last element of `output`?
Solution
According to the documentation:
- `output` is a tensor of shape (sequence_length, batch_size, num_directions * hidden_size) (or (batch_size, sequence_length, num_directions * hidden_size) with batch_first=True); it holds the last layer's hidden state at every time step
- `h_n` is a tensor of shape (num_directions * num_layers, batch_size, hidden_size); it holds the final hidden state of every layer
The first provides you with the hidden states across the entire sequence, allowing you to use intermediate representations (e.g. for attention) or to train on step-separable tasks (e.g. token-level classification). The latter provides you with just a single summary vector per input sequence, which is handy if you're only interested in a sequence-level representation that doesn't involve token-wise attention.
Note that the two are not redundant in general: `output` only exposes the last layer, while `h_n` contains the final state of every layer, and for a bidirectional GRU the backward direction's final state in `h_n` corresponds to the first time step of `output`, not the last. Only for a single-layer, unidirectional GRU is `h_n` simply the last element of `output`.
There is no implicit hidden-to-output transformation.
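To make the shapes and their relationship concrete, here is a minimal sketch (the sizes are arbitrary, picked only for illustration) that checks both return values and shows that, for a unidirectional GRU, `h_n` duplicates the last step of `output` only for the top layer:

```python
import torch

# Arbitrary sizes, chosen just for illustration.
batch_size, seq_len, input_size, hidden_size, num_layers = 4, 7, 10, 16, 2

gru = torch.nn.GRU(input_size, hidden_size, num_layers, batch_first=True)
x = torch.randn(batch_size, seq_len, input_size)

output, h_n = gru(x)
print(output.shape)  # torch.Size([4, 7, 16]) -> (batch, seq, hidden), last layer only
print(h_n.shape)     # torch.Size([2, 4, 16]) -> (num_layers, batch, hidden), final step only

# For a unidirectional GRU, the last time step of `output`
# equals the final hidden state of the *top* layer:
assert torch.allclose(output[:, -1], h_n[-1])
```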
Answered By - KonstantinosKokos