Issue
I'm seeing some inconsistencies in the output of an encoder I got from a GitHub repository.
The encoder looks as follows:
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


class Encoder(nn.Module):
    r"""Applies a multi-layer LSTM to a variable-length input sequence."""

    def __init__(self, input_size, hidden_size, num_layers,
                 dropout=0.0, bidirectional=True, rnn_type='lstm'):
        super(Encoder, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.bidirectional = bidirectional
        self.rnn_type = rnn_type
        self.dropout = dropout
        if self.rnn_type == 'lstm':
            self.rnn = nn.LSTM(input_size, hidden_size, num_layers,
                               batch_first=True,
                               dropout=dropout,
                               bidirectional=bidirectional)

    def forward(self, padded_input, input_lengths):
        """
        Args:
            padded_input: N x T x D
            input_lengths: N
        Returns: output, hidden
            - **output**: N x T x H
            - **hidden**: (num_layers * num_directions) x N x H
        """
        total_length = padded_input.size(1)  # get the max sequence length
        packed_input = pack_padded_sequence(padded_input, input_lengths,
                                            batch_first=True, enforce_sorted=False)
        packed_output, hidden = self.rnn(packed_input)
        # pdb.set_trace()
        output, _ = pad_packed_sequence(packed_output, batch_first=True,
                                        total_length=total_length)
        return output, hidden
So it only consists of an LSTM RNN. If I print the encoder, this is the output:
LSTM(40, 512, num_layers=8, batch_first=True, bidirectional=True)
So it should have a 512-sized output, right? But when I feed it a tensor of size torch.Size([16, 1025, 40]), i.e. 16 samples of 1025 vectors of size 40 (which gets packed to fit the RNN), the output I get from the RNN has a new encoded size of 1024, torch.Size([16, 1025, 1024]), when it should have been encoded to 512, right?
Is there something I'm missing?
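For reference, here is a minimal sketch of how I'm calling the encoder (the random input and the lengths tensor are placeholders; only the shapes match my real data):

import torch

encoder = Encoder(input_size=40, hidden_size=512, num_layers=8)

padded_input = torch.randn(16, 1025, 40)                   # N x T x D
input_lengths = torch.full((16,), 1025, dtype=torch.long)  # every sequence at max length here

output, hidden = encoder(padded_input, input_lengths)
print(output.shape)  # torch.Size([16, 1025, 1024]) instead of the expected [16, 1025, 512]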
Solution
Setting bidirectional=True makes the LSTM bidirectional, which means there will be two LSTMs, one that goes from left to right and the other that goes from right to left.
From the nn.LSTM documentation - Outputs:
output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the LSTM, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output.view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.
Your output has the size [batch, seq_len, 2 * hidden_size] (batch and seq_len are swapped in your case because you set batch_first=True) because you are using a bidirectional LSTM. The outputs of the two directions are concatenated so that you have the information from both, and you can easily separate them if you want to treat them differently.
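If you do want the two directions separately, you can reshape the concatenated output along the last dimension, as the documentation quoted above describes. A minimal sketch for the batch_first=True case (variable names are just illustrative):

# output: N x T x (2 * hidden_size), as returned by the bidirectional encoder
N, T, _ = output.shape
hidden_size = 512

separated = output.view(N, T, 2, hidden_size)  # N x T x num_directions x hidden_size
forward_out = separated[..., 0, :]             # direction 0: left to right, N x T x hidden_size
backward_out = separated[..., 1, :]            # direction 1: right to left, N x T x hidden_size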
Answered By - Michael Jungo