Issue
I am designing a machine learning model that takes a feature tensor from ResNet and uses an LSTM to identify the sequence of letters in an image. The feature tensor from ResNet is 4-D, but LSTMCell expects 2-D input. I know about methods such as .view() and .squeeze() that can reduce the number of dimensions, but using them changes the sizes of the remaining dimensions. The feature tensor starts as [128, 2, 5, 512] and needs to become [128, 512]; however, calling .view(-1, 512) multiplies the leading dimensions together, giving [1280, 512]. How can I reduce the number of dimensions without multiplying them?
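A minimal reproduction of the reshaping behavior described above (the tensor is random; only the shapes matter):

```python
import torch

# Illustrative 4-D feature tensor with the shapes from the question
feats = torch.randn(128, 2, 5, 512)

# .view(-1, 512) collapses the leading axes: 128 * 2 * 5 = 1280
flat = feats.view(-1, 512)
print(flat.shape)  # torch.Size([1280, 512])
```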
Solution
The output of the CNN should be a 3-D tensor (e.g. [128, x, 512]) so that it can be treated as a sequence. You can then feed it into nn.LSTMCell() with a for loop over the x time steps.

A 4-D tensor, however, still retains spatial dimensions and is not appropriate to feed into an LSTM. A typical practice is to redesign your CNN architecture so that it produces a 3-D tensor directly, for example by adding an nn.Conv2d() (or another layer) at the end of the CNN to make the output shape [128, x, 512].
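A minimal sketch of the suggested approach, assuming the 2×5 spatial grid can be read off as a length-10 sequence (the batch size, sequence length, and hidden size are illustrative, not prescribed by the answer):

```python
import torch
import torch.nn as nn

batch, channels, hidden = 128, 512, 512

# Illustrative 4-D CNN output shaped like the question's tensor,
# with the channel axis last: [128, 2, 5, 512]
feats = torch.randn(batch, 2, 5, channels)

# Merge the two spatial axes into one sequence axis: [128, 10, 512]
seq = feats.view(batch, -1, channels)

# Step nn.LSTMCell over the sequence with an explicit for loop
cell = nn.LSTMCell(input_size=channels, hidden_size=hidden)
h = torch.zeros(batch, hidden)
c = torch.zeros(batch, hidden)

outputs = []
for t in range(seq.size(1)):
    # Each step receives a 2-D slice of shape [128, 512]
    h, c = cell(seq[:, t, :], (h, c))
    outputs.append(h)

out = torch.stack(outputs, dim=1)  # [128, 10, 512]
print(out.shape)
```

Each LSTMCell step consumes a 2-D [128, 512] slice, which is exactly the shape the question asked for, without discarding the spatial information.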
Answered By - Depressant