Issue
I'm trying to fit an LSTM model to my data with a Masking layer in front, and I get this error:
ValueError: Can not squeeze dim[1], expected a dimension of 1, got 4 for '{{node binary_crossentropy/weighted_loss/Squeeze}} = Squeeze[T=DT_FLOAT, squeeze_dims=[-1]](Cast)' with input shapes: [128,4].
This is my code:
from tensorflow.keras.layers import LSTM, Dense, BatchNormalization, Masking
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Nadam
import numpy as np
if __name__ == '__main__':
    # define stub data
    samples, timesteps, features = 128, 4, 99
    X = np.random.rand(samples, timesteps, features)
    Y = np.random.randint(0, 2, size=(samples))

    # create model
    model = Sequential()
    model.add(Masking(mask_value=0., input_shape=(None, 99)))
    model.add(LSTM(100, return_sequences=True))
    model.add(BatchNormalization())
    model.add(Dense(1, activation='sigmoid'))
    optimizer = Nadam(learning_rate=0.0001)
    loss = BinaryCrossentropy(from_logits=False)
    model.compile(loss=loss, optimizer=optimizer)

    # train model
    model.fit(
        X,
        Y,
        batch_size=128)
I see from this related post that I can't use one-hot encoded labels, but my labels are not one-hot encoded. Also, when I remove the Masking layer, training works.
From my understanding, one sample here consists of 4 timesteps with 99 features each, so the shape of X is (128, 4, 99). I therefore only have to provide one label per sample, giving Y the shape (128,). But it seems like the dimensions of X and/or Y are not correct, as TensorFlow wants to change their dimensions?
I have also tried providing a label per timestep of each sample (Y = np.random.randint(0, 2, size=(samples, timesteps))), with the same result.
Why does adding the masking layer introduce this error? And how can I keep the masking layer without getting the error?
System Information:
- Python version: 3.9.5
- Tensorflow version: 2.5.0
- OS: Windows
Solution
I don't think the problem is the Masking layer. Since you set the parameter return_sequences to True in the LSTM layer, you get a sequence with the same number of timesteps as your input and an output space of 100 for each timestep, hence the shape (128, 4, 100), where 128 is the batch size. Afterwards, you apply a BatchNormalization layer and finally a Dense layer, resulting in the shape (128, 4, 1). The problem is that your labels have a 2D shape (128, 1) while your model has a 3D output due to the return_sequences parameter. So simply setting this parameter to False should solve your problem. See also this post.
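If you want to see where the extra dimension comes from, you can inspect the per-layer output shapes, e.g. with model.summary(). This is just a quick sanity check using the same stub dimensions as in your code; the comments show the shapes the layers produce:

from tensorflow.keras.layers import LSTM, Dense, BatchNormalization, Masking
from tensorflow.keras.models import Sequential

model = Sequential()
model.add(Masking(mask_value=0., input_shape=(None, 99)))   # -> (batch, timesteps, 99)
model.add(LSTM(100, return_sequences=True))                  # -> (batch, timesteps, 100)
model.add(BatchNormalization())                              # -> (batch, timesteps, 100)
model.add(Dense(1, activation='sigmoid'))                    # -> (batch, timesteps, 1)
model.summary()                                              # prints the output shape of each layer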
Here is a working example:
from tensorflow.keras.layers import LSTM, Dense, BatchNormalization, Masking
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Nadam
import numpy as np
if __name__ == '__main__':
    # define stub data
    samples, timesteps, features = 128, 4, 99
    X = np.random.rand(samples, timesteps, features)
    Y = np.random.randint(0, 2, size=(samples))

    # create model
    model = Sequential()
    model.add(Masking(mask_value=0., input_shape=(None, 99)))
    model.add(LSTM(100, return_sequences=False))
    model.add(BatchNormalization())
    model.add(Dense(1, activation='sigmoid'))
    optimizer = Nadam(learning_rate=0.0001)
    loss = BinaryCrossentropy(from_logits=False)
    model.compile(loss=loss, optimizer=optimizer)

    # train model
    model.fit(
        X,
        Y,
        batch_size=128)
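If you actually want one prediction per timestep, as in your second attempt, another option is to keep return_sequences=True and give the labels a matching 3D shape (samples, timesteps, 1). Here is a sketch of that variant, reusing X, samples, timesteps and the imports from above; I have only matched the shapes here, not tuned anything:

# alternative: one label per timestep, keeping the 3D output
Y_seq = np.random.randint(0, 2, size=(samples, timesteps, 1))  # shape (128, 4, 1)

seq_model = Sequential()
seq_model.add(Masking(mask_value=0., input_shape=(None, 99)))
seq_model.add(LSTM(100, return_sequences=True))    # one output vector per timestep
seq_model.add(BatchNormalization())
seq_model.add(Dense(1, activation='sigmoid'))      # -> (128, 4, 1), matches Y_seq
seq_model.compile(loss=BinaryCrossentropy(from_logits=False),
                  optimizer=Nadam(learning_rate=0.0001))
seq_model.fit(X, Y_seq, batch_size=128)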
Answered By - AloneTogether