Issue
I am trying to build a next-word prediction model with an LSTM and a Mixture Density Network, based on this implementation (https://www.katnoria.com/mdn/).
Input: a window of 5 word vectors (300 dimensions each) and a 21-dimensional array (c) representing the topic distribution of the document, which is used to learn the initial hidden states.
Output: mixing coefficients (num_gaussians), variances (num_gaussians), and means (num_gaussians * 300, the vector size).
With an experimental set of 161 observations, x.shape, y.shape and c.shape are:
(TensorShape([161, 5, 300]), TensorShape([161, 300]), TensorShape([161, 21]))
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, LSTM, Lambda
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.math import exp
# n_feat is size of word vector
n_feat = 300
window = 5
l = (window, n_feat)
hidden_state_dim = 21
# Number of gaussians to represent the multimodal distribution
k = 26
# Initial
mlp_inp = Input(shape=(hidden_state_dim,))
mlp_dense_h = Dense(128, activation='relu', name="dense_h")(mlp_inp)
mlp_dense_c = Dense(128, activation='relu', name="dense_c")(mlp_inp)
# Network
input = Input(shape=l)
layer1 = LSTM(128, return_sequences=True, name='baselayer1')(input, initial_state=[mlp_dense_h, mlp_dense_c])
layer2 = LSTM(128, name='baselayer2')(layer1)
# Mean
mu = Dense((n_feat * k), activation=None, name='mean_layer')(layer2)
# variance (should be greater than 0 so we exponentiate it)
var_layer = Dense(k, activation=None, name='dense_var_layer')(layer2)
var = Lambda(lambda x: exp(x), output_shape=(k,), name='variance_layer')(var_layer)
# mixing coefficient should sum to 1.0
pi = Dense(k, activation='softmax', name='pi_layer')(layer2)
Below is the .summary() of my model
Model: "model_12"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_7 (InputLayer)            [(None, 21)]         0
__________________________________________________________________________________________________
input_8 (InputLayer)            [(None, 5, 300)]     0
__________________________________________________________________________________________________
dense_h (Dense)                 (None, 128)          2816        input_7[0][0]
__________________________________________________________________________________________________
dense_c (Dense)                 (None, 128)          2816        input_7[0][0]
__________________________________________________________________________________________________
baselayer1 (LSTM)               (None, 5, 128)       219648      input_8[0][0]
                                                                 dense_h[0][0]
                                                                 dense_c[0][0]
__________________________________________________________________________________________________
baselayer2 (LSTM)               (None, 128)          131584      baselayer1[0][0]
__________________________________________________________________________________________________
dense_var_layer (Dense)         (None, 26)           3354        baselayer2[0][0]
__________________________________________________________________________________________________
pi_layer (Dense)                (None, 26)           3354        baselayer2[0][0]
__________________________________________________________________________________________________
mean_layer (Dense)              (None, 7800)         1006200     baselayer2[0][0]
__________________________________________________________________________________________________
variance_layer (Lambda)         (None, 26)           0           dense_var_layer[0][0]
==================================================================================================
Total params: 1,369,772
Trainable params: 1,369,772
Non-trainable params: 0
__________________________________________________________________________________________________
However, when I try to run the training process, I get the following error:
ValueError: in user code:
<ipython-input-70-084e2be19035>:7 train_step *
loss = mdn_loss(y, pi_, mu_, var_)
<ipython-input-67-9a3cf3d4ccd2>:18 mdn_loss *
out = calc_pdf(y_true, mu, var)
<ipython-input-67-9a3cf3d4ccd2>:6 calc_pdf *
value = tf.subtract(y, mu)**2
.....
ValueError: Dimensions must be equal, but are 300 and 7800 for '{{node Sub}} = Sub[T=DT_FLOAT](y, model_15/mean_layer/BiasAdd)' with input shapes: [161,300], [161,7800].
It tells me that the dimensions of the tensors passed to tf.subtract() in calc_pdf() do not match,
# Take a note how easy it is to write the loss function in
# new tensorflow eager mode (debugging the function becomes intuitive too)
def calc_pdf(y, mu, var):
    """Calculate component density"""
    value = tf.subtract(y, mu)**2
    value = (1/tf.math.sqrt(2 * np.pi * var)) * tf.math.exp((-1/(2*var)) * value)
    return value

def mdn_loss(y_true, pi, mu, var):
    """MDN Loss Function
    The eager mode in tensorflow 2.0 makes it extremely easy to write
    functions like these. It feels a lot more pythonic to me.
    """
    out = calc_pdf(y_true, mu, var)
    # multiply with each pi and sum it
    out = tf.multiply(out, pi)
    out = tf.reduce_sum(out, 1, keepdims=True)
    out = -tf.math.log(out + 1e-10)
    return tf.reduce_mean(out)
but I don't understand how to fix it. In the original implementation (linked above), which used 4000 observations, 1 feature and 26 Gaussians, the tensors passed to this function had shapes [4000, 1] and [4000, 26], and it worked fine. I expected it to work with [161, 300] and [161, 7800] as well, but it does not.
How can I fix this?
(I've checked similar questions regarding "dimension must be equal" but could not figure out how I could make this work for this particular implementation.)
I can post additional info or code if this is not enough. I would really appreciate an answer!
Solution
For an MDN model, the likelihood of each sample has to be calculated under the pdf of every Gaussian. To do that, I think you have to reshape your tensors (y_true and mu) and take advantage of broadcasting by adding a dimension of size 1 as the last dimension of y_true, e.g.:
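def calc_pdf(y, mu, var):
    """Calculate component density"""
    y = tf.reshape(y, (-1, 300, 1))      # (batch, 300, 1)
    mu = tf.reshape(mu, (-1, 300, 26))   # (batch, 300, 26)
    var = tf.reshape(var, (-1, 1, 26))   # (batch, 1, 26), so that it broadcasts too
    value = tf.subtract(y, mu)**2        # (batch, 300, 26)
    value = (1/tf.math.sqrt(2 * np.pi * var)) * tf.math.exp((-1/(2*var)) * value)
    return value  # per-feature densities; mdn_loss still has to reduce over axis 1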
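To check the shapes without building the model, here is a minimal NumPy sketch (NumPy follows the same broadcasting rules as TensorFlow). It is not code from the linked tutorial: the helper name mdn_nll is made up for illustration, and the data is random. Summing the per-feature log-densities assumes each Gaussian is isotropic with independent features, which is my reading of having one variance per Gaussian. Working in log space is also my own choice, since a product of 300 densities can easily underflow.

```python
import numpy as np


def mdn_nll(y, mu_flat, var, pi):
    """Mixture negative log-likelihood (hypothetical helper, isotropic Gaussians assumed)."""
    n, d = y.shape
    k = pi.shape[1]
    y3 = y.reshape(n, d, 1)          # trailing axis of size 1 broadcasts over the k Gaussians
    mu3 = mu_flat.reshape(n, d, k)   # split the flat means into (features, Gaussians)
    var3 = var.reshape(n, 1, k)      # one variance per Gaussian, shared by all features
    # Per-feature log-density of every Gaussian, shape (n, d, k)
    log_pdf = -0.5 * (np.log(2 * np.pi * var3) + (y3 - mu3) ** 2 / var3)
    log_comp = log_pdf.sum(axis=1)   # assumption: independent features, shape (n, k)
    # log(sum_k pi_k * N_k) computed with the log-sum-exp trick
    a = np.log(pi) + log_comp
    m = a.max(axis=1, keepdims=True)
    return -np.mean(m[:, 0] + np.log(np.exp(a - m).sum(axis=1)))


n, d, k = 161, 300, 26               # sizes taken from the question
rng = np.random.default_rng(0)
y = rng.normal(size=(n, d))                # stands in for y_true
mu_flat = rng.normal(size=(n, d * k))      # stands in for the mean_layer output
var = np.exp(rng.normal(size=(n, k)))      # stands in for the variance_layer output
logits = rng.normal(size=(n, k))
pi = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax

# The 1-feature case works because an axis of size 1 broadcasts against 26.
print((np.zeros((4000, 1)) - np.zeros((4000, 26))).shape)

# 300 against 7800 cannot broadcast, which mirrors the ValueError in the question.
try:
    y - mu_flat
except ValueError as err:
    print("broadcast failed:", err)

# After the reshape the subtraction broadcasts as intended.
sq = (y.reshape(n, d, 1) - mu_flat.reshape(n, d, k)) ** 2
loss = mdn_nll(y, mu_flat, var, pi)
print(sq.shape, loss)
```

In TensorFlow the same reductions could be written with tf.reduce_sum and tf.reduce_logsumexp.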
Answered By - Tou You