Issue
The loss doesn’t approach 0 and doesn’t seem to converge; the model consistently fails to predict y. I've tried playing with the initializer, the activation and the layer sizes. Any insight here would be appreciated.
import tensorflow as tf
import keras

activation = 'relu'
initializer = 'he_uniform'

input_layer = tf.keras.layers.Input(shape=(1,), batch_size=1)
dense_layer = keras.layers.Dense(
    32,
    activation=activation,
    kernel_initializer=initializer
)(input_layer)
dense_layer = keras.layers.Dense(
    32,
    activation=activation,
    kernel_initializer=initializer
)(dense_layer)
predictions = keras.layers.Dense(1)(dense_layer)

model = keras.models.Model(inputs=input_layer, outputs=[predictions])
model.summary()

optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)

x = tf.constant([[727.], [1424.], [379], [1777], [51.]])
y = tf.constant([[1.], [1.], [0.], [1.], [0.]])

for item in tf.data.Dataset.from_tensor_slices((x, y)).shuffle(5).repeat():
    with tf.GradientTape() as tape:
        x = item[0]
        output = model(x)
        loss = keras.losses.BinaryCrossentropy(
            from_logits=True
        )(item[1], output)
        # loss = item[1] - output[0]
    dy_dx = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(dy_dx, model.trainable_weights))
    print("batch", item[0], "x", "output", output, "expected", item[1], "gradient", dy_dx[-1])
    print("loss", loss)
Solution
Your input numbers are huge, which leads to numerical issues, and you are not batching your inputs, so each single-sample "batch" produces a very large gradient (again, due to the large input values), possibly in a different direction each time. It works fine when I
- Add .batch(5) to the dataset definition (in fact, it can simply replace shuffle, because every batch then contains the full dataset) to improve the gradient estimates,
- Divide the inputs by 1000 to get them into a more reasonable range,
- After that, increase the learning rate (something as high as 0.1 works fine) to speed up training significantly.
This should converge very quickly.
import tensorflow as tf
import keras

activation = 'relu'
initializer = 'he_uniform'

input_layer = tf.keras.layers.Input(shape=(1,))
dense_layer = keras.layers.Dense(
    32,
    activation=activation,
    kernel_initializer=initializer
)(input_layer)
dense_layer = keras.layers.Dense(
    32,
    activation=activation,
    kernel_initializer=initializer
)(dense_layer)
predictions = keras.layers.Dense(1)(dense_layer)  # linear output: these are the logits for the loss below

model = keras.models.Model(inputs=input_layer, outputs=[predictions])
model.summary()

optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)

# Inputs divided by 1000 to bring them into a reasonable range.
x = tf.constant([[727.], [1424.], [379.], [1777.], [51.]]) / 1000.
y = tf.constant([[1.], [1.], [0.], [1.], [0.]])

# batch(5) replaces shuffle: every batch contains the full dataset.
for step, item in enumerate(tf.data.Dataset.from_tensor_slices((x, y)).batch(5).repeat()):
    with tf.GradientTape() as tape:
        x = item[0]
        output = model(x)
        loss = keras.losses.BinaryCrossentropy(
            from_logits=True
        )(item[1], output)
    dy_dx = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(dy_dx, model.trainable_weights))
    if not step % 100:
        print("batch", item[0], "x", "output", tf.nn.sigmoid(output), "expected", item[1], "gradient", dy_dx[-1])
        print("loss", loss)
And a note: using no activation function on the output layer together with binary cross-entropy "from logits" is correct, so ignore people telling you otherwise.
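A quick check of that point (my addition, not from the original answer): a linear output evaluated with from_logits=True gives the same loss as passing the output through a sigmoid and using from_logits=False, just computed more stably:

import tensorflow as tf
import keras

logits = tf.constant([[2.0], [-1.0], [0.5]])   # raw linear outputs of the model
labels = tf.constant([[1.0], [0.0], [1.0]])

# Loss computed directly on the logits.
loss_logits = keras.losses.BinaryCrossentropy(from_logits=True)(labels, logits)

# Equivalent: apply the sigmoid explicitly, then treat the outputs as probabilities.
loss_probs = keras.losses.BinaryCrossentropy(from_logits=False)(labels, tf.nn.sigmoid(logits))

print(float(loss_logits), float(loss_probs))  # essentially identical values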
Answered By - xdurch0