Friday, March 11, 2022

[FIXED] TensorFlow BinaryCrossentropy loss quickly reaches NaN

Labels: deep-learning, keras, loss-function, nan, tensorflow

Issue

TL;DR: When I retrain my model on new data, the loss reaches NaN quickly, and none of the "standard" solutions work.

Hello,

Recently, I successfully trained a CNN with dense layers to classify spectrograms (image representations of audio). I wanted to train this model again on new data, and I made sure the new data had the correct dimensions, etc.

However, for some reason the BinaryCrossentropy loss steadily declines to around 1.000 and then suddenly becomes NaN within the first epoch. I have tried lowering the learning rate to 1e-8, and I am using ReLU throughout with a sigmoid on the last layer, but nothing seems to work. The problem persists even when I simplify the network to only dense layers. I normalized my data manually, and I am fairly confident that I did it correctly, so that all values fall within [0, 1]. There might be a hole here, but I think that is unlikely.
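A quick way to test that assumption is to scan the arrays for non-finite values and out-of-range entries before training. The following is only a minimal sketch, assuming the spectrograms are held in a NumPy array with one sample per leading index; `report_bad_samples` and the toy `batch` are hypothetical names, not part of the original code.

```python
import numpy as np

def report_bad_samples(x, low=0.0, high=1.0):
    """Return indices of samples with NaN/inf, and of samples outside [low, high]."""
    x = np.asarray(x, dtype=np.float64)
    flat = x.reshape(len(x), -1)  # one row per sample
    non_finite = ~np.isfinite(flat).all(axis=1)
    # Comparisons against NaN evaluate to False, so NaN entries are
    # reported by the non_finite mask rather than the range mask.
    out_of_range = ((flat < low) | (flat > high)).any(axis=1)
    return np.flatnonzero(non_finite), np.flatnonzero(out_of_range)

# Toy batch of three 2x2 "spectrograms": clean, contains NaN, exceeds 1.0
batch = np.array([
    [[0.0, 0.5], [1.0, 0.25]],
    [[0.1, np.nan], [0.2, 0.3]],
    [[0.0, 1.5], [0.2, 0.3]],
])
nan_idx, range_idx = report_bad_samples(batch)
print("non-finite samples:", nan_idx)
print("out-of-range samples:", range_idx)
```

Running a check like this on both the training and validation arrays narrows the search to specific samples, which can then be traced back to the preprocessing step that produced them.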

I attached my code for the model architecture here:

from tensorflow.keras import layers, models, regularizers

input_shape = (125, 128, 1)

model = models.Sequential([
    
    layers.Conv2D(16, (3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.001), input_shape=input_shape),
    layers.MaxPooling2D((3, 3), strides=(2, 2), padding='same'),
    layers.BatchNormalization(),

    layers.Conv2D(16, (3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.MaxPooling2D((2, 2), strides=(1, 1), padding='same'),
    layers.BatchNormalization(),
    
    layers.Conv2D(16, (3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.MaxPooling2D((3, 3), strides=(2, 2), padding='same'),
    layers.BatchNormalization(),
    
    layers.Conv2D(16, (2, 2), activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.MaxPooling2D((2, 2), strides=(1, 1), padding='same'),
    layers.BatchNormalization(),
    
    layers.Conv2D(16, (2, 2), activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.MaxPooling2D((3, 3), strides=(2, 2), padding='same'),
    layers.BatchNormalization(),
    
    layers.Conv2D(16, (2, 2), activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.MaxPooling2D((2, 2), strides=(1, 1), padding='same'),
    layers.BatchNormalization(),
    
    layers.Dropout(0.3),
    
    layers.Flatten(),
    layers.Dense(512, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(256, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(128, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
    
])

Interestingly, when I used this new data to fine-tune a VGG16 model, it worked (the loss never became NaN). I've attached that code here, but I genuinely cannot tell which difference, if any, is causing the problem:

from tensorflow import keras
from tensorflow.keras import regularizers

base_model = keras.applications.VGG16(
    weights="imagenet",
    input_shape=(125, 128, 3),
    include_top=False,
)

# Freeze the base_model
base_model.trainable = False

# Create new model on top
inputs = keras.Input(shape=(125, 128, 3))
x = inputs
x = base_model(x, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
x = keras.layers.Dense(256, activation='relu', kernel_regularizer=regularizers.l2(0.001))(x)
x = keras.layers.Dense(128, activation='relu', kernel_regularizer=regularizers.l2(0.001))(x)
x = keras.layers.Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.001))(x)
x = keras.layers.Dropout(0.5)(x)  # Regularize with dropout
outputs = keras.layers.Dense(1, activation='sigmoid')(x)
model = keras.Model(inputs, outputs)

model.summary()

I think I've been through all of the "book" solutions, and still can't seem to find the source of the problem. Any help would be much appreciated.

Solution

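It turned out to be an issue with some of my input data: a divide-by-zero error during normalization. A division by zero leaves NaN or inf values in the affected samples, and those propagate through the network into the loss. Sorry for all the trouble, and thanks for your help.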
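The post does not say which normalization was used, so the following is only a sketch under the assumption of per-sample min-max scaling. It shows one way to guard the division so that a constant (for example, silent) spectrogram does not produce 0/0. The function name `safe_minmax` and the `eps` threshold are hypothetical choices for illustration.

```python
import numpy as np

def safe_minmax(spec, eps=1e-8):
    """Scale one spectrogram to [0, 1]; constant inputs map to all zeros."""
    spec = np.asarray(spec, dtype=np.float32)
    lo, hi = spec.min(), spec.max()
    span = hi - lo
    if span < eps:
        # Constant input: naive (spec - lo) / span would be 0/0 here.
        return np.zeros_like(spec)
    return (spec - lo) / span

# A silent clip (all zeros) is the case that breaks naive scaling.
silent = np.zeros((125, 128), dtype=np.float32)
normal = np.arange(6, dtype=np.float32).reshape(2, 3)

out_silent = safe_minmax(silent)
out_normal = safe_minmax(normal)
print(np.isfinite(out_silent).all(), out_normal.min(), out_normal.max())
```

Note that this guard only covers the zero-range case. If a sample already contains NaN before normalization, the comparison `span < eps` is false and the NaN passes through, so a finiteness check on the raw data is still worthwhile.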

Answered By - Joseph Yu

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
