Sunday, July 31, 2022

[FIXED] Model converges to 0.5 during transfer learning

Tags: keras, pandas, python, tensorflow

Issue

When I retrain a binary classification model that has already been trained once, it always converges to predicting 0.5. After the first epoch of retraining, the model already outputs almost the same value (about 0.68) for every input, and the predictions then drift slowly towards 0.5.

The model is loaded with keras.models.load_model("oldModel.h5") and then cloned with keras.models.clone_model(model), so that the old model stays available for use.

The dataset is loaded in exactly the same way for retraining as it was in the initial training run: (1) the filenames and labels are loaded into pandas Series, (2) these are converted to tensors, and (3) the images are loaded through a partial call, with caching and prefetching to help training performance.

Apart from a lower learning rate, all other settings, such as the loss function, are kept the same across both training runs. The trainable layers are also the same and consist of only the top few layers of the model.
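For readers following along, here is a minimal sketch of that retraining setup: freeze everything except the top layers, then recompile with a smaller learning rate. The small Sequential model, the layer names base_1 and base_2, the choice of one trainable layer, and the learning rate of 1e-5 are all assumptions for illustration; the question does not give the real architecture or values.

```python
import tensorflow as tf

# Hypothetical stand-in for the loaded model; the real one comes from load_model.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu", name="base_1"),
    tf.keras.layers.Dense(16, activation="relu", name="base_2"),
    tf.keras.layers.Dense(1, activation="sigmoid", name="classification"),
])

# Freeze everything except the top n_trainable layers (1 is an assumed value).
n_trainable = 1
for layer in model.layers[:-n_trainable]:
    layer.trainable = False
for layer in model.layers[-n_trainable:]:
    layer.trainable = True

# Recompile after changing the trainable flags, with a reduced learning rate
# (1e-5 is an assumed value, not the one used in the question).
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=["accuracy"],
)

trainable_names = [layer.name for layer in model.layers if layer.trainable]
print(trainable_names)
```

Listing the trainable layer names before calling fit is a cheap way to confirm that both training runs really update the same layers.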

I have inspected the training dataset to confirm that what it yields is correct, and it is as expected.

Code to load dataset

from functools import partial

import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE  # tf.data.experimental.AUTOTUNE on TensorFlow < 2.4

# Filter each subset once instead of once per column.
train_df = self.df[self.df.subset == "train"]
val_df = self.df[self.df.subset == "val"]

# 1. File names and labels as pandas Series.
X_train_file = train_df.fileName
Y1 = train_df.meanElicat
Y2 = train_df.finalY.astype(int)

X_test_file = val_df.fileName
Y1_test = val_df.meanElicat
Y2_test = val_df.finalY.astype(int)

# 2. Convert the Series to tensors.
train_image_paths = tf.convert_to_tensor(X_train_file, dtype=tf.string)
train_Y1 = tf.convert_to_tensor(Y1)
train_Y2 = tf.convert_to_tensor(Y2)

test_image_paths = tf.convert_to_tensor(X_test_file, dtype=tf.string)
test_Y1 = tf.convert_to_tensor(Y1_test)
test_Y2 = tf.convert_to_tensor(Y2_test)

# Each dataset element is (path, (Y1, Y2)).
train = tf.data.Dataset.from_tensor_slices((train_image_paths, (train_Y1, train_Y2)))
test = tf.data.Dataset.from_tensor_slices((test_image_paths, (test_Y1, test_Y2)))

# 3. Read, decode, and resize each image; the labels pass through unchanged.
def map_fn(path, label):
    image = tf.io.decode_jpeg(tf.io.read_file(path))
    image = tf.image.resize(image, [300, 300])
    return image, label

# partial(map_fn) binds no arguments, so it behaves the same as map_fn itself.
self.train_ds = (
    train.map(partial(map_fn), num_parallel_calls=AUTOTUNE)
    .batch(16)
    .cache()
    .prefetch(AUTOTUNE)
)
self.val_ds = (
    test.map(partial(map_fn), num_parallel_calls=AUTOTUNE)
    .batch(16)
    .cache()
    .prefetch(AUTOTUNE)
)
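
A pipeline like the one above can be sanity-checked by pulling a single batch and looking at its shapes, dtypes, and value ranges. The sketch below is only an illustration under stated assumptions: it uses random in-memory arrays instead of JPEG files, 32x32 images instead of 300x300 to keep it light, and made-up label values.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-ins (assumptions): small random "images" and two label columns
# shaped like meanElicat (float) and finalY (0/1 integer).
images = np.random.rand(40, 32, 32, 3).astype("float32")
y1 = np.random.rand(40).astype("float32")
y2 = np.random.randint(0, 2, size=40)

# Same element structure as the question's datasets: (image, (Y1, Y2)).
ds = tf.data.Dataset.from_tensor_slices((images, (y1, y2))).batch(16)

# Pull one batch and inspect what the model would actually receive.
batch_images, (batch_y1, batch_y2) = next(iter(ds))
print(batch_images.shape, batch_images.dtype)
print(batch_y1.shape, batch_y2.shape)
print("pixel range:", float(tf.reduce_min(batch_images)), float(tf.reduce_max(batch_images)))

class_counts = np.bincount(batch_y2.numpy(), minlength=2)
print("class counts:", class_counts)
```

With real images, the pixel range is worth a look: tf.image.resize returns float32 values without rescaling them, so decoded JPEGs stay in the 0 to 255 range unless the model or the pipeline normalizes them.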

Code to train and fit the model

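import os
import traceback

import tensorflow as tf

# float() parses a numeric string such as "1e-5" without the risks of eval()
# (this assumes lr is a plain numeric string).
self.model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=float(lr)),
    loss={"classification": tf.keras.losses.BinaryCrossentropy()},
    metrics={"classification": ["accuracy", tf.keras.metrics.AUC()]},
)

# The {epoch} and {val_loss} placeholders are left for ModelCheckpoint to fill in.
filepath = os.path.join(
    "weights",
    f"{weightName}_{blocks:02d}_CW[{cw}]" + ".E[{epoch}]_{val_loss:.4f}.h5",
)
callbacks = [
    # Presumably a custom CSVLogger; these arguments do not match tf.keras.callbacks.CSVLogger.
    CSVLogger(weights, blocks, cw, self.logName),
    tf.keras.callbacks.EarlyStopping(patience=patience),
    tf.keras.callbacks.ModelCheckpoint(filepath, save_best_only=True, save_weights_only=False),
]

try:
    self.model.fit(
        self.train_ds,
        epochs=epoch,
        callbacks=callbacks,
        validation_data=self.val_ds,
        verbose=2,
    )
except Exception:
    # Print the full traceback instead of letting the run crash.
    print(traceback.format_exc())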

Output of retrained model after 1 epoch

array([[0.6818546 ],
       [0.6817692 ],
       [0.68143094],
       [0.6824522 ],
       [0.6820409 ],
       [0.6816176 ],
       [0.68077767],
       [0.68115866],
       ...

Output of retrained model after 11 epochs

array([[0.4997447 ],
       [0.49965417],
       [0.5004351 ],
       [0.49858376],
       [0.49974793],
       [0.500144  ],
       [0.50129014],
       [0.5004081 ],
       ...

Output of model chosen for retraining after first training process

array([[0.01635163],
       [0.8146548 ],
       [0.08911347],
       [0.03006527],
       [0.04414936],

        ...

       [0.8874662 ],
       [0.37499326],
       [0.98350084],
       [0.9966594 ],
       [0.09798203]], dtype=float32)

Thank you for any help rendered!

Solution

The issue came from keras.models.clone_model(model), which requires the input tensor to be specified. To save time, I now just load the model from file whenever I want to use it.
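
It is also worth knowing that, according to the Keras documentation, clone_model creates new layers (and thus new weights) rather than sharing or copying the weights of the existing model. A clone therefore starts from a fresh initialization unless the trained weights are copied over, which would be consistent with the near-constant outputs shown in the question. The sketch below illustrates this with a small stand-in model; the architecture and input shape are assumptions, not the model from the question.

```python
import numpy as np
import tensorflow as tf

# Small stand-in for the trained classifier (hypothetical architecture).
inputs = tf.keras.Input(shape=(4,))
hidden = tf.keras.layers.Dense(8, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(1, activation="sigmoid", name="classification")(hidden)
model = tf.keras.Model(inputs, outputs)

# clone_model rebuilds the architecture with newly initialized weights.
clone = tf.keras.models.clone_model(model)

x = np.random.rand(5, 4).astype("float32")
same_before = np.allclose(model.predict(x, verbose=0), clone.predict(x, verbose=0))

# Copy the weights across explicitly so the clone matches the original.
clone.set_weights(model.get_weights())
same_after = np.allclose(model.predict(x, verbose=0), clone.predict(x, verbose=0))

print("match before set_weights:", same_before)
print("match after set_weights:", same_after)
```

Calling set_weights on the clone, or loading the saved file a second time with load_model as the answer does, both give a second model that starts from the trained weights.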

Answered By - vernal123

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
