Issue
I'm training two autoencoders for a deepfake, and training needs to run for 150,000 epochs. I stopped it at 10,000, but I want it to be able to resume training from the epoch it left off on. Is there a way to do that?
train_setA = video.loading_images(setA_path) / 255.0
train_setB = video.loading_images(setB_path) / 255.0
train_setA += train_setB.mean(axis=(0, 1, 2)) - train_setA.mean(axis=(0, 1, 2))

batch_size = int(len(os.listdir(setA_path)) / 20)

print("press 'q' to stop training and save model")

for epoch in range(1000000):
    batch_size = 64  # note: this overrides the batch_size computed above
    warped_A, target_A = train_util.training_data(train_setA, batch_size)
    warped_B, target_B = train_util.training_data(train_setB, batch_size)
    loss_A = aeA.train_on_batch(warped_A, target_A)
    loss_B = aeB.train_on_batch(warped_B, target_B)
    print(loss_A, loss_B)
    print('Current epoch no... ' + str(epoch))
    if epoch % 100 == 0:
        save_model_weights()
        print('Model weights saved')
    test_A = target_A[0:14]
    test_B = target_B[0:14]
    figure_A = np.stack([
        test_A,
        aeA.predict(test_A),
        aeB.predict(test_A),
    ], axis=1)
    figure_B = np.stack([
        test_B,
        aeB.predict(test_B),
        aeA.predict(test_B),
    ], axis=1)
    figure = np.concatenate([figure_A, figure_B], axis=0)
    figure = figure.reshape((4, 7) + figure.shape[1:])
    figure = train_util.stack_images(figure)
    figure = np.clip(figure * 255, 0, 255).astype('uint8')
    cv2.imshow("", figure)
    key = cv2.waitKey(1)
    if key == ord('q'):
        save_model_weights()
        exit()
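One straightforward approach, independent of any framework feature, is to persist the epoch counter alongside the model weights and read it back on startup. The following is a minimal sketch; `training_state.json` is a hypothetical bookkeeping file name, and the loop shown in comments refers to the training loop above.

```python
import json
import os

STATE_PATH = "training_state.json"  # hypothetical bookkeeping file

def load_start_epoch(path=STATE_PATH):
    """Return the epoch to resume from (0 on a fresh run)."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["epoch"] + 1
    return 0

def save_start_epoch(epoch, path=STATE_PATH):
    """Record the last completed epoch next to the model weights."""
    with open(path, "w") as f:
        json.dump({"epoch": epoch}, f)

# In the training loop above, the idea would be:
#
#     start_epoch = load_start_epoch()
#     for epoch in range(start_epoch, 1000000):
#         ...
#         if epoch % 100 == 0:
#             save_model_weights()
#             save_start_epoch(epoch)
```

Writing the state only at the same cadence as `save_model_weights()` keeps the counter consistent with whatever checkpoint is actually on disk.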
Solution
Here is what I know about this topic in Keras.
If you save the weights after each epoch (for example, with the ModelCheckpoint callback), then you can load those saved weights later.
For example:
Save:
weight_save_callback = ModelCheckpoint('/path/to/weights.{epoch:02d}-{val_loss:.2f}.hdf5', monitor='val_loss', save_best_only=False)  # set save_best_only=True to keep only the best result
model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, callbacks=[weight_save_callback])
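As an aside, the `{epoch:02d}` and `{val_loss:.2f}` fields in the checkpoint path are ordinary Python `str.format` placeholders, which ModelCheckpoint fills in with the epoch number and the logged metrics, so you can check how a given pattern expands:

```python
# The checkpoint path is an ordinary Python format string; Keras fills it
# in with the epoch number and the logged metric values.
template = "/path/to/weights.{epoch:02d}-{val_loss:.2f}.hdf5"
filename = template.format(epoch=5, val_loss=0.1234)
print(filename)  # -> /path/to/weights.05-0.12.hdf5
```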
Load:
model = Sequential()
model.add(...)
model.load_weights('/path/to/weights.hdf5')
It is important that the model architecture is identical to the one that produced the saved weights.
Since some optimizers set some of their internal values (for example, the learning rate) using the current epoch value, and you may even have (custom) callbacks that depend on the current epoch, the initial_epoch argument lets you specify which epoch value training should start from.

This is mainly needed when you have trained your model for some epochs, saved it, and then want to load it and resume training for several more epochs without disturbing the state of objects that depend on the epoch (for example, the optimizer). You should set initial_epoch to the number of epochs already completed and epochs to the total target: if we trained the model for, say, 20 epochs and then set initial_epoch = 20 and epochs = 40, everything resumes as if you had trained the model in one continuous session.

However, note that with the built-in Keras optimizers you do not need initial_epoch for the optimizer's sake, since they store and update their state internally (without reference to the current epoch value), and when you save the model the optimizer state is saved with it.
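One way to recover the value to pass as initial_epoch is to parse it back out of a saved checkpoint's filename. Below is a hypothetical helper (not part of Keras), assuming the `weights.{epoch:02d}-{val_loss:.2f}.hdf5` pattern used above:

```python
import re

def epoch_from_checkpoint(filename):
    """Extract the epoch number from a checkpoint filename such as
    'weights.20-0.35.hdf5' (hypothetical helper for the pattern above)."""
    m = re.search(r"weights\.(\d+)-[\d.]+\.hdf5$", filename)
    if m is None:
        raise ValueError("unrecognized checkpoint name: " + filename)
    return int(m.group(1))

start = epoch_from_checkpoint("weights.20-0.35.hdf5")
print(start)  # -> 20
# model.fit(X_train, y_train, epochs=40, initial_epoch=start)
```

Whether the epoch in the filename is the last completed epoch or the next one depends on the Keras version's indexing, so it is worth checking against your own checkpoints before relying on it.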
I hope this helps.
Answered By - Tehnorobot