Issue
I am currently using code from https://keras.io/examples/vision/handwriting_recognition/, a tutorial on handwriting text recognition, and I am testing the model on a local dataset. During my experiments I encountered two things that made me wonder:
1.) Is it normal for the loss at the start of an epoch to be higher than the loss at the end of the previous epoch? If not, what could cause this, and how can I prevent it?
2.) Is a val_loss of about 1 good enough for bi-LSTM networks? If not, how can I reduce the loss?
Here is a snippet from two consecutive epochs:
1520/1520 [==============================] - 735s 484ms/step - loss: 2.5462 - val_loss: 2.7302
Epoch 12/100
443/1520 [=======>......................] - ETA: 8:18 - loss: 3.9221
Below is the summary of the current model:
Layer (type)                      Output Shape           Param #   Connected to
==================================================================================
image (InputLayer)                [(None, 128, 32, 1)]   0         []
Conv1 (Conv2D)                    (None, 128, 32, 32)    320       ['image[0][0]']
batchnorm1 (BatchNormalization)   (None, 128, 32, 32)    128       ['Conv1[0][0]']
pool1 (MaxPooling2D)              (None, 64, 16, 32)     0         ['batchnorm1[0][0]']
Conv2 (Conv2D)                    (None, 64, 16, 64)     18496     ['pool1[0][0]']
Conv3 (Conv2D)                    (None, 64, 16, 64)     36928     ['Conv2[0][0]']
batchnorm2 (BatchNormalization)   (None, 64, 16, 64)     256       ['Conv3[0][0]']
pool2 (MaxPooling2D)              (None, 32, 8, 64)      0         ['batchnorm2[0][0]']
reshape (Reshape)                 (None, 32, 512)        0         ['pool2[0][0]']
dense1 (Dense)                    (None, 32, 64)         32832     ['reshape[0][0]']
dropout_3 (Dropout)               (None, 32, 64)         0         ['dense1[0][0]']
bidirectional_9 (Bidirectional)   (None, 32, 256)        197632    ['dropout_3[0][0]']
bidirectional_10 (Bidirectional)  (None, 32, 256)        394240    ['bidirectional_9[0][0]']
bidirectional_11 (Bidirectional)  (None, 32, 128)        164352    ['bidirectional_10[0][0]']
label (InputLayer)                [(None, None)]         0         []
dense2 (Dense)                    (None, 32, 85)         10965     ['bidirectional_11[0][0]']
ctc_loss (CTCLayer)               (None, 32, 85)         0         ['label[0][0]', 'dense2[0][0]']
==================================================================================
Total params: 856,149
Trainable params: 855,957
Non-trainable params: 192
__________________________________________________________________________________
optimizer = Adam
batch_size = 64
total_dataset = 100,000+
activation = relu
Solution
To answer the first query:
Yes, it is common for the loss to start higher than it was at the end of the previous epoch. During each epoch, your model is trained on successive batches of data, and the loss that Keras displays is accumulated or averaged (depending on your loss function) over the batches seen so far. The value printed at the end of an epoch therefore reflects the entire dataset, while the value printed at the start of the next epoch reflects only the first batches the model is training on.
Your dataset (ideally) follows a general pattern that you want your model to learn, while any single batch will likely contain only a sub-pattern of it. At the end of an epoch, having been exposed to the entire dataset, the model is better optimized to predict the general pattern of your data than any particular sub-pattern. The loss on a batch containing only a sub-pattern can therefore be higher.
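You can observe this directly by logging what the progress bar reports after each batch. The following is a minimal sketch assuming a standard tf.keras training loop; the callback name BatchLossLogger and the dataset names are just illustrative.

import tensorflow as tf

class BatchLossLogger(tf.keras.callbacks.Callback):
    """Records the loss reported after every batch so the epoch-start
    values can be compared with the previous epoch's final average."""

    def on_epoch_begin(self, epoch, logs=None):
        self.batch_losses = []

    def on_train_batch_end(self, batch, logs=None):
        # In TF 2.x, logs["loss"] is the running mean over the current
        # epoch, so the first entry of each epoch reflects a single
        # batch while the last entry reflects the whole dataset.
        self.batch_losses.append(logs["loss"])

# Hypothetical usage with your existing model and datasets:
# model.fit(train_ds, validation_data=val_ds, epochs=100,
#           callbacks=[BatchLossLogger()])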
For the second question:
It's hard to say whether a particular numerical loss value is good or bad for a network, since your validation loss depends on many factors: which loss function you are using, how many data points it is computed over, and so on. The numerical value of the loss matters less than whether your model meets the performance criteria you define through your evaluation metric.
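For a CTC model like the one in the linked tutorial, a more interpretable criterion than the raw CTC loss is the edit distance between greedy-decoded predictions and the ground-truth labels; the tutorial itself computes this with a helper along the lines of the sketch below. Here max_len=32 and the dense padded label format are assumptions that you would match to your own data pipeline.

import numpy as np
import tensorflow as tf
from tensorflow import keras

def mean_edit_distance(labels, predictions, max_len=32):
    """Average edit distance between decoded predictions and labels.

    labels:      (batch, max_len) dense integer label tensor
    predictions: (batch, time_steps, num_classes) softmax output
    """
    sparse_labels = tf.cast(tf.sparse.from_dense(labels), tf.int64)

    # Greedy CTC decoding; every sequence uses the full time axis.
    input_len = np.ones(predictions.shape[0]) * predictions.shape[1]
    decoded = keras.backend.ctc_decode(
        predictions, input_length=input_len, greedy=True
    )[0][0][:, :max_len]
    sparse_decoded = tf.cast(tf.sparse.from_dense(decoded), tf.int64)

    # normalize=False returns raw edit distances; divide by the label
    # lengths yourself if you prefer a character error rate.
    distances = tf.edit_distance(sparse_decoded, sparse_labels, normalize=False)
    return tf.reduce_mean(distances)

Tracking this value on a held-out set (for example, at the end of each epoch) gives you a concrete target to judge the model by, independent of the scale of the CTC loss.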
Answered By - Ali Haider