Issue
I am currently using code from https://keras.io/examples/vision/handwriting_recognition/, a tutorial on handwriting text recognition, and I am testing the model on a local dataset. During my experiments I encountered two things that made me wonder:
1.) Is it normal for the loss at the start of an epoch to be higher than the loss at the end of the previous epoch? If not, what could cause this, and how can I prevent it?
2.) Is a val_loss of about 1 good enough for bi-LSTM networks? If not, how can I reduce the loss?
Here is a snippet from two consecutive epochs:
1520/1520 [==============================] - 735s 484ms/step - loss: 2.5462 - val_loss: 2.7302
Epoch 12/100
443/1520 [=======>......................] - ETA: 8:18 - loss: 3.9221
Below is the summary of the current model:
Layer (type)                      Output Shape           Param #   Connected to
==================================================================================
image (InputLayer)                [(None, 128, 32, 1)]   0         []
Conv1 (Conv2D)                    (None, 128, 32, 32)    320       ['image[0][0]']
batchnorm1 (BatchNormalization)   (None, 128, 32, 32)    128       ['Conv1[0][0]']
pool1 (MaxPooling2D)              (None, 64, 16, 32)     0         ['batchnorm1[0][0]']
Conv2 (Conv2D)                    (None, 64, 16, 64)     18496     ['pool1[0][0]']
Conv3 (Conv2D)                    (None, 64, 16, 64)     36928     ['Conv2[0][0]']
batchnorm2 (BatchNormalization)   (None, 64, 16, 64)     256       ['Conv3[0][0]']
pool2 (MaxPooling2D)              (None, 32, 8, 64)      0         ['batchnorm2[0][0]']
reshape (Reshape)                 (None, 32, 512)        0         ['pool2[0][0]']
dense1 (Dense)                    (None, 32, 64)         32832     ['reshape[0][0]']
dropout_3 (Dropout)               (None, 32, 64)         0         ['dense1[0][0]']
bidirectional_9 (Bidirectional)   (None, 32, 256)        197632    ['dropout_3[0][0]']
bidirectional_10 (Bidirectional)  (None, 32, 256)        394240    ['bidirectional_9[0][0]']
bidirectional_11 (Bidirectional)  (None, 32, 128)        164352    ['bidirectional_10[0][0]']
label (InputLayer)                [(None, None)]         0         []
dense2 (Dense)                    (None, 32, 85)         10965     ['bidirectional_11[0][0]']
ctc_loss (CTCLayer)               (None, 32, 85)         0         ['label[0][0]', 'dense2[0][0]']
==================================================================================
Total params: 856,149
Trainable params: 855,957
Non-trainable params: 192
__________________________________________________________________________________
optimizer = Adam
batch_size = 64
total_dataset = 100,000+
activation = relu
Solution
To answer the first query:
Yes, it is common for the loss to start higher than it was at the end of the previous epoch. During each epoch, your model is trained on successive batches of data, and the loss that Keras displays is accumulated or averaged (depending on your loss function) over the batches seen so far. The value printed at the end of an epoch therefore reflects the entire dataset, while the value printed at the start of the next epoch reflects only the first batches the model is training on.
Your dataset (ideally) follows a general pattern that you want your model to learn, while any single batch will likely contain only a sub-pattern of it. At the end of an epoch, having been exposed to the entire dataset, the model is better optimized to predict the general pattern of your data than any particular sub-pattern. The loss on a batch containing only a sub-pattern can therefore be higher.
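You can observe this directly by logging what the progress bar reports after each batch. The following is a minimal sketch assuming a standard tf.keras training loop; the callback name BatchLossLogger and the dataset names are just illustrative.

import tensorflow as tf

class BatchLossLogger(tf.keras.callbacks.Callback):
    """Records the loss reported after every batch so the epoch-start
    values can be compared with the previous epoch's final average."""

    def on_epoch_begin(self, epoch, logs=None):
        self.batch_losses = []

    def on_train_batch_end(self, batch, logs=None):
        # In TF 2.x, logs["loss"] is the running mean over the current
        # epoch, so the first entry of each epoch reflects a single
        # batch while the last entry reflects the whole dataset.
        self.batch_losses.append(logs["loss"])

# Hypothetical usage with your existing model and datasets:
# model.fit(train_ds, validation_data=val_ds, epochs=100,
#           callbacks=[BatchLossLogger()])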
For the second question:
It's hard to say whether a particular numerical loss value is good or bad for a network, since your validation loss depends on many factors: which loss function you are using, how many data points it is computed over, and so on. The numerical value of the loss matters less than whether your model meets the performance criteria you define through your evaluation metric.
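For a CTC model like the one in the linked tutorial, a more interpretable criterion than the raw CTC loss is the edit distance between greedy-decoded predictions and the ground-truth labels; the tutorial itself computes this with a helper along the lines of the sketch below. Here max_len=32 and the dense padded label format are assumptions that you would match to your own data pipeline.

import numpy as np
import tensorflow as tf
from tensorflow import keras

def mean_edit_distance(labels, predictions, max_len=32):
    """Average edit distance between decoded predictions and labels.

    labels:      (batch, max_len) dense integer label tensor
    predictions: (batch, time_steps, num_classes) softmax output
    """
    sparse_labels = tf.cast(tf.sparse.from_dense(labels), tf.int64)

    # Greedy CTC decoding; every sequence uses the full time axis.
    input_len = np.ones(predictions.shape[0]) * predictions.shape[1]
    decoded = keras.backend.ctc_decode(
        predictions, input_length=input_len, greedy=True
    )[0][0][:, :max_len]
    sparse_decoded = tf.cast(tf.sparse.from_dense(decoded), tf.int64)

    # normalize=False returns raw edit distances; divide by the label
    # lengths yourself if you prefer a character error rate.
    distances = tf.edit_distance(sparse_decoded, sparse_labels, normalize=False)
    return tf.reduce_mean(distances)

Tracking this value on a held-out set (for example, at the end of each epoch) gives you a concrete target to judge the model by, independent of the scale of the CTC loss.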
Answered By - Ali Haider