Issue
I am training a neural net in Keras. During the first epoch the loss starts out as a normal value and then suddenly becomes loss: nan
before the epoch ends, and the accuracy drops significantly. From the second epoch on the loss stays nan
while the accuracy is 0, and this goes on for the rest of the epochs.
The frustrating bit is that there seems to be no consistency in the output from one training run to the next; that is to say, the loss: nan
shows up at different points in the first epoch.
There have been a couple of questions on this website that give "guides" to problems similar to this; I just haven't seen one done so explicitly in Keras. I am trying to get my neural network to classify a 1 or a 0.
Here are some things I have done; after them come my output and code.
Standardization // Normalization
I posted a question about my data here. I was able to figure it out and apply sklearn's StandardScaler()
and MinMaxScaler()
to my dataset. Neither standardization nor normalization helped with my issue.
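For completeness, a minimal sketch of the standardization attempt, roughly along these lines (the MinMaxScaler version appears in the full code further down; the variable names here are just for illustration, and fitting the scaler on the training split only is the assumption):
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training data only, then apply it to both splits
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)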
Learning Rate
The optimizers I have tried are adam
and SGD
. In both cases I tried lowering the default learning rate to see if that would help, and in both cases the same issue arose.
Activations
I thought that it was pretty standard to use relu
, but I saw someone on the internet talking about using tanh
. I tried it; no dice.
Batch Size
Tried 32, 50, 128, and 200. A batch size of 50 got me the farthest into the first epoch; the others didn't help.
Combating Overfitting
Put a dropout layer in and tried a whole range of dropout rates.
Other Observations
- The epochs train really, really fast for the dimensions of the data (I could be wrong).
- loss: nan could have something to do with my loss function being binary_crossentropy, and maybe some values are giving that loss function a hard time.
- kernel_initializer='uniform' has been untouched and unconsidered in my quest to figure this out.
- The internet also told me that there could be a nan value in my data, but I think that was for an error that broke their script.
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import optimizers

# Scale features to [0, 1]; fit on the training set only
sc = MinMaxScaler()
X_train_total_scale = sc.fit_transform(X_train)
X_test_total_scale = sc.transform(X_test)
print(X_train_total_scale.shape)  # (4140, 2756)
print(y_train.shape)              # (4140,)

## NN
# adam = optimizers.Adam(lr=0.0001)
sgd = optimizers.SGD(lr=0.0001, decay=1e-6, momentum=0.9, nesterov=True)

classifier = Sequential()
classifier.add(Dense(units=1379, kernel_initializer='uniform', activation='relu', input_dim=2756))
classifier.add(Dropout(0.6))
classifier.add(Dense(units=1379, kernel_initializer='uniform', activation='relu'))
classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))
classifier.compile(optimizer=sgd, loss='binary_crossentropy', metrics=['accuracy'])
classifier.fit(X_train_total_scale, y_train, validation_data=(X_test_total_scale, y_test), batch_size=50, epochs=100)
(output shown for batch size 200 to keep the text block from getting too big)
200/4140 [>.............................] - ETA: 7s - loss: 0.6866 - acc: 0.5400
400/4140 [=>............................] - ETA: 4s - loss: 0.6912 - acc: 0.5300
600/4140 [===>..........................] - ETA: 2s - loss: nan - acc: 0.5300
800/4140 [====>.........................] - ETA: 2s - loss: nan - acc: 0.3975
1000/4140 [======>.......................] - ETA: 1s - loss: nan - acc: 0.3180
1200/4140 [=======>......................] - ETA: 1s - loss: nan - acc: 0.2650
1400/4140 [=========>....................] - ETA: 1s - loss: nan - acc: 0.2271
1600/4140 [==========>...................] - ETA: 1s - loss: nan - acc: 0.1987
1800/4140 [============>.................] - ETA: 1s - loss: nan - acc: 0.1767
2000/4140 [=============>................] - ETA: 0s - loss: nan - acc: 0.1590
2200/4140 [==============>...............] - ETA: 0s - loss: nan - acc: 0.1445
2400/4140 [================>.............] - ETA: 0s - loss: nan - acc: 0.1325
2600/4140 [=================>............] - ETA: 0s - loss: nan - acc: 0.1223
2800/4140 [===================>..........] - ETA: 0s - loss: nan - acc: 0.1136
3000/4140 [====================>.........] - ETA: 0s - loss: nan - acc: 0.1060
3200/4140 [======================>.......] - ETA: 0s - loss: nan - acc: 0.0994
3400/4140 [=======================>......] - ETA: 0s - loss: nan - acc: 0.0935
3600/4140 [=========================>....] - ETA: 0s - loss: nan - acc: 0.0883
3800/4140 [==========================>...] - ETA: 0s - loss: nan - acc: 0.0837
4000/4140 [===========================>..] - ETA: 0s - loss: nan - acc: 0.0795
4140/4140 [==============================] - 2s 368us/step - loss: nan - acc: 0.0768 - val_loss: nan - val_acc: 0.0000e+00
Epoch 2/100
200/4140 [>.............................] - ETA: 1s - loss: nan - acc: 0.0000e+00
400/4140 [=>............................] - ETA: 0s - loss: nan - acc: 0.0000e+00
600/4140 [===>..........................] - ETA: 0s - loss: nan - acc: 0.0000e+00
800/4140 [====>.........................] - ETA: 0s - loss: nan - acc: 0.0000e+00
1000/4140 [======>.......................] - ETA: 0s - loss: nan - acc: 0.0000e+00
1200/4140 [=======>......................] - ETA: 0s - loss: nan - acc: 0.0000e+00
1400/4140 [=========>....................] - ETA: 0s - loss: nan - acc: 0.0000e+00
1600/4140 [==========>...................] - ETA: 0s - loss: nan - acc: 0.0000e+00
... and so on...
I hope to be able to get a full training run done (duh), but I would also like to learn about some of the intuition people use to figure out these problems on their own!
Solution
Firstly, check for NaNs or inf in your dataset.
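For instance, a quick NumPy check along these lines, using the scaled arrays and labels from your code:
import numpy as np

# Any NaN or +/- inf left in the model inputs or labels?
print(np.isnan(X_train_total_scale).any(), np.isinf(X_train_total_scale).any())
print(np.isnan(X_test_total_scale).any(), np.isinf(X_test_total_scale).any())
print(np.isnan(y_train).any(), np.isinf(y_train).any())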
You could try different optimizers, e.g. rmsprop. The learning rate could be smaller, though I haven't used anything lower than 0.0001 (which is what you're using) myself.
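A sketch of what swapping the optimizer could look like; the 0.0001 learning rate here just mirrors the value you are already using, not a recommendation:
from keras import optimizers

# RMSprop with an explicit (small) learning rate instead of SGD
rmsprop = optimizers.RMSprop(lr=0.0001)
classifier.compile(optimizer=rmsprop, loss='binary_crossentropy', metrics=['accuracy'])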
I thought that it was pretty standard to use relu, but I saw someone on the internet talking about using tanh. I tried it; no dice.
Try leaky relu or elu if you're concerned about this.
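A rough sketch of both options on a hidden layer like yours; LeakyReLU goes in as its own layer (it takes an alpha argument), while elu can be passed as a plain activation string, and the alpha value here is only an example:
from keras.layers import Dense, LeakyReLU

# ELU via the activation argument
classifier.add(Dense(units=1379, kernel_initializer='uniform', activation='elu'))

# Leaky ReLU as a separate layer after a linear Dense layer
classifier.add(Dense(units=1379, kernel_initializer='uniform'))
classifier.add(LeakyReLU(alpha=0.1))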
Answered By - joek47