Issue
I am trying to build a machine learning model which predicts a single number from a series of numbers. I am using an LSTM model with Tensorflow.
You can imagine my dataset to look something like this:
Index | x data | y data |
---|---|---|
0 | np.array(shape (10000,1) ) |
numpy.float32 |
1 | np.array(shape (10000,1) ) |
numpy.float32 |
2 | np.array(shape (10000,1) ) |
numpy.float32 |
... | ... | ... |
56 | np.array(shape (10000,1) ) |
numpy.float32 |
Easily said I just want my model to predict a number (y data) from a sequence of numbers (x data).
For example like this:
- array([3.59280851, 3.60459062, 3.60459062, ...]) => 2.8989773
- array([3.54752101, 3.56740332, 3.56740332, ...]) => 3.0893357
- ...
x and y data
From my x data I created a numpy array x_train
which I want to use to train the network.
Because I am using an LSTM network, x_train should be of shape (samples, time_steps, features).
I reshaped my x_train array to be shaped like this: (57, 10000, 1), because I have 57 samples, which each are of length 10000 and contain a single number.
The y data was created similarly and is of shape (57,1) because, once again, I have 57 samples which each contain a single number as the desired y output.
Current model attempt
My model summary looks like this:
The model was compiled with model.compile(loss="mse", optimizer="adam")
so my loss function is simply the mean squared error and as an optimizer I'm using Adam.
Current results
Training of the model works fine and I can see that the loss and validation loss decreases after some epochs. The actual problem occurs when I want to predict some data y_verify from some data x_verify. I do this after the training is finished to determine how well the model is trained. In the following example I simply used the data I used for training to determine how well the model is trained (I know about overfitting and that verifying with the training set is not the right way of doing it, but that is not the problem I want to demonstrate right not).
In the following graph you can see the y data I provided to the model in blue.
The orange line is the result of calling model.predict(x_verify)
where x_verify is of the same shape as x_train.
I also calculated the mean absolute percentage error (MAPE) of my prediction and the actual data and it came out to be around 4% which is not bad, because I only trained for 40 epochs. But this result still is not helpful at all because as you can see in the graph above the curves do not match at all.
Question:
What is going on here?
Am I using an incorrect loss function?
Why does it seem like the model tries to predict a single value for all samples rather than predicting a different value for all samples like it's supposed to be?
Ideally the prediction should be the y data which I provided so the curves should look the same (more or less).
Do you have any ideas?
Thanks! :)
Solution
After some back and forth in the comments, I'll give my best estimation to your questions:
What is going on here?
Very complex (too many layers deep) model with very little data, trained for too few epochs on non-normalized data (credit to Muhammad in his answer). The biggest issue, as far as I can tell, is the number of training epochs.
Am I using an incorrect loss function?
MSE is an appropriate loss function for a regression task.
Why does it seem like the model tries to predict a single value for all samples rather than predicting a different value for all samples like it's supposed to be? Ideally the prediction should be the y data which I provided so the curves should look the same (more or less). Do you have any ideas?
Too few training epochs is the biggest contributor, as far as I can tell.
Based on the collab notebook that Luca shared:
30 Epochs, no normalization
Way off target, flat predictions (though I can't reproduce how flat the predictions are that Luca posted)
30 Epochs, with normalization
Worse off.
2000(!) epochs, no normalization
Okay, now the predictions are at least in the ballpark
2000 epochs, with normalization
And now the model seems to be starting to figure things out, like we'd hope it should. Given, this is training on the 11 samples that were cobbled together in the notebook, so it's naturally going to overfit. We're just happy to see it learn something.
2000 epochs, normalization, different loss
Never be afraid to try out different losses, as some may be better suited than others. Not knowing the domain of this task, I'm just trying out mean_absolute_error
instead of mean_squared_error
.
Caution! Don't compare loss values between different losses. They're not on the same scale.
2000 epochs, normalization, larger learning rate
Okay, so it's taking a long time to learn. Can I nudge it along a little faster? Sure, up the learning rate of the optimizer, and it'll get you to where you're going faster. Here, we up it by a factor of 5.
model.compile(loss="mse", optimizer=tf.keras.optimizers.Adam(learning_rate=0.005))
You could even employ a learning rate scheduler that starts big and slowly diminishes it over the course of epochs.
def scheduler(epoch, lr):
if epoch < 400:
return lr
else:
return lr * tf.math.exp(-0.01)
lrs = tf.keras.callbacks.LearningRateScheduler(scheduler)
history = model.fit(x=x_train, y=y_train, epochs=1000, callbacks=[lrs])
Hope this all helps!
Answered By - Bryan Dannowitz
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.