Issue
Let's assume I have a dataframe with several features, such as humidity and pressure. One of these columns is temperature.
Each row holds the data for one day. I would like to predict the next day's temperature using past data only.
How would I shape the dataframe so that it could be used in an RNN with Keras?
Solution
Let's assume you have the following data structure and want to predict the temperature from the previous day's data:
import tensorflow as tf
import pandas as pd
import numpy as np
df = pd.DataFrame(data={
'temperature': np.random.random((1, 20)).ravel(),
'pressure': np.random.random((1, 20)).ravel(),
'humidity': np.random.random((1, 20)).ravel(),
'wind': np.random.random((1, 20)).ravel()
})
print(df.to_markdown())
| | temperature | pressure | humidity | wind |
|---|---|---|---|---|
0 | 0.0589101 | 0.278302 | 0.875369 | 0.622687 |
1 | 0.594924 | 0.797274 | 0.510012 | 0.374484 |
2 | 0.511291 | 0.334929 | 0.401483 | 0.77062 |
3 | 0.711329 | 0.72051 | 0.595685 | 0.872691 |
4 | 0.495425 | 0.520179 | 0.516858 | 0.628928 |
5 | 0.676054 | 0.67902 | 0.0213801 | 0.0267594 |
6 | 0.058189 | 0.69932 | 0.885174 | 0.00602091 |
7 | 0.708245 | 0.871698 | 0.345451 | 0.448352 |
8 | 0.958427 | 0.471423 | 0.412678 | 0.618024 |
9 | 0.941202 | 0.825181 | 0.211916 | 0.0808273 |
10 | 0.49252 | 0.541955 | 0.00522009 | 0.396557 |
11 | 0.323757 | 0.113585 | 0.797503 | 0.323961 |
12 | 0.819055 | 0.637116 | 0.285361 | 0.569794 |
13 | 0.95123 | 0.00604303 | 0.208746 | 0.150214 |
14 | 0.89466 | 0.948916 | 0.556422 | 0.555165 |
15 | 0.705789 | 0.269704 | 0.289568 | 0.391438 |
16 | 0.154502 | 0.703137 | 0.184157 | 0.765623 |
17 | 0.25974 | 0.934706 | 0.172775 | 0.412022 |
18 | 0.403475 | 0.144796 | 0.0224043 | 0.891236 |
19 | 0.922302 | 0.805214 | 0.0232178 | 0.951568 |
The first thing we have to do is separate the data into features and labels:
features = df.iloc[::2, :] # Every other row, starting at row 0 (days 0, 2, 4, ...)
labels = df.iloc[1::2, :] # The row after each feature row (days 1, 3, 5, ...), i.e. the following day
Features:
| | temperature | pressure | humidity | wind |
|---|---|---|---|---|
0 | 0.0589101 | 0.278302 | 0.875369 | 0.622687 |
2 | 0.511291 | 0.334929 | 0.401483 | 0.77062 |
4 | 0.495425 | 0.520179 | 0.516858 | 0.628928 |
6 | 0.058189 | 0.69932 | 0.885174 | 0.00602091 |
8 | 0.958427 | 0.471423 | 0.412678 | 0.618024 |
10 | 0.49252 | 0.541955 | 0.00522009 | 0.396557 |
12 | 0.819055 | 0.637116 | 0.285361 | 0.569794 |
14 | 0.89466 | 0.948916 | 0.556422 | 0.555165 |
16 | 0.154502 | 0.703137 | 0.184157 | 0.765623 |
18 | 0.403475 | 0.144796 | 0.0224043 | 0.891236 |
Labels:
| | temperature | pressure | humidity | wind |
|---|---|---|---|---|
1 | 0.594924 | 0.797274 | 0.510012 | 0.374484 |
3 | 0.711329 | 0.72051 | 0.595685 | 0.872691 |
5 | 0.676054 | 0.67902 | 0.0213801 | 0.0267594 |
7 | 0.708245 | 0.871698 | 0.345451 | 0.448352 |
9 | 0.941202 | 0.825181 | 0.211916 | 0.0808273 |
11 | 0.323757 | 0.113585 | 0.797503 | 0.323961 |
13 | 0.95123 | 0.00604303 | 0.208746 | 0.150214 |
15 | 0.705789 | 0.269704 | 0.289568 | 0.391438 |
17 | 0.25974 | 0.934706 | 0.172775 | 0.412022 |
19 | 0.922302 | 0.805214 | 0.0232178 | 0.951568 |
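Note that the every-other-row split uses each day only once, so 20 rows give 10 (day, next day) pairs. If you would rather use every consecutive pair, one possible alternative (not part of the split shown above) is to offset the frame by one row. This is only a sketch: `df_demo`, `features_all`, and `labels_all` are illustrative names, and the data is a random stand-in for the dataframe above.

```python
import numpy as np
import pandas as pd

# Random stand-in for the dataframe above: 20 days, 4 columns
rng = np.random.default_rng(0)
df_demo = pd.DataFrame(
    rng.random((20, 4)),
    columns=['temperature', 'pressure', 'humidity', 'wind'],
)

# Inputs: every day except the last one (it has no following day)
features_all = df_demo.iloc[:-1]   # days 0..18
# Targets: every day except the first one (it has no previous day)
labels_all = df_demo.iloc[1:]      # days 1..19

print(len(features_all), len(labels_all))  # 19 19
```

Row `i` of `features_all` then lines up with row `i` of `labels_all`, and the remaining steps work the same way with 19 samples instead of 10.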
Since you are only interested in predicting the temperature, we can remove the other features from the labels and convert both to arrays:
features = features.to_numpy() # shape (10, 4)
labels = labels['temperature'].to_numpy() # shape (10,)
features = np.expand_dims(features, axis=1) # shape (10, 1, 4)
Note that a time dimension is added to features, which means that each sample in the dataset consists of one timestep (one day), and each timestep has 4 features (temperature, pressure, humidity, wind).
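If you later want the model to see more than one past day per sample, the time dimension simply grows. The following is a minimal sketch under that assumption; `make_windows`, `target_col`, and `timesteps` are hypothetical names introduced here, and the array is a random stand-in for `df.to_numpy()` with temperature in column 0.

```python
import numpy as np

def make_windows(data, target_col, timesteps):
    """Slice a (days, features) array into RNN inputs and next-day targets."""
    X, y = [], []
    for start in range(len(data) - timesteps):
        # `timesteps` consecutive days form one input sample
        X.append(data[start:start + timesteps])
        # The target is the chosen column on the day right after the window
        y.append(data[start + timesteps, target_col])
    return np.array(X), np.array(y)

rng = np.random.default_rng(0)
data = rng.random((20, 4))  # stand-in: 20 days, 4 features, column 0 = temperature

X, y = make_windows(data, target_col=0, timesteps=3)
print(X.shape, y.shape)  # (17, 3, 4) (17,)
```

The resulting `X` already has the `(samples, timesteps, features)` layout, so no `np.expand_dims` call is needed in this variant.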
Building and running an RNN model:
inputs = tf.keras.layers.Input(shape=(features.shape[1], features.shape[2]))
rnn_out = tf.keras.layers.SimpleRNN(32)(inputs)
outputs = tf.keras.layers.Dense(1)(rnn_out) # one output = temperature
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss="mse")
model.summary()
history = model.fit(features, labels, batch_size=2, epochs=3)
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 1, 4)] 0
simple_rnn (SimpleRNN) (None, 32) 1184
dense_1 (Dense) (None, 1) 33
=================================================================
Total params: 1,217
Trainable params: 1,217
Non-trainable params: 0
_________________________________________________________________
Epoch 1/3
5/5 [==============================] - 1s 9ms/step - loss: 0.7859
Epoch 2/3
5/5 [==============================] - 0s 7ms/step - loss: 0.5862
Epoch 3/3
5/5 [==============================] - 0s 6ms/step - loss: 0.4354
Make predictions like this:
samples = 1
model.predict(tf.random.normal((samples, 1, 4)))
# array([[-1.610171]], dtype=float32)
You can also consider normalizing your data before training like this:
# Standardize each column to zero mean and unit variance
mean = df.mean(axis=0)
std = df.std(axis=0)
df = (df - mean) / std
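One consequence of standardizing is that a model trained on the scaled data predicts scaled temperatures, so the predictions need to be mapped back to the original units. A rough sketch, assuming a random stand-in dataframe (`df_demo`) and made-up prediction values (`pred_norm`) used purely for illustration:

```python
import numpy as np
import pandas as pd

# Random stand-in for the dataframe above
rng = np.random.default_rng(0)
df_demo = pd.DataFrame(
    rng.random((20, 4)),
    columns=['temperature', 'pressure', 'humidity', 'wind'],
)

# Standardize every column: subtract the mean first, then divide by the std
mean = df_demo.mean(axis=0)
std = df_demo.std(axis=0)
df_norm = (df_demo - mean) / std

# Hypothetical model outputs in standardized units (not real predictions)
pred_norm = np.array([0.5, -1.0])

# Undo the scaling using the temperature column's statistics
pred_temp = pred_norm * std['temperature'] + mean['temperature']
print(pred_temp)
```

In practice you may prefer to compute `mean` and `std` from the training portion only, so that no information from held-out days leaks into the scaling.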
And that's about it.
Answered By - AloneTogether