Issue
I am trying to port a Sequential neural network from the time series tutorial on the TensorFlow website (https://www.tensorflow.org/tutorials/structured_data/time_series#single-shot_models) to the functional API.
The tutorial code is as follows:
multi_dense_model = tf.keras.Sequential()
multi_dense_model.add(tf.keras.layers.Input(shape=(24, 19)))
multi_dense_model.add(tf.keras.layers.Lambda(lambda x: x[:, -1:, :]))
multi_dense_model.add(tf.keras.layers.Dense(512, activation='relu'))
multi_dense_model.add(tf.keras.layers.Dense(OUT_STEPS*num_features, kernel_initializer=tf.initializers.zeros()))
multi_dense_model.add(tf.keras.layers.Reshape([OUT_STEPS, num_features]))
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2, mode='min')
multi_dense_model.compile(loss=tf.keras.losses.MeanSquaredError(), optimizer=tf.keras.optimizers.Adam(), metrics=[tf.keras.metrics.MeanAbsoluteError()])
history = multi_dense_model.fit(multi_window.train, epochs=MAX_EPOCHS, validation_data=multi_window.val, callbacks=[early_stopping])
With this I get the following result:
Epoch 1/20
1532/1532 [==============================] - 11s 7ms/step - loss: 0.2391 - mean_absolute_error: 0.3012 - val_loss: 0.2272 - val_mean_absolute_error: 0.2895
Epoch 2/20
1532/1532 [==============================] - 9s 6ms/step - loss: 0.2226 - mean_absolute_error: 0.2850 - val_loss: 0.2283 - val_mean_absolute_error: 0.2908
Epoch 3/20
1532/1532 [==============================] - 11s 7ms/step - loss: 0.2192 - mean_absolute_error: 0.2820 - val_loss: 0.2230 - val_mean_absolute_error: 0.2847
Epoch 4/20
1532/1532 [==============================] - 10s 7ms/step - loss: 0.2166 - mean_absolute_error: 0.2798 - val_loss: 0.2212 - val_mean_absolute_error: 0.2836
Epoch 5/20
1532/1532 [==============================] - 10s 7ms/step - loss: 0.2144 - mean_absolute_error: 0.2780 - val_loss: 0.2189 - val_mean_absolute_error: 0.2809
Epoch 6/20
1532/1532 [==============================] - 9s 6ms/step - loss: 0.2131 - mean_absolute_error: 0.2768 - val_loss: 0.2196 - val_mean_absolute_error: 0.2812
Epoch 7/20
1532/1532 [==============================] - 10s 7ms/step - loss: 0.2118 - mean_absolute_error: 0.2759 - val_loss: 0.2193 - val_mean_absolute_error: 0.2827
437/437 [==============================] - 2s 4ms/step - loss: 0.2193 - mean_absolute_error: 0.2827
Now I changed the code to the functional API:
input1 = tf.keras.layers.Input(shape=(24, 19))
lamb1 = tf.keras.layers.Lambda(lambda x: x[:, -1:, :])(input1)
dense1 = tf.keras.layers.Dense(512, activation='relu')(lamb1)
dense2 = tf.keras.layers.Dense(OUT_STEPS*num_features, kernel_initializer=tf.initializers.zeros(), activation='relu')(dense1)
resha1 = tf.keras.layers.Reshape([OUT_STEPS, num_features])(dense2)
multi_dense_model = tf.keras.models.Model(inputs=input1, outputs=resha1)
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2, mode='min')
multi_dense_model.compile(loss=tf.keras.losses.MeanSquaredError(), optimizer=tf.keras.optimizers.Adam(), metrics=[tf.keras.metrics.MeanAbsoluteError()])
history = multi_dense_model.fit(multi_window.train, epochs=MAX_EPOCHS, validation_data=multi_window.val, callbacks=[early_stopping])
multi_val_performance['Dense'] = multi_dense_model.evaluate(multi_window.val)
multi_performance['Dense'] = multi_dense_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(multi_dense_model)
And I get this:
Epoch 1/20
1532/1532 [==============================] - 11s 7ms/step - loss: 0.9995 - mean_absolute_error: 0.8084 - val_loss: 0.9425 - val_mean_absolute_error: 0.7799
Epoch 2/20
1532/1532 [==============================] - 10s 7ms/step - loss: 0.9995 - mean_absolute_error: 0.8084 - val_loss: 0.9425 - val_mean_absolute_error: 0.7799
Epoch 3/20
1532/1532 [==============================] - 10s 7ms/step - loss: 0.9995 - mean_absolute_error: 0.8084 - val_loss: 0.9425 - val_mean_absolute_error: 0.7799
Epoch 4/20
1532/1532 [==============================] - 10s 7ms/step - loss: 0.9995 - mean_absolute_error: 0.8084 - val_loss: 0.9425 - val_mean_absolute_error: 0.7799
Epoch 5/20
1532/1532 [==============================] - 10s 7ms/step - loss: 0.9995 - mean_absolute_error: 0.8084 - val_loss: 0.9425 - val_mean_absolute_error: 0.7799
Epoch 6/20
1532/1532 [==============================] - 10s 7ms/step - loss: 0.9995 - mean_absolute_error: 0.8084 - val_loss: 0.9425 - val_mean_absolute_error: 0.7799
Epoch 7/20
1532/1532 [==============================] - 11s 7ms/step - loss: 0.9995 - mean_absolute_error: 0.8084 - val_loss: 0.9425 - val_mean_absolute_error: 0.7799
Any idea why this might be? I have tried a lot of things but can't get it to match. Also, model.summary() prints pretty much the same for both (the Sequential summary omits the Input layer, but I don't think that makes a difference, since for the functional Model you have to specify the Input explicitly).
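As an aside, model.summary() does not display activation functions, so two models can print near-identical summaries and still behave differently. A minimal sketch for inspecting what the summary hides, assuming one of the models defined above is in scope:

# model.summary() omits activations; each layer's config includes them.
for layer in multi_dense_model.layers:
    if isinstance(layer, tf.keras.layers.Dense):
        print(layer.name, layer.get_config()['activation'])  # e.g. 'relu' or 'linear'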
This is the complete code I am using, in case you want to copy-paste it:
import os
import datetime
import IPython
import IPython.display
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf
mpl.rcParams['figure.figsize'] = (8, 6)
mpl.rcParams['axes.grid'] = False
zip_path = tf.keras.utils.get_file(
    origin='https://storage.googleapis.com/tensorflow/tf-keras-datasets/jena_climate_2009_2016.csv.zip',
    fname='jena_climate_2009_2016.csv.zip',
    extract=True)
csv_path, _ = os.path.splitext(zip_path)
df = pd.read_csv(csv_path)
# Slice [start:stop:step], starting from index 5 take every 6th record.
df = df[5::6]
date_time = pd.to_datetime(df.pop('Date Time'), format='%d.%m.%Y %H:%M:%S')
wv = df['wv (m/s)']
bad_wv = wv == -9999.0
wv[bad_wv] = 0.0
max_wv = df['max. wv (m/s)']
bad_max_wv = max_wv == -9999.0
max_wv[bad_max_wv] = 0.0
# The above inplace edits are reflected in the DataFrame.
df['wv (m/s)'].min()
wv = df.pop('wv (m/s)')
max_wv = df.pop('max. wv (m/s)')
# Convert to radians.
wd_rad = df.pop('wd (deg)')*np.pi / 180
# Calculate the wind x and y components.
df['Wx'] = wv*np.cos(wd_rad)
df['Wy'] = wv*np.sin(wd_rad)
# Calculate the max wind x and y components.
df['max Wx'] = max_wv*np.cos(wd_rad)
df['max Wy'] = max_wv*np.sin(wd_rad)
timestamp_s = date_time.map(pd.Timestamp.timestamp)
day = 24*60*60
year = (365.2425)*day
df['Day sin'] = np.sin(timestamp_s * (2 * np.pi / day))
df['Day cos'] = np.cos(timestamp_s * (2 * np.pi / day))
df['Year sin'] = np.sin(timestamp_s * (2 * np.pi / year))
df['Year cos'] = np.cos(timestamp_s * (2 * np.pi / year))
column_indices = {name: i for i, name in enumerate(df.columns)}
n = len(df)
train_df = df[0:int(n*0.7)]
val_df = df[int(n*0.7):int(n*0.9)]
test_df = df[int(n*0.9):]
num_features = df.shape[1]
train_mean = train_df.mean()
train_std = train_df.std()
train_df = (train_df - train_mean) / train_std
val_df = (val_df - train_mean) / train_std
test_df = (test_df - train_mean) / train_std
df_std = (df - train_mean) / train_std
df_std = df_std.melt(var_name='Column', value_name='Normalized')
class WindowGenerator():
  def __init__(self, input_width, label_width, shift,
               train_df=train_df, val_df=val_df, test_df=test_df,
               label_columns=None):
    # Store the raw data.
    self.train_df = train_df
    self.val_df = val_df
    self.test_df = test_df

    # Work out the label column indices.
    self.label_columns = label_columns
    if label_columns is not None:
      self.label_columns_indices = {name: i for i, name in
                                    enumerate(label_columns)}
    self.column_indices = {name: i for i, name in
                           enumerate(train_df.columns)}

    # Work out the window parameters.
    self.input_width = input_width
    self.label_width = label_width
    self.shift = shift
    self.total_window_size = input_width + shift
    self.input_slice = slice(0, input_width)
    self.input_indices = np.arange(self.total_window_size)[self.input_slice]
    self.label_start = self.total_window_size - self.label_width
    self.labels_slice = slice(self.label_start, None)
    self.label_indices = np.arange(self.total_window_size)[self.labels_slice]

  def __repr__(self):
    return '\n'.join([
        f'Total window size: {self.total_window_size}',
        f'Input indices: {self.input_indices}',
        f'Label indices: {self.label_indices}',
        f'Label column name(s): {self.label_columns}'])

def make_dataset(self, data):
  data = np.array(data, dtype=np.float32)
  ds = tf.keras.utils.timeseries_dataset_from_array(
      data=data,
      targets=None,
      sequence_length=self.total_window_size,
      sequence_stride=1,
      shuffle=True,
      batch_size=32,)
  ds = ds.map(self.split_window)
  return ds

WindowGenerator.make_dataset = make_dataset

@property
def train(self):
  return self.make_dataset(self.train_df)

@property
def val(self):
  return self.make_dataset(self.val_df)

@property
def test(self):
  return self.make_dataset(self.test_df)

@property
def example(self):
  """Get and cache an example batch of `inputs, labels` for plotting."""
  result = getattr(self, '_example', None)
  if result is None:
    # No example batch was found, so get one from the `.train` dataset
    result = next(iter(self.train))
    # And cache it for next time
    self._example = result
  return result

WindowGenerator.train = train
WindowGenerator.val = val
WindowGenerator.test = test
WindowGenerator.example = example

def split_window(self, features):
  inputs = features[:, self.input_slice, :]
  labels = features[:, self.labels_slice, :]
  if self.label_columns is not None:
    labels = tf.stack(
        [labels[:, :, self.column_indices[name]] for name in self.label_columns],
        axis=-1)

  # Slicing doesn't preserve static shape information, so set the shapes
  # manually. This way the `tf.data.Datasets` are easier to inspect.
  inputs.set_shape([None, self.input_width, None])
  labels.set_shape([None, self.label_width, None])

  return inputs, labels

WindowGenerator.split_window = split_window
OUT_STEPS = 24
multi_window = WindowGenerator(input_width=24,
                               label_width=OUT_STEPS,
                               shift=OUT_STEPS)
MAX_EPOCHS = 20
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
# multi_dense_model = tf.keras.Sequential()
# multi_dense_model.add(tf.keras.layers.Input(shape=(24, 19)))
# multi_dense_model.add(tf.keras.layers.Lambda(lambda x: x[:, -1:, :]))
# multi_dense_model.add(tf.keras.layers.Dense(512, activation='relu'))
# multi_dense_model.add(tf.keras.layers.Dense(OUT_STEPS*num_features, kernel_initializer=tf.initializers.zeros()))
# multi_dense_model.add(tf.keras.layers.Reshape([OUT_STEPS, num_features]))
input1 = tf.keras.layers.Input(shape=(24, 19))
lamb1 = tf.keras.layers.Lambda(lambda x: x[:, -1:, :])(input1)
dense1 = tf.keras.layers.Dense(512, activation='relu')(lamb1)
dense2 = tf.keras.layers.Dense(OUT_STEPS*num_features, activation='relu')(dense1)
resha1 = tf.keras.layers.Reshape([OUT_STEPS, num_features])(dense2)
multi_dense_model = tf.keras.models.Model(inputs=input1, outputs=resha1)
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2, mode='min')
multi_dense_model.compile(loss=tf.keras.losses.MeanSquaredError(), optimizer=tf.keras.optimizers.Adam(), metrics=[tf.keras.metrics.MeanAbsoluteError()], run_eagerly=True)
history = multi_dense_model.fit(multi_window.train, epochs=MAX_EPOCHS, validation_data=multi_window.val, callbacks=[early_stopping])
Solution
Most likely because you are applying two non-linearities:
#Sequential
multi_dense_model.add(tf.keras.layers.Dense(512, activation='relu'))
multi_dense_model.add(tf.keras.layers.Dense(OUT_STEPS*num_features, kernel_initializer=tf.initializers.zeros()))
#Functional
dense1 = tf.keras.layers.Dense(512, activation='relu')(lamb1)
dense2 = tf.keras.layers.Dense(OUT_STEPS*num_features, kernel_initializer=tf.initializers.zeros(), activation='relu')(dense1)
# extra activation='relu' here ----------------> ^^^^^^^^^^^^^^^^^
By definition, a Dense layer with no activation is a linear layer, so the two models are not equivalent. This also explains why the loss never moves: the second Dense layer starts with a zero kernel (kernel_initializer=tf.initializers.zeros()) and, by default, a zero bias, so its pre-activations are exactly zero, and ReLU's gradient at zero is zero in TensorFlow, meaning no gradient ever reaches the weights.
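A minimal demonstration of that last point, in plain TensorFlow and independent of the question's code:

import tensorflow as tf

# ReLU's gradient at x == 0 is 0, so a layer whose pre-activations
# start at exactly zero never receives an update.
x = tf.constant(0.0)
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.nn.relu(x)
print(tape.gradient(y, x))  # tf.Tensor(0.0, shape=(), dtype=float32)

To make the functional model match the Sequential one, drop the extra activation so the second Dense layer stays linear. A sketch reusing the names from the question:

# Second Dense has no activation, exactly as in the Sequential tutorial model.
input1 = tf.keras.layers.Input(shape=(24, 19))
lamb1 = tf.keras.layers.Lambda(lambda x: x[:, -1:, :])(input1)
dense1 = tf.keras.layers.Dense(512, activation='relu')(lamb1)
dense2 = tf.keras.layers.Dense(OUT_STEPS*num_features,
                               kernel_initializer=tf.initializers.zeros())(dense1)
resha1 = tf.keras.layers.Reshape([OUT_STEPS, num_features])(dense2)
multi_dense_model = tf.keras.models.Model(inputs=input1, outputs=resha1)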
Answered By - Alberto Sinigaglia