Issue
I would like to determine which of several Designs of Experiments (DOEs) predicts my data best. For this purpose I set up an Optuna hyperparameter optimizer, programmed as shown in the code below. It performs cross-validation and then runs an additional 10 iterations per step to obtain a stable Mean Squared Error (MSE) to optimize against. The optimization returns a single hidden layer with approximately 25 neurons. In my actual model this configuration yields good MSE and R_test / R_total values. However, when predicting on a different dataset, I observe a deviation of over 50%. I have attached my model and the Hyperparameter Optimization (HPO) script below and would greatly appreciate constructive guidance. My datasets consist of 16 data points with 5 features (X) and a single output value (y).
HPO:
from sklearn.model_selection import train_test_split , cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import MinMaxScaler
import optuna
import numpy as np
import sqlite3
import pandas as pd
# Read data from "Versuchspläne_final.xlsx" for modeling
data_df = pd.read_excel(r"cVersuche_V01_FF.xlsx", sheet_name="Probentabelle", header=1)
# Assuming 'X' columns are ['Temperatur', 'Anteil PP505', 'Drehzahl', 'Anteil Peroxid']
X_noscalar = data_df[['M%1', 'M%2','M%3', 'Drehzahl n/min-1', 'Endzonentemperatur °C']]
y = data_df[['MVR MW']].values.ravel()
# Read scaling parameters from "alle_Werte.xlsx"
scaling_data = pd.read_excel(r"c:.xlsx", sheet_name="Probentabelle", header=1)
scaler_y = MinMaxScaler()
scaler_x = MinMaxScaler()
scaler_y.fit(scaling_data[['MVR MW']]) # Assuming 'MVR' is the target variable in alle_Werte.xlsx
scaler_x.fit(scaling_data[['M%1', 'M%2','M%3', 'Drehzahl n/min-1', 'Endzonentemperatur °C']])
# Apply the previously calculated scaling to 'X' data
X = scaler_x.transform(X_noscalar)
y_scaled = scaler_y.transform(y.reshape(-1, 1)).ravel()
num_splits = 5
shuffled_indices = [train_test_split(range(len(X)), test_size=0.2, shuffle=True) for _ in range(num_splits)]
def objective(trial):
    hidden_layer_sizes = tuple([trial.suggest_int(f'n_units_layer_{i}', 1, 100) for i in range(trial.suggest_int('n_layers', 1, 2))])
    alpha = trial.suggest_float('alpha', 0.0001, 0.1, log=True)
    learning_rate_init = trial.suggest_float('learning_rate_init', 0.001, 0.1, log=True)
    learning_rate = trial.suggest_categorical('learning_rate', ['constant', 'invscaling', 'adaptive'])
    mean_mse_test = 0
    mean_mse_value = 0
    num_iterations = 10
    for _ in range(num_iterations):
        for indices in shuffled_indices:
            train_indices, test_indices = indices
            X_train, X_test = X[train_indices], X[test_indices]
            y_train, y_test = y_scaled[train_indices], y_scaled[test_indices]
            model = MLPRegressor(
                hidden_layer_sizes=hidden_layer_sizes,
                alpha=alpha, max_iter=1000,
                learning_rate=learning_rate,
                learning_rate_init=learning_rate_init,
                solver='lbfgs',
                early_stopping=True,
                validation_fraction=0.1,  # fraction of the data used for validation
                n_iter_no_change=10,  # iterations without improvement on the validation metric before training stops
                tol=1e-3,  # tolerance for early stopping when the improvement is smaller than this value
                verbose=True)
            model.fit(X_train, y_train)
            y_pred = model.predict(X_test)
            mse_test = mean_squared_error(y_test, y_pred)
            mean_mse_test += mse_test
        mean_mse_test /= num_splits
        return mean_mse_test
    mean_mse_value /= num_iterations
    return mean_mse_value
if __name__ == "__main__":
    study = optuna.create_study(study_name="mlp_hyperparam_opt4", storage="sqlite:///mlp_optuna.db", load_if_exists=True)
    study.optimize(objective, n_trials=100)
    # Show the best hyperparameters and results
    print("Best trial:")
    trial = study.best_trial
    print("Value: ", trial.value)
    print("Params: ")
    for key, value in trial.params.items():
        print(f" {key}: {value}")

if __name__ == "__main__":
    study = optuna.create_study(study_name="mlp_hyperparam_opt_ff", storage="sqlite:///mlp_optuna.db", load_if_exists=True)
    study.optimize(objective, n_trials=300)
    # Get the best 10 trials
    top_trials = study.trials[:10]
    # Show the best 10 trials and their parameters
    for i, trial in enumerate(top_trials, 1):
        print(f"Top {i} trial:")
        print("Value:", trial.value)
        print("Params:")
        for key, value in trial.params.items():
            print(f" {key}: {value}")
Model:
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import MinMaxScaler
import pandas as pd
import numpy as np
import joblib
# Read data from "alle_Werte.xlsx" to calculate scaling parameters
scaling_data = pd.read_excel(r"c:\Versuche_V01_gesamt.xlsx", sheet_name="Probentabelle", header=1)
scaler_y = MinMaxScaler()
scaler_x = MinMaxScaler()
scaler_y.fit(scaling_data[['MVR MW']]) # Assuming 'MVR' is the target variable in alle_Werte.xlsx
scaler_x.fit(scaling_data[['M%1', 'M%2','M%3', 'Drehzahl n/min-1', 'Endzonentemperatur °C']])
# Read data from "Versuchspläne_final.xlsx" for modeling
data_df = pd.read_excel(r"c:Versuche_V01_FF.xlsx", sheet_name="Probentabelle", header=1)
# Assuming 'X' columns are ['Temperatur', 'Anteil PP505', 'Drehzahl', 'Anteil Peroxid']
X_noscalar = data_df[['M%1', 'M%2','M%3', 'Drehzahl n/min-1', 'Endzonentemperatur °C']]
y = data_df[['MVR MW']].values.ravel()
#print(X_noscalar)
# Apply the previously calculated scaling to 'X' data from "Versuchspläne_final.xlsx"
X = scaler_x.transform(X_noscalar)
# Scale the target variable 'y' using the scaler for 'y'
y_scaled = scaler_y.transform(y.reshape(-1, 1)).ravel()
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y_scaled, test_size=0.2, random_state=12)
# Number of splits for the cross-validation
num_splits = 5
# Initialize lists to store the results of each split
mse_scores = []
r2_test_scores = []
# Run the cross-validation
for _ in range(num_splits):
    # Split the data into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y_scaled, test_size=0.2, random_state=1)  # random_state=None for a random split
    # Initialize and train the MLP model
    model = MLPRegressor(
        hidden_layer_sizes=(3),
        #max_iter=2000,
        alpha=0.019149003457879503,
        learning_rate='constant',
        learning_rate_init=0.0451257591571734,
        solver='lbfgs',
        early_stopping=True,
        validation_fraction=0.1,  # fraction of the data used for validation
        n_iter_no_change=10,  # iterations without improvement on the validation metric before training stops
        tol=1e-3,  # tolerance for early stopping when the improvement is smaller than this value
        verbose=True
    )
    model.fit(X_train, y_train)
    # Predictions for the test set
    y_pred_test = model.predict(X_test)
    y_pred_train = model.predict(X_train)
    y_pred = model.predict(X)
    # Calculate the Mean Squared Error (MSE)
    #mse = mean_squared_error(y_test, y_pred_test)
    #mse_scores.append(mse)
    # Calculate the R2 value for the test set (here with train/test split)
    r2_test = r2_score(y_test, y_pred_test)
    r2_test_scores.append(r2_test)
    print('r2_test', r2_test)
    r2_train = r2_score(y_train, y_pred_train)
    #r2_train_scores.append(r2_train)
    print('R2_train: ', r2_train)
    r2 = r2_score(y_scaled, y_pred)
    #r2_train_scores.append(r2_train)
    print('R2: ', r2)
# Output the results
mse_scores = mean_squared_error(y_test, y_pred_test)
print("MSE Scores:", mse_scores)
# Calculate the mean values of MSE and R2
mean_mse = np.mean(mse_scores)
mean_r2 = np.mean(r2_test_scores)
print("Mean MSE across splits:", mean_mse)
print("Mean R2 across splits:", mean_r2)
model_filename = "FF_test_22_1.joblib"
joblib.dump(model, model_filename)
new_df = pd.read_excel(r"c:\Users\janbu\Desktop\Bachelorarbeit\Versuchspläne_L\Versuche_V01_stat.xlsx",
                       sheet_name="Probentabelle", header=1)
X_new_unscaled = new_df[['M%1', 'M%2', 'M%3', 'Drehzahl n/min-1', 'Endzonentemperatur °C']]
# Scale the new data
X_new = scaler_x.transform(X_new_unscaled)
# Run the prediction for the new data
predicted_scaled_y = model.predict(X_new)
# Invert the scaling to recover the actual y value
predicted_y = scaler_y.inverse_transform(predicted_scaled_y.reshape(-1, 1)).ravel()
print("Predicted y:", predicted_y)
new_df = pd.read_excel(r"c:\Users\janbu\Desktop\Bachelorarbeit\Versuchspläne_L\Versuche_V01_stat.xlsx",
                       sheet_name="Probentabelle", header=1)
# Extract the actual y values of the new data
actual_y_new = new_df[['MVR MW']].values.ravel()
print("Actual y:", actual_y_new)
# Scale the new data
X_new = scaler_x.transform(X_new_unscaled)
# Run the prediction for the new data
predicted_scaled_y = model.predict(X_new)
# Invert the scaling to recover the actual y value
predicted_y = scaler_y.inverse_transform(predicted_scaled_y.reshape(-1, 1)).ravel()
# Calculate the deviation between the predicted and the actual values
#deviation = actual_y_new - predicted_y
# Add the deviation to the DataFrame of the new data
new_df['Predicted MVR MW'] = predicted_y
What I have already tried: limiting the maximum number of layers and neurons, and adding extra iteration loops to the cross-validation.
Solution
My datasets consist of 16 data points with 5 features (X) and a single output value (y).
This is far too little data to fit a neural network. You should use something with fewer parameters, such as a linear regression.
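As a rough sketch of what that could look like (reusing the X and y_scaled arrays from your script; with only 16 points, leave-one-out cross-validation is a natural way to estimate the error, but this is only one possible setup):

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
import numpy as np

# Leave-one-out CV: train on 15 points, test on the single held-out point, 16 times
loo = LeaveOneOut()
linreg = LinearRegression()
scores = cross_val_score(linreg, X, y_scaled, cv=loo, scoring='neg_mean_squared_error')
print("Leave-one-out mean MSE:", -np.mean(scores))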
There are other issues at play as well, such as the cross-validation not being implemented correctly:
shuffled_indices = [train_test_split(range(len(X)), test_size=0.2, shuffle=True) for _ in range(num_splits)]
This allows a single point to end up in the test set in multiple folds. Cross-validation is supposed to divide the data into k parts and then use each of those k parts as the test set exactly once, without re-shuffling the data between folds.
This wouldn't matter much if you had more data, but as it is, each test set ends up with only 3 or 4 data points. With a test set that small, luck matters a lot.
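For reference, this is roughly how k-fold cross-validation is normally set up with scikit-learn's KFold (shown with the X and y_scaled arrays from your script; every sample lands in exactly one test fold):

from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_indices, test_indices in kf.split(X):
    # each sample appears in exactly one of the 5 test folds
    X_train, X_test = X[train_indices], X[test_indices]
    y_train, y_test = y_scaled[train_indices], y_scaled[test_indices]
    # ... fit and score the model here ...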
However, when predicting on a different dataset, I observe a deviation of over 50%.
This could be the result of the model overfitting, or of a distribution shift between the two datasets. It's hard to say from the details provided.
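One quick way to check for a distribution shift is to pass the new data through the scaler that was fitted on the training data: features that fall outside the fitted range end up outside [0, 1]. A minimal sketch, assuming the scaler_x and X_new_unscaled objects from your script:

import numpy as np

X_new_scaled = scaler_x.transform(X_new_unscaled)
print("min per feature:", X_new_scaled.min(axis=0))
print("max per feature:", X_new_scaled.max(axis=0))
# rows with at least one feature outside the range the model was trained on
out_of_range = (X_new_scaled < 0) | (X_new_scaled > 1)
print("samples outside the training range:", int(np.any(out_of_range, axis=1).sum()))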
Answered By - Nick ODell