Issue
I wrote an MLP and want to start tuning it to get the best results, but I'm stuck on several different MSE values.
from pandas import read_csv
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn import metrics
import numpy
import joblib
# load dataset
#dataframe = read_csv("housing.csv", delim_whitespace=True, header=None)
dataframe = read_csv("100.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:6]
Y = dataset[:,6]
# define the model
def larger_model():
    # create model
    model = Sequential()
    model.add(Dense(20, input_dim=6, kernel_initializer='normal', activation='relu'))
    model.add(Dense(50, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal', activation='linear'))
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae', 'mse'])
    return model
# evaluate model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=larger_model, epochs=100, batch_size=5, verbose=1)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=2)
results = cross_val_score(pipeline, X, Y, cv=kfold)
pipeline.fit(X, Y)
prediction = pipeline.predict(X)
result_test = Y
print("%.2f (%.2f) MSE" % (results.mean(), results.std()))
print('Mean Absolute Error:', metrics.mean_absolute_error(prediction, result_test))
print('Mean Squared Error:', metrics.mean_squared_error(prediction, result_test))
This gives me the following result:
Epoch 98/100
200/200 [==============================] - 0s 904us/step - loss: 0.0086 - mae: 0.0669 - mse: 0.0086
Epoch 99/100
200/200 [==============================] - 0s 959us/step - loss: 0.0032 - mae: 0.0382 - mse: 0.0032
Epoch 100/100
200/200 [==============================] - 0s 894us/step - loss: 0.0973 - mae: 0.2052 - mse: 0.0973
200/200 [==============================] - 0s 600us/step
21.959478
-0.03 (0.02) MSE
Mean Absolute Error: 0.1959771416462339
Mean Squared Error: 0.0705598179059006
So I see three different MSE results here. Why is that, and which one should I keep in mind as the overall model score when I tune it?
Solution
Basically, what I understood is: if you print the results variable you will get two MSE values, one per fold, because you used n_splits=2.
-0.03 (0.02) MSE
The output above is the mean of those per-fold results (MSE) and their standard deviation. The value is negative because scikit-learn scoring follows a higher-is-better convention, so the KerasRegressor wrapper reports the negated loss.
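If you want that cross-validation line to be easier to interpret, here is a minimal sketch that reuses the pipeline, X and Y defined above and requests scikit-learn's neg_mean_squared_error scorer explicitly (the shuffle and random_state are my own additions):

from sklearn.model_selection import cross_val_score, KFold

kfold = KFold(n_splits=2, shuffle=True, random_state=42)
# scoring='neg_mean_squared_error' makes the metric explicit: higher (less negative) is better
fold_scores = cross_val_score(pipeline, X, Y, cv=kfold, scoring='neg_mean_squared_error')
print(fold_scores)                             # one negated MSE per fold
print(-fold_scores.mean(), fold_scores.std())  # average MSE across folds and its spread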
Epoch 100/100
200/200 [==============================] - 0s 894us/step - loss: 0.0973 - mae: 0.2052 - mse: 0.0973
The mse = 0.0973 above is, I think, from one of the two splits: with n_splits=2 the model is trained on only 50% of the whole data (X), because the remaining 50% is held out as the validation fold.
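As a quick standalone check of that 50/50 split (the toy array here is hypothetical), KFold with n_splits=2 really does train each model on exactly half of the rows:

import numpy as np
from sklearn.model_selection import KFold

X_demo = np.arange(20).reshape(10, 2)  # hypothetical 10-row toy array
for fold, (train_idx, test_idx) in enumerate(KFold(n_splits=2).split(X_demo), start=1):
    # each fold trains on 5 rows and holds out the other 5
    print("fold", fold, "train rows:", len(train_idx), "test rows:", len(test_idx))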
Mean Squared Error: 0.0705598179059006
The output above comes from predicting on the whole dataset (not on a 50% fold) with the final fitted pipeline, so naturally you get three different MSE values from the three prints.
I am also solving a very similar kind of problem, so my suggestion is this: split the dataset into train and test sets, train on the train data, predict on the test data, and compute the MSE on the test set, as shown in the sketch below. Otherwise, keep the code as it is and take Mean Squared Error: 0.0705598179059006 as your final MSE, keeping in mind that it is computed on the same data the model was trained on.
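A minimal sketch of that train/test workflow, assuming the pipeline, X and Y defined in the question (the 80/20 split ratio and random_state are my assumptions):

from sklearn.model_selection import train_test_split
from sklearn import metrics

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
pipeline.fit(X_train, Y_train)        # fit the scaler and MLP on the training portion only
test_pred = pipeline.predict(X_test)  # predict on data the model has never seen
print('Test MAE:', metrics.mean_absolute_error(Y_test, test_pred))
print('Test MSE:', metrics.mean_squared_error(Y_test, test_pred))

The test MSE printed here is the number worth tracking while tuning, because it is measured on held-out data.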
Answered By - Vijay Anaparthi