Issue
I applied random search first and then grid search to tune my MLPRegressor. The R^2 for the optimal parameters suggested by random search (hidden layers (18, 18, 18), R^2 = 0.90) is better than the R^2 for the parameters suggested by grid search (hidden layers (17, 17, 17), R^2 = 0.89). Why is that?
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import RandomizedSearchCV

mlp = MLPRegressor(random_state=42)
param_grid_random = {'hidden_layer_sizes': [(18,), (18, 18), (18, 18, 18)],
                     'activation': ['tanh', 'relu', 'logistic'],
                     'solver': ['sgd', 'adam'],
                     'learning_rate': ['constant', 'adaptive', 'invscaling'],
                     'alpha': [0.0001, 0.05],
                     'max_iter': [10000000000],
                     'early_stopping': [False],
                     'warm_start': [False]}
GS_random = RandomizedSearchCV(mlp, param_distributions=param_grid_random,
                               n_jobs=1, cv=5, scoring='r2',
                               n_iter=100, random_state=42)  # scoring='neg_mean_squared_error'
GS_random.fit(X_train, y_train)
print(GS_random.best_params_)
{'warm_start': False, 'solver': 'adam', 'max_iter': 10000000000, 'learning_rate': 'constant', 'hidden_layer_sizes': (18, 18, 18), 'early_stopping': False, 'alpha': 0.05, 'activation': 'relu'}
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import GridSearchCV

mlp = MLPRegressor(random_state=42)
param_grid = {'hidden_layer_sizes': [(17, 17, 17), (18, 18, 18), (19, 19, 19)],
              'activation': ['tanh', 'relu', 'logistic'],
              'solver': ['sgd', 'adam'],
              'learning_rate': ['constant', 'adaptive', 'invscaling'],
              'alpha': [0.0001, 0.05],
              'max_iter': [10000000000],
              'early_stopping': [False],
              'warm_start': [False]}
GS = GridSearchCV(mlp, param_grid=param_grid, n_jobs=-1, cv=5,
                  scoring='r2')  # scoring='neg_mean_squared_error'
GS.fit(X_train, y_train)
print(GS.best_params_)
{'activation': 'relu', 'alpha': 0.05, 'early_stopping': False, 'hidden_layer_sizes': (17, 17, 17), 'learning_rate': 'constant', 'max_iter': 10000000000, 'solver': 'adam', 'warm_start': False}
mlp_new = MLPRegressor(hidden_layer_sizes=(17, 17, 17),
                       max_iter=10000000000, activation='relu',
                       solver='adam', learning_rate='constant',
                       alpha=0.05, validation_fraction=0.2,
                       random_state=0, early_stopping=False)
mlp_new.fit(X_train, y_train)
mlp_new_y_predict = mlp_new.predict(X_test)

from sklearn import metrics
MLP_r_square = metrics.r2_score(y_test, mlp_new_y_predict)
print('R-Square for the MLP Regressor is:', MLP_r_square)
Solution
RandomizedSearchCV takes a random_state parameter. This is different from the random state in the model itself (i.e. MLPRegressor): it controls how the candidate parameter combinations are sampled. GridSearchCV, which tries every combination exhaustively, has no random_state of its own, so any remaining randomness comes from the model and from how the data is split into cross-validation folds.
Try setting the same random state in the different calls. It is also a good idea to pass a specific CV splitter (rather than cv=5) so that both searches score every candidate on exactly the same folds and the results are reproducible.
Probably the simplest CV splitter is KFold. You can use it in your CV call with a specified random state like this (note that shuffle=True is required whenever a random_state is set):

from sklearn.model_selection import KFold

kfold_splitter = KFold(n_splits=5, shuffle=True, random_state=666)

GS_random = RandomizedSearchCV(...,
                               cv=kfold_splitter,
                               ...)
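For example, here is a minimal sketch (assuming the mlp, param_grid_random, param_grid, X_train and y_train objects defined in the question are still in scope) that passes one shared splitter to both searches, so any difference in best_score_ reflects the candidates tried rather than the folds used:

from sklearn.model_selection import KFold, GridSearchCV, RandomizedSearchCV

# One splitter, reused by both searches, so every candidate
# is scored on exactly the same folds
kfold_splitter = KFold(n_splits=5, shuffle=True, random_state=666)

GS_random = RandomizedSearchCV(mlp, param_distributions=param_grid_random,
                               n_iter=100, scoring='r2',
                               cv=kfold_splitter, random_state=42)
GS = GridSearchCV(mlp, param_grid=param_grid,
                  scoring='r2', cv=kfold_splitter)

GS_random.fit(X_train, y_train)
GS.fit(X_train, y_train)
print(GS_random.best_params_, GS_random.best_score_)
print(GS.best_params_, GS.best_score_)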
The cross-validation page in the scikit-learn documentation provides a good description of the different cross-validation iterators. The correct one to use depends on the problem at hand (e.g. do you need to stratify? do you have different groups you need to keep separate?).
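As a hypothetical sketch of the "groups" case: if your rows are not independent (e.g. repeated measurements of the same subject, with subject IDs in a groups array, which is assumed here rather than taken from the question), GroupKFold keeps each group entirely within a single fold:

from sklearn.model_selection import GroupKFold

# GroupKFold is deterministic, so no random_state is needed
group_splitter = GroupKFold(n_splits=5)

GS_grouped = GridSearchCV(mlp, param_grid=param_grid, scoring='r2',
                          cv=group_splitter)
GS_grouped.fit(X_train, y_train, groups=groups)  # `groups` is assumed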
Answered By - njp