Issue
I have the script below, which is supposed to use cross-validation to train different models and compute their mean accuracy, so that I can pick the best model for a classification task. However, I am getting identical results for every classifier.
The results look like this:
---Filename in processed................ corpusAmazon_train
etiquette : [0 1]
Embeddings bert model used.................... : sm
Model name: Model_LSVC_ovr
------------cross val predict used----------------
accuracy with cross_val_predict : 0.6582974014576258
corpusAmazon_train file terminated---
---------------cross val score used -----------------------
[0.66348722 0.66234262 0.63334605 0.66959176 0.66081648 0.6463182
0.66730256 0.65572519 0.65648855 0.66755725]
0.66 accuracy with a standard deviation of 0.01
Model name: Model_G_NB
------------cross val predict used----------------
accuracy with cross_val_predict : 0.6582974014576258
corpusAmazon_train file terminated---
---------------cross val score used -----------------------
[0.66348722 0.66234262 0.63334605 0.66959176 0.66081648 0.6463182
0.66730256 0.65572519 0.65648855 0.66755725]
0.66 accuracy with a standard deviation of 0.01
Model name: Model_LR
------------cross val predict used----------------
accuracy with cross_val_predict : 0.6582974014576258
corpusAmazon_train file terminated---
---------------cross val score used -----------------------
[0.66348722 0.66234262 0.63334605 0.66959176 0.66081648 0.6463182
0.66730256 0.65572519 0.65648855 0.66755725]
0.66 accuracy with a standard deviation of 0.01
The code for the cross-validation function:

from sklearn.model_selection import KFold, cross_val_predict, cross_val_score
from sklearn.metrics import accuracy_score

# model1..model8 come from classifiers_b() below
models_list = {'Model_LSVC_ovr': model1, 'Model_G_NB': model2, 'Model_LR': model3, 'Model_RF': model4,
               'Model_KN': model5, 'Model_MLP': model6, 'Model_LDA': model7, 'Model_XGB': model8}

# cross_validation
def cross_validation(features, ylabels, models_list, n, lge_model):
    cv_splitter = KFold(n_splits=10, shuffle=True, random_state=42)
    # extract FlauBERT embeddings from the raw features
    features, s = get_flaubert_layer(features, lge_model)
    for model_name, model in models_list.items():
        print("Model name: {}".format(model_name))
        print("------------cross val predict used----------------", "\n")
        y_pred = cross_val_predict(model, features, ylabels, cv=cv_splitter, verbose=1)
        accuracy_score_predict = accuracy_score(ylabels, y_pred)
        print("accuracy with cross_val_predict :", accuracy_score_predict)
        print("---------------cross val score used -----------------------", "\n")
        scores = cross_val_score(model, features, ylabels, scoring='accuracy', cv=cv_splitter)
        print(scores)
        # the original printed undefined names here; compute them from scores
        accuracy_score_mean, accuracy_score_std = scores.mean(), scores.std()
        print("%0.2f accuracy with a standard deviation of %0.2f" % (accuracy_score_mean, accuracy_score_std), "\n")
Even when using cross_val_score, the same accuracy is given for every model. Any idea what is going on? Could it be because I used random_state in my cross_validation function?
The code defining the models:

from sklearn.svm import LinearSVC
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from xgboost import XGBClassifier

def classifiers_b():
    model1 = LinearSVC()
    model2 = GaussianNB()  # MultinomialNB() rejects negative feature values
    model3 = LogisticRegression()
    model4 = RandomForestClassifier()
    model5 = KNeighborsClassifier()
    model6 = MLPClassifier(hidden_layer_sizes=(50, 100, 50), max_iter=500, activation='relu',
                           solver='adam', random_state=1)
    model7 = LinearDiscriminantAnalysis()
    model8 = XGBClassifier(eval_metric="logloss")
    # return the models so the caller can build models_list from them
    return {'Model_LSVC_ovr': model1, 'Model_G_NB': model2, 'Model_LR': model3, 'Model_RF': model4,
            'Model_KN': model5, 'Model_MLP': model6, 'Model_LDA': model7, 'Model_XGB': model8}
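For context, everything is wired together roughly like this (simplified; the variable names here are placeholders, not my exact code):

# Simplified wiring (placeholder variable names):
models_list = classifiers_b()
# 'sm' selects the small FlauBERT model, as shown in the log above
cross_validation(train_texts, train_labels, models_list, 10, 'sm')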
Solution
I would suggest using a separate pipeline for each model. It looks like you are performing CV on the same model on each iteration. You can check the scikit-learn Pipeline documentation for more information on how to use pipelines, and then perform CV on each model's pipeline. As for random_state: fixing it in KFold only makes the fold assignment reproducible, so every model is scored on the same splits; it cannot by itself make genuinely different models produce identical scores.
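For example, here is a minimal self-contained sketch of that suggestion, using synthetic data from make_classification in place of the BERT embeddings and only three of the eight classifiers (model_factories is an illustrative name, not part of the original script):

from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the embedding matrix and labels
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

cv_splitter = KFold(n_splits=10, shuffle=True, random_state=42)

# One factory per classifier, so each iteration builds a fresh, independent pipeline
model_factories = {
    'Model_LSVC_ovr': lambda: LinearSVC(),
    'Model_G_NB': lambda: GaussianNB(),
    'Model_LR': lambda: LogisticRegression(max_iter=1000),
}

for name, make_model in model_factories.items():
    pipe = Pipeline([('scaler', StandardScaler()), ('clf', make_model())])
    scores = cross_val_score(pipe, X, y, scoring='accuracy', cv=cv_splitter)
    print("%s: %0.2f accuracy with a standard deviation of %0.2f"
          % (name, scores.mean(), scores.std()))

With genuinely different estimators, the per-model scores should now differ. If they are still identical, print id(model) for each entry of models_list to confirm the dictionary actually holds eight distinct estimator objects rather than eight references to the same one.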
Answered By - Jesús Hernández