Issue
I have the script below, which is supposed to use cross-validation to train different models and compute their mean accuracy, so that I can pick the best model for a classification task. However, I am getting identical results for every classifier.
The results look like this:
---Filename in processed................ corpusAmazon_train
etiquette : [0 1]
Embeddings bert model used.................... : sm
Model name: Model_LSVC_ovr
------------cross val predict used----------------
accuracy with cross_val_predict : 0.6582974014576258
corpusAmazon_train file terminated---
---------------cross val score used -----------------------
[0.66348722 0.66234262 0.63334605 0.66959176 0.66081648 0.6463182
0.66730256 0.65572519 0.65648855 0.66755725]
0.66 accuracy with a standard deviation of 0.01
Model name: Model_G_NB
------------cross val predict used----------------
accuracy with cross_val_predict : 0.6582974014576258
corpusAmazon_train file terminated---
---------------cross val score used -----------------------
[0.66348722 0.66234262 0.63334605 0.66959176 0.66081648 0.6463182
0.66730256 0.65572519 0.65648855 0.66755725]
0.66 accuracy with a standard deviation of 0.01
Model name: Model_LR
------------cross val predict used----------------
accuracy with cross_val_predict : 0.6582974014576258
corpusAmazon_train file terminated---
---------------cross val score used -----------------------
[0.66348722 0.66234262 0.63334605 0.66959176 0.66081648 0.6463182
0.66730256 0.65572519 0.65648855 0.66755725]
0.66 accuracy with a standard deviation of 0.01
The code for the cross-validation function:

from sklearn.model_selection import KFold, cross_val_predict, cross_val_score
from sklearn.metrics import accuracy_score

# model1..model8 come from classifiers_b() below
models_list = {'Model_LSVC_ovr': model1, 'Model_G_NB': model2, 'Model_LR': model3, 'Model_RF': model4,
               'Model_KN': model5, 'Model_MLP': model6, 'Model_LDA': model7, 'Model_XGB': model8}

# cross_validation
def cross_validation(features, ylabels, models_list, n, lge_model):
    cv_splitter = KFold(n_splits=10, shuffle=True, random_state=42)
    # extract FlauBERT embeddings from the raw features
    features, s = get_flaubert_layer(features, lge_model)
    for model_name, model in models_list.items():
        print("Model name: {}".format(model_name))
        print("------------cross val predict used----------------", "\n")
        y_pred = cross_val_predict(model, features, ylabels, cv=cv_splitter, verbose=1)
        accuracy_score_predict = accuracy_score(ylabels, y_pred)
        print("accuracy with cross_val_predict :", accuracy_score_predict)
        print("---------------cross val score used -----------------------", "\n")
        scores = cross_val_score(model, features, ylabels, scoring='accuracy', cv=cv_splitter)
        print(scores)
        # the original printed undefined names here; compute them from scores
        accuracy_score_mean, accuracy_score_std = scores.mean(), scores.std()
        print("%0.2f accuracy with a standard deviation of %0.2f" % (accuracy_score_mean, accuracy_score_std), "\n")
Even when using cross_val_score, the same accuracy is given for every model. Any idea what is going on? Could it be because I used random_state in my cross_validation function?
The code defining the models:

from sklearn.svm import LinearSVC
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from xgboost import XGBClassifier

def classifiers_b():
    model1 = LinearSVC()
    model2 = GaussianNB()  # MultinomialNB() rejects negative feature values
    model3 = LogisticRegression()
    model4 = RandomForestClassifier()
    model5 = KNeighborsClassifier()
    model6 = MLPClassifier(hidden_layer_sizes=(50, 100, 50), max_iter=500, activation='relu',
                           solver='adam', random_state=1)
    model7 = LinearDiscriminantAnalysis()
    model8 = XGBClassifier(eval_metric="logloss")
    # return the models so the caller can build models_list from them
    return {'Model_LSVC_ovr': model1, 'Model_G_NB': model2, 'Model_LR': model3, 'Model_RF': model4,
            'Model_KN': model5, 'Model_MLP': model6, 'Model_LDA': model7, 'Model_XGB': model8}
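For context, everything is wired together roughly like this (simplified; the variable names here are placeholders, not my exact code):

# Simplified wiring (placeholder variable names):
models_list = classifiers_b()
# 'sm' selects the small FlauBERT model, as shown in the log above
cross_validation(train_texts, train_labels, models_list, 10, 'sm')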
Solution
I would suggest using a separate pipeline for each model. It looks like you are performing CV on the same model on each iteration. You can check the scikit-learn Pipeline documentation for more information on how to use pipelines, and then perform CV on each model's pipeline. As for random_state: fixing it in KFold only makes the fold assignment reproducible, so every model is scored on the same splits; it cannot by itself make genuinely different models produce identical scores.
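For example, here is a minimal self-contained sketch of that suggestion, using synthetic data from make_classification in place of the BERT embeddings and only three of the eight classifiers (model_factories is an illustrative name, not part of the original script):

from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the embedding matrix and labels
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

cv_splitter = KFold(n_splits=10, shuffle=True, random_state=42)

# One factory per classifier, so each iteration builds a fresh, independent pipeline
model_factories = {
    'Model_LSVC_ovr': lambda: LinearSVC(),
    'Model_G_NB': lambda: GaussianNB(),
    'Model_LR': lambda: LogisticRegression(max_iter=1000),
}

for name, make_model in model_factories.items():
    pipe = Pipeline([('scaler', StandardScaler()), ('clf', make_model())])
    scores = cross_val_score(pipe, X, y, scoring='accuracy', cv=cv_splitter)
    print("%s: %0.2f accuracy with a standard deviation of %0.2f"
          % (name, scores.mean(), scores.std()))

With genuinely different estimators, the per-model scores should now differ. If they are still identical, print id(model) for each entry of models_list to confirm the dictionary actually holds eight distinct estimator objects rather than eight references to the same one.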
Answered By - Jesús Hernández