Thursday, November 30, 2023

[FIXED] what is the final model to use when using stratified kfold cv?

python, scikit-learn

Issue

When using stratified k-fold CV, we fit the model once per fold, so with 5 folds, for example, we end up with 5 fitted models. My question is: which one is the final model to use for prediction? For example, the code below computes the accuracy for each of 10 folds; which fold's model should be used after training and fitting the data? Do we just pick the model from the fold with the highest accuracy?

https://www.geeksforgeeks.org/stratified-k-fold-cross-validation/

# Import Required Modules.
from statistics import mean, stdev
from sklearn import preprocessing
from sklearn.model_selection import StratifiedKFold
from sklearn import linear_model
from sklearn import datasets
  
# Fetch features and target variable as arrays.
cancer = datasets.load_breast_cancer()
# Input features.
x = cancer.data

# Target variable.
y = cancer.target
   
  
# Feature Scaling for input features.
scaler = preprocessing.MinMaxScaler()
x_scaled = scaler.fit_transform(x)
  
# Create  classifier object.
lr = linear_model.LogisticRegression()
  
# Create StratifiedKFold object.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
lst_accu_stratified = []
  
for train_index, test_index in skf.split(x, y):
    x_train_fold, x_test_fold = x_scaled[train_index], x_scaled[test_index]
    y_train_fold, y_test_fold = y[train_index], y[test_index]
    lr.fit(x_train_fold, y_train_fold)
    lst_accu_stratified.append(lr.score(x_test_fold, y_test_fold))
  
# Print the output.
print('List of possible accuracy:', lst_accu_stratified)
print('\nMaximum Accuracy That can be obtained from this model is:',
      max(lst_accu_stratified)*100, '%')
print('\nMinimum Accuracy:',
      min(lst_accu_stratified)*100, '%')
print('\nOverall Accuracy:',
      mean(lst_accu_stratified)*100, '%')
print('\nStandard Deviation is:', stdev(lst_accu_stratified))

Solution

We don't choose any of the models built during k-fold cross-validation as the final model; those fold models are discarded once they have been scored. Instead, we use k-fold CV

(i) to choose the hyperparameters: the setting with the highest mean cross-validated accuracy is selected, and the model is then retrained with those hyperparameters on the entire dataset. That retrained model is the final model.

(ii) to understand the average performance of the model across different subsets of the data, by looking at the mean (and spread) of the per-fold scores.
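The two uses of cross-validation in this answer, followed by the final fit on all of the data, can be sketched roughly as below. This is a minimal sketch, not the original poster's code: it assumes scikit-learn's `make_pipeline`, `cross_val_score` and `GridSearchCV`, and the grid of `C` values and `max_iter=1000` are illustrative assumptions. The scaler is placed inside the pipeline so that it is fitted only on the training part of each fold.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

x, y = load_breast_cancer(return_X_y=True)

# Scaler + classifier as one estimator: the scaler is re-fitted on the
# training part of every fold, so the held-out fold never influences scaling.
pipe = make_pipeline(MinMaxScaler(), LogisticRegression(max_iter=1000))
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)

# (ii) Estimate performance. The ten fitted fold models are thrown away;
# only their scores are kept.
scores = cross_val_score(pipe, x, y, cv=skf, scoring="accuracy")
print("Mean CV accuracy:", scores.mean(), "std:", scores.std())

# (i) Choose hyperparameters by mean CV accuracy.
# The candidate C values are an assumption made for illustration.
param_grid = {"logisticregression__C": [0.01, 0.1, 1, 10, 100]}
search = GridSearchCV(pipe, param_grid, cv=skf, scoring="accuracy", refit=True)
search.fit(x, y)
print("Best params:", search.best_params_, "mean CV accuracy:", search.best_score_)

# With refit=True, best_estimator_ has been retrained on the entire dataset
# with the selected hyperparameters. This is the final model to predict with.
final_model = search.best_estimator_
print(final_model.predict(x[:5]))
```

If there are no hyperparameters to tune, calling `pipe.fit(x, y)` on the full dataset after inspecting `scores` gives the final model in the same way.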

Answered By - Developer

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
