Saturday, November 27, 2021

[FIXED] How to display all 4 splits in an array for KFold at n=4?

November 27, 2021. Tags: k-fold, pandas, python, scikit-learn

Issue

I want my function to return a list of K tuples, one per split. Each tuple should consist of a train_indices list and a test_indices list containing the training and testing data point indices for that particular split.

This is the structure I want to produce for the dataset:

data_indices = [(list_of_train_indices_for_split_1, list_of_test_indices_for_split_1),
                (list_of_train_indices_for_split_2, list_of_test_indices_for_split_2),
                (list_of_train_indices_for_split_3, list_of_test_indices_for_split_3),
                ...
                (list_of_train_indices_for_split_K, list_of_test_indices_for_split_K)]
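To make the target structure concrete, here is a minimal pure-Python sketch that builds it by hand. It assumes the same fold-sizing rule that scikit-learn's KFold uses with shuffle=False (contiguous test blocks, with the first n_samples % K folds getting one extra sample); the helper name manual_kfold_indices is hypothetical and only for illustration.

```python
def manual_kfold_indices(n_samples, K):
    """Hypothetical helper: return [(train_indices, test_indices), ...] for K
    contiguous, unshuffled folds. Assumes KFold's sizing rule: the first
    n_samples % K folds are one sample larger than the rest."""
    if not 2 <= K <= n_samples:
        raise ValueError("K must be between 2 and n_samples")
    base, extra = divmod(n_samples, K)
    data_indices = []
    start = 0
    for fold in range(K):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test_indices = list(range(start, stop))
        # Every index outside the current test block is used for training.
        train_indices = list(range(0, start)) + list(range(stop, n_samples))
        data_indices.append((train_indices, test_indices))
        start = stop
    return data_indices


splits = manual_kfold_indices(58, 4)
print([len(test) for _, test in splits])  # [15, 15, 14, 14]
```

With 58 samples and K=4 this yields test blocks 0-14, 15-29, 30-43 and 44-57, the same layout as the desired output in the question.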

Here is my current function:

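from sklearn.model_selection import KFold

def sklearn_kfold_split(data, K):
    kf = KFold(n_splits=K, shuffle=False, random_state=None)
    # next() takes only the first (train, test) pair from the generator
    result = next(kf.split(data), None)
    return [result]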

The output of this function:

sklearn_kfold_split(data, 4)

 [(array([15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
     32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48,
     49, 50, 51, 52, 53, 54, 55, 56, 57]),
  array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14]))]
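The reason only one tuple comes back is that KFold.split returns a generator, and next() pulls just its first item. A small sketch of that behaviour, assuming 58 samples to match the indices shown above:

```python
import numpy as np
from sklearn.model_selection import KFold

data = np.arange(58)  # assumption: 58 samples, matching the output above
kf = KFold(n_splits=4, shuffle=False)

splits = kf.split(data)       # a generator; splits are produced lazily
first = next(splits, None)    # consumes only the first (train, test) pair
remaining = list(splits)      # the other three pairs are still available

print(first[1])               # test indices 0..14 of the first fold
print(len(remaining))         # 3
print(kf.get_n_splits(data))  # 4
```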

I am not sure what I should add or change to get the output below:

 [(array([15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
     32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48,
     49, 50, 51, 52, 53, 54, 55, 56, 57]),
  array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])),
 (array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 30, 31,
     32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48,
     49, 50, 51, 52, 53, 54, 55, 56, 57]),
  array([15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])),
 (array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
     17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 44, 45, 46, 47,
     48, 49, 50, 51, 52, 53, 54, 55, 56, 57]),
  array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43])),
 (array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
     17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
     34, 35, 36, 37, 38, 39, 40, 41, 42, 43]),
  array([44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57]))]

Any help or advice on what I should change in my function would be appreciated.

Solution

The easiest way to fix this is to use a list comprehension to iterate over every split yielded by KFold.split, instead of calling next(), which returns only the first one:

from sklearn.model_selection import KFold

def sklearn_kfold_split(data, K):
    kf = KFold(n_splits=K, shuffle=False, random_state=None)
    # Collect every (train_index, test_index) pair, not just the first one
    result = [(train_index, test_index) for train_index, test_index in kf.split(data)]
    return result


data = list(range(12))
K = 4
sklearn_kfold_split(data, K)
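Since the question is tagged pandas, here is a sketch of the same idea applied to a DataFrame. It assumes a small hypothetical frame df; list() on the generator is an equivalent, shorter form of the comprehension. KFold returns positional indices, so rows are selected with .iloc rather than .loc.

```python
import pandas as pd
from sklearn.model_selection import KFold

def sklearn_kfold_split(data, K):
    # list() drains the generator: one (train, test) tuple per fold
    return list(KFold(n_splits=K, shuffle=False).split(data))

# Hypothetical example frame; KFold only uses the number of rows.
df = pd.DataFrame({"x": range(12), "y": list("abcdefghijkl")})
folds = sklearn_kfold_split(df, 4)

for train_idx, test_idx in folds:
    # Positional indices, so use .iloc to pull the matching rows
    train_df, test_df = df.iloc[train_idx], df.iloc[test_idx]
    print(len(train_df), len(test_df))  # 9 3 for every fold
```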

Output:

[(array([ 3,  4,  5,  6,  7,  8,  9, 10, 11]), array([0, 1, 2])),
 (array([ 0,  1,  2,  6,  7,  8,  9, 10, 11]), array([3, 4, 5])),
 (array([ 0,  1,  2,  3,  4,  5,  9, 10, 11]), array([6, 7, 8])),
 (array([0, 1, 2, 3, 4, 5, 6, 7, 8]), array([ 9, 10, 11]))]
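A quick way to sanity-check output like the above is to confirm that the test folds partition the indices and that each training set is exactly the complement of its test set. A minimal sketch using the same 12-element example:

```python
import numpy as np
from sklearn.model_selection import KFold

data = list(range(12))
folds = list(KFold(n_splits=4, shuffle=False).split(data))
all_indices = np.arange(len(data))

# Without shuffling, the test folds concatenate back to 0..n-1 in order
all_test = np.concatenate([test for _, test in folds])
assert np.array_equal(all_test, all_indices)

for train, test in folds:
    # Train and test never overlap and together cover every index
    assert np.intersect1d(train, test).size == 0
    assert np.array_equal(np.union1d(train, test), all_indices)

print("all", len(folds), "folds look consistent")
```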

Answered By - Wojciech K

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
