Thursday, February 10, 2022

[FIXED] Saving order of splitting with a vector of index

numpy, python, scikit-learn, split

Issue

I want to split my data into train and test sets, and split a vector of names the same way (it serves as my index and reference).

name_images has a shape of (2440,)

My data are:

data has a shape of (2440, 3072) 
labels has a shape of (2440,)

# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(data, labels, test_size=0.3)

But I also want to split name_images into name_images_train and name_images_test, consistent with the split of data and labels.

I tried:

x_train, x_test, y_train, y_test, name_images_train, name_images_test = train_test_split(data, labels, name_images, test_size=0.3)

It doesn't preserve the order. Any suggestions? Thank you.

EDIT1:

x_train, x_test, y_train, y_test= train_test_split(data, labels,test_size=0.3, random_state=42)

name_images_train, name_images_test=train_test_split(name_images, 
                                                         test_size=0.3, 
                                                         random_state=42)

EDIT1 doesn't preserve the order either.

Solution

There are multiple ways to accomplish this.

The most straightforward is to use the random_state parameter of train_test_split. As the documentation states:

random_state : int or RandomState :-
Pseudo-random number generator state used for random sampling.

When you fix random_state, the indices generated for splitting the arrays into train and test are exactly the same on every call, provided the number of samples and test_size are also the same.
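To illustrate the point about a fixed random_state, here is a minimal sketch. The small arrays are hypothetical stand-ins for the question's data (each name encodes its row number so the alignment can be checked); only train_test_split itself comes from the original post.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 10 samples, each name encodes its row number.
data = np.arange(10).reshape(10, 1)
name_images = np.array([f"img_{i}" for i in range(10)])

# Two separate calls that share the same test_size and random_state.
x_train, x_test = train_test_split(data, test_size=0.3, random_state=42)
names_train, names_test = train_test_split(name_images, test_size=0.3, random_state=42)

# Check that every name still points at its own row in both subsets.
aligned = all(name == f"img_{row[0]}" for row, name in zip(x_test, names_test))
aligned = aligned and all(
    name == f"img_{row[0]}" for row, name in zip(x_train, names_train)
)
print(aligned)
```

If the arrays had different lengths, or the two calls used different test_size values, the generated indices would no longer be guaranteed to match.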

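So change your code to the following. Because all three arrays go through a single call, the same shuffled indices are applied to each of them, so x_train[i], y_train[i] and name_images_train[i] refer to the same sample.

x_train, x_test, y_train, y_test, name_images_train, name_images_test = train_test_split(
    data, labels, name_images,
    test_size=0.3,
    random_state=42)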
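As one of the other possible ways (not spelled out in the original answer), you could split an array of row indices and use it to slice every array. This is only a sketch: the small arrays below are hypothetical stand-ins for data, labels and name_images, and the names train_idx and test_idx are introduced here for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in arrays (the question's real data is 2440 x 3072).
n_samples = 100
data = np.arange(n_samples * 4).reshape(n_samples, 4)
labels = np.arange(n_samples) % 2
name_images = np.array([f"img_{i}" for i in range(n_samples)])

# Split the row indices once...
indices = np.arange(n_samples)
train_idx, test_idx = train_test_split(indices, test_size=0.3, random_state=42)

# ...then slice every array with the same index arrays.
x_train, x_test = data[train_idx], data[test_idx]
y_train, y_test = labels[train_idx], labels[test_idx]
name_images_train, name_images_test = name_images[train_idx], name_images[test_idx]
```

Keeping train_idx and test_idx around also lets you map any test sample back to its row in the original arrays later on.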

For more on random_state, see my answer here:

https://stackoverflow.com/a/42197534/3374996

Answered By - Vivek Kumar

This answer, collected from Stack Overflow and tested by PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
