Issue
I want to split my data into train and test sets, along with a vector that contains names (I use it as an index and reference).
My data are:
data has a shape of (2440, 3072)
labels has a shape of (2440,)
name_images has a shape of (2440,)
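# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(data, labels, test_size=0.3)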
But I also want to split name_images into name_images_train and name_images_test, consistent with the split of data and labels.
I tried:
x_train, x_test, y_train, y_test, name_images_train, name_images_test = train_test_split(data, labels, name_images, test_size=0.3)
It doesn't preserve the order. Any suggestions? Thank you.
EDIT1:
x_train, x_test, y_train, y_test = train_test_split(data, labels, test_size=0.3, random_state=42)
name_images_train, name_images_test = train_test_split(name_images,
                                                       test_size=0.3,
                                                       random_state=42)
The EDIT1 code doesn't preserve the order either.
Solution
There are multiple ways to accomplish this.
The most straightforward is to use the random_state parameter of train_test_split. As the documentation states:
random_state : int or RandomState :-
Pseudo-random number generator state used for random sampling.
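When you fix random_state, the indices generated for splitting the arrays into train and test are exactly the same each time. Arrays passed together in a single call are also split with the same indices, so corresponding rows stay aligned.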
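To make the reproducibility claim concrete, here is a minimal sketch. The 10-element `items` array is a toy stand-in I made up, not the asker's data; it only shows that two calls with the same random_state give the same partition.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in for the real arrays (assumption: 10 samples instead of 2440).
items = np.arange(10)

# Two independent calls with the same random_state and the same input length.
train_a, test_a = train_test_split(items, test_size=0.3, random_state=42)
train_b, test_b = train_test_split(items, test_size=0.3, random_state=42)

# The two partitions should be identical, element for element.
same_split = np.array_equal(train_a, train_b) and np.array_equal(test_a, test_b)
print(same_split)
```

Without random_state, each call would generally draw a different permutation.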
So change your code to:
# Parentheses are needed so the assignment target can span several lines
(x_train, x_test,
 y_train, y_test,
 name_images_train, name_images_test) = train_test_split(data, labels, name_images,
                                                         test_size=0.3,
                                                         random_state=42)
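If you want to check that the names really follow their rows, something like the sketch below may help. The arrays here are synthetic stand-ins (an assumption for illustration): row i of data is filled with i, its label is i, and its name is "img_i", so any misalignment would be easy to spot.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins (assumption): sample i has row [i, i, i, i],
# label i, and name "img_i".
n_samples = 20
data = np.repeat(np.arange(n_samples)[:, None], 4, axis=1)
labels = np.arange(n_samples)
name_images = np.array([f"img_{i}" for i in range(n_samples)])

(x_train, x_test,
 y_train, y_test,
 name_images_train, name_images_test) = train_test_split(
    data, labels, name_images, test_size=0.3, random_state=42)

# Every name should still point at its own row and its own label.
train_aligned = all(name == f"img_{y}" and row[0] == y
                    for row, y, name in zip(x_train, y_train, name_images_train))
test_aligned = all(name == f"img_{y}" and row[0] == y
                   for row, y, name in zip(x_test, y_test, name_images_test))
print(train_aligned, test_aligned)
```

Note that the samples are shuffled relative to the original arrays by default; only the correspondence between data, labels and names is kept. If "order" means the original ordering, train_test_split also accepts shuffle=False, which splits without shuffling.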
For more understanding on random_state, see my answer here:
Answered By - Vivek Kumar