Issue
I want to split it into two data frames,(train, and test) using the values in the id column. The split should be such that in the first data frame I have 70% of the (unique) ids and in the second data frame, I have 30% of the ids. The ids should be randomly split.
I have multiple values corresponding to one id.
The below script I was trying:
Training_data, Test_data = sklearn.model_selection.train_test_split(data, data['ID_sample'].unique(), train_size=0.30, test_size=0.70, random_state=5)
Solution
Sorted the issue in the following way
samplelist = data["ID_sample"].unique()
training_samp, test_samp = sklearn.model_selection.train_test_split(samplelist, train_size=0.7, test_size=0.3, random_state=5, shuffle=True)
training_data = data[data['ID_sample'].isin(training_samp)]
test_data = data[data['ID_sample'].isin(test_samp)]
Answered By - pratyaysengupta
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.