Issue
I need to split my data into a training set (80%) and test set (20%). I currently do that with the code below:
StratifiedShuffleSplit(n_splits=10,test_size=.2, train_size=.8, random_state=0)
However, I need to specify a particular attribute for splitting, and I have not been able to do it.
Solution
If you want to split your data in an 80/20 stratified manner, I recommend using train_test_split:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True, stratify=y, random_state=0)
# Stratification is done based on the y labels
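The question asks about splitting on a particular attribute rather than on the labels. As a minimal sketch, the stratify argument of train_test_split accepts any array-like with one entry per row, so a column can be passed instead of y. The toy DataFrame and the column names "feature", "group" and "target" below are made up for illustration:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: "group" is the attribute to stratify on.
df = pd.DataFrame({
    "feature": range(20),
    "group": ["a"] * 10 + ["b"] * 10,
    "target": [0, 1] * 10,
})
X = df[["feature", "group"]]
y = df["target"]

# Pass the attribute column to stratify instead of the labels.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=df["group"], random_state=0
)

# Both splits keep the 50/50 proportion of "group".
print(X_train["group"].value_counts().to_dict())
print(X_test["group"].value_counts().to_dict())
```

Here the split preserves the proportions of "group", not of the target, so this only fits if that attribute is what should stay balanced.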
If you need to use StratifiedShuffleSplit, you can do the following:
from sklearn.model_selection import StratifiedShuffleSplit
sss = StratifiedShuffleSplit(n_splits=10, test_size=.2, random_state=0)
for train_index, test_index in sss.split(X, y):
    # split() yields positional indices, so index with .iloc (assumes X and y are pandas objects)
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    # Stratification is done based on the y labels
    # You can use training and test sets within this for loop
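To stratify on a particular attribute with StratifiedShuffleSplit, one option is to pass that column as the second argument of split(), since that argument is what the splitter stratifies on. The following is a sketch under assumed data, where the DataFrame and the column names "feature" and "group" are hypothetical:

```python
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical toy data: 15 rows of group "a" and 5 rows of group "b".
df = pd.DataFrame({
    "feature": range(20),
    "group": ["a"] * 15 + ["b"] * 5,
})

sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=0)

test_group_counts = []
# The second argument of split() is the array used for stratification.
for train_index, test_index in sss.split(df, df["group"]):
    train_df, test_df = df.iloc[train_index], df.iloc[test_index]
    # Record the group make-up of each test split.
    test_group_counts.append(test_df["group"].value_counts().to_dict())

print(test_group_counts[0])
```

Each of the 10 test splits should keep the same 3:1 ratio of "a" to "b" as the full data, while the rows drawn differ from split to split.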
More info here.
Answered By - Mattravel