Issue
I have a pandas DataFrame in Python that looks like this (but is larger).
I want to split the data set into three sets: train (70%), test (10%) and validate (20%). Because some classes have only a few samples, I also want the split to be stratified. Is there a way (using scikit-learn) to do a stratified split into three sets?
This post shows how to do it without stratification.
Solution
You could use train_test_split with stratification twice:
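from sklearn.model_selection import train_test_split
# Hold out 20% for validation, then take 1/8 of the remaining 80% as the test set.
non_validate_X, validate_X, non_validate_y, validate_y = train_test_split(X, y, stratify=y, test_size=0.2)
train_X, test_X, train_y, test_y = train_test_split(non_validate_X, non_validate_y, stratify=non_validate_y, test_size=1./8)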
I use a test_size of 1./8 in the second split because 1/8 = 10% / (70% + 10%): it is the test set's share of the train+test data that remains once the 20% validation set has been removed.
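The same arithmetic works for other ratios. Here is a minimal sketch of how the two test_size values could be derived from the target fractions. The helper name two_stage_test_sizes is hypothetical (it is not part of scikit-learn), and exact fractions are used only to avoid floating-point rounding in the check.

```python
from fractions import Fraction


def two_stage_test_sizes(train, test, validate):
    """Return the test_size values for two successive splits.

    Hypothetical helper for illustration; not a scikit-learn function.
    The first value is the share held out for validation, the second is
    the share of the remaining data that becomes the test set.
    """
    if train + test + validate != 1:
        raise ValueError("train, test and validate must sum to 1")
    first = validate               # removed from the full data set
    second = test / (train + test)  # removed from what is left
    return first, second


# Target split from the question: 70% train, 10% test, 20% validate.
first, second = two_stage_test_sizes(
    Fraction(7, 10), Fraction(1, 10), Fraction(2, 10)
)
print(first, second)  # 1/5 1/8
```

Converting with float(first) and float(second) gives 0.2 and 0.125, the values passed as test_size in the two train_test_split calls above.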
Answered By - Learning is a mess