Issue
I want to create my training, validation and test sets from one dataframe in a proportion of 6:2:2.
Additionally, within each set I'd like a proportion of 6:4 between the 2 labels. The original dataframe does not have this 6:4 proportion; one label is massively overrepresented. Maybe I should adjust that in advance?
I think sklearn's train_test_split() might be an option, but to be honest its documentation did not make me any wiser...
Are there any best practices for this kind of problem?
Solution
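When you have an imbalanced dataset, you can use the 'stratify' parameter of train_test_split(). The dataset is then split so that the ratio of the class labels in the variable you pass to 'stratify' stays constant, i.e. the train and test sets both have the same ratio of class labels as the data you split. Note that 'stratify' only preserves the ratio already present in the input; it does not create a 6:4 ratio. To get 6:4 in every set, resample the dataframe to that ratio first and then do the stratified split.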
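A minimal sketch of how this could look for the 6:2:2 case. train_test_split() only produces two parts, so it is called twice here. The dataframe, the column names 'feature' and 'label', and the 900/100 class counts are made up for illustration, and downsampling the majority class is just one possible way to reach 6:4 (it discards data, which may not be acceptable in your case).

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 900 rows of label 0, 100 rows of label 1.
df = pd.DataFrame({
    "feature": range(1000),
    "label": [0] * 900 + [1] * 100,
})

# Assumed pre-processing step: downsample the majority class to get 6:4.
# With 100 minority rows, 150 majority rows give 150:100 = 6:4.
minority = df[df["label"] == 1]
majority = df[df["label"] == 0].sample(n=len(minority) * 6 // 4, random_state=0)
balanced = pd.concat([majority, minority])

# First split: 60% train, 40% held out, preserving the label ratio.
train, rest = train_test_split(
    balanced, test_size=0.4, stratify=balanced["label"], random_state=0
)

# Second split: halve the held-out part into validation and test (20% each).
val, test = train_test_split(
    rest, test_size=0.5, stratify=rest["label"], random_state=0
)

# Inspect the label shares in each set.
for name, part in [("train", train), ("val", val), ("test", test)]:
    print(name, len(part), part["label"].value_counts(normalize=True).to_dict())
```

If you would rather keep all rows, you can skip the downsampling step and stratify on the original dataframe; each set then keeps the original imbalanced ratio, and the imbalance can be handled during training instead (for example with class weights).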
Answered By - SunilG