Issue
I'm trying to do binary classification on stock market data.
Since it is time-series data, I don't want to shuffle it.
I would like to stratify the data without shuffling it.
However, sklearn's train_test_split only supports stratify when shuffle=True.
(See the documentation: "If shuffle=False then stratify must be None.")
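A minimal reproduction of that restriction, assuming a recent scikit-learn; the arrays X and y are a small illustrative toy dataset, not the asker's data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy, time-ordered data: 20 rows with alternating binary labels
X = np.arange(20).reshape(-1, 1)
y = np.array([0, 1] * 10)

error_message = None
try:
    # Asking for a stratified split without shuffling
    train_test_split(X, y, test_size=0.1, shuffle=False, stratify=y)
except ValueError as err:
    # scikit-learn rejects stratify when shuffle=False
    error_message = str(err)

print(error_message)
```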
Is there any alternative?
Note: My model uses the XGBoost algorithm.
Also note: I don't want to use the train_test_split function; I already split the data manually like this:
import math
split = math.floor(9 * len(df) / 10)  # row position of the 90% cut-off
train_df = df.iloc[:split]  # earliest 90% of rows
test_df = df.iloc[split:]   # latest 10% of rows
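If the goal is to keep this manual 90/10 cut while also keeping class proportions, one possible alternative (our own sketch, not part of the original question or answer) is to apply the same cut inside each class. The helper name chronological_stratified_split, the column name "target" and the toy data are hypothetical; the sketch assumes df is already sorted by time.

```python
import numpy as np
import pandas as pd


def chronological_stratified_split(df, label_col, train_frac=0.9):
    """Hypothetical helper: within each class, the earliest `train_frac`
    of that class's rows go to training and the rest to testing.
    Assumes `df` is already sorted chronologically; row order is preserved."""
    labels = df[label_col]
    # 0, 1, 2, ... = position of each row among the rows of its own class
    rank_in_class = df.groupby(label_col).cumcount()
    # Number of rows in each row's class
    class_size = labels.map(labels.value_counts())
    # Floor of the per-class cut-off, mirroring math.floor in the manual split
    is_train = rank_in_class < np.floor(class_size * train_frac)
    return df[is_train], df[~is_train]


# Toy data: 30 time-ordered rows, every third row is class 1 (imbalanced)
df = pd.DataFrame({"feature": range(30),
                   "target": [1 if i % 3 == 2 else 0 for i in range(30)]})
train_df, test_df = chronological_stratified_split(df, "target")
print(test_df.index.tolist())
print(train_df["target"].mean(), test_df["target"].mean())
```

Because the cut is made per class, the result is only approximately chronological in general: a test row of one class can come before a training row of another class.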
Solution
Have you tried using StratifiedKFold?
You can pass shuffle=False to it.
It will then generate train and test indices for each of the n_splits folds without shuffling the data.
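A small sketch of that suggestion, assuming scikit-learn 0.22 or later, with toy arrays X and y standing in for the stock data:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy, time-ordered data: 30 rows, every third row is class 1
X = np.arange(30).reshape(-1, 1)
y = np.array([1 if i % 3 == 2 else 0 for i in range(30)])

# shuffle=False assigns each class's samples to folds in their original order
skf = StratifiedKFold(n_splits=5, shuffle=False)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    # Each test fold keeps the overall share of class 1
    print(fold, test_idx, y[test_idx].mean())
```

With shuffle=False the last fold's test set holds the latest rows of each class. For the earlier folds, however, the training indices include rows that come after the test rows (for fold 0 here, every training row does), which may matter for time-series data.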
See the scikit-learn documentation for StratifiedKFold.
This may help.
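To get a single split comparable to the manual 90/10 one, one option (a sketch, not something the answer spells out) is to use n_splits=10 and keep only the last fold, then select rows with df.iloc. The DataFrame below and its "target" column are hypothetical stand-ins for the asker's data, assumed sorted by time.

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Hypothetical stand-in for the asker's df: 50 time-ordered rows,
# every fifth row is class 1
df = pd.DataFrame({"feature": range(50),
                   "target": [1 if i % 5 == 4 else 0 for i in range(50)]})

skf = StratifiedKFold(n_splits=10, shuffle=False)
# Keep only the final fold: its test indices are the latest rows of each class
train_idx, test_idx = list(skf.split(df, df["target"]))[-1]

train_df = df.iloc[train_idx]
test_df = df.iloc[test_idx]
print(test_df.index.tolist())
```

train_df and test_df can then be used to fit and evaluate the XGBoost model in the same way as the manually split frames.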
Answered By - otaku