Issue
My dataset contains a column that I need to use for a group-wise split in PyCaret: rows belonging to the same group should not be divided between train and test, but sent as a whole to one of the splits.
A 10-row sample for clarification:
group_id measure1 measure2 measure3
1 3455 3425 345
1 6455 825 945
1 6444 225 145
2 23 34 233
2 623 22 888
3 3455 3425 345
3 6155 525 645
3 6434 325 845
4 93 345 233
4 693 222 808
Every unique group_id should be sent in full to one of the splits, like this (using 80/20):
TRAIN SET:
group_id measure1 measure2 measure3
1 3455 3425 345
1 6455 825 945
1 6444 225 145
3 3455 3425 345
3 6155 525 645
3 6434 325 845
4 93 345 233
4 693 222 808
TEST SET:
group_id measure1 measure2 measure3
2 23 34 233
2 623 22 888
Solution
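Per the documentation, you can try the following option in setup():
https://pycaret.readthedocs.io/en/latest/api/classification.html
fold_strategy = "groupkfold"
This strategy also needs the group labels, which are passed through the fold_groups parameter (for example, the name of the group_id column).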
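As far as the documentation describes it, fold_strategy governs the cross-validation folds, so the initial train/test hold-out may still need to be made group-aware separately. Below is a minimal sketch of one way to do that, not PyCaret's own method: a hypothetical helper, group_train_test_split, picks whole groups for the test set using only the standard library (scikit-learn's GroupShuffleSplit offers the same idea). Note that test_size here is a fraction of groups, not of rows. The commented lines at the end assume a DataFrame df and a target column named "label", neither of which appears in the question.

```python
import random


def group_train_test_split(groups, test_size=0.2, seed=0):
    """Return (train_idx, test_idx) such that no group appears in both.

    Hypothetical helper for illustration; test_size is the fraction of
    unique groups (not rows) to hold out.
    """
    unique = sorted(set(groups))
    if len(unique) < 2:
        raise ValueError("need at least two groups to split")
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Hold out at least one whole group, but never all of them.
    n_test = max(1, min(len(unique) - 1, round(test_size * len(unique))))
    held_out = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in held_out]
    test_idx = [i for i, g in enumerate(groups) if g in held_out]
    return train_idx, test_idx


# group_id column from the 10-row sample in the question.
group_id = [1, 1, 1, 2, 2, 3, 3, 3, 4, 4]
train_idx, test_idx = group_train_test_split(group_id, test_size=0.2, seed=0)
train_groups = {group_id[i] for i in train_idx}
test_groups = {group_id[i] for i in test_idx}

# With pandas and PyCaret installed (not run here), the indices could be used as:
#   train_df, test_df = df.iloc[train_idx], df.iloc[test_idx]
#   from pycaret.classification import setup
#   setup(data=train_df, test_data=test_df, target="label",  # "label" is an assumed column
#         fold_strategy="groupkfold", fold_groups="group_id")
```

Which group lands in the test set depends on the seed. With four groups and test_size=0.2, exactly one whole group is held out, which matches the 80/20 example only when that group has two rows.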
Answered By - Nikhil Gupta