Issue
I have recently started working with sklearn and stumbled on StratifiedShuffleSplit. Even though I understand its concept and what it is meant to do, I don't quite understand the arguments it needs in order to work, such as n_splits. The sklearn documentation says:
n_splits : int, default 10
    Number of re-shuffling & splitting iterations.
My best guess is that it tells StratifiedShuffleSplit the number of strata there are in the data.
Solution
n_splits is a parameter of almost every cross-validator. In general, it determines how many different validation (and training) sets you will create.
If you use StratifiedShuffleSplit, n_splits does not denote the number of strata. The strata are the classes of the classification target in your dataset, and each split is drawn so that it follows their relative frequencies.
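As a rough illustration of the difference, here is a minimal sketch on made-up toy data (the arrays X and y and the parameter values are assumptions chosen for the example, not taken from the question). It asks for 5 splits on a target with 3 classes:

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Toy data, invented for illustration: 12 samples, 3 classes in y.
X = np.arange(24).reshape(12, 2)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])

# n_splits=5 requests 5 independent train/test splits.
# The 3 strata are never passed in; they are read from y in split().
sss = StratifiedShuffleSplit(n_splits=5, test_size=0.25, random_state=0)

splits = list(sss.split(X, y))
print("number of splits:", len(splits))

# Collect the classes that appear in each test set.
test_classes = []
for train_idx, test_idx in splits:
    classes = sorted(int(c) for c in y[test_idx])
    test_classes.append(classes)
    print("test indices:", test_idx, "classes:", classes)
```

The number of splits follows n_splits, while every test set still contains all three classes, because the strata come from y alone.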
See the quote below from the official docs:
StratifiedShuffleSplit
StratifiedShuffleSplit is a variation of ShuffleSplit, which returns stratified splits, i.e. which creates splits by preserving the same percentage for each target class as in the complete set.
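To see the "same percentage for each target class" behaviour in practice, the sketch below uses an invented, imbalanced target (80 samples of class 0 and 20 of class 1; these numbers are assumptions for the example) and checks the share of class 1 in every test set:

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Invented imbalanced target: 80% class 0, 20% class 1.
y = np.array([0] * 80 + [1] * 20)
X = np.zeros((100, 1))  # features do not matter for the split itself

sss = StratifiedShuffleSplit(n_splits=3, test_size=0.2, random_state=42)

# Share of class 1 in each test set.
test_fractions = []
for train_idx, test_idx in sss.split(X, y):
    frac = float(np.mean(y[test_idx] == 1))
    test_fractions.append(frac)
    print("test size:", len(test_idx), "share of class 1:", frac)

print("share of class 1 in full data:", float(np.mean(y == 1)))
```

Each test set should show roughly the same class share as the full data, no matter how many splits are requested.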
Answered By - Jan K