Issue
Can I run StratifiedShuffleSplit inside GridSearchCV without first instantiating it as "ss", as I do in my code?
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
ss = StratifiedShuffleSplit(n_splits=3, test_size=0.5, random_state=0)
grid_search = GridSearchCV(clf_us, param_grid={parameter: num_range}, cv=ss)
Solution
If you are building a classifier and are only concerned with keeping the same label balance in each fold as in the complete data set, you can avoid instantiating StratifiedShuffleSplit by specifying the number of folds in GridSearchCV, e.g. cv=5.
According to the documentation: “For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.” http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
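The behaviour quoted from the documentation can be checked with a small sketch. It uses check_cv, the scikit-learn utility that turns an integer cv into a splitter object; the toy labels and the choice of 4 folds are assumptions made only for illustration and are not part of the original question.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, check_cv

# Toy binary labels (assumed for illustration): 12 of class 0, 8 of class 1.
y = np.array([0] * 12 + [1] * 8)
X = np.zeros((len(y), 1))  # feature values do not matter for the splitting

# Resolve an integer cv for a classifier and for a non-classifier.
cv_classifier = check_cv(cv=4, y=y, classifier=True)
cv_other = check_cv(cv=4, y=y, classifier=False)

print(type(cv_classifier).__name__)  # stratified splitter for a classifier
print(type(cv_other).__name__)       # plain splitter otherwise

# Count the labels in each test fold of the stratified splitter.
fold_counts = [
    np.bincount(y[test], minlength=2).tolist()
    for _, test in cv_classifier.split(X, y)
]
print(fold_counts)
```

With these labels every test fold should keep the 12:8 class ratio of the full data set, which is the label balance the answer refers to.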
However, if you want finer control over the data splitting, you can’t avoid instantiating StratifiedShuffleSplit. See the example on this page to understand how the test_size parameter affects the splitting: http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.ShuffleSplit.html#sklearn.model_selection.ShuffleSplit
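As a rough illustration of the test_size effect, the sketch below splits a toy label vector with two different test_size values and records the resulting train and test sizes. The labels and the two test_size values are assumptions chosen for the example; n_splits=3 and random_state=0 are taken from the question.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Toy binary labels (assumed for illustration): 12 of class 0, 8 of class 1.
y = np.array([0] * 12 + [1] * 8)
X = np.zeros((len(y), 1))  # feature values do not matter for the splitting

sizes = {}
test_label_counts = {}
for test_size in (0.25, 0.5):
    ss = StratifiedShuffleSplit(n_splits=3, test_size=test_size, random_state=0)
    splits = list(ss.split(X, y))
    # (train size, test size) for each of the 3 random splits
    sizes[test_size] = [(len(train), len(test)) for train, test in splits]
    # class counts inside each test set, to show the stratification
    test_label_counts[test_size] = [
        np.bincount(y[test], minlength=2).tolist() for _, test in splits
    ]
    print(test_size, sizes[test_size], test_label_counts[test_size])
```

Unlike the folds produced by cv=5, these test sets are drawn independently for each split, so they may overlap. If the only goal is to avoid a named variable, the splitter can also be constructed directly in the call, e.g. cv=StratifiedShuffleSplit(n_splits=3, test_size=0.5, random_state=0).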
Answered By - KRKirov