Issue
How do I get a consistent answer using GridSearchCV in scikit-learn? I assume I'm getting different answers because different random numbers are causing the folds to be different each time I run it, though it is my understanding that the code below should solve this, as KFold has shuffle=False by default.
clf = GridSearchCV(SVC(), param_grid, cv=KFold(n, n_folds=10))
Solution
As you identified in the comments, predict_proba is NOT deterministic! SVC shuffles the data internally when it estimates probabilities, so the output can change from run to run.
But SVC does accept a random_state (as does KFold). I've found before that setting shuffle=False can lead to really poor results if your data were collected in a non-random order, so IMHO you're better off using shuffle=True and setting random_state to some fixed number.
From the docs
class sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma=0.0, coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, random_state=None)
random_state : int seed, RandomState instance, or None (default)
The seed of the pseudo random number generator to use when shuffling the data for probability estimation.
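A minimal sketch of the suggestion above, assuming a recent scikit-learn where KFold lives in sklearn.model_selection and takes n_splits (the question's KFold(n, n_folds=10) is the older API). The toy dataset, the seed value, the parameter grid and the run_search helper are made up for illustration; they are not from the original question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

SEED = 0  # arbitrary fixed seed (assumption; any integer works)


def run_search(X, y):
    """Hypothetical helper: one grid search with every source of randomness seeded."""
    # Shuffle the folds, but with a fixed seed so the split is identical on every run.
    cv = KFold(n_splits=5, shuffle=True, random_state=SEED)
    # probability=True enables predict_proba; random_state seeds its internal shuffling.
    svc = SVC(probability=True, random_state=SEED)
    param_grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1]}  # illustrative grid
    search = GridSearchCV(svc, param_grid, cv=cv)
    search.fit(X, y)
    return search


# Toy data standing in for the asker's dataset.
X, y = make_classification(n_samples=120, n_features=8, random_state=SEED)

first = run_search(X, y)
second = run_search(X, y)

# Both runs should agree on the chosen parameters, the CV scores and the probabilities.
same_params = first.best_params_ == second.best_params_
same_scores = np.allclose(
    first.cv_results_["mean_test_score"], second.cv_results_["mean_test_score"]
)
same_proba = np.allclose(first.predict_proba(X), second.predict_proba(X))
print(same_params, same_scores, same_proba)
```

If you drop random_state from SVC but keep probability=True, the fold scores may still match while predict_proba drifts between runs, which is the behaviour described in the answer.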
Answered By - Alex