Issue
Is there any built-in way of doing brute-force feature selection in scikit-learn, i.e. exhaustively evaluating all possible combinations of the input features and then finding the best subset? I am familiar with the "Recursive feature elimination" class, but I am specifically interested in evaluating all possible combinations of the input features one after the other.
Solution
No, best subset selection is not implemented. The easiest way to do it is to write it yourself. This should get you started:
from itertools import chain, combinations

import numpy as np
from sklearn.model_selection import cross_val_score  # sklearn.cross_validation in old versions

def best_subset_cv(estimator, X, y, cv=3):
    """Exhaustively score every non-empty feature subset by mean CV score."""
    n_features = X.shape[1]
    # All subsets of sizes 1 through n_features.
    subsets = chain.from_iterable(combinations(range(n_features), k + 1)
                                  for k in range(n_features))
    best_score = -np.inf
    best_subset = None
    for subset in subsets:
        score = cross_val_score(estimator, X[:, subset], y, cv=cv).mean()
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score
This performs k-fold cross-validation inside the loop, so it will fit k·(2ᵖ − 1) estimators when given data with p features.
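To sanity-check that count, the subset generator can be run on its own: for p features it yields all 2ᵖ − 1 non-empty index tuples, smallest subsets first. A minimal sketch with p = 4 (the feature count here is just an illustrative choice):

```python
from itertools import chain, combinations

n_features = 4  # illustrative value of p
subsets = list(chain.from_iterable(combinations(range(n_features), k + 1)
                                   for k in range(n_features)))
print(len(subsets))   # 2**4 - 1 = 15 non-empty subsets
print(subsets[:5])    # singletons first: (0,), (1,), (2,), (3,), then (0, 1)
```

Each tuple can be used directly as a NumPy column index, which is exactly how `X[:, subset]` selects the candidate features inside the loop.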
Answered By - Fred Foo