Issue
I have built a GridSearchCV pipeline and it seems to run fine.
However, I can't seem to (after much reading) find a way to properly extract the features selected.
Does this mean that whenever I attempt to run the "best model", the model assumes that the feature set passed in for prediction is exactly the same (even in the sequence of features in the columns)?
Is there a better way to achieve this? Otherwise I would have to rebuild the entire feature set for live data, even when GridSearchCV says I only need (for example) 5 features.
I was thinking I could reconstruct the classifier with the hyperparameters from the GridSearchCV output, but that seems a little convoluted.
Solution
Does this mean that whenever I attempt to run the "best model", the model assumes that the feature set passed in for prediction is exactly the same (even in the sequence of features in the columns)?
Yes, every sklearn estimator will assume you are going to pass in the same format of input as it was trained on, and in particular you will have to pass in even features that are ultimately discarded.
The best way to shrink your input footprint is to refit a version (without the selection, etc.) of the pipeline on just the relevant features; you could potentially skip refitting the model step, as long as you can be sure the order of features reaching it is the same. (I've thought about writing a convenience function to try to automate this without refitting things, but it's nontrivial and easy to get wrong in a way that would silently break the model.)
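As a rough sketch of the "skip refitting the model step" variant: the toy data, the step names "select" and "clf", and the choice of SelectKBest with LogisticRegression are all assumptions made for illustration, not the asker's actual pipeline. It also assumes the selector is the only step before the model, so the column order reaching the classifier is simply the selected columns in their original order.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy data and pipeline (illustrative assumptions, not from the question).
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, random_state=0
)
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])
search = GridSearchCV(pipe, {"select__k": [3, 5]}, cv=3)
search.fit(X, y)

best = search.best_estimator_
selected_idx = best.named_steps["select"].get_support(indices=True)

# Keep only the selected columns, in their original order.
X_small = X[:, selected_idx]

# Reuse the already-fitted final step: during fitting it only ever
# received the selected columns, so it can consume X_small directly.
fitted_clf = best.named_steps["clf"]
small_pred = fitted_clf.predict(X_small)

# Sanity check against the full pipeline fed with all ten columns.
full_pred = best.predict(X)
same = np.array_equal(small_pred, full_pred)
```

If `same` is not true on your own data, the column order or an earlier pipeline step differs from what this sketch assumes, and refitting on the reduced feature set is the safer route.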
If the problem is just the generation of the unused features in production, you could also just make up values for them; since they aren't used by the model, it doesn't matter if they're the real values, just plausible ones.
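A minimal sketch of the placeholder idea, under the same illustrative assumptions (toy data, hypothetical step names "select" and "clf"). Using the training column means as filler is my own choice of "plausible" values. This only holds when no earlier step mixes columns together (a PCA before the selection, for instance, would break it).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy data and pipeline (illustrative assumptions, not from the question).
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, random_state=0
)
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])
search = GridSearchCV(pipe, {"select__k": [3, 5]}, cv=3)
search.fit(X, y)

best = search.best_estimator_
selected_idx = best.named_steps["select"].get_support(indices=True)

# Pretend that in production only the selected features are computed.
X_live_small = X[:5][:, selected_idx]

# Start from plausible filler (training means) for every column,
# then overwrite the selected columns with the real live values.
fill_values = X.mean(axis=0)
X_live_full = np.tile(fill_values, (X_live_small.shape[0], 1))
X_live_full[:, selected_idx] = X_live_small

# The pipeline accepts the full-width input and drops the filler columns.
padded_pred = best.predict(X_live_full)
reference_pred = best.predict(X[:5])
```

Here `padded_pred` should match `reference_pred`, since the filler columns never reach the classifier.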
A good answer to your first question was deleted, so I'll answer that as well:
However, I can't seem to (after much reading) find a way to properly extract the features selected.
The grid search's best_estimator_ pipeline (available when refit is enabled, which is the default) can be inspected for the selected features. (This may be somewhat tricky if your features are changed in the pipeline before reaching the feature selection step.)
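For the simple case where the selector sees the raw columns, the inspection might look like the sketch below. The step name "select", the SelectKBest selector, and the `feature_names` list are hypothetical stand-ins for whatever your pipeline and data actually use.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy data with made-up column names (illustrative assumptions).
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, random_state=0
)
feature_names = [f"f{i}" for i in range(X.shape[1])]

pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])
search = GridSearchCV(pipe, {"select__k": [3, 5]}, cv=3)  # refit=True by default
search.fit(X, y)

# Pull the fitted selector out of the refit pipeline by its step name.
selector = search.best_estimator_.named_steps["select"]

# Boolean mask over the selector's input columns: True means "kept".
mask = selector.get_support()

# Map the mask back to names; valid only because no earlier step
# reordered or transformed the columns before the selector.
selected_names = [name for name, keep in zip(feature_names, mask) if keep]
```

`selected_names` then lists the columns the best model actually uses, in the order the classifier receives them.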
Answered By - Ben Reiniger