Issue
I have trained a SequentialFeatureSelector from sklearn and am now interested in the best model (based on the given scoring method) it produced. Is there a way to extract the parameters and use them to generate the model that was used?
I have seen that there is a get_params() function for the SequentialFeatureSelector, but I don't understand how to interpret the output and retrieve the best estimator.
Solution
The main result of this model is which features it decided to select. You can access that information in various ways. Suppose you have a fitted selector = SequentialFeatureSelector(...).fit(...).
selector.support_ is a boolean vector, where True means it selected that feature. If you started off with 5 features and told it to select 2, then the vector will be [True, False, False, False, True] if it selected the first and last features.
You can get the same output as above using selector.get_support(). If you want the indices rather than a boolean vector, you can use selector.get_support(indices=True) - it'll return [0, 4] in this case, indicating feature number 0 and feature number 4.
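For concreteness, here's a minimal sketch on toy data (the dataset and estimator are just illustrative):

from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Toy data with 5 features, mirroring the example above
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

selector = SequentialFeatureSelector(
    LogisticRegression(), n_features_to_select=2
).fit(X, y)

print(selector.support_)                   # boolean mask of length 5
print(selector.get_support())              # identical boolean mask
print(selector.get_support(indices=True))  # e.g. [0 4]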
To get the feature names (only applies if you fed the model a dataframe):
selector.feature_names_in_[selector.support_]
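For example, a sketch assuming the training data was a pandas DataFrame (the column names here are made up, and X, y are reused from the sketch above):

import pandas as pd

# Hypothetical named columns
X_df = pd.DataFrame(X, columns=['f0', 'f1', 'f2', 'f3', 'f4'])

selector = SequentialFeatureSelector(
    LogisticRegression(), n_features_to_select=2
).fit(X_df, y)

# Names of the two selected columns, e.g. ['f0' 'f4']
print(selector.feature_names_in_[selector.support_])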
After fitting the selector, if you want it to strip out the unselected features, you can use selector.transform(X_test). The .transform(X_test) call applies the already-fitted selector to the supplied data. In this example, if X_test is 100 x 5, it'll return a 100 x 2 version where it has kept only the features determined from the initial .fit().
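As a quick shape check under the same toy setup (the split here is just for illustration):

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

selector = SequentialFeatureSelector(
    LogisticRegression(), n_features_to_select=2
).fit(X_train, y_train)

print(X_test.shape)                      # (50, 5)
print(selector.transform(X_test).shape)  # (50, 2)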
SequentialFeatureSelector doesn't keep any of the models fitted during cross-validation, so I think you'd need to fit a new model using the selected features:
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Fit the selector
selector = SequentialFeatureSelector(
    LogisticRegression(), n_features_to_select=2
).fit(X, y)
print('Selected feature numbers are', selector.get_support(indices=True))

# Use the fitted selector to reduce X to the selected features
X_reduced = selector.transform(X)

# Refit a logistic regression on the selected features only
logreg_fitted = LogisticRegression().fit(X_reduced, y)
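At prediction time you'd then chain the same two steps, e.g. (assuming a hypothetical X_test with the same columns as X):

# Reduce new data with the fitted selector, then predict with the refitted model
predictions = logreg_fitted.predict(selector.transform(X_test))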
Alternatively, you can clone the selector's estimator; this ensures consistency with the original estimator without you having to manually re-specify all of its parameters, since clone returns a fresh, unfitted copy with the same parameters:

from sklearn.base import clone

best_model = clone(selector.estimator).fit(selector.transform(X), y)
If you want identical models (down to the random seed), you'll also need to set up the CV appropriately.
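For instance, a sketch of pinning down the randomness (the exact arguments are illustrative, not the only way to do it):

from sklearn.model_selection import KFold

# Fix the CV splits and the estimator's seed so repeated runs are reproducible
cv = KFold(n_splits=5, shuffle=True, random_state=0)
selector = SequentialFeatureSelector(
    LogisticRegression(random_state=0), n_features_to_select=2, cv=cv
).fit(X, y)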
Answered By - user3128