Issue
I created a Pipeline with RFE and RandomForestClassifier in it and then applied RandomizedSearchCV to find the best hyperparameter values for both. This is what my code looks like -
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
from sklearn.model_selection import RandomizedSearchCV
steps = [
    ("rfe", RFE(estimator = RandomForestClassifier(random_state = 42))),
    ("est", RandomForestClassifier())
]
rf_clf_pl = Pipeline(steps = steps)
params = {
    "rfe__n_features_to_select" : range(2, smote_X_train.shape[1] + 1),
    "est__random_state" : np.linspace(0, 42, 5).astype(int),
    "est__n_estimators" : range(50, 201, 10),
    "est__max_depth" : [None] + list(range(5, max_depth, 3)),
    "est__max_leaf_nodes" : [None] + list(range(100, max_leaf_nodes, 20))
}
rs = RandomizedSearchCV(estimator = rf_clf_pl, cv = 4, param_distributions = params, n_jobs = -1, n_iter = 100, random_state = 42)
rs.fit(smote_X_train, smote_y_train)
I tried using the code below but got an error -
rf_clf_pl.named_steps["rfe"].support_
Error -
AttributeError Traceback (most recent call last)
<ipython-input-53-c73290f0e090> in <module>()
----> 1 rf_clf_pl.named_steps["rfe"].support_
AttributeError: 'RFE' object has no attribute 'support_'
How can I get the name of the retained features?
Solution
You can access the retained features of the best estimator as follows:
rs.best_estimator_.named_steps['rfe'].support_
Namely, you should access the best_estimator_ attribute of the fitted RandomizedSearchCV instance, i.e. the pipeline re-fitted on the whole training set with the best hyperparameters found (thanks to the default parameter refit=True of RandomizedSearchCV).
The way you were trying to access support_ from the pipeline instance does not work because you never explicitly fitted the pipeline itself, and a fitted RandomizedSearchCV does not return the base estimator in a fitted state (even though .fit() is called on copies of it while running the search); the only fitted pipeline it exposes is the best_estimator_ described above.
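If you specifically want those fitted attributes on your own pipeline object instead, a minimal sketch (my own suggestion, reusing your smote_X_train/smote_y_train and assuming rs has already been fitted) is to re-fit the pipeline with the best parameters found:
# Re-fit the original (still unfitted) pipeline with the best hyperparameters
# found by the search, so that its own steps expose attributes like support_
rf_clf_pl.set_params(**rs.best_params_)
rf_clf_pl.fit(smote_X_train, smote_y_train)
rf_clf_pl.named_steps["rfe"].support_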
Here's an example:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
from sklearn.model_selection import RandomizedSearchCV, train_test_split
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
steps = [
    ("rfe", RFE(estimator = RandomForestClassifier(random_state = 42))),
    ("est", RandomForestClassifier())
]
rf_clf_pl = Pipeline(steps = steps)
params = {
    "rfe__n_features_to_select" : range(2, X_train.shape[1] + 1),
    "est__random_state" : np.linspace(0, 42, 5).astype(int),
    "est__n_estimators" : range(50, 201, 10),
    "est__max_depth" : [None] + list(range(5, 16, 3)),
    "est__max_leaf_nodes" : [None] + list(range(100, 201, 20))
}
rs = RandomizedSearchCV(estimator = rf_clf_pl, cv = 4, param_distributions = params, n_jobs = -1, n_iter = 100, random_state = 42)
rs.fit(X_train, y_train)
rs.best_estimator_.named_steps['rfe'].support_
Finally, if you also want the explicit names of the retained features, you can retrieve them via rs.feature_names_in_[np.where(rs.best_estimator_.named_steps['rfe'].support_)[0]].
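For instance, continuing the iris example above (and assuming the fitted rs and the numpy import from before), a quick usage sketch:
# Boolean mask of the features kept by RFE in the best pipeline
retained_mask = rs.best_estimator_.named_steps["rfe"].support_
# Map the mask back to the column names seen during fit
print(rs.feature_names_in_[np.where(retained_mask)[0]])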
Answered By - amiola