Issue
Please consider this code:
import pandas as pd
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
# data
train_X = pd.DataFrame(data=np.random.rand(20, 3), columns=["a", "b", "c"])
train_y = pd.Series(data=np.random.randint(0,2, 20), name="y")
test_X = pd.DataFrame(data=np.random.rand(10, 3), columns=["a", "b", "c"])
test_y = pd.Series(data=np.random.randint(0,2, 10), name="y")
# scaler
scaler = StandardScaler()
# feature selection
p = Pipeline(steps=[("scaler0", scaler),
("model", SVC(kernel="linear", C=1))])
rfe = RFE(p, n_features_to_select=2, step=1,
importance_getter="named_steps.model.coef_")
rfe.fit(train_X, train_y)
# apply the scaler to the test set
scaled_test = scaler.transform(test_X)
I get this message:
NotFittedError: This StandardScaler instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
Why is the scaler not fitted?
Solution
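When you pass a pipeline or any other estimator to RFE, RFE does not fit that object itself. It clones it and fits the clone, refitting on progressively smaller feature sets until only n_features_to_select features remain. The scaler instance you created is therefore never fitted, which is why scaler.transform raises NotFittedError.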
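The effect of cloning can be seen in isolation with a small sketch. It uses sklearn.base.clone directly on a scaler; the toy array and the variable names are only for illustration and are not part of the original question.

```python
import numpy as np
from sklearn.base import clone
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

# clone() returns a new, unfitted estimator with the same parameters
scaler_copy = clone(scaler)

# Fitting the copy leaves the original object untouched
scaler_copy.fit(np.array([[0.0], [2.0]]))

# A fitted StandardScaler exposes mean_; an unfitted one does not
copy_is_fitted = hasattr(scaler_copy, "mean_")
original_is_fitted = hasattr(scaler, "mean_")
```

Here copy_is_fitted is True while original_is_fitted is False, which mirrors what happens to the scaler inside the pipeline passed to RFE.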
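To access the fitted estimator, use:
fit_pipeline = rfe.estimator_
Note that this fitted pipeline was trained on only the n_features_to_select highest-ranked features, so its scaler expects the test set reduced to those same features (for example via rfe.transform(test_X)) rather than the full test_X.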
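Putting this together, here is a minimal sketch of how the fitted scaler could be applied to the test set. It reuses the names from the question; the seeded random generator, the alternating labels (so that both classes are always present), and the variable names fit_scaler, reduced_test and predictions are assumptions added for the example.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy data shaped like the question's (seed and labels are assumptions)
rng = np.random.default_rng(0)
train_X = pd.DataFrame(rng.random((20, 3)), columns=["a", "b", "c"])
train_y = pd.Series(np.tile([0, 1], 10), name="y")
test_X = pd.DataFrame(rng.random((10, 3)), columns=["a", "b", "c"])

scaler = StandardScaler()
p = Pipeline(steps=[("scaler0", scaler),
                    ("model", SVC(kernel="linear", C=1))])
rfe = RFE(p, n_features_to_select=2, step=1,
          importance_getter="named_steps.model.coef_")
rfe.fit(train_X, train_y)

# The fitted clone of the pipeline, and the fitted scaler inside it
fit_pipeline = rfe.estimator_
fit_scaler = fit_pipeline.named_steps["scaler0"]

# Keep only the columns RFE selected, then scale with the fitted scaler
reduced_test = rfe.transform(test_X)
scaled_test = fit_scaler.transform(reduced_test)

# Alternatively, let RFE do the selection, scaling and prediction in one call
predictions = rfe.predict(test_X)
```

If only predictions are needed, rfe.predict(test_X) is the simpler route, since it applies the feature selection and the fitted pipeline for you.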
Answered By - Ach113