Issue
I have defined the following pipelines using scikit-learn:
model_lg = Pipeline([("preprocessing", StandardScaler()), ("classifier", LogisticRegression())])
model_dt = Pipeline([("preprocessing", StandardScaler()), ("classifier", DecisionTreeClassifier())])
model_gb = Pipeline([("preprocessing", StandardScaler()), ("classifier", HistGradientBoostingClassifier())])
Then I used cross-validation to evaluate the performance of each model:
cv_results_lg = cross_validate(model_lg, data, target, cv=5, return_train_score=True, return_estimator=True)
cv_results_dt = cross_validate(model_dt, data, target, cv=5, return_train_score=True, return_estimator=True)
cv_results_gb = cross_validate(model_gb, data, target, cv=5, return_train_score=True, return_estimator=True)
When I try to inspect the feature importance for each model via the coef_
attribute, I get an AttributeError:
model_lg.steps[1][1].coef_
AttributeError: 'LogisticRegression' object has no attribute 'coef_'
model_dt.steps[1][1].coef_
AttributeError: 'DecisionTreeClassifier' object has no attribute 'coef_'
model_gb.steps[1][1].coef_
AttributeError: 'HistGradientBoostingClassifier' object has no attribute 'coef_'
I was wondering how I can fix this error, or whether there is another approach to inspect the feature importance of each model?
Solution
Imo, the point here is the following. On the one hand, the pipeline instances model_lg, model_dt etc. are never explicitly fitted (you're not calling the .fit() method on them directly), and this is what prevents you from accessing the coef_ attribute on the instances themselves.
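As a quick sanity check, explicitly fitting a pipeline does make the attribute accessible (a minimal sketch on the iris data, just for illustration):
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
model_lg = Pipeline([("preprocessing", StandardScaler()), ("classifier", LogisticRegression())])
model_lg.fit(X, y)  # fit the pipeline explicitly
model_lg.named_steps["classifier"].coef_  # now available on the fitted instance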
On the other hand, by calling cross_validate() with the parameter return_estimator=True (which, among the cross-validation utilities, is only available with cross_validate()), you do get back the fitted estimators for each cv split, but you should access them via your result dictionaries cv_results_lg, cv_results_dt etc. (under the 'estimator' key). Here's the reference in the code and here's an example:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
X, y = load_iris(return_X_y=True)
model_lg = Pipeline([("preprocessing", StandardScaler()), ("classifier", LogisticRegression())])
cv_results_lg = cross_validate(model_lg, X, y, cv=5, return_train_score=True, return_estimator=True)
These would be, for instance, the coefficients of the classifier fitted on the first cv split:
cv_results_lg['estimator'][0].named_steps['classifier'].coef_
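The same pattern should carry over to the other pipelines, with the caveat that the tree-based models don't expose coef_: DecisionTreeClassifier stores its impurity-based importances in feature_importances_, while HistGradientBoostingClassifier exposes neither attribute, so permutation_importance is the usual fallback. A sketch continuing from the snippet above (the extra imports are the only new ingredients):
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.inspection import permutation_importance
model_dt = Pipeline([("preprocessing", StandardScaler()), ("classifier", DecisionTreeClassifier())])
cv_results_dt = cross_validate(model_dt, X, y, cv=5, return_train_score=True, return_estimator=True)
# impurity-based importances of the tree fitted on the first split
cv_results_dt['estimator'][0].named_steps['classifier'].feature_importances_
model_gb = Pipeline([("preprocessing", StandardScaler()), ("classifier", HistGradientBoostingClassifier())])
cv_results_gb = cross_validate(model_gb, X, y, cv=5, return_train_score=True, return_estimator=True)
# no feature_importances_ here; fall back to permutation importance on the fitted pipeline
perm = permutation_importance(cv_results_gb['estimator'][0], X, y, n_repeats=5, random_state=0)
perm.importances_mean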
Useful insights on related topics might be found in:
- How to get feature importances of a multi-label classification problem?
- Get support and ranking attributes for RFE using Pipeline in Python 3
Answered By - amiola