Issue
I wrote the following code block. After finding the best estimator, I want to inspect the model's feature importances, but I couldn't figure out how to match them to the column names correctly.
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.multioutput import MultiOutputClassifier
from sklearn.model_selection import RandomizedSearchCV
import xgboost as xgb

scaler = StandardScaler()
ohe = OneHotEncoder(categories=unique_list, sparse=False)
col_transformers = ColumnTransformer([
    ("scaler_onestep", scaler, numerical_columns),
    ("ohe_onestep", ohe, categorical_columns)])

param_grid = {
    'XGB__estimator__max_depth': [3, 5, 7, 10],
    'XGB__estimator__learning_rate': [0.01, 0.1],
    'XGB__estimator__n_estimators': [100]}

model = MultiOutputClassifier(xgb.XGBClassifier(objective="binary:logistic"))

# Define a pipeline
pipeline = Pipeline([("preprocessing", col_transformers), ("XGB", model)])

rs_clf = RandomizedSearchCV(pipeline, param_grid, n_iter=3,
                            n_jobs=-1, verbose=2, cv=2, scoring="accuracy",
                            refit=True, random_state=42)
rs_clf.fit(X, y)
This gives me the feature importances for the first label:
rs_clf.best_estimator_.named_steps["XGB"].estimators_[0].feature_importances_
This gives me the categories:
rs_clf.best_estimator_.named_steps["preprocessing"].transformers[1][1].categories
The transformed result has 389 columns while X has 279 columns, so I can't map the importances to names directly. How can I find the names of those 389 columns produced by the one-hot encoding?
Solution
The get_feature_names method is going to be of great help here. At the moment, StandardScaler doesn't support it; since xgboost is completely unaffected by feature scaling, I would suggest dropping the scaler and replacing the numerical portion of the ColumnTransformer with "passthrough". Then

rs_clf.best_estimator_.named_steps["preprocessing"].get_feature_names()

should give the features in the order they arrive to XGB.
Answered By - Ben Reiniger