Issue
Is it possible to compute feature importance (with Random Forest) in scikit-learn when features have been one-hot encoded?
Solution
Yes. Here's an example of how to combine the encoded feature names with their importances, using a DictVectorizer inside a pipeline:
from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import FunctionTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
import pandas as pd
# some example data
X = pd.DataFrame({'feature': ['value1', 'value2', 'value2', 'value1', 'value2']})
y = [1, 0, 0, 1, 1]
# translate rows to dicts
def row_to_dict(X, y=None):
    return X.apply(dict, axis=1)
# define prediction model
ft = FunctionTransformer(row_to_dict, validate=False)
dv = DictVectorizer()
rf = RandomForestClassifier()
# glue steps together
model = make_pipeline(ft, dv, rf)
# train
model.fit(X, y)
# get feature importances
feature_importances = list(zip(dv.feature_names_, rf.feature_importances_))
# have a look
print(feature_importances)
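The names in dv.feature_names_ follow the pattern "feature=value", so each importance above belongs to a single one-hot column rather than to the original column. If you want one importance per original feature, a minimal follow-up sketch (my addition, not part of the original answer) is to sum the importances of all columns that share the same prefix:

# sum per-column importances back onto the original feature name
# (DictVectorizer names expanded columns as "<feature>=<value>")
from collections import defaultdict
per_feature = defaultdict(float)
for name, importance in zip(dv.feature_names_, rf.feature_importances_):
    original = name.split('=', 1)[0]  # drop the "=value" suffix
    per_feature[original] += importance
print(dict(per_feature))

If your features were encoded with scikit-learn's OneHotEncoder instead of DictVectorizer, the same idea applies; a rough equivalent (assuming a recent scikit-learn version that provides get_feature_names_out) would be:

from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
X = pd.DataFrame({'feature': ['value1', 'value2', 'value2', 'value1', 'value2']})
y = [1, 0, 0, 1, 1]
enc = OneHotEncoder()
X_enc = enc.fit_transform(X)  # sparse one-hot matrix
rf = RandomForestClassifier()
rf.fit(X_enc, y)
# encoded column names look like "feature_value1", "feature_value2"
print(list(zip(enc.get_feature_names_out(), rf.feature_importances_)))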
Answered By - Kris