Issue
I am using OneHotEncoder to encode few categorical variables (eg - Sex and AgeGroup). The resulting feature names from the encoder are like - 'x0_female', 'x0_male', 'x1_0.0', 'x1_15.0' etc.
>>> train_X = pd.DataFrame({'Sex':['male', 'female']*3, 'AgeGroup':[0,15,30,45,60,75]})
>>> from sklearn.preprocessing import OneHotEncoder
>>> encoder = OneHotEncoder()
>>> train_X_encoded = encoder.fit_transform(train_X[['Sex', 'AgeGroup']])
>>> encoder.get_feature_names()
>>> array(['x0_female', 'x0_male', 'x1_0.0', 'x1_15.0', 'x1_30.0', 'x1_45.0',
'x1_60.0', 'x1_75.0'], dtype=object)
Is there a way to tell OneHotEncoder
to create the feature names in such a way that the column name is added at the beginning, something like - Sex_female, AgeGroup_15.0 etc, similar to what Pandas get_dummies()
does.
Solution
A list with the original column names can be passed to get_feature_names
.
>>> encoder.get_feature_names(['Sex', 'AgeGroup'])
array(['Sex_female', 'Sex_male', 'AgeGroup_0', 'AgeGroup_15',
'AgeGroup_30', 'AgeGroup_45', 'AgeGroup_60', 'AgeGroup_75'],
dtype=object)
- DEPRECATED:
get_feature_names
is deprecated in 1.0 and will be removed in 1.2. Please useget_feature_names_out
instead.
>>> encoder.get_feature_names_out(['Sex', 'AgeGroup'])
array(['Sex_female', 'Sex_male', 'AgeGroup_0', 'AgeGroup_15',
'AgeGroup_30', 'AgeGroup_45', 'AgeGroup_60', 'AgeGroup_75'],
dtype=object)
Answered By - kabochkov
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.