Issue
from sklearn.preprocessing import OneHotEncoder
df.LotFrontage = df.LotFrontage.fillna(value = 0)
categorical_mask = (df.dtypes == "object")
categorical_columns = df.columns[categorical_mask].tolist()
ohe = OneHotEncoder(categories = categorical_mask, sparse = False)
df_encoded = ohe.fit_transform(df)
print(df_encoded[:5, :])
ERROR:
May I know what's wrong with my code?
(A snippet of the data was attached as an image.)
Solution
The categories argument in the OneHotEncoder is not there to select which features to encode; for that you need a ColumnTransformer. Try this:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

df.LotFrontage = df.LotFrontage.fillna(value = 0)
# Names of the columns to one-hot encode (the object-dtype ones)
categorical_features = df.select_dtypes("object").columns
column_trans = ColumnTransformer(
    [
        ("onehot_categorical", OneHotEncoder(), categorical_features),
    ],
    remainder="passthrough",  # or "drop" if you don't want the non-categoricals at all
)
df_encoded = column_trans.fit_transform(df)
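To try this without the original data, here is a minimal self-contained sketch on a toy DataFrame. Only LotFrontage comes from the question; the Street and LotShape columns and all the values are made up for illustration. Passing sparse_threshold=0 to the ColumnTransformer is an addition here, used so the result is always a dense array that can be sliced and printed like in the question.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy data: hypothetical values, with one missing LotFrontage entry
df = pd.DataFrame(
    {
        "LotFrontage": [65.0, None, 80.0],
        "Street": pd.Series(["Pave", "Grvl", "Pave"], dtype="object"),
        "LotShape": pd.Series(["Reg", "IR1", "Reg"], dtype="object"),
    }
)
df["LotFrontage"] = df["LotFrontage"].fillna(0)

# Select the columns to encode by dtype
categorical_features = df.select_dtypes("object").columns.tolist()

column_trans = ColumnTransformer(
    [("onehot_categorical", OneHotEncoder(), categorical_features)],
    remainder="passthrough",  # keep LotFrontage as-is, appended after the encoded columns
    sparse_threshold=0,       # always return a dense NumPy array
)
df_encoded = column_trans.fit_transform(df)
print(df_encoded[:5, :])
```

The encoded columns come first (one per category level, in sorted order within each feature), followed by the passed-through numeric column.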
Note that according to the docs, the categories argument is:
categories : 'auto' or a list of array-like, default='auto'
Categories (unique values) per feature. 'auto': determine categories automatically from the training data. list: categories[i] holds the categories expected in the ith column. The passed categories should not mix strings and numeric values within a single feature, and should be sorted in case of numeric values.
So it should hold every possible category, or level, of each of the categorical features. You might use this if you know the full set of possible levels but suspect your training data might omit some. In your case, I don't think you'll need it, so 'auto', i.e. the default, should be fine.
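For completeness, a small sketch of what categories is actually for, assuming a hypothetical Street column whose training data happens to contain only one of its two known levels. The column name and values are illustrative and not taken from the question's data.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Training data that happens to be missing the "Grvl" level
train = pd.DataFrame({"Street": pd.Series(["Pave", "Pave"], dtype="object")})

# One list of levels per encoded column, in column order
ohe = OneHotEncoder(categories=[["Grvl", "Pave"]])
encoded = ohe.fit_transform(train).toarray()

print(ohe.categories_)  # both levels are kept, even though "Grvl" was never seen
print(encoded)          # two output columns, one per declared level
```

With the default 'auto', the same fit would produce a single output column, because only "Pave" appears in the training data.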
Answered By - Dan