Issue
I have a DataFrame, and one of its columns looks like this:
channels
0 [email, mobile, social]
1 [web, email, mobile, social]
2 [web, email, mobile]
3 [web, email, mobile]
4 [web, email]
5 [web, email, mobile, social]
6 [web, email, mobile, social]
7 [email, mobile, social]
8 [web, email, mobile, social]
9 [web, email, mobile]
How can I split each item in each cell so that I can implement one-hot encoding?
I tried:
portfolio.channels.str.split(expand=True)
This returns:
0
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
Solution
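The cells appear to hold Python lists rather than strings, which is why the .str.split accessor returns NaN for every row. You can use MultiLabelBinarizer from sklearn instead.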
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
# create the MultiLabelBinarizer and fit_transform your data (only the first 3 rows are shown here)
mlb = MultiLabelBinarizer()
a = mlb.fit_transform(df.channels.to_numpy())
# create the DataFrame, using the fitted class labels as the column names
df_ohe = pd.DataFrame(a, index=df.index, columns=mlb.classes_)
print(df_ohe)
   email  mobile  social  web
0      1       1       1    0
1      1       1       1    1
2      1       1       0    1
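For anyone who wants to try this without the original data, here is a minimal self-contained sketch. It assumes the column holds lists of strings and rebuilds only the first three rows of the question. The names `encoded` and `df_joined` are illustrative and are not part of the original answer.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Assumed sample data: the first three rows from the question, as lists of strings.
df = pd.DataFrame({
    "channels": [
        ["email", "mobile", "social"],
        ["web", "email", "mobile", "social"],
        ["web", "email", "mobile"],
    ]
})

# Fit the binarizer on the lists and get a 0/1 indicator matrix.
mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform(df["channels"])

# Wrap the matrix in a DataFrame; classes_ holds the sorted labels seen during fit.
df_ohe = pd.DataFrame(encoded, index=df.index, columns=mlb.classes_)

# Illustrative extra step: attach the indicator columns to the original frame.
df_joined = df.join(df_ohe)
print(df_joined)
```

Joining on the index keeps each indicator row aligned with the row it came from, so the original channels column can be dropped afterwards if it is no longer needed.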
Answered By - Ben.T