Issue
How do I get something similar to a one-hot encoding, except that instead of 1s and 0s it returns values from a list of the same size as the labels?
For example, with sklearn's MultiLabelBinarizer I can turn the following DataFrame of lists:
import pandas as pd

df = pd.DataFrame({'labels': [['A', 'B'], ['B', 'D', 'F'],
                              ['A', 'C', 'D'], ['D', 'B']],
                   'values': [[13.2, 1.3], [7.9, 12, 3.2],
                              [9.0, 1, 14.8], [3.1, 6.7]]})
      labels          values
0     [A, B]     [13.2, 1.3]
1  [B, D, F]  [7.9, 12, 3.2]
2  [A, C, D]  [9.0, 1, 14.8]
3     [D, B]      [3.1, 6.7]
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer(sparse_output=True)
df = df.join(
    pd.DataFrame.sparse.from_spmatrix(
        mlb.fit_transform(df.pop('labels')),
        index=df.index,
        columns=mlb.classes_))
To:
           values  A  B  C  D  F
0     [13.2, 1.3]  1  1  0  0  0
1  [7.9, 12, 3.2]  0  1  0  1  1
2  [9.0, 1, 14.8]  1  0  1  1  0
3      [3.1, 6.7]  0  1  0  1  0
What I am looking for instead is a way to get this:
      A    B    C     D    F
0  13.2  1.3    0     0    0
1     0  7.9    0  12.0  3.2
2   9.0    0  1.0  14.8    0
3     0  6.7    0   3.1    0
Solution
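You could use explode + reset_index + pivot + fillna: explode both list columns into one row per (label, value) pair, keep the original row number as a column with reset_index, pivot the labels into columns, and fill the missing combinations with 0: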
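# Keyword arguments: recent pandas versions no longer accept these positionally.
out = (pd.pivot(df.explode(df.columns.tolist()).reset_index(),
                index='index', columns='labels', values='values')
         .fillna(0)
         .rename_axis(index=[None], columns=[None]))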
Output:
      A    B  C     D    F
0  13.2  1.3  0   0.0  0.0
1   0.0  7.9  0  12.0  3.2
2   9.0  0.0  1  14.8  0.0
3   0.0  6.7  0   3.1  0.0
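For reference, here is the same idea written out step by step as a self-contained sketch. It starts from the original DataFrame (before `labels` is popped for MultiLabelBinarizer) and assumes pandas 1.3 or later, since exploding several columns at once is not available before that. The names `long` and `wide` and the cast to float are my own additions, not part of the answer above; the cast is there only so that every column ends up numeric rather than object dtype.

```python
import pandas as pd

df = pd.DataFrame({
    'labels': [['A', 'B'], ['B', 'D', 'F'], ['A', 'C', 'D'], ['D', 'B']],
    'values': [[13.2, 1.3], [7.9, 12, 3.2], [9.0, 1, 14.8], [3.1, 6.7]],
})

# Step 1: explode both list columns in parallel, giving one row per
# (label, value) pair. reset_index() keeps the original row number
# in a column named 'index'.
long = df.explode(['labels', 'values']).reset_index()

# Step 2: pivot labels into columns, then cast to float and fill the
# label/row combinations that never occurred with 0.
wide = (long.pivot(index='index', columns='labels', values='values')
            .astype(float)
            .fillna(0.0)
            .rename_axis(index=None, columns=None))

print(wide)
```

Note that this relies on every row having lists of equal length in `labels` and `values`, and on no label repeating within a row; a repeated label would make the pivot fail with a duplicate-entries error.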
Answered By - enke