Issue
I have a dataframe where one of the columns has its items separated with commas. It looks like:
Data |
---|
a,b,c |
a,c,d |
d,e |
a,e |
a,b,c,d,e |
My goal is to create a matrix that has as header all the unique values from column Data, meaning [a,b,c,d,e]. Then as rows a flag indicating if the value is at that particular row. The matrix should look like this:
Data | a | b | c | d | e |
---|---|---|---|---|---|
a,b,c | 1 | 1 | 1 | 0 | 0 |
a,c,d | 1 | 0 | 1 | 1 | 0 |
d,e | 0 | 0 | 0 | 1 | 1 |
a,e | 1 | 0 | 0 | 0 | 1 |
a,b,c,d,e | 1 | 1 | 1 | 1 | 1 |
To separate column Data what I did is:
df['data'].str.split(',', expand = True)
Then I don't know how to proceed to allocate the flags to each of the columns.
Solution
Maybe you can try this without pivot.
Create the dataframe.
import pandas as pd
import io
s = '''Data
a,b,c
a,c,d
d,e
a,e
a,b,c,d,e'''
df = pd.read_csv(io.StringIO(s), sep = "\s+")
We can use pandas.Series.str.split
with expand
argument equals to True
. And value_counts
each rows with axis = 1
.
Finally fillna
with zero and change the data into integer with astype(int)
.
df["Data"].str.split(pat = ",", expand=True).apply(lambda x : x.value_counts(), axis = 1).fillna(0).astype(int)
#
a b c d e
0 1 1 1 0 0
1 1 0 1 1 0
2 0 0 0 1 1
3 1 0 0 0 1
4 1 1 1 1 1
And then merge it with the original column.
new = df["Data"].str.split(pat = ",", expand=True).apply(lambda x : x.value_counts(), axis = 1).fillna(0).astype(int)
pd.concat([df, new], axis = 1)
#
Data a b c d e
0 a,b,c 1 1 1 0 0
1 a,c,d 1 0 1 1 0
2 d,e 0 0 0 1 1
3 a,e 1 0 0 0 1
4 a,b,c,d,e 1 1 1 1 1
Answered By - Denny Chen
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.