Issue
Given the dataframe below, if rows are grouped by first name and last name, how can I find each group that does not have a row for every type in the types list, and create the missing rows?
In the example below, two new rows would be created for Bob Jack because they are missing from the original dataframe: one with type 'DA' and another with type 'FA'. The Value column of the new rows can be set to 0.
import pandas as pd

data = {
    'First Name': ['Alice', 'Alice', 'Alice', 'Alice', 'Bob', 'Bob'],
    'Last Name': ['Johnson', 'Johnson', 'Johnson', 'Johnson', 'Jack', 'Jack'],
    'Type': ['CA', 'DA', 'FA', 'GCA', 'CA', 'GCA'],
    'Value': [25, 30, 35, 40, 50, 37]
}
types = ['CA', 'DA', 'FA', 'GCA']
df = pd.DataFrame(data)
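For the "find" half of the question, here is a minimal sketch that lists which entries of `types` each (First Name, Last Name) group lacks. It assumes the `data` and `types` shown above. The name `missing_types` is purely illustrative.

```python
import pandas as pd

data = {
    'First Name': ['Alice', 'Alice', 'Alice', 'Alice', 'Bob', 'Bob'],
    'Last Name': ['Johnson', 'Johnson', 'Johnson', 'Johnson', 'Jack', 'Jack'],
    'Type': ['CA', 'DA', 'FA', 'GCA', 'CA', 'GCA'],
    'Value': [25, 30, 35, 40, 50, 37]
}
types = ['CA', 'DA', 'FA', 'GCA']
df = pd.DataFrame(data)

# Map each (first, last) pair to the types it has no row for,
# keeping the order in which they appear in `types`.
missing_types = {}
for (first, last), group in df.groupby(['First Name', 'Last Name']):
    present = set(group['Type'])
    missing_types[(first, last)] = [t for t in types if t not in present]

print(missing_types)
# {('Alice', 'Johnson'): [], ('Bob', 'Jack'): ['DA', 'FA']}
```

Any group that maps to a non-empty list is one that needs new rows.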
Solution
One way to do this is to create a dataframe containing every combination of name and type, then left join the original dataframe onto it. The result has a row for every combination, with NaN in Value wherever the original data had no entry, and those NaN values can then be filled with 0. Note that because the Value column temporarily holds NaN values, pandas converts it to float. If you need integers, you can convert it back with astype({'Value': int}) in the chain (merge with how='cross' requires pandas 1.2 or later):
out = (df[['First Name', 'Last Name']]
       .drop_duplicates()                                   # one row per name pair
       .merge(pd.Series(types, name='Type'), how='cross')   # pair every name with every type
       .merge(df, on=['First Name', 'Last Name', 'Type'], how='left')  # absent combinations get NaN
       .fillna(0)
       # use this astype if you need `Value` to be an int
       .astype({'Value': int})
      )
Output (with the astype to convert Value back to int):
First Name Last Name Type Value
0 Alice Johnson CA 25
1 Alice Johnson DA 30
2 Alice Johnson FA 35
3 Alice Johnson GCA 40
4 Bob Jack CA 50
5 Bob Jack DA 0
6 Bob Jack FA 0
7 Bob Jack GCA 37
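If you would rather skip the float round trip, one alternative (a different technique from the merge above) is to set (First Name, Last Name, Type) as the index and reindex against the full set of combinations with fill_value=0. This is a sketch that assumes each (First Name, Last Name, Type) combination appears at most once in df, since reindex requires a unique index. The names `keys`, `names`, and `full_index` are illustrative.

```python
import pandas as pd

data = {
    'First Name': ['Alice', 'Alice', 'Alice', 'Alice', 'Bob', 'Bob'],
    'Last Name': ['Johnson', 'Johnson', 'Johnson', 'Johnson', 'Jack', 'Jack'],
    'Type': ['CA', 'DA', 'FA', 'GCA', 'CA', 'GCA'],
    'Value': [25, 30, 35, 40, 50, 37]
}
types = ['CA', 'DA', 'FA', 'GCA']
df = pd.DataFrame(data)

keys = ['First Name', 'Last Name', 'Type']

# Target index: every existing name pair crossed with every type.
names = df[['First Name', 'Last Name']].drop_duplicates()
full_index = pd.MultiIndex.from_tuples(
    [(first, last, t) for first, last in names.itertuples(index=False) for t in types],
    names=keys,
)

# Rows missing from df are created with fill_value=0 directly, so the
# integer Value column never passes through NaN and stays an integer dtype.
out = (df.set_index(keys)
         .reindex(full_index, fill_value=0)
         .reset_index())
print(out)
```

With the sample data this gives the same eight rows as the merge-based chain, in the same order.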
Answered By - Nick