Issue
Assuming I'm dealing with this dataframe:
ID | Qualified | Year | Amount A | Amount B |
---|---|---|---|---|
1 | No | 2020 | 0 | 150 |
1 | No | 2019 | 0 | 100 |
1 | Yes | 2019 | 10 | 15 |
1 | No | 2018 | 0 | 100 |
1 | Yes | 2018 | 10 | 150 |
2 | Yes | 2020 | 0 | 200 |
2 | No | 2017 | 0 | 100 |
... | ... | ... | ... |
My desired output should be like this:
ID | Qualified | Year | Amount A | Amount B |
---|---|---|---|---|
1 | No | 2020 | 0 | 150 |
1 | Partial | 2019 | 10 | 115 |
1 | Partial | 2018 | 10 | 250 |
2 | Yes | 2020 | 0 | 200 |
2 | No | 2017 | 0 | 100 |
... | ... | ... | ... |
As you can see, Qualified column creates new merged values (Yes & No -> Partial, amount A + B ) from a condition: a year in an ID includes both Yes and No in Qualified column.
Don't know how to approach it. Anyone could provide any methodology?
Solution
You can use the function agg()
and groupby()
to perform this operation.
agg()
allows you to use not only common aggregation functions (such as sum
, mean
, etc.) but also custom defined functions.
I would do as follows:
def agg_qualify(x):
values = x.unique()
if len(x)>1:
return 'Partial'
return values[0]
df.groupby(['ID', 'Year']).agg({
'Qualified': lambda x: agg_qualify(x),
'Amount A': 'sum',
'Amount B': 'sum',
}).reset_index()
Output:
ID Year Qualified Amount A Amount B
0 1 2018 Partial 10 250.0
1 1 2019 Partial 10 115.0
2 1 2020 No 0 150.0
3 2 2020 Yes 0 200.0
Answered By - Antoine Dubuis
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.