Issue
I am working on a groupby operation using the attribute
column but I want to exclude the desc_type 1 and 2
that will be used to calculate total discount inside each attrib.
pd.DataFrame({'ID':[10,10,10,20,30,30],'attribute':['attrib_1','desc_type1','desc_type2','attrib_1','attrib_2','desc_type1'],'value':[100,0,0,100,30,0],'discount':[0,6,2,0,0,13.3]})
output:
ID attribute value discount
10 attrib_1 100 0
10 desc_type1 0 6
10 desc_type2 0 2
20 attrib_1 100 0
30 attrib_2 30 0
30 desc_type1 0 13.3
I want to groupby this dataframe by attribute
but excluding the desc_type1 and desc_type2
.
The desired output:
attribute ID_count value_sum discount_sum
attrib_1 2 200 8
attrib_2 1 30 13.3
explanations:
attrib_1
has discount_sum=8 because ID 30 that belongs to attrib_1
has two desc_type
attrib_2
has discount_sum=13.3 because ID 10 has one desc_type
ID=20
has no discounts types.
What I did so far:
df.groupby('attribute').agg({'ID':'count','value':'sum','discount':'sum'})
But the line above does not exclude the desc_type 1 and 2
from the groupby
Important: an ID may have a discount or not.
link to the realdataset: realdataset
Solution
You can fill the attributes per ID, then groupby.agg
:
m = df['attribute'].str.startswith('desc_type')
group = df['attribute'].mask(m).groupby(df['ID']).ffill()
out = (df
.groupby(group, as_index=False)
.agg(**{'ID_count': ('ID', 'nunique'),
'value_sum': ('value', 'sum'),
'discount_sum': ('discount', 'sum')
})
)
output:
ID_count value_sum discount_sum
0 2 200 8.0
1 1 30 13.3
Answered By - mozway
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.