Issue
Suppose a dataframe df with columns a, b, c, d. I know how to define a function to aggregate values in pandas, like:
def my_agg(x):
    names = {
        'a_Total': x['a'].sum(),
        'b_Mean': x['b'].mean()
    }
    return pd.Series(names, index=['a_Total', 'b_Mean'])

d_aggregate = df.groupby(['c', 'd']).apply(my_agg)
What I am looking for is a way to take the total of a and the mean of b, respectively, using only the rows selected by their values in column 'c' or 'd'.
Sample data:
df = pd.DataFrame({"a": [10, 20, 30, 40],
                   "b": [1, 2, 3, 4],
                   "c": ["c1", "c1", "c1", "c2"],
                   "d": [100, 200, 300, 400]})
My aggregate function:
def my_agg91(x):
    names = {
        'Sum_a': x['a'].sum(),
        'Mean_b': x['b'].mean()}
    return pd.Series(names, index=['Sum_a', 'Mean_b'])

df2 = df.groupby(['c']).apply(my_agg91)
which gives me:
    Sum_a  Mean_b
c
c1   60.0     2.0
c2   40.0     4.0
What I want: the sum of 'a' for 'd' < 250 and the mean of 'b' for 'd' > 250, in a single dataframe. Please suggest changes to the function to get this output:
    Sum_a  Mean_b
c
c1   30.0     3.0
c2    0.0     4.0
Solution
Filter the rows of each group inside your function before aggregating, like:
def my_agg92(x):
    names = {
        'Sum_a': x[x['d'] < 250]['a'].sum(),
        'Mean_b': x[x['d'] > 250]['b'].mean()}
    return pd.Series(names, index=['Sum_a', 'Mean_b'])

df.groupby(['c']).apply(my_agg92)
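As an alternative sketch (not part of the original answer), the same conditional aggregation can probably be written without apply by masking the columns first and then using named aggregation. This assumes pandas 0.25 or later. The helper column names a_low and b_high are made up for illustration.

```python
import pandas as pd

# Sample data from the question, with the 'c' values quoted as strings.
df = pd.DataFrame({
    "a": [10, 20, 30, 40],
    "b": [1, 2, 3, 4],
    "c": ["c1", "c1", "c1", "c2"],
    "d": [100, 200, 300, 400],
})

# Series.where keeps a value where the condition is True and puts NaN elsewhere.
# a_low and b_high are hypothetical helper columns, not names from the question.
masked = df.assign(
    a_low=df["a"].where(df["d"] < 250),
    b_high=df["b"].where(df["d"] > 250),
)

# Named aggregation: output column = (input column, aggregation).
# sum and mean skip NaN by default, so masked-out rows do not contribute.
result = masked.groupby("c").agg(
    Sum_a=("a_low", "sum"),
    Mean_b=("b_high", "mean"),
)

print(result)
```

With either approach, a group with no rows satisfying 'd' < 250 gets a sum of 0.0, while a group with no rows satisfying 'd' > 250 gets NaN for the mean, since the mean of an empty selection is undefined.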
Answered By - abevieiramota