Issue
I have the following table. I want to calculate a weighted average grouped by each date based on the formula below. I can do this using some standard conventional code, but assuming that this data is in a pandas dataframe, is there any easier way to achieve this rather than through iteration?
Date ID wt value w_avg
01/01/2012 100 0.50 60 0.791666667
01/01/2012 101 0.75 80
01/01/2012 102 1.00 100
01/02/2012 201 0.50 100 0.722222222
01/02/2012 202 1.00 80
01/01/2012 w_avg = 0.5 * ( 60/ sum(60,80,100)) + .75 * (80/ sum(60,80,100)) + 1.0 * (100/sum(60,80,100))
01/02/2012 w_avg = 0.5 * ( 100/ sum(100,80)) + 1.0 * ( 80/ sum(100,80))
Solution
I think I would do this with two groupbys.
First to calculate the "weighted average":
In [11]: g = df.groupby('Date')
In [12]: df.value / g.value.transform("sum") * df.wt
Out[12]:
0 0.125000
1 0.250000
2 0.416667
3 0.277778
4 0.444444
dtype: float64
If you set this as a column, you can groupby over it:
In [13]: df['wa'] = df.value / g.value.transform("sum") * df.wt
Now the sum of this column is the desired:
In [14]: g.wa.sum()
Out[14]:
Date
01/01/2012 0.791667
01/02/2012 0.722222
Name: wa, dtype: float64
or potentially:
In [15]: g.wa.transform("sum")
Out[15]:
0 0.791667
1 0.791667
2 0.791667
3 0.722222
4 0.722222
Name: wa, dtype: float64
Answered By - Andy Hayden
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.