Issue
Is there a simple way to calculate the average of each column in a pandas DataFrame while excluding, for each row, that row's own value? The x
in each row below marks the value excluded from the calculation in that iteration:
     a   b                   a   b                    a   b
0    1   2               0   x   x                0   1   2
1    2   4   first loop  1   2   4   second loop  1   x   x   etc.
2    3   6     --->      2   3   6      --->      2   3   6   --->
3    4   8               3   4   8                3   4   8
4    5  10               4   5  10                4   5  10
                         ____________             _____________
                col_avg: 3.5   7.0       col_avg: 3.25  6.5
Only 4 values are used at each iteration, since the "x" is excluded from the data set,
resulting in a new DataFrame:
a_x b_x
0 3.5 7.0
1 3.25 6.5
2 3.0 6.0
3 2.75 5.5
4 2.5 5.0
Thanks
/N
Solution
As a first step, suppose we were interested in summing instead of averaging. In that case, we would add all elements along each column except the current element. Another way to look at it is to sum all elements along each column and then subtract the current element itself. So, essentially, we get the sums for all columns with df.sum(0) and simply subtract df from it, keeping the axes aligned. Broadcasting takes care of performing these operations across all columns in one go.
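As a minimal sketch of this summing step alone, assuming the sample DataFrame from the question (the names col_sums and sum_excl are only illustrative):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10]})

# Column sums: a Series indexed by column name (a=15, b=30)
col_sums = df.sum(0)

# The Series index is aligned with df's columns and broadcast over the rows,
# so each cell becomes "column sum minus this cell's own value"
sum_excl = col_sums - df
print(sum_excl)
```

Each cell of sum_excl holds the sum of its column with that row's own value left out.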
For the second step, averaging, we simply divide by the number of elements that went into each column's sum, i.e. df.shape[0]-1.
Thus, we would have a vectorized solution, like so -
df_out = (df.sum(0) - df)/float(df.shape[0]-1)
Sample run -
In [128]: df
Out[128]:
a b
0 1 2
1 2 4
2 3 6
3 4 8
4 5 10
In [129]: (df.sum(0) - df)/float(df.shape[0]-1)
Out[129]:
a b
0 3.50 7.0
1 3.25 6.5
2 3.00 6.0
3 2.75 5.5
4 2.50 5.0
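One way to sanity-check the vectorized result is to compare it with an explicit loop that drops one row at a time and averages the rest. This is only a sketch on the sample data; the names vectorized and looped are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10]})

# Vectorized leave-one-out mean, as in the answer
vectorized = (df.sum(0) - df) / (df.shape[0] - 1)

# Explicit loop: drop row i, then take the column means of the remaining rows
looped = pd.DataFrame(
    [df.drop(index=i).mean() for i in df.index],
    index=df.index,
)

# Both approaches should agree up to floating-point tolerance
print(np.allclose(vectorized.to_numpy(), looped.to_numpy()))
```

The loop recomputes each column mean once per row, whereas the vectorized form computes the column sums a single time.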
To set the column names to the desired ones, do: df_out.columns = ['a_x','b_x'].
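As an alternative sketch that is not part of the original answer, DataFrame.add_suffix can derive the new names from the existing ones, so they do not have to be listed by hand:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10]})

# Leave-one-out column means, with "_x" appended to every column label
df_out = ((df.sum(0) - df) / (df.shape[0] - 1)).add_suffix("_x")
print(df_out.columns.tolist())  # ['a_x', 'b_x']
```

This may be convenient when the DataFrame has many columns.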
Answered By - Divakar