Issue
I am trying to figure out the best way to create a new column that holds the average of some columns, conditional on the values of other columns. The new column should be the average of columns A and B whenever at least one of A_flag or B_flag is True.
import pandas as pd
cols = ['A', 'A_flag', 'B', 'B_flag']
d = [(5, False, 3, False), (2, False, 7, True), (1, True, 10, True), (12, True, 2, False)]
df = pd.DataFrame(columns=cols, data=d)
df
A A_flag B B_flag
0 5 False 3 False
1 2 False 7 True
2 1 True 10 True
3 12 True 2 False
For this example, the first row would produce NaN since both flags are False. The other rows would be the average of A and B: 4.5, 5.5, 7.
I know I could create an additional column using something like the following and then aggregate the result, but I suspect there is a much more efficient way:
df['A_flag'].apply(lambda x: 0 if x is True else 1)
Solution
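Try np.where together with any. Pass axis=1 by keyword, since recent pandas versions no longer accept it positionally in any:
import numpy as np
df['new'] = np.where(df[['A_flag', 'B_flag']].any(axis=1), df[['A', 'B']].mean(axis=1), np.nan)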
df
Out[113]:
A A_flag B B_flag new
0 5 False 3 False NaN
1 2 False 7 True 4.5
2 1 True 10 True 5.5
3 12 True 2 False 7.0
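For comparison, here is a minimal sketch of an equivalent approach that stays in pandas and uses Series.where instead of np.where. It is not part of the original answer. The variable name mask is just an illustrative choice, and the frame is rebuilt from the question's data so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

cols = ['A', 'A_flag', 'B', 'B_flag']
d = [(5, False, 3, False), (2, False, 7, True), (1, True, 10, True), (12, True, 2, False)]
df = pd.DataFrame(columns=cols, data=d)

# True for rows where at least one of the flag columns is set
mask = df[['A_flag', 'B_flag']].any(axis=1)

# Row-wise mean of A and B; rows where mask is False are replaced with NaN
df['new'] = df[['A', 'B']].mean(axis=1).where(mask)
```

Series.where keeps values where the condition is True and fills NaN elsewhere by default, so no explicit np.nan argument is needed.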
Answered By - BENY