Issue
I have the following dataframe.
c1 c2 v1 v2
0 a a 1 2
1 a a 2 3
2 b a 3 1
3 b a 4 5
5 c d 5 0
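For reproducibility, here is a minimal sketch of how the frame above could be built. The index labels (0, 1, 2, 3, 5) are copied from the listing, and integer dtypes for v1 and v2 are an assumption.

```python
import pandas as pd

# Index labels are copied from the listing above (note the gap at 4).
df = pd.DataFrame(
    {
        "c1": ["a", "a", "b", "b", "c"],
        "c2": ["a", "a", "a", "a", "d"],
        "v1": [1, 2, 3, 4, 5],
        "v2": [2, 3, 1, 5, 0],
    },
    index=[0, 1, 2, 3, 5],
)
```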
I wish to have the following output.
c1 c2 v1 v2
0 a a 2 3
1 b a 4 5
2 c d 5 0
The rule: first, group the dataframe by c1 and c2. Then, within each group, keep the row with the maximum value in column v2. Finally, output the original dataframe with all rows that do not satisfy this rule dropped.
What is the best way to obtain this result? Thanks.
While searching, I also found a solution based on the apply method.
Solution
You could use groupby-transform to generate a boolean selection mask:
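# Group by the two key columns.
grouped = df.groupby(['c1', 'c2'])
# True where a row's v2 equals its group's maximum (tied rows are all kept).
mask = grouped['v2'].transform(lambda x: x == x.max()).astype(bool)
df.loc[mask].reset_index(drop=True)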
yields
c1 c2 v1 v2
0 a a 2 3
1 b a 4 5
2 c d 5 0
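For completeness, here is a self-contained sketch of the same idea, plus one alternative. The frame is rebuilt from the question's listing; the names group_max, keep_ties, first_max and one_per_group are illustrative and not from the original post. The two variants differ only when a group has several rows tied for the maximum: the mask keeps all of them, while idxmax keeps just the first.

```python
import pandas as pd

# Frame rebuilt from the question's listing (index labels 0, 1, 2, 3, 5).
df = pd.DataFrame(
    {
        "c1": ["a", "a", "b", "b", "c"],
        "c2": ["a", "a", "a", "a", "d"],
        "v1": [1, 2, 3, 4, 5],
        "v2": [2, 3, 1, 5, 0],
    },
    index=[0, 1, 2, 3, 5],
)

# Variant 1: broadcast each group's maximum back onto its rows, then compare.
# Rows tied for the group maximum are all kept.
group_max = df.groupby(["c1", "c2"])["v2"].transform("max")
keep_ties = df.loc[df["v2"] == group_max].reset_index(drop=True)

# Variant 2: idxmax gives the index label of the first maximum in each group,
# so exactly one row per group is selected.
first_max = df.groupby(["c1", "c2"])["v2"].idxmax()
one_per_group = df.loc[first_max].reset_index(drop=True)
```

On this data there are no ties, so both variants select the same three rows as the output shown above.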
Answered By - unutbu