Issue
I have a DataFrame like this:
name | phase | value |
---|---|---|
BOB | 1 | .9 |
BOB | 2 | .05 |
BOB | 3 | .05 |
JOHN | 2 | .45 |
JOHN | 3 | .45 |
JOHN | 4 | .05 |
FRANK | 1 | .4 |
FRANK | 3 | .6 |
For each name, I want to find which entry in column 'phase' has the maximum value in column 'value'.
If more than one row shares the same maximum value, keep the first (or a random) 'phase'.
Desired result table:
name | phase | value |
---|---|---|
BOB | 1 | .9 |
JOHN | 2 | .45 |
FRANK | 3 | .6 |
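My approach was:
df.groupby(['name'])[['phase','value']].max()
but it returned incorrect values: the maximum of each column is computed independently, so 'phase' and 'value' can come from different rows (for example, for JOHN it returns phase 4 together with value .45).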
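A minimal reproduction of that problem, assuming the table above is loaded with integer 'phase' and float 'value' columns (the constructor call below is just one way to build it):

```python
import pandas as pd

# Rebuild the example table from the question.
df = pd.DataFrame({
    "name": ["BOB", "BOB", "BOB", "JOHN", "JOHN", "JOHN", "FRANK", "FRANK"],
    "phase": [1, 2, 3, 2, 3, 4, 1, 3],
    "value": [0.9, 0.05, 0.05, 0.45, 0.45, 0.05, 0.4, 0.6],
})

# max() is applied to each column separately within each name group,
# so the resulting 'phase' and 'value' need not come from the same row.
naive = df.groupby("name")[["phase", "value"]].max()
print(naive)
```

Here BOB comes back with phase 3 (the largest phase) next to value 0.9, even though the 0.9 row has phase 1.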
Solution
You don't need to use `groupby`. Sort the rows by `value` (descending) and `phase` (ascending; adjust the order if necessary), then drop duplicates by `name`:
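# Highest 'value' first; among equal values, lowest 'phase' first
out = (df.sort_values(['value', 'phase'], ascending=[False, True])
         .drop_duplicates('name')          # keep the top row for each name
         .sort_index(ignore_index=True))   # restore original row order and renumber 0..n-1
print(out)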
# Output
    name  phase  value
0    BOB      1   0.90
1   JOHN      2   0.45
2  FRANK      3   0.60
Answered By - Corralien
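For comparison, a different technique (not part of the answer above) is to keep `groupby` but select whole rows with `idxmax` instead of taking column-wise maxima. This is a sketch assuming the same column names and data as in the question:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["BOB", "BOB", "BOB", "JOHN", "JOHN", "JOHN", "FRANK", "FRANK"],
    "phase": [1, 2, 3, 2, 3, 4, 1, 3],
    "value": [0.9, 0.05, 0.05, 0.45, 0.45, 0.05, 0.4, 0.6],
})

# idxmax gives, for each name, the index label of the first row holding
# the maximum 'value'; sort=False keeps groups in order of first appearance.
idx = df.groupby("name", sort=False)["value"].idxmax()

# Select those complete rows, so 'phase' and 'value' stay paired.
out = df.loc[idx].reset_index(drop=True)
print(out)
```

With `idxmax`, ties are resolved by original row order rather than by the smallest 'phase'. For this data the two rules agree, because JOHN's tied rows already appear in ascending phase order.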