Issue
I have the following dataframe:
                     uniq_id  value
2016-12-26 11:03:10      001    342
2016-12-26 11:03:13      004      5
2016-12-26 12:03:13      005     14
2016-12-26 12:03:13      008    114
2016-12-27 11:03:10      009    343
2016-12-27 11:03:13      013      5
2016-12-27 12:03:13      016    124
2016-12-27 12:03:13      018    114
I need to get the top N records for each day, sorted by value. Something like this (for N=2):
2016-12-26  001  342
            008  114
2016-12-27  009  343
            016  124
Please suggest the right way to do that in pandas 0.19.x.
Solution
Unfortunately, there is no such method as DataFrameGroupBy.nlargest() yet, which would allow us to do the following:
df.groupby(...).nlargest(2, columns=['value'])
So here is a somewhat ugly but working solution:
In [73]: df.set_index(df.index.normalize()).reset_index().sort_values(['index','value'], ascending=[1,0]).groupby('index').head(2)
Out[73]:
       index  uniq_id  value
0 2016-12-26        1    342
3 2016-12-26        8    114
4 2016-12-27        9    343
6 2016-12-27       16    124
PS: I think there must be a better one...
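The one-liner above can be unpacked into a small, self-contained script. This is only a sketch: the helper name `top_n_per_day` and the temporary `day` column are made up for illustration, and `uniq_id` is stored as strings here on the assumption that the leading zeros in the question matter.

```python
import pandas as pd

# Sample data copied from the question. uniq_id is kept as a string so the
# leading zeros survive (an assumption about the original dtype).
idx = pd.to_datetime([
    "2016-12-26 11:03:10", "2016-12-26 11:03:13",
    "2016-12-26 12:03:13", "2016-12-26 12:03:13",
    "2016-12-27 11:03:10", "2016-12-27 11:03:13",
    "2016-12-27 12:03:13", "2016-12-27 12:03:13",
])
df = pd.DataFrame(
    {"uniq_id": ["001", "004", "005", "008", "009", "013", "016", "018"],
     "value": [342, 5, 14, 114, 343, 5, 124, 114]},
    index=idx,
)


def top_n_per_day(frame, n, column="value"):
    """Hypothetical helper: n rows with the largest `column` per calendar day."""
    day = frame.index.normalize()  # truncate each timestamp to midnight
    return (
        frame.assign(day=day)
             # days ascending, values descending within each day
             .sort_values(["day", column], ascending=[True, False])
             .groupby("day")
             .head(n)              # first n rows of each sorted group
             .set_index("day")
    )


top2 = top_n_per_day(df, 2)
print(top2)
```

Sorting first and then taking `head(n)` per group works even when the timestamp index contains duplicates, which is the case in the question's data.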
UPDATE: if your DF doesn't have duplicated index values, the following solution should work as well:
In [117]: df
Out[117]:
                     uniq_id  value
2016-12-26 11:03:10        1    342
2016-12-26 11:03:13        4      5
2016-12-26 12:03:13        5     14
2016-12-26 12:33:13        8    114   # <-- I've intentionally changed this index value
2016-12-27 11:03:10        9    343
2016-12-27 11:03:13       13      5
2016-12-27 12:03:13       16    124
2016-12-27 12:33:13       18    114   # <-- I've intentionally changed this index value
In [118]: df.groupby(pd.TimeGrouper('D')).apply(lambda x: x.nlargest(2, 'value')).reset_index(level=1, drop=1)
Out[118]:
            uniq_id  value
2016-12-26        1    342
2016-12-26        8    114
2016-12-27        9    343
2016-12-27       16    124
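Note for readers on newer pandas: `pd.TimeGrouper` was deprecated after the 0.19.x series and later removed, and `pd.Grouper(freq='D')` is its replacement. The sketch below is not part of the original answer. It rebuilds the modified DF from the UPDATE (unique index values) and applies the same idea with `pd.Grouper`.

```python
import pandas as pd

# The modified data from the UPDATE above: every index value is unique.
idx = pd.to_datetime([
    "2016-12-26 11:03:10", "2016-12-26 11:03:13",
    "2016-12-26 12:03:13", "2016-12-26 12:33:13",
    "2016-12-27 11:03:10", "2016-12-27 11:03:13",
    "2016-12-27 12:03:13", "2016-12-27 12:33:13",
])
df = pd.DataFrame(
    {"uniq_id": [1, 4, 5, 8, 9, 13, 16, 18],
     "value": [342, 5, 14, 114, 343, 5, 124, 114]},
    index=idx,
)

top2 = (
    df.groupby(pd.Grouper(freq="D"))           # one group per calendar day
      .apply(lambda g: g.nlargest(2, "value"))  # top 2 rows by value in each day
      .reset_index(level=1, drop=True)          # drop the original timestamps
)
print(top2)
```

After `apply`, the result carries a two-level index (day, original timestamp); dropping level 1 leaves just the day, matching the output shown above.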
Answered By - MaxU - stop genocide of UA