Issue
I have a large dataframe (500k to 1M rows) which contains, for example, these 3 numeric columns: ID, A, B.
I want to filter the results to obtain a table like the one in the image below where, for each unique value of column ID, I have the maximum and minimum values of A and B. How can I do this?
EDIT: I have updated the image below to make it clearer: when I get the max or min of a column, I also need to get the associated data from the other columns.
Solution
Sample data (note that you posted an image which can't be used by potential answerers without retyping, so I'm making a simple example in its place):
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 2, 2],
                   'a': range(8), 'b': range(8, 0, -1)})
The key to this is just using idxmax and idxmin, and then futzing with the indexes so that you can merge things in a readable way. Here's the whole answer; you may wish to examine the intermediate dataframes to see how this is working.
# Row labels of df holding each group's max (idxmax) and min (idxmin) of a and b
df_max = df.groupby('id').idxmax()
df_max['type'] = 'max'
df_min = df.groupby('id').idxmin()
df_min['type'] = 'min'
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job here
df2 = pd.concat([df_max, df_min]).set_index('type', append=True).stack().rename('index')
df3 = pd.concat([df2.reset_index().drop('id', axis=1).set_index('index'),
                 df.loc[df2.values]], axis=1)
df3.set_index(['id', 'level_2', 'type']).sort_index()
                 a  b
id level_2 type
1  a       max   3  5
           min   0  8
   b       max   0  8
           min   3  5
2  a       max   7  1
           min   4  4
   b       max   4  4
           min   7  1
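The reason each row above carries the matching value from the other column (which is what the EDIT asks for) is that idxmax and idxmin return row labels rather than values, so looking those labels up with df.loc brings back the entire row. A minimal sketch of just that step on the same sample data (the names rows_of_max_a and max_a_rows are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 2, 2],
                   'a': range(8), 'b': range(8, 0, -1)})

# Row label of the largest 'a' within each id group (a Series indexed by id)
rows_of_max_a = df.groupby('id')['a'].idxmax()

# Looking the labels up in df returns whole rows, so the 'b' that goes
# with each group's maximum 'a' comes along automatically
max_a_rows = df.loc[rows_of_max_a]
print(max_a_rows)
```

If a group contains ties, idxmax and idxmin return the label of the first occurrence, so each group/column/type combination yields exactly one row.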
Note in particular that df2 looks like this:
id  type
1   max   a    3
          b    0
2   max   a    7
          b    4
1   min   a    0
          b    3
2   min   a    4
          b    7
The last column there holds the row labels (index values) of df that were returned by idxmax and idxmin. So basically all the information you need is in df2. The rest of it is just a matter of merging back with df and making it more readable.
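Here is an alternative arrangement of the same idxmax/idxmin approach (my own sketch, not part of the answer above). It tags the two label tables with the keys argument of pd.concat instead of adding a type column, then relabels the fetched rows of df directly with the stacked (type, id, column) index, which avoids the reset_index/level_2 bookkeeping and the axis=1 concat on a duplicated index. The level name column and the variable names g, labels, long and result are arbitrary choices.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 2, 2],
                   'a': range(8), 'b': range(8, 0, -1)})

g = df.groupby('id')
# Index (type, id), columns a and b; each cell is a row label of df
labels = pd.concat([g.idxmax(), g.idxmin()], keys=['max', 'min'],
                   names=['type', 'id'])

# One entry per (type, id, column); the values are still row labels of df
long = labels.rename_axis(columns='column').stack()

# Fetch those rows (duplicates allowed), then label each fetched row with
# the (type, id, column) entry that selected it
result = (df.drop(columns='id')
            .loc[long.to_numpy()]
            .set_axis(long.index, axis=0)
            .reorder_levels(['id', 'column', 'type'])
            .sort_index())
print(result)
```

This prints the same eight rows as the table above; the only difference is that the stacked level is named column rather than level_2.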
Answered By - JohnE