Issue
For instance, I create a pandas dataframe:
data = [['nick', 3, '2023/11/22 10:05:00', 'A'], ['tom', 3, '2023/11/22 9:25:00','A'], ['juli', 2, '2023/11/22 12:05:00', 'B'], ['ewa', 4, '2023/11/22 9:55:00', 'B'],['peter', 5, '2023/11/22 11:00:00', 'A'], ['johan', 5, '2023/11/22 9:00:00', 'C']]
col_name = ['name', 'score', 'time', 'category']
df = df = pd.DataFrame(data, columns=col_name)
There are six students from different groups, and they have different scores in an exam. I want to group together the students from the same group and then sort them in the group from the highest score to lowest. Then I want to put the group with the highest score in the beginning of the dataframe followed by the group with lower score. If two students have the same score, the one with earlier time should be put first.
I tried the code:
df_sorted=df.sort_values(['group','score'],ascending=False).groupby('group').apply(lambda x: x)
It groups together the students and sorts them in the same group. The new dataframe I have now:
It only partly achieved what I need to do. Peter has a higher score than Eva and Juli, so Peter, Nick and Tom should be put right under Johan, followed by Ewa and Juli. Tom should be put in front of Nick since the time is earlier.
What should I do next to achieve the goal?
Solution
You can add an extra column with groupby.transform
to add a group score (here max
), then use this to sort the groups:
out = (df
.assign(max=df.groupby('category')['score'].transform('max'),
dt=pd.to_datetime(df['time'])
)
.sort_values(by=['max', 'category', 'score', 'dt'],
ascending=[False, True, False, True]
)
.drop(columns=['max', 'dt'])
)
Same logic without the temporary columns with numpy.lexsort
:
import numpy as np
order = np.lexsort([pd.to_datetime(df['time']),
-df['score'],
df['category'],
-df.groupby('category')['score'].transform('max')])
out = df.iloc[order]
Output:
name score time category
4 peter 5 2023/11/22 11:00:00 A
1 tom 3 2023/11/22 9:25:00 A
0 nick 3 2023/11/22 10:05:00 A
5 johan 5 2023/11/22 9:00:00 C
3 ewa 4 2023/11/22 9:55:00 B
2 juli 2 2023/11/22 12:05:00 B
breaking a tie between two groups with the same max
Here groups A
and C
have the same maximum, if you want to have C
on top because it has a greater average score than A
, you can add an extra sorter:
g = df.groupby('category')['score'].transform
out = (df
.assign(max=g('max'),
mean=g('mean'),
dt=pd.to_datetime(df['time'])
)
.sort_values(by=['max', 'mean', 'category', 'score', 'dt'],
ascending=[False, False, True, False, True]
)
.drop(columns=['max', 'mean', 'dt'])
)
Or:
import numpy as np
g = df.groupby('category')['score'].transform
order = np.lexsort([pd.to_datetime(df['time']),
-df['score'],
df['category'],
-g('mean'),
-g('max')])
out = df.iloc[order]
Output:
name score time category
5 johan 5 2023/11/22 9:00:00 C
4 peter 5 2023/11/22 11:00:00 A
1 tom 3 2023/11/22 9:25:00 A
0 nick 3 2023/11/22 10:05:00 A
3 ewa 4 2023/11/22 9:55:00 B
2 juli 2 2023/11/22 12:05:00 B
Answered By - mozway
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.