Issue
Suppose I have the following dataframe:
import pandas as pd

df = pd.DataFrame([
    (2, 2, 'A', .5),
    (2, 2, 'A', .6),
    (2, 2, 'B', .75),
    (2, 2, 'B', .7),
    (2, 2, 'C', .6),
    (2, 3, 'A', .65),
    (2, 3, 'A', .6),
    (2, 3, 'B', .75),
    (2, 3, 'B', .7),
    (2, 3, 'C', .6)
], columns=['out_size', 'problem_size', 'algo', 'time'])
I want to
- group by `['out_size', 'problem_size', 'algo']`, and for each group count the number of occurrences for each `algo`, then
- select/keep the `algo` that has the lowest average time in that group.
Expected result:
pd.DataFrame(
    [[2, 2, 'A', 0.55],
     [2, 3, 'C', 0.6]],
    columns=['out_size', 'problem_size', 'algo', 'time'])
Solution
You can use a double groupby:
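cols = ['out_size', 'problem_size', 'algo']
out = (df
    # average time per (out_size, problem_size, algo)
    .groupby(cols, as_index=False)['time'].mean()
    # lowest averages first
    .sort_values(by='time')
    # keep the first (fastest) algo per (out_size, problem_size)
    .groupby(cols[:-1], as_index=False).first()
)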
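To see why the chain above works, it can help to split it into its intermediate steps. The following is only a sketch that reuses the dataframe from the question; the names `means`, `ranked`, and `best` are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame([
    (2, 2, 'A', .5), (2, 2, 'A', .6),
    (2, 2, 'B', .75), (2, 2, 'B', .7),
    (2, 2, 'C', .6),
    (2, 3, 'A', .65), (2, 3, 'A', .6),
    (2, 3, 'B', .75), (2, 3, 'B', .7),
    (2, 3, 'C', .6),
], columns=['out_size', 'problem_size', 'algo', 'time'])

cols = ['out_size', 'problem_size', 'algo']

# Step 1: one row per (out_size, problem_size, algo) holding the mean time.
means = df.groupby(cols, as_index=False)['time'].mean()

# Step 2: order all rows so that the lowest mean times come first.
ranked = means.sort_values(by='time')

# Step 3: within each (out_size, problem_size) group, keep the first row,
# which after sorting is the algo with the lowest mean time.
best = ranked.groupby(cols[:-1], as_index=False).first()

print(means)
print(best)
```

Printing `means` shows all six per-algo averages, which makes it easy to check by eye that the row kept in `best` really is the minimum of its group.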
A slightly more efficient alternative that avoids sorting the values (but requires storing an intermediate result):
cols = ['out_size', 'problem_size', 'algo']
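# mean time per (out_size, problem_size, algo), indexed by those three keys
out = df.groupby(cols)['time'].mean()
# idxmin gives the full index label of the lowest mean in each (out_size, problem_size) group
out = out.loc[out.groupby(cols[:-1]).idxmin()].reset_index()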
Output:
   out_size  problem_size algo  time
0         2             2    A  0.55
1         2             3    C  0.60
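If this selection is needed in more than one place, the `idxmin` variant can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the function name `fastest_algo_per_group` and its parameters are hypothetical. Note that when two algos tie for the lowest mean, `idxmin` returns the first occurrence, so only one of them is kept.

```python
import pandas as pd


def fastest_algo_per_group(df, group_cols, algo_col='algo', time_col='time'):
    """Return, per group, the algo with the lowest mean time (hypothetical helper)."""
    keys = list(group_cols) + [algo_col]
    # Mean time per (group..., algo), indexed by a MultiIndex over `keys`.
    means = df.groupby(keys)[time_col].mean()
    # For each group, the full index label of the row with the smallest mean.
    winners = means.groupby(list(group_cols)).idxmin()
    # Select those rows and turn the index levels back into columns.
    return means.loc[winners].reset_index()


df = pd.DataFrame([
    (2, 2, 'A', .5), (2, 2, 'A', .6),
    (2, 2, 'B', .75), (2, 2, 'B', .7),
    (2, 2, 'C', .6),
    (2, 3, 'A', .65), (2, 3, 'A', .6),
    (2, 3, 'B', .75), (2, 3, 'B', .7),
    (2, 3, 'C', .6),
], columns=['out_size', 'problem_size', 'algo', 'time'])

result = fastest_algo_per_group(df, ['out_size', 'problem_size'])
print(result)
```

Passing the grouping columns as an argument keeps the helper usable for other key combinations, as long as the algo and time columns exist in the frame.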
Answered By - mozway