Issue
I have a pandas dataframe called df that looks like this:
name  test_type  test_number  correct
joe   0          1            1
joe   0          2            0
joe   1          1            0
joe   1          2            1
joe   0          1            1
joe   0          2            1
jim   1          1            0
jim   1          2            1
jim   0          1            0
jim   0          2            1
jim   1          1            0
jim   1          2            0
I want a dataset that groups by name and extracts the mean value of correct by test_type (as a single value), as well as the mean value of correct by test_type and test_number (as a numpy array).
Here is what I need:
name  correct_0  correct_1  correct_0_by_tn  correct_1_by_tn
joe   0.75       0.5        [1, 0.5]         [0, 1]
jim   0.5        0.25       [0, 1]           [0, 0.5]
I've been using df.groupby(["name", "test_type"]).correct.mean().reset_index()
and df.groupby(["name", "test_type", "test_number"]).correct.mean().reset_index()
but I can't manage to 1) extract the mean by test_number as an array like I want to and 2) organize the output in a coherent dataframe.
Thanks in advance.
Solution
IIUC, you can use:
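# Mean of correct per (name, test_type); unstack moves test_type into the columns
A = df.groupby(['name', 'test_type'], sort=False)['correct'].mean().unstack()
# Mean per (name, test_type, test_number), collected into one list per (name, test_type)
B = (df
    .groupby(['name', 'test_type', 'test_number'])['correct'].mean()
    .unstack().agg(list, axis=1).unstack()
)
# Join both results and build the final column names
out = A.join(B.add_suffix('_by_tn')).add_prefix('correct_')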
output:
test_type  correct_0  correct_1 correct_0_by_tn correct_1_by_tn
name
joe             0.75       0.50      [1.0, 0.5]      [0.0, 1.0]
jim             0.50       0.25      [0.0, 1.0]      [0.0, 0.5]
Alternative output:
out = (A
    .join(B.add_suffix('_by_tn'))
    .add_prefix('correct_')
    .rename_axis(columns=None)
    .reset_index()
)
output:
  name  correct_0  correct_1 correct_0_by_tn correct_1_by_tn
0  joe       0.75       0.50      [1.0, 0.5]      [0.0, 1.0]
1  jim       0.50       0.25      [0.0, 1.0]      [0.0, 0.5]
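The question asked for numpy arrays, whereas the code above stores Python lists in the *_by_tn columns. If arrays are really needed, one possible variation is sketched below. It rebuilds the sample data from the question so that it runs on its own; the names per_tn and B_arr are made up for this illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt column by column.
df = pd.DataFrame({
    "name": ["joe"] * 6 + ["jim"] * 6,
    "test_type": [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
    "test_number": [1, 2] * 6,
    "correct": [1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0],
})

# Scalar means per (name, test_type), with test_type spread across columns.
A = df.groupby(["name", "test_type"], sort=False)["correct"].mean().unstack()

# Means per (name, test_type, test_number), with test_number spread across columns.
per_tn = (
    df.groupby(["name", "test_type", "test_number"])["correct"]
    .mean()
    .unstack()
)

# Pack each row into one numpy array (instead of a list), then spread
# test_type across columns, as in the original answer.
B_arr = pd.Series(list(per_tn.to_numpy()), index=per_tn.index).unstack()

out = A.join(B_arr.add_suffix("_by_tn")).add_prefix("correct_")
print(out)
```

The printed frame looks much like the list version; the difference is that each cell in the *_by_tn columns is a numpy.ndarray, so it can be used directly in vectorized arithmetic.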
Answered By - mozway