Issue
I have a raw dataset like below:
ColA | ColB | duration | interval | Counter |
---|---|---|---|---|
A | SD | 2 | 4 | 1 |
A | SD | 3 | 3 | 2 |
A | UD | 2 | 1 | 10 |
B | UD | 1 | 2 | 2 |
B | UD | 2 | 2 | 2 |
B | SD | 3 | 3 | 13 |
B | SD | 1 | 4 | 19 |
I am expecting an output result like below:
Explanation of the output:
- SumCounter is the sum of counter values on the group by ColA and ColB values.
- AvgdurationSD/UD and AvgIntervalSD/UD are created by taking the average over ColA and ColB and having a 0 value incase the columns dont match the criteria (e.g. AvgDurationSD and AvIntervalSD has 0 value for a group of ColA = A and ColB = UD.
I understand that I have to use group by
and agg
functions to apply here but I am not really sure how to apply conditions for ColB on individual new columns.
Any help is appreciated:)
Solution
Use DataFrame.pivot_table
with helper column new
by copy like ColB
, then flatten MultiIndex
and add ouput to new DataFrame created by aggregate sum
:
df1 = (df.assign(new=df['ColB'])
.pivot_table(index=['ColA', 'ColB'],
columns='new',
values=['interval','duration'],
fill_value=0,
aggfunc='mean'))
df1.columns = df1.columns.map(lambda x: f'{x[0]}{x[1]}')
df = (df.groupby(['ColA','ColB'])['Counter']
.sum()
.to_frame(name='SumCounter')
.join(df1).reset_index())
print (df)
ColA ColB SumCounter durationSD durationUD intervalSD intervalUD
0 A SD 3 2.5 0.0 3.5 0
1 A UD 10 0.0 2.0 0.0 1
2 B SD 32 2.0 0.0 3.5 0
3 B UD 4 0.0 1.5 0.0 2
Answered By - jezrael
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.