Issue
I have a DataFrame:
field  category  2022-01-10  2022-01-17  2022-01-24  2022-01-31
A      Chair          50.00      100.00      200.00      150.00
B      Chair          75.00       15.00       57.00       75.00
C      Chair           0.57        0.64        0.40        0.47
A      Table          50.00      100.00      200.00      150.00
B      Table          75.00       15.00       57.00       75.00
C      Table           0.57        0.64        0.40        0.47
A      Mirror         50.00      100.00      200.00      150.00
B      Mirror         75.00       15.00       57.00       75.00
C      Mirror          0.57        0.64        0.40        0.47
I am trying to group by field (or simply drop the category column) and aggregate the values for each week so that the result would look like this:
field 2022-01-10 2022-01-17 2022-01-24 2022-01-31
A 50+50+50 100+100+100 200+200+200 150+150+150
B 75+75+75 15+15+15 57+57+57 75+75+75
C np.mean([0.57, 0.57, 0.57])  # each week's average of field C
For fields A & B, each week I would sum up that week's values across the different categories, but for field C I want the average.
I tried:
df.groupby(['field']).agg({'A': sum, 'B': sum, 'C': np.mean})
but this of course does not work, because the fields are the thing being grouped by, not the columns.
I believe this problem could be solved by somehow dropping the category column and then transforming the values based on the field column, but I am not sure how that works.
Solution
Create a dict of functions to apply, then flatten your dataframe with melt and group by the field and variable (date) columns. Use the first element of each group key to pick the right function for that group. Finally, reshape your dataframe back to wide format.
import numpy as np

# Map each field to its aggregation function
aggfuncs = {'A': np.sum,
            'B': np.sum,
            'C': np.mean}

out = df.melt(['field', 'category']).groupby(['field', 'variable'])['value'] \
        .apply(lambda x: aggfuncs[x.name[0]](x)).unstack() \
        .rename_axis(columns=None).reset_index()
print(out)
# Output
field 2022-01-10 2022-01-17 2022-01-24 2022-01-31
0 A 150.00 300.00 600.0 450.00
1 B 225.00 45.00 171.0 225.00
2 C 0.57 0.64 0.4 0.47
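For reference, here is a self-contained sketch that rebuilds the sample frame from the question and runs the pipeline above (the DataFrame construction is an assumption based on the table in the post; `x.name` inside the groupby apply is the `(field, week)` tuple, so `x.name[0]` selects the function by field):

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (values assumed from the post).
weeks = ['2022-01-10', '2022-01-17', '2022-01-24', '2022-01-31']
df = pd.DataFrame(
    [['A', c, 50, 100, 200, 150] for c in ['Chair', 'Table', 'Mirror']]
    + [['B', c, 75, 15, 57, 75] for c in ['Chair', 'Table', 'Mirror']]
    + [['C', c, 0.57, 0.64, 0.4, 0.47] for c in ['Chair', 'Table', 'Mirror']],
    columns=['field', 'category'] + weeks,
)

# Pick the aggregation per field: sum for A and B, mean for C.
aggfuncs = {'A': np.sum, 'B': np.sum, 'C': np.mean}

out = (df.melt(['field', 'category'])              # long format: field, category, variable (week), value
         .groupby(['field', 'variable'])['value']
         .apply(lambda x: aggfuncs[x.name[0]](x))  # x.name is the (field, week) group key
         .unstack()                                # weeks back into columns
         .rename_axis(columns=None)
         .reset_index())
print(out)
```

An equivalent but less general route is to aggregate twice and patch: `sums = df.drop(columns='category').groupby('field').sum()`, then overwrite row C with the corresponding row of the mean aggregation.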
Answered By - Corralien