Issue
I have a DataFrame:
field  category  2022-01-10  2022-01-17  2022-01-24  2022-01-31
A      Chair          50.00      100.00      200.00      150.00
B      Chair          75.00       15.00       57.00       75.00
C      Chair           0.57        0.64        0.40        0.47
A      Table          50.00      100.00      200.00      150.00
B      Table          75.00       15.00       57.00       75.00
C      Table           0.57        0.64        0.40        0.47
A      Mirror         50.00      100.00      200.00      150.00
B      Mirror         75.00       15.00       57.00       75.00
C      Mirror          0.57        0.64        0.40        0.47
I am trying to group by field (or simply drop the category column) and aggregate the values for each week so that the result would look like this:
field 2022-01-10 2022-01-17 2022-01-24 2022-01-31
A 50+50+50 100+100+100 200+200+200 150+150+150
B 75+75+75 15+15+15 57+57+57 75+75+75
C np.mean([0.57, 0.57, 0.57])  # each week's average of field C
For fields A & B, each week I would sum up that week's values across the different categories, but for field C I want the average.
I tried:
df.groupby(['field']).agg({'A': sum, 'B': sum, 'C': np.mean})
but this of course does not work, because the fields are the thing being grouped by, not the columns.
I believe this problem could be solved by somehow dropping the category column and then transforming the values based on the field column, but I am not sure how that works.
Solution
Create a dict of functions to apply, then flatten your dataframe with melt and group by the field and variable (date) columns. Use the first element of each group key to pick the right function for that group. Finally, reshape your dataframe back to wide format.
import numpy as np

# Map each field to its aggregation function
aggfuncs = {'A': np.sum,
            'B': np.sum,
            'C': np.mean}

out = df.melt(['field', 'category']).groupby(['field', 'variable'])['value'] \
        .apply(lambda x: aggfuncs[x.name[0]](x)).unstack() \
        .rename_axis(columns=None).reset_index()
print(out)
# Output
field 2022-01-10 2022-01-17 2022-01-24 2022-01-31
0 A 150.00 300.00 600.0 450.00
1 B 225.00 45.00 171.0 225.00
2 C 0.57 0.64 0.4 0.47
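For reference, here is a self-contained sketch that rebuilds the sample frame from the question and runs the pipeline above (the DataFrame construction is an assumption based on the table in the post; `x.name` inside the groupby apply is the `(field, week)` tuple, so `x.name[0]` selects the function by field):

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (values assumed from the post).
weeks = ['2022-01-10', '2022-01-17', '2022-01-24', '2022-01-31']
df = pd.DataFrame(
    [['A', c, 50, 100, 200, 150] for c in ['Chair', 'Table', 'Mirror']]
    + [['B', c, 75, 15, 57, 75] for c in ['Chair', 'Table', 'Mirror']]
    + [['C', c, 0.57, 0.64, 0.4, 0.47] for c in ['Chair', 'Table', 'Mirror']],
    columns=['field', 'category'] + weeks,
)

# Pick the aggregation per field: sum for A and B, mean for C.
aggfuncs = {'A': np.sum, 'B': np.sum, 'C': np.mean}

out = (df.melt(['field', 'category'])              # long format: field, category, variable (week), value
         .groupby(['field', 'variable'])['value']
         .apply(lambda x: aggfuncs[x.name[0]](x))  # x.name is the (field, week) group key
         .unstack()                                # weeks back into columns
         .rename_axis(columns=None)
         .reset_index())
print(out)
```

An equivalent but less general route is to aggregate twice and patch: `sums = df.drop(columns='category').groupby('field').sum()`, then overwrite row C with the corresponding row of the mean aggregation.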
Answered By - Corralien