Wednesday, January 24, 2024

[FIXED] Sub-totals for each column combination in Python Pandas

January 24, 2024 dataframe, pandas, pivot, python No comments

Issue

I currently have a sample dataframe as such below

date	subject	animal	colors	value
Jan	English	cat	blue	1
Feb	Chemistry	dog	green	2

And assuming that the values above are the only unique values for each of the columns

I am having difficulties trying to create a new multi-level header dataframe using pivot where aggfunc=median and also includes the "sub-total" for each column combination, where "sub-total" refers to when all categories of that column would be included for the aggregation.

For example, i am hoping the resulting dataframe would look like this below, where 'all' refers to the grouping of all the categories for that particular column.

subject	english	english	english	english	english	english	english	english	english	all
animal	cat	cat	cat	dog	dog	dog	all	all	all	cat
colors	blue	green	all	blue	green	all	blue	green	all	blue
date
Jan
Feb

Following this, there should be a 27 columns because:

number of unique subjects + 1 subtotal = 3
number of unique animals + 1 subtotal = 3
number of unique colors + 1 subtotal = 3

3 x 3 x 3 = 27

Solution

IIUC, you can convert your columns as Categorical before pivot:

from itertools import combinations

# Transform discrete columns as categorical features
cols = ['date', 'subject', 'animal', 'colors']
cats = {col: pd.CategoricalDtype(list(df[col].unique()) + ['all'], ordered=True)
           for col in cols}
df = df.astype(cats)

# Compute intermediate subtotals
data = []
for grp in combinations(cols, r=len(cols)-1):
    df1 = df.groupby(list(grp), as_index=False, observed=True)['value'].sum()
    data.append(df1)
out = pd.concat([df, *data]).fillna('all')

# Reshape your dataframe to get all combinations
out = out.pivot_table(index='date', columns=['subject', 'animal', 'colors'], 
                      values='value', fill_value=-1, aggfunc='sum', observed=False)

Bonus: now you also have the column subtotal:

>>> out
subject English                                         Chemistry                                          all                                        
animal      cat            dog            all                 cat            dog            all            cat            dog            all          
colors     blue green all blue green all blue green all      blue green all blue green all blue green all blue green all blue green all blue green all
date                                                                                                                                                  
Jan           1     0   1    0     0   0    1     0   0         0     0   0    0     0   0    0     0   0    1     0   0    0     0   0    0     0   0
Feb           0     0   0    0     0   0    0     0   0         0     0   0    0     2   2    0     2   0    0     0   0    0     2   0    0     0   0
all           1     0   0    0     0   0    0     0   0         0     0   0    0     2   0    0     0   0    0     0   0    0     0   0    0     0   0

Answered By - Corralien

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Wednesday, January 24, 2024

[FIXED] Sub-totals for each column combination in Python Pandas

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels