Issue
I currently have a sample dataframe as such below
date | subject | animal | colors | value |
---|---|---|---|---|
Jan | English | cat | blue | 1 |
Feb | Chemistry | dog | green | 2 |
And assuming that the values above are the only unique values for each of the columns
I am having difficulties trying to create a new multi-level header dataframe using pivot where aggfunc=median and also includes the "sub-total" for each column combination, where "sub-total" refers to when all categories of that column would be included for the aggregation.
For example, i am hoping the resulting dataframe would look like this below, where 'all' refers to the grouping of all the categories for that particular column.
subject | english | english | english | english | english | english | english | english | english | repeat the 9 columns for 'Chemistry' | all | repeat the remaining 8 columns for 'all' |
---|---|---|---|---|---|---|---|---|---|---|---|---|
animal | cat | cat | cat | dog | dog | dog | all | all | all | cat | ||
colors | blue | green | all | blue | green | all | blue | green | all | blue | ||
date | ||||||||||||
Jan | ||||||||||||
Feb |
Following this, there should be a 27 columns because:
number of unique subjects + 1 subtotal = 3
number of unique animals + 1 subtotal = 3
number of unique colors + 1 subtotal = 3
3 x 3 x 3 = 27
Solution
IIUC, you can convert your columns as Categorical
before pivot
:
from itertools import combinations
# Transform discrete columns as categorical features
cols = ['date', 'subject', 'animal', 'colors']
cats = {col: pd.CategoricalDtype(list(df[col].unique()) + ['all'], ordered=True)
for col in cols}
df = df.astype(cats)
# Compute intermediate subtotals
data = []
for grp in combinations(cols, r=len(cols)-1):
df1 = df.groupby(list(grp), as_index=False, observed=True)['value'].sum()
data.append(df1)
out = pd.concat([df, *data]).fillna('all')
# Reshape your dataframe to get all combinations
out = out.pivot_table(index='date', columns=['subject', 'animal', 'colors'],
values='value', fill_value=-1, aggfunc='sum', observed=False)
Bonus: now you also have the column subtotal:
>>> out
subject English Chemistry all
animal cat dog all cat dog all cat dog all
colors blue green all blue green all blue green all blue green all blue green all blue green all blue green all blue green all blue green all
date
Jan 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
Feb 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 0 2 0 0 0 0 0 2 0 0 0 0
all 1 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0
Answered By - Corralien
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.