Tuesday, November 21, 2023

[FIXED] How do I add a filter to a pivot table grouping?

November 21, 2023 dataframe, pandas, pivot-table, python No comments

Issue

Follow up to: How do I sum a column based on separate category types, and preserve the zeros?

I used 'Id' as a stand in for a bunch of columns. One of them is 'Date'. Here is my dataframe after applying the pivot table method.

Id	Date	A_Hours	B_Hours
1	2/2/2022	4	4
2	2/4/2023	2	6
3	6/2/2023	3	0

But because 'Date' was one of the columns I added to the index with 'Id', here is the dataframe for a user with multiple hours entered for multiple dates.

Id	Date	A_Hours	B_Hours
1	2/2/2022	4	4
2	2/3/2023	3	3
2	2/4/2023	2	6
3	6/2/2023	3	0

How do I get the following dataframe:

Id	2/3/2023_A_Hours	2/4/2023_B_Hours	6/2/2023_A_Hours	6/2/2023_B_Hours
1	0	0	0	0
2	3	3	2	6
3	0	0	3	0

Note that there is an additional filter placed on top of this - Id 1 has all 0s because in our result dataframe, not only are we combining rows into multiple columns for a single id, we only want to calculate hours entered in 2023. (and exclude all dates < 1/1/2023 in the count)

Also note that this time, I don't want to create a new column name '2/4/2023_B_Hours'. I would actually prefer a multi-level index with "A_hours" and "B_hours" under "2/4/2023". The table in StackOverflow doesn't allow me to format it like that.

Solution

You should provide a reproducible example for clarity.

Assuming this one, we want to consider dates after 2023-01-15 (which are absent in Id 3):

   Id        Date Category  Hours
0   1  2023-01-01        A      1   # this should be filtered out
1   1  2023-02-01        A      3
2   1  2023-02-01        B      4
3   2  2023-03-01        A      2
4   2  2023-03-01        B      6
5   3  2023-01-01        A      3   # this should be filtered out

You can pre-filter the rows (for example using query), then reindex after pivoting to add the missing Ids:

out = (df.query('Date > "2023-01-15"')
         .pivot_table(index='Id', columns='Category', values='Hours',
                      aggfunc='sum', fill_value=0,
                      margins=True, margins_name='Total')
         .add_suffix('_Hours')
         .drop('Total')
         .reindex(df['Id'].unique(), fill_value=0)
         .reset_index().rename_axis(columns=None)
      )

Output:

   Id  A_Hours  B_Hours  Total_Hours
0   1        3        4            7
1   2        2        6            8
2   3        0        0            0

Answered By - mozway

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Tuesday, November 21, 2023

[FIXED] How do I add a filter to a pivot table grouping?

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels