Tuesday, October 26, 2021

[FIXED] How to group by multiple column values in pandas and apply an ifelse to impute/calculate values


Issue

I have a dataframe df like the one below:

Node    COMMODITY_CODE  DAY  Capacity_Case  Capacity_Delivery  case_ratio  deliveries_ratio  window_count
7014.0  SCFZ            1    26610.0        12.0               0.357854    0.354839          3
7014.0  SCFZ            2    25551.0        11.0               0.457945    0.423077          3
7014.0  SCFZ            3    30669.0        13.0               0.283379    0.258621          3
7030.0  SCDD            1    34244.0        16.0               0.316505    0.300000          4
7030.0  SCDD            2    25954.0        13.0               0.236513    0.232558          4

I want to group by Node, DAY, and COMMODITY_CODE and apply an ifelse-style function to impute values for null records. My conditions are the following:

For each group (Node, DAY, COMMODITY_CODE):
1. If delivery_ratio is null, replace it with mean(delivery_ratio) for the group and assign the result to delivery_ratio_filled.
2. If case_ratio is null, replace it with mean(case_ratio) for the group and assign the result to case_ratio_filled.
If, for the group (Node, DAY, COMMODITY_CODE), the filled value is still null:
1. If delivery_ratio_filled is null, assign 1/window_count to it.
2. If case_ratio_filled is null, assign 1/window_count to it.

I have accomplished this in R with ease using the dplyr package; I would basically like the same in Python using pandas.

df %>%
group_by(Node, DAY_OF_WK, COMMODITY_CODE) %>%
  # na.rm = TRUE is needed so the group mean ignores the missing values;
  # without it, mean() returns NA for any group that contains an NA.
  mutate(delivery_ratio_filled = ifelse(!is.na(delivery_ratio),
                               delivery_ratio, 
                               mean(delivery_ratio, na.rm = TRUE)),
         case_ratio_filled = ifelse(!is.na(case_ratio),
                               case_ratio, 
                               mean(case_ratio, na.rm = TRUE))) %>%
  mutate(delivery_ratio_filled = ifelse(!is.na(delivery_ratio_filled),
                               delivery_ratio_filled,
                               1.0 / window_count),
         case_ratio_filled = ifelse(!is.na(case_ratio_filled),
                               case_ratio_filled,
                               1.0 / window_count))

Solution

Unfortunately, the example input data contains no NaN values and no group with more than one row, so nothing in it would be replaced with a computed value. The new columns are therefore simple copies of the original columns.
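That observation can be checked directly. The snippet below is a small sketch that rebuilds the five rows from the question (values copied from the table, with the stray period after 0.354839 dropped) and counts group sizes and missing values; the variable names max_group_size and total_missing are only illustrative.

```python
import pandas as pd

# The five sample rows from the question.
df = pd.DataFrame({
    "Node": [7014.0, 7014.0, 7014.0, 7030.0, 7030.0],
    "COMMODITY_CODE": ["SCFZ", "SCFZ", "SCFZ", "SCDD", "SCDD"],
    "DAY": [1, 2, 3, 1, 2],
    "Capacity_Case": [26610.0, 25551.0, 30669.0, 34244.0, 25954.0],
    "Capacity_Delivery": [12.0, 11.0, 13.0, 16.0, 13.0],
    "case_ratio": [0.357854, 0.457945, 0.283379, 0.316505, 0.236513],
    "deliveries_ratio": [0.354839, 0.423077, 0.258621, 0.300000, 0.232558],
    "window_count": [3, 3, 3, 4, 4],
})

keys = ["Node", "DAY", "COMMODITY_CODE"]

# Largest number of rows sharing one (Node, DAY, COMMODITY_CODE) key.
max_group_size = int(df.groupby(keys).size().max())

# Total number of missing values across the two ratio columns.
total_missing = int(df[["case_ratio", "deliveries_ratio"]].isna().sum().sum())

print(max_group_size, total_missing)
```

Every key combination occurs once and neither ratio column has a gap, so the imputation has nothing to do on this sample.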

The first pair of conditions can be tested with np.where and applied to every row of each group with transform:

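import numpy as np

# Within each (Node, DAY, COMMODITY_CODE) group, replace NaN with the group mean.
# Series.mean skips NaN by default; a group that is entirely NaN stays NaN.
df[['delivery_ratio_filled','case_ratio_filled']] = (
    df.groupby(['Node', 'DAY', 'COMMODITY_CODE'])[['deliveries_ratio','case_ratio']]
      .transform(
        lambda x: np.where(x.isna(), x.mean(), x)))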
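As an alternative sketch of the same group-mean step, fillna can take the frame returned by transform("mean") directly, which avoids the lambda. The df_demo frame and its values are made up for illustration (the question's data has no gaps); only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one group of three rows with gaps, one all-NaN group.
df_demo = pd.DataFrame({
    "Node": [1.0, 1.0, 1.0, 2.0],
    "DAY": [1, 1, 1, 1],
    "COMMODITY_CODE": ["A", "A", "A", "B"],
    "deliveries_ratio": [0.2, 0.4, np.nan, np.nan],
    "case_ratio": [0.3, np.nan, 0.5, np.nan],
})

keys = ["Node", "DAY", "COMMODITY_CODE"]
cols = ["deliveries_ratio", "case_ratio"]

# Per-row group means, aligned with df_demo's index (NaN is skipped in the mean).
group_means = df_demo.groupby(keys)[cols].transform("mean")

# Fill only the missing cells; existing values are left untouched.
filled = df_demo[cols].fillna(group_means)
print(filled)
```

Rows whose group has no observed value at all (the last row here) remain NaN, which is what the 1/window_count fallback is for.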

The second pair of conditions works row by row, so no grouping is needed:

# Any value that is still NaN (no usable group mean) falls back to 1 / window_count.
df['delivery_ratio_filled'] = (
  np.where(df['delivery_ratio_filled'].isna(),
           1 / df['window_count'],
           df['delivery_ratio_filled']))
df['case_ratio_filled'] = (
  np.where(df['case_ratio_filled'].isna(),
           1 / df['window_count'],
           df['case_ratio_filled']))
df

Out:

     Node COMMODITY_CODE  ...  delivery_ratio_filled  case_ratio_filled
0  7014.0           SCFZ  ...               0.354839           0.357854
1  7014.0           SCFZ  ...               0.423077           0.457945
2  7014.0           SCFZ  ...               0.258621           0.283379
3  7030.0           SCDD  ...               0.300000           0.316505
4  7030.0           SCDD  ...               0.232558           0.236513
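Because the sample output only shows copies, here is a minimal end-to-end sketch on invented data that exercises both rules. The helper name impute_ratios and the demo values are hypothetical, not part of the original answer. The helper names each new column by appending _filled to the source column, so it produces deliveries_ratio_filled rather than delivery_ratio_filled.

```python
import numpy as np
import pandas as pd


def impute_ratios(frame, keys, ratio_cols, count_col="window_count"):
    """Fill NaN ratios with the group mean, then with 1 / count_col."""
    out = frame.copy()  # leave the caller's frame unchanged
    group_means = out.groupby(keys)[ratio_cols].transform("mean")
    fallback = 1.0 / out[count_col]
    for col in ratio_cols:
        # Rule 1: group mean; rule 2: 1 / window_count where still NaN.
        out[col + "_filled"] = out[col].fillna(group_means[col]).fillna(fallback)
    return out


# Hypothetical data: the second group has no observed deliveries_ratio at all.
demo = pd.DataFrame({
    "Node": [7014.0, 7014.0, 7030.0, 7030.0],
    "DAY": [1, 1, 2, 2],
    "COMMODITY_CODE": ["SCFZ", "SCFZ", "SCDD", "SCDD"],
    "deliveries_ratio": [0.2, np.nan, np.nan, np.nan],
    "case_ratio": [np.nan, 0.5, np.nan, 0.1],
    "window_count": [3, 3, 4, 4],
})

result = impute_ratios(
    demo,
    keys=["Node", "DAY", "COMMODITY_CODE"],
    ratio_cols=["deliveries_ratio", "case_ratio"],
)
print(result[["deliveries_ratio_filled", "case_ratio_filled"]])
```

In the first group the missing deliveries_ratio takes the group mean, while the second group has no observed deliveries_ratio and so falls back to 1 / window_count.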

Answered By - Michael Szczesny

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
