Issue
I have a DataFrame df like the one below:
Node COMMODITY_CODE DAY Capacity_Case Capacity_Delivery case_ratio deliveries_ratio window_count
7014.0 SCFZ 1 26610.0 12.0 0.357854 0.354839 3
7014.0 SCFZ 2 25551.0 11.0 0.457945 0.423077 3
7014.0 SCFZ 3 30669.0 13.0 0.283379 0.258621 3
7030.0 SCDD 1 34244.0 16.0 0.316505 0.300000 4
7030.0 SCDD 2 25954.0 13.0 0.236513 0.232558 4
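To experiment with the answer below, the sample table can be typed in as a DataFrame. This is a minimal sketch: the values are copied from the table above, and the column dtypes (floats for Node, integers for DAY and window_count) are assumptions read off the printed values.

```python
import pandas as pd

# The sample rows from the question, typed in by hand.
df = pd.DataFrame({
    "Node": [7014.0, 7014.0, 7014.0, 7030.0, 7030.0],
    "COMMODITY_CODE": ["SCFZ", "SCFZ", "SCFZ", "SCDD", "SCDD"],
    "DAY": [1, 2, 3, 1, 2],
    "Capacity_Case": [26610.0, 25551.0, 30669.0, 34244.0, 25954.0],
    "Capacity_Delivery": [12.0, 11.0, 13.0, 16.0, 13.0],
    "case_ratio": [0.357854, 0.457945, 0.283379, 0.316505, 0.236513],
    "deliveries_ratio": [0.354839, 0.423077, 0.258621, 0.300000, 0.232558],
    "window_count": [3, 3, 3, 4, 4],
})

# Each (Node, DAY, COMMODITY_CODE) combination occurs exactly once in this sample.
print(df.groupby(["Node", "DAY", "COMMODITY_CODE"]).ngroups)  # 5
```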
I want to group by Node, DAY and COMMODITY_CODE and apply an ifelse-style function to impute values for null records. My conditions are the following:
Within each (Node, DAY, COMMODITY_CODE) group:
- if deliveries_ratio is null, replace it with the group's mean(deliveries_ratio) and store the result in delivery_ratio_filled
- if case_ratio is null, replace it with the group's mean(case_ratio) and store the result in case_ratio_filled
If, after that step, a value in the group is still null (because every value in its group was null):
- if delivery_ratio_filled is null, assign 1/window_count to it
- if case_ratio_filled is null, assign 1/window_count to it
I have accomplished this easily in R with the dplyr package; I would like to do the same in Python with pandas:
library(dplyr)

df %>%
  group_by(Node, DAY, COMMODITY_CODE) %>%
  # Step 1: fill NA with the group mean (without na.rm = TRUE, mean() returns NA)
  mutate(delivery_ratio_filled = ifelse(!is.na(deliveries_ratio),
                                        deliveries_ratio,
                                        mean(deliveries_ratio, na.rm = TRUE)),
         case_ratio_filled = ifelse(!is.na(case_ratio),
                                    case_ratio,
                                    mean(case_ratio, na.rm = TRUE))) %>%
  # Step 2: anything still NA (all-NA groups) falls back to 1 / window_count
  mutate(delivery_ratio_filled = ifelse(!is.na(delivery_ratio_filled),
                                        delivery_ratio_filled,
                                        1.0 / window_count),
         case_ratio_filled = ifelse(!is.na(case_ratio_filled),
                                    case_ratio_filled,
                                    1.0 / window_count))
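Stated as plain code, the two rules for one group's ratio column reduce to two fillna calls. A minimal sketch in pandas; fill_ratio is a hypothetical helper name, and it assumes window_count is passed as a Series aligned with the group's rows.

```python
import numpy as np
import pandas as pd

def fill_ratio(values: pd.Series, window_count: pd.Series) -> pd.Series:
    """Hypothetical helper: apply both fallback rules to one group's ratio column."""
    # Rule 1: replace NaN with the group mean (Series.mean skips NaN by default).
    filled = values.fillna(values.mean())
    # Rule 2: if the whole group was NaN, its mean is NaN too, so fall back
    # to 1 / window_count row by row.
    return filled.fillna(1.0 / window_count)

# One group with a single gap, and one group with no observed values at all.
print(fill_ratio(pd.Series([0.2, np.nan, 0.4]), pd.Series([3, 3, 3])).tolist())  # ≈ [0.2, 0.3, 0.4]
print(fill_ratio(pd.Series([np.nan, np.nan]), pd.Series([4, 4])).tolist())  # [0.25, 0.25]
```

Note that pandas' Series.mean skips NaN by default, whereas R's mean() returns NA unless na.rm = TRUE is given.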
Solution
Unfortunately, the example input contains no NaN values, and every (Node, DAY, COMMODITY_CODE) group has only one row, so nothing gets replaced: the new columns are plain copies of the original columns.
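To see both fallbacks fire, here is the answer's code run on a small hypothetical frame (the values are made up for illustration, not taken from the question) with a two-row group that has gaps and a group whose ratios are all missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: group (1, 1, "A") has two rows, each missing one ratio;
# group (2, 1, "B") is a single row with both ratios missing.
df = pd.DataFrame({
    "Node": [1.0, 1.0, 2.0],
    "DAY": [1, 1, 1],
    "COMMODITY_CODE": ["A", "A", "B"],
    "case_ratio": [0.2, np.nan, np.nan],
    "deliveries_ratio": [np.nan, 0.5, np.nan],
    "window_count": [4, 4, 5],
})

# Step 1 (grouped): NaN -> group mean; stays NaN if the whole group is NaN.
df[["delivery_ratio_filled", "case_ratio_filled"]] = (
    df.groupby(["Node", "DAY", "COMMODITY_CODE"])[["deliveries_ratio", "case_ratio"]]
    .transform(lambda x: np.where(x.isna(), x.mean(), x)))

# Step 2 (row-wise): anything still NaN -> 1 / window_count.
for col in ["delivery_ratio_filled", "case_ratio_filled"]:
    df[col] = np.where(df[col].isna(), 1 / df["window_count"], df[col])

print(df[["delivery_ratio_filled", "case_ratio_filled"]])
```

In group (1, 1, "A") each gap takes the mean of the one observed value in its column, while the all-NaN row of group (2, 1, "B") falls through to 1 / 5 = 0.2 in both columns.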
The first pair of conditions can be tested with np.where and applied within each group with transform:
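import numpy as np
# Replace NaN with the group mean (mean() skips NaN); the two result
# columns are assigned by position to the two new *_filled columns.
df[['delivery_ratio_filled', 'case_ratio_filled']] = (
    df.groupby(['Node', 'DAY', 'COMMODITY_CODE'])[['deliveries_ratio', 'case_ratio']]
      .transform(lambda x: np.where(x.isna(), x.mean(), x)))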
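The same first step can also be written without a lambda by swapping np.where for fillna combined with the built-in transform("mean"). A sketch on hypothetical toy data; src and dst are illustrative variable names. Rows belonging to an all-NaN group stay NaN here, which is what the second step then handles.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one two-row group with gaps, one all-NaN group.
df = pd.DataFrame({
    "Node": [1.0, 1.0, 2.0],
    "DAY": [1, 1, 1],
    "COMMODITY_CODE": ["A", "A", "B"],
    "deliveries_ratio": [np.nan, 0.5, np.nan],
    "case_ratio": [0.2, np.nan, np.nan],
})

keys = ["Node", "DAY", "COMMODITY_CODE"]
src = ["deliveries_ratio", "case_ratio"]
dst = ["delivery_ratio_filled", "case_ratio_filled"]

# transform("mean") broadcasts each group's NaN-skipping mean back onto its rows;
# fillna then only touches the cells that were NaN.
group_means = df.groupby(keys)[src].transform("mean")
df[dst] = df[src].fillna(group_means).to_numpy()

print(df[dst])
```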
The second pair of conditions doesn't need grouping, because 1 / window_count is computed row by row:
# Rows still NaN after the group mean (all-NaN groups) fall back to 1 / window_count
df['delivery_ratio_filled'] = (
    np.where(df['delivery_ratio_filled'].isna(),
             1 / df['window_count'],
             df['delivery_ratio_filled']))
df['case_ratio_filled'] = (
    np.where(df['case_ratio_filled'].isna(),
             1 / df['window_count'],
             df['case_ratio_filled']))
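Equivalently, each np.where call can be replaced by Series.fillna with a Series argument, which fills every NaN from the value at the same index label. A sketch on a hypothetical frame standing in for the output of the first step:

```python
import numpy as np
import pandas as pd

# Hypothetical output of the first step: the last row came from an all-NaN group.
df = pd.DataFrame({
    "delivery_ratio_filled": [0.5, 0.5, np.nan],
    "case_ratio_filled": [0.2, 0.2, np.nan],
    "window_count": [4, 4, 5],
})

# fillna with a Series aligns on the index, so each NaN gets its own row's
# 1 / window_count, matching the np.where version.
fallback = 1 / df["window_count"]
for col in ["delivery_ratio_filled", "case_ratio_filled"]:
    df[col] = df[col].fillna(fallback)

print(df)
```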
df
Out:
Node COMMODITY_CODE ... delivery_ratio_filled case_ratio_filled
0 7014.0 SCFZ ... 0.354839 0.357854
1 7014.0 SCFZ ... 0.423077 0.457945
2 7014.0 SCFZ ... 0.258621 0.283379
3 7030.0 SCDD ... 0.300000 0.316505
4 7030.0 SCDD ... 0.232558 0.236513
Answered By - Michael Szczesny