Issue
Summary: I want to be able to recreate my function without having to manually type out each individual iloc and using if/elif for scalability for situations when the groups get too large to handle
I have a sample table df_stack_exchange
data_stack_exchange = {'store': ['A','B', 'B', 'C', 'C', 'C', 'D', 'D', 'D', 'D'],
'worker': [1,1,2,1,2,3,1,2,3,4],
'boxes': [105, 90, 100, 80, 10, 200, 70, 210, 50, 0],
'optimal_boxes': [0,0,0,0,0,0,0,0,0,0]}
df_stack_exchange = pandas.DataFrame(data_stack_exchange)
store | worker | boxes | optimal_boxes | |
---|---|---|---|---|
0 | A | 1 | 105 | 0 |
1 | B | 1 | 90 | 0 |
2 | B | 2 | 100 | 0 |
3 | C | 1 | 80 | 0 |
4 | C | 2 | 10 | 0 |
5 | C | 3 | 200 | 0 |
6 | D | 1 | 70 | 0 |
7 | D | 2 | 210 | 0 |
8 | D | 3 | 50 | 0 |
9 | D | 4 | 0 | 0 |
Worker priority is in numerical order and I want to assign them a maximum of 100 boxes until there is no more to assign. The only condition is that if there is only one worker available (store A) then the single worker gets all the boxes even if it is greater than 100. See below for the expected dataframe
store | worker | boxes | optimal_boxes | |
---|---|---|---|---|
0 | A | 1 | 105 | 105 |
1 | B | 1 | 90 | 100 |
2 | B | 2 | 100 | 90 |
3 | C | 1 | 80 | 100 |
4 | C | 2 | 10 | 100 |
5 | C | 3 | 200 | 90 |
6 | D | 1 | 70 | 100 |
7 | D | 2 | 210 | 100 |
8 | D | 3 | 50 | 100 |
9 | D | 4 | 0 | 30 |
I created the following function which yields my expected outcome BUT it is not sustainable as I have to manually type out each iloc. I want to be able to recreate this function with a loop or have it be able to scale without having to keep on adding elifs. It is not a scalable solution for when group sizes become 10+ instead of the current max size of 4 (store D)
def box_optimizer(x):
if x['optimal_boxes'].count() == 1:
x['optimal_boxes'].iloc[0] = x['boxes'].sum()
return x
elif x['optimal_boxes'].count() == 2:
x['optimal_boxes'].iloc[0] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
x['optimal_boxes'].iloc[1] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
return x
elif x['optimal_boxes'].count() == 3:
x['optimal_boxes'].iloc[0] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
x['optimal_boxes'].iloc[1] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
x['optimal_boxes'].iloc[2] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
return x
elif x['optimal_boxes'].count() == 4:
x['optimal_boxes'].iloc[0] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
x['optimal_boxes'].iloc[1] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
x['optimal_boxes'].iloc[2] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
x['optimal_boxes'].iloc[3] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
return x
df_stack_exchange_function = pandas.DataFrame(df_stack_exchange.groupby('store', as_index=False, group_keys=False).apply(box_optimizer))
# the expected dataframe output
df_stack_exchange_function
Solution
You don't need a loop, just sum the boxes, divide the total by 100 to get the number of workers with a full box, assign the rest to the last worker:
def assign_boxes(s):
total = s.sum()
d = min(total // 100, len(s)-1)
return [100]*d+[total - 100*d]+[0]*(len(s)-d-1)
df['optimal_boxes'] = df.groupby('store')['boxes'].transform(assign_boxes)
Output:
store worker boxes optimal_boxes
0 A 1 105 105
1 B 1 90 100
2 B 2 100 90
3 C 1 80 100
4 C 2 10 100
5 C 3 200 90
6 D 1 70 100
7 D 2 210 100
8 D 3 50 100
9 D 4 0 30
Explanation:
[100]*d # number of workers with 100
[total - 100*d] # next worker gets all rest
[0]*(len(s)-d-1) # in case there are too many workers, pad with 0s
Detailed explanation:
# example 1
s = pd.Series([70, 110, 50, 0])
total = s.sum() # 230
# min (230//100, 3)
# min (2, 3) = 2
d = min(total // 100, len(s)-1)
out = ([100]*d # [100] * 2 -> [100, 100]
+[total - 100*d] # [230 - 100*2] -> [30]
+[0]*(len(s)-d-1) # [0]*(4-2-1) -> [0]
) # [100, 100, 30, 0]
# example 2
s = pd.Series([70, 210, 50, 0])
total = s.sum() # 330
# min (330//100, 3)
# min (3, 3) = 3
d = min(total // 100, len(s)-1)
out = ([100]*d # [100] * 3 -> [100, 100, 100]
+[total - 100*d] # [330 - 100*3] -> [30]
+[0]*(len(s)-d-1) # [0]*(4-3-1) -> []
) # [100, 100, 100, 30]
Answered By - mozway
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.