Sunday, January 7, 2024

[FIXED] Utilizing a loop or automated method instead of manually typing out each iloc position to change dataframe value

January 07, 2024 dataframe, group-by, pandas, python No comments

Issue

Summary: I want to be able to recreate my function without having to manually type out each individual iloc and using if/elif for scalability for situations when the groups get too large to handle

I have a sample table df_stack_exchange

data_stack_exchange = {'store': ['A','B', 'B', 'C', 'C', 'C', 'D', 'D', 'D', 'D'],
        'worker': [1,1,2,1,2,3,1,2,3,4],
        'boxes': [105, 90, 100, 80, 10, 200, 70, 210, 50, 0],
        'optimal_boxes': [0,0,0,0,0,0,0,0,0,0]}
df_stack_exchange = pandas.DataFrame(data_stack_exchange)

	store	worker	boxes
0	A	1	105
1	B	1	90
2	B	2	100
3	C	1	80
4	C	2	10
5	C	3	200
6	D	1	70
7	D	2	210
8	D	3	50
9	D	4	0

Worker priority is in numerical order and I want to assign them a maximum of 100 boxes until there is no more to assign. The only condition is that if there is only one worker available (store A) then the single worker gets all the boxes even if it is greater than 100. See below for the expected dataframe

	store	worker	boxes	optimal_boxes
0	A	1	105	105
1	B	1	90	100
2	B	2	100	90
3	C	1	80	100
4	C	2	10	100
5	C	3	200	90
6	D	1	70	100
7	D	2	210	100
8	D	3	50	100
9	D	4	0	30

I created the following function which yields my expected outcome BUT it is not sustainable as I have to manually type out each iloc. I want to be able to recreate this function with a loop or have it be able to scale without having to keep on adding elifs. It is not a scalable solution for when group sizes become 10+ instead of the current max size of 4 (store D)

def box_optimizer(x):
    if x['optimal_boxes'].count() == 1:
        x['optimal_boxes'].iloc[0] = x['boxes'].sum()
        return x
    elif x['optimal_boxes'].count() == 2:
        x['optimal_boxes'].iloc[0] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
        x['optimal_boxes'].iloc[1] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
        return x
    elif x['optimal_boxes'].count() == 3:
        x['optimal_boxes'].iloc[0] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
        x['optimal_boxes'].iloc[1] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
        x['optimal_boxes'].iloc[2] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
        return x
    elif x['optimal_boxes'].count() == 4:
        x['optimal_boxes'].iloc[0] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
        x['optimal_boxes'].iloc[1] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
        x['optimal_boxes'].iloc[2] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum())
        x['optimal_boxes'].iloc[3] += numpy.where(x['boxes'].sum() - x['optimal_boxes'].sum() > 100, 100, x['boxes'].sum() - x['optimal_boxes'].sum()) 
        return x

df_stack_exchange_function = pandas.DataFrame(df_stack_exchange.groupby('store', as_index=False, group_keys=False).apply(box_optimizer))

# the expected dataframe output
df_stack_exchange_function

Solution

You don't need a loop, just sum the boxes, divide the total by 100 to get the number of workers with a full box, assign the rest to the last worker:

def assign_boxes(s):
    total = s.sum()
    d = min(total // 100, len(s)-1)
    return [100]*d+[total - 100*d]+[0]*(len(s)-d-1)

df['optimal_boxes'] = df.groupby('store')['boxes'].transform(assign_boxes)

Output:

  store  worker  boxes  optimal_boxes
0     A       1    105            105
1     B       1     90            100
2     B       2    100             90
3     C       1     80            100
4     C       2     10            100
5     C       3    200             90
6     D       1     70            100
7     D       2    210            100
8     D       3     50            100
9     D       4      0             30

Explanation:

[100]*d           # number of workers with 100
[total - 100*d]   # next worker gets all rest
[0]*(len(s)-d-1)  # in case there are too many workers, pad with 0s

Detailed explanation:

# example 1
s = pd.Series([70, 110, 50, 0])
total = s.sum()  # 230

# min (230//100, 3)
# min (2, 3) = 2
d = min(total // 100, len(s)-1)

out = ([100]*d          # [100] * 2 -> [100, 100]
      +[total - 100*d]  # [230 - 100*2] -> [30]
      +[0]*(len(s)-d-1) # [0]*(4-2-1) -> [0]
      ) # [100, 100, 30, 0]

# example 2
s = pd.Series([70, 210, 50, 0])
total = s.sum()  # 330

# min (330//100, 3)
# min (3, 3) = 3
d = min(total // 100, len(s)-1)

out = ([100]*d          # [100] * 3 -> [100, 100, 100]
      +[total - 100*d]  # [330 - 100*3] -> [30]
      +[0]*(len(s)-d-1) # [0]*(4-3-1) -> []
      ) # [100, 100, 100, 30]

Answered By - mozway

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Sunday, January 7, 2024

[FIXED] Utilizing a loop or automated method instead of manually typing out each iloc position to change dataframe value

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels