Issue
I am trying to create a column, that should do a calculation per product, based on multiple columns.
Logic for the calculation column:
- Calculations should be done per product
- Use quantity as default
- IF promo[Y/N] = 1, then take previous weeks quantity * season perc. change.
- Except when the promo is on the first week of the product. Then keep quantity as well.
In the example below, You see the calculation column (I placed comments for the logic).
week product promo[Y/N] quantity Season calculation
1 A 0 100 6 100 # no promo, so = quantity col
2 A 0 100 10 100 # no promo, so = quantity col
3 A 1 ? -10 90 # 100 (quantity last week)- 10% (season)
4 A 1 ? 20 108 # quantity last week, what we calculated, 90 + 20% (18) = 108.
5 A 0 80 4 80 # no promo, so = quantity col
1 B 1 100 6 100 # Promo, but first week of this product. So regular quantity.
2 B 0 100 10 100 # no promo, so = quantity col
3 B 1 ? -10 90 # 100 (quantity last week)- 10% (season)
4 B 1 ? 20 108 # quantity last week, what we calculated, 90 + 20% (18) = 108.
5 B 0 80 4 80 # no promo, so = quantity col
I tried to solve this in two ways:
Via a groupby(), and then the Product, but this was messing up my end file (I would like to have it in the format above, so with 1 additional column.
By looping over the dataframe with iterrows(). However, I messed up, because it doesn't distinct between products unfortunately.
Anyone an idea what a proper method is to solve this? Appreciated!
Solution
Using custom apply function to add column to dataframe
Code
def calc(xf):
'''
xf is dataframe from groupby
'''
calc = []
# Faster to loop over rows using zip than iterrows
for promo, quantity, season in zip(xf['promo[Y/N]'], xf['quantity'], xf['Season']):
if not np.isnan(quantity):
calc.append(quantity) # not missing value
elif promo and calc: # beyond first week if calc is not None
prev_quantity = float(calc[-1]) # previous quantity
estimate = round((1 + season/100.)*prev_quantity) # estimate
calc.append(estimate)
else:
calc.append(quantity) # use current quantity
xf['calculated'] = calc # Add calculated column to dataframe
return xf
Test
from io import StringIO
s = '''week product promo[Y/N] quantity Season
1 A 0 100 6
2 A 0 100 10
3 A 1 ? -10
4 A 1 ? 20
5 A 0 80 4
1 B 1 100 6
2 B 0 100 10
3 B 1 ? -10
4 B 1 ? 20
5 B 0 80 4'''
# convert '?' to np.nan (missing value so column become float)
df = pd.read_csv(StringIO(s), sep = '\s{2,}', na_values = ['?'], engine = 'python')
print(df.type)
# Output
week int64 product object promo[Y/N] int64 quantity float64 Season int64 dtype: object
tf = df.groupby('product').apply(calc)
print(tf)
# Output
week product promo[Y/N] quantity Season calculated
0 1 A 0 100 6 100
1 2 A 0 100 10 100
2 3 A 1 ? -10 90
3 4 A 1 ? 20 108
4 5 A 0 80 4 80
5 1 B 1 100 6 100
6 2 B 0 100 10 100
7 3 B 1 ? -10 90
8 4 B 1 ? 20 108
9 5 B 0 80 4 80
Answered By - DarrylG
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.