Monday, November 14, 2022

[FIXED] How to make a calculation based on a value and multiple columns in pandas?

November 14, 2022 pandas, python No comments

Issue

I am trying to create a column, that should do a calculation per product, based on multiple columns.

Logic for the calculation column:

Calculations should be done per product
Use quantity as default
IF promo[Y/N] = 1, then take previous weeks quantity * season perc. change.
Except when the promo is on the first week of the product. Then keep quantity as well.

In the example below, You see the calculation column (I placed comments for the logic).

week    product    promo[Y/N]   quantity    Season       calculation 
  1       A           0           100          6            100 # no promo, so = quantity col
  2       A           0           100          10           100 # no promo, so = quantity col
  3       A           1           ?            -10           90 # 100 (quantity last week)- 10% (season)
  4       A           1           ?            20           108 # quantity last week, what we calculated, 90 + 20% (18) = 108.  
  5       A           0           80            4            80 # no promo, so = quantity col

  1       B           1           100          6            100 # Promo, but first week of this product. So regular quantity. 
  2       B           0           100          10           100 # no promo, so = quantity col
  3       B           1           ?            -10           90 # 100 (quantity last week)- 10% (season)
  4       B           1           ?            20           108 # quantity last week, what we calculated, 90 + 20% (18) = 108.  
  5       B           0           80            4            80 # no promo, so = quantity col

I tried to solve this in two ways:

Via a groupby(), and then the Product, but this was messing up my end file (I would like to have it in the format above, so with 1 additional column.
By looping over the dataframe with iterrows(). However, I messed up, because it doesn't distinct between products unfortunately.

Anyone an idea what a proper method is to solve this? Appreciated!

Solution

Using custom apply function to add column to dataframe

Code

def calc(xf):
    '''
        xf is dataframe from groupby
    '''
    calc = []
    # Faster to loop over rows using zip than iterrows
    for promo, quantity, season in zip(xf['promo[Y/N]'], xf['quantity'], xf['Season']):
        if not np.isnan(quantity):
            calc.append(quantity)           # not missing value
        elif promo and calc:                # beyond first week if calc is not None
            prev_quantity = float(calc[-1]) # previous quantity
            estimate = round((1 + season/100.)*prev_quantity) # estimate
            calc.append(estimate)
        else:
            calc.append(quantity)           # use current quantity
            
    xf['calculated'] = calc                 # Add calculated column to dataframe
    return xf

Test

from io import StringIO

s = '''week    product    promo[Y/N]   quantity    Season
  1       A           0           100          6
  2       A           0           100          10
  3       A           1           ?            -10
  4       A           1           ?            20
  5       A           0           80            4
  1       B           1           100          6
  2       B           0           100          10
  3       B           1           ?            -10
  4       B           1           ?            20
  5       B           0           80            4'''

# convert '?' to np.nan (missing value so column become float)
df = pd.read_csv(StringIO(s), sep = '\s{2,}', na_values = ['?'], engine = 'python')
print(df.type)

# Output

week            int64
product        object
promo[Y/N]      int64
quantity      float64
Season          int64
dtype: object

tf = df.groupby('product').apply(calc)
print(tf)

# Output

   week product  promo[Y/N] quantity  Season calculated
0     1       A           0      100       6        100
1     2       A           0      100      10        100
2     3       A           1        ?     -10         90
3     4       A           1        ?      20        108
4     5       A           0       80       4         80
5     1       B           1      100       6        100
6     2       B           0      100      10        100
7     3       B           1        ?     -10         90
8     4       B           1        ?      20        108
9     5       B           0       80       4         80

Answered By - DarrylG

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Monday, November 14, 2022

[FIXED] How to make a calculation based on a value and multiple columns in pandas?

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels