Friday, November 12, 2021

[FIXED] Sum of columns based on range of values of other columns in a Pandas dataframe

Issue

This is my dataframe:

import pandas as pd

df = pd.DataFrame({'sym': ['a', 'b'], 'vol_1': [100, 50], 'price_1': [5, 150], 'vol_2': [1500, 2000], 'price_2': [20, 175],
                   'vol_3': [123, 500], 'price_3': [22, 1000], 'min': [18, 150], 'max': [23, 176]})

I want to add a column that sums vol_1, vol_2, and vol_3 for each row, counting a vol only if its price (the column right after it) lies between that row's min and max, inclusive. For example, in the first row only vol_2 and vol_3 count, because price_2 (20) and price_3 (22) fall within 18 to 23 while price_1 (5) does not. My desired outcome looks like this:

  sym  vol_1  price_1  vol_2  price_2  vol_3  price_3  min  max  vol_sum
0   a    100        5   1500       20    123       22   18   23     1623
1   b     50      150   2000      175    500     1000  150  176     2050
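To make the rule concrete, here is a minimal row-by-row sketch of the calculation. The helper name row_vol_sum and its n_pairs parameter are made up for illustration, and it assumes the columns are named vol_1..vol_3 and price_1..price_3 as in the frame above. It is meant as a reference to check faster approaches against, not as a recommended solution.

```python
import pandas as pd

df = pd.DataFrame({'sym': ['a', 'b'], 'vol_1': [100, 50], 'price_1': [5, 150],
                   'vol_2': [1500, 2000], 'price_2': [20, 175],
                   'vol_3': [123, 500], 'price_3': [22, 1000],
                   'min': [18, 150], 'max': [23, 176]})


def row_vol_sum(row, n_pairs=3):
    """Sum vol_k for every k whose price_k lies in [row['min'], row['max']]."""
    return sum(row[f"vol_{k}"]
               for k in range(1, n_pairs + 1)
               if row["min"] <= row[f"price_{k}"] <= row["max"])


# Apply the rule to each row; both bounds are inclusive.
expected = df.apply(row_vol_sum, axis=1)
print(expected.tolist())  # [1623, 2050]
```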

Solution

Reshape the data to long form so that each row holds a single vol, price, min and max. Then keep only the rows where price is between min and max, group by the sym column, sum vol, and assign the result back to df.

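df["vol_sum"] = (pd.wide_to_long(df,
                                 stubnames=["vol", "price"],
                                 i=["sym", "min", "max"],
                                 j="number",
                                 sep="_")
                   # "min" and "max" are index levels here; query can refer to them by name
                   .query("min <= price <= max", engine="python")
                   .groupby("sym")
                   .vol
                   .sum()
                   # .array drops the index, so the sums are assigned by position
                   .array
                 )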



  sym  vol_1  price_1  vol_2  price_2  vol_3  price_3  min  max  vol_sum
0   a    100        5   1500       20    123       22   18   23     1623
1   b     50      150   2000      175    500     1000  150  176     2050
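If the rows are not sorted by sym, or some sym has no price inside its range, assigning the grouped sums by position may misalign or fail on length. The sketch below is one possible variation, not the original answer's code: it maps the per-sym sums back onto the sym column and fills missing groups with 0. The function name add_vol_sum is hypothetical, and it assumes each sym appears only once.

```python
import pandas as pd

df = pd.DataFrame({'sym': ['a', 'b'], 'vol_1': [100, 50], 'price_1': [5, 150],
                   'vol_2': [1500, 2000], 'price_2': [20, 175],
                   'vol_3': [123, 500], 'price_3': [22, 1000],
                   'min': [18, 150], 'max': [23, 176]})


def add_vol_sum(frame):
    """Return a copy of frame with vol_sum, aligned on sym rather than by position."""
    # Long form: one row per (sym, number) with plain vol and price columns.
    long_df = pd.wide_to_long(frame,
                              stubnames=["vol", "price"],
                              i=["sym", "min", "max"],
                              j="number",
                              sep="_").reset_index()
    # Keep rows whose price is inside [min, max]; both bounds are inclusive.
    in_range = long_df["price"].between(long_df["min"], long_df["max"])
    per_sym = long_df.loc[in_range].groupby("sym")["vol"].sum()
    # Look up each row's sum by its sym; a sym with no in-range price gets 0.
    vol_sum = frame["sym"].map(per_sym).fillna(0).astype("int64")
    return frame.assign(vol_sum=vol_sum)


result = add_vol_sum(df)
print(result["vol_sum"].tolist())  # [1623, 2050]
```

Because the lookup goes through sym, the row order of the input no longer matters.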

Update (October 13, 2021):

If you would rather not convert to long form, a comprehension over the price columns could help, though I have not measured its speed:

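# True where price_N lies within [min, max]; each mask is keyed by the matching vol_N name.
bools = pd.DataFrame({col.replace('price', 'vol'): df[col].between(df['min'], df['max'])
                      for col in df
                      if col.startswith('price')})

# Multiplying by the masks zeroes out-of-range volumes; then sum across each row.
df.assign(vol_sum = df[bools.columns].mul(bools).sum(axis=1))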

  sym  vol_1  price_1  vol_2  price_2  vol_3  price_3  min  max  vol_sum
0   a    100        5   1500       20    123       22   18   23     1623
1   b     50      150   2000      175    500     1000  150  176     2050
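On the speed question, another option is to drop to NumPy arrays and use one broadcast comparison. This is only a sketch and has not been benchmarked here; whether it is actually faster will depend on the size and shape of the data. It assumes every price_N column has a matching vol_N column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'sym': ['a', 'b'], 'vol_1': [100, 50], 'price_1': [5, 150],
                   'vol_2': [1500, 2000], 'price_2': [20, 175],
                   'vol_3': [123, 500], 'price_3': [22, 1000],
                   'min': [18, 150], 'max': [23, 176]})

# Pair columns by suffix so vol and price stay in the same order.
price_cols = [c for c in df.columns if c.startswith("price_")]
vol_cols = [c.replace("price_", "vol_", 1) for c in price_cols]

vol = df[vol_cols].to_numpy()            # shape (n_rows, n_pairs)
price = df[price_cols].to_numpy()        # shape (n_rows, n_pairs)
lo = df["min"].to_numpy()[:, None]       # shape (n_rows, 1), broadcasts across pairs
hi = df["max"].to_numpy()[:, None]

# Inclusive range check, then sum only the volumes whose price passed it.
mask = (price >= lo) & (price <= hi)
vol_sum = np.where(mask, vol, 0).sum(axis=1)

out = df.assign(vol_sum=vol_sum)
print(out["vol_sum"].tolist())  # [1623, 2050]
```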

Answered By - sammywemmy

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
