Issue
This is my dataframe:
import pandas as pd
df = pd.DataFrame({'sym': ['a', 'b'], 'vol_1': [100, 50], 'price_1': [5, 150],
                   'vol_2': [1500, 2000], 'price_2': [20, 175], 'vol_3': [123, 500],
                   'price_3': [22, 1000], 'min': [18, 150], 'max': [23, 176]})
I want to add a column that, for each row, sums vol_1, vol_2, and vol_3, counting a vol only if the price in the column next to it falls within the range given by the min and max columns.
For example, for the first row I want vol_2 and vol_3, because their prices are within the range of min and max.
My desired outcome looks like this:
  sym  vol_1  price_1  vol_2  price_2  vol_3  price_3  min  max  vol_sum
0   a    100        5   1500       20    123       22   18   23     1623
1   b     50      150   2000      175    500     1000  150  176     2050
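To make the rule explicit, here is a minimal row-by-row sketch of the desired computation. The helper name `row_vol_sum` and its `n_pairs` parameter are hypothetical, introduced only for illustration, and the bounds are assumed to be inclusive, which is what the expected value for row b (price_1 equals min) implies.

```python
import pandas as pd

df = pd.DataFrame({
    'sym': ['a', 'b'],
    'vol_1': [100, 50], 'price_1': [5, 150],
    'vol_2': [1500, 2000], 'price_2': [20, 175],
    'vol_3': [123, 500], 'price_3': [22, 1000],
    'min': [18, 150], 'max': [23, 176],
})

def row_vol_sum(row, n_pairs=3):
    """Sum vol_k for every k whose price_k lies in [min, max] (inclusive)."""
    total = 0
    for k in range(1, n_pairs + 1):
        if row['min'] <= row[f'price_{k}'] <= row['max']:
            total += row[f'vol_{k}']
    return total

# Apply the rule to each row; slow on large frames, but easy to verify by hand.
expected = df.apply(row_vol_sum, axis=1)
print(expected.tolist())  # [1623, 2050]
```

Row a keeps 1500 + 123 and row b keeps 50 + 2000. This loop is only a reference for checking faster, vectorized approaches.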
Solution
Reshape the data to long form so that vol, price, min and max are each a single column. Next, keep only the rows where price is between min and max, group by the sym column, sum vol, and assign the result back to df.
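df["vol_sum"] = (
    pd.wide_to_long(
        df,
        stubnames=["vol", "price"],
        i=["sym", "min", "max"],
        j="number",
        sep="_",
    )
    # keep rows whose price lies within [min, max]
    .query("min <= price <= max", engine="python")
    .groupby("sym")
    .vol
    .sum()
    # drop the sym index so the values are assigned by position
    .array
)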
  sym  vol_1  price_1  vol_2  price_2  vol_3  price_3  min  max  vol_sum
0   a    100        5   1500       20    123       22   18   23     1623
1   b     50      150   2000      175    500     1000  150  176     2050
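The same reshape can also be written step by step, which makes the intermediate long form easier to inspect. The sketch below is a variation, not the answer's exact code: it assumes each sym appears only once in df, and it maps the totals back by sym label instead of by position, so a sym with no in-range price would get 0 rather than cause a length mismatch. The names `long_df`, `in_range` and `totals` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'sym': ['a', 'b'],
    'vol_1': [100, 50], 'price_1': [5, 150],
    'vol_2': [1500, 2000], 'price_2': [20, 175],
    'vol_3': [123, 500], 'price_3': [22, 1000],
    'min': [18, 150], 'max': [23, 176],
})

# Step 1: reshape so each (sym, number) pair gets its own row
# with columns sym, min, max, number, vol, price.
long_df = pd.wide_to_long(
    df,
    stubnames=["vol", "price"],
    i=["sym", "min", "max"],
    j="number",
    sep="_",
).reset_index()

# Step 2: keep rows whose price lies in [min, max], then total vol per sym.
in_range = long_df[long_df["price"].between(long_df["min"], long_df["max"])]
totals = in_range.groupby("sym")["vol"].sum()

# Step 3: align by sym label; a sym missing from totals becomes 0.
df["vol_sum"] = df["sym"].map(totals).fillna(0).astype(int)
print(df["vol_sum"].tolist())  # [1623, 2050]
```

Aligning by label costs one extra `map`, but it does not depend on the row order of df matching the sorted order of the group keys.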
Update (13 October 2021):
If you prefer not to convert to long form, comprehensions could help; I am not sure about the speed, though:
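vols = [col for col in df if col.startswith('vol')]
# one boolean column per vol column: True where the paired price is within [min, max]
bools = {f"vol{col[-2:]}": df[col].between(df['min'], df['max'])
         for col in df
         if col.startswith('price')}
# wrap the dict in a DataFrame so the masks align with the vol columns by name
df.assign(vol_sum=df[vols].mul(pd.DataFrame(bools)).sum(axis=1))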
  sym  vol_1  price_1  vol_2  price_2  vol_3  price_3  min  max  vol_sum
0   a    100        5   1500       20    123       22   18   23     1623
1   b     50      150   2000      175    500     1000  150  176     2050
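The masking idea can also be sketched directly in NumPy. This is an illustrative variation, assuming the columns are named `vol_<n>` and `price_<n>` with matching suffixes; pairing on the full suffix, rather than on the last two characters, would also cover multi-digit suffixes such as `vol_10`. The variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'sym': ['a', 'b'],
    'vol_1': [100, 50], 'price_1': [5, 150],
    'vol_2': [1500, 2000], 'price_2': [20, 175],
    'vol_3': [123, 500], 'price_3': [22, 1000],
    'min': [18, 150], 'max': [23, 176],
})

# Pair vol and price columns by everything after the first underscore.
suffixes = [col.split("_", 1)[1] for col in df.columns if col.startswith("vol_")]
vol = df[[f"vol_{s}" for s in suffixes]].to_numpy()
price = df[[f"price_{s}" for s in suffixes]].to_numpy()

# Column vectors of bounds broadcast across the price columns.
lo = df["min"].to_numpy()[:, None]
hi = df["max"].to_numpy()[:, None]
mask = (price >= lo) & (price <= hi)

# Multiplying by the boolean mask zeroes out vols whose price is out of range.
result = df.assign(vol_sum=(vol * mask).sum(axis=1))
print(result["vol_sum"].tolist())  # [1623, 2050]
```

Whether this is faster than the reshape depends on the data; it avoids building the long frame but keeps everything in wide arrays.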
Answered By - sammywemmy