Issue
I am struggling to find a good solution to this problem.
I have a dataframe 'orders' with a time and a price column, and another dataframe 'prices' which gives me the market price at any time.
import pandas as pd
prices = pd.DataFrame({'time': range(10), 'price': [20, 21, 22, 21, 22, 23, 24, 25, 26, 27]})
orders = pd.DataFrame({'time': [3, 6, 8], 'orderPrice' : [20, 24, 18]})
I want to add a column to 'orders', let's call it 'marketPriceTime', which does the following:
- take the last row from 'prices' where the 'price' is equal to the orderPrice and its time is <= the time of the order row.
So the result in this case would be:
ordersResult = pd.DataFrame({'time': [3, 6, 8], 'orderPrice' : [20, 24, 18], 'marketPriceTime': [0, 6, None]})
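For reference, the rule above can be spelled out row by row. This is only a naive sketch to make the requirement concrete: the helper name last_market_time is made up for illustration, "last row" is read as the largest matching time, and it scans 'prices' once per order, so it would be slow on large data.

```python
import pandas as pd

prices = pd.DataFrame({'time': range(10), 'price': [20, 21, 22, 21, 22, 23, 24, 25, 26, 27]})
orders = pd.DataFrame({'time': [3, 6, 8], 'orderPrice': [20, 24, 18]})

def last_market_time(order):
    # Hypothetical helper: rows of `prices` with the same price,
    # at or before the time of this order.
    hits = prices[(prices['price'] == order['orderPrice'])
                  & (prices['time'] <= order['time'])]
    # Latest matching time, or None when no row matches.
    return hits['time'].max() if not hits.empty else None

naive = orders.copy()
naive['marketPriceTime'] = orders.apply(last_market_time, axis=1)
print(naive)
```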
Solution
What you want is a merge_asof on time, with price as the ID (the "by" key that must match exactly):
out = pd.merge_asof(orders, prices.rename(columns={'time': 'marketPriceTime'}),
                    left_on='time', right_on='marketPriceTime',
                    left_by='orderPrice', right_by='price')
The subtlety is that both dataframes must be sorted by time for merge_asof to work. If they are not, sort them first and restore the original row order of 'orders' afterwards:
out = pd.merge_asof(orders.reset_index().sort_values(by='time'),
                    prices.sort_values(by='time').rename(columns={'time': 'marketPriceTime'}),
                    left_by='orderPrice', right_by='price',
                    left_on='time', right_on='marketPriceTime'
                    ).set_index('index').reindex(orders.index)
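To see why the sorting matters, here is a small sketch that feeds merge_asof the same orders in a deliberately shuffled order. The variable names shuffled, right and raised are assumptions made for this illustration. The direct call is expected to fail with a ValueError, while the sort-and-reindex variant returns the rows in their original order.

```python
import pandas as pd

prices = pd.DataFrame({'time': range(10), 'price': [20, 21, 22, 21, 22, 23, 24, 25, 26, 27]})
# Same orders as in the question, but deliberately out of time order.
shuffled = pd.DataFrame({'time': [8, 3, 6], 'orderPrice': [18, 20, 24]})
right = prices.rename(columns={'time': 'marketPriceTime'})

# Direct call on unsorted left keys: merge_asof refuses to run.
try:
    pd.merge_asof(shuffled, right,
                  left_on='time', right_on='marketPriceTime',
                  left_by='orderPrice', right_by='price')
    raised = False
except ValueError:
    raised = True

# Sort for the merge, then restore the original row order via the saved index.
out = pd.merge_asof(shuffled.reset_index().sort_values(by='time'),
                    right.sort_values(by='marketPriceTime'),
                    left_on='time', right_on='marketPriceTime',
                    left_by='orderPrice', right_by='price'
                    ).set_index('index').reindex(shuffled.index)
print(raised)
print(out)
```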
Output:
   time  orderPrice  marketPriceTime  price
0     3          20              0.0   20.0
1     6          24              6.0   24.0
2     8          18              NaN    NaN
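The output carries an extra 'price' column, and 'marketPriceTime' comes back as float because of the missing value. If the goal is a frame shaped like ordersResult from the question, one possible cleanup (a sketch, not part of the original answer; the name result is an assumption) is to drop the helper column and switch to pandas' nullable integer dtype:

```python
import pandas as pd

prices = pd.DataFrame({'time': range(10), 'price': [20, 21, 22, 21, 22, 23, 24, 25, 26, 27]})
orders = pd.DataFrame({'time': [3, 6, 8], 'orderPrice': [20, 24, 18]})

out = pd.merge_asof(orders, prices.rename(columns={'time': 'marketPriceTime'}),
                    left_on='time', right_on='marketPriceTime',
                    left_by='orderPrice', right_by='price')

# Drop the matched price, which duplicates orderPrice wherever a match exists.
result = out.drop(columns='price')
# Nullable integers keep the times as whole numbers and show <NA> for no match.
result['marketPriceTime'] = result['marketPriceTime'].astype('Int64')
print(result)
```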
Answered By - mozway