Issue
I have a 5-minute time series dataframe with the titles Open
and Pivot
. The Pivot
column value is the same throughout the day. I need to extract the first three 5-minute Open
data and compare each of the values to the value on the pivot
column and see if it is larger or not. If it is larger, then a 3rd column titled 'Result' would print 1
for the whole day's length and 0
if otherwise.
Here's how the dataframe looks like:
As an example the first 3 values for the day of 2022-09-26 are 18803.900391, 18801.226562 and 18807.296875
. Since two of the values are less than the value on the pivot column 18806.938151
for the corresponding day, the result column would print 0 for the whole day.
A rough idea:
I was loosely thinking of something like this, but I know this is completely wrong:
for i in range(len(df)):
df['result'] = df['Open'].iloc[i:i+3] > df['pivot'].iloc[i]
i=i+288 #number of 5 min candles in a day to skip to next one
I can't seen to find a way to iterate through this. Any idea or recommendation would help! Thankyou in advance!
Here's my complete code to get the dataframe:
import yfinance as yf
import numpy as np
import pandas as pd
import datetime
df = yf.download(tickers='BTC-USD', period = '30d', interval = '5m')
df = df.reset_index()
#resetting df to start at midnight of next day
min_date = df.Datetime.min()
NextDay_Date = (min_date + datetime.timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)
df = df[df.Datetime >= NextDay_Date].copy()
df = df.set_index('Datetime')
# resampled daily pivots merged to smaller timeframe
day_df = (df.resample('D')
.agg({'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'}))
day_df['pivot'] = (df['High']+ df['Low'] + df['Close'])/3
day_df = day_df.reset_index()
day_df['Datetime'] = day_df['Datetime'].dt.strftime('%Y-%m-%d %H:%M:%S')
day_df['Datetime'] = pd.to_datetime(day_df['Datetime'])
day_df = day_df.set_index('Datetime')
day_df.drop(['Open', 'High', 'Low', 'Close'], axis=1, inplace=True)
#merging both dataframes
df = df.join(day_df)
df['pivot'].fillna(method='ffill', inplace=True)
df.drop(['High','Low','Close','Adj Close', 'Volume'], axis=1, inplace=True)
df
Solution
Here's a basic solution:
df_s = df.reset_index()
df_s['result'] = 0
start_day = -1
for i in range(len(df_s)):
if df_s['Datetime'][i].day != start_day :
start_day = df_s['Datetime'][i].day
res = np.all(df_s['Open'][i:i+3] > df_s['pivot'][i])
df_s['result'][i] = res
Since I don't like multi-indexed dataframes, my first step was to treat as it an int indexed df where Datetime
is just another column. Now, we keep track of 2 things - start_day
and result
and iterate as follows:
- For each new
start_day
check if the first threeOpen
values are greater than the correspondingpivot
value, and store toresult
. i.e, only when the row entry has a differentDatetime
value, update thestart_day
andresult
values - then
result
value must be applied to each row
Now, some of the less obvious reasons why I call this a basic solution:
- in case there are only 2 entries for a certain
start_date
, my code doesn't check the dates (of the 3 entries) before comparing theOpen
andpivot
values - the code assumes the df is sorted by date and takes into account only the day and not the month, so in case the dataset actually has missing entries for certain months, it may run into the same problem as above
- A
SettingwithCopy
warning is raised bypandas
But I hope this should be enough to answer the OP's question
Answered By - MtC
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.