Saturday, October 29, 2022

[FIXED] Check and compare if first 3 row value of dataframe for each day is greater than the value on corresponding column

October 29, 2022 dataframe, iteration, pandas, python No comments

Issue

I have a 5-minute time series dataframe with the titles Open and Pivot. The Pivot column value is the same throughout the day. I need to extract the first three 5-minute Open data and compare each of the values to the value on the pivot column and see if it is larger or not. If it is larger, then a 3rd column titled 'Result' would print 1 for the whole day's length and 0 if otherwise.

Here's how the dataframe looks like:

As an example the first 3 values for the day of 2022-09-26 are 18803.900391, 18801.226562 and 18807.296875. Since two of the values are less than the value on the pivot column 18806.938151for the corresponding day, the result column would print 0 for the whole day.

A rough idea:

I was loosely thinking of something like this, but I know this is completely wrong:

for i in range(len(df)):
    df['result'] = df['Open'].iloc[i:i+3] > df['pivot'].iloc[i]
    i=i+288 #number of 5 min candles in a day to skip to next one

I can't seen to find a way to iterate through this. Any idea or recommendation would help! Thankyou in advance!

Here's my complete code to get the dataframe:

import yfinance as yf
import numpy as np
import pandas as pd
import datetime

df = yf.download(tickers='BTC-USD', period = '30d', interval = '5m')
df = df.reset_index()

#resetting df to start at midnight of next day
min_date = df.Datetime.min()
NextDay_Date = (min_date + datetime.timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)
df = df[df.Datetime >= NextDay_Date].copy()
df = df.set_index('Datetime')

# resampled daily pivots merged to smaller timeframe
day_df = (df.resample('D')
               .agg({'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'}))
day_df['pivot'] = (df['High']+ df['Low'] + df['Close'])/3

day_df = day_df.reset_index()
day_df['Datetime'] = day_df['Datetime'].dt.strftime('%Y-%m-%d %H:%M:%S')
day_df['Datetime'] = pd.to_datetime(day_df['Datetime'])
day_df = day_df.set_index('Datetime')
day_df.drop(['Open', 'High', 'Low', 'Close'], axis=1, inplace=True)

#merging both dataframes
df = df.join(day_df)
df['pivot'].fillna(method='ffill', inplace=True)

df.drop(['High','Low','Close','Adj Close', 'Volume'], axis=1, inplace=True)

df

Solution

Here's a basic solution:

df_s = df.reset_index()
df_s['result'] = 0 

start_day = -1

for i in range(len(df_s)):
    if df_s['Datetime'][i].day != start_day :
        start_day = df_s['Datetime'][i].day
        res = np.all(df_s['Open'][i:i+3] > df_s['pivot'][i])
    
    df_s['result'][i] = res

Since I don't like multi-indexed dataframes, my first step was to treat as it an int indexed df where Datetime is just another column. Now, we keep track of 2 things - start_day and result and iterate as follows:

For each new start_day check if the first three Open values are greater than the corresponding pivot value, and store to result. i.e, only when the row entry has a different Datetime value, update the start_day and result values
then result value must be applied to each row

Now, some of the less obvious reasons why I call this a basic solution:

in case there are only 2 entries for a certain start_date, my code doesn't check the dates (of the 3 entries) before comparing the Open and pivot values
the code assumes the df is sorted by date and takes into account only the day and not the month, so in case the dataset actually has missing entries for certain months, it may run into the same problem as above
A SettingwithCopy warning is raised by pandas

But I hope this should be enough to answer the OP's question

Answered By - MtC

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Saturday, October 29, 2022

[FIXED] Check and compare if first 3 row value of dataframe for each day is greater than the value on corresponding column

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels