Issue
We need to return the list of dates for n consecutive business days ( Friday to Monday is 1 business day ). where values are not changed. Do not assume that dates column have every single dates. Data frame structure would be as given below
date Value
2022-07-19 44.43000000
2022-07-20 44.43000000
2022-07-21 44.43000000
2022-07-22 44.43000000
2022-07-25 44.43000000
... ...
2022-09-02 86.40000000
2022-09-06 85.13000000
2022-09-07 86.86000000
2022-09-08 88.44000000
2022-09-09 89.44000000
If we assume n is 5. We need to return list of 5 consecutive dates. For above examples answer would be
[2022-07-22,2022-07-20,2022-07-21,2022-07-22,2022-07-25]
I tried below code to get consecutive dates present in data frame but I am unable to get consecutive business days.
for k, v in px_dirty.groupby((px_dirty['value'].shift() != px_dirty['value']).cumsum()):
if len(v) == 5:
print(f'[group {k}]')
print(v)
I am not able to figure out how to get consecutive business days.
Solution
Use a date offset
:
from pandas.tseries.offsets import BDay
df['date'] = pd.to_datetime(df['date'])
# identify breaks in successive values
m1 = df['Value'].ne(df['Value'].shift())
# identify breaks in business days
m2 = df['date'].ne(df['date'].shift().add(BDay()))
# group by either break
for k,g in df.groupby((m1|m2).cumsum()):
if len(g) == 5:
print(f'[group {k}]')
print(g)
Output:
[group 1]
date Value
0 2022-07-19 44.43
1 2022-07-20 44.43
2 2022-07-21 44.43
3 2022-07-22 44.43
4 2022-07-25 44.43
Intermediates:
date Value m1 m2 m1|m2 group len
0 2022-07-19 44.43 True True True 1 5
1 2022-07-20 44.43 False False False 1 5
2 2022-07-21 44.43 False False False 1 5
3 2022-07-22 44.43 False False False 1 5
4 2022-07-25 44.43 False False False 1 5
5 2022-09-02 86.40 True True True 2 1
6 2022-09-06 85.13 True True True 3 1
7 2022-09-07 86.86 True False True 4 1
8 2022-09-08 88.44 True False True 5 1
9 2022-09-09 89.44 True False True 6 1
Answered By - mozway
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.