Issue
I have a DataFrame with a DatetimeIndex, and I am trying to find the maximum over the interval that starts at the first timestamp after 00:00 and ends at 00:00 of the following day (inclusive). Right now I have a working solution that uses groupby() with the index grouped by calendar day.
Here is a minimal example in which the maximum value of 24 moves by one hour each day:
import pandas as pd

df = pd.DataFrame(
    {'v': list(range(25)) * 3},
    index=pd.date_range('2020-01-01', freq='1H', periods=25 * 3)
)
df.index.name = 'datetime'
# group by calendar day (00:00 up to, but not including, 00:00 of the next day)
gg = df.groupby(df.index.copy().tz_localize(None).to_period("D"))
m = pd.merge(gg.idxmax(), gg.max(), on="datetime")
m.columns = ["idxmax", "max"]
>>> m
                        idxmax  max
datetime
2020-01-01 2020-01-01 23:00:00   23
2020-01-02 2020-01-02 00:00:00   24
2020-01-03 2020-01-03 01:00:00   24
2020-01-04 2020-01-04 02:00:00   24
What I am looking for is:
                        idxmax  max
datetime
2019-12-31 2020-01-01 00:00:00    0
2020-01-01 2020-01-02 00:00:00   24
2020-01-02 2020-01-03 00:00:00   23
2020-01-03 2020-01-03 01:00:00   24
2020-01-04 2020-01-04 02:00:00   24
How can I produce the desired result?
Solution
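Use pd.Grouper with the offset parameter. Shifting the daily bin edges by one hour makes each group run from 01:00 up to, but not including, 01:00 of the next day, which for hourly data covers the same rows as the interval from just after 00:00 through 00:00 of the next day: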
import pandas as pd

df = pd.DataFrame(
    {'v': list(range(25)) * 3},
    index=pd.date_range('2020-01-01', freq='1H', periods=25 * 3)
)
df.index.name = 'datetime'
# if necessary, drop the timezone
df.index = df.index.tz_localize(None)
# shift the daily bin edges from 00:00 to 01:00
# (pandas 2.2+ deprecates 'H' in favor of lowercase 'h')
gg = df.groupby(pd.Grouper(freq='D', offset='1H'))
m = pd.merge(gg.idxmax(), gg.max(), on="datetime")
m.columns = ["idxmax", "max"]
print(m)
                                 idxmax  max
datetime
2019-12-31 01:00:00 2020-01-01 00:00:00    0
2020-01-01 01:00:00 2020-01-02 00:00:00   24
2020-01-02 01:00:00 2020-01-03 00:00:00   23
2020-01-03 01:00:00 2020-01-03 01:00:00   24
2020-01-04 01:00:00 2020-01-04 02:00:00   24
Answered By - jezrael
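A possible variation, not part of the answer above: the offset approach relies on the data being exactly hourly, and it labels each group with a 01:00 timestamp. A minimal sketch, assuming the goal is bins of the form (00:00, 00:00 of the next day] regardless of sampling rate, is to keep the bin edges at midnight and use the closed and label parameters of pd.Grouper instead. The sample data is the same as in the question; the frequency is spelled 'h', which is the alias recent pandas versions expect.

```python
import pandas as pd

# Same sample data as in the question: values 0..24 repeated three times, hourly.
df = pd.DataFrame(
    {"v": list(range(25)) * 3},
    index=pd.date_range("2020-01-01", freq="h", periods=25 * 3),
)
df.index.name = "datetime"

# closed="right": each daily bin is (00:00, 00:00 of the next day], so a
# midnight timestamp belongs to the day that is ending.
# label="left": each bin is labelled with the date on which it starts.
gg = df.groupby(pd.Grouper(freq="D", closed="right", label="left"))["v"]

# Build the result directly from the two per-group aggregations.
m = pd.DataFrame({"idxmax": gg.idxmax(), "max": gg.max()})
print(m)
```

On the sample data this should give five groups labelled 2019-12-31 through 2020-01-04, with midnight labels rather than 01:00 labels. If a group could be empty (for example, a day with no rows), idxmax may need extra handling.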