Issue
I have a dataframe with a DatetimeIndex in the format shown in the example below. The raw data is supposed to contain one record per hour for a year (24 records per day), but some hours and days are missing from the data.
How can I get a list of all the missing hourly timestamps?
Example: if hour 01 is missing, how can I find and print 2012-10-02 01:00:00?
I'm currently able to get the missing days, but not the missing hours:
missing_day = pd.date_range(start = mdf.index[0], end = mdf.index[-1]).difference(mdf.index)
missing_day = missing_day.strftime('%Y%m%d')
missing = pd.Series(missing_day).array
for i in missing:
    print(i)
    for x in range(24):
        x = str(x)
        m = i + x
        m = datetime.strptime(m,'%Y%m%d%H')
        print(m)
Output: this prints all 24 hours for each missing day.
What would be the best way to list all of the missing datetimes?
Solution
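Use a set operation to find the missing index values: build the full hourly range between the first and last timestamps, then take its difference with the existing index: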
out = pd.date_range(df.index.min(), df.index.max(), freq='H').difference(df.index)
print(out)
# Output
DatetimeIndex(['2022-01-01 06:00:00', '2022-01-01 12:00:00',
               '2022-01-01 14:00:00', '2022-01-01 16:00:00'],
              dtype='datetime64[ns]', freq=None)
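To tie this back to the question, here is a minimal self-contained sketch that prints each missing timestamp in the 2012-10-02 01:00:00 style. The name mdf follows the question; the sample dates and the two dropped hours are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: 48 hourly records starting 2012-10-01 00:00.
# Lowercase "h" is the current hourly alias ("H" is deprecated in recent pandas).
full = pd.date_range("2012-10-01", periods=48, freq="h")

# Drop two hours on purpose to simulate gaps in the recording
dropped = pd.DatetimeIndex(["2012-10-01 05:00:00", "2012-10-02 01:00:00"])
mdf = pd.DataFrame({"A": 0}, index=full.difference(dropped))

# Expected hourly range between the first and last record, minus what is present
expected = pd.date_range(mdf.index.min(), mdf.index.max(), freq="h")
missing = expected.difference(mdf.index)

# Format each missing timestamp as a string and print it
missing_strings = missing.strftime("%Y-%m-%d %H:%M:%S").tolist()
for ts in missing_strings:
    print(ts)
```

Note that the range here is derived from the first and last timestamps actually present, so hours missing at the very start or end of the year would not be reported. If that matters, passing explicit start and end values to pd.date_range should cover it.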
Setup:
df = pd.DataFrame({'A':[0]}, index=pd.date_range('2022-01-01', freq='H', periods=24))
df = df.sample(n=20).sort_index()
print(df)
# Output
                    A
2022-01-01 00:00:00 0
2022-01-01 01:00:00 0
2022-01-01 02:00:00 0
2022-01-01 03:00:00 0
2022-01-01 04:00:00 0
2022-01-01 05:00:00 0
2022-01-01 07:00:00 0
2022-01-01 08:00:00 0
2022-01-01 09:00:00 0
2022-01-01 10:00:00 0
2022-01-01 11:00:00 0
2022-01-01 13:00:00 0
2022-01-01 15:00:00 0
2022-01-01 17:00:00 0
2022-01-01 18:00:00 0
2022-01-01 19:00:00 0
2022-01-01 20:00:00 0
2022-01-01 21:00:00 0
2022-01-01 22:00:00 0
2022-01-01 23:00:00 0
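Because sample is random, the setup above produces different gaps on each run. The sketch below is a deterministic variant that drops the same four hours as in the output shown, and cross-checks the result with an alternative based on asfreq. The asfreq check assumes column A has no NaN values of its own, since it identifies gaps by the NaN rows that reindexing introduces.

```python
import pandas as pd

# Full day of hourly timestamps; lowercase "h" is the current hourly alias
idx = pd.date_range("2022-01-01", periods=24, freq="h")

# Drop fixed positions instead of sampling, so the gaps are reproducible
df = pd.DataFrame({"A": 0}, index=idx).drop(idx[[6, 12, 14, 16]])

# Approach from the answer: full hourly range minus the existing index
out = pd.date_range(df.index.min(), df.index.max(), freq="h").difference(df.index)

# Alternative: asfreq reindexes to a regular hourly grid, leaving NaN in the new rows
filled = df.asfreq("h")
gaps = filled[filled["A"].isna()].index

print(out.equals(gaps))    # whether both approaches agree
print(gaps.hour.tolist())  # hours of the day that are missing
```

The difference approach is usually the safer of the two, because it depends only on the index and not on the values stored in the columns.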
Answered By - Corralien