Issue
I need to count, for each one-hour slot, how many intervals overlap it over a one-month period. The grouping should be by time of day only, not by date. For example:
Date | Start | End |
---|---|---|
23-02-2023 | 12:10:00 | 12:34:00 |
24-02-2023 | 12:15:00 | 12:45:00 |
These two rows would count 2 for the 12:00:00 to 12:59:59 slot.
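Here is a minimal sketch of the counting rule above. It assumes both times fall on the same day and uses only the standard library rather than pandas. Each interval adds one count to every hour slot from its start hour to its end hour, inclusive, and the date is ignored. The helper name `hours_touched` is hypothetical.

```python
from collections import Counter
from datetime import time


def hours_touched(start: time, end: time) -> range:
    """Hour slots (0-23) touched by a same-day interval, both ends inclusive."""
    return range(start.hour, end.hour + 1)


# The two example rows; the dates are dropped because only the time of day matters.
rows = [
    (time(12, 10), time(12, 34)),  # 23-02-2023
    (time(12, 15), time(12, 45)),  # 24-02-2023
]

counts = Counter(hour for start, end in rows for hour in hours_touched(start, end))
print(counts)  # Counter({12: 2})
```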
My sample data looks like this (I can change the format if needed):
first_appear | last_appear |
---|---|
12:10:00 | 12:31:00 |
12:33:49 | 13:29:12 |
15:30:20 | 18:40:30 |
20:12:20 | 23:10:20 |
23:34:20 | 6:11:00 |
Note that the last entry crosses midnight into the next day. The code:
```python
import pandas as pd
import numpy as np
import staircase as sc

df = pd.read_csv('Overlapping Schedule - Sheet2.csv')
df["first_appear"] = pd.to_timedelta(df["first_appear"].map(str))
df["last_appear"] = pd.to_timedelta(df["last_appear"].map(str))
df["first_appear"] = df["first_appear"].dt.floor("H")
df["last_appear"] = df["last_appear"].dt.ceil("H")
sf = sc.Stairs(df, start="first_appear", end="last_appear")
sample_times = pd.timedelta_range("00:00:00", "24:00:00", freq=pd.Timedelta("1hr"))
sf(sample_times, include_index=True)
```
The output:
```
0 days 00:00:00    0
0 days 01:00:00    0
0 days 02:00:00    0
0 days 03:00:00    0
0 days 04:00:00    0
0 days 05:00:00    0
0 days 06:00:00    0
0 days 07:00:00    0
0 days 08:00:00    0
0 days 09:00:00    0
0 days 10:00:00    0
0 days 11:00:00    0
0 days 12:00:00    2
0 days 13:00:00    1
0 days 14:00:00    0
0 days 15:00:00    1
0 days 16:00:00    1
0 days 17:00:00    1
0 days 18:00:00    1
0 days 19:00:00    0
0 days 20:00:00    1
0 days 21:00:00    1
0 days 22:00:00    1
0 days 23:00:00    1
1 days 00:00:00    0
```
Ideally, I would also like to see the following entries:
```
1 days 01:00:00    1
1 days 02:00:00    1
1 days 03:00:00    1
1 days 04:00:00    1
1 days 05:00:00    1
1 days 06:00:00    1
```
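The next-day rows can only appear if the overnight end time is moved past midnight and the sampled hours extend beyond "24:00:00", where the sample range in the question's code stops. The sketch below mirrors the idea behind the staircase code: floor each start, ceil each end, then evaluate a step function hourly. It implements that idea as a sweep line, adding +1 at each start hour and -1 at each end hour and taking running totals. It is a pandas-free sketch under those assumptions, not the staircase API. `parse_hms` and `hourly_step_counts` are hypothetical helper names.

```python
from datetime import timedelta
from itertools import accumulate

HOUR = timedelta(hours=1)


def parse_hms(text: str) -> timedelta:
    """Parse 'H:MM:SS' or 'HH:MM:SS' into a timedelta since midnight."""
    h, m, s = (int(part) for part in text.split(":"))
    return timedelta(hours=h, minutes=m, seconds=s)


def hourly_step_counts(rows):
    """Active-interval count per hour slot, from +1/-1 events (a sweep line).

    Starts are floored and ends are ceiled to whole hours, mirroring the
    question's floor("H")/ceil("H") step. An end earlier than its start is
    treated as next-day and shifted forward by 24 hours.
    """
    spans = []
    for first, last in rows:
        start, end = parse_hms(first), parse_hms(last)
        if end < start:  # the interval crosses midnight
            end += timedelta(days=1)
        lo = start // HOUR    # floor to a whole hour (int)
        hi = -(-end // HOUR)  # ceil to a whole hour (int)
        spans.append((lo, hi))

    n_hours = max(hi for _, hi in spans)  # sample far enough to cover day 1
    events = [0] * (n_hours + 1)
    for lo, hi in spans:
        events[lo] += 1  # interval becomes active at its start hour
        events[hi] -= 1  # ...and inactive from its ceiled end hour
    return list(accumulate(events[:n_hours]))


rows = [
    ("12:10:00", "12:31:00"),
    ("12:33:49", "13:29:12"),
    ("15:30:20", "18:40:30"),
    ("20:12:20", "23:10:20"),
    ("23:34:20", "6:11:00"),
]

for h, count in enumerate(hourly_step_counts(rows)):
    print(f"{h // 24} days {h % 24:02d}:00:00 {count}")
```

Because `n_hours` comes from the latest ceiled end, the printed range runs through 1 days 06:00:00 instead of stopping at midnight.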
I referred to multiple Stack Overflow answers to come up with this, but now I am stuck. @riley's answer to "Group and count by time interval - Python" helped me get started.
Solution
Try:
```python
import pandas as pd

# df is the DataFrame read from the CSV in the question.


def get_hours(row):
    out = []
    if row["last_appear"] < row["first_appear"]:
        # Overnight interval: hours from the start hour up to 23:00 on day 0.
        # Stopping at 23:00 (not 24:00:00) keeps the midnight slot from being
        # counted twice, since the next-day range below already includes it.
        out.extend(
            pd.timedelta_range(
                row["first_appear"].floor("1h"),
                "23:00:00",
                freq="1H",
            )
        )
        # Then 00:00 up to the end hour, shifted to day 1.
        out.extend(
            pd.timedelta_range(
                "00:00:00",
                row["last_appear"].floor("1h"),
                freq="1H",
            )
            + pd.Timedelta("1 day")
        )
    else:
        # Same-day interval: every hour from the start hour to the end hour.
        out.extend(
            pd.timedelta_range(
                row["first_appear"].floor("1h"),
                row["last_appear"].floor("1h"),
                freq="1H",
            )
        )
    return out


df["first_appear"] = pd.to_timedelta(df["first_appear"])
df["last_appear"] = pd.to_timedelta(df["last_appear"])

df = (
    df.assign(hours=df.apply(get_hours, axis=1))
    .explode("hours")
    .groupby("hours")["hours"]
    .count()
)
df = df.reindex(pd.timedelta_range("00:00:00", df.index.max(), freq="1H"), fill_value=0)
print(df)
```
Prints:
```
0 days 00:00:00    0
0 days 01:00:00    0
0 days 02:00:00    0
0 days 03:00:00    0
0 days 04:00:00    0
0 days 05:00:00    0
0 days 06:00:00    0
0 days 07:00:00    0
0 days 08:00:00    0
0 days 09:00:00    0
0 days 10:00:00    0
0 days 11:00:00    0
0 days 12:00:00    2
0 days 13:00:00    1
0 days 14:00:00    0
0 days 15:00:00    1
0 days 16:00:00    1
0 days 17:00:00    1
0 days 18:00:00    1
0 days 19:00:00    0
0 days 20:00:00    1
0 days 21:00:00    1
0 days 22:00:00    1
0 days 23:00:00    2
1 days 00:00:00    1
1 days 01:00:00    1
1 days 02:00:00    1
1 days 03:00:00    1
1 days 04:00:00    1
1 days 05:00:00    1
1 days 06:00:00    1
Freq: H, Name: hours, dtype: int64
```
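The question asks to group by time of day only, not by date. One possible follow-up is to fold the next-day slots back onto 00:00 through 23:00 with `% 24`, so a month of rows always yields exactly 24 buckets. This is a sketch and not part of the answer above. It uses the standard library only and follows the answer's convention of counting every hour from the start hour to the end hour, inclusive. `parse_hms` and `hours_of_day` are hypothetical helper names.

```python
from collections import Counter
from datetime import timedelta

HOUR = timedelta(hours=1)


def parse_hms(text: str) -> timedelta:
    """Parse 'H:MM:SS' or 'HH:MM:SS' into a timedelta since midnight."""
    h, m, s = (int(part) for part in text.split(":"))
    return timedelta(hours=h, minutes=m, seconds=s)


def hours_of_day(first: str, last: str) -> list[int]:
    """Hour-of-day slots (0-23) touched by an interval, wrapping past midnight."""
    start, end = parse_hms(first), parse_hms(last)
    if end < start:  # crosses midnight: the end belongs to the next day
        end += timedelta(days=1)
    # Hours from floor(start) to floor(end) inclusive, folded onto a single day;
    # the set keeps each slot at most once per interval.
    return sorted({h % 24 for h in range(start // HOUR, end // HOUR + 1)})


rows = [
    ("12:10:00", "12:31:00"),
    ("12:33:49", "13:29:12"),
    ("15:30:20", "18:40:30"),
    ("20:12:20", "23:10:20"),
    ("23:34:20", "6:11:00"),
]

counts = Counter(h for first, last in rows for h in hours_of_day(first, last))
for h in range(24):
    print(f"{h:02d}:00:00 {counts[h]}")
```

Rows from different dates can simply be appended to `rows`, because the date never enters the calculation.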
Answered By - Andrej Kesely