Thursday, January 11, 2024

[FIXED] How to get common timeintervals in pandas

January 11, 2024 dataframe, pandas, python-3.x No comments

Issue

I am using pandas version 1.0.5

import pandas as pd
dat1 = [
    ['2023-12-27','2023-12-27 00:00:00','2023-12-27 02:14:00'],
    ['2023-12-27','2023-12-27 03:16:00','2023-12-27 04:19:00'],
    ['2023-12-27','2023-12-27 18:11:00','2023-12-27 20:13:00'],
    ['2023-12-28','2023-12-28 01:16:00','2023-12-28 02:14:00'],
    ['2023-12-28','2023-12-28 02:16:00','2023-12-28 02:28:00'],
    ['2023-12-28','2023-12-28 02:30:00','2023-12-28 02:56:00'],
['2023-12-28','2023-12-28 18:45:00','2023-12-28 19:00:00'],
    ['2023-12-29','2023-12-29 01:16:00','2023-12-29 02:13:00'],
['2023-12-29','2023-12-29 04:16:00','2023-12-29 05:09:00'],
    ['2023-12-29','2023-12-29 05:11:00','2023-12-29 05:14:00'],
['2023-12-29','2023-12-29 18:00:00','2023-12-29 19:00:00']
]
df = pd.DataFrame(dat1,columns = ['date','Start_tmp','End_tmp'])
df["Start_tmp"] = pd.to_datetime(df["Start_tmp"])
df["End_tmp"] = pd.to_datetime(df["End_tmp"])

My dataframe looks like this:

I need to find common or overlapping interval between the timestamps.

For example, one of the overlapping time across all the three dates (yellow highlighted) is 1:16 - 2:13. The other (blue highlighted) would be 18:45 - 19:00

So my expected output would be like: [57,15]

57 - Minutes between 1:16 - 2:13.

15 - Minutes between 18:45 - 19:00

Any clue how this output can be achieved. Thanks.

Solution

This solution uses:

numpy, no uncommon Python modules, so using Python 1.0.5 you should, hopefully, be in the clear,
no nested loops to care for speed issues with growing dataset,

Method:

Draw the landscape of overlaps
Then select the overlaps corresponding to the number of documented days,
Finally describe the overlaps in terms of their lengths

Number of documented days: (as in Python: Convert timedelta to int in a dataframe)

n = 1 + ( max(df['End_tmp']) - min(df['Start_tmp']) ).days
n
3

Additive landscape:

# initial flat whole-day landcape (height: 0)
L = np.zeros(24*60, dtype='int')
# add up ranges: (reused @sammywemmy's perfect formula for time of day in minutes)
for start, end in zip(df['Start_tmp'].dt.hour.mul(60) + df['Start_tmp'].dt.minute,  # Start_tmp timestamps expressed in minutes
                      df['End_tmp'].dt.hour.mul(60)   + df['End_tmp'].dt.minute):   # End_tmp timestamps expressed in minutes
    L[start:end+1] += 1

plt.plot(L)
plt.hlines(y=[2,3],xmin=0,xmax=1400,colors=['green','red'], linestyles='dashed')
plt.xlabel('time of day (minutes)')
plt.ylabel('time range overlaps')

(Please excuse the typo: these are obviously minutes, not seconds)

Keep only overlaps over all days:

# Reduce heights <n to 0 because not overlaping every day
L[L<n]=0
# Simplify all greater values to 1 because only their presence matters
L[L>0]=1
# Now only overlaps are highlighted

Extract overlap ranges and their lengths

# Highlight edges of range overlaps
D = np.diff(L)
# Describe overlaps as ranges
R = list(zip([a[0]   for a in np.argwhere(D>0)],  # indices where overlaps *begin*, with scalar indices instead of arrays
             [a[0]-1 for a in np.argwhere(D<0)])) # indices where overlaps *end*, with scalar indices instead of arrays
R
[(75, 132), (1124, 1139)]

# Finally their lengths
[b-a for a,b in R]

Final output: [57, 15]

Answered By - OCa

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Thursday, January 11, 2024

[FIXED] How to get common timeintervals in pandas

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels