Issue
I have two Pandas dataframes with "datetime" column.
For the df
dataframe I would like to add a column with the seconds-difference with the nearest dflogdf["datetime"]
BEFORE the df["datetime"]
. This means I cannot use merge_asof direction="nearest"
.
For example
df["datetime"]
:
2023-11-15T18:00:00
2023-11-20T19:00:00
2023-11-20T20:00:00
2023-11-20T21:00:00
dflogs["datetime"]:
2023-11-17T18:00:00
2023-11-20T20:00:00
Expected output:
2023-11-15T18:00:00 None (Nothing before)
2023-11-20T19:00:00 262800 (2023-11-17T18:00:00)
2023-11-20T20:00:00 0 (2023-11-20T20:00:00)
2023-11-20T21:00:00 3600 (2023-11-20T20:00:00)
I was thinking around a function like this (not working correctly):
def check_time_diff(item):
item["timediff"] = (item["datetime"] - dflogs['datetime']).min() / pd.Timedelta(seconds=1)
return item
df = df.apply(check_time_diff, axis=1)
Solution
This is a case for merge_asof
with direction backward
out = pd.merge_asof(df[['datetime']],
dflogs[['datetime']].assign(logtime=dflogs['datetime']),
on='datetime', direction='backward')
out['diff'] = out['datetime'].sub(out['logtime']).dt.total_seconds()
Output:
datetime logtime diff
0 2023-11-15 18:00:00 NaT NaN
1 2023-11-20 19:00:00 2023-11-17 18:00:00 262800.0
2 2023-11-20 20:00:00 2023-11-20 20:00:00 0.0
3 2023-11-20 21:00:00 2023-11-20 20:00:00 3600.0
Answered By - Quang Hoang
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.