Issue
I have dataframe like this:
data = {'event': [1, 1, 3, 2, 3, 1, 2, 3, 4, 5], 'team': ['A', 'A', 'B', 'A', 'B', 'C', 'C', 'C', 'D', 'D'], 'timeStamp': ['2023-07-23 14:57:13.357', '2023-07-23 14:57:14.357', '2023-07-23 14:57:15.357', '2023-07-23 14:57:16.357', '2023-07-23 14:57:20.357', '2023-07-23 14:57:13.357', '2023-07-23 14:57:18.357', '2023-07-23 14:57:23.357', '2023-07-23 14:57:23.357', '2023-07-23 14:57:25.357']}
I need to identify is there row with event "2" in 7 seconds after each row for each team.
Output should be like this
event | team | timeStamp | is_2_in_7_sec |
---|---|---|---|
1 | A | 2023-07-23 14:57:13.357 | TRUE |
1 | A | 2023-07-23 14:57:14.357 | TRUE |
3 | B | 2023-07-23 14:57:15.357 | FALSE |
2 | A | 2023-07-23 14:57:16.357 | FALSE |
3 | B | 2023-07-23 14:57:20.357 | FALSE |
1 | C | 2023-07-23 14:57:13.357 | TRUE |
2 | C | 2023-07-23 14:57:18.357 | FALSE |
3 | C | 2023-07-23 14:57:23.357 | FALSE |
4 | D | 2023-07-23 14:57:23.357 | FALSE |
5 | D | 2023-07-23 14:57:25.357 | FALSE |
In R it's quite easy, but in Python I'm stuck at this task.
Solution
General solution with merge_asof
- if need exclude row with event=2
:
df = pd.DataFrame(data)
df['timeStamp'] = pd.to_datetime(df['timeStamp'])
df['is_2_in_7_sec'] = (pd.merge_asof(df.sort_values(by='timeStamp').reset_index(),
df[df['event'].eq(2)].sort_values(by='timeStamp'),
by='team',
on='timeStamp',
tolerance=pd.Timedelta('7s'),
allow_exact_matches=False,
direction='forward')
.set_index('index')['event_y'].notna())
print (df)
event team timeStamp is_2_in_7_sec
0 1 A 2023-07-23 14:57:13.357 True
1 1 A 2023-07-23 14:57:14.357 True
2 3 B 2023-07-23 14:57:15.357 False
3 2 A 2023-07-23 14:57:16.357 False
4 3 B 2023-07-23 14:57:20.357 False
5 1 C 2023-07-23 14:57:13.357 True
6 2 C 2023-07-23 14:57:18.357 False
7 3 C 2023-07-23 14:57:23.357 False
8 4 D 2023-07-23 14:57:23.357 False
9 5 D 2023-07-23 14:57:25.357 False
If need include row with event=2
remove allow_exact_matches=False
:
df = pd.DataFrame(data)
df['timeStamp'] = pd.to_datetime(df['timeStamp'])
df['is_2_in_7_sec'] = (pd.merge_asof(df.sort_values(by='timeStamp').reset_index(),
df[df['event'].eq(2)].sort_values(by='timeStamp'),
by='team',
on='timeStamp',
tolerance=pd.Timedelta('7s'),
direction='forward')
.set_index('index')['event_y'].notna())
print (df)
event team timeStamp is_2_in_7_sec
0 1 A 2023-07-23 14:57:13.357 True
1 1 A 2023-07-23 14:57:14.357 True
2 3 B 2023-07-23 14:57:15.357 False
3 2 A 2023-07-23 14:57:16.357 True
4 3 B 2023-07-23 14:57:20.357 False
5 1 C 2023-07-23 14:57:13.357 True
6 2 C 2023-07-23 14:57:18.357 True
7 3 C 2023-07-23 14:57:23.357 False
8 4 D 2023-07-23 14:57:23.357 False
9 5 D 2023-07-23 14:57:25.357 False
Answered By - jezrael
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.