Issue
I would like to assign new IDs to my groups (the id column) based on the grouped delta, which indicates how long a group (id) did not appear in the data frame. Whenever the delta is more than 5 seconds, I would like to assign a new ID to that group, as shown in the "expected_id" column; each new ID should be unique.
Any advice on how to do this?
Here is my data (df):
id datetime delta expected_id
1 2023-12-04 15:26:47.059800 1.439766 1
1 2023-12-04 15:26:48.493609 1.433809 1
1 2023-12-04 15:26:49.890176 1.396567 1
1 2023-12-04 15:26:51.274575 1.384399 1
2 2023-12-04 15:26:51.274575 169.15666 2
1 2023-12-04 15:26:52.715784 1.441209 1
1 2023-12-04 15:26:54.288126 1.572342 1
2 2023-12-04 15:26:54.288126 3.013551 2
1 2023-12-04 15:26:55.802795 1.514669 1
2 2023-12-04 15:26:55.802795 1.514669 2
1 2023-12-04 15:26:57.189969 1.387174 1
2 2023-12-04 15:26:57.189969 1.387174 2
1 2023-12-04 15:26:58.552681 1.362712 1
1 2023-12-04 15:27:00.005263 1.452582 1
2 2023-12-04 15:27:00.005263 2.815294 2
3 2023-12-04 15:27:00.005263 nan 3
1 2023-12-04 15:27:01.378002 1.372739 1
2 2023-12-04 15:27:01.378002 1.372739 2
1 2023-12-04 15:27:02.758197 1.380195 1
1 2023-12-04 15:27:04.165050 1.406853 1
1 2023-12-04 15:27:05.562082 1.397032 1
1 2023-12-04 15:27:06.953888 1.391806 1
1 2023-12-04 15:27:08.365109 1.411221 1
1 2023-12-04 15:27:09.737913 1.372804 1
1 2023-12-04 15:27:11.295586 1.557673 1
2 2023-12-04 15:27:11.295586 9.917584 4
1 2023-12-04 15:27:12.817794 1.522208 1
1 2023-12-04 15:27:14.337981 1.520187 1
1 2023-12-04 15:27:15.811874 1.473893 1
1 2023-12-04 15:27:17.318122 1.506248 1
2 2023-12-04 15:27:17.318122 6.022536 5
3 2023-12-04 15:27:17.318122 17.312859 6
4 2023-12-04 15:27:17.318122 nan 7
5 2023-12-04 15:27:17.318122 nan 8
6 2023-12-04 15:27:17.318122 nan 9
1 2023-12-04 15:27:18.808131 1.490009 1
2 2023-12-04 15:27:18.808131 1.490009 5
3 2023-12-04 15:27:18.808131 1.490009 6
4 2023-12-04 15:27:18.808131 1.490009 7
5 2023-12-04 15:27:18.808131 1.490009 8
6 2023-12-04 15:27:18.808131 1.490009 9
1 2023-12-04 15:27:20.189021 1.38089 1
2 2023-12-04 15:27:20.189021 1.38089 5
3 2023-12-04 15:27:20.189021 1.38089 6
4 2023-12-04 15:27:20.189021 1.38089 7
5 2023-12-04 15:27:20.189021 1.38089 8
6 2023-12-04 15:27:20.189021 1.38089 9
1 2023-12-04 15:27:21.618110 1.429089 1
1 2023-12-04 15:27:23.099324 1.481214 1
2 2023-12-04 15:27:23.099324 2.910303 5
I compute the delta column like this:
df['delta'] = df.groupby("id")['datetime'].diff() / np.timedelta64(1, 's')
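Equivalently, the timedelta produced by diff() can be converted to seconds with the .dt accessor (an equivalent one-liner, not required for the solutions below):
df['delta'] = df.groupby("id")['datetime'].diff().dt.total_seconds()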
How can I create my "expected_id" column?
Solution
If I understand you right, you just need to convert the id into a new expected id like this (the output below comes from an earlier revision of the sample data):
df["new_expected_id"] = (df["id"] != df["id"].shift()).cumsum() - 1
print(df)
Prints:
id datetime delta expected_id new_expected_id
0 0 2023-12-04 10:51:30.158743 NaN 0 0
1 1 2023-12-04 10:51:31.734037 NaN 1 1
2 1 2023-12-04 10:51:33.219067 1.48 1 1
3 1 2023-12-04 10:51:34.469723 1.25 1 1
4 0 2023-12-04 10:51:35.862997 5.70 2 2
5 0 2023-12-04 10:51:37.280209 1.41 2 2
6 0 2023-12-04 10:51:38.741301 1.46 2 2
7 0 2023-12-04 10:51:40.239296 1.49 2 2
8 1 2023-12-04 10:51:41.590683 7.12 3 3
9 1 2023-12-04 10:51:43.060751 1.47 3 3
10 1 2023-12-04 10:51:44.566724 1.50 3 3
11 1 2023-12-04 10:51:46.066713 1.49 3 3
12 0 2023-12-04 10:51:47.493897 7.25 4 4
13 0 2023-12-04 10:51:48.994885 1.50 4 4
14 0 2023-12-04 10:51:50.557707 1.56 4 4
15 0 2023-12-04 10:51:52.116537 1.55 4 4
16 0 2023-12-04 10:51:53.642456 1.52 4 4
17 1 2023-12-04 10:51:55.115518 9.04 5 5
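To see why this works: comparing each id with the previous row marks the positions where a new run of identical ids starts, and the cumulative sum turns those markers into consecutive run labels. A minimal, self-contained illustration (hypothetical values):
import pandas as pd

s = pd.Series([0, 1, 1, 0, 0, 1])
print((s != s.shift()).cumsum() - 1)
# -> 0, 1, 1, 2, 2, 3: each consecutive run of equal values gets its own label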
EDIT 2: With the new input, where the same id can disappear and come back, detect the gaps per group and draw fresh ids from a shared counter:
from itertools import count
import numpy as np

def fn(g, cnt):
    x = next(cnt)  # base id for this group, drawn from the shared counter
    # rows where this id was last seen 5 seconds or more ago
    # (pandas coerces the string "5s" to a Timedelta for the comparison)
    mask = g["datetime"].diff() >= "5s"
    # each such row starts a new segment with a fresh, globally unique id
    g.loc[mask, "expected_id"] = np.array([next(cnt) for _ in range(mask.sum())])
    g["expected_id"] = g["expected_id"].ffill()  # carry each segment's id forward
    g["expected_id"] = g["expected_id"].fillna(x).astype(int)  # leading rows get the base id
    return g

c = count(1)
# sort=False visits the groups in order of first appearance
out = df.groupby("id", group_keys=False, sort=False).apply(fn, cnt=c)
print(out)
Prints:
id datetime delta expected_id
0 1 2023-12-04 15:26:47.059800 1.439766 1
1 1 2023-12-04 15:26:48.493609 1.433809 1
2 1 2023-12-04 15:26:49.890176 1.396567 1
3 1 2023-12-04 15:26:51.274575 1.384399 1
4 2 2023-12-04 15:26:51.274575 169.156660 2
5 1 2023-12-04 15:26:52.715784 1.441209 1
6 1 2023-12-04 15:26:54.288126 1.572342 1
7 2 2023-12-04 15:26:54.288126 3.013551 2
8 1 2023-12-04 15:26:55.802795 1.514669 1
9 2 2023-12-04 15:26:55.802795 1.514669 2
10 1 2023-12-04 15:26:57.189969 1.387174 1
11 2 2023-12-04 15:26:57.189969 1.387174 2
12 1 2023-12-04 15:26:58.552681 1.362712 1
13 1 2023-12-04 15:27:00.005263 1.452582 1
14 2 2023-12-04 15:27:00.005263 2.815294 2
15 3 2023-12-04 15:27:00.005263 NaN 3
16 1 2023-12-04 15:27:01.378002 1.372739 1
17 2 2023-12-04 15:27:01.378002 1.372739 2
18 1 2023-12-04 15:27:02.758197 1.380195 1
19 1 2023-12-04 15:27:04.165050 1.406853 1
20 1 2023-12-04 15:27:05.562082 1.397032 1
21 1 2023-12-04 15:27:06.953888 1.391806 1
22 1 2023-12-04 15:27:08.365109 1.411221 1
23 1 2023-12-04 15:27:09.737913 1.372804 1
24 1 2023-12-04 15:27:11.295586 1.557673 1
25 2 2023-12-04 15:27:11.295586 9.917584 3
26 1 2023-12-04 15:27:12.817794 1.522208 1
27 1 2023-12-04 15:27:14.337981 1.520187 1
28 1 2023-12-04 15:27:15.811874 1.473893 1
29 1 2023-12-04 15:27:17.318122 1.506248 1
30 2 2023-12-04 15:27:17.318122 6.022536 4
31 3 2023-12-04 15:27:17.318122 17.312859 6
32 4 2023-12-04 15:27:17.318122 NaN 7
33 5 2023-12-04 15:27:17.318122 NaN 8
34 6 2023-12-04 15:27:17.318122 NaN 9
35 1 2023-12-04 15:27:18.808131 1.490009 1
36 2 2023-12-04 15:27:18.808131 1.490009 5
37 3 2023-12-04 15:27:18.808131 1.490009 6
38 4 2023-12-04 15:27:18.808131 1.490009 7
39 5 2023-12-04 15:27:18.808131 1.490009 8
40 6 2023-12-04 15:27:18.808131 1.490009 9
41 1 2023-12-04 15:27:20.189021 1.380890 1
42 2 2023-12-04 15:27:20.189021 1.380890 5
43 3 2023-12-04 15:27:20.189021 1.380890 6
44 4 2023-12-04 15:27:20.189021 1.380890 7
45 5 2023-12-04 15:27:20.189021 1.380890 8
46 6 2023-12-04 15:27:20.189021 1.380890 9
47 1 2023-12-04 15:27:21.618110 1.429089 1
48 1 2023-12-04 15:27:23.099324 1.481214 1
49 2 2023-12-04 15:27:23.099324 2.910303 5
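For larger frames, a vectorized sketch of the same idea, avoiding the Python-level apply (a sketch, assuming standard pandas GroupBy.ngroup; the numbering follows the order in which each (id, segment) pair first appears, so it can differ from the counter-based version, though for the sample above it appears to match the expected_id column):
import pandas as pd

# flag rows where the same id was last seen 5 seconds or more ago
gap = df.groupby("id")["datetime"].diff().ge(pd.Timedelta("5s"))
# running count of gaps per id = segment number within each id
seg = gap.groupby(df["id"]).cumsum()
# number every (id, segment) pair uniquely, in order of first appearance
df["expected_id"] = df.groupby([df["id"], seg], sort=False).ngroup() + 1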
Answered By - Andrej Kesely