Issue
I have a pandas dataframe like so:
data = {'ID': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
'column_1': [0, 0, 0, 0, 0.1, 1, 1.5, 2, 3, 4, 4.5, 5, 4.9, 3, 2, 1.8, 1, 0, 0, 1, 3, 0, 1.3, 2, 3, 4.3, 4.8, 5, 4.2, 3.5, 3, 2.6, 2, 1.9, 1, 0, 0, 0, 0, 0, 0.1, 0.2, 0.3, 1, 2, 3, 5, 4, 2, 0.5, 0, 0],
'column_2': [13, 25, 96, 59, 5, 92, 82, 141, 50, 85, 84, 113, 119, 128, 8, 133, 82, 10, 15, 62, 11, 68, 18, 24, 37, 55, 83, 48, 13, 81, 43, 36, 56, 43, 36, 46, 45, 127, 55, 67, 113, 98, 78, 78, 57, 131, 121, 126, 142, 51, 64, 95]}
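For reference, the frame can be built from this dict (a minimal setup sketch, assuming pandas is imported as pd):
import pandas as pd

df = pd.DataFrame(data)
print(df)
which prints as: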
ID column_1 column_2
0 A 0.0 13
1 A 0.0 25
2 A 0.0 96
3 A 0.0 59
4 A 0.1 5
5 A 1.0 92
6 A 1.5 82
7 A 2.0 141
8 A 3.0 50
9 A 4.0 85
10 A 4.5 84
11 A 5.0 113
12 A 4.9 119
13 A 3.0 128
14 A 2.0 8
15 A 1.8 133
16 A 1.0 82
17 A 0.0 10
18 A 0.0 15
19 A 1.0 62
20 A 3.0 11
21 A 0.0 68
22 A 1.3 18
23 A 2.0 24
24 A 3.0 37
25 A 4.3 55
26 A 4.8 83
27 A 5.0 48
28 A 4.2 13
29 A 3.5 81
30 A 3.0 43
31 A 2.6 36
32 A 2.0 56
33 A 1.9 43
34 A 1.0 36
35 A 0.0 46
36 A 0.0 45
37 A 0.0 127
38 A 0.0 55
39 A 0.0 67
40 A 0.1 113
41 A 0.2 98
42 B 0.3 78
43 B 1.0 78
44 B 2.0 57
45 B 3.0 131
46 B 5.0 121
47 B 4.0 126
48 B 2.0 142
49 B 0.5 51
50 B 0.0 64
51 B 0.0 95
Tracing back from when the value in column_1 hits 5, I want to find the value in column_2 just before the value in column_1 increased from 0, and just after it came back down to 0. So, in the data frame above, the column_2 values would be 5 and 10 for the first peak, and 18 and 46 for the second. I want to perform some arithmetic and would like to add two columns, Before and After, with those values, grouped by the ID column.
The expected output would be:
ID column_1 column_2 Before After
0 A 0.0 13 0 0
1 A 0.0 25 0 0
2 A 0.0 96 0 0
3 A 0.0 59 0 0
4 A 0.1 5 0 0
5 A 1.0 92 0 0
6 A 1.5 82 0 0
7 A 2.0 141 0 0
8 A 3.0 50 0 0
9 A 4.0 85 0 0
10 A 4.5 84 0 0
11 A 5.0 113 5 10
12 A 4.9 119 0 0
13 A 3.0 128 0 0
14 A 2.0 8 0 0
15 A 1.8 133 0 0
16 A 1.0 82 0 0
17 A 0.0 10 0 0
18 A 0.0 15 0 0
19 A 1.0 62 0 0
20 A 3.0 11 0 0
21 A 0.0 68 0 0
22 A 1.3 18 0 0
23 A 2.0 24 0 0
24 A 3.0 37 0 0
25 A 4.3 55 0 0
26 A 4.8 83 0 0
27 A 5.0 48 18 46
28 A 4.2 13 0 0
29 A 3.5 81 0 0
30 A 3.0 43 0 0
31 A 2.6 36 0 0
32 A 2.0 56 0 0
33 A 1.9 43 0 0
34 A 1.0 36 0 0
35 A 0.0 46 0 0
36 A 0.0 45 0 0
37 A 0.0 127 0 0
38 A 0.0 55 0 0
39 A 0.0 67 0 0
40 A 0.1 113 0 0
41 A 0.2 98 0 0
42 B 0.3 78 0 0
43 B 1.0 78 0 0
44 B 2.0 57 0 0
45 B 3.0 131 0 0
46 B 5.0 121 78 64
47 B 4.0 126 0 0
48 B 2.0 142 0 0
49 B 0.5 51 0 0
50 B 0.0 64 0 0
51 B 0.0 95 0 0
For a given ID, if column_1 starts with a non-zero value, it should give the first value of column_2 for that group. The rest of the rows in Before and After can be filled with null or zeroes.
Is there an elegant way to achieve this?
Solution
With where and shift you can generate series in which only the endpoints of the sub-sequences are filled and the rest is padded with NaN. Combined with ffill (or bfill, respectively), locating the rows where column_1 equals 5 with loc gives pretty much the desired result:
col1 = df.column_1
col2 = df.column_2
# Before: column_2 at the row where column_1 last left 0, forward-filled up to the peak row
df.loc[col1.eq(5), "Before"] = col2.where(col1.gt(0) & col1.shift().eq(0)).ffill()
# After: column_2 at the row where column_1 next returns to 0, back-filled to the peak row
df.loc[col1.eq(5), "After"] = col2.where(col1.eq(0) & col1.shift().gt(0)).bfill()
Result:
ID column_1 column_2 Before After
0 A 0.0 13 NaN NaN
1 A 0.0 25 NaN NaN
2 A 0.0 96 NaN NaN
3 A 0.0 59 NaN NaN
4 A 0.1 5 NaN NaN
5 A 1.0 92 NaN NaN
6 A 1.5 82 NaN NaN
7 A 2.0 141 NaN NaN
8 A 3.0 50 NaN NaN
9 A 4.0 85 NaN NaN
10 A 4.5 84 NaN NaN
11 A 5.0 113 5.0 10.0
12 A 4.9 119 NaN NaN
13 A 3.0 128 NaN NaN
14 A 2.0 8 NaN NaN
15 A 1.8 133 NaN NaN
16 A 1.0 82 NaN NaN
17 A 0.0 10 NaN NaN
18 A 0.0 15 NaN NaN
19 A 1.0 62 NaN NaN
20 A 3.0 11 NaN NaN
21 A 0.0 68 NaN NaN
22 A 1.3 18 NaN NaN
23 A 2.0 24 NaN NaN
24 A 3.0 37 NaN NaN
25 A 4.3 55 NaN NaN
26 A 4.8 83 NaN NaN
27 A 5.0 48 18.0 46.0
28 A 4.2 13 NaN NaN
29 A 3.5 81 NaN NaN
30 A 3.0 43 NaN NaN
31 A 2.6 36 NaN NaN
32 A 2.0 56 NaN NaN
33 A 1.9 43 NaN NaN
34 A 1.0 36 NaN NaN
35 A 0.0 46 NaN NaN
36 A 0.0 45 NaN NaN
37 A 0.0 127 NaN NaN
38 A 0.0 55 NaN NaN
39 A 0.0 67 NaN NaN
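To see what the where/shift step produces before any filling, here is a small sketch (assuming df is the full frame built from the dict above); only the endpoint rows remain non-NaN:
col1 = df.column_1
col2 = df.column_2
# column_2 at the rows where column_1 leaves 0 (rows 4, 19, 22 and 40 in the data above)
starts = col2.where(col1.gt(0) & col1.shift().eq(0))
# column_2 at the rows where column_1 returns to 0 (rows 17, 21, 35 and 50)
ends = col2.where(col1.eq(0) & col1.shift().gt(0))
print(starts.dropna())
print(ends.dropna())
Forward-filling starts and back-filling ends then carries those endpoint values onto every row in between, which is why selecting only the rows where column_1 equals 5 picks up the correct pair.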
With groupby()
Since ffill and bfill on the whole frame do not respect group boundaries, wrap the logic in a function and apply it per ID. Shifting with fill_value=0 also handles a group that starts with a non-zero value (like B), and initializing the Before and After columns with 0 fills the remaining rows with zeroes, as requested:
def set_flags(frame):
    col1 = frame.column_1
    col2 = frame.column_2
    m = col1.eq(5)
    # fill_value=0 lets a group that starts with a non-zero value count its first row
    # as a start, so its first column_2 value becomes the Before value
    frame.loc[m, "Before"] = col2.where(col1.gt(0) & col1.shift(fill_value=0).eq(0)).ffill()
    frame.loc[m, "After"] = col2.where(col1.eq(0) & col1.shift().gt(0)).bfill()
    return frame

df[["Before", "After"]] = 0  # initializing the columns with 0 keeps the int dtype
df = df.groupby("ID").apply(set_flags)
Result with the full data, grouped by ID:
ID column_1 column_2 Before After
0 A 0.0 13 0 0
1 A 0.0 25 0 0
2 A 0.0 96 0 0
3 A 0.0 59 0 0
4 A 0.1 5 0 0
5 A 1.0 92 0 0
6 A 1.5 82 0 0
7 A 2.0 141 0 0
8 A 3.0 50 0 0
9 A 4.0 85 0 0
10 A 4.5 84 0 0
11 A 5.0 113 5 10
12 A 4.9 119 0 0
13 A 3.0 128 0 0
14 A 2.0 8 0 0
15 A 1.8 133 0 0
16 A 1.0 82 0 0
17 A 0.0 10 0 0
18 A 0.0 15 0 0
19 A 1.0 62 0 0
20 A 3.0 11 0 0
21 A 0.0 68 0 0
22 A 1.3 18 0 0
23 A 2.0 24 0 0
24 A 3.0 37 0 0
25 A 4.3 55 0 0
26 A 4.8 83 0 0
27 A 5.0 48 18 46
28 A 4.2 13 0 0
29 A 3.5 81 0 0
30 A 3.0 43 0 0
31 A 2.6 36 0 0
32 A 2.0 56 0 0
33 A 1.9 43 0 0
34 A 1.0 36 0 0
35 A 0.0 46 0 0
36 A 0.0 45 0 0
37 A 0.0 127 0 0
38 A 0.0 55 0 0
39 A 0.0 67 0 0
40 A 0.1 113 0 0
41 A 0.2 98 0 0
42 B 0.3 78 0 0
43 B 1.0 78 0 0
44 B 2.0 57 0 0
45 B 3.0 131 0 0
46 B 5.0 121 78 64
47 B 4.0 126 0 0
48 B 2.0 142 0 0
49 B 0.5 51 0 0
50 B 0.0 64 0 0
51 B 0.0 95 0 0
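If you only want to look at the peak rows themselves, a quick check (a small sketch using the frame produced above) is:
print(df.loc[df.column_1.eq(5), ["ID", "column_2", "Before", "After"]])
which shows rows 11, 27 and 46 with Before/After pairs of 5/10, 18/46 and 78/64.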
Answered By - fsimonjetz