Sunday, December 24, 2023

[FIXED] Comparing value of one column with the next row in another column in pandas dataframe

December 24, 2023 dataframe, group-by, pandas, python, python-3.x No comments

Issue

I have the following large dataset recording the result of a math competition among students in descending order of date: So for example, student 1 comes third in Race 1 while student 3 won Race 2, etc.

Race_ID   Date           adv          C_k
1         1/1/2023       2.5          2.7     
1         1/1/2023       1.4          2.6     
1         1/1/2023       1.3          1.9     
1         1/1/2023       1.1          1.2     
2         11/9/2022      1.4          1.1     
2         11/9/2022      1.3          1.2     
2         11/9/2022      1.0          0.4     
3         17/4/2022      0.9          0.2     
3         17/4/2022      0.8          0.4     
3         17/4/2022      0.7          0.5     
3         17/4/2022      0.6          0.2     
3         17/4/2022      0.5          0.4

That is grouped by Group_ID, the values in adv is sorted in descending value by groups. For each group, I want to define a value

t= min{n| adv_(n+1) <= C_n} and C_t be the corresponding value in the C_k column. In words, I want to create a new column C_t for each group, where the value is equal to the first C_k such that C_k is greater or equal to the adv value in the next row. And if no such value exists, then set C_t = 1 (for example, in Race_ID = 3)

So the desired column looks like this:

Race_ID   Date           adv          C_k     C_t
1         1/1/2023       2.5          2.7     1.9      
1         1/1/2023       1.4          2.6     1.9     
1         1/1/2023       1.3          1.9     1.9     
1         1/1/2023       1.1          1.2     1.9     
2         11/9/2022      1.4          1.1     1.2     
2         11/9/2022      1.3          1.2     1.2     
2         11/9/2022      1.0          0.4     1.2     
3         17/4/2022      0.9          0.2     1.0     
3         17/4/2022      0.8          0.4     1.0     
3         17/4/2022      0.7          0.5     1.0     
3         17/4/2022      0.6          0.2     1.0     
3         17/4/2022      0.5          0.4     1.0

Thanks in advance.

Solution

Compare shifted values per groups by DataFrameGroupBy.shift with Series.le, filter by boolean indexing, remove duplicates with keep last value by DataFrame.drop_duplicates and mapping by Series.map:

s = (df[df.groupby('Race_ID')['adv'].shift(-1).le(df['C_k'])]
         .drop_duplicates(subset=['Race_ID'], keep='last')
         .set_index('Race_ID')['C_k'])

df['C_t'] = df['Race_ID'].map(s).fillna(1)
print (df)
    Race_ID       Date  adv  C_k  C_t
0         1   1/1/2023  2.5  2.7  1.9
1         1   1/1/2023  1.4  2.6  1.9
2         1   1/1/2023  1.3  1.9  1.9
3         1   1/1/2023  1.1  1.2  1.9
4         2  11/9/2022  1.4  1.1  1.2
5         2  11/9/2022  1.3  1.2  1.2
6         2  11/9/2022  1.0  0.4  1.2
7         3  17/4/2022  0.9  0.2  1.0
8         3  17/4/2022  0.8  0.4  1.0
9         3  17/4/2022  0.7  0.5  1.0
10        3  17/4/2022  0.6  0.2  1.0
11        3  17/4/2022  0.5  0.4  1.0

Or use Series.where for set NaNs for not matched values and get last non missing value per group by GroupBy.transform with last:

df['C_t'] = (df['C_k'].where(df.groupby('Race_ID')['adv'].shift(-1).le(df['C_k']))
                      .groupby(df['Race_ID'])
                      .transform('last')
                      .fillna(1))
print (df)
    Race_ID       Date  adv  C_k  C_t
0         1   1/1/2023  2.5  2.7  1.9
1         1   1/1/2023  1.4  2.6  1.9
2         1   1/1/2023  1.3  1.9  1.9
3         1   1/1/2023  1.1  1.2  1.9
4         2  11/9/2022  1.4  1.1  1.2
5         2  11/9/2022  1.3  1.2  1.2
6         2  11/9/2022  1.0  0.4  1.2
7         3  17/4/2022  0.9  0.2  1.0
8         3  17/4/2022  0.8  0.4  1.0
9         3  17/4/2022  0.7  0.5  1.0
10        3  17/4/2022  0.6  0.2  1.0
11        3  17/4/2022  0.5  0.4  1.0

Answered By - jezrael

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Sunday, December 24, 2023

[FIXED] Comparing value of one column with the next row in another column in pandas dataframe

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels