Friday, October 21, 2022

[FIXED] Find the nearest value in dataframe column and calculate the difference for each row

October 21, 2022 dataframe, find, pandas, python No comments

Issue

Even though I was googling a lot, I couldn't find the solution for my problem.

I have dataframe

    filter10    REF
0   NaN         0.00
1   NaN         0.75
2   NaN         1.50
3   NaN         2.25
4   NaN         3.00
5   NaN         3.75
6   NaN         4.50
...
15  2.804688    11.25
16  3.021875    12.00
17  3.578125    12.75
18  3.779688    13.50
...
27  NaN         20.25
28  NaN         21.00
29  NaN         21.75
30  NaN         22.50
31  6.746875    NaN
32  NaN         NaN
...

I would like now to add the column df['DIFF'] where function goes through whole column filter10 and when it is the number it finds closest number in REF column.

And afterwards calculate the difference between them and put it the same row as number in filter10 is.

I would like this output:

    filter10    REF     DIFF
0   NaN         0.00    NaN
1   NaN         0.75    NaN
2   NaN         1.50    NaN
3   NaN         2.25    NaN
4   NaN         3.00    NaN
5   NaN         3.75    NaN
6   NaN         4.50    NaN
...
15  2.804688    11.25   -0.195312 # 2.804688 - 3 (find closest value in REF) = -0.195312
16  3.021875    12.00   0.021875
17  3.578125    12.75   -0.171875
18  3.779688    13.50   0.029688
...
27  NaN         20.25   NaN
28  NaN         21.00   NaN
29  NaN         21.75   NaN
30  NaN         22.50   NaN
31  6.746875    NaN     -0.003125
32  NaN         NaN     NaN
...

Solution

Use pandas.merge_asof to find the nearest value:

df['DIFF'] = (pd.merge_asof(df['filter10'].dropna().sort_values().reset_index(),
                            df[['REF']].dropna().sort_values('REF'),
                            left_on='filter10', right_on='REF', direction='nearest')
                .set_index('index')['REF'].rsub(df['filter10'])
              )

Output:

    filter10    REF      DIFF
0        NaN   0.00       NaN
1        NaN   0.75       NaN
2        NaN   1.50       NaN
3        NaN   2.25       NaN
4        NaN   3.00       NaN
5        NaN   3.75       NaN
6        NaN   4.50       NaN
15  2.804688  11.25 -0.195312
16  3.021875  12.00  0.021875
17  3.578125  12.75 -0.171875
18  3.779688  13.50  0.029688
27       NaN  20.25       NaN
28       NaN  21.00       NaN
29       NaN  21.75       NaN
30       NaN  22.50       NaN
31  6.746875    NaN  2.246875 # likely different due to missing data
32       NaN    NaN       NaN

Answered By - mozway

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Friday, October 21, 2022

[FIXED] Find the nearest value in dataframe column and calculate the difference for each row

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels