Tuesday, January 2, 2024

[FIXED] pandas: loop through a list of dataframes (dfs), using the results from the previous df in the calculation for the next df

January 02, 2024 | Labels: loops, pandas, python

Issue

This is an example of what I'm trying to do. I have two dfs (actually 100+). Rows whose 'r_results' equals 1 in the first df are subject to a different threshold in the next df, as in the 'buffer_screen' function. However, I got stuck trying to reference the 'r_results' from the previous processed df when doing the calculation on the next raw df. I hope I'm making myself clear.

For all my dfs, the calculation is done in order: the results from the first df are used in the second one, and so on, until the calculation has been done for every df.

Below is the code for the calculation on the first df; the 'buffer_screen' function would then be used for the subsequent dfs in the list.

import pandas as pd
df1 = pd.DataFrame({'ID':[1,2,3],
                'r_f':[0.18878187,0.327355797,0.100753051]})
df2 = pd.DataFrame({'ID':[1,2,3,4,5],
                'r_f':[0.300009355,0.331788473,0.146077926,0.167329833,0.245227094]})
df_lst = [df1, df2]

thd_new = 0.3
thd_curr = 0.3333

def buffer_screen(curr, r):
    if curr==1 and r<=thd_curr:
        return 1
    elif curr==0 and r<=thd_new:
        return 1
    else:
        return 0

for df in df_lst[0:1]:
    df_new = df.assign(r_results = lambda X: X['r_f'].apply(lambda x: 1 if x <= thd_new else 0))
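The question's first step can be extended to the remaining dataframes with a sketch like the one below. It assumes that only the immediately preceding dataframe's r_results matter and that an ID absent from it counts as 0 (not previously passing); `screen_next` is a hypothetical helper name. The previous results are left-merged onto the next raw dataframe by ID, and buffer_screen is then applied row by row.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3],
                    'r_f': [0.18878187, 0.327355797, 0.100753051]})
df2 = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                    'r_f': [0.300009355, 0.331788473, 0.146077926,
                            0.167329833, 0.245227094]})
df_lst = [df1, df2]

thd_new = 0.3
thd_curr = 0.3333


def buffer_screen(curr, r):
    # IDs that passed last time get thd_curr; all others get thd_new
    if curr == 1 and r <= thd_curr:
        return 1
    elif curr == 0 and r <= thd_new:
        return 1
    else:
        return 0


def screen_next(prev, raw):
    """Hypothetical helper: screen `raw` using `prev`'s r_results, matched by ID."""
    prev_res = prev[['ID', 'r_results']].rename(columns={'r_results': 'curr'})
    out = raw.merge(prev_res, on='ID', how='left')
    # assumption: IDs absent from the previous dataframe count as not passing (0)
    out['curr'] = out['curr'].fillna(0).astype(int)
    out['r_results'] = [buffer_screen(c, r) for c, r in zip(out['curr'], out['r_f'])]
    return out.drop(columns='curr')


# first dataframe: plain thd_new screen (same result as the question's apply/lambda)
processed = [df_lst[0].assign(r_results=lambda X: X['r_f'].le(thd_new).astype(int))]
for raw in df_lst[1:]:
    processed.append(screen_next(processed[-1], raw))

for res in processed:
    print(res, end='\n\n')
```

Because each step only looks one dataframe back, an ID that skips a dataframe loses its earlier result; the dictionary-based solution keeps the most recent result for every ID it has seen.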

Solution

IIUC, you want to compare the r_f value of each ID in the current dataframe to a threshold. This threshold is thd_new if the result for that ID was 0 (or the ID has not been seen yet, the default), or thd_curr if the result was 1. So you need to build a dictionary of thresholds keyed by ID:

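import numpy as np

thd = {}  # ID -> threshold to use the next time that ID appears
for df in df_lst:
    # unseen IDs map to NaN and fall back to thd_new
    m = df['r_f'].le(df['ID'].map(thd).fillna(thd_new).values).astype(int)
    df['r_results'] = m
    # pass -> thd_curr next time, fail -> thd_new (dict |= needs Python 3.9+)
    thd |= dict(zip(df['ID'], np.where(m, thd_curr, thd_new)))
    print(df, end='\n\n')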

# Output
   ID       r_f  r_results
0   1  0.188782          1
1   2  0.327356          0
2   3  0.100753          1

   ID       r_f  r_results
0   1  0.300009          1
1   2  0.331788          0
2   3  0.146078          1
3   4  0.167330          1
4   5  0.245227          1
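The loop above writes r_results directly into the dataframes held in df_lst, so the raw inputs are modified. If they should stay untouched, the same per-ID threshold dictionary can be wrapped in a function that returns new dataframes via DataFrame.assign. This is a sketch: `screen_all` is a hypothetical name, and it uses dict.update instead of `|=`, which also works on Python versions older than 3.9.

```python
import numpy as np
import pandas as pd


def screen_all(dfs, thd_new=0.3, thd_curr=0.3333):
    """Hypothetical helper: screen each dataframe in order, carrying a
    per-ID threshold forward, and return new dataframes (inputs untouched)."""
    thd = {}  # ID -> threshold to use the next time that ID appears
    results = []
    for df in dfs:
        # unseen IDs map to NaN and fall back to thd_new
        limits = df['ID'].map(thd).fillna(thd_new)
        passed = df['r_f'].le(limits).astype(int)
        # assign returns a copy, so the input dataframe is not modified
        results.append(df.assign(r_results=passed))
        # a pass earns the looser thd_curr next time; a fail resets to thd_new
        thd.update(zip(df['ID'], np.where(passed, thd_curr, thd_new)))
    return results


df1 = pd.DataFrame({'ID': [1, 2, 3],
                    'r_f': [0.18878187, 0.327355797, 0.100753051]})
df2 = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                    'r_f': [0.300009355, 0.331788473, 0.146077926,
                            0.167329833, 0.245227094]})

out = screen_all([df1, df2])
for res in out:
    print(res, end='\n\n')
```

Because thd keeps an entry for every ID seen so far, an ID that is missing from one dataframe still uses its last result when it reappears later.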

Answered By - Corralien

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
