Issue
Here is an example of what I'm trying to do. I have two DataFrames (actually 100+). Rows with 'r_results' equal to 1 in the first DataFrame should be subject to a different threshold in the next DataFrame, as in 'buffer_screen' below. However, I got stuck trying to reference 'r_results' from the previously processed DataFrame when calculating the next raw DataFrame. I hope I'm making myself clear.
The calculation is done in order for all DataFrames, i.e., the results from the first DataFrame are used in the second one, and so on until all DataFrames have been processed.
The code example below does the calculation for the first DataFrame; the 'buffer_screen' function would then be used for the following DataFrames in the list.
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3],
                    'r_f': [0.18878187, 0.327355797, 0.100753051]})
df2 = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                    'r_f': [0.300009355, 0.331788473, 0.146077926, 0.167329833, 0.245227094]})
df_lst = [df1, df2]

thd_new = 0.3
thd_curr = 0.3333

def buffer_screen(curr, r):
    if curr == 1 and r <= thd_curr:
        return 1
    elif curr == 0 and r <= thd_new:
        return 1
    else:
        return 0

for df in df_lst[0:1]:
    df_new = df.assign(r_results=lambda X: X['r_f'].apply(lambda x: 1 if x <= thd_new else 0))
Solution
IIUC, you want to compare the 'r_f' value of each ID in the current dataframe to a threshold. This threshold is thd_new if the result for that ID in the previous dataframe was 0 (the default, also used for IDs not seen before) or thd_curr if the result was 1. So you need to build a dictionary that maps each ID to its threshold and update it after each dataframe:
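import numpy as np

thd = {}  # ID -> threshold to apply in the next dataframe
for df in df_lst:
    # IDs missing from thd fall back to thd_new
    m = df['r_f'].le(df['ID'].map(thd).fillna(thd_new).values).astype(int)
    df['r_results'] = m
    # dict union-assignment (|=) requires Python 3.9+
    thd |= dict(zip(df['ID'], np.where(m, thd_curr, thd_new)))
    print(df, end='\n\n')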
# Output
   ID       r_f  r_results
0   1  0.188782          1
1   2  0.327356          0
2   3  0.100753          1

   ID       r_f  r_results
0   1  0.300009          1
1   2  0.331788          0
2   3  0.146078          1
3   4  0.167330          1
4   5  0.245227          1
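For comparison, here is a minimal sketch of the same idea that reuses the buffer_screen function from the question instead of a threshold dictionary. The helper name screen_all and the prev dictionary are assumptions made for illustration and are not part of the original answer. It works row by row, so it will probably be slower than the vectorized version on large dataframes.

```python
import pandas as pd

thd_new = 0.3      # threshold for IDs whose previous result was 0 (or unseen IDs)
thd_curr = 0.3333  # threshold for IDs whose previous result was 1


def buffer_screen(curr, r):
    """Return 1 if r passes the threshold selected by the previous result."""
    if curr == 1 and r <= thd_curr:
        return 1
    elif curr == 0 and r <= thd_new:
        return 1
    return 0


def screen_all(frames):
    """Hypothetical helper: screen dataframes in order, carrying results forward."""
    prev = {}  # ID -> most recent r_results value
    out = []
    for df in frames:
        # IDs not seen before default to 0, so they are tested against thd_new
        curr = df['ID'].map(prev).fillna(0).astype(int)
        res = [buffer_screen(c, r) for c, r in zip(curr, df['r_f'])]
        out.append(df.assign(r_results=res))
        prev.update(zip(df['ID'], res))
    return out


df1 = pd.DataFrame({'ID': [1, 2, 3],
                    'r_f': [0.18878187, 0.327355797, 0.100753051]})
df2 = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                    'r_f': [0.300009355, 0.331788473, 0.146077926,
                            0.167329833, 0.245227094]})

processed = screen_all([df1, df2])
print(processed[1]['r_results'].tolist())  # [1, 0, 1, 1, 1]
```

Unlike the loop above, this sketch returns new dataframes through assign and leaves the originals in df_lst unmodified.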
Answered By - Corralien