Issue
i have an excel file that after each FOR loop adds one "SN" column to excel find.
what i want is to find the same rows in different "SN" columns.
THIS IS THE EXCEL THAT I WANT TO DO OPERATION ON :
print (df)
SN1 R1 SN2 R2
0 SN1 38 SN20 1000
1 SN2 38 SN21 181
2 SN3 532 SN23 5
3 SN4 521 SN25 38
4 SN6 115 SN26 8718
5 SN7 3 SN27 3452
6 SN8 136 SN28 229
7 SN9 168 SN30 136
AND THIS IS THE RESULT THAT I WANT
THE CODES :
v=int(input("how many periods do you have:\n"))
new_excel_by_user=str(input("write the name of your new EXCEL
file:"))
user_excel = pd.ExcelWriter(new_excel_by_user+'.xlsx',
engine='xlsxwriter')
sys_excel=pd.ExcelWriter('sys_excel132.xlsx', engine='xlsxwriter')
final_excel=pd.ExcelWriter('final132.xlsx' , engine='xlsxwriter')
merged_df_H = pd.DataFrame()
merged_df_L = pd.DataFrame()
final_merged=pd.DataFrame()
for i in range(v):
print("\npriod %d :"%(i+1))
excelpathH=''+input(''+r"Insert your Excel path(HIGH):")
excelpathL=''+input(''+r"Insert your Excel path(LOW):")
excelpathH=excelpathH.replace('"', '')
excelpathL=excelpathL.replace('"', '')
dfH=pd.read_excel(os.path.join(excelpathH), engine='openpyxl' )
dfL=pd.read_excel(os.path.join(excelpathL), engine='openpyxl' )
Rnh=dfH['KPIVALUE']
Rnl=dfL['KPIVALUE']
R=Rnh-Rnl
merged_df_H['R%s'%(i+1)]=R
R=merged_df_H['R%s'%(i+1)]
#the below code returns true false only
RESULT=merged_df_H['R%s'%(i+1)].ge(130)
merged_df_H =
pd.concat([merged_df_H,dfH.loc[mask,'SN'].rename('SN%s'%
(i+1)).reset_index(drop=True),
R[mask].reset_index(drop=True)], axis=1)
merged_df_H = pd.concat([ merged_df_H,dfH,dfL,R,RESULT], axis=1 )
final_merged=pd.concat([final_merged,dfH['SN'], R,RESULT],axis=1)
Solution
You can use DataFrame.filter
for get SN
and R
columns, reshape by DataFrame.stack
and filter same values by Series.duplicated
:
R = df.filter(like='R').stack().reset_index(drop=True)
SN = df.filter(like='SN').stack().reset_index(drop=True)
mask = R.duplicated(keep=False)
print (pd.concat([SN[mask], R[mask]], axis=1, keys=('SN-RESULT','R-RESULT')))
SN-RESULT R-RESULT
0 SN1 38
2 SN2 38
7 SN25 38
12 SN8 136
15 SN30 136
out = df.join(pd.concat([SN[mask], R[mask]],
axis=1,
keys=('SN-RESULT','R-RESULT')).reset_index(drop=True))
If columns SN-RESULT, R-RESULT
are already in input DataFrame add rsuffix
to DataFrame.join
:
out = df.join(pd.concat([SN[mask], R[mask]],
axis=1,
keys=('SN-RESULT','R-RESULT')).reset_index(drop=True),
rsuffix='_new')
Another idea with DataFrame.melt
`:
R = df.filter(like='R').melt()['value']
SN = df.filter(like='SN').melt()['value']
mask = R.duplicated(keep=False)
out = df.join(pd.concat([SN[mask], R[mask]],
axis=1,
keys=('SN-RESULT','R-RESULT')).reset_index(drop=True))
Answered By - jezrael
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.