Issue
I'm using pandas concat to compare 2 dataframes have same columns and rows:
import pandas as pd
df=pd.read_csv(r'C:\Users\compare\T1.csv')
df2=pd.read_csv(r'C:\Users\compare\T2.csv')
index=['contract','RB','payee','fund']
df_all = pd.concat([df.set_index(index), df2.set_index(index)],
axis='columns', keys=['First', 'Second'])
df_final = df_all.swaplevel(axis='columns')[df.columns[54:56]]
df_final
The output is:
SD1 SD2
First Second First Second
contract RB payee fund
AG72916Z 2 1 W42 15622.9 15622.9 15622.9 15489.2
4 1 W44 14697.8 14697.8 14697.8 14572.1
8 1 W48 7388.56 7388.56 7388.56 7325.37
AL0024AZ C3 1 202 226.585 226.59 220.366 220.37
S3 1 204 804.059 804.06 781.99 781.99
My question is how can I add a differ column after each Second ,so that I can easily tell the comparison result,the output should looks like this:
SD1 SD2
First Second differ First Second differ
contract RB payee fund
AG72916Z 2 1 W42 15622.9 15622.9 0 15622.9 15489.2 133.7
4 1 W44 14697.8 14697.8 0 14697.8 14572.1 125.7
8 1 W48 7388.56 7388.56 0 7388.56 7325.37 63.19
AL0024AZ C3 1 202 226.585 226.59 0.05 220.366 220.37 -0.004
S3 1 204 804.059 804.06 0.01 781.99 781.99 0
Solution
A bit tricky but necessary to keep ordering:
out = df_final.stack(level=0).assign(Diff=lambda x: x['First'] - x['Second']) \
.stack().unstack(level=[-2, -1])
print(out)
# Output
SD1 SD2
First Second Diff First Second Diff
contract RB payee fund
AG72916Z 2 1 W42 15622.900 15622.90 0.000 15622.900 15489.20 133.700
4 1 W44 14697.800 14697.80 0.000 14697.800 14572.10 125.700
8 1 W48 7388.560 7388.56 0.000 7388.560 7325.37 63.190
AL0024AZ C3 1 202 226.585 226.59 -0.005 220.366 220.37 -0.004
S3 1 204 804.059 804.06 -0.001 781.990 781.99 0.000
Update
What if I want to only select the row that Diff large than 100
Use:
>>> out[out.loc[:, (slice(None), 'Diff')].gt(100).any(1)]
SD1 SD2
First Second Diff First Second Diff
contract RB payee fund
AG72916Z 2 1 W42 15622.9 15622.9 0.0 15622.9 15489.2 133.7
4 1 W44 14697.8 14697.8 0.0 14697.8 14572.1 125.7
# Same result with
# idx = pd.IndexSlice
# out[out.loc[:, idx[:, 'Diff']].gt(100).any(1)]
Answered By - Corralien
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.