Friday, April 1, 2022

[FIXED] pandas concat compare 2 dataframes with same column names how to build a differ column?

April 01, 2022 concatenation, dataframe, pandas, python No comments

Issue

I'm using pandas concat to compare 2 dataframes have same columns and rows:

import pandas as pd

df=pd.read_csv(r'C:\Users\compare\T1.csv')
df2=pd.read_csv(r'C:\Users\compare\T2.csv')

index=['contract','RB','payee','fund']

df_all = pd.concat([df.set_index(index), df2.set_index(index)], 
                   axis='columns', keys=['First', 'Second'])

df_final = df_all.swaplevel(axis='columns')[df.columns[54:56]]

df_final

The output is:

                              SD1                SD2
                             First     Second    First   Second
contract    RB  payee   fund                
AG72916Z    2   1   W42      15622.9    15622.9  15622.9    15489.2
            4   1   W44      14697.8    14697.8  14697.8    14572.1
            8   1   W48      7388.56    7388.56  7388.56    7325.37
AL0024AZ    C3  1   202      226.585    226.59   220.366    220.37
            S3  1   204      804.059    804.06   781.99     781.99

My question is how can I add a differ column after each Second ,so that I can easily tell the comparison result,the output should looks like this:

                              SD1                         SD2
                             First     Second   differ    First     Second   differ
contract    RB  payee   fund                
AG72916Z    2   1   W42      15622.9    15622.9    0       15622.9  15489.2  133.7
            4   1   W44      14697.8    14697.8    0       14697.8  14572.1  125.7
            8   1   W48      7388.56    7388.56    0       7388.56  7325.37  63.19
AL0024AZ    C3  1   202      226.585    226.59     0.05    220.366  220.37  -0.004
            S3  1   204      804.059    804.06     0.01    781.99   781.99   0

Solution

A bit tricky but necessary to keep ordering:

out = df_final.stack(level=0).assign(Diff=lambda x: x['First'] - x['Second']) \
              .stack().unstack(level=[-2, -1])
print(out)

# Output
                              SD1                         SD2                   
                            First    Second   Diff      First    Second     Diff
contract RB payee fund                                                          
AG72916Z 2  1     W42   15622.900  15622.90  0.000  15622.900  15489.20  133.700
         4  1     W44   14697.800  14697.80  0.000  14697.800  14572.10  125.700
         8  1     W48    7388.560   7388.56  0.000   7388.560   7325.37   63.190
AL0024AZ C3 1     202     226.585    226.59 -0.005    220.366    220.37   -0.004
         S3 1     204     804.059    804.06 -0.001    781.990    781.99    0.000

Update

What if I want to only select the row that Diff large than 100

Use:

>>> out[out.loc[:, (slice(None), 'Diff')].gt(100).any(1)]

                            SD1                    SD2                
                          First   Second Diff    First   Second   Diff
contract RB payee fund                                                
AG72916Z 2  1     W42   15622.9  15622.9  0.0  15622.9  15489.2  133.7
         4  1     W44   14697.8  14697.8  0.0  14697.8  14572.1  125.7

# Same result with
# idx = pd.IndexSlice
# out[out.loc[:, idx[:, 'Diff']].gt(100).any(1)]

Answered By - Corralien

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Friday, April 1, 2022

[FIXED] pandas concat compare 2 dataframes with same column names how to build a differ column?

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels