Saturday, July 2, 2022

[FIXED] Matching two columns with the same row values in a csv file

July 02, 2022 csv, pandas, python-3.x, sorting No comments

Issue

I have a csv file with 4 columns:

Name             Dept        Email Name    Hair Color

John Smith       candy       Lincoln Tun   brown
Diana Princ      candy       John Smith    gold
Perry Plat       wood        Oliver Twist  bald
Jerry Springer   clothes     Diana Princ   gold
Calvin Klein     clothes        
Lincoln Tun      warehouse      
Oliver Twist     kitchen

I want to match the columns Name and Email Name by names.

This what the final output should look like:

Name             Dept        Email Name    Hair Color

John Smith       candy       John Smith    gold
Diana Princ      candy       Diana Princ   gold
Perry Plat       wood        
Jerry Springer   clothes     
Calvin Klein     clothes        
Lincoln Tun      warehouse   Lincoln Tun   brown    
Oliver Twist     kitchen     Oliver Twist  bald

I tried something like this in my code:

dfs = np.split(df,len(df.columns), axis=1)
dfs = [df.set_index(df.columns[0], drop=False) for df in dfs]
f=dfs[0].join(dfs[1:]).reset_index(drop=True).fillna(0)

Which sorted my two columns great but made everything else 0's

Name             Dept  Email Name    Hair Color

John Smith       0     John Smith    0
Diana Princ      0     Diana Princ   0
Perry Plat       0     0             0
Jerry Springer   0     0             0
Calvin Klein     0     0             0
Lincoln Tun      0     Lincoln Tun   0
Oliver Twist     0     Oliver Twist  0

Here is my code so far:

import pandas as pd
import numpy as np
import os, csv, sys
    
csvPath = 'User.csv'
    
df= pd.read_csv(csvPath)
    
dfs = np.split(df,len(df.columns), axis=1)
dfs = [df.set_index(df.columns[0], drop=False) for df in dfs]
f=dfs[0].join(dfs[1:]).reset_index(drop=True).fillna(0)
    
testCSV = 'test_user.csv' #to check my csv file
            
f.to_csv(testCSV, encoding='utf-8') #send it to csv

Solution

You could use merge for that:

pd.merge(df[['Name','Dept']],df[['Email Name','Hair Color']], left_on='Name', right_on='Email Name', how='left')

Result

             Name       Dept    Email Name Hair Color
0      John Smith      candy    John Smith       gold
1     Diana Princ      candy   Diana Princ       gold
2      Perry Plat       wood           NaN        NaN
3  Jerry Springer    clothes           NaN        NaN
4    Calvin Klein    clothes           NaN        NaN
5     Lincoln Tun  warehouse   Lincoln Tun      brown
6    Oliver Twist    kitchen  Oliver Twist       bald

Answered By - jch

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Saturday, July 2, 2022

[FIXED] Matching two columns with the same row values in a csv file

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels