Wednesday, November 15, 2023

[FIXED] How to compare two CSV files and their multiple strings and merge them into one if condition is met?

November 15, 2023 csv, python, python-3.x No comments

Issue

I have two master files (let's call them one.csv and two.csv) and an instruction to parse them and merge into multiple files under a specific directory structure. I managed to do the first part of it and build a directory structure with underlying files. The only thing I'm not able to achieve on my own is the second and last part of my instruction. I will now simulate relevant content of both CSV files:

one.csv:

ID,Status,Date,Asset,Frequency
4815162342,Open,29/02/1842,9765,Monthly
...

two.csv:

Service,Description,ID,Attribute,Activity
IDR,Conducting,4815162342,Done,THEONETHATGOTAWAY_01
IDR,Conducting,4815162342,Todo,THEONETHATGOTAWAY_02
IDR,Conducting,4815162342,Todo,THEONETHATGOTAWAY_03

NOTE: The first line of both files, as it usually goes, is constant as it represents column name.

The part that I managed to do is parsing one.csv and writing it, in the correct format, to multiple files (file01.csv, file02.csv, file03.csv, etc.), each containing a column name and its respective value starting from row 2; like so:

file01.csv:

ID:,4815162342
Status:,Open
Date:,29/02/1842
Asset:,9765
Frequency:,Monthly

EOF

What my mind refuses to success in is, after parsing one.csv and writing it to files, open each file again, read that 4815162342 (as an example. it's different in every file, but the position of it is obviously the same), then go look for it in two.csv and, if it matches in a row, append the row's 4^th element (THEONETHATGOTAWAY_xx) to file01.csv; like so:

file01.csv:

ID:,4815162342
Status:,Open
Date:,29/02/1842
Asset:,9765
Frequency:,Monthly
THEONETHATGOTAWAY_01
THEONETHATGOTAWAY_02
THEONETHATGOTAWAY_03

EOF

This last file01.csv is my goal.

What I tried so far:

To try and compare the two files for same ID and print if succeeded, I tried:

...
directory = '/home/user/work/'

with open(directory + 'one.csv', 'r') as one, open(directory + 'two.csv', 'r') as two:
    
    # One:
    sheetOne = csv.reader(one, delimiter=';')
    titleOne = next(sheetOne)     # First row as constants (ID,Status,Date,Asset,Frequency)

    # Two:
    sheetTwo = csv.reader(two, delimiter=',')

    for rowOne in sheetOne:
        for rowTwo in sheetTwo:
            if rowTwo[2] == rowOne[0]:     # Check if ID in one.csv row is same as ID in two.csv row
                print("Equal: " + rowTwo[2] + "&" + rowOne[0])

EOF

Instead of print("Equal: " + rowTwo[2] + "&" + rowOne[0]) there will be

with open(directory + "output/" + rowOne[*] + "/" + rowOne[*].capitalize() + "/" + rowOne[*] + ".csv", 'a') as f_2:
    f_2.write(rowTwo[4] + ";")

once I figure out how to make it work.

My guess is that something is not right with that:

    for rowOne in sheetOne:
        for rowTwo in sheetTwo:
            if rowTwo[2] == rowOne[0]:     # Check if ID in one.csv row is same as ID in two.csv row

I'm pretty positive that the other answers wouldn't meet my specific need, but if my question were to be flagged as a duplicate I wouldn't be surprised, because I'm in no shape to play around with the script any more.

Solution

A less overhead approach would be to use pandas :

#pip install pandas
import pandas as pd
from pathlib import Path

dip = Path("/home/user/work")
out = dip / "output" # to be adjusted (if needed)

one = pd.read_csv(dip / "one.csv").add_suffix(":")
one.insert(len(one.columns), "", "") #to simulate the blank line

two = pd.read_csv(dip / "two.csv")

for n, id_ in enumerate(one["ID:"], start=1):
    head = one.loc[one["ID:"].eq(id_)].T
    tail = two.loc[two["ID"].eq(id_), ["Activity"]].set_index(["Activity"])
    pd.concat([head, tail]).to_csv(out / f"file{str(n).zfill(2)}.csv", header=False)

Output (file01.csv) :

ID:,4815162342
Status:,Open
Date:,29/02/1842
Asset:,9765
Frequency:,Monthly
,
THEONETHATGOTAWAY_01,
THEONETHATGOTAWAY_02,
THEONETHATGOTAWAY_03,

In a tabular format :

FF's view :

home
┗━━ user
    ┗━━ work
        ┣━━ one.csv
        ┣━━ two.csv
        ┗━━ output
            ┗━━ file01.csv

Answered By - Timeless

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Wednesday, November 15, 2023

[FIXED] How to compare two CSV files and their multiple strings and merge them into one if condition is met?

Issue

one.csv:

two.csv:

file01.csv:

file01.csv:

What I tried so far:

Solution

0 comments:

Post a Comment

Popular Posts

Labels