Issue
I have two master files (let's call them one.csv and two.csv) and an instruction to parse them and merge into multiple files under a specific directory structure. I managed to do the first part of it and build a directory structure with underlying files. The only thing I'm not able to achieve on my own is the second and last part of my instruction. I will now simulate relevant content of both CSV files:
one.csv:
- ID,Status,Date,Asset,Frequency
- 4815162342,Open,29/02/1842,9765,Monthly
- ...
two.csv:
- Service,Description,ID,Attribute,Activity
- IDR,Conducting,4815162342,Done,THEONETHATGOTAWAY_01
- IDR,Conducting,4815162342,Todo,THEONETHATGOTAWAY_02
- IDR,Conducting,4815162342,Todo,THEONETHATGOTAWAY_03
NOTE: The first line of both files, as it usually goes, is constant as it represents column name.
The part that I managed to do is parsing one.csv and writing it, in the correct format, to multiple files (file01.csv, file02.csv, file03.csv, etc.), each containing a column name and its respective value starting from row 2; like so:
file01.csv:
- ID:,4815162342
- Status:,Open
- Date:,29/02/1842
- Asset:,9765
- Frequency:,Monthly
EOF
What my mind refuses to success in is, after parsing one.csv and writing it to files, open each file again, read that 4815162342 (as an example. it's different in every file, but the position of it is obviously the same), then go look for it in two.csv and, if it matches in a row, append the row's 4th element (THEONETHATGOTAWAY_xx) to file01.csv; like so:
file01.csv:
- ID:,4815162342
- Status:,Open
- Date:,29/02/1842
- Asset:,9765
- Frequency:,Monthly
- THEONETHATGOTAWAY_01
- THEONETHATGOTAWAY_02
- THEONETHATGOTAWAY_03
EOF
This last file01.csv is my goal.
What I tried so far:
To try and compare the two files for same ID and print if succeeded, I tried:
...
directory = '/home/user/work/'
with open(directory + 'one.csv', 'r') as one, open(directory + 'two.csv', 'r') as two:
# One:
sheetOne = csv.reader(one, delimiter=';')
titleOne = next(sheetOne) # First row as constants (ID,Status,Date,Asset,Frequency)
# Two:
sheetTwo = csv.reader(two, delimiter=',')
for rowOne in sheetOne:
for rowTwo in sheetTwo:
if rowTwo[2] == rowOne[0]: # Check if ID in one.csv row is same as ID in two.csv row
print("Equal: " + rowTwo[2] + "&" + rowOne[0])
EOF
Instead of print("Equal: " + rowTwo[2] + "&" + rowOne[0])
there will be
with open(directory + "output/" + rowOne[*] + "/" + rowOne[*].capitalize() + "/" + rowOne[*] + ".csv", 'a') as f_2:
f_2.write(rowTwo[4] + ";")
once I figure out how to make it work.
My guess is that something is not right with that:
for rowOne in sheetOne:
for rowTwo in sheetTwo:
if rowTwo[2] == rowOne[0]: # Check if ID in one.csv row is same as ID in two.csv row
I'm pretty positive that the other answers wouldn't meet my specific need, but if my question were to be flagged as a duplicate I wouldn't be surprised, because I'm in no shape to play around with the script any more.
Solution
A less overhead approach would be to use pandas :
#pip install pandas
import pandas as pd
from pathlib import Path
dip = Path("/home/user/work")
out = dip / "output" # to be adjusted (if needed)
one = pd.read_csv(dip / "one.csv").add_suffix(":")
one.insert(len(one.columns), "", "") #to simulate the blank line
two = pd.read_csv(dip / "two.csv")
for n, id_ in enumerate(one["ID:"], start=1):
head = one.loc[one["ID:"].eq(id_)].T
tail = two.loc[two["ID"].eq(id_), ["Activity"]].set_index(["Activity"])
pd.concat([head, tail]).to_csv(out / f"file{str(n).zfill(2)}.csv", header=False)
Output (file01.csv
) :
ID:,4815162342
Status:,Open
Date:,29/02/1842
Asset:,9765
Frequency:,Monthly
,
THEONETHATGOTAWAY_01,
THEONETHATGOTAWAY_02,
THEONETHATGOTAWAY_03,
In a tabular format :
FF's view :
home
┗━━ user
┗━━ work
┣━━ one.csv
┣━━ two.csv
┗━━ output
┗━━ file01.csv
Answered By - Timeless
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.