Monday, May 23, 2022

[FIXED] Python, filter csv for unique values in column1 and return

May 23, 2022 csv, python, python-2.x, sorting No comments

Issue

Using python (2.6.9) I have extracted data from an Oracle DB into a .CSV file (to avoid tuples).

In the .CSV I have two columns of data without headers.

Column 1 contains a product number. Column 2 contains the location that product is stored in.

Example snippet from the .CSV:

60139420    1011011990042
60139420    1013071990182
60139421    1009041990122
60139421    1007091990161
60144228    1014101990542
60160281    1003091990212
60181944    1006021990541

What I am wanting to achieve is to filter through Column 1 to find all of the unique values (where the product number appears only once), and return these values, but also have the corresponding location returned from Column 2.

So from the above example, I would like the output to be:

60144228    1014101990542
60160281    1003091990212
60181944    1006021990541

As these three products are unique.

The values can be appended to another variable / array.

Disclosure, I am pretty new to Python. I have checked stackoverflow and other online resources for examples / help, but either couldn't make sense what I have found or I was unable to find what I needed.

This brings me to asking my first question.

This is my code so far.

import cx_Oracle
import csv

conn = cx_Oracle.connect("some db credentials")

c = conn.cursor()
sql = ('''some sql query''')
c.execute(sql)

results = c.fetchall()

with open("test_file.csv", "w") as file:
    for row in results:
        csv.writer(file).writerow(row)

c.close()
conn.close()

If there is a way to achieve this using my code or a better way to achieve it overall, any help would be greatly appreciated.

Thank you in advance!

Solution

Pandas has a drop_duplicates function for dataframes that takes a keep=False argument. So you should be able to do something like df = df.drop_duplicates(keep=False) to get what you're looking for.

Without pandas, you could do something like this:

import csv

counts = {}

with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter='\t')
    for row in results:
        prod_num = row[0]
        counts[prod_num] = counts.setdefault(prod_num, 0) + 1
    for row in results:
        prod_num = row[0]
        if counts[prod_num] == 1:
            writer.writerow(row)

I'm going through the results twice: First to count the number of occurrences of each product number, and the second time to output only the rows with product numbers that occur exactly once.

Answered By - Alex Watt

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Monday, May 23, 2022

[FIXED] Python, filter csv for unique values in column1 and return

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels