Issue
Using python (2.6.9) I have extracted data from an Oracle DB into a .CSV file (to avoid tuples).
In the .CSV I have two columns of data without headers.
Column 1 contains a product number. Column 2 contains the location that product is stored in.
Example snippet from the .CSV:
60139420 1011011990042
60139420 1013071990182
60139421 1009041990122
60139421 1007091990161
60144228 1014101990542
60160281 1003091990212
60181944 1006021990541
What I am wanting to achieve is to filter through Column 1 to find all of the unique values (where the product number appears only once), and return these values, but also have the corresponding location returned from Column 2.
So from the above example, I would like the output to be:
60144228 1014101990542
60160281 1003091990212
60181944 1006021990541
As these three products are unique.
The values can be appended to another variable / array.
Disclosure, I am pretty new to Python. I have checked stackoverflow and other online resources for examples / help, but either couldn't make sense what I have found or I was unable to find what I needed.
This brings me to asking my first question.
This is my code so far.
import cx_Oracle
import csv
conn = cx_Oracle.connect("some db credentials")
c = conn.cursor()
sql = ('''some sql query''')
c.execute(sql)
results = c.fetchall()
with open("test_file.csv", "w") as file:
for row in results:
csv.writer(file).writerow(row)
c.close()
conn.close()
If there is a way to achieve this using my code or a better way to achieve it overall, any help would be greatly appreciated.
Thank you in advance!
Solution
Pandas has a drop_duplicates function for dataframes that takes a keep=False
argument. So you should be able to do something like df = df.drop_duplicates(keep=False)
to get what you're looking for.
Without pandas, you could do something like this:
import csv
counts = {}
with open('output.csv', 'w', newline='') as csvfile:
writer = csv.writer(csvfile, delimiter='\t')
for row in results:
prod_num = row[0]
counts[prod_num] = counts.setdefault(prod_num, 0) + 1
for row in results:
prod_num = row[0]
if counts[prod_num] == 1:
writer.writerow(row)
I'm going through the results twice: First to count the number of occurrences of each product number, and the second time to output only the rows with product numbers that occur exactly once.
Answered By - Alex Watt
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.