Issue
I am having an issue with pandas when writing to a CSV file. When I run the Python script, I either run out of memory or my computer runs slowly after the script finishes. Is there any way to split the data into chunks and write the chunks to the CSV? I am a bit new to programming in Python.
import itertools, hashlib, pandas as pd, time
chars = ['0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f']
numbers_list = list(range(0,25))
chunksize = 1_000_000
rows = []
for combination in itertools.combinations_with_replacement(chars, 10):
    for A in numbers_list:
        pure = str(A) + ':' + str(combination)
        B = pure.replace(")", "").replace("(", "").replace("'", "").replace(",", "").replace(" ", "")
        C = hashlib.sha256(B.encode('utf-8')).hexdigest()
        rows.append([A, B, C])
t0 = time.time()
df = pd.DataFrame(data=rows, columns=['A', 'B', 'C'])
df.to_csv('data.csv', index=False)
tdelta = time.time() - t0
print(tdelta)
I would really appreciate the help! Thank you!
Solution
Since you only use the DataFrame to write to a file, skip it completely. You build the full data set in memory in a Python list and then again in the DataFrame, needlessly eating RAM. The csv module in the standard library lets you write row by row.
import itertools, hashlib, time, csv
chars = ['0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f']
numbers_list = list(range(0,25))
chunksize = 1_000_000
with open('test.csv', 'w', newline='') as fileobj:
    writer = csv.writer(fileobj)
    for combination in itertools.combinations_with_replacement(chars, 10):
        for A in numbers_list:
            pure = str(A) + ':' + str(combination)
            B = pure.replace(")", "").replace("(", "").replace("'", "").replace(",", "").replace(" ", "")
            C = hashlib.sha256(B.encode('utf-8')).hexdigest()
            writer.writerow([A, B, C])
This will go fast until you've filled up the RAM cache that fronts your storage, and then will go at whatever speed the OS can get data to disk.
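If you still want explicit chunks, as the question asks, here is a minimal sketch of one way to do it with a generator and itertools.islice. The helper names total_rows, generate_rows and write_in_chunks are made up for this illustration and are not part of any library. It also builds B with "".join(combination), which should give the same string as the chain of replace() calls because none of the characters contain quotes, commas, parentheses or spaces. Like the code above, it writes no header row. Whether batching is noticeably faster than calling writerow() per row is an assumption you would need to measure, since the file object already buffers writes.

```python
import csv
import hashlib
import itertools
import math

CHARS = "0123456789abcdef"
NUMBERS = range(25)


def total_rows(n_chars=len(CHARS), length=10, n_numbers=len(NUMBERS)):
    """Number of rows a full run produces.

    combinations_with_replacement yields C(n + k - 1, k) tuples, and each
    tuple is paired with every number prefix.
    """
    return math.comb(n_chars + length - 1, length) * n_numbers


def generate_rows(chars=CHARS, length=10, numbers=NUMBERS):
    """Yield [A, B, C] rows one at a time instead of building a list."""
    for combination in itertools.combinations_with_replacement(chars, length):
        suffix = "".join(combination)
        for a in numbers:
            b = f"{a}:{suffix}"
            yield [a, b, hashlib.sha256(b.encode("utf-8")).hexdigest()]


def write_in_chunks(path, rows, chunksize=1_000_000):
    """Write rows to path in batches of at most chunksize rows.

    Only one batch is held in memory at a time. Returns the row count.
    """
    rows = iter(rows)  # islice must consume a single shared iterator
    written = 0
    with open(path, "w", newline="") as fileobj:
        writer = csv.writer(fileobj)
        while True:
            chunk = list(itertools.islice(rows, chunksize))
            if not chunk:
                break
            writer.writerows(chunk)
            written += len(chunk)
    return written
```

With the defaults, total_rows() comes to 81,719,000 rows (3,268,760 combinations times 25 prefixes), which is why holding everything in a list and then again in a DataFrame exhausts RAM. The full run would be started with write_in_chunks('test.csv', generate_rows()); try a small case first, for example generate_rows('01', 2, range(2)), which yields 6 rows.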
Answered By - tdelaney