Issue
I am trying to process a huge .csv file, but I don't need the first ~900,000 rows of data. Below is how I was originally trying to get rid of that chunk, but it makes the program take forever to finish. Is there a more straightforward way to do this, so that I never read those first 900,000 rows in the first place?
firstColumn = []
secondColumn = []
thirdColumn = []

readFile = input("Enter name of file to be read: ")
with open(readFile, 'r') as readFile:
    for eachline in readFile:  # converting columns to lists
        parts = eachline.strip('\n').split(',')
        firstColumn.append(parts[0])
        secondColumn.append(parts[1])
        thirdColumn.append(parts[2])

for j in range(900000):  # nothing happens for these datapoints
    # (each del shifts the remaining items left, so an increasing j
    # actually skips rows, and every deletion is O(n) on its own)
    del firstColumn[j]
    del secondColumn[j]
    del thirdColumn[j]
Solution
You can skip the initial lines by advancing the file iterator before the main loop. Each next(f) call consumes one line without splitting or storing it, so the unwanted rows are never appended to the lists:
with open(readFile, 'r') as f:
    # skip the first 900,000 lines
    for _ in range(900000):
        next(f)
    # only the remaining lines are parsed into the column lists
    for line in f:
        parts = line.strip('\n').split(',')
        firstColumn.append(parts[0])
        secondColumn.append(parts[1])
        thirdColumn.append(parts[2])
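An equivalent variant uses itertools.islice from the standard library to do the skipping; this is a sketch under the same assumptions as the snippet above (a three-column file, and the column lists and readFile filename already set up as in the question):

from itertools import islice

with open(readFile, 'r') as f:
    # islice(f, 900000, None) yields lines starting at line index 900,000
    for line in islice(f, 900000, None):
        parts = line.rstrip('\n').split(',')
        firstColumn.append(parts[0])
        secondColumn.append(parts[1])
        thirdColumn.append(parts[2])

In both versions the skipped lines still have to be scanned for newline characters (their byte offsets aren't known in advance), but they are never split or stored, which is where the original approach spent its time. Also note that splitting on ',' by hand will mis-parse fields containing quoted commas; the standard-library csv module handles those cases.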
Answered By - Charles Dupont