Issue
How can I insert millions of rows from a .txt file into a SQLite3 database using Flask-SQLAlchemy? I tried reading one line at a time from the .txt file and adding and committing each one in a loop, but noticed that this takes an enormous amount of time. How can I do this efficiently? I tried adapting this solution https://stackoverflow.com/a/7137270 to my code but couldn't get it to work.
The table schema looks like this:
class table(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    col1 = db.Column(db.Integer)
    col2 = db.Column(db.Integer)
    col3 = db.Column(db.String(50))
    col4 = db.Column(db.String(50))
    col5 = db.Column(db.String(50))
    col6 = db.Column(db.Integer)
    col7 = db.Column(db.String(50))
    col8 = db.Column(db.Integer)
    col9 = db.Column(db.Integer)
    col10 = db.Column(db.Integer)
    col11 = db.Column(db.Integer)
    col12 = db.Column(db.Integer)
The rows in the .txt file look like this:
hjk;28770930;Y;T;C;;asd;;1;1233;1233;0.00081103
And there are about 85M rows to add to the db.
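For context, the per-row approach described above follows this pattern (a rough sketch rather than my exact code; load_slow is just an illustrative name, and db/table refer to the Flask-SQLAlchemy instance and model shown above):

import csv

def load_slow(filename):
    with open('./folder/{}'.format(filename), "r") as f:
        reader = csv.reader(f, delimiter=";")
        for col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, col11, col12 in reader:
            row = table(col1=col1, col2=col2, col3=col3, col4=col4,
                        col5=col5, col6=col6, col7=col7, col8=col8,
                        col9=col9, col10=col10, col11=col11, col12=col12)
            db.session.add(row)
            db.session.commit()  # committing every single row forces a write per row, which is why this is so slow

With about 85M rows that means about 85M separate transactions, which is where the time goes.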
Solution
I found one solution that makes the transaction significantly faster. I used the answers from:
Instead of reading one line at a time and writing one row on every iteration, I use the chunks() function to divide the file into chunks of rows, yielded from a generator. Then, in the load_data_table() function, I loop over the chunks and commit after every chunk of 10000 rows.
I would still be interested to hear if anyone can figure out a faster way, because even this takes approximately 5 hours (one possible direction is sketched after the code below).
Here is my code:
import csv

def chunks(data, n=10000):
    # Accumulate records from an iterable into fixed-size buffers and yield them.
    buffer = [None] * n
    idx = 0
    for record in data:
        buffer[idx] = record
        idx += 1
        if idx == n:
            yield buffer
            buffer = [None] * n
            idx = 0
    if idx > 0:
        yield buffer[:idx]  # the final, possibly smaller, chunk

def load_data_table(filename):
    rows = 0
    with open('./folder/{}'.format(filename), "r") as f:
        csvData = csv.reader(f, delimiter=";")
        dataset = table.query.filter_by(id=1).first()  # not used below
        divData = chunks(csvData)  # divide into chunks of 10000 rows each
        for chunk in divData:
            for col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, col11, col12 in chunk:
                add_new = table(col1=col1, col2=col2, col3=col3, col4=col4,
                                col5=col5, col6=col6, col7=col7, col8=col8,
                                col9=col9, col10=col10, col11=col11, col12=col12)
                db.session.add(add_new)
            db.session.commit()  # one commit per chunk instead of per row
            rows += len(chunk)   # count the actual chunk size (the last chunk may be smaller)
            print(rows)
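One direction that might be faster (a sketch only, not something I have benchmarked; load_data_table_core is a hypothetical name): skip building an ORM object per row and hand each chunk to the database as a single executemany-style INSERT through the SQLAlchemy Core layer, reusing the chunks() helper and the table model from above.

import csv

def load_data_table_core(filename):
    rows = 0
    cols = ["col{}".format(i) for i in range(1, 13)]  # col1 .. col12
    with open('./folder/{}'.format(filename), "r") as f:
        csvData = csv.reader(f, delimiter=";")
        for chunk in chunks(csvData):
            # Plain dicts instead of ORM instances: far less per-row overhead.
            mappings = [dict(zip(cols, record)) for record in chunk]
            db.session.execute(table.__table__.insert(), mappings)  # executemany under the hood
            db.session.commit()
            rows += len(chunk)
            print(rows)

db.session.bulk_insert_mappings(table, mappings) would be a similar ORM-level shortcut, and a larger chunk size or a single enclosing transaction may also help, but I have not measured any of these against the 5-hour figure above.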
Answered By - user9988919