Issue
What is the best way to insert a very large amount of data into a PostgreSQL table?
OS: Ubuntu 22.04 LTS
DB: PostgreSQL 14
Framework: Django on Python 3.11
For now I am running an INSERT INTO statement with 100,000 rows at a time. Inserting an average of 1,000,000 rows takes about 2 minutes for the whole process, which is within my acceptable range, but I want to know if there is a better way to do this.
It was working fine, but lately it is taking more time and sometimes failing with errors like:
OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
from django.db import connection

cursor = connection.cursor()
batch_size = 100000
offset = 0

while True:
    # Fetch the next batch of source rows (OFFSET pagination).
    transaction_list_query = f"SELECT * FROM {source_table} LIMIT {batch_size} OFFSET {offset};"
    cursor.execute(transaction_list_query)
    transaction_list = dictfetchall(cursor)
    if not transaction_list:
        break

    data_to_insert = []
    for transaction in transaction_list:
        # Some Process Intensive Calculations that append one
        # "(value, value, value, value)" string per row to data_to_insert
        ...

    insert_query = (
        f"INSERT INTO {per_transaction_table} "
        f"({company_ref_id_id_column}, {rrn_column}, "
        f"{transaction_type_ref_id_id_column}, {transactionamount_column}) "
        f"VALUES {','.join(data_to_insert)} "
        f"ON CONFLICT ({rrn_column}) DO UPDATE SET "
        f"{company_ref_id_id_column} = EXCLUDED.{company_ref_id_id_column};"
    )
    cursor.execute(insert_query)
    offset += batch_size
Solution
A faster way could be to use a prepared statement in the session, and then repeatedly EXECUTE it with each new batch of rows.
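A minimal sketch of that idea with psycopg2 (the driver shown in the question's traceback), reusing the question's table and column name variables; the statement name and the parameter types in PREPARE are illustrative assumptions:

from django.db import connection

cursor = connection.cursor()

# Plan the upsert once per session. The parameter types (bigint, text, numeric)
# are assumptions; adjust them to the real column types.
cursor.execute(f"""
    PREPARE per_txn_upsert (bigint, text, bigint, numeric) AS
    INSERT INTO {per_transaction_table}
        ({company_ref_id_id_column}, {rrn_column},
         {transaction_type_ref_id_id_column}, {transactionamount_column})
    VALUES ($1, $2, $3, $4)
    ON CONFLICT ({rrn_column})
    DO UPDATE SET {company_ref_id_id_column} = EXCLUDED.{company_ref_id_id_column}
""")

# Each batch of computed rows then becomes a loop of EXECUTEs; PostgreSQL
# reuses the prepared plan instead of parsing and planning the INSERT each time.
for row in computed_rows:  # computed_rows: assumed list of (company_ref_id, rrn, type_ref_id, amount) tuples
    cursor.execute("EXECUTE per_txn_upsert (%s, %s, %s, %s)", row)

Each EXECUTE still sends one row over the wire, so wrapping each batch in a single transaction (or combining the loop with psycopg2.extras.execute_batch) helps keep round-trip and commit overhead down.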
An even faster way would be to use COPY (optionally WITH (FREEZE) on unlogged tables: https://www.postgresql.org/docs/current/sql-copy.html) and add indexes and constraints later.
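A rough sketch of the COPY route using psycopg2's copy_expert, again reusing the question's template variables; the buffer and the computed_rows iterable are made up for illustration:

import io

from django.db import connection

cursor = connection.cursor()  # Django's cursor delegates copy_expert to the underlying psycopg2 cursor

# Serialize the computed rows into COPY's default tab-separated text format.
# (Values containing tabs, newlines or backslashes would need escaping, or
# switch to FORMAT csv and build the buffer with the csv module.)
buf = io.StringIO()
for company_ref_id, rrn, type_ref_id, amount in computed_rows:  # computed_rows is assumed
    buf.write(f"{company_ref_id}\t{rrn}\t{type_ref_id}\t{amount}\n")
buf.seek(0)

cursor.copy_expert(
    f"COPY {per_transaction_table} "
    f"({company_ref_id_id_column}, {rrn_column}, "
    f"{transaction_type_ref_id_id_column}, {transactionamount_column}) "
    "FROM STDIN",
    buf,
)

Note that COPY has no ON CONFLICT clause, so if the upsert behaviour from the question is still required, a common pattern is to COPY into a temporary staging table first and then run a single INSERT ... ON CONFLICT ... SELECT from that table.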
Answered By - Florian Klein