Issue
I tried to update multiple rows (approx. 350000) with a single query by implementing the following function:
def update_items(rows_to_update):
sql_query = """UPDATE contact as t SET
name = e.name
FROM (VALUES %s) AS e(id, name)
WHERE e.id = t.id;"""
conn = get_db_connection()
cur = conn.cursor()
psycopg2.extras.execute_values (
cur, sql_query, rows_to_update, template=None, page_size=100
)
While trying to run the function above, only 31 records were updated. Then, I tried to update row by row with the following function:
def update_items_row_by_row(rows_to_update):
sql_query = """UPDATE contact SET name = %s WHERE id = %s"""
conn = get_db_connection()
with tqdm(total=len(rows_to_update)) as pbar:
for id, name in rows_to_update:
cur = conn.cursor()
# execute the UPDATE statement
cur.execute(sql_query, (name, id))
# get the number of updated rows
# Commit the changes to the database
conn.commit()
cur.close()
pbar.update(1)
The latter has updated all the records so far but is very slow (estimated to end in 9 hours). Does anyone know what is the efficient way to update multiple records?
Solution
By splitting the list into chunks of size equal to page_size, it worked well:
def update_items(rows_to_update):
sql_query = """UPDATE contact as t SET
name = data.name
FROM (VALUES %s) AS data (id, name)
WHERE t.id = data.id"""
conn = get_db_connection()
cur = conn.cursor()
n = 100
with tqdm(total=len(rows_to_update)) as pbar:
for i in range(0, len(rows_to_update), n):
psycopg2.extras.execute_values (
cur, sql_query, rows_to_update[i:i + n], template=None, page_size=n
)
conn.commit()
pbar.update(cur.rowcount)
cur.close()
conn.close()
Answered By - Niko Gamulin
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.