Issue
I have a Python process that uses SQLAlchemy to insert data into a MS SQL Server DB. When the Python process runs, it hangs during the insert. I turned on SQLAlchemy logging to get more information and found that it hangs at this point, where SQLAlchemy appears to be requesting table schema info for the entire DB:
2020-10-30 08:12:07 [11444:6368] sqlalchemy.engine.base.Engine._execute_context(base.py:1235) INFO: SELECT [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME]
FROM [INFORMATION_SCHEMA].[TABLES]
WHERE [INFORMATION_SCHEMA].[TABLES].[TABLE_SCHEMA] = CAST(? AS NVARCHAR(max)) AND [INFORMATION_SCHEMA].[TABLES].[TABLE_TYPE] = CAST(? AS NVARCHAR(max)) ORDER BY [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME]
2020-10-30 08:12:07 [11444:6368] sqlalchemy.engine.base.Engine._execute_context(base.py:1240) INFO: ('dbo', 'BASE TABLE')
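For reference, one way to get this level of logging is a minimal setup with the standard logging module (the custom log format seen in the excerpts here is omitted):

import logging

# Route SQLAlchemy engine logs to stderr. INFO logs the emitted SQL and its
# parameters; DEBUG additionally logs result columns and rows, as seen below.
logging.basicConfig()
logging.getLogger("sqlalchemy.engine").setLevel(logging.DEBUG)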
I have other "stuff" going on in the DB at this time, including some open transactions and my guess is that for whatever reason querying [INFORMATION_SCHEMA].[TABLES]
creates some deadlock or blocks somehow.
I've also read (here) that [INFORMATION_SCHEMA].[TABLES] is a view that cannot cause a deadlock, which would contradict my guess about what is causing this issue.
My question is: Can I alter the configuration/settings of SQLAlchemy so that it does not make this query in the first place?
UPDATE 1: The Python code for the insert looks like this:
import sqlalchemy

with sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params).connect() as connection:
    # df is a Pandas DataFrame
    df.to_sql(name=my_table, con=connection, if_exists='append', index=False)
Note that the code works without any problems when I run the Python script at other times of the day, when I don't have those other DB transactions going on. In those cases, the log continues immediately like this, listing all the tables in the DB:
2020-10-30 08:13:03 [11444:6368] sqlalchemy.engine.base.Engine._execute_context(base.py:1235) INFO: SELECT [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME]
FROM [INFORMATION_SCHEMA].[TABLES]
WHERE [INFORMATION_SCHEMA].[TABLES].[TABLE_SCHEMA] = CAST(? AS NVARCHAR(max)) AND [INFORMATION_SCHEMA].[TABLES].[TABLE_TYPE] = CAST(? AS NVARCHAR(max)) ORDER BY [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME]
2020-10-30 08:13:03 [11444:6368] sqlalchemy.engine.base.Engine._execute_context(base.py:1240) INFO: ('dbo', 'BASE TABLE')
2020-10-30 08:13:03 [11444:6368] sqlalchemy.engine.base.Engine._init_metadata(result.py:810) DEBUG: Col ('TABLE_NAME',)
2020-10-30 08:13:03 [11444:6368] sqlalchemy.engine.base.Engine.process_rows(result.py:1260) DEBUG: Row ('t_table1',)
2020-10-30 08:13:03 [11444:6368] sqlalchemy.engine.base.Engine.process_rows(result.py:1260) DEBUG: Row ('t_table2',)
...
UPDATE 2: Apparently, when a table or other object is created in an open transaction and not yet committed, querying [INFORMATION_SCHEMA].[TABLES] will get blocked (source). Is anyone familiar enough with the internals of SQLAlchemy to suggest how to prevent it from making this query in the first place?
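For illustration, here is a minimal two-connection sketch of that blocking scenario, reusing the params connection string from UPDATE 1 (the table name t_new_table is made up):

import sqlalchemy

engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)

# Connection 1: create a table inside a transaction and leave it uncommitted.
conn1 = engine.connect()
tx = conn1.begin()
conn1.execute(sqlalchemy.text("CREATE TABLE t_new_table (id INT)"))

# Connection 2: this metadata query now waits on the schema lock held by
# conn1 and only returns once that transaction commits or rolls back.
conn2 = engine.connect()
conn2.execute(sqlalchemy.text(
    "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES "
    "WHERE TABLE_SCHEMA = 'dbo' AND TABLE_TYPE = 'BASE TABLE'"
))  # hangs here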
UPDATE 3: After posting this issue on the SQLAlchemy github (issue link), the SQLAlchemy devs confirmed that the query of [INFORMATION_SCHEMA].[TABLES] is in fact being issued by the Pandas function to_sql().
So, my new question is: does anyone know how to disable this behavior in the Pandas to_sql() function? I looked over the documentation and could not find anything that seemed likely to help.
Solution
As of pandas v2.2.0, you can override the pandas method that runs the check that causes the block/deadlock. Add this before calling to_sql():
from pandas.io.sql import SQLDatabase

def pass_check_case_sensitive(*args, **kwargs):
    pass

# Replace the case-sensitivity check that runs the blocking query with a no-op.
SQLDatabase.check_case_sensitive = pass_check_case_sensitive
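Applied to the code from UPDATE 1, the whole flow would look something like this (params, df, and my_table as in the question):

import sqlalchemy
from pandas.io.sql import SQLDatabase

def pass_check_case_sensitive(*args, **kwargs):
    pass

# Patch before calling to_sql so the blocking metadata query is skipped.
SQLDatabase.check_case_sensitive = pass_check_case_sensitive

with sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params).connect() as connection:
    df.to_sql(name=my_table, con=connection, if_exists='append', index=False)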
Answered By - Joe