Issue
I have a Python process that uses SQLAlchemy to insert data into a MS SQL Server DB. When the Python process runs, it hangs during the insert. I turned on SQLAlchemy logging to get more information and found that it hangs at this point, where SQLAlchemy appears to be requesting table schema info for the entire DB:
2020-10-30 08:12:07 [11444:6368] sqlalchemy.engine.base.Engine._execute_context(base.py:1235) INFO: SELECT [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME]
FROM [INFORMATION_SCHEMA].[TABLES]
WHERE [INFORMATION_SCHEMA].[TABLES].[TABLE_SCHEMA] = CAST(? AS NVARCHAR(max)) AND [INFORMATION_SCHEMA].[TABLES].[TABLE_TYPE] = CAST(? AS NVARCHAR(max)) ORDER BY [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME]
2020-10-30 08:12:07 [11444:6368] sqlalchemy.engine.base.Engine._execute_context(base.py:1240) INFO: ('dbo', 'BASE TABLE')
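For reference, one way to get this level of logging is a minimal setup with the standard logging module (the custom log format seen in the excerpts here is omitted):

import logging

# Route SQLAlchemy engine logs to stderr. INFO logs the emitted SQL and its
# parameters; DEBUG additionally logs result columns and rows, as seen below.
logging.basicConfig()
logging.getLogger("sqlalchemy.engine").setLevel(logging.DEBUG)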
I have other "stuff" going on in the DB at this time, including some open transactions and my guess is that for whatever reason querying [INFORMATION_SCHEMA].[TABLES]
creates some deadlock or blocks somehow.
I've also read (here) that [INFORMATION_SCHEMA].[TABLES] is a view that cannot cause a deadlock, which would contradict my guess about what is causing this issue.
My question is: Can I alter the configuration/settings of SQLAlchemy so that it does not make this query in the first place?
UPDATE 1: The Python code for the insert looks like this:
import sqlalchemy

with sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params).connect() as connection:
    # df is a Pandas DataFrame
    df.to_sql(name=my_table, con=connection, if_exists='append', index=False)
Note that the code works without any problems when I run the Python script at other times of the day, when I don't have those other DB transactions going on. In those cases, the log continues immediately like this, listing all the tables in the DB:
2020-10-30 08:13:03 [11444:6368] sqlalchemy.engine.base.Engine._execute_context(base.py:1235) INFO: SELECT [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME]
FROM [INFORMATION_SCHEMA].[TABLES]
WHERE [INFORMATION_SCHEMA].[TABLES].[TABLE_SCHEMA] = CAST(? AS NVARCHAR(max)) AND [INFORMATION_SCHEMA].[TABLES].[TABLE_TYPE] = CAST(? AS NVARCHAR(max)) ORDER BY [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME]
2020-10-30 08:13:03 [11444:6368] sqlalchemy.engine.base.Engine._execute_context(base.py:1240) INFO: ('dbo', 'BASE TABLE')
2020-10-30 08:13:03 [11444:6368] sqlalchemy.engine.base.Engine._init_metadata(result.py:810) DEBUG: Col ('TABLE_NAME',)
2020-10-30 08:13:03 [11444:6368] sqlalchemy.engine.base.Engine.process_rows(result.py:1260) DEBUG: Row ('t_table1',)
2020-10-30 08:13:03 [11444:6368] sqlalchemy.engine.base.Engine.process_rows(result.py:1260) DEBUG: Row ('t_table2',)
...
UPDATE 2: Apparently, when a table or other object is created in an open transaction and not yet committed, querying [INFORMATION_SCHEMA].[TABLES] will get blocked (source). Is anyone familiar enough with the internals of SQLAlchemy to suggest how to prevent it from making this query in the first place?
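For illustration, here is a minimal two-connection sketch of that blocking scenario, reusing the params connection string from UPDATE 1 (the table name t_new_table is made up):

import sqlalchemy

engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)

# Connection 1: create a table inside a transaction and leave it uncommitted.
conn1 = engine.connect()
tx = conn1.begin()
conn1.execute(sqlalchemy.text("CREATE TABLE t_new_table (id INT)"))

# Connection 2: this metadata query now waits on the schema lock held by
# conn1 and only returns once that transaction commits or rolls back.
conn2 = engine.connect()
conn2.execute(sqlalchemy.text(
    "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES "
    "WHERE TABLE_SCHEMA = 'dbo' AND TABLE_TYPE = 'BASE TABLE'"
))  # hangs here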
UPDATE 3: After posting this issue on the SQLAlchemy github (issue link), the SQLAlchemy devs confirmed that the query of [INFORMATION_SCHEMA].[TABLES] is in fact being issued by the Pandas function to_sql().
So, my new question is: does anyone know how to disable this behavior in the Pandas to_sql() function? I looked over the documentation and could not find anything that seemed likely to help.
Solution
As of pandas v2.2.0, you can override the pandas method that runs the check that causes the block/deadlock. Add this before calling to_sql():
from pandas.io.sql import SQLDatabase

def pass_check_case_sensitive(*args, **kwargs):
    pass

# Replace the case-sensitivity check that runs the blocking query with a no-op.
SQLDatabase.check_case_sensitive = pass_check_case_sensitive
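Applied to the code from UPDATE 1, the whole flow would look something like this (params, df, and my_table as in the question):

import sqlalchemy
from pandas.io.sql import SQLDatabase

def pass_check_case_sensitive(*args, **kwargs):
    pass

# Patch before calling to_sql so the blocking metadata query is skipped.
SQLDatabase.check_case_sensitive = pass_check_case_sensitive

with sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params).connect() as connection:
    df.to_sql(name=my_table, con=connection, if_exists='append', index=False)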
Answered By - Joe