Issue
I have three tables which each takes around 1 minute to query from (i.e total of 3 min) like this
from my_utils import get_engine
import pandas as pd
def main():
con1 = get_engine("table1")
con2 = get_engine("table2")
con3 = get_engine("table3")
df1 = pd.read_sql(query1,con=con1)
df2 = pd.read_sql(query2,con=con2)
df3 = pd.read_sql(query3,con=con3)
main()
which yields to the sky to be "asynchronized".
I have thus tried the following (I'm rather new to using asyncio
)
.
.
import asyncio
async def get_df1(query1):
df1 = pd.read_sql(query1,con=con1)
return df1
async def get_df2(query2):
df2 = pd.read_sql(query2,con=con2)
return df2
async def get_df3(query3):
df3 = pd.read_sql(query3,con=con3)
return df3
async def main():
df1,df2,df3 = await asyncio.gather(get_df1(),get_df2(),get_df3())
asyncio.run(main())
It runs, but it takes the exact same time, as the sync-run.
Am I missing something?
Solution
Switching between coroutines only occurs at an await
statement, and since there are no await
s in your get_df
functions your three queries will only ever be executed sequentially. Since pd.read_sql
is not natively async, you'll have to wrap it with an executor to make an async version:
async def read_sql_async(stmt, con):
loop = asyncio.get_event_loop()
return await loop.run_in_executor(None, pd.read_sql, stmt, con)
You'll then be able to run read_sql
as an awaitable:
df1 = await read_sql_async(query1, con=con1)
Answered By - Simon Bowly
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.