Issue
I'm trying to write some asynchronous code. I started with some publicly available code like the following:
```python
import asyncio
import aiohttp

# aiohttp needs absolute URLs that include a scheme such as https://
urls = ['https://www.example.com/1', 'https://www.example.com/2', ...]
tasks = []

async def fetch(url, session) -> str:
    async with session.get(url) as resp:
        return await resp.text()

async def main():
    async with aiohttp.ClientSession() as session:
        for url in urls:
            tasks.append(asyncio.create_task(fetch(url, session)))
        # Await inside the `async with` block so the session stays open
        # until every request has finished.
        response = await asyncio.gather(*tasks, return_exceptions=True)

asyncio.run(main())
```
I realized that there is another way to get the same result by writing `main()` as below:
```python
async def main_2():
    async with aiohttp.ClientSession() as session:
        for url in urls:
            tasks.append(asyncio.create_task(fetch(url, session)))
        response = []
        for t in tasks:
            response.append(await t)
```
Both methods take the same time to finish. So, given how easy it is to process responses inside `main_2()`, what are the benefits of using `asyncio.gather`?
Solution
Advantages:
- It automatically schedules any coroutines as tasks for you. If you hadn't been creating the tasks manually, the non-`gather` approach wouldn't even start running them until you tried to `await` them (losing all the benefits of async processing), whereas `gather` would create tasks for all of them up front, then `await` them in bulk.
- When using `return_exceptions=False` (the default), you'll know when something has gone wrong immediately; with the loop, you might process dozens of results before one turns out to have failed. This may or may not be advantageous, depending on your needs. `asyncio.as_completed` may serve better in certain cases, since it hands you results in completion order as soon as they come in, rather than waiting for everything to finish.
- If you save off the `gather` to a name before `await`ing it, you can cancel all of its outstanding tasks with a single `gathername.cancel()` call, without needing to know which tasks need canceling. One caveat: with `return_exceptions=False`, once `gather` has propagated an exception it is already marked done, so calling `gathername.cancel()` in a `try:`/`except Exception:` block after the failure does not cancel the remaining tasks (the `asyncio.gather` documentation points this out explicitly). In that case, cancel the individual tasks instead.
Personally, I usually find `asyncio.as_completed` more useful, in the same way `multiprocessing.Pool.imap_unordered` is nicer than `multiprocessing.Pool.map` (because result ordering rarely matters, and it's nice to process results immediately as they become available), but `asyncio.gather` is the simpler "all-in-one, wait for everything before continuing" interface.
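For comparison, a minimal sketch of the `as_completed` style, again using a hypothetical `job` coroutine with `asyncio.sleep` in place of a real fetch. Results are handled in the order the tasks finish, whereas `gather` waits for everything and returns results in submission order.

```python
import asyncio


async def job(name: str, delay: float) -> str:
    # Hypothetical stand-in for fetch(): sleeps, then returns its name.
    await asyncio.sleep(delay)
    return name


async def main():
    specs = [("slow", 0.3), ("fast", 0.1), ("medium", 0.2)]

    # as_completed yields awaitables in completion order, so each result
    # can be processed as soon as it is ready.
    tasks = [asyncio.create_task(job(n, d)) for n, d in specs]
    completion_order = []
    for next_done in asyncio.as_completed(tasks):
        completion_order.append(await next_done)

    # gather waits for all of them and preserves submission order.
    submission_order = await asyncio.gather(*(job(n, d) for n, d in specs))
    return completion_order, submission_order


if __name__ == "__main__":
    completion, submission = asyncio.run(main())
    print("as_completed:", completion)  # ['fast', 'medium', 'slow']
    print("gather:      ", submission)  # ['slow', 'fast', 'medium']
```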
Answered By - ShadowRanger