Issue
I'm attempting to write some asynchronous GET requests with the aiohttp package, and have most of the pieces figured out, but am wondering what the standard approach is when handling the failures (returned as exceptions).
A general idea of my code so far (after some trial and error, I am following the approach here):
import asyncio
import aiofiles
import aiohttp
from pathlib import Path

with open('urls.txt', 'r') as f:
    urls = [s.rstrip() for s in f.readlines()]

async def fetch(session, url):
    async with session.get(url) as response:
        if response.status != 200:
            response.raise_for_status()
        data = await response.text()
    # (Omitted: some more URL processing goes on here)
    out_path = Path(f'out/')
    if not out_path.is_dir():
        out_path.mkdir()
    fname = url.split("/")[-1]
    async with aiofiles.open(out_path / f'{fname}.html', 'w+') as f:
        await f.write(data)

async def fetch_all(urls, loop):
    async with aiohttp.ClientSession(loop=loop) as session:
        results = await asyncio.gather(*[fetch(session, url) for url in urls],
                                       return_exceptions=True)
        return results

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    results = loop.run_until_complete(fetch_all(urls, loop))
Now this runs fine:
- As expected, the results variable is populated with None entries where the corresponding URL [i.e. at the same index in the urls array variable, i.e. at the same line number in the input file urls.txt] was successfully requested, and the corresponding file is written to disk.
- This means I can use the results variable to determine which URLs were not successful (those entries in results not equal to None).
I have looked at a few different guides to using the various asynchronous Python packages (aiohttp, aiofiles, and asyncio) but I haven't seen the standard way to handle this final step.
- Should the retrying to send a GET request be done after the await statement has 'finished'/'completed'?
- ...or should the retrying to send a GET request be initiated by some sort of callback upon failure?
  - The errors look like this: (ClientConnectorError(111, "Connect call failed ('000.XXX.XXX.XXX', 443)") i.e. the request to IP address 000.XXX.XXX.XXX at port 443 failed, probably because there's some limit from the server which I should respect by waiting with a timeout before retrying.
- Is there some sort of limit I might consider putting on, to batch the number of requests rather than trying them all?
  - I am getting about 40-60 successful requests when attempting a few hundred (over 500) URLs in my list.
Naively, I was expecting run_until_complete to handle this in such a way that it would finish upon succeeding at requesting all URLs, but this isn't the case.
I haven't worked with asynchronous Python and sessions/loops before, so would appreciate any help to find how to get the results. Please let me know if I can give any more information to improve this question, thank you!
Solution
> Should the retrying to send a GET request be done after the await statement has 'finished'/'completed'? ...or should the retrying to send a GET request be initiated by some sort of callback upon failure
You can do the former. You don't need any special callback, since you are executing inside a coroutine, so a simple while loop will suffice, and won't interfere with execution of other coroutines. For example:
async def fetch(session, url):
    data = None
    while data is None:
        try:
            async with session.get(url) as response:
                response.raise_for_status()
                data = await response.text()
        except aiohttp.ClientError:
            # sleep a little and try again
            await asyncio.sleep(1)
    # (Omitted: some more URL processing goes on here)
    out_path = Path(f'out/')
    if not out_path.is_dir():
        out_path.mkdir()
    fname = url.split("/")[-1]
    async with aiofiles.open(out_path / f'{fname}.html', 'w+') as f:
        await f.write(data)
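The loop above retries forever, which may not be what you want if a URL is permanently broken. As a rough sketch (not part of the original answer), the same idea can be written as a small helper that gives up after a fixed number of tries, waits longer between tries, and optionally caps how many requests run at once. The helper name `retry` and its parameters `make_attempt`, `retry_on`, `attempts`, `base_delay` and `limit` are all hypothetical names chosen for this illustration; only `asyncio.sleep` and `asyncio.Semaphore` are real library calls.

```python
import asyncio


async def retry(make_attempt, *, retry_on, attempts=5, base_delay=1.0, limit=None):
    """Await make_attempt() until it succeeds or `attempts` tries are used up.

    make_attempt: zero-argument callable returning a fresh awaitable per try.
    retry_on:     exception class (or tuple of classes) treated as transient.
    limit:        optional asyncio.Semaphore capping concurrent attempts.
    """
    for attempt in range(1, attempts + 1):
        try:
            if limit is None:
                return await make_attempt()
            # Hold a semaphore slot only while the attempt itself is running.
            async with limit:
                return await make_attempt()
        except retry_on:
            if attempt == attempts:
                raise  # out of tries: let the caller (or gather) see the error
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            # The slot has been released by now, so others can run meanwhile.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
```

With aiohttp, this could presumably be called as `await retry(lambda: download(session, url), retry_on=aiohttp.ClientError, limit=sem)`, where `download` is a hypothetical coroutine holding the `session.get(...)` part of `fetch` and `sem` is an `asyncio.Semaphore` created inside the running coroutine. The right number of tries, delay and concurrency limit depend on the server, so treat the defaults as placeholders.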
> Naively, I was expecting run_until_complete to handle this in such a way that it would finish upon succeeding at requesting all URLs
The term "complete" is meant in the technical sense of a coroutine completing (running its course), which is achieved either by the coroutine returning or raising an exception.
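To make that concrete, here is a minimal sketch using only asyncio. The coroutine `job` is a hypothetical stand-in for `fetch` that fails on odd inputs, so no network is involved. Every coroutine "completes", either by returning or by raising, and with `return_exceptions=True` the raised exceptions simply appear in the result list at the position of the input that caused them.

```python
import asyncio


async def job(i):
    # Hypothetical stand-in for fetch(): odd inputs fail, even ones succeed.
    await asyncio.sleep(0)
    if i % 2:
        raise ConnectionError(f"job {i} failed")
    return None


async def main():
    inputs = [0, 1, 2, 3]
    results = await asyncio.gather(*(job(i) for i in inputs),
                                   return_exceptions=True)
    # gather preserves input order, so results[k] belongs to inputs[k].
    return [x for x, r in zip(inputs, results) if isinstance(r, Exception)]


failed = asyncio.run(main())
print(failed)  # → [1, 3]
```

The `failed` list is what you would feed into another pass if you prefer to retry after the `await` has finished rather than inside each coroutine.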
Answered By - user4815162342