Issue
I have a simple Python async program that retrieves a gzipped file from a particular URL. I have used aiohttp for the async requests. Following the aiohttp docs (https://docs.aiohttp.org/en/stable/client_quickstart.html), I used their example under 'Streaming Response Content' in my test method to write the data:
async def main(url):
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(test(session, url))

async def test(session, url):
    async with session.get(url=url) as r:
        with open('test.csv.gz', 'wb') as f:
            async for chunk in r.content.iter_chunked(1024):
                f.write(chunk)
However, I am not sure whether the code in test() is actually asynchronous. Many articles I've read mention that the 'await' keyword is required in async coroutines to activate the asynchronicity (e.g. something like r = await session.get(url=url)), but I'm wondering whether the 'async with' and 'async for' patterns also achieve the same thing?
What I am hoping to achieve is async functionality both when doing session.get() and when writing the data locally, such that if I pass in many URLs it will a) perform async switching when fetching each URL and b) perform async switching when writing the data to disk.
For b), would I need to use something like the following?
async with aiofiles.open('test.csv.gz', 'wb') as f:
    async for chunk in r.content.iter_chunked(1024):
        await f.write(chunk)
This leads me to a slightly off-topic question, but what is the difference between async with session.get(url=url) as r: and r = await session.get(url=url)?
Please let me know if my understanding is flawed or if there is something fundamental I am missing regarding the async functionality!
Solution
Good question. Look at the following little program, which runs two tasks. Each has an async context manager and an async iterator:
import asyncio
from contextlib import asynccontextmanager

@asynccontextmanager
async def nada():
    # await asyncio.sleep(0.0)
    yield

async def aloop():
    for _ in range(5):
        # await asyncio.sleep(0.0)
        yield

async def atask(name):
    async for _ in aloop():
        async with nada():
            print("Task", name)

async def main():
    asyncio.create_task(atask("1"))
    await asyncio.create_task(atask("2"))

if __name__ == "__main__":
    asyncio.run(main())
Output:
Task 1
Task 1
Task 1
Task 1
Task 1
Task 2
Task 2
Task 2
Task 2
Task 2
No task switching occurs.
Now uncomment the await asyncio.sleep(0.0) in either the context manager (nada) or the iterator (aloop). The output becomes:
Task 1
Task 2
Task 1
Task 2
Task 1
Task 2
Task 1
Task 2
Task 1
Task 2
Now task switching does occur. But your main program is exactly the same in both cases.
So the answer to your first question is that the 'async with' and 'async for' patterns do not necessarily cause task switching. It depends on their implementation. Both async context managers and async iterators invoke special machinery; if that machinery executes an await expression, task switching will occur. But Python does not require async context managers or async iterators to do that.
This is perfectly legal Python:
async def do_nothing():
    pass
As a practical matter, you are probably OK to trust a widely deployed library like aiohttp. There is not much value in declaring a method async and never performing an await within it. The only use case I can think of is when an API requires a coroutine but you have no need for asynchronous behavior. It would be poor design to put that sort of function in a general-use library, at any rate not without clear documentation.
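To make that point concrete, here is a minimal, self-contained sketch (the names do_nothing and other are made up for illustration): awaiting a coroutine that contains no await never gives the event loop a chance to run other tasks, while awaiting asyncio.sleep(0) does.

```python
import asyncio

order = []  # records the sequence in which things run

async def do_nothing():
    # Legal coroutine with no await anywhere inside it.
    order.append("do_nothing ran")

async def other():
    order.append("other ran")

async def main():
    asyncio.create_task(other())  # schedule, but don't run yet
    await do_nothing()            # does NOT yield to the event loop
    order.append("after await")
    await asyncio.sleep(0)        # this await DOES let `other` run
    order.append("after sleep")

asyncio.run(main())
print(order)
```

Even though do_nothing() is awaited, `other` does not get a turn until the sleep(0), which shows that it is the await inside the machinery, not the await at the call site, that enables task switching.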
Your second question - what's the difference between async with session.get(url=url) as r: and r = await session.get(url=url) - is that the first form executes two special methods and the second one doesn't. The first one is somewhat equivalent to:
x = session.get(url=url)
r = await x.__aenter__()
try:
    # the indented block of code executes here
finally:
    await x.__aexit__(...)
The __aexit__ method takes some arguments having to do with exception handling, which you can read about in the docs. Synchronous context managers are similar, except the special methods are named __enter__ and __exit__.
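For completeness, here is a hedged sketch of a hand-written async context manager (the class name Managed and the string it returns are made up for illustration), showing that async with and the manual __aenter__/__aexit__ calls do the same thing:

```python
import asyncio

class Managed:
    # A minimal async context manager: the machinery `async with` invokes.
    async def __aenter__(self):
        await asyncio.sleep(0)   # this await is what allows task switching
        return "resource"

    async def __aexit__(self, exc_type, exc, tb):
        await asyncio.sleep(0)   # cleanup may also await
        return False             # do not suppress exceptions

results = []

async def main():
    # Form 1: the async with statement.
    async with Managed() as r:
        results.append(r)

    # Form 2: the rough manual equivalent.
    m = Managed()
    r = await m.__aenter__()
    try:
        results.append(r)
    finally:
        await m.__aexit__(None, None, None)

asyncio.run(main())
```

Both forms bind the value returned by __aenter__ and guarantee __aexit__ runs afterward; async with is just the safer, shorter spelling.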
Answered By - Paul Cornelius