Thursday, March 24, 2022

[FIXED] Python asyncio - how to use if function being called has nothing to return

March 24, 2022 python, python-3.6, python-asyncio No comments

Issue

I have 200 pairs of paths to diff. I wrote a little function that will diff each pair and update a dictionary which itself is one of the arguments to the function. Assume MY_DIFFER is some diffing tool I am calling via subprocess under the hood.

async def do_diff(path1, path2, result):
    result[f"{path1} {path2}"] = MY_DIFFER(path1, path2)

As you can see I have nothing to return from this async function. I am just capturing the result in result.

I call this function in parallel elsewhere using asyncio like so:

path_tuples = [("/path11", "/path12"), ("/path21", "/path22"), ... ]
result = {}
loop = asyncio.get_event_loop()
loop.run_until_complete(
    asyncio.gather(
        *(do_diff(path1, path2, result) for path1, path2 in path_tuples)
    )
)

Questions:

I don't know where to put await in the do_diff function. But the code seems to work without it as well.
I am not sure if the diffs are really happening in parallel, because when I look at the output of ps -eaf in another terminal, I see only one instance of the underlying tool I am calling at a time.
The speed of execution is same as when I was doing the diffs sequentially

So I am clearly doing something wrong. How can I REALLY do the diffs in parallel?

PS: I am in Python 3.6

Solution

Remember that asyncio doesn't run things in parallel, it runs things concurrently, using a cooperative multitasking model -- which means that coroutines need to explicitly yield time to other coroutines for them to run. This is what the await command does; it says "go run some other coroutines while I'm waiting for something to finish".

If you're never awaiting on something, you're not getting concurrent execution.

What you want is for your do_diff method to be able to await on the execution of your external tool, but you can't do that with just the subprocess module. You can do that using the run_in_executor method, which arranges to run a synchronous command (e.g., subprocess.run) in a separate thread or process and wait asynchronously for the result. That might look something like:

async def do_diff(path1, path2, result):
    loop = asyncio.get_event_loop()
    result[f"{path1} {path2}"] = await loop.run_in_executor(None, MY_DIFFER, path1, path2)

This will by default run MY_DIFFER in a separate thread, although you can utilize a separate process instead by passing an explicit executor as the first argument to run_in_executor.

Per my comment, solving this with concurrent.futures might look something like this:

import concurrent.futures
import time


# dummy function that just sleeps for 2 seconds
# replace this with your actual code
def do_diff(path1, path2):
    print(f"diffing path {path1} and {path2}")
    time.sleep(2)
    return path1, path2, "information about diff"


# create 200 path tuples for demonstration purposes
path_tuples = [(f"/path{x}.1", f"/path{x}.2") for x in range(200)]
futures = []

with concurrent.futures.ProcessPoolExecutor(max_workers=100) as executor:
    for path1, path2 in path_tuples:
        # submit the job to the executor
        futures.append(executor.submit(do_diff, path1, path2))

# read the results
for future in futures:
    print(future.result())

Answered By - larsks

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Thursday, March 24, 2022

[FIXED] Python asyncio - how to use if function being called has nothing to return

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels