Issue
I'm a little confused about the interaction between multiprocessing and asyncio. My goal is to be able to spawn async processes from other async processes. Here is a small example:
import asyncio
from multiprocessing import Process

async def sleep_n(n):
    await asyncio.sleep(n)

def async_sleep(n):
    # This does not work
    #
    # loop = asyncio.get_event_loop()
    # loop.run_until_complete(sleep_n(n))

    # This works
    asyncio.run(sleep_n(n))

async def spawn_another():
    await asyncio.sleep(0.2)
    p = Process(target=async_sleep, args=(5,))
    p.start()
    p.join()

def spawn():
    # This does not work
    # loop = asyncio.get_event_loop()
    # loop.run_until_complete(spawn_another())

    # This works
    asyncio.run(spawn_another())

def doit():
    p = Process(target=spawn)
    p.start()
    p.join()

if __name__ == '__main__':
    doit()
If I replace asyncio.run with get_event_loop().run_until_complete, I get the following error: "The event loop is already running". This is raised from loop.run_until_complete(sleep_n(n)). What's the difference between these two?
(NB: the reason I care about this, if it makes a difference in the proposed remedy, is that in my actual code the thing I'm running async is a grpc.aio client, which apparently requires me to use run_until_complete; otherwise I get an error about a Future that's attached to a different event loop. That said, this is just an aside and not really material to the question above.)
Solution
I think I've pinned it down. It's an issue with how multiprocessing works on Linux versus Windows/macOS. From the Python docs:
Contexts and start methods
Depending on the platform, multiprocessing supports three ways to start a process. These start methods are
spawn
The parent process starts a fresh Python interpreter process. The child process will only inherit those resources necessary to run the process object’s run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.
Available on Unix and Windows. The default on Windows and macOS.
fork
The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.
Available on Unix only. The default on Unix.
forkserver
When the program starts and selects the forkserver start method, a server process is started. From then on, whenever a new process is needed, the parent process connects to the server and requests that it fork a new process. The fork server process is single threaded so it is safe for it to use os.fork(). No unnecessary resources are inherited.
Available on Unix platforms which support passing file descriptors over Unix pipes.
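As a quick check, both the set of start methods your platform supports and the current default can be inspected at runtime:

```python
import multiprocessing as mp

# All start methods supported on this platform,
# e.g. ['fork', 'spawn', 'forkserver'] on Linux, ['spawn'] on Windows
methods = mp.get_all_start_methods()

# The method currently in effect (the platform default unless overridden)
current = mp.get_start_method()

print(methods, current)
```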
So this works on macOS and Windows because the default there is spawn, while it fails on Linux where the default is fork. Because we're using fork, the child inherits the parent's entire memory image, meaning that we're sharing the existing, already instantiated local event loop in the new process. That's why the event loop reports that it is already running (and asyncio is by design non-re-entrant).
To get around this you can set the start method to spawn manually in the main block. With this method, a fresh interpreter is invoked for each child process, so there is no existing event loop to conflict with.
if __name__ == '__main__':
    import multiprocessing as mp
    mp.set_start_method('spawn')
    doit()
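If you'd rather not change the process-wide default (set_start_method can only be called once per program), an alternative sketch is to use multiprocessing.get_context, which gives you a context object bound to one start method while leaving the global default untouched:

```python
import multiprocessing as mp

def child():
    # runs in a fresh interpreter under the 'spawn' start method
    print('child done')

if __name__ == '__main__':
    # a context exposes the same API (Process, Queue, ...) as the
    # top-level multiprocessing module, scoped to this start method
    ctx = mp.get_context('spawn')
    p = ctx.Process(target=child)
    p.start()
    p.join()
```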
Answered By - flakes