Issue
I'm using multiprocessing in a larger code base where some of the import statements have side effects. How can I run a function in a background process without having it inherit global imports?
# helper.py:
print('This message should only print once!')

# main.py:
import multiprocessing as mp
import helper  # This prints the message.

def worker():
    pass  # Unfortunately this also prints the message again.

if __name__ == '__main__':
    mp.set_start_method('spawn')
    process = mp.Process(target=worker)
    process.start()
    process.join()
Background: Importing TensorFlow initializes CUDA, which reserves some amount of GPU memory. As a result, spawning too many processes leads to a CUDA OOM error, even though the processes don't use TensorFlow.
Similar question without an answer:
Solution
Is there a resource that explains exactly what the multiprocessing module does when starting an mp.Process?
Super quick version (using the spawn context, not fork):

Some stuff (a pair of pipes for communication, cleanup callbacks, etc.) is prepared, then a new process is created with fork() + exec(). On Windows it's CreateProcessW(). The new Python interpreter is called with the startup script spawn_main() and is passed the communication pipe file descriptors via a crafted command string and the -c switch. The startup script cleans up the environment a little bit, then unpickles the Process object from its communication pipe. Finally, it calls the run method of the process object.
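You can peek at that crafted command string yourself: the spawn module builds it with its get_command_line() helper. This is an internal implementation detail rather than a public API, so the exact output varies by platform and Python version. A minimal sketch:

# show_spawn_cmdline.py -- illustrative only; get_command_line() is an
# internal helper of multiprocessing.spawn, so its output may change
# between Python versions and differs between platforms.
import multiprocessing.spawn as mp_spawn

# pipe_handle=7 is a made-up file descriptor purely for illustration;
# the real value is chosen by multiprocessing when it creates the pipes.
print(mp_spawn.get_command_line(pipe_handle=7))
# Typically prints something along the lines of:
# ['/usr/bin/python3', '-c',
#  'from multiprocessing.spawn import spawn_main; spawn_main(pipe_handle=7)',
#  '--multiprocessing-fork']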
So what about importing of modules?

Pickle semantics handle some of it, but __main__ and sys.modules need some TLC, which is handled during the "cleans up the environment" bit.
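That __main__ fixup is why the example in the question prints the message twice: under spawn, the child re-imports main.py, so the module-level import helper runs again before worker() even starts. The original answer stops at explaining the mechanism; as one common workaround (a sketch, not part of the answer), you can keep the side-effecting import under the __main__ guard, which the re-imported child module does not execute:

# main.py -- a sketch of one common workaround, not from the original answer.
# Module-level code in main.py runs again when the spawned child re-imports
# __main__, so the side-effecting import is moved under the __main__ guard,
# which the child skips.
import multiprocessing as mp

def worker():
    pass  # The worker never touches helper, so the child stays clean.

if __name__ == '__main__':
    import helper  # Runs only in the parent; prints the message once.
    mp.set_start_method('spawn')
    process = mp.Process(target=worker)
    process.start()
    process.join()

An equivalent alternative is to import helper lazily inside the functions that actually need it, so processes that never call those functions never pay the import cost.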
Answered By - Aaron