Issue
I'm working with Python (2.7) and pymongo (3.3), and I need to spawn a child process to run a job asynchronously. Unfortunately pymongo is not fork-safe as described here (and I need to interact with the db before spawning the child process).
I ran an experiment using subprocess.Popen (with shell set to True and then False) and multiprocessing.Process. As far as I can tell, they both fork the parent process to create the child process, but only multiprocessing.Process causes pymongo to print its warning that it has detected a forked process.
I'm wondering what the Pythonic way of doing this is. It seems that os.system might do it for me, but subprocess is described as an intended replacement for os.system, so I wonder whether I'm missing something.
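A minimal sketch of the subprocess replacement for a backgrounded os.system call, assuming the job lives in its own script (any script name here, such as `job.py`, is hypothetical). os.system blocks until the command finishes, whereas subprocess.Popen returns immediately, so the parent can keep working and collect the exit status later. `start_job` is an illustrative helper, not part of any library.

```python
import subprocess
import sys


def start_job(args):
    """Launch a job in a new process and return without waiting for it.

    Popen forks and then execs a fresh Python interpreter, so the child
    shares no Python objects (including any MongoClient) with the parent.
    """
    # An argument list with shell=False avoids shell quoting problems.
    return subprocess.Popen([sys.executable] + list(args))


if __name__ == "__main__":
    # Stand-in job for demonstration; a real call might be
    # start_job(["job.py"]) with a hypothetical job.py script.
    proc = start_job(["-c", "import sys; sys.exit(3)"])
    # ... the parent is free to do other work here ...
    print(proc.wait())  # blocks only now; prints the child's exit code, 3
```

If you don't care about the exit status, you can simply never call wait(), but on Unix the finished child then lingers as a zombie until the parent reaps it or exits.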
Solution
I think you've misunderstood: because PyMongo's documentation warns that a single MongoClient is not fork-safe, you've interpreted that to mean PyMongo prohibits your whole program from ever creating subprocesses.
A single MongoClient is not fork-safe, meaning you must not create it before forking and then use that same MongoClient object after forking. Using PyMongo in your program overall, or using one MongoClient before a fork and a different one after, is safe.
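That's why subprocess.Popen is OK: it forks and then execs (replacing your program with a different one in the child process), so the child cannot possibly use the same MongoClient afterward. multiprocessing.Process on Unix, by contrast, forks without exec, so the child keeps running your program with copies of the parent's objects, which is exactly the situation PyMongo's warning is about.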
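A small standard-library experiment that illustrates the exec step. A plain `object()` named `client` stands in for a MongoClient created in the parent (an assumption for the demo); the child launched through subprocess starts a brand-new interpreter and does not have that name at all.

```python
import subprocess
import sys

# Stands in for a MongoClient created in the parent before spawning the child.
client = object()

# Popen forks and then execs a new interpreter. The child starts from scratch,
# so the parent's module globals (including `client`) do not exist there.
child_code = "print('client' in globals())"
output = subprocess.check_output([sys.executable, "-c", child_code])
print(output.decode().strip())  # prints False
```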
To quote the PyMongo FAQ:
On Unix systems the multiprocessing module spawns processes using fork(). Care must be taken when using instances of MongoClient with fork(). Specifically, instances of MongoClient must not be copied from a parent process to a child process. Instead, the parent process and each child process must create their own instances of MongoClient. For example:
```python
import multiprocessing

import pymongo

# Each process creates its own instance of MongoClient.
def func():
    db = pymongo.MongoClient().mydb
    # Do something with db.

proc = multiprocessing.Process(target=func)
proc.start()
```
Never do this:
```python
import multiprocessing

import pymongo

client = pymongo.MongoClient()

# Each child process attempts to copy a global MongoClient
# created in the parent process. Never do this.
def func():
    db = client.mydb
    # Do something with db.

proc = multiprocessing.Process(target=func)
proc.start()
```
Instances of MongoClient copied from the parent process have a high probability of deadlock in the child process due to inherent incompatibilities between fork(), threads, and locks. PyMongo will attempt to issue a warning if there is a chance of this deadlock occurring.
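The same rule carries over to a pool of workers: give multiprocessing.Pool an initializer that builds the client inside each worker. This is a sketch under stated assumptions, not the FAQ's code; `make_client` is a hypothetical stand-in for pymongo.MongoClient so it runs without a server, and it only records which process created it.

```python
import multiprocessing
import os


def make_client():
    # Hypothetical stand-in for pymongo.MongoClient(); records its creator's PID.
    return {"created_in": os.getpid()}


client = None  # set separately in each worker process by init_worker()


def init_worker():
    # Runs once in every worker process after it starts, so each worker
    # builds its own client instead of using one copied from the parent.
    global client
    client = make_client()


def task(_):
    # Report this worker's PID and the PID that created its client.
    return os.getpid(), client["created_in"]


if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=2, initializer=init_worker)
    try:
        results = pool.map(task, range(4))
    finally:
        pool.close()
        pool.join()
    print(all(pid == owner for pid, owner in results))  # prints True
```

The try/finally (rather than `with Pool(...)`) keeps the sketch compatible with Python 2.7, where Pool is not a context manager.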
Answered By - A. Jesse Jiryu Davis