Issue
Happy new year!
I am new to the Python multiprocessing module. To better understand how apply_async works, I wrote the short script below. The script hangs unless I comment out the second line (get_ipython().magic('reset -sf')).
Can someone please tell me why this is happening? I am working with Python 3.5 in the Spyder IDE.
I am using the IPython magic %reset because I want to clear all variables before running my script, and I read on this webpage that %reset is the equivalent of clear all in Matlab/Octave.
Thanks in advance for your help!
from IPython import get_ipython
get_ipython().magic('reset -sf')

import random
import multiprocessing

def stakhanov(chunk_idx):
    data = random.randint(1, 10)  # create random integer between 1 and 10
    frame_idx = chunk_idx
    return (frame_idx, data)

def stakhanov_finished(result):
    (frame_idx, data) = result
    DATA_READ[frame_idx] = data

def start_multiprocess_io():
    pool = multiprocessing.Pool(NUM_PROCESSES)  # create pool of all processes
    chunk_idx = 0
    for i in range(NUM_PROCESSES):
        pool.apply_async(stakhanov, args=(chunk_idx,), callback=stakhanov_finished)
        chunk_idx += 1
    pool.close()
    pool.join()

if __name__ == '__main__':
    global NUM_PROCESSES, DATA_READ
    NUM_PROCESSES = multiprocessing.cpu_count()  # number of CORES
    DATA_READ = [None for _ in range(NUM_PROCESSES)]  # declare list
    start_multiprocess_io()
Solution
OK, I don't know what the get_ipython().magic call does, but in the absence of someone who does, let's look at how multiprocessing works on Windows, and why this line:
get_ipython().magic('reset -sf')
is probably wrong. It should probably be hidden underneath the same if __name__ == '__main__' test that you have later.
(If moving the line fixes the problem, you can stop here, but it's worth reading the rest if you want to use the multiprocessing code effectively.)
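Here is a rough sketch of what "moving the line" could look like, assuming the script is launched from an IPython-based console such as Spyder's. The reset gets its own guard at the top of the file, so it still runs before anything else in your own process. Two things here are not in the original script: run_line_magic is IPython's current spelling of the older magic method, and the None check is added so the file also runs under plain Python. Whether this alone cures the hang in Spyder is untested here.

```python
# Sketch only: guard the namespace reset so that a spawned child process,
# which re-imports this file, never executes it.
if __name__ == '__main__':
    try:
        from IPython import get_ipython
        shell = get_ipython()      # None when not running under IPython
    except ImportError:
        shell = None               # IPython is not installed at all
    if shell is not None:
        shell.run_line_magic('reset', '-sf')

import multiprocessing
import random


def stakhanov(chunk_idx):
    """Worker: pair the chunk index with a random integer from 1 to 10."""
    return (chunk_idx, random.randint(1, 10))
```

The function definitions stay at module level, outside any guard, because the child processes need to find them by name.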
When you create a multiprocessing.Process or Pool instance, the multiprocessing module spawns an extra Python instance for the new process. This is similar to Linux, except that there is no fork, so it cannot copy the current process. This newly spawned process is an all-new, fresh, empty Python.
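If you want to see which of these mechanisms your own platform offers, Python 3.4 and later expose the "start method" directly. A minimal sketch, using only documented multiprocessing calls (the variable names are mine):

```python
import multiprocessing

# Start methods this platform supports; 'spawn' is the only one on Windows.
available = multiprocessing.get_all_start_methods()

# Asking for a spawn context explicitly reproduces the Windows behavior
# on Linux or macOS without changing the global default.
spawn_ctx = multiprocessing.get_context('spawn')

print('available start methods:', available)
print('selected context uses:', spawn_ctx.get_start_method())
```

Creating a Pool from spawn_ctx instead of from the multiprocessing module itself is a handy way to test Windows-style behavior on a machine that normally forks.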
The empty-so-far Python runs with particular arguments. These vary a bit between Python 2.7 and Python 3.6+; here, I'll quote a fairly long bit from 2.7:
def get_command_line():
    '''
    Returns prefix of command line used for spawning a child process
    '''
    if getattr(process.current_process(), '_inheriting', False):
        raise RuntimeError('''
        Attempt to start a new process before the current process
        has finished its bootstrapping phase.

        This probably means that you are on Windows and you have
        forgotten to use the proper idiom in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce a Windows executable.''')

    if getattr(sys, 'frozen', False):
        return [sys.executable, '--multiprocessing-fork']
    else:
        prog = 'from multiprocessing.forking import main; main()'
        opts = util._args_from_interpreter_flags()
        return [_python_exe] + opts + ['-c', prog, '--multiprocessing-fork']
The 3.6 code splits this up a bit and has this fragment:
    if getattr(sys, 'frozen', False):
        return ([sys.executable, '--multiprocessing-fork'] +
                ['%s=%r' % item for item in kwds.items()])
    else:
        prog = 'from multiprocessing.spawn import spawn_main; spawn_main(%s)'
        prog %= ', '.join('%s=%r' % item for item in kwds.items())
        opts = util._args_from_interpreter_flags()
        return [_python_exe] + opts + ['-c', prog, '--multiprocessing-fork']
Either way, what happens at this point is that the new Python should import a module from multiprocessing and run a function in that module. The function, either main() or spawn_main(), loads some information from the process that created it (your process) to find out what program was run.
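On Python 3.4 and later you can look at that command line yourself. This is only a sketch: get_command_line lives in multiprocessing.spawn and is an internal helper rather than a documented API, so it may change between versions, and pipe_handle=5 is a made-up value standing in for whatever the parent would really pass.

```python
from multiprocessing import spawn

# pipe_handle=5 is a made-up value; the real one is chosen by the parent.
command = spawn.get_command_line(pipe_handle=5)

# Print each argument on its own line so the -c program is easy to read.
for part in command:
    print(repr(part))
```

In a normal (non-frozen) Python, the output should show the interpreter path, any interpreter flags, and the -c program that imports and calls spawn_main.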
This all may depend on you to import multiprocessing and call freeze_support, if you are using a frozen Python. This is the first branch of the if getattr(sys, 'frozen', False) test: the problem being worked around here is that the -c 'from multiprocessing ...' option does not function in a frozen Python. (If you're not using frozen Pythons, the -c line takes care of most things.)
Anyway, the upshot is that your new Python runs this special main or spawn_main, which connects back to your Python process, the one you started yourself. From your Python, the new Python obtains the name of the original main module, and it imports it.
It imports it with a regular old import (well, with a special slightly hacked-up import, and again the details vary a bit by Python version). This means that __name__ is not __main__ but some other name: conceptually main or program or whatever you named your main.py file, though the hacked-up import actually substitutes a special name (__parents_main__ in Python 2.7, __mp_main__ in Python 3.4+). This allows the multiprocessing code to get access to your entire program.
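You can imitate this import without starting any processes. The sketch below writes a tiny throwaway script (prog.py and its contents are made up for the demonstration) and runs it twice with runpy: once as __main__, the way you launch it, and once as __mp_main__, the name Python 3's spawn code uses for the re-import.

```python
import os
import runpy
import tempfile
import textwrap

# A throwaway script that records which of its parts were executed.
source = textwrap.dedent('''
    events = ["module-level code ran as " + __name__]
    if __name__ == '__main__':
        events.append("guarded code ran")
''')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'prog.py')
    with open(path, 'w') as f:
        f.write(source)
    # run_path returns the module's globals; pick out the recorded events.
    as_script = runpy.run_path(path, run_name='__main__')['events']
    as_child = runpy.run_path(path, run_name='__mp_main__')['events']

print(as_script)
print(as_child)
```

The second run executes the module-level line but skips the guarded block, which is exactly the behavior the spawned process relies on.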
Next, the multiprocessing code figures out what function you wanted to run, from which module. (This is all handled through the pickle system, which is why you can only run functions that can be pickled, passing arguments that can be pickled.) Having set up all the communication required between your original Python and this new Python that's running the process, the new Python can now call that function, let it do its thing, and when it returns, have the new Python process terminate.
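To make the pickling restriction concrete, here is a small sketch using only the standard library. json.dumps is just a convenient stand-in for "some importable, module-level function"; your own worker function behaves the same way as long as the child can import it.

```python
import json
import pickle

# A module-level function is pickled as a reference: module name plus
# function name, not the function's code.
payload = pickle.dumps(json.dumps)
restored = pickle.loads(payload)
print(restored is json.dumps)

# A lambda has no importable name, so there is nothing to look up later.
try:
    pickle.dumps(lambda x: x + 1)
    lambda_picklable = True
except (pickle.PicklingError, AttributeError):
    lambda_picklable = False
print(lambda_picklable)
```

Because only the name travels to the child, the child has to be able to import your main module cleanly in order to find the function behind that name.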
All of this depends on the fact that when the new Python process runs import main or import prog or whatever it is that gets your original program loaded, its executable code is protected by a test using if __name__. This makes sure that that code (your program's main workings) doesn't get run in the spawned sub-Python. Instead, only multiprocessing's own main or spawn_main actually runs. Everything from your main program gets imported and defined, so that all the functions are available to be called once their names show up via the pickling code. But none of them run yet.
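Putting that layout together, one possible arrangement of the script from the question looks like this. It is a sketch, not the only way to do it: I have swapped apply_async plus a callback for pool.map and replaced the globals with arguments and return values, and the collect helper is a name I made up.

```python
import multiprocessing
import random


def stakhanov(chunk_idx):
    """Worker: pair the chunk index with a random integer from 1 to 10."""
    return (chunk_idx, random.randint(1, 10))


def collect(results, size):
    """Place (index, value) pairs into a list of the given size."""
    data_read = [None] * size
    for frame_idx, data in results:
        data_read[frame_idx] = data
    return data_read


def start_multiprocess_io(num_processes):
    # The with-block terminates the pool on exit; map() has already
    # gathered every result by then.
    with multiprocessing.Pool(num_processes) as pool:
        results = pool.map(stakhanov, range(num_processes))
    return collect(results, num_processes)


if __name__ == '__main__':
    # Only matters for frozen Windows executables; harmless otherwise.
    multiprocessing.freeze_support()
    print(start_multiprocess_io(multiprocessing.cpu_count()))
```

Everything above the guard is definitions only, so a child that re-imports the file defines the functions and does nothing else.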
You can violate this rule,[1] and run specific bits of code, if and only if they don't break the setup sequence required to run a Process instance. It seems clear enough, based on the problem seen here, that get_ipython().magic('reset -sf') breaks the setup sequence.
[1] One case where you must run specific bits of code is if you must augment sys.path to insert the location from which some code is imported.
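For that sys.path case, a minimal sketch might look like the following. The libs directory is hypothetical; substitute wherever your extra modules actually live.

```python
import os
import sys

# Hypothetical directory holding extra modules; adjust to your layout.
EXTRA_DIR = os.path.join(os.getcwd(), 'libs')

# Deliberately outside the __main__ guard: spawned children re-import this
# file and need the same search path to find the same modules.
if EXTRA_DIR not in sys.path:
    sys.path.insert(0, EXTRA_DIR)
```

This is safe to run in every child because it only adjusts the import search path and does not start processes or touch the interpreter's state in any other way.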
Answered By - torek