Issue
Python 3.2 introduced Concurrent Futures, which appear to be some advanced combination of the older threading and multiprocessing modules.
What are the advantages and disadvantages of using this for CPU bound tasks over the older multiprocessing module?
This article suggests they're much easier to work with - is that the case?
Solution
I wouldn't call concurrent.futures more "advanced" - it's a simpler interface that works very much the same regardless of whether you use multiple threads or multiple processes as the underlying parallelization gimmick.
So, like virtually all instances of "simpler interface", much the same trade-offs are involved: it has a shallower learning curve, in large part just because there's so much less available to be learned; but, because it offers fewer options, it may eventually frustrate you in ways the richer interfaces won't.
So far as CPU-bound tasks go, that's way too under-specified to say much meaningful. For CPU-bound tasks under CPython, you need multiple processes rather than multiple threads to have any chance of getting a speedup. But how much (if any) of a speedup you get depends on the details of your hardware, your OS, and especially on how much inter-process communication your specific tasks require. Under the covers, all inter-process parallelization gimmicks rely on the same OS primitives - the high-level API you use to get at those isn't a primary factor in bottom-line speed.
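To make the threads-versus-processes point concrete, here is a minimal sketch that runs the same CPU-bound function through both executor types. The names count_primes and run_with are made up for this illustration, and the workload size is arbitrary. On a standard (GIL-enabled) CPython build the thread version will typically show little or no speedup, but the actual timings depend on your hardware and OS, so none are quoted here.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """Hypothetical CPU-bound task: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_with(executor_cls, limits, workers):
    """Map count_primes over `limits` with the given executor class.

    Returns (results, elapsed_seconds). The calling code is identical for
    threads and processes; only the executor class differs.
    """
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as executor:
        results = list(executor.map(count_primes, limits))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    limits = [100_000] * 4  # arbitrary workload, chosen only for illustration
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        _, seconds = run_with(cls, limits, workers=4)
        print(f"{cls.__name__}: {seconds:.2f}s")
```

The `if __name__ == "__main__":` guard matters for the process-based run on platforms that start workers by spawning a fresh interpreter.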
Edit: example
Here's the final code shown in the article you referenced, but I'm adding an import statement needed to make it work:
from concurrent.futures import ProcessPoolExecutor

def pool_factorizer_map(nums, nprocs):
    # Let the executor divide the work among processes by using 'map'.
    with ProcessPoolExecutor(max_workers=nprocs) as executor:
        return {num:factors for num, factors in
                                zip(nums,
                                    executor.map(factorize_naive, nums))}
Here's exactly the same thing using multiprocessing instead:
import multiprocessing as mp

def mp_factorizer_map(nums, nprocs):
    with mp.Pool(nprocs) as pool:
        return {num:factors for num, factors in
                                zip(nums,
                                    pool.map(factorize_naive, nums))}
Note that the ability to use multiprocessing.Pool objects as context managers was added in Python 3.3.
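Neither snippet defines factorize_naive, which comes from the referenced article. For anyone who wants to run the code as-is, here is a self-contained sketch. The factorize_naive below is a stand-in written for this example (plain trial division), not necessarily the article's implementation.

```python
from concurrent.futures import ProcessPoolExecutor


def factorize_naive(n):
    """Stand-in (assumed, not the article's code): prime factors by trial division."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors


def pool_factorizer_map(nums, nprocs):
    # Same logic as the concurrent.futures version: map each number to its factors.
    with ProcessPoolExecutor(max_workers=nprocs) as executor:
        return dict(zip(nums, executor.map(factorize_naive, nums)))


if __name__ == "__main__":
    print(pool_factorizer_map([12, 97, 360], nprocs=2))
    # {12: [2, 2, 3], 97: [97], 360: [2, 2, 2, 3, 3, 5]}
```

The worker function has to live at module level so it can be pickled and sent to the worker processes, and the `__main__` guard keeps worker processes from re-running the driver code when the module is re-imported.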
As for which one is easier to work with, they're essentially identical.
One difference is that Pool supports so many different ways of doing things that you may not realize how easy it can be until you've climbed quite a way up the learning curve.
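As a rough illustration of "so many different ways", the sketch below runs the same small computation through four Pool methods. The helper names square, add and pool_variants are invented for this example. The function accepts any pool object, so it also works with multiprocessing.pool.ThreadPool, which exposes the same interface on top of threads.

```python
import multiprocessing as mp
from multiprocessing.pool import ThreadPool


def square(x):
    return x * x


def add(a, b):
    return a + b


def pool_variants(pool, nums):
    """Compute results through four different Pool methods (illustrative only)."""
    return {
        # Blocking call; results come back as a list in input order.
        "map": pool.map(square, nums),
        # Lazy iterator; results arrive in completion order, so sort them here.
        "imap_unordered": sorted(pool.imap_unordered(square, nums)),
        # One asynchronous call; .get() blocks until its result is ready.
        "apply_async": pool.apply_async(square, (nums[0],)).get(),
        # Like map, but unpacks each argument tuple into positional arguments.
        "starmap": pool.starmap(add, zip(nums, nums)),
    }


if __name__ == "__main__":
    with mp.Pool(2) as pool:
        print(pool_variants(pool, [1, 2, 3]))
```

By comparison, a concurrent.futures executor offers essentially just submit and map, plus module-level helpers such as as_completed and wait.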
Again, all those different ways are both a strength and a weakness. They're a strength because the flexibility may be required in some situations. They're a weakness because of "preferably only one obvious way to do it". A project sticking exclusively (if possible) to concurrent.futures will probably be easier to maintain over the long run, due to the lack of gratuitous novelty in how its minimal API can be used.
Answered By - Tim Peters