Concurrent.futures vs Multiprocessing in Python 3

Python 3.2 introduced `concurrent.futures`, which appears to be some advanced combination of the older threading and multiprocessing modules.

What are the advantages and disadvantages of using this for CPU bound tasks over the older multiprocessing module?

This article suggests they're much easier to work with - is that the case?

I wouldn't call concurrent.futures more "advanced" - it's a simpler interface that works very much the same regardless of whether you use multiple threads or multiple processes as the underlying parallelization gimmick.

So, like virtually all instances of "simpler interface", much the same tradeoffs are involved: it has a shallower learning curve, in large part just because there's so much less available to be learned; but, because it offers fewer options, it may eventually frustrate you in ways the richer interfaces won't.

So far as CPU-bound tasks go, that's waaaay too under-specified to say much meaningful. For CPU-bound tasks under CPython, you need multiple processes rather than multiple threads to have any chance of getting a speedup. But how much (if any) of a speedup you get depends on the details of your hardware, your OS, and especially on how much inter-process communication your specific tasks require. Under the covers, all inter-process parallelization gimmicks rely on the same OS primitives - the high-level API you use to get at those isn't a primary factor in bottom-line speed.
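To make the "multiple processes, not multiple threads" point concrete, here's a minimal sketch (the `busy` function and the worker count are my own toy assumptions, not from the article) that times the same CPU-bound work under both executor flavors; under CPython the thread version typically shows little or no speedup because of the GIL:

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def busy(n):
    # Pure-Python CPU-bound work: under CPython the GIL prevents
    # threads from running this loop in parallel.
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(executor_cls, nums):
    # Run busy() over nums with the given executor class and time it.
    start = time.perf_counter()
    with executor_cls(max_workers=4) as ex:
        results = list(ex.map(busy, nums))
    return results, time.perf_counter() - start

if __name__ == "__main__":
    nums = [2_000_000] * 8
    _, t_threads = timed(ThreadPoolExecutor, nums)
    _, t_procs = timed(ProcessPoolExecutor, nums)
    print(f"threads: {t_threads:.2f}s  processes: {t_procs:.2f}s")
```

The actual numbers depend entirely on your hardware and OS, which is the point: the API choice isn't what determines the speedup.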

Edit: example

Here's the final code shown in the article you referenced, but I'm adding an import statement needed to make it work:

    from concurrent.futures import ProcessPoolExecutor
    def pool_factorizer_map(nums, nprocs):
        # Let the executor divide the work among processes by using 'map'.
        with ProcessPoolExecutor(max_workers=nprocs) as executor:
            return {num:factors for num, factors in
                                    zip(nums,
                                        executor.map(factorize_naive, nums))}

Here's exactly the same thing using multiprocessing instead:

    import multiprocessing as mp
    def mp_factorizer_map(nums, nprocs):
        with mp.Pool(nprocs) as pool:
            return {num:factors for num, factors in
                                    zip(nums,
                                        pool.map(factorize_naive, nums))}

Note that the ability to use multiprocessing.Pool objects as context managers was added in Python 3.3.
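Both snippets assume a `factorize_naive` from the article, which isn't reproduced here. If you want to actually run them, a plausible stand-in is plain trial division (this is my sketch, not necessarily the article's exact implementation):

```python
def factorize_naive(n):
    # Naive trial division: peel off each prime factor in turn.
    # A stand-in for the article's function of the same name.
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)  # whatever remains is prime
    return factors
```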

Which one is easier to work with? LOL ;-) They're essentially identical.

One difference is that Pool supports so many different ways of doing things that you may not realize how easy it can be until you've climbed quite a way up the learning curve.
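For a taste of those "many different ways" (a sketch with toy inputs of my choosing): Pool offers lazy and out-of-order iteration, single-task async submission, and multi-argument mapping, where `executor.map` covers only the ordered case:

```python
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == "__main__":
    with mp.Pool(4) as pool:
        # Results yielded lazily, in whatever order workers finish.
        unordered = sorted(pool.imap_unordered(square, range(6)))

        # Fire off a single task asynchronously, fetch the result later.
        async_result = pool.apply_async(square, (10,))

        # map() over functions taking multiple arguments.
        products = pool.starmap(pow, [(2, 3), (3, 2)])

        print(unordered, async_result.get(), products)
```

concurrent.futures gets you to roughly the same places via `submit()` and `as_completed()`, but Pool bakes each pattern into its own method.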

Again, all those different ways are both a strength and a weakness. They're a strength because the flexibility may be required in some situations. They're a weakness because of "preferably only one obvious way to do it" (the Zen of Python). A project sticking exclusively (if possible) to concurrent.futures will probably be easier to maintain over the long run, due to the lack of gratuitous novelty in how its minimalistic API can be used.

From: stackoverflow.com/q/20776189
