diff options
Diffstat (limited to 'Doc/library/multiprocessing.rst')
-rw-r--r-- | Doc/library/multiprocessing.rst | 301 |
1 files changed, 215 insertions, 86 deletions
diff --git a/Doc/library/multiprocessing.rst b/Doc/library/multiprocessing.rst index b2ed544..f3197eb 100644 --- a/Doc/library/multiprocessing.rst +++ b/Doc/library/multiprocessing.rst @@ -93,11 +93,108 @@ To show the individual process IDs involved, here is an expanded example:: p.start() p.join() -For an explanation of why (on Windows) the ``if __name__ == '__main__'`` part is +For an explanation of why the ``if __name__ == '__main__'`` part is necessary, see :ref:`multiprocessing-programming`. +Contexts and start methods +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Depending on the platform, :mod:`multiprocessing` supports three ways +to start a process. These *start methods* are + + *spawn* + The parent process starts a fresh python interpreter process. The + child process will only inherit those resources necessary to run + the process objects :meth:`~Process.run` method. In particular, + unnecessary file descriptors and handles from the parent process + will not be inherited. Starting a process using this method is + rather slow compared to using *fork* or *forkserver*. + + Available on Unix and Windows. The default on Windows. + + *fork* + The parent process uses :func:`os.fork` to fork the Python + interpreter. The child process, when it begins, is effectively + identical to the parent process. All resources of the parent are + inherited by the child process. Note that safely forking a + multithreaded process is problematic. + + Available on Unix only. The default on Unix. + + *forkserver* + When the program starts and selects the *forkserver* start method, + a server process is started. From then on, whenever a new process + is needed, the parent process connects to the server and requests + that it fork a new process. The fork server process is single + threaded so it is safe for it to use :func:`os.fork`. No + unnecessary resources are inherited. + + Available on Unix platforms which support passing file descriptors + over Unix pipes. + +Before Python 3.4 *fork* was the only option available on Unix. Also, +prior to Python 3.4, child processes would inherit all the parents +inheritable handles on Windows. + +On Unix using the *spawn* or *forkserver* start methods will also +start a *semaphore tracker* process which tracks the unlinked named +semaphores created by processes of the program. When all processes +have exited the semaphore tracker unlinks any remaining semaphores. +Usually there should be none, but if a process was killed by a signal +there may some "leaked" semaphores. (Unlinking the named semaphores +is a serious matter since the system allows only a limited number, and +they will not be automatically unlinked until the next reboot.) + +To select the a start method you use the :func:`set_start_method` in +the ``if __name__ == '__main__'`` clause of the main module. For +example:: + + import multiprocessing as mp + + def foo(q): + q.put('hello') + + if __name__ == '__main__': + mp.set_start_method('spawn') + q = mp.Queue() + p = mp.Process(target=foo, args=(q,)) + p.start() + print(q.get()) + p.join() + +:func:`set_start_method` should not be used more than once in the +program. + +Alternatively, you can use :func:`get_context` to obtain a context +object. Context objects have the same API as the multiprocessing +module, and allow one to use multiple start methods in the same +program. :: + + import multiprocessing as mp + + def foo(q): + q.put('hello') + + if __name__ == '__main__': + ctx = mp.get_context('spawn') + q = ctx.Queue() + p = ctx.Process(target=foo, args=(q,)) + p.start() + print(q.get()) + p.join() + +Note that objects related to one context may not be compatible with +processes for a different context. In particular, locks created using +the *fork* context cannot be passed to a processes started using the +*spawn* or *forkserver* start methods. + +A library which wants to use a particular start method should probably +use :func:`get_context` to avoid interfering with the choice of the +library user. + + Exchanging objects between processes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -274,15 +371,31 @@ processes in a few different ways. For example:: from multiprocessing import Pool + from time import sleep def f(x): return x*x if __name__ == '__main__': - with Pool(processes=4) as pool: # start 4 worker processes - result = pool.apply_async(f, [10]) # evaluate "f(10)" asynchronously - print(result.get(timeout=1)) # prints "100" unless your computer is *very* slow - print(pool.map(f, range(10))) # prints "[0, 1, 4,..., 81]" + # start 4 worker processes + with Pool(processes=4) as pool: + + # print "[0, 1, 4,..., 81]" + print(pool.map(f, range(10))) + + # print same numbers in arbitrary order + for i in pool.imap_unordered(f, range(10)): + print(i) + + # evaluate "f(10)" asynchronously + res = pool.apply_async(f, [10]) + print(res.get(timeout=1)) # prints "100" + + # make worker sleep for 10 secs + res = pool.apply_async(sleep, 10) + print(res.get(timeout=1)) # raises multiprocessing.TimeoutError + + # exiting the 'with'-block has stopped the pool Note that the methods of a pool should only ever be used by the process which created it. @@ -731,6 +844,9 @@ Miscellaneous Return the number of CPUs in the system. May raise :exc:`NotImplementedError`. + .. seealso:: + :func:`os.cpu_count` + .. function:: current_process() Return the :class:`Process` object corresponding to the current process. @@ -761,6 +877,43 @@ Miscellaneous If the module is being run normally by the Python interpreter then :func:`freeze_support` has no effect. +.. function:: get_all_start_methods() + + Returns a list of the supported start methods, the first of which + is the default. The possible start methods are ``'fork'``, + ``'spawn'`` and ``'forkserver'``. On Windows only ``'spawn'`` is + available. On Unix ``'fork'`` and ``'spawn'`` are always + supported, with ``'fork'`` being the default. + + .. versionadded:: 3.4 + +.. function:: get_context(method=None) + + Return a context object which has the same attributes as the + :mod:`multiprocessing` module. + + If *method* is *None* then the default context is returned. + Otherwise *method* should be ``'fork'``, ``'spawn'``, + ``'forkserver'``. :exc:`ValueError` is raised if the specified + start method is not available. + + .. versionadded:: 3.4 + +.. function:: get_start_method(allow_none=False) + + Return the name of start method used for starting processes. + + If the start method has not been fixed and *allow_none* is false, + then the start method is fixed to the default and the name is + returned. If the start method has not been fixed and *allow_none* + is true then *None* is returned. + + The return value can be ``'fork'``, ``'spawn'``, ``'forkserver'`` + or *None*. ``'fork'`` is the default on Unix, while ``'spawn'`` is + the default on Windows. + + .. versionadded:: 3.4 + .. function:: set_executable() Sets the path of the Python interpreter to use when starting a child process. @@ -769,8 +922,21 @@ Miscellaneous set_executable(os.path.join(sys.exec_prefix, 'pythonw.exe')) - before they can create child processes. (Windows only) + before they can create child processes. + + .. versionchanged:: 3.4 + Now supported on Unix when the ``'spawn'`` start method is used. + +.. function:: set_start_method(method) + + Set the method which should be used to start child processes. + *method* can be ``'fork'``, ``'spawn'`` or ``'forkserver'``. + + Note that this should be called at most once, and it should be + protected inside the ``if __name__ == '__main__'`` clause of the + main module. + .. versionadded:: 3.4 .. note:: @@ -1666,14 +1832,14 @@ Process Pools One can create a pool of processes which will carry out tasks submitted to it with the :class:`Pool` class. -.. class:: Pool([processes[, initializer[, initargs[, maxtasksperchild]]]]) +.. class:: Pool([processes[, initializer[, initargs[, maxtasksperchild [, context]]]]]) A process pool object which controls a pool of worker processes to which jobs can be submitted. It supports asynchronous results with timeouts and callbacks and has a parallel map implementation. *processes* is the number of worker processes to use. If *processes* is - ``None`` then the number returned by :func:`cpu_count` is used. If + ``None`` then the number returned by :func:`os.cpu_count` is used. If *initializer* is not ``None`` then each worker process will call ``initializer(*initargs)`` when it starts. @@ -1686,6 +1852,13 @@ with the :class:`Pool` class. unused resources to be freed. The default *maxtasksperchild* is None, which means worker processes will live as long as the pool. + .. versionadded:: 3.4 + *context* can be used to specify the context used for starting + the worker processes. Usually a pool is created using the + function :func:`multiprocessing.Pool` or the :meth:`Pool` method + of a context object. In both cases *context* is set + appropriately. + .. note:: Worker processes within a :class:`Pool` typically live for the complete @@ -2177,43 +2350,8 @@ Below is an example session with logging turned on:: [INFO/MainProcess] sending shutdown message to manager [INFO/SyncManager-...] manager exiting with exitcode 0 -In addition to having these two logging functions, the multiprocessing also -exposes two additional logging level attributes. These are :const:`SUBWARNING` -and :const:`SUBDEBUG`. The table below illustrates where theses fit in the -normal level hierarchy. - -+----------------+----------------+ -| Level | Numeric value | -+================+================+ -| ``SUBWARNING`` | 25 | -+----------------+----------------+ -| ``SUBDEBUG`` | 5 | -+----------------+----------------+ - For a full table of logging levels, see the :mod:`logging` module. -These additional logging levels are used primarily for certain debug messages -within the multiprocessing module. Below is the same example as above, except -with :const:`SUBDEBUG` enabled:: - - >>> import multiprocessing, logging - >>> logger = multiprocessing.log_to_stderr() - >>> logger.setLevel(multiprocessing.SUBDEBUG) - >>> logger.warning('doomed') - [WARNING/MainProcess] doomed - >>> m = multiprocessing.Manager() - [INFO/SyncManager-...] child process calling self.run() - [INFO/SyncManager-...] created temp directory /.../pymp-... - [INFO/SyncManager-...] manager serving at '/.../pymp-djGBXN/listener-...' - >>> del m - [SUBDEBUG/MainProcess] finalizer calling ... - [INFO/MainProcess] sending shutdown message to manager - [DEBUG/SyncManager-...] manager received shutdown message - [SUBDEBUG/SyncManager-...] calling <Finalize object, callback=unlink, ... - [SUBDEBUG/SyncManager-...] finalizer calling <built-in function unlink> ... - [SUBDEBUG/SyncManager-...] calling <Finalize object, dead> - [SUBDEBUG/SyncManager-...] finalizer calling <function rmtree at 0x5aa730> ... - [INFO/SyncManager-...] manager exiting with exitcode 0 The :mod:`multiprocessing.dummy` module ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -2234,8 +2372,10 @@ There are certain guidelines and idioms which should be adhered to when using :mod:`multiprocessing`. -All platforms -~~~~~~~~~~~~~ +All start methods +~~~~~~~~~~~~~~~~~ + +The following applies to all start methods. Avoid shared state @@ -2269,11 +2409,13 @@ Joining zombie processes Better to inherit than pickle/unpickle - On Windows many types from :mod:`multiprocessing` need to be picklable so - that child processes can use them. However, one should generally avoid - sending shared objects to other processes using pipes or queues. Instead - you should arrange the program so that a process which needs access to a - shared resource created elsewhere can inherit it from an ancestor process. + When using the *spawn* or *forkserver* start methods many types + from :mod:`multiprocessing` need to be picklable so that child + processes can use them. However, one should generally avoid + sending shared objects to other processes using pipes or queues. + Instead you should arrange the program so that a process which + needs access to a shared resource created elsewhere can inherit it + from an ancestor process. Avoid terminating processes @@ -2320,15 +2462,17 @@ Joining processes that use queues Explicitly pass resources to child processes - On Unix a child process can make use of a shared resource created in a - parent process using a global resource. However, it is better to pass the - object as an argument to the constructor for the child process. + On Unix using the *fork* start method, a child process can make + use of a shared resource created in a parent process using a + global resource. However, it is better to pass the object as an + argument to the constructor for the child process. - Apart from making the code (potentially) compatible with Windows this also - ensures that as long as the child process is still alive the object will not - be garbage collected in the parent process. This might be important if some - resource is freed when the object is garbage collected in the parent - process. + Apart from making the code (potentially) compatible with Windows + and the other start methods this also ensures that as long as the + child process is still alive the object will not be garbage + collected in the parent process. This might be important if some + resource is freed when the object is garbage collected in the + parent process. So for instance :: @@ -2387,17 +2531,19 @@ Beware of replacing :data:`sys.stdin` with a "file like object" For more information, see :issue:`5155`, :issue:`5313` and :issue:`5331` -Windows -~~~~~~~ +The *spawn* and *forkserver* start methods +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Since Windows lacks :func:`os.fork` it has a few extra restrictions: +There are a few extra restriction which don't apply to the *fork* +start method. More picklability - Ensure that all arguments to :meth:`Process.__init__` are picklable. This - means, in particular, that bound or unbound methods cannot be used directly - as the ``target`` argument on Windows --- just define a function and use - that instead. + Ensure that all arguments to :meth:`Process.__init__` are + picklable. This means, in particular, that bound or unbound + methods cannot be used directly as the ``target`` (unless you use + the *fork* start method) --- just define a function and use that + instead. Also, if you subclass :class:`~multiprocessing.Process` then make sure that instances will be picklable when the :meth:`Process.start @@ -2419,7 +2565,8 @@ Safe importing of main module interpreter without causing unintended side effects (such a starting a new process). - For example, under Windows running the following module would fail with a + For example, using the *spawn* or *forkserver* start method + running the following module would fail with a :exc:`RuntimeError`:: from multiprocessing import Process @@ -2433,13 +2580,14 @@ Safe importing of main module Instead one should protect the "entry point" of the program by using ``if __name__ == '__main__':`` as follows:: - from multiprocessing import Process, freeze_support + from multiprocessing import Process, freeze_support, set_start_method def foo(): print('hello') if __name__ == '__main__': freeze_support() + set_start_method('spawn') p = Process(target=foo) p.start() @@ -2470,26 +2618,7 @@ Using :class:`~multiprocessing.pool.Pool`: :language: python3 -Synchronization types like locks, conditions and queues: - -.. literalinclude:: ../includes/mp_synchronize.py - :language: python3 - - An example showing how to use queues to feed tasks to a collection of worker processes and collect the results: .. literalinclude:: ../includes/mp_workers.py - - -An example of how a pool of worker processes can each run a -:class:`~http.server.SimpleHTTPRequestHandler` instance while sharing a single -listening socket. - -.. literalinclude:: ../includes/mp_webserver.py - - -Some simple benchmarks comparing :mod:`multiprocessing` with :mod:`threading`: - -.. literalinclude:: ../includes/mp_benchmarks.py - |