diff --git a/docs/source/parallel/asyncresult.txt b/docs/source/parallel/asyncresult.txt new file mode 100644 index 0000000..276cdb1 --- /dev/null +++ b/docs/source/parallel/asyncresult.txt @@ -0,0 +1,116 @@ +.. _parallel_asyncresult: + +====================== +The AsyncResult object +====================== + +In non-blocking mode, :meth:`apply` submits the command to be executed and +then returns a :class:`~.AsyncResult` object immediately. The +AsyncResult object gives you a way of getting a result at a later +time through its :meth:`get` method, but it also collects metadata +on execution. + + +Beyond multiprocessing's AsyncResult +==================================== + +.. Note:: + + The :class:`~.AsyncResult` object provides a superset of the interface in + :py:class:`multiprocessing.pool.AsyncResult`. See the + `official Python documentation `_ + for more on the basics of this interface. + +Our AsyncResult objects add a number of convenient features for working with +parallel results, beyond what is provided by the original AsyncResult. + + +get_dict +-------- + +First, is :meth:`.AsyncResult.get_dict`, which pulls results as a dictionary +keyed by engine_id, rather than a flat list. This is useful for quickly +coordinating or distributing information about all of the engines. + +As an example, here is a quick call that gives every engine a dict showing +the PID of every other engine: + +.. sourcecode:: ipython + + In [10]: ar = rc[:].apply_async(os.getpid) + In [11]: pids = ar.get_dict() + In [12]: rc[:]['pid_map'] = pids + +This trick is particularly useful when setting up inter-engine communication, +as in IPython's :file:`examples/parallel/interengine` examples. + + +Metadata +======== + +IPython.parallel tracks some metadata about the tasks, which is stored +in the :attr:`.Client.metadata` dict. The AsyncResult object gives you an +interface for this information as well, including timestamps stdout/err, +and engine IDs. + + +Timing +------ + +IPython tracks various timestamps as :py:class:`.datetime` objects, +and the AsyncResult object has a few properties that turn these into useful +times (in seconds as floats). + +For use while the tasks are still pending: + +* :attr:`ar.elapsed` is just the elapsed seconds since submission, for use + before the AsyncResult is complete. +* :attr:`ar.progress` is the number of tasks that have completed. Fractional progress + would be:: + + 1.0 * ar.progress / len(ar) + +* :meth:`AsyncResult.wait_interactive` will wait for the result to finish, but + print out status updates on progress and elapsed time while it waits. + +For use after the tasks are done: + +* :attr:`ar.serial_time` is the sum of the computation time of all of the tasks + done in parallel. +* :attr:`ar.wall_time` is the time between the first task submitted and last result + received. This is the actual cost of computation, including IPython overhead. + + +.. note:: + + wall_time is only precise if the Client is waiting for results when + the task finished, because the `received` timestamp is made when the result is + unpacked by the Client, triggered by the :meth:`~Client.spin` call. If you + are doing work in the Client, and not waiting/spinning, then `received` might + be artificially high. + +An often interesting metric is the time it actually cost to do the work in parallel +relative to the serial computation, and this can be given simply with + +.. sourcecode:: python + + speedup = ar.serial_time / ar.wall_time + + +Map results are iterable! +========================= + +When an AsyncResult object has multiple results (e.g. the :class:`~AsyncMapResult` +object), you can actually iterate through them, and act on the results as they arrive: + +.. literalinclude:: ../../examples/parallel/itermapresult.py + :language: python + :lines: 20-66 + +.. seealso:: + + When AsyncResult or the AsyncMapResult don't provide what you need (for instance, + handling individual results as they arrive, but with metadata), you can always + just split the original result's ``msg_ids`` attribute, and handle them as you like. + + For an example of this, see :file:`docs/examples/parallel/customresult.py` diff --git a/docs/source/parallel/index.txt b/docs/source/parallel/index.txt index ef81e70..9bfbc76 100644 --- a/docs/source/parallel/index.txt +++ b/docs/source/parallel/index.txt @@ -11,6 +11,7 @@ Using IPython for parallel computing parallel_process.txt parallel_multiengine.txt parallel_task.txt + asyncresult.txt parallel_mpi.txt parallel_db.txt parallel_security.txt diff --git a/docs/source/parallel/parallel_multiengine.txt b/docs/source/parallel/parallel_multiengine.txt index 80fb5a3..9f035f2 100644 --- a/docs/source/parallel/parallel_multiengine.txt +++ b/docs/source/parallel/parallel_multiengine.txt @@ -275,13 +275,9 @@ then returns a :class:`AsyncResult` object immediately. The :class:`AsyncResult` object gives you a way of getting a result at a later time through its :meth:`get` method. -.. Note:: - - The :class:`AsyncResult` object provides a superset of the interface in - :py:class:`multiprocessing.pool.AsyncResult`. See the - `official Python documentation `_ - for more. +.. seealso:: + Docs on the :ref:`AsyncResult ` object. This allows you to quickly submit long running commands without blocking your local Python/IPython session: diff --git a/docs/source/parallel/parallel_task.txt b/docs/source/parallel/parallel_task.txt index 775f0a3..9871bca 100644 --- a/docs/source/parallel/parallel_task.txt +++ b/docs/source/parallel/parallel_task.txt @@ -111,27 +111,6 @@ that turns any Python function into a parallel function: In [11]: f.map(range(32)) # this is done in parallel Out[11]: [0.0,10.0,160.0,...] -.. _parallel_taskmap: - -Map results are iterable! -------------------------- - -When an AsyncResult object actually maps multiple results (e.g. the :class:`~AsyncMapResult` -object), you can actually iterate through them, and act on the results as they arrive: - -.. literalinclude:: ../../examples/parallel/itermapresult.py - :language: python - :lines: 9-34 - -.. seealso:: - - When AsyncResult or the AsyncMapResult don't provide what you need (for instance, - handling individual results as they arrive, but with metadata), you can always - just split the original result's ``msg_ids`` attribute, and handle them as you like. - - For an example of this, see :file:`docs/examples/parallel/customresult.py` - - .. _parallel_dependencies: Dependencies