.. _ip1par:

============================
Overview and getting started
============================

Introduction
============

This section gives an overview of IPython's sophisticated and powerful
architecture for parallel and distributed computing. This architecture
abstracts out parallelism in a very general way, which enables IPython to
support many different styles of parallelism, including:

* Single program, multiple data (SPMD) parallelism.
* Multiple program, multiple data (MPMD) parallelism.
* Message passing using MPI.
* Task farming.
* Data parallel.
* Combinations of these approaches.
* Custom user-defined approaches.

Most importantly, IPython enables all types of parallel applications to
be developed, executed, debugged and monitored *interactively*. Hence,
the ``I`` in IPython. The following are some example use cases for IPython:

* Quickly parallelize algorithms that are embarrassingly parallel
  using a number of simple approaches. Many simple things can be
  parallelized interactively in one or two lines of code.
* Steer traditional MPI applications on a supercomputer from an
  IPython session on your laptop.
* Analyze and visualize large datasets (that could be remote and/or
  distributed) interactively using IPython and tools like
  matplotlib/TVTK.
* Develop, test and debug new parallel algorithms
  (that may use MPI) interactively.
* Tie together multiple MPI jobs running on different systems into
  one giant distributed and parallel system.
* Start a parallel job on your cluster and then have a remote
  collaborator connect to it and pull back data into their
  local IPython session for plotting and analysis.
* Run a set of tasks on a set of CPUs using dynamic load balancing.

Architecture overview
=====================

The IPython architecture consists of three components:

* The IPython engine.
* The IPython controller.
* Various controller clients.

These components live in the :mod:`IPython.kernel` package and are
installed with IPython. They do, however, have additional dependencies
that must be installed. For more information, see our
:ref:`installation documentation <install_index>`.

IPython engine
--------------

The IPython engine is a Python instance that takes Python commands over a
network connection. Eventually, the IPython engine will be a full IPython
interpreter, but for now, it is a regular Python interpreter. The engine
can also handle incoming and outgoing Python objects sent over a network
connection. When multiple engines are started, parallel and distributed
computing becomes possible. An important feature of an IPython engine is
that it blocks while user code is being executed. Read on for how the
IPython controller solves this problem to expose a clean asynchronous API
to the user.

IPython controller
------------------

The IPython controller provides an interface for working with a set of
engines. At a general level, the controller is a process to which
IPython engines can connect. For each connected engine, the controller
manages a queue. All actions that can be performed on the engine go
through this queue. While the engines themselves block when user code is
run, the controller hides that from the user to provide a fully
asynchronous interface to a set of engines.

.. note::

   Because the controller listens on a network port for engines to
   connect to it, it must be started *before* any engines are started.
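
The queueing idea described above can be illustrated with a minimal sketch
that uses only the Python standard library. It is not IPython's
implementation: ``ToyEngine`` and ``ToyController`` are hypothetical names,
threads stand in for engine processes, and no network connection is involved.
Each engine blocks while it runs a command, but the controller hands back a
pending result immediately.

.. sourcecode:: python

    import queue
    import threading
    from concurrent.futures import Future


    class ToyEngine(threading.Thread):
        """Hypothetical stand-in for an engine: runs one command at a time."""

        def __init__(self):
            super().__init__(daemon=True)
            self.inbox = queue.Queue()  # the per-engine queue
            self.namespace = {}         # the engine's user namespace

        def run(self):
            while True:
                code, future = self.inbox.get()
                try:
                    # The engine blocks here while user code executes.
                    exec(code, self.namespace)
                    future.set_result(None)
                except Exception as exc:
                    future.set_exception(exc)


    class ToyController:
        """Hypothetical controller: queues work and returns immediately."""

        def __init__(self, n_engines):
            self.engines = [ToyEngine() for _ in range(n_engines)]
            for engine in self.engines:
                engine.start()

        def execute(self, engine_id, code):
            future = Future()
            self.engines[engine_id].inbox.put((code, future))
            return future  # the caller is not blocked


    controller = ToyController(2)
    pending = [controller.execute(i, "x = 2 ** 10") for i in range(2)]
    for p in pending:
        p.result(timeout=5)  # wait only when the result is needed
    values = [engine.namespace["x"] for engine in controller.engines]
    print(values)  # [1024, 1024]

The caller decides when to wait on a pending result, which is the sense in
which the interface is asynchronous even though each engine is busy while
its command runs.
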

The controller also provides a single point of contact for users who wish to
utilize the engines connected to the controller. There are different ways of
working with a controller. In IPython these ways correspond to different
interfaces that the controller is adapted to. Currently we have two default
interfaces to the controller:

* The MultiEngine interface, which provides the simplest possible way of
  working with engines interactively.
* The Task interface, which presents the engines as a load-balanced
  task farming system.
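
To give a feel for what a load-balanced task farm does, here is a small
sketch built on the standard library's ``concurrent.futures`` module rather
than on IPython's Task interface. The ``work`` function and the task
durations are made up for illustration, and the two worker threads play the
role of two engines.

.. sourcecode:: python

    import threading
    import time
    from concurrent.futures import ThreadPoolExecutor


    def work(seconds):
        """Hypothetical task whose run time varies from call to call."""
        time.sleep(seconds)
        return seconds, threading.current_thread().name


    durations = [0.05, 0.01, 0.01, 0.01, 0.01]

    # Each idle worker pulls the next pending task from a shared queue, so
    # the worker that is busy with the long task does not hold up the
    # short ones.
    with ThreadPoolExecutor(max_workers=2) as farm:
        results = list(farm.map(work, durations))

    for seconds, worker in results:
        print(f"{seconds:.2f}s task ran on {worker}")

Which worker runs which task is decided at run time, so the printed
assignment can differ between runs. By contrast, a MultiEngine-style
interface lets the user address specific engines directly.
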

Advanced users can easily add new custom interfaces to enable other
styles of parallelism.

.. note::

   A single controller and set of engines can be accessed
   through multiple interfaces simultaneously. This opens the
   door for lots of interesting things.

Controller clients
------------------
For each controller interface, there is a corresponding client. These
clients allow users to interact with a set of engines through the
interface. Here are the two default clients:

* The :class:`MultiEngineClient` class.
* The :class:`TaskClient` class.

Security
--------

By default (as long as `pyOpenSSL` is installed) all network connections
between the controller and engines and the controller and clients are secure.
What does this mean? First of all, all of the connections will be encrypted
using SSL. Second, the connections are authenticated. We handle authentication
in a capability based security model [Capability]_. In this model, a
"capability (known in some systems as a key) is a communicable, unforgeable
token of authority". Put simply, a capability is like a key to your house. If
you have the key to your house, you can get in. If not, you can't.

In our architecture, the controller is the only process that listens on
network ports, and is thus responsible for creating these keys. In IPython,
these keys are known as Foolscap URLs, or FURLs, because of the underlying
network protocol we are using. As a user, you don't need to know anything
about the details of these FURLs, other than that when the controller starts,
it saves a set of FURLs to files named :file:`something.furl`. The default
location of these files is the :file:`~/.ipython/security` directory.

To connect and authenticate to the controller, an engine or client simply needs
to present an appropriate FURL (that was originally created by the controller)
to the controller. Thus, the FURL files need to be copied to a location where
the clients and engines can find them. Typically, this is the
:file:`~/.ipython/security` directory on the host where the client/engine is
running (which could be a different host than the controller). Once the FURL
files are copied over, everything should work fine.
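
The capability idea can be sketched in a few lines of standard-library
Python. This is only an illustration of "whoever holds the token gets in".
It ignores encryption and the real FURL format, and ``TokenController`` and
its role names are hypothetical.

.. sourcecode:: python

    import hmac
    import secrets


    class TokenController:
        """Hypothetical controller that creates one secret token per role."""

        def __init__(self):
            # One unguessable token per kind of connection.
            self.tokens = {role: secrets.token_urlsafe(32)
                           for role in ("engine", "tc", "mec")}

        def connect(self, role, presented_token):
            expected = self.tokens.get(role, "")
            # Constant-time comparison of the presented and expected tokens.
            return hmac.compare_digest(expected, presented_token)


    controller = TokenController()
    mec_token = controller.tokens["mec"]  # stands in for reading a key file

    print(controller.connect("mec", mec_token))        # True
    print(controller.connect("tc", mec_token))         # False
    print(controller.connect("mec", "guessed-token"))  # False

A token is only good for the role it was created for, which is why each kind
of connection needs its own key.
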

Currently, there are three FURL files that the controller creates:

ipcontroller-engine.furl
    This FURL file is the key that gives an engine the ability to connect
    to a controller.

ipcontroller-tc.furl
    This FURL file is the key that a :class:`TaskClient` must use to
    connect to the task interface of a controller.

ipcontroller-mec.furl
    This FURL file is the key that a :class:`MultiEngineClient` must use
    to connect to the multiengine interface of a controller.

More details of how these FURL files are used are given below.

A detailed description of the security model and its implementation in IPython
can be found :ref:`here <parallelsecurity>`.

Getting Started
===============

To use IPython for parallel computing, you need to start one instance of the
controller and one or more instances of the engine. Initially, it is best to
simply start a controller and engines on a single host using the
:command:`ipcluster` command. To start a controller and 4 engines on your
localhost, just do::

    $ ipcluster local -n 4

More details about starting the IPython controller and engines can be found
:ref:`here <parallel_process>`.

Once you have started the IPython controller and one or more engines, you
are ready to use the engines to do something useful. To make sure
everything is working correctly, try the following commands:

.. sourcecode:: ipython

    In [1]: from IPython.kernel import client

    In [2]: mec = client.MultiEngineClient()

    In [4]: mec.get_ids()
    Out[4]: [0, 1, 2, 3]

    In [5]: mec.execute('print "Hello World"')
    Out[5]:
    <Results List>
    [0] In [1]: print "Hello World"
    [0] Out[1]: Hello World

    [1] In [1]: print "Hello World"
    [1] Out[1]: Hello World

    [2] In [1]: print "Hello World"
    [2] Out[1]: Hello World

    [3] In [1]: print "Hello World"
    [3] Out[1]: Hello World

Remember, a client also needs to present a FURL file to the controller. How
does this happen? When a multiengine client is created with no arguments, the
client tries to find the corresponding FURL file in the local
:file:`~/.ipython/security` directory. If it finds it, you are set. If you
have put the FURL file in a different location or it has a different name,
create the client like this::

    mec = client.MultiEngineClient('/path/to/my/ipcontroller-mec.furl')

The same holds true when creating a task client::

    tc = client.TaskClient('/path/to/my/ipcontroller-tc.furl')
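
The lookup rule described above (an explicit path wins, otherwise the default
security directory is used) can be written down as a short helper.
``find_furl`` is a hypothetical function for illustration, not part of
IPython, and it assumes the default directory is
:file:`~/.ipython/security`.

.. sourcecode:: python

    import os


    def find_furl(furl_file=None, name="ipcontroller-mec.furl",
                  security_dir="~/.ipython/security"):
        """Return the path of the FURL file a client would present.

        Hypothetical helper: an explicit path takes precedence, otherwise
        fall back to the default security directory.
        """
        if furl_file is not None:
            return os.path.expanduser(furl_file)
        return os.path.join(os.path.expanduser(security_dir), name)


    print(find_furl())
    print(find_furl("/path/to/my/ipcontroller-mec.furl"))

Passing ``name="ipcontroller-tc.furl"`` gives the corresponding default
location for a task client's key.
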

You are now ready to learn more about the :ref:`MultiEngine
<parallelmultiengine>` and :ref:`Task <paralleltask>` interfaces to the
controller.

.. note::

   Don't forget that the engine, multiengine client and task client all have
   *different* FURL files. You must copy *each* of these to an
   appropriate location so that the engines and clients can use them to
   connect to the controller.

.. [Capability] Capability-based security, http://en.wikipedia.org/wiki/Capability-based_security