Brian E Granger

.. _ip1par:

============================
Overview and getting started
============================

Introduction
============

This section gives an overview of IPython's architecture for parallel and
distributed computing. This architecture abstracts out parallelism in a
general way, which enables IPython to support many different styles of
parallelism, including:

* Single program, multiple data (SPMD) parallelism.
* Multiple program, multiple data (MPMD) parallelism.
* Message passing using MPI.
* Task farming.
* Data parallel.
* Combinations of these approaches.
* Custom user-defined approaches.

Most importantly, IPython enables all types of parallel applications to
be developed, executed, debugged and monitored *interactively*. Hence,
the ``I`` in IPython. The following are some example use cases for IPython:

* Quickly parallelize algorithms that are embarrassingly parallel
  using a number of simple approaches. Many simple things can be
  parallelized interactively in one or two lines of code.
* Steer traditional MPI applications on a supercomputer from an
  IPython session on your laptop.
* Analyze and visualize large datasets (that could be remote and/or
  distributed) interactively using IPython and tools like
  matplotlib/TVTK.
* Develop, test and debug new parallel algorithms
  (that may use MPI) interactively.
* Tie together multiple MPI jobs running on different systems into
  one giant distributed and parallel system.
* Start a parallel job on your cluster and then have a remote
  collaborator connect to it and pull back data into their
  local IPython session for plotting and analysis.
* Run a set of tasks on a set of CPUs using dynamic load balancing.

Architecture overview
=====================

The IPython architecture consists of three components:

* The IPython engine.
* The IPython controller.
* Various controller clients.

These components live in the :mod:`IPython.kernel` package and are
installed with IPython. They do, however, have additional dependencies
that must be installed. For more information, see our
:ref:`installation documentation <install_index>`.

IPython engine
--------------

The IPython engine is a Python instance that takes Python commands over a
network connection. Eventually, the IPython engine will be a full IPython
interpreter, but for now, it is a regular Python interpreter. The engine
can also handle incoming and outgoing Python objects sent over a network
connection. When multiple engines are started, parallel and distributed
computing becomes possible. An important feature of an IPython engine is
that it blocks while user code is being executed. Read on for how the
IPython controller solves this problem to expose a clean asynchronous API
to the user.

IPython controller
------------------

The IPython controller provides an interface for working with a set of
engines. At a general level, the controller is a process to which
IPython engines can connect. For each connected engine, the controller
manages a queue. All actions that can be performed on the engine go
through this queue. While the engines themselves block when user code is
run, the controller hides that from the user to provide a fully
asynchronous interface to a set of engines.

.. note::

    Because the controller listens on a network port for engines to
    connect to it, it must be started *before* any engines are started.

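The per-engine queue described above can be illustrated with a small, self-contained sketch. This is not the :mod:`IPython.kernel` implementation: ``ToyEngine`` and ``ToyController`` are hypothetical names, threads stand in for networked engine processes, and the code targets Python 3. It only shows how a queue in front of a blocking worker lets the caller submit work and return immediately.

```python
import queue
import threading


class ToyEngine(threading.Thread):
    """Hypothetical stand-in for an engine: it runs one command at a time
    and blocks while that command executes."""

    def __init__(self, engine_id):
        super().__init__(daemon=True)
        self.engine_id = engine_id
        self.commands = queue.Queue()  # the queue the controller keeps for this engine
        self.namespace = {}            # the engine's private user namespace

    def run(self):
        while True:
            source, status, done = self.commands.get()
            try:
                exec(source, self.namespace)
                status["ok"] = True
            except Exception as exc:
                status["ok"] = False
                status["error"] = repr(exc)
            finally:
                done.set()  # signal that this command has finished


class ToyController:
    """Hypothetical controller: every action goes through an engine's queue."""

    def __init__(self, n_engines):
        self.engines = [ToyEngine(i) for i in range(n_engines)]
        for engine in self.engines:
            engine.start()

    def submit(self, engine_id, source):
        """Queue a command and return at once with a (status, done) handle."""
        status = {}
        done = threading.Event()
        self.engines[engine_id].commands.put((source, status, done))
        return status, done


controller = ToyController(2)
pending = [controller.submit(i, "x = 6 * 7") for i in range(2)]
for status, done in pending:
    done.wait(timeout=5)  # block only when the result is actually needed
values = [engine.namespace["x"] for engine in controller.engines]
```

The caller decides when to wait on a handle, which is the sense in which a blocking engine can still sit behind an asynchronous interface.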
The controller also provides a single point of contact for users who wish to
utilize the engines connected to the controller. There are different ways of
working with a controller. In IPython these ways correspond to different
interfaces that the controller is adapted to. Currently we have two default
interfaces to the controller:

* The MultiEngine interface, which provides the simplest possible way of
  working with engines interactively.
* The Task interface, which presents the engines as a load-balanced
  task farming system.

Advanced users can easily add new custom interfaces to enable other
styles of parallelism.

.. note::

    A single controller and set of engines can be accessed
    through multiple interfaces simultaneously. For example, the same
    engines can be used through the MultiEngine and Task interfaces
    at the same time.

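To make the phrase "load-balanced task farming" concrete, here is a minimal sketch using only the Python 3 standard library. It does not use the Task interface itself: ``run_task_farm`` and ``make_task`` are hypothetical helpers, and worker threads stand in for engines. The point is the scheduling idea, in which each idle worker pulls the next task from a shared queue, so slow tasks do not hold up the rest.

```python
import queue
import threading
import time


def run_task_farm(tasks, n_workers):
    """Run zero-argument callables; each idle worker pulls the next task."""
    todo = queue.Queue()
    for index, task in enumerate(tasks):
        todo.put((index, task))

    results = [None] * len(tasks)
    handled_by = [None] * len(tasks)

    def worker(worker_id):
        while True:
            try:
                index, task = todo.get_nowait()
            except queue.Empty:
                return  # all tasks were queued up front, so empty means finished
            results[index] = task()
            handled_by[index] = worker_id

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, handled_by


def make_task(n, delay):
    """Build a task of uneven duration that returns n squared."""
    def task():
        time.sleep(delay)
        return n * n
    return task


tasks = [make_task(n, 0.01 * (n % 3)) for n in range(8)]
results, handled_by = run_task_farm(tasks, n_workers=3)
```

Which worker handles which task depends on timing, so ``handled_by`` can differ between runs, while ``results`` stays in submission order.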
Controller clients
------------------

For each controller interface, there is a corresponding client. These
clients allow users to interact with a set of engines through the
interface. Here are the two default clients:

* The :class:`MultiEngineClient` class.
* The :class:`TaskClient` class.

Security
--------

By default (as long as `pyOpenSSL` is installed) all network connections
between the controller and engines and the controller and clients are secure.
What does this mean? First of all, all of the connections will be encrypted
using SSL. Second, the connections are authenticated. We handle authentication
in a capability based security model [Capability]_. In this model, a
"capability (known in some systems as a key) is a communicable, unforgeable
token of authority". Put simply, a capability is like a key to your house. If
you have the key to your house, you can get in. If not, you can't.

In our architecture, the controller is the only process that listens on
network ports, and is thus responsible for creating these keys. In IPython,
these keys are known as Foolscap URLs, or FURLs, because of the underlying
network protocol we are using. As a user, you don't need to know anything
about the details of these FURLs, other than that when the controller starts,
it saves a set of FURLs to files named :file:`something.furl`. The default
location of these files is the :file:`~/.ipython/security` directory.

To connect and authenticate to the controller, an engine or client simply needs
to present an appropriate FURL (that was originally created by the controller)
to the controller. Thus, the FURL files need to be copied to a location where
the clients and engines can find them. Typically, this is the
:file:`~/.ipython/security` directory on the host where the client/engine is
running (which could be a different host than the controller). Once the FURL
files are copied over, the engines and clients can connect to the controller.

Currently, there are three FURL files that the controller creates:

ipcontroller-engine.furl
    This FURL file is the key that gives an engine the ability to connect
    to a controller.

ipcontroller-tc.furl
    This FURL file is the key that a :class:`TaskClient` must use to
    connect to the task interface of a controller.

ipcontroller-mec.furl
    This FURL file is the key that a :class:`MultiEngineClient` must use
    to connect to the multiengine interface of a controller.

More details of how these FURL files are used are given below.

A detailed description of the security model and its implementation in IPython
can be found :ref:`here <parallelsecurity>`.

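The capability idea in this section can be sketched in a few lines of Python 3. This is an illustration of the concept only, not Foolscap's protocol or FURL format, and it performs no encryption. ``ToyCapabilityIssuer`` is a hypothetical class: it plays the role of the controller by creating unguessable tokens, one per interface, and granting access only to whoever presents the matching token.

```python
import hmac
import secrets


class ToyCapabilityIssuer:
    """Hypothetical issuer of unguessable tokens, one per interface."""

    def __init__(self):
        self._granted = {}  # token -> name of the interface it unlocks

    def issue(self, interface):
        """Create a new random token that grants access to one interface."""
        token = secrets.token_urlsafe(32)
        self._granted[token] = interface
        return token

    def check(self, token, interface):
        """Return True only if the token was issued for this interface."""
        presented = token.encode("utf-8")
        for known, granted_interface in self._granted.items():
            # compare_digest avoids leaking information through timing
            if hmac.compare_digest(known.encode("utf-8"), presented):
                return granted_interface == interface
        return False


issuer = ToyCapabilityIssuer()
engine_key = issuer.issue("engine")
task_key = issuer.issue("task")

engine_ok = issuer.check(engine_key, "engine")    # right key for this interface
wrong_interface = issuer.check(task_key, "engine")  # valid key, wrong interface
guessed = issuer.check("guess", "engine")          # a token that was never issued
```

As with the three FURL files, each token opens exactly one door, which is why the engine and each kind of client need their own key.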
Getting Started
===============

To use IPython for parallel computing, you need to start one instance of the
controller and one or more instances of the engine. Initially, it is best to
simply start a controller and engines on a single host using the
:command:`ipcluster` command. To start a controller and 4 engines on your
localhost, just do::

    $ ipcluster local -n 4

More details about starting the IPython controller and engines can be found
:ref:`here <parallel_process>`.

Once you have started the IPython controller and one or more engines, you
are ready to use the engines to do something useful. To make sure
everything is working correctly, try the following commands:

.. sourcecode:: ipython

    In [1]: from IPython.kernel import client

    In [2]: mec = client.MultiEngineClient()

    In [4]: mec.get_ids()
    Out[4]: [0, 1, 2, 3]

    In [5]: mec.execute('print "Hello World"')
    Out[5]:
    <Results List>
    [0] In [1]: print "Hello World"
    [0] Out[1]: Hello World

    [1] In [1]: print "Hello World"
    [1] Out[1]: Hello World

    [2] In [1]: print "Hello World"
    [2] Out[1]: Hello World

    [3] In [1]: print "Hello World"
    [3] Out[1]: Hello World

Remember, a client also needs to present a FURL file to the controller. How
does this happen? When a multiengine client is created with no arguments, the
client tries to find the corresponding FURL file in the local
:file:`~/.ipython/security` directory. If it finds it, you are set. If you
have put the FURL file in a different location or it has a different name,
create the client like this::

    mec = client.MultiEngineClient('/path/to/my/ipcontroller-mec.furl')

The same holds true when creating a task client::

    tc = client.TaskClient('/path/to/my/ipcontroller-tc.furl')

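The lookup rule described above (use an explicit path if one is given, otherwise look in the default security directory) can be sketched as follows. This is not the code the clients actually run: ``resolve_furl_path`` is a hypothetical Python 3 helper, and the demonstration writes a placeholder file into a temporary directory rather than touching a real :file:`~/.ipython/security` directory.

```python
import os
import tempfile

# Default location named in the text; "~" is expanded at lookup time.
DEFAULT_SECURITY_DIR = os.path.join("~", ".ipython", "security")


def resolve_furl_path(furl_name, explicit_path=None,
                      security_dir=DEFAULT_SECURITY_DIR):
    """Return the FURL file path to use, or raise if no such file exists."""
    if explicit_path is not None:
        candidate = os.path.expanduser(explicit_path)
    else:
        candidate = os.path.join(os.path.expanduser(security_dir), furl_name)
    if not os.path.isfile(candidate):
        raise FileNotFoundError("no FURL file at %s" % candidate)
    return candidate


# Demonstration against a throwaway directory standing in for the default one.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_file = os.path.join(demo_dir, "ipcontroller-mec.furl")
    with open(demo_file, "w") as handle:
        handle.write("placeholder, not a real FURL\n")
    found = resolve_furl_path("ipcontroller-mec.furl", security_dir=demo_dir)
    found_matches = (found == demo_file)
    explicit = resolve_furl_path("ignored.furl", explicit_path=demo_file)
    explicit_matches = (explicit == demo_file)
```

A path resolved this way could then be passed to ``client.MultiEngineClient`` or ``client.TaskClient`` as in the examples above.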

You are now ready to learn more about the :ref:`MultiEngine
<parallelmultiengine>` and :ref:`Task <paralleltask>` interfaces to the
controller.

.. note::

    Don't forget that the engine, multiengine client and task client all have
    *different* FURL files. You must move *each* of these around to an
    appropriate location so that the engines and clients can use them to
    connect to the controller.

.. [Capability] Capability-based security, http://en.wikipedia.org/wiki/Capability-based_security