.. _ip1par:
============================
Overview and getting started
============================
Introduction
============
This section gives an overview of IPython's sophisticated and powerful
architecture for parallel and distributed computing. This architecture
abstracts out parallelism in a very general way, which enables IPython to
support many different styles of parallelism including:
* Single program, multiple data (SPMD) parallelism.
* Multiple program, multiple data (MPMD) parallelism.
* Message passing using MPI.
* Task farming.
* Data parallel.
* Combinations of these approaches.
* Custom user defined approaches.
Most importantly, IPython enables all types of parallel applications to
be developed, executed, debugged and monitored *interactively*. Hence,
the ``I`` in IPython. The following are some example use cases for IPython:
* Quickly parallelize algorithms that are embarrassingly parallel
using a number of simple approaches. Many simple things can be
parallelized interactively in one or two lines of code.
* Steer traditional MPI applications on a supercomputer from an
IPython session on your laptop.
* Analyze and visualize large datasets (that could be remote and/or
distributed) interactively using IPython and tools like
matplotlib/TVTK.
* Develop, test and debug new parallel algorithms
(that may use MPI) interactively.
* Tie together multiple MPI jobs running on different systems into
one giant distributed and parallel system.
* Start a parallel job on your cluster and then have a remote
collaborator connect to it and pull back data into their
local IPython session for plotting and analysis.
* Run a set of tasks on a set of CPUs using dynamic load balancing.
Architecture overview
=====================
The IPython architecture consists of three components:
* The IPython engine.
* The IPython controller.
* The controller client.
These components live in the :mod:`IPython.zmq.parallel` package and are
installed with IPython. They do, however, have additional dependencies
that must be installed. For more information, see our
:ref:`installation documentation <install_index>`.
.. TODO: include zmq in install_index
IPython engine
---------------
The IPython engine is a Python instance that takes Python commands over a
network connection. Eventually, the IPython engine will be a full IPython
interpreter, but for now, it is a regular Python interpreter. The engine
can also handle incoming and outgoing Python objects sent over a network
connection. When multiple engines are started, parallel and distributed
computing becomes possible. An important feature of an IPython engine is
that it blocks while user code is being executed. Read on for how the
IPython controller solves this problem to expose a clean asynchronous API
to the user.
IPython controller
------------------
The IPython controller provides an interface for working with a set of engines. At a
general level, the controller is a collection of processes to which IPython engines and
clients can connect. The controller is composed of a :class:`Hub` and a collection of
:class:`Schedulers`. The Schedulers typically run in separate processes on the same
machine as the Hub, but they can run anywhere, from local threads to remote machines.
The controller also provides a single point of contact for users who wish to
utilize the engines connected to the controller. There are different ways of
working with a controller. In IPython, all of these models are implemented
either by calling the client's :meth:`.Client.apply` method with various
arguments, or by constructing :class:`.View` objects that represent subsets of
engines. The two primary models for interacting with engines are:
* A MUX interface, where engines are addressed explicitly.
* A Task interface, where the Scheduler is trusted with assigning work to
appropriate engines.
Advanced users can readily extend the View models to enable other
styles of parallelism.
.. note::
A single controller and set of engines can be used with multiple models
simultaneously. This opens the door for lots of interesting things.
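The difference between the two models can be sketched without IPython at all.
The following is a toy illustration only: ``ToyEngine``, ``mux_apply`` and
``task_apply`` are hypothetical names invented for this sketch, not part of
the IPython API, and the "scheduler" is simply a least-loaded choice made in
the calling process.

```python
class ToyEngine:
    """Hypothetical stand-in for an engine: runs one callable at a time."""

    def __init__(self, engine_id):
        self.id = engine_id
        self.completed = 0  # number of calls this engine has executed

    def run(self, func, *args):
        self.completed += 1
        return func(*args)


def mux_apply(engines, targets, func, *args):
    """MUX style: the caller names the engines, and each target runs func."""
    return {eid: engines[eid].run(func, *args) for eid in targets}


def task_apply(engines, func, arg_list):
    """Task style: a toy scheduler picks the least-loaded engine per task."""
    results = []
    for arg in arg_list:
        engine = min(engines.values(), key=lambda e: (e.completed, e.id))
        results.append((engine.id, engine.run(func, arg)))
    return results


engines = {i: ToyEngine(i) for i in range(4)}

# Explicit addressing: only engines 0 and 2 do the work.
mux_results = mux_apply(engines, [0, 2], lambda: "Hello, World")

# Destination-agnostic: the caller never chooses an engine.
task_results = task_apply(engines, lambda x: x * x, range(6))
```

In the MUX call the result is keyed by engine id, because the caller chose
the engines. In the Task call the caller only receives results; which engine
ran which task is a scheduling detail.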
The Hub
*******
The center of an IPython cluster is the Controller Hub. This is the process that keeps
track of engine connections, schedulers, and clients, as well as all task requests and
results. The primary role of the Hub is to facilitate queries of the cluster state and
to minimize the information required to establish the many connections involved in
connecting new clients and engines.
Schedulers
**********
All actions that can be performed on the engine go through a Scheduler. While the engines
themselves block when user code is run, the schedulers hide that from the user to provide
a fully asynchronous interface to a set of engines.
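The idea of hiding blocking workers behind an asynchronous front end can be
illustrated with the standard library alone. This is a minimal sketch, not
how the IPython schedulers are implemented: a ``ThreadPoolExecutor`` stands
in for a scheduler, and ``blocking_engine`` is a hypothetical function that
stands in for an engine busy with user code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_engine(code_duration, value):
    """Stand-in for an engine: it blocks until the 'user code' finishes."""
    time.sleep(code_duration)
    return value


# Four worker threads play the role of four engines.
with ThreadPoolExecutor(max_workers=4) as scheduler:
    start = time.monotonic()

    # Submitting work returns a future right away, although each
    # "engine" will block for 0.2 seconds while it runs.
    futures = [scheduler.submit(blocking_engine, 0.2, i) for i in range(4)]
    submit_time = time.monotonic() - start

    # The caller blocks only when it actually asks for the results.
    results = [f.result() for f in futures]
    total_time = time.monotonic() - start
```

Here ``submit_time`` is much smaller than ``total_time``: the caller is free
to do other work between submitting requests and collecting results.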
IPython client
--------------
There is one primary object, the :class:`~.parallel.client.Client`, for connecting to a
controller. For each model, there is a corresponding view. These views allow users to
interact with a set of engines through that model's interface. Here are the two default views:
* The :class:`DirectView` class for explicit addressing.
* The :class:`LoadBalancedView` class for destination-agnostic scheduling.
Security
--------
IPython uses ZeroMQ for networking. This has brought many advantages, but
one drawback is that ZeroMQ provides no security of its own [ZeroMQ]_. By
default, no IPython connections are secured, but open ports listen only on
localhost. The only source of security for IPython is SSH tunneling. IPython
supports both shell (``openssh``) and ``paramiko`` based tunnels for connections.
In our architecture, the controller is the only process that listens on
network ports, and is thus the main point of vulnerability. The standard model
for secure connections is to have the controller listen only on localhost, and
to use SSH tunnels to the controller's machine to connect clients and/or
engines.
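For readers unfamiliar with SSH port forwarding, the sketch below builds the
kind of OpenSSH command that such a tunnel amounts to. It is an illustration
under assumptions: ``ssh_tunnel_command`` is a hypothetical helper, the port
number 10101 is made up, and the actual ports depend on how the controller
was started. The ``-f``, ``-N`` and ``-L`` options are standard OpenSSH
options.

```python
def ssh_tunnel_command(server, remote_port, local_port, remote_host="127.0.0.1"):
    """Build an OpenSSH command that forwards a local port through server.

    Connections to 127.0.0.1:local_port on this machine are forwarded to
    remote_host:remote_port as seen from the SSH server.
    """
    return [
        "ssh",
        "-f",  # go to the background after authentication
        "-N",  # do not run a remote command, only forward ports
        "-L", "127.0.0.1:%d:%s:%d" % (local_port, remote_host, remote_port),
        server,
    ]


# Hypothetical example: the controller listens on localhost:10101 on the Hub machine.
cmd = ssh_tunnel_command("myhub.example.com", remote_port=10101, local_port=10101)
print(" ".join(cmd))
```

Because the forwarded port is bound to ``127.0.0.1`` on both ends, the
controller's port is never exposed on a public network interface.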
.. warning::
Even at its most secure, the Controller listens on ports on localhost, and
every time you make a tunnel, you open a localhost port on the connecting
machine that points to the Controller. If localhost on the Controller's
machine, or the machine of any client or engine, is untrusted, then your
Controller is insecure. There is no way around this with ZeroMQ.
.. TODO: edit parallelsecurity
A detailed description of the security model and its implementation in IPython
can be found :ref:`here <parallelsecurity>`.
Getting Started
===============
To use IPython for parallel computing, you need to start one instance of the
controller and one or more instances of the engine. Initially, it is best to
simply start a controller and engines on a single host using the
:command:`ipclusterz` command. To start a controller and 4 engines on your
localhost, just do::

    $ ipclusterz -n 4

More details about starting the IPython controller and engines can be found
:ref:`here <parallel_process>`.
Once you have started the IPython controller and one or more engines, you
are ready to use the engines to do something useful. To make sure
everything is working correctly, try the following commands:
.. sourcecode:: ipython

    In [1]: from IPython.zmq.parallel import client

    In [2]: c = client.Client()

    In [3]: c.ids
    Out[3]: set([0, 1, 2, 3])

    In [4]: c.apply(lambda : "Hello, World", targets='all', block=True)
    Out[4]: {0: 'Hello, World', 1: 'Hello, World', 2: 'Hello, World', 3:
    'Hello, World'}

Remember, a client needs to be able to reach the Hub. If the Hub is running
on a different machine to which you have SSH access, you can connect to it
with:

.. sourcecode:: ipython

    In [2]: c = client.Client(sshserver='myhub.example.com')

where ``myhub.example.com`` is the hostname or IP address of the machine on
which the Hub process is running.

You are now ready to learn more about the :ref:`MUX
<parallelmultiengine>` and :ref:`Task <paralleltask>` interfaces to the
controller.
.. [ZeroMQ] ZeroMQ. http://www.zeromq.org