upstream/ipython Files · docs/source/parallel/parallel_intro.txt

General work on inputhook and the docs. * Re-wrote the Sphinx docs on GUI integration. * Improved docstrings of inputhook module. * Added current_gui function to inputhook. * Fixed lots of small bugs in the docs.

Brian Granger

				.. _ip1par:

				============================
				Overview and getting started
				============================

				Introduction
				============

				This section gives an overview of IPython's sophisticated and powerful
				architecture for parallel and distributed computing. This architecture
				abstracts out parallelism in a very general way, which enables IPython to
				support many different styles of parallelism including:

				* Single program, multiple data (SPMD) parallelism.
				* Multiple program, multiple data (MPMD) parallelism.
				* Message passing using MPI.
				* Task farming.
				* Data parallelism.
				* Combinations of these approaches.
				* Custom user-defined approaches.

				Most importantly, IPython enables all types of parallel applications to
				be developed, executed, debugged and monitored interactively. Hence,
				the ``I`` in IPython. The following are some example use cases for IPython:

				* Quickly parallelize algorithms that are embarrassingly parallel
				  using a number of simple approaches. Many simple things can be
				  parallelized interactively in one or two lines of code.

				* Steer traditional MPI applications on a supercomputer from an
				  IPython session on your laptop.

				* Analyze and visualize large datasets (that could be remote and/or
				  distributed) interactively using IPython and tools like
				  matplotlib/TVTK.

				* Develop, test and debug new parallel algorithms
				  (that may use MPI) interactively.

				* Tie together multiple MPI jobs running on different systems into
				  one giant distributed and parallel system.

				* Start a parallel job on your cluster and then have a remote
				  collaborator connect to it and pull back data into their
				  local IPython session for plotting and analysis.

				* Run a set of tasks on a set of CPUs using dynamic load balancing.

				Architecture overview
				=====================

				The IPython architecture consists of three components:

				* The IPython engine.
				* The IPython controller.
				* Various controller clients.

				These components live in the :mod:`IPython.kernel` package and are
				installed with IPython. They do, however, have additional dependencies
				that must be installed. For more information, see our
				:ref:`installation documentation <install_index>`.

				IPython engine
				---------------

				The IPython engine is a Python instance that takes Python commands over a
				network connection. Eventually, the IPython engine will be a full IPython
				interpreter, but for now, it is a regular Python interpreter. The engine
				can also handle incoming and outgoing Python objects sent over a network
				connection. When multiple engines are started, parallel and distributed
				computing becomes possible. An important feature of an IPython engine is
				that it blocks while user code is being executed. Read on for how the
				IPython controller solves this problem to expose a clean asynchronous API
				to the user.

				IPython controller
				------------------

				The IPython controller provides an interface for working with a set of
				engines. At a general level, the controller is a process to which
				IPython engines can connect. For each connected engine, the controller
				manages a queue. All actions that can be performed on the engine go
				through this queue. While the engines themselves block when user code is
				run, the controller hides that from the user to provide a fully
				asynchronous interface to a set of engines.
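
				The queue-per-engine idea can be illustrated with a small, self-contained
				sketch. It is not IPython's implementation: ``ToyEngine`` and
				``ToyController`` are hypothetical names, and standard-library threads stand
				in for real engine processes. Each toy engine blocks while it runs code, yet
				``submit`` returns a future immediately.

				.. sourcecode:: python

				    import queue
				    import threading
				    from concurrent.futures import Future


				    class ToyEngine:
				        """Hypothetical stand-in for an engine: runs one command at a time."""

				        def __init__(self, engine_id):
				            self.engine_id = engine_id
				            self.namespace = {}

				        def run(self, source):
				            # Blocks the calling thread until the user code finishes.
				            exec(source, self.namespace)
				            return self.engine_id


				    class ToyController:
				        """Hypothetical controller that keeps one FIFO queue per engine."""

				        def __init__(self, engines):
				            self._queues = {}
				            for engine in engines:
				                q = queue.Queue()
				                self._queues[engine.engine_id] = q
				                worker = threading.Thread(
				                    target=self._drain, args=(engine, q), daemon=True
				                )
				                worker.start()

				        def _drain(self, engine, q):
				            # Each engine works through its own queue strictly in order.
				            while True:
				                source, future = q.get()
				                try:
				                    future.set_result(engine.run(source))
				                except Exception as exc:
				                    future.set_exception(exc)

				        def submit(self, engine_id, source):
				            # Returns at once; the caller waits only when asking for a result.
				            future = Future()
				            self._queues[engine_id].put((source, future))
				            return future


				    controller = ToyController([ToyEngine(i) for i in range(4)])
				    futures = [controller.submit(i, "x = sum(range(1000))") for i in range(4)]
				    finished = sorted(f.result(timeout=5) for f in futures)

				In this sketch the only place the caller waits is the call to ``result()``;
				queuing work for one engine never delays work queued for another.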

				.. note::

				    Because the controller listens on a network port for engines to
				    connect to it, it must be started before any engines are started.

				The controller also provides a single point of contact for users who wish to
				utilize the engines connected to the controller. There are different ways of
				working with a controller. In IPython these ways correspond to different
				interfaces that the controller is adapted to. Currently we have two default
				interfaces to the controller:

				* The MultiEngine interface, which provides the simplest possible way of
				  working with engines interactively.
				* The Task interface, which presents the engines as a load balanced
				  task farming system.
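
				The difference between the two styles can be sketched with the standard
				library alone. This is an analogy under stated assumptions, not the
				MultiEngine or Task API: ``multiengine_style``, ``task_style`` and
				``ENGINE_IDS`` are hypothetical names, and a thread pool plays the role of
				the engines.

				.. sourcecode:: python

				    from concurrent.futures import ThreadPoolExecutor

				    ENGINE_IDS = [0, 1, 2, 3]  # hypothetical engine identifiers


				    def multiengine_style(func, engine_ids):
				        # The caller names the engines explicitly; each of them runs the call.
				        return {engine_id: func(engine_id) for engine_id in engine_ids}


				    def task_style(func, tasks, n_engines):
				        # The caller only supplies tasks; an idle worker takes the next one.
				        with ThreadPoolExecutor(max_workers=n_engines) as pool:
				            return list(pool.map(func, tasks))


				    per_engine = multiengine_style(lambda engine_id: engine_id * 10, ENGINE_IDS)
				    squares = task_style(lambda n: n * n, range(10), n_engines=len(ENGINE_IDS))

				In the first style the result is keyed by engine, because the caller chose
				where each call ran. In the second the caller never learns which worker
				handled which task, which is what allows the load to be balanced.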

				Advanced users can easily add new custom interfaces to enable other
				styles of parallelism.

				.. note::

				    A single controller and set of engines can be accessed
				    through multiple interfaces simultaneously. This opens the
				    door for lots of interesting things.

				Controller clients
				------------------

				For each controller interface, there is a corresponding client. These
				clients allow users to interact with a set of engines through the
				interface. Here are the two default clients:

				* The :class:`MultiEngineClient` class.
				* The :class:`TaskClient` class.

				Security
				--------

				By default (as long as ``pyOpenSSL`` is installed) all network connections
				between the controller and engines, and between the controller and clients,
				are secure. What does this mean? First of all, all of the connections will be
				encrypted using SSL. Second, the connections are authenticated. We handle
				authentication in a capability-based security model [Capability]_. In this model, a
				"capability (known in some systems as a key) is a communicable, unforgeable
				token of authority". Put simply, a capability is like a key to your house. If
				you have the key to your house, you can get in. If not, you can't.
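
				A minimal sketch of the capability idea follows. It is an illustration only
				and says nothing about how the underlying network protocol generates or
				checks its keys: ``ToyGatekeeper`` and the interface names are hypothetical.
				Whoever presents the right unguessable token is let in, and a token for one
				interface does not open another.

				.. sourcecode:: python

				    import hmac
				    import secrets


				    class ToyGatekeeper:
				        """Hypothetical service that hands out and checks capability tokens."""

				        def __init__(self):
				            self._tokens = {}

				        def issue(self, interface):
				            # An unguessable random token plays the role of the house key.
				            token = secrets.token_urlsafe(32)
				            self._tokens[interface] = token
				            return token

				        def connect(self, interface, presented):
				            expected = self._tokens.get(interface)
				            if expected is None:
				                return False
				            # Constant-time comparison avoids leaking how much matched.
				            return hmac.compare_digest(expected, presented)


				    gate = ToyGatekeeper()
				    engine_key = gate.issue("engine")
				    task_key = gate.issue("task")

				    right_key = gate.connect("engine", engine_key)
				    wrong_key = gate.connect("engine", task_key)
				    guessed_key = gate.connect("engine", "guess")

				Note that the sketch has no user names or passwords: possession of the token
				is the whole of the authorization decision.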

				In our architecture, the controller is the only process that listens on
				network ports, and is thus responsible for creating these keys. In IPython,
				these keys are known as Foolscap URLs, or FURLs, because of the underlying
				network protocol we are using. As a user, you don't need to know anything
				about the details of these FURLs, other than that when the controller starts,
				it saves a set of FURLs to files named :file:`something.furl`. The default
				location of these files is the :file:`~/.ipython/security` directory.

				To connect and authenticate to the controller, an engine or client simply needs
				to present an appropriate FURL (that was originally created by the controller)
				to the controller. Thus, the FURL files need to be copied to a location where
				the clients and engines can find them. Typically, this is the
				:file:`~/.ipython/security` directory on the host where the client/engine is
				running (which could be a different host than the controller). Once the FURL
				files are copied over, the engines and clients can connect.

				Currently, there are three FURL files that the controller creates:

				ipcontroller-engine.furl
				    This FURL file is the key that gives an engine the ability to connect
				    to a controller.

				ipcontroller-tc.furl
				    This FURL file is the key that a :class:`TaskClient` must use to
				    connect to the task interface of a controller.

				ipcontroller-mec.furl
				    This FURL file is the key that a :class:`MultiEngineClient` must use
				    to connect to the multiengine interface of a controller.

				More details of how these FURL files are used are given below.

				A detailed description of the security model and its implementation in IPython
				can be found :ref:`here <parallelsecurity>`.

				Getting Started
				===============

				To use IPython for parallel computing, you need to start one instance of the
				controller and one or more instances of the engine. Initially, it is best to
				simply start a controller and engines on a single host using the
				:command:`ipcluster` command. To start a controller and 4 engines on your
				localhost, just do::

				    $ ipcluster local -n 4

				More details about starting the IPython controller and engines can be found
				:ref:`here <parallel_process>`.

				Once you have started the IPython controller and one or more engines, you
				are ready to use the engines to do something useful. To make sure
				everything is working correctly, try the following commands:

				.. sourcecode:: ipython

				    In [1]: from IPython.kernel import client

				    In [2]: mec = client.MultiEngineClient()

				    In [4]: mec.get_ids()
				    Out[4]: [0, 1, 2, 3]

				    In [5]: mec.execute('print "Hello World"')
				    Out[5]:
				    <Results List>
				    [0] In [1]: print "Hello World"
				    [0] Out[1]: Hello World

				    [1] In [1]: print "Hello World"
				    [1] Out[1]: Hello World

				    [2] In [1]: print "Hello World"
				    [2] Out[1]: Hello World

				    [3] In [1]: print "Hello World"
				    [3] Out[1]: Hello World

				Remember, a client also needs to present a FURL file to the controller. How
				does this happen? When a multiengine client is created with no arguments, the
				client tries to find the corresponding FURL file in the local
				:file:`~/.ipython/security` directory. If it finds it, you are set. If you
				have put the FURL file in a different location or it has a different name,
				create the client like this::

				    mec = client.MultiEngineClient('/path/to/my/ipcontroller-mec.furl')

				The same holds true when creating a task client::

				    tc = client.TaskClient('/path/to/my/ipcontroller-tc.furl')
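
				The lookup rule described above (use the path you pass in, otherwise fall
				back to the default security directory) can be sketched as a small helper.
				This is not the clients' actual lookup code: ``resolve_furl_file`` and its
				parameters are hypothetical, and the temporary directory below only stands
				in for a real security directory.

				.. sourcecode:: python

				    import tempfile
				    from pathlib import Path

				    # Assumed default location, taken from the text above.
				    DEFAULT_SECURITY_DIR = Path.home() / ".ipython" / "security"


				    def resolve_furl_file(explicit_path=None,
				                          default_name="ipcontroller-mec.furl",
				                          security_dir=DEFAULT_SECURITY_DIR):
				        """Return the FURL file a client would use, or raise if none exists."""
				        if explicit_path:
				            candidate = Path(explicit_path)
				        else:
				            candidate = Path(security_dir) / default_name
				        if not candidate.is_file():
				            raise FileNotFoundError(f"no FURL file at {candidate}")
				        return candidate


				    with tempfile.TemporaryDirectory() as tmp:
				        furl = Path(tmp) / "ipcontroller-mec.furl"
				        furl.write_text("placeholder, not a real FURL")
				        found_by_default = resolve_furl_file(security_dir=Path(tmp))
				        found_explicitly = resolve_furl_file(explicit_path=furl)
				        matches = found_by_default == furl == found_explicitly

				Failing early with a clear error when the file is missing is a design choice
				of this sketch; it makes a misplaced FURL file easier to diagnose than a
				failed connection would.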

				You are now ready to learn more about the :ref:`MultiEngine
				<parallelmultiengine>` and :ref:`Task <paralleltask>` interfaces to the
				controller.

				.. note::

				    Don't forget that the engine, multiengine client and task client all have
				    different FURL files. You must move each of these around to an
				    appropriate location so that the engines and clients can use them to
				    connect to the controller.

				.. [Capability] Capability-based security, http://en.wikipedia.org/wiki/Capability-based_security
