.. _parallel_process:

===========================================
Starting the IPython controller and engines
===========================================

To use IPython for parallel computing, you need to start one instance of
the controller and one or more instances of the engine. The controller
and each engine can run on different machines or on the same machine.
Because of this, there are many different possibilities.

Broadly speaking, there are two ways of starting a controller and engines:

* In an automated manner using the :command:`ipcluster` command.
* In a more manual way using the :command:`ipcontroller` and
  :command:`ipengine` commands.

This document describes both of these methods. We recommend that new users
start with the :command:`ipcluster` command as it simplifies many common use
cases.

General considerations
======================

Before delving into the details of how you can start a controller and
engines using the various methods, we outline some general issues that
arise no matter which method you use to start your IPython cluster.

Let's say that you want to start the controller on ``host0`` and engines on
hosts ``host1``-``hostn``. The following steps are then required:

1. Start the controller on ``host0`` by running :command:`ipcontroller` on
   ``host0``.
2. Move the FURL file (:file:`ipcontroller-engine.furl`) created by the
   controller from ``host0`` to hosts ``host1``-``hostn``.
3. Start the engines on hosts ``host1``-``hostn`` by running
   :command:`ipengine`. This command has to be told where the FURL file
   (:file:`ipcontroller-engine.furl`) is located.

At this point, the controller and engines will be connected. By default, the
FURL files created by the controller are put into the
:file:`~/.ipython/security` directory. If the engines share a filesystem with
the controller, step 2 can be skipped as the engines will automatically look
at that location.

The final step required to actually use the running controller from a
client is to move the FURL files :file:`ipcontroller-mec.furl` and
:file:`ipcontroller-tc.furl` from ``host0`` to the host where the clients will
be run. If these files are put into the :file:`~/.ipython/security` directory
of the client's host, they will be found automatically. Otherwise, the full
path to them has to be passed to the client's constructor.

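
The file movements described above can be sketched as a small helper that
builds :command:`scp` commands. This is only an illustration under stated
assumptions: it is meant to run on the controller's host, the host names
(``host1``, ``host2``, ``laptop``) and the function ``furl_copy_commands`` are
hypothetical, and the remote :file:`~/.ipython/security` directory is assumed
to exist already. The helper only builds the commands; it does not run them.

.. sourcecode:: python

    import os

    # Default location of the FURL files, as described in the text.
    SECURITY_DIR = os.path.join("~", ".ipython", "security")

    def furl_copy_commands(engine_hosts, client_host):
        """Return scp argument lists that push FURL files from the controller host."""
        local_dir = os.path.expanduser(SECURITY_DIR)
        remote_dir = ".ipython/security/"  # relative to the remote home directory
        commands = []
        # Step 2: every engine host needs the engine FURL file.
        engine_furl = os.path.join(local_dir, "ipcontroller-engine.furl")
        for host in engine_hosts:
            commands.append(["scp", engine_furl, "%s:%s" % (host, remote_dir)])
        # Final step: the client host needs both client FURL files.
        for name in ("ipcontroller-mec.furl", "ipcontroller-tc.furl"):
            commands.append(["scp", os.path.join(local_dir, name),
                             "%s:%s" % (client_host, remote_dir)])
        return commands

    if __name__ == "__main__":
        for argv in furl_copy_commands(["host1", "host2"], "laptop"):
            print(" ".join(argv))

Each returned list could be handed to ``subprocess.call`` once the controller
has written its FURL files.
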
Using :command:`ipcluster`
==========================

The :command:`ipcluster` command provides a simple way of starting a
controller and engines in the following situations:

1. When the controller and engines are all run on localhost. This is useful
   for testing or running on a multicore computer.
2. When engines are started using the :command:`mpirun` command that comes
   with most MPI [MPI]_ implementations.
3. When engines are started using the PBS [PBS]_ batch system.
4. When the controller is started on localhost and the engines are started on
   remote nodes using :command:`ssh`.

.. note::

    It is also possible for advanced users to add support to
    :command:`ipcluster` for starting controllers and engines using other
    methods (like Sun's Grid Engine, for example).

.. note::

    Currently :command:`ipcluster` requires that the
    :file:`~/.ipython/security` directory live on a shared filesystem that is
    seen by both the controller and engines. If you don't have a shared file
    system you will need to use :command:`ipcontroller` and
    :command:`ipengine` directly. This constraint can be relaxed if you are
    using the :command:`ssh` method to start the cluster.

Under the hood, :command:`ipcluster` just uses :command:`ipcontroller`
and :command:`ipengine` to perform the steps described above.

Using :command:`ipcluster` in local mode
----------------------------------------

To start one controller and 4 engines on localhost, just do::

    $ ipcluster local -n 4

To see other command line options for the local mode, do::

    $ ipcluster local -h

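
When running on a multicore computer, the engine count can be derived from
the number of cores. The following is a minimal sketch, assuming that one
engine per core is a reasonable default; the function
``local_cluster_command`` is hypothetical and only builds the command line
shown above.

.. sourcecode:: python

    import multiprocessing

    def local_cluster_command(n_engines=None):
        """Build the argument list for ``ipcluster local -n <n>``."""
        if n_engines is None:
            # Assumption: one engine per core is a sensible default.
            n_engines = multiprocessing.cpu_count()
        if n_engines < 1:
            raise ValueError("need at least one engine")
        return ["ipcluster", "local", "-n", str(n_engines)]

    if __name__ == "__main__":
        print(" ".join(local_cluster_command(4)))  # ipcluster local -n 4
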
Using :command:`ipcluster` in mpiexec/mpirun mode
-------------------------------------------------

The mpiexec/mpirun mode is useful if:

1. You have MPI installed.
2. Your systems are configured to use the :command:`mpiexec` or
   :command:`mpirun` commands to start MPI processes.

.. note::

    The preferred command to use is :command:`mpiexec`. However, we also
    support :command:`mpirun` for backwards compatibility. The underlying
    logic used is exactly the same, the only difference being the name of the
    command line program that is called.

If these conditions are satisfied, you can start an IPython cluster using::

    $ ipcluster mpiexec -n 4

This does the following:

1. Starts the IPython controller on the current host.
2. Uses :command:`mpiexec` to start 4 engines.

On newer MPI implementations (such as OpenMPI), this will work even if you
don't make any calls to MPI or call :func:`MPI_Init`. However, older MPI
implementations actually require each process to call :func:`MPI_Init` upon
starting. The easiest way of having this done is to install the mpi4py
[mpi4py]_ package and then call :command:`ipcluster` with the ``--mpi``
option::

    $ ipcluster mpiexec -n 4 --mpi=mpi4py

Unfortunately, even this won't work for some MPI implementations. If you are
having problems with this, you will likely have to use a custom Python
executable that itself calls :func:`MPI_Init` at the appropriate time.
Fortunately, mpi4py comes with such a custom Python executable that is easy to
install and use. However, this custom Python executable approach will not work
with :command:`ipcluster` currently.

Additional command line options for this mode can be found by doing::

    $ ipcluster mpiexec -h

More details on using MPI with IPython can be found :ref:`here <parallelmpi>`.

Using :command:`ipcluster` in PBS mode
--------------------------------------

The PBS mode uses the Portable Batch System [PBS]_ to start the engines. To
use this mode, you first need to create a PBS script template that will be
used to start the engines. Here is a sample PBS script template:

.. sourcecode:: bash

    #PBS -N ipython
    #PBS -j oe
    #PBS -l walltime=00:10:00
    #PBS -l nodes=${n/4}:ppn=4
    #PBS -q parallel

    cd $$PBS_O_WORKDIR
    export PATH=$$HOME/usr/local/bin
    export PYTHONPATH=$$HOME/usr/local/lib/python2.4/site-packages
    /usr/local/bin/mpiexec -n ${n} ipengine --logfile=$$PBS_O_WORKDIR/ipengine

There are a few important points about this template:

1. This template will be rendered at runtime using IPython's :mod:`Itpl`
   template engine.

2. Instead of putting in the actual number of engines, use the notation
   ``${n}`` to indicate the number of engines to be started. You can also use
   expressions like ``${n/4}`` in the template to indicate the number of
   nodes.

3. Because ``$`` is a special character used by the template engine, you must
   escape any ``$`` by using ``$$``. This is important when referring to
   environment variables in the template.

4. Any options to :command:`ipengine` should be given in the batch script
   template.

5. Depending on the configuration of your system, you may have to set
   environment variables in the script template.

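
The substitution rules in points 2 and 3 can be illustrated with a small
stand-in renderer. This is a minimal sketch and not IPython's :mod:`Itpl`
module: the ``render`` function and its regular expression are hypothetical
and only mimic the two rules described above (``${...}`` expressions and
``$$`` escapes). The example template uses ``${n//4}`` rather than ``${n/4}``
so that the node count is an integer under Python 3 as well.

.. sourcecode:: python

    import re

    def render(template, **names):
        """Expand ${expr} using ``names`` and turn ``$$`` into a literal ``$``."""
        def substitute(match):
            if match.group(0) == "$$":
                return "$"
            # Evaluate the expression with only the supplied names visible.
            return str(eval(match.group(1), {"__builtins__": {}}, names))
        return re.sub(r"\$\$|\$\{([^}]*)\}", substitute, template)

    TEMPLATE = """#PBS -l nodes=${n//4}:ppn=4
    cd $$PBS_O_WORKDIR
    mpiexec -n ${n} ipengine --logfile=$$PBS_O_WORKDIR/ipengine
    """

    print(render(TEMPLATE, n=128))

With ``n=128`` the first line requests 32 nodes, and every ``$$`` becomes a
single ``$`` so that the shell sees ordinary environment variable references.
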
Once you have created such a script, save it with a name like
:file:`pbs.template`. Now you are ready to start your job::

    $ ipcluster pbs -n 128 --pbs-script=pbs.template

Additional command line options for this mode can be found by doing::

    $ ipcluster pbs -h

Using :command:`ipcluster` in SSH mode
--------------------------------------

The SSH mode uses :command:`ssh` to execute :command:`ipengine` on remote
nodes and runs :command:`ipcontroller` on localhost.

When using this mode it is highly recommended that you have set up SSH keys
and are using ssh-agent [SSH]_ for password-less logins.

To use this mode you need a Python file describing the cluster. Here is an
example of such a "clusterfile":

.. sourcecode:: python

    send_furl = True
    engines = { 'host1.example.com' : 2,
                'host2.example.com' : 5,
                'host3.example.com' : 1,
                'host4.example.com' : 8 }

Since this is a regular Python file, the usual Python syntax applies. Things to
note:

* The ``engines`` dict, where each key is a host we want to run engines on and
  the value is the number of engines to run on that host.
* ``send_furl`` can be either ``True`` or ``False``. If ``True``, the FURL
  file needed for :command:`ipengine` is copied to each host.

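
Because a clusterfile is plain Python, it can be checked before it is handed
to :command:`ipcluster`. The following is a minimal sketch of such a check.
The ``load_clusterfile`` helper is hypothetical, it is not necessarily how
:command:`ipcluster` itself reads the file, and treating a missing
``send_furl`` as ``False`` is an assumption.

.. sourcecode:: python

    CLUSTERFILE_SOURCE = """
    send_furl = True
    engines = { 'host1.example.com' : 2,
                'host2.example.com' : 5,
                'host3.example.com' : 1,
                'host4.example.com' : 8 }
    """

    def load_clusterfile(source):
        """Execute clusterfile source and return its validated settings."""
        namespace = {}
        exec(source, namespace)
        engines = namespace["engines"]
        for host, count in engines.items():
            if not isinstance(count, int) or count < 1:
                raise ValueError("bad engine count for %s: %r" % (host, count))
        # Assumption: a missing send_furl is treated as False.
        return {"send_furl": bool(namespace.get("send_furl", False)),
                "engines": engines}

    cluster = load_clusterfile(CLUSTERFILE_SOURCE)
    total_engines = sum(cluster["engines"].values())
    print(total_engines)  # 16
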
The ``--clusterfile`` command line option lets you specify the file to use for
the cluster definition. Once you have your cluster file and you can
:command:`ssh` into the remote hosts without a password, you are ready to
start your cluster like so:

.. sourcecode:: bash

    $ ipcluster ssh --clusterfile /path/to/my/clusterfile.py

Two helper shell scripts are used to start and stop :command:`ipengine` on
remote hosts:

* sshx.sh
* engine_killer.sh

Defaults for both of these are contained in the source code for
:command:`ipcluster`. The default scripts are written to a local file in a
temp directory and then copied to a temp directory on the remote host and
executed from there. On most Unix, Linux and OS X systems this is
:file:`/tmp`.

The default sshx.sh is the following:

.. sourcecode:: bash

    #!/bin/sh
    "$@" &> /dev/null &
    echo $!

If you want to use a custom sshx.sh script you need to use the ``--sshx``
option and specify the file to use. Using a custom sshx.sh file could be
helpful when you need to set up the environment on the remote host before
executing :command:`ipengine`.

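
As an illustration of the custom script idea, the following sketch writes a
variant of sshx.sh that extends ``PATH`` before launching the command. The
``PATH`` value, the ``write_sshx`` helper, and the decision to mark the file
executable are all assumptions made for this example. The redirection is
written as ``> /dev/null 2>&1``, which has the same intent as ``&>`` in the
default script but is also understood by shells that are not bash.

.. sourcecode:: python

    import os
    import stat
    import tempfile

    CUSTOM_SSHX = """#!/bin/sh
    # Hypothetical environment setup; adjust the path for your hosts.
    export PATH="$HOME/usr/local/bin:$PATH"
    "$@" > /dev/null 2>&1 &
    echo $!
    """

    def write_sshx(directory):
        """Write the custom launcher script and mark it executable."""
        path = os.path.join(directory, "sshx.sh")
        with open(path, "w") as handle:
            handle.write(CUSTOM_SSHX)
        mode = os.stat(path).st_mode
        os.chmod(path, mode | stat.S_IXUSR)
        return path

    if __name__ == "__main__":
        script = write_sshx(tempfile.mkdtemp())
        print("pass this file to the --sshx option: %s" % script)
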
For a detailed options list:

.. sourcecode:: bash

    $ ipcluster ssh -h

Current limitations of the SSH mode of :command:`ipcluster` are:

* Untested on Windows. Would require a working :command:`ssh` on Windows.
  Also, we are using shell scripts to set up and execute commands on remote
  hosts.
* :command:`ipcontroller` is started on localhost, with no option to start it
  on a remote node.

Using the :command:`ipcontroller` and :command:`ipengine` commands
==================================================================

It is also possible to use the :command:`ipcontroller` and :command:`ipengine`
commands to start your controller and engines. This approach gives you full
control over all aspects of the startup process.

Starting the controller and engine on your local machine
--------------------------------------------------------

To use :command:`ipcontroller` and :command:`ipengine` to start things on your
local machine, do the following.

First start the controller::

    $ ipcontroller

Next, start however many instances of the engine you want using (repeatedly)
the command::

    $ ipengine

The engines should start and automatically connect to the controller using the
FURL files in :file:`~/.ipython/security`. You are now ready to use the
controller and engines from IPython.

.. warning::

    The order of the above operations is very important. You *must*
    start the controller before the engines, since the engines connect
    to the controller as they get started.

.. note::

    On some platforms (OS X), to put the controller and engine into the
    background you may need to give these commands in the form
    ``(ipcontroller &)`` and ``(ipengine &)`` (with the parentheses) for them
    to work properly.

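
The ordering requirement can be sketched in a small launcher script. This is
a minimal illustration, assuming that the appearance of
:file:`ipcontroller-engine.furl` in the default security directory is a
usable signal that the controller is up. Both helper functions are
hypothetical, and a stale FURL file left over from an earlier run would defeat
the check.

.. sourcecode:: python

    import os
    import subprocess
    import time

    ENGINE_FURL = os.path.expanduser(
        "~/.ipython/security/ipcontroller-engine.furl")

    def wait_for_file(path, timeout=30.0, poll=0.5):
        """Return True once ``path`` exists, or False after ``timeout`` seconds."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            if os.path.exists(path):
                return True
            time.sleep(poll)
        return os.path.exists(path)

    def start_local_cluster(n_engines):
        """Start the controller first, then the requested number of engines."""
        controller = subprocess.Popen(["ipcontroller"])
        # Assumption: the engine FURL file appearing means the controller is up.
        if not wait_for_file(ENGINE_FURL):
            controller.terminate()
            raise RuntimeError("controller did not write %s" % ENGINE_FURL)
        engines = [subprocess.Popen(["ipengine"]) for _ in range(n_engines)]
        return controller, engines

Calling ``start_local_cluster(4)`` would then correspond to running
:command:`ipcontroller` once and :command:`ipengine` four times, in that
order.
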
Starting the controller and engines on different hosts
------------------------------------------------------

When the controller and engines are running on different hosts, things are
slightly more complicated, but the underlying ideas are the same:

1. Start the controller on a host using :command:`ipcontroller`.
2. Copy :file:`ipcontroller-engine.furl` from :file:`~/.ipython/security` on
   the controller's host to the host where the engines will run.
3. Use :command:`ipengine` on the engines' hosts to start the engines.

The only thing you have to be careful of is to tell :command:`ipengine` where
the :file:`ipcontroller-engine.furl` file is located. There are two ways you
can do this:

* Put :file:`ipcontroller-engine.furl` in the :file:`~/.ipython/security`
  directory on the engine's host, where it will be found automatically.
* Call :command:`ipengine` with the ``--furl-file=full_path_to_the_file``
  flag.

The ``--furl-file`` flag works like this::

    $ ipengine --furl-file=/path/to/my/ipcontroller-engine.furl

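
The choice between the two options can be expressed as a small helper that
builds the :command:`ipengine` command line. This is a sketch only: the
``engine_command`` function is hypothetical, and it relies solely on the
default location and the ``--furl-file`` flag described above.

.. sourcecode:: python

    import os

    DEFAULT_FURL = os.path.abspath(os.path.join(
        os.path.expanduser("~"), ".ipython", "security",
        "ipcontroller-engine.furl"))

    def engine_command(furl_path=None):
        """Return the ipengine argument list for the given FURL file location."""
        if furl_path is None or os.path.abspath(furl_path) == DEFAULT_FURL:
            # The default location is searched automatically; no flag needed.
            return ["ipengine"]
        return ["ipengine", "--furl-file=%s" % os.path.abspath(furl_path)]

    if __name__ == "__main__":
        print(" ".join(engine_command("/path/to/my/ipcontroller-engine.furl")))
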
.. note::

    If the controller's and engines' hosts all have a shared file system
    (:file:`~/.ipython/security` is the same on all of them), then things
    will just work!

Make FURL files persistent
--------------------------

At first glance it may seem that managing the FURL files is a bit
annoying. Going back to the house and key analogy, copying the FURL around
each time you start the controller is like having to make a new key every time
you want to unlock the door and enter your house. As with your house, you want
to be able to create the key (or FURL file) once, and then simply use it at
any point in the future.

This is possible, but before you do this, you **must** remove any old FURL
files in the :file:`~/.ipython/security` directory.

.. warning::

    You **must** remove old FURL files before using persistent FURL files.

Then, the only thing you have to do is decide what ports the controller will
listen on for the engines and clients. This is done as follows::

    $ ipcontroller -r --client-port=10101 --engine-port=10102

These options also work with all of the various modes of
:command:`ipcluster`::

    $ ipcluster local -n 2 -r --client-port=10101 --engine-port=10102

Then, just copy the FURL files over the first time and you are set. You can
start and stop the controller and engines as many times as you want in the
future, just make sure to tell the controller to use the *same* ports.

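
Since persistent FURL files depend on the controller getting the same ports
every time, it can help to check that the chosen ports are free before
starting. The following is a minimal sketch under stated assumptions: both
helper functions are hypothetical, the check only probes ``127.0.0.1``, and
another process could still grab a port between the check and the actual
start.

.. sourcecode:: python

    import socket

    def port_is_free(port, host="127.0.0.1"):
        """Return True if a TCP socket can currently bind to ``host:port``."""
        probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            probe.bind((host, port))
        except OSError:
            return False
        finally:
            probe.close()
        return True

    def controller_command(client_port=10101, engine_port=10102):
        """Build the ipcontroller argument list with fixed ports."""
        for port in (client_port, engine_port):
            if not port_is_free(port):
                raise RuntimeError("port %d is already in use" % port)
        return ["ipcontroller", "-r",
                "--client-port=%d" % client_port,
                "--engine-port=%d" % engine_port]
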
.. note::

    You may ask the question: what ports does the controller listen on if you
    don't tell it to use specific ones? The default is to use high random port
    numbers. We do this for two reasons: i) to increase security through
    obscurity and ii) to allow multiple controllers on a given host to start
    and automatically use different ports.

Log files
---------

All of the components of IPython have log files associated with them.
These log files can be extremely useful in debugging problems with
IPython and can be found in the directory :file:`~/.ipython/log`. Sending
the log files to us will often help us to debug any problems.

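
When collecting log files for a problem report, the most recently modified
ones are usually the relevant ones. The following sketch lists them. The
``newest_logs`` helper is hypothetical and makes no assumption about how the
log files are named.

.. sourcecode:: python

    import glob
    import os

    def newest_logs(log_dir, limit=5):
        """Return up to ``limit`` files in ``log_dir``, newest first."""
        paths = [p for p in glob.glob(os.path.join(log_dir, "*"))
                 if os.path.isfile(p)]
        paths.sort(key=os.path.getmtime, reverse=True)
        return paths[:limit]

    if __name__ == "__main__":
        for path in newest_logs(os.path.expanduser("~/.ipython/log")):
            print(path)
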
.. [PBS] Portable Batch System. http://www.openpbs.org/

.. [SSH] SSH-Agent. http://en.wikipedia.org/wiki/Ssh-agent