.. _parallel_process:

===========================================
Starting the IPython controller and engines
===========================================

To use IPython for parallel computing, you need to start one instance of
the controller and one or more instances of the engine. The controller
and each engine can run on different machines or on the same machine.
Because of this, there are many different possibilities.

Broadly speaking, there are two ways of starting a controller and engines:

* In an automated manner using the :command:`ipcluster` command.
* In a more manual way using the :command:`ipcontroller` and
  :command:`ipengine` commands.

This document describes both of these methods. We recommend that new users
start with the :command:`ipcluster` command as it simplifies many common usage
cases.

General considerations
======================

Before delving into the details about how you can start a controller and
engines using the various methods, we outline some of the general issues that
come up when starting the controller and engines. These things come up no
matter which method you use to start your IPython cluster.

Let's say that you want to start the controller on ``host0`` and engines on
hosts ``host1``-``hostn``. The following steps are then required:

1. Start the controller on ``host0`` by running :command:`ipcontroller` on
   ``host0``.
2. Move the FURL file (:file:`ipcontroller-engine.furl`) created by the
   controller from ``host0`` to hosts ``host1``-``hostn``.
3. Start the engines on hosts ``host1``-``hostn`` by running
   :command:`ipengine`. This command has to be told where the FURL file
   (:file:`ipcontroller-engine.furl`) is located.

At this point, the controller and engines will be connected. By default, the
FURL files created by the controller are put into the
:file:`~/.ipython/security` directory. If the engines share a filesystem with
the controller, step 2 can be skipped as the engines will automatically look
at that location.

The final step required to actually use the running controller from a
client is to move the FURL files :file:`ipcontroller-mec.furl` and
:file:`ipcontroller-tc.furl` from ``host0`` to the host where the clients will
be run. If these files are put into the :file:`~/.ipython/security` directory
of the client's host, they will be found automatically. Otherwise, the full
path to them has to be passed to the client's constructor.

Using :command:`ipcluster`
==========================

The :command:`ipcluster` command provides a simple way of starting a
controller and engines in the following situations:

1. When the controller and engines are all run on localhost. This is useful
   for testing or running on a multicore computer.
2. When engines are started using the :command:`mpirun` command that comes
   with most MPI [MPI]_ implementations.
3. When engines are started using the PBS [PBS]_ batch system.
4. When the controller is started on localhost and the engines are started on
   remote nodes using :command:`ssh`.

.. note::

   It is also possible for advanced users to add support to
   :command:`ipcluster` for starting controllers and engines using other
   methods (like Sun's Grid Engine for example).

.. note::

   Currently :command:`ipcluster` requires that the
   :file:`~/.ipython/security` directory live on a shared filesystem that is
   seen by both the controller and engines. If you don't have a shared file
   system you will need to use :command:`ipcontroller` and
   :command:`ipengine` directly. This constraint can be relaxed if you are
   using the :command:`ssh` method to start the cluster.

Under the hood, :command:`ipcluster` simply uses :command:`ipcontroller`
and :command:`ipengine` to perform the steps described above.

Using :command:`ipcluster` in local mode
----------------------------------------

To start one controller and 4 engines on localhost, just do::

    $ ipcluster local -n 4

To see other command line options for the local mode, do::

    $ ipcluster local -h

Using :command:`ipcluster` in mpiexec/mpirun mode
-------------------------------------------------

The mpiexec/mpirun mode is useful if you:

1. Have MPI installed.
2. Have systems configured to use the :command:`mpiexec` or
   :command:`mpirun` commands to start MPI processes.

.. note::

   The preferred command to use is :command:`mpiexec`. However, we also
   support :command:`mpirun` for backwards compatibility. The underlying
   logic used is exactly the same, the only difference being the name of the
   command line program that is called.

If these are satisfied, you can start an IPython cluster using::

    $ ipcluster mpiexec -n 4

This does the following:

1. Starts the IPython controller on the current host.
2. Uses :command:`mpiexec` to start 4 engines.

On newer MPI implementations (such as OpenMPI), this will work even if you
don't make any calls to MPI or call :func:`MPI_Init`. However, older MPI
implementations actually require each process to call :func:`MPI_Init` upon
starting. The easiest way of having this done is to install the mpi4py
[mpi4py]_ package and then call :command:`ipcluster` with the ``--mpi``
option::

    $ ipcluster mpiexec -n 4 --mpi=mpi4py

Unfortunately, even this won't work for some MPI implementations. If you are
having problems with this, you will likely have to use a custom Python
executable that itself calls :func:`MPI_Init` at the appropriate time.
Fortunately, mpi4py comes with such a custom Python executable that is easy to
install and use. However, this custom Python executable approach will not work
with :command:`ipcluster` currently.

Additional command line options for this mode can be found by doing::

    $ ipcluster mpiexec -h

More details on using MPI with IPython can be found :ref:`here <parallelmpi>`.

Using :command:`ipcluster` in PBS mode
--------------------------------------

The PBS mode uses the Portable Batch System [PBS]_ to start the engines. To
use this mode, you first need to create a PBS script template that will be
used to start the engines. Here is a sample PBS script template:

.. sourcecode:: bash

    #PBS -N ipython
    #PBS -j oe
    #PBS -l walltime=00:10:00
    #PBS -l nodes=${n/4}:ppn=4
    #PBS -q parallel

    cd $$PBS_O_WORKDIR
    export PATH=$$HOME/usr/local/bin
    export PYTHONPATH=$$HOME/usr/local/lib/python2.4/site-packages
    /usr/local/bin/mpiexec -n ${n} ipengine --logfile=$$PBS_O_WORKDIR/ipengine

There are a few important points about this template:

1. This template will be rendered at runtime using IPython's :mod:`Itpl`
   template engine.
2. Instead of putting in the actual number of engines, use the notation
   ``${n}`` to indicate the number of engines to be started. You can also use
   expressions like ``${n/4}`` in the template to indicate the number of
   nodes.
3. Because ``$`` is a special character used by the template engine, you must
   escape any ``$`` by using ``$$``. This is important when referring to
   environment variables in the template.
4. Any options to :command:`ipengine` should be given in the batch script
   template.
5. Depending on the configuration of your system, you may have to set
   environment variables in the script template.
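To make the rendering rules concrete, here is a toy stand-in for the template engine. The ``render`` function below is written for illustration only; it is not IPython's actual :mod:`Itpl` code, and under modern Python you would write ``${n//4}`` for integer division where the Python 2 era template used ``${n/4}``:

```python
import re

def render(template, **ns):
    # Toy Itpl-like rendering: evaluate each ${expr} as a Python
    # expression against ns; '$$' escapes a literal '$'.
    protected = template.replace("$$", "\x00")   # hide escaped dollars
    expanded = re.sub(r"\$\{([^}]+)\}",
                      lambda m: str(eval(m.group(1), {}, ns)),
                      protected)
    return expanded.replace("\x00", "$")         # restore literal dollars

print(render("#PBS -l nodes=${n//4}:ppn=4", n=16))  # #PBS -l nodes=4:ppn=4
print(render("cd $$PBS_O_WORKDIR"))                 # cd $PBS_O_WORKDIR
```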

Once you have created such a script, save it with a name like
:file:`pbs.template`. Now you are ready to start your job::

    $ ipcluster pbs -n 128 --pbs-script=pbs.template

Additional command line options for this mode can be found by doing::

    $ ipcluster pbs -h

Using :command:`ipcluster` in SSH mode
--------------------------------------

The SSH mode uses :command:`ssh` to execute :command:`ipengine` on remote
nodes and :command:`ipcontroller` on localhost.

When using this mode it is highly recommended that you have set up SSH keys
and are using ssh-agent [SSH]_ for password-less logins.

To use this mode you need a Python file describing the cluster. Here is an
example of such a "clusterfile":
.. sourcecode:: python

    send_furl = True
    engines = { 'host1.example.com' : 2,
                'host2.example.com' : 5,
                'host3.example.com' : 1,
                'host4.example.com' : 8 }

Since this is a regular Python file, the usual Python syntax applies. Things
to note:

* The ``engines`` dict, where the keys are the hosts we want to run engines
  on and the values are the number of engines to run on each host.
* ``send_furl`` can be either ``True`` or ``False``; if ``True`` it will copy
  over the FURL needed for :command:`ipengine` to each host.

The ``--clusterfile`` command line option lets you specify the file to use
for the cluster definition. Once you have your cluster file and you can
:command:`ssh` into the remote hosts without a password, you are ready to
start your cluster like so:

.. sourcecode:: bash

    $ ipcluster ssh --clusterfile /path/to/my/clusterfile.py
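Because the clusterfile is ordinary Python, you can also sanity-check it before launching anything. The checks below are purely illustrative; :command:`ipcluster` does not perform them:

```python
# The example clusterfile from above:
send_furl = True
engines = { 'host1.example.com' : 2,
            'host2.example.com' : 5,
            'host3.example.com' : 1,
            'host4.example.com' : 8 }

# Illustrative checks, not part of ipcluster itself:
assert isinstance(send_furl, bool)
assert all(isinstance(n, int) and n > 0 for n in engines.values())

total = sum(engines.values())
print("hosts:", len(engines), "total engines:", total)  # hosts: 4 total engines: 16
```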

Two helper shell scripts are used to start and stop :command:`ipengine` on
remote hosts:

* :file:`sshx.sh`
* :file:`engine_killer.sh`

Defaults for both of these are contained in the source code for
:command:`ipcluster`. The default scripts are written to a local file in a
temporary directory and then copied to a temporary directory on the remote
host and executed from there. On most Unix, Linux and OS X systems this is
:file:`/tmp`.

The default :file:`sshx.sh` is the following:

.. sourcecode:: bash

    #!/bin/sh
    "$@" &> /dev/null &
    echo $!

If you want to use a custom :file:`sshx.sh` script you need to use the
``--sshx`` option and specify the file to use. Using a custom
:file:`sshx.sh` file could be helpful when you need to set up the
environment on the remote host before executing :command:`ipengine`.
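As a sketch, a custom :file:`sshx.sh` might source a login profile before launching the engine. The profile path here is an assumption; adapt it to whatever your hosts actually need:

```shell
#!/bin/sh
# Hypothetical custom sshx.sh: set up the remote environment first.
# Sourcing ~/.profile is only an example of environment setup.
if [ -f "$HOME/.profile" ]; then
    . "$HOME/.profile"
fi
# Same pattern as the default script: run the command in the
# background, silence its output, and print its PID.
"$@" > /dev/null 2>&1 &
echo $!
```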
For a detailed options list:

.. sourcecode:: bash

    $ ipcluster ssh -h

Current limitations of the SSH mode of :command:`ipcluster` are:

* Untested on Windows. Would require a working :command:`ssh` on Windows.
  Also, we are using shell scripts to set up and execute commands on remote
  hosts.
* :command:`ipcontroller` is started on localhost, with no option to start it
  on a remote node.

Using the :command:`ipcontroller` and :command:`ipengine` commands
==================================================================

It is also possible to use the :command:`ipcontroller` and :command:`ipengine`
commands to start your controller and engines. This approach gives you full
control over all aspects of the startup process.

Starting the controller and engine on your local machine
--------------------------------------------------------

To use :command:`ipcontroller` and :command:`ipengine` to start things on
your local machine, do the following.

First start the controller::

    $ ipcontroller

Next, start however many instances of the engine you want using (repeatedly)
the command::

    $ ipengine

The engines should start and automatically connect to the controller using
the FURL files in :file:`~/.ipython/security`. You are now ready to use the
controller and engines from IPython.

.. warning::

   The order of the above operations is very important. You *must*
   start the controller before the engines, since the engines connect
   to the controller as they get started.

.. note::

   On some platforms (OS X), to put the controller and engine into the
   background you may need to give these commands in the form ``(ipcontroller
   &)`` and ``(ipengine &)`` (with the parentheses) for them to work
   properly.

Starting the controller and engines on different hosts
------------------------------------------------------

When the controller and engines are running on different hosts, things are
slightly more complicated, but the underlying ideas are the same:

1. Start the controller on a host using :command:`ipcontroller`.
2. Copy :file:`ipcontroller-engine.furl` from :file:`~/.ipython/security` on
   the controller's host to the host where the engines will run.
3. Use :command:`ipengine` on the engines' hosts to start the engines.

The only thing you have to be careful of is to tell :command:`ipengine` where
the :file:`ipcontroller-engine.furl` file is located. There are two ways you
can do this:

* Put :file:`ipcontroller-engine.furl` in the :file:`~/.ipython/security`
  directory on the engine's host, where it will be found automatically.
* Call :command:`ipengine` with the ``--furl-file=full_path_to_the_file``
  flag.

The ``--furl-file`` flag works like this::

    $ ipengine --furl-file=/path/to/my/ipcontroller-engine.furl

.. note::

   If the controller's and engines' hosts all have a shared file system
   (:file:`~/.ipython/security` is the same on all of them), then things
   will just work!

Make FURL files persistent
--------------------------

At first glance it may seem that managing the FURL files is a bit annoying.
Going back to the house and key analogy, copying the FURL around each time
you start the controller is like having to make a new key every time you
want to unlock the door and enter your house. As with your house, you want
to be able to create the key (or FURL file) once, and then simply use it at
any point in the future.

This is possible, but before you do this, you **must** remove any old FURL
files in the :file:`~/.ipython/security` directory.

.. warning::

   You **must** remove old FURL files before using persistent FURL files.

Then, the only thing you have to do is decide what ports the controller will
listen on for the engines and clients. This is done as follows::

    $ ipcontroller -r --client-port=10101 --engine-port=10102

These options also work with all of the various modes of
:command:`ipcluster`::

    $ ipcluster local -n 2 -r --client-port=10101 --engine-port=10102

Then, just copy the FURL files over the first time and you are set. You can
start and stop the controller and engines as many times as you want in the
future; just make sure to tell the controller to use the *same* ports.

.. note::

   You may ask the question: what ports does the controller listen on if you
   don't tell it to use specific ones? The default is to use high random port
   numbers. We do this for two reasons: i) to increase security through
   obscurity and ii) to allow multiple controllers on a given host to start
   and automatically use different ports.

Log files
---------

All of the components of IPython have log files associated with them.
These log files can be extremely useful in debugging problems with
IPython and can be found in the directory :file:`~/.ipython/log`. Sending
the log files to us will often help us to debug any problems.
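When preparing a bug report, a short snippet like the following can list whatever log files are present. This is an illustrative sketch; on a fresh install the directory may not exist yet and the list will simply be empty:

```python
import glob
import os

log_dir = os.path.expanduser("~/.ipython/log")

# Collect any controller/engine log files; an empty list just means
# no logs have been written yet (or the directory does not exist).
logs = sorted(glob.glob(os.path.join(log_dir, "*")))
for path in logs:
    print(path)
```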

.. [PBS] Portable Batch System.  http://www.openpbs.org/
.. [SSH] SSH-Agent.  http://en.wikipedia.org/wiki/Ssh-agent