|
|
.. _parallel_process:
|
|
|
|
|
|
===========================================
|
|
|
Starting the IPython controller and engines
|
|
|
===========================================
|
|
|
|
|
|
To use IPython for parallel computing, you need to start one instance of
|
|
|
the controller and one or more instances of the engine. The controller
|
|
|
and each engine can run on different machines or on the same machine.
|
|
|
Because of this, there are many different possibilities.
|
|
|
|
|
|
Broadly speaking, there are two ways of going about starting a controller and engines:
|
|
|
|
|
|
* In an automated manner using the :command:`ipcluster` command.
|
|
|
* In a more manual way using the :command:`ipcontroller` and
|
|
|
:command:`ipengine` commands.
|
|
|
|
|
|
This document describes both of these methods. We recommend that new users
|
|
|
start with the :command:`ipcluster` command as it simplifies many common usage
|
|
|
cases.
|
|
|
|
|
|
General considerations
|
|
|
======================
|
|
|
|
|
|
Before delving into the details about how you can start a controller and
|
|
|
engines using the various methods, we outline some of the general issues that
|
|
|
come up when starting the controller and engines. These things come up no
|
|
|
matter which method you use to start your IPython cluster.
|
|
|
|
|
|
Let's say that you want to start the controller on ``host0`` and engines on
|
|
|
hosts ``host1``-``hostn``. The following steps are then required:
|
|
|
|
|
|
1. Start the controller on ``host0`` by running :command:`ipcontroller` on
|
|
|
``host0``.
|
|
|
2. Move the JSON file (:file:`ipcontroller-engine.json`) created by the
|
|
|
controller from ``host0`` to hosts ``host1``-``hostn``.
|
|
|
3. Start the engines on hosts ``host1``-``hostn`` by running
|
|
|
:command:`ipengine`. This command has to be told where the JSON file
|
|
|
(:file:`ipcontroller-engine.json`) is located.
|
|
|
|
|
|
At this point, the controller and engines will be connected. By default, the JSON files
|
|
|
created by the controller are put into the :file:`~/.ipython/cluster_default/security`
|
|
|
directory. If the engines share a filesystem with the controller, step 2 can be skipped as
|
|
|
the engines will automatically look at that location.
|
|
|
|
|
|
The final step required to actually use the running controller from a client is to move
|
|
|
the JSON file :file:`ipcontroller-client.json` from ``host0`` to any host where clients
|
|
|
will be run. If these file are put into the :file:`~/.ipython/cluster_default/security`
|
|
|
directory of the client's host, they will be found automatically. Otherwise, the full path
|
|
|
to them has to be passed to the client's constructor.
|
|
|
|
|
|
Using :command:`ipcluster`
|
|
|
===========================
|
|
|
|
|
|
The :command:`ipcluster` command provides a simple way of starting a
|
|
|
controller and engines in the following situations:
|
|
|
|
|
|
1. When the controller and engines are all run on localhost. This is useful
|
|
|
for testing or running on a multicore computer.
|
|
|
2. When engines are started using the :command:`mpiexec` command that comes
|
|
|
with most MPI [MPI]_ implementations
|
|
|
3. When engines are started using the PBS [PBS]_ batch system
|
|
|
(or other `qsub` systems, such as SGE).
|
|
|
4. When the controller is started on localhost and the engines are started on
|
|
|
remote nodes using :command:`ssh`.
|
|
|
5. When engines are started using the Windows HPC Server batch system.
|
|
|
|
|
|
.. note::
|
|
|
|
|
|
Currently :command:`ipcluster` requires that the
|
|
|
:file:`~/.ipython/cluster_<profile>/security` directory live on a shared filesystem that is
|
|
|
seen by both the controller and engines. If you don't have a shared file
|
|
|
system you will need to use :command:`ipcontroller` and
|
|
|
:command:`ipengine` directly.
|
|
|
|
|
|
Under the hood, :command:`ipcluster` just uses :command:`ipcontroller`
|
|
|
and :command:`ipengine` to perform the steps described above.
|
|
|
|
|
|
The simplest way to use ipcluster requires no configuration, and will
|
|
|
launch a controller and a number of engines on the local machine. For instance,
|
|
|
to start one controller and 4 engines on localhost, just do::
|
|
|
|
|
|
$ ipcluster start n=4
|
|
|
|
|
|
To see other command line options, do::
|
|
|
|
|
|
$ ipcluster -h
|
|
|
|
|
|
|
|
|
Configuring an IPython cluster
|
|
|
==============================
|
|
|
|
|
|
Cluster configurations are stored as `profiles`. You can create a new profile with::
|
|
|
|
|
|
$ ipcluster create profile=myprofile
|
|
|
|
|
|
This will create the directory :file:`IPYTHONDIR/cluster_myprofile`, and populate it
|
|
|
with the default configuration files for the three IPython cluster commands. Once
|
|
|
you edit those files, you can continue to call ipcluster/ipcontroller/ipengine
|
|
|
with no arguments beyond ``p=myprofile``, and any configuration will be maintained.
|
|
|
|
|
|
There is no limit to the number of profiles you can have, so you can maintain a profile for each
|
|
|
of your common use cases. The default profile will be used whenever the
|
|
|
profile argument is not specified, so edit :file:`IPYTHONDIR/cluster_default/*_config.py` to
|
|
|
represent your most common use case.
|
|
|
|
|
|
The configuration files are loaded with commented-out settings and explanations,
|
|
|
which should cover most of the available possibilities.
|
|
|
|
|
|
Using various batch systems with :command:`ipcluster`
|
|
|
------------------------------------------------------
|
|
|
|
|
|
:command:`ipcluster` has a notion of Launchers that can start controllers
|
|
|
and engines with various remote execution schemes. Currently supported
|
|
|
models include :command:`ssh`, :command`mpiexec`, PBS-style (Torque, SGE),
|
|
|
and Windows HPC Server.
|
|
|
|
|
|
.. note::
|
|
|
|
|
|
The Launchers and configuration are designed in such a way that advanced
|
|
|
users can subclass and configure them to fit their own system that we
|
|
|
have not yet supported (such as Condor)
|
|
|
|
|
|
Using :command:`ipcluster` in mpiexec/mpirun mode
|
|
|
--------------------------------------------------
|
|
|
|
|
|
|
|
|
The mpiexec/mpirun mode is useful if you:
|
|
|
|
|
|
1. Have MPI installed.
|
|
|
2. Your systems are configured to use the :command:`mpiexec` or
|
|
|
:command:`mpirun` commands to start MPI processes.
|
|
|
|
|
|
If these are satisfied, you can create a new profile::
|
|
|
|
|
|
$ ipcluster create profile=mpi
|
|
|
|
|
|
and edit the file :file:`IPYTHONDIR/cluster_mpi/ipcluster_config.py`.
|
|
|
|
|
|
There, instruct ipcluster to use the MPIExec launchers by adding the lines:
|
|
|
|
|
|
.. sourcecode:: python
|
|
|
|
|
|
c.IPClusterEnginesApp.engine_launcher = 'IPython.parallel.apps.launcher.MPIExecEngineSetLauncher'
|
|
|
|
|
|
If the default MPI configuration is correct, then you can now start your cluster, with::
|
|
|
|
|
|
$ ipcluster start n=4 profile=mpi
|
|
|
|
|
|
This does the following:
|
|
|
|
|
|
1. Starts the IPython controller on current host.
|
|
|
2. Uses :command:`mpiexec` to start 4 engines.
|
|
|
|
|
|
If you have a reason to also start the Controller with mpi, you can specify:
|
|
|
|
|
|
.. sourcecode:: python
|
|
|
|
|
|
c.IPClusterStartApp.controller_launcher = 'IPython.parallel.apps.launcher.MPIExecControllerLauncher'
|
|
|
|
|
|
.. note::
|
|
|
|
|
|
The Controller *will not* be in the same MPI universe as the engines, so there is not
|
|
|
much reason to do this unless sysadmins demand it.
|
|
|
|
|
|
On newer MPI implementations (such as OpenMPI), this will work even if you
|
|
|
don't make any calls to MPI or call :func:`MPI_Init`. However, older MPI
|
|
|
implementations actually require each process to call :func:`MPI_Init` upon
|
|
|
starting. The easiest way of having this done is to install the mpi4py
|
|
|
[mpi4py]_ package and then specify the ``c.MPI.use`` option in :file:`ipengine_config.py`:
|
|
|
|
|
|
.. sourcecode:: python
|
|
|
|
|
|
c.MPI.use = 'mpi4py'
|
|
|
|
|
|
Unfortunately, even this won't work for some MPI implementations. If you are
|
|
|
having problems with this, you will likely have to use a custom Python
|
|
|
executable that itself calls :func:`MPI_Init` at the appropriate time.
|
|
|
Fortunately, mpi4py comes with such a custom Python executable that is easy to
|
|
|
install and use. However, this custom Python executable approach will not work
|
|
|
with :command:`ipcluster` currently.
|
|
|
|
|
|
More details on using MPI with IPython can be found :ref:`here <parallelmpi>`.
|
|
|
|
|
|
|
|
|
Using :command:`ipcluster` in PBS mode
|
|
|
---------------------------------------
|
|
|
|
|
|
The PBS mode uses the Portable Batch System [PBS]_ to start the engines.
|
|
|
|
|
|
As usual, we will start by creating a fresh profile::
|
|
|
|
|
|
$ ipcluster create profile=pbs
|
|
|
|
|
|
And in :file:`ipcluster_config.py`, we will select the PBS launchers for the controller
|
|
|
and engines:
|
|
|
|
|
|
.. sourcecode:: python
|
|
|
|
|
|
c.Global.controller_launcher = 'IPython.parallel.apps.launcher.PBSControllerLauncher'
|
|
|
c.Global.engine_launcher = 'IPython.parallel.apps.launcher.PBSEngineSetLauncher'
|
|
|
|
|
|
IPython does provide simple default batch templates for PBS and SGE, but you may need
|
|
|
to specify your own. Here is a sample PBS script template:
|
|
|
|
|
|
.. sourcecode:: bash
|
|
|
|
|
|
#PBS -N ipython
|
|
|
#PBS -j oe
|
|
|
#PBS -l walltime=00:10:00
|
|
|
#PBS -l nodes=${n/4}:ppn=4
|
|
|
#PBS -q $queue
|
|
|
|
|
|
cd $$PBS_O_WORKDIR
|
|
|
export PATH=$$HOME/usr/local/bin
|
|
|
export PYTHONPATH=$$HOME/usr/local/lib/python2.7/site-packages
|
|
|
/usr/local/bin/mpiexec -n ${n} ipengine cluster_dir=${cluster_dir}
|
|
|
|
|
|
There are a few important points about this template:
|
|
|
|
|
|
1. This template will be rendered at runtime using IPython's :mod:`Itpl`
|
|
|
template engine.
|
|
|
|
|
|
2. Instead of putting in the actual number of engines, use the notation
|
|
|
``${n}`` to indicate the number of engines to be started. You can also uses
|
|
|
expressions like ``${n/4}`` in the template to indicate the number of
|
|
|
nodes. There will always be a ${n} and ${cluster_dir} variable passed to the template.
|
|
|
These allow the batch system to know how many engines, and where the configuration
|
|
|
files reside. The same is true for the batch queue, with the template variable ``$queue``.
|
|
|
|
|
|
3. Because ``$`` is a special character used by the template engine, you must
|
|
|
escape any ``$`` by using ``$$``. This is important when referring to
|
|
|
environment variables in the template, or in SGE, where the config lines start
|
|
|
with ``#$``, which will have to be ``#$$``.
|
|
|
|
|
|
4. Any options to :command:`ipengine` can be given in the batch script
|
|
|
template, or in :file:`ipengine_config.py`.
|
|
|
|
|
|
5. Depending on the configuration of you system, you may have to set
|
|
|
environment variables in the script template.
|
|
|
|
|
|
The controller template should be similar, but simpler:
|
|
|
|
|
|
.. sourcecode:: bash
|
|
|
|
|
|
#PBS -N ipython
|
|
|
#PBS -j oe
|
|
|
#PBS -l walltime=00:10:00
|
|
|
#PBS -l nodes=1:ppn=4
|
|
|
#PBS -q $queue
|
|
|
|
|
|
cd $$PBS_O_WORKDIR
|
|
|
export PATH=$$HOME/usr/local/bin
|
|
|
export PYTHONPATH=$$HOME/usr/local/lib/python2.7/site-packages
|
|
|
ipcontroller cluster_dir=${cluster_dir}
|
|
|
|
|
|
|
|
|
Once you have created these scripts, save them with names like
|
|
|
:file:`pbs.engine.template`. Now you can load them into the :file:`ipcluster_config` with:
|
|
|
|
|
|
.. sourcecode:: python
|
|
|
|
|
|
c.PBSEngineSetLauncher.batch_template_file = "pbs.engine.template"
|
|
|
|
|
|
c.PBSControllerLauncher.batch_template_file = "pbs.controller.template"
|
|
|
|
|
|
|
|
|
Alternately, you can just define the templates as strings inside :file:`ipcluster_config`.
|
|
|
|
|
|
Whether you are using your own templates or our defaults, the extra configurables available are
|
|
|
the number of engines to launch (``$n``, and the batch system queue to which the jobs are to be
|
|
|
submitted (``$queue``)). These are configurables, and can be specified in
|
|
|
:file:`ipcluster_config`:
|
|
|
|
|
|
.. sourcecode:: python
|
|
|
|
|
|
c.PBSLauncher.queue = 'veryshort.q'
|
|
|
c.PBSEngineSetLauncher.n = 64
|
|
|
|
|
|
Note that assuming you are running PBS on a multi-node cluster, the Controller's default behavior
|
|
|
of listening only on localhost is likely too restrictive. In this case, also assuming the
|
|
|
nodes are safely behind a firewall, you can simply instruct the Controller to listen for
|
|
|
connections on all its interfaces, by adding in :file:`ipcontroller_config`:
|
|
|
|
|
|
.. sourcecode:: python
|
|
|
|
|
|
c.RegistrationFactory.ip = '*'
|
|
|
|
|
|
You can now run the cluster with::
|
|
|
|
|
|
$ ipcluster start profile=pbs n=128
|
|
|
|
|
|
Additional configuration options can be found in the PBS section of :file:`ipcluster_config`.
|
|
|
|
|
|
.. note::
|
|
|
|
|
|
Due to the flexibility of configuration, the PBS launchers work with simple changes
|
|
|
to the template for other :command:`qsub`-using systems, such as Sun Grid Engine,
|
|
|
and with further configuration in similar batch systems like Condor.
|
|
|
|
|
|
|
|
|
Using :command:`ipcluster` in SSH mode
|
|
|
---------------------------------------
|
|
|
|
|
|
|
|
|
The SSH mode uses :command:`ssh` to execute :command:`ipengine` on remote
|
|
|
nodes and :command:`ipcontroller` can be run remotely as well, or on localhost.
|
|
|
|
|
|
.. note::
|
|
|
|
|
|
When using this mode it highly recommended that you have set up SSH keys
|
|
|
and are using ssh-agent [SSH]_ for password-less logins.
|
|
|
|
|
|
As usual, we start by creating a clean profile::
|
|
|
|
|
|
$ ipcluster create profile= ssh
|
|
|
|
|
|
To use this mode, select the SSH launchers in :file:`ipcluster_config.py`:
|
|
|
|
|
|
.. sourcecode:: python
|
|
|
|
|
|
c.Global.engine_launcher = 'IPython.parallel.apps.launcher.SSHEngineSetLauncher'
|
|
|
# and if the Controller is also to be remote:
|
|
|
c.Global.controller_launcher = 'IPython.parallel.apps.launcher.SSHControllerLauncher'
|
|
|
|
|
|
|
|
|
The controller's remote location and configuration can be specified:
|
|
|
|
|
|
.. sourcecode:: python
|
|
|
|
|
|
# Set the user and hostname for the controller
|
|
|
# c.SSHControllerLauncher.hostname = 'controller.example.com'
|
|
|
# c.SSHControllerLauncher.user = os.environ.get('USER','username')
|
|
|
|
|
|
# Set the arguments to be passed to ipcontroller
|
|
|
# note that remotely launched ipcontroller will not get the contents of
|
|
|
# the local ipcontroller_config.py unless it resides on the *remote host*
|
|
|
# in the location specified by the `cluster_dir` argument.
|
|
|
# c.SSHControllerLauncher.program_args = ['-r', '-ip', '0.0.0.0', '--cluster_dir', '/path/to/cd']
|
|
|
|
|
|
.. note::
|
|
|
|
|
|
SSH mode does not do any file movement, so you will need to distribute configuration
|
|
|
files manually. To aid in this, the `reuse_files` flag defaults to True for ssh-launched
|
|
|
Controllers, so you will only need to do this once, unless you override this flag back
|
|
|
to False.
|
|
|
|
|
|
Engines are specified in a dictionary, by hostname and the number of engines to be run
|
|
|
on that host.
|
|
|
|
|
|
.. sourcecode:: python
|
|
|
|
|
|
c.SSHEngineSetLauncher.engines = { 'host1.example.com' : 2,
|
|
|
'host2.example.com' : 5,
|
|
|
'host3.example.com' : (1, ['cluster_dir=/home/different/location']),
|
|
|
'host4.example.com' : 8 }
|
|
|
|
|
|
* The `engines` dict, where the keys are the host we want to run engines on and
|
|
|
the value is the number of engines to run on that host.
|
|
|
* on host3, the value is a tuple, where the number of engines is first, and the arguments
|
|
|
to be passed to :command:`ipengine` are the second element.
|
|
|
|
|
|
For engines without explicitly specified arguments, the default arguments are set in
|
|
|
a single location:
|
|
|
|
|
|
.. sourcecode:: python
|
|
|
|
|
|
c.SSHEngineSetLauncher.engine_args = ['--cluster_dir', '/path/to/cluster_ssh']
|
|
|
|
|
|
Current limitations of the SSH mode of :command:`ipcluster` are:
|
|
|
|
|
|
* Untested on Windows. Would require a working :command:`ssh` on Windows.
|
|
|
Also, we are using shell scripts to setup and execute commands on remote
|
|
|
hosts.
|
|
|
* No file movement -
|
|
|
|
|
|
Using the :command:`ipcontroller` and :command:`ipengine` commands
|
|
|
====================================================================
|
|
|
|
|
|
It is also possible to use the :command:`ipcontroller` and :command:`ipengine`
|
|
|
commands to start your controller and engines. This approach gives you full
|
|
|
control over all aspects of the startup process.
|
|
|
|
|
|
Starting the controller and engine on your local machine
|
|
|
--------------------------------------------------------
|
|
|
|
|
|
To use :command:`ipcontroller` and :command:`ipengine` to start things on your
|
|
|
local machine, do the following.
|
|
|
|
|
|
First start the controller::
|
|
|
|
|
|
$ ipcontroller
|
|
|
|
|
|
Next, start however many instances of the engine you want using (repeatedly)
|
|
|
the command::
|
|
|
|
|
|
$ ipengine
|
|
|
|
|
|
The engines should start and automatically connect to the controller using the
|
|
|
JSON files in :file:`~/.ipython/cluster_default/security`. You are now ready to use the
|
|
|
controller and engines from IPython.
|
|
|
|
|
|
.. warning::
|
|
|
|
|
|
The order of the above operations may be important. You *must*
|
|
|
start the controller before the engines, unless you are reusing connection
|
|
|
information (via `-r`), in which case ordering is not important.
|
|
|
|
|
|
.. note::
|
|
|
|
|
|
On some platforms (OS X), to put the controller and engine into the
|
|
|
background you may need to give these commands in the form ``(ipcontroller
|
|
|
&)`` and ``(ipengine &)`` (with the parentheses) for them to work
|
|
|
properly.
|
|
|
|
|
|
Starting the controller and engines on different hosts
|
|
|
------------------------------------------------------
|
|
|
|
|
|
When the controller and engines are running on different hosts, things are
|
|
|
slightly more complicated, but the underlying ideas are the same:
|
|
|
|
|
|
1. Start the controller on a host using :command:`ipcontroller`.
|
|
|
2. Copy :file:`ipcontroller-engine.json` from :file:`~/.ipython/cluster_<profile>/security` on
|
|
|
the controller's host to the host where the engines will run.
|
|
|
3. Use :command:`ipengine` on the engine's hosts to start the engines.
|
|
|
|
|
|
The only thing you have to be careful of is to tell :command:`ipengine` where
|
|
|
the :file:`ipcontroller-engine.json` file is located. There are two ways you
|
|
|
can do this:
|
|
|
|
|
|
* Put :file:`ipcontroller-engine.json` in the :file:`~/.ipython/cluster_<profile>/security`
|
|
|
directory on the engine's host, where it will be found automatically.
|
|
|
* Call :command:`ipengine` with the ``--file=full_path_to_the_file``
|
|
|
flag.
|
|
|
|
|
|
The ``--file`` flag works like this::
|
|
|
|
|
|
$ ipengine --file=/path/to/my/ipcontroller-engine.json
|
|
|
|
|
|
.. note::
|
|
|
|
|
|
If the controller's and engine's hosts all have a shared file system
|
|
|
(:file:`~/.ipython/cluster_<profile>/security` is the same on all of them), then things
|
|
|
will just work!
|
|
|
|
|
|
Make JSON files persistent
|
|
|
--------------------------
|
|
|
|
|
|
At fist glance it may seem that that managing the JSON files is a bit
|
|
|
annoying. Going back to the house and key analogy, copying the JSON around
|
|
|
each time you start the controller is like having to make a new key every time
|
|
|
you want to unlock the door and enter your house. As with your house, you want
|
|
|
to be able to create the key (or JSON file) once, and then simply use it at
|
|
|
any point in the future.
|
|
|
|
|
|
To do this, the only thing you have to do is specify the `--reuse` flag, so that
|
|
|
the connection information in the JSON files remains accurate::
|
|
|
|
|
|
$ ipcontroller --reuse
|
|
|
|
|
|
Then, just copy the JSON files over the first time and you are set. You can
|
|
|
start and stop the controller and engines any many times as you want in the
|
|
|
future, just make sure to tell the controller to reuse the file.
|
|
|
|
|
|
.. note::
|
|
|
|
|
|
You may ask the question: what ports does the controller listen on if you
|
|
|
don't tell is to use specific ones? The default is to use high random port
|
|
|
numbers. We do this for two reasons: i) to increase security through
|
|
|
obscurity and ii) to multiple controllers on a given host to start and
|
|
|
automatically use different ports.
|
|
|
|
|
|
Log files
|
|
|
---------
|
|
|
|
|
|
All of the components of IPython have log files associated with them.
|
|
|
These log files can be extremely useful in debugging problems with
|
|
|
IPython and can be found in the directory :file:`~/.ipython/cluster_<profile>/log`.
|
|
|
Sending the log files to us will often help us to debug any problems.
|
|
|
|
|
|
|
|
|
Configuring `ipcontroller`
|
|
|
---------------------------
|
|
|
|
|
|
Ports and addresses
|
|
|
*******************
|
|
|
|
|
|
|
|
|
Database Backend
|
|
|
****************
|
|
|
|
|
|
|
|
|
.. seealso::
|
|
|
|
|
|
|
|
|
|
|
|
Configuring `ipengine`
|
|
|
-----------------------
|
|
|
|
|
|
.. note::
|
|
|
|
|
|
TODO
|
|
|
|
|
|
|
|
|
|
|
|
.. [PBS] Portable Batch System. http://www.openpbs.org/
|
|
|
.. [SSH] SSH-Agent http://en.wikipedia.org/wiki/ssh-agent
|
|
|
|