parallel_process.txt
867 lines
| 33.7 KiB
| text/plain
|
TextLexer
MinRK
|
r3586 | .. _parallel_process: | ||
=========================================== | ||||
Starting the IPython controller and engines | ||||
=========================================== | ||||
To use IPython for parallel computing, you need to start one instance of | ||||
the controller and one or more instances of the engine. The controller | ||||
and each engine can run on different machines or on the same machine. | ||||
Because of this, there are many different possibilities. | ||||
Broadly speaking, there are two ways of going about starting a controller and engines: | ||||
MinRK
|
r3672 | * In an automated manner using the :command:`ipcluster` command. | ||
* In a more manual way using the :command:`ipcontroller` and | ||||
:command:`ipengine` commands. | ||||
MinRK
|
r3586 | |||
This document describes both of these methods. We recommend that new users | ||||
MinRK
|
r3672 | start with the :command:`ipcluster` command as it simplifies many common usage | ||
MinRK
|
r3586 | cases. | ||
General considerations | ||||
====================== | ||||
Before delving into the details about how you can start a controller and | ||||
engines using the various methods, we outline some of the general issues that | ||||
come up when starting the controller and engines. These things come up no | ||||
matter which method you use to start your IPython cluster. | ||||
MinRK
|
r4109 | If you are running engines on multiple machines, you will likely need to instruct the | ||
controller to listen for connections on an external interface. This can be done by specifying | ||||
the ``ip`` argument on the command-line, or the ``HubFactory.ip`` configurable in | ||||
:file:`ipcontroller_config.py`. | ||||
If your machines are on a trusted network, you can safely instruct the controller to listen | ||||
on all public interfaces with:: | ||||
Thomas Kluyver
|
r4196 | $> ipcontroller --ip=* | ||
MinRK
|
r4109 | |||
Or you can set the same behavior as the default by adding the following line to your :file:`ipcontroller_config.py`: | ||||
.. sourcecode:: python | ||||
c.HubFactory.ip = '*' | ||||
.. note:: | ||||
Due to the lack of security in ZeroMQ, the controller will only listen for connections on | ||||
localhost by default. If you see Timeout errors on engines or clients, then the first | ||||
thing you should check is the ip address the controller is listening on, and make sure | ||||
that it is visible from the timing out machine. | ||||
.. seealso:: | ||||
Our `notes <parallel_security>`_ on security in the new parallel computing code. | ||||
MinRK
|
r3586 | Let's say that you want to start the controller on ``host0`` and engines on | ||
hosts ``host1``-``hostn``. The following steps are then required: | ||||
MinRK
|
r3672 | 1. Start the controller on ``host0`` by running :command:`ipcontroller` on | ||
MinRK
|
r4109 | ``host0``. The controller must be instructed to listen on an interface visible | ||
to the engine machines, via the ``ip`` command-line argument or ``HubFactory.ip`` | ||||
in :file:`ipcontroller_config.py`. | ||||
MinRK
|
r3617 | 2. Move the JSON file (:file:`ipcontroller-engine.json`) created by the | ||
MinRK
|
r3586 | controller from ``host0`` to hosts ``host1``-``hostn``. | ||
3. Start the engines on hosts ``host1``-``hostn`` by running | ||||
MinRK
|
r3672 | :command:`ipengine`. This command has to be told where the JSON file | ||
MinRK
|
r3617 | (:file:`ipcontroller-engine.json`) is located. | ||
At this point, the controller and engines will be connected. By default, the JSON files | ||||
MinRK
|
r4060 | created by the controller are put into the :file:`~/.ipython/profile_default/security` | ||
MinRK
|
r3617 | directory. If the engines share a filesystem with the controller, step 2 can be skipped as | ||
the engines will automatically look at that location. | ||||
The final step required to actually use the running controller from a client is to move | ||||
the JSON file :file:`ipcontroller-client.json` from ``host0`` to any host where clients | ||||
MinRK
|
r4060 | will be run. If these file are put into the :file:`~/.ipython/profile_default/security` | ||
MinRK
|
r3617 | directory of the client's host, they will be found automatically. Otherwise, the full path | ||
to them has to be passed to the client's constructor. | ||||
MinRK
|
r3586 | |||
MinRK
|
r3672 | Using :command:`ipcluster` | ||
MinRK
|
r3663 | =========================== | ||
MinRK
|
r3586 | |||
MinRK
|
r3672 | The :command:`ipcluster` command provides a simple way of starting a | ||
MinRK
|
r3586 | controller and engines in the following situations: | ||
1. When the controller and engines are all run on localhost. This is useful | ||||
for testing or running on a multicore computer. | ||||
MinRK
|
r3990 | 2. When engines are started using the :command:`mpiexec` command that comes | ||
MinRK
|
r3586 | with most MPI [MPI]_ implementations | ||
MinRK
|
r3647 | 3. When engines are started using the PBS [PBS]_ batch system | ||
(or other `qsub` systems, such as SGE). | ||||
MinRK
|
r3586 | 4. When the controller is started on localhost and the engines are started on | ||
remote nodes using :command:`ssh`. | ||||
MinRK
|
r3647 | 5. When engines are started using the Windows HPC Server batch system. | ||
MinRK
|
r3586 | |||
.. note:: | ||||
MinRK
|
r3672 | Currently :command:`ipcluster` requires that the | ||
MinRK
|
r4024 | :file:`~/.ipython/profile_<name>/security` directory live on a shared filesystem that is | ||
MinRK
|
r3586 | seen by both the controller and engines. If you don't have a shared file | ||
MinRK
|
r3672 | system you will need to use :command:`ipcontroller` and | ||
:command:`ipengine` directly. | ||||
MinRK
|
r3586 | |||
MinRK
|
r3672 | Under the hood, :command:`ipcluster` just uses :command:`ipcontroller` | ||
and :command:`ipengine` to perform the steps described above. | ||||
MinRK
|
r3586 | |||
MinRK
|
r3672 | The simplest way to use ipcluster requires no configuration, and will | ||
MinRK
|
r3647 | launch a controller and a number of engines on the local machine. For instance, | ||
to start one controller and 4 engines on localhost, just do:: | ||||
MinRK
|
r3586 | |||
MinRK
|
r4608 | $ ipcluster start -n 4 | ||
MinRK
|
r3586 | |||
MinRK
|
r3990 | To see other command line options, do:: | ||
MinRK
|
r3586 | |||
MinRK
|
r3672 | $ ipcluster -h | ||
MinRK
|
r3586 | |||
MinRK
|
r3617 | |||
MinRK
|
r3647 | Configuring an IPython cluster | ||
============================== | ||||
MinRK
|
r3617 | |||
MinRK
|
r3647 | Cluster configurations are stored as `profiles`. You can create a new profile with:: | ||
Thomas Kluyver
|
r4196 | $ ipython profile create --parallel --profile=myprofile | ||
MinRK
|
r3647 | |||
MinRK
|
r4060 | This will create the directory :file:`IPYTHONDIR/profile_myprofile`, and populate it | ||
MinRK
|
r3647 | with the default configuration files for the three IPython cluster commands. Once | ||
MinRK
|
r3672 | you edit those files, you can continue to call ipcluster/ipcontroller/ipengine | ||
MinRK
|
r4060 | with no arguments beyond ``profile=myprofile``, and any configuration will be maintained. | ||
MinRK
|
r3647 | |||
There is no limit to the number of profiles you can have, so you can maintain a profile for each | ||||
of your common use cases. The default profile will be used whenever the | ||||
MinRK
|
r4060 | profile argument is not specified, so edit :file:`IPYTHONDIR/profile_default/*_config.py` to | ||
MinRK
|
r3647 | represent your most common use case. | ||
The configuration files are loaded with commented-out settings and explanations, | ||||
which should cover most of the available possibilities. | ||||
MinRK
|
r3672 | Using various batch systems with :command:`ipcluster` | ||
MinRK
|
r4109 | ----------------------------------------------------- | ||
MinRK
|
r3647 | |||
MinRK
|
r3672 | :command:`ipcluster` has a notion of Launchers that can start controllers | ||
MinRK
|
r3647 | and engines with various remote execution schemes. Currently supported | ||
MinRK
|
r5183 | models include :command:`ssh`, :command:`mpiexec`, PBS-style (Torque, SGE, LSF), | ||
MinRK
|
r3990 | and Windows HPC Server. | ||
MinRK
|
r3586 | |||
MinRK
|
r5183 | In general, these are configured by the :attr:`IPClusterEngines.engine_set_launcher_class`, | ||
and :attr:`IPClusterStart.controller_launcher_class` configurables, which can be the | ||||
fully specified object name (e.g. ``'IPython.parallel.apps.launcher.LocalControllerLauncher'``), | ||||
but if you are using IPython's builtin launchers, you can specify just the class name, | ||||
or even just the prefix e.g: | ||||
.. sourcecode:: python | ||||
c.IPClusterEngines.engine_launcher_class = 'SSH' | ||||
# equivalent to | ||||
c.IPClusterEngines.engine_launcher_class = 'SSHEngineSetLauncher' | ||||
# both of which expand to | ||||
c.IPClusterEngines.engine_launcher_class = 'IPython.parallel.apps.launcher.SSHEngineSetLauncher' | ||||
The shortest form being of particular use on the command line, where all you need to do to | ||||
get an IPython cluster running with engines started with MPI is: | ||||
.. sourcecode:: bash | ||||
MinRK
|
r5696 | $> ipcluster start --engines=MPI | ||
MinRK
|
r5183 | |||
Assuming that the default MPI config is sufficient. | ||||
.. note:: | ||||
shortcuts for builtin launcher names were added in 0.12, as was the ``_class`` suffix | ||||
on the configurable names. If you use the old 0.11 names (e.g. ``engine_set_launcher``), | ||||
they will still work, but you will get a deprecation warning that the name has changed. | ||||
MinRK
|
r3617 | .. note:: | ||
MinRK
|
r3647 | The Launchers and configuration are designed in such a way that advanced | ||
users can subclass and configure them to fit their own system that we | ||||
have not yet supported (such as Condor) | ||||
MinRK
|
r3672 | Using :command:`ipcluster` in mpiexec/mpirun mode | ||
MinRK
|
r5487 | ------------------------------------------------- | ||
MinRK
|
r3617 | |||
MinRK
|
r3586 | The mpiexec/mpirun mode is useful if you: | ||
1. Have MPI installed. | ||||
2. Your systems are configured to use the :command:`mpiexec` or | ||||
:command:`mpirun` commands to start MPI processes. | ||||
MinRK
|
r3647 | If these are satisfied, you can create a new profile:: | ||
Thomas Kluyver
|
r4196 | $ ipython profile create --parallel --profile=mpi | ||
MinRK
|
r3647 | |||
MinRK
|
r4060 | and edit the file :file:`IPYTHONDIR/profile_mpi/ipcluster_config.py`. | ||
MinRK
|
r3586 | |||
MinRK
|
r5696 | There, instruct ipcluster to use the MPI launchers by adding the lines: | ||
MinRK
|
r3586 | |||
MinRK
|
r3647 | .. sourcecode:: python | ||
MinRK
|
r5696 | c.IPClusterEngines.engine_launcher_class = 'MPIEngineSetLauncher' | ||
MinRK
|
r3647 | |||
If the default MPI configuration is correct, then you can now start your cluster, with:: | ||||
MinRK
|
r3586 | |||
MinRK
|
r4608 | $ ipcluster start -n 4 --profile=mpi | ||
MinRK
|
r3586 | |||
This does the following: | ||||
1. Starts the IPython controller on current host. | ||||
2. Uses :command:`mpiexec` to start 4 engines. | ||||
MinRK
|
r3647 | If you have a reason to also start the Controller with mpi, you can specify: | ||
.. sourcecode:: python | ||||
MinRK
|
r5696 | c.IPClusterStart.controller_launcher_class = 'MPIControllerLauncher' | ||
MinRK
|
r3647 | |||
.. note:: | ||||
The Controller *will not* be in the same MPI universe as the engines, so there is not | ||||
much reason to do this unless sysadmins demand it. | ||||
MinRK
|
r3586 | On newer MPI implementations (such as OpenMPI), this will work even if you | ||
don't make any calls to MPI or call :func:`MPI_Init`. However, older MPI | ||||
implementations actually require each process to call :func:`MPI_Init` upon | ||||
starting. The easiest way of having this done is to install the mpi4py | ||||
MinRK
|
r3672 | [mpi4py]_ package and then specify the ``c.MPI.use`` option in :file:`ipengine_config.py`: | ||
MinRK
|
r3647 | |||
.. sourcecode:: python | ||||
MinRK
|
r3586 | |||
MinRK
|
r3647 | c.MPI.use = 'mpi4py' | ||
MinRK
|
r3586 | |||
Unfortunately, even this won't work for some MPI implementations. If you are | ||||
having problems with this, you will likely have to use a custom Python | ||||
executable that itself calls :func:`MPI_Init` at the appropriate time. | ||||
Fortunately, mpi4py comes with such a custom Python executable that is easy to | ||||
install and use. However, this custom Python executable approach will not work | ||||
MinRK
|
r3672 | with :command:`ipcluster` currently. | ||
MinRK
|
r3586 | |||
More details on using MPI with IPython can be found :ref:`here <parallelmpi>`. | ||||
MinRK
|
r3672 | Using :command:`ipcluster` in PBS mode | ||
MinRK
|
r5487 | -------------------------------------- | ||
MinRK
|
r3586 | |||
MinRK
|
r4090 | The PBS mode uses the Portable Batch System (PBS) to start the engines. | ||
MinRK
|
r3647 | |||
As usual, we will start by creating a fresh profile:: | ||||
Thomas Kluyver
|
r4196 | $ ipython profile create --parallel --profile=pbs | ||
MinRK
|
r3647 | |||
MinRK
|
r3672 | And in :file:`ipcluster_config.py`, we will select the PBS launchers for the controller | ||
MinRK
|
r3647 | and engines: | ||
MinRK
|
r3617 | |||
MinRK
|
r3647 | .. sourcecode:: python | ||
MinRK
|
r3617 | |||
MinRK
|
r5183 | c.IPClusterStart.controller_launcher_class = 'PBSControllerLauncher' | ||
c.IPClusterEngines.engine_launcher_class = 'PBSEngineSetLauncher' | ||||
MinRK
|
r3617 | |||
MinRK
|
r4090 | .. note:: | ||
Note that the configurable is IPClusterEngines for the engine launcher, and | ||||
IPClusterStart for the controller launcher. This is because the start command is a | ||||
subclass of the engine command, adding a controller launcher. Since it is a subclass, | ||||
any configuration made in IPClusterEngines is inherited by IPClusterStart unless it is | ||||
overridden. | ||||
MinRK
|
r3670 | IPython does provide simple default batch templates for PBS and SGE, but you may need | ||
to specify your own. Here is a sample PBS script template: | ||||
MinRK
|
r3586 | |||
.. sourcecode:: bash | ||||
#PBS -N ipython | ||||
#PBS -j oe | ||||
#PBS -l walltime=00:10:00 | ||||
MinRK
|
r4004 | #PBS -l nodes={n/4}:ppn=4 | ||
#PBS -q {queue} | ||||
MinRK
|
r3586 | |||
MinRK
|
r4004 | cd $PBS_O_WORKDIR | ||
export PATH=$HOME/usr/local/bin | ||||
export PYTHONPATH=$HOME/usr/local/lib/python2.7/site-packages | ||||
Brian E. Granger
|
r4219 | /usr/local/bin/mpiexec -n {n} ipengine --profile-dir={profile_dir} | ||
MinRK
|
r3586 | |||
There are a few important points about this template: | ||||
MinRK
|
r4004 | 1. This template will be rendered at runtime using IPython's :class:`EvalFormatter`. | ||
This is simply a subclass of :class:`string.Formatter` that allows simple expressions | ||||
on keys. | ||||
MinRK
|
r3586 | |||
2. Instead of putting in the actual number of engines, use the notation | ||||
MinRK
|
r4004 | ``{n}`` to indicate the number of engines to be started. You can also use | ||
expressions like ``{n/4}`` in the template to indicate the number of nodes. | ||||
There will always be ``{n}`` and ``{profile_dir}`` variables passed to the formatter. | ||||
MinRK
|
r3647 | These allow the batch system to know how many engines, and where the configuration | ||
MinRK
|
r4004 | files reside. The same is true for the batch queue, with the template variable | ||
``{queue}``. | ||||
MinRK
|
r3586 | |||
MinRK
|
r4004 | 3. Any options to :command:`ipengine` can be given in the batch script | ||
MinRK
|
r3672 | template, or in :file:`ipengine_config.py`. | ||
MinRK
|
r3586 | |||
MinRK
|
r4004 | 4. Depending on the configuration of you system, you may have to set | ||
MinRK
|
r3586 | environment variables in the script template. | ||
MinRK
|
r3647 | The controller template should be similar, but simpler: | ||
.. sourcecode:: bash | ||||
#PBS -N ipython | ||||
#PBS -j oe | ||||
#PBS -l walltime=00:10:00 | ||||
#PBS -l nodes=1:ppn=4 | ||||
MinRK
|
r4004 | #PBS -q {queue} | ||
MinRK
|
r3586 | |||
MinRK
|
r4004 | cd $PBS_O_WORKDIR | ||
export PATH=$HOME/usr/local/bin | ||||
export PYTHONPATH=$HOME/usr/local/lib/python2.7/site-packages | ||||
Brian E. Granger
|
r4219 | ipcontroller --profile-dir={profile_dir} | ||
MinRK
|
r3586 | |||
MinRK
|
r3647 | Once you have created these scripts, save them with names like | ||
MinRK
|
r3672 | :file:`pbs.engine.template`. Now you can load them into the :file:`ipcluster_config` with: | ||
MinRK
|
r3586 | |||
MinRK
|
r3647 | .. sourcecode:: python | ||
MinRK
|
r3670 | c.PBSEngineSetLauncher.batch_template_file = "pbs.engine.template" | ||
MinRK
|
r3647 | |||
MinRK
|
r3670 | c.PBSControllerLauncher.batch_template_file = "pbs.controller.template" | ||
MinRK
|
r3647 | |||
MinRK
|
r3672 | Alternately, you can just define the templates as strings inside :file:`ipcluster_config`. | ||
MinRK
|
r3647 | |||
MinRK
|
r3670 | Whether you are using your own templates or our defaults, the extra configurables available are | ||
MinRK
|
r4004 | the number of engines to launch (``{n}``, and the batch system queue to which the jobs are to be | ||
submitted (``{queue}``)). These are configurables, and can be specified in | ||||
MinRK
|
r3672 | :file:`ipcluster_config`: | ||
MinRK
|
r3670 | |||
.. sourcecode:: python | ||||
c.PBSLauncher.queue = 'veryshort.q' | ||||
MinRK
|
r4064 | c.IPClusterEngines.n = 64 | ||
MinRK
|
r3670 | |||
MinRK
|
r3647 | Note that assuming you are running PBS on a multi-node cluster, the Controller's default behavior | ||
of listening only on localhost is likely too restrictive. In this case, also assuming the | ||||
nodes are safely behind a firewall, you can simply instruct the Controller to listen for | ||||
MinRK
|
r3672 | connections on all its interfaces, by adding in :file:`ipcontroller_config`: | ||
MinRK
|
r3647 | |||
.. sourcecode:: python | ||||
MinRK
|
r4090 | c.HubFactory.ip = '*' | ||
MinRK
|
r3647 | |||
You can now run the cluster with:: | ||||
MinRK
|
r4608 | $ ipcluster start --profile=pbs -n 128 | ||
MinRK
|
r3647 | |||
MinRK
|
r3672 | Additional configuration options can be found in the PBS section of :file:`ipcluster_config`. | ||
MinRK
|
r3586 | |||
MinRK
|
r3617 | .. note:: | ||
MinRK
|
r3647 | Due to the flexibility of configuration, the PBS launchers work with simple changes | ||
to the template for other :command:`qsub`-using systems, such as Sun Grid Engine, | ||||
and with further configuration in similar batch systems like Condor. | ||||
MinRK
|
r3672 | Using :command:`ipcluster` in SSH mode | ||
MinRK
|
r5487 | -------------------------------------- | ||
MinRK
|
r3617 | |||
MinRK
|
r3672 | The SSH mode uses :command:`ssh` to execute :command:`ipengine` on remote | ||
nodes and :command:`ipcontroller` can be run remotely as well, or on localhost. | ||||
MinRK
|
r3586 | |||
MinRK
|
r3647 | .. note:: | ||
MinRK
|
r3586 | |||
MinRK
|
r3647 | When using this mode it highly recommended that you have set up SSH keys | ||
and are using ssh-agent [SSH]_ for password-less logins. | ||||
MinRK
|
r3586 | |||
MinRK
|
r3647 | As usual, we start by creating a clean profile:: | ||
MinRK
|
r3586 | |||
Thomas Kluyver
|
r4196 | $ ipython profile create --parallel --profile=ssh | ||
MinRK
|
r3586 | |||
MinRK
|
r3672 | To use this mode, select the SSH launchers in :file:`ipcluster_config.py`: | ||
MinRK
|
r3586 | |||
MinRK
|
r3647 | .. sourcecode:: python | ||
MinRK
|
r3586 | |||
MinRK
|
r5183 | c.IPClusterEngines.engine_launcher_class = 'SSHEngineSetLauncher' | ||
MinRK
|
r3647 | # and if the Controller is also to be remote: | ||
MinRK
|
r5183 | c.IPClusterStart.controller_launcher_class = 'SSHControllerLauncher' | ||
MinRK
|
r3586 | |||
MinRK
|
r3647 | The controller's remote location and configuration can be specified: | ||
MinRK
|
r3586 | |||
MinRK
|
r3647 | .. sourcecode:: python | ||
MinRK
|
r3586 | |||
MinRK
|
r3647 | # Set the user and hostname for the controller | ||
# c.SSHControllerLauncher.hostname = 'controller.example.com' | ||||
# c.SSHControllerLauncher.user = os.environ.get('USER','username') | ||||
MinRK
|
r3586 | |||
MinRK
|
r3672 | # Set the arguments to be passed to ipcontroller | ||
# note that remotely launched ipcontroller will not get the contents of | ||||
# the local ipcontroller_config.py unless it resides on the *remote host* | ||||
Brian E. Granger
|
r4219 | # in the location specified by the `profile-dir` argument. | ||
MinRK
|
r5487 | # c.SSHControllerLauncher.controller_args = ['--reuse', '--ip=*', '--profile-dir=/path/to/cd'] | ||
MinRK
|
r3586 | |||
MinRK
|
r3647 | Engines are specified in a dictionary, by hostname and the number of engines to be run | ||
on that host. | ||||
MinRK
|
r3586 | |||
MinRK
|
r3647 | .. sourcecode:: python | ||
c.SSHEngineSetLauncher.engines = { 'host1.example.com' : 2, | ||||
'host2.example.com' : 5, | ||||
Brian E. Granger
|
r4219 | 'host3.example.com' : (1, ['--profile-dir=/home/different/location']), | ||
MinRK
|
r3647 | 'host4.example.com' : 8 } | ||
MinRK
|
r3586 | |||
MinRK
|
r3647 | * The `engines` dict, where the keys are the host we want to run engines on and | ||
the value is the number of engines to run on that host. | ||||
* on host3, the value is a tuple, where the number of engines is first, and the arguments | ||||
MinRK
|
r3672 | to be passed to :command:`ipengine` are the second element. | ||
MinRK
|
r3586 | |||
MinRK
|
r3647 | For engines without explicitly specified arguments, the default arguments are set in | ||
a single location: | ||||
MinRK
|
r3586 | |||
MinRK
|
r3647 | .. sourcecode:: python | ||
MinRK
|
r3586 | |||
Brian E. Granger
|
r4219 | c.SSHEngineSetLauncher.engine_args = ['--profile-dir=/path/to/profile_ssh'] | ||
MinRK
|
r3586 | |||
MinRK
|
r3672 | Current limitations of the SSH mode of :command:`ipcluster` are: | ||
MinRK
|
r3586 | |||
MinRK
|
r6623 | * Untested and unsupported on Windows. Would require a working :command:`ssh` on Windows. | ||
Also, we are using shell scripts to setup and execute commands on remote hosts. | ||||
Moving files with SSH | ||||
********************* | ||||
SSH launchers will try to move connection files, controlled by the ``to_send`` and | ||||
``to_fetch`` configurables. If your machines are on a shared filesystem, this step is | ||||
unnecessary, and can be skipped by setting these to empty lists: | ||||
.. sourcecode:: python | ||||
c.SSHLauncher.to_send = [] | ||||
c.SSHLauncher.to_fetch = [] | ||||
If our default guesses about paths don't work for you, or other files | ||||
should be moved, you can manually specify these lists as tuples of (local_path, | ||||
remote_path) for to_send, and (remote_path, local_path) for to_fetch. If you do | ||||
specify these lists explicitly, IPython *will not* automatically send connection files, | ||||
so you must include this yourself if they should still be sent/retrieved. | ||||
MinRK
|
r5487 | |||
MinRK
|
r3586 | |||
MinRK
|
r5715 | IPython on EC2 with StarCluster | ||
=============================== | ||||
The excellent StarCluster_ toolkit for managing `Amazon EC2`_ clusters has a plugin | ||||
which makes deploying IPython on EC2 quite simple. The starcluster plugin uses | ||||
:command:`ipcluster` with the SGE launchers to distribute engines across the | ||||
EC2 cluster. See their `ipcluster plugin documentation`_ for more information. | ||||
.. _StarCluster: http://web.mit.edu/starcluster | ||||
.. _Amazon EC2: http://aws.amazon.com/ec2/ | ||||
.. _ipcluster plugin documentation: http://web.mit.edu/starcluster/docs/latest/plugins/ipython.html | ||||
MinRK
|
r3672 | Using the :command:`ipcontroller` and :command:`ipengine` commands | ||
MinRK
|
r5487 | ================================================================== | ||
MinRK
|
r3586 | |||
MinRK
|
r3672 | It is also possible to use the :command:`ipcontroller` and :command:`ipengine` | ||
MinRK
|
r3586 | commands to start your controller and engines. This approach gives you full | ||
control over all aspects of the startup process. | ||||
Starting the controller and engine on your local machine | ||||
-------------------------------------------------------- | ||||
MinRK
|
r3672 | To use :command:`ipcontroller` and :command:`ipengine` to start things on your | ||
MinRK
|
r3586 | local machine, do the following. | ||
First start the controller:: | ||||
MinRK
|
r3672 | $ ipcontroller | ||
MinRK
|
r3647 | |||
MinRK
|
r3586 | Next, start however many instances of the engine you want using (repeatedly) | ||
the command:: | ||||
MinRK
|
r3672 | $ ipengine | ||
MinRK
|
r3586 | |||
The engines should start and automatically connect to the controller using the | ||||
MinRK
|
r4060 | JSON files in :file:`~/.ipython/profile_default/security`. You are now ready to use the | ||
MinRK
|
r3586 | controller and engines from IPython. | ||
.. warning:: | ||||
MinRK
|
r3647 | |||
The order of the above operations may be important. You *must* | ||||
start the controller before the engines, unless you are reusing connection | ||||
MinRK
|
r4109 | information (via ``--reuse``), in which case ordering is not important. | ||
MinRK
|
r3586 | |||
.. note:: | ||||
On some platforms (OS X), to put the controller and engine into the | ||||
background you may need to give these commands in the form ``(ipcontroller | ||||
&)`` and ``(ipengine &)`` (with the parentheses) for them to work | ||||
properly. | ||||
Starting the controller and engines on different hosts | ||||
------------------------------------------------------ | ||||
When the controller and engines are running on different hosts, things are | ||||
slightly more complicated, but the underlying ideas are the same: | ||||
MinRK
|
r4109 | 1. Start the controller on a host using :command:`ipcontroller`. The controller must be | ||
instructed to listen on an interface visible to the engine machines, via the ``ip`` | ||||
MinRK
|
r5487 | command-line argument or ``HubFactory.ip`` in :file:`ipcontroller_config.py`:: | ||
$ ipcontroller --ip=192.168.1.16 | ||||
.. sourcecode:: python | ||||
# in ipcontroller_config.py | ||||
HubFactory.ip = '192.168.1.16' | ||||
MinRK
|
r4024 | 2. Copy :file:`ipcontroller-engine.json` from :file:`~/.ipython/profile_<name>/security` on | ||
MinRK
|
r3586 | the controller's host to the host where the engines will run. | ||
MinRK
|
r3672 | 3. Use :command:`ipengine` on the engine's hosts to start the engines. | ||
MinRK
|
r3586 | |||
MinRK
|
r3672 | The only thing you have to be careful of is to tell :command:`ipengine` where | ||
MinRK
|
r3617 | the :file:`ipcontroller-engine.json` file is located. There are two ways you | ||
MinRK
|
r3586 | can do this: | ||
MinRK
|
r4024 | * Put :file:`ipcontroller-engine.json` in the :file:`~/.ipython/profile_<name>/security` | ||
MinRK
|
r3586 | directory on the engine's host, where it will be found automatically. | ||
Thomas Kluyver
|
r4196 | * Call :command:`ipengine` with the ``--file=full_path_to_the_file`` | ||
MinRK
|
r3586 | flag. | ||
MinRK
|
r4063 | The ``file`` flag works like this:: | ||
MinRK
|
r3586 | |||
Thomas Kluyver
|
r4196 | $ ipengine --file=/path/to/my/ipcontroller-engine.json | ||
MinRK
|
r3586 | |||
.. note:: | ||||
If the controller's and engine's hosts all have a shared file system | ||||
MinRK
|
r4024 | (:file:`~/.ipython/profile_<name>/security` is the same on all of them), then things | ||
MinRK
|
r3586 | will just work! | ||
MinRK
|
r4588 | SSH Tunnels | ||
*********** | ||||
If your engines are not on the same LAN as the controller, or you are on a highly | ||||
restricted network where your nodes cannot see each others ports, then you can | ||||
use SSH tunnels to connect engines to the controller. | ||||
.. note:: | ||||
This does not work in all cases. Manual tunnels may be an option, but are | ||||
highly inconvenient. Support for manual tunnels will be improved. | ||||
You can instruct all engines to use ssh, by specifying the ssh server in | ||||
:file:`ipcontroller-engine.json`: | ||||
.. I know this is really JSON, but the example is a subset of Python: | ||||
.. sourcecode:: python | ||||
{ | ||||
"url":"tcp://192.168.1.123:56951", | ||||
"exec_key":"26f4c040-587d-4a4e-b58b-030b96399584", | ||||
"ssh":"user@example.com", | ||||
"location":"192.168.1.123" | ||||
} | ||||
This will be specified if you give the ``--enginessh=use@example.com`` argument when | ||||
starting :command:`ipcontroller`. | ||||
Or you can specify an ssh server on the command-line when starting an engine:: | ||||
$> ipengine --profile=foo --ssh=my.login.node | ||||
For example, if your system is totally restricted, then all connections will actually be | ||||
loopback, and ssh tunnels will be used to connect engines to the controller:: | ||||
[node1] $> ipcontroller --enginessh=node1 | ||||
[node2] $> ipengine | ||||
[node3] $> ipcluster engines --n=4 | ||||
Or if you want to start many engines on each node, the command `ipcluster engines --n=4` | ||||
without any configuration is equivalent to running ipengine 4 times. | ||||
MinRK
|
r5487 | An example using ipcontroller/engine with ssh | ||
--------------------------------------------- | ||||
No configuration files are necessary to use ipcontroller/engine in an SSH environment | ||||
without a shared filesystem. You simply need to make sure that the controller is listening | ||||
on an interface visible to the engines, and move the connection file from the controller to | ||||
the engines. | ||||
1. start the controller, listening on an ip-address visible to the engine machines:: | ||||
[controller.host] $ ipcontroller --ip=192.168.1.16 | ||||
[IPControllerApp] Using existing profile dir: u'/Users/me/.ipython/profile_default' | ||||
[IPControllerApp] Hub listening on tcp://192.168.1.16:63320 for registration. | ||||
[IPControllerApp] Hub using DB backend: 'IPython.parallel.controller.dictdb.DictDB' | ||||
[IPControllerApp] hub::created hub | ||||
[IPControllerApp] writing connection info to /Users/me/.ipython/profile_default/security/ipcontroller-client.json | ||||
[IPControllerApp] writing connection info to /Users/me/.ipython/profile_default/security/ipcontroller-engine.json | ||||
[IPControllerApp] task::using Python leastload Task scheduler | ||||
[IPControllerApp] Heartmonitor started | ||||
[IPControllerApp] Creating pid file: /Users/me/.ipython/profile_default/pid/ipcontroller.pid | ||||
Scheduler started [leastload] | ||||
2. on each engine, fetch the connection file with scp:: | ||||
[engine.host.n] $ scp controller.host:.ipython/profile_default/security/ipcontroller-engine.json ./ | ||||
.. note:: | ||||
The log output of ipcontroller above shows you where the json files were written. | ||||
They will be in :file:`~/.ipython` (or :file:`~/.config/ipython`) under | ||||
:file:`profile_default/security/ipcontroller-engine.json` | ||||
3. start the engines, using the connection file:: | ||||
[engine.host.n] $ ipengine --file=./ipcontroller-engine.json | ||||
A couple of notes: | ||||
* You can avoid having to fetch the connection file every time by adding ``--reuse`` flag | ||||
to ipcontroller, which instructs the controller to read the previous connection file for | ||||
connection info, rather than generate a new one with randomized ports. | ||||
* In step 2, if you fetch the connection file directly into the security dir of a profile, | ||||
then you need not specify its path directly, only the profile (assumes the path exists, | ||||
otherwise you must create it first):: | ||||
[engine.host.n] $ scp controller.host:.ipython/profile_default/security/ipcontroller-engine.json ~/.ipython/profile_ssh/security/ | ||||
[engine.host.n] $ ipengine --profile=ssh | ||||
Of course, if you fetch the file into the default profile, no arguments must be passed to | ||||
ipengine at all. | ||||
* Note that ipengine *did not* specify the ip argument. In general, it is unlikely for any | ||||
connection information to be specified at the command-line to ipengine, as all of this | ||||
information should be contained in the connection file written by ipcontroller. | ||||
MinRK
|
r4588 | |||
MinRK
|
r3617 | Make JSON files persistent | ||
MinRK
|
r3647 | -------------------------- | ||
MinRK
|
r3586 | |||
MinRK
|
r3617 | At fist glance it may seem that that managing the JSON files is a bit | ||
annoying. Going back to the house and key analogy, copying the JSON around | ||||
MinRK
|
r3586 | each time you start the controller is like having to make a new key every time | ||
you want to unlock the door and enter your house. As with your house, you want | ||||
MinRK
|
r3617 | to be able to create the key (or JSON file) once, and then simply use it at | ||
MinRK
|
r3586 | any point in the future. | ||
MinRK
|
r3990 | To do this, the only thing you have to do is specify the `--reuse` flag, so that | ||
MinRK
|
r3617 | the connection information in the JSON files remains accurate:: | ||
MinRK
|
r3586 | |||
MinRK
|
r3990 | $ ipcontroller --reuse | ||
MinRK
|
r3586 | |||
MinRK
|
r3617 | Then, just copy the JSON files over the first time and you are set. You can | ||
MinRK
|
r3586 | start and stop the controller and engines any many times as you want in the | ||
MinRK
|
r3647 | future, just make sure to tell the controller to reuse the file. | ||
MinRK
|
r3586 | |||
.. note:: | ||||
You may ask the question: what ports does the controller listen on if you | ||||
don't tell is to use specific ones? The default is to use high random port | ||||
numbers. We do this for two reasons: i) to increase security through | ||||
obscurity and ii) to multiple controllers on a given host to start and | ||||
automatically use different ports. | ||||
Log files | ||||
--------- | ||||
All of the components of IPython have log files associated with them. | ||||
These log files can be extremely useful in debugging problems with | ||||
MinRK
|
r4024 | IPython and can be found in the directory :file:`~/.ipython/profile_<name>/log`. | ||
MinRK
|
r3617 | Sending the log files to us will often help us to debug any problems. | ||
MinRK
|
r3586 | |||
MinRK
|
r3672 | Configuring `ipcontroller` | ||
MinRK
|
r3647 | --------------------------- | ||
MinRK
|
r4090 | The IPython Controller takes its configuration from the file :file:`ipcontroller_config.py` | ||
in the active profile directory. | ||||
MinRK
|
r3663 | Ports and addresses | ||
******************* | ||||
MinRK
|
r3647 | |||
MinRK
|
r4090 | In many cases, you will want to configure the Controller's network identity. By default, | ||
the Controller listens only on loopback, which is the most secure but often impractical. | ||||
To instruct the controller to listen on a specific interface, you can set the | ||||
:attr:`HubFactory.ip` trait. To listen on all interfaces, simply specify: | ||||
.. sourcecode:: python | ||||
c.HubFactory.ip = '*' | ||||
MinRK
|
r4109 | When connecting to a Controller that is listening on loopback or behind a firewall, it may | ||
MinRK
|
r4090 | be necessary to specify an SSH server to use for tunnels, and the external IP of the | ||
Controller. If you specified that the HubFactory listen on loopback, or all interfaces, | ||||
then IPython will try to guess the external IP. If you are on a system with VM network | ||||
devices, or many interfaces, this guess may be incorrect. In these cases, you will want | ||||
to specify the 'location' of the Controller. This is the IP of the machine the Controller | ||||
is on, as seen by the clients, engines, or the SSH server used to tunnel connections. | ||||
For example, to set up a cluster with a Controller on a work node, using ssh tunnels | ||||
through the login node, an example :file:`ipcontroller_config.py` might contain: | ||||
.. sourcecode:: python | ||||
# allow connections on all interfaces from engines | ||||
# engines on the same node will use loopback, while engines | ||||
# from other nodes will use an external IP | ||||
c.HubFactory.ip = '*' | ||||
# you typically only need to specify the location when there are extra | ||||
# interfaces that may not be visible to peer nodes (e.g. VM interfaces) | ||||
c.HubFactory.location = '10.0.1.5' | ||||
# or to get an automatic value, try this: | ||||
import socket | ||||
ex_ip = socket.gethostbyname_ex(socket.gethostname())[-1][0] | ||||
c.HubFactory.location = ex_ip | ||||
# now instruct clients to use the login node for SSH tunnels: | ||||
c.HubFactory.ssh_server = 'login.mycluster.net' | ||||
After doing this, your :file:`ipcontroller-client.json` file will look something like this: | ||||
.. this can be Python, despite the fact that it's actually JSON, because it's | ||||
.. still valid Python | ||||
.. sourcecode:: python | ||||
{ | ||||
"url":"tcp:\/\/*:43447", | ||||
"exec_key":"9c7779e4-d08a-4c3b-ba8e-db1f80b562c1", | ||||
"ssh":"login.mycluster.net", | ||||
"location":"10.0.1.5" | ||||
} | ||||
Then this file will be all you need for a client to connect to the controller, tunneling | ||||
SSH connections through login.mycluster.net. | ||||
MinRK
|
r3663 | |||
Database Backend | ||||
**************** | ||||
MinRK
|
r4090 | The Hub stores all messages and results passed between Clients and Engines. | ||
For large and/or long-running clusters, it would be unreasonable to keep all | ||||
of this information in memory. For this reason, we have two database backends: | ||||
[MongoDB]_ via PyMongo_, and SQLite with the stdlib :py:mod:`sqlite`. | ||||
MongoDB is our design target, and the dict-like model it uses has driven our design. As far | ||||
as we are concerned, BSON can be considered essentially the same as JSON, adding support | ||||
for binary data and datetime objects, and any new database backend must support the same | ||||
data types. | ||||
MinRK
|
r3663 | |||
.. seealso:: | ||||
MinRK
|
r4090 | MongoDB `BSON doc <http://www.mongodb.org/display/DOCS/BSON>`_ | ||
To use one of these backends, you must set the :attr:`HubFactory.db_class` trait: | ||||
.. sourcecode:: python | ||||
# for a simple dict-based in-memory implementation, use dictdb | ||||
# This is the default and the fastest, since it doesn't involve the filesystem | ||||
c.HubFactory.db_class = 'IPython.parallel.controller.dictdb.DictDB' | ||||
# To use MongoDB: | ||||
c.HubFactory.db_class = 'IPython.parallel.controller.mongodb.MongoDB' | ||||
# and SQLite: | ||||
c.HubFactory.db_class = 'IPython.parallel.controller.sqlitedb.SQLiteDB' | ||||
MinRK
|
r5892 | |||
# You can use NoDB to disable the database altogether, in case you don't need | ||||
# to reuse tasks or results, and want to keep memory consumption under control. | ||||
c.HubFactory.db_class = 'IPython.parallel.controller.dictdb.NoDB' | ||||
MinRK
|
r4090 | |||
When using the proper databases, you can actually allow for tasks to persist from | ||||
one session to the next by specifying the MongoDB database or SQLite table in | ||||
which tasks are to be stored. The default is to use a table named for the Hub's Session, | ||||
which is a UUID, and thus different every time. | ||||
.. sourcecode:: python | ||||
# To keep persistant task history in MongoDB: | ||||
c.MongoDB.database = 'tasks' | ||||
MinRK
|
r3663 | |||
MinRK
|
r4090 | # and in SQLite: | ||
c.SQLiteDB.table = 'tasks' | ||||
Since MongoDB servers can be running remotely or configured to listen on a particular port, | ||||
you can specify any arguments you may need to the PyMongo `Connection | ||||
<http://api.mongodb.org/python/1.9/api/pymongo/connection.html#pymongo.connection.Connection>`_: | ||||
.. sourcecode:: python | ||||
# positional args to pymongo.Connection | ||||
c.MongoDB.connection_args = [] | ||||
# keyword args to pymongo.Connection | ||||
c.MongoDB.connection_kwargs = {} | ||||
MinRK
|
r5892 | But sometimes you are moving lots of data around quickly, and you don't need | ||
that information to be stored for later access, even by other Clients to this | ||||
same session. For this case, we have a dummy database, which doesn't actually | ||||
store anything. This lets the Hub stay small in memory, at the obvious expense | ||||
of being able to access the information that would have been stored in the | ||||
database (used for task resubmission, requesting results of tasks you didn't | ||||
submit, etc.). To use this backend, simply pass ``--nodb`` to | ||||
:command:`ipcontroller` on the command-line, or specify the :class:`NoDB` class | ||||
in your :file:`ipcontroller_config.py` as described above. | ||||
.. seealso:: | ||||
For more information on the database backends, see the :ref:`db backend reference <parallel_db>`. | ||||
MinRK
|
r4090 | .. _PyMongo: http://api.mongodb.org/python/1.9/ | ||
MinRK
|
r3647 | |||
MinRK
|
r3672 | Configuring `ipengine` | ||
MinRK
|
r3647 | ----------------------- | ||
MinRK
|
r4090 | The IPython Engine takes its configuration from the file :file:`ipengine_config.py` | ||
The Engine itself also has some amount of configuration. Most of this | ||||
has to do with initializing MPI or connecting to the controller. | ||||
To instruct the Engine to initialize with an MPI environment set up by | ||||
mpi4py, add: | ||||
.. sourcecode:: python | ||||
c.MPI.use = 'mpi4py' | ||||
In this case, the Engine will use our default mpi4py init script to set up | ||||
the MPI environment prior to exection. We have default init scripts for | ||||
mpi4py and pytrilinos. If you want to specify your own code to be run | ||||
at the beginning, specify `c.MPI.init_script`. | ||||
You can also specify a file or python command to be run at startup of the | ||||
Engine: | ||||
.. sourcecode:: python | ||||
c.IPEngineApp.startup_script = u'/path/to/my/startup.py' | ||||
c.IPEngineApp.startup_command = 'import numpy, scipy, mpi4py' | ||||
These commands/files will be run again, after each | ||||
It's also useful on systems with shared filesystems to run the engines | ||||
in some scratch directory. This can be set with: | ||||
.. sourcecode:: python | ||||
c.IPEngineApp.work_dir = u'/path/to/scratch/' | ||||
MinRK
|
r3647 | |||
MinRK
|
r4090 | .. [MongoDB] MongoDB database http://www.mongodb.org | ||
MinRK
|
r3586 | |||
MinRK
|
r4090 | .. [PBS] Portable Batch System http://www.openpbs.org | ||
MinRK
|
r3663 | |||
.. [SSH] SSH-Agent http://en.wikipedia.org/wiki/ssh-agent | ||||