.. _parallel_process:

===========================================
Starting the IPython controller and engines
===========================================

To use IPython for parallel computing, you need to start one instance of
the controller and one or more instances of the engine. The controller
and each engine can run on different machines or on the same machine.
Because of this, there are many different possibilities.

Broadly speaking, there are two ways of going about starting a controller and engines:

* In an automated manner using the :command:`ipcluster` command.
* In a more manual way using the :command:`ipcontroller` and
  :command:`ipengine` commands.

This document describes both of these methods. We recommend that new users
start with the :command:`ipcluster` command as it simplifies many common usage
cases.

General considerations
======================

Before delving into the details about how you can start a controller and
engines using the various methods, we outline some of the general issues that
come up when starting the controller and engines. These things come up no
matter which method you use to start your IPython cluster.

If you are running engines on multiple machines, you will likely need to instruct the
controller to listen for connections on an external interface. This can be done by specifying
the ``ip`` argument on the command-line, or the ``HubFactory.ip`` configurable in
:file:`ipcontroller_config.py`.

If your machines are on a trusted network, you can safely instruct the controller to listen
on all interfaces with::

    $> ipcontroller --ip=*


Or you can set the same behavior as the default by adding the following line to your :file:`ipcontroller_config.py`:

.. sourcecode:: python

    c.HubFactory.ip = '*'
    # c.HubFactory.location = '10.0.1.1'


.. note::

    ``--ip=*`` instructs ZeroMQ to listen on all interfaces,
    but it does not contain the IP needed for engines / clients
    to know where the controller actually is.
    This can be specified with ``--location=10.0.0.1``,
    the specific IP address of the controller, as seen from engines and/or clients.
    IPython tries to guess this value by default, but it will not always guess correctly.
    Check the ``location`` field in your connection files if you are having connection trouble.

.. note::

    Due to the lack of security in ZeroMQ, the controller will only listen for connections on
    localhost by default. If you see Timeout errors on engines or clients, then the first
    thing you should check is the IP address the controller is listening on, and make sure
    that it is visible from the timing-out machine.

.. seealso::

    Our :ref:`notes on security <parallel_security>` in the new parallel computing code.

Let's say that you want to start the controller on ``host0`` and engines on
hosts ``host1``-``hostn``. The following steps are then required:

1. Start the controller on ``host0`` by running :command:`ipcontroller` on
   ``host0``. The controller must be instructed to listen on an interface visible
   to the engine machines, via the ``ip`` command-line argument or ``HubFactory.ip``
   in :file:`ipcontroller_config.py`.
2. Move the JSON file (:file:`ipcontroller-engine.json`) created by the
   controller from ``host0`` to hosts ``host1``-``hostn``.
3. Start the engines on hosts ``host1``-``hostn`` by running
   :command:`ipengine`. This command has to be told where the JSON file
   (:file:`ipcontroller-engine.json`) is located.

At this point, the controller and engines will be connected. By default, the JSON files
created by the controller are put into the :file:`IPYTHONDIR/profile_default/security`
directory. If the engines share a filesystem with the controller, step 2 can be skipped as
the engines will automatically look at that location.
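
In compact form, and assuming the default profile, default paths, and placeholder
hostnames, these three steps look like the following::

    [host0] $ ipcontroller --ip=*
    [host1] $ scp host0:.ipython/profile_default/security/ipcontroller-engine.json ./
    [host1] $ ipengine --file=./ipcontroller-engine.json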

The final step required to actually use the running controller from a client is to move
the JSON file :file:`ipcontroller-client.json` from ``host0`` to any host where clients
will be run. If these files are put into the :file:`IPYTHONDIR/profile_default/security`
directory of the client's host, they will be found automatically. Otherwise, the full path
to them has to be passed to the client's constructor.
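
For example, a client script on such a host can pass the path explicitly (a minimal
sketch; the path is hypothetical):

.. sourcecode:: python

    from IPython.parallel import Client

    # point the Client at the copied connection file (hypothetical path)
    rc = Client('/path/to/ipcontroller-client.json')
    print(rc.ids)  # ids of the engines currently registered with the controller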

Using :command:`ipcluster`
===========================

The :command:`ipcluster` command provides a simple way of starting a
controller and engines in the following situations:

1. When the controller and engines are all run on localhost. This is useful
   for testing or running on a multicore computer.
2. When engines are started using the :command:`mpiexec` command that comes
   with most MPI [MPI]_ implementations.
3. When engines are started using the PBS [PBS]_ batch system
   (or other :command:`qsub` systems, such as SGE).
4. When the controller is started on localhost and the engines are started on
   remote nodes using :command:`ssh`.
5. When engines are started using the Windows HPC Server batch system.

.. note::

    Currently :command:`ipcluster` requires that the
    :file:`IPYTHONDIR/profile_<name>/security` directory live on a shared filesystem that is
    seen by both the controller and engines. If you don't have a shared file
    system you will need to use :command:`ipcontroller` and
    :command:`ipengine` directly.

Under the hood, :command:`ipcluster` just uses :command:`ipcontroller`
and :command:`ipengine` to perform the steps described above.

The simplest way to use ipcluster requires no configuration, and will
launch a controller and a number of engines on the local machine. For instance,
to start one controller and 4 engines on localhost, just do::

    $ ipcluster start -n 4

To see other command line options, do::

    $ ipcluster -h


Configuring an IPython cluster
==============================

Cluster configurations are stored as `profiles`. You can create a new profile with::

    $ ipython profile create --parallel --profile=myprofile

This will create the directory :file:`IPYTHONDIR/profile_myprofile`, and populate it
with the default configuration files for the three IPython cluster commands. Once
you edit those files, you can continue to call ipcluster/ipcontroller/ipengine
with no arguments beyond ``--profile=myprofile``, and any configuration will be maintained.
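
For instance, once the files in :file:`IPYTHONDIR/profile_myprofile` have been edited,
a cluster using that configuration can be started with just::

    $ ipcluster start -n 8 --profile=myprofile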

There is no limit to the number of profiles you can have, so you can maintain a profile for each
of your common use cases. The default profile will be used whenever the
profile argument is not specified, so edit :file:`IPYTHONDIR/profile_default/*_config.py` to
represent your most common use case.

The configuration files are loaded with commented-out settings and explanations,
which should cover most of the available possibilities.

Using various batch systems with :command:`ipcluster`
------------------------------------------------------

:command:`ipcluster` has a notion of Launchers that can start controllers
and engines with various remote execution schemes. Currently supported
models include :command:`ssh`, :command:`mpiexec`, PBS-style (Torque, SGE, LSF),
and Windows HPC Server.

In general, these are configured by the :attr:`IPClusterEngines.engine_launcher_class`
and :attr:`IPClusterStart.controller_launcher_class` configurables, which can be the
fully specified object name (e.g. ``'IPython.parallel.apps.launcher.LocalControllerLauncher'``),
but if you are using IPython's builtin launchers, you can specify just the class name,
or even just the prefix, e.g.:

.. sourcecode:: python

    c.IPClusterEngines.engine_launcher_class = 'SSH'
    # equivalent to
    c.IPClusterEngines.engine_launcher_class = 'SSHEngineSetLauncher'
    # both of which expand to
    c.IPClusterEngines.engine_launcher_class = 'IPython.parallel.apps.launcher.SSHEngineSetLauncher'

The shortest form is of particular use on the command line, where all you need to do to
get an IPython cluster running with engines started with MPI is:

.. sourcecode:: bash

    $> ipcluster start --engines=MPI

This assumes that the default MPI config is sufficient.

.. note::

    Shortcuts for builtin launcher names were added in 0.12, as was the ``_class`` suffix
    on the configurable names. If you use the old 0.11 names (e.g. ``engine_set_launcher``),
    they will still work, but you will get a deprecation warning that the name has changed.


.. note::

    The Launchers and configuration are designed in such a way that advanced
    users can subclass and configure them to fit systems that we
    do not yet support (such as Condor).

Using :command:`ipcluster` in mpiexec/mpirun mode
--------------------------------------------------


The mpiexec/mpirun mode is useful if you:

1. Have MPI installed.
2. Have systems configured to use the :command:`mpiexec` or
   :command:`mpirun` commands to start MPI processes.

If these are satisfied, you can create a new profile::

    $ ipython profile create --parallel --profile=mpi

and edit the file :file:`IPYTHONDIR/profile_mpi/ipcluster_config.py`.

There, instruct ipcluster to use the MPI launcher by adding the line:

.. sourcecode:: python

    c.IPClusterEngines.engine_launcher_class = 'MPIEngineSetLauncher'

If the default MPI configuration is correct, then you can now start your cluster with::

    $ ipcluster start -n 4 --profile=mpi

This does the following:

1. Starts the IPython controller on the current host.
2. Uses :command:`mpiexec` to start 4 engines.

If you have a reason to also start the Controller with MPI, you can specify:

.. sourcecode:: python

    c.IPClusterStart.controller_launcher_class = 'MPIControllerLauncher'

.. note::

    The Controller *will not* be in the same MPI universe as the engines, so there is not
    much reason to do this unless sysadmins demand it.

On newer MPI implementations (such as OpenMPI), this will work even if you
don't make any calls to MPI or call :func:`MPI_Init`. However, older MPI
implementations actually require each process to call :func:`MPI_Init` upon
starting. The easiest way of having this done is to install the mpi4py
[mpi4py]_ package and then specify the ``c.MPI.use`` option in :file:`ipengine_config.py`:

.. sourcecode:: python

    c.MPI.use = 'mpi4py'

Unfortunately, even this won't work for some MPI implementations. If you are
having problems with this, you will likely have to use a custom Python
executable that itself calls :func:`MPI_Init` at the appropriate time.
Fortunately, mpi4py comes with such a custom Python executable that is easy to
install and use. However, this custom Python executable approach will not work
with :command:`ipcluster` currently.

More details on using MPI with IPython can be found :ref:`here <parallelmpi>`.

Using :command:`ipcluster` in PBS mode
---------------------------------------

The PBS mode uses the Portable Batch System (PBS) to start the engines.

As usual, we will start by creating a fresh profile::

    $ ipython profile create --parallel --profile=pbs

And in :file:`ipcluster_config.py`, we will select the PBS launchers for the controller
and engines:

.. sourcecode:: python

    c.IPClusterStart.controller_launcher_class = 'PBSControllerLauncher'
    c.IPClusterEngines.engine_launcher_class = 'PBSEngineSetLauncher'

.. note::

    Note that the configurable is IPClusterEngines for the engine launcher, and
    IPClusterStart for the controller launcher. This is because the start command is a
    subclass of the engine command, adding a controller launcher. Since it is a subclass,
    any configuration made in IPClusterEngines is inherited by IPClusterStart unless it is
    overridden.

IPython does provide simple default batch templates for PBS and SGE, but you may need
to specify your own. Here is a sample PBS script template:

.. sourcecode:: bash

    #PBS -N ipython
    #PBS -j oe
    #PBS -l walltime=00:10:00
    #PBS -l nodes={n//4}:ppn=4
    #PBS -q {queue}

    cd $PBS_O_WORKDIR
    export PATH=$HOME/usr/local/bin
    export PYTHONPATH=$HOME/usr/local/lib/python2.7/site-packages
    /usr/local/bin/mpiexec -n {n} ipengine --profile-dir={profile_dir}

There are a few important points about this template:

1. This template will be rendered at runtime using IPython's :class:`EvalFormatter`.
   This is simply a subclass of :class:`string.Formatter` that allows simple expressions
   on keys (see the short sketch after this list).

2. Instead of putting in the actual number of engines, use the notation
   ``{n}`` to indicate the number of engines to be started. You can also use
   expressions like ``{n//4}`` in the template to indicate the number of nodes.
   There will always be ``{n}`` and ``{profile_dir}`` variables passed to the formatter.
   These allow the batch system to know how many engines will be started, and where the
   configuration files reside. The same is true for the batch queue, with the template
   variable ``{queue}``.

3. Any options to :command:`ipengine` can be given in the batch script
   template, or in :file:`ipengine_config.py`.

4. Depending on the configuration of your system, you may have to set
   environment variables in the script template.
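
To see what the formatter does with such an expression, here is a minimal sketch
(assuming :class:`EvalFormatter` is importable from :mod:`IPython.utils.text`, as in the
IPython version described here):

.. sourcecode:: python

    from IPython.utils.text import EvalFormatter

    # render the node-count line of the template for 128 engines
    line = "#PBS -l nodes={n//4}:ppn=4"
    print(EvalFormatter().format(line, n=128))
    # prints: #PBS -l nodes=32:ppn=4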

The controller template should be similar, but simpler:

.. sourcecode:: bash

    #PBS -N ipython
    #PBS -j oe
    #PBS -l walltime=00:10:00
    #PBS -l nodes=1:ppn=4
    #PBS -q {queue}

    cd $PBS_O_WORKDIR
    export PATH=$HOME/usr/local/bin
    export PYTHONPATH=$HOME/usr/local/lib/python2.7/site-packages
    ipcontroller --profile-dir={profile_dir}


Once you have created these scripts, save them with names like
:file:`pbs.engine.template`. Now you can load them into the :file:`ipcluster_config` with:

.. sourcecode:: python

    c.PBSEngineSetLauncher.batch_template_file = "pbs.engine.template"

    c.PBSControllerLauncher.batch_template_file = "pbs.controller.template"


Alternately, you can just define the templates as strings inside :file:`ipcluster_config`.
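
For instance, a minimal sketch of an inline engine template, assuming the
``batch_template`` configurable (the string counterpart of ``batch_template_file``):

.. sourcecode:: python

    # inline template instead of a separate file; a sketch, not a complete script
    c.PBSEngineSetLauncher.batch_template = """#PBS -N ipython
    #PBS -l walltime=00:10:00
    #PBS -l nodes={n//4}:ppn=4
    cd $PBS_O_WORKDIR
    mpiexec -n {n} ipengine --profile-dir={profile_dir}
    """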

Whether you are using your own templates or our defaults, the extra configurables available are
the number of engines to launch (``{n}``) and the batch system queue to which the jobs are to be
submitted (``{queue}``). These are configurables, and can be specified in
:file:`ipcluster_config`:

.. sourcecode:: python

    c.PBSLauncher.queue = 'veryshort.q'
    c.IPClusterEngines.n = 64

Note that assuming you are running PBS on a multi-node cluster, the Controller's default behavior
of listening only on localhost is likely too restrictive. In this case, also assuming the
nodes are safely behind a firewall, you can simply instruct the Controller to listen for
connections on all its interfaces, by adding in :file:`ipcontroller_config`:

.. sourcecode:: python

    c.HubFactory.ip = '*'

You can now run the cluster with::

    $ ipcluster start --profile=pbs -n 128

Additional configuration options can be found in the PBS section of :file:`ipcluster_config`.

.. note::

    Due to the flexibility of configuration, the PBS launchers work with simple changes
    to the template for other :command:`qsub`-using systems, such as Sun Grid Engine,
    and with further configuration in similar batch systems like Condor.

Using :command:`ipcluster` in SSH mode
---------------------------------------


The SSH mode uses :command:`ssh` to execute :command:`ipengine` on remote
nodes; :command:`ipcontroller` can be run remotely as well, or on localhost.

.. note::

    When using this mode it is highly recommended that you have set up SSH keys
    and are using ssh-agent [SSH]_ for password-less logins.

As usual, we start by creating a clean profile::

    $ ipython profile create --parallel --profile=ssh

To use this mode, select the SSH launchers in :file:`ipcluster_config.py`:

.. sourcecode:: python

    c.IPClusterEngines.engine_launcher_class = 'SSHEngineSetLauncher'
    # and if the Controller is also to be remote:
    c.IPClusterStart.controller_launcher_class = 'SSHControllerLauncher'



The controller's remote location and configuration can be specified:

.. sourcecode:: python

    # Set the user and hostname for the controller
    # c.SSHControllerLauncher.hostname = 'controller.example.com'
    # c.SSHControllerLauncher.user = os.environ.get('USER','username')

    # Set the arguments to be passed to ipcontroller
    # note that remotely launched ipcontroller will not get the contents of
    # the local ipcontroller_config.py unless it resides on the *remote host*
    # in the location specified by the `profile-dir` argument.
    # c.SSHControllerLauncher.controller_args = ['--reuse', '--ip=*', '--profile-dir=/path/to/cd']

Engines are specified in a dictionary, by hostname and the number of engines to be run
on that host.

.. sourcecode:: python

    c.SSHEngineSetLauncher.engines = { 'host1.example.com' : 2,
                                       'host2.example.com' : 5,
                                       'host3.example.com' : (1, ['--profile-dir=/home/different/location']),
                                       'host4.example.com' : 8 }

* The ``engines`` dict, where the keys are the hosts we want to run engines on and
  the values are the number of engines to run on each host.
* On host3, the value is a tuple, where the number of engines is first, and the arguments
  to be passed to :command:`ipengine` are the second element.

For engines without explicitly specified arguments, the default arguments are set in
a single location:

.. sourcecode:: python

    c.SSHEngineSetLauncher.engine_args = ['--profile-dir=/path/to/profile_ssh']

Current limitations of the SSH mode of :command:`ipcluster` are:

* Untested and unsupported on Windows. Would require a working :command:`ssh` on Windows.
  Also, we are using shell scripts to set up and execute commands on remote hosts.


Moving files with SSH
*********************

SSH launchers will try to move connection files, controlled by the ``to_send`` and
``to_fetch`` configurables. If your machines are on a shared filesystem, this step is
unnecessary, and can be skipped by setting these to empty lists:

.. sourcecode:: python

    c.SSHLauncher.to_send = []
    c.SSHLauncher.to_fetch = []

If our default guesses about paths don't work for you, or other files
should be moved, you can manually specify these lists as tuples of (local_path,
remote_path) for to_send, and (remote_path, local_path) for to_fetch. If you do
specify these lists explicitly, IPython *will not* automatically send connection files,
so you must include this yourself if they should still be sent/retrieved.
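
For instance, a minimal sketch (with hypothetical paths) that still sends the engine
connection file explicitly:

.. sourcecode:: python

    # (local_path, remote_path) pairs copied to each host before starting engines
    c.SSHLauncher.to_send = [
        ('/home/me/.ipython/profile_ssh/security/ipcontroller-engine.json',
         '.ipython/profile_ssh/security/ipcontroller-engine.json'),
    ]
    # nothing needs to be fetched back in this sketch
    c.SSHLauncher.to_fetch = []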


IPython on EC2 with StarCluster
===============================

The excellent StarCluster_ toolkit for managing `Amazon EC2`_ clusters has a plugin
which makes deploying IPython on EC2 quite simple. The StarCluster plugin uses
:command:`ipcluster` with the SGE launchers to distribute engines across the
EC2 cluster. See their `ipcluster plugin documentation`_ for more information.

.. _StarCluster: http://web.mit.edu/starcluster
.. _Amazon EC2: http://aws.amazon.com/ec2/
.. _ipcluster plugin documentation: http://web.mit.edu/starcluster/docs/latest/plugins/ipython.html


Using the :command:`ipcontroller` and :command:`ipengine` commands
====================================================================

It is also possible to use the :command:`ipcontroller` and :command:`ipengine`
commands to start your controller and engines. This approach gives you full
control over all aspects of the startup process.

Starting the controller and engine on your local machine
----------------------------------------------------------

To use :command:`ipcontroller` and :command:`ipengine` to start things on your
local machine, do the following.

First start the controller::

    $ ipcontroller

Next, start however many instances of the engine you want using (repeatedly)
the command::

    $ ipengine

The engines should start and automatically connect to the controller using the
JSON files in :file:`IPYTHONDIR/profile_default/security`. You are now ready to use the
controller and engines from IPython.

.. warning::

    The order of the above operations may be important. You *must*
    start the controller before the engines, unless you are reusing connection
    information (via ``--reuse``), in which case ordering is not important.

.. note::

    On some platforms (OS X), to put the controller and engine into the
    background you may need to give these commands in the form ``(ipcontroller
    &)`` and ``(ipengine &)`` (with the parentheses) for them to work
    properly.

Starting the controller and engines on different hosts
--------------------------------------------------------

When the controller and engines are running on different hosts, things are
slightly more complicated, but the underlying ideas are the same:

1. Start the controller on a host using :command:`ipcontroller`. The controller must be
   instructed to listen on an interface visible to the engine machines, via the ``ip``
   command-line argument or ``HubFactory.ip`` in :file:`ipcontroller_config.py`::

       $ ipcontroller --ip=192.168.1.16

   .. sourcecode:: python

       # in ipcontroller_config.py
       HubFactory.ip = '192.168.1.16'

2. Copy :file:`ipcontroller-engine.json` from :file:`IPYTHONDIR/profile_<name>/security` on
   the controller's host to the host where the engines will run.
3. Use :command:`ipengine` on the engine's hosts to start the engines.

The only thing you have to be careful of is to tell :command:`ipengine` where
the :file:`ipcontroller-engine.json` file is located. There are two ways you
can do this:

* Put :file:`ipcontroller-engine.json` in the :file:`IPYTHONDIR/profile_<name>/security`
  directory on the engine's host, where it will be found automatically.
* Call :command:`ipengine` with the ``--file=full_path_to_the_file``
  flag.

The ``file`` flag works like this::

    $ ipengine --file=/path/to/my/ipcontroller-engine.json

.. note::

    If the controller's and engine's hosts all have a shared file system
    (:file:`IPYTHONDIR/profile_<name>/security` is the same on all of them), then things
    will just work!

SSH Tunnels
***********

If your engines are not on the same LAN as the controller, or you are on a highly
restricted network where your nodes cannot see each other's ports, then you can
use SSH tunnels to connect engines to the controller.

.. note::

    This does not work in all cases. Manual tunnels may be an option, but are
    highly inconvenient. Support for manual tunnels will be improved.

You can instruct all engines to use ssh by specifying the ssh server in
:file:`ipcontroller-engine.json`:

.. I know this is really JSON, but the example is a subset of Python:
.. sourcecode:: python

    {
      "url":"tcp://192.168.1.123:56951",
      "exec_key":"26f4c040-587d-4a4e-b58b-030b96399584",
      "ssh":"user@example.com",
      "location":"192.168.1.123"
    }

This will be specified if you give the ``--enginessh=user@example.com`` argument when
starting :command:`ipcontroller`.

Or you can specify an ssh server on the command-line when starting an engine::

    $> ipengine --profile=foo --ssh=my.login.node

For example, if your system is totally restricted, then all connections will actually be
loopback, and ssh tunnels will be used to connect engines to the controller::

    [node1] $> ipcontroller --enginessh=node1
    [node2] $> ipengine
    [node3] $> ipcluster engines --n=4

Or if you want to start many engines on each node, the command ``ipcluster engines --n=4``
without any configuration is equivalent to running ipengine 4 times.
599
599
600 An example using ipcontroller/engine with ssh
600 An example using ipcontroller/engine with ssh
601 ---------------------------------------------
601 ---------------------------------------------
602
602
603 No configuration files are necessary to use ipcontroller/engine in an SSH environment
603 No configuration files are necessary to use ipcontroller/engine in an SSH environment
604 without a shared filesystem. You simply need to make sure that the controller is listening
604 without a shared filesystem. You simply need to make sure that the controller is listening
605 on an interface visible to the engines, and move the connection file from the controller to
605 on an interface visible to the engines, and move the connection file from the controller to
606 the engines.
606 the engines.
607
607
608 1. start the controller, listening on an ip-address visible to the engine machines::
608 1. start the controller, listening on an ip-address visible to the engine machines::
609
609
610 [controller.host] $ ipcontroller --ip=192.168.1.16
610 [controller.host] $ ipcontroller --ip=192.168.1.16
611
611
612 [IPControllerApp] Using existing profile dir: u'/Users/me/.ipython/profile_default'
612 [IPControllerApp] Using existing profile dir: u'/Users/me/.ipython/profile_default'
613 [IPControllerApp] Hub listening on tcp://192.168.1.16:63320 for registration.
613 [IPControllerApp] Hub listening on tcp://192.168.1.16:63320 for registration.
614 [IPControllerApp] Hub using DB backend: 'IPython.parallel.controller.dictdb.DictDB'
614 [IPControllerApp] Hub using DB backend: 'IPython.parallel.controller.dictdb.DictDB'
615 [IPControllerApp] hub::created hub
615 [IPControllerApp] hub::created hub
616 [IPControllerApp] writing connection info to /Users/me/.ipython/profile_default/security/ipcontroller-client.json
616 [IPControllerApp] writing connection info to /Users/me/.ipython/profile_default/security/ipcontroller-client.json
617 [IPControllerApp] writing connection info to /Users/me/.ipython/profile_default/security/ipcontroller-engine.json
617 [IPControllerApp] writing connection info to /Users/me/.ipython/profile_default/security/ipcontroller-engine.json
618 [IPControllerApp] task::using Python leastload Task scheduler
618 [IPControllerApp] task::using Python leastload Task scheduler
619 [IPControllerApp] Heartmonitor started
619 [IPControllerApp] Heartmonitor started
620 [IPControllerApp] Creating pid file: /Users/me/.ipython/profile_default/pid/ipcontroller.pid
620 [IPControllerApp] Creating pid file: /Users/me/.ipython/profile_default/pid/ipcontroller.pid
621 Scheduler started [leastload]
621 Scheduler started [leastload]
622
622
623 2. on each engine, fetch the connection file with scp::
623 2. on each engine, fetch the connection file with scp::
624
624
625 [engine.host.n] $ scp controller.host:.ipython/profile_default/security/ipcontroller-engine.json ./
625 [engine.host.n] $ scp controller.host:.ipython/profile_default/security/ipcontroller-engine.json ./
626
626
627 .. note::
627 .. note::
628
628
629 The log output of ipcontroller above shows you where the json files were written.
629 The log output of ipcontroller above shows you where the json files were written.
630 They will be in :file:`~/.ipython` under
630 They will be in :file:`~/.ipython` under
631 :file:`profile_default/security/ipcontroller-engine.json`
631 :file:`profile_default/security/ipcontroller-engine.json`
632
632
633 3. start the engines, using the connection file::
633 3. start the engines, using the connection file::
634
634
635 [engine.host.n] $ ipengine --file=./ipcontroller-engine.json
635 [engine.host.n] $ ipengine --file=./ipcontroller-engine.json
636
636
637 A couple of notes:
637 A couple of notes:
638
638
639 * You can avoid having to fetch the connection file every time by adding ``--reuse`` flag
639 * You can avoid having to fetch the connection file every time by adding ``--reuse`` flag
640 to ipcontroller, which instructs the controller to read the previous connection file for
640 to ipcontroller, which instructs the controller to read the previous connection file for
641 connection info, rather than generate a new one with randomized ports.
641 connection info, rather than generate a new one with randomized ports.
642
642
643 * In step 2, if you fetch the connection file directly into the security dir of a profile,
643 * In step 2, if you fetch the connection file directly into the security dir of a profile,
644 then you need not specify its path directly, only the profile (assumes the path exists,
644 then you need not specify its path directly, only the profile (assumes the path exists,
645 otherwise you must create it first)::
645 otherwise you must create it first)::
646
646
647 [engine.host.n] $ scp controller.host:.ipython/profile_default/security/ipcontroller-engine.json ~/.ipython/profile_ssh/security/
647 [engine.host.n] $ scp controller.host:.ipython/profile_default/security/ipcontroller-engine.json ~/.ipython/profile_ssh/security/
648 [engine.host.n] $ ipengine --profile=ssh
648 [engine.host.n] $ ipengine --profile=ssh
649
649
650 Of course, if you fetch the file into the default profile, no arguments must be passed to
650 Of course, if you fetch the file into the default profile, no arguments must be passed to
651 ipengine at all.
651 ipengine at all.
652
652
653 * Note that ipengine *did not* specify the ip argument. In general, it is unlikely for any
653 * Note that ipengine *did not* specify the ip argument. In general, it is unlikely for any
654 connection information to be specified at the command-line to ipengine, as all of this
654 connection information to be specified at the command-line to ipengine, as all of this
655 information should be contained in the connection file written by ipcontroller.
655 information should be contained in the connection file written by ipcontroller.
656
656
657 Make JSON files persistent
657 Make JSON files persistent
658 --------------------------
658 --------------------------
659
659
660 At fist glance it may seem that that managing the JSON files is a bit
660 At fist glance it may seem that that managing the JSON files is a bit
661 annoying. Going back to the house and key analogy, copying the JSON around
661 annoying. Going back to the house and key analogy, copying the JSON around
662 each time you start the controller is like having to make a new key every time
662 each time you start the controller is like having to make a new key every time
663 you want to unlock the door and enter your house. As with your house, you want
663 you want to unlock the door and enter your house. As with your house, you want
664 to be able to create the key (or JSON file) once, and then simply use it at
664 to be able to create the key (or JSON file) once, and then simply use it at
665 any point in the future.
665 any point in the future.
666
666
To do this, the only thing you have to do is specify the ``--reuse`` flag, so that
the connection information in the JSON files remains accurate::

    $ ipcontroller --reuse

Then, just copy the JSON files over the first time and you are set. You can
start and stop the controller and engines as many times as you want in the
future; just make sure to tell the controller to reuse the file.

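If you prefer to set this in the profile's configuration file rather than on the
command line, a minimal sketch (assuming the ``--reuse`` flag maps to the
``IPControllerApp.reuse_files`` trait) would be:

.. sourcecode:: python

    # in ipcontroller_config.py: equivalent of passing --reuse on the command line
    # (trait name assumed; check `ipcontroller --help-all` for the exact spelling)
    c.IPControllerApp.reuse_files = True
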
.. note::

    You may ask the question: what ports does the controller listen on if you
    don't tell it to use specific ones? The default is to use high random port
    numbers. We do this for two reasons: i) to increase security through
    obscurity and ii) to allow multiple controllers on a given host to start and
    automatically use different ports.

Log files
---------

All of the components of IPython have log files associated with them.
These log files can be extremely useful in debugging problems with
IPython and can be found in the directory :file:`IPYTHONDIR/profile_<name>/log`.
Sending the log files to us will often help us to debug any problems.


Configuring `ipcontroller`
--------------------------

The IPython Controller takes its configuration from the file :file:`ipcontroller_config.py`
in the active profile directory.

Ports and addresses
*******************

In many cases, you will want to configure the Controller's network identity. By default,
the Controller listens only on loopback, which is the most secure but often impractical.
To instruct the controller to listen on a specific interface, you can set the
:attr:`HubFactory.ip` trait. To listen on all interfaces, simply specify:

.. sourcecode:: python

    c.HubFactory.ip = '*'

When connecting to a Controller that is listening on loopback or behind a firewall, it may
be necessary to specify an SSH server to use for tunnels, and the external IP of the
Controller. If you specified that the HubFactory listen on loopback, or all interfaces,
then IPython will try to guess the external IP. If you are on a system with VM network
devices, or many interfaces, this guess may be incorrect. In these cases, you will want
to specify the 'location' of the Controller. This is the IP of the machine the Controller
is on, as seen by the clients, engines, or the SSH server used to tunnel connections.

For example, to set up a cluster with a Controller on a worker node, using ssh tunnels
through the login node, an example :file:`ipcontroller_config.py` might contain:

.. sourcecode:: python

    # allow connections on all interfaces from engines
    # engines on the same node will use loopback, while engines
    # from other nodes will use an external IP
    c.HubFactory.ip = '*'

    # you typically only need to specify the location when there are extra
    # interfaces that may not be visible to peer nodes (e.g. VM interfaces)
    c.HubFactory.location = '10.0.1.5'
    # or to get an automatic value, try this:
    import socket
    hostname = socket.gethostname()
    # alternate choices for hostname include `socket.getfqdn()`
    # or `socket.gethostname() + '.local'`

    ex_ip = socket.gethostbyname_ex(hostname)[-1][-1]
    c.HubFactory.location = ex_ip

    # now instruct clients to use the login node for SSH tunnels:
    c.HubFactory.ssh_server = 'login.mycluster.net'

After doing this, your :file:`ipcontroller-client.json` file will look something like this:

.. this can be Python, despite the fact that it's actually JSON, because it's
.. still valid Python

.. sourcecode:: python

    {
      "url":"tcp:\/\/*:43447",
      "exec_key":"9c7779e4-d08a-4c3b-ba8e-db1f80b562c1",
      "ssh":"login.mycluster.net",
      "location":"10.0.1.5"
    }

Then this file will be all you need for a client to connect to the controller, tunneling
SSH connections through login.mycluster.net.

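As a rough illustration (the paths and key file below are placeholders), a client on your
workstation could then connect with something like:

.. sourcecode:: python

    from IPython.parallel import Client

    # the 'ssh' entry in the connection file tells the client to tunnel through
    # login.mycluster.net; sshkey is only needed if you don't use ssh-agent
    rc = Client('/path/to/ipcontroller-client.json',
                sshkey='/path/to/ssh/keyfile')
    print(rc.ids)  # ids of the currently connected engines
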
Database Backend
****************

The Hub stores all messages and results passed between Clients and Engines.
For large and/or long-running clusters, it would be unreasonable to keep all
of this information in memory. For this reason, we have two database backends:
[MongoDB]_ via PyMongo_, and SQLite with the stdlib :py:mod:`sqlite3`.

MongoDB is our design target, and the dict-like model it uses has driven our design. As far
as we are concerned, BSON can be considered essentially the same as JSON, adding support
for binary data and datetime objects, and any new database backend must support the same
data types.

.. seealso::

    MongoDB `BSON doc <http://www.mongodb.org/display/DOCS/BSON>`_

To use one of these backends, you must set the :attr:`HubFactory.db_class` trait:

.. sourcecode:: python

    # for a simple dict-based in-memory implementation, use dictdb
    # This is the default and the fastest, since it doesn't involve the filesystem
    c.HubFactory.db_class = 'IPython.parallel.controller.dictdb.DictDB'

    # To use MongoDB:
    c.HubFactory.db_class = 'IPython.parallel.controller.mongodb.MongoDB'

    # and SQLite:
    c.HubFactory.db_class = 'IPython.parallel.controller.sqlitedb.SQLiteDB'

    # You can use NoDB to disable the database altogether, in case you don't need
    # to reuse tasks or results, and want to keep memory consumption under control.
    c.HubFactory.db_class = 'IPython.parallel.controller.dictdb.NoDB'

When using one of the real database backends, you can allow tasks to persist from
one session to the next by specifying the MongoDB database or SQLite table in
which tasks are to be stored. The default is to use a table named for the Hub's Session,
which is a UUID, and thus different every time.

.. sourcecode:: python

    # To keep persistent task history in MongoDB:
    c.MongoDB.database = 'tasks'

    # and in SQLite:
    c.SQLiteDB.table = 'tasks'


Since MongoDB servers can be running remotely or configured to listen on a particular port,
you can specify any arguments you may need to the PyMongo `Connection
<http://api.mongodb.org/python/1.9/api/pymongo/connection.html#pymongo.connection.Connection>`_:

.. sourcecode:: python

    # positional args to pymongo.Connection
    c.MongoDB.connection_args = []

    # keyword args to pymongo.Connection
    c.MongoDB.connection_kwargs = {}

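For instance, to point the Hub at a MongoDB server running on another host (the host name
and port here are purely illustrative), you might write:

.. sourcecode:: python

    # passed through as pymongo.Connection(host='...', port=...)
    c.MongoDB.connection_kwargs = {'host': 'mongodb.example.com', 'port': 27017}
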
But sometimes you are moving lots of data around quickly, and you don't need
that information to be stored for later access, even by other Clients to this
same session. For this case, we have a dummy database, which doesn't actually
store anything. This lets the Hub stay small in memory, at the obvious expense
of being able to access the information that would have been stored in the
database (used for task resubmission, requesting results of tasks you didn't
submit, etc.). To use this backend, simply pass ``--nodb`` to
:command:`ipcontroller` on the command-line, or specify the :class:`NoDB` class
in your :file:`ipcontroller_config.py` as described above.


.. seealso::

    For more information on the database backends, see the :ref:`db backend reference <parallel_db>`.


.. _PyMongo: http://api.mongodb.org/python/1.9/

Configuring `ipengine`
----------------------

The IPython Engine takes its configuration from the file :file:`ipengine_config.py`
in the active profile directory.

The Engine itself also has some amount of configuration. Most of this
has to do with initializing MPI or connecting to the controller.

To instruct the Engine to initialize with an MPI environment set up by
mpi4py, add:

.. sourcecode:: python

    c.MPI.use = 'mpi4py'

In this case, the Engine will use our default mpi4py init script to set up
the MPI environment prior to execution. We have default init scripts for
mpi4py and pytrilinos. If you want to specify your own code to be run
at the beginning, specify ``c.MPI.init_script``.

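For example, a custom init snippet might look like the following (a sketch only,
assuming ``MPI.init_script`` holds Python source to be executed before user code,
as the built-in mpi4py default does):

.. sourcecode:: python

    # hypothetical custom MPI setup code run when the engine starts
    c.MPI.init_script = "from mpi4py import MPI\nrank = MPI.COMM_WORLD.Get_rank()"
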
You can also specify a file or Python command to be run at startup of the
Engine:

.. sourcecode:: python

    c.IPEngineApp.startup_script = u'/path/to/my/startup.py'

    c.IPEngineApp.startup_command = 'import numpy, scipy, mpi4py'

These commands/files will be run again after each engine restart.

It's also useful on systems with shared filesystems to run the engines
in some scratch directory. This can be set with:

.. sourcecode:: python

    c.IPEngineApp.work_dir = u'/path/to/scratch/'



.. [MongoDB] MongoDB database http://www.mongodb.org

.. [PBS] Portable Batch System http://www.openpbs.org

.. [SSH] SSH-Agent http://en.wikipedia.org/wiki/ssh-agent