use integer division in PBS template docs...
Min RK
1 1 .. _parallel_process:
2 2
3 3 ===========================================
4 4 Starting the IPython controller and engines
5 5 ===========================================
6 6
7 7 To use IPython for parallel computing, you need to start one instance of
8 8 the controller and one or more instances of the engine. The controller
9 9 and each engine can run on different machines or on the same machine.
10 10 Because of this, there are many different possibilities.
11 11
12 12 Broadly speaking, there are two ways of going about starting a controller and engines:
13 13
14 14 * In an automated manner using the :command:`ipcluster` command.
15 15 * In a more manual way using the :command:`ipcontroller` and
16 16 :command:`ipengine` commands.
17 17
18 18 This document describes both of these methods. We recommend that new users
19 19 start with the :command:`ipcluster` command as it simplifies many common usage
20 20 cases.
21 21
22 22 General considerations
23 23 ======================
24 24
25 25 Before delving into the details about how you can start a controller and
26 26 engines using the various methods, we outline some of the general issues that
27 27 come up when starting the controller and engines. These things come up no
28 28 matter which method you use to start your IPython cluster.
29 29
30 30 If you are running engines on multiple machines, you will likely need to instruct the
31 31 controller to listen for connections on an external interface. This can be done by specifying
32 32 the ``ip`` argument on the command-line, or the ``HubFactory.ip`` configurable in
33 33 :file:`ipcontroller_config.py`.
34 34
35 35 If your machines are on a trusted network, you can safely instruct the controller to listen
36 36 on all interfaces with::
37 37
38 38 $> ipcontroller --ip=*
39 39
40 40
41 41 Or you can make this behavior the default by adding the following line to your :file:`ipcontroller_config.py`:
42 42
43 43 .. sourcecode:: python
44 44
45 45 c.HubFactory.ip = '*'
46 46 # c.HubFactory.location = '10.0.1.1'
47 47
48 48
49 49 .. note::
50 50
51 51 ``--ip=*`` instructs ZeroMQ to listen on all interfaces,
52 52 but it does not contain the IP needed for engines / clients
53 53 to know where the controller actually is.
54 54 This can be specified with ``--location=10.0.0.1``,
55 55 the specific IP address of the controller, as seen from engines and/or clients.
56 56 IPython tries to guess this value by default, but it will not always guess correctly.
57 57 Check the ``location`` field in your connection files if you are having connection trouble.
58 58
59 59 .. note::
60 60
61 61 Due to the lack of security in ZeroMQ, the controller will only listen for connections on
62 62 localhost by default. If you see Timeout errors on engines or clients, then the first
63 63 thing you should check is the IP address the controller is listening on, and make sure
64 64 that it is visible from the machine that is timing out.
65 65
66 66 .. seealso::
67 67
68 68 Our :ref:`notes on security <parallel_security>` in the new parallel computing code.
69 69
70 70 Let's say that you want to start the controller on ``host0`` and engines on
71 71 hosts ``host1``-``hostn``. The following steps are then required:
72 72
73 73 1. Start the controller on ``host0`` by running :command:`ipcontroller` on
74 74 ``host0``. The controller must be instructed to listen on an interface visible
75 75 to the engine machines, via the ``ip`` command-line argument or ``HubFactory.ip``
76 76 in :file:`ipcontroller_config.py`.
77 77 2. Move the JSON file (:file:`ipcontroller-engine.json`) created by the
78 78 controller from ``host0`` to hosts ``host1``-``hostn``.
79 79 3. Start the engines on hosts ``host1``-``hostn`` by running
80 80 :command:`ipengine`. This command has to be told where the JSON file
81 81 (:file:`ipcontroller-engine.json`) is located.
82 82
83 83 At this point, the controller and engines will be connected. By default, the JSON files
84 84 created by the controller are put into the :file:`IPYTHONDIR/profile_default/security`
85 85 directory. If the engines share a filesystem with the controller, step 2 can be skipped as
86 86 the engines will automatically look at that location.
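
For illustration, with the default profile these steps might look like the following (the
hostnames and IP address are only examples; a full worked example appears later in this
document)::

    [host0]   $ ipcontroller --ip=192.168.1.16
    [host1-n] $ scp host0:.ipython/profile_default/security/ipcontroller-engine.json ./
    [host1-n] $ ipengine --file=./ipcontroller-engine.json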
87 87
88 88 The final step required to actually use the running controller from a client is to move
89 89 the JSON file :file:`ipcontroller-client.json` from ``host0`` to any host where clients
90 90 will be run. If this file is put into the :file:`IPYTHONDIR/profile_default/security`
91 91 directory of the client's host, it will be found automatically. Otherwise, the full path
92 92 to it has to be passed to the client's constructor.
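
For example, a client script might pass the path explicitly; this is a minimal sketch, and the
path shown is only an illustration:

.. sourcecode:: python

    from IPython.parallel import Client

    # if ipcontroller-client.json is not in the profile's security directory,
    # pass its full path to the Client constructor
    rc = Client('/path/to/ipcontroller-client.json')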
93 93
94 94 Using :command:`ipcluster`
95 95 ===========================
96 96
97 97 The :command:`ipcluster` command provides a simple way of starting a
98 98 controller and engines in the following situations:
99 99
100 100 1. When the controller and engines are all run on localhost. This is useful
101 101 for testing or running on a multicore computer.
102 102 2. When engines are started using the :command:`mpiexec` command that comes
103 103 with most MPI [MPI]_ implementations.
104 104 3. When engines are started using the PBS [PBS]_ batch system
105 105 (or other `qsub` systems, such as SGE).
106 106 4. When the controller is started on localhost and the engines are started on
107 107 remote nodes using :command:`ssh`.
108 108 5. When engines are started using the Windows HPC Server batch system.
109 109
110 110 .. note::
111 111
112 112 Currently :command:`ipcluster` requires that the
113 113 :file:`IPYTHONDIR/profile_<name>/security` directory live on a shared filesystem that is
114 114 seen by both the controller and engines. If you don't have a shared file
115 115 system you will need to use :command:`ipcontroller` and
116 116 :command:`ipengine` directly.
117 117
118 118 Under the hood, :command:`ipcluster` just uses :command:`ipcontroller`
119 119 and :command:`ipengine` to perform the steps described above.
120 120
121 121 The simplest way to use ipcluster requires no configuration, and will
122 122 launch a controller and a number of engines on the local machine. For instance,
123 123 to start one controller and 4 engines on localhost, just do::
124 124
125 125 $ ipcluster start -n 4
126 126
127 127 To see other command line options, do::
128 128
129 129 $ ipcluster -h
130 130
131 131
132 132 Configuring an IPython cluster
133 133 ==============================
134 134
135 135 Cluster configurations are stored as `profiles`. You can create a new profile with::
136 136
137 137 $ ipython profile create --parallel --profile=myprofile
138 138
139 139 This will create the directory :file:`IPYTHONDIR/profile_myprofile`, and populate it
140 140 with the default configuration files for the three IPython cluster commands. Once
141 141 you edit those files, you can continue to call ipcluster/ipcontroller/ipengine
142 142 with no arguments beyond ``--profile=myprofile``, and any configuration will be maintained.
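
For example, to start a cluster with that profile (the engine count is arbitrary)::

    $ ipcluster start --profile=myprofile -n 8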
143 143
144 144 There is no limit to the number of profiles you can have, so you can maintain a profile for each
145 145 of your common use cases. The default profile will be used whenever the
146 146 profile argument is not specified, so edit :file:`IPYTHONDIR/profile_default/*_config.py` to
147 147 represent your most common use case.
148 148
149 149 The configuration files are loaded with commented-out settings and explanations,
150 150 which should cover most of the available possibilities.
151 151
152 152 Using various batch systems with :command:`ipcluster`
153 153 -----------------------------------------------------
154 154
155 155 :command:`ipcluster` has a notion of Launchers that can start controllers
156 156 and engines with various remote execution schemes. Currently supported
157 157 models include :command:`ssh`, :command:`mpiexec`, PBS-style (Torque, SGE, LSF),
158 158 and Windows HPC Server.
159 159
160 160 In general, these are configured by the :attr:`IPClusterEngines.engine_launcher_class`
161 161 and :attr:`IPClusterStart.controller_launcher_class` configurables, which can be the
162 162 fully specified object name (e.g. ``'IPython.parallel.apps.launcher.LocalControllerLauncher'``),
163 163 but if you are using IPython's builtin launchers, you can specify just the class name,
164 164 or even just the prefix, e.g.:
165 165
166 166 .. sourcecode:: python
167 167
168 168 c.IPClusterEngines.engine_launcher_class = 'SSH'
169 169 # equivalent to
170 170 c.IPClusterEngines.engine_launcher_class = 'SSHEngineSetLauncher'
171 171 # both of which expand to
172 172 c.IPClusterEngines.engine_launcher_class = 'IPython.parallel.apps.launcher.SSHEngineSetLauncher'
173 173
174 174 The shortest form is of particular use on the command line, where all you need to do to
175 175 get an IPython cluster running with engines started with MPI is:
176 176
177 177 .. sourcecode:: bash
178 178
179 179 $> ipcluster start --engines=MPI
180 180
181 181 This assumes that the default MPI configuration is sufficient.
182 182
183 183 .. note::
184 184
185 185 Shortcuts for builtin launcher names were added in 0.12, as was the ``_class`` suffix
186 186 on the configurable names. If you use the old 0.11 names (e.g. ``engine_set_launcher``),
187 187 they will still work, but you will get a deprecation warning that the name has changed.
188 188
189 189
190 190 .. note::
191 191
192 192 The Launchers and configuration are designed in such a way that advanced
193 193 users can subclass and configure them to fit their own systems that we
194 194 do not yet support (such as Condor).
195 195
196 196 Using :command:`ipcluster` in mpiexec/mpirun mode
197 197 -------------------------------------------------
198 198
199 199
200 200 The mpiexec/mpirun mode is useful if you:
201 201
202 202 1. Have MPI installed.
203 203 2. Have systems that are configured to use the :command:`mpiexec` or
204 204 :command:`mpirun` commands to start MPI processes.
205 205
206 206 If these are satisfied, you can create a new profile::
207 207
208 208 $ ipython profile create --parallel --profile=mpi
209 209
210 210 and edit the file :file:`IPYTHONDIR/profile_mpi/ipcluster_config.py`.
211 211
212 212 There, instruct ipcluster to use the MPI launcher by adding the line:
213 213
214 214 .. sourcecode:: python
215 215
216 216 c.IPClusterEngines.engine_launcher_class = 'MPIEngineSetLauncher'
217 217
218 218 If the default MPI configuration is correct, then you can now start your cluster, with::
219 219
220 220 $ ipcluster start -n 4 --profile=mpi
221 221
222 222 This does the following:
223 223
224 224 1. Starts the IPython controller on the current host.
225 225 2. Uses :command:`mpiexec` to start 4 engines.
226 226
227 227 If you have a reason to also start the Controller with MPI, you can specify:
228 228
229 229 .. sourcecode:: python
230 230
231 231 c.IPClusterStart.controller_launcher_class = 'MPIControllerLauncher'
232 232
233 233 .. note::
234 234
235 235 The Controller *will not* be in the same MPI universe as the engines, so there is not
236 236 much reason to do this unless sysadmins demand it.
237 237
238 238 On newer MPI implementations (such as OpenMPI), this will work even if you
239 239 don't make any calls to MPI or call :func:`MPI_Init`. However, older MPI
240 240 implementations actually require each process to call :func:`MPI_Init` upon
241 241 starting. The easiest way of having this done is to install the mpi4py
242 242 [mpi4py]_ package and then specify the ``c.MPI.use`` option in :file:`ipengine_config.py`:
243 243
244 244 .. sourcecode:: python
245 245
246 246 c.MPI.use = 'mpi4py'
247 247
248 248 Unfortunately, even this won't work for some MPI implementations. If you are
249 249 having problems with this, you will likely have to use a custom Python
250 250 executable that itself calls :func:`MPI_Init` at the appropriate time.
251 251 Fortunately, mpi4py comes with such a custom Python executable that is easy to
252 252 install and use. However, this custom Python executable approach will not work
253 253 with :command:`ipcluster` currently.
254 254
255 255 More details on using MPI with IPython can be found :ref:`here <parallelmpi>`.
256 256
257 257
258 258 Using :command:`ipcluster` in PBS mode
259 259 --------------------------------------
260 260
261 261 The PBS mode uses the Portable Batch System (PBS) to start the engines.
262 262
263 263 As usual, we will start by creating a fresh profile::
264 264
265 265 $ ipython profile create --parallel --profile=pbs
266 266
267 267 And in :file:`ipcluster_config.py`, we will select the PBS launchers for the controller
268 268 and engines:
269 269
270 270 .. sourcecode:: python
271 271
272 272 c.IPClusterStart.controller_launcher_class = 'PBSControllerLauncher'
273 273 c.IPClusterEngines.engine_launcher_class = 'PBSEngineSetLauncher'
274 274
275 275 .. note::
276 276
277 277 Note that the configurable is IPClusterEngines for the engine launcher, and
278 278 IPClusterStart for the controller launcher. This is because the start command is a
279 279 subclass of the engine command, adding a controller launcher. Since it is a subclass,
280 280 any configuration made in IPClusterEngines is inherited by IPClusterStart unless it is
281 281 overridden.
282 282
283 283 IPython does provide simple default batch templates for PBS and SGE, but you may need
284 284 to specify your own. Here is a sample PBS script template:
285 285
286 286 .. sourcecode:: bash
287 287
288 288 #PBS -N ipython
289 289 #PBS -j oe
290 290 #PBS -l walltime=00:10:00
291 #PBS -l nodes={n/4}:ppn=4
291 #PBS -l nodes={n//4}:ppn=4
292 292 #PBS -q {queue}
293 293
294 294 cd $PBS_O_WORKDIR
295 295 export PATH=$HOME/usr/local/bin
296 296 export PYTHONPATH=$HOME/usr/local/lib/python2.7/site-packages
297 297 /usr/local/bin/mpiexec -n {n} ipengine --profile-dir={profile_dir}
298 298
299 299 There are a few important points about this template:
300 300
301 301 1. This template will be rendered at runtime using IPython's :class:`EvalFormatter`.
302 302 This is simply a subclass of :class:`string.Formatter` that allows simple expressions
303 303 in keys (see the example after this list).
304 304
305 305 2. Instead of putting in the actual number of engines, use the notation
306 306 ``{n}`` to indicate the number of engines to be started. You can also use
307 expressions like ``{n/4}`` in the template to indicate the number of nodes.
307 expressions like ``{n//4}`` in the template to indicate the number of nodes.
308 308 There will always be ``{n}`` and ``{profile_dir}`` variables passed to the formatter.
309 309 These allow the batch system to know how many engines will be started, and where the
310 310 configuration files reside. The same is true for the batch queue, with the template variable
311 311 ``{queue}``.
312 312
313 313 3. Any options to :command:`ipengine` can be given in the batch script
314 314 template, or in :file:`ipengine_config.py`.
315 315
316 316 4. Depending on the configuration of your system, you may have to set
317 317 environment variables in the script template.
318 318
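As a small illustration of how these expressions are rendered, the following sketch assumes
:class:`EvalFormatter` can be imported from :mod:`IPython.utils.text`, and uses made-up values:

.. sourcecode:: python

    from IPython.utils.text import EvalFormatter

    template = "#PBS -l nodes={n//4}:ppn=4"
    # with n=16 engines, this renders to: #PBS -l nodes=4:ppn=4
    print(EvalFormatter().format(template, n=16))
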
319 319 The controller template should be similar, but simpler:
320 320
321 321 .. sourcecode:: bash
322 322
323 323 #PBS -N ipython
324 324 #PBS -j oe
325 325 #PBS -l walltime=00:10:00
326 326 #PBS -l nodes=1:ppn=4
327 327 #PBS -q {queue}
328 328
329 329 cd $PBS_O_WORKDIR
330 330 export PATH=$HOME/usr/local/bin
331 331 export PYTHONPATH=$HOME/usr/local/lib/python2.7/site-packages
332 332 ipcontroller --profile-dir={profile_dir}
333 333
334 334
335 335 Once you have created these scripts, save them with names like
336 336 :file:`pbs.engine.template`. Now you can load them into the :file:`ipcluster_config` with:
337 337
338 338 .. sourcecode:: python
339 339
340 340 c.PBSEngineSetLauncher.batch_template_file = "pbs.engine.template"
341 341
342 342 c.PBSControllerLauncher.batch_template_file = "pbs.controller.template"
343 343
344 344
345 345 Alternately, you can just define the templates as strings inside :file:`ipcluster_config`.
346 346
347 347 Whether you are using your own templates or our defaults, the extra configurables available are
348 348 the number of engines to launch (``{n}``) and the batch system queue to which the jobs are to be
349 349 submitted (``{queue}``). Both are configurables, and can be specified in
350 350 :file:`ipcluster_config`:
351 351
352 352 .. sourcecode:: python
353 353
354 354 c.PBSLauncher.queue = 'veryshort.q'
355 355 c.IPClusterEngines.n = 64
356 356
357 357 Note that assuming you are running PBS on a multi-node cluster, the Controller's default behavior
358 358 of listening only on localhost is likely too restrictive. In this case, also assuming the
359 359 nodes are safely behind a firewall, you can simply instruct the Controller to listen for
360 360 connections on all its interfaces, by adding in :file:`ipcontroller_config`:
361 361
362 362 .. sourcecode:: python
363 363
364 364 c.HubFactory.ip = '*'
365 365
366 366 You can now run the cluster with::
367 367
368 368 $ ipcluster start --profile=pbs -n 128
369 369
370 370 Additional configuration options can be found in the PBS section of :file:`ipcluster_config`.
371 371
372 372 .. note::
373 373
374 374 Due to the flexibility of configuration, the PBS launchers work with simple changes
375 375 to the template for other :command:`qsub`-using systems, such as Sun Grid Engine,
376 376 and with further configuration in similar batch systems like Condor.
377 377
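For SGE specifically, IPython also ships dedicated launchers; a minimal sketch of selecting them
in :file:`ipcluster_config.py` (class names as provided by the builtin launchers):

.. sourcecode:: python

    c.IPClusterStart.controller_launcher_class = 'SGEControllerLauncher'
    c.IPClusterEngines.engine_launcher_class = 'SGEEngineSetLauncher'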
378 378
379 379 Using :command:`ipcluster` in SSH mode
380 380 --------------------------------------
381 381
382 382
383 383 The SSH mode uses :command:`ssh` to execute :command:`ipengine` on remote
384 384 nodes; :command:`ipcontroller` can be run remotely as well, or on localhost.
385 385
386 386 .. note::
387 387
388 388 When using this mode, it is highly recommended that you have set up SSH keys
389 389 and are using ssh-agent [SSH]_ for password-less logins.
390 390
391 391 As usual, we start by creating a clean profile::
392 392
393 393 $ ipython profile create --parallel --profile=ssh
394 394
395 395 To use this mode, select the SSH launchers in :file:`ipcluster_config.py`:
396 396
397 397 .. sourcecode:: python
398 398
399 399 c.IPClusterEngines.engine_launcher_class = 'SSHEngineSetLauncher'
400 400 # and if the Controller is also to be remote:
401 401 c.IPClusterStart.controller_launcher_class = 'SSHControllerLauncher'
402 402
403 403
404 404
405 405 The controller's remote location and configuration can be specified:
406 406
407 407 .. sourcecode:: python
408 408
409 409 # Set the user and hostname for the controller
410 410 # c.SSHControllerLauncher.hostname = 'controller.example.com'
411 411 # c.SSHControllerLauncher.user = os.environ.get('USER','username')
412 412
413 413 # Set the arguments to be passed to ipcontroller
414 414 # note that remotely launched ipcontroller will not get the contents of
415 415 # the local ipcontroller_config.py unless it resides on the *remote host*
416 416 # in the location specified by the `profile-dir` argument.
417 417 # c.SSHControllerLauncher.controller_args = ['--reuse', '--ip=*', '--profile-dir=/path/to/cd']
418 418
419 419 Engines are specified in a dictionary, by hostname and the number of engines to be run
420 420 on that host.
421 421
422 422 .. sourcecode:: python
423 423
424 424 c.SSHEngineSetLauncher.engines = { 'host1.example.com' : 2,
425 425 'host2.example.com' : 5,
426 426 'host3.example.com' : (1, ['--profile-dir=/home/different/location']),
427 427 'host4.example.com' : 8 }
428 428
429 429 * In the ``engines`` dict, the keys are the hosts we want to run engines on, and
430 430 the values are the number of engines to run on each host.
431 431 * On host3, the value is a tuple, where the number of engines is the first element, and the
432 432 arguments to be passed to :command:`ipengine` are the second element.
433 433
434 434 For engines without explicitly specified arguments, the default arguments are set in
435 435 a single location:
436 436
437 437 .. sourcecode:: python
438 438
439 439 c.SSHEngineSetLauncher.engine_args = ['--profile-dir=/path/to/profile_ssh']
440 440
441 441 Current limitations of the SSH mode of :command:`ipcluster` are:
442 442
443 443 * Untested and unsupported on Windows. Would require a working :command:`ssh` on Windows.
444 444 Also, we are using shell scripts to set up and execute commands on remote hosts.
445 445
446 446
447 447 Moving files with SSH
448 448 *********************
449 449
450 450 SSH launchers will try to move connection files, controlled by the ``to_send`` and
451 451 ``to_fetch`` configurables. If your machines are on a shared filesystem, this step is
452 452 unnecessary, and can be skipped by setting these to empty lists:
453 453
454 454 .. sourcecode:: python
455 455
456 456 c.SSHLauncher.to_send = []
457 457 c.SSHLauncher.to_fetch = []
458 458
459 459 If our default guesses about paths don't work for you, or other files
460 460 should be moved, you can manually specify these lists as tuples of (local_path,
461 461 remote_path) for ``to_send``, and (remote_path, local_path) for ``to_fetch``. If you do
462 462 specify these lists explicitly, IPython *will not* automatically send connection files,
463 463 so you must include them yourself if they should still be sent/retrieved.
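
A sketch of what explicit lists might look like, assuming the default profile layout on both
sides (all paths and the profile name are illustrative):

.. sourcecode:: python

    # (local_path, remote_path) pairs copied to each engine host before starting
    c.SSHEngineSetLauncher.to_send = [
        ('/home/me/.ipython/profile_ssh/security/ipcontroller-engine.json',
         '.ipython/profile_ssh/security/ipcontroller-engine.json'),
    ]
    # (remote_path, local_path) pairs fetched back from the controller host
    c.SSHControllerLauncher.to_fetch = [
        ('.ipython/profile_ssh/security/ipcontroller-client.json',
         '/home/me/.ipython/profile_ssh/security/ipcontroller-client.json'),
    ]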
464 464
465 465
466 466 IPython on EC2 with StarCluster
467 467 ===============================
468 468
469 469 The excellent StarCluster_ toolkit for managing `Amazon EC2`_ clusters has a plugin
470 470 which makes deploying IPython on EC2 quite simple. The StarCluster plugin uses
471 471 :command:`ipcluster` with the SGE launchers to distribute engines across the
472 472 EC2 cluster. See their `ipcluster plugin documentation`_ for more information.
473 473
474 474 .. _StarCluster: http://web.mit.edu/starcluster
475 475 .. _Amazon EC2: http://aws.amazon.com/ec2/
476 476 .. _ipcluster plugin documentation: http://web.mit.edu/starcluster/docs/latest/plugins/ipython.html
477 477
478 478
479 479 Using the :command:`ipcontroller` and :command:`ipengine` commands
480 480 ==================================================================
481 481
482 482 It is also possible to use the :command:`ipcontroller` and :command:`ipengine`
483 483 commands to start your controller and engines. This approach gives you full
484 484 control over all aspects of the startup process.
485 485
486 486 Starting the controller and engine on your local machine
487 487 --------------------------------------------------------
488 488
489 489 To use :command:`ipcontroller` and :command:`ipengine` to start things on your
490 490 local machine, do the following.
491 491
492 492 First start the controller::
493 493
494 494 $ ipcontroller
495 495
496 496 Next, start however many instances of the engine you want using (repeatedly)
497 497 the command::
498 498
499 499 $ ipengine
500 500
501 501 The engines should start and automatically connect to the controller using the
502 502 JSON files in :file:`IPYTHONDIR/profile_default/security`. You are now ready to use the
503 503 controller and engines from IPython.
504 504
505 505 .. warning::
506 506
507 507 The order of the above operations may be important. You *must*
508 508 start the controller before the engines, unless you are reusing connection
509 509 information (via ``--reuse``), in which case ordering is not important.
510 510
511 511 .. note::
512 512
513 513 On some platforms (OS X), to put the controller and engine into the
514 514 background you may need to give these commands in the form ``(ipcontroller
515 515 &)`` and ``(ipengine &)`` (with the parentheses) for them to work
516 516 properly.
517 517
518 518 Starting the controller and engines on different hosts
519 519 ------------------------------------------------------
520 520
521 521 When the controller and engines are running on different hosts, things are
522 522 slightly more complicated, but the underlying ideas are the same:
523 523
524 524 1. Start the controller on a host using :command:`ipcontroller`. The controller must be
525 525 instructed to listen on an interface visible to the engine machines, via the ``ip``
526 526 command-line argument or ``HubFactory.ip`` in :file:`ipcontroller_config.py`::
527 527
528 528 $ ipcontroller --ip=192.168.1.16
529 529
530 530 .. sourcecode:: python
531 531
532 532 # in ipcontroller_config.py
533 533 c.HubFactory.ip = '192.168.1.16'
534 534
535 535 2. Copy :file:`ipcontroller-engine.json` from :file:`IPYTHONDIR/profile_<name>/security` on
536 536 the controller's host to the host where the engines will run.
537 537 3. Use :command:`ipengine` on the engine's hosts to start the engines.
538 538
539 539 The only thing you have to be careful of is to tell :command:`ipengine` where
540 540 the :file:`ipcontroller-engine.json` file is located. There are two ways you
541 541 can do this:
542 542
543 543 * Put :file:`ipcontroller-engine.json` in the :file:`IPYTHONDIR/profile_<name>/security`
544 544 directory on the engine's host, where it will be found automatically.
545 545 * Call :command:`ipengine` with the ``--file=full_path_to_the_file``
546 546 flag.
547 547
548 548 The ``--file`` flag works like this::
549 549
550 550 $ ipengine --file=/path/to/my/ipcontroller-engine.json
551 551
552 552 .. note::
553 553
554 554 If the controller's and engine's hosts all have a shared file system
555 555 (:file:`IPYTHONDIR/profile_<name>/security` is the same on all of them), then things
556 556 will just work!
557 557
558 558 SSH Tunnels
559 559 ***********
560 560
561 561 If your engines are not on the same LAN as the controller, or you are on a highly
562 562 restricted network where your nodes cannot see each other's ports, then you can
563 563 use SSH tunnels to connect engines to the controller.
564 564
565 565 .. note::
566 566
567 567 This does not work in all cases. Manual tunnels may be an option, but are
568 568 highly inconvenient. Support for manual tunnels will be improved.
569 569
570 570 You can instruct all engines to use ssh, by specifying the ssh server in
571 571 :file:`ipcontroller-engine.json`:
572 572
573 573 .. I know this is really JSON, but the example is a subset of Python:
574 574 .. sourcecode:: python
575 575
576 576 {
577 577 "url":"tcp://192.168.1.123:56951",
578 578 "exec_key":"26f4c040-587d-4a4e-b58b-030b96399584",
579 579 "ssh":"user@example.com",
580 580 "location":"192.168.1.123"
581 581 }
582 582
583 583 This will be specified if you give the ``--enginessh=user@example.com`` argument when
584 584 starting :command:`ipcontroller`.
585 585
586 586 Or you can specify an ssh server on the command-line when starting an engine::
587 587
588 588 $> ipengine --profile=foo --ssh=my.login.node
589 589
590 590 For example, if your system is totally restricted, then all connections will actually be
591 591 loopback, and ssh tunnels will be used to connect engines to the controller::
592 592
593 593 [node1] $> ipcontroller --enginessh=node1
594 594 [node2] $> ipengine
595 595 [node3] $> ipcluster engines --n=4
596 596
597 597 If you want to start many engines on each node, the command ``ipcluster engines --n=4``
598 598 without any configuration is equivalent to running ipengine 4 times.
599 599
600 600 An example using ipcontroller/engine with ssh
601 601 ---------------------------------------------
602 602
603 603 No configuration files are necessary to use ipcontroller/engine in an SSH environment
604 604 without a shared filesystem. You simply need to make sure that the controller is listening
605 605 on an interface visible to the engines, and move the connection file from the controller to
606 606 the engines.
607 607
608 608 1. start the controller, listening on an ip-address visible to the engine machines::
609 609
610 610 [controller.host] $ ipcontroller --ip=192.168.1.16
611 611
612 612 [IPControllerApp] Using existing profile dir: u'/Users/me/.ipython/profile_default'
613 613 [IPControllerApp] Hub listening on tcp://192.168.1.16:63320 for registration.
614 614 [IPControllerApp] Hub using DB backend: 'IPython.parallel.controller.dictdb.DictDB'
615 615 [IPControllerApp] hub::created hub
616 616 [IPControllerApp] writing connection info to /Users/me/.ipython/profile_default/security/ipcontroller-client.json
617 617 [IPControllerApp] writing connection info to /Users/me/.ipython/profile_default/security/ipcontroller-engine.json
618 618 [IPControllerApp] task::using Python leastload Task scheduler
619 619 [IPControllerApp] Heartmonitor started
620 620 [IPControllerApp] Creating pid file: /Users/me/.ipython/profile_default/pid/ipcontroller.pid
621 621 Scheduler started [leastload]
622 622
623 623 2. on each engine, fetch the connection file with scp::
624 624
625 625 [engine.host.n] $ scp controller.host:.ipython/profile_default/security/ipcontroller-engine.json ./
626 626
627 627 .. note::
628 628
629 629 The log output of ipcontroller above shows you where the json files were written.
630 630 They will be in :file:`~/.ipython` under
631 631 :file:`profile_default/security/ipcontroller-engine.json`
632 632
633 633 3. start the engines, using the connection file::
634 634
635 635 [engine.host.n] $ ipengine --file=./ipcontroller-engine.json
636 636
637 637 A couple of notes:
638 638
639 639 * You can avoid having to fetch the connection file every time by adding the ``--reuse`` flag
640 640 to ipcontroller, which instructs the controller to read the previous connection file for
641 641 connection info, rather than generate a new one with randomized ports.
642 642
643 643 * In step 2, if you fetch the connection file directly into the security dir of a profile,
644 644 then you need not specify its path directly, only the profile (this assumes the path exists;
645 645 otherwise you must create it first)::
646 646
647 647 [engine.host.n] $ scp controller.host:.ipython/profile_default/security/ipcontroller-engine.json ~/.ipython/profile_ssh/security/
648 648 [engine.host.n] $ ipengine --profile=ssh
649 649
650 650 Of course, if you fetch the file into the default profile, no arguments need to be passed to
651 651 ipengine at all.
652 652
653 653 * Note that ipengine *did not* specify the ``--ip`` argument. In general, it is unlikely that any
654 654 connection information will need to be specified at the command-line to ipengine, as all of this
655 655 information is contained in the connection file written by ipcontroller.
656 656
657 657 Make JSON files persistent
658 658 --------------------------
659 659
660 660 At first glance it may seem that managing the JSON files is a bit
661 661 annoying. Going back to the house and key analogy, copying the JSON around
662 662 each time you start the controller is like having to make a new key every time
663 663 you want to unlock the door and enter your house. As with your house, you want
664 664 to be able to create the key (or JSON file) once, and then simply use it at
665 665 any point in the future.
666 666
667 667 To do this, the only thing you have to do is specify the ``--reuse`` flag, so that
668 668 the connection information in the JSON files remains accurate::
669 669
670 670 $ ipcontroller --reuse
671 671
672 672 Then, just copy the JSON files over the first time and you are set. You can
673 673 start and stop the controller and engines as many times as you want in the
674 674 future; just make sure to tell the controller to reuse the file.
675 675
676 676 .. note::
677 677
678 678 You may ask the question: what ports does the controller listen on if you
679 679 don't tell it to use specific ones? The default is to use high random port
680 680 numbers. We do this for two reasons: i) to increase security through
681 681 obscurity and ii) to allow multiple controllers on a given host to start and
682 682 automatically use different ports.
683 683
684 684 Log files
685 685 ---------
686 686
687 687 All of the components of IPython have log files associated with them.
688 688 These log files can be extremely useful in debugging problems with
689 689 IPython and can be found in the directory :file:`IPYTHONDIR/profile_<name>/log`.
690 690 Sending the log files to us will often help us to debug any problems.
691 691
692 692
693 693 Configuring `ipcontroller`
694 694 ---------------------------
695 695
696 696 The IPython Controller takes its configuration from the file :file:`ipcontroller_config.py`
697 697 in the active profile directory.
698 698
699 699 Ports and addresses
700 700 *******************
701 701
702 702 In many cases, you will want to configure the Controller's network identity. By default,
703 703 the Controller listens only on loopback, which is the most secure but often impractical.
704 704 To instruct the controller to listen on a specific interface, you can set the
705 705 :attr:`HubFactory.ip` trait. To listen on all interfaces, simply specify:
706 706
707 707 .. sourcecode:: python
708 708
709 709 c.HubFactory.ip = '*'
710 710
711 711 When connecting to a Controller that is listening on loopback or behind a firewall, it may
712 712 be necessary to specify an SSH server to use for tunnels, and the external IP of the
713 713 Controller. If you specified that the HubFactory listen on loopback, or all interfaces,
714 714 then IPython will try to guess the external IP. If you are on a system with VM network
715 715 devices, or many interfaces, this guess may be incorrect. In these cases, you will want
716 716 to specify the 'location' of the Controller. This is the IP of the machine the Controller
717 717 is on, as seen by the clients, engines, or the SSH server used to tunnel connections.
718 718
719 719 For example, to set up a cluster with a Controller on a work node, using ssh tunnels
720 720 through the login node, an example :file:`ipcontroller_config.py` might contain:
721 721
722 722 .. sourcecode:: python
723 723
724 724 # allow connections on all interfaces from engines
725 725 # engines on the same node will use loopback, while engines
726 726 # from other nodes will use an external IP
727 727 c.HubFactory.ip = '*'
728 728
729 729 # you typically only need to specify the location when there are extra
730 730 # interfaces that may not be visible to peer nodes (e.g. VM interfaces)
731 731 c.HubFactory.location = '10.0.1.5'
732 732 # or to get an automatic value, try this:
733 733 import socket
734 734 hostname = socket.gethostname()
735 735 # alternate choices for hostname include `socket.getfqdn()`
736 736 # or `socket.gethostname() + '.local'`
737 737
738 738 ex_ip = socket.gethostbyname_ex(hostname)[-1][-1]
739 739 c.HubFactory.location = ex_ip
740 740
741 741 # now instruct clients to use the login node for SSH tunnels:
742 742 c.HubFactory.ssh_server = 'login.mycluster.net'
743 743
744 744 After doing this, your :file:`ipcontroller-client.json` file will look something like this:
745 745
746 746 .. this can be Python, despite the fact that it's actually JSON, because it's
747 747 .. still valid Python
748 748
749 749 .. sourcecode:: python
750 750
751 751 {
752 752 "url":"tcp:\/\/*:43447",
753 753 "exec_key":"9c7779e4-d08a-4c3b-ba8e-db1f80b562c1",
754 754 "ssh":"login.mycluster.net",
755 755 "location":"10.0.1.5"
756 756 }
757 757
758 758 Then this file will be all you need for a client to connect to the controller, tunneling
759 759 SSH connections through login.mycluster.net.
760 760
761 761 Database Backend
762 762 ****************
763 763
764 764 The Hub stores all messages and results passed between Clients and Engines.
765 765 For large and/or long-running clusters, it would be unreasonable to keep all
766 766 of this information in memory. For this reason, we have two database backends:
767 767 [MongoDB]_ via PyMongo_, and SQLite with the stdlib :py:mod:`sqlite3`.
768 768
769 769 MongoDB is our design target, and the dict-like model it uses has driven our design. As far
770 770 as we are concerned, BSON can be considered essentially the same as JSON, adding support
771 771 for binary data and datetime objects, and any new database backend must support the same
772 772 data types.
773 773
774 774 .. seealso::
775 775
776 776 MongoDB `BSON doc <http://www.mongodb.org/display/DOCS/BSON>`_
777 777
778 778 To use one of these backends, you must set the :attr:`HubFactory.db_class` trait:
779 779
780 780 .. sourcecode:: python
781 781
782 782 # for a simple dict-based in-memory implementation, use dictdb
783 783 # This is the default and the fastest, since it doesn't involve the filesystem
784 784 c.HubFactory.db_class = 'IPython.parallel.controller.dictdb.DictDB'
785 785
786 786 # To use MongoDB:
787 787 c.HubFactory.db_class = 'IPython.parallel.controller.mongodb.MongoDB'
788 788
789 789 # and SQLite:
790 790 c.HubFactory.db_class = 'IPython.parallel.controller.sqlitedb.SQLiteDB'
791 791
792 792 # You can use NoDB to disable the database altogether, in case you don't need
793 793 # to reuse tasks or results, and want to keep memory consumption under control.
794 794 c.HubFactory.db_class = 'IPython.parallel.controller.dictdb.NoDB'
795 795
796 796 When using the proper databases, you can actually allow for tasks to persist from
797 797 one session to the next by specifying the MongoDB database or SQLite table in
798 798 which tasks are to be stored. The default is to use a table named for the Hub's Session,
799 799 which is a UUID, and thus different every time.
800 800
801 801 .. sourcecode:: python
802 802
803 803 # To keep persistent task history in MongoDB:
804 804 c.MongoDB.database = 'tasks'
805 805
806 806 # and in SQLite:
807 807 c.SQLiteDB.table = 'tasks'
808 808
809 809
810 810 Since MongoDB servers can be running remotely or configured to listen on a particular port,
811 811 you can specify any arguments you may need to the PyMongo `Connection
812 812 <http://api.mongodb.org/python/1.9/api/pymongo/connection.html#pymongo.connection.Connection>`_:
813 813
814 814 .. sourcecode:: python
815 815
816 816 # positional args to pymongo.Connection
817 817 c.MongoDB.connection_args = []
818 818
819 819 # keyword args to pymongo.Connection
820 820 c.MongoDB.connection_kwargs = {}
821 821
822 822 But sometimes you are moving lots of data around quickly, and you don't need
823 823 that information to be stored for later access, even by other Clients to this
824 824 same session. For this case, we have a dummy database, which doesn't actually
825 825 store anything. This lets the Hub stay small in memory, at the obvious expense
826 826 of being able to access the information that would have been stored in the
827 827 database (used for task resubmission, requesting results of tasks you didn't
828 828 submit, etc.). To use this backend, simply pass ``--nodb`` to
829 829 :command:`ipcontroller` on the command-line, or specify the :class:`NoDB` class
830 830 in your :file:`ipcontroller_config.py` as described above.
831 831
832 832
833 833 .. seealso::
834 834
835 835 For more information on the database backends, see the :ref:`db backend reference <parallel_db>`.
836 836
837 837
838 838 .. _PyMongo: http://api.mongodb.org/python/1.9/
839 839
840 840 Configuring `ipengine`
841 841 -----------------------
842 842
843 843 The IPython Engine takes its configuration from the file :file:`ipengine_config.py`.
844 844
845 845 The Engine itself also has some amount of configuration. Most of this
846 846 has to do with initializing MPI or connecting to the controller.
847 847
848 848 To instruct the Engine to initialize with an MPI environment set up by
849 849 mpi4py, add:
850 850
851 851 .. sourcecode:: python
852 852
853 853 c.MPI.use = 'mpi4py'
854 854
855 855 In this case, the Engine will use our default mpi4py init script to set up
856 856 the MPI environment prior to execution. We have default init scripts for
857 857 mpi4py and pytrilinos. If you want to specify your own code to be run
858 858 at the beginning, specify ``c.MPI.init_script``.
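
For example, a sketch of supplying your own initialization code, assuming ``init_script``
takes a string of Python source (the snippet itself is only an illustration):

.. sourcecode:: python

    # code run in the engine before anything else, replacing the
    # builtin mpi4py/pytrilinos init snippets
    c.MPI.init_script = "from mpi4py import MPI"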
859 859
860 860 You can also specify a file or Python command to be run at startup of the
861 861 Engine:
862 862
863 863 .. sourcecode:: python
864 864
865 865 c.IPEngineApp.startup_script = u'/path/to/my/startup.py'
866 866
867 867 c.IPEngineApp.startup_command = 'import numpy, scipy, mpi4py'
868 868
869 869 These commands/files will be run again after each restart of the engine.
870 870
871 871 It's also useful on systems with shared filesystems to run the engines
872 872 in some scratch directory. This can be set with:
873 873
874 874 .. sourcecode:: python
875 875
876 876 c.IPEngineApp.work_dir = u'/path/to/scratch/'
877 877
878 878
879 879
880 880 .. [MongoDB] MongoDB database http://www.mongodb.org
881 881
882 882 .. [PBS] Portable Batch System http://www.openpbs.org
883 883
884 884 .. [SSH] SSH-Agent http://en.wikipedia.org/wiki/ssh-agent