1 1 .. _parallel_process:
2 2
3 3 ===========================================
4 4 Starting the IPython controller and engines
5 5 ===========================================
6 6
7 7 To use IPython for parallel computing, you need to start one instance of
8 8 the controller and one or more instances of the engine. The controller
9 9 and each engine can run on different machines or on the same machine.
10 10 Because of this, there are many different possibilities.
11 11
12 12 Broadly speaking, there are two ways of going about starting a controller and engines:
13 13
14 14 * In an automated manner using the :command:`ipcluster` command.
15 15 * In a more manual way using the :command:`ipcontroller` and
16 16 :command:`ipengine` commands.
17 17
18 18 This document describes both of these methods. We recommend that new users
19 19 start with the :command:`ipcluster` command as it simplifies many common usage
20 20 cases.
21 21
22 22 General considerations
23 23 ======================
24 24
25 25 Before delving into the details about how you can start a controller and
26 26 engines using the various methods, we outline some of the general issues that
27 27 come up when starting the controller and engines. These things come up no
28 28 matter which method you use to start your IPython cluster.
29 29
30 30 If you are running engines on multiple machines, you will likely need to instruct the
31 31 controller to listen for connections on an external interface. This can be done by specifying
32 32 the ``ip`` argument on the command-line, or the ``HubFactory.ip`` configurable in
33 33 :file:`ipcontroller_config.py`.
34 34
35 35 If your machines are on a trusted network, you can safely instruct the controller to listen
36 36 on all public interfaces with::
37 37
38 38 $> ipcontroller --ip=*
39 39
Or you can make this behavior the default by adding the following line to your :file:`ipcontroller_config.py`:
41 41
42 42 .. sourcecode:: python
43 43
44 44 c.HubFactory.ip = '*'
45 45
46 46 .. note::
47 47
    Due to the lack of security in ZeroMQ, the controller will only listen for connections on
    localhost by default. If you see Timeout errors on engines or clients, the first
    thing you should check is the IP address the controller is listening on, and make sure
    that it is visible from the machine that is timing out.
52 52
53 53 .. seealso::
54 54
    Our :ref:`notes <parallel_security>` on security in the new parallel computing code.
56 56
57 57 Let's say that you want to start the controller on ``host0`` and engines on
58 58 hosts ``host1``-``hostn``. The following steps are then required:
59 59
60 60 1. Start the controller on ``host0`` by running :command:`ipcontroller` on
61 61 ``host0``. The controller must be instructed to listen on an interface visible
62 62 to the engine machines, via the ``ip`` command-line argument or ``HubFactory.ip``
63 63 in :file:`ipcontroller_config.py`.
64 64 2. Move the JSON file (:file:`ipcontroller-engine.json`) created by the
65 65 controller from ``host0`` to hosts ``host1``-``hostn``.
66 66 3. Start the engines on hosts ``host1``-``hostn`` by running
67 67 :command:`ipengine`. This command has to be told where the JSON file
68 68 (:file:`ipcontroller-engine.json`) is located.
69 69
70 70 At this point, the controller and engines will be connected. By default, the JSON files
71 71 created by the controller are put into the :file:`~/.ipython/profile_default/security`
72 72 directory. If the engines share a filesystem with the controller, step 2 can be skipped as
73 73 the engines will automatically look at that location.
74 74
75 75 The final step required to actually use the running controller from a client is to move
76 76 the JSON file :file:`ipcontroller-client.json` from ``host0`` to any host where clients
will be run. If this file is put into the :file:`~/.ipython/profile_default/security`
directory of the client's host, it will be found automatically. Otherwise, the full path
to it has to be passed to the client's constructor.
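
For example, a minimal client-side sketch (the path below is only illustrative) looks like:

.. sourcecode:: python

    from IPython.parallel import Client

    # found automatically if it is in ~/.ipython/profile_default/security
    rc = Client()

    # otherwise, pass the full path to the connection file explicitly
    rc = Client('/path/to/ipcontroller-client.json')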
80 80
81 81 Using :command:`ipcluster`
82 82 ===========================
83 83
84 84 The :command:`ipcluster` command provides a simple way of starting a
85 85 controller and engines in the following situations:
86 86
87 87 1. When the controller and engines are all run on localhost. This is useful
88 88 for testing or running on a multicore computer.
2. When engines are started using the :command:`mpiexec` command that comes
   with most MPI [MPI]_ implementations.
3. When engines are started using the PBS [PBS]_ batch system
   (or other :command:`qsub` systems, such as SGE).
93 93 4. When the controller is started on localhost and the engines are started on
94 94 remote nodes using :command:`ssh`.
95 95 5. When engines are started using the Windows HPC Server batch system.
96 96
97 97 .. note::
98 98
99 99 Currently :command:`ipcluster` requires that the
100 100 :file:`~/.ipython/profile_<name>/security` directory live on a shared filesystem that is
101 101 seen by both the controller and engines. If you don't have a shared file
102 102 system you will need to use :command:`ipcontroller` and
103 103 :command:`ipengine` directly.
104 104
105 105 Under the hood, :command:`ipcluster` just uses :command:`ipcontroller`
106 106 and :command:`ipengine` to perform the steps described above.
107 107
108 108 The simplest way to use ipcluster requires no configuration, and will
109 109 launch a controller and a number of engines on the local machine. For instance,
110 110 to start one controller and 4 engines on localhost, just do::
111 111
112 112 $ ipcluster start -n 4
113 113
114 114 To see other command line options, do::
115 115
116 116 $ ipcluster -h
117 117
118 118
119 119 Configuring an IPython cluster
120 120 ==============================
121 121
122 122 Cluster configurations are stored as `profiles`. You can create a new profile with::
123 123
124 124 $ ipython profile create --parallel --profile=myprofile
125 125
126 126 This will create the directory :file:`IPYTHONDIR/profile_myprofile`, and populate it
127 127 with the default configuration files for the three IPython cluster commands. Once
you edit those files, you can continue to call ipcluster/ipcontroller/ipengine
with no arguments beyond ``--profile=myprofile``, and any configuration will be maintained.
130 130
131 131 There is no limit to the number of profiles you can have, so you can maintain a profile for each
132 132 of your common use cases. The default profile will be used whenever the
133 133 profile argument is not specified, so edit :file:`IPYTHONDIR/profile_default/*_config.py` to
134 134 represent your most common use case.
135 135
136 136 The configuration files are loaded with commented-out settings and explanations,
137 137 which should cover most of the available possibilities.
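
For instance, after creating ``myprofile`` above, a minimal edit to
:file:`IPYTHONDIR/profile_myprofile/ipcluster_config.py` might just uncomment and set one
value (the number here is only illustrative):

.. sourcecode:: python

    # ipcluster_config.py
    c = get_config()

    # default number of engines to launch with this profile
    c.IPClusterEngines.n = 8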
138 138
139 139 Using various batch systems with :command:`ipcluster`
140 140 -----------------------------------------------------
141 141
142 142 :command:`ipcluster` has a notion of Launchers that can start controllers
143 143 and engines with various remote execution schemes. Currently supported
144 144 models include :command:`ssh`, :command:`mpiexec`, PBS-style (Torque, SGE, LSF),
145 145 and Windows HPC Server.
146 146
In general, these are configured by the :attr:`IPClusterEngines.engine_launcher_class`
and :attr:`IPClusterStart.controller_launcher_class` configurables, which can be the
fully specified object name (e.g. ``'IPython.parallel.apps.launcher.LocalControllerLauncher'``),
but if you are using IPython's builtin launchers, you can specify just the class name,
or even just the prefix, e.g.:
152 152
153 153 .. sourcecode:: python
154 154
155 155 c.IPClusterEngines.engine_launcher_class = 'SSH'
156 156 # equivalent to
157 157 c.IPClusterEngines.engine_launcher_class = 'SSHEngineSetLauncher'
158 158 # both of which expand to
159 159 c.IPClusterEngines.engine_launcher_class = 'IPython.parallel.apps.launcher.SSHEngineSetLauncher'
160 160
The shortest form is particularly useful on the command line, where all you need to do to
get an IPython cluster running with engines started with MPI is:
163 163
164 164 .. sourcecode:: bash
165 165
166 166 $> ipcluster start --engines=MPI
167 167
This assumes that the default MPI configuration is sufficient.
169 169
170 170 .. note::
171 171
172 172 shortcuts for builtin launcher names were added in 0.12, as was the ``_class`` suffix
173 173 on the configurable names. If you use the old 0.11 names (e.g. ``engine_set_launcher``),
174 174 they will still work, but you will get a deprecation warning that the name has changed.
175 175
176 176
177 177 .. note::
178 178
    The Launchers and configuration are designed in such a way that advanced
    users can subclass and configure them to fit their own systems that we
    have not yet supported (such as Condor).
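
As a sketch of what that looks like, a custom launcher living in your own package (the
module and class names below are hypothetical) is selected by its fully specified name,
just like the builtin ones:

.. sourcecode:: python

    # 'mypackage.launchers.CondorEngineSetLauncher' is a hypothetical subclass of
    # one of the launchers in IPython.parallel.apps.launcher
    c.IPClusterEngines.engine_launcher_class = 'mypackage.launchers.CondorEngineSetLauncher'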
182 182
183 183 Using :command:`ipcluster` in mpiexec/mpirun mode
184 184 -------------------------------------------------
185 185
186 186
The mpiexec/mpirun mode is useful if:

1. You have MPI installed.
2. Your systems are configured to use the :command:`mpiexec` or
   :command:`mpirun` commands to start MPI processes.
192 192
193 193 If these are satisfied, you can create a new profile::
194 194
195 195 $ ipython profile create --parallel --profile=mpi
196 196
197 197 and edit the file :file:`IPYTHONDIR/profile_mpi/ipcluster_config.py`.
198 198
There, instruct ipcluster to use the MPI launcher by adding the line:
200 200
201 201 .. sourcecode:: python
202 202
203 203 c.IPClusterEngines.engine_launcher_class = 'MPIEngineSetLauncher'
204 204
205 205 If the default MPI configuration is correct, then you can now start your cluster, with::
206 206
207 207 $ ipcluster start -n 4 --profile=mpi
208 208
209 209 This does the following:
210 210
1. Starts the IPython controller on the current host.
212 212 2. Uses :command:`mpiexec` to start 4 engines.
213 213
If you have a reason to also start the Controller with MPI, you can specify:
215 215
216 216 .. sourcecode:: python
217 217
218 218 c.IPClusterStart.controller_launcher_class = 'MPIControllerLauncher'
219 219
220 220 .. note::
221 221
222 222 The Controller *will not* be in the same MPI universe as the engines, so there is not
223 223 much reason to do this unless sysadmins demand it.
224 224
225 225 On newer MPI implementations (such as OpenMPI), this will work even if you
226 226 don't make any calls to MPI or call :func:`MPI_Init`. However, older MPI
227 227 implementations actually require each process to call :func:`MPI_Init` upon
starting. The easiest way to do this is to install the mpi4py
[mpi4py]_ package and then specify the ``c.MPI.use`` option in :file:`ipengine_config.py`:
230 230
231 231 .. sourcecode:: python
232 232
233 233 c.MPI.use = 'mpi4py'
234 234
235 235 Unfortunately, even this won't work for some MPI implementations. If you are
236 236 having problems with this, you will likely have to use a custom Python
237 237 executable that itself calls :func:`MPI_Init` at the appropriate time.
238 238 Fortunately, mpi4py comes with such a custom Python executable that is easy to
239 239 install and use. However, this custom Python executable approach will not work
240 240 with :command:`ipcluster` currently.
241 241
242 242 More details on using MPI with IPython can be found :ref:`here <parallelmpi>`.
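
As a quick sanity check that the engines really came up inside MPI, a minimal client-side
sketch (assuming mpi4py is installed on the engines) is:

.. sourcecode:: python

    from IPython.parallel import Client

    rc = Client(profile='mpi')
    view = rc[:]

    def mpi_rank():
        from mpi4py import MPI
        return MPI.COMM_WORLD.Get_rank()

    # each engine should report a distinct rank in the same MPI universe
    print(view.apply_sync(mpi_rank))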
243 243
244 244
245 245 Using :command:`ipcluster` in PBS mode
246 246 --------------------------------------
247 247
248 248 The PBS mode uses the Portable Batch System (PBS) to start the engines.
249 249
250 250 As usual, we will start by creating a fresh profile::
251 251
252 252 $ ipython profile create --parallel --profile=pbs
253 253
254 254 And in :file:`ipcluster_config.py`, we will select the PBS launchers for the controller
255 255 and engines:
256 256
257 257 .. sourcecode:: python
258 258
259 259 c.IPClusterStart.controller_launcher_class = 'PBSControllerLauncher'
260 260 c.IPClusterEngines.engine_launcher_class = 'PBSEngineSetLauncher'
261 261
262 262 .. note::
263 263
    The configurable is ``IPClusterEngines`` for the engine launcher, and
    ``IPClusterStart`` for the controller launcher. This is because the start command is a
    subclass of the engine command, adding a controller launcher. Since it is a subclass,
    any configuration made in ``IPClusterEngines`` is inherited by ``IPClusterStart`` unless it is
    overridden.
269 269
270 270 IPython does provide simple default batch templates for PBS and SGE, but you may need
271 271 to specify your own. Here is a sample PBS script template:
272 272
273 273 .. sourcecode:: bash
274 274
275 275 #PBS -N ipython
276 276 #PBS -j oe
277 277 #PBS -l walltime=00:10:00
278 278 #PBS -l nodes={n/4}:ppn=4
279 279 #PBS -q {queue}
280 280
281 281 cd $PBS_O_WORKDIR
282 282 export PATH=$HOME/usr/local/bin
283 283 export PYTHONPATH=$HOME/usr/local/lib/python2.7/site-packages
284 284 /usr/local/bin/mpiexec -n {n} ipengine --profile-dir={profile_dir}
285 285
286 286 There are a few important points about this template:
287 287
288 288 1. This template will be rendered at runtime using IPython's :class:`EvalFormatter`.
289 289 This is simply a subclass of :class:`string.Formatter` that allows simple expressions
290 290 on keys.
291 291
292 292 2. Instead of putting in the actual number of engines, use the notation
293 293 ``{n}`` to indicate the number of engines to be started. You can also use
294 294 expressions like ``{n/4}`` in the template to indicate the number of nodes.
295 295 There will always be ``{n}`` and ``{profile_dir}`` variables passed to the formatter.
   These allow the batch system to know how many engines to launch, and where the configuration
   files reside. The same is true for the batch queue, with the template variable
298 298 ``{queue}``.
299 299
300 300 3. Any options to :command:`ipengine` can be given in the batch script
301 301 template, or in :file:`ipengine_config.py`.
302 302
4. Depending on the configuration of your system, you may have to set
   environment variables in the script template.
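
To illustrate points 1 and 2, you can preview how the template variables get filled in with
:class:`EvalFormatter` (this sketch assumes it is importable from :mod:`IPython.utils.text`,
where the launchers take it from):

.. sourcecode:: python

    from IPython.utils.text import EvalFormatter

    # preview how {n}, {n/4}, and {queue} would be rendered for 16 engines
    line = "mpiexec -n {n} ipengine --profile-dir={profile_dir}  # nodes: {n/4}, queue: {queue}"
    print(EvalFormatter().format(line, n=16, queue='veryshort.q',
                                 profile_dir='/home/me/.ipython/profile_pbs'))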
305 305
306 306 The controller template should be similar, but simpler:
307 307
308 308 .. sourcecode:: bash
309 309
310 310 #PBS -N ipython
311 311 #PBS -j oe
312 312 #PBS -l walltime=00:10:00
313 313 #PBS -l nodes=1:ppn=4
314 314 #PBS -q {queue}
315 315
316 316 cd $PBS_O_WORKDIR
317 317 export PATH=$HOME/usr/local/bin
318 318 export PYTHONPATH=$HOME/usr/local/lib/python2.7/site-packages
319 319 ipcontroller --profile-dir={profile_dir}
320 320
321 321
322 322 Once you have created these scripts, save them with names like
323 323 :file:`pbs.engine.template`. Now you can load them into the :file:`ipcluster_config` with:
324 324
325 325 .. sourcecode:: python
326 326
327 327 c.PBSEngineSetLauncher.batch_template_file = "pbs.engine.template"
328 328
329 329 c.PBSControllerLauncher.batch_template_file = "pbs.controller.template"
330 330
331 331
Alternatively, you can define the templates as strings inside :file:`ipcluster_config`.
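
For example (a sketch, assuming the launchers' ``batch_template`` trait, the string
counterpart of ``batch_template_file``):

.. sourcecode:: python

    # define the engine batch script inline instead of in a separate file
    c.PBSEngineSetLauncher.batch_template = "\n".join([
        "#PBS -N ipython",
        "#PBS -l nodes={n/4}:ppn=4",
        "#PBS -q {queue}",
        "cd $PBS_O_WORKDIR",
        "mpiexec -n {n} ipengine --profile-dir={profile_dir}",
    ])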
333 333
Whether you are using your own templates or our defaults, the extra configurables available are
the number of engines to launch (``{n}``) and the batch system queue to which the jobs are to be
submitted (``{queue}``). Both can be specified in
:file:`ipcluster_config`:
338 338
339 339 .. sourcecode:: python
340 340
341 341 c.PBSLauncher.queue = 'veryshort.q'
342 342 c.IPClusterEngines.n = 64
343 343
344 344 Note that assuming you are running PBS on a multi-node cluster, the Controller's default behavior
345 345 of listening only on localhost is likely too restrictive. In this case, also assuming the
346 346 nodes are safely behind a firewall, you can simply instruct the Controller to listen for
347 347 connections on all its interfaces, by adding in :file:`ipcontroller_config`:
348 348
349 349 .. sourcecode:: python
350 350
351 351 c.HubFactory.ip = '*'
352 352
353 353 You can now run the cluster with::
354 354
355 355 $ ipcluster start --profile=pbs -n 128
356 356
357 357 Additional configuration options can be found in the PBS section of :file:`ipcluster_config`.
358 358
359 359 .. note::
360 360
361 361 Due to the flexibility of configuration, the PBS launchers work with simple changes
362 362 to the template for other :command:`qsub`-using systems, such as Sun Grid Engine,
363 363 and with further configuration in similar batch systems like Condor.
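
For example, on a Sun Grid Engine cluster you might simply select the SGE launchers instead
(a sketch, assuming the builtin SGE launchers and their default templates suit your site):

.. sourcecode:: python

    c.IPClusterStart.controller_launcher_class = 'SGEControllerLauncher'
    c.IPClusterEngines.engine_launcher_class = 'SGEEngineSetLauncher'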
364 364
365 365
366 366 Using :command:`ipcluster` in SSH mode
367 367 --------------------------------------
368 368
369 369
The SSH mode uses :command:`ssh` to execute :command:`ipengine` on remote
nodes; :command:`ipcontroller` can be run remotely as well, or on localhost.
372 372
373 373 .. note::
374 374
    When using this mode it is highly recommended that you have set up SSH keys
    and are using ssh-agent [SSH]_ for password-less logins.
377 377
378 378 As usual, we start by creating a clean profile::
379 379
380 380 $ ipython profile create --parallel --profile=ssh
381 381
382 382 To use this mode, select the SSH launchers in :file:`ipcluster_config.py`:
383 383
384 384 .. sourcecode:: python
385 385
386 386 c.IPClusterEngines.engine_launcher_class = 'SSHEngineSetLauncher'
387 387 # and if the Controller is also to be remote:
388 388 c.IPClusterStart.controller_launcher_class = 'SSHControllerLauncher'
389 389
390 390
391 391
392 392 The controller's remote location and configuration can be specified:
393 393
394 394 .. sourcecode:: python
395 395
396 396 # Set the user and hostname for the controller
397 397 # c.SSHControllerLauncher.hostname = 'controller.example.com'
398 398 # c.SSHControllerLauncher.user = os.environ.get('USER','username')
399 399
400 400 # Set the arguments to be passed to ipcontroller
401 401 # note that remotely launched ipcontroller will not get the contents of
402 402 # the local ipcontroller_config.py unless it resides on the *remote host*
403 403 # in the location specified by the `profile-dir` argument.
404 404 # c.SSHControllerLauncher.controller_args = ['--reuse', '--ip=*', '--profile-dir=/path/to/cd']
405 405
406 406 .. note::
407 407
408 408 SSH mode does not do any file movement, so you will need to distribute configuration
409 409 files manually. To aid in this, the `reuse_files` flag defaults to True for ssh-launched
410 410 Controllers, so you will only need to do this once, unless you override this flag back
411 411 to False.
412 412
413 413 Engines are specified in a dictionary, by hostname and the number of engines to be run
414 414 on that host.
415 415
416 416 .. sourcecode:: python
417 417
418 418 c.SSHEngineSetLauncher.engines = { 'host1.example.com' : 2,
419 419 'host2.example.com' : 5,
420 420 'host3.example.com' : (1, ['--profile-dir=/home/different/location']),
421 421 'host4.example.com' : 8 }
422 422
* The `engines` dict, where the keys are the hosts we want to run engines on and
  the values are the number of engines to run on each host.
* On host3, the value is a tuple, where the number of engines comes first, and the arguments
  to be passed to :command:`ipengine` are the second element.
427 427
428 428 For engines without explicitly specified arguments, the default arguments are set in
429 429 a single location:
430 430
431 431 .. sourcecode:: python
432 432
433 433 c.SSHEngineSetLauncher.engine_args = ['--profile-dir=/path/to/profile_ssh']
434 434
435 435 Current limitations of the SSH mode of :command:`ipcluster` are:
436 436
* Untested on Windows. It would require a working :command:`ssh` on Windows.
  Also, we are using shell scripts to set up and execute commands on remote
  hosts.
* No file movement - this is a regression from 0.10, which moved connection files
  around with scp. This will be improved; pull requests are welcome.
442 442
443 443
444 IPython on EC2 with StarCluster
445 ===============================
446
447 The excellent StarCluster_ toolkit for managing `Amazon EC2`_ clusters has a plugin
which makes deploying IPython on EC2 quite simple. The StarCluster plugin uses
449 :command:`ipcluster` with the SGE launchers to distribute engines across the
450 EC2 cluster. See their `ipcluster plugin documentation`_ for more information.
451
452 .. _StarCluster: http://web.mit.edu/starcluster
453 .. _Amazon EC2: http://aws.amazon.com/ec2/
454 .. _ipcluster plugin documentation: http://web.mit.edu/starcluster/docs/latest/plugins/ipython.html
455
456
444 457 Using the :command:`ipcontroller` and :command:`ipengine` commands
445 458 ==================================================================
446 459
447 460 It is also possible to use the :command:`ipcontroller` and :command:`ipengine`
448 461 commands to start your controller and engines. This approach gives you full
449 462 control over all aspects of the startup process.
450 463
451 464 Starting the controller and engine on your local machine
452 465 --------------------------------------------------------
453 466
454 467 To use :command:`ipcontroller` and :command:`ipengine` to start things on your
455 468 local machine, do the following.
456 469
457 470 First start the controller::
458 471
459 472 $ ipcontroller
460 473
461 474 Next, start however many instances of the engine you want using (repeatedly)
462 475 the command::
463 476
464 477 $ ipengine
465 478
466 479 The engines should start and automatically connect to the controller using the
467 480 JSON files in :file:`~/.ipython/profile_default/security`. You are now ready to use the
468 481 controller and engines from IPython.
469 482
470 483 .. warning::
471 484
472 485 The order of the above operations may be important. You *must*
473 486 start the controller before the engines, unless you are reusing connection
474 487 information (via ``--reuse``), in which case ordering is not important.
475 488
476 489 .. note::
477 490
478 491 On some platforms (OS X), to put the controller and engine into the
479 492 background you may need to give these commands in the form ``(ipcontroller
480 493 &)`` and ``(ipengine &)`` (with the parentheses) for them to work
481 494 properly.
482 495
483 496 Starting the controller and engines on different hosts
484 497 ------------------------------------------------------
485 498
486 499 When the controller and engines are running on different hosts, things are
487 500 slightly more complicated, but the underlying ideas are the same:
488 501
489 502 1. Start the controller on a host using :command:`ipcontroller`. The controller must be
490 503 instructed to listen on an interface visible to the engine machines, via the ``ip``
491 504 command-line argument or ``HubFactory.ip`` in :file:`ipcontroller_config.py`::
492 505
493 506 $ ipcontroller --ip=192.168.1.16
494 507
495 508 .. sourcecode:: python
496 509
      # in ipcontroller_config.py
      c.HubFactory.ip = '192.168.1.16'
499 512
500 513 2. Copy :file:`ipcontroller-engine.json` from :file:`~/.ipython/profile_<name>/security` on
501 514 the controller's host to the host where the engines will run.
502 515 3. Use :command:`ipengine` on the engine's hosts to start the engines.
503 516
504 517 The only thing you have to be careful of is to tell :command:`ipengine` where
505 518 the :file:`ipcontroller-engine.json` file is located. There are two ways you
506 519 can do this:
507 520
508 521 * Put :file:`ipcontroller-engine.json` in the :file:`~/.ipython/profile_<name>/security`
509 522 directory on the engine's host, where it will be found automatically.
510 523 * Call :command:`ipengine` with the ``--file=full_path_to_the_file``
511 524 flag.
512 525
513 526 The ``file`` flag works like this::
514 527
515 528 $ ipengine --file=/path/to/my/ipcontroller-engine.json
516 529
517 530 .. note::
518 531
519 532 If the controller's and engine's hosts all have a shared file system
520 533 (:file:`~/.ipython/profile_<name>/security` is the same on all of them), then things
521 534 will just work!
522 535
523 536 SSH Tunnels
524 537 ***********
525 538
526 539 If your engines are not on the same LAN as the controller, or you are on a highly
restricted network where your nodes cannot see each other's ports, then you can
528 541 use SSH tunnels to connect engines to the controller.
529 542
530 543 .. note::
531 544
532 545 This does not work in all cases. Manual tunnels may be an option, but are
533 546 highly inconvenient. Support for manual tunnels will be improved.
534 547
535 548 You can instruct all engines to use ssh, by specifying the ssh server in
536 549 :file:`ipcontroller-engine.json`:
537 550
538 551 .. I know this is really JSON, but the example is a subset of Python:
539 552 .. sourcecode:: python
540 553
541 554 {
542 555 "url":"tcp://192.168.1.123:56951",
543 556 "exec_key":"26f4c040-587d-4a4e-b58b-030b96399584",
544 557 "ssh":"user@example.com",
545 558 "location":"192.168.1.123"
546 559 }
547 560
This will be specified if you give the ``--enginessh=user@example.com`` argument when
starting :command:`ipcontroller`.
550 563
551 564 Or you can specify an ssh server on the command-line when starting an engine::
552 565
553 566 $> ipengine --profile=foo --ssh=my.login.node
554 567
555 568 For example, if your system is totally restricted, then all connections will actually be
556 569 loopback, and ssh tunnels will be used to connect engines to the controller::
557 570
558 571 [node1] $> ipcontroller --enginessh=node1
559 572 [node2] $> ipengine
560 573 [node3] $> ipcluster engines --n=4
561 574
If you want to start many engines on each node, the command ``ipcluster engines --n=4``
without any configuration is equivalent to running ipengine four times.
564 577
565 578 An example using ipcontroller/engine with ssh
566 579 ---------------------------------------------
567 580
568 581 No configuration files are necessary to use ipcontroller/engine in an SSH environment
569 582 without a shared filesystem. You simply need to make sure that the controller is listening
570 583 on an interface visible to the engines, and move the connection file from the controller to
571 584 the engines.
572 585
1. Start the controller, listening on an IP address visible to the engine machines::
574 587
575 588 [controller.host] $ ipcontroller --ip=192.168.1.16
576 589
577 590 [IPControllerApp] Using existing profile dir: u'/Users/me/.ipython/profile_default'
578 591 [IPControllerApp] Hub listening on tcp://192.168.1.16:63320 for registration.
579 592 [IPControllerApp] Hub using DB backend: 'IPython.parallel.controller.dictdb.DictDB'
580 593 [IPControllerApp] hub::created hub
581 594 [IPControllerApp] writing connection info to /Users/me/.ipython/profile_default/security/ipcontroller-client.json
582 595 [IPControllerApp] writing connection info to /Users/me/.ipython/profile_default/security/ipcontroller-engine.json
583 596 [IPControllerApp] task::using Python leastload Task scheduler
584 597 [IPControllerApp] Heartmonitor started
585 598 [IPControllerApp] Creating pid file: /Users/me/.ipython/profile_default/pid/ipcontroller.pid
586 599 Scheduler started [leastload]
587 600
2. On each engine, fetch the connection file with scp::
589 602
590 603 [engine.host.n] $ scp controller.host:.ipython/profile_default/security/ipcontroller-engine.json ./
591 604
592 605 .. note::
593 606
594 607 The log output of ipcontroller above shows you where the json files were written.
595 608 They will be in :file:`~/.ipython` (or :file:`~/.config/ipython`) under
596 609 :file:`profile_default/security/ipcontroller-engine.json`
597 610
3. Start the engines, using the connection file::
599 612
600 613 [engine.host.n] $ ipengine --file=./ipcontroller-engine.json
601 614
602 615 A couple of notes:
603 616
* You can avoid having to fetch the connection file every time by adding the ``--reuse`` flag
  to ipcontroller, which instructs the controller to read the previous connection file for
  connection info, rather than generate a new one with randomized ports.
607 620
* In step 2, if you fetch the connection file directly into the security dir of a profile,
  then you need not specify its path directly, only the profile (assuming the path exists;
  otherwise you must create it first)::
611 624
612 625 [engine.host.n] $ scp controller.host:.ipython/profile_default/security/ipcontroller-engine.json ~/.ipython/profile_ssh/security/
613 626 [engine.host.n] $ ipengine --profile=ssh
614 627
  Of course, if you fetch the file into the default profile, no arguments need to be passed to
  ipengine at all.
617 630
* Note that ipengine *did not* specify the ip argument. In general, connection
  information should not need to be specified on the ipengine command line, as all of this
  information is contained in the connection file written by ipcontroller.
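
The same applies on the client side. A minimal sketch, assuming the client machine cannot
reach the controller's IP directly and so tunnels through it with ssh (hostname and path are
illustrative):

.. sourcecode:: python

    from IPython.parallel import Client

    # tunnel client connections through the controller's host via ssh
    rc = Client('/path/to/ipcontroller-client.json', sshserver='me@controller.host')
    print(rc.ids)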
621 634
622 635 Make JSON files persistent
623 636 --------------------------
624 637
At first glance it may seem that managing the JSON files is a bit
annoying. Think of the connection file as a key to your house: copying the JSON around
each time you start the controller is like having to make a new key every time
you want to unlock the door and enter your house. As with your house, you want
to be able to create the key (or JSON file) once, and then simply use it at
any point in the future.
631 644
To do this, the only thing you have to do is specify the ``--reuse`` flag, so that
633 646 the connection information in the JSON files remains accurate::
634 647
635 648 $ ipcontroller --reuse
636 649
Then, just copy the JSON files over the first time and you are set. You can
start and stop the controller and engines as many times as you want in the
future; just make sure to tell the controller to reuse the file.
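
If you prefer configuration files over command-line flags, the same thing can be expressed
in :file:`ipcontroller_config.py` (a sketch, assuming ``--reuse`` maps to the ``reuse_files``
trait of the controller application):

.. sourcecode:: python

    # always reuse existing connection files instead of generating new ones
    c.IPControllerApp.reuse_files = True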
640 653
641 654 .. note::
642 655
    You may ask the question: what ports does the controller listen on if you
    don't tell it to use specific ones? The default is to use high random port
    numbers. We do this for two reasons: i) to increase security through
    obscurity and ii) to allow multiple controllers on a given host to start and
    automatically use different ports.
648 661
649 662 Log files
650 663 ---------
651 664
652 665 All of the components of IPython have log files associated with them.
653 666 These log files can be extremely useful in debugging problems with
654 667 IPython and can be found in the directory :file:`~/.ipython/profile_<name>/log`.
655 668 Sending the log files to us will often help us to debug any problems.
656 669
657 670
658 671 Configuring `ipcontroller`
659 672 ---------------------------
660 673
661 674 The IPython Controller takes its configuration from the file :file:`ipcontroller_config.py`
662 675 in the active profile directory.
663 676
664 677 Ports and addresses
665 678 *******************
666 679
667 680 In many cases, you will want to configure the Controller's network identity. By default,
668 681 the Controller listens only on loopback, which is the most secure but often impractical.
669 682 To instruct the controller to listen on a specific interface, you can set the
670 683 :attr:`HubFactory.ip` trait. To listen on all interfaces, simply specify:
671 684
672 685 .. sourcecode:: python
673 686
674 687 c.HubFactory.ip = '*'
675 688
676 689 When connecting to a Controller that is listening on loopback or behind a firewall, it may
677 690 be necessary to specify an SSH server to use for tunnels, and the external IP of the
678 691 Controller. If you specified that the HubFactory listen on loopback, or all interfaces,
679 692 then IPython will try to guess the external IP. If you are on a system with VM network
680 693 devices, or many interfaces, this guess may be incorrect. In these cases, you will want
681 694 to specify the 'location' of the Controller. This is the IP of the machine the Controller
682 695 is on, as seen by the clients, engines, or the SSH server used to tunnel connections.
683 696
684 697 For example, to set up a cluster with a Controller on a work node, using ssh tunnels
685 698 through the login node, an example :file:`ipcontroller_config.py` might contain:
686 699
687 700 .. sourcecode:: python
688 701
689 702 # allow connections on all interfaces from engines
690 703 # engines on the same node will use loopback, while engines
691 704 # from other nodes will use an external IP
692 705 c.HubFactory.ip = '*'
693 706
694 707 # you typically only need to specify the location when there are extra
695 708 # interfaces that may not be visible to peer nodes (e.g. VM interfaces)
696 709 c.HubFactory.location = '10.0.1.5'
697 710 # or to get an automatic value, try this:
698 711 import socket
699 712 ex_ip = socket.gethostbyname_ex(socket.gethostname())[-1][0]
700 713 c.HubFactory.location = ex_ip
701 714
702 715 # now instruct clients to use the login node for SSH tunnels:
703 716 c.HubFactory.ssh_server = 'login.mycluster.net'
704 717
705 718 After doing this, your :file:`ipcontroller-client.json` file will look something like this:
706 719
707 720 .. this can be Python, despite the fact that it's actually JSON, because it's
708 721 .. still valid Python
709 722
710 723 .. sourcecode:: python
711 724
712 725 {
713 726 "url":"tcp:\/\/*:43447",
714 727 "exec_key":"9c7779e4-d08a-4c3b-ba8e-db1f80b562c1",
715 728 "ssh":"login.mycluster.net",
716 729 "location":"10.0.1.5"
717 730 }
718 731
719 732 Then this file will be all you need for a client to connect to the controller, tunneling
720 733 SSH connections through login.mycluster.net.
721 734
722 735 Database Backend
723 736 ****************
724 737
725 738 The Hub stores all messages and results passed between Clients and Engines.
726 739 For large and/or long-running clusters, it would be unreasonable to keep all
727 740 of this information in memory. For this reason, we have two database backends:
[MongoDB]_ via PyMongo_, and SQLite with the stdlib :py:mod:`sqlite3`.
729 742
730 743 MongoDB is our design target, and the dict-like model it uses has driven our design. As far
731 744 as we are concerned, BSON can be considered essentially the same as JSON, adding support
732 745 for binary data and datetime objects, and any new database backend must support the same
733 746 data types.
734 747
735 748 .. seealso::
736 749
737 750 MongoDB `BSON doc <http://www.mongodb.org/display/DOCS/BSON>`_
738 751
739 752 To use one of these backends, you must set the :attr:`HubFactory.db_class` trait:
740 753
741 754 .. sourcecode:: python
742 755
743 756 # for a simple dict-based in-memory implementation, use dictdb
744 757 # This is the default and the fastest, since it doesn't involve the filesystem
745 758 c.HubFactory.db_class = 'IPython.parallel.controller.dictdb.DictDB'
746 759
747 760 # To use MongoDB:
748 761 c.HubFactory.db_class = 'IPython.parallel.controller.mongodb.MongoDB'
749 762
750 763 # and SQLite:
751 764 c.HubFactory.db_class = 'IPython.parallel.controller.sqlitedb.SQLiteDB'
752 765
When using one of the real database backends, you can allow tasks to persist from
one session to the next by specifying the MongoDB database or SQLite table in
755 768 which tasks are to be stored. The default is to use a table named for the Hub's Session,
756 769 which is a UUID, and thus different every time.
757 770
758 771 .. sourcecode:: python
759 772
    # To keep persistent task history in MongoDB:
761 774 c.MongoDB.database = 'tasks'
762 775
763 776 # and in SQLite:
764 777 c.SQLiteDB.table = 'tasks'
765 778
766 779
767 780 Since MongoDB servers can be running remotely or configured to listen on a particular port,
768 781 you can specify any arguments you may need to the PyMongo `Connection
769 782 <http://api.mongodb.org/python/1.9/api/pymongo/connection.html#pymongo.connection.Connection>`_:
770 783
771 784 .. sourcecode:: python
772 785
773 786 # positional args to pymongo.Connection
774 787 c.MongoDB.connection_args = []
775 788
776 789 # keyword args to pymongo.Connection
777 790 c.MongoDB.connection_kwargs = {}
778 791
780 792 .. _PyMongo: http://api.mongodb.org/python/1.9/
781 793
782 794 Configuring `ipengine`
783 795 -----------------------
784 796
The IPython Engine takes its configuration from the file :file:`ipengine_config.py`
in the active profile directory.
786 798
787 799 The Engine itself also has some amount of configuration. Most of this
788 800 has to do with initializing MPI or connecting to the controller.
789 801
790 802 To instruct the Engine to initialize with an MPI environment set up by
791 803 mpi4py, add:
792 804
793 805 .. sourcecode:: python
794 806
795 807 c.MPI.use = 'mpi4py'
796 808
In this case, the Engine will use our default mpi4py init script to set up
the MPI environment prior to execution. We have default init scripts for
mpi4py and pytrilinos. If you want to specify your own code to be run
at the beginning, specify ``c.MPI.init_script``.
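
For instance (the path is only illustrative):

.. sourcecode:: python

    # run your own MPI initialization code instead of the bundled mpi4py script
    c.MPI.init_script = u'/path/to/my_mpi_init.py'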
801 813
You can also specify a file or a Python command to be run at startup of the
Engine:
804 816
805 817 .. sourcecode:: python
806 818
807 819 c.IPEngineApp.startup_script = u'/path/to/my/startup.py'
808 820
809 821 c.IPEngineApp.startup_command = 'import numpy, scipy, mpi4py'
810 822
These commands/files will be run again after each engine restart.
812 824
813 825 It's also useful on systems with shared filesystems to run the engines
814 826 in some scratch directory. This can be set with:
815 827
816 828 .. sourcecode:: python
817 829
818 830 c.IPEngineApp.work_dir = u'/path/to/scratch/'
819 831
820 832
821 833
822 834 .. [MongoDB] MongoDB database http://www.mongodb.org
823 835
824 836 .. [PBS] Portable Batch System http://www.openpbs.org
825 837
826 838 .. [SSH] SSH-Agent http://en.wikipedia.org/wiki/ssh-agent