add note about specifying IP for external engines in parallel_process doc...
MinRK -
@@ -6,7 +6,7 Details of Parallel Computing with IPython
6
6
7 .. note::
7 .. note::
8
8
9 There are still many sections to fill out
9 There are still many sections to fill out in this doc
10
10
11
11
12 Caveats
12 Caveats
@@ -70,9 +70,11 The :attr:`ndarray.flags.writeable` flag will tell you if you can write to an ar
70 In [6]: _.flags.writeable
70 In [6]: _.flags.writeable
71 Out[6]: False
71 Out[6]: False
72
72
73 If you want to safely edit an array in-place after *sending* it, you must use the `track=True` flag. IPython always performs non-copying sends of arrays, which return immediately. You
73 If you want to safely edit an array in-place after *sending* it, you must use the `track=True`
74 must instruct IPython to track those messages *at send time* in order to know for sure that the send has completed. AsyncResults have a :attr:`sent` property, and :meth:`wait_on_send` method
74 flag. IPython always performs non-copying sends of arrays, which return immediately. You must
75 for checking and waiting for 0MQ to finish with a buffer.
75 instruct IPython to track those messages *at send time* in order to know for sure that the send has
76 completed. AsyncResults have a :attr:`sent` property, and :meth:`wait_on_send` method for
77 checking and waiting for 0MQ to finish with a buffer.
76
78
77 .. sourcecode:: ipython
79 .. sourcecode:: ipython
78
80
@@ -124,12 +126,12 An example of a function that uses a closure:
124 f1() # returns 1
126 f1() # returns 1
125 f2() # returns 2
127 f2() # returns 2
126
128
127 f1 and f2 will have closures referring to the scope in which `inner` was defined, because they
129 ``f1`` and ``f2`` will have closures referring to the scope in which `inner` was defined,
128 use the variable 'a'. As a result, you would not be able to send ``f1`` or ``f2`` with IPython.
130 because they use the variable 'a'. As a result, you would not be able to send ``f1`` or ``f2``
129 Note that you *would* be able to send `f`. This is only true for interactively defined
131 with IPython. Note that you *would* be able to send `f`. This is only true for interactively
130 functions (as are often used in decorators), and only when there are variables used inside the
132 defined functions (as are often used in decorators), and only when there are variables used
131 inner function, that are defined in the outer function. If the names are *not* in the outer
133 inside the inner function, that are defined in the outer function. If the names are *not* in the
132 function, then there will not be a closure, and the generated function will look in
134 outer function, then there will not be a closure, and the generated function will look in
133 ``globals()`` for the name:
135 ``globals()`` for the name:
134
136
135 .. sourcecode:: python
137 .. sourcecode:: python
@@ -168,9 +170,10 Client method, called `apply`.
168 Apply
170 Apply
169 -----
171 -----
170
172
171 The principal method of remote execution is :meth:`apply`, of View objects. The Client provides
173 The principal method of remote execution is :meth:`apply`, of
172 the full execution and communication API for engines via its low-level
174 :class:`~IPython.parallel.client.view.View` objects. The Client provides the full execution and
173 :meth:`send_apply_message` method.
175 communication API for engines via its low-level :meth:`send_apply_message` method, which is used
176 by all higher level methods of its Views.
174
177
175 f : function
178 f : function
176 The function to be called remotely
179 The function to be called remotely
@@ -227,21 +230,23 timeout : float/int or None
227 execute and run
230 execute and run
228 ---------------
231 ---------------
229
232
230 For executing strings of Python code, :class:`DirectView`s also provide an :meth:`execute` and a
233 For executing strings of Python code, :class:`DirectView` objects also provide an :meth:`execute` and
231 :meth:`run` method, which rather than take functions and arguments, take simple strings.
234 a :meth:`run` method, which rather than take functions and arguments, take simple strings.
232 `execute` simply takes a string of Python code to execute, and sends it to the Engine(s). `run`
235 `execute` simply takes a string of Python code to execute, and sends it to the Engine(s). `run`
233 is the same as `execute`, but for a *file*, rather than a string. It is simply a wrapper that
236 is the same as `execute`, but for a *file*, rather than a string. It is simply a wrapper that
234 does something very similar to ``execute(open(f).read())``.
237 does something very similar to ``execute(open(f).read())``.
235
238
236 .. note::
239 .. note::
237
240
238 TODO: Example
241 TODO: Examples for execute and run
239
242
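The relationship between `execute` and `run` described above can be sketched locally. This is only an analogy using the standard library: the `execute` and `run` functions and the `engine_ns` dictionary below are hypothetical stand-ins, not the :class:`DirectView` methods, and nothing is sent to an engine.

```python
import os
import tempfile


def execute(code, namespace):
    """Hypothetical local stand-in: run a string of Python code in `namespace`."""
    exec(code, namespace)


def run(filename, namespace):
    """Hypothetical local stand-in: read a file and pass its contents to execute()."""
    with open(filename) as f:
        execute(f.read(), namespace)


# A plain dict plays the role of an engine's namespace here (an assumption).
engine_ns = {}
execute("a = 5", engine_ns)

# Write a tiny script to a temporary file, then run it in the same namespace.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as script:
    script.write("b = a * 2\n")
run(script.name, engine_ns)
os.remove(script.name)
```

The point of the sketch is that `run` adds nothing beyond reading the file: both paths end in the same string-execution step.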
240 Views
243 Views
241 =====
244 =====
242
245
243 The principal extension of the :class:`~parallel.Client` is the
246 The principal extension of the :class:`~parallel.Client` is the :class:`~parallel.View`
244 :class:`~parallel.View` class. The client
247 class. The client is typically a singleton for connecting to a cluster, and presents a
248 low-level interface to the Hub and Engines. Most real usage will involve creating one or more
249 :class:`~parallel.View` objects for working with engines in various ways.
245
250
246
251
247 DirectView
252 DirectView
@@ -300,6 +305,27 Execution via DirectView
300
305
301 The DirectView is the simplest way to work with one or more engines directly (hence the name).
306 The DirectView is the simplest way to work with one or more engines directly (hence the name).
302
307
308 For instance, to get the process ID of all your engines:
309
310 .. sourcecode:: ipython
311
312 In [5]: import os
313
314 In [6]: dview.apply_sync(os.getpid)
315 Out[6]: [1354, 1356, 1358, 1360]
316
317 Or to see the hostname of the machine they are on:
318
319 .. sourcecode:: ipython
320
321 In [5]: import socket
322
323 In [6]: dview.apply_sync(socket.gethostname)
324 Out[6]: ['tesla', 'tesla', 'edison', 'edison', 'edison']
325
326 .. note::
327
328 TODO: expand on direct execution
303
329
304 Data movement via DirectView
330 Data movement via DirectView
305 ****************************
331 ****************************
@@ -341,24 +367,28 between engines, MPI should be used:
341 Push and pull
367 Push and pull
342 -------------
368 -------------
343
369
344 push
370 :meth:`~IPython.parallel.client.view.DirectView.push`
345
371
346 pull
372 :meth:`~IPython.parallel.client.view.DirectView.pull`
347
373
374 .. note::
348
375
376 TODO: write this section
349
377
350
378
351
352 LoadBalancedView
379 LoadBalancedView
353 ----------------
380 ----------------
354
381
355 The :class:`.LoadBalancedView`
382 The :class:`~.LoadBalancedView` is the class for load-balanced execution via the task scheduler.
356
383 These views always run tasks on exactly one engine, but let the scheduler determine where that
384 should be, allowing load-balancing of tasks. The LoadBalancedView does allow you to specify
385 restrictions on where and when tasks can execute, for more complicated load-balanced workflows.
357
386
358 Data Movement
387 Data Movement
359 =============
388 =============
360
389
361 Reference
390 Since the :class:`~.LoadBalancedView` does not know where execution will take place, explicit
391 data movement methods like push/pull and scatter/gather do not make sense, and are not provided.
362
392
363 Results
393 Results
364 =======
394 =======
@@ -366,9 +396,9 Results
366 AsyncResults
396 AsyncResults
367 ------------
397 ------------
368
398
369 Our primary representation is the AsyncResult object, based on the object of the same name in
399 Our primary representation of the results of remote execution is the :class:`~.AsyncResult`
370 the built-in :mod:`multiprocessing.pool` module. Our version provides a superset of that
400 object, based on the object of the same name in the built-in :mod:`multiprocessing.pool`
371 interface.
401 module. Our version provides a superset of that interface.
372
402
373 The basic principle of the AsyncResult is the encapsulation of one or more results not yet completed. Execution methods (including data movement, such as push/pull) will all return
403 The basic principle of the AsyncResult is the encapsulation of one or more results not yet completed. Execution methods (including data movement, such as push/pull) will all return
374 AsyncResults when `block=False`.
404 AsyncResults when `block=False`.
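Since the interface is modelled on :mod:`multiprocessing.pool`, the basic pattern can be tried with the standard library alone. The sketch below uses `ThreadPool` rather than an IPython View, and `slow_square` is a hypothetical helper standing in for remote work; IPython's own AsyncResult is described above as a superset of this interface.

```python
import time
from multiprocessing.pool import ThreadPool


def slow_square(x):
    """Hypothetical stand-in for work that would run on an engine."""
    time.sleep(0.1)
    return x * x


with ThreadPool(2) as pool:
    ar = pool.apply_async(slow_square, (7,))  # returns immediately with an AsyncResult
    ar.wait(5)                 # block for at most 5 seconds
    finished = ar.ready()      # True once the result has arrived
    value = ar.get(timeout=5)  # fetch the result, or re-raise the worker's exception
```

The submission call returns before the work is done, and the caller decides later whether to poll with `ready()`, block with `wait()`, or retrieve with `get()`.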
@@ -432,7 +462,7 a CompositeError, a subclass of RemoteError, will be raised.
432 .. seealso::
462 .. seealso::
433
463
434 For more information on remote exceptions, see :ref:`the section in the Direct Interface
464 For more information on remote exceptions, see :ref:`the section in the Direct Interface
435 <Parallel_exceptions>`.
465 <parallel_exceptions>`.
436
466
437 Extended interface
467 Extended interface
438 ******************
468 ******************
@@ -445,7 +475,8 that many calls (any of those submitted via DirectView) will map results to engi
445 provide a :meth:`get_dict`, which is also a wrapper on :meth:`get`, which returns a dictionary
475 provide a :meth:`get_dict`, which is also a wrapper on :meth:`get`, which returns a dictionary
446 of the individual results, keyed by engine ID.
476 of the individual results, keyed by engine ID.
447
477
448 You can also prevent a submitted job from actually executing, via the AsyncResult's :meth:`abort` method. This will instruct engines to not execute the job when it arrives.
478 You can also prevent a submitted job from actually executing, via the AsyncResult's
479 :meth:`abort` method. This will instruct engines to not execute the job when it arrives.
449
480
450 The larger extension of the AsyncResult API is the :attr:`metadata` attribute. The metadata
481 The larger extension of the AsyncResult API is the :attr:`metadata` attribute. The metadata
451 is a dictionary (with attribute access) that contains, logically enough, metadata about the
482 is a dictionary (with attribute access) that contains, logically enough, metadata about the
@@ -570,11 +601,9 objects, can be passed as argument to wait. A timeout can be specified, which wi
570 the call from blocking for more than a specified time, but the default behavior is to wait
601 the call from blocking for more than a specified time, but the default behavior is to wait
571 forever.
602 forever.
572
603
573
604 The client also has an ``outstanding`` attribute - a ``set`` of msg_ids that are awaiting
574
605 replies. This is the default if wait is called with no arguments - i.e. wait on *all*
575 The client also has an `outstanding` attribute - a ``set`` of msg_ids that are awaiting replies.
606 outstanding messages.
576 This is the default if wait is called with no arguments - i.e. wait on *all* outstanding
577 messages.
578
607
579
608
580 .. note::
609 .. note::
@@ -584,11 +613,11 messages.
584 Map
613 Map
585 ===
614 ===
586
615
587 Many parallel computing problems can be expressed as a `map`, or running a single program with a
616 Many parallel computing problems can be expressed as a ``map``, or running a single program with
588 variety of different inputs. Python has a built-in :py-func:`map`, which does exactly this, and
617 a variety of different inputs. Python has a built-in :py:func:`map`, which does exactly this,
589 many parallel execution tools in Python, such as the built-in :py-class:`multiprocessing.Pool`
618 and many parallel execution tools in Python, such as the built-in
590 object provide implementations of `map`. All View objects provide a :meth:`map` method as well,
619 :py:class:`multiprocessing.Pool` object provide implementations of `map`. All View objects
591 but the load-balanced and direct implementations differ.
620 provide a :meth:`map` method as well, but the load-balanced and direct implementations differ.
592
621
593 Views' map methods can be called on any number of sequences, but they can also take the `block`
622 Views' map methods can be called on any number of sequences, but they can also take the `block`
594 and `bound` keyword arguments, just like :meth:`~client.apply`, but *only as keywords*.
623 and `bound` keyword arguments, just like :meth:`~client.apply`, but *only as keywords*.
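For comparison, here is a minimal standard-library sketch of the two `map` flavours mentioned above: the built-in `map` over several sequences, and a pool-based equivalent. It does not use IPython Views; `add` is a hypothetical example function, and `ThreadPool` is used only to keep the example self-contained.

```python
from multiprocessing.pool import ThreadPool


def add(x, y):
    """Hypothetical example function taking one item from each sequence."""
    return x + y


xs = [1, 2, 3]
ys = [10, 20, 30]

# The built-in map accepts any number of sequences and pairs them element-wise.
serial = list(map(add, xs, ys))

with ThreadPool(2) as pool:
    # Pool.map takes a single iterable, so starmap is used to unpack argument pairs.
    pooled = pool.starmap(add, zip(xs, ys))
```

Both calls produce the same list; only where the function runs differs.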
@@ -603,19 +632,27 and `bound` keyword arguments, just like :meth:`~client.apply`, but *only as key
603 Decorators and RemoteFunctions
632 Decorators and RemoteFunctions
604 ==============================
633 ==============================
605
634
606 @parallel
635 .. note::
607
636
608 @remote
637 TODO: write this section
609
638
610 RemoteFunction
639 :func:`~IPython.parallel.client.remotefunction.@parallel`
611
640
612 ParallelFunction
641 :func:`~IPython.parallel.client.remotefunction.@remote`
642
643 :class:`~IPython.parallel.client.remotefunction.RemoteFunction`
644
645 :class:`~IPython.parallel.client.remotefunction.ParallelFunction`
613
646
614 Dependencies
647 Dependencies
615 ============
648 ============
616
649
617 @depend
650 .. note::
651
652 TODO: write this section
653
654 :func:`~IPython.parallel.controller.dependency.@depend`
618
655
619 @require
656 :func:`~IPython.parallel.controller.dependency.@require`
620
657
621 Dependency
658 :class:`~IPython.parallel.controller.dependency.Dependency`
@@ -598,6 +598,7 execution, and will fail with an UnmetDependencyError.
598 ...: time.sleep(t)
598 ...: time.sleep(t)
599 ...: return t
599 ...: return t
600
600
601 .. _parallel_exceptions:
601
602
602 Parallel exceptions
603 Parallel exceptions
603 -------------------
604 -------------------
@@ -617,47 +618,47 more other types of exceptions. Here is how it works:
617 In [77]: dview.execute('1/0')
618 In [77]: dview.execute('1/0')
618 ---------------------------------------------------------------------------
619 ---------------------------------------------------------------------------
619 CompositeError Traceback (most recent call last)
620 CompositeError Traceback (most recent call last)
620 /home/you/<ipython-input-10-15c2c22dec39> in <module>()
621 /home/user/<ipython-input-10-5d56b303a66c> in <module>()
621 ----> 1 dview.execute('1/0', block=True)
622 ----> 1 dview.execute('1/0')
622
623
623 /path/to/site-packages/IPython/parallel/view.py in execute(self, code, block)
624 /path/to/site-packages/IPython/parallel/client/view.pyc in execute(self, code, targets, block)
624 460 default: self.block
625 591 default: self.block
625 461 """
626 592 """
626 --> 462 return self.apply_with_flags(util._execute, args=(code,), block=block)
627 --> 593 return self._really_apply(util._execute, args=(code,), block=block, targets=targets)
627 463
628 594
628 464 def run(self, filename, block=None):
629 595 def run(self, filename, targets=None, block=None):
629
630
630 /home/you/<string> in apply_with_flags(self, f, args, kwargs, block, track)
631 /home/user/<string> in _really_apply(self, f, args, kwargs, targets, block, track)
631
632
632 /path/to/site-packages/IPython/parallel/view.py in sync_results(f, self, *args, **kwargs)
633 /path/to/site-packages/IPython/parallel/client/view.pyc in sync_results(f, self, *args, **kwargs)
633 46 def sync_results(f, self, *args, **kwargs):
634 55 def sync_results(f, self, *args, **kwargs):
634 47 """sync relevant results from self.client to our results attribute."""
635 56 """sync relevant results from self.client to our results attribute."""
635 ---> 48 ret = f(self, *args, **kwargs)
636 ---> 57 ret = f(self, *args, **kwargs)
636 49 delta = self.outstanding.difference(self.client.outstanding)
637 58 delta = self.outstanding.difference(self.client.outstanding)
637 50 completed = self.outstanding.intersection(delta)
638 59 completed = self.outstanding.intersection(delta)
638
639
639 /home/you/<string> in apply_with_flags(self, f, args, kwargs, block, track)
640 /home/user/<string> in _really_apply(self, f, args, kwargs, targets, block, track)
640
641
641 /path/to/site-packages/IPython/parallel/view.py in save_ids(f, self, *args, **kwargs)
642 /path/to/site-packages/IPython/parallel/client/view.pyc in save_ids(f, self, *args, **kwargs)
642 35 n_previous = len(self.client.history)
643 44 n_previous = len(self.client.history)
643 36 try:
644 45 try:
644 ---> 37 ret = f(self, *args, **kwargs)
645 ---> 46 ret = f(self, *args, **kwargs)
645 38 finally:
646 47 finally:
646 39 nmsgs = len(self.client.history) - n_previous
647 48 nmsgs = len(self.client.history) - n_previous
647
648
648 /path/to/site-packages/IPython/parallel/view.py in apply_with_flags(self, f, args, kwargs, block, track)
649 /path/to/site-packages/IPython/parallel/client/view.pyc in _really_apply(self, f, args, kwargs, targets, block, track)
649 398 if block:
650 529 if block:
650 399 try:
651 530 try:
651 --> 400 return ar.get()
652 --> 531 return ar.get()
652 401 except KeyboardInterrupt:
653 532 except KeyboardInterrupt:
653 402 pass
654 533 pass
654
655
655 /path/to/site-packages/IPython/parallel/asyncresult.pyc in get(self, timeout)
656 /path/to/site-packages/IPython/parallel/client/asyncresult.pyc in get(self, timeout)
656 87 return self._result
657 101 return self._result
657 88 else:
658 102 else:
658 ---> 89 raise self._exception
659 --> 103 raise self._exception
659 90 else:
660 104 else:
660 91 raise error.TimeoutError("Result not ready.")
661 105 raise error.TimeoutError("Result not ready.")
661
662
662 CompositeError: one or more exceptions from call to method: _execute
663 CompositeError: one or more exceptions from call to method: _execute
663 [0:apply]: ZeroDivisionError: integer division or modulo by zero
664 [0:apply]: ZeroDivisionError: integer division or modulo by zero
@@ -665,7 +666,6 more other types of exceptions. Here is how it works:
665 [2:apply]: ZeroDivisionError: integer division or modulo by zero
666 [2:apply]: ZeroDivisionError: integer division or modulo by zero
666 [3:apply]: ZeroDivisionError: integer division or modulo by zero
667 [3:apply]: ZeroDivisionError: integer division or modulo by zero
667
668
668
669 Notice how the error message printed when :exc:`CompositeError` is raised has
669 Notice how the error message printed when :exc:`CompositeError` is raised has
670 information about the individual exceptions that were raised on each engine.
670 information about the individual exceptions that were raised on each engine.
671 If you want, you can even raise one of these original exceptions:
671 If you want, you can even raise one of these original exceptions:
@@ -674,22 +674,32 If you want, you can even raise one of these original exceptions:
674
674
675 In [80]: try:
675 In [80]: try:
676 ....: dview.execute('1/0')
676 ....: dview.execute('1/0')
677 ....: except client.CompositeError, e:
677 ....: except parallel.error.CompositeError, e:
678 ....: e.raise_exception()
678 ....: e.raise_exception()
679 ....:
679 ....:
680 ....:
680 ....:
681 ---------------------------------------------------------------------------
681 ---------------------------------------------------------------------------
682 ZeroDivisionError Traceback (most recent call last)
682 RemoteError Traceback (most recent call last)
683
683 /home/user/<ipython-input-17-8597e7e39858> in <module>()
684 /ipython1-client-r3021/docs/examples/<ipython console> in <module>()
684 2 dview.execute('1/0')
685
685 3 except CompositeError as e:
686 /ipython1-client-r3021/ipython1/kernel/error.pyc in raise_exception(self, excid)
686 ----> 4 e.raise_exception()
687 156 raise IndexError("an exception with index %i does not exist"%excid)
687
688 157 else:
688 /path/to/site-packages/IPython/parallel/error.pyc in raise_exception(self, excid)
689 --> 158 raise et, ev, etb
689 266 raise IndexError("an exception with index %i does not exist"%excid)
690 159
690 267 else:
691 160 def collect_exceptions(rlist, method):
691 --> 268 raise RemoteError(en, ev, etb, ei)
692
692 269
693 270
694
695 RemoteError: ZeroDivisionError(integer division or modulo by zero)
696 Traceback (most recent call last):
697 File "/path/to/site-packages/IPython/parallel/engine/streamkernel.py", line 330, in apply_request
698 exec code in working,working
699 File "<string>", line 1, in <module>
700 File "/path/to/site-packages/IPython/parallel/util.py", line 354, in _execute
701 exec code in globals()
702 File "<string>", line 1, in <module>
693 ZeroDivisionError: integer division or modulo by zero
703 ZeroDivisionError: integer division or modulo by zero
694
704
695 If you are working in IPython, you can simply type ``%debug`` after one of
705 If you are working in IPython, you can simply type ``%debug`` after one of
@@ -701,47 +711,47 instance:
701 In [81]: dview.execute('1/0')
711 In [81]: dview.execute('1/0')
702 ---------------------------------------------------------------------------
712 ---------------------------------------------------------------------------
703 CompositeError Traceback (most recent call last)
713 CompositeError Traceback (most recent call last)
704 /home/you/<ipython-input-10-15c2c22dec39> in <module>()
714 /home/user/<ipython-input-10-5d56b303a66c> in <module>()
705 ----> 1 dview.execute('1/0', block=True)
715 ----> 1 dview.execute('1/0')
706
716
707 /path/to/site-packages/IPython/parallel/view.py in execute(self, code, block)
717 /path/to/site-packages/IPython/parallel/client/view.pyc in execute(self, code, targets, block)
708 460 default: self.block
718 591 default: self.block
709 461 """
719 592 """
710 --> 462 return self.apply_with_flags(util._execute, args=(code,), block=block)
720 --> 593 return self._really_apply(util._execute, args=(code,), block=block, targets=targets)
711 463
721 594
712 464 def run(self, filename, block=None):
722 595 def run(self, filename, targets=None, block=None):
713
723
714 /home/you/<string> in apply_with_flags(self, f, args, kwargs, block, track)
724 /home/user/<string> in _really_apply(self, f, args, kwargs, targets, block, track)
715
725
716 /path/to/site-packages/IPython/parallel/view.py in sync_results(f, self, *args, **kwargs)
726 /path/to/site-packages/IPython/parallel/client/view.pyc in sync_results(f, self, *args, **kwargs)
717 46 def sync_results(f, self, *args, **kwargs):
727 55 def sync_results(f, self, *args, **kwargs):
718 47 """sync relevant results from self.client to our results attribute."""
728 56 """sync relevant results from self.client to our results attribute."""
719 ---> 48 ret = f(self, *args, **kwargs)
729 ---> 57 ret = f(self, *args, **kwargs)
720 49 delta = self.outstanding.difference(self.client.outstanding)
730 58 delta = self.outstanding.difference(self.client.outstanding)
721 50 completed = self.outstanding.intersection(delta)
731 59 completed = self.outstanding.intersection(delta)
722
732
723 /home/you/<string> in apply_with_flags(self, f, args, kwargs, block, track)
733 /home/user/<string> in _really_apply(self, f, args, kwargs, targets, block, track)
724
734
725 /path/to/site-packages/IPython/parallel/view.py in save_ids(f, self, *args, **kwargs)
735 /path/to/site-packages/IPython/parallel/client/view.pyc in save_ids(f, self, *args, **kwargs)
726 35 n_previous = len(self.client.history)
736 44 n_previous = len(self.client.history)
727 36 try:
737 45 try:
728 ---> 37 ret = f(self, *args, **kwargs)
738 ---> 46 ret = f(self, *args, **kwargs)
729 38 finally:
739 47 finally:
730 39 nmsgs = len(self.client.history) - n_previous
740 48 nmsgs = len(self.client.history) - n_previous
731
741
732 /path/to/site-packages/IPython/parallel/view.py in apply_with_flags(self, f, args, kwargs, block, track)
742 /path/to/site-packages/IPython/parallel/client/view.pyc in _really_apply(self, f, args, kwargs, targets, block, track)
733 398 if block:
743 529 if block:
734 399 try:
744 530 try:
735 --> 400 return ar.get()
745 --> 531 return ar.get()
736 401 except KeyboardInterrupt:
746 532 except KeyboardInterrupt:
737 402 pass
747 533 pass
738
748
739 /path/to/site-packages/IPython/parallel/asyncresult.pyc in get(self, timeout)
749 /path/to/site-packages/IPython/parallel/client/asyncresult.pyc in get(self, timeout)
740 87 return self._result
750 101 return self._result
741 88 else:
751 102 else:
742 ---> 89 raise self._exception
752 --> 103 raise self._exception
743 90 else:
753 104 else:
744 91 raise error.TimeoutError("Result not ready.")
754 105 raise error.TimeoutError("Result not ready.")
745
755
746 CompositeError: one or more exceptions from call to method: _execute
756 CompositeError: one or more exceptions from call to method: _execute
747 [0:apply]: ZeroDivisionError: integer division or modulo by zero
757 [0:apply]: ZeroDivisionError: integer division or modulo by zero
@@ -750,27 +760,26 instance:
750 [3:apply]: ZeroDivisionError: integer division or modulo by zero
760 [3:apply]: ZeroDivisionError: integer division or modulo by zero
751
761
752 In [82]: %debug
762 In [82]: %debug
753 > /path/to/site-packages/IPython/parallel/asyncresult.py(80)get()
763 > /path/to/site-packages/IPython/parallel/client/asyncresult.py(103)get()
754 79 else:
764 102 else:
755 ---> 80 raise self._exception
765 --> 103 raise self._exception
756 81 else:
766 104 else:
757
767
758
768 # With the debugger running, self._exception is the exceptions instance. We can tab complete
759 # With the debugger running, e is the exceptions instance. We can tab complete
760 # on it and see the extra methods that are available.
769 # on it and see the extra methods that are available.
761 ipdb> e.
770 ipdb> self._exception.<tab>
762 e.__class__ e.__getitem__ e.__new__ e.__setstate__ e.args
771 e.__class__ e.__getitem__ e.__new__ e.__setstate__ e.args
763 e.__delattr__ e.__getslice__ e.__reduce__ e.__str__ e.elist
772 e.__delattr__ e.__getslice__ e.__reduce__ e.__str__ e.elist
764 e.__dict__ e.__hash__ e.__reduce_ex__ e.__weakref__ e.message
773 e.__dict__ e.__hash__ e.__reduce_ex__ e.__weakref__ e.message
765 e.__doc__ e.__init__ e.__repr__ e._get_engine_str e.print_tracebacks
774 e.__doc__ e.__init__ e.__repr__ e._get_engine_str e.print_tracebacks
766 e.__getattribute__ e.__module__ e.__setattr__ e._get_traceback e.raise_exception
775 e.__getattribute__ e.__module__ e.__setattr__ e._get_traceback e.raise_exception
767 ipdb> e.print_tracebacks()
776 ipdb> self._exception.print_tracebacks()
768 [0:apply]:
777 [0:apply]:
769 Traceback (most recent call last):
778 Traceback (most recent call last):
770 File "/path/to/site-packages/IPython/parallel/streamkernel.py", line 332, in apply_request
779 File "/path/to/site-packages/IPython/parallel/engine/streamkernel.py", line 330, in apply_request
771 exec code in working, working
780 exec code in working,working
772 File "<string>", line 1, in <module>
781 File "<string>", line 1, in <module>
773 File "/path/to/site-packages/IPython/parallel/client.py", line 69, in _execute
782 File "/path/to/site-packages/IPython/parallel/util.py", line 354, in _execute
774 exec code in globals()
783 exec code in globals()
775 File "<string>", line 1, in <module>
784 File "<string>", line 1, in <module>
776 ZeroDivisionError: integer division or modulo by zero
785 ZeroDivisionError: integer division or modulo by zero
@@ -778,10 +787,10 instance:
778
787
779 [1:apply]:
788 [1:apply]:
780 Traceback (most recent call last):
789 Traceback (most recent call last):
781 File "/path/to/site-packages/IPython/parallel/streamkernel.py", line 332, in apply_request
790 File "/path/to/site-packages/IPython/parallel/engine/streamkernel.py", line 330, in apply_request
782 exec code in working, working
791 exec code in working,working
783 File "<string>", line 1, in <module>
792 File "<string>", line 1, in <module>
784 File "/path/to/site-packages/IPython/parallel/client.py", line 69, in _execute
793 File "/path/to/site-packages/IPython/parallel/util.py", line 354, in _execute
785 exec code in globals()
794 exec code in globals()
786 File "<string>", line 1, in <module>
795 File "<string>", line 1, in <module>
787 ZeroDivisionError: integer division or modulo by zero
796 ZeroDivisionError: integer division or modulo by zero
@@ -789,10 +798,10 instance:
789
798
790 [2:apply]:
799 [2:apply]:
791 Traceback (most recent call last):
800 Traceback (most recent call last):
792 File "/path/to/site-packages/IPython/parallel/streamkernel.py", line 332, in apply_request
801 File "/path/to/site-packages/IPython/parallel/engine/streamkernel.py", line 330, in apply_request
793 exec code in working, working
802 exec code in working,working
794 File "<string>", line 1, in <module>
803 File "<string>", line 1, in <module>
795 File "/path/to/site-packages/IPython/parallel/client.py", line 69, in _execute
804 File "/path/to/site-packages/IPython/parallel/util.py", line 354, in _execute
796 exec code in globals()
805 exec code in globals()
797 File "<string>", line 1, in <module>
806 File "<string>", line 1, in <module>
798 ZeroDivisionError: integer division or modulo by zero
807 ZeroDivisionError: integer division or modulo by zero
@@ -800,18 +809,13 instance:
800
809
801 [3:apply]:
810 [3:apply]:
802 Traceback (most recent call last):
811 Traceback (most recent call last):
803 File "/path/to/site-packages/IPython/parallel/streamkernel.py", line 332, in apply_request
812 File "/path/to/site-packages/IPython/parallel/engine/streamkernel.py", line 330, in apply_request
804 exec code in working, working
813 exec code in working,working
805 File "<string>", line 1, in <module>
814 File "<string>", line 1, in <module>
806 File "/path/to/site-packages/IPython/parallel/client.py", line 69, in _execute
815 File "/path/to/site-packages/IPython/parallel/util.py", line 354, in _execute
807 exec code in globals()
816 exec code in globals()
808 File "<string>", line 1, in <module>
817 File "<string>", line 1, in <module>
809 ZeroDivisionError: integer division or modulo by zero
818 ZeroDivisionError: integer division or modulo by zero
810
811
812 .. note::
813
814 TODO: The above tracebacks are not up to date
815
819
816
820
817 All of this same error handling magic even works in non-blocking mode:
821 All of this same error handling magic even works in non-blocking mode:
@@ -825,15 +829,15 All of this same error handling magic even works in non-blocking mode:
825 In [85]: ar.get()
829 In [85]: ar.get()
826 ---------------------------------------------------------------------------
830 ---------------------------------------------------------------------------
827 CompositeError Traceback (most recent call last)
831 CompositeError Traceback (most recent call last)
828 /Users/minrk/<ipython-input-3-8531eb3d26fb> in <module>()
832 /home/user/<ipython-input-21-8531eb3d26fb> in <module>()
829 ----> 1 ar.get()
833 ----> 1 ar.get()
830
834
831 /path/to/site-packages/IPython/parallel/asyncresult.pyc in get(self, timeout)
835 /path/to/site-packages/IPython/parallel/client/asyncresult.pyc in get(self, timeout)
832 78 return self._result
836 101 return self._result
833 79 else:
837 102 else:
834 ---> 80 raise self._exception
838 --> 103 raise self._exception
835 81 else:
839 104 else:
836 82 raise error.TimeoutError("Result not ready.")
840 105 raise error.TimeoutError("Result not ready.")
837
841
838 CompositeError: one or more exceptions from call to method: _execute
842 CompositeError: one or more exceptions from call to method: _execute
839 [0:apply]: ZeroDivisionError: integer division or modulo by zero
843 [0:apply]: ZeroDivisionError: integer division or modulo by zero
@@ -27,11 +27,40 engines using the various methods, we outline some of the general issues that
27 come up when starting the controller and engines. These things come up no
27 come up when starting the controller and engines. These things come up no
28 matter which method you use to start your IPython cluster.
28 matter which method you use to start your IPython cluster.
29
29
30 If you are running engines on multiple machines, you will likely need to instruct the
31 controller to listen for connections on an external interface. This can be done by specifying
32 the ``ip`` argument on the command-line, or the ``HubFactory.ip`` configurable in
33 :file:`ipcontroller_config.py`.
34
35 If your machines are on a trusted network, you can safely instruct the controller to listen
36 on all public interfaces with::
37
38 $> ipcontroller ip=*
39
40 Or you can set the same behavior as the default by adding the following line to your :file:`ipcontroller_config.py`:
41
42 .. sourcecode:: python
43
44 c.HubFactory.ip = '*'
45
46 .. note::
47
48 Due to the lack of security in ZeroMQ, the controller will only listen for connections on
49 localhost by default. If you see Timeout errors on engines or clients, then the first
50 thing you should check is the IP address the controller is listening on, and make sure
51 that it is visible from the machine that is timing out.
52
53 .. seealso::
54
55 Our `notes <parallel_security>`_ on security in the new parallel computing code.
56
30 Let's say that you want to start the controller on ``host0`` and engines on
57 Let's say that you want to start the controller on ``host0`` and engines on
31 hosts ``host1``-``hostn``. The following steps are then required:
58 hosts ``host1``-``hostn``. The following steps are then required:
32
59
33 1. Start the controller on ``host0`` by running :command:`ipcontroller` on
60 1. Start the controller on ``host0`` by running :command:`ipcontroller` on
34 ``host0``.
61 ``host0``. The controller must be instructed to listen on an interface visible
62 to the engine machines, via the ``ip`` command-line argument or ``HubFactory.ip``
63 in :file:`ipcontroller_config.py`.
35 2. Move the JSON file (:file:`ipcontroller-engine.json`) created by the
64 2. Move the JSON file (:file:`ipcontroller-engine.json`) created by the
36 controller from ``host0`` to hosts ``host1``-``hostn``.
65 controller from ``host0`` to hosts ``host1``-``hostn``.
37 3. Start the engines on hosts ``host1``-``hostn`` by running
66 3. Start the engines on hosts ``host1``-``hostn`` by running
@@ -108,7 +137,7 The configuration files are loaded with commented-out settings and explanations,
108 which should cover most of the available possibilities.
137 which should cover most of the available possibilities.
109
138
110 Using various batch systems with :command:`ipcluster`
139 Using various batch systems with :command:`ipcluster`
111 ------------------------------------------------------
140 -----------------------------------------------------
112
141
113 :command:`ipcluster` has a notion of Launchers that can start controllers
142 :command:`ipcluster` has a notion of Launchers that can start controllers
114 and engines with various remote execution schemes. Currently supported
143 and engines with various remote execution schemes. Currently supported
@@ -345,7 +374,7 The controller's remote location and configuration can be specified:
345 # note that remotely launched ipcontroller will not get the contents of
374 # note that remotely launched ipcontroller will not get the contents of
346 # the local ipcontroller_config.py unless it resides on the *remote host*
375 # the local ipcontroller_config.py unless it resides on the *remote host*
347 # in the location specified by the `profile_dir` argument.
376 # in the location specified by the `profile_dir` argument.
348 # c.SSHControllerLauncher.program_args = ['--reuse', 'ip=0.0.0.0', 'profile_dir=/path/to/cd']
377 # c.SSHControllerLauncher.program_args = ['--reuse', 'ip=*', 'profile_dir=/path/to/cd']
349
378
350 .. note::
379 .. note::
351
380
@@ -414,7 +443,7 controller and engines from IPython.
414
443
415 The order of the above operations may be important. You *must*
444 The order of the above operations may be important. You *must*
416 start the controller before the engines, unless you are reusing connection
445 start the controller before the engines, unless you are reusing connection
417 information (via `--reuse`), in which case ordering is not important.
446 information (via ``--reuse``), in which case ordering is not important.
418
447
419 .. note::
448 .. note::
420
449
@@ -429,7 +458,9 Starting the controller and engines on different hosts
429 When the controller and engines are running on different hosts, things are
458 When the controller and engines are running on different hosts, things are
430 slightly more complicated, but the underlying ideas are the same:
459 slightly more complicated, but the underlying ideas are the same:
431
460
432 1. Start the controller on a host using :command:`ipcontroller`.
461 1. Start the controller on a host using :command:`ipcontroller`. The controller must be
462 instructed to listen on an interface visible to the engine machines, via the ``ip``
463 command-line argument or ``HubFactory.ip`` in :file:`ipcontroller_config.py`.
433 2. Copy :file:`ipcontroller-engine.json` from :file:`~/.ipython/profile_<name>/security` on
464 2. Copy :file:`ipcontroller-engine.json` from :file:`~/.ipython/profile_<name>/security` on
434 the controller's host to the host where the engines will run.
465 the controller's host to the host where the engines will run.
435 3. Use :command:`ipengine` on the engine's hosts to start the engines.
466 3. Use :command:`ipengine` on the engine's hosts to start the engines.
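
Before starting the engines in step 3, it can help to confirm that the interface chosen in step 1 is actually visible from an engine host. The following is a minimal sketch using only the Python standard library, not part of IPython; ``can_reach`` is a hypothetical helper name, and the host and port you pass in are assumptions that must match the address your controller is really listening on.

```python
import socket


def can_reach(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    Hypothetical helper for illustration only; it is not an IPython API.
    """
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
    conn.close()
    return True
```

Run on an engine machine, a call such as ``can_reach('host0', port)`` returning ``False`` suggests that the controller is listening only on localhost, or that a firewall sits between the two hosts. A ``True`` result only shows that something accepts TCP connections at that address; it does not prove that it is the controller.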
@@ -507,8 +538,7 To instruct the controller to listen on a specific interface, you can set the
507
538
508 c.HubFactory.ip = '*'
539 c.HubFactory.ip = '*'
509
540
510
541 When connecting to a Controller that is listening on loopback or behind a firewall, it may
511 When connecting to a Controller that is listening on loopback, or behind a firewall, it may
512 be necessary to specify an SSH server to use for tunnels, and the external IP of the
542 be necessary to specify an SSH server to use for tunnels, and the external IP of the
513 Controller. If you specified that the HubFactory listen on loopback, or all interfaces,
543 Controller. If you specified that the HubFactory listen on loopback, or all interfaces,
514 then IPython will try to guess the external IP. If you are on a system with VM network
544 then IPython will try to guess the external IP. If you are on a system with VM network