##// END OF EJS Templates
Doc tweaks and updates
MinRK -
Show More
@@ -0,0 +1,438 b''
1 .. _parallel_details:
2
3 ==========================================
4 Details of Parallel Computing with IPython
5 ==========================================
6
7 .. note::
8
9 There are still many sections to fill out
10
11
12 Caveats
13 =======
14
15 First, some caveats about the detailed workings of parallel computing with 0MQ and IPython.
16
17 Non-copying sends and numpy arrays
18 ----------------------------------
19
20 When numpy arrays are passed as arguments to apply or via data-movement methods, they are not
21 copied. This means that you must be careful if you are sending an array that you intend to work on.
22 PyZMQ does allow you to track when a message has been sent so you can know when it is safe to edit the buffer, but
23 IPython only allows for this.
24
25 It is also important to note that the non-copying receive of a message is *read-only*. That
26 means that if you intend to work in-place on an array that you have sent or received, you must copy
27 it. This is true for both numpy arrays sent to engines and numpy arrays retrieved as results.
28
29 The following will fail:
30
31 .. sourcecode:: ipython
32
33 In [3]: A = numpy.zeros(2)
34
35 In [4]: def setter(a):
36 ...: a[0]=1
37 ...: return a
38
39 In [5]: rc[0].apply_sync(setter, A)
40 ---------------------------------------------------------------------------
41 RemoteError Traceback (most recent call last)
42 ...
43 RemoteError: RuntimeError(array is not writeable)
44 Traceback (most recent call last):
45 File "/Users/minrk/dev/ip/mine/IPython/zmq/parallel/streamkernel.py", line 329, in apply_request
46 exec code in working, working
47 File "<string>", line 1, in <module>
48 File "<ipython-input-14-736187483856>", line 2, in setter
49 RuntimeError: array is not writeable
50
51 If you do need to edit the array in-place, just remember to copy the array if it's read-only.
52 The :attr:`ndarray.flags.writeable` flag will tell you if you can write to an array.
53
54 .. sourcecode:: ipython
55
56 In [3]: A = numpy.zeros(2)
57
58 In [4]: def setter(a):
59 ...: """only copy read-only arrays"""
60 ...: if not a.flags.writeable:
61 ...: a=a.copy()
62 ...: a[0]=1
63 ...: return a
64
65 In [5]: rc[0].apply_sync(setter, A)
66 Out[5]: array([ 1., 0.])
67
68 # note that results will also be read-only:
69 In [6]: _.flags.writeable
70 Out[6]: False
71
72 What is sendable?
73 -----------------
74
75 If IPython doesn't know what to do with an object, it will pickle it. There is a short list of
76 objects that are not pickled: ``buffers``, ``str/bytes`` objects, and ``numpy``
77 arrays. These are handled specially by IPython in order to prevent the copying of data. Sending
78 bytes or numpy arrays will result in exactly zero in-memory copies of your data (unless the data
79 is very small).
80
81 If you have an object that provides a Python buffer interface, then you can always send that
82 buffer without copying - and reconstruct the object on the other side in your own code. It is
83 possible that the object reconstruction will become extensible, so you can add your own
84 non-copying types, but this does not yet exist.
85
86
87 Running Code
88 ============
89
90 There are two principal units of execution in Python: strings of Python code (e.g. 'a=5'),
91 and Python functions. IPython is designed around the use of functions via the core
92 Client method, called `apply`.
93
94 Apply
95 -----
96
97 The principal method of remote execution is :meth:`apply`, of Client and View objects. The Client provides the full execution and communication API for engines via its apply method.
98
99 f : function
100 The fuction to be called remotely
101 args : tuple/list
102 The positional arguments passed to `f`
103 kwargs : dict
104 The keyword arguments passed to `f`
105 bound : bool (default: False)
106 Whether to pass the Engine(s) Namespace as the first argument to `f`.
107 block : bool (default: self.block)
108 Whether to wait for the result, or return immediately.
109 False:
110 returns AsyncResult
111 True:
112 returns actual result(s) of f(*args, **kwargs)
113 if multiple targets:
114 list of results, matching `targets`
115 track : bool
116 whether to track non-copying sends.
117 [default False]
118
119 targets : int,list of ints, 'all', None
120 Specify the destination of the job.
121 if None:
122 Submit via Task queue for load-balancing.
123 if 'all':
124 Run on all active engines
125 if list:
126 Run on each specified engine
127 if int:
128 Run on single engine
129 Not eht
130
131 balanced : bool, default None
132 whether to load-balance. This will default to True
133 if targets is unspecified, or False if targets is specified.
134
135 If `balanced` and `targets` are both specified, the task will
136 be assigne to *one* of the targets by the scheduler.
137
138 The following arguments are only used when balanced is True:
139
140 after : Dependency or collection of msg_ids
141 Only for load-balanced execution (targets=None)
142 Specify a list of msg_ids as a time-based dependency.
143 This job will only be run *after* the dependencies
144 have been met.
145
146 follow : Dependency or collection of msg_ids
147 Only for load-balanced execution (targets=None)
148 Specify a list of msg_ids as a location-based dependency.
149 This job will only be run on an engine where this dependency
150 is met.
151
152 timeout : float/int or None
153 Only for load-balanced execution (targets=None)
154 Specify an amount of time (in seconds) for the scheduler to
155 wait for dependencies to be met before failing with a
156 DependencyTimeout.
157
158 execute and run
159 ---------------
160
161 For executing strings of Python code, Clients also provide an :meth:`execute` and a :meth:`run`
162 method, which rather than take functions and arguments, take simple strings. `execute` simply
163 takes a string of Python code to execute, and sends it to the Engine(s). `run` is the same as
164 `execute`, but for a *file*, rather than a string. It is simply a wrapper that does something
165 very similar to ``execute(open(f).read())``.
166
167 .. note::
168
169 TODO: Example
170
171 Views
172 =====
173
174 The principal extension of the :class:`~parallel.client.Client` is the
175 :class:`~parallel.view.View` class. The client is a fairly stateless object with respect to
176 execution patterns, where you must specify everything about the execution as keywords to each
177 call to :meth:`apply`. For users who want to more conveniently specify various options for
178 several similar calls, we have the :class:`~parallel.view.View` objects. The basic principle of
179 the views is to encapsulate the keyword arguments to :meth:`client.apply` as attributes,
180 allowing users to specify them once and apply to any subsequent calls until the attribute is
181 changed.
182
183 Two of apply's keyword arguments are set at the construction of the View, and are immutable for
184 a given View: `balanced` and `targets`. `balanced` determines whether the View will be a
185 :class:`.LoadBalancedView` or a :class:`.DirectView`, and `targets` will be the View's `targets`
186 attribute. Attempts to change this will raise errors.
187
188 Views are cached by targets+balanced combinations, so requesting a view multiple times will always return the *same object*, not create a new one:
189
190 .. sourcecode:: ipython
191
192 In [3]: v1 = rc.view([1,2,3], balanced=True)
193 In [4]: v2 = rc.view([1,2,3], balanced=True)
194
195 In [5]: v2 is v1
196 Out[5]: True
197
198
199 A :class:`View` always uses its `targets` attribute, and it will use its `bound`
200 and `block` attributes in its :meth:`apply` method, but the suffixed :meth:`apply_x`
201 methods allow overriding `bound` and `block` for a single call.
202
203 ================== ========== ==========
204 method block bound
205 ================== ========== ==========
206 apply self.block self.bound
207 apply_sync True False
208 apply_async False False
209 apply_sync_bound True True
210 apply_async_bound False True
211 ================== ========== ==========
212
213 DirectView
214 ----------
215
216 The :class:`.DirectView` is the class for the IPython :ref:`Multiplexing Interface
217 <parallel_multiengine>`.
218
219 Creating a DirectView
220 *********************
221
222 DirectViews can be created in two ways, by index access to a client, or by a client's
223 :meth:`view` method. Index access to a Client works in a few ways. First, you can create
224 DirectViews to single engines simply by accessing the client by engine id:
225
226 .. sourcecode:: ipython
227
228 In [2]: rc[0]
229 Out[2]: <DirectView 0>
230
231 You can also create a DirectView with a list of engines:
232
233 .. sourcecode:: ipython
234
235 In [2]: rc[0,1,2]
236 Out[2]: <DirectView [0,1,2]>
237
238 Other methods for accessing elements, such as slicing and negative indexing, work by passing
239 the index directly to the client's :attr:`ids` list, so:
240
241 .. sourcecode:: ipython
242
243 # negative index
244 In [2]: rc[-1]
245 Out[2]: <DirectView 3>
246
247 # or slicing:
248 In [3]: rc[::2]
249 Out[3]: <DirectView [0,2]>
250
251 are always the same as:
252
253 .. sourcecode:: ipython
254
255 In [2]: rc[rc.ids[-1]]
256 Out[2]: <DirectView 3>
257
258 In [3]: rc[rc.ids[::2]]
259 Out[3]: <DirectView [0,2]>
260
261 Also note that the slice is evaluated at the time of construction of the DirectView, so the
262 targets will not change over time if engines are added/removed from the cluster. Requesting
263 two views with the same slice at different times will *not* necessarily return the same View
264 if the number of engines has changed.
265
266 Execution via DirectView
267 ************************
268
269 The DirectView is the simplest way to work with one or more engines directly (hence the name).
270
271
272 Data movement via DirectView
273 ****************************
274
275 Since a Python namespace is just a :class:`dict`, :class:`DirectView` objects provide
276 dictionary-style access by key and methods such as :meth:`get` and
277 :meth:`update` for convenience. This make the remote namespaces of the engines
278 appear as a local dictionary. Underneath, these methods call :meth:`apply`:
279
280 .. sourcecode:: ipython
281
282 In [51]: dview['a']=['foo','bar']
283
284 In [52]: dview['a']
285 Out[52]: [ ['foo', 'bar'], ['foo', 'bar'], ['foo', 'bar'], ['foo', 'bar'] ]
286
287 Scatter and gather
288 ------------------
289
290 Sometimes it is useful to partition a sequence and push the partitions to
291 different engines. In MPI language, this is know as scatter/gather and we
292 follow that terminology. However, it is important to remember that in
293 IPython's :class:`Client` class, :meth:`scatter` is from the
294 interactive IPython session to the engines and :meth:`gather` is from the
295 engines back to the interactive IPython session. For scatter/gather operations
296 between engines, MPI should be used:
297
298 .. sourcecode:: ipython
299
300 In [58]: dview.scatter('a',range(16))
301 Out[58]: [None,None,None,None]
302
303 In [59]: dview['a']
304 Out[59]: [ [0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15] ]
305
306 In [60]: dview.gather('a')
307 Out[60]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
308
309
310
311 LoadBalancedView
312 ----------------
313
314 The :class:`.LoadBalancedView`
315
316
317 Data Movement
318 =============
319
320 push
321
322 pull
323
324 Reference
325
326 Results
327 =======
328
329 AsyncResults are the primary class
330
331 get_result
332
333 results,metadata
334
335 Querying the Hub
336 ================
337
338 The Hub sees all traffic that may pass through the schedulers between engines and clients.
339 It does this so that it can track state, allowing multiple clients to retrieve results of
340 computations submitted by their peers, as well as persisting the state to a database.
341
342 queue_status
343
344 You can check the status of the queues of the engines with this command.
345
346 result_status
347
348 purge_results
349
350 Controlling the Engines
351 =======================
352
353 There are a few actions you can do with Engines that do not involve execution. These
354 messages are sent via the Control socket, and bypass any long queues of waiting execution
355 jobs
356
357 abort
358
359 Sometimes you may want to prevent a job you have submitted from actually running. The method
360 for this is :meth:`abort`. It takes a container of msg_ids, and instructs the Engines to not
361 run the jobs if they arrive. The jobs will then fail with an AbortedTask error.
362
363 clear
364
365 You may want to purge the Engine(s) namespace of any data you have left in it. After
366 running `clear`, there will be no names in the Engine's namespace
367
368 shutdown
369
370 You can also instruct engines (and the Controller) to terminate from a Client. This
371 can be useful when a job is finished, since you can shutdown all the processes with a
372 single command.
373
374 Synchronization
375 ===============
376
377 Since the Client is a synchronous object, events do not automatically trigger in your
378 interactive session - you must poll the 0MQ sockets for incoming messages. Note that
379 this polling *does not* actually make any network requests. It simply performs a `select`
380 operation, to check if messages are already in local memory, waiting to be handled.
381
382 The method that handles incoming messages is :meth:`spin`. This method flushes any waiting messages on the various incoming sockets, and updates the state of the Client.
383
384 If you need to wait for particular results to finish, you can use the :meth:`barrier` method,
385 which will call :meth:`spin` until the messages are no longer outstanding. Anything that
386 represents a collection of messages, such as a list of msg_ids or one or more AsyncResult
387 objects, can be passed as argument to barrier. A timeout can be specified, which will prevent
388 the barrier from blocking for more than a specified time, but the default behavior is to wait
389 forever.
390
391
392
393 The client also has an `outstanding` attribute - a ``set`` of msg_ids that are awaiting replies.
394 This is the default if barrier is called with no arguments - i.e. barrier on *all* outstanding messages.
395
396
397 .. note::
398
399 TODO barrier example
400
401 Map
402 ===
403
404 Many parallel computing problems can be expressed as a `map`, or running a single program with a
405 variety of different inputs. Python has a built-in :py-func:`map`, which does exactly this, and
406 many parallel execution tools in Python, such as the built-in :py-class:`multiprocessing.Pool`
407 object provide implementations of `map`. All View objects provide a :meth:`map` method as well,
408 but the load-balanced and direct implementations differ.
409
410 Views' map methods can be called on any number of sequences, but they can also take the `block`
411 and `bound` keyword arguments, just like :meth:`~client.apply`, but *only as keywords*.
412
413 .. sourcecode:: python
414
415 dview.map(*sequences, block=None)
416
417
418 * iter, map_async, reduce
419
420 Decorators and RemoteFunctions
421 ==============================
422
423 @parallel
424
425 @remote
426
427 RemoteFunction
428
429 ParallelFunction
430
431 Dependencies
432 ============
433
434 @depend
435
436 @require
437
438 Dependency
@@ -34,7 +34,7 b' Here, we have a very simple 5-node DAG:'
34
34
35 With NetworkX, an arrow is just a fattened bit on the edge. Here, we can see that task 0
35 With NetworkX, an arrow is just a fattened bit on the edge. Here, we can see that task 0
36 depends on nothing, and can run immediately. 1 and 2 depend on 0; 3 depends on
36 depends on nothing, and can run immediately. 1 and 2 depend on 0; 3 depends on
37 1 and 2; and 4 depends only on 1.
37 1 and 2; and 4 depends only on 1.
38
38
39 A possible sequence of events for this workflow:
39 A possible sequence of events for this workflow:
40
40
@@ -141,9 +141,9 b' started after all of its predecessors were completed:'
141 :lines: 64-70
141 :lines: 64-70
142
142
143 We can also validate the graph visually. By drawing the graph with each node's x-position
143 We can also validate the graph visually. By drawing the graph with each node's x-position
144 as its start time, all arrows must be pointing to the right if the order was respected.
144 as its start time, all arrows must be pointing to the right if dependencies were respected.
145 For spreading, the y-position will be the in-degree, so tasks with lots of dependencies
145 For spreading, the y-position will be the runtime of the task, so long tasks
146 will be at the top, and tasks with few dependencies will be at the bottom.
146 will be at the top, and quick, small tasks will be at the bottom.
147
147
148 .. sourcecode:: ipython
148 .. sourcecode:: ipython
149
149
@@ -166,7 +166,7 b' will be at the top, and tasks with few dependencies will be at the bottom.'
166 .. figure:: dagdeps.*
166 .. figure:: dagdeps.*
167
167
168 Time started on x, runtime on y, and color-coded by engine-id (in this case there
168 Time started on x, runtime on y, and color-coded by engine-id (in this case there
169 were four engines).
169 were four engines). Edges denote dependencies.
170
170
171
171
172 .. _NetworkX: http://networkx.lanl.gov/
172 .. _NetworkX: http://networkx.lanl.gov/
@@ -16,5 +16,6 b' Using IPython for parallel computing (ZMQ)'
16 parallel_winhpc.txt
16 parallel_winhpc.txt
17 parallel_demos.txt
17 parallel_demos.txt
18 dag_dependencies.txt
18 dag_dependencies.txt
19 parallel_details.txt
19
20
20
21
@@ -45,8 +45,8 b' file to the client machine, or enter its contents as arguments to the Client con'
45
45
46 # If you have copied the json connector file from the controller:
46 # If you have copied the json connector file from the controller:
47 In [2]: rc = client.Client('/path/to/ipcontroller-client.json')
47 In [2]: rc = client.Client('/path/to/ipcontroller-client.json')
48 # or for a remote controller at 10.0.1.5, visible from my.server.com:
48 # or to connect with a specific profile you have set up:
49 In [3]: rc = client.Client('tcp://10.0.1.5:12345', sshserver='my.server.com')
49 In [3]: rc = client.Client(profile='mpi')
50
50
51
51
52 To make sure there are engines connected to the controller, users can get a list
52 To make sure there are engines connected to the controller, users can get a list
@@ -62,7 +62,7 b' Here we see that there are four engines ready to do work for us.'
62 For direct execution, we will make use of a :class:`DirectView` object, which can be
62 For direct execution, we will make use of a :class:`DirectView` object, which can be
63 constructed via list-access to the client:
63 constructed via list-access to the client:
64
64
65 .. sourcecode::
65 .. sourcecode:: ipython
66
66
67 In [4]: dview = rc[:] # use all engines
67 In [4]: dview = rc[:] # use all engines
68
68
@@ -50,7 +50,7 b" directory of the client's host, they will be found automatically. Otherwise, the"
50 to them has to be passed to the client's constructor.
50 to them has to be passed to the client's constructor.
51
51
52 Using :command:`ipclusterz`
52 Using :command:`ipclusterz`
53 ==========================
53 ===========================
54
54
55 The :command:`ipclusterz` command provides a simple way of starting a
55 The :command:`ipclusterz` command provides a simple way of starting a
56 controller and engines in the following situations:
56 controller and engines in the following situations:
@@ -309,7 +309,7 b' To use this mode, select the SSH launchers in :file:`ipclusterz_config.py`:'
309
309
310 .. sourcecode:: python
310 .. sourcecode:: python
311
311
312 c.Global.engine_launcher = 'IPython.zmq.parallel.launcher.PBSEngineSetLauncher'
312 c.Global.engine_launcher = 'IPython.zmq.parallel.launcher.SSHEngineSetLauncher'
313 # and if the Controller is also to be remote:
313 # and if the Controller is also to be remote:
314 c.Global.controller_launcher = 'IPython.zmq.parallel.launcher.SSHControllerLauncher'
314 c.Global.controller_launcher = 'IPython.zmq.parallel.launcher.SSHControllerLauncher'
315
315
@@ -469,15 +469,20 b' IPython and can be found in the directory :file:`~/.ipython/cluster_<profile>/lo'
469 Sending the log files to us will often help us to debug any problems.
469 Sending the log files to us will often help us to debug any problems.
470
470
471
471
472 .. [PBS] Portable Batch System. http://www.openpbs.org/
473 .. [SSH] SSH-Agent http://en.wikipedia.org/wiki/Ssh-agent
474
475 Configuring `ipcontrollerz`
472 Configuring `ipcontrollerz`
476 ---------------------------
473 ---------------------------
477
474
478 .. note::
475 Ports and addresses
476 *******************
479
477
480 TODO
478
479 Database Backend
480 ****************
481
482
483 .. seealso::
484
485
481
486
482 Configuring `ipenginez`
487 Configuring `ipenginez`
483 -----------------------
488 -----------------------
@@ -487,3 +492,6 b' Configuring `ipenginez`'
487 TODO
492 TODO
488
493
489
494
495
496 .. [PBS] Portable Batch System. http://www.openpbs.org/
497 .. [SSH] SSH-Agent http://en.wikipedia.org/wiki/ssh-agent
@@ -38,11 +38,9 b' a :class:`LoadBalancedView`, here called `lview`:'
38
38
39 .. sourcecode:: ipython
39 .. sourcecode:: ipython
40
40
41 In [1]: from IPython.zmq.parallel import client
41 In [1]: from IPython.zmq.parallel import client
42
42
43 In [2]: rc = client.Client()
43 In [2]: rc = client.Client()
44
45 In [3]: lview = rc.view()
46
44
47
45
48 This form assumes that the controller was started on localhost with default
46 This form assumes that the controller was started on localhost with default
@@ -53,9 +51,18 b' argument to the constructor:'
53
51
54 # for a visible LAN controller listening on an external port:
52 # for a visible LAN controller listening on an external port:
55 In [2]: rc = client.Client('tcp://192.168.1.16:10101')
53 In [2]: rc = client.Client('tcp://192.168.1.16:10101')
56 # for a remote controller at my.server.com listening on localhost:
54 # or to connect with a specific profile you have set up:
57 In [3]: rc = client.Client(sshserver='my.server.com')
55 In [3]: rc = client.Client(profile='mpi')
56
57 For load-balanced execution, we will make use of a :class:`LoadBalancedView` object, which can be constructed via the client's :meth:`view` method:
58
59 .. sourcecode:: ipython
60
61 In [4]: lview = rc.view() # default load-balanced view
62
63 .. seealso::
58
64
65 For more information, see the in-depth explanation of :ref:`Views <parallel_details>`.
59
66
60
67
61 Quick and easy parallelism
68 Quick and easy parallelism
General Comments 0
You need to be logged in to leave comments. Login now