##// END OF EJS Templates
added parallel message docs to development section
MinRK -
Show More

The requested changes are too big and content was truncated. Show full diff

1 NO CONTENT: new file 100644, binary diff hidden
NO CONTENT: new file 100644, binary diff hidden
1 NO CONTENT: new file 100644
NO CONTENT: new file 100644
The requested commit or file is too big and content was truncated. Show full diff
1 NO CONTENT: new file 100644, binary diff hidden
NO CONTENT: new file 100644, binary diff hidden
1 NO CONTENT: new file 100644, binary diff hidden
NO CONTENT: new file 100644, binary diff hidden
1 NO CONTENT: new file 100644, binary diff hidden
NO CONTENT: new file 100644, binary diff hidden
1 NO CONTENT: new file 100644, binary diff hidden
NO CONTENT: new file 100644, binary diff hidden
1 NO CONTENT: new file 100644, binary diff hidden
NO CONTENT: new file 100644, binary diff hidden
1 NO CONTENT: new file 100644, binary diff hidden
NO CONTENT: new file 100644, binary diff hidden
@@ -0,0 +1,94 b''
1 .. _parallel_connections:
2
3 ==============================================
4 Connection Diagrams of The IPython ZMQ Cluster
5 ==============================================
6
7 This is a quick summary and illustration of the connections involved in the ZeroMQ based IPython cluster for parallel computing.
8
9 All Connections
10 ===============
11
12 The Parallel Computing code is currently under development in Min RK's IPython fork_ on GitHub.
13
14 .. _fork: http://github.com/minrk/ipython
15
16 The IPython cluster consists of a Controller and one or more clients and engines. The goal of the Controller is to manage and monitor the connections and communications between the clients and the engines.
17
18 It is important for security/practicality reasons that all connections be inbound to the controller process. The arrows in the figures indicate the direction of the connection.
19
20
21 .. figure:: figs/allconnections.png
22 :width: 432px
23 :alt: IPython cluster connections
24 :align: center
25
26 All the connections involved in connecting one client to one engine.
27
28 The Controller consists of two ZMQ Devices - both MonitoredQueues, one for Tasks (load balanced, engine agnostic), one for Multiplexing (explicit targets), a Python device for monitoring (the Heartbeat Monitor).
29
30
31
32 Registration
33 ------------
34
35 .. figure:: figs/regfade.png
36 :width: 432px
37 :alt: IPython Registration connections
38 :align: center
39
40 Engines and Clients only need to know where the Registrar ``XREP`` is located to start connecting.
41
42 Once a controller is launched, the only information needed for connecting clients and/or engines to the controller is the IP/port of the ``XREP`` socket called the Registrar. This socket handles connections from both clients and engines, and replies with the remaining information necessary to establish the remaining connections.
43
44 Heartbeat
45 ---------
46
47 .. figure:: figs/hbfade.png
48 :width: 432px
49 :alt: IPython Registration connections
50 :align: center
51
52 The heartbeat sockets.
53
54 The heartbeat process has been described elsewhere. To summarize: the controller publishes a distinct message periodically via a ``PUB`` socket. Each engine has a ``zmq.FORWARDER`` device with a ``SUB`` socket for input, and ``XREQ`` socket for output. The ``SUB`` socket is connected to the ``PUB`` socket labeled *HB(ping)*, and the ``XREQ`` is connected to the ``XREP`` labeled *HB(pong)*. This results in the same message being relayed back to the Heartbeat Monitor with the addition of the ``XREQ`` prefix. The Heartbeat Monitor receives all the replies via an ``XREP`` socket, and identifies which hearts are still beating by the ``zmq.IDENTITY`` prefix of the ``XREQ`` sockets.
55
56 Queues
57 ------
58
59 .. figure:: figs/queuefade.png
60 :width: 432px
61 :alt: IPython Queue connections
62 :align: center
63
64 Load balanced Task queue on the left, explicitly multiplexed queue on the right.
65
66 The controller has two MonitoredQueue devices. These devices are primarily for relaying messages between clients and engines, but the controller needs to see those messages for its own purposes. Since no Python code may exist between the two sockets in a queue, all messages sent through these queues (both directions) are also sent via a ``PUB`` socket to a monitor, which allows the Controller to monitor queue traffic without interfering with it.
67
68 For tasks, the engine need not be specified. Messages sent to the ``XREP`` socket from the client side are assigned to an engine via ZMQ's ``XREQ`` round-robin load balancing. Engine replies are directed to specific clients via the IDENTITY of the client, which is received as a prefix at the Engine.
69
70 For Multiplexing, ``XREP`` is used for both in and output sockets in the device. Clients must specify the destination by the ``zmq.IDENTITY`` of the ``PAIR`` socket connected to the downstream end of the device.
71
72 At the Kernel level, both of these PAIR sockets are treated in the same way as the ``REP`` socket in the serial version (except using ZMQStreams instead of explicit sockets).
73
74 Client connections
75 ------------------
76
77 .. figure:: figs/queryfade.png
78 :width: 432px
79 :alt: IPython client query connections
80 :align: center
81
82 Clients connect to an ``XREP`` socket to query the controller
83
84 The controller listens on an ``XREP`` socket for queries from clients as to queue status, and control instructions. Clients can connect to this via a PAIR socket or ``XREQ``.
85
86 .. figure:: figs/notiffade.png
87 :width: 432px
88 :alt: IPython Registration connections
89 :align: center
90
91 Engine registration events are published via a ``PUB`` socket.
92
93 The controller publishes all registration/unregistration events via a ``PUB`` socket. This allows clients to stay up to date with what engines are available by subscribing to the feed with a ``SUB`` socket. Other processes could selectively subscribe to just registration or unregistration events.
94
@@ -0,0 +1,209 b''
1 .. _parallel_messages:
2
3 Messaging for Parallel Computing
4 ================================
5
6 This is an extension of the :ref:`messaging <messaging>` doc. Diagrams of the connections can be found in the :ref:`parallel connections <parallel_connections>` doc.
7
8
9
10 ZMQ messaging is also used in the parallel computing IPython system. All messages to/from kernels remain the same as the single kernel model, and are forwarded through a ZMQ Queue device. The controller receives all messages and replies in these channels, and saves results for future use.
11
12 The Controller
13 --------------
14
15 The controller is the central process of the IPython parallel computing model. It has 3 Devices:
16
17 * Heartbeater
18 * Multiplexed Queue
19 * Task Queue
20
21 and 3 sockets:
22
23 * ``XREP`` for both engine and client registration
24 * ``PUB`` for notification of engine changes
25 * ``XREP`` for client requests
26
27
28
29 Registration (``XREP``)
30 ***********************
31
32 The first function of the Controller is to facilitate and monitor connections of clients and engines. Both client and engine registration are handled by the same socket, so only one ip/port pair is needed to connect any number of connections and clients.
33
34 Engines register with the ``zmq.IDENTITY`` of their two ``XREQ`` sockets, one for the queue, which receives execute requests, and one for the heartbeat, which is used to monitor the survival of the Engine process.
35
36 Message type: ``registration_request``::
37
38 content = {
39 'queue' : 'abcd-1234-...', # the queue XREQ id
40 'heartbeat' : '1234-abcd-...' # the heartbeat XREQ id
41 }
42
43 The Controller replies to an Engine's registration request with the engine's integer ID, and all the remaining connection information for connecting the heartbeat process, and kernel socket(s). The message status will be an error if the Engine requests IDs that already in use.
44
45 Message type: ``registration_reply``::
46
47 content = {
48 'status' : 'ok', # or 'error'
49 # if ok:
50 'id' : 0, # int, the engine id
51 'queue' : 'tcp://127.0.0.1:12345', # connection for engine side of the queue
52 'heartbeat' : (a,b), # tuple containing two interfaces needed for heartbeat
53 'task' : 'tcp...', # addr for task queue, or None if no task queue running
54 # if error:
55 'reason' : 'queue_id already registered'
56 }
57
58 Clients use the same socket to start their connections. Connection requests from clients need no information:
59
60 Message type: ``connection_request``::
61
62 content = {}
63
64 The reply to a Client registration request contains the connection information for the multiplexer and load balanced queues, as well as the address for direct controller queries. If any of these addresses is `None`, that functionality is not available.
65
66 Message type: ``connection_reply``::
67
68 content = {
69 'status' : 'ok', # or 'error'
70 # if ok:
71 'queue' : 'tcp://127.0.0.1:12345', # connection for client side of the queue
72 'task' : 'tcp...', # addr for task queue, or None if no task queue running
73 'controller' : 'tcp...' # addr for controller methods, like queue_request, etc.
74 }
75
76 Heartbeat
77 *********
78
79 The controller uses a heartbeat system to monitor engines, and track when they become unresponsive. As described in :ref:`messages <messages>`, and shown in :ref:`connections <parallel_connections>`.
80
81 Notification (``PUB``)
82 **********************
83
84 The controller published all engine registration/unregistration events on a PUB socket. This allows clients to have up-to-date engine ID sets without polling. Registration notifications contain both the integer engine ID and the queue ID, which is necessary for sending messages via the Multiplexer Queue.
85
86 Message type: ``registration_notification``::
87
88 content = {
89 'id' : 0, # engine ID that has been registered
90 'queue' : 'engine_id' # the IDENT for the engine's queue
91 }
92
93 Message type : ``unregistration_notification``::
94
95 content = {
96 'id' : 0 # engine ID that has been unregistered
97 }
98
99
100 Client Queries (``XREP``)
101 *************************
102
103 The controller monitors and logs all queue traffic, so that clients can retrieve past results or monitor pending tasks. Currently, this information resides in memory on the Controller, but will ultimately be offloaded to a database over an additional ZMQ connection. The interface should remain the same or at least similar.
104
105 :func:`queue_request` requests can specify multiple engines to query via the `targets` element. A verbose flag can be passed, to determine whether the result should be the list of `msg_ids` in the queue or simply the length of each list.
106
107 Message type: ``queue_request``::
108
109 content = {
110 'verbose' : True, # whether return should be lists themselves or just lens
111 'targets' : [0,3,1] # list of ints
112 }
113
114 The content of a reply to a :func:queue_request request is a dict, keyed by the engine IDs. Note that they will be the string representation of the integer keys, since JSON cannot handle number keys.
115
116 Message type: ``queue_reply``::
117
118 content = {
119 '0' : {'completed' : 1, 'queue' : 7},
120 '1' : {'completed' : 10, 'queue' : 1}
121 }
122
123 Clients can request individual results directly from the controller. This is primarily for use gathering results of executions not submitted by the particular client, as the client will have all its own results already. Requests are made by msg_id, and can contain one or more msg_id.
124
125 Message type: ``result_request``::
126
127 content = {
128 'msg_ids' : [uuid,'...'] # list of strs
129 }
130
131 The :func:`result_request` reply contains the content objects of the actual execution reply messages
132
133
134 Message type: ``result_reply``::
135
136 content = {
137 'status' : 'ok', # else error
138 # if ok:
139 msg_id : msg, # the content dict is keyed by msg_ids,
140 # values are the result messages
141 'pending' : ['msg_id','...'], # msg_ids still pending
142 # if error:
143 'reason' : "explanation"
144 }
145
146 Clients can also instruct the controller to forget the results of messages. This can be done by message ID or engine ID. Individual messages are dropped by msg_id, and all messages completed on an engine are dropped by engine ID.
147
148 If the msg_ids element is the string ``'all'`` instead of a list, then all completed results are forgotten.
149
150 Message type: ``purge_request``::
151
152 content = {
153 'msg_ids' : ['id1', 'id2',...], # list of msg_ids or 'all'
154 'engine_ids' : [0,2,4] # list of engine IDs
155 }
156
157 The reply to a purge request is simply the status 'ok' if the request succeeded, or an explanation of why it failed, such as requesting the purge of a nonexistent or pending message.
158
159 Message type: ``purge_reply``::
160
161 content = {
162 'status' : 'ok', # or 'error'
163
164 # if error:
165 'reason' : "KeyError: no such msg_id 'whoda'"
166 }
167
168 :func:`apply` and :func:`apply_bound`
169 *************************************
170
171 The `Namespace <http://gist.github.com/483294>`_ model suggests that execution be able to use the model::
172
173 client.apply(f, *args, **kwargs)
174
175 which takes `f`, a function in the user's namespace, and executes ``f(*args, **kwargs)`` on a remote engine, returning the result (or, for non-blocking, information facilitating later retrieval of the result). This model, unlike the execute message which just uses a code string, must be able to send arbitrary (pickleable) Python objects. And ideally, copy as little data as we can. The `buffers` property of a Message was introduced for this purpose.
176
177 Utility method :func:`build_apply_message` in :mod:`IPython.zmq.streamsession` wraps a function signature and builds the correct buffer format.
178
179 Message type: ``apply_request``::
180
181 content = {
182 'bound' : True # whether to execute in the engine's namespace or unbound
183 }
184 buffers = ['...'] # at least 3 in length
185 # as built by build_apply_message(f,args,kwargs)
186
187 Message type: ``apply_reply``::
188
189 content = {
190 'status' : 'ok' # 'ok' or 'error'
191 # other error info here, as in other messages
192 }
193 buffers = ['...'] # either 1 or 2 in length
194 # a serialization of the return value of f(*args,**kwargs)
195 # only populated if status is 'ok'
196
197
198
199
200 Implementation
201 --------------
202
203 There are a few differences in implementation between the `StreamSession` object used in the parallel computing fork and the `Session` object, the main one being that messages are sent in parts, rather than as a single serialized object. `StreamSession` objects also take pack/unpack functions, which are to be used when serializing/deserializing objects. These can be any functions that translate to/from formats that ZMQ sockets can send (buffers,bytes, etc.).
204
205 Split Sends
206 ***********
207
208 Previously, messages were bundled as a single json object and one call to :func:`socket.send_json`. Since the controller inspects all messages, and doesn't need to see the content of the messages, which can be large, messages are serialized and sent in pieces. All messages are sent in at least 3 parts: the header, the parent header, and the content. This allows the controller to unpack and inspect the (always small) header, without spending time unpacking the content unless the message is bound for the controller. Buffers are added on to the end of the message, and can be any objects that present the buffer interface.
209
@@ -505,3 +505,6 b' ul.search li div.context {'
505 ul.keywordmatches li.goodmatch a {
505 ul.keywordmatches li.goodmatch a {
506 font-weight: bold;
506 font-weight: bold;
507 }
507 }
508 div.figure {
509 text-align: center;
510 }
@@ -16,6 +16,8 b" IPython developer's guide"
16 roadmap.txt
16 roadmap.txt
17 reorg.txt
17 reorg.txt
18 messaging.txt
18 messaging.txt
19 parallel_messages.txt
20 parallel_connections.txt
19 magic_blueprint.txt
21 magic_blueprint.txt
20 notification_blueprint.txt
22 notification_blueprint.txt
21 ipgraph.txt
23 ipgraph.txt
@@ -1,3 +1,5 b''
1 .. _messaging:
2
1 ======================
3 ======================
2 Messaging in IPython
4 Messaging in IPython
3 ======================
5 ======================
General Comments 0
You need to be logged in to leave comments. Login now