.. Change note: "Add note about pure-zmq heartbeat messaging." (Fernando Perez)
======================
 Messaging in IPython
======================


Introduction
============

This document explains the basic communications design and messaging
specification for how the various IPython objects interact over a network
transport. The current implementation uses the ZeroMQ_ library for messaging
within and between hosts.

.. Note::

   This document should be considered the authoritative description of the
   IPython messaging protocol, and all developers are strongly encouraged to
   keep it updated as the implementation evolves, so that we have a single
   common reference for all protocol details.

The basic design is explained in the following diagram:

.. image:: frontend-kernel.png
   :width: 450px
   :alt: IPython kernel/frontend messaging architecture.
   :align: center
   :target: ../_images/frontend-kernel.png

A single kernel can be simultaneously connected to one or more frontends. The
kernel has three sockets that serve the following functions:

1. REQ: this socket is connected to a *single* frontend at a time, and it allows
   the kernel to request input from a frontend when :func:`raw_input` is called.
   The frontend holding the matching REP socket acts as a 'virtual keyboard'
   for the kernel while this communication is happening (illustrated in the
   figure by the black outline around the central keyboard). In practice,
   frontends may display such kernel requests using a special input widget or
   otherwise indicate that the user is to type input for the kernel instead
   of normal commands in the frontend.

2. XREP: this single socket allows multiple incoming connections from
   frontends, and this is the socket where requests for code execution, object
   information, prompts, etc. are made to the kernel by any frontend. The
   communication on this socket is a sequence of request/reply actions between
   each frontend and the kernel.

3. PUB: this socket is the 'broadcast channel' where the kernel publishes all
   side effects (stdout, stderr, etc.) as well as the requests coming from any
   client over the XREP socket and its own requests on the REQ socket. There
   are a number of actions in Python which generate side effects: :func:`print`
   writes to ``sys.stdout``, errors generate tracebacks, etc. Additionally, in
   a multi-client scenario, we want all frontends to be able to know what each
   other has sent to the kernel (this can be useful in collaborative scenarios,
   for example). This socket allows both side effects and the information
   about communications taking place with one client over the XREQ/XREP channel
   to be made available to all clients in a uniform manner.

All messages are tagged with enough information (details below) for clients
to know which messages come from their own interaction with the kernel and
which ones are from other clients, so they can display each type
appropriately.

The actual format of the messages allowed on each of these channels is
specified below. Messages are dicts of dicts with string keys and values that
are reasonably representable in JSON. Our current implementation uses JSON
explicitly as its message format, but this shouldn't be considered a permanent
feature. As we've discovered that JSON has non-trivial performance issues due
to excessive copying, we may in the future move to a pure pickle-based raw
message format. However, it should be possible to easily convert from the raw
objects to JSON, since we may have non-Python clients (e.g. a web frontend).
As long as it's easy to make a JSON version of the objects that is a faithful
representation of all the data, we can communicate with such clients.

.. Note::

   Not all of these have been fully fleshed out yet, but the key ones have;
   see the kernel and frontend files for actual implementation details.


Python functional API
=====================

As messages are dicts, they map naturally to a ``func(**kw)`` call form. We
should develop, at a few key points, functional forms of all the requests that
take arguments in this manner and automatically construct the necessary dict
for sending.

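As a minimal sketch of what such functional forms might look like (the function names below are hypothetical, not an existing IPython API), each function simply returns the ``content`` dict for one request type, with parameter names identical to the content keys:

```python
def execute_request(code, silent=False):
    """Build the content of an ``execute_request`` (hypothetical helper).

    Each keyword argument maps one-to-one onto a key of the content dict.
    """
    return dict(code=code, silent=silent)


def complete_request(text, line):
    """Build the content of a ``complete_request`` (hypothetical helper)."""
    return dict(text=text, line=line)


# A content dict received from elsewhere can be unpacked straight back
# into the corresponding call, since the keys match the parameter names.
content = execute_request(**{'code': 'x = 1', 'silent': True})
```

Keeping parameter names identical to the content keys is what makes the ``func(**kw)`` round trip work; the header and message type would be added by a separate layer.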

General Message Format
======================

All messages sent or received by any IPython process should have the following
generic structure::

    {
      # The message header contains a pair of unique identifiers for the
      # originating session and the actual message id, in addition to the
      # username for the process that generated the message. This is useful in
      # collaborative settings where multiple users may be interacting with the
      # same kernel simultaneously, so that frontends can label the various
      # messages in a meaningful way.
      'header' : { 'msg_id' : uuid,
                   'username' : str,
                   'session' : uuid
                   },

      # In a chain of messages, the header from the parent is copied so that
      # clients can track where messages come from.
      'parent_header' : dict,

      # All recognized message type strings are listed below.
      'msg_type' : str,

      # The actual content of the message must be a dict, whose structure
      # depends on the message type.
      'content' : dict,
    }

For each message type, the actual content will differ; all existing message
types are specified in the rest of this document.

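The sketch below shows one way to build such a message in Python and check that it survives a JSON round trip. The helper name ``make_message`` and its ``username`` default are hypothetical; it also assumes that uuids travel as strings and that a message without a parent carries an empty ``parent_header``:

```python
import json
import uuid


def make_message(msg_type, content, session, username='kernel', parent=None):
    """Wrap ``content`` in the generic message structure (hypothetical helper)."""
    header = {
        'msg_id': str(uuid.uuid4()),  # unique for every message
        'username': username,
        'session': session,           # shared by all messages of one session
    }
    return {
        'header': header,
        # A reply or side effect copies the header of the message that caused it.
        'parent_header': dict(parent['header']) if parent is not None else {},
        'msg_type': msg_type,
        'content': content,
    }


session = str(uuid.uuid4())
request = make_message('execute_request', {'code': '1+1', 'silent': False}, session)
reply = make_message('execute_reply', {'status': 'ok'}, session, parent=request)

# Only strings, bools and dicts are used, so JSON is lossless here.
assert json.loads(json.dumps(reply)) == reply
```

A client can then match ``reply['parent_header']['msg_id']`` against the ids of requests it sent itself to tell its own traffic apart from other clients'.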

Messages on the XREP/XREQ socket
================================

.. _execute:

Execute
-------

The execution request contains a single string, but this may be a multiline
string. The kernel is responsible for splitting this into possibly more than
one block and deciding whether to compile these in 'single' or 'exec' mode.
We're still sorting out this policy. The current inputsplitter is capable of
splitting the input for blocks that can all be run as 'single', but in the long
run it may prove cleaner to only use 'single' mode for truly single-line
inputs, and run all multiline input in 'exec' mode. This would preserve the
natural behavior of single-line inputs while allowing long cells to behave more
like a script. This design will be refined as we complete the implementation.

Message type: ``execute_request``::

    content = {
        # Source code to be executed by the kernel, one or more lines.
        'code' : str,

        # A boolean flag which, if True, signals the kernel to execute this
        # code as quietly as possible. This means that the kernel will compile
        # the code with 'exec' instead of 'single' (so sys.displayhook will not
        # fire), and will *not*:
        #   - broadcast exceptions on the PUB socket
        #   - do any logging
        #   - populate any history
        # The default is False.
        'silent' : bool,
    }

Upon execution, the kernel *always* sends a reply, with a status code
indicating what happened and additional data depending on the outcome.

Message type: ``execute_reply``::

    content = {
        # One of: 'ok' OR 'error' OR 'abort'
        'status' : str,

        # Any additional data depends on status value
    }

When status is 'ok', the following extra fields are present::

    {
      # This has the same structure as the output of a prompt request, but is
      # for the client to set up the *next* prompt (with identical limitations
      # to a prompt request)
      'next_prompt' : {
          'prompt_string' : str,
          'prompt_number' : int,
      },

      # The prompt number of the actual execution for this code, which may be
      # different from the one used when the code was typed, which was the
      # 'next_prompt' field of the *previous* request. They will differ in the
      # case where there is more than one client talking simultaneously to a
      # kernel, since the numbers can go out of sync. GUI clients can use this
      # to correct the previously written number in-place, terminal ones may
      # re-print a corrected one if desired.
      'prompt_number' : int,

      # The kernel will often transform the input provided to it. This
      # contains the transformed code, which is what was actually executed.
      'transformed_code' : str,

      # The execution payload is a dict with string keys that may have been
      # produced by the code being executed. It is retrieved by the kernel at
      # the end of the execution and sent back to the front end, which can take
      # action on it as needed. See the 'Execution payloads' note below.
      'payload' : dict,
    }

.. admonition:: Execution payloads

   The notion of an 'execution payload' is different from a return value of a
   given set of code, which normally is just displayed on the pyout stream
   through the PUB socket. The idea of a payload is to allow special types of
   code, typically magics, to populate a data container in the IPython kernel
   that will be shipped back to the caller via this channel. The kernel will
   have an API for this, probably something along the lines of::

       ip.exec_payload_add(key, value)

   though this API is still in the design stages. The data returned in this
   payload will allow frontends to present special views of what just happened.


When status is 'error', the following extra fields are present::

    {
      'exc_name' : str,   # Exception name, as a string
      'exc_value' : str,  # Exception value, as a string

      # The traceback will contain a list of frames, represented each as a
      # string. For now we'll stick to the existing design of ultraTB, which
      # controls exception level of detail statefully. But eventually we'll
      # want to grow into a model where more information is collected and
      # packed into the traceback object, with clients deciding how little or
      # how much of it to unpack. But for now, let's start with a simple list
      # of strings, since that requires only minimal changes to ultratb as
      # written.
      'traceback' : list,
    }


When status is 'abort', there are for now no additional data fields. This
happens when the kernel was interrupted by a signal.

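A frontend has to branch on ``status`` before it can read any of the extra fields. A minimal sketch, assuming a hypothetical ``describe_reply`` helper that only produces a one-line summary of each case:

```python
def describe_reply(content):
    """Summarize an ``execute_reply`` content dict in one line (hypothetical)."""
    status = content['status']
    if status == 'ok':
        # 'prompt_number' is the number actually used for this execution.
        return 'ok [{0}]'.format(content['prompt_number'])
    if status == 'error':
        return '{0}: {1}'.format(content['exc_name'], content['exc_value'])
    if status == 'abort':
        # No extra fields are defined for this case.
        return 'aborted'
    raise ValueError('unknown status: {0!r}'.format(status))
```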

Prompt
------

A simple request for the current prompt string.

Message type: ``prompt_request``::

    content = {}

In the reply, the prompt string comes back with the prompt number placeholder
*unevaluated*. The message format is:

Message type: ``prompt_reply``::

    content = {
        'prompt_string' : str,
        'prompt_number' : int,
    }

Clients can produce a prompt with ``prompt_string.format(prompt_number)``, but
they should be aware that the actual prompt number for that input could change
later, in the case where multiple clients are interacting with a single
kernel.

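For illustration, assuming the placeholder is a positional ``str.format`` field such as ``{0}`` (the exact placeholder text is not fixed by this specification), a client-side renderer is a one-liner:

```python
def render_prompt(reply_content):
    """Evaluate the prompt number placeholder on the client (hypothetical)."""
    return reply_content['prompt_string'].format(reply_content['prompt_number'])


# Assumed reply: the 'In [{0}]: ' string is an illustration, not a fixed format.
prompt = render_prompt({'prompt_string': 'In [{0}]: ', 'prompt_number': 5})
```

If an ``execute_reply`` later reports a different ``prompt_number``, a GUI client can call the same function with the corrected number and redraw the prompt in place.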

Object information
------------------

One of IPython's most used capabilities is the introspection of Python objects
in the user's namespace, typically invoked via the ``?`` and ``??`` characters
(which in reality are shorthands for the ``%pinfo`` magic). This is used often
enough that it warrants an explicit message type, especially because frontends
may want to get object information in response to user keystrokes (like Tab or
F1) in addition to the user explicitly typing code like ``x??``.

Message type: ``object_info_request``::

    content = {
        # The (possibly dotted) name of the object to be searched in all
        # relevant namespaces
        'name' : str,

        # The level of detail desired. The default (0) is equivalent to typing
        # 'x?' at the prompt, 1 is equivalent to 'x??'.
        'detail_level' : int,
    }

The returned information will be a dictionary with keys very similar to the
field names that IPython prints at the terminal.

Message type: ``object_info_reply``::

    content = {
        # Flags for magics and system aliases
        'ismagic' : bool,
        'isalias' : bool,

        # The name of the namespace where the object was found ('builtin',
        # 'magics', 'alias', 'interactive', etc.)
        'namespace' : str,

        # The type name will be type.__name__ for normal Python objects, but it
        # can also be a string like 'Magic function' or 'System alias'
        'type_name' : str,

        'string_form' : str,

        # For objects with a __class__ attribute this will be set
        'base_class' : str,

        # For objects with a __len__ attribute this will be set
        'length' : int,

        # If the object is a function, class or method whose file we can find,
        # we give its full path
        'file' : str,

        # For pure Python callable objects, we can reconstruct the object
        # definition line which provides its call signature
        'definition' : str,

        # For instances, provide the constructor signature (the definition of
        # the __init__ method):
        'init_definition' : str,

        # Docstrings: for any object (function, method, module, package) with a
        # docstring, we show it. But in addition, we may provide additional
        # docstrings. For example, for instances we will show the constructor
        # and class docstrings as well, if available.
        'docstring' : str,

        # For instances, provide the constructor and class docstrings
        'init_docstring' : str,
        'class_docstring' : str,

        # If detail_level was 1, we also try to find the source code that
        # defines the object, if possible. The string 'None' will indicate
        # that no source was found.
        'source' : str,
    }

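As a rough illustration of where some of these fields can come from, the sketch below fills a subset of them with the standard :mod:`inspect` module. The ``object_info`` function and its choice of fields are assumptions for illustration, not the kernel's actual implementation:

```python
import inspect


def object_info(obj, name):
    """Fill a subset of the ``object_info_reply`` fields (hypothetical)."""
    info = {
        'type_name': type(obj).__name__,
        'string_form': str(obj),
        'base_class': str(obj.__class__),
        'docstring': inspect.getdoc(obj) or '',
    }
    try:
        info['length'] = len(obj)
    except TypeError:
        pass  # no usable __len__
    if callable(obj):
        try:
            # Rebuild the definition line from the call signature.
            info['definition'] = name + str(inspect.signature(obj))
        except (TypeError, ValueError):
            pass  # some builtins expose no signature
    try:
        info['file'] = inspect.getfile(obj)
    except (TypeError, OSError):
        pass  # builtins and plain instances have no source file
    return info


def greet(who, punctuation='!'):
    """Say hello."""
    return 'Hello, ' + who + punctuation


info = object_info(greet, 'greet')
```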

Complete
--------

Message type: ``complete_request``::

    content = {
        # The text to be completed, such as 'a.is'
        'text' : str,

        # The full line, such as 'print a.is'. This allows completers to
        # make decisions that may require information about more than just the
        # current word.
        'line' : str,
    }

Message type: ``complete_reply``::

    content = {
        # The list of all matches to the completion request, such as
        # ['a.isalnum', 'a.isalpha'] for the above example.
        'matches' : list
    }

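A toy completer for dotted names, sketched under the assumption that attribute lookup (with no evaluation of user code) is enough; it ignores the ``line`` field, and the ``complete`` function and its ``namespace`` argument are hypothetical:

```python
def complete(text, namespace):
    """Return completions for ``text`` such as 'a.is' (hypothetical helper)."""
    if '.' not in text:
        return sorted(name for name in namespace if name.startswith(text))
    expr, _, prefix = text.rpartition('.')
    # Resolve the dotted prefix by attribute lookup only; nothing is evaluated.
    parts = expr.split('.')
    obj = namespace[parts[0]]
    for part in parts[1:]:
        obj = getattr(obj, part)
    return sorted(expr + '.' + name
                  for name in dir(obj) if name.startswith(prefix))


matches = complete('a.is', {'a': 'hello'})
```

The result would be sent back as ``{'matches': matches}``; for a string ``a`` it includes ``'a.isalnum'`` and ``'a.isalpha'`` from the example above.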

History
-------

This message type allows clients to explicitly request history from a kernel.
The kernel has all the actual execution history stored in a single location,
so clients can request it from the kernel when needed.

Message type: ``history_request``::

    content = {

        # If true, also return output history in the resulting dict.
        'output' : bool,

        # This parameter can be one of: a number, a pair of numbers, 'all'.
        # If not given, the last 40 entries are returned.
        #   - number n: return the last n entries.
        #   - pair n1, n2: return entries in the range(n1, n2).
        #   - 'all': return all history
        'range' : n or (n1, n2) or 'all',

        # If a filter is given, it is treated as a regular expression and only
        # matching entries are returned. re.search() is used to find matches.
        'filter' : str,
    }

Message type: ``history_reply``::

    content = {
        # A list of (number, input) pairs
        'input' : list,

        # A list of (number, output) pairs
        'output' : list,
    }

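The ``range`` and ``filter`` semantics can be sketched as follows. This assumes history entries are ``(number, input)`` pairs and reads ``(n1, n2)`` as selecting entry *numbers* in ``range(n1, n2)`` rather than list positions; the ``select_history`` helper is hypothetical and ignores the ``output`` flag:

```python
import re


def select_history(entries, range_spec=None, pattern=None):
    """Apply a history_request's 'range' and 'filter' (hypothetical helper).

    ``entries`` is a list of (number, input) pairs sorted by number.
    """
    if range_spec is None:
        range_spec = 40  # default: the last 40 entries
    if range_spec == 'all':
        selected = list(entries)
    elif isinstance(range_spec, int):
        # Guard n == 0, since entries[-0:] would return everything.
        selected = entries[-range_spec:] if range_spec > 0 else []
    else:
        n1, n2 = range_spec
        # Interpreted like range(n1, n2): n1 included, n2 excluded.
        selected = [(n, src) for n, src in entries if n1 <= n < n2]
    if pattern is not None:
        selected = [(n, src) for n, src in selected if re.search(pattern, src)]
    return selected


history = [(1, 'import os'), (2, 'x = 1'), (3, 'os.getcwd()'), (4, 'x + 1')]
```

For example, ``select_history(history, 'all', pattern='os')`` keeps entries 1 and 3 only.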

Messages on the PUB/SUB socket
==============================

Streams (stdout, stderr, etc)
------------------------------

Message type: ``stream``::

    content = {
        # The name of the stream is one of 'stdin', 'stdout', 'stderr'
        'name' : str,

        # The data is an arbitrary string to be written to that stream
        'data' : str,
    }

When a kernel receives a :func:`raw_input` call, it should also broadcast it on
the PUB socket with the names 'stdin' and 'stdin_reply'. This will allow other
clients to monitor/display kernel interactions and possibly replay them to
their user or otherwise expose them.

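One way a kernel could turn writes to ``sys.stdout`` into ``stream`` messages is to replace it with a small file-like object. In this sketch ``StreamPublisher`` is a hypothetical name and ``publish`` stands in for sending a content dict on the PUB socket:

```python
import sys


class StreamPublisher(object):
    """File-like object that turns writes into ``stream`` content (hypothetical)."""

    def __init__(self, name, publish):
        self.name = name
        self.publish = publish

    def write(self, data):
        if data:
            self.publish({'name': self.name, 'data': data})
        return len(data)

    def flush(self):
        pass  # nothing is buffered in this sketch


sent = []
old_stdout = sys.stdout
sys.stdout = StreamPublisher('stdout', sent.append)
try:
    print('hello')
finally:
    sys.stdout = old_stdout
```

Note that :func:`print` may call ``write`` more than once (for the text and for the line ending), so clients should concatenate consecutive ``stream`` messages rather than treat each one as a line.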

Python inputs
-------------

These messages are the re-broadcast of the ``execute_request``.

Message type: ``pyin``::

    content = {
        # Source code to be executed, one or more lines
        'code' : str
    }

Python outputs
--------------

When Python produces output from code that has been compiled in 'single' mode
by :func:`compile`, any expression that produces a value (such as ``1+1``) is
passed to ``sys.displayhook``, which is a callable that can do with this value
whatever it wants. The default behavior of ``sys.displayhook`` in the Python
interactive prompt is to print to ``sys.stdout`` the :func:`repr` of the value
as long as it is not ``None`` (which isn't printed at all). In our case, the
kernel installs as ``sys.displayhook`` an object with similar behavior which,
instead of printing to stdout, broadcasts these values as ``pyout`` messages
for clients to display appropriately.

Message type: ``pyout``::

    content = {
        # The data is typically the repr() of the object.
        'data' : str,

        # The prompt number for this execution is also provided so that clients
        # can display it, since IPython automatically creates variables called
        # _N (for prompt N).
        'prompt_number' : int,
    }

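A minimal sketch of such a display hook, assuming a ``publish`` callable that sends content on the PUB socket and a kernel that sets ``prompt_number`` before each execution (``DisplayHook`` and ``publish`` are hypothetical names):

```python
import sys


class DisplayHook(object):
    """Stand-in for sys.displayhook that publishes ``pyout`` content (hypothetical)."""

    def __init__(self, publish, user_ns):
        self.publish = publish
        self.user_ns = user_ns
        self.prompt_number = 0  # set by the kernel before each execution

    def __call__(self, value):
        if value is None:
            return  # like the default hook, show nothing for None
        # Mirror IPython's _N variables in the user namespace.
        self.user_ns['_{0}'.format(self.prompt_number)] = value
        self.publish({'data': repr(value), 'prompt_number': self.prompt_number})


sent = []
user_ns = {}
hook = DisplayHook(sent.append, user_ns)
hook.prompt_number = 7

old_hook, sys.displayhook = sys.displayhook, hook
try:
    # 'single' mode is what routes expression values to sys.displayhook.
    exec(compile('1+1', '<input>', 'single'), user_ns)
    exec(compile('None', '<input>', 'single'), user_ns)
finally:
    sys.displayhook = old_hook
```

Only the first execution publishes anything: ``{'data': '2', 'prompt_number': 7}``.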

Python errors
-------------

When an error occurs during code execution, the kernel broadcasts a ``pyerr``
message.

Message type: ``pyerr``::

    content = {
        # Similar content to the execute_reply messages for the 'error' case,
        # except the 'status' field is omitted.
    }

Kernel crashes
--------------

When the kernel has an unexpected exception, caught by the last-resort
``sys.excepthook``, we should broadcast the crash handler's output before
exiting. This will allow clients to notice that a kernel died, inform the user
and propose further actions.

Message type: ``crash``::

    content = {
        # Similarly to the 'error' case for execute_reply messages, this will
        # contain exc_name, exc_value and traceback fields.

        # An additional field with supplementary information such as where to
        # send the crash message
        'info' : str,
    }

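A sketch of building the ``crash`` content from the exception triple and installing it as a last-resort hook. It mirrors the 'error' fields of ``execute_reply``; ``crash_content``, ``install_crash_hook`` and ``publish`` are hypothetical names:

```python
import sys
import traceback


def crash_content(exc_type, exc_value, tb, info=''):
    """Build the content of a ``crash`` message (hypothetical helper)."""
    return {
        'exc_name': exc_type.__name__,
        'exc_value': str(exc_value),
        # A list of strings, as in the 'error' case of execute_reply.
        'traceback': traceback.format_exception(exc_type, exc_value, tb),
        'info': info,
    }


def install_crash_hook(publish, info=''):
    """Broadcast a crash message, then run the default excepthook."""
    def hook(exc_type, exc_value, tb):
        publish(crash_content(exc_type, exc_value, tb, info))
        sys.__excepthook__(exc_type, exc_value, tb)
    sys.excepthook = hook


try:
    1 / 0
except ZeroDivisionError:
    content = crash_content(*sys.exc_info(), info='where to send bug reports')
```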

Future ideas
------------

Other potential message types, currently unimplemented, are listed below as
ideas.

Message type: ``file``::

    content = {
        'path' : 'cool.jpg',
        'mimetype' : str,
        'data' : str,
    }


Messages on the REQ/REP socket
==============================

This is a socket that goes in the opposite direction: from the kernel to a
*single* frontend, and its purpose is to allow ``raw_input`` and similar
operations that read from ``sys.stdin`` on the kernel to be fulfilled by the
client. For now we will keep these messages as simple as possible, since they
basically only mean to convey the ``raw_input(prompt)`` call.

Message type: ``input_request``::

    content = { 'prompt' : str }

Message type: ``input_reply``::

    content = { 'value' : str }

.. Note::

   We do not explicitly try to forward the raw ``sys.stdin`` object, because in
   practice the kernel should behave like an interactive program. When a
   program is opened on the console, the keyboard effectively takes over the
   ``stdin`` file descriptor, and it can't be used for raw reading anymore.
   Since the IPython kernel effectively behaves like a console program (albeit
   one whose "keyboard" is actually living in a separate process and
   transported over the zmq connection), raw ``stdin`` isn't expected to be
   available.

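The kernel side of this exchange can be sketched as an object whose ``raw_input`` method performs one request/reply round trip. Here ``request`` stands in for the blocking send/receive on the REQ socket, and ``RemoteStdin`` and ``fake_frontend`` are hypothetical names:

```python
class RemoteStdin(object):
    """Fulfil raw_input() in the kernel by asking a frontend (hypothetical)."""

    def __init__(self, request):
        self.request = request

    def raw_input(self, prompt=''):
        # Send input_request content and block until input_reply content arrives.
        reply = self.request({'prompt': prompt})
        return reply['value']


seen = []


def fake_frontend(content):
    """Stand-in for the frontend holding the REP socket."""
    seen.append(content['prompt'])
    return {'value': '42'}


stdin = RemoteStdin(fake_frontend)
answer = stdin.raw_input('Pick a number: ')
```

A kernel could then bind ``stdin.raw_input`` in place of the built-in ``raw_input`` (``input`` on Python 3) in the user's namespace.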

Heartbeat for kernels
=====================

Initially we had considered using messages like those above over ZMQ for a
kernel 'heartbeat' (a way to detect quickly and reliably whether a kernel is
alive at all, even if it may be busy executing user code). The problem is that
if the kernel is locked inside extension code, it would not execute the Python
heartbeat code. It turns out, however, that we can implement a basic heartbeat
with pure ZMQ, without using any Python messaging at all.

The monitor sends out a single zmq message (right now, it is a str of the
monitor's lifetime in seconds), and gets the same message right back, prefixed
with the zmq identity of the XREQ socket in the heartbeat process. This can be
a uuid, or even a full message, but there doesn't seem to be a need for packing
up a message when the sender and receiver are the exact same Python object.

The model is this::

    monitor.send(str(self.lifetime)) # '1.2345678910'

and the monitor receives some number of messages of the form::

    ['uuid-abcd-dead-beef', '1.2345678910']

where the first part is the zmq.IDENTITY of the heart's XREQ on the engine, and
the rest is the message sent by the monitor. No Python code ever has any
access to the message between the monitor's send and the monitor's recv.

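The heart itself needs no Python code, but the monitor still has to decide which hearts answered. A socket-free sketch of that bookkeeping, assuming replies arrive as ``[identity, ping]`` lists and that a heart missing a round is reported when the next round starts; ``HeartMonitor`` and ``send`` are hypothetical names:

```python
import time


class HeartMonitor(object):
    """Monitor-side bookkeeping for the pure-zmq heartbeat (hypothetical).

    ``send`` stands in for the monitor's outgoing socket; echoed replies are
    passed to ``handle_reply`` as [identity, ping] lists.
    """

    def __init__(self, send, hearts):
        self.send = send
        self.hearts = set(hearts)  # identities we expect to hear from
        self.start = time.time()
        self.ping = None
        self.responders = set()

    def beat(self):
        """Start a new round; return the hearts that missed the previous one."""
        missed = self.hearts - self.responders if self.ping is not None else set()
        self.ping = str(time.time() - self.start)  # the monitor's lifetime
        self.responders = set()
        self.send(self.ping)
        return missed

    def handle_reply(self, msg):
        identity, ping = msg
        # Echoes of an older ping arrive too late to count for this round.
        if ping == self.ping:
            self.responders.add(identity)


sent = []
monitor = HeartMonitor(sent.append, ['heart-a', 'heart-b'])
first_missed = monitor.beat()                # nothing to report yet
monitor.handle_reply(['heart-a', sent[-1]])  # heart-a echoes the ping
monitor.handle_reply(['heart-b', 'stale'])   # does not match, ignored
second_missed = monitor.beat()
```

After the second ``beat()``, only ``'heart-b'`` is reported as having missed a round.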

ToDo
====

Missing things include:

* Important: finish thinking through the payload concept and API.

* Important: ensure that we have a good solution for magics like ``%edit``.
  It's likely that with the payload concept we can build a full solution, but
  this is not 100% clear yet.

* Finish the details of the heartbeat protocol.

* Signal handling: specify what kind of information the kernel should broadcast
  (or not) when it receives signals.

.. include:: ../links.rst