upstream/ipython Files · docs/source/development/messaging.txt

.. _messaging:

======================

Messaging in IPython

======================

Introduction

============

This document explains the basic communications design and messaging

specification for how the various IPython objects interact over a network

transport. The current implementation uses the ZeroMQ_ library for messaging

within and between hosts.

.. Note::

This document should be considered the authoritative description of the

IPython messaging protocol, and all developers are strongly encouraged to

keep it updated as the implementation evolves, so that we have a single

common reference for all protocol details.

The basic design is explained in the following diagram:

.. image:: frontend-kernel.png

:width: 450px

:alt: IPython kernel/frontend messaging architecture.

:align: center

:target: ../_images/frontend-kernel.png

A single kernel can be simultaneously connected to one or more frontends. The

kernel has three sockets that serve the following functions:

1. REQ: this socket is connected to a *single* frontend at a time, and it allows

the kernel to request input from a frontend when :func:`raw_input` is called.

The frontend holding the matching REP socket acts as a 'virtual keyboard'

for the kernel while this communication is happening (illustrated in the

figure by the black outline around the central keyboard). In practice,

frontends may display such kernel requests using a special input widget or

otherwise indicating that the user is to type input for the kernel instead

of normal commands in the frontend.

2. XREP: this single sockets allows multiple incoming connections from

frontends, and this is the socket where requests for code execution, object

information, prompts, etc. are made to the kernel by any frontend. The

communication on this socket is a sequence of request/reply actions from

each frontend and the kernel.

3. PUB: this socket is the 'broadcast channel' where the kernel publishes all

side effects (stdout, stderr, etc.) as well as the requests coming from any

client over the XREP socket and its own requests on the REP socket. There

are a number of actions in Python which generate side effects: :func:`print`

writes to ``sys.stdout``, errors generate tracebacks, etc. Additionally, in

a multi-client scenario, we want all frontends to be able to know what each

other has sent to the kernel (this can be useful in collaborative scenarios,

for example). This socket allows both side effects and the information

about communications taking place with one client over the XREQ/XREP channel

to be made available to all clients in a uniform manner.

All messages are tagged with enough information (details below) for clients

to know which messages come from their own interaction with the kernel and

which ones are from other clients, so they can display each type

appropriately.

The actual format of the messages allowed on each of these channels is

specified below. Messages are dicts of dicts with string keys and values that

are reasonably representable in JSON. Our current implementation uses JSON

explicitly as its message format, but this shouldn't be considered a permanent

feature. As we've discovered that JSON has non-trivial performance issues due

to excessive copying, we may in the future move to a pure pickle-based raw

message format. However, it should be possible to easily convert from the raw

objects to JSON, since we may have non-python clients (e.g. a web frontend).

As long as it's easy to make a JSON version of the objects that is a faithful

representation of all the data, we can communicate with such clients.

.. Note::

Not all of these have yet been fully fleshed out, but the key ones are, see

kernel and frontend files for actual implementation details.

Python functional API

=====================

As messages are dicts, they map naturally to a ``func(**kw)`` call form. We

should develop, at a few key points, functional forms of all the requests that

take arguments in this manner and automatically construct the necessary dict

for sending.

General Message Format

======================

All messages send or received by any IPython process should have the following

generic structure::

{

# The message header contains a pair of unique identifiers for the

# originating session and the actual message id, in addition to the

# username for the process that generated the message. This is useful in

# collaborative settings where multiple users may be interacting with the

# same kernel simultaneously, so that frontends can label the various

# messages in a meaningful way.

'header' : { 'msg_id' : uuid,

'username' : str,

'session' : uuid

},

# In a chain of messages, the header from the parent is copied so that

# clients can track where messages come from.

'parent_header' : dict,

# All recognized message type strings are listed below.

'msg_type' : str,

# The actual content of the message must be a dict, whose structure

# depends on the message type.x

'content' : dict,

}

For each message type, the actual content will differ and all existing message

types are specified in what follows of this document.

Messages on the XREP/XREQ socket

================================

.. _execute:

Execute

-------

This message type is used by frontends to ask the kernel to execute code on

behalf of the user, in a namespace reserved to the user's variables (and thus

separate from the kernel's own internal code and variables).

Message type: ``execute_request``::

content = {

# Source code to be executed by the kernel, one or more lines.

'code' : str,

# A boolean flag which, if True, signals the kernel to execute this

# code as quietly as possible. This means that the kernel will compile

# the code witIPython/core/tests/h 'exec' instead of 'single' (so

# sys.displayhook will not fire), and will *not*:

# - broadcast exceptions on the PUB socket

# - do any logging

# - populate any history

#

# The default is False.

'silent' : bool,

# A list of variable names from the user's namespace to be retrieved. What

# returns is a JSON string of the variable's repr(), not a python object.

'user_variables' : list,

# Similarly, a dict mapping names to expressions to be evaluated in the

# user's dict.

'user_expressions' : dict,

}

The ``code`` field contains a single string, but this may be a multiline

string. The kernel is responsible for splitting this into possibly more than

one block and deciding whether to compile these in 'single' or 'exec' mode.

We're still sorting out this policy. The current inputsplitter is capable of

splitting the input for blocks that can all be run as 'single', but in the long

run it may prove cleaner to only use 'single' mode for truly single-line

inputs, and run all multiline input in 'exec' mode. This would preserve the

natural behavior of single-line inputs while allowing long cells to behave more

likea a script. This design will be refined as we complete the implementation.

The ``user_`` fields deserve a detailed explanation. In the past, IPython had

the notion of a prompt string that allowed arbitrary code to be evaluated, and

this was put to good use by many in creating prompts that displayed system

status, path information, and even more esoteric uses like remote instrument

status aqcuired over the network. But now that IPython has a clean separation

between the kernel and the clients, the notion of embedding 'prompt'

maninpulations into the kernel itself feels awkward. Prompts should be a

frontend-side feature, and it should be even possible for different frontends

to display different prompts while interacting with the same kernel.

We have therefore abandoned the idea of a 'prompt string' to be evaluated by

the kernel, and instead provide the ability to retrieve from the user's

namespace information after the execution of the main ``code``, with two fields

of the execution request:

- ``user_variables``: If only variables from the user's namespace are needed, a

list of variable names can be passed and a dict with these names as keys and

their :func:`repr()` as values will be returned.

- ``user_expressions``: For more complex expressions that require function

evaluations, a dict can be provided with string keys and arbitrary python

expressions as values. The return message will contain also a dict with the

same keys and the :func:`repr()` of the evaluated expressions as value.

With this information, frontends can display any status information they wish

in the form that best suits each frontend (a status line, a popup, inline for a

terminal, etc).

.. Note::

In order to obtain the current execution counter for the purposes of

displaying input prompts, frontends simply make an execution request with an

empty code string and ``silent=True``.

Execution semantics

Upon completion of the execution request, the kernel *always* sends a

reply, with a status code indicating what happened and additional data

depending on the outcome.

The ``code`` field is executed first, and then the ``user_variables`` and

``user_expressions`` are computed. This ensures that any error in the

latter don't harm the main code execution.

Any error in retrieving the ``user_variables`` or evaluating the

``user_expressions`` will result in a simple error message in the return

fields of the form::

[ERROR] ExceptionType: Exception message

The user can simply send the same variable name or expression for

evaluation to see a regular traceback.

Execution counter (old prompt number)

The kernel has a single, monotonically increasing counter of all execution

requests that are made with ``silent=False``. This counter is used to

populate the ``In[n]``, ``Out[n]`` and ``_n`` variables, so clients will

likely want to display it in some form to the user, which will typically

(but not necessarily) be done in the prompts. The value of this counter

will be returned as the ``execution_count`` field of all ``execute_reply```

messages.

Message type: ``execute_reply``::

content = {

# One of: 'ok' OR 'error' OR 'abort'

'status' : str,

# The global kernel counter that increases by one with each non-silent

# executed request. This will typically be used by clients to display

# prompt numbers to the user. If the request was a silent one, this will

# be the current value of the counter in the kernel.

'execution_count' : int,

}

When status is 'ok', the following extra fields are present::

{

# The execution payload is a dict with string keys that may have been

# produced by the code being executed. It is retrieved by the kernel at

# the end of the execution and sent back to the front end, which can take

# action on it as needed. See main text for further details.

'payload' : dict,

# Results for the user_variables and user_expressions.

'user_variables' : dict,

'user_expressions' : dict,

# The kernel will often transform the input provided to it. If the

# '---->' transform had been applied, this is filled, otherwise it's the

# empty string. So transformations like magics don't appear here, only

# autocall ones.

'transformed_code' : str,

}

.. admonition:: Execution payloads

The notion of an 'execution payload' is different from a return value of a

given set of code, which normally is just displayed on the pyout stream

through the PUB socket. The idea of a payload is to allow special types of

code, typically magics, to populate a data container in the IPython kernel

that will be shipped back to the caller via this channel. The kernel will

have an API for this, probably something along the lines of::

ip.exec_payload_add(key, value)

though this API is still in the design stages. The data returned in this

payload will allow frontends to present special views of what just happened.

When status is 'error', the following extra fields are present::

{

'exc_name' : str, # Exception name, as a string

'exc_value' : str, # Exception value, as a string

# The traceback will contain a list of frames, represented each as a

# string. For now we'll stick to the existing design of ultraTB, which

# controls exception level of detail statefully. But eventually we'll

# want to grow into a model where more information is collected and

# packed into the traceback object, with clients deciding how little or

# how much of it to unpack. But for now, let's start with a simple list

# of strings, since that requires only minimal changes to ultratb as

# written.

'traceback' : list,

}

When status is 'abort', there are for now no additional data fields. This

happens when the kernel was interrupted by a signal.

Kernel attribute access

-----------------------

While this protocol does not specify full RPC access to arbitrary methods of

the kernel object, the kernel does allow read (and in some cases write) access

to certain attributes.

The policy for which attributes can be read is: any attribute of the kernel, or

its sub-objects, that belongs to a :class:`Configurable` object and has been

declared at the class-level with Traits validation, is in principle accessible

as long as its name does not begin with a leading underscore. The attribute

itself will have metadata indicating whether it allows remote read and/or write

access. The message spec follows for attribute read and write requests.

Message type: ``getattr_request``::

content = {

# The (possibly dotted) name of the attribute

'name' : str,

}

When a ``getattr_request`` fails, there are two possible error types:

- AttributeError: this type of error was raised when trying to access the

given name by the kernel itself. This means that the attribute likely

doesn't exist.

- AccessError: the attribute exists but its value is not readable remotely.

Message type: ``getattr_reply``::

content = {

# One of ['ok', 'AttributeError', 'AccessError'].

'status' : str,

# If status is 'ok', a JSON object.

'value' : object,

}

Message type: ``setattr_request``::

content = {

# The (possibly dotted) name of the attribute

'name' : str,

# A JSON-encoded object, that will be validated by the Traits

# information in the kernel

'value' : object,

}

When a ``setattr_request`` fails, there are also two possible error types with

similar meanings as those of the ``getattr_request`` case, but for writing.

Message type: ``setattr_reply``::

content = {

# One of ['ok', 'AttributeError', 'AccessError'].

'status' : str,

}

Object information

------------------

One of IPython's most used capabilities is the introspection of Python objects

in the user's namespace, typically invoked via the ``?`` and ``??`` characters

(which in reality are shorthands for the ``%pinfo`` magic). This is used often

enough that it warrants an explicit message type, especially because frontends

may want to get object information in response to user keystrokes (like Tab or

F1) besides from the user explicitly typing code like ``x??``.

Message type: ``object_info_request``::

content = {

# The (possibly dotted) name of the object to be searched in all

# relevant namespaces

'name' : str,

# The level of detail desired. The default (0) is equivalent to typing

# 'x?' at the prompt, 1 is equivalent to 'x??'.

'detail_level' : int,

}

The returned information will be a dictionary with keys very similar to the

field names that IPython prints at the terminal.

Message type: ``object_info_reply``::

content = {

# Flags for magics and system aliases

'ismagic' : bool,

'isalias' : bool,

# The name of the namespace where the object was found ('builtin',

# 'magics', 'alias', 'interactive', etc.)

'namespace' : str,

# The type name will be type.__name__ for normal Python objects, but it

# can also be a string like 'Magic function' or 'System alias'

'type_name' : str,

'string_form' : str,

# For objects with a __class__ attribute this will be set

'base_class' : str,

# For objects with a __len__ attribute this will be set

'length' : int,

# If the object is a function, class or method whose file we can find,

# we give its full path

'file' : str,

# For pure Python callable objects, we can reconstruct the object

# definition line which provides its call signature. For convenience this

# is returned as a single 'definition' field, but below the raw parts that

# compose it are also returned as the argspec field.

'definition' : str,

# The individual parts that together form the definition string. Clients

# with rich display capabilities may use this to provide a richer and more

# precise representation of the definition line (e.g. by highlighting

# arguments based on the user's cursor position). For non-callable

# objects, this field is empty.

'argspec' : { # The names of all the arguments

args : list,

# The name of the varargs (*args), if any

varargs : str,

# The name of the varkw (**kw), if any

varkw : str,

# The values (as strings) of all default arguments. Note

# that these must be matched *in reverse* with the 'args'

# list above, since the first positional args have no default

# value at all.

func_defaults : list,

},

# For instances, provide the constructor signature (the definition of

# the __init__ method):

'init_definition' : str,

# Docstrings: for any object (function, method, module, package) with a

# docstring, we show it. But in addition, we may provide additional

# docstrings. For example, for instances we will show the constructor

# and class docstrings as well, if available.

'docstring' : str,

# For instances, provide the constructor and class docstrings

'init_docstring' : str,

'class_docstring' : str,

# If detail_level was 1, we also try to find the source code that

# defines the object, if possible. The string 'None' will indicate

# that no source was found.

'source' : str,

}

Complete

--------

Message type: ``complete_request``::

content = {

# The text to be completed, such as 'a.is'

'text' : str,

# The full line, such as 'print a.is'. This allows completers to

# make decisions that may require information about more than just the

# current word.

'line' : str,

# The entire block of text where the line is. This may be useful in the

# case of multiline completions where more context may be needed. Note: if

# in practice this field proves unnecessary, remove it to lighten the

# messages.

'block' : str,

# The position of the cursor where the user hit 'TAB' on the line.

'cursor_pos' : int,

}

Message type: ``complete_reply``::

content = {

# The list of all matches to the completion request, such as

# ['a.isalnum', 'a.isalpha'] for the above example.

'matches' : list

}

History

-------

For clients to explicitly request history from a kernel. The kernel has all

the actual execution history stored in a single location, so clients can

request it from the kernel when needed.

Message type: ``history_request``::

content = {

# If True, also return output history in the resulting dict.

'output' : bool,

# If True, return the raw input history, else the transformed input.

'raw' : bool,

# This parameter can be one of: A number, a pair of numbers, None

# If not given, last 40 are returned.

# - number n: return the last n entries.

# - pair n1, n2: return entries in the range(n1, n2).

# - None: return all history

'index' : n or (n1, n2) or None,

}

Message type: ``history_reply``::

content = {

# A dict with prompt numbers as keys and either (input, output) or input

# as the value depending on whether output was True or False,

# respectively.

'history' : dict,

}

Messages on the PUB/SUB socket

==============================

Streams (stdout, stderr, etc)

------------------------------

Message type: ``stream``::

content = {

# The name of the stream is one of 'stdin', 'stdout', 'stderr'

'name' : str,

# The data is an arbitrary string to be written to that stream

'data' : str,

}

When a kernel receives a raw_input call, it should also broadcast it on the pub

socket with the names 'stdin' and 'stdin_reply'. This will allow other clients

to monitor/display kernel interactions and possibly replay them to their user

or otherwise expose them.

Python inputs

-------------

These messages are the re-broadcast of the ``execute_request``.

Message type: ``pyin``::

content = {

# Source code to be executed, one or more lines

'code' : str

}

Python outputs

--------------

When Python produces output from code that has been compiled in with the

'single' flag to :func:`compile`, any expression that produces a value (such as

``1+1``) is passed to ``sys.displayhook``, which is a callable that can do with

this value whatever it wants. The default behavior of ``sys.displayhook`` in

the Python interactive prompt is to print to ``sys.stdout`` the :func:`repr` of

the value as long as it is not ``None`` (which isn't printed at all). In our

case, the kernel instantiates as ``sys.displayhook`` an object which has

similar behavior, but which instead of printing to stdout, broadcasts these

values as ``pyout`` messages for clients to display appropriately.

Message type: ``pyout``::

content = {

# The data is typically the repr() of the object.

'data' : str,

# The counter for this execution is also provided so that clients can

# display it, since IPython automatically creates variables called _N (for

# prompt N).

'execution_count' : int,

}

Python errors

-------------

When an error occurs during code execution

Message type: ``pyerr``::

content = {

# Similar content to the execute_reply messages for the 'error' case,

# except the 'status' field is omitted.

}

Kernel crashes

--------------

When the kernel has an unexpected exception, caught by the last-resort

sys.excepthook, we should broadcast the crash handler's output before exiting.

This will allow clients to notice that a kernel died, inform the user and

propose further actions.

Message type: ``crash``::

content = {

# Similarly to the 'error' case for execute_reply messages, this will

# contain exc_name, exc_type and traceback fields.

# An additional field with supplementary information such as where to

# send the crash message

'info' : str,

}

Future ideas

------------

Other potential message types, currently unimplemented, listed below as ideas.

Message type: ``file``::

content = {

'path' : 'cool.jpg',

'mimetype' : str,

'data' : str,

}

Messages on the REQ/REP socket

==============================

This is a socket that goes in the opposite direction: from the kernel to a

*single* frontend, and its purpose is to allow ``raw_input`` and similar

operations that read from ``sys.stdin`` on the kernel to be fulfilled by the

client. For now we will keep these messages as simple as possible, since they

basically only mean to convey the ``raw_input(prompt)`` call.

Message type: ``input_request``::

content = { 'prompt' : str }

Message type: ``input_reply``::

content = { 'value' : str }

.. Note::

We do not explicitly try to forward the raw ``sys.stdin`` object, because in

practice the kernel should behave like an interactive program. When a

program is opened on the console, the keyboard effectively takes over the

``stdin`` file descriptor, and it can't be used for raw reading anymore.

Since the IPython kernel effectively behaves like a console program (albeit

one whose "keyboard" is actually living in a separate process and

transported over the zmq connection), raw ``stdin`` isn't expected to be

available.

Heartbeat for kernels

=====================

Initially we had considered using messages like those above over ZMQ for a

kernel 'heartbeat' (a way to detect quickly and reliably whether a kernel is

alive at all, even if it may be busy executing user code). But this has the

problem that if the kernel is locked inside extension code, it wouldn't execute

the python heartbeat code. But it turns out that we can implement a basic

heartbeat with pure ZMQ, without using any Python messaging at all.

The monitor sends out a single zmq message (right now, it is a str of the

monitor's lifetime in seconds), and gets the same message right back, prefixed

with the zmq identity of the XREQ socket in the heartbeat process. This can be

a uuid, or even a full message, but there doesn't seem to be a need for packing

up a message when the sender and receiver are the exact same Python object.

The model is this::

monitor.send(str(self.lifetime)) # '1.2345678910'

and the monitor receives some number of messages of the form::

['uuid-abcd-dead-beef', '1.2345678910']

where the first part is the zmq.IDENTITY of the heart's XREQ on the engine, and

the rest is the message sent by the monitor. No Python code ever has any

access to the message between the monitor's send, and the monitor's recv.

ToDo

====

Missing things include:

* Important: finish thinking through the payload concept and API.

* Important: ensure that we have a good solution for magics like %edit. It's

likely that with the payload concept we can build a full solution, but not

100% clear yet.

* Finishing the details of the heartbeat protocol.

* Signal handling: specify what kind of information kernel should broadcast (or

not) when it receives signals.

.. include:: ../links.rst

	Site-wide shortcuts
/	Use quick search box
g h	Goto home page
g g	Goto my private gists page
g G	Goto my public gists page
g 0-9	Goto bookmarked items from 0-9
n r	New repository page
n g	New gist page

	Repositories
g s	Goto summary page
g c	Goto changelog page
g f	Goto files page
g F	Goto files page with file search activated
g p	Goto pull requests page
g o	Goto repository settings
g O	Goto repository access permissions settings
t s	Toggle sidebar on some pages