From 8dbbf5e225c816fe2b74c5756ab0b3a558cd9303 2010-08-11 07:34:53
From: Fernando Perez <Fernando.Perez@berkeley.edu>
Date: 2010-08-11 07:34:53
Subject: [PATCH] Major overhaul of the messaging documentation.

---

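For readers of this patch, here is a minimal sketch of the general message
format that the revised messaging.txt specifies, and of how a reply copies its
parent's header.  The helper name make_message, the usernames and the example
field values are assumptions made for illustration only; they are not part of
the IPython code base.

```python
import json
import uuid


def make_message(msg_type, content, session, username, parent_header=None):
    """Build a dict in the general message format (hypothetical helper)."""
    return {
        "header": {
            "msg_id": str(uuid.uuid4()),
            "username": username,
            "session": session,
        },
        # A copy of the parent's header, or an empty dict for a fresh request.
        "parent_header": dict(parent_header or {}),
        "msg_type": msg_type,
        "content": content,
    }


# A frontend asks the kernel to run some code.
frontend_session = str(uuid.uuid4())
request = make_message(
    "execute_request",
    {"code": "a = 10", "silent": False},
    frontend_session,
    "alice",
)

# The kernel answers; the request header travels back as parent_header so
# that clients can tell which request this reply belongs to.
kernel_session = str(uuid.uuid4())
reply = make_message(
    "execute_reply",
    {
        "status": "error",
        "exc_name": "NameError",
        "exc_value": "name 'b' is not defined",
        "traceback": ["<frame 1>", "<frame 2>"],
    },
    kernel_session,
    "kernel",
    parent_header=request["header"],
)

# Both messages hold only JSON-representable values.
wire = json.dumps(reply)
```

Storing the ids as strings keeps the dicts directly serializable with the
standard json module, which matches the stated requirement that values be
reasonably representable in JSON.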
diff --git a/docs/Makefile b/docs/Makefile
index e6a20f3..2a03b8d 100644
--- a/docs/Makefile
+++ b/docs/Makefile
@@ -39,8 +39,9 @@ pdf: latex
 
 all: html pdf
 
-dist: clean all
+dist: all
 	mkdir -p dist
+	rm -rf dist/*
 	ln build/latex/ipython.pdf dist/
 	cp -al build/html dist/
 	@echo "Build finished.  Final docs are in dist/"
@@ -98,3 +99,6 @@ linkcheck:
 gitwash-update:
 	python ../tools/gitwash_dumper.py source/development ipython
 	cd source/development/gitwash && rename 's/.rst/.txt/' *.rst
+
+nightly: dist
+	rsync -avH --delete dist/ ipython:www/doc/nightly
\ No newline at end of file
diff --git a/docs/autogen_api.py b/docs/autogen_api.py
index d098700..ac842a8 100755
--- a/docs/autogen_api.py
+++ b/docs/autogen_api.py
@@ -27,7 +27,12 @@ if __name__ == '__main__':
                                         r'\.config\.default',
                                         r'\.config\.profile',
                                         r'\.frontend',
-                                        r'\.gui'
+                                        r'\.gui',
+                                        # For now, the zmq code has
+                                        # unconditional top-level code so it's
+                                        # not import safe.  This needs fixing
+                                        # soon.
+                                        r'\.zmq',
                                         ]
 
     docwriter.module_skip_patterns += [ r'\.core\.fakemodule',
diff --git a/docs/source/development/messaging.txt b/docs/source/development/messaging.txt
index c8d4026..f1039ce 100644
--- a/docs/source/development/messaging.txt
+++ b/docs/source/development/messaging.txt
@@ -1,21 +1,82 @@
-=====================
-Message Specification
-=====================
-
-Note: not all of these have yet been fully fleshed out, but the key ones are,
-see kernel and frontend files for actual implementation details.
-
-Messages are dicts of dicts with string keys and values that are reasonably
-representable in JSON.  Our current implementation uses JSON explicitly as its
-message format, but this shouldn't be considered a permanent feature.  As we've
-discovered that JSON has non-trivial performance issues due to excessive
-copying, we may in the future move to a pure pickle-based raw message format.
-However, it should be possible to easily convert from the raw objects to JSON,
-since we may have non-python clients (e.g. a web frontend).  As long as it's
-easy to make a JSON version of the objects that is a faithful representation of
-all the data, we can communicate with such clients.
+======================
+ Messaging in IPython
+======================
 
 
+Introduction
+============
+
+This document explains the basic communications design and messaging
+specification for how the various IPython objects interact over a network
+transport.  The current implementation uses the ZeroMQ_ library for messaging
+within and between hosts.
+
+.. Note::
+
+   This document should be considered the authoritative description of the
+   IPython messaging protocol, and all developers are strongly encouraged to
+   keep it updated as the implementation evolves, so that we have a single
+   common reference for all protocol details.
+   
+The basic design is explained in the following diagram:
+
+.. image:: frontend-kernel.png
+   :width: 450px
+   :alt: IPython kernel/frontend messaging architecture.
+   :align: center
+   :target: ../_images/frontend-kernel.png
+
+A single kernel can be simultaneously connected to one or more frontends.  The
+kernel has three sockets that serve the following functions:
+
+1. REQ: this socket is connected to a *single* frontend at a time, and it allows
+   the kernel to request input from a frontend when :func:`raw_input` is called.
+   The frontend holding the matching REP socket acts as a 'virtual keyboard'
+   for the kernel while this communication is happening (illustrated in the
+   figure by the black outline around the central keyboard).  In practice,
+   frontends may display such kernel requests using a special input widget or
+   otherwise indicate that the user is to type input for the kernel instead
+   of normal commands in the frontend.
+
+2. XREP: this single socket allows multiple incoming connections from
+   frontends, and this is the socket where requests for code execution, object
+   information, prompts, etc. are made to the kernel by any frontend.  The
+   communication on this socket is a sequence of request/reply actions from
+   each frontend and the kernel.
+
+3. PUB: this socket is the 'broadcast channel' where the kernel publishes all
+   side effects (stdout, stderr, etc.) as well as the requests coming from any
+   client over the XREP socket and its own requests on the REQ socket.  There
+   are a number of actions in Python which generate side effects: :func:`print`
+   writes to ``sys.stdout``, errors generate tracebacks, etc.  Additionally, in
+   a multi-client scenario, we want all frontends to be able to know what each
+   other has sent to the kernel (this can be useful in collaborative scenarios,
+   for example).  This socket allows both side effects and the information
+   about communications taking place with one client over the XREQ/XREP channel
+   to be made available to all clients in a uniform manner.
+
+   All messages are tagged with enough information (details below) for clients
+   to know which messages come from their own interaction with the kernel and
+   which ones are from other clients, so they can display each type
+   appropriately.
+
+The actual format of the messages allowed on each of these channels is
+specified below.  Messages are dicts of dicts with string keys and values that
+are reasonably representable in JSON.  Our current implementation uses JSON
+explicitly as its message format, but this shouldn't be considered a permanent
+feature.  As we've discovered that JSON has non-trivial performance issues due
+to excessive copying, we may in the future move to a pure pickle-based raw
+message format.  However, it should be possible to easily convert from the raw
+objects to JSON, since we may have non-python clients (e.g. a web frontend).
+As long as it's easy to make a JSON version of the objects that is a faithful
+representation of all the data, we can communicate with such clients.
+
+.. Note::
+
+   Not all of these have yet been fully fleshed out, but the key ones are; see
+   the kernel and frontend files for actual implementation details.
+
+   
 Python functional API
 =====================
 
@@ -26,100 +87,43 @@ for sending.
 
 
 General Message Format
-=====================
-
-General message format::
-
-    {
-        header : { 'msg_id' : 10,    # start with 0
-	           'username' : 'name',
-		   'session' : uuid
-		   },
-	parent_header : dict,
-        msg_type : 'string_message_type',
-        content : blackbox_dict , # Must be a dict
-    }
-
-    
-Request/Reply going from kernel for stdin
-=========================================
-
-This is a socket that goes in the opposite direction: from the kernel to a
-*single* frontend, and its purpose is to allow ``raw_input`` and similar
-operations that read from ``sys.stdin`` on the kernel to be fulfilled by the
-client.  For now we will keep these messages as simple as possible, since they
-basically only mean to convey the ``raw_input(prompt)`` call.
-
-Message type: 'input_request'::
-
-    content = { prompt : string }
-
-Message type: 'input_reply'::
-
-    content = { value : string }
-
-    
-Side effect: (PUB/SUB)
 ======================
 
-Message type: 'stream'::
-
-    content = {
-	name : 'stdout',
-	data : 'blob',
-    }
-
-When a kernel receives a raw_input call, it should also broadcast it on the pub
-socket with the names 'stdin' and 'stdin_reply'.  This will allow other clients
-to monitor/display kernel interactions and possibly replay them to their user
-or otherwise expose them.
+All messages sent or received by any IPython process should have the following
+generic structure::
     
-Message type: 'pyin'::
-
-    content = {
-	code = 'x=1',
-    }
-
-Message type: 'pyout'::
+    {
+      # The message header contains a pair of unique identifiers for the
+      # originating session and the actual message id, in addition to the
+      # username for the process that generated the message.  This is useful in
+      # collaborative settings where multiple users may be interacting with the
+      # same kernel simultaneously, so that frontends can label the various
+      # messages in a meaningful way.
+      'header' : { 'msg_id' : uuid,
+                   'username' : str,
+		   'session' : uuid
+		 },
 
-    content = {
-	data = 'repr(obj)',
-	prompt_number = 10
-    }
+      # In a chain of messages, the header from the parent is copied so that
+      # clients can track where messages come from.
+      'parent_header' : dict,
 
-Message type: 'pyerr'::
+      # All recognized message type strings are listed below.
+      'msg_type' : str,
 
-    content = {
-       # Same as the data payload of a code execute_reply, minus the 'status'
-       # field. See below.
+      # The actual content of the message must be a dict, whose structure
+      # depends on the message type.
+      'content' : dict,
     }
 
-When the kernel has an unexpected exception, caught by the last-resort
-sys.excepthook, we should broadcast the crash handler's output before exiting.
-This will allow clients to notice that a kernel died, inform the user and
-propose further actions.
+For each message type, the actual content will differ; all existing message
+types are specified in the remainder of this document.
 
-Message type: 'crash'::
 
-    content = {
-	traceback : 'full traceback',
-	exc_type : 'TypeError',
-	exc_value :  'msg'
-    }
+Messages on the XREP/XREQ socket
+================================
 
-
-Other potential message types, currently unimplemented, listed below as ideas.
-    
-Message type: 'file'::
-    content = {
-	path : 'cool.jpg',
-	mimetype : string
-	data : 'blob'
-    }
-
-    
-Request/Reply
-=============
+.. _execute:
 
 Execute
 -------
@@ -132,22 +136,36 @@ splitting the input for blocks that can all be run as 'single', but in the long
 run it may prove cleaner to only use 'single' mode for truly single-line
 inputs, and run all multiline input in 'exec' mode.  This would preserve the
 natural behavior of single-line inputs while allowing long cells to behave more
-likea a script.  Some thought is still required here...
+like a script.  This design will be refined as we complete the implementation.
 
-Message type: 'execute_request'::
+Message type: ``execute_request``::
 
     content = {
-	code : 'a = 10',
+        # Source code to be executed by the kernel, one or more lines.
+	'code' : str,
+
+	# A boolean flag which, if True, signals the kernel to execute this
+	# code as quietly as possible.  This means that the kernel will compile
+	# the code with 'exec' instead of 'single' (so sys.displayhook will not
+	# fire), and will *not*:
+	#   - broadcast exceptions on the PUB socket
+	#   - do any logging
+	#   - populate any history
+	# The default is False.
+	'silent' : bool,
     }
 
-Reply:
+Upon execution, the kernel *always* sends a reply, with a status code
+indicating what happened and additional data depending on the outcome.
 
-Message type: 'execute_reply'::
+Message type: ``execute_reply``::
 
     content = {
-      'status' : 'ok' OR 'error' OR 'abort'
+      # One of: 'ok' OR 'error' OR 'abort'
+      'status' : str,
+
       # Any additional data depends on status value
-}
+    }
 
 When status is 'ok', the following extra fields are present::
 
@@ -156,9 +174,9 @@ When status is 'ok', the following extra fields are present::
       # for the client to set up the *next* prompt (with identical limitations
       # to a prompt request)
       'next_prompt' : {
-            prompt_string : string
-	    prompt_number : int
-	    }
+            'prompt_string' : str,
+	    'prompt_number' : int,
+	    },
 	    
       # The prompt number of the actual execution for this code, which may be
       # different from the one used when the code was typed, which was the
@@ -167,25 +185,39 @@ When status is 'ok', the following extra fields are present::
       # kernel, since the numbers can go out of sync.  GUI clients can use this
       # to correct the previously written number in-place, terminal ones may
       # re-print a corrected one if desired.
-      'prompt_number' : number
+      'prompt_number' : int,
 
       # The kernel will often transform the input provided to it.  This
       # contains the transformed code, which is what was actually executed.
-      'transformed_code' : new_code
-
-      # This 'payload' needs a bit more thinking.  The basic idea is that
-      # certain actions will want to return additional information, such as
-      # magics producing data output for display by the clients.  We may need
-      # to define a few types of payload, or specify a syntax for the, not sure
-      # yet... FIXME here.
-      'payload' : things from page(), for example.
+      'transformed_code' : str,
+
+      # The execution payload is a dict with string keys that may have been
+      # produced by the code being executed.  It is retrieved by the kernel at
+      # the end of the execution and sent back to the front end, which can take
+      # action on it as needed.  See main text for further details.
+      'payload' : dict,
     }
 
+.. admonition:: Execution payloads
+    
+   The notion of an 'execution payload' is different from a return value of a
+   given set of code, which normally is just displayed on the pyout stream
+   through the PUB socket.  The idea of a payload is to allow special types of
+   code, typically magics, to populate a data container in the IPython kernel
+   that will be shipped back to the caller via this channel.  The kernel will
+   have an API for this, probably something along the lines of::
+
+       ip.exec_payload_add(key, value)
+
+   though this API is still in the design stages.  The data returned in this
+   payload will allow frontends to present special views of what just happened.
+
+   
 When status is 'error', the following extra fields are present::
 
     {
-      etype : str   # Exception type, as a string
-      evalue : str #  Exception value, as a string
+      'exc_name' : str,   # Exception name, as a string
+      'exc_value' : str,  # Exception value, as a string
 
       # The traceback will contain a list of frames, represented each as a
       # string.  For now we'll stick to the existing design of ultraTB, which
@@ -195,11 +227,12 @@ When status is 'error', the following extra fields are present::
       # how much of it to unpack.  But for now, let's start with a simple list
       # of strings, since that requires only minimal changes to ultratb as
       # written.
-      traceback : list of strings
+      'traceback' : list,
     }
 
 
-When status is 'abort', there are for now no additional data fields.
+When status is 'abort', there are for now no additional data fields.  This
+happens when the kernel was interrupted by a signal.
 
 
 Prompt
@@ -207,78 +240,321 @@ Prompt
 
 A simple request for a current prompt string.
 
-Message type: 'prompt_request'::
+Message type: ``prompt_request``::
 
     content = {}
 
 In the reply, the prompt string comes back with the prompt number placeholder
 *unevaluated*.  The message format is:
     
-Message type: 'prompt_reply'::
+Message type: ``prompt_reply``::
 
     content = {
-      prompt_string : string
-      prompt_number : int
+      'prompt_string' : str,
+      'prompt_number' : int,
     }
 
 Clients can produce a prompt with ``prompt_string.format(prompt_number)``, but
 they should be aware that the actual prompt number for that input could change
 later, in the case where multiple clients are interacting with a single
 kernel. 
+
+
+Object information
+------------------
+
+One of IPython's most used capabilities is the introspection of Python objects
+in the user's namespace, typically invoked via the ``?`` and ``??`` characters
+(which in reality are shorthands for the ``%pinfo`` magic).  This is used often
+enough that it warrants an explicit message type, especially because frontends
+may want to get object information in response to user keystrokes (like Tab or
+F1), not only when the user explicitly types code like ``x??``.
+
+Message type: ``object_info_request``::
+
+    content = {
+        # The (possibly dotted) name of the object to be searched in all
+	# relevant namespaces
+	'name' : str,
+
+	# The level of detail desired.  The default (0) is equivalent to typing
+	# 'x?' at the prompt, 1 is equivalent to 'x??'.
+	'detail_level' : int,
+    }
+
+The returned information will be a dictionary with keys very similar to the
+field names that IPython prints at the terminal.
     
+Message type: ``object_info_reply``::
+
+    content = {
+        # Flags for magics and system aliases
+	'ismagic' : bool,
+	'isalias' : bool,
+
+	# The name of the namespace where the object was found ('builtin',
+	# 'magics', 'alias', 'interactive', etc.)
+	'namespace' : str,
+
+	# The type name will be type.__name__ for normal Python objects, but it
+	# can also be a string like 'Magic function' or 'System alias'
+	'type_name' : str,
+
+	'string_form' : str,
+
+	# For objects with a __class__ attribute this will be set
+	'base_class' : str,
+
+	# For objects with a __len__ attribute this will be set
+	'length' : int,
+
+	# If the object is a function, class or method whose file we can find,
+	# we give its full path
+	'file' : str,
+
+	# For pure Python callable objects, we can reconstruct the object
+	# definition line which provides its call signature
+	'definition' : str,
+
+	# For instances, provide the constructor signature (the definition of
+	# the __init__ method):
+	'init_definition' : str,
+	
+	# Docstrings: for any object (function, method, module, package) with a
+	# docstring, we show it.  But in addition, we may provide additional
+	# docstrings.  For example, for instances we will show the constructor
+	# and class docstrings as well, if available.
+	'docstring' : str,
+
+	# For instances, provide the constructor and class docstrings
+	'init_docstring' : str,
+	'class_docstring' : str,
+
+	# If detail_level was 1, we also try to find the source code that
+	# defines the object, if possible.  The string 'None' will indicate
+	# that no source was found.
+	'source' : str,
+    }
+
     
 Complete
 --------
 
-Message type: 'complete_request'::
+Message type: ``complete_request``::
 
     content = {
-	text : 'a.f',    # complete on this
-	line : 'print a.f'    # full line
+        # The text to be completed, such as 'a.is'
+	'text' : str,
+
+	# The full line, such as 'print a.is'.  This allows completers to
+	# make decisions that may require information about more than just the
+	# current word.
+	'line' : str,
     }
 
-Message type: 'complete_reply'::
+Message type: ``complete_reply``::
 
     content = {
-	matches : ['a.foo', 'a.bar']
+        # The list of all matches to the completion request, such as
+	# ['a.isalnum', 'a.isalpha'] for the above example.
+	'matches' : list
     }
 
     
 History
 -------
 
+Clients can explicitly request history from a kernel.  The kernel has all
+For clients to explicitly request history from a kernel.  The kernel has all
+the actual execution history stored in a single location, so clients can
+request it from the kernel when needed.
 
-Message type: 'history_request'::
+Message type: ``history_request``::
 
     content = {
-      output : boolean.  If true, also return output history in the resulting
-      dict.
-
-      range : optional. A number,  a pair of numbers, 'all'
-        If not given, last 40 are returned.
-        - number n: return the last n entries.
-	- pair n1, n2: return entries in the range(n1, n2).
-	- 'all': return all history
-
-      filter : optional, string
-        If given, treated as a regular expression and only matching entries are
-        returned.  re.search() is used to find matches.
+    
+      # If true, also return output history in the resulting dict.
+      'output' : bool,
+
+      # This parameter can be one of: a number, a pair of numbers, or 'all'.
+      # If not given, last 40 are returned.
+      #  - number n: return the last n entries.
+      #  - pair n1, n2: return entries in the range(n1, n2).
+      #  - 'all': return all history
+      'range' : n or (n1, n2) or 'all',
+
+      # If a filter is given, it is treated as a regular expression and only
+      # matching entries are returned.  re.search() is used to find matches.
+      'filter' : str,
     }
 
-Message type: 'history_reply'::
+Message type: ``history_reply``::
 
     content = {
-      input : list of pairs (number, input)
-      output : list of pairs (number, output). Empty if not requested.
+      # A list of (number, input) pairs
+      'input' : list,
+      
+      # A list of (number, output) pairs
+      'output' : list,
       }
 
 
 Control
 -------
 
-Message type: 'heartbeat'::
+Message type: ``heartbeat``::
+
+    content = {
+        # FIXME - unfinished
+    }
+
+    
+Messages on the PUB/SUB socket
+==============================
+
+Streams (stdout, stderr, etc.)
+------------------------------
+
+Message type: ``stream``::
+
+    content = {
+        # The name of the stream is one of 'stdin', 'stdout', 'stderr'
+	'name' : str,
+	
+	# The data is an arbitrary string to be written to that stream
+	'data' : str,
+    }
+
+When a kernel receives a raw_input call, it should also broadcast it on the pub
+socket with the names 'stdin' and 'stdin_reply'.  This will allow other clients
+to monitor/display kernel interactions and possibly replay them to their user
+or otherwise expose them.
+
+Python inputs
+-------------
+
+These messages are the re-broadcast of the ``execute_request``.
+
+Message type: ``pyin``::
+
+    content = {
+        # Source code to be executed, one or more lines
+	'code' : str
+    }
+
+Python outputs
+--------------
+
+When Python produces output from code that has been compiled with the
+'single' flag to :func:`compile`, any expression that produces a value (such as
+``1+1``) is passed to ``sys.displayhook``, which is a callable that can do with
+this value whatever it wants.  The default behavior of ``sys.displayhook`` in
+the Python interactive prompt is to print to ``sys.stdout`` the :func:`repr` of
+the value as long as it is not ``None`` (which isn't printed at all).  In our
+case, the kernel instantiates as ``sys.displayhook`` an object which has
+similar behavior, but which instead of printing to stdout, broadcasts these
+values as ``pyout`` messages for clients to display appropriately.
+
+Message type: ``pyout``::
+
+    content = {
+        # The data is typically the repr() of the object.
+	'data' : str,
+	
+	# The prompt number for this execution is also provided so that clients
+	# can display it, since IPython automatically creates variables called
+	# _N (for prompt N).
+	'prompt_number' : int,
+    }
+    
+Python errors
+-------------
+
+When an error occurs during code execution, it is broadcast on this socket.
+
+Message type: ``pyerr``::
+
+    content = {
+       # Similar content to the execute_reply messages for the 'error' case,
+       # except the 'status' field is omitted.
+    }
+
+Kernel crashes
+--------------
+
+When the kernel has an unexpected exception, caught by the last-resort
+sys.excepthook, we should broadcast the crash handler's output before exiting.
+This will allow clients to notice that a kernel died, inform the user and
+propose further actions.
+
+Message type: ``crash``::
 
     content = {
-    # XXX - unfinished
+       # Similarly to the 'error' case for execute_reply messages, this will
+       # contain exc_name, exc_value and traceback fields.
+
+       # An additional field with supplementary information such as where to
+       # send the crash message
+       'info' : str,
     }
+
+
+Future ideas
+------------
+    
+Other potential message types, currently unimplemented, listed below as ideas.
+    
+Message type: ``file``::
+
+    content = {
+	'path' : 'cool.jpg',
+	'mimetype' : str,
+	'data' : str,
+    }
+
+    
+Messages on the REQ/REP socket
+==============================
+
+This is a socket that goes in the opposite direction: from the kernel to a
+*single* frontend, and its purpose is to allow ``raw_input`` and similar
+operations that read from ``sys.stdin`` on the kernel to be fulfilled by the
+client.  For now we will keep these messages as simple as possible, since they
+are basically only meant to convey the ``raw_input(prompt)`` call.
+
+Message type: ``input_request``::
+
+    content = { 'prompt' : str }
+
+Message type: ``input_reply``::
+
+    content = { 'value' : str }
+    
+.. Note::
+
+   We do not explicitly try to forward the raw ``sys.stdin`` object, because in
+   practice the kernel should behave like an interactive program.  When a
+   program is opened on the console, the keyboard effectively takes over the
+   ``stdin`` file descriptor, and it can't be used for raw reading anymore.
+   Since the IPython kernel effectively behaves like a console program (albeit
+   one whose "keyboard" is actually living in a separate process and
+   transported over the zmq connection), raw ``stdin`` isn't expected to be
+   available.
+
+
+ToDo
+====
+
+Missing things include:
+
+* Important: finish thinking through the payload concept and API.
+
+* Important: ensure that we have a good solution for magics like %edit.  It's
+  likely that with the payload concept we can build a full solution, but not
+  100% clear yet.
+
+* Finishing the details of the heartbeat protocol.
+
+* Signal handling: specify what kind of information kernel should broadcast (or
+  not) when it receives signals.
+
+.. include:: ../links.rst
diff --git a/docs/source/links.rst b/docs/source/links.rst
index abd5e7e..ce0b5ab 100644
--- a/docs/source/links.rst
+++ b/docs/source/links.rst
@@ -24,6 +24,8 @@
 .. _ipython_downloads: http://ipython.scipy.org/dist
 .. _ipython_pypi: http://pypi.python.org/pypi/ipython
 
+.. _ZeroMQ: http://zeromq.org
+
 .. Documentation tools and related links
 .. _graphviz: http://www.graphviz.org
 .. _Sphinx: http://sphinx.pocoo.org