.. _parallel_connections:

==============================================
Connection Diagrams of The IPython ZMQ Cluster
==============================================

This is a quick summary and illustration of the connections involved in the ZeroMQ based
IPython cluster for parallel computing.

All Connections
===============

The Parallel Computing code is currently under development in IPython's newparallel_
branch on GitHub.

.. _newparallel: http://github.com/ipython/ipython/tree/newparallel

The IPython cluster consists of a Controller and one or more each of clients and engines.
The goal of the Controller is to manage and monitor the connections and communications
between the clients and the engines. The Controller is no longer a single-process entity,
but rather a collection of processes: specifically one Hub and 3 (or more) Schedulers.

It is important for security/practicality reasons that all connections be inbound to the
controller processes. The arrows in the figures indicate the direction of the
connection.

.. figure:: figs/allconnections.png
    :width: 432px
    :alt: IPython cluster connections
    :align: center

    All the connections involved in connecting one client to one engine.

The Controller consists of 1-4 processes. Central to the cluster is the **Hub**, which
monitors engine state and execution traffic, and handles registration and notification. The
Hub includes a Heartbeat Monitor for keeping track of engines that are alive. Outside the
Hub are 3 **Schedulers**. The MUX queue and Control queue are MonitoredQueue ØMQ
devices which relay explicitly addressed messages. The Task queue performs load-balancing,
destination-agnostic scheduling. It may be a MonitoredQueue device, or it may be a
Python Scheduler that behaves externally in an identical fashion to MQ devices, but with
additional internal logic.

Registration
------------

.. figure:: figs/regfade.png
    :width: 432px
    :alt: IPython Registration connections
    :align: center

    Engines and Clients only need to know where the Registrar ``XREP`` is located to start
    connecting.

Once a controller is launched, the only information needed for connecting clients and/or
engines is the IP/port of the Hub's ``XREP`` socket called the Registrar. This socket
handles connections from both clients and engines, and replies with the
information necessary to establish the remaining connections.
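The bootstrapping idea can be sketched as a plain-Python model with no ØMQ involved. Everything named here is hypothetical: the addresses, the channel names, and the split of channels between engines and clients are assumptions read off the sections of this document, not the actual registration message format.

```python
# Hypothetical controller addresses; a real controller chooses its own ports.
CONTROLLER_ADDRS = {
    "mux": "tcp://10.0.0.1:5571",
    "task": "tcp://10.0.0.1:5572",
    "control": "tcp://10.0.0.1:5573",
    "hb_ping": "tcp://10.0.0.1:5574",
    "hb_pong": "tcp://10.0.0.1:5575",
    "query": "tcp://10.0.0.1:5576",
    "notification": "tcp://10.0.0.1:5577",
}

# Assumed split: engines take part in the heartbeat, clients query the Hub
# and listen for notifications; both talk to the three Schedulers.
NEEDS = {
    "engine": ("mux", "task", "control", "hb_ping", "hb_pong"),
    "client": ("mux", "task", "control", "query", "notification"),
}


def registrar_reply(kind):
    """Return the addresses a newly registered peer of this kind still needs."""
    return {name: CONTROLLER_ADDRS[name] for name in NEEDS[kind]}
```

A peer that knows only the Registrar address would send its kind and then open the remaining connections from the reply, for example ``registrar_reply("engine")["hb_ping"]``.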

Heartbeat
---------

.. figure:: figs/hbfade.png
    :width: 432px
    :alt: IPython Heartbeat connections
    :align: center

    The heartbeat sockets.

The heartbeat process has been described elsewhere. To summarize: the Heartbeat Monitor
publishes a distinct message periodically via a ``PUB`` socket. Each engine has a
``zmq.FORWARDER`` device with a ``SUB`` socket for input and an ``XREQ`` socket for output.
The ``SUB`` socket is connected to the ``PUB`` socket labeled *ping*, and the ``XREQ`` is
connected to the ``XREP`` labeled *pong*. This results in the same message being relayed
back to the Heartbeat Monitor with the addition of the ``XREQ`` prefix. The Heartbeat
Monitor receives all the replies via an ``XREP`` socket, and identifies which hearts are
still beating by the ``zmq.IDENTITY`` prefix of the ``XREQ`` sockets. The Hub uses this
information to notify clients of any changes in the available engines.
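The bookkeeping described above can be illustrated with a small model that uses no sockets at all. The class and method names (``Heart``, ``HeartMonitor``, ``beat``) are made up for this sketch, and a dead engine is modeled simply as one that returns no reply.

```python
class Heart:
    """Stands in for an engine's FORWARDER device (SUB in, XREQ out)."""

    def __init__(self, identity, alive=True):
        self.identity = identity
        self.alive = alive

    def relay(self, ping):
        # The ping comes back unchanged, prefixed with this heart's identity,
        # which is how it would look when received on the monitor's XREP.
        if not self.alive:
            return None
        return [self.identity, ping]


class HeartMonitor:
    """Tracks which identities answered the most recent ping."""

    def __init__(self):
        self.beating = set()

    def beat(self, hearts, ping):
        replies = [h.relay(ping) for h in hearts]
        responded = {r[0] for r in replies if r is not None and r[1] == ping}
        new = responded - self.beating
        stopped = self.beating - responded
        self.beating = responded
        return new, stopped
```

Each call to ``beat`` returns the identities that started and stopped responding since the previous round, which is the kind of change the Hub would pass on to clients.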

Schedulers
----------

.. figure:: figs/queuefade.png
    :width: 432px
    :alt: IPython Queue connections
    :align: center

    Load balanced Task scheduler on the left, explicitly multiplexed schedulers on the
    right.

The controller has at least three Schedulers. These devices are primarily for
relaying messages between clients and engines, but the controller needs to see those
messages for its own purposes. Since no Python code may exist between the two sockets in a
queue, all messages sent through these queues (both directions) are also sent via a
``PUB`` socket to a monitor, which allows the Hub to monitor queue traffic without
interfering with it.
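To make the relay-plus-monitor idea concrete, here is a pure-Python model of the observable behavior. It is not the ØMQ device itself: ``MonitoredRelay`` and its attribute names are hypothetical, and the ``in``/``out`` prefixes on monitor copies are an assumption chosen for illustration.

```python
from collections import deque


class MonitoredRelay:
    """Relays frames in both directions and copies each message to a monitor."""

    def __init__(self, in_prefix=b"in", out_prefix=b"out"):
        self.in_prefix = in_prefix
        self.out_prefix = out_prefix
        self.to_engines = deque()   # what the engine-facing socket would send
        self.to_clients = deque()   # what the client-facing socket would send
        self.monitor = deque()      # what the PUB socket would publish

    def from_client(self, frames):
        frames = list(frames)
        self.to_engines.append(frames)                   # relayed untouched
        self.monitor.append([self.in_prefix] + frames)   # tagged copy

    def from_engine(self, frames):
        frames = list(frames)
        self.to_clients.append(frames)
        self.monitor.append([self.out_prefix] + frames)
```

The relayed messages are never modified; only the monitor copy carries the direction prefix, so a subscriber such as the Hub can tell requests from replies without sitting in the message path.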

For tasks, the engine need not be specified. Messages sent to the ``XREP`` socket from the
client side are assigned to an engine via ZMQ's ``XREQ`` round-robin load balancing.
Engine replies are directed to specific clients via the IDENTITY of the client, which is
received as a prefix at the Engine.

For Multiplexing, ``XREP`` is used for both the input and output sockets in the device. Clients
must specify the destination by the ``zmq.IDENTITY`` of the ``PAIR`` socket connected to
the downstream end of the device.

At the Kernel level, both of these PAIR sockets are treated in the same way as the ``REP``
socket in the serial version (except using ZMQStreams instead of explicit sockets).
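The difference between the two routing styles can be shown with a short model. The function names are hypothetical, and raising ``KeyError`` for an unknown destination is a choice made for this sketch, not a statement about how ØMQ handles unroutable messages.

```python
from itertools import cycle


def task_assign(messages, engines):
    """Task style: the sender names no engine; engines are picked round-robin."""
    rotation = cycle(engines)
    return [(next(rotation), msg) for msg in messages]


def mux_route(frames, connected):
    """MUX style: the first frame is the identity of the destination engine."""
    dest, payload = frames[0], frames[1:]
    if dest not in connected:
        raise KeyError(dest)
    return dest, payload
```

With two engines, ``task_assign`` alternates between them regardless of message content, while ``mux_route`` delivers only to the identity the client put in the first frame.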

IOPub
-----

.. figure:: figs/iopubfade.png
    :width: 432px
    :alt: IOPub connections
    :align: center

    stdin/out/err are published via a ``PUB/SUB`` relay

.. note::

    This isn't actually hooked up yet.

On the kernels, stdin/stdout/stderr are captured and published via a ``PUB`` socket. These
``PUB`` sockets all connect to a ``SUB`` socket on the Hub, which subscribes to all
messages. They are then republished via another ``PUB`` socket in the Hub, to which
clients can subscribe.

.. note::

    Once implemented, this will likely be another MonitoredQueue.

Client connections
------------------

.. figure:: figs/queryfade.png
    :width: 432px
    :alt: IPython client query connections
    :align: center

    Clients connect to an ``XREP`` socket to query the hub

The hub listens on an ``XREP`` socket for queries from clients about queue status
and for control instructions. Clients can connect to this via a ``PAIR`` socket or ``XREQ``.

.. figure:: figs/notiffade.png
    :width: 432px
    :alt: IPython Registration connections
    :align: center

    Engine registration events are published via a ``PUB`` socket.

The Hub publishes all registration/unregistration events via a ``PUB`` socket. This
allows clients to stay up to date with what engines are available by subscribing to the
feed with a ``SUB`` socket. Other processes could selectively subscribe to just
registration or unregistration events.
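Selective subscription relies on the fact that a ØMQ ``SUB`` socket filters by byte prefix of the message. The model below imitates that filter in plain Python; the topic strings ``registration`` and ``unregistration`` and the event payloads are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical events as (topic, payload) pairs, in publication order.
EVENTS = [
    (b"registration", {"engine": 0}),
    (b"registration", {"engine": 1}),
    (b"unregistration", {"engine": 0}),
]


def matches(topic, subscriptions):
    """A message passes if any subscription is a prefix of its topic."""
    return any(topic.startswith(prefix) for prefix in subscriptions)


def deliver(events, subscriptions):
    """Return the events a subscriber with these prefixes would receive."""
    return [event for event in events if matches(event[0], subscriptions)]
```

Subscribing with the empty prefix ``b""`` receives every event, which is what a client tracking the full engine list would want, while ``deliver(EVENTS, [b"unregistration"])`` yields only the departures.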