upstream/ipython Commit - r8109:a2c57acb

1

.. _parallel_messages:

1

.. _parallel_messages:

2

3

Messaging for Parallel Computing

3

Messaging for Parallel Computing

4

================================

4

================================

5

6

This is an extension of the :ref:`messaging <messaging>` doc. Diagrams of the connections

6

This is an extension of the :ref:`messaging <messaging>` doc. Diagrams of the connections

7

can be found in the :ref:`parallel connections <parallel_connections>` doc.

7

can be found in the :ref:`parallel connections <parallel_connections>` doc.

8

9

10

ZMQ messaging is also used in the parallel computing IPython system. All messages to/from

10

ZMQ messaging is also used in the parallel computing IPython system. All messages to/from

11

kernels remain the same as the single kernel model, and are forwarded through a ZMQ Queue

11

kernels remain the same as the single kernel model, and are forwarded through a ZMQ Queue

12

device. The controller receives all messages and replies in these channels, and saves

12

device. The controller receives all messages and replies in these channels, and saves

13

results for future use.

13

results for future use.

14

15

The Controller

15

The Controller

16

--------------

16

--------------

17

18

The controller is the central collection of processes in the IPython parallel computing

18

The controller is the central collection of processes in the IPython parallel computing

19

model. It has two major components:

19

model. It has two major components:

20

21

* The Hub

21

* The Hub

22

* A collection of Schedulers

22

* A collection of Schedulers

23

24

The Hub

24

The Hub

25

-------

25

-------

26

27

The Hub is the central process for monitoring the state of the engines, and all task

27

The Hub is the central process for monitoring the state of the engines, and all task

28

requests and results. It has no role in execution and does no relay of messages, so

28

requests and results. It has no role in execution and does no relay of messages, so

29

large blocking requests or database actions in the Hub do not have the ability to impede

29

large blocking requests or database actions in the Hub do not have the ability to impede

30

job submission and results.

30

job submission and results.

31

32

Registration (``ROUTER``)

32

Registration (``ROUTER``)

33

***********************

33

*************************

34

35

The first function of the Hub is to facilitate and monitor connections of clients

35

The first function of the Hub is to facilitate and monitor connections of clients

36

and engines. Both client and engine registration are handled by the same socket, so only

36

and engines. Both client and engine registration are handled by the same socket, so only

37

one ip/port pair is needed to connect any number of connections and clients.

37

one ip/port pair is needed to connect any number of connections and clients.

38

39

Engines register with the ``zmq.IDENTITY`` of their two ``DEALER`` sockets, one for the

39

Engines register with the ``zmq.IDENTITY`` of their two ``DEALER`` sockets, one for the

40

queue, which receives execute requests, and one for the heartbeat, which is used to

40

queue, which receives execute requests, and one for the heartbeat, which is used to

41

monitor the survival of the Engine process.

41

monitor the survival of the Engine process.

42

43

Message type: ``registration_request``::

43

Message type: ``registration_request``::

44

45

content = {

45

content = {

46

'uuid' : 'abcd-1234-...', # the zmq.IDENTITY of the engine's sockets

46

'uuid' : 'abcd-1234-...', # the zmq.IDENTITY of the engine's sockets

47

}

47

}

48

49

.. note::

49

.. note::

50

51

these are always the same, at least for now.

51

these are always the same, at least for now.

52

53

The Controller replies to an Engine's registration request with the engine's integer ID,

53

The Controller replies to an Engine's registration request with the engine's integer ID,

54

and all the remaining connection information for connecting the heartbeat process, and

54

and all the remaining connection information for connecting the heartbeat process, and

55

kernel queue socket(s). The message status will be an error if the Engine requests IDs that

55

kernel queue socket(s). The message status will be an error if the Engine requests IDs that

56

already in use.

56

already in use.

57

58

Message type: ``registration_reply``::

58

Message type: ``registration_reply``::

59

60

content = {

60

content = {

61

'status' : 'ok', # or 'error'

61

'status' : 'ok', # or 'error'

62

# if ok:

62

# if ok:

63

'id' : 0, # int, the engine id

63

'id' : 0, # int, the engine id

64

}

64

}

65

66

Clients use the same socket as engines to start their connections. Connection requests

66

Clients use the same socket as engines to start their connections. Connection requests

67

from clients need no information:

67

from clients need no information:

68

69

Message type: ``connection_request``::

69

Message type: ``connection_request``::

70

71

content = {}

71

content = {}

72

73

The reply to a Client registration request contains the connection information for the

73

The reply to a Client registration request contains the connection information for the

74

multiplexer and load balanced queues, as well as the address for direct hub

74

multiplexer and load balanced queues, as well as the address for direct hub

75

queries. If any of these addresses is `None`, that functionality is not available.

75

queries. If any of these addresses is `None`, that functionality is not available.

76

77

Message type: ``connection_reply``::

77

Message type: ``connection_reply``::

78

79

content = {

79

content = {

80

'status' : 'ok', # or 'error'

80

'status' : 'ok', # or 'error'

81

}

81

}

82

83

Heartbeat

83

Heartbeat

84

*********

84

*********

85

86

The hub uses a heartbeat system to monitor engines, and track when they become

86

The hub uses a heartbeat system to monitor engines, and track when they become

87

unresponsive. As described in :ref:`messaging <messaging>`, and shown in :ref:`connections

87

unresponsive. As described in :ref:`messaging <messaging>`, and shown in :ref:`connections

88

<parallel_connections>`.

88

<parallel_connections>`.

89

90

Notification (``PUB``)

90

Notification (``PUB``)

91

**********************

91

**********************

92

93

The hub publishes all engine registration/unregistration events on a ``PUB`` socket.

93

The hub publishes all engine registration/unregistration events on a ``PUB`` socket.

94

This allows clients to have up-to-date engine ID sets without polling. Registration

94

This allows clients to have up-to-date engine ID sets without polling. Registration

95

notifications contain both the integer engine ID and the queue ID, which is necessary for

95

notifications contain both the integer engine ID and the queue ID, which is necessary for

96

sending messages via the Multiplexer Queue and Control Queues.

96

sending messages via the Multiplexer Queue and Control Queues.

97

98

Message type: ``registration_notification``::

98

Message type: ``registration_notification``::

99

100

content = {

100

content = {

101

'id' : 0, # engine ID that has been registered

101

'id' : 0, # engine ID that has been registered

102

'uuid' : 'engine_id' # the IDENT for the engine's sockets

102

'uuid' : 'engine_id' # the IDENT for the engine's sockets

103

}

103

}

104

105

Message type : ``unregistration_notification``::

105

Message type : ``unregistration_notification``::

106

107

content = {

107

content = {

108

'id' : 0 # engine ID that has been unregistered

108

'id' : 0 # engine ID that has been unregistered

109

'uuid' : 'engine_id' # the IDENT for the engine's sockets

109

'uuid' : 'engine_id' # the IDENT for the engine's sockets

110

}

110

}

111

112

113

Client Queries (``ROUTER``)

113

Client Queries (``ROUTER``)

114

*************************

114

***************************

115

116

The hub monitors and logs all queue traffic, so that clients can retrieve past

116

The hub monitors and logs all queue traffic, so that clients can retrieve past

117

results or monitor pending tasks. This information may reside in-memory on the Hub, or

117

results or monitor pending tasks. This information may reside in-memory on the Hub, or

118

on disk in a database (SQLite and MongoDB are currently supported). These requests are

118

on disk in a database (SQLite and MongoDB are currently supported). These requests are

119

handled by the same socket as registration.

119

handled by the same socket as registration.

120

121

122

:func:`queue_request` requests can specify multiple engines to query via the `targets`

122

:func:`queue_request` requests can specify multiple engines to query via the `targets`

123

element. A verbose flag can be passed, to determine whether the result should be the list

123

element. A verbose flag can be passed, to determine whether the result should be the list

124

of `msg_ids` in the queue or simply the length of each list.

124

of `msg_ids` in the queue or simply the length of each list.

125

126

Message type: ``queue_request``::

126

Message type: ``queue_request``::

127

128

content = {

128

content = {

129

'verbose' : True, # whether return should be lists themselves or just lens

129

'verbose' : True, # whether return should be lists themselves or just lens

130

'targets' : [0,3,1] # list of ints

130

'targets' : [0,3,1] # list of ints

131

}

131

}

132

133

The content of a reply to a :func:`queue_request` request is a dict, keyed by the engine

133

The content of a reply to a :func:`queue_request` request is a dict, keyed by the engine

134

IDs. Note that they will be the string representation of the integer keys, since JSON

134

IDs. Note that they will be the string representation of the integer keys, since JSON

135

cannot handle number keys. The three keys of each dict are::

135

cannot handle number keys. The three keys of each dict are::

136

137

'completed' : messages submitted via any queue that ran on the engine

137

'completed' : messages submitted via any queue that ran on the engine

138

'queue' : jobs submitted via MUX queue, whose results have not been received

138

'queue' : jobs submitted via MUX queue, whose results have not been received

139

'tasks' : tasks that are known to have been submitted to the engine, but

139

'tasks' : tasks that are known to have been submitted to the engine, but

140

have not completed. Note that with the pure zmq scheduler, this will

140

have not completed. Note that with the pure zmq scheduler, this will

141

always be 0/[].

141

always be 0/[].

142

143

Message type: ``queue_reply``::

143

Message type: ``queue_reply``::

144

145

content = {

145

content = {

146

'status' : 'ok', # or 'error'

146

'status' : 'ok', # or 'error'

147

# if verbose=False:

147

# if verbose=False:

148

'0' : {'completed' : 1, 'queue' : 7, 'tasks' : 0},

148

'0' : {'completed' : 1, 'queue' : 7, 'tasks' : 0},

149

# if verbose=True:

149

# if verbose=True:

150

'1' : {'completed' : ['abcd-...','1234-...'], 'queue' : ['58008-'], 'tasks' : []},

150

'1' : {'completed' : ['abcd-...','1234-...'], 'queue' : ['58008-'], 'tasks' : []},

151

}

151

}

152

153

Clients can request individual results directly from the hub. This is primarily for

153

Clients can request individual results directly from the hub. This is primarily for

154

gathering results of executions not submitted by the requesting client, as the client

154

gathering results of executions not submitted by the requesting client, as the client

155

will have all its own results already. Requests are made by msg_id, and can contain one or

155

will have all its own results already. Requests are made by msg_id, and can contain one or

156

more msg_id. An additional boolean key 'statusonly' can be used to not request the

156

more msg_id. An additional boolean key 'statusonly' can be used to not request the

157

results, but simply poll the status of the jobs.

157

results, but simply poll the status of the jobs.

158

159

Message type: ``result_request``::

159

Message type: ``result_request``::

160

161

content = {

161

content = {

162

'msg_ids' : ['uuid','...'], # list of strs

162

'msg_ids' : ['uuid','...'], # list of strs

163

'targets' : [1,2,3], # list of int ids or uuids

163

'targets' : [1,2,3], # list of int ids or uuids

164

'statusonly' : False, # bool

164

'statusonly' : False, # bool

165

}

165

}

166

167

The :func:`result_request` reply contains the content objects of the actual execution

167

The :func:`result_request` reply contains the content objects of the actual execution

168

reply messages. If `statusonly=True`, then there will be only the 'pending' and

168

reply messages. If `statusonly=True`, then there will be only the 'pending' and

169

'completed' lists.

169

'completed' lists.

170

171

172

Message type: ``result_reply``::

172

Message type: ``result_reply``::

173

174

content = {

174

content = {

175

'status' : 'ok', # else error

175

'status' : 'ok', # else error

176

# if ok:

176

# if ok:

177

'acbd-...' : msg, # the content dict is keyed by msg_ids,

177

'acbd-...' : msg, # the content dict is keyed by msg_ids,

178

# values are the result messages

178

# values are the result messages

179

# there will be none of these if `statusonly=True`

179

# there will be none of these if `statusonly=True`

180

'pending' : ['msg_id','...'], # msg_ids still pending

180

'pending' : ['msg_id','...'], # msg_ids still pending

181

'completed' : ['msg_id','...'], # list of completed msg_ids

181

'completed' : ['msg_id','...'], # list of completed msg_ids

182

}

182

}

183

buffers = ['bufs','...'] # the buffers that contained the results of the objects.

183

buffers = ['bufs','...'] # the buffers that contained the results of the objects.

184

# this will be empty if no messages are complete, or if

184

# this will be empty if no messages are complete, or if

185

# statusonly is True.

185

# statusonly is True.

186

187

For memory management purposes, Clients can also instruct the hub to forget the

187

For memory management purposes, Clients can also instruct the hub to forget the

188

results of messages. This can be done by message ID or engine ID. Individual messages are

188

results of messages. This can be done by message ID or engine ID. Individual messages are

189

dropped by msg_id, and all messages completed on an engine are dropped by engine ID. This

189

dropped by msg_id, and all messages completed on an engine are dropped by engine ID. This

190

may no longer be necessary with the mongodb-based message logging backend.

190

may no longer be necessary with the mongodb-based message logging backend.

191

192

If the msg_ids element is the string ``'all'`` instead of a list, then all completed

192

If the msg_ids element is the string ``'all'`` instead of a list, then all completed

193

results are forgotten.

193

results are forgotten.

194

195

Message type: ``purge_request``::

195

Message type: ``purge_request``::

196

197

content = {

197

content = {

198

'msg_ids' : ['id1', 'id2',...], # list of msg_ids or 'all'

198

'msg_ids' : ['id1', 'id2',...], # list of msg_ids or 'all'

199

'engine_ids' : [0,2,4] # list of engine IDs

199

'engine_ids' : [0,2,4] # list of engine IDs

200

}

200

}

201

202

The reply to a purge request is simply the status 'ok' if the request succeeded, or an

202

The reply to a purge request is simply the status 'ok' if the request succeeded, or an

203

explanation of why it failed, such as requesting the purge of a nonexistent or pending

203

explanation of why it failed, such as requesting the purge of a nonexistent or pending

204

message.

204

message.

205

206

Message type: ``purge_reply``::

206

Message type: ``purge_reply``::

207

208

content = {

208

content = {

209

'status' : 'ok', # or 'error'

209

'status' : 'ok', # or 'error'

210

}

210

}

211

212

213

Schedulers

213

Schedulers

214

----------

214

----------

215

216

There are three basic schedulers:

216

There are three basic schedulers:

217

218

* Task Scheduler

218

* Task Scheduler

219

* MUX Scheduler

219

* MUX Scheduler

220

* Control Scheduler

220

* Control Scheduler

221

222

The MUX and Control schedulers are simple MonitoredQueue ØMQ devices, with ``ROUTER``

222

The MUX and Control schedulers are simple MonitoredQueue ØMQ devices, with ``ROUTER``

223

sockets on either side. This allows the queue to relay individual messages to particular

223

sockets on either side. This allows the queue to relay individual messages to particular

224

targets via ``zmq.IDENTITY`` routing. The Task scheduler may be a MonitoredQueue ØMQ

224

targets via ``zmq.IDENTITY`` routing. The Task scheduler may be a MonitoredQueue ØMQ

225

device, in which case the client-facing socket is ``ROUTER``, and the engine-facing socket

225

device, in which case the client-facing socket is ``ROUTER``, and the engine-facing socket

226

is ``DEALER``. The result of this is that client-submitted messages are load-balanced via

226

is ``DEALER``. The result of this is that client-submitted messages are load-balanced via

227

the ``DEALER`` socket, but the engine's replies to each message go to the requesting client.

227

the ``DEALER`` socket, but the engine's replies to each message go to the requesting client.

228

229

Raw ``DEALER`` scheduling is quite primitive, and doesn't allow message introspection, so

229

Raw ``DEALER`` scheduling is quite primitive, and doesn't allow message introspection, so

230

there are also Python Schedulers that can be used. These Schedulers behave in much the

230

there are also Python Schedulers that can be used. These Schedulers behave in much the

231

same way as a MonitoredQueue does from the outside, but have rich internal logic to

231

same way as a MonitoredQueue does from the outside, but have rich internal logic to

232

determine destinations, as well as handle dependency graphs Their sockets are always

232

determine destinations, as well as handle dependency graphs Their sockets are always

233

``ROUTER`` on both sides.

233

``ROUTER`` on both sides.

234

235

The Python task schedulers have an additional message type, which informs the Hub of

235

The Python task schedulers have an additional message type, which informs the Hub of

236

the destination of a task as soon as that destination is known.

236

the destination of a task as soon as that destination is known.

237

238

Message type: ``task_destination``::

238

Message type: ``task_destination``::

239

240

content = {

240

content = {

241

'msg_id' : 'abcd-1234-...', # the msg's uuid

241

'msg_id' : 'abcd-1234-...', # the msg's uuid

242

'engine_id' : '1234-abcd-...', # the destination engine's zmq.IDENTITY

242

'engine_id' : '1234-abcd-...', # the destination engine's zmq.IDENTITY

243

}

243

}

244

245

:func:`apply` and :func:`apply_bound`

245

:func:`apply`

246

*************************************

246

*************

247

248

In terms of message classes, the MUX scheduler and Task scheduler relay the exact same

248

In terms of message classes, the MUX scheduler and Task scheduler relay the exact same

249

message types. Their only difference lies in how the destination is selected.

249

message types. Their only difference lies in how the destination is selected.

250

251

The `Namespace <http://gist.github.com/483294>`_ model suggests that execution be able to

251

The `Namespace <http://gist.github.com/483294>`_ model suggests that execution be able to

252

use the model::

252

use the model::

253

254

ns.apply(f, *args, **kwargs)

254

ns.apply(f, *args, **kwargs)

255

256

which takes `f`, a function in the user's namespace, and executes ``f(*args, **kwargs)``

256

which takes `f`, a function in the user's namespace, and executes ``f(*args, **kwargs)``

257

on a remote engine, returning the result (or, for non-blocking, information facilitating

257

on a remote engine, returning the result (or, for non-blocking, information facilitating

258

later retrieval of the result). This model, unlike the execute message which just uses a

258

later retrieval of the result). This model, unlike the execute message which just uses a

259

code string, must be able to send arbitrary (pickleable) Python objects. And ideally, copy

259

code string, must be able to send arbitrary (pickleable) Python objects. And ideally, copy

260

as little data as we can. The `buffers` property of a Message was introduced for this

260

as little data as we can. The `buffers` property of a Message was introduced for this

261

purpose.

261

purpose.

262

263

Utility method :func:`build_apply_message` in :mod:`IPython.zmq.s~~treamsession~~` wraps a

263

Utility method :func:`build_apply_message` in :mod:`IPython.zmq.serialize` wraps a

264

function signature and builds a sendable buffer format for minimal data copying (exactly

264

function signature and builds a sendable buffer format for minimal data copying (exactly

265

zero copies of numpy array data or buffers or large strings).

265

zero copies of numpy array data or buffers or large strings).

266

267

Message type: ``apply_request``::

267

Message type: ``apply_request``::

268

269

~~content~~ = {

269

metadata = {

270

'bound' : True, # whether to execute in the engine's namespace or unbound

271

'after' : ['msg_id',...], # list of msg_ids or output of Dependency.as_dict()

270

'after' : ['msg_id',...], # list of msg_ids or output of Dependency.as_dict()

272

'follow' : ['msg_id',...], # list of msg_ids or output of Dependency.as_dict()

271

'follow' : ['msg_id',...], # list of msg_ids or output of Dependency.as_dict()

273

274

}

272

}

273

content = {}

275

buffers = ['...'] # at least 3 in length

274

buffers = ['...'] # at least 3 in length

276

# as built by build_apply_message(f,args,kwargs)

275

# as built by build_apply_message(f,args,kwargs)

277

276

278

after/follow represent task dependencies. 'after' corresponds to a time dependency. The

277

after/follow represent task dependencies. 'after' corresponds to a time dependency. The

279

request will not arrive at an engine until the 'after' dependency tasks have completed.

278

request will not arrive at an engine until the 'after' dependency tasks have completed.

280

'follow' corresponds to a location dependency. The task will be submitted to the same

279

'follow' corresponds to a location dependency. The task will be submitted to the same

281

engine as these msg_ids (see :class:`Dependency` docs for details).

280

engine as these msg_ids (see :class:`Dependency` docs for details).

282

281

283

Message type: ``apply_reply``::

282

Message type: ``apply_reply``::

284

283

285

content = {

284

content = {

286

'status' : 'ok' # 'ok' or 'error'

285

'status' : 'ok' # 'ok' or 'error'

287

# other error info here, as in other messages

286

# other error info here, as in other messages

288

}

287

}

289

buffers = ['...'] # either 1 or 2 in length

288

buffers = ['...'] # either 1 or 2 in length

290

# a serialization of the return value of f(*args,**kwargs)

289

# a serialization of the return value of f(*args,**kwargs)

291

# only populated if status is 'ok'

290

# only populated if status is 'ok'

292

291

293

All engine execution and data movement is performed via apply messages.

292

All engine execution and data movement is performed via apply messages.

294

293

295

Control Messages

294

Control Messages

296

----------------

295

----------------

297

296

298

Messages that interact with the engines, but are not meant to execute code, are submitted

297

Messages that interact with the engines, but are not meant to execute code, are submitted

299

via the Control queue. These messages have high priority, and are thus received and

298

via the Control queue. These messages have high priority, and are thus received and

300

handled before any execution requests.

299

handled before any execution requests.

301

300

302

Clients may want to clear the namespace on the engine. There are no arguments nor

301

Clients may want to clear the namespace on the engine. There are no arguments nor

303

information involved in this request, so the content is empty.

302

information involved in this request, so the content is empty.

304

303

305

Message type: ``clear_request``::

304

Message type: ``clear_request``::

306

305

307

content = {}

306

content = {}

308

307

309

Message type: ``clear_reply``::

308

Message type: ``clear_reply``::

310

309

311

content = {

310

content = {

312

'status' : 'ok' # 'ok' or 'error'

311

'status' : 'ok' # 'ok' or 'error'

313

# other error info here, as in other messages

312

# other error info here, as in other messages

314

}

313

}

315

314

316

Clients may want to abort tasks that have not yet run. This can by done by message id, or

315

Clients may want to abort tasks that have not yet run. This can by done by message id, or

317

all enqueued messages can be aborted if None is specified.

316

all enqueued messages can be aborted if None is specified.

318

317

319

Message type: ``abort_request``::

318

Message type: ``abort_request``::

320

319

321

content = {

320

content = {

322

'msg_ids' : ['1234-...', '...'] # list of msg_ids or None

321

'msg_ids' : ['1234-...', '...'] # list of msg_ids or None

323

}

322

}

324

323

325

Message type: ``abort_reply``::

324

Message type: ``abort_reply``::

326

325

327

content = {

326

content = {

328

'status' : 'ok' # 'ok' or 'error'

327

'status' : 'ok' # 'ok' or 'error'

329

# other error info here, as in other messages

328

# other error info here, as in other messages

330

}

329

}

331

330

332

The last action a client may want to do is shutdown the kernel. If a kernel receives a

331

The last action a client may want to do is shutdown the kernel. If a kernel receives a

333

shutdown request, then it aborts all queued messages, replies to the request, and exits.

332

shutdown request, then it aborts all queued messages, replies to the request, and exits.

334

333

335

Message type: ``shutdown_request``::

334

Message type: ``shutdown_request``::

336

335

337

content = {}

336

content = {}

338

337

339

Message type: ``shutdown_reply``::

338

Message type: ``shutdown_reply``::

340

339

341

content = {

340

content = {

342

'status' : 'ok' # 'ok' or 'error'

341

'status' : 'ok' # 'ok' or 'error'

343

# other error info here, as in other messages

342

# other error info here, as in other messages

344

}

343

}

345

344

346

345

347

Implementation

346

Implementation

348

--------------

347

--------------

349

348

350

There are a few differences in implementation between the `StreamSession` object used in

349

There are a few differences in implementation between the `StreamSession` object used in

351

the newparallel branch and the `Session` object, the main one being that messages are

350

the newparallel branch and the `Session` object, the main one being that messages are

352

sent in parts, rather than as a single serialized object. `StreamSession` objects also

351

sent in parts, rather than as a single serialized object. `StreamSession` objects also

353

take pack/unpack functions, which are to be used when serializing/deserializing objects.

352

take pack/unpack functions, which are to be used when serializing/deserializing objects.

354

These can be any functions that translate to/from formats that ZMQ sockets can send

353

These can be any functions that translate to/from formats that ZMQ sockets can send

355

(buffers,bytes, etc.).

354

(buffers,bytes, etc.).

356

355

357

Split Sends

356

Split Sends

358

***********

357

***********

359

358

360

Previously, messages were bundled as a single json object and one call to

359

Previously, messages were bundled as a single json object and one call to

361

:func:`socket.send_json`. Since the hub inspects all messages, and doesn't need to

360

:func:`socket.send_json`. Since the hub inspects all messages, and doesn't need to

362

see the content of the messages, which can be large, messages are now serialized and sent in

361

see the content of the messages, which can be large, messages are now serialized and sent in

363

pieces. All messages are sent in at least 3 parts: the header, the parent header, and the

362

pieces. All messages are sent in at least 4 parts: the header, the parent header, the metadata and the content.

364

~~content.~~ This allows the controller to unpack and inspect the (always small) header,

363

This allows the controller to unpack and inspect the (always small) header,

365

without spending time unpacking the content unless the message is bound for the

364

without spending time unpacking the content unless the message is bound for the

366

controller. Buffers are added on to the end of the message, and can be any objects that

365

controller. Buffers are added on to the end of the message, and can be any objects that

367

present the buffer interface.

366

present the buffer interface.

368

367

	Site-wide shortcuts
/	Use quick search box
g h	Goto home page
g g	Goto my private gists page
g G	Goto my public gists page
g 0-9	Goto bookmarked items from 0-9
n r	New repository page
n g	New gist page

	Repositories
g s	Goto summary page
g c	Goto changelog page
g f	Goto files page
g F	Goto files page with file search activated
g p	Goto pull requests page
g o	Goto repository settings
g O	Goto repository access permissions settings
t s	Toggle sidebar on some pages

             .. _parallel_messages:
             Messaging for Parallel Computing
             ================================
             This is an extension of the :ref:`messaging <messaging>` doc. Diagrams of the connections
             can be found in the :ref:`parallel connections <parallel_connections>` doc.
             ZMQ messaging is also used in the parallel computing IPython system. All messages to/from
             kernels remain the same as the single kernel model, and are forwarded through a ZMQ Queue
             device. The controller receives all messages and replies in these channels, and saves
             results for future use.
             The Controller
             --------------
             The controller is the central collection of processes in the IPython parallel computing
             model. It has two major components:
                 * The Hub
                 * A collection of Schedulers
             The Hub
             -------
             The Hub is the central process for monitoring the state of the engines, and all task
             requests and results.  It has no role in execution and does no relay of messages, so
             large blocking requests or database actions in the Hub do not have the ability to impede
             job submission and results.
             Registration (``ROUTER``)
-            ***********************
+            *************************
             The first function of the Hub is to facilitate and monitor connections of clients
             and engines. Both client and engine registration are handled by the same socket, so only
             one ip/port pair is needed to connect any number of connections and clients.
             Engines register with the ``zmq.IDENTITY`` of their two ``DEALER`` sockets, one for the
             queue, which receives execute requests, and one for the heartbeat, which is used to
             monitor the survival of the Engine process.
             Message type: ``registration_request``::
                 content = {
                     'uuid'   : 'abcd-1234-...', # the zmq.IDENTITY of the engine's sockets
                 }
             .. note::
                 these are always the same, at least for now.
             The Controller replies to an Engine's registration request with the engine's integer ID,
             and all the remaining connection information for connecting the heartbeat process, and
             kernel queue socket(s). The message status will be an error if the Engine requests IDs that
             already in use.
             Message type: ``registration_reply``::
                 content = {
                     'status' : 'ok', # or 'error'
                     # if ok:
                     'id' : 0, # int, the engine id
                 }
             Clients use the same socket as engines to start their connections. Connection requests
             from clients need no information:
             Message type: ``connection_request``::
                 content = {}
             The reply to a Client registration request contains the connection information for the
             multiplexer and load balanced queues, as well as the address for direct hub
             queries. If any of these addresses is `None`, that functionality is not available.
             Message type: ``connection_reply``::
                 content = {
                     'status' : 'ok', # or 'error'
                 }
             Heartbeat
             *********
             The hub uses a heartbeat system to monitor engines, and track when they become
             unresponsive. As described in :ref:`messaging <messaging>`, and shown in :ref:`connections
             <parallel_connections>`.
             Notification (``PUB``)
             **********************
             The hub publishes all engine registration/unregistration events on a ``PUB`` socket.
             This allows clients to have up-to-date engine ID sets without polling. Registration
             notifications contain both the integer engine ID and the queue ID, which is necessary for
             sending messages via the Multiplexer Queue and Control Queues.
             Message type: ``registration_notification``::
                 content = {
                     'id' : 0, # engine ID that has been registered
                     'uuid' : 'engine_id' # the IDENT for the engine's sockets
                 }
             Message type : ``unregistration_notification``::
                 content = {
                     'id' : 0 # engine ID that has been unregistered
                     'uuid' : 'engine_id' # the IDENT for the engine's sockets
                 }
             Client Queries (``ROUTER``)
-            *************************
+            ***************************
             The hub monitors and logs all queue traffic, so that clients can retrieve past
             results or monitor pending tasks. This information may reside in-memory on the Hub, or
             on disk in a database (SQLite and MongoDB are currently supported).  These requests are
             handled by the same socket as registration.
             :func:`queue_request` requests can specify multiple engines to query via the `targets`
             element. A verbose flag can be passed, to determine whether the result should be the list
             of `msg_ids` in the queue or simply the length of each list.
             Message type: ``queue_request``::
                 content = {
                     'verbose' : True, # whether return should be lists themselves or just lens
                     'targets' : [0,3,1] # list of ints
                 }
             The content of a reply to a :func:`queue_request` request is a dict, keyed by the engine
             IDs. Note that they will be the string representation of the integer keys, since JSON
             cannot handle number keys.  The three keys of each dict are::
                 'completed' :  messages submitted via any queue that ran on the engine
                 'queue' : jobs submitted via MUX queue, whose results have not been received
                 'tasks' : tasks that are known to have been submitted to the engine, but
                             have not completed.  Note that with the pure zmq scheduler, this will
                             always be 0/[].
             Message type: ``queue_reply``::
                 content = {
                     'status' : 'ok', # or 'error'
                     # if verbose=False:
                     '0' : {'completed' : 1, 'queue' : 7, 'tasks' : 0},
                     # if verbose=True:
                     '1' : {'completed' : ['abcd-...','1234-...'], 'queue' : ['58008-'], 'tasks' : []},
                 }
             Clients can request individual results directly from the hub. This is primarily for
             gathering results of executions not submitted by the requesting client, as the client
             will have all its own results already. Requests are made by msg_id, and can contain one or
             more msg_id. An additional boolean key 'statusonly' can be used to not request the
             results, but simply poll the status of the jobs.
             Message type: ``result_request``::
                 content = {
                     'msg_ids' : ['uuid','...'], # list of strs
                     'targets' : [1,2,3], # list of int ids or uuids
                     'statusonly' : False, # bool
                 }
             The :func:`result_request` reply contains the content objects of the actual execution
             reply messages. If `statusonly=True`, then there will be only the 'pending' and
             'completed' lists.
             Message type: ``result_reply``::
                 content = {
                     'status' : 'ok', # else error
                     # if ok:
                     'acbd-...' : msg, # the content dict is keyed by msg_ids,
                                      # values are the result messages
                                     # there will be none of these if `statusonly=True`
                     'pending' : ['msg_id','...'], # msg_ids still pending
                     'completed' : ['msg_id','...'], # list of completed msg_ids
                 }
                 buffers = ['bufs','...'] # the buffers that contained the results of the objects.
                                         # this will be empty if no messages are complete, or if
                                         # statusonly is True.
             For memory management purposes, Clients can also instruct the hub to forget the
             results of messages. This can be done by message ID or engine ID. Individual messages are
             dropped by msg_id, and all messages completed on an engine are dropped by engine ID. This
             may no longer be necessary with the mongodb-based message logging backend.
             If the msg_ids element is the string ``'all'`` instead of a list, then all completed
             results are forgotten.
             Message type: ``purge_request``::
                 content = {
                     'msg_ids' : ['id1', 'id2',...], # list of msg_ids or 'all'
                     'engine_ids' : [0,2,4] # list of engine IDs
                 }
             The reply to a purge request is simply the status 'ok' if the request succeeded, or an
             explanation of why it failed, such as requesting the purge of a nonexistent or pending
             message.
             Message type: ``purge_reply``::
                 content = {
                     'status' : 'ok', # or 'error'
                 }
             Schedulers
             ----------
             There are three basic schedulers:
               * Task Scheduler
               * MUX Scheduler
               * Control Scheduler
             The MUX and Control schedulers are simple MonitoredQueue ØMQ devices, with ``ROUTER``
             sockets on either side. This allows the queue to relay individual messages to particular
             targets via ``zmq.IDENTITY`` routing. The Task scheduler may be a MonitoredQueue ØMQ
             device, in which case the client-facing socket is ``ROUTER``, and the engine-facing socket
             is ``DEALER``.  The result of this is that client-submitted messages are load-balanced via
             the ``DEALER`` socket, but the engine's replies to each message go to the requesting client.
             Raw ``DEALER`` scheduling is quite primitive, and doesn't allow message introspection, so
             there are also Python Schedulers that can be used. These Schedulers behave in much the
             same way as a MonitoredQueue does from the outside, but have rich internal logic to
             determine destinations, as well as handle dependency graphs Their sockets are always
             ``ROUTER`` on both sides.
             The Python task schedulers have an additional message type, which informs the Hub of
             the destination of a task as soon as that destination is known.
             Message type: ``task_destination``::
                 content = {
                     'msg_id' : 'abcd-1234-...', # the msg's uuid
                     'engine_id' : '1234-abcd-...', # the destination engine's zmq.IDENTITY
                 }
-            :func:`apply` and :func:`apply_bound`
+            :func:`apply`
-            *************************************
+            *************
             In terms of message classes, the MUX scheduler and Task scheduler relay the exact same
             message types.  Their only difference lies in how the destination is selected.
             The `Namespace <http://gist.github.com/483294>`_ model suggests that execution be able to
             use the model::
                 ns.apply(f, *args, **kwargs)
             which takes `f`, a function in the user's namespace, and executes ``f(*args, **kwargs)``
             on a remote engine, returning the result (or, for non-blocking, information facilitating
             later retrieval of the result). This model, unlike the execute message which just uses a
             code string, must be able to send arbitrary (pickleable) Python objects. And ideally, copy
             as little data as we can. The `buffers` property of a Message was introduced for this
             purpose.
-            Utility method :func:`build_apply_message` in :mod:`IPython.zmq.streamsession` wraps a
+            Utility method :func:`build_apply_message` in :mod:`IPython.zmq.serialize` wraps a
             function signature and builds a sendable buffer format for minimal data copying (exactly
             zero copies of numpy array data or buffers or large strings).
             Message type: ``apply_request``::
-                content = {
+                metadata = {
-                    'bound' : True, # whether to execute in the engine's namespace or unbound
                     'after' : ['msg_id',...], # list of msg_ids or output of Dependency.as_dict()
                     'follow' : ['msg_id',...], # list of msg_ids or output of Dependency.as_dict()
                 }
+                content = {}
                 buffers = ['...'] # at least 3 in length
                                 # as built by build_apply_message(f,args,kwargs)
             after/follow represent task dependencies. 'after' corresponds to a time dependency. The
             request will not arrive at an engine until the 'after' dependency tasks have completed.
             'follow' corresponds to a location dependency. The task will be submitted to the same
             engine as these msg_ids (see :class:`Dependency` docs for details).
             Message type: ``apply_reply``::
                 content = {
                     'status' : 'ok' # 'ok' or 'error'
                     # other error info here, as in other messages
                 }
                 buffers = ['...'] # either 1 or 2 in length
                                 # a serialization of the return value of f(*args,**kwargs)
                                 # only populated if status is 'ok'
             All engine execution and data movement is performed via apply messages.
             Control Messages
             ----------------
             Messages that interact with the engines, but are not meant to execute code, are submitted
             via the Control queue. These messages have high priority, and are thus received and
             handled before any execution requests.
             Clients may want to clear the namespace on the engine. There are no arguments nor
             information involved in this request, so the content is empty.
             Message type: ``clear_request``::
                 content = {}
             Message type: ``clear_reply``::
                 content = {
                     'status' : 'ok' # 'ok' or 'error'
                     # other error info here, as in other messages
                 }
             Clients may want to abort tasks that have not yet run. This can by done by message id, or
             all enqueued messages can be aborted if None is specified.
             Message type: ``abort_request``::
                 content = {
                     'msg_ids' : ['1234-...', '...'] # list of msg_ids or None
                 }
             Message type: ``abort_reply``::
                 content = {
                     'status' : 'ok' # 'ok' or 'error'
                     # other error info here, as in other messages
                 }
             The last action a client may want to do is shutdown the kernel. If a kernel receives a
             shutdown request, then it aborts all queued messages, replies to the request, and exits.
             Message type: ``shutdown_request``::
                 content = {}
             Message type: ``shutdown_reply``::
                 content = {
                     'status' : 'ok' # 'ok' or 'error'
                     # other error info here, as in other messages
                 }
             Implementation
             --------------
             There are a few differences in implementation between the `StreamSession` object used in
             the newparallel branch and the `Session` object, the main one being that messages are
             sent in parts, rather than as a single serialized object. `StreamSession` objects also
             take pack/unpack functions, which are to be used when serializing/deserializing objects.
             These can be any functions that translate to/from formats that ZMQ sockets can send
             (buffers,bytes, etc.).
             Split Sends
             ***********
             Previously, messages were bundled as a single json object and one call to
             :func:`socket.send_json`. Since the hub inspects all messages, and doesn't need to
             see the content of the messages, which can be large, messages are now serialized and sent in
-            pieces. All messages are sent in at least 3 parts: the header, the parent header, and the
+            pieces. All messages are sent in at least 4 parts: the header, the parent header, the metadata and the content.
-            content. This allows the controller to unpack and inspect the (always small) header,
+            This allows the controller to unpack and inspect the (always small) header,
             without spending time unpacking the content unless the message is bound for the
             controller. Buffers are added on to the end of the message, and can be any objects that
             present the buffer interface.