.. _parallel_task:

==========================
The IPython task interface
==========================

The task interface to the cluster presents the engines as a fault tolerant,
dynamic load-balanced system of workers. Unlike the multiengine interface, in
the task interface the user has no direct access to individual engines. By
allowing the IPython scheduler to assign work, this interface is simultaneously
simpler and more powerful.

Best of all, you can use both of these interfaces running at the same time
to take advantage of their respective strengths. When you can break up
your work into segments that do not depend on previous execution, the
task interface is ideal. But it also has more power and flexibility, allowing
you to guide the distribution of jobs, without having to assign tasks to
engines explicitly.

Starting the IPython controller and engines
===========================================

To follow along with this tutorial, you will need to start the IPython
controller and four IPython engines. The simplest way of doing this is to use
the :command:`ipcluster` command::

    $ ipcluster start -n 4

For more detailed information about starting the controller and engines, see
our :ref:`introduction <parallel_overview>` to using IPython for parallel computing.

Creating a ``LoadBalancedView`` instance
========================================

The first step is to import the :mod:`IPython.parallel`
module and then create a :class:`.Client` instance; we will also be using
a :class:`LoadBalancedView`, here called ``lview``:

.. sourcecode:: ipython

    In [1]: from IPython.parallel import Client

    In [2]: rc = Client()


This form assumes that the controller was started on localhost with default
configuration. If not, the location of the controller must be given as an
argument to the constructor:

.. sourcecode:: ipython

    # for a visible LAN controller listening on an external port:
    In [2]: rc = Client('tcp://192.168.1.16:10101')
    # or to connect with a specific profile you have set up:
    In [3]: rc = Client(profile='mpi')

For load-balanced execution, we will make use of a :class:`LoadBalancedView` object, which can
be constructed via the client's :meth:`load_balanced_view` method:

.. sourcecode:: ipython

    In [4]: lview = rc.load_balanced_view() # default load-balanced view

.. seealso::

    For more information, see the in-depth explanation of :ref:`Views <parallel_details>`.


Quick and easy parallelism
==========================

In many cases, you simply want to apply a Python function to a sequence of
objects, but *in parallel*. Like the multiengine interface, these can be
implemented via the task interface. The exact same tools can perform these
actions in load-balanced ways as well as multiplexed ways: a parallel version
of :func:`map` and the :func:`@parallel` function decorator. If they are
created from a :class:`LoadBalancedView`, then they are dynamically load
balanced. Thus, if the execution time per item varies significantly, you
should use the versions in the task interface.

Parallel map
------------

To load-balance :meth:`map`, simply use a LoadBalancedView:

.. sourcecode:: ipython

    In [62]: lview.block = True

    In [63]: serial_result = map(lambda x:x**10, range(32))

    In [64]: parallel_result = lview.map(lambda x:x**10, range(32))

    In [65]: serial_result==parallel_result
    Out[65]: True
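
Since trying the snippet above requires a running cluster, here is a
standard-library sketch of the same load-balancing idea (this is *not*
IPython.parallel; all names are illustrative): a small pool of workers pulls
items as workers become free, and the gathered results match the serial map.

.. sourcecode:: python

    from concurrent.futures import ThreadPoolExecutor

    def f(x):
        return x ** 10

    serial_result = list(map(f, range(32)))

    with ThreadPoolExecutor(max_workers=4) as pool:
        # like lview.map, pool.map hands each item to the next free worker
        # and returns results in input order
        parallel_result = list(pool.map(f, range(32)))

    print(serial_result == parallel_result)  # True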

Parallel function decorator
---------------------------

Parallel functions are just like normal functions, but they can be called on
sequences and *in parallel*. The multiengine interface provides a decorator
that turns any Python function into a parallel function:

.. sourcecode:: ipython

    In [10]: @lview.parallel()
       ....: def f(x):
       ....:     return 10.0*x**4
       ....:

    In [11]: f.map(range(32))    # this is done in parallel
    Out[11]: [0.0,10.0,160.0,...]

.. _parallel_taskmap:

Map results are iterable!
-------------------------

When an AsyncResult object actually maps multiple results (e.g. the :class:`~AsyncMapResult`
object), you can iterate through them, and act on the results as they arrive:

.. literalinclude:: ../../examples/parallel/itermapresult.py
    :language: python
    :lines: 9-34

.. seealso::

    When AsyncResult or the AsyncMapResult don't provide what you need (for instance,
    handling individual results as they arrive, but with metadata), you can always
    split the original result's ``msg_ids`` attribute, and handle them as you like.

    For an example of this, see :file:`docs/examples/parallel/customresult.py`

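The acting-on-results-as-they-arrive pattern can be illustrated with the
standard library (again, a stdlib analogue, not IPython's AsyncMapResult):
``as_completed`` yields each future the moment it finishes, so uneven task
runtimes don't force you to wait for the whole batch.

.. sourcecode:: python

    import time
    from concurrent.futures import ThreadPoolExecutor, as_completed

    def slow_square(x):
        time.sleep(0.01 * x)   # deliberately uneven runtimes
        return x * x

    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(slow_square, x) for x in (3, 1, 2, 0)]
        # results arrive in completion order, not submission order
        arrived = [fut.result() for fut in as_completed(futures)]

    # every result shows up exactly once
    print(sorted(arrived))  # [0, 1, 4, 9]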
.. _parallel_dependencies:

Dependencies
============

Often, pure atomic load-balancing is too primitive for your work. In these cases, you
may want to associate some kind of `Dependency` that describes when, where, or whether
a task can be run. In IPython, we provide two types of dependencies:
`Functional Dependencies`_ and `Graph Dependencies`_.

.. note::

    It is important to note that the pure ZeroMQ scheduler does not support dependencies,
    and you will see errors or warnings if you try to use dependencies with the pure
    scheduler.

Functional Dependencies
-----------------------

Functional dependencies are used to determine whether a given engine is capable of running
a particular task. This is implemented via a special :class:`Exception` class,
:class:`UnmetDependency`, found in `IPython.parallel.error`. Its use is very simple:
if a task fails with an UnmetDependency exception, then the scheduler, instead of relaying
the error up to the client like any other error, catches the error, and submits the task
to a different engine. This will repeat indefinitely, but a task will never be submitted
to a given engine a second time.

You can manually raise the :class:`UnmetDependency` yourself, but IPython provides
some decorators to facilitate this behavior.

There are two decorators and a class used for functional dependencies:

.. sourcecode:: ipython

    In [9]: from IPython.parallel import depend, require, dependent

@require
********

The simplest sort of dependency is requiring that a Python module is available. The
``@require`` decorator lets you define a function that will only run on engines where names
you specify are importable:

.. sourcecode:: ipython

    In [10]: @require('numpy', 'zmq')
       ....: def myfunc():
       ....:     return dostuff()

Now, any time you apply :func:`myfunc`, the task will only run on a machine that has
numpy and pyzmq available, and when :func:`myfunc` is called, numpy and zmq will be imported.

@depend
*******

The ``@depend`` decorator lets you decorate any function with any *other* function to
evaluate the dependency. The dependency function will be called at the start of the task,
and if it returns ``False``, then the dependency will be considered unmet, and the task
will be assigned to another engine. If the dependency returns *anything other than
``False``*, the rest of the task will continue.

.. sourcecode:: ipython

    In [10]: def platform_specific(plat):
       ....:     import sys
       ....:     return sys.platform == plat

    In [11]: @depend(platform_specific, 'darwin')
       ....: def mactask():
       ....:     do_mac_stuff()

    In [12]: @depend(platform_specific, 'nt')
       ....: def wintask():
       ....:     do_windows_stuff()

In this case, any time you apply ``mactask``, it will only run on an OSX machine.
``@depend`` is just like ``apply``, in that it has a ``@depend(f,*args,**kwargs)``
signature.
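
The mechanism behind this can be sketched in plain Python. This is a
hypothetical stand-in, not IPython's implementation: ``UnmetDependency`` here
is a local class (not imported from ``IPython.parallel.error``), and the
wrapper shows only the core idea of checking the dependency before running.

.. sourcecode:: python

    import sys

    class UnmetDependency(Exception):
        """Stand-in for IPython.parallel.error.UnmetDependency."""
        pass

    def dependent(func, dep, *dargs, **dkwargs):
        """Wrap func so it only runs where dep(*dargs, **dkwargs) is not False."""
        def wrapped(*args, **kwargs):
            if dep(*dargs, **dkwargs) is False:
                # a real scheduler would catch this and resubmit elsewhere
                raise UnmetDependency("dependency not met on this engine")
            return func(*args, **kwargs)
        return wrapped

    def platform_specific(plat):
        return sys.platform == plat

    # matches the current platform, so the task runs
    sometask = dependent(lambda: "ran", platform_specific, sys.platform)
    print(sometask())  # ran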

dependents
**********

You don't have to use the decorators on your tasks. If, for instance, you want
to run tasks with a single function but varying dependencies, you can directly construct
the :class:`dependent` object that the decorators use:

.. sourcecode:: ipython

    In [13]: def mytask(*args):
       ....:     dostuff()

    In [14]: mactask = dependent(mytask, platform_specific, 'darwin')
    # this is the same as decorating the declaration of mytask with @depend
    # but you can do it again:

    In [15]: wintask = dependent(mytask, platform_specific, 'nt')

    # in general:
    In [16]: t = dependent(f, g, *dargs, **dkwargs)

    # is equivalent to:
    In [17]: @depend(g, *dargs, **dkwargs)
       ....: def t(a,b,c):
       ....:     # contents of f

Graph Dependencies
------------------

Sometimes you want to restrict the time and/or location to run a given task as a function
of the time and/or location of other tasks. This is implemented via a subclass of
:class:`set`, called a :class:`Dependency`. A Dependency is just a set of `msg_ids`
corresponding to tasks, and a few attributes to guide how to decide when the Dependency
has been met.

The switches we provide for interpreting whether a given dependency set has been met:

any|all
    Whether the dependency is considered met if *any* of the dependencies are done, or
    only after *all* of them have finished. This is set by a Dependency's :attr:`all`
    boolean attribute, which defaults to ``True``.

success [default: True]
    Whether to consider tasks that succeeded as fulfilling dependencies.

failure [default: False]
    Whether to consider tasks that failed as fulfilling dependencies.
    Using `failure=True,success=False` is useful for setting up cleanup tasks, to be run
    only when tasks have failed.

Sometimes you want to run a task after another, but only if that task succeeded. In this case,
``success`` should be ``True`` and ``failure`` should be ``False``. However, sometimes you may
not care whether the task succeeds, and always want the second task to run, in which case you
should use `success=failure=True`. The default behavior is to only use successes.

There are other switches for interpretation that are made at the *task* level. These are
specified via keyword arguments to the client's :meth:`apply` method.

after,follow
    You may want to run a task *after* a given set of dependencies have been run and/or
    run it *where* another set of dependencies are met. To support this, every task has an
    `after` dependency to restrict time, and a `follow` dependency to restrict
    destination.

timeout
    You may also want to set a time-limit for how long the scheduler should wait before a
    task's dependencies are met. This is done via a `timeout`, which defaults to 0, which
    indicates that the task should never timeout. If the timeout is reached, and the
    scheduler still hasn't been able to assign the task to an engine, the task will fail
    with a :class:`DependencyTimeout`.

.. note::

    Dependencies only work within the task scheduler. You cannot instruct a load-balanced
    task to run after a job submitted via the MUX interface.
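
The any|all and success/failure semantics described above can be condensed into
a few lines of plain Python. This is an illustrative sketch of the decision
logic only, not IPython's actual :class:`Dependency` class:

.. sourcecode:: python

    def dependency_met(outcomes, all_deps=True, success=True, failure=False):
        """Decide whether a dependency set is met.

        outcomes: list of True (dep task succeeded) / False (dep task failed).
        """
        # an outcome fulfills the dependency only if its kind is enabled
        fulfilled = [(ok and success) or (not ok and failure) for ok in outcomes]
        return all(fulfilled) if all_deps else any(fulfilled)

    # defaults (all=True, success=True, failure=False): every dep must succeed
    print(dependency_met([True, True]))    # True
    print(dependency_met([True, False]))   # False
    # cleanup-style (success=False, failure=True): met only by failures
    print(dependency_met([False], success=False, failure=True))  # True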

The simplest form of Dependencies is with `all=True,success=True,failure=False`. In these cases,
you can skip using Dependency objects, and just pass msg_ids or AsyncResult objects as the
`follow` and `after` keywords to :meth:`client.apply`:

.. sourcecode:: ipython

    In [14]: client.block = False

    In [15]: ar = lview.apply(f, args, kwargs)

    In [16]: ar2 = lview.apply(f2)

    In [17]: with lview.temp_flags(after=[ar,ar2]):
       ....:     ar3 = lview.apply(f3)

    In [18]: with lview.temp_flags(follow=[ar], timeout=2.5):
       ....:     ar4 = lview.apply(f3)

.. seealso::

    Some parallel workloads can be described as a `Directed Acyclic Graph
    <http://en.wikipedia.org/wiki/Directed_acyclic_graph>`_, or DAG. See :ref:`DAG
    Dependencies <dag_dependencies>` for an example demonstrating how to map a NetworkX DAG
    onto task dependencies.


Impossible Dependencies
***********************

The schedulers do perform some analysis on graph dependencies to determine whether they
can be met. If the scheduler discovers that a dependency cannot be
met, then the task will fail with an :class:`ImpossibleDependency` error. This way, if the
scheduler realizes that a task can never be run, it won't sit indefinitely in the
scheduler clogging the pipeline.

The basic cases that are checked:

* depending on nonexistent messages
* `follow` dependencies were run on more than one machine and `all=True`
* any dependencies failed and `all=True,success=True,failure=False`
* all dependencies failed and `all=False,success=True,failure=False`

.. warning::

    This analysis has not been proven to be rigorous, so it is possible for tasks
    to become impossible to run in obscure situations; a timeout may be a good choice.


Retries and Resubmit
====================

Retries
-------

Another flag for tasks is `retries`. This is an integer, specifying how many times
a task should be resubmitted after failure. This is useful for tasks that should still run
if their engine was shut down, or may have some statistical chance of failing. The default
is to not retry tasks.

Resubmit
--------

Sometimes you may want to re-run a task. This could be because it failed for some reason, and
you have fixed the error, or because you want to restore the cluster to an interrupted state.
For this, the :class:`Client` has a :meth:`rc.resubmit` method. This simply takes one or more
msg_ids, and returns an :class:`AsyncHubResult` for the result(s). You cannot resubmit
a task that is pending - only those that have finished, whether successful or unsuccessful.
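
The retry semantics can be sketched in a few lines of plain Python. This is a
hedged illustration of the idea, not the scheduler's code; ``make_flaky`` is a
made-up helper producing a task that fails a fixed number of times before
succeeding:

.. sourcecode:: python

    def run_with_retries(task, retries=0):
        """Resubmit a failing task up to `retries` more times before giving up."""
        for attempt in range(retries + 1):
            try:
                return task()
            except Exception:
                if attempt == retries:
                    raise  # out of retries: propagate like a failed task

    def make_flaky(failures):
        state = {"left": failures}
        def flaky():
            if state["left"] > 0:
                state["left"] -= 1
                raise RuntimeError("engine died")
            return "ok"
        return flaky

    print(run_with_retries(make_flaky(2), retries=3))  # ok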

.. _parallel_schedulers:

Schedulers
==========

There are a variety of valid ways to determine where jobs should be assigned in a
load-balancing situation. In IPython, we support several standard schemes, and
even make it easy to define your own. The scheme can be selected via the ``scheme``
argument to :command:`ipcontroller`, or in the :attr:`TaskScheduler.scheme_name` attribute
of a controller config object.

The built-in routing schemes:

To select one of these schemes, simply do::

    $ ipcontroller --scheme=<schemename>
    for instance:
    $ ipcontroller --scheme=lru

lru: Least Recently Used

    Always assign work to the least-recently-used engine. A close relative of
    round-robin, it will be fair with respect to the number of tasks, agnostic
    with respect to runtime of each task.

plainrandom: Plain Random

    Randomly picks an engine on which to run.

twobin: Two-Bin Random

    **Requires numpy**

    Pick two engines at random, and use the LRU of the two. This is known to be better
    than plain random in many cases, but requires a small amount of computation.

leastload: Least Load

    **This is the default scheme**

    Always assign tasks to the engine with the fewest outstanding tasks (LRU breaks tie).

weighted: Weighted Two-Bin Random

    **Requires numpy**

    Pick two engines at random using the number of outstanding tasks as inverse weights,
    and use the one with the lower load.
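
Two of these schemes are simple enough to sketch directly. The snippet below is
illustrative only (``loads`` maps engine id to outstanding-task count; these
are not IPython's internal names, and ties here are broken arbitrarily rather
than by LRU):

.. sourcecode:: python

    import random

    def leastload(loads):
        # engine with the fewest outstanding tasks
        return min(loads, key=loads.get)

    def twobin(loads, rng=random):
        # sample two engines at random, keep the less loaded one
        a, b = rng.sample(list(loads), 2)
        return a if loads[a] <= loads[b] else b

    loads = {"e0": 3, "e1": 0, "e2": 5}
    print(leastload(loads))   # e1
    print(twobin(loads) in loads)   # True: always one of the sampled engines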

Greedy Assignment
-----------------

Tasks are assigned greedily as they are submitted. If their dependencies are
met, they will be assigned to an engine right away, and multiple tasks can be
assigned to an engine at a given time. How many tasks an engine may have
outstanding at once is set with the ``TaskScheduler.hwm`` (high water mark)
configurable:

.. sourcecode:: python

    # the most common choices are:
    c.TaskScheduler.hwm = 0 # (minimal latency, default in IPython ≤ 0.12)
    # or
    c.TaskScheduler.hwm = 1 # (most-informed balancing, default in > 0.12)

In IPython ≤ 0.12, the default is 0, or no-limit. That is, there is no limit to
the number of tasks that can be outstanding on a given engine. This greatly
benefits the latency of execution, because network traffic can be hidden behind
computation. However, this means that workload is assigned without knowledge of
how long each task might take, and can result in poor load-balancing,
particularly when submitting a collection of heterogeneous tasks all at once.
You can limit this effect by setting ``hwm`` to a positive integer, 1 being
maximum load-balancing (a task will never be waiting if there is an idle
engine), and any larger number being a compromise between load-balancing and
latency-hiding.

In practice, some users have been confused by having this optimization on by
default, and the default value has been changed to 1. This can be slower,
but has more obvious behavior and won't result in assigning too many tasks to
some engines in heterogeneous cases.
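
Like any configurable, ``hwm`` can also be set from the command line when starting the controller, using the standard ``--Class.trait=value`` syntax:

```shell
# start a controller with at most one outstanding task per engine
ipcontroller --TaskScheduler.hwm=1
```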

Pure ZMQ Scheduler
------------------

For maximum throughput, the 'pure' scheme is not Python at all, but a C-level
:class:`MonitoredQueue` from PyZMQ, which uses a ZeroMQ ``DEALER`` socket to perform all
load-balancing. This scheduler does not support any of the advanced features of the Python
:class:`.Scheduler`.

Disabled features when using the ZMQ Scheduler:

* Engine unregistration
    Task farming will be disabled if an engine unregisters.
    Further, if an engine is unregistered during computation, the scheduler may not recover.
* Dependencies
    Since there is no Python logic inside the Scheduler, routing decisions cannot be made
    based on message content.
* Early destination notification
    The Python schedulers know which engine gets which task, and notify the Hub. This
    allows graceful handling of Engines coming and going. There is no way to know
    where ZeroMQ messages have gone, so there is no way to know what tasks are on which
    engine until they *finish*. This makes recovery from engine shutdown very difficult.
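
If none of these features are required, the pure scheme can be requested through the same ``scheme_name`` configurable used for the Python schemes (assuming ``'pure'`` is accepted as a value, as the scheme name above suggests):

```python
# in ipcontroller_config.py: replace the Python Scheduler with the
# C-level MonitoredQueue device (no dependencies, no early notification)
c.TaskScheduler.scheme_name = 'pure'
```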


.. note::

    TODO: performance comparisons


More details
============

The :class:`LoadBalancedView` has many more powerful features that allow quite a bit
of flexibility in how tasks are defined and run. The next places to look are
in the following classes:

* :class:`~IPython.parallel.client.view.LoadBalancedView`
* :class:`~IPython.parallel.client.asyncresult.AsyncResult`
* :meth:`~IPython.parallel.client.view.LoadBalancedView.apply`
* :mod:`~IPython.parallel.controller.dependency`

The following is an overview of how to use these classes together:

1. Create a :class:`Client` and :class:`LoadBalancedView`
2. Define some functions to be run as tasks
3. Submit your tasks using the :meth:`apply` method of your
   :class:`LoadBalancedView` instance.
4. Use :meth:`.Client.get_result` to get the results of the
   tasks, or use the :meth:`AsyncResult.get` method of the results to wait
   for and then receive the results.
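
Put together, the steps above look roughly like the following sketch. It assumes a cluster has already been started (e.g. with ``ipcluster start``); the ``square`` task and ``run_tasks`` helper are invented for the example:

```python
def square(x):
    """A trivial function to run as a task (step 2)."""
    return x * x

def run_tasks(xs):
    """Steps 1, 3, and 4: connect, submit, and collect results.

    Requires a running IPython cluster, so the import is deferred.
    """
    from IPython.parallel import Client
    rc = Client()                    # 1. create a Client...
    lview = rc.load_balanced_view()  #    ...and a LoadBalancedView
    # 3. submit tasks with apply(); each call returns an AsyncResult
    async_results = [lview.apply(square, x) for x in xs]
    # 4. wait for and then receive the results
    return [ar.get() for ar in async_results]
```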

.. seealso::

    A demo of :ref:`DAG Dependencies <dag_dependencies>` with NetworkX and IPython.