dependency tweaks + dependency/scheduler docs
MinRK
@@ -0,0 +1,172 b''
1 .. _dag_dependencies:
2
3 ================
4 DAG Dependencies
5 ================
6
7 Often, a parallel workflow is described in terms of a `Directed Acyclic Graph
8 <http://en.wikipedia.org/wiki/Directed_acyclic_graph>`_ or DAG. A popular library
9 for working with Graphs is NetworkX_. Here, we will walk through a demo mapping
10 a NetworkX DAG to task dependencies.
11
12 The full script that runs this demo can be found in
13 :file:`docs/examples/newparallel/dagdeps.py`.
14
15 Why are DAGs good for task dependencies?
16 ----------------------------------------
17
18 The 'G' in DAG is 'Graph'. A Graph is a collection of **nodes** and **edges** that connect
19 the nodes. For our purposes, each node would be a task, and each edge would be a
20 dependency. The 'D' in DAG stands for 'Directed'. This means that each edge has a
21 direction associated with it. So we can interpret the edge (a,b) as meaning that b depends
22 on a, whereas the edge (b,a) would mean a depends on b. The 'A' is 'Acyclic', meaning that
23 there must not be any closed loops in the graph. This is important for dependencies,
24 because if a loop were closed, then a task could ultimately depend on itself, and never be
25 able to run. If your workflow can be described as a DAG, then it is impossible for your
26 dependencies to cause a deadlock.
27
28 A Sample DAG
29 ------------
30
31 Here, we have a very simple 5-node DAG:
32
33 .. figure:: simpledag.*
34
35 With NetworkX, an arrow is just a fattened bit on the edge. Here, we can see that task 0
36 depends on nothing, and can run immediately. 1 and 2 depend on 0; 3 depends on
37 1 and 2; and 4 depends only on 1.
38
39 A possible sequence of events for this workflow:
40
41 0. Task 0 can run right away
42 1. 0 finishes, so 1,2 can start
43 2. 1 finishes, 3 is still waiting on 2, but 4 can start right away
44 3. 2 finishes, and 3 can finally start
45
46
47 Further, taking failures into account, assuming all dependencies are run with the default
48 `success_only=True`, the following cases would occur for each node's failure:
49
50 0. if 0 fails, all other tasks fail as Impossible
51 1. if 1 fails, 2 can still succeed, but 3 and 4 become unreachable
52 2. if 2 fails, 3 becomes unreachable, but 4 is unaffected
53 3. if 3 or 4 fails, no other node is affected, as they are terminal
54
55 The code to generate the simple DAG:
56
57 .. sourcecode:: python
58
59 import networkx as nx
60
61 G = nx.DiGraph()
62
63 # add 5 nodes, labeled 0-4:
64 map(G.add_node, range(5))
65 # 1,2 depend on 0:
66 G.add_edge(0,1)
67 G.add_edge(0,2)
68 # 3 depends on 1,2
69 G.add_edge(1,3)
70 G.add_edge(2,3)
71 # 4 depends on 1
72 G.add_edge(1,4)
73
74 # now draw the graph:
75 pos = { 0 : (0,0), 1 : (1,1), 2 : (-1,1),
76 3 : (0,2), 4 : (2,2)}
77 nx.draw(G, pos, edge_color='r')
78
79
80 For demonstration purposes, we have a function that generates a random DAG with a given
81 number of nodes and edges.
82
83 .. literalinclude:: ../../examples/newparallel/dagdeps.py
84 :language: python
85 :lines: 20-36
86
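Since the included source isn't reproduced on this page, a minimal sketch of such a
generator might look like the following (an assumption about the script's approach:
rejection sampling, discarding any randomly chosen edge that would close a cycle):

.. sourcecode:: python

    import networkx as nx
    from random import randint

    def random_dag(nodes, edges):
        """Generate a random Directed Acyclic Graph with `nodes` nodes and `edges` edges."""
        G = nx.DiGraph()
        for i in range(nodes):
            G.add_node(i)
        while edges > 0:
            # pick two distinct nodes at random
            a = randint(0, nodes - 1)
            b = a
            while b == a:
                b = randint(0, nodes - 1)
            G.add_edge(a, b)
            if nx.is_directed_acyclic_graph(G):
                edges -= 1
            else:
                # this edge closed a cycle; discard it and try again
                G.remove_edge(a, b)
        return G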
87 So first, we start with a graph of 32 nodes, with 128 edges:
88
89 .. sourcecode:: ipython
90
91 In [2]: G = random_dag(32,128)
92
93 Now, we need to build our dict of jobs corresponding to the nodes on the graph:
94
95 .. sourcecode:: ipython
96
97 In [3]: jobs = {}
98
99 # in reality, each job would presumably be different
100 # randomwait is just a function that sleeps for a random interval
101 In [4]: for node in G:
102 ...: jobs[node] = randomwait
103
104 Once we have a dict of jobs matching the nodes on the graph, we can start submitting jobs,
105 and linking up the dependencies. Since we don't know a job's msg_id until it is submitted,
106 and msg_ids are necessary for building dependencies, it is critical that we never submit a
107 job before the jobs on which it depends. Fortunately, NetworkX provides a
108 :func:`topological_sort` function which ensures exactly this. It returns an iterable that
109 guarantees that when you arrive at a node, you have already visited all the nodes
110 on which it depends:
111
112 .. sourcecode:: ipython
113
114 In [5]: c = client.Client()
115
116 In [6]: results = {}
117
118 In [7]: for node in nx.topological_sort(G):
119 ...: # get list of AsyncResult objects from nodes
120 ...: # leading into this one as dependencies
121 ...: deps = [ results[n] for n in G.predecessors(node) ]
122 ...: # submit and store AsyncResult object
123 ...: results[node] = c.apply(jobs[node], after=deps, block=False)
124
125 Now that we have submitted all the jobs, we can wait for the results:
126
127 .. sourcecode:: ipython
128
129 In [8]: [ r.get() for r in results.values() ]
130
131 Now, at least we know that all the jobs ran and did not fail (``r.get()`` would have
132 raised an error if a task failed). But we don't know that the ordering was properly
133 respected. For this, we can use the :attr:`metadata` attribute of each AsyncResult.
134
135 These objects store a variety of metadata about each task, including various timestamps.
136 We can validate that the dependencies were respected by checking that each task was
137 started after all of its predecessors were completed:
138
139 .. literalinclude:: ../../examples/newparallel/dagdeps.py
140 :language: python
141 :lines: 64-70
142
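The included lines aren't rendered here; a minimal sketch of such a check (a hypothetical
``validate_tree`` helper, assuming only the ``started``/``completed`` metadata described
above) might be:

.. sourcecode:: python

    import networkx as nx

    def validate_tree(G, results):
        """Assert that each task started only after all its predecessors completed."""
        for node in nx.topological_sort(G):
            started = results[node].metadata.started
            for parent in G.predecessors(node):
                finished = results[parent].metadata.completed
                assert started > finished, "%s should have happened after %s" % (
                    started, finished)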
143 We can also validate the graph visually. By drawing the graph with each node's x-position
144 as its start time, all arrows must be pointing to the right if the order was respected.
145 For vertical spread, the y-position will be the runtime, so long-running tasks will be at
146 the top, and quick tasks will be at the bottom.
147
148 .. sourcecode:: ipython
149
150 In [10]: from matplotlib.dates import date2num
151
152 In [11]: from matplotlib.cm import gist_rainbow
153
154 In [12]: pos = {}; colors = {}
155
156 In [13]: for node in G:
157 ...: md = results[node].metadata
158 ...: start = date2num(md.started)
159 ...: runtime = date2num(md.completed) - start
160 ...: pos[node] = (start, runtime)
161 ...: colors[node] = md.engine_id
162
163 In [14]: nx.draw(G, pos, nodelist=colors.keys(), node_color=colors.values(),
164 ...: cmap=gist_rainbow)
165
166 .. figure:: dagdeps.*
167
168 Time started on x, runtime on y, and color-coded by engine-id (in this case there
169 were four engines).
170
171
172 .. _NetworkX: http://networkx.lanl.gov/
1 NO CONTENT: new file 100644, binary diff hidden
1 NO CONTENT: new file 100644, binary diff hidden
1 NO CONTENT: new file 100644, binary diff hidden
1 NO CONTENT: new file 100644, binary diff hidden
@@ -863,14 +863,9 b' class Client(object):'
863 863 return dep.msg_ids
864 864 elif dep is None:
865 865 return []
866 elif isinstance(dep, set):
867 return list(dep)
868 elif isinstance(dep, (list,dict)):
869 return dep
870 elif isinstance(dep, str):
871 return [dep]
872 866 else:
873 raise TypeError("Dependency may be: set,list,dict,Dependency or AsyncResult, not %r"%type(dep))
867 # pass to Dependency constructor
868 return list(Dependency(dep))
874 869
875 870 def apply(self, f, args=None, kwargs=None, bound=True, block=None, targets=None,
876 871 after=None, follow=None, timeout=None):
@@ -921,9 +916,11 b' class Client(object):'
921 916 This job will only be run on an engine where this dependency
922 917 is met.
923 918
924 timeout : float or None
919 timeout : float/int or None
925 920 Only for load-balanced execution (targets=None)
926 Specify an amount of time (in seconds)
921 Specify an amount of time (in seconds) for the scheduler to
922 wait for dependencies to be met before failing with a
923 DependencyTimeout.
927 924
928 925 Returns
929 926 -------
@@ -950,9 +947,6 b' class Client(object):'
950 947 if not isinstance(kwargs, dict):
951 948 raise TypeError("kwargs must be dict, not %s"%type(kwargs))
952 949
953 after = self._build_dependency(after)
954 follow = self._build_dependency(follow)
955
956 950 options = dict(bound=bound, block=block)
957 951
958 952 if targets is None:
@@ -984,6 +978,8 b' class Client(object):'
984 978 warnings.warn(msg, RuntimeWarning)
985 979
986 980
981 after = self._build_dependency(after)
982 follow = self._build_dependency(follow)
987 983 subheader = dict(after=after, follow=follow, timeout=timeout)
988 984 bufs = ss.pack_apply_message(f,args,kwargs)
989 985 content = dict(bound=bound)
@@ -2,13 +2,7 b''
2 2
3 3 from IPython.external.decorator import decorator
4 4 from error import UnmetDependency
5
6
7 # flags
8 ALL = 1 << 0
9 ANY = 1 << 1
10 HERE = 1 << 2
11 ANYWHERE = 1 << 3
5 from asyncresult import AsyncResult
12 6
13 7
14 8 class depend(object):
@@ -59,53 +53,58 b' class Dependency(set):'
59 53
60 54 Subclassed from set()."""
61 55
62 mode='all'
56 all=True
63 57 success_only=True
64 58
65 def __init__(self, dependencies=[], mode='all', success_only=True):
59 def __init__(self, dependencies=[], all=True, success_only=True):
66 60 if isinstance(dependencies, dict):
67 61 # load from dict
68 mode = dependencies.get('mode', mode)
62 all = dependencies.get('all', True)
69 63 success_only = dependencies.get('success_only', success_only)
70 64 dependencies = dependencies.get('dependencies', [])
71 set.__init__(self, dependencies)
72 self.mode = mode.lower()
65 ids = []
66 if isinstance(dependencies, AsyncResult):
67 ids.extend(dependencies.msg_ids)
68 else:
69 for d in dependencies:
70 if isinstance(d, basestring):
71 ids.append(d)
72 elif isinstance(d, AsyncResult):
73 ids.extend(d.msg_ids)
74 else:
75 raise TypeError("invalid dependency type: %r"%type(d))
76 set.__init__(self, ids)
77 self.all = all
73 78 self.success_only=success_only
74 if self.mode not in ('any', 'all'):
75 raise NotImplementedError("Only any|all supported, not %r"%mode)
76 79
77 80 def check(self, completed, failed=None):
78 81 if failed is not None and not self.success_only:
79 82 completed = completed.union(failed)
80 83 if len(self) == 0:
81 84 return True
82 if self.mode == 'all':
85 if self.all:
83 86 return self.issubset(completed)
84 elif self.mode == 'any':
85 return not self.isdisjoint(completed)
86 87 else:
87 raise NotImplementedError("Only any|all supported, not %r"%mode)
88 return not self.isdisjoint(completed)
88 89
89 90 def unreachable(self, failed):
90 91 if len(self) == 0 or len(failed) == 0 or not self.success_only:
91 92 return False
92 print self, self.success_only, self.mode, failed
93 if self.mode == 'all':
93 # print self, self.success_only, self.all, failed
94 if self.all:
94 95 return not self.isdisjoint(failed)
95 elif self.mode == 'any':
96 return self.issubset(failed)
97 96 else:
98 raise NotImplementedError("Only any|all supported, not %r"%mode)
97 return self.issubset(failed)
99 98
100 99
101 100 def as_dict(self):
102 101 """Represent this dependency as a dict. For json compatibility."""
103 102 return dict(
104 103 dependencies=list(self),
105 mode=self.mode,
104 all=self.all,
106 105 success_only=self.success_only,
107 106 )
108 107
109 108
110 __all__ = ['depend', 'require', 'Dependency']
109 __all__ = ['depend', 'require', 'dependent', 'Dependency']
111 110
@@ -154,7 +154,10 b' class UnmetDependency(KernelError):'
154 154 class ImpossibleDependency(UnmetDependency):
155 155 pass
156 156
157 class DependencyTimeout(UnmetDependency):
157 class DependencyTimeout(ImpossibleDependency):
158 pass
159
160 class InvalidDependency(ImpossibleDependency):
158 161 pass
159 162
160 163 class RemoteError(KernelError):
@@ -100,7 +100,7 b' class HubFactory(RegistrationFactory):'
100 100 """The Configurable for setting up a Hub."""
101 101
102 102 # name of a scheduler scheme
103 scheme = Str('lru', config=True)
103 scheme = Str('leastload', config=True)
104 104
105 105 # port-pairs for monitoredqueues:
106 106 hb = Instance(list, config=True)
@@ -20,7 +20,9 b' import logging'
20 20 import os
21 21 import signal
22 22 import logging
23 import errno
23 24
25 import zmq
24 26 from zmq.eventloop import ioloop
25 27
26 28 from IPython.external.argparse import ArgumentParser, SUPPRESS
@@ -385,7 +387,8 b' class IPClusterApp(ApplicationWithClusterDir):'
385 387 # observing of engine stopping is inconsistent. Some launchers
386 388 # might trigger on a single engine stopping, other wait until
387 389 # all stop. TODO: think more about how to handle this.
388
390 else:
391 self.controller_launcher = None
389 392
390 393 el_class = import_item(config.Global.engine_launcher)
391 394 self.engine_launcher = el_class(
@@ -427,7 +430,7 b' class IPClusterApp(ApplicationWithClusterDir):'
427 430
428 431 def stop_controller(self, r=None):
429 432 # self.log.info("In stop_controller")
430 if self.controller_launcher.running:
433 if self.controller_launcher and self.controller_launcher.running:
431 434 return self.controller_launcher.stop()
432 435
433 436 def stop_engines(self, r=None):
@@ -516,8 +519,13 b' class IPClusterApp(ApplicationWithClusterDir):'
516 519 self.write_pid_file()
517 520 try:
518 521 self.loop.start()
519 except:
520 self.log.info("stopping...")
522 except KeyboardInterrupt:
523 pass
524 except zmq.ZMQError as e:
525 if e.errno == errno.EINTR:
526 pass
527 else:
528 raise
521 529 self.remove_pid_file()
522 530
523 531 def start_app_engines(self):
@@ -539,8 +547,13 b' class IPClusterApp(ApplicationWithClusterDir):'
539 547 # self.write_pid_file()
540 548 try:
541 549 self.loop.start()
542 except:
543 self.log.fatal("stopping...")
550 except KeyboardInterrupt:
551 pass
552 except zmq.ZMQError as e:
553 if e.errno == errno.EINTR:
554 pass
555 else:
556 raise
544 557 # self.remove_pid_file()
545 558
546 559 def start_app_stop(self):
@@ -127,7 +127,7 b' class TaskScheduler(SessionFactory):'
127 127 mon_stream = Instance(zmqstream.ZMQStream) # hub-facing pub stream
128 128
129 129 # internals:
130 dependencies = Dict() # dict by msg_id of [ msg_ids that depend on key ]
130 graph = Dict() # dict by msg_id of [ msg_ids that depend on key ]
131 131 depending = Dict() # dict by msg_id of (msg_id, raw_msg, after, follow)
132 132 pending = Dict() # dict by engine_uuid of submitted tasks
133 133 completed = Dict() # dict by engine_uuid of completed tasks
@@ -139,6 +139,7 b' class TaskScheduler(SessionFactory):'
139 139 all_completed = Set() # set of all completed tasks
140 140 all_failed = Set() # set of all failed tasks
141 141 all_done = Set() # set of all finished tasks=union(completed,failed)
142 all_ids = Set() # set of all submitted task IDs
142 143 blacklist = Dict() # dict by msg_id of locations where a job has encountered UnmetDependency
143 144 auditor = Instance('zmq.eventloop.ioloop.PeriodicCallback')
144 145
@@ -239,7 +240,7 b' class TaskScheduler(SessionFactory):'
239 240 msg = self.session.send(self.client_stream, 'apply_reply', content,
240 241 parent=parent, ident=idents)
241 242 self.session.send(self.mon_stream, msg, ident=['outtask']+idents)
242 self.update_dependencies(msg_id)
243 self.update_graph(msg_id)
243 244
244 245
245 246 #-----------------------------------------------------------------------
@@ -252,20 +253,21 b' class TaskScheduler(SessionFactory):'
252 253 self.notifier_stream.flush()
253 254 try:
254 255 idents, msg = self.session.feed_identities(raw_msg, copy=False)
255 except Exception as e:
256 self.log.error("task::Invaid msg: %s"%msg)
256 msg = self.session.unpack_message(msg, content=False, copy=False)
257 except:
258 self.log.error("task::Invalid task: %s"%raw_msg, exc_info=True)
257 259 return
258 260
259 261 # send to monitor
260 262 self.mon_stream.send_multipart(['intask']+raw_msg, copy=False)
261 263
262 msg = self.session.unpack_message(msg, content=False, copy=False)
263 264 header = msg['header']
264 265 msg_id = header['msg_id']
266 self.all_ids.add(msg_id)
265 267
266 268 # time dependencies
267 269 after = Dependency(header.get('after', []))
268 if after.mode == 'all':
270 if after.all:
269 271 after.difference_update(self.all_completed)
270 272 if not after.success_only:
271 273 after.difference_update(self.all_failed)
@@ -276,10 +278,16 b' class TaskScheduler(SessionFactory):'
276 278
277 279 # location dependencies
278 280 follow = Dependency(header.get('follow', []))
279 # check if unreachable:
280 if after.unreachable(self.all_failed) or follow.unreachable(self.all_failed):
281 self.depending[msg_id] = [raw_msg,MET,MET,None]
282 return self.fail_unreachable(msg_id)
281
282 for dep in after,follow:
283 # check valid:
284 if msg_id in dep or dep.difference(self.all_ids):
285 self.depending[msg_id] = [raw_msg,MET,MET,None]
286 return self.fail_unreachable(msg_id, error.InvalidDependency)
287 # check if unreachable:
288 if dep.unreachable(self.all_failed):
289 self.depending[msg_id] = [raw_msg,MET,MET,None]
290 return self.fail_unreachable(msg_id)
283 291
284 292 # turn timeouts into datetime objects:
285 293 timeout = header.get('timeout', None)
@@ -288,7 +296,7 b' class TaskScheduler(SessionFactory):'
288 296
289 297 if after.check(self.all_completed, self.all_failed):
290 298 # time deps already met, try to run
291 if not self.maybe_run(msg_id, raw_msg, follow):
299 if not self.maybe_run(msg_id, raw_msg, follow, timeout):
292 300 # can't run yet
293 301 self.save_unmet(msg_id, raw_msg, after, follow, timeout)
294 302 else:
@@ -306,25 +314,23 b' class TaskScheduler(SessionFactory):'
306 314 self.fail_unreachable(msg_id, timeout=True)
307 315
308 316 @logged
309 def fail_unreachable(self, msg_id, timeout=False):
317 def fail_unreachable(self, msg_id, why=error.ImpossibleDependency):
310 318 """a message has become unreachable"""
311 319 if msg_id not in self.depending:
312 320 self.log.error("msg %r already failed!"%msg_id)
313 321 return
314 322 raw_msg, after, follow, timeout = self.depending.pop(msg_id)
315 323 for mid in follow.union(after):
316 if mid in self.dependencies:
317 self.dependencies[mid].remove(msg_id)
324 if mid in self.graph:
325 self.graph[mid].remove(msg_id)
318 326
319 327 # FIXME: unpacking a message I've already unpacked, but didn't save:
320 328 idents,msg = self.session.feed_identities(raw_msg, copy=False)
321 329 msg = self.session.unpack_message(msg, copy=False, content=False)
322 330 header = msg['header']
323 331
324 impossible = error.DependencyTimeout if timeout else error.ImpossibleDependency
325
326 332 try:
327 raise impossible()
333 raise why()
328 334 except:
329 335 content = ss.wrap_exception()
330 336
@@ -335,10 +341,10 b' class TaskScheduler(SessionFactory):'
335 341 parent=header, ident=idents)
336 342 self.session.send(self.mon_stream, msg, ident=['outtask']+idents)
337 343
338 self.update_dependencies(msg_id, success=False)
344 self.update_graph(msg_id, success=False)
339 345
340 346 @logged
341 def maybe_run(self, msg_id, raw_msg, follow=None):
347 def maybe_run(self, msg_id, raw_msg, follow=None, timeout=None):
342 348 """check location dependencies, and run if they are met."""
343 349
344 350 if follow:
@@ -349,8 +355,7 b' class TaskScheduler(SessionFactory):'
349 355
350 356 indices = filter(can_run, range(len(self.targets)))
351 357 if not indices:
352 # TODO evaluate unmeetable follow dependencies
353 if follow.mode == 'all':
358 if follow.all:
354 359 dests = set()
355 360 relevant = self.all_completed if follow.success_only else self.all_done
356 361 for m in follow.intersection(relevant):
@@ -363,7 +368,7 b' class TaskScheduler(SessionFactory):'
363 368 else:
364 369 indices = None
365 370
366 self.submit_task(msg_id, raw_msg, indices)
371 self.submit_task(msg_id, raw_msg, follow, timeout, indices)
367 372 return True
368 373
369 374 @logged
@@ -372,12 +377,12 b' class TaskScheduler(SessionFactory):'
372 377 self.depending[msg_id] = [raw_msg,after,follow,timeout]
373 378 # track the ids in follow or after, but not those already finished
374 379 for dep_id in after.union(follow).difference(self.all_done):
375 if dep_id not in self.dependencies:
376 self.dependencies[dep_id] = set()
377 self.dependencies[dep_id].add(msg_id)
380 if dep_id not in self.graph:
381 self.graph[dep_id] = set()
382 self.graph[dep_id].add(msg_id)
378 383
379 384 @logged
380 def submit_task(self, msg_id, raw_msg, follow=None, indices=None):
385 def submit_task(self, msg_id, raw_msg, follow, timeout, indices=None):
381 386 """Submit a task to any of a subset of our targets."""
382 387 if indices:
383 388 loads = [self.loads[i] for i in indices]
@@ -391,7 +396,7 b' class TaskScheduler(SessionFactory):'
391 396 self.engine_stream.send(target, flags=zmq.SNDMORE, copy=False)
392 397 self.engine_stream.send_multipart(raw_msg, copy=False)
393 398 self.add_job(idx)
394 self.pending[target][msg_id] = (raw_msg, follow)
399 self.pending[target][msg_id] = (raw_msg, follow, timeout)
395 400 content = dict(msg_id=msg_id, engine_id=target)
396 401 self.session.send(self.mon_stream, 'task_destination', content=content,
397 402 ident=['tracktask',self.session.session])
@@ -403,10 +408,11 b' class TaskScheduler(SessionFactory):'
403 408 def dispatch_result(self, raw_msg):
404 409 try:
405 410 idents,msg = self.session.feed_identities(raw_msg, copy=False)
406 except Exception as e:
407 self.log.error("task::Invaid result: %s"%msg)
411 msg = self.session.unpack_message(msg, content=False, copy=False)
412 except:
413 self.log.error("task::Invalid result: %s"%raw_msg, exc_info=True)
408 414 return
409 msg = self.session.unpack_message(msg, content=False, copy=False)
415
410 416 header = msg['header']
411 417 if header.get('dependencies_met', True):
412 418 success = (header['status'] == 'ok')
@@ -438,7 +444,7 b' class TaskScheduler(SessionFactory):'
438 444 self.all_done.add(msg_id)
439 445 self.destinations[msg_id] = engine
440 446
441 self.update_dependencies(msg_id, success)
447 self.update_graph(msg_id, success)
442 448
443 449 @logged
444 450 def handle_unmet_dependency(self, idents, parent):
@@ -448,30 +454,30 b' class TaskScheduler(SessionFactory):'
448 454 self.blacklist[msg_id] = set()
449 455 self.blacklist[msg_id].add(engine)
450 456 raw_msg,follow,timeout = self.pending[engine].pop(msg_id)
451 if not self.maybe_run(msg_id, raw_msg, follow):
457 if not self.maybe_run(msg_id, raw_msg, follow, timeout):
452 458 # resubmit failed, put it back in our dependency tree
453 459 self.save_unmet(msg_id, raw_msg, MET, follow, timeout)
454 460 pass
455 461
456 462 @logged
457 def update_dependencies(self, dep_id, success=True):
463 def update_graph(self, dep_id, success=True):
458 464 """dep_id just finished. Update our dependency
459 465 table and submit any jobs that just became runable."""
460 466 # print ("\n\n***********")
461 467 # pprint (dep_id)
462 # pprint (self.dependencies)
468 # pprint (self.graph)
463 469 # pprint (self.depending)
464 470 # pprint (self.all_completed)
465 471 # pprint (self.all_failed)
466 472 # print ("\n\n***********\n\n")
467 if dep_id not in self.dependencies:
473 if dep_id not in self.graph:
468 474 return
469 jobs = self.dependencies.pop(dep_id)
475 jobs = self.graph.pop(dep_id)
470 476
471 477 for msg_id in jobs:
472 478 raw_msg, after, follow, timeout = self.depending[msg_id]
473 479 # if dep_id in after:
474 # if after.mode == 'all' and (success or not after.success_only):
480 # if after.all and (success or not after.success_only):
475 481 # after.remove(dep_id)
476 482
477 483 if after.unreachable(self.all_failed) or follow.unreachable(self.all_failed):
@@ -479,12 +485,12 b' class TaskScheduler(SessionFactory):'
479 485
480 486 elif after.check(self.all_completed, self.all_failed): # time deps met, maybe run
481 487 self.depending[msg_id][1] = MET
482 if self.maybe_run(msg_id, raw_msg, follow):
488 if self.maybe_run(msg_id, raw_msg, follow, timeout):
483 489
484 490 self.depending.pop(msg_id)
485 491 for mid in follow.union(after):
486 if mid in self.dependencies:
487 self.dependencies[mid].remove(msg_id)
492 if mid in self.graph:
493 self.graph[mid].remove(msg_id)
488 494
489 495 #----------------------------------------------------------------------
490 496 # methods to be overridden by subclasses
@@ -506,7 +512,8 b' class TaskScheduler(SessionFactory):'
506 512
507 513
508 514
509 def launch_scheduler(in_addr, out_addr, mon_addr, not_addr, config=None,logname='ZMQ', log_addr=None, loglevel=logging.DEBUG, scheme='weighted'):
515 def launch_scheduler(in_addr, out_addr, mon_addr, not_addr, config=None,logname='ZMQ',
516 log_addr=None, loglevel=logging.DEBUG, scheme='lru'):
510 517 from zmq.eventloop import ioloop
511 518 from zmq.eventloop.zmqstream import ZMQStream
512 519
@@ -228,7 +228,12 b' class DirectView(View):'
228 228 >>> dv_even = client[::2]
229 229 >>> dv_some = client[1:3]
230 230
231 This object provides dictionary access
231 This object provides dictionary access to engine namespaces:
232
233 # push a=5:
234 >>> dv['a'] = 5
235 # pull 'foo':
236 >>> dv['foo']
232 237
233 238 """
234 239
@@ -57,7 +57,7 b' def submit_jobs(client, G, jobs):'
57 57 """Submit jobs via client where G describes the time dependencies."""
58 58 results = {}
59 59 for node in nx.topological_sort(G):
60 deps = [ results[n].msg_ids[0] for n in G.predecessors(node) ]
60 deps = [ results[n] for n in G.predecessors(node) ]
61 61 results[node] = client.apply(jobs[node], after=deps)
62 62 return results
63 63
@@ -77,30 +77,34 b' def main(nodes, edges):'
77 77 point at least slightly to the right if the graph is valid.
78 78 """
79 79 from matplotlib.dates import date2num
80 from matplotlib.cm import gist_rainbow
80 81 print "building DAG"
81 82 G = random_dag(nodes, edges)
82 83 jobs = {}
83 84 pos = {}
85 colors = {}
84 86 for node in G:
85 87 jobs[node] = randomwait
86 88
87 89 client = cmod.Client()
88 print "submitting tasks"
90 print "submitting %i tasks with %i dependencies"%(nodes,edges)
89 91 results = submit_jobs(client, G, jobs)
90 92 print "waiting for results"
91 93 client.barrier()
92 94 print "done"
93 95 for node in G:
94 # times[node] = results[node].get()
95 t = date2num(results[node].metadata.started)
96 pos[node] = (t, G.in_degree(node)+random())
97
96 md = results[node].metadata
97 start = date2num(md.started)
98 runtime = date2num(md.completed) - start
99 pos[node] = (start, runtime)
100 colors[node] = md.engine_id
98 101 validate_tree(G, results)
99 nx.draw(G, pos)
102 nx.draw(G, pos, nodelist=colors.keys(), node_color=colors.values(), cmap=gist_rainbow)
100 103 return G,results
101 104
102 105 if __name__ == '__main__':
103 106 import pylab
104 main(32,128)
107 # main(5,10)
108 main(32,96)
105 109 pylab.show()
106 110 No newline at end of file
@@ -15,5 +15,6 b' Using IPython for parallel computing (ZMQ)'
15 15 parallel_security.txt
16 16 parallel_winhpc.txt
17 17 parallel_demos.txt
18 dag_dependencies.txt
18 19
19 20
@@ -13,14 +13,7 b' Matplotlib package. IPython can be started in this mode by typing::'
13 13
14 14 ipython --pylab
15 15
16 at the system command line. If this prints an error message, you will
17 need to install the default profiles from within IPython by doing,
18
19 .. sourcecode:: ipython
20
21 In [1]: %install_profiles
22
23 and then restarting IPython.
16 at the system command line.
24 17
25 18 150 million digits of pi
26 19 ========================
@@ -132,11 +132,11 b' The main method for doing remote execution (in fact, all methods that'
132 132 communicate with the engines are built on top of it), is :meth:`Client.apply`.
133 133 Ideally, :meth:`apply` would have the signature ``apply(f,*args,**kwargs)``,
134 134 which would call ``f(*args,**kwargs)`` remotely. However, since :class:`Clients`
135 require some more options, they cannot reasonably provide this interface.
135 require some more options, they cannot easily provide this interface.
136 136 Instead, they provide the signature::
137 137
138 c.apply(f, args=None, kwargs=None, bound=True, block=None,
139 targets=None, after=None, follow=None)
138 c.apply(f, args=None, kwargs=None, bound=True, block=None, targets=None,
139 after=None, follow=None, timeout=None)
140 140
141 141 In order to provide the nicer interface, we have :class:`View` classes, which wrap
142 142 :meth:`Client.apply` by using attributes and extra :meth:`apply_x` methods to determine
@@ -184,7 +184,7 b' blocks until the engines are done executing the command:'
184 184 In [5]: dview['b'] = 10
185 185
186 186 In [6]: dview.apply_bound(lambda x: a+b+x, 27)
187 Out[6]: [42,42,42,42]
187 Out[6]: [42, 42, 42, 42]
188 188
189 189 Python commands can be executed on specific engines by calling execute using
190 190 the ``targets`` keyword argument, or creating a :class:`DirectView` instance
@@ -197,7 +197,7 b' by index-access to the client:'
197 197 In [7]: rc[1::2].execute('c=a-b') # shorthand for rc.execute('c=a-b',targets=[1,3])
198 198
199 199 In [8]: rc[:]['c'] # shorthand for rc.pull('c',targets='all')
200 Out[8]: [15,-5,15,-5]
200 Out[8]: [15, -5, 15, -5]
201 201
202 202 .. note::
203 203
@@ -258,7 +258,7 b' time through its :meth:`get` method.'
258 258
259 259 .. Note::
260 260
261 The :class:`AsyncResult` object provides the exact same interface as
261 The :class:`AsyncResult` object provides a superset of the interface in
262 262 :py:class:`multiprocessing.pool.AsyncResult`. See the
263 263 `official Python documentation <http://docs.python.org/library/multiprocessing#multiprocessing.pool.AsyncResult>`_
264 264 for more.
@@ -270,15 +270,12 b' local Python/IPython session:'
270 270 .. sourcecode:: ipython
271 271
272 272 # define our function
273 In [35]: def wait(t):
274 ....: import time
275 ....: tic = time.time()
276 ....: time.sleep(t)
277 ....: return time.time()-tic
273 In [6]: def wait(t):
274 ...: import time
275 ...: tic = time.time()
276 ...: time.sleep(t)
277 ...: return time.time()-tic
278 278
279 # In blocking mode
280 In [6]: rc.apply('import time')
281
282 279 # In non-blocking mode
283 280 In [7]: pr = rc[:].apply_async(wait, 2)
284 281
@@ -316,8 +313,8 b' local Python/IPython session:'
316 313
317 314 Often, it is desirable to wait until a set of :class:`AsyncResult` objects
318 315 are done. For this, there is the method :meth:`barrier`. This method takes a
319 tuple of :class:`AsyncResult` objects (or `msg_ids`) and blocks until all of the associated
320 results are ready:
316 tuple of :class:`AsyncResult` objects (or `msg_ids`) and blocks until all of the
317 associated results are ready:
321 318
322 319 .. sourcecode:: ipython
323 320
@@ -329,7 +326,7 b' results are ready:'
329 326 # Wait until all of them are done
330 327 In [74]: rc.barrier(pr_list)
331 328
332 # Then, their results are ready using get_result or the r attribute
329 # Then, their results are ready using get() or the `.r` attribute
333 330 In [75]: pr_list[0].get()
334 331 Out[75]: [2.9982571601867676, 2.9982588291168213, 2.9987530708312988, 2.9990990161895752]
335 332
@@ -320,4 +320,5 b' channel is established.'
320 320
321 321 .. [RFC5246] <http://tools.ietf.org/html/rfc5246>
322 322
323
323 .. [OpenSSH] <http://www.openssh.com/>
324 .. [Paramiko] <http://www.lag.net/paramiko/>
@@ -4,13 +4,13 b''
4 4 The IPython task interface
5 5 ==========================
6 6
7 The task interface to the controller presents the engines as a fault tolerant,
7 The task interface to the cluster presents the engines as a fault tolerant,
8 8 dynamic load-balanced system of workers. Unlike the multiengine interface, in
9 the task interface, the user have no direct access to individual engines. By
10 allowing the IPython scheduler to assign work, this interface is both simpler
11 and more powerful.
9 the task interface, the user has no direct access to individual engines. By
10 allowing the IPython scheduler to assign work, this interface is simultaneously
11 simpler and more powerful.
12 12
13 Best of all the user can use both of these interfaces running at the same time
13 Best of all, the user can use both of these interfaces running at the same time
14 14 to take advantage of their respective strengths. When the user can break up
15 15 the user's work into segments that do not depend on previous execution, the
16 16 task interface is ideal. But it also has more power and flexibility, allowing
@@ -97,11 +97,275 b' that turns any Python function into a parallel function:'
97 97 In [10]: @lview.parallel()
98 98 ....: def f(x):
99 99 ....: return 10.0*x**4
100 ....:
100 ....:
101 101
102 102 In [11]: f.map(range(32)) # this is done in parallel
103 103 Out[11]: [0.0,10.0,160.0,...]
104 104
105 Dependencies
106 ============
107
108 Often, pure atomic load-balancing is too primitive for your work. In these cases, you
109 may want to associate some kind of `Dependency` that describes when, where, or whether
110 a task can be run. In IPython, we provide two types of dependencies:
111 `Functional Dependencies`_ and `Graph Dependencies`_.
112
113 .. note::
114
115 It is important to note that the pure ZeroMQ scheduler does not support dependencies,
116 and you will see errors or warnings if you try to use dependencies with the pure
117 scheduler.
118
119 Functional Dependencies
120 -----------------------
121
122 Functional dependencies are used to determine whether a given engine is capable of running
123 a particular task. This is implemented via a special :class:`Exception` class,
124 :class:`UnmetDependency`, found in `IPython.zmq.parallel.error`. Its use is very simple:
125 if a task fails with an UnmetDependency exception, then the scheduler, instead of relaying
126 the error up to the client like any other error, catches the error, and submits the task
127 to a different engine. This will repeat indefinitely, and a task will never be submitted
128 to a given engine a second time.
129
130 You can manually raise the :class:`UnmetDependency` yourself, but IPython has provided
131 some decorators for facilitating this behavior.
132
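For illustration, raising it by hand might look like the following sketch
(``numpy_or_bust`` is a hypothetical task function, not part of IPython):

.. sourcecode:: ipython

    In [7]: from IPython.zmq.parallel.error import UnmetDependency

    In [8]: def numpy_or_bust():
       ...:     try:
       ...:         import numpy
       ...:     except ImportError:
       ...:         # tell the scheduler to retry this task on another engine
       ...:         raise UnmetDependency()
       ...:     return numpy.__version__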
133 There are two decorators and a class used for functional dependencies:
134
135 .. sourcecode:: ipython
136
137 In [9]: from IPython.zmq.parallel.dependency import depend, require, dependent
138
139 @require
140 ********
141
142 The simplest sort of dependency is requiring that a Python module is available. The
143 ``@require`` decorator lets you define a function that will only run on engines where names
144 you specify are importable:
145
146 .. sourcecode:: ipython
147
148 In [10]: @require('numpy', 'zmq')
149 ...: def myfunc():
150 ...: import numpy,zmq
151 ...: return dostuff()
152
153 Now, any time you apply :func:`myfunc`, the task will only run on a machine that has
154 numpy and pyzmq available.
155
156 @depend
157 *******
158
159 The ``@depend`` decorator lets you decorate any function with any *other* function to
160 evaluate the dependency. The dependency function will be called at the start of the task,
161 and if it returns ``False``, then the dependency will be considered unmet, and the task
162 will be assigned to another engine. If the dependency returns anything other than
163 ``False``, the rest of the task will continue.
164
165 .. sourcecode:: ipython
166
167 In [10]: def platform_specific(plat):
168 ...: import sys
169 ...: return sys.platform == plat
170
171 In [11]: @depend(platform_specific, 'darwin')
172 ...: def mactask():
173 ...: do_mac_stuff()
174
175 In [12]: @depend(platform_specific, 'nt')
176 ...: def wintask():
177 ...: do_windows_stuff()
178
179 In this case, any time you apply ``mactask``, it will only run on an OSX machine.
180 ``@depend`` is just like ``apply``, in that it has a ``@depend(f,*args,**kwargs)``
181 signature.
182
183 dependents
184 **********
185
186 You don't have to use the decorators on your tasks. If, for instance, you want
187 to run tasks with a single function but varying dependencies, you can directly construct
188 the :class:`dependent` object that the decorators use:
189
190 .. sourcecode:: ipython
191
192 In [13]: def mytask(*args):
193 ...: dostuff()
194
195 In [14]: mactask = dependent(mytask, platform_specific, 'darwin')
196 # this is the same as decorating the declaration of mytask with @depend
197 # but you can do it again:
198
199 In [15]: wintask = dependent(mytask, platform_specific, 'nt')
200
201 # in general:
202 In [16]: t = dependent(f, g, *dargs, **dkwargs)
203
204 # is equivalent to:
205 In [17]: @depend(g, *dargs, **dkwargs)
206 ...: def t(a,b,c):
207 ...: # contents of f
208
209 Graph Dependencies
210 ------------------
211
212 Sometimes you want to restrict the time and/or location to run a given task as a function
213 of the time and/or location of other tasks. This is implemented via a subclass of
214 :class:`set`, called a :class:`Dependency`. A Dependency is just a set of `msg_ids`
215 corresponding to tasks, and a few attributes to guide how to decide when the Dependency
216 has been met.
217
218 The switches we provide for interpreting whether a given dependency set has been met (a short construction example follows this list):
219
220 any|all
221 Whether the dependency is considered met if *any* of the dependencies are done, or
222 only after *all* of them have finished. This is set by a Dependency's :attr:`all`
223 boolean attribute, which defaults to ``True``.
224
225 success_only
226 Whether to consider only tasks that did not raise an error as being fulfilled.
227 Sometimes you want to run a task after another, but only if that task succeeded. In
228 this case, ``success_only`` should be ``True``. However, sometimes you may not care
229 whether the task succeeds, and always want the second task to run, in which case
230 you should use `success_only=False`. The default behavior is to only use successes.
231
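As noted above, here is a sketch of constructing a Dependency explicitly and passing it
to :meth:`apply` (``msg_id1`` and ``msg_id2`` stand in for the msg_ids of hypothetical
previously submitted tasks):

.. sourcecode:: ipython

    In [11]: from IPython.zmq.parallel.dependency import Dependency

    # met as soon as either task has succeeded:
    In [12]: dep = Dependency([msg_id1, msg_id2], all=False, success_only=True)

    In [13]: ar = client.apply(f, after=dep)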
232 There are other switches for interpretation that are made at the *task* level. These are
233 specified via keyword arguments to the client's :meth:`apply` method.
234
235 after,follow
236 You may want to run a task *after* a given set of dependencies have been run and/or
237 run it *where* another set of dependencies are met. To support this, every task has an
238 `after` dependency to restrict time, and a `follow` dependency to restrict
239 destination.
240
241 timeout
242 You may also want to set a time-limit for how long the scheduler should wait for a
243 task's dependencies to be met. This is done via a `timeout`, which defaults to 0,
244 indicating that the task should never time out. If the timeout is reached, and the
245 scheduler still hasn't been able to assign the task to an engine, the task will fail
246 with a :class:`DependencyTimeout`.
247
248 .. note::
249
250 Dependencies only work within the task scheduler. You cannot instruct a load-balanced
251 task to run after a job submitted via the MUX interface.
252
253 The simplest form of Dependencies is with `all=True,success_only=True`. In these cases,
254 you can skip using Dependency objects, and just pass msg_ids or AsyncResult objects as the
255 `follow` and `after` keywords to :meth:`client.apply`:
256
257 .. sourcecode:: ipython
258
259 In [14]: client.block=False
260
261 In [15]: ar = client.apply(f, args, kwargs, targets=None)
262
263 In [16]: ar2 = client.apply(f2, targets=None)
264
265 In [17]: ar3 = client.apply(f3, after=[ar,ar2])
266
267 In [18]: ar4 = client.apply(f3, follow=[ar], timeout=2.5)
268
269
270 .. seealso::
271
272 Some parallel workloads can be described as a `Directed Acyclic Graph
273 <http://en.wikipedia.org/wiki/Directed_acyclic_graph>`_, or DAG. See :ref:`DAG
274 Dependencies <dag_dependencies>` for an example demonstrating how to map a NetworkX DAG
275 onto task dependencies.
276
277
278
279 Impossible Dependencies
280 ***********************
281
282 The schedulers perform some analysis on graph dependencies to determine whether they
283 can ever be met. If the scheduler discovers that a dependency cannot be met, then the
284 task will fail with an :class:`ImpossibleDependency` error. This way, if the scheduler
285 realizes that a task can never be run, the task won't sit in the queue indefinitely,
286 clogging the pipeline.
287
288 The basic cases that are checked (a sketch of the failure case follows the list):
289
290 * depending on nonexistent messages
291 * `follow` dependencies were run on more than one machine and `all=True`
292 * any dependencies failed and `all=True,success_only=True`
293 * all dependencies failed and `all=False,success_only=True`
294
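A sketch of the failure case (``f_fail`` stands for a hypothetical task function that
raises an exception when run):

.. sourcecode:: ipython

    In [19]: ar = client.apply(f_fail, targets=None)

    # with the defaults `all=True, success_only=True`, this dependency
    # becomes impossible as soon as `ar` fails:
    In [20]: ar2 = client.apply(f, after=[ar])

    In [21]: ar2.get()  # raises an ImpossibleDependency error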
295 .. warning::
296
297 This analysis has not been proven to be rigorous, so it is possible for tasks
298 to become impossible to run in obscure situations; a timeout may be a good safeguard.
299
300 Schedulers
301 ==========
302
303 There are a variety of valid ways to determine where jobs should be assigned in a
304 load-balancing situation. In IPython, we support several standard schemes, and
305 even make it easy to define your own. The scheme can be selected via the ``--scheme``
306 argument to :command:`ipcontrollerz`, or in the :attr:`HubFactory.scheme` attribute
307 of a controller config object.
308
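For example, selecting the two-bin scheme from the command line might look like this (a
sketch; the exact flag syntax is an assumption based on the ``--scheme`` argument named
above)::

    $ ipcontrollerz --scheme=twobin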
309 The built-in routing schemes:
310
311 lru: Least Recently Used
312
313 Always assign work to the least-recently-used engine. A close relative of
314 round-robin, it will be fair with respect to the number of tasks, agnostic
315 with respect to runtime of each task.
316
317 plainrandom: Plain Random
318 Randomly picks an engine on which to run.
319
320 twobin: Two-Bin Random
321
322 **Depends on numpy**
323
324 Pick two engines at random, and use the LRU of the two. This is known to be better
325 than plain random in many cases, but requires a small amount of computation.
326
327 leastload: Least Load
328
329 **This is the default scheme**
330
331 Always assign tasks to the engine with the fewest outstanding tasks (LRU breaks ties).
332
333 weighted: Weighted Two-Bin Random
334
335 **Depends on numpy**
336
337 Pick two engines at random using the number of outstanding tasks as inverse weights,
338 and use the one with the lower load.
339
340
341 Pure ZMQ Scheduler
342 ------------------
343
344 For maximum throughput, the 'pure' scheme is not Python at all, but a C-level
345 :class:`MonitoredQueue` from PyZMQ, which uses a ZeroMQ ``XREQ`` socket to perform all
346 load-balancing. This scheduler does not support any of the advanced features of the Python
347 :class:`.Scheduler`.
348
349 Disabled features when using the ZMQ Scheduler:
350
351 * Engine unregistration
352 Task farming will be disabled if an engine unregisters.
353 Further, if an engine is unregistered during computation, the scheduler may not recover.
354 * Dependencies
355 Since there is no Python logic inside the Scheduler, routing decisions cannot be made
356 based on message content.
357 * Early destination notification
358 The Python schedulers know which engine gets which task, and notify the Hub. This
359 allows graceful handling of Engines coming and going. There is no way to know
360 where ZeroMQ messages have gone, so there is no way to know what tasks are on which
361 engine until they *finish*. This makes recovery from engine shutdown very difficult.
362
363
364 .. note::
365
366 TODO: performance comparisons
367
368
105 369 More details
106 370 ============
107 371
@@ -125,8 +389,7 b' The following is an overview of how to use these classes together:'
125 389 tasks, or use the :meth:`AsyncResult.get` method of the results to wait
126 390 for and then receive the results.
127 391
128 We are in the process of developing more detailed information about the task
129 interface. For now, the docstrings of the :meth:`Client.apply`,
130 and :func:`depend` methods should be consulted.
131 392
393 .. seealso::
132 394
395 A demo of :ref:`DAG Dependencies <dag_dependencies>` with NetworkX and IPython.
@@ -123,7 +123,7 b' opening a Windows Command Prompt and typing ``ipython``. This will'
123 123 start IPython's interactive shell and you should see something like the
124 124 following screenshot:
125 125
126 .. image:: ipython_shell.*
126 .. image:: ../parallel/ipython_shell.*
127 127
128 128 Starting an IPython cluster
129 129 ===========================
@@ -171,7 +171,7 b' You should see a number of messages printed to the screen, ending with'
171 171 "IPython cluster: started". The result should look something like the following
172 172 screenshot:
173 173
174 .. image:: ipclusterz_start.*
174 .. image:: ../parallel/ipcluster_start.*
175 175
176 176 At this point, the controller and two engines are running on your local host.
177 177 This configuration is useful for testing and for situations where you want to
@@ -213,7 +213,7 b' The output of this command is shown in the screenshot below. Notice how'
213 213 :command:`ipclusterz` prints out the location of the newly created cluster
214 214 directory.
215 215
216 .. image:: ipclusterz_create.*
216 .. image:: ../parallel/ipcluster_create.*
217 217
218 218 Configuring a cluster profile
219 219 -----------------------------
@@ -282,7 +282,7 b' must be run again to regenerate the XML job description files. The'
282 282 following screenshot shows what the HPC Job Manager interface looks like
283 283 with a running IPython cluster.
284 284
285 .. image:: hpc_job_manager.*
285 .. image:: ../parallel/hpc_job_manager.*
286 286
287 287 Performing a simple interactive parallel computation
288 288 ====================================================
@@ -333,5 +333,5 b" The :meth:`map` method has the same signature as Python's builtin :func:`map`"
333 333 function, but runs the calculation in parallel. More involved examples of using
334 334 :class:`MultiEngineClient` are provided in the examples that follow.
335 335
336 .. image:: mec_simple.*
336 .. image:: ../parallel/mec_simple.*
337 337