.. _dag_dependencies:

================
DAG Dependencies
================

Often, a parallel workflow is described in terms of a `Directed Acyclic Graph
<http://en.wikipedia.org/wiki/Directed_acyclic_graph>`_ or DAG. A popular library
for working with graphs is NetworkX_. Here, we will walk through a demo mapping
a NetworkX DAG to task dependencies.

The full script that runs this demo can be found in
:file:`docs/examples/newparallel/dagdeps.py`.

Why are DAGs good for task dependencies?
----------------------------------------

The 'G' in DAG is 'Graph'. A graph is a collection of **nodes** and **edges** that connect
the nodes. For our purposes, each node is a task, and each edge is a
dependency. The 'D' stands for 'Directed': each edge has a direction associated
with it, so the edge (a,b) means that b depends on a, whereas the edge (b,a)
means that a depends on b. The 'A' is 'Acyclic': there must not be any closed
loops in the graph. This is important for dependencies, because if a loop were
closed, a task could ultimately depend on itself and would never be able to run.
If your workflow can be described as a DAG, then it is impossible for your
dependencies to cause a deadlock.
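
This guarantee is easy to check mechanically. As a minimal sketch (standard
library only, independent of NetworkX and of this demo's helpers), Kahn's
algorithm repeatedly peels off tasks whose prerequisites are all satisfied; if
no task is ever ready while tasks remain, the dependencies contain a cycle:

```python
def is_acyclic(deps):
    """Return True if the dependency graph has no cycles.

    `deps` maps each task to the set of tasks it depends on.
    """
    remaining = {task: set(pre) for task, pre in deps.items()}
    while remaining:
        # tasks whose prerequisites have all been peeled off already
        ready = [t for t, pre in remaining.items() if not pre]
        if not ready:
            # every remaining task waits on another remaining task: a cycle
            return False
        for t in ready:
            del remaining[t]
        for pre in remaining.values():
            pre.difference_update(ready)
    return True

# the 5-node DAG from the next section: 1,2 depend on 0; 3 on 1,2; 4 on 1
dag = {0: set(), 1: {0}, 2: {0}, 3: {1, 2}, 4: {1}}
loop = {'a': {'b'}, 'b': {'a'}}
print(is_acyclic(dag), is_acyclic(loop))  # True False
```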

A Sample DAG
------------

Here, we have a very simple 5-node DAG:

.. figure:: simpledag.*

With NetworkX, an arrow is just a fattened bit on the edge. Here, we can see that task 0
depends on nothing, and can run immediately. 1 and 2 depend on 0; 3 depends on
1 and 2; and 4 depends only on 1.

A possible sequence of events for this workflow:

0. Task 0 can run right away
1. 0 finishes, so 1 and 2 can start
2. 1 finishes; 3 is still waiting on 2, but 4 can start right away
3. 2 finishes, and 3 can finally start


Further, taking failures into account, assuming all dependencies are run with the default
`success_only=True`, the following would occur if each node were to fail:

0. if 0 fails: all other tasks fail as Impossible
1. if 1 fails: 2 can still succeed, but 3 and 4 are unreachable
2. if 2 fails: 3 becomes unreachable, but 4 is unaffected
3. 3 and 4 are terminal, and their failure can have no effect on other nodes
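
With `success_only=True`, a node's failure makes every task that transitively
depends on it Impossible. That set is exactly the node's descendants in the
graph; a small standard-library sketch over the edge list of the sample DAG:

```python
def unreachable_on_failure(edges, failed):
    """Return the set of tasks that can no longer run if `failed` fails.

    `edges` is a list of (a, b) pairs meaning b depends on a.
    """
    children = {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)
    # depth-first walk collecting everything downstream of the failed node
    unreachable, stack = set(), [failed]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in unreachable:
                unreachable.add(child)
                stack.append(child)
    return unreachable

edges = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 4)]
print(unreachable_on_failure(edges, 0))  # {1, 2, 3, 4}
print(unreachable_on_failure(edges, 2))  # {3}
```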

The code to generate the simple DAG:

.. sourcecode:: python

    import networkx as nx

    G = nx.DiGraph()

    # add 5 nodes, labeled 0-4:
    G.add_nodes_from(range(5))
    # 1,2 depend on 0:
    G.add_edge(0,1)
    G.add_edge(0,2)
    # 3 depends on 1,2
    G.add_edge(1,3)
    G.add_edge(2,3)
    # 4 depends on 1
    G.add_edge(1,4)

    # now draw the graph:
    pos = { 0 : (0,0), 1 : (1,1), 2 : (-1,1),
            3 : (0,2), 4 : (2,2)}
    nx.draw(G, pos, edge_color='r')


For demonstration purposes, we have a function that generates a random DAG with a given
number of nodes and edges.

.. literalinclude:: ../../examples/newparallel/dagdeps.py
    :language: python
    :lines: 20-36
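
The implementation lives in the script above. As a rough illustration of the
idea (a hypothetical sketch, not the file's actual code): pick a random
ordering of the nodes and only ever orient edges from lower to higher rank,
which makes cycles impossible by construction:

```python
import random

def random_dag_edges(nodes, n_edges):
    """Sketch: generate `n_edges` random edges over `nodes` labels, acyclic."""
    rank = list(range(nodes))
    random.shuffle(rank)          # a random total order on the nodes
    position = {n: i for i, n in enumerate(rank)}
    edges = set()
    while len(edges) < n_edges:
        a, b = random.sample(range(nodes), 2)
        if position[a] > position[b]:
            a, b = b, a           # orient along the order: no cycle can form
        edges.add((a, b))
    return edges

E = random_dag_edges(32, 128)
print(len(E))  # 128
```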

So first, we start with a graph of 32 nodes, with 128 edges:

.. sourcecode:: ipython

    In [2]: G = random_dag(32,128)

Now, we need to build our dict of jobs corresponding to the nodes on the graph:

.. sourcecode:: ipython

    In [3]: jobs = {}

    # in reality, each job would presumably be different
    # randomwait is just a function that sleeps for a random interval
    In [4]: for node in G:
       ...:     jobs[node] = randomwait
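
``randomwait`` itself is not shown in this walkthrough; a plausible stand-in
(hypothetical, not the demo's actual function) just sleeps briefly and reports
how long it slept:

```python
import random
import time

def randomwait():
    """Hypothetical stand-in task: sleep a random (short) interval."""
    interval = random.random() * 0.1  # up to 100 ms
    time.sleep(interval)
    return interval
```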

Once we have a dict of jobs matching the nodes on the graph, we can start submitting jobs,
and linking up the dependencies. Since we don't know a job's msg_id until it is submitted,
which is necessary for building dependencies, it is critical that we don't submit any job
before the jobs it may depend on. Fortunately, NetworkX provides a
:func:`topological_sort` function which ensures exactly this. It returns an iterable that
guarantees that when you arrive at a node, you have already visited all the nodes
on which it depends:

.. sourcecode:: ipython

    In [5]: c = client.Client()

    In [6]: results = {}

    In [7]: for node in nx.topological_sort(G):
       ...:     # get list of AsyncResult objects from nodes
       ...:     # leading into this one as dependencies
       ...:     deps = [ results[n] for n in G.predecessors(node) ]
       ...:     # submit and store AsyncResult object
       ...:     results[node] = c.apply(jobs[node], after=deps, block=False)
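
The ordering guarantee that :func:`topological_sort` provides can be sketched
with Kahn's algorithm (standard library only): a node is only yielded once all
of its predecessors have been yielded, which is exactly the property the
submission loop needs:

```python
from collections import deque

def topological_order(edges, nodes):
    """Yield nodes so that every node appears after all of its predecessors."""
    preds = {n: set() for n in nodes}
    succs = {n: set() for n in nodes}
    for a, b in edges:
        preds[b].add(a)
        succs[a].add(b)
    # start from the nodes with no dependencies at all
    ready = deque(n for n in nodes if not preds[n])
    while ready:
        n = ready.popleft()
        yield n
        for m in succs[n]:
            preds[m].discard(n)
            if not preds[m]:
                ready.append(m)

edges = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 4)]
order = list(topological_order(edges, range(5)))
print(order[0])  # 0 is the only node with no dependencies
```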

Now that we have submitted all the jobs, we can wait for the results:

.. sourcecode:: ipython

    In [8]: [ r.get() for r in results.values() ]

Now at least we know that all the jobs ran and did not fail (``r.get()`` would have
raised an error if a task failed). But we don't know that the ordering was properly
respected. For this, we can use the :attr:`metadata` attribute of each AsyncResult.

These objects store a variety of metadata about each task, including various timestamps.
We can validate that the dependencies were respected by checking that each task was
started after all of its predecessors were completed:

.. literalinclude:: ../../examples/newparallel/dagdeps.py
    :language: python
    :lines: 64-70
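
The check in the script boils down to comparing timestamps along each edge. A
simplified stand-in (hypothetical names, plain numbers instead of the
``datetime`` objects found in :attr:`metadata`):

```python
def ordering_respected(edges, started, completed):
    """True if every task started no earlier than each predecessor completed."""
    for a, b in edges:  # edge (a, b): b depends on a
        if started[b] < completed[a]:
            return False
    return True

edges = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 4)]
started   = {0: 0.0, 1: 1.1, 2: 1.2, 3: 2.5, 4: 1.6}
completed = {0: 1.0, 1: 1.5, 2: 2.4, 3: 3.0, 4: 2.0}
print(ordering_respected(edges, started, completed))  # True
```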

We can also validate the graph visually. By drawing the graph with each node's x-position
as its start time, all arrows must point to the right if the order was respected. For
spreading, the y-position will be the runtime, so long-running tasks will be at the top,
and quick tasks will be at the bottom.

.. sourcecode:: ipython

    In [10]: from matplotlib.dates import date2num

    In [11]: from matplotlib.cm import gist_rainbow

    In [12]: pos = {}; colors = {}

    In [13]: for node in G:
       ....:     md = results[node].metadata
       ....:     start = date2num(md.started)
       ....:     runtime = date2num(md.completed) - start
       ....:     pos[node] = (start, runtime)
       ....:     colors[node] = md.engine_id

    In [14]: nx.draw(G, pos, nodelist=colors.keys(), node_color=colors.values(),
       ....:         cmap=gist_rainbow)

.. figure:: dagdeps.*

    Time started on x, runtime on y, and color-coded by engine-id (in this case there
    were four engines).


.. _NetworkX: http://networkx.lanl.gov/
1 | """A semi-synchronous Client for the ZMQ controller""" |
|
1 | """A semi-synchronous Client for the ZMQ controller""" | |
2 | #----------------------------------------------------------------------------- |
|
2 | #----------------------------------------------------------------------------- | |
3 | # Copyright (C) 2010 The IPython Development Team |
|
3 | # Copyright (C) 2010 The IPython Development Team | |
4 | # |
|
4 | # | |
5 | # Distributed under the terms of the BSD License. The full license is in |
|
5 | # Distributed under the terms of the BSD License. The full license is in | |
6 | # the file COPYING, distributed as part of this software. |
|
6 | # the file COPYING, distributed as part of this software. | |
7 | #----------------------------------------------------------------------------- |
|
7 | #----------------------------------------------------------------------------- | |
8 |
|
8 | |||
9 | #----------------------------------------------------------------------------- |
|
9 | #----------------------------------------------------------------------------- | |
10 | # Imports |
|
10 | # Imports | |
11 | #----------------------------------------------------------------------------- |
|
11 | #----------------------------------------------------------------------------- | |
12 |
|
12 | |||
13 | import os |
|
13 | import os | |
14 | import time |
|
14 | import time | |
15 | from getpass import getpass |
|
15 | from getpass import getpass | |
16 | from pprint import pprint |
|
16 | from pprint import pprint | |
17 | from datetime import datetime |
|
17 | from datetime import datetime | |
18 | import warnings |
|
18 | import warnings | |
19 | import json |
|
19 | import json | |
20 | pjoin = os.path.join |
|
20 | pjoin = os.path.join | |
21 |
|
21 | |||
22 | import zmq |
|
22 | import zmq | |
23 | from zmq.eventloop import ioloop, zmqstream |
|
23 | from zmq.eventloop import ioloop, zmqstream | |
24 |
|
24 | |||
25 | from IPython.utils.path import get_ipython_dir |
|
25 | from IPython.utils.path import get_ipython_dir | |
26 | from IPython.external.decorator import decorator |
|
26 | from IPython.external.decorator import decorator | |
27 | from IPython.external.ssh import tunnel |
|
27 | from IPython.external.ssh import tunnel | |
28 |
|
28 | |||
29 | import streamsession as ss |
|
29 | import streamsession as ss | |
30 | from clusterdir import ClusterDir, ClusterDirError |
|
30 | from clusterdir import ClusterDir, ClusterDirError | |
31 | # from remotenamespace import RemoteNamespace |
|
31 | # from remotenamespace import RemoteNamespace | |
32 | from view import DirectView, LoadBalancedView |
|
32 | from view import DirectView, LoadBalancedView | |
33 | from dependency import Dependency, depend, require, dependent |
|
33 | from dependency import Dependency, depend, require, dependent | |
34 | import error |
|
34 | import error | |
35 | import map as Map |
|
35 | import map as Map | |
36 | from asyncresult import AsyncResult, AsyncMapResult |
|
36 | from asyncresult import AsyncResult, AsyncMapResult | |
37 | from remotefunction import remote,parallel,ParallelFunction,RemoteFunction |
|
37 | from remotefunction import remote,parallel,ParallelFunction,RemoteFunction | |
38 | from util import ReverseDict, disambiguate_url, validate_url |
|
38 | from util import ReverseDict, disambiguate_url, validate_url | |
39 |
|
39 | |||
40 | #-------------------------------------------------------------------------- |
|
40 | #-------------------------------------------------------------------------- | |
41 | # helpers for implementing old MEC API via client.apply |
|
41 | # helpers for implementing old MEC API via client.apply | |
42 | #-------------------------------------------------------------------------- |
|
42 | #-------------------------------------------------------------------------- | |
43 |
|
43 | |||
44 | def _push(ns): |
|
44 | def _push(ns): | |
45 | """helper method for implementing `client.push` via `client.apply`""" |
|
45 | """helper method for implementing `client.push` via `client.apply`""" | |
46 | globals().update(ns) |
|
46 | globals().update(ns) | |
47 |
|
47 | |||
48 | def _pull(keys): |
|
48 | def _pull(keys): | |
49 | """helper method for implementing `client.pull` via `client.apply`""" |
|
49 | """helper method for implementing `client.pull` via `client.apply`""" | |
50 | g = globals() |
|
50 | g = globals() | |
51 | if isinstance(keys, (list,tuple, set)): |
|
51 | if isinstance(keys, (list,tuple, set)): | |
52 | for key in keys: |
|
52 | for key in keys: | |
53 | if not g.has_key(key): |
|
53 | if not g.has_key(key): | |
54 | raise NameError("name '%s' is not defined"%key) |
|
54 | raise NameError("name '%s' is not defined"%key) | |
55 | return map(g.get, keys) |
|
55 | return map(g.get, keys) | |
56 | else: |
|
56 | else: | |
57 | if not g.has_key(keys): |
|
57 | if not g.has_key(keys): | |
58 | raise NameError("name '%s' is not defined"%keys) |
|
58 | raise NameError("name '%s' is not defined"%keys) | |
59 | return g.get(keys) |
|
59 | return g.get(keys) | |
60 |
|
60 | |||
61 | def _clear(): |
|
61 | def _clear(): | |
62 | """helper method for implementing `client.clear` via `client.apply`""" |
|
62 | """helper method for implementing `client.clear` via `client.apply`""" | |
63 | globals().clear() |
|
63 | globals().clear() | |
64 |
|
64 | |||
65 | def _execute(code): |
|
65 | def _execute(code): | |
66 | """helper method for implementing `client.execute` via `client.apply`""" |
|
66 | """helper method for implementing `client.execute` via `client.apply`""" | |
67 | exec code in globals() |
|
67 | exec code in globals() | |
68 |
|
68 | |||
69 |
|
69 | |||
70 | #-------------------------------------------------------------------------- |
|
70 | #-------------------------------------------------------------------------- | |
71 | # Decorators for Client methods |
|
71 | # Decorators for Client methods | |
72 | #-------------------------------------------------------------------------- |
|
72 | #-------------------------------------------------------------------------- | |
73 |
|
73 | |||
74 | @decorator |
|
74 | @decorator | |
75 | def spinfirst(f, self, *args, **kwargs): |
|
75 | def spinfirst(f, self, *args, **kwargs): | |
76 | """Call spin() to sync state prior to calling the method.""" |
|
76 | """Call spin() to sync state prior to calling the method.""" | |
77 | self.spin() |
|
77 | self.spin() | |
78 | return f(self, *args, **kwargs) |
|
78 | return f(self, *args, **kwargs) | |
79 |
|
79 | |||
80 | @decorator |
|
80 | @decorator | |
81 | def defaultblock(f, self, *args, **kwargs): |
|
81 | def defaultblock(f, self, *args, **kwargs): | |
82 | """Default to self.block; preserve self.block.""" |
|
82 | """Default to self.block; preserve self.block.""" | |
83 | block = kwargs.get('block',None) |
|
83 | block = kwargs.get('block',None) | |
84 | block = self.block if block is None else block |
|
84 | block = self.block if block is None else block | |
85 | saveblock = self.block |
|
85 | saveblock = self.block | |
86 | self.block = block |
|
86 | self.block = block | |
87 | try: |
|
87 | try: | |
88 | ret = f(self, *args, **kwargs) |
|
88 | ret = f(self, *args, **kwargs) | |
89 | finally: |
|
89 | finally: | |
90 | self.block = saveblock |
|
90 | self.block = saveblock | |
91 | return ret |
|
91 | return ret | |
92 |
|
92 | |||
93 |
|
93 | |||
94 | #-------------------------------------------------------------------------- |
|
94 | #-------------------------------------------------------------------------- | |
95 | # Classes |
|
95 | # Classes | |
96 | #-------------------------------------------------------------------------- |
|
96 | #-------------------------------------------------------------------------- | |
97 |
|
97 | |||
98 | class Metadata(dict): |
|
98 | class Metadata(dict): | |
99 | """Subclass of dict for initializing metadata values. |
|
99 | """Subclass of dict for initializing metadata values. | |
100 |
|
100 | |||
101 | Attribute access works on keys. |
|
101 | Attribute access works on keys. | |
102 |
|
102 | |||
103 | These objects have a strict set of keys - errors will raise if you try |
|
103 | These objects have a strict set of keys - errors will raise if you try | |
104 | to add new keys. |
|
104 | to add new keys. | |
105 | """ |
|
105 | """ | |
106 | def __init__(self, *args, **kwargs): |
|
106 | def __init__(self, *args, **kwargs): | |
107 | dict.__init__(self) |
|
107 | dict.__init__(self) | |
108 | md = {'msg_id' : None, |
|
108 | md = {'msg_id' : None, | |
109 | 'submitted' : None, |
|
109 | 'submitted' : None, | |
110 | 'started' : None, |
|
110 | 'started' : None, | |
111 | 'completed' : None, |
|
111 | 'completed' : None, | |
112 | 'received' : None, |
|
112 | 'received' : None, | |
113 | 'engine_uuid' : None, |
|
113 | 'engine_uuid' : None, | |
114 | 'engine_id' : None, |
|
114 | 'engine_id' : None, | |
115 | 'follow' : None, |
|
115 | 'follow' : None, | |
116 | 'after' : None, |
|
116 | 'after' : None, | |
117 | 'status' : None, |
|
117 | 'status' : None, | |
118 |
|
118 | |||
119 | 'pyin' : None, |
|
119 | 'pyin' : None, | |
120 | 'pyout' : None, |
|
120 | 'pyout' : None, | |
121 | 'pyerr' : None, |
|
121 | 'pyerr' : None, | |
122 | 'stdout' : '', |
|
122 | 'stdout' : '', | |
123 | 'stderr' : '', |
|
123 | 'stderr' : '', | |
124 | } |
|
124 | } | |
125 | self.update(md) |
|
125 | self.update(md) | |
126 | self.update(dict(*args, **kwargs)) |
|
126 | self.update(dict(*args, **kwargs)) | |
127 |
|
127 | |||
128 | def __getattr__(self, key): |
|
128 | def __getattr__(self, key): | |
129 | """getattr aliased to getitem""" |
|
129 | """getattr aliased to getitem""" | |
130 | if key in self.iterkeys(): |
|
130 | if key in self.iterkeys(): | |
131 | return self[key] |
|
131 | return self[key] | |
132 | else: |
|
132 | else: | |
133 | raise AttributeError(key) |
|
133 | raise AttributeError(key) | |
134 |
|
134 | |||
135 | def __setattr__(self, key, value): |
|
135 | def __setattr__(self, key, value): | |
136 | """setattr aliased to setitem, with strict""" |
|
136 | """setattr aliased to setitem, with strict""" | |
137 | if key in self.iterkeys(): |
|
137 | if key in self.iterkeys(): | |
138 | self[key] = value |
|
138 | self[key] = value | |
139 | else: |
|
139 | else: | |
140 | raise AttributeError(key) |
|
140 | raise AttributeError(key) | |
141 |
|
141 | |||
142 | def __setitem__(self, key, value): |
|
142 | def __setitem__(self, key, value): | |
143 | """strict static key enforcement""" |
|
143 | """strict static key enforcement""" | |
144 | if key in self.iterkeys(): |
|
144 | if key in self.iterkeys(): | |
145 | dict.__setitem__(self, key, value) |
|
145 | dict.__setitem__(self, key, value) | |
146 | else: |
|
146 | else: | |
147 | raise KeyError(key) |
|
147 | raise KeyError(key) | |
148 |
|
148 | |||
149 |
|
149 | |||
150 | class Client(object): |
|
150 | class Client(object): | |
151 | """A semi-synchronous client to the IPython ZMQ controller |
|
151 | """A semi-synchronous client to the IPython ZMQ controller | |
152 |
|
152 | |||
153 | Parameters |
|
153 | Parameters | |
154 | ---------- |
|
154 | ---------- | |
155 |
|
155 | |||
156 | url_or_file : bytes; zmq url or path to ipcontroller-client.json |
|
156 | url_or_file : bytes; zmq url or path to ipcontroller-client.json | |
157 | Connection information for the Hub's registration. If a json connector |
|
157 | Connection information for the Hub's registration. If a json connector | |
158 | file is given, then likely no further configuration is necessary. |
|
158 | file is given, then likely no further configuration is necessary. | |
159 | [Default: use profile] |
|
159 | [Default: use profile] | |
160 | profile : bytes |
|
160 | profile : bytes | |
161 | The name of the Cluster profile to be used to find connector information. |
|
161 | The name of the Cluster profile to be used to find connector information. | |
162 | [Default: 'default'] |
|
162 | [Default: 'default'] | |
163 | context : zmq.Context |
|
163 | context : zmq.Context | |
164 | Pass an existing zmq.Context instance, otherwise the client will create its own. |
|
164 | Pass an existing zmq.Context instance, otherwise the client will create its own. | |
165 | username : bytes |
|
165 | username : bytes | |
166 | set username to be passed to the Session object |
|
166 | set username to be passed to the Session object | |
167 | debug : bool |
|
167 | debug : bool | |
168 | flag for lots of message printing for debug purposes |
|
168 | flag for lots of message printing for debug purposes | |
169 |
|
169 | |||
170 | #-------------- ssh related args ---------------- |
|
170 | #-------------- ssh related args ---------------- | |
171 | # These are args for configuring the ssh tunnel to be used |
|
171 | # These are args for configuring the ssh tunnel to be used | |
172 | # credentials are used to forward connections over ssh to the Controller |
|
172 | # credentials are used to forward connections over ssh to the Controller | |
173 | # Note that the ip given in `addr` needs to be relative to sshserver |
|
173 | # Note that the ip given in `addr` needs to be relative to sshserver | |
174 | # The most basic case is to leave addr as pointing to localhost (127.0.0.1), |
|
174 | # The most basic case is to leave addr as pointing to localhost (127.0.0.1), | |
175 | # and set sshserver as the same machine the Controller is on. However, |
|
175 | # and set sshserver as the same machine the Controller is on. However, | |
176 | # the only requirement is that sshserver is able to see the Controller |
|
176 | # the only requirement is that sshserver is able to see the Controller | |
177 | # (i.e. is within the same trusted network). |
|
177 | # (i.e. is within the same trusted network). | |
178 |
|
178 | |||
179 | sshserver : str |
|
179 | sshserver : str | |
180 | A string of the form passed to ssh, i.e. 'server.tld' or 'user@server.tld:port' |
|
180 | A string of the form passed to ssh, i.e. 'server.tld' or 'user@server.tld:port' | |
181 | If keyfile or password is specified, and this is not, it will default to |
|
181 | If keyfile or password is specified, and this is not, it will default to | |
182 | the ip given in addr. |
|
182 | the ip given in addr. | |
183 | sshkey : str; path to public ssh key file |
|
183 | sshkey : str; path to public ssh key file | |
184 | This specifies a key to be used in ssh login, default None. |
|
184 | This specifies a key to be used in ssh login, default None. | |
185 | Regular default ssh keys will be used without specifying this argument. |
|
185 | Regular default ssh keys will be used without specifying this argument. | |
186 | password : str |
|
186 | password : str | |
187 | Your ssh password to sshserver. Note that if this is left None, |
|
187 | Your ssh password to sshserver. Note that if this is left None, | |
188 | you will be prompted for it if passwordless key based login is unavailable. |
|
188 | you will be prompted for it if passwordless key based login is unavailable. | |
189 | paramiko : bool |
|
189 | paramiko : bool | |
190 | flag for whether to use paramiko instead of shell ssh for tunneling. |
|
190 | flag for whether to use paramiko instead of shell ssh for tunneling. | |
191 | [default: True on win32, False else] |
|
191 | [default: True on win32, False else] | |
192 |
|
192 | |||
193 | #------- exec authentication args ------- |
|
193 | #------- exec authentication args ------- | |
194 | # If even localhost is untrusted, you can have some protection against |
|
194 | # If even localhost is untrusted, you can have some protection against | |
195 | # unauthorized execution by using a key. Messages are still sent |
|
195 | # unauthorized execution by using a key. Messages are still sent | |
196 | # as cleartext, so if someone can snoop your loopback traffic this will |
|
196 | # as cleartext, so if someone can snoop your loopback traffic this will | |
197 | # not help against malicious attacks. |
|
197 | # not help against malicious attacks. | |
198 |
|
198 | |||
199 | exec_key : str |
|
199 | exec_key : str | |
200 | an authentication key or file containing a key |
|
200 | an authentication key or file containing a key | |
201 | default: None |
|
201 | default: None | |
202 |
|
202 | |||
203 |
|
203 | |||
204 | Attributes |
|
204 | Attributes | |
205 | ---------- |
|
205 | ---------- | |
206 | ids : set of int engine IDs |
|
206 | ids : set of int engine IDs | |
207 | requesting the ids attribute always synchronizes |
|
207 | requesting the ids attribute always synchronizes | |
208 | the registration state. To request ids without synchronization, |
|
208 | the registration state. To request ids without synchronization, | |
209 | use semi-private _ids attributes. |
|
209 | use semi-private _ids attributes. | |
210 |
|
210 | |||
211 | history : list of msg_ids |
|
211 | history : list of msg_ids | |
212 | a list of msg_ids, keeping track of all the execution |
|
212 | a list of msg_ids, keeping track of all the execution | |
213 | messages you have submitted in order. |
|
213 | messages you have submitted in order. | |
214 |
|
214 | |||
215 | outstanding : set of msg_ids |
|
215 | outstanding : set of msg_ids | |
216 | a set of msg_ids that have been submitted, but whose |
|
216 | a set of msg_ids that have been submitted, but whose | |
217 | results have not yet been received. |
|
217 | results have not yet been received. | |
218 |
|
218 | |||
219 | results : dict |
|
219 | results : dict | |
220 | a dict of all our results, keyed by msg_id |
|
220 | a dict of all our results, keyed by msg_id | |
221 |
|
221 | |||
222 | block : bool |
|
222 | block : bool | |
223 | determines default behavior when block not specified |
|
223 | determines default behavior when block not specified | |
224 | in execution methods |
|
224 | in execution methods | |
225 |
|
225 | |||
226 | Methods |
|
226 | Methods | |
227 | ------- |
|
227 | ------- | |
228 | spin : flushes incoming results and registration state changes |
|
228 | spin : flushes incoming results and registration state changes | |
229 | control methods spin, and requesting `ids` also ensures up to date |
|
229 | control methods spin, and requesting `ids` also ensures up to date | |
230 |
|
230 | |||
231 | barrier : wait on one or more msg_ids |
|
231 | barrier : wait on one or more msg_ids | |
232 |
|
232 | |||
233 | execution methods: apply/apply_bound/apply_to/apply_bound |
|
233 | execution methods: apply/apply_bound/apply_to/apply_bound | |
234 | legacy: execute, run |
|
234 | legacy: execute, run | |
235 |
|
235 | |||
236 | query methods: queue_status, get_result, purge |
|
236 | query methods: queue_status, get_result, purge | |
237 |
|
237 | |||
238 | control methods: abort, kill |
|
238 | control methods: abort, kill | |
239 |
|
239 | |||
240 | """ |
|
240 | """ | |
241 |
|
241 | |||
242 |
|
242 | |||
243 | _connected=False |
|
243 | _connected=False | |
244 | _ssh=False |
|
244 | _ssh=False | |
245 | _engines=None |
|
245 | _engines=None | |
246 | _registration_socket=None |
|
246 | _registration_socket=None | |
247 | _query_socket=None |
|
247 | _query_socket=None | |
248 | _control_socket=None |
|
248 | _control_socket=None | |
249 | _iopub_socket=None |
|
249 | _iopub_socket=None | |
250 | _notification_socket=None |
|
250 | _notification_socket=None | |
251 | _mux_socket=None |
|
251 | _mux_socket=None | |
252 | _task_socket=None |
|
252 | _task_socket=None | |
253 | _task_scheme=None |
|
253 | _task_scheme=None | |
254 | block = False |
|
254 | block = False | |
255 | outstanding=None |
|
255 | outstanding=None | |
256 | results = None |
|
256 | results = None | |
257 | history = None |
|
257 | history = None | |
258 | debug = False |
|
258 | debug = False | |
259 | targets = None |
|
259 | targets = None | |
260 |
|
260 | |||
261 | def __init__(self, url_or_file=None, profile='default', cluster_dir=None, ipython_dir=None, |
|
261 | def __init__(self, url_or_file=None, profile='default', cluster_dir=None, ipython_dir=None, | |
262 | context=None, username=None, debug=False, exec_key=None, |
|
262 | context=None, username=None, debug=False, exec_key=None, | |
263 | sshserver=None, sshkey=None, password=None, paramiko=None, |
|
263 | sshserver=None, sshkey=None, password=None, paramiko=None, | |
264 | ): |
|
264 | ): | |
265 | if context is None: |
|
265 | if context is None: | |
266 | context = zmq.Context() |
|
266 | context = zmq.Context() | |
267 | self.context = context |
|
267 | self.context = context | |
268 | self.targets = 'all' |
|
268 | self.targets = 'all' | |
269 |
|
269 | |||
270 | self._setup_cluster_dir(profile, cluster_dir, ipython_dir) |
|
270 | self._setup_cluster_dir(profile, cluster_dir, ipython_dir) | |
271 | if self._cd is not None: |
|
271 | if self._cd is not None: | |
272 | if url_or_file is None: |
|
272 | if url_or_file is None: | |
273 | url_or_file = pjoin(self._cd.security_dir, 'ipcontroller-client.json') |
|
273 | url_or_file = pjoin(self._cd.security_dir, 'ipcontroller-client.json') | |
274 | assert url_or_file is not None, "I can't find enough information to connect to a controller!"\ |
|
274 | assert url_or_file is not None, "I can't find enough information to connect to a controller!"\ | |
275 | " Please specify at least one of url_or_file or profile." |
|
275 | " Please specify at least one of url_or_file or profile." | |
276 |
|
276 | |||
277 | try: |
|
277 | try: | |
278 | validate_url(url_or_file) |
            validate_url(url_or_file)
        except AssertionError:
            if not os.path.exists(url_or_file):
                if self._cd:
                    url_or_file = os.path.join(self._cd.security_dir, url_or_file)
                assert os.path.exists(url_or_file), "Not a valid connection file or url: %r" % url_or_file
            with open(url_or_file) as f:
                cfg = json.loads(f.read())
        else:
            cfg = {'url': url_or_file}

        # sync defaults from args, json:
        if sshserver:
            cfg['ssh'] = sshserver
        if exec_key:
            cfg['exec_key'] = exec_key
        # use .get so a connection file that omits these keys falls back to
        # the argument (possibly None) instead of raising KeyError
        exec_key = cfg.get('exec_key', exec_key)
        sshserver = cfg.get('ssh', sshserver)
        url = cfg['url']
        location = cfg.setdefault('location', None)
        cfg['url'] = disambiguate_url(cfg['url'], location)
        url = cfg['url']

        self._config = cfg

        self._ssh = bool(sshserver or sshkey or password)
        if self._ssh and sshserver is None:
            # default to ssh via localhost
            sshserver = url.split('://')[1].split(':')[0]
        if self._ssh and password is None:
            if tunnel.try_passwordless_ssh(sshserver, sshkey, paramiko):
                password = False
            else:
                password = getpass("SSH Password for %s: " % sshserver)
        ssh_kwargs = dict(keyfile=sshkey, password=password, paramiko=paramiko)
        if exec_key is not None and os.path.isfile(exec_key):
            arg = 'keyfile'
        else:
            arg = 'key'
        key_arg = {arg: exec_key}
        if username is None:
            self.session = ss.StreamSession(**key_arg)
        else:
            self.session = ss.StreamSession(username, **key_arg)
        self._registration_socket = self.context.socket(zmq.XREQ)
        self._registration_socket.setsockopt(zmq.IDENTITY, self.session.session)
        if self._ssh:
            tunnel.tunnel_connection(self._registration_socket, url, sshserver, **ssh_kwargs)
        else:
            self._registration_socket.connect(url)
        self._engines = ReverseDict()
        self._ids = set()
        self.outstanding = set()
        self.results = {}
        self.metadata = {}
        self.history = []
        self.debug = debug
        self.session.debug = debug

        self._notification_handlers = {'registration_notification' : self._register_engine,
                                       'unregistration_notification' : self._unregister_engine,
                                       }
        self._queue_handlers = {'execute_reply' : self._handle_execute_reply,
                                'apply_reply' : self._handle_apply_reply}
        self._connect(sshserver, ssh_kwargs)

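The `__init__` above finishes by wiring message types to bound-method handlers in plain dicts (`_notification_handlers`, `_queue_handlers`) and dispatching on each message's `msg_type`. A minimal standalone sketch of that dispatch-by-dict pattern, with illustrative names that are not part of the real client:

```python
# Sketch of the dict-based message dispatch used by the client above.
# All names here are illustrative, not part of the real API.
def make_dispatcher(handlers):
    """Return a function that routes a message dict by its 'msg_type'."""
    def dispatch(msg):
        handler = handlers.get(msg['msg_type'], None)
        if handler is None:
            raise Exception("Unhandled message type: %s" % msg['msg_type'])
        return handler(msg)
    return dispatch

log = []
dispatch = make_dispatcher({
    'registration_notification': lambda msg: log.append(('reg', msg['content']['id'])),
    'unregistration_notification': lambda msg: log.append(('unreg', msg['content']['id'])),
})
dispatch({'msg_type': 'registration_notification', 'content': {'id': 0}})
```

The dict keeps routing declarative: adding a message type is one entry, and an unknown type fails loudly instead of being silently dropped.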
    def _setup_cluster_dir(self, profile, cluster_dir, ipython_dir):
        if ipython_dir is None:
            ipython_dir = get_ipython_dir()
        if cluster_dir is not None:
            try:
                self._cd = ClusterDir.find_cluster_dir(cluster_dir)
            except ClusterDirError:
                pass
        elif profile is not None:
            try:
                self._cd = ClusterDir.find_cluster_dir_by_profile(
                    ipython_dir, profile)
            except ClusterDirError:
                pass
        else:
            self._cd = None

    @property
    def ids(self):
        """Always up-to-date ids property."""
        self._flush_notifications()
        return self._ids

    def _update_engines(self, engines):
        """Update our engines dict and _ids from a dict of the form: {id:uuid}."""
        for k,v in engines.iteritems():
            eid = int(k)
            self._engines[eid] = bytes(v) # force not unicode
            self._ids.add(eid)
        if sorted(self._engines.keys()) != range(len(self._engines)) and \
                self._task_scheme == 'pure' and self._task_socket:
            self._stop_scheduling_tasks()

    def _stop_scheduling_tasks(self):
        """Stop scheduling tasks because an engine has been unregistered
        from a pure ZMQ scheduler.
        """
        self._task_socket.close()
        self._task_socket = None
        msg = "An engine has been unregistered, and we are using pure " +\
              "ZMQ task scheduling. Task farming will be disabled."
        if self.outstanding:
            msg += " If you were running tasks when this happened, " +\
                   "some `outstanding` msg_ids may never resolve."
        warnings.warn(msg, RuntimeWarning)

    def _build_targets(self, targets):
        """Turn valid target IDs or 'all' into two lists:
        (int_ids, uuids).
        """
        if targets is None:
            targets = self._ids
        elif isinstance(targets, str):
            if targets.lower() == 'all':
                targets = self._ids
            else:
                raise TypeError("%r not valid str target, must be 'all'" % (targets,))
        elif isinstance(targets, int):
            targets = [targets]
        return [self._engines[t] for t in targets], list(targets)

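`_build_targets` normalizes `None`, `'all'`, a single int, or a list of ints into a concrete pair of (uuids, ids). A standalone sketch of that normalization, with an illustrative `engines` mapping standing in for the client's `_engines` dict:

```python
# Illustrative stand-in for Client._build_targets: normalize a targets
# argument (None, 'all', int, or list of ints) against an engines dict.
def build_targets(targets, engines):
    if targets is None:
        targets = sorted(engines.keys())
    elif isinstance(targets, str):
        if targets.lower() == 'all':
            targets = sorted(engines.keys())
        else:
            raise TypeError("%r not valid str target, must be 'all'" % (targets,))
    elif isinstance(targets, int):
        targets = [targets]
    # return the engine uuids alongside the integer ids
    return [engines[t] for t in targets], list(targets)

engines = {0: 'uuid-a', 1: 'uuid-b', 2: 'uuid-c'}
uuids, ids = build_targets('all', engines)
```

Note that an unknown engine id surfaces as a `KeyError` from the final lookup, just as it would in the real method.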
    def _connect(self, sshserver, ssh_kwargs):
        """setup all our socket connections to the controller. This is called from
        __init__."""
        # Maybe allow reconnecting?
        if self._connected:
            return
        self._connected = True

        def connect_socket(s, url):
            url = disambiguate_url(url, self._config['location'])
            if self._ssh:
                return tunnel.tunnel_connection(s, url, sshserver, **ssh_kwargs)
            else:
                return s.connect(url)

        self.session.send(self._registration_socket, 'connection_request')
        idents,msg = self.session.recv(self._registration_socket, mode=0)
        if self.debug:
            pprint(msg)
        msg = ss.Message(msg)
        content = msg.content
        self._config['registration'] = dict(content)
        if content.status == 'ok':
            if content.mux:
                self._mux_socket = self.context.socket(zmq.PAIR)
                self._mux_socket.setsockopt(zmq.IDENTITY, self.session.session)
                connect_socket(self._mux_socket, content.mux)
            if content.task:
                self._task_scheme, task_addr = content.task
                self._task_socket = self.context.socket(zmq.PAIR)
                self._task_socket.setsockopt(zmq.IDENTITY, self.session.session)
                connect_socket(self._task_socket, task_addr)
            if content.notification:
                self._notification_socket = self.context.socket(zmq.SUB)
                connect_socket(self._notification_socket, content.notification)
                self._notification_socket.setsockopt(zmq.SUBSCRIBE, "")
            if content.query:
                self._query_socket = self.context.socket(zmq.PAIR)
                self._query_socket.setsockopt(zmq.IDENTITY, self.session.session)
                connect_socket(self._query_socket, content.query)
            if content.control:
                self._control_socket = self.context.socket(zmq.PAIR)
                self._control_socket.setsockopt(zmq.IDENTITY, self.session.session)
                connect_socket(self._control_socket, content.control)
            if content.iopub:
                self._iopub_socket = self.context.socket(zmq.SUB)
                self._iopub_socket.setsockopt(zmq.SUBSCRIBE, '')
                self._iopub_socket.setsockopt(zmq.IDENTITY, self.session.session)
                connect_socket(self._iopub_socket, content.iopub)
            self._update_engines(dict(content.engines))

        else:
            self._connected = False
            raise Exception("Failed to connect!")

    #--------------------------------------------------------------------------
    # handlers and callbacks for incoming messages
    #--------------------------------------------------------------------------

    def _register_engine(self, msg):
        """Register a new engine, and update our connection info."""
        content = msg['content']
        eid = content['id']
        d = {eid : content['queue']}
        self._update_engines(d)
        self._ids.add(int(eid))

    def _unregister_engine(self, msg):
        """Unregister an engine that has died."""
        content = msg['content']
        eid = int(content['id'])
        if eid in self._ids:
            self._ids.remove(eid)
            self._engines.pop(eid)
        if self._task_socket and self._task_scheme == 'pure':
            self._stop_scheduling_tasks()

    def _extract_metadata(self, header, parent, content):
        md = {'msg_id' : parent['msg_id'],
              'received' : datetime.now(),
              'engine_uuid' : header.get('engine', None),
              'follow' : parent.get('follow', []),
              'after' : parent.get('after', []),
              'status' : content['status'],
              }

        if md['engine_uuid'] is not None:
            md['engine_id'] = self._engines.get(md['engine_uuid'], None)

        if 'date' in parent:
            md['submitted'] = datetime.strptime(parent['date'], ss.ISO8601)
        if 'started' in header:
            md['started'] = datetime.strptime(header['started'], ss.ISO8601)
        if 'date' in header:
            md['completed'] = datetime.strptime(header['date'], ss.ISO8601)
        return md

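`_extract_metadata` turns a message envelope into timing metadata by parsing the ISO8601 strings the session layer stamps onto headers (`ss.ISO8601` in the real code). A standalone sketch of just the timing part; the format string here is an assumption for illustration, not necessarily the real value of `ss.ISO8601`:

```python
from datetime import datetime

# Assumed ISO8601 format for illustration; the real value lives in
# the streamsession module as ss.ISO8601.
ISO8601 = "%Y-%m-%dT%H:%M:%S.%f"

def extract_timing(header, parent):
    """Pull submitted/started/completed datetimes out of message headers."""
    md = {}
    if 'date' in parent:
        md['submitted'] = datetime.strptime(parent['date'], ISO8601)
    if 'started' in header:
        md['started'] = datetime.strptime(header['started'], ISO8601)
    if 'date' in header:
        md['completed'] = datetime.strptime(header['date'], ISO8601)
    return md

md = extract_timing({'date': '2010-11-01T12:00:01.500000'},
                    {'date': '2010-11-01T12:00:00.000000'})
```

Each key is optional, so a reply that never started (e.g. an aborted task) simply lacks the `started` entry rather than carrying a sentinel value.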
    def _handle_execute_reply(self, msg):
        """Save the reply to an execute_request into our results.

        execute messages are never actually used. apply is used instead.
        """

        parent = msg['parent_header']
        msg_id = parent['msg_id']
        if msg_id not in self.outstanding:
            if msg_id in self.history:
                print ("got stale result: %s" % msg_id)
            else:
                print ("got unknown result: %s" % msg_id)
        else:
            self.outstanding.remove(msg_id)
        self.results[msg_id] = ss.unwrap_exception(msg['content'])

    def _handle_apply_reply(self, msg):
        """Save the reply to an apply_request into our results."""
        parent = msg['parent_header']
        msg_id = parent['msg_id']
        if msg_id not in self.outstanding:
            if msg_id in self.history:
                print ("got stale result: %s" % msg_id)
                print self.results[msg_id]
                print msg
            else:
                print ("got unknown result: %s" % msg_id)
        else:
            self.outstanding.remove(msg_id)
        content = msg['content']
        header = msg['header']

        # construct metadata:
        md = self.metadata.setdefault(msg_id, Metadata())
        md.update(self._extract_metadata(header, parent, content))
        self.metadata[msg_id] = md

        # construct result:
        if content['status'] == 'ok':
            self.results[msg_id] = ss.unserialize_object(msg['buffers'])[0]
        elif content['status'] == 'aborted':
            self.results[msg_id] = error.AbortedTask(msg_id)
        elif content['status'] == 'resubmitted':
            # TODO: handle resubmission
            pass
        else:
            e = ss.unwrap_exception(content)
            if e.engine_info:
                e_uuid = e.engine_info['engineid']
                eid = self._engines[e_uuid]
                e.engine_info['engineid'] = eid
            self.results[msg_id] = e

    def _flush_notifications(self):
        """Flush notifications of engine registrations waiting
        in ZMQ queue."""
        msg = self.session.recv(self._notification_socket, mode=zmq.NOBLOCK)
        while msg is not None:
            if self.debug:
                pprint(msg)
            msg = msg[-1]
            msg_type = msg['msg_type']
            handler = self._notification_handlers.get(msg_type, None)
            if handler is None:
                # msg is a dict at this point, so use the extracted msg_type;
                # msg.msg_type would raise AttributeError
                raise Exception("Unhandled message type: %s" % msg_type)
            else:
                handler(msg)
            msg = self.session.recv(self._notification_socket, mode=zmq.NOBLOCK)

    def _flush_results(self, sock):
        """Flush task or queue results waiting in ZMQ queue."""
        msg = self.session.recv(sock, mode=zmq.NOBLOCK)
        while msg is not None:
            if self.debug:
                pprint(msg)
            msg = msg[-1]
            msg_type = msg['msg_type']
            handler = self._queue_handlers.get(msg_type, None)
            if handler is None:
                raise Exception("Unhandled message type: %s" % msg_type)
            else:
                handler(msg)
            msg = self.session.recv(sock, mode=zmq.NOBLOCK)

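The flush methods above share one idiom: drain a socket with non-blocking receives until it returns nothing, dispatching each message as it arrives. A socket-free sketch of that drain loop, where a `deque` stands in for the ZMQ socket:

```python
from collections import deque

# Stand-in for a ZMQ socket: returns None when nothing is waiting,
# mirroring session.recv(..., mode=zmq.NOBLOCK) in the client above.
def recv_noblock(queue):
    return queue.popleft() if queue else None

def flush(queue, handlers):
    """Drain and dispatch every waiting message, then return."""
    msg = recv_noblock(queue)
    while msg is not None:
        handler = handlers.get(msg['msg_type'])
        if handler is None:
            raise Exception("Unhandled message type: %s" % msg['msg_type'])
        handler(msg)
        msg = recv_noblock(queue)

seen = []
q = deque([{'msg_type': 'apply_reply', 'content': i} for i in range(3)])
flush(q, {'apply_reply': lambda m: seen.append(m['content'])})
```

Because the recv never blocks, `flush` costs almost nothing when the queue is empty, which is what makes calling `spin()` opportunistically (e.g. from the `ids` property) cheap.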
    def _flush_control(self, sock):
        """Flush replies from the control channel waiting
        in the ZMQ queue.

        Currently: ignore them."""
        msg = self.session.recv(sock, mode=zmq.NOBLOCK)
        while msg is not None:
            if self.debug:
                pprint(msg)
            msg = self.session.recv(sock, mode=zmq.NOBLOCK)

    def _flush_iopub(self, sock):
        """Flush replies from the iopub channel waiting
        in the ZMQ queue.
        """
        msg = self.session.recv(sock, mode=zmq.NOBLOCK)
        while msg is not None:
            if self.debug:
                pprint(msg)
            msg = msg[-1]
            parent = msg['parent_header']
            msg_id = parent['msg_id']
            content = msg['content']
            header = msg['header']
            msg_type = msg['msg_type']

            # init metadata:
            md = self.metadata.setdefault(msg_id, Metadata())

            if msg_type == 'stream':
                name = content['name']
                s = md[name] or ''
                md[name] = s + content['data']
            elif msg_type == 'pyerr':
                md.update({'pyerr' : ss.unwrap_exception(content)})
            else:
                md.update({msg_type : content['data']})

            self.metadata[msg_id] = md

            msg = self.session.recv(sock, mode=zmq.NOBLOCK)

    #--------------------------------------------------------------------------
    # getitem
    #--------------------------------------------------------------------------

    def __getitem__(self, key):
        """Dict access returns DirectView multiplexer objects or,
        if key is None, a LoadBalancedView."""
        if key is None:
            return LoadBalancedView(self)
        if isinstance(key, int):
            if key not in self.ids:
                raise IndexError("No such engine: %i" % key)
            return DirectView(self, key)

        if isinstance(key, slice):
            indices = range(len(self.ids))[key]
            ids = sorted(self._ids)
            key = [ ids[i] for i in indices ]
            # newkeys = sorted(self._ids)[thekeys[k]]

        if isinstance(key, (tuple, list, xrange)):
            _,targets = self._build_targets(list(key))
            return DirectView(self, targets)
        else:
            raise TypeError("key by int/iterable of ints only, not %s" % (type(key)))

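`__getitem__` converts a slice into concrete engine ids by slicing a range of *positions* and mapping those through the sorted id set, so `client[::2]` still works when engine ids are sparse (e.g. after an engine died). A standalone sketch of that conversion with a hypothetical `ids` set:

```python
# Resolve a slice over a sparse set of engine ids, as __getitem__ does above:
# slice positions into the sorted id list, not the raw id values.
def resolve_slice(key, id_set):
    ids = sorted(id_set)
    indices = range(len(ids))[key]   # positions selected by the slice
    return [ids[i] for i in indices]

ids = {0, 2, 5, 9}                   # sparse ids after an unregistration
even = resolve_slice(slice(None, None, 2), ids)
```

Slicing positions rather than ids means `client[1:3]` always names the second and third live engines, whatever their numeric ids happen to be.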
658 | #-------------------------------------------------------------------------- |
|
658 | #-------------------------------------------------------------------------- | |
659 | # Begin public methods |
|
659 | # Begin public methods | |
660 | #-------------------------------------------------------------------------- |
|
660 | #-------------------------------------------------------------------------- | |
661 |
|
661 | |||
662 | @property |
|
662 | @property | |
663 | def remote(self): |
|
663 | def remote(self): | |
664 | """property for convenient RemoteFunction generation. |
|
664 | """property for convenient RemoteFunction generation. | |
665 |
|
665 | |||
666 | >>> @client.remote |
|
666 | >>> @client.remote | |
667 | ... def f(): |
|
667 | ... def f(): | |
668 | import os |
|
668 | import os | |
669 | print (os.getpid()) |
|
669 | print (os.getpid()) | |
670 | """ |
|
670 | """ | |
671 | return remote(self, block=self.block) |
|
671 | return remote(self, block=self.block) | |
672 |
|
672 | |||
673 | def spin(self): |
|
673 | def spin(self): | |
674 | """Flush any registration notifications and execution results |
|
674 | """Flush any registration notifications and execution results | |
675 | waiting in the ZMQ queue. |
|
675 | waiting in the ZMQ queue. | |
676 | """ |
|
676 | """ | |
677 | if self._notification_socket: |
|
677 | if self._notification_socket: | |
678 | self._flush_notifications() |
|
678 | self._flush_notifications() | |
679 | if self._mux_socket: |
|
679 | if self._mux_socket: | |
680 | self._flush_results(self._mux_socket) |
|
680 | self._flush_results(self._mux_socket) | |
681 | if self._task_socket: |
|
681 | if self._task_socket: | |
682 | self._flush_results(self._task_socket) |
|
682 | self._flush_results(self._task_socket) | |
683 | if self._control_socket: |
|
683 | if self._control_socket: | |
684 | self._flush_control(self._control_socket) |
|
684 | self._flush_control(self._control_socket) | |
685 | if self._iopub_socket: |
|
685 | if self._iopub_socket: | |
686 | self._flush_iopub(self._iopub_socket) |
|
686 | self._flush_iopub(self._iopub_socket) | |
687 |
|
687 | |||
    def barrier(self, msg_ids=None, timeout=-1):
        """waits on one or more `msg_ids`, for up to `timeout` seconds.

        Parameters
        ----------
        msg_ids : int, str, or list of ints and/or strs, or one or more AsyncResult objects
            ints are indices to self.history
            strs are msg_ids
            default: wait on all outstanding messages
        timeout : float
            a time in seconds, after which to give up.
            default is -1, which means no timeout

        Returns
        -------
        True : when all msg_ids are done
        False : timeout reached, some msg_ids still outstanding
        """
        tic = time.time()
        if msg_ids is None:
            theids = self.outstanding
        else:
            if isinstance(msg_ids, (int, str, AsyncResult)):
                msg_ids = [msg_ids]
            theids = set()
            for msg_id in msg_ids:
                if isinstance(msg_id, int):
                    msg_id = self.history[msg_id]
                elif isinstance(msg_id, AsyncResult):
                    map(theids.add, msg_id.msg_ids)
                    continue
                theids.add(msg_id)
        if not theids.intersection(self.outstanding):
            return True
        self.spin()
        while theids.intersection(self.outstanding):
            if timeout >= 0 and ( time.time()-tic ) > timeout:
                break
            time.sleep(1e-3)
            self.spin()
        return len(theids.intersection(self.outstanding)) == 0

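The loop at the heart of `barrier` is a generic poll-with-timeout pattern. A minimal self-contained sketch (plain Python, no sockets; `spin` and the `outstanding` set here are stand-ins for the client's real state) behaves the same way:

```python
import time

def wait_on(outstanding, wanted, spin, timeout=-1):
    """Poll until none of `wanted` remains in `outstanding`, or `timeout` expires.

    Mirrors the barrier() loop: timeout < 0 means wait forever, and the
    return value says whether everything finished in time.
    """
    wanted = set(wanted)
    tic = time.time()
    while wanted.intersection(outstanding):
        if timeout >= 0 and (time.time() - tic) > timeout:
            break
        time.sleep(1e-3)
        spin()  # give the client a chance to process incoming replies
    return len(wanted.intersection(outstanding)) == 0

# fake workload: each spin() call retires one outstanding message id
outstanding = {'a', 'b', 'c'}
def spin():
    if outstanding:
        outstanding.pop()

assert wait_on(outstanding, ['a', 'b'], spin) is True
assert wait_on({'x'}, ['x'], lambda: None, timeout=0.01) is False  # times out
```

Returning `False` rather than raising on timeout is what lets callers of `barrier` decide for themselves whether a partial wait is an error.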
    #--------------------------------------------------------------------------
    # Control methods
    #--------------------------------------------------------------------------

    @spinfirst
    @defaultblock
    def clear(self, targets=None, block=None):
        """Clear the namespace in target(s)."""
        targets = self._build_targets(targets)[0]
        for t in targets:
            self.session.send(self._control_socket, 'clear_request', content={}, ident=t)
        error = False
        if self.block:
            for i in range(len(targets)):
                idents,msg = self.session.recv(self._control_socket,0)
                if self.debug:
                    pprint(msg)
                if msg['content']['status'] != 'ok':
                    error = ss.unwrap_exception(msg['content'])
        if error:
            return error


    @spinfirst
    @defaultblock
    def abort(self, msg_ids = None, targets=None, block=None):
        """Abort the execution queues of target(s)."""
        targets = self._build_targets(targets)[0]
        if isinstance(msg_ids, basestring):
            msg_ids = [msg_ids]
        content = dict(msg_ids=msg_ids)
        for t in targets:
            self.session.send(self._control_socket, 'abort_request',
                    content=content, ident=t)
        error = False
        if self.block:
            for i in range(len(targets)):
                idents,msg = self.session.recv(self._control_socket,0)
                if self.debug:
                    pprint(msg)
                if msg['content']['status'] != 'ok':
                    error = ss.unwrap_exception(msg['content'])
        if error:
            return error

    @spinfirst
    @defaultblock
    def shutdown(self, targets=None, restart=False, controller=False, block=None):
        """Terminates one or more engine processes, optionally including the controller."""
        if controller:
            targets = 'all'
        targets = self._build_targets(targets)[0]
        for t in targets:
            self.session.send(self._control_socket, 'shutdown_request',
                        content={'restart':restart},ident=t)
        error = False
        if block or controller:
            for i in range(len(targets)):
                idents,msg = self.session.recv(self._control_socket,0)
                if self.debug:
                    pprint(msg)
                if msg['content']['status'] != 'ok':
                    error = ss.unwrap_exception(msg['content'])

        if controller:
            time.sleep(0.25)
            self.session.send(self._query_socket, 'shutdown_request')
            idents,msg = self.session.recv(self._query_socket, 0)
            if self.debug:
                pprint(msg)
            if msg['content']['status'] != 'ok':
                error = ss.unwrap_exception(msg['content'])

        if error:
            raise error

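`clear`, `abort`, and `shutdown` all share one reply-collection idiom: send a request per target, then, when blocking, drain one reply per target and remember the last non-ok status. A toy version over plain dicts (stubbing out `session.recv` and `unwrap_exception`, which are assumptions here) shows the shape:

```python
def collect_replies(replies):
    """Scan reply messages in arrival order; return the last failure, or None.

    Mirrors the loop in clear()/abort()/shutdown(): every reply is drained
    even after an error is seen, so no message is left sitting on the socket.
    """
    error = None
    for msg in replies:
        if msg['content']['status'] != 'ok':
            error = msg['content']   # the real client calls unwrap_exception(...)
    return error

ok = {'content': {'status': 'ok'}}
bad = {'content': {'status': 'error', 'evalue': 'boom'}}

assert collect_replies([ok, ok]) is None
assert collect_replies([ok, bad, ok])['evalue'] == 'boom'
```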
    #--------------------------------------------------------------------------
    # Execution methods
    #--------------------------------------------------------------------------

    @defaultblock
    def execute(self, code, targets='all', block=None):
        """Executes `code` on `targets` in blocking or nonblocking manner.

        ``execute`` is always `bound` (affects engine namespace)

        Parameters
        ----------
        code : str
            the code string to be executed
        targets : int/str/list of ints/strs
            the engines on which to execute
            default : all
        block : bool
            whether or not to wait until done to return
            default: self.block
        """
        result = self.apply(_execute, (code,), targets=targets, block=self.block, bound=True)
        return result

    def run(self, filename, targets='all', block=None):
        """Execute contents of `filename` on engine(s).

        This simply reads the contents of the file and calls `execute`.

        Parameters
        ----------
        filename : str
            The path to the file
        targets : int/str/list of ints/strs
            the engines on which to execute
            default : all
        block : bool
            whether or not to wait until done
            default: self.block

        """
        with open(filename, 'rb') as f:
            code = f.read()
        return self.execute(code, targets=targets, block=block)

    def _maybe_raise(self, result):
        """wrapper for maybe raising an exception if apply failed."""
        if isinstance(result, error.RemoteError):
            raise result

        return result

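`_maybe_raise` is a small but important convention: remote failures come back as result objects and are re-raised locally rather than returned. A sketch with a stand-in `RemoteError` (the real one lives in the `error` module) illustrates it:

```python
class RemoteError(Exception):
    """Stand-in for error.RemoteError, used only for this sketch."""
    pass

def maybe_raise(result):
    """Re-raise remote failures locally; pass ordinary results through."""
    if isinstance(result, RemoteError):
        raise result
    return result

assert maybe_raise(42) == 42
try:
    maybe_raise(RemoteError("engine died"))
except RemoteError as e:
    assert "engine died" in str(e)
else:
    raise AssertionError("should have raised")
```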
    def _build_dependency(self, dep):
        """helper for building jsonable dependencies from various input forms"""
        if isinstance(dep, Dependency):
            return dep.as_dict()
        elif isinstance(dep, AsyncResult):
            return dep.msg_ids
        elif dep is None:
            return []
        else:
            # pass to Dependency constructor
            return list(Dependency(dep))

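For plain inputs, `_build_dependency` is essentially a normalizer from `None` / a single msg_id / an iterable of msg_ids down to a list. A toy version (ignoring the `Dependency` and `AsyncResult` branches, which need the real classes) might be:

```python
def normalize_dep(dep):
    """Toy sketch of _build_dependency for plain inputs.

    None -> no dependency, a single msg_id string -> one-element list,
    any iterable of msg_ids -> list.  The real method additionally
    accepts Dependency and AsyncResult objects.
    """
    if dep is None:
        return []
    if isinstance(dep, str):
        return [dep]
    return list(dep)

assert normalize_dep(None) == []
assert normalize_dep('abc') == ['abc']
assert sorted(normalize_dep({'a', 'b'})) == ['a', 'b']
```

Normalizing early means the scheduler only ever sees one representation (a flat list of msg_ids) regardless of how the caller expressed the dependency.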
    def apply(self, f, args=None, kwargs=None, bound=True, block=None, targets=None,
                        after=None, follow=None, timeout=None):
        """Call `f(*args, **kwargs)` on a remote engine(s), returning the result.

        This is the central execution command for the client.

        Parameters
        ----------

        f : function
            The function to be called remotely
        args : tuple/list
            The positional arguments passed to `f`
        kwargs : dict
            The keyword arguments passed to `f`
        bound : bool (default: True)
            Whether to execute in the Engine(s) namespace, or in a clean
            namespace not affecting the engine.
        block : bool (default: self.block)
            Whether to wait for the result, or return immediately.
            False:
                returns AsyncResult
            True:
                returns actual result(s) of f(*args, **kwargs)
                if multiple targets:
                    list of results, matching `targets`
        targets : int,list of ints, 'all', None
            Specify the destination of the job.
            if None:
                Submit via Task queue for load-balancing.
            if 'all':
                Run on all active engines
            if list:
                Run on each specified engine
            if int:
                Run on single engine

        after : Dependency or collection of msg_ids
            Only for load-balanced execution (targets=None)
            Specify a list of msg_ids as a time-based dependency.
            This job will only be run *after* the dependencies
            have been met.

        follow : Dependency or collection of msg_ids
            Only for load-balanced execution (targets=None)
            Specify a list of msg_ids as a location-based dependency.
            This job will only be run on an engine where this dependency
            is met.

        timeout : float/int or None
            Only for load-balanced execution (targets=None)
            Specify an amount of time (in seconds) for the scheduler to
            wait for dependencies to be met before failing with a
            DependencyTimeout.

        Returns
        -------
        if block is False:
            return AsyncResult wrapping msg_ids
            output of AsyncResult.get() is identical to that of `apply(...block=True)`
        else:
            if single target:
                return result of `f(*args, **kwargs)`
            else:
                return list of results, matching `targets`
        """

        # defaults:
        block = block if block is not None else self.block
        args = args if args is not None else []
        kwargs = kwargs if kwargs is not None else {}

        # enforce types of f, args, kwargs
        if not callable(f):
            raise TypeError("f must be callable, not %s"%type(f))
        if not isinstance(args, (tuple, list)):
            raise TypeError("args must be tuple or list, not %s"%type(args))
        if not isinstance(kwargs, dict):
            raise TypeError("kwargs must be dict, not %s"%type(kwargs))

        options = dict(bound=bound, block=block)

        if targets is None:
            if self._task_socket:
                return self._apply_balanced(f, args, kwargs, timeout=timeout,
                                        after=after, follow=follow, **options)
            else:
                msg = "Task farming is disabled"
                if self._task_scheme == 'pure':
                    msg += " because the pure ZMQ scheduler cannot handle"
                    msg += " disappearing engines."
                raise RuntimeError(msg)
        else:
            return self._apply_direct(f, args, kwargs, targets=targets, **options)

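The routing decision at the end of `apply` reduces to a small rule: `targets=None` means load-balanced submission via the task queue (if one exists), anything else means direct submission via the MUX queue. A sketch of just that decision, with socket availability reduced to a boolean flag for illustration:

```python
def route(targets, task_socket_available=True):
    """Toy version of apply()'s routing decision.

    targets=None -> load-balanced via the task queue (if available),
    anything else -> direct execution via the MUX queue.
    """
    if targets is None:
        if not task_socket_available:
            raise RuntimeError("Task farming is disabled")
        return 'balanced'
    return 'direct'

assert route(None) == 'balanced'
assert route('all') == 'direct'
assert route([0, 1]) == 'direct'
```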
    def _apply_balanced(self, f, args, kwargs, bound=True, block=None,
                        after=None, follow=None, timeout=None):
        """The underlying method for applying functions in a load balanced
        manner, via the task queue."""

        if self._task_scheme == 'pure':
            # pure zmq scheme doesn't support dependencies
            msg = "Pure ZMQ scheduler doesn't support dependencies"
            if (follow or after):
                # hard fail on DAG dependencies
                raise RuntimeError(msg)
            if isinstance(f, dependent):
                # soft warn on functional dependencies
                warnings.warn(msg, RuntimeWarning)

        after = self._build_dependency(after)
        follow = self._build_dependency(follow)
        subheader = dict(after=after, follow=follow, timeout=timeout)
        bufs = ss.pack_apply_message(f,args,kwargs)
        content = dict(bound=bound)

        msg = self.session.send(self._task_socket, "apply_request",
                content=content, buffers=bufs, subheader=subheader)
        msg_id = msg['msg_id']
        self.outstanding.add(msg_id)
        self.history.append(msg_id)
        ar = AsyncResult(self, [msg_id], fname=f.__name__)
        if block:
            return ar.get()
        else:
            return ar

    def _apply_direct(self, f, args, kwargs, bound=True, block=None, targets=None):
        """The underlying method for applying functions to specific engines
        via the MUX queue."""

        queues,targets = self._build_targets(targets)

        subheader = {}
        content = dict(bound=bound)
        bufs = ss.pack_apply_message(f,args,kwargs)

        msg_ids = []
        for queue in queues:
            msg = self.session.send(self._mux_socket, "apply_request",
                    content=content, buffers=bufs,ident=queue, subheader=subheader)
            msg_id = msg['msg_id']
            self.outstanding.add(msg_id)
            self.history.append(msg_id)
            msg_ids.append(msg_id)
        ar = AsyncResult(self, msg_ids, fname=f.__name__)
        if block:
            return ar.get()
        else:
            return ar

    #--------------------------------------------------------------------------
    # Map and decorators
    #--------------------------------------------------------------------------

    def map(self, f, *sequences):
        """Parallel version of builtin `map`, using all our engines."""
        pf = ParallelFunction(self, f, block=self.block,
                        bound=True, targets='all')
        return pf.map(*sequences)

    def parallel(self, bound=True, targets='all', block=True):
        """Decorator for making a ParallelFunction."""
        return parallel(self, bound=bound, targets=targets, block=block)

    def remote(self, bound=True, targets='all', block=True):
        """Decorator for making a RemoteFunction."""
        return remote(self, bound=bound, targets=targets, block=block)

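`parallel` and `remote` are factory functions that capture call options and hand back a decorator. A serial toy stand-in (no engines or `ParallelFunction`; `map` just runs in-process) illustrates the pattern without claiming to match the real semantics:

```python
def parallel(block=True):
    """Toy stand-in for Client.parallel(): returns a decorator whose
    result carries a .map() method (here executed serially)."""
    def decorator(f):
        def pmap(*sequences):
            # the real ParallelFunction would scatter these across engines
            results = [f(*args) for args in zip(*sequences)]
            return results if block else iter(results)
        f.map = pmap
        return f
    return decorator

@parallel(block=True)
def add(a, b):
    return a + b

assert add.map([1, 2, 3], [10, 20, 30]) == [11, 22, 33]
```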
    #--------------------------------------------------------------------------
    # Data movement
    #--------------------------------------------------------------------------

    @defaultblock
    def push(self, ns, targets='all', block=None):
        """Push the contents of `ns` into the namespace on `target`"""
        if not isinstance(ns, dict):
            raise TypeError("Must be a dict, not %s"%type(ns))
        result = self.apply(_push, (ns,), targets=targets, block=block, bound=True)
        return result

    @defaultblock
    def pull(self, keys, targets='all', block=None):
        """Pull objects from `target`'s namespace by `keys`"""
        if isinstance(keys, str):
            pass
        elif isinstance(keys, (list,tuple,set)):
            for key in keys:
                if not isinstance(key, str):
                    raise TypeError
        result = self.apply(_pull, (keys,), targets=targets, block=block, bound=True)
        return result

    def scatter(self, key, seq, dist='b', flatten=False, targets='all', block=None):
        """
        Partition a Python sequence and send the partitions to a set of engines.
        """
        block = block if block is not None else self.block
        targets = self._build_targets(targets)[-1]
        mapObject = Map.dists[dist]()
        nparts = len(targets)
        msg_ids = []
        for index, engineid in enumerate(targets):
            partition = mapObject.getPartition(seq, index, nparts)
            if flatten and len(partition) == 1:
                r = self.push({key: partition[0]}, targets=engineid, block=False)
            else:
                r = self.push({key: partition}, targets=engineid, block=False)
            msg_ids.extend(r.msg_ids)
        r = AsyncResult(self, msg_ids, fname='scatter')
        if block:
            return r.get()
        else:
            return r

    def gather(self, key, dist='b', targets='all', block=None):
        """
        Gather a partitioned sequence on a set of engines as a single local seq.
        """
        block = block if block is not None else self.block

        targets = self._build_targets(targets)[-1]
        mapObject = Map.dists[dist]()
        msg_ids = []
        for index, engineid in enumerate(targets):
            msg_ids.extend(self.pull(key, targets=engineid,block=False).msg_ids)

        r = AsyncMapResult(self, msg_ids, mapObject, fname='gather')
        if block:
            return r.get()
        else:
            return r

|
1104 | #-------------------------------------------------------------------------- | |
1109 | # Query methods |
|
1105 | # Query methods | |
1110 | #-------------------------------------------------------------------------- |
|
1106 | #-------------------------------------------------------------------------- | |
1111 |
|
1107 | |||
1112 | @spinfirst |
|
1108 | @spinfirst | |
1113 | def get_results(self, msg_ids, status_only=False): |
|
1109 | def get_results(self, msg_ids, status_only=False): | |
1114 | """Returns the result of the execute or task request with `msg_ids`. |
|
1110 | """Returns the result of the execute or task request with `msg_ids`. | |
1115 |
|
1111 | |||
1116 | Parameters |
|
1112 | Parameters | |
1117 | ---------- |
|
1113 | ---------- | |
1118 | msg_ids : list of ints or msg_ids |
|
1114 | msg_ids : list of ints or msg_ids | |
1119 | if int: |
|
1115 | if int: | |
1120 | Passed as index to self.history for convenience. |
|
1116 | Passed as index to self.history for convenience. | |
1121 | status_only : bool (default: False) |
|
1117 | status_only : bool (default: False) | |
1122 | if False: |
|
1118 | if False: | |
1123 | return the actual results |
|
1119 | return the actual results | |
1124 |
|
1120 | |||
1125 | Returns |
|
1121 | Returns | |
1126 | ------- |
|
1122 | ------- | |
1127 |
|
1123 | |||
1128 | results : dict |
|
1124 | results : dict | |
1129 | There will always be the keys 'pending' and 'completed', which will |
|
1125 | There will always be the keys 'pending' and 'completed', which will | |
1130 | be lists of msg_ids. |
|
1126 | be lists of msg_ids. | |
1131 | """ |
|
1127 | """ | |
1132 | if not isinstance(msg_ids, (list,tuple)): |
|
1128 | if not isinstance(msg_ids, (list,tuple)): | |
1133 | msg_ids = [msg_ids] |
|
1129 | msg_ids = [msg_ids] | |
1134 | theids = [] |
|
1130 | theids = [] | |
1135 | for msg_id in msg_ids: |
|
1131 | for msg_id in msg_ids: | |
1136 | if isinstance(msg_id, int): |
|
1132 | if isinstance(msg_id, int): | |
1137 | msg_id = self.history[msg_id] |
|
1133 | msg_id = self.history[msg_id] | |
1138 | if not isinstance(msg_id, str): |
|
1134 | if not isinstance(msg_id, str): | |
1139 | raise TypeError("msg_ids must be str, not %r"%msg_id) |
|
1135 | raise TypeError("msg_ids must be str, not %r"%msg_id) | |
1140 | theids.append(msg_id) |
|
1136 | theids.append(msg_id) | |
1141 |
|
1137 | |||
1142 | completed = [] |
|
1138 | completed = [] | |
1143 | local_results = {} |
|
1139 | local_results = {} | |
1144 |
|
1140 | |||
1145 | # comment this block out to temporarily disable local shortcut: |
|
1141 | # comment this block out to temporarily disable local shortcut: | |
1146 | for msg_id in list(theids): |
|
1142 | for msg_id in list(theids): | |
1147 | if msg_id in self.results: |
|
1143 | if msg_id in self.results: | |
1148 | completed.append(msg_id) |
|
1144 | completed.append(msg_id) | |
1149 | local_results[msg_id] = self.results[msg_id] |
|
1145 | local_results[msg_id] = self.results[msg_id] | |
1150 | theids.remove(msg_id) |
|
1146 | theids.remove(msg_id) | |
1151 |
|
1147 | |||
1152 | if theids: # some not locally cached |
|
1148 | if theids: # some not locally cached | |
1153 | content = dict(msg_ids=theids, status_only=status_only) |
|
1149 | content = dict(msg_ids=theids, status_only=status_only) | |
1154 | msg = self.session.send(self._query_socket, "result_request", content=content) |
|
1150 | msg = self.session.send(self._query_socket, "result_request", content=content) | |
1155 | zmq.select([self._query_socket], [], []) |
|
1151 | zmq.select([self._query_socket], [], []) | |
1156 | idents,msg = self.session.recv(self._query_socket, zmq.NOBLOCK) |
|
1152 | idents,msg = self.session.recv(self._query_socket, zmq.NOBLOCK) | |
1157 | if self.debug: |
|
1153 | if self.debug: | |
1158 | pprint(msg) |
|
1154 | pprint(msg) | |
1159 | content = msg['content'] |
|
1155 | content = msg['content'] | |
1160 | if content['status'] != 'ok': |
|
1156 | if content['status'] != 'ok': | |
1161 | raise ss.unwrap_exception(content) |
|
1157 | raise ss.unwrap_exception(content) | |
1162 | buffers = msg['buffers'] |
|
1158 | buffers = msg['buffers'] | |
1163 | else: |
|
1159 | else: | |
1164 | content = dict(completed=[],pending=[]) |
|
1160 | content = dict(completed=[],pending=[]) | |
1165 |
|
1161 | |||
1166 | content['completed'].extend(completed) |
|
1162 | content['completed'].extend(completed) | |
1167 |
|
1163 | |||
1168 | if status_only: |
|
1164 | if status_only: | |
1169 | return content |
|
1165 | return content | |
1170 |
|
1166 | |||
1171 | failures = [] |
|
1167 | failures = [] | |
1172 | # load cached results into result: |
|
1168 | # load cached results into result: | |
1173 | content.update(local_results) |
|
1169 | content.update(local_results) | |
1174 | # update cache with results: |
|
1170 | # update cache with results: | |
1175 | for msg_id in sorted(theids): |
|
1171 | for msg_id in sorted(theids): | |
1176 | if msg_id in content['completed']: |
|
1172 | if msg_id in content['completed']: | |
1177 | rec = content[msg_id] |
|
1173 | rec = content[msg_id] | |
1178 | parent = rec['header'] |
|
1174 | parent = rec['header'] | |
1179 | header = rec['result_header'] |
|
1175 | header = rec['result_header'] | |
1180 | rcontent = rec['result_content'] |
|
1176 | rcontent = rec['result_content'] | |
1181 | iodict = rec['io'] |
|
1177 | iodict = rec['io'] | |
1182 | if isinstance(rcontent, str): |
|
1178 | if isinstance(rcontent, str): | |
1183 | rcontent = self.session.unpack(rcontent) |
|
1179 | rcontent = self.session.unpack(rcontent) | |
1184 |
|
1180 | |||
1185 | md = self.metadata.setdefault(msg_id, Metadata()) |
|
1181 | md = self.metadata.setdefault(msg_id, Metadata()) | |
1186 | md.update(self._extract_metadata(header, parent, rcontent)) |
|
1182 | md.update(self._extract_metadata(header, parent, rcontent)) | |
1187 | md.update(iodict) |
|
1183 | md.update(iodict) | |
1188 |
|
1184 | |||
1189 | if rcontent['status'] == 'ok': |
|
1185 | if rcontent['status'] == 'ok': | |
1190 | res,buffers = ss.unserialize_object(buffers) |
|
1186 | res,buffers = ss.unserialize_object(buffers) | |
1191 | else: |
|
1187 | else: | |
1192 | res = ss.unwrap_exception(rcontent) |
|
1188 | res = ss.unwrap_exception(rcontent) | |
1193 | failures.append(res) |
|
1189 | failures.append(res) | |
1194 |
|
1190 | |||
1195 | self.results[msg_id] = res |
|
1191 | self.results[msg_id] = res | |
1196 | content[msg_id] = res |
|
1192 | content[msg_id] = res | |
1197 |
|
1193 | |||
1198 | error.collect_exceptions(failures, "get_results") |
|
1194 | error.collect_exceptions(failures, "get_results") | |
1199 | return content |
|
1195 | return content | |
1200 |
|
1196 | |||
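The local shortcut in `get_results` above answers any msg_id already present in `self.results` without a controller round-trip. A standalone sketch of that partition step, with illustrative names (not the client's actual API):

```python
def split_cached(msg_ids, cache):
    """Partition requested ids into locally cached results and ids
    that would still need a result_request to the controller."""
    local = {}
    remaining = []
    for msg_id in msg_ids:
        if msg_id in cache:
            local[msg_id] = cache[msg_id]
        else:
            remaining.append(msg_id)
    return local, remaining

# ids 'a' and 'c' are cached; only 'b' would go over the wire
local, remaining = split_cached(['a', 'b', 'c'], {'a': 1, 'c': 3})
```

Only `remaining` is sent in the `result_request`; the cached entries are merged back into the reply, exactly as the method does with `content.update(local_results)`.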
1201 | @spinfirst |
|
1197 | @spinfirst | |
1202 | def queue_status(self, targets=None, verbose=False): |
|
1198 | def queue_status(self, targets=None, verbose=False): | |
1203 | """Fetch the status of engine queues. |
|
1199 | """Fetch the status of engine queues. | |
1204 |
|
1200 | |||
1205 | Parameters |
|
1201 | Parameters | |
1206 | ---------- |
|
1202 | ---------- | |
1207 | targets : int/str/list of ints/strs |
|
1203 | targets : int/str/list of ints/strs | |
1208 | the engines on which to execute |
|
1204 | the engines on which to execute | |
1209 | default : all |
|
1205 | default : all | |
1210 | verbose : bool |
|
1206 | verbose : bool | |
1211 | Whether to return lengths only, or lists of ids for each element |
|
1207 | Whether to return lengths only, or lists of ids for each element | |
1212 | """ |
|
1208 | """ | |
1213 | targets = self._build_targets(targets)[1] |
|
1209 | targets = self._build_targets(targets)[1] | |
1214 | content = dict(targets=targets, verbose=verbose) |
|
1210 | content = dict(targets=targets, verbose=verbose) | |
1215 | self.session.send(self._query_socket, "queue_request", content=content) |
|
1211 | self.session.send(self._query_socket, "queue_request", content=content) | |
1216 | idents,msg = self.session.recv(self._query_socket, 0) |
|
1212 | idents,msg = self.session.recv(self._query_socket, 0) | |
1217 | if self.debug: |
|
1213 | if self.debug: | |
1218 | pprint(msg) |
|
1214 | pprint(msg) | |
1219 | content = msg['content'] |
|
1215 | content = msg['content'] | |
1220 | status = content.pop('status') |
|
1216 | status = content.pop('status') | |
1221 | if status != 'ok': |
|
1217 | if status != 'ok': | |
1222 | raise ss.unwrap_exception(content) |
|
1218 | raise ss.unwrap_exception(content) | |
1223 | return ss.rekey(content) |
|
1219 | return ss.rekey(content) | |
1224 |
|
1220 | |||
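`queue_status` returns `ss.rekey(content)`. The reply travels as JSON, whose object keys are always strings, so engine ids arrive as `'0'`, `'1'`, and so on; a rekey step restores integer keys. A plausible sketch of such a helper (assumed behaviour, not the actual streamsession code):

```python
def rekey(d):
    """Return a copy of d with keys that parse as integers cast to
    int; other keys (e.g. 'unassigned') are kept as-is."""
    out = {}
    for key, value in d.items():
        try:
            out[int(key)] = value
        except (TypeError, ValueError):
            out[key] = value
    return out

status = rekey({'0': {'queue': 2}, '1': {'queue': 0}, 'unassigned': 5})
```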
1225 | @spinfirst |
|
1221 | @spinfirst | |
1226 | def purge_results(self, msg_ids=[], targets=[]): |
|
1222 | def purge_results(self, msg_ids=[], targets=[]): | |
1227 | """Tell the controller to forget results. |
|
1223 | """Tell the controller to forget results. | |
1228 |
|
1224 | |||
1229 | Individual results can be purged by msg_id, or the entire |
|
1225 | Individual results can be purged by msg_id, or the entire | |
1230 | history of specific targets can be purged. |
|
1226 | history of specific targets can be purged. | |
1231 |
|
1227 | |||
1232 | Parameters |
|
1228 | Parameters | |
1233 | ---------- |
|
1229 | ---------- | |
1234 | msg_ids : str or list of strs |
|
1230 | msg_ids : str or list of strs | |
1235 | the msg_ids whose results should be forgotten. |
|
1231 | the msg_ids whose results should be forgotten. | |
1236 | targets : int/str/list of ints/strs |
|
1232 | targets : int/str/list of ints/strs | |
1237 | The targets, by uuid or int_id, whose entire history is to be purged. |
|
1233 | The targets, by uuid or int_id, whose entire history is to be purged. | |
1238 | Use `targets='all'` to scrub everything from the controller's memory. |
|
1234 | Use `targets='all'` to scrub everything from the controller's memory. | |
1239 |
|
1235 | |||
1240 | default : None |
|
1236 | default : None | |
1241 | """ |
|
1237 | """ | |
1242 | if not targets and not msg_ids: |
|
1238 | if not targets and not msg_ids: | |
1243 | raise ValueError |
|
1239 | raise ValueError | |
1244 | if targets: |
|
1240 | if targets: | |
1245 | targets = self._build_targets(targets)[1] |
|
1241 | targets = self._build_targets(targets)[1] | |
1246 | content = dict(targets=targets, msg_ids=msg_ids) |
|
1242 | content = dict(targets=targets, msg_ids=msg_ids) | |
1247 | self.session.send(self._query_socket, "purge_request", content=content) |
|
1243 | self.session.send(self._query_socket, "purge_request", content=content) | |
1248 | idents, msg = self.session.recv(self._query_socket, 0) |
|
1244 | idents, msg = self.session.recv(self._query_socket, 0) | |
1249 | if self.debug: |
|
1245 | if self.debug: | |
1250 | pprint(msg) |
|
1246 | pprint(msg) | |
1251 | content = msg['content'] |
|
1247 | content = msg['content'] | |
1252 | if content['status'] != 'ok': |
|
1248 | if content['status'] != 'ok': | |
1253 | raise ss.unwrap_exception(content) |
|
1249 | raise ss.unwrap_exception(content) | |
1254 |
|
1250 | |||
1255 | #---------------------------------------- |
|
1251 | #---------------------------------------- | |
1256 | # activate for %px,%autopx magics |
|
1252 | # activate for %px,%autopx magics | |
1257 | #---------------------------------------- |
|
1253 | #---------------------------------------- | |
1258 | def activate(self): |
|
1254 | def activate(self): | |
1259 | """Make this `View` active for parallel magic commands. |
|
1255 | """Make this `View` active for parallel magic commands. | |
1260 |
|
1256 | |||
1261 | IPython has a magic command syntax to work with `MultiEngineClient` objects. |
|
1257 | IPython has a magic command syntax to work with `MultiEngineClient` objects. | |
1262 | In a given IPython session there is a single active one. While |
|
1258 | In a given IPython session there is a single active one. While | |
1263 | there can be many `Views` created and used by the user, |
|
1259 | there can be many `Views` created and used by the user, | |
1264 | there is only one active one. The active `View` is used whenever |
|
1260 | there is only one active one. The active `View` is used whenever | |
1265 | the magic commands %px and %autopx are used. |
|
1261 | the magic commands %px and %autopx are used. | |
1266 |
|
1262 | |||
1267 | The activate() method is called on a given `View` to make it |
|
1263 | The activate() method is called on a given `View` to make it | |
1268 | active. Once this has been done, the magic commands can be used. |
|
1264 | active. Once this has been done, the magic commands can be used. | |
1269 | """ |
|
1265 | """ | |
1270 |
|
1266 | |||
1271 | try: |
|
1267 | try: | |
1272 | # This is injected into __builtins__. |
|
1268 | # This is injected into __builtins__. | |
1273 | ip = get_ipython() |
|
1269 | ip = get_ipython() | |
1274 | except NameError: |
|
1270 | except NameError: | |
1275 | print "The IPython parallel magics (%result, %px, %autopx) only work within IPython." |
|
1271 | print "The IPython parallel magics (%result, %px, %autopx) only work within IPython." | |
1276 | else: |
|
1272 | else: | |
1277 | pmagic = ip.plugin_manager.get_plugin('parallelmagic') |
|
1273 | pmagic = ip.plugin_manager.get_plugin('parallelmagic') | |
1278 | if pmagic is not None: |
|
1274 | if pmagic is not None: | |
1279 | pmagic.active_multiengine_client = self |
|
1275 | pmagic.active_multiengine_client = self | |
1280 | else: |
|
1276 | else: | |
1281 | print "You must first load the parallelmagic extension " \ |
|
1277 | print "You must first load the parallelmagic extension " \ | |
1282 | "by doing '%load_ext parallelmagic'" |
|
1278 | "by doing '%load_ext parallelmagic'" | |
1283 |
|
1279 | |||
1284 | class AsynClient(Client): |
|
1280 | class AsynClient(Client): | |
1285 | """An Asynchronous client, using the Tornado Event Loop. |
|
1281 | """An Asynchronous client, using the Tornado Event Loop. | |
1286 | !!!unfinished!!!""" |
|
1282 | !!!unfinished!!!""" | |
1287 | io_loop = None |
|
1283 | io_loop = None | |
1288 | _queue_stream = None |
|
1284 | _queue_stream = None | |
1289 | _notifier_stream = None |
|
1285 | _notifier_stream = None | |
1290 | _task_stream = None |
|
1286 | _task_stream = None | |
1291 | _control_stream = None |
|
1287 | _control_stream = None | |
1292 |
|
1288 | |||
1293 | def __init__(self, addr, context=None, username=None, debug=False, io_loop=None): |
|
1289 | def __init__(self, addr, context=None, username=None, debug=False, io_loop=None): | |
1294 | Client.__init__(self, addr, context, username, debug) |
|
1290 | Client.__init__(self, addr, context, username, debug) | |
1295 | if io_loop is None: |
|
1291 | if io_loop is None: | |
1296 | io_loop = ioloop.IOLoop.instance() |
|
1292 | io_loop = ioloop.IOLoop.instance() | |
1297 | self.io_loop = io_loop |
|
1293 | self.io_loop = io_loop | |
1298 |
|
1294 | |||
1299 | self._queue_stream = zmqstream.ZMQStream(self._mux_socket, io_loop) |
|
1295 | self._queue_stream = zmqstream.ZMQStream(self._mux_socket, io_loop) | |
1300 | self._control_stream = zmqstream.ZMQStream(self._control_socket, io_loop) |
|
1296 | self._control_stream = zmqstream.ZMQStream(self._control_socket, io_loop) | |
1301 | self._task_stream = zmqstream.ZMQStream(self._task_socket, io_loop) |
|
1297 | self._task_stream = zmqstream.ZMQStream(self._task_socket, io_loop) | |
1302 | self._notification_stream = zmqstream.ZMQStream(self._notification_socket, io_loop) |
|
1298 | self._notification_stream = zmqstream.ZMQStream(self._notification_socket, io_loop) | |
1303 |
|
1299 | |||
1304 | def spin(self): |
|
1300 | def spin(self): | |
1305 | for stream in (self.queue_stream, self.notifier_stream, |
|
1301 | for stream in (self.queue_stream, self.notifier_stream, | |
1306 | self.task_stream, self.control_stream): |
|
1302 | self.task_stream, self.control_stream): | |
1307 | stream.flush() |
|
1303 | stream.flush() | |
1308 |
|
1304 | |||
1309 | __all__ = [ 'Client', |
|
1305 | __all__ = [ 'Client', | |
1310 | 'depend', |
|
1306 | 'depend', | |
1311 | 'require', |
|
1307 | 'require', | |
1312 | 'remote', |
|
1308 | 'remote', | |
1313 | 'parallel', |
|
1309 | 'parallel', | |
1314 | 'RemoteFunction', |
|
1310 | 'RemoteFunction', | |
1315 | 'ParallelFunction', |
|
1311 | 'ParallelFunction', | |
1316 | 'DirectView', |
|
1312 | 'DirectView', | |
1317 | 'LoadBalancedView', |
|
1313 | 'LoadBalancedView', | |
1318 | 'AsyncResult', |
|
1314 | 'AsyncResult', | |
1319 | 'AsyncMapResult' |
|
1315 | 'AsyncMapResult' | |
1320 | ] |
|
1316 | ] |
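The `gather` path in the client above pulls one chunk per engine and re-joins them with a `Map.dists[dist]()` object. A toy sketch of a block distribution — contiguous near-equal chunks re-joined by concatenation (class name hypothetical; the real implementations live in the `Map` module):

```python
class BlockMap:
    """Toy block distribution: contiguous, near-equal chunks."""

    def partition(self, seq, n):
        # the first len(seq) % n chunks get one extra element
        q, r = divmod(len(seq), n)
        chunks, start = [], 0
        for i in range(n):
            size = q + (1 if i < r else 0)
            chunks.append(seq[start:start + size])
            start += size
        return chunks

    def join(self, chunks):
        joined = []
        for chunk in chunks:
            joined.extend(chunk)
        return joined

m = BlockMap()
chunks = m.partition(list(range(10)), 3)   # one chunk per engine
```

Scatter pushes one chunk per engine; gather reverses it by pulling each engine's chunk and calling `join`, which is why `gather` issues one `pull` per target before building the `AsyncMapResult`.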
@@ -1,111 +1,110 b'' | |||||
1 | """Dependency utilities""" |
|
1 | """Dependency utilities""" | |
2 |
|
2 | |||
3 | from IPython.external.decorator import decorator |
|
3 | from IPython.external.decorator import decorator | |
4 | from error import UnmetDependency |
|
4 | from error import UnmetDependency | |
5 |
|
5 | from asyncresult import AsyncResult | ||
6 |
|
||||
7 | # flags |
|
|||
8 | ALL = 1 << 0 |
|
|||
9 | ANY = 1 << 1 |
|
|||
10 | HERE = 1 << 2 |
|
|||
11 | ANYWHERE = 1 << 3 |
|
|||
12 |
|
6 | |||
13 |
|
7 | |||
14 | class depend(object): |
|
8 | class depend(object): | |
15 | """Dependency decorator, for use with tasks.""" |
|
9 | """Dependency decorator, for use with tasks.""" | |
16 | def __init__(self, f, *args, **kwargs): |
|
10 | def __init__(self, f, *args, **kwargs): | |
17 | self.f = f |
|
11 | self.f = f | |
18 | self.args = args |
|
12 | self.args = args | |
19 | self.kwargs = kwargs |
|
13 | self.kwargs = kwargs | |
20 |
|
14 | |||
21 | def __call__(self, f): |
|
15 | def __call__(self, f): | |
22 | return dependent(f, self.f, *self.args, **self.kwargs) |
|
16 | return dependent(f, self.f, *self.args, **self.kwargs) | |
23 |
|
17 | |||
24 | class dependent(object): |
|
18 | class dependent(object): | |
25 | """A function that depends on another function. |
|
19 | """A function that depends on another function. | |
26 | This is an object to prevent the closure used |
|
20 | This is an object to prevent the closure used | |
27 | in traditional decorators, which are not picklable. |
|
21 | in traditional decorators, which are not picklable. | |
28 | """ |
|
22 | """ | |
29 |
|
23 | |||
30 | def __init__(self, f, df, *dargs, **dkwargs): |
|
24 | def __init__(self, f, df, *dargs, **dkwargs): | |
31 | self.f = f |
|
25 | self.f = f | |
32 | self.func_name = getattr(f, '__name__', 'f') |
|
26 | self.func_name = getattr(f, '__name__', 'f') | |
33 | self.df = df |
|
27 | self.df = df | |
34 | self.dargs = dargs |
|
28 | self.dargs = dargs | |
35 | self.dkwargs = dkwargs |
|
29 | self.dkwargs = dkwargs | |
36 |
|
30 | |||
37 | def __call__(self, *args, **kwargs): |
|
31 | def __call__(self, *args, **kwargs): | |
38 | if self.df(*self.dargs, **self.dkwargs) is False: |
|
32 | if self.df(*self.dargs, **self.dkwargs) is False: | |
39 | raise UnmetDependency() |
|
33 | raise UnmetDependency() | |
40 | return self.f(*args, **kwargs) |
|
34 | return self.f(*args, **kwargs) | |
41 |
|
35 | |||
42 | @property |
|
36 | @property | |
43 | def __name__(self): |
|
37 | def __name__(self): | |
44 | return self.func_name |
|
38 | return self.func_name | |
45 |
|
39 | |||
46 | def _require(*names): |
|
40 | def _require(*names): | |
47 | for name in names: |
|
41 | for name in names: | |
48 | try: |
|
42 | try: | |
49 | __import__(name) |
|
43 | __import__(name) | |
50 | except ImportError: |
|
44 | except ImportError: | |
51 | return False |
|
45 | return False | |
52 | return True |
|
46 | return True | |
53 |
|
47 | |||
54 | def require(*names): |
|
48 | def require(*names): | |
55 | return depend(_require, *names) |
|
49 | return depend(_require, *names) | |
56 |
|
50 | |||
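`require(*names)` wraps `_require`, which simply attempts each import on the engine and reports success. The check is easy to exercise on its own:

```python
def importable(*names):
    """Return True only if every named module imports cleanly --
    the same test _require performs on the engine."""
    for name in names:
        try:
            __import__(name)
        except ImportError:
            return False
    return True
```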
57 | class Dependency(set): |
|
51 | class Dependency(set): | |
58 | """An object for representing a set of msg_id dependencies. |
|
52 | """An object for representing a set of msg_id dependencies. | |
59 |
|
53 | |||
60 | Subclassed from set().""" |
|
54 | Subclassed from set().""" | |
61 |
|
55 | |||
62 | mode='all' |
|
56 | all=True | |
63 | success_only=True |
|
57 | success_only=True | |
64 |
|
58 | |||
65 | def __init__(self, dependencies=[], mode='all', success_only=True): |
|
59 | def __init__(self, dependencies=[], all=True, success_only=True): | |
66 | if isinstance(dependencies, dict): |
|
60 | if isinstance(dependencies, dict): | |
67 | # load from dict |
|
61 | # load from dict | |
68 | mode = dependencies.get('mode', mode) |
|
62 | all = dependencies.get('all', True) | |
69 | success_only = dependencies.get('success_only', success_only) |
|
63 | success_only = dependencies.get('success_only', success_only) | |
70 | dependencies = dependencies.get('dependencies', []) |
|
64 | dependencies = dependencies.get('dependencies', []) | |
71 | set.__init__(self, dependencies) |
|
65 | ids = [] | |
72 | self.mode = mode.lower() |
|
66 | if isinstance(dependencies, AsyncResult): | |
|
67 | ids.extend(dependencies.msg_ids) |
|
68 | else: | |||
|
69 | for d in dependencies: | |||
|
70 | if isinstance(d, basestring): | |||
|
71 | ids.append(d) | |||
|
72 | elif isinstance(d, AsyncResult): | |||
|
73 | ids.extend(d.msg_ids) | |||
|
74 | else: | |||
|
75 | raise TypeError("invalid dependency type: %r"%type(d)) | |||
|
76 | set.__init__(self, ids) | |||
|
77 | self.all = all | |||
73 | self.success_only=success_only |
|
78 | self.success_only=success_only | |
74 | if self.mode not in ('any', 'all'): |
|
|||
75 | raise NotImplementedError("Only any|all supported, not %r"%mode) |
|
|||
76 |
|
79 | |||
77 | def check(self, completed, failed=None): |
|
80 | def check(self, completed, failed=None): | |
78 | if failed is not None and not self.success_only: |
|
81 | if failed is not None and not self.success_only: | |
79 | completed = completed.union(failed) |
|
82 | completed = completed.union(failed) | |
80 | if len(self) == 0: |
|
83 | if len(self) == 0: | |
81 | return True |
|
84 | return True | |
82 | if self.mode == 'all': |
|
85 | if self.all: | |
83 | return self.issubset(completed) |
|
86 | return self.issubset(completed) | |
84 | elif self.mode == 'any': |
|
|||
85 | return not self.isdisjoint(completed) |
|
|||
86 | else: |
|
87 | else: | |
87 | raise NotImplementedError("Only any|all supported, not %r"%mode) |
|
88 | return not self.isdisjoint(completed) | |
88 |
|
89 | |||
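The `check` method above is plain set algebra: in all-mode every dependency must appear in the completed set (`issubset`), any-mode needs just one (`not isdisjoint`), and with `success_only` off the failed set counts as done too. A standalone sketch of the predicate (names illustrative):

```python
def deps_met(deps, completed, failed=frozenset(),
             require_all=True, success_only=True):
    """Set-algebra core of a Dependency-style check."""
    deps = set(deps)
    if not success_only:
        # failed results still satisfy the dependency
        completed = set(completed) | set(failed)
    if not deps:
        return True                        # nothing to wait for
    if require_all:
        return deps.issubset(completed)    # every dependency finished
    return not deps.isdisjoint(completed)  # at least one finished
```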
89 | def unreachable(self, failed): |
|
90 | def unreachable(self, failed): | |
90 | if len(self) == 0 or len(failed) == 0 or not self.success_only: |
|
91 | if len(self) == 0 or len(failed) == 0 or not self.success_only: | |
91 | return False |
|
92 | return False | |
92 | print self, self.success_only, self.mode, failed |
|
93 | # print self, self.success_only, self.all, failed | |
93 | if self.mode == 'all': |
|
94 | if self.all: | |
94 | return not self.isdisjoint(failed) |
|
95 | return not self.isdisjoint(failed) | |
95 | elif self.mode == 'any': |
|
|||
96 | return self.issubset(failed) |
|
|||
97 | else: |
|
96 | else: | |
98 | raise NotImplementedError("Only any|all supported, not %r"%mode) |
|
97 | return self.issubset(failed) | |
99 |
|
98 | |||
100 |
|
99 | |||
101 | def as_dict(self): |
|
100 | def as_dict(self): | |
102 | """Represent this dependency as a dict. For json compatibility.""" |
|
101 | """Represent this dependency as a dict. For json compatibility.""" | |
103 | return dict( |
|
102 | return dict( | |
104 | dependencies=list(self), |
|
103 | dependencies=list(self), | |
105 |
|
|
104 | all=self.all, | |
106 | success_only=self.success_only, |
|
105 | success_only=self.success_only, | |
107 | ) |
|
106 | ) | |
108 |
|
107 | |||
109 |
|
108 | |||
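`as_dict` exists because a `set` is not JSON-serializable; `list(self)` makes the msg_ids transportable, and the receiving side can rebuild the set. A quick round trip with plain dicts (not the `Dependency` class itself):

```python
import json

dep = {"dependencies": sorted({"msg-1", "msg-2"}),  # set -> list for JSON
       "all": True,
       "success_only": True}
restored = json.loads(json.dumps(dep))
rebuilt = set(restored["dependencies"])    # list -> set on the far side
```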
110 | __all__ = ['depend', 'require', 'Dependency'] |
|
109 | __all__ = ['depend', 'require', 'dependent', 'Dependency'] | |
111 |
|
110 |
@@ -1,292 +1,295 b'' | |||||
1 | # encoding: utf-8 |
|
1 | # encoding: utf-8 | |
2 |
|
2 | |||
3 | """Classes and functions for kernel related errors and exceptions.""" |
|
3 | """Classes and functions for kernel related errors and exceptions.""" | |
4 | from __future__ import print_function |
|
4 | from __future__ import print_function | |
5 |
|
5 | |||
6 | __docformat__ = "restructuredtext en" |
|
6 | __docformat__ = "restructuredtext en" | |
7 |
|
7 | |||
8 | # Tell nose to skip this module |
|
8 | # Tell nose to skip this module | |
9 | __test__ = {} |
|
9 | __test__ = {} | |
10 |
|
10 | |||
11 | #------------------------------------------------------------------------------- |
|
11 | #------------------------------------------------------------------------------- | |
12 | # Copyright (C) 2008 The IPython Development Team |
|
12 | # Copyright (C) 2008 The IPython Development Team | |
13 | # |
|
13 | # | |
14 | # Distributed under the terms of the BSD License. The full license is in |
|
14 | # Distributed under the terms of the BSD License. The full license is in | |
15 | # the file COPYING, distributed as part of this software. |
|
15 | # the file COPYING, distributed as part of this software. | |
16 | #------------------------------------------------------------------------------- |
|
16 | #------------------------------------------------------------------------------- | |
17 |
|
17 | |||
18 | #------------------------------------------------------------------------------- |
|
18 | #------------------------------------------------------------------------------- | |
19 | # Error classes |
|
19 | # Error classes | |
20 | #------------------------------------------------------------------------------- |
|
20 | #------------------------------------------------------------------------------- | |
21 | class IPythonError(Exception): |
|
21 | class IPythonError(Exception): | |
22 | """Base exception that all of our exceptions inherit from. |
|
22 | """Base exception that all of our exceptions inherit from. | |
23 |
|
23 | |||
24 | This can be raised by code that doesn't have any more specific |
|
24 | This can be raised by code that doesn't have any more specific | |
25 | information.""" |
|
25 | information.""" | |
26 |
|
26 | |||
27 | pass |
|
27 | pass | |
28 |
|
28 | |||
29 | # Exceptions associated with the controller objects |
|
29 | # Exceptions associated with the controller objects | |
30 | class ControllerError(IPythonError): pass |
|
30 | class ControllerError(IPythonError): pass | |
31 |
|
31 | |||
32 | class ControllerCreationError(ControllerError): pass |
|
32 | class ControllerCreationError(ControllerError): pass | |
33 |
|
33 | |||
34 |
|
34 | |||
35 | # Exceptions associated with the Engines |
|
35 | # Exceptions associated with the Engines | |
36 | class EngineError(IPythonError): pass |
|
36 | class EngineError(IPythonError): pass | |
37 |
|
37 | |||
38 | class EngineCreationError(EngineError): pass |
|
38 | class EngineCreationError(EngineError): pass | |
39 |
|
39 | |||
40 | class KernelError(IPythonError): |
|
40 | class KernelError(IPythonError): | |
41 | pass |
|
41 | pass | |
42 |
|
42 | |||
43 | class NotDefined(KernelError): |
|
43 | class NotDefined(KernelError): | |
44 | def __init__(self, name): |
|
44 | def __init__(self, name): | |
45 | self.name = name |
|
45 | self.name = name | |
46 | self.args = (name,) |
|
46 | self.args = (name,) | |
47 |
|
47 | |||
48 | def __repr__(self): |
|
48 | def __repr__(self): | |
49 | return '<NotDefined: %s>' % self.name |
|
49 | return '<NotDefined: %s>' % self.name | |
50 |
|
50 | |||
51 | __str__ = __repr__ |
|
51 | __str__ = __repr__ | |
52 |
|
52 | |||
53 |
|
53 | |||
54 | class QueueCleared(KernelError): |
|
54 | class QueueCleared(KernelError): | |
55 | pass |
|
55 | pass | |
56 |
|
56 | |||
57 |
|
57 | |||
58 | class IdInUse(KernelError): |
|
58 | class IdInUse(KernelError): | |
59 | pass |
|
59 | pass | |
60 |
|
60 | |||
61 |
|
61 | |||
62 | class ProtocolError(KernelError): |
|
62 | class ProtocolError(KernelError): | |
63 | pass |
|
63 | pass | |
64 |
|
64 | |||
65 |
|
65 | |||
66 | class ConnectionError(KernelError): |
|
66 | class ConnectionError(KernelError): | |
67 | pass |
|
67 | pass | |
68 |
|
68 | |||
69 |
|
69 | |||
70 | class InvalidEngineID(KernelError): |
|
70 | class InvalidEngineID(KernelError): | |
71 | pass |
|
71 | pass | |
72 |
|
72 | |||
73 |
|
73 | |||
74 | class NoEnginesRegistered(KernelError): |
|
74 | class NoEnginesRegistered(KernelError): | |
75 | pass |
|
75 | pass | |
76 |
|
76 | |||
77 |
|
77 | |||
78 | class InvalidClientID(KernelError): |
|
78 | class InvalidClientID(KernelError): | |
79 | pass |
|
79 | pass | |
80 |
|
80 | |||
81 |
|
81 | |||
82 | class InvalidDeferredID(KernelError): |
|
82 | class InvalidDeferredID(KernelError): | |
83 | pass |
|
83 | pass | |
84 |
|
84 | |||
85 |
|
85 | |||
86 | class SerializationError(KernelError): |
|
86 | class SerializationError(KernelError): | |
87 | pass |
|
87 | pass | |
88 |
|
88 | |||
89 |
|
89 | |||
90 | class MessageSizeError(KernelError): |
|
90 | class MessageSizeError(KernelError): | |
91 | pass |
|
91 | pass | |
92 |
|
92 | |||
93 |
|
93 | |||
94 | class PBMessageSizeError(MessageSizeError): |
|
94 | class PBMessageSizeError(MessageSizeError): | |
95 | pass |
|
95 | pass | |
96 |
|
96 | |||
97 |
|
97 | |||
98 | class ResultNotCompleted(KernelError): |
|
98 | class ResultNotCompleted(KernelError): | |
99 | pass |
|
99 | pass | |
100 |
|
100 | |||
101 |
|
101 | |||
102 | class ResultAlreadyRetrieved(KernelError): |
|
102 | class ResultAlreadyRetrieved(KernelError): | |
103 | pass |
|
103 | pass | |
104 |
|
104 | |||
105 | class ClientError(KernelError): |
|
105 | class ClientError(KernelError): | |
106 | pass |
|
106 | pass | |
107 |
|
107 | |||
108 |
|
108 | |||
109 | class TaskAborted(KernelError): |
|
109 | class TaskAborted(KernelError): | |
110 | pass |
|
110 | pass | |
111 |
|
111 | |||
112 |
|
112 | |||
113 | class TaskTimeout(KernelError): |
|
113 | class TaskTimeout(KernelError): | |
114 | pass |
|
114 | pass | |
115 |
|
115 | |||
116 |
|
116 | |||
117 | class NotAPendingResult(KernelError): |
|
117 | class NotAPendingResult(KernelError): | |
118 | pass |
|
118 | pass | |
119 |
|
119 | |||
120 |
|
120 | |||
121 | class UnpickleableException(KernelError): |
|
121 | class UnpickleableException(KernelError): | |
122 | pass |
|
122 | pass | |
123 |
|
123 | |||
124 |
|
124 | |||
125 | class AbortedPendingDeferredError(KernelError): |
|
125 | class AbortedPendingDeferredError(KernelError): | |
126 | pass |
|
126 | pass | |
127 |
|
127 | |||
128 |
|
128 | |||
129 | class InvalidProperty(KernelError): |
|
129 | class InvalidProperty(KernelError): | |
130 | pass |
|
130 | pass | |
131 |
|
131 | |||
132 |
|
132 | |||
133 | class MissingBlockArgument(KernelError): |
|
133 | class MissingBlockArgument(KernelError): | |
134 | pass |
|
134 | pass | |
135 |
|
135 | |||
136 |
|
136 | |||
137 | class StopLocalExecution(KernelError): |
|
137 | class StopLocalExecution(KernelError): | |
138 | pass |
|
138 | pass | |
139 |
|
139 | |||
140 |
|
140 | |||
141 | class SecurityError(KernelError): |
|
141 | class SecurityError(KernelError): | |
142 | pass |
|
142 | pass | |
143 |
|
143 | |||
144 |
|
144 | |||
145 | class FileTimeoutError(KernelError): |
|
145 | class FileTimeoutError(KernelError): | |
146 | pass |
|
146 | pass | |
147 |
|
147 | |||
148 | class TimeoutError(KernelError): |
|
148 | class TimeoutError(KernelError): | |
149 | pass |
|
149 | pass | |
150 |
|
150 | |||
151 | class UnmetDependency(KernelError): |
|
151 | class UnmetDependency(KernelError): | |
152 | pass |
|
152 | pass | |
153 |
|
153 | |||
154 | class ImpossibleDependency(UnmetDependency): |
|
154 | class ImpossibleDependency(UnmetDependency): | |
155 | pass |
|
155 | pass | |
156 |
|
156 | |||
157 | class DependencyTimeout(UnmetDependency): |
|
157 | class DependencyTimeout(ImpossibleDependency): | |
|
158 | pass | |||
|
159 | ||||
|
160 | class InvalidDependency(ImpossibleDependency): | |||
158 | pass |
|
161 | pass | |
159 |
|
162 | |||
class RemoteError(KernelError):
    """Error raised elsewhere"""
    ename = None
    evalue = None
    traceback = None
    engine_info = None

    def __init__(self, ename, evalue, traceback, engine_info=None):
        self.ename = ename
        self.evalue = evalue
        self.traceback = traceback
        self.engine_info = engine_info or {}
        self.args = (ename, evalue)

    def __repr__(self):
        engineid = self.engine_info.get('engineid', ' ')
        return "<Remote[%s]:%s(%s)>" % (engineid, self.ename, self.evalue)

    def __str__(self):
        sig = "%s(%s)" % (self.ename, self.evalue)
        if self.traceback:
            return sig + '\n' + self.traceback
        else:
            return sig
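The ename/evalue/traceback triple carried by `RemoteError` can be exercised in isolation. This is a minimal re-creation of the pattern for illustration, with a stand-in `KernelError`; it is not IPython's actual class:

```python
# Minimal re-creation of the RemoteError pattern (stand-in classes,
# not IPython's real ones).
class KernelError(Exception):
    pass

class RemoteError(KernelError):
    """Error raised elsewhere"""
    def __init__(self, ename, evalue, traceback, engine_info=None):
        self.ename = ename
        self.evalue = evalue
        self.traceback = traceback
        self.engine_info = engine_info or {}
        self.args = (ename, evalue)

    def __repr__(self):
        engineid = self.engine_info.get('engineid', ' ')
        return "<Remote[%s]:%s(%s)>" % (engineid, self.ename, self.evalue)

    def __str__(self):
        sig = "%s(%s)" % (self.ename, self.evalue)
        return sig + '\n' + self.traceback if self.traceback else sig

err = RemoteError('ZeroDivisionError', 'division by zero', None,
                  engine_info={'engineid': 3})
print(repr(err))  # <Remote[3]:ZeroDivisionError(division by zero)>
```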


class TaskRejectError(KernelError):
    """Exception to raise when a task should be rejected by an engine.

    This exception can be used to allow a task running on an engine to test
    if the engine (or the user's namespace on the engine) has the needed
    task dependencies.  If not, the task should raise this exception.  For
    the task to be retried on another engine, the task should be created
    with the `retries` argument > 1.

    The advantage of this approach over our older properties system is that
    tasks have full access to the user's namespace on the engines and the
    properties don't have to be managed or tested by the controller.
    """


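The rejection pattern the docstring describes can be sketched as follows. `TaskRejectError` here is a stand-in, and `my_task`/`have_numpy` are hypothetical names, not part of the library:

```python
# Sketch of the rejection pattern: a task checks the engine's capabilities
# and raises to ask the scheduler to retry elsewhere (stand-in classes).
class KernelError(Exception):
    pass

class TaskRejectError(KernelError):
    """Raised by a task to tell the scheduler this engine can't run it."""

def my_task():
    have_numpy = False  # stand-in for a real check of the engine namespace
    if not have_numpy:
        # with retries > 1, the scheduler would resubmit to another engine
        raise TaskRejectError("numpy is required for this task")
    return "result"

try:
    my_task()
except TaskRejectError as exc:
    print("rejected:", exc)
```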
class CompositeError(KernelError):
    """Error for representing possibly multiple errors on engines"""
    def __init__(self, message, elist):
        Exception.__init__(self, *(message, elist))
        # Don't use pack_exception because it will conflict with the .message
        # attribute that is being deprecated in 2.6 and beyond.
        self.msg = message
        self.elist = elist
        self.args = [ e[0] for e in elist ]

    def _get_engine_str(self, ei):
        if not ei:
            return '[Engine Exception]'
        else:
            return '[%i:%s]: ' % (ei['engineid'], ei['method'])

    def _get_traceback(self, ev):
        try:
            tb = ev._ipython_traceback_text
        except AttributeError:
            return 'No traceback available'
        else:
            return tb

    def __str__(self):
        s = str(self.msg)
        for en, ev, etb, ei in self.elist:
            engine_str = self._get_engine_str(ei)
            s = s + '\n' + engine_str + en + ': ' + str(ev)
        return s

    def __repr__(self):
        return "CompositeError(%i)" % len(self.elist)

    def print_tracebacks(self, excid=None):
        if excid is None:
            for (en, ev, etb, ei) in self.elist:
                print(self._get_engine_str(ei))
                print(etb or 'No traceback available')
                print()
        else:
            try:
                en, ev, etb, ei = self.elist[excid]
            except IndexError:
                raise IndexError("an exception with index %i does not exist" % excid)
            else:
                print(self._get_engine_str(ei))
                print(etb or 'No traceback available')

    def raise_exception(self, excid=0):
        try:
            en, ev, etb, ei = self.elist[excid]
        except IndexError:
            raise IndexError("an exception with index %i does not exist" % excid)
        else:
            raise RemoteError(en, ev, etb, ei)


def collect_exceptions(rdict_or_list, method='unspecified'):
    """check a result dict for errors, and raise CompositeError if any exist.
    Passthrough otherwise."""
    elist = []
    if isinstance(rdict_or_list, dict):
        rlist = rdict_or_list.values()
    else:
        rlist = rdict_or_list
    for r in rlist:
        if isinstance(r, RemoteError):
            en, ev, etb, ei = r.ename, r.evalue, r.traceback, r.engine_info
            # Sometimes we could have CompositeError in our list.  Just take
            # the errors out of them and put them in our new list.  This
            # has the effect of flattening lists of CompositeErrors into one
            # CompositeError
            if en == 'CompositeError':
                for e in ev.elist:
                    elist.append(e)
            else:
                elist.append((en, ev, etb, ei))
    if len(elist) == 0:
        return rdict_or_list
    else:
        msg = "one or more exceptions from call to method: %s" % (method)
        # This silliness is needed so the debugger has access to the exception
        # instance (e in this case)
        try:
            raise CompositeError(msg, elist)
        except CompositeError, e:
            raise e

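The flattening behavior of `collect_exceptions` can be demonstrated with minimal stand-in classes (these are simplified re-creations, not IPython's actual ones): a nested `CompositeError` entry is spliced into one flat error list.

```python
# Minimal sketch of the flattening behavior: nested CompositeError entries
# are expanded into one flat error list (stand-in classes for illustration).
class CompositeError(Exception):
    def __init__(self, message, elist):
        super().__init__(message, elist)
        self.msg, self.elist = message, elist

class RemoteError(Exception):
    def __init__(self, ename, evalue, traceback=None, engine_info=None):
        self.ename, self.evalue = ename, evalue
        self.traceback, self.engine_info = traceback, engine_info or {}

def collect(results):
    elist = []
    for r in results:
        if isinstance(r, RemoteError):
            if r.ename == 'CompositeError':
                # r.evalue is itself a CompositeError: splice its entries in
                elist.extend(r.evalue.elist)
            else:
                elist.append((r.ename, r.evalue, r.traceback, r.engine_info))
    if elist:
        raise CompositeError("one or more exceptions", elist)
    return results

inner = CompositeError("inner", [('ValueError', 'bad', None, {})])
nested = RemoteError('CompositeError', inner)
plain = RemoteError('TypeError', 'oops')
try:
    collect([plain, nested])
except CompositeError as e:
    print(len(e.elist))  # 2 -- the nested entry was flattened
```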
@@ -1,1053 +1,1053 @@
#!/usr/bin/env python
"""The IPython Controller Hub with 0MQ
This is the master object that handles connections from engines and clients,
and monitors traffic through the various queues.
"""
#-----------------------------------------------------------------------------
#  Copyright (C) 2010  The IPython Development Team
#
#  Distributed under the terms of the BSD License.  The full license is in
#  the file COPYING, distributed as part of this software.
#-----------------------------------------------------------------------------

#-----------------------------------------------------------------------------
# Imports
#-----------------------------------------------------------------------------
from __future__ import print_function

import sys
from datetime import datetime
import time
import logging

import zmq
from zmq.eventloop import ioloop
from zmq.eventloop.zmqstream import ZMQStream

# internal:
from IPython.config.configurable import Configurable
from IPython.utils.traitlets import HasTraits, Instance, Int, Str, Dict, Set, List, Bool
from IPython.utils.importstring import import_item

from entry_point import select_random_ports
from factory import RegistrationFactory, LoggingFactory

from streamsession import Message, wrap_exception, ISO8601
from heartmonitor import HeartMonitor
from util import validate_url_container

try:
    from pymongo.binary import Binary
except ImportError:
    MongoDB = None
else:
    from mongodb import MongoDB

#-----------------------------------------------------------------------------
# Code
#-----------------------------------------------------------------------------

def _passer(*args, **kwargs):
    return

def _printer(*args, **kwargs):
    print(args)
    print(kwargs)

def init_record(msg):
    """Initialize a TaskRecord based on a request."""
    header = msg['header']
    return {
        'msg_id' : header['msg_id'],
        'header' : header,
        'content': msg['content'],
        'buffers': msg['buffers'],
        'submitted': datetime.strptime(header['date'], ISO8601),
        'client_uuid' : None,
        'engine_uuid' : None,
        'started': None,
        'completed': None,
        'resubmitted': None,
        'result_header' : None,
        'result_content' : None,
        'result_buffers' : None,
        'queue' : None,
        'pyin' : None,
        'pyout': None,
        'pyerr': None,
        'stdout': '',
        'stderr': '',
    }


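`init_record` fills its `submitted` field by parsing the message header's date with the `ISO8601` format constant imported from `streamsession`. The format string below is an assumption, chosen to match timestamps of the form `2010-06-01T12:30:45.123456`:

```python
# Parsing the 'date' header the way init_record does.
# ISO8601 here is an assumed value of the constant imported from
# streamsession, matching timestamps like 2010-06-01T12:30:45.123456.
from datetime import datetime

ISO8601 = "%Y-%m-%dT%H:%M:%S.%f"  # assumed definition

submitted = datetime.strptime("2010-06-01T12:30:45.123456", ISO8601)
print(submitted.year, submitted.microsecond)  # 2010 123456
```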
class EngineConnector(HasTraits):
    """A simple object for accessing the various zmq connections of an object.
    Attributes are:
    id (int): engine ID
    uuid (str): uuid (unused?)
    queue (str): identity of queue's XREQ socket
    registration (str): identity of registration XREQ socket
    heartbeat (str): identity of heartbeat XREQ socket
    """
    id = Int(0)
    queue = Str()
    control = Str()
    registration = Str()
    heartbeat = Str()
    pending = Set()

class HubFactory(RegistrationFactory):
    """The Configurable for setting up a Hub."""

    # name of a scheduler scheme
    scheme = Str('leastload', config=True)

    # port-pairs for monitoredqueues:
    hb = Instance(list, config=True)
    def _hb_default(self):
        return select_random_ports(2)

    mux = Instance(list, config=True)
    def _mux_default(self):
        return select_random_ports(2)

    task = Instance(list, config=True)
    def _task_default(self):
        return select_random_ports(2)

    control = Instance(list, config=True)
    def _control_default(self):
        return select_random_ports(2)

    iopub = Instance(list, config=True)
    def _iopub_default(self):
        return select_random_ports(2)

    # single ports:
    mon_port = Instance(int, config=True)
    def _mon_port_default(self):
        return select_random_ports(1)[0]

    query_port = Instance(int, config=True)
    def _query_port_default(self):
        return select_random_ports(1)[0]

    notifier_port = Instance(int, config=True)
    def _notifier_port_default(self):
        return select_random_ports(1)[0]

    ping = Int(1000, config=True) # ping frequency

    engine_ip = Str('127.0.0.1', config=True)
    engine_transport = Str('tcp', config=True)

    client_ip = Str('127.0.0.1', config=True)
    client_transport = Str('tcp', config=True)

    monitor_ip = Str('127.0.0.1', config=True)
    monitor_transport = Str('tcp', config=True)

    monitor_url = Str('')

    db_class = Str('IPython.zmq.parallel.dictdb.DictDB', config=True)

    # not configurable
    db = Instance('IPython.zmq.parallel.dictdb.BaseDB')
    heartmonitor = Instance('IPython.zmq.parallel.heartmonitor.HeartMonitor')
    subconstructors = List()
    _constructed = Bool(False)

    def _ip_changed(self, name, old, new):
        self.engine_ip = new
        self.client_ip = new
        self.monitor_ip = new
        self._update_monitor_url()

    def _update_monitor_url(self):
        self.monitor_url = "%s://%s:%i" % (self.monitor_transport, self.monitor_ip, self.mon_port)

    def _transport_changed(self, name, old, new):
        self.engine_transport = new
        self.client_transport = new
        self.monitor_transport = new
        self._update_monitor_url()

    def __init__(self, **kwargs):
        super(HubFactory, self).__init__(**kwargs)
        self._update_monitor_url()
        # self.on_trait_change(self._sync_ips, 'ip')
        # self.on_trait_change(self._sync_transports, 'transport')
        self.subconstructors.append(self.construct_hub)


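The port traits above all default to values from `select_random_ports`, imported from `entry_point`. A common way to implement such a helper is to bind sockets to port 0 so the OS picks free ports; this is a hypothetical re-creation of the idea, not IPython's actual code:

```python
# Hypothetical re-creation of a select_random_ports helper: bind to port 0
# so the OS assigns a free port, record it, and release the sockets only
# after all ports are chosen so no port is handed out twice.
import socket

def select_random_ports(n):
    """Return n distinct ports that were free at the time of the call."""
    sockets, ports = [], []
    for _ in range(n):
        s = socket.socket()
        s.bind(('127.0.0.1', 0))  # port 0: the OS assigns a free port
        sockets.append(s)
        ports.append(s.getsockname()[1])
    for s in sockets:
        s.close()
    return ports

print(select_random_ports(2))
```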
    def construct(self):
        assert not self._constructed, "already constructed!"

        for subc in self.subconstructors:
            subc()

        self._constructed = True


    def start(self):
        assert self._constructed, "must be constructed by self.construct() first!"
        self.heartmonitor.start()
        self.log.info("Heartmonitor started")

    def construct_hub(self):
        """construct"""
        client_iface = "%s://%s:" % (self.client_transport, self.client_ip) + "%i"
        engine_iface = "%s://%s:" % (self.engine_transport, self.engine_ip) + "%i"

        ctx = self.context
        loop = self.loop

        # Registrar socket
        reg = ZMQStream(ctx.socket(zmq.XREP), loop)
        reg.bind(client_iface % self.regport)
        self.log.info("Hub listening on %s for registration." % (client_iface % self.regport))
        if self.client_ip != self.engine_ip:
            reg.bind(engine_iface % self.regport)
            self.log.info("Hub listening on %s for registration." % (engine_iface % self.regport))

        ### Engine connections ###

        # heartbeat
        hpub = ctx.socket(zmq.PUB)
        hpub.bind(engine_iface % self.hb[0])
        hrep = ctx.socket(zmq.XREP)
        hrep.bind(engine_iface % self.hb[1])
        self.heartmonitor = HeartMonitor(loop=loop, pingstream=ZMQStream(hpub,loop), pongstream=ZMQStream(hrep,loop),
                                period=self.ping, logname=self.log.name)

        ### Client connections ###
        # Clientele socket
        c = ZMQStream(ctx.socket(zmq.XREP), loop)
        c.bind(client_iface % self.query_port)
        # Notifier socket
        n = ZMQStream(ctx.socket(zmq.PUB), loop)
        n.bind(client_iface % self.notifier_port)

        ### build and launch the queues ###

        # monitor socket
        sub = ctx.socket(zmq.SUB)
        sub.setsockopt(zmq.SUBSCRIBE, "")
        sub.bind(self.monitor_url)
        sub = ZMQStream(sub, loop)

        # connect the db
        self.db = import_item(self.db_class)()
        time.sleep(.25)

        # build connection dicts
        self.engine_info = {
            'control' : engine_iface % self.control[1],
            'mux': engine_iface % self.mux[1],
            'heartbeat': (engine_iface % self.hb[0], engine_iface % self.hb[1]),
            'task' : engine_iface % self.task[1],
            'iopub' : engine_iface % self.iopub[1],
            # 'monitor' : engine_iface % self.mon_port,
            }

        self.client_info = {
            'control' : client_iface % self.control[0],
            'query': client_iface % self.query_port,
            'mux': client_iface % self.mux[0],
            'task' : (self.scheme, client_iface % self.task[0]),
            'iopub' : client_iface % self.iopub[0],
            'notification': client_iface % self.notifier_port
            }
        self.log.debug("hub::Hub engine addrs: %s" % self.engine_info)
        self.log.debug("hub::Hub client addrs: %s" % self.client_info)
        self.hub = Hub(loop=loop, session=self.session, monitor=sub, heartmonitor=self.heartmonitor,
                registrar=reg, clientele=c, notifier=n, db=self.db,
                engine_info=self.engine_info, client_info=self.client_info,
                logname=self.log.name)


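`construct_hub` builds its address templates by formatting the transport and IP into a prefix and appending a literal `"%i"` placeholder, which is filled in with each port number later via the `%` operator. A sketch of the idiom, using the configured default values:

```python
# The iface-template idiom from construct_hub: format transport/ip into a
# prefix, append a literal "%i", and fill in each port later.
transport, ip = 'tcp', '127.0.0.1'   # the configured defaults
client_iface = "%s://%s:" % (transport, ip) + "%i"
print(client_iface)           # tcp://127.0.0.1:%i
print(client_iface % 10101)   # tcp://127.0.0.1:10101
```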
269 | class Hub(LoggingFactory): |
|
269 | class Hub(LoggingFactory): | |
270 | """The IPython Controller Hub with 0MQ connections |
|
270 | """The IPython Controller Hub with 0MQ connections | |
271 |
|
271 | |||
272 | Parameters |
|
272 | Parameters | |
273 | ========== |
|
273 | ========== | |
274 | loop: zmq IOLoop instance |
|
274 | loop: zmq IOLoop instance | |
275 | session: StreamSession object |
|
275 | session: StreamSession object | |
276 | <removed> context: zmq context for creating new connections (?) |
|
276 | <removed> context: zmq context for creating new connections (?) | |
277 | queue: ZMQStream for monitoring the command queue (SUB) |
|
277 | queue: ZMQStream for monitoring the command queue (SUB) | |
278 | registrar: ZMQStream for engine registration requests (XREP) |
|
278 | registrar: ZMQStream for engine registration requests (XREP) | |
279 | heartbeat: HeartMonitor object checking the pulse of the engines |
|
279 | heartbeat: HeartMonitor object checking the pulse of the engines | |
280 | clientele: ZMQStream for client connections (XREP) |
|
280 | clientele: ZMQStream for client connections (XREP) | |
281 | not used for jobs, only query/control commands |
|
281 | not used for jobs, only query/control commands | |
282 | notifier: ZMQStream for broadcasting engine registration changes (PUB) |
|
282 | notifier: ZMQStream for broadcasting engine registration changes (PUB) | |
283 | db: connection to db for out of memory logging of commands |
|
283 | db: connection to db for out of memory logging of commands | |
284 | NotImplemented |
|
284 | NotImplemented | |
285 | engine_info: dict of zmq connection information for engines to connect |
|
285 | engine_info: dict of zmq connection information for engines to connect | |
286 | to the queues. |
|
286 | to the queues. | |
287 | client_info: dict of zmq connection information for engines to connect |
|
287 | client_info: dict of zmq connection information for engines to connect | |
288 | to the queues. |
|
288 | to the queues. | |
289 | """ |
|
289 | """ | |
    # internal data structures:
    ids=Set() # engine IDs
    keytable=Dict()
    by_ident=Dict()
    engines=Dict()
    clients=Dict()
    hearts=Dict()
    pending=Set()
    queues=Dict() # pending msg_ids keyed by engine_id
    tasks=Dict() # pending msg_ids submitted as tasks, keyed by client_id
    completed=Dict() # completed msg_ids keyed by engine_id
    all_completed=Set() # completed msg_ids from all engines
    # mia=None
    incoming_registrations=Dict()
    registration_timeout=Int()
    _idcounter=Int(0)

    # objects from constructor:
    loop=Instance(ioloop.IOLoop)
    registrar=Instance(ZMQStream)
    clientele=Instance(ZMQStream)
    monitor=Instance(ZMQStream)
    heartmonitor=Instance(HeartMonitor)
    notifier=Instance(ZMQStream)
    db=Instance(object)
    client_info=Dict()
    engine_info=Dict()

    def __init__(self, **kwargs):
        """
        # universal:
        loop: IOLoop for creating future connections
        session: streamsession for sending serialized data
        # engine:
        queue: ZMQStream for monitoring queue messages
        registrar: ZMQStream for engine registration
        heartbeat: HeartMonitor object for tracking engines
        # client:
        clientele: ZMQStream for client connections
        # extra:
        db: ZMQStream for db connection (NotImplemented)
        engine_info: zmq address/protocol dict for engine connections
        client_info: zmq address/protocol dict for client connections
        """

        super(Hub, self).__init__(**kwargs)
        self.registration_timeout = max(5000, 2*self.heartmonitor.period)

        # validate connection dicts:
        for k,v in self.client_info.iteritems():
            if k == 'task':
                validate_url_container(v[1])
            else:
                validate_url_container(v)
        # validate_url_container(self.client_info)
        validate_url_container(self.engine_info)

        # register our callbacks
        self.registrar.on_recv(self.dispatch_register_request)
        self.clientele.on_recv(self.dispatch_client_msg)
        self.monitor.on_recv(self.dispatch_monitor_traffic)

        self.heartmonitor.add_heart_failure_handler(self.handle_heart_failure)
        self.heartmonitor.add_new_heart_handler(self.handle_new_heart)

        self.monitor_handlers = {'in' : self.save_queue_request,
                                 'out': self.save_queue_result,
                                 'intask': self.save_task_request,
                                 'outtask': self.save_task_result,
                                 'tracktask': self.save_task_destination,
                                 'incontrol': _passer,
                                 'outcontrol': _passer,
                                 'iopub': self.save_iopub_message,
                                 }

        self.client_handlers = {'queue_request': self.queue_status,
                                'result_request': self.get_results,
                                'purge_request': self.purge_results,
                                'load_request': self.check_load,
                                'resubmit_request': self.resubmit_task,
                                'shutdown_request': self.shutdown_request,
                                }

        self.registrar_handlers = {'registration_request' : self.register_engine,
                                   'unregistration_request' : self.unregister_engine,
                                   'connection_request': self.connection_request,
                                   }

        self.log.info("hub::created hub")

    @property
    def _next_id(self):
        """generate a new ID.

        No longer reuse old ids, just count from 0."""
        newid = self._idcounter
        self._idcounter += 1
        return newid
        # newid = 0
        # incoming = [id[0] for id in self.incoming_registrations.itervalues()]
        # # print newid, self.ids, self.incoming_registrations
        # while newid in self.ids or newid in incoming:
        #     newid += 1
        # return newid

    #-----------------------------------------------------------------------------
    # message validation
    #-----------------------------------------------------------------------------

    def _validate_targets(self, targets):
        """turn any valid targets argument into a list of integer ids"""
        if targets is None:
            # default to all
            targets = self.ids

        if isinstance(targets, (int,str,unicode)):
            # only one target specified
            targets = [targets]
        _targets = []
        for t in targets:
            # map raw identities to ids
            if isinstance(t, (str,unicode)):
                t = self.by_ident.get(t, t)
            _targets.append(t)
        targets = _targets
        bad_targets = [ t for t in targets if t not in self.ids ]
        if bad_targets:
            raise IndexError("No Such Engine: %r"%bad_targets)
        if not targets:
            raise IndexError("No Engines Registered")
        return targets

    def _validate_client_msg(self, msg):
        """validates and unpacks the headers of a message.
        Returns False if invalid, else (client_id, msg)."""
        client_id = msg[0]
        try:
            msg = self.session.unpack_message(msg[1:], content=True)
        except:
            self.log.error("client::Invalid Message %s"%msg, exc_info=True)
            return False

        msg_type = msg.get('msg_type', None)
        if msg_type is None:
            return False
        header = msg.get('header')
        # session doesn't handle split content for now:
        return client_id, msg


    #-----------------------------------------------------------------------------
    # dispatch methods (1 per stream)
    #-----------------------------------------------------------------------------

    def dispatch_register_request(self, msg):
        """"""
        self.log.debug("registration::dispatch_register_request(%s)"%msg)
        idents,msg = self.session.feed_identities(msg)
        if not idents:
            self.log.error("Bad Queue Message: %s"%msg, exc_info=True)
            return
        try:
            msg = self.session.unpack_message(msg,content=True)
        except:
            self.log.error("registration::got bad registration message: %s"%msg, exc_info=True)
            return

        msg_type = msg['msg_type']
        content = msg['content']

        handler = self.registrar_handlers.get(msg_type, None)
        if handler is None:
            self.log.error("registration::got bad registration message: %s"%msg)
        else:
            handler(idents, msg)

    def dispatch_monitor_traffic(self, msg):
        """all ME and Task queue messages come through here, as well as
        IOPub traffic."""
        self.log.debug("monitor traffic: %s"%msg[:2])
        switch = msg[0]
        idents, msg = self.session.feed_identities(msg[1:])
        if not idents:
            self.log.error("Bad Monitor Message: %s"%msg)
            return
        handler = self.monitor_handlers.get(switch, None)
        if handler is not None:
            handler(idents, msg)
        else:
            self.log.error("Invalid monitor topic: %s"%switch)


    def dispatch_client_msg(self, msg):
        """Route messages from clients"""
        idents, msg = self.session.feed_identities(msg)
        if not idents:
            self.log.error("Bad Client Message: %s"%msg)
            return
        client_id = idents[0]
        try:
            msg = self.session.unpack_message(msg, content=True)
        except:
            content = wrap_exception()
            self.log.error("Bad Client Message: %s"%msg, exc_info=True)
            self.session.send(self.clientele, "hub_error", ident=client_id,
                    content=content)
            return

        # print client_id, header, parent, content
        # switch on message type:
        msg_type = msg['msg_type']
        self.log.info("client:: client %s requested %s"%(client_id, msg_type))
        handler = self.client_handlers.get(msg_type, None)
        try:
            assert handler is not None, "Bad Message Type: %s"%msg_type
        except:
            content = wrap_exception()
            self.log.error("Bad Message Type: %s"%msg_type, exc_info=True)
            self.session.send(self.clientele, "hub_error", ident=client_id,
                    content=content)
            return
        else:
            handler(client_id, msg)

    def dispatch_db(self, msg):
        """"""
        raise NotImplementedError

    #---------------------------------------------------------------------------
    # handler methods (1 per event)
    #---------------------------------------------------------------------------

    #----------------------- Heartbeat --------------------------------------

    def handle_new_heart(self, heart):
        """handler to attach to heartbeater.
        Called when a new heart starts to beat.
        Triggers completion of registration."""
        self.log.debug("heartbeat::handle_new_heart(%r)"%heart)
        if heart not in self.incoming_registrations:
            self.log.info("heartbeat::ignoring new heart: %r"%heart)
        else:
            self.finish_registration(heart)


    def handle_heart_failure(self, heart):
        """handler to attach to heartbeater.
        Called when a previously registered heart fails to respond to a beat request.
        Triggers unregistration."""
        self.log.debug("heartbeat::handle_heart_failure(%r)"%heart)
        eid = self.hearts.get(heart, None)
        if eid is None:
            self.log.info("heartbeat::ignoring heart failure %r"%heart)
        else:
            # only look up the engine's queue once we know it is registered
            queue = self.engines[eid].queue
            self.unregister_engine(heart, dict(content=dict(id=eid, queue=queue)))

    #----------------------- MUX Queue Traffic ------------------------------

    def save_queue_request(self, idents, msg):
        if len(idents) < 2:
            self.log.error("invalid identity prefix: %s"%idents)
            return
        queue_id, client_id = idents[:2]
        try:
            msg = self.session.unpack_message(msg, content=False)
        except:
            self.log.error("queue::client %r sent invalid message to %r: %s"%(client_id, queue_id, msg), exc_info=True)
            return

        eid = self.by_ident.get(queue_id, None)
        if eid is None:
            self.log.error("queue::target %r not registered"%queue_id)
            self.log.debug("queue:: valid are: %s"%(self.by_ident.keys()))
            return

        header = msg['header']
        msg_id = header['msg_id']
        record = init_record(msg)
        record['engine_uuid'] = queue_id
        record['client_uuid'] = client_id
        record['queue'] = 'mux'
        if MongoDB is not None and isinstance(self.db, MongoDB):
            record['buffers'] = map(Binary, record['buffers'])
        self.pending.add(msg_id)
        self.queues[eid].append(msg_id)
        self.db.add_record(msg_id, record)

    def save_queue_result(self, idents, msg):
        if len(idents) < 2:
            self.log.error("invalid identity prefix: %s"%idents)
            return

        client_id, queue_id = idents[:2]
        try:
            msg = self.session.unpack_message(msg, content=False)
        except:
            self.log.error("queue::engine %r sent invalid message to %r: %s"%(
                    queue_id,client_id, msg), exc_info=True)
            return

        eid = self.by_ident.get(queue_id, None)
        if eid is None:
            self.log.error("queue::unknown engine %r is sending a reply: "%queue_id)
            self.log.debug("queue:: %s"%msg[2:])
            return

        parent = msg['parent_header']
        if not parent:
            return
        msg_id = parent['msg_id']
        if msg_id in self.pending:
            self.pending.remove(msg_id)
            self.all_completed.add(msg_id)
            self.queues[eid].remove(msg_id)
            self.completed[eid].append(msg_id)
            rheader = msg['header']
            completed = datetime.strptime(rheader['date'], ISO8601)
            started = rheader.get('started', None)
            if started is not None:
                started = datetime.strptime(started, ISO8601)
            result = {
                'result_header' : rheader,
                'result_content': msg['content'],
                'started' : started,
                'completed' : completed
            }
            if MongoDB is not None and isinstance(self.db, MongoDB):
                result['result_buffers'] = map(Binary, msg['buffers'])
            else:
                result['result_buffers'] = msg['buffers']
            self.db.update_record(msg_id, result)
        else:
            self.log.debug("queue:: unknown msg finished %s"%msg_id)

    #--------------------- Task Queue Traffic ------------------------------

    def save_task_request(self, idents, msg):
        """Save the submission of a task."""
        client_id = idents[0]

        try:
            msg = self.session.unpack_message(msg, content=False)
        except:
            self.log.error("task::client %r sent invalid task message: %s"%(
                    client_id, msg), exc_info=True)
            return
        record = init_record(msg)
        if MongoDB is not None and isinstance(self.db, MongoDB):
            record['buffers'] = map(Binary, record['buffers'])
        record['client_uuid'] = client_id
        record['queue'] = 'task'
        header = msg['header']
        msg_id = header['msg_id']
        self.pending.add(msg_id)
        self.db.add_record(msg_id, record)

    def save_task_result(self, idents, msg):
        """save the result of a completed task."""
        client_id = idents[0]
        try:
            msg = self.session.unpack_message(msg, content=False)
        except:
            self.log.error("task::invalid task result message sent to %r: %s"%(
                    client_id, msg), exc_info=True)
            return

        parent = msg['parent_header']
        if not parent:
            # print msg
            self.log.warn("Task %r had no parent!"%msg)
            return
        msg_id = parent['msg_id']

        header = msg['header']
        engine_uuid = header.get('engine', None)
        eid = self.by_ident.get(engine_uuid, None)

        if msg_id in self.pending:
            self.pending.remove(msg_id)
            self.all_completed.add(msg_id)
            if eid is not None:
                self.completed[eid].append(msg_id)
                if msg_id in self.tasks[eid]:
                    self.tasks[eid].remove(msg_id)
            completed = datetime.strptime(header['date'], ISO8601)
            started = header.get('started', None)
            if started is not None:
                started = datetime.strptime(started, ISO8601)
            result = {
                'result_header' : header,
                'result_content': msg['content'],
                'started' : started,
                'completed' : completed,
                'engine_uuid': engine_uuid
            }
            if MongoDB is not None and isinstance(self.db, MongoDB):
                result['result_buffers'] = map(Binary, msg['buffers'])
            else:
                result['result_buffers'] = msg['buffers']
            self.db.update_record(msg_id, result)

        else:
            self.log.debug("task::unknown task %s finished"%msg_id)

    def save_task_destination(self, idents, msg):
        try:
            msg = self.session.unpack_message(msg, content=True)
        except:
            self.log.error("task::invalid task tracking message", exc_info=True)
            return
        content = msg['content']
        # print (content)
        msg_id = content['msg_id']
        engine_uuid = content['engine_id']
        eid = self.by_ident[engine_uuid]

        self.log.info("task::task %s arrived on %s"%(msg_id, eid))
        # if msg_id in self.mia:
        #     self.mia.remove(msg_id)
        # else:
        #     self.log.debug("task::task %s not listed as MIA?!"%(msg_id))

        self.tasks[eid].append(msg_id)
        # self.pending[msg_id][1].update(received=datetime.now(),engine=(eid,engine_uuid))
        self.db.update_record(msg_id, dict(engine_uuid=engine_uuid))

    def mia_task_request(self, idents, msg):
        raise NotImplementedError
        client_id = idents[0]
        # content = dict(mia=self.mia,status='ok')
        # self.session.send('mia_reply', content=content, idents=client_id)

726 | #--------------------- IOPub Traffic ------------------------------ |
|
726 | #--------------------- IOPub Traffic ------------------------------ | |
727 |
|
727 | |||
728 | def save_iopub_message(self, topics, msg): |
|
728 | def save_iopub_message(self, topics, msg): | |
729 | """save an iopub message into the db""" |
|
729 | """save an iopub message into the db""" | |
730 | print (topics) |
|
730 | print (topics) | |
731 | try: |
|
731 | try: | |
732 | msg = self.session.unpack_message(msg, content=True) |
|
732 | msg = self.session.unpack_message(msg, content=True) | |
733 | except: |
|
733 | except: | |
734 | self.log.error("iopub::invalid IOPub message", exc_info=True) |
|
734 | self.log.error("iopub::invalid IOPub message", exc_info=True) | |
735 | return |
|
735 | return | |
736 |
|
736 | |||
737 | parent = msg['parent_header'] |
|
737 | parent = msg['parent_header'] | |
738 | if not parent: |
|
738 | if not parent: | |
739 | self.log.error("iopub::invalid IOPub message: %s"%msg) |
|
739 | self.log.error("iopub::invalid IOPub message: %s"%msg) | |
740 | return |
|
740 | return | |
741 | msg_id = parent['msg_id'] |
|
741 | msg_id = parent['msg_id'] | |
742 | msg_type = msg['msg_type'] |
|
742 | msg_type = msg['msg_type'] | |
743 | content = msg['content'] |
|
743 | content = msg['content'] | |
744 |
|
744 | |||
745 | # ensure msg_id is in db |
|
745 | # ensure msg_id is in db | |
746 | try: |
|
746 | try: | |
747 | rec = self.db.get_record(msg_id) |
|
747 | rec = self.db.get_record(msg_id) | |
748 | except: |
|
748 | except: | |
749 | self.log.error("iopub::IOPub message has invalid parent", exc_info=True) |
|
749 | self.log.error("iopub::IOPub message has invalid parent", exc_info=True) | |
750 | return |
|
750 | return | |
751 | # stream |
|
751 | # stream | |
752 | d = {} |
|
752 | d = {} | |
753 | if msg_type == 'stream': |
|
753 | if msg_type == 'stream': | |
754 | name = content['name'] |
|
754 | name = content['name'] | |
755 | s = rec[name] or '' |
|
755 | s = rec[name] or '' | |
756 | d[name] = s + content['data'] |
|
756 | d[name] = s + content['data'] | |
757 |
|
757 | |||
758 | elif msg_type == 'pyerr': |
|
758 | elif msg_type == 'pyerr': | |
759 | d['pyerr'] = content |
|
759 | d['pyerr'] = content | |
760 | else: |
|
760 | else: | |
761 | d[msg_type] = content['data'] |
|
761 | d[msg_type] = content['data'] | |
762 |
|
762 | |||
763 | self.db.update_record(msg_id, d) |
|
763 | self.db.update_record(msg_id, d) | |
764 |
|
764 | |||
765 |
|
765 | |||
766 |
|
766 | |||
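The `stream` branch above accumulates output onto what is already stored for the task, while `pyerr` and other message types simply overwrite their field. That dispatch can be sketched as a standalone function (the names and the `rec.get` lookup here are illustrative, not the hub's actual API):

```python
def iopub_update(rec, msg_type, content):
    """Build the partial DB update applied for one IOPub message.

    rec is the task's existing record; stream output is appended to
    what is already stored, other message types replace their field.
    """
    d = {}
    if msg_type == 'stream':
        name = content['name']           # 'stdout' or 'stderr'
        s = rec.get(name) or ''          # previously accumulated output
        d[name] = s + content['data']    # append, never overwrite
    elif msg_type == 'pyerr':
        d['pyerr'] = content             # keep the whole error content
    else:
        d[msg_type] = content['data']
    return d

rec = {'stdout': 'a'}
rec.update(iopub_update(rec, 'stream', {'name': 'stdout', 'data': 'b'}))
# rec['stdout'] is now 'ab'
```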
767 |     #-------------------------------------------------------------------------
768 |     # Registration requests
769 |     #-------------------------------------------------------------------------
770 |
771 |     def connection_request(self, client_id, msg):
772 |         """Reply with connection addresses for clients."""
773 |         self.log.info("client::client %s connected"%client_id)
774 |         content = dict(status='ok')
775 |         content.update(self.client_info)
776 |         jsonable = {}
777 |         for k,v in self.keytable.iteritems():
778 |             jsonable[str(k)] = v
779 |         content['engines'] = jsonable
780 |         self.session.send(self.registrar, 'connection_reply', content, parent=msg, ident=client_id)
781 |
782 |     def register_engine(self, reg, msg):
783 |         """Register a new engine."""
784 |         content = msg['content']
785 |         try:
786 |             queue = content['queue']
787 |         except KeyError:
788 |             self.log.error("registration::queue not specified", exc_info=True)
789 |             return
790 |         heart = content.get('heartbeat', None)
791 |         """register a new engine, and create the socket(s) necessary"""
792 |         eid = self._next_id
793 |         # print (eid, queue, reg, heart)
794 |
795 |         self.log.debug("registration::register_engine(%i, %r, %r, %r)"%(eid, queue, reg, heart))
796 |
797 |         content = dict(id=eid,status='ok')
798 |         content.update(self.engine_info)
799 |         # check if requesting available IDs:
800 |         if queue in self.by_ident:
801 |             try:
802 |                 raise KeyError("queue_id %r in use"%queue)
803 |             except:
804 |                 content = wrap_exception()
805 |                 self.log.error("queue_id %r in use"%queue, exc_info=True)
806 |         elif heart in self.hearts: # need to check unique hearts?
807 |             try:
808 |                 raise KeyError("heart_id %r in use"%heart)
809 |             except:
810 |                 self.log.error("heart_id %r in use"%heart, exc_info=True)
811 |                 content = wrap_exception()
812 |         else:
813 |             for h, pack in self.incoming_registrations.iteritems():
814 |                 if heart == h:
815 |                     try:
816 |                         raise KeyError("heart_id %r in use"%heart)
817 |                     except:
818 |                         self.log.error("heart_id %r in use"%heart, exc_info=True)
819 |                         content = wrap_exception()
820 |                     break
821 |                 elif queue == pack[1]:
822 |                     try:
823 |                         raise KeyError("queue_id %r in use"%queue)
824 |                     except:
825 |                         self.log.error("queue_id %r in use"%queue, exc_info=True)
826 |                         content = wrap_exception()
827 |                     break
828 |
829 |         msg = self.session.send(self.registrar, "registration_reply",
830 |                 content=content,
831 |                 ident=reg)
832 |
833 |         if content['status'] == 'ok':
834 |             if heart in self.heartmonitor.hearts:
835 |                 # already beating
836 |                 self.incoming_registrations[heart] = (eid,queue,reg[0],None)
837 |                 self.finish_registration(heart)
838 |             else:
839 |                 purge = lambda : self._purge_stalled_registration(heart)
840 |                 dc = ioloop.DelayedCallback(purge, self.registration_timeout, self.loop)
841 |                 dc.start()
842 |                 self.incoming_registrations[heart] = (eid,queue,reg[0],dc)
843 |         else:
844 |             self.log.error("registration::registration %i failed: %s"%(eid, content['evalue']))
845 |         return eid
846 |
847 |     def unregister_engine(self, ident, msg):
848 |         """Unregister an engine that explicitly requested to leave."""
849 |         try:
850 |             eid = msg['content']['id']
851 |         except:
852 |             self.log.error("registration::bad engine id for unregistration: %s"%ident, exc_info=True)
853 |             return
854 |         self.log.info("registration::unregister_engine(%s)"%eid)
855 |         content=dict(id=eid, queue=self.engines[eid].queue)
856 |         self.ids.remove(eid)
857 |         self.keytable.pop(eid)
858 |         ec = self.engines.pop(eid)
859 |         self.hearts.pop(ec.heartbeat)
860 |         self.by_ident.pop(ec.queue)
861 |         self.completed.pop(eid)
862 |         for msg_id in self.queues.pop(eid):
863 |             msg = self.pending.remove(msg_id)
864 |             ############## TODO: HANDLE IT ################
865 |
866 |         if self.notifier:
867 |             self.session.send(self.notifier, "unregistration_notification", content=content)
868 |
869 |     def finish_registration(self, heart):
870 |         """Second half of engine registration, called after our HeartMonitor
871 |         has received a beat from the Engine's Heart."""
872 |         try:
873 |             (eid,queue,reg,purge) = self.incoming_registrations.pop(heart)
874 |         except KeyError:
875 |             self.log.error("registration::tried to finish nonexistant registration", exc_info=True)
876 |             return
877 |         self.log.info("registration::finished registering engine %i:%r"%(eid,queue))
878 |         if purge is not None:
879 |             purge.stop()
880 |         control = queue
881 |         self.ids.add(eid)
882 |         self.keytable[eid] = queue
883 |         self.engines[eid] = EngineConnector(id=eid, queue=queue, registration=reg,
884 |                                             control=control, heartbeat=heart)
885 |         self.by_ident[queue] = eid
886 |         self.queues[eid] = list()
887 |         self.tasks[eid] = list()
888 |         self.completed[eid] = list()
889 |         self.hearts[heart] = eid
890 |         content = dict(id=eid, queue=self.engines[eid].queue)
891 |         if self.notifier:
892 |             self.session.send(self.notifier, "registration_notification", content=content)
893 |         self.log.info("engine::Engine Connected: %i"%eid)
894 |
895 |     def _purge_stalled_registration(self, heart):
896 |         if heart in self.incoming_registrations:
897 |             eid = self.incoming_registrations.pop(heart)[0]
898 |             self.log.info("registration::purging stalled registration: %i"%eid)
899 |         else:
900 |             pass
901 |
902 |     #-------------------------------------------------------------------------
903 |     # Client Requests
904 |     #-------------------------------------------------------------------------
905 |
906 |     def shutdown_request(self, client_id, msg):
907 |         """handle shutdown request."""
908 |         # s = self.context.socket(zmq.XREQ)
909 |         # s.connect(self.client_connections['mux'])
910 |         # time.sleep(0.1)
911 |         # for eid,ec in self.engines.iteritems():
912 |         #     self.session.send(s, 'shutdown_request', content=dict(restart=False), ident=ec.queue)
913 |         # time.sleep(1)
914 |         self.session.send(self.clientele, 'shutdown_reply', content={'status': 'ok'}, ident=client_id)
915 |         dc = ioloop.DelayedCallback(lambda : self._shutdown(), 1000, self.loop)
916 |         dc.start()
917 |
918 |     def _shutdown(self):
919 |         self.log.info("hub::hub shutting down.")
920 |         time.sleep(0.1)
921 |         sys.exit(0)
922 |
923 |
924 |     def check_load(self, client_id, msg):
925 |         content = msg['content']
926 |         try:
927 |             targets = content['targets']
928 |             targets = self._validate_targets(targets)
929 |         except:
930 |             content = wrap_exception()
931 |             self.session.send(self.clientele, "hub_error",
932 |                               content=content, ident=client_id)
933 |             return
934 |
935 |         content = dict(status='ok')
936 |         # loads = {}
937 |         for t in targets:
938 |             content[bytes(t)] = len(self.queues[t])+len(self.tasks[t])
939 |         self.session.send(self.clientele, "load_reply", content=content, ident=client_id)
940 |
941 |
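`check_load` above reports one number per engine: pending MUX jobs plus pending task jobs. As a standalone sketch (illustrative names, plain int keys instead of the hub's `bytes(t)` keys):

```python
def engine_loads(targets, queues, tasks):
    """Per-engine load as computed by check_load: pending MUX jobs
    plus pending task jobs for each requested engine id."""
    return dict((t, len(queues[t]) + len(tasks[t])) for t in targets)
```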
942 |     def queue_status(self, client_id, msg):
943 |         """Return the Queue status of one or more targets.
944 |         if verbose: return the msg_ids
945 |         else: return len of each type.
946 |         keys: queue (pending MUX jobs)
947 |             tasks (pending Task jobs)
948 |             completed (finished jobs from both queues)"""
949 |         content = msg['content']
950 |         targets = content['targets']
951 |         try:
952 |             targets = self._validate_targets(targets)
953 |         except:
954 |             content = wrap_exception()
955 |             self.session.send(self.clientele, "hub_error",
956 |                               content=content, ident=client_id)
957 |             return
958 |         verbose = content.get('verbose', False)
959 |         content = dict(status='ok')
960 |         for t in targets:
961 |             queue = self.queues[t]
962 |             completed = self.completed[t]
963 |             tasks = self.tasks[t]
964 |             if not verbose:
965 |                 queue = len(queue)
966 |                 completed = len(completed)
967 |                 tasks = len(tasks)
968 |             content[bytes(t)] = {'queue': queue, 'completed': completed , 'tasks': tasks}
969 |         # pending
970 |         self.session.send(self.clientele, "queue_reply", content=content, ident=client_id)
971 |
972 |     def purge_results(self, client_id, msg):
973 |         """Purge results from memory. This method is more valuable before we move
974 |         to a DB based message storage mechanism."""
975 |         content = msg['content']
976 |         msg_ids = content.get('msg_ids', [])
977 |         reply = dict(status='ok')
978 |         if msg_ids == 'all':
979 |             self.db.drop_matching_records(dict(completed={'$ne':None}))
980 |         else:
981 |             for msg_id in msg_ids:
982 |                 if msg_id in self.all_completed:
983 |                     self.db.drop_record(msg_id)
984 |                 else:
985 |                     if msg_id in self.pending:
986 |                         try:
987 |                             raise IndexError("msg pending: %r"%msg_id)
988 |                         except:
989 |                             reply = wrap_exception()
990 |                     else:
991 |                         try:
992 |                             raise IndexError("No such msg: %r"%msg_id)
993 |                         except:
994 |                             reply = wrap_exception()
995 |                     break
996 |             eids = content.get('engine_ids', [])
997 |             for eid in eids:
998 |                 if eid not in self.engines:
999 |                     try:
1000 |                         raise IndexError("No such engine: %i"%eid)
1001 |                     except:
1002 |                         reply = wrap_exception()
1003 |                     break
1004 |                 msg_ids = self.completed.pop(eid)
1005 |                 uid = self.engines[eid].queue
1006 |                 self.db.drop_matching_records(dict(engine_uuid=uid, completed={'$ne':None}))
1007 |
1008 |         self.session.send(self.clientele, 'purge_reply', content=reply, ident=client_id)
1009 |
1010 |     def resubmit_task(self, client_id, msg, buffers):
1011 |         """Resubmit a task."""
1012 |         raise NotImplementedError
1013 |
1014 |     def get_results(self, client_id, msg):
1015 |         """Get the result of 1 or more messages."""
1016 |         content = msg['content']
1017 |         msg_ids = sorted(set(content['msg_ids']))
1018 |         statusonly = content.get('status_only', False)
1019 |         pending = []
1020 |         completed = []
1021 |         content = dict(status='ok')
1022 |         content['pending'] = pending
1023 |         content['completed'] = completed
1024 |         buffers = []
1025 |         if not statusonly:
1026 |             content['results'] = {}
1027 |             records = self.db.find_records(dict(msg_id={'$in':msg_ids}))
1028 |         for msg_id in msg_ids:
1029 |             if msg_id in self.pending:
1030 |                 pending.append(msg_id)
1031 |             elif msg_id in self.all_completed:
1032 |                 completed.append(msg_id)
1033 |                 if not statusonly:
1034 |                     rec = records[msg_id]
1035 |                     io_dict = {}
1036 |                     for key in 'pyin pyout pyerr stdout stderr'.split():
1037 |                         io_dict[key] = rec[key]
1038 |                     content[msg_id] = { 'result_content': rec['result_content'],
1039 |                                         'header': rec['header'],
1040 |                                         'result_header' : rec['result_header'],
1041 |                                         'io' : io_dict,
1042 |                                       }
1043 |                     buffers.extend(map(str, rec['result_buffers']))
1044 |             else:
1045 |                 try:
1046 |                     raise KeyError('No such message: '+msg_id)
1047 |                 except:
1048 |                     content = wrap_exception()
1049 |                 break
1050 |         self.session.send(self.clientele, "result_reply", content=content,
1051 |                           parent=msg, ident=client_id,
1052 |                           buffers=buffers)
1053 |
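The core of `get_results` above is a three-way split of the requested msg_ids: still pending, already completed, or unknown (which the hub reports as an error and aborts on). That partition, isolated from the messaging machinery (illustrative helper, not part of the hub):

```python
def partition_msg_ids(msg_ids, pending, all_completed):
    """Split requested msg_ids the way get_results does: deduplicate,
    sort, then classify each as pending, completed, or unknown."""
    p, c, unknown = [], [], []
    for msg_id in sorted(set(msg_ids)):
        if msg_id in pending:
            p.append(msg_id)
        elif msg_id in all_completed:
            c.append(msg_id)
        else:
            unknown.append(msg_id)      # the hub wraps this as a KeyError
    return p, c, unknown
```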
@@ -1,580 +1,593 b'' | |||||
  1    1 | #!/usr/bin/env python
  2    2 | # encoding: utf-8
  3    3 | """
  4    4 | The ipcluster application.
  5    5 | """
  6    6 |
  7    7 | #-----------------------------------------------------------------------------
  8    8 | # Copyright (C) 2008-2009 The IPython Development Team
  9    9 | #
 10   10 | # Distributed under the terms of the BSD License. The full license is in
 11   11 | # the file COPYING, distributed as part of this software.
 12   12 | #-----------------------------------------------------------------------------
 13   13 |
 14   14 | #-----------------------------------------------------------------------------
 15   15 | # Imports
 16   16 | #-----------------------------------------------------------------------------
 17   17 |
 18   18 | import re
 19   19 | import logging
 20   20 | import os
 21   21 | import signal
 22   22 | import logging
      23 | import errno
 23   24 |
      25 | import zmq
 24   26 | from zmq.eventloop import ioloop
 25   27 |
 26   28 | from IPython.external.argparse import ArgumentParser, SUPPRESS
 27   29 | from IPython.utils.importstring import import_item
 28   30 | from IPython.zmq.parallel.clusterdir import (
 29   31 |     ApplicationWithClusterDir, ClusterDirConfigLoader,
 30   32 |     ClusterDirError, PIDFileError
 31   33 | )
 32   34 |
 33   35 |
 34   36 | #-----------------------------------------------------------------------------
 35   37 | # Module level variables
 36   38 | #-----------------------------------------------------------------------------
 37   39 |
 38   40 |
 39   41 | default_config_file_name = u'ipcluster_config.py'
 40   42 |
 41   43 |
 42   44 | _description = """\
 43   45 | Start an IPython cluster for parallel computing.\n\n
 44   46 |
 45   47 | An IPython cluster consists of 1 controller and 1 or more engines.
 46   48 | This command automates the startup of these processes using a wide
 47   49 | range of startup methods (SSH, local processes, PBS, mpiexec,
 48   50 | Windows HPC Server 2008). To start a cluster with 4 engines on your
 49   51 | local host simply do 'ipclusterz start -n 4'. For more complex usage
 50   52 | you will typically do 'ipclusterz create -p mycluster', then edit
 51   53 | configuration files, followed by 'ipclusterz start -p mycluster -n 4'.
 52   54 | """
 53   55 |
 54   56 |
 55   57 | # Exit codes for ipcluster
 56   58 |
 57   59 | # This will be the exit code if the ipcluster appears to be running because
 58   60 | # a .pid file exists
 59   61 | ALREADY_STARTED = 10
 60   62 |
 61   63 |
 62   64 | # This will be the exit code if ipcluster stop is run, but there is not .pid
 63   65 | # file to be found.
 64   66 | ALREADY_STOPPED = 11
 65   67 |
 66   68 | # This will be the exit code if ipcluster engines is run, but there is not .pid
 67   69 | # file to be found.
 68   70 | NO_CLUSTER = 12
 69   71 |
 70   72 |
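The three module-level exit codes above let wrapper scripts distinguish failure modes without parsing log output. A hypothetical helper a caller might use (the code values are taken from the module; the function and its messages are illustrative, not part of ipclusterz):

```python
ALREADY_STARTED = 10   # a .pid file exists: cluster appears to be running
ALREADY_STOPPED = 11   # 'stop' was run, but no .pid file was found
NO_CLUSTER = 12        # 'engines' was run, but no .pid file was found

def describe_exit(code):
    """Map an ipclusterz exit code to a short diagnostic string."""
    return {
        0: "ok",
        ALREADY_STARTED: "cluster already running (pid file exists)",
        ALREADY_STOPPED: "no cluster to stop (no pid file)",
        NO_CLUSTER: "no cluster to add engines to (no pid file)",
    }.get(code, "unexpected exit: %i" % code)
```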
 71   73 | #-----------------------------------------------------------------------------
 72   74 | # Command line options
 73   75 | #-----------------------------------------------------------------------------
 74   76 |
 75   77 |
 76   78 | class IPClusterAppConfigLoader(ClusterDirConfigLoader):
 77   79 |
 78   80 |     def _add_arguments(self):
 79   81 |         # Don't call ClusterDirConfigLoader._add_arguments as we don't want
 80   82 |         # its defaults on self.parser. Instead, we will put those on
 81   83 |         # default options on our subparsers.
 82   84 |
 83   85 |         # This has all the common options that all subcommands use
 84   86 |         parent_parser1 = ArgumentParser(
 85   87 |             add_help=False,
 86   88 |             argument_default=SUPPRESS
 87   89 |         )
 88   90 |         self._add_ipython_dir(parent_parser1)
 89   91 |         self._add_log_level(parent_parser1)
 90   92 |
 91   93 |         # This has all the common options that other subcommands use
 92   94 |         parent_parser2 = ArgumentParser(
 93   95 |             add_help=False,
 94   96 |             argument_default=SUPPRESS
 95   97 |         )
 96   98 |         self._add_cluster_profile(parent_parser2)
 97   99 |         self._add_cluster_dir(parent_parser2)
 98  100 |         self._add_work_dir(parent_parser2)
 99  101 |         paa = parent_parser2.add_argument
100 | paa('--log-to-file', |
|
102 | paa('--log-to-file', | |
101 | action='store_true', dest='Global.log_to_file', |
|
103 | action='store_true', dest='Global.log_to_file', | |
102 | help='Log to a file in the log directory (default is stdout)') |
|
104 | help='Log to a file in the log directory (default is stdout)') | |
103 |
|
105 | |||
104 | # Create the object used to create the subparsers. |
|
106 | # Create the object used to create the subparsers. | |
105 | subparsers = self.parser.add_subparsers( |
|
107 | subparsers = self.parser.add_subparsers( | |
106 | dest='Global.subcommand', |
|
108 | dest='Global.subcommand', | |
107 | title='ipcluster subcommands', |
|
109 | title='ipcluster subcommands', | |
108 | description= |
|
110 | description= | |
109 | """ipcluster has a variety of subcommands. The general way of |
|
111 | """ipcluster has a variety of subcommands. The general way of | |
110 | running ipcluster is 'ipclusterz <cmd> [options]'. To get help |
|
112 | running ipcluster is 'ipclusterz <cmd> [options]'. To get help | |
111 | on a particular subcommand do 'ipclusterz <cmd> -h'.""" |
|
113 | on a particular subcommand do 'ipclusterz <cmd> -h'.""" | |
112 | # help="For more help, type 'ipclusterz <cmd> -h'", |
|
114 | # help="For more help, type 'ipclusterz <cmd> -h'", | |
113 | ) |
|
115 | ) | |
114 |
|
116 | |||
        # The "list" subcommand parser
        parser_list = subparsers.add_parser(
            'list',
            parents=[parent_parser1],
            argument_default=SUPPRESS,
            help="List all clusters in cwd and ipython_dir.",
            description=
            """List all available clusters, by cluster directory, that can
            be found in the current working directory or in the ipython
            directory. Cluster directories are named using the convention
            'cluster_<profile>'."""
        )

        # The "create" subcommand parser
        parser_create = subparsers.add_parser(
            'create',
            parents=[parent_parser1, parent_parser2],
            argument_default=SUPPRESS,
            help="Create a new cluster directory.",
            description=
            """Create an ipython cluster directory by its profile name or
            cluster directory path. Cluster directories contain
            configuration, log and security related files and are named
            using the convention 'cluster_<profile>'. By default they are
            located in your ipython directory. Once created, you will
            probably need to edit the configuration files in the cluster
            directory to configure your cluster. Most users will create a
            cluster directory by profile name,
            'ipclusterz create -p mycluster', which will put the directory
            in '<ipython_dir>/cluster_mycluster'.
            """
        )
        paa = parser_create.add_argument
        paa('--reset-config',
            dest='Global.reset_config', action='store_true',
            help=
            """Recopy the default config files to the cluster directory.
            You will lose any modifications you have made to these files.""")

        # The "start" subcommand parser
        parser_start = subparsers.add_parser(
            'start',
            parents=[parent_parser1, parent_parser2],
            argument_default=SUPPRESS,
            help="Start a cluster.",
            description=
            """Start an ipython cluster by its profile name or cluster
            directory. Cluster directories contain configuration, log and
            security related files and are named using the convention
            'cluster_<profile>' and should be created using the 'create'
            subcommand of 'ipcluster'. If your cluster directory is in
            the cwd or the ipython directory, you can simply refer to it
            using its profile name, 'ipclusterz start -n 4 -p <profile>',
            otherwise use the '--cluster-dir' option.
            """
        )

        paa = parser_start.add_argument
        paa('-n', '--number',
            type=int, dest='Global.n',
            help='The number of engines to start.',
            metavar='Global.n')
        paa('--clean-logs',
            dest='Global.clean_logs', action='store_true',
            help='Delete old log files before starting.')
        paa('--no-clean-logs',
            dest='Global.clean_logs', action='store_false',
            help="Don't delete old log files before starting.")
        paa('--daemon',
            dest='Global.daemonize', action='store_true',
            help='Daemonize the ipcluster program. This implies --log-to-file')
        paa('--no-daemon',
            dest='Global.daemonize', action='store_false',
            help="Don't daemonize the ipcluster program.")
        paa('--delay',
            type=float, dest='Global.delay',
            help="Specify the delay (in seconds) between starting the controller and starting the engine(s).")

        # The "stop" subcommand parser
        parser_stop = subparsers.add_parser(
            'stop',
            parents=[parent_parser1, parent_parser2],
            argument_default=SUPPRESS,
            help="Stop a running cluster.",
            description=
            """Stop a running ipython cluster by its profile name or cluster
            directory. Cluster directories are named using the convention
            'cluster_<profile>'. If your cluster directory is in
            the cwd or the ipython directory, you can simply refer to it
            using its profile name, 'ipclusterz stop -p <profile>', otherwise
            use the '--cluster-dir' option.
            """
        )
        paa = parser_stop.add_argument
        paa('--signal',
            dest='Global.signal', type=int,
            help="The signal number to use in stopping the cluster (default=2).",
            metavar="Global.signal")

        # The "engines" subcommand parser
        parser_engines = subparsers.add_parser(
            'engines',
            parents=[parent_parser1, parent_parser2],
            argument_default=SUPPRESS,
            help="Attach some engines to an existing controller or cluster.",
            description=
            """Start one or more engines to connect to an existing Cluster
            by profile name or cluster directory.
            Cluster directories contain configuration, log and
            security related files and are named using the convention
            'cluster_<profile>' and should be created using the 'create'
            subcommand of 'ipcluster'. If your cluster directory is in
            the cwd or the ipython directory, you can simply refer to it
            using its profile name, 'ipclusterz engines -n 4 -p <profile>',
            otherwise use the '--cluster-dir' option.
            """
        )
        paa = parser_engines.add_argument
        paa('-n', '--number',
            type=int, dest='Global.n',
            help='The number of engines to start.',
            metavar='Global.n')
        paa('--daemon',
            dest='Global.daemonize', action='store_true',
            help='Daemonize the ipcluster program. This implies --log-to-file')
        paa('--no-daemon',
            dest='Global.daemonize', action='store_false',
            help="Don't daemonize the ipcluster program.")

#-----------------------------------------------------------------------------
# Main application
#-----------------------------------------------------------------------------


class IPClusterApp(ApplicationWithClusterDir):

    name = u'ipclusterz'
    description = _description
    usage = None
    command_line_loader = IPClusterAppConfigLoader
    default_config_file_name = default_config_file_name
    default_log_level = logging.INFO
    auto_create_cluster_dir = False

    def create_default_config(self):
        super(IPClusterApp, self).create_default_config()
        self.default_config.Global.controller_launcher = \
            'IPython.zmq.parallel.launcher.LocalControllerLauncher'
        self.default_config.Global.engine_launcher = \
            'IPython.zmq.parallel.launcher.LocalEngineSetLauncher'
        self.default_config.Global.n = 2
        self.default_config.Global.delay = 2
        self.default_config.Global.reset_config = False
        self.default_config.Global.clean_logs = True
        self.default_config.Global.signal = signal.SIGINT
        self.default_config.Global.daemonize = False

    def find_resources(self):
        subcommand = self.command_line_config.Global.subcommand
        if subcommand == 'list':
            self.list_cluster_dirs()
            # Exit immediately because there is nothing left to do.
            self.exit()
        elif subcommand == 'create':
            self.auto_create_cluster_dir = True
            super(IPClusterApp, self).find_resources()
        elif subcommand == 'start' or subcommand == 'stop':
            self.auto_create_cluster_dir = True
            try:
                super(IPClusterApp, self).find_resources()
            except ClusterDirError:
                raise ClusterDirError(
                    "Could not find a cluster directory. A cluster dir must "
                    "be created before running 'ipclusterz start'. Do "
                    "'ipclusterz create -h' or 'ipclusterz list -h' for more "
                    "information about creating and listing cluster dirs."
                )
        elif subcommand == 'engines':
            self.auto_create_cluster_dir = False
            try:
                super(IPClusterApp, self).find_resources()
            except ClusterDirError:
                raise ClusterDirError(
                    "Could not find a cluster directory. A cluster dir must "
                    "be created before running 'ipclusterz start'. Do "
                    "'ipclusterz create -h' or 'ipclusterz list -h' for more "
                    "information about creating and listing cluster dirs."
                )

    def list_cluster_dirs(self):
        # Find the search paths
        cluster_dir_paths = os.environ.get('IPCLUSTER_DIR_PATH', '')
        if cluster_dir_paths:
            cluster_dir_paths = cluster_dir_paths.split(':')
        else:
            cluster_dir_paths = []
        try:
            ipython_dir = self.command_line_config.Global.ipython_dir
        except AttributeError:
            ipython_dir = self.default_config.Global.ipython_dir
        paths = [os.getcwd(), ipython_dir] + \
            cluster_dir_paths
        paths = list(set(paths))

        self.log.info('Searching for cluster dirs in paths: %r' % paths)
        for path in paths:
            files = os.listdir(path)
            for f in files:
                full_path = os.path.join(path, f)
                if os.path.isdir(full_path) and f.startswith('cluster_'):
                    profile = full_path.split('_')[-1]
                    start_cmd = 'ipclusterz start -p %s -n 4' % profile
                    print start_cmd + " ==> " + full_path

    def pre_construct(self):
        # IPClusterApp.pre_construct() is where we cd to the working directory.
        super(IPClusterApp, self).pre_construct()
        config = self.master_config
        try:
            daemon = config.Global.daemonize
            if daemon:
                config.Global.log_to_file = True
        except AttributeError:
            pass

    def construct(self):
        config = self.master_config
        subcmd = config.Global.subcommand
        reset = config.Global.reset_config
        if subcmd == 'list':
            return
        if subcmd == 'create':
            self.log.info('Copying default config files to cluster directory '
                          '[overwrite=%r]' % (reset,))
            self.cluster_dir_obj.copy_all_config_files(overwrite=reset)
        if subcmd == 'start':
            self.cluster_dir_obj.copy_all_config_files(overwrite=False)
            self.start_logging()
            self.loop = ioloop.IOLoop.instance()
            # reactor.callWhenRunning(self.start_launchers)
            dc = ioloop.DelayedCallback(self.start_launchers, 0, self.loop)
            dc.start()
        if subcmd == 'engines':
            self.start_logging()
            self.loop = ioloop.IOLoop.instance()
            # reactor.callWhenRunning(self.start_launchers)
            engine_only = lambda: self.start_launchers(controller=False)
            dc = ioloop.DelayedCallback(engine_only, 0, self.loop)
            dc.start()

    def start_launchers(self, controller=True):
        config = self.master_config

        # Create the launchers. In both cases, we set the work_dir of
        # the launcher to the cluster_dir. This is where the launcher's
        # subprocesses will be launched. It is not where the controller
        # and engine will be launched.
        if controller:
            cl_class = import_item(config.Global.controller_launcher)
            self.controller_launcher = cl_class(
                work_dir=self.cluster_dir, config=config,
                logname=self.log.name
            )
            # Setup the observing of stopping. If the controller dies, shut
            # everything down as that will be completely fatal for the engines.
            self.controller_launcher.on_stop(self.stop_launchers)
            # But, we don't monitor the stopping of engines. An engine dying
            # is just fine and in principle a user could start a new engine.
            # Also, if we did monitor engine stopping, it is difficult to
            # know what to do when only some engines die. Currently, the
            # observing of engine stopping is inconsistent. Some launchers
            # might trigger on a single engine stopping, others wait until
            # all stop. TODO: think more about how to handle this.
        else:
            self.controller_launcher = None

        el_class = import_item(config.Global.engine_launcher)
        self.engine_launcher = el_class(
            work_dir=self.cluster_dir, config=config, logname=self.log.name
        )

        # Setup signals
        signal.signal(signal.SIGINT, self.sigint_handler)

        # Start the controller and engines
        self._stopping = False  # Make sure stop_launchers is not called 2x.
        if controller:
            self.start_controller()
        dc = ioloop.DelayedCallback(self.start_engines, 1000*config.Global.delay*controller, self.loop)
        dc.start()
        self.startup_message()

    def startup_message(self, r=None):
        self.log.info("IPython cluster: started")
        return r

    def start_controller(self, r=None):
        # self.log.info("In start_controller")
        config = self.master_config
        d = self.controller_launcher.start(
            cluster_dir=config.Global.cluster_dir
        )
        return d

    def start_engines(self, r=None):
        # self.log.info("In start_engines")
        config = self.master_config
        d = self.engine_launcher.start(
            config.Global.n,
            cluster_dir=config.Global.cluster_dir
        )
        return d

    def stop_controller(self, r=None):
        # self.log.info("In stop_controller")
        if self.controller_launcher and self.controller_launcher.running:
            return self.controller_launcher.stop()

    def stop_engines(self, r=None):
        # self.log.info("In stop_engines")
        if self.engine_launcher.running:
            d = self.engine_launcher.stop()
            # d.addErrback(self.log_err)
            return d
        else:
            return None

    def log_err(self, f):
        self.log.error(f.getTraceback())
        return None

    def stop_launchers(self, r=None):
        if not self._stopping:
            self._stopping = True
            # if isinstance(r, failure.Failure):
            #     self.log.error('Unexpected error in ipcluster:')
            #     self.log.info(r.getTraceback())
            self.log.error("IPython cluster: stopping")
            # These return deferreds. We are not doing anything with them
            # but we are holding refs to them as a reminder that they
            # do return deferreds.
            d1 = self.stop_engines()
            d2 = self.stop_controller()
            # Wait a few seconds to let things shut down.
            dc = ioloop.DelayedCallback(self.loop.stop, 4000, self.loop)
            dc.start()
            # reactor.callLater(4.0, reactor.stop)

    def sigint_handler(self, signum, frame):
        self.stop_launchers()

    def start_logging(self):
467 | # Remove old log files of the controller and engine |
|
470 | # Remove old log files of the controller and engine | |
468 | if self.master_config.Global.clean_logs: |
|
471 | if self.master_config.Global.clean_logs: | |
469 | log_dir = self.master_config.Global.log_dir |
|
472 | log_dir = self.master_config.Global.log_dir | |
470 | for f in os.listdir(log_dir): |
|
473 | for f in os.listdir(log_dir): | |
471 | if re.match(r'ip(engine|controller)z-\d+\.(log|err|out)',f): |
|
474 | if re.match(r'ip(engine|controller)z-\d+\.(log|err|out)',f): | |
472 | os.remove(os.path.join(log_dir, f)) |
|
475 | os.remove(os.path.join(log_dir, f)) | |
473 | # This will remove old log files for ipcluster itself |
|
476 | # This will remove old log files for ipcluster itself | |
474 | super(IPClusterApp, self).start_logging() |
|
477 | super(IPClusterApp, self).start_logging() | |
475 |
|
478 | |||
476 | def start_app(self): |
|
479 | def start_app(self): | |
477 | """Start the application, depending on what subcommand is used.""" |
|
480 | """Start the application, depending on what subcommand is used.""" | |
478 | subcmd = self.master_config.Global.subcommand |
|
481 | subcmd = self.master_config.Global.subcommand | |
479 | if subcmd=='create' or subcmd=='list': |
|
482 | if subcmd=='create' or subcmd=='list': | |
480 | return |
|
483 | return | |
481 | elif subcmd=='start': |
|
484 | elif subcmd=='start': | |
482 | self.start_app_start() |
|
485 | self.start_app_start() | |
483 | elif subcmd=='stop': |
|
486 | elif subcmd=='stop': | |
484 | self.start_app_stop() |
|
487 | self.start_app_stop() | |
485 | elif subcmd=='engines': |
|
488 | elif subcmd=='engines': | |
486 | self.start_app_engines() |
|
489 | self.start_app_engines() | |
487 |
|
490 | |||
488 | def start_app_start(self): |
|
491 | def start_app_start(self): | |
489 | """Start the app for the start subcommand.""" |
|
492 | """Start the app for the start subcommand.""" | |
490 | config = self.master_config |
|
493 | config = self.master_config | |
491 | # First see if the cluster is already running |
|
494 | # First see if the cluster is already running | |
492 | try: |
|
495 | try: | |
493 | pid = self.get_pid_from_file() |
|
496 | pid = self.get_pid_from_file() | |
494 | except PIDFileError: |
|
497 | except PIDFileError: | |
495 | pass |
|
498 | pass | |
496 | else: |
|
499 | else: | |
497 | self.log.critical( |
|
500 | self.log.critical( | |
498 | 'Cluster is already running with [pid=%s]. ' |
|
501 | 'Cluster is already running with [pid=%s]. ' | |
499 | 'use "ipclusterz stop" to stop the cluster.' % pid |
|
502 | 'use "ipclusterz stop" to stop the cluster.' % pid | |
500 | ) |
|
503 | ) | |
501 | # Here I exit with a unusual exit status that other processes |
|
504 | # Here I exit with a unusual exit status that other processes | |
502 | # can watch for to learn how I existed. |
|
505 | # can watch for to learn how I existed. | |
503 | self.exit(ALREADY_STARTED) |
|
506 | self.exit(ALREADY_STARTED) | |
504 |
|
507 | |||
505 | # Now log and daemonize |
|
508 | # Now log and daemonize | |
506 | self.log.info( |
|
509 | self.log.info( | |
507 | 'Starting ipclusterz with [daemon=%r]' % config.Global.daemonize |
|
510 | 'Starting ipclusterz with [daemon=%r]' % config.Global.daemonize | |
508 | ) |
|
511 | ) | |
509 | # TODO: Get daemonize working on Windows or as a Windows Server. |
|
512 | # TODO: Get daemonize working on Windows or as a Windows Server. | |
510 | if config.Global.daemonize: |
|
513 | if config.Global.daemonize: | |
511 | if os.name=='posix': |
|
514 | if os.name=='posix': | |
512 | from twisted.scripts._twistd_unix import daemonize |
|
515 | from twisted.scripts._twistd_unix import daemonize | |
513 | daemonize() |
|
516 | daemonize() | |
514 |
|
517 | |||
515 | # Now write the new pid file AFTER our new forked pid is active. |
|
518 | # Now write the new pid file AFTER our new forked pid is active. | |
516 | self.write_pid_file() |
|
519 | self.write_pid_file() | |
517 | try: |
|
520 | try: | |
518 | self.loop.start() |
|
521 | self.loop.start() | |
519 | except: |
|
522 | except KeyboardInterrupt: | |
520 | self.log.info("stopping...") |
|
523 | pass | |
|
524 | except zmq.ZMQError as e: | |||
|
525 | if e.errno == errno.EINTR: | |||
|
526 | pass | |||
|
527 | else: | |||
|
528 | raise | |||
521 | self.remove_pid_file() |
|
529 | self.remove_pid_file() | |
522 |
|
530 | |||
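The change above narrows a bare `except:` around `self.loop.start()` to the two interruptions a clean shutdown actually expects: Ctrl-C (`KeyboardInterrupt`) and a poll interrupted by a signal, which pyzmq surfaces as `ZMQError` with `errno == EINTR`. A minimal standalone sketch of that policy (using a stand-in exception class so it runs without pyzmq):

```python
import errno

class FakeZMQError(Exception):
    """Stand-in for zmq.ZMQError, carrying an errno like the real one."""
    def __init__(self, err):
        super(FakeZMQError, self).__init__(err)
        self.errno = err

def run_loop(start):
    """Run an event loop's start() with the diff's exception policy:
    swallow Ctrl-C and EINTR-interrupted polls, re-raise anything else."""
    try:
        start()
    except KeyboardInterrupt:
        return 'interrupted'
    except FakeZMQError as e:
        if e.errno == errno.EINTR:
            return 'eintr'
        raise  # real errors still propagate
    return 'clean'
```

Unrelated `ZMQError`s (for example an address-in-use error at bind time) still propagate, which the old bare `except:` silently swallowed.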
     def start_app_engines(self):
         """Start the app for the start subcommand."""
         config = self.master_config
         # First see if the cluster is already running

         # Now log and daemonize
         self.log.info(
             'Starting engines with [daemon=%r]' % config.Global.daemonize
         )
         # TODO: Get daemonize working on Windows or as a Windows Server.
         if config.Global.daemonize:
             if os.name=='posix':
                 from twisted.scripts._twistd_unix import daemonize
                 daemonize()

         # Now write the new pid file AFTER our new forked pid is active.
         # self.write_pid_file()
         try:
             self.loop.start()
-        except:
-            self.log.fatal("stopping...")
+        except KeyboardInterrupt:
+            pass
+        except zmq.ZMQError as e:
+            if e.errno == errno.EINTR:
+                pass
+            else:
+                raise
         # self.remove_pid_file()

     def start_app_stop(self):
         """Start the app for the stop subcommand."""
         config = self.master_config
         try:
             pid = self.get_pid_from_file()
         except PIDFileError:
             self.log.critical(
                 'Problem reading pid file, cluster is probably not running.'
             )
             # Here I exit with a unusual exit status that other processes
             # can watch for to learn how I existed.
             self.exit(ALREADY_STOPPED)
         else:
             if os.name=='posix':
                 sig = config.Global.signal
                 self.log.info(
                     "Stopping cluster [pid=%r] with [signal=%r]" % (pid, sig)
                 )
                 os.kill(pid, sig)
             elif os.name=='nt':
                 # As of right now, we don't support daemonize on Windows, so
                 # stop will not do anything. Minimally, it should clean up the
                 # old .pid files.
                 self.remove_pid_file()


 def launch_new_instance():
     """Create and run the IPython cluster."""
     app = IPClusterApp()
     app.start()


 if __name__ == '__main__':
     launch_new_instance()
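`start_app_start` above guards against launching a second cluster by probing the pid file (`get_pid_from_file`) before writing its own (`write_pid_file`). A standalone sketch of that check-then-claim pattern (`acquire_pidfile` is a hypothetical helper, not part of the app):

```python
import os

def acquire_pidfile(path, pid):
    """Return the pid of an already-running instance, or claim the
    pid file with our own pid and return None."""
    if os.path.exists(path):
        with open(path) as f:
            return int(f.read())  # someone else already started
    with open(path, 'w') as f:
        f.write(str(pid))
    return None
```

The probe-then-write is racy between two simultaneous starts (an atomic `os.open` with `O_CREAT|O_EXCL` would close that window), but it mirrors the control flow above, including writing the pid only after daemonizing so the recorded pid is the forked process.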
@@ -1,542 +1,549 b''
 """The Python scheduler for rich scheduling.

 The Pure ZMQ scheduler does not allow routing schemes other than LRU,
 nor does it check msg_id DAG dependencies. For those, a slightly slower
 Python Scheduler exists.
 """

 #----------------------------------------------------------------------
 # Imports
 #----------------------------------------------------------------------

 from __future__ import print_function
 import sys
 import logging
 from random import randint, random
 from types import FunctionType
 from datetime import datetime, timedelta
 try:
     import numpy
 except ImportError:
     numpy = None

 import zmq
 from zmq.eventloop import ioloop, zmqstream

 # local imports
 from IPython.external.decorator import decorator
 # from IPython.config.configurable import Configurable
 from IPython.utils.traitlets import Instance, Dict, List, Set

 import error
 # from client import Client
 from dependency import Dependency
 import streamsession as ss
 from entry_point import connect_logger, local_logger
 from factory import SessionFactory


 @decorator
 def logged(f,self,*args,**kwargs):
     # print ("#--------------------")
     self.log.debug("scheduler::%s(*%s,**%s)"%(f.func_name, args, kwargs))
     # print ("#--")
     return f(self,*args, **kwargs)

 #----------------------------------------------------------------------
 # Chooser functions
 #----------------------------------------------------------------------

 def plainrandom(loads):
     """Plain random pick."""
     n = len(loads)
     return randint(0,n-1)

 def lru(loads):
     """Always pick the front of the line.

     The content of `loads` is ignored.

     Assumes LRU ordering of loads, with oldest first.
     """
     return 0

 def twobin(loads):
     """Pick two at random, use the LRU of the two.

     The content of loads is ignored.

     Assumes LRU ordering of loads, with oldest first.
     """
     n = len(loads)
     a = randint(0,n-1)
     b = randint(0,n-1)
     return min(a,b)

 def weighted(loads):
     """Pick two at random using inverse load as weight.

     Return the less loaded of the two.
     """
     # weight 0 a million times more than 1:
     weights = 1./(1e-6+numpy.array(loads))
     sums = weights.cumsum()
     t = sums[-1]
     x = random()*t
     y = random()*t
     idx = 0
     idy = 0
     while sums[idx] < x:
         idx += 1
     while sums[idy] < y:
         idy += 1
     if weights[idy] > weights[idx]:
         return idy
     else:
         return idx

 def leastload(loads):
     """Always choose the lowest load.

     If the lowest load occurs more than once, the first
     occurance will be used. If loads has LRU ordering, this means
     the LRU of those with the lowest load is chosen.
     """
     return loads.index(min(loads))

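The chooser functions above trade knowledge for speed: `plainrandom` and `twobin` ignore the load values entirely, while `leastload` scans them on every dispatch. A standalone simulation (re-implemented here without numpy or the scheduler's LRU list re-ordering, so the details are illustrative) shows how they distribute tasks:

```python
import random

def twobin(loads):
    """Two random picks; keep the lower index (the 'older' engine)."""
    a = random.randrange(len(loads))
    b = random.randrange(len(loads))
    return min(a, b)

def leastload(loads):
    """Always the engine with the smallest current load."""
    return loads.index(min(loads))

def simulate(chooser, tasks=1000, engines=4, seed=0):
    """Assign `tasks` one-unit jobs via `chooser`, return per-engine totals."""
    random.seed(seed)
    loads = [0] * engines
    for _ in range(tasks):
        loads[chooser(loads)] += 1
    return loads
```

`leastload` lands within one task of perfect balance; `twobin` in this stripped-down form skews toward low indices, because unlike in the scheduler the list is never re-ordered LRU-style between picks.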
 #---------------------------------------------------------------------
 # Classes
 #---------------------------------------------------------------------
 # store empty default dependency:
 MET = Dependency([])

 class TaskScheduler(SessionFactory):
     """Python TaskScheduler object.

     This is the simplest object that supports msg_id based
     DAG dependencies. *Only* task msg_ids are checked, not
     msg_ids of jobs submitted via the MUX queue.

     """

     # input arguments:
     scheme = Instance(FunctionType, default=leastload) # function for determining the destination
     client_stream = Instance(zmqstream.ZMQStream) # client-facing stream
     engine_stream = Instance(zmqstream.ZMQStream) # engine-facing stream
     notifier_stream = Instance(zmqstream.ZMQStream) # hub-facing sub stream
     mon_stream = Instance(zmqstream.ZMQStream) # hub-facing pub stream

     # internals:
-
+    graph = Dict() # dict by msg_id of [ msg_ids that depend on key ]
     depending = Dict() # dict by msg_id of (msg_id, raw_msg, after, follow)
     pending = Dict() # dict by engine_uuid of submitted tasks
     completed = Dict() # dict by engine_uuid of completed tasks
     failed = Dict() # dict by engine_uuid of failed tasks
     destinations = Dict() # dict by msg_id of engine_uuids where jobs ran (reverse of completed+failed)
     clients = Dict() # dict by msg_id for who submitted the task
     targets = List() # list of target IDENTs
     loads = List() # list of engine loads
     all_completed = Set() # set of all completed tasks
     all_failed = Set() # set of all failed tasks
     all_done = Set() # set of all finished tasks=union(completed,failed)
+    all_ids = Set() # set of all submitted task IDs
     blacklist = Dict() # dict by msg_id of locations where a job has encountered UnmetDependency
     auditor = Instance('zmq.eventloop.ioloop.PeriodicCallback')


     def start(self):
         self.engine_stream.on_recv(self.dispatch_result, copy=False)
         self._notification_handlers = dict(
             registration_notification = self._register_engine,
             unregistration_notification = self._unregister_engine
         )
         self.notifier_stream.on_recv(self.dispatch_notification)
         self.auditor = ioloop.PeriodicCallback(self.audit_timeouts, 2e3, self.loop) # 1 Hz
         self.auditor.start()
         self.log.info("Scheduler started...%r"%self)

     def resume_receiving(self):
         """Resume accepting jobs."""
         self.client_stream.on_recv(self.dispatch_submission, copy=False)

     def stop_receiving(self):
         """Stop accepting jobs while there are no engines.
         Leave them in the ZMQ queue."""
         self.client_stream.on_recv(None)

     #-----------------------------------------------------------------------
     # [Un]Registration Handling
     #-----------------------------------------------------------------------

     def dispatch_notification(self, msg):
         """dispatch register/unregister events."""
         idents,msg = self.session.feed_identities(msg)
         msg = self.session.unpack_message(msg)
         msg_type = msg['msg_type']
         handler = self._notification_handlers.get(msg_type, None)
         if handler is None:
             raise Exception("Unhandled message type: %s"%msg_type)
         else:
             try:
                 handler(str(msg['content']['queue']))
             except KeyError:
                 self.log.error("task::Invalid notification msg: %s"%msg)

     @logged
     def _register_engine(self, uid):
         """New engine with ident `uid` became available."""
         # head of the line:
         self.targets.insert(0,uid)
         self.loads.insert(0,0)
         # initialize sets
         self.completed[uid] = set()
         self.failed[uid] = set()
         self.pending[uid] = {}
         if len(self.targets) == 1:
             self.resume_receiving()

     def _unregister_engine(self, uid):
         """Existing engine with ident `uid` became unavailable."""
         if len(self.targets) == 1:
             # this was our only engine
             self.stop_receiving()

         # handle any potentially finished tasks:
         self.engine_stream.flush()

         self.completed.pop(uid)
         self.failed.pop(uid)
         # don't pop destinations, because it might be used later
         # map(self.destinations.pop, self.completed.pop(uid))
         # map(self.destinations.pop, self.failed.pop(uid))

         idx = self.targets.index(uid)
         self.targets.pop(idx)
         self.loads.pop(idx)

         # wait 5 seconds before cleaning up pending jobs, since the results might
         # still be incoming
         if self.pending[uid]:
             dc = ioloop.DelayedCallback(lambda : self.handle_stranded_tasks(uid), 5000, self.loop)
             dc.start()

     @logged
     def handle_stranded_tasks(self, engine):
         """Deal with jobs resident in an engine that died."""
         lost = self.pending.pop(engine)

         for msg_id, (raw_msg,follow) in lost.iteritems():
             self.all_failed.add(msg_id)
             self.all_done.add(msg_id)
             idents,msg = self.session.feed_identities(raw_msg, copy=False)
             msg = self.session.unpack_message(msg, copy=False, content=False)
             parent = msg['header']
             idents = [idents[0],engine]+idents[1:]
             print (idents)
             try:
                 raise error.EngineError("Engine %r died while running task %r"%(engine, msg_id))
             except:
                 content = ss.wrap_exception()
             msg = self.session.send(self.client_stream, 'apply_reply', content,
                                 parent=parent, ident=idents)
             self.session.send(self.mon_stream, msg, ident=['outtask']+idents)
-            self.update_
+            self.update_graph(msg_id)


     #-----------------------------------------------------------------------
     # Job Submission
     #-----------------------------------------------------------------------
     @logged
     def dispatch_submission(self, raw_msg):
         """Dispatch job submission to appropriate handlers."""
         # ensure targets up to date:
         self.notifier_stream.flush()
         try:
             idents, msg = self.session.feed_identities(raw_msg, copy=False)
-        except Exception as e:
-            self.log.error("task::Invaid msg: %s"%msg)
+            msg = self.session.unpack_message(msg, content=False, copy=False)
+        except:
+            self.log.error("task::Invaid task: %s"%raw_msg, exc_info=True)
             return

         # send to monitor
         self.mon_stream.send_multipart(['intask']+raw_msg, copy=False)

-        msg = self.session.unpack_message(msg, content=False, copy=False)
         header = msg['header']
         msg_id = header['msg_id']
+        self.all_ids.add(msg_id)

         # time dependencies
         after = Dependency(header.get('after', []))
-        if after.
+        if after.all:
             after.difference_update(self.all_completed)
             if not after.success_only:
                 after.difference_update(self.all_failed)
         if after.check(self.all_completed, self.all_failed):
             # recast as empty set, if `after` already met,
             # to prevent unnecessary set comparisons
             after = MET

         # location dependencies
         follow = Dependency(header.get('follow', []))
-        # check if unreachable:
-        if after.unreachable(self.all_failed) or follow.unreachable(self.all_failed):
-            self.depending[msg_id] = [raw_msg,MET,MET,None]
-            return self.fail_unreachable(msg_id)
+
+        for dep in after,follow:
+            # check valid:
+            if msg_id in dep or dep.difference(self.all_ids):
+                self.depending[msg_id] = [raw_msg,MET,MET,None]
+                return self.fail_unreachable(msg_id, error.InvalidDependency)
+            # check if unreachable:
+            if dep.unreachable(self.all_failed):
+                self.depending[msg_id] = [raw_msg,MET,MET,None]
+                return self.fail_unreachable(msg_id)

284 | # turn timeouts into datetime objects: |
|
292 | # turn timeouts into datetime objects: | |
285 | timeout = header.get('timeout', None) |
|
293 | timeout = header.get('timeout', None) | |
286 | if timeout: |
|
294 | if timeout: | |
287 | timeout = datetime.now() + timedelta(0,timeout,0) |
|
295 | timeout = datetime.now() + timedelta(0,timeout,0) | |
288 |
|
296 | |||
289 | if after.check(self.all_completed, self.all_failed): |
|
297 | if after.check(self.all_completed, self.all_failed): | |
290 | # time deps already met, try to run |
|
298 | # time deps already met, try to run | |
291 | if not self.maybe_run(msg_id, raw_msg, follow): |
|
299 | if not self.maybe_run(msg_id, raw_msg, follow, timeout): | |
292 | # can't run yet |
|
300 | # can't run yet | |
293 | self.save_unmet(msg_id, raw_msg, after, follow, timeout) |
|
301 | self.save_unmet(msg_id, raw_msg, after, follow, timeout) | |
294 | else: |
|
302 | else: | |
295 | self.save_unmet(msg_id, raw_msg, after, follow, timeout) |
|
303 | self.save_unmet(msg_id, raw_msg, after, follow, timeout) | |
296 |
|
304 | |||
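The `after`/`follow` handling above leans on the set semantics of the `Dependency` class. A rough, illustrative stand-in (`FakeDependency` is hypothetical, not the real `IPython.zmq.parallel.dependency.Dependency`; `all` and `success_only` mirror the attributes used above) might look like:

```python
# Hypothetical sketch of Dependency's set semantics, as used by
# dispatch_submission above. Not the real IPython class.

class FakeDependency(set):
    def __init__(self, ids, all=True, success_only=False):
        set.__init__(self, ids)
        self.all = all                    # must *all* ids finish?
        self.success_only = success_only  # do failures count as "finished"?

    def check(self, completed, failed=None):
        """True if this dependency is satisfied by the given result sets."""
        if len(self) == 0:
            return True
        relevant = set(completed)
        if not self.success_only:
            relevant |= set(failed or ())
        if self.all:
            return self.issubset(relevant)
        return bool(self.intersection(relevant))

    def unreachable(self, failed):
        """True if enough jobs failed that this can never be met."""
        if self.all and self.success_only:
            return bool(self.intersection(failed))
        return False

dep = FakeDependency(['a', 'b'], success_only=True)
print(dep.check({'a', 'b'}, set()))  # both succeeded: met
print(dep.unreachable({'b'}))        # 'b' failed: can never be met
```

This is why `dispatch_submission` can recast an already-met `after` as the empty `MET` set: an empty dependency is trivially satisfied, so later `check()` calls short-circuit.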
    # @logged
    def audit_timeouts(self):
        """Audit all waiting tasks for expired timeouts."""
        now = datetime.now()
        for msg_id in self.depending.keys():
            # must recheck, in case one failure cascaded to another:
            if msg_id in self.depending:
                raw,after,follow,timeout = self.depending[msg_id]
                if timeout and timeout < now:
                    # pass a DependencyTimeout class, matching the
                    # `why` parameter of fail_unreachable below
                    self.fail_unreachable(msg_id, error.DependencyTimeout)

    @logged
    def fail_unreachable(self, msg_id, why=error.ImpossibleDependency):
        """a message has become unreachable"""
        if msg_id not in self.depending:
            self.log.error("msg %r already failed!"%msg_id)
            return
        raw_msg, after, follow, timeout = self.depending.pop(msg_id)
        for mid in follow.union(after):
            if mid in self.graph:
                self.graph[mid].remove(msg_id)

        # FIXME: unpacking a message I've already unpacked, but didn't save:
        idents,msg = self.session.feed_identities(raw_msg, copy=False)
        msg = self.session.unpack_message(msg, copy=False, content=False)
        header = msg['header']

        try:
            raise why()
        except:
            content = ss.wrap_exception()

        self.all_done.add(msg_id)
        self.all_failed.add(msg_id)

        msg = self.session.send(self.client_stream, 'apply_reply', content,
                                parent=header, ident=idents)
        self.session.send(self.mon_stream, msg, ident=['outtask']+idents)

        self.update_graph(msg_id, success=False)

    @logged
    def maybe_run(self, msg_id, raw_msg, follow=None, timeout=None):
        """check location dependencies, and run if they are met."""

        if follow:
            def can_run(idx):
                target = self.targets[idx]
                return target not in self.blacklist.get(msg_id, []) and\
                        follow.check(self.completed[target], self.failed[target])

            indices = filter(can_run, range(len(self.targets)))
            if not indices:
                if follow.all:
                    dests = set()
                    relevant = self.all_completed if follow.success_only else self.all_done
                    for m in follow.intersection(relevant):
                        dests.add(self.destinations[m])
                    if len(dests) > 1:
                        self.fail_unreachable(msg_id)

                return False
        else:
            indices = None

        self.submit_task(msg_id, raw_msg, follow, timeout, indices)
        return True

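The filtering step in `maybe_run` can be sketched with made-up data: engines that already blacklisted the job, or whose completed set does not satisfy the `follow` dependency, are excluded before load balancing (names here are illustrative, not the scheduler's real state):

```python
# Toy version of maybe_run's can_run filter. A job that must run where
# 'depA' ran, and that already failed on engine1, can only go to engine0.

targets = ['engine0', 'engine1', 'engine2']
blacklist = {'job1': {'engine1'}}      # job1 already failed on engine1
completed = {
    'engine0': {'depA'},
    'engine1': {'depA'},
    'engine2': set(),
}
follow = {'depA'}                      # location dependency

def can_run(idx, msg_id='job1'):
    target = targets[idx]
    return (target not in blacklist.get(msg_id, set())
            and follow.issubset(completed[target]))

indices = [i for i in range(len(targets)) if can_run(i)]
print(indices)  # [0]: engine1 is blacklisted, engine2 lacks depA
```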
    @logged
    def save_unmet(self, msg_id, raw_msg, after, follow, timeout):
        """Save a message for later submission when its dependencies are met."""
        self.depending[msg_id] = [raw_msg,after,follow,timeout]
        # track the ids in follow or after, but not those already finished
        for dep_id in after.union(follow).difference(self.all_done):
            if dep_id not in self.graph:
                self.graph[dep_id] = set()
            self.graph[dep_id].add(msg_id)

    @logged
    def submit_task(self, msg_id, raw_msg, follow, timeout, indices=None):
        """Submit a task to any of a subset of our targets."""
        if indices:
            loads = [self.loads[i] for i in indices]
        else:
            loads = self.loads
        idx = self.scheme(loads)
        if indices:
            idx = indices[idx]
        target = self.targets[idx]
        # print (target, map(str, msg[:3]))
        self.engine_stream.send(target, flags=zmq.SNDMORE, copy=False)
        self.engine_stream.send_multipart(raw_msg, copy=False)
        self.add_job(idx)
        self.pending[target][msg_id] = (raw_msg, follow, timeout)
        content = dict(msg_id=msg_id, engine_id=target)
        self.session.send(self.mon_stream, 'task_destination', content=content,
                        ident=['tracktask',self.session.session])

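The subset-scheduling step above is easy to get wrong: the scheme picks an index into the *filtered* loads, which must then be mapped back to a real target index via `indices[idx]`. A self-contained sketch (the `least_loaded` scheme here is a stand-in for the pluggable scheme function, not the scheduler's real `lru`):

```python
# Sketch of submit_task's subset load balancing with made-up loads.

loads = [3, 1, 2, 0]          # outstanding jobs per target
indices = [1, 3]              # only these targets satisfy `follow`

def least_loaded(loads):
    """Stand-in scheme: pick the index of the smallest load."""
    return loads.index(min(loads))

sub_loads = [loads[i] for i in indices]   # [1, 0]
idx = least_loaded(sub_loads)             # 1, an index into sub_loads
idx = indices[idx]                        # map back: real target 3
print(idx)  # 3
```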
    #-----------------------------------------------------------------------
    # Result Handling
    #-----------------------------------------------------------------------
    @logged
    def dispatch_result(self, raw_msg):
        try:
            idents,msg = self.session.feed_identities(raw_msg, copy=False)
            msg = self.session.unpack_message(msg, content=False, copy=False)
        except:
            self.log.error("task::Invalid result: %s"%raw_msg, exc_info=True)
            return

        header = msg['header']
        if header.get('dependencies_met', True):
            success = (header['status'] == 'ok')
            self.handle_result(idents, msg['parent_header'], raw_msg, success)
            # send to Hub monitor
            self.mon_stream.send_multipart(['outtask']+raw_msg, copy=False)
        else:
            self.handle_unmet_dependency(idents, msg['parent_header'])

    @logged
    def handle_result(self, idents, parent, raw_msg, success=True):
        # first, relay result to client
        engine = idents[0]
        client = idents[1]
        # swap_ids for XREP-XREP mirror
        raw_msg[:2] = [client,engine]
        # print (map(str, raw_msg[:4]))
        self.client_stream.send_multipart(raw_msg, copy=False)
        # now, update our data structures
        msg_id = parent['msg_id']
        self.blacklist.pop(msg_id, None)
        self.pending[engine].pop(msg_id)
        if success:
            self.completed[engine].add(msg_id)
            self.all_completed.add(msg_id)
        else:
            self.failed[engine].add(msg_id)
            self.all_failed.add(msg_id)
        self.all_done.add(msg_id)
        self.destinations[msg_id] = engine

        self.update_graph(msg_id, success)

    @logged
    def handle_unmet_dependency(self, idents, parent):
        engine = idents[0]
        msg_id = parent['msg_id']
        if msg_id not in self.blacklist:
            self.blacklist[msg_id] = set()
        self.blacklist[msg_id].add(engine)
        raw_msg,follow,timeout = self.pending[engine].pop(msg_id)
        if not self.maybe_run(msg_id, raw_msg, follow, timeout):
            # resubmit failed, put it back in our dependency tree
            self.save_unmet(msg_id, raw_msg, MET, follow, timeout)

    @logged
    def update_graph(self, dep_id, success=True):
        """dep_id just finished. Update our dependency
        table and submit any jobs that just became runnable."""
        # print ("\n\n***********")
        # pprint (dep_id)
        # pprint (self.graph)
        # pprint (self.depending)
        # pprint (self.all_completed)
        # pprint (self.all_failed)
        # print ("\n\n***********\n\n")
        if dep_id not in self.graph:
            return
        jobs = self.graph.pop(dep_id)

        for msg_id in jobs:
            raw_msg, after, follow, timeout = self.depending[msg_id]
            # if dep_id in after:
            #     if after.all and (success or not after.success_only):
            #         after.remove(dep_id)

            if after.unreachable(self.all_failed) or follow.unreachable(self.all_failed):
                self.fail_unreachable(msg_id)

            elif after.check(self.all_completed, self.all_failed): # time deps met, maybe run
                self.depending[msg_id][1] = MET
                if self.maybe_run(msg_id, raw_msg, follow, timeout):
                    self.depending.pop(msg_id)
                    for mid in follow.union(after):
                        if mid in self.graph:
                            self.graph[mid].remove(msg_id)

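The `save_unmet`/`update_graph` pair above maintains a reverse dependency graph: `graph` maps each unfinished `dep_id` to the set of `msg_id`s waiting on it, so a finished job only touches its own waiters. A toy version of that bookkeeping (simplified to plain sets, without the scheduler's `after`/`follow` split):

```python
# Toy reverse dependency graph, mirroring save_unmet / update_graph.

graph = {}        # dep_id -> set of waiting msg_ids
depending = {}    # msg_id -> its remaining dependency ids

def save_unmet(msg_id, deps, all_done):
    depending[msg_id] = set(deps)
    for dep_id in set(deps) - all_done:
        graph.setdefault(dep_id, set()).add(msg_id)

def update_graph(dep_id, all_done):
    """dep_id just finished: return msg_ids whose deps are now all met."""
    ready = []
    for msg_id in graph.pop(dep_id, set()):
        if depending[msg_id] <= all_done:
            ready.append(msg_id)
            del depending[msg_id]
    return ready

all_done = set()
save_unmet('jobB', {'jobA'}, all_done)
all_done.add('jobA')
runnable = update_graph('jobA', all_done)
print(runnable)  # ['jobB']
```

Popping `dep_id` from `graph` is what keeps the loop in the real `update_graph` O(waiters) per finished job instead of scanning every pending task.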
    #----------------------------------------------------------------------
    # methods to be overridden by subclasses
    #----------------------------------------------------------------------

    def add_job(self, idx):
        """Called after self.targets[idx] just got the job with header.
        Override with subclasses. The default ordering is simple LRU.
        The default loads are the number of outstanding jobs."""
        self.loads[idx] += 1
        for lis in (self.targets, self.loads):
            lis.append(lis.pop(idx))

    def finish_job(self, idx):
        """Called after self.targets[idx] just finished a job.
        Override with subclasses."""
        self.loads[idx] -= 1


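The "simple LRU" ordering in `add_job`'s docstring works by rotation: the chosen index is popped from both parallel lists and appended at the end, so the front of the list is always the least recently used engine. A standalone sketch with made-up engine names:

```python
# Sketch of add_job's LRU rotation over parallel targets/loads lists.

targets = ['e0', 'e1', 'e2']
loads = [0, 0, 0]             # outstanding jobs per target

def add_job(idx):
    loads[idx] += 1
    # rotate the chosen entry to the back of both lists
    for lis in (targets, loads):
        lis.append(lis.pop(idx))

add_job(0)        # give e0 a job; e0 rotates to the back
print(targets)    # ['e1', 'e2', 'e0']
print(loads)      # [0, 0, 1]
```

Because both lists are rotated together, `loads[i]` always describes `targets[i]`, and a scheme that favors early indices naturally prefers idle, long-unused engines.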
def launch_scheduler(in_addr, out_addr, mon_addr, not_addr, config=None, logname='ZMQ',
                        log_addr=None, loglevel=logging.DEBUG, scheme='lru'):
    from zmq.eventloop import ioloop
    from zmq.eventloop.zmqstream import ZMQStream

    ctx = zmq.Context()
    loop = ioloop.IOLoop()
    print (in_addr, out_addr, mon_addr, not_addr)
    ins = ZMQStream(ctx.socket(zmq.XREP),loop)
    ins.bind(in_addr)
    outs = ZMQStream(ctx.socket(zmq.XREP),loop)
    outs.bind(out_addr)
    mons = ZMQStream(ctx.socket(zmq.PUB),loop)
    mons.connect(mon_addr)
    nots = ZMQStream(ctx.socket(zmq.SUB),loop)
    nots.setsockopt(zmq.SUBSCRIBE, '')
    nots.connect(not_addr)

    scheme = globals().get(scheme, None)
    # setup logging
    if log_addr:
        connect_logger(logname, ctx, log_addr, root="scheduler", loglevel=loglevel)
    else:
        local_logger(logname, loglevel)

    scheduler = TaskScheduler(client_stream=ins, engine_stream=outs,
                            mon_stream=mons, notifier_stream=nots,
                            scheme=scheme, loop=loop, logname=logname,
                            config=config)
    scheduler.start()
    try:
        loop.start()
    except KeyboardInterrupt:
        print ("interrupted, exiting...", file=sys.__stderr__)
@@ -1,355 +1,360 b'' | |||||
"""Views of remote engines"""
#-----------------------------------------------------------------------------
#  Copyright (C) 2010  The IPython Development Team
#
#  Distributed under the terms of the BSD License.  The full license is in
#  the file COPYING, distributed as part of this software.
#-----------------------------------------------------------------------------

#-----------------------------------------------------------------------------
# Imports
#-----------------------------------------------------------------------------

from IPython.external.decorator import decorator
from IPython.zmq.parallel.remotefunction import ParallelFunction, parallel

#-----------------------------------------------------------------------------
# Decorators
#-----------------------------------------------------------------------------

@decorator
def myblock(f, self, *args, **kwargs):
    """override client.block with self.block during a call"""
    block = self.client.block
    self.client.block = self.block
    try:
        ret = f(self, *args, **kwargs)
    finally:
        self.client.block = block
    return ret

@decorator
def save_ids(f, self, *args, **kwargs):
    """Keep our history and outstanding attributes up to date after a method call."""
    n_previous = len(self.client.history)
    ret = f(self, *args, **kwargs)
    nmsgs = len(self.client.history) - n_previous
    msg_ids = self.client.history[-nmsgs:]
    self.history.extend(msg_ids)
    map(self.outstanding.add, msg_ids)
    return ret

@decorator
def sync_results(f, self, *args, **kwargs):
    """sync relevant results from self.client to our results attribute."""
    ret = f(self, *args, **kwargs)
    delta = self.outstanding.difference(self.client.outstanding)
    completed = self.outstanding.intersection(delta)
    self.outstanding = self.outstanding.difference(completed)
    for msg_id in completed:
        self.results[msg_id] = self.client.results[msg_id]
    return ret

@decorator
def spin_after(f, self, *args, **kwargs):
    """call spin after the method."""
    ret = f(self, *args, **kwargs)
    self.spin()
    return ret

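The decorators above all follow one bookkeeping pattern: run the wrapped method, then diff the client's state before and after the call. A generic, self-contained sketch of the `save_ids` variant (`functools.wraps` stands in for `IPython.external.decorator`, and `FakeView` is an illustrative stub, not the real `View`):

```python
# Generic sketch of the save_ids bookkeeping-decorator pattern.

import functools

def save_ids(f):
    @functools.wraps(f)
    def wrapper(self, *args, **kwargs):
        n_previous = len(self.client_history)
        ret = f(self, *args, **kwargs)
        # whatever the client appended during the call is "ours"
        new_ids = self.client_history[n_previous:]
        self.history.extend(new_ids)
        self.outstanding.update(new_ids)
        return ret
    return wrapper

class FakeView:
    def __init__(self):
        self.client_history = []   # shared, client-wide history
        self.history = []          # this view's own history
        self.outstanding = set()

    @save_ids
    def apply(self, n_msgs):
        # pretend the client submitted n_msgs new requests
        self.client_history.extend('msg%i' % i for i in range(n_msgs))

v = FakeView()
v.apply(2)
print(v.history)       # ['msg0', 'msg1']
```

Capturing the length *before* the call means the wrapper attributes only the messages this particular call produced, even if other views share the same client.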
#-----------------------------------------------------------------------------
# Classes
#-----------------------------------------------------------------------------

class View(object):
    """Base View class for more convenient apply(f,*args,**kwargs) syntax via attributes.

    Don't use this class, use subclasses.
    """
    _targets = None
    block=None
    bound=None
    history=None

    def __init__(self, client, targets=None):
        self.client = client
        self._targets = targets
        self._ntargets = 1 if isinstance(targets, (int,type(None))) else len(targets)
        self.block = client.block
        self.bound=False
        self.history = []
        self.outstanding = set()
        self.results = {}

    def __repr__(self):
        strtargets = str(self._targets)
        if len(strtargets) > 16:
            strtargets = strtargets[:12]+'...]'
        return "<%s %s>"%(self.__class__.__name__, strtargets)

    @property
    def targets(self):
        return self._targets

    @targets.setter
    def targets(self, value):
        self._targets = value
        # raise AttributeError("Cannot set my targets argument after construction!")

    @sync_results
    def spin(self):
        """spin the client, and sync"""
        self.client.spin()

    @sync_results
    @save_ids
    def apply(self, f, *args, **kwargs):
        """calls f(*args, **kwargs) on remote engines, returning the result.

        This method does not involve the engine's namespace.

        if self.block is False:
            returns msg_id
        else:
            returns actual result of f(*args, **kwargs)
        """
        return self.client.apply(f, args, kwargs, block=self.block, targets=self.targets, bound=self.bound)

    @save_ids
    def apply_async(self, f, *args, **kwargs):
        """calls f(*args, **kwargs) on remote engines in a nonblocking manner.

        This method does not involve the engine's namespace.

        returns msg_id
        """
        return self.client.apply(f,args,kwargs, block=False, targets=self.targets, bound=False)

    @spin_after
    @save_ids
    def apply_sync(self, f, *args, **kwargs):
        """calls f(*args, **kwargs) on remote engines in a blocking manner,
        returning the result.

        This method does not involve the engine's namespace.

        returns: actual result of f(*args, **kwargs)
        """
        return self.client.apply(f,args,kwargs, block=True, targets=self.targets, bound=False)

    @sync_results
    @save_ids
    def apply_bound(self, f, *args, **kwargs):
        """calls f(*args, **kwargs) bound to engine namespace(s).

        if self.block is False:
            returns msg_id
        else:
            returns actual result of f(*args, **kwargs)

        This method has access to the targets' globals

        """
        return self.client.apply(f, args, kwargs, block=self.block, targets=self.targets, bound=True)

    @sync_results
    @save_ids
    def apply_async_bound(self, f, *args, **kwargs):
        """calls f(*args, **kwargs) bound to engine namespace(s)
|
158 | """calls f(*args, **kwargs) bound to engine namespace(s) | |
159 | in a nonblocking manner. |
|
159 | in a nonblocking manner. | |
160 |
|
160 | |||
161 | returns: msg_id |
|
161 | returns: msg_id | |
162 |
|
162 | |||
163 | This method has access to the targets' globals |
|
163 | This method has access to the targets' globals | |
164 |
|
164 | |||
165 | """ |
|
165 | """ | |
166 | return self.client.apply(f, args, kwargs, block=False, targets=self.targets, bound=True) |
|
166 | return self.client.apply(f, args, kwargs, block=False, targets=self.targets, bound=True) | |
167 |
|
167 | |||
168 | @spin_after |
|
168 | @spin_after | |
169 | @save_ids |
|
169 | @save_ids | |
170 | def apply_sync_bound(self, f, *args, **kwargs): |
|
170 | def apply_sync_bound(self, f, *args, **kwargs): | |
171 | """calls f(*args, **kwargs) bound to engine namespace(s), waiting for the result. |
|
171 | """calls f(*args, **kwargs) bound to engine namespace(s), waiting for the result. | |
172 |
|
172 | |||
173 | returns: actual result of f(*args, **kwargs) |
|
173 | returns: actual result of f(*args, **kwargs) | |
174 |
|
174 | |||
175 | This method has access to the targets' globals |
|
175 | This method has access to the targets' globals | |
176 |
|
176 | |||
177 | """ |
|
177 | """ | |
178 | return self.client.apply(f, args, kwargs, block=True, targets=self.targets, bound=True) |
|
178 | return self.client.apply(f, args, kwargs, block=True, targets=self.targets, bound=True) | |
179 |
|
179 | |||
180 | @spin_after |
|
180 | @spin_after | |
181 | @save_ids |
|
181 | @save_ids | |
182 | def map(self, f, *sequences): |
|
182 | def map(self, f, *sequences): | |
183 | """Parallel version of builtin `map`, using this view's engines.""" |
|
183 | """Parallel version of builtin `map`, using this view's engines.""" | |
184 | if isinstance(self.targets, int): |
|
184 | if isinstance(self.targets, int): | |
185 | targets = [self.targets] |
|
185 | targets = [self.targets] | |
186 | else: |
|
186 | else: | |
187 | targets = self.targets |
|
187 | targets = self.targets | |
188 | pf = ParallelFunction(self.client, f, block=self.block, |
|
188 | pf = ParallelFunction(self.client, f, block=self.block, | |
189 | bound=True, targets=targets) |
|
189 | bound=True, targets=targets) | |
190 | return pf.map(*sequences) |
|
190 | return pf.map(*sequences) | |
191 |
|
191 | |||
192 | def parallel(self, bound=True, block=True): |
|
192 | def parallel(self, bound=True, block=True): | |
193 | """Decorator for making a ParallelFunction""" |
|
193 | """Decorator for making a ParallelFunction""" | |
194 | return parallel(self.client, bound=bound, targets=self.targets, block=block) |
|
194 | return parallel(self.client, bound=bound, targets=self.targets, block=block) | |
195 |
|
195 | |||
196 | def abort(self, msg_ids=None, block=None): |
|
196 | def abort(self, msg_ids=None, block=None): | |
197 | """Abort jobs on my engines. |
|
197 | """Abort jobs on my engines. | |
198 |
|
198 | |||
199 | Parameters |
|
199 | Parameters | |
200 | ---------- |
|
200 | ---------- | |
201 |
|
201 | |||
202 | msg_ids : None, str, list of strs, optional |
|
202 | msg_ids : None, str, list of strs, optional | |
203 | if None: abort all jobs. |
|
203 | if None: abort all jobs. | |
204 | else: abort specific msg_id(s). |
|
204 | else: abort specific msg_id(s). | |
205 | """ |
|
205 | """ | |
206 | block = block if block is not None else self.block |
|
206 | block = block if block is not None else self.block | |
207 | return self.client.abort(msg_ids=msg_ids, targets=self.targets, block=block) |
|
207 | return self.client.abort(msg_ids=msg_ids, targets=self.targets, block=block) | |
208 |
|
208 | |||
209 | def queue_status(self, verbose=False): |
|
209 | def queue_status(self, verbose=False): | |
210 | """Fetch the Queue status of my engines""" |
|
210 | """Fetch the Queue status of my engines""" | |
211 | return self.client.queue_status(targets=self.targets, verbose=verbose) |
|
211 | return self.client.queue_status(targets=self.targets, verbose=verbose) | |
212 |
|
212 | |||
213 | def purge_results(self, msg_ids=[], targets=[]): |
|
213 | def purge_results(self, msg_ids=[], targets=[]): | |
214 | """Instruct the controller to forget specific results.""" |
|
214 | """Instruct the controller to forget specific results.""" | |
215 | if targets is None or targets == 'all': |
|
215 | if targets is None or targets == 'all': | |
216 | targets = self.targets |
|
216 | targets = self.targets | |
217 | return self.client.purge_results(msg_ids=msg_ids, targets=targets) |
|
217 | return self.client.purge_results(msg_ids=msg_ids, targets=targets) | |
218 |
|
218 | |||
219 |
|
219 | |||
220 |
|
220 | |||
221 | class DirectView(View): |
|
221 | class DirectView(View): | |
222 | """Direct Multiplexer View of one or more engines. |
|
222 | """Direct Multiplexer View of one or more engines. | |
223 |
|
223 | |||
224 | These are created via indexed access to a client: |
|
224 | These are created via indexed access to a client: | |
225 |
|
225 | |||
226 | >>> dv_1 = client[1] |
|
226 | >>> dv_1 = client[1] | |
227 | >>> dv_all = client[:] |
|
227 | >>> dv_all = client[:] | |
228 | >>> dv_even = client[::2] |
|
228 | >>> dv_even = client[::2] | |
229 | >>> dv_some = client[1:3] |
|
229 | >>> dv_some = client[1:3] | |
230 |
|
230 | |||
231 | This object provides dictionary access |
|
231 | This object provides dictionary access to engine namespaces: | |
|
232 | ||||
|
233 | # push a=5: | |||
|
234 | >>> dv['a'] = 5 | |||
|
235 | # pull 'foo': | |||
|
236 | >>> dv['foo'] | |||
232 |
|
237 | |||
233 | """ |
|
238 | """ | |
234 |
|
239 | |||
235 | @sync_results |
|
240 | @sync_results | |
236 | @save_ids |
|
241 | @save_ids | |
237 | def execute(self, code, block=True): |
|
242 | def execute(self, code, block=True): | |
238 | """execute some code on my targets.""" |
|
243 | """execute some code on my targets.""" | |
239 | return self.client.execute(code, block=self.block, targets=self.targets) |
|
244 | return self.client.execute(code, block=self.block, targets=self.targets) | |
240 |
|
245 | |||
241 | def update(self, ns): |
|
246 | def update(self, ns): | |
242 | """update remote namespace with dict `ns`""" |
|
247 | """update remote namespace with dict `ns`""" | |
243 | return self.client.push(ns, targets=self.targets, block=self.block) |
|
248 | return self.client.push(ns, targets=self.targets, block=self.block) | |
244 |
|
249 | |||
245 | push = update |
|
250 | push = update | |
246 |
|
251 | |||
247 | def get(self, key_s): |
|
252 | def get(self, key_s): | |
248 | """get object(s) by `key_s` from remote namespace |
|
253 | """get object(s) by `key_s` from remote namespace | |
249 | will return one object if it is a key. |
|
254 | will return one object if it is a key. | |
250 | It also takes a list of keys, and will return a list of objects.""" |
|
255 | It also takes a list of keys, and will return a list of objects.""" | |
251 | # block = block if block is not None else self.block |
|
256 | # block = block if block is not None else self.block | |
252 | return self.client.pull(key_s, block=True, targets=self.targets) |
|
257 | return self.client.pull(key_s, block=True, targets=self.targets) | |
253 |
|
258 | |||
254 | @sync_results |
|
259 | @sync_results | |
255 | @save_ids |
|
260 | @save_ids | |
256 | def pull(self, key_s, block=True): |
|
261 | def pull(self, key_s, block=True): | |
257 | """get object(s) by `key_s` from remote namespace |
|
262 | """get object(s) by `key_s` from remote namespace | |
258 | will return one object if it is a key. |
|
263 | will return one object if it is a key. | |
259 | It also takes a list of keys, and will return a list of objects.""" |
|
264 | It also takes a list of keys, and will return a list of objects.""" | |
260 | block = block if block is not None else self.block |
|
265 | block = block if block is not None else self.block | |
261 | return self.client.pull(key_s, block=block, targets=self.targets) |
|
266 | return self.client.pull(key_s, block=block, targets=self.targets) | |
262 |
|
267 | |||
263 | def scatter(self, key, seq, dist='b', flatten=False, targets=None, block=None): |
|
268 | def scatter(self, key, seq, dist='b', flatten=False, targets=None, block=None): | |
264 | """ |
|
269 | """ | |
265 | Partition a Python sequence and send the partitions to a set of engines. |
|
270 | Partition a Python sequence and send the partitions to a set of engines. | |
266 | """ |
|
271 | """ | |
267 | block = block if block is not None else self.block |
|
272 | block = block if block is not None else self.block | |
268 | targets = targets if targets is not None else self.targets |
|
273 | targets = targets if targets is not None else self.targets | |
269 |
|
274 | |||
270 | return self.client.scatter(key, seq, dist=dist, flatten=flatten, |
|
275 | return self.client.scatter(key, seq, dist=dist, flatten=flatten, | |
271 | targets=targets, block=block) |
|
276 | targets=targets, block=block) | |
272 |
|
277 | |||
273 | @sync_results |
|
278 | @sync_results | |
274 | @save_ids |
|
279 | @save_ids | |
275 | def gather(self, key, dist='b', targets=None, block=None): |
|
280 | def gather(self, key, dist='b', targets=None, block=None): | |
276 | """ |
|
281 | """ | |
277 | Gather a partitioned sequence on a set of engines as a single local seq. |
|
282 | Gather a partitioned sequence on a set of engines as a single local seq. | |
278 | """ |
|
283 | """ | |
279 | block = block if block is not None else self.block |
|
284 | block = block if block is not None else self.block | |
280 | targets = targets if targets is not None else self.targets |
|
285 | targets = targets if targets is not None else self.targets | |
281 |
|
286 | |||
282 | return self.client.gather(key, dist=dist, targets=targets, block=block) |
|
287 | return self.client.gather(key, dist=dist, targets=targets, block=block) | |
283 |
|
288 | |||
284 | def __getitem__(self, key): |
|
289 | def __getitem__(self, key): | |
285 | return self.get(key) |
|
290 | return self.get(key) | |
286 |
|
291 | |||
287 | def __setitem__(self,key, value): |
|
292 | def __setitem__(self,key, value): | |
288 | self.update({key:value}) |
|
293 | self.update({key:value}) | |
289 |
|
294 | |||
290 | def clear(self, block=False): |
|
295 | def clear(self, block=False): | |
291 | """Clear the remote namespaces on my engines.""" |
|
296 | """Clear the remote namespaces on my engines.""" | |
292 | block = block if block is not None else self.block |
|
297 | block = block if block is not None else self.block | |
293 | return self.client.clear(targets=self.targets, block=block) |
|
298 | return self.client.clear(targets=self.targets, block=block) | |
294 |
|
299 | |||
295 | def kill(self, block=True): |
|
300 | def kill(self, block=True): | |
296 | """Kill my engines.""" |
|
301 | """Kill my engines.""" | |
297 | block = block if block is not None else self.block |
|
302 | block = block if block is not None else self.block | |
298 | return self.client.kill(targets=self.targets, block=block) |
|
303 | return self.client.kill(targets=self.targets, block=block) | |
299 |
|
304 | |||
300 | #---------------------------------------- |
|
305 | #---------------------------------------- | |
301 | # activate for %px,%autopx magics |
|
306 | # activate for %px,%autopx magics | |
302 | #---------------------------------------- |
|
307 | #---------------------------------------- | |
303 | def activate(self): |
|
308 | def activate(self): | |
304 | """Make this `View` active for parallel magic commands. |
|
309 | """Make this `View` active for parallel magic commands. | |
305 |
|
310 | |||
306 | IPython has a magic command syntax to work with `MultiEngineClient` objects. |
|
311 | IPython has a magic command syntax to work with `MultiEngineClient` objects. | |
307 | In a given IPython session there is a single active one. While |
|
312 | In a given IPython session there is a single active one. While | |
308 | there can be many `Views` created and used by the user, |
|
313 | there can be many `Views` created and used by the user, | |
309 | there is only one active one. The active `View` is used whenever |
|
314 | there is only one active one. The active `View` is used whenever | |
310 | the magic commands %px and %autopx are used. |
|
315 | the magic commands %px and %autopx are used. | |
311 |
|
316 | |||
312 | The activate() method is called on a given `View` to make it |
|
317 | The activate() method is called on a given `View` to make it | |
313 | active. Once this has been done, the magic commands can be used. |
|
318 | active. Once this has been done, the magic commands can be used. | |
314 | """ |
|
319 | """ | |
315 |
|
320 | |||
316 | try: |
|
321 | try: | |
317 | # This is injected into __builtins__. |
|
322 | # This is injected into __builtins__. | |
318 | ip = get_ipython() |
|
323 | ip = get_ipython() | |
319 | except NameError: |
|
324 | except NameError: | |
320 | print "The IPython parallel magics (%result, %px, %autopx) only work within IPython." |
|
325 | print "The IPython parallel magics (%result, %px, %autopx) only work within IPython." | |
321 | else: |
|
326 | else: | |
322 | pmagic = ip.plugin_manager.get_plugin('parallelmagic') |
|
327 | pmagic = ip.plugin_manager.get_plugin('parallelmagic') | |
323 | if pmagic is not None: |
|
328 | if pmagic is not None: | |
324 | pmagic.active_multiengine_client = self |
|
329 | pmagic.active_multiengine_client = self | |
325 | else: |
|
330 | else: | |
326 | print "You must first load the parallelmagic extension " \ |
|
331 | print "You must first load the parallelmagic extension " \ | |
327 | "by doing '%load_ext parallelmagic'" |
|
332 | "by doing '%load_ext parallelmagic'" | |
328 |
|
333 | |||
329 |
|
334 | |||
330 | class LoadBalancedView(View): |
|
335 | class LoadBalancedView(View): | |
331 | """An engine-agnostic View that only executes via the Task queue. |
|
336 | """An engine-agnostic View that only executes via the Task queue. | |
332 |
|
337 | |||
333 | Typically created via: |
|
338 | Typically created via: | |
334 |
|
339 | |||
335 | >>> lbv = client[None] |
|
340 | >>> lbv = client[None] | |
336 | <LoadBalancedView tcp://127.0.0.1:12345> |
|
341 | <LoadBalancedView tcp://127.0.0.1:12345> | |
337 |
|
342 | |||
338 | but can also be created with: |
|
343 | but can also be created with: | |
339 |
|
344 | |||
340 | >>> lbc = LoadBalancedView(client) |
|
345 | >>> lbc = LoadBalancedView(client) | |
341 |
|
346 | |||
342 | TODO: allow subset of engines across which to balance. |
|
347 | TODO: allow subset of engines across which to balance. | |
343 | """ |
|
348 | """ | |
344 | def __repr__(self): |
|
349 | def __repr__(self): | |
345 | return "<%s %s>"%(self.__class__.__name__, self.client._config['url']) |
|
350 | return "<%s %s>"%(self.__class__.__name__, self.client._config['url']) | |
346 |
|
351 | |||
347 | @property |
|
352 | @property | |
348 | def targets(self): |
|
353 | def targets(self): | |
349 | return None |
|
354 | return None | |
350 |
|
355 | |||
351 | @targets.setter |
|
356 | @targets.setter | |
352 | def targets(self, value): |
|
357 | def targets(self, value): | |
353 | raise AttributeError("Cannot set targets for LoadBalancedView!") |
|
358 | raise AttributeError("Cannot set targets for LoadBalancedView!") | |
354 |
|
359 | |||
355 | No newline at end of file |
|
360 |
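The `apply` family above all funnels into a single `Client.apply` call with different `block`/`bound` flags. A minimal stand-alone sketch of that dispatch pattern (the `FakeClient`/`FakeView` names are hypothetical, not the IPython API, and a real client would defer work to remote engines instead of running it inline):

```python
import uuid

class FakeClient:
    """Hypothetical stand-in for the real Client: runs the work inline
    and stores each result keyed by a generated msg_id."""
    def __init__(self):
        self.results = {}

    def apply(self, f, args, kwargs, block=True):
        msg_id = str(uuid.uuid4())
        self.results[msg_id] = f(*args, **kwargs)  # a real client defers this
        # blocking callers get the value; non-blocking callers get an id
        return self.results[msg_id] if block else msg_id

class FakeView:
    """Mirrors the View dispatch: apply() honors self.block,
    apply_sync() always blocks, apply_async() always returns an id."""
    def __init__(self, client, block=True):
        self.client = client
        self.block = block

    def apply(self, f, *args, **kwargs):
        return self.client.apply(f, args, kwargs, block=self.block)

    def apply_sync(self, f, *args, **kwargs):
        return self.client.apply(f, args, kwargs, block=True)

    def apply_async(self, f, *args, **kwargs):
        return self.client.apply(f, args, kwargs, block=False)

client = FakeClient()
view = FakeView(client, block=True)
sync_result = view.apply_sync(pow, 2, 10)   # blocking: the value itself
msg_id = view.apply_async(pow, 2, 10)       # non-blocking: an id to look up later
```

The design point is that every `apply_*` variant is sugar over one client call; only the `block` (and, in the real code, `bound`) flags change.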
@@ -1,106 +1,110 b'' | |||||
1 | """Example for generating an arbitrary DAG as a dependency map. |
|
1 | """Example for generating an arbitrary DAG as a dependency map. | |
2 |
|
2 | |||
3 | This demo uses networkx to generate the graph. |
|
3 | This demo uses networkx to generate the graph. | |
4 |
|
4 | |||
5 | Authors |
|
5 | Authors | |
6 | ------- |
|
6 | ------- | |
7 | * MinRK |
|
7 | * MinRK | |
8 | """ |
|
8 | """ | |
9 | import networkx as nx |
|
9 | import networkx as nx | |
10 | from random import randint, random |
|
10 | from random import randint, random | |
11 | from IPython.zmq.parallel import client as cmod |
|
11 | from IPython.zmq.parallel import client as cmod | |
12 |
|
12 | |||
13 | def randomwait(): |
|
13 | def randomwait(): | |
14 | import time |
|
14 | import time | |
15 | from random import random |
|
15 | from random import random | |
16 | time.sleep(random()) |
|
16 | time.sleep(random()) | |
17 | return time.time() |
|
17 | return time.time() | |
18 |
|
18 | |||
19 |
|
19 | |||
20 | def random_dag(nodes, edges): |
|
20 | def random_dag(nodes, edges): | |
21 | """Generate a random Directed Acyclic Graph (DAG) with a given number of nodes and edges.""" |
|
21 | """Generate a random Directed Acyclic Graph (DAG) with a given number of nodes and edges.""" | |
22 | G = nx.DiGraph() |
|
22 | G = nx.DiGraph() | |
23 | for i in range(nodes): |
|
23 | for i in range(nodes): | |
24 | G.add_node(i) |
|
24 | G.add_node(i) | |
25 | while edges > 0: |
|
25 | while edges > 0: | |
26 | a = randint(0,nodes-1) |
|
26 | a = randint(0,nodes-1) | |
27 | b=a |
|
27 | b=a | |
28 | while b==a: |
|
28 | while b==a: | |
29 | b = randint(0,nodes-1) |
|
29 | b = randint(0,nodes-1) | |
30 | G.add_edge(a,b) |
|
30 | G.add_edge(a,b) | |
31 | if nx.is_directed_acyclic_graph(G): |
|
31 | if nx.is_directed_acyclic_graph(G): | |
32 | edges -= 1 |
|
32 | edges -= 1 | |
33 | else: |
|
33 | else: | |
34 | # we closed a loop! |
|
34 | # we closed a loop! | |
35 | G.remove_edge(a,b) |
|
35 | G.remove_edge(a,b) | |
36 | return G |
|
36 | return G | |
37 |
|
37 | |||
38 | def add_children(G, parent, level, n=2): |
|
38 | def add_children(G, parent, level, n=2): | |
39 | """Add children recursively to a binary tree.""" |
|
39 | """Add children recursively to a binary tree.""" | |
40 | if level == 0: |
|
40 | if level == 0: | |
41 | return |
|
41 | return | |
42 | for i in range(n): |
|
42 | for i in range(n): | |
43 | child = parent+str(i) |
|
43 | child = parent+str(i) | |
44 | G.add_node(child) |
|
44 | G.add_node(child) | |
45 | G.add_edge(parent,child) |
|
45 | G.add_edge(parent,child) | |
46 | add_children(G, child, level-1, n) |
|
46 | add_children(G, child, level-1, n) | |
47 |
|
47 | |||
48 | def make_bintree(levels): |
|
48 | def make_bintree(levels): | |
49 | """Make a symmetrical binary tree with @levels""" |
|
49 | """Make a symmetrical binary tree with @levels""" | |
50 | G = nx.DiGraph() |
|
50 | G = nx.DiGraph() | |
51 | root = '0' |
|
51 | root = '0' | |
52 | G.add_node(root) |
|
52 | G.add_node(root) | |
53 | add_children(G, root, levels, 2) |
|
53 | add_children(G, root, levels, 2) | |
54 | return G |
|
54 | return G | |
55 |
|
55 | |||
56 | def submit_jobs(client, G, jobs): |
|
56 | def submit_jobs(client, G, jobs): | |
57 | """Submit jobs via client where G describes the time dependencies.""" |
|
57 | """Submit jobs via client where G describes the time dependencies.""" | |
58 | results = {} |
|
58 | results = {} | |
59 | for node in nx.topological_sort(G): |
|
59 | for node in nx.topological_sort(G): | |
60 | deps = [ results[n] for n in G.predecessors(node) ] |
|
60 | deps = [ results[n] for n in G.predecessors(node) ] | |
61 | results[node] = client.apply(jobs[node], after=deps) |
|
61 | results[node] = client.apply(jobs[node], after=deps) | |
62 | return results |
|
62 | return results | |
63 |
|
63 | |||
64 | def validate_tree(G, results): |
|
64 | def validate_tree(G, results): | |
65 | """Validate that jobs executed after their dependencies.""" |
|
65 | """Validate that jobs executed after their dependencies.""" | |
66 | for node in G: |
|
66 | for node in G: | |
67 | started = results[node].metadata.started |
|
67 | started = results[node].metadata.started | |
68 | for parent in G.predecessors(node): |
|
68 | for parent in G.predecessors(node): | |
69 | finished = results[parent].metadata.completed |
|
69 | finished = results[parent].metadata.completed | |
70 | assert started > finished, "%s should have happened after %s"%(node, parent) |
|
70 | assert started > finished, "%s should have happened after %s"%(node, parent) | |
71 |
|
71 | |||
72 | def main(nodes, edges): |
|
72 | def main(nodes, edges): | |
73 | """Generate a random graph, submit jobs, then validate that the |
|
73 | """Generate a random graph, submit jobs, then validate that the | |
74 | dependency order was enforced. |
|
74 | dependency order was enforced. | |
75 | Finally, plot the graph, with time on the x-axis, and |
|
75 | Finally, plot the graph, with time on the x-axis, and | |
76 | in-degree on the y (just for spread). All arrows must |
|
76 | in-degree on the y (just for spread). All arrows must | |
77 | point at least slightly to the right if the graph is valid. |
|
77 | point at least slightly to the right if the graph is valid. | |
78 | """ |
|
78 | """ | |
79 | from matplotlib.dates import date2num |
|
79 | from matplotlib.dates import date2num | |
|
80 | from matplotlib.cm import gist_rainbow | |||
80 | print "building DAG" |
|
81 | print "building DAG" | |
81 | G = random_dag(nodes, edges) |
|
82 | G = random_dag(nodes, edges) | |
82 | jobs = {} |
|
83 | jobs = {} | |
83 | pos = {} |
|
84 | pos = {} | |
|
85 | colors = {} | |||
84 | for node in G: |
|
86 | for node in G: | |
85 | jobs[node] = randomwait |
|
87 | jobs[node] = randomwait | |
86 |
|
88 | |||
87 | client = cmod.Client() |
|
89 | client = cmod.Client() | |
88 | print "submitting tasks" |
|
90 | print "submitting %i tasks with %i dependencies"%(nodes,edges) | |
89 | results = submit_jobs(client, G, jobs) |
|
91 | results = submit_jobs(client, G, jobs) | |
90 | print "waiting for results" |
|
92 | print "waiting for results" | |
91 | client.barrier() |
|
93 | client.barrier() | |
92 | print "done" |
|
94 | print "done" | |
93 | for node in G: |
|
95 | for node in G: | |
94 |
|
|
96 | md = results[node].metadata | |
95 | t = date2num(results[node].metadata.started) |
|
97 | start = date2num(md.started) | |
96 | pos[node] = (t, G.in_degree(node)+random()) |
|
98 | runtime = date2num(md.completed) - start | |
97 |
|
99 | pos[node] = (start, runtime) | ||
|
100 | colors[node] = md.engine_id | |||
98 | validate_tree(G, results) |
|
101 | validate_tree(G, results) | |
99 | nx.draw(G, pos) |
|
102 | nx.draw(G, pos, nodelist=colors.keys(), node_color=colors.values(), cmap=gist_rainbow) | |
100 | return G,results |
|
103 | return G,results | |
101 |
|
104 | |||
102 | if __name__ == '__main__': |
|
105 | if __name__ == '__main__': | |
103 | import pylab |
|
106 | import pylab | |
104 | main(5,10) |
|
107 | # main(5,10) | |
|
108 | main(32,96) | |||
105 | pylab.show() |
|
109 | pylab.show() | |
106 | No newline at end of file |
|
110 |
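`submit_jobs()` in the diff above leans on one invariant: walking the DAG in topological order means every node's predecessors were submitted first, so their results already exist when the node itself is submitted. A self-contained sketch of that invariant, using the stdlib `graphlib` in place of networkx and toy jobs that just sum their inputs (the node names and dependency dict are made up for illustration):

```python
from graphlib import TopologicalSorter

# In the demo, edge (a, b) means "b depends on a"; graphlib takes the
# reverse view: a mapping from each node to the set of its predecessors.
deps = {'b': {'a'}, 'c': {'a'}, 'd': {'b', 'c'}}  # hypothetical 4-node DAG

results = {}
for node in TopologicalSorter(deps).static_order():
    parents = deps.get(node, set())
    # the invariant submit_jobs() relies on: all predecessors are done
    assert all(p in results for p in parents)
    results[node] = 1 + sum(results[p] for p in parents)
```

In the real demo the "result" is an AsyncResult handle passed as the `after=` dependency, so the scheduler, not the submitting loop, enforces the ordering at execution time.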
@@ -1,19 +1,20 b'' | |||||
1 | .. _parallelz_index: |
|
1 | .. _parallelz_index: | |
2 |
|
2 | |||
3 | ========================================== |
|
3 | ========================================== | |
4 | Using IPython for parallel computing (ZMQ) |
|
4 | Using IPython for parallel computing (ZMQ) | |
5 | ========================================== |
|
5 | ========================================== | |
6 |
|
6 | |||
7 | .. toctree:: |
|
7 | .. toctree:: | |
8 | :maxdepth: 2 |
|
8 | :maxdepth: 2 | |
9 |
|
9 | |||
10 | parallel_intro.txt |
|
10 | parallel_intro.txt | |
11 | parallel_process.txt |
|
11 | parallel_process.txt | |
12 | parallel_multiengine.txt |
|
12 | parallel_multiengine.txt | |
13 | parallel_task.txt |
|
13 | parallel_task.txt | |
14 | parallel_mpi.txt |
|
14 | parallel_mpi.txt | |
15 | parallel_security.txt |
|
15 | parallel_security.txt | |
16 | parallel_winhpc.txt |
|
16 | parallel_winhpc.txt | |
17 | parallel_demos.txt |
|
17 | parallel_demos.txt | |
|
18 | dag_dependencies.txt | |||
18 |
|
19 | |||
19 |
|
20 |
@@ -1,290 +1,283 b'' | |||||
1 | ================= |
|
1 | ================= | |
2 | Parallel examples |
|
2 | Parallel examples | |
3 | ================= |
|
3 | ================= | |
4 |
|
4 | |||
5 | .. note:: |
|
5 | .. note:: | |
6 |
|
6 | |||
7 | Performance numbers from ``IPython.kernel``, not newparallel. |
|
7 | Performance numbers from ``IPython.kernel``, not newparallel. | |
8 |
|
8 | |||
9 | In this section we describe two more involved examples of using an IPython |
|
9 | In this section we describe two more involved examples of using an IPython | |
10 | cluster to perform a parallel computation. In these examples, we will be using |
|
10 | cluster to perform a parallel computation. In these examples, we will be using | |
11 | IPython's "pylab" mode, which enables interactive plotting using the |
|
11 | IPython's "pylab" mode, which enables interactive plotting using the | |
12 | Matplotlib package. IPython can be started in this mode by typing:: |
|
12 | Matplotlib package. IPython can be started in this mode by typing:: | |
13 |
|
13 | |||
14 | ipython --pylab |
|
14 | ipython --pylab | |
15 |
|
15 | |||
16 | at the system command line. If this prints an error message, you will |
|
16 | at the system command line. | |
17 | need to install the default profiles from within IPython by doing, |
|
|||
18 |
|
||||
19 | .. sourcecode:: ipython |
|
|||
20 |
|
||||
21 | In [1]: %install_profiles |
|
|||
22 |
|
||||
23 | and then restarting IPython. |
|
|||
24 |
|
17 | |||
25 | 150 million digits of pi |
|
18 | 150 million digits of pi | |
26 | ======================== |
|
19 | ======================== | |
27 |
|
20 | |||
28 | In this example we would like to study the distribution of digits in the |
|
21 | In this example we would like to study the distribution of digits in the | |
29 | number pi (in base 10). While it is not known if pi is a normal number (a |
|
22 | number pi (in base 10). While it is not known if pi is a normal number (a | |
30 | number is normal in base 10 if 0-9 occur with equal likelihood) numerical |
|
23 | number is normal in base 10 if 0-9 occur with equal likelihood) numerical | |
31 | investigations suggest that it is. We will begin with a serial calculation on |
|
24 | investigations suggest that it is. We will begin with a serial calculation on | |
32 | 10,000 digits of pi and then perform a parallel calculation involving 150 |
|
25 | 10,000 digits of pi and then perform a parallel calculation involving 150 | |
33 | million digits. |
|
26 | million digits. | |
34 |
|
27 | |||
35 | In both the serial and parallel calculation we will be using functions defined |
|
28 | In both the serial and parallel calculation we will be using functions defined | |
36 | in the :file:`pidigits.py` file, which is available in the |
|
29 | in the :file:`pidigits.py` file, which is available in the | |
37 | :file:`docs/examples/newparallel` directory of the IPython source distribution. |
|
30 | :file:`docs/examples/newparallel` directory of the IPython source distribution. | |
38 | These functions provide basic facilities for working with the digits of pi and |
|
31 | These functions provide basic facilities for working with the digits of pi and | |
39 | can be loaded into IPython by putting :file:`pidigits.py` in your current |
|
32 | can be loaded into IPython by putting :file:`pidigits.py` in your current | |
40 | working directory and then doing: |
|
33 | working directory and then doing: | |
41 |
|
34 | |||
42 | .. sourcecode:: ipython |
|
35 | .. sourcecode:: ipython | |
43 |
|
36 | |||
44 | In [1]: run pidigits.py |
|
37 | In [1]: run pidigits.py | |
45 |
|
38 | |||
46 | Serial calculation |
|
39 | Serial calculation | |
47 | ------------------ |
|
40 | ------------------ | |
48 |
|
41 | |||
49 | For the serial calculation, we will use `SymPy <http://www.sympy.org>`_ to |
|
42 | For the serial calculation, we will use `SymPy <http://www.sympy.org>`_ to | |
50 | calculate 10,000 digits of pi and then look at the frequencies of the digits |
|
43 | calculate 10,000 digits of pi and then look at the frequencies of the digits | |
51 | 0-9. Out of 10,000 digits, we expect each digit to occur 1,000 times. While |
|
44 | 0-9. Out of 10,000 digits, we expect each digit to occur 1,000 times. While | |
52 | SymPy is capable of calculating many more digits of pi, our purpose here is to |
|
45 | SymPy is capable of calculating many more digits of pi, our purpose here is to | |
53 | set the stage for the much larger parallel calculation. |
|
46 | set the stage for the much larger parallel calculation. | |
54 |
|
47 | |||
55 | In this example, we use two functions from :file:`pidigits.py`: |
|
48 | In this example, we use two functions from :file:`pidigits.py`: | |
56 | :func:`one_digit_freqs` (which calculates how many times each digit occurs) |
|
49 | :func:`one_digit_freqs` (which calculates how many times each digit occurs) | |
57 | and :func:`plot_one_digit_freqs` (which uses Matplotlib to plot the result). |
|
50 | and :func:`plot_one_digit_freqs` (which uses Matplotlib to plot the result). | |
58 | Here is an interactive IPython session that uses these functions with |
|
51 | Here is an interactive IPython session that uses these functions with | |
59 | SymPy: |
|
52 | SymPy: | |
60 |
|
53 | |||
.. sourcecode:: ipython

    In [7]: import sympy

    In [8]: pi = sympy.pi.evalf(40)

    In [9]: pi
    Out[9]: 3.141592653589793238462643383279502884197

    In [10]: pi = sympy.pi.evalf(10000)

    In [11]: digits = (d for d in str(pi)[2:])  # create a sequence of digits

    In [12]: run pidigits.py  # load one_digit_freqs/plot_one_digit_freqs

    In [13]: freqs = one_digit_freqs(digits)

    In [14]: plot_one_digit_freqs(freqs)
    Out[14]: [<matplotlib.lines.Line2D object at 0x18a55290>]

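For reference, a digit-counting function along these lines can be sketched in a
few lines of NumPy. This is an illustrative stand-in, not the actual code from
:file:`pidigits.py`:

```python
import numpy as np

def one_digit_freqs(digits):
    # Count how many times each digit 0-9 occurs in an iterable of
    # digit characters (a sketch; the real code is in pidigits.py).
    freqs = np.zeros(10, dtype=np.int64)
    for d in digits:
        freqs[int(d)] += 1
    return freqs

counts = one_digit_freqs(iter("3141592653"))
print(int(counts.sum()))  # 10
```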
The resulting plot of the single digit counts shows that each digit occurs
approximately 1,000 times, but that with only 10,000 digits the
statistical fluctuations are still rather large:

.. image:: ../parallel/single_digits.*

It is clear that to reduce the relative fluctuations in the counts, we need
to look at many more digits of pi. That brings us to the parallel calculation.

Parallel calculation
--------------------

Calculating many digits of pi is a challenging computational problem in itself.
Because we want to focus on the distribution of digits in this example, we
will use pre-computed digits of pi from the website of Professor Yasumasa
Kanada at the University of Tokyo (http://www.super-computing.org). These
digits come in a set of text files (ftp://pi.super-computing.org/.2/pi200m/)
that each have 10 million digits of pi.

For the parallel calculation, we have copied these files to the local hard
drives of the compute nodes. A total of 15 of these files will be used, for a
total of 150 million digits of pi. To make things a little more interesting we
will calculate the frequencies of all two-digit sequences (00-99) and then plot
the result using a 2D matrix in Matplotlib.

The overall idea of the calculation is simple: each IPython engine will
compute the two-digit counts for the digits in a single file. Then in a final
step the counts from each engine will be added up. To perform this
calculation, we will need two top-level functions from :file:`pidigits.py`:

.. literalinclude:: ../../examples/newparallel/pidigits.py
   :language: python
   :lines: 41-56

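The counting and reduction steps can be sketched as follows. Here
``two_digit_freqs`` and ``reduce_freqs`` are simplified stand-ins for the real
functions in :file:`pidigits.py`; the exact counting scheme (overlapping
pairs, as used below) is an assumption for illustration:

```python
import numpy as np

def two_digit_freqs(digits):
    # Count overlapping two-digit sequences (00-99) in a string of digits.
    freqs = np.zeros(100, dtype=np.int64)
    for i in range(len(digits) - 1):
        freqs[int(digits[i:i + 2])] += 1
    return freqs

def reduce_freqs(freqlist):
    # Sum the per-engine count arrays into one total array.
    return sum(freqlist)

# Each engine would count one file; here two small strings stand in.
total = reduce_freqs([two_digit_freqs("31415"), two_digit_freqs("92653")])
print(int(total.sum()))  # 8
```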
We will also use the :func:`plot_two_digit_freqs` function to plot the
results. The code to run this calculation in parallel is contained in
:file:`docs/examples/newparallel/parallelpi.py`. This code can be run in parallel
using IPython by following these steps:

1. Use :command:`ipclusterz` to start 15 engines. We used an 8 core (2 quad
   core CPUs) cluster with hyperthreading enabled, which makes the 8 cores
   look like 16 in the OS (1 controller + 15 engines). However, the maximum
   speedup we can observe is still only 8x.
2. With the file :file:`parallelpi.py` in your current working directory, open
   up IPython in pylab mode and type ``run parallelpi.py``. This will download
   the pi files via ftp the first time you run it, if they are not
   present in the engines' working directory.

When run on our 8 core cluster, we observe a speedup of 7.7x. This is slightly
less than linear scaling (8x) because the controller is also running on one of
the cores.

To emphasize the interactive nature of IPython, we now show how the
calculation can also be run by simply typing the commands from
:file:`parallelpi.py` interactively into IPython:

.. sourcecode:: ipython

    In [1]: from IPython.zmq.parallel import client

    # The Client allows us to use the engines interactively.
    # We simply pass Client the name of the cluster profile we
    # are using.
    In [2]: c = client.Client(profile='mycluster')

    In [3]: c.ids
    Out[3]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

    In [4]: run pidigits.py

    In [5]: filestring = 'pi200m.ascii.%(i)02dof20'

    # Create the list of files to process.
    In [6]: files = [filestring % {'i':i} for i in range(1,16)]

    In [7]: files
    Out[7]:
    ['pi200m.ascii.01of20',
     'pi200m.ascii.02of20',
     'pi200m.ascii.03of20',
     'pi200m.ascii.04of20',
     'pi200m.ascii.05of20',
     'pi200m.ascii.06of20',
     'pi200m.ascii.07of20',
     'pi200m.ascii.08of20',
     'pi200m.ascii.09of20',
     'pi200m.ascii.10of20',
     'pi200m.ascii.11of20',
     'pi200m.ascii.12of20',
     'pi200m.ascii.13of20',
     'pi200m.ascii.14of20',
     'pi200m.ascii.15of20']

    # download the data files if they don't already exist:
    In [8]: c.map(fetch_pi_file, files)

    # This is the parallel calculation using the Client.map method
    # which applies compute_two_digit_freqs to each file in files in parallel.
    In [9]: freqs_all = c.map(compute_two_digit_freqs, files)

    # Add up the frequencies from each engine.
    In [10]: freqs = reduce_freqs(freqs_all)

    In [11]: plot_two_digit_freqs(freqs)
    Out[11]: <matplotlib.image.AxesImage object at 0x18beb110>

    In [12]: plt.title('2 digit counts of 150m digits of pi')
    Out[12]: <matplotlib.text.Text object at 0x18d1f9b0>

The resulting plot generated by Matplotlib is shown below. The colors indicate
which two-digit sequences are more (red) or less (blue) likely to occur in the
first 150 million digits of pi. We clearly see that the sequence "41" is
most likely and that "06" and "07" are least likely. Further analysis would
show that the relative size of the statistical fluctuations has decreased
compared to the 10,000 digit calculation.

.. image:: ../parallel/two_digit_counts.*


Parallel options pricing
========================

An option is a financial contract that gives the buyer of the contract the
right to buy (a "call") or sell (a "put") a secondary asset (a stock for
example) at a particular date in the future (the expiration date) for a
pre-agreed upon price (the strike price). For this right, the buyer pays the
seller a premium (the option price). There are a wide variety of flavors of
options (American, European, Asian, etc.) that are useful for different
purposes: hedging against risk, speculation, etc.

Much of modern finance is driven by the need to price these contracts
accurately based on what is known about the properties (such as volatility) of
the underlying asset. One method of pricing options is to use a Monte Carlo
simulation of the underlying asset price. In this example we use this approach
to price both European and Asian (path dependent) options for various strike
prices and volatilities.

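As background, the core of a Monte Carlo pricer for a plain European call can
be sketched in a few lines of NumPy. This is a simplified illustration under
geometric Brownian motion, not the actual code from :file:`mcpricer.py`:

```python
import numpy as np

def euro_call_mc(S0, K, sigma, r, T, n_paths=100000, seed=0):
    # Simulate terminal prices under geometric Brownian motion:
    #   S_T = S0 * exp((r - sigma**2/2)*T + sigma*sqrt(T)*Z)
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal(n_paths)
    ST = S0 * np.exp((r - 0.5 * sigma ** 2) * T + sigma * np.sqrt(T) * Z)
    # Discounted expected payoff of the call, max(S_T - K, 0).
    payoff = np.maximum(ST - K, 0.0)
    return np.exp(-r * T) * payoff.mean()

price = euro_call_mc(S0=100.0, K=100.0, sigma=0.25, r=0.05, T=1.0)
```

An Asian option is path dependent, so a real pricer simulates entire price
paths and averages the price along each path, rather than using only the
terminal price as done here.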
The code for this example can be found in the :file:`docs/examples/newparallel`
directory of the IPython source. The function :func:`price_options` in
:file:`mcpricer.py` implements the basic Monte Carlo pricing algorithm using
the NumPy package and is shown here:

.. literalinclude:: ../../examples/newparallel/mcpricer.py
   :language: python

To run this code in parallel, we will use IPython's :class:`LoadBalancedView` class,
which distributes work to the engines using dynamic load balancing. This
view is a wrapper of the :class:`Client` class shown in
the previous example. The parallel calculation using :class:`LoadBalancedView` can
be found in the file :file:`mcdriver.py`. The code in this file creates a
:class:`TaskClient` instance and then submits a set of tasks using
:meth:`TaskClient.run` that calculate the option prices for different
volatilities and strike prices. The results are then plotted as a 2D contour
plot using Matplotlib.

.. literalinclude:: ../../examples/newparallel/mcdriver.py
   :language: python

To use this code, start an IPython cluster using :command:`ipclusterz`, open
IPython in pylab mode with the file :file:`mcdriver.py` in your current
working directory and then type:

.. sourcecode:: ipython

    In [7]: run mcdriver.py
    Submitted tasks: [0, 1, 2, ...]

Once all the tasks have finished, the results can be plotted using the
:func:`plot_options` function. Here we make contour plots of the Asian
call and Asian put options as a function of the volatility and strike price:

.. sourcecode:: ipython

    In [8]: plot_options(sigma_vals, K_vals, prices['acall'])

    In [9]: plt.figure()
    Out[9]: <matplotlib.figure.Figure object at 0x18c178d0>

    In [10]: plot_options(sigma_vals, K_vals, prices['aput'])

These results are shown in the two figures below. On an 8 core cluster the
entire calculation (10 strike prices, 10 volatilities, 100,000 paths for each)
took 30 seconds in parallel, giving a speedup of 7.7x, which is comparable
to the speedup observed in our previous example.

.. image:: ../parallel/asian_call.*

.. image:: ../parallel/asian_put.*

Conclusion
==========

To conclude these examples, we summarize the key features of IPython's
parallel architecture that have been demonstrated:

* Serial code can often be parallelized with only a few extra lines of code.
  We have used the :class:`DirectView` and :class:`LoadBalancedView` classes
  for this purpose.
* The resulting parallel code can be run without ever leaving IPython's
  interactive shell.
* Any data computed in parallel can be explored interactively through
  visualization or further numerical calculations.
* We have run these examples on a cluster running Windows HPC Server 2008.
  IPython's built-in support for the Windows HPC job scheduler makes it
  easy to get started with IPython's parallel capabilities.

.. note::

    The newparallel code has never been run on Windows HPC Server, so the last
    conclusion is untested.
.. _parallelmultiengine:

==========================
IPython's Direct interface
==========================

The direct, or multiengine, interface represents one possible way of working with a set of
IPython engines. The basic idea behind the multiengine interface is that the
capabilities of each engine are directly and explicitly exposed to the user.
Thus, in the multiengine interface, each engine is given an id that is used to
identify the engine and give it work to do. This interface is intuitive,
is designed with interactive usage in mind, and is thus the best place for
new users of IPython to begin.

Starting the IPython controller and engines
===========================================

To follow along with this tutorial, you will need to start the IPython
controller and four IPython engines. The simplest way of doing this is to use
the :command:`ipclusterz` command::

    $ ipclusterz start -n 4

For more detailed information about starting the controller and engines, see
our :ref:`introduction <ip1par>` to using IPython for parallel computing.

Creating a ``Client`` instance
==============================

The first step is to import the IPython :mod:`IPython.zmq.parallel.client`
module and then create a :class:`.Client` instance:

.. sourcecode:: ipython

    In [1]: from IPython.zmq.parallel import client

    In [2]: rc = client.Client()

This form assumes that the default connection information (stored in
:file:`ipcontroller-client.json` found in `~/.ipython/clusterz_default/security`) is
accurate. If the controller was started on a remote machine, you must copy that connection
file to the client machine, or enter its contents as arguments to the Client constructor:

.. sourcecode:: ipython

    # If you have copied the json connector file from the controller:
    In [2]: rc = client.Client('/path/to/ipcontroller-client.json')
    # for a remote controller at 10.0.1.5, visible from my.server.com:
    In [3]: rc = client.Client('tcp://10.0.1.5:12345', sshserver='my.server.com')


To make sure there are engines connected to the controller, you can get a list
of engine ids:

.. sourcecode:: ipython

    In [3]: rc.ids
    Out[3]: set([0, 1, 2, 3])

Here we see that there are four engines ready to do work for us.

Quick and easy parallelism
==========================

In many cases, you simply want to apply a Python function to a sequence of
objects, but *in parallel*. The client interface provides a simple way
of accomplishing this: using the builtin :func:`map` and the ``@remote``
function decorator, or the client's :meth:`map` method.

Parallel map
------------

Python's builtin :func:`map` function allows a function to be applied to a
sequence element-by-element. This type of code is typically trivial to
parallelize. In fact, since IPython's interface is all about functions anyway,
you can just use the builtin :func:`map`, or a client's :meth:`map` method:

.. sourcecode:: ipython

    In [62]: serial_result = map(lambda x:x**10, range(32))

    In [66]: parallel_result = rc.map(lambda x: x**10, range(32))

    In [67]: serial_result==parallel_result
    Out[67]: True


.. note::

    The client's own version of :meth:`map` or that of :class:`.DirectView` do
    not do any load balancing. For a load balanced version, use a
    :class:`LoadBalancedView`, or a :class:`ParallelFunction` with
    `targets=None`.

.. seealso::

    :meth:`map` is implemented via :class:`.ParallelFunction`.

Remote function decorator
-------------------------

Remote functions are just like normal functions, but when they are called,
they execute on one or more engines, rather than locally. IPython provides
some decorators:

.. sourcecode:: ipython

    In [10]: @rc.remote(block=True)
       ....: def f(x):
       ....:     return 10.0*x**4
       ....:

    In [11]: map(f, range(32))  # this is done in parallel
    Out[11]: [0.0,10.0,160.0,...]

See the docstring for the :func:`parallel` and :func:`remote` decorators for
options.

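Conceptually, such a decorator just wraps the function so that each call is
shipped off for execution. The following purely local stand-in illustrates the
call pattern; the real ``rc.remote`` submits the call to one or more engines
rather than running it in-process:

```python
def remote(block=True):
    # Local stand-in for rc.remote: the real decorator submits the call
    # to engines; here the function is simply called directly.
    def decorator(f):
        def wrapper(*args, **kwargs):
            result = f(*args, **kwargs)  # on a real engine: executed remotely
            return result if block else None
        return wrapper
    return decorator

@remote(block=True)
def f(x):
    return 10.0 * x ** 4

results = list(map(f, range(3)))
print(results)  # [0.0, 10.0, 160.0]
```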
Calling Python functions
========================

The most basic type of operation that can be performed on the engines is to
execute Python code or call Python functions. Executing Python code can be
done in blocking or non-blocking mode (non-blocking is default) using the
:meth:`execute` method, and calling functions can be done via the
:meth:`.View.apply` method.

apply
-----

The main method for doing remote execution (in fact, all methods that
communicate with the engines are built on top of it) is :meth:`Client.apply`.
Ideally, :meth:`apply` would have the signature ``apply(f,*args,**kwargs)``,
which would call ``f(*args,**kwargs)`` remotely. However, since :class:`Clients`
require some more options, they cannot easily provide this interface.
Instead, they provide the signature::

    c.apply(f, args=None, kwargs=None, bound=True, block=None, targets=None,
            after=None, follow=None, timeout=None)

141 | In order to provide the nicer interface, we have :class:`View` classes, which wrap |
|
141 | In order to provide the nicer interface, we have :class:`View` classes, which wrap | |
142 | :meth:`Client.apply` by using attributes and extra :meth:`apply_x` methods to determine |
|
142 | :meth:`Client.apply` by using attributes and extra :meth:`apply_x` methods to determine | |
143 | the extra arguments. For instance, performing index-access on a client creates a |
|
143 | the extra arguments. For instance, performing index-access on a client creates a | |
144 | :class:`.LoadBalancedView`. |
|
144 | :class:`.LoadBalancedView`. | |
145 |
|
145 | |||
146 | .. sourcecode:: ipython |
|
146 | .. sourcecode:: ipython | |
147 |
|
147 | |||
148 | In [4]: view = rc[1:3] |
|
148 | In [4]: view = rc[1:3] | |
149 | Out[4]: <DirectView [1, 2]> |
|
149 | Out[4]: <DirectView [1, 2]> | |
150 |
|
150 | |||
151 | In [5]: view.apply<tab> |
|
151 | In [5]: view.apply<tab> | |
152 | view.apply view.apply_async view.apply_async_bound view.apply_bound view.apply_sync view.apply_sync_bound |
|
152 | view.apply view.apply_async view.apply_async_bound view.apply_bound view.apply_sync view.apply_sync_bound | |
153 |
|
153 | |||
154 | A :class:`DirectView` always uses its `targets` attribute, and it will use its `bound` |
|
154 | A :class:`DirectView` always uses its `targets` attribute, and it will use its `bound` | |
155 | and `block` attributes in its :meth:`apply` method, but the suffixed :meth:`apply_x` |
|
155 | and `block` attributes in its :meth:`apply` method, but the suffixed :meth:`apply_x` | |
156 | methods allow specifying `bound` and `block` via the different methods. |
|
156 | methods allow specifying `bound` and `block` via the different methods. | |
157 |
|
157 | |||
158 | ================== ========== ========== |
|
158 | ================== ========== ========== | |
159 | method block bound |
|
159 | method block bound | |
160 | ================== ========== ========== |
|
160 | ================== ========== ========== | |
161 | apply self.block self.bound |
|
161 | apply self.block self.bound | |
162 | apply_sync True False |
|
162 | apply_sync True False | |
163 | apply_async False False |
|
163 | apply_async False False | |
164 | apply_sync_bound True True |
|
164 | apply_sync_bound True True | |
165 | apply_async_bound False True |
|
165 | apply_async_bound False True | |
166 | ================== ========== ========== |
|
166 | ================== ========== ========== | |
167 |
|
167 | |||
168 | For explanation of these values, read on. |
|
168 | For explanation of these values, read on. | |
169 |
|
169 | |||
Blocking execution
------------------

In blocking mode, the :class:`.DirectView` object (called ``dview`` in
these examples) submits the command to the controller, which places the
command in the engines' queues for execution. The :meth:`apply` call then
blocks until the engines are done executing the command:

.. sourcecode:: ipython

    In [2]: rc.block = True
    In [3]: dview = rc[:] # A DirectView of all engines
    In [4]: dview['a'] = 5

    In [5]: dview['b'] = 10

    In [6]: dview.apply_bound(lambda x: a+b+x, 27)
    Out[6]: [42, 42, 42, 42]

Python commands can be executed on specific engines by calling :meth:`execute`
with the ``targets`` keyword argument, or by creating a :class:`DirectView`
instance via index-access to the client:

.. sourcecode:: ipython

    In [6]: rc[::2].execute('c=a+b') # shorthand for rc.execute('c=a+b',targets=[0,2])

    In [7]: rc[1::2].execute('c=a-b') # shorthand for rc.execute('c=a-b',targets=[1,3])

    In [8]: rc[:]['c'] # shorthand for rc.pull('c',targets='all')
    Out[8]: [15, -5, 15, -5]

.. note::

    Every call to ``rc.<meth>(...,targets=x)`` can be made via
    ``rc[<x>].<meth>(...)``, which constructs a View object. The only place
    where this differs is in :meth:`apply`. The :class:`Client` takes many
    arguments to apply, so it requires `args` and `kwargs` to be passed as
    individual arguments. Extended options such as `bound`, `targets`, and
    `block` are controlled by the attributes of the :class:`View` objects, so
    they can provide the much more convenient
    :meth:`View.apply(f,*args,**kwargs)`, which simply calls
    ``f(*args,**kwargs)`` remotely.

This example also shows one of the most important things about the IPython
engines: they have a persistent user namespace. The :meth:`apply` method can
be run in either a bound or unbound way. The default for a View is to be
unbound, unless called by the :meth:`apply_bound` method:

.. sourcecode:: ipython

    In [9]: rc[:]['b'] = 5 # assign b to 5 everywhere

    In [10]: v0 = rc[0]

    In [12]: v0.apply_bound(lambda : b)
    Out[12]: 5

    In [13]: v0.apply(lambda : b)
    ---------------------------------------------------------------------------
    RemoteError                               Traceback (most recent call last)
    /home/you/<ipython-input-34-21a468eb10f0> in <module>()
    ----> 1 v0.apply(lambda : b)
    ...
    RemoteError: NameError(global name 'b' is not defined)
    Traceback (most recent call last):
      File "/Users/minrk/dev/ip/mine/IPython/zmq/parallel/streamkernel.py", line 294, in apply_request
        exec code in working, working
      File "<string>", line 1, in <module>
      File "<ipython-input-34-21a468eb10f0>", line 1, in <lambda>
    NameError: global name 'b' is not defined

Specifically, `bound=True` specifies that the engine's namespace is to be used
for execution, and `bound=False` specifies that the engine's namespace is not
to be used (hence, 'b' is undefined during unbound execution, since the
function is called in an empty namespace). Unbound execution is often useful
for large numbers of atomic tasks, which prevents bloating the engine's
memory, while bound execution lets you build on your previous work.
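The effect of the two modes can be mimicked locally, without engines, by evaluating the same function body against a populated namespace and against an empty one. Here ``engine_ns`` is just an illustrative dict standing in for the engine's namespace, not an IPython object:

```python
# A local analogy (not the IPython API): the same lambda body is compiled
# against a populated "engine" namespace and against an empty one.
engine_ns = {'b': 5}                      # stands in for the engine's namespace

bound_f = eval("lambda : b", engine_ns)   # like apply_bound: sees engine vars
unbound_f = eval("lambda : b", {})        # like apply: runs in an empty namespace

print(bound_f())                          # 5
try:
    unbound_f()
except NameError as err:
    print("NameError:", err)              # name 'b' is not defined
```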
Non-blocking execution
----------------------

In non-blocking mode, :meth:`apply` submits the command to be executed and
then returns an :class:`AsyncResult` object immediately. The
:class:`AsyncResult` object gives you a way of getting a result at a later
time through its :meth:`get` method.

.. Note::

    The :class:`AsyncResult` object provides a superset of the interface in
    :py:class:`multiprocessing.pool.AsyncResult`. See the
    `official Python documentation <http://docs.python.org/library/multiprocessing#multiprocessing.pool.AsyncResult>`_
    for more.

This allows you to quickly submit long running commands without blocking your
local Python/IPython session:

.. sourcecode:: ipython

    # define our function
    In [6]: def wait(t):
       ...:     import time
       ...:     tic = time.time()
       ...:     time.sleep(t)
       ...:     return time.time()-tic

    # In non-blocking mode
    In [7]: pr = rc[:].apply_async(wait, 2)

    # Now block for the result
    In [8]: pr.get()
    Out[8]: [2.0006198883056641, 1.9997570514678955, 1.9996809959411621, 2.0003249645233154]

    # Again in non-blocking mode
    In [9]: pr = rc[:].apply_async(wait, 10)

    # Poll to see if the result is ready
    In [10]: pr.ready()
    Out[10]: False

    # ask for the result, but wait a maximum of 1 second:
    In [45]: pr.get(1)
    ---------------------------------------------------------------------------
    TimeoutError                              Traceback (most recent call last)
    /home/you/<ipython-input-45-7cd858bbb8e0> in <module>()
    ----> 1 pr.get(1)

    /path/to/site-packages/IPython/zmq/parallel/asyncresult.pyc in get(self, timeout)
         62                 raise self._exception
         63             else:
    ---> 64                 raise error.TimeoutError("Result not ready.")
         65
         66     def ready(self):

    TimeoutError: Result not ready.

.. Note::

    Note the import inside the function. This is a common model, to ensure
    that the appropriate modules are imported where the task is run.
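Because the interface is a superset of :py:class:`multiprocessing.pool.AsyncResult`, the :meth:`ready`/:meth:`get` pattern above can be tried locally with a stdlib pool standing in for the engines; this sketch assumes nothing from IPython itself:

```python
import time
from multiprocessing.pool import ThreadPool
from multiprocessing import TimeoutError

# No engines here: a ThreadPool's AsyncResult exposes the same core
# interface (ready/get/wait) that IPython's AsyncResult extends.
def wait(t):
    tic = time.time()
    time.sleep(t)
    return time.time() - tic

pool = ThreadPool(2)
ar = pool.apply_async(wait, (2,))   # submit, return immediately

print(ar.ready())                   # False while the task is still running
try:
    ar.get(1)                       # wait at most 1 second
except TimeoutError:
    print("Result not ready.")

print(round(ar.get()))              # block until done; roughly 2
pool.close()
pool.join()
```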
Often, it is desirable to wait until a set of :class:`AsyncResult` objects
are done. For this, there is the method :meth:`barrier`. This method takes a
tuple of :class:`AsyncResult` objects (or `msg_ids`) and blocks until all of the
associated results are ready:

.. sourcecode:: ipython

    In [72]: rc.block = False

    # A trivial list of AsyncResult objects
    In [73]: pr_list = [rc[:].apply_async(wait, 3) for i in range(10)]

    # Wait until all of them are done
    In [74]: rc.barrier(pr_list)

    # Then, their results are ready using get() or the `.r` attribute
    In [75]: pr_list[0].get()
    Out[75]: [2.9982571601867676, 2.9982588291168213, 2.9987530708312988, 2.9990990161895752]
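The barrier pattern itself is not IPython-specific; a minimal local sketch with the stdlib pool, using :meth:`wait` on each result in place of :meth:`barrier`, looks like this:

```python
from multiprocessing.pool import ThreadPool

# Local sketch of the barrier pattern: submit a batch of asynchronous
# calls, wait on every one, then collect the (already-ready) results.
def square(x):
    return x * x

pool = ThreadPool(4)
ar_list = [pool.apply_async(square, (i,)) for i in range(10)]

for ar in ar_list:          # plays the role of rc.barrier(pr_list)
    ar.wait()

results = [ar.get() for ar in ar_list]
print(results)              # [0, 1, 4, 9, ..., 81]
pool.close()
pool.join()
```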
The ``block`` and ``targets`` keyword arguments and attributes
--------------------------------------------------------------

.. warning::

    This is different now, I haven't updated this section.
    -MinRK

Most methods (like :meth:`apply`) accept
``block`` and ``targets`` as keyword arguments. As we have seen above, these
keyword arguments control the blocking mode and which engines the command is
applied to. The :class:`Client` class also has :attr:`block` and
:attr:`targets` attributes that control the default behavior when the keyword
arguments are not provided. Thus the following logic is used for :attr:`block`
and :attr:`targets`:

* If no keyword argument is provided, the instance attributes are used.
* Keyword arguments, if provided, override the instance attributes.
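That resolution rule amounts to a couple of lines of logic. The ``ToyClient`` below is a stand-in illustrating only the attribute/keyword precedence, not the real :class:`Client`:

```python
# Toy stand-in for the attribute/keyword resolution described above;
# execute() returns the resolved (block, targets) pair instead of running code.
class ToyClient(object):
    def __init__(self):
        self.block = False
        self.targets = 'all'

    def execute(self, code, block=None, targets=None):
        # a keyword argument, when given, overrides the instance attribute
        block = self.block if block is None else block
        targets = self.targets if targets is None else targets
        return block, targets

rc = ToyClient()
print(rc.execute('a=5'))                              # (False, 'all')
print(rc.execute('a=5', block=True, targets=[0, 2]))  # (True, [0, 2])
```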
357 | The following examples demonstrate how to use the instance attributes: |
|
354 | The following examples demonstrate how to use the instance attributes: | |
358 |
|
355 | |||
359 | .. sourcecode:: ipython |
|
356 | .. sourcecode:: ipython | |
360 |
|
357 | |||
361 | In [16]: rc.targets = [0,2] |
|
358 | In [16]: rc.targets = [0,2] | |
362 |
|
359 | |||
363 | In [17]: rc.block = False |
|
360 | In [17]: rc.block = False | |
364 |
|
361 | |||
365 | In [18]: pr = rc.execute('a=5') |
|
362 | In [18]: pr = rc.execute('a=5') | |
366 |
|
363 | |||
367 | In [19]: pr.r |
|
364 | In [19]: pr.r | |
368 | Out[19]: |
|
365 | Out[19]: | |
369 | <Results List> |
|
366 | <Results List> | |
370 | [0] In [6]: a=5 |
|
367 | [0] In [6]: a=5 | |
371 | [2] In [6]: a=5 |
|
368 | [2] In [6]: a=5 | |
372 |
|
369 | |||
373 | # Note targets='all' means all engines |
|
370 | # Note targets='all' means all engines | |
374 | In [20]: rc.targets = 'all' |
|
371 | In [20]: rc.targets = 'all' | |
375 |
|
372 | |||
376 | In [21]: rc.block = True |
|
373 | In [21]: rc.block = True | |
377 |
|
374 | |||
378 | In [22]: rc.execute('b=10; print b') |
|
375 | In [22]: rc.execute('b=10; print b') | |
379 | Out[22]: |
|
376 | Out[22]: | |
380 | <Results List> |
|
377 | <Results List> | |
381 | [0] In [7]: b=10; print b |
|
378 | [0] In [7]: b=10; print b | |
382 | [0] Out[7]: 10 |
|
379 | [0] Out[7]: 10 | |
383 |
|
380 | |||
384 | [1] In [6]: b=10; print b |
|
381 | [1] In [6]: b=10; print b | |
385 | [1] Out[6]: 10 |
|
382 | [1] Out[6]: 10 | |
386 |
|
383 | |||
387 | [2] In [7]: b=10; print b |
|
384 | [2] In [7]: b=10; print b | |
388 | [2] Out[7]: 10 |
|
385 | [2] Out[7]: 10 | |
389 |
|
386 | |||
390 | [3] In [6]: b=10; print b |
|
387 | [3] In [6]: b=10; print b | |
391 | [3] Out[6]: 10 |
|
388 | [3] Out[6]: 10 | |
392 |
|
389 | |||
393 | The :attr:`block` and :attr:`targets` instance attributes also determine the |
|
390 | The :attr:`block` and :attr:`targets` instance attributes also determine the | |
394 | behavior of the parallel magic commands. |
|
391 | behavior of the parallel magic commands. | |
395 |
|
392 | |||
396 |
|
393 | |||
397 | Parallel magic commands |
|
394 | Parallel magic commands | |
398 | ----------------------- |
|
395 | ----------------------- | |
399 |
|
396 | |||
400 | .. warning:: |
|
397 | .. warning:: | |
401 |
|
398 | |||
402 | The magics have not been changed to work with the zeromq system. ``%px`` |
|
399 | The magics have not been changed to work with the zeromq system. ``%px`` | |
403 | and ``%autopx`` do work, but ``%result`` does not. %px and %autopx *do |
|
400 | and ``%autopx`` do work, but ``%result`` does not. %px and %autopx *do | |
404 | not* print stdin/out. |
|
401 | not* print stdin/out. | |
405 |
|
402 | |||
406 | We provide a few IPython magic commands (``%px``, ``%autopx`` and ``%result``) |
|
403 | We provide a few IPython magic commands (``%px``, ``%autopx`` and ``%result``) | |
407 | that make it more pleasant to execute Python commands on the engines |
|
404 | that make it more pleasant to execute Python commands on the engines | |
408 | interactively. These are simply shortcuts to :meth:`execute` and |
|
405 | interactively. These are simply shortcuts to :meth:`execute` and | |
409 | :meth:`get_result`. The ``%px`` magic executes a single Python command on the |
|
406 | :meth:`get_result`. The ``%px`` magic executes a single Python command on the | |
410 | engines specified by the :attr:`targets` attribute of the |
|
407 | engines specified by the :attr:`targets` attribute of the | |
411 | :class:`MultiEngineClient` instance (by default this is ``'all'``): |
|
408 | :class:`MultiEngineClient` instance (by default this is ``'all'``): | |
412 |
|
409 | |||
413 | .. sourcecode:: ipython |
|
410 | .. sourcecode:: ipython | |
414 |
|
411 | |||
415 | # Create a DirectView for all targets |
|
412 | # Create a DirectView for all targets | |
416 | In [22]: dv = rc[:] |
|
413 | In [22]: dv = rc[:] | |
417 |
|
414 | |||
418 | # Make this DirectView active for parallel magic commands |
|
415 | # Make this DirectView active for parallel magic commands | |
419 | In [23]: dv.activate() |
|
416 | In [23]: dv.activate() | |
420 |
|
417 | |||
421 | In [24]: dv.block=True |
|
418 | In [24]: dv.block=True | |
422 |
|
419 | |||
423 | In [25]: import numpy |
|
420 | In [25]: import numpy | |
424 |
|
421 | |||
425 | In [26]: %px import numpy |
|
422 | In [26]: %px import numpy | |
426 | Parallel execution on engines: [0, 1, 2, 3] |
|
423 | Parallel execution on engines: [0, 1, 2, 3] | |
427 | Out[26]:[None,None,None,None] |
|
424 | Out[26]:[None,None,None,None] | |
428 |
|
425 | |||
429 | In [27]: %px a = numpy.random.rand(2,2) |
|
426 | In [27]: %px a = numpy.random.rand(2,2) | |
430 | Parallel execution on engines: [0, 1, 2, 3] |
|
427 | Parallel execution on engines: [0, 1, 2, 3] | |
431 |
|
428 | |||
432 | In [28]: %px ev = numpy.linalg.eigvals(a) |
|
429 | In [28]: %px ev = numpy.linalg.eigvals(a) | |
433 | Parallel execution on engines: [0, 1, 2, 3] |
|
430 | Parallel execution on engines: [0, 1, 2, 3] | |
434 |
|
431 | |||
435 | In [28]: dv['ev'] |
|
432 | In [28]: dv['ev'] | |
436 | Out[44]: [ array([ 1.09522024, -0.09645227]), |
|
433 | Out[44]: [ array([ 1.09522024, -0.09645227]), | |
437 | array([ 1.21435496, -0.35546712]), |
|
434 | array([ 1.21435496, -0.35546712]), | |
438 | array([ 0.72180653, 0.07133042]), |
|
435 | array([ 0.72180653, 0.07133042]), | |
439 | array([ 1.46384341e+00, 1.04353244e-04]) |
|
436 | array([ 1.46384341e+00, 1.04353244e-04]) | |
440 | ] |
|
437 | ] | |
441 |
|
438 | |||
442 | .. Note:: |
|
439 | .. Note:: | |
443 |
|
440 | |||
444 | ``%result`` doesn't work |
|
441 | ``%result`` doesn't work | |
445 |
|
442 | |||
446 | The ``%result`` magic gets and prints the stdin/stdout/stderr of the last |
|
443 | The ``%result`` magic gets and prints the stdin/stdout/stderr of the last | |
447 | command executed on each engine. It is simply a shortcut to the |
|
444 | command executed on each engine. It is simply a shortcut to the | |
448 | :meth:`get_result` method: |
|
445 | :meth:`get_result` method: | |
449 |
|
446 | |||
450 | .. sourcecode:: ipython |
|
447 | .. sourcecode:: ipython | |
451 |
|
448 | |||
452 | In [29]: %result |
|
449 | In [29]: %result | |
453 | Out[29]: |
|
450 | Out[29]: | |
454 | <Results List> |
|
451 | <Results List> | |
455 | [0] In [10]: print numpy.linalg.eigvals(a) |
|
452 | [0] In [10]: print numpy.linalg.eigvals(a) | |
456 | [0] Out[10]: [ 1.28167017 0.14197338] |
|
453 | [0] Out[10]: [ 1.28167017 0.14197338] | |
457 |
|
454 | |||
458 | [1] In [9]: print numpy.linalg.eigvals(a) |
|
455 | [1] In [9]: print numpy.linalg.eigvals(a) | |
459 | [1] Out[9]: [-0.14093616 1.27877273] |
|
456 | [1] Out[9]: [-0.14093616 1.27877273] | |
460 |
|
457 | |||
461 | [2] In [10]: print numpy.linalg.eigvals(a) |
|
458 | [2] In [10]: print numpy.linalg.eigvals(a) | |
462 | [2] Out[10]: [-0.37023573 1.06779409] |
|
459 | [2] Out[10]: [-0.37023573 1.06779409] | |
463 |
|
460 | |||
464 | [3] In [9]: print numpy.linalg.eigvals(a) |
|
461 | [3] In [9]: print numpy.linalg.eigvals(a) | |
465 | [3] Out[9]: [ 0.83664764 -0.25602658] |
|
462 | [3] Out[9]: [ 0.83664764 -0.25602658] | |
466 |
|
463 | |||
467 | The ``%autopx`` magic switches to a mode where everything you type is executed |
|
464 | The ``%autopx`` magic switches to a mode where everything you type is executed | |
468 | on the engines given by the :attr:`targets` attribute: |
|
465 | on the engines given by the :attr:`targets` attribute: | |
469 |
|
466 | |||
470 | .. sourcecode:: ipython |
|
467 | .. sourcecode:: ipython | |
471 |
|
468 | |||
472 | In [30]: dv.block=False |
|
469 | In [30]: dv.block=False | |
473 |
|
470 | |||
474 | In [31]: %autopx |
|
471 | In [31]: %autopx | |
475 | Auto Parallel Enabled |
|
472 | Auto Parallel Enabled | |
476 | Type %autopx to disable |
|
473 | Type %autopx to disable | |
477 |
|
474 | |||
478 | In [32]: max_evals = [] |
|
475 | In [32]: max_evals = [] | |
479 | <IPython.zmq.parallel.asyncresult.AsyncResult object at 0x17b8a70> |
|
476 | <IPython.zmq.parallel.asyncresult.AsyncResult object at 0x17b8a70> | |
480 |
|
477 | |||
481 | In [33]: for i in range(100): |
|
478 | In [33]: for i in range(100): | |
482 | ....: a = numpy.random.rand(10,10) |
|
479 | ....: a = numpy.random.rand(10,10) | |
483 | ....: a = a+a.transpose() |
|
480 | ....: a = a+a.transpose() | |
484 | ....: evals = numpy.linalg.eigvals(a) |
|
481 | ....: evals = numpy.linalg.eigvals(a) | |
485 | ....: max_evals.append(evals[0].real) |
|
482 | ....: max_evals.append(evals[0].real) | |
486 | ....: |
|
483 | ....: | |
487 | ....: |
|
484 | ....: | |
488 | <IPython.zmq.parallel.asyncresult.AsyncResult object at 0x17af8f0> |
|
485 | <IPython.zmq.parallel.asyncresult.AsyncResult object at 0x17af8f0> | |
489 |
|
486 | |||
490 | In [34]: %autopx |
|
487 | In [34]: %autopx | |
491 | Auto Parallel Disabled |
|
488 | Auto Parallel Disabled | |
492 |
|
489 | |||
493 | In [35]: dv.block=True |
|
490 | In [35]: dv.block=True | |
494 |
|
491 | |||
495 | In [36]: px ans= "Average max eigenvalue is: %f"%(sum(max_evals)/len(max_evals)) |
|
492 | In [36]: px ans= "Average max eigenvalue is: %f"%(sum(max_evals)/len(max_evals)) | |
496 | Parallel execution on engines: [0, 1, 2, 3] |
|
493 | Parallel execution on engines: [0, 1, 2, 3] | |
497 |
|
494 | |||
498 | In [37]: dv['ans'] |
|
495 | In [37]: dv['ans'] | |
499 | Out[37]: [ 'Average max eigenvalue is: 10.1387247332', |
|
496 | Out[37]: [ 'Average max eigenvalue is: 10.1387247332', | |
500 | 'Average max eigenvalue is: 10.2076902286', |
|
497 | 'Average max eigenvalue is: 10.2076902286', | |
501 | 'Average max eigenvalue is: 10.1891484655', |
|
498 | 'Average max eigenvalue is: 10.1891484655', | |
502 | 'Average max eigenvalue is: 10.1158837784',] |
|
499 | 'Average max eigenvalue is: 10.1158837784',] | |
503 |
|
500 | |||
504 |
|
501 | |||
505 | .. Note:: |
|
502 | .. Note:: | |
506 |
|
503 | |||
507 | Multiline ``%autpx`` gets fouled up by NameErrors, because IPython |
|
504 | Multiline ``%autpx`` gets fouled up by NameErrors, because IPython | |
508 | currently introspects too much. |
|
currently introspects too much.

Moving Python objects around
============================

In addition to calling functions and executing code on engines, you can
transfer Python objects to and from your IPython session and the engines. In
IPython, these operations are called :meth:`push` (sending an object to the
engines) and :meth:`pull` (getting an object from the engines).

Basic push and pull
-------------------

Here are some examples of how you use :meth:`push` and :meth:`pull`:

.. sourcecode:: ipython

    In [38]: rc.push(dict(a=1.03234,b=3453))
    Out[38]: [None,None,None,None]

    In [39]: rc.pull('a')
    Out[39]: [ 1.03234, 1.03234, 1.03234, 1.03234]

    In [40]: rc.pull('b',targets=0)
    Out[40]: 3453

    In [41]: rc.pull(('a','b'))
    Out[41]: [ [1.03234, 3453], [1.03234, 3453], [1.03234, 3453], [1.03234, 3453] ]

    # zmq client does not have zip_pull
    In [42]: rc.zip_pull(('a','b'))
    Out[42]: [(1.03234, 1.03234, 1.03234, 1.03234), (3453, 3453, 3453, 3453)]

    In [43]: rc.push(dict(c='speed'))
    Out[43]: [None,None,None,None]

In non-blocking mode, :meth:`push` and :meth:`pull` also return
:class:`AsyncResult` objects:

.. sourcecode:: ipython

    In [47]: rc.block=False

    In [48]: pr = rc.pull('a')

    In [49]: pr.get()
    Out[49]: [1.03234, 1.03234, 1.03234, 1.03234]
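
The blocking/non-blocking split above follows the same pattern as Python's
standard :class:`concurrent.futures.Future`. As a rough analogy (this sketch
uses plain ``concurrent.futures``, not the IPython API; ``pull`` and
``engine_namespace`` are hypothetical stand-ins):

.. sourcecode:: python

    from concurrent.futures import ThreadPoolExecutor

    def pull(name, namespace):
        """Stand-in for an engine-side variable lookup."""
        return namespace[name]

    engine_namespace = {'a': 1.03234}

    with ThreadPoolExecutor(max_workers=1) as pool:
        # submit() returns immediately, like rc.pull in non-blocking mode
        fut = pool.submit(pull, 'a', engine_namespace)
        # result() blocks until the value is ready, like pr.get()
        value = fut.result()

    print(value)  # prints 1.03234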

Dictionary interface
--------------------

Since a namespace is just a :class:`dict`, :class:`DirectView` objects provide
dictionary-style access by key and methods such as :meth:`get` and
:meth:`update` for convenience. This makes the remote namespaces of the engines
appear as a local dictionary. Underneath, this uses :meth:`push` and
:meth:`pull`:

.. sourcecode:: ipython

    In [50]: rc.block=True

    In [51]: rc[:]['a']=['foo','bar']

    In [52]: rc[:]['a']
    Out[52]: [ ['foo', 'bar'], ['foo', 'bar'], ['foo', 'bar'], ['foo', 'bar'] ]
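
The delegation from dictionary syntax to push/pull can be sketched in a few
lines. This is an illustrative stand-in, not the real :class:`DirectView`
implementation; ``FakeEngine`` and ``DictView`` are names of our own:

.. sourcecode:: python

    class FakeEngine:
        """Stand-in for a remote engine's namespace."""
        def __init__(self):
            self.namespace = {}
        def push(self, d):
            self.namespace.update(d)
        def pull(self, key):
            return self.namespace[key]

    class DictView:
        """Dictionary facade over a list of engines, like rc[:]."""
        def __init__(self, engines):
            self.engines = engines
        def __setitem__(self, key, value):
            for e in self.engines:          # a push to every engine
                e.push({key: value})
        def __getitem__(self, key):
            # a pull from every engine: one copy of the value per engine
            return [e.pull(key) for e in self.engines]

    view = DictView([FakeEngine() for _ in range(4)])
    view['a'] = ['foo', 'bar']
    print(view['a'])  # prints [['foo', 'bar'], ['foo', 'bar'], ['foo', 'bar'], ['foo', 'bar']]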

Scatter and gather
------------------

Sometimes it is useful to partition a sequence and push the partitions to
different engines. In MPI language, this is known as scatter/gather and we
follow that terminology. However, it is important to remember that in
IPython's :class:`Client` class, :meth:`scatter` is from the
interactive IPython session to the engines and :meth:`gather` is from the
engines back to the interactive IPython session. For scatter/gather operations
between engines, MPI should be used:

.. sourcecode:: ipython

    In [58]: rc.scatter('a',range(16))
    Out[58]: [None,None,None,None]

    In [59]: rc[:]['a']
    Out[59]: [ [0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15] ]

    In [60]: rc.gather('a')
    Out[60]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
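
The partitioning shown above (contiguous, roughly equal chunks, with any
remainder spread over the first chunks) can be sketched in plain Python. The
helper names here are ours, not IPython's:

.. sourcecode:: python

    def scatter(seq, n):
        """Split seq into n contiguous, roughly equal chunks."""
        q, r = divmod(len(seq), n)
        chunks, start = [], 0
        for i in range(n):
            size = q + (1 if i < r else 0)  # first r chunks take one extra item
            chunks.append(seq[start:start + size])
            start += size
        return chunks

    def gather(chunks):
        """Concatenate the chunks back into one flat list, preserving order."""
        return [item for chunk in chunks for item in chunk]

    parts = scatter(list(range(16)), 4)
    print(parts)   # prints [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    print(gather(parts) == list(range(16)))  # prints True

Because gather preserves chunk order, a scatter followed by a gather is the
identity on the original sequence.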

Other things to look at
=======================

How to do parallel list comprehensions
--------------------------------------

In many cases list comprehensions are nicer than using the map function. While
we don't have fully parallel list comprehensions, it is simple to get the
basic effect using :meth:`scatter` and :meth:`gather`:

.. sourcecode:: ipython

    In [66]: rc.scatter('x',range(64))
    Out[66]: [None,None,None,None]

    In [67]: px y = [i**10 for i in x]
    Parallel execution on engines: [0, 1, 2, 3]
    Out[67]:

    In [68]: y = rc.gather('y')

    In [69]: print y
    [0, 1, 1024, 59049, 1048576, 9765625, 60466176, 282475249, 1073741824,...]
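
The scatter / per-engine comprehension / gather pattern above can be emulated
serially, which makes it clear why the result equals the plain serial
comprehension (a sketch assuming the contiguous 4-way split of the session):

.. sourcecode:: python

    data = list(range(64))
    # scatter: contiguous 16-element chunks, one per "engine"
    chunks = [data[i * 16:(i + 1) * 16] for i in range(4)]
    # the %px step: each engine runs the comprehension on its own chunk
    results = [[i ** 10 for i in chunk] for chunk in chunks]
    # gather: flatten the per-engine results back into one list
    y = [v for chunk in results for v in chunk]

    print(y[:4])  # prints [0, 1, 1024, 59049]
    print(y == [i ** 10 for i in data])  # prints True

Since the chunks are contiguous and gather preserves their order, the
parallel result is identical to the serial comprehension.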

Parallel exceptions
-------------------

In the multiengine interface, parallel commands can raise Python exceptions,
just like serial commands. But, it is a little subtle, because a single
parallel command can actually raise multiple exceptions (one for each engine
the command was run on). To express this idea, the MultiEngine interface has a
:exc:`CompositeError` exception class that will be raised in most cases. The
:exc:`CompositeError` class is a special type of exception that wraps one or
more other types of exceptions. Here is how it works:

.. sourcecode:: ipython

    In [76]: rc.block=True

    In [77]: rc.execute('1/0')
    ---------------------------------------------------------------------------
    CompositeError                            Traceback (most recent call last)

    /ipython1-client-r3021/docs/examples/<ipython console> in <module>()

    /ipython1-client-r3021/ipython1/kernel/multiengineclient.pyc in execute(self, lines, targets, block)
        432         targets, block = self._findTargetsAndBlock(targets, block)
        433         result = blockingCallFromThread(self.smultiengine.execute, lines,
    --> 434                                         targets=targets, block=block)
        435         if block:
        436             result = ResultList(result)

    /ipython1-client-r3021/ipython1/kernel/twistedutil.pyc in blockingCallFromThread(f, *a, **kw)
         72             result.raiseException()
         73         except Exception, e:
    ---> 74             raise e
         75     return result
         76

    CompositeError: one or more exceptions from call to method: execute
    [0:execute]: ZeroDivisionError: integer division or modulo by zero
    [1:execute]: ZeroDivisionError: integer division or modulo by zero
    [2:execute]: ZeroDivisionError: integer division or modulo by zero
    [3:execute]: ZeroDivisionError: integer division or modulo by zero

Notice how the error message printed when :exc:`CompositeError` is raised has
information about the individual exceptions that were raised on each engine.
If you want, you can even raise one of these original exceptions:

.. sourcecode:: ipython

    In [80]: try:
       ....:     rc.execute('1/0')
       ....: except client.CompositeError, e:
       ....:     e.raise_exception()
       ....:
       ....:
    ---------------------------------------------------------------------------
    ZeroDivisionError                         Traceback (most recent call last)

    /ipython1-client-r3021/docs/examples/<ipython console> in <module>()

    /ipython1-client-r3021/ipython1/kernel/error.pyc in raise_exception(self, excid)
        156             raise IndexError("an exception with index %i does not exist"%excid)
        157         else:
    --> 158             raise et, ev, etb
        159
        160 def collect_exceptions(rlist, method):

    ZeroDivisionError: integer division or modulo by zero

If you are working in IPython, you can simply type ``%debug`` after one of
these :exc:`CompositeError` exceptions is raised, and inspect the exception
instance:

.. sourcecode:: ipython

    In [81]: rc.execute('1/0')
    ---------------------------------------------------------------------------
    CompositeError                            Traceback (most recent call last)

    /ipython1-client-r3021/docs/examples/<ipython console> in <module>()

    /ipython1-client-r3021/ipython1/kernel/multiengineclient.pyc in execute(self, lines, targets, block)
        432         targets, block = self._findTargetsAndBlock(targets, block)
        433         result = blockingCallFromThread(self.smultiengine.execute, lines,
    --> 434                                         targets=targets, block=block)
        435         if block:
        436             result = ResultList(result)

    /ipython1-client-r3021/ipython1/kernel/twistedutil.pyc in blockingCallFromThread(f, *a, **kw)
         72             result.raiseException()
         73         except Exception, e:
    ---> 74             raise e
         75     return result
         76

    CompositeError: one or more exceptions from call to method: execute
    [0:execute]: ZeroDivisionError: integer division or modulo by zero
    [1:execute]: ZeroDivisionError: integer division or modulo by zero
    [2:execute]: ZeroDivisionError: integer division or modulo by zero
    [3:execute]: ZeroDivisionError: integer division or modulo by zero

    In [82]: %debug
    > /ipython1-client-r3021/ipython1/kernel/twistedutil.py(74)blockingCallFromThread()
         73         except Exception, e:
    ---> 74             raise e
         75     return result

    # With the debugger running, e is the exception instance.  We can tab complete
    # on it and see the extra methods that are available.
    ipdb> e.
    e.__class__         e.__getitem__       e.__new__           e.__setstate__      e.args
    e.__delattr__       e.__getslice__      e.__reduce__        e.__str__           e.elist
    e.__dict__          e.__hash__          e.__reduce_ex__     e.__weakref__       e.message
    e.__doc__           e.__init__          e.__repr__          e._get_engine_str   e.print_tracebacks
    e.__getattribute__  e.__module__        e.__setattr__       e._get_traceback    e.raise_exception
    ipdb> e.print_tracebacks()
    [0:execute]:
    ---------------------------------------------------------------------------
    ZeroDivisionError                         Traceback (most recent call last)

    /ipython1-client-r3021/docs/examples/<string> in <module>()

    ZeroDivisionError: integer division or modulo by zero

    [1:execute]:
    ---------------------------------------------------------------------------
    ZeroDivisionError                         Traceback (most recent call last)

    /ipython1-client-r3021/docs/examples/<string> in <module>()

    ZeroDivisionError: integer division or modulo by zero

    [2:execute]:
    ---------------------------------------------------------------------------
    ZeroDivisionError                         Traceback (most recent call last)

    /ipython1-client-r3021/docs/examples/<string> in <module>()

    ZeroDivisionError: integer division or modulo by zero

    [3:execute]:
    ---------------------------------------------------------------------------
    ZeroDivisionError                         Traceback (most recent call last)

    /ipython1-client-r3021/docs/examples/<string> in <module>()

    ZeroDivisionError: integer division or modulo by zero

All of this same error handling magic even works in non-blocking mode:

.. sourcecode:: ipython

    In [83]: rc.block=False

    In [84]: pr = rc.execute('1/0')

    In [85]: pr.get()
    ---------------------------------------------------------------------------
    CompositeError                            Traceback (most recent call last)

    /ipython1-client-r3021/docs/examples/<ipython console> in <module>()

    /ipython1-client-r3021/ipython1/kernel/multiengineclient.pyc in _get_r(self)
        170
        171     def _get_r(self):
    --> 172         return self.get_result(block=True)
        173
        174     r = property(_get_r)

    /ipython1-client-r3021/ipython1/kernel/multiengineclient.pyc in get_result(self, default, block)
        131             return self.result
        132         try:
    --> 133             result = self.client.get_pending_deferred(self.result_id, block)
        134         except error.ResultNotCompleted:
        135             return default

    /ipython1-client-r3021/ipython1/kernel/multiengineclient.pyc in get_pending_deferred(self, deferredID, block)
        385
        386     def get_pending_deferred(self, deferredID, block):
    --> 387         return blockingCallFromThread(self.smultiengine.get_pending_deferred, deferredID, block)
        388
        389     def barrier(self, pendingResults):

    /ipython1-client-r3021/ipython1/kernel/twistedutil.pyc in blockingCallFromThread(f, *a, **kw)
         72             result.raiseException()
         73         except Exception, e:
    ---> 74             raise e
         75     return result
         76

    CompositeError: one or more exceptions from call to method: execute
    [0:execute]: ZeroDivisionError: integer division or modulo by zero
    [1:execute]: ZeroDivisionError: integer division or modulo by zero
    [2:execute]: ZeroDivisionError: integer division or modulo by zero
    [3:execute]: ZeroDivisionError: integer division or modulo by zero
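
The wrapping behaviour described above is easy to sketch in plain Python.
This ``MiniCompositeError`` is an illustrative stand-in of our own, not
IPython's actual :exc:`CompositeError` class:

.. sourcecode:: python

    class MiniCompositeError(Exception):
        """Toy composite exception: wraps one exception per engine."""
        def __init__(self, method, errors):
            self.errors = errors  # list of (engine_id, exception) pairs
            msg = "one or more exceptions from call to method: %s" % method
            for eid, err in errors:
                msg += "\n[%i:%s]: %s: %s" % (eid, method, type(err).__name__, err)
            super().__init__(msg)

        def raise_exception(self, excid=0):
            """Re-raise the original exception from one engine."""
            raise self.errors[excid][1]

    # Simulate the same command failing on four engines:
    errors = []
    for eid in range(4):
        try:
            1 / 0
        except ZeroDivisionError as exc:
            errors.append((eid, exc))

    err = MiniCompositeError('execute', errors)
    print(err)  # summary line plus one [i:execute] line per engine

Calling ``err.raise_exception(2)`` would then re-raise engine 2's original
:exc:`ZeroDivisionError`, mirroring the session above.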

.. _parallelsecurity:

===========================
Security details of IPython
===========================

.. note::

    This section is not thorough, and IPython.zmq needs a thorough security
    audit.

IPython's :mod:`IPython.zmq` package exposes the full power of the
Python interpreter over a TCP/IP network for the purposes of parallel
computing. This feature brings up the important question of IPython's security
model. This document gives details about this model and how it is implemented
in IPython's architecture.

Process and network topology
============================

To enable parallel computing, IPython has a number of different processes that
run. These processes are discussed at length in the IPython documentation and
are summarized here:

* The IPython *engine*. This process is a full blown Python
  interpreter in which user code is executed. Multiple
  engines are started to make parallel computing possible.
* The IPython *hub*. This process monitors a set of
  engines and schedulers, and keeps track of the state of the processes. It listens
  for registration connections from engines and clients, and monitor connections
  from schedulers.
* The IPython *schedulers*. This is a set of processes that relay commands and results
  between clients and engines. They are typically on the same machine as the controller,
  and listen for connections from engines and clients, but connect to the Hub.
* The IPython *client*. This process is typically an
  interactive Python process that is used to coordinate the
  engines to get a parallel computation done.

Collectively, these processes are called the IPython *kernel*, and the hub and schedulers
together are referred to as the *controller*.

.. note::

    Are these really still referred to as the Kernel? It doesn't seem so to me. 'cluster'
    seems more accurate.

    -MinRK

These processes communicate over any transport supported by ZeroMQ (tcp, pgm,
infiniband, ipc) with a well-defined topology. The IPython hub and schedulers
listen on sockets. Upon starting, an engine connects to a hub and registers
itself, which then informs the engine of the connection information for the
schedulers, and the engine then connects to the schedulers. These engine/hub
and engine/scheduler connections persist for the lifetime of each engine.
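The registration handshake just described can be sketched as plain Python
bookkeeping (no real ZeroMQ sockets here; ``Hub``, ``Engine``, and the
addresses are hypothetical names for illustration only):

.. sourcecode:: python

    class Hub:
        def __init__(self, scheduler_addrs):
            self.scheduler_addrs = scheduler_addrs
            self.engines = {}

        def register(self, engine_id):
            """An engine registers; the hub replies with scheduler connection info."""
            self.engines[engine_id] = 'registered'
            return self.scheduler_addrs

    class Engine:
        def __init__(self, engine_id):
            self.engine_id = engine_id
            self.connected_to = []

        def start(self, hub):
            # 1. connect to the hub and register
            addrs = hub.register(self.engine_id)
            # 2. connect to each scheduler the hub told us about;
            #    these connections persist for the engine's lifetime
            self.connected_to = list(addrs)

    hub = Hub(scheduler_addrs=['tcp://127.0.0.1:5555', 'tcp://127.0.0.1:5556'])
    engine = Engine(0)
    engine.start(hub)
    print(engine.connected_to)  # the scheduler addresses supplied by the hub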

The IPython client also connects to the controller processes using a number of socket
connections. As of writing, this is one socket per scheduler (4), and 3 connections to the
hub for a total of 7. These connections persist for the lifetime of the client only.

A given IPython controller and set of engines typically has a relatively
short lifetime. Typically this lifetime corresponds to the duration of a single parallel
simulation performed by a single user. Finally, the hub, schedulers, engines, and client
processes typically execute with the permissions of that same user. More specifically, the
controller and engines are *not* executed as root or with any other superuser permissions.

Application logic
=================

When running the IPython kernel to perform a parallel computation, a user
utilizes the IPython client to send Python commands and data through the
IPython schedulers to the IPython engines, where those commands are executed
and the data processed. The design of IPython ensures that the client is the
only access point for the capabilities of the engines. That is, the only way
of addressing the engines is through a client.

A user can utilize the client to instruct the IPython engines to execute
arbitrary Python commands. These Python commands can include calls to the
system shell, access the filesystem, etc., as required by the user's
application code. From this perspective, when a user runs an IPython engine on
a host, that engine has the same capabilities and permissions as the user
themselves (as if they were logged onto the engine's host with a terminal).
82 |
|
82 | |||
Secure network connections
==========================

Overview
--------

ZeroMQ provides exactly no security. For this reason, users of IPython must be very
careful in managing connections, because an open TCP/IP socket presents access to
arbitrary execution as the user on the engine machines. As a result, the default
behavior of controller processes is to only listen for clients on the loopback
interface, and the client must establish SSH tunnels to connect to the controller
processes.

.. warning::

    If the controller's loopback interface is untrusted, then IPython should be
    considered vulnerable, and this extends to the loopback of all connected clients,
    which have opened a loopback port that is redirected to the controller's loopback
    port.

SSH
---

Since ZeroMQ provides no security, SSH tunnels are the primary source of secure
connections. A connector file, such as `ipcontroller-client.json`, will contain
information for connecting to the controller, possibly including the address of an
ssh-server through which the client is to tunnel. The Client object then creates tunnels
using either [OpenSSH]_ or [Paramiko]_, depending on the platform. If users do not wish
to use OpenSSH or Paramiko, or the tunneling utilities are insufficient, then they may
construct the tunnels themselves, and simply connect clients and engines as if the
controller were on loopback on the connecting machine.

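When constructing such a tunnel by hand with OpenSSH, the command might look like the
following sketch (the host name, user, and ports are placeholders; adjust them to your
own controller's configuration):

```shell
# Forward local port 10101 to port 10101 on the controller host's loopback.
# -f: go to background after authentication; -N: no remote command, just the tunnel.
ssh -f -N -L 10101:127.0.0.1:10101 user@my.server.com
```

The client can then connect to ``tcp://127.0.0.1:10101`` as if the controller were
running locally.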

.. note::

    Tunneling is not currently available for engines.

Authentication
--------------

To protect users of shared machines, an execution key is used to authenticate all
messages.

The Session object that handles the message protocol uses a unique key to verify valid
messages. This can be any value specified by the user, but the default behavior is a
pseudo-random 128-bit number, as generated by `uuid.uuid4()`. This key is checked on
every message everywhere it is unpacked (Controller, Engine, and Client) to ensure that
it came from an authentic user, and no messages that do not contain this key are acted
upon in any way.

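A minimal sketch of this scheme (the function names and message layout here are
invented for illustration, not IPython's actual internals): one shared key is attached
to every message, and anything that does not carry it is dropped.

```python
import uuid

def new_session_key():
    # The default behavior described above: a pseudo-random 128-bit value.
    return str(uuid.uuid4())

def accept(message, session_key):
    # A message is acted upon only if it carries the cluster's key.
    return message.get('key') == session_key

key = new_session_key()

trusted = {'key': key, 'content': 'execute: a = 5'}
stray = {'key': str(uuid.uuid4()), 'content': 'execute: a = 5'}

print(accept(trusted, key))   # True
print(accept(stray, key))     # False
```

Note that this is an authorization check, not cryptography: anyone who can read the
traffic can read the key.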
There is exactly one key per cluster - it must be the same everywhere. Typically, the
controller creates this key, and stores it in the private connection files
`ipython-{engine|client}.json`. These files are typically stored in the
`~/.ipython/clusterz_<profile>/security` directory, and are maintained as readable only
by the owner, just as is common practice with a user's keys in their `.ssh` directory.

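As a sketch of how such a connection file might be written and consumed (the file name
follows the convention above, but the JSON field names are assumptions for
illustration, not the exact schema IPython writes):

```python
import json
import os
import stat
import tempfile
import uuid

# Controller side: write the key and listening address to a private file.
# The temporary directory stands in for ~/.ipython/clusterz_<profile>/security.
security_dir = tempfile.mkdtemp()
path = os.path.join(security_dir, 'ipcontroller-client.json')

info = {'url': 'tcp://127.0.0.1:10101', 'exec_key': str(uuid.uuid4())}
with open(path, 'w') as f:
    json.dump(info, f)
os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # 0600, as with keys in ~/.ssh

# Client side: load the same file to find the controller and the key.
with open(path) as f:
    loaded = json.load(f)
print(loaded['url'])
```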
.. warning::

    It is important to note that the key authentication, as emphasized by the use of a
    uuid rather than a key generated with a cryptographic library, provides a defense
    against *accidental* messages more than it does against malicious attacks. If
    loopback is compromised, it would be trivial for an attacker to intercept messages
    and deduce the key, as there is no encryption.

Specific security vulnerabilities
=================================

There are a number of potential security vulnerabilities present in IPython's
architecture. In this section we discuss those vulnerabilities and detail how
the security architecture described above prevents them from being exploited.

Unauthorized clients
--------------------

The IPython client can instruct the IPython engines to execute arbitrary
Python code with the permissions of the user who started the engines. If an
attacker were able to connect their own hostile IPython client to the IPython
controller, they could instruct the engines to execute code.

On the first level, this attack is prevented by requiring access to the controller's
ports, which are recommended to only be open on loopback if the controller is on an
untrusted local network. If the attacker does have access to the controller's ports,
then the attack is prevented by the capabilities-based client authentication of the
execution key. The relevant authentication information is encoded into the JSON file
that clients must present to gain access to the IPython controller. By limiting the
distribution of those keys, a user can grant access to only authorized persons, just as
with SSH keys.

It is highly unlikely that an execution key could be guessed by an attacker
in a brute-force guessing attack. A given instance of the IPython controller
only runs for a relatively short amount of time (on the order of hours). Thus
an attacker would have only a limited amount of time to test a search space of
size 2**128.

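A back-of-the-envelope calculation makes this concrete (the guess rate below is an
assumption, chosen to be generous to the attacker):

```python
# Even a very fast attacker covers a negligible fraction of a 128-bit
# key space during a controller's lifetime.
keyspace = 2 ** 128
guesses_per_second = 10 ** 9   # assumed: one billion guesses per second
lifetime = 24 * 3600           # one full day, longer than a typical controller runs

fraction = float(guesses_per_second * lifetime) / keyspace
print(fraction)  # on the order of 1e-25
```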
.. warning::

    If the attacker has gained enough access to intercept loopback connections on
    *either* the controller or client, then the key is easily deduced from network
    traffic.

Unauthorized engines
--------------------

If an attacker were able to connect a hostile engine to a user's controller,
the user might unknowingly send sensitive code or data to the hostile engine.
The attacker's engine would then have full access to that code and data.

This type of attack is prevented in the same way as the unauthorized client
attack, through the use of the capabilities-based authentication scheme.

Unauthorized controllers
------------------------

It is also possible that an attacker could try to convince a user's IPython
client or engine to connect to a hostile IPython controller. That controller
would then have full access to the code and data sent between the IPython
client and the IPython engines.

Again, this attack is prevented through the capabilities in a connection file, which
ensure that a client or engine connects to the correct controller. It is also important
to note that the connection files also encode the IP address and port that the
controller is listening on, so there is little chance of mistakenly connecting to a
controller running on a different IP address and port.

When starting an engine or client, a user must specify the key to use
for that connection. Thus, in order to introduce a hostile controller, the
attacker must convince the user to use the key associated with the
hostile controller. As long as a user is diligent in only using keys from
trusted sources, this attack is not possible.

.. note::

    I may be wrong; the unauthorized controller may be easier to fake than this.

Other security measures
=======================

A number of other measures are taken to further limit the security risks
involved in running the IPython kernel.

First, by default, the IPython controller listens on random port numbers.
While this can be overridden by the user, in the default configuration, an
attacker would have to do a port scan to even find a controller to attack.
When coupled with the relatively short running time of a typical controller
(on the order of hours), an attacker would have to work extremely hard and
extremely *fast* to even find a running controller to attack.

Second, much of the time, especially when run on supercomputers or clusters,
the controller is running behind a firewall. Thus, for engines or clients to
connect to the controller:

* The different processes all have to be behind the firewall.

or:

* The user has to use SSH port forwarding to tunnel the
  connections through the firewall.

In either case, an attacker is presented with additional barriers that prevent
attacking or even probing the system.

Summary
=======

IPython's architecture has been carefully designed with security in mind. The
capabilities-based authentication model, in conjunction with SSH-tunneled
TCP/IP channels, addresses the core potential vulnerabilities in the system,
while still enabling users to use the system on open networks.

Other questions
===============

.. note::

    This does not apply to ZMQ, but I am sure there will be questions.

About keys
----------

Can you clarify the roles of the certificate and its keys versus the FURL,
which is also called a key?

The certificate created by IPython processes is a standard public-key x509
certificate that is used by the SSL handshake protocol to set up an encrypted
channel between the controller and the IPython engine or client. The public
and private keys associated with this certificate are used only by the SSL
handshake protocol in setting up this encrypted channel.

The FURL serves a completely different and independent purpose from the
key pair associated with the certificate. When we refer to a FURL as a
key, we are using the word "key" in the capabilities-based security model
sense. This has nothing to do with "key" in the public/private key sense used
in the SSL protocol.

With that said, the FURL is used as a cryptographic key, to grant
IPython engines and clients access to particular capabilities that the
controller offers.

Self-signed certificates
------------------------

Is the controller creating a self-signed certificate? Is this created per
instance/session, as a one-time setup, or each time the controller is started?

The Foolscap network protocol, which handles the SSL protocol details, creates
a self-signed x509 certificate using OpenSSL for each IPython process. The
lifetime of the certificate is handled differently for the IPython controller
and the engines/client.

For the IPython engines and client, the certificate is only held in memory for
the lifetime of its process. It is never written to disk.

For the controller, the certificate can be created anew each time the
controller starts or it can be created once and reused each time the
controller starts. If at any point the certificate is deleted, a new one is
created the next time the controller starts.

SSL private key
---------------

How is the private key (associated with the certificate) distributed?

In the usual implementation of the SSL protocol, the private key is never
distributed. We always follow this standard.

SSL versus Foolscap authentication
----------------------------------

Many SSL connections only perform one-sided authentication (the server to the
client). How is the client authentication in IPython's system related to SSL
authentication?

We perform a two-way SSL handshake in which both parties request and verify
the certificate of their peer. This mutual authentication is handled by the
SSL handshake and is separate and independent from the additional
authentication steps that the CLIENT and SERVER perform after an encrypted
channel is established.

.. [RFC5246] <http://tools.ietf.org/html/rfc5246>

.. [OpenSSH] <http://www.openssh.com/>
.. [Paramiko] <http://www.lag.net/paramiko/>

@@ -1,132 +1,395 b'' | |||||
.. _paralleltask:

==========================
The IPython task interface
==========================

The task interface to the cluster presents the engines as a fault-tolerant,
dynamically load-balanced system of workers. Unlike the multiengine interface,
in the task interface the user has no direct access to individual engines. By
allowing the IPython scheduler to assign work, this interface is
simultaneously simpler and more powerful.

Best of all, the user can use both of these interfaces running at the same time
to take advantage of their respective strengths. When the user can break up
their work into segments that do not depend on previous execution, the
task interface is ideal. But it also has more power and flexibility, allowing
the user to guide the distribution of jobs, without having to assign tasks to
engines explicitly.

Starting the IPython controller and engines
===========================================

To follow along with this tutorial, you will need to start the IPython
controller and four IPython engines. The simplest way of doing this is to use
the :command:`ipclusterz` command::

    $ ipclusterz start -n 4

For more detailed information about starting the controller and engines, see
our :ref:`introduction <ip1par>` to using IPython for parallel computing.

Creating a ``Client`` instance
==============================

The first step is to import the IPython :mod:`IPython.zmq.parallel.client`
module and then create a :class:`.Client` instance:

.. sourcecode:: ipython

    In [1]: from IPython.zmq.parallel import client

    In [2]: rc = client.Client()

    In [3]: lview = rc[None]
    Out[3]: <LoadBalancedView tcp://127.0.0.1:10101>

This form assumes that the controller was started on localhost with default
configuration. If not, the location of the controller must be given as an
argument to the constructor:

.. sourcecode:: ipython

    # for a visible LAN controller listening on an external port:
    In [2]: rc = client.Client('tcp://192.168.1.16:10101')
    # for a remote controller at my.server.com listening on localhost:
    In [3]: rc = client.Client(sshserver='my.server.com')

Quick and easy parallelism
==========================

In many cases, you simply want to apply a Python function to a sequence of
objects, but *in parallel*. Like the multiengine interface, these can be
implemented via the task interface. The exact same tools can perform these
actions in load-balanced ways as well as multiplexed ways: a parallel version
of :func:`map` and the :func:`@parallel` function decorator. If one specifies
the argument `targets=None`, then they are dynamically load-balanced. Thus, if
the execution time per item varies significantly, you should use the versions
in the task interface.

Parallel map
------------

To load-balance :meth:`map`, simply use a LoadBalancedView, created by asking
for the ``None`` element:

.. sourcecode:: ipython

    In [63]: serial_result = map(lambda x: x**10, range(32))

    In [64]: parallel_result = rc[None].map(lambda x: x**10, range(32))

    In [65]: serial_result == parallel_result
    Out[65]: True

Parallel function decorator
---------------------------

Parallel functions are just like normal functions, but they can be called on
sequences and *in parallel*. The multiengine interface provides a decorator
that turns any Python function into a parallel function:

.. sourcecode:: ipython

    In [10]: @lview.parallel()
       ....: def f(x):
       ....:     return 10.0*x**4
       ....:

    In [11]: f.map(range(32))    # this is done in parallel
    Out[11]: [0.0, 10.0, 160.0, ...]

Dependencies
============

Often, pure atomic load-balancing is too primitive for your work. In these cases, you
may want to associate some kind of `Dependency` that describes when, where, or whether
a task can be run. In IPython, we provide two types of dependencies:
`Functional Dependencies`_ and `Graph Dependencies`_.

.. note::

    It is important to note that the pure ZeroMQ scheduler does not support dependencies,
    and you will see errors or warnings if you try to use dependencies with the pure
    scheduler.

Functional Dependencies
-----------------------

Functional dependencies are used to determine whether a given engine is capable of running
a particular task. This is implemented via a special :class:`Exception` class,
:class:`UnmetDependency`, found in `IPython.zmq.parallel.error`. Its use is very simple:
if a task fails with an UnmetDependency exception, then the scheduler, instead of relaying
the error up to the client like any other error, catches the error and resubmits the task
to a different engine. This resubmission is repeated on other engines, but a task will
never be submitted to a given engine a second time.

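The resubmission behavior can be sketched with a small, self-contained Python toy model
(an illustration of the retry rule described above, not IPython's actual scheduler code;
the engine records here are hypothetical):

.. sourcecode:: python

    class UnmetDependency(Exception):
        """Stand-in for IPython.zmq.parallel.error.UnmetDependency."""

    def schedule(task, engines):
        """Try engines in turn, resubmitting on UnmetDependency;
        no engine is ever tried twice for the same task."""
        for engine in engines:
            try:
                return engine['name'], task(engine)
            except UnmetDependency:
                continue  # caught by the scheduler, not relayed to the client
        raise RuntimeError("no engine could meet the dependency")

    # a task that can only run where numpy is (pretend-)installed
    def task(engine):
        if 'numpy' not in engine['modules']:
            raise UnmetDependency
        return 'ok'

    engines = [{'name': 'e0', 'modules': []},
               {'name': 'e1', 'modules': ['numpy']}]
    name, result = schedule(task, engines)  # the task lands on 'e1'
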
You can manually raise the :class:`UnmetDependency` yourself, but IPython provides
some decorators to facilitate this behavior.

There are two decorators and a class used for functional dependencies:

.. sourcecode:: ipython

    In [9]: from IPython.zmq.parallel.dependency import depend, require, dependent

@require
********

The simplest sort of dependency is requiring that a Python module is available. The
``@require`` decorator lets you define a function that will only run on engines where names
you specify are importable:

.. sourcecode:: ipython

    In [10]: @require('numpy', 'zmq')
       ...: def myfunc():
       ...:     import numpy, zmq
       ...:     return dostuff()

Now, any time you apply :func:`myfunc`, the task will only run on a machine that has
numpy and pyzmq available.

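To see how a require-style check can work, here is a minimal, self-contained sketch
(an illustration only, not IPython's actual ``@require`` implementation):

.. sourcecode:: python

    import importlib

    class UnmetDependency(Exception):
        """Raised when a required module is missing on this engine."""

    def require(*names):
        """Only run the decorated function where all of `names` are importable."""
        def decorator(func):
            def wrapped(*args, **kwargs):
                for name in names:
                    try:
                        importlib.import_module(name)
                    except ImportError:
                        raise UnmetDependency(name)
                return func(*args, **kwargs)
            return wrapped
        return decorator

    @require('os', 'sys')            # stdlib names: always importable
    def works():
        return 'ok'

    @require('no_such_module_xyz')   # hypothetical missing module
    def fails():
        return 'never reached'
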
@depend
*******

The ``@depend`` decorator lets you decorate any function with any *other* function to
evaluate the dependency. The dependency function will be called at the start of the task,
and if it returns ``False``, then the dependency will be considered unmet, and the task
will be assigned to another engine. If the dependency returns anything other than
``False``, the rest of the task will continue.

.. sourcecode:: ipython

    In [10]: def platform_specific(plat):
       ...:     import sys
       ...:     return sys.platform == plat

    In [11]: @depend(platform_specific, 'darwin')
       ...: def mactask():
       ...:     do_mac_stuff()

    In [12]: @depend(platform_specific, 'nt')
       ...: def wintask():
       ...:     do_windows_stuff()

In this case, any time you apply ``mactask``, it will only run on an OSX machine.
``@depend`` is just like ``apply``, in that it has a ``@depend(f, *args, **kwargs)``
signature.

dependents
**********

You don't have to use the decorators on your tasks. If, for instance, you want
to run tasks with a single function but varying dependencies, you can directly construct
the :class:`dependent` object that the decorators use:

.. sourcecode:: ipython

    In [13]: def mytask(*args):
       ...:     dostuff()

    In [14]: mactask = dependent(mytask, platform_specific, 'darwin')
    # this is the same as decorating the declaration of mytask with @depend
    # but you can do it again:

    In [15]: wintask = dependent(mytask, platform_specific, 'nt')

    # in general:
    In [16]: t = dependent(f, g, *dargs, **dkwargs)

    # is equivalent to:
    In [17]: @depend(g, *dargs, **dkwargs)
       ...: def t(a, b, c):
       ...:     # contents of f

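A minimal pure-Python sketch of such a wrapper (an illustration of the design, not
IPython's actual :class:`dependent` class) could look like this:

.. sourcecode:: python

    class UnmetDependency(Exception):
        """Raised when the dependency check fails on this engine."""

    class dependent(object):
        """Wrap `func` so that `df(*dargs, **dkwargs)` is checked before each call."""

        def __init__(self, func, df, *dargs, **dkwargs):
            self.func = func
            self.df = df
            self.dargs = dargs
            self.dkwargs = dkwargs

        def __call__(self, *args, **kwargs):
            if self.df(*self.dargs, **self.dkwargs) is False:
                raise UnmetDependency()
            return self.func(*args, **kwargs)

    def mytask(x):
        return 2 * x

    # the same function wrapped with two different dependencies:
    always = dependent(mytask, lambda: True)
    never = dependent(mytask, lambda: False)
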
Graph Dependencies
------------------

Sometimes you want to restrict the time and/or location at which a given task runs as a
function of the time and/or location of other tasks. This is implemented via a subclass of
:class:`set`, called a :class:`Dependency`. A Dependency is just a set of `msg_ids`
corresponding to tasks, and a few attributes to guide how to decide when the Dependency
has been met.

The switches we provide for interpreting whether a given dependency set has been met are:

any|all
    Whether the dependency is considered met if *any* of the dependencies are done, or
    only after *all* of them have finished. This is set by a Dependency's :attr:`all`
    boolean attribute, which defaults to ``True``.

success_only
    Whether to consider only tasks that did not raise an error as being fulfilled.
    Sometimes you want to run a task after another, but only if that task succeeded. In
    this case, ``success_only`` should be ``True``. However, sometimes you may not care
    whether the task succeeds, and always want the second task to run, in which case
    you should use `success_only=False`. The default behavior is to only count successes.

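The semantics of these two switches can be written out as a short, self-contained
Python function (a sketch of the decision rule, not IPython's internal code):

.. sourcecode:: python

    def dependency_met(deps, succeeded, failed, all=True, success_only=True):
        """Decide whether the dependency set `deps` (msg_ids) has been met.

        `succeeded` and `failed` are sets of finished msg_ids, split by outcome.
        """
        finished = succeeded if success_only else (succeeded | failed)
        relevant = deps & finished
        if all:
            return relevant == deps   # every dependency must have finished
        return bool(relevant)         # any one finished dependency suffices

    deps = {'a', 'b'}
    dependency_met(deps, succeeded={'a'}, failed={'b'})                      # False
    dependency_met(deps, succeeded={'a'}, failed={'b'}, all=False)           # True
    dependency_met(deps, succeeded={'a'}, failed={'b'}, success_only=False)  # True
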
There are other switches for interpretation that are made at the *task* level. These are
specified via keyword arguments to the client's :meth:`apply` method.

after,follow
    You may want to run a task *after* a given set of dependencies has been run and/or
    run it *where* another set of dependencies are met. To support this, every task has an
    `after` dependency to restrict time, and a `follow` dependency to restrict
    destination.

timeout
    You may also want to set a time-limit for how long the scheduler should wait for a
    task's dependencies to be met. This is done via a `timeout`, which defaults to 0,
    which indicates that the task should never timeout. If the timeout is reached, and
    the scheduler still hasn't been able to assign the task to an engine, the task will
    fail with a :class:`DependencyTimeout`.

.. note::

    Dependencies only work within the task scheduler. You cannot instruct a load-balanced
    task to run after a job submitted via the MUX interface.

The simplest form of Dependencies is with `all=True, success_only=True`. In these cases,
you can skip using Dependency objects, and just pass msg_ids or AsyncResult objects as the
`follow` and `after` keywords to :meth:`client.apply`:

.. sourcecode:: ipython

    In [14]: client.block = False

    In [15]: ar = client.apply(f, args, kwargs, targets=None)

    In [16]: ar2 = client.apply(f2, targets=None)

    In [17]: ar3 = client.apply(f3, after=[ar, ar2])

    In [18]: ar4 = client.apply(f3, follow=[ar], timeout=2.5)

.. seealso::

    Some parallel workloads can be described as a `Directed Acyclic Graph
    <http://en.wikipedia.org/wiki/Directed_acyclic_graph>`_, or DAG. See :ref:`DAG
    Dependencies <dag_dependencies>` for an example demonstrating how to map a NetworkX
    DAG onto task dependencies.


Impossible Dependencies
***********************

The schedulers do perform some analysis on graph dependencies to determine whether they
can ever be met. If the scheduler discovers that a dependency cannot be met, then the
task will fail with an :class:`ImpossibleDependency` error. This way, if the scheduler
realizes that a task can never be run, it won't sit indefinitely in the scheduler
clogging the pipeline.

The basic cases that are checked:

* depending on nonexistent messages
* `follow` dependencies were run on more than one machine and `all=True`
* any dependencies failed and `all=True,success_only=True`
* all dependencies failed and `all=False,success_only=True`

.. warning::

    This analysis has not been proven to be rigorous, so it is possible for tasks
    to become impossible to run in obscure situations; a timeout may be a good safeguard.

Schedulers
==========

There are a variety of valid ways to determine where jobs should be assigned in a
load-balancing situation. In IPython, we support several standard schemes, and
even make it easy to define your own. The scheme can be selected via the ``--scheme``
argument to :command:`ipcontrollerz`, or in the :attr:`HubFactory.scheme` attribute
of a controller config object.

The built-in routing schemes:

lru: Least Recently Used
    Always assign work to the least-recently-used engine. A close relative of
    round-robin, it will be fair with respect to the number of tasks, but agnostic
    with respect to the runtime of each task.

plainrandom: Plain Random
    Randomly picks an engine on which to run.

twobin: Two-Bin Random
    **Depends on numpy**

    Pick two engines at random, and use the LRU of the two. This is known to be better
    than plain random in many cases, but requires a small amount of computation.

leastload: Least Load
    **This is the default scheme**

    Always assign tasks to the engine with the fewest outstanding tasks (LRU breaks ties).

weighted: Weighted Two-Bin Random
    **Depends on numpy**

    Pick two engines at random using the number of outstanding tasks as inverse weights,
    and use the one with the lower load.


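To make the pick rules concrete, here is a small, self-contained Python sketch of
``twobin`` and ``weighted`` (an illustration of the idea only; IPython's versions
depend on numpy):

.. sourcecode:: python

    import random

    def twobin(loads):
        """Pick two distinct engines uniformly at random; use the less loaded one."""
        a, b = random.sample(range(len(loads)), 2)
        return a if loads[a] <= loads[b] else b

    def weighted(loads):
        """Like twobin, but draw candidates with probability inversely
        proportional to their current load (candidates may coincide)."""
        weights = [1.0 / (1 + load) for load in loads]
        total = sum(weights)
        def draw():
            r = random.uniform(0, total)
            for i, w in enumerate(weights):
                r -= w
                if r <= 0:
                    return i
            return len(weights) - 1
        a, b = draw(), draw()
        return a if loads[a] <= loads[b] else b

    loads = [10, 0, 5]   # outstanding tasks per engine
    picks = [twobin(loads) for _ in range(1000)]
    # the busiest engine (index 0) loses every comparison, so it is never picked
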
Pure ZMQ Scheduler
------------------

For maximum throughput, the 'pure' scheme is not Python at all, but a C-level
:class:`MonitoredQueue` from PyZMQ, which uses a ZeroMQ ``XREQ`` socket to perform all
load-balancing. This scheduler does not support any of the advanced features of the Python
:class:`.Scheduler`.

Disabled features when using the ZMQ Scheduler:

* Engine unregistration
    Task farming will be disabled if an engine unregisters.
    Further, if an engine is unregistered during computation, the scheduler may not
    recover.
* Dependencies
    Since there is no Python logic inside the Scheduler, routing decisions cannot be made
    based on message content.
* Early destination notification
    The Python schedulers know which engine gets which task, and notify the Hub. This
    allows graceful handling of Engines coming and going. There is no way to know
    where ZeroMQ messages have gone, so there is no way to know what tasks are on which
    engine until they *finish*. This makes recovery from engine shutdown very difficult.


.. note::

    TODO: performance comparisons

More details
============

The :class:`Client` has many more powerful features that allow quite a bit
of flexibility in how tasks are defined and run. The next places to look are
in the following classes:

* :class:`IPython.zmq.parallel.client.Client`
* :class:`IPython.zmq.parallel.client.AsyncResult`
* :meth:`IPython.zmq.parallel.client.Client.apply`
* :mod:`IPython.zmq.parallel.dependency`

The following is an overview of how to use these classes together:

1. Create a :class:`Client`.
2. Define some functions to be run as tasks.
3. Submit your tasks using the :meth:`apply` method of your
   :class:`Client` instance, specifying `targets=None`. This signals
   the :class:`Client` to entrust the Scheduler with assigning tasks to engines.
4. Use :meth:`Client.get_results` to get the results of the
   tasks, or use the :meth:`AsyncResult.get` method of the results to wait
   for and then receive the results.

.. seealso::

    A demo of :ref:`DAG Dependencies <dag_dependencies>` with NetworkX and IPython.
============================================
Getting started with Windows HPC Server 2008
============================================

.. note::

    Not adapted to zmq yet

Introduction
============

The Python programming language is an increasingly popular language for
numerical computing. This is due to a unique combination of factors. First,
Python is a high-level and *interactive* language that is well matched to
interactive numerical work. Second, it is easy (often trivial) to
integrate legacy C/C++/Fortran code into Python. Third, a large number of
high-quality open source projects provide all the needed building blocks for
numerical computing: numerical arrays (NumPy), algorithms (SciPy), 2D/3D
Visualization (Matplotlib, Mayavi, Chaco), Symbolic Mathematics (Sage, Sympy)
and others.

The IPython project is a core part of this open-source toolchain and is
focused on creating a comprehensive environment for interactive and
exploratory computing in the Python programming language. It enables all of
the above tools to be used interactively and consists of two main components:

* An enhanced interactive Python shell with support for interactive plotting
  and visualization.
* An architecture for interactive parallel computing.

With these components, it is possible to perform all aspects of a parallel
computation interactively. This type of workflow is particularly relevant in
scientific and numerical computing where algorithms, code and data are
continually evolving as the user/developer explores a problem. The broad
trends in computing (commodity clusters, multicore, cloud computing, etc.)
make these capabilities of IPython particularly relevant.

While IPython is a cross-platform tool, it has particularly strong support for
Windows-based compute clusters running Windows HPC Server 2008. This document
describes how to get started with IPython on Windows HPC Server 2008. The
content and emphasis here is practical: installing IPython, configuring
IPython to use the Windows job scheduler and running example parallel programs
interactively. A more complete description of IPython's parallel computing
capabilities can be found in IPython's online documentation
(http://ipython.scipy.org/moin/Documentation).

Setting up your Windows cluster
===============================

This document assumes that you already have a cluster running Windows
HPC Server 2008. Here is a broad overview of what is involved with setting up
such a cluster:

1. Install Windows Server 2008 on the head and compute nodes in the cluster.
2. Set up the network configuration on each host. Each host should have a
   static IP address.
3. On the head node, activate the "Active Directory Domain Services" role
   and make the head node the domain controller.
4. Join the compute nodes to the newly created Active Directory (AD) domain.
5. Set up user accounts in the domain with shared home directories.
6. Install the HPC Pack 2008 on the head node to create a cluster.
7. Install the HPC Pack 2008 on the compute nodes.

More details about installing and configuring Windows HPC Server 2008 can be
found on the Windows HPC Home Page (http://www.microsoft.com/hpc). Regardless
of what steps you follow to set up your cluster, the remainder of this
document will assume that:

* There are domain users that can log on to the AD domain and submit jobs
  to the cluster scheduler.
* These domain users have shared home directories. While shared home
  directories are not required to use IPython, they make it much easier to
  use IPython.

Installation of IPython and its dependencies
============================================

IPython and all of its dependencies are freely available and open source.
These packages provide a powerful and cost-effective approach to numerical and
scientific computing on Windows. The following dependencies are needed to run
IPython on Windows:

* Python 2.5 or 2.6 (http://www.python.org)
* pywin32 (http://sourceforge.net/projects/pywin32/)
* PyReadline (https://launchpad.net/pyreadline)
* zope.interface and Twisted (http://twistedmatrix.com)
* Foolscap (http://foolscap.lothar.com/trac)
* pyOpenSSL (https://launchpad.net/pyopenssl)
* IPython (http://ipython.scipy.org)

In addition, the following dependencies are needed to run the demos described
in this document.

* NumPy and SciPy (http://www.scipy.org)
* wxPython (http://www.wxpython.org)
* Matplotlib (http://matplotlib.sourceforge.net/)

The easiest way of obtaining these dependencies is through the Enthought
Python Distribution (EPD) (http://www.enthought.com/products/epd.php). EPD is
produced by Enthought, Inc. and contains all of these packages and others in a
single installer and is available free for academic users. While it is also
possible to download and install each package individually, this is a tedious
process. Thus, we highly recommend using EPD to install these packages on
Windows.

Regardless of how you install the dependencies, here are the steps you will
need to follow:

1. Install all of the packages listed above, either individually or using EPD,
   on the head node, compute nodes and user workstations.

2. Make sure that :file:`C:\\Python25` and :file:`C:\\Python25\\Scripts` are
   in the system :envvar:`%PATH%` variable on each node.

3. Install the latest development version of IPython. This can be done by
   downloading the development version from the IPython website
   (http://ipython.scipy.org) and following the installation instructions.

Further details about installing IPython or its dependencies can be found in
the online IPython documentation (http://ipython.scipy.org/moin/Documentation).
Once you are finished with the installation, you can try IPython out by
opening a Windows Command Prompt and typing ``ipython``. This will
start IPython's interactive shell and you should see something like the
following screenshot:

.. image:: ../parallel/ipython_shell.*

128 | Starting an IPython cluster |
|
128 | Starting an IPython cluster | |
129 | =========================== |
|
129 | =========================== | |
130 |
|
130 | |||
131 | To use IPython's parallel computing capabilities, you will need to start an |
|
131 | To use IPython's parallel computing capabilities, you will need to start an | |
132 | IPython cluster. An IPython cluster consists of one controller and multiple |
|
132 | IPython cluster. An IPython cluster consists of one controller and multiple | |
133 | engines: |
|
133 | engines: | |
134 |
|
134 | |||
135 | IPython controller |
|
135 | IPython controller | |
136 | The IPython controller manages the engines and acts as a gateway between |
|
136 | The IPython controller manages the engines and acts as a gateway between | |
137 | the engines and the client, which runs in the user's interactive IPython |
|
137 | the engines and the client, which runs in the user's interactive IPython | |
138 | session. The controller is started using the :command:`ipcontroller` |
|
138 | session. The controller is started using the :command:`ipcontroller` | |
139 | command. |
|
139 | command. | |
140 |
|
140 | |||
141 | IPython engine |
|
141 | IPython engine | |
142 | IPython engines run a user's Python code in parallel on the compute nodes. |
|
142 | IPython engines run a user's Python code in parallel on the compute nodes. | |
143 | Engines are starting using the :command:`ipengine` command. |
|
143 | Engines are starting using the :command:`ipengine` command. | |
144 |
|
144 | |||
145 | Once these processes are started, a user can run Python code interactively and |
|
145 | Once these processes are started, a user can run Python code interactively and | |
146 | in parallel on the engines from within the IPython shell using an appropriate |
|
146 | in parallel on the engines from within the IPython shell using an appropriate | |
147 | client. This includes the ability to interact with, plot and visualize data |
|
147 | client. This includes the ability to interact with, plot and visualize data | |
148 | from the engines. |
|
148 | from the engines. | |
149 |
|
149 | |||
150 | IPython has a command line program called :command:`ipclusterz` that automates |
|
150 | IPython has a command line program called :command:`ipclusterz` that automates | |
151 | all aspects of starting the controller and engines on the compute nodes. |
|
151 | all aspects of starting the controller and engines on the compute nodes. | |
152 | :command:`ipclusterz` has full support for the Windows HPC job scheduler, |
|
152 | :command:`ipclusterz` has full support for the Windows HPC job scheduler, | |
153 | meaning that :command:`ipclusterz` can use this job scheduler to start the |
|
153 | meaning that :command:`ipclusterz` can use this job scheduler to start the | |
154 | controller and engines. In our experience, the Windows HPC job scheduler is |
|
154 | controller and engines. In our experience, the Windows HPC job scheduler is | |
155 | particularly well suited for interactive applications, such as IPython. Once |
|
155 | particularly well suited for interactive applications, such as IPython. Once | |
156 | :command:`ipclusterz` is configured properly, a user can start an IPython |
|
156 | :command:`ipclusterz` is configured properly, a user can start an IPython | |
157 | cluster from their local workstation almost instantly, without having to log |
|
157 | cluster from their local workstation almost instantly, without having to log | |
158 | on to the head node (as is typically required by Unix based job schedulers). |
|
158 | on to the head node (as is typically required by Unix based job schedulers). | |
159 | This enables a user to move seamlessly between serial and parallel |
|
159 | This enables a user to move seamlessly between serial and parallel | |
160 | computations. |
|
160 | computations. | |
161 |
|
161 | |||
162 | In this section we show how to use :command:`ipclusterz` to start an IPython |
|
162 | In this section we show how to use :command:`ipclusterz` to start an IPython | |
163 | cluster using the Windows HPC Server 2008 job scheduler. To make sure that |
|
163 | cluster using the Windows HPC Server 2008 job scheduler. To make sure that | |
164 | :command:`ipclusterz` is installed and working properly, you should first try |
|
164 | :command:`ipclusterz` is installed and working properly, you should first try | |
165 | to start an IPython cluster on your local host. To do this, open a Windows |
|
165 | to start an IPython cluster on your local host. To do this, open a Windows | |
166 | Command Prompt and type the following command:: |
|
166 | Command Prompt and type the following command:: | |
167 |
|
167 | |||
168 | ipclusterz start -n 2 |
|
168 | ipclusterz start -n 2 | |
169 |
|
169 | |||
170 | You should see a number of messages printed to the screen, ending with |
|
170 | You should see a number of messages printed to the screen, ending with | |
171 | "IPython cluster: started". The result should look something like the following |
|
171 | "IPython cluster: started". The result should look something like the following | |
172 | screenshot: |
|
172 | screenshot: | |
173 |
|
173 | |||
174 |
.. image:: ipcluster |
|
174 | .. image:: ../parallel/ipcluster_start.* | |
175 |
|
175 | |||
176 | At this point, the controller and two engines are running on your local host. |
|
176 | At this point, the controller and two engines are running on your local host. | |
177 | This configuration is useful for testing and for situations where you want to |
|
177 | This configuration is useful for testing and for situations where you want to | |
178 | take advantage of multiple cores on your local computer. |
|
178 | take advantage of multiple cores on your local computer. | |
179 |
|
179 | |||
180 | Now that we have confirmed that :command:`ipclusterz` is working properly, we |
|
180 | Now that we have confirmed that :command:`ipclusterz` is working properly, we | |
181 | describe how to configure and run an IPython cluster on an actual compute |
|
181 | describe how to configure and run an IPython cluster on an actual compute | |
182 | cluster running Windows HPC Server 2008. Here is an outline of the needed |
|
182 | cluster running Windows HPC Server 2008. Here is an outline of the needed | |
183 | steps: |
|
183 | steps: | |
184 |
|
184 | |||
185 | 1. Create a cluster profile using: ``ipclusterz create -p mycluster`` |
|
185 | 1. Create a cluster profile using: ``ipclusterz create -p mycluster`` | |
186 |
|
186 | |||
187 | 2. Edit configuration files in the directory :file:`.ipython\\cluster_mycluster` |
|
187 | 2. Edit configuration files in the directory :file:`.ipython\\cluster_mycluster` | |
188 |
|
188 | |||
189 | 3. Start the cluster using: ``ipcluser start -p mycluster -n 32`` |
|
189 | 3. Start the cluster using: ``ipcluser start -p mycluster -n 32`` | |
190 |
|
190 | |||
191 | Creating a cluster profile |
|
191 | Creating a cluster profile | |
192 | -------------------------- |
|
192 | -------------------------- | |
193 |
|
193 | |||
194 | In most cases, you will have to create a cluster profile to use IPython on a |
|
194 | In most cases, you will have to create a cluster profile to use IPython on a | |
195 | cluster. A cluster profile is a name (like "mycluster") that is associated |
|
195 | cluster. A cluster profile is a name (like "mycluster") that is associated | |
196 | with a particular cluster configuration. The profile name is used by |
|
196 | with a particular cluster configuration. The profile name is used by | |
197 | :command:`ipclusterz` when working with the cluster. |
|
197 | :command:`ipclusterz` when working with the cluster. | |
198 |
|
198 | |||
199 | Associated with each cluster profile is a cluster directory. This cluster |
|
199 | Associated with each cluster profile is a cluster directory. This cluster | |
200 | directory is a specially named directory (typically located in the |
|
200 | directory is a specially named directory (typically located in the | |
201 | :file:`.ipython` subdirectory of your home directory) that contains the |
|
201 | :file:`.ipython` subdirectory of your home directory) that contains the | |
202 | configuration files for a particular cluster profile, as well as log files and |
|
202 | configuration files for a particular cluster profile, as well as log files and | |
203 | security keys. The naming convention for cluster directories is: |
|
203 | security keys. The naming convention for cluster directories is: | |
204 | :file:`cluster_<profile name>`. Thus, the cluster directory for a profile named |
|
204 | :file:`cluster_<profile name>`. Thus, the cluster directory for a profile named | |
205 | "foo" would be :file:`.ipython\\cluster_foo`. |
|
205 | "foo" would be :file:`.ipython\\cluster_foo`. | |
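
This naming convention can be sketched in a few lines of plain Python. The
``cluster_dir`` helper and the example home directory below are hypothetical
(not part of IPython's API); :mod:`ntpath` is used so the joined path is
Windows-style on any platform::

    import ntpath

    def cluster_dir(profile, home=r'C:\Users\me'):
        # Convention: <home>\.ipython\cluster_<profile name>
        return ntpath.join(home, '.ipython', 'cluster_' + profile)

    print(cluster_dir('foo'))
    # C:\Users\me\.ipython\cluster_foo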

To create a new cluster profile (named "mycluster") and the associated cluster
directory, type the following command at the Windows Command Prompt::

    ipclusterz create -p mycluster

The output of this command is shown in the screenshot below. Notice how
:command:`ipclusterz` prints out the location of the newly created cluster
directory.

.. image:: ../parallel/ipcluster_create.*

Configuring a cluster profile
-----------------------------

Next, you will need to configure the newly created cluster profile by editing
the following configuration files in the cluster directory:

* :file:`ipclusterz_config.py`
* :file:`ipcontroller_config.py`
* :file:`ipengine_config.py`

When :command:`ipclusterz` is run, these configuration files are used to
determine how the engines and controller will be started. In most cases,
you will only have to set a few of the attributes in these files.

To configure :command:`ipclusterz` to use the Windows HPC job scheduler, you
will need to edit the following attributes in the file
:file:`ipclusterz_config.py`::

    # Set these at the top of the file to tell ipclusterz to use the
    # Windows HPC job scheduler.
    c.Global.controller_launcher = \
        'IPython.zmq.parallel.launcher.WindowsHPCControllerLauncher'
    c.Global.engine_launcher = \
        'IPython.zmq.parallel.launcher.WindowsHPCEngineSetLauncher'

    # Set these to the host name of the scheduler (head node) of your cluster.
    c.WindowsHPCControllerLauncher.scheduler = 'HEADNODE'
    c.WindowsHPCEngineSetLauncher.scheduler = 'HEADNODE'

There are a number of other configuration attributes that can be set, but
in most cases these will be sufficient to get you started.

.. warning::
    If any of your configuration attributes involve specifying the location
    of shared directories or files, you must make sure that you use UNC paths
    like :file:`\\\\host\\share`. It is also important that you specify
    these paths using raw Python strings: ``r'\\host\share'`` to make sure
    that the backslashes are properly escaped.
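
As a quick illustration of the raw-string point (plain Python, nothing
IPython-specific), both spellings below produce the same UNC path string::

    raw = r'\\host\share'        # raw string: backslashes taken literally
    escaped = '\\\\host\\share'  # regular string: each backslash escaped
    assert raw == escaped
    print(raw)
    # \\host\share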

Starting the cluster profile
----------------------------

Once a cluster profile has been configured, starting an IPython cluster using
the profile is simple::

    ipclusterz start -p mycluster -n 32

The ``-n`` option tells :command:`ipclusterz` how many engines to start (in
this case 32). Stopping the cluster is as simple as typing Control-C.

Using the HPC Job Manager
-------------------------

When ``ipclusterz start`` is run the first time, :command:`ipclusterz` creates
two XML job description files in the cluster directory:

* :file:`ipcontroller_job.xml`
* :file:`ipengineset_job.xml`

Once these files have been created, they can be imported into the HPC Job
Manager application. Then, the controller and engines for that profile can be
started using the HPC Job Manager directly, without using :command:`ipclusterz`.
However, anytime the cluster profile is re-configured, ``ipclusterz start``
must be run again to regenerate the XML job description files. The
following screenshot shows what the HPC Job Manager interface looks like
with a running IPython cluster.

.. image:: ../parallel/hpc_job_manager.*

Performing a simple interactive parallel computation
====================================================

Once you have started your IPython cluster, you can start to use it. To do
this, open up a new Windows Command Prompt and start up IPython's interactive
shell by typing::

    ipython

Then you can create a :class:`MultiEngineClient` instance for your profile and
use the resulting instance to do a simple interactive parallel computation. In
the code and screenshot that follows, we take a simple Python function and
apply it to each element of an array of integers in parallel using the
:meth:`MultiEngineClient.map` method:

.. sourcecode:: ipython

    In [1]: from IPython.zmq.parallel.client import *

    In [2]: mec = MultiEngineClient(profile='mycluster')

    In [3]: mec.get_ids()
    Out[3]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

    In [4]: def f(x):
       ...:     return x**10

    In [5]: mec.map(f, range(15))  # f is applied in parallel
    Out[5]:
    [0,
     1,
     1024,
     59049,
     1048576,
     9765625,
     60466176,
     282475249,
     1073741824,
     3486784401L,
     10000000000L,
     25937424601L,
     61917364224L,
     137858491849L,
     289254654976L]

The :meth:`map` method has the same signature as Python's builtin :func:`map`
function, but runs the calculation in parallel. More involved examples of using
:class:`MultiEngineClient` are provided in the examples that follow.
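
For comparison, the same computation can be checked serially with the builtin
:func:`map`, with no cluster required; the values match the parallel output
above::

    def f(x):
        return x**10

    # Serial equivalent of mec.map(f, range(15))
    results = list(map(f, range(15)))
    print(results[:5])
    # [0, 1, 1024, 59049, 1048576]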

.. image:: ../parallel/mec_simple.*