.. _dag_dependencies:

================
DAG Dependencies
================

Often, a parallel workflow is described in terms of a `Directed Acyclic Graph
<http://en.wikipedia.org/wiki/Directed_acyclic_graph>`_ or DAG. A popular library
for working with graphs is NetworkX_. Here, we will walk through a demo mapping
a NetworkX DAG to task dependencies.

The full script that runs this demo can be found in
:file:`docs/examples/newparallel/dagdeps.py`.

Why are DAGs good for task dependencies?
----------------------------------------

The 'G' in DAG is 'Graph'. A graph is a collection of **nodes** and **edges** that connect
the nodes. For our purposes, each node is a task, and each edge is a
dependency. The 'D' stands for 'Directed': each edge has a direction associated
with it, so the edge (a,b) means that b depends on a, whereas the edge (b,a)
means that a depends on b. The 'A' is 'Acyclic': there must not be any closed
loops in the graph. This is important for dependencies, because if a loop were
closed, a task could ultimately depend on itself and would never be able to run.
If your workflow can be described as a DAG, then it is impossible for your
dependencies to cause a deadlock.
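
This guarantee is easy to check mechanically. As a minimal sketch (standard
library only, independent of NetworkX and of this demo's helpers), Kahn's
algorithm repeatedly peels off tasks whose prerequisites are all satisfied; if
no task is ever ready while tasks remain, the dependencies contain a cycle:

```python
def is_acyclic(deps):
    """Return True if the dependency graph has no cycles.

    `deps` maps each task to the set of tasks it depends on.
    """
    remaining = {task: set(pre) for task, pre in deps.items()}
    while remaining:
        # tasks whose prerequisites have all been peeled off already
        ready = [t for t, pre in remaining.items() if not pre]
        if not ready:
            # every remaining task waits on another remaining task: a cycle
            return False
        for t in ready:
            del remaining[t]
        for pre in remaining.values():
            pre.difference_update(ready)
    return True

# the 5-node DAG from the next section: 1,2 depend on 0; 3 on 1,2; 4 on 1
dag = {0: set(), 1: {0}, 2: {0}, 3: {1, 2}, 4: {1}}
loop = {'a': {'b'}, 'b': {'a'}}
print(is_acyclic(dag), is_acyclic(loop))  # True False
```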

A Sample DAG
------------

Here, we have a very simple 5-node DAG:

.. figure:: simpledag.*

With NetworkX, an arrow is just a fattened bit on the edge. Here, we can see that task 0
depends on nothing, and can run immediately. 1 and 2 depend on 0; 3 depends on
1 and 2; and 4 depends only on 1.

A possible sequence of events for this workflow:

0. Task 0 can run right away
1. 0 finishes, so 1 and 2 can start
2. 1 finishes; 3 is still waiting on 2, but 4 can start right away
3. 2 finishes, and 3 can finally start


Further, taking failures into account, assuming all dependencies are run with the default
`success_only=True`, the following would occur if each node were to fail:

0. if 0 fails: all other tasks fail as Impossible
1. if 1 fails: 2 can still succeed, but 3 and 4 are unreachable
2. if 2 fails: 3 becomes unreachable, but 4 is unaffected
3. 3 and 4 are terminal, and their failure can have no effect on other nodes
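
With `success_only=True`, a node's failure makes every task that transitively
depends on it Impossible. That set is exactly the node's descendants in the
graph; a small standard-library sketch over the edge list of the sample DAG:

```python
def unreachable_on_failure(edges, failed):
    """Return the set of tasks that can no longer run if `failed` fails.

    `edges` is a list of (a, b) pairs meaning b depends on a.
    """
    children = {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)
    # depth-first walk collecting everything downstream of the failed node
    unreachable, stack = set(), [failed]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in unreachable:
                unreachable.add(child)
                stack.append(child)
    return unreachable

edges = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 4)]
print(unreachable_on_failure(edges, 0))  # {1, 2, 3, 4}
print(unreachable_on_failure(edges, 2))  # {3}
```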

The code to generate the simple DAG:

.. sourcecode:: python

    import networkx as nx

    G = nx.DiGraph()

    # add 5 nodes, labeled 0-4:
    G.add_nodes_from(range(5))
    # 1,2 depend on 0:
    G.add_edge(0,1)
    G.add_edge(0,2)
    # 3 depends on 1,2
    G.add_edge(1,3)
    G.add_edge(2,3)
    # 4 depends on 1
    G.add_edge(1,4)

    # now draw the graph:
    pos = { 0 : (0,0), 1 : (1,1), 2 : (-1,1),
            3 : (0,2), 4 : (2,2)}
    nx.draw(G, pos, edge_color='r')


For demonstration purposes, we have a function that generates a random DAG with a given
number of nodes and edges.

.. literalinclude:: ../../examples/newparallel/dagdeps.py
    :language: python
    :lines: 20-36
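
The implementation lives in the script above. As a rough illustration of the
idea (a hypothetical sketch, not the file's actual code): pick a random
ordering of the nodes and only ever orient edges from lower to higher rank,
which makes cycles impossible by construction:

```python
import random

def random_dag_edges(nodes, n_edges):
    """Sketch: generate `n_edges` random edges over `nodes` labels, acyclic."""
    rank = list(range(nodes))
    random.shuffle(rank)          # a random total order on the nodes
    position = {n: i for i, n in enumerate(rank)}
    edges = set()
    while len(edges) < n_edges:
        a, b = random.sample(range(nodes), 2)
        if position[a] > position[b]:
            a, b = b, a           # orient along the order: no cycle can form
        edges.add((a, b))
    return edges

E = random_dag_edges(32, 128)
print(len(E))  # 128
```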

So first, we start with a graph of 32 nodes, with 128 edges:

.. sourcecode:: ipython

    In [2]: G = random_dag(32,128)

Now, we need to build our dict of jobs corresponding to the nodes on the graph:

.. sourcecode:: ipython

    In [3]: jobs = {}

    # in reality, each job would presumably be different
    # randomwait is just a function that sleeps for a random interval
    In [4]: for node in G:
       ...:     jobs[node] = randomwait
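
``randomwait`` itself is not shown in this walkthrough; a plausible stand-in
(hypothetical, not the demo's actual function) just sleeps briefly and reports
how long it slept:

```python
import random
import time

def randomwait():
    """Hypothetical stand-in task: sleep a random (short) interval."""
    interval = random.random() * 0.1  # up to 100 ms
    time.sleep(interval)
    return interval
```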

Once we have a dict of jobs matching the nodes on the graph, we can start submitting jobs,
and linking up the dependencies. Since we don't know a job's msg_id until it is submitted,
which is necessary for building dependencies, it is critical that we don't submit any job
before the jobs it may depend on. Fortunately, NetworkX provides a
:func:`topological_sort` function which ensures exactly this. It returns an iterable that
guarantees that when you arrive at a node, you have already visited all the nodes
on which it depends:

.. sourcecode:: ipython

    In [5]: c = client.Client()

    In [6]: results = {}

    In [7]: for node in nx.topological_sort(G):
       ...:     # get list of AsyncResult objects from nodes
       ...:     # leading into this one as dependencies
       ...:     deps = [ results[n] for n in G.predecessors(node) ]
       ...:     # submit and store AsyncResult object
       ...:     results[node] = c.apply(jobs[node], after=deps, block=False)
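
The ordering guarantee that :func:`topological_sort` provides can be sketched
with Kahn's algorithm (standard library only): a node is only yielded once all
of its predecessors have been yielded, which is exactly the property the
submission loop needs:

```python
from collections import deque

def topological_order(edges, nodes):
    """Yield nodes so that every node appears after all of its predecessors."""
    preds = {n: set() for n in nodes}
    succs = {n: set() for n in nodes}
    for a, b in edges:
        preds[b].add(a)
        succs[a].add(b)
    # start from the nodes with no dependencies at all
    ready = deque(n for n in nodes if not preds[n])
    while ready:
        n = ready.popleft()
        yield n
        for m in succs[n]:
            preds[m].discard(n)
            if not preds[m]:
                ready.append(m)

edges = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 4)]
order = list(topological_order(edges, range(5)))
print(order[0])  # 0 is the only node with no dependencies
```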

Now that we have submitted all the jobs, we can wait for the results:

.. sourcecode:: ipython

    In [8]: [ r.get() for r in results.values() ]

Now at least we know that all the jobs ran and did not fail (``r.get()`` would have
raised an error if a task failed). But we don't know that the ordering was properly
respected. For this, we can use the :attr:`metadata` attribute of each AsyncResult.

These objects store a variety of metadata about each task, including various timestamps.
We can validate that the dependencies were respected by checking that each task was
started after all of its predecessors were completed:

.. literalinclude:: ../../examples/newparallel/dagdeps.py
    :language: python
    :lines: 64-70
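
The check in the script boils down to comparing timestamps along each edge. A
simplified stand-in (hypothetical names, plain numbers instead of the
``datetime`` objects found in :attr:`metadata`):

```python
def ordering_respected(edges, started, completed):
    """True if every task started no earlier than each predecessor completed."""
    for a, b in edges:  # edge (a, b): b depends on a
        if started[b] < completed[a]:
            return False
    return True

edges = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 4)]
started   = {0: 0.0, 1: 1.1, 2: 1.2, 3: 2.5, 4: 1.6}
completed = {0: 1.0, 1: 1.5, 2: 2.4, 3: 3.0, 4: 2.0}
print(ordering_respected(edges, started, completed))  # True
```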

We can also validate the graph visually. By drawing the graph with each node's x-position
as its start time, all arrows must point to the right if the order was respected. For
spreading, the y-position will be the runtime, so long-running tasks will be at the top,
and quick tasks will be at the bottom.

.. sourcecode:: ipython

    In [10]: from matplotlib.dates import date2num

    In [11]: from matplotlib.cm import gist_rainbow

    In [12]: pos = {}; colors = {}

    In [13]: for node in G:
       ....:     md = results[node].metadata
       ....:     start = date2num(md.started)
       ....:     runtime = date2num(md.completed) - start
       ....:     pos[node] = (start, runtime)
       ....:     colors[node] = md.engine_id

    In [14]: nx.draw(G, pos, nodelist=colors.keys(), node_color=colors.values(),
       ....:         cmap=gist_rainbow)

.. figure:: dagdeps.*

    Time started on x, runtime on y, and color-coded by engine-id (in this case there
    were four engines).


.. _NetworkX: http://networkx.lanl.gov/
1 | """A semi-synchronous Client for the ZMQ controller""" |
|
1 | """A semi-synchronous Client for the ZMQ controller""" | |
2 | #----------------------------------------------------------------------------- |
|
2 | #----------------------------------------------------------------------------- | |
3 | # Copyright (C) 2010 The IPython Development Team |
|
3 | # Copyright (C) 2010 The IPython Development Team | |
4 | # |
|
4 | # | |
5 | # Distributed under the terms of the BSD License. The full license is in |
|
5 | # Distributed under the terms of the BSD License. The full license is in | |
6 | # the file COPYING, distributed as part of this software. |
|
6 | # the file COPYING, distributed as part of this software. | |
7 | #----------------------------------------------------------------------------- |
|
7 | #----------------------------------------------------------------------------- | |
8 |
|
8 | |||
9 | #----------------------------------------------------------------------------- |
|
9 | #----------------------------------------------------------------------------- | |
10 | # Imports |
|
10 | # Imports | |
11 | #----------------------------------------------------------------------------- |
|
11 | #----------------------------------------------------------------------------- | |
12 |
|
12 | |||
13 | import os |
|
13 | import os | |
14 | import time |
|
14 | import time | |
15 | from getpass import getpass |
|
15 | from getpass import getpass | |
16 | from pprint import pprint |
|
16 | from pprint import pprint | |
17 | from datetime import datetime |
|
17 | from datetime import datetime | |
18 | import warnings |
|
18 | import warnings | |
19 | import json |
|
19 | import json | |
20 | pjoin = os.path.join |
|
20 | pjoin = os.path.join | |
21 |
|
21 | |||
22 | import zmq |
|
22 | import zmq | |
23 | from zmq.eventloop import ioloop, zmqstream |
|
23 | from zmq.eventloop import ioloop, zmqstream | |
24 |
|
24 | |||
25 | from IPython.utils.path import get_ipython_dir |
|
25 | from IPython.utils.path import get_ipython_dir | |
26 | from IPython.external.decorator import decorator |
|
26 | from IPython.external.decorator import decorator | |
27 | from IPython.external.ssh import tunnel |
|
27 | from IPython.external.ssh import tunnel | |
28 |
|
28 | |||
29 | import streamsession as ss |
|
29 | import streamsession as ss | |
30 | from clusterdir import ClusterDir, ClusterDirError |
|
30 | from clusterdir import ClusterDir, ClusterDirError | |
31 | # from remotenamespace import RemoteNamespace |
|
31 | # from remotenamespace import RemoteNamespace | |
32 | from view import DirectView, LoadBalancedView |
|
32 | from view import DirectView, LoadBalancedView | |
33 | from dependency import Dependency, depend, require, dependent |
|
33 | from dependency import Dependency, depend, require, dependent | |
34 | import error |
|
34 | import error | |
35 | import map as Map |
|
35 | import map as Map | |
36 | from asyncresult import AsyncResult, AsyncMapResult |
|
36 | from asyncresult import AsyncResult, AsyncMapResult | |
37 | from remotefunction import remote,parallel,ParallelFunction,RemoteFunction |
|
37 | from remotefunction import remote,parallel,ParallelFunction,RemoteFunction | |
38 | from util import ReverseDict, disambiguate_url, validate_url |
|
38 | from util import ReverseDict, disambiguate_url, validate_url | |
39 |
|
39 | |||
40 | #-------------------------------------------------------------------------- |
|
40 | #-------------------------------------------------------------------------- | |
41 | # helpers for implementing old MEC API via client.apply |
|
41 | # helpers for implementing old MEC API via client.apply | |
42 | #-------------------------------------------------------------------------- |
|
42 | #-------------------------------------------------------------------------- | |
43 |
|
43 | |||
44 | def _push(ns): |
|
44 | def _push(ns): | |
45 | """helper method for implementing `client.push` via `client.apply`""" |
|
45 | """helper method for implementing `client.push` via `client.apply`""" | |
46 | globals().update(ns) |
|
46 | globals().update(ns) | |
47 |
|
47 | |||
48 | def _pull(keys): |
|
48 | def _pull(keys): | |
49 | """helper method for implementing `client.pull` via `client.apply`""" |
|
49 | """helper method for implementing `client.pull` via `client.apply`""" | |
50 | g = globals() |
|
50 | g = globals() | |
51 | if isinstance(keys, (list,tuple, set)): |
|
51 | if isinstance(keys, (list,tuple, set)): | |
52 | for key in keys: |
|
52 | for key in keys: | |
53 | if not g.has_key(key): |
|
53 | if not g.has_key(key): | |
54 | raise NameError("name '%s' is not defined"%key) |
|
54 | raise NameError("name '%s' is not defined"%key) | |
55 | return map(g.get, keys) |
|
55 | return map(g.get, keys) | |
56 | else: |
|
56 | else: | |
57 | if not g.has_key(keys): |
|
57 | if not g.has_key(keys): | |
58 | raise NameError("name '%s' is not defined"%keys) |
|
58 | raise NameError("name '%s' is not defined"%keys) | |
59 | return g.get(keys) |
|
59 | return g.get(keys) | |
60 |
|
60 | |||
61 | def _clear(): |
|
61 | def _clear(): | |
62 | """helper method for implementing `client.clear` via `client.apply`""" |
|
62 | """helper method for implementing `client.clear` via `client.apply`""" | |
63 | globals().clear() |
|
63 | globals().clear() | |
64 |
|
64 | |||
65 | def _execute(code): |
|
65 | def _execute(code): | |
66 | """helper method for implementing `client.execute` via `client.apply`""" |
|
66 | """helper method for implementing `client.execute` via `client.apply`""" | |
67 | exec code in globals() |
|
67 | exec code in globals() | |
68 |
|
68 | |||
69 |
|
69 | |||
70 | #-------------------------------------------------------------------------- |
|
70 | #-------------------------------------------------------------------------- | |
71 | # Decorators for Client methods |
|
71 | # Decorators for Client methods | |
72 | #-------------------------------------------------------------------------- |
|
72 | #-------------------------------------------------------------------------- | |
73 |
|
73 | |||
74 | @decorator |
|
74 | @decorator | |
75 | def spinfirst(f, self, *args, **kwargs): |
|
75 | def spinfirst(f, self, *args, **kwargs): | |
76 | """Call spin() to sync state prior to calling the method.""" |
|
76 | """Call spin() to sync state prior to calling the method.""" | |
77 | self.spin() |
|
77 | self.spin() | |
78 | return f(self, *args, **kwargs) |
|
78 | return f(self, *args, **kwargs) | |
79 |
|
79 | |||
80 | @decorator |
|
80 | @decorator | |
81 | def defaultblock(f, self, *args, **kwargs): |
|
81 | def defaultblock(f, self, *args, **kwargs): | |
82 | """Default to self.block; preserve self.block.""" |
|
82 | """Default to self.block; preserve self.block.""" | |
83 | block = kwargs.get('block',None) |
|
83 | block = kwargs.get('block',None) | |
84 | block = self.block if block is None else block |
|
84 | block = self.block if block is None else block | |
85 | saveblock = self.block |
|
85 | saveblock = self.block | |
86 | self.block = block |
|
86 | self.block = block | |
87 | try: |
|
87 | try: | |
88 | ret = f(self, *args, **kwargs) |
|
88 | ret = f(self, *args, **kwargs) | |
89 | finally: |
|
89 | finally: | |
90 | self.block = saveblock |
|
90 | self.block = saveblock | |
91 | return ret |
|
91 | return ret | |
92 |
|
92 | |||
93 |
|
93 | |||
94 | #-------------------------------------------------------------------------- |
|
94 | #-------------------------------------------------------------------------- | |
95 | # Classes |
|
95 | # Classes | |
96 | #-------------------------------------------------------------------------- |
|
96 | #-------------------------------------------------------------------------- | |
97 |
|
97 | |||
98 | class Metadata(dict): |
|
98 | class Metadata(dict): | |
99 | """Subclass of dict for initializing metadata values. |
|
99 | """Subclass of dict for initializing metadata values. | |
100 |
|
100 | |||
101 | Attribute access works on keys. |
|
101 | Attribute access works on keys. | |
102 |
|
102 | |||
103 | These objects have a strict set of keys - errors will raise if you try |
|
103 | These objects have a strict set of keys - errors will raise if you try | |
104 | to add new keys. |
|
104 | to add new keys. | |
105 | """ |
|
105 | """ | |
106 | def __init__(self, *args, **kwargs): |
|
106 | def __init__(self, *args, **kwargs): | |
107 | dict.__init__(self) |
|
107 | dict.__init__(self) | |
108 | md = {'msg_id' : None, |
|
108 | md = {'msg_id' : None, | |
109 | 'submitted' : None, |
|
109 | 'submitted' : None, | |
110 | 'started' : None, |
|
110 | 'started' : None, | |
111 | 'completed' : None, |
|
111 | 'completed' : None, | |
112 | 'received' : None, |
|
112 | 'received' : None, | |
113 | 'engine_uuid' : None, |
|
113 | 'engine_uuid' : None, | |
114 | 'engine_id' : None, |
|
114 | 'engine_id' : None, | |
115 | 'follow' : None, |
|
115 | 'follow' : None, | |
116 | 'after' : None, |
|
116 | 'after' : None, | |
117 | 'status' : None, |
|
117 | 'status' : None, | |
118 |
|
118 | |||
119 | 'pyin' : None, |
|
119 | 'pyin' : None, | |
120 | 'pyout' : None, |
|
120 | 'pyout' : None, | |
121 | 'pyerr' : None, |
|
121 | 'pyerr' : None, | |
122 | 'stdout' : '', |
|
122 | 'stdout' : '', | |
123 | 'stderr' : '', |
|
123 | 'stderr' : '', | |
124 | } |
|
124 | } | |
125 | self.update(md) |
|
125 | self.update(md) | |
126 | self.update(dict(*args, **kwargs)) |
|
126 | self.update(dict(*args, **kwargs)) | |
127 |
|
127 | |||
128 | def __getattr__(self, key): |
|
128 | def __getattr__(self, key): | |
129 | """getattr aliased to getitem""" |
|
129 | """getattr aliased to getitem""" | |
130 | if key in self.iterkeys(): |
|
130 | if key in self.iterkeys(): | |
131 | return self[key] |
|
131 | return self[key] | |
132 | else: |
|
132 | else: | |
133 | raise AttributeError(key) |
|
133 | raise AttributeError(key) | |
134 |
|
134 | |||
135 | def __setattr__(self, key, value): |
|
135 | def __setattr__(self, key, value): | |
136 | """setattr aliased to setitem, with strict""" |
|
136 | """setattr aliased to setitem, with strict""" | |
137 | if key in self.iterkeys(): |
|
137 | if key in self.iterkeys(): | |
138 | self[key] = value |
|
138 | self[key] = value | |
139 | else: |
|
139 | else: | |
140 | raise AttributeError(key) |
|
140 | raise AttributeError(key) | |
141 |
|
141 | |||
142 | def __setitem__(self, key, value): |
|
142 | def __setitem__(self, key, value): | |
143 | """strict static key enforcement""" |
|
143 | """strict static key enforcement""" | |
144 | if key in self.iterkeys(): |
|
144 | if key in self.iterkeys(): | |
145 | dict.__setitem__(self, key, value) |
|
145 | dict.__setitem__(self, key, value) | |
146 | else: |
|
146 | else: | |
147 | raise KeyError(key) |
|
147 | raise KeyError(key) | |
148 |
|
148 | |||
149 |
|
149 | |||
150 | class Client(object): |
|
150 | class Client(object): | |
151 | """A semi-synchronous client to the IPython ZMQ controller |
|
151 | """A semi-synchronous client to the IPython ZMQ controller | |
152 |
|
152 | |||
153 | Parameters |
|
153 | Parameters | |
154 | ---------- |
|
154 | ---------- | |
155 |
|
155 | |||
156 | url_or_file : bytes; zmq url or path to ipcontroller-client.json |
|
156 | url_or_file : bytes; zmq url or path to ipcontroller-client.json | |
157 | Connection information for the Hub's registration. If a json connector |
|
157 | Connection information for the Hub's registration. If a json connector | |
158 | file is given, then likely no further configuration is necessary. |
|
158 | file is given, then likely no further configuration is necessary. | |
159 | [Default: use profile] |
|
159 | [Default: use profile] | |
160 | profile : bytes |
|
160 | profile : bytes | |
161 | The name of the Cluster profile to be used to find connector information. |
|
161 | The name of the Cluster profile to be used to find connector information. | |
162 | [Default: 'default'] |
|
162 | [Default: 'default'] | |
163 | context : zmq.Context |
|
163 | context : zmq.Context | |
164 | Pass an existing zmq.Context instance, otherwise the client will create its own. |
|
164 | Pass an existing zmq.Context instance, otherwise the client will create its own. | |
165 | username : bytes |
|
165 | username : bytes | |
166 | set username to be passed to the Session object |
|
166 | set username to be passed to the Session object | |
167 | debug : bool |
|
167 | debug : bool | |
168 | flag for lots of message printing for debug purposes |
|
168 | flag for lots of message printing for debug purposes | |
169 |
|
169 | |||
170 | #-------------- ssh related args ---------------- |
|
170 | #-------------- ssh related args ---------------- | |
171 | # These are args for configuring the ssh tunnel to be used |
|
171 | # These are args for configuring the ssh tunnel to be used | |
172 | # credentials are used to forward connections over ssh to the Controller |
|
172 | # credentials are used to forward connections over ssh to the Controller | |
173 | # Note that the ip given in `addr` needs to be relative to sshserver |
|
173 | # Note that the ip given in `addr` needs to be relative to sshserver | |
174 | # The most basic case is to leave addr as pointing to localhost (127.0.0.1), |
|
174 | # The most basic case is to leave addr as pointing to localhost (127.0.0.1), | |
175 | # and set sshserver as the same machine the Controller is on. However, |
|
175 | # and set sshserver as the same machine the Controller is on. However, | |
176 | # the only requirement is that sshserver is able to see the Controller |
|
176 | # the only requirement is that sshserver is able to see the Controller | |
177 | # (i.e. is within the same trusted network). |
|
177 | # (i.e. is within the same trusted network). | |
178 |
|
178 | |||
179 | sshserver : str |
|
179 | sshserver : str | |
180 | A string of the form passed to ssh, i.e. 'server.tld' or 'user@server.tld:port' |
|
180 | A string of the form passed to ssh, i.e. 'server.tld' or 'user@server.tld:port' | |
181 | If keyfile or password is specified, and this is not, it will default to |
|
181 | If keyfile or password is specified, and this is not, it will default to | |
182 | the ip given in addr. |
|
182 | the ip given in addr. | |
183 | sshkey : str; path to public ssh key file |
|
183 | sshkey : str; path to public ssh key file | |
184 | This specifies a key to be used in ssh login, default None. |
|
184 | This specifies a key to be used in ssh login, default None. | |
185 | Regular default ssh keys will be used without specifying this argument. |
|
185 | Regular default ssh keys will be used without specifying this argument. | |
186 | password : str |
|
186 | password : str | |
187 | Your ssh password to sshserver. Note that if this is left None, |
|
187 | Your ssh password to sshserver. Note that if this is left None, | |
188 | you will be prompted for it if passwordless key based login is unavailable. |
|
188 | you will be prompted for it if passwordless key based login is unavailable. | |
189 | paramiko : bool |
|
189 | paramiko : bool | |
190 | flag for whether to use paramiko instead of shell ssh for tunneling. |
|
190 | flag for whether to use paramiko instead of shell ssh for tunneling. | |
191 | [default: True on win32, False else] |
|
191 | [default: True on win32, False else] | |
192 |
|
192 | |||
193 | #------- exec authentication args ------- |
|
193 | #------- exec authentication args ------- | |
194 | # If even localhost is untrusted, you can have some protection against |
|
194 | # If even localhost is untrusted, you can have some protection against | |
195 | # unauthorized execution by using a key. Messages are still sent |
|
195 | # unauthorized execution by using a key. Messages are still sent | |
196 | # as cleartext, so if someone can snoop your loopback traffic this will |
|
196 | # as cleartext, so if someone can snoop your loopback traffic this will | |
197 | # not help against malicious attacks. |
|
197 | # not help against malicious attacks. | |
198 |
|
198 | |||
199 | exec_key : str |
|
199 | exec_key : str | |
200 | an authentication key or file containing a key |
|
200 | an authentication key or file containing a key | |
201 | default: None |
|
201 | default: None | |
202 |
|
202 | |||
203 |
|
203 | |||
204 | Attributes |
|
204 | Attributes | |
205 | ---------- |
|
205 | ---------- | |
206 | ids : set of int engine IDs |
|
206 | ids : set of int engine IDs | |
207 | requesting the ids attribute always synchronizes |
|
207 | requesting the ids attribute always synchronizes | |
208 | the registration state. To request ids without synchronization, |
|
208 | the registration state. To request ids without synchronization, | |
209 | use semi-private _ids attributes. |
|
209 | use semi-private _ids attributes. | |
210 |
|
210 | |||
211 | history : list of msg_ids |
|
211 | history : list of msg_ids | |
212 | a list of msg_ids, keeping track of all the execution |
|
212 | a list of msg_ids, keeping track of all the execution | |
213 | messages you have submitted in order. |
|
213 | messages you have submitted in order. | |
214 |
|
214 | |||
215 | outstanding : set of msg_ids |
|
215 | outstanding : set of msg_ids | |
216 | a set of msg_ids that have been submitted, but whose |
|
216 | a set of msg_ids that have been submitted, but whose | |
217 | results have not yet been received. |
|
217 | results have not yet been received. | |
218 |
|
218 | |||
219 | results : dict |
|
219 | results : dict | |
220 | a dict of all our results, keyed by msg_id |
|
220 | a dict of all our results, keyed by msg_id | |
221 |
|
221 | |||
222 | block : bool |
|
222 | block : bool | |
223 | determines default behavior when block not specified |
|
223 | determines default behavior when block not specified | |
224 | in execution methods |
|
224 | in execution methods | |
225 |
|
225 | |||
226 | Methods |
|
226 | Methods | |
227 | ------- |
|
227 | ------- | |
228 | spin : flushes incoming results and registration state changes |
|
228 | spin : flushes incoming results and registration state changes | |
229 | control methods spin, and requesting `ids` also ensures up to date |
|
229 | control methods spin, and requesting `ids` also ensures up to date | |
230 |
|
230 | |||
231 | barrier : wait on one or more msg_ids |
|
231 | barrier : wait on one or more msg_ids | |
232 |
|
232 | |||
233 | execution methods: apply/apply_bound/apply_to/apply_bound |
|
233 | execution methods: apply/apply_bound/apply_to/apply_bound | |
234 | legacy: execute, run |
|
234 | legacy: execute, run | |
235 |
|
235 | |||
236 | query methods: queue_status, get_result, purge |
|
236 | query methods: queue_status, get_result, purge | |
237 |
|
237 | |||
238 | control methods: abort, kill |
|
238 | control methods: abort, kill | |
239 |
|
239 | |||
240 | """ |
|
240 | """ | |
241 |
|
241 | |||
242 |
|
242 | |||
243 | _connected=False |
|
243 | _connected=False | |
244 | _ssh=False |
|
244 | _ssh=False | |
245 | _engines=None |
|
245 | _engines=None | |
246 | _registration_socket=None |
|
246 | _registration_socket=None | |
247 | _query_socket=None |
|
247 | _query_socket=None | |
248 | _control_socket=None |
|
248 | _control_socket=None | |
249 | _iopub_socket=None |
|
249 | _iopub_socket=None | |
250 | _notification_socket=None |
|
250 | _notification_socket=None | |
251 | _mux_socket=None |
|
251 | _mux_socket=None | |
252 | _task_socket=None |
|
252 | _task_socket=None | |
253 | _task_scheme=None |
|
253 | _task_scheme=None | |
254 | block = False |
|
254 | block = False | |
255 | outstanding=None |
|
255 | outstanding=None | |
256 | results = None |
|
256 | results = None | |
257 | history = None |
|
257 | history = None | |
258 | debug = False |
|
258 | debug = False | |
259 | targets = None |
|
259 | targets = None | |
260 |
|
260 | |||
261 | def __init__(self, url_or_file=None, profile='default', cluster_dir=None, ipython_dir=None, |
|
261 | def __init__(self, url_or_file=None, profile='default', cluster_dir=None, ipython_dir=None, | |
262 | context=None, username=None, debug=False, exec_key=None, |
|
262 | context=None, username=None, debug=False, exec_key=None, | |
263 | sshserver=None, sshkey=None, password=None, paramiko=None, |
|
263 | sshserver=None, sshkey=None, password=None, paramiko=None, | |
264 | ): |
|
264 | ): | |
265 | if context is None: |
|
265 | if context is None: | |
266 | context = zmq.Context() |
|
266 | context = zmq.Context() | |
267 | self.context = context |
|
267 | self.context = context | |
268 | self.targets = 'all' |
|
268 | self.targets = 'all' | |
269 |
|
269 | |||
270 | self._setup_cluster_dir(profile, cluster_dir, ipython_dir) |
|
270 | self._setup_cluster_dir(profile, cluster_dir, ipython_dir) | |
271 | if self._cd is not None: |
|
271 | if self._cd is not None: | |
272 | if url_or_file is None: |
|
272 | if url_or_file is None: | |
273 | url_or_file = pjoin(self._cd.security_dir, 'ipcontroller-client.json') |
|
273 | url_or_file = pjoin(self._cd.security_dir, 'ipcontroller-client.json') | |
274 | assert url_or_file is not None, "I can't find enough information to connect to a controller!"\ |
|
274 | assert url_or_file is not None, "I can't find enough information to connect to a controller!"\ | |
275 | " Please specify at least one of url_or_file or profile." |
|
275 | " Please specify at least one of url_or_file or profile." | |
276 |
|
276 | |||
277 | try: |
|
277 | try: | |
278 | validate_url(url_or_file) |
            validate_url(url_or_file)
        except AssertionError:
            if not os.path.exists(url_or_file):
                if self._cd:
                    url_or_file = os.path.join(self._cd.security_dir, url_or_file)
                assert os.path.exists(url_or_file), "Not a valid connection file or url: %r" % url_or_file
            with open(url_or_file) as f:
                cfg = json.loads(f.read())
        else:
            cfg = {'url': url_or_file}

        # sync defaults from args, json:
        if sshserver:
            cfg['ssh'] = sshserver
        if exec_key:
            cfg['exec_key'] = exec_key
        # use .get so a connection file that omits these keys falls back to
        # the argument (possibly None) instead of raising KeyError
        exec_key = cfg.get('exec_key', exec_key)
        sshserver = cfg.get('ssh', sshserver)
        url = cfg['url']
        location = cfg.setdefault('location', None)
        cfg['url'] = disambiguate_url(cfg['url'], location)
        url = cfg['url']

        self._config = cfg

        self._ssh = bool(sshserver or sshkey or password)
        if self._ssh and sshserver is None:
            # default to ssh via localhost
            sshserver = url.split('://')[1].split(':')[0]
        if self._ssh and password is None:
            if tunnel.try_passwordless_ssh(sshserver, sshkey, paramiko):
                password = False
            else:
                password = getpass("SSH Password for %s: " % sshserver)
        ssh_kwargs = dict(keyfile=sshkey, password=password, paramiko=paramiko)
        if exec_key is not None and os.path.isfile(exec_key):
            arg = 'keyfile'
        else:
            arg = 'key'
        key_arg = {arg: exec_key}
        if username is None:
            self.session = ss.StreamSession(**key_arg)
        else:
            self.session = ss.StreamSession(username, **key_arg)
        self._registration_socket = self.context.socket(zmq.XREQ)
        self._registration_socket.setsockopt(zmq.IDENTITY, self.session.session)
        if self._ssh:
            tunnel.tunnel_connection(self._registration_socket, url, sshserver, **ssh_kwargs)
        else:
            self._registration_socket.connect(url)
        self._engines = ReverseDict()
        self._ids = set()
        self.outstanding = set()
        self.results = {}
        self.metadata = {}
        self.history = []
        self.debug = debug
        self.session.debug = debug

        self._notification_handlers = {'registration_notification' : self._register_engine,
                                       'unregistration_notification' : self._unregister_engine,
                                       }
        self._queue_handlers = {'execute_reply' : self._handle_execute_reply,
                                'apply_reply' : self._handle_apply_reply}
        self._connect(sshserver, ssh_kwargs)

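The `__init__` above finishes by wiring message types to bound-method handlers in plain dicts (`_notification_handlers`, `_queue_handlers`) and dispatching on each message's `msg_type`. A minimal standalone sketch of that dispatch-by-dict pattern, with illustrative names that are not part of the real client:

```python
# Sketch of the dict-based message dispatch used by the client above.
# All names here are illustrative, not part of the real API.
def make_dispatcher(handlers):
    """Return a function that routes a message dict by its 'msg_type'."""
    def dispatch(msg):
        handler = handlers.get(msg['msg_type'], None)
        if handler is None:
            raise Exception("Unhandled message type: %s" % msg['msg_type'])
        return handler(msg)
    return dispatch

log = []
dispatch = make_dispatcher({
    'registration_notification': lambda msg: log.append(('reg', msg['content']['id'])),
    'unregistration_notification': lambda msg: log.append(('unreg', msg['content']['id'])),
})
dispatch({'msg_type': 'registration_notification', 'content': {'id': 0}})
```

The dict keeps routing declarative: adding a message type is one entry, and an unknown type fails loudly instead of being silently dropped.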
    def _setup_cluster_dir(self, profile, cluster_dir, ipython_dir):
        if ipython_dir is None:
            ipython_dir = get_ipython_dir()
        if cluster_dir is not None:
            try:
                self._cd = ClusterDir.find_cluster_dir(cluster_dir)
            except ClusterDirError:
                pass
        elif profile is not None:
            try:
                self._cd = ClusterDir.find_cluster_dir_by_profile(
                    ipython_dir, profile)
            except ClusterDirError:
                pass
        else:
            self._cd = None

    @property
    def ids(self):
        """Always up-to-date ids property."""
        self._flush_notifications()
        return self._ids

    def _update_engines(self, engines):
        """Update our engines dict and _ids from a dict of the form: {id:uuid}."""
        for k,v in engines.iteritems():
            eid = int(k)
            self._engines[eid] = bytes(v) # force not unicode
            self._ids.add(eid)
        if sorted(self._engines.keys()) != range(len(self._engines)) and \
                self._task_scheme == 'pure' and self._task_socket:
            self._stop_scheduling_tasks()

    def _stop_scheduling_tasks(self):
        """Stop scheduling tasks because an engine has been unregistered
        from a pure ZMQ scheduler.
        """
        self._task_socket.close()
        self._task_socket = None
        msg = "An engine has been unregistered, and we are using pure " +\
              "ZMQ task scheduling. Task farming will be disabled."
        if self.outstanding:
            msg += " If you were running tasks when this happened, " +\
                   "some `outstanding` msg_ids may never resolve."
        warnings.warn(msg, RuntimeWarning)

    def _build_targets(self, targets):
        """Turn valid target IDs or 'all' into two lists:
        (int_ids, uuids).
        """
        if targets is None:
            targets = self._ids
        elif isinstance(targets, str):
            if targets.lower() == 'all':
                targets = self._ids
            else:
                raise TypeError("%r not valid str target, must be 'all'" % (targets,))
        elif isinstance(targets, int):
            targets = [targets]
        return [self._engines[t] for t in targets], list(targets)

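`_build_targets` normalizes `None`, `'all'`, a single int, or a list of ints into a concrete pair of (uuids, ids). A standalone sketch of that normalization, with an illustrative `engines` mapping standing in for the client's `_engines` dict:

```python
# Illustrative stand-in for Client._build_targets: normalize a targets
# argument (None, 'all', int, or list of ints) against an engines dict.
def build_targets(targets, engines):
    if targets is None:
        targets = sorted(engines.keys())
    elif isinstance(targets, str):
        if targets.lower() == 'all':
            targets = sorted(engines.keys())
        else:
            raise TypeError("%r not valid str target, must be 'all'" % (targets,))
    elif isinstance(targets, int):
        targets = [targets]
    # return the engine uuids alongside the integer ids
    return [engines[t] for t in targets], list(targets)

engines = {0: 'uuid-a', 1: 'uuid-b', 2: 'uuid-c'}
uuids, ids = build_targets('all', engines)
```

Note that an unknown engine id surfaces as a `KeyError` from the final lookup, just as it would in the real method.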
    def _connect(self, sshserver, ssh_kwargs):
        """setup all our socket connections to the controller. This is called from
        __init__."""
        # Maybe allow reconnecting?
        if self._connected:
            return
        self._connected = True

        def connect_socket(s, url):
            url = disambiguate_url(url, self._config['location'])
            if self._ssh:
                return tunnel.tunnel_connection(s, url, sshserver, **ssh_kwargs)
            else:
                return s.connect(url)

        self.session.send(self._registration_socket, 'connection_request')
        idents,msg = self.session.recv(self._registration_socket, mode=0)
        if self.debug:
            pprint(msg)
        msg = ss.Message(msg)
        content = msg.content
        self._config['registration'] = dict(content)
        if content.status == 'ok':
            if content.mux:
                self._mux_socket = self.context.socket(zmq.PAIR)
                self._mux_socket.setsockopt(zmq.IDENTITY, self.session.session)
                connect_socket(self._mux_socket, content.mux)
            if content.task:
                self._task_scheme, task_addr = content.task
                self._task_socket = self.context.socket(zmq.PAIR)
                self._task_socket.setsockopt(zmq.IDENTITY, self.session.session)
                connect_socket(self._task_socket, task_addr)
            if content.notification:
                self._notification_socket = self.context.socket(zmq.SUB)
                connect_socket(self._notification_socket, content.notification)
                self._notification_socket.setsockopt(zmq.SUBSCRIBE, "")
            if content.query:
                self._query_socket = self.context.socket(zmq.PAIR)
                self._query_socket.setsockopt(zmq.IDENTITY, self.session.session)
                connect_socket(self._query_socket, content.query)
            if content.control:
                self._control_socket = self.context.socket(zmq.PAIR)
                self._control_socket.setsockopt(zmq.IDENTITY, self.session.session)
                connect_socket(self._control_socket, content.control)
            if content.iopub:
                self._iopub_socket = self.context.socket(zmq.SUB)
                self._iopub_socket.setsockopt(zmq.SUBSCRIBE, '')
                self._iopub_socket.setsockopt(zmq.IDENTITY, self.session.session)
                connect_socket(self._iopub_socket, content.iopub)
            self._update_engines(dict(content.engines))

        else:
            self._connected = False
            raise Exception("Failed to connect!")

    #--------------------------------------------------------------------------
    # handlers and callbacks for incoming messages
    #--------------------------------------------------------------------------

    def _register_engine(self, msg):
        """Register a new engine, and update our connection info."""
        content = msg['content']
        eid = content['id']
        d = {eid : content['queue']}
        self._update_engines(d)
        self._ids.add(int(eid))

    def _unregister_engine(self, msg):
        """Unregister an engine that has died."""
        content = msg['content']
        eid = int(content['id'])
        if eid in self._ids:
            self._ids.remove(eid)
            self._engines.pop(eid)
        if self._task_socket and self._task_scheme == 'pure':
            self._stop_scheduling_tasks()

    def _extract_metadata(self, header, parent, content):
        md = {'msg_id' : parent['msg_id'],
              'received' : datetime.now(),
              'engine_uuid' : header.get('engine', None),
              'follow' : parent.get('follow', []),
              'after' : parent.get('after', []),
              'status' : content['status'],
              }

        if md['engine_uuid'] is not None:
            md['engine_id'] = self._engines.get(md['engine_uuid'], None)

        if 'date' in parent:
            md['submitted'] = datetime.strptime(parent['date'], ss.ISO8601)
        if 'started' in header:
            md['started'] = datetime.strptime(header['started'], ss.ISO8601)
        if 'date' in header:
            md['completed'] = datetime.strptime(header['date'], ss.ISO8601)
        return md

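`_extract_metadata` turns a message envelope into timing metadata by parsing the ISO8601 strings the session layer stamps onto headers (`ss.ISO8601` in the real code). A standalone sketch of just the timing part; the format string here is an assumption for illustration, not necessarily the real value of `ss.ISO8601`:

```python
from datetime import datetime

# Assumed ISO8601 format for illustration; the real value lives in
# the streamsession module as ss.ISO8601.
ISO8601 = "%Y-%m-%dT%H:%M:%S.%f"

def extract_timing(header, parent):
    """Pull submitted/started/completed datetimes out of message headers."""
    md = {}
    if 'date' in parent:
        md['submitted'] = datetime.strptime(parent['date'], ISO8601)
    if 'started' in header:
        md['started'] = datetime.strptime(header['started'], ISO8601)
    if 'date' in header:
        md['completed'] = datetime.strptime(header['date'], ISO8601)
    return md

md = extract_timing({'date': '2010-11-01T12:00:01.500000'},
                    {'date': '2010-11-01T12:00:00.000000'})
```

Each key is optional, so a reply that never started (e.g. an aborted task) simply lacks the `started` entry rather than carrying a sentinel value.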
    def _handle_execute_reply(self, msg):
        """Save the reply to an execute_request into our results.

        execute messages are never actually used. apply is used instead.
        """

        parent = msg['parent_header']
        msg_id = parent['msg_id']
        if msg_id not in self.outstanding:
            if msg_id in self.history:
                print ("got stale result: %s" % msg_id)
            else:
                print ("got unknown result: %s" % msg_id)
        else:
            self.outstanding.remove(msg_id)
        self.results[msg_id] = ss.unwrap_exception(msg['content'])

    def _handle_apply_reply(self, msg):
        """Save the reply to an apply_request into our results."""
        parent = msg['parent_header']
        msg_id = parent['msg_id']
        if msg_id not in self.outstanding:
            if msg_id in self.history:
                print ("got stale result: %s" % msg_id)
                print self.results[msg_id]
                print msg
            else:
                print ("got unknown result: %s" % msg_id)
        else:
            self.outstanding.remove(msg_id)
        content = msg['content']
        header = msg['header']

        # construct metadata:
        md = self.metadata.setdefault(msg_id, Metadata())
        md.update(self._extract_metadata(header, parent, content))
        self.metadata[msg_id] = md

        # construct result:
        if content['status'] == 'ok':
            self.results[msg_id] = ss.unserialize_object(msg['buffers'])[0]
        elif content['status'] == 'aborted':
            self.results[msg_id] = error.AbortedTask(msg_id)
        elif content['status'] == 'resubmitted':
            # TODO: handle resubmission
            pass
        else:
            e = ss.unwrap_exception(content)
            if e.engine_info:
                e_uuid = e.engine_info['engineid']
                eid = self._engines[e_uuid]
                e.engine_info['engineid'] = eid
            self.results[msg_id] = e

    def _flush_notifications(self):
        """Flush notifications of engine registrations waiting
        in ZMQ queue."""
        msg = self.session.recv(self._notification_socket, mode=zmq.NOBLOCK)
        while msg is not None:
            if self.debug:
                pprint(msg)
            msg = msg[-1]
            msg_type = msg['msg_type']
            handler = self._notification_handlers.get(msg_type, None)
            if handler is None:
                # msg is a dict at this point, so use the extracted msg_type;
                # msg.msg_type would raise AttributeError
                raise Exception("Unhandled message type: %s" % msg_type)
            else:
                handler(msg)
            msg = self.session.recv(self._notification_socket, mode=zmq.NOBLOCK)

    def _flush_results(self, sock):
        """Flush task or queue results waiting in ZMQ queue."""
        msg = self.session.recv(sock, mode=zmq.NOBLOCK)
        while msg is not None:
            if self.debug:
                pprint(msg)
            msg = msg[-1]
            msg_type = msg['msg_type']
            handler = self._queue_handlers.get(msg_type, None)
            if handler is None:
                raise Exception("Unhandled message type: %s" % msg_type)
            else:
                handler(msg)
            msg = self.session.recv(sock, mode=zmq.NOBLOCK)

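The flush methods above share one idiom: drain a socket with non-blocking receives until it returns nothing, dispatching each message as it arrives. A socket-free sketch of that drain loop, where a `deque` stands in for the ZMQ socket:

```python
from collections import deque

# Stand-in for a ZMQ socket: returns None when nothing is waiting,
# mirroring session.recv(..., mode=zmq.NOBLOCK) in the client above.
def recv_noblock(queue):
    return queue.popleft() if queue else None

def flush(queue, handlers):
    """Drain and dispatch every waiting message, then return."""
    msg = recv_noblock(queue)
    while msg is not None:
        handler = handlers.get(msg['msg_type'])
        if handler is None:
            raise Exception("Unhandled message type: %s" % msg['msg_type'])
        handler(msg)
        msg = recv_noblock(queue)

seen = []
q = deque([{'msg_type': 'apply_reply', 'content': i} for i in range(3)])
flush(q, {'apply_reply': lambda m: seen.append(m['content'])})
```

Because the recv never blocks, `flush` costs almost nothing when the queue is empty, which is what makes calling `spin()` opportunistically (e.g. from the `ids` property) cheap.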
    def _flush_control(self, sock):
        """Flush replies from the control channel waiting
        in the ZMQ queue.

        Currently: ignore them."""
        msg = self.session.recv(sock, mode=zmq.NOBLOCK)
        while msg is not None:
            if self.debug:
                pprint(msg)
            msg = self.session.recv(sock, mode=zmq.NOBLOCK)

    def _flush_iopub(self, sock):
        """Flush replies from the iopub channel waiting
        in the ZMQ queue.
        """
        msg = self.session.recv(sock, mode=zmq.NOBLOCK)
        while msg is not None:
            if self.debug:
                pprint(msg)
            msg = msg[-1]
            parent = msg['parent_header']
            msg_id = parent['msg_id']
            content = msg['content']
            header = msg['header']
            msg_type = msg['msg_type']

            # init metadata:
            md = self.metadata.setdefault(msg_id, Metadata())

            if msg_type == 'stream':
                name = content['name']
                s = md[name] or ''
                md[name] = s + content['data']
            elif msg_type == 'pyerr':
                md.update({'pyerr' : ss.unwrap_exception(content)})
            else:
                md.update({msg_type : content['data']})

            self.metadata[msg_id] = md

            msg = self.session.recv(sock, mode=zmq.NOBLOCK)

    #--------------------------------------------------------------------------
    # getitem
    #--------------------------------------------------------------------------

    def __getitem__(self, key):
        """Dict access returns DirectView multiplexer objects or,
        if key is None, a LoadBalancedView."""
        if key is None:
            return LoadBalancedView(self)
        if isinstance(key, int):
            if key not in self.ids:
                raise IndexError("No such engine: %i" % key)
            return DirectView(self, key)

        if isinstance(key, slice):
            indices = range(len(self.ids))[key]
            ids = sorted(self._ids)
            key = [ ids[i] for i in indices ]
            # newkeys = sorted(self._ids)[thekeys[k]]

        if isinstance(key, (tuple, list, xrange)):
            _,targets = self._build_targets(list(key))
            return DirectView(self, targets)
        else:
            raise TypeError("key by int/iterable of ints only, not %s" % (type(key)))

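`__getitem__` converts a slice into concrete engine ids by slicing a range of *positions* and mapping those through the sorted id set, so `client[::2]` still works when engine ids are sparse (e.g. after an engine died). A standalone sketch of that conversion with a hypothetical `ids` set:

```python
# Resolve a slice over a sparse set of engine ids, as __getitem__ does above:
# slice positions into the sorted id list, not the raw id values.
def resolve_slice(key, id_set):
    ids = sorted(id_set)
    indices = range(len(ids))[key]   # positions selected by the slice
    return [ids[i] for i in indices]

ids = {0, 2, 5, 9}                   # sparse ids after an unregistration
even = resolve_slice(slice(None, None, 2), ids)
```

Slicing positions rather than ids means `client[1:3]` always names the second and third live engines, whatever their numeric ids happen to be.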
658 | #-------------------------------------------------------------------------- |
|
658 | #-------------------------------------------------------------------------- | |
659 | # Begin public methods |
|
659 | # Begin public methods | |
660 | #-------------------------------------------------------------------------- |
|
660 | #-------------------------------------------------------------------------- | |
661 |
|
661 | |||
662 | @property |
|
662 | @property | |
663 | def remote(self): |
|
663 | def remote(self): | |
664 | """property for convenient RemoteFunction generation. |
|
664 | """property for convenient RemoteFunction generation. | |
665 |
|
665 | |||
666 | >>> @client.remote |
|
666 | >>> @client.remote | |
667 | ... def f(): |
|
667 | ... def f(): | |
668 | import os |
|
668 | import os | |
669 | print (os.getpid()) |
|
669 | print (os.getpid()) | |
670 | """ |
|
670 | """ | |
671 | return remote(self, block=self.block) |
|
671 | return remote(self, block=self.block) | |
672 |
|
672 | |||
673 | def spin(self): |
|
673 | def spin(self): | |
674 | """Flush any registration notifications and execution results |
|
674 | """Flush any registration notifications and execution results | |
675 | waiting in the ZMQ queue. |
|
675 | waiting in the ZMQ queue. | |
676 | """ |
|
676 | """ | |
677 | if self._notification_socket: |
|
677 | if self._notification_socket: | |
678 | self._flush_notifications() |
|
678 | self._flush_notifications() | |
679 | if self._mux_socket: |
|
679 | if self._mux_socket: | |
680 | self._flush_results(self._mux_socket) |
|
680 | self._flush_results(self._mux_socket) | |
681 | if self._task_socket: |
|
681 | if self._task_socket: | |
682 | self._flush_results(self._task_socket) |
|
682 | self._flush_results(self._task_socket) | |
683 | if self._control_socket: |
|
683 | if self._control_socket: | |
684 | self._flush_control(self._control_socket) |
|
684 | self._flush_control(self._control_socket) | |
685 | if self._iopub_socket: |
|
685 | if self._iopub_socket: | |
686 | self._flush_iopub(self._iopub_socket) |
|
686 | self._flush_iopub(self._iopub_socket) | |
687 |
|
687 | |||
    def barrier(self, msg_ids=None, timeout=-1):
        """waits on one or more `msg_ids`, for up to `timeout` seconds.

        Parameters
        ----------
        msg_ids : int, str, or list of ints and/or strs, or one or more AsyncResult objects
            ints are indices to self.history
            strs are msg_ids
            default: wait on all outstanding messages
        timeout : float
            a time in seconds, after which to give up.
            default is -1, which means no timeout

        Returns
        -------
        True : when all msg_ids are done
        False : timeout reached, some msg_ids still outstanding
        """
        tic = time.time()
        if msg_ids is None:
            theids = self.outstanding
        else:
            if isinstance(msg_ids, (int, str, AsyncResult)):
                msg_ids = [msg_ids]
            theids = set()
            for msg_id in msg_ids:
                if isinstance(msg_id, int):
                    msg_id = self.history[msg_id]
                elif isinstance(msg_id, AsyncResult):
                    map(theids.add, msg_id.msg_ids)
                    continue
                theids.add(msg_id)
        if not theids.intersection(self.outstanding):
            return True
        self.spin()
        while theids.intersection(self.outstanding):
            if timeout >= 0 and ( time.time()-tic ) > timeout:
                break
            time.sleep(1e-3)
            self.spin()
        return len(theids.intersection(self.outstanding)) == 0

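The loop at the heart of `barrier` is a generic poll-with-timeout pattern. A minimal self-contained sketch (plain Python, no sockets; `spin` and the `outstanding` set here are stand-ins for the client's real state) behaves the same way:

```python
import time

def wait_on(outstanding, wanted, spin, timeout=-1):
    """Poll until none of `wanted` remains in `outstanding`, or `timeout` expires.

    Mirrors the barrier() loop: timeout < 0 means wait forever, and the
    return value says whether everything finished in time.
    """
    wanted = set(wanted)
    tic = time.time()
    while wanted.intersection(outstanding):
        if timeout >= 0 and (time.time() - tic) > timeout:
            break
        time.sleep(1e-3)
        spin()  # give the client a chance to process incoming replies
    return len(wanted.intersection(outstanding)) == 0

# fake workload: each spin() call retires one outstanding message id
outstanding = {'a', 'b', 'c'}
def spin():
    if outstanding:
        outstanding.pop()

assert wait_on(outstanding, ['a', 'b'], spin) is True
assert wait_on({'x'}, ['x'], lambda: None, timeout=0.01) is False  # times out
```

Returning `False` rather than raising on timeout is what lets callers of `barrier` decide for themselves whether a partial wait is an error.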
    #--------------------------------------------------------------------------
    # Control methods
    #--------------------------------------------------------------------------

    @spinfirst
    @defaultblock
    def clear(self, targets=None, block=None):
        """Clear the namespace in target(s)."""
        targets = self._build_targets(targets)[0]
        for t in targets:
            self.session.send(self._control_socket, 'clear_request', content={}, ident=t)
        error = False
        if self.block:
            for i in range(len(targets)):
                idents,msg = self.session.recv(self._control_socket,0)
                if self.debug:
                    pprint(msg)
                if msg['content']['status'] != 'ok':
                    error = ss.unwrap_exception(msg['content'])
        if error:
            return error


    @spinfirst
    @defaultblock
    def abort(self, msg_ids = None, targets=None, block=None):
        """Abort the execution queues of target(s)."""
        targets = self._build_targets(targets)[0]
        if isinstance(msg_ids, basestring):
            msg_ids = [msg_ids]
        content = dict(msg_ids=msg_ids)
        for t in targets:
            self.session.send(self._control_socket, 'abort_request',
                    content=content, ident=t)
        error = False
        if self.block:
            for i in range(len(targets)):
                idents,msg = self.session.recv(self._control_socket,0)
                if self.debug:
                    pprint(msg)
                if msg['content']['status'] != 'ok':
                    error = ss.unwrap_exception(msg['content'])
        if error:
            return error

    @spinfirst
    @defaultblock
    def shutdown(self, targets=None, restart=False, controller=False, block=None):
        """Terminates one or more engine processes, optionally including the controller."""
        if controller:
            targets = 'all'
        targets = self._build_targets(targets)[0]
        for t in targets:
            self.session.send(self._control_socket, 'shutdown_request',
                        content={'restart':restart},ident=t)
        error = False
        if block or controller:
            for i in range(len(targets)):
                idents,msg = self.session.recv(self._control_socket,0)
                if self.debug:
                    pprint(msg)
                if msg['content']['status'] != 'ok':
                    error = ss.unwrap_exception(msg['content'])

        if controller:
            time.sleep(0.25)
            self.session.send(self._query_socket, 'shutdown_request')
            idents,msg = self.session.recv(self._query_socket, 0)
            if self.debug:
                pprint(msg)
            if msg['content']['status'] != 'ok':
                error = ss.unwrap_exception(msg['content'])

        if error:
            raise error

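`clear`, `abort`, and `shutdown` all share one reply-collection idiom: send a request per target, then, when blocking, drain one reply per target and remember the last non-ok status. A toy version over plain dicts (stubbing out `session.recv` and `unwrap_exception`, which are assumptions here) shows the shape:

```python
def collect_replies(replies):
    """Scan reply messages in arrival order; return the last failure, or None.

    Mirrors the loop in clear()/abort()/shutdown(): every reply is drained
    even after an error is seen, so no message is left sitting on the socket.
    """
    error = None
    for msg in replies:
        if msg['content']['status'] != 'ok':
            error = msg['content']   # the real client calls unwrap_exception(...)
    return error

ok = {'content': {'status': 'ok'}}
bad = {'content': {'status': 'error', 'evalue': 'boom'}}

assert collect_replies([ok, ok]) is None
assert collect_replies([ok, bad, ok])['evalue'] == 'boom'
```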
    #--------------------------------------------------------------------------
    # Execution methods
    #--------------------------------------------------------------------------

    @defaultblock
    def execute(self, code, targets='all', block=None):
        """Executes `code` on `targets` in blocking or nonblocking manner.

        ``execute`` is always `bound` (affects engine namespace)

        Parameters
        ----------
        code : str
            the code string to be executed
        targets : int/str/list of ints/strs
            the engines on which to execute
            default : all
        block : bool
            whether or not to wait until done to return
            default: self.block
        """
        result = self.apply(_execute, (code,), targets=targets, block=self.block, bound=True)
        return result

    def run(self, filename, targets='all', block=None):
        """Execute contents of `filename` on engine(s).

        This simply reads the contents of the file and calls `execute`.

        Parameters
        ----------
        filename : str
            The path to the file
        targets : int/str/list of ints/strs
            the engines on which to execute
            default : all
        block : bool
            whether or not to wait until done
            default: self.block

        """
        with open(filename, 'rb') as f:
            code = f.read()
        return self.execute(code, targets=targets, block=block)

    def _maybe_raise(self, result):
        """wrapper for maybe raising an exception if apply failed."""
        if isinstance(result, error.RemoteError):
            raise result

        return result

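`_maybe_raise` is a small but important convention: remote failures come back as result objects and are re-raised locally rather than returned. A sketch with a stand-in `RemoteError` (the real one lives in the `error` module) illustrates it:

```python
class RemoteError(Exception):
    """Stand-in for error.RemoteError, used only for this sketch."""
    pass

def maybe_raise(result):
    """Re-raise remote failures locally; pass ordinary results through."""
    if isinstance(result, RemoteError):
        raise result
    return result

assert maybe_raise(42) == 42
try:
    maybe_raise(RemoteError("engine died"))
except RemoteError as e:
    assert "engine died" in str(e)
else:
    raise AssertionError("should have raised")
```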
    def _build_dependency(self, dep):
        """helper for building jsonable dependencies from various input forms"""
        if isinstance(dep, Dependency):
            return dep.as_dict()
        elif isinstance(dep, AsyncResult):
            return dep.msg_ids
        elif dep is None:
            return []
        else:
            # pass to Dependency constructor
            return list(Dependency(dep))

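For plain inputs, `_build_dependency` is essentially a normalizer from `None` / a single msg_id / an iterable of msg_ids down to a list. A toy version (ignoring the `Dependency` and `AsyncResult` branches, which need the real classes) might be:

```python
def normalize_dep(dep):
    """Toy sketch of _build_dependency for plain inputs.

    None -> no dependency, a single msg_id string -> one-element list,
    any iterable of msg_ids -> list.  The real method additionally
    accepts Dependency and AsyncResult objects.
    """
    if dep is None:
        return []
    if isinstance(dep, str):
        return [dep]
    return list(dep)

assert normalize_dep(None) == []
assert normalize_dep('abc') == ['abc']
assert sorted(normalize_dep({'a', 'b'})) == ['a', 'b']
```

Normalizing early means the scheduler only ever sees one representation (a flat list of msg_ids) regardless of how the caller expressed the dependency.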
    def apply(self, f, args=None, kwargs=None, bound=True, block=None, targets=None,
                        after=None, follow=None, timeout=None):
        """Call `f(*args, **kwargs)` on a remote engine(s), returning the result.

        This is the central execution command for the client.

        Parameters
        ----------

        f : function
            The function to be called remotely
        args : tuple/list
            The positional arguments passed to `f`
        kwargs : dict
            The keyword arguments passed to `f`
        bound : bool (default: True)
            Whether to execute in the Engine(s) namespace, or in a clean
            namespace not affecting the engine.
        block : bool (default: self.block)
            Whether to wait for the result, or return immediately.
            False:
                returns AsyncResult
            True:
                returns actual result(s) of f(*args, **kwargs)
                if multiple targets:
                    list of results, matching `targets`
        targets : int,list of ints, 'all', None
            Specify the destination of the job.
            if None:
                Submit via Task queue for load-balancing.
            if 'all':
                Run on all active engines
            if list:
                Run on each specified engine
            if int:
                Run on single engine

        after : Dependency or collection of msg_ids
            Only for load-balanced execution (targets=None)
            Specify a list of msg_ids as a time-based dependency.
            This job will only be run *after* the dependencies
            have been met.

        follow : Dependency or collection of msg_ids
            Only for load-balanced execution (targets=None)
            Specify a list of msg_ids as a location-based dependency.
            This job will only be run on an engine where this dependency
            is met.

        timeout : float/int or None
            Only for load-balanced execution (targets=None)
            Specify an amount of time (in seconds) for the scheduler to
            wait for dependencies to be met before failing with a
            DependencyTimeout.

        Returns
        -------
        if block is False:
            return AsyncResult wrapping msg_ids
            output of AsyncResult.get() is identical to that of `apply(...block=True)`
        else:
            if single target:
                return result of `f(*args, **kwargs)`
            else:
                return list of results, matching `targets`
        """

        # defaults:
        block = block if block is not None else self.block
        args = args if args is not None else []
        kwargs = kwargs if kwargs is not None else {}

        # enforce types of f, args, kwargs
        if not callable(f):
            raise TypeError("f must be callable, not %s"%type(f))
        if not isinstance(args, (tuple, list)):
            raise TypeError("args must be tuple or list, not %s"%type(args))
        if not isinstance(kwargs, dict):
            raise TypeError("kwargs must be dict, not %s"%type(kwargs))

        options = dict(bound=bound, block=block)

        if targets is None:
            if self._task_socket:
                return self._apply_balanced(f, args, kwargs, timeout=timeout,
                                        after=after, follow=follow, **options)
            else:
                msg = "Task farming is disabled"
                if self._task_scheme == 'pure':
                    msg += " because the pure ZMQ scheduler cannot handle"
                    msg += " disappearing engines."
                raise RuntimeError(msg)
        else:
            return self._apply_direct(f, args, kwargs, targets=targets, **options)

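The routing decision at the end of `apply` reduces to a small rule: `targets=None` means load-balanced submission via the task queue (if one exists), anything else means direct submission via the MUX queue. A sketch of just that decision, with socket availability reduced to a boolean flag for illustration:

```python
def route(targets, task_socket_available=True):
    """Toy version of apply()'s routing decision.

    targets=None -> load-balanced via the task queue (if available),
    anything else -> direct execution via the MUX queue.
    """
    if targets is None:
        if not task_socket_available:
            raise RuntimeError("Task farming is disabled")
        return 'balanced'
    return 'direct'

assert route(None) == 'balanced'
assert route('all') == 'direct'
assert route([0, 1]) == 'direct'
```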
    def _apply_balanced(self, f, args, kwargs, bound=True, block=None,
                        after=None, follow=None, timeout=None):
        """The underlying method for applying functions in a load balanced
        manner, via the task queue."""

        if self._task_scheme == 'pure':
            # pure zmq scheme doesn't support dependencies
            msg = "Pure ZMQ scheduler doesn't support dependencies"
            if (follow or after):
                # hard fail on DAG dependencies
                raise RuntimeError(msg)
            if isinstance(f, dependent):
                # soft warn on functional dependencies
                warnings.warn(msg, RuntimeWarning)

        after = self._build_dependency(after)
        follow = self._build_dependency(follow)
        subheader = dict(after=after, follow=follow, timeout=timeout)
        bufs = ss.pack_apply_message(f,args,kwargs)
        content = dict(bound=bound)

        msg = self.session.send(self._task_socket, "apply_request",
                content=content, buffers=bufs, subheader=subheader)
        msg_id = msg['msg_id']
        self.outstanding.add(msg_id)
        self.history.append(msg_id)
        ar = AsyncResult(self, [msg_id], fname=f.__name__)
        if block:
            return ar.get()
        else:
            return ar

    def _apply_direct(self, f, args, kwargs, bound=True, block=None, targets=None):
        """The underlying method for applying functions to specific engines
        via the MUX queue."""

        queues,targets = self._build_targets(targets)

        subheader = {}
        content = dict(bound=bound)
        bufs = ss.pack_apply_message(f,args,kwargs)

        msg_ids = []
        for queue in queues:
            msg = self.session.send(self._mux_socket, "apply_request",
                    content=content, buffers=bufs,ident=queue, subheader=subheader)
            msg_id = msg['msg_id']
            self.outstanding.add(msg_id)
            self.history.append(msg_id)
            msg_ids.append(msg_id)
        ar = AsyncResult(self, msg_ids, fname=f.__name__)
        if block:
            return ar.get()
        else:
            return ar

    #--------------------------------------------------------------------------
    # Map and decorators
    #--------------------------------------------------------------------------

    def map(self, f, *sequences):
        """Parallel version of builtin `map`, using all our engines."""
        pf = ParallelFunction(self, f, block=self.block,
                        bound=True, targets='all')
        return pf.map(*sequences)

    def parallel(self, bound=True, targets='all', block=True):
        """Decorator for making a ParallelFunction."""
        return parallel(self, bound=bound, targets=targets, block=block)

    def remote(self, bound=True, targets='all', block=True):
        """Decorator for making a RemoteFunction."""
        return remote(self, bound=bound, targets=targets, block=block)

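`parallel` and `remote` are factory functions that capture call options and hand back a decorator. A serial toy stand-in (no engines or `ParallelFunction`; `map` just runs in-process) illustrates the pattern without claiming to match the real semantics:

```python
def parallel(block=True):
    """Toy stand-in for Client.parallel(): returns a decorator whose
    result carries a .map() method (here executed serially)."""
    def decorator(f):
        def pmap(*sequences):
            # the real ParallelFunction would scatter these across engines
            results = [f(*args) for args in zip(*sequences)]
            return results if block else iter(results)
        f.map = pmap
        return f
    return decorator

@parallel(block=True)
def add(a, b):
    return a + b

assert add.map([1, 2, 3], [10, 20, 30]) == [11, 22, 33]
```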
    #--------------------------------------------------------------------------
    # Data movement
    #--------------------------------------------------------------------------

    @defaultblock
    def push(self, ns, targets='all', block=None):
        """Push the contents of `ns` into the namespace on `target`"""
        if not isinstance(ns, dict):
            raise TypeError("Must be a dict, not %s"%type(ns))
        result = self.apply(_push, (ns,), targets=targets, block=block, bound=True)
        return result

    @defaultblock
    def pull(self, keys, targets='all', block=None):
        """Pull objects from `target`'s namespace by `keys`"""
        if isinstance(keys, str):
            pass
        elif isinstance(keys, (list,tuple,set)):
            for key in keys:
                if not isinstance(key, str):
                    raise TypeError
        result = self.apply(_pull, (keys,), targets=targets, block=block, bound=True)
        return result

    def scatter(self, key, seq, dist='b', flatten=False, targets='all', block=None):
        """
        Partition a Python sequence and send the partitions to a set of engines.
        """
        block = block if block is not None else self.block
        targets = self._build_targets(targets)[-1]
        mapObject = Map.dists[dist]()
        nparts = len(targets)
        msg_ids = []
        for index, engineid in enumerate(targets):
            partition = mapObject.getPartition(seq, index, nparts)
            if flatten and len(partition) == 1:
                r = self.push({key: partition[0]}, targets=engineid, block=False)
            else:
                r = self.push({key: partition}, targets=engineid, block=False)
            msg_ids.extend(r.msg_ids)
        r = AsyncResult(self, msg_ids, fname='scatter')
        if block:
            return r.get()
        else:
            return r

    def gather(self, key, dist='b', targets='all', block=None):
        """
        Gather a partitioned sequence on a set of engines as a single local seq.
        """
        block = block if block is not None else self.block

        targets = self._build_targets(targets)[-1]
        mapObject = Map.dists[dist]()
        msg_ids = []
        for index, engineid in enumerate(targets):
            msg_ids.extend(self.pull(key, targets=engineid,block=False).msg_ids)

        r = AsyncMapResult(self, msg_ids, mapObject, fname='gather')
        if block:
            return r.get()
        else:
            return r

|
1104 | #-------------------------------------------------------------------------- | |
1109 | # Query methods |
|
1105 | # Query methods | |
1110 | #-------------------------------------------------------------------------- |
|
1106 | #-------------------------------------------------------------------------- | |
1111 |
|
1107 | |||
1112 | @spinfirst |
|
1108 | @spinfirst | |
1113 | def get_results(self, msg_ids, status_only=False): |
|
1109 | def get_results(self, msg_ids, status_only=False): | |
1114 | """Returns the result of the execute or task request with `msg_ids`. |
|
1110 | """Returns the result of the execute or task request with `msg_ids`. | |
1115 |
|
1111 | |||
1116 | Parameters |
|
1112 | Parameters | |
1117 | ---------- |
|
1113 | ---------- | |
1118 | msg_ids : list of ints or msg_ids |
|
1114 | msg_ids : list of ints or msg_ids | |
1119 | if int: |
|
1115 | if int: | |
1120 | Passed as index to self.history for convenience. |
|
1116 | Passed as index to self.history for convenience. | |
1121 | status_only : bool (default: False) |
|
1117 | status_only : bool (default: False) | |
1122 | if False: |
|
1118 | if False: | |
1123 | return the actual results |
|
1119 | return the actual results | |
1124 |
|
1120 | |||
1125 | Returns |
|
1121 | Returns | |
1126 | ------- |
|
1122 | ------- | |
1127 |
|
1123 | |||
1128 | results : dict |
|
1124 | results : dict | |
1129 | There will always be the keys 'pending' and 'completed', which will |
|
1125 | There will always be the keys 'pending' and 'completed', which will | |
1130 | be lists of msg_ids. |
|
1126 | be lists of msg_ids. | |
1131 | """ |
|
1127 | """ | |
1132 | if not isinstance(msg_ids, (list,tuple)): |
|
1128 | if not isinstance(msg_ids, (list,tuple)): | |
1133 | msg_ids = [msg_ids] |
|
1129 | msg_ids = [msg_ids] | |
1134 | theids = [] |
|
1130 | theids = [] | |
1135 | for msg_id in msg_ids: |
|
1131 | for msg_id in msg_ids: | |
1136 | if isinstance(msg_id, int): |
|
1132 | if isinstance(msg_id, int): | |
1137 | msg_id = self.history[msg_id] |
|
1133 | msg_id = self.history[msg_id] | |
1138 | if not isinstance(msg_id, str): |
|
1134 | if not isinstance(msg_id, str): | |
1139 | raise TypeError("msg_ids must be str, not %r"%msg_id) |
|
1135 | raise TypeError("msg_ids must be str, not %r"%msg_id) | |
1140 | theids.append(msg_id) |
|
1136 | theids.append(msg_id) | |
1141 |
|
1137 | |||
1142 | completed = [] |
|
1138 | completed = [] | |
1143 | local_results = {} |
|
1139 | local_results = {} | |
1144 |
|
1140 | |||
1145 | # comment this block out to temporarily disable local shortcut: |
|
1141 | # comment this block out to temporarily disable local shortcut: | |
1146 | for msg_id in list(theids): |
|
1142 | for msg_id in list(theids): | |
1147 | if msg_id in self.results: |
|
1143 | if msg_id in self.results: | |
1148 | completed.append(msg_id) |
|
1144 | completed.append(msg_id) | |
1149 | local_results[msg_id] = self.results[msg_id] |
|
1145 | local_results[msg_id] = self.results[msg_id] | |
1150 | theids.remove(msg_id) |
|
1146 | theids.remove(msg_id) | |
1151 |
|
1147 | |||
1152 | if theids: # some not locally cached |
|
1148 | if theids: # some not locally cached | |
1153 | content = dict(msg_ids=theids, status_only=status_only) |
|
1149 | content = dict(msg_ids=theids, status_only=status_only) | |
1154 | msg = self.session.send(self._query_socket, "result_request", content=content) |
|
1150 | msg = self.session.send(self._query_socket, "result_request", content=content) | |
1155 | zmq.select([self._query_socket], [], []) |
|
1151 | zmq.select([self._query_socket], [], []) | |
1156 | idents,msg = self.session.recv(self._query_socket, zmq.NOBLOCK) |
|
1152 | idents,msg = self.session.recv(self._query_socket, zmq.NOBLOCK) | |
1157 | if self.debug: |
|
1153 | if self.debug: | |
1158 | pprint(msg) |
|
1154 | pprint(msg) | |
1159 | content = msg['content'] |
|
1155 | content = msg['content'] | |
1160 | if content['status'] != 'ok': |
|
1156 | if content['status'] != 'ok': | |
1161 | raise ss.unwrap_exception(content) |
|
1157 | raise ss.unwrap_exception(content) | |
1162 | buffers = msg['buffers'] |
|
1158 | buffers = msg['buffers'] | |
1163 | else: |
|
1159 | else: | |
1164 | content = dict(completed=[],pending=[]) |
|
1160 | content = dict(completed=[],pending=[]) | |
1165 |
|
1161 | |||
1166 | content['completed'].extend(completed) |
|
1162 | content['completed'].extend(completed) | |
1167 |
|
1163 | |||
1168 | if status_only: |
|
1164 | if status_only: | |
1169 | return content |
|
1165 | return content | |
1170 |
|
1166 | |||
1171 | failures = [] |
|
1167 | failures = [] | |
1172 | # load cached results into result: |
|
1168 | # load cached results into result: | |
1173 | content.update(local_results) |
|
1169 | content.update(local_results) | |
1174 | # update cache with results: |
|
1170 | # update cache with results: | |
1175 | for msg_id in sorted(theids): |
|
1171 | for msg_id in sorted(theids): | |
1176 | if msg_id in content['completed']: |
|
1172 | if msg_id in content['completed']: | |
1177 | rec = content[msg_id] |
|
1173 | rec = content[msg_id] | |
1178 | parent = rec['header'] |
|
1174 | parent = rec['header'] | |
1179 | header = rec['result_header'] |
|
1175 | header = rec['result_header'] | |
1180 | rcontent = rec['result_content'] |
|
1176 | rcontent = rec['result_content'] | |
1181 | iodict = rec['io'] |
|
1177 | iodict = rec['io'] | |
1182 | if isinstance(rcontent, str): |
|
1178 | if isinstance(rcontent, str): | |
1183 | rcontent = self.session.unpack(rcontent) |
|
1179 | rcontent = self.session.unpack(rcontent) | |
1184 |
|
1180 | |||
1185 | md = self.metadata.setdefault(msg_id, Metadata()) |
|
1181 | md = self.metadata.setdefault(msg_id, Metadata()) | |
1186 | md.update(self._extract_metadata(header, parent, rcontent)) |
|
1182 | md.update(self._extract_metadata(header, parent, rcontent)) | |
1187 | md.update(iodict) |
|
1183 | md.update(iodict) | |
1188 |
|
1184 | |||
1189 | if rcontent['status'] == 'ok': |
|
1185 | if rcontent['status'] == 'ok': | |
1190 | res,buffers = ss.unserialize_object(buffers) |
|
1186 | res,buffers = ss.unserialize_object(buffers) | |
1191 | else: |
|
1187 | else: | |
1192 | res = ss.unwrap_exception(rcontent) |
|
1188 | res = ss.unwrap_exception(rcontent) | |
1193 | failures.append(res) |
|
1189 | failures.append(res) | |
1194 |
|
1190 | |||
1195 | self.results[msg_id] = res |
|
1191 | self.results[msg_id] = res | |
1196 | content[msg_id] = res |
|
1192 | content[msg_id] = res | |
1197 |
|
1193 | |||
1198 | error.collect_exceptions(failures, "get_results") |
|
1194 | error.collect_exceptions(failures, "get_results") | |
1199 | return content |
|
1195 | return content | |
1200 |
|
1196 | |||
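The local shortcut in `get_results` above answers any msg_id already present in `self.results` without a controller round-trip. A standalone sketch of that partition step, with illustrative names (not the client's actual API):

```python
def split_cached(msg_ids, cache):
    """Partition requested ids into locally cached results and ids
    that would still need a result_request to the controller."""
    local = {}
    remaining = []
    for msg_id in msg_ids:
        if msg_id in cache:
            local[msg_id] = cache[msg_id]
        else:
            remaining.append(msg_id)
    return local, remaining

# ids 'a' and 'c' are cached; only 'b' would go over the wire
local, remaining = split_cached(['a', 'b', 'c'], {'a': 1, 'c': 3})
```

Only `remaining` is sent in the `result_request`; the cached entries are merged back into the reply, exactly as the method does with `content.update(local_results)`.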
1201 | @spinfirst |
|
1197 | @spinfirst | |
1202 | def queue_status(self, targets=None, verbose=False): |
|
1198 | def queue_status(self, targets=None, verbose=False): | |
1203 | """Fetch the status of engine queues. |
|
1199 | """Fetch the status of engine queues. | |
1204 |
|
1200 | |||
1205 | Parameters |
|
1201 | Parameters | |
1206 | ---------- |
|
1202 | ---------- | |
1207 | targets : int/str/list of ints/strs |
|
1203 | targets : int/str/list of ints/strs | |
1208 | the engines on which to execute |
|
1204 | the engines on which to execute | |
1209 | default : all |
|
1205 | default : all | |
1210 | verbose : bool |
|
1206 | verbose : bool | |
1211 | Whether to return lengths only, or lists of ids for each element |
|
1207 | Whether to return lengths only, or lists of ids for each element | |
1212 | """ |
|
1208 | """ | |
1213 | targets = self._build_targets(targets)[1] |
|
1209 | targets = self._build_targets(targets)[1] | |
1214 | content = dict(targets=targets, verbose=verbose) |
|
1210 | content = dict(targets=targets, verbose=verbose) | |
1215 | self.session.send(self._query_socket, "queue_request", content=content) |
|
1211 | self.session.send(self._query_socket, "queue_request", content=content) | |
1216 | idents,msg = self.session.recv(self._query_socket, 0) |
|
1212 | idents,msg = self.session.recv(self._query_socket, 0) | |
1217 | if self.debug: |
|
1213 | if self.debug: | |
1218 | pprint(msg) |
|
1214 | pprint(msg) | |
1219 | content = msg['content'] |
|
1215 | content = msg['content'] | |
1220 | status = content.pop('status') |
|
1216 | status = content.pop('status') | |
1221 | if status != 'ok': |
|
1217 | if status != 'ok': | |
1222 | raise ss.unwrap_exception(content) |
|
1218 | raise ss.unwrap_exception(content) | |
1223 | return ss.rekey(content) |
|
1219 | return ss.rekey(content) | |
1224 |
|
1220 | |||
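`queue_status` returns `ss.rekey(content)`. The reply travels as JSON, whose object keys are always strings, so engine ids arrive as `'0'`, `'1'`, and so on; a rekey step restores integer keys. A plausible sketch of such a helper (assumed behaviour, not the actual streamsession code):

```python
def rekey(d):
    """Return a copy of d with keys that parse as integers cast to
    int; other keys (e.g. 'unassigned') are kept as-is."""
    out = {}
    for key, value in d.items():
        try:
            out[int(key)] = value
        except (TypeError, ValueError):
            out[key] = value
    return out

status = rekey({'0': {'queue': 2}, '1': {'queue': 0}, 'unassigned': 5})
```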
1225 | @spinfirst |
|
1221 | @spinfirst | |
1226 | def purge_results(self, msg_ids=[], targets=[]): |
|
1222 | def purge_results(self, msg_ids=[], targets=[]): | |
1227 | """Tell the controller to forget results. |
|
1223 | """Tell the controller to forget results. | |
1228 |
|
1224 | |||
1229 | Individual results can be purged by msg_id, or the entire |
|
1225 | Individual results can be purged by msg_id, or the entire | |
1230 | history of specific targets can be purged. |
|
1226 | history of specific targets can be purged. | |
1231 |
|
1227 | |||
1232 | Parameters |
|
1228 | Parameters | |
1233 | ---------- |
|
1229 | ---------- | |
1234 | msg_ids : str or list of strs |
|
1230 | msg_ids : str or list of strs | |
1235 | the msg_ids whose results should be forgotten. |
|
1231 | the msg_ids whose results should be forgotten. | |
1236 | targets : int/str/list of ints/strs |
|
1232 | targets : int/str/list of ints/strs | |
1237 | The targets, by uuid or int_id, whose entire history is to be purged. |
|
1233 | The targets, by uuid or int_id, whose entire history is to be purged. | |
1238 | Use `targets='all'` to scrub everything from the controller's memory. |
|
1234 | Use `targets='all'` to scrub everything from the controller's memory. | |
1239 |
|
1235 | |||
1240 | default : None |
|
1236 | default : None | |
1241 | """ |
|
1237 | """ | |
1242 | if not targets and not msg_ids: |
|
1238 | if not targets and not msg_ids: | |
1243 | raise ValueError |
|
1239 | raise ValueError | |
1244 | if targets: |
|
1240 | if targets: | |
1245 | targets = self._build_targets(targets)[1] |
|
1241 | targets = self._build_targets(targets)[1] | |
1246 | content = dict(targets=targets, msg_ids=msg_ids) |
|
1242 | content = dict(targets=targets, msg_ids=msg_ids) | |
1247 | self.session.send(self._query_socket, "purge_request", content=content) |
|
1243 | self.session.send(self._query_socket, "purge_request", content=content) | |
1248 | idents, msg = self.session.recv(self._query_socket, 0) |
|
1244 | idents, msg = self.session.recv(self._query_socket, 0) | |
1249 | if self.debug: |
|
1245 | if self.debug: | |
1250 | pprint(msg) |
|
1246 | pprint(msg) | |
1251 | content = msg['content'] |
|
1247 | content = msg['content'] | |
1252 | if content['status'] != 'ok': |
|
1248 | if content['status'] != 'ok': | |
1253 | raise ss.unwrap_exception(content) |
|
1249 | raise ss.unwrap_exception(content) | |
1254 |
|
1250 | |||
1255 | #---------------------------------------- |
|
1251 | #---------------------------------------- | |
1256 | # activate for %px,%autopx magics |
|
1252 | # activate for %px,%autopx magics | |
1257 | #---------------------------------------- |
|
1253 | #---------------------------------------- | |
1258 | def activate(self): |
|
1254 | def activate(self): | |
1259 | """Make this `View` active for parallel magic commands. |
|
1255 | """Make this `View` active for parallel magic commands. | |
1260 |
|
1256 | |||
1261 | IPython has a magic command syntax to work with `MultiEngineClient` objects. |
|
1257 | IPython has a magic command syntax to work with `MultiEngineClient` objects. | |
1262 | In a given IPython session there is a single active one. While |
|
1258 | In a given IPython session there is a single active one. While | |
1263 | there can be many `Views` created and used by the user, |
|
1259 | there can be many `Views` created and used by the user, | |
1264 | there is only one active one. The active `View` is used whenever |
|
1260 | there is only one active one. The active `View` is used whenever | |
1265 | the magic commands %px and %autopx are used. |
|
1261 | the magic commands %px and %autopx are used. | |
1266 |
|
1262 | |||
1267 | The activate() method is called on a given `View` to make it |
|
1263 | The activate() method is called on a given `View` to make it | |
1268 | active. Once this has been done, the magic commands can be used. |
|
1264 | active. Once this has been done, the magic commands can be used. | |
1269 | """ |
|
1265 | """ | |
1270 |
|
1266 | |||
1271 | try: |
|
1267 | try: | |
1272 | # This is injected into __builtins__. |
|
1268 | # This is injected into __builtins__. | |
1273 | ip = get_ipython() |
|
1269 | ip = get_ipython() | |
1274 | except NameError: |
|
1270 | except NameError: | |
1275 | print "The IPython parallel magics (%result, %px, %autopx) only work within IPython." |
|
1271 | print "The IPython parallel magics (%result, %px, %autopx) only work within IPython." | |
1276 | else: |
|
1272 | else: | |
1277 | pmagic = ip.plugin_manager.get_plugin('parallelmagic') |
|
1273 | pmagic = ip.plugin_manager.get_plugin('parallelmagic') | |
1278 | if pmagic is not None: |
|
1274 | if pmagic is not None: | |
1279 | pmagic.active_multiengine_client = self |
|
1275 | pmagic.active_multiengine_client = self | |
1280 | else: |
|
1276 | else: | |
1281 | print "You must first load the parallelmagic extension " \ |
|
1277 | print "You must first load the parallelmagic extension " \ | |
1282 | "by doing '%load_ext parallelmagic'" |
|
1278 | "by doing '%load_ext parallelmagic'" | |
1283 |
|
1279 | |||
1284 | class AsynClient(Client): |
|
1280 | class AsynClient(Client): | |
1285 | """An Asynchronous client, using the Tornado Event Loop. |
|
1281 | """An Asynchronous client, using the Tornado Event Loop. | |
1286 | !!!unfinished!!!""" |
|
1282 | !!!unfinished!!!""" | |
1287 | io_loop = None |
|
1283 | io_loop = None | |
1288 | _queue_stream = None |
|
1284 | _queue_stream = None | |
1289 | _notifier_stream = None |
|
1285 | _notifier_stream = None | |
1290 | _task_stream = None |
|
1286 | _task_stream = None | |
1291 | _control_stream = None |
|
1287 | _control_stream = None | |
1292 |
|
1288 | |||
1293 | def __init__(self, addr, context=None, username=None, debug=False, io_loop=None): |
|
1289 | def __init__(self, addr, context=None, username=None, debug=False, io_loop=None): | |
1294 | Client.__init__(self, addr, context, username, debug) |
|
1290 | Client.__init__(self, addr, context, username, debug) | |
1295 | if io_loop is None: |
|
1291 | if io_loop is None: | |
1296 | io_loop = ioloop.IOLoop.instance() |
|
1292 | io_loop = ioloop.IOLoop.instance() | |
1297 | self.io_loop = io_loop |
|
1293 | self.io_loop = io_loop | |
1298 |
|
1294 | |||
1299 | self._queue_stream = zmqstream.ZMQStream(self._mux_socket, io_loop) |
|
1295 | self._queue_stream = zmqstream.ZMQStream(self._mux_socket, io_loop) | |
1300 | self._control_stream = zmqstream.ZMQStream(self._control_socket, io_loop) |
|
1296 | self._control_stream = zmqstream.ZMQStream(self._control_socket, io_loop) | |
1301 | self._task_stream = zmqstream.ZMQStream(self._task_socket, io_loop) |
|
1297 | self._task_stream = zmqstream.ZMQStream(self._task_socket, io_loop) | |
1302 | self._notification_stream = zmqstream.ZMQStream(self._notification_socket, io_loop) |
|
1298 | self._notification_stream = zmqstream.ZMQStream(self._notification_socket, io_loop) | |
1303 |
|
1299 | |||
1304 | def spin(self): |
|
1300 | def spin(self): | |
1305 | for stream in (self.queue_stream, self.notifier_stream, |
|
1301 | for stream in (self.queue_stream, self.notifier_stream, | |
1306 | self.task_stream, self.control_stream): |
|
1302 | self.task_stream, self.control_stream): | |
1307 | stream.flush() |
|
1303 | stream.flush() | |
1308 |
|
1304 | |||
1309 | __all__ = [ 'Client', |
|
1305 | __all__ = [ 'Client', | |
1310 | 'depend', |
|
1306 | 'depend', | |
1311 | 'require', |
|
1307 | 'require', | |
1312 | 'remote', |
|
1308 | 'remote', | |
1313 | 'parallel', |
|
1309 | 'parallel', | |
1314 | 'RemoteFunction', |
|
1310 | 'RemoteFunction', | |
1315 | 'ParallelFunction', |
|
1311 | 'ParallelFunction', | |
1316 | 'DirectView', |
|
1312 | 'DirectView', | |
1317 | 'LoadBalancedView', |
|
1313 | 'LoadBalancedView', | |
1318 | 'AsyncResult', |
|
1314 | 'AsyncResult', | |
1319 | 'AsyncMapResult' |
|
1315 | 'AsyncMapResult' | |
1320 | ] |
|
1316 | ] |
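The `gather` path in the client above pulls one chunk per engine and re-joins them with a `Map.dists[dist]()` object. A toy sketch of a block distribution — contiguous near-equal chunks re-joined by concatenation (class name hypothetical; the real implementations live in the `Map` module):

```python
class BlockMap:
    """Toy block distribution: contiguous, near-equal chunks."""

    def partition(self, seq, n):
        # the first len(seq) % n chunks get one extra element
        q, r = divmod(len(seq), n)
        chunks, start = [], 0
        for i in range(n):
            size = q + (1 if i < r else 0)
            chunks.append(seq[start:start + size])
            start += size
        return chunks

    def join(self, chunks):
        joined = []
        for chunk in chunks:
            joined.extend(chunk)
        return joined

m = BlockMap()
chunks = m.partition(list(range(10)), 3)   # one chunk per engine
```

Scatter pushes one chunk per engine; gather reverses it by pulling each engine's chunk and calling `join`, which is why `gather` issues one `pull` per target before building the `AsyncMapResult`.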
@@ -1,111 +1,110 b'' | |||||
1 | """Dependency utilities""" |
|
1 | """Dependency utilities""" | |
2 |
|
2 | |||
3 | from IPython.external.decorator import decorator |
|
3 | from IPython.external.decorator import decorator | |
4 | from error import UnmetDependency |
|
4 | from error import UnmetDependency | |
5 |
|
5 | from asyncresult import AsyncResult | ||
6 |
|
||||
7 | # flags |
|
|||
8 | ALL = 1 << 0 |
|
|||
9 | ANY = 1 << 1 |
|
|||
10 | HERE = 1 << 2 |
|
|||
11 | ANYWHERE = 1 << 3 |
|
|||
12 |
|
6 | |||
13 |
|
7 | |||
14 | class depend(object): |
|
8 | class depend(object): | |
15 | """Dependency decorator, for use with tasks.""" |
|
9 | """Dependency decorator, for use with tasks.""" | |
16 | def __init__(self, f, *args, **kwargs): |
|
10 | def __init__(self, f, *args, **kwargs): | |
17 | self.f = f |
|
11 | self.f = f | |
18 | self.args = args |
|
12 | self.args = args | |
19 | self.kwargs = kwargs |
|
13 | self.kwargs = kwargs | |
20 |
|
14 | |||
21 | def __call__(self, f): |
|
15 | def __call__(self, f): | |
22 | return dependent(f, self.f, *self.args, **self.kwargs) |
|
16 | return dependent(f, self.f, *self.args, **self.kwargs) | |
23 |
|
17 | |||
24 | class dependent(object): |
|
18 | class dependent(object): | |
25 | """A function that depends on another function. |
|
19 | """A function that depends on another function. | |
26 | This is an object to prevent the closure used |
|
20 | This is an object to prevent the closure used | |
27 | in traditional decorators, which are not picklable. |
|
21 | in traditional decorators, which are not picklable. | |
28 | """ |
|
22 | """ | |
29 |
|
23 | |||
30 | def __init__(self, f, df, *dargs, **dkwargs): |
|
24 | def __init__(self, f, df, *dargs, **dkwargs): | |
31 | self.f = f |
|
25 | self.f = f | |
32 | self.func_name = getattr(f, '__name__', 'f') |
|
26 | self.func_name = getattr(f, '__name__', 'f') | |
33 | self.df = df |
|
27 | self.df = df | |
34 | self.dargs = dargs |
|
28 | self.dargs = dargs | |
35 | self.dkwargs = dkwargs |
|
29 | self.dkwargs = dkwargs | |
36 |
|
30 | |||
37 | def __call__(self, *args, **kwargs): |
|
31 | def __call__(self, *args, **kwargs): | |
38 | if self.df(*self.dargs, **self.dkwargs) is False: |
|
32 | if self.df(*self.dargs, **self.dkwargs) is False: | |
39 | raise UnmetDependency() |
|
33 | raise UnmetDependency() | |
40 | return self.f(*args, **kwargs) |
|
34 | return self.f(*args, **kwargs) | |
41 |
|
35 | |||
42 | @property |
|
36 | @property | |
43 | def __name__(self): |
|
37 | def __name__(self): | |
44 | return self.func_name |
|
38 | return self.func_name | |
45 |
|
39 | |||
46 | def _require(*names): |
|
40 | def _require(*names): | |
47 | for name in names: |
|
41 | for name in names: | |
48 | try: |
|
42 | try: | |
49 | __import__(name) |
|
43 | __import__(name) | |
50 | except ImportError: |
|
44 | except ImportError: | |
51 | return False |
|
45 | return False | |
52 | return True |
|
46 | return True | |
53 |
|
47 | |||
54 | def require(*names): |
|
48 | def require(*names): | |
55 | return depend(_require, *names) |
|
49 | return depend(_require, *names) | |
56 |
|
50 | |||
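`require(*names)` wraps `_require`, which simply attempts each import on the engine and reports success. The check is easy to exercise on its own:

```python
def importable(*names):
    """Return True only if every named module imports cleanly --
    the same test _require performs on the engine."""
    for name in names:
        try:
            __import__(name)
        except ImportError:
            return False
    return True
```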
57 | class Dependency(set): |
|
51 | class Dependency(set): | |
58 | """An object for representing a set of msg_id dependencies. |
|
52 | """An object for representing a set of msg_id dependencies. | |
59 |
|
53 | |||
60 | Subclassed from set().""" |
|
54 | Subclassed from set().""" | |
61 |
|
55 | |||
62 | mode='all' |
|
56 | all=True | |
63 | success_only=True |
|
57 | success_only=True | |
64 |
|
58 | |||
65 | def __init__(self, dependencies=[], mode='all', success_only=True): |
|
59 | def __init__(self, dependencies=[], all=True, success_only=True): | |
66 | if isinstance(dependencies, dict): |
|
60 | if isinstance(dependencies, dict): | |
67 | # load from dict |
|
61 | # load from dict | |
68 | mode = dependencies.get('mode', mode) |
|
62 | all = dependencies.get('all', True) | |
69 | success_only = dependencies.get('success_only', success_only) |
|
63 | success_only = dependencies.get('success_only', success_only) | |
70 | dependencies = dependencies.get('dependencies', []) |
|
64 | dependencies = dependencies.get('dependencies', []) | |
71 | set.__init__(self, dependencies) |
|
65 | ids = [] | |
72 | self.mode = mode.lower() |
|
66 | if isinstance(dependencies, AsyncResult): | |
|
67 | ids.extend(dependencies.msg_ids) |
|
68 | else: | |||
|
69 | for d in dependencies: | |||
|
70 | if isinstance(d, basestring): | |||
|
71 | ids.append(d) | |||
|
72 | elif isinstance(d, AsyncResult): | |||
|
73 | ids.extend(d.msg_ids) | |||
|
74 | else: | |||
|
75 | raise TypeError("invalid dependency type: %r"%type(d)) | |||
|
76 | set.__init__(self, ids) | |||
|
77 | self.all = all | |||
73 | self.success_only=success_only |
|
78 | self.success_only=success_only | |
74 | if self.mode not in ('any', 'all'): |
|
|||
75 | raise NotImplementedError("Only any|all supported, not %r"%mode) |
|
|||
76 |
|
79 | |||
77 | def check(self, completed, failed=None): |
|
80 | def check(self, completed, failed=None): | |
78 | if failed is not None and not self.success_only: |
|
81 | if failed is not None and not self.success_only: | |
79 | completed = completed.union(failed) |
|
82 | completed = completed.union(failed) | |
80 | if len(self) == 0: |
|
83 | if len(self) == 0: | |
81 | return True |
|
84 | return True | |
82 | if self.mode == 'all': |
|
85 | if self.all: | |
83 | return self.issubset(completed) |
|
86 | return self.issubset(completed) | |
84 | elif self.mode == 'any': |
|
|||
85 | return not self.isdisjoint(completed) |
|
|||
86 | else: |
|
87 | else: | |
87 | raise NotImplementedError("Only any|all supported, not %r"%mode) |
|
88 | return not self.isdisjoint(completed) | |
88 |
|
89 | |||
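The `check` method above is plain set algebra: in all-mode every dependency must appear in the completed set (`issubset`), any-mode needs just one (`not isdisjoint`), and with `success_only` off the failed set counts as done too. A standalone sketch of the predicate (names illustrative):

```python
def deps_met(deps, completed, failed=frozenset(),
             require_all=True, success_only=True):
    """Set-algebra core of a Dependency-style check."""
    deps = set(deps)
    if not success_only:
        # failed results still satisfy the dependency
        completed = set(completed) | set(failed)
    if not deps:
        return True                        # nothing to wait for
    if require_all:
        return deps.issubset(completed)    # every dependency finished
    return not deps.isdisjoint(completed)  # at least one finished
```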
89 | def unreachable(self, failed): |
|
90 | def unreachable(self, failed): | |
90 | if len(self) == 0 or len(failed) == 0 or not self.success_only: |
|
91 | if len(self) == 0 or len(failed) == 0 or not self.success_only: | |
91 | return False |
|
92 | return False | |
92 | print self, self.success_only, self.mode, failed |
|
93 | # print self, self.success_only, self.all, failed | |
93 | if self.mode == 'all': |
|
94 | if self.all: | |
94 | return not self.isdisjoint(failed) |
|
95 | return not self.isdisjoint(failed) | |
95 | elif self.mode == 'any': |
|
|||
96 | return self.issubset(failed) |
|
|||
97 | else: |
|
96 | else: | |
98 | raise NotImplementedError("Only any|all supported, not %r"%mode) |
|
97 | return self.issubset(failed) | |
99 |
|
98 | |||
100 |
|
99 | |||
101 | def as_dict(self): |
|
100 | def as_dict(self): | |
102 | """Represent this dependency as a dict. For json compatibility.""" |
|
101 | """Represent this dependency as a dict. For json compatibility.""" | |
103 | return dict( |
|
102 | return dict( | |
104 | dependencies=list(self), |
|
103 | dependencies=list(self), | |
105 |
|
|
104 | all=self.all, | |
106 | success_only=self.success_only, |
|
105 | success_only=self.success_only, | |
107 | ) |
|
106 | ) | |
108 |
|
107 | |||
109 |
|
108 | |||
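`as_dict` exists because a `set` is not JSON-serializable; `list(self)` makes the msg_ids transportable, and the receiving side can rebuild the set. A quick round trip with plain dicts (not the `Dependency` class itself):

```python
import json

dep = {"dependencies": sorted({"msg-1", "msg-2"}),  # set -> list for JSON
       "all": True,
       "success_only": True}
restored = json.loads(json.dumps(dep))
rebuilt = set(restored["dependencies"])    # list -> set on the far side
```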
110 | __all__ = ['depend', 'require', 'Dependency'] |
|
109 | __all__ = ['depend', 'require', 'dependent', 'Dependency'] | |
111 |
|
110 |
@@ -1,292 +1,295 b'' | |||||
1 | # encoding: utf-8 |
|
1 | # encoding: utf-8 | |
2 |
|
2 | |||
3 | """Classes and functions for kernel related errors and exceptions.""" |
|
3 | """Classes and functions for kernel related errors and exceptions.""" | |
4 | from __future__ import print_function |
|
4 | from __future__ import print_function | |
5 |
|
5 | |||
6 | __docformat__ = "restructuredtext en" |
|
6 | __docformat__ = "restructuredtext en" | |
7 |
|
7 | |||
8 | # Tell nose to skip this module |
|
8 | # Tell nose to skip this module | |
9 | __test__ = {} |
|
9 | __test__ = {} | |
10 |
|
10 | |||
11 | #------------------------------------------------------------------------------- |
|
11 | #------------------------------------------------------------------------------- | |
12 | # Copyright (C) 2008 The IPython Development Team |
|
12 | # Copyright (C) 2008 The IPython Development Team | |
13 | # |
|
13 | # | |
14 | # Distributed under the terms of the BSD License. The full license is in |
|
14 | # Distributed under the terms of the BSD License. The full license is in | |
15 | # the file COPYING, distributed as part of this software. |
|
15 | # the file COPYING, distributed as part of this software. | |
16 | #------------------------------------------------------------------------------- |
|
16 | #------------------------------------------------------------------------------- | |
17 |
|
17 | |||
18 | #------------------------------------------------------------------------------- |
|
18 | #------------------------------------------------------------------------------- | |
19 | # Error classes |
|
19 | # Error classes | |
20 | #------------------------------------------------------------------------------- |
|
20 | #------------------------------------------------------------------------------- | |
21 | class IPythonError(Exception): |
|
21 | class IPythonError(Exception): | |
22 | """Base exception that all of our exceptions inherit from. |
|
22 | """Base exception that all of our exceptions inherit from. | |
23 |
|
23 | |||
24 | This can be raised by code that doesn't have any more specific |
|
24 | This can be raised by code that doesn't have any more specific | |
25 | information.""" |
|
25 | information.""" | |
26 |
|
26 | |||
27 | pass |
|
27 | pass | |
28 |
|
28 | |||
29 | # Exceptions associated with the controller objects |
|
29 | # Exceptions associated with the controller objects | |
30 | class ControllerError(IPythonError): pass |
|
30 | class ControllerError(IPythonError): pass | |
31 |
|
31 | |||
32 | class ControllerCreationError(ControllerError): pass |
|
32 | class ControllerCreationError(ControllerError): pass | |
33 |
|
33 | |||
34 |
|
34 | |||
35 | # Exceptions associated with the Engines |
|
35 | # Exceptions associated with the Engines | |
36 | class EngineError(IPythonError): pass |
|
36 | class EngineError(IPythonError): pass | |
37 |
|
37 | |||
38 | class EngineCreationError(EngineError): pass |
|
38 | class EngineCreationError(EngineError): pass | |
39 |
|
39 | |||
40 | class KernelError(IPythonError): |
|
40 | class KernelError(IPythonError): | |
41 | pass |
|
41 | pass | |
42 |
|
42 | |||
43 | class NotDefined(KernelError): |
|
43 | class NotDefined(KernelError): | |
44 | def __init__(self, name): |
|
44 | def __init__(self, name): | |
45 | self.name = name |
|
45 | self.name = name | |
46 | self.args = (name,) |
|
46 | self.args = (name,) | |
47 |
|
47 | |||
48 | def __repr__(self): |
|
48 | def __repr__(self): | |
49 | return '<NotDefined: %s>' % self.name |
|
49 | return '<NotDefined: %s>' % self.name | |
50 |
|
50 | |||
51 | __str__ = __repr__ |
|
51 | __str__ = __repr__ | |
52 |
|
52 | |||
53 |
|
53 | |||
54 | class QueueCleared(KernelError): |
|
54 | class QueueCleared(KernelError): | |
55 | pass |
|
55 | pass | |
56 |
|
56 | |||
57 |
|
57 | |||
58 | class IdInUse(KernelError): |
|
58 | class IdInUse(KernelError): | |
59 | pass |
|
59 | pass | |
60 |
|
60 | |||
61 |
|
61 | |||
62 | class ProtocolError(KernelError): |
|
62 | class ProtocolError(KernelError): | |
63 | pass |
|
63 | pass | |
64 |
|
64 | |||
65 |
|
65 | |||
66 | class ConnectionError(KernelError): |
|
66 | class ConnectionError(KernelError): | |
67 | pass |
|
67 | pass | |
68 |
|
68 | |||
69 |
|
69 | |||
70 | class InvalidEngineID(KernelError): |
|
70 | class InvalidEngineID(KernelError): | |
71 | pass |
|
71 | pass | |
72 |
|
72 | |||
73 |
|
73 | |||
74 | class NoEnginesRegistered(KernelError): |
|
74 | class NoEnginesRegistered(KernelError): | |
75 | pass |
|
75 | pass | |
76 |
|
76 | |||
77 |
|
77 | |||
78 | class InvalidClientID(KernelError): |
|
78 | class InvalidClientID(KernelError): | |
79 | pass |
|
79 | pass | |
80 |
|
80 | |||
81 |
|
81 | |||
82 | class InvalidDeferredID(KernelError): |
|
82 | class InvalidDeferredID(KernelError): | |
83 | pass |
|
83 | pass | |
84 |
|
84 | |||
85 |
|
85 | |||
86 | class SerializationError(KernelError): |
|
86 | class SerializationError(KernelError): | |
87 | pass |
|
87 | pass | |
88 |
|
88 | |||
89 |
|
89 | |||
90 | class MessageSizeError(KernelError): |
|
90 | class MessageSizeError(KernelError): | |
91 | pass |
|
91 | pass | |
92 |
|
92 | |||
93 |
|
93 | |||
94 | class PBMessageSizeError(MessageSizeError): |
|
94 | class PBMessageSizeError(MessageSizeError): | |
95 | pass |
|
95 | pass | |
96 |
|
96 | |||
97 |
|
97 | |||
98 | class ResultNotCompleted(KernelError): |
|
98 | class ResultNotCompleted(KernelError): | |
99 | pass |
|
99 | pass | |
100 |
|
100 | |||
101 |
|
101 | |||
102 | class ResultAlreadyRetrieved(KernelError): |
|
102 | class ResultAlreadyRetrieved(KernelError): | |
103 | pass |
|
103 | pass | |
104 |
|
104 | |||
105 | class ClientError(KernelError): |
|
105 | class ClientError(KernelError): | |
106 | pass |
|
106 | pass | |
107 |
|
107 | |||
108 |
|
108 | |||
109 | class TaskAborted(KernelError): |
|
109 | class TaskAborted(KernelError): | |
110 | pass |
|
110 | pass | |
111 |
|
111 | |||
112 |
|
112 | |||
113 | class TaskTimeout(KernelError): |
|
113 | class TaskTimeout(KernelError): | |
114 | pass |
|
114 | pass | |
115 |
|
115 | |||
116 |
|
116 | |||
117 | class NotAPendingResult(KernelError): |
|
117 | class NotAPendingResult(KernelError): | |
118 | pass |
|
118 | pass | |
119 |
|
119 | |||
120 |
|
120 | |||
121 | class UnpickleableException(KernelError): |
|
121 | class UnpickleableException(KernelError): | |
122 | pass |
|
122 | pass | |
123 |
|
123 | |||
124 |
|
124 | |||
125 | class AbortedPendingDeferredError(KernelError): |
|
125 | class AbortedPendingDeferredError(KernelError): | |
126 | pass |
|
126 | pass | |
127 |
|
127 | |||
128 |
|
128 | |||
129 | class InvalidProperty(KernelError): |
|
129 | class InvalidProperty(KernelError): | |
130 | pass |
|
130 | pass | |
131 |
|
131 | |||
132 |
|
132 | |||
133 | class MissingBlockArgument(KernelError): |
|
133 | class MissingBlockArgument(KernelError): | |
134 | pass |
|
134 | pass | |
135 |
|
135 | |||
136 |
|
136 | |||
137 | class StopLocalExecution(KernelError): |
|
137 | class StopLocalExecution(KernelError): | |
138 | pass |
|
138 | pass | |
139 |
|
139 | |||
140 |
|
140 | |||
141 | class SecurityError(KernelError): |
|
141 | class SecurityError(KernelError): | |
142 | pass |
|
142 | pass | |
143 |
|
143 | |||
144 |
|
144 | |||
145 | class FileTimeoutError(KernelError): |
|
145 | class FileTimeoutError(KernelError): | |
146 | pass |
|
146 | pass | |
147 |
|
147 | |||
148 | class TimeoutError(KernelError): |
|
148 | class TimeoutError(KernelError): | |
149 | pass |
|
149 | pass | |
150 |
|
150 | |||
151 | class UnmetDependency(KernelError): |
|
151 | class UnmetDependency(KernelError): | |
152 | pass |
|
152 | pass | |
153 |
|
153 | |||
154 | class ImpossibleDependency(UnmetDependency): |
|
154 | class ImpossibleDependency(UnmetDependency): | |
155 | pass |
|
155 | pass | |
156 |
|
156 | |||
157 | class DependencyTimeout(UnmetDependency): |
|
157 | class DependencyTimeout(ImpossibleDependency): | |
|
158 | pass | |||
|
159 | ||||
|
160 | class InvalidDependency(ImpossibleDependency): | |||
158 | pass |
|
161 | pass | |
159 |
|
162 | |||
class RemoteError(KernelError):
    """Error raised elsewhere"""
    ename = None
    evalue = None
    traceback = None
    engine_info = None

    def __init__(self, ename, evalue, traceback, engine_info=None):
        self.ename = ename
        self.evalue = evalue
        self.traceback = traceback
        self.engine_info = engine_info or {}
        self.args = (ename, evalue)

    def __repr__(self):
        engineid = self.engine_info.get('engineid', ' ')
        return "<Remote[%s]:%s(%s)>" % (engineid, self.ename, self.evalue)

    def __str__(self):
        sig = "%s(%s)" % (self.ename, self.evalue)
        if self.traceback:
            return sig + '\n' + self.traceback
        else:
            return sig
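The ename/evalue/traceback triple carried by `RemoteError` can be exercised in isolation. This is a minimal re-creation of the pattern for illustration, with a stand-in `KernelError`; it is not IPython's actual class:

```python
# Minimal re-creation of the RemoteError pattern (stand-in classes,
# not IPython's real ones).
class KernelError(Exception):
    pass

class RemoteError(KernelError):
    """Error raised elsewhere"""
    def __init__(self, ename, evalue, traceback, engine_info=None):
        self.ename = ename
        self.evalue = evalue
        self.traceback = traceback
        self.engine_info = engine_info or {}
        self.args = (ename, evalue)

    def __repr__(self):
        engineid = self.engine_info.get('engineid', ' ')
        return "<Remote[%s]:%s(%s)>" % (engineid, self.ename, self.evalue)

    def __str__(self):
        sig = "%s(%s)" % (self.ename, self.evalue)
        return sig + '\n' + self.traceback if self.traceback else sig

err = RemoteError('ZeroDivisionError', 'division by zero', None,
                  engine_info={'engineid': 3})
print(repr(err))  # <Remote[3]:ZeroDivisionError(division by zero)>
```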


class TaskRejectError(KernelError):
    """Exception to raise when a task should be rejected by an engine.

    This exception can be used to allow a task running on an engine to test
    if the engine (or the user's namespace on the engine) has the needed
    task dependencies.  If not, the task should raise this exception.  For
    the task to be retried on another engine, the task should be created
    with the `retries` argument > 1.

    The advantage of this approach over our older properties system is that
    tasks have full access to the user's namespace on the engines and the
    properties don't have to be managed or tested by the controller.
    """


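The rejection pattern the docstring describes can be sketched as follows. `TaskRejectError` here is a stand-in, and `my_task`/`have_numpy` are hypothetical names, not part of the library:

```python
# Sketch of the rejection pattern: a task checks the engine's capabilities
# and raises to ask the scheduler to retry elsewhere (stand-in classes).
class KernelError(Exception):
    pass

class TaskRejectError(KernelError):
    """Raised by a task to tell the scheduler this engine can't run it."""

def my_task():
    have_numpy = False  # stand-in for a real check of the engine namespace
    if not have_numpy:
        # with retries > 1, the scheduler would resubmit to another engine
        raise TaskRejectError("numpy is required for this task")
    return "result"

try:
    my_task()
except TaskRejectError as exc:
    print("rejected:", exc)
```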
class CompositeError(KernelError):
    """Error for representing possibly multiple errors on engines"""
    def __init__(self, message, elist):
        Exception.__init__(self, *(message, elist))
        # Don't use pack_exception because it will conflict with the .message
        # attribute that is being deprecated in 2.6 and beyond.
        self.msg = message
        self.elist = elist
        self.args = [ e[0] for e in elist ]

    def _get_engine_str(self, ei):
        if not ei:
            return '[Engine Exception]'
        else:
            return '[%i:%s]: ' % (ei['engineid'], ei['method'])

    def _get_traceback(self, ev):
        try:
            tb = ev._ipython_traceback_text
        except AttributeError:
            return 'No traceback available'
        else:
            return tb

    def __str__(self):
        s = str(self.msg)
        for en, ev, etb, ei in self.elist:
            engine_str = self._get_engine_str(ei)
            s = s + '\n' + engine_str + en + ': ' + str(ev)
        return s

    def __repr__(self):
        return "CompositeError(%i)" % len(self.elist)

    def print_tracebacks(self, excid=None):
        if excid is None:
            for (en, ev, etb, ei) in self.elist:
                print(self._get_engine_str(ei))
                print(etb or 'No traceback available')
                print()
        else:
            try:
                en, ev, etb, ei = self.elist[excid]
            except IndexError:
                raise IndexError("an exception with index %i does not exist" % excid)
            else:
                print(self._get_engine_str(ei))
                print(etb or 'No traceback available')

    def raise_exception(self, excid=0):
        try:
            en, ev, etb, ei = self.elist[excid]
        except IndexError:
            raise IndexError("an exception with index %i does not exist" % excid)
        else:
            raise RemoteError(en, ev, etb, ei)


def collect_exceptions(rdict_or_list, method='unspecified'):
    """check a result dict for errors, and raise CompositeError if any exist.
    Passthrough otherwise."""
    elist = []
    if isinstance(rdict_or_list, dict):
        rlist = rdict_or_list.values()
    else:
        rlist = rdict_or_list
    for r in rlist:
        if isinstance(r, RemoteError):
            en, ev, etb, ei = r.ename, r.evalue, r.traceback, r.engine_info
            # Sometimes we could have CompositeError in our list.  Just take
            # the errors out of them and put them in our new list.  This
            # has the effect of flattening lists of CompositeErrors into one
            # CompositeError
            if en == 'CompositeError':
                for e in ev.elist:
                    elist.append(e)
            else:
                elist.append((en, ev, etb, ei))
    if len(elist) == 0:
        return rdict_or_list
    else:
        msg = "one or more exceptions from call to method: %s" % (method)
        # This silliness is needed so the debugger has access to the exception
        # instance (e in this case)
        try:
            raise CompositeError(msg, elist)
        except CompositeError, e:
            raise e

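The flattening behavior of `collect_exceptions` can be demonstrated with minimal stand-in classes (these are simplified re-creations, not IPython's actual ones): a nested `CompositeError` entry is spliced into one flat error list.

```python
# Minimal sketch of the flattening behavior: nested CompositeError entries
# are expanded into one flat error list (stand-in classes for illustration).
class CompositeError(Exception):
    def __init__(self, message, elist):
        super().__init__(message, elist)
        self.msg, self.elist = message, elist

class RemoteError(Exception):
    def __init__(self, ename, evalue, traceback=None, engine_info=None):
        self.ename, self.evalue = ename, evalue
        self.traceback, self.engine_info = traceback, engine_info or {}

def collect(results):
    elist = []
    for r in results:
        if isinstance(r, RemoteError):
            if r.ename == 'CompositeError':
                # r.evalue is itself a CompositeError: splice its entries in
                elist.extend(r.evalue.elist)
            else:
                elist.append((r.ename, r.evalue, r.traceback, r.engine_info))
    if elist:
        raise CompositeError("one or more exceptions", elist)
    return results

inner = CompositeError("inner", [('ValueError', 'bad', None, {})])
nested = RemoteError('CompositeError', inner)
plain = RemoteError('TypeError', 'oops')
try:
    collect([plain, nested])
except CompositeError as e:
    print(len(e.elist))  # 2 -- the nested entry was flattened
```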
@@ -1,1053 +1,1053 @@
#!/usr/bin/env python
"""The IPython Controller Hub with 0MQ
This is the master object that handles connections from engines and clients,
and monitors traffic through the various queues.
"""
#-----------------------------------------------------------------------------
#  Copyright (C) 2010  The IPython Development Team
#
#  Distributed under the terms of the BSD License.  The full license is in
#  the file COPYING, distributed as part of this software.
#-----------------------------------------------------------------------------

#-----------------------------------------------------------------------------
# Imports
#-----------------------------------------------------------------------------
from __future__ import print_function

import sys
from datetime import datetime
import time
import logging

import zmq
from zmq.eventloop import ioloop
from zmq.eventloop.zmqstream import ZMQStream

# internal:
from IPython.config.configurable import Configurable
from IPython.utils.traitlets import HasTraits, Instance, Int, Str, Dict, Set, List, Bool
from IPython.utils.importstring import import_item

from entry_point import select_random_ports
from factory import RegistrationFactory, LoggingFactory

from streamsession import Message, wrap_exception, ISO8601
from heartmonitor import HeartMonitor
from util import validate_url_container

try:
    from pymongo.binary import Binary
except ImportError:
    MongoDB = None
else:
    from mongodb import MongoDB

#-----------------------------------------------------------------------------
# Code
#-----------------------------------------------------------------------------

def _passer(*args, **kwargs):
    return

def _printer(*args, **kwargs):
    print(args)
    print(kwargs)

def init_record(msg):
    """Initialize a TaskRecord based on a request."""
    header = msg['header']
    return {
        'msg_id' : header['msg_id'],
        'header' : header,
        'content': msg['content'],
        'buffers': msg['buffers'],
        'submitted': datetime.strptime(header['date'], ISO8601),
        'client_uuid' : None,
        'engine_uuid' : None,
        'started': None,
        'completed': None,
        'resubmitted': None,
        'result_header' : None,
        'result_content' : None,
        'result_buffers' : None,
        'queue' : None,
        'pyin' : None,
        'pyout': None,
        'pyerr': None,
        'stdout': '',
        'stderr': '',
    }


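`init_record` fills its `submitted` field by parsing the message header's date with the `ISO8601` format constant imported from `streamsession`. The format string below is an assumption, chosen to match timestamps of the form `2010-06-01T12:30:45.123456`:

```python
# Parsing the 'date' header the way init_record does.
# ISO8601 here is an assumed value of the constant imported from
# streamsession, matching timestamps like 2010-06-01T12:30:45.123456.
from datetime import datetime

ISO8601 = "%Y-%m-%dT%H:%M:%S.%f"  # assumed definition

submitted = datetime.strptime("2010-06-01T12:30:45.123456", ISO8601)
print(submitted.year, submitted.microsecond)  # 2010 123456
```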
class EngineConnector(HasTraits):
    """A simple object for accessing the various zmq connections of an object.
    Attributes are:
    id (int): engine ID
    uuid (str): uuid (unused?)
    queue (str): identity of queue's XREQ socket
    registration (str): identity of registration XREQ socket
    heartbeat (str): identity of heartbeat XREQ socket
    """
    id = Int(0)
    queue = Str()
    control = Str()
    registration = Str()
    heartbeat = Str()
    pending = Set()

class HubFactory(RegistrationFactory):
    """The Configurable for setting up a Hub."""

    # name of a scheduler scheme
    scheme = Str('leastload', config=True)

    # port-pairs for monitoredqueues:
    hb = Instance(list, config=True)
    def _hb_default(self):
        return select_random_ports(2)

    mux = Instance(list, config=True)
    def _mux_default(self):
        return select_random_ports(2)

    task = Instance(list, config=True)
    def _task_default(self):
        return select_random_ports(2)

    control = Instance(list, config=True)
    def _control_default(self):
        return select_random_ports(2)

    iopub = Instance(list, config=True)
    def _iopub_default(self):
        return select_random_ports(2)

    # single ports:
    mon_port = Instance(int, config=True)
    def _mon_port_default(self):
        return select_random_ports(1)[0]

    query_port = Instance(int, config=True)
    def _query_port_default(self):
        return select_random_ports(1)[0]

    notifier_port = Instance(int, config=True)
    def _notifier_port_default(self):
        return select_random_ports(1)[0]

    ping = Int(1000, config=True) # ping frequency

    engine_ip = Str('127.0.0.1', config=True)
    engine_transport = Str('tcp', config=True)

    client_ip = Str('127.0.0.1', config=True)
    client_transport = Str('tcp', config=True)

    monitor_ip = Str('127.0.0.1', config=True)
    monitor_transport = Str('tcp', config=True)

    monitor_url = Str('')

    db_class = Str('IPython.zmq.parallel.dictdb.DictDB', config=True)

    # not configurable
    db = Instance('IPython.zmq.parallel.dictdb.BaseDB')
    heartmonitor = Instance('IPython.zmq.parallel.heartmonitor.HeartMonitor')
    subconstructors = List()
    _constructed = Bool(False)

    def _ip_changed(self, name, old, new):
        self.engine_ip = new
        self.client_ip = new
        self.monitor_ip = new
        self._update_monitor_url()

    def _update_monitor_url(self):
        self.monitor_url = "%s://%s:%i" % (self.monitor_transport, self.monitor_ip, self.mon_port)

    def _transport_changed(self, name, old, new):
        self.engine_transport = new
        self.client_transport = new
        self.monitor_transport = new
        self._update_monitor_url()

    def __init__(self, **kwargs):
        super(HubFactory, self).__init__(**kwargs)
        self._update_monitor_url()
        # self.on_trait_change(self._sync_ips, 'ip')
        # self.on_trait_change(self._sync_transports, 'transport')
        self.subconstructors.append(self.construct_hub)


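The port traits above all default to values from `select_random_ports`, imported from `entry_point`. A common way to implement such a helper is to bind sockets to port 0 so the OS picks free ports; this is a hypothetical re-creation of the idea, not IPython's actual code:

```python
# Hypothetical re-creation of a select_random_ports helper: bind to port 0
# so the OS assigns a free port, record it, and release the sockets only
# after all ports are chosen so no port is handed out twice.
import socket

def select_random_ports(n):
    """Return n distinct ports that were free at the time of the call."""
    sockets, ports = [], []
    for _ in range(n):
        s = socket.socket()
        s.bind(('127.0.0.1', 0))  # port 0: the OS assigns a free port
        sockets.append(s)
        ports.append(s.getsockname()[1])
    for s in sockets:
        s.close()
    return ports

print(select_random_ports(2))
```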
    def construct(self):
        assert not self._constructed, "already constructed!"

        for subc in self.subconstructors:
            subc()

        self._constructed = True


    def start(self):
        assert self._constructed, "must be constructed by self.construct() first!"
        self.heartmonitor.start()
        self.log.info("Heartmonitor started")

    def construct_hub(self):
        """construct"""
        client_iface = "%s://%s:" % (self.client_transport, self.client_ip) + "%i"
        engine_iface = "%s://%s:" % (self.engine_transport, self.engine_ip) + "%i"

        ctx = self.context
        loop = self.loop

        # Registrar socket
        reg = ZMQStream(ctx.socket(zmq.XREP), loop)
        reg.bind(client_iface % self.regport)
        self.log.info("Hub listening on %s for registration." % (client_iface % self.regport))
        if self.client_ip != self.engine_ip:
            reg.bind(engine_iface % self.regport)
            self.log.info("Hub listening on %s for registration." % (engine_iface % self.regport))

        ### Engine connections ###

        # heartbeat
        hpub = ctx.socket(zmq.PUB)
        hpub.bind(engine_iface % self.hb[0])
        hrep = ctx.socket(zmq.XREP)
        hrep.bind(engine_iface % self.hb[1])
        self.heartmonitor = HeartMonitor(loop=loop, pingstream=ZMQStream(hpub,loop), pongstream=ZMQStream(hrep,loop),
                                period=self.ping, logname=self.log.name)

        ### Client connections ###
        # Clientele socket
        c = ZMQStream(ctx.socket(zmq.XREP), loop)
        c.bind(client_iface % self.query_port)
        # Notifier socket
        n = ZMQStream(ctx.socket(zmq.PUB), loop)
        n.bind(client_iface % self.notifier_port)

        ### build and launch the queues ###

        # monitor socket
        sub = ctx.socket(zmq.SUB)
        sub.setsockopt(zmq.SUBSCRIBE, "")
        sub.bind(self.monitor_url)
        sub = ZMQStream(sub, loop)

        # connect the db
        self.db = import_item(self.db_class)()
        time.sleep(.25)

        # build connection dicts
        self.engine_info = {
            'control' : engine_iface % self.control[1],
            'mux': engine_iface % self.mux[1],
            'heartbeat': (engine_iface % self.hb[0], engine_iface % self.hb[1]),
            'task' : engine_iface % self.task[1],
            'iopub' : engine_iface % self.iopub[1],
            # 'monitor' : engine_iface % self.mon_port,
            }

        self.client_info = {
            'control' : client_iface % self.control[0],
            'query': client_iface % self.query_port,
            'mux': client_iface % self.mux[0],
            'task' : (self.scheme, client_iface % self.task[0]),
            'iopub' : client_iface % self.iopub[0],
            'notification': client_iface % self.notifier_port
            }
        self.log.debug("hub::Hub engine addrs: %s" % self.engine_info)
        self.log.debug("hub::Hub client addrs: %s" % self.client_info)
        self.hub = Hub(loop=loop, session=self.session, monitor=sub, heartmonitor=self.heartmonitor,
                registrar=reg, clientele=c, notifier=n, db=self.db,
                engine_info=self.engine_info, client_info=self.client_info,
                logname=self.log.name)


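`construct_hub` builds its address templates by formatting the transport and IP into a prefix and appending a literal `"%i"` placeholder, which is filled in with each port number later via the `%` operator. A sketch of the idiom, using the configured default values:

```python
# The iface-template idiom from construct_hub: format transport/ip into a
# prefix, append a literal "%i", and fill in each port later.
transport, ip = 'tcp', '127.0.0.1'   # the configured defaults
client_iface = "%s://%s:" % (transport, ip) + "%i"
print(client_iface)           # tcp://127.0.0.1:%i
print(client_iface % 10101)   # tcp://127.0.0.1:10101
```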
269 | class Hub(LoggingFactory): |
|
269 | class Hub(LoggingFactory): | |
270 | """The IPython Controller Hub with 0MQ connections |
|
270 | """The IPython Controller Hub with 0MQ connections | |
271 |
|
271 | |||
272 | Parameters |
|
272 | Parameters | |
273 | ========== |
|
273 | ========== | |
274 | loop: zmq IOLoop instance |
|
274 | loop: zmq IOLoop instance | |
275 | session: StreamSession object |
|
275 | session: StreamSession object | |
276 | <removed> context: zmq context for creating new connections (?) |
|
276 | <removed> context: zmq context for creating new connections (?) | |
277 | queue: ZMQStream for monitoring the command queue (SUB) |
|
277 | queue: ZMQStream for monitoring the command queue (SUB) | |
278 | registrar: ZMQStream for engine registration requests (XREP) |
|
278 | registrar: ZMQStream for engine registration requests (XREP) | |
279 | heartbeat: HeartMonitor object checking the pulse of the engines |
|
279 | heartbeat: HeartMonitor object checking the pulse of the engines | |
280 | clientele: ZMQStream for client connections (XREP) |
|
280 | clientele: ZMQStream for client connections (XREP) | |
281 | not used for jobs, only query/control commands |
|
281 | not used for jobs, only query/control commands | |
282 | notifier: ZMQStream for broadcasting engine registration changes (PUB) |
|
282 | notifier: ZMQStream for broadcasting engine registration changes (PUB) | |
283 | db: connection to db for out of memory logging of commands |
|
283 | db: connection to db for out of memory logging of commands | |
284 | NotImplemented |
|
284 | NotImplemented | |
285 | engine_info: dict of zmq connection information for engines to connect |
|
285 | engine_info: dict of zmq connection information for engines to connect | |
286 | to the queues. |
|
286 | to the queues. | |
287 | client_info: dict of zmq connection information for engines to connect |
|
287 | client_info: dict of zmq connection information for engines to connect | |
288 | to the queues. |
|
288 | to the queues. | |
289 | """ |
|
289 | """ | |
    # internal data structures:
    ids=Set() # engine IDs
    keytable=Dict()
    by_ident=Dict()
    engines=Dict()
    clients=Dict()
    hearts=Dict()
    pending=Set()
    queues=Dict() # pending msg_ids keyed by engine_id
    tasks=Dict() # pending msg_ids submitted as tasks, keyed by client_id
    completed=Dict() # completed msg_ids keyed by engine_id
    all_completed=Set() # completed msg_ids from all engines
    # mia=None
    incoming_registrations=Dict()
    registration_timeout=Int()
    _idcounter=Int(0)

    # objects from constructor:
    loop=Instance(ioloop.IOLoop)
    registrar=Instance(ZMQStream)
    clientele=Instance(ZMQStream)
    monitor=Instance(ZMQStream)
    heartmonitor=Instance(HeartMonitor)
    notifier=Instance(ZMQStream)
    db=Instance(object)
    client_info=Dict()
    engine_info=Dict()

    def __init__(self, **kwargs):
        """
        # universal:
        loop: IOLoop for creating future connections
        session: streamsession for sending serialized data
        # engine:
        queue: ZMQStream for monitoring queue messages
        registrar: ZMQStream for engine registration
        heartbeat: HeartMonitor object for tracking engines
        # client:
        clientele: ZMQStream for client connections
        # extra:
        db: ZMQStream for db connection (NotImplemented)
        engine_info: zmq address/protocol dict for engine connections
        client_info: zmq address/protocol dict for client connections
        """

        super(Hub, self).__init__(**kwargs)
        self.registration_timeout = max(5000, 2*self.heartmonitor.period)

        # validate connection dicts:
        for k,v in self.client_info.iteritems():
            if k == 'task':
                validate_url_container(v[1])
            else:
                validate_url_container(v)
        # validate_url_container(self.client_info)
        validate_url_container(self.engine_info)

        # register our callbacks
        self.registrar.on_recv(self.dispatch_register_request)
        self.clientele.on_recv(self.dispatch_client_msg)
        self.monitor.on_recv(self.dispatch_monitor_traffic)

        self.heartmonitor.add_heart_failure_handler(self.handle_heart_failure)
        self.heartmonitor.add_new_heart_handler(self.handle_new_heart)

        self.monitor_handlers = {'in' : self.save_queue_request,
                                 'out': self.save_queue_result,
                                 'intask': self.save_task_request,
                                 'outtask': self.save_task_result,
                                 'tracktask': self.save_task_destination,
                                 'incontrol': _passer,
                                 'outcontrol': _passer,
                                 'iopub': self.save_iopub_message,
                                 }

        self.client_handlers = {'queue_request': self.queue_status,
                                'result_request': self.get_results,
                                'purge_request': self.purge_results,
                                'load_request': self.check_load,
                                'resubmit_request': self.resubmit_task,
                                'shutdown_request': self.shutdown_request,
                                }

        self.registrar_handlers = {'registration_request' : self.register_engine,
                                   'unregistration_request' : self.unregister_engine,
                                   'connection_request': self.connection_request,
                                   }

        self.log.info("hub::created hub")

    @property
    def _next_id(self):
        """generate a new ID.

        No longer reuse old ids, just count from 0."""
        newid = self._idcounter
        self._idcounter += 1
        return newid
        # newid = 0
        # incoming = [id[0] for id in self.incoming_registrations.itervalues()]
        # # print newid, self.ids, self.incoming_registrations
        # while newid in self.ids or newid in incoming:
        #     newid += 1
        # return newid

    #-----------------------------------------------------------------------------
    # message validation
    #-----------------------------------------------------------------------------

    def _validate_targets(self, targets):
        """turn any valid targets argument into a list of integer ids"""
        if targets is None:
            # default to all
            targets = self.ids

        if isinstance(targets, (int,str,unicode)):
            # only one target specified
            targets = [targets]
        _targets = []
        for t in targets:
            # map raw identities to ids
            if isinstance(t, (str,unicode)):
                t = self.by_ident.get(t, t)
            _targets.append(t)
        targets = _targets
        bad_targets = [ t for t in targets if t not in self.ids ]
        if bad_targets:
            raise IndexError("No Such Engine: %r"%bad_targets)
        if not targets:
            raise IndexError("No Engines Registered")
        return targets

    def _validate_client_msg(self, msg):
        """validates and unpacks the headers of a message.
        Returns False if invalid, else (client_id, msg)."""
        client_id = msg[0]
        try:
            msg = self.session.unpack_message(msg[1:], content=True)
        except:
            self.log.error("client::Invalid Message %s"%msg, exc_info=True)
            return False

        msg_type = msg.get('msg_type', None)
        if msg_type is None:
            return False
        header = msg.get('header')
        # session doesn't handle split content for now:
        return client_id, msg


    #-----------------------------------------------------------------------------
    # dispatch methods (1 per stream)
    #-----------------------------------------------------------------------------

    def dispatch_register_request(self, msg):
        """"""
        self.log.debug("registration::dispatch_register_request(%s)"%msg)
        idents,msg = self.session.feed_identities(msg)
        if not idents:
            self.log.error("Bad Queue Message: %s"%msg, exc_info=True)
            return
        try:
            msg = self.session.unpack_message(msg,content=True)
        except:
            self.log.error("registration::got bad registration message: %s"%msg, exc_info=True)
            return

        msg_type = msg['msg_type']
        content = msg['content']

        handler = self.registrar_handlers.get(msg_type, None)
        if handler is None:
            self.log.error("registration::got bad registration message: %s"%msg)
        else:
            handler(idents, msg)

    def dispatch_monitor_traffic(self, msg):
        """all ME and Task queue messages come through here, as well as
        IOPub traffic."""
        self.log.debug("monitor traffic: %s"%msg[:2])
        switch = msg[0]
        idents, msg = self.session.feed_identities(msg[1:])
        if not idents:
            self.log.error("Bad Monitor Message: %s"%msg)
            return
        handler = self.monitor_handlers.get(switch, None)
        if handler is not None:
            handler(idents, msg)
        else:
            self.log.error("Invalid monitor topic: %s"%switch)


    def dispatch_client_msg(self, msg):
        """Route messages from clients"""
        idents, msg = self.session.feed_identities(msg)
        if not idents:
            self.log.error("Bad Client Message: %s"%msg)
            return
        client_id = idents[0]
        try:
            msg = self.session.unpack_message(msg, content=True)
        except:
            content = wrap_exception()
            self.log.error("Bad Client Message: %s"%msg, exc_info=True)
            self.session.send(self.clientele, "hub_error", ident=client_id,
                    content=content)
            return

        # print client_id, header, parent, content
        # switch on message type:
        msg_type = msg['msg_type']
        self.log.info("client:: client %s requested %s"%(client_id, msg_type))
        handler = self.client_handlers.get(msg_type, None)
        try:
            assert handler is not None, "Bad Message Type: %s"%msg_type
        except:
            content = wrap_exception()
            self.log.error("Bad Message Type: %s"%msg_type, exc_info=True)
            self.session.send(self.clientele, "hub_error", ident=client_id,
                    content=content)
            return
        else:
            handler(client_id, msg)

    def dispatch_db(self, msg):
        """"""
        raise NotImplementedError

    #---------------------------------------------------------------------------
    # handler methods (1 per event)
    #---------------------------------------------------------------------------

    #----------------------- Heartbeat --------------------------------------

    def handle_new_heart(self, heart):
        """handler to attach to heartbeater.
        Called when a new heart starts to beat.
        Triggers completion of registration."""
        self.log.debug("heartbeat::handle_new_heart(%r)"%heart)
        if heart not in self.incoming_registrations:
            self.log.info("heartbeat::ignoring new heart: %r"%heart)
        else:
            self.finish_registration(heart)


    def handle_heart_failure(self, heart):
        """handler to attach to heartbeater.
        Called when a previously registered heart fails to respond to a beat request.
        Triggers unregistration."""
        self.log.debug("heartbeat::handle_heart_failure(%r)"%heart)
        eid = self.hearts.get(heart, None)
        if eid is None:
            self.log.info("heartbeat::ignoring heart failure %r"%heart)
        else:
            # only look up the engine's queue once we know it is registered
            queue = self.engines[eid].queue
            self.unregister_engine(heart, dict(content=dict(id=eid, queue=queue)))

    #----------------------- MUX Queue Traffic ------------------------------

    def save_queue_request(self, idents, msg):
        if len(idents) < 2:
            self.log.error("invalid identity prefix: %s"%idents)
            return
        queue_id, client_id = idents[:2]
        try:
            msg = self.session.unpack_message(msg, content=False)
        except:
            self.log.error("queue::client %r sent invalid message to %r: %s"%(client_id, queue_id, msg), exc_info=True)
            return

        eid = self.by_ident.get(queue_id, None)
        if eid is None:
            self.log.error("queue::target %r not registered"%queue_id)
            self.log.debug("queue:: valid are: %s"%(self.by_ident.keys()))
            return

        header = msg['header']
        msg_id = header['msg_id']
        record = init_record(msg)
        record['engine_uuid'] = queue_id
        record['client_uuid'] = client_id
        record['queue'] = 'mux'
        if MongoDB is not None and isinstance(self.db, MongoDB):
            record['buffers'] = map(Binary, record['buffers'])
        self.pending.add(msg_id)
        self.queues[eid].append(msg_id)
        self.db.add_record(msg_id, record)

    def save_queue_result(self, idents, msg):
        if len(idents) < 2:
            self.log.error("invalid identity prefix: %s"%idents)
            return

        client_id, queue_id = idents[:2]
        try:
            msg = self.session.unpack_message(msg, content=False)
        except:
            self.log.error("queue::engine %r sent invalid message to %r: %s"%(
                    queue_id,client_id, msg), exc_info=True)
            return

        eid = self.by_ident.get(queue_id, None)
        if eid is None:
            self.log.error("queue::unknown engine %r is sending a reply: "%queue_id)
            self.log.debug("queue:: %s"%msg[2:])
            return

        parent = msg['parent_header']
        if not parent:
            return
        msg_id = parent['msg_id']
        if msg_id in self.pending:
            self.pending.remove(msg_id)
            self.all_completed.add(msg_id)
            self.queues[eid].remove(msg_id)
            self.completed[eid].append(msg_id)
            rheader = msg['header']
            completed = datetime.strptime(rheader['date'], ISO8601)
            started = rheader.get('started', None)
            if started is not None:
                started = datetime.strptime(started, ISO8601)
            result = {
                'result_header' : rheader,
                'result_content': msg['content'],
                'started' : started,
                'completed' : completed
            }
            if MongoDB is not None and isinstance(self.db, MongoDB):
                result['result_buffers'] = map(Binary, msg['buffers'])
            else:
                result['result_buffers'] = msg['buffers']
            self.db.update_record(msg_id, result)
        else:
            self.log.debug("queue:: unknown msg finished %s"%msg_id)

    #--------------------- Task Queue Traffic ------------------------------

    def save_task_request(self, idents, msg):
        """Save the submission of a task."""
        client_id = idents[0]

        try:
            msg = self.session.unpack_message(msg, content=False)
        except:
            self.log.error("task::client %r sent invalid task message: %s"%(
                    client_id, msg), exc_info=True)
            return
        record = init_record(msg)
        if MongoDB is not None and isinstance(self.db, MongoDB):
            record['buffers'] = map(Binary, record['buffers'])
        record['client_uuid'] = client_id
        record['queue'] = 'task'
        header = msg['header']
        msg_id = header['msg_id']
        self.pending.add(msg_id)
        self.db.add_record(msg_id, record)

    def save_task_result(self, idents, msg):
        """save the result of a completed task."""
        client_id = idents[0]
        try:
            msg = self.session.unpack_message(msg, content=False)
        except:
            self.log.error("task::invalid task result message sent to %r: %s"%(
                    client_id, msg), exc_info=True)
            return

        parent = msg['parent_header']
        if not parent:
            # print msg
            self.log.warn("Task %r had no parent!"%msg)
            return
        msg_id = parent['msg_id']

        header = msg['header']
        engine_uuid = header.get('engine', None)
        eid = self.by_ident.get(engine_uuid, None)

        if msg_id in self.pending:
            self.pending.remove(msg_id)
            self.all_completed.add(msg_id)
            if eid is not None:
                self.completed[eid].append(msg_id)
                if msg_id in self.tasks[eid]:
                    self.tasks[eid].remove(msg_id)
            completed = datetime.strptime(header['date'], ISO8601)
            started = header.get('started', None)
            if started is not None:
                started = datetime.strptime(started, ISO8601)
            result = {
                'result_header' : header,
                'result_content': msg['content'],
                'started' : started,
                'completed' : completed,
                'engine_uuid': engine_uuid
            }
            if MongoDB is not None and isinstance(self.db, MongoDB):
                result['result_buffers'] = map(Binary, msg['buffers'])
            else:
                result['result_buffers'] = msg['buffers']
            self.db.update_record(msg_id, result)

        else:
            self.log.debug("task::unknown task %s finished"%msg_id)

    def save_task_destination(self, idents, msg):
        try:
            msg = self.session.unpack_message(msg, content=True)
        except:
            self.log.error("task::invalid task tracking message", exc_info=True)
            return
        content = msg['content']
        # print (content)
        msg_id = content['msg_id']
        engine_uuid = content['engine_id']
        eid = self.by_ident[engine_uuid]

        self.log.info("task::task %s arrived on %s"%(msg_id, eid))
        # if msg_id in self.mia:
        #     self.mia.remove(msg_id)
        # else:
        #     self.log.debug("task::task %s not listed as MIA?!"%(msg_id))

        self.tasks[eid].append(msg_id)
        # self.pending[msg_id][1].update(received=datetime.now(),engine=(eid,engine_uuid))
        self.db.update_record(msg_id, dict(engine_uuid=engine_uuid))

    def mia_task_request(self, idents, msg):
        raise NotImplementedError
        client_id = idents[0]
        # content = dict(mia=self.mia,status='ok')
        # self.session.send('mia_reply', content=content, idents=client_id)

726 | #--------------------- IOPub Traffic ------------------------------ |
|
726 | #--------------------- IOPub Traffic ------------------------------ | |
727 |
|
727 | |||
728 | def save_iopub_message(self, topics, msg): |
|
728 | def save_iopub_message(self, topics, msg): | |
729 | """save an iopub message into the db""" |
|
729 | """save an iopub message into the db""" | |
730 | print (topics) |
|
730 | print (topics) | |
731 | try: |
|
731 | try: | |
732 | msg = self.session.unpack_message(msg, content=True) |
|
732 | msg = self.session.unpack_message(msg, content=True) | |
733 | except: |
|
733 | except: | |
734 | self.log.error("iopub::invalid IOPub message", exc_info=True) |
|
734 | self.log.error("iopub::invalid IOPub message", exc_info=True) | |
735 | return |
|
735 | return | |
736 |
|
736 | |||
737 | parent = msg['parent_header'] |
|
737 | parent = msg['parent_header'] | |
738 | if not parent: |
|
738 | if not parent: | |
739 | self.log.error("iopub::invalid IOPub message: %s"%msg) |
|
739 | self.log.error("iopub::invalid IOPub message: %s"%msg) | |
740 | return |
|
740 | return | |
741 | msg_id = parent['msg_id'] |
|
741 | msg_id = parent['msg_id'] | |
742 | msg_type = msg['msg_type'] |
|
742 | msg_type = msg['msg_type'] | |
743 | content = msg['content'] |
|
743 | content = msg['content'] | |
744 |
|
744 | |||
745 | # ensure msg_id is in db |
|
745 | # ensure msg_id is in db | |
746 | try: |
|
746 | try: | |
747 | rec = self.db.get_record(msg_id) |
|
747 | rec = self.db.get_record(msg_id) | |
748 | except: |
|
748 | except: | |
749 | self.log.error("iopub::IOPub message has invalid parent", exc_info=True) |
|
749 | self.log.error("iopub::IOPub message has invalid parent", exc_info=True) | |
750 | return |
|
750 | return | |
751 | # stream |
|
751 | # stream | |
752 | d = {} |
|
752 | d = {} | |
753 | if msg_type == 'stream': |
|
753 | if msg_type == 'stream': | |
754 | name = content['name'] |
|
754 | name = content['name'] | |
755 | s = rec[name] or '' |
|
755 | s = rec[name] or '' | |
756 | d[name] = s + content['data'] |
|
756 | d[name] = s + content['data'] | |
757 |
|
757 | |||
758 | elif msg_type == 'pyerr': |
|
758 | elif msg_type == 'pyerr': | |
759 | d['pyerr'] = content |
|
759 | d['pyerr'] = content | |
760 | else: |
|
760 | else: | |
761 | d[msg_type] = content['data'] |
|
761 | d[msg_type] = content['data'] | |
762 |
|
762 | |||
763 | self.db.update_record(msg_id, d) |
|
763 | self.db.update_record(msg_id, d) | |
764 |
|
764 | |||
765 |
|
765 | |||
766 |
|
766 | |||
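The `stream` branch above accumulates output onto what is already stored for the task, while `pyerr` and other message types simply overwrite their field. That dispatch can be sketched as a standalone function (the names and the `rec.get` lookup here are illustrative, not the hub's actual API):

```python
def iopub_update(rec, msg_type, content):
    """Build the partial DB update applied for one IOPub message.

    rec is the task's existing record; stream output is appended to
    what is already stored, other message types replace their field.
    """
    d = {}
    if msg_type == 'stream':
        name = content['name']           # 'stdout' or 'stderr'
        s = rec.get(name) or ''          # previously accumulated output
        d[name] = s + content['data']    # append, never overwrite
    elif msg_type == 'pyerr':
        d['pyerr'] = content             # keep the whole error content
    else:
        d[msg_type] = content['data']
    return d

rec = {'stdout': 'a'}
rec.update(iopub_update(rec, 'stream', {'name': 'stdout', 'data': 'b'}))
# rec['stdout'] is now 'ab'
```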
767 |     #-------------------------------------------------------------------------
768 |     # Registration requests
769 |     #-------------------------------------------------------------------------
770 |
771 |     def connection_request(self, client_id, msg):
772 |         """Reply with connection addresses for clients."""
773 |         self.log.info("client::client %s connected"%client_id)
774 |         content = dict(status='ok')
775 |         content.update(self.client_info)
776 |         jsonable = {}
777 |         for k,v in self.keytable.iteritems():
778 |             jsonable[str(k)] = v
779 |         content['engines'] = jsonable
780 |         self.session.send(self.registrar, 'connection_reply', content, parent=msg, ident=client_id)
781 |
782 |     def register_engine(self, reg, msg):
783 |         """Register a new engine."""
784 |         content = msg['content']
785 |         try:
786 |             queue = content['queue']
787 |         except KeyError:
788 |             self.log.error("registration::queue not specified", exc_info=True)
789 |             return
790 |         heart = content.get('heartbeat', None)
791 |         """register a new engine, and create the socket(s) necessary"""
792 |         eid = self._next_id
793 |         # print (eid, queue, reg, heart)
794 |
795 |         self.log.debug("registration::register_engine(%i, %r, %r, %r)"%(eid, queue, reg, heart))
796 |
797 |         content = dict(id=eid,status='ok')
798 |         content.update(self.engine_info)
799 |         # check if requesting available IDs:
800 |         if queue in self.by_ident:
801 |             try:
802 |                 raise KeyError("queue_id %r in use"%queue)
803 |             except:
804 |                 content = wrap_exception()
805 |                 self.log.error("queue_id %r in use"%queue, exc_info=True)
806 |         elif heart in self.hearts: # need to check unique hearts?
807 |             try:
808 |                 raise KeyError("heart_id %r in use"%heart)
809 |             except:
810 |                 self.log.error("heart_id %r in use"%heart, exc_info=True)
811 |                 content = wrap_exception()
812 |         else:
813 |             for h, pack in self.incoming_registrations.iteritems():
814 |                 if heart == h:
815 |                     try:
816 |                         raise KeyError("heart_id %r in use"%heart)
817 |                     except:
818 |                         self.log.error("heart_id %r in use"%heart, exc_info=True)
819 |                         content = wrap_exception()
820 |                     break
821 |                 elif queue == pack[1]:
822 |                     try:
823 |                         raise KeyError("queue_id %r in use"%queue)
824 |                     except:
825 |                         self.log.error("queue_id %r in use"%queue, exc_info=True)
826 |                         content = wrap_exception()
827 |                     break
828 |
829 |         msg = self.session.send(self.registrar, "registration_reply",
830 |                 content=content,
831 |                 ident=reg)
832 |
833 |         if content['status'] == 'ok':
834 |             if heart in self.heartmonitor.hearts:
835 |                 # already beating
836 |                 self.incoming_registrations[heart] = (eid,queue,reg[0],None)
837 |                 self.finish_registration(heart)
838 |             else:
839 |                 purge = lambda : self._purge_stalled_registration(heart)
840 |                 dc = ioloop.DelayedCallback(purge, self.registration_timeout, self.loop)
841 |                 dc.start()
842 |                 self.incoming_registrations[heart] = (eid,queue,reg[0],dc)
843 |         else:
844 |             self.log.error("registration::registration %i failed: %s"%(eid, content['evalue']))
845 |         return eid
846 |
847 |     def unregister_engine(self, ident, msg):
848 |         """Unregister an engine that explicitly requested to leave."""
849 |         try:
850 |             eid = msg['content']['id']
851 |         except:
852 |             self.log.error("registration::bad engine id for unregistration: %s"%ident, exc_info=True)
853 |             return
854 |         self.log.info("registration::unregister_engine(%s)"%eid)
855 |         content=dict(id=eid, queue=self.engines[eid].queue)
856 |         self.ids.remove(eid)
857 |         self.keytable.pop(eid)
858 |         ec = self.engines.pop(eid)
859 |         self.hearts.pop(ec.heartbeat)
860 |         self.by_ident.pop(ec.queue)
861 |         self.completed.pop(eid)
862 |         for msg_id in self.queues.pop(eid):
863 |             msg = self.pending.remove(msg_id)
864 |             ############## TODO: HANDLE IT ################
865 |
866 |         if self.notifier:
867 |             self.session.send(self.notifier, "unregistration_notification", content=content)
868 |
869 |     def finish_registration(self, heart):
870 |         """Second half of engine registration, called after our HeartMonitor
871 |         has received a beat from the Engine's Heart."""
872 |         try:
873 |             (eid,queue,reg,purge) = self.incoming_registrations.pop(heart)
874 |         except KeyError:
875 |             self.log.error("registration::tried to finish nonexistant registration", exc_info=True)
876 |             return
877 |         self.log.info("registration::finished registering engine %i:%r"%(eid,queue))
878 |         if purge is not None:
879 |             purge.stop()
880 |         control = queue
881 |         self.ids.add(eid)
882 |         self.keytable[eid] = queue
883 |         self.engines[eid] = EngineConnector(id=eid, queue=queue, registration=reg,
884 |                                             control=control, heartbeat=heart)
885 |         self.by_ident[queue] = eid
886 |         self.queues[eid] = list()
887 |         self.tasks[eid] = list()
888 |         self.completed[eid] = list()
889 |         self.hearts[heart] = eid
890 |         content = dict(id=eid, queue=self.engines[eid].queue)
891 |         if self.notifier:
892 |             self.session.send(self.notifier, "registration_notification", content=content)
893 |         self.log.info("engine::Engine Connected: %i"%eid)
894 |
895 |     def _purge_stalled_registration(self, heart):
896 |         if heart in self.incoming_registrations:
897 |             eid = self.incoming_registrations.pop(heart)[0]
898 |             self.log.info("registration::purging stalled registration: %i"%eid)
899 |         else:
900 |             pass
901 |
902 |     #-------------------------------------------------------------------------
903 |     # Client Requests
904 |     #-------------------------------------------------------------------------
905 |
906 |     def shutdown_request(self, client_id, msg):
907 |         """handle shutdown request."""
908 |         # s = self.context.socket(zmq.XREQ)
909 |         # s.connect(self.client_connections['mux'])
910 |         # time.sleep(0.1)
911 |         # for eid,ec in self.engines.iteritems():
912 |         #     self.session.send(s, 'shutdown_request', content=dict(restart=False), ident=ec.queue)
913 |         # time.sleep(1)
914 |         self.session.send(self.clientele, 'shutdown_reply', content={'status': 'ok'}, ident=client_id)
915 |         dc = ioloop.DelayedCallback(lambda : self._shutdown(), 1000, self.loop)
916 |         dc.start()
917 |
918 |     def _shutdown(self):
919 |         self.log.info("hub::hub shutting down.")
920 |         time.sleep(0.1)
921 |         sys.exit(0)
922 |
923 |
924 |     def check_load(self, client_id, msg):
925 |         content = msg['content']
926 |         try:
927 |             targets = content['targets']
928 |             targets = self._validate_targets(targets)
929 |         except:
930 |             content = wrap_exception()
931 |             self.session.send(self.clientele, "hub_error",
932 |                               content=content, ident=client_id)
933 |             return
934 |
935 |         content = dict(status='ok')
936 |         # loads = {}
937 |         for t in targets:
938 |             content[bytes(t)] = len(self.queues[t])+len(self.tasks[t])
939 |         self.session.send(self.clientele, "load_reply", content=content, ident=client_id)
940 |
941 |
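`check_load` above reports one number per engine: pending MUX jobs plus pending task jobs. As a standalone sketch (illustrative names, plain int keys instead of the hub's `bytes(t)` keys):

```python
def engine_loads(targets, queues, tasks):
    """Per-engine load as computed by check_load: pending MUX jobs
    plus pending task jobs for each requested engine id."""
    return dict((t, len(queues[t]) + len(tasks[t])) for t in targets)
```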
942 |     def queue_status(self, client_id, msg):
943 |         """Return the Queue status of one or more targets.
944 |         if verbose: return the msg_ids
945 |         else: return len of each type.
946 |         keys: queue (pending MUX jobs)
947 |             tasks (pending Task jobs)
948 |             completed (finished jobs from both queues)"""
949 |         content = msg['content']
950 |         targets = content['targets']
951 |         try:
952 |             targets = self._validate_targets(targets)
953 |         except:
954 |             content = wrap_exception()
955 |             self.session.send(self.clientele, "hub_error",
956 |                               content=content, ident=client_id)
957 |             return
958 |         verbose = content.get('verbose', False)
959 |         content = dict(status='ok')
960 |         for t in targets:
961 |             queue = self.queues[t]
962 |             completed = self.completed[t]
963 |             tasks = self.tasks[t]
964 |             if not verbose:
965 |                 queue = len(queue)
966 |                 completed = len(completed)
967 |                 tasks = len(tasks)
968 |             content[bytes(t)] = {'queue': queue, 'completed': completed , 'tasks': tasks}
969 |         # pending
970 |         self.session.send(self.clientele, "queue_reply", content=content, ident=client_id)
971 |
972 |     def purge_results(self, client_id, msg):
973 |         """Purge results from memory. This method is more valuable before we move
974 |         to a DB based message storage mechanism."""
975 |         content = msg['content']
976 |         msg_ids = content.get('msg_ids', [])
977 |         reply = dict(status='ok')
978 |         if msg_ids == 'all':
979 |             self.db.drop_matching_records(dict(completed={'$ne':None}))
980 |         else:
981 |             for msg_id in msg_ids:
982 |                 if msg_id in self.all_completed:
983 |                     self.db.drop_record(msg_id)
984 |                 else:
985 |                     if msg_id in self.pending:
986 |                         try:
987 |                             raise IndexError("msg pending: %r"%msg_id)
988 |                         except:
989 |                             reply = wrap_exception()
990 |                     else:
991 |                         try:
992 |                             raise IndexError("No such msg: %r"%msg_id)
993 |                         except:
994 |                             reply = wrap_exception()
995 |                     break
996 |             eids = content.get('engine_ids', [])
997 |             for eid in eids:
998 |                 if eid not in self.engines:
999 |                     try:
1000 |                         raise IndexError("No such engine: %i"%eid)
1001 |                     except:
1002 |                         reply = wrap_exception()
1003 |                     break
1004 |                 msg_ids = self.completed.pop(eid)
1005 |                 uid = self.engines[eid].queue
1006 |                 self.db.drop_matching_records(dict(engine_uuid=uid, completed={'$ne':None}))
1007 |
1008 |         self.session.send(self.clientele, 'purge_reply', content=reply, ident=client_id)
1009 |
1010 |     def resubmit_task(self, client_id, msg, buffers):
1011 |         """Resubmit a task."""
1012 |         raise NotImplementedError
1013 |
1014 |     def get_results(self, client_id, msg):
1015 |         """Get the result of 1 or more messages."""
1016 |         content = msg['content']
1017 |         msg_ids = sorted(set(content['msg_ids']))
1018 |         statusonly = content.get('status_only', False)
1019 |         pending = []
1020 |         completed = []
1021 |         content = dict(status='ok')
1022 |         content['pending'] = pending
1023 |         content['completed'] = completed
1024 |         buffers = []
1025 |         if not statusonly:
1026 |             content['results'] = {}
1027 |             records = self.db.find_records(dict(msg_id={'$in':msg_ids}))
1028 |         for msg_id in msg_ids:
1029 |             if msg_id in self.pending:
1030 |                 pending.append(msg_id)
1031 |             elif msg_id in self.all_completed:
1032 |                 completed.append(msg_id)
1033 |                 if not statusonly:
1034 |                     rec = records[msg_id]
1035 |                     io_dict = {}
1036 |                     for key in 'pyin pyout pyerr stdout stderr'.split():
1037 |                         io_dict[key] = rec[key]
1038 |                     content[msg_id] = { 'result_content': rec['result_content'],
1039 |                                         'header': rec['header'],
1040 |                                         'result_header' : rec['result_header'],
1041 |                                         'io' : io_dict,
1042 |                                       }
1043 |                     buffers.extend(map(str, rec['result_buffers']))
1044 |             else:
1045 |                 try:
1046 |                     raise KeyError('No such message: '+msg_id)
1047 |                 except:
1048 |                     content = wrap_exception()
1049 |                 break
1050 |         self.session.send(self.clientele, "result_reply", content=content,
1051 |                           parent=msg, ident=client_id,
1052 |                           buffers=buffers)
1053 |
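The core of `get_results` above is a three-way split of the requested msg_ids: still pending, already completed, or unknown (which the hub reports as an error and aborts on). That partition, isolated from the messaging machinery (illustrative helper, not part of the hub):

```python
def partition_msg_ids(msg_ids, pending, all_completed):
    """Split requested msg_ids the way get_results does: deduplicate,
    sort, then classify each as pending, completed, or unknown."""
    p, c, unknown = [], [], []
    for msg_id in sorted(set(msg_ids)):
        if msg_id in pending:
            p.append(msg_id)
        elif msg_id in all_completed:
            c.append(msg_id)
        else:
            unknown.append(msg_id)      # the hub wraps this as a KeyError
    return p, c, unknown
```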
@@ -1,580 +1,593 b'' | |||||
  1    1 | #!/usr/bin/env python
  2    2 | # encoding: utf-8
  3    3 | """
  4    4 | The ipcluster application.
  5    5 | """
  6    6 |
  7    7 | #-----------------------------------------------------------------------------
  8    8 | # Copyright (C) 2008-2009 The IPython Development Team
  9    9 | #
 10   10 | # Distributed under the terms of the BSD License. The full license is in
 11   11 | # the file COPYING, distributed as part of this software.
 12   12 | #-----------------------------------------------------------------------------
 13   13 |
 14   14 | #-----------------------------------------------------------------------------
 15   15 | # Imports
 16   16 | #-----------------------------------------------------------------------------
 17   17 |
 18   18 | import re
 19   19 | import logging
 20   20 | import os
 21   21 | import signal
 22   22 | import logging
      23 | import errno
 23   24 |
      25 | import zmq
 24   26 | from zmq.eventloop import ioloop
 25   27 |
 26   28 | from IPython.external.argparse import ArgumentParser, SUPPRESS
 27   29 | from IPython.utils.importstring import import_item
 28   30 | from IPython.zmq.parallel.clusterdir import (
 29   31 |     ApplicationWithClusterDir, ClusterDirConfigLoader,
 30   32 |     ClusterDirError, PIDFileError
 31   33 | )
 32   34 |
 33   35 |
 34   36 | #-----------------------------------------------------------------------------
 35   37 | # Module level variables
 36   38 | #-----------------------------------------------------------------------------
 37   39 |
 38   40 |
 39   41 | default_config_file_name = u'ipcluster_config.py'
 40   42 |
 41   43 |
 42   44 | _description = """\
 43   45 | Start an IPython cluster for parallel computing.\n\n
 44   46 |
 45   47 | An IPython cluster consists of 1 controller and 1 or more engines.
 46   48 | This command automates the startup of these processes using a wide
 47   49 | range of startup methods (SSH, local processes, PBS, mpiexec,
 48   50 | Windows HPC Server 2008). To start a cluster with 4 engines on your
 49   51 | local host simply do 'ipclusterz start -n 4'. For more complex usage
 50   52 | you will typically do 'ipclusterz create -p mycluster', then edit
 51   53 | configuration files, followed by 'ipclusterz start -p mycluster -n 4'.
 52   54 | """
 53   55 |
 54   56 |
 55   57 | # Exit codes for ipcluster
 56   58 |
 57   59 | # This will be the exit code if the ipcluster appears to be running because
 58   60 | # a .pid file exists
 59   61 | ALREADY_STARTED = 10
 60   62 |
 61   63 |
 62   64 | # This will be the exit code if ipcluster stop is run, but there is not .pid
 63   65 | # file to be found.
 64   66 | ALREADY_STOPPED = 11
 65   67 |
 66   68 | # This will be the exit code if ipcluster engines is run, but there is not .pid
 67   69 | # file to be found.
 68   70 | NO_CLUSTER = 12
 69   71 |
 70   72 |
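The three module-level exit codes above let wrapper scripts distinguish failure modes without parsing log output. A hypothetical helper a caller might use (the code values are taken from the module; the function and its messages are illustrative, not part of ipclusterz):

```python
ALREADY_STARTED = 10   # a .pid file exists: cluster appears to be running
ALREADY_STOPPED = 11   # 'stop' was run, but no .pid file was found
NO_CLUSTER = 12        # 'engines' was run, but no .pid file was found

def describe_exit(code):
    """Map an ipclusterz exit code to a short diagnostic string."""
    return {
        0: "ok",
        ALREADY_STARTED: "cluster already running (pid file exists)",
        ALREADY_STOPPED: "no cluster to stop (no pid file)",
        NO_CLUSTER: "no cluster to add engines to (no pid file)",
    }.get(code, "unexpected exit: %i" % code)
```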
 71   73 | #-----------------------------------------------------------------------------
 72   74 | # Command line options
 73   75 | #-----------------------------------------------------------------------------
 74   76 |
 75   77 |
 76   78 | class IPClusterAppConfigLoader(ClusterDirConfigLoader):
 77   79 |
 78   80 |     def _add_arguments(self):
 79   81 |         # Don't call ClusterDirConfigLoader._add_arguments as we don't want
 80   82 |         # its defaults on self.parser. Instead, we will put those on
 81   83 |         # default options on our subparsers.
 82   84 |
 83   85 |         # This has all the common options that all subcommands use
 84   86 |         parent_parser1 = ArgumentParser(
 85   87 |             add_help=False,
 86   88 |             argument_default=SUPPRESS
 87   89 |         )
 88   90 |         self._add_ipython_dir(parent_parser1)
 89   91 |         self._add_log_level(parent_parser1)
 90   92 |
 91   93 |         # This has all the common options that other subcommands use
 92   94 |         parent_parser2 = ArgumentParser(
 93   95 |             add_help=False,
 94   96 |             argument_default=SUPPRESS
 95   97 |         )
 96   98 |         self._add_cluster_profile(parent_parser2)
 97   99 |         self._add_cluster_dir(parent_parser2)
 98  100 |         self._add_work_dir(parent_parser2)
 99  101 |         paa = parent_parser2.add_argument
100 | paa('--log-to-file', |
|
102 | paa('--log-to-file', | |
101 | action='store_true', dest='Global.log_to_file', |
|
103 | action='store_true', dest='Global.log_to_file', | |
102 | help='Log to a file in the log directory (default is stdout)') |
|
104 | help='Log to a file in the log directory (default is stdout)') | |
103 |
|
105 | |||
104 | # Create the object used to create the subparsers. |
|
106 | # Create the object used to create the subparsers. | |
105 | subparsers = self.parser.add_subparsers( |
|
107 | subparsers = self.parser.add_subparsers( | |
106 | dest='Global.subcommand', |
|
108 | dest='Global.subcommand', | |
107 | title='ipcluster subcommands', |
|
109 | title='ipcluster subcommands', | |
108 | description= |
|
110 | description= | |
109 | """ipcluster has a variety of subcommands. The general way of |
|
111 | """ipcluster has a variety of subcommands. The general way of | |
110 | running ipcluster is 'ipclusterz <cmd> [options]'. To get help |
|
112 | running ipcluster is 'ipclusterz <cmd> [options]'. To get help | |
111 | on a particular subcommand do 'ipclusterz <cmd> -h'.""" |
|
113 | on a particular subcommand do 'ipclusterz <cmd> -h'.""" | |
112 | # help="For more help, type 'ipclusterz <cmd> -h'", |
|
114 | # help="For more help, type 'ipclusterz <cmd> -h'", | |
113 | ) |
|
115 | ) | |
114 |
|
116 | |||
        # The "list" subcommand parser
        parser_list = subparsers.add_parser(
            'list',
            parents=[parent_parser1],
            argument_default=SUPPRESS,
            help="List all clusters in cwd and ipython_dir.",
            description=
            """List all available clusters, by cluster directory, that can
            be found in the current working directory or in the ipython
            directory. Cluster directories are named using the convention
            'cluster_<profile>'."""
        )

        # The "create" subcommand parser
        parser_create = subparsers.add_parser(
            'create',
            parents=[parent_parser1, parent_parser2],
            argument_default=SUPPRESS,
            help="Create a new cluster directory.",
            description=
            """Create an ipython cluster directory by its profile name or
            cluster directory path. Cluster directories contain
            configuration, log and security related files and are named
            using the convention 'cluster_<profile>'. By default they are
            located in your ipython directory. Once created, you will
            probably need to edit the configuration files in the cluster
            directory to configure your cluster. Most users will create a
            cluster directory by profile name,
            'ipclusterz create -p mycluster', which will put the directory
            in '<ipython_dir>/cluster_mycluster'.
            """
        )
        paa = parser_create.add_argument
        paa('--reset-config',
            dest='Global.reset_config', action='store_true',
            help=
            """Recopy the default config files to the cluster directory.
            You will lose any modifications you have made to these files.""")

        # The "start" subcommand parser
        parser_start = subparsers.add_parser(
            'start',
            parents=[parent_parser1, parent_parser2],
            argument_default=SUPPRESS,
            help="Start a cluster.",
            description=
            """Start an ipython cluster by its profile name or cluster
            directory. Cluster directories contain configuration, log and
            security related files and are named using the convention
            'cluster_<profile>' and should be created using the 'create'
            subcommand of 'ipcluster'. If your cluster directory is in
            the cwd or the ipython directory, you can simply refer to it
            using its profile name, 'ipclusterz start -n 4 -p <profile>',
            otherwise use the '--cluster-dir' option.
            """
        )

        paa = parser_start.add_argument
        paa('-n', '--number',
            type=int, dest='Global.n',
            help='The number of engines to start.',
            metavar='Global.n')
        paa('--clean-logs',
            dest='Global.clean_logs', action='store_true',
            help='Delete old log files before starting.')
        paa('--no-clean-logs',
            dest='Global.clean_logs', action='store_false',
            help="Don't delete old log files before starting.")
        paa('--daemon',
            dest='Global.daemonize', action='store_true',
            help='Daemonize the ipcluster program. This implies --log-to-file')
        paa('--no-daemon',
            dest='Global.daemonize', action='store_false',
            help="Don't daemonize the ipcluster program.")
        paa('--delay',
            type=float, dest='Global.delay',
            help="Specify the delay (in seconds) between starting the controller and starting the engine(s).")

        # The "stop" subcommand parser
        parser_stop = subparsers.add_parser(
            'stop',
            parents=[parent_parser1, parent_parser2],
            argument_default=SUPPRESS,
            help="Stop a running cluster.",
            description=
            """Stop a running ipython cluster by its profile name or cluster
            directory. Cluster directories are named using the convention
            'cluster_<profile>'. If your cluster directory is in
            the cwd or the ipython directory, you can simply refer to it
            using its profile name, 'ipclusterz stop -p <profile>', otherwise
            use the '--cluster-dir' option.
            """
        )
        paa = parser_stop.add_argument
        paa('--signal',
            dest='Global.signal', type=int,
            help="The signal number to use in stopping the cluster (default=2).",
            metavar="Global.signal")

        # The "engines" subcommand parser
        parser_engines = subparsers.add_parser(
            'engines',
            parents=[parent_parser1, parent_parser2],
            argument_default=SUPPRESS,
            help="Attach some engines to an existing controller or cluster.",
            description=
            """Start one or more engines to connect to an existing Cluster
            by profile name or cluster directory.
            Cluster directories contain configuration, log and
            security related files and are named using the convention
            'cluster_<profile>' and should be created using the 'create'
            subcommand of 'ipcluster'. If your cluster directory is in
            the cwd or the ipython directory, you can simply refer to it
            using its profile name, 'ipclusterz engines -n 4 -p <profile>',
            otherwise use the '--cluster-dir' option.
            """
        )
        paa = parser_engines.add_argument
        paa('-n', '--number',
            type=int, dest='Global.n',
            help='The number of engines to start.',
            metavar='Global.n')
        paa('--daemon',
            dest='Global.daemonize', action='store_true',
            help='Daemonize the ipcluster program. This implies --log-to-file')
        paa('--no-daemon',
            dest='Global.daemonize', action='store_false',
            help="Don't daemonize the ipcluster program.")

#-----------------------------------------------------------------------------
# Main application
#-----------------------------------------------------------------------------


class IPClusterApp(ApplicationWithClusterDir):

    name = u'ipclusterz'
    description = _description
    usage = None
    command_line_loader = IPClusterAppConfigLoader
    default_config_file_name = default_config_file_name
    default_log_level = logging.INFO
    auto_create_cluster_dir = False

    def create_default_config(self):
        super(IPClusterApp, self).create_default_config()
        self.default_config.Global.controller_launcher = \
            'IPython.zmq.parallel.launcher.LocalControllerLauncher'
        self.default_config.Global.engine_launcher = \
            'IPython.zmq.parallel.launcher.LocalEngineSetLauncher'
        self.default_config.Global.n = 2
        self.default_config.Global.delay = 2
        self.default_config.Global.reset_config = False
        self.default_config.Global.clean_logs = True
        self.default_config.Global.signal = signal.SIGINT
        self.default_config.Global.daemonize = False

    def find_resources(self):
        subcommand = self.command_line_config.Global.subcommand
        if subcommand == 'list':
            self.list_cluster_dirs()
            # Exit immediately because there is nothing left to do.
            self.exit()
        elif subcommand == 'create':
            self.auto_create_cluster_dir = True
            super(IPClusterApp, self).find_resources()
        elif subcommand == 'start' or subcommand == 'stop':
            self.auto_create_cluster_dir = True
            try:
                super(IPClusterApp, self).find_resources()
            except ClusterDirError:
                raise ClusterDirError(
                    "Could not find a cluster directory. A cluster dir must "
                    "be created before running 'ipclusterz start'. Do "
                    "'ipclusterz create -h' or 'ipclusterz list -h' for more "
                    "information about creating and listing cluster dirs."
                )
        elif subcommand == 'engines':
            self.auto_create_cluster_dir = False
            try:
                super(IPClusterApp, self).find_resources()
            except ClusterDirError:
                raise ClusterDirError(
                    "Could not find a cluster directory. A cluster dir must "
                    "be created before running 'ipclusterz start'. Do "
                    "'ipclusterz create -h' or 'ipclusterz list -h' for more "
                    "information about creating and listing cluster dirs."
                )

    def list_cluster_dirs(self):
        # Find the search paths
        cluster_dir_paths = os.environ.get('IPCLUSTER_DIR_PATH', '')
        if cluster_dir_paths:
            cluster_dir_paths = cluster_dir_paths.split(':')
        else:
            cluster_dir_paths = []
        try:
            ipython_dir = self.command_line_config.Global.ipython_dir
        except AttributeError:
            ipython_dir = self.default_config.Global.ipython_dir
        paths = [os.getcwd(), ipython_dir] + \
            cluster_dir_paths
        paths = list(set(paths))

        self.log.info('Searching for cluster dirs in paths: %r' % paths)
        for path in paths:
            files = os.listdir(path)
            for f in files:
                full_path = os.path.join(path, f)
                if os.path.isdir(full_path) and f.startswith('cluster_'):
                    profile = full_path.split('_')[-1]
                    start_cmd = 'ipclusterz start -p %s -n 4' % profile
                    print start_cmd + " ==> " + full_path

    def pre_construct(self):
        # IPClusterApp.pre_construct() is where we cd to the working directory.
        super(IPClusterApp, self).pre_construct()
        config = self.master_config
        try:
            daemon = config.Global.daemonize
            if daemon:
                config.Global.log_to_file = True
        except AttributeError:
            pass

    def construct(self):
        config = self.master_config
        subcmd = config.Global.subcommand
        reset = config.Global.reset_config
        if subcmd == 'list':
            return
        if subcmd == 'create':
            self.log.info('Copying default config files to cluster directory '
                          '[overwrite=%r]' % (reset,))
            self.cluster_dir_obj.copy_all_config_files(overwrite=reset)
        if subcmd == 'start':
            self.cluster_dir_obj.copy_all_config_files(overwrite=False)
            self.start_logging()
            self.loop = ioloop.IOLoop.instance()
            # reactor.callWhenRunning(self.start_launchers)
            dc = ioloop.DelayedCallback(self.start_launchers, 0, self.loop)
            dc.start()
        if subcmd == 'engines':
            self.start_logging()
            self.loop = ioloop.IOLoop.instance()
            # reactor.callWhenRunning(self.start_launchers)
            engine_only = lambda: self.start_launchers(controller=False)
            dc = ioloop.DelayedCallback(engine_only, 0, self.loop)
            dc.start()

    def start_launchers(self, controller=True):
        config = self.master_config

        # Create the launchers. In both cases, we set the work_dir of
        # the launcher to the cluster_dir. This is where the launcher's
        # subprocesses will be launched. It is not where the controller
        # and engine will be launched.
        if controller:
            cl_class = import_item(config.Global.controller_launcher)
            self.controller_launcher = cl_class(
                work_dir=self.cluster_dir, config=config,
                logname=self.log.name
            )
            # Setup the observing of stopping. If the controller dies, shut
            # everything down as that will be completely fatal for the engines.
            self.controller_launcher.on_stop(self.stop_launchers)
            # But, we don't monitor the stopping of engines. An engine dying
            # is just fine and in principle a user could start a new engine.
            # Also, if we did monitor engine stopping, it is difficult to
            # know what to do when only some engines die. Currently, the
            # observing of engine stopping is inconsistent. Some launchers
            # might trigger on a single engine stopping, others wait until
            # all stop. TODO: think more about how to handle this.
        else:
            self.controller_launcher = None

        el_class = import_item(config.Global.engine_launcher)
        self.engine_launcher = el_class(
            work_dir=self.cluster_dir, config=config, logname=self.log.name
        )

        # Setup signals
        signal.signal(signal.SIGINT, self.sigint_handler)

        # Start the controller and engines
        self._stopping = False  # Make sure stop_launchers is not called 2x.
        if controller:
            self.start_controller()
        dc = ioloop.DelayedCallback(self.start_engines, 1000*config.Global.delay*controller, self.loop)
        dc.start()
        self.startup_message()

    def startup_message(self, r=None):
        self.log.info("IPython cluster: started")
        return r

    def start_controller(self, r=None):
        # self.log.info("In start_controller")
        config = self.master_config
        d = self.controller_launcher.start(
            cluster_dir=config.Global.cluster_dir
        )
        return d

    def start_engines(self, r=None):
        # self.log.info("In start_engines")
        config = self.master_config
        d = self.engine_launcher.start(
            config.Global.n,
            cluster_dir=config.Global.cluster_dir
        )
        return d

    def stop_controller(self, r=None):
        # self.log.info("In stop_controller")
        if self.controller_launcher and self.controller_launcher.running:
            return self.controller_launcher.stop()

    def stop_engines(self, r=None):
        # self.log.info("In stop_engines")
        if self.engine_launcher.running:
            d = self.engine_launcher.stop()
            # d.addErrback(self.log_err)
            return d
        else:
            return None

    def log_err(self, f):
        self.log.error(f.getTraceback())
        return None

    def stop_launchers(self, r=None):
        if not self._stopping:
            self._stopping = True
            # if isinstance(r, failure.Failure):
            #     self.log.error('Unexpected error in ipcluster:')
            #     self.log.info(r.getTraceback())
            self.log.error("IPython cluster: stopping")
            # These return deferreds. We are not doing anything with them
            # but we are holding refs to them as a reminder that they
            # do return deferreds.
            d1 = self.stop_engines()
            d2 = self.stop_controller()
            # Wait a few seconds to let things shut down.
            dc = ioloop.DelayedCallback(self.loop.stop, 4000, self.loop)
            dc.start()
            # reactor.callLater(4.0, reactor.stop)

    def sigint_handler(self, signum, frame):
        self.stop_launchers()

    def start_logging(self):
467 | # Remove old log files of the controller and engine |
|
470 | # Remove old log files of the controller and engine | |
468 | if self.master_config.Global.clean_logs: |
|
471 | if self.master_config.Global.clean_logs: | |
469 | log_dir = self.master_config.Global.log_dir |
|
472 | log_dir = self.master_config.Global.log_dir | |
470 | for f in os.listdir(log_dir): |
|
473 | for f in os.listdir(log_dir): | |
471 | if re.match(r'ip(engine|controller)z-\d+\.(log|err|out)',f): |
|
474 | if re.match(r'ip(engine|controller)z-\d+\.(log|err|out)',f): | |
472 | os.remove(os.path.join(log_dir, f)) |
|
475 | os.remove(os.path.join(log_dir, f)) | |
473 | # This will remove old log files for ipcluster itself |
|
476 | # This will remove old log files for ipcluster itself | |
474 | super(IPClusterApp, self).start_logging() |
|
477 | super(IPClusterApp, self).start_logging() | |
475 |
|
478 | |||
476 | def start_app(self): |
|
479 | def start_app(self): | |
477 | """Start the application, depending on what subcommand is used.""" |
|
480 | """Start the application, depending on what subcommand is used.""" | |
478 | subcmd = self.master_config.Global.subcommand |
|
481 | subcmd = self.master_config.Global.subcommand | |
479 | if subcmd=='create' or subcmd=='list': |
|
482 | if subcmd=='create' or subcmd=='list': | |
480 | return |
|
483 | return | |
481 | elif subcmd=='start': |
|
484 | elif subcmd=='start': | |
482 | self.start_app_start() |
|
485 | self.start_app_start() | |
483 | elif subcmd=='stop': |
|
486 | elif subcmd=='stop': | |
484 | self.start_app_stop() |
|
487 | self.start_app_stop() | |
485 | elif subcmd=='engines': |
|
488 | elif subcmd=='engines': | |
486 | self.start_app_engines() |
|
489 | self.start_app_engines() | |
487 |
|
490 | |||
488 | def start_app_start(self): |
|
491 | def start_app_start(self): | |
489 | """Start the app for the start subcommand.""" |
|
492 | """Start the app for the start subcommand.""" | |
490 | config = self.master_config |
|
493 | config = self.master_config | |
491 | # First see if the cluster is already running |
|
494 | # First see if the cluster is already running | |
492 | try: |
|
495 | try: | |
493 | pid = self.get_pid_from_file() |
|
496 | pid = self.get_pid_from_file() | |
494 | except PIDFileError: |
|
497 | except PIDFileError: | |
495 | pass |
|
498 | pass | |
496 | else: |
|
499 | else: | |
497 | self.log.critical( |
|
500 | self.log.critical( | |
498 | 'Cluster is already running with [pid=%s]. ' |
|
501 | 'Cluster is already running with [pid=%s]. ' | |
499 | 'use "ipclusterz stop" to stop the cluster.' % pid |
|
502 | 'use "ipclusterz stop" to stop the cluster.' % pid | |
500 | ) |
|
503 | ) | |
501 | # Here I exit with a unusual exit status that other processes |
|
504 | # Here I exit with a unusual exit status that other processes | |
502 | # can watch for to learn how I existed. |
|
505 | # can watch for to learn how I existed. | |
503 | self.exit(ALREADY_STARTED) |
|
506 | self.exit(ALREADY_STARTED) | |
504 |
|
507 | |||
505 | # Now log and daemonize |
|
508 | # Now log and daemonize | |
506 | self.log.info( |
|
509 | self.log.info( | |
507 | 'Starting ipclusterz with [daemon=%r]' % config.Global.daemonize |
|
510 | 'Starting ipclusterz with [daemon=%r]' % config.Global.daemonize | |
508 | ) |
|
511 | ) | |
509 | # TODO: Get daemonize working on Windows or as a Windows Server. |
|
512 | # TODO: Get daemonize working on Windows or as a Windows Server. | |
510 | if config.Global.daemonize: |
|
513 | if config.Global.daemonize: | |
511 | if os.name=='posix': |
|
514 | if os.name=='posix': | |
512 | from twisted.scripts._twistd_unix import daemonize |
|
515 | from twisted.scripts._twistd_unix import daemonize | |
513 | daemonize() |
|
516 | daemonize() | |
514 |
|
517 | |||
515 | # Now write the new pid file AFTER our new forked pid is active. |
|
518 | # Now write the new pid file AFTER our new forked pid is active. | |
516 | self.write_pid_file() |
|
519 | self.write_pid_file() | |
517 | try: |
|
520 | try: | |
518 | self.loop.start() |
|
521 | self.loop.start() | |
519 | except: |
|
522 | except KeyboardInterrupt: | |
520 | self.log.info("stopping...") |
|
523 | pass | |
|
524 | except zmq.ZMQError as e: | |||
|
525 | if e.errno == errno.EINTR: | |||
|
526 | pass | |||
|
527 | else: | |||
|
528 | raise | |||
521 | self.remove_pid_file() |
|
529 | self.remove_pid_file() | |
522 |
|
530 | |||
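The change above narrows a bare `except:` around `self.loop.start()` to the two interruptions a clean shutdown actually expects: Ctrl-C (`KeyboardInterrupt`) and a poll interrupted by a signal, which pyzmq surfaces as `ZMQError` with `errno == EINTR`. A minimal standalone sketch of that policy (using a stand-in exception class so it runs without pyzmq):

```python
import errno

class FakeZMQError(Exception):
    """Stand-in for zmq.ZMQError, carrying an errno like the real one."""
    def __init__(self, err):
        super(FakeZMQError, self).__init__(err)
        self.errno = err

def run_loop(start):
    """Run an event loop's start() with the diff's exception policy:
    swallow Ctrl-C and EINTR-interrupted polls, re-raise anything else."""
    try:
        start()
    except KeyboardInterrupt:
        return 'interrupted'
    except FakeZMQError as e:
        if e.errno == errno.EINTR:
            return 'eintr'
        raise  # real errors still propagate
    return 'clean'
```

Unrelated `ZMQError`s (for example an address-in-use error at bind time) still propagate, which the old bare `except:` silently swallowed.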
     def start_app_engines(self):
         """Start the app for the start subcommand."""
         config = self.master_config
         # First see if the cluster is already running

         # Now log and daemonize
         self.log.info(
             'Starting engines with [daemon=%r]' % config.Global.daemonize
         )
         # TODO: Get daemonize working on Windows or as a Windows Server.
         if config.Global.daemonize:
             if os.name=='posix':
                 from twisted.scripts._twistd_unix import daemonize
                 daemonize()

         # Now write the new pid file AFTER our new forked pid is active.
         # self.write_pid_file()
         try:
             self.loop.start()
-        except:
-            self.log.fatal("stopping...")
+        except KeyboardInterrupt:
+            pass
+        except zmq.ZMQError as e:
+            if e.errno == errno.EINTR:
+                pass
+            else:
+                raise
         # self.remove_pid_file()

     def start_app_stop(self):
         """Start the app for the stop subcommand."""
         config = self.master_config
         try:
             pid = self.get_pid_from_file()
         except PIDFileError:
             self.log.critical(
                 'Problem reading pid file, cluster is probably not running.'
             )
             # Here I exit with a unusual exit status that other processes
             # can watch for to learn how I existed.
             self.exit(ALREADY_STOPPED)
         else:
             if os.name=='posix':
                 sig = config.Global.signal
                 self.log.info(
                     "Stopping cluster [pid=%r] with [signal=%r]" % (pid, sig)
                 )
                 os.kill(pid, sig)
             elif os.name=='nt':
                 # As of right now, we don't support daemonize on Windows, so
                 # stop will not do anything. Minimally, it should clean up the
                 # old .pid files.
                 self.remove_pid_file()


 def launch_new_instance():
     """Create and run the IPython cluster."""
     app = IPClusterApp()
     app.start()


 if __name__ == '__main__':
     launch_new_instance()
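`start_app_start` above guards against launching a second cluster by probing the pid file (`get_pid_from_file`) before writing its own (`write_pid_file`). A standalone sketch of that check-then-claim pattern (`acquire_pidfile` is a hypothetical helper, not part of the app):

```python
import os

def acquire_pidfile(path, pid):
    """Return the pid of an already-running instance, or claim the
    pid file with our own pid and return None."""
    if os.path.exists(path):
        with open(path) as f:
            return int(f.read())  # someone else already started
    with open(path, 'w') as f:
        f.write(str(pid))
    return None
```

The probe-then-write is racy between two simultaneous starts (an atomic `os.open` with `O_CREAT|O_EXCL` would close that window), but it mirrors the control flow above, including writing the pid only after daemonizing so the recorded pid is the forked process.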
@@ -1,542 +1,549 b''
 """The Python scheduler for rich scheduling.

 The Pure ZMQ scheduler does not allow routing schemes other than LRU,
 nor does it check msg_id DAG dependencies. For those, a slightly slower
 Python Scheduler exists.
 """

 #----------------------------------------------------------------------
 # Imports
 #----------------------------------------------------------------------

 from __future__ import print_function
 import sys
 import logging
 from random import randint, random
 from types import FunctionType
 from datetime import datetime, timedelta
 try:
     import numpy
 except ImportError:
     numpy = None

 import zmq
 from zmq.eventloop import ioloop, zmqstream

 # local imports
 from IPython.external.decorator import decorator
 # from IPython.config.configurable import Configurable
 from IPython.utils.traitlets import Instance, Dict, List, Set

 import error
 # from client import Client
 from dependency import Dependency
 import streamsession as ss
 from entry_point import connect_logger, local_logger
 from factory import SessionFactory


 @decorator
 def logged(f,self,*args,**kwargs):
     # print ("#--------------------")
     self.log.debug("scheduler::%s(*%s,**%s)"%(f.func_name, args, kwargs))
     # print ("#--")
     return f(self,*args, **kwargs)

 #----------------------------------------------------------------------
 # Chooser functions
 #----------------------------------------------------------------------

 def plainrandom(loads):
     """Plain random pick."""
     n = len(loads)
     return randint(0,n-1)

 def lru(loads):
     """Always pick the front of the line.

     The content of `loads` is ignored.

     Assumes LRU ordering of loads, with oldest first.
     """
     return 0

 def twobin(loads):
     """Pick two at random, use the LRU of the two.

     The content of loads is ignored.

     Assumes LRU ordering of loads, with oldest first.
     """
     n = len(loads)
     a = randint(0,n-1)
     b = randint(0,n-1)
     return min(a,b)

 def weighted(loads):
     """Pick two at random using inverse load as weight.

     Return the less loaded of the two.
     """
     # weight 0 a million times more than 1:
     weights = 1./(1e-6+numpy.array(loads))
     sums = weights.cumsum()
     t = sums[-1]
     x = random()*t
     y = random()*t
     idx = 0
     idy = 0
     while sums[idx] < x:
         idx += 1
     while sums[idy] < y:
         idy += 1
     if weights[idy] > weights[idx]:
         return idy
     else:
         return idx

 def leastload(loads):
     """Always choose the lowest load.

     If the lowest load occurs more than once, the first
     occurance will be used. If loads has LRU ordering, this means
     the LRU of those with the lowest load is chosen.
     """
     return loads.index(min(loads))

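The chooser functions above trade knowledge for speed: `plainrandom` and `twobin` ignore the load values entirely, while `leastload` scans them on every dispatch. A standalone simulation (re-implemented here without numpy or the scheduler's LRU list re-ordering, so the details are illustrative) shows how they distribute tasks:

```python
import random

def twobin(loads):
    """Two random picks; keep the lower index (the 'older' engine)."""
    a = random.randrange(len(loads))
    b = random.randrange(len(loads))
    return min(a, b)

def leastload(loads):
    """Always the engine with the smallest current load."""
    return loads.index(min(loads))

def simulate(chooser, tasks=1000, engines=4, seed=0):
    """Assign `tasks` one-unit jobs via `chooser`, return per-engine totals."""
    random.seed(seed)
    loads = [0] * engines
    for _ in range(tasks):
        loads[chooser(loads)] += 1
    return loads
```

`leastload` lands within one task of perfect balance; `twobin` in this stripped-down form skews toward low indices, because unlike in the scheduler the list is never re-ordered LRU-style between picks.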
 #---------------------------------------------------------------------
 # Classes
 #---------------------------------------------------------------------
 # store empty default dependency:
 MET = Dependency([])

 class TaskScheduler(SessionFactory):
     """Python TaskScheduler object.

     This is the simplest object that supports msg_id based
     DAG dependencies. *Only* task msg_ids are checked, not
     msg_ids of jobs submitted via the MUX queue.

     """

     # input arguments:
     scheme = Instance(FunctionType, default=leastload) # function for determining the destination
     client_stream = Instance(zmqstream.ZMQStream) # client-facing stream
     engine_stream = Instance(zmqstream.ZMQStream) # engine-facing stream
     notifier_stream = Instance(zmqstream.ZMQStream) # hub-facing sub stream
     mon_stream = Instance(zmqstream.ZMQStream) # hub-facing pub stream

     # internals:
-
+    graph = Dict() # dict by msg_id of [ msg_ids that depend on key ]
     depending = Dict() # dict by msg_id of (msg_id, raw_msg, after, follow)
     pending = Dict() # dict by engine_uuid of submitted tasks
     completed = Dict() # dict by engine_uuid of completed tasks
     failed = Dict() # dict by engine_uuid of failed tasks
     destinations = Dict() # dict by msg_id of engine_uuids where jobs ran (reverse of completed+failed)
     clients = Dict() # dict by msg_id for who submitted the task
     targets = List() # list of target IDENTs
     loads = List() # list of engine loads
     all_completed = Set() # set of all completed tasks
     all_failed = Set() # set of all failed tasks
     all_done = Set() # set of all finished tasks=union(completed,failed)
+    all_ids = Set() # set of all submitted task IDs
     blacklist = Dict() # dict by msg_id of locations where a job has encountered UnmetDependency
     auditor = Instance('zmq.eventloop.ioloop.PeriodicCallback')


     def start(self):
         self.engine_stream.on_recv(self.dispatch_result, copy=False)
         self._notification_handlers = dict(
             registration_notification = self._register_engine,
             unregistration_notification = self._unregister_engine
         )
         self.notifier_stream.on_recv(self.dispatch_notification)
         self.auditor = ioloop.PeriodicCallback(self.audit_timeouts, 2e3, self.loop) # 1 Hz
         self.auditor.start()
         self.log.info("Scheduler started...%r"%self)

     def resume_receiving(self):
         """Resume accepting jobs."""
         self.client_stream.on_recv(self.dispatch_submission, copy=False)

     def stop_receiving(self):
         """Stop accepting jobs while there are no engines.
         Leave them in the ZMQ queue."""
         self.client_stream.on_recv(None)

     #-----------------------------------------------------------------------
     # [Un]Registration Handling
     #-----------------------------------------------------------------------

     def dispatch_notification(self, msg):
         """dispatch register/unregister events."""
         idents,msg = self.session.feed_identities(msg)
         msg = self.session.unpack_message(msg)
         msg_type = msg['msg_type']
         handler = self._notification_handlers.get(msg_type, None)
         if handler is None:
             raise Exception("Unhandled message type: %s"%msg_type)
         else:
             try:
                 handler(str(msg['content']['queue']))
             except KeyError:
                 self.log.error("task::Invalid notification msg: %s"%msg)

     @logged
     def _register_engine(self, uid):
         """New engine with ident `uid` became available."""
         # head of the line:
         self.targets.insert(0,uid)
         self.loads.insert(0,0)
         # initialize sets
         self.completed[uid] = set()
         self.failed[uid] = set()
         self.pending[uid] = {}
         if len(self.targets) == 1:
             self.resume_receiving()

     def _unregister_engine(self, uid):
         """Existing engine with ident `uid` became unavailable."""
         if len(self.targets) == 1:
             # this was our only engine
             self.stop_receiving()

         # handle any potentially finished tasks:
         self.engine_stream.flush()

         self.completed.pop(uid)
         self.failed.pop(uid)
         # don't pop destinations, because it might be used later
         # map(self.destinations.pop, self.completed.pop(uid))
         # map(self.destinations.pop, self.failed.pop(uid))

         idx = self.targets.index(uid)
         self.targets.pop(idx)
         self.loads.pop(idx)

         # wait 5 seconds before cleaning up pending jobs, since the results might
         # still be incoming
         if self.pending[uid]:
             dc = ioloop.DelayedCallback(lambda : self.handle_stranded_tasks(uid), 5000, self.loop)
             dc.start()

     @logged
     def handle_stranded_tasks(self, engine):
         """Deal with jobs resident in an engine that died."""
         lost = self.pending.pop(engine)

         for msg_id, (raw_msg,follow) in lost.iteritems():
             self.all_failed.add(msg_id)
             self.all_done.add(msg_id)
             idents,msg = self.session.feed_identities(raw_msg, copy=False)
             msg = self.session.unpack_message(msg, copy=False, content=False)
             parent = msg['header']
             idents = [idents[0],engine]+idents[1:]
             print (idents)
             try:
                 raise error.EngineError("Engine %r died while running task %r"%(engine, msg_id))
             except:
                 content = ss.wrap_exception()
             msg = self.session.send(self.client_stream, 'apply_reply', content,
                                 parent=parent, ident=idents)
             self.session.send(self.mon_stream, msg, ident=['outtask']+idents)
-            self.update_
+            self.update_graph(msg_id)


     #-----------------------------------------------------------------------
     # Job Submission
     #-----------------------------------------------------------------------
     @logged
     def dispatch_submission(self, raw_msg):
         """Dispatch job submission to appropriate handlers."""
         # ensure targets up to date:
         self.notifier_stream.flush()
         try:
             idents, msg = self.session.feed_identities(raw_msg, copy=False)
-        except Exception as e:
-            self.log.error("task::Invaid msg: %s"%msg)
+            msg = self.session.unpack_message(msg, content=False, copy=False)
+        except:
+            self.log.error("task::Invaid task: %s"%raw_msg, exc_info=True)
             return

         # send to monitor
         self.mon_stream.send_multipart(['intask']+raw_msg, copy=False)

-        msg = self.session.unpack_message(msg, content=False, copy=False)
         header = msg['header']
         msg_id = header['msg_id']
+        self.all_ids.add(msg_id)

         # time dependencies
         after = Dependency(header.get('after', []))
-        if after.
+        if after.all:
             after.difference_update(self.all_completed)
             if not after.success_only:
                 after.difference_update(self.all_failed)
         if after.check(self.all_completed, self.all_failed):
             # recast as empty set, if `after` already met,
             # to prevent unnecessary set comparisons
             after = MET

         # location dependencies
         follow = Dependency(header.get('follow', []))
-        # check if unreachable:
-        if after.unreachable(self.all_failed) or follow.unreachable(self.all_failed):
-            self.depending[msg_id] = [raw_msg,MET,MET,None]
-            return self.fail_unreachable(msg_id)
+
+        for dep in after,follow:
+            # check valid:
+            if msg_id in dep or dep.difference(self.all_ids):
+                self.depending[msg_id] = [raw_msg,MET,MET,None]
+                return self.fail_unreachable(msg_id, error.InvalidDependency)
+            # check if unreachable:
+            if dep.unreachable(self.all_failed):
+                self.depending[msg_id] = [raw_msg,MET,MET,None]
+                return self.fail_unreachable(msg_id)

284 | # turn timeouts into datetime objects: |
|
292 | # turn timeouts into datetime objects: | |
285 | timeout = header.get('timeout', None) |
|
293 | timeout = header.get('timeout', None) | |
286 | if timeout: |
|
294 | if timeout: | |
287 | timeout = datetime.now() + timedelta(0,timeout,0) |
|
295 | timeout = datetime.now() + timedelta(0,timeout,0) | |
288 |
|
296 | |||
289 | if after.check(self.all_completed, self.all_failed): |
|
297 | if after.check(self.all_completed, self.all_failed): | |
290 | # time deps already met, try to run |
|
298 | # time deps already met, try to run | |
291 | if not self.maybe_run(msg_id, raw_msg, follow): |
|
299 | if not self.maybe_run(msg_id, raw_msg, follow, timeout): | |
292 | # can't run yet |
|
300 | # can't run yet | |
293 | self.save_unmet(msg_id, raw_msg, after, follow, timeout) |
|
301 | self.save_unmet(msg_id, raw_msg, after, follow, timeout) | |
294 | else: |
|
302 | else: | |
295 | self.save_unmet(msg_id, raw_msg, after, follow, timeout) |
|
303 | self.save_unmet(msg_id, raw_msg, after, follow, timeout) | |
296 |
|
304 | |||
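The `after`/`follow` handling above leans on the set semantics of the `Dependency` class. A rough, illustrative stand-in (`FakeDependency` is hypothetical, not the real `IPython.zmq.parallel.dependency.Dependency`; `all` and `success_only` mirror the attributes used above) might look like:

```python
# Hypothetical sketch of Dependency's set semantics, as used by
# dispatch_submission above. Not the real IPython class.

class FakeDependency(set):
    def __init__(self, ids, all=True, success_only=False):
        set.__init__(self, ids)
        self.all = all                    # must *all* ids finish?
        self.success_only = success_only  # do failures count as "finished"?

    def check(self, completed, failed=None):
        """True if this dependency is satisfied by the given result sets."""
        if len(self) == 0:
            return True
        relevant = set(completed)
        if not self.success_only:
            relevant |= set(failed or ())
        if self.all:
            return self.issubset(relevant)
        return bool(self.intersection(relevant))

    def unreachable(self, failed):
        """True if enough jobs failed that this can never be met."""
        if self.all and self.success_only:
            return bool(self.intersection(failed))
        return False

dep = FakeDependency(['a', 'b'], success_only=True)
print(dep.check({'a', 'b'}, set()))  # both succeeded: met
print(dep.unreachable({'b'}))        # 'b' failed: can never be met
```

This is why `dispatch_submission` can recast an already-met `after` as the empty `MET` set: an empty dependency is trivially satisfied, so later `check()` calls short-circuit.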
    # @logged
    def audit_timeouts(self):
        """Audit all waiting tasks for expired timeouts."""
        now = datetime.now()
        for msg_id in self.depending.keys():
            # must recheck, in case one failure cascaded to another:
            if msg_id in self.depending:
                raw,after,follow,timeout = self.depending[msg_id]
                if timeout and timeout < now:
                    # pass a DependencyTimeout class, matching the
                    # `why` parameter of fail_unreachable below
                    self.fail_unreachable(msg_id, error.DependencyTimeout)

    @logged
    def fail_unreachable(self, msg_id, why=error.ImpossibleDependency):
        """a message has become unreachable"""
        if msg_id not in self.depending:
            self.log.error("msg %r already failed!"%msg_id)
            return
        raw_msg, after, follow, timeout = self.depending.pop(msg_id)
        for mid in follow.union(after):
            if mid in self.graph:
                self.graph[mid].remove(msg_id)

        # FIXME: unpacking a message I've already unpacked, but didn't save:
        idents,msg = self.session.feed_identities(raw_msg, copy=False)
        msg = self.session.unpack_message(msg, copy=False, content=False)
        header = msg['header']

        try:
            raise why()
        except:
            content = ss.wrap_exception()

        self.all_done.add(msg_id)
        self.all_failed.add(msg_id)

        msg = self.session.send(self.client_stream, 'apply_reply', content,
                                parent=header, ident=idents)
        self.session.send(self.mon_stream, msg, ident=['outtask']+idents)

        self.update_graph(msg_id, success=False)

    @logged
    def maybe_run(self, msg_id, raw_msg, follow=None, timeout=None):
        """check location dependencies, and run if they are met."""

        if follow:
            def can_run(idx):
                target = self.targets[idx]
                return target not in self.blacklist.get(msg_id, []) and\
                        follow.check(self.completed[target], self.failed[target])

            indices = filter(can_run, range(len(self.targets)))
            if not indices:
                if follow.all:
                    dests = set()
                    relevant = self.all_completed if follow.success_only else self.all_done
                    for m in follow.intersection(relevant):
                        dests.add(self.destinations[m])
                    if len(dests) > 1:
                        self.fail_unreachable(msg_id)

                return False
        else:
            indices = None

        self.submit_task(msg_id, raw_msg, follow, timeout, indices)
        return True

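The filtering step in `maybe_run` can be sketched with made-up data: engines that already blacklisted the job, or whose completed set does not satisfy the `follow` dependency, are excluded before load balancing (names here are illustrative, not the scheduler's real state):

```python
# Toy version of maybe_run's can_run filter. A job that must run where
# 'depA' ran, and that already failed on engine1, can only go to engine0.

targets = ['engine0', 'engine1', 'engine2']
blacklist = {'job1': {'engine1'}}      # job1 already failed on engine1
completed = {
    'engine0': {'depA'},
    'engine1': {'depA'},
    'engine2': set(),
}
follow = {'depA'}                      # location dependency

def can_run(idx, msg_id='job1'):
    target = targets[idx]
    return (target not in blacklist.get(msg_id, set())
            and follow.issubset(completed[target]))

indices = [i for i in range(len(targets)) if can_run(i)]
print(indices)  # [0]: engine1 is blacklisted, engine2 lacks depA
```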
    @logged
    def save_unmet(self, msg_id, raw_msg, after, follow, timeout):
        """Save a message for later submission when its dependencies are met."""
        self.depending[msg_id] = [raw_msg,after,follow,timeout]
        # track the ids in follow or after, but not those already finished
        for dep_id in after.union(follow).difference(self.all_done):
            if dep_id not in self.graph:
                self.graph[dep_id] = set()
            self.graph[dep_id].add(msg_id)

    @logged
    def submit_task(self, msg_id, raw_msg, follow, timeout, indices=None):
        """Submit a task to any of a subset of our targets."""
        if indices:
            loads = [self.loads[i] for i in indices]
        else:
            loads = self.loads
        idx = self.scheme(loads)
        if indices:
            idx = indices[idx]
        target = self.targets[idx]
        # print (target, map(str, msg[:3]))
        self.engine_stream.send(target, flags=zmq.SNDMORE, copy=False)
        self.engine_stream.send_multipart(raw_msg, copy=False)
        self.add_job(idx)
        self.pending[target][msg_id] = (raw_msg, follow, timeout)
        content = dict(msg_id=msg_id, engine_id=target)
        self.session.send(self.mon_stream, 'task_destination', content=content,
                        ident=['tracktask',self.session.session])

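The subset-scheduling step above is easy to get wrong: the scheme picks an index into the *filtered* loads, which must then be mapped back to a real target index via `indices[idx]`. A self-contained sketch (the `least_loaded` scheme here is a stand-in for the pluggable scheme function, not the scheduler's real `lru`):

```python
# Sketch of submit_task's subset load balancing with made-up loads.

loads = [3, 1, 2, 0]          # outstanding jobs per target
indices = [1, 3]              # only these targets satisfy `follow`

def least_loaded(loads):
    """Stand-in scheme: pick the index of the smallest load."""
    return loads.index(min(loads))

sub_loads = [loads[i] for i in indices]   # [1, 0]
idx = least_loaded(sub_loads)             # 1, an index into sub_loads
idx = indices[idx]                        # map back: real target 3
print(idx)  # 3
```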
    #-----------------------------------------------------------------------
    # Result Handling
    #-----------------------------------------------------------------------
    @logged
    def dispatch_result(self, raw_msg):
        try:
            idents,msg = self.session.feed_identities(raw_msg, copy=False)
            msg = self.session.unpack_message(msg, content=False, copy=False)
        except:
            self.log.error("task::Invalid result: %s"%raw_msg, exc_info=True)
            return

        header = msg['header']
        if header.get('dependencies_met', True):
            success = (header['status'] == 'ok')
            self.handle_result(idents, msg['parent_header'], raw_msg, success)
            # send to Hub monitor
            self.mon_stream.send_multipart(['outtask']+raw_msg, copy=False)
        else:
            self.handle_unmet_dependency(idents, msg['parent_header'])

    @logged
    def handle_result(self, idents, parent, raw_msg, success=True):
        # first, relay result to client
        engine = idents[0]
        client = idents[1]
        # swap_ids for XREP-XREP mirror
        raw_msg[:2] = [client,engine]
        # print (map(str, raw_msg[:4]))
        self.client_stream.send_multipart(raw_msg, copy=False)
        # now, update our data structures
        msg_id = parent['msg_id']
        self.blacklist.pop(msg_id, None)
        self.pending[engine].pop(msg_id)
        if success:
            self.completed[engine].add(msg_id)
            self.all_completed.add(msg_id)
        else:
            self.failed[engine].add(msg_id)
            self.all_failed.add(msg_id)
        self.all_done.add(msg_id)
        self.destinations[msg_id] = engine

        self.update_graph(msg_id, success)

    @logged
    def handle_unmet_dependency(self, idents, parent):
        engine = idents[0]
        msg_id = parent['msg_id']
        if msg_id not in self.blacklist:
            self.blacklist[msg_id] = set()
        self.blacklist[msg_id].add(engine)
        raw_msg,follow,timeout = self.pending[engine].pop(msg_id)
        if not self.maybe_run(msg_id, raw_msg, follow, timeout):
            # resubmit failed, put it back in our dependency tree
            self.save_unmet(msg_id, raw_msg, MET, follow, timeout)

    @logged
    def update_graph(self, dep_id, success=True):
        """dep_id just finished. Update our dependency
        table and submit any jobs that just became runnable."""
        # print ("\n\n***********")
        # pprint (dep_id)
        # pprint (self.graph)
        # pprint (self.depending)
        # pprint (self.all_completed)
        # pprint (self.all_failed)
        # print ("\n\n***********\n\n")
        if dep_id not in self.graph:
            return
        jobs = self.graph.pop(dep_id)

        for msg_id in jobs:
            raw_msg, after, follow, timeout = self.depending[msg_id]
            # if dep_id in after:
            #     if after.all and (success or not after.success_only):
            #         after.remove(dep_id)

            if after.unreachable(self.all_failed) or follow.unreachable(self.all_failed):
                self.fail_unreachable(msg_id)

            elif after.check(self.all_completed, self.all_failed): # time deps met, maybe run
                self.depending[msg_id][1] = MET
                if self.maybe_run(msg_id, raw_msg, follow, timeout):
                    self.depending.pop(msg_id)
                    for mid in follow.union(after):
                        if mid in self.graph:
                            self.graph[mid].remove(msg_id)

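The `save_unmet`/`update_graph` pair above maintains a reverse dependency graph: `graph` maps each unfinished `dep_id` to the set of `msg_id`s waiting on it, so a finished job only touches its own waiters. A toy version of that bookkeeping (simplified to plain sets, without the scheduler's `after`/`follow` split):

```python
# Toy reverse dependency graph, mirroring save_unmet / update_graph.

graph = {}        # dep_id -> set of waiting msg_ids
depending = {}    # msg_id -> its remaining dependency ids

def save_unmet(msg_id, deps, all_done):
    depending[msg_id] = set(deps)
    for dep_id in set(deps) - all_done:
        graph.setdefault(dep_id, set()).add(msg_id)

def update_graph(dep_id, all_done):
    """dep_id just finished: return msg_ids whose deps are now all met."""
    ready = []
    for msg_id in graph.pop(dep_id, set()):
        if depending[msg_id] <= all_done:
            ready.append(msg_id)
            del depending[msg_id]
    return ready

all_done = set()
save_unmet('jobB', {'jobA'}, all_done)
all_done.add('jobA')
runnable = update_graph('jobA', all_done)
print(runnable)  # ['jobB']
```

Popping `dep_id` from `graph` is what keeps the loop in the real `update_graph` O(waiters) per finished job instead of scanning every pending task.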
    #----------------------------------------------------------------------
    # methods to be overridden by subclasses
    #----------------------------------------------------------------------

    def add_job(self, idx):
        """Called after self.targets[idx] just got the job with header.
        Override with subclasses. The default ordering is simple LRU.
        The default loads are the number of outstanding jobs."""
        self.loads[idx] += 1
        for lis in (self.targets, self.loads):
            lis.append(lis.pop(idx))

    def finish_job(self, idx):
        """Called after self.targets[idx] just finished a job.
        Override with subclasses."""
        self.loads[idx] -= 1


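The "simple LRU" ordering in `add_job`'s docstring works by rotation: the chosen index is popped from both parallel lists and appended at the end, so the front of the list is always the least recently used engine. A standalone sketch with made-up engine names:

```python
# Sketch of add_job's LRU rotation over parallel targets/loads lists.

targets = ['e0', 'e1', 'e2']
loads = [0, 0, 0]             # outstanding jobs per target

def add_job(idx):
    loads[idx] += 1
    # rotate the chosen entry to the back of both lists
    for lis in (targets, loads):
        lis.append(lis.pop(idx))

add_job(0)        # give e0 a job; e0 rotates to the back
print(targets)    # ['e1', 'e2', 'e0']
print(loads)      # [0, 0, 1]
```

Because both lists are rotated together, `loads[i]` always describes `targets[i]`, and a scheme that favors early indices naturally prefers idle, long-unused engines.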
def launch_scheduler(in_addr, out_addr, mon_addr, not_addr, config=None, logname='ZMQ',
                        log_addr=None, loglevel=logging.DEBUG, scheme='lru'):
    from zmq.eventloop import ioloop
    from zmq.eventloop.zmqstream import ZMQStream

    ctx = zmq.Context()
    loop = ioloop.IOLoop()
    print (in_addr, out_addr, mon_addr, not_addr)
    ins = ZMQStream(ctx.socket(zmq.XREP),loop)
    ins.bind(in_addr)
    outs = ZMQStream(ctx.socket(zmq.XREP),loop)
    outs.bind(out_addr)
    mons = ZMQStream(ctx.socket(zmq.PUB),loop)
    mons.connect(mon_addr)
    nots = ZMQStream(ctx.socket(zmq.SUB),loop)
    nots.setsockopt(zmq.SUBSCRIBE, '')
    nots.connect(not_addr)

    scheme = globals().get(scheme, None)
    # setup logging
    if log_addr:
        connect_logger(logname, ctx, log_addr, root="scheduler", loglevel=loglevel)
    else:
        local_logger(logname, loglevel)

    scheduler = TaskScheduler(client_stream=ins, engine_stream=outs,
                            mon_stream=mons, notifier_stream=nots,
                            scheme=scheme, loop=loop, logname=logname,
                            config=config)
    scheduler.start()
    try:
        loop.start()
    except KeyboardInterrupt:
        print ("interrupted, exiting...", file=sys.__stderr__)
@@ -1,355 +1,360 b'' | |||||
"""Views of remote engines"""
#-----------------------------------------------------------------------------
#  Copyright (C) 2010  The IPython Development Team
#
#  Distributed under the terms of the BSD License.  The full license is in
#  the file COPYING, distributed as part of this software.
#-----------------------------------------------------------------------------

#-----------------------------------------------------------------------------
# Imports
#-----------------------------------------------------------------------------

from IPython.external.decorator import decorator
from IPython.zmq.parallel.remotefunction import ParallelFunction, parallel

#-----------------------------------------------------------------------------
# Decorators
#-----------------------------------------------------------------------------

@decorator
def myblock(f, self, *args, **kwargs):
    """override client.block with self.block during a call"""
    block = self.client.block
    self.client.block = self.block
    try:
        ret = f(self, *args, **kwargs)
    finally:
        self.client.block = block
    return ret

@decorator
def save_ids(f, self, *args, **kwargs):
    """Keep our history and outstanding attributes up to date after a method call."""
    n_previous = len(self.client.history)
    ret = f(self, *args, **kwargs)
    nmsgs = len(self.client.history) - n_previous
    msg_ids = self.client.history[-nmsgs:]
    self.history.extend(msg_ids)
    map(self.outstanding.add, msg_ids)
    return ret

@decorator
def sync_results(f, self, *args, **kwargs):
    """sync relevant results from self.client to our results attribute."""
    ret = f(self, *args, **kwargs)
    delta = self.outstanding.difference(self.client.outstanding)
    completed = self.outstanding.intersection(delta)
    self.outstanding = self.outstanding.difference(completed)
    for msg_id in completed:
        self.results[msg_id] = self.client.results[msg_id]
    return ret

@decorator
def spin_after(f, self, *args, **kwargs):
    """call spin after the method."""
    ret = f(self, *args, **kwargs)
    self.spin()
    return ret

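The decorators above all follow one bookkeeping pattern: run the wrapped method, then diff the client's state before and after the call. A generic, self-contained sketch of the `save_ids` variant (`functools.wraps` stands in for `IPython.external.decorator`, and `FakeView` is an illustrative stub, not the real `View`):

```python
# Generic sketch of the save_ids bookkeeping-decorator pattern.

import functools

def save_ids(f):
    @functools.wraps(f)
    def wrapper(self, *args, **kwargs):
        n_previous = len(self.client_history)
        ret = f(self, *args, **kwargs)
        # whatever the client appended during the call is "ours"
        new_ids = self.client_history[n_previous:]
        self.history.extend(new_ids)
        self.outstanding.update(new_ids)
        return ret
    return wrapper

class FakeView:
    def __init__(self):
        self.client_history = []   # shared, client-wide history
        self.history = []          # this view's own history
        self.outstanding = set()

    @save_ids
    def apply(self, n_msgs):
        # pretend the client submitted n_msgs new requests
        self.client_history.extend('msg%i' % i for i in range(n_msgs))

v = FakeView()
v.apply(2)
print(v.history)       # ['msg0', 'msg1']
```

Capturing the length *before* the call means the wrapper attributes only the messages this particular call produced, even if other views share the same client.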
#-----------------------------------------------------------------------------
# Classes
#-----------------------------------------------------------------------------

class View(object):
    """Base View class for more convenient apply(f,*args,**kwargs) syntax via attributes.

    Don't use this class, use subclasses.
    """
    _targets = None
    block=None
    bound=None
    history=None

    def __init__(self, client, targets=None):
        self.client = client
        self._targets = targets
        self._ntargets = 1 if isinstance(targets, (int,type(None))) else len(targets)
        self.block = client.block
        self.bound=False
        self.history = []
        self.outstanding = set()
        self.results = {}

    def __repr__(self):
        strtargets = str(self._targets)
        if len(strtargets) > 16:
            strtargets = strtargets[:12]+'...]'
        return "<%s %s>"%(self.__class__.__name__, strtargets)

    @property
    def targets(self):
        return self._targets

    @targets.setter
    def targets(self, value):
        self._targets = value
        # raise AttributeError("Cannot set my targets argument after construction!")

    @sync_results
    def spin(self):
        """spin the client, and sync"""
        self.client.spin()

    @sync_results
    @save_ids
    def apply(self, f, *args, **kwargs):
        """calls f(*args, **kwargs) on remote engines, returning the result.

        This method does not involve the engine's namespace.

        if self.block is False:
            returns msg_id
        else:
            returns actual result of f(*args, **kwargs)
        """
        return self.client.apply(f, args, kwargs, block=self.block, targets=self.targets, bound=self.bound)

    @save_ids
    def apply_async(self, f, *args, **kwargs):
        """calls f(*args, **kwargs) on remote engines in a nonblocking manner.

        This method does not involve the engine's namespace.

        returns msg_id
        """
        return self.client.apply(f,args,kwargs, block=False, targets=self.targets, bound=False)

    @spin_after
    @save_ids
    def apply_sync(self, f, *args, **kwargs):
        """calls f(*args, **kwargs) on remote engines in a blocking manner,
        returning the result.

        This method does not involve the engine's namespace.

        returns: actual result of f(*args, **kwargs)
        """
        return self.client.apply(f,args,kwargs, block=True, targets=self.targets, bound=False)

    @sync_results
    @save_ids
    def apply_bound(self, f, *args, **kwargs):
        """calls f(*args, **kwargs) bound to engine namespace(s).

        if self.block is False:
            returns msg_id
        else:
            returns actual result of f(*args, **kwargs)

        This method has access to the targets' globals

        """
        return self.client.apply(f, args, kwargs, block=self.block, targets=self.targets, bound=True)

    @sync_results
    @save_ids
    def apply_async_bound(self, f, *args, **kwargs):
        """calls f(*args, **kwargs) bound to engine namespace(s)
|
158 | """calls f(*args, **kwargs) bound to engine namespace(s) | |
159 | in a nonblocking manner. |
|
159 | in a nonblocking manner. | |
160 |
|
160 | |||
161 | returns: msg_id |
|
161 | returns: msg_id | |
162 |
|
162 | |||
163 | This method has access to the targets' globals |
|
163 | This method has access to the targets' globals | |
164 |
|
164 | |||
165 | """ |
|
165 | """ | |
166 | return self.client.apply(f, args, kwargs, block=False, targets=self.targets, bound=True) |
|
166 | return self.client.apply(f, args, kwargs, block=False, targets=self.targets, bound=True) | |
167 |
|
167 | |||
168 | @spin_after |
|
168 | @spin_after | |
169 | @save_ids |
|
169 | @save_ids | |
170 | def apply_sync_bound(self, f, *args, **kwargs): |
|
170 | def apply_sync_bound(self, f, *args, **kwargs): | |
171 | """calls f(*args, **kwargs) bound to engine namespace(s), waiting for the result. |
|
171 | """calls f(*args, **kwargs) bound to engine namespace(s), waiting for the result. | |
172 |
|
172 | |||
173 | returns: actual result of f(*args, **kwargs) |
|
173 | returns: actual result of f(*args, **kwargs) | |
174 |
|
174 | |||
175 | This method has access to the targets' globals |
|
175 | This method has access to the targets' globals | |
176 |
|
176 | |||
177 | """ |
|
177 | """ | |
178 | return self.client.apply(f, args, kwargs, block=True, targets=self.targets, bound=True) |
|
178 | return self.client.apply(f, args, kwargs, block=True, targets=self.targets, bound=True) | |
179 |
|
179 | |||
180 | @spin_after |
|
180 | @spin_after | |
181 | @save_ids |
|
181 | @save_ids | |
182 | def map(self, f, *sequences): |
|
182 | def map(self, f, *sequences): | |
183 | """Parallel version of builtin `map`, using this view's engines.""" |
|
183 | """Parallel version of builtin `map`, using this view's engines.""" | |
184 | if isinstance(self.targets, int): |
|
184 | if isinstance(self.targets, int): | |
185 | targets = [self.targets] |
|
185 | targets = [self.targets] | |
186 | else: |
|
186 | else: | |
187 | targets = self.targets |
|
187 | targets = self.targets | |
188 | pf = ParallelFunction(self.client, f, block=self.block, |
|
188 | pf = ParallelFunction(self.client, f, block=self.block, | |
189 | bound=True, targets=targets) |
|
189 | bound=True, targets=targets) | |
190 | return pf.map(*sequences) |
|
190 | return pf.map(*sequences) | |
191 |
|
191 | |||
192 | def parallel(self, bound=True, block=True): |
|
192 | def parallel(self, bound=True, block=True): | |
193 | """Decorator for making a ParallelFunction""" |
|
193 | """Decorator for making a ParallelFunction""" | |
194 | return parallel(self.client, bound=bound, targets=self.targets, block=block) |
|
194 | return parallel(self.client, bound=bound, targets=self.targets, block=block) | |
195 |
|
195 | |||
196 | def abort(self, msg_ids=None, block=None): |
|
196 | def abort(self, msg_ids=None, block=None): | |
197 | """Abort jobs on my engines. |
|
197 | """Abort jobs on my engines. | |
198 |
|
198 | |||
199 | Parameters |
|
199 | Parameters | |
200 | ---------- |
|
200 | ---------- | |
201 |
|
201 | |||
202 | msg_ids : None, str, list of strs, optional |
|
202 | msg_ids : None, str, list of strs, optional | |
203 | if None: abort all jobs. |
|
203 | if None: abort all jobs. | |
204 | else: abort specific msg_id(s). |
|
204 | else: abort specific msg_id(s). | |
205 | """ |
|
205 | """ | |
206 | block = block if block is not None else self.block |
|
206 | block = block if block is not None else self.block | |
207 | return self.client.abort(msg_ids=msg_ids, targets=self.targets, block=block) |
|
207 | return self.client.abort(msg_ids=msg_ids, targets=self.targets, block=block) | |
208 |
|
208 | |||
209 | def queue_status(self, verbose=False): |
|
209 | def queue_status(self, verbose=False): | |
210 | """Fetch the Queue status of my engines""" |
|
210 | """Fetch the Queue status of my engines""" | |
211 | return self.client.queue_status(targets=self.targets, verbose=verbose) |
|
211 | return self.client.queue_status(targets=self.targets, verbose=verbose) | |
212 |
|
212 | |||
213 | def purge_results(self, msg_ids=[], targets=[]): |
|
213 | def purge_results(self, msg_ids=[], targets=[]): | |
214 | """Instruct the controller to forget specific results.""" |
|
214 | """Instruct the controller to forget specific results.""" | |
215 | if targets is None or targets == 'all': |
|
215 | if targets is None or targets == 'all': | |
216 | targets = self.targets |
|
216 | targets = self.targets | |
217 | return self.client.purge_results(msg_ids=msg_ids, targets=targets) |
|
217 | return self.client.purge_results(msg_ids=msg_ids, targets=targets) | |
218 |
|
218 | |||
219 |
|
219 | |||
220 |
|
220 | |||
221 | class DirectView(View): |
|
221 | class DirectView(View): | |
222 | """Direct Multiplexer View of one or more engines. |
|
222 | """Direct Multiplexer View of one or more engines. | |
223 |
|
223 | |||
224 | These are created via indexed access to a client: |
|
224 | These are created via indexed access to a client: | |
225 |
|
225 | |||
226 | >>> dv_1 = client[1] |
|
226 | >>> dv_1 = client[1] | |
227 | >>> dv_all = client[:] |
|
227 | >>> dv_all = client[:] | |
228 | >>> dv_even = client[::2] |
|
228 | >>> dv_even = client[::2] | |
229 | >>> dv_some = client[1:3] |
|
229 | >>> dv_some = client[1:3] | |
230 |
|
230 | |||
231 | This object provides dictionary access |
|
231 | This object provides dictionary access to engine namespaces: | |
|
232 | ||||
|
233 | # push a=5: | |||
|
234 | >>> dv['a'] = 5 | |||
|
235 | # pull 'foo': | |||
|
236 | >>> dv['foo'] | |||
232 |
|
237 | |||
233 | """ |
|
238 | """ | |
234 |
|
239 | |||
235 | @sync_results |
|
240 | @sync_results | |
236 | @save_ids |
|
241 | @save_ids | |
237 | def execute(self, code, block=True): |
|
242 | def execute(self, code, block=True): | |
238 | """execute some code on my targets.""" |
|
243 | """execute some code on my targets.""" | |
239 | return self.client.execute(code, block=self.block, targets=self.targets) |
|
244 | return self.client.execute(code, block=self.block, targets=self.targets) | |
240 |
|
245 | |||
241 | def update(self, ns): |
|
246 | def update(self, ns): | |
242 | """update remote namespace with dict `ns`""" |
|
247 | """update remote namespace with dict `ns`""" | |
243 | return self.client.push(ns, targets=self.targets, block=self.block) |
|
248 | return self.client.push(ns, targets=self.targets, block=self.block) | |
244 |
|
249 | |||
245 | push = update |
|
250 | push = update | |
246 |
|
251 | |||
247 | def get(self, key_s): |
|
252 | def get(self, key_s): | |
248 | """get object(s) by `key_s` from remote namespace |
|
253 | """get object(s) by `key_s` from remote namespace | |
249 | will return one object if it is a key. |
|
254 | will return one object if it is a key. | |
250 | It also takes a list of keys, and will return a list of objects.""" |
|
255 | It also takes a list of keys, and will return a list of objects.""" | |
251 | # block = block if block is not None else self.block |
|
256 | # block = block if block is not None else self.block | |
252 | return self.client.pull(key_s, block=True, targets=self.targets) |
|
257 | return self.client.pull(key_s, block=True, targets=self.targets) | |
253 |
|
258 | |||
254 | @sync_results |
|
259 | @sync_results | |
255 | @save_ids |
|
260 | @save_ids | |
256 | def pull(self, key_s, block=True): |
|
261 | def pull(self, key_s, block=True): | |
257 | """get object(s) by `key_s` from remote namespace |
|
262 | """get object(s) by `key_s` from remote namespace | |
258 | will return one object if it is a key. |
|
263 | will return one object if it is a key. | |
259 | It also takes a list of keys, and will return a list of objects.""" |
|
264 | It also takes a list of keys, and will return a list of objects.""" | |
260 | block = block if block is not None else self.block |
|
265 | block = block if block is not None else self.block | |
261 | return self.client.pull(key_s, block=block, targets=self.targets) |
|
266 | return self.client.pull(key_s, block=block, targets=self.targets) | |
262 |
|
267 | |||
263 | def scatter(self, key, seq, dist='b', flatten=False, targets=None, block=None): |
|
268 | def scatter(self, key, seq, dist='b', flatten=False, targets=None, block=None): | |
264 | """ |
|
269 | """ | |
265 | Partition a Python sequence and send the partitions to a set of engines. |
|
270 | Partition a Python sequence and send the partitions to a set of engines. | |
266 | """ |
|
271 | """ | |
267 | block = block if block is not None else self.block |
|
272 | block = block if block is not None else self.block | |
268 | targets = targets if targets is not None else self.targets |
|
273 | targets = targets if targets is not None else self.targets | |
269 |
|
274 | |||
270 | return self.client.scatter(key, seq, dist=dist, flatten=flatten, |
|
275 | return self.client.scatter(key, seq, dist=dist, flatten=flatten, | |
271 | targets=targets, block=block) |
|
276 | targets=targets, block=block) | |
272 |
|
277 | |||
273 | @sync_results |
|
278 | @sync_results | |
274 | @save_ids |
|
279 | @save_ids | |
275 | def gather(self, key, dist='b', targets=None, block=None): |
|
280 | def gather(self, key, dist='b', targets=None, block=None): | |
276 | """ |
|
281 | """ | |
277 | Gather a partitioned sequence on a set of engines as a single local seq. |
|
282 | Gather a partitioned sequence on a set of engines as a single local seq. | |
278 | """ |
|
283 | """ | |
279 | block = block if block is not None else self.block |
|
284 | block = block if block is not None else self.block | |
280 | targets = targets if targets is not None else self.targets |
|
285 | targets = targets if targets is not None else self.targets | |
281 |
|
286 | |||
282 | return self.client.gather(key, dist=dist, targets=targets, block=block) |
|
287 | return self.client.gather(key, dist=dist, targets=targets, block=block) | |
283 |
|
288 | |||
284 | def __getitem__(self, key): |
|
289 | def __getitem__(self, key): | |
285 | return self.get(key) |
|
290 | return self.get(key) | |
286 |
|
291 | |||
287 | def __setitem__(self,key, value): |
|
292 | def __setitem__(self,key, value): | |
288 | self.update({key:value}) |
|
293 | self.update({key:value}) | |
289 |
|
294 | |||
290 | def clear(self, block=False): |
|
295 | def clear(self, block=False): | |
291 | """Clear the remote namespaces on my engines.""" |
|
296 | """Clear the remote namespaces on my engines.""" | |
292 | block = block if block is not None else self.block |
|
297 | block = block if block is not None else self.block | |
293 | return self.client.clear(targets=self.targets, block=block) |
|
298 | return self.client.clear(targets=self.targets, block=block) | |
294 |
|
299 | |||
295 | def kill(self, block=True): |
|
300 | def kill(self, block=True): | |
296 | """Kill my engines.""" |
|
301 | """Kill my engines.""" | |
297 | block = block if block is not None else self.block |
|
302 | block = block if block is not None else self.block | |
298 | return self.client.kill(targets=self.targets, block=block) |
|
303 | return self.client.kill(targets=self.targets, block=block) | |
299 |
|
304 | |||
300 | #---------------------------------------- |
|
305 | #---------------------------------------- | |
301 | # activate for %px,%autopx magics |
|
306 | # activate for %px,%autopx magics | |
302 | #---------------------------------------- |
|
307 | #---------------------------------------- | |
303 | def activate(self): |
|
308 | def activate(self): | |
304 | """Make this `View` active for parallel magic commands. |
|
309 | """Make this `View` active for parallel magic commands. | |
305 |
|
310 | |||
306 | IPython has a magic command syntax to work with `MultiEngineClient` objects. |
|
311 | IPython has a magic command syntax to work with `MultiEngineClient` objects. | |
307 | In a given IPython session there is a single active one. While |
|
312 | In a given IPython session there is a single active one. While | |
308 | there can be many `Views` created and used by the user, |
|
313 | there can be many `Views` created and used by the user, | |
309 | there is only one active one. The active `View` is used whenever |
|
314 | there is only one active one. The active `View` is used whenever | |
310 | the magic commands %px and %autopx are used. |
|
315 | the magic commands %px and %autopx are used. | |
311 |
|
316 | |||
312 | The activate() method is called on a given `View` to make it |
|
317 | The activate() method is called on a given `View` to make it | |
313 | active. Once this has been done, the magic commands can be used. |
|
318 | active. Once this has been done, the magic commands can be used. | |
314 | """ |
|
319 | """ | |
315 |
|
320 | |||
316 | try: |
|
321 | try: | |
317 | # This is injected into __builtins__. |
|
322 | # This is injected into __builtins__. | |
318 | ip = get_ipython() |
|
323 | ip = get_ipython() | |
319 | except NameError: |
|
324 | except NameError: | |
320 | print "The IPython parallel magics (%result, %px, %autopx) only work within IPython." |
|
325 | print "The IPython parallel magics (%result, %px, %autopx) only work within IPython." | |
321 | else: |
|
326 | else: | |
322 | pmagic = ip.plugin_manager.get_plugin('parallelmagic') |
|
327 | pmagic = ip.plugin_manager.get_plugin('parallelmagic') | |
323 | if pmagic is not None: |
|
328 | if pmagic is not None: | |
324 | pmagic.active_multiengine_client = self |
|
329 | pmagic.active_multiengine_client = self | |
325 | else: |
|
330 | else: | |
326 | print "You must first load the parallelmagic extension " \ |
|
331 | print "You must first load the parallelmagic extension " \ | |
327 | "by doing '%load_ext parallelmagic'" |
|
332 | "by doing '%load_ext parallelmagic'" | |
328 |
|
333 | |||
329 |
|
334 | |||
330 | class LoadBalancedView(View): |
|
335 | class LoadBalancedView(View): | |
331 | """An engine-agnostic View that only executes via the Task queue. |
|
336 | """An engine-agnostic View that only executes via the Task queue. | |
332 |
|
337 | |||
333 | Typically created via: |
|
338 | Typically created via: | |
334 |
|
339 | |||
335 | >>> lbv = client[None] |
|
340 | >>> lbv = client[None] | |
336 | <LoadBalancedView tcp://127.0.0.1:12345> |
|
341 | <LoadBalancedView tcp://127.0.0.1:12345> | |
337 |
|
342 | |||
338 | but can also be created with: |
|
343 | but can also be created with: | |
339 |
|
344 | |||
340 | >>> lbc = LoadBalancedView(client) |
|
345 | >>> lbc = LoadBalancedView(client) | |
341 |
|
346 | |||
342 | TODO: allow subset of engines across which to balance. |
|
347 | TODO: allow subset of engines across which to balance. | |
343 | """ |
|
348 | """ | |
344 | def __repr__(self): |
|
349 | def __repr__(self): | |
345 | return "<%s %s>"%(self.__class__.__name__, self.client._config['url']) |
|
350 | return "<%s %s>"%(self.__class__.__name__, self.client._config['url']) | |
346 |
|
351 | |||
347 | @property |
|
352 | @property | |
348 | def targets(self): |
|
353 | def targets(self): | |
349 | return None |
|
354 | return None | |
350 |
|
355 | |||
351 | @targets.setter |
|
356 | @targets.setter | |
352 | def targets(self, value): |
|
357 | def targets(self, value): | |
353 | raise AttributeError("Cannot set targets for LoadBalancedView!") |
|
358 | raise AttributeError("Cannot set targets for LoadBalancedView!") | |
354 |
|
359 | |||
355 | No newline at end of file |
|
360 |
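The `apply` family above all funnels into a single `Client.apply` call with different `block`/`bound` flags. A minimal stand-alone sketch of that dispatch pattern (the `FakeClient`/`FakeView` names are hypothetical, not the IPython API, and a real client would defer work to remote engines instead of running it inline):

```python
import uuid

class FakeClient:
    """Hypothetical stand-in for the real Client: runs the work inline
    and stores each result keyed by a generated msg_id."""
    def __init__(self):
        self.results = {}

    def apply(self, f, args, kwargs, block=True):
        msg_id = str(uuid.uuid4())
        self.results[msg_id] = f(*args, **kwargs)  # a real client defers this
        # blocking callers get the value; non-blocking callers get an id
        return self.results[msg_id] if block else msg_id

class FakeView:
    """Mirrors the View dispatch: apply() honors self.block,
    apply_sync() always blocks, apply_async() always returns an id."""
    def __init__(self, client, block=True):
        self.client = client
        self.block = block

    def apply(self, f, *args, **kwargs):
        return self.client.apply(f, args, kwargs, block=self.block)

    def apply_sync(self, f, *args, **kwargs):
        return self.client.apply(f, args, kwargs, block=True)

    def apply_async(self, f, *args, **kwargs):
        return self.client.apply(f, args, kwargs, block=False)

client = FakeClient()
view = FakeView(client, block=True)
sync_result = view.apply_sync(pow, 2, 10)   # blocking: the value itself
msg_id = view.apply_async(pow, 2, 10)       # non-blocking: an id to look up later
```

The design point is that every `apply_*` variant is sugar over one client call; only the `block` (and, in the real code, `bound`) flags change.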
@@ -1,106 +1,110 b'' | |||||
1 | """Example for generating an arbitrary DAG as a dependency map. |
|
1 | """Example for generating an arbitrary DAG as a dependency map. | |
2 |
|
2 | |||
3 | This demo uses networkx to generate the graph. |
|
3 | This demo uses networkx to generate the graph. | |
4 |
|
4 | |||
5 | Authors |
|
5 | Authors | |
6 | ------- |
|
6 | ------- | |
7 | * MinRK |
|
7 | * MinRK | |
8 | """ |
|
8 | """ | |
9 | import networkx as nx |
|
9 | import networkx as nx | |
10 | from random import randint, random |
|
10 | from random import randint, random | |
11 | from IPython.zmq.parallel import client as cmod |
|
11 | from IPython.zmq.parallel import client as cmod | |
12 |
|
12 | |||
13 | def randomwait(): |
|
13 | def randomwait(): | |
14 | import time |
|
14 | import time | |
15 | from random import random |
|
15 | from random import random | |
16 | time.sleep(random()) |
|
16 | time.sleep(random()) | |
17 | return time.time() |
|
17 | return time.time() | |
18 |
|
18 | |||
19 |
|
19 | |||
20 | def random_dag(nodes, edges): |
|
20 | def random_dag(nodes, edges): | |
21 | """Generate a random Directed Acyclic Graph (DAG) with a given number of nodes and edges.""" |
|
21 | """Generate a random Directed Acyclic Graph (DAG) with a given number of nodes and edges.""" | |
22 | G = nx.DiGraph() |
|
22 | G = nx.DiGraph() | |
23 | for i in range(nodes): |
|
23 | for i in range(nodes): | |
24 | G.add_node(i) |
|
24 | G.add_node(i) | |
25 | while edges > 0: |
|
25 | while edges > 0: | |
26 | a = randint(0,nodes-1) |
|
26 | a = randint(0,nodes-1) | |
27 | b=a |
|
27 | b=a | |
28 | while b==a: |
|
28 | while b==a: | |
29 | b = randint(0,nodes-1) |
|
29 | b = randint(0,nodes-1) | |
30 | G.add_edge(a,b) |
|
30 | G.add_edge(a,b) | |
31 | if nx.is_directed_acyclic_graph(G): |
|
31 | if nx.is_directed_acyclic_graph(G): | |
32 | edges -= 1 |
|
32 | edges -= 1 | |
33 | else: |
|
33 | else: | |
34 | # we closed a loop! |
|
34 | # we closed a loop! | |
35 | G.remove_edge(a,b) |
|
35 | G.remove_edge(a,b) | |
36 | return G |
|
36 | return G | |
37 |
|
37 | |||
38 | def add_children(G, parent, level, n=2): |
|
38 | def add_children(G, parent, level, n=2): | |
39 | """Add children recursively to a binary tree.""" |
|
39 | """Add children recursively to a binary tree.""" | |
40 | if level == 0: |
|
40 | if level == 0: | |
41 | return |
|
41 | return | |
42 | for i in range(n): |
|
42 | for i in range(n): | |
43 | child = parent+str(i) |
|
43 | child = parent+str(i) | |
44 | G.add_node(child) |
|
44 | G.add_node(child) | |
45 | G.add_edge(parent,child) |
|
45 | G.add_edge(parent,child) | |
46 | add_children(G, child, level-1, n) |
|
46 | add_children(G, child, level-1, n) | |
47 |
|
47 | |||
48 | def make_bintree(levels): |
|
48 | def make_bintree(levels): | |
49 | """Make a symmetrical binary tree with @levels""" |
|
49 | """Make a symmetrical binary tree with @levels""" | |
50 | G = nx.DiGraph() |
|
50 | G = nx.DiGraph() | |
51 | root = '0' |
|
51 | root = '0' | |
52 | G.add_node(root) |
|
52 | G.add_node(root) | |
53 | add_children(G, root, levels, 2) |
|
53 | add_children(G, root, levels, 2) | |
54 | return G |
|
54 | return G | |
55 |
|
55 | |||
56 | def submit_jobs(client, G, jobs): |
|
56 | def submit_jobs(client, G, jobs): | |
57 | """Submit jobs via client where G describes the time dependencies.""" |
|
57 | """Submit jobs via client where G describes the time dependencies.""" | |
58 | results = {} |
|
58 | results = {} | |
59 | for node in nx.topological_sort(G): |
|
59 | for node in nx.topological_sort(G): | |
60 | deps = [ results[n] for n in G.predecessors(node) ] |
|
60 | deps = [ results[n] for n in G.predecessors(node) ] | |
61 | results[node] = client.apply(jobs[node], after=deps) |
|
61 | results[node] = client.apply(jobs[node], after=deps) | |
62 | return results |
|
62 | return results | |
63 |
|
63 | |||
64 | def validate_tree(G, results): |
|
64 | def validate_tree(G, results): | |
65 | """Validate that jobs executed after their dependencies.""" |
|
65 | """Validate that jobs executed after their dependencies.""" | |
66 | for node in G: |
|
66 | for node in G: | |
67 | started = results[node].metadata.started |
|
67 | started = results[node].metadata.started | |
68 | for parent in G.predecessors(node): |
|
68 | for parent in G.predecessors(node): | |
69 | finished = results[parent].metadata.completed |
|
69 | finished = results[parent].metadata.completed | |
70 | assert started > finished, "%s should have happened after %s"%(node, parent) |
|
70 | assert started > finished, "%s should have happened after %s"%(node, parent) | |
71 |
|
71 | |||
72 | def main(nodes, edges): |
|
72 | def main(nodes, edges): | |
73 | """Generate a random graph, submit jobs, then validate that the |
|
73 | """Generate a random graph, submit jobs, then validate that the | |
74 | dependency order was enforced. |
|
74 | dependency order was enforced. | |
75 | Finally, plot the graph, with time on the x-axis, and |
|
75 | Finally, plot the graph, with time on the x-axis, and | |
76 | in-degree on the y (just for spread). All arrows must |
|
76 | in-degree on the y (just for spread). All arrows must | |
77 | point at least slightly to the right if the graph is valid. |
|
77 | point at least slightly to the right if the graph is valid. | |
78 | """ |
|
78 | """ | |
79 | from matplotlib.dates import date2num |
|
79 | from matplotlib.dates import date2num | |
|
80 | from matplotlib.cm import gist_rainbow | |||
80 | print "building DAG" |
|
81 | print "building DAG" | |
81 | G = random_dag(nodes, edges) |
|
82 | G = random_dag(nodes, edges) | |
82 | jobs = {} |
|
83 | jobs = {} | |
83 | pos = {} |
|
84 | pos = {} | |
|
85 | colors = {} | |||
84 | for node in G: |
|
86 | for node in G: | |
85 | jobs[node] = randomwait |
|
87 | jobs[node] = randomwait | |
86 |
|
88 | |||
87 | client = cmod.Client() |
|
89 | client = cmod.Client() | |
88 | print "submitting tasks" |
|
90 | print "submitting %i tasks with %i dependencies"%(nodes,edges) | |
89 | results = submit_jobs(client, G, jobs) |
|
91 | results = submit_jobs(client, G, jobs) | |
90 | print "waiting for results" |
|
92 | print "waiting for results" | |
91 | client.barrier() |
|
93 | client.barrier() | |
92 | print "done" |
|
94 | print "done" | |
93 | for node in G: |
|
95 | for node in G: | |
94 |
|
|
96 | md = results[node].metadata | |
95 | t = date2num(results[node].metadata.started) |
|
97 | start = date2num(md.started) | |
96 | pos[node] = (t, G.in_degree(node)+random()) |
|
98 | runtime = date2num(md.completed) - start | |
97 |
|
99 | pos[node] = (start, runtime) | ||
|
100 | colors[node] = md.engine_id | |||
98 | validate_tree(G, results) |
|
101 | validate_tree(G, results) | |
99 | nx.draw(G, pos) |
|
102 | nx.draw(G, pos, nodelist=colors.keys(), node_color=colors.values(), cmap=gist_rainbow) | |
100 | return G,results |
|
103 | return G,results | |
101 |
|
104 | |||
102 | if __name__ == '__main__': |
|
105 | if __name__ == '__main__': | |
103 | import pylab |
|
106 | import pylab | |
104 | main(5,10) |
|
107 | # main(5,10) | |
|
108 | main(32,96) | |||
105 | pylab.show() |
|
109 | pylab.show() | |
106 | No newline at end of file |
|
110 |
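`submit_jobs()` in the diff above leans on one invariant: walking the DAG in topological order means every node's predecessors were submitted first, so their results already exist when the node itself is submitted. A self-contained sketch of that invariant, using the stdlib `graphlib` in place of networkx and toy jobs that just sum their inputs (the node names and dependency dict are made up for illustration):

```python
from graphlib import TopologicalSorter

# In the demo, edge (a, b) means "b depends on a"; graphlib takes the
# reverse view: a mapping from each node to the set of its predecessors.
deps = {'b': {'a'}, 'c': {'a'}, 'd': {'b', 'c'}}  # hypothetical 4-node DAG

results = {}
for node in TopologicalSorter(deps).static_order():
    parents = deps.get(node, set())
    # the invariant submit_jobs() relies on: all predecessors are done
    assert all(p in results for p in parents)
    results[node] = 1 + sum(results[p] for p in parents)
```

In the real demo the "result" is an AsyncResult handle passed as the `after=` dependency, so the scheduler, not the submitting loop, enforces the ordering at execution time.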
@@ -1,19 +1,20 b'' | |||||
1 | .. _parallelz_index: |
|
1 | .. _parallelz_index: | |
2 |
|
2 | |||
3 | ========================================== |
|
3 | ========================================== | |
4 | Using IPython for parallel computing (ZMQ) |
|
4 | Using IPython for parallel computing (ZMQ) | |
5 | ========================================== |
|
5 | ========================================== | |
6 |
|
6 | |||
7 | .. toctree:: |
|
7 | .. toctree:: | |
8 | :maxdepth: 2 |
|
8 | :maxdepth: 2 | |
9 |
|
9 | |||
10 | parallel_intro.txt |
|
10 | parallel_intro.txt | |
11 | parallel_process.txt |
|
11 | parallel_process.txt | |
12 | parallel_multiengine.txt |
|
12 | parallel_multiengine.txt | |
13 | parallel_task.txt |
|
13 | parallel_task.txt | |
14 | parallel_mpi.txt |
|
14 | parallel_mpi.txt | |
15 | parallel_security.txt |
|
15 | parallel_security.txt | |
16 | parallel_winhpc.txt |
|
16 | parallel_winhpc.txt | |
17 | parallel_demos.txt |
|
17 | parallel_demos.txt | |
|
18 | dag_dependencies.txt | |||
18 |
|
19 | |||
19 |
|
20 |
@@ -1,290 +1,283 b'' | |||||
1 | ================= |
|
1 | ================= | |
2 | Parallel examples |
|
2 | Parallel examples | |
3 | ================= |
|
3 | ================= | |
4 |
|
4 | |||
5 | .. note:: |
|
5 | .. note:: | |
6 |
|
6 | |||
7 | Performance numbers from ``IPython.kernel``, not newparallel. |
|
7 | Performance numbers from ``IPython.kernel``, not newparallel. | |
8 |
|
8 | |||
9 | In this section we describe two more involved examples of using an IPython |
|
9 | In this section we describe two more involved examples of using an IPython | |
10 | cluster to perform a parallel computation. In these examples, we will be using |
|
10 | cluster to perform a parallel computation. In these examples, we will be using | |
11 | IPython's "pylab" mode, which enables interactive plotting using the |
|
11 | IPython's "pylab" mode, which enables interactive plotting using the | |
12 | Matplotlib package. IPython can be started in this mode by typing:: |
|
12 | Matplotlib package. IPython can be started in this mode by typing:: | |
13 |
|
13 | |||
14 | ipython --pylab |
|
14 | ipython --pylab | |
15 |
|
15 | |||
16 | at the system command line. If this prints an error message, you will |
|
16 | at the system command line. | |
17 | need to install the default profiles from within IPython by doing, |
|
|||
18 |
|
||||
19 | .. sourcecode:: ipython |
|
|||
20 |
|
||||
21 | In [1]: %install_profiles |
|
|||
22 |
|
||||
23 | and then restarting IPython. |
|
|||
24 |
|
17 | |||
25 | 150 million digits of pi |
|
18 | 150 million digits of pi | |
26 | ======================== |
|
19 | ======================== | |
27 |
|
20 | |||
28 | In this example we would like to study the distribution of digits in the |
|
21 | In this example we would like to study the distribution of digits in the | |
29 | number pi (in base 10). While it is not known if pi is a normal number (a |
|
22 | number pi (in base 10). While it is not known if pi is a normal number (a | |
30 | number is normal in base 10 if 0-9 occur with equal likelihood) numerical |
|
23 | number is normal in base 10 if 0-9 occur with equal likelihood) numerical | |
31 | investigations suggest that it is. We will begin with a serial calculation on |
|
24 | investigations suggest that it is. We will begin with a serial calculation on | |
32 | 10,000 digits of pi and then perform a parallel calculation involving 150 |
|
25 | 10,000 digits of pi and then perform a parallel calculation involving 150 | |
33 | million digits. |
|
26 | million digits. | |
34 |
|
27 | |||
35 | In both the serial and parallel calculation we will be using functions defined |
|
28 | In both the serial and parallel calculation we will be using functions defined | |
36 | in the :file:`pidigits.py` file, which is available in the |
|
29 | in the :file:`pidigits.py` file, which is available in the | |
37 | :file:`docs/examples/newparallel` directory of the IPython source distribution. |
|
30 | :file:`docs/examples/newparallel` directory of the IPython source distribution. | |
38 | These functions provide basic facilities for working with the digits of pi and |
|
31 | These functions provide basic facilities for working with the digits of pi and | |
39 | can be loaded into IPython by putting :file:`pidigits.py` in your current |
|
32 | can be loaded into IPython by putting :file:`pidigits.py` in your current | |
40 | working directory and then doing: |
|
33 | working directory and then doing: | |
41 |
|
34 | |||
42 | .. sourcecode:: ipython |
|
35 | .. sourcecode:: ipython | |
43 |
|
36 | |||
44 | In [1]: run pidigits.py |
|
37 | In [1]: run pidigits.py | |
45 |
|
38 | |||
46 | Serial calculation |
|
39 | Serial calculation | |
47 | ------------------ |
|
40 | ------------------ | |
48 |
|
41 | |||
49 | For the serial calculation, we will use `SymPy <http://www.sympy.org>`_ to |
|
42 | For the serial calculation, we will use `SymPy <http://www.sympy.org>`_ to | |
50 | calculate 10,000 digits of pi and then look at the frequencies of the digits |
|
43 | calculate 10,000 digits of pi and then look at the frequencies of the digits | |
51 | 0-9. Out of 10,000 digits, we expect each digit to occur 1,000 times. While |
|
44 | 0-9. Out of 10,000 digits, we expect each digit to occur 1,000 times. While | |
52 | SymPy is capable of calculating many more digits of pi, our purpose here is to |
|
45 | SymPy is capable of calculating many more digits of pi, our purpose here is to | |
53 | set the stage for the much larger parallel calculation. |
|
46 | set the stage for the much larger parallel calculation. | |
54 |
|
47 | |||
55 | In this example, we use two functions from :file:`pidigits.py`: |
|
48 | In this example, we use two functions from :file:`pidigits.py`: | |
56 | :func:`one_digit_freqs` (which calculates how many times each digit occurs) |
|
49 | :func:`one_digit_freqs` (which calculates how many times each digit occurs) | |
57 | and :func:`plot_one_digit_freqs` (which uses Matplotlib to plot the result). |
|
50 | and :func:`plot_one_digit_freqs` (which uses Matplotlib to plot the result). | |
58 | Here is an interactive IPython session that uses these functions with |
|
51 | Here is an interactive IPython session that uses these functions with | |
59 | SymPy: |
|
52 | SymPy: | |
60 |
|
53 | |||
.. sourcecode:: ipython

    In [7]: import sympy

    In [8]: pi = sympy.pi.evalf(40)

    In [9]: pi
    Out[9]: 3.141592653589793238462643383279502884197

    In [10]: pi = sympy.pi.evalf(10000)

    In [11]: digits = (d for d in str(pi)[2:])  # create a sequence of digits

    In [12]: run pidigits.py  # load one_digit_freqs/plot_one_digit_freqs

    In [13]: freqs = one_digit_freqs(digits)

    In [14]: plot_one_digit_freqs(freqs)
    Out[14]: [<matplotlib.lines.Line2D object at 0x18a55290>]

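For reference, a digit-counting function along these lines can be sketched in a
few lines of NumPy. This is an illustrative stand-in, not the actual code from
:file:`pidigits.py`:

```python
import numpy as np

def one_digit_freqs(digits):
    # Count how many times each digit 0-9 occurs in an iterable of
    # digit characters (a sketch; the real code is in pidigits.py).
    freqs = np.zeros(10, dtype=np.int64)
    for d in digits:
        freqs[int(d)] += 1
    return freqs

counts = one_digit_freqs(iter("3141592653"))
print(int(counts.sum()))  # 10
```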
The resulting plot of the single digit counts shows that each digit occurs
approximately 1,000 times, but that with only 10,000 digits the
statistical fluctuations are still rather large:

.. image:: ../parallel/single_digits.*

It is clear that to reduce the relative fluctuations in the counts, we need
to look at many more digits of pi. That brings us to the parallel calculation.

Parallel calculation
--------------------

Calculating many digits of pi is a challenging computational problem in itself.
Because we want to focus on the distribution of digits in this example, we
will use pre-computed digits of pi from the website of Professor Yasumasa
Kanada at the University of Tokyo (http://www.super-computing.org). These
digits come in a set of text files (ftp://pi.super-computing.org/.2/pi200m/)
that each have 10 million digits of pi.

For the parallel calculation, we have copied these files to the local hard
drives of the compute nodes. A total of 15 of these files will be used, for a
total of 150 million digits of pi. To make things a little more interesting we
will calculate the frequencies of all two-digit sequences (00-99) and then plot
the result using a 2D matrix in Matplotlib.

The overall idea of the calculation is simple: each IPython engine will
compute the two-digit counts for the digits in a single file. Then in a final
step the counts from each engine will be added up. To perform this
calculation, we will need two top-level functions from :file:`pidigits.py`:

.. literalinclude:: ../../examples/newparallel/pidigits.py
   :language: python
   :lines: 41-56

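The counting and reduction steps can be sketched as follows. Here
``two_digit_freqs`` and ``reduce_freqs`` are simplified stand-ins for the real
functions in :file:`pidigits.py`; the exact counting scheme (overlapping
pairs, as used below) is an assumption for illustration:

```python
import numpy as np

def two_digit_freqs(digits):
    # Count overlapping two-digit sequences (00-99) in a string of digits.
    freqs = np.zeros(100, dtype=np.int64)
    for i in range(len(digits) - 1):
        freqs[int(digits[i:i + 2])] += 1
    return freqs

def reduce_freqs(freqlist):
    # Sum the per-engine count arrays into one total array.
    return sum(freqlist)

# Each engine would count one file; here two small strings stand in.
total = reduce_freqs([two_digit_freqs("31415"), two_digit_freqs("92653")])
print(int(total.sum()))  # 8
```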
We will also use the :func:`plot_two_digit_freqs` function to plot the
results. The code to run this calculation in parallel is contained in
:file:`docs/examples/newparallel/parallelpi.py`. This code can be run in parallel
using IPython by following these steps:

1. Use :command:`ipclusterz` to start 15 engines. We used an 8 core (2 quad
   core CPUs) cluster with hyperthreading enabled, which makes the 8 cores
   look like 16 in the OS (1 controller + 15 engines). However, the maximum
   speedup we can observe is still only 8x.
2. With the file :file:`parallelpi.py` in your current working directory, open
   up IPython in pylab mode and type ``run parallelpi.py``. This will download
   the pi files via ftp the first time you run it, if they are not
   present in the engines' working directory.

When run on our 8 core cluster, we observe a speedup of 7.7x. This is slightly
less than linear scaling (8x) because the controller is also running on one of
the cores.

To emphasize the interactive nature of IPython, we now show how the
calculation can also be run by simply typing the commands from
:file:`parallelpi.py` interactively into IPython:

.. sourcecode:: ipython

    In [1]: from IPython.zmq.parallel import client

    # The Client allows us to use the engines interactively.
    # We simply pass Client the name of the cluster profile we
    # are using.
    In [2]: c = client.Client(profile='mycluster')

    In [3]: c.ids
    Out[3]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

    In [4]: run pidigits.py

    In [5]: filestring = 'pi200m.ascii.%(i)02dof20'

    # Create the list of files to process.
    In [6]: files = [filestring % {'i':i} for i in range(1,16)]

    In [7]: files
    Out[7]:
    ['pi200m.ascii.01of20',
     'pi200m.ascii.02of20',
     'pi200m.ascii.03of20',
     'pi200m.ascii.04of20',
     'pi200m.ascii.05of20',
     'pi200m.ascii.06of20',
     'pi200m.ascii.07of20',
     'pi200m.ascii.08of20',
     'pi200m.ascii.09of20',
     'pi200m.ascii.10of20',
     'pi200m.ascii.11of20',
     'pi200m.ascii.12of20',
     'pi200m.ascii.13of20',
     'pi200m.ascii.14of20',
     'pi200m.ascii.15of20']

    # download the data files if they don't already exist:
    In [8]: c.map(fetch_pi_file, files)

    # This is the parallel calculation using the Client.map method
    # which applies compute_two_digit_freqs to each file in files in parallel.
    In [9]: freqs_all = c.map(compute_two_digit_freqs, files)

    # Add up the frequencies from each engine.
    In [10]: freqs = reduce_freqs(freqs_all)

    In [11]: plot_two_digit_freqs(freqs)
    Out[11]: <matplotlib.image.AxesImage object at 0x18beb110>

    In [12]: plt.title('2 digit counts of 150m digits of pi')
    Out[12]: <matplotlib.text.Text object at 0x18d1f9b0>

The resulting plot generated by Matplotlib is shown below. The colors indicate
which two-digit sequences are more (red) or less (blue) likely to occur in the
first 150 million digits of pi. We clearly see that the sequence "41" is
most likely and that "06" and "07" are least likely. Further analysis would
show that the relative size of the statistical fluctuations has decreased
compared to the 10,000 digit calculation.

.. image:: ../parallel/two_digit_counts.*


Parallel options pricing
========================

An option is a financial contract that gives the buyer of the contract the
right to buy (a "call") or sell (a "put") a secondary asset (a stock for
example) at a particular date in the future (the expiration date) for a
pre-agreed upon price (the strike price). For this right, the buyer pays the
seller a premium (the option price). There are a wide variety of flavors of
options (American, European, Asian, etc.) that are useful for different
purposes: hedging against risk, speculation, etc.

Much of modern finance is driven by the need to price these contracts
accurately based on what is known about the properties (such as volatility) of
the underlying asset. One method of pricing options is to use a Monte Carlo
simulation of the underlying asset price. In this example we use this approach
to price both European and Asian (path dependent) options for various strike
prices and volatilities.

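As background, the core of a Monte Carlo pricer for a plain European call can
be sketched in a few lines of NumPy. This is a simplified illustration under
geometric Brownian motion, not the actual code from :file:`mcpricer.py`:

```python
import numpy as np

def euro_call_mc(S0, K, sigma, r, T, n_paths=100000, seed=0):
    # Simulate terminal prices under geometric Brownian motion:
    #   S_T = S0 * exp((r - sigma**2/2)*T + sigma*sqrt(T)*Z)
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal(n_paths)
    ST = S0 * np.exp((r - 0.5 * sigma ** 2) * T + sigma * np.sqrt(T) * Z)
    # Discounted expected payoff of the call, max(S_T - K, 0).
    payoff = np.maximum(ST - K, 0.0)
    return np.exp(-r * T) * payoff.mean()

price = euro_call_mc(S0=100.0, K=100.0, sigma=0.25, r=0.05, T=1.0)
```

An Asian option is path dependent, so a real pricer simulates entire price
paths and averages the price along each path, rather than using only the
terminal price as done here.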
The code for this example can be found in the :file:`docs/examples/newparallel`
directory of the IPython source. The function :func:`price_options` in
:file:`mcpricer.py` implements the basic Monte Carlo pricing algorithm using
the NumPy package and is shown here:

.. literalinclude:: ../../examples/newparallel/mcpricer.py
   :language: python

To run this code in parallel, we will use IPython's :class:`LoadBalancedView` class,
which distributes work to the engines using dynamic load balancing. This
view is a wrapper of the :class:`Client` class shown in
the previous example. The parallel calculation using :class:`LoadBalancedView` can
be found in the file :file:`mcdriver.py`. The code in this file creates a
:class:`TaskClient` instance and then submits a set of tasks using
:meth:`TaskClient.run` that calculate the option prices for different
volatilities and strike prices. The results are then plotted as a 2D contour
plot using Matplotlib.

.. literalinclude:: ../../examples/newparallel/mcdriver.py
   :language: python

To use this code, start an IPython cluster using :command:`ipclusterz`, open
IPython in pylab mode with the file :file:`mcdriver.py` in your current
working directory and then type:

.. sourcecode:: ipython

    In [7]: run mcdriver.py
    Submitted tasks: [0, 1, 2, ...]

Once all the tasks have finished, the results can be plotted using the
:func:`plot_options` function. Here we make contour plots of the Asian
call and Asian put options as a function of the volatility and strike price:

.. sourcecode:: ipython

    In [8]: plot_options(sigma_vals, K_vals, prices['acall'])

    In [9]: plt.figure()
    Out[9]: <matplotlib.figure.Figure object at 0x18c178d0>

    In [10]: plot_options(sigma_vals, K_vals, prices['aput'])

These results are shown in the two figures below. On an 8 core cluster the
entire calculation (10 strike prices, 10 volatilities, 100,000 paths for each)
took 30 seconds in parallel, giving a speedup of 7.7x, which is comparable
to the speedup observed in our previous example.

.. image:: ../parallel/asian_call.*

.. image:: ../parallel/asian_put.*

Conclusion
==========

To conclude these examples, we summarize the key features of IPython's
parallel architecture that have been demonstrated:

* Serial code can often be parallelized with only a few extra lines of code.
  We have used the :class:`DirectView` and :class:`LoadBalancedView` classes
  for this purpose.
* The resulting parallel code can be run without ever leaving IPython's
  interactive shell.
* Any data computed in parallel can be explored interactively through
  visualization or further numerical calculations.
* We have run these examples on a cluster running Windows HPC Server 2008.
  IPython's built-in support for the Windows HPC job scheduler makes it
  easy to get started with IPython's parallel capabilities.

.. note::

    The newparallel code has never been run on Windows HPC Server, so the last
    conclusion is untested.
.. _parallelmultiengine:

==========================
IPython's Direct interface
==========================

The direct, or multiengine, interface represents one possible way of working with a set of
IPython engines. The basic idea behind the multiengine interface is that the
capabilities of each engine are directly and explicitly exposed to the user.
Thus, in the multiengine interface, each engine is given an id that is used to
identify the engine and give it work to do. This interface is intuitive,
is designed with interactive usage in mind, and is thus the best place for
new users of IPython to begin.

Starting the IPython controller and engines
===========================================

To follow along with this tutorial, you will need to start the IPython
controller and four IPython engines. The simplest way of doing this is to use
the :command:`ipclusterz` command::

    $ ipclusterz start -n 4

For more detailed information about starting the controller and engines, see
our :ref:`introduction <ip1par>` to using IPython for parallel computing.

Creating a ``Client`` instance
==============================

The first step is to import the IPython :mod:`IPython.zmq.parallel.client`
module and then create a :class:`.Client` instance:

.. sourcecode:: ipython

    In [1]: from IPython.zmq.parallel import client

    In [2]: rc = client.Client()

This form assumes that the default connection information (stored in
:file:`ipcontroller-client.json` found in `~/.ipython/clusterz_default/security`) is
accurate. If the controller was started on a remote machine, you must copy that connection
file to the client machine, or enter its contents as arguments to the Client constructor:

.. sourcecode:: ipython

    # If you have copied the json connector file from the controller:
    In [2]: rc = client.Client('/path/to/ipcontroller-client.json')
    # for a remote controller at 10.0.1.5, visible from my.server.com:
    In [3]: rc = client.Client('tcp://10.0.1.5:12345', sshserver='my.server.com')


To make sure there are engines connected to the controller, you can get a list
of engine ids:

.. sourcecode:: ipython

    In [3]: rc.ids
    Out[3]: set([0, 1, 2, 3])

Here we see that there are four engines ready to do work for us.

Quick and easy parallelism
==========================

In many cases, you simply want to apply a Python function to a sequence of
objects, but *in parallel*. The client interface provides a simple way
of accomplishing this: using the builtin :func:`map` and the ``@remote``
function decorator, or the client's :meth:`map` method.

Parallel map
------------

Python's builtin :func:`map` function allows a function to be applied to a
sequence element-by-element. This type of code is typically trivial to
parallelize. In fact, since IPython's interface is all about functions anyway,
you can just use the builtin :func:`map`, or a client's :meth:`map` method:

.. sourcecode:: ipython

    In [62]: serial_result = map(lambda x:x**10, range(32))

    In [66]: parallel_result = rc.map(lambda x: x**10, range(32))

    In [67]: serial_result==parallel_result
    Out[67]: True


.. note::

    The client's own version of :meth:`map` or that of :class:`.DirectView` do
    not do any load balancing. For a load balanced version, use a
    :class:`LoadBalancedView`, or a :class:`ParallelFunction` with
    `targets=None`.

.. seealso::

    :meth:`map` is implemented via :class:`.ParallelFunction`.

Remote function decorator
-------------------------

Remote functions are just like normal functions, but when they are called,
they execute on one or more engines, rather than locally. IPython provides
some decorators:

.. sourcecode:: ipython

    In [10]: @rc.remote(block=True)
       ....: def f(x):
       ....:     return 10.0*x**4
       ....:

    In [11]: map(f, range(32))  # this is done in parallel
    Out[11]: [0.0,10.0,160.0,...]

See the docstring for the :func:`parallel` and :func:`remote` decorators for
options.

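Conceptually, such a decorator just wraps the function so that each call is
shipped off for execution. The following purely local stand-in illustrates the
call pattern; the real ``rc.remote`` submits the call to one or more engines
rather than running it in-process:

```python
def remote(block=True):
    # Local stand-in for rc.remote: the real decorator submits the call
    # to engines; here the function is simply called directly.
    def decorator(f):
        def wrapper(*args, **kwargs):
            result = f(*args, **kwargs)  # on a real engine: executed remotely
            return result if block else None
        return wrapper
    return decorator

@remote(block=True)
def f(x):
    return 10.0 * x ** 4

results = list(map(f, range(3)))
print(results)  # [0.0, 10.0, 160.0]
```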
Calling Python functions
========================

The most basic type of operation that can be performed on the engines is to
execute Python code or call Python functions. Executing Python code can be
done in blocking or non-blocking mode (non-blocking is default) using the
:meth:`execute` method, and calling functions can be done via the
:meth:`.View.apply` method.

apply
-----

The main method for doing remote execution (in fact, all methods that
communicate with the engines are built on top of it) is :meth:`Client.apply`.
Ideally, :meth:`apply` would have the signature ``apply(f,*args,**kwargs)``,
which would call ``f(*args,**kwargs)`` remotely. However, since :class:`Clients`
require some more options, they cannot easily provide this interface.
Instead, they provide the signature::

    c.apply(f, args=None, kwargs=None, bound=True, block=None, targets=None,
            after=None, follow=None, timeout=None)

141 | In order to provide the nicer interface, we have :class:`View` classes, which wrap |
|
141 | In order to provide the nicer interface, we have :class:`View` classes, which wrap | |
142 | :meth:`Client.apply` by using attributes and extra :meth:`apply_x` methods to determine |
|
142 | :meth:`Client.apply` by using attributes and extra :meth:`apply_x` methods to determine | |
143 | the extra arguments. For instance, performing index-access on a client creates a |
|
143 | the extra arguments. For instance, performing index-access on a client creates a | |
144 | :class:`.LoadBalancedView`. |
|
144 | :class:`.LoadBalancedView`. | |
145 |
|
145 | |||
146 | .. sourcecode:: ipython |
|
146 | .. sourcecode:: ipython | |
147 |
|
147 | |||
148 | In [4]: view = rc[1:3] |
|
148 | In [4]: view = rc[1:3] | |
149 | Out[4]: <DirectView [1, 2]> |
|
149 | Out[4]: <DirectView [1, 2]> | |
150 |
|
150 | |||
151 | In [5]: view.apply<tab> |
|
151 | In [5]: view.apply<tab> | |
152 | view.apply view.apply_async view.apply_async_bound view.apply_bound view.apply_sync view.apply_sync_bound |
|
152 | view.apply view.apply_async view.apply_async_bound view.apply_bound view.apply_sync view.apply_sync_bound | |
153 |
|
153 | |||
154 | A :class:`DirectView` always uses its `targets` attribute, and it will use its `bound` |
|
154 | A :class:`DirectView` always uses its `targets` attribute, and it will use its `bound` | |
155 | and `block` attributes in its :meth:`apply` method, but the suffixed :meth:`apply_x` |
|
155 | and `block` attributes in its :meth:`apply` method, but the suffixed :meth:`apply_x` | |
156 | methods allow specifying `bound` and `block` via the different methods. |
|
156 | methods allow specifying `bound` and `block` via the different methods. | |
157 |
|
157 | |||
158 | ================== ========== ========== |
|
158 | ================== ========== ========== | |
159 | method block bound |
|
159 | method block bound | |
160 | ================== ========== ========== |
|
160 | ================== ========== ========== | |
161 | apply self.block self.bound |
|
161 | apply self.block self.bound | |
162 | apply_sync True False |
|
162 | apply_sync True False | |
163 | apply_async False False |
|
163 | apply_async False False | |
164 | apply_sync_bound True True |
|
164 | apply_sync_bound True True | |
165 | apply_async_bound False True |
|
165 | apply_async_bound False True | |
166 | ================== ========== ========== |
|
166 | ================== ========== ========== | |
167 |
|
167 | |||
168 | For explanation of these values, read on. |
|
168 | For explanation of these values, read on. | |
169 |
|
169 | |||
Blocking execution
------------------

In blocking mode, the :class:`.DirectView` object (called ``dview`` in
these examples) submits the command to the controller, which places the
command in the engines' queues for execution. The :meth:`apply` call then
blocks until the engines are done executing the command:

.. sourcecode:: ipython

    In [2]: rc.block = True
    In [3]: dview = rc[:] # A DirectView of all engines
    In [4]: dview['a'] = 5

    In [5]: dview['b'] = 10

    In [6]: dview.apply_bound(lambda x: a+b+x, 27)
    Out[6]: [42, 42, 42, 42]

Python commands can be executed on specific engines by calling :meth:`execute`
with the ``targets`` keyword argument, or by creating a :class:`DirectView`
instance via index-access to the client:

.. sourcecode:: ipython

    In [6]: rc[::2].execute('c=a+b') # shorthand for rc.execute('c=a+b',targets=[0,2])

    In [7]: rc[1::2].execute('c=a-b') # shorthand for rc.execute('c=a-b',targets=[1,3])

    In [8]: rc[:]['c'] # shorthand for rc.pull('c',targets='all')
    Out[8]: [15, -5, 15, -5]

.. note::

    Every call to ``rc.<meth>(...,targets=x)`` can be made via
    ``rc[<x>].<meth>(...)``, which constructs a View object. The only place
    where this differs is in :meth:`apply`. The :class:`Client` takes many
    arguments to apply, so it requires `args` and `kwargs` to be passed as
    individual arguments. Extended options such as `bound`, `targets`, and
    `block` are controlled by the attributes of the :class:`View` objects, so
    they can provide the much more convenient
    :meth:`View.apply(f,*args,**kwargs)`, which simply calls
    ``f(*args,**kwargs)`` remotely.

This example also shows one of the most important things about the IPython
engines: they have a persistent user namespace. The :meth:`apply` method can
be run in either a bound or unbound way. The default for a View is to be
unbound, unless called by the :meth:`apply_bound` method:

.. sourcecode:: ipython

    In [9]: rc[:]['b'] = 5 # assign b to 5 everywhere

    In [10]: v0 = rc[0]

    In [12]: v0.apply_bound(lambda : b)
    Out[12]: 5

    In [13]: v0.apply(lambda : b)
    ---------------------------------------------------------------------------
    RemoteError                               Traceback (most recent call last)
    /home/you/<ipython-input-34-21a468eb10f0> in <module>()
    ----> 1 v0.apply(lambda : b)
    ...
    RemoteError: NameError(global name 'b' is not defined)
    Traceback (most recent call last):
      File "/Users/minrk/dev/ip/mine/IPython/zmq/parallel/streamkernel.py", line 294, in apply_request
        exec code in working, working
      File "<string>", line 1, in <module>
      File "<ipython-input-34-21a468eb10f0>", line 1, in <lambda>
    NameError: global name 'b' is not defined

Specifically, `bound=True` specifies that the engine's namespace is to be used
for execution, and `bound=False` specifies that the engine's namespace is not
to be used (hence, 'b' is undefined during unbound execution, since the
function is called in an empty namespace). Unbound execution is often useful
for large numbers of atomic tasks, which prevents bloating the engine's
memory, while bound execution lets you build on your previous work.
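The effect of the two modes can be mimicked locally, without engines, by evaluating the same function body against a populated namespace and against an empty one. Here ``engine_ns`` is just an illustrative dict standing in for the engine's namespace, not an IPython object:

```python
# A local analogy (not the IPython API): the same lambda body is compiled
# against a populated "engine" namespace and against an empty one.
engine_ns = {'b': 5}                      # stands in for the engine's namespace

bound_f = eval("lambda : b", engine_ns)   # like apply_bound: sees engine vars
unbound_f = eval("lambda : b", {})        # like apply: runs in an empty namespace

print(bound_f())                          # 5
try:
    unbound_f()
except NameError as err:
    print("NameError:", err)              # name 'b' is not defined
```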
Non-blocking execution
----------------------

In non-blocking mode, :meth:`apply` submits the command to be executed and
then returns an :class:`AsyncResult` object immediately. The
:class:`AsyncResult` object gives you a way of getting a result at a later
time through its :meth:`get` method.

.. Note::

    The :class:`AsyncResult` object provides a superset of the interface in
    :py:class:`multiprocessing.pool.AsyncResult`. See the
    `official Python documentation <http://docs.python.org/library/multiprocessing#multiprocessing.pool.AsyncResult>`_
    for more.

This allows you to quickly submit long running commands without blocking your
local Python/IPython session:

.. sourcecode:: ipython

    # define our function
    In [6]: def wait(t):
       ...:     import time
       ...:     tic = time.time()
       ...:     time.sleep(t)
       ...:     return time.time()-tic

    # In non-blocking mode
    In [7]: pr = rc[:].apply_async(wait, 2)

    # Now block for the result
    In [8]: pr.get()
    Out[8]: [2.0006198883056641, 1.9997570514678955, 1.9996809959411621, 2.0003249645233154]

    # Again in non-blocking mode
    In [9]: pr = rc[:].apply_async(wait, 10)

    # Poll to see if the result is ready
    In [10]: pr.ready()
    Out[10]: False

    # ask for the result, but wait a maximum of 1 second:
    In [45]: pr.get(1)
    ---------------------------------------------------------------------------
    TimeoutError                              Traceback (most recent call last)
    /home/you/<ipython-input-45-7cd858bbb8e0> in <module>()
    ----> 1 pr.get(1)

    /path/to/site-packages/IPython/zmq/parallel/asyncresult.pyc in get(self, timeout)
         62                 raise self._exception
         63             else:
    ---> 64                 raise error.TimeoutError("Result not ready.")
         65
         66     def ready(self):

    TimeoutError: Result not ready.

.. Note::

    Note the import inside the function. This is a common model, to ensure
    that the appropriate modules are imported where the task is run.
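Because the interface is a superset of :py:class:`multiprocessing.pool.AsyncResult`, the :meth:`ready`/:meth:`get` pattern above can be tried locally with a stdlib pool standing in for the engines; this sketch assumes nothing from IPython itself:

```python
import time
from multiprocessing.pool import ThreadPool
from multiprocessing import TimeoutError

# No engines here: a ThreadPool's AsyncResult exposes the same core
# interface (ready/get/wait) that IPython's AsyncResult extends.
def wait(t):
    tic = time.time()
    time.sleep(t)
    return time.time() - tic

pool = ThreadPool(2)
ar = pool.apply_async(wait, (2,))   # submit, return immediately

print(ar.ready())                   # False while the task is still running
try:
    ar.get(1)                       # wait at most 1 second
except TimeoutError:
    print("Result not ready.")

print(round(ar.get()))              # block until done; roughly 2
pool.close()
pool.join()
```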
Often, it is desirable to wait until a set of :class:`AsyncResult` objects
are done. For this, there is the method :meth:`barrier`. This method takes a
tuple of :class:`AsyncResult` objects (or `msg_ids`) and blocks until all of the
associated results are ready:

.. sourcecode:: ipython

    In [72]: rc.block = False

    # A trivial list of AsyncResult objects
    In [73]: pr_list = [rc[:].apply_async(wait, 3) for i in range(10)]

    # Wait until all of them are done
    In [74]: rc.barrier(pr_list)

    # Then, their results are ready using get() or the `.r` attribute
    In [75]: pr_list[0].get()
    Out[75]: [2.9982571601867676, 2.9982588291168213, 2.9987530708312988, 2.9990990161895752]
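The barrier pattern itself is not IPython-specific; a minimal local sketch with the stdlib pool, using :meth:`wait` on each result in place of :meth:`barrier`, looks like this:

```python
from multiprocessing.pool import ThreadPool

# Local sketch of the barrier pattern: submit a batch of asynchronous
# calls, wait on every one, then collect the (already-ready) results.
def square(x):
    return x * x

pool = ThreadPool(4)
ar_list = [pool.apply_async(square, (i,)) for i in range(10)]

for ar in ar_list:          # plays the role of rc.barrier(pr_list)
    ar.wait()

results = [ar.get() for ar in ar_list]
print(results)              # [0, 1, 4, 9, ..., 81]
pool.close()
pool.join()
```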
The ``block`` and ``targets`` keyword arguments and attributes
--------------------------------------------------------------

.. warning::

    This is different now, I haven't updated this section.
    -MinRK

Most methods (like :meth:`apply`) accept
``block`` and ``targets`` as keyword arguments. As we have seen above, these
keyword arguments control the blocking mode and which engines the command is
applied to. The :class:`Client` class also has :attr:`block` and
:attr:`targets` attributes that control the default behavior when the keyword
arguments are not provided. Thus the following logic is used for :attr:`block`
and :attr:`targets`:

* If no keyword argument is provided, the instance attributes are used.
* Keyword arguments, if provided, override the instance attributes.
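That resolution rule amounts to a couple of lines of logic. The ``ToyClient`` below is a stand-in illustrating only the attribute/keyword precedence, not the real :class:`Client`:

```python
# Toy stand-in for the attribute/keyword resolution described above;
# execute() returns the resolved (block, targets) pair instead of running code.
class ToyClient(object):
    def __init__(self):
        self.block = False
        self.targets = 'all'

    def execute(self, code, block=None, targets=None):
        # a keyword argument, when given, overrides the instance attribute
        block = self.block if block is None else block
        targets = self.targets if targets is None else targets
        return block, targets

rc = ToyClient()
print(rc.execute('a=5'))                              # (False, 'all')
print(rc.execute('a=5', block=True, targets=[0, 2]))  # (True, [0, 2])
```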
357 | The following examples demonstrate how to use the instance attributes: |
|
354 | The following examples demonstrate how to use the instance attributes: | |
358 |
|
355 | |||
359 | .. sourcecode:: ipython |
|
356 | .. sourcecode:: ipython | |
360 |
|
357 | |||
361 | In [16]: rc.targets = [0,2] |
|
358 | In [16]: rc.targets = [0,2] | |
362 |
|
359 | |||
363 | In [17]: rc.block = False |
|
360 | In [17]: rc.block = False | |
364 |
|
361 | |||
365 | In [18]: pr = rc.execute('a=5') |
|
362 | In [18]: pr = rc.execute('a=5') | |
366 |
|
363 | |||
367 | In [19]: pr.r |
|
364 | In [19]: pr.r | |
368 | Out[19]: |
|
365 | Out[19]: | |
369 | <Results List> |
|
366 | <Results List> | |
370 | [0] In [6]: a=5 |
|
367 | [0] In [6]: a=5 | |
371 | [2] In [6]: a=5 |
|
368 | [2] In [6]: a=5 | |
372 |
|
369 | |||
373 | # Note targets='all' means all engines |
|
370 | # Note targets='all' means all engines | |
374 | In [20]: rc.targets = 'all' |
|
371 | In [20]: rc.targets = 'all' | |
375 |
|
372 | |||
376 | In [21]: rc.block = True |
|
373 | In [21]: rc.block = True | |
377 |
|
374 | |||
378 | In [22]: rc.execute('b=10; print b') |
|
375 | In [22]: rc.execute('b=10; print b') | |
379 | Out[22]: |
|
376 | Out[22]: | |
380 | <Results List> |
|
377 | <Results List> | |
381 | [0] In [7]: b=10; print b |
|
378 | [0] In [7]: b=10; print b | |
382 | [0] Out[7]: 10 |
|
379 | [0] Out[7]: 10 | |
383 |
|
380 | |||
384 | [1] In [6]: b=10; print b |
|
381 | [1] In [6]: b=10; print b | |
385 | [1] Out[6]: 10 |
|
382 | [1] Out[6]: 10 | |
386 |
|
383 | |||
387 | [2] In [7]: b=10; print b |
|
384 | [2] In [7]: b=10; print b | |
388 | [2] Out[7]: 10 |
|
385 | [2] Out[7]: 10 | |
389 |
|
386 | |||
390 | [3] In [6]: b=10; print b |
|
387 | [3] In [6]: b=10; print b | |
391 | [3] Out[6]: 10 |
|
388 | [3] Out[6]: 10 | |
392 |
|
389 | |||
393 | The :attr:`block` and :attr:`targets` instance attributes also determine the |
|
390 | The :attr:`block` and :attr:`targets` instance attributes also determine the | |
394 | behavior of the parallel magic commands. |
|
391 | behavior of the parallel magic commands. | |
395 |
|
392 | |||
396 |
|
393 | |||
397 | Parallel magic commands |
|
394 | Parallel magic commands | |
398 | ----------------------- |
|
395 | ----------------------- | |
399 |
|
396 | |||
400 | .. warning:: |
|
397 | .. warning:: | |
401 |
|
398 | |||
402 | The magics have not been changed to work with the zeromq system. ``%px`` |
|
399 | The magics have not been changed to work with the zeromq system. ``%px`` | |
403 | and ``%autopx`` do work, but ``%result`` does not. %px and %autopx *do |
|
400 | and ``%autopx`` do work, but ``%result`` does not. %px and %autopx *do | |
404 | not* print stdin/out. |
|
401 | not* print stdin/out. | |
405 |
|
402 | |||
406 | We provide a few IPython magic commands (``%px``, ``%autopx`` and ``%result``) |
|
403 | We provide a few IPython magic commands (``%px``, ``%autopx`` and ``%result``) | |
407 | that make it more pleasant to execute Python commands on the engines |
|
404 | that make it more pleasant to execute Python commands on the engines | |
408 | interactively. These are simply shortcuts to :meth:`execute` and |
|
405 | interactively. These are simply shortcuts to :meth:`execute` and | |
409 | :meth:`get_result`. The ``%px`` magic executes a single Python command on the |
|
406 | :meth:`get_result`. The ``%px`` magic executes a single Python command on the | |
410 | engines specified by the :attr:`targets` attribute of the |
|
407 | engines specified by the :attr:`targets` attribute of the | |
411 | :class:`MultiEngineClient` instance (by default this is ``'all'``): |
|
408 | :class:`MultiEngineClient` instance (by default this is ``'all'``): | |
412 |
|
409 | |||
413 | .. sourcecode:: ipython |
|
410 | .. sourcecode:: ipython | |
414 |
|
411 | |||
415 | # Create a DirectView for all targets |
|
412 | # Create a DirectView for all targets | |
416 | In [22]: dv = rc[:] |
|
413 | In [22]: dv = rc[:] | |
417 |
|
414 | |||
418 | # Make this DirectView active for parallel magic commands |
|
415 | # Make this DirectView active for parallel magic commands | |
419 | In [23]: dv.activate() |
|
416 | In [23]: dv.activate() | |
420 |
|
417 | |||
421 | In [24]: dv.block=True |
|
418 | In [24]: dv.block=True | |
422 |
|
419 | |||
423 | In [25]: import numpy |
|
420 | In [25]: import numpy | |
424 |
|
421 | |||
425 | In [26]: %px import numpy |
|
422 | In [26]: %px import numpy | |
426 | Parallel execution on engines: [0, 1, 2, 3] |
|
423 | Parallel execution on engines: [0, 1, 2, 3] | |
427 | Out[26]:[None,None,None,None] |
|
424 | Out[26]:[None,None,None,None] | |
428 |
|
425 | |||
429 | In [27]: %px a = numpy.random.rand(2,2) |
|
426 | In [27]: %px a = numpy.random.rand(2,2) | |
430 | Parallel execution on engines: [0, 1, 2, 3] |
|
427 | Parallel execution on engines: [0, 1, 2, 3] | |
431 |
|
428 | |||
432 | In [28]: %px ev = numpy.linalg.eigvals(a) |
|
429 | In [28]: %px ev = numpy.linalg.eigvals(a) | |
433 | Parallel execution on engines: [0, 1, 2, 3] |
|
430 | Parallel execution on engines: [0, 1, 2, 3] | |
434 |
|
431 | |||
435 | In [28]: dv['ev'] |
|
432 | In [28]: dv['ev'] | |
436 | Out[44]: [ array([ 1.09522024, -0.09645227]), |
|
433 | Out[44]: [ array([ 1.09522024, -0.09645227]), | |
437 | array([ 1.21435496, -0.35546712]), |
|
434 | array([ 1.21435496, -0.35546712]), | |
438 | array([ 0.72180653, 0.07133042]), |
|
435 | array([ 0.72180653, 0.07133042]), | |
439 | array([ 1.46384341e+00, 1.04353244e-04]) |
|
436 | array([ 1.46384341e+00, 1.04353244e-04]) | |
440 | ] |
|
437 | ] | |
441 |
|
438 | |||
442 | .. Note:: |
|
439 | .. Note:: | |
443 |
|
440 | |||
444 | ``%result`` doesn't work |
|
441 | ``%result`` doesn't work | |
445 |
|
442 | |||
446 | The ``%result`` magic gets and prints the stdin/stdout/stderr of the last |
|
443 | The ``%result`` magic gets and prints the stdin/stdout/stderr of the last | |
447 | command executed on each engine. It is simply a shortcut to the |
|
444 | command executed on each engine. It is simply a shortcut to the | |
448 | :meth:`get_result` method: |
|
445 | :meth:`get_result` method: | |
449 |
|
446 | |||
450 | .. sourcecode:: ipython |
|
447 | .. sourcecode:: ipython | |
451 |
|
448 | |||
452 | In [29]: %result |
|
449 | In [29]: %result | |
453 | Out[29]: |
|
450 | Out[29]: | |
454 | <Results List> |
|
451 | <Results List> | |
455 | [0] In [10]: print numpy.linalg.eigvals(a) |
|
452 | [0] In [10]: print numpy.linalg.eigvals(a) | |
456 | [0] Out[10]: [ 1.28167017 0.14197338] |
|
453 | [0] Out[10]: [ 1.28167017 0.14197338] | |
457 |
|
454 | |||
458 | [1] In [9]: print numpy.linalg.eigvals(a) |
|
455 | [1] In [9]: print numpy.linalg.eigvals(a) | |
459 | [1] Out[9]: [-0.14093616 1.27877273] |
|
456 | [1] Out[9]: [-0.14093616 1.27877273] | |
460 |
|
457 | |||
461 | [2] In [10]: print numpy.linalg.eigvals(a) |
|
458 | [2] In [10]: print numpy.linalg.eigvals(a) | |
462 | [2] Out[10]: [-0.37023573 1.06779409] |
|
459 | [2] Out[10]: [-0.37023573 1.06779409] | |
463 |
|
460 | |||
464 | [3] In [9]: print numpy.linalg.eigvals(a) |
|
461 | [3] In [9]: print numpy.linalg.eigvals(a) | |
465 | [3] Out[9]: [ 0.83664764 -0.25602658] |
|
462 | [3] Out[9]: [ 0.83664764 -0.25602658] | |
466 |
|
463 | |||
467 | The ``%autopx`` magic switches to a mode where everything you type is executed |
|
464 | The ``%autopx`` magic switches to a mode where everything you type is executed | |
468 | on the engines given by the :attr:`targets` attribute: |
|
465 | on the engines given by the :attr:`targets` attribute: | |
469 |
|
466 | |||
470 | .. sourcecode:: ipython |
|
467 | .. sourcecode:: ipython | |
471 |
|
468 | |||
472 | In [30]: dv.block=False |
|
469 | In [30]: dv.block=False | |
473 |
|
470 | |||
474 | In [31]: %autopx |
|
471 | In [31]: %autopx | |
475 | Auto Parallel Enabled |
|
472 | Auto Parallel Enabled | |
476 | Type %autopx to disable |
|
473 | Type %autopx to disable | |
477 |
|
474 | |||
478 | In [32]: max_evals = [] |
|
475 | In [32]: max_evals = [] | |
479 | <IPython.zmq.parallel.asyncresult.AsyncResult object at 0x17b8a70> |
|
476 | <IPython.zmq.parallel.asyncresult.AsyncResult object at 0x17b8a70> | |
480 |
|
477 | |||
481 | In [33]: for i in range(100): |
|
478 | In [33]: for i in range(100): | |
482 | ....: a = numpy.random.rand(10,10) |
|
479 | ....: a = numpy.random.rand(10,10) | |
483 | ....: a = a+a.transpose() |
|
480 | ....: a = a+a.transpose() | |
484 | ....: evals = numpy.linalg.eigvals(a) |
|
481 | ....: evals = numpy.linalg.eigvals(a) | |
485 | ....: max_evals.append(evals[0].real) |
|
482 | ....: max_evals.append(evals[0].real) | |
486 | ....: |
|
483 | ....: | |
487 | ....: |
|
484 | ....: | |
488 | <IPython.zmq.parallel.asyncresult.AsyncResult object at 0x17af8f0> |
|
485 | <IPython.zmq.parallel.asyncresult.AsyncResult object at 0x17af8f0> | |
489 |
|
486 | |||
490 | In [34]: %autopx |
|
487 | In [34]: %autopx | |
491 | Auto Parallel Disabled |
|
488 | Auto Parallel Disabled | |
492 |
|
489 | |||
493 | In [35]: dv.block=True |
|
490 | In [35]: dv.block=True | |
494 |
|
491 | |||
495 | In [36]: px ans= "Average max eigenvalue is: %f"%(sum(max_evals)/len(max_evals)) |
|
492 | In [36]: px ans= "Average max eigenvalue is: %f"%(sum(max_evals)/len(max_evals)) | |
496 | Parallel execution on engines: [0, 1, 2, 3] |
|
493 | Parallel execution on engines: [0, 1, 2, 3] | |
497 |
|
494 | |||
498 | In [37]: dv['ans'] |
|
495 | In [37]: dv['ans'] | |
499 | Out[37]: [ 'Average max eigenvalue is: 10.1387247332', |
|
496 | Out[37]: [ 'Average max eigenvalue is: 10.1387247332', | |
500 | 'Average max eigenvalue is: 10.2076902286', |
|
497 | 'Average max eigenvalue is: 10.2076902286', | |
501 | 'Average max eigenvalue is: 10.1891484655', |
|
498 | 'Average max eigenvalue is: 10.1891484655', | |
502 | 'Average max eigenvalue is: 10.1158837784',] |
|
499 | 'Average max eigenvalue is: 10.1158837784',] | |
503 |
|
500 | |||
504 |
|
501 | |||
505 | .. Note:: |
|
502 | .. Note:: | |
506 |
|
503 | |||
507 | Multiline ``%autpx`` gets fouled up by NameErrors, because IPython |
|
504 | Multiline ``%autpx`` gets fouled up by NameErrors, because IPython | |
508 | currently introspects too much. |
|
currently introspects too much.

Moving Python objects around
============================

In addition to calling functions and executing code on engines, you can
transfer Python objects to and from your IPython session and the engines. In
IPython, these operations are called :meth:`push` (sending an object to the
engines) and :meth:`pull` (getting an object from the engines).

Basic push and pull
-------------------

Here are some examples of how you use :meth:`push` and :meth:`pull`:

.. sourcecode:: ipython

    In [38]: rc.push(dict(a=1.03234,b=3453))
    Out[38]: [None,None,None,None]

    In [39]: rc.pull('a')
    Out[39]: [ 1.03234, 1.03234, 1.03234, 1.03234]

    In [40]: rc.pull('b',targets=0)
    Out[40]: 3453

    In [41]: rc.pull(('a','b'))
    Out[41]: [ [1.03234, 3453], [1.03234, 3453], [1.03234, 3453], [1.03234, 3453] ]

    # zmq client does not have zip_pull
    In [42]: rc.zip_pull(('a','b'))
    Out[42]: [(1.03234, 1.03234, 1.03234, 1.03234), (3453, 3453, 3453, 3453)]

    In [43]: rc.push(dict(c='speed'))
    Out[43]: [None,None,None,None]

In non-blocking mode, :meth:`push` and :meth:`pull` also return
:class:`AsyncResult` objects:

.. sourcecode:: ipython

    In [47]: rc.block=False

    In [48]: pr = rc.pull('a')

    In [49]: pr.get()
    Out[49]: [1.03234, 1.03234, 1.03234, 1.03234]
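
The blocking/non-blocking split above follows the same pattern as Python's
standard :class:`concurrent.futures.Future`. As a rough analogy (this sketch
uses plain ``concurrent.futures``, not the IPython API; ``pull`` and
``engine_namespace`` are hypothetical stand-ins):

.. sourcecode:: python

    from concurrent.futures import ThreadPoolExecutor

    def pull(name, namespace):
        """Stand-in for an engine-side variable lookup."""
        return namespace[name]

    engine_namespace = {'a': 1.03234}

    with ThreadPoolExecutor(max_workers=1) as pool:
        # submit() returns immediately, like rc.pull in non-blocking mode
        fut = pool.submit(pull, 'a', engine_namespace)
        # result() blocks until the value is ready, like pr.get()
        value = fut.result()

    print(value)  # prints 1.03234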

Dictionary interface
--------------------

Since a namespace is just a :class:`dict`, :class:`DirectView` objects provide
dictionary-style access by key and methods such as :meth:`get` and
:meth:`update` for convenience. This makes the remote namespaces of the engines
appear as a local dictionary. Underneath, this uses :meth:`push` and
:meth:`pull`:

.. sourcecode:: ipython

    In [50]: rc.block=True

    In [51]: rc[:]['a']=['foo','bar']

    In [52]: rc[:]['a']
    Out[52]: [ ['foo', 'bar'], ['foo', 'bar'], ['foo', 'bar'], ['foo', 'bar'] ]
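
The delegation from dictionary syntax to push/pull can be sketched in a few
lines. This is an illustrative stand-in, not the real :class:`DirectView`
implementation; ``FakeEngine`` and ``DictView`` are names of our own:

.. sourcecode:: python

    class FakeEngine:
        """Stand-in for a remote engine's namespace."""
        def __init__(self):
            self.namespace = {}
        def push(self, d):
            self.namespace.update(d)
        def pull(self, key):
            return self.namespace[key]

    class DictView:
        """Dictionary facade over a list of engines, like rc[:]."""
        def __init__(self, engines):
            self.engines = engines
        def __setitem__(self, key, value):
            for e in self.engines:          # a push to every engine
                e.push({key: value})
        def __getitem__(self, key):
            # a pull from every engine: one copy of the value per engine
            return [e.pull(key) for e in self.engines]

    view = DictView([FakeEngine() for _ in range(4)])
    view['a'] = ['foo', 'bar']
    print(view['a'])  # prints [['foo', 'bar'], ['foo', 'bar'], ['foo', 'bar'], ['foo', 'bar']]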

Scatter and gather
------------------

Sometimes it is useful to partition a sequence and push the partitions to
different engines. In MPI language, this is known as scatter/gather and we
follow that terminology. However, it is important to remember that in
IPython's :class:`Client` class, :meth:`scatter` is from the
interactive IPython session to the engines and :meth:`gather` is from the
engines back to the interactive IPython session. For scatter/gather operations
between engines, MPI should be used:

.. sourcecode:: ipython

    In [58]: rc.scatter('a',range(16))
    Out[58]: [None,None,None,None]

    In [59]: rc[:]['a']
    Out[59]: [ [0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15] ]

    In [60]: rc.gather('a')
    Out[60]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
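
The partitioning shown above (contiguous, roughly equal chunks, with any
remainder spread over the first chunks) can be sketched in plain Python. The
helper names here are ours, not IPython's:

.. sourcecode:: python

    def scatter(seq, n):
        """Split seq into n contiguous, roughly equal chunks."""
        q, r = divmod(len(seq), n)
        chunks, start = [], 0
        for i in range(n):
            size = q + (1 if i < r else 0)  # first r chunks take one extra item
            chunks.append(seq[start:start + size])
            start += size
        return chunks

    def gather(chunks):
        """Concatenate the chunks back into one flat list, preserving order."""
        return [item for chunk in chunks for item in chunk]

    parts = scatter(list(range(16)), 4)
    print(parts)   # prints [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    print(gather(parts) == list(range(16)))  # prints True

Because gather preserves chunk order, a scatter followed by a gather is the
identity on the original sequence.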

Other things to look at
=======================

How to do parallel list comprehensions
--------------------------------------

In many cases list comprehensions are nicer than using the map function. While
we don't have fully parallel list comprehensions, it is simple to get the
basic effect using :meth:`scatter` and :meth:`gather`:

.. sourcecode:: ipython

    In [66]: rc.scatter('x',range(64))
    Out[66]: [None,None,None,None]

    In [67]: px y = [i**10 for i in x]
    Parallel execution on engines: [0, 1, 2, 3]
    Out[67]:

    In [68]: y = rc.gather('y')

    In [69]: print y
    [0, 1, 1024, 59049, 1048576, 9765625, 60466176, 282475249, 1073741824,...]
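
The scatter / per-engine comprehension / gather pattern above can be emulated
serially, which makes it clear why the result equals the plain serial
comprehension (a sketch assuming the contiguous 4-way split of the session):

.. sourcecode:: python

    data = list(range(64))
    # scatter: contiguous 16-element chunks, one per "engine"
    chunks = [data[i * 16:(i + 1) * 16] for i in range(4)]
    # the %px step: each engine runs the comprehension on its own chunk
    results = [[i ** 10 for i in chunk] for chunk in chunks]
    # gather: flatten the per-engine results back into one list
    y = [v for chunk in results for v in chunk]

    print(y[:4])  # prints [0, 1, 1024, 59049]
    print(y == [i ** 10 for i in data])  # prints True

Since the chunks are contiguous and gather preserves their order, the
parallel result is identical to the serial comprehension.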

Parallel exceptions
-------------------

In the multiengine interface, parallel commands can raise Python exceptions,
just like serial commands. But, it is a little subtle, because a single
parallel command can actually raise multiple exceptions (one for each engine
the command was run on). To express this idea, the MultiEngine interface has a
:exc:`CompositeError` exception class that will be raised in most cases. The
:exc:`CompositeError` class is a special type of exception that wraps one or
more other types of exceptions. Here is how it works:

.. sourcecode:: ipython

    In [76]: rc.block=True

    In [77]: rc.execute('1/0')
    ---------------------------------------------------------------------------
    CompositeError                            Traceback (most recent call last)

    /ipython1-client-r3021/docs/examples/<ipython console> in <module>()

    /ipython1-client-r3021/ipython1/kernel/multiengineclient.pyc in execute(self, lines, targets, block)
        432         targets, block = self._findTargetsAndBlock(targets, block)
        433         result = blockingCallFromThread(self.smultiengine.execute, lines,
    --> 434                                         targets=targets, block=block)
        435         if block:
        436             result = ResultList(result)

    /ipython1-client-r3021/ipython1/kernel/twistedutil.pyc in blockingCallFromThread(f, *a, **kw)
         72             result.raiseException()
         73         except Exception, e:
    ---> 74             raise e
         75     return result
         76

    CompositeError: one or more exceptions from call to method: execute
    [0:execute]: ZeroDivisionError: integer division or modulo by zero
    [1:execute]: ZeroDivisionError: integer division or modulo by zero
    [2:execute]: ZeroDivisionError: integer division or modulo by zero
    [3:execute]: ZeroDivisionError: integer division or modulo by zero

Notice how the error message printed when :exc:`CompositeError` is raised has
information about the individual exceptions that were raised on each engine.
If you want, you can even raise one of these original exceptions:

.. sourcecode:: ipython

    In [80]: try:
       ....:     rc.execute('1/0')
       ....: except client.CompositeError, e:
       ....:     e.raise_exception()
       ....:
       ....:
    ---------------------------------------------------------------------------
    ZeroDivisionError                         Traceback (most recent call last)

    /ipython1-client-r3021/docs/examples/<ipython console> in <module>()

    /ipython1-client-r3021/ipython1/kernel/error.pyc in raise_exception(self, excid)
        156             raise IndexError("an exception with index %i does not exist"%excid)
        157         else:
    --> 158             raise et, ev, etb
        159
        160 def collect_exceptions(rlist, method):

    ZeroDivisionError: integer division or modulo by zero

If you are working in IPython, you can simply type ``%debug`` after one of
these :exc:`CompositeError` exceptions is raised, and inspect the exception
instance:

.. sourcecode:: ipython

    In [81]: rc.execute('1/0')
    ---------------------------------------------------------------------------
    CompositeError                            Traceback (most recent call last)

    /ipython1-client-r3021/docs/examples/<ipython console> in <module>()

    /ipython1-client-r3021/ipython1/kernel/multiengineclient.pyc in execute(self, lines, targets, block)
        432         targets, block = self._findTargetsAndBlock(targets, block)
        433         result = blockingCallFromThread(self.smultiengine.execute, lines,
    --> 434                                         targets=targets, block=block)
        435         if block:
        436             result = ResultList(result)

    /ipython1-client-r3021/ipython1/kernel/twistedutil.pyc in blockingCallFromThread(f, *a, **kw)
         72             result.raiseException()
         73         except Exception, e:
    ---> 74             raise e
         75     return result
         76

    CompositeError: one or more exceptions from call to method: execute
    [0:execute]: ZeroDivisionError: integer division or modulo by zero
    [1:execute]: ZeroDivisionError: integer division or modulo by zero
    [2:execute]: ZeroDivisionError: integer division or modulo by zero
    [3:execute]: ZeroDivisionError: integer division or modulo by zero

    In [82]: %debug
    > /ipython1-client-r3021/ipython1/kernel/twistedutil.py(74)blockingCallFromThread()
         73         except Exception, e:
    ---> 74             raise e
         75     return result

    # With the debugger running, e is the exception instance.  We can tab complete
    # on it and see the extra methods that are available.
    ipdb> e.
    e.__class__         e.__getitem__       e.__new__           e.__setstate__      e.args
    e.__delattr__       e.__getslice__      e.__reduce__        e.__str__           e.elist
    e.__dict__          e.__hash__          e.__reduce_ex__     e.__weakref__       e.message
    e.__doc__           e.__init__          e.__repr__          e._get_engine_str   e.print_tracebacks
    e.__getattribute__  e.__module__        e.__setattr__       e._get_traceback    e.raise_exception
    ipdb> e.print_tracebacks()
    [0:execute]:
    ---------------------------------------------------------------------------
    ZeroDivisionError                         Traceback (most recent call last)

    /ipython1-client-r3021/docs/examples/<string> in <module>()

    ZeroDivisionError: integer division or modulo by zero

    [1:execute]:
    ---------------------------------------------------------------------------
    ZeroDivisionError                         Traceback (most recent call last)

    /ipython1-client-r3021/docs/examples/<string> in <module>()

    ZeroDivisionError: integer division or modulo by zero

    [2:execute]:
    ---------------------------------------------------------------------------
    ZeroDivisionError                         Traceback (most recent call last)

    /ipython1-client-r3021/docs/examples/<string> in <module>()

    ZeroDivisionError: integer division or modulo by zero

    [3:execute]:
    ---------------------------------------------------------------------------
    ZeroDivisionError                         Traceback (most recent call last)

    /ipython1-client-r3021/docs/examples/<string> in <module>()

    ZeroDivisionError: integer division or modulo by zero

All of this same error handling magic even works in non-blocking mode:

.. sourcecode:: ipython

    In [83]: rc.block=False

    In [84]: pr = rc.execute('1/0')

    In [85]: pr.get()
    ---------------------------------------------------------------------------
    CompositeError                            Traceback (most recent call last)

    /ipython1-client-r3021/docs/examples/<ipython console> in <module>()

    /ipython1-client-r3021/ipython1/kernel/multiengineclient.pyc in _get_r(self)
        170
        171     def _get_r(self):
    --> 172         return self.get_result(block=True)
        173
        174     r = property(_get_r)

    /ipython1-client-r3021/ipython1/kernel/multiengineclient.pyc in get_result(self, default, block)
        131             return self.result
        132         try:
    --> 133             result = self.client.get_pending_deferred(self.result_id, block)
        134         except error.ResultNotCompleted:
        135             return default

    /ipython1-client-r3021/ipython1/kernel/multiengineclient.pyc in get_pending_deferred(self, deferredID, block)
        385
        386     def get_pending_deferred(self, deferredID, block):
    --> 387         return blockingCallFromThread(self.smultiengine.get_pending_deferred, deferredID, block)
        388
        389     def barrier(self, pendingResults):

    /ipython1-client-r3021/ipython1/kernel/twistedutil.pyc in blockingCallFromThread(f, *a, **kw)
         72             result.raiseException()
         73         except Exception, e:
    ---> 74             raise e
         75     return result
         76

    CompositeError: one or more exceptions from call to method: execute
    [0:execute]: ZeroDivisionError: integer division or modulo by zero
    [1:execute]: ZeroDivisionError: integer division or modulo by zero
    [2:execute]: ZeroDivisionError: integer division or modulo by zero
    [3:execute]: ZeroDivisionError: integer division or modulo by zero
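
The wrapping behaviour described above is easy to sketch in plain Python.
This ``MiniCompositeError`` is an illustrative stand-in of our own, not
IPython's actual :exc:`CompositeError` class:

.. sourcecode:: python

    class MiniCompositeError(Exception):
        """Toy composite exception: wraps one exception per engine."""
        def __init__(self, method, errors):
            self.errors = errors  # list of (engine_id, exception) pairs
            msg = "one or more exceptions from call to method: %s" % method
            for eid, err in errors:
                msg += "\n[%i:%s]: %s: %s" % (eid, method, type(err).__name__, err)
            super().__init__(msg)

        def raise_exception(self, excid=0):
            """Re-raise the original exception from one engine."""
            raise self.errors[excid][1]

    # Simulate the same command failing on four engines:
    errors = []
    for eid in range(4):
        try:
            1 / 0
        except ZeroDivisionError as exc:
            errors.append((eid, exc))

    err = MiniCompositeError('execute', errors)
    print(err)  # summary line plus one [i:execute] line per engine

Calling ``err.raise_exception(2)`` would then re-raise engine 2's original
:exc:`ZeroDivisionError`, mirroring the session above.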

.. _parallelsecurity:

===========================
Security details of IPython
===========================

.. note::

    This section is not thorough, and IPython.zmq needs a thorough security
    audit.

IPython's :mod:`IPython.zmq` package exposes the full power of the
Python interpreter over a TCP/IP network for the purposes of parallel
computing. This feature brings up the important question of IPython's security
model. This document gives details about this model and how it is implemented
in IPython's architecture.

Process and network topology
============================

To enable parallel computing, IPython has a number of different processes that
run. These processes are discussed at length in the IPython documentation and
are summarized here:

* The IPython *engine*. This process is a full blown Python
  interpreter in which user code is executed. Multiple
  engines are started to make parallel computing possible.
* The IPython *hub*. This process monitors a set of
  engines and schedulers, and keeps track of the state of the processes. It listens
  for registration connections from engines and clients, and monitor connections
  from schedulers.
* The IPython *schedulers*. This is a set of processes that relay commands and results
  between clients and engines. They are typically on the same machine as the controller,
  and listen for connections from engines and clients, but connect to the Hub.
* The IPython *client*. This process is typically an
  interactive Python process that is used to coordinate the
  engines to get a parallel computation done.

Collectively, these processes are called the IPython *kernel*, and the hub and schedulers
together are referred to as the *controller*.

.. note::

    Are these really still referred to as the Kernel? It doesn't seem so to me. 'cluster'
    seems more accurate.

    -MinRK

These processes communicate over any transport supported by ZeroMQ (tcp, pgm,
infiniband, ipc) with a well-defined topology. The IPython hub and schedulers
listen on sockets. Upon starting, an engine connects to a hub and registers
itself, which then informs the engine of the connection information for the
schedulers, and the engine then connects to the schedulers. These engine/hub
and engine/scheduler connections persist for the lifetime of each engine.
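The registration handshake just described can be sketched as plain Python
bookkeeping (no real ZeroMQ sockets here; ``Hub``, ``Engine``, and the
addresses are hypothetical names for illustration only):

.. sourcecode:: python

    class Hub:
        def __init__(self, scheduler_addrs):
            self.scheduler_addrs = scheduler_addrs
            self.engines = {}

        def register(self, engine_id):
            """An engine registers; the hub replies with scheduler connection info."""
            self.engines[engine_id] = 'registered'
            return self.scheduler_addrs

    class Engine:
        def __init__(self, engine_id):
            self.engine_id = engine_id
            self.connected_to = []

        def start(self, hub):
            # 1. connect to the hub and register
            addrs = hub.register(self.engine_id)
            # 2. connect to each scheduler the hub told us about;
            #    these connections persist for the engine's lifetime
            self.connected_to = list(addrs)

    hub = Hub(scheduler_addrs=['tcp://127.0.0.1:5555', 'tcp://127.0.0.1:5556'])
    engine = Engine(0)
    engine.start(hub)
    print(engine.connected_to)  # the scheduler addresses supplied by the hub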

The IPython client also connects to the controller processes using a number of socket
connections. As of writing, this is one socket per scheduler (4), and 3 connections to the
hub for a total of 7. These connections persist for the lifetime of the client only.

A given IPython controller and set of engines typically has a relatively
short lifetime. Typically this lifetime corresponds to the duration of a single parallel
simulation performed by a single user. Finally, the hub, schedulers, engines, and client
processes typically execute with the permissions of that same user. More specifically, the
controller and engines are *not* executed as root or with any other superuser permissions.

Application logic
=================

When running the IPython kernel to perform a parallel computation, a user
utilizes the IPython client to send Python commands and data through the
IPython schedulers to the IPython engines, where those commands are executed
and the data processed. The design of IPython ensures that the client is the
only access point for the capabilities of the engines. That is, the only way
of addressing the engines is through a client.

A user can utilize the client to instruct the IPython engines to execute
arbitrary Python commands. These Python commands can include calls to the
system shell, access the filesystem, etc., as required by the user's
application code. From this perspective, when a user runs an IPython engine on
a host, that engine has the same capabilities and permissions as the user
themselves (as if they were logged onto the engine's host with a terminal).
82 |
|
82 | |||
Secure network connections
==========================

Overview
--------

ZeroMQ provides exactly no security. For this reason, users of IPython must be very
careful in managing connections, because an open TCP/IP socket presents access to
arbitrary execution as the user on the engine machines. As a result, the default
behavior of controller processes is to only listen for clients on the loopback
interface, and the client must establish SSH tunnels to connect to the controller
processes.

.. warning::

    If the controller's loopback interface is untrusted, then IPython should be
    considered vulnerable, and this extends to the loopback of all connected clients,
    which have opened a loopback port that is redirected to the controller's loopback
    port.

SSH
---

Since ZeroMQ provides no security, SSH tunnels are the primary source of secure
connections. A connector file, such as `ipcontroller-client.json`, will contain
information for connecting to the controller, possibly including the address of an
ssh-server through which the client is to tunnel. The Client object then creates tunnels
using either [OpenSSH]_ or [Paramiko]_, depending on the platform. If users do not wish
to use OpenSSH or Paramiko, or the tunneling utilities are insufficient, then they may
construct the tunnels themselves, and simply connect clients and engines as if the
controller were on loopback on the connecting machine.

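When constructing such a tunnel by hand with OpenSSH, the command might look like the
following sketch (the host name, user, and ports are placeholders; adjust them to your
own controller's configuration):

```shell
# Forward local port 10101 to port 10101 on the controller host's loopback.
# -f: go to background after authentication; -N: no remote command, just the tunnel.
ssh -f -N -L 10101:127.0.0.1:10101 user@my.server.com
```

The client can then connect to ``tcp://127.0.0.1:10101`` as if the controller were
running locally.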

.. note::

    Tunneling is not currently available for engines.

Authentication
--------------

To protect users of shared machines, an execution key is used to authenticate all
messages.

The Session object that handles the message protocol uses a unique key to verify valid
messages. This can be any value specified by the user, but the default behavior is a
pseudo-random 128-bit number, as generated by `uuid.uuid4()`. This key is checked on
every message everywhere it is unpacked (Controller, Engine, and Client) to ensure that
it came from an authentic user, and no messages that do not contain this key are acted
upon in any way.

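A minimal sketch of this scheme (the function names and message layout here are
invented for illustration, not IPython's actual internals): one shared key is attached
to every message, and anything that does not carry it is dropped.

```python
import uuid

def new_session_key():
    # The default behavior described above: a pseudo-random 128-bit value.
    return str(uuid.uuid4())

def accept(message, session_key):
    # A message is acted upon only if it carries the cluster's key.
    return message.get('key') == session_key

key = new_session_key()

trusted = {'key': key, 'content': 'execute: a = 5'}
stray = {'key': str(uuid.uuid4()), 'content': 'execute: a = 5'}

print(accept(trusted, key))   # True
print(accept(stray, key))     # False
```

Note that this is an authorization check, not cryptography: anyone who can read the
traffic can read the key.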
There is exactly one key per cluster - it must be the same everywhere. Typically, the
controller creates this key, and stores it in the private connection files
`ipython-{engine|client}.json`. These files are typically stored in the
`~/.ipython/clusterz_<profile>/security` directory, and are maintained as readable only
by the owner, just as is common practice with a user's keys in their `.ssh` directory.

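As a sketch of how such a connection file might be written and consumed (the file name
follows the convention above, but the JSON field names are assumptions for
illustration, not the exact schema IPython writes):

```python
import json
import os
import stat
import tempfile
import uuid

# Controller side: write the key and listening address to a private file.
# The temporary directory stands in for ~/.ipython/clusterz_<profile>/security.
security_dir = tempfile.mkdtemp()
path = os.path.join(security_dir, 'ipcontroller-client.json')

info = {'url': 'tcp://127.0.0.1:10101', 'exec_key': str(uuid.uuid4())}
with open(path, 'w') as f:
    json.dump(info, f)
os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # 0600, as with keys in ~/.ssh

# Client side: load the same file to find the controller and the key.
with open(path) as f:
    loaded = json.load(f)
print(loaded['url'])
```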
.. warning::

    It is important to note that the key authentication, as emphasized by the use of a
    uuid rather than a key generated with a cryptographic library, provides a defense
    against *accidental* messages more than it does against malicious attacks. If
    loopback is compromised, it would be trivial for an attacker to intercept messages
    and deduce the key, as there is no encryption.

Specific security vulnerabilities
=================================

There are a number of potential security vulnerabilities present in IPython's
architecture. In this section we discuss those vulnerabilities and detail how
the security architecture described above prevents them from being exploited.

Unauthorized clients
--------------------

The IPython client can instruct the IPython engines to execute arbitrary
Python code with the permissions of the user who started the engines. If an
attacker were able to connect their own hostile IPython client to the IPython
controller, they could instruct the engines to execute code.

On the first level, this attack is prevented by requiring access to the controller's
ports, which are recommended to only be open on loopback if the controller is on an
untrusted local network. If the attacker does have access to the controller's ports,
then the attack is prevented by the capabilities-based client authentication of the
execution key. The relevant authentication information is encoded into the JSON file
that clients must present to gain access to the IPython controller. By limiting the
distribution of those keys, a user can grant access to only authorized persons, just as
with SSH keys.

It is highly unlikely that an execution key could be guessed by an attacker
in a brute-force guessing attack. A given instance of the IPython controller
only runs for a relatively short amount of time (on the order of hours). Thus
an attacker would have only a limited amount of time to test a search space of
size 2**128.

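A back-of-the-envelope calculation makes this concrete (the guess rate below is an
assumption, chosen to be generous to the attacker):

```python
# Even a very fast attacker covers a negligible fraction of a 128-bit
# key space during a controller's lifetime.
keyspace = 2 ** 128
guesses_per_second = 10 ** 9   # assumed: one billion guesses per second
lifetime = 24 * 3600           # one full day, longer than a typical controller runs

fraction = float(guesses_per_second * lifetime) / keyspace
print(fraction)  # on the order of 1e-25
```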
.. warning::

    If the attacker has gained enough access to intercept loopback connections on
    *either* the controller or client, then the key is easily deduced from network
    traffic.

Unauthorized engines
--------------------

If an attacker were able to connect a hostile engine to a user's controller,
the user might unknowingly send sensitive code or data to the hostile engine.
The attacker's engine would then have full access to that code and data.

This type of attack is prevented in the same way as the unauthorized client
attack, through the use of the capabilities-based authentication scheme.

Unauthorized controllers
------------------------

It is also possible that an attacker could try to convince a user's IPython
client or engine to connect to a hostile IPython controller. That controller
would then have full access to the code and data sent between the IPython
client and the IPython engines.

Again, this attack is prevented through the capabilities in a connection file, which
ensure that a client or engine connects to the correct controller. It is also important
to note that the connection files also encode the IP address and port that the
controller is listening on, so there is little chance of mistakenly connecting to a
controller running on a different IP address and port.

When starting an engine or client, a user must specify the key to use
for that connection. Thus, in order to introduce a hostile controller, the
attacker must convince the user to use the key associated with the
hostile controller. As long as a user is diligent in only using keys from
trusted sources, this attack is not possible.

.. note::

    I may be wrong; the unauthorized controller may be easier to fake than this.

Other security measures
=======================

A number of other measures are taken to further limit the security risks
involved in running the IPython kernel.

First, by default, the IPython controller listens on random port numbers.
While this can be overridden by the user, in the default configuration, an
attacker would have to do a port scan to even find a controller to attack.
When coupled with the relatively short running time of a typical controller
(on the order of hours), an attacker would have to work extremely hard and
extremely *fast* to even find a running controller to attack.

Second, much of the time, especially when run on supercomputers or clusters,
the controller is running behind a firewall. Thus, for engines or clients to
connect to the controller:

* The different processes all have to be behind the firewall.

or:

* The user has to use SSH port forwarding to tunnel the
  connections through the firewall.

In either case, an attacker is presented with additional barriers that prevent
attacking or even probing the system.

Summary
=======

IPython's architecture has been carefully designed with security in mind. The
capabilities-based authentication model, in conjunction with SSH-tunneled
TCP/IP channels, addresses the core potential vulnerabilities in the system,
while still enabling users to use the system on open networks.

Other questions
===============

.. note::

    This does not apply to ZMQ, but I am sure there will be questions.

About keys
----------

Can you clarify the roles of the certificate and its keys versus the FURL,
which is also called a key?

The certificate created by IPython processes is a standard public-key x509
certificate that is used by the SSL handshake protocol to set up an encrypted
channel between the controller and the IPython engine or client. The public
and private keys associated with this certificate are used only by the SSL
handshake protocol in setting up this encrypted channel.

The FURL serves a completely different and independent purpose from the
key pair associated with the certificate. When we refer to a FURL as a
key, we are using the word "key" in the capabilities-based security model
sense. This has nothing to do with "key" in the public/private key sense used
in the SSL protocol.

With that said, the FURL is used as a cryptographic key, to grant
IPython engines and clients access to particular capabilities that the
controller offers.

Self-signed certificates
------------------------

Is the controller creating a self-signed certificate? Is this created per
instance/session, as a one-time setup, or each time the controller is started?

The Foolscap network protocol, which handles the SSL protocol details, creates
a self-signed x509 certificate using OpenSSL for each IPython process. The
lifetime of the certificate is handled differently for the IPython controller
and the engines/client.

For the IPython engines and client, the certificate is only held in memory for
the lifetime of its process. It is never written to disk.

For the controller, the certificate can be created anew each time the
controller starts or it can be created once and reused each time the
controller starts. If at any point the certificate is deleted, a new one is
created the next time the controller starts.

SSL private key
---------------

How is the private key (associated with the certificate) distributed?

In the usual implementation of the SSL protocol, the private key is never
distributed. We always follow this standard.

SSL versus Foolscap authentication
----------------------------------

Many SSL connections only perform one-sided authentication (the server to the
client). How is the client authentication in IPython's system related to SSL
authentication?

We perform a two-way SSL handshake in which both parties request and verify
the certificate of their peer. This mutual authentication is handled by the
SSL handshake and is separate and independent from the additional
authentication steps that the CLIENT and SERVER perform after an encrypted
channel is established.

.. [RFC5246] <http://tools.ietf.org/html/rfc5246>

.. [OpenSSH] <http://www.openssh.com/>
.. [Paramiko] <http://www.lag.net/paramiko/>

@@ -1,132 +1,395 b'' | |||||
.. _paralleltask:

==========================
The IPython task interface
==========================

The task interface to the cluster presents the engines as a fault-tolerant,
dynamically load-balanced system of workers. Unlike the multiengine interface,
in the task interface the user has no direct access to individual engines. By
allowing the IPython scheduler to assign work, this interface is
simultaneously simpler and more powerful.

Best of all, the user can use both of these interfaces running at the same time
to take advantage of their respective strengths. When the user can break up
their work into segments that do not depend on previous execution, the
task interface is ideal. But it also has more power and flexibility, allowing
the user to guide the distribution of jobs, without having to assign tasks to
engines explicitly.

Starting the IPython controller and engines
===========================================

To follow along with this tutorial, you will need to start the IPython
controller and four IPython engines. The simplest way of doing this is to use
the :command:`ipclusterz` command::

    $ ipclusterz start -n 4

For more detailed information about starting the controller and engines, see
our :ref:`introduction <ip1par>` to using IPython for parallel computing.

Creating a ``Client`` instance
==============================

The first step is to import the IPython :mod:`IPython.zmq.parallel.client`
module and then create a :class:`.Client` instance:

.. sourcecode:: ipython

    In [1]: from IPython.zmq.parallel import client

    In [2]: rc = client.Client()

    In [3]: lview = rc[None]
    Out[3]: <LoadBalancedView tcp://127.0.0.1:10101>

This form assumes that the controller was started on localhost with default
configuration. If not, the location of the controller must be given as an
argument to the constructor:

.. sourcecode:: ipython

    # for a visible LAN controller listening on an external port:
    In [2]: rc = client.Client('tcp://192.168.1.16:10101')
    # for a remote controller at my.server.com listening on localhost:
    In [3]: rc = client.Client(sshserver='my.server.com')

Quick and easy parallelism
==========================

In many cases, you simply want to apply a Python function to a sequence of
objects, but *in parallel*. Like the multiengine interface, these can be
implemented via the task interface. The exact same tools can perform these
actions in load-balanced ways as well as multiplexed ways: a parallel version
of :func:`map` and the :func:`@parallel` function decorator. If one specifies
the argument `targets=None`, then they are dynamically load-balanced. Thus, if
the execution time per item varies significantly, you should use the versions
in the task interface.

Parallel map
------------

To load-balance :meth:`map`, simply use a LoadBalancedView, created by asking
for the ``None`` element:

.. sourcecode:: ipython

    In [63]: serial_result = map(lambda x: x**10, range(32))

    In [64]: parallel_result = rc[None].map(lambda x: x**10, range(32))

    In [65]: serial_result == parallel_result
    Out[65]: True

Parallel function decorator
---------------------------

Parallel functions are just like normal functions, but they can be called on
sequences and *in parallel*. The multiengine interface provides a decorator
that turns any Python function into a parallel function:

.. sourcecode:: ipython

    In [10]: @lview.parallel()
       ....: def f(x):
       ....:     return 10.0*x**4
       ....:

    In [11]: f.map(range(32))    # this is done in parallel
    Out[11]: [0.0, 10.0, 160.0, ...]

Dependencies
============

Often, pure atomic load-balancing is too primitive for your work. In these cases, you
may want to associate some kind of `Dependency` that describes when, where, or whether
a task can be run. In IPython, we provide two types of dependencies:
`Functional Dependencies`_ and `Graph Dependencies`_.

.. note::

    It is important to note that the pure ZeroMQ scheduler does not support dependencies,
    and you will see errors or warnings if you try to use dependencies with the pure
    scheduler.

Functional Dependencies
-----------------------

Functional dependencies are used to determine whether a given engine is capable of running
a particular task. This is implemented via a special :class:`Exception` class,
:class:`UnmetDependency`, found in `IPython.zmq.parallel.error`. Its use is very simple:
if a task fails with an UnmetDependency exception, then the scheduler, instead of relaying
the error up to the client like any other error, catches the error and resubmits the task
to a different engine. This resubmission is repeated on other engines, but a task will
never be submitted to a given engine a second time.

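The resubmission behavior can be sketched with a small, self-contained Python toy model
(an illustration of the retry rule described above, not IPython's actual scheduler code;
the engine records here are hypothetical):

.. sourcecode:: python

    class UnmetDependency(Exception):
        """Stand-in for IPython.zmq.parallel.error.UnmetDependency."""

    def schedule(task, engines):
        """Try engines in turn, resubmitting on UnmetDependency;
        no engine is ever tried twice for the same task."""
        for engine in engines:
            try:
                return engine['name'], task(engine)
            except UnmetDependency:
                continue  # caught by the scheduler, not relayed to the client
        raise RuntimeError("no engine could meet the dependency")

    # a task that can only run where numpy is (pretend-)installed
    def task(engine):
        if 'numpy' not in engine['modules']:
            raise UnmetDependency
        return 'ok'

    engines = [{'name': 'e0', 'modules': []},
               {'name': 'e1', 'modules': ['numpy']}]
    name, result = schedule(task, engines)  # the task lands on 'e1'
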
You can manually raise the :class:`UnmetDependency` yourself, but IPython provides
some decorators to facilitate this behavior.

There are two decorators and a class used for functional dependencies:

.. sourcecode:: ipython

    In [9]: from IPython.zmq.parallel.dependency import depend, require, dependent

@require
********

The simplest sort of dependency is requiring that a Python module is available. The
``@require`` decorator lets you define a function that will only run on engines where names
you specify are importable:

.. sourcecode:: ipython

    In [10]: @require('numpy', 'zmq')
       ...: def myfunc():
       ...:     import numpy, zmq
       ...:     return dostuff()

Now, any time you apply :func:`myfunc`, the task will only run on a machine that has
numpy and pyzmq available.

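To see how a require-style check can work, here is a minimal, self-contained sketch
(an illustration only, not IPython's actual ``@require`` implementation):

.. sourcecode:: python

    import importlib

    class UnmetDependency(Exception):
        """Raised when a required module is missing on this engine."""

    def require(*names):
        """Only run the decorated function where all of `names` are importable."""
        def decorator(func):
            def wrapped(*args, **kwargs):
                for name in names:
                    try:
                        importlib.import_module(name)
                    except ImportError:
                        raise UnmetDependency(name)
                return func(*args, **kwargs)
            return wrapped
        return decorator

    @require('os', 'sys')            # stdlib names: always importable
    def works():
        return 'ok'

    @require('no_such_module_xyz')   # hypothetical missing module
    def fails():
        return 'never reached'
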
@depend
*******

The ``@depend`` decorator lets you decorate any function with any *other* function to
evaluate the dependency. The dependency function will be called at the start of the task,
and if it returns ``False``, then the dependency will be considered unmet, and the task
will be assigned to another engine. If the dependency returns anything other than
``False``, the rest of the task will continue.

.. sourcecode:: ipython

    In [10]: def platform_specific(plat):
       ...:     import sys
       ...:     return sys.platform == plat

    In [11]: @depend(platform_specific, 'darwin')
       ...: def mactask():
       ...:     do_mac_stuff()

    In [12]: @depend(platform_specific, 'nt')
       ...: def wintask():
       ...:     do_windows_stuff()

In this case, any time you apply ``mactask``, it will only run on an OSX machine.
``@depend`` is just like ``apply``, in that it has a ``@depend(f, *args, **kwargs)``
signature.

dependents
**********

You don't have to use the decorators on your tasks. If, for instance, you want
to run tasks with a single function but varying dependencies, you can directly construct
the :class:`dependent` object that the decorators use:

.. sourcecode:: ipython

    In [13]: def mytask(*args):
       ...:     dostuff()

    In [14]: mactask = dependent(mytask, platform_specific, 'darwin')
    # this is the same as decorating the declaration of mytask with @depend
    # but you can do it again:

    In [15]: wintask = dependent(mytask, platform_specific, 'nt')

    # in general:
    In [16]: t = dependent(f, g, *dargs, **dkwargs)

    # is equivalent to:
    In [17]: @depend(g, *dargs, **dkwargs)
       ...: def t(a, b, c):
       ...:     # contents of f

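A minimal pure-Python sketch of such a wrapper (an illustration of the design, not
IPython's actual :class:`dependent` class) could look like this:

.. sourcecode:: python

    class UnmetDependency(Exception):
        """Raised when the dependency check fails on this engine."""

    class dependent(object):
        """Wrap `func` so that `df(*dargs, **dkwargs)` is checked before each call."""

        def __init__(self, func, df, *dargs, **dkwargs):
            self.func = func
            self.df = df
            self.dargs = dargs
            self.dkwargs = dkwargs

        def __call__(self, *args, **kwargs):
            if self.df(*self.dargs, **self.dkwargs) is False:
                raise UnmetDependency()
            return self.func(*args, **kwargs)

    def mytask(x):
        return 2 * x

    # the same function wrapped with two different dependencies:
    always = dependent(mytask, lambda: True)
    never = dependent(mytask, lambda: False)
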
Graph Dependencies
------------------

Sometimes you want to restrict the time and/or location at which a given task runs as a
function of the time and/or location of other tasks. This is implemented via a subclass of
:class:`set`, called a :class:`Dependency`. A Dependency is just a set of `msg_ids`
corresponding to tasks, and a few attributes to guide how to decide when the Dependency
has been met.

The switches we provide for interpreting whether a given dependency set has been met are:

any|all
    Whether the dependency is considered met if *any* of the dependencies are done, or
    only after *all* of them have finished. This is set by a Dependency's :attr:`all`
    boolean attribute, which defaults to ``True``.

success_only
    Whether to consider only tasks that did not raise an error as being fulfilled.
    Sometimes you want to run a task after another, but only if that task succeeded. In
    this case, ``success_only`` should be ``True``. However, sometimes you may not care
    whether the task succeeds, and always want the second task to run, in which case
    you should use `success_only=False`. The default behavior is to only count successes.

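The semantics of these two switches can be written out as a short, self-contained
Python function (a sketch of the decision rule, not IPython's internal code):

.. sourcecode:: python

    def dependency_met(deps, succeeded, failed, all=True, success_only=True):
        """Decide whether the dependency set `deps` (msg_ids) has been met.

        `succeeded` and `failed` are sets of finished msg_ids, split by outcome.
        """
        finished = succeeded if success_only else (succeeded | failed)
        relevant = deps & finished
        if all:
            return relevant == deps   # every dependency must have finished
        return bool(relevant)         # any one finished dependency suffices

    deps = {'a', 'b'}
    dependency_met(deps, succeeded={'a'}, failed={'b'})                      # False
    dependency_met(deps, succeeded={'a'}, failed={'b'}, all=False)           # True
    dependency_met(deps, succeeded={'a'}, failed={'b'}, success_only=False)  # True
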
There are other switches for interpretation that are made at the *task* level. These are
specified via keyword arguments to the client's :meth:`apply` method.

after,follow
    You may want to run a task *after* a given set of dependencies has been run and/or
    run it *where* another set of dependencies are met. To support this, every task has an
    `after` dependency to restrict time, and a `follow` dependency to restrict
    destination.

timeout
    You may also want to set a time-limit for how long the scheduler should wait for a
    task's dependencies to be met. This is done via a `timeout`, which defaults to 0,
    which indicates that the task should never timeout. If the timeout is reached, and
    the scheduler still hasn't been able to assign the task to an engine, the task will
    fail with a :class:`DependencyTimeout`.

.. note::

    Dependencies only work within the task scheduler. You cannot instruct a load-balanced
    task to run after a job submitted via the MUX interface.

The simplest form of Dependencies is with `all=True, success_only=True`. In these cases,
you can skip using Dependency objects, and just pass msg_ids or AsyncResult objects as the
`follow` and `after` keywords to :meth:`client.apply`:

.. sourcecode:: ipython

    In [14]: client.block = False

    In [15]: ar = client.apply(f, args, kwargs, targets=None)

    In [16]: ar2 = client.apply(f2, targets=None)

    In [17]: ar3 = client.apply(f3, after=[ar, ar2])

    In [18]: ar4 = client.apply(f3, follow=[ar], timeout=2.5)

.. seealso::

    Some parallel workloads can be described as a `Directed Acyclic Graph
    <http://en.wikipedia.org/wiki/Directed_acyclic_graph>`_, or DAG. See :ref:`DAG
    Dependencies <dag_dependencies>` for an example demonstrating how to map a NetworkX
    DAG onto task dependencies.


Impossible Dependencies
***********************

The schedulers do perform some analysis on graph dependencies to determine whether they
can ever be met. If the scheduler discovers that a dependency cannot be met, then the
task will fail with an :class:`ImpossibleDependency` error. This way, if the scheduler
realizes that a task can never be run, it won't sit indefinitely in the scheduler
clogging the pipeline.

The basic cases that are checked:

* depending on nonexistent messages
* `follow` dependencies were run on more than one machine and `all=True`
* any dependencies failed and `all=True,success_only=True`
* all dependencies failed and `all=False,success_only=True`

.. warning::

    This analysis has not been proven to be rigorous, so it is possible for tasks
    to become impossible to run in obscure situations; a timeout may be a good safeguard.

Schedulers
==========

There are a variety of valid ways to determine where jobs should be assigned in a
load-balancing situation. In IPython, we support several standard schemes, and
even make it easy to define your own. The scheme can be selected via the ``--scheme``
argument to :command:`ipcontrollerz`, or in the :attr:`HubFactory.scheme` attribute
of a controller config object.

The built-in routing schemes:

lru: Least Recently Used
    Always assign work to the least-recently-used engine. A close relative of
    round-robin, it will be fair with respect to the number of tasks, but agnostic
    with respect to the runtime of each task.

plainrandom: Plain Random
    Randomly picks an engine on which to run.

twobin: Two-Bin Random
    **Depends on numpy**

    Pick two engines at random, and use the LRU of the two. This is known to be better
    than plain random in many cases, but requires a small amount of computation.

leastload: Least Load
    **This is the default scheme**

    Always assign tasks to the engine with the fewest outstanding tasks (LRU breaks ties).

weighted: Weighted Two-Bin Random
    **Depends on numpy**

    Pick two engines at random using the number of outstanding tasks as inverse weights,
    and use the one with the lower load.


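To make the pick rules concrete, here is a small, self-contained Python sketch of
``twobin`` and ``weighted`` (an illustration of the idea only; IPython's versions
depend on numpy):

.. sourcecode:: python

    import random

    def twobin(loads):
        """Pick two distinct engines uniformly at random; use the less loaded one."""
        a, b = random.sample(range(len(loads)), 2)
        return a if loads[a] <= loads[b] else b

    def weighted(loads):
        """Like twobin, but draw candidates with probability inversely
        proportional to their current load (candidates may coincide)."""
        weights = [1.0 / (1 + load) for load in loads]
        total = sum(weights)
        def draw():
            r = random.uniform(0, total)
            for i, w in enumerate(weights):
                r -= w
                if r <= 0:
                    return i
            return len(weights) - 1
        a, b = draw(), draw()
        return a if loads[a] <= loads[b] else b

    loads = [10, 0, 5]   # outstanding tasks per engine
    picks = [twobin(loads) for _ in range(1000)]
    # the busiest engine (index 0) loses every comparison, so it is never picked
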
Pure ZMQ Scheduler
------------------

For maximum throughput, the 'pure' scheme is not Python at all, but a C-level
:class:`MonitoredQueue` from PyZMQ, which uses a ZeroMQ ``XREQ`` socket to perform all
load-balancing. This scheduler does not support any of the advanced features of the Python
:class:`.Scheduler`.

Disabled features when using the ZMQ Scheduler:

* Engine unregistration
    Task farming will be disabled if an engine unregisters.
    Further, if an engine is unregistered during computation, the scheduler may not
    recover.
* Dependencies
    Since there is no Python logic inside the Scheduler, routing decisions cannot be made
    based on message content.
* Early destination notification
    The Python schedulers know which engine gets which task, and notify the Hub. This
    allows graceful handling of Engines coming and going. There is no way to know
    where ZeroMQ messages have gone, so there is no way to know what tasks are on which
    engine until they *finish*. This makes recovery from engine shutdown very difficult.


.. note::

    TODO: performance comparisons

More details
============

The :class:`Client` has many more powerful features that allow quite a bit
of flexibility in how tasks are defined and run. The next places to look are
in the following classes:

* :class:`IPython.zmq.parallel.client.Client`
* :class:`IPython.zmq.parallel.client.AsyncResult`
* :meth:`IPython.zmq.parallel.client.Client.apply`
* :mod:`IPython.zmq.parallel.dependency`

The following is an overview of how to use these classes together:

1. Create a :class:`Client`.
2. Define some functions to be run as tasks.
3. Submit your tasks using the :meth:`apply` method of your
   :class:`Client` instance, specifying `targets=None`. This signals
   the :class:`Client` to entrust the Scheduler with assigning tasks to engines.
4. Use :meth:`Client.get_results` to get the results of the
   tasks, or use the :meth:`AsyncResult.get` method of the results to wait
   for and then receive the results.

.. seealso::

    A demo of :ref:`DAG Dependencies <dag_dependencies>` with NetworkX and IPython.
============================================
Getting started with Windows HPC Server 2008
============================================

.. note::

    Not adapted to zmq yet

Introduction
============

The Python programming language is an increasingly popular language for
numerical computing. This is due to a unique combination of factors. First,
Python is a high-level and *interactive* language that is well matched to
interactive numerical work. Second, it is easy (often trivial) to
integrate legacy C/C++/Fortran code into Python. Third, a large number of
high-quality open source projects provide all the needed building blocks for
numerical computing: numerical arrays (NumPy), algorithms (SciPy), 2D/3D
Visualization (Matplotlib, Mayavi, Chaco), Symbolic Mathematics (Sage, Sympy)
and others.

The IPython project is a core part of this open-source toolchain and is
focused on creating a comprehensive environment for interactive and
exploratory computing in the Python programming language. It enables all of
the above tools to be used interactively and consists of two main components:

* An enhanced interactive Python shell with support for interactive plotting
  and visualization.
* An architecture for interactive parallel computing.

With these components, it is possible to perform all aspects of a parallel
computation interactively. This type of workflow is particularly relevant in
scientific and numerical computing where algorithms, code and data are
continually evolving as the user/developer explores a problem. The broad
trends in computing (commodity clusters, multicore, cloud computing, etc.)
make these capabilities of IPython particularly relevant.

While IPython is a cross-platform tool, it has particularly strong support for
Windows-based compute clusters running Windows HPC Server 2008. This document
describes how to get started with IPython on Windows HPC Server 2008. The
content and emphasis here is practical: installing IPython, configuring
IPython to use the Windows job scheduler and running example parallel programs
interactively. A more complete description of IPython's parallel computing
capabilities can be found in IPython's online documentation
(http://ipython.scipy.org/moin/Documentation).

Setting up your Windows cluster
===============================

This document assumes that you already have a cluster running Windows
HPC Server 2008. Here is a broad overview of what is involved with setting up
such a cluster:

1. Install Windows Server 2008 on the head and compute nodes in the cluster.
2. Set up the network configuration on each host. Each host should have a
   static IP address.
3. On the head node, activate the "Active Directory Domain Services" role
   and make the head node the domain controller.
4. Join the compute nodes to the newly created Active Directory (AD) domain.
5. Set up user accounts in the domain with shared home directories.
6. Install the HPC Pack 2008 on the head node to create a cluster.
7. Install the HPC Pack 2008 on the compute nodes.

More details about installing and configuring Windows HPC Server 2008 can be
found on the Windows HPC Home Page (http://www.microsoft.com/hpc). Regardless
of what steps you follow to set up your cluster, the remainder of this
document will assume that:

* There are domain users that can log on to the AD domain and submit jobs
  to the cluster scheduler.
* These domain users have shared home directories. While shared home
  directories are not required to use IPython, they make it much easier to
  use IPython.

Installation of IPython and its dependencies
============================================

IPython and all of its dependencies are freely available and open source.
These packages provide a powerful and cost-effective approach to numerical and
scientific computing on Windows. The following dependencies are needed to run
IPython on Windows:

* Python 2.5 or 2.6 (http://www.python.org)
* pywin32 (http://sourceforge.net/projects/pywin32/)
* PyReadline (https://launchpad.net/pyreadline)
* zope.interface and Twisted (http://twistedmatrix.com)
* Foolscap (http://foolscap.lothar.com/trac)
* pyOpenSSL (https://launchpad.net/pyopenssl)
* IPython (http://ipython.scipy.org)

In addition, the following dependencies are needed to run the demos described
in this document.

* NumPy and SciPy (http://www.scipy.org)
* wxPython (http://www.wxpython.org)
* Matplotlib (http://matplotlib.sourceforge.net/)

The easiest way of obtaining these dependencies is through the Enthought
Python Distribution (EPD) (http://www.enthought.com/products/epd.php). EPD is
produced by Enthought, Inc. and contains all of these packages and others in a
single installer and is available free for academic users. While it is also
possible to download and install each package individually, this is a tedious
process. Thus, we highly recommend using EPD to install these packages on
Windows.

Regardless of how you install the dependencies, here are the steps you will
need to follow:

1. Install all of the packages listed above, either individually or using EPD,
   on the head node, compute nodes and user workstations.

2. Make sure that :file:`C:\\Python25` and :file:`C:\\Python25\\Scripts` are
   in the system :envvar:`%PATH%` variable on each node.

3. Install the latest development version of IPython. This can be done by
   downloading the development version from the IPython website
   (http://ipython.scipy.org) and following the installation instructions.

Further details about installing IPython or its dependencies can be found in
the online IPython documentation (http://ipython.scipy.org/moin/Documentation).
Once you are finished with the installation, you can try IPython out by
opening a Windows Command Prompt and typing ``ipython``. This will
start IPython's interactive shell and you should see something like the
following screenshot:

.. image:: ../parallel/ipython_shell.*

128 | Starting an IPython cluster |
|
128 | Starting an IPython cluster | |
129 | =========================== |
|
129 | =========================== | |
130 |
|
130 | |||
131 | To use IPython's parallel computing capabilities, you will need to start an |
|
131 | To use IPython's parallel computing capabilities, you will need to start an | |
132 | IPython cluster. An IPython cluster consists of one controller and multiple |
|
132 | IPython cluster. An IPython cluster consists of one controller and multiple | |
133 | engines: |
|
133 | engines: | |
134 |
|
134 | |||
135 | IPython controller |
|
135 | IPython controller | |
136 | The IPython controller manages the engines and acts as a gateway between |
|
136 | The IPython controller manages the engines and acts as a gateway between | |
137 | the engines and the client, which runs in the user's interactive IPython |
|
137 | the engines and the client, which runs in the user's interactive IPython | |
138 | session. The controller is started using the :command:`ipcontroller` |
|
138 | session. The controller is started using the :command:`ipcontroller` | |
139 | command. |
|
139 | command. | |
140 |
|
140 | |||
141 | IPython engine |
|
141 | IPython engine | |
142 | IPython engines run a user's Python code in parallel on the compute nodes. |
|
142 | IPython engines run a user's Python code in parallel on the compute nodes. | |
143 | Engines are starting using the :command:`ipengine` command. |
|
143 | Engines are starting using the :command:`ipengine` command. | |
144 |
|
144 | |||
145 | Once these processes are started, a user can run Python code interactively and |
|
145 | Once these processes are started, a user can run Python code interactively and | |
146 | in parallel on the engines from within the IPython shell using an appropriate |
|
146 | in parallel on the engines from within the IPython shell using an appropriate | |
147 | client. This includes the ability to interact with, plot and visualize data |
|
147 | client. This includes the ability to interact with, plot and visualize data | |
148 | from the engines. |
|
148 | from the engines. | |
149 |
|
149 | |||
150 | IPython has a command line program called :command:`ipclusterz` that automates |
|
150 | IPython has a command line program called :command:`ipclusterz` that automates | |
151 | all aspects of starting the controller and engines on the compute nodes. |
|
151 | all aspects of starting the controller and engines on the compute nodes. | |
152 | :command:`ipclusterz` has full support for the Windows HPC job scheduler, |
|
152 | :command:`ipclusterz` has full support for the Windows HPC job scheduler, | |
153 | meaning that :command:`ipclusterz` can use this job scheduler to start the |
|
153 | meaning that :command:`ipclusterz` can use this job scheduler to start the | |
154 | controller and engines. In our experience, the Windows HPC job scheduler is |
|
154 | controller and engines. In our experience, the Windows HPC job scheduler is | |
155 | particularly well suited for interactive applications, such as IPython. Once |
|
155 | particularly well suited for interactive applications, such as IPython. Once | |
156 | :command:`ipclusterz` is configured properly, a user can start an IPython |
|
156 | :command:`ipclusterz` is configured properly, a user can start an IPython | |
157 | cluster from their local workstation almost instantly, without having to log |
|
157 | cluster from their local workstation almost instantly, without having to log | |
158 | on to the head node (as is typically required by Unix based job schedulers). |
|
158 | on to the head node (as is typically required by Unix based job schedulers). | |
159 | This enables a user to move seamlessly between serial and parallel |
|
159 | This enables a user to move seamlessly between serial and parallel | |
160 | computations. |
|
160 | computations. | |
161 |
|
161 | |||
162 | In this section we show how to use :command:`ipclusterz` to start an IPython |
|
162 | In this section we show how to use :command:`ipclusterz` to start an IPython | |
163 | cluster using the Windows HPC Server 2008 job scheduler. To make sure that |
|
163 | cluster using the Windows HPC Server 2008 job scheduler. To make sure that | |
164 | :command:`ipclusterz` is installed and working properly, you should first try |
|
164 | :command:`ipclusterz` is installed and working properly, you should first try | |
165 | to start an IPython cluster on your local host. To do this, open a Windows |
|
165 | to start an IPython cluster on your local host. To do this, open a Windows | |
166 | Command Prompt and type the following command:: |
|
166 | Command Prompt and type the following command:: | |
167 |
|
167 | |||
168 | ipclusterz start -n 2 |
|
168 | ipclusterz start -n 2 | |
169 |
|
169 | |||
170 | You should see a number of messages printed to the screen, ending with |
|
170 | You should see a number of messages printed to the screen, ending with | |
171 | "IPython cluster: started". The result should look something like the following |
|
171 | "IPython cluster: started". The result should look something like the following | |
172 | screenshot: |
|
172 | screenshot: | |
173 |
|
173 | |||
174 |
.. image:: ipcluster |
|
174 | .. image:: ../parallel/ipcluster_start.* | |
175 |
|
175 | |||
176 | At this point, the controller and two engines are running on your local host. |
|
176 | At this point, the controller and two engines are running on your local host. | |
177 | This configuration is useful for testing and for situations where you want to |
|
177 | This configuration is useful for testing and for situations where you want to | |
178 | take advantage of multiple cores on your local computer. |
|
178 | take advantage of multiple cores on your local computer. | |
179 |
|
179 | |||
180 | Now that we have confirmed that :command:`ipclusterz` is working properly, we |
|
180 | Now that we have confirmed that :command:`ipclusterz` is working properly, we | |
181 | describe how to configure and run an IPython cluster on an actual compute |
|
181 | describe how to configure and run an IPython cluster on an actual compute | |
182 | cluster running Windows HPC Server 2008. Here is an outline of the needed |
|
182 | cluster running Windows HPC Server 2008. Here is an outline of the needed | |
183 | steps: |
|
183 | steps: | |
184 |
|
184 | |||
185 | 1. Create a cluster profile using: ``ipclusterz create -p mycluster`` |
|
185 | 1. Create a cluster profile using: ``ipclusterz create -p mycluster`` | |
186 |
|
186 | |||
187 | 2. Edit configuration files in the directory :file:`.ipython\\cluster_mycluster` |
|
187 | 2. Edit configuration files in the directory :file:`.ipython\\cluster_mycluster` | |
188 |
|
188 | |||
189 | 3. Start the cluster using: ``ipcluser start -p mycluster -n 32`` |
|
189 | 3. Start the cluster using: ``ipcluser start -p mycluster -n 32`` | |
190 |
|
190 | |||
191 | Creating a cluster profile |
|
191 | Creating a cluster profile | |
192 | -------------------------- |
|
192 | -------------------------- | |
193 |
|
193 | |||
194 | In most cases, you will have to create a cluster profile to use IPython on a |
|
194 | In most cases, you will have to create a cluster profile to use IPython on a | |
195 | cluster. A cluster profile is a name (like "mycluster") that is associated |
|
195 | cluster. A cluster profile is a name (like "mycluster") that is associated | |
196 | with a particular cluster configuration. The profile name is used by |
|
196 | with a particular cluster configuration. The profile name is used by | |
197 | :command:`ipclusterz` when working with the cluster. |
|
197 | :command:`ipclusterz` when working with the cluster. | |
198 |
|
198 | |||
199 | Associated with each cluster profile is a cluster directory. This cluster |
|
199 | Associated with each cluster profile is a cluster directory. This cluster | |
200 | directory is a specially named directory (typically located in the |
|
200 | directory is a specially named directory (typically located in the | |
201 | :file:`.ipython` subdirectory of your home directory) that contains the |
|
201 | :file:`.ipython` subdirectory of your home directory) that contains the | |
202 | configuration files for a particular cluster profile, as well as log files and |
|
202 | configuration files for a particular cluster profile, as well as log files and | |
203 | security keys. The naming convention for cluster directories is: |
|
203 | security keys. The naming convention for cluster directories is: | |
204 | :file:`cluster_<profile name>`. Thus, the cluster directory for a profile named |
|
204 | :file:`cluster_<profile name>`. Thus, the cluster directory for a profile named | |
205 | "foo" would be :file:`.ipython\\cluster_foo`. |
|
205 | "foo" would be :file:`.ipython\\cluster_foo`. | |
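
This naming convention can be sketched in a few lines of plain Python. The
``cluster_dir`` helper and the example home directory below are hypothetical
(not part of IPython's API); :mod:`ntpath` is used so the joined path is
Windows-style on any platform::

    import ntpath

    def cluster_dir(profile, home=r'C:\Users\me'):
        # Convention: <home>\.ipython\cluster_<profile name>
        return ntpath.join(home, '.ipython', 'cluster_' + profile)

    print(cluster_dir('foo'))
    # C:\Users\me\.ipython\cluster_foo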

To create a new cluster profile (named "mycluster") and the associated cluster
directory, type the following command at the Windows Command Prompt::

    ipclusterz create -p mycluster

The output of this command is shown in the screenshot below. Notice how
:command:`ipclusterz` prints out the location of the newly created cluster
directory.

.. image:: ../parallel/ipcluster_create.*

Configuring a cluster profile
-----------------------------

Next, you will need to configure the newly created cluster profile by editing
the following configuration files in the cluster directory:

* :file:`ipclusterz_config.py`
* :file:`ipcontroller_config.py`
* :file:`ipengine_config.py`

When :command:`ipclusterz` is run, these configuration files are used to
determine how the engines and controller will be started. In most cases,
you will only have to set a few of the attributes in these files.

To configure :command:`ipclusterz` to use the Windows HPC job scheduler, you
will need to edit the following attributes in the file
:file:`ipclusterz_config.py`::

    # Set these at the top of the file to tell ipclusterz to use the
    # Windows HPC job scheduler.
    c.Global.controller_launcher = \
        'IPython.zmq.parallel.launcher.WindowsHPCControllerLauncher'
    c.Global.engine_launcher = \
        'IPython.zmq.parallel.launcher.WindowsHPCEngineSetLauncher'

    # Set these to the host name of the scheduler (head node) of your cluster.
    c.WindowsHPCControllerLauncher.scheduler = 'HEADNODE'
    c.WindowsHPCEngineSetLauncher.scheduler = 'HEADNODE'

There are a number of other configuration attributes that can be set, but
in most cases these will be sufficient to get you started.

.. warning::
    If any of your configuration attributes involve specifying the location
    of shared directories or files, you must make sure that you use UNC paths
    like :file:`\\\\host\\share`. It is also important that you specify
    these paths using raw Python strings: ``r'\\host\share'`` to make sure
    that the backslashes are properly escaped.
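
As a quick illustration of the raw-string point (plain Python, nothing
IPython-specific), both spellings below produce the same UNC path string::

    raw = r'\\host\share'        # raw string: backslashes taken literally
    escaped = '\\\\host\\share'  # regular string: each backslash escaped
    assert raw == escaped
    print(raw)
    # \\host\share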

Starting the cluster profile
----------------------------

Once a cluster profile has been configured, starting an IPython cluster using
the profile is simple::

    ipclusterz start -p mycluster -n 32

The ``-n`` option tells :command:`ipclusterz` how many engines to start (in
this case 32). Stopping the cluster is as simple as typing Control-C.

Using the HPC Job Manager
-------------------------

When ``ipclusterz start`` is run the first time, :command:`ipclusterz` creates
two XML job description files in the cluster directory:

* :file:`ipcontroller_job.xml`
* :file:`ipengineset_job.xml`

Once these files have been created, they can be imported into the HPC Job
Manager application. Then, the controller and engines for that profile can be
started using the HPC Job Manager directly, without using :command:`ipclusterz`.
However, anytime the cluster profile is re-configured, ``ipclusterz start``
must be run again to regenerate the XML job description files. The
following screenshot shows what the HPC Job Manager interface looks like
with a running IPython cluster.

.. image:: ../parallel/hpc_job_manager.*

Performing a simple interactive parallel computation
====================================================

Once you have started your IPython cluster, you can start to use it. To do
this, open up a new Windows Command Prompt and start up IPython's interactive
shell by typing::

    ipython

Then you can create a :class:`MultiEngineClient` instance for your profile and
use the resulting instance to do a simple interactive parallel computation. In
the code and screenshot that follows, we take a simple Python function and
apply it to each element of an array of integers in parallel using the
:meth:`MultiEngineClient.map` method:

.. sourcecode:: ipython

    In [1]: from IPython.zmq.parallel.client import *

    In [2]: mec = MultiEngineClient(profile='mycluster')

    In [3]: mec.get_ids()
    Out[3]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

    In [4]: def f(x):
       ...:     return x**10

    In [5]: mec.map(f, range(15))  # f is applied in parallel
    Out[5]:
    [0,
     1,
     1024,
     59049,
     1048576,
     9765625,
     60466176,
     282475249,
     1073741824,
     3486784401L,
     10000000000L,
     25937424601L,
     61917364224L,
     137858491849L,
     289254654976L]

The :meth:`map` method has the same signature as Python's builtin :func:`map`
function, but runs the calculation in parallel. More involved examples of using
:class:`MultiEngineClient` are provided in the examples that follow.
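
For comparison, the same computation can be checked serially with the builtin
:func:`map`, with no cluster required; the values match the parallel output
above::

    def f(x):
        return x**10

    # Serial equivalent of mec.map(f, range(15))
    results = list(map(f, range(15)))
    print(results[:5])
    # [0, 1, 1024, 59049, 1048576]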

.. image:: ../parallel/mec_simple.*