##// END OF EJS Templates
add db,resubmit/retries docs
MinRK -
Show More
@@ -0,0 +1,114 b''
1 .. _parallel_db:
2
3 =======================
4 IPython's Task Database
5 =======================
6
7 The IPython Hub stores all task requests and results in a database. Currently supported backends
8 are: MongoDB, SQLite (the default), and an in-memory DictDB. The most common use case for
9 this is clients requesting results for tasks they did not submit, via:
10
11 .. sourcecode:: ipython
12
13 In [1]: rc.get_result(task_id)
14
15 However, since we have this DB backend, we provide a direct query method in the :class:`client`
16 for users who want deeper introspection into their task history. The :meth:`db_query` method of
17 the Client is modeled after MongoDB queries, so if you have used MongoDB it should look
18 familiar. In fact, when the MongoDB backend is in use, the query is relayed directly. However,
19 when using other backends, the interface is emulated and only a subset of queries is possible.
20
21 .. seealso::
22
23 MongoDB query docs: http://www.mongodb.org/display/DOCS/Querying
24
25 :meth:`Client.db_query` takes a dictionary query object, with keys from the TaskRecord key list,
26 and values of either exact values to test, or MongoDB queries, which are dicts of The form:
27 ``{'operator' : 'argument(s)'}``. There is also an optional `keys` argument, that specifies
28 which subset of keys should be retrieved. The default is to retrieve all keys excluding the
29 request and result buffers. :meth:`db_query` returns a list of TaskRecord dicts. Also like
30 MongoDB, the `msg_id` key will always be included, whether requested or not.
31
32 TaskRecord keys:
33
34 =============== =============== =============
35 Key Type Description
36 =============== =============== =============
37 msg_id uuid(bytes) The msg ID
38 header dict The request header
39 content dict The request content (likely empty)
40 buffers list(bytes) buffers containing serialized request objects
41 submitted datetime timestamp for time of submission (set by client)
42 client_uuid uuid(bytes) IDENT of client's socket
43 engine_uuid uuid(bytes) IDENT of engine's socket
44 started datetime time task began execution on engine
45 completed datetime time task finished execution (success or failure) on engine
46 resubmitted datetime time of resubmission (if applicable)
47 result_header dict header for result
48 result_content dict content for result
49 result_buffers list(bytes) buffers containing serialized request objects
50 queue bytes The name of the queue for the task ('mux' or 'task')
51 pyin <unused> Python input (unused)
52 pyout <unused> Python output (unused)
53 pyerr <unused> Python traceback (unused)
54 stdout str Stream of stdout data
55 stderr str Stream of stderr data
56
57 =============== =============== =============
58
59 MongoDB operators we emulate on all backends:
60
61 ========== =================
62 Operator Python equivalent
63 ========== =================
64 '$in' in
65 '$nin' not in
66 '$eq' ==
67 '$ne' !=
68 '$ge' >
69 '$gte' >=
70 '$le' <
71 '$lte' <=
72 ========== =================
73
74
75 The DB Query is useful for two primary cases:
76
77 1. deep polling of task status or metadata
78 2. selecting a subset of tasks, on which to perform a later operation (e.g. wait on result, purge records, resubmit,...)
79
80 Example Queries
81 ===============
82
83
84 To get all msg_ids that are not completed, only retrieving their ID and start time:
85
86 .. sourcecode:: ipython
87
88 In [1]: incomplete = rc.db_query({'complete' : None}, keys=['msg_id', 'started'])
89
90 All jobs started in the last hour by me:
91
92 .. sourcecode:: ipython
93
94 In [1]: from datetime import datetime, timedelta
95
96 In [2]: hourago = datetime.now() - timedelta(1./24)
97
98 In [3]: recent = rc.db_query({'started' : {'$gte' : hourago },
99 'client_uuid' : rc.session.session})
100
101 All jobs started more than an hour ago, by clients *other than me*:
102
103 .. sourcecode:: ipython
104
105 In [3]: recent = rc.db_query({'started' : {'$le' : hourago },
106 'client_uuid' : {'$ne' : rc.session.session}})
107
108 Result headers for all jobs on engine 3 or 4:
109
110 .. sourcecode:: ipython
111
112 In [1]: uuids = map(rc._engines.get, (3,4))
113
114 In [2]: hist34 = rc.db_query({'engine_uuid' : {'$in' : uuids }, keys='result_header')
@@ -1317,9 +1317,11 b' class Client(HasTraits):'
1317 1317 query : mongodb query dict
1318 1318 The search dict. See mongodb query docs for details.
1319 1319 keys : list of strs [optional]
1320 THe subset of keys to be returned. The default is to fetch everything.
1320 The subset of keys to be returned. The default is to fetch everything but buffers.
1321 1321 'msg_id' will *always* be included.
1322 1322 """
1323 if isinstance(keys, basestring):
1324 keys = [keys]
1323 1325 content = dict(query=query, keys=keys)
1324 1326 self.session.send(self._query_socket, "db_request", content=content)
1325 1327 idents, msg = self.session.recv(self._query_socket, 0)
@@ -12,6 +12,7 b' Using IPython for parallel computing'
12 12 parallel_multiengine.txt
13 13 parallel_task.txt
14 14 parallel_mpi.txt
15 parallel_db.txt
15 16 parallel_security.txt
16 17 parallel_winhpc.txt
17 18 parallel_demos.txt
@@ -292,6 +292,7 b' you can skip using Dependency objects, and just pass msg_ids or AsyncResult obje'
292 292
293 293
294 294
295
295 296 Impossible Dependencies
296 297 ***********************
297 298
@@ -313,6 +314,27 b' The basic cases that are checked:'
313 314 This analysis has not been proven to be rigorous, so it is likely possible for tasks
314 315 to become impossible to run in obscure situations, so a timeout may be a good choice.
315 316
317
318 Retries and Resubmit
319 ====================
320
321 Retries
322 -------
323
324 Another flag for tasks is `retries`. This is an integer, specifying how many times
325 a task should be resubmitted after failure. This is useful for tasks that should still run
326 if their engine was shutdown, or may have some statistical chance of failing. The default
327 is to not retry tasks.
328
329 Resubmit
330 --------
331
332 Sometimes you may want to re-run a task. This could be because it failed for some reason, and
333 you have fixed the error, or because you want to restore the cluster to an interrupted state.
334 For this, the :class:`Client` has a :meth:`rc.resubmit` method. This simply takes one or more
335 msg_ids, and returns an :class:`AsyncHubResult` for the result(s). You cannot resubmit
336 a task that is pending - only those that have finished, either successful or unsuccessful.
337
316 338 .. _parallel_schedulers:
317 339
318 340 Schedulers
@@ -391,6 +413,8 b' Disabled features when using the ZMQ Scheduler:'
391 413 TODO: performance comparisons
392 414
393 415
416
417
394 418 More details
395 419 ============
396 420
General Comments 0
You need to be logged in to leave comments. Login now