@@ -4,19 +4,40 b'' | |||
|
4 | 4 | IPython's Task Database |
|
5 | 5 | ======================= |
|
6 | 6 | |
|
7 | The IPython Hub stores all task requests and results in a database. Currently supported backends | |
|
8 | are: MongoDB, SQLite (the default), and an in-memory DictDB. The most common use case for | |
|
9 | this is clients requesting results for tasks they did not submit, via: | |
|
7 | Enabling a DB Backend | |
|
8 | ===================== | |
|
9 | ||
|
10 | The IPython Hub can store all task requests and results in a database. | |
|
11 | Currently supported backends are: MongoDB, SQLite, and an in-memory DictDB. | |
|
12 | ||
|
13 | This database behavior is optional due to its potential :ref:`db_cost`, | |
|
14 | so you must enable one, either at the command-line:: | |
|
15 | ||
|
16 | $> ipcontroller --dictdb # or --mongodb or --sqlitedb | |
|
17 | ||
|
18 | or in your :file:`ipcontroller_config.py`: | |
|
19 | ||
|
20 | .. sourcecode:: python | |
|
21 | ||
|
22 | c.HubFactory.db_class = "DictDB"  # set only one of these three | |
|
23 | c.HubFactory.db_class = "MongoDB" | |
|
24 | c.HubFactory.db_class = "SQLiteDB" | |
|
25 | ||
|
26 | ||
|
27 | Using the Task Database | |
|
28 | ======================= | |
|
29 | ||
|
30 | The most common use case for this is clients requesting results for tasks they did not submit, via: | |
|
10 | 31 | |
|
11 | 32 | .. sourcecode:: ipython |
|
12 | 33 | |
|
13 | 34 | In [1]: rc.get_result(task_id) |
|
14 | 35 | |
|
15 | However, since we have this DB backend, we provide a direct query method in the :class:` | |
|
36 | However, since we have this DB backend, we provide a direct query method in the :class:`~.Client` | |
|
16 | 37 | for users who want deeper introspection into their task history. The :meth:`db_query` method of |
|
17 | 38 | the Client is modeled after MongoDB queries, so if you have used MongoDB it should look |
|
18 | familiar. In fact, when the MongoDB backend is in use, the query is relayed directly. | |

19 |
|
39 | familiar. In fact, when the MongoDB backend is in use, the query is relayed directly. | |
|
40 | When using other backends, the interface is emulated and only a subset of queries is possible. | |
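To make "emulated" concrete, the following is a minimal sketch of how a MongoDB-style query could be evaluated against plain Python dicts. It is not IPython's implementation: the names `OPERATORS`, `matches`, and `find` are hypothetical, and the operators shown are only an assumed subset.

```python
from datetime import datetime, timedelta
import operator

# Comparison operators handled by this sketch (an assumed subset of MongoDB's).
OPERATORS = {
    '$lt': operator.lt,
    '$lte': operator.le,
    '$gt': operator.gt,
    '$gte': operator.ge,
    '$ne': operator.ne,
    '$in': lambda value, choices: value in choices,
    '$nin': lambda value, choices: value not in choices,
}

def matches(record, query):
    """Return True if `record` (a dict) satisfies every condition in `query`."""
    for key, condition in query.items():
        value = record.get(key)
        if isinstance(condition, dict):
            # {'$op': operand, ...}: every operator test must pass
            for op, operand in condition.items():
                if op not in OPERATORS:
                    raise ValueError("unsupported operator: %r" % op)
                try:
                    if not OPERATORS[op](value, operand):
                        return False
                except TypeError:
                    # e.g. ordering an unset (None) timestamp: no match
                    return False
        elif value != condition:
            # plain value: exact equality (None matches an unset key)
            return False
    return True

def find(records, query, keys=None):
    """Filter `records` by `query`, optionally keeping only `keys`."""
    found = [rec for rec in records if matches(rec, query)]
    if keys is None:
        return found
    return [{k: rec.get(k) for k in keys} for rec in found]

# Hypothetical task records using field names from the table of keys
now = datetime(2011, 6, 1, 12, 0)
records = [
    {'msg_id': 'a', 'engine_uuid': 'e1',
     'started': now - timedelta(minutes=5), 'completed': None},
    {'msg_id': 'b', 'engine_uuid': 'e2',
     'started': now - timedelta(hours=2),
     'completed': now - timedelta(hours=1)},
]

incomplete = find(records, {'completed': None}, keys=['msg_id', 'started'])
recent = find(records, {'started': {'$gte': now - timedelta(hours=1)}},
              keys=['msg_id'])
```

A real backend would presumably translate the same query dict into its own storage language (SQL for SQLite, for instance), which is one reason only a subset of queries may be available.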
|
20 | 41 | |
|
21 | 42 | .. seealso:: |
|
22 | 43 | |
@@ -39,18 +60,18 b' header dict The request header' | |||
|
39 | 60 | content dict The request content (likely empty) |
|
40 | 61 | buffers list(bytes) buffers containing serialized request objects |
|
41 | 62 | submitted datetime timestamp for time of submission (set by client) |
|
42 | client_uuid uuid( | |

43 | engine_uuid uuid( | |
|
63 | client_uuid uuid(ascii) IDENT of client's socket | |
|
64 | engine_uuid uuid(ascii) IDENT of engine's socket | |
|
44 | 65 | started datetime time task began execution on engine |
|
45 | 66 | completed datetime time task finished execution (success or failure) on engine |
|
46 | 67 | resubmitted uuid(ascii) msg_id of resubmitted task (if applicable) |
|
47 | 68 | result_header dict header for result |
|
48 | 69 | result_content dict content for result |
|
49 | 70 | result_buffers list(bytes) buffers containing serialized request objects |
|
50 | queue | |

51 | pyin | |

52 | pyout | |

53 | pyerr | |
|
71 | queue str The name of the queue for the task ('mux' or 'task') | |
|
72 | pyin str Python input source | |
|
73 | pyout dict Python output (pyout message content) | |
|
74 | pyerr dict Python traceback (pyerr message content) | |
|
54 | 75 | stdout str Stream of stdout data |
|
55 | 76 | stderr str Stream of stderr data |
|
56 | 77 | |
@@ -77,15 +98,15 b' The DB Query is useful for two primary cases:' | |||
|
77 | 98 | 1. deep polling of task status or metadata |
|
78 | 99 | 2. selecting a subset of tasks, on which to perform a later operation (e.g. wait on result, purge records, resubmit,...) |
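The second case can be sketched as a small helper that turns a query into a list of msg_ids for a later operation. This is only an illustration: `unfinished_msg_ids` is a hypothetical name, and `rc` is assumed to be a connected Client (or any object with a compatible `db_query` method) that returns a list of record dicts.

```python
def unfinished_msg_ids(rc):
    """Return the msg_id of every task whose 'completed' field is unset.

    `rc` is assumed to be a connected Client exposing db_query(query, keys=...).
    """
    # Only ask the Hub for the msg_id key, to keep the reply small
    records = rc.db_query({'completed': None}, keys=['msg_id'])
    return [rec['msg_id'] for rec in records]
```

The resulting IDs could then be passed on one at a time, for example to `rc.get_result(msg_id)`.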
|
79 | 100 | |
|
101 | ||
|
80 | 102 | Example Queries |
|
81 | 103 | =============== |
|
82 | 104 | |
|
83 | ||
|
84 | 105 | To get all msg_ids that are not completed, only retrieving their ID and start time: |
|
85 | 106 | |
|
86 | 107 | .. sourcecode:: ipython |
|
87 | 108 | |
|
88 | In [1]: incomplete = rc.db_query({'complete' : None}, keys=['msg_id', 'started']) | |
|
109 | In [1]: incomplete = rc.db_query({'completed' : None}, keys=['msg_id', 'started']) | |
|
89 | 110 | |
|
90 | 111 | All jobs started in the last hour by me: |
|
91 | 112 | |
@@ -113,12 +134,13 b' Result headers for all jobs on engine 3 or 4:' | |||
|
113 | 134 | |
|
114 | 135 | In [2]: hist34 = rc.db_query({'engine_uuid' : {'$in' : uuids }}, keys='result_header')
|
115 | 136 | |
|
137 | .. _db_cost: | |
|
116 | 138 | |
|
117 | 139 | Cost |
|
118 | 140 | ==== |
|
119 | 141 | |
|
120 | 142 | The advantage of the database backends is, of course, that large amounts of |
|
121 | data can be stored that won't fit in memory. The | |
|
143 | data can be stored that won't fit in memory. The basic DictDB 'backend' simply | |

122 | 144 | stores all of this information in a Python dictionary. This is very fast,

123 | 145 | but it will run out of memory quickly if you move a lot of data around, or if your

124 | 146 | cluster runs for a long time.
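As a rough illustration of how quickly an in-memory store grows, the sketch below sums the serialized buffer sizes held in a dictionary of records. The store layout (msg_id mapped to a record dict) and the helper name `buffer_bytes` are assumptions made for this example, and the count ignores Python object overhead, so real usage would be higher.

```python
def buffer_bytes(records):
    """Total size in bytes of the request and result buffers in `records`."""
    total = 0
    for rec in records.values():
        for key in ('buffers', 'result_buffers'):
            for buf in rec.get(key) or []:
                total += len(buf)
    return total

# Hypothetical store: msg_id -> record, mimicking an in-memory dict backend
store = {}
for i in range(1000):
    store['task-%i' % i] = {
        'buffers': [b'x' * 1024],          # 1 kB serialized request
        'result_buffers': [b'y' * 10240],  # 10 kB serialized result
    }

print(buffer_bytes(store))  # 11264000 bytes for 1000 small tasks
```

Even with these modest payloads, a thousand tasks hold about 11 MB of buffers, and nothing is released unless records are purged.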