upstream/ipython Commit - r7510:fcf42b73



             IPython's Task Database
             =======================
-            The IPython Hub stores all task requests and results in a database. Currently supported backends
+            Enabling a DB Backend
-            are: MongoDB, SQLite (the default), and an in-memory DictDB.  The most common use case for
+            =====================
-            this is clients requesting results for tasks they did not submit, via:
+            The IPython Hub can store all task requests and results in a database.
+            Currently supported backends are: MongoDB, SQLite, and an in-memory DictDB.
+            This database behavior is optional due to its potential :ref:`db_cost`,
+            so you must enable one, either at the command-line::
+                $> ipcontroller --dictdb # or --mongodb or --sqlitedb
+            or in your :file:`ipcontroller_config.py`:
+            .. sourcecode:: python
+                c.HubFactory.db_class = "DictDB"
+                c.HubFactory.db_class = "MongoDB"
+                c.HubFactory.db_class = "SQLiteDB"
+            Using the Task Database
+            =======================
+            The most common use case for this is clients requesting results for tasks they did not submit, via:
             .. sourcecode:: ipython
                 In [1]: rc.get_result(task_id)
-            However, since we have this DB backend, we provide a direct query method in the :class:`client`
+            However, since we have this DB backend, we provide a direct query method in the :class:`~.Client`
             for users who want deeper introspection into their task history. The :meth:`db_query` method of
             the Client is modeled after MongoDB queries, so if you have used MongoDB it should look
-            familiar.  In fact, when the MongoDB backend is in use, the query is relayed directly.  However,
+            familiar.  In fact, when the MongoDB backend is in use, the query is relayed directly.
-            when using other backends, the interface is emulated and only a subset of queries is possible.
+            When using other backends, the interface is emulated and only a subset of queries is possible.
             .. seealso::
             content         dict            The request content (likely empty)
             buffers         list(bytes)     buffers containing serialized request objects
             submitted       datetime        timestamp for time of submission (set by client)
-            client_uuid     uuid(bytes)     IDENT of client's socket
+            client_uuid     uuid(ascii)     IDENT of client's socket
-            engine_uuid     uuid(bytes)     IDENT of engine's socket
+            engine_uuid     uuid(ascii)     IDENT of engine's socket
             started         datetime        time task began execution on engine
             completed       datetime        time task finished execution (success or failure) on engine
             resubmitted     uuid(ascii)     msg_id of resubmitted task (if applicable)
             result_header   dict            header for result
             result_content  dict            content for result
             result_buffers  list(bytes)     buffers containing serialized result objects
-            queue           bytes           The name of the queue for the task ('mux' or 'task')
+            queue           str             The name of the queue for the task ('mux' or 'task')
-            pyin            <unused>        Python input (unused)
+            pyin            str             Python input source
-            pyout           <unused>        Python output (unused)
+            pyout           dict            Python output (pyout message content)
-            pyerr           <unused>        Python traceback (unused)
+            pyerr           dict            Python traceback (pyerr message content)
             stdout          str             Stream of stdout data
             stderr          str             Stream of stderr data
             1. deep polling of task status or metadata
             2. selecting a subset of tasks, on which to perform a later operation (e.g. wait on result, purge records, resubmit,...)
             Example Queries
             ===============
             To get all msg_ids that are not completed, only retrieving their ID and start time:
             .. sourcecode:: ipython
-                In [1]: incomplete = rc.db_query({'complete' : None}, keys=['msg_id', 'started'])
+                In [1]: incomplete = rc.db_query({'completed' : None}, keys=['msg_id', 'started'])
             All jobs started in the last hour by me:
                 In [2]: hist34 = rc.db_query({'engine_uuid' : {'$in' : uuids }}, keys='result_header')
+            .. _db_cost:
             Cost
             ====
             The advantage of the database backends is, of course, that large amounts of
-            data can be stored that won't fit in memory.  The default 'backend' is actually
+            data can be stored that won't fit in memory.  The basic DictDB 'backend' is actually
             to just store all of this information in a Python dictionary.  This is very fast,
             but will run out of memory quickly if you move a lot of data around, or your
             cluster is to run for a long time.
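
The changed text says that non-MongoDB backends emulate the MongoDB query interface and support only a subset of queries. The following is a minimal sketch of what such an emulation could look like over plain in-memory dictionaries. It is not IPython's implementation: the names `OPERATORS`, `matches`, and `find_records`, the chosen operator subset, and the sample records are all assumptions made for illustration. Only the field names (`msg_id`, `engine_uuid`, `started`, `completed`) come from the table in the diff.

```python
from datetime import datetime, timedelta

# Hypothetical operator table: a small subset of MongoDB's query operators.
# Ordering comparisons treat a missing/None value as "no match".
OPERATORS = {
    '$in': lambda value, arg: value in arg,
    '$nin': lambda value, arg: value not in arg,
    '$ne': lambda value, arg: value != arg,
    '$lt': lambda value, arg: value is not None and value < arg,
    '$gt': lambda value, arg: value is not None and value > arg,
}


def matches(record, query):
    """Return True if `record` satisfies every condition in `query`."""
    for key, condition in query.items():
        value = record.get(key)
        if isinstance(condition, dict):
            # Operator form, e.g. {'$in': [...]}: all operators must hold.
            for op, arg in condition.items():
                if not OPERATORS[op](value, arg):
                    return False
        elif value != condition:
            # Plain form, e.g. {'completed': None}: exact equality.
            return False
    return True


def find_records(records, query, keys=None):
    """Return matching records, restricted to `keys` when given."""
    found = []
    for record in records:
        if matches(record, query):
            if keys is None:
                found.append(dict(record))
            else:
                found.append({k: record.get(k) for k in keys})
    return found


# Made-up task records for illustration only.
now = datetime(2012, 1, 1, 12, 0, 0)
records = [
    {'msg_id': 'a', 'engine_uuid': 'e1',
     'started': now - timedelta(minutes=5), 'completed': now},
    {'msg_id': 'b', 'engine_uuid': 'e2',
     'started': now - timedelta(minutes=1), 'completed': None},
    {'msg_id': 'c', 'engine_uuid': 'e3',
     'started': None, 'completed': None},
]

# Same shapes as the two example queries in the diff.
incomplete = find_records(records, {'completed': None},
                          keys=['msg_id', 'started'])
on_engines = find_records(records, {'engine_uuid': {'$in': ['e1', 'e2']}},
                          keys=['msg_id'])
print(incomplete)
print(on_engines)
```

An unknown operator raises `KeyError` in this sketch, which is one simple way an emulated backend could signal that a query falls outside its supported subset.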