upstream/ipython Files · docs/source/parallel/parallel_db.txt

resubmitted tasks are now wholly separate (new msg_ids)...

resubmitted tasks are now wholly separate (new msg_ids) This removes much of the fragility due to trying to cram a resubmit into the original ID. The `resubmit` record is now the resubmitted task ID, rather than the timestamp.

MinRK - - Load All Authors

File last commit:

r6817:fa181ff0


                r6817:fa181ff0

Download file

             parallel_db.txt
        
                    137 lines
            
             | 5.3 KiB
            
                | text/plain
            
             |
                TextLexer

/ docs / source / parallel / parallel_db.txt

History | Annotation | Raw |Copy content |Copy permalink

				.. _parallel_db:

				=======================
				IPython's Task Database
				=======================

				The IPython Hub stores all task requests and results in a database. Currently supported backends
				are: MongoDB, SQLite (the default), and an in-memory DictDB. The most common use case for
				this is clients requesting results for tasks they did not submit, via:

				.. sourcecode:: ipython

				In [1]: rc.get_result(task_id)

				However, since we have this DB backend, we provide a direct query method in the :class:`client`
				for users who want deeper introspection into their task history. The :meth:`db_query` method of
				the Client is modeled after MongoDB queries, so if you have used MongoDB it should look
				familiar. In fact, when the MongoDB backend is in use, the query is relayed directly. However,
				when using other backends, the interface is emulated and only a subset of queries is possible.

				.. seealso::

				MongoDB query docs: http://www.mongodb.org/display/DOCS/Querying

				:meth:`Client.db_query` takes a dictionary query object, with keys from the TaskRecord key list,
				and values of either exact values to test, or MongoDB queries, which are dicts of The form:
				``{'operator' : 'argument(s)'}``. There is also an optional `keys` argument, that specifies
				which subset of keys should be retrieved. The default is to retrieve all keys excluding the
				request and result buffers. :meth:`db_query` returns a list of TaskRecord dicts. Also like
				MongoDB, the `msg_id` key will always be included, whether requested or not.

				TaskRecord keys:

				=============== =============== =============
				Key Type Description
				=============== =============== =============
				msg_id uuid(ascii) The msg ID
				header dict The request header
				content dict The request content (likely empty)
				buffers list(bytes) buffers containing serialized request objects
				submitted datetime timestamp for time of submission (set by client)
				client_uuid uuid(bytes) IDENT of client's socket
				engine_uuid uuid(bytes) IDENT of engine's socket
				started datetime time task began execution on engine
				completed datetime time task finished execution (success or failure) on engine
				resubmitted uuid(ascii) msg_id of resubmitted task (if applicable)
				result_header dict header for result
				result_content dict content for result
				result_buffers list(bytes) buffers containing serialized request objects
				queue bytes The name of the queue for the task ('mux' or 'task')
				pyin <unused> Python input (unused)
				pyout <unused> Python output (unused)
				pyerr <unused> Python traceback (unused)
				stdout str Stream of stdout data
				stderr str Stream of stderr data

				=============== =============== =============

				MongoDB operators we emulate on all backends:

				========== =================
				Operator Python equivalent
				========== =================
				'$in' in
				'$nin' not in
				'$eq' ==
				'$ne' !=
				'$ge' >
				'$gte' >=
				'$le' <
				'$lte' <=
				========== =================


				The DB Query is useful for two primary cases:

				1. deep polling of task status or metadata
				2. selecting a subset of tasks, on which to perform a later operation (e.g. wait on result, purge records, resubmit,...)

				Example Queries
				===============


				To get all msg_ids that are not completed, only retrieving their ID and start time:

				.. sourcecode:: ipython

				In [1]: incomplete = rc.db_query({'complete' : None}, keys=['msg_id', 'started'])

				All jobs started in the last hour by me:

				.. sourcecode:: ipython

				In [1]: from datetime import datetime, timedelta

				In [2]: hourago = datetime.now() - timedelta(1./24)

				In [3]: recent = rc.db_query({'started' : {'$gte' : hourago },
				'client_uuid' : rc.session.session})

				All jobs started more than an hour ago, by clients other than me:

				.. sourcecode:: ipython

				In [3]: recent = rc.db_query({'started' : {'$le' : hourago },
				'client_uuid' : {'$ne' : rc.session.session}})

				Result headers for all jobs on engine 3 or 4:

				.. sourcecode:: ipython

				In [1]: uuids = map(rc._engines.get, (3,4))

				In [2]: hist34 = rc.db_query({'engine_uuid' : {'$in' : uuids }, keys='result_header')


				Cost
				====

				The advantage of the database backends is, of course, that large amounts of
				data can be stored that won't fit in memory. The default 'backend' is actually
				to just store all of this information in a Python dictionary. This is very fast,
				but will run out of memory quickly if you move a lot of data around, or your
				cluster is to run for a long time.

				Unfortunately, the DB backends (SQLite and MongoDB) right now are rather slow,
				and can still consume large amounts of resources, particularly if large tasks
				or results are being created at a high frequency.

				For this reason, we have added :class:`~.NoDB`,a dummy backend that doesn't
				actually store any information. When you use this database, nothing is stored,
				and any request for results will result in a KeyError. This obviously prevents
				later requests for results and task resubmission from functioning, but
				sometimes those nice features are not as useful as keeping Hub memory under
				control.

	Site-wide shortcuts
/	Use quick search box
g h	Goto home page
g g	Goto my private gists page
g G	Goto my public gists page
g 0-9	Goto bookmarked items from 0-9
n r	New repository page
n g	New gist page

	Repositories
g s	Goto summary page
g c	Goto changelog page
g f	Goto files page
g F	Goto files page with file search activated
g p	Goto pull requests page
g o	Goto repository settings
g O	Goto repository access permissions settings
t s	Toggle sidebar on some pages