@@ -106,6 +106,13 @@ flags.update({
                 'use the MongoDB backend'),
     'dictdb' : ({'HubFactory' : {'db_class' : 'IPython.parallel.controller.dictdb.DictDB'}},
                 'use the in-memory DictDB backend'),
+    'nodb' : ({'HubFactory' : {'db_class' : 'IPython.parallel.controller.dictdb.NoDB'}},
+                """use dummy DB backend, which doesn't store any information.
+
+                This can be used to prevent growth of the memory footprint of the Hub
+                in cases where its record-keeping is not required.  Requesting results
+                of tasks submitted by other clients, db_queries, and task resubmission
+                will not be available."""),
     'reuse' : ({'IPControllerApp' : {'reuse_files' : True}},
                 'reuse existing json connection files')
 })
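The hunk above registers a ``nodb`` command-line flag whose payload is a nested config override. As a rough illustration of how such a flag selection turns into configuration, here is a simplified stand-in for IPython's flag mechanism; the ``apply_flag`` helper is hypothetical, only the flag payload is taken from the diff.

```python
# Simplified sketch: each flag name maps to (config_overrides, help_text),
# and selecting a flag merges its overrides into a nested config dict.
# The flag payload mirrors the diff above; apply_flag() is a hypothetical helper.
flags = {
    'nodb': ({'HubFactory': {'db_class': 'IPython.parallel.controller.dictdb.NoDB'}},
             "use dummy DB backend, which doesn't store any information."),
}

def apply_flag(config, flag_name):
    """Merge a flag's config overrides into an existing config dict."""
    overrides, _help = flags[flag_name]
    for section, values in overrides.items():
        config.setdefault(section, {}).update(values)
    return config

config = apply_flag({}, 'nodb')
print(config['HubFactory']['db_class'])
# prints: IPython.parallel.controller.dictdb.NoDB
```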
@@ -183,3 +183,34 @@ class DictDB(BaseDB):
         """get all msg_ids, ordered by time submitted."""
         msg_ids = self._records.keys()
         return sorted(msg_ids, key=lambda m: self._records[m]['submitted'])
+
+class NoDB(DictDB):
+    """A blackhole db backend that actually stores no information.
+
+    Provides the full DB interface, but raises KeyErrors on any
+    method that tries to access the records.  This can be used to
+    minimize the memory footprint of the Hub when its record-keeping
+    functionality is not required.
+    """
+
+    def add_record(self, msg_id, record):
+        pass
+
+    def get_record(self, msg_id):
+        raise KeyError("NoDB does not support record access")
+
+    def update_record(self, msg_id, record):
+        pass
+
+    def drop_matching_records(self, check):
+        pass
+
+    def drop_record(self, msg_id):
+        pass
+
+    def find_records(self, check, keys=None):
+        raise KeyError("NoDB does not store information")
+
+    def get_history(self):
+        raise KeyError("NoDB does not store information")
+
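The new class overrides every record-touching method: writes become no-ops and reads raise ``KeyError``. A minimal, self-contained sketch of that pattern is below; the ``DictDB`` base here is a toy stand-in (the real one subclasses ``BaseDB`` and does more), but the ``NoDB`` overrides mirror the diff.

```python
# Toy stand-in for the dict-backed base class; the real DictDB lives in
# IPython.parallel.controller.dictdb and subclasses BaseDB.
class DictDB(object):
    def __init__(self):
        self._records = {}

    def add_record(self, msg_id, record):
        self._records[msg_id] = record

    def get_record(self, msg_id):
        return self._records[msg_id]

    def get_history(self):
        return sorted(self._records, key=lambda m: self._records[m]['submitted'])

# Blackhole backend, mirroring the diff: writes are silently discarded,
# reads always fail, so the Hub retains nothing.
class NoDB(DictDB):
    def add_record(self, msg_id, record):
        pass  # accepted, but nothing is stored

    def get_record(self, msg_id):
        raise KeyError("NoDB does not support record access")

    def get_history(self):
        raise KeyError("NoDB does not store information")

db = NoDB()
db.add_record('abc', {'submitted': 1})  # no-op
try:
    db.get_record('abc')
except KeyError as e:
    print(e)
```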
@@ -112,3 +112,26 @@ Result headers for all jobs on engine 3 or 4:
     In [1]: uuids = map(rc._engines.get, (3,4))
     
     In [2]: hist34 = rc.db_query({'engine_uuid' : {'$in' : uuids }}, keys='result_header')
+
+
+Cost
+====
+
+The advantage of the database backends is, of course, that large amounts of
+data can be stored that won't fit in memory.  The default 'backend' is actually
+to just store all of this information in a Python dictionary.  This is very fast,
+but will run out of memory quickly if you move a lot of data around, or your
+cluster runs for a long time.
+
+Unfortunately, the DB backends (SQLite and MongoDB) are currently rather slow,
+and can still consume large amounts of resources, particularly if large tasks
+or results are being created at a high frequency.
+
+For this reason, we have added :class:`~.NoDB`, a dummy backend that doesn't
+actually store any information.  When you use this backend, nothing is stored,
+and any request for results will raise a KeyError.  This obviously prevents
+later requests for results and task resubmission from functioning, but
+sometimes those nice features are not as useful as keeping Hub memory under
+control.
+
+
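The trade-off the new "Cost" section describes can be seen with two toy backends: a dict-backed store retains every task record for the life of the process, while a NoDB-style store retains nothing. These two classes are illustrative stand-ins, not IPython's implementation.

```python
# Dict-backed store: every record stays alive as long as the process runs,
# so memory grows with the number of tasks (the DictDB behavior).
class DictBackend(object):
    def __init__(self):
        self._records = {}

    def add_record(self, msg_id, record):
        self._records[msg_id] = record

# NoDB-style store: records are discarded on arrival, so memory stays flat,
# at the cost of any later lookup failing.
class NoOpBackend(DictBackend):
    def add_record(self, msg_id, record):
        pass

full, empty = DictBackend(), NoOpBackend()
for i in range(10000):
    record = {'msg_id': i, 'result': b'x' * 100}  # pretend task result
    full.add_record(i, record)
    empty.add_record(i, record)

print(len(full._records))   # 10000 records retained
print(len(empty._records))  # 0 records retained
```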
@@ -763,6 +763,10 @@ To use one of these backends, you must set the :attr:`HubFactory.db_class` trait
     # and SQLite:
     c.HubFactory.db_class = 'IPython.parallel.controller.sqlitedb.SQLiteDB'
     
+    # You can use NoDB to disable the database altogether, in case you don't need
+    # to reuse tasks or results, and want to keep memory consumption under control.
+    c.HubFactory.db_class = 'IPython.parallel.controller.dictdb.NoDB'
+
 When using the proper databases, you can actually allow for tasks to persist from
 one session to the next by specifying the MongoDB database or SQLite table in
 which tasks are to be stored. The default is to use a table named for the Hub's Session,
@@ -789,6 +793,22 @@ you can specify any arguments you may need to the PyMongo `Connection
     # keyword args to pymongo.Connection
     c.MongoDB.connection_kwargs = {}
     
+But sometimes you are moving lots of data around quickly, and you don't need
+that information to be stored for later access, even by other Clients to this
+same session.  For this case, we have a dummy database, which doesn't actually
+store anything.  This lets the Hub stay small in memory, at the obvious expense
+of being able to access the information that would have been stored in the
+database (used for task resubmission, requesting results of tasks you didn't
+submit, etc.).  To use this backend, simply pass ``--nodb`` to
+:command:`ipcontroller` on the command-line, or specify the :class:`NoDB` class
+in your :file:`ipcontroller_config.py` as described above.
+
+
+.. seealso::
+
+    For more information on the database backends, see the :ref:`db backend reference <parallel_db>`.
+
+
 .. _PyMongo: http://api.mongodb.org/python/1.9/
 
 Configuring `ipengine`
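Pulling the two activation paths from the docs hunks above into one place, a sketch of how the new backend would be enabled (command taken from the diff; actual invocation depends on your IPython version):

```shell
# Flag form, per the docs above: run the controller with no task database.
ipcontroller --nodb

# Config-file form: the equivalent line in ipcontroller_config.py.
# c.HubFactory.db_class = 'IPython.parallel.controller.dictdb.NoDB'
```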