fixes #90 + docs update
marcink -
r894:1fed3c91 beta
@@ -1,294 +1,303 b''
1 1 .. _setup:
2 2
3 3 Setup
4 4 =====
5 5
6 6
7 7 Setting up the application
8 8 --------------------------
9 9
10 10 First you'll need to create a RhodeCode config file. Run the following command
11 11 to do this
12 12
13 13 ::
14 14
15 15 paster make-config RhodeCode production.ini
16 16
17 17 - This will create a `production.ini` config file inside the current directory.
18 18 This config file contains various settings for RhodeCode, e.g. proxy port,
19 19 email settings, usage of static files, cache, celery settings and logging.
20 20
21 21
22 22
23 23 Next we need to create the database.
24 24
25 25 ::
26 26
27 27 paster setup-app production.ini
28 28
29 29 - This command will create all needed tables and an admin account.
30 30 When asked for a path you can either use a new location or one with already
31 31 existing repositories. RhodeCode will simply add all newly found repositories
32 32 to its database. Also make sure you specify the correct path to repositories.
33 33 - Remember that the given path for mercurial_ repositories must be writable
34 34 by the application. This is very important, since the RhodeCode web
35 35 interface will work even without such access, but pushing will
36 36 eventually fail with permission denied errors.
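As a quick sanity check before running setup, you can verify up front that the repositories path is writable. This is generic Python, not a RhodeCode command:

```python
import os

def repos_path_writable(path):
    """Return True when the repositories root exists and is writable."""
    return os.path.isdir(path) and os.access(path, os.W_OK)

# Example usage: check the current working directory
print(repos_path_writable(os.getcwd()))
```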
37 37
38 38 You are ready to use RhodeCode. To run it, simply execute
39 39
40 40 ::
41 41
42 42 paster serve production.ini
43 43
44 44 - This command runs the RhodeCode server; the app should be available at
45 45 127.0.0.1:5000. This IP and port are configurable via the production.ini
46 46 file created in the previous step.
47 47 - Use the admin account you created to log in.
48 48 - The default permission on each repository is read, and the owner is admin.
49 49 Remember to update these if needed. In the admin panel you can toggle ldap,
50 50 anonymous and permissions settings, as well as edit more advanced options on
51 51 users and repositories.
52 52
53 53
54 54 Setting up Whoosh full text search
55 55 ----------------------------------
56 56
57 Index for whoosh can be build starting from version 1.1 using paster command
58 passing repo locations to index, as well as Your config file that stores
59 whoosh index files locations. There is possible to pass `-f` to the options
57 Starting from version 1.1 the whoosh index can be built using a paster command.
58 You have to specify the config file that stores the index location, and
59 the location of repositories (`--repo-location`). Starting from version 1.2 it is
60 also possible to specify a comma separated list of repositories (`--index-only`)
61 to build the index only for the chosen repositories, skipping any others found
62 in the repos location.
63
64 It is also possible to pass the `-f` option
60 65 to force a full index rebuild. Without it, indexing will always run in
61 66 incremental mode.
62 67
63 ::
68 incremental mode::
64 69
65 70 paster make-index production.ini --repo-location=<location for repos>
66 71
67 for full index rebuild You can use
72
68 73
69 ::
74 For a full index rebuild you can use::
70 75
71 76 paster make-index production.ini -f --repo-location=<location for repos>
72 77
73 - For full text search You can either put crontab entry for
78
79 Building the index just for chosen repositories is possible with the following command::
80
81 paster make-index production.ini --repo-location=<location for repos> --index-only=vcs,rhodecode
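The `--index-only` filtering amounts to splitting the comma separated value and keeping only the named repositories. A simplified sketch of that behaviour (the names here are illustrative, not RhodeCode's API):

```python
def filter_repos(repo_paths, index_only=None):
    """Keep only the repositories named in the comma separated
    --index-only value; keep everything when it is not given."""
    if not index_only:
        return dict(repo_paths)
    wanted = {name.strip() for name in index_only.split(',')}
    return {name: path for name, path in repo_paths.items() if name in wanted}

repos = {'vcs': '/repos/vcs', 'rhodecode': '/repos/rhodecode',
         'other': '/repos/other'}
print(sorted(filter_repos(repos, 'vcs, rhodecode')))  # ['rhodecode', 'vcs']
```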
74 82
75 This command can be run even from crontab in order to do periodical
76 index builds and keep Your index always up to date. An example entry might
77 look like this
83
84 In order to do periodic index builds and keep your index always up to date,
85 it's recommended to add a crontab entry for incremental indexing.
86 An example entry might look like this
78 87
79 88 ::
80 89
81 90 /path/to/python/bin/paster make-index /path/to/rhodecode/production.ini --repo-location=<location for repos>
82 91
83 When using incremental(default) mode whoosh will check last modification date
92 When using incremental (default) mode, whoosh will check the last modification date
84 93 of each file and schedule it for reindexing if a newer version is available. The
85 94 indexing daemon also checks for removed files and removes them from the index.
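The incremental logic described above can be sketched as a pure function over modification times (a simplified model, not RhodeCode's actual implementation):

```python
def plan_incremental_update(indexed, on_disk):
    """indexed and on_disk map file path -> modification time.
    Returns (paths to reindex, paths to drop from the index)."""
    to_remove = set(indexed) - set(on_disk)          # deleted files
    to_reindex = {path for path, mtime in on_disk.items()
                  if path not in indexed or mtime > indexed[path]}
    return to_reindex, to_remove

reindex, remove = plan_incremental_update(
    {'a.py': 100, 'b.py': 200},
    {'a.py': 100, 'b.py': 300, 'c.py': 50})
print(sorted(reindex), sorted(remove))  # ['b.py', 'c.py'] []
```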
86 95
87 96 Sometimes you might want to rebuild the index from scratch. You can do that
88 97 using the `-f` flag passed to the paster command, or in the admin panel you
89 98 can check the `build from scratch` flag.
90 99
91 100
92 101 Setting up LDAP support
93 102 -----------------------
94 103
95 104 Starting from version 1.1, RhodeCode supports ldap authentication. In order
96 105 to use ldap, you have to install the python-ldap package. It is available
97 106 via pypi, so you can install it by running
98 107
99 108 ::
100 109
101 110 easy_install python-ldap
102 111
103 112 or::
104 113
105 114 pip install python-ldap
106 115
107 116 .. note::
108 117 python-ldap requires certain libraries on your system, so before installing
109 118 it check that you have at least the `openldap` and `sasl` libraries.
110 119
111 120 The ldap settings are located in the admin->ldap section.
112 121
113 122 Here's a typical ldap setup::
114 123
115 124 Enable ldap = checked #controls if ldap access is enabled
116 125 Host = host.domain.org #actual ldap server to connect
117 126 Port = 389 or 636 for ldaps #ldap server ports
118 127 Enable LDAPS = unchecked #enable disable ldaps
119 128 Account = <account> #access for ldap server(if required)
120 129 Password = <password> #password for ldap server(if required)
121 130 Base DN = uid=%(user)s,CN=users,DC=host,DC=domain,DC=org
122 131
123 132
124 133 `Account` and `Password` are optional, and are used for two-phase ldap
125 134 authentication; they are the credentials to access your ldap server if it
126 135 doesn't support anonymous search/user lookups.
127 136
128 137 Base DN must contain the %(user)s template; it's a placeholder where the uid
129 138 used to log in goes, and it allows admins to specify a non-standard schema
130 139 for the uid variable.
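The %(user)s placeholder is plain Python string interpolation, which is how you can picture the substitution (the Base DN below is a hypothetical example):

```python
# Hypothetical Base DN as entered in the admin panel
base_dn = 'uid=%(user)s,CN=users,DC=host,DC=domain,DC=org'

# At login time the uid typed by the user is substituted in
print(base_dn % {'user': 'jdoe'})  # uid=jdoe,CN=users,DC=host,DC=domain,DC=org
```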
131 140
132 141 If all data is entered correctly and `python-ldap` is properly installed,
133 142 users should be able to access RhodeCode with their ldap accounts. When
134 143 logging in for the first time, a special ldap account is created inside
135 144 RhodeCode, so you have control over permissions even for ldap users. If such
136 145 a user already exists in the RhodeCode database, the ldap user with the same
137 146 username will not be able to access RhodeCode.
138 147
139 148 If you have problems with ldap access and believe you entered the correct
140 149 information, check the RhodeCode logs; any error messages sent from
141 150 ldap will be saved there.
142 151
143 152
144 153
145 154 Setting Up Celery
146 155 -----------------
147 156
148 157 Since version 1.1, celery is configured via the rhodecode ini configuration
149 158 files. Simply set use_celery=true in the ini file, then add or change the
150 159 configuration variables inside the ini file.
151 160
152 161 Remember that the ini files use a format with '.' instead of '_' as in celery,
153 162 so for example setting `BROKER_HOST` in celery means setting `broker.host` in
154 163 the config file.
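That naming rule can be expressed as a one-line transformation (an illustrative helper, not something RhodeCode ships):

```python
def celery_name_to_ini_key(name):
    """Map a celery setting name like BROKER_HOST to its ini key."""
    return name.lower().replace('_', '.')

print(celery_name_to_ini_key('BROKER_HOST'))  # broker.host
```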
155 164
156 165 In order to start using celery, run::
157 166 paster celeryd <configfile.ini>
158 167
159 168
160 169
161 170 .. note::
162 171 Make sure you run this command from the same virtualenv, and with the same
163 172 user that rhodecode runs as.
164 173
165 174
166 175 Nginx virtual host example
167 176 --------------------------
168 177
169 178 Sample config for nginx using proxy::
170 179
171 180 server {
172 181 listen 80;
173 182 server_name hg.myserver.com;
174 183 access_log /var/log/nginx/rhodecode.access.log;
175 184 error_log /var/log/nginx/rhodecode.error.log;
176 185 location / {
177 186 root /var/www/rhodecode/rhodecode/public/;
178 187 if (!-f $request_filename){
179 188 proxy_pass http://127.0.0.1:5000;
180 189 }
181 190 #this is important if you want to use https !!!
182 191 proxy_set_header X-Url-Scheme $scheme;
183 192 include /etc/nginx/proxy.conf;
184 193 }
185 194 }
186 195
187 196 Here's the proxy.conf. It's tuned so it won't time out on long
188 197 pushes or on large pushes::
189 198
190 199 proxy_redirect off;
191 200 proxy_set_header Host $host;
192 201 proxy_set_header X-Host $http_host;
193 202 proxy_set_header X-Real-IP $remote_addr;
194 203 proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
195 204 proxy_set_header Proxy-host $proxy_host;
196 205 client_max_body_size 400m;
197 206 client_body_buffer_size 128k;
198 207 proxy_buffering off;
199 208 proxy_connect_timeout 3600;
200 209 proxy_send_timeout 3600;
201 210 proxy_read_timeout 3600;
202 211 proxy_buffer_size 8k;
203 212 proxy_buffers 8 32k;
204 213 proxy_busy_buffers_size 64k;
205 214 proxy_temp_file_write_size 64k;
206 215
207 216 Also, when using the root path with nginx, you might set static files to false
208 217 in the production.ini file::
209 218
210 219 [app:main]
211 220 use = egg:rhodecode
212 221 full_stack = true
213 222 static_files = false
214 223 lang=en
215 224 cache_dir = %(here)s/data
216 225
217 226 This stops static files from being served by the application and improves speed.
218 227
219 228
220 229 Apache virtual host example
221 230 ---------------------------
222 231
223 232 Sample config for apache using proxy::
224 233
225 234 <VirtualHost *:80>
226 235 ServerName hg.myserver.com
227 236 ServerAlias hg.myserver.com
228 237
229 238 <Proxy *>
230 239 Order allow,deny
231 240 Allow from all
232 241 </Proxy>
233 242
234 243 #important !
235 244 #Directive to properly generate url (clone url) for pylons
236 245 ProxyPreserveHost On
237 246
238 247 #rhodecode instance
239 248 ProxyPass / http://127.0.0.1:5000/
240 249 ProxyPassReverse / http://127.0.0.1:5000/
241 250
242 251 #to enable https use line below
243 252 #SetEnvIf X-Url-Scheme https HTTPS=1
244 253
245 254 </VirtualHost>
246 255
247 256
248 257 Additional tutorial
249 258 http://wiki.pylonshq.com/display/pylonscookbook/Apache+as+a+reverse+proxy+for+Pylons
250 259
251 260
252 261 Apache's example FCGI config
253 262 ----------------------------
254 263
255 264 TODO !
256 265
257 266 Other configuration files
258 267 -------------------------
259 268
260 269 Some example init.d scripts for debian and gentoo can be found here:
261 270
262 271 https://rhodecode.org/rhodecode/files/tip/init.d
263 272
264 273
265 274 Troubleshooting
266 275 ---------------
267 276
268 277 - missing static files?
269 278
270 279 - make sure either to set `static_files = true` in the .ini file, or
271 280 double check the root path of your http setup. It should point to,
272 281 for example:
273 282 /home/my-virtual-python/lib/python2.6/site-packages/rhodecode/public
274 283
275 284 - can't install celery/rabbitmq
276 285
277 286 - don't worry, RhodeCode works without them too. No extra setup is required
278 287
279 288 - long lasting push timeouts?
280 289
281 290 - make sure you set longer timeouts in your proxy/fcgi settings; timeouts
282 291 are caused by the http server and not RhodeCode
283 292
284 293 - large push timeouts?
285 294
286 295 - make sure you set a proper max_body_size for the http server
287 296
288 297
289 298
290 299 .. _virtualenv: http://pypi.python.org/pypi/virtualenv
291 300 .. _python: http://www.python.org/
292 301 .. _mercurial: http://mercurial.selenic.com/
293 302 .. _celery: http://celeryproject.org/
294 303 .. _rabbitmq: http://www.rabbitmq.com/
@@ -1,193 +1,203 b''
1 1 import os
2 2 import sys
3 3 import traceback
4 4 from os.path import dirname as dn, join as jn
5 5
6 6 #to get the rhodecode import
7 7 sys.path.append(dn(dn(dn(os.path.realpath(__file__)))))
8 8
9 from string import strip
10
9 11 from rhodecode.model import init_model
10 12 from rhodecode.model.scm import ScmModel
11 13 from rhodecode.config.environment import load_environment
12 14 from rhodecode.lib.utils import BasePasterCommand, Command, add_cache
13 15
14 16 from shutil import rmtree
15 17 from webhelpers.html.builder import escape
16 18 from vcs.utils.lazy import LazyProperty
17 19
18 20 from sqlalchemy import engine_from_config
19 21
20 22 from whoosh.analysis import RegexTokenizer, LowercaseFilter, StopFilter
21 23 from whoosh.fields import TEXT, ID, STORED, Schema, FieldType
22 24 from whoosh.index import create_in, open_dir
23 25 from whoosh.formats import Characters
24 26 from whoosh.highlight import highlight, SimpleFragmenter, HtmlFormatter
25 27
26 28
27 29 #EXTENSIONS WE WANT TO INDEX CONTENT OFF
28 30 INDEX_EXTENSIONS = ['action', 'adp', 'ashx', 'asmx', 'aspx', 'asx', 'axd', 'c',
29 31 'cfg', 'cfm', 'cpp', 'cs', 'css', 'diff', 'do', 'el', 'erl',
30 32 'h', 'htm', 'html', 'ini', 'java', 'js', 'jsp', 'jspx', 'lisp',
31 33 'lua', 'm', 'mako', 'ml', 'pas', 'patch', 'php', 'php3',
32 34 'php4', 'phtml', 'pm', 'py', 'rb', 'rst', 's', 'sh', 'sql',
33 35 'tpl', 'txt', 'vim', 'wss', 'xhtml', 'xml', 'xsl', 'xslt',
34 36 'yaws']
35 37
36 38 #CUSTOM ANALYZER wordsplit + lowercase filter
37 39 ANALYZER = RegexTokenizer(expression=r"\w+") | LowercaseFilter()
38 40
39 41
40 42 #INDEX SCHEMA DEFINITION
41 43 SCHEMA = Schema(owner=TEXT(),
42 44 repository=TEXT(stored=True),
43 45 path=TEXT(stored=True),
44 46 content=FieldType(format=Characters(ANALYZER),
45 47 scorable=True, stored=True),
46 48 modtime=STORED(), extension=TEXT(stored=True))
47 49
48 50
49 51 IDX_NAME = 'HG_INDEX'
50 52 FORMATTER = HtmlFormatter('span', between='\n<span class="break">...</span>\n')
51 53 FRAGMENTER = SimpleFragmenter(200)
52 54
53 55
54 56 class MakeIndex(BasePasterCommand):
55 57
56 58 max_args = 1
57 59 min_args = 1
58 60
59 61 usage = "CONFIG_FILE"
60 62 summary = "Creates index for full text search given configuration file"
61 63 group_name = "RhodeCode"
62 64 takes_config_file = -1
63 65 parser = Command.standard_parser(verbose=True)
64 66
65 67 def command(self):
66 68
67 69 from pylons import config
68 70 add_cache(config)
69 71 engine = engine_from_config(config, 'sqlalchemy.db1.')
70 72 init_model(engine)
71 73
72 74 index_location = config['index_dir']
73 75 repo_location = self.options.repo_location
76 repo_list = map(strip, self.options.repo_list.split(',')) if self.options.repo_list else None
74 77
75 78 #======================================================================
76 79 # WHOOSH DAEMON
77 80 #======================================================================
78 81 from rhodecode.lib.pidlock import LockHeld, DaemonLock
79 82 from rhodecode.lib.indexers.daemon import WhooshIndexingDaemon
80 83 try:
81 84 l = DaemonLock()
82 85 WhooshIndexingDaemon(index_location=index_location,
83 repo_location=repo_location)\
86 repo_location=repo_location,
87 repo_list=repo_list)\
84 88 .run(full_index=self.options.full_index)
85 89 l.release()
86 90 except LockHeld:
87 91 sys.exit(1)
88 92
89 93 def update_parser(self):
90 94 self.parser.add_option('--repo-location',
91 95 action='store',
92 96 dest='repo_location',
93 97 help="Specifies repositories location to index REQUIRED",
94 98 )
99 self.parser.add_option('--index-only',
100 action='store',
101 dest='repo_list',
102 help="Specifies a comma separated list of repositories "
103 "to build index on OPTIONAL",
104 )
95 105 self.parser.add_option('-f',
96 106 action='store_true',
97 107 dest='full_index',
98 108 help="Specifies that index should be made full i.e"
99 109 " destroy old and build from scratch",
100 110 default=False)
101 111
102 112 class ResultWrapper(object):
103 113 def __init__(self, search_type, searcher, matcher, highlight_items):
104 114 self.search_type = search_type
105 115 self.searcher = searcher
106 116 self.matcher = matcher
107 117 self.highlight_items = highlight_items
108 118 self.fragment_size = 200 / 2
109 119
110 120 @LazyProperty
111 121 def doc_ids(self):
112 122 docs_id = []
113 123 while self.matcher.is_active():
114 124 docnum = self.matcher.id()
115 125 chunks = [offsets for offsets in self.get_chunks()]
116 126 docs_id.append([docnum, chunks])
117 127 self.matcher.next()
118 128 return docs_id
119 129
120 130 def __str__(self):
121 131 return '<%s at %s>' % (self.__class__.__name__, len(self.doc_ids))
122 132
123 133 def __repr__(self):
124 134 return self.__str__()
125 135
126 136 def __len__(self):
127 137 return len(self.doc_ids)
128 138
129 139 def __iter__(self):
130 140 """
131 141 Allows iteration over results, and lazily generates content.
132 142
133 143 *Requires* implementation of ``__getitem__`` method.
134 144 """
135 145 for docid in self.doc_ids:
136 146 yield self.get_full_content(docid)
137 147
138 148 def __getslice__(self, i, j):
139 149 """
140 150 Slicing of resultWrapper
141 151 """
142 152 slice = []
143 153 for docid in self.doc_ids[i:j]:
144 154 slice.append(self.get_full_content(docid))
145 155 return slice
146 156
147 157
148 158 def get_full_content(self, docid):
149 159 res = self.searcher.stored_fields(docid[0])
150 160 f_path = res['path'][res['path'].find(res['repository']) \
151 161 + len(res['repository']):].lstrip('/')
152 162
153 163 content_short = self.get_short_content(res, docid[1])
154 164 res.update({'content_short':content_short,
155 165 'content_short_hl':self.highlight(content_short),
156 166 'f_path':f_path})
157 167
158 168 return res
159 169
160 170 def get_short_content(self, res, chunks):
161 171
162 172 return ''.join([res['content'][chunk[0]:chunk[1]] for chunk in chunks])
163 173
164 174 def get_chunks(self):
165 175 """
166 176 Smart function that chunks the content
167 177 without overlapping chunks, so it doesn't highlight the same
168 178 close occurrences twice.
169 179 @param matcher:
170 180 @param size:
171 181 """
172 182 memory = [(0, 0)]
173 183 for span in self.matcher.spans():
174 184 start = span.startchar or 0
175 185 end = span.endchar or 0
176 186 start_offseted = max(0, start - self.fragment_size)
177 187 end_offseted = end + self.fragment_size
178 188
179 189 if start_offseted < memory[-1][1]:
180 190 start_offseted = memory[-1][1]
181 191 memory.append((start_offseted, end_offseted,))
182 192 yield (start_offseted, end_offseted,)
183 193
184 194 def highlight(self, content, top=5):
185 195 if self.search_type != 'content':
186 196 return ''
187 197 hl = highlight(escape(content),
188 198 self.highlight_items,
189 199 analyzer=ANALYZER,
190 200 fragmenter=FRAGMENTER,
191 201 formatter=FORMATTER,
192 202 top=top)
193 203 return hl
@@ -1,226 +1,236 b''
1 1 # -*- coding: utf-8 -*-
2 2 """
3 3 rhodecode.lib.indexers.daemon
4 4 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
5 5
6 6 A daemon will read from the task table and run tasks
7 7
8 8 :created_on: Jan 26, 2010
9 9 :author: marcink
10 10 :copyright: (C) 2009-2010 Marcin Kuzminski <marcin@python-works.com>
11 11 :license: GPLv3, see COPYING for more details.
12 12 """
13 13 # This program is free software; you can redistribute it and/or
14 14 # modify it under the terms of the GNU General Public License
15 15 # as published by the Free Software Foundation; version 2
16 16 # of the License or (at your option) any later version of the license.
17 17 #
18 18 # This program is distributed in the hope that it will be useful,
19 19 # but WITHOUT ANY WARRANTY; without even the implied warranty of
20 20 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
21 21 # GNU General Public License for more details.
22 22 #
23 23 # You should have received a copy of the GNU General Public License
24 24 # along with this program; if not, write to the Free Software
25 25 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
26 26 # MA 02110-1301, USA.
27 27
28 28 import sys
29 29 import os
30 30 import traceback
31 31 from os.path import dirname as dn
32 32 from os.path import join as jn
33 33
34 34 #to get the rhodecode import
35 35 project_path = dn(dn(dn(dn(os.path.realpath(__file__)))))
36 36 sys.path.append(project_path)
37 37
38 38
39 39 from rhodecode.model.scm import ScmModel
40 40 from rhodecode.lib.helpers import safe_unicode
41 41 from whoosh.index import create_in, open_dir
42 42 from shutil import rmtree
43 43 from rhodecode.lib.indexers import INDEX_EXTENSIONS, SCHEMA, IDX_NAME
44 44
45 45 from time import mktime
46 46 from vcs.exceptions import ChangesetError, RepositoryError
47 47
48 48 import logging
49 49
50 50 log = logging.getLogger('whooshIndexer')
51 51 # create logger
52 52 log.setLevel(logging.DEBUG)
53 53 log.propagate = False
54 54 # create console handler and set level to debug
55 55 ch = logging.StreamHandler()
56 56 ch.setLevel(logging.DEBUG)
57 57
58 58 # create formatter
59 59 formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
60 60
61 61 # add formatter to ch
62 62 ch.setFormatter(formatter)
63 63
64 64 # add ch to logger
65 65 log.addHandler(ch)
66 66
67 67 class WhooshIndexingDaemon(object):
68 68 """
69 69 Daemon for atomic jobs
70 70 """
71 71
72 72 def __init__(self, indexname='HG_INDEX', index_location=None,
73 repo_location=None, sa=None):
73 repo_location=None, sa=None, repo_list=None):
74 74 self.indexname = indexname
75 75
76 76 self.index_location = index_location
77 77 if not index_location:
78 78 raise Exception('You have to provide index location')
79 79
80 80 self.repo_location = repo_location
81 81 if not repo_location:
82 82 raise Exception('You have to provide repositories location')
83 83
84 84 self.repo_paths = ScmModel(sa).repo_scan(self.repo_location, None)
85
86 if repo_list:
87 filtered_repo_paths = {}
88 for repo_name, repo in self.repo_paths.items():
89 if repo_name in repo_list:
90 filtered_repo_paths[repo.name] = repo
91
92 self.repo_paths = filtered_repo_paths
93
94
85 95 self.initial = False
86 96 if not os.path.isdir(self.index_location):
87 97 os.makedirs(self.index_location)
88 98 log.info('Cannot run incremental index since it does not'
89 99 ' yet exist - running full build')
90 100 self.initial = True
91 101
92 102 def get_paths(self, repo):
93 103 """Recursively walk the root dir and return a set of all paths in it,
94 104 based on repository walk function
95 105 """
96 106 index_paths_ = set()
97 107 try:
98 108 for topnode, dirs, files in repo.walk('/', 'tip'):
99 109 for f in files:
100 110 index_paths_.add(jn(repo.path, f.path))
101 111 for dir in dirs:
102 112 for f in files:
103 113 index_paths_.add(jn(repo.path, f.path))
104 114
105 115 except RepositoryError, e:
106 116 log.debug(traceback.format_exc())
107 117 pass
108 118 return index_paths_
109 119
110 120 def get_node(self, repo, path):
111 121 n_path = path[len(repo.path) + 1:]
112 122 node = repo.get_changeset().get_node(n_path)
113 123 return node
114 124
115 125 def get_node_mtime(self, node):
116 126 return mktime(node.last_changeset.date.timetuple())
117 127
118 128 def add_doc(self, writer, path, repo):
119 129 """Adding doc to writer this function itself fetches data from
120 130 the instance of vcs backend"""
121 131 node = self.get_node(repo, path)
122 132
123 133 #we just index the content of chosen files, and skip binary files
124 134 if node.extension in INDEX_EXTENSIONS and not node.is_binary:
125 135
126 136 u_content = node.content
127 137 if not isinstance(u_content, unicode):
128 138 log.warning(' >> %s Could not get this content as unicode '
129 139 'replacing with empty content', path)
130 140 u_content = u''
131 141 else:
132 142 log.debug(' >> %s [WITH CONTENT]' % path)
133 143
134 144 else:
135 145 log.debug(' >> %s' % path)
136 146 #just index the file name without its content
137 147 u_content = u''
138 148
139 149 writer.add_document(owner=unicode(repo.contact),
140 150 repository=safe_unicode(repo.name),
141 151 path=safe_unicode(path),
142 152 content=u_content,
143 153 modtime=self.get_node_mtime(node),
144 154 extension=node.extension)
145 155
146 156
147 157 def build_index(self):
148 158 if os.path.exists(self.index_location):
149 159 log.debug('removing previous index')
150 160 rmtree(self.index_location)
151 161
152 162 if not os.path.exists(self.index_location):
153 163 os.mkdir(self.index_location)
154 164
155 165 idx = create_in(self.index_location, SCHEMA, indexname=IDX_NAME)
156 166 writer = idx.writer()
157 print self.repo_paths.values()
158 for cnt, repo in enumerate(self.repo_paths.values()):
167
168 for repo in self.repo_paths.values():
159 169 log.debug('building index @ %s' % repo.path)
160 170
161 171 for idx_path in self.get_paths(repo):
162 172 self.add_doc(writer, idx_path, repo)
163 173
164 174 log.debug('>> COMMITTING CHANGES <<')
165 175 writer.commit(merge=True)
166 176 log.debug('>>> FINISHED BUILDING INDEX <<<')
167 177
168 178
169 179 def update_index(self):
170 180 log.debug('STARTING INCREMENTAL INDEXING UPDATE')
171 181
172 182 idx = open_dir(self.index_location, indexname=self.indexname)
173 183 # The set of all paths in the index
174 184 indexed_paths = set()
175 185 # The set of all paths we need to re-index
176 186 to_index = set()
177 187
178 188 reader = idx.reader()
179 189 writer = idx.writer()
180 190
181 191 # Loop over the stored fields in the index
182 192 for fields in reader.all_stored_fields():
183 193 indexed_path = fields['path']
184 194 indexed_paths.add(indexed_path)
185 195
186 196 repo = self.repo_paths[fields['repository']]
187 197
188 198 try:
189 199 node = self.get_node(repo, indexed_path)
190 200 except ChangesetError:
191 201 # This file was deleted since it was indexed
192 202 log.debug('removing from index %s' % indexed_path)
193 203 writer.delete_by_term('path', indexed_path)
194 204
195 205 else:
196 206 # Check if this file was changed since it was indexed
197 207 indexed_time = fields['modtime']
198 208 mtime = self.get_node_mtime(node)
199 209 if mtime > indexed_time:
200 210 # The file has changed, delete it and add it to the list of
201 211 # files to reindex
202 212 log.debug('adding to reindex list %s' % indexed_path)
203 213 writer.delete_by_term('path', indexed_path)
204 214 to_index.add(indexed_path)
205 215
206 216 # Loop over the files in the filesystem
207 217 # Assume we have a function that gathers the filenames of the
208 218 # documents to be indexed
209 219 for repo in self.repo_paths.values():
210 220 for path in self.get_paths(repo):
211 221 if path in to_index or path not in indexed_paths:
212 222 # This is either a file that's changed, or a new file
213 223 # that wasn't indexed before. So index it!
214 224 self.add_doc(writer, path, repo)
215 225 log.debug('re indexing %s' % path)
216 226
217 227 log.debug('>> COMMITTING CHANGES <<')
218 228 writer.commit(merge=True)
219 229 log.debug('>>> FINISHED REBUILDING INDEX <<<')
220 230
221 231 def run(self, full_index=False):
222 232 """Run daemon"""
223 233 if full_index or self.initial:
224 234 self.build_index()
225 235 else:
226 236 self.update_index()