fixes #90 + docs update
marcink -
r894:1fed3c91 beta
@@ -1,294 +1,303 b''
.. _setup:

Setup
=====


Setting up the application
--------------------------

First you'll need to create a RhodeCode config file. Run the following command
to do this

::

    paster make-config RhodeCode production.ini

- This will create a `production.ini` config file inside the current directory.
  The config contains various settings for RhodeCode, e.g. proxy port,
  email settings, usage of static files, cache, celery settings and logging
  (a small excerpt is shown below).


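For example, the email-related settings can be adjusted right after the file is
created. The exact option names can differ between RhodeCode versions, so treat
the keys below as an assumption and double check them against the generated
file::

    email_to = admin@example.com
    error_email_from = rhodecode-noreply@example.com
    smtp_server = mail.example.com
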
Next we need to create the database.

::

    paster setup-app production.ini

- This command will create all needed tables and an admin account.
  When asked for a path you can either use a new location or one with
  already existing repositories. RhodeCode will simply add all newly found
  repositories to its database. Also make sure you specify the correct path
  to the repositories.
- Remember that the given path for mercurial_ repositories must be write
  accessible for the application. This is very important: the RhodeCode web
  interface will work even without such access, but pushes will eventually
  fail with permission denied errors.

You are ready to use RhodeCode. To run it simply execute

::

    paster serve production.ini

- This command runs the RhodeCode server; the app should be available at
  127.0.0.1:5000. This IP and port are configurable via the production.ini
  file created in the previous step (see the example below).
- Use the admin account you created to log in.
- The default permission on each repository is read, and the owner is admin.
  Remember to update these if needed. In the admin panel you can toggle ldap,
  anonymous and permissions settings, as well as edit more advanced options
  on users and repositories.


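If you need the application to listen on a different address or port, adjust
the server section in production.ini. This is the standard Paste server block;
the values below are only an example::

    [server:main]
    host = 0.0.0.0
    port = 5000

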
Setting up Whoosh full text search
----------------------------------

Starting from version 1.1 the whoosh index can be built using a paster command.
You have to specify the config file that stores the location of the index, and
the location of the repositories (`--repo-location`). Starting from version 1.2
it is also possible to specify a comma separated list of repositories
(`--index-only`) to build the index only for the chosen repositories, skipping
any others found in the repositories location.

It is also possible to pass `-f` to the options to enable a full index rebuild.
Without it, indexing will always run in incremental mode.

::

    paster make-index production.ini --repo-location=<location for repos>

For a full index rebuild you can use::

    paster make-index production.ini -f --repo-location=<location for repos>

Building the index just for chosen repositories is possible with a command like::

    paster make-index production.ini --repo-location=<location for repos> --index-only=vcs,rhodecode

In order to do periodical index builds and keep your index always up to date,
it's recommended to add a crontab entry for incremental indexing.
An example entry might look like this

::

    /path/to/python/bin/paster make-index /path/to/rhodecode/production.ini --repo-location=<location for repos>

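Note that the line above is only the command itself; an actual crontab entry
also needs the schedule fields. A sketch of a complete entry running the
incremental indexing nightly (the schedule is just an example)::

    0 4 * * * /path/to/python/bin/paster make-index /path/to/rhodecode/production.ini --repo-location=<location for repos>
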
When using incremental (default) mode, whoosh will check the last modification
date of each file and add it for reindexing if a newer version is available.
The indexing daemon also checks for removed files and removes them from the
index.

Sometimes you might want to rebuild the index from scratch. You can do that
using the `-f` flag passed to the paster command, or in the admin panel you
can check the `build from scratch` flag.


Setting up LDAP support
-----------------------

RhodeCode starting from version 1.1 supports ldap authentication. In order
to use ldap, you have to install the python-ldap package. This package is
available via pypi, so you can install it by running

::

    easy_install python-ldap

or

::

    pip install python-ldap

.. note::
   python-ldap requires certain libraries on your system, so before installing
   it check that you have at least the `openldap` and `sasl` libraries.

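On Debian/Ubuntu, for example, the required headers are typically provided by
packages along these lines (the package names are an assumption, check your
distribution)::

    apt-get install libldap2-dev libsasl2-dev libssl-dev
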
ldap settings are located in the admin->ldap section.

Here's a typical ldap setup::

    Enable ldap  = checked                 # controls if ldap access is enabled
    Host         = host.domain.org         # actual ldap server to connect to
    Port         = 389, or 636 for ldaps   # ldap server port
    Enable LDAPS = unchecked               # enable/disable ldaps
    Account      = <account>               # account used to access the ldap server (if required)
    Password     = <password>              # password for the ldap server (if required)
    Base DN      = uid=%(user)s,CN=users,DC=host,DC=domain,DC=org


`Account` and `Password` are optional and are used for two-phase ldap
authentication: they are the credentials used to access your ldap if it
doesn't support anonymous search/user lookups.

Base DN must contain the %(user)s template. It is a placeholder where the uid
used to log in is substituted, which allows admins to specify a non-standard
schema for the uid variable.

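For example, with the Base DN shown above a (hypothetical) login of `jsmith`
would be looked up as::

    uid=jsmith,CN=users,DC=host,DC=domain,DC=org
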
If all data are entered correctly and `python-ldap` is properly installed,
users should be able to access RhodeCode with their ldap accounts. When
logging in for the first time, a special ldap account is created inside
RhodeCode, so you keep control over permissions even for ldap users. If such
a user already exists in the RhodeCode database, an ldap user with the same
username will not be able to access RhodeCode.

If you have problems with ldap access and believe you entered the correct
information, check the RhodeCode logs; any error messages sent from ldap
will be saved there.



Setting Up Celery
-----------------

Since version 1.1 celery is configured via the rhodecode ini configuration
files. Simply set use_celery=true in the ini file, then add / change the
configuration variables inside the ini file.

Remember that the ini file uses the format with '.' and not with '_' as
celery does, so for example setting `BROKER_HOST` in celery means setting
`broker.host` in the config file.

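As an illustration, assuming the standard celery `BROKER_*` settings mapped as
described above, the relevant part of the ini file might look like this (host
and credentials are placeholders for your rabbitmq_ setup)::

    use_celery = true
    broker.host = localhost
    broker.port = 5672
    broker.user = <user>
    broker.password = <password>
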
In order to start using celery, run::

    paster celeryd <configfile.ini>


.. note::
   Make sure you run this command from the same virtualenv, and with the same
   user that rhodecode runs as.


Nginx virtual host example
--------------------------

Sample config for nginx using proxy::

    server {
        listen 80;
        server_name hg.myserver.com;
        access_log /var/log/nginx/rhodecode.access.log;
        error_log /var/log/nginx/rhodecode.error.log;
        location / {
            root /var/www/rhodecode/rhodecode/public/;
            if (!-f $request_filename){
                proxy_pass http://127.0.0.1:5000;
            }
            #this is important if You want to use https !!!
            proxy_set_header X-Url-Scheme $scheme;
            include /etc/nginx/proxy.conf;
        }
    }

Here's the proxy.conf. It's tuned so it'll not time out on long
pushes or on large pushes::

    proxy_redirect              off;
    proxy_set_header            Host $host;
    proxy_set_header            X-Host $http_host;
    proxy_set_header            X-Real-IP $remote_addr;
    proxy_set_header            X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header            Proxy-host $proxy_host;
    client_max_body_size        400m;
    client_body_buffer_size     128k;
    proxy_buffering             off;
    proxy_connect_timeout       3600;
    proxy_send_timeout          3600;
    proxy_read_timeout          3600;
    proxy_buffer_size           8k;
    proxy_buffers               8 32k;
    proxy_busy_buffers_size     64k;
    proxy_temp_file_write_size  64k;

Also, when using the root path setup with nginx as above, you might set static
files to false in the production.ini file::

    [app:main]
    use = egg:rhodecode
    full_stack = true
    static_files = false
    lang=en
    cache_dir = %(here)s/data

so that the static files are not served by the application, which improves speed.


Apache virtual host example
---------------------------

Sample config for apache using proxy::

    <VirtualHost *:80>
        ServerName hg.myserver.com
        ServerAlias hg.myserver.com

        <Proxy *>
            Order allow,deny
            Allow from all
        </Proxy>

        #important !
        #Directive to properly generate url (clone url) for pylons
        ProxyPreserveHost On

        #rhodecode instance
        ProxyPass / http://127.0.0.1:5000/
        ProxyPassReverse / http://127.0.0.1:5000/

        #to enable https use line below
        #SetEnvIf X-Url-Scheme https HTTPS=1

    </VirtualHost>


Additional tutorial:
http://wiki.pylonshq.com/display/pylonscookbook/Apache+as+a+reverse+proxy+for+Pylons


Apache's example FCGI config
----------------------------

TODO !

Other configuration files
-------------------------

Some example init.d scripts can be found here, for debian and gentoo:

https://rhodecode.org/rhodecode/files/tip/init.d


Troubleshooting
---------------

- Missing static files?

  - Make sure either to set `static_files = true` in the .ini file, or to
    double check the root path of your http setup. It should point to,
    for example:
    /home/my-virtual-python/lib/python2.6/site-packages/rhodecode/public

- Can't install celery/rabbitmq?

  - Don't worry, RhodeCode works without them too. No extra setup is required.

- Long lasting push timeouts?

  - Make sure you set longer timeouts in your proxy/fcgi settings; such
    timeouts are caused by the http server and not by RhodeCode.

- Large push timeouts?

  - Make sure you set a proper max_body_size for the http server.



.. _virtualenv: http://pypi.python.org/pypi/virtualenv
.. _python: http://www.python.org/
.. _mercurial: http://mercurial.selenic.com/
.. _celery: http://celeryproject.org/
.. _rabbitmq: http://www.rabbitmq.com/
@@ -1,193 +1,203 b''
import os
import sys
import traceback
from os.path import dirname as dn, join as jn

#to get the rhodecode import
sys.path.append(dn(dn(dn(os.path.realpath(__file__)))))

from string import strip

from rhodecode.model import init_model
from rhodecode.model.scm import ScmModel
from rhodecode.config.environment import load_environment
from rhodecode.lib.utils import BasePasterCommand, Command, add_cache

from shutil import rmtree
from webhelpers.html.builder import escape
from vcs.utils.lazy import LazyProperty

from sqlalchemy import engine_from_config

from whoosh.analysis import RegexTokenizer, LowercaseFilter, StopFilter
from whoosh.fields import TEXT, ID, STORED, Schema, FieldType
from whoosh.index import create_in, open_dir
from whoosh.formats import Characters
from whoosh.highlight import highlight, SimpleFragmenter, HtmlFormatter


#EXTENSIONS WE WANT TO INDEX CONTENT OF
INDEX_EXTENSIONS = ['action', 'adp', 'ashx', 'asmx', 'aspx', 'asx', 'axd', 'c',
                    'cfg', 'cfm', 'cpp', 'cs', 'css', 'diff', 'do', 'el', 'erl',
                    'h', 'htm', 'html', 'ini', 'java', 'js', 'jsp', 'jspx', 'lisp',
                    'lua', 'm', 'mako', 'ml', 'pas', 'patch', 'php', 'php3',
                    'php4', 'phtml', 'pm', 'py', 'rb', 'rst', 's', 'sh', 'sql',
                    'tpl', 'txt', 'vim', 'wss', 'xhtml', 'xml', 'xsl', 'xslt',
                    'yaws']

#CUSTOM ANALYZER wordsplit + lowercase filter
ANALYZER = RegexTokenizer(expression=r"\w+") | LowercaseFilter()


#INDEX SCHEMA DEFINITION
SCHEMA = Schema(owner=TEXT(),
                repository=TEXT(stored=True),
                path=TEXT(stored=True),
                content=FieldType(format=Characters(ANALYZER),
                                  scorable=True, stored=True),
                modtime=STORED(), extension=TEXT(stored=True))


IDX_NAME = 'HG_INDEX'
FORMATTER = HtmlFormatter('span', between='\n<span class="break">...</span>\n')
FRAGMENTER = SimpleFragmenter(200)


class MakeIndex(BasePasterCommand):

    max_args = 1
    min_args = 1

    usage = "CONFIG_FILE"
    summary = "Creates index for full text search given configuration file"
    group_name = "RhodeCode"
    takes_config_file = -1
    parser = Command.standard_parser(verbose=True)

    def command(self):

        from pylons import config
        add_cache(config)
        engine = engine_from_config(config, 'sqlalchemy.db1.')
        init_model(engine)

        index_location = config['index_dir']
        repo_location = self.options.repo_location
        #--index-only is optional, so only split the list when it was given
        if self.options.repo_list:
            repo_list = map(strip, self.options.repo_list.split(','))
        else:
            repo_list = None

        #======================================================================
        # WHOOSH DAEMON
        #======================================================================
        from rhodecode.lib.pidlock import LockHeld, DaemonLock
        from rhodecode.lib.indexers.daemon import WhooshIndexingDaemon
        try:
            l = DaemonLock()
            WhooshIndexingDaemon(index_location=index_location,
                                 repo_location=repo_location,
                                 repo_list=repo_list)\
                .run(full_index=self.options.full_index)
            l.release()
        except LockHeld:
            sys.exit(1)

    def update_parser(self):
        self.parser.add_option('--repo-location',
                               action='store',
                               dest='repo_location',
                               help="Specifies repositories location to index REQUIRED",
                               )
        self.parser.add_option('--index-only',
                               action='store',
                               dest='repo_list',
                               help="Specifies a comma separated list of repositories "
                                    "to build index on OPTIONAL",
                               )
        self.parser.add_option('-f',
                               action='store_true',
                               dest='full_index',
                               help="Specifies that index should be made full i.e."
                                    " destroy old and build from scratch",
                               default=False)

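
# Example invocation of the new --index-only option (illustrative only,
# repository names and paths are placeholders):
#
#   paster make-index production.ini --repo-location=/srv/repos --index-only=vcs,rhodecode
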
class ResultWrapper(object):
    def __init__(self, search_type, searcher, matcher, highlight_items):
        self.search_type = search_type
        self.searcher = searcher
        self.matcher = matcher
        self.highlight_items = highlight_items
        self.fragment_size = 200 / 2

    @LazyProperty
    def doc_ids(self):
        docs_id = []
        while self.matcher.is_active():
            docnum = self.matcher.id()
            chunks = [offsets for offsets in self.get_chunks()]
            docs_id.append([docnum, chunks])
            self.matcher.next()
        return docs_id

    def __str__(self):
        return '<%s at %s>' % (self.__class__.__name__, len(self.doc_ids))

    def __repr__(self):
        return self.__str__()

    def __len__(self):
        return len(self.doc_ids)

    def __iter__(self):
        """
        Allows iteration over results, and lazily generates the content.

        *Requires* implementation of ``__getitem__`` method.
        """
        for docid in self.doc_ids:
            yield self.get_full_content(docid)

    def __getslice__(self, i, j):
        """
        Slicing of resultWrapper
        """
        slice = []
        for docid in self.doc_ids[i:j]:
            slice.append(self.get_full_content(docid))
        return slice

    def get_full_content(self, docid):
        res = self.searcher.stored_fields(docid[0])
        f_path = res['path'][res['path'].find(res['repository']) \
                             + len(res['repository']):].lstrip('/')

        content_short = self.get_short_content(res, docid[1])
        res.update({'content_short': content_short,
                    'content_short_hl': self.highlight(content_short),
                    'f_path': f_path})

        return res

    def get_short_content(self, res, chunks):

        return ''.join([res['content'][chunk[0]:chunk[1]] for chunk in chunks])

    def get_chunks(self):
        """
        Smart function that implements chunking of the content,
        but does not overlap chunks so it doesn't highlight the same
        close occurrences twice.
        """
        memory = [(0, 0)]
        for span in self.matcher.spans():
            start = span.startchar or 0
            end = span.endchar or 0
            start_offseted = max(0, start - self.fragment_size)
            end_offseted = end + self.fragment_size

            if start_offseted < memory[-1][1]:
                start_offseted = memory[-1][1]
            memory.append((start_offseted, end_offseted,))
            yield (start_offseted, end_offseted,)

    def highlight(self, content, top=5):
        if self.search_type != 'content':
            return ''
        hl = highlight(escape(content),
                       self.highlight_items,
                       analyzer=ANALYZER,
                       fragmenter=FRAGMENTER,
                       formatter=FORMATTER,
                       top=top)
        return hl
@@ -1,226 +1,236 b''
# -*- coding: utf-8 -*-
"""
    rhodecode.lib.indexers.daemon
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    A daemon for full text indexing of repositories using whoosh

    :created_on: Jan 26, 2010
    :author: marcink
    :copyright: (C) 2009-2010 Marcin Kuzminski <marcin@python-works.com>
    :license: GPLv3, see COPYING for more details.
"""
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; version 2
# of the License or (at your option) any later version of the license.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
# MA  02110-1301, USA.

import sys
import os
import traceback
from os.path import dirname as dn
from os.path import join as jn

#to get the rhodecode import
project_path = dn(dn(dn(dn(os.path.realpath(__file__)))))
sys.path.append(project_path)


from rhodecode.model.scm import ScmModel
from rhodecode.lib.helpers import safe_unicode
from whoosh.index import create_in, open_dir
from shutil import rmtree
from rhodecode.lib.indexers import INDEX_EXTENSIONS, SCHEMA, IDX_NAME

from time import mktime
from vcs.exceptions import ChangesetError, RepositoryError

import logging

log = logging.getLogger('whooshIndexer')
# create logger
log.setLevel(logging.DEBUG)
log.propagate = False
# create console handler and set level to debug
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)

# create formatter
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")

# add formatter to ch
ch.setFormatter(formatter)

# add ch to logger
log.addHandler(ch)

class WhooshIndexingDaemon(object):
    """
    Daemon for atomic indexing jobs
    """

    def __init__(self, indexname='HG_INDEX', index_location=None,
                 repo_location=None, sa=None, repo_list=None):
        self.indexname = indexname

        self.index_location = index_location
        if not index_location:
            raise Exception('You have to provide index location')

        self.repo_location = repo_location
        if not repo_location:
            raise Exception('You have to provide repositories location')

        self.repo_paths = ScmModel(sa).repo_scan(self.repo_location, None)

        if repo_list:
            #an optional whitelist of repositories was given, index only those
            filtered_repo_paths = {}
            for repo_name, repo in self.repo_paths.items():
                if repo_name in repo_list:
                    filtered_repo_paths[repo.name] = repo

            self.repo_paths = filtered_repo_paths

        self.initial = False
        if not os.path.isdir(self.index_location):
            os.makedirs(self.index_location)
            log.info('Cannot run incremental index since it does not'
                     ' yet exist - running full build')
            self.initial = True

    def get_paths(self, repo):
        """recursive walk in root dir and return a set of all paths in that
        dir, based on repository walk function
        """
        index_paths_ = set()
        try:
            for topnode, dirs, files in repo.walk('/', 'tip'):
                for f in files:
                    index_paths_.add(jn(repo.path, f.path))
                for dir in dirs:
                    for f in files:
                        index_paths_.add(jn(repo.path, f.path))

        except RepositoryError, e:
            log.debug(traceback.format_exc())
            pass
        return index_paths_

    def get_node(self, repo, path):
        n_path = path[len(repo.path) + 1:]
        node = repo.get_changeset().get_node(n_path)
        return node

    def get_node_mtime(self, node):
        return mktime(node.last_changeset.date.timetuple())

    def add_doc(self, writer, path, repo):
        """Adding doc to writer, this function itself fetches data from
        the instance of vcs backend"""
        node = self.get_node(repo, path)

        #we just index the content of chosen files, and skip binary files
        if node.extension in INDEX_EXTENSIONS and not node.is_binary:

            u_content = node.content
            if not isinstance(u_content, unicode):
                log.warning('  >> %s Could not get this content as unicode '
                            'replacing with empty content', path)
                u_content = u''
            else:
                log.debug('  >> %s [WITH CONTENT]' % path)

        else:
            log.debug('  >> %s' % path)
            #just index file name without it's content
            u_content = u''

        writer.add_document(owner=unicode(repo.contact),
                            repository=safe_unicode(repo.name),
                            path=safe_unicode(path),
                            content=u_content,
                            modtime=self.get_node_mtime(node),
                            extension=node.extension)

    def build_index(self):
        if os.path.exists(self.index_location):
            log.debug('removing previous index')
            rmtree(self.index_location)

        if not os.path.exists(self.index_location):
            os.mkdir(self.index_location)

        idx = create_in(self.index_location, SCHEMA, indexname=IDX_NAME)
        writer = idx.writer()

        for repo in self.repo_paths.values():
            log.debug('building index @ %s' % repo.path)

            for idx_path in self.get_paths(repo):
                self.add_doc(writer, idx_path, repo)

        log.debug('>> COMMITING CHANGES <<')
        writer.commit(merge=True)
        log.debug('>>> FINISHED BUILDING INDEX <<<')


    def update_index(self):
        log.debug('STARTING INCREMENTAL INDEXING UPDATE')

        idx = open_dir(self.index_location, indexname=self.indexname)
        # The set of all paths in the index
        indexed_paths = set()
        # The set of all paths we need to re-index
        to_index = set()

        reader = idx.reader()
        writer = idx.writer()

        # Loop over the stored fields in the index
        for fields in reader.all_stored_fields():
            indexed_path = fields['path']
            indexed_paths.add(indexed_path)

            repo = self.repo_paths[fields['repository']]

            try:
                node = self.get_node(repo, indexed_path)
            except ChangesetError:
                # This file was deleted since it was indexed
                log.debug('removing from index %s' % indexed_path)
                writer.delete_by_term('path', indexed_path)

            else:
                # Check if this file was changed since it was indexed
                indexed_time = fields['modtime']
                mtime = self.get_node_mtime(node)
                if mtime > indexed_time:
                    # The file has changed, delete it and add it to the list of
                    # files to reindex
                    log.debug('adding to reindex list %s' % indexed_path)
                    writer.delete_by_term('path', indexed_path)
                    to_index.add(indexed_path)

        # Loop over the files in the filesystem
        # Assume we have a function that gathers the filenames of the
        # documents to be indexed
        for repo in self.repo_paths.values():
            for path in self.get_paths(repo):
                if path in to_index or path not in indexed_paths:
                    # This is either a file that's changed, or a new file
                    # that wasn't indexed before. So index it!
                    self.add_doc(writer, path, repo)
                    log.debug('re indexing %s' % path)

        log.debug('>> COMMITING CHANGES <<')
        writer.commit(merge=True)
        log.debug('>>> FINISHED REBUILDING INDEX <<<')

    def run(self, full_index=False):
        """Run daemon"""
        if full_index or self.initial:
            self.build_index()
        else:
            self.update_index()