fixed cache problem,...
marcink - r777:aac24db5 beta
docs/setup.rst
@@ -1,237 +1,253 @@
1 1 .. _setup:
2 2
3 3 Setup
4 4 =====
5 5
6 6
7 7 Setting up the application
8 8 --------------------------
9 9
10 10 ::
11 11
12 12 paster make-config RhodeCode production.ini
13 13
14 14 - This will create a `production.ini` config file inside the directory.
15 15 This config contains various settings for RhodeCode, e.g. proxy port,
16 16 email settings, static files, cache and logging.
17 17
18 18 ::
19 19
20 20 paster setup-app production.ini
21 21
22 22 - This command will create all needed tables and an admin account.
23 23 When asked for a path you can either use a new location or one with already
24 24 existing repositories. RhodeCode will simply add all newly found repositories
25 25 to its database. Also make sure you specify the correct path to repositories.
26 26 - Remember that the given path for mercurial_ repositories must be write
27 27 accessible for the application. This is very important: the RhodeCode web
28 28 interface will work even without such access, but pushing will eventually
29 29 fail with permission denied errors.
30 30 - Run
31 31
32 32 ::
33 33
34 34 paster serve production.ini
35 35
36 36 - This command runs the RhodeCode server; the app should be available at
37 37 127.0.0.1:5000. This IP and port are configurable via the production.ini
38 38 file created in the previous step.
39 39 - Use the admin account you created to log in.
40 40 - The default permission on each repository is read and the owner is admin,
41 41 so remember to update these if needed.
42 42
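For example, to change the bind address and port, you can edit the server
section of production.ini (a minimal sketch; the values shown are the usual
defaults and are only illustrative, adjust them to your environment)::

    [server:main]
    host = 127.0.0.1
    port = 5000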
43 43
44 44 Setting up Whoosh full text search
45 45 ----------------------------------
46 46
47 47 Starting from version 1.1, the whoosh index can be built using a paster
48 48 command, passing the repo locations to index as well as your config file
49 49 that stores the whoosh index file locations. It is possible to pass the
50 50 `-f` flag to enable a full index rebuild. Without it, indexing will always
51 51 run in incremental mode.
52 52
53 53 ::
54 54
55 55 paster make-index --repo-location=<location for repos> production.ini
56 56
57 57 For a full index rebuild you can use
58 58
59 59 ::
60 60
61 61 paster make-index -f --repo-location=<location for repos> production.ini
62 62
63 63 This command can also be run from crontab in order to do periodic index
64 64 builds and keep your index always up to date. An example entry might look
65 65 like this
68 68
69 69 ::
70 70
71 71 /path/to/python/bin/paster make-index --repo-location=<location for repos> /path/to/rhodecode/production.ini
72 72
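A complete crontab entry with a schedule might look like this (the daily
4am schedule is just an illustrative assumption)::

    0 4 * * * /path/to/python/bin/paster make-index --repo-location=<location for repos> /path/to/rhodecode/production.ini
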
73 73 When using incremental (default) mode, whoosh will check the last modification
74 74 date of each file and reindex it if a newer version is available. The indexing
75 75 daemon also checks for removed files and removes them from the index.
76 76
77 77 Sometimes you might want to rebuild the index from scratch. You can do that
78 78 using the `-f` flag passed to the paster command or, in the admin panel, by
79 79 checking the `build from scratch` flag.
80 80
81 81
82 82 Setting up LDAP support
83 83 -----------------------
84 84
85 85 Starting from version 1.1, RhodeCode supports ldap authentication. In order
86 86 to use ldap, you have to install the python-ldap package. It is available
87 87 via pypi, so you can install it by running
88 88
89 89 ::
90 90
91 91 easy_install python-ldap
92 92
93 93 or::
94 94
95 95 pip install python-ldap
96 96
97 97 .. note::
98 98 python-ldap requires certain libs on your system, so before installing
99 99 it check that you have at least the `openldap` and `sasl` libraries.
100 100
101 101 Ldap settings are located in the admin->ldap section.
102 102
103 103 Here's a typical ldap setup::
104 104
105 105 Enable ldap = checked #controls if ldap access is enabled
106 106 Host = host.domain.org #actual ldap server to connect to
107 107 Port = 389 or 636 for ldaps #ldap server ports
108 108 Enable LDAPS = unchecked #enable/disable ldaps
109 109 Account = <account> #access for ldap server (if required)
110 110 Password = <password> #password for ldap server (if required)
111 111 Base DN = uid=%(user)s,CN=users,DC=host,DC=domain,DC=org
112 112
113 113
114 114 `Account` and `Password` are optional and used for two-phase ldap
115 115 authentication; these are the credentials used to access your ldap server
116 116 if it doesn't support anonymous search/user lookups.
117 117
118 118 Base DN must contain the %(user)s template; it's the placeholder where the
119 119 uid used to log in will go. It allows admins to specify a non-standard
120 120 schema for the uid variable, as the example below shows.
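
For instance, with the Base DN above, a user logging in with the
(hypothetical) uid `jdoe` would be looked up as::

    uid=jdoe,CN=users,DC=host,DC=domain,DC=org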
121 121
122 122 If all data is entered correctly and `python-ldap` is properly installed,
123 123 users should be able to access RhodeCode with their ldap accounts. When
124 124 logging in for the first time, a special ldap account is created inside
125 125 RhodeCode, so you can control permissions even for ldap users. If such a
126 126 user already exists in the RhodeCode database, an ldap user with the same
127 127 username will not be able to access RhodeCode.
128 128
129 129 If you have problems with ldap access and believe you entered the correct
130 130 information, check the RhodeCode logs; any error messages sent from
131 131 ldap will be saved there.
132 132
133 133
134
135 Setting Up Celery
136 -----------------
137
138 Since version 1.1, celery is configured via the rhodecode ini configuration
139 files. Simply set use_celery=true in the ini file, then add or change the
140 celery configuration variables inside the ini file.
141
142 Remember that the ini files use '.' where celery uses '_', so for example
143 setting `BROKER_HOST` in celery means setting `broker.host` in the config
144 file, as sketched below.
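
A minimal sketch of such settings, derived from the standard celery names
by the rule above (the host, user and password values are illustrative
assumptions)::

    use_celery = true
    broker.host = localhost
    broker.port = 5672
    broker.user = rabbitmq
    broker.password = qweqwe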
145
146 In order to start using celery, run::
147
148     paster celeryd <configfile.ini>
148
149
134 150 Nginx virtual host example
135 151 --------------------------
136 152
137 153 Sample config for nginx using proxy::
138 154
139 155 server {
140 156 listen 80;
141 157 server_name hg.myserver.com;
142 158 access_log /var/log/nginx/rhodecode.access.log;
143 159 error_log /var/log/nginx/rhodecode.error.log;
144 160 location / {
145 161 root /var/www/rhodecode/rhodecode/public/;
146 162 if (!-f $request_filename){
147 163 proxy_pass http://127.0.0.1:5000;
148 164 }
149 165 #this is important for https !!!
150 166 proxy_set_header X-Url-Scheme $scheme;
151 167 include /etc/nginx/proxy.conf;
152 168 }
153 169 }
154 170
155 171 Here's the proxy.conf. It's tuned so it won't time out on long
156 172 or large pushes::
157 173
158 174 proxy_redirect off;
159 175 proxy_set_header Host $host;
160 176 proxy_set_header X-Host $http_host;
161 177 proxy_set_header X-Real-IP $remote_addr;
162 178 proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
163 179 proxy_set_header Proxy-host $proxy_host;
164 180 client_max_body_size 400m;
165 181 client_body_buffer_size 128k;
166 182 proxy_buffering off;
167 183 proxy_connect_timeout 3600;
168 184 proxy_send_timeout 3600;
169 185 proxy_read_timeout 3600;
170 186 proxy_buffer_size 8k;
171 187 proxy_buffers 8 32k;
172 188 proxy_busy_buffers_size 64k;
173 189 proxy_temp_file_write_size 64k;
174 190
175 191 Also, when using the root path with nginx, you might want to set static
176 192 files to false in the production.ini file::
177 193
178 194 [app:main]
179 195 use = egg:rhodecode
180 196 full_stack = true
181 197 static_files = false
182 198 lang=en
183 199 cache_dir = %(here)s/data
184 200
185 201 This prevents the statics from being served by the application and improves speed.
186 202
187 203 Apache reverse proxy
188 204 --------------------
189 205 A tutorial can be found here:
190 206 http://wiki.pylonshq.com/display/pylonscookbook/Apache+as+a+reverse+proxy+for+Pylons
191 207
192 208
193 209 Apache's example FCGI config
194 210 ----------------------------
195 211
196 212 TODO !
197 213
198 214 Other configuration files
199 215 -------------------------
200 216
201 217 Some extra configuration files and examples can be found here:
202 218 http://hg.python-works.com/rhodecode/files/tip/init.d
203 219
204 220 and a celeryconfig file can be used from here:
205 221 http://hg.python-works.com/rhodecode/files/tip/celeryconfig.py
206 222
207 223 Troubleshooting
208 224 ---------------
209 225
210 226 - Missing static files?
211 227
212 228 - Make sure either to set `static_files = true` in the .ini file or
213 229 double check the root path of your http setup. It should point to,
214 230 for example:
215 231 /home/my-virtual-python/lib/python2.6/site-packages/rhodecode/public
216 232
217 233 - Can't install celery/rabbitmq?
218 234
219 235 - Don't worry, RhodeCode works without them too. No extra setup is required.
220 236
221 237
222 238 - Long lasting push timeouts?
223 239
224 240 - Make sure you set longer timeouts in your proxy/fcgi settings; these
225 241 timeouts are caused by the http server, not RhodeCode.
226 242
227 243 - Large push timeouts?
228 244
229 245 - Make sure you set a proper max_body_size for the http server.
230 246
231 247
232 248
233 249 .. _virtualenv: http://pypi.python.org/pypi/virtualenv
234 250 .. _python: http://www.python.org/
235 251 .. _mercurial: http://mercurial.selenic.com/
236 252 .. _celery: http://celeryproject.org/
237 253 .. _rabbitmq: http://www.rabbitmq.com/
rhodecode/lib/celerylib/tasks.py
@@ -1,363 +1,389 @@
1 1 from celery.decorators import task
2 2
3 3 import os
4 4 import traceback
5 5 import beaker
6 6 from time import mktime
7 7 from operator import itemgetter
8 8
9 9 from pylons import config
10 10 from pylons.i18n.translation import _
11 11
12 12 from rhodecode.lib.celerylib import run_task, locked_task, str2bool
13 13 from rhodecode.lib.helpers import person
14 14 from rhodecode.lib.smtp_mailer import SmtpMailer
15 15 from rhodecode.lib.utils import OrderedDict
16 16 from rhodecode.model import init_model
17 17 from rhodecode.model import meta
18 18 from rhodecode.model.db import RhodeCodeUi
19 19
20 20 from vcs.backends import get_repo
21 21
22 22 from sqlalchemy import engine_from_config
23 23
24 #set up cache regions for beaker so that celery can utilise them
25 def add_cache(settings):
26 cache_settings = {'regions':None}
27 for key in settings.keys():
28 for prefix in ['beaker.cache.', 'cache.']:
29 if key.startswith(prefix):
30 name = key.split(prefix)[1].strip()
31 cache_settings[name] = settings[key].strip()
32 if cache_settings['regions']:
33 for region in cache_settings['regions'].split(','):
34 region = region.strip()
35 region_settings = {}
36 for key, value in cache_settings.items():
37 if key.startswith(region):
38 region_settings[key.split('.')[1]] = value
39 region_settings['expire'] = int(region_settings.get('expire',
40 60))
41 region_settings.setdefault('lock_dir',
42 cache_settings.get('lock_dir'))
43 if 'type' not in region_settings:
44 region_settings['type'] = cache_settings.get('type',
45 'memory')
46 beaker.cache.cache_regions[region] = region_settings
47 add_cache(config)
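#As an illustration (hypothetical ini values), settings such as
#  beaker.cache.regions = short_term, long_term
#  beaker.cache.short_term.type = memory
#  beaker.cache.short_term.expire = 3600
#are collected above into cache_settings and registered as
#  beaker.cache.cache_regions['short_term'] = {'type': 'memory',
#      'expire': 3600, 'lock_dir': ...}
#so cached functions behave the same under celery as under the web app.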
48
24 49 try:
25 50 import json
26 51 except ImportError:
27 52 #python 2.5 compatibility
28 53 import simplejson as json
29 54
30 55 __all__ = ['whoosh_index', 'get_commits_stats',
31 56 'reset_user_password', 'send_email']
32 57
33 58 CELERY_ON = str2bool(config['app_conf'].get('use_celery'))
34 59
35 60 def get_session():
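    #when running under a celery worker there is no pylons app that has
    #configured the engine, so (as this check reads) bind one from the
    #ini config before returning a session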
36 61 if CELERY_ON:
37 62 engine = engine_from_config(config, 'sqlalchemy.db1.')
38 63 init_model(engine)
39 64 sa = meta.Session()
40 65 return sa
41 66
42 67 def get_repos_path():
43 68 sa = get_session()
44 69 q = sa.query(RhodeCodeUi).filter(RhodeCodeUi.ui_key == '/').one()
45 70 return q.ui_value
46 71
47 72 @task
48 73 @locked_task
49 74 def whoosh_index(repo_location, full_index):
50 75 log = whoosh_index.get_logger()
51 76 from rhodecode.lib.indexers.daemon import WhooshIndexingDaemon
52 77 index_location = config['index_dir']
53 78 WhooshIndexingDaemon(index_location=index_location,
54 repo_location=repo_location).run(full_index=full_index)
79 repo_location=repo_location, sa=get_session())\
80 .run(full_index=full_index)
55 81
56 82 @task
57 83 @locked_task
58 84 def get_commits_stats(repo_name, ts_min_y, ts_max_y):
59 85 from rhodecode.model.db import Statistics, Repository
60 86 log = get_commits_stats.get_logger()
61 87
62 88 #for js data compatibility
63 89 author_key_cleaner = lambda k: person(k).replace('"', "")
64 90
65 91 commits_by_day_author_aggregate = {}
66 92 commits_by_day_aggregate = {}
67 93 repos_path = get_repos_path()
68 94 p = os.path.join(repos_path, repo_name)
69 95 repo = get_repo(p)
70 96
71 97 skip_date_limit = True
72 98 parse_limit = 250 #limit for single task changeset parsing, optimal for celery tasks
73 99 last_rev = 0
74 100 last_cs = None
75 101 timegetter = itemgetter('time')
76 102
77 103 sa = get_session()
78 104
79 105 dbrepo = sa.query(Repository)\
80 106 .filter(Repository.repo_name == repo_name).scalar()
81 107 cur_stats = sa.query(Statistics)\
82 108 .filter(Statistics.repository == dbrepo).scalar()
83 109 if cur_stats:
84 110 last_rev = cur_stats.stat_on_revision
85 111 if not repo.revisions:
86 112 return True
87 113
88 114 if last_rev == repo.revisions[-1] and len(repo.revisions) > 1:
89 115 #pass silently without any work if we're not on the first revision or
90 116 #the current parsing state (from the db marker) is already the last revision
91 117 return True
92 118
93 119 if cur_stats:
94 120 commits_by_day_aggregate = OrderedDict(
95 121 json.loads(
96 122 cur_stats.commit_activity_combined))
97 123 commits_by_day_author_aggregate = json.loads(cur_stats.commit_activity)
98 124
99 125 log.debug('starting parsing %s', parse_limit)
100 126 lmktime = mktime
101 127
102 128 for cnt, rev in enumerate(repo.revisions[last_rev:]):
103 129 last_cs = cs = repo.get_changeset(rev)
104 130 k = '%s-%s-%s' % (cs.date.timetuple()[0], cs.date.timetuple()[1],
105 131 cs.date.timetuple()[2])
106 132 timetupple = [int(x) for x in k.split('-')]
107 133 timetupple.extend([0 for _ in xrange(6)])
108 134 k = lmktime(timetupple)
109 135 if commits_by_day_author_aggregate.has_key(author_key_cleaner(cs.author)):
110 136 try:
111 137 l = [timegetter(x) for x in commits_by_day_author_aggregate\
112 138 [author_key_cleaner(cs.author)]['data']]
113 139 time_pos = l.index(k)
114 140 except ValueError:
115 141 time_pos = False
116 142
117 143 if time_pos >= 0 and time_pos is not False:
118 144
119 145 datadict = commits_by_day_author_aggregate\
120 146 [author_key_cleaner(cs.author)]['data'][time_pos]
121 147
122 148 datadict["commits"] += 1
123 149 datadict["added"] += len(cs.added)
124 150 datadict["changed"] += len(cs.changed)
125 151 datadict["removed"] += len(cs.removed)
126 152
127 153 else:
128 154 if k >= ts_min_y and k <= ts_max_y or skip_date_limit:
129 155
130 156 datadict = {"time":k,
131 157 "commits":1,
132 158 "added":len(cs.added),
133 159 "changed":len(cs.changed),
134 160 "removed":len(cs.removed),
135 161 }
136 162 commits_by_day_author_aggregate\
137 163 [author_key_cleaner(cs.author)]['data'].append(datadict)
138 164
139 165 else:
140 166 if k >= ts_min_y and k <= ts_max_y or skip_date_limit:
141 167 commits_by_day_author_aggregate[author_key_cleaner(cs.author)] = {
142 168 "label":author_key_cleaner(cs.author),
143 169 "data":[{"time":k,
144 170 "commits":1,
145 171 "added":len(cs.added),
146 172 "changed":len(cs.changed),
147 173 "removed":len(cs.removed),
148 174 }],
149 175 "schema":["commits"],
150 176 }
151 177
152 178 #gather all data by day
153 179 if commits_by_day_aggregate.has_key(k):
154 180 commits_by_day_aggregate[k] += 1
155 181 else:
156 182 commits_by_day_aggregate[k] = 1
157 183
158 184 if cnt >= parse_limit:
159 185 #don't fetch too much data since we could freeze the application
160 186 break
161 187 overview_data = []
162 188 for k, v in commits_by_day_aggregate.items():
163 189 overview_data.append([k, v])
164 190 overview_data = sorted(overview_data, key=itemgetter(0))
165 191 if not commits_by_day_author_aggregate:
166 192 commits_by_day_author_aggregate[author_key_cleaner(repo.contact)] = {
167 193 "label":author_key_cleaner(repo.contact),
168 194 "data":[0, 1],
169 195 "schema":["commits"],
170 196 }
171 197
172 198 stats = cur_stats if cur_stats else Statistics()
173 199 stats.commit_activity = json.dumps(commits_by_day_author_aggregate)
174 200 stats.commit_activity_combined = json.dumps(overview_data)
175 201
176 202 log.debug('last revision %s', last_rev)
177 203 leftovers = len(repo.revisions[last_rev:])
178 204 log.debug('revisions to parse %s', leftovers)
179 205
180 206 if last_rev == 0 or leftovers < parse_limit:
181 207 stats.languages = json.dumps(__get_codes_stats(repo_name))
182 208
183 209 stats.repository = dbrepo
184 210 stats.stat_on_revision = last_cs.revision
185 211
186 212 try:
187 213 sa.add(stats)
188 214 sa.commit()
189 215 except:
190 216 log.error(traceback.format_exc())
191 217 sa.rollback()
192 218 return False
193 219 if len(repo.revisions) > 1:
194 220 run_task(get_commits_stats, repo_name, ts_min_y, ts_max_y)
195 221
196 222 return True
197 223
198 224 @task
199 225 def reset_user_password(user_email):
200 226 log = reset_user_password.get_logger()
201 227 from rhodecode.lib import auth
202 228 from rhodecode.model.db import User
203 229
204 230 try:
205 231 try:
206 232 sa = get_session()
207 233 user = sa.query(User).filter(User.email == user_email).scalar()
208 234 new_passwd = auth.PasswordGenerator().gen_password(8,
209 235 auth.PasswordGenerator.ALPHABETS_BIG_SMALL)
210 236 if user:
211 237 user.password = auth.get_crypt_password(new_passwd)
212 238 sa.add(user)
213 239 sa.commit()
214 240 log.info('change password for %s', user_email)
215 241 if new_passwd is None:
216 242 raise Exception('unable to generate new password')
217 243
218 244 except:
219 245 log.error(traceback.format_exc())
220 246 sa.rollback()
221 247
222 248 run_task(send_email, user_email,
223 249 "Your new rhodecode password",
224 250 'Your new rhodecode password:%s' % (new_passwd))
225 251 log.info('send new password mail to %s', user_email)
226 252
227 253
228 254 except:
229 255 log.error('Failed to update user password')
230 256 log.error(traceback.format_exc())
231 257
232 258 return True
233 259
234 260 @task
235 261 def send_email(recipients, subject, body):
236 262 """
237 263 Sends an email with defined parameters from the .ini files.
238 264
239 265
240 266 :param recipients: list of recipients; if this is empty the defined email
241 267 address from the 'email_to' field is used instead
242 268 :param subject: subject of the mail
243 269 :param body: body of the mail
244 270 """
245 271 log = send_email.get_logger()
246 272 email_config = config
247 273
248 274 if not recipients:
249 275 recipients = [email_config.get('email_to')]
250 276
251 277 mail_from = email_config.get('app_email_from')
252 278 user = email_config.get('smtp_username')
253 279 passwd = email_config.get('smtp_password')
254 280 mail_server = email_config.get('smtp_server')
255 281 mail_port = email_config.get('smtp_port')
256 282 tls = str2bool(email_config.get('smtp_use_tls'))
257 283 ssl = str2bool(email_config.get('smtp_use_ssl'))
258 284
259 285 try:
260 286 m = SmtpMailer(mail_from, user, passwd, mail_server,
261 287 mail_port, ssl, tls)
262 288 m.send(recipients, subject, body)
263 289 except:
264 290 log.error('Mail sending failed')
265 291 log.error(traceback.format_exc())
266 292 return False
267 293 return True
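#Illustrative usage (the address is a made-up example), dispatched through
#run_task as done in reset_user_password above:
# run_task(send_email, ['admin@example.com'],
#          'RhodeCode test', 'message body')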
268 294
269 295 @task
270 296 def create_repo_fork(form_data, cur_user):
271 297 from rhodecode.model.repo import RepoModel
272 298 from vcs import get_backend
273 299 log = create_repo_fork.get_logger()
274 300 repo_model = RepoModel(get_session())
275 301 repo_model.create(form_data, cur_user, just_db=True, fork=True)
276 302 repo_name = form_data['repo_name']
277 303 repos_path = get_repos_path()
278 304 repo_path = os.path.join(repos_path, repo_name)
279 305 repo_fork_path = os.path.join(repos_path, form_data['fork_name'])
280 306 alias = form_data['repo_type']
281 307
282 308 log.info('creating repo fork %s as %s', repo_name, repo_path)
283 309 backend = get_backend(alias)
284 310 backend(str(repo_fork_path), create=True, src_url=str(repo_path))
285 311
286 312 def __get_codes_stats(repo_name):
287 313 LANGUAGES_EXTENSIONS_MAP = {'scm': 'Scheme', 'asmx': 'VbNetAspx', 'Rout':
288 314 'RConsole', 'rest': 'Rst', 'abap': 'ABAP', 'go': 'Go', 'phtml': 'HtmlPhp',
289 315 'ns2': 'Newspeak', 'xml': 'EvoqueXml', 'sh-session': 'BashSession', 'ads':
290 316 'Ada', 'clj': 'Clojure', 'll': 'Llvm', 'ebuild': 'Bash', 'adb': 'Ada',
291 317 'ada': 'Ada', 'c++-objdump': 'CppObjdump', 'aspx':
292 318 'VbNetAspx', 'ksh': 'Bash', 'coffee': 'CoffeeScript', 'vert': 'GLShader',
293 319 'Makefile.*': 'Makefile', 'di': 'D', 'dpatch': 'DarcsPatch', 'rake':
294 320 'Ruby', 'moo': 'MOOCode', 'erl-sh': 'ErlangShell', 'geo': 'GLShader',
295 321 'pov': 'Povray', 'bas': 'VbNet', 'bat': 'Batch', 'd': 'D', 'lisp':
296 322 'CommonLisp', 'h': 'C', 'rbx': 'Ruby', 'tcl': 'Tcl', 'c++': 'Cpp', 'md':
297 323 'MiniD', '.vimrc': 'Vim', 'xsd': 'Xml', 'ml': 'Ocaml', 'el': 'CommonLisp',
298 324 'befunge': 'Befunge', 'xsl': 'Xslt', 'pyx': 'Cython', 'cfm':
299 325 'ColdfusionHtml', 'evoque': 'Evoque', 'cfg': 'Ini', 'htm': 'Html',
300 326 'Makefile': 'Makefile', 'cfc': 'ColdfusionHtml', 'tex': 'Tex', 'cs':
301 327 'CSharp', 'mxml': 'Mxml', 'patch': 'Diff', 'apache.conf': 'ApacheConf',
302 328 'scala': 'Scala', 'applescript': 'AppleScript', 'GNUmakefile': 'Makefile',
303 329 'c-objdump': 'CObjdump', 'lua': 'Lua', 'apache2.conf': 'ApacheConf', 'rb':
304 330 'Ruby', 'gemspec': 'Ruby', 'rl': 'RagelObjectiveC', 'vala': 'Vala', 'tmpl':
305 331 'Cheetah', 'bf': 'Brainfuck', 'plt': 'Gnuplot', 'G': 'AntlrRuby', 'xslt':
306 332 'Xslt', 'flxh': 'Felix', 'asax': 'VbNetAspx', 'Rakefile': 'Ruby', 'S': 'S',
307 333 'wsdl': 'Xml', 'js': 'Javascript', 'autodelegate': 'Myghty', 'properties':
308 334 'Ini', 'bash': 'Bash', 'c': 'C', 'g': 'AntlrRuby', 'r3': 'Rebol', 's':
309 335 'Gas', 'ashx': 'VbNetAspx', 'cxx': 'Cpp', 'boo': 'Boo', 'prolog': 'Prolog',
310 336 'sqlite3-console': 'SqliteConsole', 'cl': 'CommonLisp', 'cc': 'Cpp', 'pot':
311 337 'Gettext', 'vim': 'Vim', 'pxi': 'Cython', 'yaml': 'Yaml', 'SConstruct':
312 338 'Python', 'diff': 'Diff', 'txt': 'Text', 'cw': 'Redcode', 'pxd': 'Cython',
313 339 'plot': 'Gnuplot', 'java': 'Java', 'hrl': 'Erlang', 'py': 'Python',
314 340 'makefile': 'Makefile', 'squid.conf': 'SquidConf', 'asm': 'Nasm', 'toc':
315 341 'Tex', 'kid': 'Genshi', 'rhtml': 'Rhtml', 'po': 'Gettext', 'pl': 'Prolog',
316 342 'pm': 'Perl', 'hx': 'Haxe', 'ascx': 'VbNetAspx', 'ooc': 'Ooc', 'asy':
317 343 'Asymptote', 'hs': 'Haskell', 'SConscript': 'Python', 'pytb':
318 344 'PythonTraceback', 'myt': 'Myghty', 'hh': 'Cpp', 'R': 'S', 'aux': 'Tex',
319 345 'rst': 'Rst', 'cpp-objdump': 'CppObjdump', 'lgt': 'Logtalk', 'rss': 'Xml',
320 346 'flx': 'Felix', 'b': 'Brainfuck', 'f': 'Fortran', 'rbw': 'Ruby',
321 347 '.htaccess': 'ApacheConf', 'cxx-objdump': 'CppObjdump', 'j': 'ObjectiveJ',
322 348 'mll': 'Ocaml', 'yml': 'Yaml', 'mu': 'MuPAD', 'r': 'Rebol', 'ASM': 'Nasm',
323 349 'erl': 'Erlang', 'mly': 'Ocaml', 'mo': 'Modelica', 'def': 'Modula2', 'ini':
324 350 'Ini', 'control': 'DebianControl', 'vb': 'VbNet', 'vapi': 'Vala', 'pro':
325 351 'Prolog', 'spt': 'Cheetah', 'mli': 'Ocaml', 'as': 'ActionScript3', 'cmd':
326 352 'Batch', 'cpp': 'Cpp', 'io': 'Io', 'tac': 'Python', 'haml': 'Haml', 'rkt':
327 353 'Racket', 'st':'Smalltalk', 'inc': 'Povray', 'pas': 'Delphi', 'cmake':
328 354 'CMake', 'csh':'Tcsh', 'hpp': 'Cpp', 'feature': 'Gherkin', 'html': 'Html',
329 355 'php':'Php', 'php3':'Php', 'php4':'Php', 'php5':'Php', 'xhtml': 'Html',
330 356 'hxx': 'Cpp', 'eclass': 'Bash', 'css': 'Css',
331 357 'frag': 'GLShader', 'd-objdump': 'DObjdump', 'weechatlog': 'IrcLogs',
332 358 'tcsh': 'Tcsh', 'objdump': 'Objdump', 'pyw': 'Python', 'h++': 'Cpp',
333 359 'py3tb': 'Python3Traceback', 'jsp': 'Jsp', 'sql': 'Sql', 'mak': 'Makefile',
334 360 'php': 'Php', 'mao': 'Mako', 'man': 'Groff', 'dylan': 'Dylan', 'sass':
335 361 'Sass', 'cfml': 'ColdfusionHtml', 'darcspatch': 'DarcsPatch', 'tpl':
336 362 'Smarty', 'm': 'ObjectiveC', 'f90': 'Fortran', 'mod': 'Modula2', 'sh':
337 363 'Bash', 'lhs': 'LiterateHaskell', 'sources.list': 'SourcesList', 'axd':
338 364 'VbNetAspx', 'sc': 'Python'}
339 365
340 366 repos_path = get_repos_path()
341 367 p = os.path.join(repos_path, repo_name)
342 368 repo = get_repo(p)
343 369 tip = repo.get_changeset()
344 370 code_stats = {}
345 371
346 372 def aggregate(cs):
347 373 for f in cs[2]:
348 374 ext = f.extension
349 375 key = LANGUAGES_EXTENSIONS_MAP.get(ext, ext)
350 376 key = key or ext
351 377 if ext in LANGUAGES_EXTENSIONS_MAP.keys():
352 378 if code_stats.has_key(key):
353 379 code_stats[key] += 1
354 380 else:
355 381 code_stats[key] = 1
356 382
357 383 map(aggregate, tip.walk('/'))
358 384
359 385 return code_stats or {}
360 386
361 387
362 388
363 389
rhodecode/lib/indexers/daemon.py
@@ -1,215 +1,215 @@
1 1 #!/usr/bin/env python
2 2 # encoding: utf-8
3 3 # whoosh indexer daemon for rhodecode
4 4 # Copyright (C) 2009-2010 Marcin Kuzminski <marcin@python-works.com>
5 5 #
6 6 # This program is free software; you can redistribute it and/or
7 7 # modify it under the terms of the GNU General Public License
8 8 # as published by the Free Software Foundation; version 2
9 9 # of the License or (at your option) any later version of the license.
10 10 #
11 11 # This program is distributed in the hope that it will be useful,
12 12 # but WITHOUT ANY WARRANTY; without even the implied warranty of
13 13 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
14 14 # GNU General Public License for more details.
15 15 #
16 16 # You should have received a copy of the GNU General Public License
17 17 # along with this program; if not, write to the Free Software
18 18 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
19 19 # MA 02110-1301, USA.
20 20 """
21 21 Created on Jan 26, 2010
22 22
23 23 @author: marcink
24 24 A daemon will read from the task table and run tasks
25 25 """
26 26 import sys
27 27 import os
28 28 from os.path import dirname as dn
29 29 from os.path import join as jn
30 30
31 31 #to get the rhodecode import
32 32 project_path = dn(dn(dn(dn(os.path.realpath(__file__)))))
33 33 sys.path.append(project_path)
34 34
35 35
36 36 from rhodecode.model.scm import ScmModel
37 37 from rhodecode.lib.helpers import safe_unicode
38 38 from whoosh.index import create_in, open_dir
39 39 from shutil import rmtree
40 40 from rhodecode.lib.indexers import INDEX_EXTENSIONS, SCHEMA, IDX_NAME
41 41
42 42 from time import mktime
43 43 from vcs.exceptions import ChangesetError, RepositoryError
44 44
45 45 import logging
46 46
47 47 log = logging.getLogger('whooshIndexer')
48 48 # create logger
49 49 log.setLevel(logging.DEBUG)
50 50 log.propagate = False
51 51 # create console handler and set level to debug
52 52 ch = logging.StreamHandler()
53 53 ch.setLevel(logging.DEBUG)
54 54
55 55 # create formatter
56 56 formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
57 57
58 58 # add formatter to ch
59 59 ch.setFormatter(formatter)
60 60
61 61 # add ch to logger
62 62 log.addHandler(ch)
63 63
64 64 class WhooshIndexingDaemon(object):
65 65 """
66 66 Daemon for atomic jobs
67 67 """
68 68
69 69 def __init__(self, indexname='HG_INDEX', index_location=None,
70 repo_location=None):
70 repo_location=None, sa=None):
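        #sa is an optional SQLAlchemy session; callers that already hold one
        #(such as the whoosh_index celery task in this commit) pass it in so
        #that repo_scan below reuses it instead of opening a fresh session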
71 71 self.indexname = indexname
72 72
73 73 self.index_location = index_location
74 74 if not index_location:
75 75 raise Exception('You have to provide index location')
76 76
77 77 self.repo_location = repo_location
78 78 if not repo_location:
79 79 raise Exception('You have to provide repositories location')
80 80
81 self.repo_paths = ScmModel().repo_scan(self.repo_location, None)
81 self.repo_paths = ScmModel(sa).repo_scan(self.repo_location, None)
82 82 self.initial = False
83 83 if not os.path.isdir(self.index_location):
84 84 os.makedirs(self.index_location)
85 85 log.info('Cannot run incremental index since it does not'
86 86 ' yet exist - running full build')
87 87 self.initial = True
88 88
89 89 def get_paths(self, repo):
90 90 """recursive walk in root dir and return a set of all path in that dir
91 91 based on repository walk function
92 92 """
93 93 index_paths_ = set()
94 94 try:
95 95 for topnode, dirs, files in repo.walk('/', 'tip'):
96 96 for f in files:
97 97 index_paths_.add(jn(repo.path, f.path))
98 98 for dir in dirs:
99 99 for f in files:
100 100 index_paths_.add(jn(repo.path, f.path))
101 101
102 102 except RepositoryError:
103 103 pass
104 104 return index_paths_
105 105
106 106 def get_node(self, repo, path):
107 107 n_path = path[len(repo.path) + 1:]
108 108 node = repo.get_changeset().get_node(n_path)
109 109 return node
110 110
111 111 def get_node_mtime(self, node):
112 112 return mktime(node.last_changeset.date.timetuple())
113 113
114 114 def add_doc(self, writer, path, repo):
115 115 """Adding doc to writer this function itself fetches data from
116 116 the instance of vcs backend"""
117 117 node = self.get_node(repo, path)
118 118
119 119 #we just index the content of chosen files
120 120 if node.extension in INDEX_EXTENSIONS:
121 121 log.debug(' >> %s [WITH CONTENT]' % path)
122 122 u_content = node.content
123 123 else:
124 124 log.debug(' >> %s' % path)
125 125 #just index the file name without its content
126 126 u_content = u''
127 127
128 128 writer.add_document(owner=unicode(repo.contact),
129 129 repository=safe_unicode(repo.name),
130 130 path=safe_unicode(path),
131 131 content=u_content,
132 132 modtime=self.get_node_mtime(node),
133 133 extension=node.extension)
134 134
135 135
136 136 def build_index(self):
137 137 if os.path.exists(self.index_location):
138 138 log.debug('removing previous index')
139 139 rmtree(self.index_location)
140 140
141 141 if not os.path.exists(self.index_location):
142 142 os.mkdir(self.index_location)
143 143
144 144 idx = create_in(self.index_location, SCHEMA, indexname=IDX_NAME)
145 145 writer = idx.writer()
146 146
147 147 for cnt, repo in enumerate(self.repo_paths.values()):
148 148 log.debug('building index @ %s' % repo.path)
149 149
150 150 for idx_path in self.get_paths(repo):
151 151 self.add_doc(writer, idx_path, repo)
152 152
153 153 log.debug('>> COMMITTING CHANGES <<')
154 154 writer.commit(merge=True)
155 155 log.debug('>>> FINISHED BUILDING INDEX <<<')
156 156
157 157
158 158 def update_index(self):
159 159 log.debug('STARTING INCREMENTAL INDEXING UPDATE')
160 160
161 161 idx = open_dir(self.index_location, indexname=self.indexname)
162 162 # The set of all paths in the index
163 163 indexed_paths = set()
164 164 # The set of all paths we need to re-index
165 165 to_index = set()
166 166
167 167 reader = idx.reader()
168 168 writer = idx.writer()
169 169
170 170 # Loop over the stored fields in the index
171 171 for fields in reader.all_stored_fields():
172 172 indexed_path = fields['path']
173 173 indexed_paths.add(indexed_path)
174 174
175 175 repo = self.repo_paths[fields['repository']]
176 176
177 177 try:
178 178 node = self.get_node(repo, indexed_path)
179 179 except ChangesetError:
180 180 # This file was deleted since it was indexed
181 181 log.debug('removing from index %s' % indexed_path)
182 182 writer.delete_by_term('path', indexed_path)
183 183
184 184 else:
185 185 # Check if this file was changed since it was indexed
186 186 indexed_time = fields['modtime']
187 187 mtime = self.get_node_mtime(node)
188 188 if mtime > indexed_time:
189 189 # The file has changed, delete it and add it to the list of
190 190 # files to reindex
191 191 log.debug('adding to reindex list %s' % indexed_path)
192 192 writer.delete_by_term('path', indexed_path)
193 193 to_index.add(indexed_path)
194 194
195 195 # Loop over the files in the filesystem
196 196 # Assume we have a function that gathers the filenames of the
197 197 # documents to be indexed
198 198 for repo in self.repo_paths.values():
199 199 for path in self.get_paths(repo):
200 200 if path in to_index or path not in indexed_paths:
201 201 # This is either a file that's changed, or a new file
202 202 # that wasn't indexed before. So index it!
203 203 self.add_doc(writer, path, repo)
204 204 log.debug('re-indexing %s' % path)
205 205
206 206 log.debug('>> COMMITTING CHANGES <<')
207 207 writer.commit(merge=True)
208 208 log.debug('>>> FINISHED REBUILDING INDEX <<<')
209 209
210 210 def run(self, full_index=False):
211 211 """Run daemon"""
212 212 if full_index or self.initial:
213 213 self.build_index()
214 214 else:
215 215 self.update_index()