fixed cache problem,...
marcink - r777:aac24db5 beta
@@ -1,108 +1,108 b''
1 1 .. _changelog:
2 2
3 3 Changelog
4 4 =========
5 5
6 6 1.1.0 (**2010-XX-XX**)
7 7 ----------------------
8 8
9 9 :status: in-progress
10 10 :branch: beta
11 11
12 12 news
13 13 ++++
14 14
15 15 - rewrite of internals for vcs >=0.1.10
16 16 - anonymous access, authentication via ldap
17 17 - performance upgrade for cached repos list - each repository has its own
18 18 cache that's invalidated when needed.
19 19 - main page quick filter for filtering repositories
20 20 - user dashboards with ability to follow chosen repositories actions
21 21 - sends email to admin on new user registration
22 22 - added cache/statistics reset options into repository settings
23 23 - more detailed action logger (based on hooks) with pushed changesets lists
24 24 and options to disable those hooks from admin panel
25 25 - introduced new enhanced changelog for merges that shows more accurate results
26 26 - gui optimizations, fixed application width to 1024px
27 - whoosh,celeryd,upgrade moved to paster command
27 - whoosh, celeryd, upgrade moved to paster command
28 28
29 29 fixes
30 30 +++++
31 31
32 32 - fixes #61 forked repo was showing only after cache expired
33 33 - fixes #76 no confirmation on user deletes
34 34 - fixes #66 Name field misspelled
35 35 - fixes #72 block removal of users who own repositories
36 36 - fixes #69 added password confirmation fields
37 37 - numerous small bugfixes
38 38 - a lot of fixes and tweaks for file browser
39 39 - fixed detached session issues
40 40
41 41 (special thanks to TkSoh for detailed feedback)
42 42
43 43
44 44 1.0.2 (**2010-11-12**)
45 45 ----------------------
46 46
47 47 news
48 48 ++++
49 49
50 50 - tested under python2.7
51 51 - bumped sqlalchemy and celery versions
52 52
53 53 fixes
54 54 +++++
55 55
56 56 - fixed #59 missing graph.js
57 57 - fixed repo_size crash when repository had broken symlinks
58 58 - fixed python2.5 crashes.
59 59
60 60
61 61 1.0.1 (**2010-11-10**)
62 62 ----------------------
63 63
64 64 news
65 65 ++++
66 66
67 67 - small css updates
68 68
69 69 fixes
70 70 +++++
71 71
72 72 - fixed #53 python2.5 incompatible enumerate calls
73 73 - fixed #52 disable mercurial extension for web
74 74 - fixed #51 deleting a repository didn't delete its dependent objects
75 75
76 76
77 77 1.0.0 (**2010-11-02**)
78 78 ----------------------
79 79
80 80 - security bugfix: simplehg wasn't checking for permissions on commands
81 81 other than pull or push.
82 82 - fixed doubled messages after push or pull in admin journal
83 83 - templating and css corrections, fixed repo switcher on chrome, updated titles
84 84 - admin menu accessible from options menu on repository view
85 85 - permissions cached queries
86 86
87 87 1.0.0rc4 (**2010-10-12**)
88 88 --------------------------
89 89
90 90 - fixed python2.5 missing simplejson imports (thanks to Jens Bäckman)
91 91 - removed cache_manager settings from sqlalchemy meta
92 92 - added sqlalchemy cache settings to ini files
93 93 - validated password length and added a second try on failure in paster setup-app
94 94 - fixed setup database destroy prompt even when there was no db
95 95
96 96
97 97 1.0.0rc3 (**2010-10-11**)
98 98 -------------------------
99 99
100 100 - fixed i18n during installation.
101 101
102 102 1.0.0rc2 (**2010-10-11**)
103 103 -------------------------
104 104
105 105 - Disabled dirsize in file browser; it's causing a nasty bug when dir renames
106 106 occur. After vcs is fixed it'll be put back again.
107 107 - templating/css rewrites, optimized css.
108 108
@@ -1,237 +1,253 b''
1 1 .. _setup:
2 2
3 3 Setup
4 4 =====
5 5
6 6
7 7 Setting up the application
8 8 --------------------------
9 9
10 10 ::
11 11
12 12 paster make-config RhodeCode production.ini
13 13
14 14 - This will create the `production.ini` config file inside the directory.
15 15 This config contains various settings for RhodeCode, e.g. proxy port,
16 16 email settings, static files, cache and logging.
17 17
18 18 ::
19 19
20 20 paster setup-app production.ini
21 21
22 22 - This command will create all needed tables and an admin account.
23 23 When asked for a path, you can either use a new location or one with already
24 24 existing repositories. RhodeCode will simply add all newly found repositories
25 25 to its database. Also make sure you specify the correct path to the repositories.
26 26 - Remember that the given path for mercurial_ repositories must be write
27 27 accessible for the application. This is very important: the RhodeCode web
28 28 interface will work even without such access, but pushes will eventually
29 29 fail with permission denied errors.
30 30 - Run
31 31
32 32 ::
33 33
34 34 paster serve production.ini
35 35
36 36 - This command runs the RhodeCode server; the app should be available at
37 37 127.0.0.1:5000. This IP and port are configurable via the production.ini
38 38 file created in the previous step (see the example below).
39 39 - Use the admin account you created to log in.
40 40 - The default permission on each repository is read, and the owner is admin,
41 41 so remember to update these if needed.
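For example, the relevant server section of production.ini uses the standard
Paste Deploy layout (the generated file may differ slightly)::

    [server:main]
    host = 127.0.0.1
    port = 5000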
42 42
43 43
44 44 Setting up Whoosh full text search
45 45 ----------------------------------
46 46
47 47 Starting from version 1.1, the whoosh index can be built using a paster command,
48 48 passing the repository locations to index as well as your config file, which
49 49 stores the whoosh index file locations. It is possible to pass the `-f` option
50 50 to enable a full index rebuild. Without it, indexing will always run in
51 51 incremental mode.
52 52
53 53 ::
54 54
55 55 paster make-index --repo-location=<location for repos> production.ini
56 56
57 57 For a full index rebuild you can use
58 58
59 59 ::
60 60
61 61 paster make-index -f --repo-location=<location for repos> production.ini
62 62
63 63 - For full text search you can also add a crontab entry.
64 64
65 65 This command can be run from crontab in order to do periodical
66 66 index builds and keep your index always up to date. An example entry might
67 67 look like this
68 68
69 69 ::
70 70
71 71 /path/to/python/bin/paster make-index --repo-location=<location for repos> /path/to/rhodecode/production.ini
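
A complete crontab line also needs the schedule fields in front; for example,
to rebuild the index nightly at 2 AM (the time here is only an illustration)::

    0 2 * * * /path/to/python/bin/paster make-index --repo-location=<location for repos> /path/to/rhodecode/production.ini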
72 72
73 73 When using incremental (default) mode, whoosh will check the last modification
74 74 date of each file and reindex it if a newer version is available. The indexing
75 75 daemon also checks for removed files and removes them from the index.
76 76
77 77 Sometimes you might want to rebuild the index from scratch. You can do that
78 78 using the `-f` flag passed to the paster command or, in the admin panel, by
79 79 checking the `build from scratch` flag.
80 80
81 81
82 82 Setting up LDAP support
83 83 -----------------------
84 84
85 85 Starting from version 1.1, RhodeCode supports ldap authentication. In order
86 86 to use ldap, you have to install the python-ldap package. It is available
87 87 via pypi, so you can install it by running
88 88
89 89 ::
90 90
91 91 easy_install python-ldap
92 92
93 93 or::
94 94
95 95 pip install python-ldap
96 96
97 97 .. note::
98 98 python-ldap requires certain libraries on your system, so before installing
99 99 it check that you have at least the `openldap` and `sasl` libraries.
100 100
101 101 ldap settings are located in the admin->ldap section.
102 102
103 103 Here's a typical ldap setup::
104 104
105 105 Enable ldap = checked #controls if ldap access is enabled
106 106 Host = host.domain.org #actual ldap server to connect to
107 107 Port = 389, or 636 for ldaps #ldap server port
108 108 Enable LDAPS = unchecked #enable/disable ldaps
109 109 Account = <account> #account to access the ldap server (if required)
110 110 Password = <password> #password for the ldap server (if required)
111 111 Base DN = uid=%(user)s,CN=users,DC=host,DC=domain,DC=org
112 112
113 113
114 114 `Account` and `Password` are optional, and are used for two-phase ldap
115 115 authentication; they are the credentials used to access your ldap server if
116 116 it doesn't support anonymous search/user lookups.
117 117
118 118 Base DN must contain the %(user)s template; it's the placeholder where the
119 119 uid used to log in is substituted. This allows admins to specify a
120 120 non-standard schema for the uid variable.
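
For instance, with the Base DN above, a user logging in as `jdoe` (a
hypothetical username) would be looked up as::

    uid=jdoe,CN=users,DC=host,DC=domain,DC=org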
121 121
122 122 If all data is entered correctly and `python-ldap` is properly installed,
123 123 users should be able to access RhodeCode with their ldap accounts. When
124 124 logging in for the first time, a special ldap account is created inside
125 125 RhodeCode, so you have control over permissions even for ldap users. If such
126 126 a user already exists in the RhodeCode database, an ldap user with the same
127 127 username will not be able to access RhodeCode.
128 128
129 129 If you have problems with ldap access and believe you entered the correct
130 130 information, check the RhodeCode logs; any error messages sent from
131 131 ldap will be saved there.
132 132
133 133
134
135 Setting Up Celery
136 -----------------
137
138 Since version 1.1 celery is configured via the rhodecode ini configuration
139 files. Simply set use_celery=true in the ini file, then add or change the
140 celery configuration variables inside the ini file.
141
142 Remember that the ini file uses the format with '.' instead of '_' as in
143 celery, so for example setting `BROKER_HOST` in celery means setting
144 `broker.host` in the config file.
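
For example, a minimal celery section of the ini file might look like the
sketch below; `use_celery` and `broker.host` come from this documentation,
while the remaining keys are hypothetical, derived from celery's BROKER_*
settings by the rule above::

    use_celery = true
    broker.host = localhost
    broker.port = 5672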
145
146 In order to start using celery, run::
147 paster celeryd <configfile.ini>
148
149
134 150 Nginx virtual host example
135 151 --------------------------
136 152
137 153 Sample config for nginx using proxy::
138 154
139 155 server {
140 156 listen 80;
141 157 server_name hg.myserver.com;
142 158 access_log /var/log/nginx/rhodecode.access.log;
143 159 error_log /var/log/nginx/rhodecode.error.log;
144 160 location / {
145 161 root /var/www/rhodecode/rhodecode/public/;
146 162 if (!-f $request_filename){
147 163 proxy_pass http://127.0.0.1:5000;
148 164 }
149 165 #this is important for https !!!
150 166 proxy_set_header X-Url-Scheme $scheme;
151 167 include /etc/nginx/proxy.conf;
152 168 }
153 169 }
154 170
155 171 Here's the proxy.conf. It's tuned so it won't time out on long
156 172 or large pushes::
157 173
158 174 proxy_redirect off;
159 175 proxy_set_header Host $host;
160 176 proxy_set_header X-Host $http_host;
161 177 proxy_set_header X-Real-IP $remote_addr;
162 178 proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
163 179 proxy_set_header Proxy-host $proxy_host;
164 180 client_max_body_size 400m;
165 181 client_body_buffer_size 128k;
166 182 proxy_buffering off;
167 183 proxy_connect_timeout 3600;
168 184 proxy_send_timeout 3600;
169 185 proxy_read_timeout 3600;
170 186 proxy_buffer_size 8k;
171 187 proxy_buffers 8 32k;
172 188 proxy_busy_buffers_size 64k;
173 189 proxy_temp_file_write_size 64k;
174 190
175 191 Also, when using the root path with nginx, you should set static files to false
176 192 in the production.ini file::
177 193
178 194 [app:main]
179 195 use = egg:rhodecode
180 196 full_stack = true
181 197 static_files = false
182 198 lang=en
183 199 cache_dir = %(here)s/data
184 200
185 201 This stops the application from serving the static files and improves speed.
186 202
187 203 Apache reverse proxy
188 204 --------------------
189 205 A tutorial can be found here:
190 206 http://wiki.pylonshq.com/display/pylonscookbook/Apache+as+a+reverse+proxy+for+Pylons
191 207
192 208
193 209 Apache's example FCGI config
194 210 ----------------------------
195 211
196 212 TODO !
197 213
198 214 Other configuration files
199 215 -------------------------
200 216
201 217 Some extra configuration files and examples can be found here:
202 218 http://hg.python-works.com/rhodecode/files/tip/init.d
203 219
204 220 and a celeryconfig file can be used from here:
205 221 http://hg.python-works.com/rhodecode/files/tip/celeryconfig.py
206 222
207 223 Troubleshooting
208 224 ---------------
209 225
210 226 - missing static files?
211 227
212 228 - make sure you either set `static_files = true` in the .ini file or
213 229 double check the root path of your http setup. It should point to,
214 230 for example:
215 231 /home/my-virtual-python/lib/python2.6/site-packages/rhodecode/public
216 232
217 233 - can't install celery/rabbitmq?
218 234
219 235 - don't worry, RhodeCode works without them too; no extra setup is required
220 236
221 237
222 238 - long lasting push timeouts?
223 239
224 240 - make sure you set longer timeouts in your proxy/fcgi settings; timeouts
225 241 are caused by the http server and not RhodeCode
226 242
227 243 - large push timeouts?
228 244
229 245 - make sure you set a proper max_body_size for the http server
230 246
231 247
232 248
233 249 .. _virtualenv: http://pypi.python.org/pypi/virtualenv
234 250 .. _python: http://www.python.org/
235 251 .. _mercurial: http://mercurial.selenic.com/
236 252 .. _celery: http://celeryproject.org/
237 253 .. _rabbitmq: http://www.rabbitmq.com/ No newline at end of file
@@ -1,363 +1,389 b''
1 1 from celery.decorators import task
2 2
3 3 import os
4 4 import traceback
5 5 import beaker
6 6 from time import mktime
7 7 from operator import itemgetter
8 8
9 9 from pylons import config
10 10 from pylons.i18n.translation import _
11 11
12 12 from rhodecode.lib.celerylib import run_task, locked_task, str2bool
13 13 from rhodecode.lib.helpers import person
14 14 from rhodecode.lib.smtp_mailer import SmtpMailer
15 15 from rhodecode.lib.utils import OrderedDict
16 16 from rhodecode.model import init_model
17 17 from rhodecode.model import meta
18 18 from rhodecode.model.db import RhodeCodeUi
19 19
20 20 from vcs.backends import get_repo
21 21
22 22 from sqlalchemy import engine_from_config
23 23
24 #set up cache regions for beaker so celery can utilise them
25 def add_cache(settings):
26 cache_settings = {'regions':None}
27 for key in settings.keys():
28 for prefix in ['beaker.cache.', 'cache.']:
29 if key.startswith(prefix):
30 name = key.split(prefix)[1].strip()
31 cache_settings[name] = settings[key].strip()
32 if cache_settings['regions']:
33 for region in cache_settings['regions'].split(','):
34 region = region.strip()
35 region_settings = {}
36 for key, value in cache_settings.items():
37 if key.startswith(region):
38 region_settings[key.split('.')[1]] = value
39 region_settings['expire'] = int(region_settings.get('expire',
40 60))
41 region_settings.setdefault('lock_dir',
42 cache_settings.get('lock_dir'))
43 if 'type' not in region_settings:
44 region_settings['type'] = cache_settings.get('type',
45 'memory')
46 beaker.cache.cache_regions[region] = region_settings
47 add_cache(config)
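#as an illustration (values hypothetical), ini entries such as
#  beaker.cache.regions = short_term
#  beaker.cache.short_term.type = memory
#  beaker.cache.short_term.expire = 3600
#would register a 'short_term' region kept in memory with a one hour expiry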
48
24 49 try:
25 50 import json
26 51 except ImportError:
27 52 #python 2.5 compatibility
28 53 import simplejson as json
29 54
30 55 __all__ = ['whoosh_index', 'get_commits_stats',
31 56 'reset_user_password', 'send_email']
32 57
33 58 CELERY_ON = str2bool(config['app_conf'].get('use_celery'))
34 59
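#when tasks run under celery the pylons app isn't fully initialized, so the
#db engine and session have to be built from the ini config by hand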
35 60 def get_session():
36 61 if CELERY_ON:
37 62 engine = engine_from_config(config, 'sqlalchemy.db1.')
38 63 init_model(engine)
39 64 sa = meta.Session()
40 65 return sa
41 66
42 67 def get_repos_path():
43 68 sa = get_session()
44 69 q = sa.query(RhodeCodeUi).filter(RhodeCodeUi.ui_key == '/').one()
45 70 return q.ui_value
46 71
47 72 @task
48 73 @locked_task
49 74 def whoosh_index(repo_location, full_index):
50 75 log = whoosh_index.get_logger()
51 76 from rhodecode.lib.indexers.daemon import WhooshIndexingDaemon
52 77 index_location = config['index_dir']
53 78 WhooshIndexingDaemon(index_location=index_location,
54 repo_location=repo_location).run(full_index=full_index)
79 repo_location=repo_location, sa=get_session())\
80 .run(full_index=full_index)
55 81
56 82 @task
57 83 @locked_task
58 84 def get_commits_stats(repo_name, ts_min_y, ts_max_y):
59 85 from rhodecode.model.db import Statistics, Repository
60 86 log = get_commits_stats.get_logger()
61 87
62 88 #for js data compatibility
63 89 author_key_cleaner = lambda k: person(k).replace('"', "")
64 90
65 91 commits_by_day_author_aggregate = {}
66 92 commits_by_day_aggregate = {}
67 93 repos_path = get_repos_path()
68 94 p = os.path.join(repos_path, repo_name)
69 95 repo = get_repo(p)
70 96
71 97 skip_date_limit = True
72 98 parse_limit = 250 #limit of changesets parsed in a single task run
73 99 last_rev = 0
74 100 last_cs = None
75 101 timegetter = itemgetter('time')
76 102
77 103 sa = get_session()
78 104
79 105 dbrepo = sa.query(Repository)\
80 106 .filter(Repository.repo_name == repo_name).scalar()
81 107 cur_stats = sa.query(Statistics)\
82 108 .filter(Statistics.repository == dbrepo).scalar()
83 109 if cur_stats:
84 110 last_rev = cur_stats.stat_on_revision
85 111 if not repo.revisions:
86 112 return True
87 113
88 114 if last_rev == repo.revisions[-1] and len(repo.revisions) > 1:
89 115 #pass silently without any work if we're not on the first revision, or if
90 116 #the current parsing state (the db marker) is already at the last revision
91 117 return True
92 118
93 119 if cur_stats:
94 120 commits_by_day_aggregate = OrderedDict(
95 121 json.loads(
96 122 cur_stats.commit_activity_combined))
97 123 commits_by_day_author_aggregate = json.loads(cur_stats.commit_activity)
98 124
99 125 log.debug('starting parsing %s', parse_limit)
100 126 lmktime = mktime
101 127
102 128 for cnt, rev in enumerate(repo.revisions[last_rev:]):
103 129 last_cs = cs = repo.get_changeset(rev)
104 130 k = '%s-%s-%s' % (cs.date.timetuple()[0], cs.date.timetuple()[1],
105 131 cs.date.timetuple()[2])
106 132 timetupple = [int(x) for x in k.split('-')]
107 133 timetupple.extend([0 for _ in xrange(6)])
108 134 k = lmktime(timetupple)
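#k is now the epoch timestamp of midnight on the changeset's date, so all
#commits made on the same calendar day share one aggregation key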
109 135 if commits_by_day_author_aggregate.has_key(author_key_cleaner(cs.author)):
110 136 try:
111 137 l = [timegetter(x) for x in commits_by_day_author_aggregate\
112 138 [author_key_cleaner(cs.author)]['data']]
113 139 time_pos = l.index(k)
114 140 except ValueError:
115 141 time_pos = False
116 142
117 143 if time_pos >= 0 and time_pos is not False:
118 144
119 145 datadict = commits_by_day_author_aggregate\
120 146 [author_key_cleaner(cs.author)]['data'][time_pos]
121 147
122 148 datadict["commits"] += 1
123 149 datadict["added"] += len(cs.added)
124 150 datadict["changed"] += len(cs.changed)
125 151 datadict["removed"] += len(cs.removed)
126 152
127 153 else:
128 154 if k >= ts_min_y and k <= ts_max_y or skip_date_limit:
129 155
130 156 datadict = {"time":k,
131 157 "commits":1,
132 158 "added":len(cs.added),
133 159 "changed":len(cs.changed),
134 160 "removed":len(cs.removed),
135 161 }
136 162 commits_by_day_author_aggregate\
137 163 [author_key_cleaner(cs.author)]['data'].append(datadict)
138 164
139 165 else:
140 166 if k >= ts_min_y and k <= ts_max_y or skip_date_limit:
141 167 commits_by_day_author_aggregate[author_key_cleaner(cs.author)] = {
142 168 "label":author_key_cleaner(cs.author),
143 169 "data":[{"time":k,
144 170 "commits":1,
145 171 "added":len(cs.added),
146 172 "changed":len(cs.changed),
147 173 "removed":len(cs.removed),
148 174 }],
149 175 "schema":["commits"],
150 176 }
151 177
152 178 #gather all data by day
153 179 if commits_by_day_aggregate.has_key(k):
154 180 commits_by_day_aggregate[k] += 1
155 181 else:
156 182 commits_by_day_aggregate[k] = 1
157 183
158 184 if cnt >= parse_limit:
159 185 #don't fetch too much data since we can freeze the application
160 186 break
161 187 overview_data = []
162 188 for k, v in commits_by_day_aggregate.items():
163 189 overview_data.append([k, v])
164 190 overview_data = sorted(overview_data, key=itemgetter(0))
165 191 if not commits_by_day_author_aggregate:
166 192 commits_by_day_author_aggregate[author_key_cleaner(repo.contact)] = {
167 193 "label":author_key_cleaner(repo.contact),
168 194 "data":[0, 1],
169 195 "schema":["commits"],
170 196 }
171 197
172 198 stats = cur_stats if cur_stats else Statistics()
173 199 stats.commit_activity = json.dumps(commits_by_day_author_aggregate)
174 200 stats.commit_activity_combined = json.dumps(overview_data)
175 201
176 202 log.debug('last revision %s', last_rev)
177 203 leftovers = len(repo.revisions[last_rev:])
178 204 log.debug('revisions to parse %s', leftovers)
179 205
180 206 if last_rev == 0 or leftovers < parse_limit:
181 207 stats.languages = json.dumps(__get_codes_stats(repo_name))
182 208
183 209 stats.repository = dbrepo
184 210 stats.stat_on_revision = last_cs.revision
185 211
186 212 try:
187 213 sa.add(stats)
188 214 sa.commit()
189 215 except:
190 216 log.error(traceback.format_exc())
191 217 sa.rollback()
192 218 return False
193 219 if len(repo.revisions) > 1:
194 220 run_task(get_commits_stats, repo_name, ts_min_y, ts_max_y)
195 221
196 222 return True
197 223
198 224 @task
199 225 def reset_user_password(user_email):
200 226 log = reset_user_password.get_logger()
201 227 from rhodecode.lib import auth
202 228 from rhodecode.model.db import User
203 229
204 230 try:
205 231 try:
206 232 sa = get_session()
207 233 user = sa.query(User).filter(User.email == user_email).scalar()
208 234 new_passwd = auth.PasswordGenerator().gen_password(8,
209 235 auth.PasswordGenerator.ALPHABETS_BIG_SMALL)
210 236 if user:
211 237 user.password = auth.get_crypt_password(new_passwd)
212 238 sa.add(user)
213 239 sa.commit()
214 240 log.info('change password for %s', user_email)
215 241 if new_passwd is None:
216 242 raise Exception('unable to generate new password')
217 243
218 244 except:
219 245 log.error(traceback.format_exc())
220 246 sa.rollback()
221 247
222 248 run_task(send_email, user_email,
223 249 "Your new rhodecode password",
224 250 'Your new rhodecode password:%s' % (new_passwd))
225 251 log.info('send new password mail to %s', user_email)
226 252
227 253
228 254 except:
229 255 log.error('Failed to update user password')
230 256 log.error(traceback.format_exc())
231 257
232 258 return True
233 259
234 260 @task
235 261 def send_email(recipients, subject, body):
236 262 """
237 263 Sends an email with defined parameters from the .ini files.
238 264
239 265
240 266 :param recipients: list of recipients, if this is empty the defined email
241 267 address from field 'email_to' is used instead
242 268 :param subject: subject of the mail
243 269 :param body: body of the mail
244 270 """
245 271 log = send_email.get_logger()
246 272 email_config = config
247 273
248 274 if not recipients:
249 275 recipients = [email_config.get('email_to')]
250 276
251 277 mail_from = email_config.get('app_email_from')
252 278 user = email_config.get('smtp_username')
253 279 passwd = email_config.get('smtp_password')
254 280 mail_server = email_config.get('smtp_server')
255 281 mail_port = email_config.get('smtp_port')
256 282 tls = str2bool(email_config.get('smtp_use_tls'))
257 283 ssl = str2bool(email_config.get('smtp_use_ssl'))
258 284
259 285 try:
260 286 m = SmtpMailer(mail_from, user, passwd, mail_server,
261 287 mail_port, ssl, tls)
262 288 m.send(recipients, subject, body)
263 289 except:
264 290 log.error('Mail sending failed')
265 291 log.error(traceback.format_exc())
266 292 return False
267 293 return True
268 294
269 295 @task
270 296 def create_repo_fork(form_data, cur_user):
271 297 from rhodecode.model.repo import RepoModel
272 298 from vcs import get_backend
273 299 log = create_repo_fork.get_logger()
274 300 repo_model = RepoModel(get_session())
275 301 repo_model.create(form_data, cur_user, just_db=True, fork=True)
276 302 repo_name = form_data['repo_name']
277 303 repos_path = get_repos_path()
278 304 repo_path = os.path.join(repos_path, repo_name)
279 305 repo_fork_path = os.path.join(repos_path, form_data['fork_name'])
280 306 alias = form_data['repo_type']
281 307
282 308 log.info('creating repo fork %s as %s', repo_name, repo_path)
283 309 backend = get_backend(alias)
284 310 backend(str(repo_fork_path), create=True, src_url=str(repo_path))
285 311
286 312 def __get_codes_stats(repo_name):
287 313 LANGUAGES_EXTENSIONS_MAP = {'scm': 'Scheme', 'asmx': 'VbNetAspx', 'Rout':
288 314 'RConsole', 'rest': 'Rst', 'abap': 'ABAP', 'go': 'Go', 'phtml': 'HtmlPhp',
289 315 'ns2': 'Newspeak', 'xml': 'EvoqueXml', 'sh-session': 'BashSession', 'ads':
290 316 'Ada', 'clj': 'Clojure', 'll': 'Llvm', 'ebuild': 'Bash', 'adb': 'Ada',
291 317 'ada': 'Ada', 'c++-objdump': 'CppObjdump', 'aspx':
292 318 'VbNetAspx', 'ksh': 'Bash', 'coffee': 'CoffeeScript', 'vert': 'GLShader',
293 319 'Makefile.*': 'Makefile', 'di': 'D', 'dpatch': 'DarcsPatch', 'rake':
294 320 'Ruby', 'moo': 'MOOCode', 'erl-sh': 'ErlangShell', 'geo': 'GLShader',
295 321 'pov': 'Povray', 'bas': 'VbNet', 'bat': 'Batch', 'd': 'D', 'lisp':
296 322 'CommonLisp', 'h': 'C', 'rbx': 'Ruby', 'tcl': 'Tcl', 'c++': 'Cpp', 'md':
297 323 'MiniD', '.vimrc': 'Vim', 'xsd': 'Xml', 'ml': 'Ocaml', 'el': 'CommonLisp',
298 324 'befunge': 'Befunge', 'xsl': 'Xslt', 'pyx': 'Cython', 'cfm':
299 325 'ColdfusionHtml', 'evoque': 'Evoque', 'cfg': 'Ini', 'htm': 'Html',
300 326 'Makefile': 'Makefile', 'cfc': 'ColdfusionHtml', 'tex': 'Tex', 'cs':
301 327 'CSharp', 'mxml': 'Mxml', 'patch': 'Diff', 'apache.conf': 'ApacheConf',
302 328 'scala': 'Scala', 'applescript': 'AppleScript', 'GNUmakefile': 'Makefile',
303 329 'c-objdump': 'CObjdump', 'lua': 'Lua', 'apache2.conf': 'ApacheConf', 'rb':
304 330 'Ruby', 'gemspec': 'Ruby', 'rl': 'RagelObjectiveC', 'vala': 'Vala', 'tmpl':
305 331 'Cheetah', 'bf': 'Brainfuck', 'plt': 'Gnuplot', 'G': 'AntlrRuby', 'xslt':
306 332 'Xslt', 'flxh': 'Felix', 'asax': 'VbNetAspx', 'Rakefile': 'Ruby', 'S': 'S',
307 333 'wsdl': 'Xml', 'js': 'Javascript', 'autodelegate': 'Myghty', 'properties':
308 334 'Ini', 'bash': 'Bash', 'c': 'C', 'g': 'AntlrRuby', 'r3': 'Rebol', 's':
309 335 'Gas', 'ashx': 'VbNetAspx', 'cxx': 'Cpp', 'boo': 'Boo', 'prolog': 'Prolog',
310 336 'sqlite3-console': 'SqliteConsole', 'cl': 'CommonLisp', 'cc': 'Cpp', 'pot':
311 337 'Gettext', 'vim': 'Vim', 'pxi': 'Cython', 'yaml': 'Yaml', 'SConstruct':
312 338 'Python', 'diff': 'Diff', 'txt': 'Text', 'cw': 'Redcode', 'pxd': 'Cython',
313 339 'plot': 'Gnuplot', 'java': 'Java', 'hrl': 'Erlang', 'py': 'Python',
314 340 'makefile': 'Makefile', 'squid.conf': 'SquidConf', 'asm': 'Nasm', 'toc':
315 341 'Tex', 'kid': 'Genshi', 'rhtml': 'Rhtml', 'po': 'Gettext', 'pl': 'Prolog',
316 342 'pm': 'Perl', 'hx': 'Haxe', 'ascx': 'VbNetAspx', 'ooc': 'Ooc', 'asy':
317 343 'Asymptote', 'hs': 'Haskell', 'SConscript': 'Python', 'pytb':
318 344 'PythonTraceback', 'myt': 'Myghty', 'hh': 'Cpp', 'R': 'S', 'aux': 'Tex',
319 345 'rst': 'Rst', 'cpp-objdump': 'CppObjdump', 'lgt': 'Logtalk', 'rss': 'Xml',
320 346 'flx': 'Felix', 'b': 'Brainfuck', 'f': 'Fortran', 'rbw': 'Ruby',
321 347 '.htaccess': 'ApacheConf', 'cxx-objdump': 'CppObjdump', 'j': 'ObjectiveJ',
322 348 'mll': 'Ocaml', 'yml': 'Yaml', 'mu': 'MuPAD', 'r': 'Rebol', 'ASM': 'Nasm',
323 349 'erl': 'Erlang', 'mly': 'Ocaml', 'mo': 'Modelica', 'def': 'Modula2', 'ini':
324 350 'Ini', 'control': 'DebianControl', 'vb': 'VbNet', 'vapi': 'Vala', 'pro':
325 351 'Prolog', 'spt': 'Cheetah', 'mli': 'Ocaml', 'as': 'ActionScript3', 'cmd':
326 352 'Batch', 'cpp': 'Cpp', 'io': 'Io', 'tac': 'Python', 'haml': 'Haml', 'rkt':
327 353 'Racket', 'st':'Smalltalk', 'inc': 'Povray', 'pas': 'Delphi', 'cmake':
328 354 'CMake', 'csh':'Tcsh', 'hpp': 'Cpp', 'feature': 'Gherkin', 'html': 'Html',
329 355 'php':'Php', 'php3':'Php', 'php4':'Php', 'php5':'Php', 'xhtml': 'Html',
330 356 'hxx': 'Cpp', 'eclass': 'Bash', 'css': 'Css',
331 357 'frag': 'GLShader', 'd-objdump': 'DObjdump', 'weechatlog': 'IrcLogs',
332 358 'tcsh': 'Tcsh', 'objdump': 'Objdump', 'pyw': 'Python', 'h++': 'Cpp',
333 359 'py3tb': 'Python3Traceback', 'jsp': 'Jsp', 'sql': 'Sql', 'mak': 'Makefile',
334 360 'php': 'Php', 'mao': 'Mako', 'man': 'Groff', 'dylan': 'Dylan', 'sass':
335 361 'Sass', 'cfml': 'ColdfusionHtml', 'darcspatch': 'DarcsPatch', 'tpl':
336 362 'Smarty', 'm': 'ObjectiveC', 'f90': 'Fortran', 'mod': 'Modula2', 'sh':
337 363 'Bash', 'lhs': 'LiterateHaskell', 'sources.list': 'SourcesList', 'axd':
338 364 'VbNetAspx', 'sc': 'Python'}
339 365
340 366 repos_path = get_repos_path()
341 367 p = os.path.join(repos_path, repo_name)
342 368 repo = get_repo(p)
343 369 tip = repo.get_changeset()
344 370 code_stats = {}
345 371
346 372 def aggregate(cs):
347 373 for f in cs[2]:
348 374 ext = f.extension
349 375 key = LANGUAGES_EXTENSIONS_MAP.get(ext, ext)
350 376 key = key or ext
351 377 if ext in LANGUAGES_EXTENSIONS_MAP.keys():
352 378 if code_stats.has_key(key):
353 379 code_stats[key] += 1
354 380 else:
355 381 code_stats[key] = 1
356 382
357 383 map(aggregate, tip.walk('/'))
358 384
359 385 return code_stats or {}
360 386
361 387
362 388
363 389
@@ -1,215 +1,215 b''
1 1 #!/usr/bin/env python
2 2 # encoding: utf-8
3 3 # whoosh indexer daemon for rhodecode
4 4 # Copyright (C) 2009-2010 Marcin Kuzminski <marcin@python-works.com>
5 5 #
6 6 # This program is free software; you can redistribute it and/or
7 7 # modify it under the terms of the GNU General Public License
8 8 # as published by the Free Software Foundation; version 2
9 9 # of the License or (at your option) any later version of the license.
10 10 #
11 11 # This program is distributed in the hope that it will be useful,
12 12 # but WITHOUT ANY WARRANTY; without even the implied warranty of
13 13 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
14 14 # GNU General Public License for more details.
15 15 #
16 16 # You should have received a copy of the GNU General Public License
17 17 # along with this program; if not, write to the Free Software
18 18 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
19 19 # MA 02110-1301, USA.
20 20 """
21 21 Created on Jan 26, 2010
22 22
23 23 @author: marcink
24 24 A daemon will read from the task table and run tasks
25 25 """
26 26 import sys
27 27 import os
28 28 from os.path import dirname as dn
29 29 from os.path import join as jn
30 30
31 31 #to get the rhodecode import
32 32 project_path = dn(dn(dn(dn(os.path.realpath(__file__)))))
33 33 sys.path.append(project_path)
34 34
35 35
36 36 from rhodecode.model.scm import ScmModel
37 37 from rhodecode.lib.helpers import safe_unicode
38 38 from whoosh.index import create_in, open_dir
39 39 from shutil import rmtree
40 40 from rhodecode.lib.indexers import INDEX_EXTENSIONS, SCHEMA, IDX_NAME
41 41
42 42 from time import mktime
43 43 from vcs.exceptions import ChangesetError, RepositoryError
44 44
45 45 import logging
46 46
47 47 log = logging.getLogger('whooshIndexer')
48 48 # create logger
49 49 log.setLevel(logging.DEBUG)
50 50 log.propagate = False
51 51 # create console handler and set level to debug
52 52 ch = logging.StreamHandler()
53 53 ch.setLevel(logging.DEBUG)
54 54
55 55 # create formatter
56 56 formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
57 57
58 58 # add formatter to ch
59 59 ch.setFormatter(formatter)
60 60
61 61 # add ch to logger
62 62 log.addHandler(ch)
63 63
64 64 class WhooshIndexingDaemon(object):
65 65 """
66 66 Daemon for atomic jobs
67 67 """
68 68
69 69 def __init__(self, indexname='HG_INDEX', index_location=None,
70 repo_location=None):
70 repo_location=None, sa=None):
71 71 self.indexname = indexname
72 72
73 73 self.index_location = index_location
74 74 if not index_location:
75 75 raise Exception('You have to provide index location')
76 76
77 77 self.repo_location = repo_location
78 78 if not repo_location:
79 79 raise Exception('You have to provide repositories location')
80 80
81 self.repo_paths = ScmModel().repo_scan(self.repo_location, None)
81 self.repo_paths = ScmModel(sa).repo_scan(self.repo_location, None)
82 82 self.initial = False
83 83 if not os.path.isdir(self.index_location):
84 84 os.makedirs(self.index_location)
85 85 log.info('Cannot run incremental index since it does not'
86 86 ' yet exist; running full build')
87 87 self.initial = True
88 88
89 89 def get_paths(self, repo):
90 90 """recursive walk in root dir and return a set of all path in that dir
91 91 based on repository walk function
92 92 """
93 93 index_paths_ = set()
94 94 try:
95 95 for topnode, dirs, files in repo.walk('/', 'tip'):
96 96 for f in files:
97 97 index_paths_.add(jn(repo.path, f.path))
98 98 #repo.walk already recurses into subdirectories, so the loop above
99 99 #picks up files from every directory; no extra pass over dirs is needed
101 101
102 102 except RepositoryError:
103 103 pass
104 104 return index_paths_
105 105
106 106 def get_node(self, repo, path):
107 107 n_path = path[len(repo.path) + 1:]
108 108 node = repo.get_changeset().get_node(n_path)
109 109 return node
110 110
111 111 def get_node_mtime(self, node):
112 112 return mktime(node.last_changeset.date.timetuple())
113 113
114 114 def add_doc(self, writer, path, repo):
115 115 """Adding doc to writer this function itself fetches data from
116 116 the instance of vcs backend"""
117 117 node = self.get_node(repo, path)
118 118
119 119 #we just index the content of chosen files
120 120 if node.extension in INDEX_EXTENSIONS:
121 121 log.debug(' >> %s [WITH CONTENT]' % path)
122 122 u_content = node.content
123 123 else:
124 124 log.debug(' >> %s' % path)
125 125 #just index the file name without its content
126 126 u_content = u''
127 127
128 128 writer.add_document(owner=unicode(repo.contact),
129 129 repository=safe_unicode(repo.name),
130 130 path=safe_unicode(path),
131 131 content=u_content,
132 132 modtime=self.get_node_mtime(node),
133 133 extension=node.extension)
134 134
135 135
136 136 def build_index(self):
137 137 if os.path.exists(self.index_location):
138 138 log.debug('removing previous index')
139 139 rmtree(self.index_location)
140 140
141 141 if not os.path.exists(self.index_location):
142 142 os.mkdir(self.index_location)
143 143
144 144 idx = create_in(self.index_location, SCHEMA, indexname=IDX_NAME)
145 145 writer = idx.writer()
146 146
147 147 for cnt, repo in enumerate(self.repo_paths.values()):
148 148 log.debug('building index @ %s' % repo.path)
149 149
150 150 for idx_path in self.get_paths(repo):
151 151 self.add_doc(writer, idx_path, repo)
152 152
153 153 log.debug('>> COMMITTING CHANGES <<')
154 154 writer.commit(merge=True)
155 155 log.debug('>>> FINISHED BUILDING INDEX <<<')
156 156
157 157
158 158 def update_index(self):
159 159 log.debug('STARTING INCREMENTAL INDEXING UPDATE')
160 160
161 161 idx = open_dir(self.index_location, indexname=self.indexname)
162 162 # The set of all paths in the index
163 163 indexed_paths = set()
164 164 # The set of all paths we need to re-index
165 165 to_index = set()
166 166
167 167 reader = idx.reader()
168 168 writer = idx.writer()
169 169
170 170 # Loop over the stored fields in the index
171 171 for fields in reader.all_stored_fields():
172 172 indexed_path = fields['path']
173 173 indexed_paths.add(indexed_path)
174 174
175 175 repo = self.repo_paths[fields['repository']]
176 176
177 177 try:
178 178 node = self.get_node(repo, indexed_path)
179 179 except ChangesetError:
180 180 # This file was deleted since it was indexed
181 181 log.debug('removing from index %s' % indexed_path)
182 182 writer.delete_by_term('path', indexed_path)
183 183
184 184 else:
185 185 # Check if this file was changed since it was indexed
186 186 indexed_time = fields['modtime']
187 187 mtime = self.get_node_mtime(node)
188 188 if mtime > indexed_time:
189 189 # The file has changed, delete it and add it to the list of
190 190 # files to reindex
191 191 log.debug('adding to reindex list %s' % indexed_path)
192 192 writer.delete_by_term('path', indexed_path)
193 193 to_index.add(indexed_path)
194 194
195 195 # Loop over the files in the filesystem
196 196 # Assume we have a function that gathers the filenames of the
197 197 # documents to be indexed
198 198 for repo in self.repo_paths.values():
199 199 for path in self.get_paths(repo):
200 200 if path in to_index or path not in indexed_paths:
201 201 # This is either a file that's changed, or a new file
202 202 # that wasn't indexed before. So index it!
203 203 self.add_doc(writer, path, repo)
204 204 log.debug('reindexing %s' % path)
205 205
206 206 log.debug('>> COMMITTING CHANGES <<')
207 207 writer.commit(merge=True)
208 208 log.debug('>>> FINISHED REBUILDING INDEX <<<')
209 209
210 210 def run(self, full_index=False):
211 211 """Run daemon"""
212 212 if full_index or self.initial:
213 213 self.build_index()
214 214 else:
215 215 self.update_index()
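
#Example usage, mirroring the whoosh_index celery task in tasks.py
#(the paths are hypothetical):
#  WhooshIndexingDaemon(index_location='/path/to/index',
#                       repo_location='/path/to/repos').run(full_index=True)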