fixes #90 + docs update
marcink -
r894:1fed3c91 beta
@@ -1,294 +1,303 b''
1 1 .. _setup:
2 2
3 3 Setup
4 4 =====
5 5
6 6
7 7 Setting up the application
8 8 --------------------------
9 9
10 10 First you'll need to create a RhodeCode config file. Run the following command
11 11 to do this
12 12
13 13 ::
14 14
15 15 paster make-config RhodeCode production.ini
16 16
17 17 - This will create a `production.ini` config file inside the current directory.
18 18 This config file contains various settings for RhodeCode, e.g. proxy port,
19 19 email settings, usage of static files, cache, celery settings and logging.
20 20
21 21
22 22
23 23 Next we need to create the database.
24 24
25 25 ::
26 26
27 27 paster setup-app production.ini
28 28
29 29 - This command will create all needed tables and an admin account.
30 30 When asked for a path you can either use a new location or one with already
31 31 existing repositories. RhodeCode will simply add all newly found repositories
32 32 to its database. Also make sure you specify the correct path to repositories.
33 33 - Remember that the given path for mercurial_ repositories must be writable
34 34 by the application. This is very important, since the RhodeCode web
35 35 interface will work even without such access, but pushing will
36 36 eventually fail with permission denied errors.
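As a quick sanity check before running setup, you can verify up front that the repositories path is writable. This is generic Python, not a RhodeCode command:

```python
import os

def repos_path_writable(path):
    """Return True when the repositories root exists and is writable."""
    return os.path.isdir(path) and os.access(path, os.W_OK)

# Example usage: check the current working directory
print(repos_path_writable(os.getcwd()))
```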
37 37
38 38 You are ready to use RhodeCode. To run it, simply execute
39 39
40 40 ::
41 41
42 42 paster serve production.ini
43 43
44 44 - This command runs the RhodeCode server; the app should be available at
45 45 127.0.0.1:5000. This IP and port are configurable via the production.ini
46 46 file created in the previous step.
47 47 - Use the admin account you created to log in.
48 48 - The default permission on each repository is read, and the owner is admin.
49 49 Remember to update these if needed. In the admin panel you can toggle ldap,
50 50 anonymous and permissions settings, as well as edit more advanced options on
51 51 users and repositories.
52 52
53 53
54 54 Setting up Whoosh full text search
55 55 ----------------------------------
56 56
57 Index for whoosh can be build starting from version 1.1 using paster command
58 passing repo locations to index, as well as Your config file that stores
59 whoosh index files locations. There is possible to pass `-f` to the options
57 Starting from version 1.1 the whoosh index can be built using a paster command.
58 You have to specify the config file that stores the index location, and
59 the location of repositories (`--repo-location`). Starting from version 1.2 it is
60 also possible to specify a comma separated list of repositories (`--index-only`)
61 to build the index only for the chosen repositories, skipping any others found
62 in the repos location.
63
64 It is also possible to pass the `-f` option
60 65 to force a full index rebuild. Without it, indexing will always run in
61 66 incremental mode.
62 67
63 ::
68 incremental mode::
64 69
65 70 paster make-index production.ini --repo-location=<location for repos>
66 71
67 for full index rebuild You can use
72
68 73
69 ::
74 For a full index rebuild you can use::
70 75
71 76 paster make-index production.ini -f --repo-location=<location for repos>
72 77
73 - For full text search You can either put crontab entry for
78
79 Building the index just for chosen repositories is possible with the following command::
80
81 paster make-index production.ini --repo-location=<location for repos> --index-only=vcs,rhodecode
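The `--index-only` filtering amounts to splitting the comma separated value and keeping only the named repositories. A simplified sketch of that behaviour (the names here are illustrative, not RhodeCode's API):

```python
def filter_repos(repo_paths, index_only=None):
    """Keep only the repositories named in the comma separated
    --index-only value; keep everything when it is not given."""
    if not index_only:
        return dict(repo_paths)
    wanted = {name.strip() for name in index_only.split(',')}
    return {name: path for name, path in repo_paths.items() if name in wanted}

repos = {'vcs': '/repos/vcs', 'rhodecode': '/repos/rhodecode',
         'other': '/repos/other'}
print(sorted(filter_repos(repos, 'vcs, rhodecode')))  # ['rhodecode', 'vcs']
```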
74 82
75 This command can be run even from crontab in order to do periodical
76 index builds and keep Your index always up to date. An example entry might
77 look like this
83
84 In order to do periodic index builds and keep your index always up to date,
85 it's recommended to add a crontab entry for incremental indexing.
86 An example entry might look like this
78 87
79 88 ::
80 89
81 90 /path/to/python/bin/paster make-index /path/to/rhodecode/production.ini --repo-location=<location for repos>
82 91
83 When using incremental(default) mode whoosh will check last modification date
92 When using incremental (default) mode, whoosh will check the last modification date
84 93 of each file and schedule it for reindexing if a newer version is available. The
85 94 indexing daemon also checks for removed files and removes them from the index.
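The incremental logic described above can be sketched as a pure function over modification times (a simplified model, not RhodeCode's actual implementation):

```python
def plan_incremental_update(indexed, on_disk):
    """indexed and on_disk map file path -> modification time.
    Returns (paths to reindex, paths to drop from the index)."""
    to_remove = set(indexed) - set(on_disk)          # deleted files
    to_reindex = {path for path, mtime in on_disk.items()
                  if path not in indexed or mtime > indexed[path]}
    return to_reindex, to_remove

reindex, remove = plan_incremental_update(
    {'a.py': 100, 'b.py': 200},
    {'a.py': 100, 'b.py': 300, 'c.py': 50})
print(sorted(reindex), sorted(remove))  # ['b.py', 'c.py'] []
```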
86 95
87 96 Sometimes you might want to rebuild the index from scratch. You can do that
88 97 using the `-f` flag passed to the paster command, or in the admin panel you
89 98 can check the `build from scratch` flag.
90 99
91 100
92 101 Setting up LDAP support
93 102 -----------------------
94 103
95 104 Starting from version 1.1, RhodeCode supports ldap authentication. In order
96 105 to use ldap, you have to install the python-ldap package. It is available
97 106 via pypi, so you can install it by running
98 107
99 108 ::
100 109
101 110 easy_install python-ldap
102 111
103 112 or::
104 113
105 114 pip install python-ldap
106 115
107 116 .. note::
108 117 python-ldap requires certain libraries on your system, so before installing
109 118 it check that you have at least the `openldap` and `sasl` libraries.
110 119
111 120 The ldap settings are located in the admin->ldap section.
112 121
113 122 Here's a typical ldap setup::
114 123
115 124 Enable ldap = checked #controls if ldap access is enabled
116 125 Host = host.domain.org #actual ldap server to connect
117 126 Port = 389 or 636 for ldaps #ldap server ports
118 127 Enable LDAPS = unchecked #enable disable ldaps
119 128 Account = <account> #access for ldap server(if required)
120 129 Password = <password> #password for ldap server(if required)
121 130 Base DN = uid=%(user)s,CN=users,DC=host,DC=domain,DC=org
122 131
123 132
124 133 `Account` and `Password` are optional, and are used for two-phase ldap
125 134 authentication; they are the credentials to access your ldap server if it
126 135 doesn't support anonymous search/user lookups.
127 136
128 137 Base DN must contain the %(user)s template; it's a placeholder where the uid
129 138 used to log in goes, and it allows admins to specify a non-standard schema
130 139 for the uid variable.
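The %(user)s placeholder is plain Python string interpolation, which is how you can picture the substitution (the Base DN below is a hypothetical example):

```python
# Hypothetical Base DN as entered in the admin panel
base_dn = 'uid=%(user)s,CN=users,DC=host,DC=domain,DC=org'

# At login time the uid typed by the user is substituted in
print(base_dn % {'user': 'jdoe'})  # uid=jdoe,CN=users,DC=host,DC=domain,DC=org
```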
131 140
132 141 If all data is entered correctly and `python-ldap` is properly installed,
133 142 users should be able to access RhodeCode with their ldap accounts. When
134 143 logging in for the first time, a special ldap account is created inside
135 144 RhodeCode, so you have control over permissions even for ldap users. If such
136 145 a user already exists in the RhodeCode database, the ldap user with the same
137 146 username will not be able to access RhodeCode.
138 147
139 148 If you have problems with ldap access and believe you entered the correct
140 149 information, check the RhodeCode logs; any error messages sent from
141 150 ldap will be saved there.
142 151
143 152
144 153
145 154 Setting Up Celery
146 155 -----------------
147 156
148 157 Since version 1.1, celery is configured via the rhodecode ini configuration
149 158 files. Simply set use_celery=true in the ini file, then add or change the
150 159 configuration variables inside the ini file.
151 160
152 161 Remember that the ini files use a format with '.' instead of '_' as in celery,
153 162 so for example setting `BROKER_HOST` in celery means setting `broker.host` in
154 163 the config file.
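That naming rule can be expressed as a one-line transformation (an illustrative helper, not something RhodeCode ships):

```python
def celery_name_to_ini_key(name):
    """Map a celery setting name like BROKER_HOST to its ini key."""
    return name.lower().replace('_', '.')

print(celery_name_to_ini_key('BROKER_HOST'))  # broker.host
```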
155 164
156 165 In order to start using celery, run::
157 166 paster celeryd <configfile.ini>
158 167
159 168
160 169
161 170 .. note::
162 171 Make sure you run this command from the same virtualenv, and with the same
163 172 user that rhodecode runs as.
164 173
165 174
166 175 Nginx virtual host example
167 176 --------------------------
168 177
169 178 Sample config for nginx using proxy::
170 179
171 180 server {
172 181 listen 80;
173 182 server_name hg.myserver.com;
174 183 access_log /var/log/nginx/rhodecode.access.log;
175 184 error_log /var/log/nginx/rhodecode.error.log;
176 185 location / {
177 186 root /var/www/rhodecode/rhodecode/public/;
178 187 if (!-f $request_filename){
179 188 proxy_pass http://127.0.0.1:5000;
180 189 }
181 190 #this is important if you want to use https !!!
182 191 proxy_set_header X-Url-Scheme $scheme;
183 192 include /etc/nginx/proxy.conf;
184 193 }
185 194 }
186 195
187 196 Here's the proxy.conf. It's tuned so it won't time out on long
188 197 pushes or on large pushes::
189 198
190 199 proxy_redirect off;
191 200 proxy_set_header Host $host;
192 201 proxy_set_header X-Host $http_host;
193 202 proxy_set_header X-Real-IP $remote_addr;
194 203 proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
195 204 proxy_set_header Proxy-host $proxy_host;
196 205 client_max_body_size 400m;
197 206 client_body_buffer_size 128k;
198 207 proxy_buffering off;
199 208 proxy_connect_timeout 3600;
200 209 proxy_send_timeout 3600;
201 210 proxy_read_timeout 3600;
202 211 proxy_buffer_size 8k;
203 212 proxy_buffers 8 32k;
204 213 proxy_busy_buffers_size 64k;
205 214 proxy_temp_file_write_size 64k;
206 215
207 216 Also, when using the root path with nginx, you might set static files to false
208 217 in the production.ini file::
209 218
210 219 [app:main]
211 220 use = egg:rhodecode
212 221 full_stack = true
213 222 static_files = false
214 223 lang=en
215 224 cache_dir = %(here)s/data
216 225
217 226 This stops static files from being served by the application and improves speed.
218 227
219 228
220 229 Apache virtual host example
221 230 ---------------------------
222 231
223 232 Sample config for apache using proxy::
224 233
225 234 <VirtualHost *:80>
226 235 ServerName hg.myserver.com
227 236 ServerAlias hg.myserver.com
228 237
229 238 <Proxy *>
230 239 Order allow,deny
231 240 Allow from all
232 241 </Proxy>
233 242
234 243 #important !
235 244 #Directive to properly generate url (clone url) for pylons
236 245 ProxyPreserveHost On
237 246
238 247 #rhodecode instance
239 248 ProxyPass / http://127.0.0.1:5000/
240 249 ProxyPassReverse / http://127.0.0.1:5000/
241 250
242 251 #to enable https use line below
243 252 #SetEnvIf X-Url-Scheme https HTTPS=1
244 253
245 254 </VirtualHost>
246 255
247 256
248 257 Additional tutorial
249 258 http://wiki.pylonshq.com/display/pylonscookbook/Apache+as+a+reverse+proxy+for+Pylons
250 259
251 260
252 261 Apache's example FCGI config
253 262 ----------------------------
254 263
255 264 TODO !
256 265
257 266 Other configuration files
258 267 -------------------------
259 268
260 269 Some example init.d scripts for debian and gentoo can be found here:
261 270
262 271 https://rhodecode.org/rhodecode/files/tip/init.d
263 272
264 273
265 274 Troubleshooting
266 275 ---------------
267 276
268 277 - missing static files?
269 278
270 279 - make sure either to set `static_files = true` in the .ini file, or
271 280 double check the root path of your http setup. It should point to,
272 281 for example:
273 282 /home/my-virtual-python/lib/python2.6/site-packages/rhodecode/public
274 283
275 284 - can't install celery/rabbitmq
276 285
277 286 - don't worry, RhodeCode works without them too. No extra setup is required
278 287
279 288 - long lasting push timeouts?
280 289
281 290 - make sure you set longer timeouts in your proxy/fcgi settings; timeouts
282 291 are caused by the http server and not RhodeCode
283 292
284 293 - large push timeouts?
285 294
286 295 - make sure you set a proper max_body_size for the http server
287 296
288 297
289 298
290 299 .. _virtualenv: http://pypi.python.org/pypi/virtualenv
291 300 .. _python: http://www.python.org/
292 301 .. _mercurial: http://mercurial.selenic.com/
293 302 .. _celery: http://celeryproject.org/
294 303 .. _rabbitmq: http://www.rabbitmq.com/
@@ -1,193 +1,203 b''
1 1 import os
2 2 import sys
3 3 import traceback
4 4 from os.path import dirname as dn, join as jn
5 5
6 6 #to get the rhodecode import
7 7 sys.path.append(dn(dn(dn(os.path.realpath(__file__)))))
8 8
9 from string import strip
10
9 11 from rhodecode.model import init_model
10 12 from rhodecode.model.scm import ScmModel
11 13 from rhodecode.config.environment import load_environment
12 14 from rhodecode.lib.utils import BasePasterCommand, Command, add_cache
13 15
14 16 from shutil import rmtree
15 17 from webhelpers.html.builder import escape
16 18 from vcs.utils.lazy import LazyProperty
17 19
18 20 from sqlalchemy import engine_from_config
19 21
20 22 from whoosh.analysis import RegexTokenizer, LowercaseFilter, StopFilter
21 23 from whoosh.fields import TEXT, ID, STORED, Schema, FieldType
22 24 from whoosh.index import create_in, open_dir
23 25 from whoosh.formats import Characters
24 26 from whoosh.highlight import highlight, SimpleFragmenter, HtmlFormatter
25 27
26 28
27 29 #EXTENSIONS WE WANT TO INDEX CONTENT OFF
28 30 INDEX_EXTENSIONS = ['action', 'adp', 'ashx', 'asmx', 'aspx', 'asx', 'axd', 'c',
29 31 'cfg', 'cfm', 'cpp', 'cs', 'css', 'diff', 'do', 'el', 'erl',
30 32 'h', 'htm', 'html', 'ini', 'java', 'js', 'jsp', 'jspx', 'lisp',
31 33 'lua', 'm', 'mako', 'ml', 'pas', 'patch', 'php', 'php3',
32 34 'php4', 'phtml', 'pm', 'py', 'rb', 'rst', 's', 'sh', 'sql',
33 35 'tpl', 'txt', 'vim', 'wss', 'xhtml', 'xml', 'xsl', 'xslt',
34 36 'yaws']
35 37
36 38 #CUSTOM ANALYZER wordsplit + lowercase filter
37 39 ANALYZER = RegexTokenizer(expression=r"\w+") | LowercaseFilter()
38 40
39 41
40 42 #INDEX SCHEMA DEFINITION
41 43 SCHEMA = Schema(owner=TEXT(),
42 44 repository=TEXT(stored=True),
43 45 path=TEXT(stored=True),
44 46 content=FieldType(format=Characters(ANALYZER),
45 47 scorable=True, stored=True),
46 48 modtime=STORED(), extension=TEXT(stored=True))
47 49
48 50
49 51 IDX_NAME = 'HG_INDEX'
50 52 FORMATTER = HtmlFormatter('span', between='\n<span class="break">...</span>\n')
51 53 FRAGMENTER = SimpleFragmenter(200)
52 54
53 55
54 56 class MakeIndex(BasePasterCommand):
55 57
56 58 max_args = 1
57 59 min_args = 1
58 60
59 61 usage = "CONFIG_FILE"
60 62 summary = "Creates index for full text search given configuration file"
61 63 group_name = "RhodeCode"
62 64 takes_config_file = -1
63 65 parser = Command.standard_parser(verbose=True)
64 66
65 67 def command(self):
66 68
67 69 from pylons import config
68 70 add_cache(config)
69 71 engine = engine_from_config(config, 'sqlalchemy.db1.')
70 72 init_model(engine)
71 73
72 74 index_location = config['index_dir']
73 75 repo_location = self.options.repo_location
76 repo_list = map(strip, self.options.repo_list.split(',')) if self.options.repo_list else None
74 77
75 78 #======================================================================
76 79 # WHOOSH DAEMON
77 80 #======================================================================
78 81 from rhodecode.lib.pidlock import LockHeld, DaemonLock
79 82 from rhodecode.lib.indexers.daemon import WhooshIndexingDaemon
80 83 try:
81 84 l = DaemonLock()
82 85 WhooshIndexingDaemon(index_location=index_location,
83 repo_location=repo_location)\
86 repo_location=repo_location,
87 repo_list=repo_list)\
84 88 .run(full_index=self.options.full_index)
85 89 l.release()
86 90 except LockHeld:
87 91 sys.exit(1)
88 92
89 93 def update_parser(self):
90 94 self.parser.add_option('--repo-location',
91 95 action='store',
92 96 dest='repo_location',
93 97 help="Specifies repositories location to index REQUIRED",
94 98 )
99 self.parser.add_option('--index-only',
100 action='store',
101 dest='repo_list',
102 help="Specifies a comma separated list of repositories "
103 "to build index on OPTIONAL",
104 )
95 105 self.parser.add_option('-f',
96 106 action='store_true',
97 107 dest='full_index',
98 108 help="Specifies that index should be made full i.e"
99 109 " destroy old and build from scratch",
100 110 default=False)
101 111
102 112 class ResultWrapper(object):
103 113 def __init__(self, search_type, searcher, matcher, highlight_items):
104 114 self.search_type = search_type
105 115 self.searcher = searcher
106 116 self.matcher = matcher
107 117 self.highlight_items = highlight_items
108 118 self.fragment_size = 200 / 2
109 119
110 120 @LazyProperty
111 121 def doc_ids(self):
112 122 docs_id = []
113 123 while self.matcher.is_active():
114 124 docnum = self.matcher.id()
115 125 chunks = [offsets for offsets in self.get_chunks()]
116 126 docs_id.append([docnum, chunks])
117 127 self.matcher.next()
118 128 return docs_id
119 129
120 130 def __str__(self):
121 131 return '<%s at %s>' % (self.__class__.__name__, len(self.doc_ids))
122 132
123 133 def __repr__(self):
124 134 return self.__str__()
125 135
126 136 def __len__(self):
127 137 return len(self.doc_ids)
128 138
129 139 def __iter__(self):
130 140 """
131 141 Allows iteration over results, and lazily generates content.
132 142
133 143 *Requires* implementation of ``__getitem__`` method.
134 144 """
135 145 for docid in self.doc_ids:
136 146 yield self.get_full_content(docid)
137 147
138 148 def __getslice__(self, i, j):
139 149 """
140 150 Slicing of resultWrapper
141 151 """
142 152 slice = []
143 153 for docid in self.doc_ids[i:j]:
144 154 slice.append(self.get_full_content(docid))
145 155 return slice
146 156
147 157
148 158 def get_full_content(self, docid):
149 159 res = self.searcher.stored_fields(docid[0])
150 160 f_path = res['path'][res['path'].find(res['repository']) \
151 161 + len(res['repository']):].lstrip('/')
152 162
153 163 content_short = self.get_short_content(res, docid[1])
154 164 res.update({'content_short':content_short,
155 165 'content_short_hl':self.highlight(content_short),
156 166 'f_path':f_path})
157 167
158 168 return res
159 169
160 170 def get_short_content(self, res, chunks):
161 171
162 172 return ''.join([res['content'][chunk[0]:chunk[1]] for chunk in chunks])
163 173
164 174 def get_chunks(self):
165 175 """
166 176 Smart function that chunks the content
167 177 without overlapping chunks, so it doesn't highlight the same
168 178 close occurrences twice.
169 179 @param matcher:
170 180 @param size:
171 181 """
172 182 memory = [(0, 0)]
173 183 for span in self.matcher.spans():
174 184 start = span.startchar or 0
175 185 end = span.endchar or 0
176 186 start_offseted = max(0, start - self.fragment_size)
177 187 end_offseted = end + self.fragment_size
178 188
179 189 if start_offseted < memory[-1][1]:
180 190 start_offseted = memory[-1][1]
181 191 memory.append((start_offseted, end_offseted,))
182 192 yield (start_offseted, end_offseted,)
183 193
184 194 def highlight(self, content, top=5):
185 195 if self.search_type != 'content':
186 196 return ''
187 197 hl = highlight(escape(content),
188 198 self.highlight_items,
189 199 analyzer=ANALYZER,
190 200 fragmenter=FRAGMENTER,
191 201 formatter=FORMATTER,
192 202 top=top)
193 203 return hl
@@ -1,226 +1,236 b''
1 1 # -*- coding: utf-8 -*-
2 2 """
3 3 rhodecode.lib.indexers.daemon
4 4 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
5 5
6 6 A daemon will read from the task table and run tasks
7 7
8 8 :created_on: Jan 26, 2010
9 9 :author: marcink
10 10 :copyright: (C) 2009-2010 Marcin Kuzminski <marcin@python-works.com>
11 11 :license: GPLv3, see COPYING for more details.
12 12 """
13 13 # This program is free software; you can redistribute it and/or
14 14 # modify it under the terms of the GNU General Public License
15 15 # as published by the Free Software Foundation; version 2
16 16 # of the License or (at your option) any later version of the license.
17 17 #
18 18 # This program is distributed in the hope that it will be useful,
19 19 # but WITHOUT ANY WARRANTY; without even the implied warranty of
20 20 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
21 21 # GNU General Public License for more details.
22 22 #
23 23 # You should have received a copy of the GNU General Public License
24 24 # along with this program; if not, write to the Free Software
25 25 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
26 26 # MA 02110-1301, USA.
27 27
28 28 import sys
29 29 import os
30 30 import traceback
31 31 from os.path import dirname as dn
32 32 from os.path import join as jn
33 33
34 34 #to get the rhodecode import
35 35 project_path = dn(dn(dn(dn(os.path.realpath(__file__)))))
36 36 sys.path.append(project_path)
37 37
38 38
39 39 from rhodecode.model.scm import ScmModel
40 40 from rhodecode.lib.helpers import safe_unicode
41 41 from whoosh.index import create_in, open_dir
42 42 from shutil import rmtree
43 43 from rhodecode.lib.indexers import INDEX_EXTENSIONS, SCHEMA, IDX_NAME
44 44
45 45 from time import mktime
46 46 from vcs.exceptions import ChangesetError, RepositoryError
47 47
48 48 import logging
49 49
50 50 log = logging.getLogger('whooshIndexer')
51 51 # create logger
52 52 log.setLevel(logging.DEBUG)
53 53 log.propagate = False
54 54 # create console handler and set level to debug
55 55 ch = logging.StreamHandler()
56 56 ch.setLevel(logging.DEBUG)
57 57
58 58 # create formatter
59 59 formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
60 60
61 61 # add formatter to ch
62 62 ch.setFormatter(formatter)
63 63
64 64 # add ch to logger
65 65 log.addHandler(ch)
66 66
67 67 class WhooshIndexingDaemon(object):
68 68 """
69 69 Daemon for atomic jobs
70 70 """
71 71
72 72 def __init__(self, indexname='HG_INDEX', index_location=None,
73 repo_location=None, sa=None):
73 repo_location=None, sa=None, repo_list=None):
74 74 self.indexname = indexname
75 75
76 76 self.index_location = index_location
77 77 if not index_location:
78 78 raise Exception('You have to provide index location')
79 79
80 80 self.repo_location = repo_location
81 81 if not repo_location:
82 82 raise Exception('You have to provide repositories location')
83 83
84 84 self.repo_paths = ScmModel(sa).repo_scan(self.repo_location, None)
85
86 if repo_list:
87 filtered_repo_paths = {}
88 for repo_name, repo in self.repo_paths.items():
89 if repo_name in repo_list:
90 filtered_repo_paths[repo.name] = repo
91
92 self.repo_paths = filtered_repo_paths
93
94
85 95 self.initial = False
86 96 if not os.path.isdir(self.index_location):
87 97 os.makedirs(self.index_location)
88 98 log.info('Cannot run incremental index since it does not'
89 99 ' yet exist - running full build')
90 100 self.initial = True
91 101
92 102 def get_paths(self, repo):
93 103 """Recursively walk the root dir and return a set of all paths in it,
94 104 based on repository walk function
95 105 """
96 106 index_paths_ = set()
97 107 try:
98 108 for topnode, dirs, files in repo.walk('/', 'tip'):
99 109 for f in files:
100 110 index_paths_.add(jn(repo.path, f.path))
101 111 for dir in dirs:
102 112 for f in files:
103 113 index_paths_.add(jn(repo.path, f.path))
104 114
105 115 except RepositoryError, e:
106 116 log.debug(traceback.format_exc())
107 117 pass
108 118 return index_paths_
109 119
110 120 def get_node(self, repo, path):
111 121 n_path = path[len(repo.path) + 1:]
112 122 node = repo.get_changeset().get_node(n_path)
113 123 return node
114 124
115 125 def get_node_mtime(self, node):
116 126 return mktime(node.last_changeset.date.timetuple())
117 127
118 128 def add_doc(self, writer, path, repo):
119 129 """Adding doc to writer this function itself fetches data from
120 130 the instance of vcs backend"""
121 131 node = self.get_node(repo, path)
122 132
123 133 #we just index the content of chosen files, and skip binary files
124 134 if node.extension in INDEX_EXTENSIONS and not node.is_binary:
125 135
126 136 u_content = node.content
127 137 if not isinstance(u_content, unicode):
128 138 log.warning(' >> %s Could not get this content as unicode '
129 139 'replacing with empty content', path)
130 140 u_content = u''
131 141 else:
132 142 log.debug(' >> %s [WITH CONTENT]' % path)
133 143
134 144 else:
135 145 log.debug(' >> %s' % path)
136 146 #just index the file name without its content
137 147 u_content = u''
138 148
139 149 writer.add_document(owner=unicode(repo.contact),
140 150 repository=safe_unicode(repo.name),
141 151 path=safe_unicode(path),
142 152 content=u_content,
143 153 modtime=self.get_node_mtime(node),
144 154 extension=node.extension)
145 155
146 156
147 157 def build_index(self):
148 158 if os.path.exists(self.index_location):
149 159 log.debug('removing previous index')
150 160 rmtree(self.index_location)
151 161
152 162 if not os.path.exists(self.index_location):
153 163 os.mkdir(self.index_location)
154 164
155 165 idx = create_in(self.index_location, SCHEMA, indexname=IDX_NAME)
156 166 writer = idx.writer()
157 print self.repo_paths.values()
158 for cnt, repo in enumerate(self.repo_paths.values()):
167
168 for repo in self.repo_paths.values():
159 169 log.debug('building index @ %s' % repo.path)
160 170
161 171 for idx_path in self.get_paths(repo):
162 172 self.add_doc(writer, idx_path, repo)
163 173
164 174 log.debug('>> COMMITTING CHANGES <<')
165 175 writer.commit(merge=True)
166 176 log.debug('>>> FINISHED BUILDING INDEX <<<')
167 177
168 178
169 179 def update_index(self):
170 180 log.debug('STARTING INCREMENTAL INDEXING UPDATE')
171 181
172 182 idx = open_dir(self.index_location, indexname=self.indexname)
173 183 # The set of all paths in the index
174 184 indexed_paths = set()
175 185 # The set of all paths we need to re-index
176 186 to_index = set()
177 187
178 188 reader = idx.reader()
179 189 writer = idx.writer()
180 190
181 191 # Loop over the stored fields in the index
182 192 for fields in reader.all_stored_fields():
183 193 indexed_path = fields['path']
184 194 indexed_paths.add(indexed_path)
185 195
186 196 repo = self.repo_paths[fields['repository']]
187 197
188 198 try:
189 199 node = self.get_node(repo, indexed_path)
190 200 except ChangesetError:
191 201 # This file was deleted since it was indexed
192 202 log.debug('removing from index %s' % indexed_path)
193 203 writer.delete_by_term('path', indexed_path)
194 204
195 205 else:
196 206 # Check if this file was changed since it was indexed
197 207 indexed_time = fields['modtime']
198 208 mtime = self.get_node_mtime(node)
199 209 if mtime > indexed_time:
200 210 # The file has changed, delete it and add it to the list of
201 211 # files to reindex
202 212 log.debug('adding to reindex list %s' % indexed_path)
203 213 writer.delete_by_term('path', indexed_path)
204 214 to_index.add(indexed_path)
205 215
206 216 # Loop over the files in the filesystem
207 217 # Assume we have a function that gathers the filenames of the
208 218 # documents to be indexed
209 219 for repo in self.repo_paths.values():
210 220 for path in self.get_paths(repo):
211 221 if path in to_index or path not in indexed_paths:
212 222 # This is either a file that's changed, or a new file
213 223 # that wasn't indexed before. So index it!
214 224 self.add_doc(writer, path, repo)
215 225 log.debug('re indexing %s' % path)
216 226
217 227 log.debug('>> COMMITTING CHANGES <<')
218 228 writer.commit(merge=True)
219 229 log.debug('>>> FINISHED REBUILDING INDEX <<<')
220 230
221 231 def run(self, full_index=False):
222 232 """Run daemon"""
223 233 if full_index or self.initial:
224 234 self.build_index()
225 235 else:
226 236 self.update_index()