fixes #90 + docs update
marcink -
r894:1fed3c91 beta
@@ -1,294 +1,303 b''
.. _setup:

Setup
=====


Setting up the application
--------------------------

First you'll need to create a RhodeCode config file. Run the following command
to do this

::

    paster make-config RhodeCode production.ini

- This will create a `production.ini` config file inside the current directory.
  The config contains various settings for RhodeCode, e.g. proxy port,
  email settings, usage of static files, cache, celery settings and logging
  (a small excerpt is shown below).


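For example, the email-related settings can be adjusted right after the file is
created. The exact option names can differ between RhodeCode versions, so treat
the keys below as an assumption and double check them against the generated
file::

    email_to = admin@example.com
    error_email_from = rhodecode-noreply@example.com
    smtp_server = mail.example.com
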
Next we need to create the database.

::

    paster setup-app production.ini

- This command will create all needed tables and an admin account.
  When asked for a path you can either use a new location or one with
  already existing repositories. RhodeCode will simply add all newly found
  repositories to its database. Also make sure you specify the correct path
  to the repositories.
- Remember that the given path for mercurial_ repositories must be write
  accessible for the application. This is very important: the RhodeCode web
  interface will work even without such access, but pushes will eventually
  fail with permission denied errors.

You are ready to use RhodeCode. To run it simply execute

::

    paster serve production.ini

- This command runs the RhodeCode server; the app should be available at
  127.0.0.1:5000. This IP and port are configurable via the production.ini
  file created in the previous step (see the example below).
- Use the admin account you created to log in.
- The default permission on each repository is read, and the owner is admin.
  Remember to update these if needed. In the admin panel you can toggle ldap,
  anonymous and permissions settings, as well as edit more advanced options
  on users and repositories.


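If you need the application to listen on a different address or port, adjust
the server section in production.ini. This is the standard Paste server block;
the values below are only an example::

    [server:main]
    host = 0.0.0.0
    port = 5000

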
Setting up Whoosh full text search
----------------------------------

Starting from version 1.1 the whoosh index can be built using a paster command.
You have to specify the config file that stores the location of the index, and
the location of the repositories (`--repo-location`). Starting from version 1.2
it is also possible to specify a comma separated list of repositories
(`--index-only`) to build the index only for the chosen repositories, skipping
any others found in the repositories location.

It is also possible to pass `-f` to the options to enable a full index rebuild.
Without it, indexing will always run in incremental mode.

::

    paster make-index production.ini --repo-location=<location for repos>

For a full index rebuild you can use::

    paster make-index production.ini -f --repo-location=<location for repos>

Building the index just for chosen repositories is possible with a command like::

    paster make-index production.ini --repo-location=<location for repos> --index-only=vcs,rhodecode

In order to do periodical index builds and keep your index always up to date,
it's recommended to add a crontab entry for incremental indexing.
An example entry might look like this

::

    /path/to/python/bin/paster make-index /path/to/rhodecode/production.ini --repo-location=<location for repos>

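Note that the line above is only the command itself; an actual crontab entry
also needs the schedule fields. A sketch of a complete entry running the
incremental indexing nightly (the schedule is just an example)::

    0 4 * * * /path/to/python/bin/paster make-index /path/to/rhodecode/production.ini --repo-location=<location for repos>
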
When using incremental (default) mode, whoosh will check the last modification
date of each file and add it for reindexing if a newer version is available.
The indexing daemon also checks for removed files and removes them from the
index.

Sometimes you might want to rebuild the index from scratch. You can do that
using the `-f` flag passed to the paster command, or in the admin panel you
can check the `build from scratch` flag.


Setting up LDAP support
-----------------------

RhodeCode starting from version 1.1 supports ldap authentication. In order
to use ldap, you have to install the python-ldap package. This package is
available via pypi, so you can install it by running

::

    easy_install python-ldap

or

::

    pip install python-ldap

.. note::
   python-ldap requires certain libraries on your system, so before installing
   it check that you have at least the `openldap` and `sasl` libraries.

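On Debian/Ubuntu, for example, the required headers are typically provided by
packages along these lines (the package names are an assumption, check your
distribution)::

    apt-get install libldap2-dev libsasl2-dev libssl-dev
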
ldap settings are located in the admin->ldap section.

Here's a typical ldap setup::

    Enable ldap  = checked                 # controls if ldap access is enabled
    Host         = host.domain.org         # actual ldap server to connect to
    Port         = 389, or 636 for ldaps   # ldap server port
    Enable LDAPS = unchecked               # enable/disable ldaps
    Account      = <account>               # account used to access the ldap server (if required)
    Password     = <password>              # password for the ldap server (if required)
    Base DN      = uid=%(user)s,CN=users,DC=host,DC=domain,DC=org


`Account` and `Password` are optional and are used for two-phase ldap
authentication: they are the credentials used to access your ldap if it
doesn't support anonymous search/user lookups.

Base DN must contain the %(user)s template. It is a placeholder where the uid
used to log in is substituted, which allows admins to specify a non-standard
schema for the uid variable.

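For example, with the Base DN shown above a (hypothetical) login of `jsmith`
would be looked up as::

    uid=jsmith,CN=users,DC=host,DC=domain,DC=org
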
If all data are entered correctly and `python-ldap` is properly installed,
users should be able to access RhodeCode with their ldap accounts. When
logging in for the first time, a special ldap account is created inside
RhodeCode, so you keep control over permissions even for ldap users. If such
a user already exists in the RhodeCode database, an ldap user with the same
username will not be able to access RhodeCode.

If you have problems with ldap access and believe you entered the correct
information, check the RhodeCode logs; any error messages sent from ldap
will be saved there.



Setting Up Celery
-----------------

Since version 1.1 celery is configured via the rhodecode ini configuration
files. Simply set use_celery=true in the ini file, then add / change the
configuration variables inside the ini file.

Remember that the ini file uses the format with '.' and not with '_' as
celery does, so for example setting `BROKER_HOST` in celery means setting
`broker.host` in the config file.

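As an illustration, assuming the standard celery `BROKER_*` settings mapped as
described above, the relevant part of the ini file might look like this (host
and credentials are placeholders for your rabbitmq_ setup)::

    use_celery = true
    broker.host = localhost
    broker.port = 5672
    broker.user = <user>
    broker.password = <password>
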
In order to start using celery, run::

    paster celeryd <configfile.ini>


.. note::
   Make sure you run this command from the same virtualenv, and with the same
   user that rhodecode runs as.


Nginx virtual host example
--------------------------

Sample config for nginx using proxy::

    server {
        listen 80;
        server_name hg.myserver.com;
        access_log /var/log/nginx/rhodecode.access.log;
        error_log /var/log/nginx/rhodecode.error.log;
        location / {
            root /var/www/rhodecode/rhodecode/public/;
            if (!-f $request_filename){
                proxy_pass http://127.0.0.1:5000;
            }
            #this is important if You want to use https !!!
            proxy_set_header X-Url-Scheme $scheme;
            include /etc/nginx/proxy.conf;
        }
    }

Here's the proxy.conf. It's tuned so it'll not time out on long
pushes or on large pushes::

    proxy_redirect              off;
    proxy_set_header            Host $host;
    proxy_set_header            X-Host $http_host;
    proxy_set_header            X-Real-IP $remote_addr;
    proxy_set_header            X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header            Proxy-host $proxy_host;
    client_max_body_size        400m;
    client_body_buffer_size     128k;
    proxy_buffering             off;
    proxy_connect_timeout       3600;
    proxy_send_timeout          3600;
    proxy_read_timeout          3600;
    proxy_buffer_size           8k;
    proxy_buffers               8 32k;
    proxy_busy_buffers_size     64k;
    proxy_temp_file_write_size  64k;

Also, when using the root path setup with nginx as above, you might set static
files to false in the production.ini file::

    [app:main]
    use = egg:rhodecode
    full_stack = true
    static_files = false
    lang=en
    cache_dir = %(here)s/data

so that the static files are not served by the application, which improves speed.


Apache virtual host example
---------------------------

Sample config for apache using proxy::

    <VirtualHost *:80>
        ServerName hg.myserver.com
        ServerAlias hg.myserver.com

        <Proxy *>
            Order allow,deny
            Allow from all
        </Proxy>

        #important !
        #Directive to properly generate url (clone url) for pylons
        ProxyPreserveHost On

        #rhodecode instance
        ProxyPass / http://127.0.0.1:5000/
        ProxyPassReverse / http://127.0.0.1:5000/

        #to enable https use line below
        #SetEnvIf X-Url-Scheme https HTTPS=1

    </VirtualHost>


Additional tutorial:
http://wiki.pylonshq.com/display/pylonscookbook/Apache+as+a+reverse+proxy+for+Pylons


Apache's example FCGI config
----------------------------

TODO !

Other configuration files
-------------------------

Some example init.d scripts can be found here, for debian and gentoo:

https://rhodecode.org/rhodecode/files/tip/init.d


Troubleshooting
---------------

- Missing static files?

  - Make sure either to set `static_files = true` in the .ini file, or to
    double check the root path of your http setup. It should point to,
    for example:
    /home/my-virtual-python/lib/python2.6/site-packages/rhodecode/public

- Can't install celery/rabbitmq?

  - Don't worry, RhodeCode works without them too. No extra setup is required.

- Long lasting push timeouts?

  - Make sure you set longer timeouts in your proxy/fcgi settings; such
    timeouts are caused by the http server and not by RhodeCode.

- Large push timeouts?

  - Make sure you set a proper max_body_size for the http server.



.. _virtualenv: http://pypi.python.org/pypi/virtualenv
.. _python: http://www.python.org/
.. _mercurial: http://mercurial.selenic.com/
.. _celery: http://celeryproject.org/
.. _rabbitmq: http://www.rabbitmq.com/
@@ -1,193 +1,203 b''
import os
import sys
import traceback
from os.path import dirname as dn, join as jn

#to get the rhodecode import
sys.path.append(dn(dn(dn(os.path.realpath(__file__)))))

from string import strip

from rhodecode.model import init_model
from rhodecode.model.scm import ScmModel
from rhodecode.config.environment import load_environment
from rhodecode.lib.utils import BasePasterCommand, Command, add_cache

from shutil import rmtree
from webhelpers.html.builder import escape
from vcs.utils.lazy import LazyProperty

from sqlalchemy import engine_from_config

from whoosh.analysis import RegexTokenizer, LowercaseFilter, StopFilter
from whoosh.fields import TEXT, ID, STORED, Schema, FieldType
from whoosh.index import create_in, open_dir
from whoosh.formats import Characters
from whoosh.highlight import highlight, SimpleFragmenter, HtmlFormatter


#EXTENSIONS WE WANT TO INDEX CONTENT OF
INDEX_EXTENSIONS = ['action', 'adp', 'ashx', 'asmx', 'aspx', 'asx', 'axd', 'c',
                    'cfg', 'cfm', 'cpp', 'cs', 'css', 'diff', 'do', 'el', 'erl',
                    'h', 'htm', 'html', 'ini', 'java', 'js', 'jsp', 'jspx', 'lisp',
                    'lua', 'm', 'mako', 'ml', 'pas', 'patch', 'php', 'php3',
                    'php4', 'phtml', 'pm', 'py', 'rb', 'rst', 's', 'sh', 'sql',
                    'tpl', 'txt', 'vim', 'wss', 'xhtml', 'xml', 'xsl', 'xslt',
                    'yaws']

#CUSTOM ANALYZER wordsplit + lowercase filter
ANALYZER = RegexTokenizer(expression=r"\w+") | LowercaseFilter()


#INDEX SCHEMA DEFINITION
SCHEMA = Schema(owner=TEXT(),
                repository=TEXT(stored=True),
                path=TEXT(stored=True),
                content=FieldType(format=Characters(ANALYZER),
                                  scorable=True, stored=True),
                modtime=STORED(), extension=TEXT(stored=True))


IDX_NAME = 'HG_INDEX'
FORMATTER = HtmlFormatter('span', between='\n<span class="break">...</span>\n')
FRAGMENTER = SimpleFragmenter(200)


class MakeIndex(BasePasterCommand):

    max_args = 1
    min_args = 1

    usage = "CONFIG_FILE"
    summary = "Creates index for full text search given configuration file"
    group_name = "RhodeCode"
    takes_config_file = -1
    parser = Command.standard_parser(verbose=True)

    def command(self):

        from pylons import config
        add_cache(config)
        engine = engine_from_config(config, 'sqlalchemy.db1.')
        init_model(engine)

        index_location = config['index_dir']
        repo_location = self.options.repo_location
        #--index-only is optional, so only split the list when it was given
        if self.options.repo_list:
            repo_list = map(strip, self.options.repo_list.split(','))
        else:
            repo_list = None

        #======================================================================
        # WHOOSH DAEMON
        #======================================================================
        from rhodecode.lib.pidlock import LockHeld, DaemonLock
        from rhodecode.lib.indexers.daemon import WhooshIndexingDaemon
        try:
            l = DaemonLock()
            WhooshIndexingDaemon(index_location=index_location,
                                 repo_location=repo_location,
                                 repo_list=repo_list)\
                .run(full_index=self.options.full_index)
            l.release()
        except LockHeld:
            sys.exit(1)

    def update_parser(self):
        self.parser.add_option('--repo-location',
                               action='store',
                               dest='repo_location',
                               help="Specifies repositories location to index REQUIRED",
                               )
        self.parser.add_option('--index-only',
                               action='store',
                               dest='repo_list',
                               help="Specifies a comma separated list of repositories "
                                    "to build index on OPTIONAL",
                               )
        self.parser.add_option('-f',
                               action='store_true',
                               dest='full_index',
                               help="Specifies that index should be made full i.e."
                                    " destroy old and build from scratch",
                               default=False)

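
# Example invocation of the new --index-only option (illustrative only,
# repository names and paths are placeholders):
#
#   paster make-index production.ini --repo-location=/srv/repos --index-only=vcs,rhodecode
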
class ResultWrapper(object):
    def __init__(self, search_type, searcher, matcher, highlight_items):
        self.search_type = search_type
        self.searcher = searcher
        self.matcher = matcher
        self.highlight_items = highlight_items
        self.fragment_size = 200 / 2

    @LazyProperty
    def doc_ids(self):
        docs_id = []
        while self.matcher.is_active():
            docnum = self.matcher.id()
            chunks = [offsets for offsets in self.get_chunks()]
            docs_id.append([docnum, chunks])
            self.matcher.next()
        return docs_id

    def __str__(self):
        return '<%s at %s>' % (self.__class__.__name__, len(self.doc_ids))

    def __repr__(self):
        return self.__str__()

    def __len__(self):
        return len(self.doc_ids)

    def __iter__(self):
        """
        Allows iteration over results, and lazily generates the content.

        *Requires* implementation of ``__getitem__`` method.
        """
        for docid in self.doc_ids:
            yield self.get_full_content(docid)

    def __getslice__(self, i, j):
        """
        Slicing of resultWrapper
        """
        slice = []
        for docid in self.doc_ids[i:j]:
            slice.append(self.get_full_content(docid))
        return slice

    def get_full_content(self, docid):
        res = self.searcher.stored_fields(docid[0])
        f_path = res['path'][res['path'].find(res['repository']) \
                             + len(res['repository']):].lstrip('/')

        content_short = self.get_short_content(res, docid[1])
        res.update({'content_short': content_short,
                    'content_short_hl': self.highlight(content_short),
                    'f_path': f_path})

        return res

    def get_short_content(self, res, chunks):

        return ''.join([res['content'][chunk[0]:chunk[1]] for chunk in chunks])

    def get_chunks(self):
        """
        Smart function that implements chunking of the content,
        but does not overlap chunks so it doesn't highlight the same
        close occurrences twice.
        """
        memory = [(0, 0)]
        for span in self.matcher.spans():
            start = span.startchar or 0
            end = span.endchar or 0
            start_offseted = max(0, start - self.fragment_size)
            end_offseted = end + self.fragment_size

            if start_offseted < memory[-1][1]:
                start_offseted = memory[-1][1]
            memory.append((start_offseted, end_offseted,))
            yield (start_offseted, end_offseted,)

    def highlight(self, content, top=5):
        if self.search_type != 'content':
            return ''
        hl = highlight(escape(content),
                       self.highlight_items,
                       analyzer=ANALYZER,
                       fragmenter=FRAGMENTER,
                       formatter=FORMATTER,
                       top=top)
        return hl
@@ -1,226 +1,236 b''
# -*- coding: utf-8 -*-
"""
    rhodecode.lib.indexers.daemon
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    A daemon for full text indexing of repositories using whoosh

    :created_on: Jan 26, 2010
    :author: marcink
    :copyright: (C) 2009-2010 Marcin Kuzminski <marcin@python-works.com>
    :license: GPLv3, see COPYING for more details.
"""
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; version 2
# of the License or (at your option) any later version of the license.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
# MA  02110-1301, USA.

import sys
import os
import traceback
from os.path import dirname as dn
from os.path import join as jn

#to get the rhodecode import
project_path = dn(dn(dn(dn(os.path.realpath(__file__)))))
sys.path.append(project_path)


from rhodecode.model.scm import ScmModel
from rhodecode.lib.helpers import safe_unicode
from whoosh.index import create_in, open_dir
from shutil import rmtree
from rhodecode.lib.indexers import INDEX_EXTENSIONS, SCHEMA, IDX_NAME

from time import mktime
from vcs.exceptions import ChangesetError, RepositoryError

import logging

log = logging.getLogger('whooshIndexer')
# create logger
log.setLevel(logging.DEBUG)
log.propagate = False
# create console handler and set level to debug
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)

# create formatter
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")

# add formatter to ch
ch.setFormatter(formatter)

# add ch to logger
log.addHandler(ch)

class WhooshIndexingDaemon(object):
    """
    Daemon for atomic indexing jobs
    """

    def __init__(self, indexname='HG_INDEX', index_location=None,
                 repo_location=None, sa=None, repo_list=None):
        self.indexname = indexname

        self.index_location = index_location
        if not index_location:
            raise Exception('You have to provide index location')

        self.repo_location = repo_location
        if not repo_location:
            raise Exception('You have to provide repositories location')

        self.repo_paths = ScmModel(sa).repo_scan(self.repo_location, None)

        if repo_list:
            #an optional whitelist of repositories was given, index only those
            filtered_repo_paths = {}
            for repo_name, repo in self.repo_paths.items():
                if repo_name in repo_list:
                    filtered_repo_paths[repo.name] = repo

            self.repo_paths = filtered_repo_paths

        self.initial = False
        if not os.path.isdir(self.index_location):
            os.makedirs(self.index_location)
            log.info('Cannot run incremental index since it does not'
                     ' yet exist - running full build')
            self.initial = True

    def get_paths(self, repo):
        """recursive walk in root dir and return a set of all paths in that
        dir, based on repository walk function
        """
        index_paths_ = set()
        try:
            for topnode, dirs, files in repo.walk('/', 'tip'):
                for f in files:
                    index_paths_.add(jn(repo.path, f.path))
                for dir in dirs:
                    for f in files:
                        index_paths_.add(jn(repo.path, f.path))

        except RepositoryError, e:
            log.debug(traceback.format_exc())
            pass
        return index_paths_

    def get_node(self, repo, path):
        n_path = path[len(repo.path) + 1:]
        node = repo.get_changeset().get_node(n_path)
        return node

    def get_node_mtime(self, node):
        return mktime(node.last_changeset.date.timetuple())

    def add_doc(self, writer, path, repo):
        """Adding doc to writer, this function itself fetches data from
        the instance of vcs backend"""
        node = self.get_node(repo, path)

        #we just index the content of chosen files, and skip binary files
        if node.extension in INDEX_EXTENSIONS and not node.is_binary:

            u_content = node.content
            if not isinstance(u_content, unicode):
                log.warning('  >> %s Could not get this content as unicode '
                            'replacing with empty content', path)
                u_content = u''
            else:
                log.debug('  >> %s [WITH CONTENT]' % path)

        else:
            log.debug('  >> %s' % path)
            #just index file name without it's content
            u_content = u''

        writer.add_document(owner=unicode(repo.contact),
                            repository=safe_unicode(repo.name),
                            path=safe_unicode(path),
                            content=u_content,
                            modtime=self.get_node_mtime(node),
                            extension=node.extension)

    def build_index(self):
        if os.path.exists(self.index_location):
            log.debug('removing previous index')
            rmtree(self.index_location)

        if not os.path.exists(self.index_location):
            os.mkdir(self.index_location)

        idx = create_in(self.index_location, SCHEMA, indexname=IDX_NAME)
        writer = idx.writer()

        for repo in self.repo_paths.values():
            log.debug('building index @ %s' % repo.path)

            for idx_path in self.get_paths(repo):
                self.add_doc(writer, idx_path, repo)

        log.debug('>> COMMITING CHANGES <<')
        writer.commit(merge=True)
        log.debug('>>> FINISHED BUILDING INDEX <<<')


    def update_index(self):
        log.debug('STARTING INCREMENTAL INDEXING UPDATE')

        idx = open_dir(self.index_location, indexname=self.indexname)
        # The set of all paths in the index
        indexed_paths = set()
        # The set of all paths we need to re-index
        to_index = set()

        reader = idx.reader()
        writer = idx.writer()

        # Loop over the stored fields in the index
        for fields in reader.all_stored_fields():
            indexed_path = fields['path']
            indexed_paths.add(indexed_path)

            repo = self.repo_paths[fields['repository']]

            try:
                node = self.get_node(repo, indexed_path)
            except ChangesetError:
                # This file was deleted since it was indexed
                log.debug('removing from index %s' % indexed_path)
                writer.delete_by_term('path', indexed_path)

            else:
                # Check if this file was changed since it was indexed
                indexed_time = fields['modtime']
                mtime = self.get_node_mtime(node)
                if mtime > indexed_time:
                    # The file has changed, delete it and add it to the list of
                    # files to reindex
                    log.debug('adding to reindex list %s' % indexed_path)
                    writer.delete_by_term('path', indexed_path)
                    to_index.add(indexed_path)

        # Loop over the files in the filesystem
        # Assume we have a function that gathers the filenames of the
        # documents to be indexed
        for repo in self.repo_paths.values():
            for path in self.get_paths(repo):
                if path in to_index or path not in indexed_paths:
                    # This is either a file that's changed, or a new file
                    # that wasn't indexed before. So index it!
                    self.add_doc(writer, path, repo)
                    log.debug('re indexing %s' % path)

        log.debug('>> COMMITING CHANGES <<')
        writer.commit(merge=True)
        log.debug('>>> FINISHED REBUILDING INDEX <<<')

    def run(self, full_index=False):
        """Run daemon"""
        if full_index or self.initial:
            self.build_index()
        else:
            self.update_index()