##// END OF EJS Templates
diffs: introducing diff menu for whitespace toggle and context changes
diffs: introducing diff menu for whitespace toggle and context changes

File last commit:

r2949:9a63987a default
r3134:0c8f7d31 default
Show More
indexing.rst
275 lines | 8.5 KiB | text/x-rst | RstLexer

Full-text Search

By default |RC| is configured to use Whoosh to index |repos| and provide full-text search.

|RCE| also provides support for Elasticsearch as a backend for scalable search. See :ref:`enable-elasticsearch` for details.

Indexing

To run the indexer you need to use an |authtoken| with admin rights to all |repos|.

To index new content added, you have the option to set the indexer up in a number of ways, for example:

  • Call the indexer via a cron job. We recommend running this nightly, unless you need everything indexed immediately.
  • Set the indexer to infinitely loop and reindex as soon as it has run its cycle.
  • Hook the indexer up with your CI server to reindex after each push.

The indexer works by indexing new commits added since the last run. If you wish to build a brand new index from scratch each time, use the force option in the configuration file.

Important

You need to have |RCT| installed, see :ref:`install-tools`. Since |RCE| 3.5.0 they are installed by default.

To set up indexing, use the following steps:

  1. :ref:`config-rhoderc`, if running tools remotely.
  2. :ref:`run-index`
  3. :ref:`set-index`
  4. :ref:`advanced-indexing`

Configure the .rhoderc File

|RCT| uses the :file:`/home/{user}/.rhoderc` file for connection details to |RCM| instances. If this file is not automatically created, you can configure it using the following example. You need to configure the details for each instance you want to index.

# Check the instance details
# of the instance you want to index
$ rccontrol status

 - NAME: enterprise-1
 - STATUS: RUNNING
 - TYPE: Momentum
 - VERSION: 1.5.0
 - URL: http://127.0.0.1:10000

To get your API Token, on the |RCM| interface go to :menuselection:`username --> My Account --> Auth tokens`

# Configure .rhoderc with matching details
# This allows the indexer to connect to the instance
[instance:enterprise-1]
api_host = http://127.0.0.1:10000
api_key = <auth token goes here>
repo_dir = /home/<username>/repos

Run the Indexer

Run the indexer using the following command, and specify the instance you want to index:

# From inside a virtualevv
(venv)$ rhodecode-index --instance-name=enterprise-1

# Using default installation
$ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
    --instance-name=enterprise-1

# Using a custom mapping file
$ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
    --instance-name=enterprise-1 \
    --mapping=/home/user/.rccontrol/enterprise-1/mapping.ini

Note

In case of often indexing the index may become fragmented. Most often a result of that is error about too many open files. To fix this indexer needs to be executed with --optimize flag. E.g rhodecode-index --instance-name=enterprise-1 --optimize This should be executed regularly, once a week is recommended.

Schedule the Indexer

To schedule the indexer, configure the crontab file to run the indexer inside your |RCT| virtualenv using the following steps.

  1. Open the crontab file, using crontab -e.
  2. Add the indexer to the crontab, and schedule it to run as regularly as you wish.
  3. Save the file.
$ crontab -e

# The virtualenv can be called using its full path, so for example you can
# put this example into the crontab

# Run the indexer daily at 4am using the default mapping settings
* 4 * * * /home/ubuntu/.virtualenv/rhodecode-venv/bin/rhodecode-index \
--instance-name=enterprise-1

# Run the indexer every Sunday at 3am using default mapping
* 3 * * 0 /home/ubuntu/.virtualenv/rhodecode-venv/bin/rhodecode-index \
--instance-name=enterprise-1

# Run the indexer every 15 minutes
# using a specially configured mapping file
*/15 * * * * ~/.rccontrol/enterprise-4/profile/bin/rhodecode-index \
   --instance-name=enterprise-4 \
   --mapping=/home/user/.rccontrol/enterprise-4/mapping.ini

Advanced Indexing

|RCT| indexes based on the :file:`mapping.ini` file. To configure your index, you can specify different options in this file. The default location is:

  • :file:`/home/{user}/.rccontrol/{instance-id}/mapping.ini`, using default |RCT|.
  • :file:`~/venv/lib/python2.7/site-packages/rhodecode_tools/templates/mapping.ini`, when using virtualenv.

Note

If you need to create the :file:`mapping.ini` file, use the |RCT| rhodecode-index --create-mapping path/to/file API call. For details, see the :ref:`tools-cli` section.

The indexer runs in a random order to prevent a failing |repo| from stopping a build. To configure different indexing scenarios, set the following options inside the :file:`mapping.ini` and specify the altered file using the --mapping option.

  • index_files : Index the specified file types.
  • skip_files : Do not index the specified file types.
  • index_files_content : Index the content of the specified file types.
  • skip_files_content : Do not index the content of the specified files.
  • force : Create a fresh index on each run.
  • max_filesize : Files larger than the set size will not be indexed.
  • commit_parse_limit : Set the batch size when indexing commit messages. Set to a lower number to lessen memory load.
  • repo_limit : Set the maximum number or |repos| indexed per run.
  • [INCLUDE] : Set |repos| you want indexed. This takes precedent over [EXCLUDE].
  • [EXCLUDE] : Set |repos| you do not want indexed. Exclude can be used to not index branches, forks, or log |repos|.

At the end of the file you can specify conditions for specific |repos| that will override the default values. To configure your indexer, use the following example :file:`mapping.ini` file.

[__DEFAULT__]
# default patterns for indexing files and content of files.
# Binary files are skipped by default.

# Index python and markdown files
index_files = *.py, *.md

# Do not index these file types
skip_files = *.svg, *.log, *.dump, *.txt

# Index both file types and their content
index_files_content = *.cpp, *.ini, *.py

# Index file names, but not file content
skip_files_content = *.svg,

# Force rebuilding an index from scratch. Each repository will be rebuild
# from scratch with a global flag. Use local flag to rebuild single repos
force = false

# Do not index files larger than 385KB
max_filesize = 385KB

# Limit commit indexing to 500 per batch
commit_parse_limit = 500

# Limit each index run to 25 repos
repo_limit = 25

# __INCLUDE__ is more important that __EXCLUDE__.

[__INCLUDE__]
# Include all repos with these names

docs/* = 1
lib/* = 1

[__EXCLUDE__]
# Do not include the following repo in index

dev-docs/* = 1
legacy-repos/* = 1
*-dev/* = 1

# Each repo that needs special indexing is a separate section below.
# In each section set the options to override the global configuration
# parameters above.
# If special settings are not configured, the global configuration values
# above are inherited. If no special repositories are
# defined here RhodeCode will use the API to ask for all repositories

# For this repo use different settings
[special-repo]
commit_parse_limit = 20,
skip_files = *.idea, *.xml,

# For another repo use different settings
[another-special-repo]
index_files = *,
max_filesize = 800MB
commit_parse_limit = 20000

Enabling Elasticsearch

  1. Open the :file:`rhodecode.ini` file for the instance you wish to edit. The default location is :file:`home/{user}/.rccontrol/{instance-id}/rhodecode.ini`
  2. Find the search configuration section:
###################################
## SEARCH INDEXING CONFIGURATION ##
###################################

search.module = rhodecode.lib.index.whoosh
search.location = %(here)s/data/index

and change it to:

search.module = rc_elasticsearch
search.location = http://localhost:9200/

where search.location points to the elasticsearch server.