
.. _indexing-ref:

Full-text Search
----------------

By default |RC| is configured to use `Whoosh`_ to index |repos| and
provide full-text search.

|RCE| also provides support for `Elasticsearch`_ as a backend for scalable
search. See :ref:`enable-elasticsearch` for details.

Indexing
^^^^^^^^

To run the indexer, you need to use an |authtoken| with admin rights to all
|repos|.

To index newly added content, you can set up the indexer in several ways,
for example:

* Call the indexer via a cron job. We recommend running this nightly,
  unless you need everything indexed immediately.
* Set the indexer to run in an infinite loop, so that it reindexes as soon
  as it has finished a cycle.
* Hook the indexer up to your CI server to reindex after each push.
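
The looping approach can be sketched as a small Bash function. This is a
minimal sketch, not part of |RCT|: ``reindex_loop`` and the ``INDEXER``
override are hypothetical names, and the function assumes ``rhodecode-index``
is on your ``PATH`` (for example, inside an activated virtualenv).

.. code-block:: bash

    # Hypothetical helper: rerun the indexer as soon as each cycle finishes.
    # Arguments: instance name, pause in seconds between cycles (default 60),
    # and maximum number of runs (default 0, meaning loop forever).
    # INDEXER can override the command; it defaults to rhodecode-index.
    reindex_loop() {
        local instance="$1"
        local pause="${2:-60}"
        local max_runs="${3:-0}"
        local indexer="${INDEXER:-rhodecode-index}"
        local runs=0

        while :; do
            # Report a failed run, but keep looping
            "$indexer" --instance-name="$instance" \
                || echo "indexer exited with status $?" >&2
            runs=$((runs + 1))
            if [ "$max_runs" -gt 0 ] && [ "$runs" -ge "$max_runs" ]; then
                break
            fi
            sleep "$pause"
        done
    }

For example, ``reindex_loop enterprise-1 300`` reindexes ``enterprise-1``
with a five-minute pause between cycles until you stop it. The pause keeps a
repeatedly failing run from restarting in a tight loop.
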

The indexer works by indexing new commits added since the last run. If you
want to build a brand new index from scratch each time, use the ``force``
option in the configuration file (see :ref:`advanced-indexing`).

.. important::

   You need to have |RCT| installed; see :ref:`install-tools`. Since |RCE|
   3.5.0, |RCT| is installed by default.

To set up indexing, use the following steps:

1. :ref:`config-rhoderc`, if running tools remotely.
2. :ref:`run-index`
3. :ref:`set-index`
4. :ref:`advanced-indexing`

.. _config-rhoderc:

Configure the ``.rhoderc`` File
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

|RCT| uses the :file:`/home/{user}/.rhoderc` file for connection details
to |RCM| instances. If this file was not created automatically, you can
configure it using the following example. You need to configure the
details for each instance you want to index.

.. code-block:: bash

    # Check the details of the instance you want to index
    $ rccontrol status

    - NAME: enterprise-1
    - STATUS: RUNNING
    - TYPE: Momentum
    - VERSION: 1.5.0
    - URL: http://127.0.0.1:10000
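
If you script the ``.rhoderc`` setup, you can take the ``api_host`` value
from this output. The function below is a hypothetical helper, not part of
|RCT|, and it assumes the ``- NAME:`` / ``- URL:`` layout shown above; the
output format may differ between versions.

.. code-block:: bash

    # Hypothetical helper: read `rccontrol status` output on stdin and print
    # the URL of the named instance (for use as api_host in .rhoderc).
    # Assumes each instance block has "- NAME: <name>" before "- URL: <url>".
    instance_url() {
        awk -v name="$1" '
            /^[ \t]*- NAME:/                   { current = $3 }
            /^[ \t]*- URL:/ && current == name { print $3 }
        '
    }

For example: ``rccontrol status | instance_url enterprise-1``.
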

To get your API token, go to
:menuselection:`username --> My Account --> Auth tokens` in the |RCM|
interface.

.. code-block:: ini

    # Configure .rhoderc with matching details
    # This allows the indexer to connect to the instance
    [instance:enterprise-1]
    api_host = http://127.0.0.1:10000
    api_key = <auth token goes here>
    repo_dir = /home/<username>/repos
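
Before running the indexer, you can confirm that ``.rhoderc`` contains a
section for the instance. This is a minimal sketch: ``rhoderc_has_instance``
is a hypothetical helper that only checks for the ``[instance:<name>]``
header shown above, not whether the token is valid.

.. code-block:: bash

    # Hypothetical helper: succeed if the .rhoderc file has a section for the
    # given instance name. The file defaults to ~/.rhoderc.
    rhoderc_has_instance() {
        local name="$1"
        local rc_file="${2:-$HOME/.rhoderc}"
        # -F: brackets are literal, -x: the whole line must match
        grep -Fqx "[instance:$name]" "$rc_file" 2>/dev/null
    }

For example:
``rhoderc_has_instance enterprise-1 || echo "add [instance:enterprise-1] to ~/.rhoderc"``.
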

.. _run-index:

Run the Indexer
^^^^^^^^^^^^^^^

Run the indexer using the following command, and specify the instance you
want to index:

.. code-block:: bash

    # From inside a virtualenv
    (venv)$ rhodecode-index --instance-name=enterprise-1

    # Using default installation
    $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
        --instance-name=enterprise-1

    # Using a custom mapping file
    $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
        --instance-name=enterprise-1 \
        --mapping=/home/user/.rccontrol/enterprise-1/mapping.ini
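
Scripts that should work with both setups can pick the executable first.
This is a minimal sketch: ``find_indexer`` is a hypothetical helper that
prefers the default installation path shown above and otherwise falls back
to a ``rhodecode-index`` found on ``PATH``.

.. code-block:: bash

    # Hypothetical helper: print the rhodecode-index executable to use for an
    # instance. Prefers the default installation path, then falls back to a
    # rhodecode-index found on PATH (e.g. an activated virtualenv).
    find_indexer() {
        local instance="$1"
        local profile_bin="$HOME/.rccontrol/$instance/profile/bin/rhodecode-index"
        if [ -x "$profile_bin" ]; then
            printf '%s\n' "$profile_bin"
        elif command -v rhodecode-index >/dev/null 2>&1; then
            command -v rhodecode-index
        else
            echo "rhodecode-index not found for instance $instance" >&2
            return 1
        fi
    }

For example:
``"$(find_indexer enterprise-1)" --instance-name=enterprise-1``.
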

.. note::

   If you index frequently, the index may become fragmented. A common
   symptom is a ``too many open files`` error. To fix this, run the indexer
   with the ``--optimize`` flag, for example
   ``rhodecode-index --instance-name=enterprise-1 --optimize``.
   We recommend doing this regularly, for example once a week.

.. _set-index:

Schedule the Indexer
^^^^^^^^^^^^^^^^^^^^

To schedule the indexer, configure the crontab file to run the indexer inside
your |RCT| virtualenv using the following steps.

1. Open the crontab file, using ``crontab -e``.
2. Add the indexer to the crontab, and schedule it to run as often as you
   need.
3. Save the file.

.. code-block:: bash

    $ crontab -e

    # The virtualenv can be called using its full path, so for example you can
    # put these entries into the crontab.
    # Each entry must stay on a single line: cron does not support "\"
    # line continuation.

    # Run the indexer daily at 4am using the default mapping settings
    0 4 * * * /home/ubuntu/.virtualenv/rhodecode-venv/bin/rhodecode-index --instance-name=enterprise-1

    # Run the indexer every Sunday at 3am using default mapping
    0 3 * * 0 /home/ubuntu/.virtualenv/rhodecode-venv/bin/rhodecode-index --instance-name=enterprise-1

    # Run the indexer every 15 minutes
    # using a specially configured mapping file
    */15 * * * * ~/.rccontrol/enterprise-4/profile/bin/rhodecode-index --instance-name=enterprise-4 --mapping=/home/user/.rccontrol/enterprise-4/mapping.ini
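
To follow the weekly ``--optimize`` recommendation in :ref:`run-index`, you
could add an entry such as the following. The schedule (Sundays at 2am) and
the virtualenv path are illustrative; adjust them to your installation.

.. code-block:: bash

    # Optimize the index every Sunday at 2am (illustrative schedule and path)
    0 2 * * 0 /home/ubuntu/.virtualenv/rhodecode-venv/bin/rhodecode-index --instance-name=enterprise-1 --optimize
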

.. _advanced-indexing:

Advanced Indexing
^^^^^^^^^^^^^^^^^

|RCT| indexes based on the :file:`mapping.ini` file. To configure your index,
you can specify different options in this file. The default location is:

* :file:`/home/{user}/.rccontrol/{instance-id}/mapping.ini`, when using the
  default |RCT| installation.
* :file:`~/venv/lib/python2.7/site-packages/rhodecode_tools/templates/mapping.ini`,
  when using ``virtualenv``.

.. note::

   If you need to create the :file:`mapping.ini` file, use the |RCT|
   ``rhodecode-index --create-mapping path/to/file`` command. For details,
   see the :ref:`tools-cli` section.

The indexer processes |repos| in a random order, so that a single failing
|repo| does not stop the whole run. To configure different indexing
scenarios, set the following options inside the :file:`mapping.ini` file and
pass the altered file to the indexer using the ``--mapping`` option.

* ``index_files`` : Index the specified file types.
* ``skip_files`` : Do not index the specified file types.
* ``index_files_content`` : Index the content of the specified file types.
* ``skip_files_content`` : Do not index the content of the specified file
  types.
* ``force`` : Create a fresh index on each run.
* ``max_filesize`` : Files larger than the set size will not be indexed.
* ``commit_parse_limit`` : Set the batch size when indexing commit messages.
  Set to a lower number to lessen memory load.
* ``repo_limit`` : Set the maximum number of |repos| indexed per run.
* ``[__INCLUDE__]`` : Set |repos| you want indexed. This takes precedence over
  ``[__EXCLUDE__]``.
* ``[__EXCLUDE__]`` : Set |repos| you do not want indexed. Exclude can be used
  to skip indexing branches, forks, or log |repos|.
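
The include/exclude precedence rule can be illustrated with a small Bash
model. This is a hypothetical sketch, not how |RCT| implements it: it assumes
shell-style glob patterns like those in the example :file:`mapping.ini`, and
it assumes that a |repo| matching neither list is indexed.

.. code-block:: bash

    # Hypothetical model of [__INCLUDE__] / [__EXCLUDE__] precedence.
    # INCLUDE_PATTERNS and EXCLUDE_PATTERNS are illustrative arrays of shell
    # globs; in a case pattern, "*" also matches "/".
    INCLUDE_PATTERNS=("docs/*" "lib/*")
    EXCLUDE_PATTERNS=("dev-docs/*" "legacy-repos/*" "*-dev/*")

    is_repo_indexed() {
        local repo="$1" pattern
        for pattern in "${INCLUDE_PATTERNS[@]}"; do
            # An include match wins, even if an exclude pattern also matches
            case "$repo" in $pattern) return 0 ;; esac
        done
        for pattern in "${EXCLUDE_PATTERNS[@]}"; do
            case "$repo" in $pattern) return 1 ;; esac
        done
        # Assumption: repos matching neither list are indexed
        return 0
    }

In this model, ``is_repo_indexed docs/foo-dev/bar`` succeeds because
``docs/*`` is an include pattern, even though ``*-dev/*`` would exclude it,
while ``lib-dev/core`` matches only the exclude pattern and is skipped.
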

At the end of the file you can specify conditions for specific |repos| that
will override the default values. To configure your indexer,
use the following example :file:`mapping.ini` file.

.. code-block:: ini

    [__DEFAULT__]
    # default patterns for indexing files and content of files.
    # Binary files are skipped by default.

    # Index python and markdown files
    index_files = *.py, *.md

    # Do not index these file types
    skip_files = *.svg, *.log, *.dump, *.txt

    # Index both file types and their content
    index_files_content = *.cpp, *.ini, *.py

    # Index file names, but not file content
    skip_files_content = *.svg

    # Force rebuilding the index from scratch. Setting this flag here rebuilds
    # every repository; set it in a repository's own section to rebuild only
    # that repository.
    force = false

    # Do not index files larger than 385KB
    max_filesize = 385KB

    # Limit commit indexing to 500 per batch
    commit_parse_limit = 500

    # Limit each index run to 25 repos
    repo_limit = 25

    # __INCLUDE__ takes precedence over __EXCLUDE__.

    [__INCLUDE__]
    # Index all repos matching these patterns
    docs/* = 1
    lib/* = 1

    [__EXCLUDE__]
    # Do not index repos matching these patterns
    dev-docs/* = 1
    legacy-repos/* = 1
    *-dev/* = 1

    # Each repo that needs special indexing is a separate section below.
    # In each section set the options to override the global configuration
    # parameters above.
    # If special settings are not configured, the global configuration values
    # above are inherited. If no special repositories are
    # defined here RhodeCode will use the API to ask for all repositories.

    # For this repo use different settings
    [special-repo]
    commit_parse_limit = 20
    skip_files = *.idea, *.xml

    # For another repo use different settings
    [another-special-repo]
    index_files = *
    max_filesize = 800MB
    commit_parse_limit = 20000
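
To choose a ``max_filesize`` value, it can help to see which files in a
working copy of a |repo| exceed a candidate limit. This is a minimal sketch
using ``find``: ``files_over_limit`` is a hypothetical helper, and it assumes
``KB`` in :file:`mapping.ini` means 1024 bytes, which may not match how |RCT|
parses the value.

.. code-block:: bash

    # Hypothetical helper: list regular files under a directory that are
    # larger than a limit given in KiB, skipping .git and .hg metadata.
    # find rounds sizes up to whole 1024-byte units before comparing.
    files_over_limit() {
        local dir="$1"
        local limit_kib="$2"
        find "$dir" \( -name .git -o -name .hg \) -prune -o \
            -type f -size +"${limit_kib}k" -print
    }

For example, ``files_over_limit ~/repos/docs 385`` lists the files that the
``385KB`` limit above would roughly exclude.
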

.. _enable-elasticsearch:

Enabling Elasticsearch
^^^^^^^^^^^^^^^^^^^^^^

1. Open the :file:`rhodecode.ini` file for the instance you wish to edit. The
   default location is
   :file:`/home/{user}/.rccontrol/{instance-id}/rhodecode.ini`.
2. Find the search configuration section:

   .. code-block:: ini

       ###################################
       ## SEARCH INDEXING CONFIGURATION ##
       ###################################

       search.module = rhodecode.lib.index.whoosh
       search.location = %(here)s/data/index

   and change it to:

   .. code-block:: ini

       search.module = rc_elasticsearch
       search.location = http://localhost:9200/

   where ``search.location`` points to the Elasticsearch server.
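
The change in step 2 can also be scripted. This is a minimal sketch, assuming
``search.module`` and ``search.location`` each appear once, unindented, as in
the default file; ``switch_to_elasticsearch`` is a hypothetical helper, and
the original file is kept as :file:`rhodecode.ini.bak`.

.. code-block:: bash

    # Hypothetical helper: point an instance's rhodecode.ini at Elasticsearch.
    # Arguments: path to rhodecode.ini, optional server URL
    # (default http://localhost:9200/). Keeps a .bak copy of the original.
    switch_to_elasticsearch() {
        local ini="$1"
        local es_url="${2:-http://localhost:9200/}"
        # "|" is the sed delimiter, so the URL's "/" characters need no escaping
        sed -i.bak \
            -e "s|^search\.module[[:space:]]*=.*|search.module = rc_elasticsearch|" \
            -e "s|^search\.location[[:space:]]*=.*|search.location = ${es_url}|" \
            "$ini"
    }

For example,
``switch_to_elasticsearch /home/user/.rccontrol/enterprise-1/rhodecode.ini``.
Before switching, you can confirm that the server responds with
``curl http://localhost:9200/``, which returns basic cluster information as
JSON.
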

.. _Whoosh: https://pypi.python.org/pypi/Whoosh/
.. _Elasticsearch: https://www.elastic.co/