##// END OF EJS Templates
celery: Reorder celery imports to fix case where celery tasks were being...
celery: Reorder celery imports to fix case where celery tasks were being registered before celery config was setup. Added comments to make this clear so we don't chase this problem in the future until it is refactored

File last commit:

r153:42d94537 default
r628:4d980275 default
Show More
indexing.rst
271 lines | 8.2 KiB | text/x-rst | RstLexer
project: added all source files and assets
r1 .. _indexing-ref:
Full-text Search
----------------
dan
docs: add elasticsearch docs
r153 By default |RC| is configured to use `Whoosh`_ to index |repos| and
provide full-text search.
|RCE| also provides support for `Elasticsearch`_ as a backend for scalable
search. See :ref:`enable-elasticsearch` for details.
Indexing
^^^^^^^^
project: added all source files and assets
r1 To run the indexer you need to use an |authtoken| with admin rights to all
|repos|.
To index new content added, you have the option to set the indexer up in a
number of ways, for example:
* Call the indexer via a cron job. We recommend running this nightly,
unless you need everything indexed immediately.
* Set the indexer to infinitely loop and reindex as soon as it has run its
cycle.
* Hook the indexer up with your CI server to reindex after each push.
The indexer works by indexing new commits added since the last run. If you
wish to build a brand new index from scratch each time,
use the ``force`` option in the configuration file.
.. important::
You need to have |RCT| installed, see :ref:`install-tools`. Since |RCE|
3.5.0 they are installed by default.
To set up indexing, use the following steps:
1. :ref:`config-rhoderc`, if running tools remotely.
2. :ref:`run-index`
3. :ref:`set-index`
4. :ref:`advanced-indexing`
.. _config-rhoderc:
Configure the ``.rhoderc`` File
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|RCT| uses the :file:`/home/{user}/.rhoderc` file for connection details
to |RCM| instances. If this file is not automatically created,
you can configure it using the following example. You need to configure the
details for each instance you want to index.
.. code-block:: bash
# Check the instance details
# of the instance you want to index
$ rccontrol status
- NAME: enterprise-1
- STATUS: RUNNING
- TYPE: Momentum
- VERSION: 1.5.0
- URL: http://127.0.0.1:10000
To get your API Token, on the |RCM| interface go to
:menuselection:`username --> My Account --> Auth tokens`
.. code-block:: ini
# Configure .rhoderc with matching details
# This allows the indexer to connect to the instance
[instance:enterprise-1]
api_host = http://127.0.0.1:10000
api_key = <auth token goes here>
repo_dir = /home/<username>/repos
.. _run-index:
Run the Indexer
^^^^^^^^^^^^^^^
Run the indexer using the following command, and specify the instance you
want to index:
.. code-block:: bash
# From inside a virtualevv
(venv)$ rhodecode-index --instance-name=enterprise-1
# Using default installation
$ /home/user/.rccontrol/enterprise-4/profile/bin/rhodecode-index \
--instance-name=enterprise-4
# Using a custom mapping file
$ /home/user/.rccontrol/enterprise-4/profile/bin/rhodecode-index \
--instance-name=enterprise-4 \
--mapping=/home/user/.rccontrol/enterprise-4/mapping.ini
.. note::
|RCT| require |PY| 2.7 to run.
.. _set-index:
Schedule the Indexer
^^^^^^^^^^^^^^^^^^^^
To schedule the indexer, configure the crontab file to run the indexer inside
your |RCT| virtualenv using the following steps.
1. Open the crontab file, using ``crontab -e``.
2. Add the indexer to the crontab, and schedule it to run as regularly as you
wish.
3. Save the file.
.. code-block:: bash
$ crontab -e
# The virtualenv can be called using its full path, so for example you can
# put this example into the crontab
# Run the indexer daily at 4am using the default mapping settings
* 4 * * * /home/ubuntu/.virtualenv/rhodecode-venv/bin/rhodecode-index \
--instance-name=enterprise-1
# Run the indexer every Sunday at 3am using default mapping
* 3 * * 0 /home/ubuntu/.virtualenv/rhodecode-venv/bin/rhodecode-index \
--instance-name=enterprise-1
# Run the indexer every 15 minutes
# using a specially configured mapping file
*/15 * * * * ~/.rccontrol/enterprise-4/profile/bin/rhodecode-index \
--instance-name=enterprise-4 \
--mapping=/home/user/.rccontrol/enterprise-4/mapping.ini
.. _advanced-indexing:
Advanced Indexing
^^^^^^^^^^^^^^^^^
|RCT| indexes based on the :file:`mapping.ini` file. To configure your index,
you can specify different options in this file. The default location is:
* :file:`/home/{user}/.rccontrol/{instance-id}/mapping.ini`, using default
|RCT|.
* :file:`~/venv/lib/python2.7/site-packages/rhodecode_tools/templates/mapping.ini`,
when using ``virtualenv``.
.. note::
If you need to create the :file:`mapping.ini` file, use the |RCT|
``rhodecode-index --create-mapping path/to/file`` API call. For details,
see the :ref:`tools-cli` section.
The indexer runs in a random order to prevent a failing |repo| from stopping
a build. To configure different indexing scenarios, set the following options
inside the :file:`mapping.ini` and specify the altered file using the
``--mapping`` option.
* ``index_files`` : Index the specified file types.
* ``skip_files`` : Do not index the specified file types.
* ``index_files_content`` : Index the content of the specified file types.
* ``skip_files_content`` : Do not index the content of the specified files.
* ``force`` : Create a fresh index on each run.
* ``max_filesize`` : Files larger than the set size will not be indexed.
* ``commit_parse_limit`` : Set the batch size when indexing commit messages.
Set to a lower number to lessen memory load.
* ``repo_limit`` : Set the maximum number or |repos| indexed per run.
* ``[INCLUDE]`` : Set |repos| you want indexed. This takes precedent over
``[EXCLUDE]``.
* ``[EXCLUDE]`` : Set |repos| you do not want indexed. Exclude can be used to
not index branches, forks, or log |repos|.
At the end of the file you can specify conditions for specific |repos| that
will override the default values. To configure your indexer,
use the following example :file:`mapping.ini` file.
.. code-block:: ini
[__DEFAULT__]
# default patterns for indexing files and content of files.
# Binary files are skipped by default.
# Index python and markdown files
index_files = *.py, *.md
# Do not index these file types
skip_files = *.svg, *.log, *.dump, *.txt
# Index both file types and their content
index_files_content = *.cpp, *.ini, *.py
# Index file names, but not file content
skip_files_content = *.svg,
# Force rebuilding an index from scratch. Each repository will be rebuild
# from scratch with a global flag. Use local flag to rebuild single repos
force = false
# Do not index files larger than 385KB
max_filesize = 385KB
# Limit commit indexing to 500 per batch
commit_parse_limit = 500
# Limit each index run to 25 repos
repo_limit = 25
# __INCLUDE__ is more important that __EXCLUDE__.
[__INCLUDE__]
# Include all repos with these names
docs/* = 1
lib/* = 1
[__EXCLUDE__]
# Do not include the following repo in index
dev-docs/* = 1
legacy-repos/* = 1
*-dev/* = 1
# Each repo that needs special indexing is a separate section below.
# In each section set the options to override the global configuration
# parameters above.
# If special settings are not configured, the global configuration values
# above are inherited. If no special repositories are
# defined here RhodeCode will use the API to ask for all repositories
# For this repo use different settings
[special-repo]
commit_parse_limit = 20,
skip_files = *.idea, *.xml,
# For another repo use different settings
[another-special-repo]
index_files = *,
max_filesize = 800MB
commit_parse_limit = 20000
dan
docs: add elasticsearch docs
r153 .. _enable-elasticsearch:
Enabling Elasticsearch
^^^^^^^^^^^^^^^^^^^^^^
1. Open the :file:`rhodecode.ini` file for the instance you wish to edit. The
default location is
:file:`home/{user}/.rccontrol/{instance-id}/rhodecode.ini`
2. Find the search configuration section:
.. code-block:: ini
###################################
## SEARCH INDEXING CONFIGURATION ##
###################################
search.module = rhodecode.lib.index.whoosh
search.location = %(here)s/data/index
and change it to:
.. code-block:: ini
search.module = rc_elasticsearch
search.location = http://localhost:9200/
where ``search.location`` points to the elasticsearch server.
project: added all source files and assets
r1 .. _Whoosh: https://pypi.python.org/pypi/Whoosh/
dan
docs: add elasticsearch docs
r153 .. _Elasticsearch: https://www.elastic.co/