.. _indexing-ref: Full-text Search ---------------- By default |RC| is configured to use `Whoosh`_ to index |repos| and provide full-text search. |RCE| also provides support for `Elasticsearch`_ as a backend for scalable search. See :ref:`enable-elasticsearch` for details. Indexing ^^^^^^^^ To run the indexer you need to use an |authtoken| with admin rights to all |repos|. To index new content added, you have the option to set the indexer up in a number of ways, for example: * Call the indexer via a cron job. We recommend running this nightly, unless you need everything indexed immediately. * Set the indexer to infinitely loop and reindex as soon as it has run its cycle. * Hook the indexer up with your CI server to reindex after each push. The indexer works by indexing new commits added since the last run. If you wish to build a brand new index from scratch each time, use the ``force`` option in the configuration file. .. important:: You need to have |RCT| installed, see :ref:`install-tools`. Since |RCE| 3.5.0 they are installed by default. To set up indexing, use the following steps: 1. :ref:`config-rhoderc`, if running tools remotely. 2. :ref:`run-index` 3. :ref:`set-index` 4. :ref:`advanced-indexing` .. _config-rhoderc: Configure the ``.rhoderc`` File ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |RCT| uses the :file:`/home/{user}/.rhoderc` file for connection details to |RCM| instances. If this file is not automatically created, you can configure it using the following example. You need to configure the details for each instance you want to index. .. code-block:: bash # Check the instance details # of the instance you want to index $ rccontrol status - NAME: enterprise-1 - STATUS: RUNNING - TYPE: Momentum - VERSION: 1.5.0 - URL: http://127.0.0.1:10000 To get your API Token, on the |RCM| interface go to :menuselection:`username --> My Account --> Auth tokens` .. code-block:: ini # Configure .rhoderc with matching details # This allows the indexer to connect to the instance [instance:enterprise-1] api_host = http://127.0.0.1:10000 api_key = repo_dir = /home//repos .. _run-index: Run the Indexer ^^^^^^^^^^^^^^^ Run the indexer using the following command, and specify the instance you want to index: .. code-block:: bash # From inside a virtualevv (venv)$ rhodecode-index --instance-name=enterprise-1 # Using default installation $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \ --instance-name=enterprise-1 # Using a custom mapping file $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \ --instance-name=enterprise-1 \ --mapping=/home/user/.rccontrol/enterprise-1/mapping.ini .. note:: In case of often indexing the index may become fragmented. Most often a result of that is error about `too many open files`. To fix this indexer needs to be executed with --optimize flag. E.g `rhodecode-index --instance-name=enterprise-1 --optimize` This should be executed regularly, once a week is recommended. .. _set-index: Schedule the Indexer ^^^^^^^^^^^^^^^^^^^^ To schedule the indexer, configure the crontab file to run the indexer inside your |RCT| virtualenv using the following steps. 1. Open the crontab file, using ``crontab -e``. 2. Add the indexer to the crontab, and schedule it to run as regularly as you wish. 3. Save the file. .. code-block:: bash $ crontab -e # The virtualenv can be called using its full path, so for example you can # put this example into the crontab # Run the indexer daily at 4am using the default mapping settings * 4 * * * /home/ubuntu/.virtualenv/rhodecode-venv/bin/rhodecode-index \ --instance-name=enterprise-1 # Run the indexer every Sunday at 3am using default mapping * 3 * * 0 /home/ubuntu/.virtualenv/rhodecode-venv/bin/rhodecode-index \ --instance-name=enterprise-1 # Run the indexer every 15 minutes # using a specially configured mapping file */15 * * * * ~/.rccontrol/enterprise-4/profile/bin/rhodecode-index \ --instance-name=enterprise-4 \ --mapping=/home/user/.rccontrol/enterprise-4/mapping.ini .. _advanced-indexing: Advanced Indexing ^^^^^^^^^^^^^^^^^ |RCT| indexes based on the :file:`mapping.ini` file. To configure your index, you can specify different options in this file. The default location is: * :file:`/home/{user}/.rccontrol/{instance-id}/mapping.ini`, using default |RCT|. * :file:`~/venv/lib/python2.7/site-packages/rhodecode_tools/templates/mapping.ini`, when using ``virtualenv``. .. note:: If you need to create the :file:`mapping.ini` file, use the |RCT| ``rhodecode-index --create-mapping path/to/file`` API call. For details, see the :ref:`tools-cli` section. The indexer runs in a random order to prevent a failing |repo| from stopping a build. To configure different indexing scenarios, set the following options inside the :file:`mapping.ini` and specify the altered file using the ``--mapping`` option. * ``index_files`` : Index the specified file types. * ``skip_files`` : Do not index the specified file types. * ``index_files_content`` : Index the content of the specified file types. * ``skip_files_content`` : Do not index the content of the specified files. * ``force`` : Create a fresh index on each run. * ``max_filesize`` : Files larger than the set size will not be indexed. * ``commit_parse_limit`` : Set the batch size when indexing commit messages. Set to a lower number to lessen memory load. * ``repo_limit`` : Set the maximum number or |repos| indexed per run. * ``[INCLUDE]`` : Set |repos| you want indexed. This takes precedent over ``[EXCLUDE]``. * ``[EXCLUDE]`` : Set |repos| you do not want indexed. Exclude can be used to not index branches, forks, or log |repos|. At the end of the file you can specify conditions for specific |repos| that will override the default values. To configure your indexer, use the following example :file:`mapping.ini` file. .. code-block:: ini [__DEFAULT__] # default patterns for indexing files and content of files. # Binary files are skipped by default. # Index python and markdown files index_files = *.py, *.md # Do not index these file types skip_files = *.svg, *.log, *.dump, *.txt # Index both file types and their content index_files_content = *.cpp, *.ini, *.py # Index file names, but not file content skip_files_content = *.svg, # Force rebuilding an index from scratch. Each repository will be rebuild # from scratch with a global flag. Use local flag to rebuild single repos force = false # Do not index files larger than 385KB max_filesize = 385KB # Limit commit indexing to 500 per batch commit_parse_limit = 500 # Limit each index run to 25 repos repo_limit = 25 # __INCLUDE__ is more important that __EXCLUDE__. [__INCLUDE__] # Include all repos with these names docs/* = 1 lib/* = 1 [__EXCLUDE__] # Do not include the following repo in index dev-docs/* = 1 legacy-repos/* = 1 *-dev/* = 1 # Each repo that needs special indexing is a separate section below. # In each section set the options to override the global configuration # parameters above. # If special settings are not configured, the global configuration values # above are inherited. If no special repositories are # defined here RhodeCode will use the API to ask for all repositories # For this repo use different settings [special-repo] commit_parse_limit = 20, skip_files = *.idea, *.xml, # For another repo use different settings [another-special-repo] index_files = *, max_filesize = 800MB commit_parse_limit = 20000 .. _enable-elasticsearch: Enabling Elasticsearch ^^^^^^^^^^^^^^^^^^^^^^ 1. Open the :file:`rhodecode.ini` file for the instance you wish to edit. The default location is :file:`home/{user}/.rccontrol/{instance-id}/rhodecode.ini` 2. Find the search configuration section: .. code-block:: ini ################################### ## SEARCH INDEXING CONFIGURATION ## ################################### search.module = rhodecode.lib.index.whoosh search.location = %(here)s/data/index and change it to: .. code-block:: ini search.module = rc_elasticsearch search.location = http://localhost:9200/ where ``search.location`` points to the elasticsearch server. .. _Whoosh: https://pypi.python.org/pypi/Whoosh/ .. _Elasticsearch: https://www.elastic.co/