      .. _indexing-ref:

      Full-text Search
      ----------------

      By default RhodeCode is configured to use `Whoosh`_ to index |repos| and
      provide full-text search.

      |RCE| also provides support for `Elasticsearch 6`_ as a backend for more advanced
      and scalable search. See :ref:`enable-elasticsearch` for details.

      Indexing
      ^^^^^^^^

      To run the indexer you need to have an |authtoken| with admin rights to all |repos|.

      To index newly added content, you can set the indexer up in a
      number of ways, for example:

      * Call the indexer via a cron job. We recommend running this once at night.
        If you need new content indexed sooner, you can run the indexer a few
        times during the day.
      * Set the indexer to loop indefinitely and reindex as soon as it has finished its previous cycle.
      * Hook the indexer up with your CI server to reindex after each push.
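
The looping option could be implemented with a small wrapper script. The following is a minimal sketch and not part of |RCT|: the ``index_loop`` function and the ``INDEXER_CMD`` and ``PAUSE_SECONDS`` variables are hypothetical names chosen for illustration. Only the ``rhodecode-index --instance-name=...`` invocation comes from this documentation.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: start a new indexing cycle as soon as the previous
# one has finished. INDEXER_CMD and PAUSE_SECONDS are illustrative names,
# not rhodecode-index options.
INDEXER_CMD="${INDEXER_CMD:-rhodecode-index}"

# Usage: index_loop INSTANCE_NAME [MAX_CYCLES]
# MAX_CYCLES defaults to 0, which means "loop forever".
index_loop() {
    local instance="$1"
    local max_cycles="${2:-0}"
    local cycle=0
    while [ "$max_cycles" -eq 0 ] || [ "$cycle" -lt "$max_cycles" ]; do
        # A failed cycle is reported, but does not stop the loop.
        "$INDEXER_CMD" --instance-name="$instance" \
            || echo "indexing cycle failed with status $?" >&2
        cycle=$((cycle + 1))
        # Short pause so a persistent failure does not spin the CPU.
        sleep "${PAUSE_SECONDS:-5}"
    done
}

# Example (runs until interrupted):
#   index_loop enterprise-1
```

The optional cycle limit is there only to make the loop easy to try out; omit it to loop indefinitely.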

      The indexer works by indexing new commits added since the last run. If you
      wish to build a brand new index from scratch each time,
      use the ``force`` option in the configuration file.

      .. important::

         You need to have |RCT| installed, see :ref:`install-tools`. Since |RCE|
         3.5.0, they are installed by default and are available with both community and
         enterprise installations.

      To set up indexing, use the following steps:

      1. :ref:`config-rhoderc`, if running tools remotely.
      2. :ref:`run-index`
      3. :ref:`set-index`
      4. :ref:`advanced-indexing`

      .. _config-rhoderc:

      Configure the ``.rhoderc`` File
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

      .. note::

          You can also run the indexer without the ``.rhoderc`` file. Instead of
          passing ``--instance-name=enterprise-1``, provide the host and token
          directly: ``--api-host=http://127.0.0.1:10000 --api-key=<auth token goes here>``

      |RCT| uses the :file:`/home/{user}/.rhoderc` file for connection details
      to |RCE| instances. If this file is not automatically created,
      you can configure it using the following example. You need to configure the
      details for each instance you want to index.

      .. code-block:: bash

          # Check the instance details
          # of the instance you want to index
          $ rccontrol status

          - NAME: enterprise-1
          - STATUS: RUNNING
          - TYPE: Enterprise
          - VERSION: 4.1.0
          - URL: http://127.0.0.1:10003

      To get your API Token, on the |RCE| interface go to
      :menuselection:`username --> My Account --> Auth tokens`

      .. code-block:: ini

          # Configure .rhoderc with matching details
          # This allows the indexer to connect to the instance
          [instance:enterprise-1]
          api_host = http://127.0.0.1:10000
          api_key = <auth token goes here>
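
Because ``.rhoderc`` stores an auth token with admin rights, it may be worth creating the file with owner-only permissions. The sketch below uses a hypothetical helper, ``write_rhoderc``, which is not part of |RCT|. The file format follows the ``.rhoderc`` example in this section, and the ``0600`` mode is a suggestion rather than a documented requirement. Note that the helper overwrites the target file, so it only suits a single-instance setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of RhodeCode Tools): write a single-instance
# .rhoderc that only its owner can read. Overwrites the target file.
# Usage: write_rhoderc PATH INSTANCE_NAME API_HOST API_KEY
write_rhoderc() {
    local path="$1" instance="$2" api_host="$3" api_key="$4"
    # Create (or truncate) the file under a restrictive umask, then force
    # mode 0600 in case the file already existed with wider permissions.
    ( umask 077 && : > "$path" ) || return 1
    chmod 600 "$path" || return 1
    printf '[instance:%s]\napi_host = %s\napi_key = %s\n' \
        "$instance" "$api_host" "$api_key" > "$path"
}

# Example:
#   write_rhoderc "$HOME/.rhoderc" enterprise-1 http://127.0.0.1:10000 "<auth token goes here>"
```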

      .. _run-index:

      Run the Indexer
      ^^^^^^^^^^^^^^^

      Run the indexer using the following command, and specify the instance you want to index:

      .. code-block:: bash

         # Using default installation
         $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
             --instance-name=enterprise-1

         # Using a custom mapping file
         $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
             --instance-name=enterprise-1 \
             --mapping=/home/user/.rccontrol/enterprise-1/search_mapping.ini

         # Using a custom mapping file and invocation without .rhoderc
         $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
             --api-host=http://rhodecodecode.myserver.com --api-key=xxxxx \
             --mapping=/home/user/.rccontrol/enterprise-1/search_mapping.ini

         # From inside a virtualenv on your local machine or CI server.
         (venv)$ rhodecode-index --instance-name=enterprise-1

      .. note::

         If you index frequently, the index may become fragmented. The most common symptom
         is a ``too many open files`` error. To fix this, run the indexer with the
         ``--optimize`` flag, e.g. ``rhodecode-index --instance-name=enterprise-1 --optimize``.
         This should be executed regularly; once a week is recommended.

      .. _set-index:

      Schedule the Indexer
      ^^^^^^^^^^^^^^^^^^^^

      To schedule the indexer, configure the crontab file to run the indexer inside
      your |RCT| virtualenv using the following steps.

      1. Open the crontab file, using ``crontab -e``.
      2. Add the indexer to the crontab, and schedule it to run as regularly as you
         wish.
      3. Save the file.

      .. code-block:: bash

          $ crontab -e

          # The indexer inside the virtualenv can be called using its full path,
          # so for example you can put the entries below into the crontab.
          # Each crontab entry must be written on a single line, and the minute
          # field must be set explicitly, otherwise the job runs every minute
          # of the given hour.

          # Run the indexer daily at 4am using the default mapping settings
          0 4 * * * /home/ubuntu/.virtualenv/rhodecode-venv/bin/rhodecode-index --instance-name=enterprise-1

          # Run the indexer every Sunday at 3am using default mapping
          0 3 * * 0 /home/ubuntu/.virtualenv/rhodecode-venv/bin/rhodecode-index --instance-name=enterprise-1

          # Run the indexer every 15 minutes
          # using a specially configured mapping file
          */15 * * * * ~/.rccontrol/enterprise-4/profile/bin/rhodecode-index --instance-name=enterprise-4 --mapping=/home/user/.rccontrol/enterprise-4/search_mapping.ini

      .. _advanced-indexing:

      Advanced Indexing
      ^^^^^^^^^^^^^^^^^

      Force Re-Indexing single repository
      +++++++++++++++++++++++++++++++++++

      It is often necessary to re-index a whole repository, for example after repository
      changes, or to remove indexed secrets or files. The ``--repo-name=`` flag
      limits indexer execution to a single repository. For example, to force a re-index of a
      single repository, run::

          rhodecode-index --instance-name=enterprise-1 --force --repo-name=rhodecode-vcsserver

      Removing repositories from index
      ++++++++++++++++++++++++++++++++

      The indexer automatically removes renamed repositories from the index and builds the
      index under their new names. To remove an indexed repository manually, run::

          rhodecode-index --instance-name=enterprise-1 --remove-only --repo-name=rhodecode-vcsserver

      Using search_mapping.ini file for advanced index rules
      ++++++++++++++++++++++++++++++++++++++++++++++++++++++

      By default, ``rhodecode-index`` runs for all repositories and all files, with parsing
      limits defined by the CLI default arguments. You can change those limits by passing
      flags such as ``--max-filesize 2048kb`` or ``--repo-limit 10``.

      For more advanced execution logic, you can use a configuration file that
      defines detailed rules for which repositories are indexed, and how.

      |RCT| provides an example index configuration file called :file:`search_mapping.ini`.
      This file is created by default during installation and is located at:

      * :file:`/home/{user}/.rccontrol/{instance-id}/search_mapping.ini`, using default |RCT|.
      * :file:`~/venv/lib/python2.7/site-packages/rhodecode_tools/templates/mapping.ini`,
        when using ``virtualenv``.

      .. note::

          If you need to create the :file:`search_mapping.ini` file manually, use the |RCT|
          ``rhodecode-index --create-mapping path/to/search_mapping.ini`` command.
          For details, see the :ref:`tools-cli` section.

      To run the indexer with a mapping file, provide it using the ``--mapping`` flag::

          rhodecode-index --instance-name=enterprise-1 --mapping=/my/path/search_mapping.ini

      Here's a detailed example of a :file:`search_mapping.ini` file.

      .. code-block:: ini

          [__DEFAULT__]
          ; Create index on commits data, and files data in this order. Available options
          ; are `commits`, `files`
          index_types = commits,files

          ; Commit fetch limit. The size of the chunks in which commits are fetched
          ; via the API and parsed. Smaller chunks reduce the load on the server.
          commit_fetch_limit = 1000

          ; Commit process limit. Limits the number of commits the indexer fetches and
          ; stores inside the full-text search index per run. E.g. if a repo has 2000
          ; commits and the limit is 1000, the first run processes commits 0-1000 and
          ; the second run commits 1000-2000. Helps reduce memory usage, default is 50000
          ; (set -1 for unlimited)
          commit_process_limit = 50000

          ; Limit of how many repositories each run can process, default is -1 (unlimited).
          ; In case of 1000s of repositories it's better to execute in chunks to not overload
          ; the server.
          repo_limit = -1

          ; Default patterns for indexing files and content of files. Binary files
          ; are skipped by default.

          ; Add to index those comma separated files; globs syntax
          ; e.g. index_files = *.py, *.c, *.h, *.js
          index_files = *,

          ; Do not add to index those comma separated files, this excludes
          ; both search by name and content; globs syntax
          ; e.g. skip_files = *.key, *.sql, *.xml
          skip_files = ,

          ; Add to index content of those comma separated files; globs syntax
          ; e.g. index_files_content = *.h, *.obj
          index_files_content = *,

          ; Do not add to index content of those comma separated files; globs syntax
          ; e.g. skip_files_content = *.exe, *.bin, *.log, *.dump
          skip_files_content = ,

          ; Force rebuilding an index from scratch. Each repository will be rebuilt from
          ; scratch with a global flag. Use --repo-name=NAME --force to rebuild single repo
          force = false

          ; Maximum file size that the indexer will use; files above that limit are not
          ; going to have their content indexed.
          ; Possible options are KB (kilobytes), MB (megabytes), e.g. 1MB or 1024KB
          max_filesize = 2MB

          [__INDEX_RULES__]
          ; Ordered match rules for repositories. A list of all repositories will be fetched
          ; using API and this list will be filtered using those rules.
          ; Syntax for entry: `glob_pattern_OR_full_repo_name = 0 OR 1` where 0=exclude, 1=include
          ; When this ordered list is traversed first match will return the include/exclude marker
          ; For example:
          ;    upstream/binary_repo = 0
          ;    upstream/subrepo/xml_files = 0
          ;    upstream/* = 1
          ;    special-repo = 1
          ;    * = 0
          ; This will index all repositories under upstream/*, but skip upstream/binary_repo
          ; and upstream/subrepo/xml_files, last * = 0 means skip all other matches

          ; Another example:
          ;    *-fork = 0
          ;    * = 1
          ; This will index all repositories, except those that have -fork as suffix.

          rhodecode-vcsserver = 1
          rhodecode-enterprise-ce = 1
          upstream/mozilla/firefox-repo = 0
          upstream/git-binaries = 0
          upstream/* = 1
          * = 0

          ; == EXPLICIT REPOSITORY INDEXING ==
          ; If defined this will skip using __INDEX_RULES__, and will not use API to fetch
          ; list of repositories, it will explicitly take names defined with [NAME] format and
          ; try to build the index, to build index just for repo_name_1 and special-repo use:
          ;    [repo_name_1]
          ;    [special-repo]

          ; == PER REPOSITORY CONFIGURATION ==
          ; This allows overriding the global configuration per repository.
          ; example to set specific file limit, and skip certain files for repository special-repo
          ;    [conf:special-repo]
          ;    max_filesize = 5mb
          ;    skip_files = *.xml, *.sql
          ;    index_types = files,

          [conf:rhodecode-vcsserver]
          index_types = files,
          max_filesize = 5mb
          skip_files = *.xml, *.sql
          index_files = *.py, *.c, *.h, *.js
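
To make the ``commit_process_limit`` arithmetic from the comments above concrete: a repository with more commits than the limit needs several indexer runs before its commit index is complete. The ``runs_needed`` function below is a hypothetical illustration of that arithmetic only; it does not call the indexer and is not part of |RCT|.

```shell
#!/usr/bin/env bash
# Illustration only: how many indexer runs it takes to process every commit
# of one repository for a given commit_process_limit.
# Usage: runs_needed TOTAL_COMMITS COMMIT_PROCESS_LIMIT
runs_needed() {
    local total="$1" limit="$2"
    if [ "$limit" -eq -1 ]; then
        # -1 means unlimited: everything is processed in a single run.
        echo 1
    elif [ "$limit" -le 0 ]; then
        echo "limit must be -1 or a positive number" >&2
        return 1
    else
        # Ceiling division: a partial last chunk still needs its own run.
        echo $(( (total + limit - 1) / limit ))
    fi
}

runs_needed 2000 1000     # prints 2 (commits 0-1000, then 1000-2000)
runs_needed 120000 50000  # prints 3 with the default limit of 50000
```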

      With thousands of repositories, it can be tricky to get the include/exclude rules right
      at first. To test the mapping file rules, run the indexer with the
      ``--show-matched-repos`` flag, which only lists the repositories that would be indexed::

          rhodecode-index --instance-name=enterprise-1 --show-matched-repos --mapping=/my/path/search_mapping.ini
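
The first-match behaviour described in the ``[__INDEX_RULES__]`` comments can be sketched in a few lines of shell. This illustrates the documented semantics and is not the |RCT| implementation: ``repo_is_indexed`` and ``check`` are hypothetical functions, the sketch assumes that ``*`` also matches ``/`` in repository names, and it treats a repository that matches no rule as excluded.

```shell
#!/usr/bin/env bash
# Illustration of ordered first-match rules; not the rhodecode-index code.
# Usage: repo_is_indexed REPO_NAME "PATTERN=FLAG" ["PATTERN=FLAG" ...]
# Prints the flag (1=include, 0=exclude) of the first rule whose glob
# pattern matches REPO_NAME, or 0 if no rule matches (an assumption).
repo_is_indexed() {
    local repo="$1" rule pattern flag
    shift
    for rule in "$@"; do
        pattern="${rule%=*}"   # text before the last "="
        flag="${rule##*=}"     # text after the last "="
        # $pattern is left unquoted on purpose so case treats it as a glob.
        case "$repo" in
            $pattern)
                echo "$flag"
                return 0
                ;;
        esac
    done
    echo 0
}

# Rules from the example mapping file, in order.
check() {
    repo_is_indexed "$1" \
        "rhodecode-vcsserver=1" \
        "rhodecode-enterprise-ce=1" \
        "upstream/mozilla/firefox-repo=0" \
        "upstream/git-binaries=0" \
        "upstream/*=1" \
        "*=0"
}

check upstream/git-binaries   # prints 0: the explicit exclude comes first
check upstream/some-library   # prints 1: matched by upstream/*
check unrelated-repo          # prints 0: falls through to * = 0
```

Because evaluation stops at the first match, specific exclusions have to be listed before the broader pattern that would otherwise include them.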

      .. _enable-elasticsearch:

      Enabling Elasticsearch
      ^^^^^^^^^^^^^^^^^^^^^^

      Elasticsearch is available in the Enterprise Edition (EE) only. It provides more scalable
      and more advanced search capabilities. Whoosh works well for up to 1-2GB of data, but
      beyond that it starts to slow down and can cause other problems. Elasticsearch 6 also
      provides a much more advanced query language, allowing filtering by file paths and
      extensions, OR statements, ranges, etc. See the query language examples in the search
      field for advanced usage.

      1. Open the :file:`rhodecode.ini` file for the instance you wish to edit. The
         default location is
         :file:`/home/{user}/.rccontrol/{instance-id}/rhodecode.ini`
      2. Find the search configuration section:

      .. code-block:: ini

          ###################################
          ## SEARCH INDEXING CONFIGURATION ##
          ###################################

          search.module = rhodecode.lib.index.whoosh
          search.location = %(here)s/data/index

      and change it to:

      .. code-block:: ini

          search.module = rc_elasticsearch
          search.location = http://localhost:9200
          ## specify Elastic Search version, 6 for latest or 2 for legacy
          search.es_version = 6

      where ``search.location`` points to the elasticsearch server
      by default running on port 9200.

      The indexer invocation also needs to change. Provide the ``--es-version=`` and
      ``--engine-location=`` parameters to define the Elasticsearch server location and its version.
      For example::

          rhodecode-index --instance-name=enterprise-1 --es-version=6 --engine-location=http://localhost:9200

      .. _Whoosh: https://pypi.python.org/pypi/Whoosh/

      .. _Elasticsearch 6: https://www.elastic.co/

	Site-wide shortcuts
/	Use quick search box
g h	Goto home page
g g	Goto my private gists page
g G	Goto my public gists page
g 0-9	Goto bookmarked items from 0-9
n r	New repository page
n g	New gist page

	Repositories
g s	Goto summary page
g c	Goto changelog page
g f	Goto files page
g F	Goto files page with file search activated
g p	Goto pull requests page
g o	Goto repository settings
g O	Goto repository access permissions settings
t s	Toggle sidebar on some pages

marcink project: added all source files and assets	r1	.. _indexing-ref:

		Full-text Search
		----------------

marcink docs: added SAML documentation....	r3290	By default RhodeCode is configured to use `Whoosh`_ to index \|repos\| and
dan docs: add elasticsearch docs	r153	provide full-text search.

marcink docs: update full text search indexing documentation	r3400	\|RCE\| also provides support for `Elasticsearch 6`_ as a backend more for advanced
		and scalable search. See :ref:`enable-elasticsearch` for details.
dan docs: add elasticsearch docs	r153
		Indexing
		^^^^^^^^

marcink docs: update full text search indexing documentation	r3400	To run the indexer you need to have an \|authtoken\| with admin rights to all \|repos\|.
marcink project: added all source files and assets	r1
		To index new content added, you have the option to set the indexer up in a
		number of ways, for example:

marcink docs: update full text search indexing documentation	r3400	* Call the indexer via a cron job. We recommend running this once at night.
		In case you need everything indexed immediately it's possible to index few
		times during the day.
		* Set the indexer to infinitely loop and reindex as soon as it has run its previous cycle.
marcink project: added all source files and assets	r1	* Hook the indexer up with your CI server to reindex after each push.

		The indexer works by indexing new commits added since the last run. If you
		wish to build a brand new index from scratch each time,
		use the ``force`` option in the configuration file.

		.. important::

		You need to have \|RCT\| installed, see :ref:`install-tools`. Since \|RCE\|
marcink docs: update full text search indexing documentation	r3400	3.5.0 they are installed by default and available with community/enterprise installations.
marcink project: added all source files and assets	r1
		To set up indexing, use the following steps:

		1. :ref:`config-rhoderc`, if running tools remotely.
		2. :ref:`run-index`
		3. :ref:`set-index`
		4. :ref:`advanced-indexing`

		.. _config-rhoderc:

		Configure the ``.rhoderc`` File
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

marcink docs: update full text search indexing documentation	r3400	.. note::

		Optionally it's possible to use indexer without the ``.rhoderc``. Simply instead of
		executing with `--instance-name=enterprise-1` execute providing the host and token
		directly: `--api-host=http://127.0.0.1:10000 --api-key=<auth token goes here>


marcink project: added all source files and assets	r1	\|RCT\| uses the :file:`/home/{user}/.rhoderc` file for connection details
marcink docs: added SAML documentation....	r3290	to \|RCE\| instances. If this file is not automatically created,
marcink project: added all source files and assets	r1	you can configure it using the following example. You need to configure the
		details for each instance you want to index.

		.. code-block:: bash

		# Check the instance details
		# of the instance you want to index
		$ rccontrol status

marcink docs: update full text search indexing documentation	r3400	- NAME: enterprise-1
		- STATUS: RUNNING
		- TYPE: Enterprise
		- VERSION: 4.1.0
		- URL: http://127.0.0.1:10003
marcink project: added all source files and assets	r1
marcink docs: added SAML documentation....	r3290	To get your API Token, on the \|RCE\| interface go to
marcink project: added all source files and assets	r1	:menuselection:`username --> My Account --> Auth tokens`

		.. code-block:: ini

		# Configure .rhoderc with matching details
		# This allows the indexer to connect to the instance
		[instance:enterprise-1]
		api_host = http://127.0.0.1:10000
		api_key = <auth token goes here>
marcink docs: update full text search indexing documentation	r3400
marcink project: added all source files and assets	r1
		.. _run-index:

		Run the Indexer
		^^^^^^^^^^^^^^^

marcink docs: update full text search indexing documentation	r3400	Run the indexer using the following command, and specify the instance you want to index:
marcink project: added all source files and assets	r1
		.. code-block:: bash

		# Using default installation
marcink docs: added info about --optimize flag for full text search.	r2949	$ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
		--instance-name=enterprise-1
marcink project: added all source files and assets	r1
		# Using a custom mapping file
marcink docs: added info about --optimize flag for full text search.	r2949	$ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
		--instance-name=enterprise-1 \
marcink docs: update full text search indexing documentation	r3400	--mapping=/home/user/.rccontrol/enterprise-1/search_mapping.ini

		# Using a custom mapping file and invocation without ``.rhoderc``
		$ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
		--api-host=http://rhodecodecode.myserver.com --api-key=xxxxx \
		--mapping=/home/user/.rccontrol/enterprise-1/search_mapping.ini

		# From inside a virtualev on your local machine or CI server.
		(venv)$ rhodecode-index --instance-name=enterprise-1

marcink project: added all source files and assets	r1
		.. note::

marcink docs: added info about --optimize flag for full text search.	r2949	In case of often indexing the index may become fragmented. Most often a result of that
		is error about `too many open files`. To fix this indexer needs to be executed with
		--optimize flag. E.g `rhodecode-index --instance-name=enterprise-1 --optimize`
		This should be executed regularly, once a week is recommended.

marcink project: added all source files and assets	r1
		.. _set-index:

		Schedule the Indexer
		^^^^^^^^^^^^^^^^^^^^

		To schedule the indexer, configure the crontab file to run the indexer inside
		your \|RCT\| virtualenv using the following steps.

		1. Open the crontab file, using ``crontab -e``.
		2. Add the indexer to the crontab, and schedule it to run as regularly as you
		wish.
		3. Save the file.

		.. code-block:: bash

		$ crontab -e

		# The virtualenv can be called using its full path, so for example you can
		# put this example into the crontab

		# Run the indexer daily at 4am using the default mapping settings
		* 4 * * * /home/ubuntu/.virtualenv/rhodecode-venv/bin/rhodecode-index \
		--instance-name=enterprise-1

		# Run the indexer every Sunday at 3am using default mapping
		* 3 * * 0 /home/ubuntu/.virtualenv/rhodecode-venv/bin/rhodecode-index \
		--instance-name=enterprise-1

		# Run the indexer every 15 minutes
		# using a specially configured mapping file
		/15 * * * ~/.rccontrol/enterprise-4/profile/bin/rhodecode-index \
		--instance-name=enterprise-4 \
marcink docs: update full text search indexing documentation	r3400	--mapping=/home/user/.rccontrol/enterprise-4/search_mapping.ini
marcink project: added all source files and assets	r1
		.. _advanced-indexing:

		Advanced Indexing
		^^^^^^^^^^^^^^^^^

marcink docs: update full text search indexing documentation	r3400
		Force Re-Indexing single repository
		+++++++++++++++++++++++++++++++++++

		Often it's required to re-index whole repository because of some repository changes,
		or to remove some indexed secrets, or files. There's a special `--repo-name=` flag
		for the indexer that limits execution to a single repository. For example to force-reindex
		single repository such call can be made::

		rhodecode-index --instance-name=enterprise-1 --force --repo-name=rhodecode-vcsserver


		Removing repositories from index
		++++++++++++++++++++++++++++++++

		The indexer automatically removes renamed repositories and builds index for new names.
		In case that you wish to remove indexed repository manually such call would allow that::
marcink project: added all source files and assets	r1
marcink docs: update full text search indexing documentation	r3400	rhodecode-index --instance-name=enterprise-1 --remove-only --repo-name=rhodecode-vcsserver


		Using search_mapping.ini file for advanced index rules
		++++++++++++++++++++++++++++++++++++++++++++++++++++++

		By default rhodecode-index runs for all repositories, all files with parsing limits
		defined by the CLI default arguments. You can change those limits by calling with
		different flags such as `--max-filesize 2048kb` or `--repo-limit 10`

		For more advanced execution logic it's possible to use a configuration file that
		would define detailed rules which repositories and how should be indexed.

		|RCT| provides an example index configuration file called :file:`search_mapping.ini`.
		This file is created by default during installation and is located at:

		* :file:`/home/{user}/.rccontrol/{instance-id}/search_mapping.ini`, using default |RCT|.
		* :file:`~/venv/lib/python2.7/site-packages/rhodecode_tools/templates/mapping.ini`,
		  when using ``virtualenv``.

		.. note::

		    If you need to create the :file:`search_mapping.ini` file manually, use the |RCT|
		    ``rhodecode-index --create-mapping path/to/search_mapping.ini`` command.
		    For details, see the :ref:`tools-cli` section.

		To run the indexer with a mapping file, pass it with the `--mapping` flag::

		    rhodecode-index --instance-name=enterprise-1 --mapping=/my/path/search_mapping.ini


		Here's a detailed example of a :file:`search_mapping.ini` file.

		.. code-block:: ini

		    [__DEFAULT__]
		    ; Create index on commits data, and files data in this order. Available options
		    ; are `commits`, `files`
		    index_types = commits,files

		    ; Commit fetch limit. The chunk size in which commits are fetched via the API
		    ; and parsed. This lets the server transfer smaller chunks and be less loaded
		    commit_fetch_limit = 1000

		    ; Commit process limit. Limit the number of commits the indexer should fetch and
		    ; store inside the full text search index, e.g. if a repo has 2000 commits and the
		    ; limit is 1000, on the first run it will process commits 0-1000 and on the
		    ; second 1000-2000 commits. Helps reduce memory usage, default is 50000
		    ; (set -1 for unlimited)
		    commit_process_limit = 50000

		    ; Limit of how many repositories each run can process, default is -1 (unlimited)
		    ; in case of 1000s of repositories it's better to execute in chunks to not overload
		    ; the server.
		    repo_limit = -1

		    ; Default patterns for indexing files and content of files. Binary files
		    ; are skipped by default.

		    ; Add to index those comma separated files; globs syntax
		    ; e.g. index_files = .py, .c, .h, .js
		    index_files = *,

		    ; Do not add to index those comma separated files, this excludes
		    ; both search by name and content; globs syntax
		    ; e.g. skip_files = .key, .sql, *.xml
		    skip_files = ,

		    ; Add to index content of those comma separated files; globs syntax
		    ; e.g. index_files_content = .h, .obj
		    index_files_content = *,

		    ; Do not add to index content of those comma separated files; globs syntax
		    ; e.g. skip_files_content = .exe, .bin, .log, .dump
		    skip_files_content = ,

		    ; Force rebuilding an index from scratch. Each repository will be rebuilt from
		    ; scratch with a global flag. Use --repo-name=NAME --force to rebuild single repo
		    force = false

		    ; Maximum file size that the indexer will use, files above that limit are not going
		    ; to have their content indexed.
		    ; Possible options are KB (kilobytes), MB (megabytes), e.g. 1MB or 1024KB
		    max_filesize = 2MB


		    [__INDEX_RULES__]
		    ; Ordered match rules for repositories. A list of all repositories will be fetched
		    ; using API and this list will be filtered using those rules.
		    ; Syntax for entry: `glob_pattern_OR_full_repo_name = 0 OR 1` where 0=exclude, 1=include
		    ; When this ordered list is traversed first match will return the include/exclude marker
		    ; For example:
		    ; upstream/binary_repo = 0
		    ; upstream/subrepo/xml_files = 0
		    ; upstream/* = 1
		    ; special-repo = 1
		    ; * = 0
		    ; This will index all repositories under upstream/*, but skip upstream/binary_repo
		    ; and upstream/subrepo/xml_files, last * = 0 means skip all other matches

		    ; Another example:
		    ; *-fork = 0
		    ; * = 1
		    ; This will index all repositories, except those that have -fork as suffix.

		    rhodecode-vcsserver = 1
		    rhodecode-enterprise-ce = 1
		    upstream/mozilla/firefox-repo = 0
		    upstream/git-binaries = 0
		    upstream/* = 1
		    * = 0

		    ; == EXPLICIT REPOSITORY INDEXING ==
		    ; If defined this will skip using __INDEX_RULES__, and will not use API to fetch
		    ; list of repositories, it will explicitly take names defined with [NAME] format and
		    ; try to build the index, to build index just for repo_name_1 and special-repo use:
		    ; [repo_name_1]
		    ; [special-repo]

		    ; == PER REPOSITORY CONFIGURATION ==
		    ; This allows overriding the global configuration per repository.
		    ; example to set specific file limit, and skip certain files for repository special-repo
		    ; [conf:special-repo]
		    ; max_filesize = 5mb
		    ; skip_files = .xml, .sql
		    ; index_types = files,

		    [conf:rhodecode-vcsserver]
		    index_types = files,
		    max_filesize = 5mb
		    skip_files = .xml, .sql
		    index_files = .py, .c, .h, .js
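
To reason about which settings apply to a given repository, it can help to think of a ``[conf:NAME]`` section as an overlay on top of ``[__DEFAULT__]``. The following is a minimal sketch of that idea using Python's standard ``configparser``. The helper name ``effective_settings`` is hypothetical, and the merge order is an assumption based on the comments in the example file; it is not the indexer's actual implementation.

```python
import configparser

# A trimmed-down copy of the example mapping file above.
MAPPING = """
[__DEFAULT__]
index_types = commits,files
max_filesize = 2MB
skip_files = ,

[conf:rhodecode-vcsserver]
index_types = files,
max_filesize = 5mb
skip_files = .xml, .sql
"""


def effective_settings(parser, repo_name):
    """Overlay [conf:<repo_name>] on [__DEFAULT__] (hypothetical helper)."""
    settings = dict(parser["__DEFAULT__"])
    section = "conf:" + repo_name
    if parser.has_section(section):
        # Per-repository values win over the global defaults (assumed order).
        settings.update(parser[section])
    return settings


# interpolation=None keeps literal '%' characters in values untouched.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(MAPPING)

for name in ("rhodecode-vcsserver", "some-other-repo"):
    print(name, effective_settings(parser, name))
```

Under these assumptions, ``rhodecode-vcsserver`` ends up with its own ``max_filesize`` and ``skip_files``, while any repository without a ``[conf:...]`` section keeps the defaults.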


		With thousands of repositories, it can be tricky to get the include/exclude rules right
		at first. A special flag tests the mapping file rules and lists the repositories that
		would be indexed. Run the indexer with `--show-matched-repos` to only list the
		repositories matched by the rules::

		    rhodecode-index --instance-name=enterprise-1 --show-matched-repos --mapping=/my/path/search_mapping.ini
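
The first-match behaviour of the ``[__INDEX_RULES__]`` section can also be checked by hand before running the indexer. The sketch below mirrors the rules from the example mapping file and applies them in order with Python's ``fnmatch`` module. It assumes the glob semantics are close to those of ``fnmatch`` (in particular that ``*`` also matches ``/``); the indexer's real matching may differ, and both ``is_indexed`` and the repository names in ``repos`` are made up for illustration.

```python
from fnmatch import fnmatchcase

# Ordered (pattern, marker) pairs copied from the example [__INDEX_RULES__];
# 1 = include, 0 = exclude.
RULES = [
    ("rhodecode-vcsserver", 1),
    ("rhodecode-enterprise-ce", 1),
    ("upstream/mozilla/firefox-repo", 0),
    ("upstream/git-binaries", 0),
    ("upstream/*", 1),
    ("*", 0),
]


def is_indexed(repo_name, rules=RULES):
    """Return the marker of the first rule matching repo_name (hypothetical helper)."""
    for pattern, include in rules:
        if fnmatchcase(repo_name, pattern):
            return bool(include)
    # No rule matched at all; treated as excluded in this sketch.
    return False


# Hypothetical repository names, standing in for the list fetched via the API.
repos = [
    "rhodecode-vcsserver",
    "upstream/git-binaries",
    "upstream/tools",
    "my-private-repo",
]
matched = [name for name in repos if is_indexed(name)]
print(matched)
```

Because the list is traversed in order, the specific exclusions must come before the broader ``upstream/*`` rule; swapping them would include ``upstream/git-binaries`` as well.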

		.. _enable-elasticsearch:

		Enabling Elasticsearch
		^^^^^^^^^^^^^^^^^^^^^^

		Elasticsearch is available in the EE edition only. It provides more scalable and more
		advanced search capabilities. Whoosh works well for up to 1-2GB of data; beyond that
		it starts to slow down and can cause other problems. Elasticsearch 6 also provides a
		much more advanced query language, allowing filtering by file paths and extensions,
		OR statements, ranges, etc. See the query language examples in the search field for
		advanced usage.


		1. Open the :file:`rhodecode.ini` file for the instance you wish to edit. The
		   default location is
		   :file:`/home/{user}/.rccontrol/{instance-id}/rhodecode.ini`
		2. Find the search configuration section:

		.. code-block:: ini

		    ###################################
		    ## SEARCH INDEXING CONFIGURATION ##
		    ###################################

		    search.module = rhodecode.lib.index.whoosh
		    search.location = %(here)s/data/index

		and change it to:

		.. code-block:: ini

		    search.module = rc_elasticsearch
		    search.location = http://localhost:9200
		    ## specify Elastic Search version, 6 for latest or 2 for legacy
		    search.es_version = 6

		where ``search.location`` points to the Elasticsearch server, which by default
		runs on port 9200.

		The indexer invocation also needs to change. Provide the ``--es-version=`` and
		``--engine-location=`` parameters to define the Elasticsearch server location and
		its version. For example::

		    rhodecode-index --instance-name=enterprise-1 --es-version=6 --engine-location=http://localhost:9200

		.. _Whoosh: https://pypi.python.org/pypi/Whoosh/
		.. _Elasticsearch 6: https://www.elastic.co/