docs: updated indexer documentation
marcink -
r3482:4b2dd92b default
@@ -3,10 +3,15 b''
3 Full-text Search
3 Full-text Search
4 ----------------
4 ----------------
5
5
6 RhodeCode provides full-text search capabilities to search inside file content,
7 commit messages, and file paths. Indexing is not enabled by default; to use
8 full-text search, building an index is a prerequisite.
9
6 By default RhodeCode is configured to use `Whoosh`_ to index |repos| and
10 By default RhodeCode is configured to use `Whoosh`_ to index |repos| and
7 provide full-text search.
11 provide full-text search. `Whoosh`_ works well for a small amount of data and
12 shouldn't be used in case of large code-bases and lots of repositories.
8
13
9 |RCE| also provides support for `Elasticsearch 6`_ as a backend more for advanced
14 |RCE| also provides support for `ElasticSearch 6`_ as a backend more for advanced
10 and scalable search. See :ref:`enable-elasticsearch` for details.
15 and scalable search. See :ref:`enable-elasticsearch` for details.
11
16
12 Indexing
17 Indexing
@@ -14,18 +19,20 b' Indexing'
14
19
15 To run the indexer you need to have an |authtoken| with admin rights to all |repos|.
20 To run the indexer you need to have an |authtoken| with admin rights to all |repos|.
16
21
17 To index new content added, you have the option to set the indexer up in a
22 To index repositories stored in RhodeCode, you have the option to set the indexer up in a
18 number of ways, for example:
23 number of ways, for example:
19
24
20 * Call the indexer via a cron job. We recommend running this once at night.
25 * Call the indexer via a cron job. We recommend running this once at night.
21 In case you need everything indexed immediately it's possible to index few
26 In case you need everything indexed immediately it's possible to index few
22 times during the day.
27 times during the day. The indexer has a locking mechanism that prevents two
28 instances from running at once, so it is safe to run it as often as every hour.
23 * Set the indexer to infinitely loop and reindex as soon as it has run its previous cycle.
29 * Set the indexer to infinitely loop and reindex as soon as it has run its previous cycle.
24 * Hook the indexer up with your CI server to reindex after each push.
30 * Hook the indexer up with your CI server to reindex after each push.
25
31
26 The indexer works by indexing new commits added since the last run. If you
32 The indexer works by indexing new commits added since the last run, and comparing
27 wish to build a brand new index from scratch each time,
33 file changes to index only new or modified files.
28 use the ``force`` option in the configuration file.
34 If you wish to build a brand new index from scratch each time, use the ``force``
35 option in the configuration file, or run the indexer with the ``--force`` flag.
29
36
30 .. important::
37 .. important::
31
38
@@ -48,7 +55,7 b' Configure the ``.rhoderc`` File'
48
55
49 Optionally it's possible to use indexer without the ``.rhoderc``. Simply instead of
56 Optionally it's possible to use indexer without the ``.rhoderc``. Simply instead of
50 executing with `--instance-name=enterprise-1` execute providing the host and token
57 executing with `--instance-name=enterprise-1` execute providing the host and token
51 directly: `--api-host=http://127.0.0.1:10000 --api-key=<auth token goes here>
58 directly: `--api-host=http://127.0.0.1:10000 --api-key=<auth-token-goes-here>`
52
59
53
60
54 |RCT| uses the :file:`/home/{user}/.rhoderc` file for connection details
61 |RCT| uses the :file:`/home/{user}/.rhoderc` file for connection details
@@ -89,14 +96,15 b' Run the indexer using the following comm'
89
96
90 .. code-block:: bash
97 .. code-block:: bash
91
98
92 # Using default installation
99 # Using default simple indexing of all repositories
93 $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
100 $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
94 --instance-name=enterprise-1
101 --instance-name=enterprise-1
95
102
96 # Using a custom mapping file
103 # Using a custom mapping file with indexing rules, and using elasticsearch 6 backend
97 $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
104 $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
98 --instance-name=enterprise-1 \
105 --instance-name=enterprise-1 \
99 --mapping=/home/user/.rccontrol/enterprise-1/search_mapping.ini
106 --mapping=/home/user/.rccontrol/enterprise-1/search_mapping.ini \
107 --es-version=6 --engine-location=http://elasticsearch-host:9200
100
108
101 # Using a custom mapping file and invocation without ``.rhoderc``
109 # Using a custom mapping file and invocation without ``.rhoderc``
102 $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
110 $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
@@ -170,6 +178,8 b' Removing repositories from index'
170 ++++++++++++++++++++++++++++++++
178 ++++++++++++++++++++++++++++++++
171
179
172 The indexer automatically removes renamed repositories and builds index for new names.
180 The indexer automatically removes renamed repositories and builds index for new names.
181 In the same way, if a repository listed in mapping.ini is not reported as existing by the
182 server, it is removed from the index.
173 In case that you wish to remove indexed repository manually such call would allow that::
183 In case that you wish to remove indexed repository manually such call would allow that::
174
184
175 rhodecode-index --instance-name=enterprise-1 --remove-only --repo-name=rhodecode-vcsserver
185 rhodecode-index --instance-name=enterprise-1 --remove-only --repo-name=rhodecode-vcsserver
@@ -180,7 +190,7 b' Using search_mapping.ini file for advanc'
180
190
181 By default rhodecode-index runs for all repositories, all files with parsing limits
191 By default rhodecode-index runs for all repositories, all files with parsing limits
182 defined by the CLI default arguments. You can change those limits by calling with
192 defined by the CLI default arguments. You can change those limits by calling with
183 different flags such as `--max-filesize 2048kb` or `--repo-limit 10`
193 different flags such as `--max-filesize=2048kb` or `--repo-limit=10`
184
194
185 For more advanced execution logic it's possible to use a configuration file that
195 For more advanced execution logic it's possible to use a configuration file that
186 would define detailed rules which repositories and how should be indexed.
196 would define detailed rules which repositories and how should be indexed.
@@ -221,7 +231,7 b" Here's a detailed example of using :file"
221 ; limit is 1000, on the first run it will process commits 0-1000 and on the
231 ; limit is 1000, on the first run it will process commits 0-1000 and on the
222 ; second 1000-2000 commits. Help reduce memory usage, default is 50000
232 ; second 1000-2000 commits. Help reduce memory usage, default is 50000
223 ; (set -1 for unlimited)
233 ; (set -1 for unlimited)
224 commit_process_limit = 50000
234 commit_process_limit = 20000
225
235
226 ; Limit of how many repositories each run can process, default is -1 (unlimited)
236 ; Limit of how many repositories each run can process, default is -1 (unlimited)
227 ; in case of 1000s of repositories it's better to execute in chunks to not overload
237 ; in case of 1000s of repositories it's better to execute in chunks to not overload
@@ -237,7 +247,7 b" Here's a detailed example of using :file"
237
247
238 ; Do not add to index those comma separated files, this excludes
248 ; Do not add to index those comma separated files, this excludes
239 ; both search by name and content; globs syntax
249 ; both search by name and content; globs syntax
240 ; e.g index_files = *.key, *.sql, *.xml
250 ; e.g index_files = *.key, *.sql, *.xml, *.pem, *.crt
241 skip_files = ,
251 skip_files = ,
242
252
243 ; Add to index content of those comma separated files; globs syntax
253 ; Add to index content of those comma separated files; globs syntax
@@ -245,7 +255,8 b" Here's a detailed example of using :file"
245 index_files_content = *,
255 index_files_content = *,
246
256
247 ; Do not add to index content of those comma separated files; globs syntax
257 ; Do not add to index content of those comma separated files; globs syntax
248 ; e.g index_files = *.exe, *.bin, *.log, *.dump
258 ; Binary files are not indexed by default.
259 ; e.g index_files = *.min.js, *.xml, *.dump, *.log
249 skip_files_content = ,
260 skip_files_content = ,
250
261
251 ; Force rebuilding an index from scratch. Each repository will be rebuilt from
262 ; Force rebuilding an index from scratch. Each repository will be rebuilt from
@@ -255,7 +266,7 b" Here's a detailed example of using :file"
255 ; maximum file size that indexer will use, files above that limit are not going
266 ; maximum file size that indexer will use, files above that limit are not going
256 ; to have their content indexed.
267 ; to have their content indexed.
257 ; Possible options are KB (kilobytes), MB (megabytes), eg 1MB or 1024KB
268 ; Possible options are KB (kilobytes), MB (megabytes), eg 1MB or 1024KB
258 max_filesize = 2MB
269 max_filesize = 10MB
259
270
260
271
261 [__INDEX_RULES__]
272 [__INDEX_RULES__]
@@ -272,17 +283,6 b" Here's a detailed example of using :file"
272 ; This will index all repositories under upstream/*, but skip upstream/binary_repo
283 ; This will index all repositories under upstream/*, but skip upstream/binary_repo
273 ; and upstream/sub_repo/xml_files, last * = 0 means skip all other matches
284 ; and upstream/sub_repo/xml_files, last * = 0 means skip all other matches
274
285
275 ; Another example:
276 ; *-fork = 0
277 ; * = 1
278 ; This will index all repositories, except those that have -fork as suffix.
279
280 rhodecode-vcsserver = 1
281 rhodecode-enterprise-ce = 1
282 upstream/mozilla/firefox-repo = 0
283 upstream/git-binaries = 0
284 upstream/* = 1
285 * = 0
286
286
287 ; == EXPLICIT REPOSITORY INDEXING ==
287 ; == EXPLICIT REPOSITORY INDEXING ==
288 ; If defined this will skip using __INDEX_RULES__, and will not use API to fetch
288 ; If defined this will skip using __INDEX_RULES__, and will not use API to fetch
@@ -294,36 +294,32 b" Here's a detailed example of using :file"
294 ; == PER REPOSITORY CONFIGURATION ==
294 ; == PER REPOSITORY CONFIGURATION ==
295 ; This allows overriding the global configuration per repository.
295 ; This allows overriding the global configuration per repository.
296 ; example to set specific file limit, and skip certain files for repository special-repo
296 ; example to set specific file limit, and skip certain files for repository special-repo
297 ; the CLI flags doesn't override the conf settings.
297 ; [conf:special-repo]
298 ; [conf:special-repo]
298 ; max_filesize = 5mb
299 ; max_filesize = 5mb
299 ; skip_files = *.xml, *.sql
300 ; skip_files = *.xml, *.sql
300 ; index_types = files,
301
301
302 [conf:rhodecode-vcsserver]
303 index_types = files,
304 max_filesize = 5mb
305 skip_files = *.xml, *.sql
306 index_files = *.py, *.c, *.h, *.js
307
302
308
303
309 In case of 1000s of repositories it can be tricky to write the include/exclude rules at first.
304 In case of 1000s of repositories it can be tricky to write the include/exclude rules at first.
310 There's a special flag to test the mapping file rules and list repositories that would
305 There's a special flag to test the mapping file rules and list repositories that would
311 be indexed. Run the indexer with `--show-matched-repos` to list only the match rules::
306 be indexed. Run the indexer with `--show-matched-repos` to list only the
307 match repositories defined in .ini file rules::
312
308
313 rhodecode-index --instance-name=enterprise-1 --show-matched-repos --mapping=/my/path/search_mapping.ini
309 rhodecode-index --instance-name=enterprise-1 --show-matched-repos --mapping=/my/path/search_mapping.ini
314
310
315
311
316 .. _enable-elasticsearch:
312 .. _enable-elasticsearch:
317
313
318 Enabling Elasticsearch
314 Enabling ElasticSearch
319 ^^^^^^^^^^^^^^^^^^^^^^
315 ^^^^^^^^^^^^^^^^^^^^^^
320
316
321 Elasticsearch is available in EE edition only. It provides much scalable and more advanced
317 ElasticSearch is available in the EE edition only. It provides more scalable and more advanced
322 search capabilities. While Whoosh is fine for upto 1-2GB of data beyond that amount of
318 search capabilities. While Whoosh is fine for up to 1-2GB of data, beyond that amount it
323 data it starts slowing down, and can cause other problems. Elasticsearch 6 also provides
319 starts slowing down, and can cause other problems.
324 much more advanced query language allowing advanced filtering by file paths, extensions
320 ElasticSearch 6 also provides a much more advanced query language.
325 OR statements, ranges etc. Please check query language examples in the search field for
321 It allows advanced filtering by file paths and extensions, OR statements, ranges, etc.
326 some advanced query language usage.
322 Please check query language examples in the search field for some advanced query language usage.
327
323
328
324
329 1. Open the :file:`rhodecode.ini` file for the instance you wish to edit. The
325 1. Open the :file:`rhodecode.ini` file for the instance you wish to edit. The
@@ -349,15 +345,15 b' and change it to:'
349 ## specify Elastic Search version, 6 for latest or 2 for legacy
345 ## specify Elastic Search version, 6 for latest or 2 for legacy
350 search.es_version = 6
346 search.es_version = 6
351
347
352 where ``search.location`` points to the elasticsearch server
348 where ``search.location`` points to the ElasticSearch server
353 by default running on port 9200.
349 by default running on port 9200.
354
350
355 Index invocation also needs change. Please provide --es-version= and
351 Index invocation also needs change. Please provide --es-version= and
356 --engine-location= parameters to define elasticsearch server location and it's version.
352 --engine-location= parameters to define ElasticSearch server location and its version.
357 For example::
353 For example::
358
354
359 rhodecode-index --instance-name=enterprise-1 --es-version=6 --engine-location=http://localhost:9200
355 rhodecode-index --instance-name=enterprise-1 --es-version=6 --engine-location=http://localhost:9200
360
356
361
357
362 .. _Whoosh: https://pypi.python.org/pypi/Whoosh/
358 .. _Whoosh: https://pypi.python.org/pypi/Whoosh/
363 .. _Elasticsearch 6: https://www.elastic.co/ No newline at end of file
359 .. _ElasticSearch 6: https://www.elastic.co/
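
As a rough illustration of how ordered include/exclude rules like those in ``[__INDEX_RULES__]`` could be evaluated, here is a minimal Python sketch. It assumes that the first matching glob pattern decides the outcome, which is consistent with the comment that a trailing ``* = 0`` skips all other matches, but it is not taken from the rhodecode-index source. The names ``RULES`` and ``should_index`` are hypothetical and exist only for this example.

```python
from fnmatch import fnmatchcase

# Hypothetical rules mirroring the mapping-file comment: (glob pattern, flag).
# Assumption: order matters and the first pattern that matches wins.
RULES = [
    ("upstream/binary_repo", 0),
    ("upstream/sub_repo/xml_files", 0),
    ("upstream/*", 1),
    ("*", 0),
]


def should_index(repo_name, rules=RULES):
    """Return True if the first matching rule enables indexing (assumed semantics)."""
    for pattern, flag in rules:
        if fnmatchcase(repo_name, pattern):
            return flag == 1
    # No rule matched: this sketch skips the repository by default.
    return False


if __name__ == "__main__":
    for name in ("upstream/project", "upstream/binary_repo", "my-repo"):
        print(name, should_index(name))
```

To see what the real indexer would select for a given mapping file, the ``--show-matched-repos`` flag described in the documentation remains the authoritative check; the sketch only helps reason about rule ordering before running it.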