docs: updated indexer documentation
marcink -
r3482:4b2dd92b default
@@ -3,10 +3,15 b''
3 3 Full-text Search
4 4 ----------------
5 5
6 RhodeCode provides full-text search capabilities to search inside file content,
7 commit messages, and file paths. Indexing is not enabled by default; to use
8 full-text search, you must first build an index.
9
6 10 By default RhodeCode is configured to use `Whoosh`_ to index |repos| and
7 provide full-text search.
11 provide full-text search. `Whoosh`_ works well for a small amount of data and
12 shouldn't be used in case of large code-bases and lots of repositories.
8 13
9 |RCE| also provides support for `Elasticsearch 6`_ as a backend more for advanced
14 |RCE| also provides support for `ElasticSearch 6`_ as a backend for more advanced
10 15 and scalable search. See :ref:`enable-elasticsearch` for details.
11 16
12 17 Indexing
@@ -14,18 +19,20 b' Indexing'
14 19
15 20 To run the indexer you need to have an |authtoken| with admin rights to all |repos|.
16 21
17 To index new content added, you have the option to set the indexer up in a
22 To index repositories stored in RhodeCode, you have the option to set the indexer up in a
18 23 number of ways, for example:
19 24
20 25 * Call the indexer via a cron job. We recommend running this once at night.
21 26 In case you need everything indexed immediately it's possible to index few
22 times during the day.
27 times during the day. The indexer has a locking mechanism that prevents
28 two instances from running at once, so it is safe to run it as often as every hour.
23 29 * Set the indexer to infinitely loop and reindex as soon as it has run its previous cycle.
24 30 * Hook the indexer up with your CI server to reindex after each push.
25 31
26 The indexer works by indexing new commits added since the last run. If you
27 wish to build a brand new index from scratch each time,
28 use the ``force`` option in the configuration file.
32 The indexer works by indexing new commits added since the last run, and comparing
33 file changes to index only new or modified files.
34 If you wish to build a brand new index from scratch each time, use the ``force``
35 option in the configuration file, or run the indexer with the ``--force`` flag.
29 36
30 37 .. important::
31 38
@@ -48,7 +55,7 b' Configure the ``.rhoderc`` File'
48 55
49 56 Optionally it's possible to use indexer without the ``.rhoderc``. Simply instead of
50 57 executing with `--instance-name=enterprise-1` execute providing the host and token
51 directly: `--api-host=http://127.0.0.1:10000 --api-key=<auth token goes here>
58 directly: `--api-host=http://127.0.0.1:10000 --api-key=<auth-token-goes-here>`
52 59
53 60
54 61 |RCT| uses the :file:`/home/{user}/.rhoderc` file for connection details
@@ -89,14 +96,15 b' Run the indexer using the following comm'
89 96
90 97 .. code-block:: bash
91 98
92 # Using default installation
99 # Using the default, simplest indexing of all repositories
93 100 $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
94 101 --instance-name=enterprise-1
95 102
96 # Using a custom mapping file
103 # Using a custom mapping file with indexing rules, and the ElasticSearch 6 backend
97 104 $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
98 105 --instance-name=enterprise-1 \
99 --mapping=/home/user/.rccontrol/enterprise-1/search_mapping.ini
106 --mapping=/home/user/.rccontrol/enterprise-1/search_mapping.ini \
107 --es-version=6 --engine-location=http://elasticsearch-host:9200
100 108
101 109 # Using a custom mapping file and invocation without ``.rhoderc``
102 110 $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
@@ -170,6 +178,8 b' Removing repositories from index'
170 178 ++++++++++++++++++++++++++++++++
171 179
172 180 The indexer automatically removes renamed repositories and builds index for new names.
181 Likewise, if a repository listed in mapping.ini is not reported as existing by the
182 server, it is removed from the index.
173 183 To remove an indexed repository manually, use a call such as::
174 184
175 185 rhodecode-index --instance-name=enterprise-1 --remove-only --repo-name=rhodecode-vcsserver
@@ -180,7 +190,7 b' Using search_mapping.ini file for advanc'
180 190
181 191 By default rhodecode-index runs for all repositories, all files with parsing limits
182 192 defined by the CLI default arguments. You can change those limits by calling with
183 different flags such as `--max-filesize 2048kb` or `--repo-limit 10`
193 different flags such as `--max-filesize=2048kb` or `--repo-limit=10`
184 194
185 195 For more advanced execution logic it's possible to use a configuration file that
186 196 would define detailed rules which repositories and how should be indexed.
@@ -221,7 +231,7 b" Here's a detailed example of using :file"
221 231 ; limit is 1000, on the first run it will process commits 0-1000 and on the
222 232 ; second 1000-2000 commits. Help reduce memory usage, default is 50000
223 233 ; (set -1 for unlimited)
224 commit_process_limit = 50000
234 commit_process_limit = 20000
225 235
226 236 ; Limit of how many repositories each run can process, default is -1 (unlimited)
227 237 ; in case of 1000s of repositories it's better to execute in chunks to not overload
@@ -237,7 +247,7 b" Here's a detailed example of using :file"
237 247
238 248 ; Do not add to index those comma separated files, this excludes
239 249 ; both search by name and content; globs syntax
240 ; e.g index_files = *.key, *.sql, *.xml
250 ; e.g skip_files = *.key, *.sql, *.xml, *.pem, *.crt
241 251 skip_files = ,
242 252
243 253 ; Add to index content of those comma separated files; globs syntax
@@ -245,7 +255,8 b" Here's a detailed example of using :file"
245 255 index_files_content = *,
246 256
247 257 ; Do not add to index content of those comma separated files; globs syntax
248 ; e.g index_files = *.exe, *.bin, *.log, *.dump
258 ; Binary files are not indexed by default.
259 ; e.g skip_files_content = *.min.js, *.xml, *.dump, *.log
249 260 skip_files_content = ,
250 261
251 262 ; Force rebuilding an index from scratch. Each repository will be rebuilt from
@@ -255,7 +266,7 b" Here's a detailed example of using :file"
255 266 ; maximum file size that indexer will use, files above that limit are not going
256 267 ; to have their content indexed.
257 268 ; Possible options are KB (kilobytes), MB (megabytes), eg 1MB or 1024KB
258 max_filesize = 2MB
269 max_filesize = 10MB
259 270
260 271
261 272 [__INDEX_RULES__]
@@ -272,17 +283,6 b" Here's a detailed example of using :file"
272 283 ; This will index all repositories under upstream/*, but skip upstream/binary_repo
273 284 ; and upstream/sub_repo/xml_files, last * = 0 means skip all other matches
274 285
275 ; Another example:
276 ; *-fork = 0
277 ; * = 1
278 ; This will index all repositories, except those that have -fork as suffix.
279
280 rhodecode-vcsserver = 1
281 rhodecode-enterprise-ce = 1
282 upstream/mozilla/firefox-repo = 0
283 upstream/git-binaries = 0
284 upstream/* = 1
285 * = 0
286 286
287 287 ; == EXPLICIT REPOSITORY INDEXING ==
288 288 ; If defined this will skip using __INDEX_RULES__, and will not use API to fetch
@@ -294,36 +294,32 b" Here's a detailed example of using :file"
294 294 ; == PER REPOSITORY CONFIGURATION ==
295 295 ; This allows overriding the global configuration per repository.
296 296 ; example to set specific file limit, and skip certain files for repository special-repo
297 ; the CLI flags do not override the conf settings.
297 298 ; [conf:special-repo]
298 299 ; max_filesize = 5mb
299 300 ; skip_files = *.xml, *.sql
300 ; index_types = files,
301 301
302 [conf:rhodecode-vcsserver]
303 index_types = files,
304 max_filesize = 5mb
305 skip_files = *.xml, *.sql
306 index_files = *.py, *.c, *.h, *.js
307 302
308 303
309 304 In case of 1000s of repositories it can be tricky to write the include/exclude rules at first.
310 305 There's a special flag to test the mapping file rules and list repositories that would
311 be indexed. Run the indexer with `--show-matched-repos` to list only the match rules::
306 be indexed. Run the indexer with `--show-matched-repos` to list only the
307 repositories matched by the rules in the .ini file::
312 308
313 309 rhodecode-index --instance-name=enterprise-1 --show-matched-repos --mapping=/my/path/search_mapping.ini
314 310
315 311
316 312 .. _enable-elasticsearch:
317 313
318 Enabling Elasticsearch
314 Enabling ElasticSearch
319 315 ^^^^^^^^^^^^^^^^^^^^^^
320 316
321 Elasticsearch is available in EE edition only. It provides much scalable and more advanced
322 search capabilities. While Whoosh is fine for upto 1-2GB of data beyond that amount of
323 data it starts slowing down, and can cause other problems. Elasticsearch 6 also provides
324 much more advanced query language allowing advanced filtering by file paths, extensions
325 OR statements, ranges etc. Please check query language examples in the search field for
326 some advanced query language usage.
317 ElasticSearch is available in the EE edition only. It provides much more scalable and advanced
318 search capabilities. While Whoosh is fine for up to 1-2GB of data, beyond that amount it
319 starts slowing down, and can cause other problems.
320 ElasticSearch 6 also provides a much more advanced query language.
321 It allows advanced filtering by file paths and extensions, OR statements, ranges, etc.
322 Please check the query language examples in the search field for advanced usage.
327 323
328 324
329 325 1. Open the :file:`rhodecode.ini` file for the instance you wish to edit. The
@@ -349,15 +345,15 b' and change it to:'
349 345 ## specify Elastic Search version, 6 for latest or 2 for legacy
350 346 search.es_version = 6
351 347
352 where ``search.location`` points to the elasticsearch server
348 where ``search.location`` points to the ElasticSearch server
353 349 by default running on port 9200.
354 350
355 351 The indexer invocation also needs to change. Please provide the --es-version= and
356 --engine-location= parameters to define elasticsearch server location and it's version.
352 --engine-location= parameters to define the ElasticSearch server location and its version.
357 353 For example::
358 354
359 355 rhodecode-index --instance-name=enterprise-1 --es-version=6 --engine-location=http://localhost:9200
360 356
361 357
362 358 .. _Whoosh: https://pypi.python.org/pypi/Whoosh/
363 .. _Elasticsearch 6: https://www.elastic.co/ No newline at end of file
359 .. _ElasticSearch 6: https://www.elastic.co/
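
The ``__INDEX_RULES__`` section touched by this change maps glob patterns to ``1`` (index) or ``0`` (skip). As a rough illustration only, the sketch below assumes rules are evaluated top to bottom and the first matching pattern wins; the trailing ``* = 0`` catch-all in the docs suggests this, but the docs do not state it outright. The function name ``should_index``, the use of ``fnmatch``, and the skip-when-nothing-matches fallback are all assumptions for illustration, not the actual ``rhodecode-index`` internals.

```python
from fnmatch import fnmatchcase

# Hypothetical rule list, copied from the example this diff removes.
# Order matters under the assumed first-match-wins behaviour.
INDEX_RULES = [
    ("rhodecode-vcsserver", 1),
    ("rhodecode-enterprise-ce", 1),
    ("upstream/mozilla/firefox-repo", 0),
    ("upstream/git-binaries", 0),
    ("upstream/*", 1),
    ("*", 0),
]


def should_index(repo_name, rules=INDEX_RULES):
    """Return True if the first rule matching repo_name has value 1."""
    for pattern, value in rules:
        # fnmatchcase does case-sensitive glob matching; '*' also matches '/'.
        if fnmatchcase(repo_name, pattern):
            return value == 1
    # Assumption: a repository that matches no rule is skipped.
    return False


if __name__ == "__main__":
    for name in ("rhodecode-vcsserver", "upstream/git-binaries",
                 "upstream/some/lib", "random-repo"):
        print(name, should_index(name))
```

To see what the indexer itself would select, the ``--show-matched-repos`` flag described in the diff remains the authoritative check; the sketch only helps reason about rule ordering before running it.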