@@ -3,10 +3,15 b'' | |||
|
3 | 3 | Full-text Search |
|
4 | 4 | ---------------- |
|
5 | 5 | |
|
6 | RhodeCode provides full-text search capabilities to search inside file content, | |

7 | commit messages, and file paths. Indexing is not enabled by default; to use | |

8 | full-text search, building an index is a prerequisite. | |
|
9 | ||
|
6 | 10 | By default RhodeCode is configured to use `Whoosh`_ to index |repos| and |
|
7 | provide full-text search. | |
|
11 | provide full-text search. `Whoosh`_ works well for a small amount of data and | |
|
12 | shouldn't be used in case of large code-bases and lots of repositories. | |
|
8 | 13 | |
|
9 |
|RCE| also provides support for `Elastic |
|
|
14 | |RCE| also provides support for `ElasticSearch 6`_ as a backend for more advanced | |
|
10 | 15 | and scalable search. See :ref:`enable-elasticsearch` for details. |
|
11 | 16 | |
|
12 | 17 | Indexing |
@@ -14,18 +19,20 b' Indexing' | |||
|
14 | 19 | |
|
15 | 20 | To run the indexer you need to have an |authtoken| with admin rights to all |repos|. |
|
16 | 21 | |
|
17 |
To index |
|
|
22 | To index repositories stored in RhodeCode, you have the option to set the indexer up in a | |
|
18 | 23 | number of ways, for example: |
|
19 | 24 | |
|
20 | 25 | * Call the indexer via a cron job. We recommend running this once at night. |
|
21 | 26 | In case you need everything indexed immediately it's possible to index a few

22 | times during the day. | |

27 | times during the day. The indexer has a special locking mechanism that won't allow | |

28 | two instances of the indexer to run at once. It's safe to run it even every hour. | |
|
23 | 29 | * Set the indexer to infinitely loop and reindex as soon as it has run its previous cycle. |
|
24 | 30 | * Hook the indexer up with your CI server to reindex after each push. |
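
As a minimal sketch of the cron option listed above, the helper below prints a crontab line that runs the indexer once per night. The function name ``indexer_cron_line`` and the 02:00 default are assumptions made for illustration; the binary path mirrors the install layout used in the examples in this document and may differ on your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of RhodeCode): print a crontab line that
# runs rhodecode-index once per day at the given hour.
indexer_cron_line() {
    local instance="$1"      # instance name, e.g. enterprise-1
    local hour="${2:-2}"     # hour of day (0-23); 02:00 is an assumed default
    # Path follows the layout used in this document's examples; adjust as needed.
    local bin="/home/user/.rccontrol/${instance}/profile/bin/rhodecode-index"
    printf '0 %s * * * %s --instance-name=%s\n' "$hour" "$bin" "$instance"
}

# Print the line so it can be reviewed before adding it with `crontab -e`.
indexer_cron_line enterprise-1 2
```

Printing the line rather than installing it keeps the sketch side-effect free; the printed line can be pasted into the crontab of the user that owns the instance.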
|
25 | 31 | |
|
26 |
The indexer works by indexing new commits added since the last run |
|
|
27 | wish to build a brand new index from scratch each time, | |
|
28 | use the ``force`` option in the configuration file. | |
|
32 | The indexer works by indexing new commits added since the last run, and comparing | |
|
33 | file changes to index only new or modified files. | |
|
34 | If you wish to build a brand new index from scratch each time, use the ``force`` | |
|
35 | option in the configuration file, or run it with the ``--force`` flag. | |
|
29 | 36 | |
|
30 | 37 | .. important:: |
|
31 | 38 | |
@@ -48,7 +55,7 b' Configure the ``.rhoderc`` File' | |||
|
48 | 55 | |
|
49 | 56 | Optionally, it's possible to use the indexer without the ``.rhoderc`` file. Instead of

50 | 57 | executing with `--instance-name=enterprise-1`, provide the host and token
|
51 |
directly: `--api-host=http://127.0.0.1:10000 --api-key=<auth |
|
|
58 | directly: `--api-host=http://127.0.0.1:10000 --api-key=<auth-token-goes-here>` | |
|
52 | 59 | |
|
53 | 60 | |
|
54 | 61 | |RCT| uses the :file:`/home/{user}/.rhoderc` file for connection details |
@@ -89,14 +96,15 b' Run the indexer using the following comm' | |||
|
89 | 96 | |
|
90 | 97 | .. code-block:: bash |
|
91 | 98 | |
|
92 |
# Using default |
|
|
99 | # Using default, simple indexing of all repositories | |
|
93 | 100 | $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \ |
|
94 | 101 | --instance-name=enterprise-1 |
|
95 | 102 | |
|
96 | # Using a custom mapping file | |
|
103 | # Using a custom mapping file with indexing rules, and using the ElasticSearch 6 backend | |
|
97 | 104 | $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \ |
|
98 | 105 | --instance-name=enterprise-1 \ |
|
99 | --mapping=/home/user/.rccontrol/enterprise-1/search_mapping.ini | |
|
106 | --mapping=/home/user/.rccontrol/enterprise-1/search_mapping.ini \ | |
|
107 | --es-version=6 --engine-location=http://elasticsearch-host:9200 | |
|
100 | 108 | |
|
101 | 109 | # Using a custom mapping file and invocation without ``.rhoderc`` |
|
102 | 110 | $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \ |
@@ -170,6 +178,8 b' Removing repositories from index' | |||
|
170 | 178 | ++++++++++++++++++++++++++++++++ |
|
171 | 179 | |
|
172 | 180 | The indexer automatically removes renamed repositories and builds the index for new names.

181 | In the same way, if a repository listed in mapping.ini is not reported as existing by the | |

182 | server, it's removed from the index. | |

173 | 183 | To remove an indexed repository manually, use a call such as::
|
174 | 184 | |
|
175 | 185 | rhodecode-index --instance-name=enterprise-1 --remove-only --repo-name=rhodecode-vcsserver |
@@ -180,7 +190,7 b' Using search_mapping.ini file for advanc' | |||
|
180 | 190 | |
|
181 | 191 | By default rhodecode-index runs for all repositories, all files with parsing limits |
|
182 | 192 | defined by the CLI default arguments. You can change those limits by calling with |
|
183 |
different flags such as `--max-filesize |
|
|
193 | different flags such as `--max-filesize=2048kb` or `--repo-limit=10` | |
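
To make the size notation concrete, here is a minimal sketch that converts values such as ``2048kb`` or ``2MB`` to bytes. The helper ``size_to_bytes`` is hypothetical and is not how the indexer parses the flag; it assumes case-insensitive suffixes and 1 KB = 1024 bytes, which matches the "1MB or 1024KB" equivalence given in the mapping file comments.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rhodecode-index): convert a size such as
# 2048kb or 2MB to bytes. Assumes 1 KB = 1024 bytes and 1 MB = 1024 KB.
size_to_bytes() {
    local value
    # Normalize to lower case so KB/kb and MB/mb are treated the same.
    value="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
    case "$value" in
        *kb) echo $(( ${value%kb} * 1024 )) ;;
        *mb) echo $(( ${value%mb} * 1024 * 1024 )) ;;
        *)   echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

size_to_bytes 2048kb
size_to_bytes 2MB
```

Under these assumptions both calls print the same number of bytes, so ``--max-filesize=2048kb`` and a ``max_filesize = 2MB`` setting would describe the same limit.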
|
184 | 194 | |
|
185 | 195 | For more advanced execution logic, it's possible to use a configuration file that

186 | 196 | defines detailed rules for which repositories should be indexed and how.
@@ -221,7 +231,7 b" Here's a detailed example of using :file" | |||
|
221 | 231 | ; limit is 1000, on the first run it will process commits 0-1000 and on the |
|
222 | 232 | ; second 1000-2000 commits. Helps reduce memory usage, default is 50000
|
223 | 233 | ; (set -1 for unlimited) |
|
224 |
commit_process_limit = |
|
|
234 | commit_process_limit = 20000 | |
|
225 | 235 | |
|
226 | 236 | ; Limit of how many repositories each run can process, default is -1 (unlimited) |
|
227 | 237 | ; in case of 1000s of repositories it's better to execute in chunks to not overload |
@@ -237,7 +247,7 b" Here's a detailed example of using :file" | |||
|
237 | 247 | |
|
238 | 248 | ; Do not add to index those comma separated files, this excludes |
|
239 | 249 | ; both search by name and content; globs syntax |
|
240 | ; e.g index_files = *.key, *.sql, *.xml | |
|
250 | ; e.g index_files = *.key, *.sql, *.xml, *.pem, *.crt | |
|
241 | 251 | skip_files = , |
|
242 | 252 | |
|
243 | 253 | ; Add to index content of those comma separated files; globs syntax |
@@ -245,7 +255,8 b" Here's a detailed example of using :file" | |||
|
245 | 255 | index_files_content = *, |
|
246 | 256 | |
|
247 | 257 | ; Do not add to index content of those comma separated files; globs syntax |
|
248 | ; e.g index_files = *.exe, *.bin, *.log, *.dump | |
|
258 | ; Binary files are not indexed by default. | |
|
259 | ; e.g index_files = *.min.js, *.xml, *.dump, *.log | |
|
249 | 260 | skip_files_content = , |
|
250 | 261 | |
|
251 | 262 | ; Force rebuilding an index from scratch. Each repository will be rebuilt from
@@ -255,7 +266,7 b" Here's a detailed example of using :file" | |||
|
255 | 266 | ; maximum file size that indexer will use, files above that limit are not going

256 | 267 | ; to have their content indexed.
|
257 | 268 | ; Possible options are KB (kilobytes), MB (megabytes), eg 1MB or 1024KB |
|
258 |
max_filesize = |
|
|
269 | max_filesize = 10MB | |
|
259 | 270 | |
|
260 | 271 | |
|
261 | 272 | [__INDEX_RULES__] |
@@ -272,17 +283,6 b" Here's a detailed example of using :file" | |||
|
272 | 283 | ; This will index all repositories under upstream/*, but skip upstream/binary_repo |
|
273 | 284 | ; and upstream/sub_repo/xml_files, last * = 0 means skip all other matches |
|
274 | 285 | |
|
275 | ; Another example: | |
|
276 | ; *-fork = 0 | |
|
277 | ; * = 1 | |
|
278 | ; This will index all repositories, except those that have -fork as suffix. | |
|
279 | ||
|
280 | rhodecode-vcsserver = 1 | |
|
281 | rhodecode-enterprise-ce = 1 | |
|
282 | upstream/mozilla/firefox-repo = 0 | |
|
283 | upstream/git-binaries = 0 | |
|
284 | upstream/* = 1 | |
|
285 | * = 0 | |
|
286 | 286 | |
|
287 | 287 | ; == EXPLICIT REPOSITORY INDEXING == |
|
288 | 288 | ; If defined this will skip using __INDEX_RULES__, and will not use API to fetch |
@@ -294,36 +294,32 b" Here's a detailed example of using :file" | |||
|
294 | 294 | ; == PER REPOSITORY CONFIGURATION == |
|
295 | 295 | ; This allows overriding the global configuration per repository. |
|
296 | 296 | ; example to set specific file limit, and skip certain files for repository special-repo |
|
297 | ; the CLI flags do not override the conf settings. | |
|
297 | 298 | ; [conf:special-repo] |
|
298 | 299 | ; max_filesize = 5mb |
|
299 | 300 | ; skip_files = *.xml, *.sql |
|
300 | ; index_types = files, | |
|
301 | 301 | |
|
302 | [conf:rhodecode-vcsserver] | |
|
303 | index_types = files, | |
|
304 | max_filesize = 5mb | |
|
305 | skip_files = *.xml, *.sql | |
|
306 | index_files = *.py, *.c, *.h, *.js | |
|
307 | 302 | |
|
308 | 303 | |
|
309 | 304 | In case of 1000s of repositories it can be tricky to write the include/exclude rules at first. |
|
310 | 305 | There's a special flag to test the mapping file rules and list repositories that would |
|
311 |
be indexed. Run the indexer with `--show-matched-repos` to list only the |
|
|
306 | be indexed. Run the indexer with `--show-matched-repos` to list only the | |
|
307 | matched repositories defined in .ini file rules:: | |
|
312 | 308 | |
|
313 | 309 | rhodecode-index --instance-name=enterprise-1 --show-matched-repos --mapping=/my/path/search_mapping.ini |
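
To reason about rule ordering before running the indexer, the sketch below mimics the ``upstream/*`` example from the ``[__INDEX_RULES__]`` comments using a shell ``case`` statement. It assumes the first matching rule wins and that ``*`` also matches across ``/``; both are assumptions about the indexer's behaviour, and ``should_index`` is a hypothetical function, not part of ``rhodecode-index``. The output of ``--show-matched-repos`` remains the authoritative check.

```shell
#!/usr/bin/env bash
# Hypothetical model of ordered include/exclude rules (first match wins).
# Returns 0 when a repository would be indexed, 1 when it would be skipped.
should_index() {
    local repo="$1"
    case "$repo" in
        upstream/binary_repo)        return 1 ;;  # upstream/binary_repo = 0
        upstream/sub_repo/xml_files) return 1 ;;  # upstream/sub_repo/xml_files = 0
        upstream/*)                  return 0 ;;  # upstream/* = 1
        *)                           return 1 ;;  # * = 0, skip everything else
    esac
}

# The repository names other than the two skipped ones are made up for the demo.
for repo in upstream/project upstream/binary_repo my-team/tool; do
    if should_index "$repo"; then
        echo "index: $repo"
    else
        echo "skip:  $repo"
    fi
done
```

Placing the specific exclusions before the broader ``upstream/*`` rule is what lets them take effect in this model; reversing the order would index everything under ``upstream/``.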
|
314 | 310 | |
|
315 | 311 | |
|
316 | 312 | .. _enable-elasticsearch: |
|
317 | 313 | |
|
318 |
Enabling Elastic |
|
|
314 | Enabling ElasticSearch | |
|
319 | 315 | ^^^^^^^^^^^^^^^^^^^^^^ |
|
320 | 316 | |
|
321 |
Elastic |
|
|
322 |
search capabilities. While Whoosh is fine for upto 1-2GB of data beyond that amount |
|
|
323 |
|
|
|
324 | much more advanced query language allowing advanced filtering by file paths, extensions | |
|
325 | OR statements, ranges etc. Please check query language examples in the search field for | |
|
326 | some advanced query language usage. | |
|
317 | ElasticSearch is available in the EE edition only. It provides much more scalable and advanced | |

318 | search capabilities. While Whoosh is fine for up to 1-2GB of data, beyond that amount it | |

319 | starts slowing down, and can cause other problems. | |

320 | The new ElasticSearch 6 also provides a much more advanced query language. | |

321 | It allows advanced filtering by file paths and extensions, as well as OR statements, ranges, etc. | |

322 | Please check the query language examples in the search field for some advanced query language usage. | |
|
327 | 323 | |
|
328 | 324 | |
|
329 | 325 | 1. Open the :file:`rhodecode.ini` file for the instance you wish to edit. The |
@@ -349,15 +345,15 b' and change it to:' | |||
|
349 | 345 | ## specify Elastic Search version, 6 for latest or 2 for legacy |
|
350 | 346 | search.es_version = 6 |
|
351 | 347 | |
|
352 |
where ``search.location`` points to the |
|
|
348 | where ``search.location`` points to the ElasticSearch server | |
|
353 | 349 | by default running on port 9200. |
|
354 | 350 | |
|
355 | 351 | The indexer invocation also needs to change. Please provide the --es-version= and
|
356 |
--engine-location= parameters to define |
|
|
352 | --engine-location= parameters to define the ElasticSearch server location and its version. | |
|
357 | 353 | For example:: |
|
358 | 354 | |
|
359 | 355 | rhodecode-index --instance-name=enterprise-1 --es-version=6 --engine-location=http://localhost:9200
|
360 | 356 | |
|
361 | 357 | |
|
362 | 358 | .. _Whoosh: https://pypi.python.org/pypi/Whoosh/ |
|
363 |
.. _Elastic |
|
|
359 | .. _ElasticSearch 6: https://www.elastic.co/ |