@@ -3,10 +3,15 b'' | |||||
3 | Full-text Search |
|
3 | Full-text Search | |
4 | ---------------- |
|
4 | ---------------- | |
5 |
|
5 | |||
|
6 | RhodeCode provides full-text search capabilities for searching inside file content, | |||

7 | commit messages, and file paths. Indexing is not enabled by default; to use | |||

8 | full-text search, you must first build an index. | |||
|
9 | ||||
6 | By default RhodeCode is configured to use `Whoosh`_ to index |repos| and |
|
10 | By default RhodeCode is configured to use `Whoosh`_ to index |repos| and | |
7 | provide full-text search. |
|
11 | provide full-text search. `Whoosh`_ works well for a small amount of data and | |

12 | should not be used for large code bases or large numbers of repositories. | |||
8 |
|
13 | |||
9 |
|RCE| also provides support for `Elastic |
|
14 | |RCE| also provides support for `ElasticSearch 6`_ as a backend for more advanced | |
10 | and scalable search. See :ref:`enable-elasticsearch` for details. |
|
15 | and scalable search. See :ref:`enable-elasticsearch` for details. | |
11 |
|
16 | |||
12 | Indexing |
|
17 | Indexing | |
@@ -14,18 +19,20 b' Indexing' | |||||
14 |
|
19 | |||
15 | To run the indexer you need to have an |authtoken| with admin rights to all |repos|. |
|
20 | To run the indexer you need to have an |authtoken| with admin rights to all |repos|. | |
16 |
|
21 | |||
17 |
To index |
|
22 | To index repositories stored in RhodeCode, you have the option to set the indexer up in a | |
18 | number of ways, for example: |
|
23 | number of ways, for example: | |
19 |
|
24 | |||
20 | * Call the indexer via a cron job. We recommend running this once at night. |
|
25 | * Call the indexer via a cron job. We recommend running this once at night. | |
21 | If you need new content indexed sooner, it's possible to run the indexer a few

26 | If you need new content indexed sooner, it's possible to run the indexer a few | |
22 | times during the day.

27 | times during the day. The indexer has a locking mechanism that prevents | |

28 | two instances from running at once, so it is safe to run it as often as every hour. | |||
23 | * Set the indexer to infinitely loop and reindex as soon as it has run its previous cycle. |
|
29 | * Set the indexer to infinitely loop and reindex as soon as it has run its previous cycle. | |
24 | * Hook the indexer up with your CI server to reindex after each push. |
|
30 | * Hook the indexer up with your CI server to reindex after each push. | |
25 |
|
31 | |||
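The cron option above can be sketched as a small Python helper that assembles the indexer command and formats a nightly crontab entry. This is only an illustration: ``build_index_command`` and ``crontab_line`` are hypothetical names, the 02:00 schedule is an arbitrary choice, and the binary path is the one used in the examples of this document. Only the ``--instance-name``, ``--mapping`` and ``--force`` flags mentioned in this document are used.

```python
import shlex

# Path taken from the examples in this document; adjust for your installation.
INDEXER = "/home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index"


def build_index_command(instance_name, mapping=None, force=False):
    """Build the argument list for one indexer run (hypothetical helper)."""
    cmd = [INDEXER, f"--instance-name={instance_name}"]
    if mapping:
        cmd.append(f"--mapping={mapping}")
    if force:
        cmd.append("--force")
    return cmd


def crontab_line(cmd, hour=2, minute=0):
    """Format a crontab entry that runs the command once a night."""
    return f"{minute} {hour} * * * {shlex.join(cmd)}"


# Nightly run for the instance used throughout this document.
nightly = build_index_command("enterprise-1")
print(crontab_line(nightly))
```

The printed line can be pasted into the crontab of the user that owns the instance. Because the indexer refuses to start a second instance while one is running, a tighter schedule should not cause overlapping runs.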
26 |
The indexer works by indexing new commits added since the last run |
|
32 | The indexer works by indexing new commits added since the last run, and comparing | |
27 | wish to build a brand new index from scratch each time, |
|
33 | file changes to index only new or modified files. | |
28 | use the ``force`` option in the configuration file. |
|
34 | If you wish to build a brand new index from scratch each time, use the ``force`` | |
|
35 | option in the configuration file, or run the indexer with the ``--force`` flag. | |||
29 |
|
36 | |||
30 | .. important:: |
|
37 | .. important:: | |
31 |
|
38 | |||
@@ -48,7 +55,7 b' Configure the ``.rhoderc`` File' | |||||
48 |
|
55 | |||
49 | Optionally, it's possible to use the indexer without the ``.rhoderc`` file. Instead of

56 | Optionally, it's possible to use the indexer without the ``.rhoderc`` file. Instead of | |
50 | executing with `--instance-name=enterprise-1`, provide the host and token

57 | executing with `--instance-name=enterprise-1`, provide the host and token | |
51 |
directly: `--api-host=http://127.0.0.1:10000 --api-key=<auth |
|
58 | directly: `--api-host=http://127.0.0.1:10000 --api-key=<auth-token-goes-here>` | |
52 |
|
59 | |||
53 |
|
60 | |||
54 | |RCT| uses the :file:`/home/{user}/.rhoderc` file for connection details |
|
61 | |RCT| uses the :file:`/home/{user}/.rhoderc` file for connection details | |
@@ -89,14 +96,15 b' Run the indexer using the following comm' | |||||
89 |
|
96 | |||
90 | .. code-block:: bash |
|
97 | .. code-block:: bash | |
91 |
|
98 | |||
92 |
# Using default |
|
99 | # Using the default, simplest indexing of all repositories | |
93 | $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \ |
|
100 | $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \ | |
94 | --instance-name=enterprise-1 |
|
101 | --instance-name=enterprise-1 | |
95 |
|
102 | |||
96 | # Using a custom mapping file |
|
103 | # Using a custom mapping file with indexing rules, and using the ElasticSearch 6 backend | |
97 | $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \ |
|
104 | $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \ | |
98 | --instance-name=enterprise-1 \ |
|
105 | --instance-name=enterprise-1 \ | |
99 | --mapping=/home/user/.rccontrol/enterprise-1/search_mapping.ini |
|
106 | --mapping=/home/user/.rccontrol/enterprise-1/search_mapping.ini \ | |
|
107 | --es-version=6 --engine-location=http://elasticsearch-host:9200 | |||
100 |
|
108 | |||
101 | # Using a custom mapping file and invocation without ``.rhoderc`` |
|
109 | # Using a custom mapping file and invocation without ``.rhoderc`` | |
102 | $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \ |
|
110 | $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \ | |
@@ -170,6 +178,8 b' Removing repositories from index' | |||||
170 | ++++++++++++++++++++++++++++++++ |
|
178 | ++++++++++++++++++++++++++++++++ | |
171 |
|
179 | |||
172 | The indexer automatically removes renamed repositories and builds index for new names. |
|
180 | The indexer automatically removes renamed repositories and builds index for new names. | |
|
181 | In the same way, if a repository listed in mapping.ini is not reported as existing by the | |||

182 | server, it is removed from the index. | |||
173 | If you wish to remove an indexed repository manually, use a call like this::

183 | If you wish to remove an indexed repository manually, use a call like this:: | |
174 |
|
184 | |||
175 | rhodecode-index --instance-name=enterprise-1 --remove-only --repo-name=rhodecode-vcsserver |
|
185 | rhodecode-index --instance-name=enterprise-1 --remove-only --repo-name=rhodecode-vcsserver | |
@@ -180,7 +190,7 b' Using search_mapping.ini file for advanc' | |||||
180 |
|
190 | |||
181 | By default rhodecode-index runs for all repositories, all files with parsing limits |
|
191 | By default rhodecode-index runs for all repositories, all files with parsing limits | |
182 | defined by the CLI default arguments. You can change those limits by calling with |
|
192 | defined by the CLI default arguments. You can change those limits by calling with | |
183 |
different flags such as `--max-filesize |
|
193 | different flags such as `--max-filesize=2048kb` or `--repo-limit=10` | |
184 |
|
194 | |||
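To make the size limit concrete, here is a minimal Python sketch of how a value such as ``2048kb`` or ``10MB`` could be turned into a byte count and compared against a file size. It is not the indexer's actual parser: ``parse_filesize`` and ``should_index_content`` are hypothetical names, and the 1024-based multipliers are an assumption based on the mapping file comment that treats ``1MB`` and ``1024KB`` as equivalent.

```python
import re

# Multipliers for the two units the mapping file comments mention (KB, MB).
# Assumption: 1 MB == 1024 KB, as the "1MB or 1024KB" example suggests.
_UNITS = {"kb": 1024, "mb": 1024 * 1024}


def parse_filesize(value):
    """Convert a string such as '2048kb' or '10MB' to bytes (illustrative only)."""
    match = re.fullmatch(r"\s*(\d+)\s*(kb|mb)\s*", value, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unsupported size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


def should_index_content(size_in_bytes, limit="10MB"):
    """Files above the limit do not have their content indexed."""
    return size_in_bytes <= parse_filesize(limit)
```

For example, under these assumptions ``parse_filesize("2048kb")`` and ``parse_filesize("2MB")`` give the same byte count, so either spelling expresses the same limit.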
185 | For more advanced execution logic it's possible to use a configuration file that |
|
195 | For more advanced execution logic it's possible to use a configuration file that | |
186 | defines detailed rules for which repositories should be indexed, and how.

196 | defines detailed rules for which repositories should be indexed, and how. | |
@@ -221,7 +231,7 b" Here's a detailed example of using :file" | |||||
221 | ; limit is 1000, on the first run it will process commits 0-1000 and on the |
|
231 | ; limit is 1000, on the first run it will process commits 0-1000 and on the | |
222 | ; second 1000-2000 commits. Helps reduce memory usage, default is 50000

232 | ; second 1000-2000 commits. Helps reduce memory usage, default is 50000 | |
223 | ; (set -1 for unlimited) |
|
233 | ; (set -1 for unlimited) | |
224 |
commit_process_limit = |
|
234 | commit_process_limit = 20000 | |
225 |
|
235 | |||
226 | ; Limit of how many repositories each run can process, default is -1 (unlimited) |
|
236 | ; Limit of how many repositories each run can process, default is -1 (unlimited) | |
227 | ; in case of 1000s of repositories it's better to execute in chunks to not overload |
|
237 | ; in case of 1000s of repositories it's better to execute in chunks to not overload | |
@@ -237,7 +247,7 b" Here's a detailed example of using :file" | |||||
237 |
|
247 | |||
238 | ; Do not add to index those comma separated files, this excludes |
|
248 | ; Do not add to index those comma separated files, this excludes | |
239 | ; both search by name and content; globs syntax |
|
249 | ; both search by name and content; globs syntax | |
240 | ; e.g skip_files = *.key, *.sql, *.xml

250 | ; e.g skip_files = *.key, *.sql, *.xml, *.pem, *.crt | |
241 | skip_files = , |
|
251 | skip_files = , | |
242 |
|
252 | |||
243 | ; Add to index content of those comma separated files; globs syntax |
|
253 | ; Add to index content of those comma separated files; globs syntax | |
@@ -245,7 +255,8 b" Here's a detailed example of using :file" | |||||
245 | index_files_content = *, |
|
255 | index_files_content = *, | |
246 |
|
256 | |||
247 | ; Do not add to index content of those comma separated files; globs syntax |
|
257 | ; Do not add to index content of those comma separated files; globs syntax | |
248 | ; e.g skip_files_content = *.exe, *.bin, *.log, *.dump

258 | ; Binary files are not indexed by default. | |

259 | ; e.g skip_files_content = *.min.js, *.xml, *.dump, *.log | |||
249 | skip_files_content = , |
|
260 | skip_files_content = , | |
250 |
|
261 | |||
251 | ; Force rebuilding an index from scratch. Each repository will be rebuilt from

262 | ; Force rebuilding an index from scratch. Each repository will be rebuilt from | |
@@ -255,7 +266,7 b" Here's a detailed example of using :file" | |||||
255 | ; maximum file size that indexer will use, files above that limit are not going |
|
266 | ; maximum file size that indexer will use, files above that limit are not going | |
256 | ; to have their content indexed.

267 | ; to have their content indexed. | |
257 | ; Possible options are KB (kilobytes), MB (megabytes), eg 1MB or 1024KB |
|
268 | ; Possible options are KB (kilobytes), MB (megabytes), eg 1MB or 1024KB | |
258 |
max_filesize = |
|
269 | max_filesize = 10MB | |
259 |
|
270 | |||
260 |
|
271 | |||
261 | [__INDEX_RULES__] |
|
272 | [__INDEX_RULES__] | |
@@ -272,17 +283,6 b" Here's a detailed example of using :file" | |||||
272 | ; This will index all repositories under upstream/*, but skip upstream/binary_repo |
|
283 | ; This will index all repositories under upstream/*, but skip upstream/binary_repo | |
273 | ; and upstream/sub_repo/xml_files, last * = 0 means skip all other matches |
|
284 | ; and upstream/sub_repo/xml_files, last * = 0 means skip all other matches | |
274 |
|
285 | |||
275 | ; Another example: |
|
|||
276 | ; *-fork = 0 |
|
|||
277 | ; * = 1 |
|
|||
278 | ; This will index all repositories, except those that have -fork as suffix. |
|
|||
279 |
|
||||
280 | rhodecode-vcsserver = 1 |
|
|||
281 | rhodecode-enterprise-ce = 1 |
|
|||
282 | upstream/mozilla/firefox-repo = 0 |
|
|||
283 | upstream/git-binaries = 0 |
|
|||
284 | upstream/* = 1 |
|
|||
285 | * = 0 |
|
|||
286 |
|
286 | |||
287 | ; == EXPLICIT REPOSITORY INDEXING == |
|
287 | ; == EXPLICIT REPOSITORY INDEXING == | |
288 | ; If defined this will skip using __INDEX_RULES__, and will not use API to fetch |
|
288 | ; If defined this will skip using __INDEX_RULES__, and will not use API to fetch | |
@@ -294,36 +294,32 b" Here's a detailed example of using :file" | |||||
294 | ; == PER REPOSITORY CONFIGURATION == |
|
294 | ; == PER REPOSITORY CONFIGURATION == | |
295 | ; This allows overriding the global configuration per repository. |
|
295 | ; This allows overriding the global configuration per repository. | |
296 | ; example to set specific file limit, and skip certain files for repository special-repo |
|
296 | ; example to set specific file limit, and skip certain files for repository special-repo | |
|
297 | ; the CLI flags don't override the conf settings. | |||
297 | ; [conf:special-repo] |
|
298 | ; [conf:special-repo] | |
298 | ; max_filesize = 5mb |
|
299 | ; max_filesize = 5mb | |
299 | ; skip_files = *.xml, *.sql |
|
300 | ; skip_files = *.xml, *.sql | |
300 | ; index_types = files, |
|
|||
301 |
|
301 | |||
302 | [conf:rhodecode-vcsserver] |
|
|||
303 | index_types = files, |
|
|||
304 | max_filesize = 5mb |
|
|||
305 | skip_files = *.xml, *.sql |
|
|||
306 | index_files = *.py, *.c, *.h, *.js |
|
|||
307 |
|
302 | |||
308 |
|
303 | |||
309 | In case of 1000s of repositories it can be tricky to write the include/exclude rules at first. |
|
304 | In case of 1000s of repositories it can be tricky to write the include/exclude rules at first. | |
310 | There's a special flag to test the mapping file rules and list repositories that would |
|
305 | There's a special flag to test the mapping file rules and list repositories that would | |
311 |
be indexed. Run the indexer with `--show-matched-repos` to list only the |
|
306 | be indexed. Run the indexer with `--show-matched-repos` to list only the | |
|
307 | repositories matched by the rules defined in the .ini file:: | |||
312 |
|
308 | |||
313 | rhodecode-index --instance-name=enterprise-1 --show-matched-repos --mapping=/my/path/search_mapping.ini |
|
309 | rhodecode-index --instance-name=enterprise-1 --show-matched-repos --mapping=/my/path/search_mapping.ini | |
314 |
|
310 | |||
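When drafting rules, it can help to reason about them offline first. The following Python sketch mimics how an ordered list of glob rules like those in ``[__INDEX_RULES__]`` might be evaluated. It assumes that the first matching rule wins and that ``1`` means index and ``0`` means skip, as the comments in the example mapping file suggest. ``is_indexed`` and ``matched_repos`` are hypothetical helpers, not part of ``rhodecode-index``, and the real matching logic may differ.

```python
from fnmatch import fnmatchcase

# Ordered (pattern, flag) pairs mirroring an [__INDEX_RULES__] section;
# 1 means index, 0 means skip. First-match-wins is an assumption here.
RULES = [
    ("upstream/binary_repo", 0),
    ("upstream/sub_repo/xml_files", 0),
    ("upstream/*", 1),
    ("*", 0),
]


def is_indexed(repo_name, rules=RULES):
    """Return True if the first matching rule enables indexing."""
    for pattern, flag in rules:
        if fnmatchcase(repo_name, pattern):
            return flag == 1
    return False  # no rule matched; treated as skipped in this sketch


def matched_repos(repo_names, rules=RULES):
    """Filter a list of repository names down to those that would be indexed."""
    return [name for name in repo_names if is_indexed(name, rules)]
```

The output of ``--show-matched-repos`` remains the authoritative answer; a sketch like this is only useful for checking the ordering of rules before running the indexer against a large installation.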
315 |
|
311 | |||
316 | .. _enable-elasticsearch: |
|
312 | .. _enable-elasticsearch: | |
317 |
|
313 | |||
318 |
Enabling Elastic |
|
314 | Enabling ElasticSearch | |
319 | ^^^^^^^^^^^^^^^^^^^^^^ |
|
315 | ^^^^^^^^^^^^^^^^^^^^^^ | |
320 |
|
316 | |||
321 |
Elastic |
|
317 | ElasticSearch is available in EE edition only. It provides much more scalable and advanced | |
322 |
search capabilities. While Whoosh is fine for upto 1-2GB of data beyond that amount |
|
318 | search capabilities. While Whoosh is fine for up to 1-2GB of data, beyond that amount it | |
323 |
|
|
319 | starts slowing down, and can cause other problems. | |
324 | much more advanced query language allowing advanced filtering by file paths, extensions |
|
320 | ElasticSearch 6 also provides a much more advanced query language. | |
325 | OR statements, ranges etc. Please check query language examples in the search field for |
|
321 | It allows advanced filtering by file paths and extensions, OR statements, ranges, etc. | |
326 | some advanced query language usage. |
|
322 | Please check query language examples in the search field for some advanced query language usage. | |
327 |
|
323 | |||
328 |
|
324 | |||
329 | 1. Open the :file:`rhodecode.ini` file for the instance you wish to edit. The |
|
325 | 1. Open the :file:`rhodecode.ini` file for the instance you wish to edit. The | |
@@ -349,15 +345,15 b' and change it to:' | |||||
349 | ## specify Elastic Search version, 6 for latest or 2 for legacy |
|
345 | ## specify Elastic Search version, 6 for latest or 2 for legacy | |
350 | search.es_version = 6 |
|
346 | search.es_version = 6 | |
351 |
|
347 | |||
352 |
where ``search.location`` points to the |
|
348 | where ``search.location`` points to the ElasticSearch server | |
353 | by default running on port 9200. |
|
349 | by default running on port 9200. | |
354 |
|
350 | |||
355 | The indexer invocation also needs to change. Please provide the --es-version= and

351 | The indexer invocation also needs to change. Please provide the --es-version= and | |
356 |
--engine-location= parameters to define |
|
352 | --engine-location= parameters to define the ElasticSearch server location and its version. | |
357 | For example:: |
|
353 | For example:: | |
358 |
|
354 | |||
359 | rhodecode-index --instance-name=enterprise-1 --es-version=6 --engine-location=http://localhost:9200

355 | rhodecode-index --instance-name=enterprise-1 --es-version=6 --engine-location=http://localhost:9200 | |
360 |
|
356 | |||
361 |
|
357 | |||
362 | .. _Whoosh: https://pypi.python.org/pypi/Whoosh/ |
|
358 | .. _Whoosh: https://pypi.python.org/pypi/Whoosh/ | |
363 |
.. _Elastic |
|
359 | .. _ElasticSearch 6: https://www.elastic.co/ |