docs: update full text search indexing documentation
marcink -
r3400:2aa02c12 default
@@ -116,7 +116,7 b' Full-text Search Backup'
116 116
117 117 You may also have full text search set up, but the index can be rebuilt from
118 118 re-imported |repos| if necessary. You will most likely want to backup your
119 :file:`mapping.ini` file if you've configured that. For more information, see
119 :file:`search_mapping.ini` file if you've configured that. For more information, see
120 120 the :ref:`indexing-ref` section.
121 121
122 122 Restoration Steps
@@ -140,7 +140,7 b' Post Restoration Steps'
140 140 Once you have restored your |RCE| instance to basic functionality, you can
141 141 then work on restoring any specific setup changes you had made.
142 142
143 * To recreate the |RCE| index, use the backed up :file:`mapping.ini` file if
143 * To recreate the |RCE| index, use the backed up :file:`search_mapping.ini` file if
144 144 you had made changes and rerun the indexer. See the
145 145 :ref:`indexing-ref` section for details.
146 146 * To reconfigure any extensions, copy the backed up extensions into the
@@ -23,9 +23,9 b' sections.'
23 23 * :ref:`increase-gunicorn`
24 24 * :ref:`x-frame`
25 25
26 \- **mapping.ini**
26 \- **search_mapping.ini**
27 27 Default location:
28 :file:`/home/{user}/.rccontrol/{instance-id}/mapping.ini`
28 :file:`/home/{user}/.rccontrol/{instance-id}/search_mapping.ini`
29 29
30 30 This file is used to control the |RCE| indexer. It comes configured
31 31 to index your instance. To change the default configuration, see
@@ -6,22 +6,21 b' Full-text Search'
6 6 By default RhodeCode is configured to use `Whoosh`_ to index |repos| and
7 7 provide full-text search.
8 8
9 |RCE| also provides support for `Elasticsearch`_ as a backend for scalable
10 search. See :ref:`enable-elasticsearch` for details.
9 |RCE| also provides support for `Elasticsearch 6`_ as a backend for more advanced
10 and scalable search. See :ref:`enable-elasticsearch` for details.
11 11
12 12 Indexing
13 13 ^^^^^^^^
14 14
15 To run the indexer you need to use an |authtoken| with admin rights to all
16 |repos|.
15 To run the indexer you need to have an |authtoken| with admin rights to all |repos|.
17 16
18 17 To index new content added, you have the option to set the indexer up in a
19 18 number of ways, for example:
20 19
21 * Call the indexer via a cron job. We recommend running this nightly,
22 unless you need everything indexed immediately.
23 * Set the indexer to infinitely loop and reindex as soon as it has run its
24 cycle.
20 * Call the indexer via a cron job. We recommend running it once at night.
21 If you need new content indexed sooner, you can run it a few
22 times during the day.
23 * Set the indexer to loop infinitely and reindex as soon as it has finished its previous cycle.
25 24 * Hook the indexer up with your CI server to reindex after each push.
26 25
27 26 The indexer works by indexing new commits added since the last run. If you
@@ -31,7 +30,7 b' use the ``force`` option in the configur'
31 30 .. important::
32 31
33 32 You need to have |RCT| installed, see :ref:`install-tools`. Since |RCE|
34 3.5.0 they are installed by default.
33 3.5.0 they are installed by default and available with both Community and Enterprise installations.
35 34
36 35 To set up indexing, use the following steps:
37 36
@@ -45,6 +44,13 b' 4. :ref:`advanced-indexing`'
45 44 Configure the ``.rhoderc`` File
46 45 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
47 46
47 .. note::
48
49 You can also run the indexer without a ``.rhoderc`` file. Instead of passing
50 ``--instance-name=enterprise-1``, provide the host and token directly:
51 ``--api-host=http://127.0.0.1:10000 --api-key=<auth token goes here>``.
52
53
48 54 |RCT| uses the :file:`/home/{user}/.rhoderc` file for connection details
49 55 to |RCE| instances. If this file is not automatically created,
50 56 you can configure it using the following example. You need to configure the
@@ -56,11 +62,11 b' details for each instance you want to in'
56 62 # of the instance you want to index
57 63 $ rccontrol status
58 64
59 - NAME: enterprise-1
60 - STATUS: RUNNING
61 - TYPE: Momentum
62 - VERSION: 1.5.0
63 - URL: http://127.0.0.1:10000
65 - NAME: enterprise-1
66 - STATUS: RUNNING
67 - TYPE: Enterprise
68 - VERSION: 4.1.0
69 - URL: http://127.0.0.1:10003
64 70
65 71 To get your API Token, on the |RCE| interface go to
66 72 :menuselection:`username --> My Account --> Auth tokens`
@@ -72,21 +78,17 b' To get your API Token, on the |RCE| inte'
72 78 [instance:enterprise-1]
73 79 api_host = http://127.0.0.1:10000
74 80 api_key = <auth token goes here>
75 repo_dir = /home/<username>/repos
81
76 82
77 83 .. _run-index:
78 84
79 85 Run the Indexer
80 86 ^^^^^^^^^^^^^^^
81 87
82 Run the indexer using the following command, and specify the instance you
83 want to index:
88 Run the indexer using the following command, and specify the instance you want to index:
84 89
85 90 .. code-block:: bash
86 91
87 # From inside a virtualevv
88 (venv)$ rhodecode-index --instance-name=enterprise-1
89
90 92 # Using default installation
91 93 $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
92 94 --instance-name=enterprise-1
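The ``[instance:enterprise-1]`` section of the ``.rhoderc`` example can also be appended from a script, for example when provisioning a CI runner. The sketch below is hypothetical and not part of |RCT|: ``write_rhoderc_section`` is an illustrative name, it writes only the three lines shown in the example, and it takes the target file as a parameter so it can be tried on a scratch file before touching ``~/.rhoderc``.

```shell
#!/bin/sh
# Hypothetical helper (not part of RhodeCode Tools): appends an
# [instance:NAME] section in the format of the .rhoderc example to FILE.

# write_rhoderc_section FILE INSTANCE_NAME API_HOST API_KEY
write_rhoderc_section() {
    file=$1 name=$2 host=$3 key=$4
    # Refuse to add a second section for the same instance.
    if [ -f "$file" ] && grep -qxF "[instance:$name]" "$file"; then
        printf 'section [instance:%s] already in %s\n' "$name" "$file" >&2
        return 1
    fi
    {
        printf '[instance:%s]\n' "$name"
        printf 'api_host = %s\n' "$host"
        printf 'api_key = %s\n\n' "$key"
    } >> "$file"
    # The file holds an admin auth token, so keep it readable by the owner only.
    chmod 600 "$file"
}

# Try it on a scratch file first.
rc_file=$(mktemp)
write_rhoderc_section "$rc_file" enterprise-1 http://127.0.0.1:10000 "<auth token goes here>"
cat "$rc_file"
rm -f "$rc_file"
```

Restricting the file to mode 600 is this sketch's own choice, made because the file stores an |authtoken| with admin rights.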
@@ -94,7 +96,16 b' want to index:'
94 96 # Using a custom mapping file
95 97 $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
96 98 --instance-name=enterprise-1 \
97 --mapping=/home/user/.rccontrol/enterprise-1/mapping.ini
99 --mapping=/home/user/.rccontrol/enterprise-1/search_mapping.ini
100
101 # Using a custom mapping file and invocation without .rhoderc
102 $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
103 --api-host=http://rhodecode.myserver.com --api-key=xxxxx \
104 --mapping=/home/user/.rccontrol/enterprise-1/search_mapping.ini
105
106 # From inside a virtualenv on your local machine or CI server.
107 (venv)$ rhodecode-index --instance-name=enterprise-1
108
98 109
99 110 .. note::
100 111
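The two invocation styles in the examples (``--instance-name`` resolved through ``.rhoderc``, or ``--api-host`` and ``--api-key`` passed directly) can be wrapped for a cron job or CI step. This is a hypothetical dry-run sketch: ``build_index_cmd`` and ``RC_INDEX_BIN`` are illustrative names, the default binary path assumes the standard ``~/.rccontrol/enterprise-1`` layout, and it only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: prints a rhodecode-index command line.
# RC_INDEX_BIN is an assumption based on the default rccontrol layout.
RC_INDEX_BIN=${RC_INDEX_BIN:-$HOME/.rccontrol/enterprise-1/profile/bin/rhodecode-index}

# build_index_cmd INSTANCE_NAME API_HOST API_KEY [MAPPING_FILE]
# Pass an empty INSTANCE_NAME to skip .rhoderc and use --api-host/--api-key.
build_index_cmd() {
    if [ -n "$1" ]; then
        cmd="$RC_INDEX_BIN --instance-name=$1"
    else
        cmd="$RC_INDEX_BIN --api-host=$2 --api-key=$3"
    fi
    # Optional custom mapping file, e.g. search_mapping.ini.
    if [ -n "${4:-}" ]; then
        cmd="$cmd --mapping=$4"
    fi
    printf '%s\n' "$cmd"
}

build_index_cmd enterprise-1 "" ""
build_index_cmd "" http://127.0.0.1:10000 "<auth token goes here>" \
    /home/user/.rccontrol/enterprise-1/search_mapping.ini
```

Printing a single string keeps the sketch simple. To actually run the indexer, pass the arguments directly rather than ``eval``-ing the printed line, so tokens and paths are not re-split by the shell.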
@@ -136,119 +147,185 b' 3. Save the file.'
136 147 # using a specially configured mapping file
137 148 */15 * * * * ~/.rccontrol/enterprise-4/profile/bin/rhodecode-index \
138 149 --instance-name=enterprise-4 \
139 --mapping=/home/user/.rccontrol/enterprise-4/mapping.ini
150 --mapping=/home/user/.rccontrol/enterprise-4/search_mapping.ini
140 151
141 152 .. _advanced-indexing:
142 153
143 154 Advanced Indexing
144 155 ^^^^^^^^^^^^^^^^^
145 156
146 |RCT| indexes based on the :file:`mapping.ini` file. To configure your index,
147 you can specify different options in this file. The default location is:
157
158 Force re-indexing a single repository
159 +++++++++++++++++++++++++++++++++++++
160
161 Sometimes a whole repository needs to be re-indexed, for example after repository changes
162 or to remove indexed secrets or files. The ``--repo-name=`` flag limits the indexer
163 to a single repository. Combine it with ``--force`` to rebuild the index for that
164 repository only::
165
166 rhodecode-index --instance-name=enterprise-1 --force --repo-name=rhodecode-vcsserver
167
168
169 Removing repositories from index
170 ++++++++++++++++++++++++++++++++
171
172 The indexer automatically removes renamed repositories and builds index for new names.
173 In case that you wish to remove indexed repository manually such call would allow that::
148 174
149 * :file:`/home/{user}/.rccontrol/{instance-id}/mapping.ini`, using default
150 |RCT|.
175 rhodecode-index --instance-name=enterprise-1 --remove-only --repo-name=rhodecode-vcsserver
176
177
178 Using search_mapping.ini file for advanced index rules
179 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
180
181 By default ``rhodecode-index`` indexes all repositories and all files, using the parsing limits
182 defined by the CLI default arguments. You can change those limits with flags such as
183 ``--max-filesize 2048kb`` or ``--repo-limit 10``.
184
185 For more advanced execution logic you can use a configuration file that defines
186 detailed rules for which repositories should be indexed, and how.
187
188 |RCT| provides an example index configuration file called :file:`search_mapping.ini`.
189 This file is created by default during installation and is located at:
190
191 * :file:`/home/{user}/.rccontrol/{instance-id}/search_mapping.ini`, using default |RCT|.
151 192 * :file:`~/venv/lib/python2.7/site-packages/rhodecode_tools/templates/mapping.ini`,
152 193 when using ``virtualenv``.
153 194
154 195 .. note::
155 196
156 If you need to create the :file:`mapping.ini` file, use the |RCT|
157 ``rhodecode-index --create-mapping path/to/file`` API call. For details,
158 see the :ref:`tools-cli` section.
159
160 The indexer runs in a random order to prevent a failing |repo| from stopping
161 a build. To configure different indexing scenarios, set the following options
162 inside the :file:`mapping.ini` and specify the altered file using the
163 ``--mapping`` option.
197 If you need to create the :file:`search_mapping.ini` file manually, use the |RCT|
198 ``rhodecode-index --create-mapping path/to/search_mapping.ini`` API call.
199 For details, see the :ref:`tools-cli` section.
164 200
165 * ``index_files`` : Index the specified file types.
166 * ``skip_files`` : Do not index the specified file types.
167 * ``index_files_content`` : Index the content of the specified file types.
168 * ``skip_files_content`` : Do not index the content of the specified files.
169 * ``force`` : Create a fresh index on each run.
170 * ``max_filesize`` : Files larger than the set size will not be indexed.
171 * ``commit_parse_limit`` : Set the batch size when indexing commit messages.
172 Set to a lower number to lessen memory load.
173 * ``repo_limit`` : Set the maximum number or |repos| indexed per run.
174 * ``[INCLUDE]`` : Set |repos| you want indexed. This takes precedent over
175 ``[EXCLUDE]``.
176 * ``[EXCLUDE]`` : Set |repos| you do not want indexed. Exclude can be used to
177 not index branches, forks, or log |repos|.
201 To run the indexer with a mapping file, pass it with the ``--mapping`` flag::
178 202
179 At the end of the file you can specify conditions for specific |repos| that
180 will override the default values. To configure your indexer,
181 use the following example :file:`mapping.ini` file.
203 rhodecode-index --instance-name=enterprise-1 --mapping=/my/path/search_mapping.ini
204
205
206 Here's a detailed example of a :file:`search_mapping.ini` file.
182 207
183 208 .. code-block:: ini
184 209
185 210 [__DEFAULT__]
186 # default patterns for indexing files and content of files.
187 # Binary files are skipped by default.
211 ; Create index on commits data, and files data in this order. Available options
212 ; are `commits`, `files`
213 index_types = commits,files
214
215 ; Commit fetch limit. The chunk size in which commits are fetched
216 ; via the API and parsed. Smaller chunks reduce the load on the server
217 commit_fetch_limit = 1000
188 218
189 # Index python and markdown files
190 index_files = *.py, *.md
219 ; Commit process limit. Limits the number of commits the indexer fetches and
220 ; stores in the full-text search index per run. E.g. if a repo has 2000 commits
221 ; and the limit is 1000, the first run processes commits 0-1000 and the
222 ; second run commits 1000-2000. Helps reduce memory usage. Default is 50000
223 ; (set -1 for unlimited)
224 commit_process_limit = 50000
191 225
192 # Do not index these file types
193 skip_files = *.svg, *.log, *.dump, *.txt
226 ; Maximum number of repositories processed per run, default is -1 (unlimited).
227 ; With thousands of repositories it's better to index in chunks so as not to overload
228 ; the server.
229 repo_limit = -1
194 230
195 # Index both file types and their content
196 index_files_content = *.cpp, *.ini, *.py
231 ; Default patterns for indexing files and content of files. Binary files
232 ; are skipped by default.
233
234 ; Add these comma-separated files to the index; glob syntax
235 ; e.g. index_files = *.py, *.c, *.h, *.js
236 index_files = *,
197 237
198 # Index file names, but not file content
199 skip_files_content = *.svg,
238 ; Do not add these comma-separated files to the index; this excludes them
239 ; from both search by name and search by content; glob syntax
240 ; e.g. skip_files = *.key, *.sql, *.xml
241 skip_files = ,
200 242
201 # Force rebuilding an index from scratch. Each repository will be rebuild
202 # from scratch with a global flag. Use local flag to rebuild single repos
243 ; Index the content of these comma-separated files; glob syntax
244 ; e.g. index_files_content = *.h, *.obj
245 index_files_content = *,
246
247 ; Do not index the content of these comma-separated files; glob syntax
248 ; e.g. skip_files_content = *.exe, *.bin, *.log, *.dump
249 skip_files_content = ,
250
251 ; Force rebuilding an index from scratch. With this global flag each repository is rebuilt
252 ; from scratch. Use --repo-name=NAME --force to rebuild a single repo
203 253 force = false
204 254
205 # Do not index files larger than 385KB
206 max_filesize = 385KB
255 ; Maximum file size the indexer will use; files above that limit will not
256 ; have their content indexed.
257 ; Possible units are KB (kilobytes) and MB (megabytes), e.g. 1MB or 1024KB
258 max_filesize = 2MB
207 259
208 # Limit commit indexing to 500 per batch
209 commit_parse_limit = 500
210
211 # Limit each index run to 25 repos
212 repo_limit = 25
213 260
214 # __INCLUDE__ is more important that __EXCLUDE__.
215
216 [__INCLUDE__]
217 # Include all repos with these names
261 [__INDEX_RULES__]
262 ; Ordered match rules for repositories. A list of all repositories is fetched
263 ; using the API and then filtered using these rules.
264 ; Syntax for entry: `glob_pattern_OR_full_repo_name = 0 OR 1` where 0=exclude, 1=include
265 ; The list is traversed in order; the first matching rule returns the include/exclude marker
266 ; For example:
267 ; upstream/binary_repo = 0
268 ; upstream/subrepo/xml_files = 0
269 ; upstream/* = 1
270 ; special-repo = 1
271 ; * = 0
272 ; This will index all repositories under upstream/*, but skip upstream/binary_repo
273 ; and upstream/subrepo/xml_files; the last * = 0 skips all other repositories
218 274
219 docs/* = 1
220 lib/* = 1
221
222 [__EXCLUDE__]
223 # Do not include the following repo in index
275 ; Another example:
276 ; *-fork = 0
277 ; * = 1
278 ; This will index all repositories, except those that have -fork as suffix.
224 279
225 dev-docs/* = 1
226 legacy-repos/* = 1
227 *-dev/* = 1
280 rhodecode-vcsserver = 1
281 rhodecode-enterprise-ce = 1
282 upstream/mozilla/firefox-repo = 0
283 upstream/git-binaries = 0
284 upstream/* = 1
285 * = 0
228 286
229 # Each repo that needs special indexing is a separate section below.
230 # In each section set the options to override the global configuration
231 # parameters above.
232 # If special settings are not configured, the global configuration values
233 # above are inherited. If no special repositories are
234 # defined here RhodeCode will use the API to ask for all repositories
287 ; == EXPLICIT REPOSITORY INDEXING ==
288 ; If defined, __INDEX_RULES__ are skipped and the API is not used to fetch the
289 ; list of repositories; only the names defined in [NAME] format are indexed.
290 ; To build the index just for repo_name_1 and special-repo use:
291 ; [repo_name_1]
292 ; [special-repo]
235 293
236 # For this repo use different settings
237 [special-repo]
238 commit_parse_limit = 20,
239 skip_files = *.idea, *.xml,
294 ; == PER REPOSITORY CONFIGURATION ==
295 ; This allows overriding the global configuration per repository.
296 ; Example: set a specific file size limit and skip certain files for special-repo
297 ; [conf:special-repo]
298 ; max_filesize = 5mb
299 ; skip_files = *.xml, *.sql
300 ; index_types = files,
240 301
241 # For another repo use different settings
242 [another-special-repo]
243 index_files = *,
244 max_filesize = 800MB
245 commit_parse_limit = 20000
302 [conf:rhodecode-vcsserver]
303 index_types = files,
304 max_filesize = 5mb
305 skip_files = *.xml, *.sql
306 index_files = *.py, *.c, *.h, *.js
307
308
309 With thousands of repositories it can be tricky to get the include/exclude rules right at first.
310 A special flag tests the mapping file rules and lists the repositories that would
311 be indexed. Run the indexer with ``--show-matched-repos`` to only list the matched repositories::
312
313 rhodecode-index --instance-name=enterprise-1 --show-matched-repos --mapping=/my/path/search_mapping.ini
314
246 315
247 316 .. _enable-elasticsearch:
248 317
249 318 Enabling Elasticsearch
250 319 ^^^^^^^^^^^^^^^^^^^^^^
251 320
321 Elasticsearch is available in the EE edition only. It provides more scalable and more advanced
322 search capabilities. Whoosh is fine for up to 1-2GB of data; beyond that amount it starts
323 slowing down and can cause other problems. Elasticsearch 6 also provides a much more
324 advanced query language, allowing filtering by file paths and extensions, OR statements,
325 ranges, etc. See the query language examples in the search field for advanced
326 query language usage.
327
328
252 329 1. Open the :file:`rhodecode.ini` file for the instance you wish to edit. The
253 330 default location is
254 331 :file:`/home/{user}/.rccontrol/{instance-id}/rhodecode.ini`
@@ -268,9 +345,19 b' and change it to:'
268 345 .. code-block:: ini
269 346
270 347 search.module = rc_elasticsearch
271 search.location = http://localhost:9200/
348 search.location = http://localhost:9200
349 ## specify the Elasticsearch version: 6 for latest or 2 for legacy
350 search.es_version = 6
351
352 where ``search.location`` points to the Elasticsearch server,
353 which runs on port 9200 by default.
272 354
273 where ``search.location`` points to the elasticsearch server.
355 The indexer invocation also needs to change. Provide the ``--es-version=`` and
356 ``--engine-location=`` parameters to define the Elasticsearch server version and location.
357 For example::
358
359 rhodecode-index --instance-name=enterprise-1 --es-version=6 --engine-location=http://localhost:9200
360
274 361
275 362 .. _Whoosh: https://pypi.python.org/pypi/Whoosh/
276 .. _Elasticsearch: https://www.elastic.co/ No newline at end of file
363 .. _Elasticsearch 6: https://www.elastic.co/ No newline at end of file
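The first-match semantics of ``[__INDEX_RULES__]`` can be checked offline before running ``--show-matched-repos`` against a live instance. The sketch below is hypothetical: ``match_index_rules`` is not part of |RCT|. It assumes rule patterns behave like shell globs, where ``*`` also matches ``/``, and that a repository matching no rule is skipped. It evaluates the example rules from the mapping file.

```shell
#!/bin/sh
# Hypothetical helper (not part of rhodecode-index): evaluates ordered
# __INDEX_RULES__ entries for one repository name using first-match semantics.
# Assumption: patterns behave like shell globs, where `*` also matches `/`.

# match_index_rules REPO_NAME "pattern = 0|1" ...
# Prints the marker (1 = include, 0 = exclude) of the first matching rule.
# Prints 0 if no rule matches (assumption: unmatched repositories are skipped).
match_index_rules() {
    repo_name=$1
    shift
    for rule in "$@"; do
        # Split "pattern = marker" on the first "=" and strip whitespace.
        pattern=$(printf '%s' "${rule%%=*}" | tr -d '[:space:]')
        marker=$(printf '%s' "${rule#*=}" | tr -d '[:space:]')
        # An unquoted $pattern in a case label is matched as a glob.
        case $repo_name in
            $pattern) printf '%s\n' "$marker"; return 0 ;;
        esac
    done
    printf '0\n'
}

# Rules copied from the [__INDEX_RULES__] example.
for repo in rhodecode-vcsserver upstream/git-binaries upstream/linux internal/tools; do
    printf '%s -> %s\n' "$repo" "$(match_index_rules "$repo" \
        'rhodecode-vcsserver = 1' \
        'rhodecode-enterprise-ce = 1' \
        'upstream/mozilla/firefox-repo = 0' \
        'upstream/git-binaries = 0' \
        'upstream/* = 1' \
        '* = 0')"
done
```

The real indexer may differ in edge cases such as case sensitivity or ``/`` handling, so ``--show-matched-repos`` remains the authoritative check.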
@@ -78,7 +78,7 b' Configuration Files'
78 78 -------------------
79 79
80 80 * :file:`/home/{user}/.rccontrol/{instance-id}/rhodecode.ini`
81 * :file:`/home/{user}/.rccontrol/{instance-id}/mapping.ini`
81 * :file:`/home/{user}/.rccontrol/{instance-id}/search_mapping.ini`
82 82 * :file:`/home/{user}/.rccontrol/{vcsserver-id}/vcsserver.ini`
83 83 * :file:`/home/{user}/.rccontrol/supervisor/supervisord.ini`
84 84 * :file:`/home/{user}/.rccontrol.ini`
@@ -516,7 +516,7 b' Example usage:'
516 516 $ ~/.rccontrol/enterprise-4/profile/bin/rhodecode-index \
517 517 --instance-name=enterprise-4
518 518
519 # Run indexer based on mapping.ini file
519 # Run indexer based on search_mapping.ini file
520 520 # This is using pre-350 virtualenv
521 521 (venv)$ rhodecode-index --instance-name=enterprise-1
522 522
@@ -527,7 +527,7 b' Example usage:'
527 527
528 528 # Create the indexing mapping file
529 529 $ ~/.rccontrol/enterprise-4/profile/bin/rhodecode-index \
530 --create-mapping mapping.ini --instance-name=enterprise-4
530 --create-mapping search_mapping.ini --instance-name=enterprise-4
531 531
532 532 .. _tools-rhodecode-list-instance:
533 533