docs: update full text search indexing documentation
marcink -
r3400:2aa02c12 default
@@ -116,7 +116,7 b' Full-text Search Backup'
116
116
117 You may also have full text search set up, but the index can be rebuild from
117 You may also have full text search set up, but the index can be rebuild from
118 re-imported |repos| if necessary. You will most likely want to backup your
118 re-imported |repos| if necessary. You will most likely want to backup your
119 :file:`mapping.ini` file if you've configured that. For more information, see
119 :file:`search_mapping.ini` file if you've configured that. For more information, see
120 the :ref:`indexing-ref` section.
120 the :ref:`indexing-ref` section.
121
121
122 Restoration Steps
122 Restoration Steps
@@ -140,7 +140,7 b' Post Restoration Steps'
140 Once you have restored your |RCE| instance to basic functionality, you can
140 Once you have restored your |RCE| instance to basic functionality, you can
141 then work on restoring any specific setup changes you had made.
141 then work on restoring any specific setup changes you had made.
142
142
143 * To recreate the |RCE| index, use the backed up :file:`mapping.ini` file if
143 * To recreate the |RCE| index, use the backed up :file:`search_mapping.ini` file if
144 you had made changes and rerun the indexer. See the
144 you had made changes and rerun the indexer. See the
145 :ref:`indexing-ref` section for details.
145 :ref:`indexing-ref` section for details.
146 * To reconfigure any extensions, copy the backed up extensions into the
146 * To reconfigure any extensions, copy the backed up extensions into the
@@ -23,9 +23,9 b' sections.'
23 * :ref:`increase-gunicorn`
23 * :ref:`increase-gunicorn`
24 * :ref:`x-frame`
24 * :ref:`x-frame`
25
25
26 \- **mapping.ini**
26 \- **search_mapping.ini**
27 Default location:
27 Default location:
28 :file:`/home/{user}/.rccontrol/{instance-id}/mapping.ini`
28 :file:`/home/{user}/.rccontrol/{instance-id}/search_mapping.ini`
29
29
30 This file is used to control the |RCE| indexer. It comes configured
30 This file is used to control the |RCE| indexer. It comes configured
31 to index your instance. To change the default configuration, see
31 to index your instance. To change the default configuration, see
@@ -6,22 +6,21 b' Full-text Search'
6 By default RhodeCode is configured to use `Whoosh`_ to index |repos| and
6 By default RhodeCode is configured to use `Whoosh`_ to index |repos| and
7 provide full-text search.
7 provide full-text search.
8
8
9 |RCE| also provides support for `Elasticsearch`_ as a backend for scalable
9 |RCE| also provides support for `Elasticsearch 6`_ as a backend for more advanced
10 search. See :ref:`enable-elasticsearch` for details.
10 and scalable search. See :ref:`enable-elasticsearch` for details.
11
11
12 Indexing
12 Indexing
13 ^^^^^^^^
13 ^^^^^^^^
14
14
15 To run the indexer you need to use an |authtoken| with admin rights to all
15 To run the indexer you need to have an |authtoken| with admin rights to all |repos|.
16 |repos|.
17
16
18 To index new content added, you have the option to set the indexer up in a
17 To index new content added, you have the option to set the indexer up in a
19 number of ways, for example:
18 number of ways, for example:
20
19
21 * Call the indexer via a cron job. We recommend running this nightly,
20 * Call the indexer via a cron job. We recommend running this once at night.
22 unless you need everything indexed immediately.
21 In case you need everything indexed immediately it's possible to index a few
23 * Set the indexer to infinitely loop and reindex as soon as it has run its
22 times during the day.
24 cycle.
23 * Set the indexer to infinitely loop and reindex as soon as it has run its previous cycle.
25 * Hook the indexer up with your CI server to reindex after each push.
24 * Hook the indexer up with your CI server to reindex after each push.
26
25
27 The indexer works by indexing new commits added since the last run. If you
26 The indexer works by indexing new commits added since the last run. If you
@@ -31,7 +30,7 b' use the ``force`` option in the configur'
31 .. important::
30 .. important::
32
31
33 You need to have |RCT| installed, see :ref:`install-tools`. Since |RCE|
32 You need to have |RCT| installed, see :ref:`install-tools`. Since |RCE|
34 3.5.0 they are installed by default.
33 3.5.0 they are installed by default and available with community/enterprise installations.
35
34
36 To set up indexing, use the following steps:
35 To set up indexing, use the following steps:
37
36
@@ -45,6 +44,13 b' 4. :ref:`advanced-indexing`'
45 Configure the ``.rhoderc`` File
44 Configure the ``.rhoderc`` File
46 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
45 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
47
46
47 .. note::
48
49 Optionally it's possible to use indexer without the ``.rhoderc``. Simply instead of
50 executing with `--instance-name=enterprise-1` execute providing the host and token
51 directly: `--api-host=http://127.0.0.1:10000 --api-key=<auth token goes here>`
52
53
48 |RCT| uses the :file:`/home/{user}/.rhoderc` file for connection details
54 |RCT| uses the :file:`/home/{user}/.rhoderc` file for connection details
49 to |RCE| instances. If this file is not automatically created,
55 to |RCE| instances. If this file is not automatically created,
50 you can configure it using the following example. You need to configure the
56 you can configure it using the following example. You need to configure the
@@ -56,11 +62,11 b' details for each instance you want to in'
56 # of the instance you want to index
62 # of the instance you want to index
57 $ rccontrol status
63 $ rccontrol status
58
64
59 - NAME: enterprise-1
65 - NAME: enterprise-1
60 - STATUS: RUNNING
66 - STATUS: RUNNING
61 - TYPE: Momentum
67 - TYPE: Enterprise
62 - VERSION: 1.5.0
68 - VERSION: 4.1.0
63 - URL: http://127.0.0.1:10000
69 - URL: http://127.0.0.1:10003
64
70
65 To get your API Token, on the |RCE| interface go to
71 To get your API Token, on the |RCE| interface go to
66 :menuselection:`username --> My Account --> Auth tokens`
72 :menuselection:`username --> My Account --> Auth tokens`
@@ -72,21 +78,17 b' To get your API Token, on the |RCE| inte'
72 [instance:enterprise-1]
78 [instance:enterprise-1]
73 api_host = http://127.0.0.1:10000
79 api_host = http://127.0.0.1:10000
74 api_key = <auth token goes here>
80 api_key = <auth token goes here>
75 repo_dir = /home/<username>/repos
81
76
82
77 .. _run-index:
83 .. _run-index:
78
84
79 Run the Indexer
85 Run the Indexer
80 ^^^^^^^^^^^^^^^
86 ^^^^^^^^^^^^^^^
81
87
82 Run the indexer using the following command, and specify the instance you
88 Run the indexer using the following command, and specify the instance you want to index:
83 want to index:
84
89
85 .. code-block:: bash
90 .. code-block:: bash
86
91
87 # From inside a virtualevv
88 (venv)$ rhodecode-index --instance-name=enterprise-1
89
90 # Using default installation
92 # Using default installation
91 $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
93 $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
92 --instance-name=enterprise-1
94 --instance-name=enterprise-1
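
The ``.rhoderc`` layout shown in this hunk is plain INI, so the connection details for one instance can be read with Python's standard ``configparser``. This is only a minimal sketch of the file layout: how |RCT| itself parses the file is not shown in this change, and the ``connection_details`` helper name is hypothetical.

.. code-block:: python

    import configparser

    # Sample contents copied from the ``.rhoderc`` example in this diff.
    SAMPLE = """
    [instance:enterprise-1]
    api_host = http://127.0.0.1:10000
    api_key = <auth token goes here>
    """


    def connection_details(text, instance_name):
        """Return (api_host, api_key) for the given instance name.

        Hypothetical helper for illustration; not part of RhodeCode Tools.
        """
        # Disable interpolation so a '%' inside a token is taken literally.
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(text)
        section = parser["instance:" + instance_name]
        return section["api_host"], section["api_key"]


    host, key = connection_details(SAMPLE, "enterprise-1")
    print(host)

The two returned values correspond to what the ``--api-host`` and ``--api-key`` flags carry when the indexer is invoked without ``.rhoderc``.
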
@@ -94,7 +96,16 b' want to index:'
94 # Using a custom mapping file
96 # Using a custom mapping file
95 $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
97 $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
96 --instance-name=enterprise-1 \
98 --instance-name=enterprise-1 \
97 --mapping=/home/user/.rccontrol/enterprise-1/mapping.ini
99 --mapping=/home/user/.rccontrol/enterprise-1/search_mapping.ini
100
101 # Using a custom mapping file and invocation without ``.rhoderc``
102 $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
103 --api-host=http://rhodecodecode.myserver.com --api-key=xxxxx \
104 --mapping=/home/user/.rccontrol/enterprise-1/search_mapping.ini
105
106 # From inside a virtualenv on your local machine or CI server.
107 (venv)$ rhodecode-index --instance-name=enterprise-1
108
98
109
99 .. note::
110 .. note::
100
111
@@ -136,119 +147,185 b' 3. Save the file.'
136 # using a specially configured mapping file
147 # using a specially configured mapping file
137 */15 * * * * ~/.rccontrol/enterprise-4/profile/bin/rhodecode-index \
148 */15 * * * * ~/.rccontrol/enterprise-4/profile/bin/rhodecode-index \
138 --instance-name=enterprise-4 \
149 --instance-name=enterprise-4 \
139 --mapping=/home/user/.rccontrol/enterprise-4/mapping.ini
150 --mapping=/home/user/.rccontrol/enterprise-4/search_mapping.ini
140
151
141 .. _advanced-indexing:
152 .. _advanced-indexing:
142
153
143 Advanced Indexing
154 Advanced Indexing
144 ^^^^^^^^^^^^^^^^^
155 ^^^^^^^^^^^^^^^^^
145
156
146 |RCT| indexes based on the :file:`mapping.ini` file. To configure your index,
157
147 you can specify different options in this file. The default location is:
158 Force Re-Indexing single repository
159 +++++++++++++++++++++++++++++++++++
160
161 Often it's required to re-index whole repository because of some repository changes,
162 or to remove some indexed secrets, or files. There's a special `--repo-name=` flag
163 for the indexer that limits execution to a single repository. For example to force-reindex
164 single repository such call can be made::
165
166 rhodecode-index --instance-name=enterprise-1 --force --repo-name=rhodecode-vcsserver
167
168
169 Removing repositories from index
170 ++++++++++++++++++++++++++++++++
171
172 The indexer automatically removes renamed repositories and builds index for new names.
173 In case that you wish to remove indexed repository manually such call would allow that::
148
174
149 * :file:`/home/{user}/.rccontrol/{instance-id}/mapping.ini`, using default
175 rhodecode-index --instance-name=enterprise-1 --remove-only --repo-name=rhodecode-vcsserver
150 |RCT|.
176
177
178 Using search_mapping.ini file for advanced index rules
179 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
180
181 By default rhodecode-index runs for all repositories, all files with parsing limits
182 defined by the CLI default arguments. You can change those limits by calling with
183 different flags such as `--max-filesize 2048kb` or `--repo-limit 10`
184
185 For more advanced execution logic it's possible to use a configuration file that
186 would define detailed rules which repositories and how should be indexed.
187
188 |RCT| provides an example index configuration file called :file:`search_mapping.ini`.
189 This file is created by default during installation and is located at:
190
191 * :file:`/home/{user}/.rccontrol/{instance-id}/search_mapping.ini`, using default |RCT|.
151 * :file:`~/venv/lib/python2.7/site-packages/rhodecode_tools/templates/mapping.ini`,
192 * :file:`~/venv/lib/python2.7/site-packages/rhodecode_tools/templates/mapping.ini`,
152 when using ``virtualenv``.
193 when using ``virtualenv``.
153
194
154 .. note::
195 .. note::
155
196
156 If you need to create the :file:`mapping.ini` file, use the |RCT|
197 If you need to create the :file:`search_mapping.ini` file manually, use the |RCT|
157 ``rhodecode-index --create-mapping path/to/file`` API call. For details,
198 ``rhodecode-index --create-mapping path/to/search_mapping.ini`` API call.
158 see the :ref:`tools-cli` section.
199 For details, see the :ref:`tools-cli` section.
159
160 The indexer runs in a random order to prevent a failing |repo| from stopping
161 a build. To configure different indexing scenarios, set the following options
162 inside the :file:`mapping.ini` and specify the altered file using the
163 ``--mapping`` option.
164
200
165 * ``index_files`` : Index the specified file types.
201 To run the indexer with a mapping file, provide it using the `--mapping` flag::
166 * ``skip_files`` : Do not index the specified file types.
167 * ``index_files_content`` : Index the content of the specified file types.
168 * ``skip_files_content`` : Do not index the content of the specified files.
169 * ``force`` : Create a fresh index on each run.
170 * ``max_filesize`` : Files larger than the set size will not be indexed.
171 * ``commit_parse_limit`` : Set the batch size when indexing commit messages.
172 Set to a lower number to lessen memory load.
173 * ``repo_limit`` : Set the maximum number or |repos| indexed per run.
174 * ``[INCLUDE]`` : Set |repos| you want indexed. This takes precedent over
175 ``[EXCLUDE]``.
176 * ``[EXCLUDE]`` : Set |repos| you do not want indexed. Exclude can be used to
177 not index branches, forks, or log |repos|.
178
202
179 At the end of the file you can specify conditions for specific |repos| that
203 rhodecode-index --instance-name=enterprise-1 --mapping=/my/path/search_mapping.ini
180 will override the default values. To configure your indexer,
204
181 use the following example :file:`mapping.ini` file.
205
206 Here's a detailed example of using :file:`search_mapping.ini` file.
182
207
183 .. code-block:: ini
208 .. code-block:: ini
184
209
185 [__DEFAULT__]
210 [__DEFAULT__]
186 # default patterns for indexing files and content of files.
211 ; Create index on commits data, and files data in this order. Available options
187 # Binary files are skipped by default.
212 ; are `commits`, `files`
213 index_types = commits,files
214
215 ; Commit fetch limit. In what amount of chunks commits should be fetched
216 ; via api and parsed. This allows server to transfer smaller chunks and be less loaded
217 commit_fetch_limit = 1000
188
218
189 # Index python and markdown files
219 ; Commit process limit. Limit the number of commits indexer should fetch, and
190 index_files = *.py, *.md
220 ; store inside the full text search index. eg. if repo has 2000 commits, and
221 ; limit is 1000, on the first run it will process commits 0-1000 and on the
222 ; second 1000-2000 commits. Help reduce memory usage, default is 50000
223 ; (set -1 for unlimited)
224 commit_process_limit = 50000
191
225
192 # Do not index these file types
226 ; Limit of how many repositories each run can process, default is -1 (unlimited)
193 skip_files = *.svg, *.log, *.dump, *.txt
227 ; in case of 1000s of repositories it's better to execute in chunks to not overload
228 ; the server.
229 repo_limit = -1
194
230
195 # Index both file types and their content
231 ; Default patterns for indexing files and content of files. Binary files
196 index_files_content = *.cpp, *.ini, *.py
232 ; are skipped by default.
233
234 ; Add to index those comma separated files; globs syntax
235 ; e.g index_files = *.py, *.c, *.h, *.js
236 index_files = *,
197
237
198 # Index file names, but not file content
238 ; Do not add to index those comma separated files, this excludes
199 skip_files_content = *.svg,
239 ; both search by name and content; globs syntax
240 ; e.g. skip_files = *.key, *.sql, *.xml
241 skip_files = ,
200
242
201 # Force rebuilding an index from scratch. Each repository will be rebuild
243 ; Add to index content of those comma separated files; globs syntax
202 # from scratch with a global flag. Use local flag to rebuild single repos
244 ; e.g. index_files_content = *.h, *.obj
245 index_files_content = *,
246
247 ; Do not add to index content of those comma separated files; globs syntax
248 ; e.g. skip_files_content = *.exe, *.bin, *.log, *.dump
249 skip_files_content = ,
250
251 ; Force rebuilding an index from scratch. Each repository will be rebuilt from
252 ; scratch with a global flag. Use --repo-name=NAME --force to rebuild single repo
203 force = false
253 force = false
204
254
205 # Do not index files larger than 385KB
255 ; maximum file size that indexer will use, files above that limit are not going
206 max_filesize = 385KB
256 ; to have their content indexed.
257 ; Possible options are KB (kilobytes), MB (megabytes), eg 1MB or 1024KB
258 max_filesize = 2MB
207
259
208 # Limit commit indexing to 500 per batch
209 commit_parse_limit = 500
210
211 # Limit each index run to 25 repos
212 repo_limit = 25
213
260
214 # __INCLUDE__ is more important that __EXCLUDE__.
261 [__INDEX_RULES__]
215
262 ; Ordered match rules for repositories. A list of all repositories will be fetched
216 [__INCLUDE__]
263 ; using API and this list will be filtered using those rules.
217 # Include all repos with these names
264 ; Syntax for entry: `glob_pattern_OR_full_repo_name = 0 OR 1` where 0=exclude, 1=include
265 ; When this ordered list is traversed first match will return the include/exclude marker
266 ; For example:
267 ; upstream/binary_repo = 0
268 ; upstream/subrepo/xml_files = 0
269 ; upstream/* = 1
270 ; special-repo = 1
271 ; * = 0
272 ; This will index all repositories under upstream/*, but skip upstream/binary_repo
273 ; and upstream/subrepo/xml_files, last * = 0 means skip all other matches
218
274
219 docs/* = 1
275 ; Another example:
220 lib/* = 1
276 ; *-fork = 0
221
277 ; * = 1
222 [__EXCLUDE__]
278 ; This will index all repositories, except those that have -fork as suffix.
223 # Do not include the following repo in index
224
279
225 dev-docs/* = 1
280 rhodecode-vcsserver = 1
226 legacy-repos/* = 1
281 rhodecode-enterprise-ce = 1
227 *-dev/* = 1
282 upstream/mozilla/firefox-repo = 0
283 upstream/git-binaries = 0
284 upstream/* = 1
285 * = 0
228
286
229 # Each repo that needs special indexing is a separate section below.
287 ; == EXPLICIT REPOSITORY INDEXING ==
230 # In each section set the options to override the global configuration
288 ; If defined this will skip using __INDEX_RULES__, and will not use API to fetch
231 # parameters above.
289 ; list of repositories, it will explicitly take names defined with [NAME] format and
232 # If special settings are not configured, the global configuration values
290 ; try to build the index, to build index just for repo_name_1 and special-repo use:
233 # above are inherited. If no special repositories are
291 ; [repo_name_1]
234 # defined here RhodeCode will use the API to ask for all repositories
292 ; [special-repo]
235
293
236 # For this repo use different settings
294 ; == PER REPOSITORY CONFIGURATION ==
237 [special-repo]
295 ; This allows overriding the global configuration per repository.
238 commit_parse_limit = 20,
296 ; example to set specific file limit, and skip certain files for repository special-repo
239 skip_files = *.idea, *.xml,
297 ; [conf:special-repo]
298 ; max_filesize = 5mb
299 ; skip_files = *.xml, *.sql
300 ; index_types = files,
240
301
241 # For another repo use different settings
302 [conf:rhodecode-vcsserver]
242 [another-special-repo]
303 index_types = files,
243 index_files = *,
304 max_filesize = 5mb
244 max_filesize = 800MB
305 skip_files = *.xml, *.sql
245 commit_parse_limit = 20000
306 index_files = *.py, *.c, *.h, *.js
307
308
309 In case of 1000s of repositories it can be tricky to write the include/exclude rules at first.
310 There's a special flag to test the mapping file rules and list repositories that would
311 be indexed. Run the indexer with `--show-matched-repos` to list only the match rules::
312
313 rhodecode-index --instance-name=enterprise-1 --show-matched-repos --mapping=/my/path/search_mapping.ini
314
246
315
247 .. _enable-elasticsearch:
316 .. _enable-elasticsearch:
248
317
249 Enabling Elasticsearch
318 Enabling Elasticsearch
250 ^^^^^^^^^^^^^^^^^^^^^^
319 ^^^^^^^^^^^^^^^^^^^^^^
251
320
321 Elasticsearch is available in EE edition only. It provides much more scalable and more advanced
322 search capabilities. While Whoosh is fine for up to 1-2GB of data, beyond that amount of
323 data it starts slowing down, and can cause other problems. Elasticsearch 6 also provides
324 much more advanced query language allowing advanced filtering by file paths, extensions
325 OR statements, ranges etc. Please check query language examples in the search field for
326 some advanced query language usage.
327
328
252 1. Open the :file:`rhodecode.ini` file for the instance you wish to edit. The
329 1. Open the :file:`rhodecode.ini` file for the instance you wish to edit. The
253 default location is
330 default location is
254 :file:`home/{user}/.rccontrol/{instance-id}/rhodecode.ini`
331 :file:`home/{user}/.rccontrol/{instance-id}/rhodecode.ini`
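
The ``[__INDEX_RULES__]`` comments in the hunk above describe an ordered, first-match-wins filter over repository names. A minimal sketch of that logic follows, assuming shell-style globs as implemented by Python's ``fnmatch`` and assuming that a repository matching no rule is skipped. The indexer's real matching code is not part of this change, and ``is_indexed`` is a hypothetical name.

.. code-block:: python

    from fnmatch import fnmatchcase

    # Ordered (pattern, flag) pairs copied from the example rules in this diff.
    # 1 = include, 0 = exclude.
    RULES = [
        ("rhodecode-vcsserver", 1),
        ("rhodecode-enterprise-ce", 1),
        ("upstream/mozilla/firefox-repo", 0),
        ("upstream/git-binaries", 0),
        ("upstream/*", 1),
        ("*", 0),
    ]


    def is_indexed(repo_name, rules=RULES):
        """Return True when the first matching rule marks the repo as included."""
        for pattern, flag in rules:
            if fnmatchcase(repo_name, pattern):
                return flag == 1
        # Assumption: a repository that matches no rule is not indexed.
        return False


    for name in ("rhodecode-vcsserver", "upstream/git-binaries",
                 "upstream/libfoo", "scratch/tmp"):
        print(name, is_indexed(name))

Because the first match decides, the specific exclusions must be listed before the broader ``upstream/*`` include. The ``--show-matched-repos`` flag documented above is the way to check what the real indexer selects.
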
@@ -268,9 +345,19 b' and change it to:'
268 .. code-block:: ini
345 .. code-block:: ini
269
346
270 search.module = rc_elasticsearch
347 search.module = rc_elasticsearch
271 search.location = http://localhost:9200/
348 search.location = http://localhost:9200
349 ## specify Elastic Search version, 6 for latest or 2 for legacy
350 search.es_version = 6
351
352 where ``search.location`` points to the elasticsearch server
353 by default running on port 9200.
272
354
273 where ``search.location`` points to the elasticsearch server.
355 Index invocation also needs change. Please provide --es-version= and
356 --engine-location= parameters to define elasticsearch server location and its version.
357 For example::
358
359 rhodecode-index --instance-name=enterprise-1 --es-version=6 --engine-location=http://localhost:9200
360
274
361
275 .. _Whoosh: https://pypi.python.org/pypi/Whoosh/
362 .. _Whoosh: https://pypi.python.org/pypi/Whoosh/
276 .. _Elasticsearch: https://www.elastic.co/ No newline at end of file
363 .. _Elasticsearch 6: https://www.elastic.co/ No newline at end of file
@@ -78,7 +78,7 b' Configuration Files'
78 -------------------
78 -------------------
79
79
80 * :file:`/home/{user}/.rccontrol/{instance-id}/rhodecode.ini`
80 * :file:`/home/{user}/.rccontrol/{instance-id}/rhodecode.ini`
81 * :file:`/home/{user}/.rccontrol/{instance-id}/mapping.ini`
81 * :file:`/home/{user}/.rccontrol/{instance-id}/search_mapping.ini`
82 * :file:`/home/{user}/.rccontrol/{vcsserver-id}/vcsserver.ini`
82 * :file:`/home/{user}/.rccontrol/{vcsserver-id}/vcsserver.ini`
83 * :file:`/home/{user}/.rccontrol/supervisor/supervisord.ini`
83 * :file:`/home/{user}/.rccontrol/supervisor/supervisord.ini`
84 * :file:`/home/{user}/.rccontrol.ini`
84 * :file:`/home/{user}/.rccontrol.ini`
@@ -516,7 +516,7 b' Example usage:'
516 $ ~/.rccontrol/enterprise-4/profile/bin/rhodecode-index \
516 $ ~/.rccontrol/enterprise-4/profile/bin/rhodecode-index \
517 --instance-name=enterprise-4
517 --instance-name=enterprise-4
518
518
519 # Run indexer based on mapping.ini file
519 # Run indexer based on search_mapping.ini file
520 # This is using pre-350 virtualenv
520 # This is using pre-350 virtualenv
521 (venv)$ rhodecode-index --instance-name=enterprise-1
521 (venv)$ rhodecode-index --instance-name=enterprise-1
522
522
@@ -527,7 +527,7 b' Example usage:'
527
527
528 # Create the indexing mapping file
528 # Create the indexing mapping file
529 $ ~/.rccontrol/enterprise-4/profile/bin/rhodecode-index \
529 $ ~/.rccontrol/enterprise-4/profile/bin/rhodecode-index \
530 --create-mapping mapping.ini --instance-name=enterprise-4
530 --create-mapping search_mapping.ini --instance-name=enterprise-4
531
531
532 .. _tools-rhodecode-list-instance:
532 .. _tools-rhodecode-list-instance:
533
533