docs: added info about --optimize flag for full text search.
marcink -
r2949:9a63987a default
.. _indexing-ref:

Full-text Search
----------------

By default |RC| is configured to use `Whoosh`_ to index |repos| and
provide full-text search.

|RCE| also provides support for `Elasticsearch`_ as a backend for scalable
search. See :ref:`enable-elasticsearch` for details.

Indexing
^^^^^^^^

To run the indexer, you need an |authtoken| with admin rights to all
|repos|.

To index newly added content, you can set the indexer up in a number of
ways, for example:

* Call the indexer via a cron job. We recommend running it nightly,
  unless you need everything indexed immediately.
* Run the indexer in an infinite loop, so that it starts a new cycle as soon
  as the previous one finishes.
* Hook the indexer up to your CI server to reindex after each push.

The indexer is incremental: each run indexes only the commits added since
the last run. If you want to build a brand new index from scratch on each
run, use the ``force`` option in the :file:`mapping.ini` configuration file,
described in :ref:`advanced-indexing`.

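For example, the following :file:`mapping.ini` fragment keeps incremental indexing as the default but rebuilds a single |repo| from scratch on every run. This is a minimal sketch that relies on per-repository sections overriding the global values, as described in :ref:`advanced-indexing`. The section name ``[special-repo]`` is a placeholder for your own |repo| name.

.. code-block:: ini

   [__DEFAULT__]
   # Global default: index incrementally (only commits added since last run)
   force = false

   # Placeholder repository name: rebuild only this repo from scratch
   [special-repo]
   force = true
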
.. important::

   You need to have |RCT| installed; see :ref:`install-tools`. Since |RCE|
   3.5.0, they are installed by default.

To set up indexing, use the following steps:

1. :ref:`config-rhoderc`, if running tools remotely.
2. :ref:`run-index`
3. :ref:`set-index`
4. :ref:`advanced-indexing`

.. _config-rhoderc:

Configure the ``.rhoderc`` File
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

|RCT| uses the :file:`/home/{user}/.rhoderc` file for connection details
to |RCM| instances. If this file was not created automatically,
you can create it using the following example. You need to configure the
details for each instance you want to index.

.. code-block:: bash

   # Check the details of the instance you want to index
   $ rccontrol status

   - NAME: enterprise-1
   - STATUS: RUNNING
   - TYPE: Momentum
   - VERSION: 1.5.0
   - URL: http://127.0.0.1:10000

To get your API token, go to
:menuselection:`username --> My Account --> Auth tokens` in the |RCM|
interface.

.. code-block:: ini

   # Configure .rhoderc with matching details
   # This allows the indexer to connect to the instance
   [instance:enterprise-1]
   api_host = http://127.0.0.1:10000
   api_key = <auth token goes here>
   repo_dir = /home/<username>/repos

.. _run-index:

Run the Indexer
^^^^^^^^^^^^^^^

Run the indexer using the following command, and specify the instance you
want to index:

.. code-block:: bash

   # From inside a virtualenv
   (venv)$ rhodecode-index --instance-name=enterprise-1

   # Using the default installation
   $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
       --instance-name=enterprise-1

   # Using a custom mapping file
   $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
       --instance-name=enterprise-1 \
       --mapping=/home/user/.rccontrol/enterprise-1/mapping.ini

.. note::

   If the indexer runs very often, the index can become fragmented. The most
   common symptom is a ``too many open files`` error. To fix this, run the
   indexer with the ``--optimize`` flag, for example
   ``rhodecode-index --instance-name=enterprise-1 --optimize``.
   Run the optimization regularly; once a week is recommended.

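To follow the weekly recommendation, you could add a separate crontab entry for the optimization run alongside your regular schedule (see :ref:`set-index`). This sketch assumes the default installation path used above. The time of 2am on Saturdays is an arbitrary choice, so adjust both to your setup.

.. code-block:: bash

   # Optimize the index every Saturday at 2am (time chosen arbitrarily).
   # A crontab entry must fit on a single line.
   0 2 * * 6 /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index --instance-name=enterprise-1 --optimize
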
.. _set-index:

Schedule the Indexer
^^^^^^^^^^^^^^^^^^^^

To schedule the indexer, configure the crontab file to run the indexer inside
your |RCT| virtualenv using the following steps.

1. Open the crontab file using ``crontab -e``.
2. Add the indexer to the crontab, and schedule it to run as regularly as you
   wish.
3. Save the file.

.. code-block:: bash

   $ crontab -e

   # Call the indexer using the full path to its virtualenv. Each crontab
   # entry must be on a single line, because cron does not support
   # backslash line continuation.

   # Run the indexer daily at 4am using the default mapping settings
   0 4 * * * /home/ubuntu/.virtualenv/rhodecode-venv/bin/rhodecode-index --instance-name=enterprise-1

   # Run the indexer every Sunday at 3am using the default mapping
   0 3 * * 0 /home/ubuntu/.virtualenv/rhodecode-venv/bin/rhodecode-index --instance-name=enterprise-1

   # Run the indexer every 15 minutes
   # using a specially configured mapping file
   */15 * * * * ~/.rccontrol/enterprise-4/profile/bin/rhodecode-index --instance-name=enterprise-4 --mapping=/home/user/.rccontrol/enterprise-4/mapping.ini

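When the indexer runs as often as every 15 minutes, a long run may still be in progress when cron starts the next one. One way to prevent overlapping runs is to wrap the command in a non-blocking lock. This sketch assumes the util-linux ``flock`` utility is installed on the server. The lock file path is an arbitrary example.

.. code-block:: bash

   # flock -n exits immediately (skipping this run) if the previous run
   # still holds the lock. The lock file path is an arbitrary example.
   */15 * * * * flock -n /tmp/rhodecode-index.lock ~/.rccontrol/enterprise-4/profile/bin/rhodecode-index --instance-name=enterprise-4 --mapping=/home/user/.rccontrol/enterprise-4/mapping.ini
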
.. _advanced-indexing:

Advanced Indexing
^^^^^^^^^^^^^^^^^

|RCT| indexes based on the :file:`mapping.ini` file. To configure your index,
you can specify different options in this file. The default locations are:

* :file:`/home/{user}/.rccontrol/{instance-id}/mapping.ini`, when using the
  default |RCT| installation.
* :file:`~/venv/lib/python2.7/site-packages/rhodecode_tools/templates/mapping.ini`,
  when using ``virtualenv``.

.. note::

   If you need to create the :file:`mapping.ini` file, use the |RCT|
   ``rhodecode-index --create-mapping path/to/file`` command. For details,
   see the :ref:`tools-cli` section.

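A possible workflow is to generate a mapping file, edit it, and then pass it to the indexer. This sketch assumes that ``--create-mapping`` takes the output path as shown in the note above. The path reuses the per-instance location listed above.

.. code-block:: bash

   # Generate a mapping file that you can then edit
   $ rhodecode-index --create-mapping /home/user/.rccontrol/enterprise-1/mapping.ini

   # After editing it, run the indexer with the custom mapping
   $ rhodecode-index --instance-name=enterprise-1 \
       --mapping=/home/user/.rccontrol/enterprise-1/mapping.ini
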
The indexer processes |repos| in a random order, so that a failing |repo|
does not stop the whole indexing run. To configure different indexing
scenarios, set the following options inside the :file:`mapping.ini` file and
pass the altered file to the indexer using the ``--mapping`` option.

* ``index_files`` : Index the specified file types.
* ``skip_files`` : Do not index the specified file types.
* ``index_files_content`` : Index the content of the specified file types.
* ``skip_files_content`` : Do not index the content of the specified file
  types.
* ``force`` : Create a fresh index on each run.
* ``max_filesize`` : Do not index files larger than the set size.
* ``commit_parse_limit`` : Set the batch size when indexing commit messages.
  Set it to a lower number to reduce memory load.
* ``repo_limit`` : Set the maximum number of |repos| indexed per run.
* ``[__INCLUDE__]`` : Set the |repos| you want indexed. This takes precedence
  over ``[__EXCLUDE__]``.
* ``[__EXCLUDE__]`` : Set the |repos| you do not want indexed. For example,
  use it to skip indexing branches, forks, or log |repos|.

At the end of the file, you can add sections for specific |repos| that
override the default values. To configure your indexer, use the following
example :file:`mapping.ini` file.

.. code-block:: ini

   [__DEFAULT__]
   # Default patterns for indexing files and content of files.
   # Binary files are skipped by default.

   # Index Python and Markdown files
   index_files = *.py, *.md

   # Do not index these file types
   skip_files = *.svg, *.log, *.dump, *.txt

   # Index both file types and their content
   index_files_content = *.cpp, *.ini, *.py

   # Index file names, but not file content
   skip_files_content = *.svg

   # Force rebuilding the index from scratch. With this global flag, every
   # repository is rebuilt from scratch. Set the flag in a repository
   # section below to rebuild only that repository.
   force = false

   # Do not index files larger than 385KB
   max_filesize = 385KB

   # Limit commit indexing to 500 per batch
   commit_parse_limit = 500

   # Limit each index run to 25 repos
   repo_limit = 25

   # __INCLUDE__ takes precedence over __EXCLUDE__.

   [__INCLUDE__]
   # Include all repos with these names

   docs/* = 1
   lib/* = 1

   [__EXCLUDE__]
   # Do not index repos with these names

   dev-docs/* = 1
   legacy-repos/* = 1
   *-dev/* = 1

   # Each repo that needs special indexing is a separate section below.
   # In each section, set the options that override the global
   # configuration parameters above.
   # If special settings are not configured, the global configuration values
   # above are inherited. If no special repositories are
   # defined here, RhodeCode will use the API to ask for all repositories.

   # For this repo use different settings
   [special-repo]
   commit_parse_limit = 20
   skip_files = *.idea, *.xml

   # For another repo use different settings
   [another-special-repo]
   index_files = *
   max_filesize = 800MB
   commit_parse_limit = 20000

.. _enable-elasticsearch:

Enabling Elasticsearch
^^^^^^^^^^^^^^^^^^^^^^

1. Open the :file:`rhodecode.ini` file for the instance you wish to edit. The
   default location is
   :file:`/home/{user}/.rccontrol/{instance-id}/rhodecode.ini`.
2. Find the search configuration section:

   .. code-block:: ini

       ###################################
       ## SEARCH INDEXING CONFIGURATION ##
       ###################################

       search.module = rhodecode.lib.index.whoosh
       search.location = %(here)s/data/index

   and change it to:

   .. code-block:: ini

       search.module = rc_elasticsearch
       search.location = http://localhost:9200/

   where ``search.location`` points to your Elasticsearch server.

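To check that the server set in ``search.location`` is reachable from the |RCE| host, you can query its root endpoint. This sketch assumes the ``http://localhost:9200/`` address from the example above and that ``curl`` is installed. A running Elasticsearch node replies with a short JSON document that includes its cluster name and version.

.. code-block:: bash

   # Query the Elasticsearch root endpoint; ?pretty formats the JSON reply
   $ curl "http://localhost:9200/?pretty"
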
.. _Whoosh: https://pypi.python.org/pypi/Whoosh/
.. _Elasticsearch: https://www.elastic.co/