docs: added info about --optimize flag for full text search.
marcink -
r2949:9a63987a default
@@ -1,272 +1,276 b''
1 1 .. _indexing-ref:
2 2
3 3 Full-text Search
4 4 ----------------
5 5
6 6 By default |RC| is configured to use `Whoosh`_ to index |repos| and
7 7 provide full-text search.
8 8
9 9 |RCE| also provides support for `Elasticsearch`_ as a backend for scalable
10 10 search. See :ref:`enable-elasticsearch` for details.
11 11
12 12 Indexing
13 13 ^^^^^^^^
14 14
15 15 To run the indexer you need to use an |authtoken| with admin rights to all
16 16 |repos|.
17 17
18 18 To index newly added content, you can set up the indexer in a number of
19 19 ways, for example:
20 20
21 21 * Call the indexer via a cron job. We recommend running this nightly,
22 22 unless you need everything indexed immediately.
23 23 * Set the indexer to infinitely loop and reindex as soon as it has run its
24 24 cycle.
25 25 * Hook the indexer up with your CI server to reindex after each push.
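
The looping option above can be sketched as a small wrapper script. This is
only a sketch under stated assumptions: the function name ``run_indexer_loop``
and the variables ``INDEXER_CMD``, ``INSTANCE_NAME``, ``MAX_RUNS`` and
``PAUSE_SECONDS`` are hypothetical and not part of |RCT|. Only the
``rhodecode-index --instance-name`` invocation comes from this page.

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical settings for this sketch; adjust them to your installation.
    INDEXER_CMD="${INDEXER_CMD:-rhodecode-index}"
    INSTANCE_NAME="${INSTANCE_NAME:-enterprise-1}"
    MAX_RUNS="${MAX_RUNS:-0}"            # 0 means loop until interrupted
    PAUSE_SECONDS="${PAUSE_SECONDS:-60}" # pause between two runs

    run_indexer_loop() {
        local runs=0
        while [ "$MAX_RUNS" -eq 0 ] || [ "$runs" -lt "$MAX_RUNS" ]; do
            # A failed run is reported but does not stop the loop.
            "$INDEXER_CMD" --instance-name="$INSTANCE_NAME" \
                || echo "indexer run failed, retrying after pause" >&2
            runs=$((runs + 1))
            sleep "$PAUSE_SECONDS"
        done
        echo "completed $runs run(s)"
    }

Calling ``run_indexer_loop`` with the defaults loops until interrupted. Setting
``MAX_RUNS`` to a positive number stops after that many runs, which can be
useful for a trial run.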
26 26
27 27 The indexer works by indexing new commits added since the last run. If you
28 28 wish to build a brand new index from scratch each time,
29 29 use the ``force`` option in the configuration file.
30 30
31 31 .. important::
32 32
33 33 You need to have |RCT| installed, see :ref:`install-tools`. Since |RCE|
34 34 3.5.0 they are installed by default.
35 35
36 36 To set up indexing, use the following steps:
37 37
38 38 1. :ref:`config-rhoderc`, if running tools remotely.
39 39 2. :ref:`run-index`
40 40 3. :ref:`set-index`
41 41 4. :ref:`advanced-indexing`
42 42
43 43 .. _config-rhoderc:
44 44
45 45 Configure the ``.rhoderc`` File
46 46 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
47 47
48 48 |RCT| uses the :file:`/home/{user}/.rhoderc` file for connection details
49 49 to |RCM| instances. If this file is not automatically created,
50 50 you can configure it using the following example. You need to configure the
51 51 details for each instance you want to index.
52 52
53 53 .. code-block:: bash
54 54
55 55 # Check the instance details
56 56 # of the instance you want to index
57 57 $ rccontrol status
58 58
59 59 - NAME: enterprise-1
60 60 - STATUS: RUNNING
61 61 - TYPE: Momentum
62 62 - VERSION: 1.5.0
63 63 - URL: http://127.0.0.1:10000
64 64
65 65 To get your API Token, on the |RCM| interface go to
66 66 :menuselection:`username --> My Account --> Auth tokens`
67 67
68 68 .. code-block:: ini
69 69
70 70 # Configure .rhoderc with matching details
71 71 # This allows the indexer to connect to the instance
72 72 [instance:enterprise-1]
73 73 api_host = http://127.0.0.1:10000
74 74 api_key = <auth token goes here>
75 75 repo_dir = /home/<username>/repos
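
If you create the file by hand, one cautious approach is to write a draft
first and copy it into place once the token is filled in. The sketch below
assumes a scratch file created with ``mktemp`` (the variable name
``RHODERC_DRAFT`` is hypothetical) and restricts its permissions, because the
file holds an |authtoken|.

.. code-block:: bash

    # Write the section to a scratch file so an existing ~/.rhoderc is untouched.
    RHODERC_DRAFT="$(mktemp)"
    printf '%s\n' \
        '[instance:enterprise-1]' \
        'api_host = http://127.0.0.1:10000' \
        'api_key = <auth token goes here>' \
        'repo_dir = /home/<username>/repos' \
        > "$RHODERC_DRAFT"
    # The file contains an auth token: make it readable by the owner only.
    chmod 600 "$RHODERC_DRAFT"
    echo "Draft written to $RHODERC_DRAFT"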
76 76
77 77 .. _run-index:
78 78
79 79 Run the Indexer
80 80 ^^^^^^^^^^^^^^^
81 81
82 82 Run the indexer using the following command, and specify the instance you
83 83 want to index:
84 84
85 85 .. code-block:: bash
86 86
87 87 # From inside a virtualenv
88 88 (venv)$ rhodecode-index --instance-name=enterprise-1
89 89
90 90 # Using default installation
91 $ /home/user/.rccontrol/enterprise-4/profile/bin/rhodecode-index \
92 --instance-name=enterprise-4
91 $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
92 --instance-name=enterprise-1
93 93
94 94 # Using a custom mapping file
95 $ /home/user/.rccontrol/enterprise-4/profile/bin/rhodecode-index \
96 --instance-name=enterprise-4 \
97 --mapping=/home/user/.rccontrol/enterprise-4/mapping.ini
95 $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
96 --instance-name=enterprise-1 \
97 --mapping=/home/user/.rccontrol/enterprise-1/mapping.ini
98 98
99 99 .. note::
100 100
101 |RCT| require |PY| 2.7 to run.
101 If you index frequently, the index may become fragmented. The most common symptom is a
102 ``too many open files`` error. To fix this, run the indexer with the ``--optimize`` flag, for
103 example ``rhodecode-index --instance-name=enterprise-1 --optimize``.
104 Run this regularly; once a week is recommended.
105
102 106
103 107 .. _set-index:
104 108
105 109 Schedule the Indexer
106 110 ^^^^^^^^^^^^^^^^^^^^
107 111
108 112 To schedule the indexer, configure the crontab file to run the indexer inside
109 113 your |RCT| virtualenv using the following steps.
110 114
111 115 1. Open the crontab file, using ``crontab -e``.
112 116 2. Add the indexer to the crontab, and schedule it to run as regularly as you
113 117 wish.
114 118 3. Save the file.
115 119
116 120 .. code-block:: bash
117 121
118 122 $ crontab -e
119 123
120 124 # The virtualenv can be called using its full path, so for example you can
121 125 # put this example into the crontab
122 126
123 127 # Run the indexer daily at 4am using the default mapping settings
124 128 0 4 * * * /home/ubuntu/.virtualenv/rhodecode-venv/bin/rhodecode-index \
125 129 --instance-name=enterprise-1
126 130
127 131 # Run the indexer every Sunday at 3am using default mapping
128 132 0 3 * * 0 /home/ubuntu/.virtualenv/rhodecode-venv/bin/rhodecode-index \
129 133 --instance-name=enterprise-1
130 134
131 135 # Run the indexer every 15 minutes
132 136 # using a specially configured mapping file
133 137 */15 * * * * ~/.rccontrol/enterprise-4/profile/bin/rhodecode-index \
134 138 --instance-name=enterprise-4 \
135 139 --mapping=/home/user/.rccontrol/enterprise-4/mapping.ini
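
The weekly ``--optimize`` run recommended in :ref:`run-index` can be scheduled
the same way. The entry below is a sketch: it assumes the default installation
path used in the examples above and an arbitrarily chosen Sunday 5am slot, and
it keeps the whole command on a single line.

.. code-block:: bash

    # Optimize the index every Sunday at 5am
    0 5 * * 0 /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index --instance-name=enterprise-1 --optimize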
136 140
137 141 .. _advanced-indexing:
138 142
139 143 Advanced Indexing
140 144 ^^^^^^^^^^^^^^^^^
141 145
142 146 |RCT| indexes based on the :file:`mapping.ini` file. To configure your index,
143 147 you can specify different options in this file. The default location is:
144 148
145 149 * :file:`/home/{user}/.rccontrol/{instance-id}/mapping.ini`, using default
146 150 |RCT|.
147 151 * :file:`~/venv/lib/python2.7/site-packages/rhodecode_tools/templates/mapping.ini`,
148 152 when using ``virtualenv``.
149 153
150 154 .. note::
151 155
152 156 If you need to create the :file:`mapping.ini` file, use the |RCT|
153 157 ``rhodecode-index --create-mapping path/to/file`` command. For details,
154 158 see the :ref:`tools-cli` section.
155 159
156 160 The indexer runs in a random order to prevent a failing |repo| from stopping
157 161 a build. To configure different indexing scenarios, set the following options
158 162 inside the :file:`mapping.ini` and specify the altered file using the
159 163 ``--mapping`` option.
160 164
161 165 * ``index_files`` : Index the specified file types.
162 166 * ``skip_files`` : Do not index the specified file types.
163 167 * ``index_files_content`` : Index the content of the specified file types.
164 168 * ``skip_files_content`` : Do not index the content of the specified files.
165 169 * ``force`` : Create a fresh index on each run.
166 170 * ``max_filesize`` : Files larger than the set size will not be indexed.
167 171 * ``commit_parse_limit`` : Set the batch size when indexing commit messages.
168 172 Set to a lower number to lessen memory load.
169 173 * ``repo_limit`` : Set the maximum number of |repos| indexed per run.
170 174 * ``[INCLUDE]`` : Set |repos| you want indexed. This takes precedence over
171 175 ``[EXCLUDE]``.
172 176 * ``[EXCLUDE]`` : Set |repos| you do not want indexed. Exclude can be used to
173 177 not index branches, forks, or log |repos|.
174 178
175 179 At the end of the file you can specify conditions for specific |repos| that
176 180 will override the default values. To configure your indexer,
177 181 use the following example :file:`mapping.ini` file.
178 182
179 183 .. code-block:: ini
180 184
181 185 [__DEFAULT__]
182 186 # default patterns for indexing files and content of files.
183 187 # Binary files are skipped by default.
184 188
185 189 # Index python and markdown files
186 190 index_files = *.py, *.md
187 191
188 192 # Do not index these file types
189 193 skip_files = *.svg, *.log, *.dump, *.txt
190 194
191 195 # Index both file types and their content
192 196 index_files_content = *.cpp, *.ini, *.py
193 197
194 198 # Index file names, but not file content
195 199 skip_files_content = *.svg,
196 200
197 201 # Force rebuilding an index from scratch. Each repository will be rebuilt
198 202 # from scratch with a global flag. Use local flag to rebuild single repos
199 203 force = false
200 204
201 205 # Do not index files larger than 385KB
202 206 max_filesize = 385KB
203 207
204 208 # Limit commit indexing to 500 per batch
205 209 commit_parse_limit = 500
206 210
207 211 # Limit each index run to 25 repos
208 212 repo_limit = 25
209 213
210 214 # __INCLUDE__ is more important than __EXCLUDE__.
211 215
212 216 [__INCLUDE__]
213 217 # Include all repos with these names
214 218
215 219 docs/* = 1
216 220 lib/* = 1
217 221
218 222 [__EXCLUDE__]
219 223 # Do not include the following repos in the index
220 224
221 225 dev-docs/* = 1
222 226 legacy-repos/* = 1
223 227 *-dev/* = 1
224 228
225 229 # Each repo that needs special indexing is a separate section below.
226 230 # In each section set the options to override the global configuration
227 231 # parameters above.
228 232 # If special settings are not configured, the global configuration values
229 233 # above are inherited. If no special repositories are
230 234 # defined here RhodeCode will use the API to ask for all repositories
231 235
232 236 # For this repo use different settings
233 237 [special-repo]
234 238 commit_parse_limit = 20,
235 239 skip_files = *.idea, *.xml,
236 240
237 241 # For another repo use different settings
238 242 [another-special-repo]
239 243 index_files = *,
240 244 max_filesize = 800MB
241 245 commit_parse_limit = 20000
242 246
243 247 .. _enable-elasticsearch:
244 248
245 249 Enabling Elasticsearch
246 250 ^^^^^^^^^^^^^^^^^^^^^^
247 251
248 252 1. Open the :file:`rhodecode.ini` file for the instance you wish to edit. The
249 253 default location is
250 254 :file:`/home/{user}/.rccontrol/{instance-id}/rhodecode.ini`
251 255 2. Find the search configuration section:
252 256
253 257 .. code-block:: ini
254 258
255 259 ###################################
256 260 ## SEARCH INDEXING CONFIGURATION ##
257 261 ###################################
258 262
259 263 search.module = rhodecode.lib.index.whoosh
260 264 search.location = %(here)s/data/index
261 265
262 266 and change it to:
263 267
264 268 .. code-block:: ini
265 269
266 270 search.module = rc_elasticsearch
267 271 search.location = http://localhost:9200/
268 272
269 273 where ``search.location`` points to the Elasticsearch server.
270 274
271 275 .. _Whoosh: https://pypi.python.org/pypi/Whoosh/
272 276 .. _Elasticsearch: https://www.elastic.co/ No newline at end of file