docs: add elasticsearch docs
dan -
r153:42d94537 default
@@ -1,235 +1,272 b''
1 1 .. _indexing-ref:
2 2
3 3 Full-text Search
4 4 ----------------
5 5
6 By default |RCM| uses `Whoosh`_ to index |repos| and provide full-text search.
6 By default |RC| is configured to use `Whoosh`_ to index |repos| and
7 provide full-text search.
8
9 |RCE| also provides support for `Elasticsearch`_ as a backend for scalable
10 search. See :ref:`enable-elasticsearch` for details.
11
12 Indexing
13 ^^^^^^^^
14
7 15 To run the indexer you need to use an |authtoken| with admin rights to all
8 16 |repos|.
9 17
10 18 To index new content added, you have the option to set the indexer up in a
11 19 number of ways, for example:
12 20
13 21 * Call the indexer via a cron job. We recommend running this nightly,
14 22 unless you need everything indexed immediately.
15 23 * Set the indexer to infinitely loop and reindex as soon as it has run its
16 24 cycle.
17 25 * Hook the indexer up with your CI server to reindex after each push.
18 26
19 27 The indexer works by indexing new commits added since the last run. If you
20 28 wish to build a brand new index from scratch each time,
21 29 use the ``force`` option in the configuration file.
22 30
23 31 .. important::
24 32
25 33 You need to have |RCT| installed, see :ref:`install-tools`. Since |RCE|
26 34 3.5.0 they are installed by default.
27 35
28 36 To set up indexing, use the following steps:
29 37
30 38 1. :ref:`config-rhoderc`, if running tools remotely.
31 39 2. :ref:`run-index`
32 40 3. :ref:`set-index`
33 41 4. :ref:`advanced-indexing`
34 42
35 43 .. _config-rhoderc:
36 44
37 45 Configure the ``.rhoderc`` File
38 46 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
39 47
40 48 |RCT| uses the :file:`/home/{user}/.rhoderc` file for connection details
41 49 to |RCM| instances. If this file is not automatically created,
42 50 you can configure it using the following example. You need to configure the
43 51 details for each instance you want to index.
44 52
45 53 .. code-block:: bash
46 54
47 55 # Check the instance details
48 56 # of the instance you want to index
49 57 $ rccontrol status
50 58
51 59 - NAME: enterprise-1
52 60 - STATUS: RUNNING
53 61 - TYPE: Momentum
54 62 - VERSION: 1.5.0
55 63 - URL: http://127.0.0.1:10000
56 64
57 65 To get your API Token, on the |RCM| interface go to
58 66 :menuselection:`username --> My Account --> Auth tokens`
59 67
60 68 .. code-block:: ini
61 69
62 70 # Configure .rhoderc with matching details
63 71 # This allows the indexer to connect to the instance
64 72 [instance:enterprise-1]
65 73 api_host = http://127.0.0.1:10000
66 74 api_key = <auth token goes here>
67 75 repo_dir = /home/<username>/repos
68 76
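As a quick sanity check before running the indexer, the ``.rhoderc`` entry for an instance can be inspected with a few lines of Python. This is a minimal sketch, not part of |RCT|: it assumes the file parses as standard INI, and that the three keys shown in the example above are the ones that must be present. The helper name ``missing_rhoderc_keys`` is hypothetical.

```python
import configparser

# Assumption: these are the keys shown in the example .rhoderc above.
REQUIRED_KEYS = ("api_host", "api_key", "repo_dir")


def missing_rhoderc_keys(text, instance_name):
    """Return the required keys that are missing or empty in the
    [instance:<name>] section, or None if the section does not exist."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = "instance:" + instance_name
    if not parser.has_section(section):
        return None
    return [
        key for key in REQUIRED_KEYS
        if not parser.get(section, key, fallback="").strip()
    ]


sample = """
[instance:enterprise-1]
api_host = http://127.0.0.1:10000
api_key = <auth token goes here>
repo_dir = /home/<username>/repos
"""

print(missing_rhoderc_keys(sample, "enterprise-1"))  # []
```

To check a real file, read its contents first and pass the text to the helper.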
69 77 .. _run-index:
70 78
71 79 Run the Indexer
72 80 ^^^^^^^^^^^^^^^
73 81
74 82 Run the indexer using the following command, and specify the instance you
75 83 want to index:
76 84
77 85 .. code-block:: bash
78 86
79 87 # From inside a virtualenv
80 88 (venv)$ rhodecode-index --instance-name=enterprise-1
81 89
82 90 # Using default installation
83 91 $ /home/user/.rccontrol/enterprise-4/profile/bin/rhodecode-index \
84 92 --instance-name=enterprise-4
85 93
86 94 # Using a custom mapping file
87 95 $ /home/user/.rccontrol/enterprise-4/profile/bin/rhodecode-index \
88 96 --instance-name=enterprise-4 \
89 97 --mapping=/home/user/.rccontrol/enterprise-4/mapping.ini
90 98
91 99 .. note::
92 100
93 101 |RCT| require |PY| 2.7 to run.
94 102
95 103 .. _set-index:
96 104
97 105 Schedule the Indexer
98 106 ^^^^^^^^^^^^^^^^^^^^
99 107
100 108 To schedule the indexer, configure the crontab file to run the indexer inside
101 109 your |RCT| virtualenv using the following steps.
102 110
103 111 1. Open the crontab file, using ``crontab -e``.
104 112 2. Add the indexer to the crontab, and schedule it to run as regularly as you
105 113 wish.
106 114 3. Save the file.
107 115
108 116 .. code-block:: bash
109 117
110 118 $ crontab -e
111 119
112 120 # The virtualenv can be called using its full path, so for example you can
113 121 # put this example into the crontab
114 122
115 123 # Run the indexer daily at 4am using the default mapping settings
116 124 0 4 * * * /home/ubuntu/.virtualenv/rhodecode-venv/bin/rhodecode-index \
117 125 --instance-name=enterprise-1
118 126
119 127 # Run the indexer every Sunday at 3am using default mapping
120 128 0 3 * * 0 /home/ubuntu/.virtualenv/rhodecode-venv/bin/rhodecode-index \
121 129 --instance-name=enterprise-1
122 130
123 131 # Run the indexer every 15 minutes
124 132 # using a specially configured mapping file
125 133 */15 * * * * ~/.rccontrol/enterprise-4/profile/bin/rhodecode-index \
126 134 --instance-name=enterprise-4 \
127 135 --mapping=/home/user/.rccontrol/enterprise-4/mapping.ini
128 136
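When writing crontab entries, keep in mind that the first field is the minute and the second is the hour, and that ``*`` in the minute field matches every minute of the selected hour. The sketch below counts how often a minute/hour pair fires per day. It is an illustration only and covers just a small subset of cron syntax (``*``, ``*/n``, and a single integer); the function names are hypothetical.

```python
def field_matches(field, value):
    """Match one cron field. Supports '*', '*/n', and a single integer only."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return value % int(field[2:]) == 0
    return int(field) == value


def runs_per_day(minute_field, hour_field):
    """Count the minutes in one day on which the minute/hour fields match."""
    return sum(
        1
        for hour in range(24)
        for minute in range(60)
        if field_matches(hour_field, hour) and field_matches(minute_field, minute)
    )


print(runs_per_day("*", "4"))     # 60: every minute between 04:00 and 04:59
print(runs_per_day("0", "4"))     # 1: once, at 04:00
print(runs_per_day("*/15", "*"))  # 96: every 15 minutes, all day
```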
129 137 .. _advanced-indexing:
130 138
131 139 Advanced Indexing
132 140 ^^^^^^^^^^^^^^^^^
133 141
134 142 |RCT| indexes based on the :file:`mapping.ini` file. To configure your index,
135 143 you can specify different options in this file. The default location is:
136 144
137 145 * :file:`/home/{user}/.rccontrol/{instance-id}/mapping.ini`, using default
138 146 |RCT|.
139 147 * :file:`~/venv/lib/python2.7/site-packages/rhodecode_tools/templates/mapping.ini`,
140 148 when using ``virtualenv``.
141 149
142 150 .. note::
143 151
144 152 If you need to create the :file:`mapping.ini` file, use the |RCT|
145 153 ``rhodecode-index --create-mapping path/to/file`` command. For details,
146 154 see the :ref:`tools-cli` section.
147 155
148 156 The indexer runs in a random order to prevent a failing |repo| from stopping
149 157 a build. To configure different indexing scenarios, set the following options
150 158 inside the :file:`mapping.ini` and specify the altered file using the
151 159 ``--mapping`` option.
152 160
153 161 * ``index_files`` : Index the specified file types.
154 162 * ``skip_files`` : Do not index the specified file types.
155 163 * ``index_files_content`` : Index the content of the specified file types.
156 164 * ``skip_files_content`` : Do not index the content of the specified files.
157 165 * ``force`` : Create a fresh index on each run.
158 166 * ``max_filesize`` : Files larger than the set size will not be indexed.
159 167 * ``commit_parse_limit`` : Set the batch size when indexing commit messages.
160 168 Set to a lower number to lessen memory load.
161 169 * ``repo_limit`` : Set the maximum number of |repos| indexed per run.
162 170 * ``[__INCLUDE__]`` : Set |repos| you want indexed. This takes precedence over
163 171 ``[__EXCLUDE__]``.
164 172 * ``[__EXCLUDE__]`` : Set |repos| you do not want indexed. Exclude can be used to
165 173 not index branches, forks, or log |repos|.
166 174
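The precedence of include patterns over exclude patterns can be pictured with the following sketch. It is not the indexer's actual code: it assumes shell-style glob matching (Python's ``fnmatch``) and assumes that a |repo| matching neither list is indexed. The function name and the sample |repo| names are hypothetical.

```python
from fnmatch import fnmatchcase


def should_index(repo_name, include, exclude):
    """Decide whether a repo is indexed. Include patterns win over exclude
    patterns; repos matching neither list are indexed (an assumption)."""
    if any(fnmatchcase(repo_name, pattern) for pattern in include):
        return True
    return not any(fnmatchcase(repo_name, pattern) for pattern in exclude)


include = ["docs/*", "lib/*"]
exclude = ["dev-docs/*", "legacy-repos/*", "*-dev/*"]

print(should_index("docs/api", include, exclude))        # True: included
print(should_index("dev-docs/api", include, exclude))    # False: excluded
print(should_index("lib/core-dev/x", include, exclude))  # True: include wins
print(should_index("tools/cli", include, exclude))       # True: matches neither
```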
167 175 At the end of the file you can specify conditions for specific |repos| that
168 176 will override the default values. To configure your indexer,
169 177 use the following example :file:`mapping.ini` file.
170 178
171 179 .. code-block:: ini
172 180
173 181 [__DEFAULT__]
174 182 # default patterns for indexing files and content of files.
175 183 # Binary files are skipped by default.
176 184
177 185 # Index python and markdown files
178 186 index_files = *.py, *.md
179 187
180 188 # Do not index these file types
181 189 skip_files = *.svg, *.log, *.dump, *.txt
182 190
183 191 # Index both file types and their content
184 192 index_files_content = *.cpp, *.ini, *.py
185 193
186 194 # Index file names, but not file content
187 195 skip_files_content = *.svg,
188 196
189 197 # Force rebuilding an index from scratch. Each repository will be rebuilt
190 198 # from scratch with a global flag. Use local flag to rebuild single repos
191 199 force = false
192 200
193 201 # Do not index files larger than 385KB
194 202 max_filesize = 385KB
195 203
196 204 # Limit commit indexing to 500 per batch
197 205 commit_parse_limit = 500
198 206
199 207 # Limit each index run to 25 repos
200 208 repo_limit = 25
201 209
202 210 # __INCLUDE__ is more important than __EXCLUDE__.
203 211
204 212 [__INCLUDE__]
205 213 # Include all repos with these names
206 214
207 215 docs/* = 1
208 216 lib/* = 1
209 217
210 218 [__EXCLUDE__]
211 219 # Do not include the following repos in the index
212 220
213 221 dev-docs/* = 1
214 222 legacy-repos/* = 1
215 223 *-dev/* = 1
216 224
217 225 # Each repo that needs special indexing is a separate section below.
218 226 # In each section set the options to override the global configuration
219 227 # parameters above.
220 228 # If special settings are not configured, the global configuration values
221 229 # above are inherited. If no special repositories are
222 230 # defined here RhodeCode will use the API to ask for all repositories
223 231
224 232 # For this repo use different settings
225 233 [special-repo]
226 234 commit_parse_limit = 20
227 235 skip_files = *.idea, *.xml,
228 236
229 237 # For another repo use different settings
230 238 [another-special-repo]
231 239 index_files = *,
232 240 max_filesize = 800MB
233 241 commit_parse_limit = 20000
234 242
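The way per-|repo| sections inherit from ``[__DEFAULT__]`` can be modelled with Python's ``configparser``, which supports a configurable default section. This is a sketch under the assumption that inheritance is a simple key-by-key override; the indexer may parse the file differently. The helper name ``effective_settings`` is hypothetical, and the sample is a trimmed version of the example above.

```python
import configparser

MAPPING = """
[__DEFAULT__]
max_filesize = 385KB
commit_parse_limit = 500

[special-repo]
commit_parse_limit = 20

[another-special-repo]
max_filesize = 800MB
commit_parse_limit = 20000
"""


def effective_settings(text, repo_name):
    """Return the settings for a repo: its own section layered over the
    defaults, or the defaults alone if the repo has no section."""
    parser = configparser.ConfigParser(
        default_section="__DEFAULT__", interpolation=None
    )
    parser.read_string(text)
    if parser.has_section(repo_name):
        return dict(parser[repo_name])
    return dict(parser.defaults())


special = effective_settings(MAPPING, "special-repo")
print(special["commit_parse_limit"], special["max_filesize"])  # 20 385KB

other = effective_settings(MAPPING, "some-other-repo")
print(other["commit_parse_limit"], other["max_filesize"])      # 500 385KB
```

All values come back as strings; converting sizes such as ``385KB`` to bytes is left out of the sketch.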
243 .. _enable-elasticsearch:
244
245 Enabling Elasticsearch
246 ^^^^^^^^^^^^^^^^^^^^^^
247
248 1. Open the :file:`rhodecode.ini` file for the instance you wish to edit. The
249 default location is
250 :file:`/home/{user}/.rccontrol/{instance-id}/rhodecode.ini`.
251 2. Find the search configuration section:
252
253 .. code-block:: ini
254
255 ###################################
256 ## SEARCH INDEXING CONFIGURATION ##
257 ###################################
258
259 search.module = rhodecode.lib.index.whoosh
260 search.location = %(here)s/data/index
261
262 and change it to:
263
264 .. code-block:: ini
265
266 search.module = rc_elasticsearch
267 search.location = http://localhost:9200/
268
269 where ``search.location`` points to the Elasticsearch server.
270
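Before switching the search backend, it can help to confirm that the server at ``search.location`` answers. An Elasticsearch server returns a small JSON document on its root URL, so a plain HTTP request is enough. The following is a minimal sketch using only the Python standard library; the function name ``elasticsearch_info`` and the ``opener`` parameter (there to allow substituting a fake in tests) are hypothetical and not part of |RCE|.

```python
import json
from urllib.request import urlopen


def elasticsearch_info(location, opener=urlopen, timeout=5):
    """Request the root URL of an Elasticsearch server and return the
    decoded JSON body. Raises OSError if the server cannot be reached."""
    with opener(location, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling ``elasticsearch_info("http://localhost:9200/")`` against a running server returns a dictionary that includes the cluster name and version details. A connection failure surfaces as an ``OSError`` subclass.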
235 271 .. _Whoosh: https://pypi.python.org/pypi/Whoosh/
272 .. _Elasticsearch: https://www.elastic.co/