rhodecode-enterprise-docker Commit - r387:7aac5afe

.. _full-text-search-setup:

Full-text Search
----------------

RhodeCode provides full-text search capabilities for searching inside file content,
commit messages, and file paths. Indexing is not enabled by default; to use
full-text search, you must first build an index.

By default RhodeCode is configured to use `Whoosh`_ to index |repos| and
provide full-text search. `Whoosh`_ works well for a small amount of data but
shouldn't be used for large code-bases or a large number of repositories.

|RCE| also provides support for `ElasticSearch 6`_ as a backend for more advanced
and scalable search.

Auth Token generation
^^^^^^^^^^^^^^^^^^^^^

The RhodeCode indexer runs on top of the |RCE| API and requires an |authtoken|.
To run the indexer you need an |authtoken| with *admin* rights to all of the |repos| that the indexer should
process.

To get your API token, click the icon with your user in the top right corner of the |RCE| interface and go to
:menuselection:`your-username --> My Account --> Auth tokens`

1. Enter a description for the |authtoken|.
2. Select an expiration date if desired.
3. Select the ``api calls`` role for the token.
4. Click :guilabel:`Add`.
5. Click the obfuscated generated token, and copy it.

Indexing
^^^^^^^^

To index repositories stored in RhodeCode, you can set the indexer up in a
number of ways, for example:

* Call the indexer via a cron job. We recommend running this once at night.
  If you need content indexed sooner, it's possible to index a few
  times during the day. The indexer has a locking mechanism that won't allow
  two instances of the indexer to run at once, so it's safe to run it even every hour.
* Hook the indexer up with your CI server to reindex after each push.
* Set the indexer to loop infinitely and reindex as soon as it has finished its previous cycle.
  This gives near-instant indexing, with content available seconds after changes happen.

The indexer works by indexing new commits added since the last run, and comparing
file changes to index only new or modified files on each invocation.
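
The looping option from the list above could be sketched as follows, assuming a POSIX shell.
``run_indexer`` and ``run_indexer_loop`` are hypothetical helper names, not part of ``rcstack``,
and the 60-second default pause is an arbitrary choice. The indexer flags are the ones used
throughout this guide.

.. code-block:: bash

    # Sketch only: run_indexer and run_indexer_loop are hypothetical helpers.

    # One indexer cycle, using the flags shown elsewhere in this guide.
    run_indexer() {
        ./rcstack cli cmd rhodecode-index --no-tty \
            --config=/etc/rhodecode/conf/.rhoderc --instance-name=rc-idx
    }

    # run_indexer_loop [PAUSE_SECONDS] [MAX_CYCLES]
    # PAUSE_SECONDS defaults to 60 (an assumption); MAX_CYCLES=0 loops forever.
    run_indexer_loop() {
        pause="${1:-60}"
        max_cycles="${2:-0}"
        cycle=0
        while [ "$max_cycles" -eq 0 ] || [ "$cycle" -lt "$max_cycles" ]; do
            # Keep looping even if a single run fails; report it on stderr.
            run_indexer || echo "indexer run failed" >&2
            cycle=$((cycle + 1))
            sleep "$pause"
        done
    }

    # Example: run_indexer_loop 60

Because incremental runs only process new commits and changed files, a short pause between
cycles is usually enough to keep the load on the server low.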

.. note::

    If you wish to build a brand new index from scratch each time, use the ``force``
    option in the configuration file, or run the indexer with the ``--force`` flag.


To set up indexing, use the following steps:

1. :ref:`config-rhoderc`
2. :ref:`run-index`
3. :ref:`set-index`
4. :ref:`advanced-indexing`

.. _config-rhoderc:

Configure the ``.rhoderc`` File
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. note::

    Optionally, it's possible to use the indexer without the ``.rhoderc`` file. Instead of
    executing with ``--instance-name=rc-idx``, provide the host and token
    directly: ``--api-host=https://your-host.example.com --api-key=<auth-token-goes-here>``


.. note::

    In some cases the domain is only available via a custom DNS. You can always refer to the
    instance by its docker name and port (``http://rhodecode:10020``) instead of the hostname, for example:

    .. code-block:: bash

        ./rcstack cli cmd rhodecode-index --api-host=http://rhodecode:10020 --api-key=xxx

The indexer uses the :file:`/home/{user}/.rhoderc` file for connection details
to |RCE| instances. You need to configure the details for each instance you want to index.


.. code-block:: bash

    ./rcstack cli cmd rhodecode-setup-config \
        --filename=/etc/rhodecode/conf/.rhoderc \
        --instance-name=rc-idx api_host=https://your-host.example.com,api_key=<auth-token-goes-here>


Here's an example of a generated config, which you might also mount as a file into the docker image.

.. code-block:: ini

    # Configure .rhoderc with matching details
    # This allows the indexer to connect to the instance
    [instance:rc-idx]
    api_host = https://your-host.example.com
    api_key = <auth token goes here>
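
Since connection details are configured per instance, a file covering more than one instance
might look like the sketch below. The second section name ``rc-idx-staging`` and its host are
hypothetical and only illustrate the ``[instance:NAME]`` layout shown above.

.. code-block:: ini

    # Sketch only: rc-idx-staging and its host are hypothetical examples
    [instance:rc-idx]
    api_host = https://your-host.example.com
    api_key = <auth token goes here>

    [instance:rc-idx-staging]
    api_host = https://your-staging-host.example.com
    api_key = <auth token for the staging instance goes here>

The instance to index is then selected with ``--instance-name=``.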

.. _run-index:


Run the Indexer
^^^^^^^^^^^^^^^

Run the indexer using the following command, and specify the instance you want to index:

.. code-block:: bash

    # Simplest default indexing of all repositories
    $ ./rcstack cli cmd rhodecode-index \
        --no-tty --config=/etc/rhodecode/conf/.rhoderc \
        --instance-name=rc-idx

    # Using a custom mapping file and invocation without .rhoderc
    $ ./rcstack cli cmd rhodecode-index \
        --no-tty \
        --api-host=https://your-host.example.com --api-key=xxxxx \
        --mapping=/etc/rhodecode/conf/search_mapping.ini

    # Using a custom mapping file with indexing rules, and using the elasticsearch 6 backend
    $ ./rcstack cli cmd rhodecode-index \
        --no-tty --config=/etc/rhodecode/conf/.rhoderc \
        --instance-name=rc-idx \
        --mapping=/etc/rhodecode/conf/search_mapping.ini \
        --es-version=6 --engine-location=http://elasticsearch:9200

    # For more advanced usage, check the --help flag to see what other CLI options are available
    $ ./rcstack cli cmd rhodecode-index --help

.. note::

    When indexing frequently with the Whoosh backend, the index may become fragmented. The most common
    symptom is an error about ``too many open files``. To fix this, run the indexer with the ``--optimize`` flag, e.g.:

    .. code-block:: bash

        $ ./rcstack cli cmd rhodecode-index --instance-name=rc-idx --optimize

    This should be executed regularly; once a week is recommended. When using ElasticSearch this step can be skipped.

.. _set-index:

Schedule the Indexer
^^^^^^^^^^^^^^^^^^^^

To schedule the indexer, configure the crontab file to run the indexer inside
your |RCT| virtualenv using the following steps.

1. Open the crontab file, using ``crontab -e``.
2. Add the indexer to the crontab, and schedule it to run as regularly as you
   wish.
3. Save the file.

.. code-block:: bash

    $ crontab -e

    # The virtualenv can be called using its full path, so for example you can
    # put this example into the crontab

    # Run the indexer daily at 4am using the default mapping settings, --no-tty is required for non-interactive calls
    0 4 * * * ./rcstack cli cmd rhodecode-index --no-tty --config=/etc/rhodecode/conf/.rhoderc --instance-name=rc-idx

    # Run the indexer every Sunday at 3am using default mapping
    0 3 * * 0 ./rcstack cli cmd rhodecode-index --no-tty --config=/etc/rhodecode/conf/.rhoderc --instance-name=rc-idx

    # Run the indexer every 15 minutes
    # using a specially configured mapping file
    */15 * * * * ./rcstack cli cmd rhodecode-index --no-tty --config=/etc/rhodecode/conf/.rhoderc --instance-name=rc-idx --mapping=/etc/rhodecode/conf/search_mapping.ini
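
The weekly ``--optimize`` run recommended for the Whoosh backend can be scheduled the same way.
The entry below is a sketch: the Sunday 2am slot is an arbitrary assumption, and combining
``--optimize`` with ``--no-tty`` and ``--config`` is assumed to work as it does for regular runs.

.. code-block:: bash

    # Assumed schedule: optimize the Whoosh index every Sunday at 2am
    0 2 * * 0 ./rcstack cli cmd rhodecode-index --no-tty --config=/etc/rhodecode/conf/.rhoderc --instance-name=rc-idx --optimize

Pick a time that does not coincide with a regular indexing run, since the indexer's lock
prevents two instances from running at once.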

.. _advanced-indexing:

Advanced Indexing
^^^^^^^^^^^^^^^^^


Force Re-Indexing single repository
+++++++++++++++++++++++++++++++++++

It's often necessary to re-index a whole repository because of some repository changes,
or to remove some indexed secrets or files. The indexer has a ``--repo-name=`` flag
that limits execution to a single repository. For example, to force a re-index of a
single repository, run:

.. code-block:: bash

    ./rcstack cli cmd rhodecode-index --no-tty --config=/etc/rhodecode/conf/.rhoderc --instance-name=rc-idx --force --repo-name=rhodecode-vcsserver

Limiting indexing to small number of repos
++++++++++++++++++++++++++++++++++++++++++

To limit memory usage and system load, you can limit the number of repositories processed on each call.
The indexer has a ``--repo-limit=`` flag that limits execution to N repositories.

.. code-block:: bash

    ./rcstack cli cmd rhodecode-index --no-tty --config=/etc/rhodecode/conf/.rhoderc --instance-name=rc-idx --repo-limit=10

Removing repositories from index
++++++++++++++++++++++++++++++++

The indexer automatically removes renamed repositories from the index and builds the index for the new names.
In the same way, if a repository listed in the mapping file is not reported as existing by the
server, it's removed from the index.
To remove an indexed repository manually, use a call such as:

.. code-block:: bash

    ./rcstack cli cmd rhodecode-index --no-tty --config=/etc/rhodecode/conf/.rhoderc --instance-name=rc-idx --remove-only --repo-name=rhodecode-vcsserver

Using search_mapping.ini file for advanced index rules
++++++++++++++++++++++++++++++++++++++++++++++++++++++

By default ``rhodecode-index`` runs for all repositories and all files, with parsing limits
defined by the CLI default arguments. You can change those limits by passing
different flags such as ``--max-filesize=2048kb`` or ``--repo-limit=10``.

For more advanced execution logic, it's possible to use a configuration file that
defines detailed rules for which repositories should be indexed, and how.

To create the :file:`search_mapping.ini` file manually, use the command below:

.. code-block:: bash

    ./rcstack cli cmd rhodecode-index --config=/etc/rhodecode/conf/.rhoderc --instance-name=rc-idx \
        --create-mapping=/etc/rhodecode/conf/search_mapping.ini


Now the indexer can be executed with the ``--mapping`` flag.


Here's a detailed example of a :file:`search_mapping.ini` file.

.. code-block:: ini

    [__DEFAULT__]
    ; Create index on commits data, and files data in this order. Available options
    ; are `commits`, `files`
    index_types = commits,files

    ; Commit fetch limit. The chunk size in which commits are fetched
    ; via the API and parsed. This lets the server transfer smaller chunks and be less loaded
    commit_fetch_limit = 1000

    ; Commit process limit. Limits the number of commits the indexer should fetch and
    ; store inside the full-text search index, e.g. if a repo has 2000 commits and the
    ; limit is 1000, the first run will process commits 0-1000 and the
    ; second run commits 1000-2000. Helps reduce memory usage, default is 50000
    ; (set -1 for unlimited)
    commit_process_limit = 20000

    ; Limit of how many repositories each run can process, default is -1 (unlimited).
    ; In case of 1000s of repositories it's better to execute in chunks to not overload
    ; the server.
    repo_limit = -1

    ; Default patterns for indexing files and content of files. Binary files
    ; are skipped by default.

    ; Add to index those comma separated files; globs syntax
    ; e.g. index_files = *.py, *.c, *.h, *.js
    index_files = *,

    ; Do not add to index those comma separated files, this excludes
    ; both search by name and content; globs syntax
    ; e.g. skip_files = *.key, *.sql, *.xml, *.pem, *.crt
    skip_files = ,

    ; Add to index content of those comma separated files; globs syntax
    ; e.g. index_files_content = *.h, *.obj
    index_files_content = *,

    ; Do not add to index content of those comma separated files; globs syntax
    ; Binary files are not indexed by default.
    ; e.g. skip_files_content = *.min.js, *.xml, *.dump, *.log
    skip_files_content = ,

    ; Force rebuilding an index from scratch. Each repository will be rebuilt from
    ; scratch with a global flag. Use --repo-name=NAME --force to rebuild a single repo
    force = false

    ; Maximum file size that the indexer will use; files above that limit are not going
    ; to have their content indexed.
    ; Possible units are KB (kilobytes), MB (megabytes), e.g. 1MB or 1024KB
    max_filesize = 10MB


    [__INDEX_RULES__]
    ; Ordered match rules for repositories. A list of all repositories will be fetched
    ; using the API and this list will be filtered using those rules.
    ; Syntax for entry: `glob_pattern_OR_full_repo_name = 0 OR 1` where 0=exclude, 1=include
    ; When this ordered list is traversed, the first match returns the include/exclude marker
    ; For example:
    ;    upstream/binary_repo = 0
    ;    upstream/subrepo/xml_files = 0
    ;    upstream/* = 1
    ;    special-repo = 1
    ;    * = 0
    ; This will index all repositories under upstream/*, but skip upstream/binary_repo
    ; and upstream/subrepo/xml_files; the last * = 0 means skip all other matches


    ; == EXPLICIT REPOSITORY INDEXING ==
    ; If defined, this will skip using __INDEX_RULES__ and will not use the API to fetch
    ; the list of repositories. It will explicitly take names defined with [NAME] format and
    ; try to build the index. To build the index just for repo_name_1 and special-repo use:
    ;    [repo_name_1]
    ;    [special-repo]

    ; == PER REPOSITORY CONFIGURATION ==
    ; This allows overriding the global configuration per repository.
    ; Example to set a specific file size limit, and skip certain files for repository special-repo;
    ; the CLI flags don't override the conf settings.
    ;    [conf:special-repo]
    ;    max_filesize = 5mb
    ;    skip_files = *.xml, *.sql
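
The first-match behaviour of ``[__INDEX_RULES__]`` can be illustrated with a small shell sketch.
This is not the indexer's implementation: ``should_index`` is a hypothetical helper, shell
``case`` globbing only approximates the indexer's glob matching, and returning ``0`` when no
rule matches is an assumption.

.. code-block:: bash

    # Sketch only: mimics "first match wins" over the ordered example rules.
    RULES='upstream/binary_repo=0
    upstream/subrepo/xml_files=0
    upstream/*=1
    special-repo=1
    *=0'

    # should_index REPO_NAME -> prints 1 (include) or 0 (exclude)
    should_index() {
        repo="$1"
        result=0                       # assumption: no matching rule means "exclude"
        old_ifs="$IFS"
        IFS='
    '
        set -f                         # keep patterns from expanding against local files
        for rule in $RULES; do
            rule="${rule#"${rule%%[! ]*}"}"   # strip indentation left by the listing
            pattern="${rule%=*}"       # text before the last '='
            flag="${rule##*=}"         # text after the last '='
            # Unquoted $pattern is treated as a glob by `case`.
            case "$repo" in
                $pattern) result="$flag"; break ;;
            esac
        done
        set +f
        IFS="$old_ifs"
        echo "$result"
    }

    # Example: should_index upstream/tools

Rule order matters: if ``upstream/* = 1`` came first, ``upstream/binary_repo`` would be
included before its exclude rule was ever reached.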

With thousands of repositories it can be tricky to write the include/exclude rules at first.
There's a special flag to test the mapping file rules and list the repositories that would
be indexed. Run the indexer with ``--show-matched-repos`` to list only the
repositories matched by the rules defined in the ``.ini`` file:

.. code-block:: bash

    ./rcstack cli cmd rhodecode-index --no-tty --config=/etc/rhodecode/conf/.rhoderc --instance-name=rc-idx --show-matched-repos --mapping=/etc/rhodecode/conf/search_mapping.ini

.. _enable-elasticsearch:

Enabling ElasticSearch
^^^^^^^^^^^^^^^^^^^^^^

ElasticSearch is available in the EE edition only. It provides much more scalable and advanced
search capabilities. While Whoosh is fine for up to 1-2GB of data, beyond that amount it
starts slowing down, and can cause other problems.
ElasticSearch 6 also provides a much more advanced query language.
It allows advanced filtering by file paths and extensions, use of OR statements, ranges, etc.
Please check the query language examples in the search field for some advanced query language usage.

1. Open the :file:`rhodecode.ini` file for the instance you wish to edit. The
   default location is :file:`config/_shared/rhodecode.ini`
2. Find the search configuration section:

.. code-block:: ini

    ###################################
    ## SEARCH INDEXING CONFIGURATION ##
    ###################################

    search.module = rhodecode.lib.index.whoosh
    search.location = %(here)s/data/index

and change it to:

.. code-block:: ini

    search.module = rc_elasticsearch
    search.location = http://elasticsearch:9200
    ## specify Elastic Search version, 6 for latest or 2 for legacy
    search.es_version = 6

where ``search.location`` points to the ElasticSearch server,
by default running on port 9200.

The indexer invocation also needs to change. Provide the ``--es-version=`` and
``--engine-location=`` parameters to define the ElasticSearch server version and its location.
For example::

    --instance-name=rc-idx --es-version=6 --engine-location=http://elasticsearch:9200


.. _Whoosh: https://pypi.python.org/pypi/Whoosh/
.. _ElasticSearch 6: https://www.elastic.co/

