u/pc/rhodecode-enterprise-ce-fork-pc Commit - r3482:4b2dd92b

.. _indexing-ref:

Full-text Search
----------------

RhodeCode provides full-text search inside file content, commit messages, and
file paths. Indexing is not enabled by default; you must build an index before
you can use full-text search.

By default RhodeCode is configured to use `Whoosh`_ to index |repos| and
provide full-text search. `Whoosh`_ works well for small amounts of data, but
should not be used for large code bases or large numbers of repositories.

|RCE| also provides support for `Elasticsearch 6`_ as a backend for more advanced
and scalable search. See :ref:`enable-elasticsearch` for details.

Indexing
^^^^^^^^

To run the indexer, you need an |authtoken| with admin rights to all |repos|.

To index repositories stored in RhodeCode, you can set up the indexer in a
number of ways, for example:

* Call the indexer via a cron job. We recommend running it once a night.
  If you need new content indexed sooner, you can run it several times a day.
  The indexer has a locking mechanism that prevents two instances from running
  at once, so it is safe to run it as often as once an hour.
* Set the indexer to loop indefinitely and reindex as soon as it has finished its previous cycle.
* Hook the indexer up to your CI server to reindex after each push.
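
The continuous-loop option can be sketched as a small shell wrapper. This is a
minimal sketch, not a script shipped with |RCT|. The function name
``index_forever`` and the ``INDEXER``, ``INSTANCE``, ``PAUSE`` and ``MAX_CYCLES``
variables are hypothetical. The indexer path assumes the default ``rccontrol``
layout used elsewhere on this page.

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical wrapper: re-run the indexer as soon as the previous cycle ends.
    INDEXER=${INDEXER:-/home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index}
    INSTANCE=${INSTANCE:-enterprise-1}
    PAUSE=${PAUSE:-60}           # seconds to wait between cycles (assumed value)
    MAX_CYCLES=${MAX_CYCLES:-0}  # 0 means loop forever

    index_forever() {
        local cycle=0
        while [ "$MAX_CYCLES" -eq 0 ] || [ "$cycle" -lt "$MAX_CYCLES" ]; do
            # A failed run is reported on stderr but does not stop the loop.
            "$INDEXER" --instance-name="$INSTANCE" || echo "indexer cycle $cycle failed" >&2
            cycle=$((cycle + 1))
            sleep "$PAUSE"
        done
    }

Start ``index_forever`` under a terminal multiplexer or a service manager so it
keeps running after you log out. The short pause avoids a busy loop when there
is nothing new to index.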

The indexer works by indexing new commits added since the last run, and by comparing
file changes so that only new or modified files are indexed.
If you wish to build a brand new index from scratch each time, use the ``force``
option in the configuration file, or run the indexer with the ``--force`` flag.

.. important::

   You need to have |RCT| installed, see :ref:`install-tools`. Since |RCE|
   3.5.0 the tools are installed by default and are available with both community and enterprise installations.

To set up indexing, use the following steps:

1. :ref:`config-rhoderc`, if running tools remotely.
2. :ref:`run-index`
3. :ref:`set-index`
4. :ref:`advanced-indexing`

.. _config-rhoderc:

Configure the ``.rhoderc`` File
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. note::

    Optionally, you can run the indexer without the ``.rhoderc`` file. Instead of
    passing ``--instance-name=enterprise-1``, provide the host and token
    directly: ``--api-host=http://127.0.0.1:10000 --api-key=<auth-token-goes-here>``

|RCT| uses the :file:`/home/{user}/.rhoderc` file for connection details
to |RCE| instances. If this file is not created automatically,
you can configure it using the following example. You need to configure the
details for each instance you want to index.

.. code-block:: bash

    # Check the instance details
    # of the instance you want to index
    $ rccontrol status

    - NAME: enterprise-1
    - STATUS: RUNNING
    - TYPE: Enterprise
    - VERSION: 4.1.0
    - URL: http://127.0.0.1:10003

To get your API token, go to
:menuselection:`username --> My Account --> Auth tokens` in the |RCE| interface.

.. code-block:: ini

    # Configure .rhoderc with matching details
    # This allows the indexer to connect to the instance
    [instance:enterprise-1]
    api_host = http://127.0.0.1:10003
    api_key = <auth token goes here>
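
You may prefer to script the ``.rhoderc`` setup, for example when provisioning a
CI machine. The following minimal sketch appends one ``[instance:NAME]`` section
with the same keys as the example above. The function name ``write_rhoderc`` and
its argument order are hypothetical. The ``chmod 600`` step is a precaution,
because the file stores an API token.

.. code-block:: bash

    # Hypothetical helper: append an [instance:NAME] section to a .rhoderc file.
    # Usage: write_rhoderc NAME API_HOST API_KEY [FILE]
    write_rhoderc() {
        local name=$1 host=$2 key=$3 file=${4:-$HOME/.rhoderc}
        # Create the file if needed and make it readable by the owner only.
        touch "$file" && chmod 600 "$file" || return 1
        # Refuse to add a second section for the same instance.
        if grep -qxF "[instance:$name]" "$file"; then
            echo "[instance:$name] already exists in $file" >&2
            return 1
        fi
        printf '\n[instance:%s]\napi_host = %s\napi_key = %s\n' \
            "$name" "$host" "$key" >> "$file"
    }

For example, ``write_rhoderc enterprise-1 http://127.0.0.1:10003 "$TOKEN"``,
where ``TOKEN`` holds the auth token you created in the |RCE| interface.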

.. _run-index:

Run the Indexer
^^^^^^^^^^^^^^^

Run the indexer using the following command, and specify the instance you want to index:

.. code-block:: bash

    # Simple indexing of all repositories using the default settings
    $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
        --instance-name=enterprise-1

    # Using a custom mapping file with indexing rules, and the Elasticsearch 6 backend
    $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
        --instance-name=enterprise-1 \
        --mapping=/home/user/.rccontrol/enterprise-1/search_mapping.ini \
        --es-version=6 --engine-location=http://elasticsearch-host:9200

    # Using a custom mapping file and invocation without .rhoderc
    $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
        --api-host=http://rhodecode.myserver.com --api-key=xxxxx \
        --mapping=/home/user/.rccontrol/enterprise-1/search_mapping.ini

    # From inside a virtualenv on your local machine or CI server.
    (venv)$ rhodecode-index --instance-name=enterprise-1
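
To hook the indexer up to a CI server, one option is a job step that runs after
each push. The step limits the indexer to the pushed repository with the
``--repo-name=`` flag described in :ref:`advanced-indexing`. This is a sketch
only. How your CI system exposes the repository name is an assumption, and the
function name ``reindex_pushed_repo`` is hypothetical.

.. code-block:: bash

    # Path to the indexer; adjust to your installation.
    INDEXER=${INDEXER:-/home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index}

    # Hypothetical CI step: update the index for one repository after a push.
    reindex_pushed_repo() {
        local repo=$1
        if [ -z "$repo" ]; then
            echo "usage: reindex_pushed_repo REPO_NAME" >&2
            return 2
        fi
        "$INDEXER" --instance-name=enterprise-1 --repo-name="$repo"
    }

    # Example CI invocation, assuming the CI system exports REPO_NAME:
    # reindex_pushed_repo "$REPO_NAME"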

.. note::

    Frequent indexing can fragment the index. A common symptom of this is a
    ``too many open files`` error. To fix it, run the indexer with the
    ``--optimize`` flag, e.g. ``rhodecode-index --instance-name=enterprise-1 --optimize``.
    Do this regularly; once a week is recommended.

.. _set-index:

Schedule the Indexer
^^^^^^^^^^^^^^^^^^^^

To schedule the indexer, configure the crontab file to run the indexer inside
your |RCT| virtualenv using the following steps.

1. Open the crontab file, using ``crontab -e``.
2. Add the indexer to the crontab, and schedule it to run as regularly as you
   wish. Each crontab entry must be written on a single line.
3. Save the file.

.. code-block:: bash

    $ crontab -e

    # The virtualenv can be called using its full path, so for example you can
    # put these entries into the crontab. Each entry must be on a single line.

    # Run the indexer daily at 4am using the default mapping settings
    0 4 * * * /home/ubuntu/.virtualenv/rhodecode-venv/bin/rhodecode-index --instance-name=enterprise-1

    # Run the indexer every Sunday at 3am using the default mapping settings
    0 3 * * 0 /home/ubuntu/.virtualenv/rhodecode-venv/bin/rhodecode-index --instance-name=enterprise-1

    # Run the indexer every 15 minutes
    # using a specially configured mapping file
    */15 * * * * ~/.rccontrol/enterprise-4/profile/bin/rhodecode-index --instance-name=enterprise-4 --mapping=/home/user/.rccontrol/enterprise-4/search_mapping.ini
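
Running ``crontab -e`` is interactive. To add an entry from a provisioning script
instead, a common pattern is to pipe the current crontab through a filter and back
into ``crontab -``. The sketch below is one way to make that idempotent. The
function name ``merge_cron_line`` is hypothetical. The ``--optimize`` entry in the
usage comment follows the weekly recommendation in the note about fragmented indexes.

.. code-block:: bash

    # Hypothetical helper: read a crontab on stdin, print it with LINE appended
    # unless an identical line is already present.
    merge_cron_line() {
        local line=$1 current
        current=$(cat)
        if [ -n "$current" ]; then
            printf '%s\n' "$current"
        fi
        if ! printf '%s\n' "$current" | grep -qxF -- "$line"; then
            printf '%s\n' "$line"
        fi
    }

    # Usage (each cron entry must be a single line):
    # LINE='0 2 * * 6 /home/ubuntu/.virtualenv/rhodecode-venv/bin/rhodecode-index --instance-name=enterprise-1 --optimize'
    # crontab -l 2>/dev/null | merge_cron_line "$LINE" | crontab -

Check the output of ``crontab -l`` afterwards. If ``crontab -l`` fails for a
reason other than an empty crontab, this pipeline replaces your crontab with a
single entry. Try it on a non-production account first.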

.. _advanced-indexing:

Advanced Indexing
^^^^^^^^^^^^^^^^^

Force Re-Indexing single repository
+++++++++++++++++++++++++++++++++++

Sometimes you need to re-index a whole repository, for example after changes to
the repository or to remove indexed secrets or files. The ``--repo-name=`` flag
limits the indexer to a single repository. For example, to force a re-index of a
single repository, run::

    rhodecode-index --instance-name=enterprise-1 --force --repo-name=rhodecode-vcsserver
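
To force a re-index of several repositories, you can call the indexer once per
repository. The following minimal sketch assumes the ``rhodecode-index`` command
is on your ``PATH``, for example inside the |RCT| virtualenv. The function name
``force_reindex`` is hypothetical.

.. code-block:: bash

    INDEXER=${INDEXER:-rhodecode-index}

    # Hypothetical helper: force a re-index of each repository given as an argument.
    # Continues past failures and returns non-zero if any run failed.
    force_reindex() {
        local repo status=0
        for repo in "$@"; do
            if ! "$INDEXER" --instance-name=enterprise-1 --force --repo-name="$repo"; then
                echo "re-index failed for $repo" >&2
                status=1
            fi
        done
        return "$status"
    }

    # Example: force_reindex rhodecode-vcsserver rhodecode-enterprise-ce

Processing the repositories one after another avoids starting several indexer
processes at once.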

Removing repositories from index
++++++++++++++++++++++++++++++++

The indexer automatically removes renamed repositories from the index and builds
the index under their new names. Likewise, if a repository listed in the mapping
file is not reported as existing by the server, it is removed from the index.

To remove an indexed repository manually, run::

    rhodecode-index --instance-name=enterprise-1 --remove-only --repo-name=rhodecode-vcsserver

Using search_mapping.ini file for advanced index rules
++++++++++++++++++++++++++++++++++++++++++++++++++++++

By default ``rhodecode-index`` runs for all repositories and all files, with parsing limits
defined by the CLI default arguments. You can change those limits by passing
different flags, such as ``--max-filesize=2048kb`` or ``--repo-limit=10``.

For more advanced execution logic, you can use a configuration file that
defines detailed rules for which repositories should be indexed and how.

|RCT| provides an example index configuration file called :file:`search_mapping.ini`.
This file is created by default during installation and is located at:

* :file:`/home/{user}/.rccontrol/{instance-id}/search_mapping.ini`, using default |RCT|.
* :file:`~/venv/lib/python2.7/site-packages/rhodecode_tools/templates/mapping.ini`,
  when using ``virtualenv``.

.. note::

    If you need to create the :file:`search_mapping.ini` file manually, use the |RCT|
    ``rhodecode-index --create-mapping path/to/search_mapping.ini`` API call.
    For details, see the :ref:`tools-cli` section.

To run the indexer with a mapping file, pass it using the ``--mapping`` flag::

    rhodecode-index --instance-name=enterprise-1 --mapping=/my/path/search_mapping.ini

Here's a detailed example of a :file:`search_mapping.ini` file.

.. code-block:: ini

    [__DEFAULT__]
    ; Create index on commits data, and files data in this order. Available options
    ; are `commits`, `files`
    index_types = commits,files

    ; Commit fetch limit: the size of the chunks in which commits are fetched
    ; via the API and parsed. Smaller chunks keep the server less loaded.
    commit_fetch_limit = 1000

    ; Commit process limit: the maximum number of commits the indexer fetches and
    ; stores in the full-text search index per run. E.g. if a repo has 2000 commits
    ; and the limit is 1000, the first run processes commits 0-1000 and the
    ; second run commits 1000-2000. This helps reduce memory usage. Default is 50000
    ; (set -1 for unlimited).
    commit_process_limit = 20000

    ; Limit of how many repositories each run can process, default is -1 (unlimited).
    ; With 1000s of repositories it's better to index in chunks so as not to overload
    ; the server.
    repo_limit = -1

    ; Default patterns for indexing files and content of files. Binary files
    ; are skipped by default.

    ; Add to index those comma separated files; globs syntax
    ; e.g. index_files = *.py, *.c, *.h, *.js
    index_files = *,

    ; Do not add to index those comma separated files, this excludes
    ; both search by name and content; globs syntax
    ; e.g. skip_files = *.key, *.sql, *.xml, *.pem, *.crt
    skip_files = ,

    ; Add to index content of those comma separated files; globs syntax
    ; e.g. index_files_content = *.h, *.obj
    index_files_content = *,

    ; Do not add to index content of those comma separated files; globs syntax
    ; Binary files are not indexed by default.
    ; e.g. skip_files_content = *.min.js, *.xml, *.dump, *.log
    skip_files_content = ,

    ; Force rebuilding an index from scratch. Each repository will be rebuilt from
    ; scratch with a global flag. Use --repo-name=NAME --force to rebuild a single repo
    force = false

    ; Maximum file size that the indexer will use; files above that limit will not
    ; have their content indexed.
    ; Possible units are KB (kilobytes) and MB (megabytes), e.g. 1MB or 1024KB
    max_filesize = 10MB


    [__INDEX_RULES__]
    ; Ordered match rules for repositories. A list of all repositories will be fetched
    ; using the API and filtered using these rules.
    ; Syntax for entry: `glob_pattern_OR_full_repo_name = 0 OR 1` where 0=exclude, 1=include
    ; The list is traversed in order, and the first matching rule returns the include/exclude marker.
    ; For example:
    ; upstream/binary_repo = 0
    ; upstream/subrepo/xml_files = 0
    ; upstream/* = 1
    ; special-repo = 1
    ; * = 0
    ; This will index all repositories under upstream/*, except upstream/binary_repo
    ; and upstream/subrepo/xml_files, and will also index special-repo. The last
    ; * = 0 skips all other repositories.

    ; Another example:
    ; *-fork = 0
    ; * = 1
    ; This will index all repositories, except those with a -fork suffix.

    rhodecode-vcsserver = 1
    rhodecode-enterprise-ce = 1
    upstream/mozilla/firefox-repo = 0
    upstream/git-binaries = 0
    upstream/* = 1
    * = 0

    ; == EXPLICIT REPOSITORY INDEXING ==
    ; If defined, this will skip using __INDEX_RULES__ and will not use the API to fetch
    ; the list of repositories. It will explicitly take the names defined in [NAME] format
    ; and try to build the index. To build the index just for repo_name_1 and special-repo use:
    ; [repo_name_1]
    ; [special-repo]

    ; == PER REPOSITORY CONFIGURATION ==
    ; This allows overriding the global configuration per repository.
    ; Example to set a specific file size limit and skip certain files for repository special-repo.
    ; CLI flags do not override these conf settings.
    ; [conf:special-repo]
    ; max_filesize = 5mb
    ; skip_files = *.xml, *.sql

    [conf:rhodecode-vcsserver]
    index_types = files,
    max_filesize = 5mb
    skip_files = *.xml, *.sql
    index_files = *.py, *.c, *.h, *.js
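
The ordered, first-match-wins evaluation of ``[__INDEX_RULES__]`` can be
illustrated with a few lines of shell. This is not the |RCT| implementation. It
restates the rules from the example above and assumes the patterns behave like
shell globs, where ``*`` also matches ``/``. The array name ``RULES`` and the
function ``rule_for_repo`` are hypothetical.

.. code-block:: bash

    # Rules in file order, written as "pattern=flag" (1=include, 0=exclude).
    RULES=(
        "rhodecode-vcsserver=1"
        "rhodecode-enterprise-ce=1"
        "upstream/mozilla/firefox-repo=0"
        "upstream/git-binaries=0"
        "upstream/*=1"
        "*=0"
    )

    # Print the flag of the first rule whose pattern matches the repository name.
    rule_for_repo() {
        local repo=$1 rule pattern
        for rule in "${RULES[@]}"; do
            pattern=${rule%=*}
            # $pattern is left unquoted on purpose so that it is matched as a glob.
            case $repo in
                $pattern) printf '%s\n' "${rule##*=}"; return 0 ;;
            esac
        done
        return 1  # no rule matched
    }

For example, ``rule_for_repo upstream/git-binaries`` prints ``0``, because that
exclusion appears before ``upstream/* = 1``. By contrast, ``rule_for_repo upstream/tools``
prints ``1``. To see what the indexer itself matches, use ``--show-matched-repos``.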

With thousands of repositories it can be tricky to get the include/exclude rules right at first.
To test the mapping file rules, run the indexer with ``--show-matched-repos``. It lists only the
repositories matched by the rules in the ``.ini`` file, which are the ones that would be indexed::

    rhodecode-index --instance-name=enterprise-1 --show-matched-repos --mapping=/my/path/search_mapping.ini

.. _enable-elasticsearch:

Enabling Elasticsearch
^^^^^^^^^^^^^^^^^^^^^^

Elasticsearch is available in the Enterprise Edition (EE) only. It provides much more
scalable and advanced search capabilities. While Whoosh is fine for up to 1-2GB of data,
beyond that amount it starts slowing down and can cause other problems.
Elasticsearch 6 also provides a much more advanced query language. It allows advanced
filtering by file paths and extensions, OR statements, ranges, etc. Check the query
language examples in the search field for advanced usage.

1. Open the :file:`rhodecode.ini` file for the instance you wish to edit. The
   default location is
   :file:`/home/{user}/.rccontrol/{instance-id}/rhodecode.ini`
2. Find the search configuration section:

   .. code-block:: ini

       ###################################
       ## SEARCH INDEXING CONFIGURATION ##
       ###################################

       search.module = rhodecode.lib.index.whoosh
       search.location = %(here)s/data/index

   and change it to:

   .. code-block:: ini

       search.module = rc_elasticsearch
       search.location = http://localhost:9200
       ## specify the Elasticsearch version: 6 for latest or 2 for legacy
       search.es_version = 6

   where ``search.location`` points to the Elasticsearch server,
   which by default runs on port 9200.

The indexer invocation also needs to change. Provide the ``--es-version=`` and
``--engine-location=`` parameters to define the Elasticsearch version and server location.
For example::

    rhodecode-index --instance-name=enterprise-1 --es-version=6 --engine-location=http://localhost:9200
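
Before running the indexer against Elasticsearch, you can confirm that the server
at ``--engine-location`` is reachable and runs the major version you pass with
``--es-version``. A request to the Elasticsearch root endpoint (``GET /``) returns
cluster information that includes ``version.number``. The sketch below extracts
the major version with ``grep`` to avoid extra dependencies. The function names
are hypothetical, and ``jq`` would be a more robust JSON parser if it is available.

.. code-block:: bash

    # Read the JSON from GET / on stdin and print the major version, e.g. "6".
    es_major_version() {
        grep -o '"number"[[:space:]]*:[[:space:]]*"[0-9]*' | grep -o '[0-9]*$' | head -n 1
    }

    # Hypothetical check: compare the server's major version with the expected one.
    check_es_version() {
        local url=${1:-http://localhost:9200} expected=${2:-6} major
        major=$(curl -fsS "$url" | es_major_version)
        if [ "$major" != "$expected" ]; then
            echo "expected Elasticsearch $expected at $url, got '${major:-unreachable}'" >&2
            return 1
        fi
    }

    # Example: check_es_version http://localhost:9200 6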

.. _Whoosh: https://pypi.python.org/pypi/Whoosh/
.. _Elasticsearch 6: https://www.elastic.co/

	Site-wide shortcuts
/	Use quick search box
g h	Goto home page
g g	Goto my private gists page
g G	Goto my public gists page
g 0-9	Goto bookmarked items from 0-9
n r	New repository page
n g	New gist page

	Repositories
g s	Goto summary page
g c	Goto changelog page
g f	Goto files page
g F	Goto files page with file search activated
g p	Goto pull requests page
g o	Goto repository settings
g O	Goto repository access permissions settings
t s	Toggle sidebar on some pages

             .. _indexing-ref:
             Full-text Search
             ----------------
+            RhodeCode provides a full text search capabilities to search inside file content,
+            commit message, and file paths. Indexing is not enabled by default and to use
+            full text search building an index is a pre-requisite.
             By default RhodeCode is configured to use `Whoosh`_ to index |repos| and
-            provide full-text search.
+            provide full-text search. `Whoosh`_ works well for a small amount of data and
+            shouldn't be used in case of large code-bases and lots of repositories.
-            |RCE| also provides support for `Elasticsearch 6`_ as a backend more for advanced
+            |RCE| also provides support for `ElasticSearch 6`_ as a backend more for advanced
             and scalable search. See :ref:`enable-elasticsearch` for details.
             Indexing
             ^^^^^^^^
             To run the indexer you need to have an |authtoken| with admin rights to all |repos|.
-            To index new content added, you have the option to set the indexer up in a
+            To index repositories stored in RhodeCode, you have the option to set the indexer up in a
             number of ways, for example:
             * Call the indexer via a cron job. We recommend running this once at night.
               In case you need everything indexed immediately it's possible to index few
-              times during the day.
+              times during the day. Indexer has a special locking mechanism that won't allow
+              two instances of indexer running at once. It's safe to run it even every 1hr.
             * Set the indexer to infinitely loop and reindex as soon as it has run its previous cycle.
             * Hook the indexer up with your CI server to reindex after each push.
-            The indexer works by indexing new commits added since the last run. If you
+            The indexer works by indexing new commits added since the last run, and comparing
-            wish to build a brand new index from scratch each time,
+            file changes to index only new or modified files.
-            use the ``force`` option in the configuration file.
+            If you wish to build a brand new index from scratch each time, use the ``force``
+            option in the configuration file, or run the indexer with the ``--force`` flag.
             .. important::
                You need to have |RCT| installed, see :ref:`install-tools`. Since |RCE|
 .5.0 they are installed by default and available with community/enterprise installations.
             To set up indexing, use the following steps:
             #. :ref:`config-rhoderc`, if running tools remotely.
             #. :ref:`run-index`
             #. :ref:`set-index`
             #. :ref:`advanced-indexing`
             .. _config-rhoderc:
             Configure the ``.rhoderc`` File
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
             .. note::
                 Optionally, you can run the indexer without the ``.rhoderc`` file. Instead of
                 passing ``--instance-name=enterprise-1``, provide the host and token
-                directly: `--api-host=http://127.0.0.1:10000 --api-key=<auth token goes here>
+                directly: ``--api-host=http://127.0.0.1:10000 --api-key=<auth-token-goes-here>``
             |RCT| uses the :file:`/home/{user}/.rhoderc` file for connection details
             to |RCE| instances. If this file is not automatically created,
             you can configure it using the following example. You need to configure the
             details for each instance you want to index.
             .. code-block:: bash
                 # Check the instance details
                 # of the instance you want to index
                 $ rccontrol status
                 - NAME: enterprise-1
                 - STATUS: RUNNING
                 - TYPE: Enterprise
                 - VERSION: 4.1.0
                 - URL: http://127.0.0.1:10003
             To get your API Token, on the |RCE| interface go to
             :menuselection:`username --> My Account --> Auth tokens`
             .. code-block:: ini
                 # Configure .rhoderc with matching details
                 # This allows the indexer to connect to the instance
                 [instance:enterprise-1]
                 api_host = http://127.0.0.1:10000
                 api_key = <auth token goes here>
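             If you manage several instances, writing the ``.rhoderc`` section from a script can
             help. The ``write_rhoderc`` function is a hypothetical helper, not part of |RCT|; it
             appends a section in the format shown above and, as a suggested precaution, restricts
             the file permissions because the file contains an auth token.
             .. code-block:: bash
                 # Hypothetical helper: append an [instance:NAME] section to a .rhoderc file.
                 # Arguments: file path, instance name, API host URL, auth token.
                 write_rhoderc() {
                     local file="$1" name="$2" host="$3" token="$4"
                     printf '[instance:%s]\napi_host = %s\napi_key = %s\n' \
                         "$name" "$host" "$token" >> "$file"
                     # The file holds an auth token, so make it readable by its owner only.
                     chmod 600 "$file"
                 }
             For example: ``write_rhoderc "$HOME/.rhoderc" enterprise-1 http://127.0.0.1:10000 "$TOKEN"``.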
             .. _run-index:
             Run the Indexer
             ^^^^^^^^^^^^^^^
             Run the indexer using the following command, and specify the instance you want to index:
             .. code-block:: bash
-               # Using default installation
+               # Simplest invocation: index all repositories with the default settings
                $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
                    --instance-name=enterprise-1
-               # Using a custom mapping file
+               # Using a custom mapping file with indexing rules and the ElasticSearch 6 backend
                $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
                    --instance-name=enterprise-1 \
-                   --mapping=/home/user/.rccontrol/enterprise-1/search_mapping.ini
+                   --mapping=/home/user/.rccontrol/enterprise-1/search_mapping.ini \
+                   --es-version=6 --engine-location=http://elasticsearch-host:9200
                # Using a custom mapping file and invocation without ``.rhoderc``
                $ /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index \
                    --api-host=http://rhodecode.myserver.com --api-key=xxxxx \
                    --mapping=/home/user/.rccontrol/enterprise-1/search_mapping.ini
                # From inside a virtualenv on your local machine or CI server.
                (venv)$ rhodecode-index --instance-name=enterprise-1
             .. note::
                Frequent indexing can fragment the index. The most common symptom is a
                ``too many open files`` error. To fix this, run the indexer with the
                ``--optimize`` flag, e.g. ``rhodecode-index --instance-name=enterprise-1 --optimize``.
                We recommend running this regularly, for example once a week.
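             To follow the weekly recommendation, the ``--optimize`` run can itself be scheduled
             with cron. A sketch, assuming the default |RCT| install path and a Sunday 2am slot
             chosen (as an example) to sit outside your regular indexing schedule:
             .. code-block:: bash
                 # crontab entry; each cron entry must stay on a single line
                 0 2 * * 0 /home/user/.rccontrol/enterprise-1/profile/bin/rhodecode-index --instance-name=enterprise-1 --optimize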
             .. _set-index:
             Schedule the Indexer
             ^^^^^^^^^^^^^^^^^^^^
             To schedule the indexer, configure the crontab file to run the indexer inside
             your |RCT| virtualenv using the following steps.
             #. Open the crontab file, using ``crontab -e``.
             #. Add the indexer to the crontab, and schedule it to run as regularly as you
                wish.
             #. Save the file.
             .. code-block:: bash
                 $ crontab -e
                 # The virtualenv can be called using its full path, so for example you can
                 # put this example into the crontab
                 # Each crontab entry must stay on a single line: cron does not support
                 # backslash line continuations.
                 # Run the indexer daily at 4am using the default mapping settings
                 0 4 * * * /home/ubuntu/.virtualenv/rhodecode-venv/bin/rhodecode-index --instance-name=enterprise-1
                 # Run the indexer every Sunday at 3am using default mapping
                 0 3 * * 0 /home/ubuntu/.virtualenv/rhodecode-venv/bin/rhodecode-index --instance-name=enterprise-1
                 # Run the indexer every 15 minutes
                 # using a specially configured mapping file
                 */15 * * * * ~/.rccontrol/enterprise-4/profile/bin/rhodecode-index --instance-name=enterprise-4 --mapping=/home/user/.rccontrol/enterprise-4/search_mapping.ini
             .. _advanced-indexing:
             Advanced Indexing
             ^^^^^^^^^^^^^^^^^
             Force Re-Indexing single repository
             +++++++++++++++++++++++++++++++++++
             You may need to re-index a whole repository after certain repository changes,
             or to remove indexed secrets or files. The ``--repo-name=`` flag limits the
             indexer to a single repository. For example, to force a re-index of a single
             repository, run::
                 rhodecode-index --instance-name=enterprise-1 --force --repo-name=rhodecode-vcsserver
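             The same ``--repo-name`` flag suits the CI option described under Indexing: a post-push
             job can re-index only the repository that changed. The ``reindex_pushed_repo`` function
             is a hypothetical sketch; how your CI server exposes the repository name, and the
             ``INDEXER`` override, are assumptions to adapt.
             .. code-block:: bash
                 # Hypothetical CI step: re-index only the repository that received a push.
                 # INDEXER defaults to rhodecode-index found on PATH.
                 reindex_pushed_repo() {
                     local repo_name="$1"
                     if [ -z "$repo_name" ]; then
                         echo "usage: reindex_pushed_repo REPO_NAME" >&2
                         return 2
                     fi
                     # Limited to one repository; add --force to rebuild it from scratch.
                     "${INDEXER:-rhodecode-index}" --instance-name=enterprise-1 --repo-name="$repo_name"
                 }
             In the CI job, call it with the repository name, e.g. ``reindex_pushed_repo rhodecode-vcsserver``.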
             Removing repositories from index
             ++++++++++++++++++++++++++++++++
             The indexer automatically removes renamed repositories from the index and builds
             an index under their new names.
+            Likewise, if a repository listed in the mapping file is no longer reported by
+            the server, it is removed from the index.
             To remove an indexed repository manually, run::
                 rhodecode-index --instance-name=enterprise-1 --remove-only --repo-name=rhodecode-vcsserver
             Using search_mapping.ini file for advanced index rules
             ++++++++++++++++++++++++++++++++++++++++++++++++++++++
             By default, ``rhodecode-index`` indexes all files in all repositories, with parsing
             limits defined by the CLI default arguments. You can change those limits by calling it with
-            different flags such as `--max-filesize 2048kb` or `--repo-limit 10`
+            different flags, such as ``--max-filesize=2048kb`` or ``--repo-limit=10``.
             For more advanced execution logic, you can use a configuration file that defines
             detailed rules for which repositories should be indexed, and how.
             |RCT| provides an example index configuration file called :file:`search_mapping.ini`.
             This file is created by default during installation and is located at:
             * :file:`/home/{user}/.rccontrol/{instance-id}/search_mapping.ini`, using default |RCT|.
             * :file:`~/venv/lib/python2.7/site-packages/rhodecode_tools/templates/mapping.ini`,
               when using ``virtualenv``.
             .. note::
                 If you need to create the :file:`search_mapping.ini` file manually, use the |RCT|
                 ``rhodecode-index --create-mapping path/to/search_mapping.ini`` command.
                 For details, see the :ref:`tools-cli` section.
             To run the indexer with a mapping file, pass it using the ``--mapping`` flag::
                 rhodecode-index --instance-name=enterprise-1 --mapping=/my/path/search_mapping.ini
             Here's a detailed example of a :file:`search_mapping.ini` file.
             .. code-block:: ini
                 [__DEFAULT__]
                 ; Create index on commits data, and files data in this order. Available options
                 ; are `commits`, `files`
                 index_types = commits,files
                 ; Commit fetch limit: the chunk size in which commits are fetched via the API
                 ; and parsed. Smaller chunks reduce the load on the server.
                 commit_fetch_limit = 1000
                 ; Commit process limit. Limit the number of commits indexer should fetch, and
                 ; store inside the full text search index. eg. if repo has 2000 commits, and
                 ; limit is 1000, on the first run it will process commits 0-1000 and on the
                 ; second 1000-2000 commits. This helps reduce memory usage; default is 50000
                 ; (set -1 for unlimited)
-                commit_process_limit = 50000
+                commit_process_limit = 20000
                 ; Limit of how many repositories each run can process, default is -1 (unlimited)
                 ; in case of 1000s of repositories it's better to execute in chunks to not overload
                 ; the server.
                 repo_limit = -1
                 ; Default patterns for indexing files and content of files. Binary files
                 ; are skipped by default.
                 ; Add to index those comma separated files; globs syntax
                 ; e.g index_files = *.py, *.c, *.h, *.js
                 index_files = *,
                 ; Do not add to index those comma separated files, this excludes
                 ; both search by name and content; globs syntax
-                ; e.g index_files = *.key, *.sql, *.xml
+                ; e.g. skip_files = *.key, *.sql, *.xml, *.pem, *.crt
                 skip_files = ,
                 ; Add to index the content of those comma separated files; globs syntax
                 ; e.g. index_files_content = *.h, *.obj
                 index_files_content = *,
                 ; Do not add to index content of those comma separated files; globs syntax
-                ; e.g index_files = *.exe, *.bin, *.log, *.dump
+                ; Binary files are not indexed by default.
+                ; e.g. skip_files_content = *.min.js, *.xml, *.dump, *.log
                 skip_files_content = ,
                 ; Force rebuilding an index from scratch. Each repository will be rebuilt from
                 ; scratch with a global flag. Use --repo-name=NAME --force to rebuild a single repo
                 force = false
                 ; Maximum file size that the indexer will use; files above that limit will not
                 ; have their content indexed.
                 ; Possible options are KB (kilobytes), MB (megabytes), eg 1MB or 1024KB
-                max_filesize = 2MB
+                max_filesize = 10MB
                 [__INDEX_RULES__]
                 ; Ordered match rules for repositories. A list of all repositories will be fetched
                 ; using API and this list will be filtered using those rules.
                 ; Syntax for entry: `glob_pattern_OR_full_repo_name = 0 OR 1` where 0=exclude, 1=include
                 ; When this ordered list is traversed first match will return the include/exclude marker
                 ; For example:
                 ;    upstream/binary_repo = 0
                 ;    upstream/subrepo/xml_files = 0
                 ;    upstream/* = 1
                 ;    special-repo = 1
                 ;    * = 0
                 ; This will index all repositories under upstream/*, but skip upstream/binary_repo
                 ; and upstream/subrepo/xml_files. The final * = 0 skips all other repositories.
-                ; Another example:
-                ;    *-fork = 0
-                ;    * = 1
-                ; This will index all repositories, except those that have -fork as suffix.
-                rhodecode-vcsserver = 1
-                rhodecode-enterprise-ce = 1
-                upstream/mozilla/firefox-repo = 0
-                upstream/git-binaries = 0
-                upstream/* = 1
-                * = 0
                 ; == EXPLICIT REPOSITORY INDEXING ==
                 ; If defined this will skip using __INDEX_RULES__, and will not use API to fetch
                 ; list of repositories, it will explicitly take names defined with [NAME] format and
                 ; try to build the index, to build index just for repo_name_1 and special-repo use:
                 ;    [repo_name_1]
                 ;    [special-repo]
                 ; == PER REPOSITORY CONFIGURATION ==
                 ; This allows overriding the global configuration per repository.
                 ; Example: set a specific file size limit and skip certain files for special-repo.
+                ; CLI flags do not override these per-repository settings.
                 ;    [conf:special-repo]
                 ;    max_filesize = 5mb
                 ;    skip_files = *.xml, *.sql
-                ;    index_types = files,
-                [conf:rhodecode-vcsserver]
-                index_types = files,
-                max_filesize = 5mb
-                skip_files = *.xml, *.sql
-                index_files = *.py, *.c, *.h, *.js
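             The first-match semantics of ``[__INDEX_RULES__]`` can be illustrated with a shell
             ``case`` statement, whose patterns are also tried top to bottom. This is only a sketch
             for reasoning about rule order, not how |RCT| evaluates the file: ``rule_for_repo`` is
             hypothetical, and it assumes the indexer's globs behave like shell patterns, where
             ``*`` also matches ``/``. Use ``--show-matched-repos`` to check the real result.
             .. code-block:: bash
                 # Hypothetical helper mirroring the example rules from the comments above:
                 #   upstream/binary_repo = 0, upstream/subrepo/xml_files = 0,
                 #   upstream/* = 1, special-repo = 1, * = 0
                 # Prints 1 (include) or 0 (exclude) for the first pattern that matches.
                 rule_for_repo() {
                     case "$1" in
                         upstream/binary_repo)       echo 0 ;;
                         upstream/subrepo/xml_files) echo 0 ;;
                         upstream/*)                 echo 1 ;;
                         special-repo)               echo 1 ;;
                         *)                          echo 0 ;;
                     esac
                 }
                 for repo in upstream/binary_repo upstream/tools special-repo other-repo; do
                     echo "$repo -> $(rule_for_repo "$repo")"
                 done
             Moving ``upstream/*`` above the two exclusions would make them unreachable, which is
             why the specific exclusions come first and the catch-all ``*`` comes last.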
             With thousands of repositories, it can be tricky to get the include/exclude rules right at first.
             A special flag lets you test the mapping file rules and list the repositories that would
-            be indexed. Run the indexer with `--show-matched-repos` to list only the match rules::
+            be indexed. Run the indexer with ``--show-matched-repos`` to list only the
+            repositories matched by the mapping file rules::
                 rhodecode-index --instance-name=enterprise-1 --show-matched-repos --mapping=/my/path/search_mapping.ini
             .. _enable-elasticsearch:
-            Enabling Elasticsearch
+            Enabling ElasticSearch
             ^^^^^^^^^^^^^^^^^^^^^^
-            Elasticsearch is available in EE edition only. It provides much scalable and more advanced
+            ElasticSearch is available in the EE edition only. It provides much more scalable and advanced
-            search capabilities. While Whoosh is fine for upto 1-2GB of data beyond that amount of
+            search capabilities. While Whoosh is fine for up to 1-2GB of data, beyond that amount it
-            data it starts slowing down, and can cause other problems. Elasticsearch 6 also provides
+            starts slowing down, and can cause other problems.
-            much more advanced query language allowing advanced filtering by file paths, extensions
+            ElasticSearch 6 also provides a much more advanced query language.
-            OR statements, ranges etc. Please check query language examples in the search field for
+            It allows filtering by file paths and extensions, OR statements, ranges, etc.
-            some advanced query language usage.
+            See the query language examples in the search field for more advanced usage.
             #. Open the :file:`rhodecode.ini` file for the instance you wish to edit. The
                default location is
                :file:`/home/{user}/.rccontrol/{instance-id}/rhodecode.ini`
             #. Find the search configuration section:
             .. code-block:: ini
                 ###################################
                 ## SEARCH INDEXING CONFIGURATION ##
                 ###################################
                 search.module = rhodecode.lib.index.whoosh
                 search.location = %(here)s/data/index
             and change it to:
             .. code-block:: ini
                 search.module = rc_elasticsearch
                 search.location = http://localhost:9200
                 ## specify Elastic Search version, 6 for latest or 2 for legacy
                 search.es_version = 6
-            where ``search.location`` points to the elasticsearch server
+            where ``search.location`` points to the ElasticSearch server,
             which runs on port 9200 by default.
             The indexer invocation also needs to change. Provide the ``--es-version=`` and
-            --engine-location= parameters to define elasticsearch server location and it's version.
+            ``--engine-location=`` parameters to define the ElasticSearch server location and version.
             For example::
                 rhodecode-index --instance-name=enterprise-1 --es-version=6 --engine-location=http://localhost:9200
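             To keep the indexer flags in sync with :file:`rhodecode.ini`, you could derive them from
             the ``search.location`` and ``search.es_version`` settings. The ``es_index_flags``
             helper is a hypothetical sketch that assumes plain ``key = value`` lines, as in the
             configuration example above; commented-out lines are ignored.
             .. code-block:: bash
                 # Hypothetical helpers: build --es-version/--engine-location flags from rhodecode.ini.
                 # ini_value FILE KEY_REGEX prints the value of the last uncommented "key = value" line.
                 ini_value() {
                     sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
                 }
                 es_index_flags() {
                     local ini="$1" location version
                     location="$(ini_value "$ini" 'search\.location')"
                     version="$(ini_value "$ini" 'search\.es_version')"
                     echo "--es-version=$version --engine-location=$location"
                 }
             For example: ``rhodecode-index --instance-name=enterprise-1 $(es_index_flags /home/user/.rccontrol/enterprise-1/rhodecode.ini)``.
             The command substitution is left unquoted on purpose so that it splits into two arguments.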
             .. _Whoosh: https://pypi.python.org/pypi/Whoosh/
-            .. _Elasticsearch 6: https://www.elastic.co/
  No newline at end of file
+            .. _ElasticSearch 6: https://www.elastic.co/