dirstate-v2: hash the source of the ignore patterns as well...
Raphaël Gomès
r50453:363923bd stable
@@ -1,616 +1,624
1 1 The *dirstate* is what Mercurial uses internally to track
2 2 the state of files in the working directory,
3 3 such as set by commands like `hg add` and `hg rm`.
4 4 It also contains some cached data that help make `hg status` faster.
5 5 The name refers both to `.hg/dirstate` on the filesystem
6 6 and the corresponding data structure in memory while a Mercurial process
7 7 is running.
8 8
9 9 The original file format, retroactively dubbed `dirstate-v1`,
10 10 is described at https://www.mercurial-scm.org/wiki/DirState.
11 11 It is made of a flat sequence of unordered variable-size entries,
12 12 so accessing any information in it requires parsing all of it.
13 13 Similarly, saving changes requires rewriting the entire file.
14 14
15 15 The newer `dirstate-v2` file format is designed to fix these limitations
16 16 and make `hg status` faster.
17 17
18 18 User guide
19 19 ==========
20 20
21 21 Compatibility
22 22 -------------
23 23
24 24 The file format is experimental and may still change.
25 25 Different versions of Mercurial may not be compatible with each other
26 26 when working on a local repository that uses this format.
27 27 When using an incompatible version with the experimental format,
28 28 anything can happen, including data corruption.
29 29
30 30 Since the dirstate is entirely local and not relevant to the wire protocol,
31 31 `dirstate-v2` does not affect compatibility with remote Mercurial versions.
32 32
33 33 When `share-safe` is enabled, different repositories sharing the same store
34 34 can use different dirstate formats.
35 35
36 36 Enabling `dirstate-v2` for new local repositories
37 37 ------------------------------------------------
38 38
39 39 When creating a new local repository such as with `hg init` or `hg clone`,
40 40 the `use-dirstate-v2` boolean in the `format` configuration section
41 41 controls whether to use this file format.
42 42 This is disabled by default as of this writing.
43 43 To enable it for a single repository, run for example::
44 44
45 45 $ hg init my-project --config format.use-dirstate-v2=1
46 46
47 47 Checking the format of an existing local repository
48 48 --------------------------------------------------
49 49
50 50 The `debugformat` command prints information about
51 51 which of multiple optional formats are used in the current repository,
52 52 including `dirstate-v2`::
53 53
54 54 $ hg debugformat
55 55 format-variant repo
56 56 fncache: yes
57 57 dirstate-v2: yes
58 58 […]
59 59
60 60 Upgrading or downgrading an existing local repository
61 61 -----------------------------------------------------
62 62
63 63 The `debugupgrade` command does various upgrades or downgrades
64 64 on a local repository
65 65 based on the current Mercurial version and on configuration.
66 66 The same `format.use-dirstate-v2` configuration is used again.
67 67
68 68 Example to upgrade::
69 69
70 70 $ hg debugupgrade --config format.use-dirstate-v2=1
71 71
72 72 Example to downgrade to `dirstate-v1`::
73 73
74 74 $ hg debugupgrade --config format.use-dirstate-v2=0
75 75
76 76 Both of these commands only print a list of proposed changes,
77 77 which may include changes unrelated to the dirstate.
78 78 Those other changes are controlled by their own configuration keys.
79 79 Add `--run` to a command to actually apply the proposed changes.
80 80
81 81 Backups of `.hg/requires` and `.hg/dirstate` are created
82 82 in a `.hg/upgradebackup.*` directory.
83 83 If something goes wrong, restoring those files should undo the change.
84 84
85 85 Note that upgrading affects compatibility with older versions of Mercurial
86 86 as noted above.
87 87 This can be relevant when a repository’s files are on a USB drive
88 88 or some other removable media, or shared over the network, etc.
89 89
90 90 Internal filesystem representation
91 91 ==================================
92 92
93 93 Requirements file
94 94 -----------------
95 95
96 96 The `.hg/requires` file indicates which of various optional file formats
97 97 are used by a given repository.
98 98 Mercurial aborts when seeing a requirement it does not know about,
99 99 which avoids older versions accidentally messing up a repository
100 100 that uses a format that was introduced later.
101 101 For versions that do support a format, the presence or absence of
102 102 the corresponding requirement indicates whether to use that format.
103 103
104 104 When the file contains a `dirstate-v2` line,
105 105 the `dirstate-v2` format is used.
106 106 With no such line, `dirstate-v1` is used.
107 107
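The check above can be sketched in Python. This is a hedged illustration, not Mercurial's own code; the function name is mine:

```python
# Minimal sketch: decide which dirstate format a repository uses
# by looking for a "dirstate-v2" line in .hg/requires.

def dirstate_format(requires_text):
    """Return which dirstate format `.hg/requires` selects.

    `requires_text` is the decoded content of `.hg/requires`,
    one requirement per line.
    """
    requirements = {line.strip() for line in requires_text.splitlines()}
    return "dirstate-v2" if "dirstate-v2" in requirements else "dirstate-v1"
```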
108 108 High level description
109 109 ----------------------
110 110
111 111 Whereas `dirstate-v1` uses a single `.hg/dirstate` file,
112 112 in `dirstate-v2` that file is a "docket" file
113 113 that only contains some metadata
114 114 and points to a separate data file named `.hg/dirstate.{ID}`,
115 115 where `{ID}` is a random identifier.
116 116
117 117 This separation allows making data files append-only
118 118 and therefore safer to memory-map.
119 119 Creating a new data file (occasionally to clean up unused data)
120 120 can be done with a different ID
121 121 without disrupting another Mercurial process
122 122 that could still be using the previous data file.
123 123
124 124 Both files have a format designed to reduce the need for parsing,
125 125 by using fixed-size binary components as much as possible.
126 126 For data that is not fixed-size,
127 127 references to other parts of a file can be made by storing "pseudo-pointers":
128 128 integers counted in bytes from the start of a file.
129 129 For read-only access no data structure is needed,
130 130 only a bytes buffer (possibly memory-mapped directly from the filesystem)
131 131 with specific parts read on demand.
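As a hedged sketch (helper names are mine), resolving a pseudo-pointer against a bytes buffer is just an offset read plus a slice, with no up-front parsing pass:

```python
import struct

def read_pseudo_pointer(buf, offset):
    """Read a 32-bit big-endian pseudo-pointer stored at `offset` in `buf`."""
    return struct.unpack_from(">I", buf, offset)[0]

def resolve(buf, pointer, length):
    """Return the `length` bytes that a pseudo-pointer refers to.

    `buf` can be any bytes-like object, including a memory-mapped file.
    """
    return bytes(buf[pointer : pointer + length])
```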
132 132
133 133 The data file contains "nodes" organized in a tree.
134 134 Each node represents a file or directory inside the working directory
135 135 or its parent changeset.
136 136 This tree has the same structure as the filesystem,
137 137 so a node representing a directory has child nodes representing
138 138 the files and subdirectories contained directly in that directory.
139 139
140 140 The docket file format
141 141 ----------------------
142 142
143 143 This is implemented in `rust/hg-core/src/dirstate_tree/on_disk.rs`
144 144 and `mercurial/dirstateutils/docket.py`.
145 145
146 146 Components of the docket file are found at fixed offsets,
147 147 counted in bytes from the start of the file:
148 148
149 149 * Offset 0:
150 150 The 12-byte marker string "dirstate-v2\n" ending with a newline character.
151 151 This makes it easier to tell a dirstate-v2 file from a dirstate-v1 file,
152 152 although it is not strictly necessary
153 153 since `.hg/requires` determines which format to use.
154 154
155 155 * Offset 12:
156 156 The changeset node ID of the first parent of the working directory,
157 157 as up to 32 binary bytes.
158 158 If a node ID is shorter (20 bytes for SHA-1),
159 159 it is start-aligned and the rest of the bytes are set to zero.
160 160
161 161 * Offset 44:
162 162 The changeset node ID of the second parent of the working directory,
163 163 or all zeros if there isn’t one.
164 164 Also 32 binary bytes.
165 165
166 166 * Offset 76:
167 167 Tree metadata on 44 bytes, described below.
168 168 Its separation in this documentation from the rest of the docket
169 169 reflects a detail of the current implementation.
170 170 Since tree metadata is also made of fields at fixed offsets, those could
171 171 be inlined here by adding 76 bytes to each offset.
172 172
173 173 * Offset 120:
174 174 The used size of the data file, as a 32-bit big-endian integer.
175 175 The actual size of the data file may be larger
176 176 (if another Mercurial process is appending to it
177 177 but has not updated the docket yet).
178 178 That extra data must be ignored.
179 179
180 180 * Offset 124:
181 181 The length of the data file identifier, as an 8-bit integer.
182 182
183 183 * Offset 125:
184 184 The data file identifier.
185 185
186 186 * Any additional data is currently ignored, and dropped when updating the file.
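The docket layout above can be sketched as a short parser. This is a non-authoritative illustration; the dictionary keys are my names, while the offsets and sizes follow the list in this section:

```python
import struct

def parse_docket(data):
    """Decode the fixed-offset fields of a dirstate-v2 docket file."""
    assert data[0:12] == b"dirstate-v2\n", "not a dirstate-v2 docket"
    parent_1 = data[12:44]        # first parent node ID, zero-padded to 32 bytes
    parent_2 = data[44:76]        # second parent node ID, or all zeros
    tree_metadata = data[76:120]  # 44 bytes, described in the next section
    (data_size,) = struct.unpack_from(">I", data, 120)  # used size of data file
    (id_size,) = struct.unpack_from(">B", data, 124)
    data_file_id = data[125 : 125 + id_size]
    return {
        "parent_1": parent_1,
        "parent_2": parent_2,
        "tree_metadata": tree_metadata,
        "data_size": data_size,
        "data_file_id": data_file_id,
    }
```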
187 187
188 188 Tree metadata in the docket file
189 189 --------------------------------
190 190
191 191 Tree metadata is similarly made of components at fixed offsets.
192 192 These offsets are counted in bytes from the start of tree metadata,
193 193 which is 76 bytes after the start of the docket file.
194 194
195 195 This metadata can be thought of as the singular root of the tree
196 196 formed by nodes in the data file.
197 197
198 198 * Offset 0:
199 199 Pseudo-pointer to the start of root nodes,
200 200 counted in bytes from the start of the data file,
201 201 as a 32-bit big-endian integer.
202 202 These nodes describe files and directories found directly
203 203 at the root of the working directory.
204 204
205 205 * Offset 4:
206 206 Number of root nodes, as a 32-bit big-endian integer.
207 207
208 208 * Offset 8:
209 209 Total number of nodes in the entire tree that "have a dirstate entry",
210 210 as a 32-bit big-endian integer.
211 211 Those nodes represent files that would appear at all in `dirstate-v1`.
212 212 This is typically less than the total number of nodes.
213 213 This counter is used to implement `len(dirstatemap)`.
214 214
215 215 * Offset 12:
216 216 Number of nodes in the entire tree that have a copy source,
217 217 as a 32-bit big-endian integer.
218 218 At the next commit, these files are recorded
219 219 as having been copied or moved/renamed from that source.
220 220 (A move is recorded as a copy and separate removal of the source.)
221 221 This counter is used to implement `len(dirstatemap.copymap)`.
222 222
223 223 * Offset 16:
224 224 An estimation of how many bytes of the data file
225 225 (within its used size) are unused, as a 32-bit big-endian integer.
226 226 When appending to an existing data file,
227 227 some existing nodes or paths can be unreachable from the new root
228 228 but they still take up space.
229 229 This counter is used to decide when to write a new data file from scratch
230 230 instead of appending to an existing one,
231 231 in order to get rid of that unreachable data
232 232 and avoid unbounded file size growth.
233 233
234 234 * Offset 20:
235 235 These four bytes are currently ignored
236 236 and reset to zero when updating a docket file.
237 237 This is an attempt at forward compatibility:
238 238 future Mercurial versions could use this as a bit field
239 239 to indicate that a dirstate has additional data or constraints.
240 240 Finding a dirstate file with the relevant bit unset indicates that
241 241 it was written by a then-older version
242 242 which is not aware of that future change.
243 243
244 244 * Offset 24:
245 245 Either 20 zero bytes, or a SHA-1 hash as 20 binary bytes.
246 246 When present, the hash is of ignore patterns
247 247 that were used for some previous run of the `status` algorithm.
248 248
249 249 * (Offset 44: end of tree metadata)
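The 44-byte tree metadata block can be decoded with a single fixed-layout read. A hedged sketch (field names are mine, offsets follow the list above):

```python
import struct

# root nodes pointer/count, two node counters, unused-bytes estimate,
# 4 reserved bytes, then the optional 20-byte ignore-patterns hash.
TREE_METADATA = struct.Struct(">IIIII4s20s")  # 5*4 + 4 + 20 = 44 bytes

def parse_tree_metadata(block):
    fields = TREE_METADATA.unpack(block)
    return {
        "root_nodes_start": fields[0],
        "root_nodes_count": fields[1],
        "nodes_with_entry_count": fields[2],
        "nodes_with_copy_source_count": fields[3],
        "unreachable_bytes": fields[4],
        # fields[5] is the reserved word at offset 20, ignored here
        "ignore_patterns_hash": fields[6],
    }
```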
250 250
251 251 Optional hash of ignore patterns
252 252 --------------------------------
253 253
254 254 The implementation of `status` at `rust/hg-core/src/dirstate_tree/status.rs`
255 255 has been optimized such that its run time is dominated by calls
256 256 to `stat` for reading the filesystem metadata of a file or directory,
257 257 and to `readdir` for listing the contents of a directory.
258 258 In some cases the algorithm can skip calls to `readdir`
259 259 (saving significant time)
260 260 because the dirstate already contains enough of the relevant information
261 261 to build the correct `status` results.
262 262
263 263 The default configuration of `hg status` is to list unknown files
264 264 but not ignored files.
265 265 In this case, it matters for the `readdir`-skipping optimization
266 266 if a given file used to be ignored but became unknown
267 267 because `.hgignore` changed.
268 268 To detect the possibility of such a change,
269 269 the tree metadata contains an optional hash of all ignore patterns.
270 270
271 271 We define:
272 272
273 273 * "Root" ignore files as:
274 274
275 275 - `.hgignore` at the root of the repository if it exists
276 276 - And all files from `ui.ignore.*` config.
277 277
278 278 This set of files is sorted by the string representation of their path.
279 279
280 280 * The "expanded contents" of an ignore file is the byte string made
281 281 by the concatenation of its contents followed by the "expanded contents"
282 282 of other files included with `include:` or `subinclude:` directives,
283 283 in inclusion order. This definition is recursive, as included files can
284 284 themselves include more files.
285 285
286 This hash is defined as the SHA-1 of the concatenation (in sorted
287 order) of the "expanded contents" of each "root" ignore file.
286 * "filepath" as the bytes of the ignore file path
287 relative to the root of the repository if inside the repository,
288 or the untouched path as defined in the configuration.
289
290 This hash is defined as the SHA-1 of the following line format:
291
292 <filepath> <sha1 of the "expanded contents">\n
293
294 for each "root" ignore file, in sorted order.
295
288 296 (Note that computing this does not require actually concatenating
289 297 into a single contiguous byte sequence.
290 298 Instead a SHA-1 hasher object can be created
291 299 and fed separate chunks one by one.)
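The line format above can be sketched as follows. This assumes the "expanded contents" of each root ignore file have already been computed; the function name and the `files` mapping (filepath bytes to expanded-contents bytes) are mine:

```python
from hashlib import sha1

def ignore_patterns_hash(files):
    """SHA-1 over one line per "root" ignore file, in sorted path order:
    <filepath> <sha1 of the "expanded contents">\n
    The inner digest is the 20-byte binary SHA-1, not hex.
    """
    hasher = sha1()
    for filepath in sorted(files):
        hasher.update(filepath)
        hasher.update(b" ")
        hasher.update(sha1(files[filepath]).digest())
        hasher.update(b"\n")
    return hasher.digest()
```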
292 300
293 301 The data file format
294 302 --------------------
295 303
296 304 This is implemented in `rust/hg-core/src/dirstate_tree/on_disk.rs`
297 305 and `mercurial/dirstateutils/v2.py`.
298 306
299 307 The data file contains two types of data: paths and nodes.
300 308
301 309 Paths and nodes can be organized in any order in the file, except that sibling
302 310 nodes must be next to each other and sorted by their path.
303 311 Contiguity lets the parent refer to them all
304 312 by their count and a single pseudo-pointer,
305 313 instead of storing one pseudo-pointer per child node.
306 314 Sorting allows using binary search to find a child node with a given name
307 315 in `O(log(n))` byte sequence comparisons.
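The lookup described above can be sketched with siblings modeled as a sorted list of base-name byte strings (a simplification; in the real file the names live behind pseudo-pointers):

```python
def find_child(base_names, wanted):
    """Binary search for `wanted` among sorted sibling base names.

    Returns the index of the matching sibling, or None.
    """
    lo, hi = 0, len(base_names)
    while lo < hi:
        mid = (lo + hi) // 2
        if base_names[mid] < wanted:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(base_names) and base_names[lo] == wanted:
        return lo
    return None
```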
308 316
309 317 The current implementation writes paths and child nodes before a given node
310 318 for ease of figuring out the value of pseudo-pointers by the time they are to be
311 319 written, but this is not an obligation and readers must not rely on it.
312 320
313 321 A path is stored as a byte string anywhere in the file, without delimiter.
314 322 It is referred to by one or more nodes via a pseudo-pointer to its start, and its
315 323 length in bytes. Since there is no delimiter,
316 324 when a path is a substring of another the same bytes could be reused,
317 325 although the implementation does not exploit this as of this writing.
318 326
319 327 A node is stored in 44 bytes with components at fixed offsets. Paths and
320 328 child nodes relevant to a node are stored externally and referenced through
321 329 pseudo-pointers.
322 330
323 331 All integers are stored in big-endian. All pseudo-pointers are 32-bit integers
324 332 counting bytes from the start of the data file. Path lengths and positions
325 333 are 16-bit integers, also counted in bytes.
326 334
327 335 Node components are:
328 336
329 337 * Offset 0:
330 338 Pseudo-pointer to the full path of this node,
331 339 from the working directory root.
332 340
333 341 * Offset 4:
334 342 Length of the full path.
335 343
336 344 * Offset 6:
337 345 Position of the last `/` path separator within the full path,
338 346 in bytes from the start of the full path,
339 347 or zero if there isn’t one.
340 348 The part of the full path after this position is the "base name".
341 349 Since sibling nodes have the same parent, only their base names vary
342 350 and need to be considered when doing binary search to find a given path.
343 351
344 352 * Offset 8:
345 353 Pseudo-pointer to the "copy source" path for this node,
346 354 or zero if there is no copy source.
347 355
348 356 * Offset 12:
349 357 Length of the copy source path, or zero if there isn’t one.
350 358
351 359 * Offset 14:
352 360 Pseudo-pointer to the start of child nodes.
353 361
354 362 * Offset 18:
355 363 Number of child nodes, as a 32-bit integer.
356 364 They occupy 44 times this number of bytes
357 365 (not counting space for paths, and further descendants).
358 366
359 367 * Offset 22:
360 368 Number as a 32-bit integer of descendant nodes in this subtree,
361 369 not including this node itself,
362 370 that "have a dirstate entry".
363 371 Those nodes represent files that would appear at all in `dirstate-v1`.
364 372 This is typically less than the total number of descendants.
365 373 This counter is used to implement `has_dir`.
366 374
367 375 * Offset 26:
368 376 Number as a 32-bit integer of descendant nodes in this subtree,
369 377 not including this node itself,
370 378 that represent files tracked in the working directory.
371 379 (For example, `hg rm` makes a file untracked.)
372 380 This counter is used to implement `has_tracked_dir`.
373 381
374 382 * Offset 30:
375 383 A `flags` field that packs some boolean values as bits of a 16-bit integer.
376 384 Starting from least-significant, bit masks are::
377 385
378 386 WDIR_TRACKED = 1 << 0
379 387 P1_TRACKED = 1 << 1
380 388 P2_INFO = 1 << 2
381 389 MODE_EXEC_PERM = 1 << 3
382 390 MODE_IS_SYMLINK = 1 << 4
383 391 HAS_FALLBACK_EXEC = 1 << 5
384 392 FALLBACK_EXEC = 1 << 6
385 393 HAS_FALLBACK_SYMLINK = 1 << 7
386 394 FALLBACK_SYMLINK = 1 << 8
387 395 EXPECTED_STATE_IS_MODIFIED = 1 << 9
388 396 HAS_MODE_AND_SIZE = 1 << 10
389 397 HAS_MTIME = 1 << 11
390 398 MTIME_SECOND_AMBIGUOUS = 1 << 12
391 399 DIRECTORY = 1 << 13
392 400 ALL_UNKNOWN_RECORDED = 1 << 14
393 401 ALL_IGNORED_RECORDED = 1 << 15
394 402
395 403 The meaning of each bit is described below.
396 404
397 405 Other bits are unset.
398 406 They may be assigned meaning in the future,
399 407 with the limitation that Mercurial versions that pre-date such meaning
400 408 will always reset those bits to unset when writing nodes.
401 409 (A new node is written for any mutation in its subtree,
402 410 leaving the bytes of the old node unreachable
403 411 until the data file is rewritten entirely.)
404 412
405 413 * Offset 32:
406 414 A `size` field described below, as a 32-bit integer.
407 415 Unlike in dirstate-v1, negative values are not used.
408 416
409 417 * Offset 36:
410 418 The seconds component of an `mtime` field described below,
411 419 as a 32-bit integer.
412 420 Unlike in dirstate-v1, negative values are not used.
413 421 When `mtime` is used, this is the number of seconds since the Unix epoch
414 422 truncated to its lower 31 bits.
415 423
416 424 * Offset 40:
417 425 The nanoseconds component of an `mtime` field described below,
418 426 as a 32-bit integer.
419 427 When `mtime` is used,
420 428 this is the number of nanoseconds since `mtime.seconds`,
421 429 always strictly less than one billion.
422 430
423 431 This may be zero if more precision is not available.
424 432 (This can happen because of limitations in any of Mercurial, Python,
425 433 libc, the operating system, …)
426 434
427 435 When comparing two mtimes and either has this component set to zero,
428 436 the sub-second precision of both should be ignored.
429 437 False positives when checking mtime equality due to clock resolution
430 438 are always possible and the status algorithm needs to deal with them,
431 439 but having too many false negatives could be harmful too.
432 440
433 441 * (Offset 44: end of this node)
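The node layout can be decoded in one fixed-layout read. A hedged sketch (field and constant names are mine; offsets and bit masks follow the text above, with the fields ending at offset 44):

```python
import struct

# A few of the bit masks listed for the `flags` field at offset 30.
WDIR_TRACKED = 1 << 0
P1_TRACKED = 1 << 1
P2_INFO = 1 << 2
HAS_MODE_AND_SIZE = 1 << 10
HAS_MTIME = 1 << 11

# path ptr/len, base name position, copy source ptr/len, children ptr/count,
# two descendant counters, flags, size, mtime seconds/nanoseconds.
NODE = struct.Struct(">IHHIHIIIIHIII")

def parse_node(buf, offset=0):
    (path_ptr, path_len, base_name_pos, copy_ptr, copy_len,
     children_ptr, children_count, descendants_with_entry,
     tracked_descendants, flags, size,
     mtime_s, mtime_ns) = NODE.unpack_from(buf, offset)
    return {
        "path": (path_ptr, path_len),
        "children": (children_ptr, children_count),
        # "tracked anywhere": at least one of the three tracking bits set
        "tracked_anywhere": bool(flags & (WDIR_TRACKED | P1_TRACKED | P2_INFO)),
        # size and mtime are only meaningful when their flag is set
        "size": size if flags & HAS_MODE_AND_SIZE else None,
        "mtime": (mtime_s, mtime_ns) if flags & HAS_MTIME else None,
    }
```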
434 442
435 443 The meaning of the boolean values packed in `flags` is:
436 444
437 445 `WDIR_TRACKED`
438 446 Set if the working directory contains a tracked file at this node’s path.
439 447 This is typically set and unset by `hg add` and `hg rm`.
440 448
441 449 `P1_TRACKED`
442 450 Set if the working directory’s first parent changeset
443 451 (whose node identifier is found in tree metadata)
444 452 contains a tracked file at this node’s path.
445 453 This is a cache to reduce manifest lookups.
446 454
447 455 `P2_INFO`
448 456 Set if the file has been involved in some merge operation.
449 457 Either because it was actually merged,
450 458 or because the version in the second parent (p2) was ahead,
451 459 or because some rename moved it there.
452 460 In either case `hg status` will want it displayed as modified.
453 461
454 462 Files that would be mentioned at all in the `dirstate-v1` file format
455 463 have a node with at least one of the above three bits set in `dirstate-v2`.
456 464 Let’s call these files "tracked anywhere",
457 465 and "untracked" the nodes with all three of these bits unset.
458 466 Untracked nodes are typically for directories:
459 467 they hold child nodes and form the tree structure.
460 468 Although implementations should strive to clean up nodes
461 469 that are entirely unused, other untracked nodes may also exist.
463 471 For example, a future version of Mercurial might in some cases
464 472 add nodes for untracked files or/and ignored files in the working directory
465 473 in order to optimize `hg status`
466 474 by enabling it to skip `readdir` in more cases.
467 475
468 476 `HAS_MODE_AND_SIZE`
469 477 Must be unset for untracked nodes.
470 478 For files tracked anywhere, if this is set:
471 479 - The `size` field is the expected file size,
472 480 in bytes, truncated to its lower 31 bits.
473 481 - The expected execute permission for the file’s owner
474 482 is given by `MODE_EXEC_PERM`
475 483 - The expected file type is given by `MODE_IS_SYMLINK`:
476 484 a symbolic link if set, or a normal file if unset.
477 485 If this is unset the expected size, permission, and file type are unknown.
478 486 The `size` field is unused (set to zero).
479 487
480 488 `HAS_MTIME`
481 489 The node contains a "valid" last modification time in the `mtime` field.
482 490
483 491
484 492 It means the `mtime` was already strictly in the past when observed,
485 493 meaning that later changes cannot happen in the same clock tick
486 494 and must cause a different modification time
487 495 (unless the system clock jumps back and we get unlucky,
488 496 which is not impossible but deemed unlikely enough).
489 497
490 498 This means that if `std::fs::symlink_metadata` later reports
491 499 the same modification time
492 500 and ignored patterns haven’t changed,
493 501 we can assume the node to be unchanged on disk.
494 502
495 503 The `mtime` field can then be used to skip more expensive lookup when
496 504 checking the status of "tracked" nodes.
497 505
498 506 It can also be set for nodes where `DIRECTORY` is set.
499 507 See `DIRECTORY` documentation for details.
500 508
501 509 `DIRECTORY`
502 510 When set, this entry will match a directory that exists or existed on the
503 511 file system.
504 512
505 513 * When `HAS_MTIME` is set a directory has been seen on the file system and
506 514 `mtime` matches its last modification time. However, `HAS_MTIME` not
507 515 being set does not indicate the absence of the directory on the file system.
508 516
509 517 * When not tracked anywhere, this node does not represent an ignored or
510 518 unknown file on disk.
511 519
512 520 If `HAS_MTIME` is set
513 521 and `mtime` matches the last modification time of the directory on disk,
514 522 the directory is unchanged
515 523 and we can skip calling `std::fs::read_dir` again for this directory,
516 524 and iterate child dirstate nodes instead.
517 525 (as long as `ALL_UNKNOWN_RECORDED` and `ALL_IGNORED_RECORDED` are taken
518 526 into account)
519 527
520 528 `MODE_EXEC_PERM`
521 529 Must be unset if `HAS_MODE_AND_SIZE` is unset.
522 530 If `HAS_MODE_AND_SIZE` is set,
523 531 this indicates whether the file’s owner is expected
524 532 to have execute permission.
525 533
526 534 Beware that on systems without filesystem support for this information,
527 535 the value stored in the dirstate might be wrong and should not be relied on.
528 536
529 537 `MODE_IS_SYMLINK`
530 538 Must be unset if `HAS_MODE_AND_SIZE` is unset.
531 539 If `HAS_MODE_AND_SIZE` is set,
532 540 this indicates whether the file is expected to be a symlink
533 541 as opposed to a normal file.
534 542
535 543 Beware that on systems without filesystem support for this information,
536 544 the value stored in the dirstate might be wrong and should not be relied on.
537 545
538 546 `EXPECTED_STATE_IS_MODIFIED`
539 547 Must be unset for untracked nodes.
540 548 For a file tracked anywhere
541 549 that has expected metadata (`HAS_MODE_AND_SIZE` and `HAS_MTIME`),
542 550 if that metadata matches
543 551 metadata found in the working directory with `stat`,
544 552 this bit indicates the status of the file:
545 553 if set, the status is modified; if unset, it is clean.
547 555
548 556 In cases where `hg status` needs to read the contents of a file
549 557 because metadata is ambiguous, this bit lets it record the result
550 558 if the result is modified so that a future run of `hg status`
551 559 does not need to do the same again.
552 560 It is valid to never set this bit,
553 561 and consider expected metadata ambiguous if it is set.
554 562
555 563 `ALL_UNKNOWN_RECORDED`
556 564 If set, all "unknown" children existing on disk (at the time of the last
557 565 status) have been recorded and the `mtime` associated with
558 566 `DIRECTORY` can be used for optimization even when "unknown" files
559 567 are listed.
560 568
561 569 Note that the number of recorded "unknown" children can still be zero
562 570 if none were present.
563 571
564 572 Also note that having this flag unset does not imply that no "unknown"
565 573 children have been recorded. Some might be present, but there is
566 574 no guarantee that it will be all of them.
567 575
568 576 `ALL_IGNORED_RECORDED`
569 577 If set, all "ignored" children existing on disk (at the time of the last
570 578 status) have been recorded and the `mtime` associated with
571 579 `DIRECTORY` can be used for optimization even when "ignored" files
572 580 are listed.
573 581
574 582 Note that the number of recorded "ignored" children can still be zero
575 583 if none were present.
576 584
577 585 Also note that having this flag unset does not imply that no "ignored"
578 586 children have been recorded. Some might be present, but there is
579 587 no guarantee that it will be all of them.
580 588
581 589 `HAS_FALLBACK_EXEC`
582 590 If this flag is set, the entry carries "fallback" information for the
583 591 executable bit in the `FALLBACK_EXEC` flag.
584 592
585 593 Fallback information can be stored in the dirstate to keep track of
586 594 filesystem attributes tracked by Mercurial when the underlying file
587 595 system or operating system does not support them (e.g.
588 596 Windows).
589 597
590 598 `FALLBACK_EXEC`
591 599 Should be ignored if `HAS_FALLBACK_EXEC` is unset. If set the file for this
592 600 entry should be considered executable if that information cannot be
593 601 extracted from the file system. If unset it should be considered
594 602 non-executable instead.
595 603
596 604 `HAS_FALLBACK_SYMLINK`
597 605 If this flag is set, the entry carries "fallback" information for symbolic
598 606 link status in the `FALLBACK_SYMLINK` flag.
599 607
600 608 Fallback information can be stored in the dirstate to keep track of
601 609 filesystem attributes tracked by Mercurial when the underlying file
602 610 system or operating system does not support them (e.g.
603 611 Windows).
604 612
605 613 `FALLBACK_SYMLINK`
606 614 Should be ignored if `HAS_FALLBACK_SYMLINK` is unset. If set the file for
607 615 this entry should be considered a symlink if that information cannot be
608 616 extracted from the file system. If unset it should be considered a normal
609 617 file instead.
610 618
611 619 `MTIME_SECOND_AMBIGUOUS`
612 620 This flag is relevant only when `HAS_MTIME` is set. When set, the
613 621 `mtime` stored in the entry is only valid for comparison with timestamps
614 622 that have nanosecond information. If the available timestamp does not
615 623 carry nanosecond information, the `mtime` should be ignored and no
616 624 optimization can be applied.
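The mtime comparison rule stated earlier (when either side lacks sub-second precision, ignore the sub-second component on both sides) can be sketched as follows; the function name is mine:

```python
def mtime_equal(a, b):
    """Compare two (seconds, nanoseconds) timestamps the way the
    status algorithm does: if either nanosecond component is zero,
    sub-second precision is ignored on both sides.
    """
    a_s, a_ns = a
    b_s, b_ns = b
    if a_s != b_s:
        return False
    if a_ns == 0 or b_ns == 0:
        # One side has no sub-second information available.
        return True
    return a_ns == b_ns
```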
@@ -1,913 +1,931
1 1 use crate::dirstate::entry::TruncatedTimestamp;
2 2 use crate::dirstate::status::IgnoreFnType;
3 3 use crate::dirstate::status::StatusPath;
4 4 use crate::dirstate_tree::dirstate_map::BorrowedPath;
5 5 use crate::dirstate_tree::dirstate_map::ChildNodesRef;
6 6 use crate::dirstate_tree::dirstate_map::DirstateMap;
7 7 use crate::dirstate_tree::dirstate_map::DirstateVersion;
8 8 use crate::dirstate_tree::dirstate_map::NodeRef;
9 9 use crate::dirstate_tree::on_disk::DirstateV2ParseError;
10 10 use crate::matchers::get_ignore_function;
11 11 use crate::matchers::Matcher;
12 12 use crate::utils::files::get_bytes_from_os_string;
13 use crate::utils::files::get_bytes_from_path;
13 14 use crate::utils::files::get_path_from_bytes;
14 15 use crate::utils::hg_path::HgPath;
15 16 use crate::BadMatch;
16 17 use crate::DirstateStatus;
17 18 use crate::HgPathBuf;
18 19 use crate::HgPathCow;
19 20 use crate::PatternFileWarning;
20 21 use crate::StatusError;
21 22 use crate::StatusOptions;
22 23 use micro_timer::timed;
23 24 use once_cell::sync::OnceCell;
24 25 use rayon::prelude::*;
25 26 use sha1::{Digest, Sha1};
26 27 use std::borrow::Cow;
27 28 use std::io;
28 29 use std::path::Path;
29 30 use std::path::PathBuf;
30 31 use std::sync::Mutex;
31 32 use std::time::SystemTime;
32 33
33 34 /// Returns the status of the working directory compared to its parent
34 35 /// changeset.
35 36 ///
36 37 /// This algorithm is based on traversing the filesystem tree (`fs` in function
37 38 /// and variable names) and dirstate tree at the same time. The core of this
38 39 /// traversal is the recursive `traverse_fs_directory_and_dirstate` function
39 40 /// and its use of `itertools::merge_join_by`. When reaching a path that only
40 41 /// exists in one of the two trees, depending on information requested by
41 42 /// `options` we may need to traverse the remaining subtree.
42 43 #[timed]
43 44 pub fn status<'dirstate>(
44 45 dmap: &'dirstate mut DirstateMap,
45 46 matcher: &(dyn Matcher + Sync),
46 47 root_dir: PathBuf,
47 48 ignore_files: Vec<PathBuf>,
48 49 options: StatusOptions,
49 50 ) -> Result<(DirstateStatus<'dirstate>, Vec<PatternFileWarning>), StatusError>
50 51 {
51 52 // Force the global rayon threadpool to not exceed 16 concurrent threads.
52 53 // This is a stop-gap measure until we figure out why using more than 16
53 54 // threads makes `status` slower for each additional thread.
54 55 // We use `ok()` in case the global threadpool has already been
55 56 // instantiated in `rhg` or some other caller.
56 57 // TODO find the underlying cause and fix it, then remove this.
57 58 rayon::ThreadPoolBuilder::new()
58 59 .num_threads(16)
59 60 .build_global()
60 61 .ok();
61 62
62 63 let (ignore_fn, warnings, patterns_changed): (IgnoreFnType, _, _) =
63 64 if options.list_ignored || options.list_unknown {
64 65 let (ignore_fn, warnings, changed) = match dmap.dirstate_version {
65 66 DirstateVersion::V1 => {
66 67 let (ignore_fn, warnings) = get_ignore_function(
67 68 ignore_files,
68 69 &root_dir,
69 &mut |_pattern_bytes| {},
70 &mut |_source, _pattern_bytes| {},
70 71 )?;
71 72 (ignore_fn, warnings, None)
72 73 }
73 74 DirstateVersion::V2 => {
74 75 let mut hasher = Sha1::new();
75 76 let (ignore_fn, warnings) = get_ignore_function(
76 77 ignore_files,
77 78 &root_dir,
78 &mut |pattern_bytes| hasher.update(pattern_bytes),
79 &mut |source, pattern_bytes| {
80 // If inside the repo, use the relative version to
81 // make it deterministic inside tests.
82 // The performance hit should be negligible.
83 let source = source
84 .strip_prefix(&root_dir)
85 .unwrap_or(source);
86 let source = get_bytes_from_path(source);
87
88 let mut subhasher = Sha1::new();
89 subhasher.update(pattern_bytes);
90 let patterns_hash = subhasher.finalize();
91
92 hasher.update(source);
93 hasher.update(b" ");
94 hasher.update(patterns_hash);
95 hasher.update(b"\n");
96 },
79 97 )?;
80 98 let new_hash = *hasher.finalize().as_ref();
81 99 let changed = new_hash != dmap.ignore_patterns_hash;
82 100 dmap.ignore_patterns_hash = new_hash;
83 101 (ignore_fn, warnings, Some(changed))
84 102 }
85 103 };
86 104 (ignore_fn, warnings, changed)
87 105 } else {
88 106 (Box::new(|&_| true), vec![], None)
89 107 };
90 108
91 109 let filesystem_time_at_status_start =
92 110 filesystem_now(&root_dir).ok().map(TruncatedTimestamp::from);
93 111
94 112 // If the repository is under the current directory, prefer using a
95 113     // relative path, so the kernel needs to traverse fewer directories in every
96 114 // call to `read_dir` or `symlink_metadata`.
97 115 // This is effective in the common case where the current directory is the
98 116 // repository root.
99 117
100 118 // TODO: Better yet would be to use libc functions like `openat` and
101 119 // `fstatat` to remove such repeated traversals entirely, but the standard
102 120 // library does not provide APIs based on those.
103 121 // Maybe with a crate like https://crates.io/crates/openat instead?
104 122 let root_dir = if let Some(relative) = std::env::current_dir()
105 123 .ok()
106 124 .and_then(|cwd| root_dir.strip_prefix(cwd).ok())
107 125 {
108 126 relative
109 127 } else {
110 128 &root_dir
111 129 };
112 130
113 131 let outcome = DirstateStatus {
114 132 filesystem_time_at_status_start,
115 133 ..Default::default()
116 134 };
117 135 let common = StatusCommon {
118 136 dmap,
119 137 options,
120 138 matcher,
121 139 ignore_fn,
122 140 outcome: Mutex::new(outcome),
123 141 ignore_patterns_have_changed: patterns_changed,
124 142 new_cacheable_directories: Default::default(),
125 143 outdated_cached_directories: Default::default(),
126 144 filesystem_time_at_status_start,
127 145 };
128 146 let is_at_repo_root = true;
129 147 let hg_path = &BorrowedPath::OnDisk(HgPath::new(""));
130 148 let has_ignored_ancestor = HasIgnoredAncestor::create(None, hg_path);
131 149 let root_cached_mtime = None;
132 150 let root_dir_metadata = None;
133 151 // If the path we have for the repository root is a symlink, do follow it.
134 152 // (As opposed to symlinks within the working directory which are not
135 153 // followed, using `std::fs::symlink_metadata`.)
136 154 common.traverse_fs_directory_and_dirstate(
137 155 &has_ignored_ancestor,
138 156 dmap.root.as_ref(),
139 157 hg_path,
140 158 &root_dir,
141 159 root_dir_metadata,
142 160 root_cached_mtime,
143 161 is_at_repo_root,
144 162 )?;
145 163 let mut outcome = common.outcome.into_inner().unwrap();
146 164 let new_cacheable = common.new_cacheable_directories.into_inner().unwrap();
147 165 let outdated = common.outdated_cached_directories.into_inner().unwrap();
148 166
149 167 outcome.dirty = common.ignore_patterns_have_changed == Some(true)
150 168 || !outdated.is_empty()
151 169 || (!new_cacheable.is_empty()
152 170 && dmap.dirstate_version == DirstateVersion::V2);
153 171
154 172 // Remove outdated mtimes before adding new mtimes, in case a given
155 173     // directory is in both lists
156 174 for path in &outdated {
157 175 dmap.clear_cached_mtime(path)?;
158 176 }
159 177 for (path, mtime) in &new_cacheable {
160 178 dmap.set_cached_mtime(path, *mtime)?;
161 179 }
162 180
163 181 Ok((outcome, warnings))
164 182 }
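The dirstate-v2 branch above hashes, for each ignore file, the pair of its repo-relative source path and the SHA-1 of its pattern bytes, in the layout `source SP sha1(patterns) NL`. The sketch below shows only that byte layout, with a constant placeholder standing in for the real SHA-1 digest (the actual code uses the `sha-1` crate); `hash_input` is a hypothetical helper, not part of hg-core.

```rust
// Sketch of the dirstate-v2 ignore-hash input layout: for each pattern
// file we append `source SP sha1(patterns) NL` to one outer hasher.
// A zeroed 20-byte array stands in for the real SHA-1 digest here.
fn hash_input(entries: &[(&[u8], [u8; 20])]) -> Vec<u8> {
    let mut input = Vec::new();
    for (source, patterns_hash) in entries {
        input.extend_from_slice(source); // repo-relative path, e.g. b".hgignore"
        input.extend_from_slice(b" ");
        input.extend_from_slice(patterns_hash);
        input.extend_from_slice(b"\n");
    }
    input
}

fn main() {
    let digest = [0u8; 20]; // placeholder for sha1(pattern_bytes)
    let input = hash_input(&[(&b".hgignore"[..], digest)]);
    // 9 bytes of source + 1 space + 20-byte digest + 1 newline
    assert_eq!(input.len(), 31);
    println!("{}", input.len());
}
```

Because the source path is included, moving patterns between ignore files changes the hash even when the combined pattern bytes stay identical, which is the point of this change.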
165 183
166 184 /// Bag of random things needed by various parts of the algorithm. Reduces the
167 185 /// number of parameters passed to functions.
168 186 struct StatusCommon<'a, 'tree, 'on_disk: 'tree> {
169 187 dmap: &'tree DirstateMap<'on_disk>,
170 188 options: StatusOptions,
171 189 matcher: &'a (dyn Matcher + Sync),
172 190 ignore_fn: IgnoreFnType<'a>,
173 191 outcome: Mutex<DirstateStatus<'on_disk>>,
174 192 /// New timestamps of directories to be used for caching their readdirs
175 193 new_cacheable_directories:
176 194 Mutex<Vec<(Cow<'on_disk, HgPath>, TruncatedTimestamp)>>,
177 195 /// Used to invalidate the readdir cache of directories
178 196 outdated_cached_directories: Mutex<Vec<Cow<'on_disk, HgPath>>>,
179 197
180 198 /// Whether ignore files like `.hgignore` have changed since the previous
181 199 /// time a `status()` call wrote their hash to the dirstate. `None` means
182 200     /// we don’t know as this run doesn’t list either ignored or unknown files
183 201 /// and therefore isn’t reading `.hgignore`.
184 202 ignore_patterns_have_changed: Option<bool>,
185 203
186 204 /// The current time at the start of the `status()` algorithm, as measured
187 205 /// and possibly truncated by the filesystem.
188 206 filesystem_time_at_status_start: Option<TruncatedTimestamp>,
189 207 }
190 208
191 209 enum Outcome {
192 210 Modified,
193 211 Added,
194 212 Removed,
195 213 Deleted,
196 214 Clean,
197 215 Ignored,
198 216 Unknown,
199 217 Unsure,
200 218 }
201 219
202 220 /// Lazy computation of whether a given path has an ancestor
203 221 /// matched by the ignore patterns.
204 222 struct HasIgnoredAncestor<'a> {
205 223 /// `path` and `parent` constitute the inputs to the computation,
206 224 /// `cache` stores the outcome.
207 225 path: &'a HgPath,
208 226 parent: Option<&'a HasIgnoredAncestor<'a>>,
209 227 cache: OnceCell<bool>,
210 228 }
211 229
212 230 impl<'a> HasIgnoredAncestor<'a> {
213 231 fn create(
214 232 parent: Option<&'a HasIgnoredAncestor<'a>>,
215 233 path: &'a HgPath,
216 234 ) -> HasIgnoredAncestor<'a> {
217 235 Self {
218 236 path,
219 237 parent,
220 238 cache: OnceCell::new(),
221 239 }
222 240 }
223 241
224 242 fn force<'b>(&self, ignore_fn: &IgnoreFnType<'b>) -> bool {
225 243 match self.parent {
226 244 None => false,
227 245 Some(parent) => {
228 246 *(parent.cache.get_or_init(|| {
229 247 parent.force(ignore_fn) || ignore_fn(&self.path)
230 248 }))
231 249 }
232 250 }
233 251 }
234 252 }
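The `force` method above memoizes, per node, whether any ancestor directory is ignored, so the ignore function runs at most once per directory however many children ask. A std-only sketch of the same chain-with-`OnceCell` idea (the `Node` type and `ancestor_ignored` name are illustrative, not hg-core API):

```rust
use std::cell::OnceCell;

// Sketch of the lazy ancestor-ignored chain: each node memoizes whether
// any ancestor directory is ignored, computing the answer at most once.
struct Node<'a> {
    name: &'a str,
    parent: Option<&'a Node<'a>>,
    cache: OnceCell<bool>,
}

impl<'a> Node<'a> {
    fn ancestor_ignored(&self, is_ignored: &dyn Fn(&str) -> bool) -> bool {
        match self.parent {
            None => false, // the repo root has no ignored ancestor
            Some(parent) => *parent.cache.get_or_init(|| {
                parent.ancestor_ignored(is_ignored) || is_ignored(parent.name)
            }),
        }
    }
}

fn main() {
    let root = Node { name: "", parent: None, cache: OnceCell::new() };
    let build = Node { name: "build", parent: Some(&root), cache: OnceCell::new() };
    let file = Node { name: "build/out", parent: Some(&build), cache: OnceCell::new() };
    let ignored = |p: &str| p == "build";
    assert!(file.ancestor_ignored(&ignored)); // "build" is ignored
    assert!(!build.ancestor_ignored(&ignored)); // no ancestor of "build" is
    println!("ok");
}
```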
235 253
236 254 impl<'a, 'tree, 'on_disk> StatusCommon<'a, 'tree, 'on_disk> {
237 255 fn push_outcome(
238 256 &self,
239 257 which: Outcome,
240 258 dirstate_node: &NodeRef<'tree, 'on_disk>,
241 259 ) -> Result<(), DirstateV2ParseError> {
242 260 let path = dirstate_node
243 261 .full_path_borrowed(self.dmap.on_disk)?
244 262 .detach_from_tree();
245 263 let copy_source = if self.options.list_copies {
246 264 dirstate_node
247 265 .copy_source_borrowed(self.dmap.on_disk)?
248 266 .map(|source| source.detach_from_tree())
249 267 } else {
250 268 None
251 269 };
252 270 self.push_outcome_common(which, path, copy_source);
253 271 Ok(())
254 272 }
255 273
256 274 fn push_outcome_without_copy_source(
257 275 &self,
258 276 which: Outcome,
259 277 path: &BorrowedPath<'_, 'on_disk>,
260 278 ) {
261 279 self.push_outcome_common(which, path.detach_from_tree(), None)
262 280 }
263 281
264 282 fn push_outcome_common(
265 283 &self,
266 284 which: Outcome,
267 285 path: HgPathCow<'on_disk>,
268 286 copy_source: Option<HgPathCow<'on_disk>>,
269 287 ) {
270 288 let mut outcome = self.outcome.lock().unwrap();
271 289 let vec = match which {
272 290 Outcome::Modified => &mut outcome.modified,
273 291 Outcome::Added => &mut outcome.added,
274 292 Outcome::Removed => &mut outcome.removed,
275 293 Outcome::Deleted => &mut outcome.deleted,
276 294 Outcome::Clean => &mut outcome.clean,
277 295 Outcome::Ignored => &mut outcome.ignored,
278 296 Outcome::Unknown => &mut outcome.unknown,
279 297 Outcome::Unsure => &mut outcome.unsure,
280 298 };
281 299 vec.push(StatusPath { path, copy_source });
282 300 }
283 301
284 302 fn read_dir(
285 303 &self,
286 304 hg_path: &HgPath,
287 305 fs_path: &Path,
288 306 is_at_repo_root: bool,
289 307 ) -> Result<Vec<DirEntry>, ()> {
290 308 DirEntry::read_dir(fs_path, is_at_repo_root)
291 309 .map_err(|error| self.io_error(error, hg_path))
292 310 }
293 311
294 312 fn io_error(&self, error: std::io::Error, hg_path: &HgPath) {
295 313 let errno = error.raw_os_error().expect("expected real OS error");
296 314 self.outcome
297 315 .lock()
298 316 .unwrap()
299 317 .bad
300 318 .push((hg_path.to_owned().into(), BadMatch::OsError(errno)))
301 319 }
302 320
303 321 fn check_for_outdated_directory_cache(
304 322 &self,
305 323 dirstate_node: &NodeRef<'tree, 'on_disk>,
306 324 ) -> Result<bool, DirstateV2ParseError> {
307 325 if self.ignore_patterns_have_changed == Some(true)
308 326 && dirstate_node.cached_directory_mtime()?.is_some()
309 327 {
310 328 self.outdated_cached_directories.lock().unwrap().push(
311 329 dirstate_node
312 330 .full_path_borrowed(self.dmap.on_disk)?
313 331 .detach_from_tree(),
314 332 );
315 333 return Ok(true);
316 334 }
317 335 Ok(false)
318 336 }
319 337
320 338     /// If this returns true, we can get accurate results by using
321 339     /// `symlink_metadata` only for child nodes that exist in the dirstate,
322 340     /// without needing to call `read_dir`.
323 341 fn can_skip_fs_readdir(
324 342 &self,
325 343 directory_metadata: Option<&std::fs::Metadata>,
326 344 cached_directory_mtime: Option<TruncatedTimestamp>,
327 345 ) -> bool {
328 346 if !self.options.list_unknown && !self.options.list_ignored {
329 347 // All states that we care about listing have corresponding
330 348 // dirstate entries.
331 349 // This happens for example with `hg status -mard`.
332 350 return true;
333 351 }
334 352 if !self.options.list_ignored
335 353 && self.ignore_patterns_have_changed == Some(false)
336 354 {
337 355 if let Some(cached_mtime) = cached_directory_mtime {
338 356 // The dirstate contains a cached mtime for this directory, set
339 357 // by a previous run of the `status` algorithm which found this
340 358 // directory eligible for `read_dir` caching.
341 359 if let Some(meta) = directory_metadata {
342 360 if cached_mtime
343 361 .likely_equal_to_mtime_of(meta)
344 362 .unwrap_or(false)
345 363 {
346 364 // The mtime of that directory has not changed
347 365 // since then, which means that the results of
348 366 // `read_dir` should also be unchanged.
349 367 return true;
350 368 }
351 369 }
352 370 }
353 371 }
354 372 false
355 373 }
356 374
357 375 /// Returns whether all child entries of the filesystem directory have a
358 376 /// corresponding dirstate node or are ignored.
359 377 fn traverse_fs_directory_and_dirstate<'ancestor>(
360 378 &self,
361 379 has_ignored_ancestor: &'ancestor HasIgnoredAncestor<'ancestor>,
362 380 dirstate_nodes: ChildNodesRef<'tree, 'on_disk>,
363 381 directory_hg_path: &BorrowedPath<'tree, 'on_disk>,
364 382 directory_fs_path: &Path,
365 383 directory_metadata: Option<&std::fs::Metadata>,
366 384 cached_directory_mtime: Option<TruncatedTimestamp>,
367 385 is_at_repo_root: bool,
368 386 ) -> Result<bool, DirstateV2ParseError> {
369 387 if self.can_skip_fs_readdir(directory_metadata, cached_directory_mtime)
370 388 {
371 389 dirstate_nodes
372 390 .par_iter()
373 391 .map(|dirstate_node| {
374 392 let fs_path = directory_fs_path.join(get_path_from_bytes(
375 393 dirstate_node.base_name(self.dmap.on_disk)?.as_bytes(),
376 394 ));
377 395 match std::fs::symlink_metadata(&fs_path) {
378 396 Ok(fs_metadata) => self.traverse_fs_and_dirstate(
379 397 &fs_path,
380 398 &fs_metadata,
381 399 dirstate_node,
382 400 has_ignored_ancestor,
383 401 ),
384 402 Err(e) if e.kind() == std::io::ErrorKind::NotFound => {
385 403 self.traverse_dirstate_only(dirstate_node)
386 404 }
387 405 Err(error) => {
388 406 let hg_path =
389 407 dirstate_node.full_path(self.dmap.on_disk)?;
390 408 Ok(self.io_error(error, hg_path))
391 409 }
392 410 }
393 411 })
394 412 .collect::<Result<_, _>>()?;
395 413
396 414 // We don’t know, so conservatively say this isn’t the case
397 415 let children_all_have_dirstate_node_or_are_ignored = false;
398 416
399 417 return Ok(children_all_have_dirstate_node_or_are_ignored);
400 418 }
401 419
402 420 let mut fs_entries = if let Ok(entries) = self.read_dir(
403 421 directory_hg_path,
404 422 directory_fs_path,
405 423 is_at_repo_root,
406 424 ) {
407 425 entries
408 426 } else {
409 427 // Treat an unreadable directory (typically because of insufficient
410 428 // permissions) like an empty directory. `self.read_dir` has
411 429 // already called `self.io_error` so a warning will be emitted.
412 430 Vec::new()
413 431 };
414 432
415 433 // `merge_join_by` requires both its input iterators to be sorted:
416 434
417 435 let dirstate_nodes = dirstate_nodes.sorted();
418 436 // `sort_unstable_by_key` doesn’t allow keys borrowing from the value:
419 437 // https://github.com/rust-lang/rust/issues/34162
420 438 fs_entries.sort_unstable_by(|e1, e2| e1.base_name.cmp(&e2.base_name));
421 439
422 440 // Propagate here any error that would happen inside the comparison
423 441 // callback below
424 442 for dirstate_node in &dirstate_nodes {
425 443 dirstate_node.base_name(self.dmap.on_disk)?;
426 444 }
427 445 itertools::merge_join_by(
428 446 dirstate_nodes,
429 447 &fs_entries,
430 448 |dirstate_node, fs_entry| {
431 449 // This `unwrap` never panics because we already propagated
432 450 // those errors above
433 451 dirstate_node
434 452 .base_name(self.dmap.on_disk)
435 453 .unwrap()
436 454 .cmp(&fs_entry.base_name)
437 455 },
438 456 )
439 457 .par_bridge()
440 458 .map(|pair| {
441 459 use itertools::EitherOrBoth::*;
442 460 let has_dirstate_node_or_is_ignored;
443 461 match pair {
444 462 Both(dirstate_node, fs_entry) => {
445 463 self.traverse_fs_and_dirstate(
446 464 &fs_entry.full_path,
447 465 &fs_entry.metadata,
448 466 dirstate_node,
449 467 has_ignored_ancestor,
450 468 )?;
451 469 has_dirstate_node_or_is_ignored = true
452 470 }
453 471 Left(dirstate_node) => {
454 472 self.traverse_dirstate_only(dirstate_node)?;
455 473 has_dirstate_node_or_is_ignored = true;
456 474 }
457 475 Right(fs_entry) => {
458 476 has_dirstate_node_or_is_ignored = self.traverse_fs_only(
459 477 has_ignored_ancestor.force(&self.ignore_fn),
460 478 directory_hg_path,
461 479 fs_entry,
462 480 )
463 481 }
464 482 }
465 483 Ok(has_dirstate_node_or_is_ignored)
466 484 })
467 485 .try_reduce(|| true, |a, b| Ok(a && b))
468 486 }
469 487
470 488 fn traverse_fs_and_dirstate<'ancestor>(
471 489 &self,
472 490 fs_path: &Path,
473 491 fs_metadata: &std::fs::Metadata,
474 492 dirstate_node: NodeRef<'tree, 'on_disk>,
475 493 has_ignored_ancestor: &'ancestor HasIgnoredAncestor<'ancestor>,
476 494 ) -> Result<(), DirstateV2ParseError> {
477 495 let outdated_dircache =
478 496 self.check_for_outdated_directory_cache(&dirstate_node)?;
479 497 let hg_path = &dirstate_node.full_path_borrowed(self.dmap.on_disk)?;
480 498 let file_type = fs_metadata.file_type();
481 499 let file_or_symlink = file_type.is_file() || file_type.is_symlink();
482 500 if !file_or_symlink {
483 501 // If we previously had a file here, it was removed (with
484 502 // `hg rm` or similar) or deleted before it could be
485 503 // replaced by a directory or something else.
486 504 self.mark_removed_or_deleted_if_file(&dirstate_node)?;
487 505 }
488 506 if file_type.is_dir() {
489 507 if self.options.collect_traversed_dirs {
490 508 self.outcome
491 509 .lock()
492 510 .unwrap()
493 511 .traversed
494 512 .push(hg_path.detach_from_tree())
495 513 }
496 514 let is_ignored = HasIgnoredAncestor::create(
497 515 Some(&has_ignored_ancestor),
498 516 hg_path,
499 517 );
500 518 let is_at_repo_root = false;
501 519 let children_all_have_dirstate_node_or_are_ignored = self
502 520 .traverse_fs_directory_and_dirstate(
503 521 &is_ignored,
504 522 dirstate_node.children(self.dmap.on_disk)?,
505 523 hg_path,
506 524 fs_path,
507 525 Some(fs_metadata),
508 526 dirstate_node.cached_directory_mtime()?,
509 527 is_at_repo_root,
510 528 )?;
511 529 self.maybe_save_directory_mtime(
512 530 children_all_have_dirstate_node_or_are_ignored,
513 531 fs_metadata,
514 532 dirstate_node,
515 533 outdated_dircache,
516 534 )?
517 535 } else {
518 536 if file_or_symlink && self.matcher.matches(&hg_path) {
519 537 if let Some(entry) = dirstate_node.entry()? {
520 538 if !entry.any_tracked() {
521 539 // Forward-compat if we start tracking unknown/ignored
522 540 // files for caching reasons
523 541 self.mark_unknown_or_ignored(
524 542 has_ignored_ancestor.force(&self.ignore_fn),
525 543 &hg_path,
526 544 );
527 545 }
528 546 if entry.added() {
529 547 self.push_outcome(Outcome::Added, &dirstate_node)?;
530 548 } else if entry.removed() {
531 549 self.push_outcome(Outcome::Removed, &dirstate_node)?;
532 550 } else if entry.modified() {
533 551 self.push_outcome(Outcome::Modified, &dirstate_node)?;
534 552 } else {
535 553 self.handle_normal_file(&dirstate_node, fs_metadata)?;
536 554 }
537 555 } else {
538 556 // `node.entry.is_none()` indicates a "directory"
539 557 // node, but the filesystem has a file
540 558 self.mark_unknown_or_ignored(
541 559 has_ignored_ancestor.force(&self.ignore_fn),
542 560 hg_path,
543 561 );
544 562 }
545 563 }
546 564
547 565 for child_node in dirstate_node.children(self.dmap.on_disk)?.iter()
548 566 {
549 567 self.traverse_dirstate_only(child_node)?
550 568 }
551 569 }
552 570 Ok(())
553 571 }
554 572
555 573 /// Save directory mtime if applicable.
556 574 ///
557 575 /// `outdated_directory_cache` is `true` if we've just invalidated the
558 576 /// cache for this directory in `check_for_outdated_directory_cache`,
559 577 /// which forces the update.
560 578 fn maybe_save_directory_mtime(
561 579 &self,
562 580 children_all_have_dirstate_node_or_are_ignored: bool,
563 581 directory_metadata: &std::fs::Metadata,
564 582 dirstate_node: NodeRef<'tree, 'on_disk>,
565 583 outdated_directory_cache: bool,
566 584 ) -> Result<(), DirstateV2ParseError> {
567 585 if !children_all_have_dirstate_node_or_are_ignored {
568 586 return Ok(());
569 587 }
570 588 // All filesystem directory entries from `read_dir` have a
571 589 // corresponding node in the dirstate, so we can reconstitute the
572 590 // names of those entries without calling `read_dir` again.
573 591
574 592 // TODO: use let-else here and below when available:
575 593 // https://github.com/rust-lang/rust/issues/87335
576 594 let status_start = if let Some(status_start) =
577 595 &self.filesystem_time_at_status_start
578 596 {
579 597 status_start
580 598 } else {
581 599 return Ok(());
582 600 };
583 601
584 602 // Although the Rust standard library’s `SystemTime` type
585 603 // has nanosecond precision, the times reported for a
586 604 // directory’s (or file’s) modified time may have lower
587 605 // resolution based on the filesystem (for example ext3
588 606 // only stores integer seconds), kernel (see
589 607 // https://stackoverflow.com/a/14393315/1162888), etc.
590 608 let directory_mtime = if let Ok(option) =
591 609 TruncatedTimestamp::for_reliable_mtime_of(
592 610 directory_metadata,
593 611 status_start,
594 612 ) {
595 613 if let Some(directory_mtime) = option {
596 614 directory_mtime
597 615 } else {
598 616 // The directory was modified too recently,
599 617 // don’t cache its `read_dir` results.
600 618 //
601 619 // 1. A change to this directory (direct child was
602 620             // added or removed) causes its mtime to be set
603 621 // (possibly truncated) to `directory_mtime`
604 622 // 2. This `status` algorithm calls `read_dir`
605 623             // 3. Another change is made to the same directory,
606 624             // such that calling `read_dir` again would give
607 625 // different results, but soon enough after 1. that
608 626 // the mtime stays the same
609 627 //
610 628             // On a system where the time resolution is poor, this
611 629 // scenario is not unlikely if all three steps are caused
612 630 // by the same script.
613 631 return Ok(());
614 632 }
615 633 } else {
616 634 // OS/libc does not support mtime?
617 635 return Ok(());
618 636 };
619 637 // We’ve observed (through `status_start`) that time has
620 638 // “progressed” since `directory_mtime`, so any further
621 639 // change to this directory is extremely likely to cause a
622 640 // different mtime.
623 641 //
624 642 // Having the same mtime again is not entirely impossible
625 643         // since the system clock is not monotonic. It could jump
626 644 // backward to some point before `directory_mtime`, then a
627 645 // directory change could potentially happen during exactly
628 646 // the wrong tick.
629 647 //
630 648 // We deem this scenario (unlike the previous one) to be
631 649 // unlikely enough in practice.
632 650
633 651 let is_up_to_date = if let Some(cached) =
634 652 dirstate_node.cached_directory_mtime()?
635 653 {
636 654 !outdated_directory_cache && cached.likely_equal(directory_mtime)
637 655 } else {
638 656 false
639 657 };
640 658 if !is_up_to_date {
641 659 let hg_path = dirstate_node
642 660 .full_path_borrowed(self.dmap.on_disk)?
643 661 .detach_from_tree();
644 662 self.new_cacheable_directories
645 663 .lock()
646 664 .unwrap()
647 665 .push((hg_path, directory_mtime))
648 666 }
649 667 Ok(())
650 668 }
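The decision above hinges on comparing a cached truncated mtime with a fresh one. A plausible sketch of such a "likely equal" rule under truncation, where a zero nanosecond field means the value may have been stored at whole-second resolution (the real `TruncatedTimestamp::likely_equal` may differ in detail; `Truncated` is an illustrative stand-in):

```rust
// Sketch of a "likely equal" mtime comparison under truncation: two
// timestamps match if the seconds agree and either nanosecond field is
// zero (possibly stored at lower resolution) or the nanoseconds agree.
#[derive(Clone, Copy)]
struct Truncated {
    secs: i64,
    nanos: u32,
}

fn likely_equal(a: Truncated, b: Truncated) -> bool {
    a.secs == b.secs && (a.nanos == b.nanos || a.nanos == 0 || b.nanos == 0)
}

fn main() {
    let full = Truncated { secs: 1_700_000_000, nanos: 123_456_789 };
    let coarse = Truncated { secs: 1_700_000_000, nanos: 0 }; // ext3-style storage
    assert!(likely_equal(full, coarse));
    assert!(!likely_equal(full, Truncated { secs: 1_700_000_001, nanos: 0 }));
    println!("ok");
}
```

Treating a zero-nanosecond value as ambiguous rather than exact avoids spuriously invalidating the `read_dir` cache on filesystems that only store integer seconds.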
651 669
652 670 /// A file that is clean in the dirstate was found in the filesystem
653 671 fn handle_normal_file(
654 672 &self,
655 673 dirstate_node: &NodeRef<'tree, 'on_disk>,
656 674 fs_metadata: &std::fs::Metadata,
657 675 ) -> Result<(), DirstateV2ParseError> {
658 676 // Keep the low 31 bits
659 677 fn truncate_u64(value: u64) -> i32 {
660 678 (value & 0x7FFF_FFFF) as i32
661 679 }
662 680
663 681 let entry = dirstate_node
664 682 .entry()?
665 683 .expect("handle_normal_file called with entry-less node");
666 684 let mode_changed =
667 685 || self.options.check_exec && entry.mode_changed(fs_metadata);
668 686 let size = entry.size();
669 687 let size_changed = size != truncate_u64(fs_metadata.len());
670 688 if size >= 0 && size_changed && fs_metadata.file_type().is_symlink() {
671 689 // issue6456: Size returned may be longer due to encryption
672 690 // on EXT-4 fscrypt. TODO maybe only do it on EXT4?
673 691 self.push_outcome(Outcome::Unsure, dirstate_node)?
674 692 } else if dirstate_node.has_copy_source()
675 693 || entry.is_from_other_parent()
676 694 || (size >= 0 && (size_changed || mode_changed()))
677 695 {
678 696 self.push_outcome(Outcome::Modified, dirstate_node)?
679 697 } else {
680 698 let mtime_looks_clean;
681 699 if let Some(dirstate_mtime) = entry.truncated_mtime() {
682 700 let fs_mtime = TruncatedTimestamp::for_mtime_of(fs_metadata)
683 701 .expect("OS/libc does not support mtime?");
684 702                 // The result might change in the future if, for
685 703                 // example, the internal clock drifts while the
686 704                 // process runs, but that is a case where the issues
687 705                 // the user would face would be a lot worse and there
688 706                 // is nothing we can really do.
689 707 mtime_looks_clean = fs_mtime.likely_equal(dirstate_mtime)
690 708 } else {
691 709 // No mtime in the dirstate entry
692 710 mtime_looks_clean = false
693 711 };
694 712 if !mtime_looks_clean {
695 713 self.push_outcome(Outcome::Unsure, dirstate_node)?
696 714 } else if self.options.list_clean {
697 715 self.push_outcome(Outcome::Clean, dirstate_node)?
698 716 }
699 717 }
700 718 Ok(())
701 719 }
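The size comparison in `handle_normal_file` relies on `truncate_u64`, since the dirstate stores file sizes as 31-bit values; the filesystem size is masked the same way before comparing. A minimal standalone check of that masking:

```rust
// The dirstate stores file sizes as 31-bit values; sizes read from the
// filesystem are truncated with the same mask before comparison.
fn truncate_u64(value: u64) -> i32 {
    (value & 0x7FFF_FFFF) as i32
}

fn main() {
    assert_eq!(truncate_u64(5), 5);
    // 2^31 wraps to 0 under the 31-bit mask
    assert_eq!(truncate_u64(0x8000_0000), 0);
    assert_eq!(truncate_u64(u64::MAX), 0x7FFF_FFFF);
    println!("ok");
}
```

A consequence is that two sizes differing by a multiple of 2^31 compare equal, which is why a size mismatch on symlinks is treated as `Unsure` rather than `Modified` above.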
702 720
703 721 /// A node in the dirstate tree has no corresponding filesystem entry
704 722 fn traverse_dirstate_only(
705 723 &self,
706 724 dirstate_node: NodeRef<'tree, 'on_disk>,
707 725 ) -> Result<(), DirstateV2ParseError> {
708 726 self.check_for_outdated_directory_cache(&dirstate_node)?;
709 727 self.mark_removed_or_deleted_if_file(&dirstate_node)?;
710 728 dirstate_node
711 729 .children(self.dmap.on_disk)?
712 730 .par_iter()
713 731 .map(|child_node| self.traverse_dirstate_only(child_node))
714 732 .collect()
715 733 }
716 734
717 735 /// A node in the dirstate tree has no corresponding *file* on the
718 736 /// filesystem
719 737 ///
720 738 /// Does nothing on a "directory" node
721 739 fn mark_removed_or_deleted_if_file(
722 740 &self,
723 741 dirstate_node: &NodeRef<'tree, 'on_disk>,
724 742 ) -> Result<(), DirstateV2ParseError> {
725 743 if let Some(entry) = dirstate_node.entry()? {
726 744 if !entry.any_tracked() {
727 745 // Future-compat for when we start storing ignored and unknown
728 746 // files for caching reasons
729 747 return Ok(());
730 748 }
731 749 let path = dirstate_node.full_path(self.dmap.on_disk)?;
732 750 if self.matcher.matches(path) {
733 751 if entry.removed() {
734 752 self.push_outcome(Outcome::Removed, dirstate_node)?
735 753 } else {
736 754 self.push_outcome(Outcome::Deleted, &dirstate_node)?
737 755 }
738 756 }
739 757 }
740 758 Ok(())
741 759 }
742 760
743 761 /// Something in the filesystem has no corresponding dirstate node
744 762 ///
745 763 /// Returns whether that path is ignored
746 764 fn traverse_fs_only(
747 765 &self,
748 766 has_ignored_ancestor: bool,
749 767 directory_hg_path: &HgPath,
750 768 fs_entry: &DirEntry,
751 769 ) -> bool {
752 770 let hg_path = directory_hg_path.join(&fs_entry.base_name);
753 771 let file_type = fs_entry.metadata.file_type();
754 772 let file_or_symlink = file_type.is_file() || file_type.is_symlink();
755 773 if file_type.is_dir() {
756 774 let is_ignored =
757 775 has_ignored_ancestor || (self.ignore_fn)(&hg_path);
758 776 let traverse_children = if is_ignored {
759 777 // Descendants of an ignored directory are all ignored
760 778 self.options.list_ignored
761 779 } else {
762 780 // Descendants of an unknown directory may be either unknown or
763 781 // ignored
764 782 self.options.list_unknown || self.options.list_ignored
765 783 };
766 784 if traverse_children {
767 785 let is_at_repo_root = false;
768 786 if let Ok(children_fs_entries) = self.read_dir(
769 787 &hg_path,
770 788 &fs_entry.full_path,
771 789 is_at_repo_root,
772 790 ) {
773 791 children_fs_entries.par_iter().for_each(|child_fs_entry| {
774 792 self.traverse_fs_only(
775 793 is_ignored,
776 794 &hg_path,
777 795 child_fs_entry,
778 796 );
779 797 })
780 798 }
781 799 if self.options.collect_traversed_dirs {
782 800 self.outcome.lock().unwrap().traversed.push(hg_path.into())
783 801 }
784 802 }
785 803 is_ignored
786 804 } else {
787 805 if file_or_symlink {
788 806 if self.matcher.matches(&hg_path) {
789 807 self.mark_unknown_or_ignored(
790 808 has_ignored_ancestor,
791 809 &BorrowedPath::InMemory(&hg_path),
792 810 )
793 811 } else {
794 812 // We haven’t computed whether this path is ignored. It
795 813 // might not be, and a future run of status might have a
796 814 // different matcher that matches it. So treat it as not
797 815 // ignored. That is, inhibit readdir caching of the parent
798 816 // directory.
799 817 false
800 818 }
801 819 } else {
802 820                 // This is neither a directory, a plain file, nor a symlink.
803 821 // Treat it like an ignored file.
804 822 true
805 823 }
806 824 }
807 825 }
808 826
809 827 /// Returns whether that path is ignored
810 828 fn mark_unknown_or_ignored(
811 829 &self,
812 830 has_ignored_ancestor: bool,
813 831 hg_path: &BorrowedPath<'_, 'on_disk>,
814 832 ) -> bool {
815 833 let is_ignored = has_ignored_ancestor || (self.ignore_fn)(&hg_path);
816 834 if is_ignored {
817 835 if self.options.list_ignored {
818 836 self.push_outcome_without_copy_source(
819 837 Outcome::Ignored,
820 838 hg_path,
821 839 )
822 840 }
823 841 } else {
824 842 if self.options.list_unknown {
825 843 self.push_outcome_without_copy_source(
826 844 Outcome::Unknown,
827 845 hg_path,
828 846 )
829 847 }
830 848 }
831 849 is_ignored
832 850 }
833 851 }
834 852
835 853 struct DirEntry {
836 854 base_name: HgPathBuf,
837 855 full_path: PathBuf,
838 856 metadata: std::fs::Metadata,
839 857 }
840 858
841 859 impl DirEntry {
842 860 /// Returns **unsorted** entries in the given directory, with name and
843 861 /// metadata.
844 862 ///
845 863 /// If a `.hg` sub-directory is encountered:
846 864 ///
847 865 /// * At the repository root, ignore that sub-directory
848 866 /// * Elsewhere, we’re listing the content of a sub-repo. Return an empty
849 867 /// list instead.
850 868 fn read_dir(path: &Path, is_at_repo_root: bool) -> io::Result<Vec<Self>> {
851 869 // `read_dir` returns a "not found" error for the empty path
852 870 let at_cwd = path == Path::new("");
853 871 let read_dir_path = if at_cwd { Path::new(".") } else { path };
854 872 let mut results = Vec::new();
855 873 for entry in read_dir_path.read_dir()? {
856 874 let entry = entry?;
857 875 let metadata = match entry.metadata() {
858 876 Ok(v) => v,
859 877 Err(e) => {
860 878 // race with file deletion?
861 879 if e.kind() == std::io::ErrorKind::NotFound {
862 880 continue;
863 881 } else {
864 882 return Err(e);
865 883 }
866 884 }
867 885 };
868 886 let file_name = entry.file_name();
869 887 // FIXME don't do this when cached
870 888 if file_name == ".hg" {
871 889 if is_at_repo_root {
872 890 // Skip the repo’s own .hg (might be a symlink)
873 891 continue;
874 892 } else if metadata.is_dir() {
875 893 // A .hg sub-directory at another location means a subrepo,
876 894 // skip it entirely.
877 895 return Ok(Vec::new());
878 896 }
879 897 }
880 898 let full_path = if at_cwd {
881 899 file_name.clone().into()
882 900 } else {
883 901 entry.path()
884 902 };
885 903 let base_name = get_bytes_from_os_string(file_name).into();
886 904 results.push(DirEntry {
887 905 base_name,
888 906 full_path,
889 907 metadata,
890 908 })
891 909 }
892 910 Ok(results)
893 911 }
894 912 }
895 913
896 914 /// Return the `mtime` of a temporary file newly-created in the `.hg` directory
897 915 /// of the given repository.
898 916 ///
899 917 /// This is similar to `SystemTime::now()`, with the result truncated to the
900 918 /// same time resolution as other files’ modification times. Using `.hg`
901 919 /// instead of the system’s default temporary directory (such as `/tmp`) makes
902 920 /// it more likely the temporary file is in the same disk partition as contents
903 921 /// of the working directory, which can matter since different filesystems may
904 922 /// store timestamps with different resolutions.
905 923 ///
906 924 /// This may fail, typically if we lack write permissions. In that case we
907 925 /// should continue the `status()` algorithm anyway and consider the current
908 926 /// date/time to be unknown.
909 927 fn filesystem_now(repo_root: &Path) -> Result<SystemTime, io::Error> {
910 928 tempfile::tempfile_in(repo_root.join(".hg"))?
911 929 .metadata()?
912 930 .modified()
913 931 }
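The same probe-file trick can be sketched with the standard library alone. This version writes under the system temp dir instead of `.hg/` (so it does not share the working directory's partition, an assumption the real code avoids), and `filesystem_now_sketch` is an illustrative name:

```rust
use std::fs;
use std::time::SystemTime;

// Sketch of measuring "filesystem now": create a scratch file and read
// back its mtime, which carries the filesystem's own time resolution.
// The real code writes under `.hg/` so the probe shares the working
// directory's partition; the system temp dir is used here for brevity.
fn filesystem_now_sketch() -> std::io::Result<SystemTime> {
    let path = std::env::temp_dir().join("fs_now_probe");
    fs::File::create(&path)?;
    let mtime = fs::metadata(&path)?.modified()?;
    fs::remove_file(&path).ok(); // best-effort cleanup
    Ok(mtime)
}

fn main() {
    let before = SystemTime::now();
    let fs_now = filesystem_now_sketch().expect("temp dir should be writable");
    // The probe's mtime should be close to the wall clock; truncation may
    // make it appear slightly *earlier*, in which case the skew is zero.
    let skew = fs_now.duration_since(before).unwrap_or_default();
    assert!(skew.as_secs() < 60);
    println!("ok");
}
```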
@@ -1,706 +1,706
1 1 // filepatterns.rs
2 2 //
3 3 // Copyright 2019 Raphaël Gomès <rgomes@octobus.net>
4 4 //
5 5 // This software may be used and distributed according to the terms of the
6 6 // GNU General Public License version 2 or any later version.
7 7
8 8 //! Handling of Mercurial-specific patterns.
9 9
10 10 use crate::{
11 11 utils::{
12 12 files::{canonical_path, get_bytes_from_path, get_path_from_bytes},
13 13 hg_path::{path_to_hg_path_buf, HgPathBuf, HgPathError},
14 14 SliceExt,
15 15 },
16 16 FastHashMap, PatternError,
17 17 };
18 18 use lazy_static::lazy_static;
19 19 use regex::bytes::{NoExpand, Regex};
20 20 use std::ops::Deref;
21 21 use std::path::{Path, PathBuf};
22 22 use std::vec::Vec;
23 23
24 24 lazy_static! {
25 25 static ref RE_ESCAPE: Vec<Vec<u8>> = {
26 26 let mut v: Vec<Vec<u8>> = (0..=255).map(|byte| vec![byte]).collect();
27 27 let to_escape = b"()[]{}?*+-|^$\\.&~# \t\n\r\x0b\x0c";
28 28 for byte in to_escape {
29 29 v[*byte as usize].insert(0, b'\\');
30 30 }
31 31 v
32 32 };
33 33 }
34 34
35 35 /// These are matched in order
36 36 const GLOB_REPLACEMENTS: &[(&[u8], &[u8])] =
37 37 &[(b"*/", b"(?:.*/)?"), (b"*", b".*"), (b"", b"[^/]*")];
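Because `GLOB_REPLACEMENTS` is matched in order after a first `*` has been consumed, `**/` expands to `(?:.*/)?`, a bare `**` to `.*`, and a single `*` to `[^/]*` (the empty-prefix entry always matches last). A self-contained sketch of that first-match-wins lookup; `expand_star` is an illustrative helper, not the hg-core API:

```rust
// Sketch of the first-match-wins `*` expansion used by glob_to_re:
// `rest` is the input remaining after one `*` has been consumed.
fn expand_star(rest: &[u8]) -> (&'static [u8], usize) {
    const REPLACEMENTS: &[(&[u8], &[u8])] =
        &[(b"*/", b"(?:.*/)?"), (b"*", b".*"), (b"", b"[^/]*")];
    for (source, repl) in REPLACEMENTS {
        if rest.starts_with(source) {
            // Also report how many extra bytes the caller should skip.
            return (*repl, source.len());
        }
    }
    unreachable!("the empty prefix always matches")
}

fn main() {
    // After consuming one `*`, the remaining input decides the regex:
    assert_eq!(expand_star(b"*/src").0, &b"(?:.*/)?"[..]); // `**/src`
    assert_eq!(expand_star(b"*rs").0, &b".*"[..]); // `**rs`
    assert_eq!(expand_star(b".rs").0, &b"[^/]*"[..]); // `*.rs`
    println!("ok");
}
```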
38 38
39 39 /// Appended to the regexp of globs
40 40 const GLOB_SUFFIX: &[u8; 7] = b"(?:/|$)";
41 41
42 42 #[derive(Debug, Clone, PartialEq, Eq)]
43 43 pub enum PatternSyntax {
44 44 /// A regular expression
45 45 Regexp,
46 46 /// Glob that matches at the front of the path
47 47 RootGlob,
48 48 /// Glob that matches at any suffix of the path (still anchored at
49 49 /// slashes)
50 50 Glob,
51 51     /// A path relative to repository root, which is matched recursively
52 52 Path,
53 53 /// A path relative to cwd
54 54 RelPath,
55 55 /// an unrooted glob (*.rs matches Rust files in all dirs)
56 56 RelGlob,
57 57 /// A regexp that needn't match the start of a name
58 58 RelRegexp,
59 59 /// A path relative to repository root, which is matched non-recursively
60 60 /// (will not match subdirectories)
61 61 RootFiles,
62 62 /// A file of patterns to read and include
63 63 Include,
64 64 /// A file of patterns to match against files under the same directory
65 65 SubInclude,
66 66 /// SubInclude with the result of parsing the included file
67 67 ///
68 68 /// Note: there is no ExpandedInclude because that expansion can be done
69 69 /// in place by replacing the Include pattern by the included patterns.
70 70 /// SubInclude requires more handling.
71 71 ///
72 72 /// Note: `Box` is used to minimize size impact on other enum variants
73 73 ExpandedSubInclude(Box<SubInclude>),
74 74 }
75 75
76 76 /// Transforms a glob pattern into a regex
77 77 fn glob_to_re(pat: &[u8]) -> Vec<u8> {
78 78 let mut input = pat;
79 79 let mut res: Vec<u8> = vec![];
80 80 let mut group_depth = 0;
81 81
82 82 while let Some((c, rest)) = input.split_first() {
83 83 input = rest;
84 84
85 85 match c {
86 86 b'*' => {
87 87 for (source, repl) in GLOB_REPLACEMENTS {
88 88 if let Some(rest) = input.drop_prefix(source) {
89 89 input = rest;
90 90 res.extend(*repl);
91 91 break;
92 92 }
93 93 }
94 94 }
95 95 b'?' => res.extend(b"."),
96 96 b'[' => {
97 97 match input.iter().skip(1).position(|b| *b == b']') {
98 98 None => res.extend(b"\\["),
99 99 Some(end) => {
100 100 // Account for the one we skipped
101 101 let end = end + 1;
102 102
103 103 res.extend(b"[");
104 104
105 105 for (i, b) in input[..end].iter().enumerate() {
106 106 if *b == b'!' && i == 0 {
107 107 res.extend(b"^")
108 108 } else if *b == b'^' && i == 0 {
109 109 res.extend(b"\\^")
110 110 } else if *b == b'\\' {
111 111 res.extend(b"\\\\")
112 112 } else {
113 113 res.push(*b)
114 114 }
115 115 }
116 116 res.extend(b"]");
117 117 input = &input[end + 1..];
118 118 }
119 119 }
120 120 }
121 121 b'{' => {
122 122 group_depth += 1;
123 123 res.extend(b"(?:")
124 124 }
125 125 b'}' if group_depth > 0 => {
126 126 group_depth -= 1;
127 127 res.extend(b")");
128 128 }
129 129 b',' if group_depth > 0 => res.extend(b"|"),
130 130 b'\\' => {
131 131 let c = {
132 132 if let Some((c, rest)) = input.split_first() {
133 133 input = rest;
134 134 c
135 135 } else {
136 136 c
137 137 }
138 138 };
139 139 res.extend(&RE_ESCAPE[*c as usize])
140 140 }
141 141 _ => res.extend(&RE_ESCAPE[*c as usize]),
142 142 }
143 143 }
144 144 res
145 145 }
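A minimal, std-only sketch of the `*` / `**` / `**/` / `?` handling performed by `glob_to_re` above (the real function also covers character classes, `{a,b}` groups and backslash escapes; `glob_to_re_sketch` is a hypothetical name for illustration):

```rust
// Hedged sketch of the glob -> regex byte translation above, std only.
fn glob_to_re_sketch(pat: &[u8]) -> Vec<u8> {
    let mut input = pat;
    let mut res: Vec<u8> = Vec::new();
    while let Some((c, rest)) = input.split_first() {
        input = rest;
        match c {
            b'*' => {
                // Mirrors GLOB_REPLACEMENTS: tried in order after one `*`.
                if let Some(rest) = input.strip_prefix(&b"*/"[..]) {
                    input = rest;
                    res.extend_from_slice(b"(?:.*/)?"); // `**/` crosses directories
                } else if let Some(rest) = input.strip_prefix(&b"*"[..]) {
                    input = rest;
                    res.extend_from_slice(b".*"); // `**`
                } else {
                    res.extend_from_slice(b"[^/]*"); // `*` stops at `/`
                }
            }
            b'?' => res.push(b'.'),
            c => res.push(*c),
        }
    }
    res
}

fn main() {
    // Expectations mirror `glob_test` later in this file.
    assert_eq!(glob_to_re_sketch(b"?"), b".");
    assert_eq!(glob_to_re_sketch(b"*"), b"[^/]*");
    assert_eq!(glob_to_re_sketch(b"**"), b".*");
    assert_eq!(glob_to_re_sketch(b"**/a"), b"(?:.*/)?a");
    assert_eq!(glob_to_re_sketch(b"a/**/b"), b"a/(?:.*/)?b");
    println!("ok");
}
```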
146 146
147 147 fn escape_pattern(pattern: &[u8]) -> Vec<u8> {
148 148 pattern
149 149 .iter()
150 150 .flat_map(|c| RE_ESCAPE[*c as usize].clone())
151 151 .collect()
152 152 }
153 153
154 154 pub fn parse_pattern_syntax(
155 155 kind: &[u8],
156 156 ) -> Result<PatternSyntax, PatternError> {
157 157 match kind {
158 158 b"re:" => Ok(PatternSyntax::Regexp),
159 159 b"path:" => Ok(PatternSyntax::Path),
160 160 b"relpath:" => Ok(PatternSyntax::RelPath),
161 161 b"rootfilesin:" => Ok(PatternSyntax::RootFiles),
162 162 b"relglob:" => Ok(PatternSyntax::RelGlob),
163 163 b"relre:" => Ok(PatternSyntax::RelRegexp),
164 164 b"glob:" => Ok(PatternSyntax::Glob),
165 165 b"rootglob:" => Ok(PatternSyntax::RootGlob),
166 166 b"include:" => Ok(PatternSyntax::Include),
167 167 b"subinclude:" => Ok(PatternSyntax::SubInclude),
168 168 _ => Err(PatternError::UnsupportedSyntax(
169 169 String::from_utf8_lossy(kind).to_string(),
170 170 )),
171 171 }
172 172 }
173 173
174 174 /// Builds the regex that corresponds to the given pattern.
175 175 /// If within a `syntax: regexp` context, returns the pattern,
176 176 /// otherwise, returns the corresponding regex.
177 177 fn _build_single_regex(entry: &IgnorePattern) -> Vec<u8> {
178 178 let IgnorePattern {
179 179 syntax, pattern, ..
180 180 } = entry;
181 181 if pattern.is_empty() {
182 182 return vec![];
183 183 }
184 184 match syntax {
185 185 PatternSyntax::Regexp => pattern.to_owned(),
186 186 PatternSyntax::RelRegexp => {
187 187 // The `regex` crate accepts `**` while `re2` and Python's `re`
188 188 // do not. Checking for `*` correctly triggers the same error in all
189 189 // engines.
190 190 if pattern[0] == b'^'
191 191 || pattern[0] == b'*'
192 192 || pattern.starts_with(b".*")
193 193 {
194 194 return pattern.to_owned();
195 195 }
196 196 [&b".*"[..], pattern].concat()
197 197 }
198 198 PatternSyntax::Path | PatternSyntax::RelPath => {
199 199 if pattern == b"." {
200 200 return vec![];
201 201 }
202 202 [escape_pattern(pattern).as_slice(), b"(?:/|$)"].concat()
203 203 }
204 204 PatternSyntax::RootFiles => {
205 205 let mut res = if pattern == b"." {
206 206 vec![]
207 207 } else {
208 208 // Pattern is a directory name.
209 209 [escape_pattern(pattern).as_slice(), b"/"].concat()
210 210 };
211 211
212 212 // Anything after the pattern must be a non-directory.
213 213 res.extend(b"[^/]+$");
214 214 res
215 215 }
216 216 PatternSyntax::RelGlob => {
217 217 let glob_re = glob_to_re(pattern);
218 218 if let Some(rest) = glob_re.drop_prefix(b"[^/]*") {
219 219 [b".*", rest, GLOB_SUFFIX].concat()
220 220 } else {
221 221 [b"(?:.*/)?", glob_re.as_slice(), GLOB_SUFFIX].concat()
222 222 }
223 223 }
224 224 PatternSyntax::Glob | PatternSyntax::RootGlob => {
225 225 [glob_to_re(pattern).as_slice(), GLOB_SUFFIX].concat()
226 226 }
227 227 PatternSyntax::Include
228 228 | PatternSyntax::SubInclude
229 229 | PatternSyntax::ExpandedSubInclude(_) => unreachable!(),
230 230 }
231 231 }
232 232
233 233 const GLOB_SPECIAL_CHARACTERS: [u8; 7] =
234 234 [b'*', b'?', b'[', b']', b'{', b'}', b'\\'];
235 235
236 236 /// TODO support other platforms
237 237 #[cfg(unix)]
238 238 pub fn normalize_path_bytes(bytes: &[u8]) -> Vec<u8> {
239 239 if bytes.is_empty() {
240 240 return b".".to_vec();
241 241 }
242 242 let sep = b'/';
243 243
244 244 let mut initial_slashes = bytes.iter().take_while(|b| **b == sep).count();
245 245 if initial_slashes > 2 {
246 246 // POSIX allows one or two initial slashes, but treats three or more
247 247 // as a single slash.
248 248 initial_slashes = 1;
249 249 }
250 250 let components = bytes
251 251 .split(|b| *b == sep)
252 252 .filter(|c| !(c.is_empty() || c == b"."))
253 253 .fold(vec![], |mut acc, component| {
254 254 if component != b".."
255 255 || (initial_slashes == 0 && acc.is_empty())
256 256 || (!acc.is_empty() && acc[acc.len() - 1] == b"..")
257 257 {
258 258 acc.push(component)
259 259 } else if !acc.is_empty() {
260 260 acc.pop();
261 261 }
262 262 acc
263 263 });
264 264 let mut new_bytes = components.join(&sep);
265 265
266 266 if initial_slashes > 0 {
267 267 let mut buf: Vec<_> = (0..initial_slashes).map(|_| sep).collect();
268 268 buf.extend(new_bytes);
269 269 new_bytes = buf;
270 270 }
271 271 if new_bytes.is_empty() {
272 272 b".".to_vec()
273 273 } else {
274 274 new_bytes
275 275 }
276 276 }
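The fold over path components above can be sketched in isolation. This hedged version uses `&str` for readability where the real `normalize_path_bytes` works on raw bytes (`normalize_sketch` is a hypothetical name):

```rust
// POSIX-style normalization: drop `.` and empty components, resolve `..`
// where possible, and reduce three or more leading slashes to one while
// preserving exactly two.
fn normalize_sketch(path: &str) -> String {
    if path.is_empty() {
        return ".".to_string();
    }
    let mut initial_slashes = path.chars().take_while(|c| *c == '/').count();
    if initial_slashes > 2 {
        initial_slashes = 1; // POSIX: 3+ leading slashes mean a single one
    }
    let mut components: Vec<&str> = Vec::new();
    for comp in path.split('/') {
        match comp {
            "" | "." => {}
            ".." if initial_slashes == 0 && components.is_empty() => {
                components.push("..") // can't go above a relative root
            }
            ".." if components.last() == Some(&"..") => components.push(".."),
            ".." if !components.is_empty() => {
                components.pop();
            }
            ".." => {} // absolute path: `/..` is `/`
            c => components.push(c),
        }
    }
    let joined =
        format!("{}{}", "/".repeat(initial_slashes), components.join("/"));
    if joined.is_empty() {
        ".".to_string()
    } else {
        joined
    }
}

fn main() {
    assert_eq!(normalize_sketch(""), ".");
    assert_eq!(normalize_sketch("a//b/./c"), "a/b/c");
    assert_eq!(normalize_sketch("///x"), "/x");
    assert_eq!(normalize_sketch("//x"), "//x");
    assert_eq!(normalize_sketch("a/.."), ".");
    assert_eq!(normalize_sketch("../a"), "../a");
    println!("ok");
}
```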
277 277
278 278 /// Wrapper function to `_build_single_regex` that short-circuits 'exact' globs
279 279 /// that don't need to be transformed into a regex.
280 280 pub fn build_single_regex(
281 281 entry: &IgnorePattern,
282 282 ) -> Result<Option<Vec<u8>>, PatternError> {
283 283 let IgnorePattern {
284 284 pattern, syntax, ..
285 285 } = entry;
286 286 let pattern = match syntax {
287 287 PatternSyntax::RootGlob
288 288 | PatternSyntax::Path
289 289 | PatternSyntax::RelGlob
290 290 | PatternSyntax::RootFiles => normalize_path_bytes(&pattern),
291 291 PatternSyntax::Include | PatternSyntax::SubInclude => {
292 292 return Err(PatternError::NonRegexPattern(entry.clone()))
293 293 }
294 294 _ => pattern.to_owned(),
295 295 };
296 296 if *syntax == PatternSyntax::RootGlob
297 297 && !pattern.iter().any(|b| GLOB_SPECIAL_CHARACTERS.contains(b))
298 298 {
299 299 Ok(None)
300 300 } else {
301 301 let mut entry = entry.clone();
302 302 entry.pattern = pattern;
303 303 Ok(Some(_build_single_regex(&entry)))
304 304 }
305 305 }
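The `RootGlob` fast path above can be shown on its own: when a `rootglob:` pattern contains none of the glob metacharacters it names an exact path, so `build_single_regex` returns `None` rather than compiling a regex (`rootglob_is_literal` is a hypothetical helper name for this sketch):

```rust
// Same metacharacter set as GLOB_SPECIAL_CHARACTERS above.
const GLOB_SPECIAL_CHARACTERS: [u8; 7] =
    [b'*', b'?', b'[', b']', b'{', b'}', b'\\'];

// True when the pattern is a plain path, i.e. no regex is needed.
fn rootglob_is_literal(pattern: &[u8]) -> bool {
    !pattern.iter().any(|b| GLOB_SPECIAL_CHARACTERS.contains(b))
}

fn main() {
    // Mirrors `test_build_single_regex_shortcut` later in this file.
    assert!(rootglob_is_literal(b"whatever"));
    assert!(!rootglob_is_literal(b"*.o"));
    println!("ok");
}
```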
306 306
307 307 lazy_static! {
308 308 static ref SYNTAXES: FastHashMap<&'static [u8], &'static [u8]> = {
309 309 let mut m = FastHashMap::default();
310 310
311 311 m.insert(b"re".as_ref(), b"relre:".as_ref());
312 312 m.insert(b"regexp".as_ref(), b"relre:".as_ref());
313 313 m.insert(b"glob".as_ref(), b"relglob:".as_ref());
314 314 m.insert(b"rootglob".as_ref(), b"rootglob:".as_ref());
315 315 m.insert(b"include".as_ref(), b"include:".as_ref());
316 316 m.insert(b"subinclude".as_ref(), b"subinclude:".as_ref());
317 317 m.insert(b"path".as_ref(), b"path:".as_ref());
318 318 m.insert(b"rootfilesin".as_ref(), b"rootfilesin:".as_ref());
319 319 m
320 320 };
321 321 }
322 322
323 323 #[derive(Debug)]
324 324 pub enum PatternFileWarning {
325 325 /// (file path, syntax bytes)
326 326 InvalidSyntax(PathBuf, Vec<u8>),
327 327 /// File path
328 328 NoSuchFile(PathBuf),
329 329 }
330 330
331 331 pub fn parse_pattern_file_contents(
332 332 lines: &[u8],
333 333 file_path: &Path,
334 334 default_syntax_override: Option<&[u8]>,
335 335 warn: bool,
336 336 ) -> Result<(Vec<IgnorePattern>, Vec<PatternFileWarning>), PatternError> {
337 337 let comment_regex = Regex::new(r"((?:^|[^\\])(?:\\\\)*)#.*").unwrap();
338 338
339 339 #[allow(clippy::trivial_regex)]
340 340 let comment_escape_regex = Regex::new(r"\\#").unwrap();
341 341 let mut inputs: Vec<IgnorePattern> = vec![];
342 342 let mut warnings: Vec<PatternFileWarning> = vec![];
343 343
344 344 let mut current_syntax =
345 345 default_syntax_override.unwrap_or(b"relre:".as_ref());
346 346
347 347 for (line_number, mut line) in lines.split(|c| *c == b'\n').enumerate() {
348 348 let line_number = line_number + 1;
349 349
350 350 let line_buf;
351 351 if line.contains(&b'#') {
352 352 if let Some(cap) = comment_regex.captures(line) {
353 353 line = &line[..cap.get(1).unwrap().end()]
354 354 }
355 355 line_buf = comment_escape_regex.replace_all(line, NoExpand(b"#"));
356 356 line = &line_buf;
357 357 }
358 358
359 359 let mut line = line.trim_end();
360 360
361 361 if line.is_empty() {
362 362 continue;
363 363 }
364 364
365 365 if let Some(syntax) = line.drop_prefix(b"syntax:") {
366 366 let syntax = syntax.trim();
367 367
368 368 if let Some(rel_syntax) = SYNTAXES.get(syntax) {
369 369 current_syntax = rel_syntax;
370 370 } else if warn {
371 371 warnings.push(PatternFileWarning::InvalidSyntax(
372 372 file_path.to_owned(),
373 373 syntax.to_owned(),
374 374 ));
375 375 }
376 376 continue;
377 377 }
378 378
379 379 let mut line_syntax: &[u8] = &current_syntax;
380 380
381 381 for (s, rels) in SYNTAXES.iter() {
382 382 if let Some(rest) = line.drop_prefix(rels) {
383 383 line_syntax = rels;
384 384 line = rest;
385 385 break;
386 386 }
387 387 if let Some(rest) = line.drop_prefix(&[s, &b":"[..]].concat()) {
388 388 line_syntax = rels;
389 389 line = rest;
390 390 break;
391 391 }
392 392 }
393 393
394 394 inputs.push(IgnorePattern::new(
395 395 parse_pattern_syntax(&line_syntax).map_err(|e| match e {
396 396 PatternError::UnsupportedSyntax(syntax) => {
397 397 PatternError::UnsupportedSyntaxInFile(
398 398 syntax,
399 399 file_path.to_string_lossy().into(),
400 400 line_number,
401 401 )
402 402 }
403 403 _ => e,
404 404 })?,
405 405 &line,
406 406 file_path,
407 407 ));
408 408 }
409 409 Ok((inputs, warnings))
410 410 }
411 411
412 412 pub fn read_pattern_file(
413 413 file_path: &Path,
414 414 warn: bool,
415 inspect_pattern_bytes: &mut impl FnMut(&[u8]),
415 inspect_pattern_bytes: &mut impl FnMut(&Path, &[u8]),
416 416 ) -> Result<(Vec<IgnorePattern>, Vec<PatternFileWarning>), PatternError> {
417 417 match std::fs::read(file_path) {
418 418 Ok(contents) => {
419 inspect_pattern_bytes(&contents);
419 inspect_pattern_bytes(file_path, &contents);
420 420 parse_pattern_file_contents(&contents, file_path, None, warn)
421 421 }
422 422 Err(e) if e.kind() == std::io::ErrorKind::NotFound => Ok((
423 423 vec![],
424 424 vec![PatternFileWarning::NoSuchFile(file_path.to_owned())],
425 425 )),
426 426 Err(e) => Err(e.into()),
427 427 }
428 428 }
429 429
430 430 /// Represents an entry in an "ignore" file.
431 431 #[derive(Debug, Eq, PartialEq, Clone)]
432 432 pub struct IgnorePattern {
433 433 pub syntax: PatternSyntax,
434 434 pub pattern: Vec<u8>,
435 435 pub source: PathBuf,
436 436 }
437 437
438 438 impl IgnorePattern {
439 439 pub fn new(syntax: PatternSyntax, pattern: &[u8], source: &Path) -> Self {
440 440 Self {
441 441 syntax,
442 442 pattern: pattern.to_owned(),
443 443 source: source.to_owned(),
444 444 }
445 445 }
446 446 }
447 447
448 448 pub type PatternResult<T> = Result<T, PatternError>;
449 449
450 450 /// Wrapper for `read_pattern_file` that also recursively expands `include:`
451 451 /// and `subinclude:` patterns.
452 452 ///
453 453 /// The former are expanded in place, while `PatternSyntax::ExpandedSubInclude`
454 454 /// is used for the latter to form a tree of patterns.
455 455 pub fn get_patterns_from_file(
456 456 pattern_file: &Path,
457 457 root_dir: &Path,
458 inspect_pattern_bytes: &mut impl FnMut(&[u8]),
458 inspect_pattern_bytes: &mut impl FnMut(&Path, &[u8]),
459 459 ) -> PatternResult<(Vec<IgnorePattern>, Vec<PatternFileWarning>)> {
460 460 let (patterns, mut warnings) =
461 461 read_pattern_file(pattern_file, true, inspect_pattern_bytes)?;
462 462 let patterns = patterns
463 463 .into_iter()
464 464 .flat_map(|entry| -> PatternResult<_> {
465 465 Ok(match &entry.syntax {
466 466 PatternSyntax::Include => {
467 467 let inner_include =
468 468 root_dir.join(get_path_from_bytes(&entry.pattern));
469 469 let (inner_pats, inner_warnings) = get_patterns_from_file(
470 470 &inner_include,
471 471 root_dir,
472 472 inspect_pattern_bytes,
473 473 )?;
474 474 warnings.extend(inner_warnings);
475 475 inner_pats
476 476 }
477 477 PatternSyntax::SubInclude => {
478 478 let mut sub_include = SubInclude::new(
479 479 &root_dir,
480 480 &entry.pattern,
481 481 &entry.source,
482 482 )?;
483 483 let (inner_patterns, inner_warnings) =
484 484 get_patterns_from_file(
485 485 &sub_include.path,
486 486 &sub_include.root,
487 487 inspect_pattern_bytes,
488 488 )?;
489 489 sub_include.included_patterns = inner_patterns;
490 490 warnings.extend(inner_warnings);
491 491 vec![IgnorePattern {
492 492 syntax: PatternSyntax::ExpandedSubInclude(Box::new(
493 493 sub_include,
494 494 )),
495 495 ..entry
496 496 }]
497 497 }
498 498 _ => vec![entry],
499 499 })
500 500 })
501 501 .flatten()
502 502 .collect();
503 503
504 504 Ok((patterns, warnings))
505 505 }
506 506
507 507 /// Holds all the information needed to handle a `subinclude:` pattern.
508 508 #[derive(Debug, PartialEq, Eq, Clone)]
509 509 pub struct SubInclude {
510 510 /// Will be used for repository (hg) paths that start with this prefix.
511 511 /// It is relative to the current working directory, so comparing against
512 512 /// repository paths is painless.
513 513 pub prefix: HgPathBuf,
514 514 /// The file itself, containing the patterns
515 515 pub path: PathBuf,
516 516 /// Folder in the filesystem where it applies
517 517 pub root: PathBuf,
518 518
519 519 pub included_patterns: Vec<IgnorePattern>,
520 520 }
521 521
522 522 impl SubInclude {
523 523 pub fn new(
524 524 root_dir: &Path,
525 525 pattern: &[u8],
526 526 source: &Path,
527 527 ) -> Result<SubInclude, HgPathError> {
528 528 let normalized_source =
529 529 normalize_path_bytes(&get_bytes_from_path(source));
530 530
531 531 let source_root = get_path_from_bytes(&normalized_source);
532 532 let source_root =
533 533 source_root.parent().unwrap_or_else(|| source_root.deref());
534 534
535 535 let path = source_root.join(get_path_from_bytes(pattern));
536 536 let new_root = path.parent().unwrap_or_else(|| path.deref());
537 537
538 538 let prefix = canonical_path(root_dir, root_dir, new_root)?;
539 539
540 540 Ok(Self {
541 541 prefix: path_to_hg_path_buf(prefix).and_then(|mut p| {
542 542 if !p.is_empty() {
543 543 p.push_byte(b'/');
544 544 }
545 545 Ok(p)
546 546 })?,
547 547 path: path.to_owned(),
548 548 root: new_root.to_owned(),
549 549 included_patterns: Vec::new(),
550 550 })
551 551 }
552 552 }
553 553
554 554 /// Separate and pre-process subincludes from other patterns for the "ignore"
555 555 /// phase.
556 556 pub fn filter_subincludes(
557 557 ignore_patterns: Vec<IgnorePattern>,
558 558 ) -> Result<(Vec<Box<SubInclude>>, Vec<IgnorePattern>), HgPathError> {
559 559 let mut subincludes = vec![];
560 560 let mut others = vec![];
561 561
562 562 for pattern in ignore_patterns {
563 563 if let PatternSyntax::ExpandedSubInclude(sub_include) = pattern.syntax
564 564 {
565 565 subincludes.push(sub_include);
566 566 } else {
567 567 others.push(pattern)
568 568 }
569 569 }
570 570 Ok((subincludes, others))
571 571 }
572 572
573 573 #[cfg(test)]
574 574 mod tests {
575 575 use super::*;
576 576 use pretty_assertions::assert_eq;
577 577
578 578 #[test]
579 579 fn escape_pattern_test() {
580 580 let untouched =
581 581 br#"!"%',/0123456789:;<=>@ABCDEFGHIJKLMNOPQRSTUVWXYZ_`abcdefghijklmnopqrstuvwxyz"#;
582 582 assert_eq!(escape_pattern(untouched), untouched.to_vec());
583 583 // All escape codes
584 584 assert_eq!(
585 585 escape_pattern(br#"()[]{}?*+-|^$\\.&~# \t\n\r\v\f"#),
586 586 br#"\(\)\[\]\{\}\?\*\+\-\|\^\$\\\\\.\&\~\#\ \\t\\n\\r\\v\\f"#
587 587 .to_vec()
588 588 );
589 589 }
590 590
591 591 #[test]
592 592 fn glob_test() {
593 593 assert_eq!(glob_to_re(br#"?"#), br#"."#);
594 594 assert_eq!(glob_to_re(br#"*"#), br#"[^/]*"#);
595 595 assert_eq!(glob_to_re(br#"**"#), br#".*"#);
596 596 assert_eq!(glob_to_re(br#"**/a"#), br#"(?:.*/)?a"#);
597 597 assert_eq!(glob_to_re(br#"a/**/b"#), br#"a/(?:.*/)?b"#);
598 598 assert_eq!(glob_to_re(br#"[a*?!^][^b][!c]"#), br#"[a*?!^][\^b][^c]"#);
599 599 assert_eq!(glob_to_re(br#"{a,b}"#), br#"(?:a|b)"#);
600 600 assert_eq!(glob_to_re(br#".\*\?"#), br#"\.\*\?"#);
601 601 }
602 602
603 603 #[test]
604 604 fn test_parse_pattern_file_contents() {
605 605 let lines = b"syntax: glob\n*.elc";
606 606
607 607 assert_eq!(
608 608 parse_pattern_file_contents(
609 609 lines,
610 610 Path::new("file_path"),
611 611 None,
612 612 false
613 613 )
614 614 .unwrap()
615 615 .0,
616 616 vec![IgnorePattern::new(
617 617 PatternSyntax::RelGlob,
618 618 b"*.elc",
619 619 Path::new("file_path")
620 620 )],
621 621 );
622 622
623 623 let lines = b"syntax: include\nsyntax: glob";
624 624
625 625 assert_eq!(
626 626 parse_pattern_file_contents(
627 627 lines,
628 628 Path::new("file_path"),
629 629 None,
630 630 false
631 631 )
632 632 .unwrap()
633 633 .0,
634 634 vec![]
635 635 );
636 636 let lines = b"glob:**.o";
637 637 assert_eq!(
638 638 parse_pattern_file_contents(
639 639 lines,
640 640 Path::new("file_path"),
641 641 None,
642 642 false
643 643 )
644 644 .unwrap()
645 645 .0,
646 646 vec![IgnorePattern::new(
647 647 PatternSyntax::RelGlob,
648 648 b"**.o",
649 649 Path::new("file_path")
650 650 )]
651 651 );
652 652 }
653 653
654 654 #[test]
655 655 fn test_build_single_regex() {
656 656 assert_eq!(
657 657 build_single_regex(&IgnorePattern::new(
658 658 PatternSyntax::RelGlob,
659 659 b"rust/target/",
660 660 Path::new("")
661 661 ))
662 662 .unwrap(),
663 663 Some(br"(?:.*/)?rust/target(?:/|$)".to_vec()),
664 664 );
665 665 assert_eq!(
666 666 build_single_regex(&IgnorePattern::new(
667 667 PatternSyntax::Regexp,
668 668 br"rust/target/\d+",
669 669 Path::new("")
670 670 ))
671 671 .unwrap(),
672 672 Some(br"rust/target/\d+".to_vec()),
673 673 );
674 674 }
675 675
676 676 #[test]
677 677 fn test_build_single_regex_shortcut() {
678 678 assert_eq!(
679 679 build_single_regex(&IgnorePattern::new(
680 680 PatternSyntax::RootGlob,
681 681 b"",
682 682 Path::new("")
683 683 ))
684 684 .unwrap(),
685 685 None,
686 686 );
687 687 assert_eq!(
688 688 build_single_regex(&IgnorePattern::new(
689 689 PatternSyntax::RootGlob,
690 690 b"whatever",
691 691 Path::new("")
692 692 ))
693 693 .unwrap(),
694 694 None,
695 695 );
696 696 assert_eq!(
697 697 build_single_regex(&IgnorePattern::new(
698 698 PatternSyntax::RootGlob,
699 699 b"*.o",
700 700 Path::new("")
701 701 ))
702 702 .unwrap(),
703 703 Some(br"[^/]*\.o(?:/|$)".to_vec()),
704 704 );
705 705 }
706 706 }
@@ -1,1688 +1,1688
1 1 // matchers.rs
2 2 //
3 3 // Copyright 2019 Raphaël Gomès <rgomes@octobus.net>
4 4 //
5 5 // This software may be used and distributed according to the terms of the
6 6 // GNU General Public License version 2 or any later version.
7 7
8 8 //! Structs and types for matching files and directories.
9 9
10 10 use crate::{
11 11 dirstate::dirs_multiset::DirsChildrenMultiset,
12 12 filepatterns::{
13 13 build_single_regex, filter_subincludes, get_patterns_from_file,
14 14 PatternFileWarning, PatternResult,
15 15 },
16 16 utils::{
17 17 files::find_dirs,
18 18 hg_path::{HgPath, HgPathBuf},
19 19 Escaped,
20 20 },
21 21 DirsMultiset, DirstateMapError, FastHashMap, IgnorePattern, PatternError,
22 22 PatternSyntax,
23 23 };
24 24
25 25 use crate::dirstate::status::IgnoreFnType;
26 26 use crate::filepatterns::normalize_path_bytes;
27 27 use std::borrow::ToOwned;
28 28 use std::collections::HashSet;
29 29 use std::fmt::{Display, Error, Formatter};
30 30 use std::iter::FromIterator;
31 31 use std::ops::Deref;
32 32 use std::path::{Path, PathBuf};
33 33
34 34 use micro_timer::timed;
35 35
36 36 #[derive(Debug, PartialEq)]
37 37 pub enum VisitChildrenSet {
38 38 /// Don't visit anything
39 39 Empty,
40 40 /// Only visit this directory
41 41 This,
42 42 /// Visit this directory and these subdirectories
43 43 /// TODO Should we implement a `NonEmptyHashSet`?
44 44 Set(HashSet<HgPathBuf>),
45 45 /// Visit this directory and all subdirectories
46 46 Recursive,
47 47 }
48 48
49 49 pub trait Matcher: core::fmt::Debug {
50 50 /// Explicitly listed files
51 51 fn file_set(&self) -> Option<&HashSet<HgPathBuf>>;
52 52 /// Returns whether `filename` is in `file_set`
53 53 fn exact_match(&self, filename: &HgPath) -> bool;
54 54 /// Returns whether `filename` is matched by this matcher
55 55 fn matches(&self, filename: &HgPath) -> bool;
56 56 /// Decides whether a directory should be visited based on whether it
57 57 /// has potential matches in it or one of its subdirectories, and
58 58 /// potentially lists which subdirectories of that directory should be
59 59 /// visited. This is based on the match's primary, included, and excluded
60 60 /// patterns.
61 61 ///
62 62 /// # Example
63 63 ///
64 64 /// Assume matchers `['path:foo/bar', 'rootfilesin:qux']`, we would
65 65 /// return the following values (assuming the implementation of
66 66 /// visit_children_set is capable of recognizing this; some implementations
67 67 /// are not).
68 68 ///
69 69 /// ```text
70 70 /// ```ignore
71 71 /// '' -> {'foo', 'qux'}
72 72 /// 'baz' -> set()
73 73 /// 'foo' -> {'bar'}
74 74 /// // Ideally this would be `Recursive`, but since the prefix nature of
75 75 /// // matchers is applied to the entire matcher, we have to downgrade this
76 76 /// // to `This` due to the (yet to be implemented in Rust) non-prefix
77 77 /// // `RootFilesIn'-kind matcher being mixed in.
78 78 /// 'foo/bar' -> 'this'
79 79 /// 'qux' -> 'this'
80 80 /// ```
81 81 /// # Important
82 82 ///
83 83 /// Most matchers do not know if they're representing files or
84 84 /// directories. They see `['path:dir/f']` and don't know whether `f` is a
85 85 /// file or a directory, so `visit_children_set('dir')` for most matchers
86 86 /// will return `HashSet{ HgPath { "f" } }`, but if the matcher knows it's
87 87 /// a file (like the yet to be implemented in Rust `ExactMatcher` does),
88 88 /// it may return `VisitChildrenSet::This`.
89 89 /// Do not rely on the return being a `HashSet` indicating that there are
90 90 /// no files in this dir to investigate (or equivalently that if there are
91 91 /// files to investigate in 'dir' that it will always return
92 92 /// `VisitChildrenSet::This`).
93 93 fn visit_children_set(&self, directory: &HgPath) -> VisitChildrenSet;
94 94 /// Matcher will match everything and `files_set()` will be empty:
95 95 /// optimization might be possible.
96 96 fn matches_everything(&self) -> bool;
97 97 /// Matcher will match exactly the files in `files_set()`: optimization
98 98 /// might be possible.
99 99 fn is_exact(&self) -> bool;
100 100 }
101 101
102 102 /// Matches everything.
103 103 ///```
104 104 /// use hg::{ matchers::{Matcher, AlwaysMatcher}, utils::hg_path::HgPath };
105 105 ///
106 106 /// let matcher = AlwaysMatcher;
107 107 ///
108 108 /// assert_eq!(matcher.matches(HgPath::new(b"whatever")), true);
109 109 /// assert_eq!(matcher.matches(HgPath::new(b"b.txt")), true);
110 110 /// assert_eq!(matcher.matches(HgPath::new(b"main.c")), true);
111 111 /// assert_eq!(matcher.matches(HgPath::new(br"re:.*\.c$")), true);
112 112 /// ```
113 113 #[derive(Debug)]
114 114 pub struct AlwaysMatcher;
115 115
116 116 impl Matcher for AlwaysMatcher {
117 117 fn file_set(&self) -> Option<&HashSet<HgPathBuf>> {
118 118 None
119 119 }
120 120 fn exact_match(&self, _filename: &HgPath) -> bool {
121 121 false
122 122 }
123 123 fn matches(&self, _filename: &HgPath) -> bool {
124 124 true
125 125 }
126 126 fn visit_children_set(&self, _directory: &HgPath) -> VisitChildrenSet {
127 127 VisitChildrenSet::Recursive
128 128 }
129 129 fn matches_everything(&self) -> bool {
130 130 true
131 131 }
132 132 fn is_exact(&self) -> bool {
133 133 false
134 134 }
135 135 }
136 136
137 137 /// Matches nothing.
138 138 #[derive(Debug)]
139 139 pub struct NeverMatcher;
140 140
141 141 impl Matcher for NeverMatcher {
142 142 fn file_set(&self) -> Option<&HashSet<HgPathBuf>> {
143 143 None
144 144 }
145 145 fn exact_match(&self, _filename: &HgPath) -> bool {
146 146 false
147 147 }
148 148 fn matches(&self, _filename: &HgPath) -> bool {
149 149 false
150 150 }
151 151 fn visit_children_set(&self, _directory: &HgPath) -> VisitChildrenSet {
152 152 VisitChildrenSet::Empty
153 153 }
154 154 fn matches_everything(&self) -> bool {
155 155 false
156 156 }
157 157 fn is_exact(&self) -> bool {
158 158 true
159 159 }
160 160 }
161 161
162 162 /// Matches the input files exactly. They are interpreted as paths, not
163 163 /// patterns.
164 164 ///
165 165 ///```
166 166 /// use hg::{ matchers::{Matcher, FileMatcher}, utils::hg_path::{HgPath, HgPathBuf} };
167 167 ///
168 168 /// let files = vec![HgPathBuf::from_bytes(b"a.txt"), HgPathBuf::from_bytes(br"re:.*\.c$")];
169 169 /// let matcher = FileMatcher::new(files).unwrap();
170 170 ///
171 171 /// assert_eq!(matcher.matches(HgPath::new(b"a.txt")), true);
172 172 /// assert_eq!(matcher.matches(HgPath::new(b"b.txt")), false);
173 173 /// assert_eq!(matcher.matches(HgPath::new(b"main.c")), false);
174 174 /// assert_eq!(matcher.matches(HgPath::new(br"re:.*\.c$")), true);
175 175 /// ```
176 176 #[derive(Debug)]
177 177 pub struct FileMatcher {
178 178 files: HashSet<HgPathBuf>,
179 179 dirs: DirsMultiset,
180 180 }
181 181
182 182 impl FileMatcher {
183 183 pub fn new(files: Vec<HgPathBuf>) -> Result<Self, DirstateMapError> {
184 184 let dirs = DirsMultiset::from_manifest(&files)?;
185 185 Ok(Self {
186 186 files: HashSet::from_iter(files.into_iter()),
187 187 dirs,
188 188 })
189 189 }
190 190 fn inner_matches(&self, filename: &HgPath) -> bool {
191 191 self.files.contains(filename.as_ref())
192 192 }
193 193 }
194 194
195 195 impl Matcher for FileMatcher {
196 196 fn file_set(&self) -> Option<&HashSet<HgPathBuf>> {
197 197 Some(&self.files)
198 198 }
199 199 fn exact_match(&self, filename: &HgPath) -> bool {
200 200 self.inner_matches(filename)
201 201 }
202 202 fn matches(&self, filename: &HgPath) -> bool {
203 203 self.inner_matches(filename)
204 204 }
205 205 fn visit_children_set(&self, directory: &HgPath) -> VisitChildrenSet {
206 206 if self.files.is_empty() || !self.dirs.contains(&directory) {
207 207 return VisitChildrenSet::Empty;
208 208 }
209 209 let mut candidates: HashSet<HgPathBuf> =
210 210 self.dirs.iter().cloned().collect();
211 211
212 212 candidates.extend(self.files.iter().cloned());
213 213 candidates.remove(HgPath::new(b""));
214 214
215 215 if !directory.as_ref().is_empty() {
216 216 let directory = [directory.as_ref().as_bytes(), b"/"].concat();
217 217 candidates = candidates
218 218 .iter()
219 219 .filter_map(|c| {
220 220 if c.as_bytes().starts_with(&directory) {
221 221 Some(HgPathBuf::from_bytes(
222 222 &c.as_bytes()[directory.len()..],
223 223 ))
224 224 } else {
225 225 None
226 226 }
227 227 })
228 228 .collect();
229 229 }
230 230
231 231 // `self.dirs` includes all of the directories, recursively, so if
232 232 // we're attempting to match 'foo/bar/baz.txt', it'll have '', 'foo',
233 233 // 'foo/bar' in it. Thus we can safely ignore a candidate that has a
234 234 // '/' in it, indicating it's for a subdir-of-a-subdir; the immediate
235 235 // subdir will be in there without a slash.
236 236 VisitChildrenSet::Set(
237 237 candidates
238 238 .into_iter()
239 239 .filter_map(|c| {
240 240 if c.bytes().all(|b| *b != b'/') {
241 241 Some(c)
242 242 } else {
243 243 None
244 244 }
245 245 })
246 246 .collect(),
247 247 )
248 248 }
249 249 fn matches_everything(&self) -> bool {
250 250 false
251 251 }
252 252 fn is_exact(&self) -> bool {
253 253 true
254 254 }
255 255 }
256 256
257 257 /// Matches files that are included in the ignore rules.
258 258 /// ```
259 259 /// use hg::{
260 260 /// matchers::{IncludeMatcher, Matcher},
261 261 /// IgnorePattern,
262 262 /// PatternSyntax,
263 263 /// utils::hg_path::HgPath
264 264 /// };
265 265 /// use std::path::Path;
266 266 ///
267 267 /// let ignore_patterns =
268 268 /// vec![IgnorePattern::new(PatternSyntax::RootGlob, b"this*", Path::new(""))];
269 269 /// let matcher = IncludeMatcher::new(ignore_patterns).unwrap();
270 270 ///
271 271 /// assert_eq!(matcher.matches(HgPath::new(b"testing")), false);
272 272 /// assert_eq!(matcher.matches(HgPath::new(b"this should work")), true);
273 273 /// assert_eq!(matcher.matches(HgPath::new(b"this also")), true);
/// assert_eq!(matcher.matches(HgPath::new(b"but not this")), false);
/// ```
pub struct IncludeMatcher<'a> {
    patterns: Vec<u8>,
    match_fn: IgnoreFnType<'a>,
    /// Whether all the patterns match a prefix (i.e. recursively)
    prefix: bool,
    roots: HashSet<HgPathBuf>,
    dirs: HashSet<HgPathBuf>,
    parents: HashSet<HgPathBuf>,
}

impl core::fmt::Debug for IncludeMatcher<'_> {
    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
        f.debug_struct("IncludeMatcher")
            .field("patterns", &String::from_utf8_lossy(&self.patterns))
            .field("prefix", &self.prefix)
            .field("roots", &self.roots)
            .field("dirs", &self.dirs)
            .field("parents", &self.parents)
            .finish()
    }
}

impl<'a> Matcher for IncludeMatcher<'a> {
    fn file_set(&self) -> Option<&HashSet<HgPathBuf>> {
        None
    }

    fn exact_match(&self, _filename: &HgPath) -> bool {
        false
    }

    fn matches(&self, filename: &HgPath) -> bool {
        (self.match_fn)(filename.as_ref())
    }

    fn visit_children_set(&self, directory: &HgPath) -> VisitChildrenSet {
        let dir = directory.as_ref();
        if self.prefix && self.roots.contains(dir) {
            return VisitChildrenSet::Recursive;
        }
        if self.roots.contains(HgPath::new(b""))
            || self.roots.contains(dir)
            || self.dirs.contains(dir)
            || find_dirs(dir).any(|parent_dir| self.roots.contains(parent_dir))
        {
            return VisitChildrenSet::This;
        }

        if self.parents.contains(directory.as_ref()) {
            let multiset = self.get_all_parents_children();
            if let Some(children) = multiset.get(dir) {
                return VisitChildrenSet::Set(
                    children.into_iter().map(HgPathBuf::from).collect(),
                );
            }
        }
        VisitChildrenSet::Empty
    }

    fn matches_everything(&self) -> bool {
        false
    }

    fn is_exact(&self) -> bool {
        false
    }
}

/// The union of multiple matchers. Will match if any of the matchers match.
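/// A hedged usage sketch, marked `ignore` because constructing concrete
/// matchers requires crate-internal pattern types; `m1` and `m2` are assumed
/// to already exist and implement `Matcher + Sync`:
///
/// ```ignore
/// let union = UnionMatcher::new(vec![Box::new(m1), Box::new(m2)]);
/// // A path matches the union as soon as any inner matcher accepts it.
/// let hit = union.matches(HgPath::new(b"some/file"));
/// ```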
#[derive(Debug)]
pub struct UnionMatcher {
    matchers: Vec<Box<dyn Matcher + Sync>>,
}

impl Matcher for UnionMatcher {
    fn file_set(&self) -> Option<&HashSet<HgPathBuf>> {
        None
    }

    fn exact_match(&self, _filename: &HgPath) -> bool {
        false
    }

    fn matches(&self, filename: &HgPath) -> bool {
        self.matchers.iter().any(|m| m.matches(filename))
    }

    fn visit_children_set(&self, directory: &HgPath) -> VisitChildrenSet {
        let mut result = HashSet::new();
        let mut this = false;
        for matcher in self.matchers.iter() {
            let visit = matcher.visit_children_set(directory);
            match visit {
                VisitChildrenSet::Empty => continue,
                VisitChildrenSet::This => {
                    this = true;
                    // Don't break, we might have an 'all' in here.
                    continue;
                }
                VisitChildrenSet::Set(set) => {
                    result.extend(set);
                }
                VisitChildrenSet::Recursive => {
                    return visit;
                }
            }
        }
        if this {
            return VisitChildrenSet::This;
        }
        if result.is_empty() {
            VisitChildrenSet::Empty
        } else {
            VisitChildrenSet::Set(result)
        }
    }

    fn matches_everything(&self) -> bool {
        // TODO Maybe if all are AlwaysMatcher?
        false
    }

    fn is_exact(&self) -> bool {
        false
    }
}

impl UnionMatcher {
    pub fn new(matchers: Vec<Box<dyn Matcher + Sync>>) -> Self {
        Self { matchers }
    }
}

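/// The intersection of two matchers: matches only the paths that both inner
/// matchers accept. A hedged sketch (`ignore`d for the same reason as above;
/// `m1` and `m2` are assumed given):
///
/// ```ignore
/// let both = IntersectionMatcher::new(Box::new(m1), Box::new(m2));
/// // Matches iff m1.matches(path) && m2.matches(path).
/// ```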
#[derive(Debug)]
pub struct IntersectionMatcher {
    m1: Box<dyn Matcher + Sync>,
    m2: Box<dyn Matcher + Sync>,
    files: Option<HashSet<HgPathBuf>>,
}

impl Matcher for IntersectionMatcher {
    fn file_set(&self) -> Option<&HashSet<HgPathBuf>> {
        self.files.as_ref()
    }

    fn exact_match(&self, filename: &HgPath) -> bool {
        self.files.as_ref().map_or(false, |f| f.contains(filename))
    }

    fn matches(&self, filename: &HgPath) -> bool {
        self.m1.matches(filename) && self.m2.matches(filename)
    }

    fn visit_children_set(&self, directory: &HgPath) -> VisitChildrenSet {
        let m1_set = self.m1.visit_children_set(directory);
        if m1_set == VisitChildrenSet::Empty {
            return VisitChildrenSet::Empty;
        }
        let m2_set = self.m2.visit_children_set(directory);
        if m2_set == VisitChildrenSet::Empty {
            return VisitChildrenSet::Empty;
        }

        if m1_set == VisitChildrenSet::Recursive {
            return m2_set;
        } else if m2_set == VisitChildrenSet::Recursive {
            return m1_set;
        }

        // `Empty` and `Recursive` have been handled above, so only `This`
        // and `Set(...)` remain at this point.
        match (&m1_set, &m2_set) {
            (VisitChildrenSet::This, _) | (_, VisitChildrenSet::This) => {
                VisitChildrenSet::This
            }
            (VisitChildrenSet::Set(m1), VisitChildrenSet::Set(m2)) => {
                let set: HashSet<_> = m1.intersection(m2).cloned().collect();
                if set.is_empty() {
                    VisitChildrenSet::Empty
                } else {
                    VisitChildrenSet::Set(set)
                }
            }
            _ => unreachable!(),
        }
    }

    fn matches_everything(&self) -> bool {
        self.m1.matches_everything() && self.m2.matches_everything()
    }

    fn is_exact(&self) -> bool {
        self.m1.is_exact() || self.m2.is_exact()
    }
}

impl IntersectionMatcher {
    pub fn new(
        mut m1: Box<dyn Matcher + Sync>,
        mut m2: Box<dyn Matcher + Sync>,
    ) -> Self {
        let files = if m1.is_exact() || m2.is_exact() {
            if !m1.is_exact() {
                std::mem::swap(&mut m1, &mut m2);
            }
            m1.file_set().map(|m1_files| {
                m1_files.iter().cloned().filter(|f| m2.matches(f)).collect()
            })
        } else {
            None
        };
        Self { m1, m2, files }
    }
}

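/// The difference of two matchers: matches the paths accepted by `base` and
/// rejected by `excluded`. A hedged sketch (`ignore`d; matchers assumed
/// given):
///
/// ```ignore
/// let diff = DifferenceMatcher::new(Box::new(base), Box::new(excluded));
/// // Matches iff base.matches(path) && !excluded.matches(path).
/// ```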
#[derive(Debug)]
pub struct DifferenceMatcher {
    base: Box<dyn Matcher + Sync>,
    excluded: Box<dyn Matcher + Sync>,
    files: Option<HashSet<HgPathBuf>>,
}

impl Matcher for DifferenceMatcher {
    fn file_set(&self) -> Option<&HashSet<HgPathBuf>> {
        self.files.as_ref()
    }

    fn exact_match(&self, filename: &HgPath) -> bool {
        self.files.as_ref().map_or(false, |f| f.contains(filename))
    }

    fn matches(&self, filename: &HgPath) -> bool {
        self.base.matches(filename) && !self.excluded.matches(filename)
    }

    fn visit_children_set(&self, directory: &HgPath) -> VisitChildrenSet {
        let excluded_set = self.excluded.visit_children_set(directory);
        if excluded_set == VisitChildrenSet::Recursive {
            return VisitChildrenSet::Empty;
        }
        let base_set = self.base.visit_children_set(directory);
        // Possible values for base: 'recursive', 'this', set(...), set()
        // Possible values for excluded: 'this', set(...), set()
        // If excluded has nothing under here that we care about, return base,
        // even if it's 'recursive'.
        if excluded_set == VisitChildrenSet::Empty {
            return base_set;
        }
        match base_set {
            VisitChildrenSet::This | VisitChildrenSet::Recursive => {
                // Never return 'recursive' here if excluded_set is any kind of
                // non-empty (either 'this' or set(foo)), since excluded might
                // return set() for a subdirectory.
                VisitChildrenSet::This
            }
            set => {
                // Possible values for base: set(...), set()
                // Possible values for excluded: 'this', set(...)
                // We ignore excluded set results. They're possibly incorrect:
                //  base = path:dir/subdir
                //  excluded=rootfilesin:dir,
                //  visit_children_set(''):
                //   base returns {'dir'}, excluded returns {'dir'}, if we
                //   subtracted we'd return set(), which is *not* correct, we
                //   still need to visit 'dir'!
                set
            }
        }
    }

    fn matches_everything(&self) -> bool {
        false
    }

    fn is_exact(&self) -> bool {
        self.base.is_exact()
    }
}

impl DifferenceMatcher {
    pub fn new(
        base: Box<dyn Matcher + Sync>,
        excluded: Box<dyn Matcher + Sync>,
    ) -> Self {
        let base_is_exact = base.is_exact();
        let base_files = base.file_set().map(ToOwned::to_owned);
        let mut new = Self {
            base,
            excluded,
            files: None,
        };
        if base_is_exact {
            new.files = base_files.map(|files| {
                files.iter().cloned().filter(|f| new.matches(f)).collect()
            });
        }
        new
    }
}

/// Returns a function that matches an `HgPath` against the given regex
/// pattern.
///
/// This can fail when the pattern is invalid or not supported by the
/// underlying engine (the `regex` crate), for instance anything with
/// back-references.
#[timed]
fn re_matcher(
    pattern: &[u8],
) -> PatternResult<impl Fn(&HgPath) -> bool + Sync> {
    use std::io::Write;

    // The `regex` crate adds `.*` to the start and end of expressions if there
    // are no anchors, so add the start anchor.
    let mut escaped_bytes = vec![b'^', b'(', b'?', b':'];
    for byte in pattern {
        if *byte > 127 {
            write!(escaped_bytes, "\\x{:x}", *byte).unwrap();
        } else {
            escaped_bytes.push(*byte);
        }
    }
    escaped_bytes.push(b')');

    // Avoid the cost of UTF8 checking
    //
    // # Safety
    // This is safe because we escaped all non-ASCII bytes.
    let pattern_string = unsafe { String::from_utf8_unchecked(escaped_bytes) };
    let re = regex::bytes::RegexBuilder::new(&pattern_string)
        .unicode(false)
        // Big repos with big `.hgignore` will hit the default limit and
        // incur a significant performance hit. One repo's `hg status` hit
        // multiple *minutes*.
        .dfa_size_limit(50 * (1 << 20))
        .build()
        .map_err(|e| PatternError::UnsupportedSyntax(e.to_string()))?;

    Ok(move |path: &HgPath| re.is_match(path.as_bytes()))
}

/// Returns the regex pattern and a function that matches an `HgPath` against
/// said regex formed by the given ignore patterns.
fn build_regex_match<'a, 'b>(
    ignore_patterns: &'a [IgnorePattern],
) -> PatternResult<(Vec<u8>, IgnoreFnType<'b>)> {
    let mut regexps = vec![];
    let mut exact_set = HashSet::new();

    for pattern in ignore_patterns {
        if let Some(re) = build_single_regex(pattern)? {
            regexps.push(re);
        } else {
            let exact = normalize_path_bytes(&pattern.pattern);
            exact_set.insert(HgPathBuf::from_bytes(&exact));
        }
    }

    let full_regex = regexps.join(&b'|');

    // An empty pattern would cause the regex engine to incorrectly match the
    // (empty) root directory
    let func = if !regexps.is_empty() {
        let matcher = re_matcher(&full_regex)?;
        let func = move |filename: &HgPath| {
            exact_set.contains(filename) || matcher(filename)
        };
        Box::new(func) as IgnoreFnType
    } else {
        let func = move |filename: &HgPath| exact_set.contains(filename);
        Box::new(func) as IgnoreFnType
    };

    Ok((full_regex, func))
}

/// Returns roots and directories corresponding to each pattern.
///
/// This calculates the roots and directories exactly matching the patterns and
/// returns a tuple of (roots, dirs). It does not return other directories
/// which may also need to be considered, like the parent directories.
fn roots_and_dirs(
    ignore_patterns: &[IgnorePattern],
) -> (Vec<HgPathBuf>, Vec<HgPathBuf>) {
    let mut roots = Vec::new();
    let mut dirs = Vec::new();

    for ignore_pattern in ignore_patterns {
        let IgnorePattern {
            syntax, pattern, ..
        } = ignore_pattern;
        match syntax {
            PatternSyntax::RootGlob | PatternSyntax::Glob => {
                let mut root = HgPathBuf::new();
                for p in pattern.split(|c| *c == b'/') {
                    if p.iter().any(|c| match *c {
                        b'[' | b'{' | b'*' | b'?' => true,
                        _ => false,
                    }) {
                        break;
                    }
                    root.push(HgPathBuf::from_bytes(p).as_ref());
                }
                roots.push(root);
            }
            PatternSyntax::Path | PatternSyntax::RelPath => {
                let pat = HgPath::new(if pattern == b"." {
                    &[] as &[u8]
                } else {
                    pattern
                });
                roots.push(pat.to_owned());
            }
            PatternSyntax::RootFiles => {
                let pat = if pattern == b"." {
                    &[] as &[u8]
                } else {
                    pattern
                };
                dirs.push(HgPathBuf::from_bytes(pat));
            }
            _ => {
                roots.push(HgPathBuf::new());
            }
        }
    }
    (roots, dirs)
}

/// Paths extracted from patterns
#[derive(Debug, PartialEq)]
struct RootsDirsAndParents {
    /// Directories to match recursively
    pub roots: HashSet<HgPathBuf>,
    /// Directories to match non-recursively
    pub dirs: HashSet<HgPathBuf>,
    /// Implicitly required directories to go to items in either roots or dirs
    pub parents: HashSet<HgPathBuf>,
}

/// Extract roots, dirs and parents from patterns.
fn roots_dirs_and_parents(
    ignore_patterns: &[IgnorePattern],
) -> PatternResult<RootsDirsAndParents> {
    let (roots, dirs) = roots_and_dirs(ignore_patterns);

    let mut parents = HashSet::new();

    parents.extend(
        DirsMultiset::from_manifest(&dirs)
            .map_err(|e| match e {
                DirstateMapError::InvalidPath(e) => e,
                _ => unreachable!(),
            })?
            .iter()
            .map(ToOwned::to_owned),
    );
    parents.extend(
        DirsMultiset::from_manifest(&roots)
            .map_err(|e| match e {
                DirstateMapError::InvalidPath(e) => e,
                _ => unreachable!(),
            })?
            .iter()
            .map(ToOwned::to_owned),
    );

    Ok(RootsDirsAndParents {
        roots: HashSet::from_iter(roots),
        dirs: HashSet::from_iter(dirs),
        parents,
    })
}

/// Returns a function that checks whether a given file (in the general sense)
/// should be matched.
fn build_match<'a, 'b>(
    ignore_patterns: Vec<IgnorePattern>,
) -> PatternResult<(Vec<u8>, IgnoreFnType<'b>)> {
    let mut match_funcs: Vec<IgnoreFnType<'b>> = vec![];
    // For debugging and printing
    let mut patterns = vec![];

    let (subincludes, ignore_patterns) = filter_subincludes(ignore_patterns)?;

    if !subincludes.is_empty() {
        // Build prefix-based matcher functions for subincludes
        let mut submatchers = FastHashMap::default();
        let mut prefixes = vec![];

        for sub_include in subincludes {
            let matcher = IncludeMatcher::new(sub_include.included_patterns)?;
            let match_fn =
                Box::new(move |path: &HgPath| matcher.matches(path));
            prefixes.push(sub_include.prefix.clone());
            submatchers.insert(sub_include.prefix.clone(), match_fn);
        }

        let match_subinclude = move |filename: &HgPath| {
            for prefix in prefixes.iter() {
                if let Some(rel) = filename.relative_to(prefix) {
                    if (submatchers[prefix])(rel) {
                        return true;
                    }
                }
            }
            false
        };

        match_funcs.push(Box::new(match_subinclude));
    }

    if !ignore_patterns.is_empty() {
        // Either do dumb matching if all patterns are rootfiles, or match
        // with a regex.
        if ignore_patterns
            .iter()
            .all(|k| k.syntax == PatternSyntax::RootFiles)
        {
            let dirs: HashSet<_> = ignore_patterns
                .iter()
                .map(|k| k.pattern.to_owned())
                .collect();
            let mut dirs_vec: Vec<_> = dirs.iter().cloned().collect();

            let match_func = move |path: &HgPath| -> bool {
                let path = path.as_bytes();
                // Use the *index* of the last slash (`rposition`), not the
                // matched byte itself, to slice off the parent directory.
                let i = path.iter().rposition(|a| *a == b'/');
                let dir = if let Some(i) = i {
                    &path[..i]
                } else {
                    b"."
                };
                dirs.contains(dir.deref())
            };
            match_funcs.push(Box::new(match_func));

            patterns.extend(b"rootfilesin: ");
            dirs_vec.sort();
            patterns.extend(dirs_vec.escaped_bytes());
        } else {
            let (new_re, match_func) = build_regex_match(&ignore_patterns)?;
            patterns = new_re;
            match_funcs.push(match_func)
        }
    }

    Ok(if match_funcs.len() == 1 {
        (patterns, match_funcs.remove(0))
    } else {
        (
            patterns,
            Box::new(move |f: &HgPath| -> bool {
                match_funcs.iter().any(|match_func| match_func(f))
            }),
        )
    })
}

/// Parses all "ignore" files with their recursive includes and returns a
/// function that checks whether a given file (in the general sense) should be
/// ignored.
pub fn get_ignore_matcher<'a>(
    mut all_pattern_files: Vec<PathBuf>,
    root_dir: &Path,
    inspect_pattern_bytes: &mut impl FnMut(&Path, &[u8]),
) -> PatternResult<(IncludeMatcher<'a>, Vec<PatternFileWarning>)> {
    let mut all_patterns = vec![];
    let mut all_warnings = vec![];

    // Sort to make the ordering of calls to `inspect_pattern_bytes`
    // deterministic even if the ordering of `all_pattern_files` is not (such
    // as when the iteration order of a Python dict or Rust HashMap is
    // involved).
    // Sort by "string" representation instead of the default by component
    // (with a Rust-specific definition of a component)
    all_pattern_files
        .sort_unstable_by(|a, b| a.as_os_str().cmp(b.as_os_str()));

    for pattern_file in &all_pattern_files {
        let (patterns, warnings) = get_patterns_from_file(
            pattern_file,
            root_dir,
            inspect_pattern_bytes,
        )?;

        all_patterns.extend(patterns.to_owned());
        all_warnings.extend(warnings);
    }
    let matcher = IncludeMatcher::new(all_patterns)?;
    Ok((matcher, all_warnings))
}

/// Parses all "ignore" files with their recursive includes and returns a
/// function that checks whether a given file (in the general sense) should be
/// ignored.
pub fn get_ignore_function<'a>(
    all_pattern_files: Vec<PathBuf>,
    root_dir: &Path,
    inspect_pattern_bytes: &mut impl FnMut(&Path, &[u8]),
) -> PatternResult<(IgnoreFnType<'a>, Vec<PatternFileWarning>)> {
    let res =
        get_ignore_matcher(all_pattern_files, root_dir, inspect_pattern_bytes);
    res.map(|(matcher, all_warnings)| {
        let res: IgnoreFnType<'a> =
            Box::new(move |path: &HgPath| matcher.matches(path));

        (res, all_warnings)
    })
}

impl<'a> IncludeMatcher<'a> {
    pub fn new(ignore_patterns: Vec<IgnorePattern>) -> PatternResult<Self> {
        let RootsDirsAndParents {
            roots,
            dirs,
            parents,
        } = roots_dirs_and_parents(&ignore_patterns)?;
        let prefix = ignore_patterns.iter().all(|k| match k.syntax {
            PatternSyntax::Path | PatternSyntax::RelPath => true,
            _ => false,
        });
        let (patterns, match_fn) = build_match(ignore_patterns)?;

        Ok(Self {
            patterns,
            match_fn,
            prefix,
            roots,
            dirs,
            parents,
        })
    }

    fn get_all_parents_children(&self) -> DirsChildrenMultiset {
        // TODO cache
        let thing = self
            .dirs
            .iter()
            .chain(self.roots.iter())
            .chain(self.parents.iter());
        DirsChildrenMultiset::new(thing, Some(&self.parents))
    }

    pub fn debug_get_patterns(&self) -> &[u8] {
        self.patterns.as_ref()
    }
}

impl<'a> Display for IncludeMatcher<'a> {
    fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error> {
        // XXX What about exact matches?
        // I'm not sure it's worth it to clone the HashSet and keep it
        // around just in case someone wants to display the matcher, plus
        // it's going to be unreadable after a few entries, but we need to
        // inform in this display that exact matches are being used and are
        // (on purpose) missing from the `includes`.
        write!(
            f,
            "IncludeMatcher(includes='{}')",
            String::from_utf8_lossy(&self.patterns.escaped_bytes())
        )
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use pretty_assertions::assert_eq;
    use std::path::Path;

    #[test]
    fn test_roots_and_dirs() {
        let pats = vec![
            IgnorePattern::new(PatternSyntax::Glob, b"g/h/*", Path::new("")),
            IgnorePattern::new(PatternSyntax::Glob, b"g/h", Path::new("")),
            IgnorePattern::new(PatternSyntax::Glob, b"g*", Path::new("")),
        ];
        let (roots, dirs) = roots_and_dirs(&pats);

        assert_eq!(
            roots,
            vec!(
                HgPathBuf::from_bytes(b"g/h"),
                HgPathBuf::from_bytes(b"g/h"),
                HgPathBuf::new()
            ),
        );
        assert_eq!(dirs, vec!());
    }

    #[test]
    fn test_roots_dirs_and_parents() {
        let pats = vec![
            IgnorePattern::new(PatternSyntax::Glob, b"g/h/*", Path::new("")),
            IgnorePattern::new(PatternSyntax::Glob, b"g/h", Path::new("")),
            IgnorePattern::new(PatternSyntax::Glob, b"g*", Path::new("")),
        ];

        let mut roots = HashSet::new();
        roots.insert(HgPathBuf::from_bytes(b"g/h"));
        roots.insert(HgPathBuf::new());

        let dirs = HashSet::new();

        let mut parents = HashSet::new();
        parents.insert(HgPathBuf::new());
        parents.insert(HgPathBuf::from_bytes(b"g"));

        assert_eq!(
            roots_dirs_and_parents(&pats).unwrap(),
            RootsDirsAndParents {
                roots,
                dirs,
                parents
            }
        );
    }

    #[test]
    fn test_filematcher_visit_children_set() {
        // Visitchildrenset
        let files = vec![HgPathBuf::from_bytes(b"dir/subdir/foo.txt")];
        let matcher = FileMatcher::new(files).unwrap();

        let mut set = HashSet::new();
        set.insert(HgPathBuf::from_bytes(b"dir"));
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"")),
            VisitChildrenSet::Set(set)
        );

        let mut set = HashSet::new();
        set.insert(HgPathBuf::from_bytes(b"subdir"));
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir")),
            VisitChildrenSet::Set(set)
        );

        let mut set = HashSet::new();
        set.insert(HgPathBuf::from_bytes(b"foo.txt"));
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir/subdir")),
            VisitChildrenSet::Set(set)
        );

        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir/subdir/x")),
            VisitChildrenSet::Empty
        );
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir/subdir/foo.txt")),
            VisitChildrenSet::Empty
        );
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"folder")),
            VisitChildrenSet::Empty
        );
    }

    #[test]
    fn test_filematcher_visit_children_set_files_and_dirs() {
        let files = vec![
            HgPathBuf::from_bytes(b"rootfile.txt"),
            HgPathBuf::from_bytes(b"a/file1.txt"),
            HgPathBuf::from_bytes(b"a/b/file2.txt"),
            // No file in a/b/c
            HgPathBuf::from_bytes(b"a/b/c/d/file4.txt"),
        ];
        let matcher = FileMatcher::new(files).unwrap();

        let mut set = HashSet::new();
        set.insert(HgPathBuf::from_bytes(b"a"));
        set.insert(HgPathBuf::from_bytes(b"rootfile.txt"));
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"")),
            VisitChildrenSet::Set(set)
        );

        let mut set = HashSet::new();
        set.insert(HgPathBuf::from_bytes(b"b"));
        set.insert(HgPathBuf::from_bytes(b"file1.txt"));
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"a")),
            VisitChildrenSet::Set(set)
        );

        let mut set = HashSet::new();
        set.insert(HgPathBuf::from_bytes(b"c"));
        set.insert(HgPathBuf::from_bytes(b"file2.txt"));
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"a/b")),
            VisitChildrenSet::Set(set)
        );

        let mut set = HashSet::new();
        set.insert(HgPathBuf::from_bytes(b"d"));
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"a/b/c")),
            VisitChildrenSet::Set(set)
        );
        let mut set = HashSet::new();
        set.insert(HgPathBuf::from_bytes(b"file4.txt"));
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"a/b/c/d")),
            VisitChildrenSet::Set(set)
        );

        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"a/b/c/d/e")),
            VisitChildrenSet::Empty
        );
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"folder")),
            VisitChildrenSet::Empty
        );
    }

    #[test]
    fn test_includematcher() {
        // VisitchildrensetPrefix
        let matcher = IncludeMatcher::new(vec![IgnorePattern::new(
            PatternSyntax::RelPath,
            b"dir/subdir",
            Path::new(""),
        )])
        .unwrap();

        let mut set = HashSet::new();
        set.insert(HgPathBuf::from_bytes(b"dir"));
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"")),
            VisitChildrenSet::Set(set)
        );

        let mut set = HashSet::new();
        set.insert(HgPathBuf::from_bytes(b"subdir"));
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir")),
            VisitChildrenSet::Set(set)
        );
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir/subdir")),
            VisitChildrenSet::Recursive
        );
        // OPT: This should probably be 'all' if its parent is?
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir/subdir/x")),
            VisitChildrenSet::This
        );
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"folder")),
            VisitChildrenSet::Empty
        );

        // VisitchildrensetRootfilesin
        let matcher = IncludeMatcher::new(vec![IgnorePattern::new(
            PatternSyntax::RootFiles,
            b"dir/subdir",
            Path::new(""),
        )])
        .unwrap();

        let mut set = HashSet::new();
        set.insert(HgPathBuf::from_bytes(b"dir"));
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"")),
            VisitChildrenSet::Set(set)
        );

        let mut set = HashSet::new();
        set.insert(HgPathBuf::from_bytes(b"subdir"));
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir")),
            VisitChildrenSet::Set(set)
        );

        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir/subdir")),
            VisitChildrenSet::This
        );
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir/subdir/x")),
            VisitChildrenSet::Empty
        );
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"folder")),
            VisitChildrenSet::Empty
        );

        // VisitchildrensetGlob
        let matcher = IncludeMatcher::new(vec![IgnorePattern::new(
            PatternSyntax::Glob,
            b"dir/z*",
            Path::new(""),
        )])
        .unwrap();

        let mut set = HashSet::new();
        set.insert(HgPathBuf::from_bytes(b"dir"));
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"")),
            VisitChildrenSet::Set(set)
        );
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"folder")),
            VisitChildrenSet::Empty
        );
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir")),
            VisitChildrenSet::This
        );
        // OPT: these should probably be set().
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir/subdir")),
            VisitChildrenSet::This
        );
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir/subdir/x")),
            VisitChildrenSet::This
        );

        // Test multiple patterns
        let matcher = IncludeMatcher::new(vec![
            IgnorePattern::new(PatternSyntax::RelPath, b"foo", Path::new("")),
            IgnorePattern::new(PatternSyntax::Glob, b"g*", Path::new("")),
        ])
        .unwrap();

        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"")),
            VisitChildrenSet::This
        );

        // Test multiple patterns
        let matcher = IncludeMatcher::new(vec![IgnorePattern::new(
            PatternSyntax::Glob,
            b"**/*.exe",
            Path::new(""),
        )])
        .unwrap();

        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"")),
            VisitChildrenSet::This
        );
    }

    #[test]
    fn test_unionmatcher() {
        // Path + Rootfiles
        let m1 = IncludeMatcher::new(vec![IgnorePattern::new(
            PatternSyntax::RelPath,
            b"dir/subdir",
            Path::new(""),
        )])
        .unwrap();
        let m2 = IncludeMatcher::new(vec![IgnorePattern::new(
            PatternSyntax::RootFiles,
            b"dir",
            Path::new(""),
        )])
        .unwrap();
        let matcher = UnionMatcher::new(vec![Box::new(m1), Box::new(m2)]);

        let mut set = HashSet::new();
        set.insert(HgPathBuf::from_bytes(b"dir"));
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"")),
            VisitChildrenSet::Set(set)
        );
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir")),
            VisitChildrenSet::This
        );
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir/subdir")),
            VisitChildrenSet::Recursive
        );
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir/foo")),
            VisitChildrenSet::Empty
        );
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"folder")),
            VisitChildrenSet::Empty
        );
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"folder")),
            VisitChildrenSet::Empty
        );

        // OPT: These next two could be 'all' instead of 'this'.
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir/subdir/z")),
            VisitChildrenSet::This
        );
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir/subdir/x")),
            VisitChildrenSet::This
        );

        // Path + unrelated Path
        let m1 = IncludeMatcher::new(vec![IgnorePattern::new(
            PatternSyntax::RelPath,
            b"dir/subdir",
            Path::new(""),
        )])
        .unwrap();
        let m2 = IncludeMatcher::new(vec![IgnorePattern::new(
            PatternSyntax::RelPath,
            b"folder",
            Path::new(""),
        )])
        .unwrap();
        let matcher = UnionMatcher::new(vec![Box::new(m1), Box::new(m2)]);

        let mut set = HashSet::new();
        set.insert(HgPathBuf::from_bytes(b"folder"));
        set.insert(HgPathBuf::from_bytes(b"dir"));
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"")),
            VisitChildrenSet::Set(set)
        );
        let mut set = HashSet::new();
        set.insert(HgPathBuf::from_bytes(b"subdir"));
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir")),
            VisitChildrenSet::Set(set)
        );

        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir/subdir")),
            VisitChildrenSet::Recursive
        );
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir/foo")),
            VisitChildrenSet::Empty
        );

        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"folder")),
            VisitChildrenSet::Recursive
        );
        // OPT: These next two could be 'all' instead of 'this'.
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir/subdir/z")),
            VisitChildrenSet::This
        );
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir/subdir/x")),
            VisitChildrenSet::This
        );

        // Path + subpath
        let m1 = IncludeMatcher::new(vec![IgnorePattern::new(
            PatternSyntax::RelPath,
            b"dir/subdir/x",
            Path::new(""),
        )])
        .unwrap();
        let m2 = IncludeMatcher::new(vec![IgnorePattern::new(
            PatternSyntax::RelPath,
            b"dir/subdir",
            Path::new(""),
        )])
        .unwrap();
        let matcher = UnionMatcher::new(vec![Box::new(m1), Box::new(m2)]);

        let mut set = HashSet::new();
        set.insert(HgPathBuf::from_bytes(b"dir"));
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"")),
            VisitChildrenSet::Set(set)
        );
        let mut set = HashSet::new();
        set.insert(HgPathBuf::from_bytes(b"subdir"));
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir")),
            VisitChildrenSet::Set(set)
        );

        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir/subdir")),
            VisitChildrenSet::Recursive
        );
        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"dir/foo")),
            VisitChildrenSet::Empty
        );

        assert_eq!(
            matcher.visit_children_set(HgPath::new(b"folder")),
            VisitChildrenSet::Empty
1369 1369 );
1370 1370 assert_eq!(
1371 1371 matcher.visit_children_set(HgPath::new(b"dir/subdir/x")),
1372 1372 VisitChildrenSet::Recursive
1373 1373 );
1374 1374 // OPT: this should probably be 'all' not 'this'.
1375 1375 assert_eq!(
1376 1376 matcher.visit_children_set(HgPath::new(b"dir/subdir/z")),
1377 1377 VisitChildrenSet::This
1378 1378 );
1379 1379 }
1380 1380
1381 1381 #[test]
1382 1382 fn test_intersectionmatcher() {
1383 1383 // Include path + Include rootfiles
1384 1384 let m1 = Box::new(
1385 1385 IncludeMatcher::new(vec![IgnorePattern::new(
1386 1386 PatternSyntax::RelPath,
1387 1387 b"dir/subdir",
1388 1388 Path::new(""),
1389 1389 )])
1390 1390 .unwrap(),
1391 1391 );
1392 1392 let m2 = Box::new(
1393 1393 IncludeMatcher::new(vec![IgnorePattern::new(
1394 1394 PatternSyntax::RootFiles,
1395 1395 b"dir",
1396 1396 Path::new(""),
1397 1397 )])
1398 1398 .unwrap(),
1399 1399 );
1400 1400 let matcher = IntersectionMatcher::new(m1, m2);
1401 1401
1402 1402 let mut set = HashSet::new();
1403 1403 set.insert(HgPathBuf::from_bytes(b"dir"));
1404 1404 assert_eq!(
1405 1405 matcher.visit_children_set(HgPath::new(b"")),
1406 1406 VisitChildrenSet::Set(set)
1407 1407 );
1408 1408 assert_eq!(
1409 1409 matcher.visit_children_set(HgPath::new(b"dir")),
1410 1410 VisitChildrenSet::This
1411 1411 );
1412 1412 assert_eq!(
1413 1413 matcher.visit_children_set(HgPath::new(b"dir/subdir")),
1414 1414 VisitChildrenSet::Empty
1415 1415 );
1416 1416 assert_eq!(
1417 1417 matcher.visit_children_set(HgPath::new(b"dir/foo")),
1418 1418 VisitChildrenSet::Empty
1419 1419 );
1420 1420 assert_eq!(
1421 1421 matcher.visit_children_set(HgPath::new(b"folder")),
1422 1422 VisitChildrenSet::Empty
1423 1423 );
1424 1424 assert_eq!(
1425 1425 matcher.visit_children_set(HgPath::new(b"dir/subdir/z")),
1426 1426 VisitChildrenSet::Empty
1427 1427 );
1428 1428 assert_eq!(
1429 1429 matcher.visit_children_set(HgPath::new(b"dir/subdir/x")),
1430 1430 VisitChildrenSet::Empty
1431 1431 );
1432 1432
1433 1433 // Non-intersecting paths
1434 1434 let m1 = Box::new(
1435 1435 IncludeMatcher::new(vec![IgnorePattern::new(
1436 1436 PatternSyntax::RelPath,
1437 1437 b"dir/subdir",
1438 1438 Path::new(""),
1439 1439 )])
1440 1440 .unwrap(),
1441 1441 );
1442 1442 let m2 = Box::new(
1443 1443 IncludeMatcher::new(vec![IgnorePattern::new(
1444 1444 PatternSyntax::RelPath,
1445 1445 b"folder",
1446 1446 Path::new(""),
1447 1447 )])
1448 1448 .unwrap(),
1449 1449 );
1450 1450 let matcher = IntersectionMatcher::new(m1, m2);
1451 1451
1452 1452 assert_eq!(
1453 1453 matcher.visit_children_set(HgPath::new(b"")),
1454 1454 VisitChildrenSet::Empty
1455 1455 );
1456 1456 assert_eq!(
1457 1457 matcher.visit_children_set(HgPath::new(b"dir")),
1458 1458 VisitChildrenSet::Empty
1459 1459 );
1460 1460 assert_eq!(
1461 1461 matcher.visit_children_set(HgPath::new(b"dir/subdir")),
1462 1462 VisitChildrenSet::Empty
1463 1463 );
1464 1464 assert_eq!(
1465 1465 matcher.visit_children_set(HgPath::new(b"dir/foo")),
1466 1466 VisitChildrenSet::Empty
1467 1467 );
1468 1468 assert_eq!(
1469 1469 matcher.visit_children_set(HgPath::new(b"folder")),
1470 1470 VisitChildrenSet::Empty
1471 1471 );
1472 1472 assert_eq!(
1473 1473 matcher.visit_children_set(HgPath::new(b"dir/subdir/z")),
1474 1474 VisitChildrenSet::Empty
1475 1475 );
1476 1476 assert_eq!(
1477 1477 matcher.visit_children_set(HgPath::new(b"dir/subdir/x")),
1478 1478 VisitChildrenSet::Empty
1479 1479 );
1480 1480
1481 1481 // Nested paths
1482 1482 let m1 = Box::new(
1483 1483 IncludeMatcher::new(vec![IgnorePattern::new(
1484 1484 PatternSyntax::RelPath,
1485 1485 b"dir/subdir/x",
1486 1486 Path::new(""),
1487 1487 )])
1488 1488 .unwrap(),
1489 1489 );
1490 1490 let m2 = Box::new(
1491 1491 IncludeMatcher::new(vec![IgnorePattern::new(
1492 1492 PatternSyntax::RelPath,
1493 1493 b"dir/subdir",
1494 1494 Path::new(""),
1495 1495 )])
1496 1496 .unwrap(),
1497 1497 );
1498 1498 let matcher = IntersectionMatcher::new(m1, m2);
1499 1499
1500 1500 let mut set = HashSet::new();
1501 1501 set.insert(HgPathBuf::from_bytes(b"dir"));
1502 1502 assert_eq!(
1503 1503 matcher.visit_children_set(HgPath::new(b"")),
1504 1504 VisitChildrenSet::Set(set)
1505 1505 );
1506 1506
1507 1507 let mut set = HashSet::new();
1508 1508 set.insert(HgPathBuf::from_bytes(b"subdir"));
1509 1509 assert_eq!(
1510 1510 matcher.visit_children_set(HgPath::new(b"dir")),
1511 1511 VisitChildrenSet::Set(set)
1512 1512 );
1513 1513 let mut set = HashSet::new();
1514 1514 set.insert(HgPathBuf::from_bytes(b"x"));
1515 1515 assert_eq!(
1516 1516 matcher.visit_children_set(HgPath::new(b"dir/subdir")),
1517 1517 VisitChildrenSet::Set(set)
1518 1518 );
1519 1519 assert_eq!(
1520 1520 matcher.visit_children_set(HgPath::new(b"dir/foo")),
1521 1521 VisitChildrenSet::Empty
1522 1522 );
1523 1523 assert_eq!(
1524 1524 matcher.visit_children_set(HgPath::new(b"folder")),
1525 1525 VisitChildrenSet::Empty
1526 1526 );
1527 1527 assert_eq!(
1528 1528 matcher.visit_children_set(HgPath::new(b"dir/subdir/z")),
1529 1529 VisitChildrenSet::Empty
1530 1530 );
1531 1531 // OPT: this should probably be 'all' not 'this'.
1532 1532 assert_eq!(
1533 1533 matcher.visit_children_set(HgPath::new(b"dir/subdir/x")),
1534 1534 VisitChildrenSet::This
1535 1535 );
1536 1536
1537 1537 // Diverging paths
1538 1538 let m1 = Box::new(
1539 1539 IncludeMatcher::new(vec![IgnorePattern::new(
1540 1540 PatternSyntax::RelPath,
1541 1541 b"dir/subdir/x",
1542 1542 Path::new(""),
1543 1543 )])
1544 1544 .unwrap(),
1545 1545 );
1546 1546 let m2 = Box::new(
1547 1547 IncludeMatcher::new(vec![IgnorePattern::new(
1548 1548 PatternSyntax::RelPath,
1549 1549 b"dir/subdir/z",
1550 1550 Path::new(""),
1551 1551 )])
1552 1552 .unwrap(),
1553 1553 );
1554 1554 let matcher = IntersectionMatcher::new(m1, m2);
1555 1555
1556 1556 // OPT: these next two could probably be Empty as well.
1557 1557 let mut set = HashSet::new();
1558 1558 set.insert(HgPathBuf::from_bytes(b"dir"));
1559 1559 assert_eq!(
1560 1560 matcher.visit_children_set(HgPath::new(b"")),
1561 1561 VisitChildrenSet::Set(set)
1562 1562 );
1563 1563 // OPT: these next two could probably be Empty as well.
1564 1564 let mut set = HashSet::new();
1565 1565 set.insert(HgPathBuf::from_bytes(b"subdir"));
1566 1566 assert_eq!(
1567 1567 matcher.visit_children_set(HgPath::new(b"dir")),
1568 1568 VisitChildrenSet::Set(set)
1569 1569 );
1570 1570 assert_eq!(
1571 1571 matcher.visit_children_set(HgPath::new(b"dir/subdir")),
1572 1572 VisitChildrenSet::Empty
1573 1573 );
1574 1574 assert_eq!(
1575 1575 matcher.visit_children_set(HgPath::new(b"dir/foo")),
1576 1576 VisitChildrenSet::Empty
1577 1577 );
1578 1578 assert_eq!(
1579 1579 matcher.visit_children_set(HgPath::new(b"folder")),
1580 1580 VisitChildrenSet::Empty
1581 1581 );
1582 1582 assert_eq!(
1583 1583 matcher.visit_children_set(HgPath::new(b"dir/subdir/z")),
1584 1584 VisitChildrenSet::Empty
1585 1585 );
1586 1586 assert_eq!(
1587 1587 matcher.visit_children_set(HgPath::new(b"dir/subdir/x")),
1588 1588 VisitChildrenSet::Empty
1589 1589 );
1590 1590 }
1591 1591
1592 1592 #[test]
1593 1593 fn test_differencematcher() {
1594 1594 // Two alwaysmatchers should function like a nevermatcher
1595 1595 let m1 = AlwaysMatcher;
1596 1596 let m2 = AlwaysMatcher;
1597 1597 let matcher = DifferenceMatcher::new(Box::new(m1), Box::new(m2));
1598 1598
1599 1599 for case in &[
1600 1600 &b""[..],
1601 1601 b"dir",
1602 1602 b"dir/subdir",
1603 1603 b"dir/subdir/z",
1604 1604 b"dir/foo",
1605 1605 b"dir/subdir/x",
1606 1606 b"folder",
1607 1607 ] {
1608 1608 assert_eq!(
1609 1609 matcher.visit_children_set(HgPath::new(case)),
1610 1610 VisitChildrenSet::Empty
1611 1611 );
1612 1612 }
1613 1613
1614 1614 // One always and one never should behave the same as an always
1615 1615 let m1 = AlwaysMatcher;
1616 1616 let m2 = NeverMatcher;
1617 1617 let matcher = DifferenceMatcher::new(Box::new(m1), Box::new(m2));
1618 1618
1619 1619 for case in &[
1620 1620 &b""[..],
1621 1621 b"dir",
1622 1622 b"dir/subdir",
1623 1623 b"dir/subdir/z",
1624 1624 b"dir/foo",
1625 1625 b"dir/subdir/x",
1626 1626 b"folder",
1627 1627 ] {
1628 1628 assert_eq!(
1629 1629 matcher.visit_children_set(HgPath::new(case)),
1630 1630 VisitChildrenSet::Recursive
1631 1631 );
1632 1632 }
1633 1633
1634 1634 // Two include matchers
1635 1635 let m1 = Box::new(
1636 1636 IncludeMatcher::new(vec![IgnorePattern::new(
1637 1637 PatternSyntax::RelPath,
1638 1638 b"dir/subdir",
1639 1639 Path::new("/repo"),
1640 1640 )])
1641 1641 .unwrap(),
1642 1642 );
1643 1643 let m2 = Box::new(
1644 1644 IncludeMatcher::new(vec![IgnorePattern::new(
1645 1645 PatternSyntax::RootFiles,
1646 1646 b"dir",
1647 1647 Path::new("/repo"),
1648 1648 )])
1649 1649 .unwrap(),
1650 1650 );
1651 1651
1652 1652 let matcher = DifferenceMatcher::new(m1, m2);
1653 1653
1654 1654 let mut set = HashSet::new();
1655 1655 set.insert(HgPathBuf::from_bytes(b"dir"));
1656 1656 assert_eq!(
1657 1657 matcher.visit_children_set(HgPath::new(b"")),
1658 1658 VisitChildrenSet::Set(set)
1659 1659 );
1660 1660
1661 1661 let mut set = HashSet::new();
1662 1662 set.insert(HgPathBuf::from_bytes(b"subdir"));
1663 1663 assert_eq!(
1664 1664 matcher.visit_children_set(HgPath::new(b"dir")),
1665 1665 VisitChildrenSet::Set(set)
1666 1666 );
1667 1667 assert_eq!(
1668 1668 matcher.visit_children_set(HgPath::new(b"dir/subdir")),
1669 1669 VisitChildrenSet::Recursive
1670 1670 );
1671 1671 assert_eq!(
1672 1672 matcher.visit_children_set(HgPath::new(b"dir/foo")),
1673 1673 VisitChildrenSet::Empty
1674 1674 );
1675 1675 assert_eq!(
1676 1676 matcher.visit_children_set(HgPath::new(b"folder")),
1677 1677 VisitChildrenSet::Empty
1678 1678 );
1679 1679 assert_eq!(
1680 1680 matcher.visit_children_set(HgPath::new(b"dir/subdir/z")),
1681 1681 VisitChildrenSet::This
1682 1682 );
1683 1683 assert_eq!(
1684 1684 matcher.visit_children_set(HgPath::new(b"dir/subdir/x")),
1685 1685 VisitChildrenSet::This
1686 1686 );
1687 1687 }
1688 1688 }
@@ -1,40 +1,40
1 1 use crate::error::CommandError;
2 2 use clap::SubCommand;
3 3 use hg;
4 4 use hg::matchers::get_ignore_matcher;
5 5 use hg::StatusError;
6 6 use log::warn;
7 7
8 8 pub const HELP_TEXT: &str = "
9 9 Show effective hgignore patterns used by rhg.
10 10
11 11 This is a pure Rust version of `hg debugignore`.
12 12
13 13 Some options might be missing, check the list below.
14 14 ";
15 15
16 16 pub fn args() -> clap::App<'static, 'static> {
17 17 SubCommand::with_name("debugignorerhg").about(HELP_TEXT)
18 18 }
19 19
20 20 pub fn run(invocation: &crate::CliInvocation) -> Result<(), CommandError> {
21 21 let repo = invocation.repo?;
22 22
23 23 let ignore_file = repo.working_directory_vfs().join(".hgignore"); // TODO hardcoded
24 24
25 25 let (ignore_matcher, warnings) = get_ignore_matcher(
26 26 vec![ignore_file],
27 27 &repo.working_directory_path().to_owned(),
28 &mut |_pattern_bytes| (),
28 &mut |_source, _pattern_bytes| (),
29 29 )
30 30 .map_err(|e| StatusError::from(e))?;
31 31
32 32 if !warnings.is_empty() {
33 33 warn!("Pattern warnings: {:?}", &warnings);
34 34 }
35 35
36 36 let patterns = ignore_matcher.debug_get_patterns();
37 37 invocation.ui.write_stdout(patterns)?;
38 38 invocation.ui.write_stdout(b"\n")?;
39 39 Ok(())
40 40 }
@@ -1,465 +1,471
1 1 #testcases dirstate-v1 dirstate-v2
2 2
3 3 #if dirstate-v2
4 4 $ cat >> $HGRCPATH << EOF
5 5 > [format]
6 6 > use-dirstate-v2=1
7 7 > [storage]
8 8 > dirstate-v2.slow-path=allow
9 9 > EOF
10 10 #endif
11 11
12 12 $ hg init ignorerepo
13 13 $ cd ignorerepo
14 14
15 15 debugignore with no hgignore should be deterministic:
16 16 $ hg debugignore
17 17 <nevermatcher>
18 18
19 19 Issue562: .hgignore requires newline at end:
20 20
21 21 $ touch foo
22 22 $ touch bar
23 23 $ touch baz
24 24 $ cat > makeignore.py <<EOF
25 25 > f = open(".hgignore", "w")
26 26 > f.write("ignore\n")
27 27 > f.write("foo\n")
28 28 > # No EOL here
29 29 > f.write("bar")
30 30 > f.close()
31 31 > EOF
32 32
33 33 $ "$PYTHON" makeignore.py
34 34
35 35 Should display baz only:
36 36
37 37 $ hg status
38 38 ? baz
39 39
40 40 $ rm foo bar baz .hgignore makeignore.py
41 41
42 42 $ touch a.o
43 43 $ touch a.c
44 44 $ touch syntax
45 45 $ mkdir dir
46 46 $ touch dir/a.o
47 47 $ touch dir/b.o
48 48 $ touch dir/c.o
49 49
50 50 $ hg add dir/a.o
51 51 $ hg commit -m 0
52 52 $ hg add dir/b.o
53 53
54 54 $ hg status
55 55 A dir/b.o
56 56 ? a.c
57 57 ? a.o
58 58 ? dir/c.o
59 59 ? syntax
60 60
61 61 $ echo "*.o" > .hgignore
62 62 $ hg status
63 63 abort: $TESTTMP/ignorerepo/.hgignore: invalid pattern (relre): *.o (glob)
64 64 [255]
65 65
66 66 $ echo 're:^(?!a).*\.o$' > .hgignore
67 67 $ hg status
68 68 A dir/b.o
69 69 ? .hgignore
70 70 ? a.c
71 71 ? a.o
72 72 ? syntax
73 73 #if rhg
74 74 $ hg status --config rhg.on-unsupported=abort
75 75 unsupported feature: Unsupported syntax regex parse error:
76 76 ^(?:^(?!a).*\.o$)
77 77 ^^^
78 78 error: look-around, including look-ahead and look-behind, is not supported
79 79 [252]
80 80 #endif
81 81
82 82 Ensure given files are relative to cwd
83 83
84 84 $ echo "dir/.*\.o" > .hgignore
85 85 $ hg status -i
86 86 I dir/c.o
87 87
88 88 $ hg debugignore dir/c.o dir/missing.o
89 89 dir/c.o is ignored
90 90 (ignore rule in $TESTTMP/ignorerepo/.hgignore, line 1: 'dir/.*\.o') (glob)
91 91 dir/missing.o is ignored
92 92 (ignore rule in $TESTTMP/ignorerepo/.hgignore, line 1: 'dir/.*\.o') (glob)
93 93 $ cd dir
94 94 $ hg debugignore c.o missing.o
95 95 c.o is ignored
96 96 (ignore rule in $TESTTMP/ignorerepo/.hgignore, line 1: 'dir/.*\.o') (glob)
97 97 missing.o is ignored
98 98 (ignore rule in $TESTTMP/ignorerepo/.hgignore, line 1: 'dir/.*\.o') (glob)
99 99
100 100 For icasefs, inexact matches also work, except for missing files
101 101
102 102 #if icasefs
103 103 $ hg debugignore c.O missing.O
104 104 c.o is ignored
105 105 (ignore rule in $TESTTMP/ignorerepo/.hgignore, line 1: 'dir/.*\.o') (glob)
106 106 missing.O is not ignored
107 107 #endif
108 108
109 109 $ cd ..
110 110
111 111 $ echo ".*\.o" > .hgignore
112 112 $ hg status
113 113 A dir/b.o
114 114 ? .hgignore
115 115 ? a.c
116 116 ? syntax
117 117
118 118 Ensure that comments work:
119 119
120 120 $ touch 'foo#bar' 'quux#' 'quu0#'
121 121 #if no-windows
122 122 $ touch 'baz\' 'baz\wat' 'ba0\#wat' 'ba1\\' 'ba1\\wat' 'quu0\'
123 123 #endif
124 124
125 125 $ cat <<'EOF' >> .hgignore
126 126 > # full-line comment
127 127 > # whitespace-only comment line
128 128 > syntax# pattern, no whitespace, then comment
129 129 > a.c # pattern, then whitespace, then comment
130 130 > baz\\# # (escaped) backslash, then comment
131 131 > ba0\\\#w # (escaped) backslash, escaped comment character, then comment
132 132 > ba1\\\\# # (escaped) backslashes, then comment
133 133 > foo\#b # escaped comment character
134 134 > quux\## escaped comment character at end of name
135 135 > EOF
136 136 $ hg status
137 137 A dir/b.o
138 138 ? .hgignore
139 139 ? quu0#
140 140 ? quu0\ (no-windows !)
141 141
142 142 $ cat <<'EOF' > .hgignore
143 143 > .*\.o
144 144 > syntax: glob
145 145 > syntax# pattern, no whitespace, then comment
146 146 > a.c # pattern, then whitespace, then comment
147 147 > baz\\#* # (escaped) backslash, then comment
148 148 > ba0\\\#w* # (escaped) backslash, escaped comment character, then comment
149 149 > ba1\\\\#* # (escaped) backslashes, then comment
150 150 > foo\#b* # escaped comment character
151 151 > quux\## escaped comment character at end of name
152 152 > quu0[\#]# escaped comment character inside [...]
153 153 > EOF
154 154 $ hg status
155 155 A dir/b.o
156 156 ? .hgignore
157 157 ? ba1\\wat (no-windows !)
158 158 ? baz\wat (no-windows !)
159 159 ? quu0\ (no-windows !)
160 160
161 161 $ rm 'foo#bar' 'quux#' 'quu0#'
162 162 #if no-windows
163 163 $ rm 'baz\' 'baz\wat' 'ba0\#wat' 'ba1\\' 'ba1\\wat' 'quu0\'
164 164 #endif
165 165
166 166 Check that '^\.' does not ignore the root directory:
167 167
168 168 $ echo "^\." > .hgignore
169 169 $ hg status
170 170 A dir/b.o
171 171 ? a.c
172 172 ? a.o
173 173 ? dir/c.o
174 174 ? syntax
175 175
176 176 Test that patterns from ui.ignore options are read:
177 177
178 178 $ echo > .hgignore
179 179 $ cat >> $HGRCPATH << EOF
180 180 > [ui]
181 181 > ignore.other = $TESTTMP/ignorerepo/.hg/testhgignore
182 182 > EOF
183 183 $ echo "glob:**.o" > .hg/testhgignore
184 184 $ hg status
185 185 A dir/b.o
186 186 ? .hgignore
187 187 ? a.c
188 188 ? syntax
189 189
190 190 empty out testhgignore
191 191 $ echo > .hg/testhgignore
192 192
193 193 Test relative ignore path (issue4473):
194 194
195 195 $ cat >> $HGRCPATH << EOF
196 196 > [ui]
197 197 > ignore.relative = .hg/testhgignorerel
198 198 > EOF
199 199 $ echo "glob:*.o" > .hg/testhgignorerel
200 200 $ cd dir
201 201 $ hg status
202 202 A dir/b.o
203 203 ? .hgignore
204 204 ? a.c
205 205 ? syntax
206 206 $ hg debugignore
207 207 <includematcher includes='.*\\.o(?:/|$)'>
208 208
209 209 $ cd ..
210 210 $ echo > .hg/testhgignorerel
211 211 $ echo "syntax: glob" > .hgignore
212 212 $ echo "re:.*\.o" >> .hgignore
213 213 $ hg status
214 214 A dir/b.o
215 215 ? .hgignore
216 216 ? a.c
217 217 ? syntax
218 218
219 219 $ echo "syntax: invalid" > .hgignore
220 220 $ hg status
221 221 $TESTTMP/ignorerepo/.hgignore: ignoring invalid syntax 'invalid'
222 222 A dir/b.o
223 223 ? .hgignore
224 224 ? a.c
225 225 ? a.o
226 226 ? dir/c.o
227 227 ? syntax
228 228
229 229 $ echo "syntax: glob" > .hgignore
230 230 $ echo "*.o" >> .hgignore
231 231 $ hg status
232 232 A dir/b.o
233 233 ? .hgignore
234 234 ? a.c
235 235 ? syntax
236 236
237 237 $ echo "relglob:syntax*" > .hgignore
238 238 $ hg status
239 239 A dir/b.o
240 240 ? .hgignore
241 241 ? a.c
242 242 ? a.o
243 243 ? dir/c.o
244 244
245 245 $ echo "relglob:*" > .hgignore
246 246 $ hg status
247 247 A dir/b.o
248 248
249 249 $ cd dir
250 250 $ hg status .
251 251 A b.o
252 252
253 253 $ hg debugignore
254 254 <includematcher includes='.*(?:/|$)'>
255 255
256 256 $ hg debugignore b.o
257 257 b.o is ignored
258 258 (ignore rule in $TESTTMP/ignorerepo/.hgignore, line 1: '*') (glob)
259 259
260 260 $ cd ..
261 261
262 262 Check patterns that match only the directory
263 263
264 264 "(fsmonitor !)" below assumes that fsmonitor is enabled with
265 265 "walk_on_invalidate = false" (default), which doesn't involve
266 266 re-walking the whole repository when a .hgignore change is detected.
267 267
268 268 $ echo "^dir\$" > .hgignore
269 269 $ hg status
270 270 A dir/b.o
271 271 ? .hgignore
272 272 ? a.c
273 273 ? a.o
274 274 ? dir/c.o (fsmonitor !)
275 275 ? syntax
276 276
277 277 Check recursive glob pattern matches no directories (dir/**/c.o matches dir/c.o)
278 278
279 279 $ echo "syntax: glob" > .hgignore
280 280 $ echo "dir/**/c.o" >> .hgignore
281 281 $ touch dir/c.o
282 282 $ mkdir dir/subdir
283 283 $ touch dir/subdir/c.o
284 284 $ hg status
285 285 A dir/b.o
286 286 ? .hgignore
287 287 ? a.c
288 288 ? a.o
289 289 ? syntax
290 290 $ hg debugignore a.c
291 291 a.c is not ignored
292 292 $ hg debugignore dir/c.o
293 293 dir/c.o is ignored
294 294 (ignore rule in $TESTTMP/ignorerepo/.hgignore, line 2: 'dir/**/c.o') (glob)
295 295
296 296 Check rooted globs
297 297
298 298 $ hg purge --all --config extensions.purge=
299 299 $ echo "syntax: rootglob" > .hgignore
300 300 $ echo "a/*.ext" >> .hgignore
301 301 $ for p in a b/a aa; do mkdir -p $p; touch $p/b.ext; done
302 302 $ hg status -A 'set:**.ext'
303 303 ? aa/b.ext
304 304 ? b/a/b.ext
305 305 I a/b.ext
306 306
307 307 Check using 'include:' in ignore file
308 308
309 309 $ hg purge --all --config extensions.purge=
310 310 $ touch foo.included
311 311
312 312 $ echo ".*.included" > otherignore
313 313 $ hg status -I "include:otherignore"
314 314 ? foo.included
315 315
316 316 $ echo "include:otherignore" >> .hgignore
317 317 $ hg status
318 318 A dir/b.o
319 319 ? .hgignore
320 320 ? otherignore
321 321
322 322 Check recursive uses of 'include:'
323 323
324 324 $ echo "include:nested/ignore" >> otherignore
325 325 $ mkdir nested nested/more
326 326 $ echo "glob:*ignore" > nested/ignore
327 327 $ echo "rootglob:a" >> nested/ignore
328 328 $ touch a nested/a nested/more/a
329 329 $ hg status
330 330 A dir/b.o
331 331 ? nested/a
332 332 ? nested/more/a
333 333 $ rm a nested/a nested/more/a
334 334
335 335 $ cp otherignore goodignore
336 336 $ echo "include:badignore" >> otherignore
337 337 $ hg status
338 338 skipping unreadable pattern file 'badignore': $ENOENT$
339 339 A dir/b.o
340 340
341 341 $ mv goodignore otherignore
342 342
343 343 Check using 'include:' while in a non-root directory
344 344
345 345 $ cd ..
346 346 $ hg -R ignorerepo status
347 347 A dir/b.o
348 348 $ cd ignorerepo
349 349
350 350 Check including subincludes
351 351
352 352 $ hg revert -q --all
353 353 $ hg purge --all --config extensions.purge=
354 354 $ echo ".hgignore" > .hgignore
355 355 $ mkdir dir1 dir2
356 356 $ touch dir1/file1 dir1/file2 dir2/file1 dir2/file2
357 357 $ echo "subinclude:dir2/.hgignore" >> .hgignore
358 358 $ echo "glob:file*2" > dir2/.hgignore
359 359 $ hg status
360 360 ? dir1/file1
361 361 ? dir1/file2
362 362 ? dir2/file1
363 363
364 364 Check including subincludes with other patterns
365 365
366 366 $ echo "subinclude:dir1/.hgignore" >> .hgignore
367 367
368 368 $ mkdir dir1/subdir
369 369 $ touch dir1/subdir/file1
370 370 $ echo "rootglob:f?le1" > dir1/.hgignore
371 371 $ hg status
372 372 ? dir1/file2
373 373 ? dir1/subdir/file1
374 374 ? dir2/file1
375 375 $ rm dir1/subdir/file1
376 376
377 377 $ echo "regexp:f.le1" > dir1/.hgignore
378 378 $ hg status
379 379 ? dir1/file2
380 380 ? dir2/file1
381 381
382 382 Check multiple levels of sub-ignores
383 383
384 384 $ touch dir1/subdir/subfile1 dir1/subdir/subfile3 dir1/subdir/subfile4
385 385 $ echo "subinclude:subdir/.hgignore" >> dir1/.hgignore
386 386 $ echo "glob:subfil*3" >> dir1/subdir/.hgignore
387 387
388 388 $ hg status
389 389 ? dir1/file2
390 390 ? dir1/subdir/subfile4
391 391 ? dir2/file1
392 392
393 393 Check include subignore at the same level
394 394
395 395 $ mv dir1/subdir/.hgignore dir1/.hgignoretwo
396 396 $ echo "regexp:f.le1" > dir1/.hgignore
397 397 $ echo "subinclude:.hgignoretwo" >> dir1/.hgignore
398 398 $ echo "glob:file*2" > dir1/.hgignoretwo
399 399
400 400 $ hg status | grep file2
401 401 [1]
402 402 $ hg debugignore dir1/file2
403 403 dir1/file2 is ignored
404 404 (ignore rule in dir2/.hgignore, line 1: 'file*2')
405 405
406 406 #if windows
407 407
408 408 Windows paths are accepted on input
409 409
410 410 $ rm dir1/.hgignore
411 411 $ echo "dir1/file*" >> .hgignore
412 412 $ hg debugignore "dir1\file2"
413 413 dir1/file2 is ignored
414 414 (ignore rule in $TESTTMP\ignorerepo\.hgignore, line 4: 'dir1/file*')
415 415 $ hg up -qC .
416 416
417 417 #endif
418 418
419 419 #if dirstate-v2 rust
420 420
421 421 Check the hash of ignore patterns written in the dirstate
422 422 This is an optimization that is only relevant when using the Rust extensions
423 423
424 $ cat_filename_and_hash () {
425 > for i in "$@"; do
426 > printf "$i "
427 > cat "$i" | "$TESTDIR"/f --raw-sha1 | sed 's/^raw-sha1=//'
428 > done
429 > }
424 430 $ hg status > /dev/null
425 $ cat .hg/testhgignore .hg/testhgignorerel .hgignore dir2/.hgignore dir1/.hgignore dir1/.hgignoretwo | $TESTDIR/f --sha1
426 sha1=6e315b60f15fb5dfa02be00f3e2c8f923051f5ff
431 $ cat_filename_and_hash .hg/testhgignore .hg/testhgignorerel .hgignore dir2/.hgignore dir1/.hgignore dir1/.hgignoretwo | $TESTDIR/f --sha1
432 sha1=c0beb296395d48ced8e14f39009c4ea6e409bfe6
427 433 $ hg debugstate --docket | grep ignore
428 ignore pattern hash: 6e315b60f15fb5dfa02be00f3e2c8f923051f5ff
434 ignore pattern hash: c0beb296395d48ced8e14f39009c4ea6e409bfe6
429 435
430 436 $ echo rel > .hg/testhgignorerel
431 437 $ hg status > /dev/null
432 $ cat .hg/testhgignore .hg/testhgignorerel .hgignore dir2/.hgignore dir1/.hgignore dir1/.hgignoretwo | $TESTDIR/f --sha1
433 sha1=dea19cc7119213f24b6b582a4bae7b0cb063e34e
438 $ cat_filename_and_hash .hg/testhgignore .hg/testhgignorerel .hgignore dir2/.hgignore dir1/.hgignore dir1/.hgignoretwo | $TESTDIR/f --sha1
439 sha1=b8e63d3428ec38abc68baa27631516d5ec46b7fa
434 440 $ hg debugstate --docket | grep ignore
435 ignore pattern hash: dea19cc7119213f24b6b582a4bae7b0cb063e34e
441 ignore pattern hash: b8e63d3428ec38abc68baa27631516d5ec46b7fa
436 442 $ cd ..
437 443
438 444 Check that the hash depends on the source of the hgignore patterns
439 445 (otherwise the context is lost and things like subinclude are cached improperly)
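The collision the test below constructs (emptying `dir1/.hgignore` while moving its pattern into `dir2/.hgignore`) can be reproduced outside Mercurial: hashing only the concatenated pattern bytes makes the two layouts indistinguishable, while mixing in each file's source path keeps them apart. A minimal illustration, not Mercurial's implementation; the helper name and separator byte are invented for the sketch:

```python
import hashlib

def hash_patterns(files, include_source):
    """Hash a list of (source_path, pattern_bytes) pairs. Illustrative only."""
    h = hashlib.sha1()
    for source, patterns in files:
        if include_source:
            h.update(source.encode() + b" ")
        h.update(patterns)
    return h.hexdigest()

# Two layouts whose pattern bytes concatenate identically,
# mirroring the test: dir1's file is emptied, dir2's absorbs its pattern.
layout_a = [("dir1/.hgignore", b"ignored1\n"),
            ("dir2/.hgignore", b"ignored2\n")]
layout_b = [("dir1/.hgignore", b""),
            ("dir2/.hgignore", b"ignored1\nignored2\n")]

# Content-only hashing cannot tell the layouts apart; adding the source can.
print(hash_patterns(layout_a, False) == hash_patterns(layout_b, False))
print(hash_patterns(layout_a, True) == hash_patterns(layout_b, True))
```

Without the source the caches would be considered valid across the two layouts, which is exactly why the previously buggy run (the `missing-correct-output` line being replaced below) failed to report `dir1/subdir/ignored1` as unknown.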
440 446
441 447 $ hg init ignore-collision
442 448 $ cd ignore-collision
443 449 $ echo > .hg/testhgignorerel
444 450
445 451 $ mkdir dir1/ dir1/subdir
446 452 $ touch dir1/subdir/f dir1/subdir/ignored1
447 453 $ echo 'ignored1' > dir1/.hgignore
448 454
449 455 $ mkdir dir2 dir2/subdir
450 456 $ touch dir2/subdir/f dir2/subdir/ignored2
451 457 $ echo 'ignored2' > dir2/.hgignore
452 458 $ echo 'subinclude:dir2/.hgignore' >> .hgignore
453 459 $ echo 'subinclude:dir1/.hgignore' >> .hgignore
454 460
455 461 $ hg commit -Aqm_
456 462
457 463 $ > dir1/.hgignore
458 464 $ echo 'ignored' > dir2/.hgignore
459 465 $ echo 'ignored1' >> dir2/.hgignore
460 466 $ hg status
461 467 M dir1/.hgignore
462 468 M dir2/.hgignore
463 ? dir1/subdir/ignored1 (missing-correct-output !)
469 ? dir1/subdir/ignored1
464 470
465 471 #endif