dirstate-v2: preserve the fallback values on disk...
marmoute -
r49070:b874e8d8 default
@@ -1,560 +1,594 b''
1 1 The *dirstate* is what Mercurial uses internally to track
2 2 the state of files in the working directory,
3 3 such as set by commands like `hg add` and `hg rm`.
4 4 It also contains some cached data that help make `hg status` faster.
5 5 The name refers both to `.hg/dirstate` on the filesystem
6 6 and the corresponding data structure in memory while a Mercurial process
7 7 is running.
8 8
9 9 The original file format, retroactively dubbed `dirstate-v1`,
10 10 is described at https://www.mercurial-scm.org/wiki/DirState.
11 11 It is made of a flat sequence of unordered variable-size entries,
12 12 so accessing any information in it requires parsing all of it.
13 13 Similarly, saving changes requires rewriting the entire file.
14 14
15 15 The newer `dirstate-v2` file format is designed to fix these limitations
16 16 and make `hg status` faster.
17 17
18 18 User guide
19 19 ==========
20 20
21 21 Compatibility
22 22 -------------
23 23
24 24 The file format is experimental and may still change.
25 25 Different versions of Mercurial may not be compatible with each other
26 26 when working on a local repository that uses this format.
27 27 When using an incompatible version with the experimental format,
28 28 anything can happen including data corruption.
29 29
30 30 Since the dirstate is entirely local and not relevant to the wire protocol,
31 31 `dirstate-v2` does not affect compatibility with remote Mercurial versions.
32 32
33 33 When `share-safe` is enabled, different repositories sharing the same store
34 34 can use different dirstate formats.
35 35
36 36 Enabling `dirstate-v2` for new local repositories
37 37 ------------------------------------------------
38 38
39 39 When creating a new local repository such as with `hg init` or `hg clone`,
40 40 the `exp-dirstate-v2` boolean in the `format` configuration section
41 41 controls whether to use this file format.
42 42 This is disabled by default as of this writing.
43 43 To enable it for a single repository, run for example::
44 44
45 45 $ hg init my-project --config format.exp-dirstate-v2=1
46 46
47 47 Checking the format of an existing local repository
48 48 ---------------------------------------------------
49 49
50 50 The `debugformat` command prints information about
51 51 which of multiple optional formats are used in the current repository,
52 52 including `dirstate-v2`::
53 53
54 54 $ hg debugformat
55 55 format-variant repo
56 56 fncache: yes
57 57 dirstate-v2: yes
58 58 […]
59 59
60 60 Upgrading or downgrading an existing local repository
61 61 -----------------------------------------------------
62 62
63 63 The `debugupgrade` command does various upgrades or downgrades
64 64 on a local repository
65 65 based on the current Mercurial version and on configuration.
66 66 The same `format.exp-dirstate-v2` configuration is used again.
67 67
68 68 Example to upgrade::
69 69
70 70 $ hg debugupgrade --config format.exp-dirstate-v2=1
71 71
72 72 Example to downgrade to `dirstate-v1`::
73 73
74 74 $ hg debugupgrade --config format.exp-dirstate-v2=0
75 75
76 76 Both of these commands do nothing but print a list of proposed changes,
77 77 which may include changes unrelated to the dirstate.
78 78 Those other changes are controlled by their own configuration keys.
79 79 Add `--run` to a command to actually apply the proposed changes.
80 80
81 81 Backups of `.hg/requires` and `.hg/dirstate` are created
82 82 in a `.hg/upgradebackup.*` directory.
83 83 If something goes wrong, restoring those files should undo the change.
84 84
85 85 Note that upgrading affects compatibility with older versions of Mercurial
86 86 as noted above.
87 87 This can be relevant when a repository’s files are on a USB drive
88 88 or some other removable media, or shared over the network, etc.
89 89
90 90 Internal filesystem representation
91 91 ==================================
92 92
93 93 Requirements file
94 94 -----------------
95 95
96 96 The `.hg/requires` file indicates which of various optional file formats
97 97 are used by a given repository.
98 98 Mercurial aborts when seeing a requirement it does not know about,
99 99 which avoids older versions accidentally messing up a repository
100 100 that uses a format that was introduced later.
101 101 For versions that do support a format, the presence or absence of
102 102 the corresponding requirement indicates whether to use that format.
103 103
104 104 When the file contains an `exp-dirstate-v2` line,
105 105 the `dirstate-v2` format is used.
106 106 With no such line `dirstate-v1` is used.
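The requirement check described above can be sketched in a few lines of Python. This is only an illustration: `uses_dirstate_v2` is a hypothetical helper, not a Mercurial function, and it assumes the content of `.hg/requires` has already been read into a string with one requirement per line.

```python
def uses_dirstate_v2(requires_text):
    """Return True if the requirements text has an `exp-dirstate-v2` line."""
    # Compare whole lines so that a longer requirement name that merely
    # contains this one does not match by accident.
    return "exp-dirstate-v2" in requires_text.splitlines()


sample = "revlogv1\nstore\nexp-dirstate-v2\n"
print(uses_dirstate_v2(sample))
```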
107 107
108 108 High level description
109 109 ----------------------
110 110
111 111 Whereas `dirstate-v1` uses a single `.hg/dirstate` file,
112 112 in `dirstate-v2` that file is a "docket" file
113 113 that only contains some metadata
114 114 and points to a separate data file named `.hg/dirstate.{ID}`,
115 115 where `{ID}` is a random identifier.
116 116
117 117 This separation allows making data files append-only
118 118 and therefore safer to memory-map.
119 119 Creating a new data file (occasionally to clean up unused data)
120 120 can be done with a different ID
121 121 without disrupting another Mercurial process
122 122 that could still be using the previous data file.
123 123
124 124 Both files have a format designed to reduce the need for parsing,
125 125 by using fixed-size binary components as much as possible.
126 126 For data that is not fixed-size,
127 127 references to other parts of a file can be made by storing "pseudo-pointers":
128 128 integers counted in bytes from the start of a file.
129 129 For read-only access no data structure is needed,
130 130 only a bytes buffer (possibly memory-mapped directly from the filesystem)
131 131 with specific parts read on demand.
132 132
133 133 The data file contains "nodes" organized in a tree.
134 134 Each node represents a file or directory inside the working directory
135 135 or its parent changeset.
136 136 This tree has the same structure as the filesystem,
137 137 so a node representing a directory has child nodes representing
138 138 the files and subdirectories contained directly in that directory.
139 139
140 140 The docket file format
141 141 ----------------------
142 142
143 143 This is implemented in `rust/hg-core/src/dirstate_tree/on_disk.rs`
144 144 and `mercurial/dirstateutils/docket.py`.
145 145
146 146 Components of the docket file are found at fixed offsets,
147 147 counted in bytes from the start of the file:
148 148
149 149 * Offset 0:
150 150 The 12-byte marker string "dirstate-v2\n" ending with a newline character.
151 151 This makes it easier to tell a dirstate-v2 file from a dirstate-v1 file,
152 152 although it is not strictly necessary
153 153 since `.hg/requires` determines which format to use.
154 154
155 155 * Offset 12:
156 156 The changeset node ID of the first parent of the working directory,
157 157 as up to 32 binary bytes.
158 158 If a node ID is shorter (20 bytes for SHA-1),
159 159 it is start-aligned and the rest of the bytes are set to zero.
160 160
161 161 * Offset 44:
162 162 The changeset node ID of the second parent of the working directory,
163 163 or all zeros if there isn’t one.
164 164 Also 32 binary bytes.
165 165
166 166 * Offset 76:
167 167 Tree metadata on 44 bytes, described below.
168 168 Its separation in this documentation from the rest of the docket
169 169 reflects a detail of the current implementation.
170 170 Since tree metadata is also made of fields at fixed offsets, those could
171 171 be inlined here by adding 76 bytes to each offset.
172 172
173 173 * Offset 120:
174 174 The used size of the data file, as a 32-bit big-endian integer.
175 175 The actual size of the data file may be larger
176 176 (if another Mercurial process is appending to it
177 177 but has not updated the docket yet).
178 178 That extra data must be ignored.
179 179
180 180 * Offset 124:
181 181 The length of the data file identifier, as an 8-bit integer.
182 182
183 183 * Offset 125:
184 184 The data file identifier.
185 185
186 186 * Any additional data is currently ignored, and dropped when updating the file.
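The fixed offsets listed above map directly onto a `struct` format string. The following is a sketch under that assumption only: `DOCKET_HEADER`, `MARKER` and `parse_docket` are illustrative names and do not claim to match what `mercurial/dirstateutils/docket.py` does.

```python
import struct

# Field layout follows the offsets listed above: marker, p1, p2,
# tree metadata, used data size, identifier length. Names are illustrative.
DOCKET_HEADER = struct.Struct(">12s32s32s44sIB")
MARKER = b"dirstate-v2\n"


def parse_docket(data):
    """Split raw docket bytes into their fixed-offset components."""
    (marker, p1, p2, tree_metadata, data_size, id_length) = (
        DOCKET_HEADER.unpack_from(data, 0)
    )
    if marker != MARKER:
        raise ValueError("missing dirstate-v2 marker")
    id_start = DOCKET_HEADER.size  # 125 bytes of fixed-size fields
    return {
        "p1": p1,
        "p2": p2,
        "tree_metadata": tree_metadata,
        "data_size": data_size,
        # Bytes past the identifier are ignored, as described above.
        "data_file_id": data[id_start : id_start + id_length],
    }
```

The `>` prefix selects big-endian byte order without padding, so the struct size equals the offset of the data file identifier.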
187 187
188 188 Tree metadata in the docket file
189 189 --------------------------------
190 190
191 191 Tree metadata is similarly made of components at fixed offsets.
192 192 These offsets are counted in bytes from the start of tree metadata,
193 193 which is 76 bytes after the start of the docket file.
194 194
195 195 This metadata can be thought of as the singular root of the tree
196 196 formed by nodes in the data file.
197 197
198 198 * Offset 0:
199 199 Pseudo-pointer to the start of root nodes,
200 200 counted in bytes from the start of the data file,
201 201 as a 32-bit big-endian integer.
202 202 These nodes describe files and directories found directly
203 203 at the root of the working directory.
204 204
205 205 * Offset 4:
206 206 Number of root nodes, as a 32-bit big-endian integer.
207 207
208 208 * Offset 8:
209 209 Total number of nodes in the entire tree that "have a dirstate entry",
210 210 as a 32-bit big-endian integer.
211 211 Those nodes represent files that would be present at all in `dirstate-v1`.
212 212 This is typically less than the total number of nodes.
213 213 This counter is used to implement `len(dirstatemap)`.
214 214
215 215 * Offset 12:
216 216 Number of nodes in the entire tree that have a copy source,
217 217 as a 32-bit big-endian integer.
218 218 At the next commit, these files are recorded
219 219 as having been copied or moved/renamed from that source.
220 220 (A move is recorded as a copy and separate removal of the source.)
221 221 This counter is used to implement `len(dirstatemap.copymap)`.
222 222
223 223 * Offset 16:
224 224 An estimation of how many bytes of the data file
225 225 (within its used size) are unused, as a 32-bit big-endian integer.
226 226 When appending to an existing data file,
227 227 some existing nodes or paths can be unreachable from the new root
228 228 but they still take up space.
229 229 This counter is used to decide when to write a new data file from scratch
230 230 instead of appending to an existing one,
231 231 in order to get rid of that unreachable data
232 232 and avoid unbounded file size growth.
233 233
234 234 * Offset 20:
235 235 These four bytes are currently ignored
236 236 and reset to zero when updating a docket file.
237 237 This is an attempt at forward compatibility:
238 238 future Mercurial versions could use this as a bit field
239 239 to indicate that a dirstate has additional data or constraints.
240 240 Finding a dirstate file with the relevant bit unset indicates that
241 241 it was written by a then-older version
242 242 which is not aware of that future change.
243 243
244 244 * Offset 24:
245 245 Either 20 zero bytes, or a SHA-1 hash as 20 binary bytes.
246 246 When present, the hash is of ignore patterns
247 247 that were used for some previous run of the `status` algorithm.
248 248
249 249 * (Offset 44: end of tree metadata)
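The tree metadata fields can be unpacked the same way. This sketch assumes the docket has been read into a bytes object; `TREE_METADATA`, `TreeMetadata`, `parse_tree_metadata` and `has_ignore_patterns_hash` are hypothetical names chosen for the example.

```python
import struct
from collections import namedtuple

# Five 32-bit big-endian integers, four reserved bytes, then 20 hash bytes.
TREE_METADATA = struct.Struct(">IIIII4s20s")

TreeMetadata = namedtuple(
    "TreeMetadata",
    [
        "root_nodes_start",
        "root_nodes_count",
        "nodes_with_entry_count",
        "nodes_with_copy_source_count",
        "unreachable_bytes",
        "reserved",
        "ignore_patterns_hash",
    ],
)


def parse_tree_metadata(docket, offset=76):
    """Unpack the 44 bytes of tree metadata found at `offset` in the docket."""
    return TreeMetadata(*TREE_METADATA.unpack_from(docket, offset))


def has_ignore_patterns_hash(metadata):
    """Twenty zero bytes mean that no hash was recorded."""
    return metadata.ignore_patterns_hash != b"\x00" * 20
```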
250 250
251 251 Optional hash of ignore patterns
252 252 --------------------------------
253 253
254 254 The implementation of `status` at `rust/hg-core/src/dirstate_tree/status.rs`
255 255 has been optimized such that its run time is dominated by calls
256 256 to `stat` for reading the filesystem metadata of a file or directory,
257 257 and to `readdir` for listing the contents of a directory.
258 258 In some cases the algorithm can skip calls to `readdir`
259 259 (saving significant time)
260 260 because the dirstate already contains enough of the relevant information
261 261 to build the correct `status` results.
262 262
263 263 The default configuration of `hg status` is to list unknown files
264 264 but not ignored files.
265 265 In this case, it matters for the `readdir`-skipping optimization
266 266 if a given file used to be ignored but became unknown
267 267 because `.hgignore` changed.
268 268 To detect the possibility of such a change,
269 269 the tree metadata contains an optional hash of all ignore patterns.
270 270
271 271 We define:
272 272
273 273 * "Root" ignore files as:
274 274
275 275 - `.hgignore` at the root of the repository if it exists
276 276 - And all files from `ui.ignore.*` config.
277 277
278 278 This set of files is sorted by the string representation of their path.
279 279
280 280 * The "expanded contents" of an ignore file is the byte string made
281 281 by the concatenation of its contents followed by the "expanded contents"
282 282 of other files included with `include:` or `subinclude:` directives,
283 283 in inclusion order. This definition is recursive, as included files can
284 284 themselves include more files.
285 285
286 286 This hash is defined as the SHA-1 of the concatenation (in sorted
287 287 order) of the "expanded contents" of each "root" ignore file.
288 288 (Note that computing this does not require actually concatenating
289 289 into a single contiguous byte sequence.
290 290 Instead a SHA-1 hasher object can be created
291 291 and fed separate chunks one by one.)
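The incremental hashing mentioned above might look like the sketch below. It assumes the "expanded contents" of each root ignore file have already been computed (resolving `include:` and `subinclude:` is out of scope here), and `ignore_patterns_hash` is a hypothetical helper name.

```python
import hashlib


def ignore_patterns_hash(expanded_contents_by_path):
    """Hash the expanded contents of root ignore files, in sorted path order.

    `expanded_contents_by_path` maps each root ignore file path to the bytes
    of its expanded contents.
    """
    hasher = hashlib.sha1()
    for path in sorted(expanded_contents_by_path):
        # Feeding chunks one by one gives the same digest as hashing
        # the concatenation of all chunks.
        hasher.update(expanded_contents_by_path[path])
    return hasher.digest()
```

The result is 20 binary bytes, which is the size reserved for the hash in tree metadata.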
292 292
293 293 The data file format
294 294 --------------------
295 295
296 296 This is implemented in `rust/hg-core/src/dirstate_tree/on_disk.rs`
297 297 and `mercurial/dirstateutils/v2.py`.
298 298
299 299 The data file contains two types of data: paths and nodes.
300 300
301 301 Paths and nodes can be organized in any order in the file, except that sibling
302 302 nodes must be next to each other and sorted by their path.
303 303 Contiguity lets the parent refer to them all
304 304 by their count and a single pseudo-pointer,
305 305 instead of storing one pseudo-pointer per child node.
306 306 Sorting allows using binary search to find a child node with a given name
307 307 in `O(log(n))` byte sequence comparisons.
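To illustrate the lookup, here is a sketch that uses the standard `bisect` module on a list of sibling base names that is already sorted. It works on an in-memory list for simplicity; a real reader would compare against names read from the data file, and `find_child` is a hypothetical name.

```python
import bisect


def find_child(sorted_base_names, name):
    """Return the index of `name` among sorted siblings, or None if absent."""
    index = bisect.bisect_left(sorted_base_names, name)
    if index < len(sorted_base_names) and sorted_base_names[index] == name:
        return index
    return None


siblings = [b"Makefile", b"README", b"setup.py", b"tests"]
print(find_child(siblings, b"setup.py"))
```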
308 308
309 309 The current implementation writes paths and child nodes before a given node
310 310 for ease of figuring out the value of pseudo-pointers by the time they are to be
311 311 written, but this is not an obligation and readers must not rely on it.
312 312
313 313 A path is stored as a byte string anywhere in the file, without delimiter.
314 314 It is referred to by one or more nodes by a pseudo-pointer to its start, and its
315 315 length in bytes. Since there is no delimiter,
316 316 when a path is a substring of another the same bytes could be reused,
317 317 although the implementation does not exploit this as of this writing.
318 318
319 319 A node is stored on 44 bytes with components at fixed offsets. Paths and
320 320 child nodes relevant to a node are stored externally and referenced through
321 321 pseudo-pointers.
322 322
323 323 All integers are stored in big-endian. All pseudo-pointers are 32-bit integers
324 324 counting bytes from the start of the data file. Path lengths and positions
325 325 are 16-bit integers, also counted in bytes.
326 326
327 327 Node components are:
328 328
329 329 * Offset 0:
330 330 Pseudo-pointer to the full path of this node,
331 331 from the working directory root.
332 332
333 333 * Offset 4:
334 334 Length of the full path.
335 335
336 336 * Offset 6:
337 337 Position of the last `/` path separator within the full path,
338 338 in bytes from the start of the full path,
339 339 or zero if there isn’t one.
340 340 The part of the full path after this position is the "base name".
341 341 Since sibling nodes have the same parent, only their base names vary
342 342 and need to be considered when doing binary search to find a given path.
343 343
344 344 * Offset 8:
345 345 Pseudo-pointer to the "copy source" path for this node,
346 346 or zero if there is no copy source.
347 347
348 348 * Offset 12:
349 349 Length of the copy source path, or zero if there isn’t one.
350 350
351 351 * Offset 14:
352 352 Pseudo-pointer to the start of child nodes.
353 353
354 354 * Offset 18:
355 355 Number of child nodes, as a 32-bit integer.
356 356 They occupy 44 times this number of bytes
357 357 (not counting space for paths, and further descendants).
358 358
359 359 * Offset 22:
360 360 Number as a 32-bit integer of descendant nodes in this subtree,
361 361 not including this node itself,
362 362 that "have a dirstate entry".
363 363 Those nodes represent files that would be present at all in `dirstate-v1`.
364 364 This is typically less than the total number of descendants.
365 365 This counter is used to implement `has_dir`.
366 366
367 367 * Offset 26:
368 368 Number as a 32-bit integer of descendant nodes in this subtree,
369 369 not including this node itself,
370 370 that represent files tracked in the working directory.
371 371 (For example, `hg rm` makes a file untracked.)
372 372 This counter is used to implement `has_tracked_dir`.
373 373
374 374 * Offset 30:
375 375 A `flags` field that packs some boolean values as bits of a 16-bit integer.
376 376 Starting from least-significant, bit masks are::
377 377
378 378 WDIR_TRACKED = 1 << 0
379 379 P1_TRACKED = 1 << 1
380 380 P2_INFO = 1 << 2
381 381 HAS_MODE_AND_SIZE = 1 << 3
382 382 HAS_FILE_MTIME = 1 << 4
383 383 HAS_DIRECTORY_MTIME = 1 << 5
384 384 MODE_EXEC_PERM = 1 << 6
385 385 MODE_IS_SYMLINK = 1 << 7
386 386 EXPECTED_STATE_IS_MODIFIED = 1 << 8
387 387 ALL_UNKNOWN_RECORDED = 1 << 9
388 388 ALL_IGNORED_RECORDED = 1 << 10
389 HAS_FALLBACK_EXEC = 1 << 11
390 FALLBACK_EXEC = 1 << 12
391 HAS_FALLBACK_SYMLINK = 1 << 13
392 FALLBACK_SYMLINK = 1 << 14
389 393
390 394 The meaning of each bit is described below.
391 395
392 396 Other bits are unset.
393 397 They may be assigned meaning in the future,
394 398 with the limitation that Mercurial versions that pre-date such meaning
395 399 will always reset those bits to unset when writing nodes.
396 400 (A new node is written for any mutation in its subtree,
397 401 leaving the bytes of the old node unreachable
398 402 until the data file is rewritten entirely.)
399 403
400 404 * Offset 32:
401 405 A `size` field described below, as a 32-bit integer.
402 406 Unlike in dirstate-v1, negative values are not used.
403 407
404 408 * Offset 36:
405 409 The seconds component of an `mtime` field described below,
406 410 as a 32-bit integer.
407 411 Unlike in dirstate-v1, negative values are not used.
408 412 When `mtime` is used, this is number of seconds since the Unix epoch
409 413 truncated to its lower 31 bits.
410 414
411 415 * Offset 40:
412 416 The nanoseconds component of an `mtime` field described below,
413 417 as a 32-bit integer.
414 418 When `mtime` is used,
415 419 this is the number of nanoseconds since `mtime.seconds`,
416 420 always strictly less than one billion.
417 421
418 422 This may be zero if more precision is not available.
419 423 (This can happen because of limitations in any of Mercurial, Python,
420 424 libc, the operating system, …)
421 425
422 426 When comparing two mtimes and either has this component set to zero,
423 427 the sub-second precision of both should be ignored.
424 428 False positives when checking mtime equality due to clock resolution
425 429 are always possible and the status algorithm needs to deal with them,
426 430 but having too many false negatives could be harmful too.
427 431
428 432 * (Offset 44: end of this node)
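The node components above can be unpacked with a single `struct` format. This is a sketch derived only from the offsets in this list, which end at byte 44; the names `NODE`, `Node`, `read_node`, `node_full_path` and `read_children` are illustrative and are not claimed to match `mercurial/dirstateutils/v2.py`.

```python
import struct
from collections import namedtuple

# One format character per component, in offset order (big-endian, no padding).
NODE = struct.Struct(">IHHIHIIIIHIII")

Node = namedtuple(
    "Node",
    [
        "full_path_start",
        "full_path_length",
        "base_name_start",
        "copy_source_start",
        "copy_source_length",
        "children_start",
        "children_count",
        "descendants_with_entry_count",
        "tracked_descendants_count",
        "flags",
        "size",
        "mtime_seconds",
        "mtime_nanoseconds",
    ],
)


def read_node(data, offset):
    """Unpack the node stored `offset` bytes after the start of the data file."""
    return Node(*NODE.unpack_from(data, offset))


def node_full_path(data, node):
    """Follow the path pseudo-pointer of a node."""
    start = node.full_path_start
    return bytes(data[start : start + node.full_path_length])


def read_children(data, node):
    """Child nodes are contiguous, so one pseudo-pointer and a count suffice."""
    return [
        read_node(data, node.children_start + index * NODE.size)
        for index in range(node.children_count)
    ]
```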
429 433
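The comparison rule for `mtime` given above (ignore sub-second precision when either side has a zero nanoseconds component) can be sketched as follows. Both helper names are hypothetical, and each mtime is assumed to be a `(seconds, nanoseconds)` pair.

```python
def truncate_seconds(seconds):
    """Keep only the lower 31 bits of a seconds count, as stored on disk."""
    return seconds & 0x7FFFFFFF


def mtime_likely_equal(stored, observed):
    """Compare two (seconds, nanoseconds) pairs using the rule above."""
    stored_seconds, stored_nanos = stored
    observed_seconds, observed_nanos = observed
    if stored_seconds != observed_seconds:
        return False
    if stored_nanos == 0 or observed_nanos == 0:
        # One side has no sub-second precision: compare whole seconds only.
        return True
    return stored_nanos == observed_nanos
```

A `True` result can still be a false positive caused by clock resolution, so callers should treat it as "possibly unchanged" in the same way the status algorithm does.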
430 434 The meaning of the boolean values packed in `flags` is:
431 435
432 436 `WDIR_TRACKED`
433 437 Set if the working directory contains a tracked file at this node’s path.
434 438 This is typically set and unset by `hg add` and `hg rm`.
435 439
436 440 `P1_TRACKED`
437 441 Set if the working directory’s first parent changeset
438 442 (whose node identifier is found in tree metadata)
439 443 contains a tracked file at this node’s path.
440 444 This is a cache to reduce manifest lookups.
441 445
442 446 `P2_INFO`
443 447 Set if the file has been involved in some merge operation.
444 448 Either because it was actually merged,
445 449 or because the version in the second parent (p2) was ahead,
446 450 or because some rename moved it there.
447 451 In any case `hg status` will want it displayed as modified.
448 452
449 453 Files that would be mentioned at all in the `dirstate-v1` file format
450 454 have a node with at least one of the above three bits set in `dirstate-v2`.
451 455 Let’s call these files "tracked anywhere",
452 456 and "untracked" the nodes with all three of these bits unset.
453 457 Untracked nodes are typically for directories:
454 458 they hold child nodes and form the tree structure.
456 460 Although implementations should strive to clean up nodes
457 461 that are entirely unused, other untracked nodes may also exist.
458 462 For example, a future version of Mercurial might in some cases
459 463 add nodes for untracked files and/or ignored files in the working directory
460 464 in order to optimize `hg status`
461 465 by enabling it to skip `readdir` in more cases.
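The distinction between "tracked anywhere" and "untracked" nodes comes down to a mask over the first three flag bits. A minimal sketch, with bit values copied from the list of masks above and hypothetical helper names:

```python
WDIR_TRACKED = 1 << 0
P1_TRACKED = 1 << 1
P2_INFO = 1 << 2

TRACKED_ANYWHERE_MASK = WDIR_TRACKED | P1_TRACKED | P2_INFO


def is_tracked_anywhere(flags):
    """True if the node would be mentioned at all in `dirstate-v1`."""
    return bool(flags & TRACKED_ANYWHERE_MASK)


def is_untracked(flags):
    """True if all three tracking bits are unset (typically a directory)."""
    return not is_tracked_anywhere(flags)
```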
462 466
463 467 `HAS_MODE_AND_SIZE`
464 468 Must be unset for untracked nodes.
465 469 For files tracked anywhere, if this is set:
466 470 - The `size` field is the expected file size,
467 471 in bytes, truncated to its lower 31 bits.
468 472 - The expected execute permission for the file’s owner
469 473 is given by `MODE_EXEC_PERM`
470 474 - The expected file type is given by `MODE_IS_SYMLINK`:
471 475 a symbolic link if set, or a normal file if unset.
472 476 If this is unset the expected size, permission, and file type are unknown.
473 477 The `size` field is unused (set to zero).
474 478
475 479 `HAS_FILE_MTIME`
476 480 Must be unset for untracked nodes.
477 481 If this and `HAS_DIRECTORY_MTIME` are both unset,
478 482 the `mtime` field is unused (set to zero).
479 483 If this is set, `mtime` is the expected modification time.
480 484
481 485 `HAS_DIRECTORY_MTIME`
482 486 Must be unset for files tracked anywhere.
483 487 If this and `HAS_FILE_MTIME` are both unset,
484 488 the `mtime` field is unused (set to zero).
485 489 If this is set, at some point,
486 490 this path in the working directory was observed:
487 491
488 492 - To be a directory
489 493 - With the modification time given in `mtime`
490 494 - That time was already strictly in the past when observed,
491 495 meaning that later changes cannot happen in the same clock tick
492 496 and must cause a different modification time
493 497 (unless the system clock jumps back and we get unlucky,
494 498 which is not impossible but deemed unlikely enough).
495 499 - All direct children of this directory
496 500 (as returned by `std::fs::read_dir`)
497 501 either have a corresponding dirstate node,
498 502 or are ignored by ignore patterns whose hash is in tree metadata.
499 503
500 504 This means that if `std::fs::symlink_metadata` later reports
501 505 the same modification time
502 506 and ignored patterns haven’t changed,
503 507 a run of status that is not listing ignored files
504 508 can skip calling `std::fs::read_dir` again for this directory,
505 509 and iterate child dirstate nodes instead.
506 510
507 511 `MODE_EXEC_PERM`
508 512 Must be unset if `HAS_MODE_AND_SIZE` is unset.
509 513 If `HAS_MODE_AND_SIZE` is set,
510 514 this indicates whether the file’s owner is expected
511 515 to have execute permission.
512 516
513 517 `MODE_IS_SYMLINK`
514 518 Must be unset if `HAS_MODE_AND_SIZE` is unset.
515 519 If `HAS_MODE_AND_SIZE` is set,
516 520 this indicates whether the file is expected to be a symlink
517 521 as opposed to a normal file.
518 522
519 523 `EXPECTED_STATE_IS_MODIFIED`
520 524 Must be unset for untracked nodes.
521 525 For:
522 526 - a file tracked anywhere
523 527 - that has expected metadata (`HAS_MODE_AND_SIZE` and `HAS_FILE_MTIME`)
524 528 - if that metadata matches
525 529 metadata found in the working directory with `stat`
526 530 This bit indicates the status of the file.
527 531 If set, the status is modified. If unset, it is clean.
528 532
529 533 In cases where `hg status` needs to read the contents of a file
530 534 because metadata is ambiguous, this bit lets it record the result
531 535 if the result is modified so that a future run of `hg status`
532 536 does not need to do the same again.
533 537 It is valid to never set this bit,
534 538 and consider expected metadata ambiguous if it is set.
535 539
536 540 `ALL_UNKNOWN_RECORDED`
537 541 If set, all "unknown" children existing on disk (at the time of the last
538 542 status) have been recorded and the `mtime` associated with
539 543 `HAS_DIRECTORY_MTIME` can be used for optimization even when "unknown" files
540 544 are listed.
541 545
542 546 Note that the number of recorded "unknown" children can still be zero if none
543 547 were present.
544 548
545 549 Also note that having this flag unset does not imply that no "unknown"
546 550 children have been recorded. Some might be present, but there is no guarantee
547 551 that it will be all of them.
548 552
549 553 `ALL_IGNORED_RECORDED`
550 554 If set, all "ignored" children existing on disk (at the time of the last
551 555 status) have been recorded and the `mtime` associated with
552 556 `HAS_DIRECTORY_MTIME` can be used for optimization even when "ignored" files
553 557 are listed.
554 558
555 559 Note that the number of recorded "ignored" children can still be zero if none
556 560 were present.
557 561
558 562 Also note that having this flag unset does not imply that no "ignored"
559 563 children have been recorded. Some might be present, but there is no guarantee
560 564 that it will be all of them.
565
566 `HAS_FALLBACK_EXEC`
567 If this flag is set, the entry carries "fallback" information for the
568 executable bit in the `FALLBACK_EXEC` flag.
569
570 Fallback information can be stored in the dirstate to keep track of
571 filesystem attributes tracked by Mercurial when the underlying file
572 system or operating system does not support that property (e.g.
573 Windows).
574
575 `FALLBACK_EXEC`
576 Should be ignored if `HAS_FALLBACK_EXEC` is unset. If set the file for this
577 entry should be considered executable if that information cannot be
578 extracted from the file system. If unset it should be considered
579 non-executable instead.
580
581 `HAS_FALLBACK_SYMLINK`
582 If this flag is set, the entry carries "fallback" information for symbolic
583 link status in the `FALLBACK_SYMLINK` flag.
584
585 Fallback information can be stored in the dirstate to keep track of
586 filesystem attributes tracked by Mercurial when the underlying file
587 system or operating system does not support that property (e.g.
588 Windows).
589
590 `FALLBACK_SYMLINK`
591 Should be ignored if `HAS_FALLBACK_SYMLINK` is unset. If set the file for
592 this entry should be considered a symlink if that information cannot be
593 extracted from the file system. If unset it should be considered a normal
594 file instead.
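Each fallback value is effectively tri-state: unknown, true, or false. The sketch below shows one way to decode and encode the four fallback bits, with the bit values copied from the list of masks above. The function names are hypothetical, and the encoder is an assumption about the inverse mapping rather than a copy of Mercurial's writer.

```python
HAS_FALLBACK_EXEC = 1 << 11
FALLBACK_EXEC = 1 << 12
HAS_FALLBACK_SYMLINK = 1 << 13
FALLBACK_SYMLINK = 1 << 14


def decode_fallbacks(flags):
    """Return (fallback_exec, fallback_symlink), each None, True or False."""
    fallback_exec = None
    if flags & HAS_FALLBACK_EXEC:
        fallback_exec = bool(flags & FALLBACK_EXEC)
    fallback_symlink = None
    if flags & HAS_FALLBACK_SYMLINK:
        fallback_symlink = bool(flags & FALLBACK_SYMLINK)
    return fallback_exec, fallback_symlink


def encode_fallbacks(fallback_exec, fallback_symlink):
    """Return the flag bits for two tri-state fallback values."""
    flags = 0
    if fallback_exec is not None:
        flags |= HAS_FALLBACK_EXEC
        if fallback_exec:
            flags |= FALLBACK_EXEC
    if fallback_symlink is not None:
        flags |= HAS_FALLBACK_SYMLINK
        if fallback_symlink:
            flags |= FALLBACK_SYMLINK
    return flags
```

Keeping a separate `HAS_*` bit is what lets a stored "false" be told apart from "no information".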
@@ -1,890 +1,915 b''
1 1 # parsers.py - Python implementation of parsers.c
2 2 #
3 3 # Copyright 2009 Olivia Mackall <olivia@selenic.com> and others
4 4 #
5 5 # This software may be used and distributed according to the terms of the
6 6 # GNU General Public License version 2 or any later version.
7 7
8 8 from __future__ import absolute_import
9 9
10 10 import stat
11 11 import struct
12 12 import zlib
13 13
14 14 from ..node import (
15 15 nullrev,
16 16 sha1nodeconstants,
17 17 )
18 18 from ..thirdparty import attr
19 19 from .. import (
20 20 error,
21 21 pycompat,
22 22 revlogutils,
23 23 util,
24 24 )
25 25
26 26 from ..revlogutils import nodemap as nodemaputil
27 27 from ..revlogutils import constants as revlog_constants
28 28
29 29 stringio = pycompat.bytesio
30 30
31 31
32 32 _pack = struct.pack
33 33 _unpack = struct.unpack
34 34 _compress = zlib.compress
35 35 _decompress = zlib.decompress
36 36
37 37
38 38 # a special value used internally for `size` if the file comes from the other parent
39 39 FROM_P2 = -2
40 40
41 41 # a special value used internally for `size` if the file is modified/merged/added
42 42 NONNORMAL = -1
43 43
44 44 # a special value used internally for `time` if the time is ambiguous
45 45 AMBIGUOUS_TIME = -1
46 46
47 47 # Bits of the `flags` field inside a node in the file format
48 48 DIRSTATE_V2_WDIR_TRACKED = 1 << 0
49 49 DIRSTATE_V2_P1_TRACKED = 1 << 1
50 50 DIRSTATE_V2_P2_INFO = 1 << 2
51 51 DIRSTATE_V2_HAS_MODE_AND_SIZE = 1 << 3
52 52 DIRSTATE_V2_HAS_FILE_MTIME = 1 << 4
53 53 _DIRSTATE_V2_HAS_DIRCTORY_MTIME = 1 << 5 # Unused when Rust is not available
54 54 DIRSTATE_V2_MODE_EXEC_PERM = 1 << 6
55 55 DIRSTATE_V2_MODE_IS_SYMLINK = 1 << 7
56 56 DIRSTATE_V2_EXPECTED_STATE_IS_MODIFIED = 1 << 8
57 57 DIRSTATE_V2_ALL_UNKNOWN_RECORDED = 1 << 9
58 58 DIRSTATE_V2_ALL_IGNORED_RECORDED = 1 << 10
59 DIRSTATE_V2_HAS_FALLBACK_EXEC = 1 << 11
60 DIRSTATE_V2_FALLBACK_EXEC = 1 << 12
61 DIRSTATE_V2_HAS_FALLBACK_SYMLINK = 1 << 13
62 DIRSTATE_V2_FALLBACK_SYMLINK = 1 << 14
59 63
60 64
61 65 @attr.s(slots=True, init=False)
62 66 class DirstateItem(object):
63 67 """represent a dirstate entry
64 68
65 69 It holds multiple attributes
66 70
67 71 # about file tracking
68 72 - wc_tracked: is the file tracked by the working copy
69 73 - p1_tracked: is the file tracked in working copy first parent
70 74 - p2_info: the file has been involved in some merge operation. Either
71 75 because it was actually merged, or because the p2 version was
72 76 ahead, or because some rename moved it there. In either case
73 77 `hg status` will want it displayed as modified.
74 78
75 79 # about the file state expected from p1 manifest:
76 80 - mode: the file mode in p1
77 81 - size: the file size in p1
78 82
79 83 These values can be set to None, which means we don't have a meaningful value
80 84 to compare with. Either because we don't really care about them as their
81 85 `status` is known without having to look at the disk or because we don't
82 86 know these right now and a full comparison will be needed to find out if
83 87 the file is clean.
84 88
85 89 # about the file state on disk last time we saw it:
86 90 - mtime: the last known clean mtime for the file.
87 91
88 92 This value can be set to None if no cacheable state exists. Either because we
89 93 do not care (see previous section) or because we could not cache something
90 94 yet.
91 95 """
92 96
93 97 _wc_tracked = attr.ib()
94 98 _p1_tracked = attr.ib()
95 99 _p2_info = attr.ib()
96 100 _mode = attr.ib()
97 101 _size = attr.ib()
98 102 _mtime = attr.ib()
99 103 _fallback_exec = attr.ib()
100 104 _fallback_symlink = attr.ib()
101 105
102 106 def __init__(
103 107 self,
104 108 wc_tracked=False,
105 109 p1_tracked=False,
106 110 p2_info=False,
107 111 has_meaningful_data=True,
108 112 has_meaningful_mtime=True,
109 113 parentfiledata=None,
110 114 fallback_exec=None,
111 115 fallback_symlink=None,
112 116 ):
113 117 self._wc_tracked = wc_tracked
114 118 self._p1_tracked = p1_tracked
115 119 self._p2_info = p2_info
116 120
117 121 self._fallback_exec = fallback_exec
118 122 self._fallback_symlink = fallback_symlink
119 123
120 124 self._mode = None
121 125 self._size = None
122 126 self._mtime = None
123 127 if parentfiledata is None:
124 128 has_meaningful_mtime = False
125 129 has_meaningful_data = False
126 130 if has_meaningful_data:
127 131 self._mode = parentfiledata[0]
128 132 self._size = parentfiledata[1]
129 133 if has_meaningful_mtime:
130 134 self._mtime = parentfiledata[2]
131 135
132 136 @classmethod
133 137 def from_v2_data(cls, flags, size, mtime):
134 138 """Build a new DirstateItem object from V2 data"""
135 139 has_mode_size = bool(flags & DIRSTATE_V2_HAS_MODE_AND_SIZE)
136 140 has_meaningful_mtime = bool(flags & DIRSTATE_V2_HAS_FILE_MTIME)
137 141 mode = None
138 142
139 143 if flags & DIRSTATE_V2_EXPECTED_STATE_IS_MODIFIED:
140 144 # we do not have support for this flag in the code yet,
141 145 # force a lookup for this file.
142 146 has_mode_size = False
143 147 has_meaningful_mtime = False
144 148
149 fallback_exec = None
150 if flags & DIRSTATE_V2_HAS_FALLBACK_EXEC:
151 fallback_exec = flags & DIRSTATE_V2_FALLBACK_EXEC
152
153 fallback_symlink = None
154 if flags & DIRSTATE_V2_HAS_FALLBACK_SYMLINK:
155 fallback_symlink = flags & DIRSTATE_V2_FALLBACK_SYMLINK
156
145 157 if has_mode_size:
146 158 assert stat.S_IXUSR == 0o100
147 159 if flags & DIRSTATE_V2_MODE_EXEC_PERM:
148 160 mode = 0o755
149 161 else:
150 162 mode = 0o644
151 163 if flags & DIRSTATE_V2_MODE_IS_SYMLINK:
152 164 mode |= stat.S_IFLNK
153 165 else:
154 166 mode |= stat.S_IFREG
155 167 return cls(
156 168 wc_tracked=bool(flags & DIRSTATE_V2_WDIR_TRACKED),
157 169 p1_tracked=bool(flags & DIRSTATE_V2_P1_TRACKED),
158 170 p2_info=bool(flags & DIRSTATE_V2_P2_INFO),
159 171 has_meaningful_data=has_mode_size,
160 172 has_meaningful_mtime=has_meaningful_mtime,
161 173 parentfiledata=(mode, size, mtime),
174 fallback_exec=fallback_exec,
175 fallback_symlink=fallback_symlink,
162 176 )
163 177
164 178 @classmethod
165 179 def from_v1_data(cls, state, mode, size, mtime):
166 180 """Build a new DirstateItem object from V1 data
167 181
168 182 Since the dirstate-v1 format is frozen, the signature of this function
169 183 is not expected to change, unlike the __init__ one.
170 184 """
171 185 if state == b'm':
172 186 return cls(wc_tracked=True, p1_tracked=True, p2_info=True)
173 187 elif state == b'a':
174 188 return cls(wc_tracked=True)
175 189 elif state == b'r':
176 190 if size == NONNORMAL:
177 191 p1_tracked = True
178 192 p2_info = True
179 193 elif size == FROM_P2:
180 194 p1_tracked = False
181 195 p2_info = True
182 196 else:
183 197 p1_tracked = True
184 198 p2_info = False
185 199 return cls(p1_tracked=p1_tracked, p2_info=p2_info)
186 200 elif state == b'n':
187 201 if size == FROM_P2:
188 202 return cls(wc_tracked=True, p2_info=True)
189 203 elif size == NONNORMAL:
190 204 return cls(wc_tracked=True, p1_tracked=True)
191 205 elif mtime == AMBIGUOUS_TIME:
192 206 return cls(
193 207 wc_tracked=True,
194 208 p1_tracked=True,
195 209 has_meaningful_mtime=False,
196 210 parentfiledata=(mode, size, 42),
197 211 )
198 212 else:
199 213 return cls(
200 214 wc_tracked=True,
201 215 p1_tracked=True,
202 216 parentfiledata=(mode, size, mtime),
203 217 )
204 218 else:
205 219 raise RuntimeError(b'unknown state: %s' % state)
206 220
207 221 def set_possibly_dirty(self):
208 222 """Mark a file as "possibly dirty"
209 223
210 224 This means the next status call will have to actually check its content
211 225 to make sure it is correct.
212 226 """
213 227 self._mtime = None
214 228
215 229 def set_clean(self, mode, size, mtime):
216 230 """mark a file as "clean", cancelling any earlier "possibly dirty" call
217 231
218 232 Note: this function is a descendant of `dirstate.normal` and is
219 233 currently expected to be called on "normal" entries only. There is no
220 234 reason for this not to change in the future as long as the code is
221 235 updated to preserve the proper state of the non-normal files.
222 236 """
223 237 self._wc_tracked = True
224 238 self._p1_tracked = True
225 239 self._mode = mode
226 240 self._size = size
227 241 self._mtime = mtime
228 242
229 243 def set_tracked(self):
230 244 """mark a file as tracked in the working copy
231 245
232 246 This will ultimately be called by commands like `hg add`.
233 247 """
234 248 self._wc_tracked = True
235 249 # `set_tracked` is replacing various `normallookup` call. So we mark
236 250 # the files as needing lookup
237 251 #
238 252 # Consider dropping this in the future in favor of something less broad.
239 253 self._mtime = None
240 254
241 255 def set_untracked(self):
242 256 """mark a file as untracked in the working copy
243 257
244 258 This will ultimately be called by commands like `hg remove`.
245 259 """
246 260 self._wc_tracked = False
247 261 self._mode = None
248 262 self._size = None
249 263 self._mtime = None
250 264
251 265 def drop_merge_data(self):
252 266 """remove all "merge-only" data from a DirstateItem
253 267
254 268 This is to be called by the dirstatemap code when the second parent is dropped
255 269 """
256 270 if self._p2_info:
257 271 self._p2_info = False
258 272 self._mode = None
259 273 self._size = None
260 274 self._mtime = None
261 275
262 276 @property
263 277 def mode(self):
264 278 return self.v1_mode()
265 279
266 280 @property
267 281 def size(self):
268 282 return self.v1_size()
269 283
270 284 @property
271 285 def mtime(self):
272 286 return self.v1_mtime()
273 287
274 288 @property
275 289 def state(self):
276 290 """
277 291 States are:
278 292 n normal
279 293 m needs merging
280 294 r marked for removal
281 295 a marked for addition
282 296
283 297 XXX This "state" is a bit obscure and mostly a direct expression of the
284 298 dirstatev1 format. It would make sense to ultimately deprecate it in
285 299 favor of the more "semantic" attributes.
286 300 """
287 301 if not self.any_tracked:
288 302 return b'?'
289 303 return self.v1_state()
290 304
291 305 @property
292 306 def has_fallback_exec(self):
293 307 """True if "fallback" information is available for the "exec" bit
294 308
295 309 Fallback information can be stored in the dirstate to keep track of
296 310 filesystem attributes tracked by Mercurial when the underlying file
297 311 system or operating system does not support that property (e.g.
298 312 Windows).
299 313
300 314 Not all versions of the dirstate on-disk storage support preserving this
301 315 information.
302 316 """
303 317 return self._fallback_exec is not None
304 318
305 319 @property
306 320 def fallback_exec(self):
307 321 """ "fallback" information for the executable bit
308 322
309 323 True if the file should be considered executable when we cannot get
310 324 this information from the file system. False if it should be
311 325 considered non-executable.
312 326
313 327 See has_fallback_exec for details."""
314 328 return self._fallback_exec
315 329
316 330 @fallback_exec.setter
317 331 def fallback_exec(self, value):
318 332 """control "fallback" executable bit
319 333
320 334 Set to:
321 335 - True if the file should be considered executable,
322 336 - False if the file should be considered non-executable,
323 337 - None if we do not have valid fallback data.
324 338
325 339 See has_fallback_exec for details."""
326 340 if value is None:
327 341 self._fallback_exec = None
328 342 else:
329 343 self._fallback_exec = bool(value)
330 344
331 345 @property
332 346 def has_fallback_symlink(self):
333 347 """True if "fallback" information is available for symlink status
334 348
335 349 Fallback information can be stored in the dirstate to keep track of
336 350 filesystem attributes tracked by Mercurial when the underlying file
337 351 system or operating system does not support that property (e.g.
338 352 Windows).
339 353
340 354 Not all versions of the dirstate on-disk storage support preserving this
341 355 information."""
342 356 return self._fallback_symlink is not None
343 357
344 358 @property
345 359 def fallback_symlink(self):
346 360 """ "fallback" information for symlink status
347 361
348 362 True if the file should be considered a symlink when we cannot get
349 363 this information from the file system. False if it should be
350 364 considered not a symlink.
351 365
352 366 See has_fallback_symlink for details."""
353 367 return self._fallback_symlink
354 368
355 369 @fallback_symlink.setter
356 370 def fallback_symlink(self, value):
357 371 """control "fallback" symlink status
358 372
359 373 Set to:
360 374 - True if the file should be considered a symlink,
361 375 - False if the file should be considered not a symlink,
362 376 - None if we do not have valid fallback data.
363 377
364 378 See has_fallback_symlink for details."""
365 379 if value is None:
366 380 self._fallback_symlink = None
367 381 else:
368 382 self._fallback_symlink = bool(value)
369 383
370 384 @property
371 385 def tracked(self):
372 386 """True if the file is tracked in the working copy"""
373 387 return self._wc_tracked
374 388
375 389 @property
376 390 def any_tracked(self):
377 391 """True if the file is tracked anywhere (wc or parents)"""
378 392 return self._wc_tracked or self._p1_tracked or self._p2_info
379 393
380 394 @property
381 395 def added(self):
382 396 """True if the file has been added"""
383 397 return self._wc_tracked and not (self._p1_tracked or self._p2_info)
384 398
385 399 @property
386 400 def maybe_clean(self):
387 401 """True if the file has a chance to be in the "clean" state"""
388 402 if not self._wc_tracked:
389 403 return False
390 404 elif not self._p1_tracked:
391 405 return False
392 406 elif self._p2_info:
393 407 return False
394 408 return True
395 409
396 410 @property
397 411 def p1_tracked(self):
398 412 """True if the file is tracked in the first parent manifest"""
399 413 return self._p1_tracked
400 414
401 415 @property
402 416 def p2_info(self):
403 417 """True if the file needed to merge or apply any input from p2
404 418
405 419 See the class documentation for details.
406 420 """
407 421 return self._wc_tracked and self._p2_info
408 422
409 423 @property
410 424 def removed(self):
411 425 """True if the file has been removed"""
412 426 return not self._wc_tracked and (self._p1_tracked or self._p2_info)
413 427
414 428 def v2_data(self):
415 429 """Returns (flags, size, mtime) for v2 serialization"""
416 430 flags = 0
417 431 if self._wc_tracked:
418 432 flags |= DIRSTATE_V2_WDIR_TRACKED
419 433 if self._p1_tracked:
420 434 flags |= DIRSTATE_V2_P1_TRACKED
421 435 if self._p2_info:
422 436 flags |= DIRSTATE_V2_P2_INFO
423 437 if self._mode is not None and self._size is not None:
424 438 flags |= DIRSTATE_V2_HAS_MODE_AND_SIZE
425 439 if self.mode & stat.S_IXUSR:
426 440 flags |= DIRSTATE_V2_MODE_EXEC_PERM
427 441 if stat.S_ISLNK(self.mode):
428 442 flags |= DIRSTATE_V2_MODE_IS_SYMLINK
429 443 if self._mtime is not None:
430 444 flags |= DIRSTATE_V2_HAS_FILE_MTIME
445
446 if self._fallback_exec is not None:
447 flags |= DIRSTATE_V2_HAS_FALLBACK_EXEC
448 if self._fallback_exec:
449 flags |= DIRSTATE_V2_FALLBACK_EXEC
450
451 if self._fallback_symlink is not None:
452 flags |= DIRSTATE_V2_HAS_FALLBACK_SYMLINK
453 if self._fallback_symlink:
454 flags |= DIRSTATE_V2_FALLBACK_SYMLINK
455
431 456 # Note: we do not need to do anything regarding
432 457 # DIRSTATE_V2_ALL_UNKNOWN_RECORDED and DIRSTATE_V2_ALL_IGNORED_RECORDED
433 458 # since we never set _DIRSTATE_V2_HAS_DIRECTORY_MTIME
434 459 return (flags, self._size or 0, self._mtime or 0)
435 460
436 461 def v1_state(self):
437 462 """return a "state" suitable for v1 serialization"""
438 463 if not self.any_tracked:
439 464 # the object has no state to record, this is -currently-
440 465 # unsupported
441 466 raise RuntimeError('untracked item')
442 467 elif self.removed:
443 468 return b'r'
444 469 elif self._p1_tracked and self._p2_info:
445 470 return b'm'
446 471 elif self.added:
447 472 return b'a'
448 473 else:
449 474 return b'n'
450 475
451 476 def v1_mode(self):
452 477 """return a "mode" suitable for v1 serialization"""
453 478 return self._mode if self._mode is not None else 0
454 479
455 480 def v1_size(self):
456 481 """return a "size" suitable for v1 serialization"""
457 482 if not self.any_tracked:
458 483 # the object has no state to record, this is -currently-
459 484 # unsupported
460 485 raise RuntimeError('untracked item')
461 486 elif self.removed and self._p1_tracked and self._p2_info:
462 487 return NONNORMAL
463 488 elif self._p2_info:
464 489 return FROM_P2
465 490 elif self.removed:
466 491 return 0
467 492 elif self.added:
468 493 return NONNORMAL
469 494 elif self._size is None:
470 495 return NONNORMAL
471 496 else:
472 497 return self._size
473 498
474 499 def v1_mtime(self):
475 500 """return a "mtime" suitable for v1 serialization"""
476 501 if not self.any_tracked:
477 502 # the object has no state to record, this is -currently-
478 503 # unsupported
479 504 raise RuntimeError('untracked item')
480 505 elif self.removed:
481 506 return 0
482 507 elif self._mtime is None:
483 508 return AMBIGUOUS_TIME
484 509 elif self._p2_info:
485 510 return AMBIGUOUS_TIME
486 511 elif not self._p1_tracked:
487 512 return AMBIGUOUS_TIME
488 513 else:
489 514 return self._mtime
490 515
491 516 def need_delay(self, now):
492 517 """True if the stored mtime would be ambiguous with the current time"""
493 518 return self.v1_state() == b'n' and self.v1_mtime() == now
494 519
495 520
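The tri-state fallback values handled by `from_v2_data` and `v2_data` above can be sketched in isolation. This is a minimal illustration only: the four bit values below are hypothetical placeholders (the real `DIRSTATE_V2_*` constants are defined elsewhere in Mercurial), and the helper names `encode_fallback` and `decode_fallback` do not exist in the code base. Unlike the diff, the sketch normalizes decoded values with `bool()`.

```python
# Hypothetical bit values, chosen only for this sketch; the real
# DIRSTATE_V2_* constants live elsewhere in Mercurial.
HAS_FALLBACK_EXEC = 1 << 0
FALLBACK_EXEC = 1 << 1
HAS_FALLBACK_SYMLINK = 1 << 2
FALLBACK_SYMLINK = 1 << 3


def encode_fallback(fallback_exec, fallback_symlink):
    """Pack two tri-state values (True / False / None) into flag bits.

    A "HAS_*" bit records that the value is known at all; the companion
    bit then carries the value itself.
    """
    flags = 0
    if fallback_exec is not None:
        flags |= HAS_FALLBACK_EXEC
        if fallback_exec:
            flags |= FALLBACK_EXEC
    if fallback_symlink is not None:
        flags |= HAS_FALLBACK_SYMLINK
        if fallback_symlink:
            flags |= FALLBACK_SYMLINK
    return flags


def decode_fallback(flags):
    """Inverse of encode_fallback; returns (fallback_exec, fallback_symlink)."""
    fallback_exec = None
    if flags & HAS_FALLBACK_EXEC:
        fallback_exec = bool(flags & FALLBACK_EXEC)
    fallback_symlink = None
    if flags & HAS_FALLBACK_SYMLINK:
        fallback_symlink = bool(flags & FALLBACK_SYMLINK)
    return fallback_exec, fallback_symlink
```

Two bits per attribute are needed because `None` ("no fallback recorded") must stay distinguishable from `False` once the entry has been written to disk and read back.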
496 521 def gettype(q):
497 522 return int(q & 0xFFFF)
498 523
499 524
500 525 class BaseIndexObject(object):
501 526 # Can I be passed to an algorithm implemented in Rust?
502 527 rust_ext_compat = 0
503 528 # Format of an index entry according to Python's `struct` language
504 529 index_format = revlog_constants.INDEX_ENTRY_V1
505 530 # Size of a C unsigned long long int, platform independent
506 531 big_int_size = struct.calcsize(b'>Q')
507 532 # Size of a C long int, platform independent
508 533 int_size = struct.calcsize(b'>i')
509 534 # An empty index entry, used as a default value to be overridden, or nullrev
510 535 null_item = (
511 536 0,
512 537 0,
513 538 0,
514 539 -1,
515 540 -1,
516 541 -1,
517 542 -1,
518 543 sha1nodeconstants.nullid,
519 544 0,
520 545 0,
521 546 revlog_constants.COMP_MODE_INLINE,
522 547 revlog_constants.COMP_MODE_INLINE,
523 548 )
524 549
525 550 @util.propertycache
526 551 def entry_size(self):
527 552 return self.index_format.size
528 553
529 554 @property
530 555 def nodemap(self):
531 556 msg = b"index.nodemap is deprecated, use index.[has_node|rev|get_rev]"
532 557 util.nouideprecwarn(msg, b'5.3', stacklevel=2)
533 558 return self._nodemap
534 559
535 560 @util.propertycache
536 561 def _nodemap(self):
537 562 nodemap = nodemaputil.NodeMap({sha1nodeconstants.nullid: nullrev})
538 563 for r in range(0, len(self)):
539 564 n = self[r][7]
540 565 nodemap[n] = r
541 566 return nodemap
542 567
543 568 def has_node(self, node):
544 569 """return True if the node exists in the index"""
545 570 return node in self._nodemap
546 571
547 572 def rev(self, node):
548 573 """return a revision for a node
549 574
550 575 If the node is unknown, raise a RevlogError"""
551 576 return self._nodemap[node]
552 577
553 578 def get_rev(self, node):
554 579 """return a revision for a node
555 580
556 581 If the node is unknown, return None"""
557 582 return self._nodemap.get(node)
558 583
559 584 def _stripnodes(self, start):
560 585 if '_nodemap' in vars(self):
561 586 for r in range(start, len(self)):
562 587 n = self[r][7]
563 588 del self._nodemap[n]
564 589
565 590 def clearcaches(self):
566 591 self.__dict__.pop('_nodemap', None)
567 592
568 593 def __len__(self):
569 594 return self._lgt + len(self._extra)
570 595
571 596 def append(self, tup):
572 597 if '_nodemap' in vars(self):
573 598 self._nodemap[tup[7]] = len(self)
574 599 data = self._pack_entry(len(self), tup)
575 600 self._extra.append(data)
576 601
577 602 def _pack_entry(self, rev, entry):
578 603 assert entry[8] == 0
579 604 assert entry[9] == 0
580 605 return self.index_format.pack(*entry[:8])
581 606
582 607 def _check_index(self, i):
583 608 if not isinstance(i, int):
584 609 raise TypeError(b"expecting int indexes")
585 610 if i < 0 or i >= len(self):
586 611 raise IndexError
587 612
588 613 def __getitem__(self, i):
589 614 if i == -1:
590 615 return self.null_item
591 616 self._check_index(i)
592 617 if i >= self._lgt:
593 618 data = self._extra[i - self._lgt]
594 619 else:
595 620 index = self._calculate_index(i)
596 621 data = self._data[index : index + self.entry_size]
597 622 r = self._unpack_entry(i, data)
598 623 if self._lgt and i == 0:
599 624 offset = revlogutils.offset_type(0, gettype(r[0]))
600 625 r = (offset,) + r[1:]
601 626 return r
602 627
603 628 def _unpack_entry(self, rev, data):
604 629 r = self.index_format.unpack(data)
605 630 r = r + (
606 631 0,
607 632 0,
608 633 revlog_constants.COMP_MODE_INLINE,
609 634 revlog_constants.COMP_MODE_INLINE,
610 635 )
611 636 return r
612 637
613 638 def pack_header(self, header):
614 639 """pack header information as binary"""
615 640 v_fmt = revlog_constants.INDEX_HEADER
616 641 return v_fmt.pack(header)
617 642
618 643 def entry_binary(self, rev):
619 644 """return the raw binary string representing a revision"""
620 645 entry = self[rev]
621 646 p = revlog_constants.INDEX_ENTRY_V1.pack(*entry[:8])
622 647 if rev == 0:
623 648 p = p[revlog_constants.INDEX_HEADER.size :]
624 649 return p
625 650
626 651
627 652 class IndexObject(BaseIndexObject):
628 653 def __init__(self, data):
629 654 assert len(data) % self.entry_size == 0, (
630 655 len(data),
631 656 self.entry_size,
632 657 len(data) % self.entry_size,
633 658 )
634 659 self._data = data
635 660 self._lgt = len(data) // self.entry_size
636 661 self._extra = []
637 662
638 663 def _calculate_index(self, i):
639 664 return i * self.entry_size
640 665
641 666 def __delitem__(self, i):
642 667 if not isinstance(i, slice) or not i.stop == -1 or i.step is not None:
643 668 raise ValueError(b"deleting slices only supports a:-1 with step 1")
644 669 i = i.start
645 670 self._check_index(i)
646 671 self._stripnodes(i)
647 672 if i < self._lgt:
648 673 self._data = self._data[: i * self.entry_size]
649 674 self._lgt = i
650 675 self._extra = []
651 676 else:
652 677 self._extra = self._extra[: i - self._lgt]
653 678
654 679
655 680 class PersistentNodeMapIndexObject(IndexObject):
656 681 """a Debug oriented class to test persistent nodemap
657 682
658 683 We need a simple python object to test API and higher level behavior. See
659 684 the Rust implementation for more serious usage. This should be used only
660 685 through the dedicated `devel.persistent-nodemap` config.
661 686 """
662 687
663 688 def nodemap_data_all(self):
664 689 """Return bytes containing a full serialization of a nodemap
665 690
666 691 The nodemap should be valid for the full set of revisions in the
667 692 index."""
668 693 return nodemaputil.persistent_data(self)
669 694
670 695 def nodemap_data_incremental(self):
671 696 """Return bytes containing an incremental update to the persistent nodemap
672 697
673 698 This contains the data for an append-only update of the data provided
674 699 in the last call to `update_nodemap_data`.
675 700 """
676 701 if self._nm_root is None:
677 702 return None
678 703 docket = self._nm_docket
679 704 changed, data = nodemaputil.update_persistent_data(
680 705 self, self._nm_root, self._nm_max_idx, self._nm_docket.tip_rev
681 706 )
682 707
683 708 self._nm_root = self._nm_max_idx = self._nm_docket = None
684 709 return docket, changed, data
685 710
686 711 def update_nodemap_data(self, docket, nm_data):
687 712 """provide full block of persisted binary data for a nodemap
688 713
689 714 The data are expected to come from disk. See `nodemap_data_all` for a
690 715 producer of such data."""
691 716 if nm_data is not None:
692 717 self._nm_root, self._nm_max_idx = nodemaputil.parse_data(nm_data)
693 718 if self._nm_root:
694 719 self._nm_docket = docket
695 720 else:
696 721 self._nm_root = self._nm_max_idx = self._nm_docket = None
697 722
698 723
699 724 class InlinedIndexObject(BaseIndexObject):
700 725 def __init__(self, data, inline=0):
701 726 self._data = data
702 727 self._lgt = self._inline_scan(None)
703 728 self._inline_scan(self._lgt)
704 729 self._extra = []
705 730
706 731 def _inline_scan(self, lgt):
707 732 off = 0
708 733 if lgt is not None:
709 734 self._offsets = [0] * lgt
710 735 count = 0
711 736 while off <= len(self._data) - self.entry_size:
712 737 start = off + self.big_int_size
713 738 (s,) = struct.unpack(
714 739 b'>i',
715 740 self._data[start : start + self.int_size],
716 741 )
717 742 if lgt is not None:
718 743 self._offsets[count] = off
719 744 count += 1
720 745 off += self.entry_size + s
721 746 if off != len(self._data):
722 747 raise ValueError(b"corrupted data")
723 748 return count
724 749
725 750 def __delitem__(self, i):
726 751 if not isinstance(i, slice) or not i.stop == -1 or i.step is not None:
727 752 raise ValueError(b"deleting slices only supports a:-1 with step 1")
728 753 i = i.start
729 754 self._check_index(i)
730 755 self._stripnodes(i)
731 756 if i < self._lgt:
732 757 self._offsets = self._offsets[:i]
733 758 self._lgt = i
734 759 self._extra = []
735 760 else:
736 761 self._extra = self._extra[: i - self._lgt]
737 762
738 763 def _calculate_index(self, i):
739 764 return self._offsets[i]
740 765
741 766
742 767 def parse_index2(data, inline, revlogv2=False):
743 768 if not inline:
744 769 cls = IndexObject2 if revlogv2 else IndexObject
745 770 return cls(data), None
746 771 cls = InlinedIndexObject
747 772 return cls(data, inline), (0, data)
748 773
749 774
750 775 def parse_index_cl_v2(data):
751 776 return IndexChangelogV2(data), None
752 777
753 778
754 779 class IndexObject2(IndexObject):
755 780 index_format = revlog_constants.INDEX_ENTRY_V2
756 781
757 782 def replace_sidedata_info(
758 783 self,
759 784 rev,
760 785 sidedata_offset,
761 786 sidedata_length,
762 787 offset_flags,
763 788 compression_mode,
764 789 ):
765 790 """
766 791 Replace an existing index entry's sidedata offset and length with new
767 792 ones.
768 793 This cannot be used outside of the context of sidedata rewriting,
769 794 inside the transaction that creates the revision `rev`.
770 795 """
771 796 if rev < 0:
772 797 raise KeyError
773 798 self._check_index(rev)
774 799 if rev < self._lgt:
775 800 msg = b"cannot rewrite entries outside of this transaction"
776 801 raise KeyError(msg)
777 802 else:
778 803 entry = list(self[rev])
779 804 entry[0] = offset_flags
780 805 entry[8] = sidedata_offset
781 806 entry[9] = sidedata_length
782 807 entry[11] = compression_mode
783 808 entry = tuple(entry)
784 809 new = self._pack_entry(rev, entry)
785 810 self._extra[rev - self._lgt] = new
786 811
787 812 def _unpack_entry(self, rev, data):
788 813 data = self.index_format.unpack(data)
789 814 entry = data[:10]
790 815 data_comp = data[10] & 3
791 816 sidedata_comp = (data[10] & (3 << 2)) >> 2
792 817 return entry + (data_comp, sidedata_comp)
793 818
794 819 def _pack_entry(self, rev, entry):
795 820 data = entry[:10]
796 821 data_comp = entry[10] & 3
797 822 sidedata_comp = (entry[11] & 3) << 2
798 823 data += (data_comp | sidedata_comp,)
799 824
800 825 return self.index_format.pack(*data)
801 826
802 827 def entry_binary(self, rev):
803 828 """return the raw binary string representing a revision"""
804 829 entry = self[rev]
805 830 return self._pack_entry(rev, entry)
806 831
807 832 def pack_header(self, header):
808 833 """pack header information as binary"""
809 834 msg = 'version header should go in the docket, not the index: %d'
810 835 msg %= header
811 836 raise error.ProgrammingError(msg)
812 837
813 838
814 839 class IndexChangelogV2(IndexObject2):
815 840 index_format = revlog_constants.INDEX_ENTRY_CL_V2
816 841
817 842 def _unpack_entry(self, rev, data, r=True):
818 843 items = self.index_format.unpack(data)
819 844 entry = items[:3] + (rev, rev) + items[3:8]
820 845 data_comp = items[8] & 3
821 846 sidedata_comp = (items[8] >> 2) & 3
822 847 return entry + (data_comp, sidedata_comp)
823 848
824 849 def _pack_entry(self, rev, entry):
825 850 assert entry[3] == rev, entry[3]
826 851 assert entry[4] == rev, entry[4]
827 852 data = entry[:3] + entry[5:10]
828 853 data_comp = entry[10] & 3
829 854 sidedata_comp = (entry[11] & 3) << 2
830 855 data += (data_comp | sidedata_comp,)
831 856 return self.index_format.pack(*data)
832 857
833 858
834 859 def parse_index_devel_nodemap(data, inline):
835 860 """like parse_index2, but always returns a PersistentNodeMapIndexObject"""
836 861 return PersistentNodeMapIndexObject(data), None
837 862
838 863
839 864 def parse_dirstate(dmap, copymap, st):
840 865 parents = [st[:20], st[20:40]]
841 866 # dereference fields so they will be local in loop
842 867 format = b">cllll"
843 868 e_size = struct.calcsize(format)
844 869 pos1 = 40
845 870 l = len(st)
846 871
847 872 # the inner loop
848 873 while pos1 < l:
849 874 pos2 = pos1 + e_size
850 875 e = _unpack(b">cllll", st[pos1:pos2]) # a literal here is faster
851 876 pos1 = pos2 + e[4]
852 877 f = st[pos2:pos1]
853 878 if b'\0' in f:
854 879 f, c = f.split(b'\0')
855 880 copymap[f] = c
856 881 dmap[f] = DirstateItem.from_v1_data(*e[:4])
857 882 return parents
858 883
859 884
860 885 def pack_dirstate(dmap, copymap, pl, now):
861 886 now = int(now)
862 887 cs = stringio()
863 888 write = cs.write
864 889 write(b"".join(pl))
865 890 for f, e in pycompat.iteritems(dmap):
866 891 if e.need_delay(now):
867 892 # The file was last modified "simultaneously" with the current
868 893 # write to dirstate (i.e. within the same second for file-
869 894 # systems with a granularity of 1 sec). This commonly happens
870 895 # for at least a couple of files on 'update'.
871 896 # The user could change the file without changing its size
872 897 # within the same second. Invalidate the file's mtime in
873 898 # dirstate, forcing future 'status' calls to compare the
874 899 # contents of the file if the size is the same. This prevents
875 900 # mistakenly treating such files as clean.
876 901 e.set_possibly_dirty()
877 902
878 903 if f in copymap:
879 904 f = b"%s\0%s" % (f, copymap[f])
880 905 e = _pack(
881 906 b">cllll",
882 907 e.v1_state(),
883 908 e.v1_mode(),
884 909 e.v1_size(),
885 910 e.v1_mtime(),
886 911 len(f),
887 912 )
888 913 write(e)
889 914 write(f)
890 915 return cs.getvalue()
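The dirstate-v1 entry layout used by `parse_dirstate` and `pack_dirstate` above can be tried out on its own. The following is a rough sketch, not Mercurial's implementation: `pack_entry` and `unpack_entries` are hypothetical helper names, and the 40-byte parents header that precedes the entries in a real file is left out (callers would pass `start=40`).

```python
import struct

# state (1 byte), mode, size, mtime, filename length: big-endian, signed 32-bit
V1_ENTRY = struct.Struct(">cllll")


def pack_entry(state, mode, size, mtime, filename, copy_source=None):
    """Serialize one entry; a copy source is appended after a NUL byte."""
    if copy_source is not None:
        filename = filename + b"\0" + copy_source
    return V1_ENTRY.pack(state, mode, size, mtime, len(filename)) + filename


def unpack_entries(data, start=0):
    """Yield (filename, copy_source, (state, mode, size, mtime)) tuples."""
    pos = start
    while pos < len(data):
        end = pos + V1_ENTRY.size
        state, mode, size, mtime, name_len = V1_ENTRY.unpack(data[pos:end])
        pos = end + name_len
        name = data[end:pos]
        copy_source = None
        if b"\0" in name:
            name, copy_source = name.split(b"\0", 1)
        yield name, copy_source, (state, mode, size, mtime)
```

Because every entry has a variable-length name, the only way to find entry N is to walk all entries before it, which is the limitation of the flat v1 format that the introduction of this document describes.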
@@ -1,621 +1,637 b''
1 1 use crate::dirstate_tree::on_disk::DirstateV2ParseError;
2 2 use crate::errors::HgError;
3 3 use bitflags::bitflags;
4 4 use std::convert::{TryFrom, TryInto};
5 5 use std::fs;
6 6 use std::io;
7 7 use std::time::{SystemTime, UNIX_EPOCH};
8 8
9 9 #[derive(Copy, Clone, Debug, Eq, PartialEq)]
10 10 pub enum EntryState {
11 11 Normal,
12 12 Added,
13 13 Removed,
14 14 Merged,
15 15 }
16 16
17 17 /// The C implementation uses all signed types. This will be an issue
18 18 /// either when 4GB+ source files are commonplace or in 2038, whichever
19 19 /// comes first.
20 20 #[derive(Debug, PartialEq, Copy, Clone)]
21 21 pub struct DirstateEntry {
22 22 pub(crate) flags: Flags,
23 23 mode_size: Option<(u32, u32)>,
24 24 mtime: Option<u32>,
25 25 }
26 26
27 27 bitflags! {
28 28 pub(crate) struct Flags: u8 {
29 29 const WDIR_TRACKED = 1 << 0;
30 30 const P1_TRACKED = 1 << 1;
31 31 const P2_INFO = 1 << 2;
32 32 const HAS_FALLBACK_EXEC = 1 << 3;
33 33 const FALLBACK_EXEC = 1 << 4;
34 34 const HAS_FALLBACK_SYMLINK = 1 << 5;
35 35 const FALLBACK_SYMLINK = 1 << 6;
36 36 }
37 37 }
38 38
39 39 /// A Unix timestamp with nanosecond precision
40 40 #[derive(Copy, Clone)]
41 41 pub struct TruncatedTimestamp {
42 42 truncated_seconds: u32,
43 43 /// Always in the `0 .. 1_000_000_000` range.
44 44 nanoseconds: u32,
45 45 }
46 46
47 47 impl TruncatedTimestamp {
48 48 /// Constructs from a timestamp potentially outside of the supported range,
49 49 /// and truncates the seconds component to its lower 31 bits.
50 50 ///
51 51 /// Panics if the nanoseconds component is not in the expected range.
52 52 pub fn new_truncate(seconds: i64, nanoseconds: u32) -> Self {
53 53 assert!(nanoseconds < NSEC_PER_SEC);
54 54 Self {
55 55 truncated_seconds: seconds as u32 & RANGE_MASK_31BIT,
56 56 nanoseconds,
57 57 }
58 58 }
59 59
60 60 /// Constructs from components. Returns an error if they are not in the
61 61 /// expected range.
62 62 pub fn from_already_truncated(
63 63 truncated_seconds: u32,
64 64 nanoseconds: u32,
65 65 ) -> Result<Self, DirstateV2ParseError> {
66 66 if truncated_seconds & !RANGE_MASK_31BIT == 0
67 67 && nanoseconds < NSEC_PER_SEC
68 68 {
69 69 Ok(Self {
70 70 truncated_seconds,
71 71 nanoseconds,
72 72 })
73 73 } else {
74 74 Err(DirstateV2ParseError)
75 75 }
76 76 }
77 77
78 78 pub fn for_mtime_of(metadata: &fs::Metadata) -> io::Result<Self> {
79 79 #[cfg(unix)]
80 80 {
81 81 use std::os::unix::fs::MetadataExt;
82 82 let seconds = metadata.mtime();
83 83 // i64 -> u32 with value always in the `0 .. NSEC_PER_SEC` range
84 84 let nanoseconds = metadata.mtime_nsec().try_into().unwrap();
85 85 Ok(Self::new_truncate(seconds, nanoseconds))
86 86 }
87 87 #[cfg(not(unix))]
88 88 {
89 89 metadata.modified().map(Self::from)
90 90 }
91 91 }
92 92
93 93 /// The lower 31 bits of the number of seconds since the epoch.
94 94 pub fn truncated_seconds(&self) -> u32 {
95 95 self.truncated_seconds
96 96 }
97 97
98 98 /// The sub-second component of this timestamp, in nanoseconds.
99 99 /// Always in the `0 .. 1_000_000_000` range.
100 100 ///
101 101 /// This timestamp is after `(seconds, 0)` by this many nanoseconds.
102 102 pub fn nanoseconds(&self) -> u32 {
103 103 self.nanoseconds
104 104 }
105 105
106 106 /// Returns whether two timestamps are equal modulo 2**31 seconds.
107 107 ///
108 108 /// If this returns `true`, the original values converted from `SystemTime`
109 109 /// or given to `new_truncate` were very likely equal. A false positive is
110 110 /// possible if they were exactly a multiple of 2**31 seconds apart (around
111 111 /// 68 years). This is deemed very unlikely to happen by chance, especially
112 112 /// on filesystems that support sub-second precision.
113 113 ///
114 114 /// If someone is manipulating the modification times of some files to
115 115 /// intentionally make `hg status` return incorrect results, not truncating
116 116 /// wouldn’t help much since they can set exactly the expected timestamp.
117 117 pub fn very_likely_equal(self, other: Self) -> bool {
118 118 self.truncated_seconds == other.truncated_seconds
119 119 && self.nanoseconds == other.nanoseconds
120 120 }
121 121
122 122 pub fn very_likely_equal_to_mtime_of(
123 123 self,
124 124 metadata: &fs::Metadata,
125 125 ) -> io::Result<bool> {
126 126 Ok(self.very_likely_equal(Self::for_mtime_of(metadata)?))
127 127 }
128 128 }
129 129
130 130 impl From<SystemTime> for TruncatedTimestamp {
131 131 fn from(system_time: SystemTime) -> Self {
132 132 // On Unix, `SystemTime` is a wrapper for the `timespec` C struct:
133 133 // https://www.gnu.org/software/libc/manual/html_node/Time-Types.html#index-struct-timespec
134 134 // We want to effectively access its fields, but the Rust standard
135 135 // library does not expose them. The best we can do is:
136 136 let seconds;
137 137 let nanoseconds;
138 138 match system_time.duration_since(UNIX_EPOCH) {
139 139 Ok(duration) => {
140 140 seconds = duration.as_secs() as i64;
141 141 nanoseconds = duration.subsec_nanos();
142 142 }
143 143 Err(error) => {
144 144 // `system_time` is before `UNIX_EPOCH`.
145 145 // We need to undo this algorithm:
146 146 // https://github.com/rust-lang/rust/blob/6bed1f0bc3cc50c10aab26d5f94b16a00776b8a5/library/std/src/sys/unix/time.rs#L40-L41
147 147 let negative = error.duration();
148 148 let negative_secs = negative.as_secs() as i64;
149 149 let negative_nanos = negative.subsec_nanos();
150 150 if negative_nanos == 0 {
151 151 seconds = -negative_secs;
152 152 nanoseconds = 0;
153 153 } else {
154 154 // For example if `system_time` was 4.3 seconds before
155 155 // the Unix epoch we get a Duration that represents
156 156 // `(-4, -0.3)` but we want `(-5, +0.7)`:
157 157 seconds = -1 - negative_secs;
158 158 nanoseconds = NSEC_PER_SEC - negative_nanos;
159 159 }
160 160 }
161 161 };
162 162 Self::new_truncate(seconds, nanoseconds)
163 163 }
164 164 }
165 165
166 166 const NSEC_PER_SEC: u32 = 1_000_000_000;
167 167 const RANGE_MASK_31BIT: u32 = 0x7FFF_FFFF;
168 168
169 169 pub const MTIME_UNSET: i32 = -1;
170 170
171 171 /// A `DirstateEntry` with a size of `-2` means that it was merged from the
172 172 /// other parent. This allows revert to pick the right status back during a
173 173 /// merge.
174 174 pub const SIZE_FROM_OTHER_PARENT: i32 = -2;
175 175 /// A special value used for the internal representation of special cases
176 176 /// in the dirstate v1 format.
177 177 pub const SIZE_NON_NORMAL: i32 = -1;
178 178
179 179 impl DirstateEntry {
180 180 pub fn from_v2_data(
181 181 wdir_tracked: bool,
182 182 p1_tracked: bool,
183 183 p2_info: bool,
184 184 mode_size: Option<(u32, u32)>,
185 185 mtime: Option<u32>,
186 186 fallback_exec: Option<bool>,
187 187 fallback_symlink: Option<bool>,
188 188 ) -> Self {
189 189 if let Some((mode, size)) = mode_size {
190 190 // TODO: return an error for out of range values?
191 191 assert!(mode & !RANGE_MASK_31BIT == 0);
192 192 assert!(size & !RANGE_MASK_31BIT == 0);
193 193 }
194 194 if let Some(mtime) = mtime {
195 195 assert!(mtime & !RANGE_MASK_31BIT == 0);
196 196 }
197 197 let mut flags = Flags::empty();
198 198 flags.set(Flags::WDIR_TRACKED, wdir_tracked);
199 199 flags.set(Flags::P1_TRACKED, p1_tracked);
200 200 flags.set(Flags::P2_INFO, p2_info);
201 201 if let Some(exec) = fallback_exec {
202 202 flags.insert(Flags::HAS_FALLBACK_EXEC);
203 203 if exec {
204 204 flags.insert(Flags::FALLBACK_EXEC);
205 205 }
206 206 }
207 207 if let Some(symlink) = fallback_symlink {
208 208 flags.insert(Flags::HAS_FALLBACK_SYMLINK);
209 209 if symlink {
210 210 flags.insert(Flags::FALLBACK_SYMLINK);
211 211 }
212 212 }
213 213 Self {
214 214 flags,
215 215 mode_size,
216 216 mtime,
217 217 }
218 218 }
219 219
220 220 pub fn from_v1_data(
221 221 state: EntryState,
222 222 mode: i32,
223 223 size: i32,
224 224 mtime: i32,
225 225 ) -> Self {
226 226 match state {
227 227 EntryState::Normal => {
228 228 if size == SIZE_FROM_OTHER_PARENT {
229 229 Self {
230 230 // might be missing P1_TRACKED
231 231 flags: Flags::WDIR_TRACKED | Flags::P2_INFO,
232 232 mode_size: None,
233 233 mtime: None,
234 234 }
235 235 } else if size == SIZE_NON_NORMAL {
236 236 Self {
237 237 flags: Flags::WDIR_TRACKED | Flags::P1_TRACKED,
238 238 mode_size: None,
239 239 mtime: None,
240 240 }
241 241 } else if mtime == MTIME_UNSET {
242 242 // TODO: return an error for negative values?
243 243 let mode = u32::try_from(mode).unwrap();
244 244 let size = u32::try_from(size).unwrap();
245 245 Self {
246 246 flags: Flags::WDIR_TRACKED | Flags::P1_TRACKED,
247 247 mode_size: Some((mode, size)),
248 248 mtime: None,
249 249 }
250 250 } else {
251 251 // TODO: return an error for negative values?
252 252 let mode = u32::try_from(mode).unwrap();
253 253 let size = u32::try_from(size).unwrap();
254 254 let mtime = u32::try_from(mtime).unwrap();
255 255 Self {
256 256 flags: Flags::WDIR_TRACKED | Flags::P1_TRACKED,
257 257 mode_size: Some((mode, size)),
258 258 mtime: Some(mtime),
259 259 }
260 260 }
261 261 }
262 262 EntryState::Added => Self {
263 263 flags: Flags::WDIR_TRACKED,
264 264 mode_size: None,
265 265 mtime: None,
266 266 },
267 267 EntryState::Removed => Self {
268 268 flags: if size == SIZE_NON_NORMAL {
269 269 Flags::P1_TRACKED | Flags::P2_INFO
270 270 } else if size == SIZE_FROM_OTHER_PARENT {
271 271 // We don’t know if P1_TRACKED should be set (file history)
272 272 Flags::P2_INFO
273 273 } else {
274 274 Flags::P1_TRACKED
275 275 },
276 276 mode_size: None,
277 277 mtime: None,
278 278 },
279 279 EntryState::Merged => Self {
280 280 flags: Flags::WDIR_TRACKED
281 281 | Flags::P1_TRACKED // might not be true because of rename ?
282 282 | Flags::P2_INFO, // might not be true because of rename ?
283 283 mode_size: None,
284 284 mtime: None,
285 285 },
286 286 }
287 287 }
288 288
289 289 /// Creates a new entry in "removed" state.
290 290 ///
291 291 /// `size` is expected to be zero, `SIZE_NON_NORMAL`, or
292 292 /// `SIZE_FROM_OTHER_PARENT`
293 293 pub fn new_removed(size: i32) -> Self {
294 294 Self::from_v1_data(EntryState::Removed, 0, size, 0)
295 295 }
296 296
297 297 pub fn tracked(&self) -> bool {
298 298 self.flags.contains(Flags::WDIR_TRACKED)
299 299 }
300 300
301 301 pub fn p1_tracked(&self) -> bool {
302 302 self.flags.contains(Flags::P1_TRACKED)
303 303 }
304 304
305 305 fn in_either_parent(&self) -> bool {
306 306 self.flags.intersects(Flags::P1_TRACKED | Flags::P2_INFO)
307 307 }
308 308
309 309 pub fn removed(&self) -> bool {
310 310 self.in_either_parent() && !self.flags.contains(Flags::WDIR_TRACKED)
311 311 }
312 312
313 313 pub fn p2_info(&self) -> bool {
314 314 self.flags.contains(Flags::WDIR_TRACKED | Flags::P2_INFO)
315 315 }
316 316
317 317 pub fn added(&self) -> bool {
318 318 self.flags.contains(Flags::WDIR_TRACKED) && !self.in_either_parent()
319 319 }
320 320
321 321 pub fn maybe_clean(&self) -> bool {
322 322 if !self.flags.contains(Flags::WDIR_TRACKED) {
323 323 false
324 324 } else if !self.flags.contains(Flags::P1_TRACKED) {
325 325 false
326 326 } else if self.flags.contains(Flags::P2_INFO) {
327 327 false
328 328 } else {
329 329 true
330 330 }
331 331 }
332 332
333 333 pub fn any_tracked(&self) -> bool {
334 334 self.flags.intersects(
335 335 Flags::WDIR_TRACKED | Flags::P1_TRACKED | Flags::P2_INFO,
336 336 )
337 337 }
338 338
339 340 /// Returns `(wdir_tracked, p1_tracked, p2_info, mode_size, mtime, fallback_exec, fallback_symlink)`
340 340 pub(crate) fn v2_data(
341 341 &self,
342 ) -> (bool, bool, bool, Option<(u32, u32)>, Option<u32>) {
342 ) -> (
343 bool,
344 bool,
345 bool,
346 Option<(u32, u32)>,
347 Option<u32>,
348 Option<bool>,
349 Option<bool>,
350 ) {
343 351 if !self.any_tracked() {
344 352 // TODO: return an Option instead?
345 353 panic!("Accessing v2_data of an untracked DirstateEntry")
346 354 }
347 355 let wdir_tracked = self.flags.contains(Flags::WDIR_TRACKED);
348 356 let p1_tracked = self.flags.contains(Flags::P1_TRACKED);
349 357 let p2_info = self.flags.contains(Flags::P2_INFO);
350 358 let mode_size = self.mode_size;
351 359 let mtime = self.mtime;
352 (wdir_tracked, p1_tracked, p2_info, mode_size, mtime)
360 (
361 wdir_tracked,
362 p1_tracked,
363 p2_info,
364 mode_size,
365 mtime,
366 self.get_fallback_exec(),
367 self.get_fallback_symlink(),
368 )
353 369 }
354 370
355 371 fn v1_state(&self) -> EntryState {
356 372 if !self.any_tracked() {
357 373 // TODO: return an Option instead?
358 374 panic!("Accessing v1_state of an untracked DirstateEntry")
359 375 }
360 376 if self.removed() {
361 377 EntryState::Removed
362 378 } else if self
363 379 .flags
364 380 .contains(Flags::WDIR_TRACKED | Flags::P1_TRACKED | Flags::P2_INFO)
365 381 {
366 382 EntryState::Merged
367 383 } else if self.added() {
368 384 EntryState::Added
369 385 } else {
370 386 EntryState::Normal
371 387 }
372 388 }
373 389
374 390 fn v1_mode(&self) -> i32 {
375 391 if let Some((mode, _size)) = self.mode_size {
376 392 i32::try_from(mode).unwrap()
377 393 } else {
378 394 0
379 395 }
380 396 }
381 397
382 398 fn v1_size(&self) -> i32 {
383 399 if !self.any_tracked() {
384 400 // TODO: return an Option instead?
385 401 panic!("Accessing v1_size of an untracked DirstateEntry")
386 402 }
387 403 if self.removed()
388 404 && self.flags.contains(Flags::P1_TRACKED | Flags::P2_INFO)
389 405 {
390 406 SIZE_NON_NORMAL
391 407 } else if self.flags.contains(Flags::P2_INFO) {
392 408 SIZE_FROM_OTHER_PARENT
393 409 } else if self.removed() {
394 410 0
395 411 } else if self.added() {
396 412 SIZE_NON_NORMAL
397 413 } else if let Some((_mode, size)) = self.mode_size {
398 414 i32::try_from(size).unwrap()
399 415 } else {
400 416 SIZE_NON_NORMAL
401 417 }
402 418 }
403 419
404 420 fn v1_mtime(&self) -> i32 {
405 421 if !self.any_tracked() {
406 422 // TODO: return an Option instead?
407 423 panic!("Accessing v1_mtime of an untracked DirstateEntry")
408 424 }
409 425 if self.removed() {
410 426 0
411 427 } else if self.flags.contains(Flags::P2_INFO) {
412 428 MTIME_UNSET
413 429 } else if !self.flags.contains(Flags::P1_TRACKED) {
414 430 MTIME_UNSET
415 431 } else if let Some(mtime) = self.mtime {
416 432 i32::try_from(mtime).unwrap()
417 433 } else {
418 434 MTIME_UNSET
419 435 }
420 436 }
421 437
422 438 // TODO: return `Option<EntryState>`? None when `!self.any_tracked`
423 439 pub fn state(&self) -> EntryState {
424 440 self.v1_state()
425 441 }
426 442
427 443 // TODO: return Option?
428 444 pub fn mode(&self) -> i32 {
429 445 self.v1_mode()
430 446 }
431 447
432 448 // TODO: return Option?
433 449 pub fn size(&self) -> i32 {
434 450 self.v1_size()
435 451 }
436 452
437 453 // TODO: return Option?
438 454 pub fn mtime(&self) -> i32 {
439 455 self.v1_mtime()
440 456 }
441 457
442 458 pub fn get_fallback_exec(&self) -> Option<bool> {
443 459 if self.flags.contains(Flags::HAS_FALLBACK_EXEC) {
444 460 Some(self.flags.contains(Flags::FALLBACK_EXEC))
445 461 } else {
446 462 None
447 463 }
448 464 }
449 465
450 466 pub fn set_fallback_exec(&mut self, value: Option<bool>) {
451 467 match value {
452 468 None => {
453 469 self.flags.remove(Flags::HAS_FALLBACK_EXEC);
454 470 self.flags.remove(Flags::FALLBACK_EXEC);
455 471 }
456 472 Some(exec) => {
457 473 self.flags.insert(Flags::HAS_FALLBACK_EXEC);
458 474 if exec {
459 475 self.flags.insert(Flags::FALLBACK_EXEC);
460 476 }
461 477 }
462 478 }
463 479 }
464 480
465 481 pub fn get_fallback_symlink(&self) -> Option<bool> {
466 482 if self.flags.contains(Flags::HAS_FALLBACK_SYMLINK) {
467 483 Some(self.flags.contains(Flags::FALLBACK_SYMLINK))
468 484 } else {
469 485 None
470 486 }
471 487 }
472 488
473 489 pub fn set_fallback_symlink(&mut self, value: Option<bool>) {
474 490 match value {
475 491 None => {
476 492 self.flags.remove(Flags::HAS_FALLBACK_SYMLINK);
477 493 self.flags.remove(Flags::FALLBACK_SYMLINK);
478 494 }
479 495 Some(symlink) => {
480 496 self.flags.insert(Flags::HAS_FALLBACK_SYMLINK);
481 497 if symlink {
482 498 self.flags.insert(Flags::FALLBACK_SYMLINK);
483 499 }
484 500 }
485 501 }
486 502 }
487 503
488 504 pub fn drop_merge_data(&mut self) {
489 505 if self.flags.contains(Flags::P2_INFO) {
490 506 self.flags.remove(Flags::P2_INFO);
491 507 self.mode_size = None;
492 508 self.mtime = None;
493 509 }
494 510 }
495 511
496 512 pub fn set_possibly_dirty(&mut self) {
497 513 self.mtime = None
498 514 }
499 515
500 516 pub fn set_clean(&mut self, mode: u32, size: u32, mtime: u32) {
501 517 let size = size & RANGE_MASK_31BIT;
502 518 let mtime = mtime & RANGE_MASK_31BIT;
503 519 self.flags.insert(Flags::WDIR_TRACKED | Flags::P1_TRACKED);
504 520 self.mode_size = Some((mode, size));
505 521 self.mtime = Some(mtime);
506 522 }
507 523
508 524 pub fn set_tracked(&mut self) {
509 525 self.flags.insert(Flags::WDIR_TRACKED);
510 526 // `set_tracked` replaces various `normallookup` calls, so we mark
511 527 // the file as needing lookup.
512 528 //
513 529 // Consider dropping this in the future in favor of something less
514 530 // broad.
515 531 self.mtime = None;
516 532 }
517 533
518 534 pub fn set_untracked(&mut self) {
519 535 self.flags.remove(Flags::WDIR_TRACKED);
520 536 self.mode_size = None;
521 537 self.mtime = None;
522 538 }
523 539
524 540 /// Returns `(state, mode, size, mtime)` for the purpose of serialization
525 541 /// in the dirstate-v1 format.
526 542 ///
527 543 /// This includes marker values such as `mtime == -1`. In the future we may
528 544 /// want to not represent these cases that way in memory, but serialization
529 545 /// will need to keep the same format.
530 546 pub fn v1_data(&self) -> (u8, i32, i32, i32) {
531 547 (
532 548 self.v1_state().into(),
533 549 self.v1_mode(),
534 550 self.v1_size(),
535 551 self.v1_mtime(),
536 552 )
537 553 }
538 554
539 555 pub(crate) fn is_from_other_parent(&self) -> bool {
540 556 self.state() == EntryState::Normal
541 557 && self.size() == SIZE_FROM_OTHER_PARENT
542 558 }
543 559
544 560 // TODO: other platforms
545 561 #[cfg(unix)]
546 562 pub fn mode_changed(
547 563 &self,
548 564 filesystem_metadata: &std::fs::Metadata,
549 565 ) -> bool {
550 566 use std::os::unix::fs::MetadataExt;
551 567 const EXEC_BIT_MASK: u32 = 0o100;
552 568 let dirstate_exec_bit = (self.mode() as u32) & EXEC_BIT_MASK;
553 569 let fs_exec_bit = filesystem_metadata.mode() & EXEC_BIT_MASK;
554 570 dirstate_exec_bit != fs_exec_bit
555 571 }
556 572
557 573 /// Returns a `(state, mode, size, mtime)` tuple as for
558 574 /// `DirstateMapMethods::debug_iter`.
559 575 pub fn debug_tuple(&self) -> (u8, i32, i32, i32) {
560 576 (self.state().into(), self.mode(), self.size(), self.mtime())
561 577 }
562 578
563 579 pub fn mtime_is_ambiguous(&self, now: i32) -> bool {
564 580 self.state() == EntryState::Normal && self.mtime() == now
565 581 }
566 582
567 583 pub fn clear_ambiguous_mtime(&mut self, now: i32) -> bool {
568 584 let ambiguous = self.mtime_is_ambiguous(now);
569 585 if ambiguous {
570 586 // The file was last modified "simultaneously" with the current
571 587 // write to dirstate (i.e. within the same second for file-
572 588 // systems with a granularity of 1 sec). This commonly happens
573 589 // for at least a couple of files on 'update'.
574 590 // The user could change the file without changing its size
575 591 // within the same second. Invalidate the file's mtime in
576 592 // dirstate, forcing future 'status' calls to compare the
577 593 // contents of the file if the size is the same. This prevents
578 594 // mistakenly treating such files as clean.
579 595 self.set_possibly_dirty()
580 596 }
581 597 ambiguous
582 598 }
583 599 }
584 600
585 601 impl EntryState {
586 602 pub fn is_tracked(self) -> bool {
587 603 use EntryState::*;
588 604 match self {
589 605 Normal | Added | Merged => true,
590 606 Removed => false,
591 607 }
592 608 }
593 609 }
594 610
595 611 impl TryFrom<u8> for EntryState {
596 612 type Error = HgError;
597 613
598 614 fn try_from(value: u8) -> Result<Self, Self::Error> {
599 615 match value {
600 616 b'n' => Ok(EntryState::Normal),
601 617 b'a' => Ok(EntryState::Added),
602 618 b'r' => Ok(EntryState::Removed),
603 619 b'm' => Ok(EntryState::Merged),
604 620 _ => Err(HgError::CorruptedRepository(format!(
605 621 "Incorrect dirstate entry state {}",
606 622 value
607 623 ))),
608 624 }
609 625 }
610 626 }
611 627
612 628 impl Into<u8> for EntryState {
613 629 fn into(self) -> u8 {
614 630 match self {
615 631 EntryState::Normal => b'n',
616 632 EntryState::Added => b'a',
617 633 EntryState::Removed => b'r',
618 634 EntryState::Merged => b'm',
619 635 }
620 636 }
621 637 }
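
The fallback accessors above store an `Option<bool>` in two flag bits: a presence bit (`HAS_FALLBACK_*`) and a value bit (`FALLBACK_*`). The following is a minimal stand-alone sketch of that encoding, not the code from this changeset. It uses plain `u16` constants instead of the `bitflags` crate, the bit positions are copied from the on-disk `Flags` definition in the next file of this diff, and the helper names `encode_fallback` and `decode_fallback` are hypothetical.

```rust
// Bit positions mirror the dirstate-v2 on-disk `Flags` (bits 11 to 14).
const HAS_FALLBACK_EXEC: u16 = 1 << 11;
const FALLBACK_EXEC: u16 = 1 << 12;
const HAS_FALLBACK_SYMLINK: u16 = 1 << 13;
const FALLBACK_SYMLINK: u16 = 1 << 14;

/// Hypothetical helper: store `value` in the (presence, value) bit pair.
fn encode_fallback(
    flags: u16,
    has_bit: u16,
    value_bit: u16,
    value: Option<bool>,
) -> u16 {
    // Clear both bits first so that an earlier `Some(true)` cannot leave a
    // stale value bit behind when the new value is `Some(false)` or `None`.
    let cleared = flags & !(has_bit | value_bit);
    match value {
        None => cleared,
        Some(false) => cleared | has_bit,
        Some(true) => cleared | has_bit | value_bit,
    }
}

/// Hypothetical helper: read the (presence, value) bit pair back.
fn decode_fallback(flags: u16, has_bit: u16, value_bit: u16) -> Option<bool> {
    if flags & has_bit != 0 {
        Some(flags & value_bit != 0)
    } else {
        None
    }
}

fn main() {
    let mut flags = 0u16;
    flags = encode_fallback(flags, HAS_FALLBACK_EXEC, FALLBACK_EXEC, Some(true));
    flags = encode_fallback(
        flags,
        HAS_FALLBACK_SYMLINK,
        FALLBACK_SYMLINK,
        Some(false),
    );
    assert_eq!(
        decode_fallback(flags, HAS_FALLBACK_EXEC, FALLBACK_EXEC),
        Some(true)
    );
    assert_eq!(
        decode_fallback(flags, HAS_FALLBACK_SYMLINK, FALLBACK_SYMLINK),
        Some(false)
    );

    // Overwriting `Some(true)` with `Some(false)` yields `Some(false)`.
    flags = encode_fallback(flags, HAS_FALLBACK_EXEC, FALLBACK_EXEC, Some(false));
    assert_eq!(
        decode_fallback(flags, HAS_FALLBACK_EXEC, FALLBACK_EXEC),
        Some(false)
    );
}
```

One design choice in this sketch differs from the setters shown above: both bits are cleared before re-encoding, so switching from `Some(true)` to `Some(false)` cannot leave the value bit set. Whether the real setters need the same guard depends on how callers use them, which this diff does not show.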
@@ -1,750 +1,773 b''
1 1 //! The "version 2" disk representation of the dirstate
2 2 //!
3 3 //! See `mercurial/helptext/internals/dirstate-v2.txt`
4 4
5 5 use crate::dirstate::TruncatedTimestamp;
6 6 use crate::dirstate_tree::dirstate_map::{self, DirstateMap, NodeRef};
7 7 use crate::dirstate_tree::path_with_basename::WithBasename;
8 8 use crate::errors::HgError;
9 9 use crate::utils::hg_path::HgPath;
10 10 use crate::DirstateEntry;
11 11 use crate::DirstateError;
12 12 use crate::DirstateParents;
13 13 use bitflags::bitflags;
14 14 use bytes_cast::unaligned::{U16Be, U32Be};
15 15 use bytes_cast::BytesCast;
16 16 use format_bytes::format_bytes;
17 17 use std::borrow::Cow;
18 18 use std::convert::{TryFrom, TryInto};
19 19
20 20 /// Added at the start of `.hg/dirstate` when the "v2" format is used.
21 21 /// This is a redundant sanity check more than an actual "magic number" since
22 22 /// `.hg/requires` already governs which format should be used.
23 23 pub const V2_FORMAT_MARKER: &[u8; 12] = b"dirstate-v2\n";
24 24
25 25 /// Keep space for 256-bit hashes
26 26 const STORED_NODE_ID_BYTES: usize = 32;
27 27
28 28 /// … even though only 160 bits are used for now, with SHA-1
29 29 const USED_NODE_ID_BYTES: usize = 20;
30 30
31 31 pub(super) const IGNORE_PATTERNS_HASH_LEN: usize = 20;
32 32 pub(super) type IgnorePatternsHash = [u8; IGNORE_PATTERNS_HASH_LEN];
33 33
34 34 /// Must match constants of the same names in `mercurial/dirstateutils/v2.py`
35 35 const TREE_METADATA_SIZE: usize = 44;
36 36 const NODE_SIZE: usize = 44;
37 37
38 38 /// Make sure that size-affecting changes are made knowingly
39 39 #[allow(unused)]
40 40 fn static_assert_size_of() {
41 41 let _ = std::mem::transmute::<TreeMetadata, [u8; TREE_METADATA_SIZE]>;
42 42 let _ = std::mem::transmute::<DocketHeader, [u8; TREE_METADATA_SIZE + 81]>;
43 43 let _ = std::mem::transmute::<Node, [u8; NODE_SIZE]>;
44 44 }
45 45
46 46 // Must match `HEADER` in `mercurial/dirstateutils/docket.py`
47 47 #[derive(BytesCast)]
48 48 #[repr(C)]
49 49 struct DocketHeader {
50 50 marker: [u8; V2_FORMAT_MARKER.len()],
51 51 parent_1: [u8; STORED_NODE_ID_BYTES],
52 52 parent_2: [u8; STORED_NODE_ID_BYTES],
53 53
54 54 metadata: TreeMetadata,
55 55
56 56 /// Counted in bytes
57 57 data_size: Size,
58 58
59 59 uuid_size: u8,
60 60 }
61 61
62 62 pub struct Docket<'on_disk> {
63 63 header: &'on_disk DocketHeader,
64 64 uuid: &'on_disk [u8],
65 65 }
66 66
67 67 /// Fields are documented in the *Tree metadata in the docket file*
68 68 /// section of `mercurial/helptext/internals/dirstate-v2.txt`
69 69 #[derive(BytesCast)]
70 70 #[repr(C)]
71 71 struct TreeMetadata {
72 72 root_nodes: ChildNodes,
73 73 nodes_with_entry_count: Size,
74 74 nodes_with_copy_source_count: Size,
75 75 unreachable_bytes: Size,
76 76 unused: [u8; 4],
77 77
78 78 /// See *Optional hash of ignore patterns* section of
79 79 /// `mercurial/helptext/internals/dirstate-v2.txt`
80 80 ignore_patterns_hash: IgnorePatternsHash,
81 81 }
82 82
83 83 /// Fields are documented in the *The data file format*
84 84 /// section of `mercurial/helptext/internals/dirstate-v2.txt`
85 85 #[derive(BytesCast)]
86 86 #[repr(C)]
87 87 pub(super) struct Node {
88 88 full_path: PathSlice,
89 89
90 90 /// In bytes from `self.full_path.start`
91 91 base_name_start: PathSize,
92 92
93 93 copy_source: OptPathSlice,
94 94 children: ChildNodes,
95 95 pub(super) descendants_with_entry_count: Size,
96 96 pub(super) tracked_descendants_count: Size,
97 97 flags: U16Be,
98 98 size: U32Be,
99 99 mtime: PackedTruncatedTimestamp,
100 100 }
101 101
102 102 bitflags! {
103 103 #[repr(C)]
104 104 struct Flags: u16 {
105 105 const WDIR_TRACKED = 1 << 0;
106 106 const P1_TRACKED = 1 << 1;
107 107 const P2_INFO = 1 << 2;
108 108 const HAS_MODE_AND_SIZE = 1 << 3;
109 109 const HAS_FILE_MTIME = 1 << 4;
110 110 const HAS_DIRECTORY_MTIME = 1 << 5;
111 111 const MODE_EXEC_PERM = 1 << 6;
112 112 const MODE_IS_SYMLINK = 1 << 7;
113 113 const EXPECTED_STATE_IS_MODIFIED = 1 << 8;
114 114 const ALL_UNKNOWN_RECORDED = 1 << 9;
115 115 const ALL_IGNORED_RECORDED = 1 << 10;
116 const HAS_FALLBACK_EXEC = 1 << 11;
117 const FALLBACK_EXEC = 1 << 12;
118 const HAS_FALLBACK_SYMLINK = 1 << 13;
119 const FALLBACK_SYMLINK = 1 << 14;
116 120 }
117 121 }
118 122
119 123 /// Duration since the Unix epoch
120 124 #[derive(BytesCast, Copy, Clone)]
121 125 #[repr(C)]
122 126 struct PackedTruncatedTimestamp {
123 127 truncated_seconds: U32Be,
124 128 nanoseconds: U32Be,
125 129 }
126 130
127 131 /// Counted in bytes from the start of the file
128 132 ///
129 133 /// NOTE: not supporting `.hg/dirstate` files larger than 4 GiB.
130 134 type Offset = U32Be;
131 135
132 136 /// Counted in number of items
133 137 ///
134 138 /// NOTE: we choose not to support counting more than 4 billion nodes anywhere.
135 139 type Size = U32Be;
136 140
137 141 /// Counted in bytes
138 142 ///
139 143 /// NOTE: we choose not to support file names/paths longer than 64 KiB.
140 144 type PathSize = U16Be;
141 145
142 146 /// A contiguous sequence of `len` times `Node`, representing the child nodes
143 147 /// of either some other node or of the repository root.
144 148 ///
145 149 /// Always sorted by ascending `full_path`, to allow binary search.
146 150 /// Since nodes with the same parent nodes also have the same parent path,
147 151 /// only the `base_name`s need to be compared during binary search.
148 152 #[derive(BytesCast, Copy, Clone)]
149 153 #[repr(C)]
150 154 struct ChildNodes {
151 155 start: Offset,
152 156 len: Size,
153 157 }
154 158
155 159 /// A `HgPath` of `len` bytes
156 160 #[derive(BytesCast, Copy, Clone)]
157 161 #[repr(C)]
158 162 struct PathSlice {
159 163 start: Offset,
160 164 len: PathSize,
161 165 }
162 166
163 167 /// Either nothing if `start == 0`, or a `HgPath` of `len` bytes
164 168 type OptPathSlice = PathSlice;
165 169
166 170 /// Unexpected file format found in `.hg/dirstate` with the "v2" format.
167 171 ///
168 172 /// This should only happen if Mercurial is buggy or a repository is corrupted.
169 173 #[derive(Debug)]
170 174 pub struct DirstateV2ParseError;
171 175
172 176 impl From<DirstateV2ParseError> for HgError {
173 177 fn from(_: DirstateV2ParseError) -> Self {
174 178 HgError::corrupted("dirstate-v2 parse error")
175 179 }
176 180 }
177 181
178 182 impl From<DirstateV2ParseError> for crate::DirstateError {
179 183 fn from(error: DirstateV2ParseError) -> Self {
180 184 HgError::from(error).into()
181 185 }
182 186 }
183 187
184 188 impl<'on_disk> Docket<'on_disk> {
185 189 pub fn parents(&self) -> DirstateParents {
186 190 use crate::Node;
187 191 let p1 = Node::try_from(&self.header.parent_1[..USED_NODE_ID_BYTES])
188 192 .unwrap()
189 193 .clone();
190 194 let p2 = Node::try_from(&self.header.parent_2[..USED_NODE_ID_BYTES])
191 195 .unwrap()
192 196 .clone();
193 197 DirstateParents { p1, p2 }
194 198 }
195 199
196 200 pub fn tree_metadata(&self) -> &[u8] {
197 201 self.header.metadata.as_bytes()
198 202 }
199 203
200 204 pub fn data_size(&self) -> usize {
201 205 // This `unwrap` could only panic on a 16-bit CPU
202 206 self.header.data_size.get().try_into().unwrap()
203 207 }
204 208
205 209 pub fn data_filename(&self) -> String {
206 210 String::from_utf8(format_bytes!(b"dirstate.{}", self.uuid)).unwrap()
207 211 }
208 212 }
209 213
210 214 pub fn read_docket(
211 215 on_disk: &[u8],
212 216 ) -> Result<Docket<'_>, DirstateV2ParseError> {
213 217 let (header, uuid) =
214 218 DocketHeader::from_bytes(on_disk).map_err(|_| DirstateV2ParseError)?;
215 219 let uuid_size = header.uuid_size as usize;
216 220 if header.marker == *V2_FORMAT_MARKER && uuid.len() == uuid_size {
217 221 Ok(Docket { header, uuid })
218 222 } else {
219 223 Err(DirstateV2ParseError)
220 224 }
221 225 }
222 226
223 227 pub(super) fn read<'on_disk>(
224 228 on_disk: &'on_disk [u8],
225 229 metadata: &[u8],
226 230 ) -> Result<DirstateMap<'on_disk>, DirstateV2ParseError> {
227 231 if on_disk.is_empty() {
228 232 return Ok(DirstateMap::empty(on_disk));
229 233 }
230 234 let (meta, _) = TreeMetadata::from_bytes(metadata)
231 235 .map_err(|_| DirstateV2ParseError)?;
232 236 let dirstate_map = DirstateMap {
233 237 on_disk,
234 238 root: dirstate_map::ChildNodes::OnDisk(read_nodes(
235 239 on_disk,
236 240 meta.root_nodes,
237 241 )?),
238 242 nodes_with_entry_count: meta.nodes_with_entry_count.get(),
239 243 nodes_with_copy_source_count: meta.nodes_with_copy_source_count.get(),
240 244 ignore_patterns_hash: meta.ignore_patterns_hash,
241 245 unreachable_bytes: meta.unreachable_bytes.get(),
242 246 };
243 247 Ok(dirstate_map)
244 248 }
245 249
246 250 impl Node {
247 251 pub(super) fn full_path<'on_disk>(
248 252 &self,
249 253 on_disk: &'on_disk [u8],
250 254 ) -> Result<&'on_disk HgPath, DirstateV2ParseError> {
251 255 read_hg_path(on_disk, self.full_path)
252 256 }
253 257
254 258 pub(super) fn base_name_start<'on_disk>(
255 259 &self,
256 260 ) -> Result<usize, DirstateV2ParseError> {
257 261 let start = self.base_name_start.get();
258 262 if start < self.full_path.len.get() {
259 263 let start = usize::try_from(start)
260 264 // u32 -> usize, could only panic on a 16-bit CPU
261 265 .expect("dirstate-v2 base_name_start out of bounds");
262 266 Ok(start)
263 267 } else {
264 268 Err(DirstateV2ParseError)
265 269 }
266 270 }
267 271
268 272 pub(super) fn base_name<'on_disk>(
269 273 &self,
270 274 on_disk: &'on_disk [u8],
271 275 ) -> Result<&'on_disk HgPath, DirstateV2ParseError> {
272 276 let full_path = self.full_path(on_disk)?;
273 277 let base_name_start = self.base_name_start()?;
274 278 Ok(HgPath::new(&full_path.as_bytes()[base_name_start..]))
275 279 }
276 280
277 281 pub(super) fn path<'on_disk>(
278 282 &self,
279 283 on_disk: &'on_disk [u8],
280 284 ) -> Result<dirstate_map::NodeKey<'on_disk>, DirstateV2ParseError> {
281 285 Ok(WithBasename::from_raw_parts(
282 286 Cow::Borrowed(self.full_path(on_disk)?),
283 287 self.base_name_start()?,
284 288 ))
285 289 }
286 290
287 291 pub(super) fn has_copy_source<'on_disk>(&self) -> bool {
288 292 self.copy_source.start.get() != 0
289 293 }
290 294
291 295 pub(super) fn copy_source<'on_disk>(
292 296 &self,
293 297 on_disk: &'on_disk [u8],
294 298 ) -> Result<Option<&'on_disk HgPath>, DirstateV2ParseError> {
295 299 Ok(if self.has_copy_source() {
296 300 Some(read_hg_path(on_disk, self.copy_source)?)
297 301 } else {
298 302 None
299 303 })
300 304 }
301 305
302 306 fn flags(&self) -> Flags {
303 307 Flags::from_bits_truncate(self.flags.get())
304 308 }
305 309
306 310 fn has_entry(&self) -> bool {
307 311 self.flags().intersects(
308 312 Flags::WDIR_TRACKED | Flags::P1_TRACKED | Flags::P2_INFO,
309 313 )
310 314 }
311 315
312 316 pub(super) fn node_data(
313 317 &self,
314 318 ) -> Result<dirstate_map::NodeData, DirstateV2ParseError> {
315 319 if self.has_entry() {
316 320 Ok(dirstate_map::NodeData::Entry(self.assume_entry()))
317 321 } else if let Some(mtime) = self.cached_directory_mtime()? {
318 322 Ok(dirstate_map::NodeData::CachedDirectory { mtime })
319 323 } else {
320 324 Ok(dirstate_map::NodeData::None)
321 325 }
322 326 }
323 327
324 328 pub(super) fn cached_directory_mtime(
325 329 &self,
326 330 ) -> Result<Option<TruncatedTimestamp>, DirstateV2ParseError> {
327 331 // For now we do not have code to handle the absence of
328 332 // ALL_UNKNOWN_RECORDED, so we ignore the mtime unless the flag is set.
329 333 if self.flags().contains(Flags::HAS_DIRECTORY_MTIME)
330 334 && self.flags().contains(Flags::ALL_UNKNOWN_RECORDED)
331 335 {
332 336 if self.flags().contains(Flags::HAS_FILE_MTIME) {
333 337 Err(DirstateV2ParseError)
334 338 } else {
335 339 Ok(Some(self.mtime.try_into()?))
336 340 }
337 341 } else {
338 342 Ok(None)
339 343 }
340 344 }
341 345
342 346 fn synthesize_unix_mode(&self) -> u32 {
343 347 let file_type = if self.flags().contains(Flags::MODE_IS_SYMLINK) {
344 348 libc::S_IFLNK
345 349 } else {
346 350 libc::S_IFREG
347 351 };
348 352 let permissions = if self.flags().contains(Flags::MODE_EXEC_PERM) {
349 353 0o755
350 354 } else {
351 355 0o644
352 356 };
353 357 file_type | permissions
354 358 }
355 359
356 360 fn assume_entry(&self) -> DirstateEntry {
357 361 // TODO: convert through raw bits instead?
358 362 let wdir_tracked = self.flags().contains(Flags::WDIR_TRACKED);
359 363 let p1_tracked = self.flags().contains(Flags::P1_TRACKED);
360 364 let p2_info = self.flags().contains(Flags::P2_INFO);
361 365 let mode_size = if self.flags().contains(Flags::HAS_MODE_AND_SIZE)
362 366 && !self.flags().contains(Flags::EXPECTED_STATE_IS_MODIFIED)
363 367 {
364 368 Some((self.synthesize_unix_mode(), self.size.into()))
365 369 } else {
366 370 None
367 371 };
368 372 let mtime = if self.flags().contains(Flags::HAS_FILE_MTIME)
369 373 && !self.flags().contains(Flags::EXPECTED_STATE_IS_MODIFIED)
370 374 {
371 375 Some(self.mtime.truncated_seconds.into())
372 376 } else {
373 377 None
374 378 };
375 379 DirstateEntry::from_v2_data(
376 380 wdir_tracked,
377 381 p1_tracked,
378 382 p2_info,
379 383 mode_size,
380 384 mtime,
381 385 None,
382 386 None,
383 387 )
384 388 }
385 389
386 390 pub(super) fn entry(
387 391 &self,
388 392 ) -> Result<Option<DirstateEntry>, DirstateV2ParseError> {
389 393 if self.has_entry() {
390 394 Ok(Some(self.assume_entry()))
391 395 } else {
392 396 Ok(None)
393 397 }
394 398 }
395 399
396 400 pub(super) fn children<'on_disk>(
397 401 &self,
398 402 on_disk: &'on_disk [u8],
399 403 ) -> Result<&'on_disk [Node], DirstateV2ParseError> {
400 404 read_nodes(on_disk, self.children)
401 405 }
402 406
403 407 pub(super) fn to_in_memory_node<'on_disk>(
404 408 &self,
405 409 on_disk: &'on_disk [u8],
406 410 ) -> Result<dirstate_map::Node<'on_disk>, DirstateV2ParseError> {
407 411 Ok(dirstate_map::Node {
408 412 children: dirstate_map::ChildNodes::OnDisk(
409 413 self.children(on_disk)?,
410 414 ),
411 415 copy_source: self.copy_source(on_disk)?.map(Cow::Borrowed),
412 416 data: self.node_data()?,
413 417 descendants_with_entry_count: self
414 418 .descendants_with_entry_count
415 419 .get(),
416 420 tracked_descendants_count: self.tracked_descendants_count.get(),
417 421 })
418 422 }
419 423
420 424 fn from_dirstate_entry(
421 425 entry: &DirstateEntry,
422 426 ) -> (Flags, U32Be, PackedTruncatedTimestamp) {
423 let (wdir_tracked, p1_tracked, p2_info, mode_size_opt, mtime_opt) =
424 entry.v2_data();
427 let (
428 wdir_tracked,
429 p1_tracked,
430 p2_info,
431 mode_size_opt,
432 mtime_opt,
433 fallback_exec,
434 fallback_symlink,
435 ) = entry.v2_data();
425 436 // TODO: convert through raw flag bits instead?
426 437 let mut flags = Flags::empty();
427 438 flags.set(Flags::WDIR_TRACKED, wdir_tracked);
428 439 flags.set(Flags::P1_TRACKED, p1_tracked);
429 440 flags.set(Flags::P2_INFO, p2_info);
430 441 let size = if let Some((m, s)) = mode_size_opt {
431 442 let exec_perm = m & libc::S_IXUSR != 0;
432 443 let is_symlink = m & libc::S_IFMT == libc::S_IFLNK;
433 444 flags.set(Flags::MODE_EXEC_PERM, exec_perm);
434 445 flags.set(Flags::MODE_IS_SYMLINK, is_symlink);
435 446 flags.insert(Flags::HAS_MODE_AND_SIZE);
436 447 s.into()
437 448 } else {
438 449 0.into()
439 450 };
440 451 let mtime = if let Some(m) = mtime_opt {
441 452 flags.insert(Flags::HAS_FILE_MTIME);
442 453 PackedTruncatedTimestamp {
443 454 truncated_seconds: m.into(),
444 455 nanoseconds: 0.into(),
445 456 }
446 457 } else {
447 458 PackedTruncatedTimestamp::null()
448 459 };
460 if let Some(f_exec) = fallback_exec {
461 flags.insert(Flags::HAS_FALLBACK_EXEC);
462 if f_exec {
463 flags.insert(Flags::FALLBACK_EXEC);
464 }
465 }
466 if let Some(f_symlink) = fallback_symlink {
467 flags.insert(Flags::HAS_FALLBACK_SYMLINK);
468 if f_symlink {
469 flags.insert(Flags::FALLBACK_SYMLINK);
470 }
471 }
449 472 (flags, size, mtime)
450 473 }
451 474 }
452 475
453 476 fn read_hg_path(
454 477 on_disk: &[u8],
455 478 slice: PathSlice,
456 479 ) -> Result<&HgPath, DirstateV2ParseError> {
457 480 read_slice(on_disk, slice.start, slice.len.get()).map(HgPath::new)
458 481 }
459 482
460 483 fn read_nodes(
461 484 on_disk: &[u8],
462 485 slice: ChildNodes,
463 486 ) -> Result<&[Node], DirstateV2ParseError> {
464 487 read_slice(on_disk, slice.start, slice.len.get())
465 488 }
466 489
467 490 fn read_slice<T, Len>(
468 491 on_disk: &[u8],
469 492 start: Offset,
470 493 len: Len,
471 494 ) -> Result<&[T], DirstateV2ParseError>
472 495 where
473 496 T: BytesCast,
474 497 Len: TryInto<usize>,
475 498 {
476 499 // Either `usize::MAX` would result in an "out of bounds" error since a
477 500 // single `&[u8]` cannot occupy the entire address space.
478 501 let start = start.get().try_into().unwrap_or(std::usize::MAX);
479 502 let len = len.try_into().unwrap_or(std::usize::MAX);
480 503 on_disk
481 504 .get(start..)
482 505 .and_then(|bytes| T::slice_from_bytes(bytes, len).ok())
483 506 .map(|(slice, _rest)| slice)
484 507 .ok_or_else(|| DirstateV2ParseError)
485 508 }
486 509
487 510 pub(crate) fn for_each_tracked_path<'on_disk>(
488 511 on_disk: &'on_disk [u8],
489 512 metadata: &[u8],
490 513 mut f: impl FnMut(&'on_disk HgPath),
491 514 ) -> Result<(), DirstateV2ParseError> {
492 515 let (meta, _) = TreeMetadata::from_bytes(metadata)
493 516 .map_err(|_| DirstateV2ParseError)?;
494 517 fn recur<'on_disk>(
495 518 on_disk: &'on_disk [u8],
496 519 nodes: ChildNodes,
497 520 f: &mut impl FnMut(&'on_disk HgPath),
498 521 ) -> Result<(), DirstateV2ParseError> {
499 522 for node in read_nodes(on_disk, nodes)? {
500 523 if let Some(entry) = node.entry()? {
501 524 if entry.state().is_tracked() {
502 525 f(node.full_path(on_disk)?)
503 526 }
504 527 }
505 528 recur(on_disk, node.children, f)?
506 529 }
507 530 Ok(())
508 531 }
509 532 recur(on_disk, meta.root_nodes, &mut f)
510 533 }
511 534
512 535 /// Returns new data and metadata, together with whether that data should be
513 536 /// appended to the existing data file whose content is at
514 537 /// `dirstate_map.on_disk` (true), instead of written to a new data file
515 538 /// (false).
516 539 pub(super) fn write(
517 540 dirstate_map: &mut DirstateMap,
518 541 can_append: bool,
519 542 ) -> Result<(Vec<u8>, Vec<u8>, bool), DirstateError> {
520 543 let append = can_append && dirstate_map.write_should_append();
521 544
522 545 // This ignores the space for paths, and for nodes without an entry.
523 546 // TODO: better estimate? Skip the `Vec` and write to a file directly?
524 547 let size_guess = std::mem::size_of::<Node>()
525 548 * dirstate_map.nodes_with_entry_count as usize;
526 549
527 550 let mut writer = Writer {
528 551 dirstate_map,
529 552 append,
530 553 out: Vec::with_capacity(size_guess),
531 554 };
532 555
533 556 let root_nodes = writer.write_nodes(dirstate_map.root.as_ref())?;
534 557
535 558 let meta = TreeMetadata {
536 559 root_nodes,
537 560 nodes_with_entry_count: dirstate_map.nodes_with_entry_count.into(),
538 561 nodes_with_copy_source_count: dirstate_map
539 562 .nodes_with_copy_source_count
540 563 .into(),
541 564 unreachable_bytes: dirstate_map.unreachable_bytes.into(),
542 565 unused: [0; 4],
543 566 ignore_patterns_hash: dirstate_map.ignore_patterns_hash,
544 567 };
545 568 Ok((writer.out, meta.as_bytes().to_vec(), append))
546 569 }
547 570
548 571 struct Writer<'dmap, 'on_disk> {
549 572 dirstate_map: &'dmap DirstateMap<'on_disk>,
550 573 append: bool,
551 574 out: Vec<u8>,
552 575 }
553 576
554 577 impl Writer<'_, '_> {
555 578 fn write_nodes(
556 579 &mut self,
557 580 nodes: dirstate_map::ChildNodesRef,
558 581 ) -> Result<ChildNodes, DirstateError> {
559 582 // Reuse already-written nodes if possible
560 583 if self.append {
561 584 if let dirstate_map::ChildNodesRef::OnDisk(nodes_slice) = nodes {
562 585 let start = self.on_disk_offset_of(nodes_slice).expect(
563 586 "dirstate-v2 OnDisk nodes not found within on_disk",
564 587 );
565 588 let len = child_nodes_len_from_usize(nodes_slice.len());
566 589 return Ok(ChildNodes { start, len });
567 590 }
568 591 }
569 592
570 593 // `dirstate_map::ChildNodes::InMemory` contains a `HashMap` which has
571 594 // undefined iteration order. Sort to enable binary search in the
572 595 // written file.
573 596 let nodes = nodes.sorted();
574 597 let nodes_len = nodes.len();
575 598
576 599 // First accumulate serialized nodes in a `Vec`
577 600 let mut on_disk_nodes = Vec::with_capacity(nodes_len);
578 601 for node in nodes {
579 602 let children =
580 603 self.write_nodes(node.children(self.dirstate_map.on_disk)?)?;
581 604 let full_path = node.full_path(self.dirstate_map.on_disk)?;
582 605 let full_path = self.write_path(full_path.as_bytes());
583 606 let copy_source = if let Some(source) =
584 607 node.copy_source(self.dirstate_map.on_disk)?
585 608 {
586 609 self.write_path(source.as_bytes())
587 610 } else {
588 611 PathSlice {
589 612 start: 0.into(),
590 613 len: 0.into(),
591 614 }
592 615 };
593 616 on_disk_nodes.push(match node {
594 617 NodeRef::InMemory(path, node) => {
595 618 let (flags, size, mtime) = match &node.data {
596 619 dirstate_map::NodeData::Entry(entry) => {
597 620 Node::from_dirstate_entry(entry)
598 621 }
599 622 dirstate_map::NodeData::CachedDirectory { mtime } => (
600 623 // We currently never set a mtime if unknown files
601 624 // are present.
602 625 // So if we have a mtime for a directory, we know
603 626 // there are no unknown
604 627 // files and we
605 628 // blindly set ALL_UNKNOWN_RECORDED.
606 629 //
607 630 // We never set ALL_IGNORED_RECORDED since we
608 631 // don't track that case
609 632 // currently.
610 633 Flags::HAS_DIRECTORY_MTIME
611 634 | Flags::ALL_UNKNOWN_RECORDED,
612 635 0.into(),
613 636 (*mtime).into(),
614 637 ),
615 638 dirstate_map::NodeData::None => (
616 639 Flags::empty(),
617 640 0.into(),
618 641 PackedTruncatedTimestamp::null(),
619 642 ),
620 643 };
621 644 Node {
622 645 children,
623 646 copy_source,
624 647 full_path,
625 648 base_name_start: u16::try_from(path.base_name_start())
626 649 // Could only panic for paths over 64 KiB
627 650 .expect("dirstate-v2 path length overflow")
628 651 .into(),
629 652 descendants_with_entry_count: node
630 653 .descendants_with_entry_count
631 654 .into(),
632 655 tracked_descendants_count: node
633 656 .tracked_descendants_count
634 657 .into(),
635 658 flags: flags.bits().into(),
636 659 size,
637 660 mtime,
638 661 }
639 662 }
640 663 NodeRef::OnDisk(node) => Node {
641 664 children,
642 665 copy_source,
643 666 full_path,
644 667 ..*node
645 668 },
646 669 })
647 670 }
648 671 // … so we can write them contiguously, after writing everything else
649 672 // they refer to.
650 673 let start = self.current_offset();
651 674 let len = child_nodes_len_from_usize(nodes_len);
652 675 self.out.extend(on_disk_nodes.as_bytes());
653 676 Ok(ChildNodes { start, len })
654 677 }
655 678
656 679 /// If the given slice of items is within `on_disk`, returns its offset
657 680 /// from the start of `on_disk`.
658 681 fn on_disk_offset_of<T>(&self, slice: &[T]) -> Option<Offset>
659 682 where
660 683 T: BytesCast,
661 684 {
662 685 fn address_range(slice: &[u8]) -> std::ops::RangeInclusive<usize> {
663 686 let start = slice.as_ptr() as usize;
664 687 let end = start + slice.len();
665 688 start..=end
666 689 }
667 690 let slice_addresses = address_range(slice.as_bytes());
668 691 let on_disk_addresses = address_range(self.dirstate_map.on_disk);
669 692 if on_disk_addresses.contains(slice_addresses.start())
670 693 && on_disk_addresses.contains(slice_addresses.end())
671 694 {
672 695 let offset = slice_addresses.start() - on_disk_addresses.start();
673 696 Some(offset_from_usize(offset))
674 697 } else {
675 698 None
676 699 }
677 700 }
678 701
679 702 fn current_offset(&mut self) -> Offset {
680 703 let mut offset = self.out.len();
681 704 if self.append {
682 705 offset += self.dirstate_map.on_disk.len()
683 706 }
684 707 offset_from_usize(offset)
685 708 }
686 709
687 710 fn write_path(&mut self, slice: &[u8]) -> PathSlice {
688 711 let len = path_len_from_usize(slice.len());
689 712 // Reuse an already-written path if possible
690 713 if self.append {
691 714 if let Some(start) = self.on_disk_offset_of(slice) {
692 715 return PathSlice { start, len };
693 716 }
694 717 }
695 718 let start = self.current_offset();
696 719 self.out.extend(slice.as_bytes());
697 720 PathSlice { start, len }
698 721 }
699 722 }
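
The address comparison behind `on_disk_offset_of` can be tried on its own. This is a minimal sketch restricted to byte slices, with the hypothetical name `offset_within`. It returns a plain `usize` rather than the `Offset` type used in the diff.

```rust
/// If `slice` lies entirely within `buffer`, returns its byte offset from
/// the start of `buffer`.
fn offset_within(buffer: &[u8], slice: &[u8]) -> Option<usize> {
    let buf_start = buffer.as_ptr() as usize;
    let buf_end = buf_start + buffer.len();
    let start = slice.as_ptr() as usize;
    let end = start + slice.len();
    // Both bounds are compared inclusively: an empty slice sitting at the
    // very end of `buffer` still counts as inside it.
    if buf_start <= start && end <= buf_end {
        Some(start - buf_start)
    } else {
        None
    }
}
```

In append mode the writer uses this kind of check to reuse paths and nodes that are already in the existing data file. Anything newly written gets an offset shifted by the length of the existing data, as `current_offset` shows.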
700 723
701 724 fn offset_from_usize(x: usize) -> Offset {
702 725 u32::try_from(x)
703 726 // Could only panic for a dirstate file larger than 4 GiB
704 727 .expect("dirstate-v2 offset overflow")
705 728 .into()
706 729 }
707 730
708 731 fn child_nodes_len_from_usize(x: usize) -> Size {
709 732 u32::try_from(x)
710 733 // Could only panic with over 4 billion nodes
711 734 .expect("dirstate-v2 slice length overflow")
712 735 .into()
713 736 }
714 737
715 738 fn path_len_from_usize(x: usize) -> PathSize {
716 739 u16::try_from(x)
717 740 // Could only panic for paths over 64 KiB
718 741 .expect("dirstate-v2 path length overflow")
719 742 .into()
720 743 }
721 744
722 745 impl From<TruncatedTimestamp> for PackedTruncatedTimestamp {
723 746 fn from(timestamp: TruncatedTimestamp) -> Self {
724 747 Self {
725 748 truncated_seconds: timestamp.truncated_seconds().into(),
726 749 nanoseconds: timestamp.nanoseconds().into(),
727 750 }
728 751 }
729 752 }
730 753
731 754 impl TryFrom<PackedTruncatedTimestamp> for TruncatedTimestamp {
732 755 type Error = DirstateV2ParseError;
733 756
734 757 fn try_from(
735 758 timestamp: PackedTruncatedTimestamp,
736 759 ) -> Result<Self, Self::Error> {
737 760 Self::from_already_truncated(
738 761 timestamp.truncated_seconds.get(),
739 762 timestamp.nanoseconds.get(),
740 763 )
741 764 }
742 765 }
743 766 impl PackedTruncatedTimestamp {
744 767 fn null() -> Self {
745 768 Self {
746 769 truncated_seconds: 0.into(),
747 770 nanoseconds: 0.into(),
748 771 }
749 772 }
750 773 }