dirstate-v2: Document flags/mode/size/mtime fields of tree nodes...
Simon Sapin
r49002:77fc340a default

The *dirstate* is what Mercurial uses internally to track
the state of files in the working directory,
such as set by commands like `hg add` and `hg rm`.
It also contains some cached data that help make `hg status` faster.
The name refers both to `.hg/dirstate` on the filesystem
and the corresponding data structure in memory while a Mercurial process
is running.

The original file format, retroactively dubbed `dirstate-v1`,
is described at https://www.mercurial-scm.org/wiki/DirState.
It is made of a flat sequence of unordered variable-size entries,
so accessing any information in it requires parsing all of it.
Similarly, saving changes requires rewriting the entire file.

The newer `dirstate-v2` file format is designed to fix these limitations
and make `hg status` faster.

User guide
==========

Compatibility
-------------

The file format is experimental and may still change.
Different versions of Mercurial may not be compatible with each other
when working on a local repository that uses this format.
When using an incompatible version with the experimental format,
anything can happen including data corruption.

Since the dirstate is entirely local and not relevant to the wire protocol,
`dirstate-v2` does not affect compatibility with remote Mercurial versions.

When `share-safe` is enabled, different repositories sharing the same store
can use different dirstate formats.

Enabling `dirstate-v2` for new local repositories
-------------------------------------------------

When creating a new local repository such as with `hg init` or `hg clone`,
the `exp-dirstate-v2` boolean in the `format` configuration section
controls whether to use this file format.
This is disabled by default as of this writing.
To enable it for a single repository, run for example::

  $ hg init my-project --config format.exp-dirstate-v2=1

Checking the format of an existing local repository
---------------------------------------------------

The `debugformat` command prints information about
which of multiple optional formats are used in the current repository,
including `dirstate-v2`::

  $ hg debugformat
  format-variant     repo
  fncache:            yes
  dirstate-v2:        yes
  […]

Upgrading or downgrading an existing local repository
-----------------------------------------------------

The `debugupgraderepo` command does various upgrades or downgrades
on a local repository
based on the current Mercurial version and on configuration.
The same `format.exp-dirstate-v2` configuration is used again.

Example to upgrade::

  $ hg debugupgraderepo --config format.exp-dirstate-v2=1

Example to downgrade to `dirstate-v1`::

  $ hg debugupgraderepo --config format.exp-dirstate-v2=0

Both of these commands do nothing but print a list of proposed changes,
which may include changes unrelated to the dirstate.
Those other changes are controlled by their own configuration keys.
Add `--run` to a command to actually apply the proposed changes.

Backups of `.hg/requires` and `.hg/dirstate` are created
in a `.hg/upgradebackup.*` directory.
If something goes wrong, restoring those files should undo the change.

Note that upgrading affects compatibility with older versions of Mercurial
as noted above.
This can be relevant when a repository’s files are on a USB drive
or some other removable media, or shared over the network, etc.

Internal filesystem representation
==================================

Requirements file
-----------------

The `.hg/requires` file indicates which of various optional file formats
are used by a given repository.
Mercurial aborts when seeing a requirement it does not know about,
which avoids older versions accidentally messing up a repository
that uses a format that was introduced later.
For versions that do support a format, the presence or absence of
the corresponding requirement indicates whether to use that format.

When the file contains an `exp-dirstate-v2` line,
the `dirstate-v2` format is used.
With no such line `dirstate-v1` is used.
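The format check described above amounts to reading one line-oriented text file. A minimal sketch (the function name is ours, not part of Mercurial's API)::

```python
from pathlib import Path

def dirstate_format(repo_root):
    """Return which dirstate format a repository uses,
    based solely on the presence of the `exp-dirstate-v2`
    requirement in its .hg/requires file."""
    requires = Path(repo_root) / ".hg" / "requires"
    lines = requires.read_text().splitlines()
    return "dirstate-v2" if "exp-dirstate-v2" in lines else "dirstate-v1"
```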
107 107
108 108 High level description
109 109 ----------------------
110 110
111 111 Whereas `dirstate-v1` uses a single `.hg/disrtate` file,
112 112 in `dirstate-v2` that file is a "docket" file
113 113 that only contains some metadata
114 114 and points to separate data file named `.hg/dirstate.{ID}`,
115 115 where `{ID}` is a random identifier.
116 116
117 117 This separation allows making data files append-only
118 118 and therefore safer to memory-map.
119 119 Creating a new data file (occasionally to clean up unused data)
120 120 can be done with a different ID
121 121 without disrupting another Mercurial process
122 122 that could still be using the previous data file.
123 123
124 124 Both files have a format designed to reduce the need for parsing,
125 125 by using fixed-size binary components as much as possible.
126 126 For data that is not fixed-size,
127 127 references to other parts of a file can be made by storing "pseudo-pointers":
128 128 integers counted in bytes from the start of a file.
129 129 For read-only access no data structure is needed,
130 130 only a bytes buffer (possibly memory-mapped directly from the filesystem)
131 131 with specific parts read on demand.
132 132
133 133 The data file contains "nodes" organized in a tree.
134 134 Each node represents a file or directory inside the working directory
135 135 or its parent changeset.
136 136 This tree has the same structure as the filesystem,
137 137 so a node representing a directory has child nodes representing
138 138 the files and subdirectories contained directly in that directory.
139 139
140 140 The docket file format
141 141 ----------------------
142 142
143 143 This is implemented in `rust/hg-core/src/dirstate_tree/on_disk.rs`
144 144 and `mercurial/dirstateutils/docket.py`.
145 145
146 146 Components of the docket file are found at fixed offsets,
147 147 counted in bytes from the start of the file:
148 148
149 149 * Offset 0:
150 150 The 12-bytes marker string "dirstate-v2\n" ending with a newline character.
151 151 This makes it easier to tell a dirstate-v2 file from a dirstate-v1 file,
152 152 although it is not strictly necessary
153 153 since `.hg/requires` determines which format to use.
154 154
155 155 * Offset 12:
156 156 The changeset node ID on the first parent of the working directory,
157 157 as up to 32 binary bytes.
158 158 If a node ID is shorter (20 bytes for SHA-1),
159 159 it is start-aligned and the rest of the bytes are set to zero.
160 160
161 161 * Offset 44:
162 162 The changeset node ID on the second parent of the working directory,
163 163 or all zeros if there isn’t one.
164 164 Also 32 binary bytes.
165 165
166 166 * Offset 76:
167 167 Tree metadata on 44 bytes, described below.
168 168 Its separation in this documentation from the rest of the docket
169 169 reflects a detail of the current implementation.
170 170 Since tree metadata is also made of fields at fixed offsets, those could
171 171 be inlined here by adding 76 bytes to each offset.
172 172
173 173 * Offset 120:
174 174 The used size of the data file, as a 32-bit big-endian integer.
175 175 The actual size of the data file may be larger
176 176 (if another Mercurial processis in appending to it
177 177 but has not updated the docket yet).
178 178 That extra data must be ignored.
179 179
180 180 * Offset 124:
181 181 The length of the data file identifier, as a 8-bit integer.
182 182
183 183 * Offset 125:
184 184 The data file identifier.
185 185
186 186 * Any additional data is current ignored, and dropped when updating the file.
187 187
Tree metadata in the docket file
--------------------------------

Tree metadata is similarly made of components at fixed offsets.
These offsets are counted in bytes from the start of tree metadata,
which is 76 bytes after the start of the docket file.

This metadata can be thought of as the singular root of the tree
formed by nodes in the data file.

* Offset 0:
  Pseudo-pointer to the start of root nodes,
  counted in bytes from the start of the data file,
  as a 32-bit big-endian integer.
  These nodes describe files and directories found directly
  at the root of the working directory.

* Offset 4:
  Number of root nodes, as a 32-bit big-endian integer.

* Offset 8:
  Total number of nodes in the entire tree that "have a dirstate entry",
  as a 32-bit big-endian integer.
  Those nodes represent files that would be present at all in `dirstate-v1`.
  This is typically less than the total number of nodes.
  This counter is used to implement `len(dirstatemap)`.

* Offset 12:
  Number of nodes in the entire tree that have a copy source,
  as a 32-bit big-endian integer.
  At the next commit, these files are recorded
  as having been copied or moved/renamed from that source.
  (A move is recorded as a copy and separate removal of the source.)
  This counter is used to implement `len(dirstatemap.copymap)`.

* Offset 16:
  An estimation of how many bytes of the data file
  (within its used size) are unused, as a 32-bit big-endian integer.
  When appending to an existing data file,
  some existing nodes or paths can be unreachable from the new root
  but they still take up space.
  This counter is used to decide when to write a new data file from scratch
  instead of appending to an existing one,
  in order to get rid of that unreachable data
  and avoid unbounded file size growth.

* Offset 20:
  These four bytes are currently ignored
  and reset to zero when updating a docket file.
  This is an attempt at forward compatibility:
  future Mercurial versions could use this as a bit field
  to indicate that a dirstate has additional data or constraints.
  Finding a dirstate file with the relevant bit unset indicates that
  it was written by a then-older version
  which is not aware of that future change.

* Offset 24:
  Either 20 zero bytes, or a SHA-1 hash as 20 binary bytes.
  When present, the hash is of ignore patterns
  that were used for some previous run of the `status` algorithm.

* (Offset 44: end of tree metadata)

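Since the metadata is exactly 44 fixed-layout bytes, the whole block fits one `struct` format string. A minimal sketch (the function and dict keys are ours)::

```python
import struct

# >: big-endian; five 32-bit integers, 4 ignored bytes, 20-byte hash = 44 bytes
TREE_METADATA = struct.Struct(">LLLLL4s20s")

def parse_tree_metadata(meta):
    """Decode the 44 bytes of tree metadata found at offset 76 of the docket."""
    fields = TREE_METADATA.unpack(meta)
    return {
        "root_nodes_start": fields[0],        # pseudo-pointer into the data file
        "root_nodes_count": fields[1],
        "nodes_with_entry": fields[2],        # backs len(dirstatemap)
        "nodes_with_copy_source": fields[3],  # backs len(dirstatemap.copymap)
        "unreachable_bytes": fields[4],       # estimated wasted space
        "ignore_patterns_hash": fields[6],    # 20 zero bytes when absent
    }
```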
Optional hash of ignore patterns
--------------------------------

The implementation of `status` at `rust/hg-core/src/dirstate_tree/status.rs`
has been optimized such that its run time is dominated by calls
to `stat` for reading the filesystem metadata of a file or directory,
and to `readdir` for listing the contents of a directory.
In some cases the algorithm can skip calls to `readdir`
(saving significant time)
because the dirstate already contains enough of the relevant information
to build the correct `status` results.

The default configuration of `hg status` is to list unknown files
but not ignored files.
In this case, it matters for the `readdir`-skipping optimization
if a given file used to be ignored but became unknown
because `.hgignore` changed.
To detect the possibility of such a change,
the tree metadata contains an optional hash of all ignore patterns.

We define:

* "Root" ignore files as:

  - `.hgignore` at the root of the repository if it exists
  - And all files from `ui.ignore.*` config.

  This set of files is sorted by the string representation of their path.

* The "expanded contents" of an ignore file is the byte string made
  by the concatenation of its contents followed by the "expanded contents"
  of other files included with `include:` or `subinclude:` directives,
  in inclusion order. This definition is recursive, as included files can
  themselves include more files.

This hash is defined as the SHA-1 of the concatenation (in sorted
order) of the "expanded contents" of each "root" ignore file.
(Note that computing this does not require actually concatenating
into a single contiguous byte sequence.
Instead a SHA-1 hasher object can be created
and fed separate chunks one by one.)
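The incremental hashing described in the note can be sketched as follows; expanding `include:`/`subinclude:` directives is assumed done by the caller, and the function name is ours::

```python
import hashlib

def ignore_patterns_hash(expanded_contents):
    """SHA-1 over the "expanded contents" of each root ignore file,
    in sorted root-file order, fed chunk by chunk instead of being
    concatenated into one contiguous buffer first."""
    hasher = hashlib.sha1()
    for chunk in expanded_contents:  # each chunk is a byte string
        hasher.update(chunk)
    return hasher.digest()           # 20 bytes, as stored at metadata offset 24
```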
292 292
293 293 The data file format
294 294 --------------------
295 295
296 296 This is implemented in `rust/hg-core/src/dirstate_tree/on_disk.rs`
297 297 and `mercurial/dirstateutils/v2.py`.
298 298
299 299 The data file contains two types of data: paths and nodes.
300 300
301 301 Paths and nodes can be organized in any order in the file, except that sibling
302 302 nodes must be next to each other and sorted by their path.
303 303 Contiguity lets the parent refer to them all
304 304 by their count and a single pseudo-pointer,
305 305 instead of storing one pseudo-pointer per child node.
306 306 Sorting allows using binary seach to find a child node with a given name
307 307 in `O(log(n))` byte sequence comparisons.
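That lookup can be sketched as an ordinary binary search over the contiguous, sorted run of fixed-size sibling records; `read_base_name` here is a hypothetical accessor standing in for reading a sibling's base name out of the byte buffer::

```python
NODE_SIZE = 43  # bytes per node; components sit at fixed offsets within it

def find_child(read_base_name, count, target):
    """Binary search among `count` sorted sibling nodes.

    `read_base_name(i)` returns the i-th sibling's base name as bytes.
    Because siblings are contiguous and sorted by path, the search takes
    O(log n) byte-string comparisons. Returns the index, or None."""
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        name = read_base_name(mid)
        if name == target:
            return mid
        elif name < target:
            lo = mid + 1
        else:
            hi = mid
    return None
```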
308 308
309 309 The current implemention writes paths and child node before a given node
310 310 for ease of figuring out the value of pseudo-pointers by the time the are to be
311 311 written, but this is not an obligation and readers must not rely on it.
312 312
313 313 A path is stored as a byte string anywhere in the file, without delimiter.
314 314 It is refered to by one or more node by a pseudo-pointer to its start, and its
315 315 length in bytes. Since there is no delimiter,
316 316 when a path is a substring of another the same bytes could be reused,
317 317 although the implementation does not exploit this as of this writing.
318 318
319 319 A node is stored on 43 bytes with components at fixed offsets. Paths and
320 320 child nodes relevant to a node are stored externally and referenced though
321 321 pseudo-pointers.
322 322
323 323 All integers are stored in big-endian. All pseudo-pointers are 32-bit integers
324 324 counting bytes from the start of the data file. Path lengths and positions
325 325 are 16-bit integers, also counted in bytes.
326 326
327 327 Node components are:
328 328
329 329 * Offset 0:
330 330 Pseudo-pointer to the full path of this node,
331 331 from the working directory root.
332 332
333 333 * Offset 4:
334 334 Length of the full path.
335 335
336 336 * Offset 6:
337 337 Position of the last `/` path separator within the full path,
338 338 in bytes from the start of the full path,
339 339 or zero if there isn’t one.
340 340 The part of the full path after this position is the "base name".
341 341 Since sibling nodes have the same parent, only their base name vary
342 342 and needs to be considered when doing binary search to find a given path.
343 343
344 344 * Offset 8:
345 345 Pseudo-pointer to the "copy source" path for this node,
346 346 or zero if there is no copy source.
347 347
348 348 * Offset 12:
349 349 Length of the copy source path, or zero if there isn’t one.
350 350
351 351 * Offset 14:
352 352 Pseudo-pointer to the start of child nodes.
353 353
354 354 * Offset 18:
355 355 Number of child nodes, as a 32-bit integer.
356 356 They occupy 43 times this number of bytes
357 357 (not counting space for paths, and further descendants).
358 358
359 359 * Offset 22:
360 360 Number as a 32-bit integer of descendant nodes in this subtree,
361 361 not including this node itself,
362 362 that "have a dirstate entry".
363 363 Those nodes represent files that would be present at all in `dirstate-v1`.
364 364 This is typically less than the total number of descendants.
365 365 This counter is used to implement `has_dir`.
366 366
367 367 * Offset 26:
368 368 Number as a 32-bit integer of descendant nodes in this subtree,
369 369 not including this node itself,
370 370 that represent files tracked in the working directory.
371 371 (For example, `hg rm` makes a file untracked.)
372 372 This counter is used to implement `has_tracked_dir`.
373 373
374 * Offset 30 and more:
375 **TODO:** docs not written yet
376 as this part of the format might be changing soon.
374 * Offset 30:
375 Some boolean values packed as bits of a single byte.
376 Starting from least-significant, bit masks are::
377
378 WDIR_TRACKED = 1 << 0
379 P1_TRACKED = 1 << 1
380 P2_INFO = 1 << 2
381 HAS_MODE_AND_SIZE = 1 << 3
382 HAS_MTIME = 1 << 4
383
384 Other bits are unset. The meaning of these bits are:
385
386 `WDIR_TRACKED`
387 Set if the working directory contains a tracked file at this node’s path.
388 This is typically set and unset by `hg add` and `hg rm`.
389
390 `P1_TRACKED`
391 set if the working directory’s first parent changeset
392 (whose node identifier is found in tree metadata)
393 contains a tracked file at this node’s path.
394 This is a cache to reduce manifest lookups.
395
396 `P2_INFO`
397 Set if the file has been involved in some merge operation.
398 Either because it was actually merged,
399 or because the version in the second parent p2 version was ahead,
400 or because some rename moved it there.
401 In either case `hg status` will want it displayed as modified.
402
403 Files that would be mentioned at all in the `dirstate-v1` file format
404 have a node with at least one of the above three bits set in `dirstate-v2`.
405 Let’s call these files "tracked anywhere",
406 and "untracked" the nodes with all three of these bits unset.
407 Untracked nodes are typically for directories:
408 they hold child nodes and form the tree structure.
409 Additional untracked nodes may also exist.
410 Although implementations should strive to clean up nodes
411 that are entirely unused, other untracked nodes may also exist.
412 For example, a future version of Mercurial might in some cases
413 add nodes for untracked files or/and ignored files in the working directory
414 in order to optimize `hg status`
415 by enabling it to skip `readdir` in more cases.
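Decoding the flags byte is simple bit masking; a minimal sketch (the function and dict keys are ours, the mask values come from the listing above)::

```python
WDIR_TRACKED = 1 << 0
P1_TRACKED = 1 << 1
P2_INFO = 1 << 2
HAS_MODE_AND_SIZE = 1 << 3
HAS_MTIME = 1 << 4

def describe_flags(flags):
    """Decode the flags byte found at offset 30 of a node."""
    return {
        "wdir_tracked": bool(flags & WDIR_TRACKED),
        "p1_tracked": bool(flags & P1_TRACKED),
        "p2_info": bool(flags & P2_INFO),
        "has_mode_and_size": bool(flags & HAS_MODE_AND_SIZE),
        "has_mtime": bool(flags & HAS_MTIME),
        # "tracked anywhere": would have an entry in dirstate-v1
        "tracked_anywhere": bool(flags & (WDIR_TRACKED | P1_TRACKED | P2_INFO)),
    }
```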
416
417 When a node is for a file tracked anywhere,
418 the rest of the node data is three fields:
419
420 * Offset 31:
421 If `HAS_MODE_AND_SIZE` is unset, four zero bytes.
422 Otherwise, a 32-bit integer for the Unix mode (as in `stat_result.st_mode`)
423 expected for this file to be considered clean.
424 Only the `S_IXUSR` bit (owner has execute permission) is considered.
425
426 * Offset 35:
427 If `HAS_MTIME` is unset, four zero bytes.
428 Otherwise, a 32-bit integer for expected modified time of the file
429 (as in `stat_result.st_mtime`),
430 truncated to its 31 least-significant bits.
431 Unlike in dirstate-v1, negative values are not used.
432
433 * Offset 39:
434 If `HAS_MODE_AND_SIZE` is unset, four zero bytes.
435 Otherwise, a 32-bit integer for expected size of the file
436 truncated to its 31 least-significant bits.
437 Unlike in dirstate-v1, negative values are not used.
438
439 If an untracked node `HAS_MTIME` *unset*, this space is unused:
440
441 * Offset 31:
442 12 bytes set to zero
443
444 If an untracked node `HAS_MTIME` *set*,
445 what follows is the modification time of a directory
446 represented with separated second and sub-second components
447 since the Unix epoch:
448
449 * Offset 31:
450 The number of seconds as a signed (two’s complement) 64-bit integer.
451
452 * Offset 39:
453 The number of nanoseconds as 32-bit integer.
454 Always greater than or equal to zero, and strictly less than a billion.
455 Increasing this component makes the modification time
456 go forward or backward in time dependening
457 on the sign of the integral seconds components.
458 (Note: this is buggy because there is no negative zero integer,
459 but will be changed soon.)
460
461 The presence of a directory modification time means that at some point,
462 this path in the working directory was observed:
463
464 - To be a directory
465 - With the given modification time
466 - That time was already strictly in the past when observed,
467 meaning that later changes cannot happen in the same clock tick
468 and must cause a different modification time
469 (unless the system clock jumps back and we get unlucky,
470 which is not impossible but deemed unlikely enough).
471 - All direct children of this directory
472 (as returned by `std::fs::read_dir`)
473 either have a corresponding dirstate node,
474 or are ignored by ignore patterns whose hash is in tree metadata.
475
476 This means that if `std::fs::symlink_metadata` later reports
477 the same modification time
478 and ignored patterns haven’t changed,
479 a run of status that is not listing ignored files
480 can skip calling `std::fs::read_dir` again for this directory,
481 and iterate child dirstate nodes instead.
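The mtime comparison at the heart of that check can be sketched in Python (which exposes the same data via `os.lstat` rather than Rust's `std::fs`). This greatly simplifies the real algorithm: it only shows the timestamp match, and assumes the ignore-pattern-hash and "all children accounted for" conditions are verified elsewhere. The function name is ours::

```python
import os

def mtime_still_matches(dir_path, stored_secs, stored_nanos):
    """True if the directory's current modification time equals the
    (seconds, nanoseconds) pair cached in an untracked dirstate node,
    meaning a `status` run may iterate child nodes instead of calling
    readdir again (provided the other documented conditions hold)."""
    st = os.lstat(dir_path)
    # Split st_mtime_ns so the nanosecond part is always in [0, 1e9),
    # matching the on-disk representation described above.
    secs, nanos = divmod(st.st_mtime_ns, 1_000_000_000)
    return (secs, nanos) == (stored_secs, stored_nanos)
```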


* (Offset 43: end of this node)
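Putting the node layout together, one 43-byte record can be decoded with a single `struct` format string. A minimal sketch (the function and dict keys are ours; interpreting the 12-byte tail per the flags is left to the caller)::

```python
import struct

# path ptr/len/base-pos, copy ptr/len, children ptr/count,
# two descendant counters, flags byte, 12-byte flag-dependent tail
NODE = struct.Struct(">LHHLHLLLLB12s")

def parse_node(data, offset):
    """Decode one tree node from the data file at a given pseudo-pointer."""
    (path_ptr, path_len, base_name_pos,
     copy_ptr, copy_len,
     children_ptr, children_count,
     descendants_with_entry, tracked_descendants,
     flags, tail) = NODE.unpack_from(data, offset)
    full_path = data[path_ptr:path_ptr + path_len]
    # The base name follows the last `/`; position zero means there is none.
    base_name = full_path[base_name_pos + 1:] if base_name_pos else full_path
    copy_source = data[copy_ptr:copy_ptr + copy_len] if copy_ptr else None
    return {
        "full_path": full_path,
        "base_name": base_name,
        "copy_source": copy_source,
        "children": (children_ptr, children_count),
        "descendants_with_entry": descendants_with_entry,
        "tracked_descendants": tracked_descendants,
        "flags": flags,
        "tail": tail,  # mode/mtime/size or directory mtime, per the flags
    }
```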
# parsers.py - Python implementation of parsers.c
#
# Copyright 2009 Olivia Mackall <olivia@selenic.com> and others
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.

from __future__ import absolute_import

import struct
import zlib

from ..node import (
    nullrev,
    sha1nodeconstants,
)
from ..thirdparty import attr
from .. import (
    error,
    pycompat,
    revlogutils,
    util,
)

from ..revlogutils import nodemap as nodemaputil
from ..revlogutils import constants as revlog_constants

stringio = pycompat.bytesio


_pack = struct.pack
_unpack = struct.unpack
_compress = zlib.compress
_decompress = zlib.decompress


# a special value used internally for `size` if the file comes from the other parent
FROM_P2 = -2

# a special value used internally for `size` if the file is modified/merged/added
NONNORMAL = -1

# a special value used internally for `time` if the time is ambiguous
AMBIGUOUS_TIME = -1


@attr.s(slots=True, init=False)
class DirstateItem(object):
    """represent a dirstate entry

    It holds multiple attributes

    # about file tracking
    - wc_tracked: is the file tracked by the working copy
    - p1_tracked: is the file tracked in working copy first parent
    - p2_info: the file has been involved in some merge operation. Either
               because it was actually merged, or because the p2 version was
               ahead, or because some rename moved it there. In either case
               `hg status` will want it displayed as modified.

    # about the file state expected from p1 manifest:
    - mode: the file mode in p1
    - size: the file size in p1

    These values can be set to None, which means we don't have a meaningful
    value to compare with. Either because we don't really care about them as
    there `status` is known without having to look at the disk or because we
    don't know these right now and a full comparison will be needed to find out
    if the file is clean.

    # about the file state on disk last time we saw it:
    - mtime: the last known clean mtime for the file.

    This value can be set to None if no cachable state exist. Either because we
    do not care (see previous section) or because we could not cache something
    yet.
    """
78 78
79 79 _wc_tracked = attr.ib()
80 80 _p1_tracked = attr.ib()
81 81 _p2_info = attr.ib()
82 82 _mode = attr.ib()
83 83 _size = attr.ib()
84 84 _mtime = attr.ib()
85 85
86 86 def __init__(
87 87 self,
88 88 wc_tracked=False,
89 89 p1_tracked=False,
90 90 p2_info=False,
91 91 has_meaningful_data=True,
92 92 has_meaningful_mtime=True,
93 93 parentfiledata=None,
94 94 ):
95 95 self._wc_tracked = wc_tracked
96 96 self._p1_tracked = p1_tracked
97 97 self._p2_info = p2_info
98 98
99 99 self._mode = None
100 100 self._size = None
101 101 self._mtime = None
102 102 if parentfiledata is None:
103 103 has_meaningful_mtime = False
104 104 has_meaningful_data = False
105 105 if has_meaningful_data:
106 106 self._mode = parentfiledata[0]
107 107 self._size = parentfiledata[1]
108 108 if has_meaningful_mtime:
109 109 self._mtime = parentfiledata[2]
110 110
111 111 @classmethod
112 112 def from_v1_data(cls, state, mode, size, mtime):
113 113 """Build a new DirstateItem object from V1 data
114 114
115 115 Since the dirstate-v1 format is frozen, the signature of this function
116 116 is not expected to change, unlike the __init__ one.
117 117 """
118 118 if state == b'm':
119 119 return cls(wc_tracked=True, p1_tracked=True, p2_info=True)
120 120 elif state == b'a':
121 121 return cls(wc_tracked=True)
122 122 elif state == b'r':
123 123 if size == NONNORMAL:
124 124 p1_tracked = True
125 125 p2_info = True
126 126 elif size == FROM_P2:
127 127 p1_tracked = False
128 128 p2_info = True
129 129 else:
130 130 p1_tracked = True
131 131 p2_info = False
132 132 return cls(p1_tracked=p1_tracked, p2_info=p2_info)
133 133 elif state == b'n':
134 134 if size == FROM_P2:
135 135 return cls(wc_tracked=True, p2_info=True)
136 136 elif size == NONNORMAL:
137 137 return cls(wc_tracked=True, p1_tracked=True)
138 138 elif mtime == AMBIGUOUS_TIME:
139 139 return cls(
140 140 wc_tracked=True,
141 141 p1_tracked=True,
142 142 has_meaningful_mtime=False,
143 143 parentfiledata=(mode, size, 42),
144 144 )
145 145 else:
146 146 return cls(
147 147 wc_tracked=True,
148 148 p1_tracked=True,
149 149 parentfiledata=(mode, size, mtime),
150 150 )
151 151 else:
152 152 raise RuntimeError(b'unknown state: %s' % state)
153 153
154 154 def set_possibly_dirty(self):
155 155 """Mark a file as "possibly dirty"
156 156
157 157 This means the next status call will have to actually check its content
158 158 to make sure it is correct.
159 159 """
160 160 self._mtime = None
161 161
162 162 def set_clean(self, mode, size, mtime):
163 163 """mark a file as "clean" cancelling potential "possibly dirty call"
164 164
165 165 Note: this function is a descendant of `dirstate.normal` and is
166 166 currently expected to be call on "normal" entry only. There are not
167 167 reason for this to not change in the future as long as the ccode is
168 168 updated to preserve the proper state of the non-normal files.
169 169 """
170 170 self._wc_tracked = True
171 171 self._p1_tracked = True
172 172 self._mode = mode
173 173 self._size = size
174 174 self._mtime = mtime

    def set_tracked(self):
        """mark a file as tracked in the working copy

        This will ultimately be called by commands like `hg add`.
        """
        self._wc_tracked = True
        # `set_tracked` is replacing various `normallookup` calls. So we mark
        # the files as needing lookup
        #
        # Consider dropping this in the future in favor of something less broad.
        self._mtime = None

    def set_untracked(self):
        """mark a file as untracked in the working copy

        This will ultimately be called by commands like `hg remove`.
        """
        self._wc_tracked = False
        self._mode = None
        self._size = None
        self._mtime = None

    def drop_merge_data(self):
        """remove all "merge-only" information from a DirstateItem

        This is to be called by the dirstatemap code when the second parent is dropped
        """
        if self._p2_info:
            self._p2_info = False
            self._mode = None
            self._size = None
            self._mtime = None

    @property
    def mode(self):
        return self.v1_mode()

    @property
    def size(self):
        return self.v1_size()

    @property
    def mtime(self):
        return self.v1_mtime()

    @property
    def state(self):
        """
        States are:
          n  normal
          m  needs merging
          r  marked for removal
          a  marked for addition

        XXX This "state" is a bit obscure and mostly a direct expression of the
        dirstatev1 format. It would make sense to ultimately deprecate it in
        favor of the more "semantic" attributes.
        """
        if not self.any_tracked:
            return b'?'
        return self.v1_state()

    @property
    def tracked(self):
        """True if the file is tracked in the working copy"""
        return self._wc_tracked

    @property
    def any_tracked(self):
        """True if the file is tracked anywhere (wc or parents)"""
        return self._wc_tracked or self._p1_tracked or self._p2_info
247 247
248 248 @property
249 249 def added(self):
250 250 """True if the file has been added"""
251 251 return self._wc_tracked and not (self._p1_tracked or self._p2_info)
252 252
253 253 @property
254 254 def maybe_clean(self):
255 255 """True if the file has a chance to be in the "clean" state"""
256 256 if not self._wc_tracked:
257 257 return False
258 258 elif not self._p1_tracked:
259 259 return False
260 260 elif self._p2_info:
261 261 return False
262 262 return True
263 263
264 264 @property
265 265 def p1_tracked(self):
266 266 """True if the file is tracked in the first parent manifest"""
267 267 return self._p1_tracked
268 268
269 269 @property
270 270 def p2_info(self):
271 271 """True if the file needed to merge or apply any input from p2
272 272
273 273 See the class documentation for details.
274 274 """
275 275 return self._wc_tracked and self._p2_info
276 276
277 277 @property
278 278 def removed(self):
279 279 """True if the file has been removed"""
280 280 return not self._wc_tracked and (self._p1_tracked or self._p2_info)
281 281
282 282 def v1_state(self):
283 283 """return a "state" suitable for v1 serialization"""
284 284 if not self.any_tracked:
285 285 # the object has no state to record, this is -currently-
286 286 # unsupported
287 287 raise RuntimeError('untracked item')
288 288 elif self.removed:
289 289 return b'r'
290 290 elif self._p1_tracked and self._p2_info:
291 291 return b'm'
292 292 elif self.added:
293 293 return b'a'
294 294 else:
295 295 return b'n'
296 296
297 297 def v1_mode(self):
298 298 """return a "mode" suitable for v1 serialization"""
299 299 return self._mode if self._mode is not None else 0
300 300
301 301 def v1_size(self):
302 302 """return a "size" suitable for v1 serialization"""
303 303 if not self.any_tracked:
304 304 # the object has no state to record, this is -currently-
305 305 # unsupported
306 306 raise RuntimeError('untracked item')
307 307 elif self.removed and self._p1_tracked and self._p2_info:
308 308 return NONNORMAL
309 309 elif self._p2_info:
310 310 return FROM_P2
311 311 elif self.removed:
312 312 return 0
313 313 elif self.added:
314 314 return NONNORMAL
315 315 elif self._size is None:
316 316 return NONNORMAL
317 317 else:
318 318 return self._size
319 319
320 320 def v1_mtime(self):
321 321 """return a "mtime" suitable for v1 serialization"""
322 322 if not self.any_tracked:
323 323 # the object has no state to record, this is -currently-
324 324 # unsupported
325 325 raise RuntimeError('untracked item')
326 326 elif self.removed:
327 327 return 0
328 328 elif self._mtime is None:
329 329 return AMBIGUOUS_TIME
330 330 elif self._p2_info:
331 331 return AMBIGUOUS_TIME
332 332 elif not self._p1_tracked:
333 333 return AMBIGUOUS_TIME
334 334 else:
335 335 return self._mtime
336 336
337 337 def need_delay(self, now):
338 338 """True if the stored mtime would be ambiguous with the current time"""
339 339 return self.v1_state() == b'n' and self.v1_mtime() == now
340 340
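A condensed, standalone sketch of the v1 state mapping implemented by `v1_state()` above; the `added` case is inlined from the class's tracking flags (the `added` property itself is outside this hunk), so treat this as an illustration rather than Mercurial's API:

```python
# Illustrative only: maps the three tracking booleans to the v1 state byte,
# mirroring v1_state() above. "added" is inferred as "tracked in the working
# copy but in neither parent".
def v1_state(wc_tracked, p1_tracked, p2_info):
    if not (wc_tracked or p1_tracked or p2_info):
        raise RuntimeError('untracked item')
    if not wc_tracked:
        return b'r'  # removed: gone from the working copy
    if p1_tracked and p2_info:
        return b'm'  # merged: input from both parents
    if not p1_tracked and not p2_info:
        return b'a'  # added: only tracked in the working copy
    return b'n'      # normal

assert v1_state(True, True, True) == b'm'
```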
341 341
342 342 def gettype(q):
343 343 return int(q & 0xFFFF)
344 344
345 345
346 346 class BaseIndexObject(object):
347 347 # Can I be passed to an algorithm implemented in Rust?
348 348 rust_ext_compat = 0
349 349 # Format of an index entry according to Python's `struct` language
350 350 index_format = revlog_constants.INDEX_ENTRY_V1
351 351 # Size of a C unsigned long long int, platform independent
352 352 big_int_size = struct.calcsize(b'>Q')
353 353 # Size of a C long int, platform independent
354 354 int_size = struct.calcsize(b'>i')
355 355 # An empty index entry, used as a default value to be overridden, or nullrev
356 356 null_item = (
357 357 0,
358 358 0,
359 359 0,
360 360 -1,
361 361 -1,
362 362 -1,
363 363 -1,
364 364 sha1nodeconstants.nullid,
365 365 0,
366 366 0,
367 367 revlog_constants.COMP_MODE_INLINE,
368 368 revlog_constants.COMP_MODE_INLINE,
369 369 )
370 370
371 371 @util.propertycache
372 372 def entry_size(self):
373 373 return self.index_format.size
374 374
375 375 @property
376 376 def nodemap(self):
377 377 msg = b"index.nodemap is deprecated, use index.[has_node|rev|get_rev]"
378 378 util.nouideprecwarn(msg, b'5.3', stacklevel=2)
379 379 return self._nodemap
380 380
381 381 @util.propertycache
382 382 def _nodemap(self):
383 383 nodemap = nodemaputil.NodeMap({sha1nodeconstants.nullid: nullrev})
384 384 for r in range(0, len(self)):
385 385 n = self[r][7]
386 386 nodemap[n] = r
387 387 return nodemap
388 388
389 389 def has_node(self, node):
390 390 """return True if the node exists in the index"""
391 391 return node in self._nodemap
392 392
393 393 def rev(self, node):
394 394 """return a revision for a node
395 395
396 396 If the node is unknown, raise a RevlogError"""
397 397 return self._nodemap[node]
398 398
399 399 def get_rev(self, node):
400 400 """return a revision for a node
401 401
402 402 If the node is unknown, return None"""
403 403 return self._nodemap.get(node)
404 404
405 405 def _stripnodes(self, start):
406 406 if '_nodemap' in vars(self):
407 407 for r in range(start, len(self)):
408 408 n = self[r][7]
409 409 del self._nodemap[n]
410 410
411 411 def clearcaches(self):
412 412 self.__dict__.pop('_nodemap', None)
413 413
414 414 def __len__(self):
415 415 return self._lgt + len(self._extra)
416 416
417 417 def append(self, tup):
418 418 if '_nodemap' in vars(self):
419 419 self._nodemap[tup[7]] = len(self)
420 420 data = self._pack_entry(len(self), tup)
421 421 self._extra.append(data)
422 422
423 423 def _pack_entry(self, rev, entry):
424 424 assert entry[8] == 0
425 425 assert entry[9] == 0
426 426 return self.index_format.pack(*entry[:8])
427 427
428 428 def _check_index(self, i):
429 429 if not isinstance(i, int):
430 430 raise TypeError(b"expecting int indexes")
431 431 if i < 0 or i >= len(self):
432 432 raise IndexError
433 433
434 434 def __getitem__(self, i):
435 435 if i == -1:
436 436 return self.null_item
437 437 self._check_index(i)
438 438 if i >= self._lgt:
439 439 data = self._extra[i - self._lgt]
440 440 else:
441 441 index = self._calculate_index(i)
442 442 data = self._data[index : index + self.entry_size]
443 443 r = self._unpack_entry(i, data)
444 444 if self._lgt and i == 0:
445 445 offset = revlogutils.offset_type(0, gettype(r[0]))
446 446 r = (offset,) + r[1:]
447 447 return r
448 448
449 449 def _unpack_entry(self, rev, data):
450 450 r = self.index_format.unpack(data)
451 451 r = r + (
452 452 0,
453 453 0,
454 454 revlog_constants.COMP_MODE_INLINE,
455 455 revlog_constants.COMP_MODE_INLINE,
456 456 )
457 457 return r
458 458
459 459 def pack_header(self, header):
460 460 """pack header information as binary"""
461 461 v_fmt = revlog_constants.INDEX_HEADER
462 462 return v_fmt.pack(header)
463 463
464 464 def entry_binary(self, rev):
465 465 """return the raw binary string representing a revision"""
466 466 entry = self[rev]
467 467 p = revlog_constants.INDEX_ENTRY_V1.pack(*entry[:8])
468 468 if rev == 0:
469 469 p = p[revlog_constants.INDEX_HEADER.size :]
470 470 return p
471 471
472 472
473 473 class IndexObject(BaseIndexObject):
474 474 def __init__(self, data):
475 475 assert len(data) % self.entry_size == 0, (
476 476 len(data),
477 477 self.entry_size,
478 478 len(data) % self.entry_size,
479 479 )
480 480 self._data = data
481 481 self._lgt = len(data) // self.entry_size
482 482 self._extra = []
483 483
484 484 def _calculate_index(self, i):
485 485 return i * self.entry_size
486 486
487 487 def __delitem__(self, i):
488 488 if not isinstance(i, slice) or not i.stop == -1 or i.step is not None:
489 489 raise ValueError(b"deleting slices only supports a:-1 with step 1")
490 490 i = i.start
491 491 self._check_index(i)
492 492 self._stripnodes(i)
493 493 if i < self._lgt:
494 494 self._data = self._data[: i * self.entry_size]
495 495 self._lgt = i
496 496 self._extra = []
497 497 else:
498 498 self._extra = self._extra[: i - self._lgt]
499 499
500 500
501 501 class PersistentNodeMapIndexObject(IndexObject):
502 502 """a Debug oriented class to test persistent nodemap
503 503
504 504 We need a simple python object to test API and higher level behavior. See
505 505 the Rust implementation for more serious usage. This should be used only
506 506 through the dedicated `devel.persistent-nodemap` config.
507 507 """
508 508
509 509 def nodemap_data_all(self):
510 510 """Return bytes containing a full serialization of a nodemap
511 511
512 512 The nodemap should be valid for the full set of revisions in the
513 513 index."""
514 514 return nodemaputil.persistent_data(self)
515 515
516 516 def nodemap_data_incremental(self):
517 517 """Return bytes containing an incremental update to persistent nodemap
518 518
519 519 This contains the data for an append-only update of the data provided
520 520 in the last call to `update_nodemap_data`.
521 521 """
522 522 if self._nm_root is None:
523 523 return None
524 524 docket = self._nm_docket
525 525 changed, data = nodemaputil.update_persistent_data(
526 526 self, self._nm_root, self._nm_max_idx, self._nm_docket.tip_rev
527 527 )
528 528
529 529 self._nm_root = self._nm_max_idx = self._nm_docket = None
530 530 return docket, changed, data
531 531
532 532 def update_nodemap_data(self, docket, nm_data):
533 533 """provide full block of persisted binary data for a nodemap
534 534
535 535 The data are expected to come from disk. See `nodemap_data_all` for a
536 536 producer of such data."""
537 537 if nm_data is not None:
538 538 self._nm_root, self._nm_max_idx = nodemaputil.parse_data(nm_data)
539 539 if self._nm_root:
540 540 self._nm_docket = docket
541 541 else:
542 542 self._nm_root = self._nm_max_idx = self._nm_docket = None
543 543
544 544
545 545 class InlinedIndexObject(BaseIndexObject):
546 546 def __init__(self, data, inline=0):
547 547 self._data = data
548 548 self._lgt = self._inline_scan(None)
549 549 self._inline_scan(self._lgt)
550 550 self._extra = []
551 551
552 552 def _inline_scan(self, lgt):
553 553 off = 0
554 554 if lgt is not None:
555 555 self._offsets = [0] * lgt
556 556 count = 0
557 557 while off <= len(self._data) - self.entry_size:
558 558 start = off + self.big_int_size
559 559 (s,) = struct.unpack(
560 560 b'>i',
561 561 self._data[start : start + self.int_size],
562 562 )
563 563 if lgt is not None:
564 564 self._offsets[count] = off
565 565 count += 1
566 566 off += self.entry_size + s
567 567 if off != len(self._data):
568 568 raise ValueError(b"corrupted data")
569 569 return count
570 570
571 571 def __delitem__(self, i):
572 572 if not isinstance(i, slice) or not i.stop == -1 or i.step is not None:
573 573 raise ValueError(b"deleting slices only supports a:-1 with step 1")
574 574 i = i.start
575 575 self._check_index(i)
576 576 self._stripnodes(i)
577 577 if i < self._lgt:
578 578 self._offsets = self._offsets[:i]
579 579 self._lgt = i
580 580 self._extra = []
581 581 else:
582 582 self._extra = self._extra[: i - self._lgt]
583 583
584 584 def _calculate_index(self, i):
585 585 return self._offsets[i]
586 586
587 587
588 588 def parse_index2(data, inline, revlogv2=False):
589 589 if not inline:
590 590 cls = IndexObject2 if revlogv2 else IndexObject
591 591 return cls(data), None
592 592 cls = InlinedIndexObject
593 593 return cls(data, inline), (0, data)
594 594
595 595
596 596 def parse_index_cl_v2(data):
597 597 return IndexChangelogV2(data), None
598 598
599 599
600 600 class IndexObject2(IndexObject):
601 601 index_format = revlog_constants.INDEX_ENTRY_V2
602 602
603 603 def replace_sidedata_info(
604 604 self,
605 605 rev,
606 606 sidedata_offset,
607 607 sidedata_length,
608 608 offset_flags,
609 609 compression_mode,
610 610 ):
611 611 """
612 612 Replace an existing index entry's sidedata offset and length with new
613 613 ones.
614 614 This cannot be used outside of the context of sidedata rewriting,
615 615 inside the transaction that creates the revision `rev`.
616 616 """
617 617 if rev < 0:
618 618 raise KeyError
619 619 self._check_index(rev)
620 620 if rev < self._lgt:
621 621 msg = b"cannot rewrite entries outside of this transaction"
622 622 raise KeyError(msg)
623 623 else:
624 624 entry = list(self[rev])
625 625 entry[0] = offset_flags
626 626 entry[8] = sidedata_offset
627 627 entry[9] = sidedata_length
628 628 entry[11] = compression_mode
629 629 entry = tuple(entry)
630 630 new = self._pack_entry(rev, entry)
631 631 self._extra[rev - self._lgt] = new
632 632
633 633 def _unpack_entry(self, rev, data):
634 634 data = self.index_format.unpack(data)
635 635 entry = data[:10]
636 636 data_comp = data[10] & 3
637 637 sidedata_comp = (data[10] & (3 << 2)) >> 2
638 638 return entry + (data_comp, sidedata_comp)
639 639
640 640 def _pack_entry(self, rev, entry):
641 641 data = entry[:10]
642 642 data_comp = entry[10] & 3
643 643 sidedata_comp = (entry[11] & 3) << 2
644 644 data += (data_comp | sidedata_comp,)
645 645
646 646 return self.index_format.pack(*data)
647 647
648 648 def entry_binary(self, rev):
649 649 """return the raw binary string representing a revision"""
650 650 entry = self[rev]
651 651 return self._pack_entry(rev, entry)
652 652
653 653 def pack_header(self, header):
654 654 """pack header information as binary"""
655 655 msg = 'version header should go in the docket, not the index: %d'
656 656 msg %= header
657 657 raise error.ProgrammingError(msg)
658 658
659 659
660 660 class IndexChangelogV2(IndexObject2):
661 661 index_format = revlog_constants.INDEX_ENTRY_CL_V2
662 662
663 663 def _unpack_entry(self, rev, data, r=True):
664 664 items = self.index_format.unpack(data)
665 665 entry = items[:3] + (rev, rev) + items[3:8]
666 666 data_comp = items[8] & 3
667 667 sidedata_comp = (items[8] >> 2) & 3
668 668 return entry + (data_comp, sidedata_comp)
669 669
670 670 def _pack_entry(self, rev, entry):
671 671 assert entry[3] == rev, entry[3]
672 672 assert entry[4] == rev, entry[4]
673 673 data = entry[:3] + entry[5:10]
674 674 data_comp = entry[10] & 3
675 675 sidedata_comp = (entry[11] & 3) << 2
676 676 data += (data_comp | sidedata_comp,)
677 677 return self.index_format.pack(*data)
678 678
679 679
680 680 def parse_index_devel_nodemap(data, inline):
681 681 """like parse_index2, but always returns a PersistentNodeMapIndexObject"""
682 682 return PersistentNodeMapIndexObject(data), None
683 683
684 684
685 685 def parse_dirstate(dmap, copymap, st):
686 686 parents = [st[:20], st[20:40]]
687 687 # dereference fields so they will be local in loop
688 688 format = b">cllll"
689 689 e_size = struct.calcsize(format)
690 690 pos1 = 40
691 691 l = len(st)
692 692
693 693 # the inner loop
694 694 while pos1 < l:
695 695 pos2 = pos1 + e_size
696 696 e = _unpack(b">cllll", st[pos1:pos2]) # a literal here is faster
697 697 pos1 = pos2 + e[4]
698 698 f = st[pos2:pos1]
699 699 if b'\0' in f:
700 700 f, c = f.split(b'\0')
701 701 copymap[f] = c
702 702 dmap[f] = DirstateItem.from_v1_data(*e[:4])
703 703 return parents
704 704
705 705
706 706 def pack_dirstate(dmap, copymap, pl, now):
707 707 now = int(now)
708 708 cs = stringio()
709 709 write = cs.write
710 710 write(b"".join(pl))
711 711 for f, e in pycompat.iteritems(dmap):
712 712 if e.need_delay(now):
713 713 # The file was last modified "simultaneously" with the current
714 714 # write to dirstate (i.e. within the same second for file-
715 715 # systems with a granularity of 1 sec). This commonly happens
716 716 # for at least a couple of files on 'update'.
717 717 # The user could change the file without changing its size
718 718 # within the same second. Invalidate the file's mtime in
719 719 # dirstate, forcing future 'status' calls to compare the
720 720 # contents of the file if the size is the same. This prevents
721 721 # mistakenly treating such files as clean.
722 722 e.set_possibly_dirty()
723 723
724 724 if f in copymap:
725 725 f = b"%s\0%s" % (f, copymap[f])
726 726 e = _pack(
727 727 b">cllll",
728 728 e.v1_state(),
729 729 e.v1_mode(),
730 730 e.v1_size(),
731 731 e.v1_mtime(),
732 732 len(f),
733 733 )
734 734 write(e)
735 735 write(f)
736 736 return cs.getvalue()
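For reference, a minimal sketch (not Mercurial code; the file name and field values are made up) of the `>cllll` record layout that `parse_dirstate` and `pack_dirstate` above read and write:

```python
import struct

# One dirstate-v1 entry: state char, mode, size, mtime, filename length,
# followed by the filename bytes (optionally b"\0" + copy source).
fmt = b">cllll"
name = b"src/main.py"  # hypothetical path
record = struct.pack(fmt, b"n", 0o644, 1234, 1600000000, len(name)) + name

header = record[:struct.calcsize(fmt)]
state, mode, size, mtime, flen = struct.unpack(fmt, header)
assert state == b"n" and record[struct.calcsize(fmt):][:flen] == name
```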
@@ -1,792 +1,733 b''
1 1 //! The "version 2" disk representation of the dirstate
2 2 //!
3 3 //! See `mercurial/helptext/internals/dirstate-v2.txt`
4 4
5 5 use crate::dirstate_tree::dirstate_map::{self, DirstateMap, NodeRef};
6 6 use crate::dirstate_tree::path_with_basename::WithBasename;
7 7 use crate::errors::HgError;
8 8 use crate::utils::hg_path::HgPath;
9 9 use crate::DirstateEntry;
10 10 use crate::DirstateError;
11 11 use crate::DirstateParents;
12 12 use bitflags::bitflags;
13 13 use bytes_cast::unaligned::{I32Be, I64Be, U16Be, U32Be};
14 14 use bytes_cast::BytesCast;
15 15 use format_bytes::format_bytes;
16 16 use std::borrow::Cow;
17 17 use std::convert::{TryFrom, TryInto};
18 18 use std::time::{Duration, SystemTime, UNIX_EPOCH};
19 19
20 20 /// Added at the start of `.hg/dirstate` when the "v2" format is used.
21 21 /// This is a redundant sanity check more than an actual "magic number" since
22 22 /// `.hg/requires` already governs which format should be used.
23 23 pub const V2_FORMAT_MARKER: &[u8; 12] = b"dirstate-v2\n";
24 24
25 25 /// Keep space for 256-bit hashes
26 26 const STORED_NODE_ID_BYTES: usize = 32;
27 27
28 28 /// … even though only 160 bits are used for now, with SHA-1
29 29 const USED_NODE_ID_BYTES: usize = 20;
30 30
31 31 pub(super) const IGNORE_PATTERNS_HASH_LEN: usize = 20;
32 32 pub(super) type IgnorePatternsHash = [u8; IGNORE_PATTERNS_HASH_LEN];
33 33
34 34 /// Must match the constant of the same name in
35 35 /// `mercurial/dirstateutils/docket.py`
36 36 const TREE_METADATA_SIZE: usize = 44;
37 37
38 38 /// Make sure that size-affecting changes are made knowingly
39 39 #[allow(unused)]
40 40 fn static_assert_size_of() {
41 41 let _ = std::mem::transmute::<TreeMetadata, [u8; TREE_METADATA_SIZE]>;
42 42 let _ = std::mem::transmute::<DocketHeader, [u8; TREE_METADATA_SIZE + 81]>;
43 43 let _ = std::mem::transmute::<Node, [u8; 43]>;
44 44 }
45 45
46 46 // Must match `HEADER` in `mercurial/dirstateutils/docket.py`
47 47 #[derive(BytesCast)]
48 48 #[repr(C)]
49 49 struct DocketHeader {
50 50 marker: [u8; V2_FORMAT_MARKER.len()],
51 51 parent_1: [u8; STORED_NODE_ID_BYTES],
52 52 parent_2: [u8; STORED_NODE_ID_BYTES],
53 53
54 54 metadata: TreeMetadata,
55 55
56 56 /// Counted in bytes
57 57 data_size: Size,
58 58
59 59 uuid_size: u8,
60 60 }
61 61
62 62 pub struct Docket<'on_disk> {
63 63 header: &'on_disk DocketHeader,
64 64 uuid: &'on_disk [u8],
65 65 }
66 66
67 /// Fields are documented in the *Tree metadata in the docket file*
68 /// section of `mercurial/helptext/internals/dirstate-v2.txt`
67 69 #[derive(BytesCast)]
68 70 #[repr(C)]
69 71 struct TreeMetadata {
70 72 root_nodes: ChildNodes,
71 73 nodes_with_entry_count: Size,
72 74 nodes_with_copy_source_count: Size,
73
74 /// How many bytes of this data file are not used anymore
75 75 unreachable_bytes: Size,
76
77 /// Current version always sets these bytes to zero when creating or
78 /// updating a dirstate. Future versions could assign some bits to signal
79 /// for example "the version that last wrote/updated this dirstate did so
80 /// in such and such way that can be relied on by versions that know to."
81 76 unused: [u8; 4],
82 77
83 /// If non-zero, a hash of ignore files that were used for some previous
84 /// run of the `status` algorithm.
85 ///
86 /// We define:
87 ///
88 /// * "Root" ignore files are `.hgignore` at the root of the repository if
89 /// it exists, and files from `ui.ignore.*` config. This set of files is
90 /// then sorted by the string representation of their path.
91 /// * The "expanded contents" of an ignore file is the byte string made
92 /// by concatenating its contents with the "expanded contents" of other
93 /// files included with `include:` or `subinclude:` files, in inclusion
94 /// order. This definition is recursive, as included files can
95 /// themselves include more files.
96 ///
97 /// This hash is defined as the SHA-1 of the concatenation (in sorted
98 /// order) of the "expanded contents" of each "root" ignore file.
99 /// (Note that computing this does not require actually concatenating byte
100 /// strings into contiguous memory, instead SHA-1 hashing can be done
101 /// incrementally.)
78 /// See the *Optional hash of ignore patterns* section of
79 /// `mercurial/helptext/internals/dirstate-v2.txt`
102 80 ignore_patterns_hash: IgnorePatternsHash,
103 81 }
104 82
83 /// Fields are documented in *The data file format*
84 /// section of `mercurial/helptext/internals/dirstate-v2.txt`
105 85 #[derive(BytesCast)]
106 86 #[repr(C)]
107 87 pub(super) struct Node {
108 88 full_path: PathSlice,
109 89
110 90 /// In bytes from `self.full_path.start`
111 91 base_name_start: PathSize,
112 92
113 93 copy_source: OptPathSlice,
114 94 children: ChildNodes,
115 95 pub(super) descendants_with_entry_count: Size,
116 96 pub(super) tracked_descendants_count: Size,
117
118 /// Depending on the bits in `flags`:
119 ///
120 /// * If any of `WDIR_TRACKED`, `P1_TRACKED`, or `P2_INFO` are set, the
121 /// node has an entry.
122 ///
123 /// - If `HAS_MODE_AND_SIZE` is set, `data.mode` and `data.size` are
124 /// meaningful. Otherwise they are set to zero
125 /// - If `HAS_MTIME` is set, `data.mtime` is meaningful. Otherwise it is
126 /// set to zero.
127 ///
128 /// * If none of `WDIR_TRACKED`, `P1_TRACKED`, `P2_INFO`, or `HAS_MTIME`
129 /// are set, the node does not have an entry and `data` is set to all
130 /// zeros.
131 ///
132 /// * If none of `WDIR_TRACKED`, `P1_TRACKED`, `P2_INFO` are set, but
133 /// `HAS_MTIME` is set, the bytes of `data` should instead be
134 /// interpreted as the `Timestamp` for the mtime of a cached directory.
135 ///
136 /// The presence of this combination of flags means that at some point,
137 /// this path in the working directory was observed:
138 ///
139 /// - To be a directory
140 /// - With the modification time as given by `Timestamp`
141 /// - That timestamp was already strictly in the past when observed,
142 /// meaning that later changes cannot happen in the same clock tick
143 /// and must cause a different modification time (unless the system
144 /// clock jumps back and we get unlucky, which is not impossible but
145 /// deemed unlikely enough).
146 /// - All direct children of this directory (as returned by
147 /// `std::fs::read_dir`) either have a corresponding dirstate node, or
148 /// are ignored by ignore patterns whose hash is in
149 /// `TreeMetadata::ignore_patterns_hash`.
150 ///
151 /// This means that if `std::fs::symlink_metadata` later reports the
152 /// same modification time and ignored patterns haven’t changed, a run
153 /// of status that is not listing ignored files can skip calling
154 /// `std::fs::read_dir` again for this directory, and iterate child
155 /// dirstate nodes instead.
156 97 flags: Flags,
157 98 data: Entry,
158 99 }
159 100
160 101 bitflags! {
161 102 #[derive(BytesCast)]
162 103 #[repr(C)]
163 104 struct Flags: u8 {
164 105 const WDIR_TRACKED = 1 << 0;
165 106 const P1_TRACKED = 1 << 1;
166 107 const P2_INFO = 1 << 2;
167 108 const HAS_MODE_AND_SIZE = 1 << 3;
168 109 const HAS_MTIME = 1 << 4;
169 110 }
170 111 }
171 112
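In Python terms, the flag bits defined above combine as follows; this is a hedged sketch mirroring `has_entry` and `cached_directory_mtime` in this file, not a separate API:

```python
# Bit values copied from the Flags bitflags above.
WDIR_TRACKED      = 1 << 0
P1_TRACKED        = 1 << 1
P2_INFO           = 1 << 2
HAS_MODE_AND_SIZE = 1 << 3
HAS_MTIME         = 1 << 4

def has_entry(flags):
    # A node has a dirstate entry if any tracking-related bit is set.
    return bool(flags & (WDIR_TRACKED | P1_TRACKED | P2_INFO))

def is_cached_directory_mtime(flags):
    # HAS_MTIME without an entry means `data` holds a directory Timestamp.
    return bool(flags & HAS_MTIME) and not has_entry(flags)

assert is_cached_directory_mtime(HAS_MTIME)
```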
172 113 #[derive(BytesCast, Copy, Clone, Debug)]
173 114 #[repr(C)]
174 115 struct Entry {
175 116 mode: I32Be,
176 117 mtime: I32Be,
177 118 size: I32Be,
178 119 }
179 120
180 121 /// Duration since the Unix epoch
181 122 #[derive(BytesCast, Copy, Clone, PartialEq)]
182 123 #[repr(C)]
183 124 pub(super) struct Timestamp {
184 125 seconds: I64Be,
185 126
186 127 /// In `0 .. 1_000_000_000`.
187 128 ///
188 129 /// This timestamp is later or earlier than `(seconds, 0)` by this many
189 130 /// nanoseconds, if `seconds` is non-negative or negative, respectively.
190 131 nanoseconds: U32Be,
191 132 }
192 133
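The sign convention documented on `nanoseconds` above can be illustrated with a small sketch (Python, for brevity): the fractional part moves away from zero in the same direction as `seconds`.

```python
# Illustration of the Timestamp convention: nanoseconds in 0..1_000_000_000,
# offsetting (seconds, 0) later if seconds >= 0, earlier otherwise.
def to_unix_nanos(seconds, nanoseconds):
    assert 0 <= nanoseconds < 1_000_000_000
    if seconds >= 0:
        return seconds * 1_000_000_000 + nanoseconds
    return seconds * 1_000_000_000 - nanoseconds

assert to_unix_nanos(2, 500_000_000) == 2_500_000_000
assert to_unix_nanos(-2, 500_000_000) == -2_500_000_000
```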
193 134 /// Counted in bytes from the start of the file
194 135 ///
195 136 /// NOTE: not supporting `.hg/dirstate` files larger than 4 GiB.
196 137 type Offset = U32Be;
197 138
198 139 /// Counted in number of items
199 140 ///
200 141 /// NOTE: we choose not to support counting more than 4 billion nodes anywhere.
201 142 type Size = U32Be;
202 143
203 144 /// Counted in bytes
204 145 ///
205 146 /// NOTE: we choose not to support file names/paths longer than 64 KiB.
206 147 type PathSize = U16Be;
207 148
208 149 /// A contiguous sequence of `len` times `Node`, representing the child nodes
209 150 /// of either some other node or of the repository root.
210 151 ///
211 152 /// Always sorted by ascending `full_path`, to allow binary search.
212 153 /// Since nodes with the same parent nodes also have the same parent path,
213 154 /// only the `base_name`s need to be compared during binary search.
214 155 #[derive(BytesCast, Copy, Clone)]
215 156 #[repr(C)]
216 157 struct ChildNodes {
217 158 start: Offset,
218 159 len: Size,
219 160 }
220 161
221 162 /// A `HgPath` of `len` bytes
222 163 #[derive(BytesCast, Copy, Clone)]
223 164 #[repr(C)]
224 165 struct PathSlice {
225 166 start: Offset,
226 167 len: PathSize,
227 168 }
228 169
229 170 /// Either nothing if `start == 0`, or a `HgPath` of `len` bytes
230 171 type OptPathSlice = PathSlice;
231 172
232 173 /// Unexpected file format found in `.hg/dirstate` with the "v2" format.
233 174 ///
234 175 /// This should only happen if Mercurial is buggy or a repository is corrupted.
235 176 #[derive(Debug)]
236 177 pub struct DirstateV2ParseError;
237 178
238 179 impl From<DirstateV2ParseError> for HgError {
239 180 fn from(_: DirstateV2ParseError) -> Self {
240 181 HgError::corrupted("dirstate-v2 parse error")
241 182 }
242 183 }
243 184
244 185 impl From<DirstateV2ParseError> for crate::DirstateError {
245 186 fn from(error: DirstateV2ParseError) -> Self {
246 187 HgError::from(error).into()
247 188 }
248 189 }
249 190
250 191 impl<'on_disk> Docket<'on_disk> {
251 192 pub fn parents(&self) -> DirstateParents {
252 193 use crate::Node;
253 194 let p1 = Node::try_from(&self.header.parent_1[..USED_NODE_ID_BYTES])
254 195 .unwrap()
255 196 .clone();
256 197 let p2 = Node::try_from(&self.header.parent_2[..USED_NODE_ID_BYTES])
257 198 .unwrap()
258 199 .clone();
259 200 DirstateParents { p1, p2 }
260 201 }
261 202
262 203 pub fn tree_metadata(&self) -> &[u8] {
263 204 self.header.metadata.as_bytes()
264 205 }
265 206
266 207 pub fn data_size(&self) -> usize {
267 208 // This `unwrap` could only panic on a 16-bit CPU
268 209 self.header.data_size.get().try_into().unwrap()
269 210 }
270 211
271 212 pub fn data_filename(&self) -> String {
272 213 String::from_utf8(format_bytes!(b"dirstate.{}", self.uuid)).unwrap()
273 214 }
274 215 }
275 216
276 217 pub fn read_docket(
277 218 on_disk: &[u8],
278 219 ) -> Result<Docket<'_>, DirstateV2ParseError> {
279 220 let (header, uuid) =
280 221 DocketHeader::from_bytes(on_disk).map_err(|_| DirstateV2ParseError)?;
281 222 let uuid_size = header.uuid_size as usize;
282 223 if header.marker == *V2_FORMAT_MARKER && uuid.len() == uuid_size {
283 224 Ok(Docket { header, uuid })
284 225 } else {
285 226 Err(DirstateV2ParseError)
286 227 }
287 228 }
288 229
289 230 pub(super) fn read<'on_disk>(
290 231 on_disk: &'on_disk [u8],
291 232 metadata: &[u8],
292 233 ) -> Result<DirstateMap<'on_disk>, DirstateV2ParseError> {
293 234 if on_disk.is_empty() {
294 235 return Ok(DirstateMap::empty(on_disk));
295 236 }
296 237 let (meta, _) = TreeMetadata::from_bytes(metadata)
297 238 .map_err(|_| DirstateV2ParseError)?;
298 239 let dirstate_map = DirstateMap {
299 240 on_disk,
300 241 root: dirstate_map::ChildNodes::OnDisk(read_nodes(
301 242 on_disk,
302 243 meta.root_nodes,
303 244 )?),
304 245 nodes_with_entry_count: meta.nodes_with_entry_count.get(),
305 246 nodes_with_copy_source_count: meta.nodes_with_copy_source_count.get(),
306 247 ignore_patterns_hash: meta.ignore_patterns_hash,
307 248 unreachable_bytes: meta.unreachable_bytes.get(),
308 249 };
309 250 Ok(dirstate_map)
310 251 }
311 252
312 253 impl Node {
313 254 pub(super) fn full_path<'on_disk>(
314 255 &self,
315 256 on_disk: &'on_disk [u8],
316 257 ) -> Result<&'on_disk HgPath, DirstateV2ParseError> {
317 258 read_hg_path(on_disk, self.full_path)
318 259 }
319 260
320 261 pub(super) fn base_name_start<'on_disk>(
321 262 &self,
322 263 ) -> Result<usize, DirstateV2ParseError> {
323 264 let start = self.base_name_start.get();
324 265 if start < self.full_path.len.get() {
325 266 let start = usize::try_from(start)
326 267 // u32 -> usize, could only panic on a 16-bit CPU
327 268 .expect("dirstate-v2 base_name_start out of bounds");
328 269 Ok(start)
329 270 } else {
330 271 Err(DirstateV2ParseError)
331 272 }
332 273 }
333 274
334 275 pub(super) fn base_name<'on_disk>(
335 276 &self,
336 277 on_disk: &'on_disk [u8],
337 278 ) -> Result<&'on_disk HgPath, DirstateV2ParseError> {
338 279 let full_path = self.full_path(on_disk)?;
339 280 let base_name_start = self.base_name_start()?;
340 281 Ok(HgPath::new(&full_path.as_bytes()[base_name_start..]))
341 282 }
342 283
343 284 pub(super) fn path<'on_disk>(
344 285 &self,
345 286 on_disk: &'on_disk [u8],
346 287 ) -> Result<dirstate_map::NodeKey<'on_disk>, DirstateV2ParseError> {
347 288 Ok(WithBasename::from_raw_parts(
348 289 Cow::Borrowed(self.full_path(on_disk)?),
349 290 self.base_name_start()?,
350 291 ))
351 292 }
352 293
353 294 pub(super) fn has_copy_source<'on_disk>(&self) -> bool {
354 295 self.copy_source.start.get() != 0
355 296 }
356 297
357 298 pub(super) fn copy_source<'on_disk>(
358 299 &self,
359 300 on_disk: &'on_disk [u8],
360 301 ) -> Result<Option<&'on_disk HgPath>, DirstateV2ParseError> {
361 302 Ok(if self.has_copy_source() {
362 303 Some(read_hg_path(on_disk, self.copy_source)?)
363 304 } else {
364 305 None
365 306 })
366 307 }
367 308
368 309 fn has_entry(&self) -> bool {
369 310 self.flags.intersects(
370 311 Flags::WDIR_TRACKED | Flags::P1_TRACKED | Flags::P2_INFO,
371 312 )
372 313 }
373 314
374 315 pub(super) fn node_data(
375 316 &self,
376 317 ) -> Result<dirstate_map::NodeData, DirstateV2ParseError> {
377 318 if self.has_entry() {
378 319 Ok(dirstate_map::NodeData::Entry(self.assume_entry()))
379 320 } else if let Some(&mtime) = self.cached_directory_mtime() {
380 321 Ok(dirstate_map::NodeData::CachedDirectory { mtime })
381 322 } else {
382 323 Ok(dirstate_map::NodeData::None)
383 324 }
384 325 }
385 326
386 327 pub(super) fn cached_directory_mtime(&self) -> Option<&Timestamp> {
387 328 if self.flags.contains(Flags::HAS_MTIME) && !self.has_entry() {
388 329 Some(self.data.as_timestamp())
389 330 } else {
390 331 None
391 332 }
392 333 }
393 334
394 335 fn assume_entry(&self) -> DirstateEntry {
395 336 // TODO: convert through raw bits instead?
396 337 let wdir_tracked = self.flags.contains(Flags::WDIR_TRACKED);
397 338 let p1_tracked = self.flags.contains(Flags::P1_TRACKED);
398 339 let p2_info = self.flags.contains(Flags::P2_INFO);
399 340 let mode_size = if self.flags.contains(Flags::HAS_MODE_AND_SIZE) {
400 341 Some((self.data.mode.into(), self.data.size.into()))
401 342 } else {
402 343 None
403 344 };
404 345 let mtime = if self.flags.contains(Flags::HAS_MTIME) {
405 346 Some(self.data.mtime.into())
406 347 } else {
407 348 None
408 349 };
409 350 DirstateEntry::from_v2_data(
410 351 wdir_tracked,
411 352 p1_tracked,
412 353 p2_info,
413 354 mode_size,
414 355 mtime,
415 356 )
416 357 }
417 358
418 359 pub(super) fn entry(
419 360 &self,
420 361 ) -> Result<Option<DirstateEntry>, DirstateV2ParseError> {
421 362 if self.has_entry() {
422 363 Ok(Some(self.assume_entry()))
423 364 } else {
424 365 Ok(None)
425 366 }
426 367 }
427 368
428 369 pub(super) fn children<'on_disk>(
429 370 &self,
430 371 on_disk: &'on_disk [u8],
431 372 ) -> Result<&'on_disk [Node], DirstateV2ParseError> {
432 373 read_nodes(on_disk, self.children)
433 374 }
434 375
435 376 pub(super) fn to_in_memory_node<'on_disk>(
436 377 &self,
437 378 on_disk: &'on_disk [u8],
438 379 ) -> Result<dirstate_map::Node<'on_disk>, DirstateV2ParseError> {
439 380 Ok(dirstate_map::Node {
440 381 children: dirstate_map::ChildNodes::OnDisk(
441 382 self.children(on_disk)?,
442 383 ),
443 384 copy_source: self.copy_source(on_disk)?.map(Cow::Borrowed),
444 385 data: self.node_data()?,
445 386 descendants_with_entry_count: self
446 387 .descendants_with_entry_count
447 388 .get(),
448 389 tracked_descendants_count: self.tracked_descendants_count.get(),
449 390 })
450 391 }
451 392 }
452 393
453 394 impl Entry {
454 395 fn from_dirstate_entry(entry: &DirstateEntry) -> (Flags, Self) {
455 396 let (wdir_tracked, p1_tracked, p2_info, mode_size_opt, mtime_opt) =
456 397 entry.v2_data();
457 398 // TODO: convert through raw flag bits instead?
458 399 let mut flags = Flags::empty();
459 400 flags.set(Flags::WDIR_TRACKED, wdir_tracked);
460 401 flags.set(Flags::P1_TRACKED, p1_tracked);
461 402 flags.set(Flags::P2_INFO, p2_info);
462 403 let (mode, size, mtime);
463 404 if let Some((m, s)) = mode_size_opt {
464 405 mode = m;
465 406 size = s;
466 407 flags.insert(Flags::HAS_MODE_AND_SIZE)
467 408 } else {
468 409 mode = 0;
469 410 size = 0;
470 411 }
471 412 if let Some(m) = mtime_opt {
472 413 mtime = m;
473 414 flags.insert(Flags::HAS_MTIME);
474 415 } else {
475 416 mtime = 0;
476 417 }
477 418 let raw_entry = Entry {
478 419 mode: mode.into(),
479 420 size: size.into(),
480 421 mtime: mtime.into(),
481 422 };
482 423 (flags, raw_entry)
483 424 }
484 425
485 426 fn from_timestamp(timestamp: Timestamp) -> Self {
486 427 // Safety: both types implement the `ByteCast` trait, so we could
487 428 // safely use `as_bytes` and `from_bytes` to do this conversion. Using
488 429 // `transmute` instead makes the compiler check that the two types
489 430 // have the same size, which eliminates the error case of
490 431 // `from_bytes`.
491 432 unsafe { std::mem::transmute::<Timestamp, Entry>(timestamp) }
492 433 }
493 434
494 435 fn as_timestamp(&self) -> &Timestamp {
495 436 // Safety: same as above in `from_timestamp`
496 437 unsafe { &*(self as *const Entry as *const Timestamp) }
497 438 }
498 439 }

impl Timestamp {
    pub fn seconds(&self) -> i64 {
        self.seconds.get()
    }
}

impl From<SystemTime> for Timestamp {
    fn from(system_time: SystemTime) -> Self {
        let (secs, nanos) = match system_time.duration_since(UNIX_EPOCH) {
            Ok(duration) => {
                (duration.as_secs() as i64, duration.subsec_nanos())
            }
            Err(error) => {
                let negative = error.duration();
                (-(negative.as_secs() as i64), negative.subsec_nanos())
            }
        };
        Timestamp {
            seconds: secs.into(),
            nanoseconds: nanos.into(),
        }
    }
}

impl From<&'_ Timestamp> for SystemTime {
    fn from(timestamp: &'_ Timestamp) -> Self {
        let secs = timestamp.seconds.get();
        let nanos = timestamp.nanoseconds.get();
        if secs >= 0 {
            UNIX_EPOCH + Duration::new(secs as u64, nanos)
        } else {
            UNIX_EPOCH - Duration::new((-secs) as u64, nanos)
        }
    }
}
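The two `From` impls above store a point in time as a signed second count plus a nanosecond offset measured away from the epoch. A minimal standalone sketch of the same scheme, using a hypothetical `Timestamp` with plain `i64`/`u32` fields in place of the crate's byte-order-aware types:

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

#[derive(Debug, PartialEq)]
struct Timestamp {
    seconds: i64,     // signed: negative for times before the Unix epoch
    nanoseconds: u32, // always counts away from the epoch
}

impl From<SystemTime> for Timestamp {
    fn from(system_time: SystemTime) -> Self {
        let (seconds, nanoseconds) =
            match system_time.duration_since(UNIX_EPOCH) {
                // At or after the epoch: a plain positive duration
                Ok(duration) => {
                    (duration.as_secs() as i64, duration.subsec_nanos())
                }
                // Before the epoch: the error carries how far back we are
                Err(error) => {
                    let negative = error.duration();
                    (-(negative.as_secs() as i64), negative.subsec_nanos())
                }
            };
        Timestamp { seconds, nanoseconds }
    }
}

impl From<&Timestamp> for SystemTime {
    fn from(t: &Timestamp) -> Self {
        if t.seconds >= 0 {
            UNIX_EPOCH + Duration::new(t.seconds as u64, t.nanoseconds)
        } else {
            UNIX_EPOCH - Duration::new((-t.seconds) as u64, t.nanoseconds)
        }
    }
}

fn main() {
    // Round-trips on both sides of the epoch
    let after = UNIX_EPOCH + Duration::new(1_000, 42);
    let t = Timestamp::from(after);
    assert_eq!((t.seconds, t.nanoseconds), (1_000, 42));
    assert_eq!(SystemTime::from(&t), after);

    let before = UNIX_EPOCH - Duration::new(1_000, 42);
    let t = Timestamp::from(before);
    assert_eq!((t.seconds, t.nanoseconds), (-1_000, 42));
    assert_eq!(SystemTime::from(&t), before);
}
```

One caveat of this encoding: for a time less than one second before the epoch, `seconds` is 0, so the sign of the sub-second part is lost on the way back.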

fn read_hg_path(
    on_disk: &[u8],
    slice: PathSlice,
) -> Result<&HgPath, DirstateV2ParseError> {
    read_slice(on_disk, slice.start, slice.len.get()).map(HgPath::new)
}

fn read_nodes(
    on_disk: &[u8],
    slice: ChildNodes,
) -> Result<&[Node], DirstateV2ParseError> {
    read_slice(on_disk, slice.start, slice.len.get())
}

fn read_slice<T, Len>(
    on_disk: &[u8],
    start: Offset,
    len: Len,
) -> Result<&[T], DirstateV2ParseError>
where
    T: BytesCast,
    Len: TryInto<usize>,
{
    // A `start` or `len` of `usize::MAX` would result in an "out of bounds"
    // error, since a single `&[u8]` cannot occupy the entire address space.
    let start = start.get().try_into().unwrap_or(std::usize::MAX);
    let len = len.try_into().unwrap_or(std::usize::MAX);
    on_disk
        .get(start..)
        .and_then(|bytes| T::slice_from_bytes(bytes, len).ok())
        .map(|(slice, _rest)| slice)
        .ok_or_else(|| DirstateV2ParseError)
}
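`read_slice` turns untrusted `(start, len)` pairs from the file into safe slices: saturating the conversions to `usize::MAX` and using the fallible `get`/`slice_from_bytes` calls means a corrupt offset becomes a parse error rather than a panic or an out-of-bounds read. A simplified sketch of the same idea over plain bytes (hypothetical `read_bytes`, returning `Option` instead of the crate's `Result`):

```rust
// Bounds-check by construction: any out-of-range start or length yields
// `None` instead of panicking or reading out of bounds.
fn read_bytes(on_disk: &[u8], start: u32, len: u32) -> Option<&[u8]> {
    let bytes = on_disk.get(start as usize..)?;
    bytes.get(..len as usize)
}

fn main() {
    let data = b"hello world";
    assert_eq!(read_bytes(data, 6, 5), Some(&b"world"[..]));
    assert_eq!(read_bytes(data, 6, 50), None); // length runs past the end
    assert_eq!(read_bytes(data, 50, 1), None); // start past the end
}
```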

pub(crate) fn for_each_tracked_path<'on_disk>(
    on_disk: &'on_disk [u8],
    metadata: &[u8],
    mut f: impl FnMut(&'on_disk HgPath),
) -> Result<(), DirstateV2ParseError> {
    let (meta, _) = TreeMetadata::from_bytes(metadata)
        .map_err(|_| DirstateV2ParseError)?;
    fn recur<'on_disk>(
        on_disk: &'on_disk [u8],
        nodes: ChildNodes,
        f: &mut impl FnMut(&'on_disk HgPath),
    ) -> Result<(), DirstateV2ParseError> {
        for node in read_nodes(on_disk, nodes)? {
            if let Some(entry) = node.entry()? {
                if entry.state().is_tracked() {
                    f(node.full_path(on_disk)?)
                }
            }
            recur(on_disk, node.children, f)?
        }
        Ok(())
    }
    recur(on_disk, meta.root_nodes, &mut f)
}
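`for_each_tracked_path` is a depth-first walk over the on-disk tree, invoking the callback for every node whose entry is tracked. The same shape on a toy in-memory tree (hypothetical `Node` type, not the on-disk one, and no error handling):

```rust
struct Node {
    path: String,
    tracked: bool,
    children: Vec<Node>,
}

// Depth-first traversal: check a node's own entry, then recurse into its
// children, mirroring `recur` above.
fn for_each_tracked<'a>(nodes: &'a [Node], f: &mut impl FnMut(&'a str)) {
    for node in nodes {
        if node.tracked {
            f(&node.path);
        }
        for_each_tracked(&node.children, f);
    }
}

fn main() {
    let tree = vec![Node {
        path: "dir".into(),
        tracked: false,
        children: vec![
            Node { path: "dir/a".into(), tracked: true, children: vec![] },
            Node { path: "dir/b".into(), tracked: false, children: vec![] },
        ],
    }];
    let mut tracked = Vec::new();
    for_each_tracked(&tree, &mut |path| tracked.push(path.to_owned()));
    assert_eq!(tracked, vec!["dir/a"]);
}
```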

/// Returns new data and metadata, together with whether that data should be
/// appended to the existing data file whose content is at
/// `dirstate_map.on_disk` (true), instead of written to a new data file
/// (false).
pub(super) fn write(
    dirstate_map: &mut DirstateMap,
    can_append: bool,
) -> Result<(Vec<u8>, Vec<u8>, bool), DirstateError> {
    let append = can_append && dirstate_map.write_should_append();

    // This ignores the space for paths, and for nodes without an entry.
    // TODO: better estimate? Skip the `Vec` and write to a file directly?
    let size_guess = std::mem::size_of::<Node>()
        * dirstate_map.nodes_with_entry_count as usize;

    let mut writer = Writer {
        dirstate_map,
        append,
        out: Vec::with_capacity(size_guess),
    };

    let root_nodes = writer.write_nodes(dirstate_map.root.as_ref())?;

    let meta = TreeMetadata {
        root_nodes,
        nodes_with_entry_count: dirstate_map.nodes_with_entry_count.into(),
        nodes_with_copy_source_count: dirstate_map
            .nodes_with_copy_source_count
            .into(),
        unreachable_bytes: dirstate_map.unreachable_bytes.into(),
        unused: [0; 4],
        ignore_patterns_hash: dirstate_map.ignore_patterns_hash,
    };
    Ok((writer.out, meta.as_bytes().to_vec(), append))
}

struct Writer<'dmap, 'on_disk> {
    dirstate_map: &'dmap DirstateMap<'on_disk>,
    append: bool,
    out: Vec<u8>,
}

impl Writer<'_, '_> {
    fn write_nodes(
        &mut self,
        nodes: dirstate_map::ChildNodesRef,
    ) -> Result<ChildNodes, DirstateError> {
        // Reuse already-written nodes if possible
        if self.append {
            if let dirstate_map::ChildNodesRef::OnDisk(nodes_slice) = nodes {
                let start = self.on_disk_offset_of(nodes_slice).expect(
                    "dirstate-v2 OnDisk nodes not found within on_disk",
                );
                let len = child_nodes_len_from_usize(nodes_slice.len());
                return Ok(ChildNodes { start, len });
            }
        }

        // `dirstate_map::ChildNodes::InMemory` contains a `HashMap` which has
        // undefined iteration order. Sort to enable binary search in the
        // written file.
        let nodes = nodes.sorted();
        let nodes_len = nodes.len();

        // First accumulate serialized nodes in a `Vec`
        let mut on_disk_nodes = Vec::with_capacity(nodes_len);
        for node in nodes {
            let children =
                self.write_nodes(node.children(self.dirstate_map.on_disk)?)?;
            let full_path = node.full_path(self.dirstate_map.on_disk)?;
            let full_path = self.write_path(full_path.as_bytes());
            let copy_source = if let Some(source) =
                node.copy_source(self.dirstate_map.on_disk)?
            {
                self.write_path(source.as_bytes())
            } else {
                PathSlice {
                    start: 0.into(),
                    len: 0.into(),
                }
            };
            on_disk_nodes.push(match node {
                NodeRef::InMemory(path, node) => {
                    let (flags, data) = match &node.data {
                        dirstate_map::NodeData::Entry(entry) => {
                            Entry::from_dirstate_entry(entry)
                        }
                        dirstate_map::NodeData::CachedDirectory { mtime } => {
                            (Flags::HAS_MTIME, Entry::from_timestamp(*mtime))
                        }
                        dirstate_map::NodeData::None => (
                            Flags::empty(),
                            Entry {
                                mode: 0.into(),
                                size: 0.into(),
                                mtime: 0.into(),
                            },
                        ),
                    };
                    Node {
                        children,
                        copy_source,
                        full_path,
                        base_name_start: u16::try_from(path.base_name_start())
                            // Could only panic for paths over 64 KiB
                            .expect("dirstate-v2 path length overflow")
                            .into(),
                        descendants_with_entry_count: node
                            .descendants_with_entry_count
                            .into(),
                        tracked_descendants_count: node
                            .tracked_descendants_count
                            .into(),
                        flags,
                        data,
                    }
                }
                NodeRef::OnDisk(node) => Node {
                    children,
                    copy_source,
                    full_path,
                    ..*node
                },
            })
        }
        // … so we can write them contiguously, after writing everything else
        // they refer to.
        let start = self.current_offset();
        let len = child_nodes_len_from_usize(nodes_len);
        self.out.extend(on_disk_nodes.as_bytes());
        Ok(ChildNodes { start, len })
    }

    /// If the given slice of items is within `on_disk`, returns its offset
    /// from the start of `on_disk`.
    fn on_disk_offset_of<T>(&self, slice: &[T]) -> Option<Offset>
    where
        T: BytesCast,
    {
        fn address_range(slice: &[u8]) -> std::ops::RangeInclusive<usize> {
            let start = slice.as_ptr() as usize;
            let end = start + slice.len();
            start..=end
        }
        let slice_addresses = address_range(slice.as_bytes());
        let on_disk_addresses = address_range(self.dirstate_map.on_disk);
        if on_disk_addresses.contains(slice_addresses.start())
            && on_disk_addresses.contains(slice_addresses.end())
        {
            let offset = slice_addresses.start() - on_disk_addresses.start();
            Some(offset_from_usize(offset))
        } else {
            None
        }
    }

    fn current_offset(&mut self) -> Offset {
        let mut offset = self.out.len();
        if self.append {
            offset += self.dirstate_map.on_disk.len()
        }
        offset_from_usize(offset)
    }

    fn write_path(&mut self, slice: &[u8]) -> PathSlice {
        let len = path_len_from_usize(slice.len());
        // Reuse an already-written path if possible
        if self.append {
            if let Some(start) = self.on_disk_offset_of(slice) {
                return PathSlice { start, len };
            }
        }
        let start = self.current_offset();
        self.out.extend(slice.as_bytes());
        PathSlice { start, len }
    }
}
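`on_disk_offset_of` and `write_path` implement the append-mode deduplication: when a path or node slice already lives inside the existing data file's bytes, its `(start, len)` pair is reused instead of writing the bytes again. The containment test itself is plain pointer arithmetic; a standalone sketch under that assumption (hypothetical `offset_of`):

```rust
// If `slice` is borrowed from within `parent`, return its (offset, len)
// relative to `parent`'s start; otherwise `None`.
fn offset_of(parent: &[u8], slice: &[u8]) -> Option<(usize, usize)> {
    let parent_start = parent.as_ptr() as usize;
    let parent_end = parent_start + parent.len();
    let start = slice.as_ptr() as usize;
    let end = start + slice.len();
    if parent_start <= start && end <= parent_end {
        Some((start - parent_start, slice.len()))
    } else {
        None
    }
}

fn main() {
    let parent = b"dirstate-v2".to_vec();
    let sub = &parent[9..11]; // "v2", borrowed from `parent`
    assert_eq!(offset_of(&parent, sub), Some((9, 2)));

    // Equal bytes in a separate allocation are not within `parent`
    let other = b"v2".to_vec();
    assert_eq!(offset_of(&parent, &other), None);
}
```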

fn offset_from_usize(x: usize) -> Offset {
    u32::try_from(x)
        // Could only panic for a dirstate file larger than 4 GiB
        .expect("dirstate-v2 offset overflow")
        .into()
}

fn child_nodes_len_from_usize(x: usize) -> Size {
    u32::try_from(x)
        // Could only panic with over 4 billion nodes
        .expect("dirstate-v2 slice length overflow")
        .into()
}

fn path_len_from_usize(x: usize) -> PathSize {
    u16::try_from(x)
        // Could only panic for paths over 64 KiB
        .expect("dirstate-v2 path length overflow")
        .into()
}