dirstate-v2: Document flags/mode/size/mtime fields of tree nodes...
Simon Sapin - r49002:77fc340a default
The *dirstate* is what Mercurial uses internally to track
the state of files in the working directory,
such as set by commands like `hg add` and `hg rm`.
It also contains some cached data that help make `hg status` faster.
The name refers both to `.hg/dirstate` on the filesystem
and the corresponding data structure in memory while a Mercurial process
is running.

The original file format, retroactively dubbed `dirstate-v1`,
is described at https://www.mercurial-scm.org/wiki/DirState.
It is made of a flat sequence of unordered variable-size entries,
so accessing any information in it requires parsing all of it.
Similarly, saving changes requires rewriting the entire file.

The newer `dirstate-v2` file format is designed to fix these limitations
and make `hg status` faster.

User guide
==========

Compatibility
-------------

The file format is experimental and may still change.
Different versions of Mercurial may not be compatible with each other
when working on a local repository that uses this format.
When using an incompatible version with the experimental format,
anything can happen, including data corruption.

Since the dirstate is entirely local and not relevant to the wire protocol,
`dirstate-v2` does not affect compatibility with remote Mercurial versions.

When `share-safe` is enabled, different repositories sharing the same store
can use different dirstate formats.

Enabling `dirstate-v2` for new local repositories
-------------------------------------------------

When creating a new local repository such as with `hg init` or `hg clone`,
the `exp-dirstate-v2` boolean in the `format` configuration section
controls whether to use this file format.
This is disabled by default as of this writing.
To enable it for a single repository, run for example::

  $ hg init my-project --config format.exp-dirstate-v2=1

Checking the format of an existing local repository
---------------------------------------------------

The `debugformat` command prints information about
which of multiple optional formats are used in the current repository,
including `dirstate-v2`::

  $ hg debugformat
  format-variant repo
  fncache: yes
  dirstate-v2: yes
  […]

Upgrading or downgrading an existing local repository
-----------------------------------------------------

The `debugupgraderepo` command does various upgrades or downgrades
on a local repository
based on the current Mercurial version and on configuration.
The same `format.exp-dirstate-v2` configuration is used again.

Example to upgrade::

  $ hg debugupgraderepo --config format.exp-dirstate-v2=1

Example to downgrade to `dirstate-v1`::

  $ hg debugupgraderepo --config format.exp-dirstate-v2=0

Both of these commands do nothing but print a list of proposed changes,
which may include changes unrelated to the dirstate.
Those other changes are controlled by their own configuration keys.
Add `--run` to a command to actually apply the proposed changes.

Backups of `.hg/requires` and `.hg/dirstate` are created
in a `.hg/upgradebackup.*` directory.
If something goes wrong, restoring those files should undo the change.

Note that upgrading affects compatibility with older versions of Mercurial
as noted above.
This can be relevant when a repository’s files are on a USB drive
or some other removable media, or shared over the network, etc.

Internal filesystem representation
==================================

Requirements file
-----------------

The `.hg/requires` file indicates which of various optional file formats
are used by a given repository.
Mercurial aborts when seeing a requirement it does not know about,
which prevents older versions from accidentally messing up a repository
that uses a format that was introduced later.
For versions that do support a format, the presence or absence of
the corresponding requirement indicates whether to use that format.

When the file contains an `exp-dirstate-v2` line,
the `dirstate-v2` format is used.
With no such line, `dirstate-v1` is used.
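
As a sketch of that rule, a reader could decide which format to load
by scanning `.hg/requires` line by line. The helper name below is
hypothetical, not part of Mercurial's API:

```python
# Hedged sketch: pick the dirstate format from .hg/requires,
# as described above. `dirstate_format` is an illustrative name.
from pathlib import Path

def dirstate_format(repo_root):
    """Return "dirstate-v2" when .hg/requires lists exp-dirstate-v2,
    and "dirstate-v1" otherwise."""
    requires = Path(repo_root) / ".hg" / "requires"
    try:
        lines = requires.read_text().splitlines()
    except FileNotFoundError:
        lines = []
    return "dirstate-v2" if "exp-dirstate-v2" in lines else "dirstate-v1"
```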

High level description
----------------------

Whereas `dirstate-v1` uses a single `.hg/dirstate` file,
in `dirstate-v2` that file is a "docket" file
that only contains some metadata
and points to a separate data file named `.hg/dirstate.{ID}`,
where `{ID}` is a random identifier.

This separation allows making data files append-only
and therefore safer to memory-map.
Creating a new data file (occasionally, to clean up unused data)
can be done with a different ID
without disrupting another Mercurial process
that could still be using the previous data file.

Both files have a format designed to reduce the need for parsing,
by using fixed-size binary components as much as possible.
For data that is not fixed-size,
references to other parts of a file can be made by storing "pseudo-pointers":
integers counted in bytes from the start of a file.
For read-only access no data structure is needed,
only a bytes buffer (possibly memory-mapped directly from the filesystem)
with specific parts read on demand.

The data file contains "nodes" organized in a tree.
Each node represents a file or directory inside the working directory
or its parent changeset.
This tree has the same structure as the filesystem,
so a node representing a directory has child nodes representing
the files and subdirectories contained directly in that directory.

The docket file format
----------------------

This is implemented in `rust/hg-core/src/dirstate_tree/on_disk.rs`
and `mercurial/dirstateutils/docket.py`.

Components of the docket file are found at fixed offsets,
counted in bytes from the start of the file:

* Offset 0:
  The 12-byte marker string "dirstate-v2\n" ending with a newline character.
  This makes it easier to tell a dirstate-v2 file from a dirstate-v1 file,
  although it is not strictly necessary
  since `.hg/requires` determines which format to use.

* Offset 12:
  The changeset node ID of the first parent of the working directory,
  as up to 32 binary bytes.
  If a node ID is shorter (20 bytes for SHA-1),
  it is start-aligned and the rest of the bytes are set to zero.

* Offset 44:
  The changeset node ID of the second parent of the working directory,
  or all zeros if there isn’t one.
  Also 32 binary bytes.

* Offset 76:
  Tree metadata on 44 bytes, described below.
  Its separation in this documentation from the rest of the docket
  reflects a detail of the current implementation.
  Since tree metadata is also made of fields at fixed offsets, those could
  be inlined here by adding 76 bytes to each offset.

* Offset 120:
  The used size of the data file, as a 32-bit big-endian integer.
  The actual size of the data file may be larger
  (if another Mercurial process is appending to it
  but has not updated the docket yet).
  That extra data must be ignored.

* Offset 124:
  The length of the data file identifier, as an 8-bit integer.

* Offset 125:
  The data file identifier.

* Any additional data is currently ignored, and dropped when updating the file.
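
The fixed offsets above map directly onto a few slicing and `struct`
calls. This is a minimal sketch, not Mercurial's actual parser;
`parse_docket` is a hypothetical name:

```python
# Hedged sketch of reading a dirstate-v2 docket from a bytes buffer,
# following the offsets documented above.
import struct

def parse_docket(data):
    """Split a docket buffer into its documented components."""
    assert data[:12] == b"dirstate-v2\n", "not a dirstate-v2 docket"
    parent_1 = data[12:44]        # first parent node ID, zero-filled
    parent_2 = data[44:76]        # second parent, all zeros if none
    tree_metadata = data[76:120]  # 44 bytes, described below
    # Used size of the data file: 32-bit big-endian at offset 120.
    (data_size,) = struct.unpack_from(">L", data, 120)
    uuid_len = data[124]          # identifier length, 8-bit integer
    uuid = data[125:125 + uuid_len].decode("ascii")
    return parent_1, parent_2, tree_metadata, data_size, uuid
```

Any bytes past the identifier are ignored, matching the last bullet above.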

Tree metadata in the docket file
--------------------------------

Tree metadata is similarly made of components at fixed offsets.
These offsets are counted in bytes from the start of tree metadata,
which is 76 bytes after the start of the docket file.

This metadata can be thought of as the singular root of the tree
formed by nodes in the data file.

* Offset 0:
  Pseudo-pointer to the start of root nodes,
  counted in bytes from the start of the data file,
  as a 32-bit big-endian integer.
  These nodes describe files and directories found directly
  at the root of the working directory.

* Offset 4:
  Number of root nodes, as a 32-bit big-endian integer.

* Offset 8:
  Total number of nodes in the entire tree that "have a dirstate entry",
  as a 32-bit big-endian integer.
  Those nodes represent files that would be present at all in `dirstate-v1`.
  This is typically less than the total number of nodes.
  This counter is used to implement `len(dirstatemap)`.

* Offset 12:
  Number of nodes in the entire tree that have a copy source,
  as a 32-bit big-endian integer.
  At the next commit, these files are recorded
  as having been copied or moved/renamed from that source.
  (A move is recorded as a copy and a separate removal of the source.)
  This counter is used to implement `len(dirstatemap.copymap)`.

* Offset 16:
  An estimation of how many bytes of the data file
  (within its used size) are unused, as a 32-bit big-endian integer.
  When appending to an existing data file,
  some existing nodes or paths can become unreachable from the new root
  but they still take up space.
  This counter is used to decide when to write a new data file from scratch
  instead of appending to an existing one,
  in order to get rid of that unreachable data
  and avoid unbounded file size growth.

* Offset 20:
  These four bytes are currently ignored
  and reset to zero when updating a docket file.
  This is an attempt at forward compatibility:
  future Mercurial versions could use this as a bit field
  to indicate that a dirstate has additional data or constraints.
  Finding a dirstate file with the relevant bit unset indicates that
  it was written by a then-older version
  which is not aware of that future change.

* Offset 24:
  Either 20 zero bytes, or a SHA-1 hash as 20 binary bytes.
  When present, the hash is of ignore patterns
  that were used for some previous run of the `status` algorithm.

* (Offset 44: end of tree metadata)
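
Since tree metadata is 44 bytes of fixed-offset fields, it maps onto a
single `struct` format string. A sketch under the layout above; the
constant and function names are illustrative, not Mercurial's:

```python
import struct

# ">" selects big-endian; comments follow the offsets listed above.
TREE_METADATA = struct.Struct(
    ">"
    "L"    # offset 0: pseudo-pointer to root nodes
    "L"    # offset 4: number of root nodes
    "L"    # offset 8: nodes that "have a dirstate entry"
    "L"    # offset 12: nodes with a copy source
    "L"    # offset 16: estimated unreachable ("unused") bytes
    "4s"   # offset 20: ignored, reset to zero when writing
    "20s"  # offset 24: optional SHA-1 of ignore patterns
)
assert TREE_METADATA.size == 44  # "(Offset 44: end of tree metadata)"

def parse_tree_metadata(docket_bytes):
    """Unpack the 44 bytes of tree metadata at offset 76 of the docket."""
    return TREE_METADATA.unpack_from(docket_bytes, 76)
```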

Optional hash of ignore patterns
--------------------------------

The implementation of `status` at `rust/hg-core/src/dirstate_tree/status.rs`
has been optimized such that its run time is dominated by calls
to `stat` for reading the filesystem metadata of a file or directory,
and to `readdir` for listing the contents of a directory.
In some cases the algorithm can skip calls to `readdir`
(saving significant time)
because the dirstate already contains enough of the relevant information
to build the correct `status` results.

The default configuration of `hg status` is to list unknown files
but not ignored files.
In this case, it matters for the `readdir`-skipping optimization
if a given file used to be ignored but became unknown
because `.hgignore` changed.
To detect the possibility of such a change,
the tree metadata contains an optional hash of all ignore patterns.

We define:

* "Root" ignore files as:

  - `.hgignore` at the root of the repository, if it exists
  - And all files from the `ui.ignore.*` config.

  This set of files is sorted by the string representation of their path.

* The "expanded contents" of an ignore file is the byte string made
  by the concatenation of its contents followed by the "expanded contents"
  of other files included with `include:` or `subinclude:` directives,
  in inclusion order. This definition is recursive, as included files can
  themselves include more files.

This hash is defined as the SHA-1 of the concatenation (in sorted
order) of the "expanded contents" of each "root" ignore file.
(Note that computing this does not require actually concatenating
into a single contiguous byte sequence.
Instead a SHA-1 hasher object can be created
and fed separate chunks one by one.)
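
The incremental computation mentioned in the parenthetical can be
sketched with `hashlib`. Here `expanded_contents` is a hypothetical
callback yielding each file's expansion as byte chunks; it stands in
for the recursive `include:`/`subinclude:` expansion described above:

```python
import hashlib

def ignore_patterns_hash(root_ignore_files, expanded_contents):
    """SHA-1 over the "expanded contents" of every "root" ignore file,
    in sorted path order, fed to the hasher chunk by chunk."""
    hasher = hashlib.sha1()
    for path in sorted(root_ignore_files):
        for chunk in expanded_contents(path):
            hasher.update(chunk)
    return hasher.digest()  # 20 bytes, suitable for offset 24 above
```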

The data file format
--------------------

This is implemented in `rust/hg-core/src/dirstate_tree/on_disk.rs`
and `mercurial/dirstateutils/v2.py`.

The data file contains two types of data: paths and nodes.

Paths and nodes can be organized in any order in the file, except that sibling
nodes must be next to each other and sorted by their path.
Contiguity lets the parent refer to them all
by their count and a single pseudo-pointer,
instead of storing one pseudo-pointer per child node.
Sorting allows using binary search to find a child node with a given name
in `O(log(n))` byte sequence comparisons.
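
To illustrate why sorted, contiguous siblings help, here is a
standalone binary search over fixed-size records; `record_size` and
`base_name` are stand-ins for the real node layout described below:

```python
# Hedged sketch: O(log n) lookup among sorted fixed-size records.
def find_sibling(buf, start, count, record_size, base_name, wanted):
    """Binary-search `count` contiguous records of `record_size` bytes
    beginning at `start` in `buf`; return the offset of the record
    whose base name equals `wanted`, or None."""
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        offset = start + mid * record_size
        name = base_name(buf, offset)
        if name == wanted:
            return offset
        elif name < wanted:
            lo = mid + 1
        else:
            hi = mid
    return None
```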

The current implementation writes paths and child nodes before a given node
for ease of figuring out the value of pseudo-pointers by the time they are to
be written, but this is not an obligation and readers must not rely on it.

A path is stored as a byte string anywhere in the file, without delimiter.
It is referred to by one or more nodes with a pseudo-pointer to its start and
its length in bytes. Since there is no delimiter,
when a path is a substring of another the same bytes could be reused,
although the implementation does not exploit this as of this writing.

A node is stored on 43 bytes with components at fixed offsets. Paths and
child nodes relevant to a node are stored externally and referenced through
pseudo-pointers.

All integers are stored in big-endian. All pseudo-pointers are 32-bit integers
counting bytes from the start of the data file. Path lengths and positions
are 16-bit integers, also counted in bytes.

Node components are:

* Offset 0:
  Pseudo-pointer to the full path of this node,
  from the working directory root.

* Offset 4:
  Length of the full path.

* Offset 6:
  Position of the last `/` path separator within the full path,
  in bytes from the start of the full path,
  or zero if there isn’t one.
  The part of the full path after this position is the "base name".
  Since sibling nodes have the same parent, only their base names vary
  and need to be considered when doing binary search to find a given path.

* Offset 8:
  Pseudo-pointer to the "copy source" path for this node,
  or zero if there is no copy source.

* Offset 12:
  Length of the copy source path, or zero if there isn’t one.

* Offset 14:
  Pseudo-pointer to the start of child nodes.

* Offset 18:
  Number of child nodes, as a 32-bit integer.
  They occupy 43 times this number of bytes
  (not counting space for paths, and further descendants).

* Offset 22:
  Number, as a 32-bit integer, of descendant nodes in this subtree,
  not including this node itself,
  that "have a dirstate entry".
  Those nodes represent files that would be present at all in `dirstate-v1`.
  This is typically less than the total number of descendants.
  This counter is used to implement `has_dir`.

* Offset 26:
  Number, as a 32-bit integer, of descendant nodes in this subtree,
  not including this node itself,
  that represent files tracked in the working directory.
  (For example, `hg rm` makes a file untracked.)
  This counter is used to implement `has_tracked_dir`.

374 * Offset 30 and more:
374 * Offset 30:
375 **TODO:** docs not written yet
375 Some boolean values packed as bits of a single byte.
376 as this part of the format might be changing soon.
376 Starting from least-significant, bit masks are::
377
378 WDIR_TRACKED = 1 << 0
379 P1_TRACKED = 1 << 1
380 P2_INFO = 1 << 2
381 HAS_MODE_AND_SIZE = 1 << 3
382 HAS_MTIME = 1 << 4
383
384 Other bits are unset. The meaning of these bits are:
385
386 `WDIR_TRACKED`
387 Set if the working directory contains a tracked file at this node’s path.
388 This is typically set and unset by `hg add` and `hg rm`.
389
390 `P1_TRACKED`
391 set if the working directory’s first parent changeset
392 (whose node identifier is found in tree metadata)
393 contains a tracked file at this node’s path.
394 This is a cache to reduce manifest lookups.
395
396 `P2_INFO`
397 Set if the file has been involved in some merge operation.
398 Either because it was actually merged,
399 or because the version in the second parent p2 version was ahead,
400 or because some rename moved it there.
401 In either case `hg status` will want it displayed as modified.
402
403 Files that would be mentioned at all in the `dirstate-v1` file format
404 have a node with at least one of the above three bits set in `dirstate-v2`.
405 Let’s call these files "tracked anywhere",
406 and "untracked" the nodes with all three of these bits unset.
407 Untracked nodes are typically for directories:
408 they hold child nodes and form the tree structure.
409 Additional untracked nodes may also exist.
410 Although implementations should strive to clean up nodes
411 that are entirely unused, other untracked nodes may also exist.
412 For example, a future version of Mercurial might in some cases
413 add nodes for untracked files or/and ignored files in the working directory
414 in order to optimize `hg status`
415 by enabling it to skip `readdir` in more cases.
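The "tracked anywhere" distinction comes down to simple mask tests on the flags byte. A minimal sketch (the constants mirror the masks above; the helper function is illustrative, not Mercurial's actual parser):

```python
# Bit masks of the flags byte at offset 30, as listed above.
WDIR_TRACKED = 1 << 0
P1_TRACKED = 1 << 1
P2_INFO = 1 << 2
HAS_MODE_AND_SIZE = 1 << 3
HAS_MTIME = 1 << 4


def is_tracked_anywhere(flags):
    # A node would appear at all in a dirstate-v1 file exactly when
    # at least one of the first three bits is set.
    return bool(flags & (WDIR_TRACKED | P1_TRACKED | P2_INFO))


# A file added with `hg add` and also present in p1:
assert is_tracked_anywhere(WDIR_TRACKED | P1_TRACKED | HAS_MODE_AND_SIZE)
# An untracked directory node that only caches a directory mtime:
assert not is_tracked_anywhere(HAS_MTIME)
```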

When a node is for a file tracked anywhere,
the rest of the node data is three fields:

* Offset 31:
If `HAS_MODE_AND_SIZE` is unset, four zero bytes.
Otherwise, a 32-bit integer for the Unix mode (as in `stat_result.st_mode`)
expected for this file to be considered clean.
Only the `S_IXUSR` bit (owner has execute permission) is considered.

* Offset 35:
If `HAS_MTIME` is unset, four zero bytes.
Otherwise, a 32-bit integer for the expected modification time of the file
(as in `stat_result.st_mtime`),
truncated to its 31 least-significant bits.
Unlike in dirstate-v1, negative values are not used.

* Offset 39:
If `HAS_MODE_AND_SIZE` is unset, four zero bytes.
Otherwise, a 32-bit integer for the expected size of the file,
truncated to its 31 least-significant bits.
Unlike in dirstate-v1, negative values are not used.

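For a node that is tracked anywhere, the three fields above can be decoded with `struct`. A sketch under the assumption that the integers are big-endian and that `data` holds the 12 bytes starting at offset 31 (the helper is hypothetical, not Mercurial's implementation):

```python
import struct

HAS_MODE_AND_SIZE = 1 << 3
HAS_MTIME = 1 << 4


def parse_tracked_fields(flags, data):
    # Three 32-bit integers: mode (offset 31), mtime (35), size (39).
    # Fields whose flag bit is unset are stored as zero bytes and carry
    # no meaning, so report them as None.
    mode, mtime, size = struct.unpack(">iii", data)
    return (
        mode if flags & HAS_MODE_AND_SIZE else None,
        mtime if flags & HAS_MTIME else None,
        size if flags & HAS_MODE_AND_SIZE else None,
    )


# Mode 0o755 and size 1234 cached, but no meaningful mtime:
data = struct.pack(">iii", 0o755, 0, 1234)
assert parse_tracked_fields(HAS_MODE_AND_SIZE, data) == (0o755, None, 1234)
```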
If an untracked node has `HAS_MTIME` *unset*, this space is unused:

* Offset 31:
12 bytes set to zero

If an untracked node has `HAS_MTIME` *set*,
what follows is the modification time of a directory,
represented as separate second and sub-second components
since the Unix epoch:

* Offset 31:
The number of seconds as a signed (two’s complement) 64-bit integer.

* Offset 39:
The number of nanoseconds as a 32-bit integer.
Always greater than or equal to zero, and strictly less than a billion.
Increasing this component makes the modification time
go forward or backward in time depending
on the sign of the integral seconds component.
(Note: this is buggy because there is no negative zero integer,
but will be changed soon.)

The presence of a directory modification time means that at some point,
this path in the working directory was observed:

- To be a directory
- With the given modification time
- That time was already strictly in the past when observed,
meaning that later changes cannot happen in the same clock tick
and must cause a different modification time
(unless the system clock jumps back and we get unlucky,
which is not impossible but deemed unlikely enough).
- All direct children of this directory
(as returned by `std::fs::read_dir`)
either have a corresponding dirstate node,
or are ignored by ignore patterns whose hash is in tree metadata.

This means that if `std::fs::symlink_metadata` later reports
the same modification time
and ignore patterns haven’t changed,
a run of `hg status` that is not listing ignored files
can skip calling `std::fs::read_dir` again for this directory,
and iterate over child dirstate nodes instead.
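The skip-`readdir` check described above boils down to comparing the cached (seconds, nanoseconds) pair against a fresh `stat` of the directory. A sketch in Python, where `os.lstat` plays the role of `std::fs::symlink_metadata` (the helper is illustrative and assumes ignore patterns are separately known to be unchanged):

```python
import os
import tempfile


def can_skip_readdir(cached_mtime, path):
    # cached_mtime is the (seconds, nanoseconds) pair stored in the
    # dirstate node, or None when no directory mtime was recorded.
    if cached_mtime is None:
        return False
    seconds, nanoseconds = cached_mtime
    st = os.lstat(path)
    # Unchanged mtime: the directory's direct children are fully
    # described by the child dirstate nodes and ignore patterns.
    return st.st_mtime_ns == seconds * 1_000_000_000 + nanoseconds


# Example: record a directory's mtime, then check it has not changed.
d = tempfile.mkdtemp()
cached = divmod(os.lstat(d).st_mtime_ns, 1_000_000_000)
assert can_skip_readdir(cached, d)
assert not can_skip_readdir(None, d)
```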

* (Offset 43: end of this node)
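As a sanity check on the layout, the last 13 bytes of a node (offsets 30 through 42, ending at offset 43) can be described with a single `struct` format string per case. A sketch, assuming big-endian integers (the format names are illustrative):

```python
import struct

# Flags byte, then three 32-bit fields (mode, mtime, size) for nodes
# tracked anywhere:
TRACKED_TAIL = ">Biii"
# Flags byte, then 64-bit seconds and 32-bit nanoseconds for untracked
# directory nodes with a cached mtime:
UNTRACKED_TAIL = ">Bqi"

# Both layouts span offsets 30..42 inclusive, i.e. 13 bytes.
assert struct.calcsize(TRACKED_TAIL) == 13
assert struct.calcsize(UNTRACKED_TAIL) == 13
```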
@@ -1,736 +1,736 b''
1 # parsers.py - Python implementation of parsers.c
1 # parsers.py - Python implementation of parsers.c
2 #
2 #
3 # Copyright 2009 Olivia Mackall <olivia@selenic.com> and others
3 # Copyright 2009 Olivia Mackall <olivia@selenic.com> and others
4 #
4 #
5 # This software may be used and distributed according to the terms of the
5 # This software may be used and distributed according to the terms of the
6 # GNU General Public License version 2 or any later version.
6 # GNU General Public License version 2 or any later version.
7
7
8 from __future__ import absolute_import
8 from __future__ import absolute_import
9
9
10 import struct
10 import struct
11 import zlib
11 import zlib
12
12
13 from ..node import (
13 from ..node import (
14 nullrev,
14 nullrev,
15 sha1nodeconstants,
15 sha1nodeconstants,
16 )
16 )
17 from ..thirdparty import attr
17 from ..thirdparty import attr
18 from .. import (
18 from .. import (
19 error,
19 error,
20 pycompat,
20 pycompat,
21 revlogutils,
21 revlogutils,
22 util,
22 util,
23 )
23 )
24
24
25 from ..revlogutils import nodemap as nodemaputil
25 from ..revlogutils import nodemap as nodemaputil
26 from ..revlogutils import constants as revlog_constants
26 from ..revlogutils import constants as revlog_constants
27
27
28 stringio = pycompat.bytesio
28 stringio = pycompat.bytesio
29
29
30
30
31 _pack = struct.pack
31 _pack = struct.pack
32 _unpack = struct.unpack
32 _unpack = struct.unpack
33 _compress = zlib.compress
33 _compress = zlib.compress
34 _decompress = zlib.decompress
34 _decompress = zlib.decompress
35
35
36
36
37 # a special value used internally for `size` if the file come from the other parent
37 # a special value used internally for `size` if the file come from the other parent
38 FROM_P2 = -2
38 FROM_P2 = -2
39
39
40 # a special value used internally for `size` if the file is modified/merged/added
40 # a special value used internally for `size` if the file is modified/merged/added
41 NONNORMAL = -1
41 NONNORMAL = -1
42
42
43 # a special value used internally for `time` if the time is ambigeous
43 # a special value used internally for `time` if the time is ambigeous
44 AMBIGUOUS_TIME = -1
44 AMBIGUOUS_TIME = -1
45
45
46
46
47 @attr.s(slots=True, init=False)
47 @attr.s(slots=True, init=False)
48 class DirstateItem(object):
48 class DirstateItem(object):
49 """represent a dirstate entry
49 """represent a dirstate entry
50
50
51 It hold multiple attributes
51 It hold multiple attributes
52
52
53 # about file tracking
53 # about file tracking
54 - wc_tracked: is the file tracked by the working copy
54 - wc_tracked: is the file tracked by the working copy
55 - p1_tracked: is the file tracked in working copy first parent
55 - p1_tracked: is the file tracked in working copy first parent
56 - p2_info: the file has been involved in some merge operation. Either
56 - p2_info: the file has been involved in some merge operation. Either
57 because it was actually merged, or because the p2 version was
57 because it was actually merged, or because the p2 version was
58 ahead, or because some renamed moved it there. In either case
58 ahead, or because some rename moved it there. In either case
59 `hg status` will want it displayed as modified.
59 `hg status` will want it displayed as modified.
60
60
61 # about the file state expected from p1 manifest:
61 # about the file state expected from p1 manifest:
62 - mode: the file mode in p1
62 - mode: the file mode in p1
63 - size: the file size in p1
63 - size: the file size in p1
64
64
65 These value can be set to None, which mean we don't have a meaningful value
65 These value can be set to None, which mean we don't have a meaningful value
66 to compare with. Either because we don't really care about them as there
66 to compare with. Either because we don't really care about them as there
67 `status` is known without having to look at the disk or because we don't
67 `status` is known without having to look at the disk or because we don't
68 know these right now and a full comparison will be needed to find out if
68 know these right now and a full comparison will be needed to find out if
69 the file is clean.
69 the file is clean.
70
70
71 # about the file state on disk last time we saw it:
71 # about the file state on disk last time we saw it:
72 - mtime: the last known clean mtime for the file.
72 - mtime: the last known clean mtime for the file.
73
73
74 This value can be set to None if no cachable state exist. Either because we
74 This value can be set to None if no cachable state exist. Either because we
75 do not care (see previous section) or because we could not cache something
75 do not care (see previous section) or because we could not cache something
76 yet.
76 yet.
77 """
77 """
78
78
79 _wc_tracked = attr.ib()
79 _wc_tracked = attr.ib()
80 _p1_tracked = attr.ib()
80 _p1_tracked = attr.ib()
81 _p2_info = attr.ib()
81 _p2_info = attr.ib()
82 _mode = attr.ib()
82 _mode = attr.ib()
83 _size = attr.ib()
83 _size = attr.ib()
84 _mtime = attr.ib()
84 _mtime = attr.ib()
85
85
86 def __init__(
86 def __init__(
87 self,
87 self,
88 wc_tracked=False,
88 wc_tracked=False,
89 p1_tracked=False,
89 p1_tracked=False,
90 p2_info=False,
90 p2_info=False,
91 has_meaningful_data=True,
91 has_meaningful_data=True,
92 has_meaningful_mtime=True,
92 has_meaningful_mtime=True,
93 parentfiledata=None,
93 parentfiledata=None,
94 ):
94 ):
95 self._wc_tracked = wc_tracked
95 self._wc_tracked = wc_tracked
96 self._p1_tracked = p1_tracked
96 self._p1_tracked = p1_tracked
97 self._p2_info = p2_info
97 self._p2_info = p2_info
98
98
99 self._mode = None
99 self._mode = None
100 self._size = None
100 self._size = None
101 self._mtime = None
101 self._mtime = None
102 if parentfiledata is None:
102 if parentfiledata is None:
103 has_meaningful_mtime = False
103 has_meaningful_mtime = False
104 has_meaningful_data = False
104 has_meaningful_data = False
105 if has_meaningful_data:
105 if has_meaningful_data:
106 self._mode = parentfiledata[0]
106 self._mode = parentfiledata[0]
107 self._size = parentfiledata[1]
107 self._size = parentfiledata[1]
108 if has_meaningful_mtime:
108 if has_meaningful_mtime:
109 self._mtime = parentfiledata[2]
109 self._mtime = parentfiledata[2]
110
110
111 @classmethod
111 @classmethod
112 def from_v1_data(cls, state, mode, size, mtime):
112 def from_v1_data(cls, state, mode, size, mtime):
113 """Build a new DirstateItem object from V1 data
113 """Build a new DirstateItem object from V1 data
114
114
115 Since the dirstate-v1 format is frozen, the signature of this function
115 Since the dirstate-v1 format is frozen, the signature of this function
116 is not expected to change, unlike the __init__ one.
116 is not expected to change, unlike the __init__ one.
117 """
117 """
118 if state == b'm':
118 if state == b'm':
119 return cls(wc_tracked=True, p1_tracked=True, p2_info=True)
119 return cls(wc_tracked=True, p1_tracked=True, p2_info=True)
120 elif state == b'a':
120 elif state == b'a':
121 return cls(wc_tracked=True)
121 return cls(wc_tracked=True)
122 elif state == b'r':
122 elif state == b'r':
123 if size == NONNORMAL:
123 if size == NONNORMAL:
124 p1_tracked = True
124 p1_tracked = True
125 p2_info = True
125 p2_info = True
126 elif size == FROM_P2:
126 elif size == FROM_P2:
127 p1_tracked = False
127 p1_tracked = False
128 p2_info = True
128 p2_info = True
129 else:
129 else:
130 p1_tracked = True
130 p1_tracked = True
131 p2_info = False
131 p2_info = False
132 return cls(p1_tracked=p1_tracked, p2_info=p2_info)
132 return cls(p1_tracked=p1_tracked, p2_info=p2_info)
133 elif state == b'n':
133 elif state == b'n':
134 if size == FROM_P2:
134 if size == FROM_P2:
135 return cls(wc_tracked=True, p2_info=True)
135 return cls(wc_tracked=True, p2_info=True)
136 elif size == NONNORMAL:
136 elif size == NONNORMAL:
137 return cls(wc_tracked=True, p1_tracked=True)
137 return cls(wc_tracked=True, p1_tracked=True)
138 elif mtime == AMBIGUOUS_TIME:
138 elif mtime == AMBIGUOUS_TIME:
139 return cls(
139 return cls(
140 wc_tracked=True,
140 wc_tracked=True,
141 p1_tracked=True,
141 p1_tracked=True,
142 has_meaningful_mtime=False,
142 has_meaningful_mtime=False,
143 parentfiledata=(mode, size, 42),
143 parentfiledata=(mode, size, 42),
144 )
144 )
145 else:
145 else:
146 return cls(
146 return cls(
147 wc_tracked=True,
147 wc_tracked=True,
148 p1_tracked=True,
148 p1_tracked=True,
149 parentfiledata=(mode, size, mtime),
149 parentfiledata=(mode, size, mtime),
150 )
150 )
151 else:
151 else:
152 raise RuntimeError(b'unknown state: %s' % state)
152 raise RuntimeError(b'unknown state: %s' % state)
153
153
154 def set_possibly_dirty(self):
154 def set_possibly_dirty(self):
155 """Mark a file as "possibly dirty"
155 """Mark a file as "possibly dirty"
156
156
157 This means the next status call will have to actually check its content
157 This means the next status call will have to actually check its content
158 to make sure it is correct.
158 to make sure it is correct.
159 """
159 """
160 self._mtime = None
160 self._mtime = None
161
161
162 def set_clean(self, mode, size, mtime):
162 def set_clean(self, mode, size, mtime):
163 """mark a file as "clean" cancelling potential "possibly dirty call"
163 """mark a file as "clean" cancelling potential "possibly dirty call"
164
164
165 Note: this function is a descendant of `dirstate.normal` and is
165 Note: this function is a descendant of `dirstate.normal` and is
166 currently expected to be call on "normal" entry only. There are not
166 currently expected to be call on "normal" entry only. There are not
167 reason for this to not change in the future as long as the ccode is
167 reason for this to not change in the future as long as the ccode is
168 updated to preserve the proper state of the non-normal files.
168 updated to preserve the proper state of the non-normal files.
169 """
169 """
170 self._wc_tracked = True
170 self._wc_tracked = True
171 self._p1_tracked = True
171 self._p1_tracked = True
172 self._mode = mode
172 self._mode = mode
173 self._size = size
173 self._size = size
174 self._mtime = mtime
174 self._mtime = mtime
175
175
176 def set_tracked(self):
176 def set_tracked(self):
177 """mark a file as tracked in the working copy
177 """mark a file as tracked in the working copy
178
178
179 This will ultimately be called by command like `hg add`.
179 This will ultimately be called by command like `hg add`.
180 """
180 """
181 self._wc_tracked = True
181 self._wc_tracked = True
182 # `set_tracked` is replacing various `normallookup` call. So we mark
182 # `set_tracked` is replacing various `normallookup` call. So we mark
183 # the files as needing lookup
183 # the files as needing lookup
184 #
184 #
185 # Consider dropping this in the future in favor of something less broad.
185 # Consider dropping this in the future in favor of something less broad.
186 self._mtime = None
186 self._mtime = None
187
187
188 def set_untracked(self):
188 def set_untracked(self):
189 """mark a file as untracked in the working copy
189 """mark a file as untracked in the working copy
190
190
191 This will ultimately be called by command like `hg remove`.
191 This will ultimately be called by command like `hg remove`.
192 """
192 """
193 self._wc_tracked = False
193 self._wc_tracked = False
194 self._mode = None
194 self._mode = None
195 self._size = None
195 self._size = None
196 self._mtime = None
196 self._mtime = None
197
197
198 def drop_merge_data(self):
198 def drop_merge_data(self):
199 """remove all "merge-only" from a DirstateItem
199 """remove all "merge-only" from a DirstateItem
200
200
201 This is to be call by the dirstatemap code when the second parent is dropped
201 This is to be call by the dirstatemap code when the second parent is dropped
202 """
202 """
203 if self._p2_info:
203 if self._p2_info:
204 self._p2_info = False
204 self._p2_info = False
205 self._mode = None
205 self._mode = None
206 self._size = None
206 self._size = None
207 self._mtime = None
207 self._mtime = None
208
208
209 @property
209 @property
210 def mode(self):
210 def mode(self):
211 return self.v1_mode()
211 return self.v1_mode()
212
212
213 @property
213 @property
214 def size(self):
214 def size(self):
215 return self.v1_size()
215 return self.v1_size()
216
216
217 @property
217 @property
218 def mtime(self):
218 def mtime(self):
219 return self.v1_mtime()
219 return self.v1_mtime()
220
220
221 @property
221 @property
222 def state(self):
222 def state(self):
223 """
223 """
224 States are:
224 States are:
225 n normal
225 n normal
226 m needs merging
226 m needs merging
227 r marked for removal
227 r marked for removal
228 a marked for addition
228 a marked for addition
229
229
230 XXX This "state" is a bit obscure and mostly a direct expression of the
230 XXX This "state" is a bit obscure and mostly a direct expression of the
231 dirstatev1 format. It would make sense to ultimately deprecate it in
231 dirstatev1 format. It would make sense to ultimately deprecate it in
232 favor of the more "semantic" attributes.
232 favor of the more "semantic" attributes.
233 """
233 """
234 if not self.any_tracked:
234 if not self.any_tracked:
235 return b'?'
235 return b'?'
236 return self.v1_state()
236 return self.v1_state()
237
237
238 @property
238 @property
239 def tracked(self):
239 def tracked(self):
240 """True is the file is tracked in the working copy"""
240 """True is the file is tracked in the working copy"""
241 return self._wc_tracked
241 return self._wc_tracked
242
242
243 @property
243 @property
244 def any_tracked(self):
244 def any_tracked(self):
245 """True is the file is tracked anywhere (wc or parents)"""
245 """True is the file is tracked anywhere (wc or parents)"""
246 return self._wc_tracked or self._p1_tracked or self._p2_info
246 return self._wc_tracked or self._p1_tracked or self._p2_info
247
247
248 @property
248 @property
249 def added(self):
249 def added(self):
250 """True if the file has been added"""
250 """True if the file has been added"""
251 return self._wc_tracked and not (self._p1_tracked or self._p2_info)
251 return self._wc_tracked and not (self._p1_tracked or self._p2_info)
252
252
253 @property
253 @property
254 def maybe_clean(self):
254 def maybe_clean(self):
255 """True if the file has a chance to be in the "clean" state"""
255 """True if the file has a chance to be in the "clean" state"""
256 if not self._wc_tracked:
256 if not self._wc_tracked:
257 return False
257 return False
258 elif not self._p1_tracked:
258 elif not self._p1_tracked:
259 return False
259 return False
260 elif self._p2_info:
260 elif self._p2_info:
261 return False
261 return False
262 return True
262 return True
263
263
264 @property
264 @property
265 def p1_tracked(self):
265 def p1_tracked(self):
266 """True if the file is tracked in the first parent manifest"""
266 """True if the file is tracked in the first parent manifest"""
267 return self._p1_tracked
267 return self._p1_tracked
268
268
269 @property
269 @property
270 def p2_info(self):
270 def p2_info(self):
271 """True if the file needed to merge or apply any input from p2
271 """True if the file needed to merge or apply any input from p2
272
272
273 See the class documentation for details.
273 See the class documentation for details.
274 """
274 """
275 return self._wc_tracked and self._p2_info
275 return self._wc_tracked and self._p2_info
276
276
277 @property
277 @property
278 def removed(self):
278 def removed(self):
279 """True if the file has been removed"""
279 """True if the file has been removed"""
280 return not self._wc_tracked and (self._p1_tracked or self._p2_info)
280 return not self._wc_tracked and (self._p1_tracked or self._p2_info)
281
281
282 def v1_state(self):
282 def v1_state(self):
283 """return a "state" suitable for v1 serialization"""
283 """return a "state" suitable for v1 serialization"""
284 if not self.any_tracked:
284 if not self.any_tracked:
285 # the object has no state to record, this is -currently-
285 # the object has no state to record, this is -currently-
286 # unsupported
286 # unsupported
287 raise RuntimeError('untracked item')
287 raise RuntimeError('untracked item')
288 elif self.removed:
288 elif self.removed:
289 return b'r'
289 return b'r'
290 elif self._p1_tracked and self._p2_info:
290 elif self._p1_tracked and self._p2_info:
291 return b'm'
291 return b'm'
292 elif self.added:
292 elif self.added:
293 return b'a'
293 return b'a'
294 else:
294 else:
295 return b'n'
295 return b'n'
296
296
297 def v1_mode(self):
297 def v1_mode(self):
298 """return a "mode" suitable for v1 serialization"""
298 """return a "mode" suitable for v1 serialization"""
299 return self._mode if self._mode is not None else 0
299 return self._mode if self._mode is not None else 0
300
300
301 def v1_size(self):
301 def v1_size(self):
302 """return a "size" suitable for v1 serialization"""
302 """return a "size" suitable for v1 serialization"""
303 if not self.any_tracked:
303 if not self.any_tracked:
304 # the object has no state to record, this is -currently-
304 # the object has no state to record, this is -currently-
305 # unsupported
305 # unsupported
306 raise RuntimeError('untracked item')
306 raise RuntimeError('untracked item')
307 elif self.removed and self._p1_tracked and self._p2_info:
307 elif self.removed and self._p1_tracked and self._p2_info:
308 return NONNORMAL
308 return NONNORMAL
309 elif self._p2_info:
309 elif self._p2_info:
310 return FROM_P2
310 return FROM_P2
311 elif self.removed:
311 elif self.removed:
312 return 0
312 return 0
313 elif self.added:
313 elif self.added:
314 return NONNORMAL
314 return NONNORMAL
315 elif self._size is None:
315 elif self._size is None:
316 return NONNORMAL
316 return NONNORMAL
317 else:
317 else:
318 return self._size
318 return self._size
319
319
320 def v1_mtime(self):
320 def v1_mtime(self):
321 """return a "mtime" suitable for v1 serialization"""
321 """return a "mtime" suitable for v1 serialization"""
322 if not self.any_tracked:
322 if not self.any_tracked:
323 # the object has no state to record, this is -currently-
323 # the object has no state to record, this is -currently-
324 # unsupported
324 # unsupported
325 raise RuntimeError('untracked item')
325 raise RuntimeError('untracked item')
326 elif self.removed:
326 elif self.removed:
327 return 0
327 return 0
328 elif self._mtime is None:
328 elif self._mtime is None:
329 return AMBIGUOUS_TIME
329 return AMBIGUOUS_TIME
330 elif self._p2_info:
330 elif self._p2_info:
331 return AMBIGUOUS_TIME
331 return AMBIGUOUS_TIME
332 elif not self._p1_tracked:
332 elif not self._p1_tracked:
333 return AMBIGUOUS_TIME
333 return AMBIGUOUS_TIME
334 else:
334 else:
335 return self._mtime
335 return self._mtime
336
336
337 def need_delay(self, now):
337 def need_delay(self, now):
338 """True if the stored mtime would be ambiguous with the current time"""
338 """True if the stored mtime would be ambiguous with the current time"""
339 return self.v1_state() == b'n' and self.v1_mtime() == now
339 return self.v1_state() == b'n' and self.v1_mtime() == now
340
340
341
341
342 def gettype(q):
342 def gettype(q):
343 return int(q & 0xFFFF)
343 return int(q & 0xFFFF)
344
344
345
345
346 class BaseIndexObject(object):
346 class BaseIndexObject(object):
347 # Can I be passed to an algorithme implemented in Rust ?
347 # Can I be passed to an algorithme implemented in Rust ?
348 rust_ext_compat = 0
348 rust_ext_compat = 0
349 # Format of an index entry according to Python's `struct` language
349 # Format of an index entry according to Python's `struct` language
350 index_format = revlog_constants.INDEX_ENTRY_V1
350 index_format = revlog_constants.INDEX_ENTRY_V1
351 # Size of a C unsigned long long int, platform independent
351 # Size of a C unsigned long long int, platform independent
352 big_int_size = struct.calcsize(b'>Q')
352 big_int_size = struct.calcsize(b'>Q')
353 # Size of a C long int, platform independent
353 # Size of a C long int, platform independent
354 int_size = struct.calcsize(b'>i')
354 int_size = struct.calcsize(b'>i')
355 # An empty index entry, used as a default value to be overridden, or nullrev
355 # An empty index entry, used as a default value to be overridden, or nullrev
356 null_item = (
356 null_item = (
357 0,
357 0,
358 0,
358 0,
359 0,
359 0,
360 -1,
360 -1,
361 -1,
361 -1,
362 -1,
362 -1,
363 -1,
363 -1,
364 sha1nodeconstants.nullid,
364 sha1nodeconstants.nullid,
365 0,
365 0,
366 0,
366 0,
367 revlog_constants.COMP_MODE_INLINE,
367 revlog_constants.COMP_MODE_INLINE,
368 revlog_constants.COMP_MODE_INLINE,
368 revlog_constants.COMP_MODE_INLINE,
369 )
369 )
370
370
371 @util.propertycache
371 @util.propertycache
372 def entry_size(self):
372 def entry_size(self):
373 return self.index_format.size
373 return self.index_format.size
374
374
375 @property
375 @property
376 def nodemap(self):
376 def nodemap(self):
377 msg = b"index.nodemap is deprecated, use index.[has_node|rev|get_rev]"
377 msg = b"index.nodemap is deprecated, use index.[has_node|rev|get_rev]"
378 util.nouideprecwarn(msg, b'5.3', stacklevel=2)
378 util.nouideprecwarn(msg, b'5.3', stacklevel=2)
379 return self._nodemap
379 return self._nodemap
380
380
381 @util.propertycache
381 @util.propertycache
382 def _nodemap(self):
382 def _nodemap(self):
383 nodemap = nodemaputil.NodeMap({sha1nodeconstants.nullid: nullrev})
383 nodemap = nodemaputil.NodeMap({sha1nodeconstants.nullid: nullrev})
384 for r in range(0, len(self)):
384 for r in range(0, len(self)):
385 n = self[r][7]
385 n = self[r][7]
386 nodemap[n] = r
386 nodemap[n] = r
387 return nodemap
387 return nodemap
388
388
389 def has_node(self, node):
389 def has_node(self, node):
390 """return True if the node exist in the index"""
390 """return True if the node exist in the index"""
391 return node in self._nodemap
391 return node in self._nodemap
392
392
393 def rev(self, node):
393 def rev(self, node):
394 """return a revision for a node
394 """return a revision for a node
395
395
396 If the node is unknown, raise a RevlogError"""
396 If the node is unknown, raise a RevlogError"""
397 return self._nodemap[node]
397 return self._nodemap[node]
398
398
399 def get_rev(self, node):
399 def get_rev(self, node):
400 """return a revision for a node
400 """return a revision for a node
401
401
402 If the node is unknown, return None"""
402 If the node is unknown, return None"""
403 return self._nodemap.get(node)
403 return self._nodemap.get(node)
404
404
405 def _stripnodes(self, start):
405 def _stripnodes(self, start):
406 if '_nodemap' in vars(self):
406 if '_nodemap' in vars(self):
407 for r in range(start, len(self)):
407 for r in range(start, len(self)):
408 n = self[r][7]
408 n = self[r][7]
409 del self._nodemap[n]
409 del self._nodemap[n]
410
410
411 def clearcaches(self):
411 def clearcaches(self):
412 self.__dict__.pop('_nodemap', None)
412 self.__dict__.pop('_nodemap', None)
413
413
414 def __len__(self):
414 def __len__(self):
415 return self._lgt + len(self._extra)
415 return self._lgt + len(self._extra)
416
416
417 def append(self, tup):
417 def append(self, tup):
418 if '_nodemap' in vars(self):
418 if '_nodemap' in vars(self):
419 self._nodemap[tup[7]] = len(self)
419 self._nodemap[tup[7]] = len(self)
420 data = self._pack_entry(len(self), tup)
420 data = self._pack_entry(len(self), tup)
421 self._extra.append(data)
421 self._extra.append(data)
422
422
423 def _pack_entry(self, rev, entry):
423 def _pack_entry(self, rev, entry):
424 assert entry[8] == 0
424 assert entry[8] == 0
425 assert entry[9] == 0
425 assert entry[9] == 0
426 return self.index_format.pack(*entry[:8])
426 return self.index_format.pack(*entry[:8])
427
427
428 def _check_index(self, i):
428 def _check_index(self, i):
429 if not isinstance(i, int):
429 if not isinstance(i, int):
430 raise TypeError(b"expecting int indexes")
430 raise TypeError(b"expecting int indexes")
431 if i < 0 or i >= len(self):
431 if i < 0 or i >= len(self):
432 raise IndexError
432 raise IndexError
433
433
434 def __getitem__(self, i):
434 def __getitem__(self, i):
435 if i == -1:
435 if i == -1:
436 return self.null_item
436 return self.null_item
437 self._check_index(i)
437 self._check_index(i)
438 if i >= self._lgt:
438 if i >= self._lgt:
439 data = self._extra[i - self._lgt]
439 data = self._extra[i - self._lgt]
440 else:
440 else:
441 index = self._calculate_index(i)
441 index = self._calculate_index(i)
442 data = self._data[index : index + self.entry_size]
442 data = self._data[index : index + self.entry_size]
443 r = self._unpack_entry(i, data)
443 r = self._unpack_entry(i, data)
444 if self._lgt and i == 0:
444 if self._lgt and i == 0:
445 offset = revlogutils.offset_type(0, gettype(r[0]))
445 offset = revlogutils.offset_type(0, gettype(r[0]))
446 r = (offset,) + r[1:]
446 r = (offset,) + r[1:]
447 return r
447 return r
448
448
449 def _unpack_entry(self, rev, data):
449 def _unpack_entry(self, rev, data):
450 r = self.index_format.unpack(data)
450 r = self.index_format.unpack(data)
451 r = r + (
451 r = r + (
452 0,
452 0,
453 0,
453 0,
454 revlog_constants.COMP_MODE_INLINE,
454 revlog_constants.COMP_MODE_INLINE,
455 revlog_constants.COMP_MODE_INLINE,
455 revlog_constants.COMP_MODE_INLINE,
456 )
456 )
457 return r
457 return r
458
458
459 def pack_header(self, header):
459 def pack_header(self, header):
460 """pack header information as binary"""
460 """pack header information as binary"""
461 v_fmt = revlog_constants.INDEX_HEADER
461 v_fmt = revlog_constants.INDEX_HEADER
462 return v_fmt.pack(header)
462 return v_fmt.pack(header)
463
463
464 def entry_binary(self, rev):
464 def entry_binary(self, rev):
465 """return the raw binary string representing a revision"""
465 """return the raw binary string representing a revision"""
466 entry = self[rev]
466 entry = self[rev]
467 p = revlog_constants.INDEX_ENTRY_V1.pack(*entry[:8])
467 p = revlog_constants.INDEX_ENTRY_V1.pack(*entry[:8])
468 if rev == 0:
468 if rev == 0:
469 p = p[revlog_constants.INDEX_HEADER.size :]
469 p = p[revlog_constants.INDEX_HEADER.size :]
470 return p
470 return p
471
471
472
472
473 class IndexObject(BaseIndexObject):
473 class IndexObject(BaseIndexObject):
474 def __init__(self, data):
474 def __init__(self, data):
475 assert len(data) % self.entry_size == 0, (
475 assert len(data) % self.entry_size == 0, (
476 len(data),
476 len(data),
477 self.entry_size,
477 self.entry_size,
478 len(data) % self.entry_size,
478 len(data) % self.entry_size,
479 )
479 )
480 self._data = data
480 self._data = data
481 self._lgt = len(data) // self.entry_size
481 self._lgt = len(data) // self.entry_size
482 self._extra = []
482 self._extra = []
483
483
484 def _calculate_index(self, i):
484 def _calculate_index(self, i):
485 return i * self.entry_size
485 return i * self.entry_size
486
486
487 def __delitem__(self, i):
487 def __delitem__(self, i):
488 if not isinstance(i, slice) or not i.stop == -1 or i.step is not None:
488 if not isinstance(i, slice) or not i.stop == -1 or i.step is not None:
489 raise ValueError(b"deleting slices only supports a:-1 with step 1")
489 raise ValueError(b"deleting slices only supports a:-1 with step 1")
490 i = i.start
490 i = i.start
491 self._check_index(i)
491 self._check_index(i)
492 self._stripnodes(i)
492 self._stripnodes(i)
493 if i < self._lgt:
493 if i < self._lgt:
494 self._data = self._data[: i * self.entry_size]
494 self._data = self._data[: i * self.entry_size]
495 self._lgt = i
495 self._lgt = i
496 self._extra = []
496 self._extra = []
497 else:
497 else:
498 self._extra = self._extra[: i - self._lgt]
498 self._extra = self._extra[: i - self._lgt]
499
500
501 class PersistentNodeMapIndexObject(IndexObject):
502 """a debug-oriented class to test persistent nodemap
503
504 We need a simple Python object to test API and higher level behavior. See
505 the Rust implementation for more serious usage. This should be used only
506 through the dedicated `devel.persistent-nodemap` config.
507 """
508
509 def nodemap_data_all(self):
510 """Return bytes containing a full serialization of a nodemap
511
512 The nodemap should be valid for the full set of revisions in the
513 index."""
514 return nodemaputil.persistent_data(self)
515
516 def nodemap_data_incremental(self):
517 """Return bytes containing an incremental update to persistent nodemap
518
519 This contains the data for an append-only update of the data provided
520 in the last call to `update_nodemap_data`.
521 """
522 if self._nm_root is None:
523 return None
524 docket = self._nm_docket
525 changed, data = nodemaputil.update_persistent_data(
526 self, self._nm_root, self._nm_max_idx, self._nm_docket.tip_rev
527 )
528
529 self._nm_root = self._nm_max_idx = self._nm_docket = None
530 return docket, changed, data
531
532 def update_nodemap_data(self, docket, nm_data):
533 """provide full block of persisted binary data for a nodemap
534
535 The data are expected to come from disk. See `nodemap_data_all` for a
536 producer of such data."""
537 if nm_data is not None:
538 self._nm_root, self._nm_max_idx = nodemaputil.parse_data(nm_data)
539 if self._nm_root:
540 self._nm_docket = docket
541 else:
542 self._nm_root = self._nm_max_idx = self._nm_docket = None
543
544
545 class InlinedIndexObject(BaseIndexObject):
546 def __init__(self, data, inline=0):
547 self._data = data
548 self._lgt = self._inline_scan(None)
549 self._inline_scan(self._lgt)
550 self._extra = []
551
552 def _inline_scan(self, lgt):
553 off = 0
554 if lgt is not None:
555 self._offsets = [0] * lgt
556 count = 0
557 while off <= len(self._data) - self.entry_size:
558 start = off + self.big_int_size
559 (s,) = struct.unpack(
560 b'>i',
561 self._data[start : start + self.int_size],
562 )
563 if lgt is not None:
564 self._offsets[count] = off
565 count += 1
566 off += self.entry_size + s
567 if off != len(self._data):
568 raise ValueError(b"corrupted data")
569 return count
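`_inline_scan` walks records that are a fixed-size header followed by a variable-length payload whose size is stored as a big-endian i32 inside the header. A standalone sketch of the same scan, with illustrative sizes rather than revlog's real layout:

```python
import struct

# Records: a fixed header (other fields + a big-endian i32 length)
# followed by that many payload bytes inlined in the stream.
ENTRY_SIZE = 8       # total fixed header size (illustrative)
BIG_INT_SIZE = 4     # offset of the length field within the header
INT_SIZE = 4         # width of the length field

def inline_scan(data):
    offsets = []
    off = 0
    while off <= len(data) - ENTRY_SIZE:
        start = off + BIG_INT_SIZE
        (s,) = struct.unpack(b">i", data[start : start + INT_SIZE])
        offsets.append(off)
        off += ENTRY_SIZE + s      # skip header plus inline payload
    if off != len(data):
        raise ValueError("corrupted data")
    return offsets

# Two records: payloads of 3 and 0 bytes.
blob = (b"\0\0\0\0" + struct.pack(b">i", 3) + b"xyz"
        + b"\0\0\0\0" + struct.pack(b">i", 0))
```

The first pass (with `lgt=None`) only counts records; the second pass records each entry's byte offset so later lookups can be done by index.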
570
571 def __delitem__(self, i):
572 if not isinstance(i, slice) or not i.stop == -1 or i.step is not None:
573 raise ValueError(b"deleting slices only supports a:-1 with step 1")
574 i = i.start
575 self._check_index(i)
576 self._stripnodes(i)
577 if i < self._lgt:
578 self._offsets = self._offsets[:i]
579 self._lgt = i
580 self._extra = []
581 else:
582 self._extra = self._extra[: i - self._lgt]
583
584 def _calculate_index(self, i):
585 return self._offsets[i]
586
587
588 def parse_index2(data, inline, revlogv2=False):
589 if not inline:
590 cls = IndexObject2 if revlogv2 else IndexObject
591 return cls(data), None
592 cls = InlinedIndexObject
593 return cls(data, inline), (0, data)
594
595
596 def parse_index_cl_v2(data):
597 return IndexChangelogV2(data), None
598
599
600 class IndexObject2(IndexObject):
601 index_format = revlog_constants.INDEX_ENTRY_V2
602
603 def replace_sidedata_info(
604 self,
605 rev,
606 sidedata_offset,
607 sidedata_length,
608 offset_flags,
609 compression_mode,
610 ):
611 """
612 Replace an existing index entry's sidedata offset and length with new
613 ones.
614 This cannot be used outside of the context of sidedata rewriting,
615 inside the transaction that creates the revision `rev`.
616 """
617 if rev < 0:
618 raise KeyError
619 self._check_index(rev)
620 if rev < self._lgt:
621 msg = b"cannot rewrite entries outside of this transaction"
622 raise KeyError(msg)
623 else:
624 entry = list(self[rev])
625 entry[0] = offset_flags
626 entry[8] = sidedata_offset
627 entry[9] = sidedata_length
628 entry[11] = compression_mode
629 entry = tuple(entry)
630 new = self._pack_entry(rev, entry)
631 self._extra[rev - self._lgt] = new
632
633 def _unpack_entry(self, rev, data):
634 data = self.index_format.unpack(data)
635 entry = data[:10]
636 data_comp = data[10] & 3
637 sidedata_comp = (data[10] & (3 << 2)) >> 2
638 return entry + (data_comp, sidedata_comp)
639
640 def _pack_entry(self, rev, entry):
641 data = entry[:10]
642 data_comp = entry[10] & 3
643 sidedata_comp = (entry[11] & 3) << 2
644 data += (data_comp | sidedata_comp,)
645
646 return self.index_format.pack(*data)
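`_pack_entry` stores two 2-bit compression modes in a single byte: bits 0-1 hold the revision-data mode and bits 2-3 the sidedata mode. The round-trip in isolation:

```python
# Two 2-bit compression modes packed into one byte, as in the
# _pack_entry/_unpack_entry pair above.
def pack_comp_modes(data_comp, sidedata_comp):
    return (data_comp & 3) | ((sidedata_comp & 3) << 2)

def unpack_comp_modes(byte):
    return byte & 3, (byte >> 2) & 3
```

Masking with `& 3` keeps each mode within its two bits, so the two fields never overlap.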
647
648 def entry_binary(self, rev):
649 """return the raw binary string representing a revision"""
650 entry = self[rev]
651 return self._pack_entry(rev, entry)
652
653 def pack_header(self, header):
654 """pack header information as binary"""
655 msg = 'version header should go in the docket, not the index: %d'
656 msg %= header
657 raise error.ProgrammingError(msg)
658
659
660 class IndexChangelogV2(IndexObject2):
661 index_format = revlog_constants.INDEX_ENTRY_CL_V2
662
663 def _unpack_entry(self, rev, data, r=True):
664 items = self.index_format.unpack(data)
665 entry = items[:3] + (rev, rev) + items[3:8]
666 data_comp = items[8] & 3
667 sidedata_comp = (items[8] >> 2) & 3
668 return entry + (data_comp, sidedata_comp)
669
670 def _pack_entry(self, rev, entry):
671 assert entry[3] == rev, entry[3]
672 assert entry[4] == rev, entry[4]
673 data = entry[:3] + entry[5:10]
674 data_comp = entry[10] & 3
675 sidedata_comp = (entry[11] & 3) << 2
676 data += (data_comp | sidedata_comp,)
677 return self.index_format.pack(*data)
678
679
680 def parse_index_devel_nodemap(data, inline):
681 """like parse_index2, but always return a PersistentNodeMapIndexObject"""
682 return PersistentNodeMapIndexObject(data), None
683
684
685 def parse_dirstate(dmap, copymap, st):
686 parents = [st[:20], st[20:40]]
687 # dereference fields so they will be local in loop
688 format = b">cllll"
689 e_size = struct.calcsize(format)
690 pos1 = 40
691 l = len(st)
692
693 # the inner loop
694 while pos1 < l:
695 pos2 = pos1 + e_size
696 e = _unpack(b">cllll", st[pos1:pos2]) # a literal here is faster
697 pos1 = pos2 + e[4]
698 f = st[pos2:pos1]
699 if b'\0' in f:
700 f, c = f.split(b'\0')
701 copymap[f] = c
702 dmap[f] = DirstateItem.from_v1_data(*e[:4])
703 return parents
704
705
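Each dirstate-v1 entry parsed above is a fixed `>cllll` header (state, mode, size, mtime, path length) followed by the path bytes, with an optional copy source appended after a NUL byte. A self-contained sketch of decoding one record:

```python
import struct

# Build one v1-style record: 17-byte header + path field.
# A copy source, when present, shares the path field after a NUL byte.
path_field = b"a.txt\0b.txt"
entry = struct.pack(b">cllll", b"n", 0o644, 12, 1600000000, len(path_field))
record = entry + path_field

# Decode it the same way parse_dirstate does.
state, mode, size, mtime, flen = struct.unpack(b">cllll", record[:17])
path = record[17 : 17 + flen]
f, copy_source = path.split(b"\0")
```

With `>` in the format string, `c` is 1 byte and each `l` is 4 bytes, giving the fixed 17-byte header that `struct.calcsize(format)` computes above.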
706 def pack_dirstate(dmap, copymap, pl, now):
707 now = int(now)
708 cs = stringio()
709 write = cs.write
710 write(b"".join(pl))
711 for f, e in pycompat.iteritems(dmap):
712 if e.need_delay(now):
713 # The file was last modified "simultaneously" with the current
714 # write to dirstate (i.e. within the same second for file-
715 # systems with a granularity of 1 sec). This commonly happens
716 # for at least a couple of files on 'update'.
717 # The user could change the file without changing its size
718 # within the same second. Invalidate the file's mtime in
719 # dirstate, forcing future 'status' calls to compare the
720 # contents of the file if the size is the same. This prevents
721 # mistakenly treating such files as clean.
722 e.set_possibly_dirty()
723
724 if f in copymap:
725 f = b"%s\0%s" % (f, copymap[f])
726 e = _pack(
727 b">cllll",
728 e.v1_state(),
729 e.v1_mode(),
730 e.v1_size(),
731 e.v1_mtime(),
732 len(f),
733 )
734 write(e)
735 write(f)
736 return cs.getvalue()
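The `need_delay` handling above exists because metadata alone cannot detect a rewrite that happens within the same second as the dirstate write. A toy illustration of the ambiguity and the fix (hypothetical helper, not Mercurial's API):

```python
# With whole-second mtime granularity, a (size, mtime) check alone
# cannot distinguish a file rewritten in the same second the dirstate
# was written. Dropping the ambiguous mtime forces a content compare.
def looks_clean(recorded, observed):
    # `None` mtime means "possibly dirty": metadata is not trusted.
    return recorded["mtime"] is not None and recorded == observed

now = 1_600_000_000
recorded = {"size": 5, "mtime": now}
# Same second, same size, different content: metadata cannot tell.
observed_after_sneaky_edit = {"size": 5, "mtime": now}
assert looks_clean(recorded, observed_after_sneaky_edit)  # wrongly "clean"

# pack_dirstate's fix: invalidate the ambiguous mtime at write time.
recorded["mtime"] = None
```

After the invalidation, a later `status` must fall back to comparing file contents whenever the size matches, which is exactly what the comment above describes.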
@@ -1,792 +1,733 b''
1 //! The "version 2" disk representation of the dirstate
2 //!
3 //! See `mercurial/helptext/internals/dirstate-v2.txt`
4
5 use crate::dirstate_tree::dirstate_map::{self, DirstateMap, NodeRef};
6 use crate::dirstate_tree::path_with_basename::WithBasename;
7 use crate::errors::HgError;
8 use crate::utils::hg_path::HgPath;
9 use crate::DirstateEntry;
10 use crate::DirstateError;
11 use crate::DirstateParents;
12 use bitflags::bitflags;
13 use bytes_cast::unaligned::{I32Be, I64Be, U16Be, U32Be};
14 use bytes_cast::BytesCast;
15 use format_bytes::format_bytes;
16 use std::borrow::Cow;
17 use std::convert::{TryFrom, TryInto};
18 use std::time::{Duration, SystemTime, UNIX_EPOCH};
19
20 /// Added at the start of `.hg/dirstate` when the "v2" format is used.
21 /// This is a redundant sanity check more than an actual "magic number" since
22 /// `.hg/requires` already governs which format should be used.
23 pub const V2_FORMAT_MARKER: &[u8; 12] = b"dirstate-v2\n";
24
25 /// Keep space for 256-bit hashes
26 const STORED_NODE_ID_BYTES: usize = 32;
27
28 /// … even though only 160 bits are used for now, with SHA-1
29 const USED_NODE_ID_BYTES: usize = 20;
30
31 pub(super) const IGNORE_PATTERNS_HASH_LEN: usize = 20;
32 pub(super) type IgnorePatternsHash = [u8; IGNORE_PATTERNS_HASH_LEN];
33
34 /// Must match the constant of the same name in
35 /// `mercurial/dirstateutils/docket.py`
36 const TREE_METADATA_SIZE: usize = 44;
37
38 /// Make sure that size-affecting changes are made knowingly
39 #[allow(unused)]
40 fn static_assert_size_of() {
41 let _ = std::mem::transmute::<TreeMetadata, [u8; TREE_METADATA_SIZE]>;
42 let _ = std::mem::transmute::<DocketHeader, [u8; TREE_METADATA_SIZE + 81]>;
43 let _ = std::mem::transmute::<Node, [u8; 43]>;
44 }
45
46 // Must match `HEADER` in `mercurial/dirstateutils/docket.py`
47 #[derive(BytesCast)]
48 #[repr(C)]
49 struct DocketHeader {
50 marker: [u8; V2_FORMAT_MARKER.len()],
51 parent_1: [u8; STORED_NODE_ID_BYTES],
52 parent_2: [u8; STORED_NODE_ID_BYTES],
53
54 metadata: TreeMetadata,
55
56 /// Counted in bytes
57 data_size: Size,
58
59 uuid_size: u8,
60 }
61
62 pub struct Docket<'on_disk> {
63 header: &'on_disk DocketHeader,
64 uuid: &'on_disk [u8],
65 }
66
67 /// Fields are documented in the *Tree metadata in the docket file*
68 /// section of `mercurial/helptext/internals/dirstate-v2.txt`
67 #[derive(BytesCast)]
68 #[repr(C)]
69 struct TreeMetadata {
70 root_nodes: ChildNodes,
71 nodes_with_entry_count: Size,
72 nodes_with_copy_source_count: Size,
73
74 /// How many bytes of this data file are not used anymore
75 unreachable_bytes: Size,
76
77 /// Current version always sets these bytes to zero when creating or
78 /// updating a dirstate. Future versions could assign some bits to signal
79 /// for example "the version that last wrote/updated this dirstate did so
80 /// in such and such way that can be relied on by versions that know to."
81 unused: [u8; 4],
82
83 /// If non-zero, a hash of ignore files that were used for some previous
84 /// run of the `status` algorithm.
85 ///
86 /// We define:
87 ///
88 /// * "Root" ignore files are `.hgignore` at the root of the repository if
89 /// it exists, and files from `ui.ignore.*` config. This set of files is
90 /// then sorted by the string representation of their path.
91 /// * The "expanded contents" of an ignore file is the byte string made
92 /// by concatenating its contents with the "expanded contents" of other
93 /// files included with `include:` or `subinclude:` files, in inclusion
94 /// order. This definition is recursive, as included files can
95 /// themselves include more files.
96 ///
97 /// This hash is defined as the SHA-1 of the concatenation (in sorted
98 /// order) of the "expanded contents" of each "root" ignore file.
99 /// (Note that computing this does not require actually concatenating byte
100 /// strings into contiguous memory; instead, SHA-1 hashing can be done
101 /// incrementally.)
78 /// See *Optional hash of ignore patterns* section of
79 /// `mercurial/helptext/internals/dirstate-v2.txt`
102 ignore_patterns_hash: IgnorePatternsHash,
103 }
104
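The removed comment above defines the hash stored in `ignore_patterns_hash`. A Python sketch of that definition, assuming the "expanded contents" of each root ignore file are already available:

```python
import hashlib

# SHA-1 over the "expanded contents" of each root ignore file, taken in
# sorted path order and fed to the hasher incrementally rather than
# concatenated into one buffer.
def ignore_patterns_hash(root_files):
    # root_files: {path bytes: expanded contents bytes}
    sha1 = hashlib.sha1()
    for path in sorted(root_files):
        sha1.update(root_files[path])
    return sha1.digest()

h = ignore_patterns_hash({b".hgignore": b"*.pyc\n", b"global": b"*.orig\n"})
```

Incremental `update` calls produce the same digest as hashing the full concatenation, which is the point the comment makes about not needing contiguous memory.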
83 /// Fields are documented in the *The data file format*
84 /// section of `mercurial/helptext/internals/dirstate-v2.txt`
105 #[derive(BytesCast)]
106 #[repr(C)]
107 pub(super) struct Node {
108 full_path: PathSlice,
109
110 /// In bytes from `self.full_path.start`
111 base_name_start: PathSize,
112
113 copy_source: OptPathSlice,
114 children: ChildNodes,
115 pub(super) descendants_with_entry_count: Size,
116 pub(super) tracked_descendants_count: Size,
117
118 /// Depending on the bits in `flags`:
119 ///
120 /// * If any of `WDIR_TRACKED`, `P1_TRACKED`, or `P2_INFO` are set, the
121 /// node has an entry.
122 ///
123 /// - If `HAS_MODE_AND_SIZE` is set, `data.mode` and `data.size` are
124 /// meaningful. Otherwise they are set to zero
125 /// - If `HAS_MTIME` is set, `data.mtime` is meaningful. Otherwise it is
126 /// set to zero.
127 ///
128 /// * If none of `WDIR_TRACKED`, `P1_TRACKED`, `P2_INFO`, or `HAS_MTIME`
129 /// are set, the node does not have an entry and `data` is set to all
130 /// zeros.
131 ///
132 /// * If none of `WDIR_TRACKED`, `P1_TRACKED`, `P2_INFO` are set, but
133 /// `HAS_MTIME` is set, the bytes of `data` should instead be
134 /// interpreted as the `Timestamp` for the mtime of a cached directory.
135 ///
136 /// The presence of this combination of flags means that at some point,
137 /// this path in the working directory was observed:
138 ///
139 /// - To be a directory
140 /// - With the modification time as given by `Timestamp`
141 /// - That timestamp was already strictly in the past when observed,
142 /// meaning that later changes cannot happen in the same clock tick
143 /// and must cause a different modification time (unless the system
144 /// clock jumps back and we get unlucky, which is not impossible but
145 /// deemed unlikely enough).
146 /// - All direct children of this directory (as returned by
147 /// `std::fs::read_dir`) either have a corresponding dirstate node, or
148 /// are ignored by ignore patterns whose hash is in
149 /// `TreeMetadata::ignore_patterns_hash`.
150 ///
151 /// This means that if `std::fs::symlink_metadata` later reports the
152 /// same modification time and ignored patterns haven’t changed, a run
153 /// of status that is not listing ignored files can skip calling
154 /// `std::fs::read_dir` again for this directory, iterating child
155 /// dirstate nodes instead.
156 flags: Flags,
157 data: Entry,
158 }
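The comment block removed above spells out how the `Flags` bits determine what `data` means. The same decision table as a small Python sketch (bit values mirror the `Flags` declaration; names are illustrative):

```python
# How a node's flags decide the meaning of its `data` field, per the
# rules documented for dirstate-v2 tree nodes.
WDIR_TRACKED = 1 << 0
P1_TRACKED = 1 << 1
P2_INFO = 1 << 2
HAS_MODE_AND_SIZE = 1 << 3
HAS_MTIME = 1 << 4

def interpret(flags):
    if flags & (WDIR_TRACKED | P1_TRACKED | P2_INFO):
        return "entry"                    # data holds mode/size/mtime
    if flags & HAS_MTIME:
        return "cached-directory-mtime"   # data reinterpreted as Timestamp
    return "no-entry"                     # data is all zeros
```

The cached-directory case is the `status` optimization: if the directory's mtime and the ignore-pattern hash are unchanged, a later run can skip `read_dir` for that directory.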
159
160 bitflags! {
161 #[derive(BytesCast)]
162 #[repr(C)]
163 struct Flags: u8 {
164 const WDIR_TRACKED = 1 << 0;
165 const P1_TRACKED = 1 << 1;
166 const P2_INFO = 1 << 2;
167 const HAS_MODE_AND_SIZE = 1 << 3;
168 const HAS_MTIME = 1 << 4;
169 }
170 }
171
172 #[derive(BytesCast, Copy, Clone, Debug)]
173 #[repr(C)]
174 struct Entry {
175 mode: I32Be,
176 mtime: I32Be,
177 size: I32Be,
178 }
179
180 /// Duration since the Unix epoch
181 #[derive(BytesCast, Copy, Clone, PartialEq)]
182 #[repr(C)]
183 pub(super) struct Timestamp {
184 seconds: I64Be,
185
186 /// In `0 .. 1_000_000_000`.
187 ///
188 /// This timestamp is later or earlier than `(seconds, 0)` by this many
189 /// nanoseconds, if `seconds` is non-negative or negative, respectively.
190 nanoseconds: U32Be,
191 }
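The sign convention on `nanoseconds` above can be made concrete: the fractional part moves the instant later when `seconds` is non-negative and earlier when it is negative. A Python sketch using exact arithmetic:

```python
from fractions import Fraction

# Convert a (seconds, nanoseconds) pair under the Timestamp convention:
# nanoseconds shifts away from zero, i.e. later for seconds >= 0 and
# earlier for seconds < 0.
def to_fraction(seconds, nanoseconds):
    assert 0 <= nanoseconds < 1_000_000_000
    offset = Fraction(nanoseconds, 1_000_000_000)
    return seconds + offset if seconds >= 0 else seconds - offset
```

This mirrors how a signed whole-second count plus an unsigned sub-second count can represent instants on both sides of the epoch without ambiguity.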
192
193 /// Counted in bytes from the start of the file
194 ///
195 /// NOTE: not supporting `.hg/dirstate` files larger than 4 GiB.
196 type Offset = U32Be;
197
198 /// Counted in number of items
199 ///
200 /// NOTE: we choose not to support counting more than 4 billion nodes anywhere.
201 type Size = U32Be;
202
203 /// Counted in bytes
204 ///
205 /// NOTE: we choose not to support file names/paths longer than 64 KiB.
206 type PathSize = U16Be;
207
208 /// A contiguous sequence of `len` times `Node`, representing the child nodes
209 /// of either some other node or of the repository root.
210 ///
211 /// Always sorted by ascending `full_path`, to allow binary search.
212 /// Since nodes with the same parent node also have the same parent path,
213 /// only the `base_name`s need to be compared during binary search.
214 #[derive(BytesCast, Copy, Clone)]
215 #[repr(C)]
216 struct ChildNodes {
217 start: Offset,
218 len: Size,
219 }
220
221 /// A `HgPath` of `len` bytes
222 #[derive(BytesCast, Copy, Clone)]
223 #[repr(C)]
224 struct PathSlice {
225 start: Offset,
226 len: PathSize,
227 }
228
229 /// Either nothing if `start == 0`, or a `HgPath` of `len` bytes
230 type OptPathSlice = PathSlice;
231
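Because child nodes are sorted by `full_path` and siblings share their parent path, lookup only needs to binary-search the base names. A Python sketch of that lookup:

```python
import bisect

# Children sorted by base name; siblings share their parent path, so
# comparing base names alone is enough to locate a child.
def find_child(children, base_name):
    # children: list of (base_name, node) tuples sorted by base_name
    names = [name for name, _node in children]
    i = bisect.bisect_left(names, base_name)
    if i < len(children) and children[i][0] == base_name:
        return children[i][1]
    return None

kids = [(b"bar", 1), (b"baz", 2), (b"foo", 3)]
```

With sorted siblings, resolving a full path of depth d costs O(d log n) comparisons instead of scanning every node.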
232 /// Unexpected file format found in `.hg/dirstate` with the "v2" format.
233 ///
234 /// This should only happen if Mercurial is buggy or a repository is corrupted.
235 #[derive(Debug)]
236 pub struct DirstateV2ParseError;
237
238 impl From<DirstateV2ParseError> for HgError {
239 fn from(_: DirstateV2ParseError) -> Self {
240 HgError::corrupted("dirstate-v2 parse error")
241 }
242 }
243
244 impl From<DirstateV2ParseError> for crate::DirstateError {
245 fn from(error: DirstateV2ParseError) -> Self {
246 HgError::from(error).into()
247 }
248 }
249
250 impl<'on_disk> Docket<'on_disk> {
251 pub fn parents(&self) -> DirstateParents {
252 use crate::Node;
253 let p1 = Node::try_from(&self.header.parent_1[..USED_NODE_ID_BYTES])
254 .unwrap()
255 .clone();
256 let p2 = Node::try_from(&self.header.parent_2[..USED_NODE_ID_BYTES])
257 .unwrap()
258 .clone();
259 DirstateParents { p1, p2 }
260 }
261
262 pub fn tree_metadata(&self) -> &[u8] {
263 self.header.metadata.as_bytes()
264 }
265
266 pub fn data_size(&self) -> usize {
267 // This `unwrap` could only panic on a 16-bit CPU
268 self.header.data_size.get().try_into().unwrap()
269 }
270
271 pub fn data_filename(&self) -> String {
272 String::from_utf8(format_bytes!(b"dirstate.{}", self.uuid)).unwrap()
273 }
274 }
275
276 pub fn read_docket(
277 on_disk: &[u8],
278 ) -> Result<Docket<'_>, DirstateV2ParseError> {
279 let (header, uuid) =
280 DocketHeader::from_bytes(on_disk).map_err(|_| DirstateV2ParseError)?;
281 let uuid_size = header.uuid_size as usize;
282 if header.marker == *V2_FORMAT_MARKER && uuid.len() == uuid_size {
283 Ok(Docket { header, uuid })
284 } else {
285 Err(DirstateV2ParseError)
286 }
287 }
288
289 pub(super) fn read<'on_disk>(
290 on_disk: &'on_disk [u8],
291 metadata: &[u8],
292 ) -> Result<DirstateMap<'on_disk>, DirstateV2ParseError> {
293 if on_disk.is_empty() {
294 return Ok(DirstateMap::empty(on_disk));
295 }
296 let (meta, _) = TreeMetadata::from_bytes(metadata)
297 .map_err(|_| DirstateV2ParseError)?;
298 let dirstate_map = DirstateMap {
299 on_disk,
300 root: dirstate_map::ChildNodes::OnDisk(read_nodes(
301 on_disk,
302 meta.root_nodes,
303 )?),
304 nodes_with_entry_count: meta.nodes_with_entry_count.get(),
305 nodes_with_copy_source_count: meta.nodes_with_copy_source_count.get(),
306 ignore_patterns_hash: meta.ignore_patterns_hash,
307 unreachable_bytes: meta.unreachable_bytes.get(),
308 };
309 Ok(dirstate_map)
310 }
311
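`read_docket` above splits the docket into a fixed-size header followed by a trailing uuid. A Python sketch of the same layout, using the struct sizes given by the constants in this file (a simplified model, not `mercurial/dirstateutils/docket.py`):

```python
import struct

# Layout mirrors DocketHeader: 12-byte marker, two 32-byte parents,
# 44 bytes of tree metadata, a big-endian u32 data size, a u8 uuid
# length, then the uuid bytes themselves.
V2_FORMAT_MARKER = b"dirstate-v2\n"
HEADER = struct.Struct(">12s32s32s44sLB")

def read_docket(on_disk):
    marker, p1, p2, meta, data_size, uuid_size = HEADER.unpack(
        on_disk[: HEADER.size]
    )
    uuid = on_disk[HEADER.size :]
    if marker != V2_FORMAT_MARKER or len(uuid) != uuid_size:
        raise ValueError("dirstate-v2 parse error")
    # Only the first 20 bytes of each 32-byte parent slot are used (SHA-1).
    return {"parents": (p1[:20], p2[:20]), "data_size": data_size, "uuid": uuid}

blob = (
    HEADER.pack(V2_FORMAT_MARKER, b"\x11" * 32, b"\x22" * 32, b"\0" * 44, 0, 4)
    + b"abcd"
)
d = read_docket(blob)
```

As in the Rust version, the marker is a redundant sanity check; `.hg/requires` is what actually selects the format.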
312 impl Node {
313 pub(super) fn full_path<'on_disk>(
314 &self,
315 on_disk: &'on_disk [u8],
316 ) -> Result<&'on_disk HgPath, DirstateV2ParseError> {
317 read_hg_path(on_disk, self.full_path)
318 }
319
320 pub(super) fn base_name_start<'on_disk>(
321 &self,
322 ) -> Result<usize, DirstateV2ParseError> {
323 let start = self.base_name_start.get();
324 if start < self.full_path.len.get() {
325 let start = usize::try_from(start)
326 // u32 -> usize, could only panic on a 16-bit CPU
327 .expect("dirstate-v2 base_name_start out of bounds");
328 Ok(start)
329 } else {
330 Err(DirstateV2ParseError)
331 }
332 }
333
334 pub(super) fn base_name<'on_disk>(
335 &self,
336 on_disk: &'on_disk [u8],
337 ) -> Result<&'on_disk HgPath, DirstateV2ParseError> {
338 let full_path = self.full_path(on_disk)?;
339 let base_name_start = self.base_name_start()?;
340 Ok(HgPath::new(&full_path.as_bytes()[base_name_start..]))
341 }
342
343 pub(super) fn path<'on_disk>(
344 &self,
345 on_disk: &'on_disk [u8],
346 ) -> Result<dirstate_map::NodeKey<'on_disk>, DirstateV2ParseError> {
347 Ok(WithBasename::from_raw_parts(
348 Cow::Borrowed(self.full_path(on_disk)?),
289 Cow::Borrowed(self.full_path(on_disk)?),
349 self.base_name_start()?,
290 self.base_name_start()?,
350 ))
291 ))
351 }
292 }
352
293
353 pub(super) fn has_copy_source<'on_disk>(&self) -> bool {
294 pub(super) fn has_copy_source<'on_disk>(&self) -> bool {
354 self.copy_source.start.get() != 0
295 self.copy_source.start.get() != 0
355 }
296 }
356
297
357 pub(super) fn copy_source<'on_disk>(
298 pub(super) fn copy_source<'on_disk>(
358 &self,
299 &self,
359 on_disk: &'on_disk [u8],
300 on_disk: &'on_disk [u8],
360 ) -> Result<Option<&'on_disk HgPath>, DirstateV2ParseError> {
301 ) -> Result<Option<&'on_disk HgPath>, DirstateV2ParseError> {
361 Ok(if self.has_copy_source() {
302 Ok(if self.has_copy_source() {
362 Some(read_hg_path(on_disk, self.copy_source)?)
303 Some(read_hg_path(on_disk, self.copy_source)?)
363 } else {
304 } else {
364 None
305 None
365 })
306 })
366 }
307 }
367
308
368 fn has_entry(&self) -> bool {
309 fn has_entry(&self) -> bool {
369 self.flags.intersects(
310 self.flags.intersects(
370 Flags::WDIR_TRACKED | Flags::P1_TRACKED | Flags::P2_INFO,
311 Flags::WDIR_TRACKED | Flags::P1_TRACKED | Flags::P2_INFO,
371 )
312 )
372 }
313 }
373
314
374 pub(super) fn node_data(
315 pub(super) fn node_data(
375 &self,
316 &self,
376 ) -> Result<dirstate_map::NodeData, DirstateV2ParseError> {
317 ) -> Result<dirstate_map::NodeData, DirstateV2ParseError> {
377 if self.has_entry() {
318 if self.has_entry() {
378 Ok(dirstate_map::NodeData::Entry(self.assume_entry()))
319 Ok(dirstate_map::NodeData::Entry(self.assume_entry()))
379 } else if let Some(&mtime) = self.cached_directory_mtime() {
320 } else if let Some(&mtime) = self.cached_directory_mtime() {
380 Ok(dirstate_map::NodeData::CachedDirectory { mtime })
321 Ok(dirstate_map::NodeData::CachedDirectory { mtime })
381 } else {
322 } else {
382 Ok(dirstate_map::NodeData::None)
323 Ok(dirstate_map::NodeData::None)
383 }
324 }
384 }
325 }
385
326
386 pub(super) fn cached_directory_mtime(&self) -> Option<&Timestamp> {
327 pub(super) fn cached_directory_mtime(&self) -> Option<&Timestamp> {
387 if self.flags.contains(Flags::HAS_MTIME) && !self.has_entry() {
328 if self.flags.contains(Flags::HAS_MTIME) && !self.has_entry() {
388 Some(self.data.as_timestamp())
329 Some(self.data.as_timestamp())
389 } else {
330 } else {
390 None
331 None
391 }
332 }
392 }
333 }
393
334
394 fn assume_entry(&self) -> DirstateEntry {
335 fn assume_entry(&self) -> DirstateEntry {
395 // TODO: convert through raw bits instead?
336 // TODO: convert through raw bits instead?
396 let wdir_tracked = self.flags.contains(Flags::WDIR_TRACKED);
337 let wdir_tracked = self.flags.contains(Flags::WDIR_TRACKED);
397 let p1_tracked = self.flags.contains(Flags::P1_TRACKED);
338 let p1_tracked = self.flags.contains(Flags::P1_TRACKED);
398 let p2_info = self.flags.contains(Flags::P2_INFO);
339 let p2_info = self.flags.contains(Flags::P2_INFO);
399 let mode_size = if self.flags.contains(Flags::HAS_MODE_AND_SIZE) {
340 let mode_size = if self.flags.contains(Flags::HAS_MODE_AND_SIZE) {
400 Some((self.data.mode.into(), self.data.size.into()))
341 Some((self.data.mode.into(), self.data.size.into()))
401 } else {
342 } else {
402 None
343 None
403 };
344 };
404 let mtime = if self.flags.contains(Flags::HAS_MTIME) {
345 let mtime = if self.flags.contains(Flags::HAS_MTIME) {
405 Some(self.data.mtime.into())
346 Some(self.data.mtime.into())
406 } else {
347 } else {
407 None
348 None
408 };
349 };
409 DirstateEntry::from_v2_data(
350 DirstateEntry::from_v2_data(
410 wdir_tracked,
351 wdir_tracked,
411 p1_tracked,
352 p1_tracked,
412 p2_info,
353 p2_info,
413 mode_size,
354 mode_size,
414 mtime,
355 mtime,
415 )
356 )
416 }
357 }
417
358
418 pub(super) fn entry(
359 pub(super) fn entry(
419 &self,
360 &self,
420 ) -> Result<Option<DirstateEntry>, DirstateV2ParseError> {
361 ) -> Result<Option<DirstateEntry>, DirstateV2ParseError> {
421 if self.has_entry() {
362 if self.has_entry() {
422 Ok(Some(self.assume_entry()))
363 Ok(Some(self.assume_entry()))
423 } else {
364 } else {
424 Ok(None)
365 Ok(None)
425 }
366 }
426 }
367 }
427
368
428 pub(super) fn children<'on_disk>(
369 pub(super) fn children<'on_disk>(
429 &self,
370 &self,
430 on_disk: &'on_disk [u8],
371 on_disk: &'on_disk [u8],
431 ) -> Result<&'on_disk [Node], DirstateV2ParseError> {
372 ) -> Result<&'on_disk [Node], DirstateV2ParseError> {
432 read_nodes(on_disk, self.children)
373 read_nodes(on_disk, self.children)
433 }
374 }
434
375
435 pub(super) fn to_in_memory_node<'on_disk>(
376 pub(super) fn to_in_memory_node<'on_disk>(
436 &self,
377 &self,
437 on_disk: &'on_disk [u8],
378 on_disk: &'on_disk [u8],
438 ) -> Result<dirstate_map::Node<'on_disk>, DirstateV2ParseError> {
379 ) -> Result<dirstate_map::Node<'on_disk>, DirstateV2ParseError> {
439 Ok(dirstate_map::Node {
380 Ok(dirstate_map::Node {
440 children: dirstate_map::ChildNodes::OnDisk(
381 children: dirstate_map::ChildNodes::OnDisk(
441 self.children(on_disk)?,
382 self.children(on_disk)?,
442 ),
383 ),
443 copy_source: self.copy_source(on_disk)?.map(Cow::Borrowed),
384 copy_source: self.copy_source(on_disk)?.map(Cow::Borrowed),
444 data: self.node_data()?,
385 data: self.node_data()?,
445 descendants_with_entry_count: self
386 descendants_with_entry_count: self
446 .descendants_with_entry_count
387 .descendants_with_entry_count
447 .get(),
388 .get(),
448 tracked_descendants_count: self.tracked_descendants_count.get(),
389 tracked_descendants_count: self.tracked_descendants_count.get(),
449 })
390 })
450 }
391 }
451 }
392 }
452
393
453 impl Entry {
394 impl Entry {
454 fn from_dirstate_entry(entry: &DirstateEntry) -> (Flags, Self) {
395 fn from_dirstate_entry(entry: &DirstateEntry) -> (Flags, Self) {
455 let (wdir_tracked, p1_tracked, p2_info, mode_size_opt, mtime_opt) =
396 let (wdir_tracked, p1_tracked, p2_info, mode_size_opt, mtime_opt) =
456 entry.v2_data();
397 entry.v2_data();
        // TODO: convert through raw flag bits instead?
        let mut flags = Flags::empty();
        flags.set(Flags::WDIR_TRACKED, wdir_tracked);
        flags.set(Flags::P1_TRACKED, p1_tracked);
        flags.set(Flags::P2_INFO, p2_info);
        let (mode, size, mtime);
        if let Some((m, s)) = mode_size_opt {
            mode = m;
            size = s;
            flags.insert(Flags::HAS_MODE_AND_SIZE)
        } else {
            mode = 0;
            size = 0;
        }
        if let Some(m) = mtime_opt {
            mtime = m;
            flags.insert(Flags::HAS_MTIME);
        } else {
            mtime = 0;
        }
        let raw_entry = Entry {
            mode: mode.into(),
            size: size.into(),
            mtime: mtime.into(),
        };
        (flags, raw_entry)
    }

    fn from_timestamp(timestamp: Timestamp) -> Self {
        // Safety: both types implement the `BytesCast` trait, so we could
        // safely use `as_bytes` and `from_bytes` to do this conversion. Using
        // `transmute` instead makes the compiler check that the two types
        // have the same size, which eliminates the error case of
        // `from_bytes`.
        unsafe { std::mem::transmute::<Timestamp, Entry>(timestamp) }
    }

    fn as_timestamp(&self) -> &Timestamp {
        // Safety: same as above in `from_timestamp`
        unsafe { &*(self as *const Entry as *const Timestamp) }
    }
}

impl Timestamp {
    pub fn seconds(&self) -> i64 {
        self.seconds.get()
    }
}

impl From<SystemTime> for Timestamp {
    fn from(system_time: SystemTime) -> Self {
        let (secs, nanos) = match system_time.duration_since(UNIX_EPOCH) {
            Ok(duration) => {
                (duration.as_secs() as i64, duration.subsec_nanos())
            }
            Err(error) => {
                let negative = error.duration();
                (-(negative.as_secs() as i64), negative.subsec_nanos())
            }
        };
        Timestamp {
            seconds: secs.into(),
            nanoseconds: nanos.into(),
        }
    }
}

impl From<&'_ Timestamp> for SystemTime {
    fn from(timestamp: &'_ Timestamp) -> Self {
        let secs = timestamp.seconds.get();
        let nanos = timestamp.nanoseconds.get();
        if secs >= 0 {
            UNIX_EPOCH + Duration::new(secs as u64, nanos)
        } else {
            UNIX_EPOCH - Duration::new((-secs) as u64, nanos)
        }
    }
}

fn read_hg_path(
    on_disk: &[u8],
    slice: PathSlice,
) -> Result<&HgPath, DirstateV2ParseError> {
    read_slice(on_disk, slice.start, slice.len.get()).map(HgPath::new)
}

fn read_nodes(
    on_disk: &[u8],
    slice: ChildNodes,
) -> Result<&[Node], DirstateV2ParseError> {
    read_slice(on_disk, slice.start, slice.len.get())
}

fn read_slice<T, Len>(
    on_disk: &[u8],
    start: Offset,
    len: Len,
) -> Result<&[T], DirstateV2ParseError>
where
    T: BytesCast,
    Len: TryInto<usize>,
{
    // Either value being `usize::MAX` would result in an "out of bounds"
    // error, since a single `&[u8]` cannot occupy the entire address space.
    let start = start.get().try_into().unwrap_or(std::usize::MAX);
    let len = len.try_into().unwrap_or(std::usize::MAX);
    on_disk
        .get(start..)
        .and_then(|bytes| T::slice_from_bytes(bytes, len).ok())
        .map(|(slice, _rest)| slice)
        .ok_or_else(|| DirstateV2ParseError)
}

pub(crate) fn for_each_tracked_path<'on_disk>(
    on_disk: &'on_disk [u8],
    metadata: &[u8],
    mut f: impl FnMut(&'on_disk HgPath),
) -> Result<(), DirstateV2ParseError> {
    let (meta, _) = TreeMetadata::from_bytes(metadata)
        .map_err(|_| DirstateV2ParseError)?;
    fn recur<'on_disk>(
        on_disk: &'on_disk [u8],
        nodes: ChildNodes,
        f: &mut impl FnMut(&'on_disk HgPath),
    ) -> Result<(), DirstateV2ParseError> {
        for node in read_nodes(on_disk, nodes)? {
            if let Some(entry) = node.entry()? {
                if entry.state().is_tracked() {
                    f(node.full_path(on_disk)?)
                }
            }
            recur(on_disk, node.children, f)?
        }
        Ok(())
    }
    recur(on_disk, meta.root_nodes, &mut f)
}

/// Returns new data and metadata, together with whether that data should be
/// appended to the existing data file whose content is at
/// `dirstate_map.on_disk` (true), instead of written to a new data file
/// (false).
pub(super) fn write(
    dirstate_map: &mut DirstateMap,
    can_append: bool,
) -> Result<(Vec<u8>, Vec<u8>, bool), DirstateError> {
    let append = can_append && dirstate_map.write_should_append();

    // This ignores the space for paths, and for nodes without an entry.
    // TODO: better estimate? Skip the `Vec` and write to a file directly?
    let size_guess = std::mem::size_of::<Node>()
        * dirstate_map.nodes_with_entry_count as usize;

    let mut writer = Writer {
        dirstate_map,
        append,
        out: Vec::with_capacity(size_guess),
    };

    let root_nodes = writer.write_nodes(dirstate_map.root.as_ref())?;

    let meta = TreeMetadata {
        root_nodes,
        nodes_with_entry_count: dirstate_map.nodes_with_entry_count.into(),
        nodes_with_copy_source_count: dirstate_map
            .nodes_with_copy_source_count
            .into(),
        unreachable_bytes: dirstate_map.unreachable_bytes.into(),
        unused: [0; 4],
        ignore_patterns_hash: dirstate_map.ignore_patterns_hash,
    };
    Ok((writer.out, meta.as_bytes().to_vec(), append))
}

struct Writer<'dmap, 'on_disk> {
    dirstate_map: &'dmap DirstateMap<'on_disk>,
    append: bool,
    out: Vec<u8>,
}

impl Writer<'_, '_> {
    fn write_nodes(
        &mut self,
        nodes: dirstate_map::ChildNodesRef,
    ) -> Result<ChildNodes, DirstateError> {
        // Reuse already-written nodes if possible
        if self.append {
            if let dirstate_map::ChildNodesRef::OnDisk(nodes_slice) = nodes {
                let start = self.on_disk_offset_of(nodes_slice).expect(
                    "dirstate-v2 OnDisk nodes not found within on_disk",
                );
                let len = child_nodes_len_from_usize(nodes_slice.len());
                return Ok(ChildNodes { start, len });
            }
        }

        // `dirstate_map::ChildNodes::InMemory` contains a `HashMap` which has
        // undefined iteration order. Sort to enable binary search in the
        // written file.
        let nodes = nodes.sorted();
        let nodes_len = nodes.len();

        // First accumulate serialized nodes in a `Vec`
        let mut on_disk_nodes = Vec::with_capacity(nodes_len);
        for node in nodes {
            let children =
                self.write_nodes(node.children(self.dirstate_map.on_disk)?)?;
            let full_path = node.full_path(self.dirstate_map.on_disk)?;
            let full_path = self.write_path(full_path.as_bytes());
            let copy_source = if let Some(source) =
                node.copy_source(self.dirstate_map.on_disk)?
            {
                self.write_path(source.as_bytes())
            } else {
                PathSlice {
                    start: 0.into(),
                    len: 0.into(),
                }
            };
            on_disk_nodes.push(match node {
                NodeRef::InMemory(path, node) => {
                    let (flags, data) = match &node.data {
                        dirstate_map::NodeData::Entry(entry) => {
                            Entry::from_dirstate_entry(entry)
                        }
                        dirstate_map::NodeData::CachedDirectory { mtime } => {
                            (Flags::HAS_MTIME, Entry::from_timestamp(*mtime))
                        }
                        dirstate_map::NodeData::None => (
                            Flags::empty(),
                            Entry {
                                mode: 0.into(),
                                size: 0.into(),
                                mtime: 0.into(),
                            },
                        ),
                    };
                    Node {
                        children,
                        copy_source,
                        full_path,
                        base_name_start: u16::try_from(path.base_name_start())
                            // Could only panic for paths over 64 KiB
                            .expect("dirstate-v2 path length overflow")
                            .into(),
                        descendants_with_entry_count: node
                            .descendants_with_entry_count
                            .into(),
                        tracked_descendants_count: node
                            .tracked_descendants_count
                            .into(),
                        flags,
                        data,
                    }
                }
                NodeRef::OnDisk(node) => Node {
                    children,
                    copy_source,
                    full_path,
                    ..*node
                },
            })
        }
        // … so we can write them contiguously, after writing everything else
        // they refer to.
        let start = self.current_offset();
        let len = child_nodes_len_from_usize(nodes_len);
        self.out.extend(on_disk_nodes.as_bytes());
        Ok(ChildNodes { start, len })
    }

    /// If the given slice of items is within `on_disk`, returns its offset
    /// from the start of `on_disk`.
    fn on_disk_offset_of<T>(&self, slice: &[T]) -> Option<Offset>
    where
        T: BytesCast,
    {
        fn address_range(slice: &[u8]) -> std::ops::RangeInclusive<usize> {
            let start = slice.as_ptr() as usize;
            let end = start + slice.len();
            start..=end
        }
        let slice_addresses = address_range(slice.as_bytes());
        let on_disk_addresses = address_range(self.dirstate_map.on_disk);
        if on_disk_addresses.contains(slice_addresses.start())
            && on_disk_addresses.contains(slice_addresses.end())
        {
            let offset = slice_addresses.start() - on_disk_addresses.start();
            Some(offset_from_usize(offset))
        } else {
            None
        }
    }

    fn current_offset(&mut self) -> Offset {
        let mut offset = self.out.len();
        if self.append {
            offset += self.dirstate_map.on_disk.len()
        }
        offset_from_usize(offset)
    }

    fn write_path(&mut self, slice: &[u8]) -> PathSlice {
        let len = path_len_from_usize(slice.len());
        // Reuse an already-written path if possible
        if self.append {
            if let Some(start) = self.on_disk_offset_of(slice) {
                return PathSlice { start, len };
            }
        }
        let start = self.current_offset();
        self.out.extend(slice.as_bytes());
        PathSlice { start, len }
    }
}

fn offset_from_usize(x: usize) -> Offset {
    u32::try_from(x)
        // Could only panic for a dirstate file larger than 4 GiB
        .expect("dirstate-v2 offset overflow")
        .into()
}

fn child_nodes_len_from_usize(x: usize) -> Size {
    u32::try_from(x)
        // Could only panic with over 4 billion nodes
        .expect("dirstate-v2 slice length overflow")
        .into()
}

fn path_len_from_usize(x: usize) -> PathSize {
    u16::try_from(x)
        // Could only panic for paths over 64 KiB
        .expect("dirstate-v2 path length overflow")
        .into()
}
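A note for readers: the bounds-checking strategy used by `read_slice` above (offsets and lengths come from untrusted file data, so out-of-range values must produce a parse error rather than a panic or an out-of-bounds read) can be exercised in isolation. The sketch below uses plain `u32` parameters and `Option` as stand-ins for the `Offset`/`Size` wrappers and `DirstateV2ParseError`; it is an illustration of the pattern, not the real implementation.

```rust
// Sketch of the bounds-checked "slice out of a flat buffer" pattern from
// `read_slice`, with simplified stand-in types.
fn read_byte_slice(on_disk: &[u8], start: u32, len: u32) -> Option<&[u8]> {
    // Clamping to `usize::MAX` on conversion failure is safe here: no
    // single `&[u8]` can span the whole address space, so the lookups
    // below simply fail instead of wrapping around.
    let start = usize::try_from(start).unwrap_or(usize::MAX);
    let len = usize::try_from(len).unwrap_or(usize::MAX);
    // `get` returns `None` instead of panicking on out-of-range indices.
    on_disk.get(start..)?.get(..len)
}

fn main() {
    let buf = b"abcdef";
    assert_eq!(read_byte_slice(buf, 2, 3), Some(&b"cde"[..]));
    assert_eq!(read_byte_slice(buf, 5, 3), None); // length past the end
    assert_eq!(read_byte_slice(buf, 9, 1), None); // start past the end
    println!("ok");
}
```

The real function additionally goes through `BytesCast::slice_from_bytes` so the same check works for `[Node]` and other typed slices, not just raw bytes.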