|
@@
-0,0
+1,375
b''
|
|
|
|
|
1
|
The *dirstate* is what Mercurial uses internally to track
|
|
|
|
|
2
|
the state of files in the working directory,
|
|
|
|
|
3
|
such as set by commands like `hg add` and `hg rm`.
|
|
|
|
|
4
|
It also contains some cached data that help make `hg status` faster.
|
|
|
|
|
5
|
The name refers both to `.hg/dirstate` on the filesystem
|
|
|
|
|
6
|
and the corresponding data structure in memory while a Mercurial process
|
|
|
|
|
7
|
is running.
|
|
|
|
|
8
|
|
|
|
|
|
9
|
The original file format, retroactively dubbed `dirstate-v1`,
|
|
|
|
|
10
|
is described at https://www.mercurial-scm.org/wiki/DirState.
|
|
|
|
|
11
|
It is made of a flat sequence of unordered variable-size entries,
|
|
|
|
|
12
|
so accessing any information in it requires parsing all of it.
|
|
|
|
|
13
|
Similarly, saving changes requires rewriting the entire file.
|
|
|
|
|
14
|
|
|
|
|
|
15
|
The newer `dirsate-v2` file format is designed to fix these limitations
|
|
|
|
|
16
|
and make `hg status` faster.
|
|
|
|
|
17
|
|
|
|
|
|
18
|
User guide
|
|
|
|
|
19
|
==========
|
|
|
|
|
20
|
|
|
|
|
|
21
|
Compatibility
|
|
|
|
|
22
|
-------------
|
|
|
|
|
23
|
|
|
|
|
|
24
|
The file format is experimental and may still change.
|
|
|
|
|
25
|
Different versions of Mercurial may not be compatible with each other
|
|
|
|
|
26
|
when working on a local repository that uses this format.
|
|
|
|
|
27
|
When using an incompatible version with the experimental format,
|
|
|
|
|
28
|
anything can happen including data corruption.
|
|
|
|
|
29
|
|
|
|
|
|
30
|
Since the dirstate is entirely local and not relevant to the wire protocol,
|
|
|
|
|
31
|
`dirstate-v2` does not affect compatibility with remote Mercurial versions.
|
|
|
|
|
32
|
|
|
|
|
|
33
|
When `share-safe` is enabled, different repositories sharing the same store
|
|
|
|
|
34
|
can use different dirstate formats.
|
|
|
|
|
35
|
|
|
|
|
|
36
|
Enabling `dirsate-v2` for new local repositories
|
|
|
|
|
37
|
------------------------------------------------
|
|
|
|
|
38
|
|
|
|
|
|
39
|
When creating a new local repository such as with `hg init` or `hg clone`,
|
|
|
|
|
40
|
the `exp-dirstate-v2` boolean in the `format` configuration section
|
|
|
|
|
41
|
controls whether to use this file format.
|
|
|
|
|
42
|
This is disabled by default as of this writing.
|
|
|
|
|
43
|
To enable it for a single repository, run for example::
|
|
|
|
|
44
|
|
|
|
|
|
45
|
$ hg init my-project --config format.exp-dirstate-v2=1
|
|
|
|
|
46
|
|
|
|
|
|
47
|
Checking the format of an existing local repsitory
|
|
|
|
|
48
|
--------------------------------------------------
|
|
|
|
|
49
|
|
|
|
|
|
50
|
The `debugformat` commands prints information about
|
|
|
|
|
51
|
which of multiple optional formats are used in the current repository,
|
|
|
|
|
52
|
including `dirstate-v2`::
|
|
|
|
|
53
|
|
|
|
|
|
54
|
$ hg debugformat
|
|
|
|
|
55
|
format-variant repo
|
|
|
|
|
56
|
fncache: yes
|
|
|
|
|
57
|
dirstate-v2: yes
|
|
|
|
|
58
|
[…]
|
|
|
|
|
59
|
|
|
|
|
|
60
|
Upgrading or downgrading an existing local repository
|
|
|
|
|
61
|
-----------------------------------------------------
|
|
|
|
|
62
|
|
|
|
|
|
63
|
The `debugupgrade` command does various upgrades or downgrades
|
|
|
|
|
64
|
on a local repository
|
|
|
|
|
65
|
based on the current Mercurial version and on configuration.
|
|
|
|
|
66
|
The same `format.exp-dirstate-v2` configuration is used again.
|
|
|
|
|
67
|
|
|
|
|
|
68
|
Example to upgrade::
|
|
|
|
|
69
|
|
|
|
|
|
70
|
$ hg debugupgrade --config format.exp-dirstate-v2=1
|
|
|
|
|
71
|
|
|
|
|
|
72
|
Example to downgrade to `dirstate-v1`::
|
|
|
|
|
73
|
|
|
|
|
|
74
|
$ hg debugupgrade --config format.exp-dirstate-v2=0
|
|
|
|
|
75
|
|
|
|
|
|
76
|
Both of this commands do nothing but print a list of proposed changes,
|
|
|
|
|
77
|
which may include changes unrelated to the dirstate.
|
|
|
|
|
78
|
Those other changes are controlled by their own configuration keys.
|
|
|
|
|
79
|
Add `--run` to a command to actually apply the proposed changes.
|
|
|
|
|
80
|
|
|
|
|
|
81
|
Backups of `.hg/requires` and `.hg/dirstate` are created
|
|
|
|
|
82
|
in a `.hg/upgradebackup.*` directory.
|
|
|
|
|
83
|
If something goes wrong, restoring those files should undo the change.
|
|
|
|
|
84
|
|
|
|
|
|
85
|
Note that upgrading affects compatibility with older versions of Mercurial
|
|
|
|
|
86
|
as noted above.
|
|
|
|
|
87
|
This can be relevant when a repository’s files are on a USB drive
|
|
|
|
|
88
|
or some other removable media, or shared over the network, etc.
|
|
|
|
|
89
|
|
|
|
|
|
90
|
Internal filesystem representation
|
|
|
|
|
91
|
==================================
|
|
|
|
|
92
|
|
|
|
|
|
93
|
Requirements file
|
|
|
|
|
94
|
-----------------
|
|
|
|
|
95
|
|
|
|
|
|
96
|
The `.hg/requires` file indicates which of various optional file formats
|
|
|
|
|
97
|
are used by a given repository.
|
|
|
|
|
98
|
Mercurial aborts when seeing a requirement it does not know about,
|
|
|
|
|
99
|
which avoids older version accidentally messing up a respository
|
|
|
|
|
100
|
that uses a format that was introduced later.
|
|
|
|
|
101
|
For versions that do support a format, the presence or absence of
|
|
|
|
|
102
|
the corresponding requirement indicates whether to use that format.
|
|
|
|
|
103
|
|
|
|
|
|
104
|
When the file contains a `exp-dirstate-v2` line,
|
|
|
|
|
105
|
the `dirstate-v2` format is used.
|
|
|
|
|
106
|
With no such line `dirstate-v1` is used.
|
|
|
|
|
107
|
|
|
|
|
|
108
|
High level description
|
|
|
|
|
109
|
----------------------
|
|
|
|
|
110
|
|
|
|
|
|
111
|
Whereas `dirstate-v1` uses a single `.hg/disrtate` file,
|
|
|
|
|
112
|
in `dirstate-v2` that file is a "docket" file
|
|
|
|
|
113
|
that only contains some metadata
|
|
|
|
|
114
|
and points to separate data file named `.hg/dirstate.{ID}`,
|
|
|
|
|
115
|
where `{ID}` is a random identifier.
|
|
|
|
|
116
|
|
|
|
|
|
117
|
This separation allows making data files append-only
|
|
|
|
|
118
|
and therefore safer to memory-map.
|
|
|
|
|
119
|
Creating a new data file (occasionally to clean up unused data)
|
|
|
|
|
120
|
can be done with a different ID
|
|
|
|
|
121
|
without disrupting another Mercurial process
|
|
|
|
|
122
|
that could still be using the previous data file.
|
|
|
|
|
123
|
|
|
|
|
|
124
|
Both files have a format designed to reduce the need for parsing,
|
|
|
|
|
125
|
by using fixed-size binary components as much as possible.
|
|
|
|
|
126
|
For data that is not fixed-size,
|
|
|
|
|
127
|
references to other parts of a file can be made by storing "pseudo-pointers":
|
|
|
|
|
128
|
integers counted in bytes from the start of a file.
|
|
|
|
|
129
|
For read-only access no data structure is needed,
|
|
|
|
|
130
|
only a bytes buffer (possibly memory-mapped directly from the filesystem)
|
|
|
|
|
131
|
with specific parts read on demand.
|
|
|
|
|
132
|
|
|
|
|
|
133
|
The data file contains "nodes" organized in a tree.
|
|
|
|
|
134
|
Each node represents a file or directory inside the working directory
|
|
|
|
|
135
|
or its parent changeset.
|
|
|
|
|
136
|
This tree has the same structure as the filesystem,
|
|
|
|
|
137
|
so a node representing a directory has child nodes representing
|
|
|
|
|
138
|
the files and subdirectories contained directly in that directory.
|
|
|
|
|
139
|
|
|
|
|
|
140
|
The docket file format
|
|
|
|
|
141
|
----------------------
|
|
|
|
|
142
|
|
|
|
|
|
143
|
This is implemented in `rust/hg-core/src/dirstate_tree/on_disk.rs`
|
|
|
|
|
144
|
and `mercurial/dirstateutils/docket.py`.
|
|
|
|
|
145
|
|
|
|
|
|
146
|
Components of the docket file are found at fixed offsets,
|
|
|
|
|
147
|
counted in bytes from the start of the file:
|
|
|
|
|
148
|
|
|
|
|
|
149
|
* Offset 0:
|
|
|
|
|
150
|
The 12-bytes marker string "dirstate-v2\n" ending with a newline character.
|
|
|
|
|
151
|
This makes it easier to tell a dirstate-v2 file from a dirstate-v1 file,
|
|
|
|
|
152
|
although it is not strictly necessary
|
|
|
|
|
153
|
since `.hg/requires` determines which format to use.
|
|
|
|
|
154
|
|
|
|
|
|
155
|
* Offset 12:
|
|
|
|
|
156
|
The changeset node ID on the first parent of the working directory,
|
|
|
|
|
157
|
as up to 32 binary bytes.
|
|
|
|
|
158
|
If a node ID is shorter (20 bytes for SHA-1),
|
|
|
|
|
159
|
it is start-aligned and the rest of the bytes are set to zero.
|
|
|
|
|
160
|
|
|
|
|
|
161
|
* Offset 44:
|
|
|
|
|
162
|
The changeset node ID on the second parent of the working directory,
|
|
|
|
|
163
|
or all zeros if there isn’t one.
|
|
|
|
|
164
|
Also 32 binary bytes.
|
|
|
|
|
165
|
|
|
|
|
|
166
|
* Offset 76:
|
|
|
|
|
167
|
Tree metadata on 44 bytes, described below.
|
|
|
|
|
168
|
Its separation in this documentation from the rest of the docket
|
|
|
|
|
169
|
reflects a detail of the current implementation.
|
|
|
|
|
170
|
Since tree metadata is also made of fields at fixed offsets, those could
|
|
|
|
|
171
|
be inlined here by adding 76 bytes to each offset.
|
|
|
|
|
172
|
|
|
|
|
|
173
|
* Offset 120:
|
|
|
|
|
174
|
The used size of the data file, as a 32-bit big-endian integer.
|
|
|
|
|
175
|
The actual size of the data file may be larger
|
|
|
|
|
176
|
(if another Mercurial processis in appending to it
|
|
|
|
|
177
|
but has not updated the docket yet).
|
|
|
|
|
178
|
That extra data must be ignored.
|
|
|
|
|
179
|
|
|
|
|
|
180
|
* Offset 124:
|
|
|
|
|
181
|
The length of the data file identifier, as a 8-bit integer.
|
|
|
|
|
182
|
|
|
|
|
|
183
|
* Offset 125:
|
|
|
|
|
184
|
The data file identifier.
|
|
|
|
|
185
|
|
|
|
|
|
186
|
* Any additional data is current ignored, and dropped when updating the file.
|
|
|
|
|
187
|
|
|
|
|
|
188
|
Tree metadata in the docket file
|
|
|
|
|
189
|
--------------------------------
|
|
|
|
|
190
|
|
|
|
|
|
191
|
Tree metadata is similarly made of components at fixed offsets.
|
|
|
|
|
192
|
These offsets are counted in bytes from the start of tree metadata,
|
|
|
|
|
193
|
which is 76 bytes after the start of the docket file.
|
|
|
|
|
194
|
|
|
|
|
|
195
|
This metadata can be thought of as the singular root of the tree
|
|
|
|
|
196
|
formed by nodes in the data file.
|
|
|
|
|
197
|
|
|
|
|
|
198
|
* Offset 0:
|
|
|
|
|
199
|
Pseudo-pointer to the start of root nodes,
|
|
|
|
|
200
|
counted in bytes from the start of the data file,
|
|
|
|
|
201
|
as a 32-bit big-endian integer.
|
|
|
|
|
202
|
These nodes describe files and directories found directly
|
|
|
|
|
203
|
at the root of the working directory.
|
|
|
|
|
204
|
|
|
|
|
|
205
|
* Offset 4:
|
|
|
|
|
206
|
Number of root nodes, as a 32-bit big-endian integer.
|
|
|
|
|
207
|
|
|
|
|
|
208
|
* Offset 8:
|
|
|
|
|
209
|
Total number of nodes in the entire tree that "have a dirstate entry",
|
|
|
|
|
210
|
as a 32-bit big-endian integer.
|
|
|
|
|
211
|
Those nodes represent files that would be present at all in `dirstate-v1`.
|
|
|
|
|
212
|
This is typically less than the total number of nodes.
|
|
|
|
|
213
|
This counter is used to implement `len(dirstatemap)`.
|
|
|
|
|
214
|
|
|
|
|
|
215
|
* Offset 12:
|
|
|
|
|
216
|
Number of nodes in the entire tree that have a copy source,
|
|
|
|
|
217
|
as a 32-bit big-endian integer.
|
|
|
|
|
218
|
At the next commit, these files are recorded
|
|
|
|
|
219
|
as having been copied or moved/renamed from that source.
|
|
|
|
|
220
|
(A move is recorded as a copy and separate removal of the source.)
|
|
|
|
|
221
|
This counter is used to implement `len(dirstatemap.copymap)`.
|
|
|
|
|
222
|
|
|
|
|
|
223
|
* Offset 16:
|
|
|
|
|
224
|
An estimation of how many bytes of the data file
|
|
|
|
|
225
|
(within its used size) are unused, as a 32-bit big-endian integer.
|
|
|
|
|
226
|
When appending to an existing data file,
|
|
|
|
|
227
|
some existing nodes or paths can be unreachable from the new root
|
|
|
|
|
228
|
but they still take up space.
|
|
|
|
|
229
|
This counter is used to decide when to write a new data file from scratch
|
|
|
|
|
230
|
instead of appending to an existing one,
|
|
|
|
|
231
|
in order to get rid of that unreachable data
|
|
|
|
|
232
|
and avoid unbounded file size growth.
|
|
|
|
|
233
|
|
|
|
|
|
234
|
* Offset 20:
|
|
|
|
|
235
|
These four bytes are currently ignored
|
|
|
|
|
236
|
and reset to zero when updating a docket file.
|
|
|
|
|
237
|
This is an attempt at forward compatibility:
|
|
|
|
|
238
|
future Mercurial versions could use this as a bit field
|
|
|
|
|
239
|
to indicate that a dirstate has additional data or constraints.
|
|
|
|
|
240
|
Finding a dirstate file with the relevant bit unset indicates that
|
|
|
|
|
241
|
it was written by a then-older version
|
|
|
|
|
242
|
which is not aware of that future change.
|
|
|
|
|
243
|
|
|
|
|
|
244
|
* Offset 24:
|
|
|
|
|
245
|
Either 20 zero bytes, or a SHA-1 hash as 20 binary bytes.
|
|
|
|
|
246
|
When present, the hash is of ignore patterns
|
|
|
|
|
247
|
that were used for some previous run of the `status` algorithm.
|
|
|
|
|
248
|
|
|
|
|
|
249
|
* (Offset 44: end of tree metadata)
|
|
|
|
|
250
|
|
|
|
|
|
251
|
Optional hash of ignore patterns
|
|
|
|
|
252
|
--------------------------------
|
|
|
|
|
253
|
|
|
|
|
|
254
|
The implementation of `status` at `rust/hg-core/src/dirstate_tree/status.rs`
|
|
|
|
|
255
|
has been optimized such that its run time is dominated by calls
|
|
|
|
|
256
|
to `stat` for reading the filesystem metadata of a file or directory,
|
|
|
|
|
257
|
and to `readdir` for listing the contents of a directory.
|
|
|
|
|
258
|
In some cases the algorithm can skip calls to `readdir`
|
|
|
|
|
259
|
(saving significant time)
|
|
|
|
|
260
|
because the dirstate already contains enough of the relevant information
|
|
|
|
|
261
|
to build the correct `status` results.
|
|
|
|
|
262
|
|
|
|
|
|
263
|
The default configuration of `hg status` is to list unknown files
|
|
|
|
|
264
|
but not ignored files.
|
|
|
|
|
265
|
In this case, it matters for the `readdir`-skipping optimization
|
|
|
|
|
266
|
if a given file used to be ignored but became unknown
|
|
|
|
|
267
|
because `.hgignore` changed.
|
|
|
|
|
268
|
To detect the possibility of such a change,
|
|
|
|
|
269
|
the tree metadata contains an optional hash of all ignore patterns.
|
|
|
|
|
270
|
|
|
|
|
|
271
|
We define:
|
|
|
|
|
272
|
|
|
|
|
|
273
|
* "Root" ignore files as:
|
|
|
|
|
274
|
|
|
|
|
|
275
|
- `.hgignore` at the root of the repository if it exists
|
|
|
|
|
276
|
- And all files from `ui.ignore.*` config.
|
|
|
|
|
277
|
|
|
|
|
|
278
|
This set of files is sorted by the string representation of their path.
|
|
|
|
|
279
|
|
|
|
|
|
280
|
* The "expanded contents" of an ignore files is the byte string made
|
|
|
|
|
281
|
by the concatenation of its contents followed by the "expanded contents"
|
|
|
|
|
282
|
of other files included with `include:` or `subinclude:` directives,
|
|
|
|
|
283
|
in inclusion order. This definition is recursive, as included files can
|
|
|
|
|
284
|
themselves include more files.
|
|
|
|
|
285
|
|
|
|
|
|
286
|
This hash is defined as the SHA-1 of the concatenation (in sorted
|
|
|
|
|
287
|
order) of the "expanded contents" of each "root" ignore file.
|
|
|
|
|
288
|
(Note that computing this does not require actually concatenating byte ranges into
|
|
|
|
|
289
|
contiguous memory.
|
|
|
|
|
290
|
Instead a SHA-1 hasher object can be created and fed separate byte ranges one by
|
|
|
|
|
291
|
one.)
|
|
|
|
|
292
|
|
|
|
|
|
293
|
The data file format
|
|
|
|
|
294
|
--------------------
|
|
|
|
|
295
|
|
|
|
|
|
296
|
This is implemented in `rust/hg-core/src/dirstate_tree/on_disk.rs`
|
|
|
|
|
297
|
and `mercurial/dirstateutils/v2.py`.
|
|
|
|
|
298
|
|
|
|
|
|
299
|
The data file contains two types of data: paths and nodes.
|
|
|
|
|
300
|
|
|
|
|
|
301
|
Paths and nodes can be organized in any order in the file, except that sibling
|
|
|
|
|
302
|
nodes must be next to each other and sorted by their path. Contiguity lets
|
|
|
|
|
303
|
the parent refer to them all by their count with a single pseudo-pointer,
|
|
|
|
|
304
|
instead of storing one pseudo-pointer per child node. Sorting allows using
|
|
|
|
|
305
|
binary seach to find a child node with a given name in `O(log(n))` byte ranges
|
|
|
|
|
306
|
comparisons.
|
|
|
|
|
307
|
|
|
|
|
|
308
|
The current implemention writes paths and child node before a given node
|
|
|
|
|
309
|
for ease of figuring out the value of pseudo-pointers by the time the are to be
|
|
|
|
|
310
|
written, but this is not an obligation and readers must not rely on it.
|
|
|
|
|
311
|
|
|
|
|
|
312
|
A path is stored as a byte string anywhere in the file, without delimiter.
|
|
|
|
|
313
|
It is refered to by one or more node by a pseudo-pointer to its start, and its
|
|
|
|
|
314
|
length in bytes. Since there is no delimiter,
|
|
|
|
|
315
|
when a path is a substring of another the same bytes could be reused,
|
|
|
|
|
316
|
although the implementation does not exploit this as of this writing.
|
|
|
|
|
317
|
|
|
|
|
|
318
|
A node is stored on 43 bytes with components at fixed offsets. Paths and
|
|
|
|
|
319
|
child nodes relevant to a node are stored externally and referenced though
|
|
|
|
|
320
|
pseudo-pointers.
|
|
|
|
|
321
|
|
|
|
|
|
322
|
All integers are stored in big-endian. All pseudo-pointers are 32-bit integers
|
|
|
|
|
323
|
counting bytes from the start of the data file. Path lengths and positions
|
|
|
|
|
324
|
are 16-bit integers, also counted in bytes.
|
|
|
|
|
325
|
|
|
|
|
|
326
|
Node components are:
|
|
|
|
|
327
|
|
|
|
|
|
328
|
* Offset 0:
|
|
|
|
|
329
|
Pseudo-pointer to the full path of this node,
|
|
|
|
|
330
|
from the working directory root.
|
|
|
|
|
331
|
|
|
|
|
|
332
|
* Offset 4:
|
|
|
|
|
333
|
Length of the full path.
|
|
|
|
|
334
|
|
|
|
|
|
335
|
* Offset 6:
|
|
|
|
|
336
|
Position of the last `/` path separator within the full path,
|
|
|
|
|
337
|
in bytes from the start of the full path,
|
|
|
|
|
338
|
or zero if there isn’t one.
|
|
|
|
|
339
|
The part of the full path after this position is the "base name".
|
|
|
|
|
340
|
Since sibling nodes have the same parent, only their base name vary
|
|
|
|
|
341
|
and needs to be considered when doing binary search to find a given path.
|
|
|
|
|
342
|
|
|
|
|
|
343
|
* Offset 8:
|
|
|
|
|
344
|
Pseudo-pointer to the "copy source" path for this node,
|
|
|
|
|
345
|
or zero if there is no copy source.
|
|
|
|
|
346
|
|
|
|
|
|
347
|
* Offset 12:
|
|
|
|
|
348
|
Length of the copy source path, or zero if there isn’t one.
|
|
|
|
|
349
|
|
|
|
|
|
350
|
* Offset 14:
|
|
|
|
|
351
|
Pseudo-pointer to the start of child nodes.
|
|
|
|
|
352
|
|
|
|
|
|
353
|
* Offset 18:
|
|
|
|
|
354
|
Number of child nodes, as a 32-bit integer.
|
|
|
|
|
355
|
They occupy 43 times this number of bytes
|
|
|
|
|
356
|
(not counting space for paths, and further descendants).
|
|
|
|
|
357
|
|
|
|
|
|
358
|
* Offset 22:
|
|
|
|
|
359
|
Number as a 32-bit integer of descendant nodes in this subtree,
|
|
|
|
|
360
|
not including this node itself,
|
|
|
|
|
361
|
that "have a dirstate entry".
|
|
|
|
|
362
|
Those nodes represent files that would be present at all in `dirstate-v1`.
|
|
|
|
|
363
|
This is typically less than the total number of descendants.
|
|
|
|
|
364
|
This counter is used to implement `has_dir`.
|
|
|
|
|
365
|
|
|
|
|
|
366
|
* Offset 26:
|
|
|
|
|
367
|
Number as a 32-bit integer of descendant nodes in this subtree,
|
|
|
|
|
368
|
not including this node itself,
|
|
|
|
|
369
|
that represent files tracked in the working directory.
|
|
|
|
|
370
|
(For example, `hg rm` makes a file untracked.)
|
|
|
|
|
371
|
This counter is used to implement `has_tracked_dir`.
|
|
|
|
|
372
|
|
|
|
|
|
373
|
* Offset 30 and more:
|
|
|
|
|
374
|
**TODO:** docs not written yet
|
|
|
|
|
375
|
as this part of the format might be changing soon.
|