Changegroups are representations of repository revlog data, specifically
the changelog data, root/flat manifest data, treemanifest data, and
filelogs.

There are 3 versions of changegroups: ``1``, ``2``, and ``3``. At a
high level, versions ``1`` and ``2`` are almost exactly the same, with the
only difference being an additional item in the *delta header*. Version
``3`` adds support for storage flags in the *delta header* and optionally
exchanging treemanifests (enabled by setting an option on the
``changegroup`` part in the bundle2).

Changegroups when not exchanging treemanifests consist of 3 logical
segments::

   +---------------------------------+
   |           |          |          |
   | changeset | manifest | filelogs |
   |           |          |          |
   |           |          |          |
   +---------------------------------+

When exchanging treemanifests, there are 4 logical segments::

   +-------------------------------------------------+
   |           |          |               |          |
   | changeset |   root   | treemanifests | filelogs |
   |           | manifest |               |          |
   |           |          |               |          |
   +-------------------------------------------------+

The principal building block of each segment is a *chunk*. A *chunk*
is a framed piece of data::

   +---------------------------------------+
   |           |                           |
   |  length   |           data            |
   | (4 bytes) |   (<length - 4> bytes)    |
   |           |                           |
   +---------------------------------------+

All integers are big-endian signed integers. Each chunk starts with a 32-bit
integer indicating the length of the entire chunk (including the length field
itself).

There is a special case chunk that has a value of 0 for the length
(``0x00000000``). We call this an *empty chunk*.
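
As an illustration of the framing rule, the following Python sketch builds
and splits chunks held in a byte string. The names ``frame_chunk``,
``unframe_chunk`` and ``EMPTY_CHUNK`` are made up for this example, and
treating lengths 1 through 4 as invalid is an assumption of the sketch rather
than something stated above:

```python
import struct

# Big-endian signed 32-bit length field, as described above.
_LENGTH = struct.Struct(">l")

# The empty chunk is just a length field holding 0 (0x00000000).
EMPTY_CHUNK = _LENGTH.pack(0)


def frame_chunk(payload: bytes) -> bytes:
    """Prefix payload with a length that also counts the 4-byte field."""
    if not payload:
        # Assumption: a zero-byte payload is not framed; use EMPTY_CHUNK.
        raise ValueError("use EMPTY_CHUNK for an empty chunk")
    return _LENGTH.pack(len(payload) + _LENGTH.size) + payload


def unframe_chunk(buf: bytes, offset: int = 0):
    """Return (payload, next_offset); payload is None for the empty chunk."""
    (length,) = _LENGTH.unpack_from(buf, offset)
    if length == 0:
        return None, offset + _LENGTH.size
    if length <= _LENGTH.size:
        # Assumption: lengths 1..4 cannot describe a usable chunk.
        raise ValueError("invalid chunk length: %d" % length)
    end = offset + length
    if end > len(buf):
        raise ValueError("truncated chunk")
    return buf[offset + _LENGTH.size:end], end
```

Framing the 3-byte payload ``abc`` yields a 7-byte chunk whose length field
holds 7, not 3.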

Delta Groups
============

A *delta group* expresses the content of a revlog as a series of deltas,
or patches against previous revisions.

Delta groups consist of 0 or more *chunks* followed by the *empty chunk*
to signal the end of the delta group::

   +------------------------------------------------------------------------+
   |                |             |               |             |           |
   | chunk0 length  | chunk0 data | chunk1 length | chunk1 data |    0x0    |
   |   (4 bytes)    |  (various)  |   (4 bytes)   |  (various)  | (4 bytes) |
   |                |             |               |             |           |
   +------------------------------------------------------------------------+

Each *chunk*'s data consists of the following::

   +---------------------------------------+
   |                        |              |
   |     delta header       |  delta data  |
   |  (various by version)  |  (various)   |
   |                        |              |
   +---------------------------------------+

The *delta data* is a series of *delta*s that describe a diff from an existing
entry (either that the recipient already has, or previously specified in the
bundle/changegroup).
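
Reading a delta group from a stream can be sketched as a loop that stops at
the empty chunk. The helper names ``read_exactly`` and ``iter_delta_group``
are hypothetical, and the handling of short reads and of lengths 1 through 4
is an assumption of this example:

```python
import io
import struct


def read_exactly(stream, n):
    """Read n bytes or fail; a short read means the stream is truncated."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("stream ended unexpectedly")
    return data


def iter_delta_group(stream):
    """Yield the data of each chunk, consuming the terminating empty chunk."""
    while True:
        (length,) = struct.unpack(">l", read_exactly(stream, 4))
        if length == 0:
            return  # empty chunk: end of this delta group
        if length <= 4:
            raise ValueError("invalid chunk length: %d" % length)
        # The length counts its own 4 bytes, so the data is 4 bytes shorter.
        yield read_exactly(stream, length - 4)


stream = io.BytesIO(
    b"\x00\x00\x00\x06hi"    # chunk0: length 6, data b"hi"
    b"\x00\x00\x00\x07abc"   # chunk1: length 7, data b"abc"
    b"\x00\x00\x00\x00"      # empty chunk
    b"rest"                  # whatever follows the delta group
)
print(list(iter_delta_group(stream)))  # [b'hi', b'abc']
print(stream.read())                   # b'rest'
```

Because the generator consumes the empty chunk itself, the stream is left
positioned at the start of whatever follows the delta group.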

The *delta header* is different between versions ``1``, ``2``, and
``3`` of the changegroup format.

Version 1 (headerlen=80)::

   +------------------------------------------------------+
   |            |             |             |             |
   |    node    |   p1 node   |   p2 node   |  link node  |
   | (20 bytes) |  (20 bytes) |  (20 bytes) |  (20 bytes) |
   |            |             |             |             |
   +------------------------------------------------------+

Version 2 (headerlen=100)::

   +------------------------------------------------------------------+
   |            |             |             |            |            |
   |    node    |   p1 node   |   p2 node   | base node  | link node  |
   | (20 bytes) |  (20 bytes) |  (20 bytes) | (20 bytes) | (20 bytes) |
   |            |             |             |            |            |
   +------------------------------------------------------------------+

Version 3 (headerlen=102)::

   +------------------------------------------------------------------------------+
   |            |             |             |            |            |           |
   |    node    |   p1 node   |   p2 node   | base node  | link node  |   flags   |
   | (20 bytes) |  (20 bytes) |  (20 bytes) | (20 bytes) | (20 bytes) | (2 bytes) |
   |            |             |             |            |            |           |
   +------------------------------------------------------------------------------+
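
The three header layouts map naturally onto ``struct`` format strings. The
sketch below is one way to split a chunk's data into header and delta data;
``DeltaHeader``, ``HEADER_STRUCTS`` and ``split_chunk`` are illustrative
names. Reading *flags* as an unsigned 16-bit value is an assumption made here
so that a flag value of 32768 is representable:

```python
import struct
from typing import NamedTuple, Optional


class DeltaHeader(NamedTuple):
    node: bytes
    p1: bytes
    p2: bytes
    base: Optional[bytes]  # None in version 1, where the base is implicit
    linknode: bytes
    flags: int             # 0 for versions 1 and 2, which have no flags field


# One struct per changegroup version; sizes are 80, 100 and 102 bytes.
HEADER_STRUCTS = {
    1: struct.Struct(">20s20s20s20s"),
    2: struct.Struct(">20s20s20s20s20s"),
    3: struct.Struct(">20s20s20s20s20sH"),
}


def split_chunk(version, chunk_data):
    """Split one delta group chunk's data into (DeltaHeader, delta_data)."""
    fmt = HEADER_STRUCTS[version]
    fields = fmt.unpack_from(chunk_data)
    delta_data = chunk_data[fmt.size:]
    if version == 1:
        node, p1, p2, linknode = fields
        return DeltaHeader(node, p1, p2, None, linknode, 0), delta_data
    if version == 2:
        node, p1, p2, base, linknode = fields
        return DeltaHeader(node, p1, p2, base, linknode, 0), delta_data
    node, p1, p2, base, linknode, flags = fields
    return DeltaHeader(node, p1, p2, base, linknode, flags), delta_data
```

Note that the position of the link node differs between version 1 (fourth
field) and later versions (fifth field), which is why the fields are unpacked
per version instead of by a shared index.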

The *delta data* consists of ``chunklen - 4 - headerlen`` bytes, which contain a
series of *delta*s, densely packed (no separators). These deltas describe a diff
from an existing entry (either that the recipient already has, or previously
specified in the bundle/changegroup). The format is described more fully in
``hg help internals.bdiff``, but briefly::

   +---------------------------------------------------------------+
   |              |            |            |                      |
   | start offset | end offset | new length |        content       |
   |  (4 bytes)   |  (4 bytes) |  (4 bytes) | (<new length> bytes) |
   |              |            |            |                      |
   +---------------------------------------------------------------+

Please note that the length field in the delta data does *not* include itself.
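
To make the layout concrete, here is a small Python sketch that walks the
densely packed deltas and applies them to a base text. ``iter_hunks`` and
``apply_delta`` are hypothetical names. The sketch assumes that offsets refer
to positions in the base text and that deltas appear in increasing,
non-overlapping order; ``hg help internals.bdiff`` remains the authoritative
description:

```python
import struct

# start offset, end offset, new length: three big-endian signed 32-bit ints.
_HUNK = struct.Struct(">lll")


def iter_hunks(delta_data):
    """Yield (start, end, content) for each delta, with no separators."""
    pos = 0
    while pos < len(delta_data):
        start, end, new_len = _HUNK.unpack_from(delta_data, pos)
        pos += _HUNK.size
        # new_len counts only the content bytes, not the 12-byte prefix.
        content = delta_data[pos:pos + new_len]
        if len(content) != new_len:
            raise ValueError("truncated delta content")
        pos += new_len
        yield start, end, content


def apply_delta(base, delta_data):
    """Replace base[start:end] with content for every delta, in order."""
    out = []
    last = 0
    for start, end, content in iter_hunks(delta_data):
        if start < last or end < start or end > len(base):
            raise ValueError("delta out of order or out of range")
        out.append(base[last:start])  # unchanged bytes before this delta
        out.append(content)
        last = end
    out.append(base[last:])           # unchanged tail
    return b"".join(out)


base = b"hello world"
delta = (
    struct.pack(">lll", 0, 5, 3) + b"bye"    # replace b"hello" with b"bye"
    + struct.pack(">lll", 11, 11, 1) + b"!"  # insert b"!" at the end
)
print(apply_delta(base, delta))  # b'bye world!'
```

A delta whose start and end offsets are equal is a pure insertion, and one
whose new length is 0 is a pure deletion.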

In version 1, the delta is always applied against the previous node from
the changegroup or the first parent if this is the first entry in the
changegroup.

In version 2 and up, the delta base node is encoded in the entry in the
changegroup. This allows the delta to be expressed against any parent,
which can result in smaller deltas and more efficient encoding of data.
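
The base selection rule can be written down in a few lines. In this sketch
``iter_delta_bases`` is a hypothetical helper that takes the headers of one
delta group in stream order, reduced to ``(node, p1, base)`` tuples, and
reports which node each delta applies against:

```python
def iter_delta_bases(version, headers):
    """Yield (node, delta_base) for headers given as (node, p1, base).

    In version 1 the base field is ignored (the header has none); in
    version 2 and up it is taken directly from the header.
    """
    previous = None
    for node, p1, base in headers:
        if version == 1:
            # First entry: first parent. Later entries: the previous node.
            yield node, (p1 if previous is None else previous)
        else:
            yield node, base
        previous = node
```

For example, with version 1 headers for nodes ``n1``, ``n2`` and ``n3``, the
deltas apply against ``n1``'s first parent, then ``n1``, then ``n2``,
regardless of what the parents of ``n2`` and ``n3`` are.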

The *flags* field holds bitwise flags affecting the processing of revision
data. The following flags are defined:

32768
   Censored revision. The revision's fulltext has been replaced by censor
   metadata. May only occur on file revisions.

16384
   Ellipsis revision. Revision hash does not match data (likely due to rewritten
   parents).

8192
   Externally stored. The revision fulltext contains ``key:value`` ``\n``
   delimited metadata defining an object stored elsewhere. Used by the LFS
   extension.

For historical reasons, the integer values are identical to revlog version 1
per-revision storage flags and correspond to bits being set in this 2-byte
field. Bits were allocated starting from the most-significant bit, hence the
reverse ordering and allocation of these flags.
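
The relationship between the integer values and the bits of the 2-byte field
can be checked with a short sketch. The constant and function names below are
invented for this example; only the numeric values come from the list above:

```python
# Hypothetical names; bits are allocated from the most-significant end.
FLAG_CENSORED = 1 << 15   # 32768
FLAG_ELLIPSIS = 1 << 14   # 16384
FLAG_EXTSTORED = 1 << 13  # 8192

FLAG_NAMES = {
    FLAG_CENSORED: "censored",
    FLAG_ELLIPSIS: "ellipsis",
    FLAG_EXTSTORED: "externally stored",
}
KNOWN_FLAGS = FLAG_CENSORED | FLAG_ELLIPSIS | FLAG_EXTSTORED


def describe_flags(flags):
    """Return (names of the set flags, leftover bits not defined above)."""
    names = [name for bit, name in FLAG_NAMES.items() if flags & bit]
    return names, flags & ~KNOWN_FLAGS & 0xFFFF
```

Returning the leftover bits separately, instead of raising an error, is a
design choice of this sketch; a real consumer may prefer to reject flags it
does not understand.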

Changeset Segment
=================

The *changeset segment* consists of a single *delta group* holding
changelog data. The *empty chunk* at the end of the *delta group* denotes
the boundary to the *manifest segment*.

Manifest Segment
================

The *manifest segment* consists of a single *delta group* holding manifest
data. If treemanifests are in use, it contains only the manifest for the
root directory of the repository. Otherwise, it contains the entire
manifest data. The *empty chunk* at the end of the *delta group* denotes
the boundary to the next segment (either the *treemanifests segment* or the
*filelogs segment*, depending on version and the request options).

Treemanifests Segment
---------------------

The *treemanifests segment* only exists in changegroup version ``3``, and
only if the 'treemanifest' param is part of the bundle2 changegroup part
(it is not possible to use changegroup version 3 outside of bundle2).
Aside from the filenames in the *treemanifests segment* containing a
trailing ``/`` character, it behaves identically to the *filelogs segment*
(see below). The final sub-segment is followed by an *empty chunk* (logically,
a sub-segment with filename size 0). This denotes the boundary to the
*filelogs segment*.

Filelogs Segment
================

The *filelogs segment* consists of multiple sub-segments, each
corresponding to an individual file whose data is being described::

   +--------------------------------------------------+
   |          |          |          |     |           |
   | filelog0 | filelog1 | filelog2 | ... |    0x0    |
   |          |          |          |     | (4 bytes) |
   |          |          |          |     |           |
   +--------------------------------------------------+

The final filelog sub-segment is followed by an *empty chunk* (logically,
a sub-segment with filename size 0). This denotes the end of the segment
and of the overall changegroup.

Each filelog sub-segment consists of the following::

   +------------------------------------------------------+
   |                 |                      |             |
   | filename length |       filename       | delta group |
   |    (4 bytes)    | (<length - 4> bytes) |  (various)  |
   |                 |                      |             |
   +------------------------------------------------------+

That is, a *chunk* consisting of the filename (not terminated or padded)
followed by N chunks constituting the *delta group* for this file. The
*empty chunk* at the end of each *delta group* denotes the boundary to the
next filelog sub-segment.
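
Putting the segments together, a whole changegroup can be walked with a few
small functions. This is a minimal sketch that only separates the raw chunks
and does not interpret delta headers. All function names and the shape of the
returned dictionary are assumptions of the example, as is the
``treemanifests`` argument, which stands in for the bundle2 part parameter:

```python
import io
import struct


def _read_chunk(stream):
    """Return one chunk's data, or None for the empty chunk."""
    raw = stream.read(4)
    if len(raw) != 4:
        raise EOFError("stream ended inside a length field")
    (length,) = struct.unpack(">l", raw)
    if length == 0:
        return None
    if length <= 4:
        raise ValueError("invalid chunk length: %d" % length)
    data = stream.read(length - 4)
    if len(data) != length - 4:
        raise EOFError("truncated chunk")
    return data


def _read_delta_group(stream):
    """Collect chunks up to (and consuming) the terminating empty chunk."""
    chunks = []
    while True:
        chunk = _read_chunk(stream)
        if chunk is None:
            return chunks
        chunks.append(chunk)


def _read_file_segment(stream):
    """Read filename chunk + delta group pairs until an empty chunk."""
    files = {}
    while True:
        name = _read_chunk(stream)
        if name is None:
            return files  # "sub-segment with filename size 0"
        files[name] = _read_delta_group(stream)


def read_changegroup(stream, treemanifests=False):
    """Split a changegroup stream into its logical segments."""
    result = {
        "changesets": _read_delta_group(stream),
        "manifests": _read_delta_group(stream),
    }
    if treemanifests:
        # Same layout as the filelogs segment; names end with "/".
        result["treemanifests"] = _read_file_segment(stream)
    result["filelogs"] = _read_file_segment(stream)
    return result
```

The same ``_read_file_segment`` helper serves both the treemanifests and the
filelogs segment, reflecting the statement above that the two behave
identically apart from the trailing ``/`` in directory names.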