Changegroups are representations of repository revlog data, specifically
the changelog data, root/flat manifest data, treemanifest data, and
filelogs.

There are 4 versions of changegroups: ``1``, ``2``, ``3`` and ``4``. At a
high level, versions ``1`` and ``2`` are almost exactly the same, with the
only difference being an additional item in the *delta header*. Version
``3`` adds support for storage flags in the *delta header* and optionally
exchanging treemanifests (enabled by setting an option on the
``changegroup`` part in the bundle2). Version ``4`` adds support for exchanging
sidedata (additional revision metadata not part of the digest).

When not exchanging treemanifests, a changegroup consists of 3 logical
segments::

    +---------------------------------+
    |           |          |          |
    | changeset | manifest | filelogs |
    |           |          |          |
    |           |          |          |
    +---------------------------------+

When exchanging treemanifests, there are 4 logical segments::

    +-------------------------------------------------+
    |           |          |               |          |
    | changeset |   root   | treemanifests | filelogs |
    |           | manifest |               |          |
    |           |          |               |          |
    +-------------------------------------------------+

The principal building block of each segment is a *chunk*. A *chunk*
is a framed piece of data::

    +---------------------------------------+
    |           |                           |
    |  length   |           data            |
    | (4 bytes) |   (<length - 4> bytes)    |
    |           |                           |
    +---------------------------------------+

All integers are big-endian signed integers. Each chunk starts with a 32-bit
integer indicating the length of the entire chunk (including the length field
itself).

There is a special case chunk that has a value of 0 for the length
(``0x00000000``). We call this an *empty chunk*.

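The framing rules above can be sketched in a few lines of Python. This is a
minimal illustration, not Mercurial's own reader: the names ``read_chunk``,
``write_chunk`` and ``EMPTY_CHUNK`` are hypothetical, and rejecting a non-zero
length below 4 is an assumption based on the fact that such a value cannot
cover its own length field.

```python
import io
import struct

# The empty chunk: a bare length field holding 0.
EMPTY_CHUNK = struct.pack(">i", 0)


def write_chunk(data):
    """Frame ``data`` as a chunk; the length counts its own 4 bytes."""
    return struct.pack(">i", len(data) + 4) + data


def read_chunk(stream):
    """Return the next chunk's data, or None when the empty chunk is read."""
    raw = stream.read(4)
    if len(raw) != 4:
        raise EOFError("stream ended inside a chunk length field")
    (length,) = struct.unpack(">i", raw)  # big-endian signed 32-bit integer
    if length == 0:
        return None
    if length < 4:
        # Assumption: a length that cannot cover its own field is malformed.
        raise ValueError("invalid chunk length: %d" % length)
    data = stream.read(length - 4)
    if len(data) != length - 4:
        raise EOFError("stream ended inside chunk data")
    return data


stream = io.BytesIO(write_chunk(b"hello") + EMPTY_CHUNK)
first = read_chunk(stream)   # the data of the framed chunk
second = read_chunk(stream)  # None, because the empty chunk was read
```

Returning ``None`` for the empty chunk keeps it distinguishable from a chunk
whose data happens to be zero bytes long.
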
Delta Groups
============

A *delta group* expresses the content of a revlog as a series of deltas,
or patches against previous revisions.

Delta groups consist of 0 or more *chunks* followed by the *empty chunk*
to signal the end of the delta group::

    +------------------------------------------------------------------------+
    |                |             |               |             |           |
    | chunk0 length  | chunk0 data | chunk1 length | chunk1 data |    0x0    |
    |   (4 bytes)    |  (various)  |   (4 bytes)   |  (various)  | (4 bytes) |
    |                |             |               |             |           |
    +------------------------------------------------------------------------+

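As a rough sketch of how a reader might consume a delta group, the generator
below yields chunk data until it meets the *empty chunk*. The names
``iter_delta_group`` and ``frame`` are hypothetical, and error handling for
truncated streams is omitted for brevity.

```python
import io
import struct


def iter_delta_group(stream):
    """Yield the data of each chunk until the empty chunk ends the group."""
    while True:
        (length,) = struct.unpack(">i", stream.read(4))
        if length == 0:
            return  # empty chunk: end of this delta group
        yield stream.read(length - 4)


def frame(data):
    """Helper for the example: wrap ``data`` in a chunk."""
    return struct.pack(">i", len(data) + 4) + data


end = struct.pack(">i", 0)
# Two delta groups back to back: the first holds two chunks, the second none.
stream = io.BytesIO(frame(b"chunk0") + frame(b"chunk1") + end + end)
first_group = list(iter_delta_group(stream))
second_group = list(iter_delta_group(stream))
```

Because the generator stops right after the terminating empty chunk, the
stream is left positioned at the start of whatever follows the delta group.
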
Each *chunk*'s data consists of the following::

    +---------------------------------------+
    |                      |                |
    |     delta header     |   delta data   |
    | (various by version) |   (various)    |
    |                      |                |
    +---------------------------------------+

The *delta data* is a series of *delta*s that describe a diff from an existing
entry (either one that the recipient already has, or one previously specified
in the bundle/changegroup).

The *delta header* differs between versions ``1``, ``2``, ``3`` and ``4``
of the changegroup format.

Version 1 (headerlen=80)::

    +------------------------------------------------------+
    |            |             |             |             |
    |    node    |   p1 node   |   p2 node   |  link node  |
    | (20 bytes) |  (20 bytes) |  (20 bytes) |  (20 bytes) |
    |            |             |             |             |
    +------------------------------------------------------+

Version 2 (headerlen=100)::

    +------------------------------------------------------------------+
    |            |             |             |            |            |
    |    node    |   p1 node   |   p2 node   | base node  | link node  |
    | (20 bytes) |  (20 bytes) |  (20 bytes) | (20 bytes) | (20 bytes) |
    |            |             |             |            |            |
    +------------------------------------------------------------------+

Version 3 (headerlen=102)::

    +------------------------------------------------------------------------------+
    |            |             |             |            |            |           |
    |    node    |   p1 node   |   p2 node   | base node  | link node  |   flags   |
    | (20 bytes) |  (20 bytes) |  (20 bytes) | (20 bytes) | (20 bytes) | (2 bytes) |
    |            |             |             |            |            |           |
    +------------------------------------------------------------------------------+

Version 4 (headerlen=103)::

    +----------+------------------------------------------------------------------------------+
    |          |            |             |             |            |            |           |
    |  pflags  |    node    |   p1 node   |   p2 node   | base node  | link node  |   flags   |
    | (1 byte) | (20 bytes) |  (20 bytes) |  (20 bytes) | (20 bytes) | (20 bytes) | (2 bytes) |
    |          |            |             |             |            |            |           |
    +----------+------------------------------------------------------------------------------+

The *delta data* consists of ``chunklen - 4 - headerlen`` bytes, which contain a
series of *delta*s, densely packed (no separators). These deltas describe a diff
from an existing entry (either that the recipient already has, or previously
specified in the bundle/changegroup). The format is described more fully in
``hg help internals.bdiff``, but briefly::

    +---------------------------------------------------------------+
    |              |            |            |                      |
    | start offset | end offset | new length |        content       |
    |  (4 bytes)   |  (4 bytes) |  (4 bytes) | (<new length> bytes) |
    |              |            |            |                      |
    +---------------------------------------------------------------+

Please note that the length field in the delta data does *not* include itself.

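To make the delta layout concrete, here is a minimal sketch that applies
densely packed deltas to a base text. It assumes that offsets refer to the
base text and that the deltas are ordered by start offset and do not overlap;
``hg help internals.bdiff`` remains the authoritative description. The names
``DELTA_HUNK`` and ``apply_deltas`` are hypothetical.

```python
import struct

DELTA_HUNK = struct.Struct(">iii")  # start offset, end offset, new length


def apply_deltas(base, delta_data):
    """Apply densely packed deltas to ``base`` and return the new text."""
    pieces = []
    base_pos = 0  # how much of ``base`` has been consumed so far
    pos = 0       # read position inside ``delta_data``
    while pos < len(delta_data):
        start, end, new_length = DELTA_HUNK.unpack_from(delta_data, pos)
        pos += DELTA_HUNK.size  # new_length does not count these 12 bytes
        pieces.append(base[base_pos:start])              # unchanged bytes
        pieces.append(delta_data[pos:pos + new_length])  # replacement content
        pos += new_length
        base_pos = end  # skip the replaced range base[start:end]
    pieces.append(base[base_pos:])
    return b"".join(pieces)


base = b"hello world"
delta = (
    DELTA_HUNK.pack(0, 0, 2) + b">>"        # insert two bytes at offset 0
    + DELTA_HUNK.pack(6, 11, 5) + b"there"  # replace b"world"
)
result = apply_deltas(base, delta)
```

An empty ``delta_data`` leaves the base text unchanged, which matches a
revision whose content is identical to its delta base.
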
In version 1, the delta is always applied against the previous node from
the changegroup, or against the first parent if this is the first entry in
the changegroup.

In version 2 and up, the delta base node is encoded in the entry in the
changegroup. This allows the delta to be expressed against any parent,
which can result in smaller deltas and more efficient encoding of data.

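The header layouts and the delta base rule can be sketched with Python's
``struct`` module. This is an illustration under stated assumptions, not
Mercurial's implementation: ``DELTA_HEADERS`` and ``parse_delta_header`` are
hypothetical names, flag fields are unpacked as unsigned values so that the
highest flag bit fits, and for version 4 the sketch assumes the *pflags* byte
comes first, as the description of that field states.

```python
import struct

# Header layouts per changegroup version. Flags are unpacked as unsigned
# ("H", "B") so that bit 15 of the 2-byte flags field is representable.
DELTA_HEADERS = {
    1: struct.Struct(">20s20s20s20s"),       # node, p1, p2, link
    2: struct.Struct(">20s20s20s20s20s"),    # node, p1, p2, base, link
    3: struct.Struct(">20s20s20s20s20sH"),   # as version 2, plus flags
    4: struct.Struct(">B20s20s20s20s20sH"),  # pflags first (assumption)
}


def parse_delta_header(version, chunk_data, prev_node=None):
    """Split chunk data into a header dict and the remaining delta data."""
    layout = DELTA_HEADERS[version]
    fields = layout.unpack_from(chunk_data)
    flags = pflags = 0
    if version == 1:
        node, p1, p2, link = fields
        # No base field: use the previous node, or p1 for the first entry.
        base = p1 if prev_node is None else prev_node
    elif version == 2:
        node, p1, p2, base, link = fields
    elif version == 3:
        node, p1, p2, base, link, flags = fields
    else:
        pflags, node, p1, p2, base, link, flags = fields
    header = {"node": node, "p1": p1, "p2": p2, "base": base,
              "link": link, "flags": flags, "pflags": pflags}
    return header, chunk_data[layout.size:]


n, p1, p2, link = b"n" * 20, b"a" * 20, b"b" * 20, b"l" * 20
header, delta_data = parse_delta_header(1, n + p1 + p2 + link + b"delta")
```

A caller iterating over a version 1 delta group would pass the ``node`` of the
previous entry as ``prev_node`` for every chunk after the first.
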
The *flags* field holds bitwise flags affecting the processing of revision
data. The following flags are defined:

32768
   Censored revision. The revision's fulltext has been replaced by censor
   metadata. May only occur on file revisions.

16384
   Ellipsis revision. Revision hash does not match data (likely due to rewritten
   parents).

8192
   Externally stored. The revision fulltext contains ``key:value`` ``\n``
   delimited metadata defining an object stored elsewhere. Used by the LFS
   extension.

4096
   Contains copy information. This revision changes files in a way that could
   affect copy tracing. This does *not* affect changegroup handling, but is
   relevant for other parts of Mercurial.

For historical reasons, the integer values are identical to revlog version 1
per-revision storage flags and correspond to bits being set in this 2-byte
field. Bits were allocated starting from the most-significant bit, hence the
reverse ordering and allocation of these flags.

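The allocation from the most-significant bit downward is easy to see when the
values are written as shifts. The sketch below decodes a flags value into
readable labels; the constant names and labels are illustrative only and are
not taken from Mercurial's source.

```python
# Bit values as listed above; the constant names are illustrative only.
FLAG_CENSORED = 1 << 15           # 32768
FLAG_ELLIPSIS = 1 << 14           # 16384
FLAG_EXTERNALLY_STORED = 1 << 13  # 8192
FLAG_HAS_COPY_INFO = 1 << 12      # 4096

FLAG_NAMES = {
    FLAG_CENSORED: "censored",
    FLAG_ELLIPSIS: "ellipsis",
    FLAG_EXTERNALLY_STORED: "externally stored",
    FLAG_HAS_COPY_INFO: "contains copy information",
}
KNOWN_FLAGS = sum(FLAG_NAMES)  # all defined bits combined


def describe_flags(flags):
    """Return (labels of the known flags that are set, unknown bits)."""
    names = [name for bit, name in FLAG_NAMES.items() if flags & bit]
    return names, flags & ~KNOWN_FLAGS


names, unknown = describe_flags(32768 | 8192 | 1)
```

Reporting the unknown bits separately lets a reader decide whether to reject
or ignore flags it does not understand.
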
The *pflags* (protocol flags) field holds bitwise flags affecting the protocol
itself. They are first in the header since they may affect the handling of the
rest of the fields in a future version. The following flag is defined:

1 indicates whether to read a chunk of sidedata (of variable length) right
after the revision flags.

Changeset Segment
=================

The *changeset segment* consists of a single *delta group* holding
changelog data. The *empty chunk* at the end of the *delta group* denotes
the boundary to the *manifest segment*.

Manifest Segment
================

The *manifest segment* consists of a single *delta group* holding manifest
data. If treemanifests are in use, it contains only the manifest for the
root directory of the repository. Otherwise, it contains the entire
manifest data. The *empty chunk* at the end of the *delta group* denotes
the boundary to the next segment (either the *treemanifests segment* or the
*filelogs segment*, depending on version and the request options).

Treemanifests Segment
---------------------

The *treemanifests segment* only exists in changegroup versions ``3`` and ``4``,
and only if the ``treemanifest`` param is part of the bundle2 changegroup part
(it is not possible to use changegroup version 3 or 4 outside of bundle2).
Aside from the filenames in the *treemanifests segment* containing a
trailing ``/`` character, it behaves identically to the *filelogs segment*
(see below). The final sub-segment is followed by an *empty chunk* (logically,
a sub-segment with filename size 0). This denotes the boundary to the
*filelogs segment*.

Filelogs Segment
================

The *filelogs segment* consists of multiple sub-segments, each
corresponding to an individual file whose data is being described::

    +--------------------------------------------------+
    |          |          |          |     |           |
    | filelog0 | filelog1 | filelog2 | ... |    0x0    |
    |          |          |          |     | (4 bytes) |
    |          |          |          |     |           |
    +--------------------------------------------------+

The final filelog sub-segment is followed by an *empty chunk* (logically,
a sub-segment with filename size 0). This denotes the end of the segment
and of the overall changegroup.

Each filelog sub-segment consists of the following::

    +------------------------------------------------------+
    |                 |                      |             |
    | filename length |       filename       | delta group |
    |    (4 bytes)    | (<length - 4> bytes) |  (various)  |
    |                 |                      |             |
    +------------------------------------------------------+

That is, a *chunk* consisting of the filename (not terminated or padded)
followed by N chunks constituting the *delta group* for this file. The
*empty chunk* at the end of each *delta group* denotes the boundary to the
next filelog sub-segment.

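Putting the pieces together, a reader for the *filelogs segment* alternates
between a filename chunk and a delta group until it meets an *empty chunk*
where a filename was expected. The following is a minimal sketch with
hypothetical names (``read_chunk``, ``iter_filelogs``, ``frame``); it keeps
each chunk's data as raw bytes and does not parse delta headers or handle
truncated input.

```python
import io
import struct


def read_chunk(stream):
    """Return the next chunk's data, or None for the empty chunk."""
    (length,) = struct.unpack(">i", stream.read(4))
    if length == 0:
        return None
    return stream.read(length - 4)


def iter_filelogs(stream):
    """Yield (filename, [chunk data, ...]) for each filelog sub-segment."""
    while True:
        filename = read_chunk(stream)
        if filename is None:
            return  # empty chunk: end of the filelogs segment
        chunks = []
        while True:
            chunk = read_chunk(stream)
            if chunk is None:
                break  # empty chunk: end of this file's delta group
            chunks.append(chunk)
        yield filename, chunks


def frame(data):
    """Helper for the example: wrap ``data`` in a chunk."""
    return struct.pack(">i", len(data) + 4) + data


end = struct.pack(">i", 0)
segment = (
    frame(b"a.txt") + frame(b"delta-a0") + frame(b"delta-a1") + end
    + frame(b"dir/b.txt") + frame(b"delta-b0") + end
    + end  # terminates the segment
)
filelogs = list(iter_filelogs(io.BytesIO(segment)))
```

The same loop would apply to the *treemanifests segment*, where the names
yielded carry a trailing ``/``.
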