Changegroups are representations of repository revlog data, specifically
the changelog data, root/flat manifest data, treemanifest data, and
filelogs.

There are 4 versions of changegroups: ``1``, ``2``, ``3`` and ``4``. At a
high level, versions ``1`` and ``2`` are almost exactly the same, with the
only difference being an additional item in the *delta header*. Version
``3`` adds support for storage flags in the *delta header* and optionally
exchanging treemanifests (enabled by setting an option on the
``changegroup`` part in the bundle2). Version ``4`` adds support for exchanging
sidedata (additional revision metadata not part of the digest).

Changegroups when not exchanging treemanifests consist of 3 logical
segments::

    +---------------------------------+
    |           |          |          |
    | changeset | manifest | filelogs |
    |           |          |          |
    |           |          |          |
    +---------------------------------+

When exchanging treemanifests, there are 4 logical segments::

    +-------------------------------------------------+
    |           |          |               |          |
    | changeset |   root   | treemanifests | filelogs |
    |           | manifest |               |          |
    |           |          |               |          |
    +-------------------------------------------------+

The principal building block of each segment is a *chunk*. A *chunk*
is a framed piece of data::

    +---------------------------------------+
    |           |                           |
    |  length   |           data            |
    | (4 bytes) |   (<length - 4> bytes)    |
    |           |                           |
    +---------------------------------------+

All integers are big-endian signed integers. Each chunk starts with a 32-bit
integer indicating the length of the entire chunk (including the length field
itself).
There is a special case chunk that has a value of 0 for the length
(``0x00000000``). We call this an *empty chunk*.
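
The framing described above can be sketched in a few lines of Python. This is a minimal illustration rather than Mercurial's own reader: the name ``read_chunk`` is hypothetical, and rejecting lengths of 1 to 4 is a choice made by this sketch, not something the text above states.

```python
import io
import struct


def read_chunk(stream):
    """Return the data of the next chunk, or None for the empty chunk.

    Hypothetical helper: the name and the error handling are assumptions.
    """
    raw = stream.read(4)
    if len(raw) < 4:
        raise EOFError("stream ended inside a chunk length field")
    # The length is a big-endian signed 32-bit integer that counts itself.
    (length,) = struct.unpack(">l", raw)
    if length == 0:
        return None  # the empty chunk
    if length <= 4:
        # Assumption of this sketch: such lengths are treated as malformed.
        raise ValueError("invalid chunk length %d" % length)
    data = stream.read(length - 4)
    if len(data) != length - 4:
        raise EOFError("stream ended inside chunk data")
    return data


# Usage: one 5-byte payload (length field 9) followed by the empty chunk.
sample = io.BytesIO(struct.pack(">l", 9) + b"hello" + struct.pack(">l", 0))
first = read_chunk(sample)
second = read_chunk(sample)
```

Returning ``None`` for the empty chunk keeps it distinguishable from any real payload, which matters because the empty chunk is used as a terminator throughout the format.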
Delta Groups
============
A *delta group* expresses the content of a revlog as a series of deltas,
or patches against previous revisions.
Delta groups consist of 0 or more *chunks* followed by the *empty chunk*
to signal the end of the delta group::

    +-----------------------------------------------------------------------+
    |               |             |               |             |           |
    | chunk0 length | chunk0 data | chunk1 length | chunk1 data |    0x0    |
    |   (4 bytes)   |  (various)  |   (4 bytes)   |  (various)  | (4 bytes) |
    |               |             |               |             |           |
    +-----------------------------------------------------------------------+

Each *chunk*'s data consists of the following::

    +---------------------------------------+
    |                      |                |
    |     delta header     |   delta data   |
    | (various by version) |   (various)    |
    |                      |                |
    +---------------------------------------+

The *delta data* is a series of *delta*s that describe a diff from an existing
entry (either that the recipient already has, or previously specified in the
bundle/changegroup).
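
As a rough illustration of how a delta group is delimited, the sketch below yields the data of each chunk until it meets the empty chunk. The names ``frame`` and ``iter_delta_group`` are hypothetical, and error handling is left out to keep the example short.

```python
import io
import struct


def frame(payload):
    """Frame a payload as a chunk: 4-byte length (counting itself) + data."""
    return struct.pack(">l", len(payload) + 4) + payload


def iter_delta_group(stream):
    """Yield the data of each chunk until the empty chunk is reached."""
    while True:
        (length,) = struct.unpack(">l", stream.read(4))
        if length == 0:
            return  # the empty chunk ends the delta group
        yield stream.read(length - 4)


# Usage: a group of two chunks, followed by data that belongs to what
# comes next in the stream and must not be consumed.
EMPTY = struct.pack(">l", 0)
buf = io.BytesIO(frame(b"aa") + frame(b"bbb") + EMPTY + frame(b"next"))
group = list(iter_delta_group(buf))
```

After the generator is exhausted, the stream is positioned just past the empty chunk, so the caller can go on to read the following segment.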
The *delta header* is different between versions ``1``, ``2``, ``3`` and ``4``
of the changegroup format.

Version 1 (headerlen=80)::

    +---------------------------------------------------+
    |            |            |            |            |
    |    node    |  p1 node   |  p2 node   | link node  |
    | (20 bytes) | (20 bytes) | (20 bytes) | (20 bytes) |
    |            |            |            |            |
    +---------------------------------------------------+

Version 2 (headerlen=100)::

    +----------------------------------------------------------------+
    |            |            |            |            |            |
    |    node    |  p1 node   |  p2 node   | base node  | link node  |
    | (20 bytes) | (20 bytes) | (20 bytes) | (20 bytes) | (20 bytes) |
    |            |            |            |            |            |
    +----------------------------------------------------------------+

Version 3 (headerlen=102)::

    +-----------------------------------------------------------------------------+
    |            |            |            |            |            |            |
    |    node    |  p1 node   |  p2 node   | base node  | link node  |   flags    |
    | (20 bytes) | (20 bytes) | (20 bytes) | (20 bytes) | (20 bytes) | (2 bytes)  |
    |            |            |            |            |            |            |
    +-----------------------------------------------------------------------------+

Version 4 (headerlen=103)::

    +-----------------------------------------------------------------------------+------------+
    |            |            |            |            |            |            |            |
    |    node    |  p1 node   |  p2 node   | base node  | link node  |   flags    |   pflags   |
    | (20 bytes) | (20 bytes) | (20 bytes) | (20 bytes) | (20 bytes) | (2 bytes)  |  (1 byte)  |
    |            |            |            |            |            |            |            |
    +-----------------------------------------------------------------------------+------------+

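
The header layouts for versions 1 to 3 map directly onto fixed-size ``struct`` formats. The sketch below is illustrative only: ``DeltaHeader``, ``HEADER_STRUCTS`` and ``split_chunk`` are hypothetical names, the flags field is read as an unsigned 16-bit integer so that the value 32768 is representable, and version 4 is left out.

```python
import struct
from collections import namedtuple

# Hypothetical container for the parsed header fields.
DeltaHeader = namedtuple("DeltaHeader", "node p1 p2 base linknode flags")

# One struct per version; sizes are 80, 100 and 102 bytes respectively.
HEADER_STRUCTS = {
    1: struct.Struct(">20s20s20s20s"),
    2: struct.Struct(">20s20s20s20s20s"),
    3: struct.Struct(">20s20s20s20s20sH"),
}


def split_chunk(version, chunk):
    """Split chunk data into (DeltaHeader, delta data) for versions 1 to 3."""
    layout = HEADER_STRUCTS[version]
    fields = layout.unpack_from(chunk)
    delta = chunk[layout.size:]
    if version == 1:
        node, p1, p2, linknode = fields
        # Version 1 has no base node field; None marks it as implicit.
        return DeltaHeader(node, p1, p2, None, linknode, 0), delta
    if version == 2:
        node, p1, p2, base, linknode = fields
        return DeltaHeader(node, p1, p2, base, linknode, 0), delta
    node, p1, p2, base, linknode, flags = fields
    return DeltaHeader(node, p1, p2, base, linknode, flags), delta
```

Because the formats use the ``>`` prefix, ``struct`` applies no padding, so each ``Struct.size`` equals the ``headerlen`` stated for that version.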
The *delta data* consists of ``chunklen - 4 - headerlen`` bytes, which contain a
series of *delta*s, densely packed (no separators). These deltas describe a diff
from an existing entry (either that the recipient already has, or previously
specified in the bundle/changegroup). The format is described more fully in
``hg help internals.bdiff``, but briefly::

    +---------------------------------------------------------------+
    |              |            |            |                      |
    | start offset | end offset | new length |       content        |
    |  (4 bytes)   | (4 bytes)  | (4 bytes)  | (<new length> bytes) |
    |              |            |            |                      |
    +---------------------------------------------------------------+

Please note that the length field in the delta data does *not* include itself.
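
To make the delta layout concrete, here is a small sketch that applies such a series of deltas to a base text. It assumes that the offsets refer to positions in the base text and that the deltas are sorted by start offset and do not overlap; ``apply_delta`` is a hypothetical name, not Mercurial's patching routine.

```python
import struct


def apply_delta(base, delta):
    """Apply densely packed (start, end, new length, content) deltas to base.

    Assumes deltas are ordered by start offset and do not overlap.
    """
    pieces = []
    last = 0  # end of the previously replaced range in the base text
    pos = 0   # read position inside the delta data
    while pos < len(delta):
        start, end, new_length = struct.unpack_from(">lll", delta, pos)
        pos += 12  # the three 4-byte integers; new_length excludes them
        pieces.append(base[last:start])             # unchanged bytes
        pieces.append(delta[pos:pos + new_length])  # replacement content
        pos += new_length
        last = end
    pieces.append(base[last:])  # unchanged tail
    return b"".join(pieces)


# Usage: replace bytes 6..11 of the base with b"there".
patched = apply_delta(b"hello world", struct.pack(">lll", 6, 11, 5) + b"there")
```

An insertion is a delta whose start and end offsets are equal, and a deletion is one whose new length is 0.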
In version 1, the delta is always applied against the previous node from
the changegroup or the first parent if this is the first entry in the
changegroup.
In version 2 and up, the delta base node is encoded in the entry in the
changegroup. This allows the delta to be expressed against any parent,
which can result in smaller deltas and more efficient encoding of data.
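
The base selection rule can be summarised in a short function. This is only a sketch: ``resolve_delta_base`` and its parameters are hypothetical, with ``prev_node`` standing for the node of the previous entry in the same delta group (``None`` for the first entry).

```python
def resolve_delta_base(version, p1, header_base, prev_node):
    """Return the node that the delta of the current entry applies against."""
    if version == 1:
        # Version 1 headers carry no base node: chain on the previous entry,
        # or on the first parent for the first entry of the group.
        return p1 if prev_node is None else prev_node
    # Version 2 and up name the base explicitly in the delta header.
    return header_base
```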
The *flags* field holds bitwise flags affecting the processing of revision
data. The following flags are defined:

32768
   Censored revision. The revision's fulltext has been replaced by censor
   metadata. May only occur on file revisions.

16384
   Ellipsis revision. Revision hash does not match data (likely due to rewritten
   parents).

8192
   Externally stored. The revision fulltext contains ``key:value`` ``\n``
   delimited metadata defining an object stored elsewhere. Used by the LFS
   extension.

4096
   Contains copy information. This revision changes files in a way that could
   affect copy tracing. This does *not* affect changegroup handling, but is
   relevant for other parts of Mercurial.

For historical reasons, the integer values are identical to revlog version 1
per-revision storage flags and correspond to bits being set in this 2-byte
field. Bits were allocated starting from the most-significant bit, hence the
reverse ordering and allocation of these flags.
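
The relationship between the integer values and bit positions can be seen in a short sketch. The table and function names below are hypothetical and chosen for illustration; only the numeric values come from the list above.

```python
# Bits allocated from the most-significant bit of the 2-byte field downward.
REVISION_FLAGS = {
    1 << 15: "censored",              # 32768
    1 << 14: "ellipsis",              # 16384
    1 << 13: "externally stored",     # 8192
    1 << 12: "has copy information",  # 4096
}


def describe_flags(flags):
    """Return (names of the known flags that are set, remaining unknown bits)."""
    names = [name for bit, name in REVISION_FLAGS.items() if flags & bit]
    unknown = flags & ~sum(REVISION_FLAGS)
    return names, unknown


# Usage: a censored, externally stored revision.
names, unknown = describe_flags(32768 | 8192)
```

Returning the unknown bits separately lets a caller decide whether to reject a revision that carries flags it does not understand.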
The *pflags* (protocol flags) field holds bitwise flags affecting the protocol
itself. They are first in the header since they may affect the handling of the
rest of the fields in a future version. The following flag is defined:

1
   Indicates whether to read a chunk of sidedata (of variable length) right
   after the revision flags.

Changeset Segment
=================
The *changeset segment* consists of a single *delta group* holding
changelog data. The *empty chunk* at the end of the *delta group* denotes
the boundary to the *manifest segment*.
Manifest Segment
================
The *manifest segment* consists of a single *delta group* holding manifest
data. If treemanifests are in use, it contains only the manifest for the
root directory of the repository. Otherwise, it contains the entire
manifest data. The *empty chunk* at the end of the *delta group* denotes
the boundary to the next segment (either the *treemanifests segment* or the
*filelogs segment*, depending on version and the request options).
Treemanifests Segment
---------------------
The *treemanifests segment* only exists in changegroup versions ``3`` and ``4``,
and only if the 'treemanifest' param is part of the bundle2 changegroup part
(it is not possible to use changegroup version 3 or 4 outside of bundle2).

Aside from the filenames in the *treemanifests segment* containing a
trailing ``/`` character, it behaves identically to the *filelogs segment*
(see below). The final sub-segment is followed by an *empty chunk* (logically,
a sub-segment with filename size 0). This denotes the boundary to the
*filelogs segment*.

Filelogs Segment
================
The *filelogs segment* consists of multiple sub-segments, each
corresponding to an individual file whose data is being described::

    +--------------------------------------------------+
    |          |          |          |     |           |
    | filelog0 | filelog1 | filelog2 | ... |    0x0    |
    |          |          |          |     | (4 bytes) |
    |          |          |          |     |           |
    +--------------------------------------------------+

The final filelog sub-segment is followed by an *empty chunk* (logically,
a sub-segment with filename size 0). This denotes the end of the segment
and of the overall changegroup.
Each filelog sub-segment consists of the following::

    +------------------------------------------------------+
    |                 |                      |             |
    | filename length |       filename       | delta group |
    |    (4 bytes)    | (<length - 4> bytes) |  (various)  |
    |                 |                      |             |
    +------------------------------------------------------+

That is, a *chunk* consisting of the filename (not terminated or padded)
followed by N chunks constituting the *delta group* for this file. The
*empty chunk* at the end of each *delta group* denotes the boundary to the
next filelog sub-segment.
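
Putting the segments together, the sketch below walks a changegroup that carries no treemanifests segment: one delta group of changelog data, one of manifest data, then filelog sub-segments until the closing empty chunk. All function names are hypothetical, chunk data is returned unparsed, and error handling is omitted.

```python
import io
import struct


def read_chunk(stream):
    """Return the data of the next chunk, or None for the empty chunk."""
    (length,) = struct.unpack(">l", stream.read(4))
    return None if length == 0 else stream.read(length - 4)


def read_group(stream):
    """Collect the chunks of one delta group, consuming its empty chunk."""
    chunks = []
    while True:
        chunk = read_chunk(stream)
        if chunk is None:
            return chunks
        chunks.append(chunk)


def walk_changegroup(stream):
    """Yield (segment kind, filename or None, list of chunk data)."""
    yield "changeset", None, read_group(stream)
    yield "manifest", None, read_group(stream)
    while True:
        filename = read_chunk(stream)
        if filename is None:
            return  # empty chunk: end of the filelogs segment
        yield "filelog", filename, read_group(stream)


def frame(payload):
    """Frame a payload as a chunk, for building the sample below."""
    return struct.pack(">l", len(payload) + 4) + payload


# Usage: one changeset chunk, two manifest chunks, one file with one chunk.
END = struct.pack(">l", 0)
sample = (frame(b"c0") + END + frame(b"m0") + frame(b"m1") + END
          + frame(b"a.txt") + frame(b"f0") + END + END)
segments = list(walk_changegroup(io.BytesIO(sample)))
```

The same empty chunk value plays two roles here: it ends each delta group, and, where a filename chunk is expected, it ends the filelogs segment and the changegroup as a whole.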