help: add documentation for changegroup formats...
Gregory Szorc -
r27372:a79cba6c default
@@ -0,0 +1,142 b''
1 Changegroups
2 ============
3
4 Changegroups are representations of repository revlog data, specifically
5 the changelog, manifest, and filelogs.
6
7 There are 2 versions of changegroups: ``1`` and ``2``. At a high
8 level, they are almost exactly the same; the only difference is the
9 *delta header* on entries, which gains a field in version ``2``.
10
11 Changegroups consist of 3 logical segments::
12
13    +---------------------------------+
14    |           |          |          |
15    | changeset | manifest | filelogs |
16    |           |          |          |
17    +---------------------------------+
18
19 The principal building block of each segment is a *chunk*. A *chunk*
20 is a framed piece of data::
21
22    +---------------------------------------+
23    |           |                           |
24    | length    | data                      |
25    | (32 bits) | (<length - 4> bytes)      |
26    |           |                           |
27    +---------------------------------------+
28
29 Each chunk starts with a 32-bit big-endian signed integer indicating the
30 length of the entire chunk, including the 4-byte length field itself.
31
32 There is a special case chunk that has 0 length (``0x00000000``). We
33 call this an *empty chunk*.
34
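The chunk framing can be sketched in Python as follows. This is a minimal illustration, not Mercurial's own code. It assumes, as Mercurial's changegroup implementation does, that the 32-bit length value counts the 4 bytes of the length field itself, so any nonzero length below 5 is treated as invalid. The names ``write_chunk``, ``read_chunk`` and ``EMPTY_CHUNK`` are hypothetical.

```python
import io
import struct

# The empty chunk: a length field of zero and no data.
EMPTY_CHUNK = struct.pack(">l", 0)


def write_chunk(data: bytes) -> bytes:
    """Frame non-empty data as a chunk (assumed: length counts its own 4 bytes)."""
    if not data:
        raise ValueError("use EMPTY_CHUNK to encode the empty chunk")
    return struct.pack(">l", len(data) + 4) + data


def read_chunk(stream) -> bytes:
    """Read one chunk from a binary stream; return b"" for the empty chunk."""
    header = stream.read(4)
    if len(header) != 4:
        raise ValueError("truncated chunk header")
    (length,) = struct.unpack(">l", header)  # big-endian signed 32-bit
    if length == 0:
        return b""
    if length <= 4:
        raise ValueError(f"invalid chunk length {length}")
    data = stream.read(length - 4)
    if len(data) != length - 4:
        raise ValueError("truncated chunk data")
    return data
```

Reading from an ``io.BytesIO`` wrapping ``write_chunk(b"hello") + EMPTY_CHUNK`` would yield ``b"hello"`` and then ``b""``.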
35 Delta Groups
36 ------------
37
38 A *delta group* expresses the content of a revlog as a series of deltas,
39 or patches against previous revisions.
40
41 Delta groups consist of 0 or more *chunks* followed by the *empty chunk*
42 to signal the end of the delta group::
43
44    +------------------------------------------------------------------------+
45    |                |             |               |             |           |
46    | chunk0 length  | chunk0 data | chunk1 length | chunk1 data |    0x0    |
47    |   (32 bits)    |  (various)  |   (32 bits)   |  (various)  | (32 bits) |
48    |                |             |               |             |           |
49    +------------------------------------------------------------+-----------+
50
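Reading a delta group then amounts to reading chunks until the empty chunk appears. The generator below is a sketch under the same assumption as Mercurial's implementation, namely that each length value includes its own 4 bytes. ``iter_delta_group`` is a hypothetical name.

```python
import io
import struct


def iter_delta_group(stream):
    """Yield the data of each chunk until the terminating empty chunk."""
    while True:
        header = stream.read(4)
        if len(header) != 4:
            raise ValueError("stream ended before the empty chunk")
        (length,) = struct.unpack(">l", header)
        if length == 0:
            return  # empty chunk: end of this delta group
        if length <= 4:
            raise ValueError(f"invalid chunk length {length}")
        data = stream.read(length - 4)  # assumed: length counts its own 4 bytes
        if len(data) != length - 4:
            raise ValueError("truncated chunk data")
        yield data
```

The generator consumes the empty chunk and nothing after it, so the same stream can be handed to the reader for the next segment.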
51 Each *chunk*'s data consists of the following::
52
53    +-----------------------------------------+
54    |              |              |           |
55    | delta header | mdiff header |   delta   |
56    |  (various)   |  (12 bytes)  | (various) |
57    |              |              |           |
58    +-----------------------------------------+
59
60 These 3 logical pieces make up the chunk's data, whose size is given by the
61 chunk's *length* field. The *delta* is a diff from an existing entry in the revlog.
62
63 The *delta header* is different between versions ``1`` and ``2`` of the
64 changegroup format.
65
66 Version 1::
67
68    +------------------------------------------------------+
69    |            |             |             |             |
70    |    node    |   p1 node   |   p2 node   |  link node  |
71    | (20 bytes) |  (20 bytes) |  (20 bytes) |  (20 bytes) |
72    |            |             |             |             |
73    +------------------------------------------------------+
74
75 Version 2::
76
77    +------------------------------------------------------------------+
78    |            |             |             |            |            |
79    |    node    |   p1 node   |   p2 node   | base node  | link node  |
80    | (20 bytes) |  (20 bytes) |  (20 bytes) | (20 bytes) | (20 bytes) |
81    |            |             |             |            |            |
82    +------------------------------------------------------------------+
83
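The two header layouts above can be decoded with Python's ``struct`` module. This is a sketch only: ``DeltaHeader`` and ``parse_delta_header`` are hypothetical names, and ``base`` is simply set to ``None`` for version 1, where the header carries no base node.

```python
import struct
from typing import NamedTuple, Optional, Tuple

# Four or five consecutive 20-byte nodes, as in the diagrams above.
V1_HEADER = struct.Struct("20s20s20s20s")     # 80 bytes
V2_HEADER = struct.Struct("20s20s20s20s20s")  # 100 bytes


class DeltaHeader(NamedTuple):
    node: bytes
    p1: bytes
    p2: bytes
    base: Optional[bytes]  # only present in version 2
    link: bytes


def parse_delta_header(chunk: bytes, version: int) -> Tuple[DeltaHeader, bytes]:
    """Split chunk data into its delta header and the remaining bytes."""
    if version == 1:
        node, p1, p2, link = V1_HEADER.unpack_from(chunk)
        return DeltaHeader(node, p1, p2, None, link), chunk[V1_HEADER.size:]
    if version == 2:
        node, p1, p2, base, link = V2_HEADER.unpack_from(chunk)
        return DeltaHeader(node, p1, p2, base, link), chunk[V2_HEADER.size:]
    raise ValueError(f"unsupported changegroup version {version}")
```

The remaining bytes returned alongside the header are the mdiff header and the delta.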
84 The *mdiff header* consists of 3 32-bit big-endian signed integers describing the
85 byte range of the base text to replace and the length of the delta content that follows::
86
87    +----------------------------------------+
88    |              |            |            |
89    | start offset | end offset | new length |
90    |  (32 bits)   | (32 bits)  | (32 bits)  |
91    |              |            |            |
92    +----------------------------------------+
93
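To illustrate how the 12-byte mdiff header drives patching, here is a minimal sketch that applies one header plus its content to a base text. It assumes the three integers follow the convention of Mercurial's binary patch code: start offset, end offset, and length of the replacement content. That interpretation, and the name ``apply_single_delta``, are assumptions of this example.

```python
import struct

# Three big-endian signed 32-bit integers (12 bytes in total).
MDIFF_HEADER = struct.Struct(">lll")


def apply_single_delta(base: bytes, delta: bytes) -> bytes:
    """Replace base[start:end] with the content that follows the header.

    Assumed field meaning: (start offset, end offset, new content length).
    """
    start, end, new_len = MDIFF_HEADER.unpack_from(delta)
    content = delta[MDIFF_HEADER.size:MDIFF_HEADER.size + new_len]
    if len(content) != new_len:
        raise ValueError("delta content shorter than declared")
    if not 0 <= start <= end <= len(base):
        raise ValueError("delta range falls outside the base text")
    return base[:start] + content + base[end:]
```

For example, a header of ``(6, 11, 5)`` followed by ``b"there"`` turns ``b"hello world!"`` into ``b"hello there!"``.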
94 In version 1, the delta is always applied against the previous node from
95 the changegroup or the first parent if this is the first entry in the
96 changegroup.
97
98 In version 2, the delta base node is encoded in the entry in the
99 changegroup. This allows the delta to be expressed against any parent,
100 which can result in smaller deltas and more efficient encoding of data.
101
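The base-selection rule for the two versions can be written as a small function. This is a sketch of the rule as described above; ``delta_base`` and its parameters are hypothetical names, with ``prev_node`` set to ``None`` for the first entry of a group.

```python
from typing import Optional


def delta_base(version: int, p1: bytes, prev_node: Optional[bytes],
               encoded_base: Optional[bytes] = None) -> bytes:
    """Return the node that a chunk's delta should be applied against."""
    if version == 1:
        # Previous node in the changegroup, or the first parent for the
        # first entry.
        return p1 if prev_node is None else prev_node
    if version == 2:
        # The base node is carried explicitly in the delta header.
        if encoded_base is None:
            raise ValueError("version 2 entries must carry a base node")
        return encoded_base
    raise ValueError(f"unsupported changegroup version {version}")
```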
102 Changeset Segment
103 -----------------
104
105 The *changeset segment* consists of a single *delta group* holding
106 changelog data. The *empty chunk* at the end of the *delta group*
107 denotes the boundary to the *manifest segment*.
108
109 Manifest Segment
110 ----------------
111
112 The *manifest segment* consists of a single *delta group* holding
113 manifest data. The *empty chunk* at the end of the *delta group*
114 denotes the boundary to the *filelogs segment*.
115
116 Filelogs Segment
117 ----------------
118
119 The *filelogs segment* consists of multiple sub-segments, each
120 corresponding to an individual file whose data is being described::
121
122    +--------------------------------------+
123    |          |          |          |     |
124    | filelog0 | filelog1 | filelog2 | ... |
125    |          |          |          |     |
126    +--------------------------------------+
127
128 The final filelog sub-segment is followed by an *empty chunk* to denote
129 the end of the segment and the overall changegroup.
130
131 Each filelog sub-segment consists of the following::
132
133    +-----------------------------------------+
134    |               |           |             |
135    | filename size | filename  | delta group |
136    |   (32 bits)   | (various) |  (various)  |
137    |               |           |             |
138    +-----------------------------------------+
139
140 That is, a *chunk* consisting of the filename (not terminated or padded)
141 followed by N chunks constituting the *delta group* for this file.
142
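The filelogs segment can be walked with two nested loops: the outer one reads a filename chunk, the inner one reads that file's delta group. The sketch below assumes, as Mercurial's implementation does, that each length value counts its own 4 bytes. The names ``iter_filelogs`` and ``_read_chunk`` are hypothetical.

```python
import io
import struct


def _read_chunk(stream) -> bytes:
    """Read one chunk; return b"" for the empty chunk."""
    header = stream.read(4)
    if len(header) != 4:
        raise ValueError("truncated chunk header")
    (length,) = struct.unpack(">l", header)
    if length == 0:
        return b""
    if length <= 4:
        raise ValueError(f"invalid chunk length {length}")
    data = stream.read(length - 4)  # assumed: length counts its own 4 bytes
    if len(data) != length - 4:
        raise ValueError("truncated chunk data")
    return data


def iter_filelogs(stream):
    """Yield (filename, list of delta chunks) for each filelog sub-segment."""
    while True:
        filename = _read_chunk(stream)
        if not filename:
            return  # empty chunk: end of the segment and the changegroup
        deltas = []
        while True:
            chunk = _read_chunk(stream)
            if not chunk:
                break  # empty chunk: end of this file's delta group
            deltas.append(chunk)
        yield filename, deltas
```

Filenames are yielded as raw bytes because the format does not terminate or pad them, and this sketch makes no assumption about their encoding.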