internals: document bundle2 format...
Gregory Szorc -
r36469:1fa35ca3 default
@@ -0,0 +1,677 b''
1 Bundle2 refers to a data format that is used for both on-disk storage
2 and over-the-wire transfer of repository data and state.
3
4 The data format allows the capture of multiple components of
5 repository data. Contrast with the initial bundle format, which
6 only captured *changegroup* data (and couldn't store bookmarks,
7 phases, etc).
8
9 Bundle2 is used for:
10
11 * Transferring data from a repository (e.g. as part of an ``hg clone``
12 or ``hg pull`` operation).
13 * Transferring data to a repository (e.g. as part of an ``hg push``
14 operation).
15 * Storing data on disk (e.g. the result of an ``hg bundle``
16 operation).
17 * Transferring the results of a repository operation (e.g. the
18 reply to an ``hg push`` operation).
19
20 At its highest level, a bundle2 payload is a stream that begins
21 with some metadata and consists of a series of *parts*, with each
22 part describing repository data or state or the result of an
23 operation. New bundle2 parts are introduced over time when there is
24 a need to capture a new form of data. A *capabilities* mechanism
25 exists to allow peers to understand which bundle2 parts the other
26 understands.
27
28 Stream Format
29 =============
30
31 A bundle2 payload consists of a magic string (``HG20``) followed by
32 stream level parameters, followed by any number of payload *parts*.
33
34 It may help to think of the stream level parameters as *headers* and the
35 payload parts as the *body*.
36
37 Stream Level Parameters
38 -----------------------
39
40 Following the magic string is data that defines parameters applicable to the
41 entire payload.
42
43 Stream level parameters begin with a 32-bit unsigned big-endian integer.
44 The value of this integer defines the number of bytes of stream level
45 parameters that follow.
46
47 The *N* bytes of raw data contain a space separated list of parameters.
48 Each parameter consists of a required name and an optional value.
49
50 Parameters have the form ``<name>`` or ``<name>=<value>``.
51
52 Both the parameter name and value are URL quoted.
53
54 Names MUST start with a letter. If the first letter is lower case, the
55 parameter is advisory and can safely be ignored. If the first letter
56 is upper case, the parameter is mandatory and the handler MUST stop if
57 it is unable to process it.
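The header layout described above can be sketched in Python as follows. This is a minimal illustration, not Mercurial's implementation: the function name ``read_stream_params`` and the ``(value, mandatory)`` tuple layout are hypothetical, and the parameter block is assumed to be ASCII once URL quoted.

```python
import io
import struct
from urllib.parse import unquote


def read_stream_params(fh):
    """Read the magic string and stream level parameters from ``fh``.

    Returns a dict mapping name -> (value or None, is_mandatory).
    Hypothetical helper for illustration only.
    """
    magic = fh.read(4)
    if magic != b"HG20":
        raise ValueError("not a bundle2 stream: %r" % magic)
    # 32-bit unsigned big-endian size of the parameter block.
    (size,) = struct.unpack(">I", fh.read(4))
    raw = fh.read(size).decode("ascii")  # assumption: quoted data is ASCII
    params = {}
    for entry in raw.split(" "):
        if not entry:
            continue
        name, sep, value = entry.partition("=")
        name = unquote(name)
        value = unquote(value) if sep else None
        if not name or not name[0].isalpha():
            raise ValueError("parameter name must start with a letter: %r" % name)
        # Upper case first letter: mandatory. Lower case: advisory.
        params[name] = (value, name[0].isupper())
    return params
```

A caller would typically stop processing if any entry flagged as mandatory names a parameter it does not understand.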
58
59 Stream level parameters apply to the entire bundle2 payload. Lower-level
60 options should go into a bundle2 part instead.
61
62 The following stream level parameters are defined:
63
64 compression
65 Compression format of payload data. ``GZ`` denotes zlib. ``BZ``
66 denotes bzip2. ``ZS`` denotes zstandard.
67
68 When defined, all bytes after the stream level parameters are
69 compressed using the compression format defined by this parameter.
70
71 If this parameter isn't present, data is raw/uncompressed.
72
73 This parameter MUST be mandatory because attempting to consume
74 streams without knowing how to decode the underlying bytes will
75 result in errors.
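As a rough sketch of how a consumer might act on this parameter, the function below maps the documented values to Python standard library decompressors. The name ``decompress_payload`` is hypothetical, the whole remaining stream is assumed to be in memory, and ``ZS`` is rejected only because zstandard support is not assumed to be available here.

```python
import bz2
import zlib


def decompress_payload(compression, data):
    """Decompress the bytes following the stream level parameters.

    ``compression`` is the parameter value, or None when the parameter
    is absent. Hypothetical helper for illustration only.
    """
    if compression is None:
        return data  # raw/uncompressed
    if compression == "GZ":
        return zlib.decompress(data)
    if compression == "BZ":
        return bz2.decompress(data)
    # The parameter is mandatory, so an unknown value must stop processing.
    raise ValueError("unsupported compression: %s" % compression)
```

A real consumer would more likely decompress incrementally (for example with ``zlib.decompressobj()``), since bundles can be large.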
76
77 Payload Part
78 ------------
79
80 Following the stream level parameters are 0 or more payload parts. Each
81 payload part consists of a header and a body.
82
83 The payload part header consists of a 32-bit unsigned big-endian integer
84 defining the number of bytes in the header that follow. The special
85 value ``0`` indicates the end of the bundle2 stream.
86
87 The binary format of the part header is as follows:
88
89 * 8-bit unsigned size of the part name
90 * N-bytes alphanumeric part name
91 * 32-bit unsigned big-endian part ID
92 * N bytes part parameter data
93
94 The *part name* identifies the type of the part. A part name containing an
95 UPPERCASE letter denotes a mandatory part. Otherwise, the part is advisory. A
96 consumer should abort if it encounters a mandatory part it doesn't know
97 how to process. See the sections below for each defined part type.
98
99 The *part ID* is an identifier used to refer to a specific part.
100 It should be unique within the bundle2 payload.
101
102 Part parameter data consists of:
103
104 * 1 byte number of mandatory parameters
105 * 1 byte number of advisory parameters
106 * 2 * N bytes of sizes of parameter key and values
107 * N * M blobs of values for parameter key and values
108
109 Following the 2 bytes of mandatory and advisory parameter counts are
110 2-tuples of bytes of the sizes of each parameter. e.g.
111 (<key size>, <value size>).
112
113 Following that are the raw values, without padding. Mandatory parameters
114 come first, followed by advisory parameters.
115
116 Each parameter's key MUST be unique within the part.
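The part header layout above can be sketched as a small parser. This is an illustration under stated assumptions, not Mercurial's code: ``parse_part_header`` is a hypothetical name, the input is the header bytes without the leading 32-bit size field, and each key size and value size is assumed to occupy one unsigned byte.

```python
import struct


def parse_part_header(header):
    """Parse part header bytes (excluding the leading size field).

    Returns (name, part_id, mandatory_params, advisory_params).
    Hypothetical helper for illustration only.
    """
    offset = 0
    (name_size,) = struct.unpack_from(">B", header, offset)
    offset += 1
    name = header[offset:offset + name_size].decode("ascii")
    offset += name_size
    (part_id,) = struct.unpack_from(">I", header, offset)
    offset += 4
    mandatory_count, advisory_count = struct.unpack_from(">BB", header, offset)
    offset += 2
    total = mandatory_count + advisory_count
    # One (key size, value size) byte pair per parameter.
    sizes = struct.unpack_from(">%dB" % (2 * total), header, offset)
    offset += 2 * total
    params = []
    for i in range(total):
        key_size, value_size = sizes[2 * i], sizes[2 * i + 1]
        key = header[offset:offset + key_size]
        offset += key_size
        value = header[offset:offset + value_size]
        offset += value_size
        params.append((key, value))
    # Mandatory parameters come first, followed by advisory parameters.
    return name, part_id, params[:mandatory_count], params[mandatory_count:]
```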
117
118 Following the part parameter data is the part payload. The part payload
119 consists of a series of framed chunks. The frame header is a 32-bit
120 big-endian integer defining the size of the chunk. The N bytes of raw
121 payload data follow.
122
123 The part payload consists of 0 or more chunks.
124
125 A chunk with size ``0`` denotes the end of the part payload. Therefore,
126 there will always be at least 1 32-bit integer following the payload
127 part header.
128
129 A chunk size of ``-1`` is used to signal an *interrupt*. If such a chunk
130 size is seen, the stream processor should process the next bytes as a new
131 payload part. After this payload part, processing of the original,
132 interrupted part should resume.
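The chunk framing can be sketched as a generator. This is a minimal sketch: ``read_part_payload`` is a hypothetical name, the size field is read as a signed integer on the assumption that this is how ``-1`` is represented, and interrupt handling is deliberately left out.

```python
import io
import struct


def read_part_payload(fh):
    """Yield the payload chunks of one part, stopping at the zero-size chunk.

    Hypothetical helper for illustration only.
    """
    while True:
        # Signed, so that -1 (the interrupt marker) can be represented.
        (size,) = struct.unpack(">i", fh.read(4))
        if size == 0:
            return  # end of this part's payload
        if size == -1:
            # A full reader would process a nested part here, then resume.
            raise NotImplementedError("interrupt handling is not sketched here")
        if size < 0:
            raise ValueError("invalid chunk size: %d" % size)
        yield fh.read(size)
```

Because the generator stops right after the terminating chunk, the file object is left positioned at the next part header.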
133
134 Capabilities
135 ============
136
137 Bundle2 is a dynamic format that can evolve over time. For example,
138 when a new repository data concept is invented, a new bundle2 part
139 is typically invented to hold that data. In addition, parts performing
140 similar functionality may come into existence if there is a better
141 mechanism for performing certain functionality.
142
143 Because the bundle2 format evolves over time, peers need to understand
144 what bundle2 features the other can understand. The *capabilities*
145 mechanism is how those features are expressed.
146
147 Bundle2 capabilities are logically expressed as a dictionary
148 where the keys are strings and the values are lists of
149 strings.
150
151 Capabilities are encoded for exchange between peers. The encoded
152 capabilities blob consists of a newline (``\n``) delimited list of
153 entries. Each entry has the form ``<key>`` or ``<key>=<value>``,
154 depending on whether the capability has a value.
155
156 The capability name is URL quoted (``%XX`` encoding of URL unsafe
157 characters).
158
159 The value, if present, is formed by URL quoting each value in
160 the capability list and concatenating the result with a comma (``,``).
161
162 For example, consider the capabilities ``novaluekey`` (no value) and
163 ``listvaluekey`` (values ``value 1`` and ``value 2``). These would be encoded as:
164
165 listvaluekey=value%201,value%202\nnovaluekey
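The encoding rules can be sketched in a few lines of Python. ``encode_caps`` is a hypothetical name, and sorting the keys is an assumption made so that the output matches the example above; the format description itself does not require an order.

```python
from urllib.parse import quote


def encode_caps(caps):
    """Encode a {name: [values]} mapping as a bundle2 capabilities blob.

    Hypothetical helper for illustration only.
    """
    entries = []
    for name in sorted(caps):  # assumption: stable, sorted output
        entry = quote(name)
        values = caps[name]
        if values:
            # URL quote each value, then join with commas.
            entry += "=" + ",".join(quote(v) for v in values)
        entries.append(entry)
    return "\n".join(entries)
```

Quoting each value before joining means a literal comma inside a value becomes ``%2C`` and cannot be confused with the list separator.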
166
167 The sections below detail the defined bundle2 capabilities.
168
169 HG20
170 ----
171
172 Denotes that the peer supports the bundle2 data format.
173
174 bookmarks
175 ---------
176
177 Denotes that the peer supports the ``bookmarks`` part.
178
179 Peers should not issue mandatory ``bookmarks`` parts unless this
180 capability is present.
181
182 changegroup
183 -----------
184
185 Denotes which versions of the *changegroup* format the peer can
186 receive. Values include ``01``, ``02``, and ``03``.
187
188 The peer should not generate changegroup data for a version not
189 specified by this capability.
190
191 checkheads
192 ----------
193
194 Denotes which forms of heads checking the peer supports.
195
196 If ``related`` is in the value, then the peer supports the ``check:heads``
197 part and the peer is capable of detecting race conditions when applying
198 changelog data.
199
200 digests
201 -------
202
203 Denotes which hashing formats the peer supports.
204
205 Values are names of hashing functions. Values include ``md5``, ``sha1``,
206 and ``sha512``.
207
208 error
209 -----
210
211 Denotes which ``error:`` parts the peer supports.
212
213 Value is a list of strings of ``error:`` part names. Valid values
214 include ``abort``, ``unsupportedcontent``, ``pushraced``, and ``pushkey``.
215
216 Peers should not issue an ``error:`` part unless the type of that
217 part is listed as supported by this capability.
218
219 listkeys
220 --------
221
222 Denotes that the peer supports the ``listkeys`` part.
223
224 hgtagsfnodes
225 ------------
226
227 Denotes that the peer supports the ``hgtagsfnodes`` part.
228
229 obsmarkers
230 ----------
231
232 Denotes that the peer supports the ``obsmarkers`` part and which versions
233 of the obsolescence data format it can receive. Values are strings like
234 ``V<N>``. e.g. ``V1``.
235
236 phases
237 ------
238
239 Denotes that the peer supports the ``phases`` part.
240
241 pushback
242 --------
243
244 Denotes that the peer supports sending/receiving bundle2 data in response
245 to a bundle2 request.
246
247 This capability is typically used by servers that employ server-side
248 rewriting of pushed repository data. For example, a server may wish to
249 automatically rebase pushed changesets. When this capability is present,
250 the server can send a bundle2 response containing the rewritten changeset
251 data and the client will apply it.
252
253 pushkey
254 -------
255
256 Denotes that the peer supports the ``pushkey`` part.
257
258 remote-changegroup
259 ------------------
260
261 Denotes that the peer supports the ``remote-changegroup`` part and
262 which protocols it can use to fetch remote changegroup data.
263
264 Values are protocol names. e.g. ``http`` and ``https``.
265
266 stream
267 ------
268
269 Denotes that the peer supports ``stream*`` parts in order to support
270 *stream clone*.
271
272 Values are which ``stream*`` parts the peer supports. ``v2`` denotes
273 support for the ``stream2`` part.
274
275 Bundle2 Part Types
276 ==================
277
278 The sections below detail the various bundle2 part types.
279
280 bookmarks
281 ---------
282
283 The ``bookmarks`` part holds bookmarks information.
284
285 This part has no parameters.
286
287 The payload consists of entries defining bookmarks. Each entry consists of:
288
289 * 20 bytes binary changeset node.
290 * 2 bytes big endian short defining bookmark name length.
291 * N bytes defining bookmark name.
292
293 Receivers typically update bookmarks to match the state specified in
294 this part.
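The entry layout can be decoded with a short loop. This is a sketch, not Mercurial's code: ``parse_bookmarks`` is a hypothetical name, and the 2 byte length is assumed to be unsigned.

```python
import struct


def parse_bookmarks(payload):
    """Decode a ``bookmarks`` part payload into (name, node) pairs.

    Hypothetical helper for illustration only.
    """
    entries = []
    offset = 0
    while offset < len(payload):
        # 20 byte binary node followed by a big-endian short name length.
        node, name_size = struct.unpack_from(">20sH", payload, offset)
        offset += 22
        name = payload[offset:offset + name_size]
        offset += name_size
        entries.append((name, node))
    return entries
```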
295
296 changegroup
297 -----------
298
299 The ``changegroup`` part contains *changegroup* data (changelog, manifestlog,
300 and filelog revision data).
301
302 The following part parameters are defined for this part.
303
304 version
305 Changegroup version string. e.g. ``01``, ``02``, and ``03``. This parameter
306 determines how to interpret the changegroup data within the part.
307
308 nbchanges
309 The number of changesets in this changegroup. This parameter can be used
310 to aid in the display of progress bars, etc during part application.
311
312 treemanifest
313 Whether the changegroup contains tree manifests.
314
315 targetphase
316 The target phase of changesets in this part. Value is an integer of
317 the target phase.
318
319 The payload of this part is raw changegroup data. See
320 :hg:`help internals.changegroups` for the format of changegroup data.
321
322 check:bookmarks
323 ---------------
324
325 The ``check:bookmarks`` part is inserted into a bundle as a means for the
326 receiver to validate that the sender's known state of bookmarks matches
327 the receiver's.
328
329 This part has no parameters.
330
331 The payload is a binary stream of bookmark data. Each entry in the stream
332 consists of:
333
334 * 20 bytes binary node that bookmark is associated with
335 * 2 bytes unsigned short defining length of bookmark name
336 * N bytes containing the bookmark name
337
338 If all bits in the node value are ``1``, then this signifies a missing
339 bookmark.
340
341 When the receiver encounters this part, for each bookmark in the part
342 payload, it should validate that the current bookmark state matches
343 the specified state. If it doesn't, then the receiver should take
344 appropriate action. (In the case of pushes, this mismatch signifies
345 a race condition and the receiver should consider rejecting the push.)
346
347 check:heads
348 -----------
349
350 The ``check:heads`` part is a means to validate that the sender's state
351 of DAG heads matches the receiver's.
352
353 This part has no parameters.
354
355 The body of this part is an array of 20 byte binary nodes representing
356 changeset heads.
357
358 Receivers should compare the set of heads defined in this part to the
359 current set of repo heads and take action if there is a mismatch in that
360 set.
361
362 Note that this part applies to *all* heads in the repo.
363
364 check:phases
365 ------------
366
367 The ``check:phases`` part validates that the sender's state of phase
368 boundaries matches the receiver's.
369
370 This part has no parameters.
371
372 The payload consists of an array of 24 byte entries. Each entry is
373 a big endian 32-bit integer defining the phase integer and 20 byte
374 binary node value.
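Because every entry has a fixed size, the payload can be decoded with a single ``struct`` format. In this sketch ``parse_phase_entries`` is a hypothetical name and the phase integer is assumed to be signed.

```python
import struct

# Big-endian 32-bit phase integer followed by a 20 byte binary node.
ENTRY = struct.Struct(">i20s")


def parse_phase_entries(payload):
    """Decode 24 byte (phase, node) entries from a part payload.

    Hypothetical helper for illustration only.
    """
    if len(payload) % ENTRY.size:
        raise ValueError("payload is not a multiple of %d bytes" % ENTRY.size)
    return [
        ENTRY.unpack_from(payload, offset)
        for offset in range(0, len(payload), ENTRY.size)
    ]
```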
375
376 For each changeset defined in this part, the receiver should validate
377 that its current phase matches the phase defined in this part. The
378 receiver should take appropriate action if a mismatch occurs.
379
380 check:updated-heads
381 -------------------
382
383 The ``check:updated-heads`` part validates that the sender's state of
384 DAG heads updated by this bundle matches the receiver's.
385
386 This type is nearly identical to ``check:heads`` except the heads
387 in the payload are only a subset of heads in the repository. The
388 receiver should validate that all nodes specified by the sender are
389 branch heads and take appropriate action if not.
390
391 error:abort
392 -----------
393
394 The ``error:abort`` part conveys a fatal error.
395
396 The following part parameters are defined:
397
398 message
399 The string content of the error message.
400
401 hint
402 Supplemental string giving a hint on how to fix the problem.
403
404 error:pushkey
405 -------------
406
407 The ``error:pushkey`` part conveys an error in the *pushkey* protocol.
408
409 The following part parameters are defined:
410
411 namespace
412 The pushkey domain that exhibited the error.
413
414 key
415 The key whose update failed.
416
417 new
418 The value we tried to set the key to.
419
420 old
421 The old value of the key (as supplied by the client).
422
423 ret
424 The integer result code for the pushkey request.
425
426 in-reply-to
427 Part ID that triggered this error.
428
429 This part is generated if there was an error applying *pushkey* data.
430 Pushkey data includes bookmarks, phases, and obsolescence markers.
431
432 error:pushraced
433 ---------------
434
435 The ``error:pushraced`` part conveys that an error occurred and
436 the likely cause is losing a race with another pusher.
437
438 The following part parameters are defined:
439
440 message
441 String error message.
442
443 This part is typically emitted when a receiver examining ``check:*``
444 parts encountered inconsistency between incoming state and local state.
445 The likely cause of that inconsistency is another repository change
446 operation (often another client performing an ``hg push``).
447
448 error:unsupportedcontent
449 ------------------------
450
451 The ``error:unsupportedcontent`` part conveys that a bundle2 receiver
452 encountered a part or content it was not able to handle.
453
454 The following part parameters are defined:
455
456 parttype
457 The name of the part that triggered this error.
458
459 params
460 ``\0`` delimited list of parameters.
461
462 hgtagsfnodes
463 ------------
464
465 The ``hgtagsfnodes`` type defines file nodes for the ``.hgtags`` file
466 for various changesets.
467
468 This part has no parameters.
469
470 The payload is an array of pairs of 20 byte binary nodes. The first node
471 is a changeset node. The second node is the ``.hgtags`` file node.
472
473 Resolving tags requires resolving the ``.hgtags`` file node for changesets.
474 On large repositories, this can be expensive. Repositories cache the
475 mapping of changeset to ``.hgtags`` file node on disk as a performance
476 optimization. This part allows that cached data to be transferred alongside
477 changeset data.
478
479 Receivers should update their ``.hgtags`` cache file node mappings with
480 the incoming data.
481
482 listkeys
483 --------
484
485 The ``listkeys`` part holds content for a *pushkey* namespace.
486
487 The following part parameters are defined:
488
489 namespace
490 The pushkey domain this data belongs to.
491
492 The part payload contains a newline (``\n``) delimited list of
493 tab (``\t``) delimited key-value pairs defining entries in this pushkey
494 namespace.
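A minimal decoder for this payload might look like the sketch below. ``decode_listkeys`` is a hypothetical name, and any additional escaping of keys or values is not described above and is therefore not handled.

```python
def decode_listkeys(payload):
    """Decode a ``listkeys`` payload into a dict of bytes keys and values.

    Hypothetical helper for illustration only.
    """
    result = {}
    for line in payload.split(b"\n"):
        if not line:
            continue  # tolerate a trailing newline
        # Split on the first tab only, so values may contain tabs.
        key, value = line.split(b"\t", 1)
        result[key] = value
    return result
```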
495
496 obsmarkers
497 ----------
498
499 The ``obsmarkers`` part defines obsolescence markers.
500
501 This part has no parameters.
502
503 The payload consists of obsolescence markers using the on-disk markers
504 format. The first byte defines the version format.
505
506 The receiver should apply the obsolescence markers defined in this
507 part. A ``reply:obsmarkers`` part should be sent to the sender, if possible.
508
509 output
510 ------
511
512 The ``output`` part is used to display output on the receiver.
513
514 This part has no parameters.
515
516 The payload consists of raw data to be printed on the receiver.
517
518 phase-heads
519 -----------
520
521 The ``phase-heads`` part defines phase boundaries.
522
523 This part has no parameters.
524
525 The payload consists of an array of 24 byte entries. Each entry is
526 a big endian 32-bit integer defining the phase integer and 20 byte
527 binary node value.
528
529 pushkey
530 -------
531
532 The ``pushkey`` part communicates an intent to perform a ``pushkey``
533 request.
534
535 The following part parameters are defined:
536
537 namespace
538 The pushkey domain to operate on.
539
540 key
541 The key within the pushkey namespace that is being changed.
542
543 old
544 The old value for the key being changed.
545
546 new
547 The new value for the key being changed.
548
549 This part has no payload.
550
551 The receiver should perform a pushkey operation as described by this
552 part's parameters.
553
554 If the pushkey operation fails, a ``reply:pushkey`` part should be sent
555 back to the sender, if possible. The ``in-reply-to`` part parameter
556 should reference the source part.
557
558 pushvars
559 --------
560
561 The ``pushvars`` part defines environment variables that should be
562 set when processing this bundle2 payload.
563
564 The part's advisory parameters define environment variables.
565
566 There is no part payload.
567
568 When received, part parameters are prefixed with ``USERVAR_`` and the
569 resulting variables are defined in the hooks context for the current
570 bundle2 application. This part provides a mechanism for senders to
571 inject extra state into the hook execution environment on the receiver.
572
573 remote-changegroup
574 ------------------
575
576 The ``remote-changegroup`` part defines an external location of a bundle
577 to apply. This part can be used by servers to serve pre-generated bundles
578 hosted at arbitrary URLs.
579
580 The following part parameters are defined:
581
582 url
583 The URL of the remote bundle.
584
585 size
586 The size in bytes of the remote bundle.
587
588 digests
589 A space separated list of the digest types provided in additional
590 part parameters.
591
592 digest:<type>
593 The hexadecimal representation of the digest (hash) of the remote bundle.
594
595 There is no payload for this part type.
596
597 When encountered, clients should attempt to fetch the URL being advertised
598 and read and apply it as a bundle.
599
600 The ``size`` and ``digest:<type>`` parameters should be used to validate
601 that the downloaded bundle matches what was advertised. If a mismatch occurs,
602 the client should abort.
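The validation step can be sketched with ``hashlib``. This is an illustration under assumptions: ``validate_remote_bundle`` is a hypothetical name, ``params`` is assumed to be a dict of already decoded string parameters, and the downloaded bundle is assumed to fit in memory.

```python
import hashlib


def validate_remote_bundle(data, params):
    """Check downloaded bundle bytes against the advertised size and digests.

    Raises ValueError on any mismatch. Hypothetical helper for
    illustration only.
    """
    if len(data) != int(params["size"]):
        raise ValueError("size mismatch")
    # ``digests`` is a space separated list of digest type names.
    for digest_type in params.get("digests", "").split():
        expected = params["digest:" + digest_type]
        actual = hashlib.new(digest_type, data).hexdigest()
        if actual != expected:
            raise ValueError("%s digest mismatch" % digest_type)
```

A production client would more likely hash the data incrementally while downloading rather than after the fact.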
603
604 reply:changegroup
605 -----------------
606
607 The ``reply:changegroup`` part conveys the results of application of a
608 ``changegroup`` part.
609
610 The following part parameters are defined:
611
612 return
613 Integer return code from changegroup application.
614
615 in-reply-to
616 Part ID of part this reply is in response to.
617
618 reply:obsmarkers
619 ----------------
620
621 The ``reply:obsmarkers`` part conveys the results of applying an
622 ``obsmarkers`` part.
623
624 The following part parameters are defined:
625
626 new
627 The integer number of new markers that were applied.
628
629 in-reply-to
630 The part ID that this part is in reply to.
631
632 reply:pushkey
633 -------------
634
635 The ``reply:pushkey`` part conveys the result of a *pushkey* operation.
636
637 The following part parameters are defined:
638
639 return
640 Integer result code from pushkey operation.
641
642 in-reply-to
643 Part ID that triggered this pushkey operation.
644
645 This part has no payload.
646
647 replycaps
648 ---------
649
650 The ``replycaps`` part notifies the receiver that a reply bundle should
651 be created.
652
653 This part has no parameters.
654
655 The payload consists of a bundle2 capabilities blob.
656
657 stream2
658 -------
659
660 The ``stream2`` part contains *streaming clone* version 2 data.
661
662 The following part parameters are defined:
663
664 requirements
665 URL quoted repository requirements string. Requirements are delimited by a
666 comma (``,``).
667
668 filecount
669 The total number of files being transferred in the payload.
670
671 bytecount
672 The total size of file content being transferred in the payload.
673
674 The payload consists of raw stream clone version 2 data.
675
676 The ``filecount`` and ``bytecount`` parameters can be used for progress and
677 reporting purposes. The values may not be exact.
@@ -40,6 +40,7 b''
40
40
41 <Directory Id="help.internaldir" Name="internals">
41 <Directory Id="help.internaldir" Name="internals">
42 <Component Id="help.internals" Guid="$(var.help.internals.guid)" Win64='$(var.IsX64)'>
42 <Component Id="help.internals" Guid="$(var.help.internals.guid)" Win64='$(var.IsX64)'>
43 <File Id="internals.bundle2.txt" Name="bundle2.txt" />
43 <File Id="internals.bundles.txt" Name="bundles.txt" KeyPath="yes" />
44 <File Id="internals.bundles.txt" Name="bundles.txt" KeyPath="yes" />
44 <File Id="internals.censor.txt" Name="censor.txt" />
45 <File Id="internals.censor.txt" Name="censor.txt" />
45 <File Id="internals.changegroups.txt" Name="changegroups.txt" />
46 <File Id="internals.changegroups.txt" Name="changegroups.txt" />
@@ -197,6 +197,8 b' def loaddoc(topic, subdir=None):'
197 return loader
197 return loader
198
198
199 internalstable = sorted([
199 internalstable = sorted([
200 (['bundle2'], _('Bundle2'),
201 loaddoc('bundle2', subdir='internals')),
200 (['bundles'], _('Bundles'),
202 (['bundles'], _('Bundles'),
201 loaddoc('bundles', subdir='internals')),
203 loaddoc('bundles', subdir='internals')),
202 (['censor'], _('Censor'),
204 (['censor'], _('Censor'),
@@ -63,8 +63,7 b' supported by ``HG10`` bundles.'
63
63
64 ``HG20`` is currently the only defined bundle2 version.
64 ``HG20`` is currently the only defined bundle2 version.
65
65
66 The ``HG20`` format is not yet documented here. See the inline comments
66 The ``HG20`` format is documented at :hg:`help internals.bundle2`.
67 in ``mercurial/exchange.py`` for now.
68
67
69 Initial ``HG20`` support was added in Mercurial 3.0 (released May
68 Initial ``HG20`` support was added in Mercurial 3.0 (released May
70 2014). However, bundle2 bundles were hidden behind an experimental flag
69 2014). However, bundle2 bundles were hidden behind an experimental flag
@@ -993,6 +993,7 b' internals topic renders index of availab'
993
993
994 To access a subtopic, use "hg help internals.{subtopic-name}"
994 To access a subtopic, use "hg help internals.{subtopic-name}"
995
995
996 bundle2 Bundle2
996 bundles Bundles
997 bundles Bundles
997 censor Censor
998 censor Censor
998 changegroups Changegroups
999 changegroups Changegroups
@@ -3059,6 +3060,13 b' Sub-topic indexes rendered properly'
3059 <tr><td colspan="2"><h2><a name="topics" href="#topics">Topics</a></h2></td></tr>
3060 <tr><td colspan="2"><h2><a name="topics" href="#topics">Topics</a></h2></td></tr>
3060
3061
3061 <tr><td>
3062 <tr><td>
3063 <a href="/help/internals.bundle2">
3064 bundle2
3065 </a>
3066 </td><td>
3067 Bundle2
3068 </td></tr>
3069 <tr><td>
3062 <a href="/help/internals.bundles">
3070 <a href="/help/internals.bundles">
3063 bundles
3071 bundles
3064 </a>
3072 </a>