wireprotov2: define semantics for content redirects...
Gregory Szorc -
r40058:33eb670e default
@@ -1,519 +1,649 b''
1 1 **Experimental and under development**
2 2
3 3 This document describes Mercurial's transport-agnostic remote procedure
4 4 call (RPC) protocol which is used to perform interactions with remote
5 5 servers. This protocol is also referred to as ``hgrpc``.
6 6
7 7 The protocol has the following high-level features:
8 8
9 9 * Concurrent request and response support (multiple commands can be issued
10 10 simultaneously and responses can be streamed simultaneously).
11 11 * Supports half-duplex and full-duplex connections.
12 12 * All data is transmitted within *frames*, which have a well-defined
13 13 header and encode their length.
14 14 * Side-channels for sending progress updates and printing output. Text
15 15 output from the remote can be localized locally.
16 16 * Support for simultaneous and long-lived compression streams, even across
17 17 requests.
18 18 * Uses CBOR for data exchange.
19 19
20 20 The protocol is not specific to Mercurial and could be used by other
21 21 applications.
22 22
23 23 High-level Overview
24 24 ===================
25 25
26 26 To operate the protocol, a bi-directional, half-duplex pipe supporting
27 27 ordered sends and receives is required. That is, each peer has one pipe
28 28 for sending data and another for receiving. Full-duplex pipes are also
29 29 supported.
30 30
31 31 All data is read and written in atomic units called *frames*. These
32 32 are conceptually similar to TCP packets. Higher-level functionality
33 33 is built on the exchange and processing of frames.
34 34
35 35 All frames are associated with a *stream*. A *stream* provides a
36 36 unidirectional grouping of frames. Streams facilitate two goals:
37 37 content encoding and parallelism. There is a dedicated section on
38 38 streams below.
39 39
40 40 The protocol is request-response based: the client issues requests to
41 41 the server, which issues replies to those requests. Server-initiated
42 42 messaging is not currently supported, but this specification carves
43 43 out room to implement it.
44 44
45 45 All frames are associated with a numbered request. Frames can thus
46 46 be logically grouped by their request ID.
47 47
48 48 Frames
49 49 ======
50 50
51 51 Frames begin with an 8 octet header followed by a variable length
52 52 payload::
53 53
54 54 +------------------------------------------------+
55 55 | Length (24) |
56 56 +--------------------------------+---------------+
57 57 | Request ID (16) | Stream ID (8) |
58 58 +------------------+-------------+---------------+
59 59 | Stream Flags (8) |
60 60 +-----------+------+
61 61 | Type (4) |
62 62 +-----------+
63 63 | Flags (4) |
64 64 +===========+===================================================|
65 65 | Frame Payload (0...) ...
66 66 +---------------------------------------------------------------+
67 67
68 68 The length of the frame payload is expressed as an unsigned 24 bit
69 69 little endian integer. Values larger than 65535 MUST NOT be used unless
70 70 given permission by the server as part of the negotiated capabilities
71 71 during the handshake. The frame header is not part of the advertised
72 72 frame length. The payload length is the over-the-wire length. If there
73 73 is content encoding applied to the payload as part of the frame's stream,
74 74 the length is the output of that content encoding, not the input.
75 75
76 76 The 16-bit ``Request ID`` field denotes the integer request identifier,
77 77 stored as an unsigned little endian integer. Odd numbered requests are
78 78 client-initiated. Even numbered requests are server-initiated. This
79 79 refers to where the *request* was initiated - not where the *frame* was
80 80 initiated, so servers will send frames with odd ``Request ID`` in
81 81 response to client-initiated requests. Implementations are advised to
82 82 number request identifiers starting at ``1`` (client) and ``0`` (server),
83 83 increment by ``2``, and wrap around once all available numbers have been exhausted.
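
The numbering advice above could be implemented along these lines. This Python sketch is hypothetical (``RequestIDAllocator`` is not a Mercurial API). Besides keeping the parity and wrapping at 16 bits, it skips identifiers whose responses are still outstanding, following the Command Protocol rule that a new request must not reuse an active ``Request ID``.

```python
class RequestIDAllocator:
    """Hypothetical allocator for 16-bit request identifiers.

    Clients use odd IDs starting at 1; servers use even IDs starting at 0.
    """

    LIMIT = 1 << 16  # Request ID is an unsigned 16-bit field

    def __init__(self, client: bool) -> None:
        self._next = 1 if client else 0
        self._active: set[int] = set()

    def allocate(self) -> int:
        # There are 2**15 identifiers of each parity; try each at most once.
        for _ in range(self.LIMIT // 2):
            candidate = self._next
            # Step by 2 to keep parity; modulo wraps 65535 -> 1 and 65534 -> 0.
            self._next = (self._next + 2) % self.LIMIT
            if candidate not in self._active:
                self._active.add(candidate)
                return candidate
        raise RuntimeError("every request ID of this parity is still active")

    def release(self, request_id: int) -> None:
        # Call once the response for request_id has been fully received.
        self._active.discard(request_id)
```

A client would call ``release()`` once a request's response has been fully received, making that identifier reusable after wrap-around.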
84 84
85 85 The 8-bit ``Stream ID`` field denotes the stream that the frame is
86 86 associated with. Frames belonging to a stream may have content
87 87 encoding applied and the receiver may need to decode the raw frame
88 88 payload to obtain the original data. Odd numbered IDs are
89 89 client-initiated. Even numbered IDs are server-initiated.
90 90
91 91 The 8-bit ``Stream Flags`` field defines stream processing semantics.
92 92 See the section on streams below.
93 93
94 94 The 4-bit ``Type`` field denotes the type of frame being sent.
95 95
96 96 The 4-bit ``Flags`` field defines special, per-type attributes for
97 97 the frame.
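
To make the header layout concrete, the following Python sketch packs and unpacks the 8-octet header with the standard ``struct`` module. The function and type names are illustrative. It assumes ``Type`` occupies the high nibble and ``Flags`` the low nibble of the final header byte, matching the order shown in the diagram.

```python
import struct
from typing import NamedTuple

FRAME_HEADER_SIZE = 8


class FrameHeader(NamedTuple):
    length: int        # payload length, excluding the header itself
    request_id: int
    stream_id: int
    stream_flags: int
    type_id: int       # 4-bit frame type
    flags: int         # 4-bit per-type flags


def pack_header(request_id: int, stream_id: int, stream_flags: int,
                type_id: int, flags: int, payload_length: int) -> bytes:
    if not 0 <= payload_length < 1 << 24:
        raise ValueError("payload length must fit in 24 bits")
    if not (0 <= type_id <= 0xF and 0 <= flags <= 0xF):
        raise ValueError("type and flags are 4-bit values")
    # 24-bit little endian length: pack as 32 bits and keep the low 3 bytes.
    length = struct.pack("<I", payload_length)[:3]
    # 16-bit request ID, 8-bit stream ID, 8-bit stream flags, type/flags byte.
    rest = struct.pack("<HBBB", request_id, stream_id, stream_flags,
                       (type_id << 4) | flags)
    return length + rest


def unpack_header(data: bytes) -> FrameHeader:
    if len(data) < FRAME_HEADER_SIZE:
        raise ValueError("need at least 8 bytes")
    length = int.from_bytes(data[0:3], "little")
    request_id, stream_id, stream_flags, typeflags = struct.unpack_from(
        "<HBBB", data, 3)
    return FrameHeader(length, request_id, stream_id, stream_flags,
                       typeflags >> 4, typeflags & 0x0F)
```

The 24-bit field can encode lengths above 65535; enforcing that limit unless a larger one was negotiated is left to the caller in this sketch.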
98 98
99 99 The sections below define the frame types and their behavior.
100 100
101 101 Command Request (``0x01``)
102 102 --------------------------
103 103
104 104 This frame contains a request to run a command.
105 105
106 106 The payload consists of a CBOR map defining the command request. The
107 107 bytestring keys of that map are:
108 108
109 109 name
110 110 Name of the command that should be executed (bytestring).
111 111 args
112 112 Map of bytestring keys to various value types containing the named
113 113 arguments to this command.
114 114
115 115 Each command defines its own set of argument names and their expected
116 116 types.
117 117
118 redirect (optional)
119 (map) Advertises client support for following response *redirects*.
120
121 This map has the following bytestring keys:
122
123 targets
124 (array of bytestring) List of named redirect targets supported by
125 this client. The names come from the targets advertised by the
126 server's *capabilities* message.
127
128 hashes
129 (array of bytestring) List of preferred hashing algorithms that can
130 be used for content integrity verification.
131
132 See the *Content Redirects* section below for more on content redirects.
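
As an illustration, a command request advertising redirect support could be assembled as below. The Python standard library has no CBOR codec, so the sketch carries a tiny encoder for just the types it needs (a complete CBOR library is preferable in practice). The command name ``capabilities``, the ``cdn`` target, and the ``sha256``/``sha1`` hash names are assumptions chosen for the example.

```python
def _cbor_head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus argument (definite lengths only)."""
    if value < 24:
        return bytes([(major << 5) | value])
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([(major << 5) | extra]) + value.to_bytes(size, "big")
    raise ValueError("value too large for CBOR")


def cbor_encode(obj) -> bytes:
    """Minimal CBOR encoder: booleans, unsigned ints, bytestrings, lists, maps."""
    if isinstance(obj, bool):  # checked first: bool is a subclass of int
        return b"\xf5" if obj else b"\xf4"
    if isinstance(obj, int) and obj >= 0:
        return _cbor_head(0, obj)
    if isinstance(obj, bytes):
        return _cbor_head(2, len(obj)) + obj
    if isinstance(obj, list):
        return _cbor_head(4, len(obj)) + b"".join(cbor_encode(x) for x in obj)
    if isinstance(obj, dict):
        return _cbor_head(5, len(obj)) + b"".join(
            cbor_encode(k) + cbor_encode(v) for k, v in obj.items())
    raise TypeError(f"unsupported type: {type(obj).__name__}")


command_request = {
    b"name": b"capabilities",  # hypothetical command choice
    b"args": {},
    b"redirect": {
        # Target names must come from the server's capabilities message.
        b"targets": [b"cdn"],
        # Preferred hash algorithms for content integrity verification.
        b"hashes": [b"sha256", b"sha1"],
    },
}

payload = cbor_encode(command_request)
```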
133
118 134 This frame type MUST ONLY be sent from clients to servers: it is illegal
119 135 for a server to send this frame to a client.
120 136
121 137 The following flag values are defined for this type:
122 138
123 139 0x01
124 140 New command request. When set, this frame represents the beginning
125 141 of a new request to run a command. The ``Request ID`` attached to this
126 142 frame MUST NOT be active.
127 143 0x02
128 144 Command request continuation. When set, this frame is a continuation
129 145 from a previous command request frame for its ``Request ID``. This
130 146 flag is set when the CBOR data for a command request does not fit
131 147 in a single frame.
132 148 0x04
133 149 Additional frames expected. When set, the command request didn't fit
134 150 into a single frame and additional CBOR data follows in a subsequent
135 151 frame.
136 152 0x08
137 153 Command data frames expected. When set, command data frames are
138 154 expected to follow the final command request frame for this request.
139 155
140 156 ``0x01`` MUST be set on the initial command request frame for a
141 157 ``Request ID``.
142 158
143 159 ``0x01`` or ``0x02`` MUST be set to indicate this frame's role in
144 160 a series of command request frames.
145 161
146 162 If command data frames are to be sent, ``0x08`` MUST be set on ALL
147 163 command request frames.
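
These flag rules can be captured in a small helper that splits an already CBOR-encoded command request into chunks and computes each frame's flags. This is a hypothetical Python sketch; ``max_payload`` stands in for whatever payload size limit applies.

```python
FLAG_NEW_REQUEST = 0x01
FLAG_CONTINUATION = 0x02
FLAG_MORE_FRAMES = 0x04
FLAG_EXPECT_DATA = 0x08


def command_request_frames(payload: bytes, max_payload: int,
                           has_data: bool) -> list[tuple[int, bytes]]:
    """Return (flags, chunk) pairs for the Command Request frames of one request."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    chunks = [payload[i:i + max_payload]
              for i in range(0, len(payload), max_payload)] or [b""]
    frames = []
    for index, chunk in enumerate(chunks):
        # 0x01 on the first frame, 0x02 on every continuation frame.
        flags = FLAG_NEW_REQUEST if index == 0 else FLAG_CONTINUATION
        if index < len(chunks) - 1:
            flags |= FLAG_MORE_FRAMES  # more CBOR data follows
        if has_data:
            flags |= FLAG_EXPECT_DATA  # must be set on ALL request frames
        frames.append((flags, chunk))
    return frames
```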
148 164
149 165 Command Data (``0x02``)
150 166 -----------------------
151 167
152 168 This frame contains raw data for a command.
153 169
154 170 Most commands can be executed by specifying arguments. However,
155 171 arguments have an upper bound to their length. Commands that accept
156 172 data beyond this length, or whose data length isn't known when the
157 173 command is initially sent, need to stream arbitrary data to the
158 174 server. This frame type facilitates the sending of
159 175 this data.
160 176
161 177 The payload of this frame type consists of a stream of raw data to be
162 178 consumed by the command handler on the server. The format of the data
163 179 is command specific.
164 180
165 181 The following flag values are defined for this type:
166 182
167 183 0x01
168 184 Command data continuation. When set, the data for this command
169 185 continues into a subsequent frame.
170 186
171 187 0x02
172 188 End of data. When set, command data has been fully sent to the
173 189 server. The command has been fully issued and no new data for this
174 190 command will be sent. The next frame will belong to a new command.
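
A sender might emit Command Data frames with a generator such as this hypothetical Python sketch: every frame but the last carries ``0x01``, and the last (possibly empty) frame carries ``0x02``.

```python
from typing import Iterator

DATA_FLAG_CONTINUATION = 0x01
DATA_FLAG_EOS = 0x02


def command_data_frames(data: bytes, max_payload: int) -> Iterator[tuple[int, bytes]]:
    """Yield (flags, chunk) pairs for the Command Data frames of one request."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    offset = 0
    while True:
        chunk = data[offset:offset + max_payload]
        offset += len(chunk)
        if offset >= len(data):
            # Last (possibly empty) chunk: no more data for this command.
            yield DATA_FLAG_EOS, chunk
            return
        yield DATA_FLAG_CONTINUATION, chunk
```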
175 191
176 192 Command Response Data (``0x03``)
177 193 --------------------------------
178 194
179 195 This frame contains response data to an issued command.
180 196
181 197 Response data ALWAYS consists of a series of 1 or more CBOR encoded
182 198 values. A CBOR value may use indefinite length encoding, and the
183 199 bytes constituting the value may span several frames.
184 200
185 201 The following flag values are defined for this type:
186 202
187 203 0x01
188 204 Data continuation. When set, an additional frame containing response data
189 205 will follow.
190 206 0x02
191 207 End of data. When set, the response data has been fully sent and
192 208 no additional frames for this response will be sent.
193 209
194 210 The ``0x01`` flag is mutually exclusive with the ``0x02`` flag.
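
A receiver could reassemble response bytes for one ``Request ID`` as sketched below in Python, concatenating payloads until the end-of-data flag arrives. The sketch rejects frames carrying both flags and, as an assumption beyond what this section states, also rejects frames carrying neither. Decoding the resulting CBOR values is out of scope here.

```python
from typing import Iterable

RESPONSE_FLAG_CONTINUATION = 0x01
RESPONSE_FLAG_EOS = 0x02


def reassemble_response(frames: Iterable[tuple[int, bytes]]) -> bytes:
    """Concatenate Command Response Data payloads for a single request ID."""
    buffer = bytearray()
    for flags, payload in frames:
        more = bool(flags & RESPONSE_FLAG_CONTINUATION)
        done = bool(flags & RESPONSE_FLAG_EOS)
        if more == done:
            raise ValueError("exactly one of 0x01 and 0x02 must be set")
        buffer += payload
        if done:
            return bytes(buffer)
    raise ValueError("frames ended before the end-of-data frame")
```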
195 211
196 212 Error Occurred (``0x05``)
197 213 -------------------------
198 214
199 215 Some kind of error occurred.
200 216
201 217 There are 3 general kinds of failures that can occur:
202 218
203 219 * Command error encountered before any response issued
204 220 * Command error encountered after a response was issued
205 221 * Protocol or stream level error
206 222
207 223 This frame type is used to capture the latter cases. (The general
208 224 command error case is handled by the leading CBOR map in
209 225 ``Command Response`` frames.)
210 226
211 227 The payload of this frame contains a CBOR map detailing the error. That
212 228 map has the following bytestring keys:
213 229
214 230 type
215 231 (bytestring) The overall type of error encountered. Can be one of the
216 232 following values:
217 233
218 234 protocol
219 235 A protocol-level error occurred. This typically means someone
220 236 is violating the framing protocol semantics and the server is
221 237 refusing to proceed.
222 238
223 239 server
224 240 A server-level error occurred. This typically indicates some kind of
225 241 logic error on the server, likely the fault of the server.
226 242
227 243 command
228 244 A command-level error, likely the fault of the client.
229 245
230 246 message
231 247 (array of maps) A richly formatted message that is intended for
232 248 human consumption. See the ``Human Output Side-Channel`` frame
233 249 section for a description of the format of this data structure.
234 250
235 251 Human Output Side-Channel (``0x06``)
236 252 ------------------------------------
237 253
238 254 This frame contains a message that is intended to be displayed to
239 255 people. Whereas most frames communicate machine readable data, this
240 256 frame communicates textual data that is intended to be shown to
241 257 humans.
242 258
243 259 The frame consists of a series of *formatting requests*. Each formatting
244 260 request consists of a formatting string, arguments for that formatting
245 261 string, and labels to apply to that formatting string.
246 262
247 263 A formatting string is a printf()-like string that allows variable
248 264 substitution within the string. Labels allow the rendered text to be
249 265 *decorated*. Assuming use of the canonical Mercurial code base, a
250 266 formatting string can be the input to the ``i18n._`` function. This
251 267 allows messages emitted from the server to be localized. So even if
252 268 the server has different i18n settings, people could see messages in
253 269 their *native* settings. Similarly, the use of labels allows
254 270 decorations like coloring and underlining to be applied using the
255 271 client's configured rendering settings.
256 272
257 273 Formatting strings are similar to ``printf()`` strings or how
258 274 Python's ``%`` operator works. The only supported formatting sequences
259 275 are ``%s`` and ``%%``. ``%s`` will be replaced by whatever the string
260 276 at that position resolves to. ``%%`` will be replaced by ``%``. All
261 277 other 2-byte sequences beginning with ``%`` represent a literal
262 278 ``%`` followed by that character. However, future versions of the
263 279 wire protocol reserve the right to allow clients to opt in to receiving
264 280 formatting strings with additional formatters, hence why ``%%`` is
265 281 required to represent the literal ``%``.
266 282
267 283 The frame payload consists of a CBOR array of CBOR maps. Each map
268 284 defines an *atom* of text data to print. Each *atom* has the following
269 285 bytestring keys:
270 286
271 287 msg
272 288 (bytestring) The formatting string. Content MUST be ASCII.
273 289 args (optional)
274 290 Array of bytestrings defining arguments to the formatting string.
275 291 labels (optional)
276 292 Array of bytestrings defining labels to apply to this atom.
277 293
278 294 All data to be printed MUST be encoded into a single frame: this frame
279 295 does not support spanning data across multiple frames.
280 296
281 297 All textual data encoded in these frames is assumed to be line delimited.
282 298 The last atom in the frame SHOULD end with a newline (``\n``). If it
283 299 doesn't, clients MAY add a newline to facilitate immediate printing.
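
The formatting rules and atom layout above can be sketched in Python as follows. This is an illustrative renderer under assumptions, not Mercurial's implementation: it ignores ``labels`` (a real client would map them to colors or other decorations), raises an error when a ``%s`` has no matching argument, and appends the optional trailing newline.

```python
def format_atom(msg: bytes, args: list[bytes]) -> bytes:
    """Apply the %s / %% formatting rules to a single atom."""
    out = bytearray()
    arg_index = 0
    i = 0
    while i < len(msg):
        if msg[i:i + 1] == b"%" and i + 1 < len(msg):
            code = msg[i + 1:i + 2]
            if code == b"s":
                if arg_index >= len(args):
                    raise ValueError("not enough arguments for formatting string")
                out += args[arg_index]
                arg_index += 1
            elif code == b"%":
                out += b"%"
            else:
                # Any other %X sequence is a literal '%' followed by X.
                out += b"%" + code
            i += 2
        else:
            out += msg[i:i + 1]
            i += 1
    return bytes(out)


def render_atoms(atoms: list[dict]) -> bytes:
    """Render a decoded Human Output Side-Channel payload (array of maps)."""
    text = bytearray()
    for atom in atoms:
        msg = atom[b"msg"]
        msg.decode("ascii")  # raises UnicodeDecodeError if msg is not ASCII
        text += format_atom(msg, atom.get(b"args", []))
    if not text.endswith(b"\n"):
        text += b"\n"  # clients MAY add a newline for immediate printing
    return bytes(text)
```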
284 300
285 301 Progress Update (``0x07``)
286 302 --------------------------
287 303
288 304 This frame holds the progress of an operation on the peer. Consumption
289 305 of these frames allows clients to display progress bars, estimated
290 306 completion times, etc.
291 307
292 308 Each frame defines the progress of a single operation on the peer. The
293 309 payload consists of a CBOR map with the following bytestring keys:
294 310
295 311 topic
296 312 Topic name (string)
297 313 pos
298 314 Current numeric position within the topic (integer)
299 315 total
300 316 Total/end numeric position of this topic (unsigned integer)
301 317 label (optional)
302 318 Unit label (string)
303 319 item (optional)
304 320 Item name (string)
305 321
306 322 Progress state is created when a frame is received referencing a
307 323 *topic* that isn't currently tracked. Progress tracking for that
308 324 *topic* is finished when a frame is received reporting the current
309 325 position of that topic as ``-1``.
310 326
311 327 Multiple *topics* may be active at any given time.
312 328
313 329 Rendering of progress information is not mandated or governed by this
314 330 specification: implementations MAY render progress information however
315 331 they see fit, including not at all.
316 332
317 333 Strings describing the topic SHOULD be static to
318 334 facilitate receivers localizing that string data. The emitter
319 335 MUST normalize all string data to valid UTF-8 and receivers SHOULD
320 336 validate that received data conforms to UTF-8. The topic name
321 337 SHOULD be ASCII.
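
A receiver might track topics as in the hypothetical Python sketch below: the first update for an untracked topic creates its state, and a ``pos`` of ``-1`` discards it. Payloads are assumed to be already decoded from CBOR into Python dicts with bytestring keys.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TopicState:
    pos: int
    total: Optional[int] = None
    label: Optional[str] = None
    item: Optional[str] = None


class ProgressTracker:
    def __init__(self) -> None:
        self.topics: dict[str, TopicState] = {}

    def handle(self, update: dict) -> None:
        """Apply one decoded Progress Update payload (bytestring keys)."""
        topic = update[b"topic"]
        pos = update[b"pos"]
        if pos == -1:
            # A position of -1 finishes tracking for this topic.
            self.topics.pop(topic, None)
            return
        state = self.topics.get(topic)
        if state is None:
            # The first frame for an untracked topic creates its state.
            state = TopicState(pos)
            self.topics[topic] = state
        state.pos = pos
        state.total = update.get(b"total", state.total)
        state.label = update.get(b"label", state.label)
        state.item = update.get(b"item", state.item)
```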
322 338
323 339 Stream Encoding Settings (``0x08``)
324 340 -----------------------------------
325 341
326 342 This frame type holds information defining the content encoding
327 343 settings for a *stream*.
328 344
329 345 This frame type is likely consumed by the protocol layer and is not
330 346 passed on to applications.
331 347
332 348 This frame type MUST ONLY occur on frames having the *Beginning of Stream*
333 349 ``Stream Flag`` set.
334 350
335 351 The payload of this frame defines what content encoding has (possibly)
336 352 been applied to the payloads of subsequent frames in this stream.
337 353
338 354 The payload begins with an 8-bit integer defining the length of the
339 355 encoding *profile*, followed by the string name of that profile, which
340 356 must be an ASCII string. All bytes that follow can be used by that
341 357 profile for supplemental settings definitions. See the section below
342 358 on defined encoding profiles.
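
Encoding and decoding this payload might look like the following Python sketch (function names are hypothetical).

```python
def encode_stream_settings(profile: str, extra: bytes = b"") -> bytes:
    name = profile.encode("ascii")  # profile names must be ASCII
    if len(name) > 255:
        raise ValueError("profile name length must fit in 8 bits")
    return bytes([len(name)]) + name + extra


def decode_stream_settings(payload: bytes) -> tuple[str, bytes]:
    """Return (profile name, profile-specific settings bytes)."""
    if not payload:
        raise ValueError("empty Stream Encoding Settings payload")
    name_length = payload[0]
    if len(payload) < 1 + name_length:
        raise ValueError("truncated profile name")
    name = payload[1:1 + name_length].decode("ascii")
    return name, payload[1 + name_length:]
```

For example, ``decode_stream_settings(b"\x04zstdextra")`` returns ``("zstd", b"extra")``; the ``zstd`` name is only illustrative, since no profiles are defined yet.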
343 359
344 360 Stream States and Flags
345 361 =======================
346 362
347 363 Streams can be in two states: *open* and *closed*. An *open* stream
348 364 is active and frames attached to that stream could arrive at any time.
349 365 A *closed* stream is not active. If a frame attached to a *closed*
350 366 stream arrives, that frame MUST have an appropriate stream flag
351 367 set indicating beginning of stream. All streams are in the *closed*
352 368 state by default.
353 369
354 370 The ``Stream Flags`` field denotes a set of bit flags for defining
355 371 the relationship of this frame within a stream. The following flags
356 372 are defined:
357 373
358 374 0x01
359 375 Beginning of stream. The first frame in the stream MUST set this
360 376 flag. When received, the ``Stream ID`` this frame is attached to
361 377 becomes ``open``.
362 378
363 379 0x02
364 380 End of stream. The last frame in a stream MUST set this flag. When
365 381 received, the ``Stream ID`` this frame is attached to becomes
366 382 ``closed``. Any content encoding context associated with this stream
367 383 can be destroyed after processing the payload of this frame.
368 384
369 385 0x04
370 386 Apply content encoding. When set, any content encoding settings
371 387 defined by the stream should be applied when attempting to read
372 388 the frame. When not set, the frame payload isn't encoded.
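
The open/closed rules can be tracked per ``Stream ID`` with a small state machine like this hypothetical Python sketch, whose ``on_frame`` method also reports whether the payload must be decoded. Rejecting a *Beginning of stream* flag on a stream that is already open is an assumption not stated above.

```python
STREAM_FLAG_BEGIN = 0x01
STREAM_FLAG_END = 0x02
STREAM_FLAG_ENCODED = 0x04


class StreamStates:
    def __init__(self) -> None:
        self.open_streams: set[int] = set()  # all streams start closed

    def on_frame(self, stream_id: int, stream_flags: int) -> bool:
        """Update stream state for one frame; return True if it must be decoded."""
        if stream_id in self.open_streams:
            if stream_flags & STREAM_FLAG_BEGIN:
                raise ValueError(f"stream {stream_id} is already open")
        elif stream_flags & STREAM_FLAG_BEGIN:
            self.open_streams.add(stream_id)
        else:
            raise ValueError(f"frame on closed stream {stream_id} lacks 0x01")
        needs_decoding = bool(stream_flags & STREAM_FLAG_ENCODED)
        if stream_flags & STREAM_FLAG_END:
            # Any encoding context may be destroyed after this payload.
            self.open_streams.discard(stream_id)
        return needs_decoding
```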
373 389
374 390 Streams
375 391 =======
376 392
377 393 Streams - along with ``Request IDs`` - facilitate grouping of frames.
378 394 But the purpose of each is quite different and the groupings they
379 395 constitute are independent.
380 396
381 397 A ``Request ID`` is essentially a tag. It tells you which logical
382 398 request a frame is associated with.
383 399
384 400 A *stream* is a sequence of frames grouped for the express purpose
385 401 of applying a stateful encoding or for denoting sub-groups of frames.
386 402
387 403 Unlike ``Request IDs``, which span the request and response, a stream
388 404 is unidirectional and stream IDs are independent from client to
389 405 server.
390 406
391 407 There is no strict hierarchical relationship between ``Request IDs``
392 408 and *streams*. A stream can contain frames having multiple
393 409 ``Request IDs``. Frames belonging to the same ``Request ID`` can
394 410 span multiple streams.
395 411
396 412 One goal of streams is to facilitate content encoding. A stream can
397 413 define an encoding to be applied to frame payloads. For example, the
398 414 payload transmitted over the wire may contain output from a
399 415 zstandard compression operation and the receiving end may decompress
400 416 that payload to obtain the original data.
401 417
402 418 The other goal of streams is to facilitate concurrent execution. For
403 419 example, a server could spawn 4 threads to service a request that can
404 420 be easily parallelized. Each of those 4 threads could write into its
405 421 own stream. Those streams could then in turn be delivered to 4 threads
406 422 on the receiving end, with each thread consuming its stream in near
407 423 isolation. The *main* thread on both ends merely does I/O and
408 424 encodes/decodes frame headers: the bulk of the work is done by worker
409 425 threads.
410 426
411 427 In addition, since content encoding is defined per stream, each
412 428 *worker thread* could perform potentially CPU bound work concurrently
413 429 with other threads. This approach of applying encoding at the
414 430 sub-protocol / stream level eliminates a potential resource constraint
415 431 on the protocol stream as a whole (it is common for the throughput of
416 432 a compression engine to be smaller than the throughput of a network).
417 433
418 434 Having multiple streams - each with their own encoding settings - also
419 435 facilitates the use of advanced data compression techniques. For
420 436 example, a transmitter could see that it is generating data faster
421 437 or slower than the receiving end is consuming it and adjust its
422 438 compression settings to trade CPU for compression ratio accordingly.
423 439
424 440 While streams can define a content encoding, not all frames within
425 441 that stream must use that content encoding. This can be useful when
426 442 some data is served from caches and other data is derived dynamically.
427 443 A cache could hold pre-compressed data so the server doesn't have to
428 444 recompress it. The ability to pick and choose which frames are
429 445 compressed allows servers to easily send data to the wire without
430 446 involving potentially expensive encoding overhead.
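
To illustrate stateful per-stream encoding in which only some frames are encoded, this sketch uses Python's ``zlib`` module as a stand-in; it is not a defined content encoding profile. Each encoded frame is flushed with ``zlib.Z_SYNC_FLUSH`` so the receiver can decode it immediately while compression history carries across frames, and frames sent as-is leave stream flag ``0x04`` unset.

```python
import zlib

APPLY_ENCODING = 0x04  # "Apply content encoding" stream flag


class EncodingSender:
    def __init__(self) -> None:
        # One compressor per stream; its history spans frames.
        self._compressor = zlib.compressobj()

    def frame(self, data: bytes, encode: bool) -> tuple[int, bytes]:
        """Return (stream flags, payload) for one frame's data."""
        if not encode:
            # e.g. cached data that is already compressed
            return 0, data
        out = self._compressor.compress(data)
        out += self._compressor.flush(zlib.Z_SYNC_FLUSH)
        return APPLY_ENCODING, out


class EncodingReceiver:
    def __init__(self) -> None:
        self._decompressor = zlib.decompressobj()

    def read(self, stream_flags: int, payload: bytes) -> bytes:
        if stream_flags & APPLY_ENCODING:
            return self._decompressor.decompress(payload)
        return payload  # payload isn't encoded
```

Because unencoded frames never pass through the compressor, they do not disturb the shared compression state on either end.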
431 447
432 448 Content Encoding Profiles
433 449 =========================
434 450
435 451 Streams can have named content encoding *profiles* associated with
436 452 them. A profile defines a shared understanding of content encoding
437 453 settings and behavior.
438 454
439 455 The following profiles are defined:
440 456
441 457 TBD
442 458
443 459 Command Protocol
444 460 ================
445 461
446 462 A client can request that a remote run a command by sending it
447 463 frames defining that command. This logical stream is composed of
448 464 1 or more ``Command Request`` frames and 0 or more ``Command Data``
449 465 frames.
450 466
451 467 All frames composing a single command request MUST be associated with
452 468 the same ``Request ID``.
453 469
454 470 Clients MAY send additional command requests without waiting on the
455 471 response to a previous command request. If they do so, they MUST ensure
456 472 that the ``Request ID`` field of outbound frames does not conflict
457 473 with that of an active ``Request ID`` whose response has not yet been
458 474 fully received.
459 475
460 476 Servers MAY respond to commands in a different order than they were
461 477 sent over the wire. Clients MUST be prepared to deal with this. Servers
462 478 also MAY start executing commands in a different order than they were
463 479 received, or MAY execute multiple commands concurrently.
464 480
465 481 If there is a dependency between commands or a race condition between
466 482 commands executing (e.g. a read-only command that depends on the results
467 483 of a command that mutates the repository), then clients MUST NOT send
468 484 frames issuing a command until a response to all dependent commands has
469 485 been received.
470 486 TODO think about whether we should express dependencies between commands
471 487 to avoid roundtrip latency.
472 488
473 489 A command is defined by a command name, 0 or more command arguments,
474 490 and optional command data.
475 491
476 492 Arguments are the recommended mechanism for transferring fixed sets of
477 493 parameters to a command. Data is appropriate for transferring variable
478 494 data. Thinking in terms of HTTP, arguments would be headers and data
479 495 would be the message body.
480 496
481 497 It is recommended for servers to delay the dispatch of a command
482 498 until all arguments have been received. Servers MAY impose limits on the
483 499 maximum argument size.
484 500 TODO define failure mechanism.
485 501
486 502 Servers MAY dispatch to commands immediately once argument data
487 503 is available or delay until command data is received in full.
488 504
489 505 Once a ``Command Request`` frame is sent, a client must be prepared to
490 506 receive any of the following frames associated with that request:
491 507 ``Command Response``, ``Error Occurred``, ``Human Output Side-Channel``,
492 508 ``Progress Update``.
493 509
494 510 The *main* response for a command will be in ``Command Response`` frames.
495 511 The payloads of these frames consist of 1 or more CBOR encoded values.
496 512 The first CBOR value on the first ``Command Response`` frame is special
497 513 and denotes the overall status of the command. This CBOR map contains
498 514 the following bytestring keys:
499 515
500 516 status
501 517 (bytestring) A well-defined message containing the overall status of
502 518 this command request. The following values are defined:
503 519
504 520 ok
505 521 The command was received successfully and its response follows.
506 522 error
507 523 There was an error processing the command. More details about the
508 524 error are encoded in the ``error`` key.
525 redirect
526 The response for this command is available elsewhere. Details on
527 where to fetch it are in the ``location`` key.
509 528
510 529 error (optional)
511 530 A map containing information about an encountered error. The map has the
512 531 following keys:
513 532
514 533 message
515 534 (array of maps) A message describing the error. The message uses the
516 535 same format as those in the ``Human Output Side-Channel`` frame.
517 536
537 location (optional)
538 (map) Presence indicates that a *content redirect* has occurred. The map
539 provides the external location of the content.
540
541 This map contains the following bytestring keys:
542
543 url
544 (bytestring) URL from which this content may be requested.
545
546 mediatype
547 (bytestring) The media type for the fetched content. e.g.
548 ``application/mercurial-*``.
549
550 In some transports, this value is also advertised by the transport,
551 e.g. as the ``Content-Type`` HTTP header.
552
553 size (optional)
554 (unsigned integer) Total size of the remote object in bytes. This is
555 the raw size of the entity that will be fetched, minus any
556 non-Mercurial protocol encoding (e.g. HTTP content or transfer
557 encoding).
558
559 fullhashes (optional)
560 (array of arrays) Content hashes for the entire payload. Each entry
561 is an array of bytestrings containing the hash name and the hash value.
562
563 fullhashseed (optional)
564 (bytestring) Optional seed value to feed into the hasher for full content
565 hash verification.
566
567 serverdercerts (optional)
568 (array of bytestring) DER encoded x509 certificates for the server. When
569 defined, clients MAY validate that the x509 certificate on the target
570 server exactly matches a certificate listed here.
571
572 servercadercerts (optional)
573 (array of bytestring) DER encoded x509 certificates for the certificate
574 authority of the target server. When defined, clients MAY validate that
575 the target server's x509 certificate was signed by a CA certificate in this set.
576
577 # TODO support for giving client an x509 certificate pair to be used as a
578 # client certificate.
579
580 # TODO support common authentication mechanisms (e.g. HTTP basic/digest
581 # auth).
582
583 # TODO support custom authentication mechanisms. This likely requires
584 # server to advertise required auth mechanism so client can filter.
585
586 # TODO support chained hashes. e.g. hash for each 1MB segment so client
587 # can iteratively validate data without having to consume all of it first.
588
518 589 TODO formalize when error frames can be seen and how errors can be
519 590 recognized midway through a command response.
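
Client-side handling of the leading status map, including integrity checks on redirected content, might look like this Python sketch. It operates on already-decoded CBOR values. Two details are assumptions rather than requirements of this text: ``fullhashseed`` is fed to each hasher before the content, and hash names are treated as ``hashlib`` algorithm names, with unrecognized algorithms skipped (though at least one advertised hash must verify).

```python
import hashlib


def interpret_status(overall: dict) -> tuple[str, object]:
    """Classify the first CBOR map of a command response."""
    status = overall[b"status"]
    if status == b"ok":
        return "ok", None
    if status == b"error":
        return "error", overall.get(b"error", {}).get(b"message", [])
    if status == b"redirect":
        return "redirect", overall[b"location"]
    raise ValueError(f"unknown status: {status!r}")


def verify_redirected_content(location: dict, content: bytes) -> bool:
    """Check fetched content against the optional size and fullhashes keys."""
    size = location.get(b"size")
    if size is not None and size != len(content):
        return False
    hashes = location.get(b"fullhashes", [])
    seed = location.get(b"fullhashseed", b"")
    verified = 0
    for name, expected in hashes:
        try:
            hasher = hashlib.new(name.decode("ascii"))
        except ValueError:
            continue  # algorithm not available locally
        hasher.update(seed)
        hasher.update(content)
        if hasher.digest() != expected:
            return False
        verified += 1
    # If hashes were advertised, require at least one to be checkable.
    return verified > 0 or not hashes
```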
591
592 Content Redirects
593 =================
594
595 Servers have the ability to respond to ANY command request with a
596 *redirect* to another location. Such a response is referred to as a *redirect
597 response*. (This feature is conceptually similar to HTTP redirects, but is
598 more powerful.)
599
600 A *redirect response* MUST ONLY be issued if the client advertises support
601 for a redirect *target*.
605
606 Clients advertise support for *redirect responses* after looking at the server's
607 *capabilities* data, which is fetched during the initial server connection
608 handshake. The server's capabilities data advertises named *targets* for
609 potential redirects.
610
611 Each target is described by a protocol name, connection and protocol features,
612 etc. The server also advertises target-agnostic redirect settings, such as
613 which hash algorithms are supported for content integrity checking. (See
614 the documentation for the *capabilities* command for more.)
615
616 Clients examine the set of advertised redirect targets for compatibility.
617 When sending a command request, the client advertises the set of redirect
618 target names it is willing to follow, along with some other settings influencing
619 behavior.
620
621 For example, say the server is advertising a ``cdn`` redirect target that
622 requires SNI and TLS 1.2. If the client supports those features, it will
623 send command requests stating that the ``cdn`` target is acceptable to use.
624 But if the client doesn't support SNI or TLS 1.2 (or maybe it encountered an
625 error using this target from a previous request), then it omits this target
626 name.
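
The filtering just described might be sketched in Python as follows. The fields of ``RedirectTarget`` (``protocol``, ``snirequired``, ``tlsversions``) are hypothetical stand-ins for whatever the *capabilities* command actually advertises; the essential behavior is that the client lists only the target names it can use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedirectTarget:
    """Hypothetical description of a server-advertised redirect target."""
    name: bytes
    protocol: bytes
    snirequired: bool = False
    tlsversions: tuple[bytes, ...] = ()  # empty: no TLS version constraint


def acceptable_targets(advertised, protocols, tls_versions, have_sni,
                       failed=frozenset()):
    """Return names of targets the client is willing to follow."""
    names = []
    for target in advertised:
        if target.name in failed:
            continue  # e.g. this target errored on a previous request
        if target.protocol not in protocols:
            continue
        if target.snirequired and not have_sni:
            continue
        if target.tlsversions and not set(target.tlsversions) & set(tls_versions):
            continue
        names.append(target.name)
    return names
```

The returned names would populate the ``targets`` key of the ``redirect`` map in a command request.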
627
628 If the client advertises support for a redirect target, the server MAY
629 replace the normal, inline response data with a *redirect response*:
630 one whose initial CBOR map has a ``status`` key with value ``redirect``.
631
632 The *redirect response* at a minimum advertises the URL where the response
633 can be retrieved.
634
635 The *redirect response* MAY also advertise additional details about that
636 content and how to retrieve it. Notably, the response may contain the
637 x509 public certificates for the server being redirected to or the
638 certificate authority that signed that server's certificate. Unless the
639 client has existing settings that offer stronger trust validation than what
640 the server advertises, the client SHOULD use the server-provided certificates
641 when validating the connection to the remote server in place of any default
642 connection verification checks. This is because certificates coming from
643 the server SHOULD establish a stronger chain of trust than what the default
644 certificate validation mechanism in most environments provides. (By default,
645 certificate validation ensures the signer of the cert chains up to a set of
646 trusted root certificates. And if an explicit certificate or CA certificate
647 is presented, that greatly reduces the set of certificates that will be
648 recognized as valid, thus reducing the potential for a "bad" certificate
649 to be used and trusted.)
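
With Python's standard ``ssl`` module, the certificate guidance could be applied roughly as sketched here. This is an illustrative outline, not a vetted security implementation, and it assumes the client has no stronger trust settings of its own: when ``servercadercerts`` is present the context trusts only those CAs, and ``serverdercerts`` is checked by comparing the peer's DER certificate (for example from ``SSLSocket.getpeercert(binary_form=True)``) against the advertised list after the handshake. The function names are hypothetical.

```python
import ssl


def context_for_redirect(location: dict) -> ssl.SSLContext:
    """Build a client TLS context for fetching redirected content."""
    ca_certs = location.get(b"servercadercerts")
    if not ca_certs:
        # No CA certificates advertised: fall back to default verification.
        return ssl.create_default_context()
    # Trust only the advertised CAs instead of the system trust store.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # verifies certs and hostnames
    for der in ca_certs:
        ctx.load_verify_locations(cadata=der)  # cadata accepts DER bytes
    return ctx


def peer_matches_pinned(peer_der: bytes, location: dict) -> bool:
    """Exact-match check against serverdercerts, if advertised."""
    pinned = location.get(b"serverdercerts")
    return not pinned or peer_der in pinned
```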
@@ -1,507 +1,539 b''
1 1 **Experimental and under active development**
2 2
3 3 This section documents the wire protocol commands exposed to transports
4 4 using the frame-based protocol. The set of commands exposed through
5 5 these transports is distinct from the set of commands exposed to legacy
6 6 transports.
7 7
8 8 The frame-based protocol uses CBOR to encode command execution requests.
9 9 All command arguments must be mapped to a specific CBOR data type or set of
10 10 CBOR data types.
11 11
12 12 The response to many commands is also CBOR. There is no common response
13 13 format: each command defines its own response format.
14 14
15 15 TODOs
16 16 =====
17 17
18 18 * Add "node namespace" support to each command. In order to support
19 19 SHA-1 hash transition, we want servers to be able to expose different
20 20 "node namespaces" for the same data. Every command operating on nodes
21 21 should specify which "node namespace" it is operating on and responses
22 22 should encode the "node namespace" accordingly.
23 23
24 24 Commands
25 25 ========
26 26
27 27 The sections below detail all commands available to wire protocol version
28 28 2.
29 29
30 30 branchmap
31 31 ---------
32 32
33 33 Obtain heads in named branches.
34 34
35 35 Receives no arguments.
36 36
37 37 The response is a map with bytestring keys defining the branch name.
38 38 Values are arrays of bytestrings defining raw changeset nodes.
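
For display purposes, a client might convert a decoded ``branchmap`` response into hex node strings. This hypothetical Python sketch assumes 20-byte (SHA-1) nodes and UTF-8 branch names.

```python
def branchmap_to_hex(response: dict) -> dict[str, list[str]]:
    """Map branch names to lists of hex changeset nodes."""
    result = {}
    for branch, nodes in response.items():
        for node in nodes:
            if len(node) != 20:
                raise ValueError(f"unexpected node length {len(node)}")
        result[branch.decode("utf-8")] = [node.hex() for node in nodes]
    return result
```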
39 39
40 40 capabilities
41 41 ------------
42 42
43 43 Obtain the server's capabilities.
44 44
45 45 Receives no arguments.
46 46
47 47 This command is typically called only as part of the handshake during
48 48 initial connection establishment.
49 49
50 50 The response is a map with bytestring keys defining server information.
51 51
52 52 The defined keys are:
53 53
54 54 commands
55 55 A map defining available wire protocol commands on this server.
56 56
57 57 Keys in the map are the names of commands that can be invoked. Values
58 58 are maps defining information about that command. The bytestring keys
59 59 are:
60 60
61 61 args
62 62 (map) Describes arguments accepted by the command.
63 63
64 64 Keys are bytestrings denoting the argument name.
65 65
66 66 Values are maps describing the argument. The map has the following
67 67 bytestring keys:
68 68
69 69 default
70 70 (varied) The default value for this argument if not specified. Only
71 71 present if ``required`` is not true.
72 72
73 73 required
74 74 (boolean) Whether the argument must be specified. Failure to send
75 75 required arguments will result in an error executing the command.
76 76
77 77 type
78 78 (bytestring) The type of the argument. e.g. ``bytes`` or ``bool``.
79 79
80 80 validvalues
81 81 (set) Values that are recognized for this argument. Some arguments
82 82 only allow a fixed set of values to be specified. These arguments
83 83 may advertise that set in this key. If this set is advertised and
84 84 a value not in this set is specified, the command should result
85 85 in error.
86 86
87 87 permissions
88 88 An array of permissions required to execute this command.
89 89
90 90 compression
91 91 An array of maps defining available compression format support.
92 92
93 93 The array is sorted from most preferred to least preferred.
94 94
95 95 Each entry has the following bytestring keys:
96 96
97 97 name
98 98 Name of the compression engine. e.g. ``zstd`` or ``zlib``.
99 99
100 100 framingmediatypes
101 101 An array of bytestrings defining the supported framing protocol
102 102 media types. Servers will not accept media types not in this list.
103 103
104 104 pathfilterprefixes
105 105 (set of bytestring) Matcher prefixes that are recognized when performing
106 106 path filtering. Specifying a path filter whose type/prefix does not
107 107 match one in this set will likely be rejected by the server.
108 108
109 109 rawrepoformats
110 110 An array of storage formats the repository is using. This set of
111 111 requirements can be used to determine whether a client is able to read
112 112 *raw* copies of the file data available on the server.
113 113
114 redirect
115 A map declaring potential *content redirects* that may be used by this
116 server. Contains the following bytestring keys:
117
118 targets
119 (array of maps) Potential redirect targets. Values are maps describing
120 this target in more detail. Each map has the following bytestring keys:
121
122 name
123 (bytestring) Identifier for this target. The identifier will be used
124 by clients to uniquely identify this target.
125
126 protocol
127 (bytestring) High-level network protocol. Values can be
128 ``http``, ``https``, ``ssh``, etc.
129
130 uris
131 (array of bytestrings) Representative URIs for this target.
132
133 snirequired (optional)
134 (boolean) Indicates whether Server Name Indication is required
135 to use this target. Defaults to False.
136
137 tlsversions (optional)
138 (array of bytestring) Indicates which TLS versions are supported by
139 this target. Values are ``1.1``, ``1.2``, ``1.3``, etc.
140
141 hashes
142 (array of bytestring) Indicates support for hashing algorithms that are
143 used to ensure content integrity. Values include ``sha1``, ``sha256``,
144 etc.
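The text does not say how a client should choose among advertised targets. The following is a minimal Python sketch of one plausible client-side filter over an already-decoded ``redirect`` map. The names ``acceptable_targets`` and ``preferred_hash`` are hypothetical, and the policy (keep advertised order, drop targets whose protocol, TLS versions, or SNI requirement the client cannot satisfy, and pick the first hash the client prefers that the server also lists) is an assumption, not defined behavior.

```python
def acceptable_targets(redirect, protocols, tlsversions, have_sni=True):
    """Return names of advertised redirect targets this client could use.

    Hypothetical helper: ``redirect`` is the decoded ``redirect`` map from
    the capabilities response; ``protocols`` and ``tlsversions`` are sets of
    bytestrings the client supports.
    """
    names = []
    for target in redirect.get(b'targets', []):
        if target[b'protocol'] not in protocols:
            continue
        # ``snirequired`` is optional and defaults to False.
        if target.get(b'snirequired', False) and not have_sni:
            continue
        # ``tlsversions`` is optional; when absent, assume no TLS constraint.
        advertised = target.get(b'tlsversions')
        if advertised is not None and not set(advertised) & set(tlsversions):
            continue
        names.append(target[b'name'])
    return names


def preferred_hash(redirect, client_hashes):
    """Return the first hash in ``client_hashes`` that the server lists."""
    server_hashes = set(redirect.get(b'hashes', []))
    for name in client_hashes:
        if name in server_hashes:
            return name
    return None


# Illustrative capabilities fragment; names and URIs are made up.
REDIRECT = {
    b'targets': [
        {b'name': b'cdn', b'protocol': b'https',
         b'uris': [b'https://cdn.example.com/'],
         b'snirequired': True, b'tlsversions': [b'1.2', b'1.3']},
        {b'name': b'mirror', b'protocol': b'http',
         b'uris': [b'http://mirror.example.com/']},
    ],
    b'hashes': [b'sha256', b'sha1'],
}
```

With this fragment, a client speaking only HTTPS with TLS 1.2 would be left with ``[b'cdn']``, while a client without SNI support that also accepts plain HTTP would fall back to ``[b'mirror']``.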
145
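A client could use the ``args`` metadata advertised under ``commands`` to catch mistakes before sending a request. A minimal sketch, assuming the command entry has already been CBOR-decoded into Python dicts with bytestring keys. ``check_args`` is a hypothetical name, and filling in defaults on the client side is an illustrative choice; presumably the server applies the same defaults itself.

```python
def check_args(argspec, supplied):
    """Validate ``supplied`` against an advertised ``args`` map.

    Returns a copy of ``supplied`` with advertised defaults filled in for
    omitted optional arguments. Raises ValueError on unknown or missing
    arguments and on values outside an advertised ``validvalues`` set.
    """
    unknown = set(supplied) - set(argspec)
    if unknown:
        raise ValueError('unknown arguments: %r' % sorted(unknown))

    resolved = {}
    for name, spec in argspec.items():
        if name in supplied:
            value = supplied[name]
        elif spec.get(b'required', False):
            raise ValueError('missing required argument: %r' % name)
        elif b'default' in spec:
            value = spec[b'default']
        else:
            continue

        valid = spec.get(b'validvalues')
        if valid is not None:
            # Set/array arguments (e.g. ``fields``) are checked per member.
            if isinstance(value, (set, frozenset, list, tuple)):
                members = value
            else:
                members = [value]
            bad = [m for m in members if m not in valid]
            if bad:
                raise ValueError('invalid values for %r: %r' % (name, bad))
        resolved[name] = value
    return resolved
```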
114 146 changesetdata
115 147 -------------
116 148
117 149 Obtain various data related to changesets.
118 150
119 151 The command accepts the following arguments:
120 152
121 153 noderange
122 154 (array of arrays of bytestrings) An array of 2 elements, each being an
123 155 array of node bytestrings. The first array denotes the changelog revisions
124 156 that are already known to the client. The second array denotes the changelog
125 157 revision DAG heads to fetch. The argument essentially defines a DAG range
126 158 bounded by root and head nodes to fetch.
127 159
128 160 The roots array may be empty. The heads array must be defined.
129 161
130 162 nodes
131 163 (array of bytestrings) Changelog revisions to request explicitly.
132 164
133 165 nodesdepth
134 166 (unsigned integer) Number of ancestor revisions of elements in ``nodes``
135 167 to also fetch. When defined, for each element in ``nodes``, DAG ancestors
136 168 will be walked until at most N total revisions are emitted.
137 169
138 170 fields
139 171 (set of bytestring) Which data associated with changelog revisions to
140 172 fetch. The following values are recognized:
141 173
142 174 bookmarks
143 175 Bookmarks associated with a revision.
144 176
145 177 parents
146 178 Parent revisions.
147 179
148 180 phase
149 181 The phase state of a revision.
150 182
151 183 revision
152 184 The raw revision data for the changelog entry. The hash of this data
153 185 will match the revision's node value.
154 186
155 187 The server resolves the set of revisions relevant to the request by taking
156 188 the union of the ``noderange`` and ``nodes`` arguments. At least one of these
157 189 arguments must be defined.
158 190
159 191 The response bytestream starts with a CBOR map describing the data that follows.
160 192 This map has the following bytestring keys:
161 193
162 194 totalitems
163 195 (unsigned integer) Total number of changelog revisions whose data is being
164 196 transferred. This maps to the set of revisions in the requested node
165 197 range, not the total number of records that follow (see below for why).
166 198
167 199 Following the map header is a series of 0 or more CBOR values. If values
168 200 are present, the first value will always be a map describing a single changeset
169 201 revision.
170 202
171 203 If the ``fieldsfollowing`` key is present, the map will immediately be followed
172 204 by N CBOR bytestring values, where N is the number of elements in
173 205 ``fieldsfollowing``. Each bytestring value corresponds to a field denoted
174 206 by ``fieldsfollowing``.
175 207
176 208 Following the optional bytestring field values is the next revision descriptor
177 209 map, or end of stream.
178 210
179 211 Each revision descriptor map has the following bytestring keys:
180 212
181 213 node
182 214 (bytestring) The node value for this revision. This is the SHA-1 hash of
183 215 the raw revision data.
184 216
185 217 bookmarks (optional)
186 218 (array of bytestrings) Bookmarks attached to this revision. Only present
187 219 if ``bookmarks`` data is being requested and the revision has bookmarks
188 220 attached.
189 221
190 222 fieldsfollowing (optional)
191 223 (array of 2-array) Denotes what fields immediately follow this map. Each
192 224 value is an array with 2 elements: the bytestring field name and an unsigned
193 225 integer describing the length of the data, in bytes.
194 226
195 227 If this key isn't present, no special fields will follow this map.
196 228
197 229 The following fields may be present:
198 230
199 231 revision
200 232 Raw revision data for the changelog entry. Contains a serialized form
201 233 of the changeset data, including the author, date, commit message, set
202 234 of changed files, manifest node, and other metadata.
203 235
204 236 Only present if the ``revision`` field was requested.
205 237
206 238 parents (optional)
207 239 (array of bytestrings) The nodes representing the parent revisions of this
208 240 revision. Only present if ``parents`` data is being requested.
209 241
210 242 phase (optional)
211 243 (bytestring) The phase that a revision is in. Recognized values are
212 244 ``secret``, ``draft``, and ``public``. Only present if ``phase`` data
213 245 is being requested.
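The layout described above can be consumed with a small loop. This sketch assumes the response has already been decoded into a sequence of Python values by some CBOR library (none is used here) and that each ``fieldsfollowing`` entry arrives as a ``[name, length]`` pair. The function name is hypothetical. Because ``totalitems`` counts revisions rather than records, the loop does not compare it against the number of descriptor maps.

```python
def parse_revision_stream(values):
    """Split decoded response values into (header, [(descriptor, fields)]).

    ``values`` is an iterable of already-decoded CBOR values: a header map,
    then descriptor maps, each followed by one bytestring per entry in its
    ``fieldsfollowing`` array.
    """
    it = iter(values)
    header = next(it, None)
    if not isinstance(header, dict):
        raise ValueError('missing header map')

    records = []
    for descriptor in it:
        if not isinstance(descriptor, dict):
            raise ValueError('expected a revision descriptor map')
        fields = {}
        # Consume exactly one bytestring per advertised field, in order.
        for name, length in descriptor.get(b'fieldsfollowing', []):
            data = next(it, None)
            if not isinstance(data, bytes) or len(data) != length:
                raise ValueError('bad or truncated %r field' % name)
            fields[name] = data
        records.append((descriptor, fields))
    return header, records
```

The same header/descriptor/field layout is used by ``filedata`` and ``manifestdata``, so the loop applies to those responses unchanged.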
214 246
215 247 If nodes are requested via ``noderange``, they will be emitted in DAG order,
216 248 parents always before children.
217 249
218 250 If nodes are requested via ``nodes``, they will be emitted in requested order.
219 251
220 252 Nodes from ``nodes`` are emitted before nodes from ``noderange``.
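One way a server could satisfy these ordering rules is sketched below, assuming the changelog is available as a plain ``parents`` dict mapping each node to its parent nodes. The helpers are hypothetical. The range is taken to be the ancestors of the heads minus the ancestors of the roots (roots included), and dropping range nodes already emitted via ``nodes`` is an assumption the text does not spell out. ``nodesdepth`` is ignored for brevity.

```python
def ancestors(parents, starts):
    """Return ``starts`` plus all of their ancestors."""
    seen = set()
    stack = list(starts)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def dag_range(parents, roots, heads):
    """Ancestors of ``heads`` not reachable from ``roots``, parents first."""
    visited = ancestors(parents, roots)  # already known to the client
    order = []
    for head in heads:
        # Iterative post-order DFS: a node is emitted after its parents.
        stack = [(head, False)]
        while stack:
            node, expanded = stack.pop()
            if expanded:
                order.append(node)
                continue
            if node in visited:
                continue
            visited.add(node)
            stack.append((node, True))
            stack.extend((p, False) for p in parents.get(node, ())
                         if p not in visited)
    return order


def emission_order(parents, nodes, roots, heads):
    """Explicit ``nodes`` first (requested order), then the DAG range."""
    out = list(nodes)
    seen = set(out)
    out.extend(n for n in dag_range(parents, roots, heads) if n not in seen)
    return out
```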
221 253
222 254 The set of changeset revisions emitted may not match the exact set of
223 255 changesets requested. Furthermore, the set of keys present on each
224 256 map may vary. This is to facilitate emitting changeset updates as well
225 257 as new revisions.
226 258
227 259 For example, if the request wants ``phase`` and ``revision`` data,
228 260 the response may contain entries for each changeset in the common nodes
229 261 set with the ``phase`` key and without the ``revision`` key in order
230 262 to reflect a phase-only update.
231 263
232 264 TODO support different revision selection mechanisms (e.g. non-public, specific
233 265 revisions)
234 266 TODO support different hash "namespaces" for revisions (e.g. sha-1 versus other)
235 267 TODO support emitting obsolescence data
236 268 TODO support filtering based on relevant paths (narrow clone)
237 269 TODO support hgtagsfnodes cache / tags data
238 270 TODO support branch heads cache
239 271 TODO consider unifying the query mechanism, e.g. as an array of "query descriptors"
240 272 rather than a set of top-level arguments that have semantics when combined.
241 273
242 274 filedata
243 275 --------
244 276
245 277 Obtain various data related to an individual tracked file.
246 278
247 279 The command accepts the following arguments:
248 280
249 281 fields
250 282 (set of bytestring) Which data associated with a file to fetch.
251 283 The following values are recognized:
252 284
253 285 parents
254 286 Parent nodes for the revision.
255 287
256 288 revision
257 289 The raw revision data for a file.
258 290
259 291 haveparents
260 292 (bool) Whether the client has the parent revisions of all requested
261 293 nodes. If set, the server may emit revision data as deltas against
262 294 any parent revision. If not set, the server MUST only emit deltas for
263 295 revisions previously emitted by this command.
264 296
265 297 False is assumed in the absence of any value.
266 298
267 299 nodes
268 300 (array of bytestrings) File nodes whose data to retrieve.
269 301
270 302 path
271 303 (bytestring) Path of the tracked file whose data to retrieve.
272 304
273 305 TODO allow specifying revisions via alternate means (such as from
274 306 changeset revisions or ranges)
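The ``haveparents`` rule constrains which revisions the server may use as delta bases. A minimal sketch of that constraint, assuming the server only considers a revision's own parents as candidates (a simplification; a real server may weigh other bases) and that the all-zero 20-byte node marks a missing parent, as it does in Mercurial. ``choose_delta_base`` and ``plan_emission`` are hypothetical names; ``None`` means "send a fulltext".

```python
NULLID = b'\0' * 20  # Mercurial's null node: marks an absent parent


def choose_delta_base(parents, emitted, haveparents):
    """Return a parent node to delta against, or None for a fulltext.

    parents: parent nodes of the revision being sent.
    emitted: set of nodes already sent earlier in this response.
    haveparents: the request's ``haveparents`` argument (False if omitted).
    """
    for parent in parents:
        if parent == NULLID:
            continue
        # Without haveparents, only revisions sent by this command qualify.
        if haveparents or parent in emitted:
            return parent
    return None


def plan_emission(revisions, haveparents=False):
    """Yield (node, deltabasenode or None) for (node, parents) pairs in order."""
    emitted = set()
    for node, parents in revisions:
        yield node, choose_delta_base(parents, emitted, haveparents)
        emitted.add(node)
```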
275 307
276 308 The response bytestream starts with a CBOR map describing the data that
277 309 follows. It has the following bytestring keys:
278 310
279 311 totalitems
280 312 (unsigned integer) Total number of file revisions whose data is
281 313 being returned.
282 314
283 315 Following the map header is a series of 0 or more CBOR values. If values
284 316 are present, the first value will always be a map describing a single file
285 317 revision.
286 318
287 319 If the ``fieldsfollowing`` key is present, the map will immediately be followed
288 320 by N CBOR bytestring values, where N is the number of elements in
289 321 ``fieldsfollowing``. Each bytestring value corresponds to a field denoted
290 322 by ``fieldsfollowing``.
291 323
292 324 Following the optional bytestring field values is the next revision descriptor
293 325 map, or end of stream.
294 326
295 327 Each revision descriptor map has the following bytestring keys:
298 330
299 331 node
300 332 (bytestring) The node of the file revision whose data is represented.
301 333
302 334 deltabasenode
303 335 (bytestring) Node of the file revision the following delta is against.
304 336
305 337 Only present if the ``revision`` field is requested and delta data
306 338 follows this map.
307 339
308 340 fieldsfollowing
309 341 (array of 2-array) Denotes extra bytestring fields that follow this map.
310 342 See the documentation for ``changesetdata`` for semantics.
311 343
312 344 The following named fields may be present:
313 345
314 346 ``delta``
315 347 The delta data to use to construct the fulltext revision.
316 348
317 349 Only present if the ``revision`` field is requested and a delta is
318 350 being emitted. The ``deltabasenode`` top-level key will also be
319 351 present if this field is being emitted.
320 352
321 353 ``revision``
322 354 The fulltext data for this file revision. Only present if the
323 355 ``revision`` field is requested and a fulltext revision is being emitted.
324 356
325 357 parents
326 358 (array of bytestring) The nodes of the parents of this file revision.
327 359
328 360 Only present if the ``parents`` field is requested.
329 361
330 362 When ``revision`` data is requested, the server chooses to emit either fulltext
331 363 revision data or a delta. What the server decides can be inferred by looking
332 364 for the presence of the ``delta`` or ``revision`` keys in the
333 365 ``fieldsfollowing`` array.
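To rebuild a fulltext, the client applies ``delta`` to the content it already holds for ``deltabasenode``. This document does not specify the delta encoding; the sketch below assumes Mercurial's usual binary patch format, in which each hunk is a header of three big-endian 32-bit integers ``(start, end, length)`` followed by ``length`` bytes that replace ``base[start:end]``. The ``store`` dict and both function names are hypothetical.

```python
import struct

HUNK_HEADER = struct.Struct('>lll')  # start, end, length


def apply_delta(base, delta):
    """Apply a sequence of binary patch hunks to ``base`` and return bytes."""
    out = []
    pos = 0     # how much of ``base`` has been copied so far
    offset = 0  # read position within ``delta``
    while offset < len(delta):
        if offset + HUNK_HEADER.size > len(delta):
            raise ValueError('truncated hunk header')
        start, end, length = HUNK_HEADER.unpack_from(delta, offset)
        offset += HUNK_HEADER.size
        # Hunks must be in order, non-overlapping and within ``base``.
        if not pos <= start <= end <= len(base) or length < 0:
            raise ValueError('malformed hunk')
        if offset + length > len(delta):
            raise ValueError('truncated hunk data')
        out.append(base[pos:start])                # unchanged bytes
        out.append(delta[offset:offset + length])  # replacement bytes
        offset += length
        pos = end
    out.append(base[pos:])
    return b''.join(out)


def fulltext(descriptor, fields, store):
    """Return the fulltext for one descriptor, given previously known texts."""
    if b'revision' in fields:
        return fields[b'revision']
    if b'delta' in fields:
        base = store[descriptor[b'deltabasenode']]
        return apply_delta(base, fields[b'delta'])
    raise ValueError('no revision data for %r' % descriptor[b'node'])
```

The same reconstruction applies to ``manifestdata`` responses, which use the same ``delta``/``revision`` fields and ``deltabasenode`` key.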
334 366
335 367 heads
336 368 -----
337 369
338 370 Obtain DAG heads in the repository.
339 371
340 372 The command accepts the following arguments:
341 373
342 374 publiconly (optional)
343 375 (boolean) If set, operate on the DAG for public phase changesets only.
344 376 Non-public (i.e. draft) phase DAG heads will not be returned.
345 377
346 378 The response is a CBOR array of bytestrings defining changeset nodes
347 379 of DAG heads. The array can be empty if the repository is empty or no
348 380 changesets satisfied the request.
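A DAG head here is a changeset with no children. A minimal sketch, assuming hypothetical in-memory ``parents`` and ``phases`` dicts keyed by node. Because the ancestors of a public changeset are always public in Mercurial, restricting to public nodes before looking for childless ones yields the public-only heads. The result is sorted only to make output deterministic; the response format does not define an order.

```python
def dag_heads(parents, phases=None, publiconly=False):
    """Return the nodes that have no children, sorted for stable output."""
    phases = phases or {}
    if publiconly:
        nodes = {n for n in parents if phases.get(n) == b'public'}
    else:
        nodes = set(parents)
    # Any node that is a parent of a considered node is not a head.
    has_child = {p for n in nodes for p in parents[n]}
    return sorted(nodes - has_child)
```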
349 381
350 382 TODO consider exposing phase of heads in response
351 383
352 384 known
353 385 -----
354 386
355 387 Determine whether a series of changeset nodes is known to the server.
356 388
357 389 The command accepts the following arguments:
358 390
359 391 nodes
360 392 (array of bytestrings) List of changeset nodes whose presence to
361 393 query.
362 394
363 395 The response is a bytestring where each byte contains a 0 or 1 for the
364 396 corresponding requested node at the same index.
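A client has to pair each response byte with the node it asked about. The text does not say whether the bytes are the ASCII digits ``0``/``1`` or the raw values 0x00/0x01, so this hypothetical decoder accepts either and treats anything else as an error.

```python
def decode_known(nodes, response):
    """Return a list of booleans, True where the server knows the node."""
    if len(response) != len(nodes):
        raise ValueError('expected %d flags, got %d'
                         % (len(nodes), len(response)))
    flags = []
    for byte in response:  # iterating over bytes yields ints
        if byte in (0x31, 0x01):    # b'1' or raw 1
            flags.append(True)
        elif byte in (0x30, 0x00):  # b'0' or raw 0
            flags.append(False)
        else:
            raise ValueError('unexpected flag byte: %r' % byte)
    return flags
```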
365 397
366 398 TODO use a bit array for even more compact response
367 399
368 400 listkeys
369 401 --------
370 402
371 403 List values in a specified ``pushkey`` namespace.
372 404
373 405 The command receives the following arguments:
374 406
375 407 namespace
376 408 (bytestring) Pushkey namespace to query.
377 409
378 410 The response is a map with bytestring keys and values.
379 411
380 412 TODO consider using binary to represent nodes in certain pushkey namespaces.
381 413
382 414 lookup
383 415 ------
384 416
385 417 Try to resolve a value to a changeset revision.
386 418
387 419 Unlike ``known`` which operates on changeset nodes, lookup operates on
388 420 node fragments and other names that a user may use.
389 421
390 422 The command receives the following arguments:
391 423
392 424 key
393 425 (bytestring) Value to try to resolve.
394 426
395 427 On success, returns a bytestring containing the resolved node.
396 428
397 429 manifestdata
398 430 ------------
399 431
400 432 Obtain various data related to manifests (which are lists of files in
401 433 a revision).
402 434
403 435 The command accepts the following arguments:
404 436
405 437 fields
406 438 (set of bytestring) Which data associated with manifests to fetch.
407 439 The following values are recognized:
408 440
409 441 parents
410 442 Parent nodes for the manifest.
411 443
412 444 revision
413 445 The raw revision data for the manifest.
414 446
415 447 haveparents
416 448 (bool) Whether the client has the parent revisions of all requested
417 449 nodes. If set, the server may emit revision data as deltas against
418 450 any parent revision. If not set, the server MUST only emit deltas for
419 451 revisions previously emitted by this command.
420 452
421 453 False is assumed in the absence of any value.
422 454
423 455 nodes
424 456 (array of bytestring) Manifest nodes whose data to retrieve.
425 457
426 458 tree
427 459 (bytestring) Path to manifest to retrieve. The empty bytestring represents
428 460 the root manifest. All other values represent directories/trees within
429 461 the repository.
430 462
431 463 TODO allow specifying revisions via alternate means (such as from changeset
432 464 revisions or ranges)
433 465 TODO consider recursive expansion of manifests (with path filtering for
434 466 narrow use cases)
435 467
436 468 The response bytestream starts with a CBOR map describing the data that
437 469 follows. It has the following bytestring keys:
438 470
439 471 totalitems
440 472 (unsigned integer) Total number of manifest revisions whose data is
441 473 being returned.
442 474
443 475 Following the map header is a series of 0 or more CBOR values. If values
444 476 are present, the first value will always be a map describing a single manifest
445 477 revision.
446 478
447 479 If the ``fieldsfollowing`` key is present, the map will immediately be followed
448 480 by N CBOR bytestring values, where N is the number of elements in
449 481 ``fieldsfollowing``. Each bytestring value corresponds to a field denoted
450 482 by ``fieldsfollowing``.
451 483
452 484 Following the optional bytestring field values is the next revision descriptor
453 485 map, or end of stream.
454 486
455 487 Each revision descriptor map has the following bytestring keys:
456 488
457 489 node
458 490 (bytestring) The node of the manifest revision whose data is represented.
459 491
460 492 deltabasenode
461 493 (bytestring) The node that the delta representation of this revision is
462 494 computed against. Only present if the ``revision`` field is requested and
463 495 a delta is being emitted.
464 496
465 497 fieldsfollowing
466 498 (array of 2-array) Denotes extra bytestring fields that follow this map.
467 499 See the documentation for ``changesetdata`` for semantics.
468 500
469 501 The following named fields may be present:
470 502
471 503 ``delta``
472 504 The delta data to use to construct the fulltext revision.
473 505
474 506 Only present if the ``revision`` field is requested and a delta is
475 507 being emitted. The ``deltabasenode`` top-level key will also be
476 508 present if this field is being emitted.
477 509
478 510 ``revision``
479 511 The fulltext revision data for this manifest. Only present if the
480 512 ``revision`` field is requested and a fulltext revision is being emitted.
481 513
482 514 parents
483 515 (array of bytestring) The nodes of the parents of this manifest revision.
484 516 Only present if the ``parents`` field is requested.
485 517
486 518 When ``revision`` data is requested, the server chooses to emit either fulltext
487 519 revision data or a delta. What the server decides can be inferred by looking
488 520 for the presence of ``delta`` or ``revision`` in the ``fieldsfollowing`` array.
489 521
490 522 pushkey
491 523 -------
492 524
493 525 Set a value using the ``pushkey`` protocol.
494 526
495 527 The command receives the following arguments:
496 528
497 529 namespace
498 530 (bytestring) Pushkey namespace to operate on.
499 531 key
500 532 (bytestring) The pushkey key to set.
501 533 old
502 534 (bytestring) Old value for this key.
503 535 new
504 536 (bytestring) New value for this key.
505 537
506 538 TODO consider using binary to represent nodes in certain pushkey namespaces.
507 539 TODO better define response type and meaning.