help: document wire protocol "handshake" protocol...
Gregory Szorc -
r29864:f0d47aca default
@@ -1,370 +1,417 b''
1 1 The Mercurial wire protocol is a request-response based protocol
2 2 with multiple wire representations.
3 3
4 4 Each request is modeled as a command name, a dictionary of arguments, and
5 5 optional raw input. Command arguments and their types are intrinsic
6 6 properties of commands. So is the response type of the command. This means
7 7 clients can't always send arbitrary arguments to servers and servers can't
8 8 return multiple response types.
9 9
10 10 The protocol is synchronous and does not support multiplexing (concurrent
11 11 commands).
12 12
13 13 Transport Protocols
14 14 ===================
15 15
16 16 HTTP Transport
17 17 --------------
18 18
19 19 Commands are issued as HTTP/1.0 or HTTP/1.1 requests. Commands are
20 20 sent to the base URL of the repository with the command name sent in
21 21 the ``cmd`` query string parameter. e.g.
22 22 ``https://example.com/repo?cmd=capabilities``. The HTTP method is ``GET``
23 23 or ``POST`` depending on the command and whether there is a request
24 24 body.
25 25
26 26 Command arguments can be sent multiple ways.
27 27
28 28 The simplest is part of the URL query string using ``x-www-form-urlencoded``
29 29 encoding (see Python's ``urllib.urlencode()``). However, many servers impose
30 30 length limitations on the URL. So this mechanism is typically only used if
31 31 the server doesn't support other mechanisms.
32 32
33 33 If the server supports the ``httpheader`` capability, command arguments can
34 34 be sent in HTTP request headers named ``X-HgArg-<N>`` where ``<N>`` is an
35 35 integer starting at 1. The arguments are first encoded as a single
36 36 ``x-www-form-urlencoded`` string. This full string is then split into chunks and sent
37 37 in numbered ``X-HgArg-<N>`` headers. The maximum length of each HTTP header
38 38 is defined by the server in the ``httpheader`` capability value, which defaults
39 39 to ``1024``. The server reassembles the encoded arguments string by
40 40 concatenating the ``X-HgArg-<N>`` headers then URL decodes them into a
41 41 dictionary.
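The header-based argument encoding above can be sketched in Python as follows. This is a minimal illustration, not the canonical client code: the function names are hypothetical, and it assumes the length limit applies to the header value alone, whereas a real client may also need to budget for the header name and line terminator.

```python
from urllib.parse import urlencode, parse_qsl


def args_to_headers(args, value_len=1024):
    """Split URL-encoded arguments across numbered X-HgArg-<N> headers.

    Assumption: ``value_len`` limits only the header value.
    """
    encoded = urlencode(args)
    chunks = [encoded[i:i + value_len]
              for i in range(0, len(encoded), value_len)]
    # Header numbering starts at 1.
    return [("X-HgArg-%d" % n, chunk) for n, chunk in enumerate(chunks, 1)]


def headers_to_args(headers):
    """Server side: concatenate the chunks in numeric order, then decode."""
    ordered = sorted(headers, key=lambda h: int(h[0].rsplit("-", 1)[1]))
    encoded = "".join(value for _, value in ordered)
    return dict(parse_qsl(encoded, keep_blank_values=True))
```

Because the server concatenates the chunks before URL decoding, a chunk boundary may fall anywhere in the encoded string, including inside a percent escape.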
42 42
43 43 The list of ``X-HgArg-<N>`` headers should be added to the ``Vary`` request
44 44 header to instruct caches to take these headers into consideration when caching
45 45 requests.
46 46
47 47 If the server supports the ``httppostargs`` capability, the client
48 48 may send command arguments in the HTTP request body as part of an
49 49 HTTP POST request. The command arguments will be URL encoded just like
50 50 they would for sending them via HTTP headers. However, no splitting is
51 51 performed: the raw arguments are included in the HTTP request body.
52 52
53 53 The client sends an ``X-HgArgs-Post`` header with the string length of the
54 54 encoded arguments data. Additional data may be included in the HTTP
55 55 request body immediately following the argument data. The offset of the
56 56 non-argument data is defined by the ``X-HgArgs-Post`` header. The
57 57 ``X-HgArgs-Post`` header is not required if there is no argument data.
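A minimal sketch of the ``httppostargs`` body layout described above, assuming the header value is the byte length of the encoded arguments. The helper names ``build_post_body`` and ``split_post_body`` are hypothetical and exist only for illustration.

```python
from urllib.parse import urlencode, parse_qsl


def build_post_body(args, data=b""):
    """Return (headers, body) with arguments first, then any extra data."""
    encoded = urlencode(args).encode("ascii")
    headers = {}
    if encoded:
        # The header is only needed when there is argument data.
        headers["X-HgArgs-Post"] = str(len(encoded))
    return headers, encoded + data


def split_post_body(headers, body):
    """Server side: use the header as the offset of the non-argument data."""
    arglen = int(headers.get("X-HgArgs-Post", 0))
    args = dict(parse_qsl(body[:arglen].decode("ascii"),
                          keep_blank_values=True))
    return args, body[arglen:]
```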
58 58
59 59 Additional command data can be sent as part of the HTTP request body. The
60 60 default ``Content-Type`` when sending data is ``application/mercurial-0.1``.
61 61 A ``Content-Length`` header is currently always sent.
62 62
63 63 Example HTTP requests::
64 64
65 65     GET /repo?cmd=capabilities
66 66     X-HgArg-1: foo=bar&baz=hello%20world
67 67
68 68 The ``Content-Type`` HTTP response header identifies the response as coming
69 69 from Mercurial and can also be used to signal an error has occurred.
70 70
71 71 The ``application/mercurial-0.1`` media type indicates a generic Mercurial
72 72 response. It matches the media type sent by the client.
73 73
74 74 The ``application/hg-error`` media type indicates a generic error occurred.
75 75 The content of the HTTP response body typically holds text describing the
76 76 error.
77 77
78 78 The ``application/hg-changegroup`` media type indicates a changegroup response
79 79 type.
80 80
81 81 Clients also accept the ``text/plain`` media type. All other media
82 82 types should cause the client to error.
83 83
84 84 Clients should issue a ``User-Agent`` request header that identifies the client.
85 85 The server should not use the ``User-Agent`` for feature detection.
86 86
87 87 A command returning a ``string`` response issues the
88 88 ``application/mercurial-0.1`` media type and the HTTP response body contains
89 89 the raw string value. A ``Content-Length`` header is typically issued.
90 90
91 91 A command returning a ``stream`` response issues the
92 92 ``application/mercurial-0.1`` media type and the HTTP response is typically
93 93 using *chunked transfer* (``Transfer-Encoding: chunked``).
94 94
95 95 SSH Transport
96 96 -------------
97 97
98 98 The SSH transport is a custom text-based protocol suitable for use over any
99 99 bi-directional stream transport. It is most commonly used with SSH.
100 100
101 101 An SSH transport server can be started with ``hg serve --stdio``. The stdin,
102 102 stderr, and stdout file descriptors of the started process are used to exchange
103 103 data. When Mercurial connects to a remote server over SSH, it actually starts
104 104 a ``hg serve --stdio`` process on the remote server.
105 105
106 106 Commands are issued by sending the command name followed by a trailing newline
107 107 ``\n`` to the server. e.g. ``capabilities\n``.
108 108
109 109 Command arguments are sent in the following format::
110 110
111 111     <argument> <length>\n<value>
112 112
113 113 That is, the argument string name followed by a space followed by the
114 114 integer length of the value (expressed as a string) followed by a newline
115 115 (``\n``) followed by the raw argument value.
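As a rough sketch of this framing, the following Python function serializes a command name and its simple (non-dictionary) arguments. The function name is hypothetical, and emitting arguments in dictionary order is an assumption made for illustration.

```python
def encode_ssh_command(name, args):
    """Serialize a command and its arguments for the SSH transport.

    ``args`` maps argument names (str) to raw values (bytes).
    """
    out = [name.encode("ascii") + b"\n"]
    for key, value in args.items():
        # <argument> <length>\n<value>, with no separator after the value.
        out.append(b"%s %d\n%s" % (key.encode("ascii"), len(value), value))
    return b"".join(out)
```

Note that no newline follows the value: the receiver relies on the declared length to know where the value ends.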
116 116
117 117 Dictionary arguments are encoded differently::
118 118
119 119     <argument> <# elements>\n
120 120     <key1> <length1>\n<value1>
121 121     <key2> <length2>\n<value2>
122 122     ...
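A dictionary argument could be serialized along these lines. This is a sketch under the assumption that each entry uses the same length-prefixed framing as a simple argument; ``encode_ssh_dict_argument`` is a hypothetical name.

```python
def encode_ssh_dict_argument(name, mapping):
    """Serialize a dictionary argument: an element count, then each entry.

    ``mapping`` maps keys (str) to raw values (bytes).
    """
    out = [b"%s %d\n" % (name.encode("ascii"), len(mapping))]
    for key, value in mapping.items():
        out.append(b"%s %d\n%s" % (key.encode("ascii"), len(value), value))
    return b"".join(out)
```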
123 123
124 124 Non-argument data is sent immediately after the final argument value. It is
125 125 encoded in chunks::
126 126
127 127     <length>\n<data>
128 128
129 129 Each command declares a list of supported arguments and their types. If a
130 130 client sends an unknown argument to the server, the server should abort
131 131 immediately. The special argument ``*`` in a command's definition indicates
132 132 that all argument names are allowed.
133 133
134 134 The definition of supported arguments and types is initially made when a
135 135 new command is implemented. The client and server must initially independently
136 136 agree on the arguments and their types. This initial set of arguments can be
137 137 supplemented through the presence of *capabilities* advertised by the server.
138 138
139 139 Each command has a defined expected response type.
140 140
141 141 A ``string`` response type is a length framed value. The response consists of
142 142 the string encoded integer length of a value followed by a newline (``\n``)
143 143 followed by the value. Empty values are allowed (and are represented as
144 144 ``0\n``).
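Reading a length-framed ``string`` response can be sketched as below. The function name is hypothetical, and ``stream`` is assumed to be any binary file-like object, such as the stdout pipe of the server process.

```python
def read_string_response(stream):
    """Read one length-framed ``string`` response from a binary stream."""
    line = stream.readline()
    if not line.endswith(b"\n"):
        raise ValueError("truncated length line: %r" % line)
    length = int(line.decode("ascii"))
    value = stream.read(length)
    if len(value) != length:
        raise ValueError("expected %d bytes, got %d" % (length, len(value)))
    return value
```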
145 145
146 146 A ``stream`` response type consists of raw bytes of data. There is no framing.
147 147
148 148 A generic error response type is also supported. It consists of an error
149 149 message written to ``stderr`` followed by ``\n-\n``. In addition, ``\n`` is
150 150 written to ``stdout``.
151 151
152 152 If the server receives an unknown command, it will send an empty ``string``
153 153 response.
154 154
155 155 The server terminates if it receives an empty command (a ``\n`` character).
156 156
157 157 Capabilities
158 158 ============
159 159
160 160 Servers advertise supported wire protocol features. This allows clients to
161 161 probe for server features before blindly calling a command or passing a
162 162 specific argument.
163 163
164 164 The server's features are exposed via a *capabilities* string. This is a
165 165 space-delimited string of tokens/features. Some features are single words
166 166 like ``lookup`` or ``batch``. Others are complicated key-value pairs
167 167 advertising sub-features. e.g. ``httpheader=2048``. When complex, non-word
168 168 values are used, each feature name can define its own encoding of sub-values.
169 169 Comma-delimited and ``x-www-form-urlencoded`` values are common.
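Splitting a capabilities string into features might look like the sketch below. The function name is hypothetical, and values are returned undecoded because each feature defines its own value encoding.

```python
def parse_capabilities(caps):
    """Split a space-delimited capabilities string into a dict.

    Plain features map to None; ``name=value`` features map to the raw
    (still encoded) value string.
    """
    result = {}
    for token in caps.split(" "):
        if not token:
            continue
        name, sep, value = token.partition("=")
        result[name] = value if sep else None
    return result
```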
170 170
171 171 The following sections document the capabilities defined by the canonical
172 172 Mercurial server implementation.
173 173
174 174 batch
175 175 -----
176 176
177 177 Whether the server supports the ``batch`` command.
178 178
179 179 This capability/command was introduced in Mercurial 1.9 (released July 2011).
180 180
181 181 branchmap
182 182 ---------
183 183
184 184 Whether the server supports the ``branchmap`` command.
185 185
186 186 This capability/command was introduced in Mercurial 1.3 (released July 2009).
187 187
188 188 bundle2-exp
189 189 -----------
190 190
191 191 Precursor to ``bundle2`` capability that was used before bundle2 was a
192 192 stable feature.
193 193
194 194 This capability was introduced in Mercurial 3.0 behind an experimental
195 195 flag. This capability should not be observed in the wild.
196 196
197 197 bundle2
198 198 -------
199 199
200 200 Indicates whether the server supports the ``bundle2`` data exchange format.
201 201
202 202 The value of the capability is a URL quoted, newline (``\n``) delimited
203 203 list of keys or key-value pairs.
204 204
205 205 A key is simply a URL encoded string.
206 206
207 207 A key-value pair is a URL encoded key separated from a URL encoded value by
208 208 an ``=``. If the value is a list, elements are delimited by a ``,`` after
209 209 URL encoding.
210 210
211 211 For example, say we have the values::
212 212
213 213     {'HG20': [], 'changegroup': ['01', '02'], 'digests': ['sha1', 'sha512']}
214 214
215 215 We would first construct a string::
216 216
217 217     HG20\nchangegroup=01,02\ndigests=sha1,sha512
218 218
219 219 We would then URL quote this string::
220 220
221 221     HG20%0Achangegroup%3D01%2C02%0Adigests%3Dsha1%2Csha512
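The two encoding steps above, and their inverse, can be sketched with Python's ``urllib.parse``. The function names are hypothetical, and the sketch assumes a dictionary mapping each key to a list of string values, as in the example.

```python
from urllib.parse import quote, unquote


def encode_bundle2_caps(caps):
    """Encode {key: [values]} into a bundle2 capability value."""
    lines = []
    for key, values in caps.items():
        key = quote(key)
        if values:
            lines.append(key + "=" + ",".join(quote(v) for v in values))
        else:
            lines.append(key)
    # The newline-joined string is URL quoted as a whole.
    return quote("\n".join(lines))


def decode_bundle2_caps(blob):
    """Decode a bundle2 capability value back into {key: [values]}."""
    caps = {}
    for line in unquote(blob).split("\n"):
        if not line:
            continue
        key, sep, rest = line.partition("=")
        values = [unquote(v) for v in rest.split(",")] if sep else []
        caps[unquote(key)] = values
    return caps
```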
222 222
223 223 This capability was introduced in Mercurial 3.4 (released May 2015).
224 224
225 225 changegroupsubset
226 226 -----------------
227 227
228 228 Whether the server supports the ``changegroupsubset`` command.
229 229
230 230 This capability was introduced in Mercurial 0.9.2 (released December
231 231 2006).
232 232
233 233 This capability was introduced at the same time as the ``lookup``
234 234 capability/command.
235 235
236 236 getbundle
237 237 ---------
238 238
239 239 Whether the server supports the ``getbundle`` command.
240 240
241 241 This capability was introduced in Mercurial 1.9 (released July 2011).
242 242
243 243 httpheader
244 244 ----------
245 245
246 246 Whether the server supports receiving command arguments via HTTP request
247 247 headers.
248 248
249 249 The value of the capability is an integer describing the max header
250 250 length that clients should send. Clients should ignore any content after a
251 251 comma in the value, as this is reserved for future use.
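A client could extract the limit from the capability value as sketched below. The function name is hypothetical.

```python
def parse_httpheader_limit(value):
    """Return the max header length, ignoring anything after a comma."""
    return int(value.split(",", 1)[0])
```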
252 252
253 253 This capability was introduced in Mercurial 1.9 (released July 2011).
254 254
255 255 httppostargs
256 256 ------------
257 257
258 258 **Experimental**
259 259
260 260 Indicates that the server supports and prefers clients send command arguments
261 261 via an HTTP POST request as part of the request body.
262 262
263 263 This capability was introduced in Mercurial 3.8 (released May 2016).
264 264
265 265 known
266 266 -----
267 267
268 268 Whether the server supports the ``known`` command.
269 269
270 270 This capability/command was introduced in Mercurial 1.9 (released July 2011).
271 271
272 272 lookup
273 273 ------
274 274
275 275 Whether the server supports the ``lookup`` command.
276 276
277 277 This capability was introduced in Mercurial 0.9.2 (released December
278 278 2006).
279 279
280 280 This capability was introduced at the same time as the ``changegroupsubset``
281 281 capability/command.
282 282
283 283 pushkey
284 284 -------
285 285
286 286 Whether the server supports the ``pushkey`` and ``listkeys`` commands.
287 287
288 288 This capability was introduced in Mercurial 1.6 (released July 2010).
289 289
290 290 standardbundle
291 291 --------------
292 292
293 293 **Unsupported**
294 294
295 295 This capability was introduced during the Mercurial 0.9.2 development cycle in
296 296 2006. It was never present in a release, as it was replaced by the ``unbundle``
297 297 capability. This capability should not be encountered in the wild.
298 298
299 299 stream-preferred
300 300 ----------------
301 301
302 302 If present, the server prefers that clients clone using the streaming clone
303 303 protocol (``hg clone --uncompressed``) rather than the standard
304 304 changegroup/bundle based protocol.
305 305
306 306 This capability was introduced in Mercurial 2.2 (released May 2012).
307 307
308 308 streamreqs
309 309 ----------
310 310
311 311 Indicates whether the server supports *streaming clones* and the *requirements*
312 312 that clients must support to receive it.
313 313
314 314 If present, the server supports the ``stream_out`` command, which transmits
315 315 raw revlogs from the repository instead of changegroups. This provides a faster
316 316 cloning mechanism at the expense of more bandwidth used.
317 317
318 318 The value of this capability is a comma-delimited list of repo format
319 319 *requirements*. These are requirements that impact the reading of data in
320 320 the ``.hg/store`` directory. An example value is
321 321 ``streamreqs=generaldelta,revlogv1`` indicating the server repo requires
322 322 the ``revlogv1`` and ``generaldelta`` requirements.
323 323
324 324 If the only format requirement is ``revlogv1``, the server may expose the
325 325 ``stream`` capability instead of the ``streamreqs`` capability.
326 326
327 327 This capability was introduced in Mercurial 1.7 (released November 2010).
328 328
329 329 stream
330 330 ------
331 331
332 332 Whether the server supports *streaming clones* from ``revlogv1`` repos.
333 333
334 334 If present, the server supports the ``stream_out`` command, which transmits
335 335 raw revlogs from the repository instead of changegroups. This provides a faster
336 336 cloning mechanism at the expense of more bandwidth used.
337 337
338 338 This capability was introduced in Mercurial 0.9.1 (released July 2006).
339 339
340 340 When initially introduced, the value of the capability was the numeric
341 341 revlog revision. e.g. ``stream=1``. This indicates the changegroup is using
342 342 ``revlogv1``. This simple integer value wasn't powerful enough, so the
343 343 ``streamreqs`` capability was invented to handle cases where the repo
344 344 requirements have more than just ``revlogv1``. Newer servers omit the
345 345 ``=1`` since it was the only value supported and the value of ``1`` can
346 346 be implied by clients.
347 347
348 348 unbundlehash
349 349 ------------
350 350
351 351 Whether the ``unbundle`` command supports receiving a hash of all the
352 352 heads instead of a list.
353 353
354 354 For more, see the documentation for the ``unbundle`` command.
355 355
356 356 This capability was introduced in Mercurial 1.9 (released July 2011).
357 357
358 358 unbundle
359 359 --------
360 360
361 361 Whether the server supports pushing via the ``unbundle`` command.
362 362
363 363 This capability/command has been present since Mercurial 0.9.1 (released
364 364 July 2006).
365 365
366 366 Mercurial 0.9.2 (released December 2006) added values to the capability
367 367 indicating which bundle types the server supports receiving. This value is a
368 368 comma-delimited list. e.g. ``HG10GZ,HG10BZ,HG10UN``. The order of values
369 369 reflects the priority/preference of that type, where the first value is the
370 370 most preferred type.
371
372 Handshake Protocol
373 ==================
374
375 While not explicitly required, it is common for clients to perform a
376 *handshake* when connecting to a server. The handshake accomplishes 2 things:
377
378 * Obtaining capabilities and other server features
379 * Flushing extra server output (e.g. SSH servers may print extra text
380   when connecting that may confuse the wire protocol)
381
382 This isn't a traditional *handshake* as far as network protocols go because
383 there is no persistent state as a result of the handshake: the handshake is
384 simply the issuing of commands and commands are stateless.
385
386 The canonical clients perform a capabilities lookup at connection establishment
387 time. This is because clients must assume a server only supports the features
388 of the original Mercurial server implementation until proven otherwise (from
389 advertised capabilities). Nearly every server running today supports features
390 that weren't present in the original Mercurial server implementation. Rather
391 than wait until it performs an operation that needs to consult capabilities,
392 the client issues the lookup at connection start to avoid any delay later.
393
394 For HTTP servers, the client sends a ``capabilities`` command request as
395 soon as the connection is established. The server responds with a capabilities
396 string, which the client parses.
397
398 For SSH servers, the client sends the ``hello`` command (no arguments)
399 and a ``between`` command with the ``pairs`` argument having the value
400 ``0000000000000000000000000000000000000000-0000000000000000000000000000000000000000``.
401
402 The ``between`` command has been supported since the original Mercurial
403 server. Requesting the empty range will return a ``\n`` string response,
404 which will be encoded as ``1\n\n`` (value length of ``1`` followed by a newline
405 followed by the value, which happens to be a newline).
406
407 The ``hello`` command was later introduced. Servers supporting it will issue
408 a response to that command before sending the ``1\n\n`` response to the
409 ``between`` command. Servers not supporting ``hello`` will send an empty
410 response (``0\n``).
411
412 In addition to the expected output from the ``hello`` and ``between`` commands,
413 servers may also send other output, such as *message of the day (MOTD)*
414 announcements. Clients assume servers will send this output before the
415 Mercurial server replies to the client-issued commands. So any server output
416 not conforming to the expected command responses is assumed to be not related
417 to Mercurial and can be ignored.
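The SSH handshake described above can be sketched as follows. This is an illustration under stated assumptions rather than the canonical client logic: the function names are hypothetical, and the reader simply treats the ``1\n`` line followed by a bare ``\n`` as the ``between`` reply, returning every line that came before it (unrelated server output plus any ``hello`` response) for the caller to inspect.

```python
NULL_PAIR = b"0" * 40 + b"-" + b"0" * 40


def handshake_request():
    """Bytes a client sends: ``hello``, then ``between`` on the empty range."""
    return b"hello\n" + b"between\npairs %d\n%s" % (len(NULL_PAIR), NULL_PAIR)


def read_until_between_reply(lines):
    """Consume lines up to and including the ``1\\n\\n`` between reply.

    ``lines`` is any iterable of newline-terminated byte strings, such as a
    binary file object. Returns the lines seen before the reply.
    """
    seen = []
    prev = None
    for line in lines:
        if prev == b"1\n" and line == b"\n":
            # Drop the "1\n" length line that belongs to the between reply.
            return seen[:-1]
        seen.append(line)
        prev = line
    raise ValueError("stream ended before the between reply")
```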