upstream/mercurial-mirror Files · mercurial/help/internals/cbor.txt


Gregory Szorc

File last commit:

r39446:2fe21c65 default


		Mercurial uses Concise Binary Object Representation (CBOR)
		(RFC 7049) for various data formats.

		This document describes the subset of CBOR that Mercurial uses and
		gives recommendations for appropriate use of CBOR within Mercurial.

		Type Limitations
		================

		Major types 0 and 1 (unsigned integers and negative integers) MUST be
		fully supported.

		Major type 2 (byte strings) MUST be fully supported. However, there
		are limitations around the use of indefinite-length byte strings.
		(See below.)

		Major type 3 (text strings) is NOT supported.

		Major type 4 (arrays) MUST be supported. However, values are limited
		to the set of types described in the "Container Types" section below.
		And indefinite-length arrays are NOT supported.

		Major type 5 (maps) MUST be supported. However, key values are limited
		to the set of types described in the "Container Types" section below.
		And indefinite-length maps are NOT supported.

		Major type 6 (semantic tagging of major types) can be used with the
		following semantic tag values:

		258
		   Mathematical finite set. Suitable for representing Python's
		   ``set`` type.

		All other semantic tag values are not allowed.

		Major type 7 (simple data types) can be used with the following
		type values:

		20
		   False
		21
		   True
		22
		   Null
		31
		   Break stop code (for indefinite-length items).

		All other simple data type values (including every value requiring the
		1 byte extension) are disallowed.
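
The type limitations above can be illustrated with a small encoder. This is
a minimal sketch, not Mercurial's own implementation: the names
``encode_head`` and ``encode`` are hypothetical, and sorting the encoded
members of a set is an assumption made here only to get deterministic output.

```python
def encode_head(major, value):
    """Return the initial byte plus the shortest big-endian argument."""
    if value < 24:
        return bytes([(major << 5) | value])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([(major << 5) | info]) + value.to_bytes(size, "big")
    raise ValueError("integer does not fit in 64 bits")


def encode(obj):
    """Encode a Python value using only the CBOR subset described above."""
    # Simple values 20, 21 and 22 (major type 7). These checks come first
    # because bool is a subclass of int in Python.
    if obj is False:
        return b"\xf4"
    if obj is True:
        return b"\xf5"
    if obj is None:
        return b"\xf6"
    if isinstance(obj, int):
        # Major type 0 for non-negative values, major type 1 stores -1 - n.
        return encode_head(0, obj) if obj >= 0 else encode_head(1, -1 - obj)
    if isinstance(obj, bytes):
        # Major type 2, definite length.
        return encode_head(2, len(obj)) + obj
    if isinstance(obj, list):
        # Major type 4, definite length only.
        return encode_head(4, len(obj)) + b"".join(encode(v) for v in obj)
    if isinstance(obj, dict):
        # Major type 5, definite length only.
        parts = [encode_head(5, len(obj))]
        for key, value in obj.items():
            parts.append(encode(key))
            parts.append(encode(value))
        return b"".join(parts)
    if isinstance(obj, (set, frozenset)):
        # Semantic tag 258 wrapping an array. Sorting is an assumption of
        # this sketch so that the output does not depend on set ordering.
        items = sorted(encode(v) for v in obj)
        return (encode_head(6, 258) + encode_head(4, len(items))
                + b"".join(items))
    # Text strings (major type 3), floats and everything else are rejected.
    raise TypeError("type not in the supported CBOR subset: %r" % type(obj))
```

For instance, ``encode({1})`` produces the tag 258 header ``d9 01 02``
followed by a one-element array, while ``encode("text")`` raises
``TypeError`` because text strings are not supported.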

		Indefinite-Length Byte Strings
		==============================

		Indefinite-length byte strings (major type 2) are allowed. However,
		they MUST NOT occur inside a container type (such as an array or map).
		i.e. they can only occur as the "top-most" element in a stream of
		values.

		Encoders and decoders SHOULD stream indefinite-length byte strings.
		i.e. an encoder or decoder SHOULD NOT buffer the entirety of a long
		byte string value when indefinite-length byte strings are being used
		if it can be avoided. Mercurial MAY use extremely long indefinite-length
		byte strings, and buffering the source or destination value could lead to
		memory exhaustion.
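
On the decoding side, streaming can be sketched as a generator that yields
one chunk at a time instead of joining them. The function name
``iter_bytestring_chunks`` is hypothetical, and the sketch assumes a
file-like object whose ``read(n)`` returns ``n`` bytes unless the stream
ends (as ``io.BytesIO`` and buffered files do).

```python
import io


def iter_bytestring_chunks(fh):
    """Yield the chunks of an indefinite-length byte string read from fh."""
    if fh.read(1) != b"\x5f":  # major type 2, additional info 31
        raise ValueError("expected start of indefinite-length byte string")
    while True:
        head = fh.read(1)
        if not head:
            raise ValueError("stream ended before break stop code")
        initial = head[0]
        if initial == 0xFF:  # break stop code ends the byte string
            return
        major, info = initial >> 5, initial & 0x1F
        if major != 2 or info > 27:
            raise ValueError("chunks must be definite-length byte strings")
        if info < 24:
            length = info
        else:
            # Additional info 24..27 means a 1, 2, 4 or 8 byte length.
            size = 1 << (info - 24)
            raw = fh.read(size)
            if len(raw) != size:
                raise ValueError("truncated chunk length")
            length = int.from_bytes(raw, "big")
        data = fh.read(length)
        if len(data) != length:
            raise ValueError("truncated chunk")
        yield data


# Usage: only one chunk is held in memory at a time.
stream = io.BytesIO(b"\x5f" + b"\x44" + b"abcd" + b"\x42" + b"ef" + b"\xff")
chunks = list(iter_bytestring_chunks(stream))
```

A caller would normally write each chunk to its destination as it arrives
rather than collecting the chunks in a list as the usage line does.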

		Chunks in an indefinite-length byte string SHOULD NOT exceed 2^20
		bytes.

		Container Types
		===============

		Mercurial may use the array (major type 4), map (major type 5), and
		set (semantic tag 258 plus major type 4 array) container types.

		An array may contain any supported type as values.

		A map MUST only use the following types as keys:

		* unsigned integers (major type 0)
		* negative integers (major type 1)
		* byte strings (major type 2) (but not indefinite-length byte strings)
		* false (simple type 20)
		* true (simple type 21)
		* null (simple type 22)

		A map MUST only use the following types as values:

		* all types supported as map keys
		* arrays
		* maps
		* sets

		A set may only use the following types as values:

		* all types supported as map keys

		It is recommended that keys in maps and values in sets and arrays all
		be of a uniform type.
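
The container rules can be checked before encoding. The following is a
sketch under stated assumptions: ``validate`` and ``is_uniform`` are
hypothetical helpers, Python ``list``, ``dict`` and ``set`` stand in for
CBOR arrays, maps and sets, and indefinite-length byte strings are not
modelled because they only exist on the wire.

```python
def _is_scalar(obj):
    """True for the types allowed as map keys and set members."""
    return obj is None or isinstance(obj, (bool, int, bytes))


def validate(obj):
    """Raise ValueError if obj breaks the container rules above."""
    if _is_scalar(obj):
        return
    if isinstance(obj, list):
        # Arrays may contain any supported type.
        for value in obj:
            validate(value)
        return
    if isinstance(obj, dict):
        for key, value in obj.items():
            if not _is_scalar(key):
                raise ValueError("unsupported map key: %r" % (key,))
            validate(value)
        return
    if isinstance(obj, (set, frozenset)):
        # Sets are limited to the same types as map keys.
        for member in obj:
            if not _is_scalar(member):
                raise ValueError("unsupported set member: %r" % (member,))
        return
    raise ValueError("unsupported type: %r" % type(obj))


def is_uniform(values):
    """Check the recommendation that all values share one type."""
    return len({type(v) for v in values}) <= 1
```

``validate({b"files": [b"a", b"b"]})`` passes, while a map keyed by a text
string or a tuple is rejected. ``is_uniform`` only reports on the
recommendation; mixed types are still valid.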

		Avoiding Large Byte Strings
		===========================

		The use of large byte strings is discouraged, especially in scenarios where
		the total size of the byte string may be unbounded for some inputs (e.g. when
		representing the content of a tracked file). It is highly recommended to use
		indefinite-length byte strings for these purposes.

		Since indefinite-length byte strings cannot be nested within an outer
		container (such as an array or map), to associate a large byte string
		with another data structure, it is recommended to use an array or
		map followed immediately by an indefinite-length byte string. For example,
		instead of the following map::

		   {
		      "key1": "value1",
		      "key2": "value2",
		      "long_value": "some very large value...",
		   }

		Use a map followed by a byte string::

		   {
		      "key1": "value1",
		      "key2": "value2",
		      "value_follows": True,
		   }
		   <BEGIN INDEFINITE-LENGTH BYTE STRING>
		   "some very large value"
		   "..."
		   <END INDEFINITE-LENGTH BYTE STRING>
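
The recommended layout can be sketched as a generator that emits the map
first and the large value afterwards as a separate top-level item. The
helper names (``_head``, ``_bytes``, ``frame``) are hypothetical, the
metadata is assumed to map byte strings to byte strings, and the
``value_follows`` key is taken from the example above.

```python
def _head(major, length):
    """Initial byte plus shortest big-endian length argument."""
    if length < 24:
        return bytes([(major << 5) | length])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if length < 1 << (8 * size):
            return bytes([(major << 5) | info]) + length.to_bytes(size, "big")
    raise ValueError("length does not fit in 64 bits")


def _bytes(value):
    """Definite-length byte string (major type 2)."""
    return _head(2, len(value)) + value


def frame(meta, chunks):
    """Yield a map of metadata, then the payload as an indefinite-length
    byte string that sits next to the map rather than inside it."""
    header = dict(meta)
    header[b"value_follows"] = True
    parts = [_head(5, len(header))]
    for key, value in header.items():
        parts.append(_bytes(key))
        parts.append(b"\xf5" if value is True else _bytes(value))
    yield b"".join(parts)
    yield b"\x5f"  # begin indefinite-length byte string
    for chunk in chunks:
        yield _bytes(chunk)
    yield b"\xff"  # break stop code


# Usage, mirroring the example above with byte-string keys and values.
out = b"".join(frame({b"key1": b"value1"},
                     [b"some very large value", b"..."]))
```

A receiver decodes the map, sees ``value_follows`` set to true, and then
reads the next top-level item as the streamed value.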
