upstream/mercurial-mirror Files · mercurial/help/internals/cbor.txt


Gregory Szorc


				Mercurial uses Concise Binary Object Representation (CBOR)
				(RFC 7049) for various data formats.

				This document describes the subset of CBOR that Mercurial uses and
				gives recommendations for appropriate use of CBOR within Mercurial.

				Type Limitations
				================

				Major types 0 and 1 (unsigned integers and negative integers) MUST be
				fully supported.
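As a minimal sketch of what full support for major types 0 and 1 means on the encoding side, the following Python follows the integer encoding rules of RFC 7049. The names ``encode_head`` and ``encode_int`` are hypothetical and chosen for illustration; they are not Mercurial's API.

```python
import struct


def encode_head(major, n):
    """Return the initial byte plus argument for a definite-length item."""
    if n < 0:
        raise ValueError("argument must be non-negative")
    if n < 24:
        # Small arguments are packed directly into the initial byte.
        return bytes([major << 5 | n])
    # Additional information 24..27 selects a 1, 2, 4 or 8 byte argument.
    for info, fmt in ((24, ">B"), (25, ">H"), (26, ">L"), (27, ">Q")):
        if n < 1 << (8 * struct.calcsize(fmt)):
            return bytes([major << 5 | info]) + struct.pack(fmt, n)
    raise ValueError("argument does not fit in 64 bits")


def encode_int(n):
    """Encode a Python int as major type 0 (n >= 0) or 1 (n < 0)."""
    if n >= 0:
        return encode_head(0, n)
    # Major type 1 stores -1 - n, so -1 maps to argument 0.
    return encode_head(1, -1 - n)
```

Under these rules the representable range is -2^64 to 2^64 - 1; anything outside it raises ``ValueError`` in this sketch.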

				Major type 2 (byte strings) MUST be fully supported. However, there
				are limitations around the use of indefinite-length byte strings.
				(See below.)

				Major type 3 (text strings) are NOT supported.

				Major type 4 (arrays) MUST be supported. However, values are limited
				to the set of types described in the "Container Types" section below.
				And indefinite-length arrays are NOT supported.

				Major type 5 (maps) MUST be supported. However, keys and values are limited
				to the set of types described in the "Container Types" section below.
				And indefinite-length maps are NOT supported.
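The restrictions on major types 2 through 5 can be checked from the initial byte of an item alone. The sketch below is one possible way a decoder might reject unsupported input early; ``check_initial_byte`` and its ``inside_container`` parameter are hypothetical names, and the function deliberately ignores the tag and simple-value rules for major types 6 and 7.

```python
def check_initial_byte(byte, inside_container=False):
    """Return the major type, or raise ValueError for unsupported items."""
    major = byte >> 5          # high 3 bits
    info = byte & 0x1F         # low 5 bits (additional information)
    indefinite = info == 31

    if major == 3:
        raise ValueError("text strings (major type 3) are not supported")
    if major in (4, 5) and indefinite:
        raise ValueError("indefinite-length arrays and maps are not supported")
    if major == 2 and indefinite and inside_container:
        raise ValueError(
            "indefinite-length byte strings must not occur inside a container"
        )
    return major
```

For example, ``0x5f`` (start of an indefinite-length byte string) is accepted at the top level but rejected when ``inside_container`` is true.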

				Major type 6 (semantic tagging of major types) can be used with the
				following semantic tag values:

				258
				   Mathematical finite set. Suitable for representing Python's
				   ``set`` type.

				All other semantic tag values are not allowed.
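To illustrate how semantic tag 258 wraps an array, here is a minimal sketch that encodes a Python ``set`` of unsigned integers. The helper names are hypothetical, and sorting the members is an assumption made only so that the output is deterministic; this document does not require any particular order.

```python
import struct

TAG_SET = 258  # semantic tag: mathematical finite set


def head(major, n):
    """Return the initial byte plus argument for a definite-length item."""
    if n < 0:
        raise ValueError("argument must be non-negative")
    if n < 24:
        return bytes([major << 5 | n])
    for info, fmt in ((24, ">B"), (25, ">H"), (26, ">L"), (27, ">Q")):
        if n < 1 << (8 * struct.calcsize(fmt)):
            return bytes([major << 5 | info]) + struct.pack(fmt, n)
    raise ValueError("argument does not fit in 64 bits")


def encode_uint_set(values):
    """Encode a set of unsigned ints as tag 258 followed by an array."""
    items = sorted(values)  # assumption: sorted for reproducible output
    out = head(6, TAG_SET) + head(4, len(items))
    for n in items:
        out += head(0, n)
    return out
```

The tag itself occupies three bytes (``d9 01 02``), followed by an ordinary definite-length array.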

				Major type 7 (simple data types) can be used with the following
				type values:

				20
				   False
				21
				   True
				22
				   Null
				31
				   Break stop code (for indefinite-length items).

				All other simple data type values (including every value requiring the
				1 byte extension) are disallowed.
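The allowed simple values map directly onto single bytes (``0xf4``, ``0xf5``, ``0xf6`` and ``0xff``). A decoder could enforce the list above roughly as follows; ``decode_simple`` and the ``BREAK`` sentinel are hypothetical names used only for this sketch.

```python
BREAK = object()  # sentinel for the break stop code

# Additional-information values permitted for major type 7.
ALLOWED_SIMPLE = {20: False, 21: True, 22: None}


def decode_simple(byte):
    """Decode a major type 7 initial byte, rejecting disallowed values."""
    major, info = byte >> 5, byte & 0x1F
    if major != 7:
        raise ValueError("not a major type 7 item")
    if info == 31:
        return BREAK
    if info in ALLOWED_SIMPLE:
        return ALLOWED_SIMPLE[info]
    # Covers undefined (23), the 1 byte extension (24), floats (25-27)
    # and every other simple value.
    raise ValueError("simple value %d is not allowed" % info)
```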

				Indefinite-Length Byte Strings
				==============================

				Indefinite-length byte strings (major type 2) are allowed. However,
				they MUST NOT occur inside a container type (such as an array or map).
				i.e. they can only occur as the "top-most" element in a stream of
				values.

				Encoders and decoders SHOULD stream indefinite-length byte strings.
				i.e. an encoder or decoder SHOULD NOT buffer the entirety of a long
				byte string value when indefinite-length byte strings are being used
				if it can be avoided. Mercurial MAY use extremely long indefinite-length
				byte strings, and buffering the source or destination value could lead to
				memory exhaustion.

				Chunks in an indefinite-length byte string SHOULD NOT exceed 2^20
				bytes.
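The streaming and chunk-size recommendations above can be sketched as a generator that reads from a file-like object and never holds more than one chunk in memory. The names ``stream_indefinite_bytes`` and ``CHUNK_SIZE`` are hypothetical; only the 2^20 limit comes from this document.

```python
import io
import struct

CHUNK_SIZE = 2 ** 20  # recommended upper bound for a single chunk


def head(major, n):
    """Return the initial byte plus argument for a definite-length item."""
    if n < 0:
        raise ValueError("argument must be non-negative")
    if n < 24:
        return bytes([major << 5 | n])
    for info, fmt in ((24, ">B"), (25, ">H"), (26, ">L"), (27, ">Q")):
        if n < 1 << (8 * struct.calcsize(fmt)):
            return bytes([major << 5 | info]) + struct.pack(fmt, n)
    raise ValueError("argument does not fit in 64 bits")


def stream_indefinite_bytes(fh, chunk_size=CHUNK_SIZE):
    """Yield the encoding of fh's content as an indefinite-length byte string."""
    yield b"\x5f"  # major type 2, additional information 31
    while True:
        chunk = fh.read(chunk_size)
        if not chunk:
            break
        # Each chunk is an ordinary definite-length byte string.
        yield head(2, len(chunk))
        yield chunk
    yield b"\xff"  # break stop code


# Usage: a tiny chunk size makes the framing visible.
encoded = b"".join(stream_indefinite_bytes(io.BytesIO(b"abcdef"), chunk_size=4))
```

A real encoder would write each yielded piece to its output as it is produced instead of joining them.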

				Container Types
				===============

				Mercurial may use the array (major type 4), map (major type 5), and
				set (semantic tag 258 plus major type 4 array) container types.

				An array may contain any supported type as values, except
				indefinite-length byte strings (see above).

				A map MUST only use the following types as keys:

				* unsigned integers (major type 0)
				* negative integers (major type 1)
				* byte strings (major type 2) (but not indefinite-length byte strings)
				* false (simple type 20)
				* true (simple type 21)
				* null (simple type 22)

				A map MUST only use the following types as values:

				* all types supported as map keys
				* arrays
				* maps
				* sets

				A set may only use the following types as values:

				* all types supported as map keys

				It is recommended that keys in maps and values in sets and arrays all
				be of a uniform type.
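The key and value rules above can be expressed as a recursive check over Python objects before they are encoded. This is a sketch under stated assumptions: ``validate`` is a hypothetical helper, Python ``list``, ``dict`` and ``set``/``frozenset`` stand in for CBOR arrays, maps and sets, and ``str`` is rejected because text strings are not supported. Indefinite-length byte strings have no Python representation here, so they are out of scope.

```python
def _is_scalar(obj):
    """True for types usable as map keys: ints, byte strings, bools, None."""
    if obj is None or isinstance(obj, (bool, bytes)):
        return True
    if isinstance(obj, int):
        # Major types 0 and 1 together cover -2**64 .. 2**64 - 1.
        if not -(1 << 64) <= obj < (1 << 64):
            raise ValueError("integer out of range for CBOR")
        return True
    return False


def validate(obj):
    """Raise TypeError if obj uses a type this document does not allow."""
    if _is_scalar(obj):
        return
    if isinstance(obj, list):
        for item in obj:
            validate(item)
    elif isinstance(obj, dict):
        for key, value in obj.items():
            if not _is_scalar(key):
                raise TypeError("unsupported map key: %r" % (key,))
            validate(value)
    elif isinstance(obj, (set, frozenset)):
        for item in obj:
            if not _is_scalar(item):
                raise TypeError("unsupported set member: %r" % (item,))
    else:
        raise TypeError("unsupported type: %s" % type(obj).__name__)
```

The uniform-type recommendation is not enforced by this sketch, since it is advice and not a requirement.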

				Avoiding Large Byte Strings
				===========================

				The use of large byte strings is discouraged, especially in scenarios where
				the total size of the byte string may be unbounded for some inputs (e.g. when
				representing the content of a tracked file). It is highly recommended to use
				indefinite-length byte strings for these purposes.

				Since indefinite-length byte strings cannot be nested within an outer
				container (such as an array or map), to associate a large byte string
				with another data structure, it is recommended to use an array or
				map followed immediately by an indefinite-length byte string. For example,
				instead of the following map::

				   {
				      "key1": "value1",
				      "key2": "value2",
				      "long_value": "some very large value...",
				   }

				Use a map followed by a byte string::

				   {
				      "key1": "value1",
				      "key2": "value2",
				      "value_follows": True,
				   }
				   <BEGIN INDEFINITE-LENGTH BYTE STRING>
				   "some very large value"
				   "..."
				   <END INDEFINITE-LENGTH BYTE STRING>
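The layout in the example above could be produced along the following lines. This is a minimal sketch, not Mercurial's encoder: ``encode_with_trailing_bytes`` is a hypothetical name, keys and values are assumed to be byte strings, and the result is joined into one ``bytes`` object only to keep the example short (a real encoder would stream the chunks).

```python
import struct


def head(major, n):
    """Return the initial byte plus argument for a definite-length item."""
    if n < 0:
        raise ValueError("argument must be non-negative")
    if n < 24:
        return bytes([major << 5 | n])
    for info, fmt in ((24, ">B"), (25, ">H"), (26, ">L"), (27, ">Q")):
        if n < 1 << (8 * struct.calcsize(fmt)):
            return bytes([major << 5 | info]) + struct.pack(fmt, n)
    raise ValueError("argument does not fit in 64 bits")


def encode_bytes(value):
    """Encode a definite-length byte string (major type 2)."""
    return head(2, len(value)) + value


def encode_with_trailing_bytes(meta, chunks):
    """Encode a map of bytes -> bytes, then an indefinite-length byte string."""
    out = [head(5, len(meta) + 1)]  # +1 for the value_follows entry
    for key, value in meta.items():
        out.append(encode_bytes(key))
        out.append(encode_bytes(value))
    out.append(encode_bytes(b"value_follows"))
    out.append(b"\xf5")  # simple value 21: True

    # The large value follows the map as a separate top-level item.
    out.append(b"\x5f")
    for chunk in chunks:
        out.append(encode_bytes(chunk))
    out.append(b"\xff")
    return b"".join(out)
```

A decoder reading this stream would see two top-level values in sequence: the map, then the byte string it announces.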
