match: sort patterns before compiling them into a regex

While investigating crippling performance for `hg cat` in some context, I
discovered that, for large inputs, building a regex from out-of-order patterns
may result in a *much* slower regex and much slower performance of the
associated matcher.

So we are now sorting the patterns to help the regex engine.

There is more to the story as we rely on regexp more than we should. See the
next changeset for details.

Benchmarks
==========

In the following benchmark we are comparing the `hg cat` and `hg files` run
time when matching against the full list of files in the repository. They are
run:

- without the rust extensions
- with the standard python engine (so without re2)

sort vs non-sorted - Before this changeset (3f5137543773)
---------------------------------------------------------

###### hg files ###############################################################
### mercurial-2018-08-01-zstd-sparse-revlog
  sorted:   0.230092 seconds
  shuffled: 0.234235 seconds (+1.80%)
### pypy-2018-08-01-zstd-sparse-revlog
  sorted:   0.613567 seconds
  shuffled: 0.801880 seconds (+30.69%)
### mozilla-central-2018-08-01-zstd-sparse-revlog
  sorted:   62.474221 seconds
  shuffled: 1364.180218 seconds (+2083.59%)
### netbeans-2018-08-01-zstd-sparse-revlog
  sorted:   21.541828 seconds
  shuffled: 172.759857 seconds (+701.97%)

###### hg cat #################################################################
### mercurial-2018-08-01-zstd-sparse-revlog
  sorted:   0.764407 seconds
  shuffled: 0.768924 seconds
### pypy-2018-08-01-zstd-sparse-revlog
  sorted:   2.065220 seconds
  shuffled: 2.276388 seconds (+10.22%)
### netbeans-2018-08-01-zstd-sparse-revlog
  sorted:   40.967983 seconds
  shuffled: 216.388709 seconds (+428.19%)
### mozilla-central-2018-08-01-zstd-sparse-revlog
  sorted:   105.228510 seconds
  shuffled: 1448.722784 seconds (+1276.74%)

sort vs non-sorted - With this changeset
----------------------------------------

###### hg files ###############################################################
### mercurial-2018-08-01-zstd-sparse-revlog
  all-list-pattern-sorted:   0.230069
  all-list-pattern-shuffled: 0.231165
### pypy-2018-08-01-zstd-sparse-revlog
  all-list-pattern-sorted:   0.616799
  all-list-pattern-shuffled: 0.616393
### netbeans-2018-08-01-zstd-sparse-revlog
  all-list-pattern-sorted:   21.586773
  all-list-pattern-shuffled: 21.908197
### mozilla-central-2018-08-01-zstd-sparse-revlog
  all-list-pattern-sorted:   61.279490
  all-list-pattern-shuffled: 62.473549

###### hg cat #################################################################
### mercurial-2018-08-01-zstd-sparse-revlog
  sorted:   0.763883 seconds
  shuffled: 0.765848 seconds
### pypy-2018-08-01-zstd-sparse-revlog
  sorted:   2.070498 seconds
  shuffled: 2.069197 seconds
### netbeans-2018-08-01-zstd-sparse-revlog
  sorted:   41.392423 seconds
  shuffled: 41.648689 seconds
### mozilla-central-2018-08-01-zstd-sparse-revlog
  sorted:   103.315670 seconds
  shuffled: 104.369358 seconds

bundles.txt
A bundle is a container for repository data.
Bundles are used as standalone files as well as the interchange format
over the wire protocol used when two Mercurial peers communicate with
each other.
Headers
=======
Bundles produced since Mercurial 0.7 (September 2005) have a 4 byte
header identifying the major bundle type. The header always begins with
``HG`` and the following 2 bytes indicate the bundle type/version. Some
bundle types have additional data after this 4 byte header.
The following sections describe each bundle header/type.
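
As a rough illustration of the header layout described above, the following
Python sketch reads the 4 byte header and classifies the bundle type. The
function names ``read_bundle_type`` and ``describe_bundle_type`` are
hypothetical and are not part of Mercurial's API; the mapping simply mirrors
the sections of this document.

```python
import io


def read_bundle_type(stream):
    """Read the 4 byte header and return the 2 type/version bytes.

    Hypothetical helper for illustration; not Mercurial's own code.
    """
    header = stream.read(4)
    # Every bundle header starts with the literal bytes "HG".
    if len(header) != 4 or header[:2] != b"HG":
        raise ValueError("not a Mercurial bundle header: %r" % header)
    return header[2:4]


def describe_bundle_type(type_bytes):
    """Map the 2 type/version bytes to the bundle kinds described here."""
    if type_bytes == b"10":
        return "changegroup bundle (bundle1)"
    if type_bytes[:1] == b"2":
        # HG2X: any value of X denotes bundle2; HG20 is the only defined one.
        return "bundle2"
    if type_bytes == b"S1":
        return "streaming clone bundle"
    return "unknown bundle type"


if __name__ == "__main__":
    sample = io.BytesIO(b"HG20" + b"remaining bundle data")
    print(describe_bundle_type(read_bundle_type(sample)))
```

The sketch only inspects the first 4 bytes; any additional header data that a
given bundle type carries is left unread in the stream.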
HG10
----
``HG10`` headers indicate a *changegroup bundle*. This is the original
bundle format, so it is sometimes referred to as *bundle1*. It has been
present since version 0.7 (released September 2005).
This header is followed by 2 bytes indicating the compression algorithm
used for data that follows. All subsequent data following this
compression identifier is compressed according to the algorithm/method
specified.
Supported algorithms include the following.
``BZ``
   *bzip2* compression.

   Bzip2 compressors emit a leading ``BZ`` header. Mercurial uses this
   leading ``BZ`` as part of the bundle header. Therefore consumers
   of bzip2 bundles need to *seed* the bzip2 decompressor with ``BZ`` or
   seek the input stream back to the beginning of the algorithm component
   of the bundle header so that decompressor input is valid. This behavior
   is unique among supported compression algorithms.

   Supported since version 0.7 (released September 2005).

``GZ``
   *zlib* compression.

   Supported since version 0.9.2 (released December 2006).

``UN``
   *Uncompressed* or no compression. Unmodified changegroup data follows.

   Supported since version 0.9.2 (released December 2006).

Third-party extensions may implement their own compression. However, no
authority reserves values for their compression algorithm identifiers.
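
The ``BZ`` seeding requirement is easy to get wrong, so here is a minimal
Python sketch of how a consumer might decode an ``HG10`` bundle using only the
standard library. The function name ``read_hg10_payload`` is hypothetical and
this is not Mercurial's implementation; it also assumes the whole bundle fits
in memory, whereas a real consumer would normally stream the data.

```python
import bz2
import io
import zlib


def read_hg10_payload(stream):
    """Return the decompressed changegroup data of an HG10 bundle.

    Hypothetical helper for illustration; reads the full stream into memory.
    """
    if stream.read(4) != b"HG10":
        raise ValueError("not an HG10 bundle")
    algorithm = stream.read(2)
    body = stream.read()
    if algorithm == b"BZ":
        # The 2 byte identifier doubles as the start of the bzip2 stream,
        # so it is re-attached ("seeded") before decompressing.
        return bz2.decompress(b"BZ" + body)
    if algorithm == b"GZ":
        return zlib.decompress(body)
    if algorithm == b"UN":
        return body
    raise ValueError("unsupported compression identifier: %r" % algorithm)


if __name__ == "__main__":
    data = b"example changegroup data"
    # bz2.compress() output already begins with b"BZ", so the bundle is
    # simply the 4 byte header followed by the bzip2 stream.
    bundle = io.BytesIO(b"HG10" + bz2.compress(data))
    assert read_hg10_payload(bundle) == data
```

Prepending ``BZ`` is the *seed* approach; seeking the stream back 2 bytes
before handing it to the decompressor would be the equivalent alternative
mentioned above.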
HG2X
----
``HG2X`` headers (where ``X`` is any value) denote a *bundle2* bundle.
Bundle2 bundles are a container format for various kinds of repository
data and capabilities, beyond changegroup data (which was the only data
supported by ``HG10`` bundles).
``HG20`` is currently the only defined bundle2 version.
The ``HG20`` format is documented at :hg:`help internals.bundle2`.
Initial ``HG20`` support was added in Mercurial 3.0 (released May
2014). However, bundle2 bundles were hidden behind an experimental flag
until version 3.5 (released August 2015), when they were enabled in the
wire protocol. Various commands (including ``hg bundle``) did not
support generating bundle2 files until Mercurial 3.6 (released November
2015).
HGS1
----
*Experimental*
A ``HGS1`` header indicates a *streaming clone bundle*. This is a bundle
that contains raw revlog data from a repository store. (Typically revlog
data is exchanged in the form of changegroups.)
The purpose of *streaming clone bundles* is to *clone* repository data
very efficiently.
The ``HGS1`` header is always followed by 2 bytes indicating a
compression algorithm of the data that follows. Only ``UN``
(uncompressed data) is currently allowed.
``HGS1UN`` support was added as an experimental feature in version 3.6
(released November 2015) as part of the initial offering of the *clone
bundles* feature.