branchmap-v3: filter topo heads using node for performance reason

The branchmap currently contains heads as nodeid. If we build a set of revnum with the topological heads, we need to turn the nodeid in the branchmap to revnum to be able to check if they are topo-heads. That nodeid → revnum lookup is "expensive" and adds up to something noticeable if you do it hundreds of thousands of times.

Instead we turn all the topo-heads revnums into nodes and build a set, so we can directly test membership of the nodeids stored in the branchmap. That is much faster.

Ideally we would have revnum in the branchmap and could directly test revnum against a revnum set, and that would be even faster. However that's an adventure for another time.

Without this change, the branchmap format "v3" was significantly slower than the "v2" format. With this change, some of that gap is recovered.

With rust + persistent nodemap, this overhead was smaller because the extra lookup did not have to build the nodemap from scratch. In addition the mozilla-unified repository is able to use the "pure_top" mode of branchmap v3, so it was not really affected by this.

Future changesets will work on the remainder of the performance gap.
### benchmark.name = hg.command.unbundle
# bin-env-vars.hg.py-re2-module = default
# benchmark.variants.issue6528 = disabled
# benchmark.variants.resource-usage = default
# benchmark.variants.reuse-external-delta-parent = yes
# benchmark.variants.revs = any-1-extra-rev
# benchmark.variants.source = unbundle
# benchmark.variants.validate = default
# benchmark.variants.verbosity = quiet

## data-env-vars.name = netbeans-2018-08-01-zstd-sparse-revlog
# bin-env-vars.hg.flavor = default
branch-v2:        0.233711 ~~~~~
branch-v3 before: 0.380994 (+63.02%, +0.15)
branch-v3 after:  0.368769 (+57.79%, +0.14)
# bin-env-vars.hg.flavor = rust
branch-v2:        0.235230 ~~~~~
branch-v3 before: 0.385060 (+63.70%, +0.15)
branch-v3 after:  0.372460 (+58.34%, +0.14)

## data-env-vars.name = netbeans-2018-08-01-ds2-pnm
# bin-env-vars.hg.flavor = rust
branch-v2:        0.255586 ~~~~~
branch-v3 before: 0.317524 (+24.23%, +0.06)
branch-v3 after:  0.318907 (+24.78%, +0.06)

## data-env-vars.name = mozilla-central-2024-03-22-zstd-sparse-revlog
# bin-env-vars.hg.flavor = default
branch-v2:        0.339010 ~~~~~
branch-v3 before: 0.410007 (+20.94%, +0.07)
branch-v3 after:  0.349752 (+3.17%, +0.01)
# bin-env-vars.hg.flavor = rust
branch-v2:        0.346525 ~~~~~
branch-v3 before: 0.410428 (+18.44%, +0.06)
branch-v3 after:  0.354300 (+2.24%, +0.01)

## data-env-vars.name = mozilla-central-2024-03-22-ds2-pnm
# bin-env-vars.hg.flavor = rust
branch-v2:        0.380202 ~~~~~
branch-v3 before: 0.393871 (+3.60%, +0.01)
branch-v3 after:  0.396293 (+4.23%, +0.02)

## data-env-vars.name = mozilla-unified-2024-03-22-zstd-sparse-revlog
# bin-env-vars.hg.flavor = default
branch-v2:        0.412165 ~~~~~
branch-v3 before: 0.438105 (+6.29%, +0.03)
branch-v3 after:  0.424769 (+3.06%, +0.01)
# bin-env-vars.hg.flavor = rust
branch-v2:        0.412397 ~~~~~
branch-v3 before: 0.438405 (+6.31%, +0.03)
branch-v3 after:  0.421796 (+2.28%, +0.01)

## data-env-vars.name = mozilla-unified-2024-03-22-ds2-pnm
# bin-env-vars.hg.flavor = rust
branch-v2:        0.429501 ~~~~~
branch-v3 before: 0.452692 (+5.40%, +0.02)
branch-v3 after:  0.443849 (+3.34%, +0.01)

## data-env-vars.name = mozilla-try-2024-03-26-zstd-sparse-revlog
# bin-env-vars.hg.flavor = default
branch-v2:        3.403171 ~~~~~
branch-v3 before: 6.562345 (+92.83%, +3.16)
branch-v3 after:  6.234055 (+83.18%, +2.83)
# bin-env-vars.hg.flavor = rust
branch-v2:        3.454876 ~~~~~
branch-v3 before: 6.160248 (+78.31%, +2.71)
branch-v3 after:  6.307813 (+82.58%, +2.85)

## data-env-vars.name = mozilla-try-2024-03-26-ds2-pnm
# bin-env-vars.hg.flavor = rust
branch-v2:        3.465435 ~~~~~
branch-v3 before: 5.381648 (+55.30%, +1.92)
branch-v3 after:  5.176076 (+49.36%, +1.71)
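
The filtering idea in the commit message can be sketched roughly as follows. This is only an illustration of the two shapes of the check, not the actual branchmap code: `rev_to_node`, `node_to_rev`, and both functions are hypothetical stand-ins, and with plain dictionaries the two variants cost about the same. In the real changelog the nodeid → revnum direction is the expensive one, which is why converting the (few) topo-heads once is preferable.

```python
# Hypothetical stand-ins for the changelog: revnum -> nodeid and its reverse.
rev_to_node = {0: b"n0", 1: b"n1", 2: b"n2", 3: b"n3"}
node_to_rev = {node: rev for rev, node in rev_to_node.items()}


def filter_by_rev_lookup(branch_heads, topo_head_revs):
    """Older shape: one nodeid -> revnum lookup per branchmap head."""
    topo = set(topo_head_revs)
    return [n for n in branch_heads if node_to_rev[n] in topo]


def filter_by_node_set(branch_heads, topo_head_revs):
    """Shape described in the message: convert topo-heads to nodes once,
    then test the nodeids stored in the branchmap directly."""
    topo_nodes = {rev_to_node[r] for r in topo_head_revs}
    return [n for n in branch_heads if n in topo_nodes]


branch_heads = [b"n1", b"n3"]  # heads as stored in the branchmap (nodeids)
topo_head_revs = [2, 3]  # topological heads as revnums
result = filter_by_node_set(branch_heads, topo_head_revs)
```

Both functions return the same heads; only the direction of the conversion differs.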

File last commit:

r52756:f4733654 default
r52869:41b8892a default
constants.py
319 lines | 9.8 KiB | text/x-python | PythonLexer
# revlogdeltas.py - constant used for revlog logic.
#
# Copyright 2005-2007 Olivia Mackall <olivia@selenic.com>
# Copyright 2018 Octobus <contact@octobus.net>
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.
"""Helper class to compute deltas stored inside revlogs"""

from __future__ import annotations

import struct

from ..interfaces import repository
from .. import revlogutils

### Internal utility constants
KIND_CHANGELOG = 1001 # over 256 to not be comparable with a bytes
KIND_MANIFESTLOG = 1002
KIND_FILELOG = 1003
KIND_OTHER = 1004
ALL_KINDS = {
    KIND_CHANGELOG,
    KIND_MANIFESTLOG,
    KIND_FILELOG,
    KIND_OTHER,
}

### Index entry key
#
#
# Internal details
# ----------------
#
# A large part of the revlog logic deals with revisions' "index entries", tuple
# objects that contain the same "items" whatever the revlog version.
# Different versions will have different ways of storing these items (sometimes
# not having them at all), but the tuple will always be the same. New fields
# are usually added at the end to avoid breaking existing code that relies
# on the existing order. The fields are defined as follows:
# [0] offset:
# The byte index of the start of revision data chunk.
# That value is shifted up by 16 bits. use "offset = field >> 16" to
# retrieve it.
#
# flags:
# A flag field that carries special information or changes the behavior
# of the revision. (see `REVIDX_*` constants for details)
# The flag field only occupies the first 16 bits of this field,
# use "flags = field & 0xFFFF" to retrieve the value.
ENTRY_DATA_OFFSET = 0
# [1] compressed length:
# The size, in bytes, of the chunk on disk
ENTRY_DATA_COMPRESSED_LENGTH = 1
# [2] uncompressed length:
# The size, in bytes, of the full revision once reconstructed.
ENTRY_DATA_UNCOMPRESSED_LENGTH = 2
# [3] base rev:
# Either the base of the revision delta chain (without general
# delta), or the base of the delta (stored in the data chunk)
# with general delta.
ENTRY_DELTA_BASE = 3
# [4] link rev:
# Changelog revision number of the changeset introducing this
# revision.
ENTRY_LINK_REV = 4
# [5] parent 1 rev:
# Revision number of the first parent
ENTRY_PARENT_1 = 5
# [6] parent 2 rev:
# Revision number of the second parent
ENTRY_PARENT_2 = 6
# [7] node id:
# The node id of the current revision
ENTRY_NODE_ID = 7
# [8] sidedata offset:
# The byte index of the start of the revision's side-data chunk.
ENTRY_SIDEDATA_OFFSET = 8
# [9] sidedata chunk length:
# The size, in bytes, of the revision's side-data chunk.
ENTRY_SIDEDATA_COMPRESSED_LENGTH = 9
# [10] data compression mode:
# two bits that detail the way the data chunk is compressed on disk.
# (see "COMP_MODE_*" constants for details). For revlog version 0 and
# 1 this will always be COMP_MODE_INLINE.
ENTRY_DATA_COMPRESSION_MODE = 10
# [11] side-data compression mode:
# two bits that detail the way the sidedata chunk is compressed on disk.
# (see "COMP_MODE_*" constants for details)
ENTRY_SIDEDATA_COMPRESSION_MODE = 11
# [12] Revision rank:
# The number of revisions under this one.
#
# Formally this is defined as : rank(X) = len(ancestors(X) + X)
#
# If rank == -1; then we do not have this information available.
# Only `null` has a rank of 0.
ENTRY_RANK = 12
RANK_UNKNOWN = -1
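
As a rough illustration of the comments above, the sketch below packs and unpacks the combined offset/flags item (entry `[0]`) and computes a rank on a toy graph. The helper names, the sample values, and the use of `-1` as the null revision are assumptions made for this example; only the shift and mask expressions and the rank definition come from the comments.

```python
ENTRY_DATA_OFFSET = 0  # same value as the constant defined above


def pack_offset_flags(offset, flags):
    """Hypothetical helper: offset in the upper bits, flags in the low 16."""
    return (offset << 16) | (flags & 0xFFFF)


def unpack_offset_flags(field):
    """Apply the two expressions quoted in the comments for entry [0]."""
    return field >> 16, field & 0xFFFF


entry = (pack_offset_flags(4096, 0x8000),)  # toy one-item "entry"
offset, flags = unpack_offset_flags(entry[ENTRY_DATA_OFFSET])


def rank(rev, parents):
    """rank(X) = len(ancestors(X) + X), assuming -1 stands for `null`."""
    seen = set()
    stack = [rev]
    while stack:
        r = stack.pop()
        if r == -1 or r in seen:
            continue
        seen.add(r)
        stack.extend(parents[r])
    return len(seen)


# Toy graph: 1 and 2 are children of 0, and 3 merges 1 and 2.
parents = {0: (-1, -1), 1: (0, -1), 2: (0, -1), 3: (1, 2)}
merge_rank = rank(3, parents)
```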

### main revlog header

# We cannot rely on Struct.format: it is inconsistent between Python <= 3.6
# and later versions.
INDEX_HEADER_FMT = b">I"
INDEX_HEADER = struct.Struct(INDEX_HEADER_FMT)

## revlog version
REVLOGV0 = 0
REVLOGV1 = 1
# Dummy value until file format is finalized.
REVLOGV2 = 0xDEAD
# Dummy value until file format is finalized.
CHANGELOGV2 = 0xD34D

## global revlog header flags
# Shared across v1 and v2.
FLAG_INLINE_DATA = 1 << 16
# Only used by v1, implied by v2.
FLAG_GENERALDELTA = 1 << 17
REVLOG_DEFAULT_FLAGS = FLAG_INLINE_DATA
REVLOG_DEFAULT_FORMAT = REVLOGV1
REVLOG_DEFAULT_VERSION = REVLOG_DEFAULT_FORMAT | REVLOG_DEFAULT_FLAGS
REVLOGV0_FLAGS = 0
REVLOGV1_FLAGS = FLAG_INLINE_DATA | FLAG_GENERALDELTA
REVLOGV2_FLAGS = FLAG_INLINE_DATA
CHANGELOGV2_FLAGS = 0

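
A minimal sketch of how the 4-byte header might be split into a version and a set of flags. The 16-bit split is an assumption inferred from the constants above (versions fit in the low 16 bits, flags start at bit 16), and `split_header` is a hypothetical helper, not a function from this module.

```python
import struct

# Values copied from the constants above.
INDEX_HEADER = struct.Struct(b">I")
REVLOGV1 = 1
FLAG_INLINE_DATA = 1 << 16
FLAG_GENERALDELTA = 1 << 17


def split_header(raw):
    """Hypothetical helper: return (version, flags) from the first 4 bytes.

    Assumes the version lives in the low 16 bits and flags above them.
    """
    (header,) = INDEX_HEADER.unpack(raw[:4])
    return header & 0xFFFF, header & ~0xFFFF


raw = INDEX_HEADER.pack(REVLOGV1 | FLAG_INLINE_DATA | FLAG_GENERALDELTA)
version, flags = split_header(raw)
is_inline = bool(flags & FLAG_INLINE_DATA)
```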
### individual entry

## index v0:
# 4 bytes: offset
# 4 bytes: compressed length
# 4 bytes: base rev
# 4 bytes: link rev
# 20 bytes: parent 1 nodeid
# 20 bytes: parent 2 nodeid
# 20 bytes: nodeid
INDEX_ENTRY_V0 = struct.Struct(b">4l20s20s20s")

## index v1
# 6 bytes: offset
# 2 bytes: flags
# 4 bytes: compressed length
# 4 bytes: uncompressed length
# 4 bytes: base rev
# 4 bytes: link rev
# 4 bytes: parent 1 rev
# 4 bytes: parent 2 rev
# 32 bytes: nodeid
INDEX_ENTRY_V1 = struct.Struct(b">Qiiiiii20s12x")
assert INDEX_ENTRY_V1.size == 32 * 2
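
To make the v1 layout concrete, here is a small round trip through the same `struct` format. The field values are made up for the example; only the format string and the field order come from the comments above.

```python
import struct

INDEX_ENTRY_V1 = struct.Struct(b">Qiiiiii20s12x")  # same format as above

node = bytes(range(20))  # made-up 20-byte node id
offset_flags = (1234 << 16) | 0  # 6 bytes of offset, 2 bytes of flags

raw = INDEX_ENTRY_V1.pack(
    offset_flags,
    100,  # compressed length
    250,  # uncompressed length
    0,  # base rev
    7,  # link rev
    6,  # parent 1 rev
    -1,  # parent 2 rev
    node,  # padded to 32 bytes on disk by the trailing "12x"
)
fields = INDEX_ENTRY_V1.unpack(raw)
```

The 12 padding bytes are written by `pack` and skipped by `unpack`, so `fields` holds eight items and the node comes back as the original 20 bytes.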

# 6 bytes: offset
# 2 bytes: flags
# 4 bytes: compressed length
# 4 bytes: uncompressed length
# 4 bytes: base rev
# 4 bytes: link rev
# 4 bytes: parent 1 rev
# 4 bytes: parent 2 rev
# 32 bytes: nodeid
# 8 bytes: sidedata offset
# 4 bytes: sidedata compressed length
# 1 bytes: compression mode (2 lower bit are data_compression_mode)
# 19 bytes: Padding to align to 96 bytes (see RevlogV2Plan wiki page)
INDEX_ENTRY_V2 = struct.Struct(b">Qiiiiii20s12xQiB19x")
assert INDEX_ENTRY_V2.size == 32 * 3, INDEX_ENTRY_V2.size

# 6 bytes: offset
# 2 bytes: flags
# 4 bytes: compressed length
# 4 bytes: uncompressed length
# 4 bytes: parent 1 rev
# 4 bytes: parent 2 rev
# 32 bytes: nodeid
# 8 bytes: sidedata offset
# 4 bytes: sidedata compressed length
# 1 bytes: compression mode (2 lower bit are data_compression_mode)
# 4 bytes: changeset rank (i.e. `len(::REV)`)
# 23 bytes: Padding to align to 96 bytes (see RevlogV2Plan wiki page)
INDEX_ENTRY_CL_V2 = struct.Struct(b">Qiiii20s12xQiBi23x")
assert INDEX_ENTRY_CL_V2.size == 32 * 3, INDEX_ENTRY_CL_V2.size
INDEX_ENTRY_V2_IDX_OFFSET = 0
INDEX_ENTRY_V2_IDX_COMPRESSED_LENGTH = 1
INDEX_ENTRY_V2_IDX_UNCOMPRESSED_LENGTH = 2
INDEX_ENTRY_V2_IDX_PARENT_1 = 3
INDEX_ENTRY_V2_IDX_PARENT_2 = 4
INDEX_ENTRY_V2_IDX_NODEID = 5
INDEX_ENTRY_V2_IDX_SIDEDATA_OFFSET = 6
INDEX_ENTRY_V2_IDX_SIDEDATA_COMPRESSED_LENGTH = 7
INDEX_ENTRY_V2_IDX_COMPRESSION_MODE = 8
INDEX_ENTRY_V2_IDX_RANK = 9
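
The helper constants above index the tuple returned by unpacking a changelog-v2 entry. The sketch below shows one possible use, with made-up field values; decoding only the two lower bits of the compression byte follows the layout comment, and nothing is assumed here about the remaining bits.

```python
import struct

# Format and indexes copied from the definitions above.
INDEX_ENTRY_CL_V2 = struct.Struct(b">Qiiii20s12xQiBi23x")
INDEX_ENTRY_V2_IDX_PARENT_1 = 3
INDEX_ENTRY_V2_IDX_COMPRESSION_MODE = 8
INDEX_ENTRY_V2_IDX_RANK = 9

node = b"\x11" * 20  # made-up node id
raw = INDEX_ENTRY_CL_V2.pack(
    0,  # offset and flags
    120,  # compressed length
    300,  # uncompressed length
    4,  # parent 1 rev
    -1,  # parent 2 rev
    node,
    0,  # sidedata offset
    0,  # sidedata compressed length
    0b01,  # compression mode byte
    6,  # changeset rank
)
fields = INDEX_ENTRY_CL_V2.unpack(raw)
data_comp_mode = fields[INDEX_ENTRY_V2_IDX_COMPRESSION_MODE] & 0b11
entry_rank = fields[INDEX_ENTRY_V2_IDX_RANK]
```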

# revlog index flags

# For historical reasons, revlog's internal flags were exposed via the
# wire protocol and are even exposed in parts of the storage APIs.
# revision has censor metadata, must be verified
REVIDX_ISCENSORED = repository.REVISION_FLAG_CENSORED
# revision hash does not match data (narrowhg)
REVIDX_ELLIPSIS = repository.REVISION_FLAG_ELLIPSIS
# revision data is stored externally
REVIDX_EXTSTORED = repository.REVISION_FLAG_EXTSTORED
# revision changes files in a way that could affect copy tracing.
REVIDX_HASCOPIESINFO = repository.REVISION_FLAG_HASCOPIESINFO
REVIDX_DEFAULT_FLAGS = 0
# stable order in which flags need to be processed and their processors applied
REVIDX_FLAGS_ORDER = [
    REVIDX_ISCENSORED,
    REVIDX_ELLIPSIS,
    REVIDX_EXTSTORED,
    REVIDX_HASCOPIESINFO,
]

# bitmask for flags that could cause rawdata content change
REVIDX_RAWTEXT_CHANGING_FLAGS = REVIDX_ISCENSORED | REVIDX_EXTSTORED

## chunk compression mode constants:
# These constants are used in revlog version >=2 to denote the compression used
# for a chunk.
# Chunk uses no compression; the data stored on disk can be used directly as
# the chunk value, without any header information prefixed.
COMP_MODE_PLAIN = 0
# Chunk uses the "default compression" for the revlog (usually defined in the
# revlog docket). A header is still used.
#
# XXX: keeping a header is probably not useful and we should probably drop it.
#
# XXX: The value of allowing mixed types of compression in the revlog is
# unclear and we should consider making PLAIN/DEFAULT the only available modes
# for revlog v2, disallowing INLINE mode.
COMP_MODE_DEFAULT = 1
# Chunk uses a compression mode stored "inline" at the start of the chunk
# itself. This is the mode always used for revlog version "0" and "1"
COMP_MODE_INLINE = revlogutils.COMP_MODE_INLINE

SUPPORTED_FLAGS = {
    REVLOGV0: REVLOGV0_FLAGS,
    REVLOGV1: REVLOGV1_FLAGS,
    REVLOGV2: REVLOGV2_FLAGS,
    CHANGELOGV2: CHANGELOGV2_FLAGS,
}
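
One plausible use of a table like `SUPPORTED_FLAGS` is to detect header flags that a given version does not know about. The sketch below is an assumption about usage, not code from this module: `unsupported_flags` is a hypothetical helper, and the table is reduced to two versions to keep the example self-contained.

```python
# Values copied from the constants above (v0 and v1 only, for brevity).
REVLOGV0 = 0
REVLOGV1 = 1
FLAG_INLINE_DATA = 1 << 16
FLAG_GENERALDELTA = 1 << 17
SUPPORTED_FLAGS = {
    REVLOGV0: 0,
    REVLOGV1: FLAG_INLINE_DATA | FLAG_GENERALDELTA,
}


def unsupported_flags(version, flags):
    """Hypothetical helper: return the flag bits not listed for `version`.

    A result of 0 means every flag bit is known for that version.
    """
    return flags & ~SUPPORTED_FLAGS[version]


ok = unsupported_flags(REVLOGV1, FLAG_INLINE_DATA)
bad = unsupported_flags(REVLOGV0, FLAG_INLINE_DATA)
```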

_no = lambda flags: False
_yes = lambda flags: True


def _from_flag(flag):
    return lambda flags: bool(flags & flag)


FEATURES_BY_VERSION = {
    REVLOGV0: {
        b'inline': _no,
        b'generaldelta': _no,
        b'sidedata': False,
        b'docket': False,
    },
    REVLOGV1: {
        b'inline': _from_flag(FLAG_INLINE_DATA),
        b'generaldelta': _from_flag(FLAG_GENERALDELTA),
        b'sidedata': False,
        b'docket': False,
    },
    REVLOGV2: {
        # The point of inline-revlog is to reduce the number of files used in
        # the store. Using a docket defeats this purpose, so we need other
        # means to reduce the number of files for revlogv2.
        b'inline': _no,
        b'generaldelta': _yes,
        b'sidedata': True,
        b'docket': True,
    },
    CHANGELOGV2: {
        b'inline': _no,
        # General delta is useless for changelog since we don't do any delta
        b'generaldelta': _no,
        b'sidedata': True,
        b'docket': True,
    },
}
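
The per-version table mixes callables (features that depend on the header flags) with plain booleans (features fixed by the version). A reader of the table therefore has to handle both. The sketch below shows one way to do so for the v1 entry; `resolve_features` is a hypothetical helper introduced for this example.

```python
FLAG_INLINE_DATA = 1 << 16
FLAG_GENERALDELTA = 1 << 17


def _from_flag(flag):
    return lambda flags: bool(flags & flag)


# Same content as the REVLOGV1 entry of the table above.
features_v1 = {
    b'inline': _from_flag(FLAG_INLINE_DATA),
    b'generaldelta': _from_flag(FLAG_GENERALDELTA),
    b'sidedata': False,
    b'docket': False,
}


def resolve_features(features, flags):
    """Hypothetical helper: call flag-dependent entries, keep booleans."""
    return {
        name: (value(flags) if callable(value) else value)
        for name, value in features.items()
    }


resolved = resolve_features(features_v1, FLAG_INLINE_DATA)
```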

SPARSE_REVLOG_MAX_CHAIN_LENGTH = 1000

### What should be done with a cached delta and its base ?
# Ignore the cache when considering candidates.
#
# The cached delta might be used, but the delta base will not be scheduled for
# usage earlier than in "normal" order.
DELTA_BASE_REUSE_NO = 0
# Prioritize trying the cached delta base
#
# The delta base will be tested for validity first, so that the cached deltas
# get used when possible.
DELTA_BASE_REUSE_TRY = 1
DELTA_BASE_REUSE_FORCE = 2