##// END OF EJS Templates
deltas: set estimated compression upper bound to "3x" instead of "10x"...
deltas: set estimated compression upper bound to "3x" instead of "10x" In pratice, we very rarely observer compression better than "3x" on manifest deltas. Having a more aggressive estimate significantly helps our pathological use case on a private repository. Here are a comparison of timings using different upper bound. Estimated compression | ø | ×10 | ×5 | ×3 | timing | 14.11 | 2.61 | 1.96 | 1.53 | We also tested the impact of this series on an array of public repositories. This shown no impact in either size nor timing. Full data set below for those interested. Size ---- Regarding size, not significant impact have been noticed on neither public nor private repositories. Here are the number we gathered on public repositories: zlib/upperbound | no | 10x | 5x | 3x mercurial | 5 875 730 | 5 875 730 | 5 875 730 | 5 875 730 pypy | 27 782 913 | 27 782 913 | 27 782 913 | 27 782 913 netbeans | 159 161 207 | 159 161 207 | 159 161 207 | 159 959 879 (+0.5%) mozilla-central | 323 841 642 | 323 841 642 | 323 841 642 | 319 867 519 (-2.5%) mozilla-try | 746 649 123 | 746 649 123 | 746 649 123 | 741 155 568 (-0.7%) private-repo | 1 485 287 294 | 1 485 287 294 | 1 485 287 294 | 1 409 248 382 (-5.1%) zstd/upperbound | no | 10x | 5x | 3x mercurial | 5 895 206 | 5 895 206 | 5 895 206 | 5 895 206 pypy | 28 689 230 | 28 689 230 | 28 689 230 | 28 689 230 netbeans | 157 636 387 | 157 636 387 | 157 636 387 | 159 692 678 (+1.3%) mozilla-central | 317 650 281 | 317 650 281 | 317 650 281 | 319 613 603 (+0.6%) mozilla-try | 737 555 275 | 737 555 275 | 737 555 275 | 738 079 473 (+0.1%) private-repo | 1 352 362 982 | 1 352 362 982 | 1 346 961 880 | 1 361 327 384 (+0.7%) Speed ------ Timing gathered using `hg perfrevlogwrite -m`. Value are in seconds. mercurial zlib | no | 10x | 5x | 3x | total | 65.551783 | 65.388887 | 65.260658 | 65.321199 | max | 0.034544 | 0.034571 | 0.034659 | 0.034521 | 99.99% | 0.034544 | 0.034571 | 0.034659 | 0.034521 | zstd | no | 10x | 5x | 3x | total | 49.118449 | 49.054062 | 48.753588 | 48.740230 | max | 0.009338 | 0.009239 | 0.009202 | 0.009178 | 99.99% | 0.007618 | 0.007639 | 0.007626 | 0.007621 | pypy zlib | no | 10x | 5x | 3x | total | 560.865984 | 558.983817 | 559.083815 | 559.349152 | max | 0.219614 | 0.215922 | 0.218112 | 0.218107 | 99.99% | 0.219614 | 0.215922 | 0.218112 | 0.218107 | zstd | no | 10x | 5x | 3x | total | 349.393280 | 347.395819 | 347.185407 | 345.643985 | max | 0.084143 | 0.083536 | 0.081834 | 0.082178 | 99.99% | 0.039445 | 0.039639 | 0.039612 | 0.039175 | netbeans zlib | no | 10x | 5x | 3x | total | 33103.327727 | 33314.932260 | 33211.745233 | 33345.891778 | max | 2.666852 | 2.672059 | 2.662453 | 2.662936 | 99.99% | 2.058772 | 2.070429 | 2.069569 | 2.064653 | zstd | no | 10x | 5x | 3x | total | 20112.102708 | 20095.879719 | 20083.390300 | 20123.221859 | max | 2.063482 | 2.062851 | 2.065229 | 2.060147 | 99.99% | 1.146647 | 1.143794 | 1.142933 | 1.146529 | mozilla zlib | no | 10x | 5x | 3x | total | 41374.102138 | 41418.816773 | 41381.956370 | 41334.280732 | max | 3.383474 | 3.387400 | 3.405711 | 3.387316 | 99.99% | 1.006755 | 1.005954 | 1.007700 | 1.007373 | zstd | no | 10x | 5x | 3x | total | 24689.691520 | 24643.939662 | 24664.630027 | 24664.512714 | max | 1.460822 | 1.449640 | 1.439747 | 1.465304 | 99.99% | 0.527111 | 0.527377 | 0.527807 | 0.527226 |

File last commit:

r40457:6a917075 default
r42669:4a3abb33 default
Show More
filelog.py
239 lines | 7.9 KiB | text/x-python | PythonLexer
mpm@selenic.com
Break apart hg.py...
r1089 # filelog.py - file history class for mercurial
#
Thomas Arendsen Hein
Updated copyright notices and add "and others" to "hg version"
r4635 # Copyright 2005-2007 Matt Mackall <mpm@selenic.com>
mpm@selenic.com
Break apart hg.py...
r1089 #
Martin Geisler
updated license to be explicit about GPL version 2
r8225 # This software may be used and distributed according to the terms of the
Matt Mackall
Update license to GPLv2+
r10263 # GNU General Public License version 2 or any later version.
mpm@selenic.com
Break apart hg.py...
r1089
Gregory Szorc
filelog: use absolute_import
r25948 from __future__ import absolute_import
Gregory Szorc
repository: teach addgroup() to receive data with missing parents...
r40425 from .i18n import _
Gregory Szorc
filelog: add a hasnode() method (API)...
r40423 from .node import (
nullid,
nullrev,
)
Gregory Szorc
filelog: use absolute_import
r25948 from . import (
Gregory Szorc
filelog: wrap revlog instead of inheriting it (API)...
r37515 error,
Gregory Szorc
filelog: declare that filelog implements a storage interface...
r37459 repository,
Gregory Szorc
filelog: use absolute_import
r25948 revlog,
)
Gregory Szorc
interfaceutil: module to stub out zope.interface...
r37828 from .utils import (
interfaceutil,
Gregory Szorc
storageutil: move metadata parsing and packing from revlog (API)...
r39914 storageutil,
Gregory Szorc
interfaceutil: module to stub out zope.interface...
r37828 )
mpm@selenic.com
Break apart hg.py...
r1089
Gregory Szorc
interfaceutil: module to stub out zope.interface...
r37828 @interfaceutil.implementer(repository.ifilestorage)
Gregory Szorc
filelog: wrap revlog instead of inheriting it (API)...
r37515 class filelog(object):
Matt Mackall
revlog: simplify revlog version handling...
r4258 def __init__(self, opener, path):
Gregory Szorc
filelog: wrap revlog instead of inheriting it (API)...
r37515 self._revlog = revlog.revlog(opener,
'/'.join(('data', path + '.i')),
censorable=True)
Gregory Szorc
filelog: record what's using attributes...
r39819 # Full name of the user visible file, relative to the repository root.
# Used by LFS.
Gregory Szorc
filelog: store filename directly on revlog instance...
r39892 self._revlog.filename = path
Gregory Szorc
filelog: wrap revlog instead of inheriting it (API)...
r37515
def __len__(self):
return len(self._revlog)
def __iter__(self):
return self._revlog.__iter__()
Gregory Szorc
filelog: add a hasnode() method (API)...
r40423 def hasnode(self, node):
if node in (nullid, nullrev):
return False
try:
self._revlog.rev(node)
return True
except (TypeError, ValueError, IndexError, error.LookupError):
return False
Gregory Szorc
filelog: wrap revlog instead of inheriting it (API)...
r37515 def revs(self, start=0, stop=None):
return self._revlog.revs(start=start, stop=stop)
def parents(self, node):
return self._revlog.parents(node)
def parentrevs(self, rev):
return self._revlog.parentrevs(rev)
def rev(self, node):
return self._revlog.rev(node)
def node(self, rev):
return self._revlog.node(rev)
def lookup(self, node):
Gregory Szorc
storageutil: implement file identifier resolution method (BC)...
r40038 return storageutil.fileidlookup(self._revlog, node,
self._revlog.indexfile)
Gregory Szorc
filelog: wrap revlog instead of inheriting it (API)...
r37515
def linkrev(self, rev):
return self._revlog.linkrev(rev)
def commonancestorsheads(self, node1, node2):
return self._revlog.commonancestorsheads(node1, node2)
Gregory Szorc
filelog: record what's using attributes...
r39819 # Used by dagop.blockdescendants().
Gregory Szorc
filelog: wrap revlog instead of inheriting it (API)...
r37515 def descendants(self, revs):
return self._revlog.descendants(revs)
def heads(self, start=None, stop=None):
return self._revlog.heads(start, stop)
Gregory Szorc
filelog: record what's using attributes...
r39819 # Used by hgweb, children extension.
Gregory Szorc
filelog: wrap revlog instead of inheriting it (API)...
r37515 def children(self, node):
return self._revlog.children(node)
def iscensored(self, rev):
return self._revlog.iscensored(rev)
def revision(self, node, _df=None, raw=False):
return self._revlog.revision(node, _df=_df, raw=raw)
Gregory Szorc
revlog: new API to emit revision data...
r39898 def emitrevisions(self, nodes, nodesorder=None,
revisiondata=False, assumehaveparentrevisions=False,
Boris Feld
storage: also use `deltamode argument` for ifiledata...
r40457 deltamode=repository.CG_DELTAMODE_STD):
Gregory Szorc
revlog: new API to emit revision data...
r39898 return self._revlog.emitrevisions(
nodes, nodesorder=nodesorder, revisiondata=revisiondata,
assumehaveparentrevisions=assumehaveparentrevisions,
Boris Feld
storage: also use `deltamode argument` for ifiledata...
r40457 deltamode=deltamode)
Gregory Szorc
revlog: new API to emit revision data...
r39898
Gregory Szorc
filelog: wrap revlog instead of inheriting it (API)...
r37515 def addrevision(self, revisiondata, transaction, linkrev, p1, p2,
node=None, flags=revlog.REVIDX_DEFAULT_FLAGS,
cachedelta=None):
return self._revlog.addrevision(revisiondata, transaction, linkrev,
p1, p2, node=node, flags=flags,
cachedelta=cachedelta)
Gregory Szorc
repository: teach addgroup() to receive data with missing parents...
r40425 def addgroup(self, deltas, linkmapper, transaction, addrevisioncb=None,
maybemissingparents=False):
if maybemissingparents:
raise error.Abort(_('revlog storage does not support missing '
'parents write mode'))
Gregory Szorc
filelog: wrap revlog instead of inheriting it (API)...
r37515 return self._revlog.addgroup(deltas, linkmapper, transaction,
Gregory Szorc
repository: teach addgroup() to receive data with missing parents...
r40425 addrevisioncb=addrevisioncb)
Gregory Szorc
filelog: wrap revlog instead of inheriting it (API)...
r37515
def getstrippoint(self, minlink):
return self._revlog.getstrippoint(minlink)
def strip(self, minlink, transaction):
return self._revlog.strip(minlink, transaction)
Gregory Szorc
revlog: move censor logic out of censor extension...
r39814 def censorrevision(self, tr, node, tombstone=b''):
Gregory Szorc
revlog: rewrite censoring logic...
r40092 return self._revlog.censorrevision(tr, node, tombstone=tombstone)
Gregory Szorc
revlog: move censor logic out of censor extension...
r39814
Gregory Szorc
filelog: wrap revlog instead of inheriting it (API)...
r37515 def files(self):
return self._revlog.files()
mpm@selenic.com
Break apart hg.py...
r1089 def read(self, node):
Gregory Szorc
storageutil: new function for extracting metadata-less content from text...
r39916 return storageutil.filtermetadata(self.revision(node))
mpm@selenic.com
Break apart hg.py...
r1089
def add(self, text, meta, transaction, link, p1=None, p2=None):
if meta or text.startswith('\1\n'):
Gregory Szorc
storageutil: move metadata parsing and packing from revlog (API)...
r39914 text = storageutil.packmeta(meta, text)
mpm@selenic.com
Break apart hg.py...
r1089 return self.addrevision(text, transaction, link, p1, p2)
mpm@selenic.com
Add some rename debugging support
r1116 def renamed(self, node):
Gregory Szorc
storageutil: extract copy metadata retrieval out of filelog...
r40041 return storageutil.filerevisioncopied(self, node)
mpm@selenic.com
Add some rename debugging support
r1116
Matt Mackall
merge: use file size stored in revlog index...
r2898 def size(self, rev):
"""return the size of a given revision"""
# for revisions with renames, we have to go the slow way
node = self.node(rev)
if self.renamed(node):
return len(self.read(node))
Mike Edgar
revlog: add "iscensored()" to revlog public API...
r24118 if self.iscensored(rev):
Mike Edgar
filelog: censored files compare against empty data, have 0 size...
r22597 return 0
Matt Mackall
merge: use file size stored in revlog index...
r2898
Nicolas Dumazet
filelog: test behaviour for data starting with "\1\n"...
r11540 # XXX if self.read(node).startswith("\1\n"), this returns (size+4)
Gregory Szorc
filelog: wrap revlog instead of inheriting it (API)...
r37515 return self._revlog.size(rev)
Matt Mackall
merge: use file size stored in revlog index...
r2898
Matt Mackall
filelog: add hash-based comparisons...
r2887 def cmp(self, node, text):
Nicolas Dumazet
cmp: document the fact that we return True if content is different...
r11539 """compare text with a given file revision
returns True if text is different than what is stored.
"""
Gregory Szorc
storageutil: invert logic of file data comparison...
r40043 return not storageutil.filedataequivalent(self, node, text)
Gregory Szorc
filelog: wrap revlog instead of inheriting it (API)...
r37515
Gregory Szorc
verify: start to abstract file verification...
r39878 def verifyintegrity(self, state):
return self._revlog.verifyintegrity(state)
Gregory Szorc
revlog: add method for obtaining storage info (API)...
r39905 def storageinfo(self, exclusivefiles=False, sharedfiles=False,
revisionscount=False, trackedsize=False,
storedsize=False):
return self._revlog.storageinfo(
exclusivefiles=exclusivefiles, sharedfiles=sharedfiles,
revisionscount=revisionscount, trackedsize=trackedsize,
storedsize=storedsize)
Gregory Szorc
filelog: record what's using attributes...
r39819 # TODO these aren't part of the interface and aren't internal methods.
# Callers should be fixed to not use them.
# Used by bundlefilelog, unionfilelog.
Gregory Szorc
filelog: wrap revlog instead of inheriting it (API)...
r37515 @property
def indexfile(self):
return self._revlog.indexfile
@indexfile.setter
def indexfile(self, value):
self._revlog.indexfile = value
Gregory Szorc
filelog: record what's using attributes...
r39819 # Used by repo upgrade.
Gregory Szorc
filelog: wrap revlog instead of inheriting it (API)...
r37515 def clone(self, tr, destrevlog, **kwargs):
if not isinstance(destrevlog, filelog):
raise error.ProgrammingError('expected filelog to clone()')
return self._revlog.clone(tr, destrevlog._revlog, **kwargs)
Gregory Szorc
filelog: custom filelog to be used with narrow repos...
r39801 class narrowfilelog(filelog):
"""Filelog variation to be used with narrow stores."""
def __init__(self, opener, path, narrowmatch):
super(narrowfilelog, self).__init__(opener, path)
self._narrowmatch = narrowmatch
def renamed(self, node):
res = super(narrowfilelog, self).renamed(node)
# Renames that come from outside the narrowspec are problematic
# because we may lack the base text for the rename. This can result
# in code attempting to walk the ancestry or compute a diff
# encountering a missing revision. We address this by silently
# removing rename metadata if the source file is outside the
# narrow spec.
#
# A better solution would be to see if the base revision is available,
# rather than assuming it isn't.
#
# An even better solution would be to teach all consumers of rename
# metadata that the base revision may not be available.
#
# TODO consider better ways of doing this.
if res and not self._narrowmatch(res[0]):
return None
return res
def size(self, rev):
# Because we have a custom renamed() that may lie, we need to call
# the base renamed() to report accurate results.
node = self.node(rev)
if super(narrowfilelog, self).renamed(node):
return len(self.read(node))
else:
return super(narrowfilelog, self).size(rev)
def cmp(self, node, text):
different = super(narrowfilelog, self).cmp(node, text)
# Because renamed() may lie, we may get false positives for
# different content. Check for this by comparing against the original
# renamed() implementation.
if different:
if super(narrowfilelog, self).renamed(node):
t2 = self.read(node)
return t2 != text
return different