findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.

We speed up 'findrenames' for the usecase when a user specifies they
want a similarity of 100% by matching files by their exact SHA1 hash
value. This reduces the number of comparisons required to find exact
matches from O(n^2) to O(n).

While it would be nice if we could just use mercurial's pre-calculated
SHA1 hash for existing files, this hash includes the file's ancestor
information making it unsuitable for our purposes. Instead, we calculate
the hash of old content from scratch.

The following benchmarks were taken on the current head of crew:

addremove 100% similarity:
  rm -rf *; hg up -C; mv tests tests.new
  hg --time addremove -s100 --dry-run

  before: real 176.350 secs (user 128.890+0.000 sys 47.430+0.000)
  after:  real   2.130 secs (user   1.890+0.000 sys  0.240+0.000)

addremove 75% similarity:
  rm -rf *; hg up -C; mv tests tests.new; \
    for i in tests.new/*; do echo x >> $i; done
  hg --time addremove -s75 --dry-run

  before: real 264.560 secs (user 215.130+0.000 sys 49.410+0.000)
  after:  real 218.710 secs (user 172.790+0.000 sys  45.870+0.000)
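The idea in the commit message can be sketched as follows. This is a minimal illustration, not the code from the changeset: `find_exact_renames` and `node_hash` are hypothetical names, the inputs are plain dictionaries rather than Mercurial contexts, and the sketch is written for Python 3. `node_hash` reflects the assumption that a revlog node is the SHA-1 of the two sorted parent ids followed by the text, which is why it cannot stand in for a content-only hash.

```python
import hashlib
from typing import Dict, Iterator, Tuple

NULLID = b"\0" * 20  # placeholder parent id, assumed to be 20 zero bytes


def node_hash(text: bytes, p1: bytes = NULLID, p2: bytes = NULLID) -> bytes:
    """Hash that mixes in parent ids (assumed revlog-style, for contrast)."""
    a, b = sorted((p1, p2))
    return hashlib.sha1(a + b + text).digest()


def find_exact_renames(
    removed: Dict[str, bytes], added: Dict[str, bytes]
) -> Iterator[Tuple[str, str]]:
    """Yield (old_path, new_path) pairs whose contents are byte-identical."""
    # One pass over the removed files: index each by the SHA-1 of its content.
    by_hash: Dict[bytes, str] = {}
    for path, data in sorted(removed.items()):
        by_hash.setdefault(hashlib.sha1(data).digest(), path)

    # One pass over the added files: a dictionary lookup replaces the
    # pairwise comparison against every removed file.
    for path, data in sorted(added.items()):
        old = by_hash.get(hashlib.sha1(data).digest())
        if old is not None:
            yield old, path
```

With n removed and n added files this hashes each file once and performs n lookups, instead of comparing every added file against every removed one. Identical content stored under different parents gives different `node_hash` values, so the sketch hashes the raw content instead.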

File last commit:

r10706:d8d1b56d merge default
r11060:e6df0177 default
filelog.py
66 lines | 2.0 KiB | text/x-python | PythonLexer
mpm@selenic.com
Break apart hg.py...
r1089  # filelog.py - file history class for mercurial
       #
Thomas Arendsen Hein
Updated copyright notices and add "and others" to "hg version"
r4635  # Copyright 2005-2007 Matt Mackall <mpm@selenic.com>
mpm@selenic.com
Break apart hg.py...
r1089  #
Martin Geisler
updated license to be explicit about GPL version 2
r8225  # This software may be used and distributed according to the terms of the
Matt Mackall
Update license to GPLv2+
r10263 # GNU General Public License version 2 or any later version.
mpm@selenic.com
Break apart hg.py...
r1089
Matt Mackall
revlog: kill from-style imports...
r7634  import revlog
mpm@selenic.com
Break apart hg.py...
r1089
Matt Mackall
revlog: kill from-style imports...
r7634  class filelog(revlog.revlog):
Matt Mackall
revlog: simplify revlog version handling...
r4258      def __init__(self, opener, path):
Matt Mackall
revlog: kill from-style imports...
r7634          revlog.revlog.__init__(self, opener,
Benoit Boissinot
filelog encoding: move the encoding/decoding into store...
r8531                                 "/".join(("data", path + ".i")))
mpm@selenic.com
Break apart hg.py...
r1089
           def read(self, node):
               t = self.revision(node)
               if not t.startswith('\1\n'):
                   return t
Benoit Boissinot
use __contains__, index or split instead of str.find...
r2579          s = t.index('\1\n', 2)
Matt Mackall
many, many trivial check-code fixups
r10282         return t[s + 2:]
mpm@selenic.com
Break apart hg.py...
r1089
Matt Mackall
filelog: make metadata method private
r3123      def _readmeta(self, node):
mpm@selenic.com
Break apart hg.py...
r1089          t = self.revision(node)
               if not t.startswith('\1\n'):
mpm@selenic.com
Add some rename debugging support
r1116              return {}
Benoit Boissinot
use __contains__, index or split instead of str.find...
r2579          s = t.index('\1\n', 2)
mpm@selenic.com
Break apart hg.py...
r1089          mt = t[2:s]
mpm@selenic.com
Add some rename debugging support
r1116          m = {}
mpm@selenic.com
Break apart hg.py...
r1089          for l in mt.splitlines():
                   k, v = l.split(": ", 1)
                   m[k] = v
               return m

           def add(self, text, meta, transaction, link, p1=None, p2=None):
               if meta or text.startswith('\1\n'):
Benoit Boissinot
filelog: no need to optimize an uncommon case, assume meta = {}
r10705             mt = ["%s: %s\n" % (k, v) for k, v in sorted(meta.iteritems())]
twaldmann@thinkmo.de
minor optimization: save some string trash
r1540              text = "\1\n%s\1\n%s" % ("".join(mt), text)
mpm@selenic.com
Break apart hg.py...
r1089          return self.addrevision(text, transaction, link, p1, p2)

mpm@selenic.com
Add some rename debugging support
r1116      def renamed(self, node):
Matt Mackall
revlog: kill from-style imports...
r7634          if self.parents(node)[0] != revlog.nullid:
mpm@selenic.com
Add some rename debugging support
r1116              return False
Matt Mackall
filelog: make metadata method private
r3123          m = self._readmeta(node)
Christian Ebert
Prefer i in d over d.has_key(i)
r5915          if m and "copy" in m:
Matt Mackall
revlog: kill from-style imports...
r7634              return (m["copy"], revlog.bin(m["copyrev"]))
mpm@selenic.com
Add some rename debugging support
r1116          return False

Matt Mackall
merge: use file size stored in revlog index...
r2898      def size(self, rev):
               """return the size of a given revision"""

               # for revisions with renames, we have to go the slow way
               node = self.node(rev)
               if self.renamed(node):
                   return len(self.read(node))

Matt Mackall
revlog: kill from-style imports...
r7634          return revlog.revlog.size(self, rev)
Matt Mackall
merge: use file size stored in revlog index...
r2898
Matt Mackall
filelog: add hash-based comparisons...
r2887      def cmp(self, node, text):
               """compare text with a given file revision"""

               # for renames, we have to go the slow way
Benoit Boissinot
filelog: text is stored modified when it starts with '\1\n'
r10704         if text.startswith('\1\n') or self.renamed(node):
Matt Mackall
filelog: add hash-based comparisons...
r2887              t2 = self.read(node)
Matt Mackall
filelog.cmp: return 0 for equality...
r2895              return t2 != text
Matt Mackall
filelog: add hash-based comparisons...
r2887
Matt Mackall
revlog: kill from-style imports...
r7634          return revlog.revlog.cmp(self, node, text)
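The metadata envelope that `add`, `read` and `_readmeta` agree on can be exercised in isolation. The sketch below is a standalone Python 3 illustration, not part of filelog.py: `pack`, `unpack` and `MARKER` are hypothetical names, and the data is handled as `bytes` rather than the Python 2 `str` used in the file above.

```python
from typing import Dict, Tuple

MARKER = b"\x01\n"  # the '\1\n' delimiter used by the filelog methods


def pack(text: bytes, meta: Dict[bytes, bytes]) -> bytes:
    """Wrap text in a metadata header when needed, mirroring filelog.add."""
    # A header is written if there is metadata, or if the text itself starts
    # with the marker and would otherwise be mistaken for a header.
    if meta or text.startswith(MARKER):
        header = b"".join(b"%s: %s\n" % (k, v) for k, v in sorted(meta.items()))
        return MARKER + header + MARKER + text
    return text


def unpack(stored: bytes) -> Tuple[Dict[bytes, bytes], bytes]:
    """Split stored data into (metadata, text), like _readmeta plus read."""
    if not stored.startswith(MARKER):
        return {}, stored
    end = stored.index(MARKER, 2)  # closing marker; search starts past the opener
    meta = {}
    for line in stored[2:end].splitlines():
        key, value = line.split(b": ", 1)
        meta[key] = value
    return meta, stored[end + 2:]
```

Because the stored bytes are longer than the file text whenever a header is present, a length or hash computed over the stored form does not describe the text itself. That appears to be why `size` and `cmp` above fall back to `read` for renamed revisions and for text that starts with the marker.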