##// END OF EJS Templates
findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes....
findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes. We speed up 'findrenames' for the usecase when a user specifies they want a similarity of 100% by matching files by their exact SHA1 hash value. This reduces the number of comparisons required to find exact matches from O(n^2) to O(n). While it would be nice if we could just use mercurial's pre-calculated SHA1 hash for existing files, this hash includes the file's ancestor information making it unsuitable for our purposes. Instead, we calculate the hash of old content from scratch. The following benchmarks were taken on the current head of crew: addremove 100% similarity: rm -rf *; hg up -C; mv tests tests.new hg --time addremove -s100 --dry-run before: real 176.350 secs (user 128.890+0.000 sys 47.430+0.000) after: real 2.130 secs (user 1.890+0.000 sys 0.240+0.000) addremove 75% similarity: rm -rf *; hg up -C; mv tests tests.new; \ for i in tests.new/*; do echo x >> $i; done hg --time addremove -s75 --dry-run before: real 264.560 secs (user 215.130+0.000 sys 49.410+0.000) after: real 218.710 secs (user 172.790+0.000 sys 45.870+0.000)

File last commit:

r10282:08a0f04b default
r11060:e6df0177 default
Show More
manifest.py
200 lines | 7.4 KiB | text/x-python | PythonLexer
mpm@selenic.com
Break apart hg.py...
r1089 # manifest.py - manifest revision class for mercurial
#
Thomas Arendsen Hein
Updated copyright notices and add "and others" to "hg version"
r4635 # Copyright 2005-2007 Matt Mackall <mpm@selenic.com>
mpm@selenic.com
Break apart hg.py...
r1089 #
Martin Geisler
updated license to be explicit about GPL version 2
r8225 # This software may be used and distributed according to the terms of the
Matt Mackall
Update license to GPLv2+
r10263 # GNU General Public License version 2 or any later version.
mpm@selenic.com
Break apart hg.py...
r1089
Matt Mackall
Simplify i18n imports
r3891 from i18n import _
Peter Arrenbrecht
drop unused imports
r8390 import mdiff, parsers, error, revlog
Simon Heimberg
separate import lines from mercurial and general python modules
r8312 import array, struct
mpm@selenic.com
Break apart hg.py...
r1089
Matt Mackall
Combine manifest dict and flags dict into a single object...
r2835 class manifestdict(dict):
Alexis S. L. Carvalho
Fix some bugs introduced during the manifest refactoring
r2857 def __init__(self, mapping=None, flags=None):
Matt Mackall
many, many trivial check-code fixups
r10282 if mapping is None:
mapping = {}
if flags is None:
flags = {}
Matt Mackall
Add manifestflags class
r2831 dict.__init__(self, mapping)
Matt Mackall
Switch to simpler manifestdict
r2839 self._flags = flags
Matt Mackall
manifestflags: eliminate remaining users of direct dict access
r2834 def flags(self, f):
Matt Mackall
Switch to simpler manifestdict
r2839 return self._flags.get(f, "")
Matt Mackall
simplify flag handling...
r6743 def set(self, f, flags):
self._flags[f] = flags
Matt Mackall
Add manifestflags class
r2831 def copy(self):
Benoit Boissinot
manifestdict: remove unnecessary dictionary copy...
r9416 return manifestdict(self, dict.copy(self._flags))
Matt Mackall
Add manifestflags class
r2831
Matt Mackall
revlog: kill from-style imports...
r7634 class manifest(revlog.revlog):
Matt Mackall
revlog: simplify revlog version handling...
r4258 def __init__(self, opener):
Benoit Boissinot
manifest: simplify cache handling, use a unique cache
r9414 self._mancache = None
Matt Mackall
revlog: kill from-style imports...
r7634 revlog.revlog.__init__(self, opener, "00manifest.i")
mpm@selenic.com
Break apart hg.py...
r1089
Matt Mackall
manifest: speed up creation of the manifestdict...
r4995 def parse(self, lines):
mfdict = manifestdict()
Bryan O'Sullivan
manifest: improve parsing performance by 8x via a new C extension
r6389 parsers.parse_manifest(mfdict, mfdict._flags, lines)
Matt Mackall
manifest: speed up creation of the manifestdict...
r4995 return mfdict
Brendan Cully
Abstract manifest block parsing.
r3196
def readdelta(self, node):
Matt Mackall
revlog: remove delta function
r7362 r = self.rev(node)
return self.parse(mdiff.patchtext(self.revdiff(r - 1, r)))
Thomas Arendsen Hein
Whitespace/Tab cleanup
r3223
mpm@selenic.com
Break apart hg.py...
r1089 def read(self, node):
Matt Mackall
revlog: kill from-style imports...
r7634 if node == revlog.nullid:
return manifestdict() # don't upset local cache
Benoit Boissinot
manifest: simplify cache handling, use a unique cache
r9414 if self._mancache and self._mancache[0] == node:
return self._mancache[1]
mpm@selenic.com
Break apart hg.py...
r1089 text = self.revision(node)
Benoit Boissinot
manifest: simplify cache handling, use a unique cache
r9414 arraytext = array.array('c', text)
Matt Mackall
manifest: speed up creation of the manifestdict...
r4995 mapping = self.parse(text)
Benoit Boissinot
manifest: simplify cache handling, use a unique cache
r9414 self._mancache = (node, mapping, arraytext)
Matt Mackall
Combine manifest dict and flags dict into a single object...
r2835 return mapping
mpm@selenic.com
Break apart hg.py...
r1089
Vadim Gelfer
fix parsing of tags. make parse errors useful. add new tag tests....
r2320 def _search(self, m, s, lo=0, hi=None):
'''return a tuple (start, end) that says where to find s within m.
If the string is found m[start:end] are the line containing
that string. If start == end the string was not found and
they indicate the proper sorted insertion point. This was
taken from bisect_left, and modified to find line start/end as
it goes along.
m should be a buffer or a string
s is a string'''
def advance(i, c):
while i < lenm and m[i] != c:
i += 1
return i
Patrick Mezard
manifest: fix _search() corner-case...
r7405 if not s:
return (lo, lo)
Vadim Gelfer
fix parsing of tags. make parse errors useful. add new tag tests....
r2320 lenm = len(m)
if not hi:
hi = lenm
while lo < hi:
mid = (lo + hi) // 2
start = mid
Matt Mackall
many, many trivial check-code fixups
r10282 while start > 0 and m[start - 1] != '\n':
Vadim Gelfer
fix parsing of tags. make parse errors useful. add new tag tests....
r2320 start -= 1
end = advance(start, '\0')
if m[start:end] < s:
# we know that after the null there are 40 bytes of sha1
# this translates to the bisect lo = mid + 1
lo = advance(end + 40, '\n') + 1
else:
# this translates to the bisect hi = mid
hi = start
end = advance(lo, '\0')
found = m[lo:end]
if cmp(s, found) == 0:
# we know that after the null there are 40 bytes of sha1
end = advance(end + 40, '\n')
Matt Mackall
many, many trivial check-code fixups
r10282 return (lo, end + 1)
Vadim Gelfer
fix parsing of tags. make parse errors useful. add new tag tests....
r2320 else:
return (lo, lo)
def find(self, node, f):
'''look up entry for a single file efficiently.
Alexis S. L. Carvalho
fix manifest.find
r4159 return (node, flags) pair if found, (None, None) if not.'''
Benoit Boissinot
manifest: simplify cache handling, use a unique cache
r9414 if self._mancache and self._mancache[0] == node:
return self._mancache[1].get(f), self._mancache[1].flags(f)
Vadim Gelfer
fix parsing of tags. make parse errors useful. add new tag tests....
r2320 text = self.revision(node)
start, end = self._search(text, f)
if start == end:
return None, None
l = text[start:end]
f, n = l.split('\0')
Matt Mackall
revlog: kill from-style imports...
r7634 return revlog.bin(n[:40]), n[40:-1]
Vadim Gelfer
fix parsing of tags. make parse errors useful. add new tag tests....
r2320
Matt Mackall
Remove manifest.readflags
r2841 def add(self, map, transaction, link, p1=None, p2=None,
mpm@selenic.com
Break apart hg.py...
r1089 changed=None):
# apply the changes collected during the bisect loop to our addlist
mason@suse.com
Optimize manifest.add...
r1534 # return a delta suitable for addrevision
def addlistdelta(addlist, x):
# start from the bottom up
mpm@selenic.com
Break apart hg.py...
r1089 # so changes to the offsets don't mess things up.
Benoit Boissinot
manifest.add(): simplify with iterators and generator expressions
r9413 for start, end, content in reversed(x):
if content:
addlist[start:end] = array.array('c', content)
mpm@selenic.com
Break apart hg.py...
r1089 else:
del addlist[start:end]
Benoit Boissinot
manifest.add(): simplify with iterators and generator expressions
r9413 return "".join(struct.pack(">lll", start, end, len(content)) + content
for start, end, content in x)
mpm@selenic.com
Break apart hg.py...
r1089
Matt Mackall
manifest: make checkforbidden take a list
r6765 def checkforbidden(l):
for f in l:
if '\n' in f or '\r' in f:
Matt Mackall
errors: move revlog errors...
r7633 raise error.RevlogError(
Greg Ward
manifest: improve error message about newlines in filenames...
r8077 _("'\\n' and '\\r' disallowed in filenames: %r") % f)
Benoit Boissinot
issue352: disallow '\n' and '\r' in filenames (dirstate and manifest)
r3607
Benoit Boissinot
manifest: simplify cache handling, use a unique cache
r9414 # if we're using the cache, make sure it is valid and
mpm@selenic.com
Break apart hg.py...
r1089 # parented by the same node we're diffing against
Benoit Boissinot
manifest: simplify cache handling, use a unique cache
r9414 if not (changed and self._mancache and p1 and self._mancache[0] == p1):
Matt Mackall
replace util.sort with sorted built-in...
r8209 files = sorted(map)
Matt Mackall
manifest: make checkforbidden take a list
r6765 checkforbidden(files)
Benoit Boissinot
issue352: disallow '\n' and '\r' in filenames (dirstate and manifest)
r3607
Matt Mackall
Fix comment syntax
r1651 # if this is changed to support newlines in filenames,
# be sure to check the templates/ dir again (especially *-raw.tmpl)
Matt Mackall
revlog: kill from-style imports...
r7634 hex, flags = revlog.hex, map.flags
Benoit Boissinot
manifest/revlog: do not let the revlog cache mutable objects...
r9420 text = ''.join("%s\000%s%s\n" % (f, hex(map[f]), flags(f))
for f in files)
arraytext = array.array('c', text)
mpm@selenic.com
Break apart hg.py...
r1089 cachedelta = None
else:
Benoit Boissinot
manifest.add(): cleanup worklist construction and iteration
r9415 added, removed = changed
Benoit Boissinot
manifest: simplify cache handling, use a unique cache
r9414 addlist = self._mancache[2]
mpm@selenic.com
Break apart hg.py...
r1089
Benoit Boissinot
manifest.add(): cleanup worklist construction and iteration
r9415 checkforbidden(added)
mpm@selenic.com
Break apart hg.py...
r1089 # combine the changed lists into one list for sorting
Benoit Boissinot
manifest.add(): cleanup worklist construction and iteration
r9415 work = [(x, False) for x in added]
work.extend((x, True) for x in removed)
# this could use heapq.merge() (from python2.6+) or equivalent
# since the lists are already sorted
mpm@selenic.com
Break apart hg.py...
r1089 work.sort()
delta = []
mason@suse.com
Optimize manifest.add...
r1534 dstart = None
dend = None
dline = [""]
start = 0
# zero copy representation of addlist as a buffer
addbuf = buffer(addlist)
mpm@selenic.com
Break apart hg.py...
r1089
mason@suse.com
Optimize manifest.add...
r1534 # start with a readonly loop that finds the offset of
# each line and creates the deltas
Benoit Boissinot
manifest.add(): cleanup worklist construction and iteration
r9415 for f, todelete in work:
mpm@selenic.com
Break apart hg.py...
r1089 # bs will either be the index of the item or the insert point
Vadim Gelfer
fix parsing of tags. make parse errors useful. add new tag tests....
r2320 start, end = self._search(addbuf, f, start)
Benoit Boissinot
manifest.add(): cleanup worklist construction and iteration
r9415 if not todelete:
Matt Mackall
revlog: kill from-style imports...
r7634 l = "%s\000%s%s\n" % (f, revlog.hex(map[f]), map.flags(f))
mpm@selenic.com
Break apart hg.py...
r1089 else:
Benoit Boissinot
manifest.add(): cleanup worklist construction and iteration
r9415 if start == end:
# item we want to delete was not found, error out
raise AssertionError(
_("failed to remove %s from manifest") % f)
mason@suse.com
Optimize manifest.add...
r1534 l = ""
if dstart != None and dstart <= start and dend >= start:
if dend < end:
dend = end
if l:
dline.append(l)
mpm@selenic.com
Break apart hg.py...
r1089 else:
mason@suse.com
Optimize manifest.add...
r1534 if dstart != None:
delta.append([dstart, dend, "".join(dline)])
dstart = start
dend = end
dline = [l]
mpm@selenic.com
Break apart hg.py...
r1089
mason@suse.com
Optimize manifest.add...
r1534 if dstart != None:
delta.append([dstart, dend, "".join(dline)])
# apply the delta to the addlist, and get a delta for addrevision
cachedelta = addlistdelta(addlist, delta)
mpm@selenic.com
Break apart hg.py...
r1089
mason@suse.com
Optimize manifest.add...
r1534 # the delta is only valid if we've been processing the tip revision
Benoit Boissinot
manifest: simplify cache handling, use a unique cache
r9414 if p1 != self.tip():
mason@suse.com
Optimize manifest.add...
r1534 cachedelta = None
Benoit Boissinot
manifest: simplify cache handling, use a unique cache
r9414 arraytext = addlist
Benoit Boissinot
manifest/revlog: do not let the revlog cache mutable objects...
r9420 text = buffer(arraytext)
mason@suse.com
Optimize manifest.add...
r1534
Benoit Boissinot
manifest/revlog: do not let the revlog cache mutable objects...
r9420 n = self.addrevision(text, transaction, link, p1, p2, cachedelta)
Benoit Boissinot
manifest: simplify cache handling, use a unique cache
r9414 self._mancache = (n, map, arraytext)
mpm@selenic.com
Break apart hg.py...
r1089
return n