findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.

We speed up 'findrenames' for the use case where a user asks for a similarity of 100% by matching files on their exact SHA1 hash. This reduces the number of comparisons needed to find exact matches from O(n^2) to O(n).

While it would be nice to reuse Mercurial's pre-calculated SHA1 hash for existing files, that hash includes the file's ancestor information, making it unsuitable for our purposes. Instead, we calculate the hash of the old content from scratch.

The following benchmarks were taken on the current head of crew:

addremove, 100% similarity:

    rm -rf *; hg up -C; mv tests tests.new
    hg --time addremove -s100 --dry-run

    before: real 176.350 secs (user 128.890+0.000 sys 47.430+0.000)
    after: real 2.130 secs (user 1.890+0.000 sys 0.240+0.000)

addremove, 75% similarity:

    rm -rf *; hg up -C; mv tests tests.new; \
      for i in tests.new/*; do echo x >> $i; done
    hg --time addremove -s75 --dry-run

    before: real 264.560 secs (user 215.130+0.000 sys 49.410+0.000)
    after: real 218.710 secs (user 172.790+0.000 sys 45.870+0.000)
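Below is a minimal sketch of the exact-match idea described above. It indexes the removed files by a SHA-1 digest of their content, then looks up each added file's digest in that index. It assumes both sides are plain dicts mapping file names to bytes. `find_exact_renames` is a hypothetical helper for illustration, not Mercurial's `findrenames` signature.

```python
import hashlib


def find_exact_renames(removed, added):
    """Pair each added file with a removed file whose content is identical.

    Hypothetical helper: `removed` and `added` map file names to contents
    (bytes). Returns a list of (old_name, new_name) pairs.
    """
    # One pass over the removed files: index them by a digest of their
    # raw content (not by any stored revision hash).
    by_digest = {}
    for name, data in removed.items():
        by_digest.setdefault(hashlib.sha1(data).hexdigest(), name)

    # One dict lookup per added file replaces a comparison against every
    # removed file, which is what made the naive approach O(n^2).
    pairs = []
    for name, data in sorted(added.items()):
        old = by_digest.get(hashlib.sha1(data).hexdigest())
        if old is not None:
            pairs.append((old, name))
    return pairs


removed = {"tests/a.py": b"print(1)\n", "tests/b.py": b"print(2)\n"}
added = {"tests.new/a.py": b"print(1)\n", "tests.new/b.py": b"print(2)\nx\n"}
print(find_exact_renames(removed, added))
# prints [('tests/a.py', 'tests.new/a.py')]
```

Hashing raw content is the point of the change. As the message explains, the stored file hash also covers ancestry, so identical content with a different history would not match.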

File last commit: r10881:a685011e (default), r11060:e6df0177 (default)

repair.py (145 lines, 4.7 KiB)

# repair.py - functions for repository repair for mercurial
#
# Copyright 2005, 2006 Chris Mason <mason@suse.com>
# Copyright 2007 Matt Mackall
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.

import changegroup
from node import nullrev, short
from i18n import _
import os


def _bundle(repo, bases, heads, node, suffix, extranodes=None):
    """create a bundle with the specified revisions as a backup"""
    cg = repo.changegroupsubset(bases, heads, 'strip', extranodes)
    backupdir = repo.join("strip-backup")
    if not os.path.isdir(backupdir):
        os.mkdir(backupdir)
    name = os.path.join(backupdir, "%s-%s" % (short(node), suffix))
    repo.ui.warn(_("saving bundle to %s\n") % name)
    return changegroup.writebundle(cg, name, "HG10BZ")

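`_bundle` writes each backup to `<short node>-<suffix>` inside a `strip-backup` directory obtained through `repo.join(...)`, and creates that directory on first use. The sketch below reproduces only that naming step, not the bundle writing. It assumes, as Mercurial's `node.short()` does, that the short form is the hex encoding of the node's first 6 bytes. `backup_bundle_path` and its `metadir` argument are hypothetical stand-ins for the repository object.

```python
import binascii
import os
import tempfile


def short(node):
    # Same convention as Mercurial's node.short(): hex of the first 6 bytes.
    return binascii.hexlify(node[:6]).decode("ascii")


def backup_bundle_path(metadir, node, suffix):
    """Hypothetical helper: return the path _bundle would write to,
    creating the strip-backup directory if it does not exist yet."""
    backupdir = os.path.join(metadir, "strip-backup")
    if not os.path.isdir(backupdir):
        os.mkdir(backupdir)
    return os.path.join(backupdir, "%s-%s" % (short(node), suffix))


node = bytes(range(20))  # stand-in for a 20-byte binary changeset id
with tempfile.TemporaryDirectory() as metadir:
    print(os.path.basename(backup_bundle_path(metadir, node, "backup")))
    # prints 000102030405-backup
```

`strip` calls `_bundle` with the suffix `'backup'` for the full backup and `'temp'` for the revisions it must re-add. The temporary file is deleted afterwards unless `backup == "strip"`.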

def _collectfiles(repo, striprev):
    """find out the filelogs affected by the strip"""
    files = set()

    for x in xrange(striprev, len(repo)):
        files.update(repo[x].files())

    return sorted(files)

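`_collectfiles` takes the union of the files listed by every changeset from `striprev` to the tip. Those are the filelogs that may hold revisions linked to stripped changesets. The sketch below models the repository as a hypothetical list in which element `i` holds the files changed by revision `i`.

```python
def collect_files(changeset_files, striprev):
    """Union of the files touched by revisions striprev..tip, sorted.

    `changeset_files` is a hypothetical stand-in for the repository:
    element i lists the files changed by revision i.
    """
    files = set()
    for touched in changeset_files[striprev:]:
        files.update(touched)
    return sorted(files)


history = [["a"], ["b", "c"], ["c"], ["d", "a"]]
print(collect_files(history, 2))  # prints ['a', 'c', 'd']
```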

def _collectextranodes(repo, files, link):
    """return the nodes that have to be saved before the strip"""
    def collectone(revlog):
        extra = []
        startrev = count = len(revlog)
        # find the truncation point of the revlog
        for i in xrange(count):
            lrev = revlog.linkrev(i)
            if lrev >= link:
                startrev = i + 1
                break
        # see if any revision after that point has a linkrev less than link
        # (we have to manually save these guys)
        for i in xrange(startrev, count):
            node = revlog.node(i)
            lrev = revlog.linkrev(i)
            if lrev < link:
                extra.append((node, cl.node(lrev)))
        return extra
    extranodes = {}
    cl = repo.changelog
    extra = collectone(repo.manifest)
    if extra:
        extranodes[1] = extra
    for fname in files:
        f = repo.file(fname)
        extra = collectone(f)
        if extra:
            extranodes[fname] = extra
    return extranodes
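A revlog's revision order need not match changelog order. As a result, past the first revision linked to a stripped changeset (the truncation point), later revisions can still belong to surviving changesets, and `collectone` finds them. The sketch below works on a hypothetical plain list of linkrevs, where element `i` is the changelog revision that introduced revlog revision `i`. It returns revision numbers where the listing returns binary nodes.

```python
def collect_extra(linkrevs, link):
    """Return (revlog_rev, linkrev) pairs that must be saved before the
    revlog is truncated for a strip starting at changelog revision `link`.

    Hypothetical model: linkrevs[i] is the linkrev of revlog revision i.
    """
    count = len(linkrevs)
    startrev = count
    # The first revision linked to a stripped changeset is the truncation
    # point; everything from there on is cut off the revlog.
    for i in range(count):
        if linkrevs[i] >= link:
            startrev = i + 1
            break
    # Revisions past that point whose changeset survives the strip would be
    # lost by the truncation, so they have to be bundled and re-added.
    return [(i, linkrevs[i]) for i in range(startrev, count)
            if linkrevs[i] < link]


# Revlog revision 1 belongs to stripped changeset 5; revision 2 belongs to
# changeset 3, which survives a strip at 4, so it must be saved.
print(collect_extra([0, 5, 3, 7], 4))  # prints [(2, 3)]
```

In the listing, the manifest's pairs are stored under the key `1` and each filelog's pairs under its file name. That mapping is what `_bundle` passes to `changegroupsubset` as `extranodes`.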

def strip(ui, repo, node, backup="all"):
    cl = repo.changelog
    # TODO delete the undo files, and handle undo of merge sets
    striprev = cl.rev(node)

    # Some revisions with rev > striprev may not be descendants of striprev.
    # We have to find these revisions and put them in a bundle, so that
    # we can restore them after the truncations.
    # To create the bundle we use repo.changegroupsubset which requires
    # the list of heads and bases of the set of interesting revisions.
    # (head = revision in the set that has no descendant in the set;
    #  base = revision in the set that has no ancestor in the set)
    tostrip = set((striprev,))
    saveheads = set()
    savebases = []
    for r in xrange(striprev + 1, len(cl)):
        parents = cl.parentrevs(r)
        if parents[0] in tostrip or parents[1] in tostrip:
            # r is a descendant of striprev
            tostrip.add(r)
            # if this is a merge and one of the parents does not descend
            # from striprev, mark that parent as a savehead.
            if parents[1] != nullrev:
                for p in parents:
                    if p not in tostrip and p > striprev:
                        saveheads.add(p)
        else:
            # if no parents of this revision will be stripped, mark it as
            # a savebase
            if parents[0] < striprev and parents[1] < striprev:
                savebases.append(cl.node(r))

            saveheads.difference_update(parents)
            saveheads.add(r)

    saveheads = [cl.node(r) for r in saveheads]
    files = _collectfiles(repo, striprev)

    extranodes = _collectextranodes(repo, files, striprev)

    # create a changegroup for all the branches we need to keep
    if backup == "all":
        _bundle(repo, [node], cl.heads(), node, 'backup')
    if saveheads or extranodes:
        chgrpfile = _bundle(repo, savebases, saveheads, node, 'temp',
                            extranodes)

    mfst = repo.manifest
    tr = repo.transaction("strip")
    offset = len(tr.entries)
    tr.startgroup()
    cl.strip(striprev, tr)
    mfst.strip(striprev, tr)
    for fn in files:
        repo.file(fn).strip(striprev, tr)
    tr.endgroup()

    try:
        for i in xrange(offset, len(tr.entries)):
            file, troffset, ignore = tr.entries[i]
            repo.sopener(file, 'a').truncate(troffset)
        tr.close()
    except:
        tr.abort()
        raise

    if saveheads or extranodes:
        ui.status(_("adding branch\n"))
        f = open(chgrpfile, "rb")
        gen = changegroup.readbundle(f, chgrpfile)
        repo.addchangegroup(gen, 'strip', 'bundle:' + chgrpfile, True)
        f.close()
        if backup != "strip":
            os.unlink(chgrpfile)

    repo.destroyed()
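The loop at the start of `strip` classifies every revision after `striprev` in one pass over topologically ordered revision numbers. The sketch below replays that classification on a hypothetical list of `(p1, p2)` parent pairs, with `-1` standing for `nullrev` as in Mercurial. It returns revision numbers and keeps `savebases` as a set, whereas the listing converts to nodes and keeps `savebases` as a list.

```python
NULLREV = -1  # Mercurial's nullrev


def split_for_strip(parentrevs, striprev):
    """Return (tostrip, saveheads, savebases) as sets of revision numbers.

    Hypothetical model: parentrevs[r] is the (p1, p2) pair of revision r,
    using NULLREV for a missing parent. Parents always have smaller
    numbers than their children.
    """
    tostrip = {striprev}
    saveheads = set()
    savebases = set()
    for r in range(striprev + 1, len(parentrevs)):
        p1, p2 = parentrevs[r]
        if p1 in tostrip or p2 in tostrip:
            # r descends from striprev, so it is stripped too.
            tostrip.add(r)
            # A merge may have a surviving parent newer than striprev;
            # keep that parent as a head of the saved set.
            if p2 != NULLREV:
                for p in (p1, p2):
                    if p not in tostrip and p > striprev:
                        saveheads.add(p)
        else:
            # r survives. If both parents are older than striprev, r has no
            # ancestor in the saved set, so it is a base.
            if p1 < striprev and p2 < striprev:
                savebases.add(r)
            # r replaces its parents as a head of the saved set.
            saveheads.difference_update((p1, p2))
            saveheads.add(r)
    return tostrip, saveheads, savebases


# 0 is the root; 1 (striprev) and 2 are its children; 3 descends from 1,
# 4 from 2, and 5 merges 3 and 4.
graph = [(-1, -1), (0, -1), (0, -1), (1, -1), (2, -1), (3, 4)]
tostrip, saveheads, savebases = split_for_strip(graph, 1)
print(sorted(tostrip), sorted(saveheads), sorted(savebases))
# prints [1, 3, 5] [4] [2]
```

In the example, revision 1 is stripped together with its descendant 3 and the merge 5. The independent branch 2, 4 survives, with base 2 and head 4. Those are the bounds `strip` hands to `changegroupsubset` so the branch can be re-added after truncation.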