##// END OF EJS Templates
revlog: change generaldelta delta parent heuristic...
revlog: change generaldelta delta parent heuristic The old generaldelta heuristic was "if p1 (or p2) was closer than the last full text, use it, otherwise use prev". This was problematic when a repo contained multiple branches that were very different. If commits to branch A were pushed, and the last full text was branch B, it would generate a fulltext. Then if branch B was pushed, it would generate another fulltext. The problem is that the last fulltext (and delta'ing against `prev` in general) has no correlation with the contents of the incoming revision, and therefore will always have degenerate cases. According to the blame, that algorithm was chosen to minimize the chain length. Since there is already code that protects against that (the delta-vs-fulltext code), and since it has been improved since the original generaldelta algorithm went in (2011), I believe the chain length criteria will still be preserved. The new algorithm always diffs against p1 (or p2 if it's closer), unless the resulting delta will fail the delta-vs-fulltext check, in which case we delta against prev. Some before and after stats on manifest.d size. internal large repo old heuristic - 2.0 GB new heuristic - 1.2 GB mozilla-central old heuristic - 242 MB new heuristic - 261 MB The regression in mozilla central is due to the new heuristic choosing p2r as the delta when it's closer to the tip. Switching the algorithm to always prefer p1r brings the size back down (242 MB). This is result of the way in which mozilla does merges and pushes, and the result could easily swing the other direction in other repos (depending on if they merge X into Y or Y into X), but will never be as degenerate as before. I future patch will address the regression by introducing an optional, even more aggressive delta heuristic which will knock the mozilla manifest size down dramatically.

File last commit:

r25952:f0ad094d default
r26117:4dc5b51f default
Show More
hbisect.py
269 lines | 9.2 KiB | text/x-python | PythonLexer
Matt Mackall
bisect: make bisect a built-in command
r5775 # changelog bisection for mercurial
#
# Copyright 2007 Matt Mackall
# Copyright 2005, 2006 Benoit Boissinot <benoit.boissinot@ens-lyon.org>
Martin Geisler
add blank line after copyright notices and after header
r8228 #
Matt Mackall
bisect: make bisect a built-in command
r5775 # Inspired by git bisect, extension skeleton taken from mq.py.
#
Martin Geisler
updated license to be explicit about GPL version 2
r8225 # This software may be used and distributed according to the terms of the
Matt Mackall
Update license to GPLv2+
r10263 # GNU General Public License version 2 or any later version.
Matt Mackall
bisect: make bisect a built-in command
r5775
Gregory Szorc
hbisect: use absolute_import
r25952 from __future__ import absolute_import
Martin von Zweigbergk
util: drop alias for collections.deque...
r25113 import collections
Augie Fackler
cleanup: move stdlib imports to their own import statement...
r20034 import os
Gregory Szorc
hbisect: use absolute_import
r25952
from .i18n import _
from .node import (
hex,
short,
)
from . import (
error,
util,
)
Matt Mackall
bisect: make bisect a built-in command
r5775
def bisect(changelog, state):
Bernhard Leiner
Add support for multiple possible bisect results (issue1228, issue1182)...
r6858 """find the next node (if any) for testing during a bisect search.
returns a (nodes, number, good) tuple.
'nodes' is the final result of the bisect if 'number' is 0.
Otherwise 'number' indicates the remaining possible candidates for
the search and 'nodes' contains the next bisect target.
'good' is True if bisect is searching for a first good changeset, False
if searching for a first bad one.
"""
Matt Mackall
bisect: make bisect a built-in command
r5775 clparents = changelog.parentrevs
Martin Geisler
replace set-like dictionaries with real sets...
r8152 skip = set([changelog.rev(n) for n in state['skip']])
Matt Mackall
bisect: make bisect a built-in command
r5775
Matt Mackall
bisect: handle search for bad to good transitions...
r5776 def buildancestors(bad, good):
# only the earliest bad revision matters
badrev = min([changelog.rev(n) for n in bad])
goodrevs = [changelog.rev(n) for n in good]
Matt Mackall
bisect: limit considered set to descendants of first good rev
r9583 goodrev = min(goodrevs)
# build visit array
ancestors = [None] * (len(changelog) + 1) # an extra for [-1]
Alexander Krauss
hbisect: do not assume that min(good) is an ancestor of min(bad)...
r14895 # set nodes descended from goodrevs
for rev in goodrevs:
ancestors[rev] = []
Pierre-Yves David
bisect: use changelog for iteration...
r18463 for rev in changelog.revs(goodrev + 1):
Matt Mackall
bisect: limit considered set to descendants of first good rev
r9583 for prev in clparents(rev):
if ancestors[prev] == []:
ancestors[rev] = []
Matt Mackall
bisect: make bisect a built-in command
r5775
Matt Mackall
bisect: handle search for bad to good transitions...
r5776 # clear good revs from array
Alexander Krauss
hbisect: more consistent variable name
r14894 for rev in goodrevs:
ancestors[rev] = None
Pierre-Yves David
bisect: use changelog for iteration...
r18463 for rev in changelog.revs(len(changelog), goodrev):
Matt Mackall
bisect: handle search for bad to good transitions...
r5776 if ancestors[rev] is None:
for prev in clparents(rev):
ancestors[prev] = None
Matt Mackall
bisect: make bisect a built-in command
r5775
Matt Mackall
bisect: handle search for bad to good transitions...
r5776 if ancestors[badrev] is None:
Matt Mackall
bisect: improve tests...
r5777 return badrev, None
Matt Mackall
bisect: handle search for bad to good transitions...
r5776 return badrev, ancestors
Martin Geisler
hbisect: use real Booleans instead of 0/1
r14215 good = False
Matt Mackall
bisect: handle search for bad to good transitions...
r5776 badrev, ancestors = buildancestors(state['bad'], state['good'])
if not ancestors: # looking for bad to good transition?
Martin Geisler
hbisect: use real Booleans instead of 0/1
r14215 good = True
Matt Mackall
bisect: handle search for bad to good transitions...
r5776 badrev, ancestors = buildancestors(state['good'], state['bad'])
Matt Mackall
bisect: improve tests...
r5777 bad = changelog.node(badrev)
Matt Mackall
bisect: handle search for bad to good transitions...
r5776 if not ancestors: # now we're confused
Mads Kiilerich
bisect: report "both good and bad" as such, not as "not directly related"
r20094 if (len(state['bad']) == 1 and len(state['good']) == 1 and
state['bad'] != state['good']):
Matt Mackall
bisect: better message for unrelated starting revisions
r12005 raise util.Abort(_("starting revisions are not directly related"))
Martin Geisler
Lowercase error messages
r12067 raise util.Abort(_("inconsistent state, %s:%s is good and bad")
Joel Rosdahl
Avoid importing mercurial.node/mercurial.repo stuff from mercurial.hg
r6217 % (badrev, short(bad)))
Matt Mackall
bisect: make bisect a built-in command
r5775
# build children dict
children = {}
Martin von Zweigbergk
util: drop alias for collections.deque...
r25113 visit = collections.deque([badrev])
Matt Mackall
bisect: make bisect a built-in command
r5775 candidates = []
while visit:
Bryan O'Sullivan
cleanup: use the deque type where appropriate...
r16803 rev = visit.popleft()
Matt Mackall
bisect: make bisect a built-in command
r5775 if ancestors[rev] == []:
candidates.append(rev)
for prev in clparents(rev):
if prev != -1:
if prev in children:
children[prev].append(rev)
else:
children[prev] = [rev]
visit.append(prev)
candidates.sort()
# have we narrowed it down to one entry?
Bernhard Leiner
Add support for multiple possible bisect results (issue1228, issue1182)...
r6858 # or have all other possible candidates besides 'bad' have been skipped?
Matt Mackall
bisect: make bisect a built-in command
r5775 tot = len(candidates)
Bernhard Leiner
Add support for multiple possible bisect results (issue1228, issue1182)...
r6858 unskipped = [c for c in candidates if (c not in skip) and (c != badrev)]
if tot == 1 or not unskipped:
return ([changelog.node(rev) for rev in candidates], 0, good)
Benoit Boissinot
bisect: use integer division
r7902 perfect = tot // 2
Matt Mackall
bisect: make bisect a built-in command
r5775
# find the best node to test
best_rev = None
best_len = -1
Benoit Boissinot
bisect: use set instead of dict
r8463 poison = set()
Matt Mackall
bisect: make bisect a built-in command
r5775 for rev in candidates:
if rev in poison:
Martin Geisler
hbisect: use set.update for bulk updates
r8482 # poison children
poison.update(children.get(rev, []))
Matt Mackall
bisect: make bisect a built-in command
r5775 continue
a = ancestors[rev] or [rev]
ancestors[rev] = None
x = len(a) # number of ancestors
y = tot - x # number of non-ancestors
value = min(x, y) # how good is this test?
if value > best_len and rev not in skip:
best_len = value
best_rev = rev
if value == perfect: # found a perfect candidate? quit early
break
Bernhard Leiner
Circumvent removal of valid bisect candidates due to previously skipped ones...
r7557 if y < perfect and rev not in skip: # all downhill from here?
Martin Geisler
hbisect: use set.update for bulk updates
r8482 # poison children
poison.update(children.get(rev, []))
Matt Mackall
bisect: make bisect a built-in command
r5775 continue
for c in children.get(rev, []):
if ancestors[c]:
Martin Geisler
replace set-like dictionaries with real sets...
r8152 ancestors[c] = list(set(ancestors[c] + a))
Matt Mackall
bisect: make bisect a built-in command
r5775 else:
ancestors[c] = a + [c]
assert best_rev is not None
best_node = changelog.node(best_rev)
Bernhard Leiner
Add support for multiple possible bisect results (issue1228, issue1182)...
r6858 return ([best_node], tot, good)
Alexander Solovyov
bisect: ability to check revision with command
r7227
def load_state(repo):
Bryan O'Sullivan
bisect: track the current changeset (issue3382)...
r16647 state = {'current': [], 'good': [], 'bad': [], 'skip': []}
Alexander Solovyov
bisect: ability to check revision with command
r7227 if os.path.exists(repo.join("bisect.state")):
Angel Ezquerra
localrepo: remove all external users of localrepo.opener...
r23877 for l in repo.vfs("bisect.state"):
Alexander Solovyov
bisect: ability to check revision with command
r7227 kind, node = l[:-1].split()
node = repo.lookup(node)
if kind not in state:
raise util.Abort(_("unknown bisect kind %s") % kind)
state[kind].append(node)
return state
def save_state(repo, state):
Angel Ezquerra
localrepo: remove all external users of localrepo.opener...
r23877 f = repo.vfs("bisect.state", "w", atomictemp=True)
Alexander Solovyov
bisect: ability to check revision with command
r7227 wlock = repo.wlock()
try:
Mads Kiilerich
bisect: store state sorted
r18358 for kind in sorted(state):
Alexander Solovyov
bisect: ability to check revision with command
r7227 for node in state[kind]:
f.write("%s %s\n" % (kind, hex(node)))
Greg Ward
atomictempfile: make close() consistent with other file-like objects....
r15057 f.close()
Alexander Solovyov
bisect: ability to check revision with command
r7227 finally:
Ronny Pfannschmidt
switch lock releasing in the core from gc to explicit
r8109 wlock.release()
Alexander Solovyov
bisect: ability to check revision with command
r7227
"Yann E. MORIN"
revset.bisect: move bisect() code to hbisect.py...
r15135 def get(repo, status):
"""
Return a list of revision(s) that match the given status:
"Yann E. MORIN"
hbisect: add two new revset descriptions: 'goods' and 'bads'...
r15153 - ``good``, ``bad``, ``skip``: csets explicitly marked as good/bad/skip
Mads Kiilerich
fix trivial spelling errors
r17424 - ``goods``, ``bads`` : csets topologically good/bad
"Yann E. MORIN"
hbisect: add two new revset descriptions: 'goods' and 'bads'...
r15153 - ``range`` : csets taking part in the bisection
- ``pruned`` : csets that are goods, bads or skipped
"Yann E. MORIN"
revset.bisect: add new 'untested' set to the bisect keyword...
r15138 - ``untested`` : csets whose fate is yet unknown
"Yann E. MORIN"
revset.bisect: add 'ignored' set to the bisect keyword...
r15147 - ``ignored`` : csets ignored due to DAG topology
Bryan O'Sullivan
bisect: track the current changeset (issue3382)...
r16647 - ``current`` : the cset currently being bisected
"Yann E. MORIN"
revset.bisect: move bisect() code to hbisect.py...
r15135 """
state = load_state(repo)
Bryan O'Sullivan
bisect: track the current changeset (issue3382)...
r16647 if status in ('good', 'bad', 'skip', 'current'):
return map(repo.changelog.rev, state[status])
"Yann E. MORIN"
revset.bisect: move bisect() code to hbisect.py...
r15135 else:
timeless@mozdev.org
spelling: following
r17493 # In the following sets, we do *not* call 'bisect()' with more
timeless@mozdev.org
spelling: recursion
r17509 # than one level of recursion, because that can be very, very
"Yann E. MORIN"
hbisect.get: use simpler code with repo.set(), fix 'pruned' set...
r15146 # time consuming. Instead, we always develop the expression as
# much as possible.
# 'range' is all csets that make the bisection:
# - have a good ancestor and a bad descendant, or conversely
# that's because the bisection can go either way
range = '( bisect(bad)::bisect(good) | bisect(good)::bisect(bad) )'
"Yann E. MORIN"
revset.bisect: add new 'range' set to the bisect keyword...
r15136
Matt Mackall
localrepo: convert various repo.set() users to repo.revs()
r15404 _t = repo.revs('bisect(good)::bisect(bad)')
"Yann E. MORIN"
hbisect: add two new revset descriptions: 'goods' and 'bads'...
r15153 # The sets of topologically good or bad csets
if len(_t) == 0:
# Goods are topologically after bads
goods = 'bisect(good)::' # Pruned good csets
bads = '::bisect(bad)' # Pruned bad csets
else:
# Goods are topologically before bads
goods = '::bisect(good)' # Pruned good csets
bads = 'bisect(bad)::' # Pruned bad csets
# 'pruned' is all csets whose fate is already known: good, bad, skip
skips = 'bisect(skip)' # Pruned skipped csets
pruned = '( (%s) | (%s) | (%s) )' % (goods, bads, skips)
"Yann E. MORIN"
revset.bisect: add new 'range' set to the bisect keyword...
r15136
"Yann E. MORIN"
hbisect.get: use simpler code with repo.set(), fix 'pruned' set...
r15146 # 'untested' is all cset that are- in 'range', but not in 'pruned'
untested = '( (%s) - (%s) )' % (range, pruned)
"Yann E. MORIN"
revset.bisect: add new 'range' set to the bisect keyword...
r15136
"Yann E. MORIN"
revset.bisect: add 'ignored' set to the bisect keyword...
r15147 # 'ignored' is all csets that were not used during the bisection
# due to DAG topology, but may however have had an impact.
Mads Kiilerich
fix trivial spelling errors
r17424 # E.g., a branch merged between bads and goods, but whose branch-
"Yann E. MORIN"
revset.bisect: add 'ignored' set to the bisect keyword...
r15147 # point is out-side of the range.
iba = '::bisect(bad) - ::bisect(good)' # Ignored bads' ancestors
iga = '::bisect(good) - ::bisect(bad)' # Ignored goods' ancestors
ignored = '( ( (%s) | (%s) ) - (%s) )' % (iba, iga, range)
"Yann E. MORIN"
revset.bisect: add new 'range' set to the bisect keyword...
r15136 if status == 'range':
Matt Mackall
localrepo: convert various repo.set() users to repo.revs()
r15404 return repo.revs(range)
"Yann E. MORIN"
revset.bisect: add new 'pruned' set to the bisect keyword...
r15137 elif status == 'pruned':
Matt Mackall
localrepo: convert various repo.set() users to repo.revs()
r15404 return repo.revs(pruned)
"Yann E. MORIN"
revset.bisect: add new 'untested' set to the bisect keyword...
r15138 elif status == 'untested':
Matt Mackall
localrepo: convert various repo.set() users to repo.revs()
r15404 return repo.revs(untested)
"Yann E. MORIN"
revset.bisect: add 'ignored' set to the bisect keyword...
r15147 elif status == 'ignored':
Matt Mackall
localrepo: convert various repo.set() users to repo.revs()
r15404 return repo.revs(ignored)
"Yann E. MORIN"
hbisect: add two new revset descriptions: 'goods' and 'bads'...
r15153 elif status == "goods":
Matt Mackall
localrepo: convert various repo.set() users to repo.revs()
r15404 return repo.revs(goods)
"Yann E. MORIN"
hbisect: add two new revset descriptions: 'goods' and 'bads'...
r15153 elif status == "bads":
Matt Mackall
localrepo: convert various repo.set() users to repo.revs()
r15404 return repo.revs(bads)
"Yann E. MORIN"
revset.bisect: add new 'range' set to the bisect keyword...
r15136 else:
raise error.ParseError(_('invalid bisect state'))
"Yann E. MORIN"
hbisect: add functions to return a label for a cset bisection status...
r15154
"Yann E. MORIN"
bisect: remove superfluous parameter in label()...
r15406 def label(repo, node):
"Yann E. MORIN"
hbisect: add functions to return a label for a cset bisection status...
r15154 rev = repo.changelog.rev(node)
# Try explicit sets
if rev in get(repo, 'good'):
Wagner Bruna
bisect: add i18n contexts
r15308 # i18n: bisect changeset status
"Yann E. MORIN"
hbisect: add functions to return a label for a cset bisection status...
r15154 return _('good')
if rev in get(repo, 'bad'):
Wagner Bruna
bisect: add i18n contexts
r15308 # i18n: bisect changeset status
"Yann E. MORIN"
hbisect: add functions to return a label for a cset bisection status...
r15154 return _('bad')
if rev in get(repo, 'skip'):
Wagner Bruna
bisect: add i18n contexts
r15308 # i18n: bisect changeset status
"Yann E. MORIN"
hbisect: add functions to return a label for a cset bisection status...
r15154 return _('skipped')
Bryan O'Sullivan
bisect: track the current changeset (issue3382)...
r16647 if rev in get(repo, 'untested') or rev in get(repo, 'current'):
Wagner Bruna
bisect: add i18n contexts
r15308 # i18n: bisect changeset status
"Yann E. MORIN"
hbisect: add functions to return a label for a cset bisection status...
r15154 return _('untested')
if rev in get(repo, 'ignored'):
Wagner Bruna
bisect: add i18n contexts
r15308 # i18n: bisect changeset status
"Yann E. MORIN"
hbisect: add functions to return a label for a cset bisection status...
r15154 return _('ignored')
# Try implicit sets
if rev in get(repo, 'goods'):
Wagner Bruna
bisect: add i18n contexts
r15308 # i18n: bisect changeset status
"Yann E. MORIN"
hbisect: add functions to return a label for a cset bisection status...
r15154 return _('good (implicit)')
if rev in get(repo, 'bads'):
Wagner Bruna
bisect: add i18n contexts
r15308 # i18n: bisect changeset status
"Yann E. MORIN"
hbisect: add functions to return a label for a cset bisection status...
r15154 return _('bad (implicit)')
return None
def shortlabel(label):
if label:
return label[0].upper()
return None