# setdiscovery.py - improved discovery of common nodeset for mercurial
#
# Copyright 2010 Benoit Boissinot <bboissin@gmail.com>
# and Peter Arrenbrecht <peter@arrenbrecht.ch>
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.

"""
The algorithm works in the following way. You have two repositories: local
and remote. They both contain a DAG of changelists.

The goal of the discovery protocol is to find one set of nodes, *common*,
the set of nodes shared by local and remote.

One of the issues with the original protocol was latency: it could
potentially require lots of roundtrips to discover that the local repo was a
subset of remote (which is a very common case, you usually have few changes
compared to upstream, while upstream probably had lots of development).

The new protocol only requires one interface for the remote repo: `known()`,
which given a set of changelists tells you if they are present in the DAG.

The algorithm then works as follows:

 - We will be using three sets, `common`, `missing`, `unknown`. Originally
   all nodes are in `unknown`.
 - Take a sample from `unknown`, call `remote.known(sample)`
   - For each node that remote knows, move it and all its ancestors to
     `common`
   - For each node that remote doesn't know, move it and all its descendants
     to `missing`
 - Iterate until `unknown` is empty

There are a couple of optimizations. The first is, instead of starting with a
random sample of `unknown`, to start by sending all heads: in the case where
the local repo is a subset, you have computed the answer in one round trip.

Then you can do something similar to the bisecting strategy used when
finding faulty changesets. Instead of random samples, you can try picking
nodes that will maximize the number of nodes that will be
classified with it (since all ancestors or descendants will be marked as well).
"""
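The loop described in the docstring above can be tried on a toy DAG. This is a minimal sketch, not Mercurial's implementation: the DAG is a plain dict mapping each node to its parents, `remote_known` is a set standing in for the remote's `known()` call, and the helper names (`ancestors`, `descendants`, `discover`) are made up for this illustration. It assumes the remote's set is closed under ancestry, as a real repository would be.

```python
import random


def ancestors(parents, node):
    """Return `node` together with all of its ancestors."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents[n])
    return seen


def descendants(parents, node):
    """Return `node` together with all of its descendants (naive scan)."""
    return {n for n in parents if node in ancestors(parents, n)}


def discover(parents, remote_known, samplesize=2, seed=0):
    """Classify every local node as common or missing.

    `remote_known` is a hypothetical stand-in for remote.known().
    """
    rng = random.Random(seed)
    common, missing, unknown = set(), set(), set(parents)
    roundtrips = 0
    # First optimization from the docstring: start by sending all heads.
    heads = set(parents) - {p for ps in parents.values() for p in ps}
    sample = sorted(heads)
    while unknown:
        roundtrips += 1
        for node in sample:
            if node in remote_known:
                found = ancestors(parents, node)
                common |= found
            else:
                found = descendants(parents, node)
                missing |= found
            unknown -= found
        # Later rounds: plain random sample of what is still undecided.
        sample = rng.sample(sorted(unknown), min(samplesize, len(unknown)))
    return common, missing, roundtrips


# Linear history 0 <- 1 <- 2 <- 3 <- 4; the remote only has 0..2.
parents = {0: [], 1: [0], 2: [1], 3: [2], 4: [3]}
common, missing, roundtrips = discover(parents, {0, 1, 2})
print(sorted(common), sorted(missing), roundtrips)
```

When the local repository is a subset of the remote, the heads-first sample settles everything in a single round trip, which is the common case the docstring calls out.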

from __future__ import absolute_import

import collections
import random

from .i18n import _
from .node import (
    nullid,
    nullrev,
)
from . import (
    error,
    util,
)

def _updatesample(revs, heads, sample, parentfn, quicksamplesize=0):
    """update an existing sample to match the expected size

    The sample is updated with revs exponentially distant from each head of the
    <revs> set. (H~1, H~2, H~4, H~8, etc).

    If a target size is specified, the sampling will stop once this size is
    reached. Otherwise sampling will happen until roots of the <revs> set are
    reached.

    :revs: set of revs we want to discover (if None, assume the whole dag)
    :heads: set of DAG head revs
    :sample: a sample to update
    :parentfn: a callable to resolve parents for a revision
    :quicksamplesize: optional target size of the sample"""
    dist = {}
    visit = collections.deque(heads)
    seen = set()
    factor = 1
    while visit:
        curr = visit.popleft()
        if curr in seen:
            continue
        d = dist.setdefault(curr, 1)
        if d > factor:
            factor *= 2
        if d == factor:
            sample.add(curr)
            if quicksamplesize and (len(sample) >= quicksamplesize):
                return
        seen.add(curr)

        for p in parentfn(curr):
            if p != nullrev and (not revs or p in revs):
                dist.setdefault(p, d + 1)
                visit.append(p)
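
To see which revisions the exponential walk in `_updatesample` picks, the loop can be copied into a standalone function and run on a linear history. This is a sketch for experimentation only: `exponential_sample`, `linear_parents` and `NULLREV` are hypothetical names, the `revs` filter is dropped, and the function returns the sample instead of mutating one passed in.

```python
import collections

NULLREV = -1  # stand-in for mercurial.node.nullrev


def exponential_sample(heads, parentfn, limit=0):
    """Collect revs at distance 1, 2, 4, 8, ... from the given heads.

    Distances start at 1 for the heads themselves, mirroring the
    `dist.setdefault(curr, 1)` call in the original loop.
    """
    sample = set()
    dist = {}
    visit = collections.deque(heads)
    seen = set()
    factor = 1
    while visit:
        curr = visit.popleft()
        if curr in seen:
            continue
        d = dist.setdefault(curr, 1)
        if d > factor:
            factor *= 2
        if d == factor:
            sample.add(curr)
            if limit and len(sample) >= limit:
                return sample
        seen.add(curr)
        for p in parentfn(curr):
            if p != NULLREV:
                dist.setdefault(p, d + 1)
                visit.append(p)
    return sample


def linear_parents(rev):
    """Parents in a linear history 0 <- 1 <- ... <- 9."""
    return [rev - 1] if rev > 0 else [NULLREV]


picked = exponential_sample([9], linear_parents)
print(sorted(picked, reverse=True))  # [9, 8, 6, 2]
```

With the head counted at distance 1, the picks on this chain are the head itself, then the revisions 1, 3 and 7 steps below it, so the gaps between sampled revisions double as the walk moves away from the head.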

def _takequicksample(repo, headrevs, revs, size):
    """takes a quick sample of size <size>

    It is meant for initial sampling and focuses on querying heads and close
    ancestors of heads.

    :repo: the local repository
    :headrevs: set of head revisions in local DAG to consider
    :revs: set of revs to discover
    :size: the maximum size of the sample"""
    if len(revs) <= size:
        return list(revs)
    sample = set(repo.revs('heads(%ld)', revs))

    if len(sample) >= size:
        return _limitsample(sample, size)

    _updatesample(None, headrevs, sample, repo.changelog.parentrevs,
                  quicksamplesize=size)
    return sample

def _takefullsample(repo, headrevs, revs, size):
    if len(revs) <= size:
        return list(revs)
    sample = set(repo.revs('heads(%ld)', revs))

    # update from heads
    revsheads = set(repo.revs('heads(%ld)', revs))
    _updatesample(revs, revsheads, sample, repo.changelog.parentrevs)

    # update from roots
    revsroots = set(repo.revs('roots(%ld)', revs))

    # _updatesample() essentially does iteration over revisions to look up
    # their children. This lookup is expensive and doing it in a loop is
    # quadratic. We precompute the children for all relevant revisions and
    # make the lookup in _updatesample() a simple dict lookup.
    #
    # Because this function can be called multiple times during discovery, we
    # may still perform redundant work and there is room to optimize this by
    # keeping a persistent cache of children across invocations.
    children = {}

    parentrevs = repo.changelog.parentrevs
    for rev in repo.changelog.revs(start=min(revsroots)):
        # Always ensure revision has an entry so we don't need to worry about
        # missing keys.
        children.setdefault(rev, [])

        for prev in parentrevs(rev):
            if prev == nullrev:
                continue

            children.setdefault(prev, []).append(rev)

    _updatesample(revs, revsroots, sample, children.__getitem__)
    assert sample
    sample = _limitsample(sample, size)
    if len(sample) < size:
        more = size - len(sample)
        sample.update(random.sample(list(revs - sample), more))
    return sample
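
The children precomputation in `_takefullsample` inverts the parent relation once so that walking from roots towards heads becomes a dict lookup. The sketch below isolates that step on a four-revision DAG; `build_children`, `toy_parents` and `NULLREV` are illustrative names, and the parent tuples imitate the two-parent shape that `changelog.parentrevs` is assumed to return.

```python
NULLREV = -1  # stand-in for mercurial.node.nullrev


def build_children(parentrevs, revs):
    """Map every rev in `revs` to the list of its children."""
    children = {}
    for rev in revs:
        # Give every rev an entry so lookups never hit a missing key.
        children.setdefault(rev, [])
        for prev in parentrevs(rev):
            if prev == NULLREV:
                continue
            children.setdefault(prev, []).append(rev)
    return children


# Diamond: 0 is the root, 1 and 2 branch off it, 3 merges them.
toy_parents = {
    0: (NULLREV, NULLREV),
    1: (0, NULLREV),
    2: (0, NULLREV),
    3: (1, 2),
}
children = build_children(toy_parents.__getitem__, sorted(toy_parents))
print(children)  # {0: [1, 2], 1: [3], 2: [3], 3: []}
```

Passing `children.__getitem__` as the "parent" callable, as the function above does, makes the same exponential walk run in the opposite direction, from roots towards heads.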

def _limitsample(sample, desiredlen):
    """return a random subset of sample of at most desiredlen items"""
    if len(sample) > desiredlen:
        sample = set(random.sample(sample, desiredlen))
    return sample

class partialdiscovery(object):
    """an object representing ongoing discovery

    Fed with data from the remote repository, this object keeps track of the
    current sets of changesets in various states:

    - common: revs also known remotely
    - undecided: revs we don't have information on yet
    - missing: revs missing remotely
    (all tracked revisions are known locally)
    """

    def __init__(self, repo, targetheads):
        self._repo = repo
        self._targetheads = targetheads
        self._common = repo.changelog.incrementalmissingrevs()
        self._undecided = None
        self.missing = set()

    def addcommons(self, commons):
        """register nodes known as common"""
        self._common.addbases(commons)
        if self._undecided is not None:
            self._common.removeancestorsfrom(self._undecided)

    def addmissings(self, missings):
        """register some nodes as missing"""
        newmissing = self._repo.revs('%ld::%ld', missings, self.undecided)
        if newmissing:
            self.missing.update(newmissing)
            self.undecided.difference_update(newmissing)

    def addinfo(self, sample):
        """consume an iterable of (rev, known) tuples"""
        common = set()
        missing = set()
        for rev, known in sample:
            if known:
                common.add(rev)
            else:
                missing.add(rev)
        if common:
            self.addcommons(common)
        if missing:
            self.addmissings(missing)

    def hasinfo(self):
        """return True if we have any clue about the remote state"""
        return self._common.hasbases()

    def iscomplete(self):
        """True if all the necessary data have been gathered"""
        return self._undecided is not None and not self._undecided

    @property
    def undecided(self):
        if self._undecided is not None:
            return self._undecided
        self._undecided = set(self._common.missingancestors(self._targetheads))
        return self._undecided

    def commonheads(self):
        """the heads of the known common set"""
        # heads(common) == heads(common.bases) since common represents
        # common.bases and all its ancestors
        return self._common.basesheads()

def findcommonheads(ui, local, remote,
                    initialsamplesize=100,
                    fullsamplesize=200,
                    abortwhenunrelated=True,
                    ancestorsof=None):
    '''Return a tuple (common, anyincoming, remoteheads) used to identify
    missing nodes from or in remote.
    '''
    start = util.timer()

    roundtrips = 0
    cl = local.changelog
    clnode = cl.node
    clrev = cl.rev

    if ancestorsof is not None:
        ownheads = [clrev(n) for n in ancestorsof]
    else:
        ownheads = [rev for rev in cl.headrevs() if rev != nullrev]

    # early exit if we know all the specified remote heads already
    ui.debug("query 1; heads\n")
    roundtrips += 1
    sample = _limitsample(ownheads, initialsamplesize)
    # indices between sample and externalized version must match
    sample = list(sample)

    with remote.commandexecutor() as e:
        fheads = e.callcommand('heads', {})
        fknown = e.callcommand('known', {
            'nodes': [clnode(r) for r in sample],
        })

    srvheadhashes, yesno = fheads.result(), fknown.result()

    if cl.tip() == nullid:
        if srvheadhashes != [nullid]:
            return [nullid], True, srvheadhashes
        return [nullid], False, []

    # start actual discovery (we note this before the next "if" for
    # compatibility reasons)
    ui.status(_("searching for changes\n"))

    srvheads = []
    for node in srvheadhashes:
        if node == nullid:
            continue

        try:
            srvheads.append(clrev(node))
        # Catches unknown and filtered nodes.
        except error.LookupError:
            continue

    if len(srvheads) == len(srvheadhashes):
        ui.debug("all remote heads known locally\n")
        return srvheadhashes, False, srvheadhashes

    if len(sample) == len(ownheads) and all(yesno):
        ui.note(_("all local heads known remotely\n"))
        ownheadhashes = [clnode(r) for r in ownheads]
        return ownheadhashes, True, srvheadhashes

    # full blown discovery

    disco = partialdiscovery(local, ownheads)
    # treat remote heads (and maybe own heads) as a first implicit sample
    # response
    disco.addcommons(srvheads)
    disco.addinfo(zip(sample, yesno))

    full = False
    progress = ui.makeprogress(_('searching'), unit=_('queries'))
    while not disco.iscomplete():

        if full or disco.hasinfo():
            if full:
                ui.note(_("sampling from both directions\n"))
            else:
                ui.debug("taking initial sample\n")
            samplefunc = _takefullsample
            targetsize = fullsamplesize
        else:
            # use even cheaper initial sample
            ui.debug("taking quick initial sample\n")
            samplefunc = _takequicksample
            targetsize = initialsamplesize
        sample = samplefunc(local, ownheads, disco.undecided, targetsize)

        roundtrips += 1
        progress.update(roundtrips)
        ui.debug("query %i; still undecided: %i, sample size is: %i\n"
                 % (roundtrips, len(disco.undecided), len(sample)))
        # indices between sample and externalized version must match
        sample = list(sample)

        with remote.commandexecutor() as e:
            yesno = e.callcommand('known', {
                'nodes': [clnode(r) for r in sample],
            }).result()

        full = True

        disco.addinfo(zip(sample, yesno))

    result = disco.commonheads()
    elapsed = util.timer() - start
    progress.complete()
    ui.debug("%d total queries in %.4fs\n" % (roundtrips, elapsed))
    msg = ('found %d common and %d unknown server heads,'
           ' %d roundtrips in %.4fs\n')
    missing = set(result) - set(srvheads)
    ui.log('discovery', msg, len(result), len(missing), roundtrips,
           elapsed)

    if not result and srvheadhashes != [nullid]:
        if abortwhenunrelated:
            raise error.Abort(_("repository is unrelated"))
        else:
            ui.warn(_("warning: repository is unrelated\n"))
        return ({nullid}, True, srvheadhashes,)

    anyincoming = (srvheadhashes != [nullid])
    result = {clnode(r) for r in result}
    return result, anyincoming, srvheadhashes