upstream/mercurial-mirror Files · mercurial/setdiscovery.py

rebase: use "rebase.collapse" as "editform" for "--collapse" always. Before this patch, if both "--message" and "--collapse" are specified for "hg rebase", "rebase.normal" is unexpectedly used as "editform". Unlike the patches before and after it in this series, which are improvements, this is a bug fix patch.

Olle Lundberg

File last commit: r20656:cdecbc5a default

r22206:6122ad50 default

setdiscovery.py | 232 lines | 8.3 KiB | text/x-python | PythonLexer

        Peter Arrenbrecht
    
discovery: add new set-based discovery...

              r14164
            
      # setdiscovery.py - improved discovery of common nodeset for mercurial

      #

      # Copyright 2010 Benoit Boissinot <bboissin@gmail.com>

      # and Peter Arrenbrecht <peter@arrenbrecht.ch>

      #

      # This software may be used and distributed according to the terms of the

      # GNU General Public License version 2 or any later version.

        Olle Lundberg
    
setdiscovery: document algorithms used...

              r20656
            
      """

      The algorithm works as follows. You have two repositories: local and

      remote. Both contain a DAG of changelists.

      The goal of the discovery protocol is to find one set of nodes, *common*:

      the set of nodes shared by local and remote.

      One of the issues with the original protocol was latency: it could

      potentially require many round trips to discover that the local repo was a

      subset of remote (a very common case: you usually have few changes

      compared to upstream, while upstream probably had lots of development).

      The new protocol only requires one interface for the remote repo: `known()`,

      which, given a set of changelists, tells you which of them are present in

      the DAG.

      The algorithm then works as follows:

       - We will be using three sets: `common`, `missing` and `unknown` (named

         `undecided` in the code below). Originally all nodes are in `unknown`.

       - Take a sample from `unknown` and call `remote.known(sample)`:

         - For each node that remote knows, move it and all its ancestors to

           `common`.

         - For each node that remote doesn't know, move it and all its descendants

           to `missing`.

       - Iterate until `unknown` is empty.

      There are a couple of optimizations. First, instead of starting with a

      random sample of `unknown`, start by sending all heads: in the case where

      the local repo is a subset of remote, you have computed the answer in one

      round trip.

      Second, you can do something similar to the bisecting strategy used when

      finding faulty changesets. Instead of random samples, you can try picking

      nodes that maximize the number of nodes classified along with them (since

      all their ancestors or descendants will be marked as well).

      """
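
The steps in the docstring above can be sketched on a toy DAG. This is a minimal illustration, not the module's implementation: `discover_common`, the `parents` dict, the `remote_known` callable and `sample_size` are all hypothetical names, and the later samples are simply the smallest undecided nodes rather than random or bisection-style picks.

```python
def discover_common(parents, sample_size, remote_known):
    """Classify every local node as common or missing.

    parents: dict mapping node -> list of parent nodes (the local DAG).
    remote_known: callable taking a list of nodes and returning a list of
    booleans, standing in for the remote's known() call (an assumption).
    """
    # Build the reverse edges so descendants can be walked too.
    children = {n: [] for n in parents}
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)

    def closure(start, edges):
        # Return the start nodes plus everything reachable along edges.
        seen, stack = set(), list(start)
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(edges[n])
        return seen

    common, missing, unknown = set(), set(), set(parents)
    roundtrips = 0
    while unknown:
        if roundtrips == 0:
            # First optimization: start by sending all heads.
            sample = [n for n in sorted(unknown) if not children[n]]
        else:
            # Simplification: a deterministic sample of undecided nodes.
            sample = sorted(unknown)[:sample_size]
        answers = remote_known(sample)
        roundtrips += 1
        yes = [n for n, known in zip(sample, answers) if known]
        no = [n for n, known in zip(sample, answers) if not known]
        common |= closure(yes, parents)     # node and all its ancestors
        missing |= closure(no, children)    # node and all its descendants
        unknown -= common | missing
    return common, missing, roundtrips


# Linear history 0 <- 1 <- ... <- 5; the remote only has 0..3.
chain = {0: [], 1: [0], 2: [1], 3: [2], 4: [3], 5: [4]}
remote_nodes = {0, 1, 2, 3}
common, missing, trips = discover_common(
    chain, 2, lambda sample: [n in remote_nodes for n in sample])
```

When the remote already has every local node, the heads-first round classifies everything at once and the loop ends after a single query.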

        Peter Arrenbrecht
    
discovery: add new set-based discovery...

              r14164
            
      from node import nullid

      from i18n import _

        Augie Fackler
    
cleanup: move stdlib imports to their own import statement...

              r20034
            
      import random

      import util, dagutil

        Peter Arrenbrecht
    
discovery: add new set-based discovery...

              r14164
            
      def _updatesample(dag, nodes, sample, always, quicksamplesize=0):

          # if nodes is empty we scan the entire graph

          if nodes:

              heads = dag.headsetofconnecteds(nodes)

          else:

              heads = dag.heads()

          dist = {}

        Bryan O'Sullivan
    
util: subclass deque for Python 2.4 backwards compatibility...

              r16834
            
          visit = util.deque(heads)

        Peter Arrenbrecht
    
discovery: add new set-based discovery...

              r14164
            
          seen = set()

          factor = 1

          while visit:

              curr = visit.popleft()

              if curr in seen:

                  continue

              d = dist.setdefault(curr, 1)

              if d > factor:

                  factor *= 2

              if d == factor:

                  if curr not in always: # need this check for the early exit below

                      sample.add(curr)

                      if quicksamplesize and (len(sample) >= quicksamplesize):

                          return

              seen.add(curr)

              for p in dag.parents(curr):

                  if not nodes or p in nodes:

                      dist.setdefault(p, d + 1)

                      visit.append(p)
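
The `factor` logic in `_updatesample` picks nodes whose distance from the heads is 1, 2, 4, 8 and so on. A stripped-down sketch under stated assumptions: the DAG is a plain `parents` dict, the function name `exponential_sample` is hypothetical, and the `nodes` restriction, the `always` set and the `quicksamplesize` early exit are all left out.

```python
from collections import deque


def exponential_sample(parents, heads):
    """Return nodes at distances 1, 2, 4, 8, ... from the given heads."""
    dist = {}
    seen = set()
    sample = set()
    factor = 1
    visit = deque(heads)
    while visit:
        curr = visit.popleft()
        if curr in seen:
            continue
        # Heads get distance 1; parents were assigned d + 1 when queued.
        d = dist.setdefault(curr, 1)
        if d > factor:
            factor *= 2
        if d == factor:
            sample.add(curr)
        seen.add(curr)
        for p in parents[curr]:
            dist.setdefault(p, d + 1)
            visit.append(p)
    return sample


# Linear history 0 <- 1 <- ... <- 9 with a single head, node 9.
chain = {n: ([n - 1] if n else []) for n in range(10)}
picked = exponential_sample(chain, [9])
```

On this chain the picked nodes are 9, 8, 6 and 2, at distances 1, 2, 4 and 8 from the head, so the sample is dense near the heads and sparse deep in history.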

      def _setupsample(dag, nodes, size):

          if len(nodes) <= size:

              return set(nodes), None, 0

        Peter Arrenbrecht
    
setdiscovery: fix hang when #heads>200 (issue2971)...

              r15063
            
          always = dag.headsetofconnecteds(nodes)

        Peter Arrenbrecht
    
discovery: add new set-based discovery...

              r14164
            
          desiredlen = size - len(always)

          if desiredlen <= 0:

              # This could be bad if there are very many heads, all unknown to the

              # server. We're counting on long request support here.

              return always, None, desiredlen

          return always, set(), desiredlen

      def _takequicksample(dag, nodes, size, initial):

          always, sample, desiredlen = _setupsample(dag, nodes, size)

          if sample is None:

              return always

          if initial:

              fromset = None

          else:

              fromset = nodes

          _updatesample(dag, fromset, sample, always, quicksamplesize=desiredlen)

          sample.update(always)

          return sample

      def _takefullsample(dag, nodes, size):

          always, sample, desiredlen = _setupsample(dag, nodes, size)

          if sample is None:

              return always

          # update from heads

          _updatesample(dag, nodes, sample, always)

          # update from roots

          _updatesample(dag.inverse(), nodes, sample, always)

          assert sample

          if len(sample) > desiredlen:

              sample = set(random.sample(sample, desiredlen))

          elif len(sample) < desiredlen:

              more = desiredlen - len(sample)

              sample.update(random.sample(list(nodes - sample - always), more))

          sample.update(always)

          return sample
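
The trimming and padding step at the end of `_takefullsample` can be sketched in isolation. This is an illustration under assumptions, not the module's code: `fit_sample` and its parameters are hypothetical names, nodes are assumed to be sortable, and the population is converted to a list because `random.sample` no longer accepts sets on Python 3.11 and later. Unlike the original, the sketch also caps the padding at the number of candidates available.

```python
import random


def fit_sample(sample, candidates, always, desiredlen, rng=random):
    """Trim or pad sample to desiredlen nodes, then add the always set."""
    sample = set(sample)
    if len(sample) > desiredlen:
        # Too many nodes: keep a random subset of the desired size.
        sample = set(rng.sample(sorted(sample), desiredlen))
    elif len(sample) < desiredlen:
        # Too few nodes: pad with random candidates not already chosen.
        pool = sorted(set(candidates) - sample - set(always))
        more = min(desiredlen - len(sample), len(pool))
        sample.update(rng.sample(pool, more))
    sample.update(always)
    return sample


trimmed = fit_sample({1, 2, 3, 4, 5}, range(10), {9}, 3)
padded = fit_sample({1}, range(10), {9}, 4)
```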

      def findcommonheads(ui, local, remote,

                          initialsamplesize=100,

                          fullsamplesize=200,

                          abortwhenunrelated=True):

        Steven Brown
    
setdiscovery: limit lines to 80 characters

              r14206
            
          '''Return a tuple (common, anyincoming, remoteheads) used to identify

          nodes that are missing from the remote or present only in the remote.

        Peter Arrenbrecht
    
discovery: add new set-based discovery...

              r14164
            
          '''

          roundtrips = 0

          cl = local.changelog

          dag = dagutil.revlogdag(cl)

        Peter Arrenbrecht
    
setdiscovery: batch heads and known(ownheads)...

              r14624
            
          # early exit if we know all the specified remote heads already

        Peter Arrenbrecht
    
discovery: add new set-based discovery...

              r14164
            
          ui.debug("query 1; heads\n")

          roundtrips += 1

        Peter Arrenbrecht
    
setdiscovery: batch heads and known(ownheads)...

              r14624
            
          ownheads = dag.heads()

          sample = ownheads

          if remote.local():

              # stopgap until we have a proper localpeer that supports batch()

        Pierre-Yves David
    
localpeer: return only visible heads and branchmap...

              r17204
            
              srvheadhashes = remote.heads()

        Peter Arrenbrecht
    
setdiscovery: batch heads and known(ownheads)...

              r14624
            
              yesno = remote.known(dag.externalizeall(sample))

          elif remote.capable('batch'):

              batch = remote.batch()

              srvheadhashesref = batch.heads()

              yesnoref = batch.known(dag.externalizeall(sample))

              batch.submit()

              srvheadhashes = srvheadhashesref.value

              yesno = yesnoref.value

          else:

        Mads Kiilerich
    
fix trivial spelling errors

              r17424
            
              # compatibility with pre-batch, but post-known remotes during 1.9

              # development

        Peter Arrenbrecht
    
setdiscovery: batch heads and known(ownheads)...

              r14624
            
              srvheadhashes = remote.heads()

              sample = []

        Peter Arrenbrecht
    
discovery: add new set-based discovery...

              r14164
            
          if cl.tip() == nullid:

              if srvheadhashes != [nullid]:

                  return [nullid], True, srvheadhashes

              return [nullid], False, []

        Steven Brown
    
setdiscovery: limit lines to 80 characters

              r14206
            
          # start actual discovery (we note this before the next "if" for

          # compatibility reasons)

        Peter Arrenbrecht
    
discovery: add new set-based discovery...

              r14164
            
          ui.status(_("searching for changes\n"))

          srvheads = dag.internalizeall(srvheadhashes, filterunknown=True)

          if len(srvheads) == len(srvheadhashes):

        Matt Mackall
    
discovery: quiet note about heads...

              r14833
            
              ui.debug("all remote heads known locally\n")

        Peter Arrenbrecht
    
discovery: add new set-based discovery...

              r14164
            
              return (srvheadhashes, False, srvheadhashes,)

        Peter Arrenbrecht
    
setdiscovery: batch heads and known(ownheads)...

              r14624
            
          if sample and util.all(yesno):

        Mads Kiilerich
    
add missing localization markup

              r15497
            
              ui.note(_("all local heads known remotely\n"))

        Peter Arrenbrecht
    
setdiscovery: batch heads and known(ownheads)...

              r14624
            
              ownheadhashes = dag.externalizeall(ownheads)

              return (ownheadhashes, True, srvheadhashes,)

        Peter Arrenbrecht
    
discovery: add new set-based discovery...

              r14164
            
          # full blown discovery

        Brodie Rao
    
cleanup: eradicate long lines

              r16683
            
          # own nodes where I don't know if remote knows them

          undecided = dag.nodeset()

          # own nodes I know we both know

          common = set()

          # own nodes I know remote lacks

          missing = set()

          # treat remote heads (and maybe own heads) as a first implicit sample

          # response

        Peter Arrenbrecht
    
discovery: add new set-based discovery...

              r14164
            
          common.update(dag.ancestorset(srvheads))

          undecided.difference_update(common)

        Peter Arrenbrecht
    
setdiscovery: batch heads and known(ownheads)...

              r14624
            
          full = False

          while undecided:

        Peter Arrenbrecht
    
discovery: add new set-based discovery...

              r14164
            
        Peter Arrenbrecht
    
setdiscovery: batch heads and known(ownheads)...

              r14624
            
              if sample:

                  commoninsample = set(n for i, n in enumerate(sample) if yesno[i])

                  common.update(dag.ancestorset(commoninsample, common))

        Peter Arrenbrecht
    
discovery: add new set-based discovery...

              r14164
            
        Peter Arrenbrecht
    
setdiscovery: batch heads and known(ownheads)...

              r14624
            
                  missinginsample = [n for i, n in enumerate(sample) if not yesno[i]]

                  missing.update(dag.descendantset(missinginsample, missing))

        Peter Arrenbrecht
    
discovery: add new set-based discovery...

              r14164
            
        Peter Arrenbrecht
    
setdiscovery: batch heads and known(ownheads)...

              r14624
            
                  undecided.difference_update(missing)

                  undecided.difference_update(common)

        Peter Arrenbrecht
    
discovery: add new set-based discovery...

              r14164
            
              if not undecided:

                  break

        Peter Arrenbrecht
    
setdiscovery: batch heads and known(ownheads)...

              r14624
            
              if full:

        Mads Kiilerich
    
add missing localization markup

              r15497
            
                  ui.note(_("sampling from both directions\n"))

        Peter Arrenbrecht
    
setdiscovery: batch heads and known(ownheads)...

              r14624
            
                  sample = _takefullsample(dag, undecided, size=fullsamplesize)

              elif common:

                  # use cheapish initial sample

                  ui.debug("taking initial sample\n")

                  sample = _takefullsample(dag, undecided, size=fullsamplesize)

              else:

                  # use even cheaper initial sample

                  ui.debug("taking quick initial sample\n")

                  sample = _takequicksample(dag, undecided, size=initialsamplesize,

                                            initial=True)

        Peter Arrenbrecht
    
discovery: add new set-based discovery...

              r14164
            
              roundtrips += 1

              ui.progress(_('searching'), roundtrips, unit=_('queries'))

              ui.debug("query %i; still undecided: %i, sample size is: %i\n"

                       % (roundtrips, len(undecided), len(sample)))

              # indices between sample and externalized version must match

              sample = list(sample)

              yesno = remote.known(dag.externalizeall(sample))

        Peter Arrenbrecht
    
setdiscovery: batch heads and known(ownheads)...

              r14624
            
              full = True

        Peter Arrenbrecht
    
discovery: add new set-based discovery...

              r14164
            
          result = dag.headsetofconnecteds(common)

          ui.progress(_('searching'), None)

          ui.debug("%d total queries\n" % roundtrips)

          if not result and srvheadhashes != [nullid]:

              if abortwhenunrelated:

                  raise util.Abort(_("repository is unrelated"))

              else:

                  ui.warn(_("warning: repository is unrelated\n"))

              return (set([nullid]), True, srvheadhashes,)

        Andrew Pritchard
    
setdiscovery: return anyincoming=False when remote's only head is nullid...

              r14981
            
          anyincoming = (srvheadhashes != [nullid])

          return dag.externalizeall(result), anyincoming, srvheadhashes

