upstream/mercurial-mirror Files · mercurial/setdiscovery.py

wireproto: add streams to frame-based protocol

Previously, the frame-based protocol was just a series of frames, with each frame associated with a request ID.

In order to scale the protocol, we'll want to enable the use of compression. While it is possible to enable compression at the socket/pipe level, this has its disadvantages. The big one is that it undermines the point of frames being standalone, atomic units that can be read and written: if you add compression above the framing protocol, you are back to having a stream-based protocol as opposed to something frame-based.

So in order to preserve frames, compression needs to occur at the frame payload level.

Compressing each frame's payload individually will limit compression ratios because the window size of the compressor will be limited by the max frame size, which is 32-64kb as currently defined. It will also add CPU overhead, as it is more efficient for compressors to operate on fewer, larger blocks of data than more, smaller blocks. So compressing each frame independently is out.

This means we need to compress each frame's payload as if it is part of a larger stream.

The simplest approach is to have 1 stream per connection. This could certainly work. However, it has disadvantages (documented below).

We could also have 1 stream per RPC/command invocation. (This is the model HTTP/2 goes with.) This also has disadvantages.

The main disadvantage of one global stream is that it has the very real potential to create CPU bottlenecks doing compression. Networks are only getting faster and the performance of single CPU cores has been relatively flat. Newer compression formats like zstandard offer better CPU cycle efficiency than predecessors like zlib. But it is still all too common to saturate your CPU with compression overhead long before you saturate the network pipe.

The main disadvantage of streams per request is that you can't reap the benefits of the compression context for multiple requests. For example, if you send 1000 RPC requests (or HTTP/2 requests for that matter), the response to each would have its own compression context. The overall size of the raw responses would be larger because compression contexts wouldn't be able to reference data from another request or response.

The approach for streams as implemented in this commit is to support N streams per connection and for streams to potentially span requests and responses. As explained by the added internals docs, this facilitates servers and clients delegating independent streams and compression to independent threads / CPU cores. This helps alleviate the CPU bottleneck of compression. This design also allows compression contexts to be reused across requests/responses. This can result in improved compression ratios and less overhead for compressors and decompressors having to build new contexts.

Another feature that was defined was the ability for individual frames within a stream to declare whether that individual frame's payload uses the content encoding (read: compression) defined by the stream. The idea here is that some servers may serve data from a combination of caches and dynamic resolution. Data coming from caches may be pre-compressed. We want to facilitate servers being able to essentially stream bytes from caches to the wire with minimal overhead. Being able to mix and match which frames are compressed within a stream enables these types of advanced server functionality.

This commit defines the new streams mechanism. Basic code for supporting streams in frames has been added. But that code is seriously lacking and doesn't fully conform to the defined protocol. For example, we don't close any streams. And support for content encoding within streams is not yet implemented. The change was rather invasive and I didn't think it would be reasonable to implement the entire feature in a single commit.

For the record, I would have loved to reuse an existing multiplexing protocol to build the new wire protocol on top of. However, I couldn't find a protocol that offers the performance and scaling characteristics that I desired. Namely, it should support multiple compression contexts to facilitate scaling out to multiple CPU cores, and compression contexts should be able to live longer than single RPC requests. HTTP/2 *almost* fits the bill. But the semantics of HTTP message exchange state that streams can only live for a single request-response. We /could/ tunnel on top of HTTP/2 streams and frames with HEADER and DATA frames. But there's no guarantee that HTTP/2 libraries and proxies would allow us to use HTTP/2 streams and frames without the HTTP message exchange semantics defined in RFC 7540 Section 8. Other RPC protocols like gRPC are built on top of HTTP/2 and thus preserve its semantics of stream per RPC invocation. Even QUIC does this. We could attempt to invent a higher-level stream that spans HTTP/2 streams. But this would be violating HTTP/2 because there is no guarantee that HTTP/2 streams are routed to the same server. The best we can do - which is what this protocol does - is shoehorn all request and response data into a single HTTP message and create streams within. At that point, we've defined a Content-Type in HTTP parlance. It just so happens our media type can also work as a standalone, stream-based protocol, without leaning on HTTP or a similar protocol.

Differential Revision: https://phab.mercurial-scm.org/D2907
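The trade-off between compressing each frame on its own and compressing frames as part of a longer-lived stream can be illustrated with Python's standard `zlib` module. This is only a sketch of the general idea, not the wire protocol's implementation: the helper names and the sample payloads are made up for illustration, and zlib stands in for whatever content encoding a stream would declare.

```python
import zlib

def compress_independently(payloads):
    """Compress every frame payload with its own fresh compression context."""
    return [zlib.compress(p) for p in payloads]

def compress_as_stream(payloads):
    """Compress frame payloads through one shared context (one "stream").

    Z_SYNC_FLUSH ends each frame on a byte boundary so a receiver can decode
    it on arrival, while the compressor keeps its history for later frames.
    """
    compressor = zlib.compressobj()
    return [compressor.compress(p) + compressor.flush(zlib.Z_SYNC_FLUSH)
            for p in payloads]

def decompress_stream(frames):
    """Decode frames one at a time with a single shared decompression context."""
    decompressor = zlib.decompressobj()
    return [decompressor.decompress(f) for f in frames]

# Hypothetical, highly similar response payloads (illustrative data only).
payloads = [b"changeset %d: user=alice branch=default files=a.py,b.py\n" % i
            for i in range(1000)]

independent = compress_independently(payloads)
streamed = compress_as_stream(payloads)
independent_size = sum(len(f) for f in independent)
streamed_size = sum(len(f) for f in streamed)
roundtrip = decompress_stream(streamed)

print("independent frames:", independent_size, "bytes")
print("shared stream:     ", streamed_size, "bytes")
```

Each frame of the shared stream is still readable as soon as it arrives, but later frames can reference data from earlier ones, which is why the shared context produces less output for similar payloads.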

File last commit: r36735:59802fa5 default (Martin von Zweigbergk)

r37304:9bfcbe4f default

setdiscovery.py | 262 lines | 9.2 KiB | text/x-python

# setdiscovery.py - improved discovery of common nodeset for mercurial
#
# Copyright 2010 Benoit Boissinot <bboissin@gmail.com>
# and Peter Arrenbrecht <peter@arrenbrecht.ch>
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.
"""
Algorithm works in the following way. You have two repositories: local and
remote. They both contain a DAG of changelists.

The goal of the discovery protocol is to find one set of nodes, *common*,
the set of nodes shared by local and remote.

One of the issues with the original protocol was latency: it could
potentially require lots of roundtrips to discover that the local repo was a
subset of remote (which is a very common case, you usually have few changes
compared to upstream, while upstream probably had lots of development).

The new protocol only requires one interface for the remote repo: `known()`,
which given a set of changelists tells you if they are present in the DAG.

The algorithm then works as follows:

 - We will be using three sets, `common`, `missing`, `unknown`. Originally
 all nodes are in `unknown`.
 - Take a sample from `unknown`, call `remote.known(sample)`
   - For each node that remote knows, move it and all its ancestors to `common`
   - For each node that remote doesn't know, move it and all its descendants
   to `missing`
 - Iterate until `unknown` is empty

There are a couple of optimizations. The first is, instead of starting with a
random sample of `unknown`, to start by sending all heads: in the case where
the local repo is a subset, you have computed the answer in one round trip.

Then you can do something similar to the bisecting strategy used when
finding faulty changesets. Instead of random samples, you can try picking
nodes that will maximize the number of nodes that will be
classified with it (since all ancestors or descendants will be marked as well).
"""

from __future__ import absolute_import

import collections
import random

from .i18n import _
from .node import (
    nullid,
    nullrev,
)
from . import (
    dagutil,
    error,
    util,
)

def _updatesample(dag, nodes, sample, quicksamplesize=0):
    """update an existing sample to match the expected size

    The sample is updated with nodes exponentially distant from each head of the
    <nodes> set. (H~1, H~2, H~4, H~8, etc).

    If a target size is specified, the sampling will stop once this size is
    reached. Otherwise sampling will happen until roots of the <nodes> set are
    reached.

    :dag: a dag object from dagutil
    :nodes:  set of nodes we want to discover (if None, assume the whole dag)
    :sample: a sample to update
    :quicksamplesize: optional target size of the sample"""
    # if nodes is empty we scan the entire graph
    if nodes:
        heads = dag.headsetofconnecteds(nodes)
    else:
        heads = dag.heads()
    dist = {}
    visit = collections.deque(heads)
    seen = set()
    factor = 1
    while visit:
        curr = visit.popleft()
        if curr in seen:
            continue
        d = dist.setdefault(curr, 1)
        if d > factor:
            factor *= 2
        if d == factor:
            sample.add(curr)
            if quicksamplesize and (len(sample) >= quicksamplesize):
                return
        seen.add(curr)
        for p in dag.parents(curr):
            if not nodes or p in nodes:
                dist.setdefault(p, d + 1)
                visit.append(p)
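
To see which nodes the exponential walk in `_updatesample` picks, here is a minimal standalone sketch of the same breadth-first traversal. It assumes a plain dict mapping each node to a list of its parents instead of a `dagutil` dag object, and the names `exponential_sample`, `maxsize` and `chain` are hypothetical. As in the function above, distances start at 1 for the heads themselves.

```python
import collections

def exponential_sample(parents, heads, maxsize=0):
    """Walk from `heads` toward the roots, keeping nodes at distance 1, 2, 4, ...

    `parents` maps each node to a list of its parents (a stand-in for the dag
    object). If `maxsize` is non-zero, stop once the sample reaches that size.
    """
    sample = set()
    dist = {}
    visit = collections.deque(heads)
    seen = set()
    factor = 1
    while visit:
        curr = visit.popleft()
        if curr in seen:
            continue
        d = dist.setdefault(curr, 1)
        if d > factor:
            factor *= 2
        if d == factor:
            sample.add(curr)
            if maxsize and len(sample) >= maxsize:
                return sample
        seen.add(curr)
        for p in parents[curr]:
            dist.setdefault(p, d + 1)
            visit.append(p)
    return sample

# A linear history 0 <- 1 <- ... <- 19 with a single head, 19.
chain = {0: []}
chain.update({n: [n - 1] for n in range(1, 20)})
print(sorted(exponential_sample(chain, [19])))  # [4, 12, 16, 18, 19]
```

On a 20-node linear history the walk keeps only five nodes, densely spaced near the head and sparsely toward the root.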

def _takequicksample(dag, nodes, size):
    """takes a quick sample of size <size>

    It is meant for initial sampling and focuses on querying heads and close
    ancestors of heads.

    :dag: a dag object
    :nodes: set of nodes to discover
    :size: the maximum size of the sample"""
    sample = dag.headsetofconnecteds(nodes)
    if len(sample) >= size:
        return _limitsample(sample, size)
    _updatesample(dag, None, sample, quicksamplesize=size)
    return sample

def _takefullsample(dag, nodes, size):
    sample = dag.headsetofconnecteds(nodes)
    # update from heads
    _updatesample(dag, nodes, sample)
    # update from roots
    _updatesample(dag.inverse(), nodes, sample)
    assert sample
    sample = _limitsample(sample, size)
    if len(sample) < size:
        more = size - len(sample)
        sample.update(random.sample(list(nodes - sample), more))
    return sample

def _limitsample(sample, desiredlen):
    """return a random subset of sample of at most desiredlen items"""
    if len(sample) > desiredlen:
        sample = set(random.sample(sample, desiredlen))
    return sample

def findcommonheads(ui, local, remote,
                    initialsamplesize=100,
                    fullsamplesize=200,
                    abortwhenunrelated=True,
                    ancestorsof=None):
    '''Return a tuple (common, anyincoming, remoteheads) used to identify
    missing nodes from or in remote.
    '''
    start = util.timer()

    roundtrips = 0
    cl = local.changelog
    localsubset = None
    if ancestorsof is not None:
        rev = local.changelog.rev
        localsubset = [rev(n) for n in ancestorsof]
    dag = dagutil.revlogdag(cl, localsubset=localsubset)

    # early exit if we know all the specified remote heads already
    ui.debug("query 1; heads\n")
    roundtrips += 1
    ownheads = dag.heads()
    sample = _limitsample(ownheads, initialsamplesize)
    # indices between sample and externalized version must match
    sample = list(sample)
    batch = remote.iterbatch()
    batch.heads()
    batch.known(dag.externalizeall(sample))
    batch.submit()
    srvheadhashes, yesno = batch.results()

    if cl.tip() == nullid:
        if srvheadhashes != [nullid]:
            return [nullid], True, srvheadhashes
        return [nullid], False, []

    # start actual discovery (we note this before the next "if" for
    # compatibility reasons)
    ui.status(_("searching for changes\n"))

    srvheads = dag.internalizeall(srvheadhashes, filterunknown=True)
    if len(srvheads) == len(srvheadhashes):
        ui.debug("all remote heads known locally\n")
        return (srvheadhashes, False, srvheadhashes,)

    if len(sample) == len(ownheads) and all(yesno):
        ui.note(_("all local heads known remotely\n"))
        ownheadhashes = dag.externalizeall(ownheads)
        return (ownheadhashes, True, srvheadhashes,)

    # full blown discovery

    # own nodes I know we both know
    # treat remote heads (and maybe own heads) as a first implicit sample
    # response
    common = cl.incrementalmissingrevs(srvheads)
    commoninsample = set(n for i, n in enumerate(sample) if yesno[i])
    common.addbases(commoninsample)
    # own nodes where I don't know if remote knows them
    undecided = set(common.missingancestors(ownheads))
    # own nodes I know remote lacks
    missing = set()

    full = False
    while undecided:

        if sample:
            missinginsample = [n for i, n in enumerate(sample) if not yesno[i]]
            missing.update(dag.descendantset(missinginsample, missing))

            undecided.difference_update(missing)

        if not undecided:
            break

        if full or common.hasbases():
            if full:
                ui.note(_("sampling from both directions\n"))
            else:
                ui.debug("taking initial sample\n")
            samplefunc = _takefullsample
            targetsize = fullsamplesize
        else:
            # use even cheaper initial sample
            ui.debug("taking quick initial sample\n")
            samplefunc = _takequicksample
            targetsize = initialsamplesize
        if len(undecided) < targetsize:
            sample = list(undecided)
        else:
            sample = samplefunc(dag, undecided, targetsize)

        roundtrips += 1
        ui.progress(_('searching'), roundtrips, unit=_('queries'))
        ui.debug("query %i; still undecided: %i, sample size is: %i\n"
                 % (roundtrips, len(undecided), len(sample)))
        # indices between sample and externalized version must match
        sample = list(sample)
        yesno = remote.known(dag.externalizeall(sample))
        full = True

        if sample:
            commoninsample = set(n for i, n in enumerate(sample) if yesno[i])
            common.addbases(commoninsample)
            common.removeancestorsfrom(undecided)

    # heads(common) == heads(common.bases) since common represents common.bases
    # and all its ancestors
    result = dag.headsetofconnecteds(common.bases)
    # common.bases can include nullrev, but our contract requires us to not
    # return any heads in that case, so discard that
    result.discard(nullrev)
    elapsed = util.timer() - start
    ui.progress(_('searching'), None)
    ui.debug("%d total queries in %.4fs\n" % (roundtrips, elapsed))
    msg = ('found %d common and %d unknown server heads,'
           ' %d roundtrips in %.4fs\n')
    missing = set(result) - set(srvheads)
    ui.log('discovery', msg, len(result), len(missing), roundtrips,
           elapsed)

    if not result and srvheadhashes != [nullid]:
        if abortwhenunrelated:
            raise error.Abort(_("repository is unrelated"))
        else:
            ui.warn(_("warning: repository is unrelated\n"))
        return ({nullid}, True, srvheadhashes,)

    anyincoming = (srvheadhashes != [nullid])
    return dag.externalizeall(result), anyincoming, srvheadhashes
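
The three-set loop described in the module docstring can be tried out without a Mercurial repository. The following is a toy sketch under simplifying assumptions: the local DAG is a plain dict mapping each node to a list of its parents, the remote side is reduced to a `known(sample)` callable returning one boolean per sampled node, and follow-up samples are drawn at random rather than with the head/root strategies used above. All names in it (`discover`, `ancestors`, `descendants`, `samplesize`, `rng`) are hypothetical.

```python
import random

def ancestors(parents, nodes):
    """Return `nodes` plus every ancestor reachable through `parents`."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents[n])
    return seen

def descendants(parents, nodes):
    """Return `nodes` plus every local node descending from one of them."""
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(children[n])
    return seen

def discover(parents, known, samplesize=2, rng=random):
    """Classify every local node as common or missing using only `known()`."""
    common, missing, unknown = set(), set(), set(parents)
    roundtrips = 0
    # First optimization from the docstring: the first sample is the heads.
    haschild = {p for ps in parents.values() for p in ps}
    sample = sorted(set(parents) - haschild)
    while unknown:
        roundtrips += 1
        # Positions in `sample` and in the answer list must line up.
        for node, yes in zip(sample, known(sample)):
            if yes:
                common |= ancestors(parents, [node])
            else:
                missing |= descendants(parents, [node])
        unknown -= common | missing
        sample = rng.sample(sorted(unknown), min(samplesize, len(unknown)))
    return common, missing, roundtrips

# Local history 0 <- 1 <- ... <- 9; the remote only has 0..5.
local = {0: []}
local.update({n: [n - 1] for n in range(1, 10)})
remote = set(range(6))
common, missing, trips = discover(
    local, lambda s: [n in remote for n in s], rng=random.Random(0))
print(sorted(common), sorted(missing), trips)

# When local is a subset of remote, asking about the heads settles everything.
superset = set(range(10))
_, _, subset_trips = discover(local, lambda s: [n in superset for n in s])
print(subset_trips)  # 1
```

The second call shows the common case the docstring highlights: when the local repository is a subset of the remote one, a single query about the heads is enough.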

	Site-wide shortcuts
/	Use quick search box
g h	Goto home page
g g	Goto my private gists page
g G	Goto my public gists page
g 0-9	Goto bookmarked items from 0-9
n r	New repository page
n g	New gist page

	Repositories
g s	Goto summary page
g c	Goto changelog page
g f	Goto files page
g F	Goto files page with file search activated
g p	Goto pull requests page
g o	Goto repository settings
g O	Goto repository access permissions settings
t s	Toggle sidebar on some pages

Peter Arrenbrecht discovery: add new set-based discovery...	r14164	# setdiscovery.py - improved discovery of common nodeset for mercurial
		#
		# Copyright 2010 Benoit Boissinot <bboissin@gmail.com>
		# and Peter Arrenbrecht <peter@arrenbrecht.ch>
		#
		# This software may be used and distributed according to the terms of the
		# GNU General Public License version 2 or any later version.
Olle Lundberg setdiscovery: document algorithms used...	r20656	"""
		Algorithm works in the following way. You have two repository: local and
		remote. They both contains a DAG of changelists.

		The goal of the discovery protocol is to find one set of node common,
		the set of nodes shared by local and remote.

		One of the issue with the original protocol was latency, it could
		potentially require lots of roundtrips to discover that the local repo was a
		subset of remote (which is a very common case, you usually have few changes
		compared to upstream, while upstream probably had lots of development).

		The new protocol only requires one interface for the remote repo: `known()`,
		which given a set of changelists tells you if they are present in the DAG.

		The algorithm then works as follow:

		- We will be using three sets, `common`, `missing`, `unknown`. Originally
		all nodes are in `unknown`.
		- Take a sample from `unknown`, call `remote.known(sample)`
		- For each node that remote knows, move it and all its ancestors to `common`
		- For each node that remote doesn't know, move it and all its descendants
		to `missing`
		- Iterate until `unknown` is empty

		There are a couple optimizations, first is instead of starting with a random
		sample of missing, start by sending all heads, in the case where the local
		repo is a subset, you computed the answer in one round trip.

		Then you can do something similar to the bisecting strategy used when
		finding faulty changesets. Instead of random samples, you can try picking
		nodes that will maximize the number of nodes that will be
		classified with it (since all ancestors or descendants will be marked as well).
		"""

from __future__ import absolute_import

import collections
import random

from .i18n import _
from .node import (
    nullid,
    nullrev,
)
from . import (
    dagutil,
    error,
    util,
)

def _updatesample(dag, nodes, sample, quicksamplesize=0):
    """update an existing sample to match the expected size

    The sample is updated with nodes exponentially distant from each head of the
    <nodes> set. (H~1, H~2, H~4, H~8, etc).

    If a target size is specified, the sampling will stop once this size is
    reached. Otherwise sampling will happen until roots of the <nodes> set are
    reached.

    :dag: a dag object from dagutil
    :nodes: set of nodes we want to discover (if None, assume the whole dag)
    :sample: a sample to update
    :quicksamplesize: optional target size of the sample"""
    # if nodes is empty we scan the entire graph
    if nodes:
        heads = dag.headsetofconnecteds(nodes)
    else:
        heads = dag.heads()
    dist = {}
    visit = collections.deque(heads)
    seen = set()
    factor = 1
    while visit:
        curr = visit.popleft()
        if curr in seen:
            continue
        d = dist.setdefault(curr, 1)
        if d > factor:
            factor *= 2
        if d == factor:
            sample.add(curr)
            if quicksamplesize and (len(sample) >= quicksamplesize):
                return
        seen.add(curr)
        for p in dag.parents(curr):
            if not nodes or p in nodes:
                dist.setdefault(p, d + 1)
                visit.append(p)
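
# The exponential sampling described in the '_updatesample' docstring can be
# sketched outside Mercurial as follows. This is an illustration only: the
# 'exponential_sample' helper, the 'parents' dict and the 'limit' argument are
# hypothetical stand-ins for the dag object and 'quicksamplesize', not part of
# this module. The walk counts the head itself as distance 1, so on a linear
# history it keeps the nodes at distances 1, 2, 4, 8, 16, ... from the head.

```python
import collections


def exponential_sample(parents, heads, limit=0):
    """Walk from heads toward roots, keeping nodes at distance 1, 2, 4, ...

    parents: dict mapping node -> list of parent nodes (toy DAG, assumed)
    heads: iterable of nodes to start from
    limit: stop once the sample reaches this size (0 means no limit)
    """
    sample = set()
    dist = {}
    visit = collections.deque(heads)
    seen = set()
    factor = 1
    while visit:
        curr = visit.popleft()
        if curr in seen:
            continue
        # Heads start at distance 1; parents get the child's distance + 1.
        d = dist.setdefault(curr, 1)
        if d > factor:
            factor *= 2
        if d == factor:
            sample.add(curr)
            if limit and len(sample) >= limit:
                return sample
        seen.add(curr)
        for p in parents.get(curr, []):
            dist.setdefault(p, d + 1)
            visit.append(p)
    return sample


# Linear history 0 <- 1 <- ... <- 19, with 19 as the only head.
parents = {n: [n - 1] for n in range(1, 20)}
parents[0] = []
picked = sorted(exponential_sample(parents, [19]), reverse=True)
limited = sorted(exponential_sample(parents, [19], limit=3), reverse=True)
print(picked)
print(limited)
```

# On this 20-node linear history the unlimited walk picks 19, 18, 16, 12 and 4;
# with limit=3 it stops after 19, 18 and 16.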

def _takequicksample(dag, nodes, size):
    """takes a quick sample of size <size>

    It is meant for initial sampling and focuses on querying heads and close
    ancestors of heads.

    :dag: a dag object
    :nodes: set of nodes to discover
    :size: the maximum size of the sample"""
    sample = dag.headsetofconnecteds(nodes)
    if len(sample) >= size:
        return _limitsample(sample, size)
    _updatesample(dag, None, sample, quicksamplesize=size)
    return sample

def _takefullsample(dag, nodes, size):
    sample = dag.headsetofconnecteds(nodes)
    # update from heads
    _updatesample(dag, nodes, sample)
    # update from roots
    _updatesample(dag.inverse(), nodes, sample)
    assert sample
    sample = _limitsample(sample, size)
    if len(sample) < size:
        more = size - len(sample)
        sample.update(random.sample(list(nodes - sample), more))
    return sample

def _limitsample(sample, desiredlen):
    """return a random subset of sample of at most desiredlen items"""
    if len(sample) > desiredlen:
        # random.sample() needs a sequence; passing a set fails on Python 3.11+
        sample = set(random.sample(list(sample), desiredlen))
    return sample

def findcommonheads(ui, local, remote,
                    initialsamplesize=100,
                    fullsamplesize=200,
                    abortwhenunrelated=True,
                    ancestorsof=None):
    '''Return a tuple (common, anyincoming, remoteheads) used to identify
    missing nodes from or in remote.
    '''
    start = util.timer()

    roundtrips = 0
    cl = local.changelog
    localsubset = None
    if ancestorsof is not None:
        rev = local.changelog.rev
        localsubset = [rev(n) for n in ancestorsof]
    dag = dagutil.revlogdag(cl, localsubset=localsubset)

    # early exit if we know all the specified remote heads already
    ui.debug("query 1; heads\n")
    roundtrips += 1
    ownheads = dag.heads()
    sample = _limitsample(ownheads, initialsamplesize)
    # indices between sample and externalized version must match
    sample = list(sample)
    batch = remote.iterbatch()
    batch.heads()
    batch.known(dag.externalizeall(sample))
    batch.submit()
    srvheadhashes, yesno = batch.results()

    if cl.tip() == nullid:
        if srvheadhashes != [nullid]:
            return [nullid], True, srvheadhashes
        return [nullid], False, []

    # start actual discovery (we note this before the next "if" for
    # compatibility reasons)
    ui.status(_("searching for changes\n"))

    srvheads = dag.internalizeall(srvheadhashes, filterunknown=True)
    if len(srvheads) == len(srvheadhashes):
        ui.debug("all remote heads known locally\n")
        return (srvheadhashes, False, srvheadhashes,)

    if len(sample) == len(ownheads) and all(yesno):
        ui.note(_("all local heads known remotely\n"))
        ownheadhashes = dag.externalizeall(ownheads)
        return (ownheadhashes, True, srvheadhashes,)

    # full blown discovery

    # own nodes I know we both know
    # treat remote heads (and maybe own heads) as a first implicit sample
    # response
    common = cl.incrementalmissingrevs(srvheads)
    commoninsample = set(n for i, n in enumerate(sample) if yesno[i])
    common.addbases(commoninsample)
    # own nodes where I don't know if remote knows them
    undecided = set(common.missingancestors(ownheads))
    # own nodes I know remote lacks
    missing = set()

    full = False
    while undecided:

        if sample:
            missinginsample = [n for i, n in enumerate(sample) if not yesno[i]]
            missing.update(dag.descendantset(missinginsample, missing))

            undecided.difference_update(missing)

        if not undecided:
            break

        if full or common.hasbases():
            if full:
                ui.note(_("sampling from both directions\n"))
            else:
                ui.debug("taking initial sample\n")
            samplefunc = _takefullsample
            targetsize = fullsamplesize
        else:
            # use even cheaper initial sample
            ui.debug("taking quick initial sample\n")
            samplefunc = _takequicksample
            targetsize = initialsamplesize
        if len(undecided) < targetsize:
            sample = list(undecided)
        else:
            sample = samplefunc(dag, undecided, targetsize)

        roundtrips += 1
        ui.progress(_('searching'), roundtrips, unit=_('queries'))
        ui.debug("query %i; still undecided: %i, sample size is: %i\n"
                 % (roundtrips, len(undecided), len(sample)))
        # indices between sample and externalized version must match
        sample = list(sample)
        yesno = remote.known(dag.externalizeall(sample))
        full = True

        if sample:
            commoninsample = set(n for i, n in enumerate(sample) if yesno[i])
            common.addbases(commoninsample)
            common.removeancestorsfrom(undecided)

    # heads(common) == heads(common.bases) since common represents common.bases
    # and all its ancestors
    result = dag.headsetofconnecteds(common.bases)
    # common.bases can include nullrev, but our contract requires us to not
    # return any heads in that case, so discard that
    result.discard(nullrev)
    elapsed = util.timer() - start
    ui.progress(_('searching'), None)
    ui.debug("%d total queries in %.4fs\n" % (roundtrips, elapsed))
    msg = ('found %d common and %d unknown server heads,'
           ' %d roundtrips in %.4fs\n')
    missing = set(result) - set(srvheads)
    ui.log('discovery', msg, len(result), len(missing), roundtrips,
           elapsed)

    if not result and srvheadhashes != [nullid]:
        if abortwhenunrelated:
            raise error.Abort(_("repository is unrelated"))
        else:
            ui.warn(_("warning: repository is unrelated\n"))
        return ({nullid}, True, srvheadhashes,)

    anyincoming = (srvheadhashes != [nullid])
    return dag.externalizeall(result), anyincoming, srvheadhashes
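
# The query loop in 'findcommonheads' narrows an 'undecided' set by sampling
# it, asking the remote which sampled nodes it knows, and propagating each
# answer: ancestors of a known node are common, descendants of an unknown node
# are missing. The toy below sketches that idea only. It assumes a plain
# 'parents' dict instead of a revlog dag, a 'remote_known' set standing in for
# the 'remote.known()' call, and uniform random sampling in place of the
# head-biased sampling above; every name in it is hypothetical.

```python
import random


def ancestors(parents, nodes):
    """Return the given nodes plus all of their ancestors."""
    result, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in result:
            result.add(n)
            stack.extend(parents[n])
    return result


def descendants(parents, nodes):
    """Return the given nodes plus all of their descendants."""
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)
    result, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in result:
            result.add(n)
            stack.extend(children[n])
    return result


def discover(parents, remote_known, samplesize=3, seed=0):
    """Classify every local node as common or missing; count the queries."""
    rng = random.Random(seed)
    undecided = set(parents)
    common, missing = set(), set()
    roundtrips = 0
    while undecided:
        size = min(samplesize, len(undecided))
        sample = rng.sample(sorted(undecided), size)
        # One simulated round trip: which sampled nodes does the remote know?
        yesno = [n in remote_known for n in sample]
        roundtrips += 1
        known = [n for n, yes in zip(sample, yesno) if yes]
        unknown = [n for n, yes in zip(sample, yesno) if not yes]
        common |= ancestors(parents, known)
        missing |= descendants(parents, unknown)
        undecided -= common | missing
    return common, missing, roundtrips


# Local linear history 0..29; the remote only has 0..11.
parents = {n: [n - 1] for n in range(1, 30)}
parents[0] = []
common, missing, roundtrips = discover(parents, set(range(12)))
print(sorted(common)[-1], sorted(missing)[0], roundtrips)
```

# The sketch relies on the remote's known set being closed under ancestors, as
# it is for a real repository; under that assumption every node ends up in
# exactly one of 'common' or 'missing'.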