# pvec.py - probabilistic vector clocks for Mercurial
#
# Copyright 2012 Olivia Mackall <olivia@selenic.com>
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.

'''
A "pvec" is a changeset property based on the theory of vector clocks
that can be compared to discover relatedness without consulting a
graph. This can be useful for tasks like determining how a
disconnected patch relates to a repository.

Currently a pvec consists of 448 bits, of which 24 are 'depth' and the
remainder are a bit vector. It is represented as a 70-character base85
string.

Construction:

- a root changeset has a depth of 0 and a bit vector based on its hash
- a normal commit has a depth increased by one from its parent and one
  bit vector bit flipped based on its hash
- a merge changeset pvec is constructed by copying changes from one pvec into
  the other to balance its depth

Properties:

- for linear changes, the hamming distance is always <= the difference in depth
- otherwise, changes are probably divergent
- when hamming distance is < 200, we can reliably detect when pvecs are near

Issues:

- hamming distance ceases to work over distances of ~ 200
- detecting divergence is less accurate when the common ancestor is very close
  to either revision or the total distance is high
- this could probably be improved by modeling the relation between
  delta and hdist

Uses:

- a patch pvec can be used to locate the nearest available common ancestor for
  resolving conflicts
- ordering of patches can be established without a DAG
- two head pvecs can be compared to determine whether push/pull/merge is needed
  and approximately how many changesets are involved
- can be used to find a heuristic divergence measure between changesets on
  different branches
'''

from __future__ import annotations

from .node import nullrev
from . import (
    pycompat,
    util,
)

_size = 448  # 70 chars b85-encoded
_bytes = _size // 8
_depthbits = 24
_depthbytes = _depthbits // 8
_vecbytes = _bytes - _depthbytes
_vecbits = _vecbytes * 8
_radius = (_vecbits - 30) // 2  # high probability vectors are related
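
# A quick sanity check of the sizes above, worked out here for reference:
# 448 bits is 56 bytes, and base85 encodes every 4 bytes as 5 ASCII
# characters, so 56 bytes become 56 // 4 * 5 == 70 characters, matching the
# "70 chars b85-encoded" comment on _size. The vector itself is 56 - 3 == 53
# bytes == 424 bits, making _radius == (424 - 30) // 2 == 197, in line with
# the ~200-bit hamming distance limit mentioned in the module docstring.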


def _bin(bs):
    '''convert a bytestring to an integer'''
    v = 0
    for b in bs:
        # iterating a bytes object yields integers directly
        v = v * 256 + b
    return v


def _str(v: int, l: int) -> bytes:
    '''encode an integer as a big-endian bytestring of length l'''
    bs = b""
    for p in range(l):
        bs = pycompat.bytechr(v & 255) + bs
        v >>= 8
    return bs


def _split(b):
    '''depth and bitvec'''
    return _bin(b[:_depthbytes]), _bin(b[_depthbytes:])


def _join(depth, bitvec):
    return _str(depth, _depthbytes) + _str(bitvec, _vecbytes)
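
# Round-trip property: _split(_join(d, v)) == (d, v), provided d fits in
# _depthbits and v fits in _vecbits.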


def _hweight(x):
    '''count the set bits (hamming weight) of an integer'''
    c = 0
    while x:
        if x & 1:
            c += 1
        x >>= 1
    return c


# precomputed hamming weight of every possible byte value
_htab = [_hweight(x) for x in range(256)]
|
r16249 | def _hamming(a, b): | ||
'''find the hamming distance between two longs''' | ||||
d = a ^ b | ||||
c = 0 | ||||
while d: | ||||
Augie Fackler
|
r43346 | c += _htab[d & 0xFF] | ||
Matt Mackall
|
r16249 | d >>= 8 | ||
return c | ||||
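
# Note: on Python 3.10 and newer, (a ^ b).bit_count() yields the same result;
# the byte-wise table lookup above is the portable spelling.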


def _mergevec(x, y, c):
    # Ideally, this function would be x ^ y ^ ancestor, but finding
    # ancestors is a nuisance. So instead we find the minimal number
    # of changes to balance the depth and hamming distance
    d1, v1 = x
    d2, v2 = y
    if d1 < d2:
        d1, d2, v1, v2 = d2, d1, v2, v1
    hdist = _hamming(v1, v2)
    ddist = d1 - d2
    v = v1
    m = v1 ^ v2  # mask of different bits

    i = 1
    if hdist > ddist:
        # if delta = 10 and hdist = 100, then we need to go up 55 steps
        # to the ancestor and down 45
        changes = (hdist - ddist + 1) // 2
    else:
        # must make at least one change
        changes = 1
    depth = d1 + changes

    # copy changes from v2
    if m:
        while changes:
            if m & i:
                v ^= i
                changes -= 1
            i <<= 1
    else:
        v = _flipbit(v, c)

    return depth, v
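
# Worked example of the balancing above: if d1 - d2 == 10 and the vectors
# differ in 100 bits, the implied ancestor lies roughly 55 steps above the
# deeper side and 45 above the shallower one, so changes == (100 - 10 + 1) // 2
# == 45 differing bits are copied from v2 into v1 and the merge gets depth
# d1 + 45.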


def _flipbit(v, node):
    # converting bit strings to longs is slow
    bit = (hash(node) & 0xFFFFFFFF) % _vecbits
    return v ^ (1 << bit)
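
# Note: hash() of a bytes object is salted per interpreter process unless
# PYTHONHASHSEED is fixed, so the bit chosen for a given node is only stable
# within a single run.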


def ctxpvec(ctx):
    '''construct a pvec for ctx while filling in the cache'''
    r = ctx.repo()
    if not hasattr(r, "_pveccache"):
        r._pveccache = {}
    pvc = r._pveccache
    if ctx.rev() not in pvc:
        cl = r.changelog
        for n in range(ctx.rev() + 1):
            if n not in pvc:
                node = cl.node(n)
                p1, p2 = cl.parentrevs(n)
                if p1 == nullrev:
                    # start with a 'random' vector at root
                    pvc[n] = (0, _bin((node * 3)[:_vecbytes]))
                elif p2 == nullrev:
                    d, v = pvc[p1]
                    pvc[n] = (d + 1, _flipbit(v, node))
                else:
                    pvc[n] = _mergevec(pvc[p1], pvc[p2], node)
    bs = _join(*pvc[ctx.rev()])
    return pvec(util.b85encode(bs))
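
# A minimal usage sketch (hypothetical repo object and revision numbers,
# shown for illustration only):
#
#     a = ctxpvec(repo[10])
#     b = ctxpvec(repo[12])
#     if a < b:                  # b is probably a descendant of a
#         print(b - a)           # approximate number of changesets in between
#     elif a | b:                # probably divergent lines of development
#         print(a.distance(b))   # heuristic divergence measure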


class pvec:
    def __init__(self, hashorctx):
        if isinstance(hashorctx, bytes):
            self._bs = hashorctx
            self._depth, self._vec = _split(util.b85decode(hashorctx))
        else:
            # ctxpvec() returns a fully populated pvec; copy its fields
            other = ctxpvec(hashorctx)
            self._bs = other._bs
            self._depth, self._vec = other._depth, other._vec

    def __str__(self):
        return self._bs

    def __eq__(self, b):
        return self._vec == b._vec and self._depth == b._depth

    def __lt__(self, b):
        # true if self is probably an ancestor of b: the hamming distance
        # must not exceed the depth difference
        delta = b._depth - self._depth
        if delta < 0:
            return False  # always correct
        if _hamming(self._vec, b._vec) > delta:
            return False
        return True

    def __gt__(self, b):
        return b < self

    def __or__(self, b):
        # true if self and b are probably divergent
        delta = abs(b._depth - self._depth)
        if _hamming(self._vec, b._vec) <= delta:
            return False
        return True

    def __sub__(self, b):
        '''approximate number of changesets between related pvecs'''
        if self | b:
            raise ValueError(b"concurrent pvecs")
        return self._depth - b._depth

    def distance(self, b):
        d = abs(b._depth - self._depth)
        h = _hamming(self._vec, b._vec)
        return max(d, h)

    def near(self, b):
        dist = abs(b._depth - self._depth)
        if dist > _radius or _hamming(self._vec, b._vec) > _radius:
            return False