upstream/mercurial-mirror Files · contrib/bdiff-torture.py

merge: mark file gets as not thread safe (issue5933)

In default installs, this has the effect of disabling the thread-based
worker on Windows when manifesting files in the working directory. My
measurements have shown that with revlog-based repositories, Mercurial
spends a lot of CPU time in revlog code resolving file data. This ends
up incurring a lot of context switching across threads and slows down
`hg update` operations when going from an empty working directory to
the tip of the repo.

On mozilla-unified (246,351 files) on an i7-6700K (4+4 CPUs):

before: 487s wall
after:  360s wall (equivalent to worker.enabled=false)
cpus=2: 379s wall

Even with only 2 threads, the thread pool is still slower.

The introduction of the thread-based worker (02b36e860e0b) states that
it resulted in a "~50%" speedup for `hg sparse --enable-profile` and
`hg sparse --disable-profile`. This disagrees with my measurement
above. I theorize a few reasons for this:

1) Removal of files from the working directory is I/O - not CPU - bound
   and should benefit from a thread pool (unless I/O is insanely fast
   and the GIL release is near instantaneous). So tests like `hg sparse
   --enable-profile` may exercise deletion throughput and aren't good
   benchmarks for worker tasks that are CPU heavy.
2) The patch was authored by someone at Facebook. The results were
   likely measured against a repository using remotefilelog. And I
   believe that revision retrieval during working directory updates
   with remotefilelog will often use a remote store, thus being I/O and
   not CPU bound. This probably resulted in an overstated performance
   gain.

Since there appears to be a need to enable the thread-based worker with
some stores, I've made the flagging of file gets as thread safe
configurable. I've made it experimental because I don't want to
formalize a boolean flag for this option and because this attribute is
best captured against the store implementation. But we don't have a
proper store API for this yet. I'd rather cross this bridge later.

It is possible there are revlog-based repositories that do benefit from
a thread-based worker. I didn't do very comprehensive testing. If there
are, we may want to devise a more proper algorithm for whether to use
the thread-based worker, including possibly config options to limit the
number of threads to use. But until I see evidence that justifies
complexity, simplicity wins.

Differential Revision: https://phab.mercurial-scm.org/D3963

Yuya Nishihara - - Load All Authors

File last commit:

r32202:0c73634d default


r38755:be498426 default

contrib/bdiff-torture.py | 99 lines | 2.1 KiB | text/x-python

# Randomized torture test generation for bdiff

from __future__ import absolute_import, print_function
import random
import sys

from mercurial import (
    mdiff,
)

def reducetest(a, b):
    # Shrink a failing (a, b) pair by randomly dropping lines, keeping a
    # candidate only if it still makes test1() fail.
    tries = 0
    reductions = 0
    print("reducing...")
    while tries < 1000:
        a2 = b"\n".join(l for l in a.splitlines()
                        if random.randint(0, 100) > 0) + b"\n"
        b2 = b"\n".join(l for l in b.splitlines()
                        if random.randint(0, 100) > 0) + b"\n"
        if a2 == a and b2 == b:
            continue
        if a2 == b2:
            continue
        tries += 1

        try:
            # Test the reduced candidate; testing (a, b) again would accept
            # every candidate, because (a, b) is already known to fail.
            test1(a2, b2)
        except Exception:
            reductions += 1
            tries = 0
            a = a2
            b = b2

    print("reduced:", reductions, len(a) + len(b),
          repr(a), repr(b))
    try:
        test1(a, b)
    except Exception as inst:
        print("failed:", inst)

    sys.exit(0)

def test1(a, b):
    # Round trip: diff a against b, apply the delta to a, expect b back.
    d = mdiff.textdiff(a, b)
    if not d:
        raise ValueError("empty")
    c = mdiff.patches(a, [d])
    if c != b:
        raise ValueError("bad")

def testwrap(a, b):
    try:
        test1(a, b)
        return
    except Exception as inst:
        # Print inside the handler: Python 3 unbinds inst when it exits.
        print("exception:", inst)
    reducetest(a, b)

def test(a, b):
    # Exercise the diff in both directions.
    testwrap(a, b)
    testwrap(b, a)

def rndtest(size, noise):
    # Build a random list of one-byte lines, then derive a second list by
    # randomly deleting and inserting lines at a rate set by noise.
    # Inputs are bytes, as Mercurial's diff code operates on bytes.
    a = []
    src = b"                aaaaaaaabbbbccd"
    for x in range(size):
        i = random.randint(0, len(src) - 1)
        a.append(src[i:i + 1])

    while True:
        b = [c for c in a if random.randint(0, 99) > noise]
        b2 = []
        for c in b:
            b2.append(c)
            while random.randint(0, 99) < noise:
                i = random.randint(0, len(src) - 1)
                b2.append(src[i:i + 1])
        if b2 != a:
            break

    a = b"\n".join(a) + b"\n"
    b = b"\n".join(b2) + b"\n"

    test(a, b)

# Run forever with growing input sizes; the script only exits through
# reducetest() after a failure has been found and reduced.
maxvol = 10000
startsize = 2
while True:
    size = startsize
    count = 0
    while size < maxvol:
        print(size)
        volume = 0
        while volume < maxvol:
            rndtest(size, 2)
            volume += size
            count += 2
        size *= 2
    maxvol *= 4
    startsize *= 4
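
The reduction step in reducetest() can be tried without Mercurial installed. The following is a minimal sketch, not part of the script: reduce_failing_pair, still_fails and demo are hypothetical names, the inputs are lists of lines rather than byte strings, and a plain predicate stands in for "test1() raised an exception". It assumes a drop probability of 10% per line, which is higher than the script's roughly 1%, so that the small demo converges quickly.

```python
import random


def reduce_failing_pair(a, b, still_fails, rng, max_tries=1000):
    """Shrink (a, b) by randomly dropping lines while the failure persists.

    Hypothetical helper for illustration: still_fails(a, b) plays the role
    of "test1(a, b) raised" in the torture script.
    """
    tries = 0
    while tries < max_tries:
        # Drop each line independently (assumed 10% chance per line).
        a2 = [line for line in a if rng.random() >= 0.1]
        b2 = [line for line in b if rng.random() >= 0.1]
        tries += 1
        if a2 == a and b2 == b:
            continue  # nothing was dropped, so there is nothing to test
        if still_fails(a2, b2):
            # The smaller pair still fails: keep it and reset the budget.
            a, b = a2, b2
            tries = 0
    return a, b


def demo(seed=0):
    # Stand-in failure: "fails" whenever a has a BUG line that b lacks.
    def still_fails(x, y):
        return "BUG" in x and "BUG" not in y

    lines_a = ["l%d" % i for i in range(20)] + ["BUG"]
    lines_b = ["l%d" % i for i in range(20)]
    return reduce_failing_pair(lines_a, lines_b, still_fails,
                               random.Random(seed))


if __name__ == "__main__":
    print(demo())
```

Passing the random generator in explicitly makes a reduction run repeatable for a given seed, which the script itself does not attempt.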
