upstream/mercurial-mirror Files · mercurial/worker.py

File last commit: r28292:3eb7faf6 default

r28487:98d98a64 default

worker.py | 184 lines | 5.7 KiB | text/x-python | mercurial/worker.py


        Bryan O'Sullivan
    
worker: count the number of CPUs...

              r18635
            
      # worker.py - master-slave parallelism support

      #

      # Copyright 2013 Facebook, Inc.

      #

      # This software may be used and distributed according to the terms of the

      # GNU General Public License version 2 or any later version.

        Gregory Szorc
    
worker: use absolute_import

              r25992
            
      from __future__ import absolute_import

      import errno

      import os

      import signal

      import sys

      import threading

      from .i18n import _

        Pierre-Yves David
    
error: get Abort from 'error' instead of 'util'...

              r26587
            
      from . import error

        Bryan O'Sullivan
    
worker: count the number of CPUs...

              r18635
            
      def countcpus():

          '''try to count the number of CPUs on the system'''

        Gregory Szorc
    
worker: restore old countcpus code (issue4869)...

              r26568
            
          # posix

        Bryan O'Sullivan
    
worker: count the number of CPUs...

              r18635
            
          try:

        Gregory Szorc
    
worker: restore old countcpus code (issue4869)...

              r26568
            
              n = int(os.sysconf('SC_NPROCESSORS_ONLN'))

              if n > 0:

                  return n

          except (AttributeError, ValueError):

              pass

          # windows

          try:

              n = int(os.environ['NUMBER_OF_PROCESSORS'])

              if n > 0:

                  return n

          except (KeyError, ValueError):

              pass

          return 1

        Bryan O'Sullivan
    
worker: estimate whether it's worth running a task in parallel...

              r18636
            
      def _numworkers(ui):

          s = ui.config('worker', 'numcpus')

          if s:

              try:

                  n = int(s)

                  if n >= 1:

                      return n

              except ValueError:

        Pierre-Yves David
    
error: get Abort from 'error' instead of 'util'...

              r26587
            
                  raise error.Abort(_('number of cpus must be an integer'))

        Bryan O'Sullivan
    
worker: estimate whether it's worth running a task in parallel...

              r18636
            
          return min(max(countcpus(), 4), 32)
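
Note on the fallback above: when `worker.numcpus` is unset (or is an integer below 1), the detected CPU count is clamped to the range 4..32. The sketch below isolates that expression so the effect is easy to see. The helper name `clamp_workers` and its parameters are illustrative and not part of worker.py.

```python
def clamp_workers(ncpus, floor=4, ceiling=32):
    # Same expression as the fallback in _numworkers, with the CPU
    # count passed in instead of detected via countcpus().
    return min(max(ncpus, floor), ceiling)


for ncpus in (1, 8, 64):
    # prints: 1 -> 4, then 8 -> 8, then 64 -> 32
    print(ncpus, '->', clamp_workers(ncpus))
```

So a single-CPU machine still gets 4 workers, and a 64-CPU machine is capped at 32.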

      if os.name == 'posix':

          _startupcost = 0.01

      else:

          _startupcost = 1e30

      def worthwhile(ui, costperop, nops):

          '''try to determine whether the benefit of multiple processes can

          outweigh the cost of starting them'''

          linear = costperop * nops

          workers = _numworkers(ui)

          benefit = linear - (_startupcost * workers + linear / workers)

          return benefit >= 0.15
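
Worked example for the estimate above, as a standalone sketch. It assumes the worker count and startup cost are passed in directly rather than read from `ui` and `os.name`; the function name `benefit` is illustrative.

```python
def benefit(costperop, nops, workers, startupcost=0.01):
    # Estimated serial time, minus the estimated parallel time
    # (per-worker startup cost plus an even share of the work).
    linear = costperop * nops
    return linear - (startupcost * workers + linear / workers)


# 1000 operations at 1 ms each, 4 workers: about 0.71 s saved,
# which clears the 0.15 threshold.
print(benefit(0.001, 1000, 4) >= 0.15)

# 100 operations at 1 ms each: about 0.035 s saved, so not worthwhile.
print(benefit(0.001, 100, 4) >= 0.15)

# With the non-posix startup cost of 1e30 the estimate is hugely
# negative, so the serial path is always chosen there.
print(benefit(0.001, 1000, 4, startupcost=1e30) >= 0.15)
```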

        Bryan O'Sullivan
    
worker: allow a function to be run in multiple worker processes...

              r18638
            
      def worker(ui, costperarg, func, staticargs, args):

          '''run a function, possibly in parallel in multiple worker

          processes.

          returns a progress iterator

          costperarg - cost of a single task

          func - function to run

          staticargs - arguments to pass to every invocation of the function

          args - arguments to split into chunks, to pass to individual

          workers

          '''

          if worthwhile(ui, costperarg, len(args)):

              return _platformworker(ui, func, staticargs, args)

          return func(*staticargs + (args,))
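
The calling convention that `worker()` expects from `func` can be sketched as follows. This is a minimal illustration of the serial fallback only; `tagfiles` and `runserial` are hypothetical names, and the `(index, text)` pair shape is inferred from how `_posixworker` consumes the results.

```python
def tagfiles(tag, names):
    """Hypothetical task: yields one (index, text) progress pair per item."""
    for i, name in enumerate(names):
        yield i, '%s:%s' % (tag, name)


def runserial(func, staticargs, args):
    # Same expression as the fallback in worker(). It parses as
    # func(*(staticargs + (args,))), so staticargs must be a tuple.
    return func(*staticargs + (args,))


progress = list(runserial(tagfiles, ('done',), ['a.txt', 'b.txt']))
print(progress)  # [(0, 'done:a.txt'), (1, 'done:b.txt')]
```

In the parallel path each worker receives only its slice of `args`, and the second element of each pair comes back as text read from a pipe.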

      def _posixworker(ui, func, staticargs, args):

          rfd, wfd = os.pipe()

          workers = _numworkers(ui)

        Bryan O'Sullivan
    
worker: fix a race in SIGINT handling...

              r18708
            
          oldhandler = signal.getsignal(signal.SIGINT)

          signal.signal(signal.SIGINT, signal.SIG_IGN)

        Bryan O'Sullivan
    
worker: handle worker failures more aggressively...

              r18709
            
          pids, problem = [], [0]

        Bryan O'Sullivan
    
worker: allow a function to be run in multiple worker processes...

              r18638
            
          for pargs in partition(args, workers):

              pid = os.fork()

              if pid == 0:

        Bryan O'Sullivan
    
worker: fix a race in SIGINT handling...

              r18708
            
                  signal.signal(signal.SIGINT, oldhandler)

        Bryan O'Sullivan
    
worker: allow a function to be run in multiple worker processes...

              r18638
            
                  try:

                      os.close(rfd)

                      for i, item in func(*(staticargs + (pargs,))):

                          os.write(wfd, '%d %s\n' % (i, item))

                      os._exit(0)

                  except KeyboardInterrupt:

                      os._exit(255)

        Matt Mackall
    
worker: properly report errors from worker processes (issue3982)

              r19408
            
                      # other exceptions are allowed to propagate, we rely

                      # on lock.py's pid checks to avoid release callbacks

        Bryan O'Sullivan
    
worker: handle worker failures more aggressively...

              r18709
            
              pids.append(pid)

          pids.reverse()

        Bryan O'Sullivan
    
worker: allow a function to be run in multiple worker processes...

              r18638
            
          os.close(wfd)

          fp = os.fdopen(rfd, 'rb', 0)

        Bryan O'Sullivan
    
worker: handle worker failures more aggressively...

              r18709
            
          def killworkers():

              # if one worker bails, there's no good reason to wait for the rest

              for p in pids:

                  try:

                      os.kill(p, signal.SIGTERM)

        Gregory Szorc
    
global: mass rewrite to use modern exception syntax...

              r25660
            
                  except OSError as err:

        Bryan O'Sullivan
    
worker: handle worker failures more aggressively...

              r18709
            
                      if err.errno != errno.ESRCH:

                          raise

          def waitforworkers():

        Mads Kiilerich
    
cleanup: avoid _ for local unused tmp variables - that is reserved for i18n...

              r22199
            
              for _pid in pids:

        Bryan O'Sullivan
    
worker: handle worker failures more aggressively...

              r18709
            
                  st = _exitstatus(os.wait()[1])

        Matt Mackall
    
worker: check problem state correctly (issue3982)...

              r19406
            
                  if st and not problem[0]:

        Bryan O'Sullivan
    
worker: handle worker failures more aggressively...

              r18709
            
                      problem[0] = st

                      killworkers()

          t = threading.Thread(target=waitforworkers)

          t.start()

        Bryan O'Sullivan
    
worker: allow a function to be run in multiple worker processes...

              r18638
            
          def cleanup():

              signal.signal(signal.SIGINT, oldhandler)

        Bryan O'Sullivan
    
worker: handle worker failures more aggressively...

              r18709
            
              t.join()

              status = problem[0]

              if status:

                  if status < 0:

                      os.kill(os.getpid(), -status)

                  sys.exit(status)

        Bryan O'Sullivan
    
worker: allow a function to be run in multiple worker processes...

              r18638
            
          try:

              for line in fp:

                  l = line.split(' ', 1)

                  yield int(l[0]), l[1][:-1]

          except: # re-raises

        Bryan O'Sullivan
    
worker: handle worker failures more aggressively...

              r18709
            
              killworkers()

        Bryan O'Sullivan
    
worker: allow a function to be run in multiple worker processes...

              r18638
            
              cleanup()

              raise

          cleanup()
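
The line format that children write to the pipe, and that the parent parses, can be sketched in isolation. This is a Python 3 illustration under the assumption that items are text encoded as UTF-8; the original code is Python 2 and writes byte strings directly. The names `encode` and `decode` are illustrative.

```python
def encode(i, item):
    # Child side: one progress record per line, "<int> <item>\n".
    return ('%d %s\n' % (i, item)).encode('utf-8')


def decode(line):
    # Parent side: split on the first space only, so spaces inside the
    # item survive, then drop the trailing newline.
    parts = line.split(b' ', 1)
    return int(parts[0]), parts[1][:-1].decode('utf-8')


record = encode(3, 'dir/file name.txt')
print(decode(record))  # (3, 'dir/file name.txt')

# An item containing a newline would be split into two records by the
# parent's line-based read, so this framing relies on newline-free items.
print(len(encode(1, 'a\nb').splitlines(True)))  # 2
```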

        Bryan O'Sullivan
    
worker: on error, exit similarly to the first failing worker...

              r18707
            
      def _posixexitstatus(code):

          '''convert a posix exit status into the same form returned by

          os.spawnv

          returns None if the process was stopped instead of exiting'''

          if os.WIFEXITED(code):

              return os.WEXITSTATUS(code)

          elif os.WIFSIGNALED(code):

              return -os.WTERMSIG(code)
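
A small POSIX-only sketch of the conversion above, using real child processes so that it does not depend on how the platform encodes wait statuses. The helper names are illustrative; `exit_status` repeats the same mapping with an explicit `return None` for the stopped case.

```python
import os
import signal
import time


def exit_status(code):
    # Exit code for a normal exit, negated signal number for a child
    # terminated by a signal, None otherwise (e.g. stopped).
    if os.WIFEXITED(code):
        return os.WEXITSTATUS(code)
    elif os.WIFSIGNALED(code):
        return -os.WTERMSIG(code)
    return None


def child_exiting_with(n):
    pid = os.fork()
    if pid == 0:
        os._exit(n)  # child: exit immediately with the given code
    return exit_status(os.waitpid(pid, 0)[1])


def child_killed_by(sig):
    pid = os.fork()
    if pid == 0:
        time.sleep(60)  # child: wait to be killed by the parent
        os._exit(0)
    os.kill(pid, sig)
    return exit_status(os.waitpid(pid, 0)[1])


print(child_exiting_with(3))
print(child_killed_by(signal.SIGKILL) == -signal.SIGKILL)
```

The negative value is what lets `cleanup()` re-raise the same signal against the parent with `os.kill(os.getpid(), -status)`.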

        Bryan O'Sullivan
    
worker: allow a function to be run in multiple worker processes...

              r18638
            
      if os.name != 'nt':

          _platformworker = _posixworker

        Bryan O'Sullivan
    
worker: on error, exit similarly to the first failing worker...

              r18707
            
          _exitstatus = _posixexitstatus

        Bryan O'Sullivan
    
worker: partition a list (of tasks) into equal-sized chunks

              r18637
            
      def partition(lst, nslices):

        Gregory Szorc
    
worker: change partition strategy to every Nth element...

              r28181
            
          '''partition a list into N slices of roughly equal size

          The current strategy takes every Nth element from the input. If

          we ever write workers that need to preserve grouping in input

          we should consider allowing callers to specify a partition strategy.

        Gregory Szorc
    
worker: document poor partitioning scheme impact...

              r28292
            
          mpm is not a fan of this partitioning strategy when files are involved.

          In his words:

              Single-threaded Mercurial makes a point of creating and visiting

              files in a fixed order (alphabetical). When creating files in order,

              a typical filesystem is likely to allocate them on nearby regions on

              disk. Thus, when revisiting in the same order, locality is maximized

              and various forms of OS and disk-level caching and read-ahead get a

              chance to work.

              This effect can be quite significant on spinning disks. I discovered it

              circa Mercurial v0.4 when revlogs were named by hashes of filenames.

              Tarring a repo and copying it to another disk effectively randomized

              the revlog ordering on disk by sorting the revlogs by hash and suddenly

              performance of my kernel checkout benchmark dropped by ~10x because the

              "working set" of sectors visited no longer fit in the drive's cache and

              the workload switched from streaming to random I/O.

              What we should really be doing is have workers read filenames from an

              ordered queue. This preserves locality and also keeps any worker from

              getting more than one file out of balance.

        Gregory Szorc
    
worker: change partition strategy to every Nth element...

              r28181
            
          '''

          for i in range(nslices):

              yield lst[i::nslices]
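
To make the docstring's concern concrete, the sketch below shows what the every-Nth-element strategy yields, next to a contiguous split for comparison. The `contiguous` function is a hypothetical alternative written for this illustration and is not part of worker.py.

```python
def partition(lst, nslices):
    # Strategy used above: slice i takes elements i, i+N, i+2N, ...
    for i in range(nslices):
        yield lst[i::nslices]


def contiguous(lst, nslices):
    # Hypothetical alternative: consecutive runs of roughly equal size,
    # which keeps neighbouring items in the same slice.
    n = len(lst)
    for i in range(nslices):
        yield lst[i * n // nslices:(i + 1) * n // nslices]


items = list(range(10))
print(list(partition(items, 3)))   # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
print(list(contiguous(items, 3)))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
```

With sorted filenames as input, the first form spreads neighbouring files across workers, so no single worker visits files in their original consecutive order.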
