upstream/mercurial-mirror Files · mercurial/worker.py


osutil: implement setprocname to set process title for some platforms

This patch adds a simple setprocname method to osutil. The operation is not defined by any standard and is platform-specific, the current implementation tries to cover some major platforms (ex. Linux, OS X, FreeBSD) that is relatively easy to support. Other platforms (Windows [4], other BSDs, ...) can be added in the future.

The current implementation supports two methods to change process title:

  a. setproctitle if available (works in FreeBSD).
  b. rewrite argv in place (works in Linux [1] and Mac OS X). [2] [3]

[1]: Linux has "prctl(PR_SET_NAME, ...)" but 1) it has 16-byte limit, which is too small; 2) it is not quite equivalent to what we want - it changes "/proc/self/comm", not "/proc/self/cmdline" - "comm" change won't show up in "ps" output unless "-o comm" is used.

[2]: The implementation does not rewrite the **environ buffer like some other implementations do, just to make the code simpler and safer. However, this also means the buffer size we can rewrite is significantly shorter. If we are really greedy and want the "environ" space, we can change the implementation later.

[3]: It requires a CPython private API: Py_GetArgcArgv to get the original argv. Unfortunately Python 3 makes a copy of argv and returns the wchar_t version, so it is not supported for now. (if we really want to, we could count backwards from "char **environ", given known argc and argv, not sure if that's a good idea - probably not)

[4]: The feature is aimed to make it easier for forked command server processes to show what they are doing. Since Windows does not support fork(), despite it's a major platform, its support is not added in this patch.

Jun Wu

File last commit: r30396:78a58dcf default

r30409:08521615 default

worker.py | 187 lines | 5.7 KiB | text/x-python | PythonLexer
            

      # worker.py - master-slave parallelism support
      #
      # Copyright 2013 Facebook, Inc.
      #
      # This software may be used and distributed according to the terms of the
      # GNU General Public License version 2 or any later version.

      from __future__ import absolute_import

      import errno
      import os
      import signal
      import sys
      import threading

      from .i18n import _
      from . import (
          error,
          util,
      )

      def countcpus():
          '''try to count the number of CPUs on the system'''

          # posix
          try:
              n = int(os.sysconf('SC_NPROCESSORS_ONLN'))
              if n > 0:
                  return n
          except (AttributeError, ValueError):
              pass

          # windows
          try:
              n = int(os.environ['NUMBER_OF_PROCESSORS'])
              if n > 0:
                  return n
          except (KeyError, ValueError):
              pass

          return 1

      def _numworkers(ui):
          s = ui.config('worker', 'numcpus')
          if s:
              try:
                  n = int(s)
                  if n >= 1:
                      return n
              except ValueError:
                  raise error.Abort(_('number of cpus must be an integer'))
          return min(max(countcpus(), 4), 32)

      if os.name == 'posix':
          _startupcost = 0.01
      else:
          _startupcost = 1e30

      def worthwhile(ui, costperop, nops):
          '''try to determine whether the benefit of multiple processes can
          outweigh the cost of starting them'''
          linear = costperop * nops
          workers = _numworkers(ui)
          benefit = linear - (_startupcost * workers + linear / workers)
          return benefit >= 0.15

      def worker(ui, costperarg, func, staticargs, args):
          '''run a function, possibly in parallel in multiple worker
          processes.

          returns a progress iterator

          costperarg - cost of a single task

          func - function to run

          staticargs - arguments to pass to every invocation of the function

          args - arguments to split into chunks, to pass to individual
          workers
          '''
          if worthwhile(ui, costperarg, len(args)):
              return _platformworker(ui, func, staticargs, args)
          return func(*staticargs + (args,))

      def _posixworker(ui, func, staticargs, args):
          rfd, wfd = os.pipe()
          workers = _numworkers(ui)
          oldhandler = signal.getsignal(signal.SIGINT)
          signal.signal(signal.SIGINT, signal.SIG_IGN)
          pids, problem = [], [0]
          for pargs in partition(args, workers):
              pid = os.fork()
              if pid == 0:
                  signal.signal(signal.SIGINT, oldhandler)
                  try:
                      os.close(rfd)
                      for i, item in func(*(staticargs + (pargs,))):
                          os.write(wfd, '%d %s\n' % (i, item))
                      os._exit(0)
                  except KeyboardInterrupt:
                      os._exit(255)
                      # other exceptions are allowed to propagate, we rely
                      # on lock.py's pid checks to avoid release callbacks
              pids.append(pid)
          pids.reverse()
          os.close(wfd)
          fp = os.fdopen(rfd, 'rb', 0)
          def killworkers():
              # if one worker bails, there's no good reason to wait for the rest
              for p in pids:
                  try:
                      os.kill(p, signal.SIGTERM)
                  except OSError as err:
                      if err.errno != errno.ESRCH:
                          raise
          def waitforworkers():
              for _pid in pids:
                  st = _exitstatus(os.wait()[1])
                  if st and not problem[0]:
                      problem[0] = st
                      killworkers()
          t = threading.Thread(target=waitforworkers)
          t.start()
          def cleanup():
              signal.signal(signal.SIGINT, oldhandler)
              t.join()
              status = problem[0]
              if status:
                  if status < 0:
                      os.kill(os.getpid(), -status)
                  sys.exit(status)
          try:
              for line in util.iterfile(fp):
                  l = line.split(' ', 1)
                  yield int(l[0]), l[1][:-1]
          except: # re-raises
              killworkers()
              cleanup()
              raise
          cleanup()

      def _posixexitstatus(code):
          '''convert a posix exit status into the same form returned by
          os.spawnv

          returns None if the process was stopped instead of exiting'''
          if os.WIFEXITED(code):
              return os.WEXITSTATUS(code)
          elif os.WIFSIGNALED(code):
              return -os.WTERMSIG(code)

      if os.name != 'nt':
          _platformworker = _posixworker
          _exitstatus = _posixexitstatus

      def partition(lst, nslices):
          '''partition a list into N slices of roughly equal size

          The current strategy takes every Nth element from the input. If
          we ever write workers that need to preserve grouping in input
          we should consider allowing callers to specify a partition strategy.

          mpm is not a fan of this partitioning strategy when files are involved.
          In his words:

              Single-threaded Mercurial makes a point of creating and visiting
              files in a fixed order (alphabetical). When creating files in order,
              a typical filesystem is likely to allocate them on nearby regions on
              disk. Thus, when revisiting in the same order, locality is maximized
              and various forms of OS and disk-level caching and read-ahead get a
              chance to work.

              This effect can be quite significant on spinning disks. I discovered it
              circa Mercurial v0.4 when revlogs were named by hashes of filenames.
              Tarring a repo and copying it to another disk effectively randomized
              the revlog ordering on disk by sorting the revlogs by hash and suddenly
              performance of my kernel checkout benchmark dropped by ~10x because the
              "working set" of sectors visited no longer fit in the drive's cache and
              the workload switched from streaming to random I/O.

              What we should really be doing is have workers read filenames from a
              ordered queue. This preserves locality and also keeps any worker from
              getting more than one file out of balance.
          '''
          for i in range(nslices):
              yield lst[i::nslices]
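
The striding strategy described in the partition docstring can be seen on a small input. The sketch below is illustrative only and is not part of worker.py: `partition` is restated so the block runs on its own, while `contiguous` is a hypothetical alternative showing what a grouping-preserving strategy might look like (the docstring only says callers might one day choose a strategy; it does not define one).

```python
def partition(lst, nslices):
    """Same striding rule as the listing: slice i takes items i, i+N, i+2N, ..."""
    for i in range(nslices):
        yield lst[i::nslices]


def contiguous(lst, nslices):
    """Hypothetical alternative (not in worker.py): consecutive runs of the input.

    The first len(lst) % nslices slices get one extra item, so slice sizes
    differ by at most one.
    """
    size, extra = divmod(len(lst), nslices)
    start = 0
    for i in range(nslices):
        end = start + size + (1 if i < extra else 0)
        yield lst[start:end]
        start = end


files = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

# Striding interleaves neighbours across workers.
strided = list(partition(files, 3))

# Contiguous runs keep alphabetical neighbours on the same worker.
blocks = list(contiguous(files, 3))

print(strided)
print(blocks)
```

With striding, adjacent files such as 'a' and 'b' always land on different workers, which is the loss of on-disk locality the quoted comment objects to. Contiguous runs keep neighbours together but are not the ordered queue the comment proposes.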

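The arithmetic in worthwhile() can be worked through by hand. The sketch below restates it as standalone functions; the names `estimated_benefit` and `is_worthwhile` and the explicit `workers` parameter are assumptions made for illustration (the listing reads the worker count from `_numworkers(ui)` and the startup cost from the module-level `_startupcost`). The default values 0.01 and 0.15 are taken from the listing.

```python
def estimated_benefit(costperop, nops, workers, startupcost=0.01):
    """Estimated time saved by splitting the work, as computed in worthwhile()."""
    linear = costperop * nops  # estimated cost of doing everything serially
    # Parallel estimate: pay the startup cost once per worker, then an equal
    # share of the serial work. float() avoids integer division on Python 2.
    parallel = startupcost * workers + linear / float(workers)
    return linear - parallel


def is_worthwhile(costperop, nops, workers, startupcost=0.01, threshold=0.15):
    """True when the estimated saving reaches the threshold used in the listing."""
    return estimated_benefit(costperop, nops, workers, startupcost) >= threshold


# 100 operations at 0.001 each, 4 workers:
#   linear = 0.1, parallel = 0.04 + 0.025 = 0.065, benefit = 0.035 -> not worth it
small = is_worthwhile(0.001, 100, 4)

# 1000 operations at 0.001 each, 4 workers:
#   linear = 1.0, parallel = 0.04 + 0.25 = 0.29, benefit = 0.71 -> worth it
large = is_worthwhile(0.001, 1000, 4)

# With the non-posix startup cost of 1e30 the benefit is always hugely negative,
# so the parallel path is never chosen there.
nonposix = is_worthwhile(0.001, 1000, 4, startupcost=1e30)

print(small, large, nonposix)
```

Because the startup term grows with the worker count, adding workers only helps once the serial estimate is large enough to absorb it.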
