upstream/mercurial-mirror Files · mercurial/hgweb/wsgicgi.py


merge: mark file gets as not thread safe (issue5933)

In default installs, this has the effect of disabling the thread-based worker on Windows when manifesting files in the working directory. My measurements have shown that with revlog-based repositories, Mercurial spends a lot of CPU time in revlog code resolving file data. This ends up incurring a lot of context switching across threads and slows down `hg update` operations when going from an empty working directory to the tip of the repo.

On mozilla-unified (246,351 files) on an i7-6700K (4+4 CPUs):

before: 487s wall
after:  360s wall (equivalent to worker.enabled=false)
cpus=2: 379s wall

Even with only 2 threads, the thread pool is still slower.

The introduction of the thread-based worker (02b36e860e0b) states that it resulted in a "~50%" speedup for `hg sparse --enable-profile` and `hg sparse --disable-profile`. This disagrees with my measurement above. I theorize a few reasons for this:

1) Removal of files from the working directory is I/O - not CPU - bound and should benefit from a thread pool (unless I/O is insanely fast and the GIL release is near instantaneous). So tests like `hg sparse --enable-profile` may exercise deletion throughput and aren't good benchmarks for worker tasks that are CPU heavy.

2) The patch was authored by someone at Facebook. The results were likely measured against a repository using remotefilelog. And I believe that revision retrieval during working directory updates with remotefilelog will often use a remote store, thus being I/O and not CPU bound. This probably resulted in an overstated performance gain.

Since there appears to be a need to enable the thread-based worker with some stores, I've made the flagging of file gets as thread safe configurable. I've made it experimental because I don't want to formalize a boolean flag for this option and because this attribute is best captured against the store implementation. But we don't have a proper store API for this yet. I'd rather cross this bridge later.

It is possible there are revlog-based repositories that do benefit from a thread-based worker. I didn't do very comprehensive testing. If there are, we may want to devise a more proper algorithm for whether to use the thread-based worker, including possibly config options to limit the number of threads to use. But until I see evidence that justifies complexity, simplicity wins.

Differential Revision: https://phab.mercurial-scm.org/D3963
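To put the three wall-clock numbers from the commit message side by side, here is a small arithmetic sketch. The helper name `percent_change` is made up for illustration; only the timings (487s, 360s, 379s) come from the message above.

```python
def percent_change(baseline, measured):
    """Relative change of `measured` versus `baseline`, in percent.

    Negative values mean `measured` is faster (less wall time).
    """
    return (measured - baseline) / baseline * 100.0


# Wall-clock timings quoted in the commit message (seconds).
before = 487.0       # thread-based worker enabled (default before the change)
after = 360.0        # file gets marked not thread safe
two_threads = 379.0  # thread-based worker limited to cpus=2

print("after vs before:   %.1f%%" % percent_change(before, after))
print("cpus=2 vs after:   %.1f%%" % percent_change(after, two_threads))
print("cpus=2 vs before:  %.1f%%" % percent_change(before, two_threads))
```

Read this way, disabling the thread-based worker cuts wall time by roughly a quarter on this one benchmark, and the 2-thread configuration sits between the two. As the message itself notes, this is a single repository on a single machine, not comprehensive testing.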

Augie Fackler

File last commit: r37765:2d5b5bcc default

r38755:be498426 default

wsgicgi.py | 96 lines | 3.0 KiB | text/x-python

# hgweb/wsgicgi.py - CGI->WSGI translator
#
# Copyright 2006 Eric Hopper <hopper@omnifarious.org>
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.
#
# This was originally copied from the public domain code at
# http://www.python.org/dev/peps/pep-0333/#the-server-gateway-side

from __future__ import absolute_import

import os

from .. import (
    pycompat,
)

from ..utils import (
    procutil,
)

from . import (
    common,
)

def launch(application):
    procutil.setbinary(procutil.stdin)
    procutil.setbinary(procutil.stdout)

    environ = dict(os.environ.iteritems()) # re-exports
    environ.setdefault(r'PATH_INFO', '')
    if environ.get(r'SERVER_SOFTWARE', r'').startswith(r'Microsoft-IIS'):
        # IIS includes script_name in PATH_INFO
        scriptname = environ[r'SCRIPT_NAME']
        if environ[r'PATH_INFO'].startswith(scriptname):
            environ[r'PATH_INFO'] = environ[r'PATH_INFO'][len(scriptname):]

    stdin = procutil.stdin
    if environ.get(r'HTTP_EXPECT', r'').lower() == r'100-continue':
        stdin = common.continuereader(stdin, procutil.stdout.write)

    environ[r'wsgi.input'] = stdin
    environ[r'wsgi.errors'] = procutil.stderr
    environ[r'wsgi.version'] = (1, 0)
    environ[r'wsgi.multithread'] = False
    environ[r'wsgi.multiprocess'] = True
    environ[r'wsgi.run_once'] = True

    if environ.get(r'HTTPS', r'off').lower() in (r'on', r'1', r'yes'):
        environ[r'wsgi.url_scheme'] = r'https'
    else:
        environ[r'wsgi.url_scheme'] = r'http'

    headers_set = []
    headers_sent = []
    out = procutil.stdout

    def write(data):
        if not headers_set:
            raise AssertionError("write() before start_response()")

        elif not headers_sent:
            # Before the first output, send the stored headers
            status, response_headers = headers_sent[:] = headers_set
            out.write('Status: %s\r\n' % pycompat.bytesurl(status))
            for hk, hv in response_headers:
                out.write('%s: %s\r\n' % (pycompat.bytesurl(hk),
                                          pycompat.bytesurl(hv)))
            out.write('\r\n')

        out.write(data)
        out.flush()

    def start_response(status, response_headers, exc_info=None):
        if exc_info:
            try:
                if headers_sent:
                    # Re-raise original exception if headers sent
                    raise exc_info[0](exc_info[1], exc_info[2])
            finally:
                exc_info = None     # avoid dangling circular ref
        elif headers_set:
            raise AssertionError("Headers already set!")

        headers_set[:] = [status, response_headers]
        return write

    content = application(environ, start_response)
    try:
        for chunk in content:
            write(chunk)
        if not headers_sent:
            write('')   # send headers now if body was empty
    finally:
        getattr(content, 'close', lambda: None)()
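
The deferred-header logic in `launch()` can be tried in isolation. What follows is a standalone Python 3 sketch of the same pattern, not the Mercurial code. The names `run_gateway`, `strip_script_name`, `hello_app` and `empty_app` are hypothetical, and plain `latin-1` encoding stands in for `pycompat.bytesurl`. Output goes to any binary file-like object instead of `procutil.stdout`, so the response can be inspected in memory. `strip_script_name` is also applied unconditionally, whereas the original only strips under IIS.

```python
import io


def strip_script_name(environ):
    """Drop a leading SCRIPT_NAME from PATH_INFO (the IIS quirk handled above)."""
    script_name = environ.get("SCRIPT_NAME", "")
    path_info = environ.get("PATH_INFO", "")
    if script_name and path_info.startswith(script_name):
        environ["PATH_INFO"] = path_info[len(script_name):]
    return environ


def run_gateway(application, environ, out):
    """Run one WSGI request and write a CGI-style response to `out`."""
    headers_set = []   # filled in by start_response()
    headers_sent = []  # filled in on the first write()

    def write(data):
        if not headers_set:
            raise AssertionError("write() before start_response()")
        if not headers_sent:
            # First output: emit the stored status line and headers.
            status, response_headers = headers_sent[:] = headers_set
            out.write(("Status: %s\r\n" % status).encode("latin-1"))
            for name, value in response_headers:
                out.write(("%s: %s\r\n" % (name, value)).encode("latin-1"))
            out.write(b"\r\n")
        out.write(data)
        out.flush()

    def start_response(status, response_headers, exc_info=None):
        if exc_info:
            try:
                if headers_sent:
                    # Too late to change the response: re-raise the error.
                    raise exc_info[1].with_traceback(exc_info[2])
            finally:
                exc_info = None  # avoid a dangling circular reference
        elif headers_set:
            raise AssertionError("Headers already set!")
        headers_set[:] = [status, response_headers]
        return write

    content = application(environ, start_response)
    try:
        for chunk in content:
            write(chunk)
        if not headers_sent:
            write(b"")  # empty body: still send the headers
    finally:
        getattr(content, "close", lambda: None)()


def hello_app(environ, start_response):
    """Hypothetical WSGI app with a one-chunk body."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


def empty_app(environ, start_response):
    """Hypothetical WSGI app with no body at all."""
    start_response("204 No Content", [])
    return []


if __name__ == "__main__":
    buf = io.BytesIO()
    run_gateway(hello_app, {}, buf)
    print(buf.getvalue())
```

Because headers are only stored by `start_response()` and flushed on the first `write()`, an application can still replace them via `exc_info` until the first byte of the body goes out. The `empty_app` case exercises the final `write(b"")`, which is what guarantees a header block even when the iterable yields nothing.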
