upstream/mercurial-mirror Files · mercurial/hgweb/common.py


localrepo: experimental support for non-zlib revlog compression

The final part of integrating the compression manager APIs into revlog storage is the plumbing for repositories to advertise they are using non-zlib storage and for revlogs to instantiate a non-zlib compression engine.

The main intent of the compression manager work was to zstd all of the things. Adding zstd to revlogs has proved to be more involved than other places because revlogs are... special. Very small inputs and the use of delta chains (which are themselves a form of compression) are a completely different use case from streaming compression, which bundles and the wire protocol employ. I've conducted numerous experiments with zstd in revlogs and have yet to formalize compression settings and a storage architecture that I'm confident I won't regret later. In other words, I'm not yet ready to commit to a new mechanism for using zstd - or any other compression format - in revlogs.

That being said, having some support for zstd (and other compression formats) in revlogs in core is beneficial. It can allow others to conduct experiments.

This patch introduces *highly experimental* support for non-zlib compression formats in revlogs. Introduced is a config option to control which compression engine to use. Also introduced is a namespace of "exp-compression-*" requirements to denote support for non-zlib compression in revlogs. I've prefixed the namespace with "exp-" (short for "experimental") because I'm not confident of the requirements "schema" and in no way want to give the illusion of supporting these requirements in the future. I fully intend to drop support for these requirements once we figure out what we're doing with zstd in revlogs.

A good portion of the patch is teaching the requirements system about registered compression engines and passing the requested compression engine as an opener option so revlogs can instantiate the proper compression engine for new operations.
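The requirements bookkeeping described above can be sketched in a few lines. This is a minimal illustration, not Mercurial's localrepo code: the function names, the use of `RuntimeError`, and the base requirement names are assumptions. Only the rule that a non-zlib engine is advertised through an "exp-compression-*" requirement, while zlib needs none, comes from the text.

```python
# Hypothetical helpers: these names are illustrative, not Mercurial's API.
REQUIREMENT_PREFIX = "exp-compression-"


def requirements_for_engine(engine):
    """Return the extra requirements a new repo would record for `engine`.

    zlib is the default revlog compression, so it needs no requirement;
    any other engine is advertised with an "exp-compression-<name>" entry.
    """
    if engine == "zlib":
        return set()
    return {REQUIREMENT_PREFIX + engine}


def supported_requirements(base, registered_engines):
    """Extend `base` with one requirement per registered non-zlib engine."""
    supported = set(base)
    for name in registered_engines:
        supported |= requirements_for_engine(name)
    return supported


def check_requirements(repo_requirements, supported):
    """Refuse to open a repo that needs a requirement we do not know."""
    missing = sorted(set(repo_requirements) - supported)
    if missing:
        raise RuntimeError("unsupported repository requirements: "
                           + ", ".join(missing))


# Example: a client with the zlib and zstd engines registered.
supported = supported_requirements({"revlogv1", "store"}, ["zlib", "zstd"])
check_requirements({"revlogv1", "store", "exp-compression-zstd"}, supported)
```

Deriving the supported set from the registered engines is what lets a client without a given engine refuse such a repository up front instead of failing later on an unreadable chunk.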
That's a verbose way of saying "we can now use zstd in revlogs!"

On an `hg pull` conversion of the mozilla-unified repo with no extra redelta settings (like aggressivemergedeltas), we can see the impact of zstd vs zlib in revlogs:

  $ hg perfrevlogchunks -c
  ! chunk
  ! wall 2.032052 comb 2.040000 user 1.990000 sys 0.050000 (best of 5)
  ! wall 1.866360 comb 1.860000 user 1.820000 sys 0.040000 (best of 6)
  ! chunk batch
  ! wall 1.877261 comb 1.870000 user 1.860000 sys 0.010000 (best of 6)
  ! wall 1.705410 comb 1.710000 user 1.690000 sys 0.020000 (best of 6)

  $ hg perfrevlogchunks -m
  ! chunk
  ! wall 2.721427 comb 2.720000 user 2.640000 sys 0.080000 (best of 4)
  ! wall 2.035076 comb 2.030000 user 1.950000 sys 0.080000 (best of 5)
  ! chunk batch
  ! wall 2.614561 comb 2.620000 user 2.580000 sys 0.040000 (best of 4)
  ! wall 1.910252 comb 1.910000 user 1.880000 sys 0.030000 (best of 6)

  $ hg perfrevlog -c -d 1
  ! wall 4.812885 comb 4.820000 user 4.800000 sys 0.020000 (best of 3)
  ! wall 4.699621 comb 4.710000 user 4.700000 sys 0.010000 (best of 3)

  $ hg perfrevlog -m -d 1000
  ! wall 34.252800 comb 34.250000 user 33.730000 sys 0.520000 (best of 3)
  ! wall 24.094999 comb 24.090000 user 23.320000 sys 0.770000 (best of 3)

Only modest wins for the changelog. But manifest reading is significantly faster. What's going on?

One reason might be data volume. zstd decompresses faster. So given more bytes, it will put more distance between it and zlib.

Another reason is size. In the current design, zstd revlogs are *larger*:

  debugcreatestreamclonebundle (size in bytes)
  zlib: 1,638,852,492
  zstd: 1,680,601,332

I haven't investigated this fully, but I reckon a significant cause of larger revlogs is that the zstd frame/header has more bytes than zlib's. For very small inputs or data that doesn't compress well, we'll tend to store more uncompressed chunks than with zlib (because the compressed size isn't smaller than original). This will make revlog reading faster because it is doing less decompression.
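The fallback mentioned in the last paragraph (keep a chunk uncompressed whenever compression does not make it smaller) can be illustrated with zlib from the standard library, since zstd has no stdlib binding. This is a minimal sketch under that assumption: `encode_chunk`, `decode_chunk` and the "zlib"/"raw" tags are hypothetical stand-ins, not how the revlog format actually marks chunk types.

```python
import zlib


def encode_chunk(data, level=6):
    """Hypothetical chunk encoder: keep compressed bytes only if smaller.

    Returns a (kind, payload) pair where kind is "zlib" or "raw". The tag
    is a stand-in for however the real storage format records chunk type.
    """
    compressed = zlib.compress(data, level)
    if len(compressed) < len(data):
        return "zlib", compressed
    return "raw", data


def decode_chunk(kind, payload):
    """Inverse of encode_chunk()."""
    return zlib.decompress(payload) if kind == "zlib" else payload


# Tiny inputs cannot pay for the compressor's framing overhead...
print(encode_chunk(b"a")[0])          # raw
# ...while repetitive data compresses well.
print(encode_chunk(b"x" * 1000)[0])   # zlib
```

The comparison `len(compressed) < len(data)` is where a codec with a larger fixed header loses out: more small inputs fail the test and are stored raw, which grows the revlog but skips decompression on read.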
Moving on to bundle performance:

  $ hg bundle -a -t none-v2 (total CPU time)
  zlib: 102.79s
  zstd: 97.75s

So, marginal CPU decrease for reading all chunks in all revlogs (this is somewhat disappointing).

  $ hg bundle -a -t <engine>-v2 (total CPU time)
  zlib: 191.59s
  zstd: 115.36s

This last test effectively measures the difference between zlib->zlib and zstd->zstd for revlogs to bundle. This is a rough approximation of what a server does during `hg clone`.

There are some promising results for zstd. But not enough for me to feel comfortable advertising it to users. We'll get there...
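To put the benchmark figures on a common scale, their relative changes can be computed directly. The bundle-size and bundle-CPU pairs are labeled zlib/zstd in the text; for the `perfrevlog` runs, treating the first line of each pair as zlib and the second as zstd is an inference from the surrounding prose, not something the output states.

```python
# (zlib, zstd) pairs copied from the benchmarks above. For the perfrevlog
# rows, "first line = zlib" is an inference from the surrounding prose.
measurements = {
    "perfrevlog -c -d 1, wall (s)": (4.812885, 4.699621),
    "perfrevlog -m -d 1000, wall (s)": (34.252800, 24.094999),
    "stream clone bundle size (bytes)": (1638852492, 1680601332),
    "bundle -t none-v2, CPU (s)": (102.79, 97.75),
    "bundle -t <engine>-v2, CPU (s)": (191.59, 115.36),
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


for name, (zlib_value, zstd_value) in measurements.items():
    print(f"{name}: {percent_change(zlib_value, zstd_value):+.1f}%")
```

This prints roughly -2.4% and -29.7% for the changelog and manifest reads, +2.5% for the stream clone bundle size, and -4.9% and -39.8% CPU for the two bundle runs, which quantifies the "modest wins for the changelog" versus the much faster manifest reading.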

Gregory Szorc

File last commit: r30766:d7bf7d2b (default)
Revision: r30818:4c0a5a25 (default)

mercurial/hgweb/common.py · 222 lines · 7.2 KiB · text/x-python
          
                    History
                
                 |
                  Annotation
                 | Raw
                 |Copy content
                 |Copy permalink

# hgweb/common.py - Utility functions needed by hgweb_mod and hgwebdir_mod
#
# Copyright 21 May 2005 - (c) 2005 Jake Edge <jake@edge2.net>
# Copyright 2005, 2006 Matt Mackall <mpm@selenic.com>
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.

from __future__ import absolute_import

import base64
import errno
import mimetypes
import os
import uuid

from .. import (
    encoding,
    pycompat,
    util,
)

httpserver = util.httpserver

HTTP_OK = 200
HTTP_NOT_MODIFIED = 304
HTTP_BAD_REQUEST = 400
HTTP_UNAUTHORIZED = 401
HTTP_FORBIDDEN = 403
HTTP_NOT_FOUND = 404
HTTP_METHOD_NOT_ALLOWED = 405
HTTP_SERVER_ERROR = 500

def ismember(ui, username, userlist):
    """Check if username is a member of userlist.

    If userlist has a single '*' member, all users are considered members.
    Can be overridden by extensions to provide more complex authorization
    schemes.
    """
    return userlist == ['*'] or username in userlist

def checkauthz(hgweb, req, op):
    '''Check permission for operation based on request data (including
    authentication info). Return if op allowed, else raise an ErrorResponse
    exception.'''

    user = req.env.get('REMOTE_USER')

    deny_read = hgweb.configlist('web', 'deny_read')
    if deny_read and (not user or ismember(hgweb.repo.ui, user, deny_read)):
        raise ErrorResponse(HTTP_UNAUTHORIZED, 'read not authorized')

    allow_read = hgweb.configlist('web', 'allow_read')
    if allow_read and (not ismember(hgweb.repo.ui, user, allow_read)):
        raise ErrorResponse(HTTP_UNAUTHORIZED, 'read not authorized')

    if op == 'pull' and not hgweb.allowpull:
        raise ErrorResponse(HTTP_UNAUTHORIZED, 'pull not authorized')
    elif op == 'pull' or op is None: # op is None for interface requests
        return

    # enforce that you can only push using POST requests
    if req.env['REQUEST_METHOD'] != 'POST':
        msg = 'push requires POST request'
        raise ErrorResponse(HTTP_METHOD_NOT_ALLOWED, msg)

    # require ssl by default for pushing, auth info cannot be sniffed
    # and replayed
    scheme = req.env.get('wsgi.url_scheme')
    if hgweb.configbool('web', 'push_ssl', True) and scheme != 'https':
        raise ErrorResponse(HTTP_FORBIDDEN, 'ssl required')

    deny = hgweb.configlist('web', 'deny_push')
    if deny and (not user or ismember(hgweb.repo.ui, user, deny)):
        raise ErrorResponse(HTTP_UNAUTHORIZED, 'push not authorized')

    allow = hgweb.configlist('web', 'allow_push')
    if not (allow and ismember(hgweb.repo.ui, user, allow)):
        raise ErrorResponse(HTTP_UNAUTHORIZED, 'push not authorized')

# Hooks for hgweb permission checks; extensions can add hooks here.
# Each hook is invoked like this: hook(hgweb, request, operation),
# where operation is either read, pull or push. Hooks should either
# raise an ErrorResponse exception, or just return.
#
# It is possible to do both authentication and authorization through
# this.
permhooks = [checkauthz]

class ErrorResponse(Exception):
    def __init__(self, code, message=None, headers=[]):
        if message is None:
            message = _statusmessage(code)
        Exception.__init__(self, message)
        self.code = code
        self.headers = headers

class continuereader(object):
    def __init__(self, f, write):
        self.f = f
        self._write = write
        self.continued = False

    def read(self, amt=-1):
        if not self.continued:
            self.continued = True
            self._write('HTTP/1.1 100 Continue\r\n\r\n')
        return self.f.read(amt)

    def __getattr__(self, attr):
        if attr in ('close', 'readline', 'readlines', '__iter__'):
            return getattr(self.f, attr)
        raise AttributeError

def _statusmessage(code):
    responses = httpserver.basehttprequesthandler.responses
    return responses.get(code, ('Error', 'Unknown error'))[0]

def statusmessage(code, message=None):
    return '%d %s' % (code, message or _statusmessage(code))

def get_stat(spath, fn):
    """stat fn if it exists, spath otherwise"""
    cl_path = os.path.join(spath, fn)
    if os.path.exists(cl_path):
        return os.stat(cl_path)
    else:
        return os.stat(spath)

def get_mtime(spath):
    return get_stat(spath, "00changelog.i").st_mtime

def staticfile(directory, fname, req):
    """return a file inside directory with guessed Content-Type header

    fname always uses '/' as directory separator and isn't allowed to
    contain unusual path components.
    Content-Type is guessed using the mimetypes module.
    If fname is illegal, return without responding; if the file is not
    found or cannot be read, raise ErrorResponse.

    """
    parts = fname.split('/')
    for part in parts:
        if (part in ('', os.curdir, os.pardir) or
            pycompat.ossep in part or
            pycompat.osaltsep is not None and pycompat.osaltsep in part):
            return
    fpath = os.path.join(*parts)
    if isinstance(directory, str):
        directory = [directory]
    for d in directory:
        path = os.path.join(d, fpath)
        if os.path.exists(path):
            break
    try:
        os.stat(path)
        ct = mimetypes.guess_type(path)[0] or "text/plain"
        fp = open(path, 'rb')
        data = fp.read()
        fp.close()
        req.respond(HTTP_OK, ct, body=data)
    except TypeError:
        raise ErrorResponse(HTTP_SERVER_ERROR, 'illegal filename')
    except OSError as err:
        if err.errno == errno.ENOENT:
            raise ErrorResponse(HTTP_NOT_FOUND)
        else:
            raise ErrorResponse(HTTP_SERVER_ERROR, err.strerror)

def paritygen(stripecount, offset=0):
    """count parity of horizontal stripes for easier reading"""
    if stripecount and offset:
        # account for offset, e.g. due to building the list in reverse
        count = (stripecount + offset) % stripecount
        # floor division keeps the stripe index an int, as '& 1' requires
        # (same result as '/' on ints under Python 2)
        parity = (stripecount + offset) // stripecount & 1
    else:
        count = 0
        parity = 0
    while True:
        yield parity
        count += 1
        if stripecount and count >= stripecount:
            parity = 1 - parity
            count = 0

def get_contact(config):
    """Return repo contact information or empty string.

    web.contact is the primary source, but if that is not set, try
    ui.username or $EMAIL as a fallback to display something useful.
    """
    return (config("web", "contact") or
            config("ui", "username") or
            encoding.environ.get("EMAIL") or "")

def caching(web, req):
    tag = 'W/"%s"' % web.mtime
    if req.env.get('HTTP_IF_NONE_MATCH') == tag:
        raise ErrorResponse(HTTP_NOT_MODIFIED)
    req.headers.append(('ETag', tag))

def cspvalues(ui):
    """Obtain the Content-Security-Policy header and nonce value.

    Returns a 2-tuple of the CSP header value and the nonce value.

    First value is ``None`` if CSP isn't enabled. Second value is ``None``
    if CSP isn't enabled or if the CSP header doesn't need a nonce.
    """
    # Don't allow an untrusted CSP setting, since it could disable
    # protections from a trusted/global source.
    csp = ui.config('web', 'csp', untrusted=False)
    nonce = None

    if csp and '%nonce%' in csp:
        nonce = base64.urlsafe_b64encode(uuid.uuid4().bytes).rstrip('=')
        csp = csp.replace('%nonce%', nonce)

    return csp, nonce
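The order of checks in `checkauthz()` decides which status a client sees, and it can be explored without a running hgweb. The sketch below is a standalone model, not Mercurial code: plain dicts stand in for `hgweb.configlist()`/`configbool()` and `req.env`, and the hypothetical `Denied` exception stands in for `ErrorResponse`. The `[web]` option names and status codes are taken from the listing above.

```python
class Denied(Exception):
    """Stand-in for ErrorResponse: carries an HTTP status code."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def ismember(username, userlist):
    # A lone '*' entry admits every user, as in the listing's ismember().
    return userlist == ["*"] or username in userlist


def check(config, env, op, allowpull=True):
    """Apply checkauthz()'s rules in the same order, using plain dicts."""
    user = env.get("REMOTE_USER")
    deny_read = config.get("deny_read", [])
    if deny_read and (not user or ismember(user, deny_read)):
        raise Denied(401, "read not authorized")
    allow_read = config.get("allow_read", [])
    if allow_read and not ismember(user, allow_read):
        raise Denied(401, "read not authorized")
    if op == "pull" and not allowpull:
        raise Denied(401, "pull not authorized")
    elif op == "pull" or op is None:
        return  # reads and pulls stop here
    if env.get("REQUEST_METHOD") != "POST":
        raise Denied(405, "push requires POST request")
    if config.get("push_ssl", True) and env.get("wsgi.url_scheme") != "https":
        raise Denied(403, "ssl required")
    deny = config.get("deny_push", [])
    if deny and (not user or ismember(user, deny)):
        raise Denied(401, "push not authorized")
    allow = config.get("allow_push", [])
    if not (allow and ismember(user, allow)):
        raise Denied(401, "push not authorized")


def outcome(config, env, op):
    """Return "ok" or the status code check() raises."""
    try:
        check(config, env, op)
        return "ok"
    except Denied as err:
        return err.code


push_env = {"REQUEST_METHOD": "POST", "wsgi.url_scheme": "https",
            "REMOTE_USER": "alice"}
print(outcome({}, {}, "pull"))                                       # ok
print(outcome({}, {**push_env, "wsgi.url_scheme": "http"}, "push"))  # 403
print(outcome({}, push_env, "push"))                                 # 401
print(outcome({"allow_push": ["*"]}, push_env, "push"))              # ok
```

The third case is the one that tends to surprise: with `push_ssl` satisfied but `web.allow_push` unset, the push is still refused, because an empty allow list grants nothing.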

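The path check at the top of `staticfile()` is what keeps a request from escaping the static directory. Here is a minimal standalone sketch of just that validation step. The name `safe_parts` is hypothetical, and it uses `os.sep`/`os.altsep` in place of `pycompat.ossep`/`pycompat.osaltsep`, which this sketch assumes behave the same way for plain strings.

```python
import os


def safe_parts(fname, sep=os.sep, altsep=os.altsep):
    """Validate a '/'-separated name the way staticfile() does.

    Returns the list of path components, or None if any component is
    empty, '.', '..', or contains a native path separator.
    """
    parts = fname.split("/")
    for part in parts:
        if (part in ("", os.curdir, os.pardir)
                or sep in part
                or (altsep is not None and altsep in part)):
            return None
    return parts


print(safe_parts("static/style.css"))   # ['static', 'style.css']
print(safe_parts("../../etc/passwd"))   # None
print(safe_parts("/etc/passwd"))        # None (leading '/' yields an empty part)
print(safe_parts("a\\b", sep="\\"))     # None with a Windows-style separator
```

Rejecting empty components also rules out absolute paths and doubled slashes, and checking the native separators inside each component blocks backslash tricks on Windows, where `os.path.join` would otherwise treat them as directory breaks.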

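The nonce step of `cspvalues()` can be tried on its own. The listing targets Python 2, where `base64` returns `str`; this sketch is a Python 3 rendition, so it strips `b"="` and decodes the result. The function name `csp_with_nonce` and the example policy string are assumptions, and the `ui.config()` lookup is replaced by a plain argument. A fresh random value per request is presumably what lets a policy such as `script-src 'nonce-...'` admit only inline scripts stamped with that value.

```python
import base64
import uuid


def csp_with_nonce(csp):
    """Python 3 rendition of cspvalues()' nonce step (name is hypothetical).

    Returns (header, nonce); nonce is None when the policy has no %nonce%.
    """
    nonce = None
    if csp and "%nonce%" in csp:
        # 16 random bytes -> 24 base64 chars; dropping '=' padding leaves 22.
        raw = base64.urlsafe_b64encode(uuid.uuid4().bytes).rstrip(b"=")
        nonce = raw.decode("ascii")
        csp = csp.replace("%nonce%", nonce)
    return csp, nonce


header, nonce = csp_with_nonce("script-src 'nonce-%nonce%'")
print(header)                                 # script-src 'nonce-<22 random chars>'
print(csp_with_nonce("default-src 'self'"))   # ("default-src 'self'", None)
```

Using the URL-safe alphabet keeps the nonce free of `+` and `/`, so it can be pasted into a header or an HTML attribute without further escaping.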