upstream/mercurial-mirror Files · mercurial/hgweb/protocol.py

localrepo: experimental support for non-zlib revlog compression

The final part of integrating the compression manager APIs into revlog storage is the plumbing for repositories to advertise they are using non-zlib storage and for revlogs to instantiate a non-zlib compression engine.

The main intent of the compression manager work was to zstd all of the things. Adding zstd to revlogs has proved to be more involved than other places because revlogs are... special. Very small inputs and the use of delta chains (which are themselves a form of compression) are a completely different use case from streaming compression, which bundles and the wire protocol employ. I've conducted numerous experiments with zstd in revlogs and have yet to formalize compression settings and a storage architecture that I'm confident I won't regret later. In other words, I'm not yet ready to commit to a new mechanism for using zstd - or any other compression format - in revlogs.

That being said, having some support for zstd (and other compression formats) in revlogs in core is beneficial. It can allow others to conduct experiments.

This patch introduces *highly experimental* support for non-zlib compression formats in revlogs. Introduced is a config option to control which compression engine to use. Also introduced is a namespace of "exp-compression-*" requirements to denote support for non-zlib compression in revlogs. I've prefixed the namespace with "exp-" (short for "experimental") because I'm not confident of the requirements "schema" and in no way want to give the illusion of supporting these requirements in the future. I fully intend to drop support for these requirements once we figure out what we're doing with zstd in revlogs.

A good portion of the patch is teaching the requirements system about registered compression engines and passing the requested compression engine as an opener option so revlogs can instantiate the proper compression engine for new operations.

That's a verbose way of saying "we can now use zstd in revlogs!"

On an `hg pull` conversion of the mozilla-unified repo with no extra redelta settings (like aggressivemergedeltas), we can see the impact of zstd vs zlib in revlogs:

    $ hg perfrevlogchunks -c
    ! chunk
    ! wall 2.032052 comb 2.040000 user 1.990000 sys 0.050000 (best of 5)
    ! wall 1.866360 comb 1.860000 user 1.820000 sys 0.040000 (best of 6)
    ! chunk batch
    ! wall 1.877261 comb 1.870000 user 1.860000 sys 0.010000 (best of 6)
    ! wall 1.705410 comb 1.710000 user 1.690000 sys 0.020000 (best of 6)

    $ hg perfrevlogchunks -m
    ! chunk
    ! wall 2.721427 comb 2.720000 user 2.640000 sys 0.080000 (best of 4)
    ! wall 2.035076 comb 2.030000 user 1.950000 sys 0.080000 (best of 5)
    ! chunk batch
    ! wall 2.614561 comb 2.620000 user 2.580000 sys 0.040000 (best of 4)
    ! wall 1.910252 comb 1.910000 user 1.880000 sys 0.030000 (best of 6)

    $ hg perfrevlog -c -d 1
    ! wall 4.812885 comb 4.820000 user 4.800000 sys 0.020000 (best of 3)
    ! wall 4.699621 comb 4.710000 user 4.700000 sys 0.010000 (best of 3)

    $ hg perfrevlog -m -d 1000
    ! wall 34.252800 comb 34.250000 user 33.730000 sys 0.520000 (best of 3)
    ! wall 24.094999 comb 24.090000 user 23.320000 sys 0.770000 (best of 3)

Only modest wins for the changelog. But manifest reading is significantly faster. What's going on?

One reason might be data volume. zstd decompresses faster. So given more bytes, it will put more distance between it and zlib.

Another reason is size. In the current design, zstd revlogs are *larger*:

    debugcreatestreamclonebundle (size in bytes)
    zlib: 1,638,852,492
    zstd: 1,680,601,332

I haven't investigated this fully, but I reckon a significant cause of larger revlogs is that the zstd frame/header has more bytes than zlib's. For very small inputs or data that doesn't compress well, we'll tend to store more uncompressed chunks than with zlib (because the compressed size isn't smaller than original). This will make revlog reading faster because it is doing less decompression.
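
The "store it uncompressed when compression doesn't help" behaviour described above can be illustrated with a minimal sketch. This is not Mercurial's revlog code: the function names and the one-byte `u`/`c` markers are hypothetical, and zlib from the Python standard library stands in for whichever engine is configured.

```python
import zlib


def store_chunk(data: bytes) -> bytes:
    """Return a stored form of ``data``, compressed only when that is smaller.

    The one-byte markers are hypothetical, chosen for this sketch:
    b"u" prefixes raw bytes, b"c" prefixes a zlib stream.
    """
    compressed = zlib.compress(data)
    if len(compressed) < len(data):
        return b"c" + compressed
    # Compression did not shrink the input, so keep the original bytes.
    return b"u" + data


def load_chunk(stored: bytes) -> bytes:
    """Invert store_chunk(): decompress only chunks marked as compressed."""
    marker, payload = stored[:1], stored[1:]
    if marker == b"c":
        return zlib.decompress(payload)
    return payload


tiny = b"abc"                          # header overhead exceeds any savings
repetitive = b"manifest entry\n" * 200  # highly redundant, compresses well

for sample in (tiny, repetitive):
    stored = store_chunk(sample)
    assert load_chunk(stored) == sample
    print(stored[:1], len(sample), "->", len(stored) - 1)
```

A format with a larger per-frame header crosses the "not smaller than the original" threshold for more inputs, so more chunks take the uncompressed path and reading them skips decompression entirely.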

Moving on to bundle performance:

    $ hg bundle -a -t none-v2 (total CPU time)
    zlib: 102.79s
    zstd:  97.75s

So, marginal CPU decrease for reading all chunks in all revlogs (this is somewhat disappointing).

    $ hg bundle -a -t <engine>-v2 (total CPU time)
    zlib: 191.59s
    zstd: 115.36s

This last test effectively measures the difference between zlib->zlib and zstd->zstd for revlogs to bundle. This is a rough approximation of what a server does during `hg clone`.

There are some promising results for zstd. But not enough for me to feel comfortable advertising it to users. We'll get there...

Gregory Szorc

File last commit: r30764:e75463e3 default

r30818:4c0a5a25 default

protocol.py | 198 lines | 6.5 KiB | text/x-python

      #
      # Copyright 21 May 2005 - (c) 2005 Jake Edge <jake@edge2.net>
      # Copyright 2005-2007 Matt Mackall <mpm@selenic.com>
      #
      # This software may be used and distributed according to the terms of the
      # GNU General Public License version 2 or any later version.

      from __future__ import absolute_import

      import cgi
      import struct

      from .common import (
          HTTP_OK,
      )

      from .. import (
          util,
          wireproto,
      )
      stringio = util.stringio

      urlerr = util.urlerr
      urlreq = util.urlreq

      HGTYPE = 'application/mercurial-0.1'
      HGTYPE2 = 'application/mercurial-0.2'
      HGERRTYPE = 'application/hg-error'

      def decodevaluefromheaders(req, headerprefix):
          """Decode a long value from multiple HTTP request headers."""
          chunks = []
          i = 1
          while True:
              v = req.env.get('HTTP_%s_%d' % (
                  headerprefix.upper().replace('-', '_'), i))
              if v is None:
                  break
              chunks.append(v)
              i += 1

          return ''.join(chunks)

      class webproto(wireproto.abstractserverproto):
          def __init__(self, req, ui):
              self.req = req
              self.response = ''
              self.ui = ui
              self.name = 'http'

          def getargs(self, args):
              knownargs = self._args()
              data = {}
              keys = args.split()
              for k in keys:
                  if k == '*':
                      star = {}
                      for key in knownargs.keys():
                          if key != 'cmd' and key not in keys:
                              star[key] = knownargs[key][0]
                      data['*'] = star
                  else:
                      data[k] = knownargs[k][0]
              return [data[k] for k in keys]
          def _args(self):
              args = self.req.form.copy()
              postlen = int(self.req.env.get('HTTP_X_HGARGS_POST', 0))
              if postlen:
                  args.update(cgi.parse_qs(
                      self.req.read(postlen), keep_blank_values=True))
                  return args

              argvalue = decodevaluefromheaders(self.req, 'X-HgArg')
              args.update(cgi.parse_qs(argvalue, keep_blank_values=True))
              return args
          def getfile(self, fp):
              length = int(self.req.env['CONTENT_LENGTH'])
              for s in util.filechunkiter(self.req, limit=length):
                  fp.write(s)
          def redirect(self):
              self.oldio = self.ui.fout, self.ui.ferr
              self.ui.ferr = self.ui.fout = stringio()
          def restore(self):
              val = self.ui.fout.getvalue()
              self.ui.ferr, self.ui.fout = self.oldio
              return val

          def _client(self):
              return 'remote:%s:%s:%s' % (
                  self.req.env.get('wsgi.url_scheme') or 'http',
                  urlreq.quote(self.req.env.get('REMOTE_HOST', '')),
                  urlreq.quote(self.req.env.get('REMOTE_USER', '')))

          def responsetype(self, v1compressible=False):
              """Determine the appropriate response type and compression settings.

              The ``v1compressible`` argument states whether the response with
              application/mercurial-0.1 media types should be zlib compressed.

              Returns a tuple of (mediatype, compengine, engineopts).
              """
              # For now, if it isn't compressible in the old world, it's never
              # compressible. We can change this to send uncompressed 0.2 payloads
              # later.
              if not v1compressible:
                  return HGTYPE, None, None

              # Determine the response media type and compression engine based
              # on the request parameters.
              protocaps = decodevaluefromheaders(self.req, 'X-HgProto').split(' ')

              if '0.2' in protocaps:
                  # Default as defined by wire protocol spec.
                  compformats = ['zlib', 'none']
                  for cap in protocaps:
                      if cap.startswith('comp='):
                          compformats = cap[5:].split(',')
                          break

                  # Now find an agreed upon compression format.
                  for engine in wireproto.supportedcompengines(self.ui, self,
                                                               util.SERVERROLE):
                      if engine.wireprotosupport().name in compformats:
                          opts = {}
                          level = self.ui.configint('server',
                                                    '%slevel' % engine.name())
                          if level is not None:
                              opts['level'] = level

                          return HGTYPE2, engine, opts

                  # No mutually supported compression format. Fall back to the
                  # legacy protocol.

              # Don't allow untrusted settings because disabling compression or
              # setting a very high compression level could lead to flooding
              # the server's network or CPU.
              opts = {'level': self.ui.configint('server', 'zliblevel', -1)}
              return HGTYPE, util.compengines['zlib'], opts

      def iscmd(cmd):
          return cmd in wireproto.commands

      def call(repo, req, cmd):
          p = webproto(req, repo.ui)

          def genversion2(gen, compress, engine, engineopts):
              # application/mercurial-0.2 always sends a payload header
              # identifying the compression engine.
              name = engine.wireprotosupport().name
              assert 0 < len(name) < 256
              yield struct.pack('B', len(name))
              yield name

              if compress:
                  for chunk in engine.compressstream(gen, opts=engineopts):
                      yield chunk
              else:
                  for chunk in gen:
                      yield chunk

          rsp = wireproto.dispatch(repo, p, cmd)
          if isinstance(rsp, str):
              req.respond(HTTP_OK, HGTYPE, body=rsp)
              return []
          elif isinstance(rsp, wireproto.streamres):
              if rsp.reader:
                  gen = iter(lambda: rsp.reader.read(32768), '')
              else:
                  gen = rsp.gen

              # This code for compression should not be streamres specific. It
              # is here because we only compress streamres at the moment.
              mediatype, engine, engineopts = p.responsetype(rsp.v1compressible)

              if mediatype == HGTYPE and rsp.v1compressible:
                  gen = engine.compressstream(gen, engineopts)
              elif mediatype == HGTYPE2:
                  gen = genversion2(gen, rsp.v1compressible, engine, engineopts)

              req.respond(HTTP_OK, mediatype)
              return gen
          elif isinstance(rsp, wireproto.pushres):
              val = p.restore()
              rsp = '%d\n%s' % (rsp.res, val)
              req.respond(HTTP_OK, HGTYPE, body=rsp)
              return []
          elif isinstance(rsp, wireproto.pusherr):
              # drain the incoming bundle
              req.drain()
              p.restore()
              rsp = '0\n%s\n' % rsp.res
              req.respond(HTTP_OK, HGTYPE, body=rsp)
              return []
          elif isinstance(rsp, wireproto.ooberror):
              rsp = rsp.message
              req.respond(HTTP_OK, HGERRTYPE, body=rsp)
              return []
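
Two mechanisms in the file above can be tried in isolation: reassembling a value split across numbered request headers (as `decodevaluefromheaders` does) and the `application/mercurial-0.2` payload header that `genversion2` emits (one length byte, then the compression engine name, then the payload). The following is a standalone Python 3 sketch, not the module's code: it takes a plain WSGI-style dict instead of a request object, `frame_v2` and `parse_v2` are hypothetical names, the header values are made up, and the client-side parsing is an assumption about how such a frame could be read back.

```python
import struct
import zlib


def decode_value_from_headers(environ, header_prefix):
    """Join HTTP_<PREFIX>_1, HTTP_<PREFIX>_2, ... until one is missing."""
    key = header_prefix.upper().replace("-", "_")
    chunks = []
    i = 1
    while True:
        value = environ.get("HTTP_%s_%d" % (key, i))
        if value is None:
            break
        chunks.append(value)
        i += 1
    return "".join(chunks)


def frame_v2(engine_name, payload_chunks):
    """Yield a 0.2-style body: length byte, engine name, then the payload."""
    name = engine_name.encode("ascii")
    assert 0 < len(name) < 256  # the length must fit in a single byte
    yield struct.pack("B", len(name))
    yield name
    for chunk in payload_chunks:
        yield chunk


def parse_v2(body):
    """Split a framed body into (engine name, remaining payload bytes)."""
    (name_len,) = struct.unpack("B", body[:1])
    name = body[1:1 + name_len].decode("ascii")
    return name, body[1 + name_len:]


# Illustrative request environment: one logical value spread over two headers.
environ = {"HTTP_X_HGARG_1": "cmd=getbundle&", "HTTP_X_HGARG_2": "heads=abc"}
print(decode_value_from_headers(environ, "X-HgArg"))

# Frame a zlib-compressed payload, then read the engine name back out.
body = b"".join(frame_v2("zlib", [zlib.compress(b"hello revlog")]))
name, payload = parse_v2(body)
print(name, zlib.decompress(payload))
```

Because the engine name travels in the payload header, a receiver can pick a decompressor from the body itself rather than from the media type, which is what lets the 0.2 media type carry engines other than zlib.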
