localrepo: experimental support for non-zlib revlog compression

The final part of integrating the compression manager APIs into revlog storage is the plumbing for repositories to advertise they are using non-zlib storage and for revlogs to instantiate a non-zlib compression engine.

The main intent of the compression manager work was to zstd all of the things. Adding zstd to revlogs has proved to be more involved than other places because revlogs are... special. Very small inputs and the use of delta chains (which are themselves a form of compression) are a completely different use case from streaming compression, which bundles and the wire protocol employ. I've conducted numerous experiments with zstd in revlogs and have yet to formalize compression settings and a storage architecture that I'm confident I won't regret later. In other words, I'm not yet ready to commit to a new mechanism for using zstd - or any other compression format - in revlogs.

That being said, having some support for zstd (and other compression formats) in revlogs in core is beneficial. It can allow others to conduct experiments.

This patch introduces *highly experimental* support for non-zlib compression formats in revlogs. Introduced is a config option to control which compression engine to use. Also introduced is a namespace of "exp-compression-*" requirements to denote support for non-zlib compression in revlogs. I've prefixed the namespace with "exp-" (short for "experimental") because I'm not confident of the requirements "schema" and in no way want to give the illusion of supporting these requirements in the future. I fully intend to drop support for these requirements once we figure out what we're doing with zstd in revlogs.

A good portion of the patch is teaching the requirements system about registered compression engines and passing the requested compression engine as an opener option so revlogs can instantiate the proper compression engine for new operations.

That's a verbose way of saying "we can now use zstd in revlogs!"

On an `hg pull` conversion of the mozilla-unified repo with no extra redelta settings (like aggressivemergedeltas), we can see the impact of zstd vs zlib in revlogs:

  $ hg perfrevlogchunks -c
  ! chunk
  ! wall 2.032052 comb 2.040000 user 1.990000 sys 0.050000 (best of 5)
  ! wall 1.866360 comb 1.860000 user 1.820000 sys 0.040000 (best of 6)
  ! chunk batch
  ! wall 1.877261 comb 1.870000 user 1.860000 sys 0.010000 (best of 6)
  ! wall 1.705410 comb 1.710000 user 1.690000 sys 0.020000 (best of 6)

  $ hg perfrevlogchunks -m
  ! chunk
  ! wall 2.721427 comb 2.720000 user 2.640000 sys 0.080000 (best of 4)
  ! wall 2.035076 comb 2.030000 user 1.950000 sys 0.080000 (best of 5)
  ! chunk batch
  ! wall 2.614561 comb 2.620000 user 2.580000 sys 0.040000 (best of 4)
  ! wall 1.910252 comb 1.910000 user 1.880000 sys 0.030000 (best of 6)

  $ hg perfrevlog -c -d 1
  ! wall 4.812885 comb 4.820000 user 4.800000 sys 0.020000 (best of 3)
  ! wall 4.699621 comb 4.710000 user 4.700000 sys 0.010000 (best of 3)

  $ hg perfrevlog -m -d 1000
  ! wall 34.252800 comb 34.250000 user 33.730000 sys 0.520000 (best of 3)
  ! wall 24.094999 comb 24.090000 user 23.320000 sys 0.770000 (best of 3)

Only modest wins for the changelog. But manifest reading is significantly faster. What's going on?

One reason might be data volume. zstd decompresses faster. So given more bytes, it will put more distance between it and zlib.

Another reason is size. In the current design, zstd revlogs are *larger*:

  debugcreatestreamclonebundle (size in bytes)
  zlib: 1,638,852,492
  zstd: 1,680,601,332

I haven't investigated this fully, but I reckon a significant cause of larger revlogs is that the zstd frame/header has more bytes than zlib's. For very small inputs or data that doesn't compress well, we'll tend to store more uncompressed chunks than with zlib (because the compressed size isn't smaller than original). This will make revlog reading faster because it is doing less decompression.

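The "store the chunk uncompressed when compression does not shrink it" behaviour described above can be sketched roughly as follows. This is only an illustration using Python's standard zlib module, not Mercurial's revlog code: the function names and the one-byte `u`/`c` markers are hypothetical and chosen for this example.

```python
import zlib


def store_chunk(data: bytes) -> bytes:
    """Return a stored form of ``data`` behind a hypothetical one-byte marker.

    b"c" + compressed bytes when compression actually saves space,
    b"u" + raw bytes otherwise (fixed header overhead dominates tiny inputs).
    """
    compressed = zlib.compress(data)
    if len(compressed) < len(data):
        return b"c" + compressed
    return b"u" + data


def load_chunk(stored: bytes) -> bytes:
    """Invert store_chunk(); raw chunks skip decompression entirely."""
    marker, payload = stored[:1], stored[1:]
    if marker == b"c":
        return zlib.decompress(payload)
    return payload
```

A format with a larger per-frame header would cross the "not smaller than the original" threshold more often on very small inputs, which is consistent with the guess above about why more chunks end up stored raw.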
Moving on to bundle performance:

  $ hg bundle -a -t none-v2 (total CPU time)
  zlib: 102.79s
  zstd: 97.75s

So, marginal CPU decrease for reading all chunks in all revlogs (this is somewhat disappointing).

  $ hg bundle -a -t <engine>-v2 (total CPU time)
  zlib: 191.59s
  zstd: 115.36s

This last test effectively measures the difference between zlib->zlib and zstd->zstd for revlogs to bundle. This is a rough approximation of what a server does during `hg clone`.

There are some promising results for zstd. But not enough for me to feel comfortable advertising it to users. We'll get there...

File last commit:

r27046:37fcfe52 default
r30818:4c0a5a25 default
request.py
152 lines | 5.3 KiB | text/x-python | PythonLexer
Eric Hopper
Fixing up comment headers for split up code.
r2391 # hgweb/request.py - An http request from either CGI or the standalone server.
Eric Hopper
Splitting up hgweb so it's easier to change.
r2355 #
# Copyright 21 May 2005 - (c) 2005 Jake Edge <jake@edge2.net>
Vadim Gelfer
update copyrights.
r2859 # Copyright 2005, 2006 Matt Mackall <mpm@selenic.com>
Eric Hopper
Splitting up hgweb so it's easier to change.
r2355 #
Martin Geisler
updated license to be explicit about GPL version 2
r8225 # This software may be used and distributed according to the terms of the
Matt Mackall
Update license to GPLv2+
r10263 # GNU General Public License version 2 or any later version.
Eric Hopper
Splitting up hgweb so it's easier to change.
r2355
Yuya Nishihara
hgweb: use absolute_import
r27046 from __future__ import absolute_import
import cgi
import errno
import socket
from .common import (
    ErrorResponse,
    HTTP_NOT_MODIFIED,
    statusmessage,
)
from .. import (
    util,
)
Eric Hopper
Splitting up hgweb so it's easier to change.
r2355
Dirkjan Ochtman
hgweb: move shortcut expansion to request instantiation
r6774 shortcuts = {
    'cl': [('cmd', ['changelog']), ('rev', None)],
    'sl': [('cmd', ['shortlog']), ('rev', None)],
    'cs': [('cmd', ['changeset']), ('node', None)],
    'f': [('cmd', ['file']), ('filenode', None)],
    'fl': [('cmd', ['filelog']), ('filenode', None)],
    'fd': [('cmd', ['filediff']), ('node', None)],
    'fa': [('cmd', ['annotate']), ('filenode', None)],
    'mf': [('cmd', ['manifest']), ('manifest', None)],
    'ca': [('cmd', ['archive']), ('node', None)],
    'tags': [('cmd', ['tags'])],
    'tip': [('cmd', ['changeset']), ('node', ['tip'])],
    'static': [('cmd', ['static']), ('file', None)]
}
Nicolas Dumazet
hgweb: request: strip() form values...
r10261 def normalize(form):
    # first expand the shortcuts
Dirkjan Ochtman
hgweb: move shortcut expansion to request instantiation
r6774     for k in shortcuts.iterkeys():
        if k in form:
            for name, value in shortcuts[k]:
                if value is None:
                    value = form[k]
                form[name] = value
            del form[k]
Nicolas Dumazet
hgweb: request: strip() form values...
r10261     # And strip the values
    for k, v in form.iteritems():
        form[k] = [i.strip() for i in v]
Dirkjan Ochtman
hgweb: move shortcut expansion to request instantiation
r6774     return form
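For readers on Python 3, the shortcut expansion performed by `normalize` can be approximated with the sketch below. It is not the annotated file's code: `iterkeys()`/`iteritems()` are replaced with Python 3 equivalents, and the `SHORTCUTS` table is a reduced, illustrative subset of the `shortcuts` dict.

```python
# Reduced copy of the shortcut table, for illustration only.
SHORTCUTS = {
    "cs": [("cmd", ["changeset"]), ("node", None)],
    "tip": [("cmd", ["changeset"]), ("node", ["tip"])],
}


def normalize(form):
    """Expand shortcut keys in ``form`` and strip whitespace from all values."""
    # Iterate over a snapshot of the keys because ``form`` is mutated below.
    for key in list(SHORTCUTS):
        if key in form:
            for name, value in SHORTCUTS[key]:
                if value is None:
                    # None means "reuse whatever the shortcut key carried".
                    value = form[key]
                form[name] = value
            del form[key]
    for key, values in list(form.items()):
        form[key] = [v.strip() for v in values]
    return form
```

With this sketch, a query such as `?cs= abc123 ` becomes `{"cmd": ["changeset"], "node": ["abc123"]}`.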
Dirkjan Ochtman
Less indirection in the WSGI web interface. This simplifies some code, and makes it more compliant with WSGI.
r5566 class wsgirequest(object):
Gregory Szorc
hgweb: add some documentation...
r26132     """Higher-level API for a WSGI request.
    WSGI applications are invoked with 2 arguments. They are used to
    instantiate instances of this class, which provides higher-level APIs
    for obtaining request parameters, writing HTTP output, etc.
    """
Dirkjan Ochtman
Less indirection in the WSGI web interface. This simplifies some code, and makes it more compliant with WSGI.
r5566     def __init__(self, wsgienv, start_response):
Eric Hopper
This patch make several WSGI related alterations....
r2506         version = wsgienv['wsgi.version']
Thomas Arendsen Hein
white space and line break cleanups
r3673         if (version < (1, 0)) or (version >= (2, 0)):
Thomas Arendsen Hein
Cleanup of whitespace, indentation and line continuation.
r4633             raise RuntimeError("Unknown and unsupported WSGI version %d.%d"
Eric Hopper
This patch make several WSGI related alterations....
r2506                                % version)
        self.inp = wsgienv['wsgi.input']
        self.err = wsgienv['wsgi.errors']
        self.threaded = wsgienv['wsgi.multithread']
        self.multiprocess = wsgienv['wsgi.multiprocess']
        self.run_once = wsgienv['wsgi.run_once']
        self.env = wsgienv
Nicolas Dumazet
hgweb: request: strip() form values...
r10261         self.form = normalize(cgi.parse(self.inp,
                                        self.env,
                                        keep_blank_values=1))
Dirkjan Ochtman
hgweb: separate out start_response() calling
r5888         self._start_response = start_response
Dirkjan Ochtman
hgweb: explicit response status
r5993         self.server_write = None
Eric Hopper
This patch make several WSGI related alterations....
r2506         self.headers = []
    def __iter__(self):
        return iter([])
Eric Hopper
Splitting up hgweb so it's easier to change.
r2355
Vadim Gelfer
push over http: server support....
r2464     def read(self, count=-1):
        return self.inp.read(count)
Dirkjan Ochtman
hgweb: be sure to drain request data even in early error conditions...
r7180     def drain(self):
        '''need to read all data from request, httplib is half-duplex'''
Dirkjan Ochtman
hgweb: pmezard thinks one default is enough
r13600         length = int(self.env.get('CONTENT_LENGTH') or 0)
Dirkjan Ochtman
hgweb: be sure to drain request data even in early error conditions...
r7180         for s in util.filechunkiter(self.inp, limit=length):
            pass
Mads Kiilerich
hgweb: pass the actual response body to request.response, not just the length...
r18352     def respond(self, status, type, filename=None, body=None):
Dirkjan Ochtman
hgweb: separate out start_response() calling
r5888         if self._start_response is not None:
Mads Kiilerich
hgweb: simplify wsgirequest header handling...
r18348             self.headers.append(('Content-Type', type))
            if filename:
av6
hgweb: replace some str.split() calls by str.partition() or str.rpartition()...
r26846                 filename = (filename.rpartition('/')[-1]
Mads Kiilerich
hgweb: simplify wsgirequest header handling...
r18348                             .replace('\\', '\\\\').replace('"', '\\"'))
                self.headers.append(('Content-Disposition',
                                     'inline; filename="%s"' % filename))
Mads Kiilerich
hgweb: pass the actual response body to request.response, not just the length...
r18352             if body is not None:
                self.headers.append(('Content-Length', str(len(body))))
Dirkjan Ochtman
hgweb: separate out start_response() calling
r5888
Dirkjan Ochtman
hgweb: be sure to send a valid content-type for raw files
r5926             for k, v in self.headers:
                if not isinstance(v, str):
Mads Kiilerich
hgweb: simplify wsgirequest header handling...
r18348                     raise TypeError('header value must be string: %r' % (v,))
Dirkjan Ochtman
hgweb: be sure to send a valid content-type for raw files
r5926
Dirkjan Ochtman
hgweb: separate out start_response() calling
r5888             if isinstance(status, ErrorResponse):
Mads Kiilerich
hgweb: simplify wsgirequest header handling...
r18348                 self.headers.extend(status.headers)
Augie Fackler
hgweb: don't send a body or illegal headers during 304 response...
r12739                 if status.code == HTTP_NOT_MODIFIED:
                    # RFC 2616 Section 10.3.5: 304 Not Modified has cases where
                    # it MUST NOT include any headers other than these and no
                    # body
                    self.headers = [(k, v) for (k, v) in self.headers if
                                    k in ('Date', 'ETag', 'Expires',
                                          'Cache-Control', 'Vary')]
timeless@mozdev.org
hgweb: remove ErrorResponse.message...
r26200                 status = statusmessage(status.code, str(status))
Dirkjan Ochtman
hgweb: explicit response status
r5993             elif status == 200:
                status = '200 Script output follows'
Dirkjan Ochtman
hgweb: separate out start_response() calling
r5888             elif isinstance(status, int):
                status = statusmessage(status)
            self.server_write = self._start_response(status, self.headers)
            self._start_response = None
            self.headers = []
Mads Kiilerich
hgweb: pass the actual response body to request.response, not just the length...
r18352         if body is not None:
            self.write(body)
            self.server_write = None
Dirkjan Ochtman
hgweb: separate out start_response() calling
r5888
Dirkjan Ochtman
hgweb: explicit response status
r5993     def write(self, thing):
Mads Kiilerich
hgweb: don't pass empty response chunks on...
r18351         if thing:
            try:
                self.server_write(thing)
Gregory Szorc
global: mass rewrite to use modern exception syntax...
r25660             except socket.error as inst:
Mads Kiilerich
hgweb: don't pass empty response chunks on...
r18351                 if inst[0] != errno.ECONNRESET:
                    raise
Eric Hopper
Splitting up hgweb so it's easier to change.
r2355
Alexis S. L. Carvalho
avoid _wsgioutputfile <-> _wsgirequest circular reference...
r4246     def writelines(self, lines):
        for line in lines:
            self.write(line)
    def flush(self):
        return None
    def close(self):
        return None
Dirkjan Ochtman
Less indirection in the WSGI web interface. This simplifies some code, and makes it more compliant with WSGI.
r5566 def wsgiapplication(app_maker):
Dirkjan Ochtman
hgweb: return iterable, add deprecation note
r5887     '''For compatibility with old CGI scripts. A plain hgweb() or hgwebdir()
    can and should now be used as a WSGI application.'''
Thomas Arendsen Hein
Removed tabs and trailing whitespace in python files
r5760     application = app_maker()
    def run_wsgi(env, respond):
Dirkjan Ochtman
hgweb: return iterable, add deprecation note
r5887         return application(env, respond)
Thomas Arendsen Hein
Removed tabs and trailing whitespace in python files
r5760     return run_wsgi
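
The "2 arguments" mentioned in the `wsgirequest` docstring are the WSGI environ dict and the `start_response` callable. The sketch below shows that calling convention in isolation, using only the standard library on Python 3. It does not use hgweb: `application` and `call_app` are hypothetical names introduced for this example.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A minimal WSGI callable: receives the environ dict and start_response."""
    body = b"hello from a WSGI app\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app):
    """Invoke ``app`` the way a WSGI server would and capture the response."""
    environ = {}
    setup_testing_defaults(environ)  # fills in wsgi.version, wsgi.input, etc.
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
        # WSGI requires start_response to return a write() callable.
        return lambda data: None

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

`wsgiapplication(app_maker)` above returns a callable with the same `(env, respond)` shape, which is why its docstring notes that a plain `hgweb()` or `hgwebdir()` can be used as the WSGI application directly.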