wireproto: compress data from a generator

Currently, the "getbundle" wire protocol command obtains a generator of data, converts it to a util.chunkbuffer, then converts it back to a generator via the protocol's groupchunks() implementation.

For the SSH protocol, groupchunks() simply reads 4kb chunks then write()s the data to a file descriptor.

For the HTTP protocol, groupchunks() reads 32kb chunks, feeds those into a zlib compressor, emits compressed data as it is available, and that is sent to the WSGI layer, where it is likely turned into HTTP chunked transfer chunks as is or further buffered and turned into a larger chunk.

For both the SSH and HTTP protocols, there is inefficiency from using util.chunkbuffer.

For SSH, emitting consistent 4kb chunks sounds nice. However, the file descriptor it is writing to is almost certainly buffered. That means that a Python .write() probably doesn't translate into exactly what is written to the I/O layer.

For HTTP, we're going through an intermediate layer to zlib compress data. So all util.chunkbuffer is doing is ensuring that the chunks we feed into the zlib compressor are of uniform size. This means more CPU time in Python buffering and emitting chunks in util.chunkbuffer but fewer function calls to zlib.

This patch introduces and implements a new wire protocol abstract method: compresschunks(). It is like groupchunks() except it operates on a generator instead of something with a .read(). The SSH implementation simply proxies chunks. The HTTP implementation uses zlib compression.

To avoid duplicate code, the HTTP groupchunks() has been reimplemented in terms of compresschunks().

To prove this all works, the "getbundle" wire protocol command has been switched to compresschunks(). This removes the util.chunkbuffer from that command. Now, data essentially streams straight from the changegroup emitter to the wire, possibly through a zlib compressor. Generators all the way, baby.

There were slim to no performance changes on the server as measured with the mozilla-central repository. This is likely because CPU time is dominated by reading revlogs, producing the changegroup, and zlib compressing the output stream. Still, this brings us a little closer to our ideal of using generators everywhere.
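
The relationship between the two methods described above can be sketched in standalone Python 3. This is only an illustration under stated assumptions, not the Mercurial code: `compress_chunks`, `group_chunks`, and `changegroup_emitter` are hypothetical names, the compression level is passed as a plain argument rather than read from configuration, and the emitter is a stand-in that merely yields byte strings of uneven size.

```python
import io
import zlib


def compress_chunks(chunks, level=-1):
    """Yield zlib-compressed data for any iterable of byte chunks."""
    z = zlib.compressobj(level)
    for chunk in chunks:
        data = z.compress(chunk)
        # compress() may buffer internally and return b''; skip empty output.
        if data:
            yield data
    # Emit whatever the compressor is still holding.
    yield z.flush()


def group_chunks(fh, size=32768):
    """Adapt a file-like object to compress_chunks() via fixed-size reads."""
    def read_chunks():
        while True:
            chunk = fh.read(size)
            if not chunk:
                break
            yield chunk
    return compress_chunks(read_chunks())


def changegroup_emitter():
    # Hypothetical stand-in for the changegroup generator: uneven chunk sizes.
    yield b"a" * 10
    yield b"b" * 100000
    yield b"c" * 3


original = b"".join(changegroup_emitter())

# Generator path: no intermediate buffer that re-slices the data.
from_generator = b"".join(compress_chunks(changegroup_emitter()))

# File-object path: same compressor, fed by 32768-byte reads.
from_file = b"".join(group_chunks(io.BytesIO(original)))
```

Both paths decompress to the same bytes. The difference is only in how the input is sliced before it reaches the compressor, which is the overhead the commit message attributes to util.chunkbuffer.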

File last commit:

r30206:d1051954 default
protocol.py
133 lines | 4.0 KiB | text/x-python | PythonLexer
Dirkjan Ochtman
separate the wire protocol commands from the user interface commands
r5598
#
# Copyright 21 May 2005 - (c) 2005 Jake Edge <jake@edge2.net>
# Copyright 2005-2007 Matt Mackall <mpm@selenic.com>
#
Martin Geisler
updated license to be explicit about GPL version 2
r8225
# This software may be used and distributed according to the terms of the
Matt Mackall
Update license to GPLv2+
r10263
# GNU General Public License version 2 or any later version.
Dirkjan Ochtman
separate the wire protocol commands from the user interface commands
r5598
Yuya Nishihara
hgweb: use absolute_import
r27046
from __future__ import absolute_import
import cgi
import zlib
from .common import (
    HTTP_OK,
)
from .. import (
    util,
    wireproto,
)
timeless
pycompat: switch to util.stringio for py3 compat
r28861
stringio = util.stringio
Dirkjan Ochtman
hgweb: explicitly check if requested command exists
r5963
timeless
pycompat: switch to util.urlreq/util.urlerr for py3 compat
r28883
urlerr = util.urlerr
urlreq = util.urlreq
Dirkjan Ochtman
hgweb: explicit response status
r5993
HGTYPE = 'application/mercurial-0.1'
Andrew Pritchard
wireproto: add out-of-band error class to allow remote repo to report errors...
r15017
HGERRTYPE = 'application/hg-error'
Matt Mackall
protocol: move hgweb protocol support back into protocol.py...
r11595
Pierre-Yves David
wireproto: introduce an abstractserverproto class...
r20903
class webproto(wireproto.abstractserverproto):
Idan Kamara
ui: use I/O descriptors internally...
r14614
    def __init__(self, req, ui):
Matt Mackall
protocol: move hgweb protocol support back into protocol.py...
r11595
        self.req = req
        self.response = ''
Idan Kamara
ui: use I/O descriptors internally...
r14614
        self.ui = ui
Matt Mackall
protocol: move hgweb protocol support back into protocol.py...
r11595
    def getargs(self, args):
Steven Brown
httprepo: long arguments support (issue2126)...
r14093
        knownargs = self._args()
Matt Mackall
protocol: move hgweb protocol support back into protocol.py...
r11595
        data = {}
        keys = args.split()
        for k in keys:
            if k == '*':
                star = {}
Steven Brown
httprepo: long arguments support (issue2126)...
r14093
                for key in knownargs.keys():
Peter Arrenbrecht
wireproto: fix handling of '*' args for HTTP and SSH
r13721
                    if key != 'cmd' and key not in keys:
Steven Brown
httprepo: long arguments support (issue2126)...
r14093
                        star[key] = knownargs[key][0]
Matt Mackall
protocol: move hgweb protocol support back into protocol.py...
r11595
                data['*'] = star
            else:
Steven Brown
httprepo: long arguments support (issue2126)...
r14093
                data[k] = knownargs[k][0]
Matt Mackall
protocol: move hgweb protocol support back into protocol.py...
r11595
        return [data[k] for k in keys]
Steven Brown
httprepo: long arguments support (issue2126)...
r14093
    def _args(self):
        args = self.req.form.copy()
Augie Fackler
http: support sending hgargs via POST body instead of in GET or headers...
r28530
        postlen = int(self.req.env.get('HTTP_X_HGARGS_POST', 0))
        if postlen:
            args.update(cgi.parse_qs(
                self.req.read(postlen), keep_blank_values=True))
            return args
Steven Brown
httprepo: long arguments support (issue2126)...
r14093
        chunks = []
Matt Mackall
http: minor tweaks to long arg handling...
r14094
        i = 1
Martin Geisler
check-code: flag 0/1 used as constant Boolean expression
r14494
        while True:
Matt Mackall
http: minor tweaks to long arg handling...
r14094
            h = self.req.env.get('HTTP_X_HGARG_' + str(i))
Steven Brown
httprepo: long arguments support (issue2126)...
r14093
            if h is None:
                break
            chunks += [h]
Matt Mackall
http: minor tweaks to long arg handling...
r14094
            i += 1
Steven Brown
httprepo: long arguments support (issue2126)...
r14093
        args.update(cgi.parse_qs(''.join(chunks), keep_blank_values=True))
        return args
Dirkjan Ochtman
protocol: shuffle server methods to group send methods
r11621
    def getfile(self, fp):
        length = int(self.req.env['CONTENT_LENGTH'])
        for s in util.filechunkiter(self.req, limit=length):
            fp.write(s)
    def redirect(self):
Idan Kamara
ui: use I/O descriptors internally...
r14614
        self.oldio = self.ui.fout, self.ui.ferr
timeless
pycompat: switch to util.stringio for py3 compat
r28861
        self.ui.ferr = self.ui.fout = stringio()
Idan Kamara
ui: use I/O descriptors internally...
r14614
    def restore(self):
        val = self.ui.fout.getvalue()
        self.ui.ferr, self.ui.fout = self.oldio
        return val
Gregory Szorc
wireproto: compress data from a generator...
r30206
Gregory Szorc
wireproto: rename argument to groupchunks()...
r30014
    def groupchunks(self, fh):
Gregory Szorc
wireproto: compress data from a generator...
r30206
        def getchunks():
            while True:
                chunk = fh.read(32768)
                if not chunk:
                    break
                yield chunk
        return self.compresschunks(getchunks())
    def compresschunks(self, chunks):
Gregory Szorc
hgweb: document why we don't allow untrusted settings to control zlib...
r29788
        # Don't allow untrusted settings because disabling compression or
        # setting a very high compression level could lead to flooding
        # the server's network or CPU.
Gregory Szorc
hgweb: config option to control zlib compression level...
r29748
        z = zlib.compressobj(self.ui.configint('server', 'zliblevel', -1))
Gregory Szorc
wireproto: compress data from a generator...
r30206
        for chunk in chunks:
Gregory Szorc
hgweb: tweak zlib chunking behavior...
r29792
            data = z.compress(chunk)
            # Not all calls to compress() emit data. It is cheaper to inspect
            # that here than to send it via the generator.
            if data:
                yield data
Dirkjan Ochtman
protocol: extract compression from streaming mechanics
r11623
        yield z.flush()
Gregory Szorc
wireproto: compress data from a generator...
r30206
Matt Mackall
protocol: move hgweb protocol support back into protocol.py...
r11595
    def _client(self):
        return 'remote:%s:%s:%s' % (
            self.req.env.get('wsgi.url_scheme') or 'http',
timeless
pycompat: switch to util.urlreq/util.urlerr for py3 compat
r28883
            urlreq.quote(self.req.env.get('REMOTE_HOST', '')),
            urlreq.quote(self.req.env.get('REMOTE_USER', '')))
Matt Mackall
protocol: move hgweb protocol support back into protocol.py...
r11595
def iscmd(cmd):
    return cmd in wireproto.commands
def call(repo, req, cmd):
Idan Kamara
ui: use I/O descriptors internally...
r14614
    p = webproto(req, repo.ui)
Dirkjan Ochtman
protocol: wrap non-string protocol responses in classes
r11625
    rsp = wireproto.dispatch(repo, p, cmd)
Dirkjan Ochtman
protocol: use generators instead of req.write() for hgweb stream responses
r11626
    if isinstance(rsp, str):
Mads Kiilerich
hgweb: pass the actual response body to request.response, not just the length...
r18352
        req.respond(HTTP_OK, HGTYPE, body=rsp)
        return []
Dirkjan Ochtman
protocol: use generators instead of req.write() for hgweb stream responses
r11626
    elif isinstance(rsp, wireproto.streamres):
        req.respond(HTTP_OK, HGTYPE)
        return rsp.gen
    elif isinstance(rsp, wireproto.pushres):
Idan Kamara
ui: use I/O descriptors internally...
r14614
        val = p.restore()
Mads Kiilerich
hgweb: use Content-Length for pushres...
r18346
        rsp = '%d\n%s' % (rsp.res, val)
Mads Kiilerich
hgweb: pass the actual response body to request.response, not just the length...
r18352
        req.respond(HTTP_OK, HGTYPE, body=rsp)
        return []
Benoit Boissinot
wireproto: introduce pusherr() to deal with "unsynced changes" error...
r12703
    elif isinstance(rsp, wireproto.pusherr):
Benoit Boissinot
wireproto/http: drain the incoming bundle in case of errors
r12704
        # drain the incoming bundle
        req.drain()
Idan Kamara
ui: use I/O descriptors internally...
r14614
        p.restore()
Benoit Boissinot
wireproto: introduce pusherr() to deal with "unsynced changes" error...
r12703
        rsp = '0\n%s\n' % rsp.res
Mads Kiilerich
hgweb: pass the actual response body to request.response, not just the length...
r18352
        req.respond(HTTP_OK, HGTYPE, body=rsp)
        return []
Andrew Pritchard
wireproto: add out-of-band error class to allow remote repo to report errors...
r15017
    elif isinstance(rsp, wireproto.ooberror):
        rsp = rsp.message
        req.respond(HTTP_OK, HGERRTYPE, body=rsp)
        return []
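
The long-argument handling in `_args()` reassembles a query string that was split across numbered `HTTP_X_HGARG_<n>` request variables. A rough Python 3 sketch of that step in isolation follows. It is not the module's code: `collect_header_args` is a hypothetical helper, `urllib.parse.parse_qs` is swapped in for the Python 2 `cgi.parse_qs` call used in the listing, and the client-side splitting shown is only an assumed way of producing the headers.

```python
from urllib.parse import parse_qs, urlencode


def collect_header_args(environ):
    """Join HTTP_X_HGARG_1, HTTP_X_HGARG_2, ... and parse as a query string."""
    chunks = []
    i = 1
    while True:
        h = environ.get('HTTP_X_HGARG_' + str(i))
        if h is None:
            # The first missing index ends the sequence.
            break
        chunks.append(h)
        i += 1
    # keep_blank_values=True preserves arguments sent with an empty value.
    return parse_qs(''.join(chunks), keep_blank_values=True)


# Assumed client side: split one encoded argument string across two headers.
encoded = urlencode({'heads': 'abc def', 'common': ''})
half = len(encoded) // 2
environ = {
    'HTTP_X_HGARG_1': encoded[:half],
    'HTTP_X_HGARG_2': encoded[half:],
}
args = collect_header_args(environ)
```

Each parsed value is a list, which is why `getargs()` in the listing indexes `knownargs[k][0]`.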