##// END OF EJS Templates
xdiff: add a preprocessing step that trims files...
xdiff: add a preprocessing step that trims files xdiff has a `xdl_trim_ends` step that removes common lines, unmatchable lines. That is in theory good, but happens too late - after splitting, hashing, and adjusting the hash values so they are unique. Those splitting, hashing and adjusting hash values steps could have noticeable overhead. Diffing two large files with minor (one-line-ish) changes are not uncommon. In that case, the raw performance of those preparation steps seriously matter. Even allocating an O(N) array and storing line offsets to it is expensive. Therefore my previous attempts [1] [2] cannot be good enough since they do not remove the O(N) array assignment. This patch adds a preprocessing step - `xdl_trim_files` that runs before other preprocessing steps. It counts common prefix and suffix and lines in them (needed for displaying line number), without doing anything else. Testing with a crafted large (169MB) file, with minor change: ``` open('a','w').write(''.join('%s\n' % (i % 100000) for i in xrange(30000000) if i != 6000000)) open('b','w').write(''.join('%s\n' % (i % 100000) for i in xrange(30000000) if i != 6003000)) ``` Running xdiff by a simple binary [3], this patch improves the xdiff perf by more than 10x for the above case: ``` # xdiff before this patch 2.41s user 1.13s system 98% cpu 3.592 total # xdiff after this patch 0.14s user 0.16s system 98% cpu 0.309 total # gnu diffutils 0.12s user 0.15s system 98% cpu 0.272 total # (best of 20 runs) ``` It's still slightly slower than GNU diffutils. But it's pretty close now. Testing with real repo data: For the whole repo, this patch makes xdiff 25% faster: ``` # hg perfbdiff --count 100 --alldata -c d334afc585e2 --blocks [--xdiff] # xdiff, after ! wall 0.058861 comb 0.050000 user 0.050000 sys 0.000000 (best of 100) # xdiff, before ! wall 0.077816 comb 0.080000 user 0.080000 sys 0.000000 (best of 91) # bdiff ! wall 0.117473 comb 0.120000 user 0.120000 sys 0.000000 (best of 67) ``` For files that are long (ex. commands.py), the speedup is more than 3x, very significant: ``` # hg perfbdiff --count 3000 --blocks commands.py.i 1 [--xdiff] # xdiff, after ! wall 0.690583 comb 0.690000 user 0.690000 sys 0.000000 (best of 12) # xdiff, before ! wall 2.240361 comb 2.210000 user 2.210000 sys 0.000000 (best of 4) # bdiff ! wall 2.469852 comb 2.440000 user 2.440000 sys 0.000000 (best of 4) ``` [1]: https://phab.mercurial-scm.org/D2631 [2]: https://phab.mercurial-scm.org/D2634 [3]: ``` // Code to run xdiff from command line. No proper error handling. #include <stdlib.h> #include <unistd.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include "mercurial/thirdparty/xdiff/xdiff.h" #define ensure(x) if (!(x)) exit(255); mmfile_t readfile(const char *path) { struct stat st; int fd = open(path, O_RDONLY); fstat(fd, &st); mmfile_t file = { malloc(st.st_size), st.st_size }; ensure(read(fd, file.ptr, st.st_size) == st.st_size); close(fd); return file; } int main(int argc, char const *argv[]) { mmfile_t a = readfile(argv[1]), b = readfile(argv[2]); xpparam_t xpp = {0}; xdemitconf_t xecfg = {0}; xdemitcb_t ecb = {0}; xdl_diff(&a, &b, &xpp, &xecfg, &ecb); return 0; } ``` Differential Revision: https://phab.mercurial-scm.org/D2686

File last commit:

r36819:66de4555 default
r36838:f33a87cf default
Show More
wireprototypes.py
157 lines | 4.6 KiB | text/x-python | PythonLexer
# Copyright 2018 Gregory Szorc <gregory.szorc@gmail.com>
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.
from __future__ import absolute_import
import abc
# Names of the SSH protocol implementations.
SSHV1 = 'ssh-v1'
# This is advertised over the wire. Incremental the counter at the end
# to reflect BC breakages.
SSHV2 = 'exp-ssh-v2-0001'
# All available wire protocol transports.
TRANSPORTS = {
SSHV1: {
'transport': 'ssh',
'version': 1,
},
SSHV2: {
'transport': 'ssh',
'version': 2,
},
'http-v1': {
'transport': 'http',
'version': 1,
}
}
class bytesresponse(object):
"""A wire protocol response consisting of raw bytes."""
def __init__(self, data):
self.data = data
class ooberror(object):
"""wireproto reply: failure of a batch of operation
Something failed during a batch call. The error message is stored in
`self.message`.
"""
def __init__(self, message):
self.message = message
class pushres(object):
"""wireproto reply: success with simple integer return
The call was successful and returned an integer contained in `self.res`.
"""
def __init__(self, res, output):
self.res = res
self.output = output
class pusherr(object):
"""wireproto reply: failure
The call failed. The `self.res` attribute contains the error message.
"""
def __init__(self, res, output):
self.res = res
self.output = output
class streamres(object):
"""wireproto reply: binary stream
The call was successful and the result is a stream.
Accepts a generator containing chunks of data to be sent to the client.
``prefer_uncompressed`` indicates that the data is expected to be
uncompressable and that the stream should therefore use the ``none``
engine.
"""
def __init__(self, gen=None, prefer_uncompressed=False):
self.gen = gen
self.prefer_uncompressed = prefer_uncompressed
class streamreslegacy(object):
"""wireproto reply: uncompressed binary stream
The call was successful and the result is a stream.
Accepts a generator containing chunks of data to be sent to the client.
Like ``streamres``, but sends an uncompressed data for "version 1" clients
using the application/mercurial-0.1 media type.
"""
def __init__(self, gen=None):
self.gen = gen
class baseprotocolhandler(object):
"""Abstract base class for wire protocol handlers.
A wire protocol handler serves as an interface between protocol command
handlers and the wire protocol transport layer. Protocol handlers provide
methods to read command arguments, redirect stdio for the duration of
the request, handle response types, etc.
"""
__metaclass__ = abc.ABCMeta
@abc.abstractproperty
def name(self):
"""The name of the protocol implementation.
Used for uniquely identifying the transport type.
"""
@abc.abstractmethod
def getargs(self, args):
"""return the value for arguments in <args>
returns a list of values (same order as <args>)"""
@abc.abstractmethod
def forwardpayload(self, fp):
"""Read the raw payload and forward to a file.
The payload is read in full before the function returns.
"""
@abc.abstractmethod
def mayberedirectstdio(self):
"""Context manager to possibly redirect stdio.
The context manager yields a file-object like object that receives
stdout and stderr output when the context manager is active. Or it
yields ``None`` if no I/O redirection occurs.
The intent of this context manager is to capture stdio output
so it may be sent in the response. Some transports support streaming
stdio to the client in real time. For these transports, stdio output
won't be captured.
"""
@abc.abstractmethod
def client(self):
"""Returns a string representation of this client (as bytes)."""
@abc.abstractmethod
def addcapabilities(self, repo, caps):
"""Adds advertised capabilities specific to this protocol.
Receives the list of capabilities collected so far.
Returns a list of capabilities. The passed in argument can be returned.
"""
@abc.abstractmethod
def checkperm(self, perm):
"""Validate that the client has permissions to perform a request.
The argument is the permission required to proceed. If the client
doesn't have that permission, the exception should raise or abort
in a protocol specific manner.
"""