##// END OF EJS Templates
parsers: inline fields of dirstate values in C version...
parsers: inline fields of dirstate values in C version Previously, while unpacking the dirstate we'd create 3-4 new CPython objects for most dirstate values: - the state is a single character string, which is pooled by CPython - the mode is a new object if it isn't 0 due to being in the lookup set - the size is a new object if it is greater than 255 - the mtime is a new object if it isn't -1 due to being in the lookup set - the tuple to contain them all In some cases such as regular hg status, we actually look at all the objects. In other cases like hg add, hg status for a subdirectory, or hg status with the third-party hgwatchman enabled, we look at almost none of the objects. This patch eliminates most object creation in these cases by defining a custom C struct that is exposed to Python with an interface similar to a tuple. Only when tuple elements are actually requested are the respective objects created. The gains, where they're expected, are significant. The following tests are run against a working copy with over 270,000 files. parse_dirstate becomes significantly faster: $ hg perfdirstate before: wall 0.186437 comb 0.180000 user 0.160000 sys 0.020000 (best of 35) after: wall 0.093158 comb 0.100000 user 0.090000 sys 0.010000 (best of 95) and as a result, several commands benefit: $ time hg status # with hgwatchman enabled before: 0.42s user 0.14s system 99% cpu 0.563 total after: 0.34s user 0.12s system 99% cpu 0.471 total $ time hg add new-file before: 0.85s user 0.18s system 99% cpu 1.033 total after: 0.76s user 0.17s system 99% cpu 0.931 total There is a slight regression in regular status performance, but this is fixed in an upcoming patch.

File last commit:

r21790:3fbef7ac default
r21809:e250b830 default
Show More
mdiff.py
362 lines | 11.4 KiB | text/x-python | PythonLexer
mpm@selenic.com
mdiff.py: kill #! line, add copyright notice...
r239 # mdiff.py - diff and patch routines for mercurial
#
Vadim Gelfer
update copyrights.
r2859 # Copyright 2005, 2006 Matt Mackall <mpm@selenic.com>
mpm@selenic.com
mdiff.py: kill #! line, add copyright notice...
r239 #
Martin Geisler
updated license to be explicit about GPL version 2
r8225 # This software may be used and distributed according to the terms of the
Matt Mackall
Update license to GPLv2+
r10263 # GNU General Public License version 2 or any later version.
mpm@selenic.com
mdiff.py: kill #! line, add copyright notice...
r239
Patrick Mezard
Let --unified default to diff.unified (issue 1076)
r6467 from i18n import _
Augie Fackler
cleanup: move stdlib imports to their own import statement...
r20034 import bdiff, mpatch, util, base85
import re, struct, zlib
mpm@selenic.com
Add back links from file revisions to changeset revisions...
r0
Vadim Gelfer
fix speed regression in mdiff caused by line split bugfix.
r2251 def splitnewlines(text):
Vadim Gelfer
fix diffs containing embedded "\r"....
r2248 '''like str.splitlines, but only split on newlines.'''
Vadim Gelfer
fix speed regression in mdiff caused by line split bugfix.
r2251 lines = [l + '\n' for l in text.split('\n')]
if lines:
if lines[-1] == '\n':
lines.pop()
else:
lines[-1] = lines[-1][:-1]
return lines
Vadim Gelfer
fix diffs containing embedded "\r"....
r2248
Vadim Gelfer
refactor text diff/patch code....
r2874 class diffopts(object):
'''context is the number of context lines
text treats all files as text
showfunc enables diff -p output
Brendan Cully
Add diff --git option
r2907 git enables the git extended patch format
Stephen Darnell
Add -D/--nodates options to hg diff/export that removes dates from diff headers...
r3199 nodates removes dates from diff headers
Vadim Gelfer
refactor text diff/patch code....
r2874 ignorews ignores all whitespace changes in the diff
ignorewsamount ignores changes in the amount of whitespace
Patrick Mezard
patch: support diff data loss detection and upgrade...
r10189 ignoreblanklines ignores changes whose lines are all blank
upgrade generates git diffs to avoid data loss
'''
Thomas Arendsen Hein
Show revisions in diffs like CVS, based on a patch from Goffredo Baroncelli....
r396
Vadim Gelfer
refactor text diff/patch code....
r2874 defaults = {
'context': 3,
'text': False,
Matt Mackall
diff: don't show function name by default...
r5863 'showfunc': False,
Brendan Cully
Add diff --git option
r2907 'git': False,
Stephen Darnell
Add -D/--nodates options to hg diff/export that removes dates from diff headers...
r3199 'nodates': False,
Stephen Lee
diff: add nobinary config to suppress git-style binary diffs
r21790 'nobinary': False,
Vadim Gelfer
refactor text diff/patch code....
r2874 'ignorews': False,
'ignorewsamount': False,
'ignoreblanklines': False,
Patrick Mezard
patch: support diff data loss detection and upgrade...
r10189 'upgrade': False,
Vadim Gelfer
refactor text diff/patch code....
r2874 }
__slots__ = defaults.keys()
def __init__(self, **opts):
for k in self.__slots__:
v = opts.get(k)
if v is None:
v = self.defaults[k]
setattr(self, k, v)
Patrick Mezard
Let --unified default to diff.unified (issue 1076)
r6467 try:
self.context = int(self.context)
except ValueError:
raise util.Abort(_('diff context lines count must be '
'an integer, not %r') % self.context)
Patrick Mezard
mq: preserve --git flag when merging patches...
r10185 def copy(self, **kwargs):
opts = dict((k, getattr(self, k)) for k in self.defaults)
opts.update(kwargs)
return diffopts(**opts)
Vadim Gelfer
refactor text diff/patch code....
r2874 defaultopts = diffopts()
Patrick Mezard
mdiff: fix diff -b/B/w on mixed whitespace hunks (issue127)...
r9827 def wsclean(opts, text, blank=True):
Matt Mackall
diff: correctly handle combinations of whitespace options
r4878 if opts.ignorews:
Patrick Mezard
mdiff: replace wscleanup() regexps with C loops...
r15530 text = bdiff.fixws(text, 1)
Matt Mackall
diff: correctly handle combinations of whitespace options
r4878 elif opts.ignorewsamount:
Patrick Mezard
mdiff: replace wscleanup() regexps with C loops...
r15530 text = bdiff.fixws(text, 0)
Patrick Mezard
mdiff: fix diff -b/B/w on mixed whitespace hunks (issue127)...
r9827 if blank and opts.ignoreblanklines:
Patrick Mezard
diff: --ignore-blank-lines was too enthusiastic...
r15509 text = re.sub('\n+', '\n', text).strip('\n')
Matt Mackall
diff: correctly handle combinations of whitespace options
r4878 return text
Patrick Mezard
annotate: support diff whitespace filtering flags (issue3030)...
r15528 def splitblock(base1, lines1, base2, lines2, opts):
# The input lines matches except for interwoven blank lines. We
# transform it into a sequence of matching blocks and blank blocks.
lines1 = [(wsclean(opts, l) and 1 or 0) for l in lines1]
lines2 = [(wsclean(opts, l) and 1 or 0) for l in lines2]
s1, e1 = 0, len(lines1)
s2, e2 = 0, len(lines2)
while s1 < e1 or s2 < e2:
i1, i2, btype = s1, s2, '='
if (i1 >= e1 or lines1[i1] == 0
or i2 >= e2 or lines2[i2] == 0):
# Consume the block of blank lines
btype = '~'
while i1 < e1 and lines1[i1] == 0:
i1 += 1
while i2 < e2 and lines2[i2] == 0:
i2 += 1
else:
# Consume the matching lines
while i1 < e1 and lines1[i1] == 1 and lines2[i2] == 1:
i1 += 1
i2 += 1
yield [base1 + s1, base1 + i1, base2 + s2, base2 + i2], btype
s1 = i1
s2 = i2
def allblocks(text1, text2, opts=None, lines1=None, lines2=None, refine=False):
Patrick Mezard
mdiff: make diffblocks() return all blocks, matching and changed...
r15526 """Return (block, type) tuples, where block is an mdiff.blocks
line entry. type is '=' for blocks matching exactly one another
(bdiff blocks), '!' for non-matching blocks and '~' for blocks
Patrick Mezard
annotate: support diff whitespace filtering flags (issue3030)...
r15528 matching only after having filtered blank lines. If refine is True,
then '~' blocks are refined and are only made of blank lines.
Patrick Mezard
mdiff: make diffblocks() return all blocks, matching and changed...
r15526 line1 and line2 are text1 and text2 split with splitnewlines() if
they are already available.
Patrick Mezard
mdiff: extract blocks whitespace normalization in diffblocks()...
r15525 """
if opts is None:
opts = defaultopts
if opts.ignorews or opts.ignorewsamount:
text1 = wsclean(opts, text1, False)
text2 = wsclean(opts, text2, False)
diff = bdiff.blocks(text1, text2)
for i, s1 in enumerate(diff):
# The first match is special.
# we've either found a match starting at line 0 or a match later
# in the file. If it starts later, old and new below will both be
# empty and we'll continue to the next match.
if i > 0:
s = diff[i - 1]
else:
s = [0, 0, 0, 0]
s = [s[1], s1[0], s[3], s1[2]]
# bdiff sometimes gives huge matches past eof, this check eats them,
# and deals with the special first match case described above
Patrick Mezard
mdiff: split lines in allblocks() only when necessary...
r15529 if s[0] != s[1] or s[2] != s[3]:
Patrick Mezard
mdiff: make diffblocks() return all blocks, matching and changed...
r15526 type = '!'
if opts.ignoreblanklines:
Patrick Mezard
mdiff: split lines in allblocks() only when necessary...
r15529 if lines1 is None:
lines1 = splitnewlines(text1)
if lines2 is None:
lines2 = splitnewlines(text2)
old = wsclean(opts, "".join(lines1[s[0]:s[1]]))
new = wsclean(opts, "".join(lines2[s[2]:s[3]]))
if old == new:
Patrick Mezard
mdiff: make diffblocks() return all blocks, matching and changed...
r15526 type = '~'
yield s, type
yield s1, '='
Patrick Mezard
mdiff: extract blocks whitespace normalization in diffblocks()...
r15525
Guillermo Pérez
diff: unify calls to diffline...
r17940 def unidiff(a, ad, b, bd, fn1, fn2, opts=defaultopts):
Patrick Mezard
mdiff: fix diff header generation for files with spaces (issue3357)...
r16362 def datetag(date, fn=None):
Alexis S. L. Carvalho
git patches: correct handling of filenames with spaces...
r4679 if not opts.git and not opts.nodates:
return '\t%s\n' % date
Patrick Mezard
mdiff: fix diff header generation for files with spaces (issue3357)...
r16362 if fn and ' ' in fn:
Alexis S. L. Carvalho
git patches: correct handling of filenames with spaces...
r4679 return '\t\n'
return '\n'
Brendan Cully
Remove dates from git export file lines - they confuse git-apply
r3026
Matt Mackall
many, many trivial check-code fixups
r10282 if not a and not b:
return ""
Matt Mackall
Clean up mdiff imports
r1379 epoch = util.datestr((0, 0))
mpm@selenic.com
Attempt to make diff deal with null sources properly...
r264
Mads Kiilerich
diff: always use / in paths in diff...
r15437 fn1 = util.pconvert(fn1)
fn2 = util.pconvert(fn2)
Vadim Gelfer
refactor text diff/patch code....
r2874 if not opts.text and (util.binary(a) or util.binary(b)):
Martin Geisler
mdiff: compare content of binary files directly...
r6871 if a and b and len(a) == len(b) and a == b:
tailgunner@smtp.ru
Don't lie that "binary file has changed"...
r4103 return ""
Dustin Sallings
Use both the from and to name in mdiff.unidiff....
r5482 l = ['Binary file %s has changed\n' % fn1]
Thomas Arendsen Hein
Fix diff against an empty file (issue124) and add a test for this.
r1723 elif not a:
Vadim Gelfer
fix speed regression in mdiff caused by line split bugfix.
r2251 b = splitnewlines(b)
Thomas Arendsen Hein
Fix diff against an empty file (issue124) and add a test for this.
r1723 if a is None:
Patrick Mezard
mdiff: fix diff header generation for files with spaces (issue3357)...
r16362 l1 = '--- /dev/null%s' % datetag(epoch)
Thomas Arendsen Hein
Fix diff against an empty file (issue124) and add a test for this.
r1723 else:
Patrick Mezard
mdiff: fix diff header generation for files with spaces (issue3357)...
r16362 l1 = "--- %s%s" % ("a/" + fn1, datetag(ad, fn1))
l2 = "+++ %s%s" % ("b/" + fn2, datetag(bd, fn2))
mpm@selenic.com
Attempt to make diff deal with null sources properly...
r264 l3 = "@@ -0,0 +1,%d @@\n" % len(b)
l = [l1, l2, l3] + ["+" + e for e in b]
Thomas Arendsen Hein
Fix diff against an empty file (issue124) and add a test for this.
r1723 elif not b:
Vadim Gelfer
fix speed regression in mdiff caused by line split bugfix.
r2251 a = splitnewlines(a)
Patrick Mezard
mdiff: fix diff header generation for files with spaces (issue3357)...
r16362 l1 = "--- %s%s" % ("a/" + fn1, datetag(ad, fn1))
Thomas Arendsen Hein
Fix diff against an empty file (issue124) and add a test for this.
r1723 if b is None:
Patrick Mezard
mdiff: fix diff header generation for files with spaces (issue3357)...
r16362 l2 = '+++ /dev/null%s' % datetag(epoch)
Thomas Arendsen Hein
Fix diff against an empty file (issue124) and add a test for this.
r1723 else:
Patrick Mezard
mdiff: fix diff header generation for files with spaces (issue3357)...
r16362 l2 = "+++ %s%s" % ("b/" + fn2, datetag(bd, fn2))
mpm@selenic.com
Attempt to make diff deal with null sources properly...
r264 l3 = "@@ -1,%d +0,0 @@\n" % len(a)
l = [l1, l2, l3] + ["-" + e for e in a]
else:
Vadim Gelfer
fix speed regression in mdiff caused by line split bugfix.
r2251 al = splitnewlines(a)
bl = splitnewlines(b)
Benoit Boissinot
remove header handling out of mdiff.bunidiff, rename it
r10614 l = list(_unidiff(a, b, al, bl, opts=opts))
Matt Mackall
many, many trivial check-code fixups
r10282 if not l:
return ""
Benoit Boissinot
remove header handling out of mdiff.bunidiff, rename it
r10614
Patrick Mezard
mdiff: fix diff header generation for files with spaces (issue3357)...
r16362 l.insert(0, "--- a/%s%s" % (fn1, datetag(ad, fn1)))
l.insert(1, "+++ b/%s%s" % (fn2, datetag(bd, fn2)))
mpm@selenic.com
hg diff: fix missing final newline bug
r170
for ln in xrange(len(l)):
if l[ln][-1] != '\n':
l[ln] += "\n\ No newline at end of file\n"
mpm@selenic.com
Add back links from file revisions to changeset revisions...
r0 return "".join(l)
Benoit Boissinot
remove header handling out of mdiff.bunidiff, rename it
r10614 # creates a headerless unified diff
mason@suse.com
Add new bdiff based unidiff generation.
r1637 # t1 and t2 are the text to be diffed
# l1 and l2 are the text broken up into lines
Benoit Boissinot
remove header handling out of mdiff.bunidiff, rename it
r10614 def _unidiff(t1, t2, l1, l2, opts=defaultopts):
mason@suse.com
Add new bdiff based unidiff generation.
r1637 def contextend(l, len):
Vadim Gelfer
refactor text diff/patch code....
r2874 ret = l + opts.context
mason@suse.com
Add new bdiff based unidiff generation.
r1637 if ret > len:
ret = len
return ret
def contextstart(l):
Vadim Gelfer
refactor text diff/patch code....
r2874 ret = l - opts.context
mason@suse.com
Add new bdiff based unidiff generation.
r1637 if ret < 0:
return 0
return ret
Brodie Rao
mdiff: speed up showfunc for large diffs...
r15141 lastfunc = [0, '']
Benoit Boissinot
remove header handling out of mdiff.bunidiff, rename it
r10614 def yieldhunk(hunk):
mason@suse.com
Add new bdiff based unidiff generation.
r1637 (astart, a2, bstart, b2, delta) = hunk
aend = contextend(a2, len(l1))
alen = aend - astart
blen = b2 - bstart + aend - a2
func = ""
Vadim Gelfer
refactor text diff/patch code....
r2874 if opts.showfunc:
Brodie Rao
mdiff: speed up showfunc for large diffs...
r15141 lastpos, func = lastfunc
# walk backwards from the start of the context up to the start of
# the previous hunk context until we find a line starting with an
# alphanumeric char.
for i in xrange(astart - 1, lastpos - 1, -1):
if l1[i][0].isalnum():
func = ' ' + l1[i].rstrip()[:40]
lastfunc[1] = func
mason@suse.com
Add new bdiff based unidiff generation.
r1637 break
Brodie Rao
mdiff: speed up showfunc for large diffs...
r15141 # by recording this hunk's starting point as the next place to
# start looking for function lines, we avoid reading any line in
# the file more than once.
lastfunc[0] = astart
mason@suse.com
Add new bdiff based unidiff generation.
r1637
Nicolas Venegas
mdiff/patch: fix bad hunk handling for unified diffs with zero context...
r15462 # zero-length hunk ranges report their start line as one less
if alen:
astart += 1
if blen:
bstart += 1
yield "@@ -%d,%d +%d,%d @@%s\n" % (astart, alen,
bstart, blen, func)
mason@suse.com
Add new bdiff based unidiff generation.
r1637 for x in delta:
yield x
for x in xrange(a2, aend):
yield ' ' + l1[x]
# bdiff.blocks gives us the matching sequences in the files. The loop
# below finds the spaces between those matching sequences and translates
# them into diff output.
#
hunk = None
Patrick Mezard
mdiff: adjust hunk offsets with --ignore-blank-lines (issue3234)...
r16089 ignoredlines = 0
Patrick Mezard
mdiff: make diffblocks() return all blocks, matching and changed...
r15526 for s, stype in allblocks(t1, t2, opts, l1, l2):
Patrick Mezard
mdiff: adjust hunk offsets with --ignore-blank-lines (issue3234)...
r16089 a1, a2, b1, b2 = s
Patrick Mezard
mdiff: make diffblocks() return all blocks, matching and changed...
r15526 if stype != '!':
Patrick Mezard
mdiff: adjust hunk offsets with --ignore-blank-lines (issue3234)...
r16089 if stype == '~':
# The diff context lines are based on t1 content. When
# blank lines are ignored, the new lines offsets must
# be adjusted as if equivalent blocks ('~') had the
# same sizes on both sides.
ignoredlines += (b2 - b1) - (a2 - a1)
Patrick Mezard
mdiff: make diffblocks() return all blocks, matching and changed...
r15526 continue
mason@suse.com
Add new bdiff based unidiff generation.
r1637 delta = []
old = l1[a1:a2]
new = l2[b1:b2]
Patrick Mezard
mdiff: adjust hunk offsets with --ignore-blank-lines (issue3234)...
r16089 b1 -= ignoredlines
b2 -= ignoredlines
mason@suse.com
Add new bdiff based unidiff generation.
r1637 astart = contextstart(a1)
bstart = contextstart(b1)
prev = None
if hunk:
# join with the previous hunk if it falls inside the context
Vadim Gelfer
refactor text diff/patch code....
r2874 if astart < hunk[1] + opts.context + 1:
mason@suse.com
Add new bdiff based unidiff generation.
r1637 prev = hunk
astart = hunk[1]
bstart = hunk[3]
else:
Benoit Boissinot
remove header handling out of mdiff.bunidiff, rename it
r10614 for x in yieldhunk(hunk):
mason@suse.com
Add new bdiff based unidiff generation.
r1637 yield x
if prev:
# we've joined the previous hunk, record the new ending points.
hunk[1] = a2
hunk[3] = b2
delta = hunk[4]
else:
# create a new hunk
Matt Mackall
many, many trivial check-code fixups
r10282 hunk = [astart, a2, bstart, b2, delta]
mason@suse.com
Add new bdiff based unidiff generation.
r1637
Matt Mackall
many, many trivial check-code fixups
r10282 delta[len(delta):] = [' ' + x for x in l1[astart:a1]]
delta[len(delta):] = ['-' + x for x in old]
delta[len(delta):] = ['+' + x for x in new]
mason@suse.com
Add new bdiff based unidiff generation.
r1637
if hunk:
Benoit Boissinot
remove header handling out of mdiff.bunidiff, rename it
r10614 for x in yieldhunk(hunk):
mason@suse.com
Add new bdiff based unidiff generation.
r1637 yield x
Guillermo Pérez <bisho at fb.com>
diff: move b85diff to mdiff module...
r17939 def b85diff(to, tn):
'''print base85-encoded binary diff'''
def fmtline(line):
l = len(line)
if l <= 26:
l = chr(ord('A') + l - 1)
else:
l = chr(l - 26 + ord('a') - 1)
return '%c%s\n' % (l, base85.b85encode(line, True))
def chunk(text, csize=52):
l = len(text)
i = 0
while i < l:
yield text[i:i + csize]
i += csize
Guillermo Pérez
diff: move index header generation to patch...
r17946 if to is None:
to = ''
if tn is None:
tn = ''
if to == tn:
return ''
Guillermo Pérez <bisho at fb.com>
diff: move b85diff to mdiff module...
r17939
# TODO: deltas
Guillermo Pérez
diff: move index header generation to patch...
r17946 ret = []
ret.append('GIT binary patch\n')
ret.append('literal %s\n' % len(tn))
Guillermo Pérez <bisho at fb.com>
diff: move b85diff to mdiff module...
r17939 for l in chunk(zlib.compress(tn)):
ret.append(fmtline(l))
ret.append('\n')
Guillermo Pérez
diff: move index header generation to patch...
r17946
Guillermo Pérez <bisho at fb.com>
diff: move b85diff to mdiff module...
r17939 return ''.join(ret)
mpm@selenic.com
Add a function to return the new text from a binary diff
r120 def patchtext(bin):
pos = 0
t = []
while pos < len(bin):
p1, p2, l = struct.unpack(">lll", bin[pos:pos + 12])
pos += 12
t.append(bin[pos:pos + l])
pos += l
return "".join(t)
mpm@selenic.com
Add back links from file revisions to changeset revisions...
r0 def patch(a, bin):
Benoit Boissinot
mdiff.patch(): add a special case for when the base text is empty...
r12025 if len(a) == 0:
# skip over trivial delta header
Matt Mackall
util: don't mess with builtins to emulate buffer()
r15657 return util.buffer(bin, 12)
Matt Mackall
Clean up mdiff imports
r1379 return mpatch.patches(a, [bin])
mpm@selenic.com
Start using bdiff for generating deltas...
r432
Alexis S. L. Carvalho
add mdiff.get_matching_blocks
r4361 # similar to difflib.SequenceMatcher.get_matching_blocks
def get_matching_blocks(a, b):
return [(d[0], d[2], d[1] - d[0]) for d in bdiff.blocks(a, b)]
Matt Mackall
revlog: generate trivial deltas against null revision...
r5367 def trivialdiffheader(length):
return struct.pack(">lll", 0, 0, length)
Matt Mackall
Clean up mdiff imports
r1379 patches = mpatch.patches
mason@suse.com
Fill in the uncompressed size during revlog.addgroup...
r2078 patchedsize = mpatch.patchedsize
mpm@selenic.com
Start using bdiff for generating deltas...
r432 textdiff = bdiff.bdiff