copies: split the combination of the copies mapping in its own function
In some cases, this part takes up to 95% of the copy tracing, which itself takes about a hundred seconds. This poor performance comes from the fact that we keep duplicating and merging dictionaries that are mostly similar. I want to experiment with smarter native code to do this, so I need to isolate the function first.

File last commit:

r43812:2fe6121c default
r44178:0cc91600 default
mdiff.py
557 lines | 17.4 KiB | text/x-python | PythonLexer
mpm@selenic.com
mdiff.py: kill #! line, add copyright notice...
r239 # mdiff.py - diff and patch routines for mercurial
#
Vadim Gelfer
update copyrights.
r2859 # Copyright 2005, 2006 Matt Mackall <mpm@selenic.com>
mpm@selenic.com
mdiff.py: kill #! line, add copyright notice...
r239 #
Martin Geisler
updated license to be explicit about GPL version 2
r8225 # This software may be used and distributed according to the terms of the
Matt Mackall
Update license to GPLv2+
r10263 # GNU General Public License version 2 or any later version.
mpm@selenic.com
mdiff.py: kill #! line, add copyright notice...
r239
Gregory Szorc
mdiff: use absolute_import
r27484 from __future__ import absolute_import
import re
import struct
import zlib
from .i18n import _
Gregory Szorc
py3: manually import getattr where it is needed...
r43359 from .pycompat import (
getattr,
setattr,
)
Gregory Szorc
mdiff: use absolute_import
r27484 from . import (
Yuya Nishihara
diff: do not split function name if character encoding is unknown...
r36432 encoding,
Gregory Szorc
mdiff: use absolute_import
r27484 error,
Yuya Nishihara
bdiff: switch to policy importer...
r32369 policy,
Pulkit Goyal
diff: use pycompat.{byteskwargs, strkwargs} to switch opts b/w bytes and str
r31631 pycompat,
Gregory Szorc
mdiff: use absolute_import
r27484 util,
)
Boris Feld
util: extract all date-related utils in utils/dateutil module...
r36625 from .utils import dateutil
mpm@selenic.com
Add back links from file revisions to changeset revisions...
r0
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 _missing_newline_marker = b"\\ No newline at end of file\n"
Joerg Sonnenberger
mdiff: explicitly compute places for the newline marker...
r35869
Augie Fackler
cleanup: remove pointless r-prefixes on single-quoted strings...
r43906 bdiff = policy.importmod('bdiff')
mpatch = policy.importmod('mpatch')
Yuya Nishihara
bdiff: switch to policy importer...
r32369
Yuya Nishihara
bdiff: proxy through mdiff module...
r32201 blocks = bdiff.blocks
fixws = bdiff.fixws
Yuya Nishihara
mdiff: move re-exports to top...
r32199 patches = mpatch.patches
patchedsize = mpatch.patchedsize
Gregory Szorc
cext: accept arguments as Py_buffer...
r36673 textdiff = bdiff.bdiff
Augie Fackler
bdiff: write a native version of splitnewlines...
r36163 splitnewlines = bdiff.splitnewlines
Vadim Gelfer
fix diffs containing embedded "\r"....
r2248
Augie Fackler
formatting: blacken the codebase...
r43346
Augie Fackler
mdiff: mark diffopts as having dynamic attributes...
r43784 # TODO: this looks like it could be an attrs, which might help pytype
Vadim Gelfer
refactor text diff/patch code....
r2874 class diffopts(object):
'''context is the number of context lines
text treats all files as text
showfunc enables diff -p output
Brendan Cully
Add diff --git option
r2907 git enables the git extended patch format
Stephen Darnell
Add -D/--nodates options to hg diff/export that removes dates from diff headers...
r3199 nodates removes dates from diff headers
Siddharth Agarwal
mdiff.diffopts: add doc comment for nobinary
r23293 nobinary ignores binary files
Siddharth Agarwal
mdiff.diffopts: add a new noprefix option...
r23294 noprefix disables the 'a/' and 'b/' prefixes (ignored in plain mode)
Vadim Gelfer
refactor text diff/patch code....
r2874 ignorews ignores all whitespace changes in the diff
ignorewsamount ignores changes in the amount of whitespace
Patrick Mezard
patch: support diff data loss detection and upgrade...
r10189 ignoreblanklines ignores changes whose lines are all blank
upgrade generates git diffs to avoid data loss
'''
Thomas Arendsen Hein
Show revisions in diffs like CVS, based on a patch from Goffredo Baroncelli....
r396
Augie Fackler
mdiff: mark diffopts as having dynamic attributes...
r43784 _HAS_DYNAMIC_ATTRIBUTES = True
Vadim Gelfer
refactor text diff/patch code....
r2874 defaults = {
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 b'context': 3,
b'text': False,
b'showfunc': False,
b'git': False,
b'nodates': False,
b'nobinary': False,
b'noprefix': False,
b'index': 0,
b'ignorews': False,
b'ignorewsamount': False,
b'ignorewseol': False,
b'ignoreblanklines': False,
b'upgrade': False,
b'showsimilarity': False,
b'worddiff': False,
b'xdiff': False,
Augie Fackler
formatting: blacken the codebase...
r43346 }
Vadim Gelfer
refactor text diff/patch code....
r2874
def __init__(self, **opts):
Pulkit Goyal
diff: use pycompat.{byteskwargs, strkwargs} to switch opts b/w bytes and str
r31631 opts = pycompat.byteskwargs(opts)
Gregory Szorc
mdiff: remove use of __slots__...
r29416 for k in self.defaults.keys():
Vadim Gelfer
refactor text diff/patch code....
r2874 v = opts.get(k)
if v is None:
v = self.defaults[k]
setattr(self, k, v)
Patrick Mezard
Let --unified default to diff.unified (issue 1076)
r6467 try:
self.context = int(self.context)
except ValueError:
Augie Fackler
formatting: blacken the codebase...
r43346 raise error.Abort(
Martin von Zweigbergk
cleanup: join string literals that are already on one line...
r43387 _(b'diff context lines count must be an integer, not %r')
Augie Fackler
formatting: blacken the codebase...
r43346 % pycompat.bytestr(self.context)
)
Patrick Mezard
Let --unified default to diff.unified (issue 1076)
r6467
Patrick Mezard
mq: preserve --git flag when merging patches...
r10185 def copy(self, **kwargs):
opts = dict((k, getattr(self, k)) for k in self.defaults)
Pulkit Goyal
py3: use pycompat.strkwargs() to convert kwargs keys to str
r33102 opts = pycompat.strkwargs(opts)
Patrick Mezard
mq: preserve --git flag when merging patches...
r10185 opts.update(kwargs)
return diffopts(**opts)
Augie Fackler
formatting: blacken the codebase...
r43346
Vadim Gelfer
refactor text diff/patch code....
r2874 defaultopts = diffopts()
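
The option handling above follows a "class-level defaults, overridden by keyword arguments" pattern. The following is a minimal standalone sketch of that pattern, not the Mercurial class itself: `DiffOptsSketch` is a hypothetical name, it uses `str` keys instead of bytes, keeps only three of the options, and raises `ValueError` where the original raises `error.Abort`.

```python
class DiffOptsSketch:
    """Hypothetical stand-in for diffopts (str keys, reduced option set)."""

    defaults = {"context": 3, "git": False, "ignorews": False}

    def __init__(self, **opts):
        for key, default in self.defaults.items():
            value = opts.get(key)
            # None means "not provided", so the class-level default applies.
            setattr(self, key, default if value is None else value)
        try:
            self.context = int(self.context)
        except ValueError:
            # Assumption: ValueError replaces error.Abort in this sketch.
            raise ValueError(
                "diff context lines count must be an integer, not %r"
                % (self.context,)
            )

    def copy(self, **kwargs):
        # Snapshot the current values, then apply the overrides.
        opts = {key: getattr(self, key) for key in self.defaults}
        opts.update(kwargs)
        return DiffOptsSketch(**opts)


base = DiffOptsSketch(context="5")   # the string is coerced to an int
gitopts = base.copy(git=True)        # base itself is left unchanged
```

Because `copy()` builds a fresh object from a snapshot, callers can derive a variant (for example with `git=True`) without mutating a shared instance such as `defaultopts`.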
Augie Fackler
formatting: blacken the codebase...
r43346
Patrick Mezard
mdiff: fix diff -b/B/w on mixed whitespace hunks (issue127)...
r9827 def wsclean(opts, text, blank=True):
Matt Mackall
diff: correctly handle combinations of whitespace options
r4878 if opts.ignorews:
Patrick Mezard
mdiff: replace wscleanup() regexps with C loops...
r15530 text = bdiff.fixws(text, 1)
Matt Mackall
diff: correctly handle combinations of whitespace options
r4878 elif opts.ignorewsamount:
Patrick Mezard
mdiff: replace wscleanup() regexps with C loops...
r15530 text = bdiff.fixws(text, 0)
Patrick Mezard
mdiff: fix diff -b/B/w on mixed whitespace hunks (issue127)...
r9827 if blank and opts.ignoreblanklines:
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 text = re.sub(b'\n+', b'\n', text).strip(b'\n')
David Soria Parra
mdiff: add a --ignore-space-at-eol option...
r34015 if opts.ignorewseol:
Pulkit Goyal
py3: add missing b'' prefix in mdiff.py...
r37389 text = re.sub(br'[ \t\r\f]+\n', br'\n', text)
Matt Mackall
diff: correctly handle combinations of whitespace options
r4878 return text
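
To make the interaction of the whitespace options concrete, here is a rough standalone sketch of `wsclean`. It is an approximation under stated assumptions: `fixws_sketch` is a hypothetical regex-based stand-in for `bdiff.fixws` (assumed to drop all spaces, tabs and carriage returns when its second argument is true, and otherwise to collapse runs to one space and trim them before a newline), and the options are passed as keyword arguments rather than as a `diffopts` object.

```python
import re


def fixws_sketch(text, allws):
    # Assumed stand-in for bdiff.fixws, not the real native implementation.
    if allws:
        return re.sub(rb'[ \t\r]+', b'', text)
    text = re.sub(rb'[ \t\r]+', b' ', text)
    return text.replace(b' \n', b'\n')


def wsclean_sketch(text, ignorews=False, ignorewsamount=False,
                   ignoreblanklines=False, ignorewseol=False, blank=True):
    # ignorews takes precedence over ignorewsamount, as in the listing.
    if ignorews:
        text = fixws_sketch(text, 1)
    elif ignorewsamount:
        text = fixws_sketch(text, 0)
    # Blank-line folding is skipped when the caller passes blank=False.
    if blank and ignoreblanklines:
        text = re.sub(b'\n+', b'\n', text).strip(b'\n')
    if ignorewseol:
        text = re.sub(rb'[ \t\r\f]+\n', rb'\n', text)
    return text


sample = b'a  b \n\n\nc\t d\n'
print(wsclean_sketch(sample, ignorews=True))          # b'ab\n\n\ncd\n'
print(wsclean_sketch(sample, ignorewsamount=True))    # b'a b\n\n\nc d\n'
print(wsclean_sketch(sample, ignoreblanklines=True))  # b'a  b \nc\t d'
print(wsclean_sketch(sample, ignorewseol=True))       # b'a  b\n\n\nc\t d\n'
```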
Augie Fackler
formatting: blacken the codebase...
r43346
Patrick Mezard
annotate: support diff whitespace filtering flags (issue3030)...
r15528 def splitblock(base1, lines1, base2, lines2, opts):
# The input lines match except for interwoven blank lines. We
# transform it into a sequence of matching blocks and blank blocks.
lines1 = [(wsclean(opts, l) and 1 or 0) for l in lines1]
lines2 = [(wsclean(opts, l) and 1 or 0) for l in lines2]
s1, e1 = 0, len(lines1)
s2, e2 = 0, len(lines2)
while s1 < e1 or s2 < e2:
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 i1, i2, btype = s1, s2, b'='
Augie Fackler
formatting: blacken the codebase...
r43346 if i1 >= e1 or lines1[i1] == 0 or i2 >= e2 or lines2[i2] == 0:
Patrick Mezard
annotate: support diff whitespace filtering flags (issue3030)...
r15528 # Consume the block of blank lines
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 btype = b'~'
Patrick Mezard
annotate: support diff whitespace filtering flags (issue3030)...
r15528 while i1 < e1 and lines1[i1] == 0:
i1 += 1
while i2 < e2 and lines2[i2] == 0:
i2 += 1
else:
# Consume the matching lines
while i1 < e1 and lines1[i1] == 1 and lines2[i2] == 1:
i1 += 1
i2 += 1
yield [base1 + s1, base1 + i1, base2 + s2, base2 + i2], btype
s1 = i1
s2 = i2
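
A small worked example may help with `splitblock`. The sketch below mirrors the loop above but is not the original function: `splitblock_sketch` is a hypothetical name, it takes precomputed flags (1 for a non-blank line, 0 for a blank one) instead of calling `wsclean`, and it uses `str` block types. Like the original, it assumes the two sides match once blank lines are ignored.

```python
def splitblock_sketch(base1, flags1, base2, flags2):
    # flags: 1 marks a non-blank line, 0 marks a blank line.
    s1, e1 = 0, len(flags1)
    s2, e2 = 0, len(flags2)
    while s1 < e1 or s2 < e2:
        i1, i2, btype = s1, s2, '='
        if i1 >= e1 or flags1[i1] == 0 or i2 >= e2 or flags2[i2] == 0:
            # Consume the run of blank lines on each side.
            btype = '~'
            while i1 < e1 and flags1[i1] == 0:
                i1 += 1
            while i2 < e2 and flags2[i2] == 0:
                i2 += 1
        else:
            # Consume lines that are non-blank on both sides.
            while i1 < e1 and flags1[i1] == 1 and flags2[i2] == 1:
                i1 += 1
                i2 += 1
        yield [base1 + s1, base1 + i1, base2 + s2, base2 + i2], btype
        s1, s2 = i1, i2


# Side 1 is "a", "", "b"; side 2 is "a", "b", "".
result = list(splitblock_sketch(10, [1, 0, 1], 20, [1, 1, 0]))
for block, btype in result:
    print(block, btype)
# [10, 11, 20, 21] =
# [11, 12, 21, 21] ~
# [12, 13, 21, 22] =
# [13, 13, 22, 23] ~
```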
Augie Fackler
formatting: blacken the codebase...
r43346
Denis Laxalde
mdiff: add a hunkinrange helper function...
r31808 def hunkinrange(hunk, linerange):
"""Return True if `hunk` defined as (start, length) is in `linerange`
defined as (lowerbound, upperbound).
>>> hunkinrange((5, 10), (2, 7))
True
>>> hunkinrange((5, 10), (6, 12))
True
>>> hunkinrange((5, 10), (13, 17))
True
>>> hunkinrange((5, 10), (3, 17))
True
>>> hunkinrange((5, 10), (1, 3))
False
>>> hunkinrange((5, 10), (18, 20))
False
>>> hunkinrange((5, 10), (1, 5))
False
>>> hunkinrange((5, 10), (15, 27))
False
"""
start, length = hunk
lowerbound, upperbound = linerange
return lowerbound < start + length and start < upperbound
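
For non-empty hunks and non-empty ranges, the predicate above appears to coincide with the usual overlap test for two half-open intervals, `[start, start + length)` and `[lowerbound, upperbound)`. The sketch below checks that reading on a small grid; `halfopen_overlap` and `agree_on_grid` are hypothetical helpers introduced only for this illustration.

```python
def hunkinrange(hunk, linerange):
    start, length = hunk
    lowerbound, upperbound = linerange
    return lowerbound < start + length and start < upperbound


def halfopen_overlap(start, end, lowerbound, upperbound):
    # Hypothetical helper: [start, end) and [lowerbound, upperbound)
    # share at least one integer position.
    return max(start, lowerbound) < min(end, upperbound)


def agree_on_grid(limit=12):
    # Only non-empty hunks (length >= 1) and non-empty ranges are compared;
    # the two formulations differ for empty intervals.
    for start in range(limit):
        for length in range(1, limit):
            for lowerbound in range(limit):
                for upperbound in range(lowerbound + 1, limit + 1):
                    expected = halfopen_overlap(
                        start, start + length, lowerbound, upperbound
                    )
                    actual = hunkinrange(
                        (start, length), (lowerbound, upperbound)
                    )
                    if actual != expected:
                        return False
    return True


print(agree_on_grid())  # True
```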
Augie Fackler
formatting: blacken the codebase...
r43346
Denis Laxalde
mdiff: add a "blocksinrange" function to filter diff blocks by line range...
r30717 def blocksinrange(blocks, rangeb):
"""filter `blocks` like (a1, a2, b1, b2) from items outside line range
`rangeb` from ``(b1, b2)`` point of view.
Return `filteredblocks, rangea` where:
* `filteredblocks` is list of ``block = (a1, a2, b1, b2), stype`` items of
`blocks` that are inside `rangeb` from ``(b1, b2)`` point of view; a
block ``(b1, b2)`` being inside `rangeb` if
``rangeb[0] < b2 and b1 < rangeb[1]``;
* `rangea` is the line range w.r.t. the ``(a1, a2)`` parts of `blocks`.
"""
lbb, ubb = rangeb
lba, uba = None, None
filteredblocks = []
for block in blocks:
(a1, a2, b1, b2), stype = block
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 if lbb >= b1 and ubb <= b2 and stype == b'=':
Denis Laxalde
mdiff: add a "blocksinrange" function to filter diff blocks by line range...
r30717 # rangeb is within a single "=" hunk, restrict back linerange1
# by offsetting rangeb
lba = lbb - b1 + a1
uba = ubb - b1 + a1
else:
if b1 <= lbb < b2:
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 if stype == b'=':
Denis Laxalde
mdiff: add a "blocksinrange" function to filter diff blocks by line range...
r30717 lba = a2 - (b2 - lbb)
else:
lba = a1
if b1 < ubb <= b2:
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 if stype == b'=':
Denis Laxalde
mdiff: add a "blocksinrange" function to filter diff blocks by line range...
r30717 uba = a1 + (ubb - b1)
else:
uba = a2
Denis Laxalde
mdiff: add a hunkinrange helper function...
r31808 if hunkinrange((b1, (b2 - b1)), rangeb):
Denis Laxalde
mdiff: add a "blocksinrange" function to filter diff blocks by line range...
r30717 filteredblocks.append(block)
if lba is None or uba is None or uba < lba:
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 raise error.Abort(_(b'line range exceeds file size'))
Denis Laxalde
mdiff: add a "blocksinrange" function to filter diff blocks by line range...
r30717 return filteredblocks, (lba, uba)
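
The mapping from a "b"-side line range back to the "a" side is easier to follow on concrete numbers. The sketch below repeats the logic above in standalone form with a few stated substitutions: `blocksinrange_sketch` is a hypothetical name, block types are `str` instead of bytes, and `ValueError` replaces `error.Abort`.

```python
def hunkinrange(hunk, linerange):
    start, length = hunk
    lowerbound, upperbound = linerange
    return lowerbound < start + length and start < upperbound


def blocksinrange_sketch(blocks, rangeb):
    lbb, ubb = rangeb
    lba, uba = None, None
    filteredblocks = []
    for block in blocks:
        (a1, a2, b1, b2), stype = block
        if lbb >= b1 and ubb <= b2 and stype == '=':
            # rangeb sits inside one matching block: shift it by the offset.
            lba = lbb - b1 + a1
            uba = ubb - b1 + a1
        else:
            if b1 <= lbb < b2:
                lba = a2 - (b2 - lbb) if stype == '=' else a1
            if b1 < ubb <= b2:
                uba = a1 + (ubb - b1) if stype == '=' else a2
        if hunkinrange((b1, b2 - b1), rangeb):
            filteredblocks.append(block)
    if lba is None or uba is None or uba < lba:
        # Assumption: ValueError stands in for error.Abort.
        raise ValueError('line range exceeds file size')
    return filteredblocks, (lba, uba)


blocks = [
    ((0, 3, 0, 3), '='),    # three unchanged lines
    ((3, 5, 3, 7), '!'),    # two old lines replaced by four new ones
    ((5, 10, 7, 12), '='),  # five unchanged lines, shifted by two
]
print(blocksinrange_sketch(blocks, (2, 9)))
# ([((0, 3, 0, 3), '='), ((3, 5, 3, 7), '!'), ((5, 10, 7, 12), '=')], (2, 7))
print(blocksinrange_sketch(blocks, (8, 10)))
# ([((5, 10, 7, 12), '=')], (6, 8))
```

In the second call the range lies entirely inside the last matching block, so only the constant offset between the two sides (here 2) is applied.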
Augie Fackler
formatting: blacken the codebase...
r43346
Jun Wu
mdiff: add a config option to use xdiff algorithm...
r36694 def chooseblocksfunc(opts=None):
Augie Fackler
formatting: blacken the codebase...
r43346 if (
opts is None
or not opts.xdiff
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 or not util.safehasattr(bdiff, b'xdiffblocks')
Augie Fackler
formatting: blacken the codebase...
r43346 ):
Jun Wu
mdiff: add a config option to use xdiff algorithm...
r36694 return bdiff.blocks
else:
return bdiff.xdiffblocks
Augie Fackler
formatting: blacken the codebase...
r43346
Philippe Pepiot
mdiff: remove unused parameter 'refine' from allblocks()
r30023 def allblocks(text1, text2, opts=None, lines1=None, lines2=None):
Patrick Mezard
mdiff: make diffblocks() return all blocks, matching and changed...
r15526 """Return (block, type) tuples, where block is an mdiff.blocks
line entry. type is '=' for blocks matching exactly one another
(bdiff blocks), '!' for non-matching blocks and '~' for blocks
Philippe Pepiot
mdiff: remove unused parameter 'refine' from allblocks()
r30023 matching only after having filtered blank lines.
Patrick Mezard
mdiff: make diffblocks() return all blocks, matching and changed...
r15526 lines1 and lines2 are text1 and text2 split with splitnewlines() if
they are already available.
Patrick Mezard
mdiff: extract blocks whitespace normalization in diffblocks()...
r15525 """
if opts is None:
opts = defaultopts
David Soria Parra
mdiff: add a --ignore-space-at-eol option...
r34015 if opts.ignorews or opts.ignorewsamount or opts.ignorewseol:
Patrick Mezard
mdiff: extract blocks whitespace normalization in diffblocks()...
r15525 text1 = wsclean(opts, text1, False)
text2 = wsclean(opts, text2, False)
Jun Wu
mdiff: add a config option to use xdiff algorithm...
r36694 diff = chooseblocksfunc(opts)(text1, text2)
Patrick Mezard
mdiff: extract blocks whitespace normalization in diffblocks()...
r15525 for i, s1 in enumerate(diff):
# The first match is special.
# we've either found a match starting at line 0 or a match later
# in the file. If it starts later, old and new below will both be
# empty and we'll continue to the next match.
if i > 0:
s = diff[i - 1]
else:
s = [0, 0, 0, 0]
s = [s[1], s1[0], s[3], s1[2]]
# bdiff sometimes gives huge matches past eof, this check eats them,
# and deals with the special first match case described above
Patrick Mezard
mdiff: split lines in allblocks() only when necessary...
r15529 if s[0] != s[1] or s[2] != s[3]:
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 type = b'!'
Patrick Mezard
mdiff: make diffblocks() return all blocks, matching and changed...
r15526 if opts.ignoreblanklines:
Patrick Mezard
mdiff: split lines in allblocks() only when necessary...
r15529 if lines1 is None:
lines1 = splitnewlines(text1)
if lines2 is None:
lines2 = splitnewlines(text2)
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 old = wsclean(opts, b"".join(lines1[s[0] : s[1]]))
new = wsclean(opts, b"".join(lines2[s[2] : s[3]]))
Patrick Mezard
mdiff: split lines in allblocks() only when necessary...
r15529 if old == new:
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 type = b'~'
Patrick Mezard
mdiff: make diffblocks() return all blocks, matching and changed...
r15526 yield s, type
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 yield s1, b'='
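
The core of `allblocks` is turning a list of matching blocks into an alternating stream of changed and matching blocks. The following sketch shows only that step, under simplifying assumptions: `gaps_and_matches` is a hypothetical name, the matching blocks are supplied by hand rather than computed by `bdiff.blocks`, block types are `str`, and the `'~'` (blank-line-only) case is left out.

```python
def gaps_and_matches(matches):
    """Interleave '!' gaps with '=' matches.

    matches: ordered (a1, a2, b1, b2) matching blocks, ending with a
    zero-length block at the end of both texts (assumed input shape).
    """
    prev = (0, 0, 0, 0)
    for cur in matches:
        # The gap runs from the end of the previous match to the start
        # of the current one, on both sides.
        gap = (prev[1], cur[0], prev[3], cur[2])
        if gap[0] != gap[1] or gap[2] != gap[3]:
            yield gap, '!'
        yield cur, '='
        prev = cur


# Old text has 5 lines, new text has 6: one old line became two new lines.
matches = [(0, 2, 0, 2), (3, 5, 4, 6), (5, 5, 6, 6)]
stream = list(gaps_and_matches(matches))
for block, btype in stream:
    print(block, btype)
# (0, 2, 0, 2) =
# (2, 3, 2, 4) !
# (3, 5, 4, 6) =
# (5, 5, 6, 6) =
```

An empty gap (equal start and end on both sides) is skipped, which is why no `'!'` block appears before the first or the final match here.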
Patrick Mezard
mdiff: extract blocks whitespace normalization in diffblocks()...
r15525
Augie Fackler
formatting: blacken the codebase...
r43346
Yuya Nishihara
patch: unify check_binary and binary flags...
r35969 def unidiff(a, ad, b, bd, fn1, fn2, binary, opts=defaultopts):
Denis Laxalde
mdiff: let unidiff return (diffheader, hunks)...
r31273 """Return a unified diff as a (headers, hunks) tuple.
Denis Laxalde
mdiff: distinguish diff headers from hunks in unidiff()...
r31271
If the diff is not null, `headers` is a list with unified diff header
Denis Laxalde
mdiff: let unidiff return (diffheader, hunks)...
r31273 lines "--- <original>" and "+++ <new>" and `hunks` is a generator yielding
(hunkrange, hunklines) coming from _unidiff().
Otherwise, `headers` and `hunks` are empty.
Joerg Sonnenberger
patch: avoid repeated binary checks if all files in a patch are text...
r35868
Yuya Nishihara
patch: unify check_binary and binary flags...
r35969 Set binary=True if either a or b should be taken as a binary file.
Denis Laxalde
mdiff: distinguish diff headers from hunks in unidiff()...
r31271 """
Augie Fackler
formatting: blacken the codebase...
r43346
Patrick Mezard
mdiff: fix diff header generation for files with spaces (issue3357)...
r16362 def datetag(date, fn=None):
Alexis S. L. Carvalho
git patches: correct handling of filenames with spaces...
r4679 if not opts.git and not opts.nodates:
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 return b'\t%s' % date
if fn and b' ' in fn:
return b'\t'
return b''
Brendan Cully
Remove dates from git export file lines - they confuse git-apply
r3026
Denis Laxalde
mdiff: let unidiff return (diffheader, hunks)...
r31273 sentinel = [], ()
Matt Mackall
many, many trivial check-code fixups
r10282 if not a and not b:
Denis Laxalde
mdiff: distinguish diff headers from hunks in unidiff()...
r31271 return sentinel
Siddharth Agarwal
mdiff.unidiff: add support for noprefix
r23299
if opts.noprefix:
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 aprefix = bprefix = b''
Siddharth Agarwal
mdiff.unidiff: add support for noprefix
r23299 else:
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 aprefix = b'a/'
bprefix = b'b/'
Siddharth Agarwal
mdiff.unidiff: add support for noprefix
r23299
Boris Feld
util: extract all date-related utils in utils/dateutil module...
r36625 epoch = dateutil.datestr((0, 0))
mpm@selenic.com
Attempt to make diff deal with null sources properly...
r264
Mads Kiilerich
diff: always use / in paths in diff...
r15437 fn1 = util.pconvert(fn1)
fn2 = util.pconvert(fn2)
Yuya Nishihara
patch: unify check_binary and binary flags...
r35969 if binary:
Martin Geisler
mdiff: compare content of binary files directly...
r6871 if a and b and len(a) == len(b) and a == b:
Denis Laxalde
mdiff: distinguish diff headers from hunks in unidiff()...
r31271 return sentinel
headerlines = []
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 hunks = ((None, [b'Binary file %s has changed\n' % fn1]),)
Thomas Arendsen Hein
Fix diff against an empty file (issue124) and add a test for this.
r1723 elif not a:
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 without_newline = not b.endswith(b'\n')
Vadim Gelfer
fix speed regression in mdiff caused by line split bugfix.
r2251 b = splitnewlines(b)
Thomas Arendsen Hein
Fix diff against an empty file (issue124) and add a test for this.
r1723 if a is None:
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 l1 = b'--- /dev/null%s' % datetag(epoch)
Thomas Arendsen Hein
Fix diff against an empty file (issue124) and add a test for this.
r1723 else:
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 l1 = b"--- %s%s%s" % (aprefix, fn1, datetag(ad, fn1))
l2 = b"+++ %s%s" % (bprefix + fn2, datetag(bd, fn2))
Denis Laxalde
mdiff: distinguish diff headers from hunks in unidiff()...
r31271 headerlines = [l1, l2]
Denis Laxalde
mdiff: let unidiff return (diffheader, hunks)...
r31273 size = len(b)
hunkrange = (0, 0, 1, size)
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 hunklines = [b"@@ -0,0 +1,%d @@\n" % size] + [b"+" + e for e in b]
Joerg Sonnenberger
mdiff: explicitly compute places for the newline marker...
r35869 if without_newline:
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 hunklines[-1] += b'\n'
Joerg Sonnenberger
mdiff: explicitly compute places for the newline marker...
r35869 hunklines.append(_missing_newline_marker)
Augie Fackler
formatting: blacken the codebase...
r43346 hunks = ((hunkrange, hunklines),)
Thomas Arendsen Hein
Fix diff against an empty file (issue124) and add a test for this.
r1723 elif not b:
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 without_newline = not a.endswith(b'\n')
Vadim Gelfer
fix speed regression in mdiff caused by line split bugfix.
r2251 a = splitnewlines(a)
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 l1 = b"--- %s%s%s" % (aprefix, fn1, datetag(ad, fn1))
Thomas Arendsen Hein
Fix diff against an empty file (issue124) and add a test for this.
r1723 if b is None:
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 l2 = b'+++ /dev/null%s' % datetag(epoch)
Thomas Arendsen Hein
Fix diff against an empty file (issue124) and add a test for this.
r1723 else:
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 l2 = b"+++ %s%s%s" % (bprefix, fn2, datetag(bd, fn2))
Denis Laxalde
mdiff: distinguish diff headers from hunks in unidiff()...
r31271 headerlines = [l1, l2]
Denis Laxalde
mdiff: let unidiff return (diffheader, hunks)...
r31273 size = len(a)
hunkrange = (1, size, 0, 0)
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 hunklines = [b"@@ -1,%d +0,0 @@\n" % size] + [b"-" + e for e in a]
Joerg Sonnenberger
mdiff: explicitly compute places for the newline marker...
r35869 if without_newline:
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 hunklines[-1] += b'\n'
Joerg Sonnenberger
mdiff: explicitly compute places for the newline marker...
r35869 hunklines.append(_missing_newline_marker)
Augie Fackler
formatting: blacken the codebase...
r43346 hunks = ((hunkrange, hunklines),)
mpm@selenic.com
Attempt to make diff deal with null sources properly...
r264 else:
Joerg Sonnenberger
mdiff: remove rewindhunk by yielding a bool first to indicate data...
r35870 hunks = _unidiff(a, b, opts=opts)
if not next(hunks):
Denis Laxalde
mdiff: distinguish diff headers from hunks in unidiff()...
r31271 return sentinel
Benoit Boissinot
remove header handling out of mdiff.bunidiff, rename it
r10614
Denis Laxalde
mdiff: distinguish diff headers from hunks in unidiff()...
r31271 headerlines = [
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 b"--- %s%s%s" % (aprefix, fn1, datetag(ad, fn1)),
b"+++ %s%s%s" % (bprefix, fn2, datetag(bd, fn2)),
Denis Laxalde
mdiff: distinguish diff headers from hunks in unidiff()...
r31271 ]
Denis Laxalde
mdiff: let unidiff return (diffheader, hunks)...
r31273
return headerlines, hunks
mpm@selenic.com
Add back links from file revisions to changeset revisions...
r0
Augie Fackler
formatting: blacken the codebase...
r43346
Denis Laxalde
mdiff: compute newlines-splitted texts within _unidiff...
r31267 def _unidiff(t1, t2, opts=defaultopts):
Denis Laxalde
mdiff: let _unidiff yield hunks as (<range information>, <hunk lines>)...
r31269 """Yield hunks of a headerless unified diff from t1 and t2 texts.
Each hunk consists of a (hunkrange, hunklines) tuple where `hunkrange` is a
tuple (s1, l1, s2, l2) representing the range information of the hunk to
form the '@@ -s1,l1 +s2,l2 @@' header and `hunklines` is a list of lines
of the hunk combining said header followed by line additions and
deletions.
Joerg Sonnenberger
mdiff: explicitly compute places for the newline marker...
r35869
The hunks are prefixed with a bool.
Denis Laxalde
mdiff: let _unidiff yield hunks as (<range information>, <hunk lines>)...
r31269 """
Denis Laxalde
mdiff: compute newlines-splitted texts within _unidiff...
r31267 l1 = splitnewlines(t1)
l2 = splitnewlines(t2)
Augie Fackler
formatting: blacken the codebase...
r43346
mason@suse.com
Add new bdiff based unidiff generation.
r1637 def contextend(l, len):
Vadim Gelfer
refactor text diff/patch code....
r2874 ret = l + opts.context
mason@suse.com
Add new bdiff based unidiff generation.
r1637 if ret > len:
ret = len
return ret
def contextstart(l):
Vadim Gelfer
refactor text diff/patch code....
r2874 ret = l - opts.context
mason@suse.com
Add new bdiff based unidiff generation.
r1637 if ret < 0:
return 0
return ret
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 lastfunc = [0, b'']
Augie Fackler
formatting: blacken the codebase...
r43346
Benoit Boissinot
remove header handling out of mdiff.bunidiff, rename it
r10614 def yieldhunk(hunk):
mason@suse.com
Add new bdiff based unidiff generation.
r1637 (astart, a2, bstart, b2, delta) = hunk
aend = contextend(a2, len(l1))
alen = aend - astart
blen = b2 - bstart + aend - a2
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 func = b""
Vadim Gelfer
refactor text diff/patch code....
r2874 if opts.showfunc:
Brodie Rao
mdiff: speed up showfunc for large diffs...
r15141 lastpos, func = lastfunc
# walk backwards from the start of the context up to the start of
# the previous hunk context until we find a line starting with an
# alphanumeric char.
Gregory Szorc
global: use pycompat.xrange()...
r38806 for i in pycompat.xrange(astart - 1, lastpos - 1, -1):
Pulkit Goyal
py3: slice on bytes instead of indexing...
r35601 if l1[i][0:1].isalnum():
Yuya Nishihara
diff: do not split function name if character encoding is unknown...
r36432 func = b' ' + l1[i].rstrip()
# split long function name if ASCII. otherwise we have no
# idea where the multi-byte boundary is, so just leave it.
if encoding.isasciistr(func):
func = func[:41]
Brodie Rao
mdiff: speed up showfunc for large diffs...
r15141 lastfunc[1] = func
mason@suse.com
Add new bdiff based unidiff generation.
r1637 break
Brodie Rao
mdiff: speed up showfunc for large diffs...
r15141 # by recording this hunk's starting point as the next place to
# start looking for function lines, we avoid reading any line in
# the file more than once.
lastfunc[0] = astart
mason@suse.com
Add new bdiff based unidiff generation.
r1637
Nicolas Venegas
mdiff/patch: fix bad hunk handling for unified diffs with zero context...
r15462 # zero-length hunk ranges report their start line as one less
if alen:
astart += 1
if blen:
bstart += 1
Denis Laxalde
mdiff: let _unidiff yield hunks as (<range information>, <hunk lines>)...
r31269 hunkrange = astart, alen, bstart, blen
hunklines = (
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 [b"@@ -%d,%d +%d,%d @@%s\n" % (hunkrange + (func,))]
Denis Laxalde
mdiff: let _unidiff yield hunks as (<range information>, <hunk lines>)...
r31269 + delta
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 + [b' ' + l1[x] for x in pycompat.xrange(a2, aend)]
Denis Laxalde
mdiff: let _unidiff yield hunks as (<range information>, <hunk lines>)...
r31269 )
Joerg Sonnenberger
mdiff: explicitly compute places for the newline marker...
r35869 # If either file ends without a newline and the last line of
# that file is part of a hunk, a marker is printed. If the
# last line of both files is identical and neither ends in
# a newline, print only one marker. That's the only case in
# which the hunk can end in a shared line without a newline.
skip = False
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 if not t1.endswith(b'\n') and astart + alen == len(l1) + 1:
Gregory Szorc
global: use pycompat.xrange()...
r38806 for i in pycompat.xrange(len(hunklines) - 1, -1, -1):
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 if hunklines[i].startswith((b'-', b' ')):
if hunklines[i].startswith(b' '):
Joerg Sonnenberger
mdiff: explicitly compute places for the newline marker...
r35869 skip = True
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 hunklines[i] += b'\n'
Joerg Sonnenberger
mdiff: explicitly compute places for the newline marker...
r35869 hunklines.insert(i + 1, _missing_newline_marker)
break
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 if not skip and not t2.endswith(b'\n') and bstart + blen == len(l2) + 1:
Gregory Szorc
global: use pycompat.xrange()...
r38806 for i in pycompat.xrange(len(hunklines) - 1, -1, -1):
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 if hunklines[i].startswith(b'+'):
hunklines[i] += b'\n'
Joerg Sonnenberger
mdiff: explicitly compute places for the newline marker...
r35869 hunklines.insert(i + 1, _missing_newline_marker)
break
Denis Laxalde
mdiff: let _unidiff yield hunks as (<range information>, <hunk lines>)...
r31269 yield hunkrange, hunklines
mason@suse.com
Add new bdiff based unidiff generation.
r1637
# bdiff.blocks gives us the matching sequences in the files. The loop
# below finds the spaces between those matching sequences and translates
# them into diff output.
#
hunk = None
Patrick Mezard
mdiff: adjust hunk offsets with --ignore-blank-lines (issue3234)...
r16089 ignoredlines = 0
Joerg Sonnenberger
mdiff: remove rewindhunk by yielding a bool first to indicate data...
r35870 has_hunks = False
Patrick Mezard
mdiff: make diffblocks() return all blocks, matching and changed...
r15526 for s, stype in allblocks(t1, t2, opts, l1, l2):
Patrick Mezard
mdiff: adjust hunk offsets with --ignore-blank-lines (issue3234)...
r16089 a1, a2, b1, b2 = s
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 if stype != b'!':
if stype == b'~':
Patrick Mezard
mdiff: adjust hunk offsets with --ignore-blank-lines (issue3234)...
r16089 # The diff context lines are based on t1 content. When
# blank lines are ignored, the new lines offsets must
# be adjusted as if equivalent blocks ('~') had the
# same sizes on both sides.
ignoredlines += (b2 - b1) - (a2 - a1)
Patrick Mezard
mdiff: make diffblocks() return all blocks, matching and changed...
r15526 continue
mason@suse.com
Add new bdiff based unidiff generation.
r1637 delta = []
old = l1[a1:a2]
new = l2[b1:b2]
Patrick Mezard
mdiff: adjust hunk offsets with --ignore-blank-lines (issue3234)...
r16089 b1 -= ignoredlines
b2 -= ignoredlines
mason@suse.com
Add new bdiff based unidiff generation.
r1637 astart = contextstart(a1)
bstart = contextstart(b1)
prev = None
if hunk:
# join with the previous hunk if it falls inside the context
Vadim Gelfer
refactor text diff/patch code....
r2874 if astart < hunk[1] + opts.context + 1:
mason@suse.com
Add new bdiff based unidiff generation.
r1637 prev = hunk
astart = hunk[1]
bstart = hunk[3]
else:
Joerg Sonnenberger
mdiff: remove rewindhunk by yielding a bool first to indicate data...
r35870 if not has_hunks:
has_hunks = True
yield True
Benoit Boissinot
remove header handling out of mdiff.bunidiff, rename it
r10614 for x in yieldhunk(hunk):
mason@suse.com
Add new bdiff based unidiff generation.
r1637 yield x
if prev:
# we've joined the previous hunk, record the new ending points.
hunk[1] = a2
hunk[3] = b2
delta = hunk[4]
else:
# create a new hunk
Matt Mackall
many, many trivial check-code fixups
r10282 hunk = [astart, a2, bstart, b2, delta]
mason@suse.com
Add new bdiff based unidiff generation.
r1637
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 delta[len(delta) :] = [b' ' + x for x in l1[astart:a1]]
delta[len(delta) :] = [b'-' + x for x in old]
delta[len(delta) :] = [b'+' + x for x in new]
mason@suse.com
Add new bdiff based unidiff generation.
r1637
if hunk:
Joerg Sonnenberger
mdiff: remove rewindhunk by yielding a bool first to indicate data...
r35870 if not has_hunks:
has_hunks = True
yield True
Benoit Boissinot
remove header handling out of mdiff.bunidiff, rename it
r10614 for x in yieldhunk(hunk):
mason@suse.com
Add new bdiff based unidiff generation.
r1637 yield x
Joerg Sonnenberger
mdiff: remove rewindhunk by yielding a bool first to indicate data...
r35870 elif not has_hunks:
yield False
mason@suse.com
Add new bdiff based unidiff generation.
r1637
Augie Fackler
formatting: blacken the codebase...
r43346
Guillermo Pérez <bisho at fb.com>
diff: move b85diff to mdiff module...
r17939 def b85diff(to, tn):
'''return base85-encoded binary diff'''
Augie Fackler
formatting: blacken the codebase...
r43346
Guillermo Pérez <bisho at fb.com>
diff: move b85diff to mdiff module...
r17939 def fmtline(line):
l = len(line)
if l <= 26:
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 l = pycompat.bytechr(ord(b'A') + l - 1)
Guillermo Pérez <bisho at fb.com>
diff: move b85diff to mdiff module...
r17939 else:
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 l = pycompat.bytechr(l - 26 + ord(b'a') - 1)
return b'%c%s\n' % (l, util.b85encode(line, True))
Guillermo Pérez <bisho at fb.com>
diff: move b85diff to mdiff module...
r17939
def chunk(text, csize=52):
l = len(text)
i = 0
while i < l:
Augie Fackler
formatting: blacken the codebase...
r43346 yield text[i : i + csize]
Guillermo Pérez <bisho at fb.com>
diff: move b85diff to mdiff module...
r17939 i += csize
Guillermo Pérez
diff: move index header generation to patch...
r17946 if to is None:
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 to = b''
Guillermo Pérez
diff: move index header generation to patch...
r17946 if tn is None:
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 tn = b''
Guillermo Pérez
diff: move index header generation to patch...
r17946
if to == tn:
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 return b''
Guillermo Pérez <bisho at fb.com>
diff: move b85diff to mdiff module...
r17939
# TODO: deltas
Guillermo Pérez
diff: move index header generation to patch...
r17946 ret = []
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 ret.append(b'GIT binary patch\n')
ret.append(b'literal %d\n' % len(tn))
Guillermo Pérez <bisho at fb.com>
diff: move b85diff to mdiff module...
r17939 for l in chunk(zlib.compress(tn)):
ret.append(fmtline(l))
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 ret.append(b'\n')
Guillermo Pérez
diff: move index header generation to patch...
r17946
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 return b''.join(ret)
Guillermo Pérez <bisho at fb.com>
diff: move b85diff to mdiff module...
r17939
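# Illustration only, not part of mdiff.py: a standalone sketch of the "literal"
# layout that b85diff emits. It assumes the standard library's
# base64.b85encode(..., pad=True) is an adequate stand-in for
# util.b85encode(line, True); length_prefix, binary_literal and decode_literal
# are hypothetical names.

```python
import base64
import zlib


def length_prefix(n):
    """Map a chunk length of 1..52 bytes to one letter: A-Z, then a-z."""
    if not 1 <= n <= 52:
        raise ValueError("chunk length must be in 1..52")
    if n <= 26:
        return bytes([ord("A") + n - 1])
    return bytes([ord("a") + n - 27])


def binary_literal(data, csize=52):
    """Build a 'GIT binary patch' literal section for data."""
    compressed = zlib.compress(data)
    lines = [b"GIT binary patch\n", b"literal %d\n" % len(data)]
    for i in range(0, len(compressed), csize):
        piece = compressed[i : i + csize]
        encoded = base64.b85encode(piece, pad=True)
        lines.append(length_prefix(len(piece)) + encoded + b"\n")
    lines.append(b"\n")
    return b"".join(lines)


def decode_literal(patch):
    """Reverse binary_literal: strip the two header lines, decode, inflate."""
    pieces = []
    for line in patch.split(b"\n")[2:]:
        if not line:
            continue
        c = line[0]
        n = c - ord("A") + 1 if c <= ord("Z") else c - ord("a") + 27
        # Padding added by the encoder is dropped by truncating to n bytes.
        pieces.append(base64.b85decode(line[1:])[:n])
    return zlib.decompress(b"".join(pieces))


data = b"hello binary world" * 10
patch = binary_literal(data)
```

# The length letter is what lets a reader discard the padding bytes on the
# final, short chunk.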
Augie Fackler
formatting: blacken the codebase...
r43346
mpm@selenic.com
Add a function to return the new text from a binary diff
r120 def patchtext(bin):
pos = 0
t = []
while pos < len(bin):
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 p1, p2, l = struct.unpack(b">lll", bin[pos : pos + 12])
mpm@selenic.com
Add a function to return the new text from a binary diff
r120 pos += 12
Augie Fackler
formatting: blacken the codebase...
r43346 t.append(bin[pos : pos + l])
mpm@selenic.com
Add a function to return the new text from a binary diff
r120 pos += l
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 return b"".join(t)
mpm@selenic.com
Add a function to return the new text from a binary diff
r120
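# Illustration only, not part of mdiff.py: a pure-Python sketch of the delta
# layout that patchtext walks, namely a 12-byte big-endian header
# (start, end, length) followed by `length` bytes of replacement data, repeated.
# make_delta, new_text and apply_delta are hypothetical helpers; apply_delta
# assumes fragments are sorted and non-overlapping and is not a substitute for
# mpatch.patches.

```python
import struct


def make_delta(fragments):
    """Serialize (start, end, data) fragments into one binary delta."""
    return b"".join(
        struct.pack(">lll", start, end, len(data)) + data
        for start, end, data in fragments
    )


def new_text(delta):
    """Collect only the replacement data, as patchtext does."""
    pos = 0
    parts = []
    while pos < len(delta):
        _start, _end, length = struct.unpack(">lll", delta[pos : pos + 12])
        pos += 12
        parts.append(delta[pos : pos + length])
        pos += length
    return b"".join(parts)


def apply_delta(base, delta):
    """Replace base[start:end] with each fragment's data."""
    out = []
    last = 0
    pos = 0
    while pos < len(delta):
        start, end, length = struct.unpack(">lll", delta[pos : pos + 12])
        pos += 12
        out.append(base[last:start])  # untouched bytes before the fragment
        out.append(delta[pos : pos + length])
        pos += length
        last = end
    out.append(base[last:])  # untouched tail
    return b"".join(out)


base = b"line one\nline two\nline three\n"
delta = make_delta([(9, 18, b"LINE 2\n")])
patched = apply_delta(base, delta)
```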
Augie Fackler
formatting: blacken the codebase...
r43346
mpm@selenic.com
Add back links from file revisions to changeset revisions...
r0 def patch(a, bin):
Benoit Boissinot
mdiff.patch(): add a special case for when the base text is empty...
r12025 if len(a) == 0:
# skip over trivial delta header
Matt Mackall
util: don't mess with builtins to emulate buffer()
r15657 return util.buffer(bin, 12)
Matt Mackall
Clean up mdiff imports
r1379 return mpatch.patches(a, [bin])
mpm@selenic.com
Start using bdiff for generating deltas...
r432
Augie Fackler
formatting: blacken the codebase...
r43346
Alexis S. L. Carvalho
add mdiff.get_matching_blocks
r4361 # similar to difflib.SequenceMatcher.get_matching_blocks
def get_matching_blocks(a, b):
return [(d[0], d[2], d[1] - d[0]) for d in bdiff.blocks(a, b)]
Augie Fackler
formatting: blacken the codebase...
r43346
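# Illustration only, not part of mdiff.py: a sketch of the tuple conversion
# performed by get_matching_blocks, checked against difflib. The
# (a1, a2, b1, b2) input shape is inferred from the indexing above
# (d[0], d[2], d[1] - d[0]); the sample blocks are hand-written, not real
# bdiff.blocks output, and to_matching_blocks is a hypothetical name.

```python
import difflib


def to_matching_blocks(blocks):
    """Convert (a1, a2, b1, b2) ranges into difflib-style (i, j, n) triples."""
    return [(a1, b1, a2 - a1) for a1, a2, b1, b2 in blocks]


a = ["x\n", "y\n", "z\n"]
b = ["x\n", "q\n", "z\n"]

# Hand-written ranges: line 0 matches, line 2 matches, then an empty sentinel.
sample_blocks = [(0, 1, 0, 1), (2, 3, 2, 3), (3, 3, 3, 3)]
converted = to_matching_blocks(sample_blocks)

reference = [
    tuple(m) for m in difflib.SequenceMatcher(None, a, b).get_matching_blocks()
]
```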
Matt Mackall
revlog: generate trivial deltas against null revision...
r5367 def trivialdiffheader(length):
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 return struct.pack(b">lll", 0, 0, length) if length else b''
Matt Mackall
revlog: generate trivial deltas against null revision...
r5367
Augie Fackler
formatting: blacken the codebase...
r43346
Mike Edgar
mdiff: add helper for making deltas which replace the full text of a revision...
r24119 def replacediffheader(oldlen, newlen):
Augie Fackler
formatting: byteify all mercurial/ and hgext/ string literals...
r43347 return struct.pack(b">lll", 0, oldlen, newlen)
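# Illustration only, not part of mdiff.py: a sketch showing the bytes produced
# by the two header helpers above, and why patch() can skip the first 12 bytes
# when the base text is empty. trivial_header and replace_header are
# hypothetical local copies so the snippet runs on its own.

```python
import struct


def trivial_header(length):
    """Header for a delta that inserts `length` bytes into an empty base."""
    return struct.pack(">lll", 0, 0, length) if length else b""


def replace_header(oldlen, newlen):
    """Header for a delta that replaces all `oldlen` bytes with `newlen` bytes."""
    return struct.pack(">lll", 0, oldlen, newlen)


text = b"hello"
delta = trivial_header(len(text)) + text

# With an empty base, everything after the single 12-byte header is the result.
recovered = delta[12:]
```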