##// END OF EJS Templates
parsers: inline fields of dirstate values in C version...
parsers: inline fields of dirstate values in C version Previously, while unpacking the dirstate we'd create 3-4 new CPython objects for most dirstate values: - the state is a single character string, which is pooled by CPython - the mode is a new object if it isn't 0 due to being in the lookup set - the size is a new object if it is greater than 255 - the mtime is a new object if it isn't -1 due to being in the lookup set - the tuple to contain them all In some cases such as regular hg status, we actually look at all the objects. In other cases like hg add, hg status for a subdirectory, or hg status with the third-party hgwatchman enabled, we look at almost none of the objects. This patch eliminates most object creation in these cases by defining a custom C struct that is exposed to Python with an interface similar to a tuple. Only when tuple elements are actually requested are the respective objects created. The gains, where they're expected, are significant. The following tests are run against a working copy with over 270,000 files. parse_dirstate becomes significantly faster: $ hg perfdirstate before: wall 0.186437 comb 0.180000 user 0.160000 sys 0.020000 (best of 35) after: wall 0.093158 comb 0.100000 user 0.090000 sys 0.010000 (best of 95) and as a result, several commands benefit: $ time hg status # with hgwatchman enabled before: 0.42s user 0.14s system 99% cpu 0.563 total after: 0.34s user 0.12s system 99% cpu 0.471 total $ time hg add new-file before: 0.85s user 0.18s system 99% cpu 1.033 total after: 0.76s user 0.17s system 99% cpu 0.931 total There is a slight regression in regular status performance, but this is fixed in an upcoming patch.

File last commit:

r20075:f8737bce default
r21809:e250b830 default
Show More
manifest.py
218 lines | 8.0 KiB | text/x-python | PythonLexer
mpm@selenic.com
Break apart hg.py...
r1089 # manifest.py - manifest revision class for mercurial
#
Thomas Arendsen Hein
Updated copyright notices and add "and others" to "hg version"
r4635 # Copyright 2005-2007 Matt Mackall <mpm@selenic.com>
mpm@selenic.com
Break apart hg.py...
r1089 #
Martin Geisler
updated license to be explicit about GPL version 2
r8225 # This software may be used and distributed according to the terms of the
Matt Mackall
Update license to GPLv2+
r10263 # GNU General Public License version 2 or any later version.
mpm@selenic.com
Break apart hg.py...
r1089
Matt Mackall
Simplify i18n imports
r3891 from i18n import _
Siddharth Agarwal
manifestdict: add a method to diff _flags...
r18821 import mdiff, parsers, error, revlog, util, dicthelpers
Simon Heimberg
separate import lines from mercurial and general python modules
r8312 import array, struct
mpm@selenic.com
Break apart hg.py...
r1089
Matt Mackall
Combine manifest dict and flags dict into a single object...
r2835 class manifestdict(dict):
Alexis S. L. Carvalho
Fix some bugs introduced during the manifest refactoring
r2857 def __init__(self, mapping=None, flags=None):
Matt Mackall
many, many trivial check-code fixups
r10282 if mapping is None:
mapping = {}
if flags is None:
flags = {}
Matt Mackall
Add manifestflags class
r2831 dict.__init__(self, mapping)
Matt Mackall
Switch to simpler manifestdict
r2839 self._flags = flags
Matt Mackall
manifestflags: eliminate remaining users of direct dict access
r2834 def flags(self, f):
Matt Mackall
Switch to simpler manifestdict
r2839 return self._flags.get(f, "")
Jesse Glick
localrepo: optimize internode status calls using withflags...
r16646 def withflags(self):
return set(self._flags.keys())
Matt Mackall
simplify flag handling...
r6743 def set(self, f, flags):
self._flags[f] = flags
Matt Mackall
Add manifestflags class
r2831 def copy(self):
Benoit Boissinot
manifestdict: remove unnecessary dictionary copy...
r9416 return manifestdict(self, dict.copy(self._flags))
Siddharth Agarwal
manifestdict: add a method to diff _flags...
r18821 def flagsdiff(self, d2):
return dicthelpers.diff(self._flags, d2._flags, "")
Matt Mackall
Add manifestflags class
r2831
Matt Mackall
revlog: kill from-style imports...
r7634 class manifest(revlog.revlog):
Matt Mackall
revlog: simplify revlog version handling...
r4258 def __init__(self, opener):
Durham Goode
manifest: increase lrucache from 3 to 4...
r20075 # we expect to deal with not more than four revs at a time,
# during a commit --amend
self._mancache = util.lrucachedict(4)
Matt Mackall
revlog: kill from-style imports...
r7634 revlog.revlog.__init__(self, opener, "00manifest.i")
mpm@selenic.com
Break apart hg.py...
r1089
Matt Mackall
manifest: speed up creation of the manifestdict...
r4995 def parse(self, lines):
mfdict = manifestdict()
Bryan O'Sullivan
manifest: improve parsing performance by 8x via a new C extension
r6389 parsers.parse_manifest(mfdict, mfdict._flags, lines)
Matt Mackall
manifest: speed up creation of the manifestdict...
r4995 return mfdict
Brendan Cully
Abstract manifest block parsing.
r3196
def readdelta(self, node):
Matt Mackall
revlog: remove delta function
r7362 r = self.rev(node)
Benoit Boissinot
deltaparent(): don't return nullrev for a revision containing a full snapshot...
r12011 return self.parse(mdiff.patchtext(self.revdiff(self.deltaparent(r), r)))
Thomas Arendsen Hein
Whitespace/Tab cleanup
r3223
Matt Mackall
manifest: add readfast method
r13711 def readfast(self, node):
'''use the faster of readdelta or read'''
r = self.rev(node)
Sune Foldager
revlog: compute correct deltaparent in the deltaparent function...
r14208 deltaparent = self.deltaparent(r)
if deltaparent != revlog.nullrev and deltaparent in self.parentrevs(r):
Matt Mackall
manifest: add readfast method
r13711 return self.readdelta(node)
return self.read(node)
mpm@selenic.com
Break apart hg.py...
r1089 def read(self, node):
Matt Mackall
revlog: kill from-style imports...
r7634 if node == revlog.nullid:
return manifestdict() # don't upset local cache
Siddharth Agarwal
manifest: use a size 3 LRU cache to store parsed manifests...
r18604 if node in self._mancache:
return self._mancache[node][0]
mpm@selenic.com
Break apart hg.py...
r1089 text = self.revision(node)
Benoit Boissinot
manifest: simplify cache handling, use a unique cache
r9414 arraytext = array.array('c', text)
Matt Mackall
manifest: speed up creation of the manifestdict...
r4995 mapping = self.parse(text)
Siddharth Agarwal
manifest: use a size 3 LRU cache to store parsed manifests...
r18604 self._mancache[node] = (mapping, arraytext)
Matt Mackall
Combine manifest dict and flags dict into a single object...
r2835 return mapping
mpm@selenic.com
Break apart hg.py...
r1089
Vadim Gelfer
fix parsing of tags. make parse errors useful. add new tag tests....
r2320 def _search(self, m, s, lo=0, hi=None):
'''return a tuple (start, end) that says where to find s within m.
If the string is found m[start:end] are the line containing
that string. If start == end the string was not found and
Mads Kiilerich
delete some dead comments and docstrings
r17426 they indicate the proper sorted insertion point.
Vadim Gelfer
fix parsing of tags. make parse errors useful. add new tag tests....
r2320
m should be a buffer or a string
s is a string'''
def advance(i, c):
while i < lenm and m[i] != c:
i += 1
return i
Patrick Mezard
manifest: fix _search() corner-case...
r7405 if not s:
return (lo, lo)
Vadim Gelfer
fix parsing of tags. make parse errors useful. add new tag tests....
r2320 lenm = len(m)
if not hi:
hi = lenm
while lo < hi:
mid = (lo + hi) // 2
start = mid
Matt Mackall
many, many trivial check-code fixups
r10282 while start > 0 and m[start - 1] != '\n':
Vadim Gelfer
fix parsing of tags. make parse errors useful. add new tag tests....
r2320 start -= 1
end = advance(start, '\0')
if m[start:end] < s:
# we know that after the null there are 40 bytes of sha1
# this translates to the bisect lo = mid + 1
lo = advance(end + 40, '\n') + 1
else:
# this translates to the bisect hi = mid
hi = start
end = advance(lo, '\0')
found = m[lo:end]
Renato Cunha
manifest: removed usage of the global cmp function...
r11763 if s == found:
Vadim Gelfer
fix parsing of tags. make parse errors useful. add new tag tests....
r2320 # we know that after the null there are 40 bytes of sha1
end = advance(end + 40, '\n')
Matt Mackall
many, many trivial check-code fixups
r10282 return (lo, end + 1)
Vadim Gelfer
fix parsing of tags. make parse errors useful. add new tag tests....
r2320 else:
return (lo, lo)
def find(self, node, f):
'''look up entry for a single file efficiently.
Alexis S. L. Carvalho
fix manifest.find
r4159 return (node, flags) pair if found, (None, None) if not.'''
Siddharth Agarwal
manifest: use a size 3 LRU cache to store parsed manifests...
r18604 if node in self._mancache:
mapping = self._mancache[node][0]
return mapping.get(f), mapping.flags(f)
Vadim Gelfer
fix parsing of tags. make parse errors useful. add new tag tests....
r2320 text = self.revision(node)
start, end = self._search(text, f)
if start == end:
return None, None
l = text[start:end]
f, n = l.split('\0')
Matt Mackall
revlog: kill from-style imports...
r7634 return revlog.bin(n[:40]), n[40:-1]
Vadim Gelfer
fix parsing of tags. make parse errors useful. add new tag tests....
r2320
Matt Mackall
Remove manifest.readflags
r2841 def add(self, map, transaction, link, p1=None, p2=None,
mpm@selenic.com
Break apart hg.py...
r1089 changed=None):
# apply the changes collected during the bisect loop to our addlist
mason@suse.com
Optimize manifest.add...
r1534 # return a delta suitable for addrevision
def addlistdelta(addlist, x):
Durham Goode
commit: increase perf by building a new addlist instead of editing the old one...
r17983 # for large addlist arrays, building a new array is cheaper
# than repeatedly modifying the existing one
currentposition = 0
newaddlist = array.array('c')
for start, end, content in x:
newaddlist += addlist[currentposition:start]
Benoit Boissinot
manifest.add(): simplify with iterators and generator expressions
r9413 if content:
Durham Goode
commit: increase perf by building a new addlist instead of editing the old one...
r17983 newaddlist += array.array('c', content)
currentposition = end
newaddlist += addlist[currentposition:]
deltatext = "".join(struct.pack(">lll", start, end, len(content))
Brodie Rao
cleanup: eradicate long lines
r16683 + content for start, end, content in x)
Durham Goode
commit: increase perf by building a new addlist instead of editing the old one...
r17983 return deltatext, newaddlist
mpm@selenic.com
Break apart hg.py...
r1089
Matt Mackall
manifest: make checkforbidden take a list
r6765 def checkforbidden(l):
for f in l:
if '\n' in f or '\r' in f:
Matt Mackall
errors: move revlog errors...
r7633 raise error.RevlogError(
Greg Ward
manifest: improve error message about newlines in filenames...
r8077 _("'\\n' and '\\r' disallowed in filenames: %r") % f)
Benoit Boissinot
issue352: disallow '\n' and '\r' in filenames (dirstate and manifest)
r3607
Benoit Boissinot
manifest: simplify cache handling, use a unique cache
r9414 # if we're using the cache, make sure it is valid and
mpm@selenic.com
Break apart hg.py...
r1089 # parented by the same node we're diffing against
Siddharth Agarwal
manifest: use a size 3 LRU cache to store parsed manifests...
r18604 if not (changed and p1 and (p1 in self._mancache)):
Matt Mackall
replace util.sort with sorted built-in...
r8209 files = sorted(map)
Matt Mackall
manifest: make checkforbidden take a list
r6765 checkforbidden(files)
Benoit Boissinot
issue352: disallow '\n' and '\r' in filenames (dirstate and manifest)
r3607
Matt Mackall
Fix comment syntax
r1651 # if this is changed to support newlines in filenames,
# be sure to check the templates/ dir again (especially *-raw.tmpl)
Matt Mackall
revlog: kill from-style imports...
r7634 hex, flags = revlog.hex, map.flags
Martin Geisler
manifest: use "\0" instead of "\000"...
r14632 text = ''.join("%s\0%s%s\n" % (f, hex(map[f]), flags(f))
Benoit Boissinot
manifest/revlog: do not let the revlog cache mutable objects...
r9420 for f in files)
arraytext = array.array('c', text)
mpm@selenic.com
Break apart hg.py...
r1089 cachedelta = None
else:
Benoit Boissinot
manifest.add(): cleanup worklist construction and iteration
r9415 added, removed = changed
Siddharth Agarwal
manifest: use a size 3 LRU cache to store parsed manifests...
r18604 addlist = self._mancache[p1][1]
mpm@selenic.com
Break apart hg.py...
r1089
Benoit Boissinot
manifest.add(): cleanup worklist construction and iteration
r9415 checkforbidden(added)
mpm@selenic.com
Break apart hg.py...
r1089 # combine the changed lists into one list for sorting
Benoit Boissinot
manifest.add(): cleanup worklist construction and iteration
r9415 work = [(x, False) for x in added]
work.extend((x, True) for x in removed)
Mads Kiilerich
avoid using abbreviations that look like spelling errors
r17428 # this could use heapq.merge() (from Python 2.6+) or equivalent
Benoit Boissinot
manifest.add(): cleanup worklist construction and iteration
r9415 # since the lists are already sorted
mpm@selenic.com
Break apart hg.py...
r1089 work.sort()
delta = []
mason@suse.com
Optimize manifest.add...
r1534 dstart = None
dend = None
dline = [""]
start = 0
# zero copy representation of addlist as a buffer
Matt Mackall
util: don't mess with builtins to emulate buffer()
r15657 addbuf = util.buffer(addlist)
mpm@selenic.com
Break apart hg.py...
r1089
mason@suse.com
Optimize manifest.add...
r1534 # start with a readonly loop that finds the offset of
# each line and creates the deltas
Benoit Boissinot
manifest.add(): cleanup worklist construction and iteration
r9415 for f, todelete in work:
mpm@selenic.com
Break apart hg.py...
r1089 # bs will either be the index of the item or the insert point
Vadim Gelfer
fix parsing of tags. make parse errors useful. add new tag tests....
r2320 start, end = self._search(addbuf, f, start)
Benoit Boissinot
manifest.add(): cleanup worklist construction and iteration
r9415 if not todelete:
Martin Geisler
manifest: use "\0" instead of "\000"...
r14632 l = "%s\0%s%s\n" % (f, revlog.hex(map[f]), map.flags(f))
mpm@selenic.com
Break apart hg.py...
r1089 else:
Benoit Boissinot
manifest.add(): cleanup worklist construction and iteration
r9415 if start == end:
# item we want to delete was not found, error out
raise AssertionError(
_("failed to remove %s from manifest") % f)
mason@suse.com
Optimize manifest.add...
r1534 l = ""
Martin Geisler
code style: prefer 'is' and 'is not' tests with singletons
r13031 if dstart is not None and dstart <= start and dend >= start:
mason@suse.com
Optimize manifest.add...
r1534 if dend < end:
dend = end
if l:
dline.append(l)
mpm@selenic.com
Break apart hg.py...
r1089 else:
Martin Geisler
code style: prefer 'is' and 'is not' tests with singletons
r13031 if dstart is not None:
mason@suse.com
Optimize manifest.add...
r1534 delta.append([dstart, dend, "".join(dline)])
dstart = start
dend = end
dline = [l]
mpm@selenic.com
Break apart hg.py...
r1089
Martin Geisler
code style: prefer 'is' and 'is not' tests with singletons
r13031 if dstart is not None:
mason@suse.com
Optimize manifest.add...
r1534 delta.append([dstart, dend, "".join(dline)])
# apply the delta to the addlist, and get a delta for addrevision
Durham Goode
commit: increase perf by building a new addlist instead of editing the old one...
r17983 deltatext, addlist = addlistdelta(addlist, delta)
cachedelta = (self.rev(p1), deltatext)
Benoit Boissinot
manifest: simplify cache handling, use a unique cache
r9414 arraytext = addlist
Matt Mackall
util: don't mess with builtins to emulate buffer()
r15657 text = util.buffer(arraytext)
mason@suse.com
Optimize manifest.add...
r1534
Benoit Boissinot
manifest/revlog: do not let the revlog cache mutable objects...
r9420 n = self.addrevision(text, transaction, link, p1, p2, cachedelta)
Siddharth Agarwal
manifest: use a size 3 LRU cache to store parsed manifests...
r18604 self._mancache[n] = (map, arraytext)
mpm@selenic.com
Break apart hg.py...
r1089
return n