parsers: inline fields of dirstate values in C version

Previously, while unpacking the dirstate we'd create 3-4 new CPython objects
for most dirstate values:

- the state is a single character string, which is pooled by CPython
- the mode is a new object if it isn't 0 due to being in the lookup set
- the size is a new object if it is greater than 255
- the mtime is a new object if it isn't -1 due to being in the lookup set
- the tuple to contain them all

In some cases such as regular hg status, we actually look at all the objects.
In other cases like hg add, hg status for a subdirectory, or hg status with the
third-party hgwatchman enabled, we look at almost none of the objects.

This patch eliminates most object creation in these cases by defining a custom
C struct that is exposed to Python with an interface similar to a tuple. Only
when tuple elements are actually requested are the respective objects created.

The gains, where they're expected, are significant. The following tests are run
against a working copy with over 270,000 files.

parse_dirstate becomes significantly faster:

$ hg perfdirstate
before: wall 0.186437 comb 0.180000 user 0.160000 sys 0.020000 (best of 35)
after:  wall 0.093158 comb 0.100000 user 0.090000 sys 0.010000 (best of 95)

and as a result, several commands benefit:

$ time hg status  # with hgwatchman enabled
before: 0.42s user 0.14s system 99% cpu 0.563 total
after:  0.34s user 0.12s system 99% cpu 0.471 total

$ time hg add new-file
before: 0.85s user 0.18s system 99% cpu 1.033 total
after:  0.76s user 0.17s system 99% cpu 0.931 total

There is a slight regression in regular status performance, but this is fixed
in an upcoming patch.
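
The idea in the commit message above (keep the packed fields and build Python objects only when an element is requested) can be illustrated with a small Python sketch. This is not the C struct from the patch: the class name LazyDirstateEntry and the per-field offset table are hypothetical, and the sketch assumes the fixed-size ">cllll" entry header (state, mode, size, mtime, filename length).

```python
import struct

# Assumption: each entry starts with a fixed-size big-endian header holding
# state, mode, size, mtime and filename length.
ENTRY_FORMAT = ">cllll"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)

# (format, offset relative to the start of the entry) for the four
# user-visible fields: state, mode, size, mtime.
_FIELDS = ((">c", 0), (">l", 1), (">l", 5), (">l", 9))


class LazyDirstateEntry(object):
    """Hypothetical tuple-like view over one packed dirstate entry.

    The packed bytes are kept as-is; a field is decoded only when indexed.
    """

    __slots__ = ("_data", "_offset")

    def __init__(self, data, offset=0):
        self._data = data
        self._offset = offset

    def __len__(self):
        return len(_FIELDS)

    def __getitem__(self, index):
        # Indexing past the last field raises IndexError, which also
        # terminates iteration and tuple() conversion.
        fmt, rel = _FIELDS[index]
        return struct.unpack_from(fmt, self._data, self._offset + rel)[0]


# Illustrative entry for a tracked file named "a.txt".
raw = struct.pack(ENTRY_FORMAT, b"n", 0o100644, 12, 1400000000, 5) + b"a.txt"
entry = LazyDirstateEntry(raw)
state = entry[0]           # decodes the state byte only
everything = tuple(entry)  # decodes all four fields on demand
```

A caller that only tests `entry[0]` never pays for decoding mode, size or mtime, which is the access pattern the commit message describes for hg add or a narrow hg status.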

File last commit:

r19148:3bda242b default
r21809:e250b830 default
filelog.py
92 lines | 2.7 KiB | text/x-python | PythonLexer
mpm@selenic.com
Break apart hg.py...
r1089 # filelog.py - file history class for mercurial
#
Thomas Arendsen Hein
Updated copyright notices and add "and others" to "hg version"
r4635 # Copyright 2005-2007 Matt Mackall <mpm@selenic.com>
mpm@selenic.com
Break apart hg.py...
r1089 #
Martin Geisler
updated license to be explicit about GPL version 2
r8225 # This software may be used and distributed according to the terms of the
Matt Mackall
Update license to GPLv2+
r10263 # GNU General Public License version 2 or any later version.
mpm@selenic.com
Break apart hg.py...
r1089
Matt Mackall
revlog: kill from-style imports...
r7634 import revlog
Sune Foldager
filelog: extract metadata parsing and packing...
r14074 import re
mpm@selenic.com
Break apart hg.py...
r1089
Sune Foldager
filelog: extract metadata parsing and packing...
r14074 _mdre = re.compile('\1\n')
Matt Mackall
filelog: move metadata parsing to a helper function
r13240 def _parsemeta(text):
Sune Foldager
filelog: extract metadata parsing and packing...
r14074     """return (metadatadict, keylist, metadatasize)"""
    # text can be buffer, so we can't use .startswith or .index
    if text[:2] != '\1\n':
        return None, None, None
    s = _mdre.search(text, 2).start()
    mtext = text[2:s]
    meta = {}
    keys = []
    for l in mtext.splitlines():
Matt Mackall
filelog: move metadata parsing to a helper function
r13240         k, v = l.split(": ", 1)
Sune Foldager
filelog: extract metadata parsing and packing...
r14074         meta[k] = v
        keys.append(k)
    return meta, keys, (s + 2)

def _packmeta(meta, keys=None):
    if not keys:
        keys = sorted(meta.iterkeys())
    return "".join("%s: %s\n" % (k, meta[k]) for k in keys)
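
The two helpers above form a pack/parse pair for the "key: value" header that sits between two '\1\n' markers. A minimal round-trip sketch follows, written for Python 3 with bytes rather than the Python 2 str used in this file. The names pack_meta, parse_meta and META_MARKER are illustrative stand-ins, not part of filelog.py.

```python
META_MARKER = b"\x01\n"


def pack_meta(meta, keys=None):
    """Serialize a dict of bytes keys/values as b'key: value\\n' lines."""
    if not keys:
        keys = sorted(meta)
    return b"".join(b"%s: %s\n" % (k, meta[k]) for k in keys)


def parse_meta(text):
    """Return (meta, keys, header_size), or (None, None, None) if no header."""
    if text[:2] != META_MARKER:
        return None, None, None
    # The header ends at the second marker; search from offset 2 to skip
    # the opening one.
    end = text.index(META_MARKER, 2)
    meta = {}
    keys = []
    for line in text[2:end].splitlines():
        key, value = line.split(b": ", 1)
        meta[key] = value
        keys.append(key)
    return meta, keys, end + 2


# Illustrative values only: a copy record followed by the file data.
header = pack_meta({b"copy": b"old/name.txt", b"copyrev": b"0" * 40})
stored = META_MARKER + header + META_MARKER + b"file contents\n"
meta, keys, size = parse_meta(stored)
data = stored[size:]
```

The third return value is the offset at which the file data begins, so slicing the stored text at that offset recovers the contents without the header.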
Matt Mackall
filelog: move metadata parsing to a helper function
r13240
Matt Mackall
revlog: kill from-style imports...
r7634 class filelog(revlog.revlog):
Matt Mackall
revlog: simplify revlog version handling...
r4258     def __init__(self, opener, path):
Durham Goode
filelog: use super() for calling base functions...
r19148         super(filelog, self).__init__(opener,
Benoit Boissinot
filelog encoding: move the encoding/decoding into store...
r8531                         "/".join(("data", path + ".i")))
mpm@selenic.com
Break apart hg.py...
r1089
    def read(self, node):
        t = self.revision(node)
        if not t.startswith('\1\n'):
            return t
Benoit Boissinot
use __contains__, index or split instead of str.find...
r2579         s = t.index('\1\n', 2)
Matt Mackall
many, many trivial check-code fixups
r10282         return t[s + 2:]
mpm@selenic.com
Break apart hg.py...
r1089
    def add(self, text, meta, transaction, link, p1=None, p2=None):
        if meta or text.startswith('\1\n'):
Sune Foldager
filelog: extract metadata parsing and packing...
r14074             text = "\1\n%s\1\n%s" % (_packmeta(meta), text)
mpm@selenic.com
Break apart hg.py...
r1089         return self.addrevision(text, transaction, link, p1, p2)
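
The add/read pair above also has to cope with file data that itself begins with '\1\n': such data is stored behind an empty header so that read() does not mistake it for metadata. A standalone Python 3 sketch of that framing, with no revlog involved, is given here. The function names frame and unframe are hypothetical, and the metadata is passed as an already packed block.

```python
MARKER = b"\x01\n"


def frame(text, meta_block=b""):
    """Mimic the framing done in add(): prepend a header when needed."""
    if meta_block or text.startswith(MARKER):
        return MARKER + meta_block + MARKER + text
    return text


def unframe(stored):
    """Mimic read(): drop the header, if there is one."""
    if not stored.startswith(MARKER):
        return stored
    end = stored.index(MARKER, 2)
    return stored[end + 2:]


plain = b"hello\n"
tricky = MARKER + b"data that merely looks like a header\n"

stored_plain = frame(plain)    # stored unchanged
stored_tricky = frame(tricky)  # stored behind an empty header
```

The empty header costs four extra bytes in the stored text, which is consistent with the "(size+4)" remark in the XXX comment of size().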
mpm@selenic.com
Add some rename debugging support
r1116     def renamed(self, node):
Matt Mackall
revlog: kill from-style imports...
r7634         if self.parents(node)[0] != revlog.nullid:
mpm@selenic.com
Add some rename debugging support
r1116             return False
Matt Mackall
filelog: move metadata parsing to a helper function
r13240         t = self.revision(node)
Sune Foldager
filelog: extract metadata parsing and packing...
r14074         m = _parsemeta(t)[0]
Christian Ebert
Prefer i in d over d.has_key(i)
r5915         if m and "copy" in m:
Matt Mackall
revlog: kill from-style imports...
r7634             return (m["copy"], revlog.bin(m["copyrev"]))
mpm@selenic.com
Add some rename debugging support
r1116         return False
Matt Mackall
merge: use file size stored in revlog index...
r2898     def size(self, rev):
        """return the size of a given revision"""
        # for revisions with renames, we have to go the slow way
        node = self.node(rev)
        if self.renamed(node):
            return len(self.read(node))
Nicolas Dumazet
filelog: test behaviour for data starting with "\1\n"...
r11540         # XXX if self.read(node).startswith("\1\n"), this returns (size+4)
Durham Goode
filelog: use super() for calling base functions...
r19148         return super(filelog, self).size(rev)
Matt Mackall
merge: use file size stored in revlog index...
r2898
Matt Mackall
filelog: add hash-based comparisons...
r2887     def cmp(self, node, text):
Nicolas Dumazet
cmp: document the fact that we return True if content is different...
r11539         """compare text with a given file revision
        returns True if text is different than what is stored.
        """
Matt Mackall
filelog: add hash-based comparisons...
r2887
Nicolas Dumazet
filelog: cmp: don't read data if hashes are identical (issue2273)...
r11541         t = text
        if text.startswith('\1\n'):
            t = '\1\n\1\n' + text
Durham Goode
filelog: use super() for calling base functions...
r19148         samehashes = not super(filelog, self).cmp(node, t)
Nicolas Dumazet
filelog: cmp: don't read data if hashes are identical (issue2273)...
r11541         if samehashes:
            return False
        # renaming a file produces a different hash, even if the data
        # remains unchanged. Check if it's the case (slow):
        if self.renamed(node):
Matt Mackall
filelog: add hash-based comparisons...
r2887             t2 = self.read(node)
Matt Mackall
filelog.cmp: return 0 for equality...
r2895             return t2 != text
Matt Mackall
filelog: add hash-based comparisons...
r2887
Nicolas Dumazet
filelog: cmp: don't read data if hashes are identical (issue2273)...
r11541         return True
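
The comparison strategy in cmp() can be sketched outside the revlog class: try the cheap hash comparison first, and fall back to comparing data only when a header may explain the hash mismatch. The Python 3 sketch below assumes the node is a SHA-1 over the two sorted parent ids followed by the stored text. The names node_hash and differs are hypothetical, and where the real method consults renamed(), this simplified version takes the slow path for any stored header.

```python
import hashlib

NULLID = b"\0" * 20
MARKER = b"\x01\n"


def node_hash(text, p1=NULLID, p2=NULLID):
    """Assumed node id: SHA-1 of the sorted parent ids plus the stored text."""
    first, second = sorted((p1, p2))
    s = hashlib.sha1(first)
    s.update(second)
    s.update(text)
    return s.digest()


def differs(node, candidate, stored, p1=NULLID, p2=NULLID):
    """Return True if candidate differs from the data of the stored revision."""
    t = candidate
    if candidate.startswith(MARKER):
        # Such data is stored behind an empty header, so hash it that way.
        t = MARKER + MARKER + candidate
    if node_hash(t, p1, p2) == node:
        return False
    # A header (for example copy metadata) changes the hash even when the
    # data is identical, so compare the data itself (slow path).
    if stored.startswith(MARKER):
        end = stored.index(MARKER, 2)
        return stored[end + 2:] != candidate
    return True


# Illustrative revision carrying copy metadata in front of its data.
renamed_rev = (MARKER + b"copy: old.txt\ncopyrev: " + b"0" * 40 + b"\n"
               + MARKER + b"data\n")
renamed_node = node_hash(renamed_rev)
same = differs(renamed_node, b"data\n", renamed_rev)
changed = differs(renamed_node, b"other\n", renamed_rev)
```

In the common case of unchanged, non-renamed files the function returns after one hash computation and never looks at the stored data, which matches the intent of the early return on samehashes above.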
Sune Foldager
filelog: add file function to open other filelogs
r14287
    def _file(self, f):
        return filelog(self.opener, f)