upstream/mercurial-mirror Commit - r22778:80f2b63d

parsers: add a function to efficiently lowercase ASCII strings...

parsers: add a function to efficiently lowercase ASCII strings We need a way to efficiently lowercase ASCII strings. For example, 'hg status' needs to build up the fold map -- a map from a canonical case (for OS X, lowercase) to the actual case of each file and directory in the dirstate. The current way we do that is to try decoding to ASCII and then calling lower() on the string, labeled 'orig' below: str.decode('ascii') return str.lower() This is pretty inefficient, and it turns out we can do much better. I also tested out a condition-based approach, labeled 'cond' below: (c >= 'A' && c <= 'Z') ? (c + ('a' - 'A')) : c 'cond' turned out to be slower in all cases. A 256-byte lookup table with invalid values for everything past 127 performed similarly, but this was less verbose. On OS X 10.9 with LLVM version 6.0 (clang-600.0.51), the asciilower function was run against two corpuses. Corpus 1 (list of files from real-world repo, > 100k files): orig: wall 0.428567 comb 0.430000 user 0.430000 sys 0.000000 (best of 24) cond: wall 0.077204 comb 0.070000 user 0.070000 sys 0.000000 (best of 100) lookup: wall 0.060714 comb 0.060000 user 0.060000 sys 0.000000 (best of 100) Corpus 2 (mozilla-central, 113k files): orig: wall 0.238406 comb 0.240000 user 0.240000 sys 0.000000 (best of 42) cond: wall 0.040779 comb 0.040000 user 0.040000 sys 0.000000 (best of 100) lookup: wall 0.037623 comb 0.040000 user 0.040000 sys 0.000000 (best of 100) On a Linux server-class machine with GCC 4.4.6 20120305 (Red Hat 4.4.6-4): Corpus 1 (real-world repo, > 100k files): orig: wall 0.260899 comb 0.260000 user 0.260000 sys 0.000000 (best of 38) cond: wall 0.054818 comb 0.060000 user 0.060000 sys 0.000000 (best of 100) lookup: wall 0.048489 comb 0.050000 user 0.050000 sys 0.000000 (best of 100) Corpus 2 (mozilla-central, 113k files): orig: wall 0.153082 comb 0.150000 user 0.150000 sys 0.000000 (best of 65) cond: wall 0.031007 comb 0.040000 user 0.040000 sys 0.000000 (best of 100) lookup: wall 0.028793 comb 0.030000 user 0.030000 sys 0.000000 (best of 100) SSE instructions might help even more, but I didn't experiment with those.

Siddharth Agarwal -

r22778:80f2b63d default

parent child

Collapse all files

mercurial/encoding.py

0 +10 -1

              # encoding.py - character transcoding support for Mercurial
              #
              #  Copyright 2005-2009 Matt Mackall <mpm@selenic.com> and others
              #
              # This software may be used and distributed according to the terms of the
              # GNU General Public License version 2 or any later version.
-             import error
+             import error, parsers
              import unicodedata, locale, os
              def _getpreferredencoding():
                  '''
                  On darwin, getpreferredencoding ignores the locale environment and
                  always returns mac-roman. http://bugs.python.org/issue6202 fixes this
                  for Python 2.7 and up. This is the same corrected code for earlier
                  Python versions.
                  However, we can't use a version check for this method, as some distributions
                  patch Python to fix this. Instead, we use it as a 'fixer' for the mac-roman
                  encoding, as it is unlikely that this encoding is the actually expected.
                  '''
                  try:
                      locale.CODESET
                  except AttributeError:
                      # Fall back to parsing environment variables :-(
                      return locale.getdefaultlocale()[1]
                  oldloc = locale.setlocale(locale.LC_CTYPE)
                  locale.setlocale(locale.LC_CTYPE, "")
                  result = locale.nl_langinfo(locale.CODESET)
                  locale.setlocale(locale.LC_CTYPE, oldloc)
                  return result
              _encodingfixers = {
                  '646': lambda: 'ascii',
                  'ANSI_X3.4-1968': lambda: 'ascii',
                  'mac-roman': _getpreferredencoding
              }
              try:
                  encoding = os.environ.get("HGENCODING")
                  if not encoding:
                      encoding = locale.getpreferredencoding() or 'ascii'
                      encoding = _encodingfixers.get(encoding, lambda: encoding)()
              except locale.Error:
                  encoding = 'ascii'
              encodingmode = os.environ.get("HGENCODINGMODE", "strict")
              fallbackencoding = 'ISO-8859-1'
              class localstr(str):
                  '''This class allows strings that are unmodified to be
                  round-tripped to the local encoding and back'''
                  def __new__(cls, u, l):
                      s = str.__new__(cls, l)
                      s._utf8 = u
                      return s
                  def __hash__(self):
                      return hash(self._utf8) # avoid collisions in local string space
              def tolocal(s):
                  """
                  Convert a string from internal UTF-8 to local encoding
                  All internal strings should be UTF-8 but some repos before the
                  implementation of locale support may contain latin1 or possibly
                  other character sets. We attempt to decode everything strictly
                  using UTF-8, then Latin-1, and failing that, we use UTF-8 and
                  replace unknown characters.
                  The localstr class is used to cache the known UTF-8 encoding of
                  strings next to their local representation to allow lossless
                  round-trip conversion back to UTF-8.
                  >>> u = 'foo: \\xc3\\xa4' # utf-8
                  >>> l = tolocal(u)
                  >>> l
                  'foo: ?'
                  >>> fromlocal(l)
                  'foo: \\xc3\\xa4'
                  >>> u2 = 'foo: \\xc3\\xa1'
                  >>> d = { l: 1, tolocal(u2): 2 }
                  >>> len(d) # no collision
 
                  >>> 'foo: ?' in d
                  False
                  >>> l1 = 'foo: \\xe4' # historical latin1 fallback
                  >>> l = tolocal(l1)
                  >>> l
                  'foo: ?'
                  >>> fromlocal(l) # magically in utf-8
                  'foo: \\xc3\\xa4'
                  """
                  try:
                      try:
                          # make sure string is actually stored in UTF-8
                          u = s.decode('UTF-8')
                          if encoding == 'UTF-8':
                              # fast path
                              return s
                          r = u.encode(encoding, "replace")
                          if u == r.decode(encoding):
                              # r is a safe, non-lossy encoding of s
                              return r
                          return localstr(s, r)
                      except UnicodeDecodeError:
                          # we should only get here if we're looking at an ancient changeset
                          try:
                              u = s.decode(fallbackencoding)
                              r = u.encode(encoding, "replace")
                              if u == r.decode(encoding):
                                  # r is a safe, non-lossy encoding of s
                                  return r
                              return localstr(u.encode('UTF-8'), r)
                          except UnicodeDecodeError:
                              u = s.decode("utf-8", "replace") # last ditch
                              return u.encode(encoding, "replace") # can't round-trip
                  except LookupError, k:
                      raise error.Abort(k, hint="please check your locale settings")
              def fromlocal(s):
                  """
                  Convert a string from the local character encoding to UTF-8
                  We attempt to decode strings using the encoding mode set by
                  HGENCODINGMODE, which defaults to 'strict'. In this mode, unknown
                  characters will cause an error message. Other modes include
                  'replace', which replaces unknown characters with a special
                  Unicode character, and 'ignore', which drops the character.
                  """
                  # can we do a lossless round-trip?
                  if isinstance(s, localstr):
                      return s._utf8
                  try:
                      return s.decode(encoding, encodingmode).encode("utf-8")
                  except UnicodeDecodeError, inst:
                      sub = s[max(0, inst.start - 10):inst.start + 10]
                      raise error.Abort("decoding near '%s': %s!" % (sub, inst))
                  except LookupError, k:
                      raise error.Abort(k, hint="please check your locale settings")
              # How to treat ambiguous-width characters. Set to 'wide' to treat as wide.
              wide = (os.environ.get("HGENCODINGAMBIGUOUS", "narrow") == "wide"
                      and "WFA" or "WF")
              def colwidth(s):
                  "Find the column width of a string for display in the local encoding"
                  return ucolwidth(s.decode(encoding, 'replace'))
              def ucolwidth(d):
                  "Find the column width of a Unicode string for display"
                  eaw = getattr(unicodedata, 'east_asian_width', None)
                  if eaw is not None:
                      return sum([eaw(c) in wide and 2 or 1 for c in d])
                  return len(d)
              def getcols(s, start, c):
                  '''Use colwidth to find a c-column substring of s starting at byte
                  index start'''
                  for x in xrange(start + c, len(s)):
                      t = s[start:x]
                      if colwidth(t) == c:
                          return t
              def trim(s, width, ellipsis='', leftside=False):
                  """Trim string 's' to at most 'width' columns (including 'ellipsis').
                  If 'leftside' is True, left side of string 's' is trimmed.
                  'ellipsis' is always placed at trimmed side.
                  >>> ellipsis = '+++'
                  >>> from mercurial import encoding
                  >>> encoding.encoding = 'utf-8'
                  >>> t= '1234567890'
                  >>> print trim(t, 12, ellipsis=ellipsis)
                  1234567890
                  >>> print trim(t, 10, ellipsis=ellipsis)
                  1234567890
                  >>> print trim(t, 8, ellipsis=ellipsis)
 +++
                  >>> print trim(t, 8, ellipsis=ellipsis, leftside=True)
                  +++67890
                  >>> print trim(t, 8)
                  12345678
                  >>> print trim(t, 8, leftside=True)
                  34567890
                  >>> print trim(t, 3, ellipsis=ellipsis)
                  +++
                  >>> print trim(t, 1, ellipsis=ellipsis)
                  +
                  >>> u = u'\u3042\u3044\u3046\u3048\u304a' # 2 x 5 = 10 columns
                  >>> t = u.encode(encoding.encoding)
                  >>> print trim(t, 12, ellipsis=ellipsis)
                  \xe3\x81\x82\xe3\x81\x84\xe3\x81\x86\xe3\x81\x88\xe3\x81\x8a
                  >>> print trim(t, 10, ellipsis=ellipsis)
                  \xe3\x81\x82\xe3\x81\x84\xe3\x81\x86\xe3\x81\x88\xe3\x81\x8a
                  >>> print trim(t, 8, ellipsis=ellipsis)
                  \xe3\x81\x82\xe3\x81\x84+++
                  >>> print trim(t, 8, ellipsis=ellipsis, leftside=True)
                  +++\xe3\x81\x88\xe3\x81\x8a
                  >>> print trim(t, 5)
                  \xe3\x81\x82\xe3\x81\x84
                  >>> print trim(t, 5, leftside=True)
                  \xe3\x81\x88\xe3\x81\x8a
                  >>> print trim(t, 4, ellipsis=ellipsis)
                  +++
                  >>> print trim(t, 4, ellipsis=ellipsis, leftside=True)
                  +++
                  >>> t = '\x11\x22\x33\x44\x55\x66\x77\x88\x99\xaa' # invalid byte sequence
                  >>> print trim(t, 12, ellipsis=ellipsis)
                  \x11\x22\x33\x44\x55\x66\x77\x88\x99\xaa
                  >>> print trim(t, 10, ellipsis=ellipsis)
                  \x11\x22\x33\x44\x55\x66\x77\x88\x99\xaa
                  >>> print trim(t, 8, ellipsis=ellipsis)
                  \x11\x22\x33\x44\x55+++
                  >>> print trim(t, 8, ellipsis=ellipsis, leftside=True)
                  +++\x66\x77\x88\x99\xaa
                  >>> print trim(t, 8)
                  \x11\x22\x33\x44\x55\x66\x77\x88
                  >>> print trim(t, 8, leftside=True)
                  \x33\x44\x55\x66\x77\x88\x99\xaa
                  >>> print trim(t, 3, ellipsis=ellipsis)
                  +++
                  >>> print trim(t, 1, ellipsis=ellipsis)
                  +
                  """
                  try:
                      u = s.decode(encoding)
                  except UnicodeDecodeError:
                      if len(s) <= width: # trimming is not needed
                          return s
                      width -= len(ellipsis)
                      if width <= 0: # no enough room even for ellipsis
                          return ellipsis[:width + len(ellipsis)]
                      if leftside:
                          return ellipsis + s[-width:]
                      return s[:width] + ellipsis
                  if ucolwidth(u) <= width: # trimming is not needed
                      return s
                  width -= len(ellipsis)
                  if width <= 0: # no enough room even for ellipsis
                      return ellipsis[:width + len(ellipsis)]
                  if leftside:
                      uslice = lambda i: u[i:]
                      concat = lambda s: ellipsis + s
                  else:
                      uslice = lambda i: u[:-i]
                      concat = lambda s: s + ellipsis
                  for i in xrange(1, len(u)):
                      usub = uslice(i)
                      if ucolwidth(usub) <= width:
                          return concat(usub.encode(encoding))
                  return ellipsis # no enough room for multi-column characters
+             def asciilower(s):
+                 '''convert a string to lowercase if ASCII
+                 Raises UnicodeDecodeError if non-ASCII characters are found.'''
+                 s.decode('ascii')
+                 return s.lower()
+             asciilower = getattr(parsers, 'asciilower', asciilower)
              def lower(s):
                  "best-effort encoding-aware case-folding of local string s"
                  try:
                      s.decode('ascii') # throw exception for non-ASCII character
                      return s.lower()
                  except UnicodeDecodeError:
                      pass
                  try:
                      if isinstance(s, localstr):
                          u = s._utf8.decode("utf-8")
                      else:
                          u = s.decode(encoding, encodingmode)
                      lu = u.lower()
                      if u == lu:
                          return s # preserve localstring
                      return lu.encode(encoding)
                  except UnicodeError:
                      return s.lower() # we don't know how to fold this except in ASCII
                  except LookupError, k:
                      raise error.Abort(k, hint="please check your locale settings")
              def upper(s):
                  "best-effort encoding-aware case-folding of local string s"
                  try:
                      s.decode('ascii') # throw exception for non-ASCII character
                      return s.upper()
                  except UnicodeDecodeError:
                      pass
                  try:
                      if isinstance(s, localstr):
                          u = s._utf8.decode("utf-8")
                      else:
                          u = s.decode(encoding, encodingmode)
                      uu = u.upper()
                      if u == uu:
                          return s # preserve localstring
                      return uu.encode(encoding)
                  except UnicodeError:
                      return s.upper() # we don't know how to fold this except in ASCII
                  except LookupError, k:
                      raise error.Abort(k, hint="please check your locale settings")
              _jsonmap = {}
              def jsonescape(s):
                  '''returns a string suitable for JSON
                  JSON is problematic for us because it doesn't support non-Unicode
                  bytes. To deal with this, we take the following approach:
                  - localstr objects are converted back to UTF-8
                  - valid UTF-8/ASCII strings are passed as-is
                  - other strings are converted to UTF-8b surrogate encoding
                  - apply JSON-specified string escaping
                  (escapes are doubled in these tests)
                  >>> jsonescape('this is a test')
                  'this is a test'
                  >>> jsonescape('escape characters: \\0 \\x0b \\t \\n \\r \\" \\\\')
                  'escape characters: \\\\u0000 \\\\u000b \\\\t \\\\n \\\\r \\\\" \\\\\\\\'
                  >>> jsonescape('a weird byte: \\xdd')
                  'a weird byte: \\xed\\xb3\\x9d'
                  >>> jsonescape('utf-8: caf\\xc3\\xa9')
                  'utf-8: caf\\xc3\\xa9'
                  >>> jsonescape('')
                  ''
                  '''
                  if not _jsonmap:
                      for x in xrange(32):
                          _jsonmap[chr(x)] = "\u%04x" %x
                      for x in xrange(32, 256):
                          c = chr(x)
                          _jsonmap[c] = c
                      _jsonmap['\t'] = '\\t'
                      _jsonmap['\n'] = '\\n'
                      _jsonmap['\"'] = '\\"'
                      _jsonmap['\\'] = '\\\\'
                      _jsonmap['\b'] = '\\b'
                      _jsonmap['\f'] = '\\f'
                      _jsonmap['\r'] = '\\r'
                  return ''.join(_jsonmap[c] for c in toutf8b(s))
              def toutf8b(s):
                  '''convert a local, possibly-binary string into UTF-8b
                  This is intended as a generic method to preserve data when working
                  with schemes like JSON and XML that have no provision for
                  arbitrary byte strings. As Mercurial often doesn't know
                  what encoding data is in, we use so-called UTF-8b.
                  If a string is already valid UTF-8 (or ASCII), it passes unmodified.
                  Otherwise, unsupported bytes are mapped to UTF-16 surrogate range,
                  uDC00-uDCFF.
                  Principles of operation:
                  - ASCII and UTF-8 data successfully round-trips and is understood
                    by Unicode-oriented clients
                  - filenames and file contents in arbitrary other encodings can have
                    be round-tripped or recovered by clueful clients
                  - local strings that have a cached known UTF-8 encoding (aka
                    localstr) get sent as UTF-8 so Unicode-oriented clients get the
                    Unicode data they want
                  - because we must preserve UTF-8 bytestring in places such as
                    filenames, metadata can't be roundtripped without help
                  (Note: "UTF-8b" often refers to decoding a mix of valid UTF-8 and
                  arbitrary bytes into an internal Unicode format that can be
                  re-encoded back into the original. Here we are exposing the
                  internal surrogate encoding as a UTF-8 string.)
                  '''
                  if isinstance(s, localstr):
                      return s._utf8
                  try:
                      s.decode('utf-8')
                      return s
                  except UnicodeDecodeError:
                      # surrogate-encode any characters that don't round-trip
                      s2 = s.decode('utf-8', 'ignore').encode('utf-8')
                      r = ""
                      pos = 0
                      for c in s:
                          if s2[pos:pos + 1] == c:
                              r += c
                              pos += 1
                          else:
                              r += unichr(0xdc00 + ord(c)).encode('utf-8')
                      return r
              def fromutf8b(s):
                  '''Given a UTF-8b string, return a local, possibly-binary string.
                  return the original binary string. This
                  is a round-trip process for strings like filenames, but metadata
                  that's was passed through tolocal will remain in UTF-8.
                  >>> m = "\\xc3\\xa9\\x99abcd"
                  >>> n = toutf8b(m)
                  >>> n
                  '\\xc3\\xa9\\xed\\xb2\\x99abcd'
                  >>> fromutf8b(n) == m
                  True
                  '''
                  # fast path - look for uDxxx prefixes in s
                  if "\xed" not in s:
                      return s
                  u = s.decode("utf-8")
                  r = ""
                  for c in u:
                      if ord(c) & 0xff00 == 0xdc00:
                          r += chr(ord(c) & 0xff)
                      else:
                          r += c.encode("utf-8")
                  return r

mercurial/parsers.c

0 +55 0

              /*
               parsers.c - efficient content parsing
               Copyright 2008 Matt Mackall <mpm@selenic.com> and others
               This software may be used and distributed according to the terms of
               the GNU General Public License, incorporated herein by reference.
              */
              #include <Python.h>
              #include <ctype.h>
              #include <stddef.h>
              #include <string.h>
              #include "util.h"
              static char *versionerrortext = "Python minor version mismatch";
              static int8_t hextable[256] = {
              	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
              	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
              	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
 ,  1,  2,  3,  4,  5,  6,  7,  8,  9, -1, -1, -1, -1, -1, -1, /* 0-9 */
              	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1, /* A-F */
              	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
              	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1, /* a-f */
              	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
              	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
              	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
              	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
              	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
              	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
              	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
              	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
              	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1
              };
+             static char lowertable[128] = {
+             	'\x00', '\x01', '\x02', '\x03', '\x04', '\x05', '\x06', '\x07',
+             	'\x08', '\x09', '\x0a', '\x0b', '\x0c', '\x0d', '\x0e', '\x0f',
+             	'\x10', '\x11', '\x12', '\x13', '\x14', '\x15', '\x16', '\x17',
+             	'\x18', '\x19', '\x1a', '\x1b', '\x1c', '\x1d', '\x1e', '\x1f',
+             	'\x20', '\x21', '\x22', '\x23', '\x24', '\x25', '\x26', '\x27',
+             	'\x28', '\x29', '\x2a', '\x2b', '\x2c', '\x2d', '\x2e', '\x2f',
+             	'\x30', '\x31', '\x32', '\x33', '\x34', '\x35', '\x36', '\x37',
+             	'\x38', '\x39', '\x3a', '\x3b', '\x3c', '\x3d', '\x3e', '\x3f',
+             	'\x40',
+             	        '\x61', '\x62', '\x63', '\x64', '\x65', '\x66', '\x67', /* A-G */
+             	'\x68', '\x69', '\x6a', '\x6b', '\x6c', '\x6d', '\x6e', '\x6f', /* H-O */
+             	'\x70', '\x71', '\x72', '\x73', '\x74', '\x75', '\x76', '\x77', /* P-W */
+             	'\x78', '\x79', '\x7a',                                         /* X-Z */
+             	                        '\x5b', '\x5c', '\x5d', '\x5e', '\x5f',
+             	'\x60', '\x61', '\x62', '\x63', '\x64', '\x65', '\x66', '\x67',
+             	'\x68', '\x69', '\x6a', '\x6b', '\x6c', '\x6d', '\x6e', '\x6f',
+             	'\x70', '\x71', '\x72', '\x73', '\x74', '\x75', '\x76', '\x77',
+             	'\x78', '\x79', '\x7a', '\x7b', '\x7c', '\x7d', '\x7e', '\x7f'
+             };
              static inline int hexdigit(const char *p, Py_ssize_t off)
              {
              	int8_t val = hextable[(unsigned char)p[off]];
              	if (val >= 0) {
              		return val;
              	}
              	PyErr_SetString(PyExc_ValueError, "input contains non-hex character");
              	return 0;
              }
              /*
               * Turn a hex-encoded string into binary.
               */
              static PyObject *unhexlify(const char *str, int len)
              {
              	PyObject *ret;
              	char *d;
              	int i;
              	ret = PyBytes_FromStringAndSize(NULL, len / 2);
              	if (!ret)
              		return NULL;
              	d = PyBytes_AsString(ret);
              	for (i = 0; i < len;) {
              		int hi = hexdigit(str, i++);
              		int lo = hexdigit(str, i++);
              		*d++ = (hi << 4) | lo;
              	}
              	return ret;
              }
+             static PyObject *asciilower(PyObject *self, PyObject *args)
+             {
+             	char *str, *newstr;
+             	int i, len;
+             	PyObject *newobj = NULL;
+             	if (!PyArg_ParseTuple(args, "s#", &str, &len))
+             		goto quit;
+             	newobj = PyBytes_FromStringAndSize(NULL, len);
+             	if (!newobj)
+             		goto quit;
+             	newstr = PyBytes_AS_STRING(newobj);
+             	for (i = 0; i < len; i++) {
+             		char c = str[i];
+             		if (c & 0x80) {
+             			PyObject *err = PyUnicodeDecodeError_Create(
+             				"ascii", str, len, i, (i + 1),
+             				"unexpected code byte");
+             			PyErr_SetObject(PyExc_UnicodeDecodeError, err);
+             			goto quit;
+             		}
+             		newstr[i] = lowertable[(unsigned char)c];
+             	}
+             	return newobj;
+             quit:
+             	Py_XDECREF(newobj);
+             	return NULL;
+             }
              /*
               * This code assumes that a manifest is stitched together with newline
               * ('\n') characters.
               */
              static PyObject *parse_manifest(PyObject *self, PyObject *args)
              {
              	PyObject *mfdict, *fdict;
              	char *str, *start, *end;
              	int len;
              	if (!PyArg_ParseTuple(args, "O!O!s#:parse_manifest",
              			      &PyDict_Type, &mfdict,
              			      &PyDict_Type, &fdict,
              			      &str, &len))
              		goto quit;
              	start = str;
              	end = str + len;
              	while (start < end) {
              		PyObject *file = NULL, *node = NULL;
              		PyObject *flags = NULL;
              		char *zero = NULL, *newline = NULL;
              		ptrdiff_t nlen;
              		zero = memchr(start, '\0', end - start);
              		if (!zero) {
              			PyErr_SetString(PyExc_ValueError,
              					"manifest entry has no separator");
              			goto quit;
              		}
              		newline = memchr(zero + 1, '\n', end - (zero + 1));
              		if (!newline) {
              			PyErr_SetString(PyExc_ValueError,
              					"manifest contains trailing garbage");
              			goto quit;
              		}
              		file = PyBytes_FromStringAndSize(start, zero - start);
              		if (!file)
              			goto bail;
              		nlen = newline - zero - 1;
              		node = unhexlify(zero + 1, nlen > 40 ? 40 : (int)nlen);
              		if (!node)
              			goto bail;
              		if (nlen > 40) {
              			flags = PyBytes_FromStringAndSize(zero + 41,
              							   nlen - 40);
              			if (!flags)
              				goto bail;
              			if (PyDict_SetItem(fdict, file, flags) == -1)
              				goto bail;
              		}
              		if (PyDict_SetItem(mfdict, file, node) == -1)
              			goto bail;
              		start = newline + 1;
              		Py_XDECREF(flags);
              		Py_XDECREF(node);
              		Py_XDECREF(file);
              		continue;
              	bail:
              		Py_XDECREF(flags);
              		Py_XDECREF(node);
              		Py_XDECREF(file);
              		goto quit;
              	}
              	Py_INCREF(Py_None);
              	return Py_None;
              quit:
              	return NULL;
              }
              static inline dirstateTupleObject *make_dirstate_tuple(char state, int mode,
              						       int size, int mtime)
              {
              	dirstateTupleObject *t = PyObject_New(dirstateTupleObject,
              					      &dirstateTupleType);
              	if (!t)
              		return NULL;
              	t->state = state;
              	t->mode = mode;
              	t->size = size;
              	t->mtime = mtime;
              	return t;
              }
              static PyObject *dirstate_tuple_new(PyTypeObject *subtype, PyObject *args,
              				    PyObject *kwds)
              {
              	/* We do all the initialization here and not a tp_init function because
              	 * dirstate_tuple is immutable. */
              	dirstateTupleObject *t;
              	char state;
              	int size, mode, mtime;
              	if (!PyArg_ParseTuple(args, "ciii", &state, &mode, &size, &mtime))
              		return NULL;
              	t = (dirstateTupleObject *)subtype->tp_alloc(subtype, 1);
              	if (!t)
              		return NULL;
              	t->state = state;
              	t->mode = mode;
              	t->size = size;
              	t->mtime = mtime;
              	return (PyObject *)t;
              }
              static void dirstate_tuple_dealloc(PyObject *o)
              {
              	PyObject_Del(o);
              }
              static Py_ssize_t dirstate_tuple_length(PyObject *o)
              {
              	return 4;
              }
              static PyObject *dirstate_tuple_item(PyObject *o, Py_ssize_t i)
              {
              	dirstateTupleObject *t = (dirstateTupleObject *)o;
              	switch (i) {
              	case 0:
              		return PyBytes_FromStringAndSize(&t->state, 1);
              	case 1:
              		return PyInt_FromLong(t->mode);
              	case 2:
              		return PyInt_FromLong(t->size);
              	case 3:
              		return PyInt_FromLong(t->mtime);
              	default:
              		PyErr_SetString(PyExc_IndexError, "index out of range");
              		return NULL;
              	}
              }
              static PySequenceMethods dirstate_tuple_sq = {
              	dirstate_tuple_length,     /* sq_length */
 ,                         /* sq_concat */
 ,                         /* sq_repeat */
              	dirstate_tuple_item,       /* sq_item */
 ,                         /* sq_ass_item */
 ,                         /* sq_contains */
 ,                         /* sq_inplace_concat */
 /* sq_inplace_repeat */
              };
              PyTypeObject dirstateTupleType = {
              	PyVarObject_HEAD_INIT(NULL, 0)
              	"dirstate_tuple",          /* tp_name */
              	sizeof(dirstateTupleObject),/* tp_basicsize */
 ,                         /* tp_itemsize */
              	(destructor)dirstate_tuple_dealloc, /* tp_dealloc */
 ,                         /* tp_print */
 ,                         /* tp_getattr */
 ,                         /* tp_setattr */
 ,                         /* tp_compare */
 ,                         /* tp_repr */
 ,                         /* tp_as_number */
              	&dirstate_tuple_sq,        /* tp_as_sequence */
 ,                         /* tp_as_mapping */
 ,                         /* tp_hash  */
 ,                         /* tp_call */
 ,                         /* tp_str */
 ,                         /* tp_getattro */
 ,                         /* tp_setattro */
 ,                         /* tp_as_buffer */
              	Py_TPFLAGS_DEFAULT,        /* tp_flags */
              	"dirstate tuple",          /* tp_doc */
 ,                         /* tp_traverse */
 ,                         /* tp_clear */
 ,                         /* tp_richcompare */
 ,                         /* tp_weaklistoffset */
 ,                         /* tp_iter */
 ,                         /* tp_iternext */
 ,                         /* tp_methods */
 ,                         /* tp_members */
 ,                         /* tp_getset */
 ,                         /* tp_base */
 ,                         /* tp_dict */
 ,                         /* tp_descr_get */
 ,                         /* tp_descr_set */
 ,                         /* tp_dictoffset */
 ,                         /* tp_init */
 ,                         /* tp_alloc */
              	dirstate_tuple_new,        /* tp_new */
              };
              static PyObject *parse_dirstate(PyObject *self, PyObject *args)
              {
              	PyObject *dmap, *cmap, *parents = NULL, *ret = NULL;
              	PyObject *fname = NULL, *cname = NULL, *entry = NULL;
              	char state, *cur, *str, *cpos;
              	int mode, size, mtime;
              	unsigned int flen, len, pos = 40;
              	int readlen;
              	if (!PyArg_ParseTuple(args, "O!O!s#:parse_dirstate",
              			      &PyDict_Type, &dmap,
              			      &PyDict_Type, &cmap,
              			      &str, &readlen))
              		goto quit;
              	if (readlen < 0)
              		goto quit;
              	len = readlen;
              	/* read parents */
              	if (len < 40)
              		goto quit;
              	parents = Py_BuildValue("s#s#", str, 20, str + 20, 20);
              	if (!parents)
              		goto quit;
              	/* read filenames */
              	while (pos >= 40 && pos < len) {
              		cur = str + pos;
              		/* unpack header */
              		state = *cur;
              		mode = getbe32(cur + 1);
              		size = getbe32(cur + 5);
              		mtime = getbe32(cur + 9);
              		flen = getbe32(cur + 13);
              		pos += 17;
              		cur += 17;
              		if (flen > len - pos) {
              			PyErr_SetString(PyExc_ValueError, "overflow in dirstate");
              			goto quit;
              		}
              		entry = (PyObject *)make_dirstate_tuple(state, mode, size,
              							mtime);
              		cpos = memchr(cur, 0, flen);
              		if (cpos) {
              			fname = PyBytes_FromStringAndSize(cur, cpos - cur);
              			cname = PyBytes_FromStringAndSize(cpos + 1,
              							   flen - (cpos - cur) - 1);
              			if (!fname || !cname ||
              			    PyDict_SetItem(cmap, fname, cname) == -1 ||
              			    PyDict_SetItem(dmap, fname, entry) == -1)
              				goto quit;
              			Py_DECREF(cname);
              		} else {
              			fname = PyBytes_FromStringAndSize(cur, flen);
              			if (!fname ||
              			    PyDict_SetItem(dmap, fname, entry) == -1)
              				goto quit;
              		}
              		Py_DECREF(fname);
              		Py_DECREF(entry);
              		fname = cname = entry = NULL;
              		pos += flen;
              	}
              	ret = parents;
              	Py_INCREF(ret);
              quit:
              	Py_XDECREF(fname);
              	Py_XDECREF(cname);
              	Py_XDECREF(entry);
              	Py_XDECREF(parents);
              	return ret;
              }
              /*
               * Efficiently pack a dirstate object into its on-disk format.
               */
              static PyObject *pack_dirstate(PyObject *self, PyObject *args)
              {
              	PyObject *packobj = NULL;
              	PyObject *map, *copymap, *pl, *mtime_unset = NULL;
              	Py_ssize_t nbytes, pos, l;
              	PyObject *k, *v, *pn;
              	char *p, *s;
              	double now;
              	if (!PyArg_ParseTuple(args, "O!O!Od:pack_dirstate",
              			      &PyDict_Type, &map, &PyDict_Type, &copymap,
              			      &pl, &now))
              		return NULL;
              	if (!PySequence_Check(pl) || PySequence_Size(pl) != 2) {
              		PyErr_SetString(PyExc_TypeError, "expected 2-element sequence");
              		return NULL;
              	}
              	/* Figure out how much we need to allocate. */
              	for (nbytes = 40, pos = 0; PyDict_Next(map, &pos, &k, &v);) {
              		PyObject *c;
              		if (!PyString_Check(k)) {
              			PyErr_SetString(PyExc_TypeError, "expected string key");
              			goto bail;
              		}
              		nbytes += PyString_GET_SIZE(k) + 17;
              		c = PyDict_GetItem(copymap, k);
              		if (c) {
              			if (!PyString_Check(c)) {
              				PyErr_SetString(PyExc_TypeError,
              						"expected string key");
              				goto bail;
              			}
              			nbytes += PyString_GET_SIZE(c) + 1;
              		}
              	}
              	packobj = PyString_FromStringAndSize(NULL, nbytes);
              	if (packobj == NULL)
              		goto bail;
              	p = PyString_AS_STRING(packobj);
              	pn = PySequence_ITEM(pl, 0);
              	if (PyString_AsStringAndSize(pn, &s, &l) == -1 || l != 20) {
              		PyErr_SetString(PyExc_TypeError, "expected a 20-byte hash");
              		goto bail;
              	}
              	memcpy(p, s, l);
              	p += 20;
              	pn = PySequence_ITEM(pl, 1);
              	if (PyString_AsStringAndSize(pn, &s, &l) == -1 || l != 20) {
              		PyErr_SetString(PyExc_TypeError, "expected a 20-byte hash");
              		goto bail;
              	}
              	memcpy(p, s, l);
              	p += 20;
              	for (pos = 0; PyDict_Next(map, &pos, &k, &v); ) {
              		dirstateTupleObject *tuple;
              		char state;
              		uint32_t mode, size, mtime;
              		Py_ssize_t len, l;
              		PyObject *o;
              		char *t;
              		if (!dirstate_tuple_check(v)) {
              			PyErr_SetString(PyExc_TypeError,
              					"expected a dirstate tuple");
              			goto bail;
              		}
              		tuple = (dirstateTupleObject *)v;
              		state = tuple->state;
              		mode = tuple->mode;
              		size = tuple->size;
              		mtime = tuple->mtime;
              		if (state == 'n' && mtime == (uint32_t)now) {
              			/* See pure/parsers.py:pack_dirstate for why we do
              			 * this. */
              			mtime = -1;
              			mtime_unset = (PyObject *)make_dirstate_tuple(
              				state, mode, size, mtime);
              			if (!mtime_unset)
              				goto bail;
              			if (PyDict_SetItem(map, k, mtime_unset) == -1)
              				goto bail;
              			Py_DECREF(mtime_unset);
              			mtime_unset = NULL;
              		}
              		*p++ = state;
              		putbe32(mode, p);
              		putbe32(size, p + 4);
              		putbe32(mtime, p + 8);
              		t = p + 12;
              		p += 16;
              		len = PyString_GET_SIZE(k);
              		memcpy(p, PyString_AS_STRING(k), len);
              		p += len;
              		o = PyDict_GetItem(copymap, k);
              		if (o) {
              			*p++ = '\0';
              			l = PyString_GET_SIZE(o);
              			memcpy(p, PyString_AS_STRING(o), l);
              			p += l;
              			len += l + 1;
              		}
              		putbe32((uint32_t)len, t);
              	}
              	pos = p - PyString_AS_STRING(packobj);
              	if (pos != nbytes) {
              		PyErr_Format(PyExc_SystemError, "bad dirstate size: %ld != %ld",
                                           (long)pos, (long)nbytes);
              		goto bail;
              	}
              	return packobj;
              bail:
              	Py_XDECREF(mtime_unset);
              	Py_XDECREF(packobj);
              	return NULL;
              }
              /*
               * A base-16 trie for fast node->rev mapping.
               *
               * Positive value is index of the next node in the trie
               * Negative value is a leaf: -(rev + 1)
               * Zero is empty
               */
              typedef struct {
              	int children[16];
              } nodetree;
              /*
               * This class has two behaviours.
               *
               * When used in a list-like way (with integer keys), we decode an
               * entry in a RevlogNG index file on demand. Our last entry is a
               * sentinel, always a nullid.  We have limited support for
               * integer-keyed insert and delete, only at elements right before the
               * sentinel.
               *
               * With string keys, we lazily perform a reverse mapping from node to
               * rev, using a base-16 trie.
               */
              typedef struct {
              	PyObject_HEAD
              	/* Type-specific fields go here. */
              	PyObject *data;        /* raw bytes of index */
              	PyObject **cache;      /* cached tuples */
              	const char **offsets;  /* populated on demand */
              	Py_ssize_t raw_length; /* original number of elements */
              	Py_ssize_t length;     /* current number of elements */
              	PyObject *added;       /* populated on demand */
              	PyObject *headrevs;    /* cache, invalidated on changes */
              	PyObject *filteredrevs;/* filtered revs set */
              	nodetree *nt;          /* base-16 trie */
              	int ntlength;          /* # nodes in use */
              	int ntcapacity;        /* # nodes allocated */
              	int ntdepth;           /* maximum depth of tree */
              	int ntsplits;          /* # splits performed */
              	int ntrev;             /* last rev scanned */
              	int ntlookups;         /* # lookups */
              	int ntmisses;          /* # lookups that miss the cache */
              	int inlined;
              } indexObject;
              static Py_ssize_t index_length(const indexObject *self)
              {
              	if (self->added == NULL)
              		return self->length;
              	return self->length + PyList_GET_SIZE(self->added);
              }
              static PyObject *nullentry;
              static const char nullid[20];
              static Py_ssize_t inline_scan(indexObject *self, const char **offsets);
              #if LONG_MAX == 0x7fffffffL
              static char *tuple_format = "Kiiiiiis#";
              #else
              static char *tuple_format = "kiiiiiis#";
              #endif
              /* A RevlogNG v1 index entry is 64 bytes long. */
              static const long v1_hdrsize = 64;
              /*
               * Return a pointer to the beginning of a RevlogNG record.
               */
              static const char *index_deref(indexObject *self, Py_ssize_t pos)
              {
              	if (self->inlined && pos > 0) {
              		if (self->offsets == NULL) {
              			self->offsets = malloc(self->raw_length *
              					       sizeof(*self->offsets));
              			if (self->offsets == NULL)
              				return (const char *)PyErr_NoMemory();
              			inline_scan(self, self->offsets);
              		}
              		return self->offsets[pos];
              	}
              	return PyString_AS_STRING(self->data) + pos * v1_hdrsize;
              }
              /*
               * RevlogNG format (all in big endian, data may be inlined):
               *    6 bytes: offset
               *    2 bytes: flags
               *    4 bytes: compressed length
               *    4 bytes: uncompressed length
               *    4 bytes: base revision
               *    4 bytes: link revision
               *    4 bytes: parent 1 revision
               *    4 bytes: parent 2 revision
               *   32 bytes: nodeid (only 20 bytes used)
               */
              static PyObject *index_get(indexObject *self, Py_ssize_t pos)
              {
              	uint64_t offset_flags;
              	int comp_len, uncomp_len, base_rev, link_rev, parent_1, parent_2;
              	const char *c_node_id;
              	const char *data;
              	Py_ssize_t length = index_length(self);
              	PyObject *entry;
              	if (pos < 0)
              		pos += length;
              	if (pos < 0 || pos >= length) {
              		PyErr_SetString(PyExc_IndexError, "revlog index out of range");
              		return NULL;
              	}
              	if (pos == length - 1) {
              		Py_INCREF(nullentry);
              		return nullentry;
              	}
              	if (pos >= self->length - 1) {
              		PyObject *obj;
              		obj = PyList_GET_ITEM(self->added, pos - self->length + 1);
              		Py_INCREF(obj);
              		return obj;
              	}
              	if (self->cache) {
              		if (self->cache[pos]) {
              			Py_INCREF(self->cache[pos]);
              			return self->cache[pos];
              		}
              	} else {
              		self->cache = calloc(self->raw_length, sizeof(PyObject *));
              		if (self->cache == NULL)
              			return PyErr_NoMemory();
              	}
              	data = index_deref(self, pos);
              	if (data == NULL)
              		return NULL;
              	offset_flags = getbe32(data + 4);
              	if (pos == 0) /* mask out version number for the first entry */
              		offset_flags &= 0xFFFF;
              	else {
              		uint32_t offset_high = getbe32(data);
              		offset_flags |= ((uint64_t)offset_high) << 32;
              	}
              	comp_len = getbe32(data + 8);
              	uncomp_len = getbe32(data + 12);
              	base_rev = getbe32(data + 16);
              	link_rev = getbe32(data + 20);
              	parent_1 = getbe32(data + 24);
              	parent_2 = getbe32(data + 28);
              	c_node_id = data + 32;
              	entry = Py_BuildValue(tuple_format, offset_flags, comp_len,
              			      uncomp_len, base_rev, link_rev,
              			      parent_1, parent_2, c_node_id, 20);
              	if (entry) {
              		PyObject_GC_UnTrack(entry);
              		Py_INCREF(entry);
              	}
              	self->cache[pos] = entry;
              	return entry;
              }
              /*
               * Return the 20-byte SHA of the node corresponding to the given rev.
               */
              static const char *index_node(indexObject *self, Py_ssize_t pos)
              {
              	Py_ssize_t length = index_length(self);
              	const char *data;
              	if (pos == length - 1 || pos == INT_MAX)
              		return nullid;
              	if (pos >= length)
              		return NULL;
              	if (pos >= self->length - 1) {
              		PyObject *tuple, *str;
              		tuple = PyList_GET_ITEM(self->added, pos - self->length + 1);
              		str = PyTuple_GetItem(tuple, 7);
              		return str ? PyString_AS_STRING(str) : NULL;
              	}
              	data = index_deref(self, pos);
              	return data ? data + 32 : NULL;
              }
              static int nt_insert(indexObject *self, const char *node, int rev);
              static int node_check(PyObject *obj, char **node, Py_ssize_t *nodelen)
              {
              	if (PyString_AsStringAndSize(obj, node, nodelen) == -1)
              		return -1;
              	if (*nodelen == 20)
              		return 0;
              	PyErr_SetString(PyExc_ValueError, "20-byte hash required");
              	return -1;
              }
              static PyObject *index_insert(indexObject *self, PyObject *args)
              {
              	PyObject *obj;
              	char *node;
              	int index;
              	Py_ssize_t len, nodelen;
              	if (!PyArg_ParseTuple(args, "iO", &index, &obj))
              		return NULL;
              	if (!PyTuple_Check(obj) || PyTuple_GET_SIZE(obj) != 8) {
              		PyErr_SetString(PyExc_TypeError, "8-tuple required");
              		return NULL;
              	}
              	if (node_check(PyTuple_GET_ITEM(obj, 7), &node, &nodelen) == -1)
              		return NULL;
              	len = index_length(self);
              	if (index < 0)
              		index += len;
              	if (index != len - 1) {
              		PyErr_SetString(PyExc_IndexError,
              				"insert only supported at index -1");
              		return NULL;
              	}
              	if (self->added == NULL) {
              		self->added = PyList_New(0);
              		if (self->added == NULL)
              			return NULL;
              	}
              	if (PyList_Append(self->added, obj) == -1)
              		return NULL;
              	if (self->nt)
              		nt_insert(self, node, index);
              	Py_CLEAR(self->headrevs);
              	Py_RETURN_NONE;
              }
              static void _index_clearcaches(indexObject *self)
              {
              	if (self->cache) {
              		Py_ssize_t i;
              		for (i = 0; i < self->raw_length; i++)
              			Py_CLEAR(self->cache[i]);
              		free(self->cache);
              		self->cache = NULL;
              	}
              	if (self->offsets) {
              		free(self->offsets);
              		self->offsets = NULL;
              	}
              	if (self->nt) {
              		free(self->nt);
              		self->nt = NULL;
              	}
              	Py_CLEAR(self->headrevs);
              }
              static PyObject *index_clearcaches(indexObject *self)
              {
              	_index_clearcaches(self);
              	self->ntlength = self->ntcapacity = 0;
              	self->ntdepth = self->ntsplits = 0;
              	self->ntrev = -1;
              	self->ntlookups = self->ntmisses = 0;
              	Py_RETURN_NONE;
              }
              static PyObject *index_stats(indexObject *self)
              {
              	PyObject *obj = PyDict_New();
              	if (obj == NULL)
              		return NULL;
              #define istat(__n, __d) \
              	if (PyDict_SetItemString(obj, __d, PyInt_FromSsize_t(self->__n)) == -1) \
              		goto bail;
              	if (self->added) {
              		Py_ssize_t len = PyList_GET_SIZE(self->added);
              		if (PyDict_SetItemString(obj, "index entries added",
              					 PyInt_FromSsize_t(len)) == -1)
              			goto bail;
              	}
              	if (self->raw_length != self->length - 1)
              		istat(raw_length, "revs on disk");
              	istat(length, "revs in memory");
              	istat(ntcapacity, "node trie capacity");
              	istat(ntdepth, "node trie depth");
              	istat(ntlength, "node trie count");
              	istat(ntlookups, "node trie lookups");
              	istat(ntmisses, "node trie misses");
              	istat(ntrev, "node trie last rev scanned");
              	istat(ntsplits, "node trie splits");
              #undef istat
              	return obj;
              bail:
              	Py_XDECREF(obj);
              	return NULL;
              }
              /*
               * When we cache a list, we want to be sure the caller can't mutate
               * the cached copy.
               */
              static PyObject *list_copy(PyObject *list)
              {
              	Py_ssize_t len = PyList_GET_SIZE(list);
              	PyObject *newlist = PyList_New(len);
              	Py_ssize_t i;
              	if (newlist == NULL)
              		return NULL;
              	for (i = 0; i < len; i++) {
              		PyObject *obj = PyList_GET_ITEM(list, i);
              		Py_INCREF(obj);
              		PyList_SET_ITEM(newlist, i, obj);
              	}
              	return newlist;
              }
              static int check_filter(PyObject *filter, Py_ssize_t arg) {
              	if (filter) {
              		PyObject *arglist, *result;
              		int isfiltered;
              		arglist = Py_BuildValue("(n)", arg);
              		if (!arglist) {
              			return -1;
              		}
              		result = PyEval_CallObject(filter, arglist);
              		Py_DECREF(arglist);
              		if (!result) {
              			return -1;
              		}
              		/* PyObject_IsTrue returns 1 if true, 0 if false, -1 if error,
              		 * same as this function, so we can just return it directly.*/
              		isfiltered = PyObject_IsTrue(result);
              		Py_DECREF(result);
              		return isfiltered;
              	} else {
              		return 0;
              	}
              }
              static PyObject *index_headrevs(indexObject *self, PyObject *args)
              {
              	Py_ssize_t i, len, addlen;
              	char *nothead = NULL;
              	PyObject *heads = NULL;
              	PyObject *filter = NULL;
              	PyObject *filteredrevs = Py_None;
              	if (!PyArg_ParseTuple(args, "|O", &filteredrevs)) {
              		return NULL;
              	}
              	if (self->headrevs && filteredrevs == self->filteredrevs)
              		return list_copy(self->headrevs);
              	Py_DECREF(self->filteredrevs);
              	self->filteredrevs = filteredrevs;
              	Py_INCREF(filteredrevs);
              	if (filteredrevs != Py_None) {
              		filter = PyObject_GetAttrString(filteredrevs, "__contains__");
              		if (!filter) {
              			PyErr_SetString(PyExc_TypeError,
              				"filteredrevs has no attribute __contains__");
              			goto bail;
              		}
              	}
              	len = index_length(self) - 1;
              	heads = PyList_New(0);
              	if (heads == NULL)
              		goto bail;
              	if (len == 0) {
              		PyObject *nullid = PyInt_FromLong(-1);
              		if (nullid == NULL || PyList_Append(heads, nullid) == -1) {
              			Py_XDECREF(nullid);
              			goto bail;
              		}
              		goto done;
              	}
              	nothead = calloc(len, 1);
              	if (nothead == NULL)
              		goto bail;
              	for (i = 0; i < self->raw_length; i++) {
              		const char *data;
              		int parent_1, parent_2, isfiltered;
              		isfiltered = check_filter(filter, i);
              		if (isfiltered == -1) {
              			PyErr_SetString(PyExc_TypeError,
              				"unable to check filter");
              			goto bail;
              		}
              		if (isfiltered) {
              			nothead[i] = 1;
              			continue;
              		}
              		data = index_deref(self, i);
              		parent_1 = getbe32(data + 24);
              		parent_2 = getbe32(data + 28);
              		if (parent_1 >= 0)
              			nothead[parent_1] = 1;
              		if (parent_2 >= 0)
              			nothead[parent_2] = 1;
              	}
              	addlen = self->added ? PyList_GET_SIZE(self->added) : 0;
              	for (i = 0; i < addlen; i++) {
              		PyObject *rev = PyList_GET_ITEM(self->added, i);
              		PyObject *p1 = PyTuple_GET_ITEM(rev, 5);
              		PyObject *p2 = PyTuple_GET_ITEM(rev, 6);
              		long parent_1, parent_2;
              		int isfiltered;
              		if (!PyInt_Check(p1) || !PyInt_Check(p2)) {
              			PyErr_SetString(PyExc_TypeError,
              					"revlog parents are invalid");
              			goto bail;
              		}
              		isfiltered = check_filter(filter, i);
              		if (isfiltered == -1) {
              			PyErr_SetString(PyExc_TypeError,
              				"unable to check filter");
              			goto bail;
              		}
              		if (isfiltered) {
              			nothead[i] = 1;
              			continue;
              		}
              		parent_1 = PyInt_AS_LONG(p1);
              		parent_2 = PyInt_AS_LONG(p2);
              		if (parent_1 >= 0)
              			nothead[parent_1] = 1;
              		if (parent_2 >= 0)
              			nothead[parent_2] = 1;
              	}
              	for (i = 0; i < len; i++) {
              		PyObject *head;
              		if (nothead[i])
              			continue;
              		head = PyInt_FromSsize_t(i);
              		if (head == NULL || PyList_Append(heads, head) == -1) {
              			Py_XDECREF(head);
              			goto bail;
              		}
              	}
              done:
              	self->headrevs = heads;
              	Py_XDECREF(filter);
              	free(nothead);
              	return list_copy(self->headrevs);
              bail:
              	Py_XDECREF(filter);
              	Py_XDECREF(heads);
              	free(nothead);
              	return NULL;
              }
              static inline int nt_level(const char *node, Py_ssize_t level)
              {
              	int v = node[level>>1];
              	if (!(level & 1))
              		v >>= 4;
              	return v & 0xf;
              }
              /*
               * Return values:
               *
               *   -4: match is ambiguous (multiple candidates)
               *   -2: not found
               * rest: valid rev
               */
              static int nt_find(indexObject *self, const char *node, Py_ssize_t nodelen,
              		   int hex)
              {
              	int (*getnybble)(const char *, Py_ssize_t) = hex ? hexdigit : nt_level;
              	int level, maxlevel, off;
              	if (nodelen == 20 && node[0] == '\0' && memcmp(node, nullid, 20) == 0)
              		return -1;
              	if (self->nt == NULL)
              		return -2;
              	if (hex)
              		maxlevel = nodelen > 40 ? 40 : (int)nodelen;
              	else
              		maxlevel = nodelen > 20 ? 40 : ((int)nodelen * 2);
              	for (level = off = 0; level < maxlevel; level++) {
              		int k = getnybble(node, level);
              		nodetree *n = &self->nt[off];
              		int v = n->children[k];
              		if (v < 0) {
              			const char *n;
              			Py_ssize_t i;
              			v = -v - 1;
              			n = index_node(self, v);
              			if (n == NULL)
              				return -2;
              			for (i = level; i < maxlevel; i++)
              				if (getnybble(node, i) != nt_level(n, i))
              					return -2;
              			return v;
              		}
              		if (v == 0)
              			return -2;
              		off = v;
              	}
              	/* multiple matches against an ambiguous prefix */
              	return -4;
              }
              static int nt_new(indexObject *self)
              {
              	if (self->ntlength == self->ntcapacity) {
              		self->ntcapacity *= 2;
              		self->nt = realloc(self->nt,
              				   self->ntcapacity * sizeof(nodetree));
              		if (self->nt == NULL) {
              			PyErr_SetString(PyExc_MemoryError, "out of memory");
              			return -1;
              		}
              		memset(&self->nt[self->ntlength], 0,
              		       sizeof(nodetree) * (self->ntcapacity - self->ntlength));
              	}
              	return self->ntlength++;
              }
              static int nt_insert(indexObject *self, const char *node, int rev)
              {
              	int level = 0;
              	int off = 0;
              	while (level < 40) {
              		int k = nt_level(node, level);
              		nodetree *n;
              		int v;
              		n = &self->nt[off];
              		v = n->children[k];
              		if (v == 0) {
              			n->children[k] = -rev - 1;
              			return 0;
              		}
              		if (v < 0) {
              			const char *oldnode = index_node(self, -v - 1);
              			int noff;
              			if (!oldnode || !memcmp(oldnode, node, 20)) {
              				n->children[k] = -rev - 1;
              				return 0;
              			}
              			noff = nt_new(self);
              			if (noff == -1)
              				return -1;
              			/* self->nt may have been changed by realloc */
              			self->nt[off].children[k] = noff;
              			off = noff;
              			n = &self->nt[off];
              			n->children[nt_level(oldnode, ++level)] = v;
              			if (level > self->ntdepth)
              				self->ntdepth = level;
              			self->ntsplits += 1;
              		} else {
              			level += 1;
              			off = v;
              		}
              	}
              	return -1;
              }
              static int nt_init(indexObject *self)
              {
              	if (self->nt == NULL) {
              		if (self->raw_length > INT_MAX) {
              			PyErr_SetString(PyExc_ValueError, "overflow in nt_init");
              			return -1;
              		}
              		self->ntcapacity = self->raw_length < 4
              			? 4 : (int)self->raw_length / 2;
              		self->nt = calloc(self->ntcapacity, sizeof(nodetree));
              		if (self->nt == NULL) {
              			PyErr_NoMemory();
              			return -1;
              		}
              		self->ntlength = 1;
              		self->ntrev = (int)index_length(self) - 1;
              		self->ntlookups = 1;
              		self->ntmisses = 0;
              		if (nt_insert(self, nullid, INT_MAX) == -1)
              			return -1;
              	}
              	return 0;
              }
              /*
               * Return values:
               *
               *   -3: error (exception set)
               *   -2: not found (no exception set)
               * rest: valid rev
               */
              static int index_find_node(indexObject *self,
              			   const char *node, Py_ssize_t nodelen)
              {
              	int rev;
              	self->ntlookups++;
              	rev = nt_find(self, node, nodelen, 0);
              	if (rev >= -1)
              		return rev;
              	if (nt_init(self) == -1)
              		return -3;
              	/*
              	 * For the first handful of lookups, we scan the entire index,
              	 * and cache only the matching nodes. This optimizes for cases
              	 * like "hg tip", where only a few nodes are accessed.
              	 *
              	 * After that, we cache every node we visit, using a single
              	 * scan amortized over multiple lookups.  This gives the best
              	 * bulk performance, e.g. for "hg log".
              	 */
              	if (self->ntmisses++ < 4) {
              		for (rev = self->ntrev - 1; rev >= 0; rev--) {
              			const char *n = index_node(self, rev);
              			if (n == NULL)
              				return -2;
              			if (memcmp(node, n, nodelen > 20 ? 20 : nodelen) == 0) {
              				if (nt_insert(self, n, rev) == -1)
              					return -3;
              				break;
              			}
              		}
              	} else {
              		for (rev = self->ntrev - 1; rev >= 0; rev--) {
              			const char *n = index_node(self, rev);
              			if (n == NULL) {
              				self->ntrev = rev + 1;
              				return -2;
              			}
              			if (nt_insert(self, n, rev) == -1) {
              				self->ntrev = rev + 1;
              				return -3;
              			}
              			if (memcmp(node, n, nodelen > 20 ? 20 : nodelen) == 0) {
              				break;
              			}
              		}
              		self->ntrev = rev;
              	}
              	if (rev >= 0)
              		return rev;
              	return -2;
              }
              static PyObject *raise_revlog_error(void)
              {
              	static PyObject *errclass;
              	PyObject *mod = NULL, *errobj;
              	if (errclass == NULL) {
              		PyObject *dict;
              		mod = PyImport_ImportModule("mercurial.error");
              		if (mod == NULL)
              			goto classfail;
              		dict = PyModule_GetDict(mod);
              		if (dict == NULL)
              			goto classfail;
              		errclass = PyDict_GetItemString(dict, "RevlogError");
              		if (errclass == NULL) {
              			PyErr_SetString(PyExc_SystemError,
              					"could not find RevlogError");
              			goto classfail;
              		}
              		Py_INCREF(errclass);
              	}
              	errobj = PyObject_CallFunction(errclass, NULL);
              	if (errobj == NULL)
              		return NULL;
              	PyErr_SetObject(errclass, errobj);
              	return errobj;
              classfail:
              	Py_XDECREF(mod);
              	return NULL;
              }
              static PyObject *index_getitem(indexObject *self, PyObject *value)
              {
              	char *node;
              	Py_ssize_t nodelen;
              	int rev;
              	if (PyInt_Check(value))
              		return index_get(self, PyInt_AS_LONG(value));
              	if (node_check(value, &node, &nodelen) == -1)
              		return NULL;
              	rev = index_find_node(self, node, nodelen);
              	if (rev >= -1)
              		return PyInt_FromLong(rev);
              	if (rev == -2)
              		raise_revlog_error();
              	return NULL;
              }
              static int nt_partialmatch(indexObject *self, const char *node,
              			   Py_ssize_t nodelen)
              {
              	int rev;
              	if (nt_init(self) == -1)
              		return -3;
              	if (self->ntrev > 0) {
              		/* ensure that the radix tree is fully populated */
              		for (rev = self->ntrev - 1; rev >= 0; rev--) {
              			const char *n = index_node(self, rev);
              			if (n == NULL)
              				return -2;
              			if (nt_insert(self, n, rev) == -1)
              				return -3;
              		}
              		self->ntrev = rev;
              	}
              	return nt_find(self, node, nodelen, 1);
              }
              static PyObject *index_partialmatch(indexObject *self, PyObject *args)
              {
              	const char *fullnode;
              	int nodelen;
              	char *node;
              	int rev, i;
              	if (!PyArg_ParseTuple(args, "s#", &node, &nodelen))
              		return NULL;
              	if (nodelen < 4) {
              		PyErr_SetString(PyExc_ValueError, "key too short");
              		return NULL;
              	}
              	if (nodelen > 40) {
              		PyErr_SetString(PyExc_ValueError, "key too long");
              		return NULL;
              	}
              	for (i = 0; i < nodelen; i++)
              		hexdigit(node, i);
              	if (PyErr_Occurred()) {
              		/* input contains non-hex characters */
              		PyErr_Clear();
              		Py_RETURN_NONE;
              	}
              	rev = nt_partialmatch(self, node, nodelen);
              	switch (rev) {
              	case -4:
              		raise_revlog_error();
              	case -3:
              		return NULL;
              	case -2:
              		Py_RETURN_NONE;
              	case -1:
              		return PyString_FromStringAndSize(nullid, 20);
              	}
              	fullnode = index_node(self, rev);
              	if (fullnode == NULL) {
              		PyErr_Format(PyExc_IndexError,
              			     "could not access rev %d", rev);
              		return NULL;
              	}
              	return PyString_FromStringAndSize(fullnode, 20);
              }
              static PyObject *index_m_get(indexObject *self, PyObject *args)
              {
              	Py_ssize_t nodelen;
              	PyObject *val;
              	char *node;
              	int rev;
              	if (!PyArg_ParseTuple(args, "O", &val))
              		return NULL;
              	if (node_check(val, &node, &nodelen) == -1)
              		return NULL;
              	rev = index_find_node(self, node, nodelen);
              	if (rev ==  -3)
              		return NULL;
              	if (rev == -2)
              		Py_RETURN_NONE;
              	return PyInt_FromLong(rev);
              }
              static int index_contains(indexObject *self, PyObject *value)
              {
              	char *node;
              	Py_ssize_t nodelen;
              	if (PyInt_Check(value)) {
              		long rev = PyInt_AS_LONG(value);
              		return rev >= -1 && rev < index_length(self);
              	}
              	if (node_check(value, &node, &nodelen) == -1)
              		return -1;
              	switch (index_find_node(self, node, nodelen)) {
              	case -3:
              		return -1;
              	case -2:
              		return 0;
              	default:
              		return 1;
              	}
              }
              static inline void index_get_parents(indexObject *self, int rev, int *ps)
              {
              	if (rev >= self->length - 1) {
              		PyObject *tuple = PyList_GET_ITEM(self->added,
              						  rev - self->length + 1);
              		ps[0] = (int)PyInt_AS_LONG(PyTuple_GET_ITEM(tuple, 5));
              		ps[1] = (int)PyInt_AS_LONG(PyTuple_GET_ITEM(tuple, 6));
              	} else {
              		const char *data = index_deref(self, rev);
              		ps[0] = getbe32(data + 24);
              		ps[1] = getbe32(data + 28);
              	}
              }
              typedef uint64_t bitmask;
              /*
               * Given a disjoint set of revs, return all candidates for the
               * greatest common ancestor. In revset notation, this is the set
               * "heads(::a and ::b and ...)"
               */
              static PyObject *find_gca_candidates(indexObject *self, const int *revs,
              				     int revcount)
              {
              	const bitmask allseen = (1ull << revcount) - 1;
              	const bitmask poison = 1ull << revcount;
              	PyObject *gca = PyList_New(0);
              	int i, v, interesting;
              	int maxrev = -1;
              	bitmask sp;
              	bitmask *seen;
              	if (gca == NULL)
              		return PyErr_NoMemory();
              	for (i = 0; i < revcount; i++) {
              		if (revs[i] > maxrev)
              			maxrev = revs[i];
              	}
              	seen = calloc(sizeof(*seen), maxrev + 1);
              	if (seen == NULL) {
              		Py_DECREF(gca);
              		return PyErr_NoMemory();
              	}
              	for (i = 0; i < revcount; i++)
              		seen[revs[i]] = 1ull << i;
              	interesting = revcount;
              	for (v = maxrev; v >= 0 && interesting; v--) {
              		bitmask sv = seen[v];
              		int parents[2];
              		if (!sv)
              			continue;
              		if (sv < poison) {
              			interesting -= 1;
              			if (sv == allseen) {
              				PyObject *obj = PyInt_FromLong(v);
              				if (obj == NULL)
              					goto bail;
              				if (PyList_Append(gca, obj) == -1) {
              					Py_DECREF(obj);
              					goto bail;
              				}
              				sv |= poison;
              				for (i = 0; i < revcount; i++) {
              					if (revs[i] == v)
              						goto done;
              				}
              			}
              		}
              		index_get_parents(self, v, parents);
              		for (i = 0; i < 2; i++) {
              			int p = parents[i];
              			if (p == -1)
              				continue;
              			sp = seen[p];
              			if (sv < poison) {
              				if (sp == 0) {
              					seen[p] = sv;
              					interesting++;
              				}
              				else if (sp != sv)
              					seen[p] |= sv;
              			} else {
              				if (sp && sp < poison)
              					interesting--;
              				seen[p] = sv;
              			}
              		}
              	}
              done:
              	free(seen);
              	return gca;
              bail:
              	free(seen);
              	Py_XDECREF(gca);
              	return NULL;
              }
              /*
               * Given a disjoint set of revs, return the subset with the longest
               * path to the root.
               */
              static PyObject *find_deepest(indexObject *self, PyObject *revs)
              {
              	const Py_ssize_t revcount = PyList_GET_SIZE(revs);
              	static const Py_ssize_t capacity = 24;
              	int *depth, *interesting = NULL;
              	int i, j, v, ninteresting;
              	PyObject *dict = NULL, *keys = NULL;
              	long *seen = NULL;
              	int maxrev = -1;
              	long final;
              	if (revcount > capacity) {
              		PyErr_Format(PyExc_OverflowError,
              			     "bitset size (%ld) > capacity (%ld)",
              			     (long)revcount, (long)capacity);
              		return NULL;
              	}
              	for (i = 0; i < revcount; i++) {
              		int n = (int)PyInt_AsLong(PyList_GET_ITEM(revs, i));
              		if (n > maxrev)
              			maxrev = n;
              	}
              	depth = calloc(sizeof(*depth), maxrev + 1);
              	if (depth == NULL)
              		return PyErr_NoMemory();
              	seen = calloc(sizeof(*seen), maxrev + 1);
              	if (seen == NULL) {
              		PyErr_NoMemory();
              		goto bail;
              	}
              	interesting = calloc(sizeof(*interesting), 2 << revcount);
              	if (interesting == NULL) {
              		PyErr_NoMemory();
              		goto bail;
              	}
              	if (PyList_Sort(revs) == -1)
              		goto bail;
              	for (i = 0; i < revcount; i++) {
              		int n = (int)PyInt_AsLong(PyList_GET_ITEM(revs, i));
              		long b = 1l << i;
              		depth[n] = 1;
              		seen[n] = b;
              		interesting[b] = 1;
              	}
              	ninteresting = (int)revcount;
              	for (v = maxrev; v >= 0 && ninteresting > 1; v--) {
              		int dv = depth[v];
              		int parents[2];
              		long sv;
              		if (dv == 0)
              			continue;
              		sv = seen[v];
              		index_get_parents(self, v, parents);
              		for (i = 0; i < 2; i++) {
              			int p = parents[i];
              			long nsp, sp;
              			int dp;
              			if (p == -1)
              				continue;
              			dp = depth[p];
              			nsp = sp = seen[p];
              			if (dp <= dv) {
              				depth[p] = dv + 1;
              				if (sp != sv) {
              					interesting[sv] += 1;
              					nsp = seen[p] = sv;
              					if (sp) {
              						interesting[sp] -= 1;
              						if (interesting[sp] == 0)
              							ninteresting -= 1;
              					}
              				}
              			}
              			else if (dv == dp - 1) {
              				nsp = sp | sv;
              				if (nsp == sp)
              					continue;
              				seen[p] = nsp;
              				interesting[sp] -= 1;
              				if (interesting[sp] == 0 && interesting[nsp] > 0)
              					ninteresting -= 1;
              				interesting[nsp] += 1;
              			}
              		}
              		interesting[sv] -= 1;
              		if (interesting[sv] == 0)
              			ninteresting -= 1;
              	}
              	final = 0;
              	j = ninteresting;
              	for (i = 0; i < (int)(2 << revcount) && j > 0; i++) {
              		if (interesting[i] == 0)
              			continue;
              		final |= i;
              		j -= 1;
              	}
              	if (final == 0) {
              		keys = PyList_New(0);
              		goto bail;
              	}
              	dict = PyDict_New();
              	if (dict == NULL)
              		goto bail;
              	for (i = 0; i < revcount; i++) {
              		PyObject *key;
              		if ((final & (1 << i)) == 0)
              			continue;
              		key = PyList_GET_ITEM(revs, i);
              		Py_INCREF(key);
              		Py_INCREF(Py_None);
              		if (PyDict_SetItem(dict, key, Py_None) == -1) {
              			Py_DECREF(key);
              			Py_DECREF(Py_None);
              			goto bail;
              		}
              	}
              	keys = PyDict_Keys(dict);
              bail:
              	free(depth);
              	free(seen);
              	free(interesting);
              	Py_XDECREF(dict);
              	return keys;
              }
              /*
               * Given a (possibly overlapping) set of revs, return the greatest
               * common ancestors: those with the longest path to the root.
               */
              static PyObject *index_ancestors(indexObject *self, PyObject *args)
              {
              	PyObject *ret = NULL, *gca = NULL;
              	Py_ssize_t argcount, i, len;
              	bitmask repeat = 0;
              	int revcount = 0;
              	int *revs;
              	argcount = PySequence_Length(args);
              	revs = malloc(argcount * sizeof(*revs));
              	if (argcount > 0 && revs == NULL)
              		return PyErr_NoMemory();
              	len = index_length(self) - 1;
              	for (i = 0; i < argcount; i++) {
              		static const int capacity = 24;
              		PyObject *obj = PySequence_GetItem(args, i);
              		bitmask x;
              		long val;
              		if (!PyInt_Check(obj)) {
              			PyErr_SetString(PyExc_TypeError,
              					"arguments must all be ints");
              			goto bail;
              		}
              		val = PyInt_AsLong(obj);
              		if (val == -1) {
              			ret = PyList_New(0);
              			goto done;
              		}
              		if (val < 0 || val >= len) {
              			PyErr_SetString(PyExc_IndexError,
              					"index out of range");
              			goto bail;
              		}
              		/* this cheesy bloom filter lets us avoid some more
              		 * expensive duplicate checks in the common set-is-disjoint
              		 * case */
              		x = 1ull << (val & 0x3f);
              		if (repeat & x) {
              			int k;
              			for (k = 0; k < revcount; k++) {
              				if (val == revs[k])
              					goto duplicate;
              			}
              		}
              		else repeat |= x;
              		if (revcount >= capacity) {
              			PyErr_Format(PyExc_OverflowError,
              				     "bitset size (%d) > capacity (%d)",
              				     revcount, capacity);
              			goto bail;
              		}
              		revs[revcount++] = (int)val;
              	duplicate:;
              	}
              	if (revcount == 0) {
              		ret = PyList_New(0);
              		goto done;
              	}
              	if (revcount == 1) {
              		PyObject *obj;
              		ret = PyList_New(1);
              		if (ret == NULL)
              			goto bail;
              		obj = PyInt_FromLong(revs[0]);
              		if (obj == NULL)
              			goto bail;
              		PyList_SET_ITEM(ret, 0, obj);
              		goto done;
              	}
              	gca = find_gca_candidates(self, revs, revcount);
              	if (gca == NULL)
              		goto bail;
              	if (PyList_GET_SIZE(gca) <= 1) {
              		ret = gca;
              		Py_INCREF(gca);
              	}
              	else ret = find_deepest(self, gca);
              done:
              	free(revs);
              	Py_XDECREF(gca);
              	return ret;
              bail:
              	free(revs);
              	Py_XDECREF(gca);
              	Py_XDECREF(ret);
              	return NULL;
              }
              /*
               * Given a (possibly overlapping) set of revs, return all the
               * common ancestors heads: heads(::args[0] and ::a[1] and ...)
               */
              static PyObject *index_commonancestorsheads(indexObject *self, PyObject *args)
              {
              	PyObject *ret = NULL;
              	Py_ssize_t argcount, i, len;
              	bitmask repeat = 0;
              	int revcount = 0;
              	int *revs;
              	argcount = PySequence_Length(args);
              	revs = malloc(argcount * sizeof(*revs));
              	if (argcount > 0 && revs == NULL)
              		return PyErr_NoMemory();
              	len = index_length(self) - 1;
              	for (i = 0; i < argcount; i++) {
              		static const int capacity = 24;
              		PyObject *obj = PySequence_GetItem(args, i);
              		bitmask x;
              		long val;
              		if (!PyInt_Check(obj)) {
              			PyErr_SetString(PyExc_TypeError,
              					"arguments must all be ints");
              			goto bail;
              		}
              		val = PyInt_AsLong(obj);
              		if (val == -1) {
              			ret = PyList_New(0);
              			goto done;
              		}
              		if (val < 0 || val >= len) {
              			PyErr_SetString(PyExc_IndexError,
              					"index out of range");
              			goto bail;
              		}
              		/* this cheesy bloom filter lets us avoid some more
              		 * expensive duplicate checks in the common set-is-disjoint
              		 * case */
              		x = 1ull << (val & 0x3f);
              		if (repeat & x) {
              			int k;
              			for (k = 0; k < revcount; k++) {
              				if (val == revs[k])
              					goto duplicate;
              			}
              		}
              		else repeat |= x;
              		if (revcount >= capacity) {
              			PyErr_Format(PyExc_OverflowError,
              				     "bitset size (%d) > capacity (%d)",
              				     revcount, capacity);
              			goto bail;
              		}
              		revs[revcount++] = (int)val;
              	duplicate:;
              	}
              	if (revcount == 0) {
              		ret = PyList_New(0);
              		goto done;
              	}
              	if (revcount == 1) {
              		PyObject *obj;
              		ret = PyList_New(1);
              		if (ret == NULL)
              			goto bail;
              		obj = PyInt_FromLong(revs[0]);
              		if (obj == NULL)
              			goto bail;
              		PyList_SET_ITEM(ret, 0, obj);
              		goto done;
              	}
              	ret = find_gca_candidates(self, revs, revcount);
              	if (ret == NULL)
              		goto bail;
              done:
              	free(revs);
              	return ret;
              bail:
              	free(revs);
              	Py_XDECREF(ret);
              	return NULL;
              }
              /*
               * Invalidate any trie entries introduced by added revs.
               */
              static void nt_invalidate_added(indexObject *self, Py_ssize_t start)
              {
              	Py_ssize_t i, len = PyList_GET_SIZE(self->added);
              	for (i = start; i < len; i++) {
              		PyObject *tuple = PyList_GET_ITEM(self->added, i);
              		PyObject *node = PyTuple_GET_ITEM(tuple, 7);
              		nt_insert(self, PyString_AS_STRING(node), -1);
              	}
              	if (start == 0)
              		Py_CLEAR(self->added);
              }
              /*
               * Delete a numeric range of revs, which must be at the end of the
               * range, but exclude the sentinel nullid entry.
               */
              static int index_slice_del(indexObject *self, PyObject *item)
              {
              	Py_ssize_t start, stop, step, slicelength;
              	Py_ssize_t length = index_length(self);
              	int ret = 0;
              	if (PySlice_GetIndicesEx((PySliceObject*)item, length,
              				 &start, &stop, &step, &slicelength) < 0)
              		return -1;
              	if (slicelength <= 0)
              		return 0;
              	if ((step < 0 && start < stop) || (step > 0 && start > stop))
              		stop = start;
              	if (step < 0) {
              		stop = start + 1;
              		start = stop + step*(slicelength - 1) - 1;
              		step = -step;
              	}
              	if (step != 1) {
              		PyErr_SetString(PyExc_ValueError,
              				"revlog index delete requires step size of 1");
              		return -1;
              	}
              	if (stop != length - 1) {
              		PyErr_SetString(PyExc_IndexError,
              				"revlog index deletion indices are invalid");
              		return -1;
              	}
              	if (start < self->length - 1) {
              		if (self->nt) {
              			Py_ssize_t i;
              			for (i = start + 1; i < self->length - 1; i++) {
              				const char *node = index_node(self, i);
              				if (node)
              					nt_insert(self, node, -1);
              			}
              			if (self->added)
              				nt_invalidate_added(self, 0);
              			if (self->ntrev > start)
              				self->ntrev = (int)start;
              		}
              		self->length = start + 1;
              		if (start < self->raw_length) {
              			if (self->cache) {
              				Py_ssize_t i;
              				for (i = start; i < self->raw_length; i++)
              					Py_CLEAR(self->cache[i]);
              			}
              			self->raw_length = start;
              		}
              		goto done;
              	}
              	if (self->nt) {
              		nt_invalidate_added(self, start - self->length + 1);
              		if (self->ntrev > start)
              			self->ntrev = (int)start;
              	}
              	if (self->added)
              		ret = PyList_SetSlice(self->added, start - self->length + 1,
              				      PyList_GET_SIZE(self->added), NULL);
              done:
              	Py_CLEAR(self->headrevs);
              	return ret;
              }
              /*
               * Supported ops:
               *
               * slice deletion
               * string assignment (extend node->rev mapping)
               * string deletion (shrink node->rev mapping)
               */
              static int index_assign_subscript(indexObject *self, PyObject *item,
              				  PyObject *value)
              {
              	char *node;
              	Py_ssize_t nodelen;
              	long rev;
              	if (PySlice_Check(item) && value == NULL)
              		return index_slice_del(self, item);
              	if (node_check(item, &node, &nodelen) == -1)
              		return -1;
              	if (value == NULL)
              		return self->nt ? nt_insert(self, node, -1) : 0;
              	rev = PyInt_AsLong(value);
              	if (rev > INT_MAX || rev < 0) {
              		if (!PyErr_Occurred())
              			PyErr_SetString(PyExc_ValueError, "rev out of range");
              		return -1;
              	}
              	return nt_insert(self, node, (int)rev);
              }
              /*
               * Find all RevlogNG entries in an index that has inline data. Update
               * the optional "offsets" table with those entries.
               */
              static Py_ssize_t inline_scan(indexObject *self, const char **offsets)
              {
              	const char *data = PyString_AS_STRING(self->data);
              	Py_ssize_t pos = 0;
              	Py_ssize_t end = PyString_GET_SIZE(self->data);
              	long incr = v1_hdrsize;
              	Py_ssize_t len = 0;
              	while (pos + v1_hdrsize <= end && pos >= 0) {
              		uint32_t comp_len;
              		/* 3rd element of header is length of compressed inline data */
              		comp_len = getbe32(data + pos + 8);
              		incr = v1_hdrsize + comp_len;
              		if (offsets)
              			offsets[len] = data + pos;
              		len++;
              		pos += incr;
              	}
              	if (pos != end) {
              		if (!PyErr_Occurred())
              			PyErr_SetString(PyExc_ValueError, "corrupt index file");
              		return -1;
              	}
              	return len;
              }
              static int index_init(indexObject *self, PyObject *args)
              {
              	PyObject *data_obj, *inlined_obj;
              	Py_ssize_t size;
              	/* Initialize before argument-checking to avoid index_dealloc() crash. */
              	self->raw_length = 0;
              	self->added = NULL;
              	self->cache = NULL;
              	self->data = NULL;
              	self->headrevs = NULL;
              	self->filteredrevs = Py_None;
              	Py_INCREF(Py_None);
              	self->nt = NULL;
              	self->offsets = NULL;
              	if (!PyArg_ParseTuple(args, "OO", &data_obj, &inlined_obj))
              		return -1;
              	if (!PyString_Check(data_obj)) {
              		PyErr_SetString(PyExc_TypeError, "data is not a string");
              		return -1;
              	}
              	size = PyString_GET_SIZE(data_obj);
              	self->inlined = inlined_obj && PyObject_IsTrue(inlined_obj);
              	self->data = data_obj;
              	self->ntlength = self->ntcapacity = 0;
              	self->ntdepth = self->ntsplits = 0;
              	self->ntlookups = self->ntmisses = 0;
              	self->ntrev = -1;
              	Py_INCREF(self->data);
              	if (self->inlined) {
              		Py_ssize_t len = inline_scan(self, NULL);
              		if (len == -1)
              			goto bail;
              		self->raw_length = len;
              		self->length = len + 1;
              	} else {
              		if (size % v1_hdrsize) {
              			PyErr_SetString(PyExc_ValueError, "corrupt index file");
              			goto bail;
              		}
              		self->raw_length = size / v1_hdrsize;
              		self->length = self->raw_length + 1;
              	}
              	return 0;
              bail:
              	return -1;
              }
              static PyObject *index_nodemap(indexObject *self)
              {
              	Py_INCREF(self);
              	return (PyObject *)self;
              }
              static void index_dealloc(indexObject *self)
              {
              	_index_clearcaches(self);
              	Py_XDECREF(self->filteredrevs);
              	Py_XDECREF(self->data);
              	Py_XDECREF(self->added);
              	PyObject_Del(self);
              }
              static PySequenceMethods index_sequence_methods = {
              	(lenfunc)index_length,   /* sq_length */
 ,                       /* sq_concat */
 ,                       /* sq_repeat */
              	(ssizeargfunc)index_get, /* sq_item */
 ,                       /* sq_slice */
 ,                       /* sq_ass_item */
 ,                       /* sq_ass_slice */
              	(objobjproc)index_contains, /* sq_contains */
              };
              static PyMappingMethods index_mapping_methods = {
              	(lenfunc)index_length,                 /* mp_length */
              	(binaryfunc)index_getitem,             /* mp_subscript */
              	(objobjargproc)index_assign_subscript, /* mp_ass_subscript */
              };
              static PyMethodDef index_methods[] = {
              	{"ancestors", (PyCFunction)index_ancestors, METH_VARARGS,
              	 "return the gca set of the given revs"},
              	{"commonancestorsheads", (PyCFunction)index_commonancestorsheads,
              	  METH_VARARGS,
              	  "return the heads of the common ancestors of the given revs"},
              	{"clearcaches", (PyCFunction)index_clearcaches, METH_NOARGS,
              	 "clear the index caches"},
              	{"get", (PyCFunction)index_m_get, METH_VARARGS,
              	 "get an index entry"},
              	{"headrevs", (PyCFunction)index_headrevs, METH_VARARGS,
              	 "get head revisions"},
              	{"insert", (PyCFunction)index_insert, METH_VARARGS,
              	 "insert an index entry"},
              	{"partialmatch", (PyCFunction)index_partialmatch, METH_VARARGS,
              	 "match a potentially ambiguous node ID"},
              	{"stats", (PyCFunction)index_stats, METH_NOARGS,
              	 "stats for the index"},
              	{NULL} /* Sentinel */
              };
              static PyGetSetDef index_getset[] = {
              	{"nodemap", (getter)index_nodemap, NULL, "nodemap", NULL},
              	{NULL} /* Sentinel */
              };
              static PyTypeObject indexType = {
              	PyObject_HEAD_INIT(NULL)
 ,                         /* ob_size */
              	"parsers.index",           /* tp_name */
              	sizeof(indexObject),       /* tp_basicsize */
 ,                         /* tp_itemsize */
              	(destructor)index_dealloc, /* tp_dealloc */
 ,                         /* tp_print */
 ,                         /* tp_getattr */
 ,                         /* tp_setattr */
 ,                         /* tp_compare */
 ,                         /* tp_repr */
 ,                         /* tp_as_number */
              	&index_sequence_methods,   /* tp_as_sequence */
              	&index_mapping_methods,    /* tp_as_mapping */
 ,                         /* tp_hash */
 ,                         /* tp_call */
 ,                         /* tp_str */
 ,                         /* tp_getattro */
 ,                         /* tp_setattro */
 ,                         /* tp_as_buffer */
              	Py_TPFLAGS_DEFAULT,        /* tp_flags */
              	"revlog index",            /* tp_doc */
 ,                         /* tp_traverse */
 ,                         /* tp_clear */
 ,                         /* tp_richcompare */
 ,                         /* tp_weaklistoffset */
 ,                         /* tp_iter */
 ,                         /* tp_iternext */
              	index_methods,             /* tp_methods */
 ,                         /* tp_members */
              	index_getset,              /* tp_getset */
 ,                         /* tp_base */
 ,                         /* tp_dict */
 ,                         /* tp_descr_get */
 ,                         /* tp_descr_set */
 ,                         /* tp_dictoffset */
              	(initproc)index_init,      /* tp_init */
 ,                         /* tp_alloc */
              };
              /*
               * returns a tuple of the form (index, index, cache) with elements as
               * follows:
               *
               * index: an index object that lazily parses RevlogNG records
               * cache: if data is inlined, a tuple (index_file_content, 0), else None
               *
               * added complications are for backwards compatibility
               */
              static PyObject *parse_index2(PyObject *self, PyObject *args)
              {
              	PyObject *tuple = NULL, *cache = NULL;
              	indexObject *idx;
              	int ret;
              	idx = PyObject_New(indexObject, &indexType);
              	if (idx == NULL)
              		goto bail;
              	ret = index_init(idx, args);
              	if (ret == -1)
              		goto bail;
              	if (idx->inlined) {
              		cache = Py_BuildValue("iO", 0, idx->data);
              		if (cache == NULL)
              			goto bail;
              	} else {
              		cache = Py_None;
              		Py_INCREF(cache);
              	}
              	tuple = Py_BuildValue("NN", idx, cache);
              	if (!tuple)
              		goto bail;
              	return tuple;
              bail:
              	Py_XDECREF(idx);
              	Py_XDECREF(cache);
              	Py_XDECREF(tuple);
              	return NULL;
              }
              static char parsers_doc[] = "Efficient content parsing.";
              PyObject *encodedir(PyObject *self, PyObject *args);
              PyObject *pathencode(PyObject *self, PyObject *args);
              PyObject *lowerencode(PyObject *self, PyObject *args);
              static PyMethodDef methods[] = {
              	{"pack_dirstate", pack_dirstate, METH_VARARGS, "pack a dirstate\n"},
              	{"parse_manifest", parse_manifest, METH_VARARGS, "parse a manifest\n"},
              	{"parse_dirstate", parse_dirstate, METH_VARARGS, "parse a dirstate\n"},
              	{"parse_index2", parse_index2, METH_VARARGS, "parse a revlog index\n"},
+             	{"asciilower", asciilower, METH_VARARGS, "lowercase an ASCII string\n"},
              	{"encodedir", encodedir, METH_VARARGS, "encodedir a path\n"},
              	{"pathencode", pathencode, METH_VARARGS, "fncache-encode a path\n"},
              	{"lowerencode", lowerencode, METH_VARARGS, "lower-encode a path\n"},
              	{NULL, NULL}
              };
              void dirs_module_init(PyObject *mod);
              static void module_init(PyObject *mod)
              {
              	/* This module constant has two purposes.  First, it lets us unit test
              	 * the ImportError raised without hard-coding any error text.  This
              	 * means we can change the text in the future without breaking tests,
              	 * even across changesets without a recompile.  Second, its presence
              	 * can be used to determine whether the version-checking logic is
              	 * present, which also helps in testing across changesets without a
              	 * recompile.  Note that this means the pure-Python version of parsers
              	 * should not have this module constant. */
              	PyModule_AddStringConstant(mod, "versionerrortext", versionerrortext);
              	dirs_module_init(mod);
              	indexType.tp_new = PyType_GenericNew;
              	if (PyType_Ready(&indexType) < 0 ||
              	    PyType_Ready(&dirstateTupleType) < 0)
              		return;
              	Py_INCREF(&indexType);
              	PyModule_AddObject(mod, "index", (PyObject *)&indexType);
              	Py_INCREF(&dirstateTupleType);
              	PyModule_AddObject(mod, "dirstatetuple",
              			   (PyObject *)&dirstateTupleType);
              	nullentry = Py_BuildValue("iiiiiiis#", 0, 0, 0,
              				  -1, -1, -1, -1, nullid, 20);
              	if (nullentry)
              		PyObject_GC_UnTrack(nullentry);
              }
              static int check_python_version(void)
              {
              	PyObject *sys = PyImport_ImportModule("sys");
              	long hexversion = PyInt_AsLong(PyObject_GetAttrString(sys, "hexversion"));
              	/* sys.hexversion is a 32-bit number by default, so the -1 case
              	 * should only occur in unusual circumstances (e.g. if sys.hexversion
              	 * is manually set to an invalid value). */
              	if ((hexversion == -1) || (hexversion >> 16 != PY_VERSION_HEX >> 16)) {
              		PyErr_Format(PyExc_ImportError, "%s: The Mercurial extension "
              			"modules were compiled with Python " PY_VERSION ", but "
              			"Mercurial is currently using Python with sys.hexversion=%ld: "
              			"Python %s\n at: %s", versionerrortext, hexversion,
              			Py_GetVersion(), Py_GetProgramFullPath());
              		return -1;
              	}
              	return 0;
              }
              #ifdef IS_PY3K
              static struct PyModuleDef parsers_module = {
              	PyModuleDef_HEAD_INIT,
              	"parsers",
              	parsers_doc,
              	-1,
              	methods
              };
              PyMODINIT_FUNC PyInit_parsers(void)
              {
              	PyObject *mod;
              	if (check_python_version() == -1)
              		return;
              	mod = PyModule_Create(&parsers_module);
              	module_init(mod);
              	return mod;
              }
              #else
              PyMODINIT_FUNC initparsers(void)
              {
              	PyObject *mod;
              	if (check_python_version() == -1)
              		return;
              	mod = Py_InitModule3("parsers", methods, parsers_doc);
              	module_init(mod);
              }
              #endif

General Comments 0

Write
Preview

You need to be logged in to leave comments. Login now

No TODOs yet

	Site-wide shortcuts
/	Use quick search box
g h	Goto home page
g g	Goto my private gists page
g G	Goto my public gists page
g 0-9	Goto bookmarked items from 0-9
n r	New repository page
n g	New gist page

	Repositories
g s	Goto summary page
g c	Goto changelog page
g f	Goto files page
g F	Goto files page with file search activated
g p	Goto pull requests page
g o	Goto repository settings
g O	Goto repository access permissions settings
t s	Toggle sidebar on some pages