upstream/mercurial-mirror Commit - r39448:aeb551a3

cborutil: implement sans I/O decoder...

The vendored CBOR package decodes by calling read(n) on an object. There are a number of disadvantages to this:

* Uses blocking I/O. If sufficient data is not available, the decoder will hang until it is.
* No support for partial reads. If the read(n) returns less data than requested, the decoder raises an error.
* Requires the use of a file like object. If the original data is in say a buffer, we need to "cast" it to e.g. a BytesIO to appease the decoder.

In addition, the vendored CBOR decoder doesn't provide flexibility that we desire. Specifically:

* It buffers indefinite length bytestrings instead of streaming them.
* It doesn't allow limiting the set of types that can be decoded. This property is useful when implementing a "hardened" decoder that is less susceptible to abusive input.
* It doesn't provide sufficient "hook points" and introspection to institute checks around behavior. These are useful for implementing a "hardened" decoder.

This all adds up to a reasonable set of justifications for writing our own decoder.

So, this commit implements our own CBOR decoder.

At the heart of the decoder is a function that decodes a single "item" from a buffer. This item can be a complete simple value or a special value, such as "start of array." Using this function, we can build a decoder that effectively iterates over the stream of decoded items and builds up higher-level values, such as arrays, maps, sets, and indefinite length bytestrings. And we can do this without performing I/O in the decoder itself.

The core of the sans I/O decoder will probably not be used directly. Instead, it is expected that we'll build utility functions for invoking the decoder given specific input types. This will allow extreme flexibility in how data is delivered to the decoder.

I'm pretty happy with the state of the decoder modulo the TODO items to track wanted features to help with a "hardened" decoder. The one thing I could be convinced to change is the handling of semantic tags. Since we only support a single semantic tag (sets), I thought it would be easier to handle them inline in decodeitem(). This is simpler now. But if we add support for other semantic tags, it will likely be easier to move semantic tag handling outside of decodeitem(). But, properly supporting semantic tags opens up a whole can of worms, as many semantic tags imply new types. I'm optimistic we won't need these in Mercurial. But who knows.

I'm also pretty happy with the test coverage. Writing comprehensive tests for partial decoding did flush out a handful of bugs. One general improvement to testing would be fuzz testing for partial decoding. I may implement that later. I also anticipate switching the wire protocol code to this new decoder will flush out any lingering bugs.

Differential Revision: https://phab.mercurial-scm.org/D4414
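The item-at-a-time idea described in the commit message can be illustrated with a much smaller sketch. The code below is not the decoder from this commit: `decode_uint_item` and `decode_stream` are hypothetical names, only CBOR unsigned integers (major type 0) are handled, and Python 3 byte indexing is assumed. It shows how a function that either returns a value or reports how many more bytes it needs lets the caller own all buffering and I/O.

```python
import struct

# Hypothetical helper for illustration only. It handles just CBOR unsigned
# integers (major type 0); everything else raises.
_FORMATS = {24: ">B", 25: ">H", 26: ">L", 27: ">Q"}


def decode_uint_item(buf, offset=0):
    """Return (complete, value, count).

    A positive count is the number of bytes consumed. A negative count is
    the number of additional bytes needed before the item can be decoded.
    """
    initial = buf[offset]  # Python 3: indexing bytes yields an int.
    majortype, subtype = initial >> 5, initial & 0b00011111
    if majortype != 0:
        raise ValueError("only unsigned integers are handled in this sketch")
    if subtype < 24:
        # Small values are stored inline in the initial byte.
        return True, subtype, 1
    if subtype not in _FORMATS:
        raise ValueError("unsupported subtype: %d" % subtype)
    fmt = _FORMATS[subtype]
    size = struct.calcsize(fmt)
    available = len(buf) - offset - 1
    if available < size:
        return False, None, available - size
    return True, struct.unpack_from(fmt, buf, offset + 1)[0], size + 1


def decode_stream(chunks):
    """Decode uints from an iterable of byte chunks; no I/O happens here."""
    pending = b""
    values = []
    for chunk in chunks:
        pending += chunk
        offset = 0
        while offset < len(pending):
            complete, value, count = decode_uint_item(pending, offset)
            if not complete:
                break  # Keep the partial item and wait for more bytes.
            values.append(value)
            offset += count
        pending = pending[offset:]
    return values


# 10, then 500 (0x19 0x01 0xf4) split across two chunks, then 23.
print(decode_stream([b"\x0a\x19\x01", b"\xf4\x17"]))
```

Because the partial item is left unconsumed and re-parsed once more bytes arrive, the caller may deliver data from a socket, a file, or an in-memory buffer without the decoding function changing.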

Gregory Szorc -

r39448:aeb551a3 default


mercurial/utils/cborutil.py

0 +714 -1

              # cborutil.py - CBOR extensions
              #
              # Copyright 2018 Gregory Szorc <gregory.szorc@gmail.com>
              #
              # This software may be used and distributed according to the terms of the
              # GNU General Public License version 2 or any later version.
              from __future__ import absolute_import
              import struct
+             import sys
              from ..thirdparty.cbor.cbor2 import (
                  decoder as decodermod,
              )
              # Very short version of RFC 7049...
              #
              # Each item begins with a byte. The 3 high bits of that byte denote the
              # "major type." The lower 5 bits denote the "subtype." Each major type
              # has its own encoding mechanism.
              #
              # Most types have lengths. However, bytestring, string, array, and map
              # can be indefinite length. These are denoted by a subtype with value 31.
              # Sub-components of those types then come afterwards and are terminated
              # by a "break" byte.
              MAJOR_TYPE_UINT = 0
              MAJOR_TYPE_NEGINT = 1
              MAJOR_TYPE_BYTESTRING = 2
              MAJOR_TYPE_STRING = 3
              MAJOR_TYPE_ARRAY = 4
              MAJOR_TYPE_MAP = 5
              MAJOR_TYPE_SEMANTIC = 6
              MAJOR_TYPE_SPECIAL = 7
              SUBTYPE_MASK = 0b00011111
+             SUBTYPE_FALSE = 20
+             SUBTYPE_TRUE = 21
+             SUBTYPE_NULL = 22
              SUBTYPE_HALF_FLOAT = 25
              SUBTYPE_SINGLE_FLOAT = 26
              SUBTYPE_DOUBLE_FLOAT = 27
              SUBTYPE_INDEFINITE = 31
+             SEMANTIC_TAG_FINITE_SET = 258
              # Indefinite types begin with their major type ORd with information value 31.
              BEGIN_INDEFINITE_BYTESTRING = struct.pack(
                  r'>B', MAJOR_TYPE_BYTESTRING << 5 | SUBTYPE_INDEFINITE)
              BEGIN_INDEFINITE_ARRAY = struct.pack(
                  r'>B', MAJOR_TYPE_ARRAY << 5 | SUBTYPE_INDEFINITE)
              BEGIN_INDEFINITE_MAP = struct.pack(
                  r'>B', MAJOR_TYPE_MAP << 5 | SUBTYPE_INDEFINITE)
              ENCODED_LENGTH_1 = struct.Struct(r'>B')
              ENCODED_LENGTH_2 = struct.Struct(r'>BB')
              ENCODED_LENGTH_3 = struct.Struct(r'>BH')
              ENCODED_LENGTH_4 = struct.Struct(r'>BL')
              ENCODED_LENGTH_5 = struct.Struct(r'>BQ')
              # The break ends an indefinite length item.
              BREAK = b'\xff'
              BREAK_INT = 255
              def encodelength(majortype, length):
                  """Obtain a value encoding the major type and its length."""
                  if length < 24:
                      return ENCODED_LENGTH_1.pack(majortype << 5 | length)
                  elif length < 256:
                      return ENCODED_LENGTH_2.pack(majortype << 5 | 24, length)
                  elif length < 65536:
                      return ENCODED_LENGTH_3.pack(majortype << 5 | 25, length)
                  elif length < 4294967296:
                      return ENCODED_LENGTH_4.pack(majortype << 5 | 26, length)
                  else:
                      return ENCODED_LENGTH_5.pack(majortype << 5 | 27, length)
              def streamencodebytestring(v):
                  yield encodelength(MAJOR_TYPE_BYTESTRING, len(v))
                  yield v
              def streamencodebytestringfromiter(it):
                  """Convert an iterator of chunks to an indefinite bytestring.
                  Given an input that is iterable and each element in the iterator is
                  representable as bytes, emit an indefinite length bytestring.
                  """
                  yield BEGIN_INDEFINITE_BYTESTRING
                  for chunk in it:
                      yield encodelength(MAJOR_TYPE_BYTESTRING, len(chunk))
                      yield chunk
                  yield BREAK
              def streamencodeindefinitebytestring(source, chunksize=65536):
                  """Given a large source buffer, emit as an indefinite length bytestring.
                  This is a generator of chunks constituting the encoded CBOR data.
                  """
                  yield BEGIN_INDEFINITE_BYTESTRING
                  i = 0
                  l = len(source)
                  while True:
                      chunk = source[i:i + chunksize]
                      i += len(chunk)
                      yield encodelength(MAJOR_TYPE_BYTESTRING, len(chunk))
                      yield chunk
                      if i >= l:
                          break
                  yield BREAK
              def streamencodeint(v):
                  if v >= 18446744073709551616 or v < -18446744073709551616:
                      raise ValueError('big integers not supported')
                  if v >= 0:
                      yield encodelength(MAJOR_TYPE_UINT, v)
                  else:
                      yield encodelength(MAJOR_TYPE_NEGINT, abs(v) - 1)
              def streamencodearray(l):
                  """Encode a known size iterable to an array."""
                  yield encodelength(MAJOR_TYPE_ARRAY, len(l))
                  for i in l:
                      for chunk in streamencode(i):
                          yield chunk
              def streamencodearrayfromiter(it):
                  """Encode an iterator of items to an indefinite length array."""
                  yield BEGIN_INDEFINITE_ARRAY
                  for i in it:
                      for chunk in streamencode(i):
                          yield chunk
                  yield BREAK
              def _mixedtypesortkey(v):
                  return type(v).__name__, v
              def streamencodeset(s):
                  # https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml defines
                  # semantic tag 258 for finite sets.
-                 yield encodelength(MAJOR_TYPE_SEMANTIC, 258)
+                 yield encodelength(MAJOR_TYPE_SEMANTIC, SEMANTIC_TAG_FINITE_SET)
                  for chunk in streamencodearray(sorted(s, key=_mixedtypesortkey)):
                      yield chunk
              def streamencodemap(d):
                  """Encode dictionary to a generator.
                  Does not support indefinite length dictionaries.
                  """
                  yield encodelength(MAJOR_TYPE_MAP, len(d))
                  for key, value in sorted(d.iteritems(),
                                           key=lambda x: _mixedtypesortkey(x[0])):
                      for chunk in streamencode(key):
                          yield chunk
                      for chunk in streamencode(value):
                          yield chunk
              def streamencodemapfromiter(it):
                  """Given an iterable of (key, value), encode to an indefinite length map."""
                  yield BEGIN_INDEFINITE_MAP
                  for key, value in it:
                      for chunk in streamencode(key):
                          yield chunk
                      for chunk in streamencode(value):
                          yield chunk
                  yield BREAK
              def streamencodebool(b):
                  # major type 7, simple value 20 and 21.
                  yield b'\xf5' if b else b'\xf4'
              def streamencodenone(v):
                  # major type 7, simple value 22.
                  yield b'\xf6'
              STREAM_ENCODERS = {
                  bytes: streamencodebytestring,
                  int: streamencodeint,
                  list: streamencodearray,
                  tuple: streamencodearray,
                  dict: streamencodemap,
                  set: streamencodeset,
                  bool: streamencodebool,
                  type(None): streamencodenone,
              }
              def streamencode(v):
                  """Encode a value in a streaming manner.
                  Given an input object, encode it to CBOR recursively.
                  Returns a generator of CBOR encoded bytes. There is no guarantee
                  that each emitted chunk fully decodes to a value or sub-value.
                  Encoding is deterministic - unordered collections are sorted.
                  """
                  fn = STREAM_ENCODERS.get(v.__class__)
                  if not fn:
                      raise ValueError('do not know how to encode %s' % type(v))
                  return fn(v)
              def readindefinitebytestringtoiter(fh, expectheader=True):
                  """Read an indefinite bytestring to a generator.
                  Receives an object with a ``read(N)`` method to read N bytes.
                  If ``expectheader`` is True, it is expected that the first byte read
                  will represent an indefinite length bytestring. Otherwise, we
                  expect the first byte to be part of the first bytestring chunk.
                  """
                  read = fh.read
                  decodeuint = decodermod.decode_uint
                  byteasinteger = decodermod.byte_as_integer
                  if expectheader:
                      initial = decodermod.byte_as_integer(read(1))
                      majortype = initial >> 5
                      subtype = initial & SUBTYPE_MASK
                      if majortype != MAJOR_TYPE_BYTESTRING:
                          raise decodermod.CBORDecodeError(
                              'expected major type %d; got %d' % (MAJOR_TYPE_BYTESTRING,
                                                                  majortype))
                      if subtype != SUBTYPE_INDEFINITE:
                          raise decodermod.CBORDecodeError(
                              'expected indefinite subtype; got %d' % subtype)
                  # The indefinite bytestring is composed of chunks of normal bytestrings.
                  # Read chunks until we hit a BREAK byte.
                  while True:
                      # We need to sniff for the BREAK byte.
                      initial = byteasinteger(read(1))
                      if initial == BREAK_INT:
                          break
                      length = decodeuint(fh, initial & SUBTYPE_MASK)
                      chunk = read(length)
                      if len(chunk) != length:
                          raise decodermod.CBORDecodeError(
                              'failed to read bytestring chunk: got %d bytes; expected %d' % (
                                  len(chunk), length))
                      yield chunk
+             class CBORDecodeError(Exception):
+                 """Represents an error decoding CBOR."""
+             if sys.version_info.major >= 3:
+                 def _elementtointeger(b, i):
+                     return b[i]
+             else:
+                 def _elementtointeger(b, i):
+                     return ord(b[i])
+             STRUCT_BIG_UBYTE = struct.Struct(r'>B')
+             STRUCT_BIG_USHORT = struct.Struct('>H')
+             STRUCT_BIG_ULONG = struct.Struct('>L')
+             STRUCT_BIG_ULONGLONG = struct.Struct('>Q')
+             SPECIAL_NONE = 0
+             SPECIAL_START_INDEFINITE_BYTESTRING = 1
+             SPECIAL_START_ARRAY = 2
+             SPECIAL_START_MAP = 3
+             SPECIAL_START_SET = 4
+             SPECIAL_INDEFINITE_BREAK = 5
+             def decodeitem(b, offset=0):
+                 """Decode a new CBOR value from a buffer at offset.
+                 This function attempts to decode up to one complete CBOR value
+                 from ``b`` starting at offset ``offset``.
+                 The beginning of a collection (such as an array, map, set, or
+                 indefinite length bytestring) counts as a single value. For these
+                 special cases, a state flag will indicate that a special value was seen.
+                 When called, the function either returns a decoded value or gives
+                 a hint as to how many more bytes are needed to do so. By calling
+                 the function repeatedly given a stream of bytes, the caller can
+                 build up the original values.
+                 Returns a tuple with the following elements:
+                 * Bool indicating whether a complete value was decoded.
+                 * A decoded value if first value is True otherwise None
+                 * Integer number of bytes. If positive, the number of bytes
+                   read. If negative, the number of bytes we need to read to
+                   decode this value or the next chunk in this value.
+                 * One of the ``SPECIAL_*`` constants indicating special treatment
+                   for this value. ``SPECIAL_NONE`` means this is a fully decoded
+                   simple value (such as an integer or bool).
+                 """
+                 initial = _elementtointeger(b, offset)
+                 offset += 1
+                 majortype = initial >> 5
+                 subtype = initial & SUBTYPE_MASK
+                 if majortype == MAJOR_TYPE_UINT:
+                     complete, value, readcount = decodeuint(subtype, b, offset)
+                     if complete:
+                         return True, value, readcount + 1, SPECIAL_NONE
+                     else:
+                         return False, None, readcount, SPECIAL_NONE
+                 elif majortype == MAJOR_TYPE_NEGINT:
+                     # Negative integers are the same as UINT except inverted minus 1.
+                     complete, value, readcount = decodeuint(subtype, b, offset)
+                     if complete:
+                         return True, -value - 1, readcount + 1, SPECIAL_NONE
+                     else:
+                         return False, None, readcount, SPECIAL_NONE
+                 elif majortype == MAJOR_TYPE_BYTESTRING:
+                     # Beginning of bytestrings are treated as uints in order to
+                     # decode their length, which may be indefinite.
+                     complete, size, readcount = decodeuint(subtype, b, offset,
+                                                            allowindefinite=True)
+                     # We don't know the size of the bytestring. It must be a definite
+                     # length since the indefinite subtype would be encoded in the initial
+                     # byte.
+                     if not complete:
+                         return False, None, readcount, SPECIAL_NONE
+                     # We know the length of the bytestring.
+                     if size is not None:
+                         # And the data is available in the buffer.
+                         if offset + readcount + size <= len(b):
+                             value = b[offset + readcount:offset + readcount + size]
+                             return True, value, readcount + size + 1, SPECIAL_NONE
+                         # And we need more data in order to return the bytestring.
+                         else:
+                             wanted = len(b) - offset - readcount - size
+                             return False, None, wanted, SPECIAL_NONE
+                     # It is an indefinite length bytestring.
+                     else:
+                         return True, None, 1, SPECIAL_START_INDEFINITE_BYTESTRING
+                 elif majortype == MAJOR_TYPE_STRING:
+                     raise CBORDecodeError('string major type not supported')
+                 elif majortype == MAJOR_TYPE_ARRAY:
+                     # Beginning of arrays are treated as uints in order to decode their
+                     # length. We don't allow indefinite length arrays.
+                     complete, size, readcount = decodeuint(subtype, b, offset)
+                     if complete:
+                         return True, size, readcount + 1, SPECIAL_START_ARRAY
+                     else:
+                         return False, None, readcount, SPECIAL_NONE
+                 elif majortype == MAJOR_TYPE_MAP:
+                     # Beginning of maps are treated as uints in order to decode their
+                     # number of elements. We don't allow indefinite length maps.
+                     complete, size, readcount = decodeuint(subtype, b, offset)
+                     if complete:
+                         return True, size, readcount + 1, SPECIAL_START_MAP
+                     else:
+                         return False, None, readcount, SPECIAL_NONE
+                 elif majortype == MAJOR_TYPE_SEMANTIC:
+                     # Semantic tag value is read the same as a uint.
+                     complete, tagvalue, readcount = decodeuint(subtype, b, offset)
+                     if not complete:
+                         return False, None, readcount, SPECIAL_NONE
+                     # This behavior here is a little wonky. The main type being "decorated"
+                     # by this semantic tag follows. A more robust parser would probably emit
+                     # a special flag indicating this as a semantic tag and let the caller
+                     # deal with the types that follow. But since we don't support many
+                     # semantic tags, it is easier to deal with the special cases here and
+                     # hide complexity from the caller. If we add support for more semantic
+                     # tags, we should probably move semantic tag handling into the caller.
+                     if tagvalue == SEMANTIC_TAG_FINITE_SET:
+                         if offset + readcount >= len(b):
+                             return False, None, -1, SPECIAL_NONE
+                         complete, size, readcount2, special = decodeitem(b,
+                                                                          offset + readcount)
+                         if not complete:
+                             return False, None, readcount2, SPECIAL_NONE
+                         if special != SPECIAL_START_ARRAY:
+                             raise CBORDecodeError('expected array after finite set '
+                                                   'semantic tag')
+                         return True, size, readcount + readcount2 + 1, SPECIAL_START_SET
+                     else:
+                         raise CBORDecodeError('semantic tag %d not allowed' % tagvalue)
+                 elif majortype == MAJOR_TYPE_SPECIAL:
+                     # Only specific values for the information field are allowed.
+                     if subtype == SUBTYPE_FALSE:
+                         return True, False, 1, SPECIAL_NONE
+                     elif subtype == SUBTYPE_TRUE:
+                         return True, True, 1, SPECIAL_NONE
+                     elif subtype == SUBTYPE_NULL:
+                         return True, None, 1, SPECIAL_NONE
+                     elif subtype == SUBTYPE_INDEFINITE:
+                         return True, None, 1, SPECIAL_INDEFINITE_BREAK
+                     # If value is 24, subtype is in next byte.
+                     else:
+                         raise CBORDecodeError('special type %d not allowed' % subtype)
+                 else:
+                     assert False
+             def decodeuint(subtype, b, offset=0, allowindefinite=False):
+                 """Decode an unsigned integer.
+                 ``subtype`` is the lower 5 bits from the initial byte CBOR item
+                 "header." ``b`` is a buffer containing bytes. ``offset`` points to
+                 the index of the first byte after the byte that ``subtype`` was
+                 derived from.
+                 ``allowindefinite`` allows the special indefinite length value
+                 indicator.
+                 Returns a 3-tuple of (successful, value, count).
+                 The first element is a bool indicating if decoding completed. The 2nd
+                 is the decoded integer value or None if not fully decoded or the subtype
+                 is 31 and ``allowindefinite`` is True. The 3rd value is the count of bytes.
+                 If positive, it is the number of additional bytes decoded. If negative,
+                 it is the number of additional bytes needed to decode this value.
+                 """
+                 # Small values are inline.
+                 if subtype < 24:
+                     return True, subtype, 0
+                 # Indefinite length specifier.
+                 elif subtype == 31:
+                     if allowindefinite:
+                         return True, None, 0
+                     else:
+                         raise CBORDecodeError('indefinite length uint not allowed here')
+                 elif subtype >= 28:
+                     raise CBORDecodeError('unsupported subtype on integer type: %d' %
+                                           subtype)
+                 if subtype == 24:
+                     s = STRUCT_BIG_UBYTE
+                 elif subtype == 25:
+                     s = STRUCT_BIG_USHORT
+                 elif subtype == 26:
+                     s = STRUCT_BIG_ULONG
+                 elif subtype == 27:
+                     s = STRUCT_BIG_ULONGLONG
+                 else:
+                     raise CBORDecodeError('bounds condition checking violation')
+                 if len(b) - offset >= s.size:
+                     return True, s.unpack_from(b, offset)[0], s.size
+                 else:
+                     return False, None, len(b) - offset - s.size
+             class bytestringchunk(bytes):
+                 """Represents a chunk/segment in an indefinite length bytestring.
+                 This behaves like a ``bytes`` but in addition has the ``isfirst``
+                 and ``islast`` attributes indicating whether this chunk is the first
+                 or last in an indefinite length bytestring.
+                 """
+                 def __new__(cls, v, first=False, last=False):
+                     self = bytes.__new__(cls, v)
+                     self.isfirst = first
+                     self.islast = last
+                     return self
+             class sansiodecoder(object):
+                 """A CBOR decoder that doesn't perform its own I/O.
+                 To use, construct an instance and feed it segments containing
+                 CBOR-encoded bytes via ``decode()``. The return value from ``decode()``
+                 indicates whether a fully-decoded value is available, how many bytes
+                 were consumed, and offers a hint as to how many bytes should be fed
+                 in next time to decode the next value.
+                 The decoder assumes it will decode N discrete CBOR values, not just
+                 a single value. i.e. if the bytestream contains uints packed one after
+                 the other, the decoder will decode them all, rather than just the initial
+                 one.
+                 When ``decode()`` indicates a value is available, call ``getavailable()``
+                 to return all fully decoded values.
+                 ``decode()`` can partially decode input. It is up to the caller to keep
+                 track of what data was consumed and to pass unconsumed data in on the
+                 next invocation.
+                 The decoder decodes atomically at the *item* level. See ``decodeitem()``.
+                 If an *item* cannot be fully decoded, the decoder won't record it as
+                 partially consumed. Instead, the caller will be instructed to pass in
+                 the initial bytes of this item on the next invocation. This does result
+                 in some redundant parsing. But the overhead should be minimal.
+                 This decoder only supports a subset of CBOR as required by Mercurial.
+                 It lacks support for:
+                 * Indefinite length arrays
+                 * Indefinite length maps
+                 * Use of indefinite length bytestrings as keys or values within
+                   arrays, maps, or sets.
+                 * Nested arrays, maps, or sets within sets
+                 * Any semantic tag that isn't a mathematical finite set
+                 * Floating point numbers
+                 * Undefined special value
+                 CBOR types are decoded to Python types as follows:
+                 uint -> int
+                 negint -> int
+                 bytestring -> bytes
+                 map -> dict
+                 array -> list
+                 True -> bool
+                 False -> bool
+                 null -> None
+                 indefinite length bytestring chunk -> [bytestringchunk]
+                 The only non-obvious mapping here is an indefinite length bytestring
+                 to the ``bytestringchunk`` type. This is to facilitate streaming
+                 indefinite length bytestrings out of the decoder and to differentiate
+                 a regular bytestring from an indefinite length bytestring.
+                 """
+                 _STATE_NONE = 0
+                 _STATE_WANT_MAP_KEY = 1
+                 _STATE_WANT_MAP_VALUE = 2
+                 _STATE_WANT_ARRAY_VALUE = 3
+                 _STATE_WANT_SET_VALUE = 4
+                 _STATE_WANT_BYTESTRING_CHUNK_FIRST = 5
+                 _STATE_WANT_BYTESTRING_CHUNK_SUBSEQUENT = 6
+                 def __init__(self):
+                     # TODO add support for limiting size of bytestrings
+                     # TODO add support for limiting number of keys / values in collections
+                     # TODO add support for limiting size of buffered partial values
+                     self.decodedbytecount = 0
+                     self._state = self._STATE_NONE
+                     # Stack of active nested collections. Each entry is a dict describing
+                     # the collection.
+                     self._collectionstack = []
+                     # Fully decoded key to use for the current map.
+                     self._currentmapkey = None
+                     # Fully decoded values available for retrieval.
+                     self._decodedvalues = []
+                 @property
+                 def inprogress(self):
+                     """Whether the decoder has partially decoded a value."""
+                     return self._state != self._STATE_NONE
+                 def decode(self, b, offset=0):
+                     """Attempt to decode bytes from an input buffer.
+                     ``b`` is a collection of bytes and ``offset`` is the byte
+                     offset within that buffer from which to begin reading data.
+                     ``b`` must support ``len()`` and accessing bytes slices via
+                     ``__getitem__``. Typically ``bytes`` instances are used.
+                     Returns a tuple with the following fields:
+                     * Bool indicating whether values are available for retrieval.
+                     * Integer indicating the number of bytes that were fully consumed,
+                       starting from ``offset``.
+                     * Integer indicating the number of bytes that are desired for the
+                       next call in order to decode an item.
+                     """
+                     if not b:
+                         return bool(self._decodedvalues), 0, 0
+                     initialoffset = offset
+                     # We could easily split the body of this loop into a function. But
+                     # Python performance is sensitive to function calls and collections
+                     # are composed of many items. So leaving as a while loop could help
+                     # with performance. One thing that may not help is the use of
+                     # if..elif versus a lookup/dispatch table. There may be value
+                     # in switching that.
+                     while offset < len(b):
+                         # Attempt to decode an item. This could be a whole value or a
+                         # special value indicating an event, such as start or end of a
+                         # collection or indefinite length type.
+                         complete, value, readcount, special = decodeitem(b, offset)
+                         if readcount > 0:
+                             self.decodedbytecount += readcount
+                         if not complete:
+                             assert readcount < 0
+                             return (
+                                 bool(self._decodedvalues),
+                                 offset - initialoffset,
+                                 -readcount,
+                             )
+                         offset += readcount
+                         # No nested state. We either have a full value or beginning of a
+                         # complex value to deal with.
+                         if self._state == self._STATE_NONE:
+                             # A normal value.
+                             if special == SPECIAL_NONE:
+                                 self._decodedvalues.append(value)
+                             elif special == SPECIAL_START_ARRAY:
+                                 self._collectionstack.append({
+                                     'remaining': value,
+                                     'v': [],
+                                 })
+                                 self._state = self._STATE_WANT_ARRAY_VALUE
+                             elif special == SPECIAL_START_MAP:
+                                 self._collectionstack.append({
+                                     'remaining': value,
+                                     'v': {},
+                                 })
+                                 self._state = self._STATE_WANT_MAP_KEY
+                             elif special == SPECIAL_START_SET:
+                                 self._collectionstack.append({
+                                     'remaining': value,
+                                     'v': set(),
+                                 })
+                                 self._state = self._STATE_WANT_SET_VALUE
+                             elif special == SPECIAL_START_INDEFINITE_BYTESTRING:
+                                 self._state = self._STATE_WANT_BYTESTRING_CHUNK_FIRST
+                             else:
+                                 raise CBORDecodeError('unhandled special state: %d' %
+                                                       special)
+                         # This value becomes an element of the current array.
+                         elif self._state == self._STATE_WANT_ARRAY_VALUE:
+                             # Simple values get appended.
+                             if special == SPECIAL_NONE:
+                                 c = self._collectionstack[-1]
+                                 c['v'].append(value)
+                                 c['remaining'] -= 1
+                                 # self._state doesn't need to change.
+                             # An array nested within an array.
+                             elif special == SPECIAL_START_ARRAY:
+                                 lastc = self._collectionstack[-1]
+                                 newvalue = []
+                                 lastc['v'].append(newvalue)
+                                 lastc['remaining'] -= 1
+                                 self._collectionstack.append({
+                                     'remaining': value,
+                                     'v': newvalue,
+                                 })
+                                 # self._state doesn't need to change.
+                             # A map nested within an array.
+                             elif special == SPECIAL_START_MAP:
+                                 lastc = self._collectionstack[-1]
+                                 newvalue = {}
+                                 lastc['v'].append(newvalue)
+                                 lastc['remaining'] -= 1
+                                 self._collectionstack.append({
+                                     'remaining': value,
+                                     'v': newvalue
+                                 })
+                                 self._state = self._STATE_WANT_MAP_KEY
+                             elif special == SPECIAL_START_SET:
+                                 lastc = self._collectionstack[-1]
+                                 newvalue = set()
+                                 lastc['v'].append(newvalue)
+                                 lastc['remaining'] -= 1
+                                 self._collectionstack.append({
+                                     'remaining': value,
+                                     'v': newvalue,
+                                 })
+                                 self._state = self._STATE_WANT_SET_VALUE
+                             elif special == SPECIAL_START_INDEFINITE_BYTESTRING:
+                                 raise CBORDecodeError('indefinite length bytestrings '
+                                                       'not allowed as array values')
+                             else:
+                                 raise CBORDecodeError('unhandled special item when '
+                                                       'expecting array value: %d' % special)
+                         # This value becomes the key of the current map instance.
+                         elif self._state == self._STATE_WANT_MAP_KEY:
+                             if special == SPECIAL_NONE:
+                                 self._currentmapkey = value
+                                 self._state = self._STATE_WANT_MAP_VALUE
+                             elif special == SPECIAL_START_INDEFINITE_BYTESTRING:
+                                 raise CBORDecodeError('indefinite length bytestrings '
+                                                       'not allowed as map keys')
+                             elif special in (SPECIAL_START_ARRAY, SPECIAL_START_MAP,
+                                              SPECIAL_START_SET):
+                                 raise CBORDecodeError('collections not supported as map '
+                                                       'keys')
+                             # We do not allow special values to be used as map keys.
+                             else:
+                                 raise CBORDecodeError('unhandled special item when '
+                                                       'expecting map key: %d' % special)
+                         # This value becomes the value of the current map key.
+                         elif self._state == self._STATE_WANT_MAP_VALUE:
+                             # Simple values simply get inserted into the map.
+                             if special == SPECIAL_NONE:
+                                 lastc = self._collectionstack[-1]
+                                 lastc['v'][self._currentmapkey] = value
+                                 lastc['remaining'] -= 1
+                                 self._state = self._STATE_WANT_MAP_KEY
+                             # A new array is used as the map value.
+                             elif special == SPECIAL_START_ARRAY:
+                                 lastc = self._collectionstack[-1]
+                                 newvalue = []
+                                 lastc['v'][self._currentmapkey] = newvalue
+                                 lastc['remaining'] -= 1
+                                 self._collectionstack.append({
+                                     'remaining': value,
+                                     'v': newvalue,
+                                 })
+                                 self._state = self._STATE_WANT_ARRAY_VALUE
+                             # A new map is used as the map value.
+                             elif special == SPECIAL_START_MAP:
+                                 lastc = self._collectionstack[-1]
+                                 newvalue = {}
+                                 lastc['v'][self._currentmapkey] = newvalue
+                                 lastc['remaining'] -= 1
+                                 self._collectionstack.append({
+                                     'remaining': value,
+                                     'v': newvalue,
+                                 })
+                                 self._state = self._STATE_WANT_MAP_KEY
+                             # A new set is used as the map value.
+                             elif special == SPECIAL_START_SET:
+                                 lastc = self._collectionstack[-1]
+                                 newvalue = set()
+                                 lastc['v'][self._currentmapkey] = newvalue
+                                 lastc['remaining'] -= 1
+                                 self._collectionstack.append({
+                                     'remaining': value,
+                                     'v': newvalue,
+                                 })
+                                 self._state = self._STATE_WANT_SET_VALUE
+                             elif special == SPECIAL_START_INDEFINITE_BYTESTRING:
+                                 raise CBORDecodeError('indefinite length bytestrings not '
+                                                       'allowed as map values')
+                             else:
+                                 raise CBORDecodeError('unhandled special item when '
+                                                       'expecting map value: %d' % special)
+                             self._currentmapkey = None
+                         # This value is added to the current set.
+                         elif self._state == self._STATE_WANT_SET_VALUE:
+                             if special == SPECIAL_NONE:
+                                 lastc = self._collectionstack[-1]
+                                 lastc['v'].add(value)
+                                 lastc['remaining'] -= 1
+                             elif special == SPECIAL_START_INDEFINITE_BYTESTRING:
+                                 raise CBORDecodeError('indefinite length bytestrings not '
+                                                       'allowed as set values')
+                             elif special in (SPECIAL_START_ARRAY,
+                                              SPECIAL_START_MAP,
+                                              SPECIAL_START_SET):
+                                 raise CBORDecodeError('collections not allowed as set '
+                                                       'values')
+                             # We don't allow non-trivial types to exist as set values.
+                             else:
+                                 raise CBORDecodeError('unhandled special item when '
+                                                       'expecting set value: %d' % special)
+                         # This value represents the first chunk in an indefinite length
+                         # bytestring.
+                         elif self._state == self._STATE_WANT_BYTESTRING_CHUNK_FIRST:
+                             # We received a full chunk.
+                             if special == SPECIAL_NONE:
+                                 self._decodedvalues.append(bytestringchunk(value,
+                                                                            first=True))
+                                 self._state = self._STATE_WANT_BYTESTRING_CHUNK_SUBSEQUENT
+                             # The end of stream marker. This means it is an empty
+                             # indefinite length bytestring.
+                             elif special == SPECIAL_INDEFINITE_BREAK:
+                                 # We /could/ convert this to a b''. But we want to preserve
+                                 # the nature of the underlying data so consumers expecting
+                                 # an indefinite length bytestring get one.
+                                 self._decodedvalues.append(bytestringchunk(b'',
+                                                                            first=True,
+                                                                            last=True))
+                                 # Since indefinite length bytestrings can't be used in
+                                 # collections, we must be at the root level.
+                                 assert not self._collectionstack
+                                 self._state = self._STATE_NONE
+                             else:
+                                 raise CBORDecodeError('unexpected special value when '
+                                                       'expecting bytestring chunk: %d' %
+                                                       special)
+                         # This value represents the non-initial chunk in an indefinite
+                         # length bytestring.
+                         elif self._state == self._STATE_WANT_BYTESTRING_CHUNK_SUBSEQUENT:
+                             # We received a full chunk.
+                             if special == SPECIAL_NONE:
+                                 self._decodedvalues.append(bytestringchunk(value))
+                             # The end of stream marker.
+                             elif special == SPECIAL_INDEFINITE_BREAK:
+                                 self._decodedvalues.append(bytestringchunk(b'', last=True))
+                                 # Since indefinite length bytestrings can't be used in
+                                 # collections, we must be at the root level.
+                                 assert not self._collectionstack
+                                 self._state = self._STATE_NONE
+                             else:
+                                 raise CBORDecodeError('unexpected special value when '
+                                                       'expecting bytestring chunk: %d' %
+                                                       special)
+                         else:
+                             raise CBORDecodeError('unhandled decoder state: %d' %
+                                                   self._state)
+                         # We could have just added the final value in a collection. End
+                         # all complete collections at the top of the stack.
+                         while True:
+                             # Bail if we're not waiting on a new collection item.
+                             if self._state not in (self._STATE_WANT_ARRAY_VALUE,
+                                                    self._STATE_WANT_MAP_KEY,
+                                                    self._STATE_WANT_SET_VALUE):
+                                 break
+                             # Or we are expecting more items for this collection.
+                             lastc = self._collectionstack[-1]
+                             if lastc['remaining']:
+                                 break
+                             # The collection at the top of the stack is complete.
+                             # Discard it, as it isn't needed for future items.
+                             self._collectionstack.pop()
+                             # If this is a nested collection, we don't emit it, since it
+                             # will be emitted by its parent collection. But we do need to
+                             # update state to reflect what the new top-most collection
+                             # on the stack is.
+                             if self._collectionstack:
+                                 self._state = {
+                                     list: self._STATE_WANT_ARRAY_VALUE,
+                                     dict: self._STATE_WANT_MAP_KEY,
+                                     set: self._STATE_WANT_SET_VALUE,
+                                 }[type(self._collectionstack[-1]['v'])]
+                             # If this is the root collection, emit it.
+                             else:
+                                 self._decodedvalues.append(lastc['v'])
+                                 self._state = self._STATE_NONE
+                     return (
+                         bool(self._decodedvalues),
+                         offset - initialoffset,
+                         0,
+                     )
+                 def getavailable(self):
+                     """Returns a list of fully decoded values.
+                     Once values are retrieved, they won't be available on the next call.
+                     """
+                     l = list(self._decodedvalues)
+                     self._decodedvalues = []
+                     return l
+             def decodeall(b):
+                 """Decode all CBOR items present in an iterable of bytes.
+                 In addition to regular decode errors, raises CBORDecodeError if the
+                 entirety of the passed buffer does not fully decode to complete CBOR
+                 values. This includes failure to decode any value, incomplete collection
+                 types, incomplete indefinite length items, and extra data at the end of
+                 the buffer.
+                 """
+                 if not b:
+                     return []
+                 decoder = sansiodecoder()
+                 havevalues, readcount, wantbytes = decoder.decode(b)
+                 if readcount != len(b):
+                     raise CBORDecodeError('input data not fully consumed')
+                 if decoder.inprogress:
+                     raise CBORDecodeError('input data not complete')
+                 return decoder.getavailable()
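
The contract above (a positive read count on success, a negative count of wanted bytes when the buffer ends mid-item) can be illustrated with a minimal sketch. Everything in it is hypothetical: `toy_decodeitem` and `feed` are illustrative names, not part of cborutil, and the toy only understands unsigned integers below 24 and definite-length bytestrings shorter than 24 bytes. It shows how a caller might buffer chunks and retry without the decoder performing any I/O.

```python
# Hypothetical, simplified stand-in for the decoder contract in this diff.
# It is NOT the cborutil implementation: it only handles small unsigned
# integers and short definite-length bytestrings.


def toy_decodeitem(b, offset=0):
    """Return (complete, value, readcount).

    readcount is the number of bytes consumed when complete is True, and
    the negated number of additional bytes wanted when it is False.
    """
    initial = b[offset]
    major, info = initial >> 5, initial & 0x1f
    if info >= 24:
        raise ValueError('toy decoder only handles lengths/values < 24')
    if major == 0:
        # Unsigned integer stored directly in the initial byte.
        return True, info, 1
    if major == 2:
        # Bytestring whose length is stored in the initial byte.
        have = len(b) - offset - 1
        if have < info:
            return False, None, -(info - have)
        return True, bytes(b[offset + 1:offset + 1 + info]), 1 + info
    raise ValueError('toy decoder does not support major type %d' % major)


def feed(chunks):
    """Decode items from an iterable of byte chunks of arbitrary size."""
    pending = b''
    values = []
    for chunk in chunks:
        pending += chunk
        offset = 0
        while offset < len(pending):
            complete, value, readcount = toy_decodeitem(pending, offset)
            if not complete:
                # Keep the unconsumed tail until more data arrives.
                break
            values.append(value)
            offset += readcount
        pending = pending[offset:]
    if pending:
        raise ValueError('input ended mid-item')
    return values
```

Calling `feed([b'\x46foo', b'bar\x01', b'\x02'])` exercises the partial-read path: the first chunk ends in the middle of a 6-byte bytestring, so the tail is held back until the second chunk completes it.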

tests/test-cbor.py

0 +777 -16

			@@ -1,210 +1,971 b''
	1	1	from __future__ import absolute_import
	2	2
	3	3	import io
	4	4	import unittest
	5	5
	6	6	from mercurial.thirdparty import (
	7	7	cbor,
	8	8	)
	9	9	from mercurial.utils import (
	10	10	cborutil,
	11	11	)
	12	12
		13	class TestCase(unittest.TestCase):
		14	if not getattr(unittest.TestCase, 'assertRaisesRegex', False):
		15	# Python 3.7 deprecates the regexp version, but 2.7 lacks
		16	# the regex version.
		17	assertRaisesRegex = (# camelcase-required
		18	unittest.TestCase.assertRaisesRegexp)
		19
	13	20	def loadit(it):
	14	21	return cbor.loads(b''.join(it))
	15	22
	16		class BytestringTests(unittest.TestCase):
		23	class BytestringTests(TestCase):
	17	24	def testsimple(self):
	18	25	self.assertEqual(
	19	26	list(cborutil.streamencode(b'foobar')),
	20	27	[b'\x46', b'foobar'])
	21	28
	22	29	self.assertEqual(
	23	30	loadit(cborutil.streamencode(b'foobar')),
	24	31	b'foobar')
	25	32
		33	self.assertEqual(cborutil.decodeall(b'\x46foobar'),
		34	[b'foobar'])
		35
		36	self.assertEqual(cborutil.decodeall(b'\x46foobar\x45fizbi'),
		37	[b'foobar', b'fizbi'])
		38
	26	39	def testlong(self):
	27	40	source = b'x' * 1048576
	28	41
	29	42	self.assertEqual(loadit(cborutil.streamencode(source)), source)
	30	43
		44	encoded = b''.join(cborutil.streamencode(source))
		45	self.assertEqual(cborutil.decodeall(encoded), [source])
		46
	31	47	def testfromiter(self):
	32	48	# This is the example from RFC 7049 Section 2.2.2.
	33	49	source = [b'\xaa\xbb\xcc\xdd', b'\xee\xff\x99']
	34	50
	35	51	self.assertEqual(
	36	52	list(cborutil.streamencodebytestringfromiter(source)),
	37	53	[
	38	54	b'\x5f',
	39	55	b'\x44',
	40	56	b'\xaa\xbb\xcc\xdd',
	41	57	b'\x43',
	42	58	b'\xee\xff\x99',
	43	59	b'\xff',
	44	60	])
	45	61
	46	62	self.assertEqual(
	47	63	loadit(cborutil.streamencodebytestringfromiter(source)),
	48	64	b''.join(source))
	49	65
		66	self.assertEqual(cborutil.decodeall(b'\x5f\x44\xaa\xbb\xcc\xdd'
		67	b'\x43\xee\xff\x99\xff'),
		68	[b'\xaa\xbb\xcc\xdd', b'\xee\xff\x99', b''])
		69
		70	for i, chunk in enumerate(
		71	cborutil.decodeall(b'\x5f\x44\xaa\xbb\xcc\xdd'
		72	b'\x43\xee\xff\x99\xff')):
		73	self.assertIsInstance(chunk, cborutil.bytestringchunk)
		74
		75	if i == 0:
		76	self.assertTrue(chunk.isfirst)
		77	else:
		78	self.assertFalse(chunk.isfirst)
		79
		80	if i == 2:
		81	self.assertTrue(chunk.islast)
		82	else:
		83	self.assertFalse(chunk.islast)
		84
	50	85	def testfromiterlarge(self):
	51	86	source = [b'a' * 16, b'b' * 128, b'c' * 1024, b'd' * 1048576]
	52	87
	53	88	self.assertEqual(
	54	89	loadit(cborutil.streamencodebytestringfromiter(source)),
	55	90	b''.join(source))
	56	91
	57	92	def testindefinite(self):
	58	93	source = b'\x00\x01\x02\x03' + b'\xff' * 16384
	59	94
	60	95	it = cborutil.streamencodeindefinitebytestring(source, chunksize=2)
	61	96
	62	97	self.assertEqual(next(it), b'\x5f')
	63	98	self.assertEqual(next(it), b'\x42')
	64	99	self.assertEqual(next(it), b'\x00\x01')
	65	100	self.assertEqual(next(it), b'\x42')
	66	101	self.assertEqual(next(it), b'\x02\x03')
	67	102	self.assertEqual(next(it), b'\x42')
	68	103	self.assertEqual(next(it), b'\xff\xff')
	69	104
	70	105	dest = b''.join(cborutil.streamencodeindefinitebytestring(
	71	106	source, chunksize=42))
	72	107	self.assertEqual(cbor.loads(dest), source)
	73	108
		109	self.assertEqual(b''.join(cborutil.decodeall(dest)), source)
		110
		111	for chunk in cborutil.decodeall(dest):
		112	self.assertIsInstance(chunk, cborutil.bytestringchunk)
		113	self.assertIn(len(chunk), (0, 8, 42))
		114
		115	encoded = b'\x5f\xff'
		116	b = cborutil.decodeall(encoded)
		117	self.assertEqual(b, [b''])
		118	self.assertTrue(b[0].isfirst)
		119	self.assertTrue(b[0].islast)
		120
	74	121	def testreadtoiter(self):
	75	122	source = io.BytesIO(b'\x5f\x44\xaa\xbb\xcc\xdd\x43\xee\xff\x99\xff')
	76	123
	77	124	it = cborutil.readindefinitebytestringtoiter(source)
	78	125	self.assertEqual(next(it), b'\xaa\xbb\xcc\xdd')
	79	126	self.assertEqual(next(it), b'\xee\xff\x99')
	80	127
	81	128	with self.assertRaises(StopIteration):
	82	129	next(it)
	83	130
	84		class IntTests(unittest.TestCase):
		131	def testdecodevariouslengths(self):
		132	for i in (0, 1, 22, 23, 24, 25, 254, 255, 256, 65534, 65535, 65536):
		133	source = b'x' * i
		134	encoded = b''.join(cborutil.streamencode(source))
		135
		136	if len(source) < 24:
		137	hlen = 1
		138	elif len(source) < 256:
		139	hlen = 2
		140	elif len(source) < 65536:
		141	hlen = 3
		142	elif len(source) < 1048576:
		143	hlen = 5
		144
		145	self.assertEqual(cborutil.decodeitem(encoded),
		146	(True, source, hlen + len(source),
		147	cborutil.SPECIAL_NONE))
		148
		149	def testpartialdecode(self):
		150	encoded = b''.join(cborutil.streamencode(b'foobar'))
		151
		152	self.assertEqual(cborutil.decodeitem(encoded[0:1]),
		153	(False, None, -6, cborutil.SPECIAL_NONE))
		154	self.assertEqual(cborutil.decodeitem(encoded[0:2]),
		155	(False, None, -5, cborutil.SPECIAL_NONE))
		156	self.assertEqual(cborutil.decodeitem(encoded[0:3]),
		157	(False, None, -4, cborutil.SPECIAL_NONE))
		158	self.assertEqual(cborutil.decodeitem(encoded[0:4]),
		159	(False, None, -3, cborutil.SPECIAL_NONE))
		160	self.assertEqual(cborutil.decodeitem(encoded[0:5]),
		161	(False, None, -2, cborutil.SPECIAL_NONE))
		162	self.assertEqual(cborutil.decodeitem(encoded[0:6]),
		163	(False, None, -1, cborutil.SPECIAL_NONE))
		164	self.assertEqual(cborutil.decodeitem(encoded[0:7]),
		165	(True, b'foobar', 7, cborutil.SPECIAL_NONE))
		166
		167	def testpartialdecodevariouslengths(self):
		168	lens = [
		169	2,
		170	3,
		171	10,
		172	23,
		173	24,
		174	25,
		175	31,
		176	100,
		177	254,
		178	255,
		179	256,
		180	257,
		181	16384,
		182	65534,
		183	65535,
		184	65536,
		185	65537,
		186	131071,
		187	131072,
		188	131073,
		189	1048575,
		190	1048576,
		191	1048577,
		192	]
		193
		194	for size in lens:
		195	if size < 24:
		196	hlen = 1
		197	elif size < 2**8:
		198	hlen = 2
		199	elif size < 2**16:
		200	hlen = 3
		201	elif size < 2**32:
		202	hlen = 5
		203	else:
		204	assert False
		205
		206	source = b'x' * size
		207	encoded = b''.join(cborutil.streamencode(source))
		208
		209	res = cborutil.decodeitem(encoded[0:1])
		210
		211	if hlen > 1:
		212	self.assertEqual(res, (False, None, -(hlen - 1),
		213	cborutil.SPECIAL_NONE))
		214	else:
		215	self.assertEqual(res, (False, None, -(size + hlen - 1),
		216	cborutil.SPECIAL_NONE))
		217
		218	# Decoding partial header reports remaining header size.
		219	for i in range(hlen - 1):
		220	self.assertEqual(cborutil.decodeitem(encoded[0:i + 1]),
		221	(False, None, -(hlen - i - 1),
		222	cborutil.SPECIAL_NONE))
		223
		224	# Decoding complete header reports item size.
		225	self.assertEqual(cborutil.decodeitem(encoded[0:hlen]),
		226	(False, None, -size, cborutil.SPECIAL_NONE))
		227
		228	# Decoding single byte after header reports item size - 1
		229	self.assertEqual(cborutil.decodeitem(encoded[0:hlen + 1]),
		230	(False, None, -(size - 1), cborutil.SPECIAL_NONE))
		231
		232	# Decoding all but the last byte reports -1 needed.
		233	self.assertEqual(cborutil.decodeitem(encoded[0:hlen + size - 1]),
		234	(False, None, -1, cborutil.SPECIAL_NONE))
		235
		236	# Decoding last byte retrieves value.
		237	self.assertEqual(cborutil.decodeitem(encoded[0:hlen + size]),
		238	(True, source, hlen + size, cborutil.SPECIAL_NONE))
		239
		240	def testindefinitepartialdecode(self):
		241	encoded = b''.join(cborutil.streamencodebytestringfromiter(
		242	[b'foobar', b'biz']))
		243
		244	# First item should be the start of indefinite bytestring special.
		245	self.assertEqual(cborutil.decodeitem(encoded[0:1]),
		246	(True, None, 1,
		247	cborutil.SPECIAL_START_INDEFINITE_BYTESTRING))
		248
		249	# Second item should be the first chunk. But only available when
		250	# we give it 7 bytes (1 byte header + 6 byte chunk).
		251	self.assertEqual(cborutil.decodeitem(encoded[1:2]),
		252	(False, None, -6, cborutil.SPECIAL_NONE))
		253	self.assertEqual(cborutil.decodeitem(encoded[1:3]),
		254	(False, None, -5, cborutil.SPECIAL_NONE))
		255	self.assertEqual(cborutil.decodeitem(encoded[1:4]),
		256	(False, None, -4, cborutil.SPECIAL_NONE))
		257	self.assertEqual(cborutil.decodeitem(encoded[1:5]),
		258	(False, None, -3, cborutil.SPECIAL_NONE))
		259	self.assertEqual(cborutil.decodeitem(encoded[1:6]),
		260	(False, None, -2, cborutil.SPECIAL_NONE))
		261	self.assertEqual(cborutil.decodeitem(encoded[1:7]),
		262	(False, None, -1, cborutil.SPECIAL_NONE))
		263
		264	self.assertEqual(cborutil.decodeitem(encoded[1:8]),
		265	(True, b'foobar', 7, cborutil.SPECIAL_NONE))
		266
		267	# Third item should be second chunk. But only available when
		268	# we give it 4 bytes (1 byte header + 3 byte chunk).
		269	self.assertEqual(cborutil.decodeitem(encoded[8:9]),
		270	(False, None, -3, cborutil.SPECIAL_NONE))
		271	self.assertEqual(cborutil.decodeitem(encoded[8:10]),
		272	(False, None, -2, cborutil.SPECIAL_NONE))
		273	self.assertEqual(cborutil.decodeitem(encoded[8:11]),
		274	(False, None, -1, cborutil.SPECIAL_NONE))
		275
		276	self.assertEqual(cborutil.decodeitem(encoded[8:12]),
		277	(True, b'biz', 4, cborutil.SPECIAL_NONE))
		278
		279	# Fourth item should be end of indefinite stream marker.
		280	self.assertEqual(cborutil.decodeitem(encoded[12:13]),
		281	(True, None, 1, cborutil.SPECIAL_INDEFINITE_BREAK))
		282
		283	# Now test the behavior when going through the decoder.
		284
		285	self.assertEqual(cborutil.sansiodecoder().decode(encoded[0:1]),
		286	(False, 1, 0))
		287	self.assertEqual(cborutil.sansiodecoder().decode(encoded[0:2]),
		288	(False, 1, 6))
		289	self.assertEqual(cborutil.sansiodecoder().decode(encoded[0:3]),
		290	(False, 1, 5))
		291	self.assertEqual(cborutil.sansiodecoder().decode(encoded[0:4]),
		292	(False, 1, 4))
		293	self.assertEqual(cborutil.sansiodecoder().decode(encoded[0:5]),
		294	(False, 1, 3))
		295	self.assertEqual(cborutil.sansiodecoder().decode(encoded[0:6]),
		296	(False, 1, 2))
		297	self.assertEqual(cborutil.sansiodecoder().decode(encoded[0:7]),
		298	(False, 1, 1))
		299	self.assertEqual(cborutil.sansiodecoder().decode(encoded[0:8]),
		300	(True, 8, 0))
		301
		302	self.assertEqual(cborutil.sansiodecoder().decode(encoded[0:9]),
		303	(True, 8, 3))
		304	self.assertEqual(cborutil.sansiodecoder().decode(encoded[0:10]),
		305	(True, 8, 2))
		306	self.assertEqual(cborutil.sansiodecoder().decode(encoded[0:11]),
		307	(True, 8, 1))
		308	self.assertEqual(cborutil.sansiodecoder().decode(encoded[0:12]),
		309	(True, 12, 0))
		310
		311	self.assertEqual(cborutil.sansiodecoder().decode(encoded[0:13]),
		312	(True, 13, 0))
		313
		314	decoder = cborutil.sansiodecoder()
		315	decoder.decode(encoded[0:8])
		316	values = decoder.getavailable()
		317	self.assertEqual(values, [b'foobar'])
		318	self.assertTrue(values[0].isfirst)
		319	self.assertFalse(values[0].islast)
		320
		321	self.assertEqual(decoder.decode(encoded[8:12]),
		322	(True, 4, 0))
		323	values = decoder.getavailable()
		324	self.assertEqual(values, [b'biz'])
		325	self.assertFalse(values[0].isfirst)
		326	self.assertFalse(values[0].islast)
		327
		328	self.assertEqual(decoder.decode(encoded[12:]),
		329	(True, 1, 0))
		330	values = decoder.getavailable()
		331	self.assertEqual(values, [b''])
		332	self.assertFalse(values[0].isfirst)
		333	self.assertTrue(values[0].islast)
		334
		335	class StringTests(TestCase):
		336	def testdecodeforbidden(self):
		337	encoded = b'\x63foo'
		338	with self.assertRaisesRegex(cborutil.CBORDecodeError,
		339	'string major type not supported'):
		340	cborutil.decodeall(encoded)
		341
		342	class IntTests(TestCase):
	85	343	def testsmall(self):
	86	344	self.assertEqual(list(cborutil.streamencode(0)), [b'\x00'])
		345	self.assertEqual(cborutil.decodeall(b'\x00'), [0])
		346
	87	347	self.assertEqual(list(cborutil.streamencode(1)), [b'\x01'])
		348	self.assertEqual(cborutil.decodeall(b'\x01'), [1])
		349
	88	350	self.assertEqual(list(cborutil.streamencode(2)), [b'\x02'])
		351	self.assertEqual(cborutil.decodeall(b'\x02'), [2])
		352
	89	353	self.assertEqual(list(cborutil.streamencode(3)), [b'\x03'])
		354	self.assertEqual(cborutil.decodeall(b'\x03'), [3])
		355
	90	356	self.assertEqual(list(cborutil.streamencode(4)), [b'\x04'])
		357	self.assertEqual(cborutil.decodeall(b'\x04'), [4])
		358
		359	# Multiple value decode works.
		360	self.assertEqual(cborutil.decodeall(b'\x00\x01\x02\x03\x04'),
		361	[0, 1, 2, 3, 4])
	91	362
	92	363	def testnegativesmall(self):
	93	364	self.assertEqual(list(cborutil.streamencode(-1)), [b'\x20'])
		365	self.assertEqual(cborutil.decodeall(b'\x20'), [-1])
		366
	94	367	self.assertEqual(list(cborutil.streamencode(-2)), [b'\x21'])
		368	self.assertEqual(cborutil.decodeall(b'\x21'), [-2])
		369
	95	370	self.assertEqual(list(cborutil.streamencode(-3)), [b'\x22'])
		371	self.assertEqual(cborutil.decodeall(b'\x22'), [-3])
		372
	96	373	self.assertEqual(list(cborutil.streamencode(-4)), [b'\x23'])
		374	self.assertEqual(cborutil.decodeall(b'\x23'), [-4])
		375
	97	376	self.assertEqual(list(cborutil.streamencode(-5)), [b'\x24'])
		377	self.assertEqual(cborutil.decodeall(b'\x24'), [-5])
		378
		379	# Multiple value decode works.
		380	self.assertEqual(cborutil.decodeall(b'\x20\x21\x22\x23\x24'),
		381	[-1, -2, -3, -4, -5])
	98	382
	99	383	def testrange(self):
	100	384	for i in range(-70000, 70000, 10):
	101		self.assertEqual(
	102		b''.join(cborutil.streamencode(i)),
	103		cbor.dumps(i))
		385	encoded = b''.join(cborutil.streamencode(i))
		386
		387	self.assertEqual(encoded, cbor.dumps(i))
		388	self.assertEqual(cborutil.decodeall(encoded), [i])
		389
		390	def testdecodepartialubyte(self):
		391	encoded = b''.join(cborutil.streamencode(250))
		392
		393	self.assertEqual(cborutil.decodeitem(encoded[0:1]),
		394	(False, None, -1, cborutil.SPECIAL_NONE))
		395	self.assertEqual(cborutil.decodeitem(encoded[0:2]),
		396	(True, 250, 2, cborutil.SPECIAL_NONE))
		397
		398	def testdecodepartialbyte(self):
		399	encoded = b''.join(cborutil.streamencode(-42))
		400	self.assertEqual(cborutil.decodeitem(encoded[0:1]),
		401	(False, None, -1, cborutil.SPECIAL_NONE))
		402	self.assertEqual(cborutil.decodeitem(encoded[0:2]),
		403	(True, -42, 2, cborutil.SPECIAL_NONE))
		404
		405	def testdecodepartialushort(self):
		406	encoded = b''.join(cborutil.streamencode(2**15))
		407
		408	self.assertEqual(cborutil.decodeitem(encoded[0:1]),
		409	(False, None, -2, cborutil.SPECIAL_NONE))
		410	self.assertEqual(cborutil.decodeitem(encoded[0:2]),
		411	(False, None, -1, cborutil.SPECIAL_NONE))
		412	self.assertEqual(cborutil.decodeitem(encoded[0:5]),
		413	(True, 2**15, 3, cborutil.SPECIAL_NONE))
		414
		415	def testdecodepartialshort(self):
		416	encoded = b''.join(cborutil.streamencode(-1024))
		417
		418	self.assertEqual(cborutil.decodeitem(encoded[0:1]),
		419	(False, None, -2, cborutil.SPECIAL_NONE))
		420	self.assertEqual(cborutil.decodeitem(encoded[0:2]),
		421	(False, None, -1, cborutil.SPECIAL_NONE))
		422	self.assertEqual(cborutil.decodeitem(encoded[0:3]),
		423	(True, -1024, 3, cborutil.SPECIAL_NONE))
		424
		425	def testdecodepartialulong(self):
		426	encoded = b''.join(cborutil.streamencode(2**28))
		427
		428	self.assertEqual(cborutil.decodeitem(encoded[0:1]),
		429	(False, None, -4, cborutil.SPECIAL_NONE))
		430	self.assertEqual(cborutil.decodeitem(encoded[0:2]),
		431	(False, None, -3, cborutil.SPECIAL_NONE))
		432	self.assertEqual(cborutil.decodeitem(encoded[0:3]),
		433	(False, None, -2, cborutil.SPECIAL_NONE))
		434	self.assertEqual(cborutil.decodeitem(encoded[0:4]),
		435	(False, None, -1, cborutil.SPECIAL_NONE))
		436	self.assertEqual(cborutil.decodeitem(encoded[0:5]),
		437	(True, 2**28, 5, cborutil.SPECIAL_NONE))
		438
		439	def testdecodepartiallong(self):
		440	encoded = b''.join(cborutil.streamencode(-1048580))
	104	441
	105		class ArrayTests(unittest.TestCase):
		442	self.assertEqual(cborutil.decodeitem(encoded[0:1]),
		443	(False, None, -4, cborutil.SPECIAL_NONE))
		444	self.assertEqual(cborutil.decodeitem(encoded[0:2]),
		445	(False, None, -3, cborutil.SPECIAL_NONE))
		446	self.assertEqual(cborutil.decodeitem(encoded[0:3]),
		447	(False, None, -2, cborutil.SPECIAL_NONE))
		448	self.assertEqual(cborutil.decodeitem(encoded[0:4]),
		449	(False, None, -1, cborutil.SPECIAL_NONE))
		450	self.assertEqual(cborutil.decodeitem(encoded[0:5]),
		451	(True, -1048580, 5, cborutil.SPECIAL_NONE))
		452
		453	def testdecodepartialulonglong(self):
		454	encoded = b''.join(cborutil.streamencode(2**32))
		455
		456	self.assertEqual(cborutil.decodeitem(encoded[0:1]),
		457	(False, None, -8, cborutil.SPECIAL_NONE))
		458	self.assertEqual(cborutil.decodeitem(encoded[0:2]),
		459	(False, None, -7, cborutil.SPECIAL_NONE))
		460	self.assertEqual(cborutil.decodeitem(encoded[0:3]),
		461	(False, None, -6, cborutil.SPECIAL_NONE))
		462	self.assertEqual(cborutil.decodeitem(encoded[0:4]),
		463	(False, None, -5, cborutil.SPECIAL_NONE))
		464	self.assertEqual(cborutil.decodeitem(encoded[0:5]),
		465	(False, None, -4, cborutil.SPECIAL_NONE))
		466	self.assertEqual(cborutil.decodeitem(encoded[0:6]),
		467	(False, None, -3, cborutil.SPECIAL_NONE))
		468	self.assertEqual(cborutil.decodeitem(encoded[0:7]),
		469	(False, None, -2, cborutil.SPECIAL_NONE))
		470	self.assertEqual(cborutil.decodeitem(encoded[0:8]),
		471	(False, None, -1, cborutil.SPECIAL_NONE))
		472	self.assertEqual(cborutil.decodeitem(encoded[0:9]),
		473	(True, 2**32, 9, cborutil.SPECIAL_NONE))
		474
		475	with self.assertRaisesRegex(
		476	cborutil.CBORDecodeError, 'input data not fully consumed'):
		477	cborutil.decodeall(encoded[0:1])
		478
		479	with self.assertRaisesRegex(
		480	cborutil.CBORDecodeError, 'input data not fully consumed'):
		481	cborutil.decodeall(encoded[0:2])
		482
		483	def testdecodepartiallonglong(self):
		484	encoded = b''.join(cborutil.streamencode(-7000000000))
		485
		486	self.assertEqual(cborutil.decodeitem(encoded[0:1]),
		487	(False, None, -8, cborutil.SPECIAL_NONE))
		488	self.assertEqual(cborutil.decodeitem(encoded[0:2]),
		489	(False, None, -7, cborutil.SPECIAL_NONE))
		490	self.assertEqual(cborutil.decodeitem(encoded[0:3]),
		491	(False, None, -6, cborutil.SPECIAL_NONE))
		492	self.assertEqual(cborutil.decodeitem(encoded[0:4]),
		493	(False, None, -5, cborutil.SPECIAL_NONE))
		494	self.assertEqual(cborutil.decodeitem(encoded[0:5]),
		495	(False, None, -4, cborutil.SPECIAL_NONE))
		496	self.assertEqual(cborutil.decodeitem(encoded[0:6]),
		497	(False, None, -3, cborutil.SPECIAL_NONE))
		498	self.assertEqual(cborutil.decodeitem(encoded[0:7]),
		499	(False, None, -2, cborutil.SPECIAL_NONE))
		500	self.assertEqual(cborutil.decodeitem(encoded[0:8]),
		501	(False, None, -1, cborutil.SPECIAL_NONE))
		502	self.assertEqual(cborutil.decodeitem(encoded[0:9]),
		503	(True, -7000000000, 9, cborutil.SPECIAL_NONE))
		504
		505	class ArrayTests(TestCase):
	106	506	def testempty(self):
	107	507	self.assertEqual(list(cborutil.streamencode([])), [b'\x80'])
	108	508	self.assertEqual(loadit(cborutil.streamencode([])), [])
	109	509
		510	self.assertEqual(cborutil.decodeall(b'\x80'), [[]])
		511
	110	512	def testbasic(self):
	111	513	source = [b'foo', b'bar', 1, -10]
	112	514
	113		self.assertEqual(list(cborutil.streamencode(source)), [
	114		b'\x84', b'\x43', b'foo', b'\x43', b'bar', b'\x01', b'\x29'])
		515	chunks = [
		516	b'\x84', b'\x43', b'foo', b'\x43', b'bar', b'\x01', b'\x29']
		517
		518	self.assertEqual(list(cborutil.streamencode(source)), chunks)
		519
		520	self.assertEqual(cborutil.decodeall(b''.join(chunks)), [source])
	115	521
	116	522	def testemptyfromiter(self):
	117	523	self.assertEqual(b''.join(cborutil.streamencodearrayfromiter([])),
	118	524	b'\x9f\xff')
	119	525
		526	with self.assertRaisesRegex(cborutil.CBORDecodeError,
		527	'indefinite length uint not allowed'):
		528	cborutil.decodeall(b'\x9f\xff')
		529
	120	530	def testfromiter1(self):
	121	531	source = [b'foo']
	122	532
	123	533	self.assertEqual(list(cborutil.streamencodearrayfromiter(source)), [
	124	534	b'\x9f',
	125	535	b'\x43', b'foo',
	126	536	b'\xff',
	127	537	])
	128	538
	129	539	dest = b''.join(cborutil.streamencodearrayfromiter(source))
	130	540	self.assertEqual(cbor.loads(dest), source)
	131	541
		542	with self.assertRaisesRegex(cborutil.CBORDecodeError,
		543	'indefinite length uint not allowed'):
		544	cborutil.decodeall(dest)
		545
	132	546	def testtuple(self):
	133	547	source = (b'foo', None, 42)
		548	encoded = b''.join(cborutil.streamencode(source))
	134	549
	135		self.assertEqual(cbor.loads(b''.join(cborutil.streamencode(source))),
	136		list(source))
		550	self.assertEqual(cbor.loads(encoded), list(source))
		551
		552	self.assertEqual(cborutil.decodeall(encoded), [list(source)])
		553
		554	def testpartialdecode(self):
		555	source = list(range(4))
		556	encoded = b''.join(cborutil.streamencode(source))
		557	self.assertEqual(cborutil.decodeitem(encoded[0:1]),
		558	(True, 4, 1, cborutil.SPECIAL_START_ARRAY))
		559	self.assertEqual(cborutil.decodeitem(encoded[0:2]),
		560	(True, 4, 1, cborutil.SPECIAL_START_ARRAY))
		561
		562	source = list(range(23))
		563	encoded = b''.join(cborutil.streamencode(source))
		564	self.assertEqual(cborutil.decodeitem(encoded[0:1]),
		565	(True, 23, 1, cborutil.SPECIAL_START_ARRAY))
		566	self.assertEqual(cborutil.decodeitem(encoded[0:2]),
		567	(True, 23, 1, cborutil.SPECIAL_START_ARRAY))
		568
		569	source = list(range(24))
		570	encoded = b''.join(cborutil.streamencode(source))
		571	self.assertEqual(cborutil.decodeitem(encoded[0:1]),
		572	(False, None, -1, cborutil.SPECIAL_NONE))
		573	self.assertEqual(cborutil.decodeitem(encoded[0:2]),
		574	(True, 24, 2, cborutil.SPECIAL_START_ARRAY))
		575	self.assertEqual(cborutil.decodeitem(encoded[0:3]),
		576	(True, 24, 2, cborutil.SPECIAL_START_ARRAY))
	137	577
	138		class SetTests(unittest.TestCase):
		578	source = list(range(256))
		579	encoded = b''.join(cborutil.streamencode(source))
		580	self.assertEqual(cborutil.decodeitem(encoded[0:1]),
		581	(False, None, -2, cborutil.SPECIAL_NONE))
		582	self.assertEqual(cborutil.decodeitem(encoded[0:2]),
		583	(False, None, -1, cborutil.SPECIAL_NONE))
		584	self.assertEqual(cborutil.decodeitem(encoded[0:3]),
		585	(True, 256, 3, cborutil.SPECIAL_START_ARRAY))
		586	self.assertEqual(cborutil.decodeitem(encoded[0:4]),
		587	(True, 256, 3, cborutil.SPECIAL_START_ARRAY))
		588
		589	def testnested(self):
		590	source = [[], [], [[], [], []]]
		591	encoded = b''.join(cborutil.streamencode(source))
		592	self.assertEqual(cborutil.decodeall(encoded), [source])
		593
		594	source = [True, None, [True, 0, 2], [None], [], [[[]], -87]]
		595	encoded = b''.join(cborutil.streamencode(source))
		596	self.assertEqual(cborutil.decodeall(encoded), [source])
		597
		598	# A set within an array.
		599	source = [None, {b'foo', b'bar', None, False}, set()]
		600	encoded = b''.join(cborutil.streamencode(source))
		601	self.assertEqual(cborutil.decodeall(encoded), [source])
		602
		603	# A map within an array.
		604	source = [None, {}, {b'foo': b'bar', True: False}, [{}]]
		605	encoded = b''.join(cborutil.streamencode(source))
		606	self.assertEqual(cborutil.decodeall(encoded), [source])
		607
		608	def testindefinitebytestringvalues(self):
		609	# Single value array whose value is an empty indefinite bytestring.
		610	encoded = b'\x81\x5f\x40\xff'
		611
		612	with self.assertRaisesRegex(cborutil.CBORDecodeError,
		613	'indefinite length bytestrings not '
		614	'allowed as array values'):
		615	cborutil.decodeall(encoded)
		616
		617	class SetTests(TestCase):
	139	618	def testempty(self):
	140	619	self.assertEqual(list(cborutil.streamencode(set())), [
	141	620	b'\xd9\x01\x02',
	142	621	b'\x80',
	143	622	])
	144	623
		624	self.assertEqual(cborutil.decodeall(b'\xd9\x01\x02\x80'), [set()])
		625
	145	626	def testset(self):
	146	627	source = {b'foo', None, 42}
		628	encoded = b''.join(cborutil.streamencode(source))
	147	629
	148		self.assertEqual(cbor.loads(b''.join(cborutil.streamencode(source))),
	149		source)
		630	self.assertEqual(cbor.loads(encoded), source)
		631
		632	self.assertEqual(cborutil.decodeall(encoded), [source])
		633
		634	def testinvalidtag(self):
		635	# Must use array to encode sets.
		636	encoded = b'\xd9\x01\x02\xa0'
		637
		638	with self.assertRaisesRegex(cborutil.CBORDecodeError,
		639	'expected array after finite set '
		640	'semantic tag'):
		641	cborutil.decodeall(encoded)
		642
		643	def testpartialdecode(self):
		644	# Semantic tag item will be 3 bytes. Set header will be variable
		645	# depending on length.
		646	encoded = b''.join(cborutil.streamencode({i for i in range(23)}))
		647	self.assertEqual(cborutil.decodeitem(encoded[0:1]),
		648	(False, None, -2, cborutil.SPECIAL_NONE))
		649	self.assertEqual(cborutil.decodeitem(encoded[0:2]),
		650	(False, None, -1, cborutil.SPECIAL_NONE))
		651	self.assertEqual(cborutil.decodeitem(encoded[0:3]),
		652	(False, None, -1, cborutil.SPECIAL_NONE))
		653	self.assertEqual(cborutil.decodeitem(encoded[0:4]),
		654	(True, 23, 4, cborutil.SPECIAL_START_SET))
		655	self.assertEqual(cborutil.decodeitem(encoded[0:5]),
		656	(True, 23, 4, cborutil.SPECIAL_START_SET))
		657
		658	encoded = b''.join(cborutil.streamencode({i for i in range(24)}))
		659	self.assertEqual(cborutil.decodeitem(encoded[0:1]),
		660	(False, None, -2, cborutil.SPECIAL_NONE))
		661	self.assertEqual(cborutil.decodeitem(encoded[0:2]),
		662	(False, None, -1, cborutil.SPECIAL_NONE))
		663	self.assertEqual(cborutil.decodeitem(encoded[0:3]),
		664	(False, None, -1, cborutil.SPECIAL_NONE))
		665	self.assertEqual(cborutil.decodeitem(encoded[0:4]),
		666	(False, None, -1, cborutil.SPECIAL_NONE))
		667	self.assertEqual(cborutil.decodeitem(encoded[0:5]),
		668	(True, 24, 5, cborutil.SPECIAL_START_SET))
		669	self.assertEqual(cborutil.decodeitem(encoded[0:6]),
		670	(True, 24, 5, cborutil.SPECIAL_START_SET))
	150	671
	151		class BoolTests(unittest.TestCase):
		672	encoded = b''.join(cborutil.streamencode({i for i in range(256)}))
		673	self.assertEqual(cborutil.decodeitem(encoded[0:1]),
		674	(False, None, -2, cborutil.SPECIAL_NONE))
		675	self.assertEqual(cborutil.decodeitem(encoded[0:2]),
		676	(False, None, -1, cborutil.SPECIAL_NONE))
		677	self.assertEqual(cborutil.decodeitem(encoded[0:3]),
		678	(False, None, -1, cborutil.SPECIAL_NONE))
		679	self.assertEqual(cborutil.decodeitem(encoded[0:4]),
		680	(False, None, -2, cborutil.SPECIAL_NONE))
		681	self.assertEqual(cborutil.decodeitem(encoded[0:5]),
		682	(False, None, -1, cborutil.SPECIAL_NONE))
		683	self.assertEqual(cborutil.decodeitem(encoded[0:6]),
		684	(True, 256, 6, cborutil.SPECIAL_START_SET))
		685
		686	def testinvalidvalue(self):
		687	encoded = b''.join([
		688	b'\xd9\x01\x02', # semantic tag
		689	b'\x81', # array of size 1
		690	b'\x5f\x43foo\xff', # indefinite length bytestring "foo"
		691	])
		692
		693	with self.assertRaisesRegex(cborutil.CBORDecodeError,
		694	'indefinite length bytestrings not '
		695	'allowed as set values'):
		696	cborutil.decodeall(encoded)
		697
		698	encoded = b''.join([
		699	b'\xd9\x01\x02',
		700	b'\x81',
		701	b'\x80', # empty array
		702	])
		703
		704	with self.assertRaisesRegex(cborutil.CBORDecodeError,
		705	'collections not allowed as set values'):
		706	cborutil.decodeall(encoded)
		707
		708	encoded = b''.join([
		709	b'\xd9\x01\x02',
		710	b'\x81',
		711	b'\xa0', # empty map
		712	])
		713
		714	with self.assertRaisesRegex(cborutil.CBORDecodeError,
		715	'collections not allowed as set values'):
		716	cborutil.decodeall(encoded)
		717
		718	encoded = b''.join([
		719	b'\xd9\x01\x02',
		720	b'\x81',
		721	b'\xd9\x01\x02\x81\x01', # set with integer 1
		722	])
		723
		724	with self.assertRaisesRegex(cborutil.CBORDecodeError,
		725	'collections not allowed as set values'):
		726	cborutil.decodeall(encoded)
		727
		728	class BoolTests(TestCase):
	152	729	def testbasic(self):
	153	730	self.assertEqual(list(cborutil.streamencode(True)), [b'\xf5'])
	154	731	self.assertEqual(list(cborutil.streamencode(False)), [b'\xf4'])
	155	732
	156	733	self.assertIs(loadit(cborutil.streamencode(True)), True)
	157	734	self.assertIs(loadit(cborutil.streamencode(False)), False)
	158	735
	159		class NoneTests(unittest.TestCase):
		736	self.assertEqual(cborutil.decodeall(b'\xf4'), [False])
		737	self.assertEqual(cborutil.decodeall(b'\xf5'), [True])
		738
		739	self.assertEqual(cborutil.decodeall(b'\xf4\xf5\xf5\xf4'),
		740	[False, True, True, False])
		741
		742	class NoneTests(TestCase):
	160	743	def testbasic(self):
	161	744	self.assertEqual(list(cborutil.streamencode(None)), [b'\xf6'])
	162	745
	163	746	self.assertIs(loadit(cborutil.streamencode(None)), None)
	164	747
	165		class MapTests(unittest.TestCase):
		748	self.assertEqual(cborutil.decodeall(b'\xf6'), [None])
		749	self.assertEqual(cborutil.decodeall(b'\xf6\xf6'), [None, None])
		750
		751	class MapTests(TestCase):
	166	752	def testempty(self):
	167	753	self.assertEqual(list(cborutil.streamencode({})), [b'\xa0'])
	168	754	self.assertEqual(loadit(cborutil.streamencode({})), {})
	169	755
		756	self.assertEqual(cborutil.decodeall(b'\xa0'), [{}])
		757
	170	758	def testemptyindefinite(self):
	171	759	self.assertEqual(list(cborutil.streamencodemapfromiter([])), [
	172	760	b'\xbf', b'\xff'])
	173	761
	174	762	self.assertEqual(loadit(cborutil.streamencodemapfromiter([])), {})
	175	763
		764	with self.assertRaisesRegex(cborutil.CBORDecodeError,
		765	'indefinite length uint not allowed'):
		766	cborutil.decodeall(b'\xbf\xff')
		767
	176	768	def testone(self):
	177	769	source = {b'foo': b'bar'}
	178	770	self.assertEqual(list(cborutil.streamencode(source)), [
	179	771	b'\xa1', b'\x43', b'foo', b'\x43', b'bar'])
	180	772
	181	773	self.assertEqual(loadit(cborutil.streamencode(source)), source)
	182	774
		775	self.assertEqual(cborutil.decodeall(b'\xa1\x43foo\x43bar'), [source])
		776
	183	777	def testmultiple(self):
	184	778	source = {
	185	779	b'foo': b'bar',
	186	780	b'baz': b'value1',
	187	781	}
	188	782
	189	783	self.assertEqual(loadit(cborutil.streamencode(source)), source)
	190	784
	191	785	self.assertEqual(
	192	786	loadit(cborutil.streamencodemapfromiter(source.items())),
	193	787	source)
	194	788
		789	encoded = b''.join(cborutil.streamencode(source))
		790	self.assertEqual(cborutil.decodeall(encoded), [source])
		791
	195	792	def testcomplex(self):
	196	793	source = {
	197	794	b'key': 1,
	198	795	2: -10,
	199	796	}
	200	797
	201	798	self.assertEqual(loadit(cborutil.streamencode(source)),
	202	799	source)
	203	800
	204	801	self.assertEqual(
	205	802	loadit(cborutil.streamencodemapfromiter(source.items())),
	206	803	source)
	207	804
		805	encoded = b''.join(cborutil.streamencode(source))
		806	self.assertEqual(cborutil.decodeall(encoded), [source])
		807
		808	def testnested(self):
		809	source = {b'key1': None, b'key2': {b'sub1': b'sub2'}, b'sub2': {}}
		810	encoded = b''.join(cborutil.streamencode(source))
		811
		812	self.assertEqual(cborutil.decodeall(encoded), [source])
		813
		814	source = {
		815	b'key1': [],
		816	b'key2': [None, False],
		817	b'key3': {b'foo', b'bar'},
		818	b'key4': {},
		819	}
		820	encoded = b''.join(cborutil.streamencode(source))
		821	self.assertEqual(cborutil.decodeall(encoded), [source])
		822
		823	def testillegalkey(self):
		824	encoded = b''.join([
		825	# map header + len 1
		826	b'\xa1',
		827	# indefinite length bytestring "foo" in key position
		828	b'\x5f\x03foo\xff'
		829	])
		830
		831	with self.assertRaisesRegex(cborutil.CBORDecodeError,
		832	'indefinite length bytestrings not '
		833	'allowed as map keys'):
		834	cborutil.decodeall(encoded)
		835
		836	encoded = b''.join([
		837	b'\xa1',
		838	b'\x80', # empty array
		839	b'\x43foo',
		840	])
		841
		842	with self.assertRaisesRegex(cborutil.CBORDecodeError,
		843	'collections not supported as map keys'):
		844	cborutil.decodeall(encoded)
		845
		846	def testillegalvalue(self):
		847	encoded = b''.join([
		848	b'\xa1', # map headers
		849	b'\x43foo', # key
		850	b'\x5f\x03bar\xff', # indefinite length value
		851	])
		852
		853	with self.assertRaisesRegex(cborutil.CBORDecodeError,
		854	'indefinite length bytestrings not '
		855	'allowed as map values'):
		856	cborutil.decodeall(encoded)
		857
		858	def testpartialdecode(self):
		859	source = {b'key1': b'value1'}
		860	encoded = b''.join(cborutil.streamencode(source))
		861
		862	self.assertEqual(cborutil.decodeitem(encoded[0:1]),
		863	(True, 1, 1, cborutil.SPECIAL_START_MAP))
		864	self.assertEqual(cborutil.decodeitem(encoded[0:2]),
		865	(True, 1, 1, cborutil.SPECIAL_START_MAP))
		866
		867	source = {b'key%d' % i: None for i in range(23)}
		868	encoded = b''.join(cborutil.streamencode(source))
		869	self.assertEqual(cborutil.decodeitem(encoded[0:1]),
		870	(True, 23, 1, cborutil.SPECIAL_START_MAP))
		871
		872	source = {b'key%d' % i: None for i in range(24)}
		873	encoded = b''.join(cborutil.streamencode(source))
		874	self.assertEqual(cborutil.decodeitem(encoded[0:1]),
		875	(False, None, -1, cborutil.SPECIAL_NONE))
		876	self.assertEqual(cborutil.decodeitem(encoded[0:2]),
		877	(True, 24, 2, cborutil.SPECIAL_START_MAP))
		878	self.assertEqual(cborutil.decodeitem(encoded[0:3]),
		879	(True, 24, 2, cborutil.SPECIAL_START_MAP))
		880
		881	source = {b'key%d' % i: None for i in range(256)}
		882	encoded = b''.join(cborutil.streamencode(source))
		883	self.assertEqual(cborutil.decodeitem(encoded[0:1]),
		884	(False, None, -2, cborutil.SPECIAL_NONE))
		885	self.assertEqual(cborutil.decodeitem(encoded[0:2]),
		886	(False, None, -1, cborutil.SPECIAL_NONE))
		887	self.assertEqual(cborutil.decodeitem(encoded[0:3]),
		888	(True, 256, 3, cborutil.SPECIAL_START_MAP))
		889	self.assertEqual(cborutil.decodeitem(encoded[0:4]),
		890	(True, 256, 3, cborutil.SPECIAL_START_MAP))
		891
		892	source = {b'key%d' % i: None for i in range(65536)}
		893	encoded = b''.join(cborutil.streamencode(source))
		894	self.assertEqual(cborutil.decodeitem(encoded[0:1]),
		895	(False, None, -4, cborutil.SPECIAL_NONE))
		896	self.assertEqual(cborutil.decodeitem(encoded[0:2]),
		897	(False, None, -3, cborutil.SPECIAL_NONE))
		898	self.assertEqual(cborutil.decodeitem(encoded[0:3]),
		899	(False, None, -2, cborutil.SPECIAL_NONE))
		900	self.assertEqual(cborutil.decodeitem(encoded[0:4]),
		901	(False, None, -1, cborutil.SPECIAL_NONE))
		902	self.assertEqual(cborutil.decodeitem(encoded[0:5]),
		903	(True, 65536, 5, cborutil.SPECIAL_START_MAP))
		904	self.assertEqual(cborutil.decodeitem(encoded[0:6]),
		905	(True, 65536, 5, cborutil.SPECIAL_START_MAP))
		906
		907	class SemanticTagTests(TestCase):
		908	def testdecodeforbidden(self):
		909	for i in range(500):
		910	if i == cborutil.SEMANTIC_TAG_FINITE_SET:
		911	continue
		912
		913	tag = cborutil.encodelength(cborutil.MAJOR_TYPE_SEMANTIC,
		914	i)
		915
		916	encoded = tag + cborutil.encodelength(cborutil.MAJOR_TYPE_UINT, 42)
		917
		918	# Partial decode is incomplete.
		919	if i < 24:
		920	pass
		921	elif i < 256:
		922	self.assertEqual(cborutil.decodeitem(encoded[0:1]),
		923	(False, None, -1, cborutil.SPECIAL_NONE))
		924	elif i < 65536:
		925	self.assertEqual(cborutil.decodeitem(encoded[0:1]),
		926	(False, None, -2, cborutil.SPECIAL_NONE))
		927	self.assertEqual(cborutil.decodeitem(encoded[0:2]),
		928	(False, None, -1, cborutil.SPECIAL_NONE))
		929
		930	with self.assertRaisesRegex(cborutil.CBORDecodeError,
		931	'semantic tag \d+ not allowed'):
		932	cborutil.decodeitem(encoded)
		933
		934	class SpecialTypesTests(TestCase):
		935	def testforbiddentypes(self):
		936	for i in range(256):
		937	if i == cborutil.SUBTYPE_FALSE:
		938	continue
		939	elif i == cborutil.SUBTYPE_TRUE:
		940	continue
		941	elif i == cborutil.SUBTYPE_NULL:
		942	continue
		943
		944	encoded = cborutil.encodelength(cborutil.MAJOR_TYPE_SPECIAL, i)
		945
		946	with self.assertRaisesRegex(cborutil.CBORDecodeError,
		947	'special type \d+ not allowed'):
		948	cborutil.decodeitem(encoded)
		949
		950	class SansIODecoderTests(TestCase):
		951	def testemptyinput(self):
		952	decoder = cborutil.sansiodecoder()
		953	self.assertEqual(decoder.decode(b''), (False, 0, 0))
		954
		955	class DecodeallTests(TestCase):
		956	def testemptyinput(self):
		957	self.assertEqual(cborutil.decodeall(b''), [])
		958
		959	def testpartialinput(self):
		960	encoded = b''.join([
		961	b'\x82', # array of 2 elements
		962	b'\x01', # integer 1
		963	])
		964
		965	with self.assertRaisesRegex(cborutil.CBORDecodeError,
		966	'input data not complete'):
		967	cborutil.decodeall(encoded)
		968
	208	969	if __name__ == '__main__':
	209	970	import silenttestrunner
	210	971	silenttestrunner.main(__name__)
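
Note on the partial-decode assertions above: the tests expect decodeitem() to return a tuple whose third element is the number of bytes consumed when the item is complete, or a negative count of bytes still needed when it is not. The following is a minimal sketch of that idea for integers only (CBOR major types 0 and 1). The helper name decode_int_header and its 3-tuple return shape are hypothetical and chosen for illustration; this is not the cborutil implementation, which also returns a "special" marker and handles every other type. It assumes Python 3, where indexing bytes yields an int.

```python
# Hypothetical helper for illustration only; not part of cborutil.
def decode_int_header(data):
    """Decode one CBOR integer (major type 0 or 1) from a bytes buffer.

    Returns (complete, value, readcount). If the buffer is too short,
    complete is False, value is None, and readcount is the negative
    number of additional bytes required.
    """
    if not data:
        # Nothing to inspect yet: at least the initial byte is needed.
        return False, None, -1

    initial = data[0]
    major = initial >> 5      # top 3 bits: major type
    subtype = initial & 0x1f  # low 5 bits: additional information

    if major not in (0, 1):
        raise ValueError('only integer major types are handled here')

    if subtype < 24:
        # The value is stored directly in the initial byte.
        size = 0
        value = subtype
    elif subtype <= 27:
        # 24, 25, 26, 27 mean a 1, 2, 4, or 8 byte big-endian payload.
        size = 1 << (subtype - 24)
        missing = 1 + size - len(data)
        if missing > 0:
            return False, None, -missing
        value = int.from_bytes(data[1:1 + size], 'big')
    else:
        # 28-30 are reserved; 31 (indefinite length) is invalid for ints.
        raise ValueError('invalid additional information %d' % subtype)

    if major == 1:
        # Negative integers are encoded as -1 - n.
        value = -1 - value

    return True, value, 1 + size


print(decode_int_header(b'\x18'))          # (False, None, -1)
print(decode_int_header(b'\x18\xfa'))      # (True, 250, 2)
print(decode_int_header(b'\x39\x03\xff'))  # (True, -1024, 3)
```

Because the function only inspects the buffer it is given and never reads from a stream, a caller can retry with a longer buffer once more bytes arrive, which is the property the partial-decode tests exercise.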
