##// END OF EJS Templates
encoding: handle UTF-16 internal limit with fromutf8b (issue5031)...
encoding: handle UTF-16 internal limit with fromutf8b (issue5031) Default builds of Python have a Unicode type that isn't actually full Unicode but UTF-16, which encodes non-BMP codepoints to a pair of BMP codepoints with surrogate escaping. Since our UTF-8b hack escaping uses a plane that overlaps with the UTF-16 escaping system, this gets extra complicated. In addition, unichr() for codepoints greater than U+FFFF may not work either. This changes the code to reuse getutf8char to walk the byte string, so we only rely on Python for unpacking our U+DCxx characters.

File last commit:

r25979:b723f05e default
r27699:c8d3392f default
Show More
strutil.py
36 lines | 953 B | text/x-python | PythonLexer
# strutil.py - string utilities for Mercurial
#
# Copyright 2006 Vadim Gelfer <vadim.gelfer@gmail.com>
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.
from __future__ import absolute_import
def findall(haystack, needle, start=0, end=None):
if end is None:
end = len(haystack)
if end < 0:
end += len(haystack)
if start < 0:
start += len(haystack)
while start < end:
c = haystack.find(needle, start, end)
if c == -1:
break
yield c
start = c + 1
def rfindall(haystack, needle, start=0, end=None):
if end is None:
end = len(haystack)
if end < 0:
end += len(haystack)
if start < 0:
start += len(haystack)
while end >= 0:
c = haystack.rfind(needle, start, end)
if c == -1:
break
yield c
end = c - 1