upstream/mercurial-mirror Commit - r26879:a24b98f4

encoding: re-escape U+DCxx characters in toutf8b input (issue4927)...

Matt Mackall

r26879:a24b98f4 default

mercurial/encoding.py

0 +14 -9

                 internal surrogate encoding as a UTF-8 string.)
                 '''
-                if isinstance(s, localstr):
+                if "\xed" not in s:
-                    return s._utf8
+                    if isinstance(s, localstr):
+                        return s._utf8
-                try:
+                    try:
-                    s.decode('utf-8')
+                        s.decode('utf-8')
-                    return s
+                        return s
-                except UnicodeDecodeError:
+                    except UnicodeDecodeError:
-                    pass
+                        pass
                 r = ""
                 pos = 0
                 while pos < l:
                     try:
                         c = getutf8char(s, pos)
-                        pos += len(c)
+                        if "\xed\xb0\x80" <= c <= "\xed\xb3\xbf":
+                            # have to re-escape existing U+DCxx characters
+                            c = unichr(0xdc00 + ord(s[pos])).encode('utf-8')
+                            pos += 1
+                        else:
+                            pos += len(c)
                     except UnicodeDecodeError:
                         c = unichr(0xdc00 + ord(s[pos])).encode('utf-8')
                         pos += 1
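
The effect of the change can be illustrated with a small standalone sketch. This is not the code from mercurial/encoding.py: it is a Python 3 approximation written for illustration, and the names `get_utf8_char`, `escape_byte`, `to_utf8b_sketch` and `from_utf8b_sketch` are hypothetical stand-ins. It assumes that the escape maps each undecodable byte `b` to U+DC00 + `b`, as the diff does, and that an input which already contains an encoded U+DCxx character must have its bytes escaped individually so that decoding can return the original bytes.

```python
def get_utf8_char(s: bytes, pos: int) -> bytes:
    """Return the UTF-8 sequence starting at s[pos].

    Hypothetical stand-in for the getutf8char helper used in the diff.
    Raises UnicodeDecodeError if no valid sequence starts at pos.
    """
    lead = s[pos]
    if lead >= 0xF0:
        n = 4
    elif lead >= 0xE0:
        n = 3
    elif lead >= 0xC0:
        n = 2
    else:
        n = 1  # ASCII, or a stray continuation byte that fails below
    c = s[pos:pos + n]
    # 'surrogatepass' mimics Python 2, whose UTF-8 codec accepted
    # encoded surrogates such as ED B0 80 (U+DC00).
    c.decode("utf-8", "surrogatepass")
    return c


def escape_byte(b: int) -> bytes:
    """Encode a single raw byte as the UTF-8 form of U+DC00 + b."""
    return chr(0xDC00 + b).encode("utf-8", "surrogatepass")


def to_utf8b_sketch(s: bytes) -> bytes:
    """Approximate the patched toutf8b logic on a plain byte string."""
    # Fast path: without 0xED there can be no encoded U+DCxx character.
    if b"\xed" not in s:
        try:
            s.decode("utf-8")
            return s
        except UnicodeDecodeError:
            pass
    out = bytearray()
    pos = 0
    while pos < len(s):
        try:
            c = get_utf8_char(s, pos)
            if b"\xed\xb0\x80" <= c <= b"\xed\xb3\xbf":
                # Existing U+DCxx character: escape its first byte only,
                # the remaining bytes are escaped on later iterations.
                c = escape_byte(s[pos])
                pos += 1
            else:
                pos += len(c)
        except UnicodeDecodeError:
            c = escape_byte(s[pos])
            pos += 1
        out += c
    return bytes(out)


def from_utf8b_sketch(s: bytes) -> bytes:
    """Assumed inverse: turn each U+DCxx character back into one byte."""
    out = bytearray()
    for ch in s.decode("utf-8", "surrogatepass"):
        cp = ord(ch)
        if 0xDC00 <= cp <= 0xDCFF:
            out.append(cp & 0xFF)
        else:
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)


if __name__ == "__main__":
    raw = b"\xed\xb0\x80"  # UTF-8 form of U+DC00 appearing in the input
    escaped = to_utf8b_sketch(raw)
    print(escaped.hex(), from_utf8b_sketch(escaped) == raw)
```

Without the re-escaping branch, the three bytes of `raw` would pass through unchanged, and the assumed inverse would collapse them into a single byte, so the round trip would not restore the input.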
