replace Python standard textwrap by MBCS sensitive one for i18n text

Mercurial has problems with text wrapping/filling in MBCS encoding environments, because Python's standard 'textwrap' module cannot handle such text correctly: it can split the byte sequence of a single character across two lines.

According to the Unicode specification, "east asian width" classifies characters into:

    W(ide), N(arrow), F(ull-width), H(alf-width), A(mbiguous)

W/N/F/H can always be recognized as 2/1/2/1 bytes in the byte sequence, but 'A' cannot. The size of 'A' depends on the language in which it is used.

The Unicode specification says:

    If the context (= language) cannot be established reliably they should be treated as narrow characters by default

but many class 'A' characters are full-width, at least in Japanese environments. So this patch always treats class 'A' characters as full-width, for safe wrapping.

This patch focuses only on MBCS safety, not on strict wrapping according to the writing/printing rules of each language.

The MBCS sensitive textwrap class was originally implemented by ITO Nobuaki <daydream.trippers@gmail.com>.
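
The width rule described above can be illustrated with a minimal sketch. This is not the code from the patch: it assumes Python 3, works on already-decoded text rather than on byte strings, and ignores zero-width and combining characters. The helper names char_width, display_width and wrap_by_columns are hypothetical; only unicodedata.east_asian_width comes from the standard library.

```python
import unicodedata


def char_width(ch, ambiguous_wide=True):
    """Return the assumed number of columns for one character.

    W and F count as 2. Class A counts as 2 when ambiguous_wide is true
    (the conservative choice described in the commit message) and as 1
    otherwise. Everything else counts as 1.
    """
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2
    if eaw == "A":
        return 2 if ambiguous_wide else 1
    return 1


def display_width(text, ambiguous_wide=True):
    """Sum the per-character widths of a decoded string."""
    return sum(char_width(ch, ambiguous_wide) for ch in text)


def wrap_by_columns(text, width):
    """Greedily break text into lines of at most `width` columns.

    Hypothetical helper: it breaks between characters only, so a
    multi-byte character is never split, but it knows nothing about
    words or language-specific line-breaking rules.
    """
    lines, current, used = [], [], 0
    for ch in text:
        w = char_width(ch)
        if current and used + w > width:
            lines.append("".join(current))
            current, used = [], 0
        current.append(ch)
        used += w
    if current:
        lines.append("".join(current))
    return lines
```

For example, wrap_by_columns("日本語テキスト", 5) breaks after every second character, because each character is counted as two columns and a third one would exceed the limit. Counting class 'A' as wide can make lines shorter than strictly necessary in non-East-Asian locales, which is the trade-off the commit message accepts for safety.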

File last commit:

r11297:d320e704 default
test-encoding.out
174 lines | 4.8 KiB | text/plain | TextLexer
adding changesets
adding manifests
adding file changes
added 2 changesets with 2 changes to 1 files
(run 'hg update' to get a working copy)
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
% should fail with encoding error
M a
? latin-1
? latin-1-tag
? utf-8
transaction abort!
rollback completed
abort: decoding near ' encoded: �': 'ascii' codec can't decode byte 0xe9 in position 20: ordinal not in range(128)!
% these should work
marked working directory as branch �
% hg log (ascii)
changeset: 5:db5520b4645f
branch: ?
tag: tip
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: latin1 branch
changeset: 4:9cff3c980b58
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: Added tag ? for changeset 770b9b11621d
changeset: 3:770b9b11621d
tag: ?
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: utf-8 e' encoded: ?
changeset: 2:0572af48b948
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: latin-1 e' encoded: ?
changeset: 1:0e5b7e3f9c4a
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: koi8-r: ????? = u'\u0440\u0442\u0443\u0442\u044c'
changeset: 0:1e78a93102a3
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: latin-1 e': ? = u'\xe9'
% hg log (latin-1)
changeset: 5:db5520b4645f
branch: �
tag: tip
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: latin1 branch
changeset: 4:9cff3c980b58
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: Added tag � for changeset 770b9b11621d
changeset: 3:770b9b11621d
tag: �
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: utf-8 e' encoded: �
changeset: 2:0572af48b948
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: latin-1 e' encoded: �
changeset: 1:0e5b7e3f9c4a
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: koi8-r: ����� = u'\u0440\u0442\u0443\u0442\u044c'
changeset: 0:1e78a93102a3
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: latin-1 e': � = u'\xe9'
% hg log (utf-8)
changeset: 5:db5520b4645f
branch: é
tag: tip
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: latin1 branch
changeset: 4:9cff3c980b58
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: Added tag é for changeset 770b9b11621d
changeset: 3:770b9b11621d
tag: é
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: utf-8 e' encoded: é
changeset: 2:0572af48b948
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: latin-1 e' encoded: é
changeset: 1:0e5b7e3f9c4a
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: koi8-r: ÒÔÕÔØ = u'\u0440\u0442\u0443\u0442\u044c'
changeset: 0:1e78a93102a3
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: latin-1 e': é = u'\xe9'
% hg tags (ascii)
tip 5:db5520b4645f
? 3:770b9b11621d
% hg tags (latin-1)
tip 5:db5520b4645f
� 3:770b9b11621d
% hg tags (utf-8)
tip 5:db5520b4645f
é 3:770b9b11621d
% hg branches (ascii)
? 5:db5520b4645f
default 4:9cff3c980b58 (inactive)
% hg branches (latin-1)
� 5:db5520b4645f
default 4:9cff3c980b58 (inactive)
% hg branches (utf-8)
é 5:db5520b4645f
default 4:9cff3c980b58 (inactive)
% hg log (utf-8)
changeset: 5:db5520b4645f
branch: é
tag: tip
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: latin1 branch
changeset: 4:9cff3c980b58
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: Added tag é for changeset 770b9b11621d
changeset: 3:770b9b11621d
tag: é
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: utf-8 e' encoded: é
changeset: 2:0572af48b948
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: latin-1 e' encoded: é
changeset: 1:0e5b7e3f9c4a
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: koi8-r: ртуть = u'\u0440\u0442\u0443\u0442\u044c'
changeset: 0:1e78a93102a3
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: latin-1 e': И = u'\xe9'
% hg log (dolphin)
abort: unknown encoding: dolphin, please check your locale settings
abort: decoding near '�': 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)!
abort: branch name not in UTF-8!