convert: transcode CVS log messages by specified encoding (issue5597)...
FUJIWARA Katsunori -
r33388:0823f098 default
@@ -228,6 +228,12 b' def convert(ui, src, dest=None, revmapfi'
228 228 part of a changeset then the default may not be long enough.
229 229 The default is 60.
230 230
231 :convert.cvsps.logencoding: Specify encoding name to be used for
232 transcoding CVS log messages. Multiple encoding names can be
233 specified as a list (see :hg:`help config.Syntax`), but only
234 the first acceptable encoding in the list is used per CVS log
235 entries. This transcoding is executed before cvslog hook below.
236
231 237 :convert.cvsps.mergeto: Specify a regular expression to which
232 238 commit log messages are matched. If a match occurs, then the
233 239 conversion process will insert a dummy revision merging the
@@ -12,6 +12,7 b' import re'
12 12 from mercurial.i18n import _
13 13 from mercurial import (
14 14 encoding,
15 error,
15 16 hook,
16 17 pycompat,
17 18 util,
@@ -491,6 +492,35 b' def createlog(ui, directory=None, root="'
491 492
492 493     ui.status(_('%d log entries\n') % len(log))
493 494
    495     encodings = ui.configlist('convert', 'cvsps.logencoding')
    496     if encodings:
    497         def revstr(r):
    498             # this is needed, because logentry.revision is a tuple of "int"
    499             # (e.g. (1, 2) for "1.2")
    500             return '.'.join(pycompat.maplist(pycompat.bytestr, r))
    501
    502         for entry in log:
    503             comment = entry.comment
    504             for e in encodings:
    505                 try:
    506                     entry.comment = comment.decode(e).encode('utf-8')
    507                     if ui.debugflag:
    508                         ui.debug("transcoding by %s: %s of %s\n" %
    509                                  (e, revstr(entry.revision), entry.file))
    510                     break
    511                 except UnicodeDecodeError:
    512                     pass # try next encoding
    513                 except LookupError as inst: # unknown encoding, maybe
    514                     raise error.Abort(inst,
    515                                       hint=_('check convert.cvsps.logencoding'
    516                                              ' configuration'))
    517             else:
    518                 raise error.Abort(_("no encoding can transcode"
    519                                     " CVS log message for %s of %s")
    520                                   % (revstr(entry.revision), entry.file),
    521                                   hint=_('check convert.cvsps.logencoding'
    522                                          ' configuration'))
    523
494 524     hook.hook(ui, None, "cvslog", True, log=log)
495 525
496 526     return log
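
The hunk above tries each configured encoding in order and keeps the first one that decodes the log message. The following is a minimal standalone sketch of that "first acceptable encoding" idea in plain Python 3, not the Mercurial implementation. The function name `transcode_first_match` and the use of `ValueError` in place of `error.Abort` are assumptions made for illustration.

```python
def transcode_first_match(raw, encodings):
    """Re-encode ``raw`` bytes as UTF-8 using the first encoding that decodes them.

    Returns a ``(utf8_bytes, encoding_name)`` pair. This is a hypothetical
    helper that mirrors the loop in the hunk above; it is not Mercurial API.
    """
    for name in encodings:
        try:
            # An unknown codec name raises LookupError here; it is left to
            # propagate, since it indicates a configuration mistake.
            return raw.decode(name).encode("utf-8"), name
        except UnicodeDecodeError:
            continue  # this encoding cannot decode the bytes; try the next one
    # Assumption: ValueError stands in for the abort raised by the real code.
    raise ValueError("no encoding can transcode %r" % (raw,))


if __name__ == "__main__":
    candidates = ["utf-8", "euc-jp", "cp932"]
    for raw in (b"Initial revision", b"\xe3\x81\x82", b"\xa4\xa2", b"\x82\xa0"):
        converted, used = transcode_first_match(raw, candidates)
        print("transcoding by %s: %r -> %r" % (used, raw, converted))
```

Because the list is scanned in order, the order of the configured encodings matters: a byte sequence that is valid in several encodings is always decoded with the earliest one listed.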
@@ -498,3 +498,157 b' update and verify the cvsps cache'
498 498
499 499
500 500 $ cd ..
501
502 Test transcoding CVS log messages (issue5597)
503 =============================================
504
505 To emulate commit messages in (non-ascii) multiple encodings portably,
506 this test scenario writes CVS history file (*,v file) directly via
507 python code.
508
509 Commit messages of version 1.2 - 1.4 use u3042 in 3 encodings below.
510
511 |encoding  |byte sequence | decodable as:      |
512 |          |              | utf-8 euc-jp cp932 |
513 +----------+--------------+--------------------+
514 |utf-8     |\xe3\x81\x82  |  o      x     x    |
515 |euc-jp    |\xa4\xa2      |  x      o     o    |
516 |cp932     |\x82\xa0      |  x      x     o    |
517
518 $ mkdir -p cvsrepo/transcoding
519 $ python <<EOF
520 > fp = open('cvsrepo/transcoding/file,v', 'w')
521 > fp.write(('''
522 > head 1.4;
523 > access;
524 > symbols
525 > start:1.1.1.1 INITIAL:1.1.1;
526 > locks; strict;
527 > comment @# @;
528 >
529 >
530 > 1.4
531 > date 2017.07.10.00.00.04; author nobody; state Exp;
532 > branches;
533 > next 1.3;
534 > commitid 10059635D016A510FFA;
535 >
536 > 1.3
537 > date 2017.07.10.00.00.03; author nobody; state Exp;
538 > branches;
539 > next 1.2;
540 > commitid 10059635CFF6A4FF34E;
541 >
542 > 1.2
543 > date 2017.07.10.00.00.02; author nobody; state Exp;
544 > branches;
545 > next 1.1;
546 > commitid 10059635CFD6A4D5095;
547 >
548 > 1.1
549 > date 2017.07.10.00.00.01; author nobody; state Exp;
550 > branches
551 > 1.1.1.1;
552 > next ;
553 > commitid 10059635CFB6A4A3C33;
554 >
555 > 1.1.1.1
556 > date 2017.07.10.00.00.01; author nobody; state Exp;
557 > branches;
558 > next ;
559 > commitid 10059635CFB6A4A3C33;
560 >
561 >
562 > desc
563 > @@
564 >
565 >
566 > 1.4
567 > log
568 > @''' + u'\u3042'.encode('cp932') + ''' (cp932)
569 > @
570 > text
571 > @1
572 > 2
573 > 3
574 > 4
575 > @
576 >
577 >
578 > 1.3
579 > log
580 > @''' + u'\u3042'.encode('euc-jp') + ''' (euc-jp)
581 > @
582 > text
583 > @d4 1
584 > @
585 >
586 >
587 > 1.2
588 > log
589 > @''' + u'\u3042'.encode('utf-8') + ''' (utf-8)
590 > @
591 > text
592 > @d3 1
593 > @
594 >
595 >
596 > 1.1
597 > log
598 > @Initial revision
599 > @
600 > text
601 > @d2 1
602 > @
603 >
604 >
605 > 1.1.1.1
606 > log
607 > @import
608 > @
609 > text
610 > @@
611 > ''').lstrip())
612 > EOF
613
614 $ cvscall -q checkout transcoding
615 U transcoding/file
616
617 Test converting in normal case
618 ------------------------------
619
620 (filtering by grep in order to check only form of debug messages)
621
622 $ hg convert --config convert.cvsps.logencoding=utf-8,euc-jp,cp932 -q --debug transcoding transcoding-hg | grep 'transcoding by'
623 transcoding by utf-8: 1.1 of file
624 transcoding by utf-8: 1.1.1.1 of file
625 transcoding by utf-8: 1.2 of file
626 transcoding by euc-jp: 1.3 of file
627 transcoding by cp932: 1.4 of file
628 $ hg -R transcoding-hg --encoding utf-8 log -T "{rev}: {desc}\n"
629 5: update tags
630 4: import
631 3: \xe3\x81\x82 (cp932) (esc)
632 2: \xe3\x81\x82 (euc-jp) (esc)
633 1: \xe3\x81\x82 (utf-8) (esc)
634 0: Initial revision
635 $ rm -rf transcoding-hg
636
637 Test converting in error cases
638 ------------------------------
639
640 unknown encoding in convert.cvsps.logencoding
641
642 $ hg convert --config convert.cvsps.logencoding=foobar -q transcoding transcoding-hg
643 abort: unknown encoding: foobar
644 (check convert.cvsps.logencoding configuration)
645 [255]
646 $ rm -rf transcoding-hg
647
648 no acceptable encoding in convert.cvsps.logencoding
649
650 $ hg convert --config convert.cvsps.logencoding=utf-8,euc-jp -q transcoding transcoding-hg
651 abort: no encoding can transcode CVS log message for 1.4 of file
652 (check convert.cvsps.logencoding configuration)
653 [255]
654 $ rm -rf transcoding-hg
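
The decodability table in the test scenario above (U+3042 encoded three ways, each tried against three codecs) can be reproduced with a short Python 3 script. This is a sketch for checking the table by hand, not part of the test suite. The names `SAMPLES`, `CODECS`, `decodable`, and `decodability_matrix` are made up for this example.

```python
# Byte sequences for U+3042 as listed in the table of the test scenario.
SAMPLES = {
    "utf-8": b"\xe3\x81\x82",
    "euc-jp": b"\xa4\xa2",
    "cp932": b"\x82\xa0",
}
CODECS = ["utf-8", "euc-jp", "cp932"]


def decodable(raw, codec):
    """Return True if ``raw`` decodes without error under ``codec``."""
    try:
        raw.decode(codec)
        return True
    except UnicodeDecodeError:
        return False


def decodability_matrix():
    """Map each source encoding to a list of booleans, one per codec in CODECS."""
    return {src: [decodable(raw, c) for c in CODECS]
            for src, raw in SAMPLES.items()}


if __name__ == "__main__":
    for src, row in decodability_matrix().items():
        marks = " ".join("o" if ok else "x" for ok in row)
        print("%-7s %s" % (src, marks))
```

The euc-jp row is the one that makes list order significant: its bytes also decode under cp932 (as a different string), so a configuration listing cp932 before euc-jp would pick cp932 for that message.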
@@ -171,6 +171,12 b''
171 171 single changeset. When very large files were checked in as
172 172 part of a changeset then the default may not be long enough.
173 173 The default is 60.
174 convert.cvsps.logencoding
175 Specify encoding name to be used for transcoding CVS log
176 messages. Multiple encoding names can be specified as a list
177 (see 'hg help config.Syntax'), but only the first acceptable
178 encoding in the list is used per CVS log entries. This
179 transcoding is executed before cvslog hook below.
174 180 convert.cvsps.mergeto
175 181 Specify a regular expression to which commit log messages
176 182 are matched. If a match occurs, then the conversion process