upstream/mercurial-mirror Commit - r10095:69ce7a10

convert: implement two hooks in builtin cvsps

Frank Kingswood

r10095:69ce7a10 default

hgext/convert/__init__.py

0 +9 0

              # convert.py Foreign SCM converter
              #
              # Copyright 2005-2007 Matt Mackall <mpm@selenic.com>
              #
              # This software may be used and distributed according to the terms of the
              # GNU General Public License version 2, incorporated herein by reference.
              '''import revisions from foreign VCS repositories into Mercurial'''
              import convcmd
              import cvsps
              import subversion
              from mercurial import commands
              from mercurial.i18n import _
              # Commands definition was moved elsewhere to ease demandload job.
              def convert(ui, src, dest=None, revmapfile=None, **opts):
                  """convert a foreign SCM repository to a Mercurial one.
                  Accepted source formats [identifiers]:
                  - Mercurial [hg]
                  - CVS [cvs]
                  - Darcs [darcs]
                  - git [git]
                  - Subversion [svn]
                  - Monotone [mtn]
                  - GNU Arch [gnuarch]
                  - Bazaar [bzr]
                  - Perforce [p4]
                  Accepted destination formats [identifiers]:
                  - Mercurial [hg]
                  - Subversion [svn] (history on branches is not preserved)
                  If no revision is given, all revisions will be converted.
                  Otherwise, convert will only import up to the named revision
                  (given in a format understood by the source).
                  If no destination directory name is specified, it defaults to the
                  basename of the source with '-hg' appended. If the destination
                  repository doesn't exist, it will be created.
                  By default, all sources except Mercurial will use --branchsort.
                  Mercurial uses --sourcesort to preserve the original revision number
                  order. Sort modes have the following effects:
                  --branchsort  convert from parent to child revision when possible,
                                which means branches are usually converted one after
                                the other. It generates more compact repositories.
                  --datesort    sort revisions by date. Converted repositories have
                                good-looking changelogs but are often an order of
                                magnitude larger than the same ones generated by
                                --branchsort.
                  --sourcesort  try to preserve source revisions order, only
                                supported by Mercurial sources.
                  If <REVMAP> isn't given, it will be put in a default location
                  (<dest>/.hg/shamap by default). The <REVMAP> is a simple text file
                  that maps each source commit ID to the destination ID for that
                  revision, like so::
                    <source ID> <destination ID>
                  If the file doesn't exist, it's automatically created. It's
                  updated on each commit copied, so convert-repo can be interrupted
                  and can be run repeatedly to copy new commits.
                  The [username mapping] file is a simple text file that maps each
                  source commit author to a destination commit author. It is handy
                  for source SCMs that use Unix logins to identify authors (e.g.
                  CVS). One line per author mapping and the line format is:
                  srcauthor=whatever string you want
                  The filemap is a file that allows filtering and remapping of files
                  and directories. Comment lines start with '#'. Each line can
                  contain one of the following directives::
                    include path/to/file
                    exclude path/to/file
                    rename from/file to/file
                  The 'include' directive causes a file, or all files under a
                  directory, to be included in the destination repository, and
                  excludes all other files and directories not explicitly
                  included. The 'exclude' directive causes files or directories to
                  be omitted. The 'rename' directive renames a file or directory. To
                  rename from a subdirectory into the root of the repository, use
                  '.' as the path to rename to.
                  The splicemap is a file that allows insertion of synthetic
                  history, letting you specify the parents of a revision. This is
                  useful if you want to e.g. give a Subversion merge two parents, or
                  graft two disconnected series of history together. Each entry
                  contains a key, followed by a space, followed by one or two
                  comma-separated values. The key is the revision ID in the source
                  revision control system whose parents should be modified (same
                  format as a key in .hg/shamap). The values are the revision IDs
                  (in either the source or destination revision control system) that
                  should be used as the new parents for that node. For example, if
                  you have merged "release-1.0" into "trunk", then you should
                  specify the revision on "trunk" as the first parent and the one on
                  the "release-1.0" branch as the second.
                  The branchmap is a file that allows you to rename a branch when it is
                  being brought in from whatever external repository. When used in
                  conjunction with a splicemap, it allows for a powerful combination
                  to help fix even the most badly mismanaged repositories and turn them
                  into nicely structured Mercurial repositories. The branchmap contains
                  lines of the form "original_branch_name new_branch_name".
                  "original_branch_name" is the name of the branch in the source
                  repository, and "new_branch_name" is the name of the branch in the
                  destination repository. This can be used to (for instance) move code
                  in one repository from "default" to a named branch.
                  Mercurial Source
                  ----------------
                  --config convert.hg.ignoreerrors=False    (boolean)
                      ignore integrity errors when reading. Use it to fix Mercurial
                      repositories with missing revlogs, by converting from and to
                      Mercurial.
                  --config convert.hg.saverev=False         (boolean)
                      store original revision ID in changeset (forces target IDs to
                      change)
                  --config convert.hg.startrev=0            (hg revision identifier)
                      convert start revision and its descendants
                  CVS Source
                  ----------
                  CVS source will use a sandbox (i.e. a checked-out copy) from CVS
                  to indicate the starting point of what will be converted. Direct
                  access to the repository files is not needed, unless of course the
                  repository is :local:. The conversion uses the top level directory
                  in the sandbox to find the CVS repository, and then uses CVS rlog
                  commands to find files to convert. This means that unless a
                  filemap is given, all files under the starting directory will be
                  converted, and that any directory reorganization in the CVS
                  sandbox is ignored.
                  The options shown are the defaults.
                  --config convert.cvsps.cache=True         (boolean)
                      Set to False to disable remote log caching, for testing and
                      debugging purposes.
                  --config convert.cvsps.fuzz=60            (integer)
                      Specify the maximum time (in seconds) that is allowed between
                      commits with identical user and log message in a single
                      changeset. When very large files were checked in as part of a
                      changeset then the default may not be long enough.
                  --config convert.cvsps.mergeto='{{mergetobranch ([-\\w]+)}}'
                      Specify a regular expression to which commit log messages are
                      matched. If a match occurs, then the conversion process will
                      insert a dummy revision merging the branch on which this log
                      message occurs to the branch indicated in the regex.
                  --config convert.cvsps.mergefrom='{{mergefrombranch ([-\\w]+)}}'
                      Specify a regular expression to which commit log messages are
                      matched. If a match occurs, then the conversion process will
                      add the most recent revision on the branch indicated in the
                      regex as the second parent of the changeset.
+                 --config hooks.cvslog
+                     Specify a Python function to be called at the end of gathering
+                     the CVS log. The function is passed a list with the log entries,
+                     and can modify the entries in-place, or add or delete them.
+                 --config hooks.cvschangesets
+                     Specify a Python function to be called after the changesets
+                     are calculated from the CVS log. The function is passed
+                     a list with the changeset entries, and can modify the changesets
+                     in-place, or add or delete them.
                  An additional "debugcvsps" Mercurial command allows the builtin
                  changeset merging code to be run without doing a conversion. Its
                  parameters and output are similar to that of cvsps 2.1. Please see
                  the command help for more details.
                  Subversion Source
                  -----------------
                  Subversion source detects classical trunk/branches/tags layouts.
                  By default, the supplied "svn://repo/path/" source URL is
                  converted as a single branch. If "svn://repo/path/trunk" exists it
                  replaces the default branch. If "svn://repo/path/branches" exists,
                  its subdirectories are listed as possible branches. If
                  "svn://repo/path/tags" exists, it is searched for tags referencing
                  converted branches. Default "trunk", "branches" and "tags" values
                  can be overridden with the following options. Set them to paths
                  relative to the source URL, or leave them blank to disable auto
                  detection.
                  --config convert.svn.branches=branches    (directory name)
                      specify the directory containing branches
                  --config convert.svn.tags=tags            (directory name)
                      specify the directory containing tags
                  --config convert.svn.trunk=trunk          (directory name)
                      specify the name of the trunk branch
                  Source history can be retrieved starting at a specific revision,
                  instead of being converted in its entirety. Only single branch
                  conversions are supported.
                  --config convert.svn.startrev=0           (svn revision number)
                      specify start Subversion revision.
                  Perforce Source
                  ---------------
                  The Perforce (P4) importer can be given a p4 depot path or a
                  client specification as source. It will convert all files in the
                  source to a flat Mercurial repository, ignoring labels, branches
                  and integrations. Note that when a depot path is given you
                  should usually specify a target directory, because otherwise the
                  target may be named ...-hg.
                  It is possible to limit the amount of source history to be
                  converted by specifying an initial Perforce revision.
                  --config convert.p4.startrev=0            (perforce changelist number)
                      specify initial Perforce revision.
                  Mercurial Destination
                  ---------------------
                  --config convert.hg.clonebranches=False   (boolean)
                      dispatch source branches in separate clones.
                  --config convert.hg.tagsbranch=default    (branch name)
                      branch name for tag revisions
                  --config convert.hg.usebranchnames=True   (boolean)
                      preserve branch names
                  """
                  return convcmd.convert(ui, src, dest, revmapfile, **opts)
              def debugsvnlog(ui, **opts):
                  return subversion.debugsvnlog(ui, **opts)
              def debugcvsps(ui, *args, **opts):
                  '''create changeset information from CVS
                  This command is intended as a debugging tool for the CVS to
                  Mercurial converter, and can be used as a direct replacement for
                  cvsps.
                  Hg debugcvsps reads the CVS rlog for the current directory (or any
                  named directory) in the CVS repository, and converts the log to a
                  series of changesets based on matching commit log entries and
                  dates.'''
                  return cvsps.debugcvsps(ui, *args, **opts)
              commands.norepo += " convert debugsvnlog debugcvsps"
              cmdtable = {
                  "convert":
                      (convert,
                       [('A', 'authors', '', _('username mapping filename')),
                        ('d', 'dest-type', '', _('destination repository type')),
                        ('', 'filemap', '', _('remap file names using contents of file')),
                        ('r', 'rev', '', _('import up to target revision REV')),
                        ('s', 'source-type', '', _('source repository type')),
                        ('', 'splicemap', '', _('splice synthesized history into place')),
                        ('', 'branchmap', '', _('change branch names while converting')),
                        ('', 'branchsort', None, _('try to sort changesets by branches')),
                        ('', 'datesort', None, _('try to sort changesets by date')),
                        ('', 'sourcesort', None, _('preserve source changesets order'))],
                       _('hg convert [OPTION]... SOURCE [DEST [REVMAP]]')),
                  "debugsvnlog":
                      (debugsvnlog,
                       [],
                       'hg debugsvnlog'),
                  "debugcvsps":
                      (debugcvsps,
                       [
                        # Main options shared with cvsps-2.1
                        ('b', 'branches', [], _('only return changes on specified branches')),
                        ('p', 'prefix', '', _('prefix to remove from file names')),
                        ('r', 'revisions', [], _('only return changes after or between specified tags')),
                        ('u', 'update-cache', None, _("update cvs log cache")),
                        ('x', 'new-cache', None, _("create new cvs log cache")),
                        ('z', 'fuzz', 60, _('set commit time fuzz in seconds')),
                        ('', 'root', '', _('specify cvsroot')),
                        # Options specific to builtin cvsps
                        ('', 'parents', '', _('show parent changesets')),
                        ('', 'ancestors', '', _('show current changeset in ancestor branches')),
                        # Options that are ignored for compatibility with cvsps-2.1
                        ('A', 'cvs-direct', None, _('ignored for compatibility')),
                       ],
                       _('hg debugcvsps [OPTION]... [PATH]...')),
              }
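
Mercurial reads hooks from the `[hooks]` section of hgrc, and an entry whose value starts with `python:` runs in-process as a call of the form `f(ui, repo, hooktype, **kwargs)`. A `cvslog` hook could therefore look like the minimal sketch below. It assumes the gathered entries arrive as a keyword argument named `log` (the call site is not part of this excerpt); the module name `cvshooks`, `AUTHOR_MAP` and `SKIP_PREFIX` are hypothetical.

```python
# cvshooks.py: hypothetical module name; it must be importable (on sys.path)
# for an hgrc entry such as:
#
#   [hooks]
#   cvslog = python:cvshooks.cvslog
#
# ASSUMPTION: the converter passes the gathered entries as the keyword
# argument 'log'; the actual call site is not shown in this excerpt.

AUTHOR_MAP = {'jdoe': 'John Doe <jdoe@example.com>'}  # hypothetical mapping
SKIP_PREFIX = 'vendor/'                                # hypothetical directory


def cvslog(ui, repo, hooktype, **kwargs):
    """Drop entries under SKIP_PREFIX and rewrite authors, in place."""
    log = kwargs['log']
    # Slice assignment mutates the caller's list object; rebinding the
    # local name with 'log = [...]' would leave the converter's list as is.
    log[:] = [e for e in log if not e.file.startswith(SKIP_PREFIX)]
    for e in log:
        # logentry.author is the author name as CVS knows it
        e.author = AUTHOR_MAP.get(e.author, e.author)
    # Mercurial treats a true return value from a Python hook as failure.
    return False
```

Because this hook runs before changesets are computed, dropping entries here should keep those file revisions out of every changeset built afterwards.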

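The filemap directives described in the help text can be made concrete with a small parser and path mapper. This is a sketch of the semantics the help text describes, not the converter's own filemap code: `parse_filemap`, `map_path`, whitespace-free paths, and the rule that the longest matching prefix wins (with an exclude beating an include of equal length) are all assumptions for illustration.

```python
def parse_filemap(text):
    """Parse include/exclude/rename directives; '#' lines are comments."""
    fm = {'include': [], 'exclude': [], 'rename': {}}
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith('#'):
            continue
        words = line.split()  # assumption: paths contain no whitespace
        cmd, args = words[0], words[1:]
        if cmd in ('include', 'exclude') and len(args) == 1:
            fm[cmd].append(args[0].rstrip('/'))
        elif cmd == 'rename' and len(args) == 2:
            fm['rename'][args[0].rstrip('/')] = args[1].rstrip('/')
        else:
            raise ValueError('line %d: bad filemap directive %r' % (lineno, line))
    return fm


def _covers(prefix, path):
    """True if prefix names path itself or a directory above it."""
    return path == prefix or path.startswith(prefix + '/')


def _longest(prefixes, path):
    """Most specific (longest) prefix covering path, or None."""
    best = None
    for p in prefixes:
        if _covers(p, path) and (best is None or len(p) > len(best)):
            best = p
    return best


def map_path(fm, path):
    """Return the destination path, or None if the file is filtered out."""
    inc = _longest(fm['include'], path)
    exc = _longest(fm['exclude'], path)
    if fm['include'] and inc is None:
        return None  # includes exist, and none covers this path
    if exc is not None and (inc is None or len(exc) >= len(inc)):
        return None  # a more specific (or equally specific) exclude wins
    src = _longest(fm['rename'], path)
    if src is None:
        return path
    dst = fm['rename'][src]
    rest = path[len(src):].lstrip('/')
    if dst == '.':
        return rest  # '.' moves the renamed subtree to the repository root
    return dst + '/' + rest if rest else dst


fm = parse_filemap("include src\nexclude src/tests\nrename src/lib .\n")
print(map_path(fm, 'src/lib/util.c'))  # util.c
print(map_path(fm, 'src/tests/t.c'))   # None
```

Treating directories as path prefixes is what lets a single `include` or `rename` line cover a whole subtree.
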
hgext/convert/cvsps.py

0 +5 0
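
The cache-file naming in `createlog` (listed below) is meant to keep `:pserver:user@server:/path` and `/pserver/user/server/path` apart. The helper here copies those lines into a standalone function so the claim can be checked; `cachefilename` is a hypothetical name, not part of cvsps.py.

```python
import os
import re


def cachefilename(cachedir, root, directory):
    """Standalone copy of the cache-file naming in createlog (hypothetical name)."""
    # Split the CVSROOT on ':' and append the module directory and a suffix.
    parts = root.split(":") + [directory, "cache"]
    # Within each component keep only runs of word characters joined by '-',
    # so '/' and '@' collapse while component boundaries survive.
    parts = ['-'.join(re.findall(r'\w+', s)) for s in parts if s]
    # Components are joined with '.', which never occurs inside a part.
    return os.path.join(cachedir, '.'.join([s for s in parts if s]))


print(cachefilename('', ':pserver:user@server:/path', 'mod'))
# pserver.user-server.path.mod.cache
print(cachefilename('', '/pserver/user/server/path', 'mod'))
# pserver-user-server-path.mod.cache
```

Colons become '.' boundaries while other separators become '-', which is why these two spellings differ; other inputs can still collide, for example `user@host` and `user-host` both become `user-host`.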

              #
              # Mercurial built-in replacement for cvsps.
              #
              # Copyright 2008, Frank Kingswood <frank@kingswood-consulting.co.uk>
              #
              # This software may be used and distributed according to the terms of the
              # GNU General Public License version 2, incorporated herein by reference.
              import os
              import re
              import cPickle as pickle
              from mercurial import util
              from mercurial.i18n import _
+             from mercurial import hook
              class logentry(object):
                  '''Class logentry has the following attributes:
                      .author    - author name as CVS knows it
                      .branch    - name of branch this revision is on
                      .branches  - revision tuple of branches starting at this revision
                      .comment   - commit message
                      .date      - the commit date as a (time, tz) tuple
                      .dead      - true if file revision is dead
                      .file      - Name of file
                      .lines     - a tuple (+lines, -lines) or None
                      .parent    - Previous revision of this entry
                      .rcs       - name of file as returned from CVS
                      .revision  - revision number as tuple
                      .tags      - list of tags on the file
                      .synthetic - is this a synthetic "file ... added on ..." revision?
                      .mergepoint- the branch that has been merged from
                                   (if present in rlog output)
                      .branchpoints- the branches that start at the current entry
                  '''
                  def __init__(self, **entries):
                      self.__dict__.update(entries)
                  def __repr__(self):
                      return "<%s at 0x%x: %s %s>" % (self.__class__.__name__,
                                                      id(self),
                                                      self.file,
                                                      ".".join(map(str, self.revision)))
              class logerror(Exception):
                  pass
              def getrepopath(cvspath):
                  """Return the repository path from a CVS path.
                  >>> getrepopath('/foo/bar')
                  '/foo/bar'
                  >>> getrepopath('c:/foo/bar')
                  'c:/foo/bar'
                  >>> getrepopath(':pserver:10/foo/bar')
                  '/foo/bar'
                  >>> getrepopath(':pserver:10c:/foo/bar')
                  '/foo/bar'
                  >>> getrepopath(':pserver:/foo/bar')
                  '/foo/bar'
                  >>> getrepopath(':pserver:c:/foo/bar')
                  'c:/foo/bar'
                  >>> getrepopath(':pserver:truc@foo.bar:/foo/bar')
                  '/foo/bar'
                  >>> getrepopath(':pserver:truc@foo.bar:c:/foo/bar')
                  'c:/foo/bar'
                  """
                  # According to CVS manual, CVS paths are expressed like:
                  # [:method:][[user][:password]@]hostname[:[port]]/path/to/repository
                  #
                  # Unfortunately, Windows absolute paths start with a drive letter
                  # like 'c:' making it harder to parse. Here we assume that drive
                  # letters are only one character long and any CVS component before
                  # the repository path is at least 2 characters long, and use this
                  # to disambiguate.
                  parts = cvspath.split(':')
                  if len(parts) == 1:
                      return parts[0]
                  # Here there is an ambiguous case if we have a port number
                  # immediately followed by a Windows drive letter. We assume this
                  # never happens and decide it must be a CVS path component,
                  # therefore ignoring it.
                  if len(parts[-2]) > 1:
                      return parts[-1].lstrip('0123456789')
                  return parts[-2] + ':' + parts[-1]
              def createlog(ui, directory=None, root="", rlog=True, cache=None):
                  '''Collect the CVS rlog'''
                  # Because we store many duplicate commit log messages, reusing strings
                  # saves a lot of memory and pickle storage space.
                  _scache = {}
                  def scache(s):
                      "return a shared version of a string"
                      return _scache.setdefault(s, s)
                  ui.status(_('collecting CVS rlog\n'))
                  log = []      # list of logentry objects containing the CVS state
                  # patterns to match in CVS (r)log output, by state of use
                  re_00 = re.compile('RCS file: (.+)$')
                  re_01 = re.compile('cvs \\[r?log aborted\\]: (.+)$')
                  re_02 = re.compile('cvs (r?log|server): (.+)\n$')
                  re_03 = re.compile("(Cannot access.+CVSROOT)|"
                                     "(can't create temporary directory.+)$")
                  re_10 = re.compile('Working file: (.+)$')
                  re_20 = re.compile('symbolic names:')
                  re_30 = re.compile('\t(.+): ([\\d.]+)$')
                  re_31 = re.compile('----------------------------$')
                  re_32 = re.compile('======================================='
                                     '======================================$')
                  re_50 = re.compile('revision ([\\d.]+)(\s+locked by:\s+.+;)?$')
                  re_60 = re.compile(r'date:\s+(.+);\s+author:\s+(.+);\s+state:\s+(.+?);'
                                     r'(\s+lines:\s+(\+\d+)?\s+(-\d+)?;)?'
                                     r'(.*mergepoint:\s+([^;]+);)?')
                  re_70 = re.compile('branches: (.+);$')
                  file_added_re = re.compile(r'file [^/]+ was (initially )?added on branch')
                  prefix = ''   # leading path to strip from what we get from CVS
                  if directory is None:
                      # Current working directory
                      # Get the real directory in the repository
                      try:
                          prefix = open(os.path.join('CVS','Repository')).read().strip()
                          if prefix == ".":
                              prefix = ""
                          directory = prefix
                      except IOError:
                          raise logerror('Not a CVS sandbox')
                      if prefix and not prefix.endswith(os.sep):
                          prefix += os.sep
                      # Use the Root file in the sandbox, if it exists
                      try:
                          root = open(os.path.join('CVS','Root')).read().strip()
                      except IOError:
                          pass
                  if not root:
                      root = os.environ.get('CVSROOT', '')
                  # read log cache if one exists
                  oldlog = []
                  date = None
                  if cache:
                      cachedir = os.path.expanduser('~/.hg.cvsps')
                      if not os.path.exists(cachedir):
                          os.mkdir(cachedir)
                      # The cvsps cache pickle needs a uniquified name, based on the
                      # repository location. The address may have all sorts of nasties
                      # in it, slashes, colons and such. So here we take just the
                      # alphanumerics, concatenated in a way that does not mix up the
                      # various components, so that
                      #    :pserver:user@server:/path
                      # and
                      #    /pserver/user/server/path
                      # are mapped to different cache file names.
                      cachefile = root.split(":") + [directory, "cache"]
                      cachefile = ['-'.join(re.findall(r'\w+', s)) for s in cachefile if s]
                      cachefile = os.path.join(cachedir,
                                               '.'.join([s for s in cachefile if s]))
                  if cache == 'update':
                      try:
                          ui.note(_('reading cvs log cache %s\n') % cachefile)
                          oldlog = pickle.load(open(cachefile))
                          ui.note(_('cache has %d log entries\n') % len(oldlog))
                      except Exception, e:
                          ui.note(_('error reading cache: %r\n') % e)
                      if oldlog:
                          date = oldlog[-1].date    # last commit date as a (time,tz) tuple
                          date = util.datestr(date, '%Y/%m/%d %H:%M:%S %1%2')
                  # build the CVS commandline
                  cmd = ['cvs', '-q']
                  if root:
                      cmd.append('-d%s' % root)
                      p = util.normpath(getrepopath(root))
                      if not p.endswith('/'):
                          p += '/'
                      prefix = p + util.normpath(prefix)
                  cmd.append(['log', 'rlog'][rlog])
                  if date:
                      # no space between option and date string
                      cmd.append('-d>%s' % date)
                  cmd.append(directory)
                  # state machine begins here
                  tags = {}     # dictionary of revisions on current file with their tags
                  branchmap = {} # mapping between branch names and revision numbers
                  state = 0
                  store = False # set when a new record can be appended
                  cmd = [util.shellquote(arg) for arg in cmd]
                  ui.note(_("running %s\n") % (' '.join(cmd)))
                  ui.debug("prefix=%r directory=%r root=%r\n" % (prefix, directory, root))
                  pfp = util.popen(' '.join(cmd))
                  peek = pfp.readline()
                  while True:
                      line = peek
                      if line == '':
                          break
                      peek = pfp.readline()
                      if line.endswith('\n'):
                          line = line[:-1]
                      #ui.debug('state=%d line=%r\n' % (state, line))
                      if state == 0:
                          # initial state, consume input until we see 'RCS file'
                          match = re_00.match(line)
                          if match:
                              rcs = match.group(1)
                              tags = {}
                              if rlog:
                                  filename = util.normpath(rcs[:-2])
                                  if filename.startswith(prefix):
                                      filename = filename[len(prefix):]
                                  if filename.startswith('/'):
                                      filename = filename[1:]
                                  if filename.startswith('Attic/'):
                                      filename = filename[6:]
                                  else:
                                      filename = filename.replace('/Attic/', '/')
                                  state = 2
                                  continue
                              state = 1
                              continue
                          match = re_01.match(line)
                          if match:
                              raise Exception(match.group(1))
                          match = re_02.match(line)
                          if match:
                              raise Exception(match.group(2))
                          if re_03.match(line):
                              raise Exception(line)
                      elif state == 1:
                          # expect 'Working file' (only when using log instead of rlog)
                          match = re_10.match(line)
                          assert match, _('RCS file must be followed by working file')
                          filename = util.normpath(match.group(1))
                          state = 2
                      elif state == 2:
                          # expect 'symbolic names'
                          if re_20.match(line):
                              branchmap = {}
                              state = 3
                      elif state == 3:
                          # read the symbolic names and store as tags
                          match = re_30.match(line)
                          if match:
                              rev = [int(x) for x in match.group(2).split('.')]
                              # Convert magic branch number to an odd-numbered one
                              revn = len(rev)
                              if revn > 3 and (revn % 2) == 0 and rev[-2] == 0:
                                  rev = rev[:-2] + rev[-1:]
                              rev = tuple(rev)
                              if rev not in tags:
                                  tags[rev] = []
                              tags[rev].append(match.group(1))
                              branchmap[match.group(1)] = match.group(2)
                          elif re_31.match(line):
                              state = 5
                          elif re_32.match(line):
                              state = 0
                      elif state == 4:
                          # expecting '------' separator before first revision
                          if re_31.match(line):
                              state = 5
                          else:
                              assert not re_32.match(line), _('must have at least '
                                                              'some revisions')
                      elif state == 5:
                          # expecting revision number and possibly (ignored) lock indication
                          # we create the logentry here from values stored in states 0 to 4,
                          # as this state is re-entered for subsequent revisions of a file.
                          match = re_50.match(line)
                          assert match, _('expected revision number')
                          e = logentry(rcs=scache(rcs), file=scache(filename),
                                  revision=tuple([int(x) for x in match.group(1).split('.')]),
                                  branches=[], parent=None,
                                  synthetic=False)
                          state = 6
                      elif state == 6:
                          # expecting date, author, state, lines changed
                          match = re_60.match(line)
                          assert match, _('revision must be followed by date line')
                          d = match.group(1)
                          if d[2] == '/':
                              # Y2K
                              d = '19' + d
                          if len(d.split()) != 3:
                              # cvs log dates always in GMT
                              d = d + ' UTC'
                          e.date = util.parsedate(d, ['%y/%m/%d %H:%M:%S',
                                                      '%Y/%m/%d %H:%M:%S',
                                                      '%Y-%m-%d %H:%M:%S'])
                          e.author = scache(match.group(2))
                          e.dead = match.group(3).lower() == 'dead'
                          if match.group(5):
                              if match.group(6):
                                  e.lines = (int(match.group(5)), int(match.group(6)))
                              else:
                                  e.lines = (int(match.group(5)), 0)
                          elif match.group(6):
                              e.lines = (0, int(match.group(6)))
                          else:
                              e.lines = None
                          if match.group(7): # cvsnt mergepoint
                              myrev = match.group(8).split('.')
                              if len(myrev) == 2: # head
                                  e.mergepoint = 'HEAD'
                              else:
                                  myrev = '.'.join(myrev[:-2] + ['0', myrev[-2]])
                                  branches = [b for b in branchmap if branchmap[b] == myrev]
                                  assert len(branches) == 1, 'unknown branch: %s' % e.mergepoint
                                  e.mergepoint = branches[0]
                          else:
                              e.mergepoint = None
                          e.comment = []
                          state = 7
                      elif state == 7:
                          # read the revision numbers of branches that start at this revision
                          # or store the commit log message otherwise
                          m = re_70.match(line)
                          if m:
                              e.branches = [tuple([int(y) for y in x.strip().split('.')])
                                              for x in m.group(1).split(';')]
                              state = 8
                          elif re_31.match(line) and re_50.match(peek):
                              state = 5
                              store = True
                          elif re_32.match(line):
                              state = 0
                              store = True
                          else:
                              e.comment.append(line)
                      elif state == 8:
                          # store commit log message
                          if re_31.match(line):
                              state = 5
                              store = True
                          elif re_32.match(line):
                              state = 0
                              store = True
                          else:
                              e.comment.append(line)
                      # When a file is added on a branch B1, CVS creates a synthetic
                      # dead trunk revision 1.1 so that the branch has a root.
                      # Likewise, if you merge such a file to a later branch B2 (one
                      # that already existed when the file was added on B1), CVS
                      # creates a synthetic dead revision 1.1.x.1 on B2.  Don't drop
                      # these revisions now, but mark them synthetic so
                      # createchangeset() can take care of them.
                      if (store and
                            e.dead and
                            e.revision[-1] == 1 and      # 1.1 or 1.1.x.1
                            len(e.comment) == 1 and
                            file_added_re.match(e.comment[0])):
                          ui.debug('found synthetic revision in %s: %r\n'
                                   % (e.rcs, e.comment[0]))
                          e.synthetic = True
                      if store:
                          # clean up the results and save in the log.
                          store = False
                          e.tags = sorted([scache(x) for x in tags.get(e.revision, [])])
                          e.comment = scache('\n'.join(e.comment))
                          revn = len(e.revision)
                          if revn > 3 and (revn % 2) == 0:
                              e.branch = tags.get(e.revision[:-1], [None])[0]
                          else:
                              e.branch = None
                          # find the branches starting from this revision
                          branchpoints = set()
                          for branch, revision in branchmap.iteritems():
                              revparts = tuple([int(i) for i in revision.split('.')])
                              if revparts[-2] == 0 and revparts[-1] % 2 == 0:
                                  # normal branch
                                  if revparts[:-2] == e.revision:
                                      branchpoints.add(branch)
                              elif revparts == (1,1,1): # vendor branch
                                  if revparts in e.branches:
                                      branchpoints.add(branch)
                          e.branchpoints = branchpoints
                          log.append(e)
                          if len(log) % 100 == 0:
                              ui.status(util.ellipsis('%d %s' % (len(log), e.file), 80)+'\n')
                  log.sort(key=lambda x: (x.rcs, x.revision))
                  # find parent revisions of individual files
                  versions = {}
                  for e in log:
                      branch = e.revision[:-1]
                      p = versions.get((e.rcs, branch), None)
                      if p is None:
                          p = e.revision[:-2]
                      e.parent = p
                      versions[(e.rcs, branch)] = e.revision
                  # update the log cache
                  if cache:
                      if log:
                          # join up the old and new logs
                          log.sort(key=lambda x: x.date)
                          if oldlog and oldlog[-1].date >= log[0].date:
                              raise logerror('Log cache overlaps with new log entries,'
                                             ' re-run without cache.')
                          log = oldlog + log
                          # write the new cachefile
                          ui.note(_('writing cvs log cache %s\n') % cachefile)
                          pickle.dump(log, open(cachefile, 'w'))
                      else:
                          log = oldlog
                  ui.status(_('%d log entries\n') % len(log))
+                 hook.hook(ui, None, "cvslog", True, log=log)
                  return log
              class changeset(object):
                  '''Class changeset has the following attributes:
                      .id        - integer identifying this changeset (list index)
                      .author    - author name as CVS knows it
                      .branch    - name of branch this changeset is on, or None
                      .comment   - commit message
                      .date      - the commit date as a (time,tz) tuple
                      .entries   - list of logentry objects in this changeset
                      .parents   - list of one or two parent changesets
                      .tags      - list of tags on this changeset
                      .synthetic - from synthetic revision "file ... added on branch ..."
                      .mergepoint- the branch that has been merged from
                                   (if present in rlog output)
                      .branchpoints- the branches that start at the current entry
                  '''
                  def __init__(self, **entries):
                      self.__dict__.update(entries)
                  def __repr__(self):
                      return "<%s at 0x%x: %s>" % (self.__class__.__name__,
                                                   id(self),
                                                   getattr(self, 'id', "(no id)"))
              def createchangeset(ui, log, fuzz=60, mergefrom=None, mergeto=None):
                  '''Convert log into changesets.'''
                  ui.status(_('creating changesets\n'))
                  # Merge changesets
                  log.sort(key=lambda x: (x.comment, x.author, x.branch, x.date))
                  changesets = []
                  files = set()
                  c = None
                  for i, e in enumerate(log):
                      # Check if log entry belongs to the current changeset or not.
                      # Since CVS is file centric, two different file revisions with
                      # different branchpoints should be treated as belonging to two
                      # different changesets (and the ordering is important and not
                      # honoured by cvsps at this point).
                      #
                      # Consider the following case:
                      # foo 1.1 branchpoints: [MYBRANCH]
                      # bar 1.1 branchpoints: [MYBRANCH, MYBRANCH2]
                      #
                      # Here foo is part only of MYBRANCH, but not MYBRANCH2, e.g. a
                      # later version of foo may be in MYBRANCH2, so foo should be the
                      # first changeset and bar the next and MYBRANCH and MYBRANCH2
                      # should both start off of the bar changeset. No provisions are
                      # made to ensure that this is, in fact, what happens.
                      if not (c and
                                e.comment == c.comment and
                                e.author == c.author and
                                e.branch == c.branch and
                                (not hasattr(e, 'branchpoints') or
                                  not hasattr (c, 'branchpoints') or
                                  e.branchpoints == c.branchpoints) and
                                ((c.date[0] + c.date[1]) <=
                                 (e.date[0] + e.date[1]) <=
                                 (c.date[0] + c.date[1]) + fuzz) and
                                e.file not in files):
                          c = changeset(comment=e.comment, author=e.author,
                                        branch=e.branch, date=e.date, entries=[],
                                        mergepoint=getattr(e, 'mergepoint', None),
                                        branchpoints=getattr(e, 'branchpoints', set()))
                          changesets.append(c)
                          files = set()
                          if len(changesets) % 100 == 0:
                              t = '%d %s' % (len(changesets), repr(e.comment)[1:-1])
                              ui.status(util.ellipsis(t, 80) + '\n')
                      c.entries.append(e)
                      files.add(e.file)
                      c.date = e.date       # changeset date is date of latest commit in it
                  # Mark synthetic changesets
                  for c in changesets:
                      # Synthetic revisions always get their own changeset, because
                      # the log message includes the filename.  E.g. if you add file3
                      # and file4 on a branch, you get four log entries and three
                      # changesets:
                      #   "File file3 was added on branch ..." (synthetic, 1 entry)
                      #   "File file4 was added on branch ..." (synthetic, 1 entry)
                      #   "Add file3 and file4 to fix ..."     (real, 2 entries)
                      # Hence the check for 1 entry here.
                      synth = getattr(c.entries[0], 'synthetic', None)
                      c.synthetic = (len(c.entries) == 1 and synth)
                  # Sort files in each changeset
                  for c in changesets:
                      def pathcompare(l, r):
                          'Mimic cvsps sorting order'
                          l = l.split('/')
                          r = r.split('/')
                          nl = len(l)
                          nr = len(r)
                          n = min(nl, nr)
                          for i in range(n):
                              if i + 1 == nl and nl < nr:
                                  return -1
                              elif i + 1 == nr and nl > nr:
                                  return +1
                              elif l[i] < r[i]:
                                  return -1
                              elif l[i] > r[i]:
                                  return +1
                          return 0
                      def entitycompare(l, r):
                          return pathcompare(l.file, r.file)
                      c.entries.sort(entitycompare)
                  # Sort changesets by date
                  def cscmp(l, r):
                      d = sum(l.date) - sum(r.date)
                      if d:
                          return d
                      # detect vendor branches and initial commits on a branch
                      le = {}
                      for e in l.entries:
                          le[e.rcs] = e.revision
                      re = {}
                      for e in r.entries:
                          re[e.rcs] = e.revision
                      d = 0
                      for e in l.entries:
                          if re.get(e.rcs, None) == e.parent:
                              assert not d
                              d = 1
                              break
                      for e in r.entries:
                          if le.get(e.rcs, None) == e.parent:
                              assert not d
                              d = -1
                              break
                      return d
                  changesets.sort(cscmp)
                  # Collect tags
                  globaltags = {}
                  for c in changesets:
                      for e in c.entries:
                          for tag in e.tags:
                              # remember which is the latest changeset to have this tag
                              globaltags[tag] = c
                  for c in changesets:
                      tags = set()
                      for e in c.entries:
                          tags.update(e.tags)
                      # remember tags only if this is the latest changeset to have it
                      c.tags = sorted(tag for tag in tags if globaltags[tag] is c)
                  # Find parent changesets, handle {{mergetobranch BRANCHNAME}}
                  # by inserting dummy changesets with two parents, and handle
                  # {{mergefrombranch BRANCHNAME}} by setting two parents.
                  if mergeto is None:
                      mergeto = r'{{mergetobranch ([-\w]+)}}'
                  if mergeto:
                      mergeto = re.compile(mergeto)
                  if mergefrom is None:
                      mergefrom = r'{{mergefrombranch ([-\w]+)}}'
                  if mergefrom:
                      mergefrom = re.compile(mergefrom)
                  versions = {}    # changeset index where we saw any particular file version
                  branches = {}    # changeset index where we saw a branch
                  n = len(changesets)
                  i = 0
                  while i<n:
                      c = changesets[i]
                      for f in c.entries:
                          versions[(f.rcs, f.revision)] = i
                      p = None
                      if c.branch in branches:
                          p = branches[c.branch]
                      else:
                          # first changeset on a new branch
                          # the parent is a changeset with the branch in its
                          # branchpoints such that it is the latest possible
                          # commit without any intervening, unrelated commits.
                          for candidate in xrange(i):
                              if c.branch not in changesets[candidate].branchpoints:
                                  if p is not None:
                                      break
                                  continue
                              p = candidate
                      c.parents = []
                      if p is not None:
                          p = changesets[p]
                          # Ensure no changeset has a synthetic changeset as a parent.
                          while p.synthetic:
                              assert len(p.parents) <= 1, \
                                     _('synthetic changeset cannot have multiple parents')
                              if p.parents:
                                  p = p.parents[0]
                              else:
                                  p = None
                                  break
                          if p is not None:
                              c.parents.append(p)
                      if c.mergepoint:
                          if c.mergepoint == 'HEAD':
                              c.mergepoint = None
                          c.parents.append(changesets[branches[c.mergepoint]])
                      if mergefrom:
                          m = mergefrom.search(c.comment)
                          if m:
                              m = m.group(1)
                              if m == 'HEAD':
                                  m = None
                              try:
                                  candidate = changesets[branches[m]]
                              except KeyError:
                                  ui.warn(_("warning: CVS commit message references "
                                            "non-existent branch %r:\n%s\n")
                                          % (m, c.comment))
                              if m in branches and c.branch != m and not candidate.synthetic:
                                  c.parents.append(candidate)
                      if mergeto:
                          m = mergeto.search(c.comment)
                          if m:
                              try:
                                  m = m.group(1)
                                  if m == 'HEAD':
                                      m = None
                              except:
                                  m = None   # if no group found then merge to HEAD
                              if m in branches and c.branch != m:
                                  # insert empty changeset for merge
                                  cc = changeset(author=c.author, branch=m, date=c.date,
                                          comment='convert-repo: CVS merge from branch %s' % c.branch,
                                          entries=[], tags=[], parents=[changesets[branches[m]], c])
                                  changesets.insert(i + 1, cc)
                                  branches[m] = i + 1
                                  # adjust our loop counters now we have inserted a new entry
                                  n += 1
                                  i += 2
                                  continue
                      branches[c.branch] = i
                      i += 1
                  # Drop synthetic changesets (safe now that we have ensured no other
                  # changesets can have them as parents).
                  i = 0
                  while i < len(changesets):
                      if changesets[i].synthetic:
                          del changesets[i]
                      else:
                          i += 1
                  # Number changesets
                  for i, c in enumerate(changesets):
                      c.id = i + 1
                  ui.status(_('%d changeset entries\n') % len(changesets))
+                 hook.hook(ui, None, "cvschangesets", True, changesets=changesets)
                  return changesets
              def debugcvsps(ui, *args, **opts):
                  '''Read CVS rlog for current directory or named path in
                  repository, and convert the log to changesets based on matching
                  commit log entries and dates.
                  '''
                  if opts["new_cache"]:
                      cache = "write"
                  elif opts["update_cache"]:
                      cache = "update"
                  else:
                      cache = None
                  revisions = opts["revisions"]
                  try:
                      if args:
                          log = []
                          for d in args:
                              log += createlog(ui, d, root=opts["root"], cache=cache)
                      else:
                          log = createlog(ui, root=opts["root"], cache=cache)
                  except logerror, e:
                      ui.write("%r\n"%e)
                      return
                  changesets = createchangeset(ui, log, opts["fuzz"])
                  del log
                  # Print changesets (optionally filtered)
                  off = len(revisions)
                  branches = {}    # latest version number in each branch
                  ancestors = {}   # parent branch
                  for cs in changesets:
                      if opts["ancestors"]:
                          if cs.branch not in branches and cs.parents and cs.parents[0].id:
                              ancestors[cs.branch] = (changesets[cs.parents[0].id-1].branch,
                                                      cs.parents[0].id)
                          branches[cs.branch] = cs.id
                      # limit by branches
                      if opts["branches"] and (cs.branch or 'HEAD') not in opts["branches"]:
                          continue
                      if not off:
                          # Note: trailing spaces on several lines here are needed to have
                          #       bug-for-bug compatibility with cvsps.
                          ui.write('---------------------\n')
                          ui.write('PatchSet %d \n' % cs.id)
                          ui.write('Date: %s\n' % util.datestr(cs.date,
                                                               '%Y/%m/%d %H:%M:%S %1%2'))
                          ui.write('Author: %s\n' % cs.author)
                          ui.write('Branch: %s\n' % (cs.branch or 'HEAD'))
                          ui.write('Tag%s: %s \n' % (['', 's'][len(cs.tags)>1],
                                                ','.join(cs.tags) or '(none)'))
                          branchpoints = getattr(cs, 'branchpoints', None)
                          if branchpoints:
                              ui.write('Branchpoints: %s \n' % ', '.join(branchpoints))
                          if opts["parents"] and cs.parents:
                              if len(cs.parents)>1:
                                  ui.write('Parents: %s\n' % (','.join([str(p.id) for p in cs.parents])))
                              else:
                                  ui.write('Parent: %d\n' % cs.parents[0].id)
                          if opts["ancestors"]:
                              b = cs.branch
                              r = []
                              while b:
                                  b, c = ancestors[b]
                                  r.append('%s:%d:%d' % (b or "HEAD", c, branches[b]))
                              if r:
                                  ui.write('Ancestors: %s\n' % (','.join(r)))
                          ui.write('Log:\n')
                          ui.write('%s\n\n' % cs.comment)
                          ui.write('Members: \n')
                          for f in cs.entries:
                              fn = f.file
                              if fn.startswith(opts["prefix"]):
                                  fn = fn[len(opts["prefix"]):]
                              ui.write('\t%s:%s->%s%s \n' % (fn, '.'.join([str(x) for x in f.parent]) or 'INITIAL',
                                                        '.'.join([str(x) for x in f.revision]), ['', '(DEAD)'][f.dead]))
                          ui.write('\n')
                      # have we seen the start tag?
                      if revisions and off:
                          if revisions[0] == str(cs.id) or \
                              revisions[0] in cs.tags:
                              off = False
                      # see if we reached the end tag
                      if len(revisions)>1 and not off:
                          if revisions[1] == str(cs.id) or \
                              revisions[1] in cs.tags:
                              break
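The two `+` lines in cvsps.py add hook calls. `hook.hook(ui, None, "cvslog", True, log=log)` runs just before `return log` in `createlog()`, and `hook.hook(ui, None, "cvschangesets", True, changesets=changesets)` runs just before `return changesets` in `createchangeset()`. The hook receives the same list object that is then returned, so an in-process Python hook can edit entries in place, or add and delete entries by mutating the list. Rebinding the name inside the hook has no effect on the caller. The `True` argument is `throw`, so a hook that returns a true value should make the conversion fail.

The sketch below illustrates this under stated assumptions:

- `LogEntry` is a minimal stand-in for `cvsps.logentry`.
- `AUTHORMAP` is a hypothetical author mapping.
- `createlog_tail()` is a hypothetical helper that only approximates how Mercurial invokes a Python hook.

None of these names are part of the convert extension.

```python
class LogEntry(object):
    """Minimal stand-in for cvsps.logentry: an attribute bag."""
    def __init__(self, **entries):
        self.__dict__.update(entries)

# Hypothetical map from CVS user names to Mercurial author strings.
AUTHORMAP = {'jdoe': 'Jane Doe <jdoe@example.com>'}

def cvslog(ui, repo, hooktype, log):
    """cvslog hook with the signature used by the test's cvshooks.py."""
    for e in log:
        # Editing an entry's attributes is visible to createlog().
        e.author = AUTHORMAP.get(e.author, e.author)
    # Deleting entries must also mutate the list; `log = [...]` would
    # only rebind the local name and the caller would never see it.
    log[:] = [e for e in log if not e.file.startswith('CVSROOT/')]
    # A false return value means success.
    return False

def createlog_tail(ui, log, hookfn):
    """Hypothetical helper approximating the end of createlog(): the hook
    runs with throw=True, then the same list object is returned."""
    if hookfn(ui=ui, repo=None, hooktype='cvslog', log=log):
        # Mercurial would abort here; RuntimeError stands in for that.
        raise RuntimeError('cvslog hook failed')
    return log

entries = [LogEntry(file='src/a', author='jdoe'),
           LogEntry(file='CVSROOT/modules', author='root')]
result = createlog_tail(None, entries, cvslog)
print([(e.file, e.author) for e in result])
```

The same pattern applies to a `cvschangesets` hook, which receives the `changesets` list after ids and parents have been assigned.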

tests/test-convert-cvs

0 +14 -1

              #!/bin/sh
              "$TESTDIR/hghave" cvs || exit 80
              cvscall()
              {
                  cvs -f "$@"
              }
              hgcat()
              {
                  hg --cwd src-hg cat -r tip "$1"
              }
              echo "[extensions]" >> $HGRCPATH
              echo "convert = " >> $HGRCPATH
              echo "graphlog = " >> $HGRCPATH
+             cat > cvshooks.py <<EOF
+             def cvslog(ui,repo,hooktype,log):
+                 print "%s hook: %d entries"%(hooktype,len(log))
+             def cvschangesets(ui,repo,hooktype,changesets):
+                 print "%s hook: %d changesets"%(hooktype,len(changesets))
+             EOF
+             hookpath=$PWD
+             echo "[hooks]" >> $HGRCPATH
+             echo "cvslog=python:$hookpath/cvshooks.py:cvslog" >> $HGRCPATH
+             echo "cvschangesets=python:$hookpath/cvshooks.py:cvschangesets" >> $HGRCPATH
              echo % create cvs repository
              mkdir cvsrepo
              cd cvsrepo
-             CVSROOT=`pwd`
+             CVSROOT=$PWD
              export CVSROOT
              CVS_OPTIONS=-f
              export CVS_OPTIONS
              cd ..
              cvscall -q -d "$CVSROOT" init
              echo % create source directory
              mkdir src-temp
              cd src-temp
              echo a > a
              mkdir b
              cd b
              echo c > c
              cd ..
              echo % import source directory
              cvscall -q import -m import src INITIAL start
              cd ..
              echo % checkout source directory
              cvscall -q checkout src
              echo % commit a new revision changing b/c
              cd src
              sleep 1
              echo c >> b/c
              cvscall -q commit -mci0 . | grep '<--' |\
                  sed -e 's:.*src/\(.*\),v.*:checking in src/\1,v:g'
              cd ..
              echo % convert fresh repo
              hg convert src src-hg | sed -e 's/connecting to.*cvsrepo/connecting to cvsrepo/g'
              hgcat a
              hgcat b/c
              echo % convert fresh repo with --filemap
              echo include b/c > filemap
              hg convert --filemap filemap src src-filemap | sed -e 's/connecting to.*cvsrepo/connecting to cvsrepo/g'
              hgcat b/c
              hg -R src-filemap log --template '{rev} {desc} files: {files}\n'
              echo % commit new file revisions
              cd src
              echo a >> a
              echo c >> b/c
              cvscall -q commit -mci1 . | grep '<--' |\
                  sed -e 's:.*src/\(.*\),v.*:checking in src/\1,v:g'
              cd ..
              echo % convert again
              hg convert src src-hg | sed -e 's/connecting to.*cvsrepo/connecting to cvsrepo/g'
              hgcat a
              hgcat b/c
              echo % convert again with --filemap
              hg convert --filemap filemap src src-filemap | sed -e 's/connecting to.*cvsrepo/connecting to cvsrepo/g'
              hgcat b/c
              hg -R src-filemap log --template '{rev} {desc} files: {files}\n'
              echo % commit branch
              cd src
              cvs -q update -r1.1 b/c
              cvs -q tag -b branch
              cvs -q update -r branch > /dev/null
              echo d >> b/c
              cvs -q commit -mci2 . | grep '<--' |\
                  sed -e 's:.*src/\(.*\),v.*:checking in src/\1,v:g'
              cd ..
              echo % convert again
              hg convert src src-hg | sed -e 's/connecting to.*cvsrepo/connecting to cvsrepo/g'
              hgcat b/c
              echo % convert again with --filemap
              hg convert --filemap filemap src src-filemap | sed -e 's/connecting to.*cvsrepo/connecting to cvsrepo/g'
              hgcat b/c
              hg -R src-filemap log --template '{rev} {desc} files: {files}\n'
              echo % commit a new revision with funny log message
              cd src
              sleep 1
              echo e >> a
              cvscall -q commit -m'funny
              ----------------------------
              log message' . | grep '<--' |\
                  sed -e 's:.*src/\(.*\),v.*:checking in src/\1,v:g'
              cd ..
              echo % convert again
              hg convert src src-hg | sed -e 's/connecting to.*cvsrepo/connecting to cvsrepo/g'
              echo "graphlog = " >> $HGRCPATH
              hg -R src-hg glog --template '{rev} ({branches}) {desc} files: {files}\n'
              echo % testing debugcvsps
              cd src
              hg debugcvsps | sed -e 's/Author:.*/Author:/' -e 's/Date:.*/Date:/'
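The test registers `cvslog` and `cvschangesets` as in-process Python hooks through `[hooks]` entries of the form `name=python:/path/to/cvshooks.py:function`. Its hook functions only print counts.

A slightly more practical sketch is a `cvschangesets` hook that strips merge markers from commit messages. It assumes the default `{{mergefrombranch BRANCH}}` and `{{mergetobranch BRANCH}}` patterns are in use. `createchangeset()` has already used these markers to set parents by the time the hook runs, so removing them only tidies the message. `Changeset` is a minimal stand-in for `cvsps.changeset`, and `MERGE_MARKER` is an illustrative regular expression, not something the extension defines.

```python
import re

class Changeset(object):
    """Minimal stand-in for cvsps.changeset: an attribute bag."""
    def __init__(self, **entries):
        self.__dict__.update(entries)

# Illustrative pattern for the default {{mergefrombranch X}} and
# {{mergetobranch X}} markers, plus any whitespace in front of them.
MERGE_MARKER = re.compile(r'\s*\{\{merge(?:from|to)branch [-\w]+\}\}')

def cvschangesets(ui, repo, hooktype, changesets):
    """Remove merge markers after createchangeset() has consumed them."""
    for c in changesets:
        # Changing attributes in place is visible to the caller.
        c.comment = MERGE_MARKER.sub('', c.comment).strip()
    return False  # a false value means the hook succeeded

sample = [Changeset(comment='fix bug {{mergefrombranch B1}}'),
          Changeset(comment='{{mergetobranch HEAD}} merge'),
          Changeset(comment='plain')]
cvschangesets(None, None, 'cvschangesets', sample)
print([c.comment for c in sample])
```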

tests/test-convert-cvs.out

0 +16 0

              % create cvs repository
              % create source directory
              % import source directory
              N src/a
              N src/b/c
              No conflicts created by this import
              % checkout source directory
              U src/a
              U src/b/c
              % commit a new revision changing b/c
              checking in src/b/c,v
              % convert fresh repo
              initializing destination src-hg repository
              connecting to cvsrepo
              scanning source...
              collecting CVS rlog
              5 log entries
+             cvslog hook: 5 entries
              creating changesets
              3 changeset entries
+             cvschangesets hook: 3 changesets
              sorting...
              converting...
              2 Initial revision
              1 import
              0 ci0
              updating tags
              a
              c
              c
              % convert fresh repo with --filemap
              initializing destination src-filemap repository
              connecting to cvsrepo
              scanning source...
              collecting CVS rlog
              5 log entries
+             cvslog hook: 5 entries
              creating changesets
              3 changeset entries
+             cvschangesets hook: 3 changesets
              sorting...
              converting...
              2 Initial revision
              1 import
              filtering out empty revision
              rolling back last transaction
              0 ci0
              updating tags
              c
              c
              2 update tags files: .hgtags
              1 ci0 files: b/c
              0 Initial revision files: b/c
              % commit new file revisions
              checking in src/a,v
              checking in src/b/c,v
              % convert again
              connecting to cvsrepo
              scanning source...
              collecting CVS rlog
              7 log entries
+             cvslog hook: 7 entries
              creating changesets
              4 changeset entries
+             cvschangesets hook: 4 changesets
              sorting...
              converting...
              0 ci1
              a
              a
              c
              c
              c
              % convert again with --filemap
              connecting to cvsrepo
              scanning source...
              collecting CVS rlog
              7 log entries
+             cvslog hook: 7 entries
              creating changesets
              4 changeset entries
+             cvschangesets hook: 4 changesets
              sorting...
              converting...
 ci1
              c
              c
              c
 ci1 files: b/c
 update tags files: .hgtags
 ci0 files: b/c
 Initial revision files: b/c
              % commit branch
              U b/c
              T a
              T b/c
              checking in src/b/c,v
              % convert again
              connecting to cvsrepo
              scanning source...
              collecting CVS rlog
 log entries
+             cvslog hook: 8 entries
              creating changesets
 changeset entries
+             cvschangesets hook: 5 changesets
              sorting...
              converting...
 ci2
              c
              d
              % convert again with --filemap
              connecting to cvsrepo
              scanning source...
              collecting CVS rlog
 log entries
+             cvslog hook: 8 entries
              creating changesets
 changeset entries
+             cvschangesets hook: 5 changesets
              sorting...
              converting...
 ci2
              c
              d
 ci2 files: b/c
 ci1 files: b/c
 update tags files: .hgtags
 ci0 files: b/c
 Initial revision files: b/c
              % commit a new revision with funny log message
              checking in src/a,v
              % convert again
              connecting to cvsrepo
              scanning source...
              collecting CVS rlog
 log entries
+             cvslog hook: 9 entries
              creating changesets
 changeset entries
+             cvschangesets hook: 6 changesets
              sorting...
              converting...
 funny
              o  6 (branch) funny
              |  ----------------------------
              |  log message files: a
              o  5 (branch) ci2 files: b/c
              o  4 () ci1 files: a b/c
              |
              o  3 () update tags files: .hgtags
              |
              o  2 () ci0 files: b/c
              |
              | o  1 (INITIAL) import files:
              |/
              o  0 () Initial revision files: a b/c
              % testing debugcvsps
              collecting CVS rlog
 log entries
+             cvslog hook: 9 entries
              creating changesets
 changeset entries
+             cvschangesets hook: 8 changesets
              ---------------------
              PatchSet 1
              Date:
              Author:
              Branch: HEAD
              Tag: (none)
              Branchpoints: INITIAL
              Log:
              Initial revision
              Members:
              	a:INITIAL->1.1
              ---------------------
              PatchSet 2
              Date:
              Author:
              Branch: HEAD
              Tag: (none)
              Branchpoints: INITIAL, branch
              Log:
              Initial revision
              Members:
              	b/c:INITIAL->1.1
              ---------------------
              PatchSet 3
              Date:
              Author:
              Branch: INITIAL
              Tag: start
              Log:
              import
              Members:
              	a:1.1->1.1.1.1
              	b/c:1.1->1.1.1.1
              ---------------------
              PatchSet 4
              Date:
              Author:
              Branch: HEAD
              Tag: (none)
              Log:
              ci0
              Members:
              	b/c:1.1->1.2
              ---------------------
              PatchSet 5
              Date:
              Author:
              Branch: HEAD
              Tag: (none)
              Branchpoints: branch
              Log:
              ci1
              Members:
              	a:1.1->1.2
              ---------------------
              PatchSet 6
              Date:
              Author:
              Branch: HEAD
              Tag: (none)
              Log:
              ci1
              Members:
              	b/c:1.2->1.3
              ---------------------
              PatchSet 7
              Date:
              Author:
              Branch: branch
              Tag: (none)
              Log:
              ci2
              Members:
              	b/c:1.1->1.1.2.1
              ---------------------
              PatchSet 8
              Date:
              Author:
              Branch: branch
              Tag: (none)
              Log:
              funny
              ----------------------------
              log message
              Members:
              	a:1.2->1.2.2.1

tests/test-convert.out

0 +9 0

              hg convert [OPTION]... SOURCE [DEST [REVMAP]]
              convert a foreign SCM repository to a Mercurial one.
                  Accepted source formats [identifiers]:
                  - Mercurial [hg]
                  - CVS [cvs]
                  - Darcs [darcs]
                  - git [git]
                  - Subversion [svn]
                  - Monotone [mtn]
                  - GNU Arch [gnuarch]
                  - Bazaar [bzr]
                  - Perforce [p4]
                  Accepted destination formats [identifiers]:
                  - Mercurial [hg]
                  - Subversion [svn] (history on branches is not preserved)
                  If no revision is given, all revisions will be converted. Otherwise,
                  convert will only import up to the named revision (given in a format
                  understood by the source).
                  If no destination directory name is specified, it defaults to the basename
                  of the source with '-hg' appended. If the destination repository doesn't
                  exist, it will be created.
                  By default, all sources except Mercurial will use --branchsort. Mercurial
                  uses --sourcesort to preserve original revision numbers order. Sort modes
                  have the following effects:
                  --branchsort  convert from parent to child revision when possible, which
                                means branches are usually converted one after the other. It
                                generates more compact repositories.
                  --datesort    sort revisions by date. Converted repositories have good-
                                looking changelogs but are often an order of magnitude
                                larger than the same ones generated by --branchsort.
                  --sourcesort  try to preserve source revisions order, only supported by
                                Mercurial sources.
                  If <REVMAP> isn't given, it will be put in a default location
                  (<dest>/.hg/shamap by default). The <REVMAP> is a simple text file that
                  maps each source commit ID to the destination ID for that revision, like
                  so:
                    <source ID> <destination ID>
                  If the file doesn't exist, it's automatically created. It's updated
                  after each commit is copied, so the conversion can be interrupted and
                  run repeatedly to copy new commits.
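The sketch below shows why an append-only revision map makes a conversion resumable. It assumes the `<source ID> <destination ID>` line format described above and splits each line on its last space. `load_revmap` and `pending` are hypothetical helpers, not code from the convert extension.

```python
def load_revmap(lines):
    """Build {source_id: dest_id} from '<source ID> <destination ID>' lines.

    Hypothetical helper: assumes the destination ID never contains a space,
    so each line is split on its last space.
    """
    revmap = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate blank lines
        source_id, dest_id = line.rsplit(" ", 1)
        revmap[source_id] = dest_id
    return revmap


def pending(source_revs, revmap):
    """Return the source revisions that still need converting, in order."""
    return [rev for rev in source_revs if rev not in revmap]


# A rerun only needs to copy what the map does not record yet.
revmap = load_revmap(["r1 0a1b2c\n", "r2 3d4e5f\n"])
print(pending(["r1", "r2", "r3"], revmap))  # ['r3']
```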
                  The [username mapping] file is a simple text file that maps each source
                  commit author to a destination commit author. It is handy for source SCMs
                  that use unix logins to identify authors (e.g. CVS). Use one line per
                  author mapping, in the format: srcauthor=whatever string you want
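The author mapping can be sketched as a dictionary lookup. This sketch assumes the `srcauthor=whatever string you want` format above, splits on the first `=` only, and leaves unlisted authors unchanged. `load_authormap` and `map_author` are hypothetical names, and skipping `#` lines is an assumption of the sketch.

```python
def load_authormap(lines):
    """Build {srcauthor: dstauthor} from 'srcauthor=whatever string you want' lines.

    Hypothetical helper. Splits on the first '=' only, so the destination
    string may itself contain '='. Skipping blank and '#' lines is an
    assumption of this sketch.
    """
    authors = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split("=", 1)
        authors[src.strip()] = dst.strip()
    return authors


def map_author(author, authors):
    """Return the mapped author, or the original login if it is not listed."""
    return authors.get(author, author)


authors = load_authormap(["jdoe=John Doe <jdoe@example.com>"])
print(map_author("jdoe", authors))  # John Doe <jdoe@example.com>
print(map_author("root", authors))  # root
```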
                  The filemap is a file that allows filtering and remapping of files and
                  directories. Comment lines start with '#'. Each line can contain one of
                  the following directives:
                    include path/to/file
                    exclude path/to/file
                    rename from/file to/file
                  The 'include' directive causes a file, or all files under a directory, to
                  be included in the destination repository. Using 'include' also causes
                  all other files and directories not explicitly included to be excluded.
                  The 'exclude' directive causes files or directories to be omitted. The
                  'rename' directive renames a file or directory. To rename from a
                  subdirectory into the root of the repository, use '.' as the path to
                  rename to.
                  The splicemap is a file that allows insertion of synthetic history,
                  letting you specify the parents of a revision. This is useful if you want
                  to e.g. give a Subversion merge two parents, or graft two disconnected
                  series of history together. Each entry contains a key, followed by a
                  space, followed by one or two comma-separated values. The key is the
                  revision ID in the source revision control system whose parents should be
                  modified (same format as a key in .hg/shamap). The values are the revision
                  IDs (in either the source or destination revision control system) that
                  should be used as the new parents for that node. For example, if you have
                  merged "release-1.0" into "trunk", then you should specify the revision on
                  "trunk" as the first parent and the one on the "release-1.0" branch as the
                  second.
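Here is a sketch of reading splicemap entries into a parent table. It assumes exactly the format described above (a key, a space, then one or two comma-separated parent IDs) and keeps the parent order, because the first parent matters for merges. `load_splicemap` and the revision IDs are hypothetical.

```python
def load_splicemap(lines):
    """Build {child_id: [first_parent, (second_parent)]} from splicemap lines.

    Hypothetical helper: each line is a key, a space, then one or two
    comma-separated parent IDs; the order of the parents is preserved.
    """
    splices = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key, values = line.split(None, 1)
        parents = [v.strip() for v in values.split(",")]
        if not 1 <= len(parents) <= 2:
            raise ValueError("expected one or two parents: %r" % line)
        splices[key] = parents
    return splices


# Merge of "release-1.0" into "trunk": trunk revision first, branch revision second.
splices = load_splicemap(["merge-rev trunk-rev,release-rev"])
print(splices)  # {'merge-rev': ['trunk-rev', 'release-rev']}
```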
                  The branchmap is a file that allows you to rename a branch when it is
                  being brought in from whatever external repository. When used in
                  conjunction with a splicemap, it allows for a powerful combination to help
                  fix even the most badly mismanaged repositories and turn them into nicely
                  structured Mercurial repositories. The branchmap contains lines of the
                  form "original_branch_name new_branch_name". "original_branch_name" is the
                  name of the branch in the source repository, and "new_branch_name" is the
                  name of the branch in the destination repository. This can be used to (for
                  instance) move code in one repository from "default" to a named branch.
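The branchmap amounts to a rename table consulted for every converted branch name. This minimal sketch assumes the `original_branch_name new_branch_name` line format above. `load_branchmap`, `rename_branch` and the branch names are hypothetical.

```python
def load_branchmap(lines):
    """Build {original_branch_name: new_branch_name} (hypothetical helper)."""
    mapping = {}
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip blank lines
        if len(words) != 2:
            raise ValueError("expected 'original new': %r" % line)
        mapping[words[0]] = words[1]
    return mapping


def rename_branch(name, mapping):
    """Branches absent from the map keep their original name."""
    return mapping.get(name, name)


mapping = load_branchmap(["default legacy-import\n"])
print(rename_branch("default", mapping))  # legacy-import
print(rename_branch("stable", mapping))   # stable
```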
                  Mercurial Source
                  ----------------
                  --config convert.hg.ignoreerrors=False    (boolean)
                      ignore integrity errors when reading. Use it to fix Mercurial
                      repositories with missing revlogs, by converting from and to
                      Mercurial.
                  --config convert.hg.saverev=False         (boolean)
                      store original revision ID in changeset (forces target IDs to change)
                  --config convert.hg.startrev=0            (hg revision identifier)
                      convert start revision and its descendants
                  CVS Source
                  ----------
                  CVS source will use a sandbox (i.e. a checked-out copy) from CVS to
                  indicate the starting point of what will be converted. Direct access to
                  the repository files is not needed, unless of course the repository is
                  :local:. The conversion uses the top level directory in the sandbox to
                  find the CVS repository, and then uses CVS rlog commands to find files to
                  convert. This means that unless a filemap is given, all files under the
                  starting directory will be converted, and that any directory
                  reorganization in the CVS sandbox is ignored.
                  The options shown are the defaults.
                  --config convert.cvsps.cache=True         (boolean)
                      Set to False to disable remote log caching, for testing and debugging
                      purposes.
                  --config convert.cvsps.fuzz=60            (integer)
                      Specify the maximum time (in seconds) that is allowed between commits
                      with identical user and log message in a single changeset. When very
                      large files were checked in as part of a changeset then the default
                      may not be long enough.
                  --config convert.cvsps.mergeto='{{mergetobranch ([-\w]+)}}'
                      Specify a regular expression to which commit log messages are matched.
                      If a match occurs, then the conversion process will insert a dummy
                      revision merging the branch on which this log message occurs to the
                      branch indicated in the regex.
                  --config convert.cvsps.mergefrom='{{mergefrombranch ([-\w]+)}}'
                      Specify a regular expression to which commit log messages are matched.
                      If a match occurs, then the conversion process will add the most
                      recent revision on the branch indicated in the regex as the second
                      parent of the changeset.
+                 --config hooks.cvslog
+                     Specify a Python function to be called at the end of gathering the CVS
+                     log. The function is passed a list with the log entries, and can
+                     modify the entries in-place, or add or delete them.
+                 --config hooks.cvschangesets
+                     Specify a Python function to be called after the changesets are
+                     calculated from the CVS log. The function is passed a list with the
+                     changeset entries, and can modify the changesets in-place, or add or
+                     delete them.
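Below is a sketch of a hook module for the two CVS hooks. It makes several assumptions. First, the hooks are registered as in-process Python hooks in the `[hooks]` section of an hgrc, for example `cvslog = python:/path/to/cvshooks.py:cvslog`, if your Mercurial version supports the `python:file:function` form. Second, Mercurial passes `ui`, `repo` and `hooktype`, plus the list as a keyword argument named `log` or `changesets`. Third, entries and changesets expose a `comment` attribute. The module name, the `[skip-convert]` marker and the filtering rule are all hypothetical.

```python
# cvshooks.py: a hypothetical hook module. The file and function names are
# illustrative, and the `comment` attribute used below is an assumption.

def cvslog(ui, repo, hooktype, log, **kwargs):
    """Called once the CVS log is gathered; `log` is the list of log entries."""
    # Replace the list's contents (slice assignment) rather than rebinding
    # the name, so the caller's list sees the filtered entries.
    log[:] = [e for e in log if "[skip-convert]" not in e.comment]
    ui.status("%s hook: %d entries\n" % (hooktype, len(log)))
    # Returning None (a false value) signals success to Mercurial.


def cvschangesets(ui, repo, hooktype, changesets, **kwargs):
    """Called after changesets are built; `changesets` is the list of changesets."""
    for cs in changesets:
        cs.comment = cs.comment.rstrip()  # modify each changeset in place
    ui.status("%s hook: %d changesets\n" % (hooktype, len(changesets)))
```

The status lines mirror the format of the `cvslog hook: N entries` and `cvschangesets hook: N changesets` lines in the test output. Slice assignment matters here because rebinding the local name `log` would leave the converter's list untouched.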
                  An additional "debugcvsps" Mercurial command allows the builtin changeset
                  merging code to be run without doing a conversion. Its parameters and
                  output are similar to that of cvsps 2.1. Please see the command help for
                  more details.
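The changeset merging that `convert.cvsps.fuzz` tunes can be approximated as shown below. This is a simplified sketch, not the builtin cvsps code, which also considers branches and other entry fields. Entries are sorted so that matching (log message, author) pairs sit together in date order. A new changeset starts when the message or author changes, when the gap to the previous entry exceeds `fuzz` seconds, or when a file would appear twice. `Entry` and `group_changesets` are hypothetical.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for a CVS log entry; date is in seconds.
Entry = namedtuple("Entry", "file author comment date")


def group_changesets(entries, fuzz=60):
    """Group per-file CVS revisions into changesets (simplified sketch)."""
    ordered = sorted(entries, key=lambda e: (e.comment, e.author, e.date))
    changesets = []
    for e in ordered:
        cur = changesets[-1] if changesets else None
        if (cur is None
                or e.comment != cur[-1].comment
                or e.author != cur[-1].author
                or e.date - cur[-1].date > fuzz      # gap to previous entry
                or any(x.file == e.file for x in cur)):  # file already present
            changesets.append([e])
        else:
            cur.append(e)
    return changesets


log = [
    Entry("a", "alice", "ci1", 0),
    Entry("b/c", "alice", "ci1", 50),
    Entry("d", "alice", "ci1", 200),
]
print([len(cs) for cs in group_changesets(log)])            # [2, 1]
print([len(cs) for cs in group_changesets(log, fuzz=300)])  # [3]
```

Raising `fuzz` lets slow check-ins of large files stay in one changeset, at the risk of merging unrelated commits that share an author and message.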
                  Subversion Source
                  -----------------
                  Subversion source detects classical trunk/branches/tags layouts. By
                  default, the supplied "svn://repo/path/" source URL is converted as a
                  single branch. If "svn://repo/path/trunk" exists it replaces the default
                  branch. If "svn://repo/path/branches" exists, its subdirectories are
                  listed as possible branches. If "svn://repo/path/tags" exists, it is
                  searched for tags referencing converted branches. The default "trunk",
                  "branches" and "tags" values can be overridden with the following
                  options. Set them to paths relative to the source URL, or leave them
                  blank to disable auto detection.
                  --config convert.svn.branches=branches    (directory name)
                      specify the directory containing branches
                  --config convert.svn.tags=tags            (directory name)
                      specify the directory containing tags
                  --config convert.svn.trunk=trunk          (directory name)
                      specify the name of the trunk branch
                  Source history can be retrieved starting at a specific revision, instead
                  of being converted in full. Only single branch conversions are
                  supported.
                  --config convert.svn.startrev=0           (svn revision number)
                      specify start Subversion revision.
                  Perforce Source
                  ---------------
                  The Perforce (P4) importer can be given a p4 depot path or a client
                  specification as source. It will convert all files in the source to a flat
                  Mercurial repository, ignoring labels, branches and integrations. Note
                  that when a depot path is given, you should usually specify a target
                  directory, because otherwise the target may be named ...-hg.
                  It is possible to limit the amount of source history to be converted by
                  specifying an initial Perforce revision.
                  --config convert.p4.startrev=0            (perforce changelist number)
                      specify initial Perforce revision.
                  Mercurial Destination
                  ---------------------
                  --config convert.hg.clonebranches=False   (boolean)
                      dispatch source branches in separate clones.
                  --config convert.hg.tagsbranch=default    (branch name)
                      tag revisions branch name
                  --config convert.hg.usebranchnames=True   (boolean)
                      preserve branch names
              options:
               -A --authors      username mapping filename
               -d --dest-type    destination repository type
                  --filemap      remap file names using contents of file
               -r --rev          import up to target revision REV
               -s --source-type  source repository type
                  --splicemap    splice synthesized history into place
                  --branchmap    change branch names while converting
                  --branchsort   try to sort changesets by branches
                  --datesort     try to sort changesets by date
                  --sourcesort   preserve source changesets order
              use "hg -v help convert" to show global options
              adding a
              assuming destination a-hg
              initializing destination a-hg repository
              scanning source...
              sorting...
              converting...
 a
 b
 c
 d
 e
              pulling from ../a
              searching for changes
              no changes found
              % should fail
              initializing destination bogusfile repository
              abort: cannot create new bundle repository
              % should fail
              abort: Permission denied: bogusdir
              % should succeed
              initializing destination bogusdir repository
              scanning source...
              sorting...
              converting...
 a
 b
 c
 d
 e
              % test pre and post conversion actions
              run hg source pre-conversion action
              run hg sink pre-conversion action
              run hg sink post-conversion action
              run hg source post-conversion action
              % converting empty dir should fail nicely
              assuming destination emptydir-hg
              initializing destination emptydir-hg repository
              emptydir does not look like a CVS checkout
              emptydir does not look like a Git repo
              emptydir does not look like a Subversion repo
              emptydir is not a local Mercurial repo
              emptydir does not look like a darcs repo
              emptydir does not look like a monotone repo
              emptydir does not look like a GNU Arch repo
              emptydir does not look like a Bazaar repo
              cannot find required "p4" tool
              abort: emptydir: missing or unsupported repository
              % convert with imaginary source type
              initializing destination a-foo repository
              abort: foo: invalid source repository type
              % convert with imaginary sink type
              abort: foo: invalid destination repository type
