upstream/mercurial-mirror Files · hgext/convert/bzr.py


localrepo: iteratively derive local repository type

This commit implements the dynamic local repository type derivation that was explained in the recent commit "localrepo: create new function for instantiating a local repo object."

Instead of a static localrepository class/type which must be customized after construction, we now dynamically construct a type by building up base classes/types to represent specific repository interfaces.

Conceptually, the end state is similar to what was happening when various extensions would monkeypatch the __class__ of newly-constructed repo instances. However, the approach is inverted. Instead of making the instance and then customizing it, we do the customization up front by influencing the behavior of the type, and then we instantiate that custom type.

This approach gives us much more flexibility. For example, we can use completely separate classes for implementing different aspects of the repository: one class could represent revlog-based file storage and another could represent non-revlog-based file storage. We then choose which implementation to use based on the presence of repo requirements.

A concern with this approach is that it creates a lot more types and complexity, and that complexity adds overhead. Yes, it is true that this approach will result in more types being created. Yes, this is more complicated than traditional "instantiate a static type." However, I believe the alternatives to supporting alternate storage backends are just as complicated. (Before I arrived at this solution, I had patches storing factory functions on local repo instances for e.g. constructing a file storage instance. We ended up having a handful of these. And this was logically identical to assigning custom methods. Since we were logically changing the type of the instance, I figured it would be better to just use specialized types instead of introducing levels of abstraction at run-time.)

On the performance front, I don't believe that having N base classes has any significant performance overhead compared to just a single base class. Intuition says that Python will need to iterate the base classes to find an attribute. However, CPython caches method lookups: as long as the __class__ or MRO isn't changing, method attribute lookup should be constant time after first access. And non-method attributes are stored in __dict__, of which there is only 1 per object, so the number of base classes is irrelevant for __dict__ lookups.

Anyway, this commit splits up the monolithic completelocalrepository interface into sub-interfaces: 1 for file storage and 1 representing everything else.

We've taught ``makelocalrepository()`` to call a series of factory functions which will produce types implementing specific interfaces. It then calls type() to create a new type from the built-up list of base types.

This commit should be considered a start and not the end state. I suspect we'll hit a number of problems as we start to implement alternate storage backends:

* Passing custom arguments to __init__ and setting custom attributes on __dict__.
* Customizing the set of interfaces that are needed. e.g. the "readonly" intent could translate to not requesting an interface providing methods related to writing.
* More ergonomic way for extensions to insert themselves so their callbacks aren't unconditionally called.
* Wanting to modify vfs instances, other arguments passed to __init__.

That being said, this code is usable in its current state and I'm convinced future commits will demonstrate the value in this approach.

Differential Revision: https://phab.mercurial-scm.org/D4642
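The derivation described in the commit message can be illustrated with a minimal sketch. It is not the code from localrepo.py: every class, function, and requirement name below is hypothetical and chosen only for illustration. The only technique it relies on is the built-in type() called with a name, a tuple of base types, and a namespace dict.

```python
# Minimal sketch of "build the type first, then instantiate it".
# All names here are hypothetical; they do not mirror Mercurial's real API.

class MainRepo(object):
    """Hypothetical base type for everything except file storage."""
    def __init__(self, requirements):
        self.requirements = requirements


class RevlogFileStorage(object):
    """Hypothetical base type for revlog-based file storage."""
    def file(self, path):
        return 'revlog:' + path


class AltFileStorage(object):
    """Hypothetical base type for a non-revlog file storage backend."""
    def file(self, path):
        return 'alt:' + path


def make_main(requirements):
    # Factory returning the type that implements the "everything else" interface.
    return MainRepo


def make_file_storage(requirements):
    # Factory choosing a storage type based on the repo requirements.
    if 'revlogv1' in requirements:
        return RevlogFileStorage
    return AltFileStorage


# Ordered list of factories; each one contributes a single base type.
FACTORIES = [make_main, make_file_storage]


def make_repo(requirements):
    # Collect the base types, derive a new type from them, then instantiate it.
    bases = tuple(factory(requirements) for factory in FACTORIES)
    derived = type('derivedrepo', bases, {})
    return derived(requirements)


repo = make_repo({'revlogv1'})
print(repo.file('a.txt'))
print([cls.__name__ for cls in type(repo).__mro__])
```

In this sketch the customization happens before the instance exists, so no __class__ reassignment is needed afterwards; swapping the storage backend only changes which base type a factory returns.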

Matt Harbison

File last commit: r38591:85da230c default

r39800:e4e88157 default

bzr.py | 306 lines | 11.6 KiB | text/x-python | PythonLexer

      # bzr.py - bzr support for the convert extension
      #
      #  Copyright 2008, 2009 Marek Kubica <marek@xivilization.net> and others
      #
      # This software may be used and distributed according to the terms of the
      # GNU General Public License version 2 or any later version.

      # This module is for handling 'bzr', that was formerly known as Bazaar-NG;
      # it cannot access 'bar' repositories, but they were never used very much
      from __future__ import absolute_import

      import os

      from mercurial.i18n import _
      from mercurial import (
          demandimport,
          error
      )
      from . import common

      # these do not work with demandimport, blacklist
      demandimport.IGNORES.update([
              'bzrlib.transactions',
              'bzrlib.urlutils',
              'ElementPath',
          ])

      try:
          # bazaar imports
          import bzrlib.bzrdir
          import bzrlib.errors
          import bzrlib.revision
          import bzrlib.revisionspec
          bzrdir = bzrlib.bzrdir
          errors = bzrlib.errors
          revision = bzrlib.revision
          revisionspec = bzrlib.revisionspec
          revisionspec.RevisionSpec
      except ImportError:
          pass

      supportedkinds = ('file', 'symlink')

      class bzr_source(common.converter_source):
          """Reads Bazaar repositories by using the Bazaar Python libraries"""

          def __init__(self, ui, repotype, path, revs=None):
              super(bzr_source, self).__init__(ui, repotype, path, revs=revs)

              if not os.path.exists(os.path.join(path, '.bzr')):
                  raise common.NoRepo(_('%s does not look like a Bazaar repository')
                                    % path)

              try:
                  # access bzrlib stuff
                  bzrdir
              except NameError:
                  raise common.NoRepo(_('Bazaar modules could not be loaded'))

              path = os.path.abspath(path)
              self._checkrepotype(path)
              try:
                  self.sourcerepo = bzrdir.BzrDir.open(path).open_repository()
              except errors.NoRepositoryPresent:
                  raise common.NoRepo(_('%s does not look like a Bazaar repository')
                                    % path)
              self._parentids = {}
              self._saverev = ui.configbool('convert', 'bzr.saverev')

          def _checkrepotype(self, path):
              # Lightweight checkouts detection is informational but probably
              # fragile at API level. It should not terminate the conversion.
              try:
                  dir = bzrdir.BzrDir.open_containing(path)[0]
                  try:
                      tree = dir.open_workingtree(recommend_upgrade=False)
                      branch = tree.branch
                  except (errors.NoWorkingTree, errors.NotLocalUrl):
                      tree = None
                      branch = dir.open_branch()
                  if (tree is not None and tree.bzrdir.root_transport.base !=
                      branch.bzrdir.root_transport.base):
                      self.ui.warn(_('warning: lightweight checkouts may cause '
                                     'conversion failures, try with a regular '
                                     'branch instead.\n'))
              except Exception:
                  self.ui.note(_('bzr source type could not be determined\n'))

          def before(self):
              """Before the conversion begins, acquire a read lock
              for all the operations that might need it. Fortunately
              read locks don't block other reads or writes to the
              repository, so this shouldn't have any impact on the usage of
              the source repository.

              The alternative would be locking on every operation that
              needs locks (there are currently two: getting the file and
              getting the parent map) and releasing immediately after,
              but this approach can take even 40% longer."""
              self.sourcerepo.lock_read()

          def after(self):
              self.sourcerepo.unlock()

          def _bzrbranches(self):
              return self.sourcerepo.find_branches(using=True)

          def getheads(self):
              if not self.revs:
                  # _bzrbranches() passes using=True to avoid nested
                  # repositories (see issue3254)
                  heads = sorted([b.last_revision() for b in self._bzrbranches()])
              else:
                  revid = None
                  for branch in self._bzrbranches():
                      try:
                          r = revisionspec.RevisionSpec.from_string(self.revs[0])
                          info = r.in_history(branch)
                      except errors.BzrError:
                          # The revision is not in this branch. Skip it so that
                          # 'info' is never read before it has been assigned.
                          continue
                      revid = info.rev_id
                  if revid is None:
                      raise error.Abort(_('%s is not a valid revision')
                                        % self.revs[0])
                  heads = [revid]
              # Empty repositories return 'null:', which cannot be retrieved
              heads = [h for h in heads if h != 'null:']
              return heads

          def getfile(self, name, rev):
              revtree = self.sourcerepo.revision_tree(rev)
              fileid = revtree.path2id(name.decode(self.encoding or 'utf-8'))
              kind = None
              if fileid is not None:
                  kind = revtree.kind(fileid)
              if kind not in supportedkinds:
                  # the file is not available anymore - was deleted
                  return None, None
              mode = self._modecache[(name, rev)]
              if kind == 'symlink':
                  target = revtree.get_symlink_target(fileid)
                  if target is None:
                      raise error.Abort(_('%s.%s symlink has no target')
                                       % (name, rev))
                  return target, mode
              else:
                  sio = revtree.get_file(fileid)
                  return sio.read(), mode

          def getchanges(self, version, full):
              if full:
                  # The message previously said "cvs"; this source is bzr.
                  raise error.Abort(_("convert from bzr does not support --full"))
              self._modecache = {}
              self._revtree = self.sourcerepo.revision_tree(version)
              # get the parentids from the cache
              parentids = self._parentids.pop(version)
              # only diff against first parent id
              prevtree = self.sourcerepo.revision_tree(parentids[0])
              files, changes = self._gettreechanges(self._revtree, prevtree)
              return files, changes, set()

          def getcommit(self, version):
              rev = self.sourcerepo.get_revision(version)
              # populate parent id cache
              if not rev.parent_ids:
                  parents = []
                  self._parentids[version] = (revision.NULL_REVISION,)
              else:
                  parents = self._filterghosts(rev.parent_ids)
                  self._parentids[version] = parents

              branch = self.recode(rev.properties.get('branch-nick', u'default'))
              if branch == 'trunk':
                  branch = 'default'
              return common.commit(parents=parents,
                      date='%d %d' % (rev.timestamp, -rev.timezone),
                      author=self.recode(rev.committer),
                      desc=self.recode(rev.message),
                      branch=branch,
                      rev=version,
                      saverev=self._saverev)

          def gettags(self):
              bytetags = {}
              for branch in self._bzrbranches():
                  if not branch.supports_tags():
                      return {}
                  tagdict = branch.tags.get_tag_dict()
                  for name, rev in tagdict.iteritems():
                      bytetags[self.recode(name)] = rev
              return bytetags

          def getchangedfiles(self, rev, i):
              self._modecache = {}
              curtree = self.sourcerepo.revision_tree(rev)
              if i is not None:
                  parentid = self._parentids[rev][i]
              else:
                  # no parent id, get the empty revision
                  parentid = revision.NULL_REVISION

              prevtree = self.sourcerepo.revision_tree(parentid)
              changes = [e[0] for e in self._gettreechanges(curtree, prevtree)[0]]
              return changes

          def _gettreechanges(self, current, origin):
              revid = current._revision_id
              changes = []
              renames = {}
              seen = set()

              # Fall back to the deprecated attribute for legacy installations.
              try:
                  inventory = origin.root_inventory
              except AttributeError:
                  inventory = origin.inventory

              # Process the entries by reverse lexicographic name order to
              # handle nested renames correctly, most specific first.
              curchanges = sorted(current.iter_changes(origin),
                                  key=lambda c: c[1][0] or c[1][1],
                                  reverse=True)
              for (fileid, paths, changed_content, versioned, parent, name,
                  kind, executable) in curchanges:

                  if paths[0] == u'' or paths[1] == u'':
                      # ignore changes to tree root
                      continue

                  # bazaar tracks directories, mercurial does not, so
                  # we have to rename the directory contents
                  if kind[1] == 'directory':
                      if kind[0] not in (None, 'directory'):
                          # Replacing 'something' with a directory, record it
                          # so it can be removed.
                          changes.append((self.recode(paths[0]), revid))

                      if kind[0] == 'directory' and None not in paths:
                          renaming = paths[0] != paths[1]
                          # neither an add nor a delete, so this is a move:
                          # rename all directory contents manually
                          subdir = inventory.path2id(paths[0])
                          # get all child-entries of the directory
                          for name, entry in inventory.iter_entries(subdir):
                              # hg does not track directory renames
                              if entry.kind == 'directory':
                                  continue
                              frompath = self.recode(paths[0] + '/' + name)
                              if frompath in seen:
                                  # Already handled by a more specific change entry
                                  # This is important when you have:
                                  # a => b
                                  # a/c => a/c
                                  # Here a/c must not be renamed into b/c
                                  continue
                              seen.add(frompath)
                              if not renaming:
                                  continue
                              topath = self.recode(paths[1] + '/' + name)
                              # register the files as changed
                              changes.append((frompath, revid))
                              changes.append((topath, revid))
                              # add to mode cache; 'l' marks a symlink, matching
                              # the flag used for regular changes below (this
                              # branch previously used 's')
                              mode = ((entry.executable and 'x')
                                      or (entry.kind == 'symlink' and 'l')
                                      or '')
                              self._modecache[(topath, revid)] = mode
                              # register the change as move
                              renames[topath] = frompath

                      # no further changes, go to the next change
                      continue

                  # we got unicode paths, need to convert them
                  path, topath = paths
                  if path is not None:
                      path = self.recode(path)
                  if topath is not None:
                      topath = self.recode(topath)
                  seen.add(path or topath)

                  if topath is None:
                      # file deleted
                      changes.append((path, revid))
                      continue

                  # renamed
                  if path and path != topath:
                      renames[topath] = path
                      changes.append((path, revid))

                  # populate the mode cache
                  kind, executable = [e[1] for e in (kind, executable)]
                  mode = ((executable and 'x') or (kind == 'symlink' and 'l')
                          or '')
                  self._modecache[(topath, revid)] = mode
                  changes.append((topath, revid))

              return changes, renames

          def _filterghosts(self, ids):
              """Filters out ghost revisions which hg does not support, see
              <http://bazaar-vcs.org/GhostRevision>
              """
              parentmap = self.sourcerepo.get_parent_map(ids)
              parents = tuple([parent for parent in ids if parent in parentmap])
              return parents

