upstream/mercurial-mirror Files · hgext/narrow/narrowrevlog.py


lfs: add basic routing for the server side wire protocol processing

The recent hgweb refactoring yielded a clean point to wrap a function that could handle this, so I moved the routing for this out of the core. While not an hg wire protocol, this seems logically close enough. For now, these handlers do nothing other than check permissions.

The protocol requires support for PUT requests, so that has been added to the core, and funnels into the same handler as GET and POST. The permission checking code was assuming that anything not checking 'pull' or None ops should be using POST. But that breaks the upload check if it checks 'push'. So I invented a new 'upload' permission, and used it to avoid the mandate to POST. A function wrap point could be added, but security code should probably stay grouped together. Given that anything not 'pull' or None was requiring POST, the comment on hgweb.common.permhooks is probably wrong- there is no 'read'.

The rationale for the URIs is that the spec for the Batch API[1] defines the URL as the LFS server url + '/objects/batch'. The default git URLs are:

    Git remote: https://git-server.com/foo/bar
    LFS server: https://git-server.com/foo/bar.git/info/lfs
    Batch API: https://git-server.com/foo/bar.git/info/lfs/objects/batch

'.git/' seems like it's not something a user would normally track. If we adhere to how git defines the URLs, then the hg-git extension should be able to talk to a git based server without any additional work.

The URI for the transfer requests starts with '.hg/' to ensure that there are no conflicts with tracked files. Since these are handed out by the Batch API, we can change this at any point in the future. (Specifically, it might be a good idea to use something under the proposed /api/ namespace.) In any case, no files are stored at these locations in the repository directory.

I started a new module for this because it seems like a good idea to keep all of the security sensitive server side code together.

There's also an issue with `hg verify` in that it will want to download *all* blobs in order to run. Sadly, there's no way in the protocol to ask the server to verify the content of a blob it may have. (The verify action is for storing files on a 3rd party server, and then informing the LFS server when that completes.) So we may end up implementing a custom transfer adapter that simply indicates if the blobs are valid, and fall back to basic transfers for non-hg servers. In other words, this code is likely to get bigger before this is made non-experimental.

[1] https://github.com/git-lfs/git-lfs/blob/master/docs/api/batch.md
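
The URL layout described in the commit message can be sketched as follows. This is only an illustration of the string composition quoted above, not code from the lfs extension; the function name `lfs_urls` is hypothetical, and the sketch assumes a git remote URL that does not already end in '.git'.

```python
def lfs_urls(git_remote):
    """Return (lfs_server_url, batch_api_url) for a git remote URL.

    Hypothetical helper: it only reproduces the default layout quoted
    in the commit message and assumes the remote lacks a '.git' suffix.
    """
    # LFS server: the remote URL followed by '.git/info/lfs'.
    lfs_server = git_remote.rstrip('/') + '.git/info/lfs'
    # Batch API: the LFS server URL followed by '/objects/batch'.
    batch_api = lfs_server + '/objects/batch'
    return lfs_server, batch_api


lfs_server, batch_api = lfs_urls('https://git-server.com/foo/bar')
print(lfs_server)
print(batch_api)
```

With the example remote from the commit message, the two printed URLs should match the "LFS server" and "Batch API" lines listed there.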

Augie Fackler

File last commit: r36114:66b4ffe9 default

r37165:a2566597 default

narrowrevlog.py | 187 lines | 7.2 KiB | text/x-python

# narrowrevlog.py - revlog storing irrelevant nodes as "ellipsis" nodes
#
# Copyright 2017 Google, Inc.
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.

from __future__ import absolute_import

from mercurial import (
    error,
    manifest,
    revlog,
    util,
)

def readtransform(self, text):
    return text, False

def writetransform(self, text):
    return text, False

def rawtransform(self, text):
    return False

revlog.addflagprocessor(revlog.REVIDX_ELLIPSIS,
                        (readtransform, writetransform, rawtransform))

def setup():
    # We just wanted to add the flag processor, which is done at module
    # load time.
    pass

class excludeddir(manifest.treemanifest):
    """Stand-in for a directory that is excluded from the repository.

    With narrowing active on a repository that uses treemanifests,
    some of the directory revlogs will be excluded from the resulting
    clone. This is a huge storage win for clients, but means we need
    some sort of pseudo-manifest to surface to internals so we can
    detect a merge conflict outside the narrowspec. That's what this
    class is: it stands in for a directory whose node is known, but
    whose contents are unknown.
    """
    def __init__(self, dir, node):
        super(excludeddir, self).__init__(dir)
        self._node = node
        # Add an empty file, which will be included by iterators and such,
        # appearing as the directory itself (i.e. something like "dir/")
        self._files[''] = node
        self._flags[''] = 't'

    # Manifests outside the narrowspec should never be modified, so avoid
    # copying. This makes a noticeable difference when there are very many
    # directories outside the narrowspec. Also, it makes sense for the copy to
    # be of the same type as the original, which would not happen with the
    # super type's copy().
    def copy(self):
        return self

class excludeddirmanifestctx(manifest.treemanifestctx):
    """context wrapper for excludeddir - see that docstring for rationale"""
    def __init__(self, dir, node):
        self._dir = dir
        self._node = node

    def read(self):
        return excludeddir(self._dir, self._node)

    def write(self, *args):
        raise error.ProgrammingError(
            'attempt to write manifest from excluded dir %s' % self._dir)

class excludedmanifestrevlog(manifest.manifestrevlog):
    """Stand-in for excluded treemanifest revlogs.

    When narrowing is active on a treemanifest repository, we'll have
    references to directories we can't see due to the revlog being
    skipped. This class exists to conform to the manifestrevlog
    interface for those directories and proactively prevent writes to
    outside the narrowspec.
    """

    def __init__(self, dir):
        self._dir = dir

    def __len__(self):
        raise error.ProgrammingError(
            'attempt to get length of excluded dir %s' % self._dir)

    def rev(self, node):
        raise error.ProgrammingError(
            'attempt to get rev from excluded dir %s' % self._dir)

    def linkrev(self, node):
        raise error.ProgrammingError(
            'attempt to get linkrev from excluded dir %s' % self._dir)

    def node(self, rev):
        raise error.ProgrammingError(
            'attempt to get node from excluded dir %s' % self._dir)

    def add(self, *args, **kwargs):
        # We should never write entries in dirlogs outside the narrow clone.
        # However, the method still gets called from writesubtree() in
        # _addtree(), so we need to handle it. We should possibly make that
        # avoid calling add() with a clean manifest (_dirty is always False
        # in excludeddir instances).
        pass

def makenarrowmanifestrevlog(mfrevlog, repo):
    if util.safehasattr(mfrevlog, '_narrowed'):
        return

    class narrowmanifestrevlog(mfrevlog.__class__):
        # This function is called via debug{revlog,index,data}, but also during
        # at least some push operations. This will be used to wrap/exclude the
        # child directories when using treemanifests.
        def dirlog(self, d):
            if d and not d.endswith('/'):
                d = d + '/'
            if not repo.narrowmatch().visitdir(d[:-1] or '.'):
                return excludedmanifestrevlog(d)
            result = super(narrowmanifestrevlog, self).dirlog(d)
            makenarrowmanifestrevlog(result, repo)
            return result

    mfrevlog.__class__ = narrowmanifestrevlog
    mfrevlog._narrowed = True

def makenarrowmanifestlog(mfl, repo):
    class narrowmanifestlog(mfl.__class__):
        def get(self, dir, node, verify=True):
            if not repo.narrowmatch().visitdir(dir[:-1] or '.'):
                return excludeddirmanifestctx(dir, node)
            return super(narrowmanifestlog, self).get(dir, node, verify=verify)
    mfl.__class__ = narrowmanifestlog

def makenarrowfilelog(fl, narrowmatch):
    class narrowfilelog(fl.__class__):
        def renamed(self, node):
            # Renames that come from outside the narrowspec are
            # problematic at least for git-diffs, because we lack the
            # base text for the rename. This logic was introduced in
            # 3cd72b1 of narrowhg (authored by martinvonz, reviewed by
            # adgar), but that revision doesn't have any additional
            # commentary on what problems we can encounter.
            m = super(narrowfilelog, self).renamed(node)
            if m and not narrowmatch(m[0]):
                return None
            return m

        def size(self, rev):
            # We take advantage of the fact that remotefilelog
            # lacks a node() method to just skip the
            # rename-checking logic when on remotefilelog. This
            # might be incorrect on other non-revlog-based storage
            # engines, but for now this seems to be fine.
            #
            # TODO: when remotefilelog is in core, improve this to
            # explicitly look for remotefilelog instead of cheating
            # with a hasattr check.
            if util.safehasattr(self, 'node'):
                node = self.node(rev)
                # Because renamed() is overridden above to
                # sometimes return None even if there is metadata
                # in the revlog, size can be incorrect for
                # copies/renames, so we need to make sure we call
                # the super class's implementation of renamed()
                # for the purpose of size calculation.
                if super(narrowfilelog, self).renamed(node):
                    return len(self.read(node))
            return super(narrowfilelog, self).size(rev)

        def cmp(self, node, text):
            different = super(narrowfilelog, self).cmp(node, text)
            if different:
                # Similar to size() above, if the file was copied from
                # a file outside the narrowspec, the super class's
                # cmp() would have returned True because we tricked it
                # into thinking that the file was not renamed.
                if super(narrowfilelog, self).renamed(node):
                    t2 = self.read(node)
                    return t2 != text
            return different

    fl.__class__ = narrowfilelog
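
The makenarrow* helpers above all rely on the same technique: define a subclass of the object's current class at call time and reassign the object's `__class__` to it. The following is a minimal, self-contained sketch of that pattern outside Mercurial. Every name in it (`DirStore`, `ExcludedStore`, `make_narrow`, `visit_only_src`) is hypothetical and stands in for the revlog and narrowmatch objects; it is not Mercurial's API, and it uses a plain RuntimeError where the original raises error.ProgrammingError.

```python
class DirStore(object):
    """Hypothetical stand-in for a manifest revlog (not Mercurial's API)."""

    def __init__(self, path=''):
        self.path = path

    def dirlog(self, d):
        # Return the store for a child directory.
        return DirStore(d)


class ExcludedStore(object):
    """Hypothetical stand-in for a directory outside the allowed set."""

    def __init__(self, path):
        self.path = path

    def __len__(self):
        # Any attempt to inspect an excluded directory is a programming error.
        raise RuntimeError(
            'attempt to get length of excluded dir %s' % self.path)


def make_narrow(store, visitdir):
    """Swap store's class for a subclass whose dirlog() filters children."""
    if getattr(store, '_narrowed', False):
        return  # already wrapped; avoid stacking subclasses

    class NarrowStore(store.__class__):
        def dirlog(self, d):
            if d and not d.endswith('/'):
                d = d + '/'
            if not visitdir(d[:-1] or '.'):
                return ExcludedStore(d)
            result = super(NarrowStore, self).dirlog(d)
            # Wrap the child too, so filtering applies at every level.
            make_narrow(result, visitdir)
            return result

    store.__class__ = NarrowStore
    store._narrowed = True


def visit_only_src(d):
    # Assumed matcher for this sketch: only the root and 'src' are visible.
    return d == '.' or d == 'src' or d.startswith('src/')


root = DirStore()
make_narrow(root, visit_only_src)
inside = root.dirlog('src')    # wrapped DirStore for 'src/'
outside = root.dirlog('docs')  # ExcludedStore for 'docs/'
```

Reassigning `__class__` changes the behaviour of an instance that other code already holds a reference to, which is presumably why the original takes this route instead of constructing a wrapper object. The `_narrowed` marker keeps repeated calls from layering one subclass on top of another.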
