upstream/mercurial-mirror · hgext/relink.py

revlog: add a mechanism to verify expected file position before appending

If someone uses `hg debuglocks`, or some non-hg process writes to the .hg directory without respecting the locks, or if the repo's on a networked filesystem, it's possible for the revlog code to write out corrupted data.

The form of this corruption can vary depending on what data was written and how that happened. We are in the "networked filesystem" case (though I've had users also do this to themselves with the `hg debuglocks` scenario), and most often see this with the changelog. What ends up happening is we produce two items (let's call them rev1 and rev2) in the .i file that have the same linkrev, baserev, and offset into the .d file, while the data in the .d file is appended properly. rev2's compressed_size is accurate for rev2, but when we go to decompress the data in the .d file, we use the offset that's recorded in the index file, which is the same as rev1, and attempt to decompress rev2.compressed_size bytes of rev1's data. This usually does not succeed. :)

When using inline data, this also fails, though I haven't investigated why too closely. This shows up as a "patch decode" error. I believe what's happening there is that we're basically ignoring the offset field, getting the data properly, but since baserev != rev, it thinks this is a delta based on rev (instead of a full text) and can't actually apply it as such.

For now, I'm going to make this an optional component and default it to entirely off. I may increase the default severity of this in the future, once I've enabled it for my users and we gain more experience with it. Luckily, most of my users have a versioned filesystem and can roll back to before the corruption has been written; it's just a hassle to do so and not everyone knows how (so it's a support burden). Users on other filesystems will not have that luxury, and this can cause them to have a corrupted repository that they are unlikely to know how to resolve, and they'll see this as a data-loss event. Refusing to create the corruption is a much better user experience.

This mechanism is not perfect. There may be false negatives (racy writes that are not detected). There should not be any false positives (non-racy writes that are detected as such). This is not a mechanism that makes putting a repo on a networked filesystem "safe" or "supported", just *less* likely to cause corruption.

Differential Revision: https://phab.mercurial-scm.org/D9952
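The position check described in this message can be pictured with a minimal, hypothetical sketch. `append_checked` and `UnexpectedOffsetError` are illustrative names, not Mercurial's revlog API; the sketch only assumes that the writer knows the offset it is about to record (for instance in an index entry) and compares it with where the data file actually ends before writing.

```python
import os


class UnexpectedOffsetError(Exception):
    """Raised when a data file does not end where the caller expected."""


def append_checked(path, data, expected_offset):
    """Append data to path only if the file currently ends at expected_offset.

    Returns the new end offset, which the caller would record for the next
    append (hypothetically, as the offset stored in an index entry).
    """
    with open(path, 'ab') as fp:
        # Writes in 'ab' mode always land at the end; seek there to learn where that is.
        actual = fp.seek(0, os.SEEK_END)
        if actual != expected_offset:
            # Another writer appended (or truncated) since we last looked:
            # refuse to write rather than record an offset that is wrong.
            raise UnexpectedOffsetError(
                '%s ends at %d, expected %d' % (path, actual, expected_offset)
            )
        fp.write(data)
    return expected_offset + len(data)
```

Because the size check and the write are separate steps, a concurrent writer can still slip in between them, which matches the caveat that false negatives remain possible; a file that nobody else touched never trips the check.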

Authors: Gregory Szorc

File last commit: r43355:eef9a2d6 (default)
Revision: r47349:e9901d01 (default)

hgext/relink.py · 211 lines · 6.6 KiB · text/x-python

    # Mercurial extension to provide 'hg relink' command
    #
    # Copyright (C) 2007 Brendan Cully <brendan@kublai.com>
    #
    # This software may be used and distributed according to the terms of the
    # GNU General Public License version 2 or any later version.

    """recreates hardlinks between repository clones"""
    from __future__ import absolute_import

    import os
    import stat

    from mercurial.i18n import _
    from mercurial.pycompat import open
    from mercurial import (
        error,
        hg,
        registrar,
        util,
    )
    from mercurial.utils import stringutil

    cmdtable = {}
    command = registrar.command(cmdtable)
    # Note for extension authors: ONLY specify testedwith = 'ships-with-hg-core' for
    # extensions which SHIP WITH MERCURIAL. Non-mainline extensions should
    # be specifying the version(s) of Mercurial they are tested with, or
    # leave the attribute unspecified.
    testedwith = b'ships-with-hg-core'


    @command(
        b'relink', [], _(b'[ORIGIN]'), helpcategory=command.CATEGORY_MAINTENANCE
    )
    def relink(ui, repo, origin=None, **opts):
        """recreate hardlinks between two repositories

        When repositories are cloned locally, their data files will be
        hardlinked so that they only use the space of a single repository.

        Unfortunately, subsequent pulls into either repository will break
        hardlinks for any files touched by the new changesets, even if
        both repositories end up pulling the same changes.

        Similarly, passing --rev to "hg clone" will fail to use any
        hardlinks, falling back to a complete copy of the source
        repository.

        This command lets you recreate those hardlinks and reclaim that
        wasted space.

        This repository will be relinked to share space with ORIGIN, which
        must be on the same local disk. If ORIGIN is omitted, looks for
        "default-relink", then "default", in [paths].

        Do not attempt any read operations on this repository while the
        command is running. (Both repositories will be locked against
        writes.)
        """
        if not util.safehasattr(util, b'samefile') or not util.safehasattr(
            util, b'samedevice'
        ):
            raise error.Abort(_(b'hardlinks are not supported on this system'))
        src = hg.repository(
            repo.baseui,
            ui.expandpath(origin or b'default-relink', origin or b'default'),
        )
        ui.status(_(b'relinking %s to %s\n') % (src.store.path, repo.store.path))
        if repo.root == src.root:
            ui.status(_(b'there is nothing to relink\n'))
            return

        if not util.samedevice(src.store.path, repo.store.path):
            # No point in continuing
            raise error.Abort(_(b'source and destination are on different devices'))

        with repo.lock(), src.lock():
            candidates = sorted(collect(src, ui))
            targets = prune(candidates, src.store.path, repo.store.path, ui)
            do_relink(src.store.path, repo.store.path, targets, ui)


    def collect(src, ui):
        seplen = len(os.path.sep)
        candidates = []
        live = len(src[b'tip'].manifest())
        # Your average repository has some files which were deleted before
        # the tip revision. We account for that by assuming that there are
        # 3 tracked files for every 2 live files as of the tip version of
        # the repository.
        #
        # mozilla-central as of 2010-06-10 had a ratio of just over 7:5.
        total = live * 3 // 2
        src = src.store.path
        progress = ui.makeprogress(_(b'collecting'), unit=_(b'files'), total=total)
        pos = 0
        ui.status(
            _(b"tip has %d files, estimated total number of files: %d\n")
            % (live, total)
        )
        for dirpath, dirnames, filenames in os.walk(src):
            dirnames.sort()
            relpath = dirpath[len(src) + seplen :]
            for filename in sorted(filenames):
                if filename[-2:] not in (b'.d', b'.i'):
                    continue
                st = os.stat(os.path.join(dirpath, filename))
                if not stat.S_ISREG(st.st_mode):
                    continue
                pos += 1
                candidates.append((os.path.join(relpath, filename), st))
                progress.update(pos, item=filename)

        progress.complete()
        ui.status(_(b'collected %d candidate storage files\n') % len(candidates))
        return candidates


    def prune(candidates, src, dst, ui):
        def linkfilter(src, dst, st):
            try:
                ts = os.stat(dst)
            except OSError:
                # Destination doesn't have this file?
                return False
            if util.samefile(src, dst):
                return False
            if not util.samedevice(src, dst):
                # No point in continuing
                raise error.Abort(
                    _(b'source and destination are on different devices')
                )
            if st.st_size != ts.st_size:
                return False
            return st

        targets = []
        progress = ui.makeprogress(
            _(b'pruning'), unit=_(b'files'), total=len(candidates)
        )
        pos = 0
        for fn, st in candidates:
            pos += 1
            srcpath = os.path.join(src, fn)
            tgt = os.path.join(dst, fn)
            ts = linkfilter(srcpath, tgt, st)
            if not ts:
                ui.debug(b'not linkable: %s\n' % fn)
                continue
            targets.append((fn, ts.st_size))
            progress.update(pos, item=fn)

        progress.complete()
        ui.status(
            _(b'pruned down to %d probably relinkable files\n') % len(targets)
        )
        return targets


    def do_relink(src, dst, files, ui):
        def relinkfile(src, dst):
            bak = dst + b'.bak'
            os.rename(dst, bak)
            try:
                util.oslink(src, dst)
            except OSError:
                os.rename(bak, dst)
                raise
            os.remove(bak)

        CHUNKLEN = 65536
        relinked = 0
        savedbytes = 0

        progress = ui.makeprogress(
            _(b'relinking'), unit=_(b'files'), total=len(files)
        )
        pos = 0
        for f, sz in files:
            pos += 1
            source = os.path.join(src, f)
            tgt = os.path.join(dst, f)
            # Binary mode, so that read() works correctly, especially on Windows
            sfp = open(source, b'rb')
            dfp = open(tgt, b'rb')
            sin = sfp.read(CHUNKLEN)
            while sin:
                din = dfp.read(CHUNKLEN)
                if sin != din:
                    break
                sin = sfp.read(CHUNKLEN)
            sfp.close()
            dfp.close()
            if sin:
                ui.debug(b'not linkable: %s\n' % f)
                continue
            try:
                relinkfile(source, tgt)
                progress.update(pos, item=f)
                relinked += 1
                savedbytes += sz
            except OSError as inst:
                ui.warn(b'%s: %s\n' % (tgt, stringutil.forcebytestr(inst)))

        progress.complete()

        ui.status(
            _(b'relinked %d files (%s reclaimed)\n')
            % (relinked, util.bytecount(savedbytes))
        )
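The scan performed by `collect()` can be approximated outside Mercurial with the standard library alone. The sketch below is a hypothetical stand-alone helper (`collect_candidates` and `estimate_total` are illustrative names, not part of the extension). It assumes `str` paths and uses `os.path.relpath` instead of the extension's bytes paths and slice arithmetic, but applies the same filters: regular files only, names ending in `.i` or `.d`, and directories visited in sorted order. `estimate_total` reproduces the 3:2 heuristic, so a tip with 1000 live files gives an estimate of 1500.

```python
import os
import stat


def collect_candidates(store_root):
    """Return (relative path, stat result) pairs for revlog files under store_root."""
    candidates = []
    for dirpath, dirnames, filenames in os.walk(store_root):
        # Sorting dirnames in place makes os.walk descend in a stable order.
        dirnames.sort()
        relpath = os.path.relpath(dirpath, store_root)
        for filename in sorted(filenames):
            # Only revlog index (.i) and data (.d) files are relink candidates.
            if not filename.endswith(('.i', '.d')):
                continue
            st = os.stat(os.path.join(dirpath, filename))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip anything that is not a regular file
            # normpath turns './00changelog.i' at the store root into '00changelog.i'.
            candidates.append((os.path.normpath(os.path.join(relpath, filename)), st))
    return candidates


def estimate_total(live):
    # Assume 3 tracked files for every 2 files live at tip, as collect() does.
    return live * 3 // 2
```

The estimate only sizes the progress bar; the actual candidate count comes from the directory walk.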

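The cheap checks that `prune()` applies before any byte comparison (destination exists, not already the same file, same device, same size) can be sketched with `os.stat` and `os.path.samefile`. This is a hypothetical approximation, not the extension's code: `is_relink_candidate` and `CrossDeviceError` are illustrative names, and the sketch stats both paths itself, whereas the extension reuses the source stat gathered by `collect()` and calls `util.samefile`/`util.samedevice`.

```python
import os


class CrossDeviceError(Exception):
    """Hypothetical stand-in for the extension's 'different devices' abort."""


def is_relink_candidate(src, dst):
    """Return the source stat result if dst is worth comparing, else None."""
    try:
        src_st = os.stat(src)
        dst_st = os.stat(dst)
    except OSError:
        return None  # one side is missing, so there is nothing to relink
    # samefile compares device and inode numbers: the two names are already linked.
    if os.path.samefile(src, dst):
        return None
    if src_st.st_dev != dst_st.st_dev:
        # Hardlinks cannot span devices, so continuing would be pointless.
        raise CrossDeviceError('source and destination are on different devices')
    if src_st.st_size != dst_st.st_size:
        return None  # files of different sizes cannot have identical contents
    return src_st
```

Passing these checks only makes a file "probably relinkable"; identical contents still have to be confirmed byte by byte before the destination is replaced.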

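The two steps in `do_relink()` (a chunked byte comparison, then a rename, link and remove sequence that restores the original file if linking fails) can be sketched with the standard library. The helper names `same_contents` and `relink_file` are hypothetical. The sketch assumes `str` paths, calls `os.link` where the extension calls `util.oslink`, uses `with` blocks so both files are closed even if a read raises, and reads the two files in lockstep instead of relying on `prune()` having already matched their sizes.

```python
import os

CHUNKLEN = 65536  # the same 64 KiB chunk size that do_relink() uses


def same_contents(path_a, path_b, chunklen=CHUNKLEN):
    """Return True if both files hold identical bytes, reading them in chunks."""
    with open(path_a, 'rb') as fa, open(path_b, 'rb') as fb:
        while True:
            chunk_a = fa.read(chunklen)
            chunk_b = fb.read(chunklen)
            if chunk_a != chunk_b:
                return False  # stop at the first differing chunk
            if not chunk_a:
                return True  # both files ended at the same point


def relink_file(src, dst):
    """Replace dst with a hardlink to src, restoring dst if linking fails."""
    bak = dst + '.bak'
    os.rename(dst, bak)  # keep the old copy until the link exists
    try:
        os.link(src, dst)
    except OSError:
        os.rename(bak, dst)  # put the original back, then re-raise
        raise
    os.remove(bak)
```

Calling `relink_file` only after `same_contents` returns True is what makes the swap safe: the destination's bytes stay the same, and only its storage becomes shared with the source.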