repair: migrate revlogs during upgrade

Our next step for in-place upgrade is to migrate store data. Revlogs are the biggest source of data within the store and a store is useless without them, so we implement their migration first.

Our strategy for migrating revlogs is to walk the store and call `revlog.clone()` on each revlog. There are some minor complications. Because revlogs have different storage options (e.g. changelog has generaldelta and delta chains disabled), we need to obtain the correct class of revlog so inserted data is encoded properly for its type.

I made various attempts at implementing progress indicators that wouldn't lead to frustration from false "it's almost done" readings. I initially used a single progress bar based on the number of revlogs. However, this quickly churned through all filelogs, got to 99%, then effectively froze at 99.99% when it got to the manifest. So I converted the progress bar to total revision count. This was a little bit better. But the manifest was still significantly slower than filelogs and it took forever to process the last few percent. I then tried both revision/chunk bytes and raw bytes as the denominator. This had the opposite effect: because so much data is in manifests, it would churn through filelogs without showing much progress. When it got to manifests, it would fill in 90+% of the progress bar.

I finally gave up on having a unified progress bar and instead implemented 3 progress bars: 1 for filelog revisions, 1 for manifest revisions, and 1 for changelog revisions. I added extra messages indicating the total number of revisions of each so users know there are more progress bars coming. I also added extra messages before and after each stage to give extra details about what is happening. Strictly speaking, this isn't necessary. But the numbers are impressive. For example, when converting a non-generaldelta mozilla-central repository, the messages you see are:

    migrating 2475593 total revisions (1833043 in filelogs, 321156 in manifests, 321394 in changelog)
    migrating 1.67 GB in store; 2508 GB tracked data
    migrating 267868 filelogs containing 1833043 revisions (1.09 GB in store; 57.3 GB tracked data)
    finished migrating 1833043 filelog revisions across 267868 filelogs; change in size: -415776 bytes
    migrating 1 manifests containing 321156 revisions (518 MB in store; 2451 GB tracked data)

That "2508 GB" figure really blew me away. I had no clue that the raw tracked data in mozilla-central was that large. Granted, 2451 GB is in the manifest and "only" 57.3 GB is in filelogs. But still.

It's worth noting that the gratuitous loading of source revlogs in order to display numbers and progress bars does serve a purpose: it ensures we can open all source revlogs. We don't want to spend several minutes copying revlogs only to encounter a permissions error or similar later.

As part of this commit, we also add swapping of the store directory to the upgrade function. After revlogs are converted, we move the old store into the backup directory, then move the temporary repo's store into the old store's location. On well-behaved systems, this should be 2 atomic operations and the window of inconsistency should be very narrow.

There are still a few improvements to be made to store copying and upgrading. But this commit gets the bulk of the work out of the way.
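As a rough, hypothetical sketch of the per-revlog loop described above (not the implementation from this commit: `migraterevlogs`, `walkstorerevlogs`, `opensourcerevlog` and `opendestrevlog` are made-up helper names standing in for the real store-walking and revlog-construction code; only `revlog.clone()` and `ui.write()` come from Mercurial itself):

    def migraterevlogs(ui, srcrepo, dstrepo, tr):
        # Open every source revlog up front: a permissions problem surfaces
        # before any copying starts, and the total revision count is known
        # for the progress bars.
        revlogs = [opensourcerevlog(srcrepo, path)
                   for path in walkstorerevlogs(srcrepo)]
        total = sum(len(rl) for rl in revlogs)
        ui.write('migrating %d total revisions\n' % total)

        for rl in revlogs:
            # The destination revlog must be constructed with options
            # matching its type (e.g. the changelog disables generaldelta
            # and delta chains) so the copied data is encoded correctly.
            dst = opendestrevlog(dstrepo, rl)
            rl.clone(tr, dst)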

# demandimport.py - global demand-loading of modules for Mercurial
#
# Copyright 2006, 2007 Matt Mackall <mpm@selenic.com>
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.

'''
demandimport - automatic demandloading of modules

To enable this module, do:

  import demandimport; demandimport.enable()

Imports of the following forms will be demand-loaded:

  import a, b.c
  import a.b as c
  from a import b,c # a will be loaded immediately

These imports will not be delayed:

  from a import *
  b = __import__(a)

'''

from __future__ import absolute_import

import contextlib
import os
import sys

# __builtin__ in Python 2, builtins in Python 3.
try:
    import __builtin__ as builtins
except ImportError:
    import builtins

contextmanager = contextlib.contextmanager

_origimport = __import__

nothing = object()

# Python 3 doesn't have relative imports nor level -1.
level = -1
if sys.version_info[0] >= 3:
    level = 0
_import = _origimport

def _hgextimport(importfunc, name, globals, *args, **kwargs):
    try:
        return importfunc(name, globals, *args, **kwargs)
    except ImportError:
        if not globals:
            raise
        # extensions are loaded with "hgext_" prefix
        hgextname = 'hgext_%s' % name
        nameroot = hgextname.split('.', 1)[0]
        contextroot = globals.get('__name__', '').split('.', 1)[0]
        if nameroot != contextroot:
            raise
        # retry to import with "hgext_" prefix
        return importfunc(hgextname, globals, *args, **kwargs)

class _demandmod(object):
    """module demand-loader and proxy

    Specify 1 as the 'level' argument at construction to import the module
    relatively.
    """

    def __init__(self, name, globals, locals, level):
        if '.' in name:
            head, rest = name.split('.', 1)
            after = [rest]
        else:
            head = name
            after = []
        object.__setattr__(self, "_data",
                           (head, globals, locals, after, level, set()))
        object.__setattr__(self, "_module", None)

    def _extend(self, name):
        """add to the list of submodules to load"""
        self._data[3].append(name)

    def _addref(self, name):
        """Record that the named module ``name`` imports this module.

        References to this proxy class having the name of this module will be
        replaced at module load time. We assume the symbol inside the importing
        module is identical to the "head" name of this module. We don't
        actually know if "as X" syntax is being used to change the symbol name
        because this information isn't exposed to __import__.
        """
        self._data[5].add(name)

    def _load(self):
        if not self._module:
            head, globals, locals, after, level, modrefs = self._data
            mod = _hgextimport(_import, head, globals, locals, None, level)
            if mod is self:
                # In this case, _hgextimport() above should have implied
                # _demandimport(); otherwise _hgextimport() never returns
                # a _demandmod instance. This behavior isn't actually
                # intentional (see issue5304 for details).
                #
                # If self._module is already bound at this point, self was
                # already _load()-ed during _hgextimport(). Otherwise there
                # is no way to import the actual module as expected, because
                # re-invoking _hgextimport() would produce the same result.
                # That is why _load() returns here without any further setup,
                # assuming self is already bound to the real module.
                mod = self._module
                assert mod and mod is not self, "%s, %s" % (self, mod)
                return

            # load submodules
            def subload(mod, p):
                h, t = p, None
                if '.' in p:
                    h, t = p.split('.', 1)
                if getattr(mod, h, nothing) is nothing:
                    setattr(mod, h, _demandmod(p, mod.__dict__, mod.__dict__,
                                               level=1))
                elif t:
                    subload(getattr(mod, h), t)

            for x in after:
                subload(mod, x)

            # Replace references to this proxy instance with the actual module.
            if locals and locals.get(head) == self:
                locals[head] = mod

            for modname in modrefs:
                modref = sys.modules.get(modname, None)
                if modref and getattr(modref, head, None) == self:
                    setattr(modref, head, mod)

            object.__setattr__(self, "_module", mod)

    def __repr__(self):
        if self._module:
            return "<proxied module '%s'>" % self._data[0]
        return "<unloaded module '%s'>" % self._data[0]

    def __call__(self, *args, **kwargs):
        raise TypeError("%s object is not callable" % repr(self))

    def __getattribute__(self, attr):
        if attr in ('_data', '_extend', '_load', '_module', '_addref'):
            return object.__getattribute__(self, attr)
        self._load()
        return getattr(self._module, attr)

    def __setattr__(self, attr, val):
        self._load()
        setattr(self._module, attr, val)
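
# PyPy exposes '__pypy__' as a built-in module name; detect it so the import
# logic below can special-case PyPy's __import__ behavior.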
_pypy = '__pypy__' in sys.builtin_module_names

def _demandimport(name, globals=None, locals=None, fromlist=None, level=level):
    if not locals or name in ignore or fromlist == ('*',):
        # these cases we can't really delay
        return _hgextimport(_import, name, globals, locals, fromlist, level)
    elif not fromlist:
        # import a [as b]
        if '.' in name: # a.b
            base, rest = name.split('.', 1)
            # email.__init__ loading email.mime
            if globals and globals.get('__name__', None) == base:
                return _import(name, globals, locals, fromlist, level)
            # if a is already demand-loaded, add b to its submodule list
            if base in locals:
                if isinstance(locals[base], _demandmod):
                    locals[base]._extend(rest)
                return locals[base]
        return _demandmod(name, globals, locals, level)
    else:
        # There is a fromlist.
        # from a import b,c,d
        # from . import b,c,d
        # from .a import b,c,d

        # level == -1: relative and absolute attempted (Python 2 only).
        # level >= 0: absolute only (Python 2 w/ absolute_import and Python 3).
        # The modern Mercurial convention is to use absolute_import everywhere,
        # so modern Mercurial code will have level >= 0.

        # The name of the module the import statement is located in.
        globalname = globals.get('__name__')

        def processfromitem(mod, attr):
            """Process an imported symbol in the import statement.

            If the symbol doesn't exist in the parent module, and if the
            parent module is a package, it must be a module. We set missing
            modules up as _demandmod instances.
            """
            symbol = getattr(mod, attr, nothing)
            nonpkg = getattr(mod, '__path__', nothing) is nothing
            if symbol is nothing:
                if nonpkg:
                    # do not try relative import, which would raise ValueError,
                    # and leave unknown attribute as the default __import__()
                    # would do. the missing attribute will be detected later
                    # while processing the import statement.
                    return
                mn = '%s.%s' % (mod.__name__, attr)
                if mn in ignore:
                    importfunc = _origimport
                else:
                    importfunc = _demandmod
                symbol = importfunc(attr, mod.__dict__, locals, level=1)
                setattr(mod, attr, symbol)

            # Record the importing module references this symbol so we can
            # replace the symbol with the actual module instance at load
            # time.
            if globalname and isinstance(symbol, _demandmod):
                symbol._addref(globalname)

        def chainmodules(rootmod, modname):
            # recurse down the module chain, and return the leaf module
            mod = rootmod
            for comp in modname.split('.')[1:]:
                if getattr(mod, comp, nothing) is nothing:
                    setattr(mod, comp, _demandmod(comp, mod.__dict__,
                                                  mod.__dict__, level=1))
                mod = getattr(mod, comp)
            return mod

        if level >= 0:
            if name:
                # "from a import b" or "from .a import b" style
                rootmod = _hgextimport(_origimport, name, globals, locals,
                                       level=level)
                mod = chainmodules(rootmod, name)
            elif _pypy:
                # PyPy's __import__ throws an exception if invoked
                # with an empty name and no fromlist. Recreate the
                # desired behaviour by hand.
                mn = globalname
                mod = sys.modules[mn]
                if getattr(mod, '__path__', nothing) is nothing:
                    mn = mn.rsplit('.', 1)[0]
                    mod = sys.modules[mn]
                if level > 1:
                    mn = mn.rsplit('.', level - 1)[0]
                    mod = sys.modules[mn]
            else:
                mod = _hgextimport(_origimport, name, globals, locals,
                                   level=level)

            for x in fromlist:
                processfromitem(mod, x)

            return mod

        # But, we still need to support lazy loading of standard library and
        # 3rd party modules. So handle level == -1.
        mod = _hgextimport(_origimport, name, globals, locals)
        mod = chainmodules(mod, name)

        for x in fromlist:
            processfromitem(mod, x)

        return mod

ignore = [
    '__future__',
    '_hashlib',
    # ImportError during pkg_resources/__init__.py:fixup_namespace_package
    '_imp',
    '_xmlplus',
    'fcntl',
    'nt', # pathlib2 tests the existence of built-in 'nt' module
    'win32com.gen_py',
    '_winreg', # 2.7 mimetypes needs immediate ImportError
    'pythoncom',
    # imported by tarfile, not available under Windows
    'pwd',
    'grp',
    # imported by profile, itself imported by hotshot.stats,
    # not available under Windows
    'resource',
    # this trips up many extension authors
    'gtk',
    # setuptools' pkg_resources.py expects "from __main__ import x" to
    # raise ImportError if x not defined
    '__main__',
    '_ssl', # conditional imports in the stdlib, issue1964
    '_sre', # issue4920
    'rfc822',
    'mimetools',
    'sqlalchemy.events', # has import-time side effects (issue5085)
    # setuptools 8 expects this module to explode early when not on windows
    'distutils.msvc9compiler',
    '__builtin__',
    'builtins',
    ]

if _pypy:
    ignore.extend([
        # _ctypes.pointer is shadowed by "from ... import pointer" (PyPy 5)
        '_ctypes.pointer',
    ])

def isenabled():
    return builtins.__import__ == _demandimport

def enable():
    "enable global demand-loading of modules"
    if os.environ.get('HGDEMANDIMPORT') != 'disable':
        builtins.__import__ = _demandimport

def disable():
    "disable global demand-loading of modules"
    builtins.__import__ = _origimport

@contextmanager
def deactivated():
    "context manager for disabling demandimport in 'with' blocks"
    demandenabled = isenabled()
    if demandenabled:
        disable()

    try:
        yield
    finally:
        if demandenabled:
            enable()
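
# Example usage of the context manager above (a sketch; it assumes this
# module is importable as ``demandimport``, as the module docstring shows,
# and ``somemodule`` is a hypothetical module with import-time side effects):
#
#     import demandimport
#     demandimport.enable()
#     with demandimport.deactivated():
#         import somemodule  # loaded eagerly, side effects run immediately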