repair: migrate revlogs during upgrade

Our next step for in-place upgrade is to migrate store data. Revlogs are the biggest source of data within the store and a store is useless without them, so we implement their migration first.

Our strategy for migrating revlogs is to walk the store and call `revlog.clone()` on each revlog. There are some minor complications.

Because revlogs have different storage options (e.g. changelog has generaldelta and delta chains disabled), we need to obtain the correct class of revlog so inserted data is encoded properly for its type.

It took several attempts to implement progress indicators that didn't frustrate users with false "it's almost done" signals.

I initially used a single progress bar based on the number of revlogs. However, this quickly churned through all filelogs, got to 99%, then effectively froze at 99.99% when it got to the manifest.

So I converted the progress bar to total revision count. This was a little bit better. But the manifest was still significantly slower than filelogs and it took forever to process the last few percent.

I then tried both revision/chunk bytes and raw bytes as the denominator. This had the opposite effect: because so much data is in manifests, it would churn through filelogs without showing much progress. When it got to manifests, it would fill in 90+% of the progress bar.

I finally gave up on having a unified progress bar and instead implemented 3 progress bars: 1 for filelog revisions, 1 for manifest revisions, and 1 for changelog revisions. I added extra messages indicating the total number of revisions of each so users know there are more progress bars coming.

I also added extra messages before and after each stage to give extra details about what is happening. Strictly speaking, this isn't necessary. But the numbers are impressive. For example, when converting a non-generaldelta mozilla-central repository, the messages you see are:

  migrating 2475593 total revisions (1833043 in filelogs, 321156 in manifests, 321394 in changelog)
  migrating 1.67 GB in store; 2508 GB tracked data
  migrating 267868 filelogs containing 1833043 revisions (1.09 GB in store; 57.3 GB tracked data)
  finished migrating 1833043 filelog revisions across 267868 filelogs; change in size: -415776 bytes
  migrating 1 manifests containing 321156 revisions (518 MB in store; 2451 GB tracked data)

That "2508 GB" figure really blew me away. I had no clue that the raw tracked data in mozilla-central was that large. Granted, 2451 GB is in the manifest and "only" 57.3 GB is in filelogs. But still.

It's worth noting that gratuitous loading of source revlogs in order to display numbers and progress bars does serve a purpose: it ensures we can open all source revlogs. We don't want to spend several minutes copying revlogs only to encounter a permissions error or similar later.

As part of this commit, we also add swapping of the store directory to the upgrade function. After revlogs are converted, we move the old store into the backup directory, then move the temporary repo's store into the old store's location. On well-behaved systems, this should be 2 atomic operations and the window of inconsistency should be very narrow.

There are still a few improvements to be made to store copying and upgrading. But this commit gets the bulk of the work out of the way.
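The store swap described in the commit message can be illustrated with a small sketch. This is not the code from the commit; it assumes the live, staging, and backup directories sit on the same filesystem (so each `os.rename()` is a single atomic operation on POSIX), and the names `swapstore`, `repopath`, `stagingpath`, and `backuppath` are hypothetical.

```python
import os
import shutil
import tempfile


def swapstore(repopath, stagingpath, backuppath):
    """Move the live store into the backup dir, then the new store into place.

    All three arguments are hypothetical directory paths for this sketch.
    """
    livestore = os.path.join(repopath, 'store')
    newstore = os.path.join(stagingpath, 'store')
    backupstore = os.path.join(backuppath, 'store')
    # Rename 1: between this call and the next, the repository has no store.
    os.rename(livestore, backupstore)
    # Rename 2: the upgraded store becomes the live store.
    os.rename(newstore, livestore)
    return backupstore


def demo():
    """Build throwaway directories, swap the stores, and report what moved."""
    root = tempfile.mkdtemp()
    try:
        repo = os.path.join(root, 'repo')
        staging = os.path.join(root, 'staging')
        backup = os.path.join(root, 'backup')
        os.makedirs(os.path.join(repo, 'store'))
        os.makedirs(os.path.join(staging, 'store'))
        os.makedirs(backup)
        with open(os.path.join(repo, 'store', 'marker'), 'w') as fh:
            fh.write('old')
        with open(os.path.join(staging, 'store', 'marker'), 'w') as fh:
            fh.write('new')
        backupstore = swapstore(repo, staging, backup)
        with open(os.path.join(repo, 'store', 'marker')) as fh:
            live = fh.read()
        with open(os.path.join(backupstore, 'marker')) as fh:
            saved = fh.read()
        return live, saved
    finally:
        shutil.rmtree(root)
```

The only window of inconsistency is between the two renames, which is why the commit message calls it narrow rather than nonexistent.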

File last commit:

r30642:e995f00a default
r30779:38aa1ca9 default
osutil.py
365 lines | 12.3 KiB | text/x-python | PythonLexer
Martin Geisler
pure/osutil: add copyright and license header
r8232 # osutil.py - pure Python version of osutil.c
#
# Copyright 2009 Matt Mackall <mpm@selenic.com> and others
#
# This software may be used and distributed according to the terms of the
Matt Mackall
Update license to GPLv2+
r10263 # GNU General Public License version 2 or any later version.
Martin Geisler
pure/osutil: add copyright and license header
r8232
Gregory Szorc
osutil: use absolute_import
r27338 from __future__ import absolute_import
Yuya Nishihara
osutil: implement pure version of recvfds() for PyPy...
r27474 import ctypes
import ctypes.util
Martin Geisler
move mercurial.osutil to mercurial.pure.osutil
r7704 import os
Yuya Nishihara
osutil: implement pure version of recvfds() for PyPy...
r27474 import socket
Benoit Boissinot
style: use consistent variable names (*mod) with imports which would shadow
r10651 import stat as statmod
Martin Geisler
move mercurial.osutil to mercurial.pure.osutil
r7704
Pulkit Goyal
py3: use pycompat.ossep at certain places...
r30304 from . import (
policy,
pycompat,
)
Maciej Fijalkowski
osutil: add darwin-only version of os.listdir using cffi
r29600 modulepolicy = policy.policy
policynocffi = policy.policynocffi
Martin Geisler
move mercurial.osutil to mercurial.pure.osutil
r7704 def _mode_to_kind(mode):
Benoit Boissinot
style: use consistent variable names (*mod) with imports which would shadow
r10651 if statmod.S_ISREG(mode):
return statmod.S_IFREG
if statmod.S_ISDIR(mode):
return statmod.S_IFDIR
if statmod.S_ISLNK(mode):
return statmod.S_IFLNK
if statmod.S_ISBLK(mode):
return statmod.S_IFBLK
if statmod.S_ISCHR(mode):
return statmod.S_IFCHR
if statmod.S_ISFIFO(mode):
return statmod.S_IFIFO
if statmod.S_ISSOCK(mode):
return statmod.S_IFSOCK
Martin Geisler
move mercurial.osutil to mercurial.pure.osutil
r7704 return mode
Maciej Fijalkowski
osutil: add darwin-only version of os.listdir using cffi
r29600 def listdirpure(path, stat=False, skip=None):
Martin Geisler
move mercurial.osutil to mercurial.pure.osutil
r7704 '''listdir(path, stat=False, skip=None) -> list_of_tuples
Return a sorted list containing information about the entries
in the directory.
If stat is True, each element is a 3-tuple:
(name, type, stat object)
Otherwise, each element is a 2-tuple:
(name, type)
If skip is given and names a subdirectory of path, an empty list
is returned instead.
'''
result = []
prefix = path
Pulkit Goyal
py3: use pycompat.ossep at certain places...
r30304 if not prefix.endswith(pycompat.ossep):
prefix += pycompat.ossep
Martin Geisler
move mercurial.osutil to mercurial.pure.osutil
r7704 names = os.listdir(path)
names.sort()
for fn in names:
st = os.lstat(prefix + fn)
Benoit Boissinot
style: use consistent variable names (*mod) with imports which would shadow
r10651 if fn == skip and statmod.S_ISDIR(st.st_mode):
Martin Geisler
move mercurial.osutil to mercurial.pure.osutil
r7704 return []
if stat:
result.append((fn, _mode_to_kind(st.st_mode), st))
else:
result.append((fn, _mode_to_kind(st.st_mode)))
return result
Sune Foldager
posixfile: remove posixfile_nt and fix import bug in windows.py...
r8421
Maciej Fijalkowski
osutil: add darwin-only version of os.listdir using cffi
r29600 ffi = None
Pulkit Goyal
py3: replace sys.platform with pycompat.sysplatform (part 2 of 2)
r30642 if modulepolicy not in policynocffi and pycompat.sysplatform == 'darwin':
Maciej Fijalkowski
osutil: add darwin-only version of os.listdir using cffi
r29600 try:
from _osutil_cffi import ffi, lib
except ImportError:
if modulepolicy == 'cffi': # strict cffi import
raise
Pulkit Goyal
py3: replace sys.platform with pycompat.sysplatform (part 2 of 2)
r30642 if pycompat.sysplatform == 'darwin' and ffi is not None:
Maciej Fijalkowski
osutil: add darwin-only version of os.listdir using cffi
r29600 listdir_batch_size = 4096
# tweakable number that only affects performance: the size in bytes
# of the buffer that getattrlistbulk fills on each call
attrkinds = [None] * 20 # we need the max no for enum VXXX, 20 is plenty
attrkinds[lib.VREG] = statmod.S_IFREG
attrkinds[lib.VDIR] = statmod.S_IFDIR
attrkinds[lib.VLNK] = statmod.S_IFLNK
attrkinds[lib.VBLK] = statmod.S_IFBLK
attrkinds[lib.VCHR] = statmod.S_IFCHR
attrkinds[lib.VFIFO] = statmod.S_IFIFO
attrkinds[lib.VSOCK] = statmod.S_IFSOCK
class stat_res(object):
def __init__(self, st_mode, st_mtime, st_size):
self.st_mode = st_mode
self.st_mtime = st_mtime
self.st_size = st_size
tv_sec_ofs = ffi.offsetof("struct timespec", "tv_sec")
buf = ffi.new("char[]", listdir_batch_size)
def listdirinternal(dfd, req, stat, skip):
ret = []
while True:
r = lib.getattrlistbulk(dfd, req, buf, listdir_batch_size, 0)
if r == 0:
break
if r == -1:
raise OSError(ffi.errno, os.strerror(ffi.errno))
cur = ffi.cast("val_attrs_t*", buf)
for i in range(r):
lgt = cur.length
assert lgt == ffi.cast('uint32_t*', cur)[0]
ofs = cur.name_info.attr_dataoffset
str_lgt = cur.name_info.attr_length
base_ofs = ffi.offsetof('val_attrs_t', 'name_info')
name = str(ffi.buffer(ffi.cast("char*", cur) + base_ofs + ofs,
str_lgt - 1))
tp = attrkinds[cur.obj_type]
if name == "." or name == "..":
continue
if skip == name and tp == statmod.S_IFDIR:
return []
if stat:
Maciej Fijalkowski
osutil: fix the bug on OS X when we return more in listdir...
r29821 mtime = cur.mtime.tv_sec
Maciej Fijalkowski
osutil: add darwin-only version of os.listdir using cffi
r29600 mode = (cur.accessmask & ~lib.S_IFMT)| tp
ret.append((name, tp, stat_res(st_mode=mode, st_mtime=mtime,
st_size=cur.datalength)))
else:
ret.append((name, tp))
Maciej Fijalkowski
osutil: fix the bug on OS X when we return more in listdir...
r29821 cur = ffi.cast("val_attrs_t*", int(ffi.cast("intptr_t", cur))
+ lgt)
Maciej Fijalkowski
osutil: add darwin-only version of os.listdir using cffi
r29600 return ret
def listdir(path, stat=False, skip=None):
req = ffi.new("struct attrlist*")
req.bitmapcount = lib.ATTR_BIT_MAP_COUNT
req.commonattr = (lib.ATTR_CMN_RETURNED_ATTRS |
lib.ATTR_CMN_NAME |
lib.ATTR_CMN_OBJTYPE |
lib.ATTR_CMN_ACCESSMASK |
lib.ATTR_CMN_MODTIME)
req.fileattr = lib.ATTR_FILE_DATALENGTH
dfd = lib.open(path, lib.O_RDONLY, 0)
if dfd == -1:
raise OSError(ffi.errno, os.strerror(ffi.errno))
try:
ret = listdirinternal(dfd, req, stat, skip)
finally:
try:
lib.close(dfd)
except BaseException:
pass # we ignore all the errors from closing, not
# much we can do about that
return ret
else:
listdir = listdirpure
Pulkit Goyal
py3: replace os.name with pycompat.osname (part 1 of 2)...
r30639 if pycompat.osname != 'nt':
Adrian Buehlmann
pure: provide more correct implementation of posixfile for Windows...
r14413 posixfile = open
Yuya Nishihara
osutil: implement pure version of recvfds() for PyPy...
r27474
_SCM_RIGHTS = 0x01
_socklen_t = ctypes.c_uint
Pulkit Goyal
py3: replace sys.platform with pycompat.sysplatform (part 2 of 2)
r30642 if pycompat.sysplatform.startswith('linux'):
Yuya Nishihara
osutil: implement pure version of recvfds() for PyPy...
r27474 # socket.h says "the type should be socklen_t but the definition of
# the kernel is incompatible with this."
_cmsg_len_t = ctypes.c_size_t
_msg_controllen_t = ctypes.c_size_t
_msg_iovlen_t = ctypes.c_size_t
else:
_cmsg_len_t = _socklen_t
_msg_controllen_t = _socklen_t
_msg_iovlen_t = ctypes.c_int
class _iovec(ctypes.Structure):
_fields_ = [
Pulkit Goyal
py3: use unicode literals in pure/osutil.py...
r29698 (u'iov_base', ctypes.c_void_p),
(u'iov_len', ctypes.c_size_t),
Yuya Nishihara
osutil: implement pure version of recvfds() for PyPy...
r27474 ]
class _msghdr(ctypes.Structure):
_fields_ = [
Pulkit Goyal
py3: use unicode literals in pure/osutil.py...
r29698 (u'msg_name', ctypes.c_void_p),
(u'msg_namelen', _socklen_t),
(u'msg_iov', ctypes.POINTER(_iovec)),
(u'msg_iovlen', _msg_iovlen_t),
(u'msg_control', ctypes.c_void_p),
(u'msg_controllen', _msg_controllen_t),
(u'msg_flags', ctypes.c_int),
Yuya Nishihara
osutil: implement pure version of recvfds() for PyPy...
r27474 ]
class _cmsghdr(ctypes.Structure):
_fields_ = [
Pulkit Goyal
py3: use unicode literals in pure/osutil.py...
r29698 (u'cmsg_len', _cmsg_len_t),
(u'cmsg_level', ctypes.c_int),
(u'cmsg_type', ctypes.c_int),
(u'cmsg_data', ctypes.c_ubyte * 0),
Yuya Nishihara
osutil: implement pure version of recvfds() for PyPy...
r27474 ]
Pulkit Goyal
py3: use unicode literals in pure/osutil.py...
r29698 _libc = ctypes.CDLL(ctypes.util.find_library(u'c'), use_errno=True)
Yuya Nishihara
osutil: do not abort loading pure module just because libc has no recvmsg()...
r27971 _recvmsg = getattr(_libc, 'recvmsg', None)
if _recvmsg:
_recvmsg.restype = getattr(ctypes, 'c_ssize_t', ctypes.c_long)
_recvmsg.argtypes = (ctypes.c_int, ctypes.POINTER(_msghdr),
ctypes.c_int)
else:
# recvmsg isn't always provided by libc; such systems are unsupported
def _recvmsg(sockfd, msg, flags):
raise NotImplementedError('unsupported platform')
Yuya Nishihara
osutil: implement pure version of recvfds() for PyPy...
r27474
def _CMSG_FIRSTHDR(msgh):
if msgh.msg_controllen < ctypes.sizeof(_cmsghdr):
return
cmsgptr = ctypes.cast(msgh.msg_control, ctypes.POINTER(_cmsghdr))
return cmsgptr.contents
# The pure version is less portable than the native version because the
# handling of socket ancillary data heavily depends on C preprocessor.
# Also, some length fields are wrongly typed in Linux kernel.
def recvfds(sockfd):
"""receive list of file descriptors via socket"""
dummy = (ctypes.c_ubyte * 1)()
iov = _iovec(ctypes.cast(dummy, ctypes.c_void_p), ctypes.sizeof(dummy))
cbuf = ctypes.create_string_buffer(256)
msgh = _msghdr(None, 0,
ctypes.pointer(iov), 1,
ctypes.cast(cbuf, ctypes.c_void_p), ctypes.sizeof(cbuf),
0)
r = _recvmsg(sockfd, ctypes.byref(msgh), 0)
if r < 0:
e = ctypes.get_errno()
raise OSError(e, os.strerror(e))
# assumes that the first cmsg has fds because it isn't easy to write
# portable CMSG_NXTHDR() with ctypes.
cmsg = _CMSG_FIRSTHDR(msgh)
if not cmsg:
return []
if (cmsg.cmsg_level != socket.SOL_SOCKET or
cmsg.cmsg_type != _SCM_RIGHTS):
return []
rfds = ctypes.cast(cmsg.cmsg_data, ctypes.POINTER(ctypes.c_int))
rfdscount = ((cmsg.cmsg_len - _cmsghdr.cmsg_data.offset) //
ctypes.sizeof(ctypes.c_int))
return [rfds[i] for i in xrange(rfdscount)]
Adrian Buehlmann
pure: provide more correct implementation of posixfile for Windows...
r14413 else:
Gregory Szorc
osutil: use absolute_import
r27338 import msvcrt
Adrian Buehlmann
pure: provide more correct implementation of posixfile for Windows...
r14413
_kernel32 = ctypes.windll.kernel32
_DWORD = ctypes.c_ulong
_LPCSTR = _LPSTR = ctypes.c_char_p
_HANDLE = ctypes.c_void_p
_INVALID_HANDLE_VALUE = _HANDLE(-1).value
Mads Kiilerich
check-code: catch trailing space in comments
r18959 # CreateFile
Adrian Buehlmann
pure: provide more correct implementation of posixfile for Windows...
r14413 _FILE_SHARE_READ = 0x00000001
_FILE_SHARE_WRITE = 0x00000002
_FILE_SHARE_DELETE = 0x00000004
_CREATE_ALWAYS = 2
_OPEN_EXISTING = 3
_OPEN_ALWAYS = 4
_GENERIC_READ = 0x80000000
_GENERIC_WRITE = 0x40000000
_FILE_ATTRIBUTE_NORMAL = 0x80
Mads Kiilerich
declare local constants instead of using magic values and comments
r17429 # open_osfhandle flags
Adrian Buehlmann
pure: provide more correct implementation of posixfile for Windows...
r14413 _O_RDONLY = 0x0000
_O_RDWR = 0x0002
_O_APPEND = 0x0008
_O_TEXT = 0x4000
_O_BINARY = 0x8000
# types of parameters of C functions used (required by pypy)
_kernel32.CreateFileA.argtypes = [_LPCSTR, _DWORD, _DWORD, ctypes.c_void_p,
_DWORD, _DWORD, _HANDLE]
_kernel32.CreateFileA.restype = _HANDLE
def _raiseioerror(name):
err = ctypes.WinError()
Gregory Szorc
osutil: remove Python 2.4 errno conversion workaround
r25645 raise IOError(err.errno, '%s: %s' % (name, err.strerror))
Adrian Buehlmann
pure: provide more correct implementation of posixfile for Windows...
r14413
class posixfile(object):
'''a file object aiming for POSIX-like semantics
CPython's open() returns a file that was opened *without* setting the
_FILE_SHARE_DELETE flag, which causes rename and unlink to abort.
This even happens if any hardlinked copy of the file is in open state.
We set _FILE_SHARE_DELETE here, so files opened with posixfile can be
renamed and deleted while they are held open.
Note that if a file opened with posixfile is unlinked, the file
remains but cannot be opened again or be recreated under the same name,
until all reading processes have closed the file.'''
def __init__(self, name, mode='r', bufsize=-1):
if 'b' in mode:
flags = _O_BINARY
else:
flags = _O_TEXT
m0 = mode[0]
Brodie Rao
cleanup: "not x in y" -> "x not in y"
r16686 if m0 == 'r' and '+' not in mode:
Adrian Buehlmann
pure: provide more correct implementation of posixfile for Windows...
r14413 flags |= _O_RDONLY
access = _GENERIC_READ
else:
# work around http://support.microsoft.com/kb/899149 and
# set _O_RDWR for 'w' and 'a', even if mode has no '+'
flags |= _O_RDWR
access = _GENERIC_READ | _GENERIC_WRITE
if m0 == 'r':
creation = _OPEN_EXISTING
elif m0 == 'w':
creation = _CREATE_ALWAYS
elif m0 == 'a':
creation = _OPEN_ALWAYS
flags |= _O_APPEND
else:
raise ValueError("invalid mode: %s" % mode)
fh = _kernel32.CreateFileA(name, access,
_FILE_SHARE_READ | _FILE_SHARE_WRITE | _FILE_SHARE_DELETE,
None, creation, _FILE_ATTRIBUTE_NORMAL, None)
if fh == _INVALID_HANDLE_VALUE:
_raiseioerror(name)
Adrian Buehlmann
pure/osutil: use Python's msvcrt module (issue3380)...
r16474 fd = msvcrt.open_osfhandle(fh, flags)
Adrian Buehlmann
pure: provide more correct implementation of posixfile for Windows...
r14413 if fd == -1:
_kernel32.CloseHandle(fh)
_raiseioerror(name)
f = os.fdopen(fd, mode, bufsize)
# unfortunately, f.name is '<fdopen>' at this point -- so we store
# the name on this wrapper. We cannot just assign to f.name,
# because that attribute is read-only.
object.__setattr__(self, 'name', name)
object.__setattr__(self, '_file', f)
def __iter__(self):
return self._file
def __getattr__(self, name):
return getattr(self._file, name)
def __setattr__(self, name, value):
'''mimics the read-only attributes of Python file objects
by raising 'TypeError: readonly attribute' if someone tries:
f = posixfile('foo.txt')
f.name = 'bla' '''
return self._file.__setattr__(name, value)
Gregory Szorc
osutil: implement __enter__ and __exit__ on posixfile...
r27704
def __enter__(self):
return self._file.__enter__()
def __exit__(self, exc_type, exc_value, exc_tb):
return self._file.__exit__(exc_type, exc_value, exc_tb)
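For comparison with the ctypes-based `recvfds()` in this file, here is a minimal sketch of the same SCM_RIGHTS exchange using the `socket.sendmsg()` and `socket.recvmsg()` methods that Python 3.3+ provides on Unix. It is not part of this module, which targets Python 2 and PyPy. The helper names `send_fds`, `recv_fds`, and `roundtrip` are hypothetical.

```python
import array
import os
import socket


def send_fds(sock, fds):
    """Send file descriptors as SCM_RIGHTS ancillary data with a 1-byte payload."""
    payload = array.array('i', fds)
    sock.sendmsg([b'\0'], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, payload)])


def recv_fds(sock, maxfds=16):
    """Return the descriptors carried by the first SCM_RIGHTS control message."""
    fds = array.array('i')
    bufsize = socket.CMSG_LEN(maxfds * fds.itemsize)
    msg, ancdata, flags, addr = sock.recvmsg(1, bufsize)
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Drop a truncated trailing integer, if any.
            usable = len(data) - (len(data) % fds.itemsize)
            fds.frombytes(data[:usable])
            break
    return list(fds)


def roundtrip(text):
    """Pass the read end of a pipe over a socketpair, then read through it."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r, w = os.pipe()
    received = -1
    try:
        send_fds(a, [r])
        (received,) = recv_fds(b)
        os.write(w, text)
        return os.read(received, len(text))
    finally:
        for fd in (r, w, received):
            if fd != -1:
                os.close(fd)
        a.close()
        b.close()
```

Like `recvfds()` above, this sketch only inspects the first matching control message and sends a single dummy byte, because ancillary data must accompany at least one byte of regular data on a stream socket.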