repair: migrate revlogs during upgrade

Our next step for in-place upgrade is to migrate store data. Revlogs are the biggest source of data within the store and a store is useless without them, so we implement their migration first.

Our strategy for migrating revlogs is to walk the store and call `revlog.clone()` on each revlog. There are some minor complications.

Because revlogs have different storage options (e.g. changelog has generaldelta and delta chains disabled), we need to obtain the correct class of revlog so inserted data is encoded properly for its type.

I made several attempts at progress indicators that wouldn't frustrate users with a false "it's almost done" signal.

I initially used a single progress bar based on the number of revlogs. However, this quickly churned through all filelogs, got to 99%, then effectively froze at 99.99% when it got to the manifest.

So I converted the progress bar to total revision count. This was a little bit better. But the manifest was still significantly slower than filelogs and it took forever to process the last few percent.

I then tried both revision/chunk bytes and raw bytes as the denominator. This had the opposite effect: because so much data is in manifests, it would churn through filelogs without showing much progress. When it got to manifests, it would fill in 90+% of the progress bar.

I finally gave up on having a unified progress bar and instead implemented 3 progress bars: 1 for filelog revisions, 1 for manifest revisions, and 1 for changelog revisions. I added extra messages indicating the total number of revisions of each so users know there are more progress bars coming.

I also added extra messages before and after each stage to give extra details about what is happening. Strictly speaking, this isn't necessary. But the numbers are impressive. For example, when converting a non-generaldelta mozilla-central repository, the messages you see are:

   migrating 2475593 total revisions (1833043 in filelogs, 321156 in manifests, 321394 in changelog)
   migrating 1.67 GB in store; 2508 GB tracked data
   migrating 267868 filelogs containing 1833043 revisions (1.09 GB in store; 57.3 GB tracked data)
   finished migrating 1833043 filelog revisions across 267868 filelogs; change in size: -415776 bytes
   migrating 1 manifests containing 321156 revisions (518 MB in store; 2451 GB tracked data)

That "2508 GB" figure really blew me away. I had no clue that the raw tracked data in mozilla-central was that large. Granted, 2451 GB is in the manifest and "only" 57.3 GB is in filelogs. But still.

It's worth noting that gratuitous loading of source revlogs in order to display numbers and progress bars does serve a purpose: it ensures we can open all source revlogs. We don't want to spend several minutes copying revlogs only to encounter a permissions error or similar later.

As part of this commit, we also add swapping of the store directory to the upgrade function. After revlogs are converted, we move the old store into the backup directory, then move the temporary repo's store into the old store's location. On well-behaved systems, this should be 2 atomic operations and the window of inconsistency should be very narrow.

There are still a few improvements to be made to store copying and upgrading. But this commit gets the bulk of the work out of the way.
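The store swap described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from this commit: the helper name `swap_store` and the directory layout are hypothetical, and it assumes all three paths live on the same filesystem so that each `os.rename` is a single rename.

```python
import os


def swap_store(repo_path, backup_path, tmp_repo_path):
    """Move the old store into the backup dir, then the new store into place.

    Hypothetical helper for illustration; the name and paths are assumptions.
    """
    old_store = os.path.join(repo_path, 'store')
    new_store = os.path.join(tmp_repo_path, 'store')

    # Rename 1: retire the old store. Until rename 2 completes, the
    # repository has no store; this is the window of inconsistency.
    os.rename(old_store, os.path.join(backup_path, 'store'))

    # Rename 2: move the migrated store into the old store's location.
    os.rename(new_store, old_store)
```

Keeping the two renames adjacent, with no other work between them, is what keeps the window short in this sketch.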

File last commit:

r29973:4ddb0575 default
r30779:38aa1ca9 default
discovery.py
# discovery.py - protocol changeset discovery functions
#
# Copyright 2010 Matt Mackall <mpm@selenic.com>
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.

from __future__ import absolute_import

from .i18n import _
from .node import (
    nullid,
    short,
)

from . import (
    bookmarks,
    branchmap,
    error,
    obsolete,
    phases,
    setdiscovery,
    treediscovery,
    util,
)

def findcommonincoming(repo, remote, heads=None, force=False):
    """Return a tuple (common, anyincoming, heads) used to identify the common
    subset of nodes between repo and remote.

    "common" is a list of (at least) the heads of the common subset.
    "anyincoming" is testable as a boolean indicating if any nodes are missing
      locally. If remote does not support getbundle, this actually is a list of
      roots of the nodes that would be incoming, to be supplied to
      changegroupsubset. No code except for pull should be relying on this fact
      any longer.
    "heads" is either the supplied heads, or else the remote's heads.

    If you pass heads and they are all known locally, the response lists just
    these heads in "common" and in "heads".

    Please use findcommonoutgoing to compute the set of outgoing nodes to give
    extensions a good hook into outgoing.
    """

    if not remote.capable('getbundle'):
        return treediscovery.findcommonincoming(repo, remote, heads, force)

    if heads:
        allknown = True
        knownnode = repo.changelog.hasnode # no nodemap until it is filtered
        for h in heads:
            if not knownnode(h):
                allknown = False
                break
        if allknown:
            return (heads, False, heads)

    res = setdiscovery.findcommonheads(repo.ui, repo, remote,
                                       abortwhenunrelated=not force)
    common, anyinc, srvheads = res
    return (list(common), anyinc, heads or list(srvheads))

class outgoing(object):
    '''Represents the set of nodes present in a local repo but not in a
    (possibly) remote one.

    Members:

      missing is a list of all nodes present in local but not in remote.
      common is a list of all nodes shared between the two repos.
      excluded is the list of missing changeset that shouldn't be sent remotely.
      missingheads is the list of heads of missing.
      commonheads is the list of heads of common.

    The sets are computed on demand from the heads, unless provided upfront
    by discovery.'''

    def __init__(self, repo, commonheads=None, missingheads=None,
                 missingroots=None):
        # at least one of them must not be set
        assert None in (commonheads, missingroots)
        cl = repo.changelog
        if missingheads is None:
            missingheads = cl.heads()
        if missingroots:
            discbases = []
            for n in missingroots:
                discbases.extend([p for p in cl.parents(n) if p != nullid])
            # TODO remove call to nodesbetween.
            # TODO populate attributes on outgoing instance instead of setting
            # discbases.
            csets, roots, heads = cl.nodesbetween(missingroots, missingheads)
            included = set(csets)
            missingheads = heads
            commonheads = [n for n in discbases if n not in included]
        elif not commonheads:
            commonheads = [nullid]
        self.commonheads = commonheads
        self.missingheads = missingheads
        self._revlog = cl
        self._common = None
        self._missing = None
        self.excluded = []

    def _computecommonmissing(self):
        sets = self._revlog.findcommonmissing(self.commonheads,
                                              self.missingheads)
        self._common, self._missing = sets

    @util.propertycache
    def common(self):
        if self._common is None:
            self._computecommonmissing()
        return self._common

    @util.propertycache
    def missing(self):
        if self._missing is None:
            self._computecommonmissing()
        return self._missing

def findcommonoutgoing(repo, other, onlyheads=None, force=False,
                       commoninc=None, portable=False):
    '''Return an outgoing instance to identify the nodes present in repo but
    not in other.

    If onlyheads is given, only nodes ancestral to nodes in onlyheads
    (inclusive) are included. If you already know the local repo's heads,
    passing them in onlyheads is faster than letting them be recomputed here.

    If commoninc is given, it must be the result of a prior call to
    findcommonincoming(repo, other, force) to avoid recomputing it here.

    If portable is given, compute more conservative common and missingheads,
    to make bundles created from the instance more portable.'''
    # declare an empty outgoing object to be filled later
    og = outgoing(repo, None, None)

    # get common set if not provided
    if commoninc is None:
        commoninc = findcommonincoming(repo, other, force=force)
    og.commonheads, _any, _hds = commoninc

    # compute outgoing
    mayexclude = (repo._phasecache.phaseroots[phases.secret] or repo.obsstore)
    if not mayexclude:
        og.missingheads = onlyheads or repo.heads()
    elif onlyheads is None:
        # use visible heads as it should be cached
        og.missingheads = repo.filtered("served").heads()
        og.excluded = [ctx.node() for ctx in repo.set('secret() or extinct()')]
    else:
        # compute common, missing and exclude secret stuff
        sets = repo.changelog.findcommonmissing(og.commonheads, onlyheads)
        og._common, allmissing = sets
        og._missing = missing = []
        og.excluded = excluded = []
        for node in allmissing:
            ctx = repo[node]
            if ctx.phase() >= phases.secret or ctx.extinct():
                excluded.append(node)
            else:
                missing.append(node)
        if len(missing) == len(allmissing):
            missingheads = onlyheads
        else: # update missing heads
            missingheads = phases.newheads(repo, onlyheads, excluded)
        og.missingheads = missingheads
    if portable:
        # recompute common and missingheads as if -r<rev> had been given for
        # each head of missing, and --base <rev> for each head of the proper
        # ancestors of missing
        og._computecommonmissing()
        cl = repo.changelog
        missingrevs = set(cl.rev(n) for n in og._missing)
        og._common = set(cl.ancestors(missingrevs)) - missingrevs
        commonheads = set(og.commonheads)
        og.missingheads = [h for h in og.missingheads if h not in commonheads]

    return og

def _headssummary(repo, remote, outgoing):
    """compute a summary of branch and heads status before and after push

    return {'branch': ([remoteheads], [newheads], [unsyncedheads])} mapping

    - branch: the branch name
    - remoteheads: the list of remote heads known locally
                   None if the branch is new
    - newheads: the new remote heads (known locally) with outgoing pushed
    - unsyncedheads: the list of remote heads unknown locally.
    """
    cl = repo.changelog
    headssum = {}
    # A. Create set of branches involved in the push.
    branches = set(repo[n].branch() for n in outgoing.missing)
    remotemap = remote.branchmap()
    newbranches = branches - set(remotemap)
    branches.difference_update(newbranches)

    # A. register remote heads
    remotebranches = set()
    for branch, heads in remote.branchmap().iteritems():
        remotebranches.add(branch)
        known = []
        unsynced = []
        knownnode = cl.hasnode # do not use nodemap until it is filtered
        for h in heads:
            if knownnode(h):
                known.append(h)
            else:
                unsynced.append(h)
        headssum[branch] = (known, list(known), unsynced)
    # B. add new branch data
    missingctx = list(repo[n] for n in outgoing.missing)
    touchedbranches = set()
    for ctx in missingctx:
        branch = ctx.branch()
        touchedbranches.add(branch)
        if branch not in headssum:
            headssum[branch] = (None, [], [])

    # C drop data about untouched branches:
    for branch in remotebranches - touchedbranches:
        del headssum[branch]

    # D. Update newmap with outgoing changes.
    # This will possibly add new heads and remove existing ones.
    newmap = branchmap.branchcache((branch, heads[1])
                                   for branch, heads in headssum.iteritems()
                                   if heads[0] is not None)
    newmap.update(repo, (ctx.rev() for ctx in missingctx))
    for branch, newheads in newmap.iteritems():
        headssum[branch][1][:] = newheads
    return headssum

def _oldheadssummary(repo, remoteheads, outgoing, inc=False):
    """Compute branchmapsummary for repo without branchmap support"""

    # 1-4b. old servers: Check for new topological heads.
    # Construct {old,new}map with branch = None (topological branch).
    # (code based on update)
    knownnode = repo.changelog.hasnode # no nodemap until it is filtered
    oldheads = set(h for h in remoteheads if knownnode(h))
    # all nodes in outgoing.missing are children of either:
    # - an element of oldheads
    # - another element of outgoing.missing
    # - nullrev
    # This explains why the new head are very simple to compute.
    r = repo.set('heads(%ln + %ln)', oldheads, outgoing.missing)
    newheads = list(c.node() for c in r)
    # set some unsynced head to issue the "unsynced changes" warning
    if inc:
        unsynced = set([None])
    else:
        unsynced = set()
    return {None: (oldheads, newheads, unsynced)}

def _nowarnheads(pushop):
    # Compute newly pushed bookmarks. We don't warn about bookmarked heads.
    repo = pushop.repo.unfiltered()
    remote = pushop.remote
    localbookmarks = repo._bookmarks
    remotebookmarks = remote.listkeys('bookmarks')
    bookmarkedheads = set()

    # internal config: bookmarks.pushing
    newbookmarks = [localbookmarks.expandname(b)
                    for b in pushop.ui.configlist('bookmarks', 'pushing')]

    for bm in localbookmarks:
        rnode = remotebookmarks.get(bm)
        if rnode and rnode in repo:
            lctx, rctx = repo[bm], repo[rnode]
            if bookmarks.validdest(repo, rctx, lctx):
                bookmarkedheads.add(lctx.node())
        else:
            if bm in newbookmarks and bm not in remotebookmarks:
                bookmarkedheads.add(repo[bm].node())

    return bookmarkedheads

def checkheads(pushop):
    """Check that a push won't add any outgoing head

    raise Abort error and display ui message as needed.
    """

    repo = pushop.repo.unfiltered()
    remote = pushop.remote
    outgoing = pushop.outgoing
    remoteheads = pushop.remoteheads
    newbranch = pushop.newbranch
    inc = bool(pushop.incoming)

    # Check for each named branch if we're creating new remote heads.
    # To be a remote head after push, node must be either:
    # - unknown locally
    # - a local outgoing head descended from update
    # - a remote head that's known locally and not
    #   ancestral to an outgoing head
    if remoteheads == [nullid]:
        # remote is empty, nothing to check.
        return

    if remote.capable('branchmap'):
        headssum = _headssummary(repo, remote, outgoing)
    else:
        headssum = _oldheadssummary(repo, remoteheads, outgoing, inc)
    newbranches = [branch for branch, heads in headssum.iteritems()
                   if heads[0] is None]
    # 1. Check for new branches on the remote.
    if newbranches and not newbranch: # new branch requires --new-branch
        branchnames = ', '.join(sorted(newbranches))
        raise error.Abort(_("push creates new remote branches: %s!")
                          % branchnames,
                          hint=_("use 'hg push --new-branch' to create"
                                 " new remote branches"))

    # 2. Find heads that we need not warn about
    nowarnheads = _nowarnheads(pushop)

    # 3. Check for new heads.
    # If there are more heads after the push than before, a suitable
    # error message, depending on unsynced status, is displayed.
    errormsg = None
    # If there is no obsstore, allfuturecommon won't be used, so no
    # need to compute it.
    if repo.obsstore:
        allmissing = set(outgoing.missing)
        cctx = repo.set('%ld', outgoing.common)
        allfuturecommon = set(c.node() for c in cctx)
        allfuturecommon.update(allmissing)
    for branch, heads in sorted(headssum.iteritems()):
        remoteheads, newheads, unsyncedheads = heads
        candidate_newhs = set(newheads)
        # add unsynced data
        if remoteheads is None:
            oldhs = set()
        else:
            oldhs = set(remoteheads)
        oldhs.update(unsyncedheads)
        candidate_newhs.update(unsyncedheads)
        dhs = None # delta heads, the new heads on branch
        discardedheads = set()
        if not repo.obsstore:
            newhs = candidate_newhs
        else:
            # remove future heads which are actually obsoleted by another
            # pushed element:
            #
            # XXX as above, There are several cases this code does not handle
            # XXX properly
            #
            # (1) if <nh> is public, it won't be affected by obsolete marker
            #     and a new is created
            #
            # (2) if the new heads have ancestors which are not obsolete and
            #     not ancestors of any other heads we will have a new head too.
            #
            # These two cases will be easy to handle for known changeset but
            # much more tricky for unsynced changes.
            #
            # In addition, this code is confused by prune as it only looks for
            # successors of the heads (none if pruned) leading to issue4354
            newhs = set()
            for nh in candidate_newhs:
                if nh in repo and repo[nh].phase() <= phases.public:
                    newhs.add(nh)
                else:
                    for suc in obsolete.allsuccessors(repo.obsstore, [nh]):
                        if suc != nh and suc in allfuturecommon:
                            discardedheads.add(nh)
                            break
                    else:
                        newhs.add(nh)
        unsynced = sorted(h for h in unsyncedheads if h not in discardedheads)
        if unsynced:
            if None in unsynced:
                # old remote, no heads data
                heads = None
            elif len(unsynced) <= 4 or repo.ui.verbose:
                heads = ' '.join(short(h) for h in unsynced)
            else:
                heads = (' '.join(short(h) for h in unsynced[:4]) +
                         ' ' + _("and %s others") % (len(unsynced) - 4))
            if heads is None:
                repo.ui.status(_("remote has heads that are "
                                 "not known locally\n"))
            elif branch is None:
                repo.ui.status(_("remote has heads that are "
                                 "not known locally: %s\n") % heads)
            else:
                repo.ui.status(_("remote has heads on branch '%s' that are "
                                 "not known locally: %s\n") % (branch, heads))

        if remoteheads is None:
            if len(newhs) > 1:
                dhs = list(newhs)
                if errormsg is None:
                    errormsg = (_("push creates new branch '%s' "
                                  "with multiple heads") % (branch))
                    hint = _("merge or"
                             " see 'hg help push' for details about"
                             " pushing new heads")
        elif len(newhs) > len(oldhs):
            # remove bookmarked or existing remote heads from the new heads list
            dhs = sorted(newhs - nowarnheads - oldhs)
        if dhs:
            if errormsg is None:
                if branch not in ('default', None):
                    errormsg = _("push creates new remote head %s "
                                 "on branch '%s'!") % (short(dhs[0]), branch)
                elif repo[dhs[0]].bookmarks():
                    errormsg = _("push creates new remote head %s "
                                 "with bookmark '%s'!") % (
                                 short(dhs[0]), repo[dhs[0]].bookmarks()[0])
                else:
                    errormsg = _("push creates new remote head %s!"
                                 ) % short(dhs[0])
                if unsyncedheads:
                    hint = _("pull and merge or"
                             " see 'hg help push' for details about"
                             " pushing new heads")
                else:
                    hint = _("merge or"
                             " see 'hg help push' for details about"
                             " pushing new heads")
            if branch is None:
                repo.ui.note(_("new remote heads:\n"))
            else:
                repo.ui.note(_("new remote heads on branch '%s':\n") % branch)
            for h in dhs:
                repo.ui.note((" %s\n") % short(h))
    if errormsg:
        raise error.Abort(errormsg, hint=hint)
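
The way `checkheads` abbreviates the list of unsynced heads in its status message can be tried in isolation. The following is a minimal sketch, assuming plain strings stand in for the short node hashes; `summarize_heads` and its `limit` parameter are hypothetical names that do not exist in `discovery.py`, and the `_()` translation wrapper is left out.

```python
def summarize_heads(heads, verbose=False, limit=4):
    """Join head identifiers, truncating long lists unless verbose.

    Hypothetical stand-alone helper mirroring the "<= 4 or verbose" rule
    used when reporting unsynced heads; not part of Mercurial's API.
    """
    heads = sorted(heads)
    if len(heads) <= limit or verbose:
        # Short list (or verbose mode): show every head.
        return ' '.join(heads)
    # Long list: show the first `limit` heads and count the rest.
    shown = ' '.join(heads[:limit])
    return shown + ' ' + 'and %s others' % (len(heads) - limit)
```

For instance, six heads named `a` through `f` would be reported as the first four followed by a count of the remaining two, while passing `verbose=True` lists all six.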