upstream/mercurial-mirror Commit - r43432:3c6976b1

# setdiscovery.py - improved discovery of common nodeset for mercurial
#
# Copyright 2010 Benoit Boissinot <bboissin@gmail.com>
# and Peter Arrenbrecht <peter@arrenbrecht.ch>
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.
"""
The algorithm works in the following way. You have two repositories: local
and remote. They both contain a DAG of changelists.

The goal of the discovery protocol is to find one set of nodes, *common*,
the set of nodes shared by local and remote.

One of the issues with the original protocol was latency: it could
potentially require lots of roundtrips to discover that the local repo was a
subset of remote (which is a very common case: you usually have few changes
compared to upstream, while upstream probably had lots of development).

The new protocol only requires one interface for the remote repo: `known()`,
which, given a set of changelists, tells you if they are present in the DAG.

The algorithm then works as follows:

 - We will be using three sets, `common`, `missing`, `unknown`. Originally
   all nodes are in `unknown`.
 - Take a sample from `unknown`, call `remote.known(sample)`
   - For each node that remote knows, move it and all its ancestors to `common`
   - For each node that remote doesn't know, move it and all its descendants
     to `missing`
 - Iterate until `unknown` is empty

There are a couple of optimizations. The first is that instead of starting
with a random sample of `unknown`, we start by sending all heads: in the case
where the local repo is a subset, the answer is computed in one round trip.

Then you can do something similar to the bisecting strategy used when
finding faulty changesets. Instead of random samples, you can try picking
nodes that will maximize the number of nodes that will be classified along
with them (since all ancestors or descendants will be marked as well).
"""

from __future__ import absolute_import

import collections
import random

from .i18n import _
from .node import (
    nullid,
    nullrev,
)
from . import (
    error,
    policy,
    util,
)
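# Illustrative sketch (hypothetical helper, not part of Mercurial's API):
# the basic algorithm from the module docstring, run on a toy DAG given as a
# {rev: parents} dict in which every rev appears as a key. `remoteknown`
# stands in for remote.known() and is assumed to be closed under ancestors,
# as a real remote DAG is. Following the first optimization, each round
# samples the heads of `unknown` (at most `samplesize` of them).
def _example_discover(parents, remoteknown, samplesize=2):
    children = dict((r, set()) for r in parents)
    for r, ps in parents.items():
        for p in ps:
            children[p].add(r)

    def closure(revs, edges):
        # `revs` plus everything reachable from them through `edges`
        seen = set()
        stack = list(revs)
        while stack:
            r = stack.pop()
            if r not in seen:
                seen.add(r)
                stack.extend(edges[r])
        return seen

    unknown = set(parents)
    common = set()
    missing = set()
    queries = 0
    while unknown:
        # a non-empty subset of a DAG always has at least one head
        heads = sorted(r for r in unknown if not (children[r] & unknown))
        sample = heads[:samplesize]
        queries += 1  # one simulated remote.known() round trip
        for r in sample:
            if r in remoteknown:
                common |= closure([r], parents)  # r and its ancestors
            else:
                missing |= closure([r], children)  # r and its descendants
        unknown -= common | missing
    return common, missing, queries


# With parents = {0: (), 1: (0,), 2: (1,), 3: (2,), 4: (2,), 5: (4,)} and
# remoteknown = {0, 1, 2, 3}, this returns ({0, 1, 2, 3}, {4, 5}, 2); if the
# remote knows every rev, the heads alone settle it in a single query.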

def _updatesample(revs, heads, sample, parentfn, quicksamplesize=0):
    """update an existing sample to match the expected size

    The sample is updated with revs exponentially distant from each head of the
    <revs> set (H, H~1, H~3, H~7, etc., counting the head itself as distance
    1, so that the sampled distances are 1, 2, 4, 8, ...).

    If a target size is specified, the sampling will stop once this size is
    reached. Otherwise sampling will happen until roots of the <revs> set are
    reached.

    :revs: set of revs we want to discover (if None, assume the whole dag)
    :heads: set of DAG head revs
    :sample: a sample to update
    :parentfn: a callable to resolve parents for a revision
    :quicksamplesize: optional target size of the sample"""
    dist = {}
    visit = collections.deque(heads)
    seen = set()
    factor = 1
    while visit:
        curr = visit.popleft()
        if curr in seen:
            continue
        d = dist.setdefault(curr, 1)
        if d > factor:
            factor *= 2
        if d == factor:
            sample.add(curr)
            if quicksamplesize and (len(sample) >= quicksamplesize):
                return
        seen.add(curr)

        for p in parentfn(curr):
            if p != nullrev and (not revs or p in revs):
                dist.setdefault(p, d + 1)
                visit.append(p)
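# Illustrative sketch (hypothetical helper, not part of Mercurial's API): a
# standalone model of the walk performed by _updatesample() when no target
# size is given. It walks breadth-first from `starts`, counting each start
# as distance 1, and keeps the revs whose distance is a power of two; that
# is what the doubling `factor` above amounts to, assuming no start is an
# ancestor of another. `nextfn` plays the role of `parentfn` and `allowed`
# the role of <revs>; nullrev is excluded simply by not being in `allowed`.
import collections


def _example_expsample(starts, nextfn, allowed):
    dist = dict((s, 1) for s in starts)
    visit = collections.deque(sorted(starts))
    seen = set()
    sample = set()
    while visit:
        curr = visit.popleft()
        if curr in seen:
            continue
        seen.add(curr)
        d = dist[curr]
        if (d & (d - 1)) == 0:  # distance is 1, 2, 4, 8, ...
            sample.add(curr)
        for n in nextfn(curr):
            if n in allowed:
                # the first (shortest) distance found wins, as in any BFS
                dist.setdefault(n, d + 1)
                visit.append(n)
    return sample


# On a linear history 0 <- 1 <- ... <- 10, walking parents from head 10
# samples {10, 9, 7, 3}; takefullsample() also walks children from the
# roots, which here would give {0, 1, 3, 7}.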

def _limitsample(sample, desiredlen, randomize=True):
    """return a random subset of sample of at most desiredlen items.

    If randomize is False, though, a deterministic subset is returned.
    This is meant for integration tests.
    """
    if len(sample) <= desiredlen:
        return sample
    if randomize:
        return set(random.sample(sample, desiredlen))
    sample = list(sample)
    sample.sort()
    return set(sample[:desiredlen])

class partialdiscovery(object):
    """an object representing ongoing discovery

    Fed with data from the remote repository, this object keeps track of the
    current set of changesets in various states:

    - common: revs also known remotely
    - undecided: revs we don't have information on yet
    - missing: revs missing remotely
    (all tracked revisions are known locally)
    """

    def __init__(self, repo, targetheads, respectsize, randomize=True):
        self._repo = repo
        self._targetheads = targetheads
        self._common = repo.changelog.incrementalmissingrevs()
        self._undecided = None
        self.missing = set()
        self._childrenmap = None
        self._respectsize = respectsize
        self.randomize = randomize

    def addcommons(self, commons):
        """register nodes known as common"""
        self._common.addbases(commons)
        if self._undecided is not None:
            self._common.removeancestorsfrom(self._undecided)

    def addmissings(self, missings):
        """register some nodes as missing"""
        newmissing = self._repo.revs(b'%ld::%ld', missings, self.undecided)
        if newmissing:
            self.missing.update(newmissing)
            self.undecided.difference_update(newmissing)

    def addinfo(self, sample):
        """consume an iterable of (rev, known) tuples"""
        common = set()
        missing = set()
        for rev, known in sample:
            if known:
                common.add(rev)
            else:
                missing.add(rev)
        if common:
            self.addcommons(common)
        if missing:
            self.addmissings(missing)

    def hasinfo(self):
        """return True if we have any clue about the remote state"""
        return self._common.hasbases()

    def iscomplete(self):
        """True if all the necessary data have been gathered"""
        return self._undecided is not None and not self._undecided

    @property
    def undecided(self):
        if self._undecided is not None:
            return self._undecided
        self._undecided = set(self._common.missingancestors(self._targetheads))
        return self._undecided

    def stats(self):
        return {
            'undecided': len(self.undecided),
        }

    def commonheads(self):
        """the heads of the known common set"""
        # heads(common) == heads(common.bases) since common represents
        # common.bases and all its ancestors
        return self._common.basesheads()

    def _parentsgetter(self):
        getrev = self._repo.changelog.index.__getitem__

        def getparents(r):
            return getrev(r)[5:7]

        return getparents

    def _childrengetter(self):

        if self._childrenmap is not None:
            # During discovery, the `undecided` set keeps shrinking.
            # Therefore, the map computed for an iteration N will be
            # valid for iteration N+1. Instead of computing the same
            # data over and over we cache it the first time.
            return self._childrenmap.__getitem__

        # _updatesample() essentially iterates over revisions to look up
        # their children. This lookup is expensive and doing it in a loop is
        # quadratic. We precompute the children for all relevant revisions and
        # make the lookup in _updatesample() a simple dict lookup.
        self._childrenmap = children = {}

        parentrevs = self._parentsgetter()
        revs = self.undecided

        for rev in sorted(revs):
            # Always ensure revision has an entry so we don't need to worry
            # about missing keys.
            children[rev] = []
            for prev in parentrevs(rev):
                if prev == nullrev:
                    continue
                c = children.get(prev)
                if c is not None:
                    c.append(rev)
        return children.__getitem__

    def takequicksample(self, headrevs, size):
        """takes a quick sample of size <size>

        It is meant for initial sampling and focuses on querying heads and
        close ancestors of heads.

        :headrevs: set of head revisions in local DAG to consider
        :size: the maximum size of the sample"""
        revs = self.undecided
        if len(revs) <= size:
            return list(revs)
        sample = set(self._repo.revs(b'heads(%ld)', revs))

        if len(sample) >= size:
            return _limitsample(sample, size, randomize=self.randomize)

        _updatesample(
            None, headrevs, sample, self._parentsgetter(), quicksamplesize=size
        )
        return sample

    def takefullsample(self, headrevs, size):
        revs = self.undecided
        if len(revs) <= size:
            return list(revs)
        repo = self._repo
        sample = set(repo.revs(b'heads(%ld)', revs))
        parentrevs = self._parentsgetter()

        # update from heads
        revsheads = sample.copy()
        _updatesample(revs, revsheads, sample, parentrevs)

        # update from roots
        revsroots = set(repo.revs(b'roots(%ld)', revs))
        childrenrevs = self._childrengetter()
        _updatesample(revs, revsroots, sample, childrenrevs)
        assert sample

        if not self._respectsize:
            size = max(size, min(len(revsroots), len(revsheads)))

        sample = _limitsample(sample, size, randomize=self.randomize)
        if len(sample) < size:
            more = size - len(sample)
            takefrom = list(revs - sample)
            if self.randomize:
                sample.update(random.sample(takefrom, more))
            else:
                takefrom.sort()
                sample.update(takefrom[:more])
        return sample

partialdiscovery = policy.importrust(
    r'discovery', member=r'PartialDiscovery', default=partialdiscovery
)
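# Illustrative sketch (hypothetical helper; not how policy.importrust() is
# implemented): the general "use an accelerated implementation when it can
# be imported, otherwise keep the pure Python default" pattern that the
# assignment above relies on. `modname` and `member` are ordinary importlib
# names here, not the Rust extension names Mercurial uses. Swapping classes
# this way presumably only works if both expose the same methods.
import importlib


def _example_importaccelerated(modname, member, default):
    try:
        mod = importlib.import_module(modname)
    except ImportError:
        # no accelerated module available: fall back to the default
        return default
    # module found but member missing: also fall back to the default
    return getattr(mod, member, default)


# For instance, _example_importaccelerated('no_such_module', 'Foo', bar)
# returns bar, so callers always end up with a usable object.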

283

284

285

def findcommonheads(

285

def findcommonheads(

286

ui,

286

ui,

287

local,

287

local,

288

remote,

288

remote,

289

initialsamplesize=100,

289

initialsamplesize=100,

290

fullsamplesize=200,

290

fullsamplesize=200,

291

abortwhenunrelated=True,

291

abortwhenunrelated=True,

292

ancestorsof=None,

292

ancestorsof=None,

293

samplegrowth=1.05,

293

samplegrowth=1.05,

294

):

294

):

295

'''Return a tuple (common, anyincoming, remoteheads) used to identify

295

'''Return a tuple (common, anyincoming, remoteheads) used to identify

296

missing nodes from or in remote.

296

missing nodes from or in remote.

297

'''

297

'''

298

start = util.timer()

298

start = util.timer()

299

300

roundtrips = 0

300

roundtrips = 0

301

cl = local.changelog

301

cl = local.changelog

302

clnode = cl.node

302

clnode = cl.node

303

clrev = cl.rev

303

clrev = cl.rev

304

305

if ancestorsof is not None:

305

if ancestorsof is not None:

306

ownheads = [clrev(n) for n in ancestorsof]

306

ownheads = [clrev(n) for n in ancestorsof]

307

else:

307

else:

308

ownheads = [rev for rev in cl.headrevs() if rev != nullrev]

308

ownheads = [rev for rev in cl.headrevs() if rev != nullrev]

309

310

# early exit if we know all the specified remote heads already

310

# early exit if we know all the specified remote heads already

311

ui.debug(b"query 1; heads\n")

311

ui.debug(b"query 1; heads\n")

312

roundtrips += 1

312

roundtrips += 1

313

# We also ask remote about all the local heads. That set can be arbitrarily

313

# We also ask remote about all the local heads. That set can be arbitrarily

314

# large, so we used to limit it size to `initialsamplesize`. We no longer

314

# large, so we used to limit it size to `initialsamplesize`. We no longer

315

# do as it proved counter productive. The skipped heads could lead to a

315

# do as it proved counter productive. The skipped heads could lead to a

316

# large "undecided" set, slower to be clarified than if we asked the

316

# large "undecided" set, slower to be clarified than if we asked the

317

# question for all heads right away.

317

# question for all heads right away.

318

#

318

#

319

# We are already fetching all server heads using the `heads` commands,

319

# We are already fetching all server heads using the `heads` commands,

320

# sending a equivalent number of heads the other way should not have a

320

# sending a equivalent number of heads the other way should not have a

321

# significant impact. In addition, it is very likely that we are going to

321

# significant impact. In addition, it is very likely that we are going to

322

# have to issue "known" request for an equivalent amount of revisions in

322

# have to issue "known" request for an equivalent amount of revisions in

323

# order to decide if theses heads are common or missing.

323

# order to decide if theses heads are common or missing.

324

#

324

#

325

# find a detailled analysis below.

325

# find a detailled analysis below.

326

#

326

#

327

# Case A: local and server both has few heads

327

# Case A: local and server both has few heads

328

#

328

#

329

# Ownheads is below initialsamplesize, limit would not have any effect.

329

# Ownheads is below initialsamplesize, limit would not have any effect.

330

#

330

#

331

# Case B: local has few heads and server has many

331

# Case B: local has few heads and server has many

332

#

332

#

333

# Ownheads is below initialsamplesize, limit would not have any effect.

333

# Ownheads is below initialsamplesize, limit would not have any effect.

334

#

334

#

335

# Case C: local and server both has many heads

335

# Case C: local and server both has many heads

336

#

336

#

337

# We now transfert some more data, but not significantly more than is

337

# We now transfert some more data, but not significantly more than is

338

# already transfered to carry the server heads.

338

# already transfered to carry the server heads.

339

#

339

#

340

# Case D: local has many heads, server has few

340

# Case D: local has many heads, server has few

341

#

341

#

342

# D.1 local heads are mostly known remotely

342

# D.1 local heads are mostly known remotely

343

#

343

#

344

# All the known head will have be part of a `known` request at some

344

# All the known head will have be part of a `known` request at some

345

# point for the discovery to finish. Sending them all earlier is

345

# point for the discovery to finish. Sending them all earlier is

346

# actually helping.

346

# actually helping.

347

#

347

#

348

# (This case is fairly unlikely, it requires the numerous heads to all

348

# (This case is fairly unlikely, it requires the numerous heads to all

349

# be merged server side in only a few heads)

349

# be merged server side in only a few heads)

350

#

350

#

351

# D.2 local heads are mostly missing remotely

351

# D.2 local heads are mostly missing remotely

352

#

352

#

353

# To determine that the heads are missing, we'll have to issue `known`

353

# To determine that the heads are missing, we'll have to issue `known`

354

# request for them or one of their ancestors. This amount of `known`

354

# request for them or one of their ancestors. This amount of `known`

355

# request will likely be in the same order of magnitude than the amount

355

# request will likely be in the same order of magnitude than the amount

356

# of local heads.

356

# of local heads.

357

#

357

#

358

# The only case where we can be more efficient using `known` request on

358

# The only case where we can be more efficient using `known` request on

359

# ancestors are case were all the "missing" local heads are based on a

359

# ancestors are case were all the "missing" local heads are based on a

360

# few changeset, also "missing". This means we would have a "complex"

360

# few changeset, also "missing". This means we would have a "complex"

361

# graph (with many heads) attached to, but very independant to a the

361

# graph (with many heads) attached to, but very independant to a the

362

# "simple" graph on the server. This is a fairly usual case and have

362

# "simple" graph on the server. This is a fairly usual case and have

363

# not been met in the wild so far.

363

# not been met in the wild so far.

364

if remote.limitedarguments:

364

if remote.limitedarguments:

365

sample = _limitsample(ownheads, initialsamplesize)

365

sample = _limitsample(ownheads, initialsamplesize)

366

# indices between sample and externalized version must match

366

# indices between sample and externalized version must match

367

sample = list(sample)

367

sample = list(sample)

368

else:

368

else:

369

sample = ownheads

369

sample = ownheads

370

371

with remote.commandexecutor() as e:

371

with remote.commandexecutor() as e:

372

fheads = e.callcommand(b'heads', {})

372

fheads = e.callcommand(b'heads', {})

373

fknown = e.callcommand(

373

fknown = e.callcommand(

374

b'known', {b'nodes': [clnode(r) for r in sample],}

374

b'known', {b'nodes': [clnode(r) for r in sample],}

375

)

375

)

376

377

srvheadhashes, yesno = fheads.result(), fknown.result()

377

srvheadhashes, yesno = fheads.result(), fknown.result()

378

379

if cl.tip() == nullid:

379

if cl.tip() == nullid:

380

if srvheadhashes != [nullid]:

380

if srvheadhashes != [nullid]:

381

return [nullid], True, srvheadhashes

381

return [nullid], True, srvheadhashes

382

return [nullid], False, []

382

return [nullid], False, []

383

384

# start actual discovery (we note this before the next "if" for

384

# start actual discovery (we note this before the next "if" for

385

# compatibility reasons)

385

# compatibility reasons)

386

ui.status(_(b"searching for changes\n"))

386

ui.status(_(b"searching for changes\n"))

387

388

knownsrvheads = [] # revnos of remote heads that are known locally

388

knownsrvheads = [] # revnos of remote heads that are known locally

389

for node in srvheadhashes:

389

for node in srvheadhashes:

390

if node == nullid:

390

if node == nullid:

391

continue

391

continue

392

393

try:

393

try:

394

knownsrvheads.append(clrev(node))

394

knownsrvheads.append(clrev(node))

395

# Catches unknown and filtered nodes.

395

# Catches unknown and filtered nodes.

396

except error.LookupError:

396

except error.LookupError:

397

continue

397

continue

398

399

if len(knownsrvheads) == len(srvheadhashes):

399

if len(knownsrvheads) == len(srvheadhashes):

400

ui.debug(b"all remote heads known locally\n")

400

ui.debug(b"all remote heads known locally\n")

401

return srvheadhashes, False, srvheadhashes

401

return srvheadhashes, False, srvheadhashes

402

403

if len(sample) == len(ownheads) and all(yesno):

403

if len(sample) == len(ownheads) and all(yesno):

404

ui.note(_(b"all local changesets known remotely\n"))

404

ui.note(_(b"all local changesets known remotely\n"))

405

ownheadhashes = [clnode(r) for r in ownheads]

405

ownheadhashes = [clnode(r) for r in ownheads]

406

return ownheadhashes, True, srvheadhashes

406

return ownheadhashes, True, srvheadhashes

407

408

# full blown discovery

408

# full blown discovery

409

410

randomize = ui.configbool(b'devel', b'discovery.randomize')

410

randomize = ui.configbool(b'devel', b'discovery.randomize')

411

disco = partialdiscovery(

411

disco = partialdiscovery(

412

local, ownheads, remote.limitedarguments, randomize=randomize

412

local, ownheads, remote.limitedarguments, randomize=randomize

413

)

413

)

414

# treat remote heads (and maybe own heads) as a first implicit sample

414

# treat remote heads (and maybe own heads) as a first implicit sample

415

# response

415

# response

416

disco.addcommons(knownsrvheads)

416

disco.addcommons(knownsrvheads)

417

disco.addinfo(zip(sample, yesno))

417

disco.addinfo(zip(sample, yesno))

418

419

full = False

419

full = False

420

progress = ui.makeprogress(_(b'searching'), unit=_(b'queries'))

420

progress = ui.makeprogress(_(b'searching'), unit=_(b'queries'))

421

while not disco.iscomplete():

421

while not disco.iscomplete():

422

423

if full or disco.hasinfo():

423

if full or disco.hasinfo():

424

if full:

424

if full:

425

ui.note(_(b"sampling from both directions\n"))

425

ui.note(_(b"sampling from both directions\n"))

426

else:

426

else:

427

ui.debug(b"taking initial sample\n")

427

ui.debug(b"taking initial sample\n")

428

samplefunc = disco.takefullsample

428

samplefunc = disco.takefullsample

429

targetsize = fullsamplesize

429

targetsize = fullsamplesize

430

if not remote.limitedarguments:

430

if not remote.limitedarguments:

431

fullsamplesize = int(fullsamplesize * samplegrowth)

431

fullsamplesize = int(fullsamplesize * samplegrowth)

432

else:

432

else:

433

# use even cheaper initial sample

433

             # use even cheaper initial sample
             ui.debug(b"taking quick initial sample\n")
             samplefunc = disco.takequicksample
             targetsize = initialsamplesize
         sample = samplefunc(ownheads, targetsize)

         roundtrips += 1
         progress.update(roundtrips)
         stats = disco.stats()
         ui.debug(
             b"query %i; still undecided: %i, sample size is: %i\n"
-            % (roundtrips, stats[b'undecided'], len(sample))
+            % (roundtrips, stats['undecided'], len(sample))
         )

         # indices between sample and externalized version must match
         sample = list(sample)

         with remote.commandexecutor() as e:
             yesno = e.callcommand(
                 b'known', {b'nodes': [clnode(r) for r in sample],}
             ).result()

         full = True

         disco.addinfo(zip(sample, yesno))

     result = disco.commonheads()
     elapsed = util.timer() - start
     progress.complete()
     ui.debug(b"%d total queries in %.4fs\n" % (roundtrips, elapsed))
     msg = (
         b'found %d common and %d unknown server heads,'
         b' %d roundtrips in %.4fs\n'
     )
     missing = set(result) - set(knownsrvheads)
     ui.log(b'discovery', msg, len(result), len(missing), roundtrips, elapsed)

     if not result and srvheadhashes != [nullid]:
         if abortwhenunrelated:
             raise error.Abort(_(b"repository is unrelated"))
         else:
             ui.warn(_(b"warning: repository is unrelated\n"))
         return (
             {nullid},
             True,
             srvheadhashes,
         )

     anyincoming = srvheadhashes != [nullid]
     result = {clnode(r) for r in result}
     return result, anyincoming, srvheadhashes
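This hunk changes only the key used to read the undecided count: `stats[b'undecided']` becomes `stats['undecided']`. The commit makes the same change in `partialdiscovery.stats()`, so the method that builds the dict and the code that reads it still agree. Because `partialdiscovery` can be replaced by a Rust `PartialDiscovery` through `policy.importrust`, this presumably also keeps both implementations returning the same key type.

On Python 3, bytes and str keys never compare equal, so both sides have to change together. The illustration below uses a hypothetical stand-in, `stats_after_change`, for the patched method. It is not Mercurial code.

```python
def stats_after_change(undecided):
    """Hypothetical stand-in for the patched stats(): the key is a str."""
    return {'undecided': len(undecided)}


stats = stats_after_change({10, 11, 12})
print(stats['undecided'])  # prints 3

# On Python 3, b'undecided' != 'undecided', so a caller still using the
# old bytes key would raise KeyError.
try:
    stats[b'undecided']
except KeyError:
    print("bytes key not found")
```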


             # setdiscovery.py - improved discovery of common nodeset for mercurial
             #
             # Copyright 2010 Benoit Boissinot <bboissin@gmail.com>
             # and Peter Arrenbrecht <peter@arrenbrecht.ch>
             #
             # This software may be used and distributed according to the terms of the
             # GNU General Public License version 2 or any later version.
             """
             The algorithm works in the following way. You have two repositories: local
             and remote. They both contain a DAG of changelists.

             The goal of the discovery protocol is to find one set of nodes, *common*,
             the set of nodes shared by local and remote.

             One of the issues with the original protocol was latency: it could
             potentially require lots of roundtrips to discover that the local repo was a
             subset of remote (a very common case: you usually have few changes
             compared to upstream, while upstream has probably seen lots of development).

             The new protocol only requires one interface for the remote repo: `known()`,
             which, given a set of changelists, tells you if they are present in the DAG.

             The algorithm then works as follows:

              - We will be using three sets, `common`, `missing`, `unknown`. Initially
              all nodes are in `unknown`.
              - Take a sample from `unknown`, call `remote.known(sample)`
                - For each node that remote knows, move it and all its ancestors to `common`
                - For each node that remote doesn't know, move it and all its descendants
                to `missing`
              - Iterate until `unknown` is empty

             There are a couple of optimizations. The first is that, instead of starting
             with a random sample of `unknown`, we start by sending all heads: in the case
             where the local repo is a subset, this computes the answer in one round trip.

             Then you can do something similar to the bisecting strategy used when
             finding faulty changesets. Instead of random samples, you can try picking
             nodes that will maximize the number of nodes that will be
             classified with them (since all ancestors or descendants will be marked as well).
             """
             from __future__ import absolute_import
             import collections
             import random
             from .i18n import _
             from .node import (
                 nullid,
                 nullrev,
             )
             from . import (
                 error,
                 policy,
                 util,
             )
             def _updatesample(revs, heads, sample, parentfn, quicksamplesize=0):
                 """update an existing sample to match the expected size
                 The sample is updated with revs exponentially distant from each head of the
                 <revs> set (H~1, H~2, H~4, H~8, etc.).
                 If a target size is specified, the sampling will stop once this size is
                 reached. Otherwise sampling will happen until roots of the <revs> set are
                 reached.
                 :revs:  set of revs we want to discover (if None, assume the whole dag)
                 :heads: set of DAG head revs
                 :sample: a sample to update
                 :parentfn: a callable to resolve parents for a revision
                 :quicksamplesize: optional target size of the sample"""
                 dist = {}
                 visit = collections.deque(heads)
                 seen = set()
                 factor = 1
                 while visit:
                     curr = visit.popleft()
                     if curr in seen:
                         continue
                     d = dist.setdefault(curr, 1)
                     if d > factor:
                         factor *= 2
                     if d == factor:
                         sample.add(curr)
                         if quicksamplesize and (len(sample) >= quicksamplesize):
                             return
                     seen.add(curr)
                     for p in parentfn(curr):
                         if p != nullrev and (not revs or p in revs):
                             dist.setdefault(p, d + 1)
                             visit.append(p)
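To see which revisions `_updatesample` keeps, its loop can be run outside Mercurial. The sketch below is a standalone copy of the loop above. It assumes `nullrev` is -1, its value in `mercurial.node`. It is fed a hypothetical linear history in which revision r has the single parent r - 1.

```python
import collections

NULLREV = -1  # assumed to mirror mercurial.node.nullrev


def updatesample(revs, heads, sample, parentfn, quicksamplesize=0):
    """Standalone copy of the _updatesample loop, for experimentation."""
    dist = {}
    visit = collections.deque(heads)
    seen = set()
    factor = 1
    while visit:
        curr = visit.popleft()
        if curr in seen:
            continue
        d = dist.setdefault(curr, 1)  # heads sit at distance 1
        if d > factor:
            factor *= 2
        if d == factor:  # keep revisions at distance 1, 2, 4, 8, ...
            sample.add(curr)
            if quicksamplesize and len(sample) >= quicksamplesize:
                return
        seen.add(curr)
        for p in parentfn(curr):
            if p != NULLREV and (not revs or p in revs):
                dist.setdefault(p, d + 1)
                visit.append(p)


def chain_parents(rev):
    """Hypothetical linear history: (p1, p2) with p2 always null."""
    return (rev - 1 if rev > 0 else NULLREV, NULLREV)


sample = set()
updatesample(None, [20], sample, chain_parents)
print(sorted(sample))  # [5, 13, 17, 19, 20]
```

Distances are counted from 1 at the head itself. On this chain, the kept revisions are therefore the head plus H~1, H~3, H~7 and H~15. The "H~1, H~2, H~4, H~8" in the docstring is best read as describing roughly exponential spacing, not exact offsets. Passing `quicksamplesize=3` stops the walk once `{20, 19, 17}` has been collected.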
             def _limitsample(sample, desiredlen, randomize=True):
                 """return a random subset of sample of at most desiredlen item.
                 If randomize is False, though, a deterministic subset is returned.
                 This is meant for integration tests.
                 """
                 if len(sample) <= desiredlen:
                     return sample
                 if randomize:
                     return set(random.sample(sample, desiredlen))
                 sample = list(sample)
                 sample.sort()
                 return set(sample[:desiredlen])
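`_limitsample` trims a sample in one of two ways: randomly, or, for tests, deterministically by keeping the smallest revisions. Below is a sketch of a variant. The `rng` parameter is a hypothetical addition for reproducible runs.

The sample is sorted before `random.sample` is called. Sampling directly from a set was deprecated in Python 3.9 and raises `TypeError` from Python 3.11 on.

```python
import random


def limitsample(sample, desiredlen, randomize=True, rng=random):
    """Return sample itself if small enough, else a set of desiredlen items.

    `rng` is a hypothetical parameter for reproducibility; anything with a
    `sample(population, k)` method (the `random` module, `random.Random`) works.
    """
    if len(sample) <= desiredlen:
        return sample
    ordered = sorted(sample)  # a sequence, as random.sample requires
    if randomize:
        return set(rng.sample(ordered, desiredlen))
    return set(ordered[:desiredlen])  # deterministic: smallest revisions


print(limitsample({9, 4, 7, 1}, 2, randomize=False))  # {1, 4}
print(limitsample(set(range(100)), 3, rng=random.Random(42)))
```

As in the original, a sample that is already small enough is returned unchanged. Callers may therefore get back a list or a set, depending on what they passed in.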
             class partialdiscovery(object):
                 """an object representing ongoing discovery
                 Fed with data from the remote repository, this object keeps track of the
                 current set of changesets in various states:
                 - common:    revs also known remotely
                 - undecided: revs we don't have information on yet
                 - missing:   revs missing remotely
                 (all tracked revisions are known locally)
                 """
                 def __init__(self, repo, targetheads, respectsize, randomize=True):
                     self._repo = repo
                     self._targetheads = targetheads
                     self._common = repo.changelog.incrementalmissingrevs()
                     self._undecided = None
                     self.missing = set()
                     self._childrenmap = None
                     self._respectsize = respectsize
                     self.randomize = randomize
                 def addcommons(self, commons):
                     """register nodes known as common"""
                     self._common.addbases(commons)
                     if self._undecided is not None:
                         self._common.removeancestorsfrom(self._undecided)
                 def addmissings(self, missings):
                     """register some nodes as missing"""
                     newmissing = self._repo.revs(b'%ld::%ld', missings, self.undecided)
                     if newmissing:
                         self.missing.update(newmissing)
                         self.undecided.difference_update(newmissing)
                 def addinfo(self, sample):
                     """consume an iterable of (rev, known) tuples"""
                     common = set()
                     missing = set()
                     for rev, known in sample:
                         if known:
                             common.add(rev)
                         else:
                             missing.add(rev)
                     if common:
                         self.addcommons(common)
                     if missing:
                         self.addmissings(missing)
                 def hasinfo(self):
                     """return True is we have any clue about the remote state"""
                     return self._common.hasbases()
                 def iscomplete(self):
                     """True if all the necessary data have been gathered"""
                     return self._undecided is not None and not self._undecided
                 @property
                 def undecided(self):
                     if self._undecided is not None:
                         return self._undecided
                     self._undecided = set(self._common.missingancestors(self._targetheads))
                     return self._undecided
                 def stats(self):
                     return {
-                        b'undecided': len(self.undecided),
+                        'undecided': len(self.undecided),
                     }
                 def commonheads(self):
                     """the heads of the known common set"""
                     # heads(common) == heads(common.bases) since common represents
                     # common.bases and all its ancestors
                     return self._common.basesheads()
                 def _parentsgetter(self):
                     getrev = self._repo.changelog.index.__getitem__
                     def getparents(r):
                         return getrev(r)[5:7]
                     return getparents
                 def _childrengetter(self):
                     if self._childrenmap is not None:
                         # During discovery, the `undecided` set keeps shrinking.
                         # Therefore, the map computed for an iteration N will be
                         # valid for iteration N+1. Instead of computing the same
                         # data over and over, we cache it the first time.
                         return self._childrenmap.__getitem__
                     # _updatesample() essentially does iteration over revisions to look
                     # up their children. This lookup is expensive and doing it in a loop is
                     # quadratic. We precompute the children for all relevant revisions and
                     # make the lookup in _updatesample() a simple dict lookup.
                     self._childrenmap = children = {}
                     parentrevs = self._parentsgetter()
                     revs = self.undecided
                     for rev in sorted(revs):
                         # Always ensure revision has an entry so we don't need to worry
                         # about missing keys.
                         children[rev] = []
                         for prev in parentrevs(rev):
                             if prev == nullrev:
                                 continue
                             c = children.get(prev)
                             if c is not None:
                                 c.append(rev)
                     return children.__getitem__
                 def takequicksample(self, headrevs, size):
                     """takes a quick sample of size <size>
                     It is meant for initial sampling and focuses on querying heads and close
                     ancestors of heads.
                     :headrevs: set of head revisions in local DAG to consider
                     :size: the maximum size of the sample"""
                     revs = self.undecided
                     if len(revs) <= size:
                         return list(revs)
                     sample = set(self._repo.revs(b'heads(%ld)', revs))
                     if len(sample) >= size:
                         return _limitsample(sample, size, randomize=self.randomize)
                     _updatesample(
                         None, headrevs, sample, self._parentsgetter(), quicksamplesize=size
                     )
                     return sample
                 def takefullsample(self, headrevs, size):
                     revs = self.undecided
                     if len(revs) <= size:
                         return list(revs)
                     repo = self._repo
                     sample = set(repo.revs(b'heads(%ld)', revs))
                     parentrevs = self._parentsgetter()
                     # update from heads
                     revsheads = sample.copy()
                     _updatesample(revs, revsheads, sample, parentrevs)
                     # update from roots
                     revsroots = set(repo.revs(b'roots(%ld)', revs))
                     childrenrevs = self._childrengetter()
                     _updatesample(revs, revsroots, sample, childrenrevs)
                     assert sample
                     if not self._respectsize:
                         size = max(size, min(len(revsroots), len(revsheads)))
                     sample = _limitsample(sample, size, randomize=self.randomize)
                     if len(sample) < size:
                         more = size - len(sample)
                         takefrom = list(revs - sample)
                         if self.randomize:
                             sample.update(random.sample(takefrom, more))
                         else:
                             takefrom.sort()
                             sample.update(takefrom[:more])
                     return sample
             partialdiscovery = policy.importrust(
                 r'discovery', member=r'PartialDiscovery', default=partialdiscovery
             )
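`_childrengetter` inverts parent pointers once, restricted to the `undecided` revisions, and caches the result for later iterations. The sketch below reproduces that construction on a hypothetical four-revision graph.

It relies on Mercurial revision numbers being topologically ordered: a parent always has a lower number than its children. Walking `sorted(revs)` therefore guarantees that a parent's entry exists before any child tries to append to it. Parents outside the set are skipped because `children.get()` returns `None` for them.

```python
NULLREV = -1  # assumed value of mercurial.node.nullrev

# Hypothetical graph: rev -> (p1, p2); rev 1 lies outside the undecided set.
PARENTS = {2: (1, NULLREV), 3: (2, NULLREV), 4: (2, NULLREV), 5: (3, 4)}


def build_children_map(revs, parentrevs):
    """Map each rev in revs to its children that are also in revs."""
    children = {}
    for rev in sorted(revs):  # parents are visited before their children
        children[rev] = []  # every rev gets an entry, even if childless
        for prev in parentrevs(rev):
            if prev == NULLREV:
                continue
            c = children.get(prev)
            if c is not None:  # the parent is inside revs
                c.append(rev)
    return children


children = build_children_map({2, 3, 4, 5}, PARENTS.__getitem__)
print(children)  # {2: [3, 4], 3: [5], 4: [5], 5: []}
```

After this precomputation, looking up a revision's children is a single dict access. That is what lets `takefullsample` walk upward from the roots of `undecided` with the same `_updatesample` helper it uses for heads.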
             def findcommonheads(
                 ui,
                 local,
                 remote,
                 initialsamplesize=100,
                 fullsamplesize=200,
                 abortwhenunrelated=True,
                 ancestorsof=None,
                 samplegrowth=1.05,
             ):
                 '''Return a tuple (common, anyincoming, remoteheads) used to identify
                 missing nodes from or in remote.
                 '''
                 start = util.timer()
                 roundtrips = 0
                 cl = local.changelog
                 clnode = cl.node
                 clrev = cl.rev
                 if ancestorsof is not None:
                     ownheads = [clrev(n) for n in ancestorsof]
                 else:
                     ownheads = [rev for rev in cl.headrevs() if rev != nullrev]
                 # early exit if we know all the specified remote heads already
                 ui.debug(b"query 1; heads\n")
                 roundtrips += 1
                 # We also ask remote about all the local heads. That set can be arbitrarily
                 # large, so we used to limit its size to `initialsamplesize`. We no longer
                 # do so, as it proved counterproductive. The skipped heads could lead to a
                 # large "undecided" set, slower to be clarified than if we asked the
                 # question for all heads right away.
                 #
                 # We are already fetching all server heads using the `heads` command, so
                 # sending an equivalent number of heads the other way should not have a
                 # significant impact. In addition, it is very likely that we are going to
                 # have to issue "known" requests for an equivalent amount of revisions in
                 # order to decide if these heads are common or missing.
                 #
                 # Find a detailed analysis below.
                 #
                 # Case A: local and server both have few heads
                 #
                 #     Ownheads is below initialsamplesize, limit would not have any effect.
                 #
                 # Case B: local has few heads and server has many
                 #
                 #     Ownheads is below initialsamplesize, limit would not have any effect.
                 #
                 # Case C: local and server both have many heads
                 #
                 #     We now transfer some more data, but not significantly more than is
                 #     already transferred to carry the server heads.
                 #
                 # Case D: local has many heads, server has few
                 #
                 #   D.1 local heads are mostly known remotely
                 #
                 #     All the known heads will have to be part of a `known` request at some
                 #     point for the discovery to finish. Sending them all earlier is
                 #     actually helping.
                 #
                 #     (This case is fairly unlikely: it requires the numerous heads to all
                 #     be merged server side into only a few heads.)
                 #
                 #   D.2 local heads are mostly missing remotely
                 #
                 #     To determine that the heads are missing, we'll have to issue `known`
                 #     requests for them or one of their ancestors. This number of `known`
                 #     requests will likely be of the same order of magnitude as the number
                 #     of local heads.
                 #
                 #     The only case where we can be more efficient using `known` requests on
                 #     ancestors is when all the "missing" local heads are based on a few
                 #     changesets that are also "missing". This means we would have a "complex"
                 #     graph (with many heads) attached to, but very independent of, the
                 #     "simple" graph on the server. This is a fairly unusual case and has
                 #     not been met in the wild so far.
                 if remote.limitedarguments:
                     sample = _limitsample(ownheads, initialsamplesize)
                     # indices between sample and externalized version must match
                     sample = list(sample)
                 else:
                     sample = ownheads
                 with remote.commandexecutor() as e:
                     fheads = e.callcommand(b'heads', {})
                     fknown = e.callcommand(
                         b'known', {b'nodes': [clnode(r) for r in sample],}
                     )
                 srvheadhashes, yesno = fheads.result(), fknown.result()
                 if cl.tip() == nullid:
                     if srvheadhashes != [nullid]:
                         return [nullid], True, srvheadhashes
                     return [nullid], False, []
                 # start actual discovery (we note this before the next "if" for
                 # compatibility reasons)
                 ui.status(_(b"searching for changes\n"))
                 knownsrvheads = []  # revnos of remote heads that are known locally
                 for node in srvheadhashes:
                     if node == nullid:
                         continue
                     try:
                         knownsrvheads.append(clrev(node))
                     # Catches unknown and filtered nodes.
                     except error.LookupError:
                         continue
                 if len(knownsrvheads) == len(srvheadhashes):
                     ui.debug(b"all remote heads known locally\n")
                     return srvheadhashes, False, srvheadhashes
                 if len(sample) == len(ownheads) and all(yesno):
                     ui.note(_(b"all local changesets known remotely\n"))
                     ownheadhashes = [clnode(r) for r in ownheads]
                     return ownheadhashes, True, srvheadhashes
                 # full blown discovery
                 randomize = ui.configbool(b'devel', b'discovery.randomize')
                 disco = partialdiscovery(
                     local, ownheads, remote.limitedarguments, randomize=randomize
                 )
                 # treat remote heads (and maybe own heads) as a first implicit sample
                 # response
                 disco.addcommons(knownsrvheads)
                 disco.addinfo(zip(sample, yesno))
                 full = False
                 progress = ui.makeprogress(_(b'searching'), unit=_(b'queries'))
                 while not disco.iscomplete():
                     if full or disco.hasinfo():
                         if full:
                             ui.note(_(b"sampling from both directions\n"))
                         else:
                             ui.debug(b"taking initial sample\n")
                         samplefunc = disco.takefullsample
                         targetsize = fullsamplesize
                         if not remote.limitedarguments:
                             fullsamplesize = int(fullsamplesize * samplegrowth)
                     else:
                         # use even cheaper initial sample
                         ui.debug(b"taking quick initial sample\n")
                         samplefunc = disco.takequicksample
                         targetsize = initialsamplesize
                     sample = samplefunc(ownheads, targetsize)
                     roundtrips += 1
                     progress.update(roundtrips)
                     stats = disco.stats()
                     ui.debug(
                         b"query %i; still undecided: %i, sample size is: %i\n"
-                        % (roundtrips, stats[b'undecided'], len(sample))
+                        % (roundtrips, stats['undecided'], len(sample))
                     )
                     # indices between sample and externalized version must match
                     sample = list(sample)
                     with remote.commandexecutor() as e:
                         yesno = e.callcommand(
                             b'known', {b'nodes': [clnode(r) for r in sample],}
                         ).result()
                     full = True
                     disco.addinfo(zip(sample, yesno))
                 result = disco.commonheads()
                 elapsed = util.timer() - start
                 progress.complete()
                 ui.debug(b"%d total queries in %.4fs\n" % (roundtrips, elapsed))
                 msg = (
                     b'found %d common and %d unknown server heads,'
                     b' %d roundtrips in %.4fs\n'
                 )
                 missing = set(result) - set(knownsrvheads)
                 ui.log(b'discovery', msg, len(result), len(missing), roundtrips, elapsed)
                 if not result and srvheadhashes != [nullid]:
                     if abortwhenunrelated:
                         raise error.Abort(_(b"repository is unrelated"))
                     else:
                         ui.warn(_(b"warning: repository is unrelated\n"))
                     return (
                         {nullid},
                         True,
                         srvheadhashes,
                     )
                 anyincoming = srvheadhashes != [nullid]
                 result = {clnode(r) for r in result}
                 return result, anyincoming, srvheadhashes
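In the main loop of `findcommonheads`, each full-sample round uses the current `fullsamplesize` as its target. Then, unless the remote has `limitedarguments`, the loop multiplies it by `samplegrowth` for the next round, truncating with `int()`.

Below is a small sketch of that arithmetic with the default parameters. `full_sample_sizes` is a hypothetical helper. It ignores any initial quick-sample round, which uses `initialsamplesize` instead.

```python
def full_sample_sizes(rounds, fullsamplesize=200, samplegrowth=1.05,
                      limitedarguments=False):
    """Target sizes for successive full-sample rounds (hypothetical helper)."""
    sizes = []
    for _ in range(rounds):
        sizes.append(fullsamplesize)  # targetsize used for this round
        if not limitedarguments:
            # grow for the next round, truncating like int() in the loop
            fullsamplesize = int(fullsamplesize * samplegrowth)
    return sizes


print(full_sample_sizes(5))  # [200, 210, 220, 231, 242]
print(full_sample_sizes(3, limitedarguments=True))  # [200, 200, 200]
```

These values are targets, not exact sizes. When the remote does not limit arguments, `partialdiscovery` is built with `respectsize` false. In that case `takefullsample` raises the size to at least the smaller of the head and root counts of the undecided set, so a round can ask about more revisions than its target.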