revlog: give EXTSTORED flag value to narrowhg...
Martin von Zweigbergk
r30829:08b34c3a default
@@ -1,212 +1,215
1 1 Revision logs - or *revlogs* - are an append-only data structure for
2 2 storing discrete entries, or *revisions*. They are the primary storage
3 3 mechanism of repository data.
4 4
5 5 Revlogs effectively model a directed acyclic graph (DAG). Each node
6 6 has edges to 1 or 2 *parent* nodes. Each node contains metadata and
7 7 the raw value for that node.
8 8
9 9 Revlogs consist of entries which have metadata and revision data.
10 10 Metadata includes the hash of the revision's content, sizes, and
11 11 links to its *parent* entries. The collective metadata is referred
12 12 to as the *index* and the revision data is the *data*.
13 13
14 14 Revision data is stored as a series of compressed deltas against previous
15 15 revisions.
16 16
17 17 Revlogs are written in an append-only fashion. We never need to rewrite
18 18 a file to insert or remove data. Rolling back in-progress
19 19 writes can be performed by truncating files. Read locks can be avoided
20 20 using simple techniques. This means that references to other data in
21 21 the same revlog *always* refer to a previous entry.
22 22
23 23 Revlogs can be modeled as 0-indexed arrays. The first revision is
24 24 revision #0 and the second is revision #1. The revision -1 is typically
25 25 used to mean *does not exist* or *not defined*.
26 26
27 27 File Format
28 28 ===========
29 29
30 30 A revlog begins with a 32-bit big endian integer holding version info
31 31 and feature flags. This integer is shared with the first revision
32 32 entry.
33 33
34 34 This integer is logically divided into 2 16-bit shorts. The least
35 35 significant half of the integer is the format/version short. The other
36 36 short holds feature flags that dictate behavior of the revlog.
37 37
38 38 Only 1 bit of the format/version short is currently used. Remaining
39 39 bits are reserved for future use.
40 40
41 41 The following values for the format/version short are defined:
42 42
43 43 0
44 44 The original revlog version.
45 45 1
46 46 RevlogNG (*next generation*). It replaced version 0 when it was
47 47 implemented in 2006.
48 48
49 49 The feature flags short consists of bit flags. Where 0 is the least
50 50 significant bit, the following bit offsets define flags:
51 51
52 52 0
53 53 Store revision data inline.
54 54 1
55 55 Generaldelta encoding.
56 56
57 57 2-15
58 58 Reserved for future use.
59 59
60 60 The following header values are common:
61 61
62 62 00 00 00 01
63 63 RevlogNG
64 64 00 01 00 01
65 65 RevlogNG + inline
66 66 00 02 00 01
67 67 RevlogNG + generaldelta
68 68 00 03 00 01
69 69 RevlogNG + inline + generaldelta
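
For illustration only, a minimal Python sketch of decoding this header
(the field names are ours, not Mercurial's)::

  import struct

  def parse_revlog_header(data):
      """Split the first 4 bytes into version and feature flags."""
      header = struct.unpack('>I', data[:4])[0]  # 32-bit big endian
      version = header & 0xFFFF                  # least significant short
      inline = bool(header & (1 << 16))          # bit 0 of the flags short
      generaldelta = bool(header & (1 << 17))    # bit 1 of the flags short
      return version, inline, generaldelta

  # parse_revlog_header(b'\x00\x03\x00\x01') -> (1, True, True)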
70 70
71 71 Following the 32-bit header is the remainder of the first index entry.
72 72 Following that is the remaining *index* data. Inlined revision data is
73 73 possibly located between index entries. More on this layout is described
74 74 below.
75 75
76 76 RevlogNG Format
77 77 ===============
78 78
79 79 RevlogNG (version 1) begins with an index describing the revisions in
80 80 the revlog. If the ``inline`` flag is set, revision data is stored inline,
81 81 i.e. between index entries (as opposed to in a separate container).
82 82
83 83 Each index entry is 64 bytes. The byte layout of each entry is as
84 84 follows, with byte 0 being the first byte (all data stored as big endian):
85 85
86 86 0-3 (4 bytes) (rev 0 only)
87 87 Revlog header
88 88
89 89 0-5 (6 bytes)
90 90 Absolute offset of revision data from beginning of revlog.
91 91
92 92 6-7 (2 bytes)
93 93 Bit flags impacting revision behavior. The following bit offsets are defined:
94 94
95 95 0: REVIDX_ISCENSORED revision has censor metadata, must be verified.
96 96
97 1: REVIDX_EXTSTORED revision data is stored externally.
97 1: REVIDX_ELLIPSIS revision hash does not match its data. Used by
98 narrowhg
99
100 2: REVIDX_EXTSTORED revision data is stored externally.
98 101
99 102 8-11 (4 bytes)
100 103 Compressed length of revision data / chunk as stored in revlog.
101 104
102 105 12-15 (4 bytes)
103 106 Uncompressed length of revision data. This is the size of the full
104 107 revision data, not the size of the chunk post decompression.
105 108
106 109 16-19 (4 bytes)
107 110 Base or previous revision this revision's delta was produced against.
108 111 -1 means this revision holds full text (as opposed to a delta).
109 112 For generaldelta repos, this is the previous revision in the delta
110 113 chain. For non-generaldelta repos, this is the base or first
111 114 revision in the delta chain.
112 115
113 116 20-23 (4 bytes)
114 117 A revision this revision is *linked* to. This allows a revision in
115 118 one revlog to be forever associated with a revision in another
116 119 revlog. For example, a file's revlog may point to the changelog
117 120 revision that introduced it.
118 121
119 122 24-27 (4 bytes)
120 123 Revision of 1st parent. -1 indicates no parent.
121 124
122 125 28-31 (4 bytes)
123 126 Revision of 2nd parent. -1 indicates no 2nd parent.
124 127
125 128 32-63 (32 bytes)
126 129 Hash of revision's full text. Currently, SHA-1 is used and only
127 130 the first 20 bytes of this field are used. The rest of the bytes
128 131 are ignored and should be stored as \0.
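
As an illustrative sketch, a 64-byte index entry can be decoded with
Python's ``struct`` module; the format string is the same one
``revlog.py`` uses (``>Qiiiiii20s12x``), with the 6-byte offset and
2-byte flag field read together as a single 64-bit integer::

  import struct

  indexformatng = struct.Struct('>Qiiiiii20s12x')  # 64 bytes

  def parse_index_entry(entry):
      (offset_flags, complen, uncomplen, baserev,
       linkrev, p1rev, p2rev, node) = indexformatng.unpack(entry)
      offset = offset_flags >> 16    # bytes 0-5: absolute data offset
      flags = offset_flags & 0xFFFF  # bytes 6-7: revision flags
      return (offset, flags, complen, uncomplen, baserev,
              linkrev, p1rev, p2rev, node)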
129 132
130 133 If inline revision data is being stored, the compressed revision data
131 134 (whose length is given by bytes 8-11 of the index entry) immediately
132 135 follows the index entry. There is no header on the revision data. There
133 136 is no padding between it and the index entries before and after.
134 137
135 138 If revision data is not inline, then raw revision data is stored in a
136 139 separate byte container. The offsets from bytes 0-5 and the compressed
137 140 length from bytes 8-11 define how to access this data.
138 141
139 142 The first 4 bytes of the revlog are shared between the revlog header
140 143 and the 6 byte absolute offset field from the first revlog entry.
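
As a sketch, mapping a revision's logical data offset to a physical file
position therefore depends on the ``inline`` flag (this mirrors the
arithmetic in ``_chunkraw()`` below, assuming the 64-byte RevlogNG entry
size)::

  ENTRY_SIZE = 64

  def physical_start(logical_offset, rev, inline):
      if inline:
          # index entries for revisions 0..rev precede this data
          return logical_offset + (rev + 1) * ENTRY_SIZE
      return logical_offset  # data lives in the separate data file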
141 144
142 145 Delta Chains
143 146 ============
144 147
145 148 Revision data is encoded as a chain of *chunks*. Each chain begins with
146 149 the compressed original full text for that revision. Each subsequent
147 150 *chunk* is a *delta* against the previous revision. We therefore call
148 151 these chains of chunks/deltas *delta chains*.
149 152
150 153 The full text for a revision is reconstructed by loading the original
151 154 full text for the base revision of a *delta chain* and then applying
152 155 *deltas* until the target revision is reconstructed.
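
A sketch of that reconstruction, where ``chunks``, ``deltaparent`` and
``apply_delta`` are stand-ins for the revlog's decoded chunks, its delta
parent lookup, and ``mdiff.patches()``::

  def reconstruct(chunks, deltaparent, rev, apply_delta):
      chain = []
      while deltaparent(rev) != -1:  # -1 marks a chain base (full text)
          chain.append(rev)
          rev = deltaparent(rev)
      text = chunks[rev]             # the base chunk holds a full text
      for r in reversed(chain):      # replay deltas oldest-first
          text = apply_delta(text, chunks[r])
      return text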
153 156
154 157 *Delta chains* are limited in length so lookup time is bounded. They are
155 158 limited to ~2x the length of the revision's data. The linear distance
156 159 between the base chunk and the final chunk is also limited so the
157 160 amount of read I/O to load all chunks in the delta chain is bounded.
158 161
159 162 Deltas and delta chains are either computed against the previous
160 163 revision in the revlog or another revision (almost certainly one of
161 164 the parents of the revision). Historically, deltas were computed against
162 165 the previous revision. The *generaldelta* revlog feature flag (enabled
163 166 by default in Mercurial 3.7) activates the mode where deltas are
164 167 computed against an arbitrary revision (almost certainly a parent revision).
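
As a sketch, the delta parent of a revision is therefore derived from its
stored base revision field (compare ``deltaparent()`` in ``revlog.py``)::

  def delta_base(baserev, rev, generaldelta):
      if baserev == rev:
          return -1       # this revision stores a full text
      if generaldelta:
          return baserev  # delta against an arbitrary revision
      return rev - 1      # legacy: delta against the previous revision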
165 168
166 169 File Storage
167 170 ============
168 171
169 172 Revlogs logically consist of an index (metadata of entries) and
170 173 revision data. This data may be stored together in a single file or in
171 174 separate files. The mechanism used is indicated by the ``inline`` feature
172 175 flag on the revlog.
173 176
174 177 Mercurial's behavior is to use inline storage until a revlog reaches a
175 178 certain size, at which point it will be converted to non-inline. The
176 179 reason there is a size limit on inline storage is to establish an upper
177 180 bound on how much data must be read to load the index. It would be a waste
178 181 to read tens or hundreds of extra megabytes of data just to access the
179 182 index data.
180 183
181 184 The actual layout of revlog files on disk is governed by the repository's
182 185 *store format*. Typically, a ``.i`` file represents the index revlog
183 186 (possibly containing inline data) and a ``.d`` file holds the revision data.
184 187
185 188 Revision Entries
186 189 ================
187 190
188 191 Revision entries consist of an optional 1 byte header followed by an
189 192 encoding of the revision data. The headers are as follows:
190 193
191 194 \0 (0x00)
192 195 Revision data is the entirety of the entry, including this header.
193 196 u (0x75)
194 197 Raw revision data follows.
195 198 x (0x78)
196 199 zlib (RFC 1950) data.
197 200
198 201 The 0x78 value is actually the first byte of the zlib header (CMF byte).
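
A sketch of decoding a single stored entry according to these headers
(note the ``x`` case feeds the whole entry, header included, to zlib)::

  import zlib

  def decompress_chunk(data):
      if not data:
          return data
      t = data[0:1]
      if t == b'\0':  # the entire entry is the revision data
          return data
      if t == b'u':   # raw data follows the 1-byte header
          return data[1:]
      if t == b'x':   # 0x78 is also the first byte of the zlib stream
          return zlib.decompress(data)
      raise ValueError('unknown compression header %r' % t)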
199 202
200 203 Hash Computation
201 204 ================
202 205
203 206 The hash of the revision is stored in the index and is used both as a primary
204 207 key and for data integrity verification.
205 208
206 209 Currently, SHA-1 is the only supported hashing algorithm. To obtain the SHA-1
207 210 hash of a revision:
208 211
209 212 1. Hash the parent nodes
210 213 2. Hash the fulltext of the revision
211 214
212 215 The 20 byte node ids of the parents are fed into the hasher in ascending order.
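
Concretely, this mirrors ``hash()`` in ``revlog.py`` below::

  import hashlib

  def node_hash(text, p1, p2):
      """SHA-1 of the two sorted 20-byte parent ids, then the text."""
      s = hashlib.sha1(min(p1, p2))
      s.update(max(p1, p2))
      s.update(text)
      return s.digest()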
@@ -1,2099 +1,2101
1 1 # revlog.py - storage back-end for mercurial
2 2 #
3 3 # Copyright 2005-2007 Matt Mackall <mpm@selenic.com>
4 4 #
5 5 # This software may be used and distributed according to the terms of the
6 6 # GNU General Public License version 2 or any later version.
7 7
8 8 """Storage back-end for Mercurial.
9 9
10 10 This provides efficient delta storage with O(1) retrieve and append
11 11 and O(changes) merge between branches.
12 12 """
13 13
14 14 from __future__ import absolute_import
15 15
16 16 import collections
17 17 import errno
18 18 import hashlib
19 19 import os
20 20 import struct
21 21 import zlib
22 22
23 23 # import stuff from node for others to import from revlog
24 24 from .node import (
25 25 bin,
26 26 hex,
27 27 nullid,
28 28 nullrev,
29 29 )
30 30 from .i18n import _
31 31 from . import (
32 32 ancestor,
33 33 error,
34 34 mdiff,
35 35 parsers,
36 36 templatefilters,
37 37 util,
38 38 )
39 39
40 40 _pack = struct.pack
41 41 _unpack = struct.unpack
42 42 # Aliased for performance.
43 43 _zlibdecompress = zlib.decompress
44 44
45 45 # revlog header flags
46 46 REVLOGV0 = 0
47 47 REVLOGNG = 1
48 48 REVLOGNGINLINEDATA = (1 << 16)
49 49 REVLOGGENERALDELTA = (1 << 17)
50 50 REVLOG_DEFAULT_FLAGS = REVLOGNGINLINEDATA
51 51 REVLOG_DEFAULT_FORMAT = REVLOGNG
52 52 REVLOG_DEFAULT_VERSION = REVLOG_DEFAULT_FORMAT | REVLOG_DEFAULT_FLAGS
53 53 REVLOGNG_FLAGS = REVLOGNGINLINEDATA | REVLOGGENERALDELTA
54 54
55 55 # revlog index flags
56 56 REVIDX_ISCENSORED = (1 << 15) # revision has censor metadata, must be verified
57 REVIDX_EXTSTORED = (1 << 14) # revision data is stored externally
57 REVIDX_ELLIPSIS = (1 << 14) # revision hash does not match data (narrowhg)
58 REVIDX_EXTSTORED = (1 << 13) # revision data is stored externally
58 59 REVIDX_DEFAULT_FLAGS = 0
59 60 # stable order in which flags need to be processed and their processors applied
60 61 REVIDX_FLAGS_ORDER = [
61 62 REVIDX_ISCENSORED,
63 REVIDX_ELLIPSIS,
62 64 REVIDX_EXTSTORED,
63 65 ]
64 66 REVIDX_KNOWN_FLAGS = util.bitsfrom(REVIDX_FLAGS_ORDER)
65 67
66 68 # max size of revlog with inline data
67 69 _maxinline = 131072
68 70 _chunksize = 1048576
69 71
70 72 RevlogError = error.RevlogError
71 73 LookupError = error.LookupError
72 74 CensoredNodeError = error.CensoredNodeError
73 75 ProgrammingError = error.ProgrammingError
74 76
75 77 # Store flag processors (cf. 'addflagprocessor()' to register)
76 78 _flagprocessors = {
77 79 REVIDX_ISCENSORED: None,
78 80 }
79 81
80 82 def addflagprocessor(flag, processor):
81 83 """Register a flag processor on a revision data flag.
82 84
83 85 Invariant:
84 86 - Flags need to be defined in REVIDX_KNOWN_FLAGS and REVIDX_FLAGS_ORDER.
85 87 - Only one flag processor can be registered on a specific flag.
86 88 - flagprocessors must be 3-tuples of functions (read, write, raw) with the
87 89 following signatures:
88 90 - (read) f(self, text) -> newtext, bool
89 91 - (write) f(self, text) -> newtext, bool
90 92 - (raw) f(self, text) -> bool
91 93 The boolean returned by these transforms is used to determine whether
92 94 'newtext' can be used for hash integrity checking.
93 95
94 96 Note: The 'raw' transform is used for changegroup generation and in some
95 97 debug commands. In this case the transform only indicates whether the
96 98 contents can be used for hash integrity checks.
97 99 """
98 100 if not flag & REVIDX_KNOWN_FLAGS:
99 101 msg = _("cannot register processor on unknown flag '%#x'.") % (flag)
100 102 raise ProgrammingError(msg)
101 103 if flag not in REVIDX_FLAGS_ORDER:
102 104 msg = _("flag '%#x' undefined in REVIDX_FLAGS_ORDER.") % (flag)
103 105 raise ProgrammingError(msg)
104 106 if flag in _flagprocessors:
105 107 msg = _("cannot register multiple processors on flag '%#x'.") % (flag)
106 108 raise error.Abort(msg)
107 109 _flagprocessors[flag] = processor
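
# Illustrative sketch only (not part of this change): a flag processor
# for a flag such as REVIDX_EXTSTORED would be registered roughly as
#
#   def readfn(self, text):
#       return decode(text), False   # 'decode' is a hypothetical helper
#   def writefn(self, text):
#       return encode(text), False   # 'encode' is a hypothetical helper
#   def rawfn(self, text):
#       return False                 # raw form fails hash verification
#   addflagprocessor(REVIDX_EXTSTORED, (readfn, writefn, rawfn))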
108 110
109 111 def getoffset(q):
110 112 return int(q >> 16)
111 113
112 114 def gettype(q):
113 115 return int(q & 0xFFFF)
114 116
115 117 def offset_type(offset, type):
116 118 if (type & ~REVIDX_KNOWN_FLAGS) != 0:
117 119 raise ValueError('unknown revlog index flags')
118 120 return long(long(offset) << 16 | type)
119 121
120 122 _nullhash = hashlib.sha1(nullid)
121 123
122 124 def hash(text, p1, p2):
123 125 """generate a hash from the given text and its parent hashes
124 126
125 127 This hash combines both the current file contents and its history
126 128 in a manner that makes it easy to distinguish nodes with the same
127 129 content in the revision graph.
128 130 """
129 131 # As of now, if one of the parent nodes is null, p2 is null
130 132 if p2 == nullid:
131 133 # deep copy of a hash is faster than creating one
132 134 s = _nullhash.copy()
133 135 s.update(p1)
134 136 else:
135 137 # none of the parent nodes are nullid
136 138 l = [p1, p2]
137 139 l.sort()
138 140 s = hashlib.sha1(l[0])
139 141 s.update(l[1])
140 142 s.update(text)
141 143 return s.digest()
142 144
143 145 # index v0:
144 146 # 4 bytes: offset
145 147 # 4 bytes: compressed length
146 148 # 4 bytes: base rev
147 149 # 4 bytes: link rev
148 150 # 20 bytes: parent 1 nodeid
149 151 # 20 bytes: parent 2 nodeid
150 152 # 20 bytes: nodeid
151 153 indexformatv0 = ">4l20s20s20s"
152 154
153 155 class revlogoldio(object):
154 156 def __init__(self):
155 157 self.size = struct.calcsize(indexformatv0)
156 158
157 159 def parseindex(self, data, inline):
158 160 s = self.size
159 161 index = []
160 162 nodemap = {nullid: nullrev}
161 163 n = off = 0
162 164 l = len(data)
163 165 while off + s <= l:
164 166 cur = data[off:off + s]
165 167 off += s
166 168 e = _unpack(indexformatv0, cur)
167 169 # transform to revlogv1 format
168 170 e2 = (offset_type(e[0], 0), e[1], -1, e[2], e[3],
169 171 nodemap.get(e[4], nullrev), nodemap.get(e[5], nullrev), e[6])
170 172 index.append(e2)
171 173 nodemap[e[6]] = n
172 174 n += 1
173 175
174 176 # add the magic null revision at -1
175 177 index.append((0, 0, 0, -1, -1, -1, -1, nullid))
176 178
177 179 return index, nodemap, None
178 180
179 181 def packentry(self, entry, node, version, rev):
180 182 if gettype(entry[0]):
181 183 raise RevlogError(_("index entry flags need RevlogNG"))
182 184 e2 = (getoffset(entry[0]), entry[1], entry[3], entry[4],
183 185 node(entry[5]), node(entry[6]), entry[7])
184 186 return _pack(indexformatv0, *e2)
185 187
186 188 # index ng:
187 189 # 6 bytes: offset
188 190 # 2 bytes: flags
189 191 # 4 bytes: compressed length
190 192 # 4 bytes: uncompressed length
191 193 # 4 bytes: base rev
192 194 # 4 bytes: link rev
193 195 # 4 bytes: parent 1 rev
194 196 # 4 bytes: parent 2 rev
195 197 # 32 bytes: nodeid
196 198 indexformatng = ">Qiiiiii20s12x"
197 199 versionformat = ">I"
198 200
199 201 # corresponds to uncompressed length of indexformatng (2 gigs, 4-byte
200 202 # signed integer)
201 203 _maxentrysize = 0x7fffffff
202 204
203 205 class revlogio(object):
204 206 def __init__(self):
205 207 self.size = struct.calcsize(indexformatng)
206 208
207 209 def parseindex(self, data, inline):
208 210 # call the C implementation to parse the index data
209 211 index, cache = parsers.parse_index2(data, inline)
210 212 return index, getattr(index, 'nodemap', None), cache
211 213
212 214 def packentry(self, entry, node, version, rev):
213 215 p = _pack(indexformatng, *entry)
214 216 if rev == 0:
215 217 p = _pack(versionformat, version) + p[4:]
216 218 return p
217 219
218 220 class revlog(object):
219 221 """
220 222 the underlying revision storage object
221 223
222 224 A revlog consists of two parts, an index and the revision data.
223 225
224 226 The index is a file with a fixed record size containing
225 227 information on each revision, including its nodeid (hash), the
226 228 nodeids of its parents, the position and offset of its data within
227 229 the data file, and the revision it's based on. Finally, each entry
228 230 contains a linkrev entry that can serve as a pointer to external
229 231 data.
230 232
231 233 The revision data itself is a linear collection of data chunks.
232 234 Each chunk represents a revision and is usually represented as a
233 235 delta against the previous chunk. To bound lookup time, runs of
234 236 deltas are limited to about 2 times the length of the original
235 237 version data. This makes retrieval of a version proportional to
236 238 its size, or O(1) relative to the number of revisions.
237 239
238 240 Both pieces of the revlog are written to in an append-only
239 241 fashion, which means we never need to rewrite a file to insert or
240 242 remove data, and can use some simple techniques to avoid the need
241 243 for locking while reading.
242 244
243 245 If checkambig, indexfile is opened with checkambig=True at
244 246 writing, to avoid file stat ambiguity.
245 247 """
246 248 def __init__(self, opener, indexfile, checkambig=False):
247 249 """
248 250 create a revlog object
249 251
250 252 opener is a function that abstracts the file opening operation
251 253 and can be used to implement COW semantics or the like.
252 254 """
253 255 self.indexfile = indexfile
254 256 self.datafile = indexfile[:-2] + ".d"
255 257 self.opener = opener
256 258 # When True, indexfile is opened with checkambig=True at writing, to
257 259 # avoid file stat ambiguity.
258 260 self._checkambig = checkambig
259 261 # 3-tuple of (node, rev, text) for a raw revision.
260 262 self._cache = None
261 263 # Maps rev to chain base rev.
262 264 self._chainbasecache = util.lrucachedict(100)
263 265 # 2-tuple of (offset, data) of raw data from the revlog at an offset.
264 266 self._chunkcache = (0, '')
265 267 # How much data to read and cache into the raw revlog data cache.
266 268 self._chunkcachesize = 65536
267 269 self._maxchainlen = None
268 270 self._aggressivemergedeltas = False
269 271 self.index = []
270 272 # Mapping of partial identifiers to full nodes.
271 273 self._pcache = {}
272 274 # Mapping of revision integer to full node.
273 275 self._nodecache = {nullid: nullrev}
274 276 self._nodepos = None
275 277 self._compengine = 'zlib'
276 278
277 279 v = REVLOG_DEFAULT_VERSION
278 280 opts = getattr(opener, 'options', None)
279 281 if opts is not None:
280 282 if 'revlogv1' in opts:
281 283 if 'generaldelta' in opts:
282 284 v |= REVLOGGENERALDELTA
283 285 else:
284 286 v = 0
285 287 if 'chunkcachesize' in opts:
286 288 self._chunkcachesize = opts['chunkcachesize']
287 289 if 'maxchainlen' in opts:
288 290 self._maxchainlen = opts['maxchainlen']
289 291 if 'aggressivemergedeltas' in opts:
290 292 self._aggressivemergedeltas = opts['aggressivemergedeltas']
291 293 self._lazydeltabase = bool(opts.get('lazydeltabase', False))
292 294 if 'compengine' in opts:
293 295 self._compengine = opts['compengine']
294 296
295 297 if self._chunkcachesize <= 0:
296 298 raise RevlogError(_('revlog chunk cache size %r is not greater '
297 299 'than 0') % self._chunkcachesize)
298 300 elif self._chunkcachesize & (self._chunkcachesize - 1):
299 301 raise RevlogError(_('revlog chunk cache size %r is not a power '
300 302 'of 2') % self._chunkcachesize)
301 303
302 304 indexdata = ''
303 305 self._initempty = True
304 306 try:
305 307 f = self.opener(self.indexfile)
306 308 indexdata = f.read()
307 309 f.close()
308 310 if len(indexdata) > 0:
309 311 v = struct.unpack(versionformat, indexdata[:4])[0]
310 312 self._initempty = False
311 313 except IOError as inst:
312 314 if inst.errno != errno.ENOENT:
313 315 raise
314 316
315 317 self.version = v
316 318 self._inline = v & REVLOGNGINLINEDATA
317 319 self._generaldelta = v & REVLOGGENERALDELTA
318 320 flags = v & ~0xFFFF
319 321 fmt = v & 0xFFFF
320 322 if fmt == REVLOGV0 and flags:
321 323 raise RevlogError(_("index %s unknown flags %#04x for format v0")
322 324 % (self.indexfile, flags >> 16))
323 325 elif fmt == REVLOGNG and flags & ~REVLOGNG_FLAGS:
324 326 raise RevlogError(_("index %s unknown flags %#04x for revlogng")
325 327 % (self.indexfile, flags >> 16))
326 328 elif fmt > REVLOGNG:
327 329 raise RevlogError(_("index %s unknown format %d")
328 330 % (self.indexfile, fmt))
329 331
330 332 self.storedeltachains = True
331 333
332 334 self._io = revlogio()
333 335 if self.version == REVLOGV0:
334 336 self._io = revlogoldio()
335 337 try:
336 338 d = self._io.parseindex(indexdata, self._inline)
337 339 except (ValueError, IndexError):
338 340 raise RevlogError(_("index %s is corrupted") % (self.indexfile))
339 341 self.index, nodemap, self._chunkcache = d
340 342 if nodemap is not None:
341 343 self.nodemap = self._nodecache = nodemap
342 344 if not self._chunkcache:
343 345 self._chunkclear()
344 346 # revnum -> (chain-length, sum-delta-length)
345 347 self._chaininfocache = {}
346 348 # revlog header -> revlog compressor
347 349 self._decompressors = {}
348 350
349 351 @util.propertycache
350 352 def _compressor(self):
351 353 return util.compengines[self._compengine].revlogcompressor()
352 354
353 355 def tip(self):
354 356 return self.node(len(self.index) - 2)
355 357 def __contains__(self, rev):
356 358 return 0 <= rev < len(self)
357 359 def __len__(self):
358 360 return len(self.index) - 1
359 361 def __iter__(self):
360 362 return iter(xrange(len(self)))
361 363 def revs(self, start=0, stop=None):
362 364 """iterate over all rev in this revlog (from start to stop)"""
363 365 step = 1
364 366 if stop is not None:
365 367 if start > stop:
366 368 step = -1
367 369 stop += step
368 370 else:
369 371 stop = len(self)
370 372 return xrange(start, stop, step)
371 373
372 374 @util.propertycache
373 375 def nodemap(self):
374 376 self.rev(self.node(0))
375 377 return self._nodecache
376 378
377 379 def hasnode(self, node):
378 380 try:
379 381 self.rev(node)
380 382 return True
381 383 except KeyError:
382 384 return False
383 385
384 386 def clearcaches(self):
385 387 self._cache = None
386 388 self._chainbasecache.clear()
387 389 self._chunkcache = (0, '')
388 390 self._pcache = {}
389 391
390 392 try:
391 393 self._nodecache.clearcaches()
392 394 except AttributeError:
393 395 self._nodecache = {nullid: nullrev}
394 396 self._nodepos = None
395 397
396 398 def rev(self, node):
397 399 try:
398 400 return self._nodecache[node]
399 401 except TypeError:
400 402 raise
401 403 except RevlogError:
402 404 # parsers.c radix tree lookup failed
403 405 raise LookupError(node, self.indexfile, _('no node'))
404 406 except KeyError:
405 407 # pure python cache lookup failed
406 408 n = self._nodecache
407 409 i = self.index
408 410 p = self._nodepos
409 411 if p is None:
410 412 p = len(i) - 2
411 413 for r in xrange(p, -1, -1):
412 414 v = i[r][7]
413 415 n[v] = r
414 416 if v == node:
415 417 self._nodepos = r - 1
416 418 return r
417 419 raise LookupError(node, self.indexfile, _('no node'))
418 420
419 421 # Accessors for index entries.
420 422
421 423 # First tuple entry is 8 bytes. First 6 bytes are offset. Last 2 bytes
422 424 # are flags.
423 425 def start(self, rev):
424 426 return int(self.index[rev][0] >> 16)
425 427
426 428 def flags(self, rev):
427 429 return self.index[rev][0] & 0xFFFF
428 430
429 431 def length(self, rev):
430 432 return self.index[rev][1]
431 433
432 434 def rawsize(self, rev):
433 435 """return the length of the uncompressed text for a given revision"""
434 436 l = self.index[rev][2]
435 437 if l >= 0:
436 438 return l
437 439
438 440 t = self.revision(self.node(rev))
439 441 return len(t)
440 442 size = rawsize
441 443
442 444 def chainbase(self, rev):
443 445 base = self._chainbasecache.get(rev)
444 446 if base is not None:
445 447 return base
446 448
447 449 index = self.index
448 450 base = index[rev][3]
449 451 while base != rev:
450 452 rev = base
451 453 base = index[rev][3]
452 454
453 455 self._chainbasecache[rev] = base
454 456 return base
455 457
456 458 def linkrev(self, rev):
457 459 return self.index[rev][4]
458 460
459 461 def parentrevs(self, rev):
460 462 return self.index[rev][5:7]
461 463
462 464 def node(self, rev):
463 465 return self.index[rev][7]
464 466
465 467 # Derived from index values.
466 468
467 469 def end(self, rev):
468 470 return self.start(rev) + self.length(rev)
469 471
470 472 def parents(self, node):
471 473 i = self.index
472 474 d = i[self.rev(node)]
473 475 return i[d[5]][7], i[d[6]][7] # map revisions to nodes inline
474 476
475 477 def chainlen(self, rev):
476 478 return self._chaininfo(rev)[0]
477 479
478 480 def _chaininfo(self, rev):
479 481 chaininfocache = self._chaininfocache
480 482 if rev in chaininfocache:
481 483 return chaininfocache[rev]
482 484 index = self.index
483 485 generaldelta = self._generaldelta
484 486 iterrev = rev
485 487 e = index[iterrev]
486 488 clen = 0
487 489 compresseddeltalen = 0
488 490 while iterrev != e[3]:
489 491 clen += 1
490 492 compresseddeltalen += e[1]
491 493 if generaldelta:
492 494 iterrev = e[3]
493 495 else:
494 496 iterrev -= 1
495 497 if iterrev in chaininfocache:
496 498 t = chaininfocache[iterrev]
497 499 clen += t[0]
498 500 compresseddeltalen += t[1]
499 501 break
500 502 e = index[iterrev]
501 503 else:
502 504 # Add text length of base since decompressing that also takes
503 505 # work. For cache hits the length is already included.
504 506 compresseddeltalen += e[1]
505 507 r = (clen, compresseddeltalen)
506 508 chaininfocache[rev] = r
507 509 return r
508 510
509 511 def _deltachain(self, rev, stoprev=None):
510 512 """Obtain the delta chain for a revision.
511 513
512 514 ``stoprev`` specifies a revision to stop at. If not specified, we
513 515 stop at the base of the chain.
514 516
515 517 Returns a 2-tuple of (chain, stopped) where ``chain`` is a list of
516 518 revs in ascending order and ``stopped`` is a bool indicating whether
517 519 ``stoprev`` was hit.
518 520 """
519 521 chain = []
520 522
521 523 # Alias to prevent attribute lookup in tight loop.
522 524 index = self.index
523 525 generaldelta = self._generaldelta
524 526
525 527 iterrev = rev
526 528 e = index[iterrev]
527 529 while iterrev != e[3] and iterrev != stoprev:
528 530 chain.append(iterrev)
529 531 if generaldelta:
530 532 iterrev = e[3]
531 533 else:
532 534 iterrev -= 1
533 535 e = index[iterrev]
534 536
535 537 if iterrev == stoprev:
536 538 stopped = True
537 539 else:
538 540 chain.append(iterrev)
539 541 stopped = False
540 542
541 543 chain.reverse()
542 544 return chain, stopped
543 545
544 546 def ancestors(self, revs, stoprev=0, inclusive=False):
545 547 """Generate the ancestors of 'revs' in reverse topological order.
546 548 Does not generate revs lower than stoprev.
547 549
548 550 See the documentation for ancestor.lazyancestors for more details."""
549 551
550 552 return ancestor.lazyancestors(self.parentrevs, revs, stoprev=stoprev,
551 553 inclusive=inclusive)
552 554
553 555 def descendants(self, revs):
554 556 """Generate the descendants of 'revs' in revision order.
555 557
556 558 Yield a sequence of revision numbers starting with a child of
557 559 some rev in revs, i.e., each revision is *not* considered a
558 560 descendant of itself. Results are ordered by revision number (a
559 561 topological sort)."""
560 562 first = min(revs)
561 563 if first == nullrev:
562 564 for i in self:
563 565 yield i
564 566 return
565 567
566 568 seen = set(revs)
567 569 for i in self.revs(start=first + 1):
568 570 for x in self.parentrevs(i):
569 571 if x != nullrev and x in seen:
570 572 seen.add(i)
571 573 yield i
572 574 break
573 575
574 576 def findcommonmissing(self, common=None, heads=None):
575 577 """Return a tuple of the ancestors of common and the ancestors of heads
576 578 that are not ancestors of common. In revset terminology, we return the
577 579 tuple:
578 580
579 581 ::common, (::heads) - (::common)
580 582
581 583 The list is sorted by revision number, meaning it is
582 584 topologically sorted.
583 585
584 586 'heads' and 'common' are both lists of node IDs. If heads is
585 587 not supplied, uses all of the revlog's heads. If common is not
586 588 supplied, uses nullid."""
587 589 if common is None:
588 590 common = [nullid]
589 591 if heads is None:
590 592 heads = self.heads()
591 593
592 594 common = [self.rev(n) for n in common]
593 595 heads = [self.rev(n) for n in heads]
594 596
595 597 # we want the ancestors, but inclusive
596 598 class lazyset(object):
597 599 def __init__(self, lazyvalues):
598 600 self.addedvalues = set()
599 601 self.lazyvalues = lazyvalues
600 602
601 603 def __contains__(self, value):
602 604 return value in self.addedvalues or value in self.lazyvalues
603 605
604 606 def __iter__(self):
605 607 added = self.addedvalues
606 608 for r in added:
607 609 yield r
608 610 for r in self.lazyvalues:
609 611 if not r in added:
610 612 yield r
611 613
612 614 def add(self, value):
613 615 self.addedvalues.add(value)
614 616
615 617 def update(self, values):
616 618 self.addedvalues.update(values)
617 619
618 620 has = lazyset(self.ancestors(common))
619 621 has.add(nullrev)
620 622 has.update(common)
621 623
622 624 # take all ancestors from heads that aren't in has
623 625 missing = set()
624 626 visit = collections.deque(r for r in heads if r not in has)
625 627 while visit:
626 628 r = visit.popleft()
627 629 if r in missing:
628 630 continue
629 631 else:
630 632 missing.add(r)
631 633 for p in self.parentrevs(r):
632 634 if p not in has:
633 635 visit.append(p)
634 636 missing = list(missing)
635 637 missing.sort()
636 638 return has, [self.node(miss) for miss in missing]
637 639
638 640 def incrementalmissingrevs(self, common=None):
639 641 """Return an object that can be used to incrementally compute the
640 642 revision numbers of the ancestors of arbitrary sets that are not
641 643 ancestors of common. This is an ancestor.incrementalmissingancestors
642 644 object.
643 645
644 646 'common' is a list of revision numbers. If common is not supplied, uses
645 647 nullrev.
646 648 """
647 649 if common is None:
648 650 common = [nullrev]
649 651
650 652 return ancestor.incrementalmissingancestors(self.parentrevs, common)
651 653
652 654 def findmissingrevs(self, common=None, heads=None):
653 655 """Return the revision numbers of the ancestors of heads that
654 656 are not ancestors of common.
655 657
656 658 More specifically, return a list of revision numbers corresponding to
657 659 nodes N such that every N satisfies the following constraints:
658 660
659 661 1. N is an ancestor of some node in 'heads'
660 662 2. N is not an ancestor of any node in 'common'
661 663
662 664 The list is sorted by revision number, meaning it is
663 665 topologically sorted.
664 666
665 667 'heads' and 'common' are both lists of revision numbers. If heads is
666 668 not supplied, uses all of the revlog's heads. If common is not
667 669 supplied, uses nullid."""
668 670 if common is None:
669 671 common = [nullrev]
670 672 if heads is None:
671 673 heads = self.headrevs()
672 674
673 675 inc = self.incrementalmissingrevs(common=common)
674 676 return inc.missingancestors(heads)
675 677
676 678 def findmissing(self, common=None, heads=None):
677 679 """Return the ancestors of heads that are not ancestors of common.
678 680
679 681 More specifically, return a list of nodes N such that every N
680 682 satisfies the following constraints:
681 683
682 684 1. N is an ancestor of some node in 'heads'
683 685 2. N is not an ancestor of any node in 'common'
684 686
685 687 The list is sorted by revision number, meaning it is
686 688 topologically sorted.
687 689
688 690 'heads' and 'common' are both lists of node IDs. If heads is
689 691 not supplied, uses all of the revlog's heads. If common is not
690 692 supplied, uses nullid."""
691 693 if common is None:
692 694 common = [nullid]
693 695 if heads is None:
694 696 heads = self.heads()
695 697
696 698 common = [self.rev(n) for n in common]
697 699 heads = [self.rev(n) for n in heads]
698 700
699 701 inc = self.incrementalmissingrevs(common=common)
700 702 return [self.node(r) for r in inc.missingancestors(heads)]
701 703
702 704 def nodesbetween(self, roots=None, heads=None):
703 705 """Return a topological path from 'roots' to 'heads'.
704 706
705 707 Return a tuple (nodes, outroots, outheads) where 'nodes' is a
706 708 topologically sorted list of all nodes N that satisfy both of
707 709 these constraints:
708 710
709 711 1. N is a descendant of some node in 'roots'
710 712 2. N is an ancestor of some node in 'heads'
711 713
712 714 Every node is considered to be both a descendant and an ancestor
713 715 of itself, so every reachable node in 'roots' and 'heads' will be
714 716 included in 'nodes'.
715 717
716 718 'outroots' is the list of reachable nodes in 'roots', i.e., the
717 719 subset of 'roots' that is returned in 'nodes'. Likewise,
718 720 'outheads' is the subset of 'heads' that is also in 'nodes'.
719 721
720 722 'roots' and 'heads' are both lists of node IDs. If 'roots' is
721 723 unspecified, uses nullid as the only root. If 'heads' is
722 724 unspecified, uses list of all of the revlog's heads."""
723 725 nonodes = ([], [], [])
724 726 if roots is not None:
725 727 roots = list(roots)
726 728 if not roots:
727 729 return nonodes
728 730 lowestrev = min([self.rev(n) for n in roots])
729 731 else:
730 732 roots = [nullid] # Everybody's a descendant of nullid
731 733 lowestrev = nullrev
732 734 if (lowestrev == nullrev) and (heads is None):
733 735 # We want _all_ the nodes!
734 736 return ([self.node(r) for r in self], [nullid], list(self.heads()))
735 737 if heads is None:
736 738 # All nodes are ancestors, so the latest ancestor is the last
737 739 # node.
738 740 highestrev = len(self) - 1
739 741 # Set ancestors to None to signal that every node is an ancestor.
740 742 ancestors = None
741 743 # Set heads to an empty dictionary for later discovery of heads
742 744 heads = {}
743 745 else:
744 746 heads = list(heads)
745 747 if not heads:
746 748 return nonodes
747 749 ancestors = set()
748 750 # Turn heads into a dictionary so we can remove 'fake' heads.
749 751 # Also, later we will be using it to filter out the heads we can't
750 752 # find from roots.
751 753 heads = dict.fromkeys(heads, False)
752 754 # Start at the top and keep marking parents until we're done.
753 755 nodestotag = set(heads)
754 756 # Remember where the top was so we can use it as a limit later.
755 757 highestrev = max([self.rev(n) for n in nodestotag])
756 758 while nodestotag:
757 759 # grab a node to tag
758 760 n = nodestotag.pop()
759 761 # Never tag nullid
760 762 if n == nullid:
761 763 continue
762 764 # A node's revision number represents its place in a
763 765 # topologically sorted list of nodes.
764 766 r = self.rev(n)
765 767 if r >= lowestrev:
766 768 if n not in ancestors:
767 769 # If we are possibly a descendant of one of the roots
768 770 # and we haven't already been marked as an ancestor
769 771 ancestors.add(n) # Mark as ancestor
770 772 # Add non-nullid parents to list of nodes to tag.
771 773 nodestotag.update([p for p in self.parents(n) if
772 774 p != nullid])
773 775 elif n in heads: # We've seen it before, is it a fake head?
774 776 # So it is, real heads should not be the ancestors of
775 777 # any other heads.
776 778 heads.pop(n)
777 779 if not ancestors:
778 780 return nonodes
779 781 # Now that we have our set of ancestors, we want to remove any
780 782 # roots that are not ancestors.
781 783
782 784 # If one of the roots was nullid, everything is included anyway.
783 785 if lowestrev > nullrev:
784 786 # But, since we weren't, let's recompute the lowest rev to not
785 787 # include roots that aren't ancestors.
786 788
787 789 # Filter out roots that aren't ancestors of heads
788 790 roots = [root for root in roots if root in ancestors]
789 791 # Recompute the lowest revision
790 792 if roots:
791 793 lowestrev = min([self.rev(root) for root in roots])
792 794 else:
793 795 # No more roots? Return empty list
794 796 return nonodes
795 797 else:
796 798 # We are descending from nullid, and don't need to care about
797 799 # any other roots.
798 800 lowestrev = nullrev
799 801 roots = [nullid]
800 802 # Transform our roots list into a set.
801 803 descendants = set(roots)
802 804 # Also, keep the original roots so we can filter out roots that aren't
803 805 # 'real' roots (i.e. are descended from other roots).
804 806 roots = descendants.copy()
805 807 # Our topologically sorted list of output nodes.
806 808 orderedout = []
807 809 # Don't start at nullid since we don't want nullid in our output list,
808 810 # and if nullid shows up in descendants, empty parents will look like
809 811 # they're descendants.
810 812 for r in self.revs(start=max(lowestrev, 0), stop=highestrev + 1):
811 813 n = self.node(r)
812 814 isdescendant = False
813 815 if lowestrev == nullrev: # Everybody is a descendant of nullid
814 816 isdescendant = True
815 817 elif n in descendants:
816 818 # n is already a descendant
817 819 isdescendant = True
818 820 # This check only needs to be done here because all the roots
819 821 # will start being marked as descendants before the loop.
820 822 if n in roots:
821 823 # If n was a root, check if it's a 'real' root.
822 824 p = tuple(self.parents(n))
823 825 # If any of its parents are descendants, it's not a root.
824 826 if (p[0] in descendants) or (p[1] in descendants):
825 827 roots.remove(n)
826 828 else:
827 829 p = tuple(self.parents(n))
828 830 # A node is a descendant if either of its parents are
829 831 # descendants. (We seeded the descendants set with the roots
830 832 # up there, remember?)
831 833 if (p[0] in descendants) or (p[1] in descendants):
832 834 descendants.add(n)
833 835 isdescendant = True
834 836 if isdescendant and ((ancestors is None) or (n in ancestors)):
835 837 # Only include nodes that are both descendants and ancestors.
836 838 orderedout.append(n)
837 839 if (ancestors is not None) and (n in heads):
838 840 # We're trying to figure out which heads are reachable
839 841 # from roots.
840 842 # Mark this head as having been reached
841 843 heads[n] = True
842 844 elif ancestors is None:
843 845 # Otherwise, we're trying to discover the heads.
844 846 # Assume this is a head because if it isn't, the next step
845 847 # will eventually remove it.
846 848 heads[n] = True
847 849 # But, obviously its parents aren't.
848 850 for p in self.parents(n):
849 851 heads.pop(p, None)
850 852 heads = [head for head, flag in heads.iteritems() if flag]
851 853 roots = list(roots)
852 854 assert orderedout
853 855 assert roots
854 856 assert heads
855 857 return (orderedout, roots, heads)
856 858
857 859 def headrevs(self):
858 860 try:
859 861 return self.index.headrevs()
860 862 except AttributeError:
861 863 return self._headrevs()
862 864
863 865 def computephases(self, roots):
864 866 return self.index.computephasesmapsets(roots)
865 867
866 868 def _headrevs(self):
867 869 count = len(self)
868 870 if not count:
869 871 return [nullrev]
870 872 # we won't iter over filtered rev so nobody is a head at start
871 873 ishead = [0] * (count + 1)
872 874 index = self.index
873 875 for r in self:
874 876 ishead[r] = 1 # I may be a head
875 877 e = index[r]
876 878 ishead[e[5]] = ishead[e[6]] = 0 # my parents are not
877 879 return [r for r, val in enumerate(ishead) if val]
878 880
879 881 def heads(self, start=None, stop=None):
880 882 """return the list of all nodes that have no children
881 883
882 884 if start is specified, only heads that are descendants of
883 885 start will be returned
884 886 if stop is specified, it will consider all the revs from stop
885 887 as if they had no children
886 888 """
887 889 if start is None and stop is None:
888 890 if not len(self):
889 891 return [nullid]
890 892 return [self.node(r) for r in self.headrevs()]
891 893
892 894 if start is None:
893 895 start = nullid
894 896 if stop is None:
895 897 stop = []
896 898 stoprevs = set([self.rev(n) for n in stop])
897 899 startrev = self.rev(start)
898 900 reachable = set((startrev,))
899 901 heads = set((startrev,))
900 902
901 903 parentrevs = self.parentrevs
902 904 for r in self.revs(start=startrev + 1):
903 905 for p in parentrevs(r):
904 906 if p in reachable:
905 907 if r not in stoprevs:
906 908 reachable.add(r)
907 909 heads.add(r)
908 910 if p in heads and p not in stoprevs:
909 911 heads.remove(p)
910 912
911 913 return [self.node(r) for r in heads]
912 914
913 915 def children(self, node):
914 916 """find the children of a given node"""
915 917 c = []
916 918 p = self.rev(node)
917 919 for r in self.revs(start=p + 1):
918 920 prevs = [pr for pr in self.parentrevs(r) if pr != nullrev]
919 921 if prevs:
920 922 for pr in prevs:
921 923 if pr == p:
922 924 c.append(self.node(r))
923 925 elif p == nullrev:
924 926 c.append(self.node(r))
925 927 return c
926 928
927 929 def descendant(self, start, end):
928 930 if start == nullrev:
929 931 return True
930 932 for i in self.descendants([start]):
931 933 if i == end:
932 934 return True
933 935 elif i > end:
934 936 break
935 937 return False
936 938
937 939 def commonancestorsheads(self, a, b):
938 940 """calculate all the heads of the common ancestors of nodes a and b"""
939 941 a, b = self.rev(a), self.rev(b)
940 942 try:
941 943 ancs = self.index.commonancestorsheads(a, b)
942 944 except (AttributeError, OverflowError): # C implementation failed
943 945 ancs = ancestor.commonancestorsheads(self.parentrevs, a, b)
944 946 return map(self.node, ancs)
945 947
946 948 def isancestor(self, a, b):
947 949 """return True if node a is an ancestor of node b
948 950
949 951 The implementation of this is trivial but the use of
950 952 commonancestorsheads is not."""
951 953 return a in self.commonancestorsheads(a, b)
952 954
953 955 def ancestor(self, a, b):
954 956 """calculate the "best" common ancestor of nodes a and b"""
955 957
956 958 a, b = self.rev(a), self.rev(b)
957 959 try:
958 960 ancs = self.index.ancestors(a, b)
959 961 except (AttributeError, OverflowError):
960 962 ancs = ancestor.ancestors(self.parentrevs, a, b)
961 963 if ancs:
962 964 # choose a consistent winner when there's a tie
963 965 return min(map(self.node, ancs))
964 966 return nullid
965 967
966 968 def _match(self, id):
967 969 if isinstance(id, int):
968 970 # rev
969 971 return self.node(id)
970 972 if len(id) == 20:
971 973 # possibly a binary node
972 974 # odds of a binary node being all hex in ASCII are 1 in 10**25
973 975 try:
974 976 node = id
975 977 self.rev(node) # quick search the index
976 978 return node
977 979 except LookupError:
978 980 pass # may be partial hex id
979 981 try:
980 982 # str(rev)
981 983 rev = int(id)
982 984 if str(rev) != id:
983 985 raise ValueError
984 986 if rev < 0:
985 987 rev = len(self) + rev
986 988 if rev < 0 or rev >= len(self):
987 989 raise ValueError
988 990 return self.node(rev)
989 991 except (ValueError, OverflowError):
990 992 pass
991 993 if len(id) == 40:
992 994 try:
993 995 # a full hex nodeid?
994 996 node = bin(id)
995 997 self.rev(node)
996 998 return node
997 999 except (TypeError, LookupError):
998 1000 pass
999 1001
1000 1002 def _partialmatch(self, id):
1001 1003 try:
1002 1004 partial = self.index.partialmatch(id)
1003 1005 if partial and self.hasnode(partial):
1004 1006 return partial
1005 1007 return None
1006 1008 except RevlogError:
1007 1009 # parsers.c radix tree lookup gave multiple matches
1008 1010 # fast path: for unfiltered changelog, radix tree is accurate
1009 1011 if not getattr(self, 'filteredrevs', None):
1010 1012 raise LookupError(id, self.indexfile,
1011 1013 _('ambiguous identifier'))
1012 1014 # fall through to slow path that filters hidden revisions
1013 1015 except (AttributeError, ValueError):
1014 1016 # we are pure python, or key was too short to search radix tree
1015 1017 pass
1016 1018
1017 1019 if id in self._pcache:
1018 1020 return self._pcache[id]
1019 1021
1020 1022 if len(id) < 40:
1021 1023 try:
1022 1024 # hex(node)[:...]
1023 1025 l = len(id) // 2 # grab an even number of digits
1024 1026 prefix = bin(id[:l * 2])
1025 1027 nl = [e[7] for e in self.index if e[7].startswith(prefix)]
1026 1028 nl = [n for n in nl if hex(n).startswith(id) and
1027 1029 self.hasnode(n)]
1028 1030 if len(nl) > 0:
1029 1031 if len(nl) == 1:
1030 1032 self._pcache[id] = nl[0]
1031 1033 return nl[0]
1032 1034 raise LookupError(id, self.indexfile,
1033 1035 _('ambiguous identifier'))
1034 1036 return None
1035 1037 except TypeError:
1036 1038 pass
1037 1039
1038 1040 def lookup(self, id):
1039 1041 """locate a node based on:
1040 1042 - revision number or str(revision number)
1041 1043 - nodeid or subset of hex nodeid
1042 1044 """
1043 1045 n = self._match(id)
1044 1046 if n is not None:
1045 1047 return n
1046 1048 n = self._partialmatch(id)
1047 1049 if n:
1048 1050 return n
1049 1051
1050 1052 raise LookupError(id, self.indexfile, _('no match found'))
1051 1053
1052 1054 def cmp(self, node, text):
1053 1055 """compare text with a given file revision
1054 1056
1055 1057 returns True if text is different than what is stored.
1056 1058 """
1057 1059 p1, p2 = self.parents(node)
1058 1060 return hash(text, p1, p2) != node
1059 1061
1060 1062 def _addchunk(self, offset, data):
1061 1063 """Add a segment to the revlog cache.
1062 1064
1063 1065 Accepts an absolute offset and the data that is at that location.
1064 1066 """
1065 1067 o, d = self._chunkcache
1066 1068 # try to add to existing cache
1067 1069 if o + len(d) == offset and len(d) + len(data) < _chunksize:
1068 1070 self._chunkcache = o, d + data
1069 1071 else:
1070 1072 self._chunkcache = offset, data
1071 1073
1072 1074 def _loadchunk(self, offset, length, df=None):
1073 1075 """Load a segment of raw data from the revlog.
1074 1076
1075 1077 Accepts an absolute offset, length to read, and an optional existing
1076 1078 file handle to read from.
1077 1079
1078 1080 If an existing file handle is passed, it will be seeked and the
1079 1081 original seek position will NOT be restored.
1080 1082
1081 1083 Returns a str or buffer of raw byte data.
1082 1084 """
1083 1085 if df is not None:
1084 1086 closehandle = False
1085 1087 else:
1086 1088 if self._inline:
1087 1089 df = self.opener(self.indexfile)
1088 1090 else:
1089 1091 df = self.opener(self.datafile)
1090 1092 closehandle = True
1091 1093
1092 1094 # Cache data both forward and backward around the requested
1093 1095 # data, in a fixed size window. This helps speed up operations
1094 1096 # involving reading the revlog backwards.
1095 1097 cachesize = self._chunkcachesize
1096 1098 realoffset = offset & ~(cachesize - 1)
1097 1099 reallength = (((offset + length + cachesize) & ~(cachesize - 1))
1098 1100 - realoffset)
1099 1101 df.seek(realoffset)
1100 1102 d = df.read(reallength)
1101 1103 if closehandle:
1102 1104 df.close()
1103 1105 self._addchunk(realoffset, d)
1104 1106 if offset != realoffset or reallength != length:
1105 1107 return util.buffer(d, offset - realoffset, length)
1106 1108 return d
1107 1109
1108 1110 def _getchunk(self, offset, length, df=None):
1109 1111 """Obtain a segment of raw data from the revlog.
1110 1112
1111 1113 Accepts an absolute offset, length of bytes to obtain, and an
1112 1114 optional file handle to the already-opened revlog. If the file
1113 1115 handle is used, its original seek position will not be preserved.
1114 1116
1115 1117 Requests for data may be returned from a cache.
1116 1118
1117 1119 Returns a str or a buffer instance of raw byte data.
1118 1120 """
1119 1121 o, d = self._chunkcache
1120 1122 l = len(d)
1121 1123
1122 1124 # is it in the cache?
1123 1125 cachestart = offset - o
1124 1126 cacheend = cachestart + length
1125 1127 if cachestart >= 0 and cacheend <= l:
1126 1128 if cachestart == 0 and cacheend == l:
1127 1129 return d # avoid a copy
1128 1130 return util.buffer(d, cachestart, cacheend - cachestart)
1129 1131
1130 1132 return self._loadchunk(offset, length, df=df)
1131 1133
1132 1134 def _chunkraw(self, startrev, endrev, df=None):
1133 1135 """Obtain a segment of raw data corresponding to a range of revisions.
1134 1136
1135 1137 Accepts the start and end revisions and an optional already-open
1136 1138 file handle to be used for reading. If the file handle is read, its
1137 1139 seek position will not be preserved.
1138 1140
1139 1141 Requests for data may be satisfied by a cache.
1140 1142
1141 1143 Returns a 2-tuple of (offset, data) for the requested range of
1142 1144 revisions. Offset is the integer offset from the beginning of the
1143 1145 revlog and data is a str or buffer of the raw byte data.
1144 1146
1145 1147 Callers will need to call ``self.start(rev)`` and ``self.length(rev)``
1146 1148 to determine where each revision's data begins and ends.
1147 1149 """
1148 1150 # Inlined self.start(startrev) & self.end(endrev) for perf reasons
1149 1151 # (functions are expensive).
1150 1152 index = self.index
1151 1153 istart = index[startrev]
1152 1154 start = int(istart[0] >> 16)
1153 1155 if startrev == endrev:
1154 1156 end = start + istart[1]
1155 1157 else:
1156 1158 iend = index[endrev]
1157 1159 end = int(iend[0] >> 16) + iend[1]
1158 1160
1159 1161 if self._inline:
1160 1162 start += (startrev + 1) * self._io.size
1161 1163 end += (endrev + 1) * self._io.size
1162 1164 length = end - start
1163 1165
1164 1166 return start, self._getchunk(start, length, df=df)
1165 1167
1166 1168 def _chunk(self, rev, df=None):
1167 1169 """Obtain a single decompressed chunk for a revision.
1168 1170
1169 1171 Accepts an integer revision and an optional already-open file handle
1170 1172 to be used for reading. If used, the seek position of the file will not
1171 1173 be preserved.
1172 1174
1173 1175 Returns a str holding uncompressed data for the requested revision.
1174 1176 """
1175 1177 return self.decompress(self._chunkraw(rev, rev, df=df)[1])
1176 1178
1177 1179 def _chunks(self, revs, df=None):
1178 1180 """Obtain decompressed chunks for the specified revisions.
1179 1181
1180 1182 Accepts an iterable of numeric revisions that are assumed to be in
1181 1183 ascending order. Also accepts an optional already-open file handle
1182 1184 to be used for reading. If used, the seek position of the file will
1183 1185 not be preserved.
1184 1186
1185 1187 This function is similar to calling ``self._chunk()`` multiple times,
1186 1188 but is faster.
1187 1189
1188 1190 Returns a list with decompressed data for each requested revision.
1189 1191 """
1190 1192 if not revs:
1191 1193 return []
1192 1194 start = self.start
1193 1195 length = self.length
1194 1196 inline = self._inline
1195 1197 iosize = self._io.size
1196 1198 buffer = util.buffer
1197 1199
1198 1200 l = []
1199 1201 ladd = l.append
1200 1202
1201 1203 try:
1202 1204 offset, data = self._chunkraw(revs[0], revs[-1], df=df)
1203 1205 except OverflowError:
1204 1206 # issue4215 - we can't cache a run of chunks greater than
1205 1207 # 2G on Windows
1206 1208 return [self._chunk(rev, df=df) for rev in revs]
1207 1209
1208 1210 decomp = self.decompress
1209 1211 for rev in revs:
1210 1212 chunkstart = start(rev)
1211 1213 if inline:
1212 1214 chunkstart += (rev + 1) * iosize
1213 1215 chunklength = length(rev)
1214 1216 ladd(decomp(buffer(data, chunkstart - offset, chunklength)))
1215 1217
1216 1218 return l
1217 1219
1218 1220 def _chunkclear(self):
1219 1221 """Clear the raw chunk cache."""
1220 1222 self._chunkcache = (0, '')
1221 1223
1222 1224 def deltaparent(self, rev):
1223 1225 """return deltaparent of the given revision"""
1224 1226 base = self.index[rev][3]
1225 1227 if base == rev:
1226 1228 return nullrev
1227 1229 elif self._generaldelta:
1228 1230 return base
1229 1231 else:
1230 1232 return rev - 1
1231 1233
1232 1234 def revdiff(self, rev1, rev2):
1233 1235 """return or calculate a delta between two revisions"""
1234 1236 if rev1 != nullrev and self.deltaparent(rev2) == rev1:
1235 1237 return str(self._chunk(rev2))
1236 1238
1237 1239 return mdiff.textdiff(self.revision(rev1),
1238 1240 self.revision(rev2))
1239 1241
1240 1242 def revision(self, nodeorrev, _df=None, raw=False):
1241 1243 """return an uncompressed revision of a given node or revision
1242 1244 number.
1243 1245
1244 1246 _df - an existing file handle to read from. (internal-only)
1245 1247 raw - an optional argument specifying if the revision data is to be
1246 1248 treated as raw data when applying flag transforms. 'raw' should be set
1247 1249 to True when generating changegroups or in debug commands.
1248 1250 """
1249 1251 if isinstance(nodeorrev, int):
1250 1252 rev = nodeorrev
1251 1253 node = self.node(rev)
1252 1254 else:
1253 1255 node = nodeorrev
1254 1256 rev = None
1255 1257
1256 1258 cachedrev = None
1257 1259 if node == nullid:
1258 1260 return ""
1259 1261 if self._cache:
1260 1262 if self._cache[0] == node:
1261 1263 return self._cache[2]
1262 1264 cachedrev = self._cache[1]
1263 1265
1264 1266 # look up what we need to read
1265 1267 text = None
1266 1268 if rev is None:
1267 1269 rev = self.rev(node)
1268 1270
1269 1271 chain, stopped = self._deltachain(rev, stoprev=cachedrev)
1270 1272 if stopped:
1271 1273 text = self._cache[2]
1272 1274
1273 1275 # drop cache to save memory
1274 1276 self._cache = None
1275 1277
1276 1278 bins = self._chunks(chain, df=_df)
1277 1279 if text is None:
1278 1280 text = str(bins[0])
1279 1281 bins = bins[1:]
1280 1282
1281 1283 text = mdiff.patches(text, bins)
1282 1284
1283 1285 text, validatehash = self._processflags(text, self.flags(rev), 'read',
1284 1286 raw=raw)
1285 1287 if validatehash:
1286 1288 self.checkhash(text, node, rev=rev)
1287 1289
1288 1290 self._cache = (node, rev, text)
1289 1291 return text
1290 1292
1291 1293 def hash(self, text, p1, p2):
1292 1294 """Compute a node hash.
1293 1295
1294 1296 Available as a function so that subclasses can replace the hash
1295 1297 as needed.
1296 1298 """
1297 1299 return hash(text, p1, p2)
1298 1300
1299 1301 def _processflags(self, text, flags, operation, raw=False):
1300 1302 """Inspect revision data flags and applies transforms defined by
1301 1303 registered flag processors.
1302 1304
1303 1305 ``text`` - the revision data to process
1304 1306 ``flags`` - the revision flags
1305 1307 ``operation`` - the operation being performed (read or write)
1306 1308 ``raw`` - an optional argument describing if the raw transform should be
1307 1309 applied.
1308 1310
1309 1311 This method processes the flags in the order (or reverse order if
1310 1312 ``operation`` is 'write') defined by REVIDX_FLAGS_ORDER, applying the
1311 1313 flag processors registered for present flags. The order of flags defined
1312 1314 in REVIDX_FLAGS_ORDER needs to be stable to allow non-commutativity.
1313 1315
1314 1316 Returns a 2-tuple of ``(text, validatehash)`` where ``text`` is the
1315 1317 processed text and ``validatehash`` is a bool indicating whether the
1316 1318 returned text should be checked for hash integrity.
1317 1319
1318 1320 Note: If the ``raw`` argument is set, it has precedence over the
1319 1321 operation and will only update the value of ``validatehash``.
1320 1322 """
1321 1323 if not operation in ('read', 'write'):
1322 1324 raise ProgrammingError(_("invalid '%s' operation ") % (operation))
1323 1325 # Check all flags are known.
1324 1326 if flags & ~REVIDX_KNOWN_FLAGS:
1325 1327 raise RevlogError(_("incompatible revision flag '%#x'") %
1326 1328 (flags & ~REVIDX_KNOWN_FLAGS))
1327 1329 validatehash = True
1328 1330 # Depending on the operation (read or write), the order might be
1329 1331 # reversed due to non-commutative transforms.
1330 1332 orderedflags = REVIDX_FLAGS_ORDER
1331 1333 if operation == 'write':
1332 1334 orderedflags = reversed(orderedflags)
1333 1335
1334 1336 for flag in orderedflags:
1335 1337 # If a flagprocessor has been registered for a known flag, apply the
1336 1338 # related operation transform and update result tuple.
1337 1339 if flag & flags:
1338 1340 vhash = True
1339 1341
1340 1342 if flag not in _flagprocessors:
1341 1343 message = _("missing processor for flag '%#x'") % (flag)
1342 1344 raise RevlogError(message)
1343 1345
1344 1346 processor = _flagprocessors[flag]
1345 1347 if processor is not None:
1346 1348 readtransform, writetransform, rawtransform = processor
1347 1349
1348 1350 if raw:
1349 1351 vhash = rawtransform(self, text)
1350 1352 elif operation == 'read':
1351 1353 text, vhash = readtransform(self, text)
1352 1354 else: # write operation
1353 1355 text, vhash = writetransform(self, text)
1354 1356 validatehash = validatehash and vhash
1355 1357
1356 1358 return text, validatehash
1357 1359
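    # A sketch of the processor triple consumed above (hypothetical flag and
    # helper names; base64 merely illustrates a reversible transform). The
    # read and write transforms return (newtext, validatehash); the raw
    # transform returns only the validatehash boolean:
    #
    #   def _b64read(rl, text):
    #       return text.decode('base64'), True
    #   def _b64write(rl, text):
    #       return text.encode('base64'), False
    #   def _b64raw(rl, text):
    #       return False  # stored form never matches hash(text, p1, p2)
    #
    #   _flagprocessors[REVIDX_SOMEFLAG] = (_b64read, _b64write, _b64raw)
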
    def checkhash(self, text, node, p1=None, p2=None, rev=None):
        """Check node hash integrity.

        Available as a function so that subclasses can extend hash mismatch
        behaviors as needed.
        """
        if p1 is None and p2 is None:
            p1, p2 = self.parents(node)
        if node != self.hash(text, p1, p2):
            revornode = rev
            if revornode is None:
                revornode = templatefilters.short(hex(node))
            raise RevlogError(_("integrity check failed on %s:%s")
                              % (self.indexfile, revornode))

    def checkinlinesize(self, tr, fp=None):
        """Check if the revlog is too big for inline and convert if so.

        This should be called after revisions are added to the revlog. If the
        revlog has grown too large to be an inline revlog, it will convert it
        to use multiple index and data files.
        """
        if not self._inline or (self.start(-2) + self.length(-2)) < _maxinline:
            return

        trinfo = tr.find(self.indexfile)
        if trinfo is None:
            raise RevlogError(_("%s not found in the transaction")
                              % self.indexfile)

        trindex = trinfo[2]
        if trindex is not None:
            dataoff = self.start(trindex)
        else:
            # revlog was stripped at start of transaction, use all leftover data
            trindex = len(self) - 1
            dataoff = self.end(-2)

        tr.add(self.datafile, dataoff)

        if fp:
            fp.flush()
            fp.close()

        df = self.opener(self.datafile, 'w')
        try:
            for r in self:
                df.write(self._chunkraw(r, r)[1])
        finally:
            df.close()

        fp = self.opener(self.indexfile, 'w', atomictemp=True,
                         checkambig=self._checkambig)
        self.version &= ~(REVLOGNGINLINEDATA)
        self._inline = False
        for i in self:
            e = self._io.packentry(self.index[i], self.node, self.version, i)
            fp.write(e)

        # if we don't call close, the temp file will never replace the
        # real index
        fp.close()

        tr.replace(self.indexfile, trindex * self._io.size)
        self._chunkclear()

    def addrevision(self, text, transaction, link, p1, p2, cachedelta=None,
                    node=None, flags=REVIDX_DEFAULT_FLAGS):
        """add a revision to the log

        text - the revision data to add
        transaction - the transaction object used for rollback
        link - the linkrev data to add
        p1, p2 - the parent nodeids of the revision
        cachedelta - an optional precomputed delta
        node - nodeid of revision; typically node is not specified, and it is
            computed by default as hash(text, p1, p2), however subclasses might
            use a different hashing method (and override checkhash() in that
            case)
        flags - the known flags to set on the revision
        """
        if link == nullrev:
            raise RevlogError(_("attempted to add linkrev -1 to %s")
                              % self.indexfile)

        if flags:
            node = node or self.hash(text, p1, p2)

        newtext, validatehash = self._processflags(text, flags, 'write')

        # If the flag processor modifies the revision data, ignore any provided
        # cachedelta.
        if newtext != text:
            cachedelta = None
            text = newtext

        if len(text) > _maxentrysize:
            raise RevlogError(
                _("%s: size of %d bytes exceeds maximum revlog storage of 2GiB")
                % (self.indexfile, len(text)))

        node = node or self.hash(text, p1, p2)
        if node in self.nodemap:
            return node

        if validatehash:
            self.checkhash(text, node, p1=p1, p2=p2)

        dfh = None
        if not self._inline:
            dfh = self.opener(self.datafile, "a+")
        ifh = self.opener(self.indexfile, "a+", checkambig=self._checkambig)
        try:
            return self._addrevision(node, text, transaction, link, p1, p2,
                                     flags, cachedelta, ifh, dfh)
        finally:
            if dfh:
                dfh.close()
            ifh.close()

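    # Hypothetical call, e.g. from a filelog-like subclass (``tr`` is an open
    # transaction and ``p1``/``p2`` are existing nodeids; none of these names
    # are defined here):
    #
    #   node = rl.addrevision("new file contents\n", tr, linkrev, p1, p2)
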
    def compress(self, data):
        """Generate a possibly-compressed representation of data."""
        if not data:
            return '', data

        compressed = self._compressor.compress(data)

        if compressed:
            # The revlog compressor added the header in the returned data.
            return '', compressed

        if data[0] == '\0':
            return '', data
        return 'u', data

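    # Illustration of the (header, data) convention returned above, assuming
    # the default zlib compressor (values indicative, not exact):
    #
    #   rl.compress('')            # -> ('', '')         empty passthrough
    #   rl.compress('abc' * 1000)  # -> ('', 'x\x9c...') engine header in data
    #   rl.compress('xyz')         # -> ('u', 'xyz')     incompressible literal
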
    def decompress(self, data):
        """Decompress a revlog chunk.

        The chunk is expected to begin with a header identifying the
        format type so it can be routed to an appropriate decompressor.
        """
        if not data:
            return data

        # Revlogs are read much more frequently than they are written and many
        # chunks only take microseconds to decompress, so performance is
        # important here.
        #
        # We can make a few assumptions about revlogs:
        #
        # 1) the majority of chunks will be compressed (as opposed to inline
        #    raw data).
        # 2) decompressing *any* data will likely be at least 10x slower than
        #    returning raw inline data.
        # 3) we want to prioritize common and officially supported compression
        #    engines
        #
        # It follows that we want to optimize for "decompress compressed data
        # when encoded with common and officially supported compression engines"
        # case over "raw data" and "data encoded by less common or non-official
        # compression engines." That is why we have the inline lookup first
        # followed by the compengines lookup.
        #
        # According to `hg perfrevlogchunks`, this is ~0.5% faster for zlib
        # compressed chunks. And this matters for changelog and manifest reads.
        t = data[0]

        if t == 'x':
            try:
                return _zlibdecompress(data)
            except zlib.error as e:
                raise RevlogError(_('revlog decompress error: %s') % str(e))
        # '\0' is more common than 'u' so it goes first.
        elif t == '\0':
            return data
        elif t == 'u':
            return util.buffer(data, 1)

        try:
            compressor = self._decompressors[t]
        except KeyError:
            try:
                engine = util.compengines.forrevlogheader(t)
                compressor = engine.revlogcompressor()
                self._decompressors[t] = compressor
            except KeyError:
                raise RevlogError(_('unknown compression type %r') % t)

        return compressor.decompress(data)

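    # Round-trip sketch: decompress() consumes exactly what compress()
    # produced once header and data are concatenated on disk, e.g.
    # rl.decompress('u' + 'xyz') yields a buffer over 'xyz', while an
    # 'x'-prefixed chunk is routed to zlib (illustrative, not exhaustive).
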
    def _isgooddelta(self, d, textlen):
        """Returns True if the given delta is good. Good means that it is within
        the disk span, disk size, and chain length bounds that we know to be
        performant."""
        if d is None:
            return False

        # - 'dist' is the distance from the base revision -- bounding it limits
        #   the amount of I/O we need to do.
        # - 'compresseddeltalen' is the sum of the total size of deltas we need
        #   to apply -- bounding it limits the amount of CPU we consume.
        dist, l, data, base, chainbase, chainlen, compresseddeltalen = d
        if (dist > textlen * 4 or l > textlen or
            compresseddeltalen > textlen * 2 or
            (self._maxchainlen and chainlen > self._maxchainlen)):
            return False

        return True

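    # Worked example with illustrative numbers: for textlen = 1000, a delta
    # is rejected when the read span from the chain base exceeds 4000 bytes,
    # the delta itself exceeds 1000 bytes, the summed chain deltas exceed
    # 2000 bytes, or the chain would grow beyond the configured maximum
    # chain length (when one is set).
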
    def _addrevision(self, node, text, transaction, link, p1, p2, flags,
                     cachedelta, ifh, dfh, alwayscache=False, raw=False):
        """internal function to add revisions to the log

        see addrevision for argument descriptions.
        invariants:
        - text is optional (can be None); if not set, cachedelta must be set.
          if both are set, they must correspond to each other.
        - raw is optional; if set to True, it indicates the revision data is to
          be treated by _processflags() as raw. It is usually set by changegroup
          generation and debug commands.
        """
        btext = [text]
        def buildtext():
            if btext[0] is not None:
                return btext[0]
            baserev = cachedelta[0]
            delta = cachedelta[1]
            # special case deltas which replace entire base; no need to decode
            # base revision. this neatly avoids censored bases, which throw when
            # they're decoded.
            hlen = struct.calcsize(">lll")
            if delta[:hlen] == mdiff.replacediffheader(self.rawsize(baserev),
                                                       len(delta) - hlen):
                btext[0] = delta[hlen:]
            else:
                if self._inline:
                    fh = ifh
                else:
                    fh = dfh
                basetext = self.revision(self.node(baserev), _df=fh, raw=raw)
                btext[0] = mdiff.patch(basetext, delta)

            try:
                res = self._processflags(btext[0], flags, 'read', raw=raw)
                btext[0], validatehash = res
                if validatehash:
                    self.checkhash(btext[0], node, p1=p1, p2=p2)
                if flags & REVIDX_ISCENSORED:
                    raise RevlogError(_('node %s is not censored') % node)
            except CensoredNodeError:
                # must pass the censored index flag to add censored revisions
                if not flags & REVIDX_ISCENSORED:
                    raise
            return btext[0]

        def builddelta(rev):
            # can we use the cached delta?
            if cachedelta and cachedelta[0] == rev:
                delta = cachedelta[1]
            else:
                t = buildtext()
                if self.iscensored(rev):
                    # deltas based on a censored revision must replace the
                    # full content in one patch, so delta works everywhere
                    header = mdiff.replacediffheader(self.rawsize(rev), len(t))
                    delta = header + t
                else:
                    if self._inline:
                        fh = ifh
                    else:
                        fh = dfh
                    ptext = self.revision(self.node(rev), _df=fh)
                    delta = mdiff.textdiff(ptext, t)
            header, data = self.compress(delta)
            deltalen = len(header) + len(data)
            chainbase = self.chainbase(rev)
            dist = deltalen + offset - self.start(chainbase)
            if self._generaldelta:
                base = rev
            else:
                base = chainbase
            chainlen, compresseddeltalen = self._chaininfo(rev)
            chainlen += 1
            compresseddeltalen += deltalen
            return (dist, deltalen, (header, data), base,
                    chainbase, chainlen, compresseddeltalen)

        curr = len(self)
        prev = curr - 1
        offset = self.end(prev)
        delta = None
        p1r, p2r = self.rev(p1), self.rev(p2)

        # full versions are inserted when the needed deltas
        # become comparable to the uncompressed text
        if text is None:
            textlen = mdiff.patchedsize(self.rawsize(cachedelta[0]),
                                        cachedelta[1])
        else:
            textlen = len(text)

        # should we try to build a delta?
        if prev != nullrev and self.storedeltachains:
            tested = set()
            # This condition is true most of the time when processing
            # changegroup data into a generaldelta repo. The only time it
            # isn't true is if this is the first revision in a delta chain
            # or if ``format.generaldelta=true`` disabled ``lazydeltabase``.
            if cachedelta and self._generaldelta and self._lazydeltabase:
                # Assume what we received from the server is a good choice;
                # builddelta will reuse the cache
                candidatedelta = builddelta(cachedelta[0])
                tested.add(cachedelta[0])
                if self._isgooddelta(candidatedelta, textlen):
                    delta = candidatedelta
            if delta is None and self._generaldelta:
                # exclude already lazy tested base if any
                parents = [p for p in (p1r, p2r)
                           if p != nullrev and p not in tested]
                if parents and not self._aggressivemergedeltas:
                    # Pick whichever parent is closer to us (to minimize the
                    # chance of having to build a fulltext).
                    parents = [max(parents)]
                tested.update(parents)
                pdeltas = []
                for p in parents:
                    pd = builddelta(p)
                    if self._isgooddelta(pd, textlen):
                        pdeltas.append(pd)
                if pdeltas:
                    delta = min(pdeltas, key=lambda x: x[1])
            if delta is None and prev not in tested:
                # other approaches failed; try against prev to hopefully save
                # us a fulltext.
                candidatedelta = builddelta(prev)
                if self._isgooddelta(candidatedelta, textlen):
                    delta = candidatedelta
        if delta is not None:
            dist, l, data, base, chainbase, chainlen, compresseddeltalen = delta
        else:
            text = buildtext()
            data = self.compress(text)
            l = len(data[1]) + len(data[0])
            base = chainbase = curr

        e = (offset_type(offset, flags), l, textlen,
             base, link, p1r, p2r, node)
        self.index.insert(-1, e)
        self.nodemap[node] = curr

        entry = self._io.packentry(e, self.node, self.version, curr)
        self._writeentry(transaction, ifh, dfh, entry, data, link, offset)

        if alwayscache and text is None:
            text = buildtext()

        if type(text) == str: # only accept immutable objects
            self._cache = (node, curr, text)
        self._chainbasecache[curr] = chainbase
        return node

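    # Delta base selection above, in summary: first the changegroup-provided
    # base (when lazydeltabase applies), then the closer parent (or both
    # parents with aggressivemergedeltas), then the previous revision, and
    # finally a full snapshot when no candidate passes _isgooddelta().
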
    def _writeentry(self, transaction, ifh, dfh, entry, data, link, offset):
        # Files opened in a+ mode have inconsistent behavior on various
        # platforms. Windows requires that a file positioning call be made
        # when the file handle transitions between reads and writes. See
        # 3686fa2b8eee and the mixedfilemodewrapper in windows.py. On other
        # platforms, Python or the platform itself can be buggy. Some versions
        # of Solaris have been observed to not append at the end of the file
        # if the file was seeked to before the end. See issue4943 for more.
        #
        # We work around this issue by inserting a seek() before writing.
        # Note: This is likely not necessary on Python 3.
        ifh.seek(0, os.SEEK_END)
        if dfh:
            dfh.seek(0, os.SEEK_END)

        curr = len(self) - 1
        if not self._inline:
            transaction.add(self.datafile, offset)
            transaction.add(self.indexfile, curr * len(entry))
            if data[0]:
                dfh.write(data[0])
            dfh.write(data[1])
            ifh.write(entry)
        else:
            offset += curr * self._io.size
            transaction.add(self.indexfile, offset, curr)
            ifh.write(entry)
            ifh.write(data[0])
            ifh.write(data[1])
            self.checkinlinesize(transaction, ifh)

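    # Inline offset arithmetic, with illustrative numbers: appending an 11th
    # revision (curr = 10) to an inline revlog already holding 500 bytes of
    # revision data records the write position as 500 + 10 * 64 = 1140 bytes
    # into the index file, since 64-byte index entries and data interleave.
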
    def addgroup(self, cg, linkmapper, transaction, addrevisioncb=None):
        """
        add a delta group

        given a set of deltas, add them to the revision log. the
        first delta is against its parent, which should be in our
        log, the rest are against the previous delta.

        If ``addrevisioncb`` is defined, it will be called with arguments of
        this revlog and the node that was added.
        """

        # track the base of the current delta log
        content = []
        node = None

        r = len(self)
        end = 0
        if r:
            end = self.end(r - 1)
        ifh = self.opener(self.indexfile, "a+", checkambig=self._checkambig)
        isize = r * self._io.size
        if self._inline:
            transaction.add(self.indexfile, end + isize, r)
            dfh = None
        else:
            transaction.add(self.indexfile, isize, r)
            transaction.add(self.datafile, end)
            dfh = self.opener(self.datafile, "a+")
        def flush():
            if dfh:
                dfh.flush()
            ifh.flush()
        try:
            # loop through our set of deltas
            chain = None
            for chunkdata in iter(lambda: cg.deltachunk(chain), {}):
                node = chunkdata['node']
                p1 = chunkdata['p1']
                p2 = chunkdata['p2']
                cs = chunkdata['cs']
                deltabase = chunkdata['deltabase']
                delta = chunkdata['delta']
                flags = chunkdata['flags'] or REVIDX_DEFAULT_FLAGS

                content.append(node)

                link = linkmapper(cs)
                if node in self.nodemap:
                    # this can happen if two branches make the same change
                    chain = node
                    continue

                for p in (p1, p2):
                    if p not in self.nodemap:
                        raise LookupError(p, self.indexfile,
                                          _('unknown parent'))

                if deltabase not in self.nodemap:
                    raise LookupError(deltabase, self.indexfile,
                                      _('unknown delta base'))

                baserev = self.rev(deltabase)

                if baserev != nullrev and self.iscensored(baserev):
                    # if base is censored, delta must be full replacement in a
                    # single patch operation
                    hlen = struct.calcsize(">lll")
                    oldlen = self.rawsize(baserev)
                    newlen = len(delta) - hlen
                    if delta[:hlen] != mdiff.replacediffheader(oldlen, newlen):
                        raise error.CensoredBaseError(self.indexfile,
                                                      self.node(baserev))

                if not flags and self._peek_iscensored(baserev, delta, flush):
                    flags |= REVIDX_ISCENSORED

                # We assume consumers of addrevisioncb will want to retrieve
                # the added revision, which will require a call to
                # revision(). revision() will fast path if there is a cache
                # hit. So, we tell _addrevision() to always cache in this case.
                # We're only using addgroup() in the context of changegroup
                # generation so the revision data can always be handled as raw
                # by the flagprocessor.
                chain = self._addrevision(node, None, transaction, link,
                                          p1, p2, flags, (baserev, delta),
                                          ifh, dfh,
                                          alwayscache=bool(addrevisioncb),
                                          raw=True)

                if addrevisioncb:
                    addrevisioncb(self, chain)

                if not dfh and not self._inline:
                    # addrevision switched from inline to conventional
                    # reopen the index
                    ifh.close()
                    dfh = self.opener(self.datafile, "a+")
                    ifh = self.opener(self.indexfile, "a+",
                                      checkambig=self._checkambig)
        finally:
            if dfh:
                dfh.close()
            ifh.close()

        return content

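    # Caller contract, sketched: ``cg`` yields chunks via deltachunk(),
    # ``linkmapper`` translates a changeset node to this revlog's linkrev,
    # and the returned list holds every node seen, including ones already
    # present. A hypothetical invocation during unbundle:
    #
    #   added = rl.addgroup(cg, changelog.rev, tr)
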
    def iscensored(self, rev):
        """Check if a file revision is censored."""
        return False

    def _peek_iscensored(self, baserev, delta, flush):
        """Quickly check if a delta produces a censored revision."""
        return False

    def getstrippoint(self, minlink):
        """find the minimum rev that must be stripped to strip the linkrev

        Returns a tuple containing the minimum rev and a set of all revs that
        have linkrevs that will be broken by this strip.
        """
        brokenrevs = set()
        strippoint = len(self)

        heads = {}
        futurelargelinkrevs = set()
        for head in self.headrevs():
            headlinkrev = self.linkrev(head)
            heads[head] = headlinkrev
            if headlinkrev >= minlink:
                futurelargelinkrevs.add(headlinkrev)

        # This algorithm involves walking down the rev graph, starting at the
        # heads. Since the revs are topologically sorted according to linkrev,
        # once all head linkrevs are below the minlink, we know there are
        # no more revs that could have a linkrev greater than minlink.
        # So we can stop walking.
        while futurelargelinkrevs:
            strippoint -= 1
            linkrev = heads.pop(strippoint)

            if linkrev < minlink:
                brokenrevs.add(strippoint)
            else:
                futurelargelinkrevs.remove(linkrev)

            for p in self.parentrevs(strippoint):
                if p != nullrev:
                    plinkrev = self.linkrev(p)
                    heads[p] = plinkrev
                    if plinkrev >= minlink:
                        futurelargelinkrevs.add(plinkrev)

        return strippoint, brokenrevs

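    # Worked example: in the common case where linkrev(r) == r for a linear
    # history, getstrippoint(5) returns (5, set()): revs 5 and up must be
    # truncated and no kept rev references a stripped changeset. brokenrevs
    # only fills in when linkrevs are out of order, for revs at or above the
    # strip point whose own linkrev is still below minlink.
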
    def strip(self, minlink, transaction):
        """truncate the revlog on the first revision with a linkrev >= minlink

        This function is called when we're stripping revision minlink and
        its descendants from the repository.

        We have to remove all revisions with linkrev >= minlink, because
        the equivalent changelog revisions will be renumbered after the
        strip.

        So we truncate the revlog on the first of these revisions, and
        trust that the caller has saved the revisions that shouldn't be
        removed and that it'll re-add them after this truncation.
        """
        if len(self) == 0:
            return

        rev, _ = self.getstrippoint(minlink)
        if rev == len(self):
            return

        # first truncate the files on disk
        end = self.start(rev)
        if not self._inline:
            transaction.add(self.datafile, end)
            end = rev * self._io.size
        else:
            end += rev * self._io.size

        transaction.add(self.indexfile, end)

        # then reset internal state in memory to forget those revisions
        self._cache = None
        self._chaininfocache = {}
        self._chunkclear()
        for x in xrange(rev, len(self)):
            del self.nodemap[self.node(x)]

        del self.index[rev:-1]

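    # Truncation offsets, restated: a separate data file is cut at start(rev)
    # and the index at rev entries of self._io.size (64 bytes for v1); an
    # inline revlog keeps both in the index file, so the cut point there is
    # start(rev) + rev * self._io.size.
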
    def checksize(self):
        expected = 0
        if len(self):
            expected = max(0, self.end(len(self) - 1))

        try:
            f = self.opener(self.datafile)
            f.seek(0, 2)
            actual = f.tell()
            f.close()
            dd = actual - expected
        except IOError as inst:
            if inst.errno != errno.ENOENT:
                raise
            dd = 0

        try:
            f = self.opener(self.indexfile)
            f.seek(0, 2)
            actual = f.tell()
            f.close()
            s = self._io.size
            i = max(0, actual // s)
            di = actual - (i * s)
            if self._inline:
                databytes = 0
                for r in self:
                    databytes += max(0, self.length(r))
                dd = 0
                di = actual - len(self) * s - databytes
        except IOError as inst:
            if inst.errno != errno.ENOENT:
                raise
            di = 0

        return (dd, di)

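    # ``dd`` and ``di`` report unexpected trailing bytes in the data and
    # index files respectively; (0, 0) means both files have exactly the
    # sizes the index implies, which is the property verification relies on.
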
    def files(self):
        res = [self.indexfile]
        if not self._inline:
            res.append(self.datafile)
        return res

    DELTAREUSEALWAYS = 'always'
    DELTAREUSESAMEREVS = 'samerevs'
    DELTAREUSENEVER = 'never'

    DELTAREUSEALL = set(['always', 'samerevs', 'never'])

    def clone(self, tr, destrevlog, addrevisioncb=None,
              deltareuse=DELTAREUSESAMEREVS, aggressivemergedeltas=None):
        """Copy this revlog to another, possibly with format changes.

        The destination revlog will contain the same revisions and nodes.
        However, it may not be bit-for-bit identical due to e.g. delta encoding
        differences.

        The ``deltareuse`` argument controls how deltas from the existing
        revlog are preserved in the destination revlog. The argument can have
        the following values:

        DELTAREUSEALWAYS
           Deltas will always be reused (if possible), even if the destination
           revlog would not select the same revisions for the delta. This is the
           fastest mode of operation.
        DELTAREUSESAMEREVS
           Deltas will be reused if the destination revlog would pick the same
           revisions for the delta. This mode strikes a balance between speed
           and optimization.
        DELTAREUSENEVER
           Deltas will never be reused. This is the slowest mode of execution.
           This mode can be used to recompute deltas (e.g. if the diff/delta
           algorithm changes).

        Delta computation can be slow, so the choice of delta reuse policy can
        significantly affect run time.

        The default policy (``DELTAREUSESAMEREVS``) strikes a balance between
        two extremes. Deltas will be reused if they are appropriate. But if the
        delta could choose a better revision, it will do so. This means if you
        are converting a non-generaldelta revlog to a generaldelta revlog,
        deltas will be recomputed if the delta's parent isn't a parent of the
        revision.

        In addition to the delta policy, the ``aggressivemergedeltas`` argument
        controls whether to compute deltas against both parents for merges.
        When not specified, the destination revlog's current setting is used.
        """
        if deltareuse not in self.DELTAREUSEALL:
            raise ValueError(_('value for deltareuse invalid: %s') % deltareuse)

        if len(destrevlog):
            raise ValueError(_('destination revlog is not empty'))

        if getattr(self, 'filteredrevs', None):
            raise ValueError(_('source revlog has filtered revisions'))
        if getattr(destrevlog, 'filteredrevs', None):
            raise ValueError(_('destination revlog has filtered revisions'))

        # lazydeltabase controls whether to reuse a cached delta, if possible.
        oldlazydeltabase = destrevlog._lazydeltabase
        oldamd = destrevlog._aggressivemergedeltas

        try:
            if deltareuse == self.DELTAREUSEALWAYS:
                destrevlog._lazydeltabase = True
            elif deltareuse == self.DELTAREUSESAMEREVS:
                destrevlog._lazydeltabase = False

            destrevlog._aggressivemergedeltas = aggressivemergedeltas or oldamd

            populatecachedelta = deltareuse in (self.DELTAREUSEALWAYS,
                                                self.DELTAREUSESAMEREVS)

            index = self.index
            for rev in self:
                entry = index[rev]

                # Some classes override linkrev to take filtered revs into
                # account. Use raw entry from index.
                flags = entry[0] & 0xffff
                linkrev = entry[4]
                p1 = index[entry[5]][7]
                p2 = index[entry[6]][7]
                node = entry[7]

                # (Possibly) reuse the delta from the revlog if allowed and
                # the revlog chunk is a delta.
                cachedelta = None
                text = None
                if populatecachedelta:
                    dp = self.deltaparent(rev)
                    if dp != nullrev:
                        cachedelta = (dp, str(self._chunk(rev)))

                if not cachedelta:
                    text = self.revision(rev)

                ifh = destrevlog.opener(destrevlog.indexfile, 'a+',
                                        checkambig=False)
                dfh = None
                if not destrevlog._inline:
                    dfh = destrevlog.opener(destrevlog.datafile, 'a+')
                try:
                    destrevlog._addrevision(node, text, tr, linkrev, p1, p2,
                                            flags, cachedelta, ifh, dfh)
                finally:
                    if dfh:
                        dfh.close()
                    ifh.close()

                if addrevisioncb:
                    addrevisioncb(self, rev, node)
        finally:
            destrevlog._lazydeltabase = oldlazydeltabase
            destrevlog._aggressivemergedeltas = oldamd
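
    # Hypothetical use, e.g. when recomputing a repository's delta encoding
    # (names illustrative; ``tr`` is an open transaction on the destination):
    #
    #   src = revlog(srcopener, 'data/foo.txt.i')
    #   dst = revlog(dstopener, 'data/foo.txt.i')
    #   src.clone(tr, dst, deltareuse=revlog.DELTAREUSENEVER)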