revlog: remove some revlogNG terminology...
Gregory Szorc
r32392:d47b6236 default
@@ -1,215 +1,215
1 1 Revision logs - or *revlogs* - are an append-only data structure for
2 2 storing discrete entries, or *revisions*. They are the primary storage
3 3 mechanism of repository data.
4 4
5 5 Revlogs effectively model a directed acyclic graph (DAG). Each node
6 6 has edges to 1 or 2 *parent* nodes. Each node contains metadata and
7 7 the raw value for that node.
8 8
9 9 Revlogs consist of entries which have metadata and revision data.
10 10 Metadata includes the hash of the revision's content, sizes, and
11 11 links to its *parent* entries. The collective metadata is referred
12 12 to as the *index* and the revision data is the *data*.
13 13
14 14 Revision data is stored as a series of compressed deltas against previous
15 15 revisions.
16 16
17 17 Revlogs are written in an append-only fashion. We never need to rewrite
18 18 a file to insert or remove data. Rolling back in-progress
19 19 writes can be performed by truncating files. Read locks can be avoided
20 20 using simple techniques. This means that references to other data in
21 21 the same revlog *always* refer to a previous entry.
22 22
23 23 Revlogs can be modeled as 0-indexed arrays. The first revision is
24 24 revision #0 and the second is revision #1. The revision -1 is typically
25 25 used to mean *does not exist* or *not defined*.
26 26
27 27 File Format
28 28 ===========
29 29
30 30 A revlog begins with a 32-bit big endian integer holding version info
31 31 and feature flags. This integer is shared with the first revision
32 32 entry.
33 33
34 34 This integer is logically divided into 2 16-bit shorts. The least
35 35 significant half of the integer is the format/version short. The other
36 36 short holds feature flags that dictate behavior of the revlog.
37 37
38 38 Only 1 bit of the format/version short is currently used. Remaining
39 39 bits are reserved for future use.
40 40
41 41 The following values for the format/version short are defined:
42 42
43 43 0
44 44 The original revlog version.
45 45 1
46 46 RevlogNG (*next generation*). It replaced version 0 when it was
47 47 implemented in 2006.
48 48
49 49 The feature flags short consists of bit flags. Where 0 is the least
50 50 significant bit, the following bit offsets define flags:
51 51
52 52 0
53 53 Store revision data inline.
54 54 1
55 55 Generaldelta encoding.
56 56
57 57 2-15
58 58 Reserved for future use.
59 59
60 60 The following header values are common:
61 61
62 62 00 00 00 01
63 RevlogNG
63 v1
64 64 00 01 00 01
65 RevlogNG + inline
65 v1 + inline
66 66 00 02 00 01
67 RevlogNG + generaldelta
67 v1 + generaldelta
68 68 00 03 00 01
69 RevlogNG + inline + generaldelta
69 v1 + inline + generaldelta
70 70
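As an illustrative sketch (not part of Mercurial's API), the header can
be split into its two shorts with Python's ``struct`` module::

  import struct

  def parseheader(data):
      # The first 4 bytes are a big endian unsigned integer.
      header = struct.unpack('>I', data[:4])[0]
      version = header & 0xFFFF        # least significant short
      flags = header >> 16             # feature flags short
      inline = bool(flags & 1)         # bit 0
      generaldelta = bool(flags & 2)   # bit 1
      return version, inline, generaldelta

  # ``00 02 00 01`` is v1 + generaldelta per the table above.
  assert parseheader(b'\x00\x02\x00\x01') == (1, False, True)
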
71 71 Following the 32-bit header is the remainder of the first index entry.
72 72 Following that is the remaining *index* data. Inlined revision data is
73 73 possibly located between index entries. More on this layout is described
74 74 below.
75 75
76 RevlogNG Format
77 ===============
76 Version 1 Format
77 ================
78 78
79 RevlogNG (version 1) begins with an index describing the revisions in
79 Version 1 (RevlogNG) begins with an index describing the revisions in
80 80 the revlog. If the ``inline`` flag is set, revision data is stored inline,
81 81 or between index entries (as opposed to in a separate container).
82 82
83 83 Each index entry is 64 bytes. The byte layout of each entry is as
84 84 follows, with byte 0 being the first byte (all data stored as big endian):
85 85
86 86 0-3 (4 bytes) (rev 0 only)
87 87 Revlog header
88 88
89 89 0-5 (6 bytes)
90 90 Absolute offset of revision data from beginning of revlog.
91 91
92 92 6-7 (2 bytes)
93 93 Bit flags impacting revision behavior. The following bit offsets are defined:
94 94
95 95 0: REVIDX_ISCENSORED revision has censor metadata, must be verified.
96 96
97 97 1: REVIDX_ELLIPSIS revision hash does not match its data. Used by
98 98 narrowhg.
99 99
100 100 2: REVIDX_EXTSTORED revision data is stored externally.
101 101
102 102 8-11 (4 bytes)
103 103 Compressed length of revision data / chunk as stored in revlog.
104 104
105 105 12-15 (4 bytes)
106 106 Uncompressed length of revision data. This is the size of the full
107 107 revision data, not the size of the chunk post decompression.
108 108
109 109 16-19 (4 bytes)
110 110 Base or previous revision this revision's delta was produced against.
111 111 This revision holds full text (as opposed to a delta) if it points to
112 112 itself. For generaldelta repos, this is the previous revision in the
113 113 delta chain. For non-generaldelta repos, this is the base or first
114 114 revision in the delta chain.
115 115
116 116 20-23 (4 bytes)
117 117 A revision this revision is *linked* to. This allows a revision in
118 118 one revlog to be forever associated with a revision in another
119 119 revlog. For example, a file's revlog may point to the changelog
120 120 revision that introduced it.
121 121
122 122 24-27 (4 bytes)
123 123 Revision of 1st parent. -1 indicates no parent.
124 124
125 125 28-31 (4 bytes)
126 126 Revision of 2nd parent. -1 indicates no 2nd parent.
127 127
128 128 32-63 (32 bytes)
129 129 Hash of revision's full text. Currently, SHA-1 is used and only
130 130 the first 20 bytes of this field are used. The rest of the bytes
131 131 are ignored and should be stored as \0.
132 132
133 133 If inline revision data is being stored, the compressed revision data
134 134 (of length from bytes offset 8-11 from the index entry) immediately
135 135 follows the index entry. There is no header on the revision data. There
136 136 is no padding between it and the index entries before and after.
137 137
138 138 If revision data is not inline, then raw revision data is stored in a
139 139 separate byte container. The offsets from bytes 0-5 and the compressed
140 140 length from bytes 8-11 define how to access this data.
141 141
142 142 The first 4 bytes of the revlog are shared between the revlog header
143 143 and the 6 byte absolute offset field from the first revlog entry.
144 144
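As a sketch, one 64-byte entry can be decoded with Python's ``struct``
module; the format string mirrors ``indexformatng`` in
``mercurial/revlog.py``::

  import struct

  # A 64-bit integer packing the 6-byte offset and 2-byte flags, six
  # 32-bit integers (lengths, base, linkrev, and both parents), a
  # 20-byte hash, and 12 bytes of padding.
  entrystruct = struct.Struct('>Qiiiiii20s12x')  # 64 bytes

  def parseentry(data, rev):
      (offsetflags, complen, uncomplen, base,
       linkrev, p1, p2, node) = entrystruct.unpack(data)
      offset = offsetflags >> 16
      flags = offsetflags & 0xFFFF
      if rev == 0:
          offset = 0  # the first 4 bytes hold the revlog header
      return offset, flags, complen, uncomplen, base, linkrev, p1, p2, node
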
145 145 Delta Chains
146 146 ============
147 147
148 148 Revision data is encoded as a chain of *chunks*. Each chain begins with
149 149 the compressed original full text for that revision. Each subsequent
150 150 *chunk* is a *delta* against the previous revision. We therefore call
151 151 these chains of chunks/deltas *delta chains*.
152 152
153 153 The full text for a revision is reconstructed by loading the original
154 154 full text for the base revision of a *delta chain* and then applying
155 155 *deltas* until the target revision is reconstructed.
156 156
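A minimal sketch of this reconstruction using revlog internals, modeled
on ``revlog.revision()`` in ``mercurial/revlog.py``: the first chunk of
a chain is the base full text and each later chunk is a delta::

  from mercurial import mdiff

  def reconstruct(rlog, rev):
      chain, stopped = rlog._deltachain(rev)  # revs in order, base first
      chunks = rlog._chunks(chain)            # decompressed chunks
      return mdiff.patches(bytes(chunks[0]), chunks[1:])
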
157 157 *Delta chains* are limited in length so lookup time is bound. They are
158 158 limited to ~2x the length of the revision's data. The linear distance
159 159 between the base chunk and the final chunk is also limited so the
160 160 amount of read I/O to load all chunks in the delta chain is bound.
161 161
162 162 Deltas and delta chains are either computed against the previous
163 163 revision in the revlog or another revision (almost certainly one of
164 164 the parents of the revision). Historically, deltas were computed against
165 165 the previous revision. The *generaldelta* revlog feature flag (enabled
166 166 by default in Mercurial 3.7) activates the mode where deltas are
167 167 computed against an arbitrary revision (almost certainly a parent revision).
168 168
169 169 File Storage
170 170 ============
171 171
172 172 Revlogs logically consist of an index (metadata of entries) and
173 173 revision data. This data may be stored together in a single file or in
174 174 separate files. The mechanism used is indicated by the ``inline`` feature
175 175 flag on the revlog.
176 176
177 177 Mercurial's behavior is to use inline storage until a revlog reaches a
178 178 certain size, at which point it will be converted to non-inline. The
179 179 reason there is a size limit on inline storage is to establish an upper
180 180 bound on how much data must be read to load the index. It would be a waste
181 181 to read tens or hundreds of extra megabytes of data just to access the
182 182 index data.
183 183
184 184 The actual layout of revlog files on disk is governed by the repository's
185 185 *store format*. Typically, a ``.i`` file represents the index revlog
186 186 (possibly containing inline data) and a ``.d`` file holds the revision data.
187 187
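For example, ``revlog.__init__`` below derives the data file name from
the index file name::

  datafile = indexfile[:-2] + '.d'  # 'data/foo.txt.i' -> 'data/foo.txt.d'
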
188 188 Revision Entries
189 189 ================
190 190
191 191 Revision entries consist of an optional 1 byte header followed by an
192 192 encoding of the revision data. The headers are as follows:
193 193
194 194 \0 (0x00)
195 195 Revision data is the entirety of the entry, including this header.
196 196 u (0x75)
197 197 Raw revision data follows.
198 198 x (0x78)
199 199 zlib (RFC 1950) data.
200 200
201 201 The 0x78 value is actually the first byte of the zlib header (CMF byte).
202 202
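A sketch of decoding a stored entry per this table (Mercurial's actual
``decompress`` method also supports pluggable compression engines)::

  import zlib

  def decodeentry(data):
      if not data:
          return data
      t = data[0:1]
      if t == b'\0':   # entry is literal data, header included
          return data
      if t == b'u':    # raw data follows the 1-byte header
          return data[1:]
      if t == b'x':    # a zlib stream; 0x78 is its CMF byte
          return zlib.decompress(data)
      raise ValueError('unknown compression header %r' % t)
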
203 203 Hash Computation
204 204 ================
205 205
206 206 The hash of the revision is stored in the index and is used both as a primary
207 207 key and for data integrity verification.
208 208
209 209 Currently, SHA-1 is the only supported hashing algorithm. To obtain the SHA-1
210 210 hash of a revision:
211 211
212 212 1. Hash the parent nodes
213 213 2. Hash the fulltext of the revision
214 214
215 215 The 20 byte node ids of the parents are fed into the hasher in ascending order.
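
For illustration, the following sketch matches the module-level
``hash()`` function in ``mercurial/revlog.py``::

  import hashlib

  def nodehash(text, p1, p2):
      # The 20 byte parent node ids, fed in ascending order.
      s = hashlib.sha1(min(p1, p2))
      s.update(max(p1, p2))
      s.update(text)
      return s.digest()
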
@@ -1,2156 +1,2156
1 1 # revlog.py - storage back-end for mercurial
2 2 #
3 3 # Copyright 2005-2007 Matt Mackall <mpm@selenic.com>
4 4 #
5 5 # This software may be used and distributed according to the terms of the
6 6 # GNU General Public License version 2 or any later version.
7 7
8 8 """Storage back-end for Mercurial.
9 9
10 10 This provides efficient delta storage with O(1) retrieve and append
11 11 and O(changes) merge between branches.
12 12 """
13 13
14 14 from __future__ import absolute_import
15 15
16 16 import collections
17 17 import errno
18 18 import hashlib
19 19 import os
20 20 import struct
21 21 import zlib
22 22
23 23 # import stuff from node for others to import from revlog
24 24 from .node import (
25 25 bin,
26 26 hex,
27 27 nullid,
28 28 nullrev,
29 29 )
30 30 from .i18n import _
31 31 from . import (
32 32 ancestor,
33 33 error,
34 34 mdiff,
35 35 policy,
36 36 pycompat,
37 37 templatefilters,
38 38 util,
39 39 )
40 40
41 41 parsers = policy.importmod(r'parsers')
42 42
43 43 _pack = struct.pack
44 44 _unpack = struct.unpack
45 45 # Aliased for performance.
46 46 _zlibdecompress = zlib.decompress
47 47
48 48 # revlog header flags
49 49 REVLOGV0 = 0
50 50 REVLOGV1 = 1
51 51 FLAG_INLINE_DATA = (1 << 16)
52 52 FLAG_GENERALDELTA = (1 << 17)
53 53 REVLOG_DEFAULT_FLAGS = FLAG_INLINE_DATA
54 54 REVLOG_DEFAULT_FORMAT = REVLOGV1
55 55 REVLOG_DEFAULT_VERSION = REVLOG_DEFAULT_FORMAT | REVLOG_DEFAULT_FLAGS
56 56 REVLOGV1_FLAGS = FLAG_INLINE_DATA | FLAG_GENERALDELTA
57 57
58 58 # revlog index flags
59 59 REVIDX_ISCENSORED = (1 << 15) # revision has censor metadata, must be verified
60 60 REVIDX_ELLIPSIS = (1 << 14) # revision hash does not match data (narrowhg)
61 61 REVIDX_EXTSTORED = (1 << 13) # revision data is stored externally
62 62 REVIDX_DEFAULT_FLAGS = 0
63 63 # stable order in which flags need to be processed and their processors applied
64 64 REVIDX_FLAGS_ORDER = [
65 65 REVIDX_ISCENSORED,
66 66 REVIDX_ELLIPSIS,
67 67 REVIDX_EXTSTORED,
68 68 ]
69 69 REVIDX_KNOWN_FLAGS = util.bitsfrom(REVIDX_FLAGS_ORDER)
70 70
71 71 # max size of revlog with inline data
72 72 _maxinline = 131072
73 73 _chunksize = 1048576
74 74
75 75 RevlogError = error.RevlogError
76 76 LookupError = error.LookupError
77 77 CensoredNodeError = error.CensoredNodeError
78 78 ProgrammingError = error.ProgrammingError
79 79
80 80 # Store flag processors (cf. 'addflagprocessor()' to register)
81 81 _flagprocessors = {
82 82 REVIDX_ISCENSORED: None,
83 83 }
84 84
85 85 def addflagprocessor(flag, processor):
86 86 """Register a flag processor on a revision data flag.
87 87
88 88 Invariant:
89 89 - Flags need to be defined in REVIDX_KNOWN_FLAGS and REVIDX_FLAGS_ORDER.
90 90 - Only one flag processor can be registered on a specific flag.
91 91 - flagprocessors must be 3-tuples of functions (read, write, raw) with the
92 92 following signatures:
93 93 - (read) f(self, rawtext) -> text, bool
94 94 - (write) f(self, text) -> rawtext, bool
95 95 - (raw) f(self, rawtext) -> bool
96 96 "text" is presented to the user. "rawtext" is stored in revlog data, not
97 97 directly visible to the user.
98 98 The boolean returned by these transforms is used to determine whether
99 99 the returned text can be used for hash integrity checking. For example,
100 100 if "write" returns False, then "text" is used to generate the hash. If
101 101 "write" returns True, then the "rawtext" returned by "write" should be
102 102 used to generate the hash. Usually, "write" and "read" return different
103 103 booleans, and "raw" returns the same boolean as "write".
104 104
105 105 Note: The 'raw' transform is used for changegroup generation and in some
106 106 debug commands. In this case the transform only indicates whether the
107 107 contents can be used for hash integrity checks.
108 108 """
109 109 if not flag & REVIDX_KNOWN_FLAGS:
110 110 msg = _("cannot register processor on unknown flag '%#x'.") % (flag)
111 111 raise ProgrammingError(msg)
112 112 if flag not in REVIDX_FLAGS_ORDER:
113 113 msg = _("flag '%#x' undefined in REVIDX_FLAGS_ORDER.") % (flag)
114 114 raise ProgrammingError(msg)
115 115 if flag in _flagprocessors:
116 116 msg = _("cannot register multiple processors on flag '%#x'.") % (flag)
117 117 raise error.Abort(msg)
118 118 _flagprocessors[flag] = processor
119 119
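# Illustrative sketch only (nothing in this module registers it): a
# processor tuple for a hypothetical flag whose revisions are stored
# zlib-compressed. Per the invariants above, "write" returns False so
# the hash is computed from the uncompressed "text", "read" returns
# True because the text it returns is what was hashed, and "raw"
# mirrors "write":
#
#   def _compressedread(self, rawtext):
#       return zlib.decompress(rawtext), True
#   def _compressedwrite(self, text):
#       return zlib.compress(text), False
#   def _compressedraw(self, rawtext):
#       return False
#
#   addflagprocessor(REVIDX_EXTSTORED, (_compressedread, _compressedwrite,
#                                       _compressedraw))
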
120 120 def getoffset(q):
121 121 return int(q >> 16)
122 122
123 123 def gettype(q):
124 124 return int(q & 0xFFFF)
125 125
126 126 def offset_type(offset, type):
127 127 if (type & ~REVIDX_KNOWN_FLAGS) != 0:
128 128 raise ValueError('unknown revlog index flags')
129 129 return int(int(offset) << 16 | type)
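
# For example, offset_type(4096, REVIDX_ISCENSORED) packs to
# (4096 << 16) | 0x8000; getoffset() and gettype() recover the two halves.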
130 130
131 131 _nullhash = hashlib.sha1(nullid)
132 132
133 133 def hash(text, p1, p2):
134 134 """generate a hash from the given text and its parent hashes
135 135
136 136 This hash combines both the current file contents and its history
137 137 in a manner that makes it easy to distinguish nodes with the same
138 138 content in the revision graph.
139 139 """
140 140 # As of now, if one of the parent nodes is null, p2 is null
141 141 if p2 == nullid:
142 142 # deep copy of a hash is faster than creating one
143 143 s = _nullhash.copy()
144 144 s.update(p1)
145 145 else:
146 146 # none of the parent nodes are nullid
147 147 l = [p1, p2]
148 148 l.sort()
149 149 s = hashlib.sha1(l[0])
150 150 s.update(l[1])
151 151 s.update(text)
152 152 return s.digest()
153 153
154 154 # index v0:
155 155 # 4 bytes: offset
156 156 # 4 bytes: compressed length
157 157 # 4 bytes: base rev
158 158 # 4 bytes: link rev
159 159 # 20 bytes: parent 1 nodeid
160 160 # 20 bytes: parent 2 nodeid
161 161 # 20 bytes: nodeid
162 162 indexformatv0 = ">4l20s20s20s"
163 163
164 164 class revlogoldio(object):
165 165 def __init__(self):
166 166 self.size = struct.calcsize(indexformatv0)
167 167
168 168 def parseindex(self, data, inline):
169 169 s = self.size
170 170 index = []
171 171 nodemap = {nullid: nullrev}
172 172 n = off = 0
173 173 l = len(data)
174 174 while off + s <= l:
175 175 cur = data[off:off + s]
176 176 off += s
177 177 e = _unpack(indexformatv0, cur)
178 178 # transform to revlogv1 format
179 179 e2 = (offset_type(e[0], 0), e[1], -1, e[2], e[3],
180 180 nodemap.get(e[4], nullrev), nodemap.get(e[5], nullrev), e[6])
181 181 index.append(e2)
182 182 nodemap[e[6]] = n
183 183 n += 1
184 184
185 185 # add the magic null revision at -1
186 186 index.append((0, 0, 0, -1, -1, -1, -1, nullid))
187 187
188 188 return index, nodemap, None
189 189
190 190 def packentry(self, entry, node, version, rev):
191 191 if gettype(entry[0]):
192 raise RevlogError(_("index entry flags need RevlogNG"))
192 raise RevlogError(_('index entry flags need revlog version 1'))
193 193 e2 = (getoffset(entry[0]), entry[1], entry[3], entry[4],
194 194 node(entry[5]), node(entry[6]), entry[7])
195 195 return _pack(indexformatv0, *e2)
196 196
197 197 # index ng:
198 198 # 6 bytes: offset
199 199 # 2 bytes: flags
200 200 # 4 bytes: compressed length
201 201 # 4 bytes: uncompressed length
202 202 # 4 bytes: base rev
203 203 # 4 bytes: link rev
204 204 # 4 bytes: parent 1 rev
205 205 # 4 bytes: parent 2 rev
206 206 # 32 bytes: nodeid
207 207 indexformatng = ">Qiiiiii20s12x"
208 208 versionformat = ">I"
209 209
210 210 # corresponds to uncompressed length of indexformatng (2 gigs, 4-byte
211 211 # signed integer)
212 212 _maxentrysize = 0x7fffffff
213 213
214 214 class revlogio(object):
215 215 def __init__(self):
216 216 self.size = struct.calcsize(indexformatng)
217 217
218 218 def parseindex(self, data, inline):
219 219 # call the C implementation to parse the index data
220 220 index, cache = parsers.parse_index2(data, inline)
221 221 return index, getattr(index, 'nodemap', None), cache
222 222
223 223 def packentry(self, entry, node, version, rev):
224 224 p = _pack(indexformatng, *entry)
225 225 if rev == 0:
226 226 p = _pack(versionformat, version) + p[4:]
227 227 return p
228 228
229 229 class revlog(object):
230 230 """
231 231 the underlying revision storage object
232 232
233 233 A revlog consists of two parts, an index and the revision data.
234 234
235 235 The index is a file with a fixed record size containing
236 236 information on each revision, including its nodeid (hash), the
237 237 nodeids of its parents, the position and offset of its data within
238 238 the data file, and the revision it's based on. Finally, each entry
239 239 contains a linkrev entry that can serve as a pointer to external
240 240 data.
241 241
242 242 The revision data itself is a linear collection of data chunks.
243 243 Each chunk represents a revision and is usually represented as a
244 244 delta against the previous chunk. To bound lookup time, runs of
245 245 deltas are limited to about 2 times the length of the original
246 246 version data. This makes retrieval of a version proportional to
247 247 its size, or O(1) relative to the number of revisions.
248 248
249 249 Both pieces of the revlog are written to in an append-only
250 250 fashion, which means we never need to rewrite a file to insert or
251 251 remove data, and can use some simple techniques to avoid the need
252 252 for locking while reading.
253 253
254 254 If checkambig, indexfile is opened with checkambig=True at
255 255 writing, to avoid file stat ambiguity.
256 256 """
257 257 def __init__(self, opener, indexfile, datafile=None, checkambig=False):
258 258 """
259 259 create a revlog object
260 260
261 261 opener is a function that abstracts the file opening operation
262 262 and can be used to implement COW semantics or the like.
263 263 """
264 264 self.indexfile = indexfile
265 265 self.datafile = datafile or (indexfile[:-2] + ".d")
266 266 self.opener = opener
267 267 # When True, indexfile is opened with checkambig=True at writing, to
268 268 # avoid file stat ambiguity.
269 269 self._checkambig = checkambig
270 270 # 3-tuple of (node, rev, text) for a raw revision.
271 271 self._cache = None
272 272 # Maps rev to chain base rev.
273 273 self._chainbasecache = util.lrucachedict(100)
274 274 # 2-tuple of (offset, data) of raw data from the revlog at an offset.
275 275 self._chunkcache = (0, '')
276 276 # How much data to read and cache into the raw revlog data cache.
277 277 self._chunkcachesize = 65536
278 278 self._maxchainlen = None
279 279 self._aggressivemergedeltas = False
280 280 self.index = []
281 281 # Mapping of partial identifiers to full nodes.
282 282 self._pcache = {}
283 283 # Mapping of revision integer to full node.
284 284 self._nodecache = {nullid: nullrev}
285 285 self._nodepos = None
286 286 self._compengine = 'zlib'
287 287
288 288 v = REVLOG_DEFAULT_VERSION
289 289 opts = getattr(opener, 'options', None)
290 290 if opts is not None:
291 291 if 'revlogv1' in opts:
292 292 if 'generaldelta' in opts:
293 293 v |= FLAG_GENERALDELTA
294 294 else:
295 295 v = 0
296 296 if 'chunkcachesize' in opts:
297 297 self._chunkcachesize = opts['chunkcachesize']
298 298 if 'maxchainlen' in opts:
299 299 self._maxchainlen = opts['maxchainlen']
300 300 if 'aggressivemergedeltas' in opts:
301 301 self._aggressivemergedeltas = opts['aggressivemergedeltas']
302 302 self._lazydeltabase = bool(opts.get('lazydeltabase', False))
303 303 if 'compengine' in opts:
304 304 self._compengine = opts['compengine']
305 305
306 306 if self._chunkcachesize <= 0:
307 307 raise RevlogError(_('revlog chunk cache size %r is not greater '
308 308 'than 0') % self._chunkcachesize)
309 309 elif self._chunkcachesize & (self._chunkcachesize - 1):
310 310 raise RevlogError(_('revlog chunk cache size %r is not a power '
311 311 'of 2') % self._chunkcachesize)
312 312
313 313 indexdata = ''
314 314 self._initempty = True
315 315 try:
316 316 f = self.opener(self.indexfile)
317 317 indexdata = f.read()
318 318 f.close()
319 319 if len(indexdata) > 0:
320 320 v = struct.unpack(versionformat, indexdata[:4])[0]
321 321 self._initempty = False
322 322 except IOError as inst:
323 323 if inst.errno != errno.ENOENT:
324 324 raise
325 325
326 326 self.version = v
327 327 self._inline = v & FLAG_INLINE_DATA
328 328 self._generaldelta = v & FLAG_GENERALDELTA
329 329 flags = v & ~0xFFFF
330 330 fmt = v & 0xFFFF
331 331 if fmt == REVLOGV0:
332 332 if flags:
333 333 raise RevlogError(_('unknown flags (%#04x) in version %d '
334 334 'revlog %s') %
335 335 (flags >> 16, fmt, self.indexfile))
336 336 elif fmt == REVLOGV1:
337 337 if flags & ~REVLOGV1_FLAGS:
338 338 raise RevlogError(_('unknown flags (%#04x) in version %d '
339 339 'revlog %s') %
340 340 (flags >> 16, fmt, self.indexfile))
341 341 else:
342 342 raise RevlogError(_('unknown version (%d) in revlog %s') %
343 343 (fmt, self.indexfile))
344 344
345 345 self.storedeltachains = True
346 346
347 347 self._io = revlogio()
348 348 if self.version == REVLOGV0:
349 349 self._io = revlogoldio()
350 350 try:
351 351 d = self._io.parseindex(indexdata, self._inline)
352 352 except (ValueError, IndexError):
353 353 raise RevlogError(_("index %s is corrupted") % (self.indexfile))
354 354 self.index, nodemap, self._chunkcache = d
355 355 if nodemap is not None:
356 356 self.nodemap = self._nodecache = nodemap
357 357 if not self._chunkcache:
358 358 self._chunkclear()
359 359 # revnum -> (chain-length, sum-delta-length)
360 360 self._chaininfocache = {}
361 361 # revlog header -> revlog compressor
362 362 self._decompressors = {}
363 363
364 364 @util.propertycache
365 365 def _compressor(self):
366 366 return util.compengines[self._compengine].revlogcompressor()
367 367
368 368 def tip(self):
369 369 return self.node(len(self.index) - 2)
370 370 def __contains__(self, rev):
371 371 return 0 <= rev < len(self)
372 372 def __len__(self):
373 373 return len(self.index) - 1
374 374 def __iter__(self):
375 375 return iter(xrange(len(self)))
376 376 def revs(self, start=0, stop=None):
377 377 """iterate over all rev in this revlog (from start to stop)"""
378 378 step = 1
379 379 if stop is not None:
380 380 if start > stop:
381 381 step = -1
382 382 stop += step
383 383 else:
384 384 stop = len(self)
385 385 return xrange(start, stop, step)
386 386
387 387 @util.propertycache
388 388 def nodemap(self):
389 389 self.rev(self.node(0))
390 390 return self._nodecache
391 391
392 392 def hasnode(self, node):
393 393 try:
394 394 self.rev(node)
395 395 return True
396 396 except KeyError:
397 397 return False
398 398
399 399 def clearcaches(self):
400 400 self._cache = None
401 401 self._chainbasecache.clear()
402 402 self._chunkcache = (0, '')
403 403 self._pcache = {}
404 404
405 405 try:
406 406 self._nodecache.clearcaches()
407 407 except AttributeError:
408 408 self._nodecache = {nullid: nullrev}
409 409 self._nodepos = None
410 410
411 411 def rev(self, node):
412 412 try:
413 413 return self._nodecache[node]
414 414 except TypeError:
415 415 raise
416 416 except RevlogError:
417 417 # parsers.c radix tree lookup failed
418 418 raise LookupError(node, self.indexfile, _('no node'))
419 419 except KeyError:
420 420 # pure python cache lookup failed
421 421 n = self._nodecache
422 422 i = self.index
423 423 p = self._nodepos
424 424 if p is None:
425 425 p = len(i) - 2
426 426 for r in xrange(p, -1, -1):
427 427 v = i[r][7]
428 428 n[v] = r
429 429 if v == node:
430 430 self._nodepos = r - 1
431 431 return r
432 432 raise LookupError(node, self.indexfile, _('no node'))
433 433
434 434 # Accessors for index entries.
435 435
436 436 # First tuple entry is 8 bytes. First 6 bytes are offset. Last 2 bytes
437 437 # are flags.
438 438 def start(self, rev):
439 439 return int(self.index[rev][0] >> 16)
440 440
441 441 def flags(self, rev):
442 442 return self.index[rev][0] & 0xFFFF
443 443
444 444 def length(self, rev):
445 445 return self.index[rev][1]
446 446
447 447 def rawsize(self, rev):
448 448 """return the length of the uncompressed text for a given revision"""
449 449 l = self.index[rev][2]
450 450 if l >= 0:
451 451 return l
452 452
453 453 t = self.revision(rev, raw=True)
454 454 return len(t)
455 455
456 456 def size(self, rev):
457 457 """length of non-raw text (processed by a "read" flag processor)"""
458 458 # fast path: if no "read" flag processor could change the content,
459 459 # size is rawsize. note: ELLIPSIS is known to not change the content.
460 460 flags = self.flags(rev)
461 461 if flags & (REVIDX_KNOWN_FLAGS ^ REVIDX_ELLIPSIS) == 0:
462 462 return self.rawsize(rev)
463 463
464 464 return len(self.revision(rev, raw=False))
465 465
466 466 def chainbase(self, rev):
467 467 base = self._chainbasecache.get(rev)
468 468 if base is not None:
469 469 return base
470 470
471 471 index = self.index
472 472 base = index[rev][3]
473 473 while base != rev:
474 474 rev = base
475 475 base = index[rev][3]
476 476
477 477 self._chainbasecache[rev] = base
478 478 return base
479 479
480 480 def linkrev(self, rev):
481 481 return self.index[rev][4]
482 482
483 483 def parentrevs(self, rev):
484 484 return self.index[rev][5:7]
485 485
486 486 def node(self, rev):
487 487 return self.index[rev][7]
488 488
489 489 # Derived from index values.
490 490
491 491 def end(self, rev):
492 492 return self.start(rev) + self.length(rev)
493 493
494 494 def parents(self, node):
495 495 i = self.index
496 496 d = i[self.rev(node)]
497 497 return i[d[5]][7], i[d[6]][7] # map revisions to nodes inline
498 498
499 499 def chainlen(self, rev):
500 500 return self._chaininfo(rev)[0]
501 501
502 502 def _chaininfo(self, rev):
503 503 chaininfocache = self._chaininfocache
504 504 if rev in chaininfocache:
505 505 return chaininfocache[rev]
506 506 index = self.index
507 507 generaldelta = self._generaldelta
508 508 iterrev = rev
509 509 e = index[iterrev]
510 510 clen = 0
511 511 compresseddeltalen = 0
512 512 while iterrev != e[3]:
513 513 clen += 1
514 514 compresseddeltalen += e[1]
515 515 if generaldelta:
516 516 iterrev = e[3]
517 517 else:
518 518 iterrev -= 1
519 519 if iterrev in chaininfocache:
520 520 t = chaininfocache[iterrev]
521 521 clen += t[0]
522 522 compresseddeltalen += t[1]
523 523 break
524 524 e = index[iterrev]
525 525 else:
526 526 # Add text length of base since decompressing that also takes
527 527 # work. For cache hits the length is already included.
528 528 compresseddeltalen += e[1]
529 529 r = (clen, compresseddeltalen)
530 530 chaininfocache[rev] = r
531 531 return r
532 532
533 533 def _deltachain(self, rev, stoprev=None):
534 534 """Obtain the delta chain for a revision.
535 535
536 536 ``stoprev`` specifies a revision to stop at. If not specified, we
537 537 stop at the base of the chain.
538 538
539 539 Returns a 2-tuple of (chain, stopped) where ``chain`` is a list of
540 540 revs in ascending order and ``stopped`` is a bool indicating whether
541 541 ``stoprev`` was hit.
542 542 """
543 543 chain = []
544 544
545 545 # Alias to prevent attribute lookup in tight loop.
546 546 index = self.index
547 547 generaldelta = self._generaldelta
548 548
549 549 iterrev = rev
550 550 e = index[iterrev]
551 551 while iterrev != e[3] and iterrev != stoprev:
552 552 chain.append(iterrev)
553 553 if generaldelta:
554 554 iterrev = e[3]
555 555 else:
556 556 iterrev -= 1
557 557 e = index[iterrev]
558 558
559 559 if iterrev == stoprev:
560 560 stopped = True
561 561 else:
562 562 chain.append(iterrev)
563 563 stopped = False
564 564
565 565 chain.reverse()
566 566 return chain, stopped
567 567
568 568 def ancestors(self, revs, stoprev=0, inclusive=False):
569 569 """Generate the ancestors of 'revs' in reverse topological order.
570 570 Does not generate revs lower than stoprev.
571 571
572 572 See the documentation for ancestor.lazyancestors for more details."""
573 573
574 574 return ancestor.lazyancestors(self.parentrevs, revs, stoprev=stoprev,
575 575 inclusive=inclusive)
576 576
577 577 def descendants(self, revs):
578 578 """Generate the descendants of 'revs' in revision order.
579 579
580 580 Yield a sequence of revision numbers starting with a child of
581 581 some rev in revs, i.e., each revision is *not* considered a
582 582 descendant of itself. Results are ordered by revision number (a
583 583 topological sort)."""
584 584 first = min(revs)
585 585 if first == nullrev:
586 586 for i in self:
587 587 yield i
588 588 return
589 589
590 590 seen = set(revs)
591 591 for i in self.revs(start=first + 1):
592 592 for x in self.parentrevs(i):
593 593 if x != nullrev and x in seen:
594 594 seen.add(i)
595 595 yield i
596 596 break
597 597
598 598 def findcommonmissing(self, common=None, heads=None):
599 599 """Return a tuple of the ancestors of common and the ancestors of heads
600 600 that are not ancestors of common. In revset terminology, we return the
601 601 tuple:
602 602
603 603 ::common, (::heads) - (::common)
604 604
605 605 The list is sorted by revision number, meaning it is
606 606 topologically sorted.
607 607
608 608 'heads' and 'common' are both lists of node IDs. If heads is
609 609 not supplied, uses all of the revlog's heads. If common is not
610 610 supplied, uses nullid."""
611 611 if common is None:
612 612 common = [nullid]
613 613 if heads is None:
614 614 heads = self.heads()
615 615
616 616 common = [self.rev(n) for n in common]
617 617 heads = [self.rev(n) for n in heads]
618 618
619 619 # we want the ancestors, but inclusive
620 620 class lazyset(object):
621 621 def __init__(self, lazyvalues):
622 622 self.addedvalues = set()
623 623 self.lazyvalues = lazyvalues
624 624
625 625 def __contains__(self, value):
626 626 return value in self.addedvalues or value in self.lazyvalues
627 627
628 628 def __iter__(self):
629 629 added = self.addedvalues
630 630 for r in added:
631 631 yield r
632 632 for r in self.lazyvalues:
633 633 if r not in added:
634 634 yield r
635 635
636 636 def add(self, value):
637 637 self.addedvalues.add(value)
638 638
639 639 def update(self, values):
640 640 self.addedvalues.update(values)
641 641
642 642 has = lazyset(self.ancestors(common))
643 643 has.add(nullrev)
644 644 has.update(common)
645 645
646 646 # take all ancestors from heads that aren't in has
647 647 missing = set()
648 648 visit = collections.deque(r for r in heads if r not in has)
649 649 while visit:
650 650 r = visit.popleft()
651 651 if r in missing:
652 652 continue
653 653 else:
654 654 missing.add(r)
655 655 for p in self.parentrevs(r):
656 656 if p not in has:
657 657 visit.append(p)
658 658 missing = list(missing)
659 659 missing.sort()
660 660 return has, [self.node(miss) for miss in missing]
661 661
662 662 def incrementalmissingrevs(self, common=None):
663 663 """Return an object that can be used to incrementally compute the
664 664 revision numbers of the ancestors of arbitrary sets that are not
665 665 ancestors of common. This is an ancestor.incrementalmissingancestors
666 666 object.
667 667
668 668 'common' is a list of revision numbers. If common is not supplied, uses
669 669 nullrev.
670 670 """
671 671 if common is None:
672 672 common = [nullrev]
673 673
674 674 return ancestor.incrementalmissingancestors(self.parentrevs, common)
675 675
676 676 def findmissingrevs(self, common=None, heads=None):
677 677 """Return the revision numbers of the ancestors of heads that
678 678 are not ancestors of common.
679 679
680 680 More specifically, return a list of revision numbers corresponding to
681 681 nodes N such that every N satisfies the following constraints:
682 682
683 683 1. N is an ancestor of some node in 'heads'
684 684 2. N is not an ancestor of any node in 'common'
685 685
686 686 The list is sorted by revision number, meaning it is
687 687 topologically sorted.
688 688
689 689 'heads' and 'common' are both lists of revision numbers. If heads is
690 690 not supplied, uses all of the revlog's heads. If common is not
691 691 supplied, uses nullid."""
692 692 if common is None:
693 693 common = [nullrev]
694 694 if heads is None:
695 695 heads = self.headrevs()
696 696
697 697 inc = self.incrementalmissingrevs(common=common)
698 698 return inc.missingancestors(heads)
699 699
700 700 def findmissing(self, common=None, heads=None):
701 701 """Return the ancestors of heads that are not ancestors of common.
702 702
703 703 More specifically, return a list of nodes N such that every N
704 704 satisfies the following constraints:
705 705
706 706 1. N is an ancestor of some node in 'heads'
707 707 2. N is not an ancestor of any node in 'common'
708 708
709 709 The list is sorted by revision number, meaning it is
710 710 topologically sorted.
711 711
712 712 'heads' and 'common' are both lists of node IDs. If heads is
713 713 not supplied, uses all of the revlog's heads. If common is not
714 714 supplied, uses nullid."""
715 715 if common is None:
716 716 common = [nullid]
717 717 if heads is None:
718 718 heads = self.heads()
719 719
720 720 common = [self.rev(n) for n in common]
721 721 heads = [self.rev(n) for n in heads]
722 722
723 723 inc = self.incrementalmissingrevs(common=common)
724 724 return [self.node(r) for r in inc.missingancestors(heads)]
725 725
726 726 def nodesbetween(self, roots=None, heads=None):
727 727 """Return a topological path from 'roots' to 'heads'.
728 728
729 729 Return a tuple (nodes, outroots, outheads) where 'nodes' is a
730 730 topologically sorted list of all nodes N that satisfy both of
731 731 these constraints:
732 732
733 733 1. N is a descendant of some node in 'roots'
734 734 2. N is an ancestor of some node in 'heads'
735 735
736 736 Every node is considered to be both a descendant and an ancestor
737 737 of itself, so every reachable node in 'roots' and 'heads' will be
738 738 included in 'nodes'.
739 739
740 740 'outroots' is the list of reachable nodes in 'roots', i.e., the
741 741 subset of 'roots' that is returned in 'nodes'. Likewise,
742 742 'outheads' is the subset of 'heads' that is also in 'nodes'.
743 743
744 744 'roots' and 'heads' are both lists of node IDs. If 'roots' is
745 745 unspecified, uses nullid as the only root. If 'heads' is
746 746 unspecified, uses list of all of the revlog's heads."""
747 747 nonodes = ([], [], [])
748 748 if roots is not None:
749 749 roots = list(roots)
750 750 if not roots:
751 751 return nonodes
752 752 lowestrev = min([self.rev(n) for n in roots])
753 753 else:
754 754 roots = [nullid] # Everybody's a descendant of nullid
755 755 lowestrev = nullrev
756 756 if (lowestrev == nullrev) and (heads is None):
757 757 # We want _all_ the nodes!
758 758 return ([self.node(r) for r in self], [nullid], list(self.heads()))
759 759 if heads is None:
760 760 # All nodes are ancestors, so the latest ancestor is the last
761 761 # node.
762 762 highestrev = len(self) - 1
763 763 # Set ancestors to None to signal that every node is an ancestor.
764 764 ancestors = None
765 765 # Set heads to an empty dictionary for later discovery of heads
766 766 heads = {}
767 767 else:
768 768 heads = list(heads)
769 769 if not heads:
770 770 return nonodes
771 771 ancestors = set()
772 772 # Turn heads into a dictionary so we can remove 'fake' heads.
773 773 # Also, later we will be using it to filter out the heads we can't
774 774 # find from roots.
775 775 heads = dict.fromkeys(heads, False)
776 776 # Start at the top and keep marking parents until we're done.
777 777 nodestotag = set(heads)
778 778 # Remember where the top was so we can use it as a limit later.
779 779 highestrev = max([self.rev(n) for n in nodestotag])
780 780 while nodestotag:
781 781 # grab a node to tag
782 782 n = nodestotag.pop()
783 783 # Never tag nullid
784 784 if n == nullid:
785 785 continue
786 786 # A node's revision number represents its place in a
787 787 # topologically sorted list of nodes.
788 788 r = self.rev(n)
789 789 if r >= lowestrev:
790 790 if n not in ancestors:
791 791 # If we are possibly a descendant of one of the roots
792 792 # and we haven't already been marked as an ancestor
793 793 ancestors.add(n) # Mark as ancestor
794 794 # Add non-nullid parents to list of nodes to tag.
795 795 nodestotag.update([p for p in self.parents(n) if
796 796 p != nullid])
797 797 elif n in heads: # We've seen it before, is it a fake head?
798 798 # So it is, real heads should not be the ancestors of
799 799 # any other heads.
800 800 heads.pop(n)
801 801 if not ancestors:
802 802 return nonodes
803 803 # Now that we have our set of ancestors, we want to remove any
804 804 # roots that are not ancestors.
805 805
806 806 # If one of the roots was nullid, everything is included anyway.
807 807 if lowestrev > nullrev:
808 808 # But, since we weren't, let's recompute the lowest rev to not
809 809 # include roots that aren't ancestors.
810 810
811 811 # Filter out roots that aren't ancestors of heads
812 812 roots = [root for root in roots if root in ancestors]
813 813 # Recompute the lowest revision
814 814 if roots:
815 815 lowestrev = min([self.rev(root) for root in roots])
816 816 else:
817 817 # No more roots? Return empty list
818 818 return nonodes
819 819 else:
820 820 # We are descending from nullid, and don't need to care about
821 821 # any other roots.
822 822 lowestrev = nullrev
823 823 roots = [nullid]
824 824 # Transform our roots list into a set.
825 825 descendants = set(roots)
826 826 # Also, keep the original roots so we can filter out roots that aren't
827 827 # 'real' roots (i.e. are descended from other roots).
828 828 roots = descendants.copy()
829 829 # Our topologically sorted list of output nodes.
830 830 orderedout = []
831 831 # Don't start at nullid since we don't want nullid in our output list,
832 832 # and if nullid shows up in descendants, empty parents will look like
833 833 # they're descendants.
834 834 for r in self.revs(start=max(lowestrev, 0), stop=highestrev + 1):
835 835 n = self.node(r)
836 836 isdescendant = False
837 837 if lowestrev == nullrev: # Everybody is a descendant of nullid
838 838 isdescendant = True
839 839 elif n in descendants:
840 840 # n is already a descendant
841 841 isdescendant = True
842 842 # This check only needs to be done here because all the roots
843 843 # will start being marked as descendants before the loop.
844 844 if n in roots:
845 845 # If n was a root, check if it's a 'real' root.
846 846 p = tuple(self.parents(n))
847 847 # If any of its parents are descendants, it's not a root.
848 848 if (p[0] in descendants) or (p[1] in descendants):
849 849 roots.remove(n)
850 850 else:
851 851 p = tuple(self.parents(n))
852 852 # A node is a descendant if either of its parents are
853 853 # descendants. (We seeded the descendants set with the roots
854 854 # up there, remember?)
855 855 if (p[0] in descendants) or (p[1] in descendants):
856 856 descendants.add(n)
857 857 isdescendant = True
858 858 if isdescendant and ((ancestors is None) or (n in ancestors)):
859 859 # Only include nodes that are both descendants and ancestors.
860 860 orderedout.append(n)
861 861 if (ancestors is not None) and (n in heads):
862 862 # We're trying to figure out which heads are reachable
863 863 # from roots.
864 864 # Mark this head as having been reached
865 865 heads[n] = True
866 866 elif ancestors is None:
867 867 # Otherwise, we're trying to discover the heads.
868 868 # Assume this is a head because if it isn't, the next step
869 869 # will eventually remove it.
870 870 heads[n] = True
871 871 # But, obviously its parents aren't.
872 872 for p in self.parents(n):
873 873 heads.pop(p, None)
874 874 heads = [head for head, flag in heads.iteritems() if flag]
875 875 roots = list(roots)
876 876 assert orderedout
877 877 assert roots
878 878 assert heads
879 879 return (orderedout, roots, heads)
880 880
881 881 def headrevs(self):
882 882 try:
883 883 return self.index.headrevs()
884 884 except AttributeError:
885 885 return self._headrevs()
886 886
887 887 def computephases(self, roots):
888 888 return self.index.computephasesmapsets(roots)
889 889
890 890 def _headrevs(self):
891 891 count = len(self)
892 892 if not count:
893 893 return [nullrev]
894 894 # we won't iter over filtered rev so nobody is a head at start
895 895 ishead = [0] * (count + 1)
896 896 index = self.index
897 897 for r in self:
898 898 ishead[r] = 1 # I may be a head
899 899 e = index[r]
900 900 ishead[e[5]] = ishead[e[6]] = 0 # my parents are not
901 901 return [r for r, val in enumerate(ishead) if val]
902 902
903 903 def heads(self, start=None, stop=None):
904 904 """return the list of all nodes that have no children
905 905
906 906 if start is specified, only heads that are descendants of
907 907 start will be returned
908 908 if stop is specified, it will consider all the revs from stop
909 909 as if they had no children
910 910 """
911 911 if start is None and stop is None:
912 912 if not len(self):
913 913 return [nullid]
914 914 return [self.node(r) for r in self.headrevs()]
915 915
916 916 if start is None:
917 917 start = nullid
918 918 if stop is None:
919 919 stop = []
920 920 stoprevs = set([self.rev(n) for n in stop])
921 921 startrev = self.rev(start)
922 922 reachable = {startrev}
923 923 heads = {startrev}
924 924
925 925 parentrevs = self.parentrevs
926 926 for r in self.revs(start=startrev + 1):
927 927 for p in parentrevs(r):
928 928 if p in reachable:
929 929 if r not in stoprevs:
930 930 reachable.add(r)
931 931 heads.add(r)
932 932 if p in heads and p not in stoprevs:
933 933 heads.remove(p)
934 934
935 935 return [self.node(r) for r in heads]
936 936
937 937 def children(self, node):
938 938 """find the children of a given node"""
939 939 c = []
940 940 p = self.rev(node)
941 941 for r in self.revs(start=p + 1):
942 942 prevs = [pr for pr in self.parentrevs(r) if pr != nullrev]
943 943 if prevs:
944 944 for pr in prevs:
945 945 if pr == p:
946 946 c.append(self.node(r))
947 947 elif p == nullrev:
948 948 c.append(self.node(r))
949 949 return c
950 950
951 951 def descendant(self, start, end):
952 952 if start == nullrev:
953 953 return True
954 954 for i in self.descendants([start]):
955 955 if i == end:
956 956 return True
957 957 elif i > end:
958 958 break
959 959 return False
960 960
961 961 def commonancestorsheads(self, a, b):
962 962 """calculate all the heads of the common ancestors of nodes a and b"""
963 963 a, b = self.rev(a), self.rev(b)
964 964 try:
965 965 ancs = self.index.commonancestorsheads(a, b)
966 966 except (AttributeError, OverflowError): # C implementation failed
967 967 ancs = ancestor.commonancestorsheads(self.parentrevs, a, b)
968 968 return pycompat.maplist(self.node, ancs)
969 969
970 970 def isancestor(self, a, b):
971 971 """return True if node a is an ancestor of node b
972 972
973 973 The implementation of this is trivial but the use of
974 974 commonancestorsheads is not."""
975 975 return a in self.commonancestorsheads(a, b)
976 976
977 977 def ancestor(self, a, b):
978 978 """calculate the "best" common ancestor of nodes a and b"""
979 979
980 980 a, b = self.rev(a), self.rev(b)
981 981 try:
982 982 ancs = self.index.ancestors(a, b)
983 983 except (AttributeError, OverflowError):
984 984 ancs = ancestor.ancestors(self.parentrevs, a, b)
985 985 if ancs:
986 986 # choose a consistent winner when there's a tie
987 987 return min(map(self.node, ancs))
988 988 return nullid
989 989
990 990 def _match(self, id):
991 991 if isinstance(id, int):
992 992 # rev
993 993 return self.node(id)
994 994 if len(id) == 20:
995 995 # possibly a binary node
996 996 # odds of a binary node being all hex in ASCII are 1 in 10**25
997 997 try:
998 998 node = id
999 999 self.rev(node) # quick search the index
1000 1000 return node
1001 1001 except LookupError:
1002 1002 pass # may be partial hex id
1003 1003 try:
1004 1004 # str(rev)
1005 1005 rev = int(id)
1006 1006 if str(rev) != id:
1007 1007 raise ValueError
1008 1008 if rev < 0:
1009 1009 rev = len(self) + rev
1010 1010 if rev < 0 or rev >= len(self):
1011 1011 raise ValueError
1012 1012 return self.node(rev)
1013 1013 except (ValueError, OverflowError):
1014 1014 pass
1015 1015 if len(id) == 40:
1016 1016 try:
1017 1017 # a full hex nodeid?
1018 1018 node = bin(id)
1019 1019 self.rev(node)
1020 1020 return node
1021 1021 except (TypeError, LookupError):
1022 1022 pass
1023 1023
1024 1024 def _partialmatch(self, id):
1025 1025 try:
1026 1026 partial = self.index.partialmatch(id)
1027 1027 if partial and self.hasnode(partial):
1028 1028 return partial
1029 1029 return None
1030 1030 except RevlogError:
1031 1031 # parsers.c radix tree lookup gave multiple matches
1032 1032 # fast path: for unfiltered changelog, radix tree is accurate
1033 1033 if not getattr(self, 'filteredrevs', None):
1034 1034 raise LookupError(id, self.indexfile,
1035 1035 _('ambiguous identifier'))
1036 1036 # fall through to slow path that filters hidden revisions
1037 1037 except (AttributeError, ValueError):
1038 1038 # we are pure python, or key was too short to search radix tree
1039 1039 pass
1040 1040
1041 1041 if id in self._pcache:
1042 1042 return self._pcache[id]
1043 1043
1044 1044 if len(id) < 40:
1045 1045 try:
1046 1046 # hex(node)[:...]
1047 1047 l = len(id) // 2 # grab an even number of digits
1048 1048 prefix = bin(id[:l * 2])
1049 1049 nl = [e[7] for e in self.index if e[7].startswith(prefix)]
1050 1050 nl = [n for n in nl if hex(n).startswith(id) and
1051 1051 self.hasnode(n)]
1052 1052 if len(nl) > 0:
1053 1053 if len(nl) == 1:
1054 1054 self._pcache[id] = nl[0]
1055 1055 return nl[0]
1056 1056 raise LookupError(id, self.indexfile,
1057 1057 _('ambiguous identifier'))
1058 1058 return None
1059 1059 except TypeError:
1060 1060 pass
1061 1061
1062 1062 def lookup(self, id):
1063 1063 """locate a node based on:
1064 1064 - revision number or str(revision number)
1065 1065 - nodeid or subset of hex nodeid
1066 1066 """
1067 1067 n = self._match(id)
1068 1068 if n is not None:
1069 1069 return n
1070 1070 n = self._partialmatch(id)
1071 1071 if n:
1072 1072 return n
1073 1073
1074 1074 raise LookupError(id, self.indexfile, _('no match found'))
1075 1075
1076 1076 def cmp(self, node, text):
1077 1077 """compare text with a given file revision
1078 1078
1079 1079 returns True if text is different from what is stored.
1080 1080 """
1081 1081 p1, p2 = self.parents(node)
1082 1082 return hash(text, p1, p2) != node
1083 1083
1084 1084 def _cachesegment(self, offset, data):
1085 1085 """Add a segment to the revlog cache.
1086 1086
1087 1087 Accepts an absolute offset and the data that is at that location.
1088 1088 """
1089 1089 o, d = self._chunkcache
1090 1090 # try to add to existing cache
1091 1091 if o + len(d) == offset and len(d) + len(data) < _chunksize:
1092 1092 self._chunkcache = o, d + data
1093 1093 else:
1094 1094 self._chunkcache = offset, data
1095 1095
1096 1096 def _readsegment(self, offset, length, df=None):
1097 1097 """Load a segment of raw data from the revlog.
1098 1098
1099 1099 Accepts an absolute offset, length to read, and an optional existing
1100 1100 file handle to read from.
1101 1101
1102 1102 If an existing file handle is passed, it will be seeked and the
1103 1103 original seek position will NOT be restored.
1104 1104
1105 1105 Returns a str or buffer of raw byte data.
1106 1106 """
1107 1107 if df is not None:
1108 1108 closehandle = False
1109 1109 else:
1110 1110 if self._inline:
1111 1111 df = self.opener(self.indexfile)
1112 1112 else:
1113 1113 df = self.opener(self.datafile)
1114 1114 closehandle = True
1115 1115
1116 1116 # Cache data both forward and backward around the requested
1117 1117 # data, in a fixed size window. This helps speed up operations
1118 1118 # involving reading the revlog backwards.
1119 1119 cachesize = self._chunkcachesize
1120 1120 realoffset = offset & ~(cachesize - 1)
1121 1121 reallength = (((offset + length + cachesize) & ~(cachesize - 1))
1122 1122 - realoffset)
1123 1123 df.seek(realoffset)
1124 1124 d = df.read(reallength)
1125 1125 if closehandle:
1126 1126 df.close()
1127 1127 self._cachesegment(realoffset, d)
1128 1128 if offset != realoffset or reallength != length:
1129 1129 return util.buffer(d, offset - realoffset, length)
1130 1130 return d
1131 1131
1132 1132 def _getsegment(self, offset, length, df=None):
1133 1133 """Obtain a segment of raw data from the revlog.
1134 1134
1135 1135 Accepts an absolute offset, length of bytes to obtain, and an
1136 1136 optional file handle to the already-opened revlog. If the file
1137 1137 handle is used, its original seek position will not be preserved.
1138 1138
1139 1139 Requests for data may be returned from a cache.
1140 1140
1141 1141 Returns a str or a buffer instance of raw byte data.
1142 1142 """
1143 1143 o, d = self._chunkcache
1144 1144 l = len(d)
1145 1145
1146 1146 # is it in the cache?
1147 1147 cachestart = offset - o
1148 1148 cacheend = cachestart + length
1149 1149 if cachestart >= 0 and cacheend <= l:
1150 1150 if cachestart == 0 and cacheend == l:
1151 1151 return d # avoid a copy
1152 1152 return util.buffer(d, cachestart, cacheend - cachestart)
1153 1153
1154 1154 return self._readsegment(offset, length, df=df)
1155 1155
1156 1156 def _getsegmentforrevs(self, startrev, endrev, df=None):
1157 1157 """Obtain a segment of raw data corresponding to a range of revisions.
1158 1158
1159 1159 Accepts the start and end revisions and an optional already-open
1160 1160 file handle to be used for reading. If the file handle is read, its
1161 1161 seek position will not be preserved.
1162 1162
1163 1163 Requests for data may be satisfied by a cache.
1164 1164
1165 1165 Returns a 2-tuple of (offset, data) for the requested range of
1166 1166 revisions. Offset is the integer offset from the beginning of the
1167 1167 revlog and data is a str or buffer of the raw byte data.
1168 1168
1169 1169 Callers will need to call ``self.start(rev)`` and ``self.length(rev)``
1170 1170 to determine where each revision's data begins and ends.
1171 1171 """
1172 1172 # Inlined self.start(startrev) & self.end(endrev) for perf reasons
1173 1173 # (functions are expensive).
1174 1174 index = self.index
1175 1175 istart = index[startrev]
1176 1176 start = int(istart[0] >> 16)
1177 1177 if startrev == endrev:
1178 1178 end = start + istart[1]
1179 1179 else:
1180 1180 iend = index[endrev]
1181 1181 end = int(iend[0] >> 16) + iend[1]
1182 1182
1183 1183 if self._inline:
1184 1184 start += (startrev + 1) * self._io.size
1185 1185 end += (endrev + 1) * self._io.size
1186 1186 length = end - start
1187 1187
1188 1188 return start, self._getsegment(start, length, df=df)
1189 1189
1190 1190 def _chunk(self, rev, df=None):
1191 1191 """Obtain a single decompressed chunk for a revision.
1192 1192
1193 1193 Accepts an integer revision and an optional already-open file handle
1194 1194 to be used for reading. If used, the seek position of the file will not
1195 1195 be preserved.
1196 1196
1197 1197 Returns a str holding uncompressed data for the requested revision.
1198 1198 """
1199 1199 return self.decompress(self._getsegmentforrevs(rev, rev, df=df)[1])
1200 1200
1201 1201 def _chunks(self, revs, df=None):
1202 1202 """Obtain decompressed chunks for the specified revisions.
1203 1203
1204 1204 Accepts an iterable of numeric revisions that are assumed to be in
1205 1205 ascending order. Also accepts an optional already-open file handle
1206 1206 to be used for reading. If used, the seek position of the file will
1207 1207 not be preserved.
1208 1208
1209 1209 This function is similar to calling ``self._chunk()`` multiple times,
1210 1210 but is faster.
1211 1211
1212 1212 Returns a list with decompressed data for each requested revision.
1213 1213 """
1214 1214 if not revs:
1215 1215 return []
1216 1216 start = self.start
1217 1217 length = self.length
1218 1218 inline = self._inline
1219 1219 iosize = self._io.size
1220 1220 buffer = util.buffer
1221 1221
1222 1222 l = []
1223 1223 ladd = l.append
1224 1224
1225 1225 try:
1226 1226 offset, data = self._getsegmentforrevs(revs[0], revs[-1], df=df)
1227 1227 except OverflowError:
1228 1228 # issue4215 - we can't cache a run of chunks greater than
1229 1229 # 2G on Windows
1230 1230 return [self._chunk(rev, df=df) for rev in revs]
1231 1231
1232 1232 decomp = self.decompress
1233 1233 for rev in revs:
1234 1234 chunkstart = start(rev)
1235 1235 if inline:
1236 1236 chunkstart += (rev + 1) * iosize
1237 1237 chunklength = length(rev)
1238 1238 ladd(decomp(buffer(data, chunkstart - offset, chunklength)))
1239 1239
1240 1240 return l
1241 1241
1242 1242 def _chunkclear(self):
1243 1243 """Clear the raw chunk cache."""
1244 1244 self._chunkcache = (0, '')
1245 1245
1246 1246 def deltaparent(self, rev):
1247 1247 """return deltaparent of the given revision"""
1248 1248 base = self.index[rev][3]
1249 1249 if base == rev:
1250 1250 return nullrev
1251 1251 elif self._generaldelta:
1252 1252 return base
1253 1253 else:
1254 1254 return rev - 1
1255 1255
1256 1256 def revdiff(self, rev1, rev2):
1257 1257 """return or calculate a delta between two revisions
1258 1258
1259 1259 The delta calculated is in binary form and is intended to be written to
1260 1260 revlog data directly. So this function needs raw revision data.
1261 1261 """
1262 1262 if rev1 != nullrev and self.deltaparent(rev2) == rev1:
1263 1263 return bytes(self._chunk(rev2))
1264 1264
1265 1265 return mdiff.textdiff(self.revision(rev1, raw=True),
1266 1266 self.revision(rev2, raw=True))
1267 1267
    def revision(self, nodeorrev, _df=None, raw=False):
        """return an uncompressed revision of a given node or revision
        number.

        _df - an existing file handle to read from. (internal-only)
        raw - an optional argument specifying if the revision data is to be
        treated as raw data when applying flag transforms. 'raw' should be set
        to True when generating changegroups or in debug commands.
        """
        if isinstance(nodeorrev, int):
            rev = nodeorrev
            node = self.node(rev)
        else:
            node = nodeorrev
            rev = None

        cachedrev = None
        flags = None
        rawtext = None
        if node == nullid:
            return ""
        if self._cache:
            if self._cache[0] == node:
                # _cache only stores rawtext
                if raw:
                    return self._cache[2]
                # duplicated, but good for perf
                if rev is None:
                    rev = self.rev(node)
                if flags is None:
                    flags = self.flags(rev)
                # no extra flags set, no flag processor runs, text = rawtext
                if flags == REVIDX_DEFAULT_FLAGS:
                    return self._cache[2]
                # rawtext is reusable. need to run flag processor
                rawtext = self._cache[2]

            cachedrev = self._cache[1]

        # look up what we need to read
        if rawtext is None:
            if rev is None:
                rev = self.rev(node)

            chain, stopped = self._deltachain(rev, stoprev=cachedrev)
            if stopped:
                rawtext = self._cache[2]

            # drop cache to save memory
            self._cache = None

            bins = self._chunks(chain, df=_df)
            if rawtext is None:
                rawtext = bytes(bins[0])
                bins = bins[1:]

            rawtext = mdiff.patches(rawtext, bins)
            self._cache = (node, rev, rawtext)

        if flags is None:
            if rev is None:
                rev = self.rev(node)
            flags = self.flags(rev)

        text, validatehash = self._processflags(rawtext, flags, 'read', raw=raw)
        if validatehash:
            self.checkhash(text, node, rev=rev)

        return text

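    # Usage sketch (illustrative): lookup works by revision number or node;
    # flag processors run on the reconstructed text unless raw is requested.
    #
    #   text = rl.revision(5)               # by revision number
    #   text = rl.revision(node)            # by node hash
    #   rawtext = rl.revision(5, raw=True)  # stored form; skips flag
    #                                       # transforms (changegroups,
    #                                       # debug commands)
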
    def hash(self, text, p1, p2):
        """Compute a node hash.

        Available as a function so that subclasses can replace the hash
        as needed.
        """
        return hash(text, p1, p2)

    def _processflags(self, text, flags, operation, raw=False):
        """Inspect revision data flags and apply transforms defined by
        registered flag processors.

        ``text`` - the revision data to process
        ``flags`` - the revision flags
        ``operation`` - the operation being performed (read or write)
        ``raw`` - an optional argument describing if the raw transform should be
        applied.

        This method processes the flags in the order (or reverse order if
        ``operation`` is 'write') defined by REVIDX_FLAGS_ORDER, applying the
        flag processors registered for present flags. The order of flags defined
        in REVIDX_FLAGS_ORDER needs to be stable to allow non-commutativity.

        Returns a 2-tuple of ``(text, validatehash)`` where ``text`` is the
        processed text and ``validatehash`` is a bool indicating whether the
        returned text should be checked for hash integrity.

        Note: If the ``raw`` argument is set, it has precedence over the
        operation and will only update the value of ``validatehash``.
        """
        # fast path: no flag processors will run
        if flags == 0:
            return text, True
        if operation not in ('read', 'write'):
            raise ProgrammingError(_("invalid '%s' operation") % operation)
        # Check all flags are known.
        if flags & ~REVIDX_KNOWN_FLAGS:
            raise RevlogError(_("incompatible revision flag '%#x'") %
                              (flags & ~REVIDX_KNOWN_FLAGS))
        validatehash = True
        # Depending on the operation (read or write), the order might be
        # reversed due to non-commutative transforms.
        orderedflags = REVIDX_FLAGS_ORDER
        if operation == 'write':
            orderedflags = reversed(orderedflags)

        for flag in orderedflags:
            # If a flagprocessor has been registered for a known flag, apply the
            # related operation transform and update result tuple.
            if flag & flags:
                vhash = True

                if flag not in _flagprocessors:
                    message = _("missing processor for flag '%#x'") % (flag)
                    raise RevlogError(message)

                processor = _flagprocessors[flag]
                if processor is not None:
                    readtransform, writetransform, rawtransform = processor

                    if raw:
                        vhash = rawtransform(self, text)
                    elif operation == 'read':
                        text, vhash = readtransform(self, text)
                    else: # write operation
                        text, vhash = writetransform(self, text)
                validatehash = validatehash and vhash

        return text, validatehash

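    # Sketch of a flag processor (the flag value and transform bodies below
    # are hypothetical; the registry and tuple shape come from the code
    # above). A processor is a 3-tuple: the read and write transforms return
    # (newtext, vhash), the raw transform returns only vhash, i.e. whether
    # the stored form still matches the node hash.
    #
    #   def _hypotheticalread(rl, text):
    #       return mydecode(text), True     # decode when reading
    #   def _hypotheticalwrite(rl, text):
    #       return myencode(text), True     # encode when writing
    #   def _hypotheticalraw(rl, text):
    #       return False                    # stored bytes != hashed text
    #
    #   _flagprocessors[REVIDX_HYPOTHETICAL] = (_hypotheticalread,
    #                                           _hypotheticalwrite,
    #                                           _hypotheticalraw)
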
    def checkhash(self, text, node, p1=None, p2=None, rev=None):
        """Check node hash integrity.

        Available as a function so that subclasses can extend hash mismatch
        behaviors as needed.
        """
        if p1 is None and p2 is None:
            p1, p2 = self.parents(node)
        if node != self.hash(text, p1, p2):
            revornode = rev
            if revornode is None:
                revornode = templatefilters.short(hex(node))
            raise RevlogError(_("integrity check failed on %s:%s")
                              % (self.indexfile, revornode))

    def checkinlinesize(self, tr, fp=None):
        """Check if the revlog is too big for inline and convert if so.

        This should be called after revisions are added to the revlog. If the
        revlog has grown too large to be an inline revlog, it will convert it
        to use multiple index and data files.
        """
        if not self._inline or (self.start(-2) + self.length(-2)) < _maxinline:
            return

        trinfo = tr.find(self.indexfile)
        if trinfo is None:
            raise RevlogError(_("%s not found in the transaction")
                              % self.indexfile)

        trindex = trinfo[2]
        if trindex is not None:
            dataoff = self.start(trindex)
        else:
            # revlog was stripped at start of transaction, use all leftover data
            trindex = len(self) - 1
            dataoff = self.end(-2)

        tr.add(self.datafile, dataoff)

        if fp:
            fp.flush()
            fp.close()

        df = self.opener(self.datafile, 'w')
        try:
            for r in self:
                df.write(self._getsegmentforrevs(r, r)[1])
        finally:
            df.close()

        fp = self.opener(self.indexfile, 'w', atomictemp=True,
                         checkambig=self._checkambig)
        self.version &= ~FLAG_INLINE_DATA
        self._inline = False
        for i in self:
            e = self._io.packentry(self.index[i], self.node, self.version, i)
            fp.write(e)

        # if we don't call close, the temp file will never replace the
        # real index
        fp.close()

        tr.replace(self.indexfile, trindex * self._io.size)
        self._chunkclear()

    def addrevision(self, text, transaction, link, p1, p2, cachedelta=None,
                    node=None, flags=REVIDX_DEFAULT_FLAGS):
        """add a revision to the log

        text - the revision data to add
        transaction - the transaction object used for rollback
        link - the linkrev data to add
        p1, p2 - the parent nodeids of the revision
        cachedelta - an optional precomputed delta
        node - nodeid of revision; typically node is not specified, and it is
        computed by default as hash(text, p1, p2); however, subclasses might
        use a different hashing method (and override checkhash() in such a
        case)
        flags - the known flags to set on the revision
        """
        if link == nullrev:
            raise RevlogError(_("attempted to add linkrev -1 to %s")
                              % self.indexfile)

        if flags:
            node = node or self.hash(text, p1, p2)

        rawtext, validatehash = self._processflags(text, flags, 'write')

        # If the flag processor modifies the revision data, ignore any provided
        # cachedelta.
        if rawtext != text:
            cachedelta = None

        if len(rawtext) > _maxentrysize:
            raise RevlogError(
                _("%s: size of %d bytes exceeds maximum revlog storage of 2GiB")
                % (self.indexfile, len(rawtext)))

        node = node or self.hash(rawtext, p1, p2)
        if node in self.nodemap:
            return node

        if validatehash:
            self.checkhash(rawtext, node, p1=p1, p2=p2)

        return self.addrawrevision(rawtext, transaction, link, p1, p2, node,
                                   flags, cachedelta=cachedelta)

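    # Usage sketch (illustrative; ``tr``, ``linkrev`` and the parent nodes
    # are assumed to exist):
    #
    #   node = rl.addrevision("new file contents", tr, linkrev, p1, p2)
    #
    # The node defaults to hash(text, p1, p2); re-adding an identical
    # (text, p1, p2) triple is a no-op that returns the existing node.
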
    def addrawrevision(self, rawtext, transaction, link, p1, p2, node, flags,
                       cachedelta=None):
        """add a raw revision with known flags, node and parents
        useful when reusing a revision not stored in this revlog (e.g.
        received over the wire, or read from an external bundle).
        """
        dfh = None
        if not self._inline:
            dfh = self.opener(self.datafile, "a+")
        ifh = self.opener(self.indexfile, "a+", checkambig=self._checkambig)
        try:
            return self._addrevision(node, rawtext, transaction, link, p1, p2,
                                     flags, cachedelta, ifh, dfh)
        finally:
            if dfh:
                dfh.close()
            ifh.close()

    def compress(self, data):
        """Generate a possibly-compressed representation of data."""
        if not data:
            return '', data

        compressed = self._compressor.compress(data)

        if compressed:
            # The revlog compressor added the header in the returned data.
            return '', compressed

        if data[0:1] == '\0':
            return '', data
        return 'u', data

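    # Header conventions implied by compress()/decompress(), with
    # illustrative payloads:
    #
    #   compress('')          -> ('', '')           # empty passes through
    #   compress('aaaa...')   -> ('', 'x\x9c...')   # engine output carries
    #                                               # its own header byte
    #   compress('\0binary')  -> ('', '\0binary')   # NUL-led raw data
    #   compress('tiny')      -> ('u', 'tiny')      # 'u' marks text stored
    #                                               # uncompressed
    #
    # decompress() dispatches on that first byte: 'x' -> zlib, '\0' -> raw,
    # 'u' -> strip the marker, anything else -> compression engine lookup.
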
    def decompress(self, data):
        """Decompress a revlog chunk.

        The chunk is expected to begin with a header identifying the
        format type so it can be routed to an appropriate decompressor.
        """
        if not data:
            return data

        # Revlogs are read much more frequently than they are written and many
        # chunks only take microseconds to decompress, so performance is
        # important here.
        #
        # We can make a few assumptions about revlogs:
        #
        # 1) the majority of chunks will be compressed (as opposed to inline
        #    raw data).
        # 2) decompressing *any* data will likely be at least 10x slower than
        #    returning raw inline data.
        # 3) we want to prioritize common and officially supported compression
        #    engines
        #
        # It follows that we want to optimize for "decompress compressed data
        # when encoded with common and officially supported compression engines"
        # case over "raw data" and "data encoded by less common or non-official
        # compression engines." That is why we have the inline lookup first
        # followed by the compengines lookup.
        #
        # According to `hg perfrevlogchunks`, this is ~0.5% faster for zlib
        # compressed chunks. And this matters for changelog and manifest reads.
        t = data[0:1]

        if t == 'x':
            try:
                return _zlibdecompress(data)
            except zlib.error as e:
                raise RevlogError(_('revlog decompress error: %s') % str(e))
        # '\0' is more common than 'u' so it goes first.
        elif t == '\0':
            return data
        elif t == 'u':
            return util.buffer(data, 1)

        try:
            compressor = self._decompressors[t]
        except KeyError:
            try:
                engine = util.compengines.forrevlogheader(t)
                compressor = engine.revlogcompressor()
                self._decompressors[t] = compressor
            except KeyError:
                raise RevlogError(_('unknown compression type %r') % t)

        return compressor.decompress(data)

    def _isgooddelta(self, d, textlen):
        """Returns True if the given delta is good. Good means that it is within
        the disk span, disk size, and chain length bounds that we know to be
        performant."""
        if d is None:
            return False

        # - 'dist' is the distance from the base revision -- bounding it limits
        #   the amount of I/O we need to do.
        # - 'compresseddeltalen' is the sum of the total size of deltas we need
        #   to apply -- bounding it limits the amount of CPU we consume.
        dist, l, data, base, chainbase, chainlen, compresseddeltalen = d
        if (dist > textlen * 4 or l > textlen or
            compresseddeltalen > textlen * 2 or
            (self._maxchainlen and chainlen > self._maxchainlen)):
            return False

        return True

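    # Worked example (illustrative numbers): for textlen = 1000, a candidate
    # delta is rejected when reconstruction would read more than 4000 bytes
    # from disk (dist > 4 * textlen), when the delta itself exceeds 1000
    # bytes (l > textlen), when the chain's compressed deltas sum past 2000
    # bytes, or when the chain outgrows _maxchainlen. So a candidate with
    # dist=3500, l=200 and compresseddeltalen=1800 passes, while the same
    # candidate with dist=4200 fails.
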
    def _addrevision(self, node, rawtext, transaction, link, p1, p2, flags,
                     cachedelta, ifh, dfh, alwayscache=False):
        """internal function to add revisions to the log

        see addrevision for argument descriptions.

        note: "addrevision" takes non-raw text, "_addrevision" takes raw text.

        invariants:
        - rawtext is optional (can be None); if not set, cachedelta must be set.
          if both are set, they must correspond to each other.
        """
        btext = [rawtext]
        def buildtext():
            if btext[0] is not None:
                return btext[0]
            baserev = cachedelta[0]
            delta = cachedelta[1]
            # special case deltas which replace entire base; no need to decode
            # base revision. this neatly avoids censored bases, which throw when
            # they're decoded.
            hlen = struct.calcsize(">lll")
            if delta[:hlen] == mdiff.replacediffheader(self.rawsize(baserev),
                                                       len(delta) - hlen):
                btext[0] = delta[hlen:]
            else:
                if self._inline:
                    fh = ifh
                else:
                    fh = dfh
                basetext = self.revision(baserev, _df=fh, raw=True)
                btext[0] = mdiff.patch(basetext, delta)

            try:
                res = self._processflags(btext[0], flags, 'read', raw=True)
                btext[0], validatehash = res
                if validatehash:
                    self.checkhash(btext[0], node, p1=p1, p2=p2)
                if flags & REVIDX_ISCENSORED:
                    raise RevlogError(_('node %s is not censored') % node)
            except CensoredNodeError:
                # must pass the censored index flag to add censored revisions
                if not flags & REVIDX_ISCENSORED:
                    raise
            return btext[0]

        def builddelta(rev):
            # can we use the cached delta?
            if cachedelta and cachedelta[0] == rev:
                delta = cachedelta[1]
            else:
                t = buildtext()
                if self.iscensored(rev):
                    # deltas based on a censored revision must replace the
                    # full content in one patch, so delta works everywhere
                    header = mdiff.replacediffheader(self.rawsize(rev), len(t))
                    delta = header + t
                else:
                    if self._inline:
                        fh = ifh
                    else:
                        fh = dfh
                    ptext = self.revision(rev, _df=fh, raw=True)
                    delta = mdiff.textdiff(ptext, t)
            header, data = self.compress(delta)
            deltalen = len(header) + len(data)
            chainbase = self.chainbase(rev)
            dist = deltalen + offset - self.start(chainbase)
            if self._generaldelta:
                base = rev
            else:
                base = chainbase
            chainlen, compresseddeltalen = self._chaininfo(rev)
            chainlen += 1
            compresseddeltalen += deltalen
            return (dist, deltalen, (header, data), base,
                    chainbase, chainlen, compresseddeltalen)

        curr = len(self)
        prev = curr - 1
        offset = self.end(prev)
        delta = None
        p1r, p2r = self.rev(p1), self.rev(p2)

        # full versions are inserted when the needed deltas
        # become comparable to the uncompressed text
        if rawtext is None:
            textlen = mdiff.patchedsize(self.rawsize(cachedelta[0]),
                                        cachedelta[1])
        else:
            textlen = len(rawtext)

        # should we try to build a delta?
        if prev != nullrev and self.storedeltachains:
            tested = set()
            # This condition is true most of the time when processing
            # changegroup data into a generaldelta repo. The only time it
            # isn't true is if this is the first revision in a delta chain
            # or if ``format.generaldelta=true`` disabled ``lazydeltabase``.
            if cachedelta and self._generaldelta and self._lazydeltabase:
                # Assume what we received from the server is a good choice;
                # builddelta will reuse the cache
                candidatedelta = builddelta(cachedelta[0])
                tested.add(cachedelta[0])
                if self._isgooddelta(candidatedelta, textlen):
                    delta = candidatedelta
            if delta is None and self._generaldelta:
                # exclude already lazily tested base if any
                parents = [p for p in (p1r, p2r)
                           if p != nullrev and p not in tested]
                if parents and not self._aggressivemergedeltas:
                    # Pick whichever parent is closer to us (to minimize the
                    # chance of having to build a fulltext).
                    parents = [max(parents)]
                tested.update(parents)
                pdeltas = []
                for p in parents:
                    pd = builddelta(p)
                    if self._isgooddelta(pd, textlen):
                        pdeltas.append(pd)
                if pdeltas:
                    delta = min(pdeltas, key=lambda x: x[1])
            if delta is None and prev not in tested:
                # other approaches failed; try a delta against prev in the
                # hope of saving us a fulltext.
                candidatedelta = builddelta(prev)
                if self._isgooddelta(candidatedelta, textlen):
                    delta = candidatedelta
        if delta is not None:
            dist, l, data, base, chainbase, chainlen, compresseddeltalen = delta
        else:
            rawtext = buildtext()
            data = self.compress(rawtext)
            l = len(data[1]) + len(data[0])
            base = chainbase = curr

        e = (offset_type(offset, flags), l, textlen,
             base, link, p1r, p2r, node)
        self.index.insert(-1, e)
        self.nodemap[node] = curr

        entry = self._io.packentry(e, self.node, self.version, curr)
        self._writeentry(transaction, ifh, dfh, entry, data, link, offset)

        if alwayscache and rawtext is None:
            rawtext = buildtext()

        if type(rawtext) == str: # only accept immutable objects
            self._cache = (node, curr, rawtext)
        self._chainbasecache[curr] = chainbase
        return node

    def _writeentry(self, transaction, ifh, dfh, entry, data, link, offset):
        # Files opened in a+ mode have inconsistent behavior on various
        # platforms. Windows requires that a file positioning call be made
        # when the file handle transitions between reads and writes. See
        # 3686fa2b8eee and the mixedfilemodewrapper in windows.py. On other
        # platforms, Python or the platform itself can be buggy. Some versions
        # of Solaris have been observed to not append at the end of the file
        # if the file was seeked to before the end. See issue4943 for more.
        #
        # We work around this issue by inserting a seek() before writing.
        # Note: This is likely not necessary on Python 3.
        ifh.seek(0, os.SEEK_END)
        if dfh:
            dfh.seek(0, os.SEEK_END)

        curr = len(self) - 1
        if not self._inline:
            transaction.add(self.datafile, offset)
            transaction.add(self.indexfile, curr * len(entry))
            if data[0]:
                dfh.write(data[0])
            dfh.write(data[1])
            ifh.write(entry)
        else:
            offset += curr * self._io.size
            transaction.add(self.indexfile, offset, curr)
            ifh.write(entry)
            ifh.write(data[0])
            ifh.write(data[1])
            self.checkinlinesize(transaction, ifh)

    def addgroup(self, cg, linkmapper, transaction, addrevisioncb=None):
        """
        add a delta group

        Given a set of deltas, add them to the revision log. The first delta
        is against its parent, which should be in our log; the rest are
        against the previous delta.

        If ``addrevisioncb`` is defined, it will be called with arguments of
        this revlog and the node that was added.
        """

        # track the base of the current delta log
        content = []
        node = None

        r = len(self)
        end = 0
        if r:
            end = self.end(r - 1)
        ifh = self.opener(self.indexfile, "a+", checkambig=self._checkambig)
        isize = r * self._io.size
        if self._inline:
            transaction.add(self.indexfile, end + isize, r)
            dfh = None
        else:
            transaction.add(self.indexfile, isize, r)
            transaction.add(self.datafile, end)
            dfh = self.opener(self.datafile, "a+")
        def flush():
            if dfh:
                dfh.flush()
            ifh.flush()
        try:
            # loop through our set of deltas
            chain = None
            for chunkdata in iter(lambda: cg.deltachunk(chain), {}):
                node = chunkdata['node']
                p1 = chunkdata['p1']
                p2 = chunkdata['p2']
                cs = chunkdata['cs']
                deltabase = chunkdata['deltabase']
                delta = chunkdata['delta']
                flags = chunkdata['flags'] or REVIDX_DEFAULT_FLAGS

                content.append(node)

                link = linkmapper(cs)
                if node in self.nodemap:
                    # this can happen if two branches make the same change
                    chain = node
                    continue

                for p in (p1, p2):
                    if p not in self.nodemap:
                        raise LookupError(p, self.indexfile,
                                          _('unknown parent'))

                if deltabase not in self.nodemap:
                    raise LookupError(deltabase, self.indexfile,
                                      _('unknown delta base'))

                baserev = self.rev(deltabase)

                if baserev != nullrev and self.iscensored(baserev):
                    # if base is censored, delta must be full replacement in a
                    # single patch operation
                    hlen = struct.calcsize(">lll")
                    oldlen = self.rawsize(baserev)
                    newlen = len(delta) - hlen
                    if delta[:hlen] != mdiff.replacediffheader(oldlen, newlen):
                        raise error.CensoredBaseError(self.indexfile,
                                                      self.node(baserev))

                if not flags and self._peek_iscensored(baserev, delta, flush):
                    flags |= REVIDX_ISCENSORED

                # We assume consumers of addrevisioncb will want to retrieve
                # the added revision, which will require a call to
                # revision(). revision() will fast path if there is a cache
                # hit. So, we tell _addrevision() to always cache in this case.
                # We're only using addgroup() in the context of changegroup
                # generation so the revision data can always be handled as raw
                # by the flagprocessor.
                chain = self._addrevision(node, None, transaction, link,
                                          p1, p2, flags, (baserev, delta),
                                          ifh, dfh,
                                          alwayscache=bool(addrevisioncb))

                if addrevisioncb:
                    addrevisioncb(self, chain)

                if not dfh and not self._inline:
                    # addrevision switched from inline to conventional;
                    # reopen the index
                    ifh.close()
                    dfh = self.opener(self.datafile, "a+")
                    ifh = self.opener(self.indexfile, "a+",
                                      checkambig=self._checkambig)
        finally:
            if dfh:
                dfh.close()
            ifh.close()

        return content

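    # Shape of each ``chunkdata`` mapping consumed by the loop above (keys
    # as read in the code; the values shown are illustrative):
    #
    #   {'node': <20-byte node>, 'p1': <node>, 'p2': <node>,
    #    'cs': <changeset node, mapped to a linkrev via linkmapper>,
    #    'deltabase': <node the delta applies against>,
    #    'delta': <binary patch>, 'flags': <revision index flags or 0>}
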
    def iscensored(self, rev):
        """Check if a file revision is censored."""
        return False

    def _peek_iscensored(self, baserev, delta, flush):
        """Quickly check if a delta produces a censored revision."""
        return False

    def getstrippoint(self, minlink):
        """find the minimum rev that must be stripped to strip the linkrev

        Returns a tuple containing the minimum rev and a set of all revs that
        have linkrevs that will be broken by this strip.
        """
        brokenrevs = set()
        strippoint = len(self)

        heads = {}
        futurelargelinkrevs = set()
        for head in self.headrevs():
            headlinkrev = self.linkrev(head)
            heads[head] = headlinkrev
            if headlinkrev >= minlink:
                futurelargelinkrevs.add(headlinkrev)

        # This algorithm involves walking down the rev graph, starting at the
        # heads. Since the revs are topologically sorted according to linkrev,
        # once all head linkrevs are below the minlink, we know there are
        # no more revs that could have a linkrev greater than minlink.
        # So we can stop walking.
        while futurelargelinkrevs:
            strippoint -= 1
            linkrev = heads.pop(strippoint)

            if linkrev < minlink:
                brokenrevs.add(strippoint)
            else:
                futurelargelinkrevs.remove(linkrev)

            for p in self.parentrevs(strippoint):
                if p != nullrev:
                    plinkrev = self.linkrev(p)
                    heads[p] = plinkrev
                    if plinkrev >= minlink:
                        futurelargelinkrevs.add(plinkrev)

        return strippoint, brokenrevs

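    # Worked example (illustrative): consider two heads, rev 1 (linkrev 4)
    # and rev 3 (linkrev 3), where rev 2 (linkrev 2) is rev 3's parent.
    # getstrippoint(3) walks down from the heads and returns (1, {2}):
    # truncation must start at rev 1, and rev 2 is collateral damage whose
    # linkrev (2 < 3) will be broken by the strip.
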
    def strip(self, minlink, transaction):
        """truncate the revlog on the first revision with a linkrev >= minlink

        This function is called when we're stripping revision minlink and
        its descendants from the repository.

        We have to remove all revisions with linkrev >= minlink, because
        the equivalent changelog revisions will be renumbered after the
        strip.

        So we truncate the revlog on the first of these revisions, and
        trust that the caller has saved the revisions that shouldn't be
        removed and that it'll re-add them after this truncation.
        """
        if len(self) == 0:
            return

        rev, _ = self.getstrippoint(minlink)
        if rev == len(self):
            return

        # first truncate the files on disk
        end = self.start(rev)
        if not self._inline:
            transaction.add(self.datafile, end)
            end = rev * self._io.size
        else:
            end += rev * self._io.size

        transaction.add(self.indexfile, end)

        # then reset internal state in memory to forget those revisions
        self._cache = None
        self._chaininfocache = {}
        self._chunkclear()
        for x in xrange(rev, len(self)):
            del self.nodemap[self.node(x)]

        del self.index[rev:-1]

    def checksize(self):
        expected = 0
        if len(self):
            expected = max(0, self.end(len(self) - 1))

        try:
            f = self.opener(self.datafile)
            f.seek(0, 2)
            actual = f.tell()
            f.close()
            dd = actual - expected
        except IOError as inst:
            if inst.errno != errno.ENOENT:
                raise
            dd = 0

        try:
            f = self.opener(self.indexfile)
            f.seek(0, 2)
            actual = f.tell()
            f.close()
            s = self._io.size
            i = max(0, actual // s)
            di = actual - (i * s)
            if self._inline:
                databytes = 0
                for r in self:
                    databytes += max(0, self.length(r))
                dd = 0
                di = actual - len(self) * s - databytes
        except IOError as inst:
            if inst.errno != errno.ENOENT:
                raise
            di = 0

        return (dd, di)

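    # Interpretation sketch: checksize() returns (dd, di) -- the number of
    # unexpected trailing bytes in the data file and index file. A healthy
    # revlog yields (0, 0); dd > 0, say, means data exists past the end of
    # the last indexed revision, as after an interrupted append.
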
    def files(self):
        res = [self.indexfile]
        if not self._inline:
            res.append(self.datafile)
        return res

    DELTAREUSEALWAYS = 'always'
    DELTAREUSESAMEREVS = 'samerevs'
    DELTAREUSENEVER = 'never'

    DELTAREUSEALL = {'always', 'samerevs', 'never'}

    def clone(self, tr, destrevlog, addrevisioncb=None,
              deltareuse=DELTAREUSESAMEREVS, aggressivemergedeltas=None):
        """Copy this revlog to another, possibly with format changes.

        The destination revlog will contain the same revisions and nodes.
        However, it may not be bit-for-bit identical due to e.g. delta encoding
        differences.

        The ``deltareuse`` argument controls how deltas from the existing
        revlog are preserved in the destination revlog. The argument can have
        the following values:

        DELTAREUSEALWAYS
          Deltas will always be reused (if possible), even if the destination
          revlog would not select the same revisions for the delta. This is the
          fastest mode of operation.
        DELTAREUSESAMEREVS
          Deltas will be reused if the destination revlog would pick the same
          revisions for the delta. This mode strikes a balance between speed
          and optimization.
        DELTAREUSENEVER
          Deltas will never be reused. This is the slowest mode of execution.
          This mode can be used to recompute deltas (e.g. if the diff/delta
          algorithm changes).

        Delta computation can be slow, so the choice of delta reuse policy can
        significantly affect run time.

        The default policy (``DELTAREUSESAMEREVS``) strikes a balance between
        two extremes. Deltas will be reused if they are appropriate. But if the
        delta could choose a better revision, it will do so. This means if you
        are converting a non-generaldelta revlog to a generaldelta revlog,
        deltas will be recomputed if the delta's parent isn't a parent of the
        revision.

        In addition to the delta policy, the ``aggressivemergedeltas`` argument
        controls whether to compute deltas against both parents for merges.
        If not specified, the destination revlog's current setting is used.
        """
        if deltareuse not in self.DELTAREUSEALL:
            raise ValueError(_('value for deltareuse invalid: %s') % deltareuse)

        if len(destrevlog):
            raise ValueError(_('destination revlog is not empty'))

        if getattr(self, 'filteredrevs', None):
            raise ValueError(_('source revlog has filtered revisions'))
        if getattr(destrevlog, 'filteredrevs', None):
            raise ValueError(_('destination revlog has filtered revisions'))

        # lazydeltabase controls whether to reuse a cached delta, if possible.
        oldlazydeltabase = destrevlog._lazydeltabase
        oldamd = destrevlog._aggressivemergedeltas

        try:
            if deltareuse == self.DELTAREUSEALWAYS:
                destrevlog._lazydeltabase = True
            elif deltareuse == self.DELTAREUSESAMEREVS:
                destrevlog._lazydeltabase = False

            destrevlog._aggressivemergedeltas = aggressivemergedeltas or oldamd

            populatecachedelta = deltareuse in (self.DELTAREUSEALWAYS,
                                                self.DELTAREUSESAMEREVS)

            index = self.index
            for rev in self:
                entry = index[rev]

                # Some classes override linkrev to take filtered revs into
                # account. Use raw entry from index.
                flags = entry[0] & 0xffff
                linkrev = entry[4]
                p1 = index[entry[5]][7]
                p2 = index[entry[6]][7]
                node = entry[7]

                # (Possibly) reuse the delta from the revlog if allowed and
                # the revlog chunk is a delta.
                cachedelta = None
                rawtext = None
                if populatecachedelta:
                    dp = self.deltaparent(rev)
                    if dp != nullrev:
                        cachedelta = (dp, str(self._chunk(rev)))

                if not cachedelta:
                    rawtext = self.revision(rev, raw=True)

                ifh = destrevlog.opener(destrevlog.indexfile, 'a+',
                                        checkambig=False)
                dfh = None
                if not destrevlog._inline:
                    dfh = destrevlog.opener(destrevlog.datafile, 'a+')
                try:
                    destrevlog._addrevision(node, rawtext, tr, linkrev, p1, p2,
                                            flags, cachedelta, ifh, dfh)
                finally:
                    if dfh:
                        dfh.close()
                    ifh.close()

                if addrevisioncb:
                    addrevisioncb(self, rev, node)
        finally:
            destrevlog._lazydeltabase = oldlazydeltabase
            destrevlog._aggressivemergedeltas = oldamd
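
    # Usage sketch (illustrative; ``tr`` is an open transaction and ``dest``
    # an empty revlog, e.g. one created with generaldelta enabled):
    #
    #   src.clone(tr, dest, deltareuse=src.DELTAREUSENEVER)
    #
    # forces every delta to be recomputed under the destination's own
    # policy, trading the slowest run time for the best encoding.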