##// END OF EJS Templates
linelog: add a Python implementation of the linelog datastructure...
Augie Fackler -
r38831:422d6610 default
parent child Browse files
Show More
@@ -0,0 +1,251 b''
1 linelog is a storage format inspired by the "Interleaved deltas" idea. See
2 https://en.wikipedia.org/wiki/Interleaved_deltas for its introduction.
3
4 0. SCCS Weave
5
6 To understand what linelog is, first we have a quick look at a simplified
7 (with header removed) SCCS weave format, which is an implementation of the
8 "Interleaved deltas" idea.
9
10 0.1 Basic SCCS Weave File Format
11
12 A SCCS weave file consists of plain text lines. Each line is either a
13 special instruction starting with "^A" or part of the content of the real
14 file the weave tracks. There are 3 important operations, where REV denotes
15 the revision number:
16
17 ^AI REV, marking the beginning of an insertion block introduced by REV
18 ^AD REV, marking the beginning of a deletion block introduced by REV
19 ^AE REV, marking the end of the block started by "^AI REV" or "^AD REV"
20
21 Note on revision numbers: For any two different revision numbers, one must
22 be an ancestor of the other to make them comparable. This enforces linear
23 history. Besides, the comparison functions (">=", "<") should be efficient.
24 This means, if revisions are strings like git or hg, an external map is
25 required to convert them into integers.
26
27 For example, to represent the following changes:
28
29 REV 1 | REV 2 | REV 3
30 ------+-------+-------
31 a | a | a
32 b | b | 2
33 c | 1 | c
34 | 2 |
35 | c |
36
37 A possible weave file looks like:
38
39 ^AI 1
40 a
41 ^AD 3
42 b
43 ^AI 2
44 1
45 ^AE 3
46 2
47 ^AE 2
48 c
49 ^AE 1
50
51 An "^AE" does not always match its nearest operation ("^AI" or "^AD"). In
52 the above example, "^AE 3" does not match the nearest "^AI 2" but "^AD 3".
53 Therefore we need some extra information for "^AE". The SCCS weave uses a
54 revision number. It could also be a boolean value about whether it is an
55 insertion or a deletion (see section 0.4).
56
57 0.2 Checkout
58
59 The "checkout" operation is to retrieve file content at a given revision,
60 say X. It's doable by going through the file line by line and:
61
62 - If meet ^AI rev, and rev > X, find the corresponding ^AE and jump there
63 - If meet ^AD rev, and rev <= X, find the corresponding ^AE and jump there
64 - Ignore ^AE
65 - For normal lines, just output them
66
67 0.3 Annotate
68
69 The "annotate" operation is to show extra metadata like the revision number
70 and the original line number a line comes from.
71
72 It's basically just a "Checkout". For the extra metadata, they can be stored
73 side by side with the line contents. Alternatively, we can infer the
74 revision number from "^AI"s.
75
76 Some SCM tools have to calculate diffs on the fly and thus are much slower
77 on this operation.
78
79 0.4 Tree Structure
80
81 The word "interleaved" is used because "^AI" .. "^AE" and "^AD" .. "^AE"
82 blocks can be interleaved.
83
84 If we consider insertions and deletions separately, they can form tree
85 structures, respectively.
86
87 +--- ^AI 1 +--- ^AD 3
88 | +- ^AI 2 | +- ^AD 2
89 | | | |
90 | +- ^AE 2 | +- ^AE 2
91 | |
92 +--- ^AE 1 +--- ^AE 3
93
94 More specifically, it's possible to build a tree for all insertions, where
95 the tree node has the structure "(rev, startline, endline)". "startline" is
96 the line number of "^AI" and "endline" is the line number of the matched
97 "^AE". The tree will have these properties:
98
99 1. child.rev > parent.rev
100 2. child.startline > parent.startline
101 3. child.endline < parent.endline
102
103 A similar tree for all deletions can also be built with the first property
104 changed to:
105
106 1. child.rev < parent.rev
107
108 0.5 Malformed Cases
109
110 The following cases are considered malformed in our implementation:
111
112 1. Interleaved insertions, or interleaved deletions.
113 It can be rewritten to a non-interleaved tree structure.
114
115 ^AI/D x ^AI/D x
116 ^AI/D y -> ^AI/D y
117 ^AE x ^AE y
118 ^AE y ^AE x
119
120 2. Nested insertions, where the inner one has a smaller revision number.
121 It can be rewritten to a non-nested form.
122
123 ^AI x + 1 ^AI x + 1
124 ^AI x -> ^AE x + 1
125 ^AE x ^AI x
126 ^AE x + 1 ^AE x
127
128 3. Insertion or deletion inside another deletion, where the outer deletion
129 block has a smaller revision number.
130
131 ^AD x ^AD x
132 ^AI/D x + 1 -> ^AE x
133 ^AE x + 1 ^AI/D x + 1
134 ^AE x ^AE x
135
136 Some of them may be valid in other implementations for special purposes. For
137 example, to "revive" a previously deleted block in a newer revision.
138
139 0.6 Cases Can Be Optimized
140
141 It's always better to get things nested. For example, the left is more
142 efficient than the right while they represent the same content:
143
144 +--- ^AD 2 +- ^AD 1
145 | +- ^AD 1 | LINE A
146 | | LINE A +- ^AE 1
147 | +- ^AE 1 +- ^AD 2
148 | LINE B | LINE B
149 +--- ^AE 2 +- ^AE 2
150
151 Our implementation sometimes generates the less efficient data. To always
152 get the optimal form, it requires extra code complexity that seems unworthy.
153
154 0.7 Inefficiency
155
156 The file format can be slow because:
157
158 - Inserting a new line at position P requires rewriting all data after P.
159 - Finding "^AE" requires walking through the content (O(N), where N is the
160 number of lines between "^AI/D" and "^AE").
161
162 1. Linelog
163
164 The linelog is a binary format that dedicates to speed up mercurial (or
165 git)'s "annotate" operation. It's designed to avoid issues mentioned in
166 section 0.7.
167
168 1.1 Content Stored
169
170 Linelog is not another storage for file contents. It only stores line
171 numbers and corresponding revision numbers, instead of actual line content.
172 This is okay for the "annotate" operation because usually the external
173 source is fast to checkout the content of a file at a specific revision.
174
175 A typical SCCS weave is also fast on the "grep" operation, which needs
176 random accesses to line contents from different revisions of a file. This
177 can be slow with linelog's no-line-content design. However we could use
178 an extra map ((rev, line num) -> line content) to speed it up.
179
180 Note the revision numbers in linelog should be independent from mercurial
181 integer revision numbers. There should be some mapping between linelog rev
182 and hg hash stored side by side, to make the files reusable after being
183 copied to another machine.
184
185 1.2 Basic Format
186
187 A linelog file consists of "instruction"s. An "instruction" can be either:
188
189 - JGE REV ADDR # jump to ADDR if rev >= REV
190 - JL REV ADDR # jump to ADDR if rev < REV
191 - LINE REV LINENUM # append the (LINENUM+1)-th line in revision REV
192
193 For example, here is the example linelog representing the same file with
194 3 revisions mentioned in section 0.1:
195
196 SCCS | Linelog
197 Weave | Addr : Instruction
198 ------+------+-------------
199 ^AI 1 | 0 : JL 1 8
200 a | 1 : LINE 1 0
201 ^AD 3 | 2 : JGE 3 6
202 b | 3 : LINE 1 1
203 ^AI 2 | 4 : JL 2 7
204 1 | 5 : LINE 2 2
205 ^AE 3 |
206 2 | 6 : LINE 2 3
207 ^AE 2 |
208 c | 7 : LINE 1 2
209 ^AE 1 |
210 | 8 : END
211
212 This way, "find ^AE" is O(1) because we just jump there. And we can insert
213 new lines without rewriting most part of the file by appending new lines and
214 changing a single instruction to jump to them.
215
216 The current implementation uses 64 bits for an instruction: The opcode (JGE,
217 JL or LINE) takes 2 bits, REV takes 30 bits and ADDR or LINENUM takes 32
218 bits. It also stores the max revision number and buffer size at the first
219 64 bits for quick access to these values.
220
221 1.3 Comparing with Mercurial's revlog format
222
223 Apparently, linelog is very different from revlog: linelog stores rev and
224 line numbers, while revlog has line contents and other metadata (like
225 parents, flags). However, the revlog format could also be used to store rev
226 and line numbers. For example, to speed up the annotate operation, we could
227 also pre-calculate annotate results and just store them using the revlog
228 format.
229
230 Therefore, linelog is actually somehow similar to revlog, with the important
231 trade-off that it only supports linear history (mentioned in section 0.1).
232 Essentially, the differences are:
233
234 a) Linelog is full of deltas, while revlog could contain full file
235 contents sometimes. So linelog is smaller. Revlog could trade
236 reconstruction speed for file size - best case, revlog is as small as
237 linelog.
238 b) The interleaved delta structure allows skipping large portion of
239 uninteresting deltas so linelog's content reconstruction is faster than
240 the delta-only version of revlog (however it's possible to construct
241 a case where interleaved deltas degrade to plain deltas, so linelog
242 worst case would be delta-only revlog). Revlog could trade file size
243 for reconstruction speed.
244 c) Linelog implicitly maintains the order of all lines it stores. So it
245 could dump all the lines from all revisions, with a reasonable order.
246 While revlog could also dump all line additions, it requires extra
247 computation to figure out the order putting those lines - that's some
248 kind of "merge".
249
250 "c" makes "hg absorb" easier to implement and makes it possible to do
251 "annotate --deleted".
@@ -0,0 +1,414 b''
1 # linelog - efficient cache for annotate data
2 #
3 # Copyright 2018 Google LLC.
4 #
5 # This software may be used and distributed according to the terms of the
6 # GNU General Public License version 2 or any later version.
7 """linelog is an efficient cache for annotate data inspired by SCCS Weaves.
8
9 SCCS Weaves are an implementation of
10 https://en.wikipedia.org/wiki/Interleaved_deltas. See
11 mercurial/help/internals/linelog.txt for an exploration of SCCS weaves
12 and how linelog works in detail.
13
14 Here's a hacker's summary: a linelog is a program which is executed in
15 the context of a revision. Executing the program emits information
16 about lines, including the revision that introduced them and the line
17 number in the file at the introducing revision. When an insertion or
18 deletion is performed on the file, a jump instruction is used to patch
19 in a new body of annotate information.
20 """
21 from __future__ import absolute_import, print_function
22
23 import abc
24 import struct
25
26 from mercurial import (
27 pycompat,
28 )
29 from .thirdparty import (
30 attr,
31 )
32
33 _llentry = struct.Struct('>II')
34
35 class LineLogError(Exception):
36 """Error raised when something bad happens internally in linelog."""
37
38 @attr.s
39 class lineinfo(object):
40 # Introducing revision of this line.
41 rev = attr.ib()
42 # Line number for this line in its introducing revision.
43 linenum = attr.ib()
44 # Private. Offset in the linelog program of this line. Used internally.
45 _offset = attr.ib()
46
47 @attr.s
48 class annotateresult(object):
49 rev = attr.ib()
50 lines = attr.ib()
51 _eof = attr.ib()
52
53 def __iter__(self):
54 return iter(self.lines)
55
56 class _llinstruction(object):
57
58 __metaclass__ = abc.ABCMeta
59
60 @abc.abstractmethod
61 def __init__(self, op1, op2):
62 pass
63
64 @abc.abstractmethod
65 def __str__(self):
66 pass
67
68 def __repr__(self):
69 return str(self)
70
71 @abc.abstractmethod
72 def __eq__(self, other):
73 pass
74
75 @abc.abstractmethod
76 def encode(self):
77 """Encode this instruction to the binary linelog format."""
78
79 @abc.abstractmethod
80 def execute(self, rev, pc, emit):
81 """Execute this instruction.
82
83 Args:
84 rev: The revision we're annotating.
85 pc: The current offset in the linelog program.
86 emit: A function that accepts a single lineinfo object.
87
88 Returns:
89 The new value of pc. Returns None if exeuction should stop
90 (that is, we've found the end of the file.)
91 """
92
93 class _jge(_llinstruction):
94 """If the current rev is greater than or equal to op1, jump to op2."""
95
96 def __init__(self, op1, op2):
97 self._cmprev = op1
98 self._target = op2
99
100 def __str__(self):
101 return 'JGE %d %d' % (self._cmprev, self._target)
102
103 def __eq__(self, other):
104 return (type(self) == type(other)
105 and self._cmprev == other._cmprev
106 and self._target == other._target)
107
108 def encode(self):
109 return _llentry.pack(self._cmprev << 2, self._target)
110
111 def execute(self, rev, pc, emit):
112 if rev >= self._cmprev:
113 return self._target
114 return pc + 1
115
116 class _jump(_llinstruction):
117 """Unconditional jumps are expressed as a JGE with op1 set to 0."""
118
119 def __init__(self, op1, op2):
120 if op1 != 0:
121 raise LineLogError("malformed JUMP, op1 must be 0, got %d" % op1)
122 self._target = op2
123
124 def __str__(self):
125 return 'JUMP %d' % (self._target)
126
127 def __eq__(self, other):
128 return (type(self) == type(other)
129 and self._target == other._target)
130
131 def encode(self):
132 return _llentry.pack(0, self._target)
133
134 def execute(self, rev, pc, emit):
135 return self._target
136
137 class _eof(_llinstruction):
138 """EOF is expressed as a JGE that always jumps to 0."""
139
140 def __init__(self, op1, op2):
141 if op1 != 0:
142 raise LineLogError("malformed EOF, op1 must be 0, got %d" % op1)
143 if op2 != 0:
144 raise LineLogError("malformed EOF, op2 must be 0, got %d" % op2)
145
146 def __str__(self):
147 return 'EOF'
148
149 def __eq__(self, other):
150 return type(self) == type(other)
151
152 def encode(self):
153 return _llentry.pack(0, 0)
154
155 def execute(self, rev, pc, emit):
156 return None
157
158 class _jl(_llinstruction):
159 """If the current rev is less than op1, jump to op2."""
160
161 def __init__(self, op1, op2):
162 self._cmprev = op1
163 self._target = op2
164
165 def __str__(self):
166 return 'JL %d %d' % (self._cmprev, self._target)
167
168 def __eq__(self, other):
169 return (type(self) == type(other)
170 and self._cmprev == other._cmprev
171 and self._target == other._target)
172
173 def encode(self):
174 return _llentry.pack(1 | (self._cmprev << 2), self._target)
175
176 def execute(self, rev, pc, emit):
177 if rev < self._cmprev:
178 return self._target
179 return pc + 1
180
181 class _line(_llinstruction):
182 """Emit a line."""
183
184 def __init__(self, op1, op2):
185 # This line was introduced by this revision number.
186 self._rev = op1
187 # This line had the specified line number in the introducing revision.
188 self._origlineno = op2
189
190 def __str__(self):
191 return 'LINE %d %d' % (self._rev, self._origlineno)
192
193 def __eq__(self, other):
194 return (type(self) == type(other)
195 and self._rev == other._rev
196 and self._origlineno == other._origlineno)
197
198 def encode(self):
199 return _llentry.pack(2 | (self._rev << 2), self._origlineno)
200
201 def execute(self, rev, pc, emit):
202 emit(lineinfo(self._rev, self._origlineno, pc))
203 return pc + 1
204
205 def _decodeone(data, offset):
206 """Decode a single linelog instruction from an offset in a buffer."""
207 try:
208 op1, op2 = _llentry.unpack_from(data, offset)
209 except struct.error as e:
210 raise LineLogError('reading an instruction failed: %r' % e)
211 opcode = op1 & 0b11
212 op1 = op1 >> 2
213 if opcode == 0:
214 if op1 == 0:
215 if op2 == 0:
216 return _eof(op1, op2)
217 return _jump(op1, op2)
218 return _jge(op1, op2)
219 elif opcode == 1:
220 return _jl(op1, op2)
221 elif opcode == 2:
222 return _line(op1, op2)
223 raise NotImplementedError('Unimplemented opcode %r' % opcode)
224
225 class linelog(object):
226 """Efficient cache for per-line history information."""
227
228 def __init__(self, program=None, maxrev=0):
229 if program is None:
230 # We pad the program with an extra leading EOF so that our
231 # offsets will match the C code exactly. This means we can
232 # interoperate with the C code.
233 program = [_eof(0, 0), _eof(0, 0)]
234 self._program = program
235 self._lastannotate = None
236 self._maxrev = maxrev
237
238 def __eq__(self, other):
239 return (type(self) == type(other)
240 and self._program == other._program
241 and self._maxrev == other._maxrev)
242
243 def __repr__(self):
244 return '<linelog at %s: maxrev=%d size=%d>' % (
245 hex(id(self)), self._maxrev, len(self._program))
246
247 def debugstr(self):
248 fmt = '%%%dd %%s' % len(str(len(self._program)))
249 return '\n'.join(
250 fmt % (idx, i) for idx, i in enumerate(self._program[1:], 1))
251
252 @classmethod
253 def fromdata(cls, buf):
254 if len(buf) % _llentry.size != 0:
255 raise LineLogError(
256 "invalid linelog buffer size %d (must be a multiple of %d)" % (
257 len(buf), _llentry.size))
258 expected = len(buf) / _llentry.size
259 fakejge = _decodeone(buf, 0)
260 if isinstance(fakejge, _jump):
261 maxrev = 0
262 else:
263 maxrev = fakejge._cmprev
264 numentries = fakejge._target
265 if expected != numentries:
266 raise LineLogError("corrupt linelog data: claimed"
267 " %d entries but given data for %d entries" % (
268 expected, numentries))
269 instructions = [_eof(0, 0)]
270 for offset in pycompat.xrange(1, numentries):
271 instructions.append(_decodeone(buf, offset * _llentry.size))
272 return cls(instructions, maxrev=maxrev)
273
274 def encode(self):
275 hdr = _jge(self._maxrev, len(self._program)).encode()
276 return hdr + ''.join(i.encode() for i in self._program[1:])
277
278 def clear(self):
279 self._program = []
280 self._maxrev = 0
281 self._lastannotate = None
282
283 def replacelines(self, rev, a1, a2, b1, b2):
284 """Replace lines [a1, a2) with lines [b1, b2)."""
285 if self._lastannotate:
286 # TODO(augie): make replacelines() accept a revision at
287 # which we're editing as well as a revision to mark
288 # responsible for the edits. In hg-experimental it's
289 # stateful like this, so we're doing the same thing to
290 # retain compatibility with absorb until that's imported.
291 ar = self._lastannotate
292 else:
293 ar = self.annotate(rev)
294 # ar = self.annotate(self._maxrev)
295 if a1 > len(ar.lines):
296 raise LineLogError(
297 '%d contains %d lines, tried to access line %d' % (
298 rev, len(ar.lines), a1))
299 elif a1 == len(ar.lines):
300 # Simulated EOF instruction since we're at EOF, which
301 # doesn't have a "real" line.
302 a1inst = _eof(0, 0)
303 a1info = lineinfo(0, 0, ar._eof)
304 else:
305 a1info = ar.lines[a1]
306 a1inst = self._program[a1info._offset]
307 oldproglen = len(self._program)
308 appendinst = self._program.append
309
310 # insert
311 if b1 < b2:
312 # Determine the jump target for the JGE at the start of
313 # the new block.
314 tgt = oldproglen + (b2 - b1 + 1)
315 # Jump to skip the insert if we're at an older revision.
316 appendinst(_jl(rev, tgt))
317 for linenum in pycompat.xrange(b1, b2):
318 appendinst(_line(rev, linenum))
319 # delete
320 if a1 < a2:
321 if a2 > len(ar.lines):
322 raise LineLogError(
323 '%d contains %d lines, tried to access line %d' % (
324 rev, len(ar.lines), a2))
325 elif a2 == len(ar.lines):
326 endaddr = ar._eof
327 else:
328 endaddr = ar.lines[a2]._offset
329 if a2 > 0 and rev < self._maxrev:
330 # If we're here, we're deleting a chunk of an old
331 # commit, so we need to be careful and not touch
332 # invisible lines between a2-1 and a2 (IOW, lines that
333 # are added later).
334 endaddr = ar.lines[a2 - 1]._offset + 1
335 appendinst(_jge(rev, endaddr))
336 # copy instruction from a1
337 appendinst(a1inst)
338 # if a1inst isn't a jump or EOF, then we need to add an unconditional
339 # jump back into the program here.
340 if not isinstance(a1inst, (_jump, _eof)):
341 appendinst(_jump(0, a1info._offset + 1))
342 # Patch instruction at a1, which makes our patch live.
343 self._program[a1info._offset] = _jump(0, oldproglen)
344 # For compat with the C version, re-annotate rev so that
345 # self.annotateresult is cromulent.. We could fix up the
346 # annotateresult in place (which is how the C version works),
347 # but for now we'll pass on that and see if it matters in
348 # practice.
349 self.annotate(max(self._lastannotate.rev, rev))
350 if rev > self._maxrev:
351 self._maxrev = rev
352
353 def annotate(self, rev):
354 pc = 1
355 lines = []
356 # Sanity check: if len(lines) is longer than len(program), we
357 # hit an infinite loop in the linelog program somehow and we
358 # should stop.
359 while pc is not None and len(lines) < len(self._program):
360 inst = self._program[pc]
361 lastpc = pc
362 pc = inst.execute(rev, pc, lines.append)
363 if pc is not None:
364 raise LineLogError(
365 'Probably hit an infinite loop in linelog. Program:\n' +
366 self.debugstr())
367 ar = annotateresult(rev, lines, lastpc)
368 self._lastannotate = ar
369 return ar
370
371 @property
372 def maxrev(self):
373 return self._maxrev
374
375 # Stateful methods which depend on the value of the last
376 # annotation run. This API is for compatiblity with the original
377 # linelog, and we should probably consider refactoring it.
378 @property
379 def annotateresult(self):
380 """Return the last annotation result. C linelog code exposed this."""
381 return [(l.rev, l.linenum) for l in self._lastannotate.lines]
382
383 def getoffset(self, line):
384 return self._lastannotate.lines[line]._offset
385
386 def getalllines(self, start=0, end=0):
387 """Get all lines that ever occurred in [start, end).
388
389 Passing start == end == 0 means "all lines ever".
390
391 This works in terms of *internal* program offsets, not line numbers.
392 """
393 pc = start or 1
394 lines = []
395 # only take as many steps as there are instructions in the
396 # program - if we don't find an EOF or our stop-line before
397 # then, something is badly broken.
398 for step in pycompat.xrange(len(self._program)):
399 inst = self._program[pc]
400 nextpc = pc + 1
401 if isinstance(inst, _jump):
402 nextpc = inst._target
403 elif isinstance(inst, _eof):
404 return lines
405 elif isinstance(inst, (_jl, _jge)):
406 pass
407 elif isinstance(inst, _line):
408 lines.append((inst._rev, inst._origlineno))
409 else:
410 raise LineLogError("Illegal instruction %r" % inst)
411 if nextpc == end:
412 return lines
413 pc = nextpc
414 raise LineLogError("Failed to perform getalllines")
@@ -0,0 +1,173 b''
1 from __future__ import absolute_import, print_function
2
3 import difflib
4 import random
5 import unittest
6
7 from mercurial import linelog
8
9 maxlinenum = 0xffffff
10 maxb1 = 0xffffff
11 maxdeltaa = 10
12 maxdeltab = 10
13
14 def _genedits(seed, endrev):
15 lines = []
16 random.seed(seed)
17 rev = 0
18 for rev in range(0, endrev):
19 n = len(lines)
20 a1 = random.randint(0, n)
21 a2 = random.randint(a1, min(n, a1 + maxdeltaa))
22 b1 = random.randint(0, maxb1)
23 b2 = random.randint(b1, b1 + maxdeltab)
24 blines = [(rev, idx) for idx in range(b1, b2)]
25 lines[a1:a2] = blines
26 yield lines, rev, a1, a2, b1, b2
27
28 class linelogtests(unittest.TestCase):
29 def testlinelogencodedecode(self):
30 program = [linelog._eof(0, 0),
31 linelog._jge(41, 42),
32 linelog._jump(0, 43),
33 linelog._eof(0, 0),
34 linelog._jl(44, 45),
35 linelog._line(46, 47),
36 ]
37 ll = linelog.linelog(program, maxrev=100)
38 enc = ll.encode()
39 # round-trips okay
40 self.assertEqual(linelog.linelog.fromdata(enc)._program, ll._program)
41 self.assertEqual(linelog.linelog.fromdata(enc), ll)
42 # This encoding matches the encoding used by hg-experimental's
43 # linelog file, or is supposed to if it doesn't.
44 self.assertEqual(enc, ('\x00\x00\x01\x90\x00\x00\x00\x06'
45 '\x00\x00\x00\xa4\x00\x00\x00*'
46 '\x00\x00\x00\x00\x00\x00\x00+'
47 '\x00\x00\x00\x00\x00\x00\x00\x00'
48 '\x00\x00\x00\xb1\x00\x00\x00-'
49 '\x00\x00\x00\xba\x00\x00\x00/'))
50
51 def testsimpleedits(self):
52 ll = linelog.linelog()
53 # Initial revision: add lines 0, 1, and 2
54 ll.replacelines(1, 0, 0, 0, 3)
55 self.assertEqual([(l.rev, l.linenum) for l in ll.annotate(1)],
56 [(1, 0),
57 (1, 1),
58 (1, 2),
59 ])
60 # Replace line 1 with a new line
61 ll.replacelines(2, 1, 2, 1, 2)
62 self.assertEqual([(l.rev, l.linenum) for l in ll.annotate(2)],
63 [(1, 0),
64 (2, 1),
65 (1, 2),
66 ])
67 # delete a line out of 2
68 ll.replacelines(3, 1, 2, 0, 0)
69 self.assertEqual([(l.rev, l.linenum) for l in ll.annotate(3)],
70 [(1, 0),
71 (1, 2),
72 ])
73 # annotation of 1 is unchanged
74 self.assertEqual([(l.rev, l.linenum) for l in ll.annotate(1)],
75 [(1, 0),
76 (1, 1),
77 (1, 2),
78 ])
79 ll.annotate(3) # set internal state to revision 3
80 start = ll.getoffset(0)
81 end = ll.getoffset(1)
82 self.assertEqual(ll.getalllines(start, end), [
83 (1, 0),
84 (2, 1),
85 (1, 1),
86 ])
87 self.assertEqual(ll.getalllines(), [
88 (1, 0),
89 (2, 1),
90 (1, 1),
91 (1, 2),
92 ])
93
94 def testparseclinelogfile(self):
95 # This data is what the replacements in testsimpleedits
96 # produce when fed to the original linelog.c implementation.
97 data = ('\x00\x00\x00\x0c\x00\x00\x00\x0f'
98 '\x00\x00\x00\x00\x00\x00\x00\x02'
99 '\x00\x00\x00\x05\x00\x00\x00\x06'
100 '\x00\x00\x00\x06\x00\x00\x00\x00'
101 '\x00\x00\x00\x00\x00\x00\x00\x07'
102 '\x00\x00\x00\x06\x00\x00\x00\x02'
103 '\x00\x00\x00\x00\x00\x00\x00\x00'
104 '\x00\x00\x00\t\x00\x00\x00\t'
105 '\x00\x00\x00\x00\x00\x00\x00\x0c'
106 '\x00\x00\x00\x08\x00\x00\x00\x05'
107 '\x00\x00\x00\x06\x00\x00\x00\x01'
108 '\x00\x00\x00\x00\x00\x00\x00\x05'
109 '\x00\x00\x00\x0c\x00\x00\x00\x05'
110 '\x00\x00\x00\n\x00\x00\x00\x01'
111 '\x00\x00\x00\x00\x00\x00\x00\t')
112 llc = linelog.linelog.fromdata(data)
113 self.assertEqual([(l.rev, l.linenum) for l in llc.annotate(1)],
114 [(1, 0),
115 (1, 1),
116 (1, 2),
117 ])
118 self.assertEqual([(l.rev, l.linenum) for l in llc.annotate(2)],
119 [(1, 0),
120 (2, 1),
121 (1, 2),
122 ])
123 self.assertEqual([(l.rev, l.linenum) for l in llc.annotate(3)],
124 [(1, 0),
125 (1, 2),
126 ])
127 # Check we emit the same bytecode.
128 ll = linelog.linelog()
129 # Initial revision: add lines 0, 1, and 2
130 ll.replacelines(1, 0, 0, 0, 3)
131 # Replace line 1 with a new line
132 ll.replacelines(2, 1, 2, 1, 2)
133 # delete a line out of 2
134 ll.replacelines(3, 1, 2, 0, 0)
135 diff = '\n ' + '\n '.join(difflib.unified_diff(
136 ll.debugstr().splitlines(), llc.debugstr().splitlines(),
137 'python', 'c', lineterm=''))
138 self.assertEqual(ll._program, llc._program, 'Program mismatch: ' + diff)
139 # Done as a secondary step so we get a better result if the
140 # program is where the mismatch is.
141 self.assertEqual(ll, llc)
142 self.assertEqual(ll.encode(), data)
143
144 def testanothersimplecase(self):
145 ll = linelog.linelog()
146 ll.replacelines(3, 0, 0, 0, 2)
147 ll.replacelines(4, 0, 2, 0, 0)
148 self.assertEqual([(l.rev, l.linenum) for l in ll.annotate(4)],
149 [])
150 self.assertEqual([(l.rev, l.linenum) for l in ll.annotate(3)],
151 [(3, 0), (3, 1)])
152 # rev 2 is empty because contents were only ever introduced in rev 3
153 self.assertEqual([(l.rev, l.linenum) for l in ll.annotate(2)],
154 [])
155
156 def testrandomedits(self):
157 # Inspired by original linelog tests.
158 seed = random.random()
159 numrevs = 2000
160 ll = linelog.linelog()
161 # Populate linelog
162 for lines, rev, a1, a2, b1, b2 in _genedits(seed, numrevs):
163 ll.replacelines(rev, a1, a2, b1, b2)
164 ar = ll.annotate(rev)
165 self.assertEqual(ll.annotateresult, lines)
166 # Verify we can get back these states by annotating each rev
167 for lines, rev, a1, a2, b1, b2 in _genedits(seed, numrevs):
168 ar = ll.annotate(rev)
169 self.assertEqual([(l.rev, l.linenum) for l in ar], lines)
170
171 if __name__ == '__main__':
172 import silenttestrunner
173 silenttestrunner.main(__name__)
@@ -46,6 +46,7 b''
46 <File Id="internals.censor.txt" Name="censor.txt" />
46 <File Id="internals.censor.txt" Name="censor.txt" />
47 <File Id="internals.changegroups.txt" Name="changegroups.txt" />
47 <File Id="internals.changegroups.txt" Name="changegroups.txt" />
48 <File Id="internals.config.txt" Name="config.txt" />
48 <File Id="internals.config.txt" Name="config.txt" />
49 <File Id="internals.linelog.txt" Name="linelog.txt" />
49 <File Id="internals.requirements.txt" Name="requirements.txt" />
50 <File Id="internals.requirements.txt" Name="requirements.txt" />
50 <File Id="internals.revlogs.txt" Name="revlogs.txt" />
51 <File Id="internals.revlogs.txt" Name="revlogs.txt" />
51 <File Id="internals.wireprotocol.txt" Name="wireprotocol.txt" />
52 <File Id="internals.wireprotocol.txt" Name="wireprotocol.txt" />
General Comments 0
You need to be logged in to leave comments. Login now