revlog: remove some revlogNG terminology...
Gregory Szorc
r32392:d47b6236 default
@@ -1,215 +1,215 @@
Revision logs - or *revlogs* - are an append-only data structure for
storing discrete entries, or *revisions*. They are the primary storage
mechanism of repository data.

Revlogs effectively model a directed acyclic graph (DAG). Each node
has edges to 1 or 2 *parent* nodes. Each node contains metadata and
the raw value for that node.

Revlogs consist of entries which have metadata and revision data.
Metadata includes the hash of the revision's content, sizes, and
links to its *parent* entries. The collective metadata is referred
to as the *index* and the revision data is the *data*.

Revision data is stored as a series of compressed deltas against previous
revisions.

Revlogs are written in an append-only fashion. We never need to rewrite
a file to insert data, nor do we need to remove data. Rolling back
in-progress writes can be performed by truncating files. Read locks can
be avoided using simple techniques. This means that references to other
data in the same revlog *always* refer to a previous entry.

Revlogs can be modeled as 0-indexed arrays. The first revision is
revision #0 and the second is revision #1. Revision -1 is typically
used to mean *does not exist* or *not defined*.

File Format
===========

A revlog begins with a 32-bit big endian integer holding version info
and feature flags. This integer is shared with the first revision
entry.

This integer is logically divided into two 16-bit shorts. The least
significant half of the integer is the format/version short. The other
short holds feature flags that dictate behavior of the revlog.

Only 1 bit of the format/version short is currently used. Remaining
bits are reserved for future use.

The following values for the format/version short are defined:

0
   The original revlog version.
1
   RevlogNG (*next generation*). It replaced version 0 when it was
   implemented in 2006.

The feature flags short consists of bit flags, where 0 is the least
significant bit. The following bit offsets define flags:

0
   Store revision data inline.
1
   Generaldelta encoding.

2-15
   Reserved for future use.

The following header values are common:

00 00 00 01
-   RevlogNG
+   v1
00 01 00 01
-   RevlogNG + inline
+   v1 + inline
00 02 00 01
-   RevlogNG + generaldelta
+   v1 + generaldelta
00 03 00 01
-   RevlogNG + inline + generaldelta
+   v1 + inline + generaldelta
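
A minimal sketch of decoding this header; the flag masks match the
``FLAG_INLINE_DATA`` and ``FLAG_GENERALDELTA`` constants in ``revlog.py``
below, and the ``parseheader`` helper is illustrative only::

    import struct

    FLAG_INLINE_DATA = 1 << 16
    FLAG_GENERALDELTA = 1 << 17

    def parseheader(first4):
        # One big endian 32-bit integer: the high short holds the
        # feature flags, the low short holds the format/version.
        v = struct.unpack('>I', first4)[0]
        version = v & 0xFFFF
        return version, bool(v & FLAG_INLINE_DATA), bool(v & FLAG_GENERALDELTA)

    # b'\x00\x03\x00\x01' is the "v1 + inline + generaldelta" header:
    assert parseheader(b'\x00\x03\x00\x01') == (1, True, True)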

Following the 32-bit header is the remainder of the first index entry.
Following that is the remaining *index* data. Inlined revision data is
possibly located between index entries. More on this layout is described
below.

-RevlogNG Format
-===============
+Version 1 Format
+================

-RevlogNG (version 1) begins with an index describing the revisions in
+Version 1 (RevlogNG) begins with an index describing the revisions in
the revlog. If the ``inline`` flag is set, revision data is stored inline,
or between index entries (as opposed to in a separate container).

Each index entry is 64 bytes. The byte layout of each entry is as
follows, with byte 0 being the first byte (all data stored as big endian):

0-3 (4 bytes) (rev 0 only)
   Revlog header

0-5 (6 bytes)
   Absolute offset of revision data from beginning of revlog.

6-7 (2 bytes)
   Bit flags impacting revision behavior. The following bit offsets are
   defined:

   0: REVIDX_ISCENSORED revision has censor metadata, must be verified.

   1: REVIDX_ELLIPSIS revision hash does not match its data. Used by
   narrowhg

   2: REVIDX_EXTSTORED revision data is stored externally.

8-11 (4 bytes)
   Compressed length of revision data / chunk as stored in revlog.

12-15 (4 bytes)
   Uncompressed length of revision data. This is the size of the full
   revision data, not the size of the chunk post decompression.

16-19 (4 bytes)
   Base or previous revision this revision's delta was produced against.
   This revision holds full text (as opposed to a delta) if it points to
   itself. For generaldelta repos, this is the previous revision in the
   delta chain. For non-generaldelta repos, this is the base or first
   revision in the delta chain.

20-23 (4 bytes)
   A revision this revision is *linked* to. This allows a revision in
   one revlog to be forever associated with a revision in another
   revlog. For example, a file's revlog may point to the changelog
   revision that introduced it.

24-27 (4 bytes)
   Revision of 1st parent. -1 indicates no parent.

28-31 (4 bytes)
   Revision of 2nd parent. -1 indicates no 2nd parent.

32-63 (32 bytes)
   Hash of revision's full text. Currently, SHA-1 is used and only
   the first 20 bytes of this field are used. The rest of the bytes
   are ignored and should be stored as \0.
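
A minimal sketch of decoding one entry with Python's ``struct`` module;
the format string matches ``indexformatng`` in the implementation below,
and the ``parseentry`` helper is illustrative only::

    import struct

    INDEX_ENTRY = struct.Struct('>Qiiiiii20s12x')
    assert INDEX_ENTRY.size == 64

    def parseentry(data):
        (offset_flags, complen, rawlen, baserev,
         linkrev, p1rev, p2rev, nodeid) = INDEX_ENTRY.unpack(data)
        return {
            'offset': offset_flags >> 16,    # bytes 0-5
            'flags': offset_flags & 0xFFFF,  # bytes 6-7
            'compressedlength': complen,     # bytes 8-11
            'uncompressedlength': rawlen,    # bytes 12-15
            'baserev': baserev,              # bytes 16-19
            'linkrev': linkrev,              # bytes 20-23
            'p1rev': p1rev,                  # bytes 24-27
            'p2rev': p2rev,                  # bytes 28-31
            'nodeid': nodeid,                # first 20 of bytes 32-63
        }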

If inline revision data is being stored, the compressed revision data
(with length taken from bytes 8-11 of the index entry) immediately
follows the index entry. There is no header on the revision data. There
is no padding between it and the index entries before and after.

If revision data is not inline, then raw revision data is stored in a
separate byte container. The offsets from bytes 0-5 and the compressed
length from bytes 8-11 define how to access this data.

The first 4 bytes of the revlog are shared between the revlog header
and the 6 byte absolute offset field from the first revlog entry.
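
A sketch of locating a revision's chunk on disk; the ``chunklocation``
helper is illustrative, and assumes (as the implementation does) that for
inline revlogs the physical position must also skip the ``rev + 1`` index
entries written before the chunk::

    ENTRYSIZE = 64

    def chunklocation(offset, complen, rev, inline):
        """Return (fileoffset, length) of a revision's stored chunk.

        offset and complen come from index entry bytes 0-5 and 8-11.
        """
        if inline:
            # Data is interleaved with index entries in the .i file,
            # so skip every entry written before this chunk.
            offset += (rev + 1) * ENTRYSIZE
        return offset, complen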

Delta Chains
============

Revision data is encoded as a chain of *chunks*. Each chain begins with
the compressed original full text for that revision. Each subsequent
*chunk* is a *delta* against the previous revision. We therefore call
these chains of chunks/deltas *delta chains*.

The full text for a revision is reconstructed by loading the original
full text for the base revision of a *delta chain* and then applying
*deltas* until the target revision is reconstructed.
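
A hedged sketch of that reconstruction. It assumes chunks are already
decompressed and uses Mercurial's binary delta encoding: a sequence of
big endian ``(start, end, length)`` headers, each followed by ``length``
replacement bytes for the ``[start:end)`` range of the previous text::

    import struct

    def applydelta(text, delta):
        out = []
        last = 0
        pos = 0
        while pos < len(delta):
            start, end, length = struct.unpack('>lll', delta[pos:pos + 12])
            pos += 12
            out.append(text[last:start])         # unchanged prefix
            out.append(delta[pos:pos + length])  # replacement bytes
            pos += length
            last = end
        out.append(text[last:])
        return b''.join(out)

    def buildtext(basetext, deltas):
        # Walk the delta chain from the base revision's full text.
        text = basetext
        for delta in deltas:
            text = applydelta(text, delta)
        return text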

*Delta chains* are limited in length so lookup time is bounded. They are
limited to ~2x the length of the revision's data. The linear distance
between the base chunk and the final chunk is also limited so the
amount of read I/O to load all chunks in the delta chain is bounded.

Deltas and delta chains are either computed against the previous
revision in the revlog or another revision (almost certainly one of
the parents of the revision). Historically, deltas were computed against
the previous revision. The *generaldelta* revlog feature flag (enabled
by default in Mercurial 3.7) activates the mode where deltas are
computed against an arbitrary revision (almost certainly a parent revision).

File Storage
============

Revlogs logically consist of an index (metadata of entries) and
revision data. This data may be stored together in a single file or in
separate files. The mechanism used is indicated by the ``inline`` feature
flag on the revlog.

Mercurial's behavior is to use inline storage until a revlog reaches a
certain size, at which point it will be converted to non-inline. The
reason there is a size limit on inline storage is to establish an upper
bound on how much data must be read to load the index. It would be a waste
to read tens or hundreds of extra megabytes of data just to access the
index data.

The actual layout of revlog files on disk is governed by the repository's
*store format*. Typically, a ``.i`` file represents the index revlog
(possibly containing inline data) and a ``.d`` file holds the revision data.
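
In the implementation below, this threshold is ``_maxinline = 131072``
bytes: once an inline revlog grows past 128 KiB, it is migrated to the
split ``.i``/``.d`` layout.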

Revision Entries
================

Revision entries consist of an optional 1 byte header followed by an
encoding of the revision data. The headers are as follows:

\0 (0x00)
   Revision data is the entirety of the entry, including this header.
u (0x75)
   Raw revision data follows.
x (0x78)
   zlib (RFC 1950) data.

The 0x78 value is actually the first byte of the zlib header (CMF byte).
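
A sketch of decoding a stored chunk according to this header byte; the
``decompresschunk`` helper is illustrative and mirrors the table above,
with error handling elided::

    import zlib

    def decompresschunk(data):
        if not data:
            return data
        header = data[0:1]
        if header == b'\0':
            # The entry, including this header byte, is the literal
            # revision data.
            return data
        if header == b'u':
            # Raw revision data follows the 1 byte header.
            return data[1:]
        if header == b'x':
            # zlib stream; 0x78 doubles as the zlib CMF byte, so the
            # header byte is part of the compressed stream itself.
            return zlib.decompress(data)
        raise ValueError('unknown compression header %r' % header)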

Hash Computation
================

The hash of the revision is stored in the index and is used both as a primary
key and for data integrity verification.

Currently, SHA-1 is the only supported hashing algorithm. To obtain the SHA-1
hash of a revision:

1. Hash the parent nodes
2. Hash the fulltext of the revision

The 20 byte node ids of the parents are fed into the hasher in ascending order.
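
This matches the ``hash()`` function in the module below; as a standalone
sketch (the ``nodehash`` name is illustrative)::

    import hashlib

    def nodehash(fulltext, p1, p2):
        # p1 and p2 are 20-byte parent node ids; the smaller one is
        # fed to the hasher first.
        s = hashlib.sha1(min(p1, p2))
        s.update(max(p1, p2))
        s.update(fulltext)
        return s.digest()  # 20-byte node id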
@@ -1,2156 +1,2156 @@
# revlog.py - storage back-end for mercurial
#
# Copyright 2005-2007 Matt Mackall <mpm@selenic.com>
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.

"""Storage back-end for Mercurial.

This provides efficient delta storage with O(1) retrieve and append
and O(changes) merge between branches.
"""

from __future__ import absolute_import

import collections
import errno
import hashlib
import os
import struct
import zlib

# import stuff from node for others to import from revlog
from .node import (
    bin,
    hex,
    nullid,
    nullrev,
)
from .i18n import _
from . import (
    ancestor,
    error,
    mdiff,
    policy,
    pycompat,
    templatefilters,
    util,
)

parsers = policy.importmod(r'parsers')

_pack = struct.pack
_unpack = struct.unpack
# Aliased for performance.
_zlibdecompress = zlib.decompress

# revlog header flags
REVLOGV0 = 0
REVLOGV1 = 1
FLAG_INLINE_DATA = (1 << 16)
FLAG_GENERALDELTA = (1 << 17)
REVLOG_DEFAULT_FLAGS = FLAG_INLINE_DATA
REVLOG_DEFAULT_FORMAT = REVLOGV1
REVLOG_DEFAULT_VERSION = REVLOG_DEFAULT_FORMAT | REVLOG_DEFAULT_FLAGS
REVLOGV1_FLAGS = FLAG_INLINE_DATA | FLAG_GENERALDELTA

# revlog index flags
REVIDX_ISCENSORED = (1 << 15) # revision has censor metadata, must be verified
REVIDX_ELLIPSIS = (1 << 14) # revision hash does not match data (narrowhg)
REVIDX_EXTSTORED = (1 << 13) # revision data is stored externally
REVIDX_DEFAULT_FLAGS = 0
# stable order in which flags need to be processed and their processors applied
REVIDX_FLAGS_ORDER = [
    REVIDX_ISCENSORED,
    REVIDX_ELLIPSIS,
    REVIDX_EXTSTORED,
]
REVIDX_KNOWN_FLAGS = util.bitsfrom(REVIDX_FLAGS_ORDER)

# max size of revlog with inline data
_maxinline = 131072
_chunksize = 1048576

RevlogError = error.RevlogError
LookupError = error.LookupError
CensoredNodeError = error.CensoredNodeError
ProgrammingError = error.ProgrammingError

# Store flag processors (cf. 'addflagprocessor()' to register)
_flagprocessors = {
    REVIDX_ISCENSORED: None,
}

def addflagprocessor(flag, processor):
    """Register a flag processor on a revision data flag.

    Invariant:
    - Flags need to be defined in REVIDX_KNOWN_FLAGS and REVIDX_FLAGS_ORDER.
    - Only one flag processor can be registered on a specific flag.
    - flagprocessors must be 3-tuples of functions (read, write, raw) with the
      following signatures:
          - (read)  f(self, rawtext) -> text, bool
          - (write) f(self, text) -> rawtext, bool
          - (raw)   f(self, rawtext) -> bool
      "text" is presented to the user. "rawtext" is stored in revlog data, not
      directly visible to the user.
      The boolean returned by these transforms is used to determine whether
      the returned text can be used for hash integrity checking. For example,
      if "write" returns False, then "text" is used to generate hash. If
      "write" returns True, that basically means "rawtext" returned by "write"
      should be used to generate hash. Usually, "write" and "read" return
      different booleans. "raw" returns the same boolean as "write".

    Note: The 'raw' transform is used for changegroup generation and in some
    debug commands. In this case the transform only indicates whether the
    contents can be used for hash integrity checks.
    """
    if not flag & REVIDX_KNOWN_FLAGS:
        msg = _("cannot register processor on unknown flag '%#x'.") % (flag)
        raise ProgrammingError(msg)
    if flag not in REVIDX_FLAGS_ORDER:
        msg = _("flag '%#x' undefined in REVIDX_FLAGS_ORDER.") % (flag)
        raise ProgrammingError(msg)
    if flag in _flagprocessors:
        msg = _("cannot register multiple processors on flag '%#x'.") % (flag)
        raise error.Abort(msg)
    _flagprocessors[flag] = processor

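# The first entry of an index tuple packs the data offset and the index
# flags into a single integer: the offset lives in the high bits
# (offset << 16) and the 16 per-revision flag bits live in the low bits.
# The helpers below pull the two halves back apart.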
def getoffset(q):
    return int(q >> 16)

def gettype(q):
    return int(q & 0xFFFF)

def offset_type(offset, type):
    if (type & ~REVIDX_KNOWN_FLAGS) != 0:
        raise ValueError('unknown revlog index flags')
    return int(int(offset) << 16 | type)

_nullhash = hashlib.sha1(nullid)

def hash(text, p1, p2):
    """generate a hash from the given text and its parent hashes

    This hash combines both the current file contents and its history
    in a manner that makes it easy to distinguish nodes with the same
    content in the revision graph.
    """
    # As of now, if one of the parent nodes is null, p2 is null
    if p2 == nullid:
        # deep copy of a hash is faster than creating one
        s = _nullhash.copy()
        s.update(p1)
    else:
        # none of the parent nodes are nullid
        l = [p1, p2]
        l.sort()
        s = hashlib.sha1(l[0])
        s.update(l[1])
    s.update(text)
    return s.digest()
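
# Illustration (hypothetical values): for a root revision both parents are
# nullid, so the node is sha1(nullid + nullid + text), e.g.
#   node = hash(b'file content', nullid, nullid)  # 20-byte digest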

# index v0:
#  4 bytes: offset
#  4 bytes: compressed length
#  4 bytes: base rev
#  4 bytes: link rev
# 20 bytes: parent 1 nodeid
# 20 bytes: parent 2 nodeid
# 20 bytes: nodeid
indexformatv0 = ">4l20s20s20s"

class revlogoldio(object):
    def __init__(self):
        self.size = struct.calcsize(indexformatv0)

    def parseindex(self, data, inline):
        s = self.size
        index = []
        nodemap = {nullid: nullrev}
        n = off = 0
        l = len(data)
        while off + s <= l:
            cur = data[off:off + s]
            off += s
            e = _unpack(indexformatv0, cur)
            # transform to revlogv1 format
            e2 = (offset_type(e[0], 0), e[1], -1, e[2], e[3],
                  nodemap.get(e[4], nullrev), nodemap.get(e[5], nullrev), e[6])
            index.append(e2)
            nodemap[e[6]] = n
            n += 1

        # add the magic null revision at -1
        index.append((0, 0, 0, -1, -1, -1, -1, nullid))

        return index, nodemap, None

    def packentry(self, entry, node, version, rev):
        if gettype(entry[0]):
-            raise RevlogError(_("index entry flags need RevlogNG"))
+            raise RevlogError(_('index entry flags need revlog version 1'))
        e2 = (getoffset(entry[0]), entry[1], entry[3], entry[4],
              node(entry[5]), node(entry[6]), entry[7])
        return _pack(indexformatv0, *e2)

# index ng:
#  6 bytes: offset
#  2 bytes: flags
#  4 bytes: compressed length
#  4 bytes: uncompressed length
#  4 bytes: base rev
#  4 bytes: link rev
#  4 bytes: parent 1 rev
#  4 bytes: parent 2 rev
# 32 bytes: nodeid
indexformatng = ">Qiiiiii20s12x"
versionformat = ">I"

# corresponds to uncompressed length of indexformatng (2 gigs, 4-byte
# signed integer)
_maxentrysize = 0x7fffffff

class revlogio(object):
    def __init__(self):
        self.size = struct.calcsize(indexformatng)

    def parseindex(self, data, inline):
        # call the C implementation to parse the index data
        index, cache = parsers.parse_index2(data, inline)
        return index, getattr(index, 'nodemap', None), cache

    def packentry(self, entry, node, version, rev):
        p = _pack(indexformatng, *entry)
        if rev == 0:
            # The entry for rev 0 shares its first 4 bytes with the revlog
            # header, so splice the version/flags integer in front.
            p = _pack(versionformat, version) + p[4:]
        return p

class revlog(object):
    """
    the underlying revision storage object

    A revlog consists of two parts, an index and the revision data.

    The index is a file with a fixed record size containing
    information on each revision, including its nodeid (hash), the
    nodeids of its parents, the position and offset of its data within
    the data file, and the revision it's based on. Finally, each entry
    contains a linkrev entry that can serve as a pointer to external
    data.

    The revision data itself is a linear collection of data chunks.
    Each chunk represents a revision and is usually represented as a
    delta against the previous chunk. To bound lookup time, runs of
    deltas are limited to about 2 times the length of the original
    version data. This makes retrieval of a version proportional to
    its size, or O(1) relative to the number of revisions.

    Both pieces of the revlog are written to in an append-only
    fashion, which means we never need to rewrite a file to insert or
    remove data, and can use some simple techniques to avoid the need
    for locking while reading.

    If checkambig, indexfile is opened with checkambig=True at
    writing, to avoid file stat ambiguity.
    """
    def __init__(self, opener, indexfile, datafile=None, checkambig=False):
        """
        create a revlog object

        opener is a function that abstracts the file opening operation
        and can be used to implement COW semantics or the like.
        """
        self.indexfile = indexfile
        self.datafile = datafile or (indexfile[:-2] + ".d")
        self.opener = opener
        # When True, indexfile is opened with checkambig=True at writing, to
        # avoid file stat ambiguity.
        self._checkambig = checkambig
        # 3-tuple of (node, rev, text) for a raw revision.
        self._cache = None
        # Maps rev to chain base rev.
        self._chainbasecache = util.lrucachedict(100)
        # 2-tuple of (offset, data) of raw data from the revlog at an offset.
        self._chunkcache = (0, '')
        # How much data to read and cache into the raw revlog data cache.
        self._chunkcachesize = 65536
        self._maxchainlen = None
        self._aggressivemergedeltas = False
        self.index = []
        # Mapping of partial identifiers to full nodes.
        self._pcache = {}
        # Mapping of revision integer to full node.
        self._nodecache = {nullid: nullrev}
        self._nodepos = None
        self._compengine = 'zlib'

        v = REVLOG_DEFAULT_VERSION
        opts = getattr(opener, 'options', None)
        if opts is not None:
            if 'revlogv1' in opts:
                if 'generaldelta' in opts:
                    v |= FLAG_GENERALDELTA
            else:
                v = 0
            if 'chunkcachesize' in opts:
                self._chunkcachesize = opts['chunkcachesize']
            if 'maxchainlen' in opts:
                self._maxchainlen = opts['maxchainlen']
            if 'aggressivemergedeltas' in opts:
                self._aggressivemergedeltas = opts['aggressivemergedeltas']
            self._lazydeltabase = bool(opts.get('lazydeltabase', False))
            if 'compengine' in opts:
                self._compengine = opts['compengine']

        if self._chunkcachesize <= 0:
            raise RevlogError(_('revlog chunk cache size %r is not greater '
                                'than 0') % self._chunkcachesize)
        elif self._chunkcachesize & (self._chunkcachesize - 1):
            raise RevlogError(_('revlog chunk cache size %r is not a power '
                                'of 2') % self._chunkcachesize)

        indexdata = ''
        self._initempty = True
        try:
            f = self.opener(self.indexfile)
            indexdata = f.read()
            f.close()
            if len(indexdata) > 0:
                v = struct.unpack(versionformat, indexdata[:4])[0]
                self._initempty = False
        except IOError as inst:
            if inst.errno != errno.ENOENT:
                raise

        self.version = v
        self._inline = v & FLAG_INLINE_DATA
        self._generaldelta = v & FLAG_GENERALDELTA
        flags = v & ~0xFFFF
        fmt = v & 0xFFFF
        if fmt == REVLOGV0:
            if flags:
                raise RevlogError(_('unknown flags (%#04x) in version %d '
                                    'revlog %s') %
                                  (flags >> 16, fmt, self.indexfile))
        elif fmt == REVLOGV1:
            if flags & ~REVLOGV1_FLAGS:
                raise RevlogError(_('unknown flags (%#04x) in version %d '
                                    'revlog %s') %
                                  (flags >> 16, fmt, self.indexfile))
        else:
            raise RevlogError(_('unknown version (%d) in revlog %s') %
                              (fmt, self.indexfile))

        self.storedeltachains = True

        self._io = revlogio()
        if self.version == REVLOGV0:
            self._io = revlogoldio()
        try:
            d = self._io.parseindex(indexdata, self._inline)
        except (ValueError, IndexError):
            raise RevlogError(_("index %s is corrupted") % (self.indexfile))
        self.index, nodemap, self._chunkcache = d
        if nodemap is not None:
            self.nodemap = self._nodecache = nodemap
        if not self._chunkcache:
            self._chunkclear()
        # revnum -> (chain-length, sum-delta-length)
        self._chaininfocache = {}
        # revlog header -> revlog compressor
        self._decompressors = {}

    @util.propertycache
    def _compressor(self):
        return util.compengines[self._compengine].revlogcompressor()

    def tip(self):
        return self.node(len(self.index) - 2)
    def __contains__(self, rev):
        return 0 <= rev < len(self)
    def __len__(self):
        return len(self.index) - 1
    def __iter__(self):
        return iter(xrange(len(self)))
    def revs(self, start=0, stop=None):
        """iterate over all revs in this revlog (from start to stop)"""
        step = 1
        if stop is not None:
            if start > stop:
                step = -1
            stop += step
        else:
            stop = len(self)
        return xrange(start, stop, step)

    @util.propertycache
    def nodemap(self):
        self.rev(self.node(0))
        return self._nodecache

    def hasnode(self, node):
        try:
            self.rev(node)
            return True
        except KeyError:
            return False

    def clearcaches(self):
        self._cache = None
        self._chainbasecache.clear()
        self._chunkcache = (0, '')
        self._pcache = {}

        try:
            self._nodecache.clearcaches()
        except AttributeError:
            self._nodecache = {nullid: nullrev}
            self._nodepos = None

    def rev(self, node):
        try:
            return self._nodecache[node]
        except TypeError:
            raise
        except RevlogError:
            # parsers.c radix tree lookup failed
            raise LookupError(node, self.indexfile, _('no node'))
        except KeyError:
            # pure python cache lookup failed
            n = self._nodecache
            i = self.index
            p = self._nodepos
            if p is None:
                p = len(i) - 2
            for r in xrange(p, -1, -1):
                v = i[r][7]
                n[v] = r
                if v == node:
                    self._nodepos = r - 1
                    return r
            raise LookupError(node, self.indexfile, _('no node'))

    # Accessors for index entries.

    # First tuple entry is 8 bytes. First 6 bytes are offset. Last 2 bytes
    # are flags.
    def start(self, rev):
        return int(self.index[rev][0] >> 16)

    def flags(self, rev):
        return self.index[rev][0] & 0xFFFF

    def length(self, rev):
        return self.index[rev][1]

    def rawsize(self, rev):
        """return the length of the uncompressed text for a given revision"""
        l = self.index[rev][2]
        if l >= 0:
            return l

        t = self.revision(rev, raw=True)
        return len(t)

    def size(self, rev):
        """length of non-raw text (processed by a "read" flag processor)"""
        # fast path: if no "read" flag processor could change the content,
        # size is rawsize. note: ELLIPSIS is known to not change the content.
        flags = self.flags(rev)
        if flags & (REVIDX_KNOWN_FLAGS ^ REVIDX_ELLIPSIS) == 0:
            return self.rawsize(rev)

        return len(self.revision(rev, raw=False))

    def chainbase(self, rev):
        base = self._chainbasecache.get(rev)
        if base is not None:
            return base

        index = self.index
        base = index[rev][3]
        while base != rev:
            rev = base
            base = index[rev][3]

        self._chainbasecache[rev] = base
        return base

    def linkrev(self, rev):
        return self.index[rev][4]

    def parentrevs(self, rev):
        return self.index[rev][5:7]

    def node(self, rev):
        return self.index[rev][7]

    # Derived from index values.

    def end(self, rev):
        return self.start(rev) + self.length(rev)

    def parents(self, node):
        i = self.index
        d = i[self.rev(node)]
        return i[d[5]][7], i[d[6]][7] # map revisions to nodes inline

    def chainlen(self, rev):
        return self._chaininfo(rev)[0]

    def _chaininfo(self, rev):
        chaininfocache = self._chaininfocache
        if rev in chaininfocache:
            return chaininfocache[rev]
        index = self.index
        generaldelta = self._generaldelta
        iterrev = rev
        e = index[iterrev]
        clen = 0
        compresseddeltalen = 0
        while iterrev != e[3]:
            clen += 1
            compresseddeltalen += e[1]
            if generaldelta:
                iterrev = e[3]
            else:
                iterrev -= 1
            if iterrev in chaininfocache:
                t = chaininfocache[iterrev]
                clen += t[0]
                compresseddeltalen += t[1]
                break
            e = index[iterrev]
        else:
            # Add text length of base since decompressing that also takes
            # work. For cache hits the length is already included.
            compresseddeltalen += e[1]
        r = (clen, compresseddeltalen)
        chaininfocache[rev] = r
        return r

    def _deltachain(self, rev, stoprev=None):
        """Obtain the delta chain for a revision.

        ``stoprev`` specifies a revision to stop at. If not specified, we
        stop at the base of the chain.

        Returns a 2-tuple of (chain, stopped) where ``chain`` is a list of
        revs in ascending order and ``stopped`` is a bool indicating whether
        ``stoprev`` was hit.
        """
        chain = []

        # Alias to prevent attribute lookup in tight loop.
        index = self.index
        generaldelta = self._generaldelta

        iterrev = rev
        e = index[iterrev]
        while iterrev != e[3] and iterrev != stoprev:
            chain.append(iterrev)
            if generaldelta:
                iterrev = e[3]
            else:
                iterrev -= 1
            e = index[iterrev]

        if iterrev == stoprev:
            stopped = True
        else:
            chain.append(iterrev)
            stopped = False

        chain.reverse()
        return chain, stopped
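
    # Illustration (hypothetical revisions): in a non-generaldelta revlog
    # where revision 3 is a delta on 2 and 2 is a delta on the full-text
    # base 1, _deltachain(3) returns ([1, 2, 3], False).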

    def ancestors(self, revs, stoprev=0, inclusive=False):
        """Generate the ancestors of 'revs' in reverse topological order.
        Does not generate revs lower than stoprev.

        See the documentation for ancestor.lazyancestors for more details."""

        return ancestor.lazyancestors(self.parentrevs, revs, stoprev=stoprev,
                                      inclusive=inclusive)

    def descendants(self, revs):
        """Generate the descendants of 'revs' in revision order.

        Yield a sequence of revision numbers starting with a child of
        some rev in revs, i.e., each revision is *not* considered a
        descendant of itself. Results are ordered by revision number (a
        topological sort)."""
        first = min(revs)
        if first == nullrev:
            for i in self:
                yield i
            return

        seen = set(revs)
        for i in self.revs(start=first + 1):
            for x in self.parentrevs(i):
                if x != nullrev and x in seen:
                    seen.add(i)
                    yield i
                    break

598 def findcommonmissing(self, common=None, heads=None):
598 def findcommonmissing(self, common=None, heads=None):
599 """Return a tuple of the ancestors of common and the ancestors of heads
599 """Return a tuple of the ancestors of common and the ancestors of heads
600 that are not ancestors of common. In revset terminology, we return the
600 that are not ancestors of common. In revset terminology, we return the
601 tuple:
601 tuple:
602
602
603 ::common, (::heads) - (::common)
603 ::common, (::heads) - (::common)
604
604
605 The list is sorted by revision number, meaning it is
605 The list is sorted by revision number, meaning it is
606 topologically sorted.
606 topologically sorted.
607
607
608 'heads' and 'common' are both lists of node IDs. If heads is
608 'heads' and 'common' are both lists of node IDs. If heads is
609 not supplied, uses all of the revlog's heads. If common is not
609 not supplied, uses all of the revlog's heads. If common is not
610 supplied, uses nullid."""
610 supplied, uses nullid."""
611 if common is None:
611 if common is None:
612 common = [nullid]
612 common = [nullid]
613 if heads is None:
613 if heads is None:
614 heads = self.heads()
614 heads = self.heads()
615
615
616 common = [self.rev(n) for n in common]
616 common = [self.rev(n) for n in common]
617 heads = [self.rev(n) for n in heads]
617 heads = [self.rev(n) for n in heads]
618
618
619 # we want the ancestors, but inclusive
619 # we want the ancestors, but inclusive
620 class lazyset(object):
620 class lazyset(object):
621 def __init__(self, lazyvalues):
621 def __init__(self, lazyvalues):
622 self.addedvalues = set()
622 self.addedvalues = set()
623 self.lazyvalues = lazyvalues
623 self.lazyvalues = lazyvalues
624
624
625 def __contains__(self, value):
625 def __contains__(self, value):
626 return value in self.addedvalues or value in self.lazyvalues
626 return value in self.addedvalues or value in self.lazyvalues
627
627
628 def __iter__(self):
628 def __iter__(self):
629 added = self.addedvalues
629 added = self.addedvalues
630 for r in added:
630 for r in added:
631 yield r
631 yield r
632 for r in self.lazyvalues:
632 for r in self.lazyvalues:
633 if not r in added:
633 if not r in added:
634 yield r
634 yield r
635
635
636 def add(self, value):
636 def add(self, value):
637 self.addedvalues.add(value)
637 self.addedvalues.add(value)
638
638
639 def update(self, values):
639 def update(self, values):
640 self.addedvalues.update(values)
640 self.addedvalues.update(values)
641
641
642 has = lazyset(self.ancestors(common))
642 has = lazyset(self.ancestors(common))
643 has.add(nullrev)
643 has.add(nullrev)
644 has.update(common)
644 has.update(common)
645
645
646 # take all ancestors from heads that aren't in has
646 # take all ancestors from heads that aren't in has
647 missing = set()
647 missing = set()
648 visit = collections.deque(r for r in heads if r not in has)
648 visit = collections.deque(r for r in heads if r not in has)
649 while visit:
649 while visit:
650 r = visit.popleft()
650 r = visit.popleft()
651 if r in missing:
651 if r in missing:
652 continue
652 continue
653 else:
653 else:
654 missing.add(r)
654 missing.add(r)
655 for p in self.parentrevs(r):
655 for p in self.parentrevs(r):
656 if p not in has:
656 if p not in has:
657 visit.append(p)
657 visit.append(p)
658 missing = list(missing)
658 missing = list(missing)
659 missing.sort()
659 missing.sort()
660 return has, [self.node(miss) for miss in missing]
660 return has, [self.node(miss) for miss in missing]
661
661
    def incrementalmissingrevs(self, common=None):
        """Return an object that can be used to incrementally compute the
        revision numbers of the ancestors of arbitrary sets that are not
        ancestors of common. This is an ancestor.incrementalmissingancestors
        object.

        'common' is a list of revision numbers. If common is not supplied, uses
        nullrev.
        """
        if common is None:
            common = [nullrev]

        return ancestor.incrementalmissingancestors(self.parentrevs, common)

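    # Example (illustrative sketch; ``rl`` is assumed to be a revlog
    # instance). The returned object answers missing-ancestor queries
    # without recomputing the ancestor set of ``common`` from scratch,
    # and its set of bases can be grown as more common revisions are
    # discovered:
    #
    #   inc = rl.incrementalmissingrevs(common=[0])
    #   missing = inc.missingancestors([5])   # revset: ::5 - ::0
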
    def findmissingrevs(self, common=None, heads=None):
        """Return the revision numbers of the ancestors of heads that
        are not ancestors of common.

        More specifically, return a list of revision numbers corresponding to
        nodes N such that every N satisfies the following constraints:

          1. N is an ancestor of some node in 'heads'
          2. N is not an ancestor of any node in 'common'

        The list is sorted by revision number, meaning it is
        topologically sorted.

        'heads' and 'common' are both lists of revision numbers. If heads is
        not supplied, uses all of the revlog's heads. If common is not
        supplied, uses nullrev."""
        if common is None:
            common = [nullrev]
        if heads is None:
            heads = self.headrevs()

        inc = self.incrementalmissingrevs(common=common)
        return inc.missingancestors(heads)

    def findmissing(self, common=None, heads=None):
        """Return the ancestors of heads that are not ancestors of common.

        More specifically, return a list of nodes N such that every N
        satisfies the following constraints:

          1. N is an ancestor of some node in 'heads'
          2. N is not an ancestor of any node in 'common'

        The list is sorted by revision number, meaning it is
        topologically sorted.

        'heads' and 'common' are both lists of node IDs. If heads is
        not supplied, uses all of the revlog's heads. If common is not
        supplied, uses nullid."""
        if common is None:
            common = [nullid]
        if heads is None:
            heads = self.heads()

        common = [self.rev(n) for n in common]
        heads = [self.rev(n) for n in heads]

        inc = self.incrementalmissingrevs(common=common)
        return [self.node(r) for r in inc.missingancestors(heads)]

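    # Example (illustrative sketch; ``rl`` is assumed to be a revlog
    # instance, ``n1`` and ``n2`` node IDs). findmissingrevs() and
    # findmissing() compute the same set, one as revision numbers and
    # one as node IDs:
    #
    #   revs = rl.findmissingrevs(common=[rl.rev(n1)], heads=[rl.rev(n2)])
    #   nodes = rl.findmissing(common=[n1], heads=[n2])
    #   assert nodes == [rl.node(r) for r in revs]
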
    def nodesbetween(self, roots=None, heads=None):
        """Return a topological path from 'roots' to 'heads'.

        Return a tuple (nodes, outroots, outheads) where 'nodes' is a
        topologically sorted list of all nodes N that satisfy both of
        these constraints:

          1. N is a descendant of some node in 'roots'
          2. N is an ancestor of some node in 'heads'

        Every node is considered to be both a descendant and an ancestor
        of itself, so every reachable node in 'roots' and 'heads' will be
        included in 'nodes'.

        'outroots' is the list of reachable nodes in 'roots', i.e., the
        subset of 'roots' that is returned in 'nodes'. Likewise,
        'outheads' is the subset of 'heads' that is also in 'nodes'.

        'roots' and 'heads' are both lists of node IDs. If 'roots' is
        unspecified, uses nullid as the only root. If 'heads' is
        unspecified, uses list of all of the revlog's heads."""
        nonodes = ([], [], [])
        if roots is not None:
            roots = list(roots)
            if not roots:
                return nonodes
            lowestrev = min([self.rev(n) for n in roots])
        else:
            roots = [nullid] # Everybody's a descendant of nullid
            lowestrev = nullrev
        if (lowestrev == nullrev) and (heads is None):
            # We want _all_ the nodes!
            return ([self.node(r) for r in self], [nullid], list(self.heads()))
        if heads is None:
            # All nodes are ancestors, so the latest ancestor is the last
            # node.
            highestrev = len(self) - 1
            # Set ancestors to None to signal that every node is an ancestor.
            ancestors = None
            # Set heads to an empty dictionary for later discovery of heads
            heads = {}
        else:
            heads = list(heads)
            if not heads:
                return nonodes
            ancestors = set()
            # Turn heads into a dictionary so we can remove 'fake' heads.
            # Also, later we will be using it to filter out the heads we can't
            # find from roots.
            heads = dict.fromkeys(heads, False)
            # Start at the top and keep marking parents until we're done.
            nodestotag = set(heads)
            # Remember where the top was so we can use it as a limit later.
            highestrev = max([self.rev(n) for n in nodestotag])
            while nodestotag:
                # grab a node to tag
                n = nodestotag.pop()
                # Never tag nullid
                if n == nullid:
                    continue
                # A node's revision number represents its place in a
                # topologically sorted list of nodes.
                r = self.rev(n)
                if r >= lowestrev:
                    if n not in ancestors:
                        # If we are possibly a descendant of one of the roots
                        # and we haven't already been marked as an ancestor
                        ancestors.add(n) # Mark as ancestor
                        # Add non-nullid parents to list of nodes to tag.
                        nodestotag.update([p for p in self.parents(n) if
                                           p != nullid])
                    elif n in heads: # We've seen it before, is it a fake head?
                        # So it is, real heads should not be the ancestors of
                        # any other heads.
                        heads.pop(n)
            if not ancestors:
                return nonodes
            # Now that we have our set of ancestors, we want to remove any
            # roots that are not ancestors.

            # If one of the roots was nullid, everything is included anyway.
            if lowestrev > nullrev:
                # But, since we weren't, let's recompute the lowest rev to not
                # include roots that aren't ancestors.

                # Filter out roots that aren't ancestors of heads
                roots = [root for root in roots if root in ancestors]
                # Recompute the lowest revision
                if roots:
                    lowestrev = min([self.rev(root) for root in roots])
                else:
                    # No more roots? Return empty list
                    return nonodes
            else:
                # We are descending from nullid, and don't need to care about
                # any other roots.
                lowestrev = nullrev
                roots = [nullid]
        # Transform our roots list into a set.
        descendants = set(roots)
        # Also, keep the original roots so we can filter out roots that aren't
        # 'real' roots (i.e. are descended from other roots).
        roots = descendants.copy()
        # Our topologically sorted list of output nodes.
        orderedout = []
        # Don't start at nullid since we don't want nullid in our output list,
        # and if nullid shows up in descendants, empty parents will look like
        # they're descendants.
        for r in self.revs(start=max(lowestrev, 0), stop=highestrev + 1):
            n = self.node(r)
            isdescendant = False
            if lowestrev == nullrev: # Everybody is a descendant of nullid
                isdescendant = True
            elif n in descendants:
                # n is already a descendant
                isdescendant = True
                # This check only needs to be done here because all the roots
                # will start being marked as descendants before the loop.
                if n in roots:
                    # If n was a root, check if it's a 'real' root.
                    p = tuple(self.parents(n))
                    # If any of its parents are descendants, it's not a root.
                    if (p[0] in descendants) or (p[1] in descendants):
                        roots.remove(n)
            else:
                p = tuple(self.parents(n))
                # A node is a descendant if either of its parents are
                # descendants. (We seeded the descendants set with the roots
                # up there, remember?)
                if (p[0] in descendants) or (p[1] in descendants):
                    descendants.add(n)
                    isdescendant = True
            if isdescendant and ((ancestors is None) or (n in ancestors)):
                # Only include nodes that are both descendants and ancestors.
                orderedout.append(n)
                if (ancestors is not None) and (n in heads):
                    # We're trying to figure out which heads are reachable
                    # from roots.
                    # Mark this head as having been reached
                    heads[n] = True
                elif ancestors is None:
                    # Otherwise, we're trying to discover the heads.
                    # Assume this is a head because if it isn't, the next step
                    # will eventually remove it.
                    heads[n] = True
                    # But, obviously its parents aren't.
                    for p in self.parents(n):
                        heads.pop(p, None)
        heads = [head for head, flag in heads.iteritems() if flag]
        roots = list(roots)
        assert orderedout
        assert roots
        assert heads
        return (orderedout, roots, heads)

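    # Example (illustrative sketch; ``rl`` is assumed to be a revlog
    # instance, ``base`` and ``tip`` node IDs with ``base`` an ancestor
    # of ``tip``):
    #
    #   nodes, outroots, outheads = rl.nodesbetween([base], [tip])
    #   # ``nodes`` lists every node on a path from ``base`` to ``tip``,
    #   # endpoints included, in topological order; here outroots ==
    #   # [base] and outheads == [tip].
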
    def headrevs(self):
        try:
            return self.index.headrevs()
        except AttributeError:
            return self._headrevs()

    def computephases(self, roots):
        return self.index.computephasesmapsets(roots)

    def _headrevs(self):
        count = len(self)
        if not count:
            return [nullrev]
        # we won't iterate over filtered revs, so no rev is a head at the start
        ishead = [0] * (count + 1)
        index = self.index
        for r in self:
            ishead[r] = 1 # I may be a head
            e = index[r]
            ishead[e[5]] = ishead[e[6]] = 0 # my parents are not
        return [r for r, val in enumerate(ishead) if val]

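    # For reference, an index entry is a tuple whose fields 5 and 6 are
    # the two parent revision numbers, which is what the ``e[5]`` and
    # ``e[6]`` accesses above rely on:
    #
    #   offset_flags, complen, rawlen, base, link, p1, p2, node = index[r]
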
    def heads(self, start=None, stop=None):
        """return the list of all nodes that have no children

        if start is specified, only heads that are descendants of
        start will be returned
        if stop is specified, it will consider all the revs from stop
        as if they had no children
        """
        if start is None and stop is None:
            if not len(self):
                return [nullid]
            return [self.node(r) for r in self.headrevs()]

        if start is None:
            start = nullid
        if stop is None:
            stop = []
        stoprevs = set([self.rev(n) for n in stop])
        startrev = self.rev(start)
        reachable = {startrev}
        heads = {startrev}

        parentrevs = self.parentrevs
        for r in self.revs(start=startrev + 1):
            for p in parentrevs(r):
                if p in reachable:
                    if r not in stoprevs:
                        reachable.add(r)
                        heads.add(r)
                    if p in heads and p not in stoprevs:
                        heads.remove(p)

        return [self.node(r) for r in heads]

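    # Example (illustrative sketch; ``rl`` is assumed to be a revlog
    # instance and ``n`` a node ID):
    #
    #   rl.heads()           # all childless nodes
    #   rl.heads(start=n)    # only heads that are descendants of n
    #   rl.heads(stop=[n])   # revs from n onward count as childless
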
    def children(self, node):
        """find the children of a given node"""
        c = []
        p = self.rev(node)
        for r in self.revs(start=p + 1):
            prevs = [pr for pr in self.parentrevs(r) if pr != nullrev]
            if prevs:
                for pr in prevs:
                    if pr == p:
                        c.append(self.node(r))
            elif p == nullrev:
                c.append(self.node(r))
        return c

    def descendant(self, start, end):
        if start == nullrev:
            return True
        for i in self.descendants([start]):
            if i == end:
                return True
            elif i > end:
                break
        return False

    def commonancestorsheads(self, a, b):
        """calculate all the heads of the common ancestors of nodes a and b"""
        a, b = self.rev(a), self.rev(b)
        try:
            ancs = self.index.commonancestorsheads(a, b)
        except (AttributeError, OverflowError): # C implementation failed
            ancs = ancestor.commonancestorsheads(self.parentrevs, a, b)
        return pycompat.maplist(self.node, ancs)

    def isancestor(self, a, b):
        """return True if node a is an ancestor of node b

        The implementation of this is trivial but the use of
        commonancestorsheads is not."""
        return a in self.commonancestorsheads(a, b)

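    # Why the membership test in isancestor() works: if ``a`` is an
    # ancestor of ``b``, the common ancestors of ``a`` and ``b`` are
    # exactly the ancestors of ``a``, whose unique head is ``a`` itself,
    # so ``a`` appears in the result:
    #
    #   rl.isancestor(a, b) == (a in rl.commonancestorsheads(a, b))
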
    def ancestor(self, a, b):
        """calculate the "best" common ancestor of nodes a and b"""

        a, b = self.rev(a), self.rev(b)
        try:
            ancs = self.index.ancestors(a, b)
        except (AttributeError, OverflowError):
            ancs = ancestor.ancestors(self.parentrevs, a, b)
        if ancs:
            # choose a consistent winner when there's a tie
            return min(map(self.node, ancs))
        return nullid

    def _match(self, id):
        if isinstance(id, int):
            # rev
            return self.node(id)
        if len(id) == 20:
            # possibly a binary node
            # odds of a binary node being all hex in ASCII are 1 in 10**25
            try:
                node = id
                self.rev(node) # quick search the index
                return node
            except LookupError:
                pass # may be partial hex id
        try:
            # str(rev)
            rev = int(id)
            if str(rev) != id:
                raise ValueError
            if rev < 0:
                rev = len(self) + rev
            if rev < 0 or rev >= len(self):
                raise ValueError
            return self.node(rev)
        except (ValueError, OverflowError):
            pass
        if len(id) == 40:
            try:
                # a full hex nodeid?
                node = bin(id)
                self.rev(node)
                return node
            except (TypeError, LookupError):
                pass

    def _partialmatch(self, id):
        try:
            partial = self.index.partialmatch(id)
            if partial and self.hasnode(partial):
                return partial
            return None
        except RevlogError:
            # parsers.c radix tree lookup gave multiple matches
            # fast path: for unfiltered changelog, radix tree is accurate
            if not getattr(self, 'filteredrevs', None):
                raise LookupError(id, self.indexfile,
                                  _('ambiguous identifier'))
            # fall through to slow path that filters hidden revisions
        except (AttributeError, ValueError):
            # we are pure python, or key was too short to search radix tree
            pass

        if id in self._pcache:
            return self._pcache[id]

        if len(id) < 40:
            try:
                # hex(node)[:...]
                l = len(id) // 2 # grab an even number of digits
                prefix = bin(id[:l * 2])
                nl = [e[7] for e in self.index if e[7].startswith(prefix)]
                nl = [n for n in nl if hex(n).startswith(id) and
                      self.hasnode(n)]
                if len(nl) > 0:
                    if len(nl) == 1:
                        self._pcache[id] = nl[0]
                        return nl[0]
                    raise LookupError(id, self.indexfile,
                                      _('ambiguous identifier'))
                return None
            except TypeError:
                pass

    def lookup(self, id):
        """locate a node based on:
            - revision number or str(revision number)
            - nodeid or subset of hex nodeid
        """
        n = self._match(id)
        if n is not None:
            return n
        n = self._partialmatch(id)
        if n:
            return n

        raise LookupError(id, self.indexfile, _('no match found'))

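    # Example (illustrative sketch; ``rl`` is assumed to be a revlog
    # instance). All of these forms resolve to the same node:
    #
    #   rl.lookup(0)                     # revision number
    #   rl.lookup('0')                   # str(revision number)
    #   rl.lookup(rl.node(0))            # 20-byte binary node
    #   rl.lookup(hex(rl.node(0))[:12])  # unambiguous hex prefix
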
    def cmp(self, node, text):
        """compare text with a given file revision

        returns True if text is different from what is stored.
        """
        p1, p2 = self.parents(node)
        return hash(text, p1, p2) != node

    def _cachesegment(self, offset, data):
        """Add a segment to the revlog cache.

        Accepts an absolute offset and the data that is at that location.
        """
        o, d = self._chunkcache
        # try to add to existing cache
        if o + len(d) == offset and len(d) + len(data) < _chunksize:
            self._chunkcache = o, d + data
        else:
            self._chunkcache = offset, data

    def _readsegment(self, offset, length, df=None):
        """Load a segment of raw data from the revlog.

        Accepts an absolute offset, length to read, and an optional existing
        file handle to read from.

        If an existing file handle is passed, it will be seeked and the
        original seek position will NOT be restored.

        Returns a str or buffer of raw byte data.
        """
        if df is not None:
            closehandle = False
        else:
            if self._inline:
                df = self.opener(self.indexfile)
            else:
                df = self.opener(self.datafile)
            closehandle = True

        # Cache data both forward and backward around the requested
        # data, in a fixed size window. This helps speed up operations
        # involving reading the revlog backwards.
        cachesize = self._chunkcachesize
        realoffset = offset & ~(cachesize - 1)
        reallength = (((offset + length + cachesize) & ~(cachesize - 1))
                      - realoffset)
        df.seek(realoffset)
        d = df.read(reallength)
        if closehandle:
            df.close()
        self._cachesegment(realoffset, d)
        if offset != realoffset or reallength != length:
            return util.buffer(d, offset - realoffset, length)
        return d

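    # Worked example of the windowing above, assuming a chunk cache size
    # of 65536 (a power of two, which the masking trick requires): a
    # request for offset=70000, length=100 rounds down to
    # realoffset = 70000 & ~65535 = 65536 and reads up to the boundary
    # past the end, reallength = ((70100 + 65536) & ~65535) - 65536
    # = 131072 - 65536 = 65536 bytes, so neighboring chunks land in the
    # cache for free.
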
    def _getsegment(self, offset, length, df=None):
        """Obtain a segment of raw data from the revlog.

        Accepts an absolute offset, length of bytes to obtain, and an
        optional file handle to the already-opened revlog. If the file
        handle is used, its original seek position will not be preserved.

        Requests for data may be returned from a cache.

        Returns a str or a buffer instance of raw byte data.
        """
        o, d = self._chunkcache
        l = len(d)

        # is it in the cache?
        cachestart = offset - o
        cacheend = cachestart + length
        if cachestart >= 0 and cacheend <= l:
            if cachestart == 0 and cacheend == l:
                return d # avoid a copy
            return util.buffer(d, cachestart, cacheend - cachestart)

        return self._readsegment(offset, length, df=df)

    def _getsegmentforrevs(self, startrev, endrev, df=None):
        """Obtain a segment of raw data corresponding to a range of revisions.

        Accepts the start and end revisions and an optional already-open
        file handle to be used for reading. If the file handle is used, its
        seek position will not be preserved.

        Requests for data may be satisfied by a cache.

        Returns a 2-tuple of (offset, data) for the requested range of
        revisions. Offset is the integer offset from the beginning of the
        revlog and data is a str or buffer of the raw byte data.

        Callers will need to call ``self.start(rev)`` and ``self.length(rev)``
        to determine where each revision's data begins and ends.
        """
        # Inlined self.start(startrev) & self.end(endrev) for perf reasons
        # (functions are expensive).
        index = self.index
        istart = index[startrev]
        start = int(istart[0] >> 16)
        if startrev == endrev:
            end = start + istart[1]
        else:
            iend = index[endrev]
            end = int(iend[0] >> 16) + iend[1]

        if self._inline:
            start += (startrev + 1) * self._io.size
            end += (endrev + 1) * self._io.size
        length = end - start

        return start, self._getsegment(start, length, df=df)

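    # Note on the ``>> 16`` above: the first field of an index entry
    # packs the data-file offset into its high bits and the revision
    # flags into the low 16 bits, so shifting right by 16 recovers the
    # byte offset:
    #
    #   offset_flags = index[rev][0]
    #   offset = offset_flags >> 16
    #   flags = offset_flags & 0xFFFF
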
    def _chunk(self, rev, df=None):
        """Obtain a single decompressed chunk for a revision.

        Accepts an integer revision and an optional already-open file handle
        to be used for reading. If used, the seek position of the file will not
        be preserved.

        Returns a str holding uncompressed data for the requested revision.
        """
        return self.decompress(self._getsegmentforrevs(rev, rev, df=df)[1])

    def _chunks(self, revs, df=None):
        """Obtain decompressed chunks for the specified revisions.

        Accepts an iterable of numeric revisions that are assumed to be in
        ascending order. Also accepts an optional already-open file handle
        to be used for reading. If used, the seek position of the file will
        not be preserved.

        This function is similar to calling ``self._chunk()`` multiple times,
        but is faster.

        Returns a list with decompressed data for each requested revision.
        """
        if not revs:
            return []
        start = self.start
        length = self.length
        inline = self._inline
        iosize = self._io.size
        buffer = util.buffer

        l = []
        ladd = l.append

        try:
            offset, data = self._getsegmentforrevs(revs[0], revs[-1], df=df)
        except OverflowError:
            # issue4215 - we can't cache a run of chunks greater than
            # 2G on Windows
            return [self._chunk(rev, df=df) for rev in revs]

        decomp = self.decompress
        for rev in revs:
            chunkstart = start(rev)
            if inline:
                chunkstart += (rev + 1) * iosize
            chunklength = length(rev)
            ladd(decomp(buffer(data, chunkstart - offset, chunklength)))

        return l

    def _chunkclear(self):
        """Clear the raw chunk cache."""
        self._chunkcache = (0, '')

    def deltaparent(self, rev):
        """return deltaparent of the given revision"""
        base = self.index[rev][3]
        if base == rev:
            return nullrev
        elif self._generaldelta:
            return base
        else:
            return rev - 1

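    # In other words: an entry whose stored base equals its own revision
    # is a full snapshot and has no delta parent. Otherwise, under
    # generaldelta the stored base *is* the delta parent, while in the
    # classic layout deltas always chain against the previous revision:
    #
    #   rl.deltaparent(rev) == nullrev   # full text stored at rev
    #   rl.deltaparent(rev) == rev - 1   # delta case, non-generaldelta
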
    def revdiff(self, rev1, rev2):
        """return or calculate a delta between two revisions

        The delta calculated is in binary form and is intended to be written to
        revlog data directly. So this function needs raw revision data.
        """
        if rev1 != nullrev and self.deltaparent(rev2) == rev1:
            return bytes(self._chunk(rev2))

        return mdiff.textdiff(self.revision(rev1, raw=True),
                              self.revision(rev2, raw=True))

    def revision(self, nodeorrev, _df=None, raw=False):
        """return an uncompressed revision of a given node or revision
        number.

        _df - an existing file handle to read from. (internal-only)
        raw - an optional argument specifying if the revision data is to be
        treated as raw data when applying flag transforms. 'raw' should be set
        to True when generating changegroups or in debug commands.
        """
        if isinstance(nodeorrev, int):
            rev = nodeorrev
            node = self.node(rev)
        else:
            node = nodeorrev
            rev = None

        cachedrev = None
        flags = None
        rawtext = None
        if node == nullid:
            return ""
        if self._cache:
            if self._cache[0] == node:
                # _cache only stores rawtext
                if raw:
                    return self._cache[2]
                # duplicated, but good for perf
                if rev is None:
                    rev = self.rev(node)
                if flags is None:
                    flags = self.flags(rev)
                # no extra flags set, no flag processor runs, text = rawtext
                if flags == REVIDX_DEFAULT_FLAGS:
                    return self._cache[2]
                # rawtext is reusable. need to run flag processor
                rawtext = self._cache[2]

            cachedrev = self._cache[1]

        # look up what we need to read
        if rawtext is None:
            if rev is None:
                rev = self.rev(node)

            chain, stopped = self._deltachain(rev, stoprev=cachedrev)
            if stopped:
                rawtext = self._cache[2]

            # drop cache to save memory
            self._cache = None

            bins = self._chunks(chain, df=_df)
            if rawtext is None:
                rawtext = bytes(bins[0])
                bins = bins[1:]

            rawtext = mdiff.patches(rawtext, bins)
            self._cache = (node, rev, rawtext)

        if flags is None:
            if rev is None:
                rev = self.rev(node)
            flags = self.flags(rev)

        text, validatehash = self._processflags(rawtext, flags, 'read', raw=raw)
        if validatehash:
            self.checkhash(text, node, rev=rev)

        return text

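    # The reconstruction above boils down to: find the delta chain back
    # to the nearest full text (or cached rawtext), decompress the
    # chunks, then apply the deltas in order. A sketch of the common
    # path:
    #
    #   chain, stopped = rl._deltachain(rev)   # [base, ..., rev]
    #   bins = rl._chunks(chain)
    #   rawtext = mdiff.patches(bytes(bins[0]), bins[1:])
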
    def hash(self, text, p1, p2):
        """Compute a node hash.

        Available as a function so that subclasses can replace the hash
        as needed.
        """
        return hash(text, p1, p2)

    def _processflags(self, text, flags, operation, raw=False):
        """Inspect revision data flags and apply transforms defined by
        registered flag processors.

        ``text`` - the revision data to process
        ``flags`` - the revision flags
        ``operation`` - the operation being performed (read or write)
        ``raw`` - an optional argument describing if the raw transform should be
        applied.

        This method processes the flags in the order (or reverse order if
        ``operation`` is 'write') defined by REVIDX_FLAGS_ORDER, applying the
        flag processors registered for present flags. The order of flags defined
        in REVIDX_FLAGS_ORDER needs to be stable to allow non-commutativity.

        Returns a 2-tuple of ``(text, validatehash)`` where ``text`` is the
        processed text and ``validatehash`` is a bool indicating whether the
        returned text should be checked for hash integrity.

        Note: If the ``raw`` argument is set, it has precedence over the
        operation and will only update the value of ``validatehash``.
        """
        # fast path: no flag processors will run
        if flags == 0:
            return text, True
        if operation not in ('read', 'write'):
            raise ProgrammingError(_("invalid '%s' operation") % (operation))
        # Check all flags are known.
        if flags & ~REVIDX_KNOWN_FLAGS:
            raise RevlogError(_("incompatible revision flag '%#x'") %
                              (flags & ~REVIDX_KNOWN_FLAGS))
        validatehash = True
        # Depending on the operation (read or write), the order might be
        # reversed due to non-commutative transforms.
        orderedflags = REVIDX_FLAGS_ORDER
        if operation == 'write':
            orderedflags = reversed(orderedflags)

        for flag in orderedflags:
            # If a flagprocessor has been registered for a known flag, apply the
            # related operation transform and update result tuple.
            if flag & flags:
                vhash = True

                if flag not in _flagprocessors:
                    message = _("missing processor for flag '%#x'") % (flag)
                    raise RevlogError(message)

                processor = _flagprocessors[flag]
                if processor is not None:
                    readtransform, writetransform, rawtransform = processor

                    if raw:
                        vhash = rawtransform(self, text)
                    elif operation == 'read':
                        text, vhash = readtransform(self, text)
                    else: # write operation
                        text, vhash = writetransform(self, text)
                    validatehash = validatehash and vhash

        return text, validatehash

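    # A flag processor is registered in ``_flagprocessors`` as a
    # (readtransform, writetransform, rawtransform) triple, matching the
    # unpacking above. A minimal sketch (``REVIDX_EXAMPLE`` and the
    # encode/decode helpers are hypothetical, shown only to illustrate
    # the calling convention):
    #
    #   def _readexample(self, text):
    #       return decode(text), True      # (text, validatehash)
    #   def _writeexample(self, text):
    #       return encode(text), True
    #   def _rawexample(self, text):
    #       return False                   # rawtext hash not checkable
    #   _flagprocessors[REVIDX_EXAMPLE] = (_readexample, _writeexample,
    #                                      _rawexample)
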
    def checkhash(self, text, node, p1=None, p2=None, rev=None):
        """Check node hash integrity.

        Available as a function so that subclasses can extend hash mismatch
        behaviors as needed.
        """
        if p1 is None and p2 is None:
            p1, p2 = self.parents(node)
        if node != self.hash(text, p1, p2):
            revornode = rev
            if revornode is None:
                revornode = templatefilters.short(hex(node))
            raise RevlogError(_("integrity check failed on %s:%s")
                              % (self.indexfile, revornode))

    def checkinlinesize(self, tr, fp=None):
        """Check if the revlog is too big for inline and convert if so.

        This should be called after revisions are added to the revlog. If the
        revlog has grown too large to be an inline revlog, it will convert it
        to use multiple index and data files.
        """
        if not self._inline or (self.start(-2) + self.length(-2)) < _maxinline:
            return

        trinfo = tr.find(self.indexfile)
        if trinfo is None:
            raise RevlogError(_("%s not found in the transaction")
                              % self.indexfile)

        trindex = trinfo[2]
        if trindex is not None:
            dataoff = self.start(trindex)
        else:
            # revlog was stripped at start of transaction, use all leftover
            # data
            trindex = len(self) - 1
            dataoff = self.end(-2)

        tr.add(self.datafile, dataoff)

        if fp:
            fp.flush()
            fp.close()

        df = self.opener(self.datafile, 'w')
        try:
            for r in self:
                df.write(self._getsegmentforrevs(r, r)[1])
        finally:
            df.close()

        fp = self.opener(self.indexfile, 'w', atomictemp=True,
                         checkambig=self._checkambig)
        self.version &= ~FLAG_INLINE_DATA
        self._inline = False
        for i in self:
            e = self._io.packentry(self.index[i], self.node, self.version, i)
            fp.write(e)

        # if we don't call close, the temp file will never replace the
        # real index
        fp.close()

        tr.replace(self.indexfile, trindex * self._io.size)
        self._chunkclear()

    def addrevision(self, text, transaction, link, p1, p2, cachedelta=None,
                    node=None, flags=REVIDX_DEFAULT_FLAGS):
        """add a revision to the log

        text - the revision data to add
        transaction - the transaction object used for rollback
        link - the linkrev data to add
        p1, p2 - the parent nodeids of the revision
        cachedelta - an optional precomputed delta
        node - nodeid of revision; typically node is not specified, and it is
        computed by default as hash(text, p1, p2), however subclasses might
        use a different hashing method (and override checkhash() in that case)
        flags - the known flags to set on the revision
        """
        if link == nullrev:
            raise RevlogError(_("attempted to add linkrev -1 to %s")
                              % self.indexfile)

        if flags:
            node = node or self.hash(text, p1, p2)

        rawtext, validatehash = self._processflags(text, flags, 'write')

        # If the flag processor modifies the revision data, ignore any provided
        # cachedelta.
        if rawtext != text:
            cachedelta = None

        if len(rawtext) > _maxentrysize:
            raise RevlogError(
                _("%s: size of %d bytes exceeds maximum revlog storage of 2GiB")
                % (self.indexfile, len(rawtext)))

        node = node or self.hash(rawtext, p1, p2)
        if node in self.nodemap:
            return node

        if validatehash:
            self.checkhash(rawtext, node, p1=p1, p2=p2)

        return self.addrawrevision(rawtext, transaction, link, p1, p2, node,
                                   flags, cachedelta=cachedelta)

    def addrawrevision(self, rawtext, transaction, link, p1, p2, node, flags,
                       cachedelta=None):
        """add a raw revision with known flags, node and parents
        useful when reusing a revision not stored in this revlog (ex: received
        over the wire, or read from an external bundle).
        """
        dfh = None
        if not self._inline:
            dfh = self.opener(self.datafile, "a+")
        ifh = self.opener(self.indexfile, "a+", checkambig=self._checkambig)
        try:
            return self._addrevision(node, rawtext, transaction, link, p1, p2,
                                     flags, cachedelta, ifh, dfh)
        finally:
            if dfh:
                dfh.close()
            ifh.close()

    def compress(self, data):
        """Generate a possibly-compressed representation of data."""
        if not data:
            return '', data

        compressed = self._compressor.compress(data)

        if compressed:
            # The revlog compressor added the header in the returned data.
            return '', compressed

        if data[0:1] == '\0':
            return '', data
        return 'u', data

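    # Illustrative sketch (not part of revlog.py) of the (header, data)
    # contract above, assuming a zlib-backed revlog ``rl``. A compressible
    # chunk comes back with an empty header (the zlib magic 'x' lives in the
    # data itself); a chunk too small to benefit from compression is stored
    # literally behind a 'u' marker:
    #
    #   header, data = rl.compress('x' * 1024)
    #   # header == '' and data[0:1] == 'x' (zlib stream)
    #   header, data = rl.compress('hi')
    #   # header == 'u' and data == 'hi'
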
    def decompress(self, data):
        """Decompress a revlog chunk.

        The chunk is expected to begin with a header identifying the
        format type so it can be routed to an appropriate decompressor.
        """
        if not data:
            return data

        # Revlogs are read much more frequently than they are written and many
        # chunks only take microseconds to decompress, so performance is
        # important here.
        #
        # We can make a few assumptions about revlogs:
        #
        # 1) the majority of chunks will be compressed (as opposed to inline
        #    raw data).
        # 2) decompressing *any* data will likely be at least 10x slower than
        #    returning raw inline data.
        # 3) we want to prioritize common and officially supported compression
        #    engines.
        #
        # It follows that we want to optimize the "decompress compressed data
        # encoded with a common, officially supported engine" case over the
        # "raw data" and "data encoded by a less common or non-official
        # engine" cases. That is why we have the inline lookup first, followed
        # by the compengines lookup.
        #
        # According to `hg perfrevlogchunks`, this is ~0.5% faster for zlib
        # compressed chunks, and this matters for changelog and manifest reads.
        t = data[0:1]

        if t == 'x':
            try:
                return _zlibdecompress(data)
            except zlib.error as e:
                raise RevlogError(_('revlog decompress error: %s') % str(e))
        # '\0' is more common than 'u' so it goes first.
        elif t == '\0':
            return data
        elif t == 'u':
            return util.buffer(data, 1)

        try:
            compressor = self._decompressors[t]
        except KeyError:
            try:
                engine = util.compengines.forrevlogheader(t)
                compressor = engine.revlogcompressor()
                self._decompressors[t] = compressor
            except KeyError:
                raise RevlogError(_('unknown compression type %r') % t)

        return compressor.decompress(data)

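    # Illustrative sketch (not part of revlog.py): decompress() is the inverse
    # of compress() once header and data are concatenated, which is exactly
    # how chunks are laid out on disk:
    #
    #   header, data = rl.compress(chunk)
    #   restored = rl.decompress(header + data)
    #   # str(restored) == chunk ('u'-chunks come back as a buffer, hence str())
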
    def _isgooddelta(self, d, textlen):
        """Returns True if the given delta is good. Good means that it is
        within the disk span, disk size, and chain length bounds that we know
        to be performant."""
        if d is None:
            return False

        # - 'dist' is the distance from the base revision -- bounding it limits
        #   the amount of I/O we need to do.
        # - 'compresseddeltalen' is the sum of the total size of deltas we need
        #   to apply -- bounding it limits the amount of CPU we consume.
        dist, l, data, base, chainbase, chainlen, compresseddeltalen = d
        if (dist > textlen * 4 or l > textlen or
            compresseddeltalen > textlen * 2 or
            (self._maxchainlen and chainlen > self._maxchainlen)):
            return False

        return True

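    # Worked example (illustrative): for a 1000-byte fulltext, a candidate
    # delta is rejected when any bound is exceeded -- when it spans more than
    # 4000 bytes of the data file (dist > textlen * 4), when the delta itself
    # is larger than the fulltext (l > textlen), when replaying the chain
    # would read more than 2000 bytes of deltas (compresseddeltalen >
    # textlen * 2), or when the chain would grow past ``format.maxchainlen``.
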
    def _addrevision(self, node, rawtext, transaction, link, p1, p2, flags,
                     cachedelta, ifh, dfh, alwayscache=False):
        """internal function to add revisions to the log

        see addrevision for argument descriptions.

        note: "addrevision" takes non-raw text, "_addrevision" takes raw text.

        invariants:
        - rawtext is optional (can be None); if not set, cachedelta must be
          set. if both are set, they must correspond to each other.
        """
        btext = [rawtext]
        def buildtext():
            if btext[0] is not None:
                return btext[0]
            baserev = cachedelta[0]
            delta = cachedelta[1]
            # special case deltas which replace entire base; no need to decode
            # base revision. this neatly avoids censored bases, which throw
            # when they're decoded.
            hlen = struct.calcsize(">lll")
            if delta[:hlen] == mdiff.replacediffheader(self.rawsize(baserev),
                                                       len(delta) - hlen):
                btext[0] = delta[hlen:]
            else:
                if self._inline:
                    fh = ifh
                else:
                    fh = dfh
                basetext = self.revision(baserev, _df=fh, raw=True)
                btext[0] = mdiff.patch(basetext, delta)

            try:
                res = self._processflags(btext[0], flags, 'read', raw=True)
                btext[0], validatehash = res
                if validatehash:
                    self.checkhash(btext[0], node, p1=p1, p2=p2)
                if flags & REVIDX_ISCENSORED:
                    raise RevlogError(_('node %s is not censored') % node)
            except CensoredNodeError:
                # must pass the censored index flag to add censored revisions
                if not flags & REVIDX_ISCENSORED:
                    raise
            return btext[0]

        def builddelta(rev):
            # can we use the cached delta?
            if cachedelta and cachedelta[0] == rev:
                delta = cachedelta[1]
            else:
                t = buildtext()
                if self.iscensored(rev):
                    # deltas based on a censored revision must replace the
                    # full content in one patch, so delta works everywhere
                    header = mdiff.replacediffheader(self.rawsize(rev), len(t))
                    delta = header + t
                else:
                    if self._inline:
                        fh = ifh
                    else:
                        fh = dfh
                    ptext = self.revision(rev, _df=fh, raw=True)
                    delta = mdiff.textdiff(ptext, t)
            header, data = self.compress(delta)
            deltalen = len(header) + len(data)
            chainbase = self.chainbase(rev)
            dist = deltalen + offset - self.start(chainbase)
            if self._generaldelta:
                base = rev
            else:
                base = chainbase
            chainlen, compresseddeltalen = self._chaininfo(rev)
            chainlen += 1
            compresseddeltalen += deltalen
            return (dist, deltalen, (header, data), base,
                    chainbase, chainlen, compresseddeltalen)

        curr = len(self)
        prev = curr - 1
        offset = self.end(prev)
        delta = None
        p1r, p2r = self.rev(p1), self.rev(p2)

        # full versions are inserted when the needed deltas
        # become comparable to the uncompressed text
        if rawtext is None:
            textlen = mdiff.patchedsize(self.rawsize(cachedelta[0]),
                                        cachedelta[1])
        else:
            textlen = len(rawtext)

        # should we try to build a delta?
        if prev != nullrev and self.storedeltachains:
            tested = set()
            # This condition is true most of the time when processing
            # changegroup data into a generaldelta repo. The only time it
            # isn't true is if this is the first revision in a delta chain
            # or if ``format.generaldelta=true`` disabled ``lazydeltabase``.
            if cachedelta and self._generaldelta and self._lazydeltabase:
                # Assume what we received from the server is a good choice;
                # builddelta will reuse the cache.
                candidatedelta = builddelta(cachedelta[0])
                tested.add(cachedelta[0])
                if self._isgooddelta(candidatedelta, textlen):
                    delta = candidatedelta
            if delta is None and self._generaldelta:
                # exclude the already lazily tested base, if any
                parents = [p for p in (p1r, p2r)
                           if p != nullrev and p not in tested]
                if parents and not self._aggressivemergedeltas:
                    # Pick whichever parent is closer to us (to minimize the
                    # chance of having to build a fulltext).
                    parents = [max(parents)]
                tested.update(parents)
                pdeltas = []
                for p in parents:
                    pd = builddelta(p)
                    if self._isgooddelta(pd, textlen):
                        pdeltas.append(pd)
                if pdeltas:
                    delta = min(pdeltas, key=lambda x: x[1])
            if delta is None and prev not in tested:
                # the other approaches failed; try against prev to hopefully
                # save us a fulltext.
                candidatedelta = builddelta(prev)
                if self._isgooddelta(candidatedelta, textlen):
                    delta = candidatedelta
        if delta is not None:
            dist, l, data, base, chainbase, chainlen, compresseddeltalen = delta
        else:
            rawtext = buildtext()
            data = self.compress(rawtext)
            l = len(data[1]) + len(data[0])
            base = chainbase = curr

        e = (offset_type(offset, flags), l, textlen,
             base, link, p1r, p2r, node)
        self.index.insert(-1, e)
        self.nodemap[node] = curr

        entry = self._io.packentry(e, self.node, self.version, curr)
        self._writeentry(transaction, ifh, dfh, entry, data, link, offset)

        if alwayscache and rawtext is None:
            rawtext = buildtext()

        if type(rawtext) == str: # only accept immutable objects
            self._cache = (node, curr, rawtext)
        self._chainbasecache[curr] = chainbase
        return node

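    # Illustrative summary (not part of revlog.py) of how _addrevision picks
    # a delta base, in order of preference:
    #
    #   1. the incoming cachedelta base, when generaldelta and lazydeltabase
    #      allow reusing what the server sent;
    #   2. the closer parent (or both parents, with
    #      ``format.aggressivemergedeltas``) under generaldelta;
    #   3. the previous revision, as a last resort;
    #   4. otherwise a full snapshot is stored (base == chainbase == curr).
    #
    # Every candidate must pass _isgooddelta() before it is accepted.
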
    def _writeentry(self, transaction, ifh, dfh, entry, data, link, offset):
        # Files opened in a+ mode have inconsistent behavior on various
        # platforms. Windows requires that a file positioning call be made
        # when the file handle transitions between reads and writes. See
        # 3686fa2b8eee and the mixedfilemodewrapper in windows.py. On other
        # platforms, Python or the platform itself can be buggy. Some versions
        # of Solaris have been observed to not append at the end of the file
        # if the file was seeked to before the end. See issue4943 for more.
        #
        # We work around this issue by inserting a seek() before writing.
        # Note: This is likely not necessary on Python 3.
        ifh.seek(0, os.SEEK_END)
        if dfh:
            dfh.seek(0, os.SEEK_END)

        curr = len(self) - 1
        if not self._inline:
            transaction.add(self.datafile, offset)
            transaction.add(self.indexfile, curr * len(entry))
            if data[0]:
                dfh.write(data[0])
            dfh.write(data[1])
            ifh.write(entry)
        else:
            offset += curr * self._io.size
            transaction.add(self.indexfile, offset, curr)
            ifh.write(entry)
            ifh.write(data[0])
            ifh.write(data[1])
            self.checkinlinesize(transaction, ifh)

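    # Minimal sketch of the workaround above (illustrative, hypothetical
    # path): after reading from an "a+" handle, always seek to EOF before
    # writing so appends land at the end on every platform:
    #
    #   fh = open('/tmp/example.i', 'a+b')
    #   fh.read(16)                # handle position is now mid-file
    #   fh.seek(0, os.SEEK_END)    # required before the next write
    #   fh.write(entry)
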
    def addgroup(self, cg, linkmapper, transaction, addrevisioncb=None):
        """
        add a delta group

        given a set of deltas, add them to the revision log. the first delta
        is against its parent, which should be in our log; the rest are
        against the previous delta.

        If ``addrevisioncb`` is defined, it will be called with arguments of
        this revlog and the node that was added.
        """

        # track the base of the current delta log
        content = []
        node = None

        r = len(self)
        end = 0
        if r:
            end = self.end(r - 1)
        ifh = self.opener(self.indexfile, "a+", checkambig=self._checkambig)
        isize = r * self._io.size
        if self._inline:
            transaction.add(self.indexfile, end + isize, r)
            dfh = None
        else:
            transaction.add(self.indexfile, isize, r)
            transaction.add(self.datafile, end)
            dfh = self.opener(self.datafile, "a+")
        def flush():
            if dfh:
                dfh.flush()
            ifh.flush()
        try:
            # loop through our set of deltas
            chain = None
            for chunkdata in iter(lambda: cg.deltachunk(chain), {}):
                node = chunkdata['node']
                p1 = chunkdata['p1']
                p2 = chunkdata['p2']
                cs = chunkdata['cs']
                deltabase = chunkdata['deltabase']
                delta = chunkdata['delta']
                flags = chunkdata['flags'] or REVIDX_DEFAULT_FLAGS

                content.append(node)

                link = linkmapper(cs)
                if node in self.nodemap:
                    # this can happen if two branches make the same change
                    chain = node
                    continue

                for p in (p1, p2):
                    if p not in self.nodemap:
                        raise LookupError(p, self.indexfile,
                                          _('unknown parent'))

                if deltabase not in self.nodemap:
                    raise LookupError(deltabase, self.indexfile,
                                      _('unknown delta base'))

                baserev = self.rev(deltabase)

                if baserev != nullrev and self.iscensored(baserev):
                    # if base is censored, delta must be full replacement in a
                    # single patch operation
                    hlen = struct.calcsize(">lll")
                    oldlen = self.rawsize(baserev)
                    newlen = len(delta) - hlen
                    if delta[:hlen] != mdiff.replacediffheader(oldlen, newlen):
                        raise error.CensoredBaseError(self.indexfile,
                                                      self.node(baserev))

                if not flags and self._peek_iscensored(baserev, delta, flush):
                    flags |= REVIDX_ISCENSORED

                # We assume consumers of addrevisioncb will want to retrieve
                # the added revision, which will require a call to
                # revision(). revision() will fast path if there is a cache
                # hit. So, we tell _addrevision() to always cache in this case.
                # We're only using addgroup() in the context of changegroup
                # generation so the revision data can always be handled as raw
                # by the flagprocessor.
                chain = self._addrevision(node, None, transaction, link,
                                          p1, p2, flags, (baserev, delta),
                                          ifh, dfh,
                                          alwayscache=bool(addrevisioncb))

                if addrevisioncb:
                    addrevisioncb(self, chain)

                if not dfh and not self._inline:
                    # addrevision switched from inline to conventional
                    # reopen the index
                    ifh.close()
                    dfh = self.opener(self.datafile, "a+")
                    ifh = self.opener(self.indexfile, "a+",
                                      checkambig=self._checkambig)
        finally:
            if dfh:
                dfh.close()
            ifh.close()

        return content

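    # Illustrative sketch (not part of revlog.py): applying a changegroup's
    # delta stream to a filelog. Assumes ``cg`` is a changegroup unpacker
    # positioned at this file's chunks and ``cl`` is the changelog; the
    # linkmapper translates the changeset node stored in each chunk into a
    # local linkrev:
    #
    #   added = fl.addgroup(cg, lambda csnode: cl.rev(csnode), tr)
    #   # ``added`` lists the nodes seen, in delta-stream order
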
    def iscensored(self, rev):
        """Check if a file revision is censored."""
        return False

    def _peek_iscensored(self, baserev, delta, flush):
        """Quickly check if a delta produces a censored revision."""
        return False

    def getstrippoint(self, minlink):
        """find the minimum rev that must be stripped to strip the linkrev

        Returns a tuple containing the minimum rev and a set of all revs that
        have linkrevs that will be broken by this strip.
        """
        brokenrevs = set()
        strippoint = len(self)

        heads = {}
        futurelargelinkrevs = set()
        for head in self.headrevs():
            headlinkrev = self.linkrev(head)
            heads[head] = headlinkrev
            if headlinkrev >= minlink:
                futurelargelinkrevs.add(headlinkrev)

        # This algorithm involves walking down the rev graph, starting at the
        # heads. Since the revs are topologically sorted according to linkrev,
        # once all head linkrevs are below the minlink, we know there are
        # no more revs that could have a linkrev greater than minlink.
        # So we can stop walking.
        while futurelargelinkrevs:
            strippoint -= 1
            linkrev = heads.pop(strippoint)

            if linkrev < minlink:
                brokenrevs.add(strippoint)
            else:
                futurelargelinkrevs.remove(linkrev)

            for p in self.parentrevs(strippoint):
                if p != nullrev:
                    plinkrev = self.linkrev(p)
                    heads[p] = plinkrev
                    if plinkrev >= minlink:
                        futurelargelinkrevs.add(plinkrev)

        return strippoint, brokenrevs

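    # Worked example (illustrative): take revs 0..2 where revs 1 and 2 are
    # both children of rev 0, with linkrevs 0, 2 and 1 respectively. Then
    # getstrippoint(2) returns (1, {2}): rev 1 is the first rev whose linkrev
    # is >= 2, so the revlog is truncated there, and rev 2 lands in
    # ``brokenrevs`` because truncation removes it even though its own
    # linkrev (1) is below minlink.
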
    def strip(self, minlink, transaction):
        """truncate the revlog on the first revision with a linkrev >= minlink

        This function is called when we're stripping revision minlink and
        its descendants from the repository.

        We have to remove all revisions with linkrev >= minlink, because
        the equivalent changelog revisions will be renumbered after the
        strip.

        So we truncate the revlog on the first of these revisions, and
        trust that the caller has saved the revisions that shouldn't be
        removed and that it'll re-add them after this truncation.
        """
        if len(self) == 0:
            return

        rev, _ = self.getstrippoint(minlink)
        if rev == len(self):
            return

        # first truncate the files on disk
        end = self.start(rev)
        if not self._inline:
            transaction.add(self.datafile, end)
            end = rev * self._io.size
        else:
            end += rev * self._io.size

        transaction.add(self.indexfile, end)

        # then reset internal state in memory to forget those revisions
        self._cache = None
        self._chaininfocache = {}
        self._chunkclear()
        for x in xrange(rev, len(self)):
            del self.nodemap[self.node(x)]

        del self.index[rev:-1]

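    # Worked example (illustrative): with the 64-byte version-1 index entry,
    # stripping from rev 3 truncates a separate-data revlog to start(3) data
    # bytes and 3 * 64 index bytes. An inline revlog is a single file in
    # which entries and revision data are interleaved, while ``start``
    # reports offsets as if the data lived in a separate file, so the
    # truncation point is start(3) + 3 * 64 bytes.
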
    def checksize(self):
        """check the consistency of the on-disk index and data file sizes

        Returns a ``(dd, di)`` tuple where ``dd`` is the difference between
        the expected and actual size of the data file and ``di`` is the
        equivalent for the index file; both are 0 for a consistent revlog.
        """
        expected = 0
        if len(self):
            expected = max(0, self.end(len(self) - 1))

        try:
            f = self.opener(self.datafile)
            f.seek(0, 2)
            actual = f.tell()
            f.close()
            dd = actual - expected
        except IOError as inst:
            if inst.errno != errno.ENOENT:
                raise
            dd = 0

        try:
            f = self.opener(self.indexfile)
            f.seek(0, 2)
            actual = f.tell()
            f.close()
            s = self._io.size
            i = max(0, actual // s)
            di = actual - (i * s)
            if self._inline:
                databytes = 0
                for r in self:
                    databytes += max(0, self.length(r))
                dd = 0
                di = actual - len(self) * s - databytes
        except IOError as inst:
            if inst.errno != errno.ENOENT:
                raise
            di = 0

        return (dd, di)

    def files(self):
        res = [self.indexfile]
        if not self._inline:
            res.append(self.datafile)
        return res

    DELTAREUSEALWAYS = 'always'
    DELTAREUSESAMEREVS = 'samerevs'
    DELTAREUSENEVER = 'never'

    DELTAREUSEALL = {'always', 'samerevs', 'never'}

    def clone(self, tr, destrevlog, addrevisioncb=None,
              deltareuse=DELTAREUSESAMEREVS, aggressivemergedeltas=None):
        """Copy this revlog to another, possibly with format changes.

        The destination revlog will contain the same revisions and nodes.
        However, it may not be bit-for-bit identical due to e.g. delta
        encoding differences.

        The ``deltareuse`` argument controls how deltas from the existing
        revlog are preserved in the destination revlog. The argument can have
        the following values:

        DELTAREUSEALWAYS
           Deltas will always be reused (if possible), even if the destination
           revlog would not select the same revisions for the delta. This is
           the fastest mode of operation.
        DELTAREUSESAMEREVS
           Deltas will be reused if the destination revlog would pick the same
           revisions for the delta. This mode strikes a balance between speed
           and optimization.
        DELTAREUSENEVER
           Deltas will never be reused. This is the slowest mode of execution.
           This mode can be used to recompute deltas (e.g. if the diff/delta
           algorithm changes).

        Delta computation can be slow, so the choice of delta reuse policy can
        significantly affect run time.

        The default policy (``DELTAREUSESAMEREVS``) strikes a balance between
        the two extremes. Deltas will be reused if they are appropriate, but
        if a better delta base could be chosen, it will be. This means that if
        you are converting a non-generaldelta revlog to a generaldelta revlog,
        deltas will be recomputed if the delta's parent isn't a parent of the
        revision.

        In addition to the delta policy, the ``aggressivemergedeltas``
        argument controls whether to compute deltas against both parents for
        merges. If not specified, the destination revlog's current setting is
        used.
        """
        if deltareuse not in self.DELTAREUSEALL:
            raise ValueError(_('value for deltareuse invalid: %s') % deltareuse)

        if len(destrevlog):
            raise ValueError(_('destination revlog is not empty'))

        if getattr(self, 'filteredrevs', None):
            raise ValueError(_('source revlog has filtered revisions'))
        if getattr(destrevlog, 'filteredrevs', None):
            raise ValueError(_('destination revlog has filtered revisions'))

        # lazydeltabase controls whether to reuse a cached delta, if possible.
        oldlazydeltabase = destrevlog._lazydeltabase
        oldamd = destrevlog._aggressivemergedeltas

        try:
            if deltareuse == self.DELTAREUSEALWAYS:
                destrevlog._lazydeltabase = True
            elif deltareuse == self.DELTAREUSESAMEREVS:
                destrevlog._lazydeltabase = False

            destrevlog._aggressivemergedeltas = aggressivemergedeltas or oldamd

            populatecachedelta = deltareuse in (self.DELTAREUSEALWAYS,
                                                self.DELTAREUSESAMEREVS)

            index = self.index
            for rev in self:
                entry = index[rev]

                # Some classes override linkrev to take filtered revs into
                # account. Use raw entry from index.
                flags = entry[0] & 0xffff
                linkrev = entry[4]
                p1 = index[entry[5]][7]
                p2 = index[entry[6]][7]
                node = entry[7]

                # (Possibly) reuse the delta from the revlog if allowed and
                # the revlog chunk is a delta.
                cachedelta = None
                rawtext = None
                if populatecachedelta:
                    dp = self.deltaparent(rev)
                    if dp != nullrev:
                        cachedelta = (dp, str(self._chunk(rev)))

                if not cachedelta:
                    rawtext = self.revision(rev, raw=True)

                ifh = destrevlog.opener(destrevlog.indexfile, 'a+',
                                        checkambig=False)
                dfh = None
                if not destrevlog._inline:
                    dfh = destrevlog.opener(destrevlog.datafile, 'a+')
                try:
                    destrevlog._addrevision(node, rawtext, tr, linkrev, p1, p2,
                                            flags, cachedelta, ifh, dfh)
                finally:
                    if dfh:
                        dfh.close()
                    ifh.close()

                if addrevisioncb:
                    addrevisioncb(self, rev, node)
        finally:
            destrevlog._lazydeltabase = oldlazydeltabase
            destrevlog._aggressivemergedeltas = oldamd
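
    # Illustrative sketch (not part of revlog.py): recomputing deltas while
    # copying into an empty destination revlog, e.g. when converting a
    # non-generaldelta revlog to generaldelta. Assumes ``src`` and ``dst``
    # are open revlogs over the same opener and ``tr`` an active transaction:
    #
    #   src.clone(tr, dst, deltareuse=src.DELTAREUSESAMEREVS)
    #   assert len(dst) == len(src)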