##// END OF EJS Templates
help: fix the display for `hg help internals.revlogs` (issue5227)...
Matt Harbison -
r29094:1111e84d stable
parent child Browse files
Show More
@@ -1,201 +1,201 b''
1 Revisions Logs
2 ==============
1 Revlogs
2 =======
3 3
4 4 Revision logs - or *revlogs* - are an append only data structure for
5 5 storing discrete entries, or *revisions*. They are the primary storage
6 6 mechanism of repository data.
7 7
8 8 Revlogs effectively model a directed acyclic graph (DAG). Each node
9 9 has edges to 1 or 2 *parent* nodes. Each node contains metadata and
10 10 the raw value for that node.
11 11
12 12 Revlogs consist of entries which have metadata and revision data.
13 13 Metadata includes the hash of the revision's content, sizes, and
14 14 links to its *parent* entries. The collective metadata is referred
15 15 to as the *index* and the revision data is the *data*.
16 16
17 17 Revision data is stored as a series of compressed deltas against previous
18 18 revisions.
19 19
20 20 Revlogs are written in an append-only fashion. We never need to rewrite
21 21 a file to insert nor do we need to remove data. Rolling back in-progress
22 22 writes can be performed by truncating files. Read locks can be avoided
23 23 using simple techniques. This means that references to other data in
24 24 the same revlog *always* refer to a previous entry.
25 25
26 26 Revlogs can be modeled as 0-indexed arrays. The first revision is
27 27 revision #0 and the second is revision #1. The revision -1 is typically
28 28 used to mean *does not exist* or *not defined*.
29 29
30 30 File Format
31 31 -----------
32 32
33 33 A revlog begins with a 32-bit big endian integer holding version info
34 34 and feature flags. This integer is shared with the first revision
35 35 entry.
36 36
37 37 This integer is logically divided into 2 16-bit shorts. The least
38 38 significant half of the integer is the format/version short. The other
39 39 short holds feature flags that dictate behavior of the revlog.
40 40
41 41 Only 1 bit of the format/version short is currently used. Remaining
42 42 bits are reserved for future use.
43 43
44 44 The following values for the format/version short are defined:
45 45
46 46 0
47 47 The original revlog version.
48 48 1
49 49 RevlogNG (*next generation*). It replaced version 0 when it was
50 50 implemented in 2006.
51 51
52 52 The feature flags short consists of bit flags. Where 0 is the least
53 53 significant bit, the following bit offsets define flags:
54 54
55 55 0
56 56 Store revision data inline.
57 57 1
58 58 Generaldelta encoding.
59 59
60 60 2-15
61 61 Reserved for future use.
62 62
63 63 The following header values are common:
64 64
65 65 00 00 00 01
66 66 RevlogNG
67 67 00 01 00 01
68 68 RevlogNG + inline
69 69 00 02 00 01
70 70 RevlogNG + generaldelta
71 71 00 03 00 01
72 72 RevlogNG + inline + generaldelta
73 73
74 74 Following the 32-bit header is the remainder of the first index entry.
75 75 Following that are remaining *index* data. Inlined revision data is
76 76 possibly located between index entries. More on this layout is described
77 77 below.
78 78
79 79 RevlogNG Format
80 80 ---------------
81 81
82 82 RevlogNG (version 1) begins with an index describing the revisions in
83 83 the revlog. If the ``inline`` flag is set, revision data is stored inline,
84 84 or between index entries (as opposed to in a separate container).
85 85
86 86 Each index entry is 64 bytes. The byte layout of each entry is as
87 87 follows, with byte 0 being the first byte (all data stored as big endian):
88 88
89 89 0-3 (4 bytes) (rev 0 only)
90 90 Revlog header
91 91 0-5 (6 bytes)
92 92 Absolute offset of revision data from beginning of revlog.
93 93 6-7 (2 bytes)
94 94 Bit flags impacting revision behavior.
95 95 8-11 (4 bytes)
96 96 Compressed length of revision data / chunk as stored in revlog.
97 97 12-15 (4 bytes)
98 98 Uncompressed length of revision data / chunk.
99 99 16-19 (4 bytes)
100 100 Base or previous revision this revision's delta was produced against.
101 101 -1 means this revision holds full text (as opposed to a delta).
102 102 For generaldelta repos, this is the previous revision in the delta
103 103 chain. For non-generaldelta repos, this is the base or first
104 104 revision in the delta chain.
105 105 20-23 (4 bytes)
106 106 A revision this revision is *linked* to. This allows a revision in
107 107 one revlog to be forever associated with a revision in another
108 108 revlog. For example, a file's revlog may point to the changelog
109 109 revision that introduced it.
110 110 24-27 (4 bytes)
111 111 Revision of 1st parent. -1 indicates no parent.
112 112 28-31 (4 bytes)
113 113 Revision of 2nd parent. -1 indicates no 2nd parent.
114 114 32-63 (32 bytes)
115 115 Hash of revision's full text. Currently, SHA-1 is used and only
116 116 the first 20 bytes of this field are used. The rest of the bytes
117 117 are ignored and should be stored as \0.
118 118
119 119 If inline revision data is being stored, the compressed revision data
120 120 (of length from bytes offset 8-11 from the index entry) immediately
121 121 follows the index entry. There is no header on the revision data. There
122 122 is no padding between it and the index entries before and after.
123 123
124 124 If revision data is not inline, then raw revision data is stored in a
125 125 separate byte container. The offsets from bytes 0-5 and the compressed
126 126 length from bytes 8-11 define how to access this data.
127 127
128 128 The first 4 bytes of the revlog are shared between the revlog header
129 129 and the 6 byte absolute offset field from the first revlog entry.
130 130
131 131 Delta Chains
132 132 ------------
133 133
134 134 Revision data is encoded as a chain of *chunks*. Each chain begins with
135 135 the compressed original full text for that revision. Each subsequent
136 136 *chunk* is a *delta* against the previous revision. We therefore call
137 137 these chains of chunks/deltas *delta chains*.
138 138
139 139 The full text for a revision is reconstructed by loading the original
140 140 full text for the base revision of a *delta chain* and then applying
141 141 *deltas* until the target revision is reconstructed.
142 142
143 143 *Delta chains* are limited in length so lookup time is bound. They are
144 144 limited to ~2x the length of the revision's data. The linear distance
145 145 between the base chunk and the final chunk is also limited so the
146 146 amount of read I/O to load all chunks in the delta chain is bound.
147 147
148 148 Deltas and delta chains are either computed against the previous
149 149 revision in the revlog or another revision (almost certainly one of
150 150 the parents of the revision). Historically, deltas were computed against
151 151 the previous revision. The *generaldelta* revlog feature flag (enabled
152 152 by default in Mercurial 3.7) activates the mode where deltas are
153 153 computed against an arbitrary revision (almost certainly a parent revision).
154 154
155 155 File Storage
156 156 ------------
157 157
158 158 Revlogs logically consist of an index (metadata of entries) and
159 159 revision data. This data may be stored together in a single file or in
160 160 separate files. The mechanism used is indicated by the ``inline`` feature
161 161 flag on the revlog.
162 162
163 163 Mercurial's behavior is to use inline storage until a revlog reaches a
164 164 certain size, at which point it will be converted to non-inline. The
165 165 reason there is a size limit on inline storage is to establish an upper
166 166 bound on how much data must be read to load the index. It would be a waste
167 167 to read tens or hundreds of extra megabytes of data just to access the
168 168 index data.
169 169
170 170 The actual layout of revlog files on disk is governed by the repository's
171 171 *store format*. Typically, a ``.i`` file represents the index revlog
172 172 (possibly containing inline data) and a ``.d`` file holds the revision data.
173 173
174 174 Revision Entries
175 175 ----------------
176 176
177 177 Revision entries consist of an optional 1 byte header followed by an
178 178 encoding of the revision data. The headers are as follows:
179 179
180 180 \0 (0x00)
181 181 Revision data is the entirety of the entry, including this header.
182 182 u (0x75)
183 183 Raw revision data follows.
184 184 x (0x78)
185 185 zlib (RFC 1950) data.
186 186
187 187 The 0x78 value is actually the first byte of the zlib header (CMF byte).
188 188
189 189 Hash Computation
190 190 ----------------
191 191
192 192 The hash of the revision is stored in the index and is used both as a primary
193 193 key and for data integrity verification.
194 194
195 195 Currently, SHA-1 is the only supported hashing algorithm. To obtain the SHA-1
196 196 hash of a revision:
197 197
198 198 1. Hash the parent nodes
199 199 2. Hash the fulltext of the revision
200 200
201 201 The 20 byte node ids of the parents are fed into the hasher in ascending order.
General Comments 0
You need to be logged in to leave comments. Login now