##// END OF EJS Templates
Minor corrections
mpm@selenic.com -
r59:2bff7c0e default
parent child Browse files
Show More
@@ -1,158 +1,158 b''
1 Some notes about Mercurial's design
1 Some notes about Mercurial's design
2
2
3 Revlogs:
3 Revlogs:
4
4
5 The fundamental storage type in Mercurial is a "revlog". A revlog is
5 The fundamental storage type in Mercurial is a "revlog". A revlog is
6 the set of all revisions to a file. Each revision is either stored
6 the set of all revisions to a file. Each revision is either stored
7 compressed in its entirety or as a compressed binary delta against the
7 compressed in its entirety or as a compressed binary delta against the
8 previous version. The decision of when to store a full version is made
8 previous version. The decision of when to store a full version is made
9 based on how much data would be needed to reconstruct the file. This
9 based on how much data would be needed to reconstruct the file. This
10 lets us ensure that we never need to read huge amounts of data to
10 lets us ensure that we never need to read huge amounts of data to
11 reconstruct a file, regardless of how many revisions of it we store.
11 reconstruct a file, regardless of how many revisions of it we store.
12
12
13 In fact, we should always be able to do it with a single read,
13 In fact, we should always be able to do it with a single read,
14 provided we know when and where to read. This is where the index comes
14 provided we know when and where to read. This is where the index comes
15 in. Each revlog has an index containing a special hash (nodeid) of the
15 in. Each revlog has an index containing a special hash (nodeid) of the
16 text, hashes for its parents, and where and how much of the revlog
16 text, hashes for its parents, and where and how much of the revlog
17 data we need to read to reconstruct it. Thus, with one read of the
17 data we need to read to reconstruct it. Thus, with one read of the
18 index and one read of the data, we can reconstruct any version in time
18 index and one read of the data, we can reconstruct any version in time
19 proportional to the file size.
19 proportional to the file size.
20
20
21 Similarly, revlogs and their indices are append-only. This means that
21 Similarly, revlogs and their indices are append-only. This means that
22 adding a new version is also O(1) seeks.
22 adding a new version is also O(1) seeks.
23
23
24 Generally revlogs are used to represent revisions of files, but they
24 Generally revlogs are used to represent revisions of files, but they
25 also are used to represent manifests and changesets.
25 also are used to represent manifests and changesets.
26
26
27 Manifests:
27 Manifests:
28
28
29 A manifest is simply a list of all files in a given revision of a
29 A manifest is simply a list of all files in a given revision of a
30 project along with the nodeids of the corresponding file revisions. So
30 project along with the nodeids of the corresponding file revisions. So
31 grabbing a given version of the project means simply looking up its
31 grabbing a given version of the project means simply looking up its
32 manifest and reconstruction all the file revisions pointed to by it.
32 manifest and reconstruction all the file revisions pointed to by it.
33
33
34 Changesets:
34 Changesets:
35
35
36 A changeset is a list of all files changed in a check-in along with a
36 A changeset is a list of all files changed in a check-in along with a
37 change description and some metadata like user and date. It also
37 change description and some metadata like user and date. It also
38 contains a nodeid to the relevent revision of the manifest. Changesets
38 contains a nodeid to the relevent revision of the manifest. Changesets
39 and manifests are one-to-one, but contain different data for
39 and manifests are one-to-one, but contain different data for
40 convenience.
40 convenience.
41
41
42 Nodeids:
42 Nodeids:
43
43
44 Nodeids are unique ids that are used to represent the contents of a
44 Nodeids are unique ids that are used to represent the contents of a
45 file AND its position in the project history. That is, if you change a
45 file AND its position in the project history. That is, if you change a
46 file and then change it back, the result will have a different nodeid
46 file and then change it back, the result will have a different nodeid
47 because it has different history. This is accomplished by including
47 because it has different history. This is accomplished by including
48 the parents of a given revision's nodeids with the revision's text
48 the parents of a given revision's nodeids with the revision's text
49 when calculating the hash.
49 when calculating the hash.
50
50
51 Graph merging:
51 Graph merging:
52
52
53 Nodeids are implemented as they are to simplify merging. Merging a
53 Nodeids are implemented as they are to simplify merging. Merging a
54 pair of directed acyclic graphs (aka "the family tree" of the file
54 pair of directed acyclic graphs (aka "the family tree" of the file
55 history) requires some method of determining if nodes in different
55 history) requires some method of determining if nodes in different
56 graphs correspond. Simply comparing the contents of the node (by
56 graphs correspond. Simply comparing the contents of the node (by
57 comparing text of given revisions or their hashes) can get confused by
57 comparing text of given revisions or their hashes) can get confused by
58 identical revisions in the tree.
58 identical revisions in the tree.
59
59
60 The nodeid approach makes it trivial - the hash uniquely describes a
60 The nodeid approach makes it trivial - the hash uniquely describes a
61 revision's contents and its graph position relative to the root, so
61 revision's contents and its graph position relative to the root, so
62 merge is simply checking whether each nodeid in graph A is in the hash
62 merge is simply checking whether each nodeid in graph A is in the hash
63 table of graph B. If not, we pull them in, adding them sequentially to
63 table of graph B. If not, we pull them in, adding them sequentially to
64 the revlog.
64 the revlog.
65
65
66 Graph resolving:
66 Graph resolving:
67
67
68 Mercurial does branching by copying (or COWing) a repository and thus
68 Mercurial does branching by copying (or COWing) a repository and thus
69 keeps everything nice and linear within a repository. However, when a
69 keeps everything nice and linear within a repository. However, when a
70 merge of repositories (a "pull") is done, we may often have two head
70 merge of repositories (a "pull") is done, we may often have two head
71 revisions in a given graph. To keep things simple, Mercurial forces
71 revisions in a given graph. To keep things simple, Mercurial forces
72 the head revisions to be merged.
72 the head revisions to be merged.
73
73
74 It first finds the closest common ancestor of the two heads. If one is
74 It first finds the closest common ancestor of the two heads. If one is
75 a child of the other, it becomes the new head. Otherwise, we call out
75 a child of the other, it becomes the new head. Otherwise, we call out
76 to a user-specified 3-way merge tool.
76 to a user-specified 3-way merge tool.
77
77
78 Merging files, manifests, and changesets:
78 Merging files, manifests, and changesets:
79
79
80 We begin by comparing changeset DAGs, pulling all nodes we don't have
80 We begin by comparing changeset DAGs, pulling all nodes we don't have
81 in our DAG from the other repository. As we do so, we collect a list
81 in our DAG from the other repository. As we do so, we collect a list
82 of changed files to merge.
82 of changed files to merge.
83
83
84 Then for each file, we perform a graph merge and resolve as above.
84 Then for each file, we perform a graph merge and resolve as above.
85 It's important to merge files using per-file DAGs rather than just
85 It's important to merge files using per-file DAGs rather than just
86 changeset level DAGs as this diagram illustrates:
86 changeset level DAGs as this diagram illustrates:
87
87
88 M M1 M2
88 M M1 M2
89
89
90 AB
90 AB
91 |`-------v M2 clones M
91 |`-------v M2 clones M
92 aB AB file A is change in mainline
92 aB AB file A is change in mainline
93 |`---v AB' file B is changed in M2
93 |`---v AB' file B is changed in M2
94 | aB / | M1 clones M
94 | aB / | M1 clones M
95 | ab/ | M1 changes B
95 | ab/ | M1 changes B
96 | ab' | M1 merges from M2, changes to B conflict
96 | ab' | M1 merges from M2, changes to B conflict
97 | | A'B' M2 changes A
97 | | A'B' M2 changes A
98 `---+--.|
98 `---+--.|
99 | a'B' M2 merges from mainline, changes to A conflict
99 | a'B' M2 merges from mainline, changes to A conflict
100 `--.|
100 `--.|
101 ??? depending on which ancestor we choose, we will have
101 ??? depending on which ancestor we choose, we will have
102 to redo A hand-merge, B hand-merge, or both
102 to redo A hand-merge, B hand-merge, or both
103 but if we look at the files independently, everything
103 but if we look at the files independently, everything
104 is fine
104 is fine
105
105
106 After we've merged files, we merge the manifest log DAG and resolve
106 After we've merged files, we merge the manifest log DAG and resolve
107 additions and deletions. Then we are ready to resolve the changeset
107 additions and deletions. Then we are ready to resolve the changeset
108 DAG - if our merge required any changes (the new head is not a
108 DAG - if our merge required any changes (the new head is not a
109 decendent of our tip), we must create a new changeset describing all
109 decendent of our tip), we must create a new changeset describing all
110 of the changes needed to merge it into the tip.
110 of the changes needed to merge it into the tip.
111
111
112 Merge performance:
112 Merge performance:
113
113
114 The I/O operations for performing a merge are O(changed files), not
114 The I/O operations for performing a merge are O(changed files), not
115 O(total changes) and in many cases, we needn't even unpack the deltas
115 O(total changes) and in many cases, we needn't even unpack the deltas
116 to add them to our repository (though this optimization isn't
116 to add them to our repository (though this optimization isn't
117 necessary).
117 necessary).
118
118
119 Rollback:
119 Rollback:
120
120
121 When performing a commit or a merge, we order things so that the
121 When performing a commit or a merge, we order things so that the
122 changeset entry gets added last. We keep a transaction log of the name
122 changeset entry gets added last. We keep a transaction log of the name
123 of each file touched and its length prior to the transaction. On
123 of each file touched and its length prior to the transaction. On
124 abort, we simply truncate each file to its prior length. This is one
124 abort, we simply truncate each file to its prior length. This is one
125 of the nice properties of the append-only structure of the revlogs.
125 of the nice properties of the append-only structure of the revlogs.
126
126
127 Remote access:
127 Remote access:
128
128
129 Mercurial currently supports pulling from "serverless" repositories.
129 Mercurial currently supports pulling from "serverless" repositories.
130 Simply making the repo directory accessibly via the web and pointing
130 Simply making the repo directory accessibly via the web and pointing
131 hg at it can accomplish a pull. This is relatively bandwidth efficient
131 hg at it can accomplish a pull. This is relatively bandwidth efficient
132 but no effort has been spent on pipelining, so it won't work
132 but no effort has been spent on pipelining, so it won't work
133 especially well over LAN yet.
133 especially well over LAN yet.
134
134
135 It's also quite amenable to rsync, if you don't mind keeping an intact
135 It's also quite amenable to rsync, if you don't mind keeping an intact
136 copy of the master around locally.
136 copy of the master around locally.
137
137
138 Also note the append-only and ordering properties of the commit
138 Also note the append-only and ordering properties of the commit
139 guarantee that readers will always see a repository in a consistent
139 guarantee that readers will always see a repository in a consistent
140 state and no special locking is necessary. As there is generally only
140 state and no special locking is necessary. As there is generally only
141 one writer to an hg repository, there is in fact no exclusion
141 one writer to an hg repository, there is in fact no exclusion
142 implemented yet.
142 implemented yet.
143
143
144
144
145 Some comparisons to git:
145 Some comparisons to git:
146
146
147 Most notably, Mercurial uses delta compression and repositories
147 Most notably, Mercurial uses delta compression and repositories
148 created with it will grow much more slowly over time. This also allows
148 created with it will grow much more slowly over time. This also allows
149 it to be much more bandwidth efficient. I expect repos sizes and sync
149 it to be much more bandwidth efficient. I expect repos sizes and sync
150 speeds to be similar to or better than BK, given the use of binary diffs.
150 speeds to be similar to or better than BK, given the use of binary diffs.
151
151
152 Mercurial is roughly the same performance as git and is faster in
152 Mercurial is roughly the same performance as git in some areas and is
153 others as it keeps around more metadata. One example is listing and
153 faster in others as it keeps around more metadata. One example is
154 retrieving past versions of a file, which it can do without reading
154 listing and retrieving past versions of a file, which it can do
155 all the changesets. This metadata will also allow it to perform better
155 without reading all the changesets. This metadata will also allow it
156 merges as described above.
156 to perform better merges as described above.
157
157
158
158
General Comments 0
You need to be logged in to leave comments. Login now