Show More
@@ -1,158 +1,158 b'' | |||||
1 | Some notes about Mercurial's design |
|
1 | Some notes about Mercurial's design | |
2 |
|
2 | |||
3 | Revlogs: |
|
3 | Revlogs: | |
4 |
|
4 | |||
5 | The fundamental storage type in Mercurial is a "revlog". A revlog is |
|
5 | The fundamental storage type in Mercurial is a "revlog". A revlog is | |
6 | the set of all revisions to a file. Each revision is either stored |
|
6 | the set of all revisions to a file. Each revision is either stored | |
7 | compressed in its entirety or as a compressed binary delta against the |
|
7 | compressed in its entirety or as a compressed binary delta against the | |
8 | previous version. The decision of when to store a full version is made |
|
8 | previous version. The decision of when to store a full version is made | |
9 | based on how much data would be needed to reconstruct the file. This |
|
9 | based on how much data would be needed to reconstruct the file. This | |
10 | lets us ensure that we never need to read huge amounts of data to |
|
10 | lets us ensure that we never need to read huge amounts of data to | |
11 | reconstruct a file, regardless of how many revisions of it we store. |
|
11 | reconstruct a file, regardless of how many revisions of it we store. | |
12 |
|
12 | |||
13 | In fact, we should always be able to do it with a single read, |
|
13 | In fact, we should always be able to do it with a single read, | |
14 | provided we know when and where to read. This is where the index comes |
|
14 | provided we know when and where to read. This is where the index comes | |
15 | in. Each revlog has an index containing a special hash (nodeid) of the |
|
15 | in. Each revlog has an index containing a special hash (nodeid) of the | |
16 | text, hashes for its parents, and where and how much of the revlog |
|
16 | text, hashes for its parents, and where and how much of the revlog | |
17 | data we need to read to reconstruct it. Thus, with one read of the |
|
17 | data we need to read to reconstruct it. Thus, with one read of the | |
18 | index and one read of the data, we can reconstruct any version in time |
|
18 | index and one read of the data, we can reconstruct any version in time | |
19 | proportional to the file size. |
|
19 | proportional to the file size. | |
20 |
|
20 | |||
21 | Similarly, revlogs and their indices are append-only. This means that |
|
21 | Similarly, revlogs and their indices are append-only. This means that | |
22 | adding a new version is also O(1) seeks. |
|
22 | adding a new version is also O(1) seeks. | |
23 |
|
23 | |||
24 | Generally revlogs are used to represent revisions of files, but they |
|
24 | Generally revlogs are used to represent revisions of files, but they | |
25 | also are used to represent manifests and changesets. |
|
25 | also are used to represent manifests and changesets. | |
26 |
|
26 | |||
27 | Manifests: |
|
27 | Manifests: | |
28 |
|
28 | |||
29 | A manifest is simply a list of all files in a given revision of a |
|
29 | A manifest is simply a list of all files in a given revision of a | |
30 | project along with the nodeids of the corresponding file revisions. So |
|
30 | project along with the nodeids of the corresponding file revisions. So | |
31 | grabbing a given version of the project means simply looking up its |
|
31 | grabbing a given version of the project means simply looking up its | |
32 | manifest and reconstruction all the file revisions pointed to by it. |
|
32 | manifest and reconstruction all the file revisions pointed to by it. | |
33 |
|
33 | |||
34 | Changesets: |
|
34 | Changesets: | |
35 |
|
35 | |||
36 | A changeset is a list of all files changed in a check-in along with a |
|
36 | A changeset is a list of all files changed in a check-in along with a | |
37 | change description and some metadata like user and date. It also |
|
37 | change description and some metadata like user and date. It also | |
38 | contains a nodeid to the relevent revision of the manifest. Changesets |
|
38 | contains a nodeid to the relevent revision of the manifest. Changesets | |
39 | and manifests are one-to-one, but contain different data for |
|
39 | and manifests are one-to-one, but contain different data for | |
40 | convenience. |
|
40 | convenience. | |
41 |
|
41 | |||
42 | Nodeids: |
|
42 | Nodeids: | |
43 |
|
43 | |||
44 | Nodeids are unique ids that are used to represent the contents of a |
|
44 | Nodeids are unique ids that are used to represent the contents of a | |
45 | file AND its position in the project history. That is, if you change a |
|
45 | file AND its position in the project history. That is, if you change a | |
46 | file and then change it back, the result will have a different nodeid |
|
46 | file and then change it back, the result will have a different nodeid | |
47 | because it has different history. This is accomplished by including |
|
47 | because it has different history. This is accomplished by including | |
48 | the parents of a given revision's nodeids with the revision's text |
|
48 | the parents of a given revision's nodeids with the revision's text | |
49 | when calculating the hash. |
|
49 | when calculating the hash. | |
50 |
|
50 | |||
51 | Graph merging: |
|
51 | Graph merging: | |
52 |
|
52 | |||
53 | Nodeids are implemented as they are to simplify merging. Merging a |
|
53 | Nodeids are implemented as they are to simplify merging. Merging a | |
54 | pair of directed acyclic graphs (aka "the family tree" of the file |
|
54 | pair of directed acyclic graphs (aka "the family tree" of the file | |
55 | history) requires some method of determining if nodes in different |
|
55 | history) requires some method of determining if nodes in different | |
56 | graphs correspond. Simply comparing the contents of the node (by |
|
56 | graphs correspond. Simply comparing the contents of the node (by | |
57 | comparing text of given revisions or their hashes) can get confused by |
|
57 | comparing text of given revisions or their hashes) can get confused by | |
58 | identical revisions in the tree. |
|
58 | identical revisions in the tree. | |
59 |
|
59 | |||
60 | The nodeid approach makes it trivial - the hash uniquely describes a |
|
60 | The nodeid approach makes it trivial - the hash uniquely describes a | |
61 | revision's contents and its graph position relative to the root, so |
|
61 | revision's contents and its graph position relative to the root, so | |
62 | merge is simply checking whether each nodeid in graph A is in the hash |
|
62 | merge is simply checking whether each nodeid in graph A is in the hash | |
63 | table of graph B. If not, we pull them in, adding them sequentially to |
|
63 | table of graph B. If not, we pull them in, adding them sequentially to | |
64 | the revlog. |
|
64 | the revlog. | |
65 |
|
65 | |||
66 | Graph resolving: |
|
66 | Graph resolving: | |
67 |
|
67 | |||
68 | Mercurial does branching by copying (or COWing) a repository and thus |
|
68 | Mercurial does branching by copying (or COWing) a repository and thus | |
69 | keeps everything nice and linear within a repository. However, when a |
|
69 | keeps everything nice and linear within a repository. However, when a | |
70 | merge of repositories (a "pull") is done, we may often have two head |
|
70 | merge of repositories (a "pull") is done, we may often have two head | |
71 | revisions in a given graph. To keep things simple, Mercurial forces |
|
71 | revisions in a given graph. To keep things simple, Mercurial forces | |
72 | the head revisions to be merged. |
|
72 | the head revisions to be merged. | |
73 |
|
73 | |||
74 | It first finds the closest common ancestor of the two heads. If one is |
|
74 | It first finds the closest common ancestor of the two heads. If one is | |
75 | a child of the other, it becomes the new head. Otherwise, we call out |
|
75 | a child of the other, it becomes the new head. Otherwise, we call out | |
76 | to a user-specified 3-way merge tool. |
|
76 | to a user-specified 3-way merge tool. | |
77 |
|
77 | |||
78 | Merging files, manifests, and changesets: |
|
78 | Merging files, manifests, and changesets: | |
79 |
|
79 | |||
80 | We begin by comparing changeset DAGs, pulling all nodes we don't have |
|
80 | We begin by comparing changeset DAGs, pulling all nodes we don't have | |
81 | in our DAG from the other repository. As we do so, we collect a list |
|
81 | in our DAG from the other repository. As we do so, we collect a list | |
82 | of changed files to merge. |
|
82 | of changed files to merge. | |
83 |
|
83 | |||
84 | Then for each file, we perform a graph merge and resolve as above. |
|
84 | Then for each file, we perform a graph merge and resolve as above. | |
85 | It's important to merge files using per-file DAGs rather than just |
|
85 | It's important to merge files using per-file DAGs rather than just | |
86 | changeset level DAGs as this diagram illustrates: |
|
86 | changeset level DAGs as this diagram illustrates: | |
87 |
|
87 | |||
88 | M M1 M2 |
|
88 | M M1 M2 | |
89 |
|
89 | |||
90 | AB |
|
90 | AB | |
91 | |`-------v M2 clones M |
|
91 | |`-------v M2 clones M | |
92 | aB AB file A is change in mainline |
|
92 | aB AB file A is change in mainline | |
93 | |`---v AB' file B is changed in M2 |
|
93 | |`---v AB' file B is changed in M2 | |
94 | | aB / | M1 clones M |
|
94 | | aB / | M1 clones M | |
95 | | ab/ | M1 changes B |
|
95 | | ab/ | M1 changes B | |
96 | | ab' | M1 merges from M2, changes to B conflict |
|
96 | | ab' | M1 merges from M2, changes to B conflict | |
97 | | | A'B' M2 changes A |
|
97 | | | A'B' M2 changes A | |
98 | `---+--.| |
|
98 | `---+--.| | |
99 | | a'B' M2 merges from mainline, changes to A conflict |
|
99 | | a'B' M2 merges from mainline, changes to A conflict | |
100 | `--.| |
|
100 | `--.| | |
101 | ??? depending on which ancestor we choose, we will have |
|
101 | ??? depending on which ancestor we choose, we will have | |
102 | to redo A hand-merge, B hand-merge, or both |
|
102 | to redo A hand-merge, B hand-merge, or both | |
103 | but if we look at the files independently, everything |
|
103 | but if we look at the files independently, everything | |
104 | is fine |
|
104 | is fine | |
105 |
|
105 | |||
106 | After we've merged files, we merge the manifest log DAG and resolve |
|
106 | After we've merged files, we merge the manifest log DAG and resolve | |
107 | additions and deletions. Then we are ready to resolve the changeset |
|
107 | additions and deletions. Then we are ready to resolve the changeset | |
108 | DAG - if our merge required any changes (the new head is not a |
|
108 | DAG - if our merge required any changes (the new head is not a | |
109 | decendent of our tip), we must create a new changeset describing all |
|
109 | decendent of our tip), we must create a new changeset describing all | |
110 | of the changes needed to merge it into the tip. |
|
110 | of the changes needed to merge it into the tip. | |
111 |
|
111 | |||
112 | Merge performance: |
|
112 | Merge performance: | |
113 |
|
113 | |||
114 | The I/O operations for performing a merge are O(changed files), not |
|
114 | The I/O operations for performing a merge are O(changed files), not | |
115 | O(total changes) and in many cases, we needn't even unpack the deltas |
|
115 | O(total changes) and in many cases, we needn't even unpack the deltas | |
116 | to add them to our repository (though this optimization isn't |
|
116 | to add them to our repository (though this optimization isn't | |
117 | necessary). |
|
117 | necessary). | |
118 |
|
118 | |||
119 | Rollback: |
|
119 | Rollback: | |
120 |
|
120 | |||
121 | When performing a commit or a merge, we order things so that the |
|
121 | When performing a commit or a merge, we order things so that the | |
122 | changeset entry gets added last. We keep a transaction log of the name |
|
122 | changeset entry gets added last. We keep a transaction log of the name | |
123 | of each file touched and its length prior to the transaction. On |
|
123 | of each file touched and its length prior to the transaction. On | |
124 | abort, we simply truncate each file to its prior length. This is one |
|
124 | abort, we simply truncate each file to its prior length. This is one | |
125 | of the nice properties of the append-only structure of the revlogs. |
|
125 | of the nice properties of the append-only structure of the revlogs. | |
126 |
|
126 | |||
127 | Remote access: |
|
127 | Remote access: | |
128 |
|
128 | |||
129 | Mercurial currently supports pulling from "serverless" repositories. |
|
129 | Mercurial currently supports pulling from "serverless" repositories. | |
130 | Simply making the repo directory accessibly via the web and pointing |
|
130 | Simply making the repo directory accessibly via the web and pointing | |
131 | hg at it can accomplish a pull. This is relatively bandwidth efficient |
|
131 | hg at it can accomplish a pull. This is relatively bandwidth efficient | |
132 | but no effort has been spent on pipelining, so it won't work |
|
132 | but no effort has been spent on pipelining, so it won't work | |
133 | especially well over LAN yet. |
|
133 | especially well over LAN yet. | |
134 |
|
134 | |||
135 | It's also quite amenable to rsync, if you don't mind keeping an intact |
|
135 | It's also quite amenable to rsync, if you don't mind keeping an intact | |
136 | copy of the master around locally. |
|
136 | copy of the master around locally. | |
137 |
|
137 | |||
138 | Also note the append-only and ordering properties of the commit |
|
138 | Also note the append-only and ordering properties of the commit | |
139 | guarantee that readers will always see a repository in a consistent |
|
139 | guarantee that readers will always see a repository in a consistent | |
140 | state and no special locking is necessary. As there is generally only |
|
140 | state and no special locking is necessary. As there is generally only | |
141 | one writer to an hg repository, there is in fact no exclusion |
|
141 | one writer to an hg repository, there is in fact no exclusion | |
142 | implemented yet. |
|
142 | implemented yet. | |
143 |
|
143 | |||
144 |
|
144 | |||
145 | Some comparisons to git: |
|
145 | Some comparisons to git: | |
146 |
|
146 | |||
147 | Most notably, Mercurial uses delta compression and repositories |
|
147 | Most notably, Mercurial uses delta compression and repositories | |
148 | created with it will grow much more slowly over time. This also allows |
|
148 | created with it will grow much more slowly over time. This also allows | |
149 | it to be much more bandwidth efficient. I expect repos sizes and sync |
|
149 | it to be much more bandwidth efficient. I expect repos sizes and sync | |
150 | speeds to be similar to or better than BK, given the use of binary diffs. |
|
150 | speeds to be similar to or better than BK, given the use of binary diffs. | |
151 |
|
151 | |||
152 |
Mercurial is roughly the same performance as git |
|
152 | Mercurial is roughly the same performance as git in some areas and is | |
153 |
others as it keeps around more metadata. One example is |
|
153 | faster in others as it keeps around more metadata. One example is | |
154 |
retrieving past versions of a file, which it can do |
|
154 | listing and retrieving past versions of a file, which it can do | |
155 |
all the changesets. This metadata will also allow it |
|
155 | without reading all the changesets. This metadata will also allow it | |
156 | merges as described above. |
|
156 | to perform better merges as described above. | |
157 |
|
157 | |||
158 |
|
158 |
General Comments 0
You need to be logged in to leave comments.
Login now