notes.txt
146 lines
| 6.1 KiB
| text/plain
|
TextLexer
mpm@selenic.com
|
r0 | Some notes about Mercurial's design | ||
Revlogs: | ||||
The fundamental storage type in Mercurial is a "revlog". A revlog is | ||||
the set of all revisions to a file. Each revision is either stored | ||||
compressed in its entirety or as a compressed binary delta against the | ||||
previous version. The decision of when to store a full version is made | ||||
based on how much data would be needed to reconstruct the file. This | ||||
lets us ensure that we never need to read huge amounts of data to | ||||
reconstruct a file, regardless of how many revisions of it we store. | ||||
In fact, we should always be able to do it with a single read, | ||||
provided we know when and where to read. This is where the index comes | ||||
in. Each revlog has an index containing a special hash (nodeid) of the | ||||
text, hashes for its parents, and where and how much of the revlog | ||||
data we need to read to reconstruct it. Thus, with one read of the | ||||
index and one read of the data, we can reconstruct any version in time | ||||
proportional to the file size. | ||||
Similarly, revlogs and their indices are append-only. This means that | ||||
adding a new version is also O(1) seeks. | ||||
Generally revlogs are used to represent revisions of files, but they | ||||
also are used to represent manifests and changesets. | ||||
Manifests: | ||||
A manifest is simply a list of all files in a given revision of a | ||||
project along with the nodeids of the corresponding file revisions. So | ||||
grabbing a given version of the project means simply looking up its | ||||
manifest and reconstruction all the file revisions pointed to by it. | ||||
Changesets: | ||||
A changeset is a list of all files changed in a check-in along with a | ||||
change description and some metadata like user and date. It also | ||||
contains a nodeid to the relevent revision of the manifest. Changesets | ||||
and manifests are one-to-one, but contain different data for | ||||
convenience. | ||||
Nodeids: | ||||
Nodeids are unique ids that are used to represent the contents of a | ||||
file AND its position in the project history. That is, if you change a | ||||
file and then change it back, the result will have a different nodeid | ||||
because it has different history. This is accomplished by including | ||||
the parents of a given revision's nodeids with the revision's text | ||||
when calculating the hash. | ||||
Graph merging: | ||||
Nodeids are implemented as they are to simplify merging. Merging a | ||||
pair of directed acyclic graphs (aka "the family tree" of the file | ||||
history) requires some method of determining if nodes in different | ||||
graphs correspond. Simply comparing the contents of the node (by | ||||
comparing text of given revisions or their hashes) can get confused by | ||||
identical revisions in the tree. | ||||
The nodeid approach makes it trivial - the hash uniquely describes a | ||||
revision's contents and its graph position relative to the root, so | ||||
merge is simply checking whether each nodeid in graph A is in the hash | ||||
table of graph B. If not, we pull them in, adding them sequentially to | ||||
the revlog. | ||||
mpm@selenic.com
|
r260 | Branching and merging: | ||
mpm@selenic.com
|
r0 | |||
mpm@selenic.com
|
r260 | Everything in Mercurial is potentially a branch and every user | ||
effectively works in their own branch. When you do a checkout, | ||||
Mercurial remembers what the parent changeset was and uses it for the | ||||
next check in. | ||||
To do a merge of branches in Mercurial, you check out the heads of the | ||||
two branches into the same working directory which causes a merge to | ||||
be performed, and then check in the result once you're happy with it. | ||||
The resulting checkin will have two parents. | ||||
mpm@selenic.com
|
r0 | |||
mpm@selenic.com
|
r260 | It decides when a merge is necessary by first determining if there are | ||
any uncommitted changes in the working directory. This effectively | ||||
makes the working directory a branch off the checked in version it's | ||||
based on. Then it also determines if the working directory is a direct | ||||
ancestor or descendent of the second version we're attempting to | ||||
checkout. If neither is true, we simply replace the working directory | ||||
version with the new version. Otherwise we perform a merge between the | ||||
two versions. | ||||
mpm@selenic.com
|
r0 | |||
mpm@selenic.com
|
r260 | Merging files and manifests: | ||
mpm@selenic.com
|
r0 | |||
mpm@selenic.com
|
r260 | We begin by comparing two versions manifests and deciding which files | ||
need to be added, deleted, and merged. | ||||
mpm@selenic.com
|
r0 | |||
Then for each file, we perform a graph merge and resolve as above. | ||||
It's important to merge files using per-file DAGs rather than just | ||||
Thomas Arendsen Hein
|
r1308 | changeset level DAGs as this diagram illustrates: | ||
mpm@selenic.com
|
r0 | |||
M M1 M2 | ||||
AB | ||||
|`-------v M2 clones M | ||||
aB AB file A is change in mainline | ||||
|`---v AB' file B is changed in M2 | ||||
| aB / | M1 clones M | ||||
| ab/ | M1 changes B | ||||
| ab' | M1 merges from M2, changes to B conflict | ||||
| | A'B' M2 changes A | ||||
`---+--.| | ||||
| a'B' M2 merges from mainline, changes to A conflict | ||||
`--.| | ||||
??? depending on which ancestor we choose, we will have | ||||
to redo A hand-merge, B hand-merge, or both | ||||
but if we look at the files independently, everything | ||||
is fine | ||||
mpm@selenic.com
|
r260 | The result is a merged version in the working directory, waiting for | ||
check-in. | ||||
mpm@selenic.com
|
r0 | |||
Rollback: | ||||
mpm@selenic.com
|
r47 | When performing a commit or a merge, we order things so that the | ||
changeset entry gets added last. We keep a transaction log of the name | ||||
of each file touched and its length prior to the transaction. On | ||||
abort, we simply truncate each file to its prior length. This is one | ||||
of the nice properties of the append-only structure of the revlogs. | ||||
mpm@selenic.com
|
r260 | We can also reuse this journal for "undo". | ||
mpm@selenic.com
|
r0 | |||
mpm@selenic.com
|
r260 | Merging between repositories: | ||
mpm@selenic.com
|
r0 | |||
mpm@selenic.com
|
r260 | One of the key features of Mercurial is the ability to merge between | ||
independent repositories in a decentralized fashion. Each repository | ||||
can act as a read-only server or a client. Clients operating by | ||||
pulling all branches that it hasn't seen from the server and adding | ||||
them into its graph. This is done in two steps: searching for new | ||||
"roots" and pulling a "changegroup" | ||||
mpm@selenic.com
|
r0 | |||
mpm@selenic.com
|
r260 | Searching for new "roots" begins by finding all new heads and | ||
searching backwards from those heads to the first unknown nodes in | ||||
their respective branches. These nodes are the 'roots' that are used | ||||
to calculate the 'changegroup': the set of all changesets starting at | ||||
those roots. Mercurial takes pains to make this search efficient in | ||||
both bandwidth and round-trips. | ||||
mpm@selenic.com
|
r0 | |||
mpm@selenic.com
|
r260 | Once the roots are found, the changegroup can be transferred as a | ||
single streaming transfer. This is organized as an ordered set of | ||||
deltas for changesets, manifests, and files. Large chunks of deltas | ||||
can be directly added to the repository without unpacking so it's | ||||
fairly fast. | ||||