upstream/mercurial-mirror Files · notes.txt

Update version numbers

mpm@selenic.com - - Load All Authors

File last commit:

r0:9117c656 default


                r16:7eca4cfa

0.4d default

Download file

             notes.txt
        
                    159 lines
            
             | 6.4 KiB
            
                | text/plain
            
             |
                TextLexer

/ notes.txt

History | Annotation | Raw |Copy content |Copy permalink

				Some notes about Mercurial's design

				Revlogs:

				The fundamental storage type in Mercurial is a "revlog". A revlog is
				the set of all revisions to a file. Each revision is either stored
				compressed in its entirety or as a compressed binary delta against the
				previous version. The decision of when to store a full version is made
				based on how much data would be needed to reconstruct the file. This
				lets us ensure that we never need to read huge amounts of data to
				reconstruct a file, regardless of how many revisions of it we store.

				In fact, we should always be able to do it with a single read,
				provided we know when and where to read. This is where the index comes
				in. Each revlog has an index containing a special hash (nodeid) of the
				text, hashes for its parents, and where and how much of the revlog
				data we need to read to reconstruct it. Thus, with one read of the
				index and one read of the data, we can reconstruct any version in time
				proportional to the file size.

				Similarly, revlogs and their indices are append-only. This means that
				adding a new version is also O(1) seeks.

				Generally revlogs are used to represent revisions of files, but they
				also are used to represent manifests and changesets.

				Manifests:

				A manifest is simply a list of all files in a given revision of a
				project along with the nodeids of the corresponding file revisions. So
				grabbing a given version of the project means simply looking up its
				manifest and reconstruction all the file revisions pointed to by it.

				Changesets:

				A changeset is a list of all files changed in a check-in along with a
				change description and some metadata like user and date. It also
				contains a nodeid to the relevent revision of the manifest. Changesets
				and manifests are one-to-one, but contain different data for
				convenience.

				Nodeids:

				Nodeids are unique ids that are used to represent the contents of a
				file AND its position in the project history. That is, if you change a
				file and then change it back, the result will have a different nodeid
				because it has different history. This is accomplished by including
				the parents of a given revision's nodeids with the revision's text
				when calculating the hash.

				Graph merging:

				Nodeids are implemented as they are to simplify merging. Merging a
				pair of directed acyclic graphs (aka "the family tree" of the file
				history) requires some method of determining if nodes in different
				graphs correspond. Simply comparing the contents of the node (by
				comparing text of given revisions or their hashes) can get confused by
				identical revisions in the tree.

				The nodeid approach makes it trivial - the hash uniquely describes a
				revision's contents and its graph position relative to the root, so
				merge is simply checking whether each nodeid in graph A is in the hash
				table of graph B. If not, we pull them in, adding them sequentially to
				the revlog.

				Graph resolving:

				Mercurial does branching by copying (or COWing) a repository and thus
				keeps everything nice and linear within a repository. However, when a
				merge of repositories (a "pull") is done, we may often have two head
				revisions in a given graph. To keep things simple, Mercurial forces
				the head revisions to be merged.

				It first finds the closest common ancestor of the two heads. If one is
				a child of the other, it becomes the new head. Otherwise, we call out
				to a user-specified 3-way merge tool.

				Merging files, manifests, and changesets:

				We begin by comparing changeset DAGs, pulling all nodes we don't have
				in our DAG from the other repository. As we do so, we collect a list
				of changed files to merge.

				Then for each file, we perform a graph merge and resolve as above.
				It's important to merge files using per-file DAGs rather than just
				changeset level DAGs as this diagram illustrates:

				M M1 M2

				AB
				\|`-------v M2 clones M
				aB AB file A is change in mainline
				\|`---v AB' file B is changed in M2
				\| aB / \| M1 clones M
				\| ab/ \| M1 changes B
				\| ab' \| M1 merges from M2, changes to B conflict
				\| \| A'B' M2 changes A
				`---+--.\|
				\| a'B' M2 merges from mainline, changes to A conflict
				`--.\|
				??? depending on which ancestor we choose, we will have
				to redo A hand-merge, B hand-merge, or both
				but if we look at the files independently, everything
				is fine

				After we've merged files, we merge the manifest log DAG and resolve
				additions and deletions. Then we are ready to resolve the changeset
				DAG - if our merge required any changes (the new head is not a
				decendent of our tip), we must create a new changeset describing all
				of the changes needed to merge it into the tip.

				Merge performance:

				The I/O operations for performing a merge are O(changed files), not
				O(total changes) and in many cases, we needn't even unpack the deltas
				to add them to our repository (though this optimization isn't
				necessary).

				Rollback:

				Rollback is not yet implemented, but will be easy to add. When
				performing a commit or a merge, we order things so that the changeset
				entry gets added last. We keep a transaction log of the name of each
				file and its length prior to the transaction. On abort, we simply
				truncate each file to its prior length. This is one of the nice
				properties of the append-only structure of the revlogs.

				Remote access:

				Mercurial currently supports pulling from "serverless" repositories.
				Simply making the repo directory accessibly via the web and pointing
				hg at it can accomplish a pull. This is relatively bandwidth efficient
				but no effort has been spent on pipelining, so it won't work
				especially well over LAN yet.

				It's also quite amenable to rsync, if you don't mind keeping an intact
				copy of the master around locally.

				Also note the append-only and ordering properties of the commit
				guarantee that readers will always see a repository in a consistent
				state and no special locking is necessary. As there is generally only
				one writer to an hg repository, there is in fact no exclusion
				implemented yet.


				Some comparisons to git:

				Most notably, Mercurial uses delta compression and repositories
				created with it will grow much more slowly over time. This also allows
				it to be much more bandwidth efficient. I expect repos sizes and sync
				speeds to be similar to or better than BK, given the use of binary diffs.

				Mercurial is roughly the same performance as git and is faster in
				others as it keeps around more metadata. One example is listing and
				retrieving past versions of a file, which it can do without reading
				all the changesets. This metadata will also allow it to perform better
				merges as described above.

	Site-wide shortcuts
/	Use quick search box
g h	Goto home page
g g	Goto my private gists page
g G	Goto my public gists page
g 0-9	Goto bookmarked items from 0-9
n r	New repository page
n g	New gist page

	Repositories
g s	Goto summary page
g c	Goto changelog page
g f	Goto files page
g F	Goto files page with file search activated
g p	Goto pull requests page
g o	Goto repository settings
g O	Goto repository access permissions settings
t s	Toggle sidebar on some pages