upstream/mercurial-mirror Files · notes.txt

fix speed regression in mdiff caused by line split bugfix.

Vadim Gelfer - - Load All Authors

File last commit:

r2227:4f072bb0 default


                r2251:35fb62a3

0.9 default

Download file

             notes.txt
        
                    146 lines
            
             | 6.1 KiB
            
                | text/plain
            
             |
                TextLexer

/ notes.txt

History | Annotation | Raw |Copy content |Copy permalink

				Some notes about Mercurial's design

				Revlogs:

				The fundamental storage type in Mercurial is a "revlog". A revlog is
				the set of all revisions to a file. Each revision is either stored
				compressed in its entirety or as a compressed binary delta against the
				previous version. The decision of when to store a full version is made
				based on how much data would be needed to reconstruct the file. This
				lets us ensure that we never need to read huge amounts of data to
				reconstruct a file, regardless of how many revisions of it we store.

				In fact, we should always be able to do it with a single read,
				provided we know when and where to read. This is where the index comes
				in. Each revlog has an index containing a special hash (nodeid) of the
				text, hashes for its parents, and where and how much of the revlog
				data we need to read to reconstruct it. Thus, with one read of the
				index and one read of the data, we can reconstruct any version in time
				proportional to the file size.

				Similarly, revlogs and their indices are append-only. This means that
				adding a new version is also O(1) seeks.

				Generally revlogs are used to represent revisions of files, but they
				also are used to represent manifests and changesets.

				Manifests:

				A manifest is simply a list of all files in a given revision of a
				project along with the nodeids of the corresponding file revisions. So
				grabbing a given version of the project means simply looking up its
				manifest and reconstruction all the file revisions pointed to by it.

				Changesets:

				A changeset is a list of all files changed in a check-in along with a
				change description and some metadata like user and date. It also
				contains a nodeid to the relevent revision of the manifest. Changesets
				and manifests are one-to-one, but contain different data for
				convenience.

				Nodeids:

				Nodeids are unique ids that are used to represent the contents of a
				file AND its position in the project history. That is, if you change a
				file and then change it back, the result will have a different nodeid
				because it has different history. This is accomplished by including
				the parents of a given revision's nodeids with the revision's text
				when calculating the hash.

				Graph merging:

				Nodeids are implemented as they are to simplify merging. Merging a
				pair of directed acyclic graphs (aka "the family tree" of the file
				history) requires some method of determining if nodes in different
				graphs correspond. Simply comparing the contents of the node (by
				comparing text of given revisions or their hashes) can get confused by
				identical revisions in the tree.

				The nodeid approach makes it trivial - the hash uniquely describes a
				revision's contents and its graph position relative to the root, so
				merge is simply checking whether each nodeid in graph A is in the hash
				table of graph B. If not, we pull them in, adding them sequentially to
				the revlog.

				Branching and merging:

				Everything in Mercurial is potentially a branch and every user
				effectively works in their own branch. When you do a checkout,
				Mercurial remembers what the parent changeset was and uses it for the
				next check in.

				To do a merge of branches in Mercurial, you check out the heads of the
				two branches into the same working directory which causes a merge to
				be performed, and then check in the result once you're happy with it.
				The resulting checkin will have two parents.

				It decides when a merge is necessary by first determining if there are
				any uncommitted changes in the working directory. This effectively
				makes the working directory a branch off the checked in version it's
				based on. Then it also determines if the working directory is a direct
				ancestor or descendent of the second version we're attempting to
				checkout. If neither is true, we simply replace the working directory
				version with the new version. Otherwise we perform a merge between the
				two versions.

				Merging files and manifests:

				We begin by comparing two versions manifests and deciding which files
				need to be added, deleted, and merged.

				Then for each file, we perform a graph merge and resolve as above.
				It's important to merge files using per-file DAGs rather than just
				changeset level DAGs as this diagram illustrates:

				M M1 M2

				AB
				\|`-------v M2 clones M
				aB AB file A is change in mainline
				\|`---v AB' file B is changed in M2
				\| aB / \| M1 clones M
				\| ab/ \| M1 changes B
				\| ab' \| M1 merges from M2, changes to B conflict
				\| \| A'B' M2 changes A
				`---+--.\|
				\| a'B' M2 merges from mainline, changes to A conflict
				`--.\|
				??? depending on which ancestor we choose, we will have
				to redo A hand-merge, B hand-merge, or both
				but if we look at the files independently, everything
				is fine

				The result is a merged version in the working directory, waiting for
				check-in.

				Rollback:

				When performing a commit or a merge, we order things so that the
				changeset entry gets added last. We keep a transaction log of the name
				of each file touched and its length prior to the transaction. On
				abort, we simply truncate each file to its prior length. This is one
				of the nice properties of the append-only structure of the revlogs.
				We can also reuse this journal for "rollback".

				Merging between repositories:

				One of the key features of Mercurial is the ability to merge between
				independent repositories in a decentralized fashion. Each repository
				can act as a read-only server or a client. Clients operating by
				pulling all branches that it hasn't seen from the server and adding
				them into its graph. This is done in two steps: searching for new
				"roots" and pulling a "changegroup"

				Searching for new "roots" begins by finding all new heads and
				searching backwards from those heads to the first unknown nodes in
				their respective branches. These nodes are the 'roots' that are used
				to calculate the 'changegroup': the set of all changesets starting at
				those roots. Mercurial takes pains to make this search efficient in
				both bandwidth and round-trips.

				Once the roots are found, the changegroup can be transferred as a
				single streaming transfer. This is organized as an ordered set of
				deltas for changesets, manifests, and files. Large chunks of deltas
				can be directly added to the repository without unpacking so it's
				fairly fast.

	Site-wide shortcuts
/	Use quick search box
g h	Goto home page
g g	Goto my private gists page
g G	Goto my public gists page
g 0-9	Goto bookmarked items from 0-9
n r	New repository page
n g	New gist page

	Repositories
g s	Goto summary page
g c	Goto changelog page
g f	Goto files page
g F	Goto files page with file search activated
g p	Goto pull requests page
g o	Goto repository settings
g O	Goto repository access permissions settings
t s	Toggle sidebar on some pages