##// END OF EJS Templates
mmap: populate the mapping by default...
mmap: populate the mapping by default Without pre-population, accessing all data through a mmap can result in many pagefault, reducing performance significantly. If the mmap is prepopulated, the performance can no longer get slower than a full read. (See benchmark number below) In some cases were very few data is read, prepopulating can be overkill and slower than populating on access (through page fault). So that behavior can be controlled when the caller can pre-determine the best behavior. (See benchmark number below) In addition, testing with populating in a secondary thread yield great result combining the best of each approach. This might be implemented in later changesets. In all cases, using mmap has a great effect on memory usage when many processes run in parallel on the same machine. ### Benchmarks # What did I run A couple of month back I ran a large benchmark campaign to assess the impact of various approach for using mmap with the revlog (and other files), it highlighted a few benchmarks that capture the impact of the changes well. So to validate this change I checked the following: - log command displaying various revisions (read the changelog index) - log command displaying the patch of listed revisions (read the changelog index, the manifest index and a few files indexes) - unbundling a few revisions (read and write changelog, manifest and few files indexes, and walk the graph to update some cache) - pushing a few revisions (read and write changelog, manifest and few files indexes, walk the graph to update some cache, performs various accesses locally and remotely during discovery) Benchmarks were run using the default module policy (c+py) and the rust one. No significant difference were found between the two implementation, so we will present result using the default policy (unless otherwise specified). I ran them on a few repositories : - mercurial: a "public changeset only" copy of mercurial from 2018-08-01 using zstd compression and sparse-revlog - pypy: a copy of pypy from 2018-08-01 using zstd compression and sparse-revlog - netbeans: a copy of netbeans from 2018-08-01 using zstd compression and sparse-revlog - mozilla-try: a copy of mozilla-try from 2019-02-18 using zstd compression and sparse-revlog - mozilla-try persistent-nodemap: Same as the above but with a persistent nodemap. Used for the log --patch benchmark only # Results For the smaller repositories (mercurial, pypy), the impact of mmap is almost imperceptible, other cost dominating the operation. The impact of prepopulating is undiscernible in the benchmark we ran. For larger repositories the benchmark support explanation given above: On netbeans, the log can be about 1% faster without repopulation (for a difference < 100ms) but unbundle becomes a bit slower, even when small. ### data-env-vars.name = netbeans-2018-08-01-zstd-sparse-revlog # benchmark.name = hg.command.unbundle # benchmark.variants.issue6528 = disabled # benchmark.variants.reuse-external-delta-parent = yes # benchmark.variants.revs = any-1-extra-rev # benchmark.variants.source = unbundle # benchmark.variants.verbosity = quiet with-populate: 0.240157 no-populate: 0.265087 (+10.38%, +0.02) # benchmark.variants.revs = any-100-extra-rev with-populate: 1.459518 no-populate: 1.481290 (+1.49%, +0.02) ## benchmark.name = hg.command.push # benchmark.variants.explicit-rev = none # benchmark.variants.issue6528 = disabled # benchmark.variants.protocol = ssh # benchmark.variants.reuse-external-delta-parent = yes # benchmark.variants.revs = any-1-extra-rev with-populate: 0.771919 no-populate: 0.792025 (+2.60%, +0.02) # benchmark.variants.revs = any-100-extra-rev with-populate: 1.459518 no-populate: 1.481290 (+1.49%, +0.02) For mozilla-try, the "slow down" from pre-populate for small `hg log` is more visible, but still small in absolute time. (using rust value for the persistent nodemap value to be relevant). ### data-env-vars.name = mozilla-try-2019-02-18-ds2-pnm # benchmark.name = hg.command.log # bin-env-vars.hg.flavor = rust # benchmark.variants.patch = yes # benchmark.variants.limit-rev = 1 with-populate: 0.237813 no-populate: 0.229452 (-3.52%, -0.01) # benchmark.variants.limit-rev = 10 # benchmark.variants.patch = yes with-populate: 1.213578 no-populate: 1.205189 ### data-env-vars.name = mozilla-try-2019-02-18-zstd-sparse-revlog # benchmark.variants.limit-rev = 1000 # benchmark.variants.patch = no # benchmark.variants.rev = tip with-populate: 0.198607 no-populate: 0.195038 (-1.80%, -0.00) However pre-populating provide a significant boost on more complex operations like unbundle or push: ### data-env-vars.name = mozilla-try-2019-02-18-zstd-sparse-revlog # benchmark.name = hg.command.push # benchmark.variants.explicit-rev = none # benchmark.variants.issue6528 = disabled # benchmark.variants.protocol = ssh # benchmark.variants.reuse-external-delta-parent = yes # benchmark.variants.revs = any-1-extra-rev with-populate: 4.798632 no-populate: 4.953295 (+3.22%, +0.15) # benchmark.variants.revs = any-100-extra-rev with-populate: 4.903618 no-populate: 5.014963 (+2.27%, +0.11) ## benchmark.name = hg.command.unbundle # benchmark.variants.revs = any-1-extra-rev with-populate: 1.423411 no-populate: 1.585365 (+11.38%, +0.16) # benchmark.variants.revs = any-100-extra-rev with-populate: 1.537909 no-populate: 1.688489 (+9.79%, +0.15)

File last commit:

r44031:2e017696 default
r52574:522b4d72 default
Show More
subrepos.txt
171 lines | 7.2 KiB | text/plain | TextLexer
Subrepositories let you nest external repositories or projects into a
parent Mercurial repository, and make commands operate on them as a
group.
Mercurial currently supports Mercurial, Git, and Subversion
subrepositories.
Subrepositories are made of three components:
1. Nested repository checkouts. They can appear anywhere in the
parent working directory.
2. Nested repository references. They are defined in ``.hgsub``, which
should be placed in the root of working directory, and
tell where the subrepository checkouts come from. Mercurial
subrepositories are referenced like::
path/to/nested = https://example.com/nested/repo/path
Git and Subversion subrepos are also supported::
path/to/nested = [git]git://example.com/nested/repo/path
path/to/nested = [svn]https://example.com/nested/trunk/path
where ``path/to/nested`` is the checkout location relatively to the
parent Mercurial root, and ``https://example.com/nested/repo/path``
is the source repository path. The source can also reference a
filesystem path.
Note that ``.hgsub`` does not exist by default in Mercurial
repositories, you have to create and add it to the parent
repository before using subrepositories.
3. Nested repository states. They are defined in ``.hgsubstate``, which
is placed in the root of working directory, and
capture whatever information is required to restore the
subrepositories to the state they were committed in a parent
repository changeset. Mercurial automatically record the nested
repositories states when committing in the parent repository.
.. note::
The ``.hgsubstate`` file should not be edited manually.
Adding a Subrepository
======================
If ``.hgsub`` does not exist, create it and add it to the parent
repository. Clone or checkout the external projects where you want it
to live in the parent repository. Edit ``.hgsub`` and add the
subrepository entry as described above. At this point, the
subrepository is tracked and the next commit will record its state in
``.hgsubstate`` and bind it to the committed changeset.
Synchronizing a Subrepository
=============================
Subrepos do not automatically track the latest changeset of their
sources. Instead, they are updated to the changeset that corresponds
with the changeset checked out in the top-level changeset. This is so
developers always get a consistent set of compatible code and
libraries when they update.
Thus, updating subrepos is a manual process. Simply check out target
subrepo at the desired revision, test in the top-level repo, then
commit in the parent repository to record the new combination.
Deleting a Subrepository
========================
To remove a subrepository from the parent repository, delete its
reference from ``.hgsub``, then remove its files.
Interaction with Mercurial Commands
===================================
:add: add does not recurse in subrepos unless -S/--subrepos is
specified. However, if you specify the full path of a file in a
subrepo, it will be added even without -S/--subrepos specified.
Subversion subrepositories are currently silently
ignored.
:addremove: addremove does not recurse into subrepos unless
-S/--subrepos is specified. However, if you specify the full
path of a directory in a subrepo, addremove will be performed on
it even without -S/--subrepos being specified. Git and
Subversion subrepositories will print a warning and continue.
:archive: archive does not recurse in subrepositories unless
-S/--subrepos is specified.
:cat: Git subrepositories only support exact file matches.
Subversion subrepositories are currently ignored.
:commit: commit creates a consistent snapshot of the state of the
entire project and its subrepositories. If any subrepositories
have been modified, Mercurial will abort. Mercurial can be made
to instead commit all modified subrepositories by specifying
-S/--subrepos, or setting "ui.commitsubrepos=True" in a
configuration file (see :hg:`help config`). After there are no
longer any modified subrepositories, it records their state and
finally commits it in the parent repository. The --addremove
option also honors the -S/--subrepos option. However, Git and
Subversion subrepositories will print a warning and abort.
:diff: diff does not recurse in subrepos unless -S/--subrepos is
specified. However, if you specify the full path of a file or
directory in a subrepo, it will be diffed even without
-S/--subrepos being specified. Subversion subrepositories are
currently silently ignored.
:files: files does not recurse into subrepos unless -S/--subrepos is
specified. However, if you specify the full path of a file or
directory in a subrepo, it will be displayed even without
-S/--subrepos being specified. Git and Subversion subrepositories
are currently silently ignored.
:forget: forget currently only handles exact file matches in subrepos.
Git and Subversion subrepositories are currently silently ignored.
:incoming: incoming does not recurse in subrepos unless -S/--subrepos
is specified. Git and Subversion subrepositories are currently
silently ignored.
:outgoing: outgoing does not recurse in subrepos unless -S/--subrepos
is specified. Git and Subversion subrepositories are currently
silently ignored.
:pull: pull is not recursive since it is not clear what to pull prior
to running :hg:`update`. Listing and retrieving all
subrepositories changes referenced by the parent repository pulled
changesets is expensive at best, impossible in the Subversion
case.
:push: Mercurial will automatically push all subrepositories first
when the parent repository is being pushed. This ensures new
subrepository changes are available when referenced by top-level
repositories. Push is a no-op for Subversion subrepositories.
:serve: serve does not recurse into subrepositories unless
-S/--subrepos is specified. Git and Subversion subrepositories
are currently silently ignored.
:status: status does not recurse into subrepositories unless
-S/--subrepos is specified. Subrepository changes are displayed as
regular Mercurial changes on the subrepository
elements. Subversion subrepositories are currently silently
ignored.
:remove: remove does not recurse into subrepositories unless
-S/--subrepos is specified. However, if you specify a file or
directory path in a subrepo, it will be removed even without
-S/--subrepos. Git and Subversion subrepositories are currently
silently ignored.
:update: update restores the subrepos in the state they were
originally committed in target changeset. If the recorded
changeset is not available in the current subrepository, Mercurial
will pull it in first before updating. This means that updating
can require network access when using subrepositories.
Remapping Subrepositories Sources
=================================
A subrepository source location may change during a project life,
invalidating references stored in the parent repository history. To
fix this, rewriting rules can be defined in parent repository ``hgrc``
file or in Mercurial configuration. See the ``[subpaths]`` section in
hgrc(5) for more details.