
Changegroups are representations of repository revlog data, specifically
the changelog data, root/flat manifest data, treemanifest data, and
filelogs.
There are 3 versions of changegroups: ``1``, ``2``, and ``3``. At a
high level, versions ``1`` and ``2`` are almost exactly the same, with the
only difference being an additional item in the *delta header*. Version
``3`` adds support for revlog flags in the *delta header* and optionally
exchanging treemanifests (enabled by setting an option on the
``changegroup`` part in the bundle2).
Changegroups when not exchanging treemanifests consist of 3 logical
segments::

    +---------------------------------+
    |           |          |          |
    | changeset | manifest | filelogs |
    |           |          |          |
    |           |          |          |
    +---------------------------------+

When exchanging treemanifests, there are 4 logical segments::

    +-------------------------------------------------+
    |           |          |               |          |
    | changeset |   root   | treemanifests | filelogs |
    |           | manifest |               |          |
    |           |          |               |          |
    +-------------------------------------------------+

The principal building block of each segment is a *chunk*. A *chunk*
is a framed piece of data::

    +---------------------------------------+
    |           |                           |
    |  length   |           data            |
    | (4 bytes) |   (<length - 4> bytes)    |
    |           |                           |
    +---------------------------------------+

All integers are big-endian signed integers. Each chunk starts with a 32-bit
integer indicating the length of the entire chunk (including the length field
itself).
There is a special case chunk that has a value of 0 for the length
(``0x00000000``). We call this an *empty chunk*.
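As an illustration of the framing described above, here is a minimal Python sketch that writes and reads chunks. The names ``write_chunk``, ``read_chunk`` and ``EMPTY_CHUNK`` are hypothetical and are not part of Mercurial's API. Rejecting non-zero lengths smaller than 4 is a choice made for this sketch, on the grounds that such a length cannot cover the length field itself.

```python
import io
import struct

# The empty chunk: a bare length field whose value is 0.
EMPTY_CHUNK = struct.pack(">i", 0)


def write_chunk(data):
    """Frame data as a chunk; the length counts its own 4 bytes."""
    return struct.pack(">i", len(data) + 4) + data


def read_chunk(stream):
    """Return the next chunk's data, or None for the empty chunk."""
    raw = stream.read(4)
    if len(raw) != 4:
        raise ValueError("truncated chunk length")
    # Big-endian signed 32-bit integer.
    (length,) = struct.unpack(">i", raw)
    if length == 0:
        return None
    if length < 4:
        raise ValueError("invalid chunk length %d" % length)
    data = stream.read(length - 4)
    if len(data) != length - 4:
        raise ValueError("truncated chunk data")
    return data


stream = io.BytesIO(write_chunk(b"hello") + EMPTY_CHUNK)
first = read_chunk(stream)   # data of the framed chunk
second = read_chunk(stream)  # the empty chunk
```

Returning ``None`` for the empty chunk keeps it distinguishable from a chunk that carries zero bytes of data.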
Delta Groups
============
A *delta group* expresses the content of a revlog as a series of deltas,
or patches against previous revisions.
Delta groups consist of 0 or more *chunks* followed by the *empty chunk*
to signal the end of the delta group::

    +------------------------------------------------------------------------+
    |                |             |               |             |           |
    | chunk0 length  | chunk0 data | chunk1 length | chunk1 data |    0x0    |
    |   (4 bytes)    |  (various)  |   (4 bytes)   |  (various)  | (4 bytes) |
    |                |             |               |             |           |
    +------------------------------------------------------------------------+

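Reading a delta group can be sketched as a loop that consumes chunks until it meets the empty chunk. The helper names ``iter_delta_group`` and ``frame`` below are hypothetical, and the sample payloads are placeholders rather than real revlog data.

```python
import io
import struct


def iter_delta_group(stream):
    """Yield the data of each chunk up to the terminating empty chunk."""
    while True:
        raw = stream.read(4)
        if len(raw) != 4:
            raise ValueError("delta group not terminated by an empty chunk")
        (length,) = struct.unpack(">i", raw)
        if length == 0:
            return  # empty chunk: end of this delta group
        if length < 4:
            raise ValueError("invalid chunk length %d" % length)
        data = stream.read(length - 4)
        if len(data) != length - 4:
            raise ValueError("truncated chunk data")
        yield data


def frame(data):
    """Frame data as a chunk (length includes the 4-byte length field)."""
    return struct.pack(">i", len(data) + 4) + data


group = frame(b"chunk0") + frame(b"chunk1") + struct.pack(">i", 0)
stream = io.BytesIO(group + b"next segment")
chunks = list(iter_delta_group(stream))
rest = stream.read()  # whatever follows the delta group is left unread
```

Because the reader stops exactly after the empty chunk, the stream is positioned at the start of the next segment.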
Each *chunk*'s data consists of the following::

    +---------------------------------------+
    |                      |                |
    |     delta header     |   delta data   |
    | (various by version) |   (various)    |
    |                      |                |
    +---------------------------------------+

The *delta data* is a series of *deltas* that describe a diff from an existing
entry (either one that the recipient already has, or one previously specified
in the bundle/changegroup).
The *delta header* is different between versions ``1``, ``2``, and
``3`` of the changegroup format.
Version 1 (headerlen=80)::

    +------------------------------------------------------+
    |            |             |             |             |
    |    node    |   p1 node   |   p2 node   |  link node  |
    | (20 bytes) |  (20 bytes) |  (20 bytes) |  (20 bytes) |
    |            |             |             |             |
    +------------------------------------------------------+

Version 2 (headerlen=100)::

    +--------------------------------------------------------------------+
    |            |             |             |             |             |
    |    node    |   p1 node   |   p2 node   |  base node  |  link node  |
    | (20 bytes) |  (20 bytes) |  (20 bytes) |  (20 bytes) |  (20 bytes) |
    |            |             |             |             |             |
    +--------------------------------------------------------------------+

Version 3 (headerlen=102)::

    +--------------------------------------------------------------------------------+
    |            |             |             |             |             |           |
    |    node    |   p1 node   |   p2 node   |  base node  |  link node  |   flags   |
    | (20 bytes) |  (20 bytes) |  (20 bytes) |  (20 bytes) |  (20 bytes) | (2 bytes) |
    |            |             |             |             |             |           |
    +--------------------------------------------------------------------------------+

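The three header layouts map naturally onto fixed-size ``struct`` formats. The following Python sketch splits a chunk's data into a parsed header and the remaining delta data. ``DeltaHeader``, ``HEADER_FORMATS`` and ``parse_delta_header`` are hypothetical names, and reading the flags as an unsigned 16-bit value is an assumption of this sketch.

```python
import collections
import struct

DeltaHeader = collections.namedtuple(
    "DeltaHeader", "node p1 p2 base linknode flags"
)

# One format per changegroup version; ">" means big-endian, no padding.
HEADER_FORMATS = {
    1: struct.Struct(">20s20s20s20s"),
    2: struct.Struct(">20s20s20s20s20s"),
    3: struct.Struct(">20s20s20s20s20sH"),  # flags as unsigned: assumption
}


def parse_delta_header(version, chunk_data):
    """Split chunk data into (DeltaHeader, delta data)."""
    fmt = HEADER_FORMATS[version]
    if len(chunk_data) < fmt.size:
        raise ValueError("chunk too short for a version %d header" % version)
    fields = fmt.unpack_from(chunk_data)
    if version == 1:
        node, p1, p2, linknode = fields
        base, flags = None, 0  # version 1 carries no base node or flags
    elif version == 2:
        node, p1, p2, base, linknode = fields
        flags = 0  # version 2 carries no flags
    else:
        node, p1, p2, base, linknode, flags = fields
    header = DeltaHeader(node, p1, p2, base, linknode, flags)
    return header, chunk_data[fmt.size:]


sizes = {version: fmt.size for version, fmt in HEADER_FORMATS.items()}
raw = (b"N" * 20 + b"A" * 20 + b"B" * 20 + b"C" * 20 + b"L" * 20
       + struct.pack(">H", 1) + b"payload")
header, delta = parse_delta_header(3, raw)
```

The computed ``sizes`` match the ``headerlen`` values given for each version, which is a quick sanity check on the formats.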
The *delta data* consists of ``chunklen - 4 - headerlen`` bytes, which contain a
series of *deltas*, densely packed (no separators). These deltas describe a diff
from an existing entry (either one that the recipient already has, or one
previously specified in the bundle/changegroup). The format is described more
fully in ``hg help internals.bdiff``, but briefly::

    +---------------------------------------------------------------+
    |              |            |            |                      |
    | start offset | end offset | new length |       content        |
    |  (4 bytes)   |  (4 bytes) |  (4 bytes) | (<new length> bytes) |
    |              |            |            |                      |
    +---------------------------------------------------------------+

Note that, unlike the chunk length, the *new length* field in the delta data
does *not* include itself.
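To make the delta layout concrete, the following Python sketch applies delta data to a base text. It assumes that the hunks are sorted by start offset, do not overlap, and that their offsets refer to positions in the base text. ``apply_delta`` is a hypothetical name and not Mercurial's patching routine.

```python
import struct


def apply_delta(base, delta):
    """Return base with every (start, end, content) hunk applied."""
    out = []
    pos = 0   # read position in the delta data
    last = 0  # first byte of base not yet copied or replaced
    while pos < len(delta):
        if len(delta) - pos < 12:
            raise ValueError("truncated delta hunk header")
        start, end, new_length = struct.unpack_from(">iii", delta, pos)
        pos += 12
        content = delta[pos:pos + new_length]
        if len(content) != new_length:
            raise ValueError("truncated delta hunk content")
        pos += new_length
        out.append(base[last:start])  # unchanged bytes before the hunk
        out.append(content)           # replacement for base[start:end]
        last = end
    out.append(base[last:])           # unchanged tail
    return b"".join(out)


base = b"hello world"
one_hunk = struct.pack(">iii", 6, 11, 5) + b"there"
two_hunks = (struct.pack(">iii", 0, 5, 5) + b"HELLO"
             + struct.pack(">iii", 6, 11, 1) + b"x")
patched_once = apply_delta(base, one_hunk)
patched_twice = apply_delta(base, two_hunks)
```

An empty delta leaves the base unchanged, and a hunk with ``start == end`` is a pure insertion.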
In version 1, the delta is always applied against the previous node from
the changegroup or the first parent if this is the first entry in the
changegroup.
In version 2 and up, the delta base node is encoded in the entry in the
changegroup. This allows the delta to be expressed against any parent,
which can result in smaller deltas and more efficient encoding of data.
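The rule for choosing the delta base can be summarized in a few lines of Python. This is only a sketch: ``delta_base`` and its parameters are hypothetical, with ``previous_node`` standing for the node of the entry that precedes this one, or ``None`` for the first entry.

```python
def delta_base(version, p1, base, previous_node):
    """Return the node that the delta should be applied against.

    p1 and base come from the delta header (base is None in version 1).
    """
    if version == 1:
        # Implicit base: the previous entry, or p1 for the first entry.
        return p1 if previous_node is None else previous_node
    # Versions 2 and 3 name the base explicitly in the header.
    return base


P1 = b"\x01" * 20
PREV = b"\x02" * 20
BASE = b"\x03" * 20
first_v1 = delta_base(1, P1, None, None)
later_v1 = delta_base(1, P1, None, PREV)
any_v2 = delta_base(2, P1, BASE, PREV)
```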
Changeset Segment
=================
The *changeset segment* consists of a single *delta group* holding
changelog data. The *empty chunk* at the end of the *delta group* denotes
the boundary to the *manifest segment*.
Manifest Segment
================
The *manifest segment* consists of a single *delta group* holding manifest
data. If treemanifests are in use, it contains only the manifest for the
root directory of the repository. Otherwise, it contains the entire
manifest data. The *empty chunk* at the end of the *delta group* denotes
the boundary to the next segment (either the *treemanifests segment* or the
*filelogs segment*, depending on version and the request options).
Treemanifests Segment
---------------------
The *treemanifests segment* only exists in changegroup version ``3``, and
only if the 'treemanifest' param is part of the bundle2 changegroup part
(it is not possible to use changegroup version 3 outside of bundle2).
Aside from the filenames in the *treemanifests segment* containing a
trailing ``/`` character, it behaves identically to the *filelogs segment*
(see below). The final sub-segment is followed by an *empty chunk* (logically,
a sub-segment with filename size 0). This denotes the boundary to the
*filelogs segment*.
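The order of segments for a given version and set of options can be written down as a small function. This is a sketch only: ``segment_order`` is a hypothetical helper, and its ``treemanifest`` argument stands for the presence of the ``treemanifest`` param on the bundle2 changegroup part.

```python
def segment_order(version, treemanifest=False):
    """Return the names of the segments in the order they appear."""
    if version not in (1, 2, 3):
        raise ValueError("unknown changegroup version %r" % (version,))
    segments = ["changeset", "manifest"]
    if treemanifest:
        if version != 3:
            raise ValueError("treemanifests require changegroup version 3")
        segments.append("treemanifests")
    segments.append("filelogs")
    return segments


flat = segment_order(2)
tree = segment_order(3, treemanifest=True)
```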
Filelogs Segment
================
The *filelogs segment* consists of multiple sub-segments, each
corresponding to an individual file whose data is being described::

    +--------------------------------------------------+
    |          |          |          |     |           |
    | filelog0 | filelog1 | filelog2 | ... |    0x0    |
    |          |          |          |     | (4 bytes) |
    |          |          |          |     |           |
    +--------------------------------------------------+

The final filelog sub-segment is followed by an *empty chunk* (logically,
a sub-segment with filename size 0). This denotes the end of the segment
and of the overall changegroup.
Each filelog sub-segment consists of the following::

    +------------------------------------------------------+
    |                 |                      |             |
    | filename length |       filename       | delta group |
    |    (4 bytes)    | (<length - 4> bytes) |  (various)  |
    |                 |                      |             |
    +------------------------------------------------------+

That is, a *chunk* consisting of the filename (not terminated or padded)
followed by N chunks constituting the *delta group* for this file. The
*empty chunk* at the end of each *delta group* denotes the boundary to the
next filelog sub-segment.
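Putting the pieces together, a filelogs segment can be walked with two nested loops: the outer one reads filename chunks until the empty chunk, and the inner one reads that file's delta group. The Python sketch below uses hypothetical helper names and placeholder payloads. Under the description above, the same loop would also fit the *treemanifests segment*, where the names end in ``/``.

```python
import io
import struct

EMPTY = struct.pack(">i", 0)


def frame(data):
    """Frame data as a chunk (length includes the 4-byte length field)."""
    return struct.pack(">i", len(data) + 4) + data


def read_chunk(stream):
    """Return the next chunk's data, or None for the empty chunk."""
    raw = stream.read(4)
    if len(raw) != 4:
        raise ValueError("truncated chunk length")
    (length,) = struct.unpack(">i", raw)
    if length == 0:
        return None
    if length < 4:
        raise ValueError("invalid chunk length %d" % length)
    data = stream.read(length - 4)
    if len(data) != length - 4:
        raise ValueError("truncated chunk data")
    return data


def iter_filelogs(stream):
    """Yield (filename, [chunk data, ...]) for each filelog sub-segment."""
    while True:
        filename = read_chunk(stream)
        if filename is None:
            return  # empty chunk: end of the filelogs segment
        chunks = []
        while True:
            chunk = read_chunk(stream)
            if chunk is None:
                break  # empty chunk: end of this file's delta group
            chunks.append(chunk)
        yield filename, chunks


segment = (frame(b"a.txt") + frame(b"d0") + frame(b"d1") + EMPTY
           + frame(b"dir/b.txt") + frame(b"d2") + EMPTY
           + EMPTY)
filelogs = list(iter_filelogs(io.BytesIO(segment)))
```

Each yielded chunk still contains its delta header followed by the delta data, so a real reader would parse it further according to the changegroup version.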