remotefilelog

The remotefilelog extension allows Mercurial to clone shallow copies of a repository such that all file contents are left on the server and only downloaded on demand by the client. This greatly speeds up clone and pull performance for repositories that have long histories or that are growing quickly.

In addition, the extension allows using a caching layer (such as memcache) to serve the file contents, thus providing better scalability and reducing server load.

Installing

NOTE: See the limitations section below to check if remotefilelog will work for your use case.

remotefilelog can be installed like any other Mercurial extension. Download the source code and add the remotefilelog subdirectory to your hgrc:

[extensions]
remotefilelog=path/to/remotefilelog/remotefilelog

Configuring

Server

  • server (required) - Set to 'True' to indicate that the server can serve shallow clones.
  • serverexpiration - The server keeps a local cache of recently requested file revision blobs in .hg/remotefilelogcache. This setting specifies how many days they should be kept locally. Defaults to 30.

An example server configuration:

[remotefilelog]
server = True
serverexpiration = 14

Client

  • cachepath (required) - the location to store locally cached file revisions
  • cachelimit - the maximum size of the cachepath. By default it's 1000 GB.
  • cachegroup - the default Unix group for the cachepath. Useful on shared systems so multiple users can read and write to the same cache.
  • cacheprocess - the external process that will handle the remote caching layer. If not set, all requests will go to the Mercurial server.
  • fallbackpath - the Mercurial repo path to fetch file revisions from. By default it uses the paths.default repo. This setting is useful for cloning from shallow clones and still talking to the central server for file revisions.
  • includepattern - a list of regex patterns matching files that should be kept remotely. Defaults to all files.
  • excludepattern - a list of regex patterns matching files that should not be kept remotely and should always be downloaded.
  • pullprefetch - a revset of commits whose file content should be prefetched after every pull. The most common value for this will be '(bookmark() + head()) & public()'. This is useful in environments where offline work is common, since it will enable offline updating to, rebasing to, and committing on every head and bookmark.

An example client configuration:

[remotefilelog]
cachepath = /dev/shm/hgcache
cachelimit = 2 GB
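
The optional settings listed above can be combined with the required cachepath. The following is only a sketch: the group name hgusers is a hypothetical placeholder, and the pullprefetch value is the common revset quoted in the settings list, written here without the surrounding quotes on the assumption that hgrc values are read as raw strings.

```ini
[remotefilelog]
# Required: where locally cached file revisions are stored.
cachepath = /dev/shm/hgcache
cachelimit = 2 GB
# Hypothetical group name; replace with a group that exists on your system.
cachegroup = hgusers
# Prefetch file contents for public heads and bookmarks after every pull,
# so that updating to them works offline.
pullprefetch = (bookmark() + head()) & public()
```

With pullprefetch set this way, each pull downloads more data up front in exchange for being able to update, rebase, and commit without a connection to the server.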

Using as a largefiles replacement

remotefilelog can theoretically be used as a replacement for the largefiles extension. You can use the includepattern setting to specify which directories or file types are considered large and they will be left on the server. Unlike the largefiles extension, this can be done without converting the server repository. Only the client configuration needs to specify the patterns.

The include/exclude settings haven't been extensively tested, so this feature is still considered experimental.

An example largefiles style client configuration:

[remotefilelog]
cachepath = /dev/shm/hgcache
cachelimit = 2 GB
includepattern = *.sql3
  bin/*

Usage

Once you have configured the server, you can get a shallow clone by doing:

hg clone --shallow ssh://server//path/repo

After that, all normal Mercurial commands should work.

Occasionally the client or server caches may grow too big. Run hg gc to clean up the cache. It removes cached files that appear to be no longer necessary, as well as any files that exceed the configured maximum size. This does not improve performance; it just frees up space.

Limitations

  1. The extension must be used with Mercurial 3.3 (commit d7d08337b3f6) or higher. Earlier versions of the extension work with earlier versions of Mercurial, as far back as Mercurial 2.7.

  2. remotefilelog has only been tested on Linux with case-sensitive filesystems. It should work on other Unix systems but may have problems on case-insensitive filesystems.

  3. remotefilelog only works with SSH-based Mercurial repos. HTTP-based repos are currently not supported, though it shouldn't be too difficult for some motivated individual to implement.

  4. Tags are not supported in completely shallow repos. If you use tags in your repo you will have to specify excludepattern=.hgtags in your client configuration to ensure that file is downloaded. The include/excludepattern settings are experimental at the moment and have yet to be deployed in a production environment.

  5. A few commands will be slower. hg log <filename> will be much slower since it has to walk the entire commit history instead of just the filelog. Use hg log -f <filename> instead, which remains very fast.
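
For limitation 4, a client configuration that keeps .hgtags downloaded might look like the sketch below. The cachepath value is reused from the client example earlier in this README. Since the include/excludepattern settings are experimental, treat this as a starting point rather than a tested setup.

```ini
[remotefilelog]
cachepath = /dev/shm/hgcache
# Always download .hgtags so that tags keep working in a shallow repo.
excludepattern = .hgtags
```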

Contributing

Patches are welcome as pull requests, though they will be collapsed and rebased to maintain a linear history. Tests can be run via:

cd tests
./run-tests --with-hg=path/to/hgrepo/hg

We (Facebook) have to ask for a "Contributor License Agreement" from someone who sends in a patch or code that we want to include in the codebase. This is a legal requirement; a similar situation applies to Apache and other ASF projects.

If we ask you to fill out a CLA we'll direct you to our online CLA page where you can complete it easily. We use the same form as the Apache CLA so that friction is minimal.

License

remotefilelog is made available under the terms of the GNU General Public License version 2, or any later version. See the COPYING file that accompanies this distribution for the full text of the license.