# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.

"""advertise pre-generated bundles to seed clones

"clonebundles" is a server-side extension used to advertise the existence
of pre-generated, externally hosted bundle files to clients that are
cloning so that cloning can be faster, more reliable, and require fewer
resources on the server.

Cloning can be a CPU and I/O intensive operation on servers. Traditionally,
the server, in response to a client's request to clone, dynamically generates
a bundle containing the entire repository content and sends it to the client.
There is no caching on the server and the server will have to redundantly
generate the same outgoing bundle in response to each clone request. For
servers with large repositories or with high clone volume, the load from
clones can make scaling the server challenging and costly.

This extension provides server operators the ability to offload potentially
expensive clone load to an external service. Here's how it works.

1. A server operator establishes a mechanism for making bundle files available
   on a hosting service where Mercurial clients can fetch them.
2. A manifest file listing available bundle URLs and some optional metadata
   is added to the Mercurial repository on the server.
3. A client initiates a clone against a clone bundles aware server.
4. The client sees the server is advertising clone bundles and fetches the
   manifest listing available bundles.
5. The client filters and sorts the available bundles based on what it
   supports and prefers.
6. The client downloads and applies an available bundle from the
   server-specified URL.
7. The client reconnects to the original server and performs the equivalent
   of :hg:`pull` to retrieve all repository data not in the bundle. (The
   repository could have been updated between when the bundle was created
   and when the client started the clone.)

Instead of the server generating full repository bundles for every clone
request, it generates full bundles once and they are subsequently reused to
bootstrap new clones. The server may still transfer data at clone time.
However, this is only data that has been added/changed since the bundle was
created. For large, established repositories, this can reduce server load for
clones to less than 1% of original.

To work, this extension requires the following of server operators:

* Generating bundle files of repository content (typically periodically,
  such as once per day).
* A file server that clients have network access to and that Python knows
  how to talk to through its normal URL handling facility (typically an
  HTTP server).
* A process for keeping the bundles manifest in sync with available bundle
  files.

Strictly speaking, using a static file hosting server isn't required: a server
operator could use a dynamic service for retrieving bundle data. However,
static file hosting services are simple and scalable and should be sufficient
for most needs.

Bundle files can be generated with the :hg:`bundle` command. Typically
:hg:`bundle --all` is used to produce a bundle of the entire repository.

:hg:`debugcreatestreamclonebundle` can be used to produce a special
*streaming clone bundle*. These are bundle files that are extremely efficient
to produce and consume (read: fast). However, they are larger than
traditional bundle formats and require that clients support the exact set
of repository data store formats in use by the repository that created them.
Typically, a newer server can serve data that is compatible with older clients.
However, *streaming clone bundles* don't have this guarantee. **Server
operators need to be aware that newer versions of Mercurial may produce
streaming clone bundles incompatible with older Mercurial versions.**

A server operator is responsible for creating a ``.hg/clonebundles.manifest``
file containing the list of available bundle files suitable for seeding
clones. If this file does not exist, the repository will not advertise the
existence of clone bundles when clients connect.

The manifest file contains a newline (\n) delimited list of entries.

Each line in this file defines an available bundle. Lines have the format:

    <URL> [<key>=<value>[ <key>=<value>]]

That is, a URL followed by an optional, space-delimited list of key=value
pairs describing additional properties of this bundle. Both keys and values
are URI encoded.
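
As a rough illustration of the entry format above, one manifest line could be
parsed along the following lines. This is a minimal sketch using the Python 3
standard library, not Mercurial's own parser; the helper name
``parse_manifest_line`` and the example URL are hypothetical.

```python
from urllib.parse import unquote


def parse_manifest_line(line):
    # Hypothetical helper: split one manifest entry into (url, attributes).
    fields = line.split()
    if not fields:
        return None  # blank line: nothing to parse
    url = fields[0]
    attrs = {}
    for pair in fields[1:]:
        # Both keys and values are URI encoded, so decode each side.
        key, _, value = pair.partition('=')
        attrs[unquote(key)] = unquote(value)
    return url, attrs


entry = parse_manifest_line(
    'https://example.com/full.hg BUNDLESPEC=gzip-v2 datacenter=us%2Dwest')
```

In this sketch the URL is kept exactly as written and only the key=value
pairs are decoded, so ``datacenter=us%2Dwest`` yields the value ``us-west``.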

Keys in UPPERCASE are reserved for use by Mercurial and are defined below.
All non-uppercase keys can be used by site installations. An example use
for custom properties is to use the *datacenter* attribute to define which
data center a file is hosted in. Clients could then prefer a server in the
data center closest to them.
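
A server-side process that keeps the manifest in sync might emit entries with
such a custom attribute roughly as follows. This is only a sketch under stated
assumptions: ``format_manifest_line``, the URL, and the attribute values are
hypothetical, and the URI encoding uses ``urllib.parse.quote`` from the
Python 3 standard library.

```python
from urllib.parse import quote


def format_manifest_line(url, attrs):
    # Hypothetical helper: build one manifest entry from a URL and a dict.
    parts = [url]
    for key in sorted(attrs):
        # URI encode keys and values so a space cannot split a pair.
        parts.append(quote(key, safe='') + '=' + quote(attrs[key], safe=''))
    return ' '.join(parts)


line = format_manifest_line(
    'https://cdn.example.com/repo.hg',
    {'BUNDLESPEC': 'gzip-v2', 'datacenter': 'us west 2'})
```

Sorting the keys is a choice made here to keep the output stable between
runs; the manifest format itself does not require any particular key order.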

The following reserved keys are currently defined:

BUNDLESPEC
   A "bundle specification" string that describes the type of the bundle.
   These are string values that are accepted by the "--type" argument of
   :hg:`bundle`.

   The values are parsed in strict mode, which means they must be of the
   "<compression>-<type>" form. See
   mercurial.exchange.parsebundlespec() for more details.

   :hg:`debugbundle --spec` can be used to print the bundle specification
   string for a bundle file. The output of this command can be used verbatim
   for the value of ``BUNDLESPEC`` (it is already escaped).

   Clients will automatically filter out specifications that are unknown or
   unsupported so they won't attempt to download something that likely won't
   apply.

   The actual value doesn't impact client behavior beyond filtering:
   clients will still sniff the bundle type from the header of downloaded
   files.

   **Use of this key is highly recommended**, as it allows clients to
   easily skip unsupported bundles. If this key is not defined, an old
   client may attempt to apply a bundle that it is incapable of reading.

REQUIRESNI
   Whether Server Name Indication (SNI) is required to connect to the URL.
   SNI allows servers to use multiple certificates on the same IP. It is
   somewhat common in CDNs and other hosting providers. Older Python
   versions do not support SNI. Defining this attribute enables clients
   with older Python versions to filter this entry without experiencing
   an opaque SSL failure at connection time.

   If this is defined, it is important to advertise a non-SNI fallback
   URL or clients running old Python releases may not be able to clone
   with the clonebundles facility.

   Value should be "true".
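
The SNI filtering described above can be pictured with the short check below.
It is a simplified sketch, not the client's actual code: ``sni_usable`` is a
hypothetical name, and the only real API used is the standard library
constant ``ssl.HAS_SNI``.

```python
import ssl


def sni_usable(attrs, has_sni=ssl.HAS_SNI):
    # Hypothetical check: keep the entry unless it needs SNI and we lack it.
    return attrs.get('REQUIRESNI') != 'true' or has_sni
```

Passing ``has_sni`` explicitly makes it possible to see how a client without
SNI support would treat an entry, regardless of the Python running the sketch.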

Manifests can contain multiple entries. Assuming metadata is defined, clients
will filter entries from the manifest that they don't support. The remaining
entries are optionally sorted by client preferences
(``experimental.clonebundleprefers`` config option). The client then attempts
to fetch the bundle at the first URL in the remaining list.
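
The filter-then-sort behaviour can be approximated as shown below. This is a
simplified sketch and not the comparison Mercurial itself performs: the
function name, the shape of ``prefers`` (an ordered list of key/value pairs),
and the example URLs are all assumptions made for illustration.

```python
def filter_and_sort(entries, supported_specs, prefers):
    # entries: list of (url, attrs) tuples parsed from the manifest.
    # supported_specs: BUNDLESPEC values this hypothetical client can read.
    # prefers: ordered list of (key, value) pairs, most preferred first.
    usable = [
        (url, attrs) for url, attrs in entries
        if attrs.get('BUNDLESPEC') is None
        or attrs['BUNDLESPEC'] in supported_specs
    ]

    def rank(entry):
        attrs = entry[1]
        # One flag per preference: 0 when the entry matches, 1 otherwise.
        # Tuples compare left to right, so earlier preferences dominate.
        return tuple(0 if attrs.get(k) == v else 1 for k, v in prefers)

    # sorted() is stable, so manifest order breaks ties.
    return sorted(usable, key=rank)


entries = [
    ('https://a.example/bz2.hg', {'BUNDLESPEC': 'bzip2-v1'}),
    ('https://a.example/zstd.hg', {'BUNDLESPEC': 'zstd-v2'}),
    ('https://a.example/gz.hg', {'BUNDLESPEC': 'gzip-v2'}),
]
chosen = filter_and_sort(
    entries, {'bzip2-v1', 'gzip-v2'}, [('BUNDLESPEC', 'gzip-v2')])
```

Here the ``zstd-v2`` entry is dropped as unsupported, and the client would
try the ``gzip-v2`` URL first because it matches the stated preference.
Entries without a ``BUNDLESPEC`` are kept, since there is nothing to filter on.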

**Errors when downloading a bundle will fail the entire clone operation:
clients do not automatically fall back to a traditional clone.** The reason
for this is that if a server is using clone bundles, it is probably doing so
because the feature is necessary to help it scale. In other words, there
is an assumption that clone load will be offloaded to another service and
that the Mercurial server isn't responsible for serving this clone load.
If that other service experiences issues and clients start mass falling back to
the original Mercurial server, the added clone load could overwhelm the server
due to unexpected load and effectively take it offline. Not having clients
automatically fall back to cloning from the original server mitigates this
scenario.

Because there is no automatic Mercurial server fallback on failure of the
bundle hosting service, it is important for server operators to view the bundle
hosting service as an extension of the Mercurial server in terms of
availability and service level agreements: if the bundle hosting service goes
down, so does the ability for clients to clone. Note: clients will see a
message informing them how to bypass the clone bundles facility when a failure
occurs. So server operators should prepare for some people to follow these
instructions when a failure occurs, thus driving more load to the original
Mercurial server when the bundle hosting service fails.
"""

from mercurial import (
    extensions,
    wireproto,
)

# Marks the extension as shipped with, and tested against, Mercurial itself.
testedwith = 'internal'

def capabilities(orig, repo, proto):
    # Wrapper around wireproto._capabilities; ``orig`` is the wrapped function.
    caps = orig(repo, proto)

    # Only advertise if a manifest exists. This does add some I/O to requests.
    # But this should be cheaper than a wasted network round trip due to
    # missing file.
    if repo.opener.exists('clonebundles.manifest'):
        caps.append('clonebundles')

    return caps

def extsetup(ui):
    # Install the wrapper so the server can advertise the capability.
    extensions.wrapfunction(wireproto, '_capabilities', capabilities)