@@ -1,184 +1,184 b'' | |||
|
1 | 1 | # This software may be used and distributed according to the terms of the |
|
2 | 2 | # GNU General Public License version 2 or any later version. |
|
3 | 3 | |
|
4 | 4 | """advertise pre-generated bundles to seed clones |
|
5 | 5 | |
|
6 | 6 | "clonebundles" is a server-side extension used to advertise the existence |
|
7 | 7 | of pre-generated, externally hosted bundle files to clients that are |
|
8 | 8 | cloning so that cloning can be faster, more reliable, and require less |
|
9 | 9 | resources on the server. |
|
10 | 10 | |
|
11 | 11 | Cloning can be a CPU and I/O intensive operation on servers. Traditionally, |
|
12 | 12 | the server, in response to a client's request to clone, dynamically generates |
|
13 | 13 | a bundle containing the entire repository content and sends it to the client. |
|
14 | 14 | There is no caching on the server and the server will have to redundantly |
|
15 | 15 | generate the same outgoing bundle in response to each clone request. For |
|
16 | 16 | servers with large repositories or with high clone volume, the load from |
|
17 | 17 | clones can make scaling the server challenging and costly. |
|
18 | 18 | |
|
19 | 19 | This extension provides server operators the ability to offload potentially |
|
20 | 20 | expensive clone load to an external service. Here's how it works. |
|
21 | 21 | |
|
22 | 22 | 1. A server operator establishes a mechanism for making bundle files available |
|
23 | 23 | on a hosting service where Mercurial clients can fetch them. |
|
24 | 24 | 2. A manifest file listing available bundle URLs and some optional metadata |
|
25 | 25 | is added to the Mercurial repository on the server. |
|
26 | 26 | 3. A client initiates a clone against a clone bundles aware server. |
|
27 | 27 | 4. The client sees the server is advertising clone bundles and fetches the |
|
28 | 28 | manifest listing available bundles. |
|
29 | 29 | 5. The client filters and sorts the available bundles based on what it |
|
30 | 30 | supports and prefers. |
|
31 | 31 | 6. The client downloads and applies an available bundle from the |
|
32 | 32 | server-specified URL. |
|
33 | 33 | 7. The client reconnects to the original server and performs the equivalent |
|
34 | 34 | of :hg:`pull` to retrieve all repository data not in the bundle. (The |
|
35 | 35 | repository could have been updated between when the bundle was created |
|
36 | 36 | and when the client started the clone.) |
|
37 | 37 | |
|
38 | 38 | Instead of the server generating full repository bundles for every clone |
|
39 | 39 | request, it generates full bundles once and they are subsequently reused to |
|
40 | 40 | bootstrap new clones. The server may still transfer data at clone time. |
|
41 | 41 | However, this is only data that has been added/changed since the bundle was |
|
42 | 42 | created. For large, established repositories, this can reduce server load for |
|
43 | 43 | clones to less than 1% of original. |
|
44 | 44 | |
|
45 | 45 | To work, this extension requires the following of server operators: |
|
46 | 46 | |
|
47 | 47 | * Generating bundle files of repository content (typically periodically, |
|
48 | 48 | such as once per day). |
|
49 | 49 | * A file server that clients have network access to and that Python knows |
|
50 | 50 | how to talk to through its normal URL handling facility (typically an |
|
51 | 51 | HTTP server). |
|
52 | 52 | * A process for keeping the bundles manifest in sync with available bundle |
|
53 | 53 | files. |
|
54 | 54 | |
|
55 | 55 | Strictly speaking, using a static file hosting server isn't required: a server |
|
56 | 56 | operator could use a dynamic service for retrieving bundle data. However, |
|
57 | 57 | static file hosting services are simple and scalable and should be sufficient |
|
58 | 58 | for most needs. |
|
59 | 59 | |
|
60 | 60 | Bundle files can be generated with the :hg:`bundle` command. Typically |
|
61 | 61 | :hg:`bundle --all` is used to produce a bundle of the entire repository. |
|
62 | 62 | |
|
63 | 63 | :hg:`debugcreatestreamclonebundle` can be used to produce a special |
|
64 | 64 | *streaming clone bundle*. These are bundle files that are extremely efficient |
|
65 | 65 | to produce and consume (read: fast). However, they are larger than |
|
66 | 66 | traditional bundle formats and require that clients support the exact set |
|
67 | 67 | of repository data store formats in use by the repository that created them. |
|
68 | 68 | Typically, a newer server can serve data that is compatible with older clients. |
|
69 | 69 | However, *streaming clone bundles* don't have this guarantee. **Server |
|
70 | 70 | operators need to be aware that newer versions of Mercurial may produce |
|
71 | 71 | streaming clone bundles incompatible with older Mercurial versions.** |
|
72 | 72 | |
|
73 | The list of requirements printed by :hg:`debugcreatestreamclonebundle` should | |
|
74 | be specified in the ``requirements`` parameter of the *bundle specification | |
|
75 | string* for the ``BUNDLESPEC`` manifest property described below. e.g. | |
|
76 | ``BUNDLESPEC=none-packed1;requirements%3Drevlogv1``. | |
|
77 | ||
|
78 | 73 | A server operator is responsible for creating a ``.hg/clonebundles.manifest`` |
|
79 | 74 | file containing the list of available bundle files suitable for seeding |
|
80 | 75 | clones. If this file does not exist, the repository will not advertise the |
|
81 | 76 | existence of clone bundles when clients connect. |
|
82 | 77 | |
|
83 | 78 | The manifest file contains a newline (\n) delimited list of entries. |
|
84 | 79 | |
|
85 | 80 | Each line in this file defines an available bundle. Lines have the format: |
|
86 | 81 | |
|
87 | 82 | <URL> [<key>=<value>[ <key>=<value>]] |
|
88 | 83 | |
|
89 | 84 | That is, a URL followed by an optional, space-delimited list of key=value |
|
90 | 85 | pairs describing additional properties of this bundle. Both keys and values |
|
91 | 86 | are URI encoded. |
|
92 | 87 | |
|
93 | 88 | Keys in UPPERCASE are reserved for use by Mercurial and are defined below. |
|
94 | 89 | All non-uppercase keys can be used by site installations. An example use |
|
95 | 90 | for custom properties is to use the *datacenter* attribute to define which |
|
96 | 91 | data center a file is hosted in. Clients could then prefer a server in the |
|
97 | 92 | data center closest to them. |
|
98 | 93 | |
|
99 | 94 | The following reserved keys are currently defined: |
|
100 | 95 | |
|
101 | 96 | BUNDLESPEC |
|
102 | 97 | A "bundle specification" string that describes the type of the bundle. |
|
103 | 98 | |
|
104 | 99 | These are string values that are accepted by the "--type" argument of |
|
105 | 100 | :hg:`bundle`. |
|
106 | 101 | |
|
107 | 102 | The values are parsed in strict mode, which means they must be of the |
|
108 | 103 | "<compression>-<type>" form. See |
|
109 | 104 | mercurial.exchange.parsebundlespec() for more details. |
|
110 | 105 | |
|
106 | :hg:`debugbundle --spec` can be used to print the bundle specification | |
|
107 | string for a bundle file. The output of this command can be used verbatim | |
|
108 | for the value of ``BUNDLESPEC`` (it is already escaped). | |
|
109 | ||
|
111 | 110 | Clients will automatically filter out specifications that are unknown or |
|
112 | 111 | unsupported so they won't attempt to download something that likely won't |
|
113 | 112 | apply. |
|
114 | 113 | |
|
115 | 114 | The actual value doesn't impact client behavior beyond filtering: |
|
116 | 115 | clients will still sniff the bundle type from the header of downloaded |
|
117 | 116 | files. |
|
118 | 117 | |
|
119 | 118 | **Use of this key is highly recommended**, as it allows clients to |
|
120 | easily skip unsupported bundles. | |
|
119 | easily skip unsupported bundles. If this key is not defined, an old | |
|
120 | client may attempt to apply a bundle that it is incapable of reading. | |
|
121 | 121 | |
|
122 | 122 | REQUIRESNI |
|
123 | 123 | Whether Server Name Indication (SNI) is required to connect to the URL. |
|
124 | 124 | SNI allows servers to use multiple certificates on the same IP. It is |
|
125 | 125 | somewhat common in CDNs and other hosting providers. Older Python |
|
126 | 126 | versions do not support SNI. Defining this attribute enables clients |
|
127 | 127 | with older Python versions to filter this entry without experiencing |
|
128 | 128 | an opaque SSL failure at connection time. |
|
129 | 129 | |
|
130 | 130 | If this is defined, it is important to advertise a non-SNI fallback |
|
131 | 131 | URL or clients running old Python releases may not be able to clone |
|
132 | 132 | with the clonebundles facility. |
|
133 | 133 | |
|
134 | 134 | Value should be "true". |
|
135 | 135 | |
|
136 | 136 | Manifests can contain multiple entries. Assuming metadata is defined, clients |
|
137 | 137 | will filter entries from the manifest that they don't support. The remaining |
|
138 | 138 | entries are optionally sorted by client preferences |
|
139 | 139 | (``experimental.clonebundleprefers`` config option). The client then attempts |
|
140 | 140 | to fetch the bundle at the first URL in the remaining list. |
|
141 | 141 | |
|
142 | 142 | **Errors when downloading a bundle will fail the entire clone operation: |
|
143 | 143 | clients do not automatically fall back to a traditional clone.** The reason |
|
144 | 144 | for this is that if a server is using clone bundles, it is probably doing so |
|
145 | 145 | because the feature is necessary to help it scale. In other words, there |
|
146 | 146 | is an assumption that clone load will be offloaded to another service and |
|
147 | 147 | that the Mercurial server isn't responsible for serving this clone load. |
|
148 | 148 | If that other service experiences issues and clients start mass falling back to |
|
149 | 149 | the original Mercurial server, the added clone load could overwhelm the server |
|
150 | 150 | due to unexpected load and effectively take it offline. Not having clients |
|
151 | 151 | automatically fall back to cloning from the original server mitigates this |
|
152 | 152 | scenario. |
|
153 | 153 | |
|
154 | 154 | Because there is no automatic Mercurial server fallback on failure of the |
|
155 | 155 | bundle hosting service, it is important for server operators to view the bundle |
|
156 | 156 | hosting service as an extension of the Mercurial server in terms of |
|
157 | 157 | availability and service level agreements: if the bundle hosting service goes |
|
158 | 158 | down, so does the ability for clients to clone. Note: clients will see a |
|
159 | 159 | message informing them how to bypass the clone bundles facility when a failure |
|
160 | 160 | occurs. So server operators should prepare for some people to follow these |
|
161 | 161 | instructions when a failure occurs, thus driving more load to the original |
|
162 | 162 | Mercurial server when the bundle hosting service fails. |
|
163 | 163 | """ |
|
164 | 164 | |
|
165 | 165 | from mercurial import ( |
|
166 | 166 | extensions, |
|
167 | 167 | wireproto, |
|
168 | 168 | ) |
|
169 | 169 | |
|
170 | 170 | testedwith = 'internal' |
|
171 | 171 | |
|
172 | 172 | def capabilities(orig, repo, proto): |
|
173 | 173 | caps = orig(repo, proto) |
|
174 | 174 | |
|
175 | 175 | # Only advertise if a manifest exists. This does add some I/O to requests. |
|
176 | 176 | # But this should be cheaper than a wasted network round trip due to |
|
177 | 177 | # missing file. |
|
178 | 178 | if repo.opener.exists('clonebundles.manifest'): |
|
179 | 179 | caps.append('clonebundles') |
|
180 | 180 | |
|
181 | 181 | return caps |
|
182 | 182 | |
|
183 | 183 | def extsetup(ui): |
|
184 | 184 | extensions.wrapfunction(wireproto, '_capabilities', capabilities) |
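For readers of this diff, the manifest format described in the docstring (one `<URL>` per line, followed by optional space-delimited, URI-encoded `key=value` pairs) and the client-side filtering on `BUNDLESPEC` and `REQUIRESNI` can be illustrated roughly as follows. This is a minimal sketch written for Python 3 and is not Mercurial's actual implementation: the function names `parse_manifest` and `filter_entries`, the `supported_specs` and `have_sni` parameters, and the example URLs are all hypothetical. It also compares `BUNDLESPEC` values as plain strings, whereas the docstring says real clients parse them in strict mode.

```python
from urllib.parse import unquote


def parse_manifest(text):
    """Parse manifest text into a list of (url, attrs) tuples.

    Hypothetical helper: each non-empty line is "<URL> [<key>=<value> ...]",
    with keys and values URI-encoded, as the docstring describes.
    """
    entries = []
    for line in text.split('\n'):
        fields = line.split()
        if not fields:
            continue  # skip blank lines, e.g. after a trailing newline
        url, attrs = fields[0], {}
        for field in fields[1:]:
            # Split on the first '=' only; the value may contain an
            # encoded '=' (%3D) that is restored by unquote().
            key, _, value = field.partition('=')
            attrs[unquote(key)] = unquote(value)
        entries.append((url, attrs))
    return entries


def filter_entries(entries, supported_specs, have_sni=True):
    """Drop entries this (hypothetical) client cannot use.

    Entries without BUNDLESPEC are kept, mirroring the docstring's warning
    that a client cannot skip bundles whose type is not declared.
    """
    usable = []
    for url, attrs in entries:
        spec = attrs.get('BUNDLESPEC')
        if spec is not None and spec not in supported_specs:
            continue
        if attrs.get('REQUIRESNI') == 'true' and not have_sni:
            continue
        usable.append((url, attrs))
    return usable


# Example manifest; the URLs are made up for illustration.
EXAMPLE = (
    "https://cdn.example.com/full.hg BUNDLESPEC=gzip-v2 REQUIRESNI=true\n"
    "https://files.example.com/stream.hg "
    "BUNDLESPEC=none-packed1;requirements%3Drevlogv1\n"
    "https://files.example.com/plain.hg\n"
)

if __name__ == '__main__':
    for url, attrs in filter_entries(parse_manifest(EXAMPLE), {'gzip-v2'}):
        print(url, attrs)
```

The last entry in the example has no `BUNDLESPEC`, so no filter can exclude it. This is the situation the docstring warns about when it recommends always defining that key.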