.. _parallelsecurity:

===========================
Security details of IPython
===========================

IPython's :mod:`IPython.kernel` package exposes the full power of the Python
interpreter over a TCP/IP network for the purposes of parallel computing. This
raises an important question about IPython's security model. This document
describes that model and how it is implemented in IPython's architecture.

Processes and network topology
==============================

To enable parallel computing, IPython runs a number of different processes.
These processes are discussed at length in the IPython documentation and are
summarized here:

* The IPython *engine*. This process is a full-blown Python interpreter in
  which user code is executed. Multiple engines are started to make parallel
  computing possible.
* The IPython *controller*. This process manages a set of engines, maintaining
  a queue for each and presenting an asynchronous interface to the set of
  engines.
* The IPython *client*. This process is typically an interactive Python
  process that is used to coordinate the engines to get a parallel
  computation done.

Collectively, these three processes are called the IPython *kernel*.

These three processes communicate over TCP/IP connections with a well-defined
topology. The IPython controller is the only process that listens on TCP/IP
sockets. Upon starting, an engine connects to a controller and registers
itself with the controller. These engine/controller TCP/IP connections persist
for the lifetime of each engine.

The IPython client also connects to the controller using one or more TCP/IP
connections. These connections persist only for the lifetime of the client.
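
The connection pattern described above can be illustrated with plain TCP
sockets. This is a minimal sketch, not IPython's implementation: it leaves out
the encryption and authentication described later, and the function names and
the ``register engine-0`` message are hypothetical. The controller is the only
side that calls ``listen``; binding to port 0 lets the operating system choose
a free port.

.. code-block:: python

    import socket
    import threading


    def run_controller(server_sock, registrations):
        """Accept one engine connection and record its registration message."""
        conn, _addr = server_sock.accept()
        with conn:
            registrations.append(conn.recv(1024).decode())
            conn.sendall(b"registered")


    def run_engine(port):
        """Connect to the controller (the engine never listens) and register."""
        with socket.create_connection(("127.0.0.1", port)) as sock:
            sock.sendall(b"register engine-0")
            return sock.recv(1024).decode()


    # Only the controller listens; port 0 lets the OS pick a free port.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen()
    port = server.getsockname()[1]

    registrations = []
    controller = threading.Thread(target=run_controller, args=(server, registrations))
    controller.start()
    reply = run_engine(port)
    controller.join()
    server.close()
    print(registrations, reply)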

A given IPython controller and set of engines usually have a relatively short
lifetime, typically the duration of a single parallel simulation performed by
a single user. Finally, the controller, engines and client processes typically
execute with the permissions of that same user. More specifically, the
controller and engines are *not* executed as root or with any other superuser
permissions.

Application logic
=================

When running the IPython kernel to perform a parallel computation, a user
employs the IPython client to send Python commands and data through the
IPython controller to the IPython engines, where those commands are executed
and the data processed. The design of IPython ensures that the client is the
only access point for the capabilities of the engines. That is, the only way
of addressing the engines is through a client.

A user can use the client to instruct the IPython engines to execute
arbitrary Python commands. These Python commands can include calls to the
system shell, access to the filesystem, etc., as required by the user's
application code. From this perspective, when a user runs an IPython engine on
a host, that engine has the same capabilities and permissions as the user
themselves (as if they were logged onto the engine's host with a terminal).

Secure network connections
==========================

Overview
--------

All TCP/IP connections between the client and controller, as well as between
the engines and controller, are fully encrypted and authenticated. This
section describes the details of the encryption and authentication approach
used within IPython.

IPython uses the `Foolscap <http://foolscap.lothar.com/trac>`_ network
protocol for all communications between processes. Thus, the details of
IPython's security model are directly related to those of Foolscap, and much
of the following discussion is actually a discussion of the security that is
built into Foolscap.

Encryption
----------

For encryption purposes, IPython and Foolscap use the well-known Secure
Sockets Layer (SSL) protocol; `RFC 5246 <http://tools.ietf.org/html/rfc5246>`_
specifies version 1.2 of its successor, Transport Layer Security (TLS). We use
the implementation of this protocol provided by the OpenSSL project through
the `pyOpenSSL <http://pyopenssl.sourceforge.net/>`_ Python bindings to OpenSSL.

Authentication
--------------

IPython clients and engines must also authenticate themselves with the
controller. This is handled using a `capability-based security model
<http://en.wikipedia.org/wiki/Capability-based_security>`_. In this model, the
controller creates a strong cryptographic key, or token, that represents each
set of capabilities that the controller offers. Any party who has this key and
presents it to the controller has full access to the corresponding
capabilities of the controller. This model is analogous to using a physical
key to gain access to physical items (capabilities) behind a locked door.

For a capability-based authentication system to prevent unauthorized access,
two things must be ensured:

* The keys must be cryptographically strong. Otherwise, attackers could gain
  access through a simple brute-force key-guessing attack.
* The actual keys must be distributed only to authorized parties.

The keys in Foolscap are called Foolscap URLs, or FURLs. The following section
gives details about how these FURLs are created in Foolscap. The IPython
controller creates a number of FURLs for different purposes:

* One FURL that grants IPython engines access to the controller. Also
  implicit in this access is permission to execute code sent by an
  authenticated IPython client.
* Two or more FURLs that grant IPython clients access to the controller.
  Implicit in this access is permission to give the controller's engines code
  to execute.

Upon starting, the controller creates these different FURLs and writes them to
files in the user-read-only directory ``$HOME/.ipython/security``. Thus, only
the user who starts the controller has access to the FURLs.
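
On a POSIX system, a controller could write a FURL to a file that only its
owner can read roughly as follows. This is a sketch under stated assumptions:
the helper ``write_furl_file``, the file name ``example.furl`` and the
placeholder FURL are hypothetical, and a temporary directory stands in for
``$HOME/.ipython/security``.

.. code-block:: python

    import os
    import stat
    import tempfile


    def write_furl_file(security_dir, name, furl):
        """Write ``furl`` to ``security_dir/name`` so only the owner can read it."""
        # Owner-only directory (rwx------); chmod again because the umask
        # may have altered the mode passed to makedirs.
        os.makedirs(security_dir, mode=0o700, exist_ok=True)
        os.chmod(security_dir, 0o700)
        path = os.path.join(security_dir, name)
        # Create (or truncate) the file, then force owner-only rw-------,
        # since the O_CREAT mode does not apply to a pre-existing file.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        os.chmod(path, 0o600)
        with os.fdopen(fd, "w") as f:
            f.write(furl + "\n")
        return path


    # A temporary directory stands in for $HOME/.ipython/security.
    security_dir = os.path.join(tempfile.mkdtemp(), "security")
    path = write_furl_file(security_dir, "example.furl", "pb://example")
    print(oct(stat.S_IMODE(os.stat(path).st_mode)))

The permission bits only keep other local users out. Anyone the user copies
the file to gains the capability, which is why FURLs should be distributed
only to authorized parties.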

For an IPython client or engine to authenticate with a controller, it must
present the appropriate FURL to the controller upon connecting. If the FURL
matches what the controller expects for a given capability, access is granted;
if not, access is denied. FURLs are exchanged only after encrypted
communication channels have been established, which prevents attackers from
capturing them.

.. note::

    The FURL is similar to an unencrypted (passphrase-less) private key in
    SSH: anyone who obtains a copy of it can use it.

Details of the Foolscap handshake
---------------------------------

In this section we detail the precise security handshake that takes place at
the beginning of any network connection in IPython. For the purposes of this
discussion, the SERVER is the IPython controller process and the CLIENT is the
IPython engine or client process.

Upon starting, all IPython processes do the following:

1. Create a public key X.509 certificate (ISO/IEC 9594).
2. Create a hash of the contents of the certificate using the SHA-1 algorithm.
   The base-32 encoded version of this hash is saved by the process as its
   process id (in Foolscap this is actually the Tub id, but here we refer to
   it as the process id).
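
Step 2 above can be sketched with the Python standard library. The
certificate bytes are a placeholder rather than a real DER-encoded X.509
certificate, and the lowercase output is an assumption about presentation, not
a statement of Foolscap's exact encoding. A SHA-1 digest is 160 bits, so its
base-32 encoding (5 bits per character) is exactly 32 characters with no
padding.

.. code-block:: python

    import base64
    import hashlib


    def process_id(certificate_der: bytes) -> str:
        """Return the lowercase base-32 encoding of a certificate's SHA-1 hash."""
        digest = hashlib.sha1(certificate_der).digest()  # 20 bytes = 160 bits
        return base64.b32encode(digest).decode("ascii").lower()


    # Placeholder bytes standing in for a real DER-encoded X.509 certificate.
    cert = b"placeholder certificate bytes"
    tub_id = process_id(cert)
    print(tub_id, len(tub_id))  # 32 characters: 160 bits / 5 bits per character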

Upon starting, the IPython controller also does the following:

1. Save the X.509 certificate to disk in a secure location. The CLIENT
   certificate is never saved to disk.
2. Create a FURL for each capability that the controller has. The controller
   offers separate capabilities for clients and engines. Each FURL is created
   from: a) the process id of the SERVER, b) the IP address and port the
   SERVER is listening on, and c) a 160-bit, cryptographically secure string
   that represents the capability (the "capability id").
3. Save the FURLs to disk in a secure location on the SERVER's host.
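
The three ingredients listed in step 2 can be combined into a single string.
The sketch below assumes a ``pb://<process id>@<host>:<port>/<capability id>``
layout modeled on Foolscap's URL scheme; treat that layout, the helpers
``make_furl`` and ``parse_furl``, and the example address
``192.0.2.10:10105`` as illustrative assumptions. The capability id is drawn
from Python's ``secrets`` module, which is intended for security-sensitive
randomness.

.. code-block:: python

    from __future__ import annotations

    import base64
    import secrets


    def b32(data: bytes) -> str:
        """Lowercase base-32 text; 20-byte inputs encode to 32 chars, unpadded."""
        return base64.b32encode(data).decode("ascii").lower()


    def make_furl(process_id: str, host: str, port: int) -> tuple[str, str]:
        """Build a FURL-like string around a fresh 160-bit capability id."""
        capability_id = b32(secrets.token_bytes(20))  # 20 random bytes = 160 bits
        return f"pb://{process_id}@{host}:{port}/{capability_id}", capability_id


    def parse_furl(furl: str) -> tuple[str, str, int, str]:
        """Split a FURL-like string back into its three ingredients."""
        if not furl.startswith("pb://"):
            raise ValueError("not a FURL: " + furl)
        process_id, rest = furl[len("pb://"):].split("@", 1)
        location, capability_id = rest.split("/", 1)
        host, port = location.rsplit(":", 1)
        return process_id, host, int(port), capability_id


    server_id = b32(bytes(20))  # placeholder for the SERVER's process id
    furl, capability_id = make_furl(server_id, "192.0.2.10", 10105)
    print(furl)
    print(parse_furl(furl))

Because both the SERVER's process id and its listening address are embedded
in the string, a CLIENT holding the FURL knows where to connect and which
SERVER identity to expect.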

For a CLIENT to connect to the SERVER and access one of its capabilities, the
CLIENT must know the FURL for that capability. This typically requires that
the file containing the FURL be transferred from the SERVER's host to the
CLIENT's host. This is done by the end user who started the SERVER and wishes
to have a CLIENT connect to it.

When a CLIENT connects to the SERVER, the following handshake protocol takes
place:

1. The CLIENT tells the SERVER what process (or Tub) id it expects the SERVER
   to have.
2. If the SERVER has that process id, it notifies the CLIENT that it will now
   enter encrypted mode. If the SERVER has a different id, the SERVER aborts.
3. Both CLIENT and SERVER initiate the SSL handshake protocol.
4. Both CLIENT and SERVER request the certificate of their peer and verify
   that certificate. If this succeeds, all further communications are
   encrypted.
5. Both CLIENT and SERVER send a hello block containing connection parameters
   and their process id.
6. The CLIENT and SERVER check that their peer's stated process id matches the
   hash of the X.509 certificate the peer presented. If not, the connection is
   aborted.
7. The CLIENT verifies that the SERVER's stated id matches the id of the
   SERVER the CLIENT intends to connect to. If not, the connection is
   aborted.
8. The CLIENT and SERVER elect a master who decides on the final connection
   parameters.
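
The identity checks in steps 1, 2, 6 and 7 reduce to comparing base-32 SHA-1
hashes. The sketch below models only those comparisons, using placeholder
certificate bytes; the SSL handshake itself (steps 3 and 4), the hello block
and master election are left out, and ``HandshakeError`` and the check
functions are hypothetical names. With Python's ``ssl`` module, the DER bytes
of a peer's certificate could be obtained from
``SSLSocket.getpeercert(binary_form=True)``.

.. code-block:: python

    import base64
    import hashlib


    class HandshakeError(Exception):
        """Raised when an identity check fails and the connection is aborted."""


    def process_id(certificate_der: bytes) -> str:
        """Lowercase base-32 SHA-1 hash of a certificate (the Tub id)."""
        digest = hashlib.sha1(certificate_der).digest()
        return base64.b32encode(digest).decode("ascii").lower()


    def server_accepts(expected_id: str, server_cert: bytes) -> None:
        """Steps 1-2: the SERVER aborts unless it has the id the CLIENT expects."""
        if process_id(server_cert) != expected_id:
            raise HandshakeError("SERVER does not have the expected id")


    def check_peer(stated_id: str, peer_cert: bytes) -> None:
        """Step 6: a peer's stated id must match the hash of its certificate."""
        if stated_id != process_id(peer_cert):
            raise HandshakeError("stated id does not match certificate")


    def client_checks_server(furl_id: str, stated_id: str, server_cert: bytes) -> None:
        """Steps 6 and 7 as performed by the CLIENT."""
        check_peer(stated_id, server_cert)
        if stated_id != furl_id:
            raise HandshakeError("SERVER is not the one named in the FURL")


    server_cert = b"placeholder SERVER certificate"
    furl_id = process_id(server_cert)  # the id the CLIENT read from the FURL

    server_accepts(furl_id, server_cert)
    client_checks_server(furl_id, furl_id, server_cert)  # genuine SERVER passes

    hostile_cert = b"placeholder hostile certificate"
    for stated in (furl_id, process_id(hostile_cert)):
        try:
            client_checks_server(furl_id, stated, hostile_cert)
        except HandshakeError as exc:
            print("aborted:", exc)

A hostile SERVER fails either way: if it claims the expected id, the hash of
its certificate gives it away (step 6); if it states its true id, that id does
not match the FURL (step 7).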

The public/private key pair associated with each process's X.509 certificate
is completely hidden from this handshake protocol. The keys are, however, used
internally by OpenSSL as part of the SSL handshake protocol. Each process
keeps its own private key hidden and sends its peer only the public key
(embedded in the certificate).

Finally, when the CLIENT requests access to a particular SERVER capability,
the following happens:

1. The CLIENT asks the SERVER for access to a capability by presenting that
   capability's id.
2. If the SERVER has a capability with that id, access is granted. If not,
   access is denied.
3. Once access has been gained, the CLIENT can use the capability.
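
The SERVER's side of these three steps can be modeled as a table keyed by
unguessable ids. This is a minimal sketch, not Foolscap's implementation:
``CapabilityTable``, its ``register`` and ``request`` methods and the two
capability functions are hypothetical, and the ids are hex strings from
Python's ``secrets`` module for brevity.

.. code-block:: python

    import secrets


    class CapabilityTable:
        """Map unguessable capability ids to the objects they grant access to."""

        def __init__(self):
            self._capabilities = {}

        def register(self, capability) -> str:
            """Store ``capability`` under a fresh 160-bit id and return the id."""
            capability_id = secrets.token_hex(20)  # 20 bytes = 160 bits
            self._capabilities[capability_id] = capability
            return capability_id

        def request(self, capability_id: str):
            """Steps 1-2: grant access only if the presented id is known."""
            try:
                return self._capabilities[capability_id]
            except KeyError:
                raise PermissionError("access denied") from None


    def engine_interface():
        return "engine interface"


    def client_interface():
        return "client interface"


    table = CapabilityTable()
    engine_id = table.register(engine_interface)
    client_id = table.register(client_interface)

    # Step 3: a CLIENT presenting the right id can use the capability.
    print(table.request(client_id)())

    # A guessed id is refused.
    try:
        table.request(secrets.token_hex(20))
    except PermissionError as exc:
        print(exc)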

Specific security vulnerabilities
=================================

There are a number of potential security vulnerabilities present in IPython's
architecture. In this section we discuss those vulnerabilities and detail how
the security architecture described above prevents them from being exploited.

Unauthorized clients
--------------------

The IPython client can instruct the IPython engines to execute arbitrary
Python code with the permissions of the user who started the engines. If an
attacker were able to connect their own hostile IPython client to the IPython
controller, they could instruct the engines to execute code.

This attack is prevented by the capability-based client authentication
performed after the encrypted channel has been established. The relevant
authentication information is encoded into the FURL that clients must
present to gain access to the IPython controller. By limiting the distribution
of those FURLs, a user can grant access only to authorized persons.

It is highly unlikely that a client FURL could be guessed by an attacker
in a brute-force guessing attack. A given instance of the IPython controller
only runs for a relatively short amount of time (on the order of hours). Thus
an attacker would have only a limited amount of time to test a search space of
size ``2**320``. Furthermore, even if a controller were to run for a longer
amount of time, this search space is quite large (larger, for instance, than
that of a typical username/password pair).
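
A back-of-the-envelope estimate makes the size of this search space concrete.
The guess rate (one trillion guesses per second) and the one-day controller
lifetime are hypothetical, deliberately generous assumptions. Since a FURL
carries both a 160-bit process id and a 160-bit capability id, the ``2**320``
figure presumably counts both; the sketch also shows the capability id alone
(``2**160``), which is still far out of reach.

.. code-block:: python

    GUESSES_PER_SECOND = 10**12  # hypothetical, extremely fast attacker
    LIFETIME_SECONDS = 24 * 60 * 60  # hypothetical one-day controller lifetime

    guesses = GUESSES_PER_SECOND * LIFETIME_SECONDS  # 8.64e16 attempts in total

    for bits in (160, 320):
        space = 2**bits
        # Fraction of the search space that can be tried before the controller
        # shuts down; with no repeated guesses this is the chance of success.
        fraction = guesses / space
        print(f"2**{bits}: fraction searched = {fraction:.1e}")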

Unauthorized engines
--------------------

If an attacker were able to connect a hostile engine to a user's controller,
the user might unknowingly send sensitive code or data to the hostile engine.
The attacker's engine would then have full access to that code and data.

This type of attack is prevented in the same way as the unauthorized client
attack: through the capability-based authentication scheme.

Unauthorized controllers
------------------------

It is also possible that an attacker could try to convince a user's IPython
client or engine to connect to a hostile IPython controller. That controller
would then have full access to the code and data sent between the IPython
client and the IPython engines.

Again, this attack is prevented through the FURLs, which ensure that a
client or engine connects to the correct controller. It is also important to
note that the FURLs encode the IP address and port that the controller is
listening on, so there is little chance of mistakenly connecting to a
controller running on a different IP address and port.

When starting an engine or client, a user must specify which FURL to use
for that connection. Thus, in order to introduce a hostile controller, the
attacker must convince the user to use the FURLs associated with the
hostile controller. As long as a user is diligent in only using FURLs from
trusted sources, this attack is not possible.

Other security measures
=======================

A number of other measures are taken to further limit the security risks
involved in running the IPython kernel.

First, by default, the IPython controller listens on random port numbers.
While this can be overridden by the user, in the default configuration an
attacker would have to do a port scan to even find a controller to attack.
When coupled with the relatively short running time of a typical controller
(on the order of hours), an attacker would have to work extremely hard and
extremely *fast* to even find a running controller to attack.

Second, much of the time, especially when run on supercomputers or clusters,
the controller is running behind a firewall. Thus, for engines or clients to
connect to the controller, either:

* the different processes have to all be behind the firewall, or
* the user has to use SSH port forwarding to tunnel the connections through
  the firewall.

In either case, an attacker is presented with additional barriers that prevent
attacking or even probing the system.

Summary
=======

IPython's architecture has been carefully designed with security in mind. The
capability-based authentication model, in conjunction with the encrypted
TCP/IP channels, addresses the core potential vulnerabilities in the system,
while still enabling users to use the system in open networks.

Other questions
===============

About keys
----------

Can you clarify the roles of the certificate and its keys versus the FURL,
which is also called a key?

The certificate created by IPython processes is a standard public key X.509
certificate that is used by the SSL handshake protocol to set up an encrypted
channel between the controller and the IPython engine or client. The public
and private keys associated with this certificate are used only by the SSL
handshake protocol in setting up this encrypted channel.

The FURL serves a completely different and independent purpose from the
key pair associated with the certificate. When we refer to a FURL as a
key, we are using the word "key" in the capability-based security model
sense. This has nothing to do with "key" in the public/private key sense used
in the SSL protocol.

That said, the FURL is used as a cryptographic key to grant IPython engines
and clients access to particular capabilities that the controller offers.

Self-signed certificates
------------------------

Is the controller creating a self-signed certificate? Is it created per
instance/session, once as a one-time setup, or each time the controller is
started?

The Foolscap network protocol, which handles the SSL protocol details, creates
a self-signed X.509 certificate using OpenSSL for each IPython process. The
lifetime of the certificate is handled differently for the IPython controller
and the engines/client.

For the IPython engines and client, the certificate is only held in memory for
the lifetime of its process. It is never written to disk.

For the controller, the certificate can be created anew each time the
controller starts, or it can be created once and reused each time the
controller starts. If at any point the certificate is deleted, a new one is
created the next time the controller starts.
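
The controller's reuse-or-recreate policy for its certificate can be sketched
as below. The function names and the file name are hypothetical, and
``make_certificate`` returns random placeholder bytes instead of a real
self-signed X.509 certificate, which IPython obtains from Foolscap via
OpenSSL.

.. code-block:: python

    import os
    import secrets
    import tempfile


    def make_certificate() -> bytes:
        """Hypothetical stand-in: real code would create a self-signed X.509 cert."""
        return secrets.token_bytes(64)


    def load_or_create_certificate(path: str, reuse: bool = True) -> bytes:
        """Reuse the saved certificate if allowed and present, else create one."""
        if reuse and os.path.exists(path):
            with open(path, "rb") as f:
                return f.read()
        cert = make_certificate()
        # New files are created owner-only (rw-------, subject to the umask).
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        with os.fdopen(fd, "wb") as f:
            f.write(cert)
        return cert


    path = os.path.join(tempfile.mkdtemp(), "controller.cert")
    first = load_or_create_certificate(path)
    second = load_or_create_certificate(path)  # reused on the next start
    os.remove(path)  # the certificate is deleted...
    third = load_or_create_certificate(path)  # ...so a new one is created
    fresh = load_or_create_certificate(path, reuse=False)  # created anew each start
    print(first == second, first == third, third == fresh)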

SSL private key
---------------

How is the private key (associated with the certificate) distributed?

In the usual implementation of the SSL protocol, the private key is never
distributed. We always follow this standard.

SSL versus Foolscap authentication
----------------------------------

Many SSL connections only perform one-sided authentication (the server to the
client). How is the client authentication in IPython's system related to SSL
authentication?

We perform a two-way SSL handshake in which both parties request and verify
the certificate of their peer. This mutual authentication is handled by the
SSL handshake and is separate and independent from the additional
authentication steps that the CLIENT and SERVER perform after an encrypted
channel is established.
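
For comparison, the Python standard library's ``ssl`` module (not the
pyOpenSSL bindings IPython uses) controls whether a server requests and
verifies the client's certificate through ``verify_mode``. This configuration
sketch only builds the two contexts; loading certificates and keys is
omitted. With self-signed certificates, each side would also have to trust
its peer's certificate explicitly, for example through
``load_verify_locations``.

.. code-block:: python

    import ssl

    # Server side: a server context does not request a client certificate by
    # default; CERT_REQUIRED makes the authentication mutual.
    server_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    server_ctx.verify_mode = ssl.CERT_REQUIRED

    # Client side: PROTOCOL_TLS_CLIENT verifies the server certificate and
    # host name by default.
    client_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)

    print(server_ctx.verify_mode, client_ctx.verify_mode, client_ctx.check_hostname)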