Small changes to the security documentation.
Brian Granger -
@@ -0,0 +1,363 b''
1 .. _parallelsecurity:
2
3
4 ===========================
5 Security details of IPython
6 ===========================
7
8 IPython's :mod:`IPython.kernel` package exposes the full power of the Python
9 interpreter over a TCP/IP network for the purposes of parallel computing. This
10 feature brings up the important question of IPython's security model. This
11 document gives details about this model and how it is implemented in IPython's
12 architecture.
13
14 Processes and network topology
15 ==============================
16
17 To enable parallel computing, IPython has a number of different processes that
18 run. These processes are discussed at length in the IPython documentation and
19 are summarized here:
20
21 * The IPython *engine*. This process is a full-blown Python
22 interpreter in which user code is executed. Multiple
23 engines are started to make parallel computing possible.
24 * The IPython *controller*. This process manages a set of
25 engines, maintaining a queue for each and presenting
26 an asynchronous interface to the set of engines.
27 * The IPython *client*. This process is typically an
28 interactive Python process that is used to coordinate the
29 engines to get a parallel computation done.
30
31 Collectively, these three processes are called the IPython *kernel*.
32
33 These three processes communicate over TCP/IP connections with a well-defined
34 topology. The IPython controller is the only process that listens on TCP/IP
35 sockets. Upon starting, an engine connects to a controller and registers
36 itself with the controller. These engine/controller TCP/IP connections persist
37 for the lifetime of each engine.
38
39 The IPython client also connects to the controller using one or more TCP/IP
40 connections. These connections persist for the lifetime of the client only.
41
42 A given IPython controller and set of engines usually has a relatively short
43 lifetime, corresponding to the duration of a single
44 parallel simulation performed by a single user. Finally, the controller,
45 engines and client processes typically execute with the permissions of that
46 same user. More specifically, the controller and engines are *not* executed as
47 root or with any other superuser permissions.
48
49 Application logic
50 =================
51
52 When running the IPython kernel to perform a parallel computation, a user
53 utilizes the IPython client to send Python commands and data through the
54 IPython controller to the IPython engines, where those commands are executed
55 and the data processed. The design of IPython ensures that the client is the
56 only access point for the capabilities of the engines. That is, the only way of addressing the engines is through a client.
57
58 A user can utilize the client to instruct the IPython engines to execute
59 arbitrary Python commands. These Python commands can call the
60 system shell, access the filesystem, and so on, as required by the user's
61 application code. From this perspective, when a user runs an IPython engine on
62 a host, that engine has the same capabilities and permissions as the user
63 themselves (as if they were logged onto the engine's host with a terminal).
64
65 Secure network connections
66 ==========================
67
68 Overview
69 --------
70
71 All TCP/IP connections between the client and controller as well as the
72 engines and controller are fully encrypted and authenticated. This section
73 describes the details of the encryption and authentication approach used
74 within IPython.
75
76 IPython uses the `Foolscap <http://foolscap.lothar.com/trac>`_ network
77 protocol for all communications between processes. Thus, the details of
78 IPython's security model are directly related to those of Foolscap, and much
79 of the following discussion is really a discussion of the security that
80 is built into Foolscap.
81
82 Encryption
83 ----------
84
85 For encryption purposes, IPython and Foolscap use the well-known Secure Sockets
86 Layer (SSL) protocol and its successor, TLS (`RFC5246 <http://tools.ietf.org/html/rfc5246>`_). We use
87 the implementation of this protocol provided by the OpenSSL project through
88 the `pyOpenSSL <http://pyopenssl.sourceforge.net/>`_ Python bindings to OpenSSL.
89
90 Authentication
91 --------------
92
93 IPython clients and engines must also authenticate themselves with the
94 controller. This is handled in a `capabilities based security model
95 <http://en.wikipedia.org/wiki/Capability-based_security>`_. In this model, the
96 controller creates a strong cryptographic key or token that represents each
97 set of capabilities that the controller offers. Any party who has this key and
98 presents it to the controller has full access to the corresponding
99 capabilities of the controller. This model is analogous to using a physical
100 key to gain access to physical items (capabilities) behind a locked door.
101
102 For a capabilities based authentication system to prevent unauthorized access,
103 two things must be ensured:
104
105 * The keys must be cryptographically strong. Otherwise attackers could gain
106 access by a simple brute force key guessing attack.
107 * The actual keys must be distributed only to authorized parties.
108
109 The keys in Foolscap are called Foolscap URLs, or FURLs. The following section
110 gives details about how these FURLs are created in Foolscap. The IPython
111 controller creates a number of FURLs for different purposes:
112
113 * One FURL that grants IPython engines access to the controller. Also
114 implicit in this access is permission to execute code sent by an
115 authenticated IPython client.
116 * Two or more FURLs that grant IPython clients access to the controller.
117 Implicit in this access is permission to give the controller's engines code
118 to execute.
119
120 Upon starting, the controller creates these different FURLs and writes them
121 to files in the user-read-only directory ``$HOME/.ipython/security``. Thus, only the
122 user who starts the controller has access to the FURLs.
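
A minimal sketch of how a file can be kept readable only by its owner is shown below. This is not IPython's actual code: the helper name ``write_private_file``, the file name ``example.furl`` and the placeholder contents are all hypothetical, and the sketch assumes a POSIX filesystem where permission bits are honored.

```python
import os
import stat
import tempfile


def write_private_file(directory: str, name: str, text: str) -> str:
    """Write text to directory/name so that only the owner can read it."""
    # Create the directory with owner-only access; chmod again because
    # makedirs is subject to the umask and the directory may already exist.
    os.makedirs(directory, mode=0o700, exist_ok=True)
    os.chmod(directory, 0o700)

    path = os.path.join(directory, name)
    # Open with mode 0o600 so the file is never world-readable, even briefly.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    os.chmod(path, 0o600)  # enforce the mode if the file already existed
    return path


# Demonstration in a scratch directory (hypothetical names and contents).
with tempfile.TemporaryDirectory() as scratch:
    demo_path = write_private_file(
        os.path.join(scratch, "security"), "example.furl", "placeholder-furl"
    )
    demo_mode = stat.S_IMODE(os.stat(demo_path).st_mode)
```

Opening the file with restrictive permissions from the start, rather than tightening them afterwards, avoids a window in which another local user could read the key.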
123
124 For an IPython client or engine to authenticate with a controller, it must
125 present the appropriate FURL to the controller upon connecting. If the
126 FURL matches what the controller expects for a given capability, access is
127 granted. If not, access is denied. The exchange of FURLs is done after
128 encrypted communications channels have been established to prevent attackers
129 from capturing them.
130
131 .. note::
132
133 The FURL is similar to an unsigned private key in SSH.
134
135 Details of the Foolscap handshake
136 ---------------------------------
137
138 In this section we detail the precise security handshake that takes place at
139 the beginning of any network connection in IPython. For the purposes of this
140 discussion, the SERVER is the IPython controller process and the CLIENT is the
141 IPython engine or client process.
142
143 Upon starting, all IPython processes do the following:
144
145 1. Create a public key x509 certificate (ISO/IEC 9594).
146 2. Create a hash of the contents of the certificate using the SHA-1 algorithm.
147 The base-32 encoded version of this hash is saved by the process as its
148 process id (actually in Foolscap, this is the Tub id, but here we refer to
149 it as the process id).
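
The second step can be sketched with the Python standard library as follows. This is only an illustration: the function name ``process_id`` is hypothetical, the input is stand-in bytes rather than a real encoded x509 certificate, and lowercasing the result is a presentation choice assumed here, not something stated above.

```python
import base64
import hashlib


def process_id(cert_bytes: bytes) -> str:
    """Return the base-32 encoded SHA-1 digest of the certificate bytes."""
    digest = hashlib.sha1(cert_bytes).digest()  # 20 bytes, i.e. 160 bits
    # 160 bits encode to exactly 32 base-32 characters with no padding.
    return base64.b32encode(digest).decode("ascii").lower()


# Stand-in bytes; a real process would hash its actual certificate.
example_cert = b"hypothetical certificate contents"
example_id = process_id(example_cert)
```

Because the id is derived from the certificate, a peer that presents a certificate can later be checked against the id it claims.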
150
151 Upon starting, the IPython controller also does the following:
152
153 1. Save the x509 certificate to disk in a secure location. The CLIENT
154 certificate is never saved to disk.
155 2. Create a FURL for each capability that the controller has. There are
156 separate capabilities the controller offers for clients and engines. The
157 FURL is created using: a) the process id of the SERVER, b) the IP
158 address and port the SERVER is listening on and c) a 160 bit,
159 cryptographically secure string that represents the capability (the
160 "capability id").
161 3. The FURLs are saved to disk in a secure location on the SERVER's host.
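
The three ingredients of a FURL listed above can be combined as in the following sketch. The helper names, the example address and port, and the exact string layout are assumptions made for illustration; consult the Foolscap documentation for the authoritative format.

```python
import base64
import secrets


def new_capability_id() -> str:
    """Return 160 bits from the OS random source, base-32 encoded."""
    return base64.b32encode(secrets.token_bytes(20)).decode("ascii").lower()


def make_furl(process_id: str, host: str, port: int, capability_id: str) -> str:
    """Join the process id, the listening address and the capability id."""
    # Assumed layout for illustration only.
    return f"pb://{process_id}@{host}:{port}/{capability_id}"


# Hypothetical values: a placeholder process id and a local address.
example_capability = new_capability_id()
example_furl = make_furl("a" * 32, "127.0.0.1", 12345, example_capability)
```

The capability id comes from a cryptographically secure random source, which is what makes the resulting FURL impractical to guess.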
162
163 For a CLIENT to be able to connect to the SERVER and access a capability of
164 that SERVER, the CLIENT must have knowledge of the FURL for that SERVER's
165 capability. This typically requires that the file containing the FURL be
166 moved from the SERVER's host to the CLIENT's host. This is done by the end
167 user who started the SERVER and wishes to have a CLIENT connect to the SERVER.
168
169 When a CLIENT connects to the SERVER, the following handshake protocol takes
170 place:
171
172 1. The CLIENT tells the SERVER what process (or Tub) id it expects the SERVER
173 to have.
174 2. If the SERVER has that process id, it notifies the CLIENT that it will now
175 enter encrypted mode. If the SERVER has a different id, the SERVER aborts.
176 3. Both CLIENT and SERVER initiate the SSL handshake protocol.
177 4. Both CLIENT and SERVER request the certificate of their peer and verify
178 that certificate. If this succeeds, all further communications are
179 encrypted.
180 5. Both CLIENT and SERVER send a hello block containing connection parameters
181 and their process id.
182 6. The CLIENT and SERVER check that their peer's stated process id matches the
183 hash of the x509 certificate the peer presented. If not, the connection is
184 aborted.
185 7. The CLIENT verifies that the SERVER's stated id matches the id of the
186 SERVER the CLIENT is intending to connect to. If not, the connection is
187 aborted.
188 8. The CLIENT and SERVER elect a master who decides on the final connection
189 parameters.
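
Steps 6 and 7 of the handshake amount to two comparisons, sketched below. The function names are hypothetical, the certificate is represented by stand-in bytes, and the sketch assumes the process id is the lowercase base-32 SHA-1 digest described earlier; it is not Foolscap's implementation.

```python
import base64
import hashlib
import hmac
from typing import Optional


def cert_id(cert_bytes: bytes) -> str:
    """Base-32 encoded SHA-1 digest of the certificate bytes."""
    digest = hashlib.sha1(cert_bytes).digest()
    return base64.b32encode(digest).decode("ascii").lower()


def peer_is_acceptable(
    peer_cert: bytes, stated_id: str, expected_id: Optional[str] = None
) -> bool:
    """Apply the id checks from steps 6 and 7 of the handshake."""
    # Step 6: the stated id must match the hash of the presented certificate.
    if not hmac.compare_digest(cert_id(peer_cert).encode(), stated_id.encode()):
        return False
    # Step 7 (CLIENT side only): the id must be the one the CLIENT intended.
    if expected_id is not None:
        if not hmac.compare_digest(stated_id.encode(), expected_id.encode()):
            return False
    return True
```

A SERVER would call ``peer_is_acceptable`` without ``expected_id``, while a CLIENT would pass the process id taken from the FURL it was given.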
190
191 The public/private key pair associated with each process's x509 certificate
192 is completely hidden from this handshake protocol. It is, however, used
193 internally by OpenSSL as part of the SSL handshake protocol. Each process
194 keeps its own private key hidden and sends its peer only the public key
195 (embedded in the certificate).
196
197 Finally, when the CLIENT requests access to a particular SERVER capability,
198 the following happens:
199
200 1. The CLIENT asks the SERVER for access to a capability by presenting that
201 capability's id.
202 2. If the SERVER has a capability with that id, access is granted. If not,
203 access is not granted.
204 3. Once access has been gained, the CLIENT can use the capability.
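
The capability lookup described in these steps can be pictured as a table keyed by capability id. The class below is a hypothetical sketch of that idea, not the data structure Foolscap or IPython actually uses.

```python
import hmac
from typing import Any, Dict, Optional


class CapabilityTable:
    """Maps capability ids to the objects they grant access to."""

    def __init__(self) -> None:
        self._capabilities: Dict[str, Any] = {}

    def register(self, capability_id: str, target: Any) -> None:
        self._capabilities[capability_id] = target

    def lookup(self, presented_id: str) -> Optional[Any]:
        """Return the target if the presented id is known, otherwise None."""
        for known_id, target in self._capabilities.items():
            # Constant-time comparison avoids leaking how many characters match.
            if hmac.compare_digest(known_id.encode(), presented_id.encode()):
                return target
        return None
```

Possession of the id is the only credential checked: there is no separate user name or password, which is the defining property of the capabilities based model.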
205
206 Specific security vulnerabilities
207 =================================
208
209 There are a number of potential security vulnerabilities present in IPython's
210 architecture. In this section we discuss those vulnerabilities and detail how
211 the security architecture described above prevents them from being exploited.
212
213 Unauthorized clients
214 --------------------
215
216 The IPython client can instruct the IPython engines to execute arbitrary
217 Python code with the permissions of the user who started the engines. If an
218 attacker were able to connect their own hostile IPython client to the IPython
219 controller, they could instruct the engines to execute code.
220
221 This attack is prevented by the capabilities based client authentication
222 performed after the encrypted channel has been established. The relevant
223 authentication information is encoded into the FURL that clients must
224 present to gain access to the IPython controller. By limiting the distribution
225 of those FURLs, a user can grant access to only authorized persons.
226
227 It is highly unlikely that a client FURL could be guessed by an attacker
228 in a brute force guessing attack. A given instance of the IPython controller
229 only runs for a relatively short amount of time (on the order of hours). Thus
230 an attacker would have only a limited amount of time to test a search space of
231 size 2**320. Furthermore, even if a controller were to run for a longer amount
232 of time, this search space is quite large (larger, for instance, than that of
233 a typical username/password pair).
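
To give a feel for the size of this search space, the following back-of-the-envelope calculation estimates what fraction of it an attacker could cover. The guessing rate and the controller lifetime are assumptions chosen for illustration (and the rate is deliberately generous); only the search space size comes from the text above.

```python
from fractions import Fraction

SEARCH_SPACE = 2 ** 320            # size quoted in the text above
GUESSES_PER_SECOND = 10 ** 12      # assumed, deliberately generous
LIFETIME_SECONDS = 24 * 60 * 60    # assumed controller lifetime of one day

# Total guesses an attacker could make while the controller is running.
total_guesses = GUESSES_PER_SECOND * LIFETIME_SECONDS

# Exact fraction of the search space covered in that time.
fraction_covered = Fraction(total_guesses, SEARCH_SPACE)

print(f"fraction of search space covered: {float(fraction_covered):.3e}")
```

Even under these assumptions the fraction covered is vanishingly small, which is why the short controller lifetime is an additional safeguard rather than the primary one.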
234
235 Unauthorized engines
236 --------------------
237
238 If an attacker were able to connect a hostile engine to a user's controller,
239 the user might unknowingly send sensitive code or data to the hostile engine.
240 This attacker's engine would then have full access to that code and data.
241
242 This type of attack is prevented in the same way as the unauthorized client
243 attack, through the usage of the capabilities based authentication scheme.
244
245 Unauthorized controllers
246 ------------------------
247
248 It is also possible that an attacker could try to convince a user's IPython
249 client or engine to connect to a hostile IPython controller. That controller
250 would then have full access to the code and data sent between the IPython
251 client and the IPython engines.
252
253 Again, this attack is prevented through the FURLs, which ensure that a
254 client or engine connects to the correct controller. It is also important to
255 note that the FURLs encode the IP address and port that the
256 controller is listening on, so there is little chance of mistakenly connecting
257 to a controller running on a different IP address and port.
258
259 When starting an engine or client, a user must specify which FURL to use
260 for that connection. Thus, in order to introduce a hostile controller, the
261 attacker must convince the user to use the FURLs associated with the
262 hostile controller. As long as a user is diligent in only using FURLs from
263 trusted sources, this attack is not possible.
264
265 Other security measures
266 =======================
267
268 A number of other measures are taken to further limit the security risks
269 involved in running the IPython kernel.
270
271 First, by default, the IPython controller listens on random port numbers.
272 While this can be overridden by the user, in the default configuration, an
273 attacker would have to do a port scan to even find a controller to attack.
274 When coupled with the relatively short running time of a typical controller
275 (on the order of hours), an attacker would have to work extremely hard and
276 extremely *fast* to even find a running controller to attack.
277
278 Second, much of the time, especially when run on supercomputers or clusters,
279 the controller is running behind a firewall. Thus, for engines or clients to
280 connect to the controller:
281
282 * The different processes have to all be behind the firewall.
283
284 or:
285
286 * The user has to use SSH port forwarding to tunnel the
287 connections through the firewall.
288
289 In either case, an attacker is presented with additional barriers that prevent
290 attacking or even probing the system.
291
292 Summary
293 =======
294
295 IPython's architecture has been carefully designed with security in mind. The
296 capabilities based authentication model, in conjunction with the encrypted
297 TCP/IP channels, addresses the core potential vulnerabilities in the system,
298 while still enabling users to use the system in open networks.
299
300 Other questions
301 ===============
302
303 About keys
304 ----------
305
306 Can you clarify the roles of the certificate and its keys versus the FURL,
307 which is also called a key?
308
309 The certificate created by IPython processes is a standard public key x509
310 certificate that is used by the SSL handshake protocol to set up an encrypted
311 channel between the controller and the IPython engine or client. The public
312 and private keys associated with this certificate are used only by the SSL
313 handshake protocol in setting up this encrypted channel.
314
315 The FURL serves a completely different and independent purpose from the
316 key pair associated with the certificate. When we refer to a FURL as a
317 key, we are using the word "key" in the capabilities based security model
318 sense. This has nothing to do with "key" in the public/private key sense used
319 in the SSL protocol.
320
321 That said, the FURL is used as a cryptographic key to grant
322 IPython engines and clients access to particular capabilities that the
323 controller offers.
324
325 Self signed certificates
326 ------------------------
327
328 Is the controller creating a self-signed certificate? Is it created once per
329 instance or session, as a one-time setup, or each time the controller is started?
330
331 The Foolscap network protocol, which handles the SSL protocol details, creates
332 a self-signed x509 certificate using OpenSSL for each IPython process. The
333 lifetime of the certificate is handled differently for the IPython controller
334 and the engines/client.
335
336 For the IPython engines and client, the certificate is only held in memory for
337 the lifetime of its process. It is never written to disk.
338
339 For the controller, the certificate can be created anew each time the
340 controller starts or it can be created once and reused each time the
341 controller starts. If at any point, the certificate is deleted, a new one is
342 created the next time the controller starts.
343
344 SSL private key
345 ---------------
346
347 How is the private key (associated with the certificate) distributed?
348
349 In the usual implementation of the SSL protocol, the private key is never
350 distributed. We follow this standard always.
351
352 SSL versus Foolscap authentication
353 ----------------------------------
354
355 Many SSL connections only perform one sided authentication (the server to the
356 client). How is the client authentication in IPython's system related to SSL
357 authentication?
358
359 We perform a two way SSL handshake in which both parties request and verify
360 the certificate of their peer. This mutual authentication is handled by the
361 SSL handshake and is separate and independent from the additional
362 authentication steps that the CLIENT and SERVER perform after an encrypted
363 channel is established.
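
For readers more familiar with Python's standard library than with pyOpenSSL, the idea of a two way handshake can be sketched with the ``ssl`` module. This is an analogy under stated assumptions, not how IPython or Foolscap configure OpenSSL: the function name is hypothetical, and a usable context would additionally need its own certificate and a set of trusted peer certificates loaded.

```python
import ssl


def mutual_auth_server_context() -> ssl.SSLContext:
    """Build a server-side context that insists on a peer certificate."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # By default a server does not ask the connecting side for a certificate.
    # CERT_REQUIRED makes the handshake fail unless the peer presents one
    # that verifies, so both sides authenticate each other.
    context.verify_mode = ssl.CERT_REQUIRED
    return context
```

The connecting side verifies the server's certificate in the usual way, so with this setting each party checks the other during the handshake.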