.. _parallelsecurity:

===========================
Security details of IPython
===========================

.. note::

    This section is not thorough, and IPython.zmq needs a thorough security
    audit.

IPython's :mod:`IPython.zmq` package exposes the full power of the
Python interpreter over a TCP/IP network for the purposes of parallel
computing. This feature raises the important question of IPython's security
model. This document gives details about this model and how it is implemented
in IPython's architecture.

Process and network topology
============================

To enable parallel computing, IPython runs a number of different processes.
These processes are discussed at length in the IPython documentation and
are summarized here:

* The IPython *engine*. This process is a full-blown Python
  interpreter in which user code is executed. Multiple
  engines are started to make parallel computing possible.
* The IPython *hub*. This process monitors a set of
  engines and schedulers, and keeps track of the state of the processes. It listens
  for registration connections from engines and clients, and for monitor connections
  from schedulers.
* The IPython *schedulers*. This is a set of processes that relay commands and results
  between clients and engines. They typically run on the same machine as the hub,
  and listen for connections from engines and clients, but connect to the hub.
* The IPython *client*. This process is typically an
  interactive Python process that is used to coordinate the
  engines to get a parallel computation done.

Collectively, these processes are called the IPython *cluster*, and the hub and schedulers
together are referred to as the *controller*.

These processes communicate over any transport supported by ZeroMQ (tcp, pgm, infiniband, ipc)
with a well-defined topology. The IPython hub and schedulers listen on sockets. Upon
starting, an engine connects to the hub and registers itself; the hub then informs the engine
of the connection information for the schedulers, and the engine connects to the
schedulers. These engine/hub and engine/scheduler connections persist for the
lifetime of each engine.

The IPython client also connects to the controller processes using a number of socket
connections. At the time of writing, this is one socket per scheduler (4) plus 3 connections
to the hub, for a total of 7. These connections persist only for the lifetime of the client.

A given IPython controller and set of engines typically has a relatively
short lifetime. This lifetime usually corresponds to the duration of a single parallel
simulation performed by a single user. Finally, the hub, schedulers, engines, and client
processes typically execute with the permissions of that same user. More specifically, the
controller and engines are *not* executed as root or with any other superuser permissions.

Application logic
=================

When running the IPython kernel to perform a parallel computation, the user
sends Python commands and data from the IPython client through the
IPython schedulers to the IPython engines, where those commands are executed
and the data processed. The design of IPython ensures that the client is the
only access point for the capabilities of the engines. That is, the only way
of addressing the engines is through a client.

A user can use the client to instruct the IPython engines to execute
arbitrary Python commands. These Python commands can call the
system shell, access the filesystem, and so on, as required by the user's
application code. From this perspective, when a user runs an IPython engine on
a host, that engine has the same capabilities and permissions as the user
themselves (as if they were logged onto the engine's host with a terminal).

Secure network connections
==========================

Overview
--------

ZeroMQ provides no security at all. For this reason, users of IPython must be very
careful in managing connections, because an open TCP/IP socket grants arbitrary code
execution, as that user, on the engine machines. As a result, the default behavior
of controller processes is to listen for clients only on the loopback interface, and the
client must establish SSH tunnels to connect to the controller processes.
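
The loopback-only default can be checked before a process binds its sockets. The helper ``check_listen_ip`` below is hypothetical and not part of IPython; it is a minimal sketch built on the standard :mod:`ipaddress` module, and its warning text is an assumption for illustration.

.. code-block:: python

    import ipaddress
    import warnings


    def check_listen_ip(ip: str) -> bool:
        """Return True if ``ip`` is a loopback address, warning otherwise.

        Hypothetical helper; IPython's own configuration code is not shown here.
        """
        # ip_address() accepts both IPv4 and IPv6 literals.
        addr = ipaddress.ip_address(ip)
        if addr.is_loopback:
            return True
        # Anything else, including the wildcard 0.0.0.0, exposes the listener
        # to every host that can reach this interface.
        warnings.warn(
            f"listening on {ip} exposes arbitrary code execution; "
            "use loopback plus SSH tunnels on untrusted networks"
        )
        return False

With this sketch, ``"127.0.0.1"`` and ``"::1"`` pass, while ``"0.0.0.0"`` or a LAN address returns ``False`` and emits a warning.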

.. warning::

    If the controller's loopback interface is untrusted, then IPython should be considered
    vulnerable. This extends to the loopback interface of every connected client, since each
    client opens a loopback port that is redirected to the controller's loopback port.


SSH
---

Since ZeroMQ provides no security, SSH tunnels are the primary source of secure
connections. A connection file, such as ``ipcontroller-client.json``, will contain
information for connecting to the controller, possibly including the address of an
SSH server through which the client should tunnel. The Client object then creates tunnels
using either [OpenSSH]_ or [Paramiko]_, depending on the platform. If users do not wish to
use OpenSSH or Paramiko, or the tunneling utilities are insufficient, then they may
construct the tunnels themselves, and simply connect clients and engines as if the
controller were on loopback on the connecting machine.
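
For users who build the tunnels themselves, the sketch below assembles an OpenSSH command that forwards a list of controller ports to the same port numbers on the local loopback interface. The helper ``tunnel_command`` and the example host and ports are hypothetical; ``-N`` (no remote command), ``-f`` (go to background) and ``-L [bind_address:]port:host:hostport`` are standard OpenSSH options. It assumes the controller listens on the SSH server's own loopback.

.. code-block:: python

    import shlex
    from typing import List, Sequence


    def tunnel_command(server: str, ports: Sequence[int],
                       remote_ip: str = "127.0.0.1") -> List[str]:
        """Build an ``ssh`` argv that forwards each port in ``ports``.

        ``remote_ip`` is resolved on ``server``, so the default targets the
        server's own loopback interface (where the controller is assumed to run).
        """
        cmd = ["ssh", "-N", "-f"]
        for port in ports:
            # Bind the local end to loopback so other hosts cannot use the tunnel.
            cmd += ["-L", f"127.0.0.1:{port}:{remote_ip}:{port}"]
        cmd.append(server)
        return cmd


    if __name__ == "__main__":
        argv = tunnel_command("alice@cluster.example.org", [10101, 10102])
        # Pass argv to subprocess.run() to actually open the tunnels.
        print(" ".join(shlex.quote(arg) for arg in argv))

Once the tunnels are up, clients connect to ``127.0.0.1`` on the same ports, exactly as if the controller were local.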

Authentication
--------------

To protect users of shared machines, [HMAC]_ digests are used to sign messages, using a
shared key.

The Session object that handles the message protocol uses a unique key to verify valid
messages. This can be any value specified by the user, but the default is a random UUID
as generated by ``uuid.uuid4()`` (128 bits, of which 122 are random). This key is used to
initialize an HMAC object, which digests every message; the digest is included in the
message as its signature. Every message that is unpacked (on Controller, Engine,
and Client) is also digested by the receiver, ensuring that the sender's key is the
same as the receiver's. Messages that were not signed with this key are not acted upon in any
way. The key itself is never sent over the network.
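
The signing scheme can be sketched with the standard library's :mod:`hmac` module. This is an illustration, not IPython's ``Session`` code: it assumes SHA-256 as the digest (the algorithm the real ``Session`` uses may differ), and it length-prefixes each message part so that the boundaries between parts are covered by the signature. ``make_key``, ``sign`` and ``verify`` are hypothetical names.

.. code-block:: python

    import hashlib
    import hmac
    import uuid
    from typing import Sequence


    def make_key() -> bytes:
        # Default described above: a random version-4 UUID as the shared key.
        return str(uuid.uuid4()).encode("ascii")


    def sign(key: bytes, parts: Sequence[bytes]) -> str:
        """Return the hex HMAC-SHA256 digest of the serialized message parts."""
        mac = hmac.new(key, digestmod=hashlib.sha256)
        for part in parts:
            # Length prefix: [b"ab", b"c"] and [b"a", b"bc"] sign differently.
            mac.update(len(part).to_bytes(8, "big"))
            mac.update(part)
        return mac.hexdigest()


    def verify(key: bytes, parts: Sequence[bytes], signature: str) -> bool:
        """Recompute the digest with the receiver's key and compare."""
        # compare_digest's timing does not depend on where the strings differ.
        return hmac.compare_digest(sign(key, parts), signature)

Only the digest travels with the message. A receiver holding a different key computes a different digest, so ``verify`` returns ``False`` and the message is dropped.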

There is exactly one shared key per cluster; it must be the same everywhere. Typically,
the controller creates this key and stores it in the private connection files
``ipcontroller-{engine|client}.json``. These files are typically stored in the
``~/.ipython/profile_<name>/security`` directory, and are kept readable only by the
owner, just as is common practice with a user's keys in their ``.ssh`` directory.
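
A minimal sketch of writing such a file so that only its owner can read it is shown below. The field names (``url``, ``key``) and the helper ``write_connection_file`` are hypothetical and do not reproduce IPython's actual file format; the point is that the permission bits are applied when the file is created, rather than tightened after the key is already on disk.

.. code-block:: python

    import json
    import os
    import stat
    import tempfile


    def write_connection_file(path: str, info: dict) -> None:
        """Create ``path`` with mode 0o600 and write ``info`` as JSON.

        O_EXCL makes this fail if the file already exists, so an existing
        file (possibly with looser permissions) is never silently reused.
        """
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        with os.fdopen(fd, "w") as f:
            json.dump(info, f, indent=2)


    if __name__ == "__main__":
        directory = tempfile.mkdtemp()  # stands in for the profile's security dir
        path = os.path.join(directory, "ipcontroller-client.json")
        write_connection_file(path, {"url": "tcp://127.0.0.1:10101",
                                     "key": "hypothetical-key"})
        # The umask can only remove bits, so group/other access stays empty.
        print(oct(stat.S_IMODE(os.stat(path).st_mode)))

On POSIX systems with a common umask this prints ``0o600``.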

.. warning::

    The signatures protect against unauthorized messages, but, as there is no
    encryption, they provide no protection of data privacy. It is possible, however,
    to use a custom serialization scheme (via the ``Session.packer``/``Session.unpacker``
    traits) that incorporates your own encryption scheme.


Specific security vulnerabilities
=================================

There are a number of potential security vulnerabilities present in IPython's
architecture. In this section we discuss those vulnerabilities and detail how
the security architecture described above prevents them from being exploited.

Unauthorized clients
--------------------

The IPython client can instruct the IPython engines to execute arbitrary
Python code with the permissions of the user who started the engines. If an
attacker were able to connect their own hostile IPython client to the IPython
controller, they could instruct the engines to execute code.

At the first level, this attack is prevented by requiring access to the controller's
ports, which should be open only on loopback if the controller is on an
untrusted local network. If the attacker does have access to the controller's ports, then
the attack is prevented by capabilities-based client authentication with the execution
key. The relevant authentication information is encoded into the JSON file that clients
must present to gain access to the IPython controller. By limiting the distribution of
those keys, a user can grant access to only authorized persons, just as with SSH keys.

It is highly unlikely that an execution key could be guessed by an attacker
in a brute-force attack. A given instance of the IPython controller
only runs for a relatively short amount of time (on the order of hours). Thus
an attacker would have only a limited amount of time to search a key space that,
for the default ``uuid.uuid4()`` key, has :math:`2^{122}` members (6 of the UUID's
128 bits are fixed). For added security, users can have arbitrarily long keys.
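
The size of that margin can be checked with a little arithmetic. The sketch below assumes a default key from ``uuid.uuid4()``, which has 122 random bits (6 of its 128 bits are fixed version and variant markers), a hypothetical attacker who can test one million keys per second against a live controller, and a hypothetical controller lifetime of one day; all three numbers are assumptions chosen to favor the attacker. It also shows :func:`secrets.token_hex` producing a longer key.

.. code-block:: python

    import secrets

    UUID4_RANDOM_BITS = 122          # 128 bits minus 6 fixed version/variant bits
    GUESSES_PER_SECOND = 10**6       # hypothetical, very generous attack rate
    LIFETIME_SECONDS = 24 * 3600     # hypothetical controller lifetime: one day
    SECONDS_PER_YEAR = 365.25 * 24 * 3600


    def success_probability(bits: int, rate: int, seconds: int) -> float:
        """Chance that ``rate * seconds`` distinct guesses include the key."""
        return min(1.0, rate * seconds / 2**bits)


    def expected_years(bits: int, rate: int) -> float:
        """Average search time: half the key space at ``rate`` guesses/second."""
        return 2**(bits - 1) / rate / SECONDS_PER_YEAR


    if __name__ == "__main__":
        p = success_probability(UUID4_RANDOM_BITS, GUESSES_PER_SECOND,
                                LIFETIME_SECONDS)
        years = expected_years(UUID4_RANDOM_BITS, GUESSES_PER_SECOND)
        print(f"chance of success within one day: {p:.1e}")
        print(f"expected search time: {years:.1e} years")
        # A longer key: 32 random bytes (256 bits) as 64 hex characters.
        print(secrets.token_hex(32))

Under these assumptions the sketch reports a success chance of about ``1.6e-26`` within a day and an expected search time of about ``8.4e+22`` years.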

.. warning::

    If the attacker has gained enough access to intercept loopback connections on *either* the
    controller or client, then a captured message can be sent again. To protect against this,
    recipients accept each signature only once and consider duplicates invalid. However,
    the duplicate message could be sent to *another* recipient using the same key,
    and it would be considered valid.
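
The duplicate check can be sketched as a per-recipient set of signatures already seen. ``ReplayGuard`` is a hypothetical name, not IPython's implementation, and this version keeps every signature forever; a long-running process would need some bound on that history. The usage lines also reproduce the limitation described in the warning: a second recipient has its own history and accepts the replayed signature.

.. code-block:: python

    from typing import Set


    class ReplayGuard:
        """Accept each message signature at most once per recipient."""

        def __init__(self) -> None:
            self._seen: Set[str] = set()

        def accept(self, signature: str) -> bool:
            # A signature seen before means a duplicate (possibly replayed) message.
            if signature in self._seen:
                return False
            self._seen.add(signature)
            return True


    if __name__ == "__main__":
        engine_a, engine_b = ReplayGuard(), ReplayGuard()
        sig = "placeholder-digest"  # stands in for a validly signed message
        print(engine_a.accept(sig))  # True: first delivery
        print(engine_a.accept(sig))  # False: replay to the same recipient
        print(engine_b.accept(sig))  # True: replay to another recipient succeeds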


Unauthorized engines
--------------------

If an attacker were able to connect a hostile engine to a user's controller,
the user might unknowingly send sensitive code or data to the hostile engine.
This attacker's engine would then have full access to that code and data.

This type of attack is prevented in the same way as the unauthorized client
attack, through the use of the capabilities-based authentication scheme.

Unauthorized controllers
------------------------

It is also possible that an attacker could try to convince a user's IPython
client or engine to connect to a hostile IPython controller. That controller
would then have full access to the code and data sent between the IPython
client and the IPython engines.

Again, this attack is prevented through the capabilities in a connection file, which
ensure that a client or engine connects to the correct controller. It is also important to
note that the connection files encode the IP address and port that the controller is
listening on, so there is little chance of mistakenly connecting to a controller running
on a different IP address and port.

When starting an engine or client, a user must specify the key to use
for that connection. Thus, in order to introduce a hostile controller, the
attacker must convince the user to use the key associated with the
hostile controller. As long as a user is diligent in only using keys from
trusted sources, this attack is not possible.

.. note::

    I may be wrong; the unauthorized controller may be easier to fake than this.

Other security measures
=======================

A number of other measures are taken to further limit the security risks
involved in running the IPython kernel.

First, by default, the IPython controller listens on random port numbers.
While this can be overridden by the user, in the default configuration, an
attacker would have to do a port scan to even find a controller to attack.
When coupled with the relatively short running time of a typical controller
(on the order of hours), an attacker would have to work extremely hard and
extremely *fast* to even find a running controller to attack.
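
Asking the operating system for a free port is the usual way to obtain such a random port: binding to port 0 makes the kernel pick an unused ephemeral port, which can then be read back and written into the connection file. The sketch below uses the standard :mod:`socket` module on the loopback interface; it illustrates the mechanism and is not IPython's code (pyzmq's ``Socket.bind_to_random_port`` serves a similar purpose for ZeroMQ sockets). ``bind_random_loopback_port`` is a hypothetical name.

.. code-block:: python

    import socket
    from typing import Tuple


    def bind_random_loopback_port() -> Tuple[socket.socket, int]:
        """Bind a TCP socket to 127.0.0.1 on an OS-chosen free port."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind(("127.0.0.1", 0))  # port 0: let the kernel pick an unused port
        sock.listen()
        port = sock.getsockname()[1]
        return sock, port


    if __name__ == "__main__":
        sock, port = bind_random_loopback_port()
        print(f"listening on tcp://127.0.0.1:{port}")
        sock.close()

Returning the already-bound socket, rather than just the number, avoids a race in which another process grabs the port between choosing it and listening on it.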

Second, much of the time, especially when run on supercomputers or clusters,
the controller is running behind a firewall. Thus, for engines or clients to
connect to the controller:

* All of the processes have to be behind the firewall.

or:

* The user has to use SSH port forwarding to tunnel the
  connections through the firewall.

In either case, an attacker is presented with additional barriers that prevent
attacking or even probing the system.

Summary
=======

IPython's architecture has been carefully designed with security in mind. The
capabilities-based authentication model, in conjunction with SSH-tunneled
TCP/IP channels, addresses the core potential vulnerabilities in the system,
while still enabling users to use the system in open networks.

.. [RFC5246] <http://tools.ietf.org/html/rfc5246>
.. [OpenSSH] <http://www.openssh.com/>
.. [Paramiko] <http://www.lag.net/paramiko/>
.. [HMAC] <http://tools.ietf.org/html/rfc2104.html>