.. _parallelsecurity:

===========================
Security details of IPython
===========================

.. note::

    This section is not thorough, and IPython.zmq needs a thorough security
    audit.

IPython's :mod:`IPython.zmq` package exposes the full power of the
Python interpreter over a TCP/IP network for the purposes of parallel
computing. This feature brings up the important question of IPython's security
model. This document gives details about this model and how it is implemented
in IPython's architecture.
|
17 | |||
18 | Process and network topology |
|
18 | Process and network topology | |
19 | ============================ |
|
19 | ============================ | |
20 |
|
20 | |||
21 | To enable parallel computing, IPython has a number of different processes that |
|
21 | To enable parallel computing, IPython has a number of different processes that | |
22 | run. These processes are discussed at length in the IPython documentation and |
|
22 | run. These processes are discussed at length in the IPython documentation and | |
23 | are summarized here: |
|
23 | are summarized here: | |
24 |
|
24 | |||
25 | * The IPython *engine*. This process is a full blown Python |
|
25 | * The IPython *engine*. This process is a full blown Python | |
26 | interpreter in which user code is executed. Multiple |
|
26 | interpreter in which user code is executed. Multiple | |
27 | engines are started to make parallel computing possible. |
|
27 | engines are started to make parallel computing possible. | |
28 | * The IPython *hub*. This process monitors a set of |
|
28 | * The IPython *hub*. This process monitors a set of | |
29 | engines and schedulers, and keeps track of the state of the processes. It listens |
|
29 | engines and schedulers, and keeps track of the state of the processes. It listens | |
30 | for registration connections from engines and clients, and monitor connections |
|
30 | for registration connections from engines and clients, and monitor connections | |
31 | from schedulers. |
|
31 | from schedulers. | |
32 | * The IPython *schedulers*. This is a set of processes that relay commands and results |
|
32 | * The IPython *schedulers*. This is a set of processes that relay commands and results | |
33 | between clients and engines. They are typically on the same machine as the controller, |
|
33 | between clients and engines. They are typically on the same machine as the controller, | |
34 | and listen for connections from engines and clients, but connect to the Hub. |
|
34 | and listen for connections from engines and clients, but connect to the Hub. | |
35 | * The IPython *client*. This process is typically an |
|
35 | * The IPython *client*. This process is typically an | |
36 | interactive Python process that is used to coordinate the |
|
36 | interactive Python process that is used to coordinate the | |
37 | engines to get a parallel computation done. |
|
37 | engines to get a parallel computation done. | |
38 |
|
38 | |||
39 | Collectively, these processes are called the IPython *cluster*, and the hub and schedulers |
|
39 | Collectively, these processes are called the IPython *cluster*, and the hub and schedulers | |
40 | together are referred to as the *controller*. |
|
40 | together are referred to as the *controller*. | |
41 |
|
41 | |||
42 |
|
42 | |||
43 | These processes communicate over any transport supported by ZeroMQ (tcp,pgm,infiniband,ipc) |
|
43 | These processes communicate over any transport supported by ZeroMQ (tcp,pgm,infiniband,ipc) | |
44 | with a well defined topology. The IPython hub and schedulers listen on sockets. Upon |
|
44 | with a well defined topology. The IPython hub and schedulers listen on sockets. Upon | |
45 | starting, an engine connects to a hub and registers itself, which then informs the engine |
|
45 | starting, an engine connects to a hub and registers itself, which then informs the engine | |
46 | of the connection information for the schedulers, and the engine then connects to the |
|
46 | of the connection information for the schedulers, and the engine then connects to the | |
47 | schedulers. These engine/hub and engine/scheduler connections persist for the |
|
47 | schedulers. These engine/hub and engine/scheduler connections persist for the | |
48 | lifetime of each engine. |
|
48 | lifetime of each engine. | |
49 |
|
49 | |||
50 | The IPython client also connects to the controller processes using a number of socket |
|
50 | The IPython client also connects to the controller processes using a number of socket | |
51 | connections. As of writing, this is one socket per scheduler (4), and 3 connections to the |
|
51 | connections. As of writing, this is one socket per scheduler (4), and 3 connections to the | |
52 | hub for a total of 7. These connections persist for the lifetime of the client only. |
|
52 | hub for a total of 7. These connections persist for the lifetime of the client only. | |
53 |
|
53 | |||
54 | A given IPython controller and set of engines engines typically has a relatively |
|
54 | A given IPython controller and set of engines engines typically has a relatively | |
55 | short lifetime. Typically this lifetime corresponds to the duration of a single parallel |
|
55 | short lifetime. Typically this lifetime corresponds to the duration of a single parallel | |
56 | simulation performed by a single user. Finally, the hub, schedulers, engines, and client |
|
56 | simulation performed by a single user. Finally, the hub, schedulers, engines, and client | |
57 | processes typically execute with the permissions of that same user. More specifically, the |
|
57 | processes typically execute with the permissions of that same user. More specifically, the | |
58 | controller and engines are *not* executed as root or with any other superuser permissions. |
|
58 | controller and engines are *not* executed as root or with any other superuser permissions. | |
59 |
|
59 | |||
60 | Application logic |
|
60 | Application logic | |
61 | ================= |
|
61 | ================= | |
62 |
|
62 | |||
63 | When running the IPython kernel to perform a parallel computation, a user |
|
63 | When running the IPython kernel to perform a parallel computation, a user | |
64 | utilizes the IPython client to send Python commands and data through the |
|
64 | utilizes the IPython client to send Python commands and data through the | |
65 | IPython schedulers to the IPython engines, where those commands are executed |
|
65 | IPython schedulers to the IPython engines, where those commands are executed | |
66 | and the data processed. The design of IPython ensures that the client is the |
|
66 | and the data processed. The design of IPython ensures that the client is the | |
67 | only access point for the capabilities of the engines. That is, the only way |
|
67 | only access point for the capabilities of the engines. That is, the only way | |
68 | of addressing the engines is through a client. |
|
68 | of addressing the engines is through a client. | |
69 |
|
69 | |||
70 | A user can utilize the client to instruct the IPython engines to execute |
|
70 | A user can utilize the client to instruct the IPython engines to execute | |
71 | arbitrary Python commands. These Python commands can include calls to the |
|
71 | arbitrary Python commands. These Python commands can include calls to the | |
72 | system shell, access the filesystem, etc., as required by the user's |
|
72 | system shell, access the filesystem, etc., as required by the user's | |
73 | application code. From this perspective, when a user runs an IPython engine on |
|
73 | application code. From this perspective, when a user runs an IPython engine on | |
74 | a host, that engine has the same capabilities and permissions as the user |
|
74 | a host, that engine has the same capabilities and permissions as the user | |
75 | themselves (as if they were logged onto the engine's host with a terminal). |
|
75 | themselves (as if they were logged onto the engine's host with a terminal). | |
76 |
|
76 | |||
77 | Secure network connections |
|
77 | Secure network connections | |
78 | ========================== |
|
78 | ========================== | |
79 |
|
79 | |||
80 | Overview |
|
80 | Overview | |
81 | -------- |
|
81 | -------- | |
82 |
|
82 | |||
83 | ZeroMQ provides exactly no security. For this reason, users of IPython must be very |
|
83 | ZeroMQ provides exactly no security. For this reason, users of IPython must be very | |
84 | careful in managing connections, because an open TCP/IP socket presents access to |
|
84 | careful in managing connections, because an open TCP/IP socket presents access to | |
85 | arbitrary execution as the user on the engine machines. As a result, the default behavior |
|
85 | arbitrary execution as the user on the engine machines. As a result, the default behavior | |
86 | of controller processes is to only listen for clients on the loopback interface, and the |
|
86 | of controller processes is to only listen for clients on the loopback interface, and the | |
87 | client must establish SSH tunnels to connect to the controller processes. |
|
87 | client must establish SSH tunnels to connect to the controller processes. | |
88 |
|
88 | |||
89 | .. warning:: |
|
89 | .. warning:: | |
90 |
|
90 | |||
91 | If the controller's loopback interface is untrusted, then IPython should be considered |
|
91 | If the controller's loopback interface is untrusted, then IPython should be considered | |
92 | vulnerable, and this extends to the loopback of all connected clients, which have |
|
92 | vulnerable, and this extends to the loopback of all connected clients, which have | |
93 | opened a loopback port that is redirected to the controller's loopback port. |
|
93 | opened a loopback port that is redirected to the controller's loopback port. | |
94 |
|
94 | |||
95 |
|
95 | |||
96 | SSH |
|
96 | SSH | |
97 | --- |
|
97 | --- | |
98 |
|
98 | |||
99 | Since ZeroMQ provides no security, SSH tunnels are the primary source of secure |
|
99 | Since ZeroMQ provides no security, SSH tunnels are the primary source of secure | |
100 | connections. A connector file, such as `ipcontroller-client.json`, will contain |
|
100 | connections. A connector file, such as `ipcontroller-client.json`, will contain | |
101 | information for connecting to the controller, possibly including the address of an |
|
101 | information for connecting to the controller, possibly including the address of an | |
102 | ssh-server through with the client is to tunnel. The Client object then creates tunnels |
|
102 | ssh-server through with the client is to tunnel. The Client object then creates tunnels | |
103 | using either [OpenSSH]_ or [Paramiko]_, depending on the platform. If users do not wish to |
|
103 | using either [OpenSSH]_ or [Paramiko]_, depending on the platform. If users do not wish to | |
104 | use OpenSSH or Paramiko, or the tunneling utilities are insufficient, then they may |
|
104 | use OpenSSH or Paramiko, or the tunneling utilities are insufficient, then they may | |
105 | construct the tunnels themselves, and simply connect clients and engines as if the |
|
105 | construct the tunnels themselves, and simply connect clients and engines as if the | |
106 | controller were on loopback on the connecting machine. |
|
106 | controller were on loopback on the connecting machine. | |
107 |
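
When building the tunnels by hand, each controller port needs its own local forward.
The following is a minimal sketch that only assembles an OpenSSH command line. The
helper ``ssh_tunnel_command``, the port numbers, and the host name are hypothetical,
and the sketch assumes the controller listens on the SSH server's own loopback interface.

.. code-block:: python

    import shlex


    def ssh_tunnel_command(ports, ssh_server, remote_host="127.0.0.1"):
        """Build an OpenSSH command that forwards local ports to the controller.

        Hypothetical helper: each local port is forwarded to the same port on
        ``remote_host``, which is resolved on ``ssh_server`` (here, its own
        loopback interface, where the controller is assumed to listen).
        """
        cmd = ["ssh", "-N"]  # -N: run no remote command, only forward ports
        for port in ports:
            cmd += ["-L", f"{port}:{remote_host}:{port}"]
        cmd.append(ssh_server)
        return cmd


    # Hypothetical controller ports; real ones come from the connection file.
    cmd = ssh_tunnel_command([10101, 10102], "user@login.example.org")
    print(shlex.join(cmd))
    # ssh -N -L 10101:127.0.0.1:10101 -L 10102:127.0.0.1:10102 user@login.example.org

Passing ``cmd`` to ``subprocess.Popen`` would start the tunnels. A client on the
connecting machine can then use ``127.0.0.1`` and the same ports as if the controller
were local.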
|
107 | |||
108 | .. note:: |
|
108 | .. note:: | |
109 |
|
109 | |||
110 | There is not currently tunneling available for engines. |
|
110 | There is not currently tunneling available for engines. | |
111 |
|
111 | |||
112 | Authentication |
|
112 | Authentication | |
113 | -------------- |
|
113 | -------------- | |
114 |
|
114 | |||
115 | To protect users of shared machines, [HMAC]_ digests are used to sign messages, using a |
|
115 | To protect users of shared machines, [HMAC]_ digests are used to sign messages, using a | |
116 | shared key. |
|
116 | shared key. | |
117 |
|
117 | |||
118 | The Session object that handles the message protocol uses a unique key to verify valid |
|
118 | The Session object that handles the message protocol uses a unique key to verify valid | |
119 | messages. This can be any value specified by the user, but the default behavior is a |
|
119 | messages. This can be any value specified by the user, but the default behavior is a | |
120 | pseudo-random 128-bit number, as generated by `uuid.uuid4()`. This key is used to |
|
120 | pseudo-random 128-bit number, as generated by `uuid.uuid4()`. This key is used to | |
121 | initialize an HMAC object, which digests all messages, and includes that digest as a |
|
121 | initialize an HMAC object, which digests all messages, and includes that digest as a | |
122 | signature and part of the message. Every message that is unpacked (on Controller, Engine, |
|
122 | signature and part of the message. Every message that is unpacked (on Controller, Engine, | |
123 | and Client) will also be digested by the receiver, ensuring that the sender's key is the |
|
123 | and Client) will also be digested by the receiver, ensuring that the sender's key is the | |
124 | same as the receiver's. No messages that do not contain this key are acted upon in any |
|
124 | same as the receiver's. No messages that do not contain this key are acted upon in any | |
125 | way. The key itself is never sent over the network. |
|
125 | way. The key itself is never sent over the network. | |
126 |
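
The signing scheme can be illustrated with Python's standard :mod:`hmac` module. This is
a minimal sketch, not IPython's ``Session`` implementation: the ``pack``/``unpack``
helpers, the two-part message layout, JSON serialization, and the choice of SHA-256 as
the digest are all assumptions made for illustration.

.. code-block:: python

    import hashlib
    import hmac
    import json
    import uuid

    # Default-style shared key: the text form of a random UUID (an assumption;
    # any bytes value shared by the whole cluster would do).
    KEY = str(uuid.uuid4()).encode("ascii")


    def sign(key, parts):
        """Return a hex HMAC digest computed over every serialized message part."""
        h = hmac.new(key, digestmod=hashlib.sha256)  # digest choice is an assumption
        for part in parts:
            h.update(part)
        return h.hexdigest().encode("ascii")


    def pack(key, msg):
        """Serialize a message and prepend its signature; the key itself is not sent."""
        parts = [json.dumps(msg[name], sort_keys=True).encode() for name in ("header", "content")]
        return [sign(key, parts)] + parts


    def unpack(key, frames):
        """Verify the signature and refuse to act on the message if it does not match."""
        signature, parts = frames[0], frames[1:]
        if not hmac.compare_digest(signature, sign(key, parts)):
            raise ValueError("invalid signature; message discarded")
        header, content = (json.loads(p) for p in parts)
        return {"header": header, "content": content}


    msg = {"header": {"msg_id": "1"}, "content": {"code": "a = 1"}}
    frames = pack(KEY, msg)
    print(unpack(KEY, frames) == msg)  # True: same key, message accepted
    try:
        unpack(b"some-other-key", frames)
    except ValueError as err:
        print(err)  # a receiver with a different key rejects the message

Note that ``frames[2]`` is plain JSON: anyone able to read the traffic can decode it
without the key. The signature authenticates messages but does not hide their contents.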
|
126 | |||
127 | There is exactly one shared key per cluster - it must be the same everywhere. Typically, |
|
127 | There is exactly one shared key per cluster - it must be the same everywhere. Typically, | |
128 | the controller creates this key, and stores it in the private connection files |
|
128 | the controller creates this key, and stores it in the private connection files | |
129 | `ipython-{engine|client}.json`. These files are typically stored in the |
|
129 | `ipython-{engine|client}.json`. These files are typically stored in the | |
130 | `~/.ipython/profile_<name>/security` directory, and are maintained as readable only by the |
|
130 | `~/.ipython/profile_<name>/security` directory, and are maintained as readable only by the | |
131 | owner, just as is common practice with a user's keys in their `.ssh` directory. |
|
131 | owner, just as is common practice with a user's keys in their `.ssh` directory. | |
132 |
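
Owner-only permissions are safest when they are set at creation time, before any secret
is written. A minimal POSIX-oriented sketch follows. The helper ``write_private_json``
and the ``key``/``location`` field names are hypothetical, and a temporary directory
stands in for the profile's ``security`` directory.

.. code-block:: python

    import json
    import os
    import stat
    import tempfile
    import uuid


    def write_private_json(path, data):
        """Create ``path`` with owner-only permissions, then write ``data`` as JSON.

        O_EXCL makes creation fail if the file already exists, so a secret is never
        written into a file whose permissions someone else chose.
        """
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)


    # A temporary directory stands in for ~/.ipython/profile_<name>/security.
    security_dir = tempfile.mkdtemp()
    path = os.path.join(security_dir, "ipcontroller-client.json")
    # Hypothetical field names; the real connection file layout may differ.
    write_private_json(path, {"key": str(uuid.uuid4()), "location": "127.0.0.1"})
    print(oct(stat.S_IMODE(os.stat(path).st_mode)))  # 0o600 on POSIX with a typical umask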
|
132 | |||
133 | .. warning:: |
|
133 | .. warning:: | |
134 |
|
134 | |||
135 |
It is important to note that the |
|
135 | It is important to note that the signatures protect against unauthorized messages, | |
136 | a uuid rather than generating a key with a cryptographic library, provides a |
|
136 | but, as there is no encryption, provide exactly no protection of data privacy. It is | |
137 | defense against *accidental* messages more than it does against malicious attacks. |
|
137 | possible, however, to use a custom serialization scheme (via Session.packer/unpacker | |
138 | If loopback is compromised, it would be trivial for an attacker to intercept messages |
|
138 | traits) that does incorporate your own encryption scheme. | |
139 | and deduce the key, as there is no encryption. |
|
|||
140 |
|
139 | |||
141 |
|
140 | |||
142 |
|
141 | |||
143 | Specific security vulnerabilities |
|
142 | Specific security vulnerabilities | |
144 | ================================= |
|
143 | ================================= | |
145 |
|
144 | |||
146 | There are a number of potential security vulnerabilities present in IPython's |
|
145 | There are a number of potential security vulnerabilities present in IPython's | |
147 | architecture. In this section we discuss those vulnerabilities and detail how |
|
146 | architecture. In this section we discuss those vulnerabilities and detail how | |
148 | the security architecture described above prevents them from being exploited. |
|
147 | the security architecture described above prevents them from being exploited. | |
149 |
|
148 | |||
150 | Unauthorized clients |
|
149 | Unauthorized clients | |
151 | -------------------- |
|
150 | -------------------- | |
152 |
|
151 | |||
153 | The IPython client can instruct the IPython engines to execute arbitrary |
|
152 | The IPython client can instruct the IPython engines to execute arbitrary | |
154 | Python code with the permissions of the user who started the engines. If an |
|
153 | Python code with the permissions of the user who started the engines. If an | |
155 | attacker were able to connect their own hostile IPython client to the IPython |
|
154 | attacker were able to connect their own hostile IPython client to the IPython | |
156 | controller, they could instruct the engines to execute code. |
|
155 | controller, they could instruct the engines to execute code. | |
157 |
|
156 | |||
158 |
|
157 | |||
159 | On the first level, this attack is prevented by requiring access to the controller's |
|
158 | On the first level, this attack is prevented by requiring access to the controller's | |
160 | ports, which are recommended to only be open on loopback if the controller is on an |
|
159 | ports, which are recommended to only be open on loopback if the controller is on an | |
161 | untrusted local network. If the attacker does have access to the Controller's ports, then |
|
160 | untrusted local network. If the attacker does have access to the Controller's ports, then | |
162 | the attack is prevented by the capabilities based client authentication of the execution |
|
161 | the attack is prevented by the capabilities based client authentication of the execution | |
163 | key. The relevant authentication information is encoded into the JSON file that clients |
|
162 | key. The relevant authentication information is encoded into the JSON file that clients | |
164 | must present to gain access to the IPython controller. By limiting the distribution of |
|
163 | must present to gain access to the IPython controller. By limiting the distribution of | |
165 | those keys, a user can grant access to only authorized persons, just as with SSH keys. |
|
164 | those keys, a user can grant access to only authorized persons, just as with SSH keys. | |
166 |
|
165 | |||
167 | It is highly unlikely that an execution key could be guessed by an attacker |
|
166 | It is highly unlikely that an execution key could be guessed by an attacker | |
168 | in a brute force guessing attack. A given instance of the IPython controller |
|
167 | in a brute force guessing attack. A given instance of the IPython controller | |
169 | only runs for a relatively short amount of time (on the order of hours). Thus |
|
168 | only runs for a relatively short amount of time (on the order of hours). Thus | |
170 | an attacker would have only a limited amount of time to test a search space of |
|
169 | an attacker would have only a limited amount of time to test a search space of | |
171 | size 2**128. For added security, users can have arbitrarily long keys. |
|
170 | size 2**128. For added security, users can have arbitrarily long keys. | |
172 |
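
A back-of-the-envelope estimate makes the point concrete. This sketch assumes a generous
rate of one billion guesses per second and counts only the 122 random bits of a
version-4 UUID; the helper ``expected_years_to_guess`` is hypothetical. It also shows one
way to generate a longer key with the standard :mod:`secrets` module.

.. code-block:: python

    import secrets
    import uuid

    SECONDS_PER_YEAR = 365.25 * 24 * 3600


    def expected_years_to_guess(bits, guesses_per_second):
        """Average time to find a random key: half of the 2**bits space."""
        return 2 ** (bits - 1) / guesses_per_second / SECONDS_PER_YEAR


    # A version-4 UUID is 128 bits long, but 6 bits encode the version and variant.
    assert uuid.uuid4().version == 4
    UUID4_RANDOM_BITS = 122

    years = expected_years_to_guess(UUID4_RANDOM_BITS, 1e9)  # assumed attack rate
    print(f"{years:.1e} years")  # 8.4e+19 years

    # An arbitrarily longer key: 32 random bytes (256 bits), hex-encoded.
    long_key = secrets.token_hex(32)
    print(len(long_key) * 4, "bits")  # 256 bits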
|
171 | |||
173 | .. warning:: |
|
172 | .. warning:: | |
174 |
|
173 | |||
175 | If the attacker has gained enough access to intercept loopback connections on *either* the |
|
174 | If the attacker has gained enough access to intercept loopback connections on *either* the | |
176 | controller or client, then a duplicate message can be sent. To protect against this, |
|
175 | controller or client, then a duplicate message can be sent. To protect against this, | |
177 | recipients only allow each signature once, and consider duplicates invalid. However, |
|
176 | recipients only allow each signature once, and consider duplicates invalid. However, | |
178 | the duplicate message could be sent to *another* recipient using the same key, |
|
177 | the duplicate message could be sent to *another* recipient using the same key, | |
179 | and it would be considered valid. |
|
178 | and it would be considered valid. | |
180 |
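
The duplicate-rejection rule, and its limitation, can be sketched as follows. The
``Receiver`` class is hypothetical and is not IPython's implementation; it simply records
every signature it has accepted and refuses to accept one twice. Because each recipient
keeps its own record, a second recipient sharing the same key still accepts the replayed
message.

.. code-block:: python

    import hashlib
    import hmac


    class Receiver:
        """Hypothetical recipient that accepts each valid signature at most once."""

        def __init__(self, key):
            self.key = key
            self.seen = set()  # grows without bound; a real one would need pruning

        def accept(self, payload, signature):
            expected = hmac.new(self.key, payload, hashlib.sha256).hexdigest()
            if not hmac.compare_digest(signature, expected):
                return False  # forged or corrupted message
            if signature in self.seen:
                return False  # duplicate of a message already accepted
            self.seen.add(signature)
            return True


    key = b"shared-cluster-key"
    payload = b'{"msg_id": "42", "code": "a = 1"}'
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()

    engine_a, engine_b = Receiver(key), Receiver(key)
    print(engine_a.accept(payload, signature))  # True: first delivery
    print(engine_a.accept(payload, signature))  # False: duplicate rejected
    print(engine_b.accept(payload, signature))  # True: replayed to another recipient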
|
179 | |||
181 |
|
180 | |||
182 | Unauthorized engines |
|
181 | Unauthorized engines | |
183 | -------------------- |
|
182 | -------------------- | |
184 |
|
183 | |||
185 | If an attacker were able to connect a hostile engine to a user's controller, |
|
184 | If an attacker were able to connect a hostile engine to a user's controller, | |
186 | the user might unknowingly send sensitive code or data to the hostile engine. |
|
185 | the user might unknowingly send sensitive code or data to the hostile engine. | |
187 | This attacker's engine would then have full access to that code and data. |
|
186 | This attacker's engine would then have full access to that code and data. | |
188 |
|
187 | |||
189 | This type of attack is prevented in the same way as the unauthorized client |
|
188 | This type of attack is prevented in the same way as the unauthorized client | |
190 | attack, through the usage of the capabilities based authentication scheme. |
|
189 | attack, through the usage of the capabilities based authentication scheme. | |
191 |
|
190 | |||
192 | Unauthorized controllers |
|
191 | Unauthorized controllers | |
193 | ------------------------ |
|
192 | ------------------------ | |
194 |
|
193 | |||
195 | It is also possible that an attacker could try to convince a user's IPython |
|
194 | It is also possible that an attacker could try to convince a user's IPython | |
196 | client or engine to connect to a hostile IPython controller. That controller |
|
195 | client or engine to connect to a hostile IPython controller. That controller | |
197 | would then have full access to the code and data sent between the IPython |
|
196 | would then have full access to the code and data sent between the IPython | |
198 | client and the IPython engines. |
|
197 | client and the IPython engines. | |
199 |
|
198 | |||
200 | Again, this attack is prevented through the capabilities in a connection file, which |
|
199 | Again, this attack is prevented through the capabilities in a connection file, which | |
201 | ensure that a client or engine connects to the correct controller. It is also important to |
|
200 | ensure that a client or engine connects to the correct controller. It is also important to | |
202 | note that the connection files also encode the IP address and port that the controller is |
|
201 | note that the connection files also encode the IP address and port that the controller is | |
203 | listening on, so there is little chance of mistakenly connecting to a controller running |
|
202 | listening on, so there is little chance of mistakenly connecting to a controller running | |
204 | on a different IP address and port. |
|
203 | on a different IP address and port. | |
205 |
|
204 | |||
206 | When starting an engine or client, a user must specify the key to use |
|
205 | When starting an engine or client, a user must specify the key to use | |
207 | for that connection. Thus, in order to introduce a hostile controller, the |
|
206 | for that connection. Thus, in order to introduce a hostile controller, the | |
208 | attacker must convince the user to use the key associated with the |
|
207 | attacker must convince the user to use the key associated with the | |
209 | hostile controller. As long as a user is diligent in only using keys from |
|
208 | hostile controller. As long as a user is diligent in only using keys from | |
210 | trusted sources, this attack is not possible. |
|
209 | trusted sources, this attack is not possible. | |
211 |
|
210 | |||
212 | .. note:: |
|
211 | .. note:: | |
213 |
|
212 | |||
214 | I may be wrong, the unauthorized controller may be easier to fake than this. |
|
213 | I may be wrong, the unauthorized controller may be easier to fake than this. | |
215 |
|
214 | |||
216 | Other security measures |
|
215 | Other security measures | |
217 | ======================= |
|
216 | ======================= | |
218 |
|
217 | |||
219 | A number of other measures are taken to further limit the security risks |
|
218 | A number of other measures are taken to further limit the security risks | |
220 | involved in running the IPython kernel. |
|
219 | involved in running the IPython kernel. | |
221 |
|
220 | |||
222 | First, by default, the IPython controller listens on random port numbers. |
|
221 | First, by default, the IPython controller listens on random port numbers. | |
223 | While this can be overridden by the user, in the default configuration, an |
|
222 | While this can be overridden by the user, in the default configuration, an | |
224 | attacker would have to do a port scan to even find a controller to attack. |
|
223 | attacker would have to do a port scan to even find a controller to attack. | |
225 | When coupled with the relatively short running time of a typical controller |
|
224 | When coupled with the relatively short running time of a typical controller | |
226 | (on the order of hours), an attacker would have to work extremely hard and |
|
225 | (on the order of hours), an attacker would have to work extremely hard and | |
227 | extremely *fast* to even find a running controller to attack. |
|
226 | extremely *fast* to even find a running controller to attack. | |
228 |
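
Listening only on loopback at an OS-chosen port, the two defaults described in this
document, can be sketched with the standard :mod:`socket` module. This is an illustration
only; a ZeroMQ-based controller binds ZeroMQ sockets instead, and pyzmq offers
``Socket.bind_to_random_port`` for that purpose. The helper name
``listen_on_random_loopback_port`` is hypothetical.

.. code-block:: python

    import socket


    def listen_on_random_loopback_port():
        """Listen on the loopback interface only, on a port chosen by the OS."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind(("127.0.0.1", 0))  # port 0: the kernel picks a free ephemeral port
        sock.listen()
        return sock


    sock = listen_on_random_loopback_port()
    host, port = sock.getsockname()
    print(host, port)  # the port number differs from run to run
    sock.close()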
|
227 | |||
229 | Second, much of the time, especially when run on supercomputers or clusters, |
|
228 | Second, much of the time, especially when run on supercomputers or clusters, | |
230 | the controller is running behind a firewall. Thus, for engines or client to |
|
229 | the controller is running behind a firewall. Thus, for engines or client to | |
231 | connect to the controller: |
|
230 | connect to the controller: | |
232 |
|
231 | |||
233 | * The different processes have to all be behind the firewall. |
|
232 | * The different processes have to all be behind the firewall. | |
234 |
|
233 | |||
235 | or: |
|
234 | or: | |
236 |
|
235 | |||
237 | * The user has to use SSH port forwarding to tunnel the |
|
236 | * The user has to use SSH port forwarding to tunnel the | |
238 | connections through the firewall. |
|
237 | connections through the firewall. | |
239 |
|
238 | |||
240 | In either case, an attacker is presented with additional barriers that prevent |
|
239 | In either case, an attacker is presented with additional barriers that prevent | |
241 | attacking or even probing the system. |
|
240 | attacking or even probing the system. | |
242 |
|
241 | |||
243 | Summary |
|
242 | Summary | |
244 | ======= |
|
243 | ======= | |
245 |
|
244 | |||
246 | IPython's architecture has been carefully designed with security in mind. The |
|
245 | IPython's architecture has been carefully designed with security in mind. The | |
247 | capabilities based authentication model, in conjunction with SSH tunneled |
|
246 | capabilities based authentication model, in conjunction with SSH tunneled | |
248 | TCP/IP channels, address the core potential vulnerabilities in the system, |
|
247 | TCP/IP channels, address the core potential vulnerabilities in the system, | |
249 | while still enabling user's to use the system in open networks. |
|
248 | while still enabling user's to use the system in open networks. | |
250 |
|
249 | |||
251 | .. [RFC5246] <http://tools.ietf.org/html/rfc5246> |
|
250 | .. [RFC5246] <http://tools.ietf.org/html/rfc5246> | |
252 |
|
251 | |||
253 | .. [OpenSSH] <http://www.openssh.com/> |
|
252 | .. [OpenSSH] <http://www.openssh.com/> | |
254 | .. [Paramiko] <http://www.lag.net/paramiko/> |
|
253 | .. [Paramiko] <http://www.lag.net/paramiko/> | |
255 | .. [HMAC] <http://tools.ietf.org/html/rfc2104.html> |
|
254 | .. [HMAC] <http://tools.ietf.org/html/rfc2104.html> |