remove now-obsolete note that engines don't support ssh
MinRK -
@@ -1,254 +1,251 b''
1 .. _parallelsecurity:
1 .. _parallelsecurity:
2
2
3 ===========================
3 ===========================
4 Security details of IPython
4 Security details of IPython
5 ===========================
5 ===========================
6
6
7 .. note::
7 .. note::
8
8
9 This section is not thorough, and IPython.zmq needs a thorough security
9 This section is not thorough, and IPython.zmq needs a thorough security
10 audit.
10 audit.
11
11
12 IPython's :mod:`IPython.zmq` package exposes the full power of the
12 IPython's :mod:`IPython.zmq` package exposes the full power of the
13 Python interpreter over a TCP/IP network for the purposes of parallel
13 Python interpreter over a TCP/IP network for the purposes of parallel
14 computing. This feature brings up the important question of IPython's security
14 computing. This feature brings up the important question of IPython's security
15 model. This document gives details about this model and how it is implemented
15 model. This document gives details about this model and how it is implemented
16 in IPython's architecture.
16 in IPython's architecture.
17
17
18 Process and network topology
18 Process and network topology
19 ============================
19 ============================
20
20
21 To enable parallel computing, IPython has a number of different processes that
21 To enable parallel computing, IPython has a number of different processes that
22 run. These processes are discussed at length in the IPython documentation and
22 run. These processes are discussed at length in the IPython documentation and
23 are summarized here:
23 are summarized here:
24
24
25 * The IPython *engine*. This process is a full blown Python
25 * The IPython *engine*. This process is a full blown Python
26 interpreter in which user code is executed. Multiple
26 interpreter in which user code is executed. Multiple
27 engines are started to make parallel computing possible.
27 engines are started to make parallel computing possible.
28 * The IPython *hub*. This process monitors a set of
28 * The IPython *hub*. This process monitors a set of
29 engines and schedulers, and keeps track of the state of the processes. It listens
29 engines and schedulers, and keeps track of the state of the processes. It listens
30 for registration connections from engines and clients, and monitor connections
30 for registration connections from engines and clients, and monitor connections
31 from schedulers.
31 from schedulers.
32 * The IPython *schedulers*. This is a set of processes that relay commands and results
32 * The IPython *schedulers*. This is a set of processes that relay commands and results
33 between clients and engines. They are typically on the same machine as the controller,
33 between clients and engines. They are typically on the same machine as the controller,
34 and listen for connections from engines and clients, but connect to the Hub.
34 and listen for connections from engines and clients, but connect to the Hub.
35 * The IPython *client*. This process is typically an
35 * The IPython *client*. This process is typically an
36 interactive Python process that is used to coordinate the
36 interactive Python process that is used to coordinate the
37 engines to get a parallel computation done.
37 engines to get a parallel computation done.
38
38
39 Collectively, these processes are called the IPython *cluster*, and the hub and schedulers
39 Collectively, these processes are called the IPython *cluster*, and the hub and schedulers
40 together are referred to as the *controller*.
40 together are referred to as the *controller*.
41
41
42
42
43 These processes communicate over any transport supported by ZeroMQ (tcp,pgm,infiniband,ipc)
43 These processes communicate over any transport supported by ZeroMQ (tcp,pgm,infiniband,ipc)
44 with a well defined topology. The IPython hub and schedulers listen on sockets. Upon
44 with a well defined topology. The IPython hub and schedulers listen on sockets. Upon
45 starting, an engine connects to a hub and registers itself, which then informs the engine
45 starting, an engine connects to a hub and registers itself, which then informs the engine
46 of the connection information for the schedulers, and the engine then connects to the
46 of the connection information for the schedulers, and the engine then connects to the
47 schedulers. These engine/hub and engine/scheduler connections persist for the
47 schedulers. These engine/hub and engine/scheduler connections persist for the
48 lifetime of each engine.
48 lifetime of each engine.
49
49
50 The IPython client also connects to the controller processes using a number of socket
50 The IPython client also connects to the controller processes using a number of socket
51 connections. As of writing, this is one socket per scheduler (4), and 3 connections to the
51 connections. As of writing, this is one socket per scheduler (4), and 3 connections to the
52 hub for a total of 7. These connections persist for the lifetime of the client only.
52 hub for a total of 7. These connections persist for the lifetime of the client only.
53
53
54 A given IPython controller and set of engines typically has a relatively
54 A given IPython controller and set of engines typically has a relatively
55 short lifetime. Typically this lifetime corresponds to the duration of a single parallel
55 short lifetime. Typically this lifetime corresponds to the duration of a single parallel
56 simulation performed by a single user. Finally, the hub, schedulers, engines, and client
56 simulation performed by a single user. Finally, the hub, schedulers, engines, and client
57 processes typically execute with the permissions of that same user. More specifically, the
57 processes typically execute with the permissions of that same user. More specifically, the
58 controller and engines are *not* executed as root or with any other superuser permissions.
58 controller and engines are *not* executed as root or with any other superuser permissions.
59
59
60 Application logic
60 Application logic
61 =================
61 =================
62
62
63 When running the IPython kernel to perform a parallel computation, a user
63 When running the IPython kernel to perform a parallel computation, a user
64 utilizes the IPython client to send Python commands and data through the
64 utilizes the IPython client to send Python commands and data through the
65 IPython schedulers to the IPython engines, where those commands are executed
65 IPython schedulers to the IPython engines, where those commands are executed
66 and the data processed. The design of IPython ensures that the client is the
66 and the data processed. The design of IPython ensures that the client is the
67 only access point for the capabilities of the engines. That is, the only way
67 only access point for the capabilities of the engines. That is, the only way
68 of addressing the engines is through a client.
68 of addressing the engines is through a client.
69
69
70 A user can utilize the client to instruct the IPython engines to execute
70 A user can utilize the client to instruct the IPython engines to execute
71 arbitrary Python commands. These Python commands can include calls to the
71 arbitrary Python commands. These Python commands can include calls to the
72 system shell, access the filesystem, etc., as required by the user's
72 system shell, access the filesystem, etc., as required by the user's
73 application code. From this perspective, when a user runs an IPython engine on
73 application code. From this perspective, when a user runs an IPython engine on
74 a host, that engine has the same capabilities and permissions as the user
74 a host, that engine has the same capabilities and permissions as the user
75 themselves (as if they were logged onto the engine's host with a terminal).
75 themselves (as if they were logged onto the engine's host with a terminal).
76
76
77 Secure network connections
77 Secure network connections
78 ==========================
78 ==========================
79
79
80 Overview
80 Overview
81 --------
81 --------
82
82
83 ZeroMQ provides exactly no security. For this reason, users of IPython must be very
83 ZeroMQ provides exactly no security. For this reason, users of IPython must be very
84 careful in managing connections, because an open TCP/IP socket presents access to
84 careful in managing connections, because an open TCP/IP socket presents access to
85 arbitrary execution as the user on the engine machines. As a result, the default behavior
85 arbitrary execution as the user on the engine machines. As a result, the default behavior
86 of controller processes is to only listen for clients on the loopback interface, and the
86 of controller processes is to only listen for clients on the loopback interface, and the
87 client must establish SSH tunnels to connect to the controller processes.
87 client must establish SSH tunnels to connect to the controller processes.
88
88
89 .. warning::
89 .. warning::
90
90
91 If the controller's loopback interface is untrusted, then IPython should be considered
91 If the controller's loopback interface is untrusted, then IPython should be considered
92 vulnerable, and this extends to the loopback of all connected clients, which have
92 vulnerable, and this extends to the loopback of all connected clients, which have
93 opened a loopback port that is redirected to the controller's loopback port.
93 opened a loopback port that is redirected to the controller's loopback port.
94
94
95
95
96 SSH
96 SSH
97 ---
97 ---
98
98
99 Since ZeroMQ provides no security, SSH tunnels are the primary source of secure
99 Since ZeroMQ provides no security, SSH tunnels are the primary source of secure
100 connections. A connector file, such as `ipcontroller-client.json`, will contain
100 connections. A connector file, such as `ipcontroller-client.json`, will contain
101 information for connecting to the controller, possibly including the address of an
101 information for connecting to the controller, possibly including the address of an
102 ssh-server through which the client is to tunnel. The Client object then creates tunnels
102 ssh-server through which the client is to tunnel. The Client object then creates tunnels
103 using either [OpenSSH]_ or [Paramiko]_, depending on the platform. If users do not wish to
103 using either [OpenSSH]_ or [Paramiko]_, depending on the platform. If users do not wish to
104 use OpenSSH or Paramiko, or the tunneling utilities are insufficient, then they may
104 use OpenSSH or Paramiko, or the tunneling utilities are insufficient, then they may
105 construct the tunnels themselves, and simply connect clients and engines as if the
105 construct the tunnels themselves, and simply connect clients and engines as if the
106 controller were on loopback on the connecting machine.
106 controller were on loopback on the connecting machine.
107
107
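For users who construct the tunnels themselves, the following is a minimal sketch of the OpenSSH invocation involved. It only builds the command line and does not run it. The helper name ``ssh_tunnel_command``, the port number, and the server address are hypothetical; ``-N`` and ``-L`` are standard OpenSSH options.

```python
import shlex


def ssh_tunnel_command(local_port, remote_port, server, remote_ip="127.0.0.1"):
    """Build an OpenSSH command that forwards a local loopback port
    to a port on the controller's loopback interface.

    Hypothetical helper for illustration; not part of IPython.
    """
    return [
        "ssh",
        "-N",  # do not run a remote command, only forward ports
        "-L",  # local forward: bind_address:port:host:hostport
        f"127.0.0.1:{local_port}:{remote_ip}:{remote_port}",
        server,
    ]


# Assumed values: the real port comes from the connection file.
cmd = ssh_tunnel_command(10101, 10101, "user@login.example.org")
print(shlex.join(cmd))
```

With such a tunnel open, a client or engine would connect to ``127.0.0.1`` on the local port, as if the controller were on loopback on the connecting machine.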
108 .. note::
109
110 There is not currently tunneling available for engines.
111
108
112 Authentication
109 Authentication
113 --------------
110 --------------
114
111
115 To protect users of shared machines, [HMAC]_ digests are used to sign messages, using a
112 To protect users of shared machines, [HMAC]_ digests are used to sign messages, using a
116 shared key.
113 shared key.
117
114
118 The Session object that handles the message protocol uses a unique key to verify valid
115 The Session object that handles the message protocol uses a unique key to verify valid
119 messages. This can be any value specified by the user, but the default behavior is a
116 messages. This can be any value specified by the user, but the default behavior is a
120 pseudo-random 128-bit number, as generated by `uuid.uuid4()`. This key is used to
117 pseudo-random 128-bit number, as generated by `uuid.uuid4()`. This key is used to
121 initialize an HMAC object, which digests all messages, and includes that digest as a
118 initialize an HMAC object, which digests all messages, and includes that digest as a
122 signature and part of the message. Every message that is unpacked (on Controller, Engine,
119 signature and part of the message. Every message that is unpacked (on Controller, Engine,
123 and Client) will also be digested by the receiver, ensuring that the sender's key is the
120 and Client) will also be digested by the receiver, ensuring that the sender's key is the
124 same as the receiver's. No messages that do not contain this key are acted upon in any
121 same as the receiver's. No messages that do not contain this key are acted upon in any
125 way. The key itself is never sent over the network.
122 way. The key itself is never sent over the network.
126
123
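The signing scheme described above can be sketched with the standard library alone. This is an illustration, not the Session implementation: the function names are hypothetical, and the choice of SHA-256 as the digest is an assumption, since the text does not name one.

```python
import hashlib
import hmac
import uuid


def new_key():
    """Generate a default key the way the text describes: from uuid.uuid4()."""
    return str(uuid.uuid4()).encode("ascii")


def sign(key, parts):
    """Return the hex HMAC digest over all message parts, in order."""
    h = hmac.new(key, digestmod=hashlib.sha256)  # digest choice is an assumption
    for part in parts:
        h.update(part)
    return h.hexdigest().encode("ascii")


def verify(key, signature, parts):
    """Recompute the digest on the receiving side and compare in constant time."""
    return hmac.compare_digest(signature, sign(key, parts))


key = new_key()
message = [b'{"msg_type": "execute_request"}', b'{"code": "a = 1"}']
signature = sign(key, message)

print(verify(key, signature, message))        # same key on both sides
print(verify(new_key(), signature, message))  # receiver holds a different key
```

Only the signature travels with the message. The key stays on each side, which is why it has to be distributed out of band through the connection files.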
127 There is exactly one shared key per cluster - it must be the same everywhere. Typically,
124 There is exactly one shared key per cluster - it must be the same everywhere. Typically,
128 the controller creates this key, and stores it in the private connection files
125 the controller creates this key, and stores it in the private connection files
129 `ipython-{engine|client}.json`. These files are typically stored in the
126 `ipython-{engine|client}.json`. These files are typically stored in the
130 `~/.ipython/profile_<name>/security` directory, and are maintained as readable only by the
127 `~/.ipython/profile_<name>/security` directory, and are maintained as readable only by the
131 owner, just as is common practice with a user's keys in their `.ssh` directory.
128 owner, just as is common practice with a user's keys in their `.ssh` directory.
132
129
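A minimal sketch of keeping a connection file readable only by its owner, assuming a POSIX system. The file name, the JSON field names, and both helper functions are hypothetical and chosen only for illustration; they are not IPython's actual file layout.

```python
import json
import os
import stat
import tempfile


def write_connection_file(path, info):
    """Write connection info as JSON, readable and writable by the owner only."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(info, f)
    # Tighten permissions as well in case the file already existed.
    os.chmod(path, 0o600)


def is_private(path):
    """True if neither group nor others have any access to the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


with tempfile.TemporaryDirectory() as security_dir:
    path = os.path.join(security_dir, "example-client.json")  # hypothetical name
    # Hypothetical fields; the real file is written by the controller.
    write_connection_file(path, {"key": "not-a-real-key", "ip": "127.0.0.1"})
    private = is_private(path)
    with open(path) as f:
        loaded = json.load(f)

print(private, loaded["ip"])
```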
133 .. warning::
130 .. warning::
134
131
135 It is important to note that the signatures protect against unauthorized messages,
132 It is important to note that the signatures protect against unauthorized messages,
136 but, as there is no encryption, provide exactly no protection of data privacy. It is
133 but, as there is no encryption, provide exactly no protection of data privacy. It is
137 possible, however, to use a custom serialization scheme (via Session.packer/unpacker
134 possible, however, to use a custom serialization scheme (via Session.packer/unpacker
138 traits) that does incorporate your own encryption scheme.
135 traits) that does incorporate your own encryption scheme.
139
136
140
137
141
138
142 Specific security vulnerabilities
139 Specific security vulnerabilities
143 =================================
140 =================================
144
141
145 There are a number of potential security vulnerabilities present in IPython's
142 There are a number of potential security vulnerabilities present in IPython's
146 architecture. In this section we discuss those vulnerabilities and detail how
143 architecture. In this section we discuss those vulnerabilities and detail how
147 the security architecture described above prevents them from being exploited.
144 the security architecture described above prevents them from being exploited.
148
145
149 Unauthorized clients
146 Unauthorized clients
150 --------------------
147 --------------------
151
148
152 The IPython client can instruct the IPython engines to execute arbitrary
149 The IPython client can instruct the IPython engines to execute arbitrary
153 Python code with the permissions of the user who started the engines. If an
150 Python code with the permissions of the user who started the engines. If an
154 attacker were able to connect their own hostile IPython client to the IPython
151 attacker were able to connect their own hostile IPython client to the IPython
155 controller, they could instruct the engines to execute code.
152 controller, they could instruct the engines to execute code.
156
153
157
154
158 On the first level, this attack is prevented by requiring access to the controller's
155 On the first level, this attack is prevented by requiring access to the controller's
159 ports, which are recommended to only be open on loopback if the controller is on an
156 ports, which are recommended to only be open on loopback if the controller is on an
160 untrusted local network. If the attacker does have access to the Controller's ports, then
157 untrusted local network. If the attacker does have access to the Controller's ports, then
161 the attack is prevented by the capabilities based client authentication of the execution
158 the attack is prevented by the capabilities based client authentication of the execution
162 key. The relevant authentication information is encoded into the JSON file that clients
159 key. The relevant authentication information is encoded into the JSON file that clients
163 must present to gain access to the IPython controller. By limiting the distribution of
160 must present to gain access to the IPython controller. By limiting the distribution of
164 those keys, a user can grant access to only authorized persons, just as with SSH keys.
161 those keys, a user can grant access to only authorized persons, just as with SSH keys.
165
162
166 It is highly unlikely that an execution key could be guessed by an attacker
163 It is highly unlikely that an execution key could be guessed by an attacker
167 in a brute force guessing attack. A given instance of the IPython controller
164 in a brute force guessing attack. A given instance of the IPython controller
168 only runs for a relatively short amount of time (on the order of hours). Thus
165 only runs for a relatively short amount of time (on the order of hours). Thus
169 an attacker would have only a limited amount of time to test a search space of
166 an attacker would have only a limited amount of time to test a search space of
170 size 2**128. For added security, users can have arbitrarily long keys.
167 size 2**128. For added security, users can have arbitrarily long keys.
171
168
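The size of that search space can be put in numbers. The guess rate and controller lifetime below are assumptions picked for illustration. One detail worth knowing: a version 4 UUID fixes 6 of its 128 bits (version and variant), so a default key from ``uuid.uuid4()`` carries 122 random bits, which the second figure accounts for.

```python
# Assumed attacker capability and controller lifetime, for illustration only.
guesses_per_second = 10**12
lifetime_seconds = 10 * 3600  # a controller that runs for ten hours

tried = guesses_per_second * lifetime_seconds

# Fraction of the key space an attacker could cover in that time.
fraction_of_128_bits = tried / 2**128
fraction_of_122_bits = tried / 2**122  # random bits in a uuid4

print(f"keys tried:             {tried:.3e}")
print(f"fraction of 2**128:     {fraction_of_128_bits:.3e}")
print(f"fraction of 2**122:     {fraction_of_122_bits:.3e}")
```

Even under these generous assumptions, the fraction of keys tested stays vanishingly small in both cases.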
172 .. warning::
169 .. warning::
173
170
174 If the attacker has gained enough access to intercept loopback connections on *either* the
171 If the attacker has gained enough access to intercept loopback connections on *either* the
175 controller or client, then a duplicate message can be sent. To protect against this,
172 controller or client, then a duplicate message can be sent. To protect against this,
176 recipients only allow each signature once, and consider duplicates invalid. However,
173 recipients only allow each signature once, and consider duplicates invalid. However,
177 the duplicate message could be sent to *another* recipient using the same key,
174 the duplicate message could be sent to *another* recipient using the same key,
178 and it would be considered valid.
175 and it would be considered valid.
179
176
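The duplicate check in the warning above, and its limitation, can be sketched as follows. The ``ReplayGuard`` class is hypothetical and only models the idea of remembering signatures; it is not the actual bookkeeping used by IPython.

```python
class ReplayGuard:
    """Remember every signature seen so far and reject repeats."""

    def __init__(self):
        self._seen = set()

    def accept(self, signature):
        """Return True the first time a signature is seen, False afterwards."""
        if signature in self._seen:
            return False
        self._seen.add(signature)
        return True


engine_a = ReplayGuard()
engine_b = ReplayGuard()  # another recipient sharing the same cluster key

signature = b"0f3a..."  # placeholder for a valid HMAC signature

first = engine_a.accept(signature)         # original delivery
replayed = engine_a.accept(signature)      # replay to the same recipient
cross_replay = engine_b.accept(signature)  # replay to a different recipient

print(first, replayed, cross_replay)
```

Each recipient keeps its own history, so a message replayed to a recipient that has not seen it before is still accepted, which is the gap the warning describes.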
180
177
181 Unauthorized engines
178 Unauthorized engines
182 --------------------
179 --------------------
183
180
184 If an attacker were able to connect a hostile engine to a user's controller,
181 If an attacker were able to connect a hostile engine to a user's controller,
185 the user might unknowingly send sensitive code or data to the hostile engine.
182 the user might unknowingly send sensitive code or data to the hostile engine.
186 This attacker's engine would then have full access to that code and data.
183 This attacker's engine would then have full access to that code and data.
187
184
188 This type of attack is prevented in the same way as the unauthorized client
185 This type of attack is prevented in the same way as the unauthorized client
189 attack, through the usage of the capabilities based authentication scheme.
186 attack, through the usage of the capabilities based authentication scheme.
190
187
191 Unauthorized controllers
188 Unauthorized controllers
192 ------------------------
189 ------------------------
193
190
194 It is also possible that an attacker could try to convince a user's IPython
191 It is also possible that an attacker could try to convince a user's IPython
195 client or engine to connect to a hostile IPython controller. That controller
192 client or engine to connect to a hostile IPython controller. That controller
196 would then have full access to the code and data sent between the IPython
193 would then have full access to the code and data sent between the IPython
197 client and the IPython engines.
194 client and the IPython engines.
198
195
199 Again, this attack is prevented through the capabilities in a connection file, which
196 Again, this attack is prevented through the capabilities in a connection file, which
200 ensure that a client or engine connects to the correct controller. It is also important to
197 ensure that a client or engine connects to the correct controller. It is also important to
201 note that the connection files also encode the IP address and port that the controller is
198 note that the connection files also encode the IP address and port that the controller is
202 listening on, so there is little chance of mistakenly connecting to a controller running
199 listening on, so there is little chance of mistakenly connecting to a controller running
203 on a different IP address and port.
200 on a different IP address and port.
204
201
205 When starting an engine or client, a user must specify the key to use
202 When starting an engine or client, a user must specify the key to use
206 for that connection. Thus, in order to introduce a hostile controller, the
203 for that connection. Thus, in order to introduce a hostile controller, the
207 attacker must convince the user to use the key associated with the
204 attacker must convince the user to use the key associated with the
208 hostile controller. As long as a user is diligent in only using keys from
205 hostile controller. As long as a user is diligent in only using keys from
209 trusted sources, this attack is not possible.
206 trusted sources, this attack is not possible.
210
207
211 .. note::
208 .. note::
212
209
213 I may be wrong, the unauthorized controller may be easier to fake than this.
210 I may be wrong, the unauthorized controller may be easier to fake than this.
214
211
215 Other security measures
212 Other security measures
216 =======================
213 =======================
217
214
218 A number of other measures are taken to further limit the security risks
215 A number of other measures are taken to further limit the security risks
219 involved in running the IPython kernel.
216 involved in running the IPython kernel.
220
217
221 First, by default, the IPython controller listens on random port numbers.
218 First, by default, the IPython controller listens on random port numbers.
222 While this can be overridden by the user, in the default configuration, an
219 While this can be overridden by the user, in the default configuration, an
223 attacker would have to do a port scan to even find a controller to attack.
220 attacker would have to do a port scan to even find a controller to attack.
224 When coupled with the relatively short running time of a typical controller
221 When coupled with the relatively short running time of a typical controller
225 (on the order of hours), an attacker would have to work extremely hard and
222 (on the order of hours), an attacker would have to work extremely hard and
226 extremely *fast* to even find a running controller to attack.
223 extremely *fast* to even find a running controller to attack.
227
224
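As a rough analogue of listening on a random port, the sketch below uses a plain standard-library TCP socket rather than a ZeroMQ socket, which is what the controller actually uses. Binding to port 0 asks the operating system for a free port, and binding to ``127.0.0.1`` keeps the listener on the loopback interface.

```python
import socket

# Plain TCP socket as a stand-in for a controller socket (assumption).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)

host, port = listener.getsockname()
print(f"listening on {host}:{port}")

listener.close()
```

The chosen port differs from run to run, so a client can only learn it from the connection file or by scanning.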
228 Second, much of the time, especially when run on supercomputers or clusters,
225 Second, much of the time, especially when run on supercomputers or clusters,
229 the controller is running behind a firewall. Thus, for engines or clients to
226 the controller is running behind a firewall. Thus, for engines or clients to
230 connect to the controller:
227 connect to the controller:
231
228
232 * The different processes have to all be behind the firewall.
229 * The different processes have to all be behind the firewall.
233
230
234 or:
231 or:
235
232
236 * The user has to use SSH port forwarding to tunnel the
233 * The user has to use SSH port forwarding to tunnel the
237 connections through the firewall.
234 connections through the firewall.
238
235
239 In either case, an attacker is presented with additional barriers that prevent
236 In either case, an attacker is presented with additional barriers that prevent
240 attacking or even probing the system.
237 attacking or even probing the system.
241
238
242 Summary
239 Summary
243 =======
240 =======
244
241
245 IPython's architecture has been carefully designed with security in mind. The
242 IPython's architecture has been carefully designed with security in mind. The
246 capabilities based authentication model, in conjunction with SSH tunneled
243 capabilities based authentication model, in conjunction with SSH tunneled
247 TCP/IP channels, address the core potential vulnerabilities in the system,
244 TCP/IP channels, address the core potential vulnerabilities in the system,
248 while still enabling users to use the system in open networks.
245 while still enabling users to use the system in open networks.
249
246
250 .. [RFC5246] <http://tools.ietf.org/html/rfc5246>
247 .. [RFC5246] <http://tools.ietf.org/html/rfc5246>
251
248
252 .. [OpenSSH] <http://www.openssh.com/>
249 .. [OpenSSH] <http://www.openssh.com/>
253 .. [Paramiko] <http://www.lag.net/paramiko/>
250 .. [Paramiko] <http://www.lag.net/paramiko/>
254 .. [HMAC] <http://tools.ietf.org/html/rfc2104.html>
251 .. [HMAC] <http://tools.ietf.org/html/rfc2104.html>