Adding a complete description of the IPython security model.
Brian Granger -
@@ -1,14 +1,15 b''
1 1 .. _parallel_index:
2 2
3 3 ====================================
4 4 Using IPython for parallel computing
5 5 ====================================
6 6
7 7 .. toctree::
8 8 :maxdepth: 2
9 9
10 10 parallel_intro.txt
11 11 parallel_multiengine.txt
12 12 parallel_task.txt
13 13 parallel_mpi.txt
14 parallel_security.txt
14 15
@@ -1,327 +1,330 b''
1 1 .. _ip1par:
2 2
3 3 ============================
4 4 Overview and getting started
5 5 ============================
6 6
7 7 .. contents::
8 8
9 9 Introduction
10 10 ============
11 11
12 12 This file gives an overview of IPython's sophisticated and
13 13 powerful architecture for parallel and distributed computing. This
14 14 architecture abstracts out parallelism in a very general way, which
15 15 enables IPython to support many different styles of parallelism
16 16 including:
17 17
18 18 * Single program, multiple data (SPMD) parallelism.
19 19 * Multiple program, multiple data (MPMD) parallelism.
20 20 * Message passing using ``MPI``.
21 21 * Task farming.
22 22 * Data parallel.
23 23 * Combinations of these approaches.
24 24 * Custom user defined approaches.
25 25
26 26 Most importantly, IPython enables all types of parallel applications to
27 27 be developed, executed, debugged and monitored *interactively*. Hence,
28 28 the ``I`` in IPython. The following are some example use cases for IPython:
29 29
30 30 * Quickly parallelize algorithms that are embarrassingly parallel
31 31 using a number of simple approaches. Many simple things can be
32 32 parallelized interactively in one or two lines of code.
33 33
34 34 * Steer traditional MPI applications on a supercomputer from an
35 35 IPython session on your laptop.
36 36
37 37 * Analyze and visualize large datasets (that could be remote and/or
38 38 distributed) interactively using IPython and tools like
39 39 matplotlib/TVTK.
40 40
41 41 * Develop, test and debug new parallel algorithms
42 42 (that may use MPI) interactively.
43 43
44 44 * Tie together multiple MPI jobs running on different systems into
45 45 one giant distributed and parallel system.
46 46
47 47 * Start a parallel job on your cluster and then have a remote
48 48 collaborator connect to it and pull back data into their
49 49 local IPython session for plotting and analysis.
50 50
51 51 * Run a set of tasks on a set of CPUs using dynamic load balancing.
52 52
53 53 Architecture overview
54 54 =====================
55 55
56 56 The IPython architecture consists of three components:
57 57
58 58 * The IPython engine.
59 59 * The IPython controller.
60 60 * Various controller clients.
61 61
62 62 These components live in the :mod:`IPython.kernel` package and are
63 63 installed with IPython. They do, however, have additional dependencies
64 64 that must be installed. For more information, see our
65 65 :ref:`installation documentation <install_index>`.
66 66
67 67 IPython engine
68 68 ---------------
69 69
70 70 The IPython engine is a Python instance that takes Python commands over a
71 71 network connection. Eventually, the IPython engine will be a full IPython
72 72 interpreter, but for now, it is a regular Python interpreter. The engine
73 73 can also handle incoming and outgoing Python objects sent over a network
74 74 connection. When multiple engines are started, parallel and distributed
75 75 computing becomes possible. An important feature of an IPython engine is
76 76 that it blocks while user code is being executed. Read on for how the
77 77 IPython controller solves this problem to expose a clean asynchronous API
78 78 to the user.
79 79
80 80 IPython controller
81 81 ------------------
82 82
83 83 The IPython controller provides an interface for working with a set of
84 84 engines. At a general level, the controller is a process to which
85 85 IPython engines can connect. For each connected engine, the controller
86 86 manages a queue. All actions that can be performed on the engine go
87 87 through this queue. While the engines themselves block when user code is
88 88 run, the controller hides that from the user to provide a fully
89 89 asynchronous interface to a set of engines.
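
The queue-per-engine idea can be illustrated with a toy model. This is only a sketch built on the standard library, not IPython's implementation: ``ToyController``, ``submit`` and ``shutdown`` are hypothetical names, and threads stand in for real engines on a network.

```python
import queue
import threading
from concurrent.futures import Future


class ToyController:
    """Toy model: one FIFO queue and one blocking worker per engine."""

    def __init__(self, engine_ids):
        self._queues = {eid: queue.Queue() for eid in engine_ids}
        self._threads = []
        for eid in engine_ids:
            worker = threading.Thread(
                target=self._engine_loop, args=(eid,), daemon=True
            )
            worker.start()
            self._threads.append(worker)

    def _engine_loop(self, eid):
        # The "engine" blocks on each action and handles them in queue order.
        actions = self._queues[eid]
        while True:
            item = actions.get()
            if item is None:  # shutdown sentinel
                break
            func, args, future = item
            try:
                future.set_result(func(*args))
            except Exception as exc:
                future.set_exception(exc)

    def submit(self, eid, func, *args):
        # Returns immediately; the caller does not wait for the engine.
        future = Future()
        self._queues[eid].put((func, args, future))
        return future

    def shutdown(self):
        for actions in self._queues.values():
            actions.put(None)
        for worker in self._threads:
            worker.join()


controller = ToyController([0, 1])
pending = [controller.submit(eid, pow, 2, 10 + eid) for eid in (0, 1)]
results = [future.result(timeout=5) for future in pending]
controller.shutdown()
print(results)
```

The caller gets a future back from ``submit`` right away, even though each toy engine runs only one action at a time.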
90 90
91 91 .. note::
92 92
93 93 Because the controller listens on a network port for engines to
94 94 connect to it, it must be started *before* any engines are started.
95 95
96 96 The controller also provides a single point of contact for users who wish
97 97 to utilize the engines connected to the controller. There are different
98 98 ways of working with a controller. In IPython these ways correspond to different interfaces that the controller is adapted to. Currently we have two default interfaces to the controller:
99 99
100 100 * The MultiEngine interface, which provides the simplest possible way of working
101 101 with engines interactively.
102 102 * The Task interface, which presents the engines as a load-balanced
103 103 task farming system.
104 104
105 105 Advanced users can easily add new custom interfaces to enable other
106 106 styles of parallelism.
107 107
108 108 .. note::
109 109
110 110 A single controller and set of engines can be accessed
111 111 through multiple interfaces simultaneously. This opens the
112 112 door for lots of interesting things.
113 113
114 114 Controller clients
115 115 ------------------
116 116
117 117 For each controller interface, there is a corresponding client. These
118 118 clients allow users to interact with a set of engines through the
119 119 interface. Here are the two default clients:
120 120
121 121 * The :class:`MultiEngineClient` class.
122 122 * The :class:`TaskClient` class.
123 123
124 124 Security
125 125 --------
126 126
127 127 By default (as long as `pyOpenSSL` is installed) all network connections between the controller and engines and the controller and clients are secure. What does this mean? First of all, all of the connections will be encrypted using SSL. Second, the connections are authenticated. We handle authentication in a `capabilities`__ based security model. In this model, a "capability (known in some systems as a key) is a communicable, unforgeable token of authority". Put simply, a capability is like a key to your house. If you have the key to your house, you can get in. If not, you can't.
128 128
129 129 .. __: http://en.wikipedia.org/wiki/Capability-based_security
130 130
131 131 In our architecture, the controller is the only process that listens on network ports, and is thus responsible for creating these keys. In IPython, these keys are known as Foolscap URLs, or FURLs, because of the underlying network protocol we are using. As a user, you don't need to know anything about the details of these FURLs, other than that when the controller starts, it saves a set of FURLs to files named :file:`something.furl`. The default location of these files is the :file:`~/.ipython/security` directory.
132 132
133 133 To connect and authenticate to the controller, an engine or client simply needs to present an appropriate FURL (one originally created by the controller) to the controller. Thus, the ``.furl`` files need to be copied to a location where the clients and engines can find them. Typically, this is the :file:`~/.ipython/security` directory on the host where the client/engine is running (which could be a different host than the controller). Once the ``.furl`` files are copied over, everything should work fine.
134 134
135 135 Currently, there are three .furl files that the controller creates:
136 136
137 137 ipcontroller-engine.furl
138 138 This ``.furl`` file is the key that gives an engine the ability to connect
139 139 to a controller.
140 140
141 141 ipcontroller-tc.furl
142 142 This ``.furl`` file is the key that a :class:`TaskClient` must use to
143 143 connect to the task interface of a controller.
144 144
145 145 ipcontroller-mec.furl
146 146 This ``.furl`` file is the key that a :class:`MultiEngineClient` must use to
147 147 connect to the multiengine interface of a controller.
148 148
149 149 More details of how these ``.furl`` files are used are given below.
150 150
151 A detailed description of the security model and its implementation in IPython
152 can be found :ref:`here <parallelsecurity>`.
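
As a rough illustration, the three files and their default location can be written down as a small lookup. The file names come from the list above. The role keys and the ``default_furl_path`` helper are hypothetical, and the directory is assumed to be ``~/.ipython/security``.

```python
import os

# File names as listed above; the role keys are made up for this sketch.
FURL_FILES = {
    "engine": "ipcontroller-engine.furl",
    "task_client": "ipcontroller-tc.furl",
    "multiengine_client": "ipcontroller-mec.furl",
}


def default_furl_path(role, home=None):
    """Return the assumed default path of the .furl file for a role."""
    home = home or os.path.expanduser("~")
    return os.path.join(home, ".ipython", "security", FURL_FILES[role])


for role in FURL_FILES:
    print(role, "->", default_furl_path(role))
```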
153
151 154 Getting Started
152 155 ===============
153 156
154 157 To use IPython for parallel computing, you need to start one instance of
155 158 the controller and one or more instances of the engine. The controller
156 159 and each engine can run on different machines or on the same machine.
157 160 Because of this, there are many different possibilities for setting up
158 161 the IP addresses and ports used by the various processes.
159 162
160 163 Starting the controller and engine on your local machine
161 164 --------------------------------------------------------
162 165
163 166 This is the simplest configuration that can be used and is useful for
164 167 testing the system and on machines that have multiple cores and/or
165 168 multiple CPUs. The easiest way of getting started is to use the :command:`ipcluster`
166 169 command::
167 170
168 171 $ ipcluster -n 4
169 172
170 173 This will start an IPython controller and then 4 engines that connect to
171 174 the controller. Lastly, the script will print out the Python commands
172 175 that you can use to connect to the controller. It is that easy.
173 176
174 177 .. warning::
175 178
176 179 The :command:`ipcluster` command does not currently work on Windows. We are
177 180 working on it though.
178 181
179 182 Underneath the hood, the controller creates ``.furl`` files in the
180 183 :file:`~/.ipython/security` directory. Because the engines are on the
181 184 same host, they automatically find the needed :file:`ipcontroller-engine.furl`
182 185 there and use it to connect to the controller.
183 186
184 187 The :command:`ipcluster` script uses two other top-level
185 188 scripts that you can also use yourself. These scripts are
186 189 :command:`ipcontroller`, which starts the controller and :command:`ipengine` which
187 190 starts one engine. To use these scripts to start things on your local
188 191 machine, do the following.
189 192
190 193 First start the controller::
191 194
192 195 $ ipcontroller
193 196
194 197 Next, start however many instances of the engine you want using (repeatedly) the command::
195 198
196 199 $ ipengine
197 200
198 201 The engines should start and automatically connect to the controller using the ``.furl`` files in :file:`~/.ipython/security`. You are now ready to use the controller and engines from IPython.
199 202
200 203 .. warning::
201 204
202 205 The order of the above operations is very important. You *must*
203 206 start the controller before the engines, since the engines connect
204 207 to the controller as they get started.
205 208
206 209 .. note::
207 210
208 211 On some platforms (OS X), to put the controller and engine into the background
209 212 you may need to give these commands in the form ``(ipcontroller &)``
210 213 and ``(ipengine &)`` (with the parentheses) for them to work properly.
211 214
212 215
213 216 Starting the controller and engines on different hosts
214 217 ------------------------------------------------------
215 218
216 219 When the controller and engines are running on different hosts, things are
217 220 slightly more complicated, but the underlying ideas are the same:
218 221
219 222 1. Start the controller on a host using :command:`ipcontroller`.
220 223 2. Copy :file:`ipcontroller-engine.furl` from :file:`~/.ipython/security` on the controller's host to the host where the engines will run.
221 224 3. Use :command:`ipengine` on the engine's hosts to start the engines.
222 225
223 226 The only thing you have to be careful of is to tell :command:`ipengine` where the :file:`ipcontroller-engine.furl` file is located. There are two ways you can do this:
224 227
225 228 * Put :file:`ipcontroller-engine.furl` in the :file:`~/.ipython/security` directory
226 229 on the engine's host, where it will be found automatically.
227 230 * Call :command:`ipengine` with the ``--furl-file=full_path_to_the_file`` flag.
228 231
229 232 The ``--furl-file`` flag works like this::
230 233
231 234 $ ipengine --furl-file=/path/to/my/ipcontroller-engine.furl
232 235
233 236 .. note::
234 237
235 238 If the controller's and engine's hosts all have a shared file system
236 239 (:file:`~/.ipython/security` is the same on all of them), then things
237 240 will just work!
238 241
239 242 Make .furl files persistent
240 243 ---------------------------
241 244
242 245 At first glance it may seem that managing the ``.furl`` files is a bit annoying. Going back to the house and key analogy, copying the ``.furl`` files around each time you start the controller is like having to make a new key every time you want to unlock the door and enter your house. As with your house, you want to be able to create the key (or ``.furl`` file) once, and then simply use it at any point in the future.
243 246
244 247 This is possible. The only thing you have to do is decide what ports the controller will listen on for the engines and clients. This is done as follows::
245 248
246 249 $ ipcontroller --client-port=10101 --engine-port=10102
247 250
248 251 Then, just copy the ``.furl`` files over the first time and you are set. You can start and stop the controller and engines as many times as you want in the future; just make sure to tell the controller to use the *same* ports.
249 252
250 253 .. note::
251 254
252 255 You may ask the question: what ports does the controller listen on if you
253 256 don't tell it to use specific ones? The default is to use high random port
254 257 numbers. We do this for two reasons: i) to increase security through obscurity
255 258 and ii) to allow multiple controllers on a given host to start and automatically
256 259 use different ports.
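
The command lines from this section and the previous one can also be assembled programmatically, for example to hand them to ``subprocess``. The following is a minimal sketch. ``controller_argv`` and ``engine_argv`` are hypothetical helpers, and only the flags shown above (``--client-port``, ``--engine-port``, ``--furl-file``) are used.

```python
def controller_argv(client_port, engine_port):
    """Build an ipcontroller command line with fixed ports."""
    for port in (client_port, engine_port):
        if not 1 <= port <= 65535:
            raise ValueError("port out of range: %r" % (port,))
    return [
        "ipcontroller",
        "--client-port=%d" % client_port,
        "--engine-port=%d" % engine_port,
    ]


def engine_argv(furl_file=None):
    """Build an ipengine command line, optionally naming the .furl file."""
    argv = ["ipengine"]
    if furl_file is not None:
        argv.append("--furl-file=" + furl_file)
    return argv


# Reusing the same ports on every start keeps previously copied
# .furl files usable.
print(" ".join(controller_argv(10101, 10102)))
print(" ".join(engine_argv("/path/to/my/ipcontroller-engine.furl")))
```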
257 260
258 261 Starting engines using ``mpirun``
259 262 ---------------------------------
260 263
261 264 The IPython engines can be started using ``mpirun``/``mpiexec``, even if
262 265 the engines don't call ``MPI_Init()`` or use the MPI API in any way. This is
263 266 supported on modern MPI implementations like `Open MPI`_. This provides
264 267 a really nice way of starting a bunch of engines. On a system with MPI
265 268 installed you can do::
266 269
267 270 mpirun -n 4 ipengine
268 271
269 272 to start 4 engines on a cluster. This works even if you don't have any
270 273 Python-MPI bindings installed.
271 274
272 275 .. _Open MPI: http://www.open-mpi.org/
273 276
274 277 More details on using MPI with IPython can be found :ref:`here <parallelmpi>`.
275 278
276 279 Log files
277 280 ---------
278 281
279 282 All of the components of IPython have log files associated with them.
280 283 These log files can be extremely useful in debugging problems with
281 284 IPython and can be found in the directory ``~/.ipython/log``. Sending
282 285 the log files to us will often help us to debug any problems.
283 286
284 287 Next Steps
285 288 ==========
286 289
287 290 Once you have started the IPython controller and one or more engines, you
288 291 are ready to use the engines to do something useful. To make sure
289 292 everything is working correctly, try the following commands::
290 293
291 294 In [1]: from IPython.kernel import client
292 295
293 296 In [2]: mec = client.MultiEngineClient()
294 297
295 298 In [4]: mec.get_ids()
296 299 Out[4]: [0, 1, 2, 3]
297 300
298 301 In [5]: mec.execute('print "Hello World"')
299 302 Out[5]:
300 303 <Results List>
301 304 [0] In [1]: print "Hello World"
302 305 [0] Out[1]: Hello World
303 306
304 307 [1] In [1]: print "Hello World"
305 308 [1] Out[1]: Hello World
306 309
307 310 [2] In [1]: print "Hello World"
308 311 [2] Out[1]: Hello World
309 312
310 313 [3] In [1]: print "Hello World"
311 314 [3] Out[1]: Hello World
312 315
313 316 Remember, a client also needs to present a ``.furl`` file to the controller. How does this happen? When a multiengine client is created with no arguments, the client tries to find the corresponding ``.furl`` file in the local :file:`~/.ipython/security` directory. If it finds it, you are set. If you have put the ``.furl`` file in a different location or it has a different name, create the client like this::
314 317
315 318 mec = client.MultiEngineClient('/path/to/my/ipcontroller-mec.furl')
316 319
317 320 The same thing holds true when creating a task client::
318 321
319 322 tc = client.TaskClient('/path/to/my/ipcontroller-tc.furl')
320 323
321 324 You are now ready to learn more about the :ref:`MultiEngine <parallelmultiengine>` and :ref:`Task <paralleltask>` interfaces to the controller.
322 325
323 326 .. note::
324 327
325 328 Don't forget that the engine, multiengine client and task client all have
326 329 *different* furl files. You must move *each* of these around to an appropriate
327 330 location so that the engines and clients can use them to connect to the controller.
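
Before starting engines or clients on a host, it can help to check that the expected ``.furl`` files have actually been copied there. The following is a minimal sketch with a hypothetical ``missing_furls`` helper. A temporary directory stands in for the real security directory.

```python
import os
import tempfile


def missing_furls(security_dir, names):
    """Return the names that are not present as files in security_dir."""
    return [
        name
        for name in names
        if not os.path.isfile(os.path.join(security_dir, name))
    ]


# Demonstration: only the multiengine client's file has been "copied".
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "ipcontroller-mec.furl"), "w").close()
    missing = missing_furls(
        demo_dir, ["ipcontroller-mec.furl", "ipcontroller-tc.furl"]
    )

print(missing)
```

In this demonstration the task client's file is reported as missing, so a ``TaskClient`` on that host could not find its key in the default location.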