Update command line args format in parallel docs section.
Thomas Kluyver
@@ -1,284 +1,284 b''
1 1 =================
2 2 Parallel examples
3 3 =================
4 4
5 5 .. note::
6 6
7 7 Performance numbers from ``IPython.kernel``, not newparallel.
8 8
9 9 In this section we describe two more involved examples of using an IPython
10 10 cluster to perform a parallel computation. In these examples, we will be using
11 11 IPython's "pylab" mode, which enables interactive plotting using the
12 12 Matplotlib package. IPython can be started in this mode by typing::
13 13
14 14 ipython --pylab
15 15
16 16 at the system command line.
17 17
18 18 150 million digits of pi
19 19 ========================
20 20
21 21 In this example we would like to study the distribution of digits in the
22 22 number pi (in base 10). While it is not known if pi is a normal number (a
23 23 number is normal in base 10 if 0-9 occur with equal likelihood) numerical
24 24 investigations suggest that it is. We will begin with a serial calculation on
25 25 10,000 digits of pi and then perform a parallel calculation involving 150
26 26 million digits.
27 27
28 28 In both the serial and parallel calculation we will be using functions defined
29 29 in the :file:`pidigits.py` file, which is available in the
30 30 :file:`docs/examples/newparallel` directory of the IPython source distribution.
31 31 These functions provide basic facilities for working with the digits of pi and
32 32 can be loaded into IPython by putting :file:`pidigits.py` in your current
33 33 working directory and then doing:
34 34
35 35 .. sourcecode:: ipython
36 36
37 37 In [1]: run pidigits.py
38 38
39 39 Serial calculation
40 40 ------------------
41 41
42 42 For the serial calculation, we will use `SymPy <http://www.sympy.org>`_ to
43 43 calculate 10,000 digits of pi and then look at the frequencies of the digits
44 44 0-9. Out of 10,000 digits, we expect each digit to occur 1,000 times. While
45 45 SymPy is capable of calculating many more digits of pi, our purpose here is to
46 46 set the stage for the much larger parallel calculation.
47 47
48 48 In this example, we use two functions from :file:`pidigits.py`:
49 49 :func:`one_digit_freqs` (which calculates how many times each digit occurs)
50 50 and :func:`plot_one_digit_freqs` (which uses Matplotlib to plot the result).
51 51 Here is an interactive IPython session that uses these functions with
52 52 SymPy:
53 53
54 54 .. sourcecode:: ipython
55 55
56 56 In [7]: import sympy
57 57
58 58 In [8]: pi = sympy.pi.evalf(40)
59 59
60 60 In [9]: pi
61 61 Out[9]: 3.141592653589793238462643383279502884197
62 62
63 63 In [10]: pi = sympy.pi.evalf(10000)
64 64
65 65 In [11]: digits = (d for d in str(pi)[2:]) # create a sequence of digits
66 66
67 67 In [12]: run pidigits.py # load one_digit_freqs/plot_one_digit_freqs
68 68
69 69 In [13]: freqs = one_digit_freqs(digits)
70 70
71 71 In [14]: plot_one_digit_freqs(freqs)
72 72 Out[14]: [<matplotlib.lines.Line2D object at 0x18a55290>]
73 73
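For reference, a counting function along the lines of :func:`one_digit_freqs`
can be very small. The following is only a sketch (the actual implementation in
:file:`pidigits.py` may differ):

.. sourcecode:: python

    import numpy as np

    def one_digit_freqs(digits):
        """Count how often each digit 0-9 occurs in an iterable of digit characters."""
        freqs = np.zeros(10, dtype=int)
        for d in digits:
            freqs[int(d)] += 1
        return freqs
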
74 74 The resulting plot of the single digit counts shows that each digit occurs
75 75 approximately 1,000 times, but that with only 10,000 digits the
76 76 statistical fluctuations are still rather large:
77 77
78 78 .. image:: single_digits.*
79 79
80 80 It is clear that to reduce the relative fluctuations in the counts, we need
81 81 to look at many more digits of pi. That brings us to the parallel calculation.
82 82
83 83 Parallel calculation
84 84 --------------------
85 85
86 86 Calculating many digits of pi is a challenging computational problem in itself.
87 87 Because we want to focus on the distribution of digits in this example, we
88 88 will use pre-computed digits of pi from the website of Professor Yasumasa
89 89 Kanada at the University of Tokyo (http://www.super-computing.org). These
90 90 digits come in a set of text files (ftp://pi.super-computing.org/.2/pi200m/)
91 91 that each have 10 million digits of pi.
92 92
93 93 For the parallel calculation, we have copied these files to the local hard
94 94 drives of the compute nodes. A total of 15 of these files will be used, for a
95 95 total of 150 million digits of pi. To make things a little more interesting we
96 96 will calculate the frequencies of all two-digit sequences (00-99) and then plot
97 97 the result using a 2D matrix in Matplotlib.
98 98
99 99 The overall idea of the calculation is simple: each IPython engine will
100 100 compute the two digit counts for the digits in a single file. Then in a final
101 101 step the counts from each engine will be added up. To perform this
102 102 calculation, we will need two top-level functions from :file:`pidigits.py`:
103 103
104 104 .. literalinclude:: ../../examples/newparallel/pidigits.py
105 105 :language: python
106 :lines: 41-56
106 :lines: 47-62
107 107
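If you do not have the IPython source checked out, the two functions might look
roughly like the following sketch (simplified; the code in :file:`pidigits.py`
differs in details):

.. sourcecode:: python

    import numpy as np

    def compute_two_digit_freqs(filename):
        """Count the frequency of all two-digit sequences in one digit file."""
        with open(filename) as f:
            # keep only the digit characters from the file
            digits = ''.join(c for c in f.read() if c.isdigit())
        freqs = np.zeros(100, dtype=int)
        for i in range(len(digits) - 1):
            freqs[int(digits[i:i + 2])] += 1
        return freqs

    def reduce_freqs(freqlist):
        """Add up the per-engine frequency arrays into a single array."""
        result = np.zeros(100, dtype=int)
        for f in freqlist:
            result += f
        return result
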
108 108 We will also use the :func:`plot_two_digit_freqs` function to plot the
109 109 results. The code to run this calculation in parallel is contained in
110 110 :file:`docs/examples/newparallel/parallelpi.py`. This code can be run in parallel
111 111 using IPython by following these steps:
112 112
113 113 1. Use :command:`ipcluster` to start 15 engines (a sample command is shown after
114 114    this list). We used an 8 core (2 quad core CPUs) cluster with hyperthreading
115 115    enabled, which makes the 8 cores look like 16 (1 controller + 15 engines) in
116 116    the OS. However, the maximum speedup we can observe is still only 8x.
117 117 2. With the file :file:`parallelpi.py` in your current working directory, open
118 118 up IPython in pylab mode and type ``run parallelpi.py``. This will download
119 119 the pi files via ftp the first time you run it, if they are not
120 120 present in the Engines' working directory.
121 121
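For example, step 1 might look like this at the system command line
(``mycluster`` matches the profile name used in the interactive session below;
adapt it to your own setup)::

    $ ipcluster start --n=15 --profile=mycluster
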
122 122 When run on our 8 core cluster, we observe a speedup of 7.7x. This is slightly
123 123 less than linear scaling (8x) because the controller is also running on one of
124 124 the cores.
125 125
126 126 To emphasize the interactive nature of IPython, we now show how the
127 127 calculation can also be run by simply typing the commands from
128 128 :file:`parallelpi.py` interactively into IPython:
129 129
130 130 .. sourcecode:: ipython
131 131
132 132 In [1]: from IPython.parallel import Client
133 133
134 134 # The Client allows us to use the engines interactively.
135 135 # We simply pass Client the name of the cluster profile we
136 136 # are using.
137 137 In [2]: c = Client(profile='mycluster')
138 138 In [3]: view = c.load_balanced_view()
139 139
140 140 In [3]: c.ids
141 141 Out[3]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
142 142
143 143 In [4]: run pidigits.py
144 144
145 145 In [5]: filestring = 'pi200m.ascii.%(i)02dof20'
146 146
147 147 # Create the list of files to process.
148 148 In [6]: files = [filestring % {'i':i} for i in range(1,16)]
149 149
150 150 In [7]: files
151 151 Out[7]:
152 152 ['pi200m.ascii.01of20',
153 153 'pi200m.ascii.02of20',
154 154 'pi200m.ascii.03of20',
155 155 'pi200m.ascii.04of20',
156 156 'pi200m.ascii.05of20',
157 157 'pi200m.ascii.06of20',
158 158 'pi200m.ascii.07of20',
159 159 'pi200m.ascii.08of20',
160 160 'pi200m.ascii.09of20',
161 161 'pi200m.ascii.10of20',
162 162 'pi200m.ascii.11of20',
163 163 'pi200m.ascii.12of20',
164 164 'pi200m.ascii.13of20',
165 165 'pi200m.ascii.14of20',
166 166 'pi200m.ascii.15of20']
167 167
168 168 # download the data files if they don't already exist:
169 169    In [8]: view.map(fetch_pi_file, files)
170 170
171 171    # This is the parallel calculation using the LoadBalancedView.map method,
172 172    # which applies compute_two_digit_freqs to each file in files in parallel.
173 173    In [9]: freqs_all = view.map(compute_two_digit_freqs, files)
174 174
175 175 # Add up the frequencies from each engine.
176 176 In [10]: freqs = reduce_freqs(freqs_all)
177 177
178 178 In [11]: plot_two_digit_freqs(freqs)
179 179 Out[11]: <matplotlib.image.AxesImage object at 0x18beb110>
180 180
181 181 In [12]: plt.title('2 digit counts of 150m digits of pi')
182 182 Out[12]: <matplotlib.text.Text object at 0x18d1f9b0>
183 183
184 184 The resulting plot generated by Matplotlib is shown below. The colors indicate
185 185 which two digit sequences are more (red) or less (blue) likely to occur in the
186 186 first 150 million digits of pi. We clearly see that the sequence "41" is
187 187 most likely and that "06" and "07" are least likely. Further analysis would
188 188 show that the relative size of the statistical fluctuations has decreased
189 189 compared to the 10,000 digit calculation.
190 190
191 191 .. image:: two_digit_counts.*
192 192
193 193
194 194 Parallel options pricing
195 195 ========================
196 196
197 197 An option is a financial contract that gives the buyer of the contract the
198 198 right to buy (a "call") or sell (a "put") a secondary asset (a stock for
199 199 example) at a particular date in the future (the expiration date) for a
200 200 pre-agreed upon price (the strike price). For this right, the buyer pays the
201 201 seller a premium (the option price). There are a wide variety of flavors of
202 202 options (American, European, Asian, etc.) that are useful for different
203 203 purposes: hedging against risk, speculation, etc.
204 204
205 205 Much of modern finance is driven by the need to price these contracts
206 206 accurately based on what is known about the properties (such as volatility) of
207 207 the underlying asset. One method of pricing options is to use a Monte Carlo
208 208 simulation of the underlying asset price. In this example we use this approach
209 209 to price both European and Asian (path dependent) options for various strike
210 210 prices and volatilities.
211 211
212 212 The code for this example can be found in the :file:`docs/examples/newparallel`
213 213 directory of the IPython source. The function :func:`price_options` in
214 214 :file:`mcpricer.py` implements the basic Monte Carlo pricing algorithm using
215 215 the NumPy package and is shown here:
216 216
217 217 .. literalinclude:: ../../examples/newparallel/mcpricer.py
218 218 :language: python
219 219
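If you are reading this without the example files at hand, the pricer has
roughly the following shape. This is only a simplified sketch (path counts,
argument names and details differ in the real :file:`mcpricer.py`):

.. sourcecode:: python

    import numpy as np

    def price_options(S=100.0, K=100.0, sigma=0.25, r=0.05, days=260, paths=10000):
        """Monte Carlo price of European and Asian call/put options (sketch)."""
        h = 1.0 / days                      # time step of one trading day
        const1 = np.exp((r - 0.5 * sigma ** 2) * h)
        const2 = sigma * np.sqrt(h)
        stock = S * np.ones(paths)
        path_sum = np.zeros(paths)
        for day in range(days):
            # geometric Brownian motion step for every path at once
            stock *= const1 * np.exp(const2 * np.random.standard_normal(paths))
            path_sum += stock
        avg_stock = path_sum / days         # path average, used by Asian options
        disc = np.exp(-r)                   # discount factor over one year
        return dict(
            ecall=disc * np.maximum(stock - K, 0).mean(),      # European call
            eput=disc * np.maximum(K - stock, 0).mean(),       # European put
            acall=disc * np.maximum(avg_stock - K, 0).mean(),  # Asian call
            aput=disc * np.maximum(K - avg_stock, 0).mean(),   # Asian put
        )
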
220 220 To run this code in parallel, we will use IPython's :class:`LoadBalancedView` class,
221 221 which distributes work to the engines using dynamic load balancing. This
222 222 view is constructed from the :class:`Client` class shown in
223 223 the previous example. The parallel calculation using :class:`LoadBalancedView` can
224 224 be found in the file :file:`mcdriver.py`. The code in this file creates a
225 225 :class:`LoadBalancedView` instance and then submits a set of asynchronous
226 226 tasks that calculate the option prices for different
227 227 volatilities and strike prices. The results are then plotted as a 2D contour
228 228 plot using Matplotlib.
229 229
230 230 .. literalinclude:: ../../examples/newparallel/mcdriver.py
231 231 :language: python
232 232
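Conceptually, the driver loops over a grid of strike prices and volatilities
and submits one task per (strike, volatility) pair. A minimal sketch of that
pattern, assuming :func:`price_options` is defined as above and an
:command:`ipcluster` is already running (the real :file:`mcdriver.py` differs
in details), might look like:

.. sourcecode:: python

    import numpy as np
    from IPython.parallel import Client

    rc = Client()                       # connect to the running cluster
    view = rc.load_balanced_view()      # dynamic load balancing

    sigma_vals = np.linspace(0.0, 0.2, 5)   # volatilities to scan
    K_vals = np.linspace(90.0, 110.0, 5)    # strike prices to scan

    # submit one asynchronous task per (volatility, strike) pair
    ars = [view.apply_async(price_options, 100.0, K, sigma)
           for sigma in sigma_vals for K in K_vals]

    view.wait(ars)                      # block until every task has finished
    results = [ar.get() for ar in ars]
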
233 233 To use this code, start an IPython cluster using :command:`ipcluster`, open
234 234 IPython in the pylab mode with the file :file:`mcdriver.py` in your current
235 235 working directory and then type:
236 236
237 237 .. sourcecode:: ipython
238 238
239 239 In [7]: run mcdriver.py
240 240 Submitted tasks: [0, 1, 2, ...]
241 241
242 242 Once all the tasks have finished, the results can be plotted using the
243 243 :func:`plot_options` function. Here we make contour plots of the Asian
244 244 call and Asian put options as function of the volatility and strike price:
245 245
246 246 .. sourcecode:: ipython
247 247
248 248 In [8]: plot_options(sigma_vals, K_vals, prices['acall'])
249 249
250 250 In [9]: plt.figure()
251 251 Out[9]: <matplotlib.figure.Figure object at 0x18c178d0>
252 252
253 253 In [10]: plot_options(sigma_vals, K_vals, prices['aput'])
254 254
255 255 These results are shown in the two figures below. On an 8 core cluster the
256 256 entire calculation (10 strike prices, 10 volatilities, 100,000 paths for each)
257 257 took 30 seconds in parallel, giving a speedup of 7.7x, which is comparable
258 258 to the speedup observed in our previous example.
259 259
260 260 .. image:: asian_call.*
261 261
262 262 .. image:: asian_put.*
263 263
264 264 Conclusion
265 265 ==========
266 266
267 267 To conclude these examples, we summarize the key features of IPython's
268 268 parallel architecture that have been demonstrated:
269 269
270 270 * Serial code can often be parallelized with only a few extra lines of code.
271 271 We have used the :class:`DirectView` and :class:`LoadBalancedView` classes
272 272 for this purpose.
273 273 * The resulting parallel code can be run without ever leaving IPython's
274 274 interactive shell.
275 275 * Any data computed in parallel can be explored interactively through
276 276 visualization or further numerical calculations.
277 277 * We have run these examples on a cluster running Windows HPC Server 2008.
278 278 IPython's built-in support for the Windows HPC job scheduler makes it
279 279 easy to get started with IPython's parallel capabilities.
280 280
281 281 .. note::
282 282
283 283 The newparallel code has never been run on Windows HPC Server, so the last
284 284 conclusion is untested.
@@ -1,253 +1,253 b''
1 1 .. _ip1par:
2 2
3 3 ============================
4 4 Overview and getting started
5 5 ============================
6 6
7 7 Introduction
8 8 ============
9 9
10 10 This section gives an overview of IPython's sophisticated and powerful
11 11 architecture for parallel and distributed computing. This architecture
12 12 abstracts out parallelism in a very general way, which enables IPython to
13 13 support many different styles of parallelism including:
14 14
15 15 * Single program, multiple data (SPMD) parallelism.
16 16 * Multiple program, multiple data (MPMD) parallelism.
17 17 * Message passing using MPI.
18 18 * Task farming.
19 19 * Data parallel.
20 20 * Combinations of these approaches.
21 21 * Custom user defined approaches.
22 22
23 23 Most importantly, IPython enables all types of parallel applications to
24 24 be developed, executed, debugged and monitored *interactively*. Hence,
25 25 the ``I`` in IPython. The following are some example use cases for IPython:
26 26
27 27 * Quickly parallelize algorithms that are embarrassingly parallel
28 28 using a number of simple approaches. Many simple things can be
29 29 parallelized interactively in one or two lines of code.
30 30
31 31 * Steer traditional MPI applications on a supercomputer from an
32 32 IPython session on your laptop.
33 33
34 34 * Analyze and visualize large datasets (that could be remote and/or
35 35 distributed) interactively using IPython and tools like
36 36 matplotlib/TVTK.
37 37
38 38 * Develop, test and debug new parallel algorithms
39 39 (that may use MPI) interactively.
40 40
41 41 * Tie together multiple MPI jobs running on different systems into
42 42 one giant distributed and parallel system.
43 43
44 44 * Start a parallel job on your cluster and then have a remote
45 45 collaborator connect to it and pull back data into their
46 46 local IPython session for plotting and analysis.
47 47
48 48 * Run a set of tasks on a set of CPUs using dynamic load balancing.
49 49
50 50 Architecture overview
51 51 =====================
52 52
53 53 The IPython architecture consists of four components:
54 54
55 55 * The IPython engine.
56 56 * The IPython hub.
57 57 * The IPython schedulers.
58 58 * The controller client.
59 59
60 60 These components live in the :mod:`IPython.parallel` package and are
61 61 installed with IPython. They do, however, have additional dependencies
62 62 that must be installed. For more information, see our
63 63 :ref:`installation documentation <install_index>`.
64 64
65 65 .. TODO: include zmq in install_index
66 66
67 67 IPython engine
68 68 ---------------
69 69
70 70 The IPython engine is a Python instance that takes Python commands over a
71 71 network connection. Eventually, the IPython engine will be a full IPython
72 72 interpreter, but for now, it is a regular Python interpreter. The engine
73 73 can also handle incoming and outgoing Python objects sent over a network
74 74 connection. When multiple engines are started, parallel and distributed
75 75 computing becomes possible. An important feature of an IPython engine is
76 76 that it blocks while user code is being executed. Read on for how the
77 77 IPython controller solves this problem to expose a clean asynchronous API
78 78 to the user.
79 79
80 80 IPython controller
81 81 ------------------
82 82
83 83 The IPython controller processes provide an interface for working with a set of engines.
84 84 At a general level, the controller is a collection of processes to which IPython engines
85 85 and clients can connect. The controller is composed of a :class:`Hub` and a collection of
86 86 :class:`Schedulers`. These Schedulers are typically run in separate processes on the
87 87 same machine as the Hub, but they can be run anywhere, from local threads to remote machines.
88 88
89 89 The controller also provides a single point of contact for users who wish to
90 90 utilize the engines connected to the controller. There are different ways of
91 91 working with a controller. In IPython, all of these models are implemented via
92 92 the client's :meth:`.View.apply` method, with various arguments, or
93 93 constructing :class:`.View` objects to represent subsets of engines. The two
94 94 primary models for interacting with engines are:
95 95
96 96 * A **Direct** interface, where engines are addressed explicitly.
97 97 * A **LoadBalanced** interface, where the Scheduler is trusted with assigning work to
98 98 appropriate engines.
99 99
100 100 Advanced users can readily extend the View models to enable other
101 101 styles of parallelism.
102 102
103 103 .. note::
104 104
105 105 A single controller and set of engines can be used with multiple models
106 106 simultaneously. This opens the door for lots of interesting things.
107 107
108 108
109 109 The Hub
110 110 *******
111 111
112 112 The center of an IPython cluster is the Hub. This is the process that keeps
113 113 track of engine connections, schedulers, clients, as well as all task requests and
114 114 results. The primary role of the Hub is to facilitate queries of the cluster state, and
115 115 minimize the necessary information required to establish the many connections involved in
116 116 connecting new clients and engines.
117 117
118 118
119 119 Schedulers
120 120 **********
121 121
122 122 All actions that can be performed on the engine go through a Scheduler. While the engines
123 123 themselves block when user code is run, the schedulers hide that from the user to provide
124 124 a fully asynchronous interface to a set of engines.
125 125
126 126
127 127 IPython client and views
128 128 ------------------------
129 129
130 130 There is one primary object, the :class:`~.parallel.Client`, for connecting to a cluster.
131 131 For each execution model, there is a corresponding :class:`~.parallel.View`. These views
132 132 allow users to interact with a set of engines through that interface. Here are the two default
133 133 views (a short usage sketch follows the list):
134 134
135 135 * The :class:`DirectView` class for explicit addressing.
136 136 * The :class:`LoadBalancedView` class for destination-agnostic scheduling.
137 137
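As a minimal sketch (assuming a cluster is already running with the default
profile), the two views are created like this:

.. sourcecode:: python

    from IPython.parallel import Client

    rc = Client()                      # connect using the default profile
    dview = rc[:]                      # DirectView: address all engines explicitly
    lview = rc.load_balanced_view()    # LoadBalancedView: let the scheduler choose

    dview.apply_sync(lambda: 'hello')       # runs on every engine in the view
    lview.apply_sync(lambda x: x * 2, 21)   # runs on whichever engine is free
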
138 138 Security
139 139 --------
140 140
141 141 IPython uses ZeroMQ for networking, which has provided many advantages, but
142 142 one of the setbacks is its utter lack of security [ZeroMQ]_. By default, no IPython
143 143 connections are encrypted, but open ports only listen on localhost. The only
144 144 source of security for IPython is via ssh-tunnel. IPython supports both shell
145 145 (`openssh`) and `paramiko` based tunnels for connections. There is a key necessary
146 146 to submit requests, but due to the lack of encryption, it does not provide
147 147 significant security if loopback traffic is compromised.
148 148
149 149 In our architecture, the controller is the only process that listens on
150 150 network ports, and is thus the main point of vulnerability. The standard model
151 151 for secure connections is to designate that the controller listen on
152 152 localhost, and use ssh-tunnels to connect clients and/or
153 153 engines.
154 154
155 155 To connect and authenticate to the controller an engine or client needs
156 156 some information that the controller has stored in a JSON file.
157 157 Thus, the JSON files need to be copied to a location where
158 158 the clients and engines can find them. Typically, this is the
159 159 :file:`~/.ipython/profile_default/security` directory on the host where the
160 160 client/engine is running (which could be a different host than the controller).
161 161 Once the JSON files are copied over, everything should work fine.
162 162
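For example, copying the client connection file from the controller's machine
to another machine might look like this (hypothetical host name)::

    $ scp controllerhost:~/.ipython/profile_default/security/ipcontroller-client.json \
          ~/.ipython/profile_default/security/
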
163 163 Currently, there are two JSON files that the controller creates:
164 164
165 165 ipcontroller-engine.json
166 166 This JSON file has the information necessary for an engine to connect
167 167 to a controller.
168 168
169 169 ipcontroller-client.json
170 170 The client's connection information. This may not differ from the engine's,
171 171 but since the controller may listen on different ports for clients and
172 172 engines, it is stored separately.
173 173
174 174 More details of how these JSON files are used are given below.
175 175
176 176 A detailed description of the security model and its implementation in IPython
177 177 can be found :ref:`here <parallelsecurity>`.
178 178
179 179 .. warning::
180 180
181 181 Even at its most secure, the Controller listens on ports on localhost, and
182 182 every time you make a tunnel, you open a localhost port on the connecting
183 183 machine that points to the Controller. If localhost on the Controller's
184 184 machine, or the machine of any client or engine, is untrusted, then your
185 185 Controller is insecure. There is no way around this with ZeroMQ.
186 186
187 187
188 188
189 189 Getting Started
190 190 ===============
191 191
192 192 To use IPython for parallel computing, you need to start one instance of the
193 193 controller and one or more instances of the engine. Initially, it is best to
194 194 simply start a controller and engines on a single host using the
195 195 :command:`ipcluster` command. To start a controller and 4 engines on your
196 196 localhost, just do::
197 197
198 $ ipcluster start n=4
198 $ ipcluster start --n=4
199 199
200 200 More details about starting the IPython controller and engines can be found
201 201 :ref:`here <parallel_process>`
202 202
203 203 Once you have started the IPython controller and one or more engines, you
204 204 are ready to use the engines to do something useful. To make sure
205 205 everything is working correctly, try the following commands:
206 206
207 207 .. sourcecode:: ipython
208 208
209 209 In [1]: from IPython.parallel import Client
210 210
211 211 In [2]: c = Client()
212 212
213 213 In [4]: c.ids
214 214 Out[4]: set([0, 1, 2, 3])
215 215
216 216 In [5]: c[:].apply_sync(lambda : "Hello, World")
217 217 Out[5]: [ 'Hello, World', 'Hello, World', 'Hello, World', 'Hello, World' ]
218 218
219 219
220 220 When a client is created with no arguments, the client tries to find the corresponding JSON file
221 221 in the local `~/.ipython/profile_default/security` directory. Alternatively, if you created the
222 222 cluster with a specific profile, you can pass that profile name to the Client. This should cover most cases:
223 223
224 224 .. sourcecode:: ipython
225 225
226 226 In [2]: c = Client(profile='myprofile')
227 227
228 228 If you have put the JSON file in a different location or it has a different name, create the
229 229 client like this:
230 230
231 231 .. sourcecode:: ipython
232 232
233 233 In [2]: c = Client('/path/to/my/ipcontroller-client.json')
234 234
235 235 Remember, a client needs to be able to see the Hub's ports to connect. So if the Hub is on a
236 236 different machine, you may need to use an SSH server to tunnel access to that machine,
237 237 in which case you would connect to it with:
238 238
239 239 .. sourcecode:: ipython
240 240
241 241 In [2]: c = Client(sshserver='myhub.example.com')
242 242
243 243 where ``myhub.example.com`` is the URL or IP address of the machine on
244 244 which the Hub process is running (or another machine that has direct access to the Hub's ports).
245 245
246 246 The SSH server may already be specified in ipcontroller-client.json, if the controller was
247 247 configured to use it when it was launched.
248 248
249 249 You are now ready to learn more about the :ref:`Direct
250 250 <parallel_multiengine>` and :ref:`LoadBalanced <parallel_task>` interfaces to the
251 251 controller.
252 252
253 253 .. [ZeroMQ] ZeroMQ. http://www.zeromq.org
@@ -1,156 +1,156 b''
1 1 .. _parallelmpi:
2 2
3 3 =======================
4 4 Using MPI with IPython
5 5 =======================
6 6
7 7 .. note::
8 8
9 9 Not adapted to zmq yet
10 10 This is also out of date with respect to ipcluster in general.
11 11
12 12 Often, a parallel algorithm will require moving data between the engines. One
13 13 way of accomplishing this is by doing a pull and then a push using the
14 14 multiengine client. However, this will be slow as all the data has to go
15 15 through the controller to the client and then back through the controller, to
16 16 its final destination.
17 17
18 18 A much better way of moving data between engines is to use a message passing
19 19 library, such as the Message Passing Interface (MPI) [MPI]_. IPython's
20 20 parallel computing architecture has been designed from the ground up to
21 21 integrate with MPI. This document describes how to use MPI with IPython.
22 22
23 23 Additional installation requirements
24 24 ====================================
25 25
26 26 If you want to use MPI with IPython, you will need to install:
27 27
28 28 * A standard MPI implementation such as OpenMPI [OpenMPI]_ or MPICH.
29 29 * The mpi4py [mpi4py]_ package.
30 30
31 31 .. note::
32 32
33 33 The mpi4py package is not a strict requirement. However, you need to
34 34 have *some* way of calling MPI from Python. You also need some way of
35 35 making sure that :func:`MPI_Init` is called when the IPython engines start
36 36 up. There are a number of ways of doing this and a good number of
37 37 associated subtleties. We highly recommend just using mpi4py as it
38 38 takes care of most of these problems. If you want to do something
39 39 different, let us know and we can help you get started.
40 40
41 41 Starting the engines with MPI enabled
42 42 =====================================
43 43
44 44 To use code that calls MPI, there are typically two things that MPI requires.
45 45
46 46 1. The process that wants to call MPI must be started using
47 47 :command:`mpiexec` or a batch system (like PBS) that has MPI support.
48 48 2. Once the process starts, it must call :func:`MPI_Init`.
49 49
50 50 There are a couple of ways that you can start the IPython engines and get
51 51 these things to happen.
52 52
53 53 Automatic starting using :command:`mpiexec` and :command:`ipcluster`
54 54 --------------------------------------------------------------------
55 55
56 56 The easiest approach is to use the `MPIExec` Launchers in :command:`ipcluster`,
57 57 which will first start a controller and then a set of engines using
58 58 :command:`mpiexec`::
59 59
60 $ ipcluster start n=4 elauncher=MPIExecEngineSetLauncher
60 $ ipcluster start --n=4 --elauncher=MPIExecEngineSetLauncher
61 61
62 62 This approach is best as interrupting :command:`ipcluster` will automatically
63 63 stop and clean up the controller and engines.
64 64
65 65 Manual starting using :command:`mpiexec`
66 66 ----------------------------------------
67 67
68 68 If you want to start the IPython engines using the :command:`mpiexec`, just
69 69 do::
70 70
71 $ mpiexec n=4 ipengine mpi=mpi4py
71 $ mpiexec n=4 ipengine --mpi=mpi4py
72 72
73 73 This requires that you already have a controller running and that the FURL
74 74 files for the engines are in place. We also have built-in support for
75 75 PyTrilinos [PyTrilinos]_, which can be used (assuming it is installed) by
76 76 starting the engines with::
77 77
78 $ mpiexec n=4 ipengine mpi=pytrilinos
78 $ mpiexec n=4 ipengine --mpi=pytrilinos
79 79
80 80 Automatic starting using PBS and :command:`ipcluster`
81 81 ------------------------------------------------------
82 82
83 83 The :command:`ipcluster` command also has built-in integration with PBS. For
84 84 more information on this approach, see our documentation on :ref:`ipcluster
85 85 <parallel_process>`.
86 86
87 87 Actually using MPI
88 88 ==================
89 89
90 90 Once the engines are running with MPI enabled, you are ready to go. You can
91 91 now call any code that uses MPI in the IPython engines. And, all of this can
92 92 be done interactively. Here we show a simple example that uses mpi4py
93 93 [mpi4py]_ version 1.1.0 or later.
94 94
95 95 First, let's define a simple function that uses MPI to calculate the sum of a
96 96 distributed array. Save the following text in a file called :file:`psum.py`:
97 97
98 98 .. sourcecode:: python
99 99
100 100 from mpi4py import MPI
101 101 import numpy as np
102 102
103 103     def psum(a):
104 104         s = np.sum(a)               # local sum on this engine
105 105         rcvBuf = np.array(0.0,'d')  # buffer for the global result
106 106         MPI.COMM_WORLD.Allreduce([s, MPI.DOUBLE],
107 107             [rcvBuf, MPI.DOUBLE],
108 108             op=MPI.SUM)             # sum the local results across all MPI processes
109 109         return rcvBuf
110 110
111 111 Now, start an IPython cluster::
112 112
113 $ ipcluster start profile=mpi n=4
113 $ ipcluster start --profile=mpi --n=4
114 114
115 115 .. note::
116 116
117 117 It is assumed here that the mpi profile has been set up, as described :ref:`here
118 118 <parallel_process>`.
119 119
120 120 Finally, connect to the cluster and use this function interactively. In this
121 121 case, we create a random array on each engine and sum up all the random arrays
122 122 using our :func:`psum` function:
123 123
124 124 .. sourcecode:: ipython
125 125
126 126 In [1]: from IPython.parallel import Client
127 127
128 128 In [2]: %load_ext parallel_magic
129 129
130 130 In [3]: c = Client(profile='mpi')
131 131
132 132 In [4]: view = c[:]
133 133
134 134 In [5]: view.activate()
135 135
136 136 # run the contents of the file on each engine:
137 137 In [6]: view.run('psum.py')
138 138
139 139 In [6]: px a = np.random.rand(100)
140 140 Parallel execution on engines: [0,1,2,3]
141 141
142 142 In [8]: px s = psum(a)
143 143 Parallel execution on engines: [0,1,2,3]
144 144
145 145 In [9]: view['s']
146 146 Out[9]: [187.451545803,187.451545803,187.451545803,187.451545803]
147 147
148 148 Any Python code that makes calls to MPI can be used in this manner, including
149 149 compiled C, C++ and Fortran libraries that have been exposed to Python.
150 150
151 151 .. [MPI] Message Passing Interface. http://www-unix.mcs.anl.gov/mpi/
152 152 .. [mpi4py] MPI for Python. mpi4py: http://mpi4py.scipy.org/
153 153 .. [OpenMPI] Open MPI. http://www.open-mpi.org/
154 154 .. [PyTrilinos] PyTrilinos. http://trilinos.sandia.gov/packages/pytrilinos/
155 155
156 156
@@ -1,847 +1,847 b''
1 1 .. _parallel_multiengine:
2 2
3 3 ==========================
4 4 IPython's Direct interface
5 5 ==========================
6 6
7 7 The direct, or multiengine, interface represents one possible way of working with a set of
8 8 IPython engines. The basic idea behind the multiengine interface is that the
9 9 capabilities of each engine are directly and explicitly exposed to the user.
10 10 Thus, in the multiengine interface, each engine is given an id that is used to
11 11 identify the engine and give it work to do. This interface is very intuitive
12 12 and is designed with interactive usage in mind, and is the best place for
13 13 new users of IPython to begin.
14 14
15 15 Starting the IPython controller and engines
16 16 ===========================================
17 17
18 18 To follow along with this tutorial, you will need to start the IPython
19 19 controller and four IPython engines. The simplest way of doing this is to use
20 20 the :command:`ipcluster` command::
21 21
22 $ ipcluster start n=4
22 $ ipcluster start --n=4
23 23
24 24 For more detailed information about starting the controller and engines, see
25 25 our :ref:`introduction <ip1par>` to using IPython for parallel computing.
26 26
27 27 Creating a ``Client`` instance
28 28 ==============================
29 29
30 30 The first step is to import the :mod:`IPython.parallel`
31 31 module and then create a :class:`.Client` instance:
32 32
33 33 .. sourcecode:: ipython
34 34
35 35 In [1]: from IPython.parallel import Client
36 36
37 37 In [2]: rc = Client()
38 38
39 39 This form assumes that the default connection information (stored in
40 40 :file:`ipcontroller-client.json` found in :file:`IPYTHON_DIR/profile_default/security`) is
41 41 accurate. If the controller was started on a remote machine, you must copy that connection
42 42 file to the client machine, or enter its contents as arguments to the Client constructor:
43 43
44 44 .. sourcecode:: ipython
45 45
46 46 # If you have copied the json connector file from the controller:
47 47 In [2]: rc = Client('/path/to/ipcontroller-client.json')
48 48 # or to connect with a specific profile you have set up:
49 49 In [3]: rc = Client(profile='mpi')
50 50
51 51
52 52 To make sure there are engines connected to the controller, users can get a list
53 53 of engine ids:
54 54
55 55 .. sourcecode:: ipython
56 56
57 57 In [3]: rc.ids
58 58 Out[3]: [0, 1, 2, 3]
59 59
60 60 Here we see that there are four engines ready to do work for us.
61 61
62 62 For direct execution, we will make use of a :class:`DirectView` object, which can be
63 63 constructed via list-access to the client:
64 64
65 65 .. sourcecode:: ipython
66 66
67 67 In [4]: dview = rc[:] # use all engines
68 68
69 69 .. seealso::
70 70
71 71 For more information, see the in-depth explanation of :ref:`Views <parallel_details>`.
72 72
73 73
74 74 Quick and easy parallelism
75 75 ==========================
76 76
77 77 In many cases, you simply want to apply a Python function to a sequence of
78 78 objects, but *in parallel*. The client interface provides a simple way
79 79 of accomplishing this: using the DirectView's :meth:`~DirectView.map` method.
80 80
81 81 Parallel map
82 82 ------------
83 83
84 84 Python's builtin :func:`map` function allows a function to be applied to a
85 85 sequence element-by-element. This type of code is typically trivial to
86 86 parallelize. In fact, since IPython's interface is all about functions anyway,
87 87 you can just use the builtin :func:`map` with a :class:`RemoteFunction`, or a
88 88 DirectView's :meth:`map` method:
89 89
90 90 .. sourcecode:: ipython
91 91
92 92 In [62]: serial_result = map(lambda x:x**10, range(32))
93 93
94 94 In [63]: parallel_result = dview.map_sync(lambda x: x**10, range(32))
95 95
96 96 In [67]: serial_result==parallel_result
97 97 Out[67]: True
98 98
99 99
100 100 .. note::
101 101
102 102 The :class:`DirectView`'s version of :meth:`map` does
103 103 not do dynamic load balancing. For a load balanced version, use a
104 104 :class:`LoadBalancedView`.
105 105
106 106 .. seealso::
107 107
108 108 :meth:`map` is implemented via :class:`ParallelFunction`.
109 109
110 110 Remote function decorators
111 111 --------------------------
112 112
113 113 Remote functions are just like normal functions, but when they are called,
114 114 they execute on one or more engines, rather than locally. IPython provides
115 115 two decorators:
116 116
117 117 .. sourcecode:: ipython
118 118
119 119 In [10]: @dview.remote(block=True)
120 120 ...: def getpid():
121 121 ...: import os
122 122 ...: return os.getpid()
123 123 ...:
124 124
125 125 In [11]: getpid()
126 126 Out[11]: [12345, 12346, 12347, 12348]
127 127
128 128 The ``@parallel`` decorator creates parallel functions that break up element-wise
129 129 operations, distribute them, and reconstruct the result.
130 130
131 131 .. sourcecode:: ipython
132 132
133 133 In [12]: import numpy as np
134 134
135 135 In [13]: A = np.random.random((64,48))
136 136
137 137 In [14]: @dview.parallel(block=True)
138 138 ...: def pmul(A,B):
139 139 ...: return A*B
140 140
141 141 In [15]: C_local = A*A
142 142
143 143 In [16]: C_remote = pmul(A,A)
144 144
145 145 In [17]: (C_local == C_remote).all()
146 146 Out[17]: True
147 147
148 148 .. seealso::
149 149
150 150 See the docstrings for the :func:`parallel` and :func:`remote` decorators for
151 151 options.
152 152
153 153 Calling Python functions
154 154 ========================
155 155
156 156 The most basic type of operation that can be performed on the engines is to
157 157 execute Python code or call Python functions. Executing Python code can be
158 158 done in blocking or non-blocking mode (non-blocking is default) using the
159 159 :meth:`.View.execute` method, and calling functions can be done via the
160 160 :meth:`.View.apply` method.
161 161
162 162 apply
163 163 -----
164 164
165 165 The main method for doing remote execution (in fact, all methods that
166 166 communicate with the engines are built on top of it), is :meth:`View.apply`.
167 167
168 168 We strive to provide the cleanest interface we can, so `apply` has the following
169 169 signature:
170 170
171 171 .. sourcecode:: python
172 172
173 173 view.apply(f, *args, **kwargs)
174 174
175 175 There are various ways to call functions with IPython, and these flags are set as
176 176 attributes of the View. The ``DirectView`` has just two of these flags:
177 177
178 178 dv.block : bool
179 179 whether to wait for the result, or return an :class:`AsyncResult` object
180 180 immediately
181 181 dv.track : bool
182 182     whether to instruct pyzmq to track when zeromq is done sending the message.
183 183 This is primarily useful for non-copying sends of numpy arrays that you plan to
184 184 edit in-place. You need to know when it becomes safe to edit the buffer
185 185 without corrupting the message.
186 186
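Both flags are plain attributes on the view, so setting them might look like
this (a brief sketch):

.. sourcecode:: ipython

    In [6]: dview.block = True     # wait for results instead of returning AsyncResults

    In [7]: dview.track = False    # do not ask pyzmq to track message sends

    In [8]: dview.apply(lambda x: x * 2, 21)
    Out[8]: [42, 42, 42, 42]
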
187 187
188 188 Creating a view is simple: index-access on a client creates a :class:`.DirectView`.
189 189
190 190 .. sourcecode:: ipython
191 191
192 192 In [4]: view = rc[1:3]
193 193 Out[4]: <DirectView [1, 2]>
194 194
195 195 In [5]: view.apply<tab>
196 196 view.apply view.apply_async view.apply_sync
197 197
198 198 For convenience, you can set block temporarily for a single call with the extra sync/async methods.
199 199
200 200 Blocking execution
201 201 ------------------
202 202
203 203 In blocking mode, the :class:`.DirectView` object (called ``dview`` in
204 204 these examples) submits the command to the controller, which places the
205 205 command in the engines' queues for execution. The :meth:`apply` call then
206 206 blocks until the engines are done executing the command:
207 207
208 208 .. sourcecode:: ipython
209 209
210 210 In [2]: dview = rc[:] # A DirectView of all engines
211 211 In [3]: dview.block=True
212 212 In [4]: dview['a'] = 5
213 213
214 214 In [5]: dview['b'] = 10
215 215
216 216 In [6]: dview.apply(lambda x: a+b+x, 27)
217 217 Out[6]: [42, 42, 42, 42]
218 218
219 219 You can also select blocking execution on a call-by-call basis with the :meth:`apply_sync`
220 220 method:
221 221 
.. sourcecode:: ipython

222 222 In [7]: dview.block=False
223 223
224 224 In [8]: dview.apply_sync(lambda x: a+b+x, 27)
225 225 Out[8]: [42, 42, 42, 42]
226 226
227 227 Python commands can be executed as strings on specific engines by using a View's ``execute``
228 228 method:
229 229
230 230 .. sourcecode:: ipython
231 231
232 232 In [6]: rc[::2].execute('c=a+b')
233 233
234 234 In [7]: rc[1::2].execute('c=a-b')
235 235
236 236 In [8]: dview['c'] # shorthand for dview.pull('c', block=True)
237 237 Out[8]: [15, -5, 15, -5]
238 238
239 239
240 240 Non-blocking execution
241 241 ----------------------
242 242
243 243 In non-blocking mode, :meth:`apply` submits the command to be executed and
244 244 then returns an :class:`AsyncResult` object immediately. The
245 245 :class:`AsyncResult` object gives you a way of getting a result at a later
246 246 time through its :meth:`get` method.
247 247
248 248 .. Note::
249 249
250 250 The :class:`AsyncResult` object provides a superset of the interface in
251 251 :py:class:`multiprocessing.pool.AsyncResult`. See the
252 252 `official Python documentation <http://docs.python.org/library/multiprocessing#multiprocessing.pool.AsyncResult>`_
253 253 for more.
254 254
255 255
256 256 This allows you to quickly submit long running commands without blocking your
257 257 local Python/IPython session:
258 258
259 259 .. sourcecode:: ipython
260 260
261 261 # define our function
262 262 In [6]: def wait(t):
263 263 ...: import time
264 264 ...: tic = time.time()
265 265 ...: time.sleep(t)
266 266 ...: return time.time()-tic
267 267
268 268 # In non-blocking mode
269 269 In [7]: ar = dview.apply_async(wait, 2)
270 270
271 271 # Now block for the result
272 272 In [8]: ar.get()
273 273 Out[8]: [2.0006198883056641, 1.9997570514678955, 1.9996809959411621, 2.0003249645233154]
274 274
275 275 # Again in non-blocking mode
276 276 In [9]: ar = dview.apply_async(wait, 10)
277 277
278 278 # Poll to see if the result is ready
279 279 In [10]: ar.ready()
280 280 Out[10]: False
281 281
282 282 # ask for the result, but wait a maximum of 1 second:
283 283 In [45]: ar.get(1)
284 284 ---------------------------------------------------------------------------
285 285 TimeoutError Traceback (most recent call last)
286 286 /home/you/<ipython-input-45-7cd858bbb8e0> in <module>()
287 287 ----> 1 ar.get(1)
288 288
289 289 /path/to/site-packages/IPython/parallel/asyncresult.pyc in get(self, timeout)
290 290 62 raise self._exception
291 291 63 else:
292 292 ---> 64 raise error.TimeoutError("Result not ready.")
293 293 65
294 294 66 def ready(self):
295 295
296 296 TimeoutError: Result not ready.
297 297
298 298 .. Note::
299 299
300 300 Note the import inside the function. This is a common model, to ensure
301 301 that the appropriate modules are imported where the task is run. You can
302 302 also manually import modules into the engine(s) namespace(s) via
303 303 :meth:`view.execute('import numpy')`.
304 304
305 305 Often, it is desirable to wait until a set of :class:`AsyncResult` objects
306 306 are done. For this, there is a the method :meth:`wait`. This method takes a
307 307 tuple of :class:`AsyncResult` objects (or `msg_ids` or indices to the client's History),
308 308 and blocks until all of the associated results are ready:
309 309
310 310 .. sourcecode:: ipython
311 311
312 312 In [72]: dview.block=False
313 313
314 314 # A trivial list of AsyncResults objects
315 315 In [73]: pr_list = [dview.apply_async(wait, 3) for i in range(10)]
316 316
317 317 # Wait until all of them are done
318 318 In [74]: dview.wait(pr_list)
319 319
320 320 # Then, their results are ready using get() or the `.r` attribute
321 321 In [75]: pr_list[0].get()
322 322 Out[75]: [2.9982571601867676, 2.9982588291168213, 2.9987530708312988, 2.9990990161895752]
323 323
324 324
325 325
326 326 The ``block`` and ``targets`` keyword arguments and attributes
327 327 --------------------------------------------------------------
328 328
329 329 Most DirectView methods (excluding :meth:`apply` and :meth:`map`) accept ``block`` and
330 330 ``targets`` as keyword arguments. As we have seen above, these keyword arguments control the
331 331 blocking mode and which engines the command is applied to. The :class:`View` class also has
332 332 :attr:`block` and :attr:`targets` attributes that control the default behavior when the keyword
333 333 arguments are not provided. Thus the following logic is used for :attr:`block` and :attr:`targets`:
334 334
335 335 * If no keyword argument is provided, the instance attributes are used.
336 336 * Keyword arguments, if provided, override the instance attributes for
337 337 the duration of a single call.
338 338
339 339 The following examples demonstrate how to use the instance attributes:
340 340
341 341 .. sourcecode:: ipython
342 342
343 343 In [16]: dview.targets = [0,2]
344 344
345 345 In [17]: dview.block = False
346 346
347 347 In [18]: ar = dview.apply(lambda : 10)
348 348
349 349 In [19]: ar.get()
350 350 Out[19]: [10, 10]
351 351
352 352    In [16]: dview.targets = dview.client.ids # all engines (4)
353 353
354 354 In [21]: dview.block = True
355 355
356 356 In [22]: dview.apply(lambda : 42)
357 357 Out[22]: [42, 42, 42, 42]
358 358
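A keyword argument, by contrast, overrides the attribute for a single call
only. Reusing the ``a`` that was pushed earlier (a sketch):

.. sourcecode:: ipython

    In [23]: dview.pull('a', targets=[0, 2], block=True)
    Out[23]: [5, 5]
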
359 359 The :attr:`block` and :attr:`targets` instance attributes of the
360 360 :class:`.DirectView` also determine the behavior of the parallel magic commands.
361 361
362 362 Parallel magic commands
363 363 -----------------------
364 364
365 365 .. warning::
366 366
367 367 The magics have not been changed to work with the zeromq system. The
368 368 magics do work, but *do not* print stdin/out like they used to in IPython.kernel.
369 369
370 370 We provide a few IPython magic commands (``%px``, ``%autopx`` and ``%result``)
371 371 that make it more pleasant to execute Python commands on the engines
372 372 interactively. These are simply shortcuts to :meth:`execute` and
373 373 :meth:`get_result` of the :class:`DirectView`. The ``%px`` magic executes a single
374 374 Python command on the engines specified by the :attr:`targets` attribute of the
375 375 :class:`DirectView` instance:
376 376
377 377 .. sourcecode:: ipython
378 378
379 379 # load the parallel magic extension:
380 380 In [21]: %load_ext parallelmagic
381 381
382 382 # Create a DirectView for all targets
383 383 In [22]: dv = rc[:]
384 384
385 385 # Make this DirectView active for parallel magic commands
386 386 In [23]: dv.activate()
387 387
388 388 In [24]: dv.block=True
389 389
390 390 In [25]: import numpy
391 391
392 392 In [26]: %px import numpy
393 393 Parallel execution on engines: [0, 1, 2, 3]
394 394
395 395 In [27]: %px a = numpy.random.rand(2,2)
396 396 Parallel execution on engines: [0, 1, 2, 3]
397 397
398 398 In [28]: %px ev = numpy.linalg.eigvals(a)
399 399 Parallel execution on engines: [0, 1, 2, 3]
400 400
401 401 In [28]: dv['ev']
402 402 Out[28]: [ array([ 1.09522024, -0.09645227]),
403 403 array([ 1.21435496, -0.35546712]),
404 404 array([ 0.72180653, 0.07133042]),
405 405 array([ 1.46384341e+00, 1.04353244e-04])
406 406 ]
407 407
408 408 The ``%result`` magic gets the most recent result, or takes an argument
409 409 specifying the index of the result to be requested. It is simply a shortcut to the
410 410 :meth:`get_result` method:
411 411
412 412 .. sourcecode:: ipython
413 413
414 414 In [29]: dv.apply_async(lambda : ev)
415 415
416 416 In [30]: %result
417 417 Out[30]: [ [ 1.28167017 0.14197338],
418 418 [-0.14093616 1.27877273],
419 419 [-0.37023573 1.06779409],
420 420 [ 0.83664764 -0.25602658] ]
421 421
422 422 The ``%autopx`` magic switches to a mode where everything you type is executed
423 423 on the engines given by the :attr:`targets` attribute:
424 424
425 425 .. sourcecode:: ipython
426 426
427 427 In [30]: dv.block=False
428 428
429 429 In [31]: %autopx
430 430 Auto Parallel Enabled
431 431 Type %autopx to disable
432 432
433 433 In [32]: max_evals = []
434 434 <IPython.parallel.AsyncResult object at 0x17b8a70>
435 435
436 436 In [33]: for i in range(100):
437 437 ....: a = numpy.random.rand(10,10)
438 438 ....: a = a+a.transpose()
439 439 ....: evals = numpy.linalg.eigvals(a)
440 440 ....: max_evals.append(evals[0].real)
441 441 ....:
442 442 ....:
443 443 <IPython.parallel.AsyncResult object at 0x17af8f0>
444 444
445 445 In [34]: %autopx
446 446 Auto Parallel Disabled
447 447
448 448 In [35]: dv.block=True
449 449
450 450 In [36]: px ans= "Average max eigenvalue is: %f"%(sum(max_evals)/len(max_evals))
451 451 Parallel execution on engines: [0, 1, 2, 3]
452 452
453 453 In [37]: dv['ans']
454 454 Out[37]: [ 'Average max eigenvalue is: 10.1387247332',
455 455 'Average max eigenvalue is: 10.2076902286',
456 456 'Average max eigenvalue is: 10.1891484655',
457 457 'Average max eigenvalue is: 10.1158837784',]
458 458
459 459
460 460 Moving Python objects around
461 461 ============================
462 462
463 463 In addition to calling functions and executing code on engines, you can
464 464 transfer Python objects to and from your IPython session and the engines. In
465 465 IPython, these operations are called :meth:`push` (sending an object to the
466 466 engines) and :meth:`pull` (getting an object from the engines).
467 467
468 468 Basic push and pull
469 469 -------------------
470 470
471 471 Here are some examples of how you use :meth:`push` and :meth:`pull`:
472 472
473 473 .. sourcecode:: ipython
474 474
475 475 In [38]: dview.push(dict(a=1.03234,b=3453))
476 476 Out[38]: [None,None,None,None]
477 477
478 478 In [39]: dview.pull('a')
479 479 Out[39]: [ 1.03234, 1.03234, 1.03234, 1.03234]
480 480
481 481 In [40]: dview.pull('b', targets=0)
482 482 Out[40]: 3453
483 483
484 484 In [41]: dview.pull(('a','b'))
485 485 Out[41]: [ [1.03234, 3453], [1.03234, 3453], [1.03234, 3453], [1.03234, 3453] ]
486 486
487 487 In [43]: dview.push(dict(c='speed'))
488 488 Out[43]: [None,None,None,None]
489 489
490 490 In non-blocking mode :meth:`push` and :meth:`pull` also return
491 491 :class:`AsyncResult` objects:
492 492
493 493 .. sourcecode:: ipython
494 494
495 495 In [48]: ar = dview.pull('a', block=False)
496 496
497 497 In [49]: ar.get()
498 498 Out[49]: [1.03234, 1.03234, 1.03234, 1.03234]
499 499
500 500
501 501 Dictionary interface
502 502 --------------------
503 503
504 504 Since a Python namespace is just a :class:`dict`, :class:`DirectView` objects provide
505 505 dictionary-style access by key and methods such as :meth:`get` and
506 506 :meth:`update` for convenience. This makes the remote namespaces of the engines
507 507 appear as a local dictionary. Underneath, these methods call :meth:`apply`:
508 508
509 509 .. sourcecode:: ipython
510 510
511 511 In [51]: dview['a']=['foo','bar']
512 512
513 513 In [52]: dview['a']
514 514 Out[52]: [ ['foo', 'bar'], ['foo', 'bar'], ['foo', 'bar'], ['foo', 'bar'] ]
515 515
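The :meth:`get` and :meth:`update` methods work like their :class:`dict`
counterparts, as in this quick sketch:

.. sourcecode:: ipython

    In [53]: dview.block = True

    In [54]: dview.update(dict(b=2, c=3))   # like dict.update, pushes both keys
    Out[54]: [None, None, None, None]

    In [55]: dview.get('b')                 # like dict.get, pulls from every engine
    Out[55]: [2, 2, 2, 2]
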
516 516 Scatter and gather
517 517 ------------------
518 518
519 519 Sometimes it is useful to partition a sequence and push the partitions to
520 520 different engines. In MPI language, this is known as scatter/gather and we
521 521 follow that terminology. However, it is important to remember that in
522 522 IPython's :class:`Client` class, :meth:`scatter` is from the
523 523 interactive IPython session to the engines and :meth:`gather` is from the
524 524 engines back to the interactive IPython session. For scatter/gather operations
525 525 between engines, MPI should be used:
526 526
527 527 .. sourcecode:: ipython
528 528
529 529 In [58]: dview.scatter('a',range(16))
530 530 Out[58]: [None,None,None,None]
531 531
532 532 In [59]: dview['a']
533 533 Out[59]: [ [0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15] ]
534 534
535 535 In [60]: dview.gather('a')
536 536 Out[60]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
537 537
538 538 Other things to look at
539 539 =======================
540 540
541 541 How to do parallel list comprehensions
542 542 --------------------------------------
543 543
544 544 In many cases list comprehensions are nicer than using the map function. While
545 545 we don't have fully parallel list comprehensions, it is simple to get the
546 546 basic effect using :meth:`scatter` and :meth:`gather`:
547 547
548 548 .. sourcecode:: ipython
549 549
550 550 In [66]: dview.scatter('x',range(64))
551 551
552 552 In [67]: %px y = [i**10 for i in x]
553 553 Parallel execution on engines: [0, 1, 2, 3]
554 554 Out[67]:
555 555
556 556 In [68]: y = dview.gather('y')
557 557
558 558 In [69]: print y
559 559 [0, 1, 1024, 59049, 1048576, 9765625, 60466176, 282475249, 1073741824,...]
560 560
561 561 Remote imports
562 562 --------------
563 563
564 564 Sometimes you will want to import packages both in your interactive session
565 565 and on your remote engines. This can be done with the :class:`ContextManager`
566 566 created by a DirectView's :meth:`sync_imports` method:
567 567
568 568 .. sourcecode:: ipython
569 569
570 570 In [69]: with dview.sync_imports():
571 571 ...: import numpy
572 572 importing numpy on engine(s)
573 573
574 574 Any imports made inside the block will also be performed on the view's engines.
575 575 sync_imports also takes a `local` boolean flag that defaults to True, which specifies
576 576 whether the local imports should also be performed. However, support for `local=False`
577 577 has not been implemented, so only packages that can be imported locally will work
578 578 this way.
579 579
580 580 You can also specify imports via the ``@require`` decorator. This is a decorator
581 581 designed for use in Dependencies, but can be used to handle remote imports as well.
582 582 Modules or module names passed to ``@require`` will be imported before the decorated
583 583 function is called. If they cannot be imported, the decorated function will never
584 584 execute, and the task will fail with an UnmetDependencyError.
585 585
586 586 .. sourcecode:: ipython
587 587
588 588 In [69]: from IPython.parallel import require
589 589
590 590    In [70]: @require('re')
591 591       ...: def findall(pat, x):
592 592       ...:     # re is guaranteed to be available
593 593       ...:     return re.findall(pat, x)
594 594
595 595    # you can also pass modules themselves, if you already have them imported locally:
596 596    In [71]: @require(time)
597 597       ...: def wait(t):
598 598       ...:     time.sleep(t)
599 599       ...:     return t
600 600
601 601 .. _parallel_exceptions:
602 602
603 603 Parallel exceptions
604 604 -------------------
605 605
606 606 In the multiengine interface, parallel commands can raise Python exceptions,
607 607 just like serial commands. But, it is a little subtle, because a single
608 608 parallel command can actually raise multiple exceptions (one for each engine
609 609 the command was run on). To express this idea, we have a
610 610 :exc:`CompositeError` exception class that will be raised in most cases. The
611 611 :exc:`CompositeError` class is a special type of exception that wraps one or
612 612 more other types of exceptions. Here is how it works:
613 613
614 614 .. sourcecode:: ipython
615 615
616 616 In [76]: dview.block=True
617 617
618 618 In [77]: dview.execute('1/0')
619 619 ---------------------------------------------------------------------------
620 620 CompositeError Traceback (most recent call last)
621 621 /home/user/<ipython-input-10-5d56b303a66c> in <module>()
622 622 ----> 1 dview.execute('1/0')
623 623
624 624 /path/to/site-packages/IPython/parallel/client/view.pyc in execute(self, code, targets, block)
625 625 591 default: self.block
626 626 592 """
627 627 --> 593 return self._really_apply(util._execute, args=(code,), block=block, targets=targets)
628 628 594
629 629 595 def run(self, filename, targets=None, block=None):
630 630
631 631 /home/user/<string> in _really_apply(self, f, args, kwargs, targets, block, track)
632 632
633 633 /path/to/site-packages/IPython/parallel/client/view.pyc in sync_results(f, self, *args, **kwargs)
634 634 55 def sync_results(f, self, *args, **kwargs):
635 635 56 """sync relevant results from self.client to our results attribute."""
636 636 ---> 57 ret = f(self, *args, **kwargs)
637 637 58 delta = self.outstanding.difference(self.client.outstanding)
638 638 59 completed = self.outstanding.intersection(delta)
639 639
640 640 /home/user/<string> in _really_apply(self, f, args, kwargs, targets, block, track)
641 641
642 642 /path/to/site-packages/IPython/parallel/client/view.pyc in save_ids(f, self, *args, **kwargs)
643 643 44 n_previous = len(self.client.history)
644 644 45 try:
645 645 ---> 46 ret = f(self, *args, **kwargs)
646 646 47 finally:
647 647 48 nmsgs = len(self.client.history) - n_previous
648 648
649 649 /path/to/site-packages/IPython/parallel/client/view.pyc in _really_apply(self, f, args, kwargs, targets, block, track)
650 650 529 if block:
651 651 530 try:
652 652 --> 531 return ar.get()
653 653 532 except KeyboardInterrupt:
654 654 533 pass
655 655
656 656 /path/to/site-packages/IPython/parallel/client/asyncresult.pyc in get(self, timeout)
657 657 101 return self._result
658 658 102 else:
659 659 --> 103 raise self._exception
660 660 104 else:
661 661 105 raise error.TimeoutError("Result not ready.")
662 662
663 663 CompositeError: one or more exceptions from call to method: _execute
664 664 [0:apply]: ZeroDivisionError: integer division or modulo by zero
665 665 [1:apply]: ZeroDivisionError: integer division or modulo by zero
666 666 [2:apply]: ZeroDivisionError: integer division or modulo by zero
667 667 [3:apply]: ZeroDivisionError: integer division or modulo by zero
668 668
669 669 Notice how the error message printed when :exc:`CompositeError` is raised has
670 670 information about the individual exceptions that were raised on each engine.
671 671 If you want, you can even raise one of these original exceptions:
672 672
673 673 .. sourcecode:: ipython
674 674
675 675 In [80]: try:
676 676 ....: dview.execute('1/0')
677 677 ....: except parallel.error.CompositeError as e:
678 678 ....: e.raise_exception()
679 679 ....:
680 680 ....:
681 681 ---------------------------------------------------------------------------
682 682 RemoteError Traceback (most recent call last)
683 683 /home/user/<ipython-input-17-8597e7e39858> in <module>()
684 684 2 dview.execute('1/0')
685 685 3 except CompositeError as e:
686 686 ----> 4 e.raise_exception()
687 687
688 688 /path/to/site-packages/IPython/parallel/error.pyc in raise_exception(self, excid)
689 689 266 raise IndexError("an exception with index %i does not exist"%excid)
690 690 267 else:
691 691 --> 268 raise RemoteError(en, ev, etb, ei)
692 692 269
693 693 270
694 694
695 695 RemoteError: ZeroDivisionError(integer division or modulo by zero)
696 696 Traceback (most recent call last):
697 697 File "/path/to/site-packages/IPython/parallel/engine/streamkernel.py", line 330, in apply_request
698 698 exec code in working,working
699 699 File "<string>", line 1, in <module>
700 700 File "/path/to/site-packages/IPython/parallel/util.py", line 354, in _execute
701 701 exec code in globals()
702 702 File "<string>", line 1, in <module>
703 703 ZeroDivisionError: integer division or modulo by zero
704 704
705 705 If you are working in IPython, you can simply type ``%debug`` after one of
706 706 these :exc:`CompositeError` exceptions is raised, and inspect the exception
707 707 instance:
708 708
709 709 .. sourcecode:: ipython
710 710
711 711 In [81]: dview.execute('1/0')
712 712 ---------------------------------------------------------------------------
713 713 CompositeError Traceback (most recent call last)
714 714 /home/user/<ipython-input-10-5d56b303a66c> in <module>()
715 715 ----> 1 dview.execute('1/0')
716 716
717 717 /path/to/site-packages/IPython/parallel/client/view.pyc in execute(self, code, targets, block)
718 718 591 default: self.block
719 719 592 """
720 720 --> 593 return self._really_apply(util._execute, args=(code,), block=block, targets=targets)
721 721 594
722 722 595 def run(self, filename, targets=None, block=None):
723 723
724 724 /home/user/<string> in _really_apply(self, f, args, kwargs, targets, block, track)
725 725
726 726 /path/to/site-packages/IPython/parallel/client/view.pyc in sync_results(f, self, *args, **kwargs)
727 727 55 def sync_results(f, self, *args, **kwargs):
728 728 56 """sync relevant results from self.client to our results attribute."""
729 729 ---> 57 ret = f(self, *args, **kwargs)
730 730 58 delta = self.outstanding.difference(self.client.outstanding)
731 731 59 completed = self.outstanding.intersection(delta)
732 732
733 733 /home/user/<string> in _really_apply(self, f, args, kwargs, targets, block, track)
734 734
735 735 /path/to/site-packages/IPython/parallel/client/view.pyc in save_ids(f, self, *args, **kwargs)
736 736 44 n_previous = len(self.client.history)
737 737 45 try:
738 738 ---> 46 ret = f(self, *args, **kwargs)
739 739 47 finally:
740 740 48 nmsgs = len(self.client.history) - n_previous
741 741
742 742 /path/to/site-packages/IPython/parallel/client/view.pyc in _really_apply(self, f, args, kwargs, targets, block, track)
743 743 529 if block:
744 744 530 try:
745 745 --> 531 return ar.get()
746 746 532 except KeyboardInterrupt:
747 747 533 pass
748 748
749 749 /path/to/site-packages/IPython/parallel/client/asyncresult.pyc in get(self, timeout)
750 750 101 return self._result
751 751 102 else:
752 752 --> 103 raise self._exception
753 753 104 else:
754 754 105 raise error.TimeoutError("Result not ready.")
755 755
756 756 CompositeError: one or more exceptions from call to method: _execute
757 757 [0:apply]: ZeroDivisionError: integer division or modulo by zero
758 758 [1:apply]: ZeroDivisionError: integer division or modulo by zero
759 759 [2:apply]: ZeroDivisionError: integer division or modulo by zero
760 760 [3:apply]: ZeroDivisionError: integer division or modulo by zero
761 761
762 762 In [82]: %debug
763 763 > /path/to/site-packages/IPython/parallel/client/asyncresult.py(103)get()
764 764 102 else:
765 765 --> 103 raise self._exception
766 766 104 else:
767 767
768 768 # With the debugger running, self._exception is the exception instance. We can tab complete
769 769 # on it and see the extra methods that are available.
770 770 ipdb> self._exception.<tab>
771 771 e.__class__ e.__getitem__ e.__new__ e.__setstate__ e.args
772 772 e.__delattr__ e.__getslice__ e.__reduce__ e.__str__ e.elist
773 773 e.__dict__ e.__hash__ e.__reduce_ex__ e.__weakref__ e.message
774 774 e.__doc__ e.__init__ e.__repr__ e._get_engine_str e.print_tracebacks
775 775 e.__getattribute__ e.__module__ e.__setattr__ e._get_traceback e.raise_exception
776 776 ipdb> self._exception.print_tracebacks()
777 777 [0:apply]:
778 778 Traceback (most recent call last):
779 779 File "/path/to/site-packages/IPython/parallel/engine/streamkernel.py", line 330, in apply_request
780 780 exec code in working,working
781 781 File "<string>", line 1, in <module>
782 782 File "/path/to/site-packages/IPython/parallel/util.py", line 354, in _execute
783 783 exec code in globals()
784 784 File "<string>", line 1, in <module>
785 785 ZeroDivisionError: integer division or modulo by zero
786 786
787 787
788 788 [1:apply]:
789 789 Traceback (most recent call last):
790 790 File "/path/to/site-packages/IPython/parallel/engine/streamkernel.py", line 330, in apply_request
791 791 exec code in working,working
792 792 File "<string>", line 1, in <module>
793 793 File "/path/to/site-packages/IPython/parallel/util.py", line 354, in _execute
794 794 exec code in globals()
795 795 File "<string>", line 1, in <module>
796 796 ZeroDivisionError: integer division or modulo by zero
797 797
798 798
799 799 [2:apply]:
800 800 Traceback (most recent call last):
801 801 File "/path/to/site-packages/IPython/parallel/engine/streamkernel.py", line 330, in apply_request
802 802 exec code in working,working
803 803 File "<string>", line 1, in <module>
804 804 File "/path/to/site-packages/IPython/parallel/util.py", line 354, in _execute
805 805 exec code in globals()
806 806 File "<string>", line 1, in <module>
807 807 ZeroDivisionError: integer division or modulo by zero
808 808
809 809
810 810 [3:apply]:
811 811 Traceback (most recent call last):
812 812 File "/path/to/site-packages/IPython/parallel/engine/streamkernel.py", line 330, in apply_request
813 813 exec code in working,working
814 814 File "<string>", line 1, in <module>
815 815 File "/path/to/site-packages/IPython/parallel/util.py", line 354, in _execute
816 816 exec code in globals()
817 817 File "<string>", line 1, in <module>
818 818 ZeroDivisionError: integer division or modulo by zero
819 819
820 820
821 821 All of this same error handling magic even works in non-blocking mode:
822 822
823 823 .. sourcecode:: ipython
824 824
825 825 In [83]: dview.block=False
826 826
827 827 In [84]: ar = dview.execute('1/0')
828 828
829 829 In [85]: ar.get()
830 830 ---------------------------------------------------------------------------
831 831 CompositeError Traceback (most recent call last)
832 832 /home/user/<ipython-input-21-8531eb3d26fb> in <module>()
833 833 ----> 1 ar.get()
834 834
835 835 /path/to/site-packages/IPython/parallel/client/asyncresult.pyc in get(self, timeout)
836 836 101 return self._result
837 837 102 else:
838 838 --> 103 raise self._exception
839 839 104 else:
840 840 105 raise error.TimeoutError("Result not ready.")
841 841
842 842 CompositeError: one or more exceptions from call to method: _execute
843 843 [0:apply]: ZeroDivisionError: integer division or modulo by zero
844 844 [1:apply]: ZeroDivisionError: integer division or modulo by zero
845 845 [2:apply]: ZeroDivisionError: integer division or modulo by zero
846 846 [3:apply]: ZeroDivisionError: integer division or modulo by zero
847 847
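Rather than dropping into the debugger, you can also handle a :exc:`CompositeError`
programmatically. The following is a minimal sketch (reusing the ``dview`` and the
four engines from the examples above) that catches the error and walks over the
per-engine exceptions via the ``elist`` attribute seen in the tab completion:

.. sourcecode:: ipython

    In [86]: try:
       ....:     dview.execute('1/0', block=True)
       ....: except parallel.error.CompositeError as e:
       ....:     # one entry per engine that raised
       ....:     for remote_err in e.elist:
       ....:         print(remote_err)
       ....:     # or dump the full remote tracebacks, as %debug showed above
       ....:     e.print_tracebacks()
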
@@ -1,691 +1,691 b''
1 1 .. _parallel_process:
2 2
3 3 ===========================================
4 4 Starting the IPython controller and engines
5 5 ===========================================
6 6
7 7 To use IPython for parallel computing, you need to start one instance of
8 8 the controller and one or more instances of the engine. The controller
9 9 and each engine can run on different machines or on the same machine.
10 10 Because of this, there are many different possibilities.
11 11
12 12 Broadly speaking, there are two ways of going about starting a controller and engines:
13 13
14 14 * In an automated manner using the :command:`ipcluster` command.
15 15 * In a more manual way using the :command:`ipcontroller` and
16 16 :command:`ipengine` commands.
17 17
18 18 This document describes both of these methods. We recommend that new users
19 19 start with the :command:`ipcluster` command as it simplifies many common usage
20 20 cases.
21 21
22 22 General considerations
23 23 ======================
24 24
25 25 Before delving into the details about how you can start a controller and
26 26 engines using the various methods, we outline some of the general issues that
27 27 come up when starting the controller and engines. These things come up no
28 28 matter which method you use to start your IPython cluster.
29 29
30 30 If you are running engines on multiple machines, you will likely need to instruct the
31 31 controller to listen for connections on an external interface. This can be done by specifying
32 32 the ``ip`` argument on the command-line, or the ``HubFactory.ip`` configurable in
33 33 :file:`ipcontroller_config.py`.
34 34
35 35 If your machines are on a trusted network, you can safely instruct the controller to listen
36 36 on all public interfaces with::
37 37
38 $> ipcontroller ip=*
38 $> ipcontroller --ip=*
39 39
40 40 Or you can set the same behavior as the default by adding the following line to your :file:`ipcontroller_config.py`:
41 41
42 42 .. sourcecode:: python
43 43
44 44 c.HubFactory.ip = '*'
45 45
46 46 .. note::
47 47
48 48 Due to the lack of security in ZeroMQ, the controller will only listen for connections on
49 49 localhost by default. If you see Timeout errors on engines or clients, then the first
50 50 thing you should check is the ip address the controller is listening on, and make sure
51 51 that it is visible from the timing out machine.
52 52
53 53 .. seealso::
54 54
55 55 Our :ref:`notes on security <parallel_security>` in the new parallel computing code.
56 56
57 57 Let's say that you want to start the controller on ``host0`` and engines on
58 58 hosts ``host1``-``hostn``. The following steps are then required:
59 59
60 60 1. Start the controller on ``host0`` by running :command:`ipcontroller` on
61 61 ``host0``. The controller must be instructed to listen on an interface visible
62 62 to the engine machines, via the ``ip`` command-line argument or ``HubFactory.ip``
63 63 in :file:`ipcontroller_config.py`.
64 64 2. Move the JSON file (:file:`ipcontroller-engine.json`) created by the
65 65 controller from ``host0`` to hosts ``host1``-``hostn``.
66 66 3. Start the engines on hosts ``host1``-``hostn`` by running
67 67 :command:`ipengine`. This command has to be told where the JSON file
68 68 (:file:`ipcontroller-engine.json`) is located.
69 69
70 70 At this point, the controller and engines will be connected. By default, the JSON files
71 71 created by the controller are put into the :file:`~/.ipython/profile_default/security`
72 72 directory. If the engines share a filesystem with the controller, step 2 can be skipped as
73 73 the engines will automatically look at that location.
74 74
75 75 The final step required to actually use the running controller from a client is to move
76 76 the JSON file :file:`ipcontroller-client.json` from ``host0`` to any host where clients
77 77 will be run. If this file is put into the :file:`~/.ipython/profile_default/security`
78 78 directory of the client's host, it will be found automatically. Otherwise, the full path
79 79 to it has to be passed to the client's constructor.
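
For example, a minimal sketch of connecting a client with an explicitly specified
connection file (the path below is only a placeholder for wherever you copied the
file to):

.. sourcecode:: ipython

    In [1]: from IPython.parallel import Client

    In [2]: rc = Client('/path/to/ipcontroller-client.json')  # full path to the copied file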
80 80
81 81 Using :command:`ipcluster`
82 82 ===========================
83 83
84 84 The :command:`ipcluster` command provides a simple way of starting a
85 85 controller and engines in the following situations:
86 86
87 87 1. When the controller and engines are all run on localhost. This is useful
88 88 for testing or running on a multicore computer.
89 89 2. When engines are started using the :command:`mpiexec` command that comes
90 90 with most MPI [MPI]_ implementations
91 91 3. When engines are started using the PBS [PBS]_ batch system
92 92 (or other `qsub` systems, such as SGE).
93 93 4. When the controller is started on localhost and the engines are started on
94 94 remote nodes using :command:`ssh`.
95 95 5. When engines are started using the Windows HPC Server batch system.
96 96
97 97 .. note::
98 98
99 99 Currently :command:`ipcluster` requires that the
100 100 :file:`~/.ipython/profile_<name>/security` directory live on a shared filesystem that is
101 101 seen by both the controller and engines. If you don't have a shared file
102 102 system you will need to use :command:`ipcontroller` and
103 103 :command:`ipengine` directly.
104 104
105 105 Under the hood, :command:`ipcluster` just uses :command:`ipcontroller`
106 106 and :command:`ipengine` to perform the steps described above.
107 107
108 108 The simplest way to use ipcluster requires no configuration, and will
109 109 launch a controller and a number of engines on the local machine. For instance,
110 110 to start one controller and 4 engines on localhost, just do::
111 111
112 $ ipcluster start n=4
112 $ ipcluster start --n=4
113 113
114 114 To see other command line options, do::
115 115
116 116 $ ipcluster -h
117 117
118 118
119 119 Configuring an IPython cluster
120 120 ==============================
121 121
122 122 Cluster configurations are stored as `profiles`. You can create a new profile with::
123 123
124 $ ipython profile create --parallel profile=myprofile
124 $ ipython profile create --parallel --profile=myprofile
125 125
126 126 This will create the directory :file:`IPYTHONDIR/profile_myprofile`, and populate it
127 127 with the default configuration files for the three IPython cluster commands. Once
128 128 you edit those files, you can continue to call ipcluster/ipcontroller/ipengine
129 129 with no arguments beyond ``profile=myprofile``, and any configuration will be maintained.
130 130
131 131 There is no limit to the number of profiles you can have, so you can maintain a profile for each
132 132 of your common use cases. The default profile will be used whenever the
133 133 profile argument is not specified, so edit :file:`IPYTHONDIR/profile_default/*_config.py` to
134 134 represent your most common use case.
135 135
136 136 The configuration files are loaded with commented-out settings and explanations,
137 137 which should cover most of the available possibilities.
138 138
139 139 Using various batch systems with :command:`ipcluster`
140 140 -----------------------------------------------------
141 141
142 142 :command:`ipcluster` has a notion of Launchers that can start controllers
143 143 and engines with various remote execution schemes. Currently supported
144 144 models include :command:`ssh`, :command:`mpiexec`, PBS-style (Torque, SGE),
145 145 and Windows HPC Server.
146 146
147 147 .. note::
148 148
149 149 The Launchers and configuration are designed in such a way that advanced
150 150 users can subclass and configure them to fit systems that we do not
151 151 yet support (such as Condor).
152 152
153 153 Using :command:`ipcluster` in mpiexec/mpirun mode
154 154 --------------------------------------------------
155 155
156 156
157 157 The mpiexec/mpirun mode is useful if you:
158 158
159 159 1. Have MPI installed.
160 160 2. Have systems configured to use the :command:`mpiexec` or
161 161 :command:`mpirun` commands to start MPI processes.
162 162
163 163 If these are satisfied, you can create a new profile::
164 164
165 $ ipython profile create --parallel profile=mpi
165 $ ipython profile create --parallel --profile=mpi
166 166
167 167 and edit the file :file:`IPYTHONDIR/profile_mpi/ipcluster_config.py`.
168 168
169 169 There, instruct ipcluster to use the MPIExec launchers by adding the lines:
170 170
171 171 .. sourcecode:: python
172 172
173 173 c.IPClusterEngines.engine_launcher = 'IPython.parallel.apps.launcher.MPIExecEngineSetLauncher'
174 174
175 175 If the default MPI configuration is correct, then you can now start your cluster, with::
176 176
177 $ ipcluster start n=4 profile=mpi
177 $ ipcluster start --n=4 --profile=mpi
178 178
179 179 This does the following:
180 180
181 181 1. Starts the IPython controller on current host.
182 182 2. Uses :command:`mpiexec` to start 4 engines.
183 183
184 184 If you have a reason to also start the Controller with mpi, you can specify:
185 185
186 186 .. sourcecode:: python
187 187
188 188 c.IPClusterStart.controller_launcher = 'IPython.parallel.apps.launcher.MPIExecControllerLauncher'
189 189
190 190 .. note::
191 191
192 192 The Controller *will not* be in the same MPI universe as the engines, so there is not
193 193 much reason to do this unless sysadmins demand it.
194 194
195 195 On newer MPI implementations (such as OpenMPI), this will work even if you
196 196 don't make any calls to MPI or call :func:`MPI_Init`. However, older MPI
197 197 implementations actually require each process to call :func:`MPI_Init` upon
198 198 starting. The easiest way of having this done is to install the mpi4py
199 199 [mpi4py]_ package and then specify the ``c.MPI.use`` option in :file:`ipengine_config.py`:
200 200
201 201 .. sourcecode:: python
202 202
203 203 c.MPI.use = 'mpi4py'
204 204
205 205 Unfortunately, even this won't work for some MPI implementations. If you are
206 206 having problems with this, you will likely have to use a custom Python
207 207 executable that itself calls :func:`MPI_Init` at the appropriate time.
208 208 Fortunately, mpi4py comes with such a custom Python executable that is easy to
209 209 install and use. However, this custom Python executable approach will not work
210 210 with :command:`ipcluster` currently.
211 211
212 212 More details on using MPI with IPython can be found :ref:`here <parallelmpi>`.
213 213
214 214
215 215 Using :command:`ipcluster` in PBS mode
216 216 ---------------------------------------
217 217
218 218 The PBS mode uses the Portable Batch System (PBS) to start the engines.
219 219
220 220 As usual, we will start by creating a fresh profile::
221 221
222 $ ipython profile create --parallel profile=pbs
222 $ ipython profile create --parallel --profile=pbs
223 223
224 224 And in :file:`ipcluster_config.py`, we will select the PBS launchers for the controller
225 225 and engines:
226 226
227 227 .. sourcecode:: python
228 228
229 229 c.IPClusterStart.controller_launcher = \
230 230 'IPython.parallel.apps.launcher.PBSControllerLauncher'
231 231 c.IPClusterEngines.engine_launcher = \
232 232 'IPython.parallel.apps.launcher.PBSEngineSetLauncher'
233 233
234 234 .. note::
235 235
236 236 Note that the configurable is IPClusterEngines for the engine launcher, and
237 237 IPClusterStart for the controller launcher. This is because the start command is a
238 238 subclass of the engine command, adding a controller launcher. Since it is a subclass,
239 239 any configuration made in IPClusterEngines is inherited by IPClusterStart unless it is
240 240 overridden.
241 241
242 242 IPython does provide simple default batch templates for PBS and SGE, but you may need
243 243 to specify your own. Here is a sample PBS script template:
244 244
245 245 .. sourcecode:: bash
246 246
247 247 #PBS -N ipython
248 248 #PBS -j oe
249 249 #PBS -l walltime=00:10:00
250 250 #PBS -l nodes={n/4}:ppn=4
251 251 #PBS -q {queue}
252 252
253 253 cd $PBS_O_WORKDIR
254 254 export PATH=$HOME/usr/local/bin
255 255 export PYTHONPATH=$HOME/usr/local/lib/python2.7/site-packages
256 /usr/local/bin/mpiexec -n {n} ipengine profile_dir={profile_dir}
256 /usr/local/bin/mpiexec -n {n} ipengine --profile_dir={profile_dir}
257 257
258 258 There are a few important points about this template:
259 259
260 260 1. This template will be rendered at runtime using IPython's :class:`EvalFormatter`.
261 261 This is simply a subclass of :class:`string.Formatter` that allows simple expressions
262 262 on keys (see the short sketch after this list).
263 263
264 264 2. Instead of putting in the actual number of engines, use the notation
265 265 ``{n}`` to indicate the number of engines to be started. You can also use
266 266 expressions like ``{n/4}`` in the template to indicate the number of nodes.
267 267 There will always be ``{n}`` and ``{profile_dir}`` variables passed to the formatter.
268 268 These allow the batch system to know how many engines, and where the configuration
269 269 files reside. The same is true for the batch queue, with the template variable
270 270 ``{queue}``.
271 271
272 272 3. Any options to :command:`ipengine` can be given in the batch script
273 273 template, or in :file:`ipengine_config.py`.
274 274
275 275 4. Depending on the configuration of your system, you may have to set
276 276 environment variables in the script template.
277 277
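To get a feel for what the formatter does with expressions like ``{n/4}``, here is a
small illustrative sketch. It assumes :class:`EvalFormatter` is importable from
:mod:`IPython.utils.text` and that ``n`` divides evenly; treat it as an approximation
of how the template above is rendered, not as part of the launcher API:

.. sourcecode:: ipython

    In [1]: from IPython.utils.text import EvalFormatter

    In [2]: f = EvalFormatter()

    In [3]: f.format("{n} engines on {n/4} nodes", n=16)
    Out[3]: '16 engines on 4 nodes'
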
278 278 The controller template should be similar, but simpler:
279 279
280 280 .. sourcecode:: bash
281 281
282 282 #PBS -N ipython
283 283 #PBS -j oe
284 284 #PBS -l walltime=00:10:00
285 285 #PBS -l nodes=1:ppn=4
286 286 #PBS -q {queue}
287 287
288 288 cd $PBS_O_WORKDIR
289 289 export PATH=$HOME/usr/local/bin
290 290 export PYTHONPATH=$HOME/usr/local/lib/python2.7/site-packages
291 ipcontroller profile_dir={profile_dir}
291 ipcontroller --profile_dir={profile_dir}
292 292
293 293
294 294 Once you have created these scripts, save them with names like
295 295 :file:`pbs.engine.template`. Now you can load them into the :file:`ipcluster_config` with:
296 296
297 297 .. sourcecode:: python
298 298
299 299 c.PBSEngineSetLauncher.batch_template_file = "pbs.engine.template"
300 300
301 301 c.PBSControllerLauncher.batch_template_file = "pbs.controller.template"
302 302
303 303
304 304 Alternately, you can just define the templates as strings inside :file:`ipcluster_config`.
305 305
306 306 Whether you are using your own templates or our defaults, the extra configurables available are
307 307 the number of engines to launch (``{n}``) and the batch system queue to which the jobs are to be
308 308 submitted (``{queue}``). These can be specified in
309 309 :file:`ipcluster_config`:
310 310
311 311 .. sourcecode:: python
312 312
313 313 c.PBSLauncher.queue = 'veryshort.q'
314 314 c.IPClusterEngines.n = 64
315 315
316 316 Note that assuming you are running PBS on a multi-node cluster, the Controller's default behavior
317 317 of listening only on localhost is likely too restrictive. In this case, also assuming the
318 318 nodes are safely behind a firewall, you can simply instruct the Controller to listen for
319 319 connections on all its interfaces, by adding in :file:`ipcontroller_config`:
320 320
321 321 .. sourcecode:: python
322 322
323 323 c.HubFactory.ip = '*'
324 324
325 325 You can now run the cluster with::
326 326
327 $ ipcluster start profile=pbs n=128
327 $ ipcluster start --profile=pbs --n=128
328 328
329 329 Additional configuration options can be found in the PBS section of :file:`ipcluster_config`.
330 330
331 331 .. note::
332 332
333 333 Due to the flexibility of configuration, the PBS launchers work with simple changes
334 334 to the template for other :command:`qsub`-using systems, such as Sun Grid Engine,
335 335 and with further configuration in similar batch systems like Condor.
336 336
337 337
338 338 Using :command:`ipcluster` in SSH mode
339 339 ---------------------------------------
340 340
341 341
342 342 The SSH mode uses :command:`ssh` to execute :command:`ipengine` on remote
343 343 nodes and :command:`ipcontroller` can be run remotely as well, or on localhost.
344 344
345 345 .. note::
346 346
347 347 When using this mode it is highly recommended that you have set up SSH keys
348 348 and are using ssh-agent [SSH]_ for password-less logins.
349 349
350 350 As usual, we start by creating a clean profile::
351 351
352 $ ipython profile create --parallel profile=ssh
352 $ ipython profile create --parallel --profile=ssh
353 353
354 354 To use this mode, select the SSH launchers in :file:`ipcluster_config.py`:
355 355
356 356 .. sourcecode:: python
357 357
358 358 c.IPClusterEngines.engine_launcher = \
359 359 'IPython.parallel.apps.launcher.SSHEngineSetLauncher'
360 360 # and if the Controller is also to be remote:
361 361 c.IPClusterStart.controller_launcher = \
362 362 'IPython.parallel.apps.launcher.SSHControllerLauncher'
363 363
364 364
365 365 The controller's remote location and configuration can be specified:
366 366
367 367 .. sourcecode:: python
368 368
369 369 # Set the user and hostname for the controller
370 370 # c.SSHControllerLauncher.hostname = 'controller.example.com'
371 371 # c.SSHControllerLauncher.user = os.environ.get('USER','username')
372 372
373 373 # Set the arguments to be passed to ipcontroller
374 374 # note that remotely launched ipcontroller will not get the contents of
375 375 # the local ipcontroller_config.py unless it resides on the *remote host*
376 376 # in the location specified by the `profile_dir` argument.
377 # c.SSHControllerLauncher.program_args = ['--reuse', 'ip=*', 'profile_dir=/path/to/cd']
377 # c.SSHControllerLauncher.program_args = ['--reuse', '--ip=*', '--profile_dir=/path/to/cd']
378 378
379 379 .. note::
380 380
381 381 SSH mode does not do any file movement, so you will need to distribute configuration
382 382 files manually. To aid in this, the `reuse_files` flag defaults to True for ssh-launched
383 383 Controllers, so you will only need to do this once, unless you override this flag back
384 384 to False.
385 385
386 386 Engines are specified in a dictionary, by hostname and the number of engines to be run
387 387 on that host.
388 388
389 389 .. sourcecode:: python
390 390
391 391 c.SSHEngineSetLauncher.engines = { 'host1.example.com' : 2,
392 392 'host2.example.com' : 5,
393 'host3.example.com' : (1, ['profile_dir=/home/different/location']),
393 'host3.example.com' : (1, ['--profile_dir=/home/different/location']),
394 394 'host4.example.com' : 8 }
395 395
396 396 * The `engines` dict, where the keys are the hosts we want to run engines on and
397 397 the value is the number of engines to run on that host.
398 398 * on host3, the value is a tuple, where the number of engines is first, and the arguments
399 399 to be passed to :command:`ipengine` are the second element.
400 400
401 401 For engines without explicitly specified arguments, the default arguments are set in
402 402 a single location:
403 403
404 404 .. sourcecode:: python
405 405
406 c.SSHEngineSetLauncher.engine_args = ['profile_dir=/path/to/profile_ssh']
406 c.SSHEngineSetLauncher.engine_args = ['--profile_dir=/path/to/profile_ssh']
407 407
408 408 Current limitations of the SSH mode of :command:`ipcluster` are:
409 409
410 410 * Untested on Windows. Would require a working :command:`ssh` on Windows.
411 411 Also, we are using shell scripts to setup and execute commands on remote
412 412 hosts.
413 413 * No file movement - This is a regression from 0.10, which moved connection files
414 414 around with scp. This will be improved, but not before 0.11 release.
415 415
416 416 Using the :command:`ipcontroller` and :command:`ipengine` commands
417 417 ====================================================================
418 418
419 419 It is also possible to use the :command:`ipcontroller` and :command:`ipengine`
420 420 commands to start your controller and engines. This approach gives you full
421 421 control over all aspects of the startup process.
422 422
423 423 Starting the controller and engine on your local machine
424 424 --------------------------------------------------------
425 425
426 426 To use :command:`ipcontroller` and :command:`ipengine` to start things on your
427 427 local machine, do the following.
428 428
429 429 First start the controller::
430 430
431 431 $ ipcontroller
432 432
433 433 Next, start however many instances of the engine you want using (repeatedly)
434 434 the command::
435 435
436 436 $ ipengine
437 437
438 438 The engines should start and automatically connect to the controller using the
439 439 JSON files in :file:`~/.ipython/profile_default/security`. You are now ready to use the
440 440 controller and engines from IPython.
441 441
442 442 .. warning::
443 443
444 444 The order of the above operations may be important. You *must*
445 445 start the controller before the engines, unless you are reusing connection
446 446 information (via ``--reuse``), in which case ordering is not important.
447 447
448 448 .. note::
449 449
450 450 On some platforms (OS X), to put the controller and engine into the
451 451 background you may need to give these commands in the form ``(ipcontroller
452 452 &)`` and ``(ipengine &)`` (with the parentheses) for them to work
453 453 properly.
454 454
455 455 Starting the controller and engines on different hosts
456 456 ------------------------------------------------------
457 457
458 458 When the controller and engines are running on different hosts, things are
459 459 slightly more complicated, but the underlying ideas are the same:
460 460
461 461 1. Start the controller on a host using :command:`ipcontroller`. The controller must be
462 462 instructed to listen on an interface visible to the engine machines, via the ``ip``
463 463 command-line argument or ``HubFactory.ip`` in :file:`ipcontroller_config.py`.
464 464 2. Copy :file:`ipcontroller-engine.json` from :file:`~/.ipython/profile_<name>/security` on
465 465 the controller's host to the host where the engines will run.
466 466 3. Use :command:`ipengine` on the engine's hosts to start the engines.
467 467
468 468 The only thing you have to be careful of is to tell :command:`ipengine` where
469 469 the :file:`ipcontroller-engine.json` file is located. There are two ways you
470 470 can do this:
471 471
472 472 * Put :file:`ipcontroller-engine.json` in the :file:`~/.ipython/profile_<name>/security`
473 473 directory on the engine's host, where it will be found automatically.
474 * Call :command:`ipengine` with the ``file=full_path_to_the_file``
474 * Call :command:`ipengine` with the ``--file=full_path_to_the_file``
475 475 flag.
476 476
477 477 The ``file`` flag works like this::
478 478
479 $ ipengine file=/path/to/my/ipcontroller-engine.json
479 $ ipengine --file=/path/to/my/ipcontroller-engine.json
480 480
481 481 .. note::
482 482
483 483 If the controller's and engine's hosts all have a shared file system
484 484 (:file:`~/.ipython/profile_<name>/security` is the same on all of them), then things
485 485 will just work!
486 486
487 487 Make JSON files persistent
488 488 --------------------------
489 489
490 490 At first glance it may seem that managing the JSON files is a bit
491 491 annoying. Going back to the house and key analogy, copying the JSON around
492 492 each time you start the controller is like having to make a new key every time
493 493 you want to unlock the door and enter your house. As with your house, you want
494 494 to be able to create the key (or JSON file) once, and then simply use it at
495 495 any point in the future.
496 496
497 497 To do this, the only thing you have to do is specify the `--reuse` flag, so that
498 498 the connection information in the JSON files remains accurate::
499 499
500 500 $ ipcontroller --reuse
501 501
502 502 Then, just copy the JSON files over the first time and you are set. You can
503 503 start and stop the controller and engines as many times as you want in the
504 504 future, just make sure to tell the controller to reuse the file.
505 505
506 506 .. note::
507 507
508 508 You may ask the question: what ports does the controller listen on if you
509 509 don't tell it to use specific ones? The default is to use high random port
510 510 numbers. We do this for two reasons: i) to increase security through
511 511 obscurity and ii) to allow multiple controllers on a given host to start and
512 512 automatically use different ports.
513 513
514 514 Log files
515 515 ---------
516 516
517 517 All of the components of IPython have log files associated with them.
518 518 These log files can be extremely useful in debugging problems with
519 519 IPython and can be found in the directory :file:`~/.ipython/profile_<name>/log`.
520 520 Sending the log files to us will often help us to debug any problems.
521 521
522 522
523 523 Configuring `ipcontroller`
524 524 ---------------------------
525 525
526 526 The IPython Controller takes its configuration from the file :file:`ipcontroller_config.py`
527 527 in the active profile directory.
528 528
529 529 Ports and addresses
530 530 *******************
531 531
532 532 In many cases, you will want to configure the Controller's network identity. By default,
533 533 the Controller listens only on loopback, which is the most secure but often impractical.
534 534 To instruct the controller to listen on a specific interface, you can set the
535 535 :attr:`HubFactory.ip` trait. To listen on all interfaces, simply specify:
536 536
537 537 .. sourcecode:: python
538 538
539 539 c.HubFactory.ip = '*'
540 540
541 541 When connecting to a Controller that is listening on loopback or behind a firewall, it may
542 542 be necessary to specify an SSH server to use for tunnels, and the external IP of the
543 543 Controller. If you specified that the HubFactory listen on loopback, or all interfaces,
544 544 then IPython will try to guess the external IP. If you are on a system with VM network
545 545 devices, or many interfaces, this guess may be incorrect. In these cases, you will want
546 546 to specify the 'location' of the Controller. This is the IP of the machine the Controller
547 547 is on, as seen by the clients, engines, or the SSH server used to tunnel connections.
548 548
549 549 For example, to set up a cluster with a Controller on a work node, using ssh tunnels
550 550 through the login node, an example :file:`ipcontroller_config.py` might contain:
551 551
552 552 .. sourcecode:: python
553 553
554 554 # allow connections on all interfaces from engines
555 555 # engines on the same node will use loopback, while engines
556 556 # from other nodes will use an external IP
557 557 c.HubFactory.ip = '*'
558 558
559 559 # you typically only need to specify the location when there are extra
560 560 # interfaces that may not be visible to peer nodes (e.g. VM interfaces)
561 561 c.HubFactory.location = '10.0.1.5'
562 562 # or to get an automatic value, try this:
563 563 import socket
564 564 ex_ip = socket.gethostbyname_ex(socket.gethostname())[-1][0]
565 565 c.HubFactory.location = ex_ip
566 566
567 567 # now instruct clients to use the login node for SSH tunnels:
568 568 c.HubFactory.ssh_server = 'login.mycluster.net'
569 569
570 570 After doing this, your :file:`ipcontroller-client.json` file will look something like this:
571 571
572 572 .. this can be Python, despite the fact that it's actually JSON, because it's
573 573 .. still valid Python
574 574
575 575 .. sourcecode:: python
576 576
577 577 {
578 578 "url":"tcp:\/\/*:43447",
579 579 "exec_key":"9c7779e4-d08a-4c3b-ba8e-db1f80b562c1",
580 580 "ssh":"login.mycluster.net",
581 581 "location":"10.0.1.5"
582 582 }
583 583
584 584 Then this file will be all you need for a client to connect to the controller, tunneling
585 585 SSH connections through login.mycluster.net.
586 586
587 587 Database Backend
588 588 ****************
589 589
590 590 The Hub stores all messages and results passed between Clients and Engines.
591 591 For large and/or long-running clusters, it would be unreasonable to keep all
592 592 of this information in memory. For this reason, we have two database backends:
593 593 [MongoDB]_ via PyMongo_, and SQLite with the stdlib :py:mod:`sqlite3`.
594 594
595 595 MongoDB is our design target, and the dict-like model it uses has driven our design. As far
596 596 as we are concerned, BSON can be considered essentially the same as JSON, adding support
597 597 for binary data and datetime objects, and any new database backend must support the same
598 598 data types.
599 599
600 600 .. seealso::
601 601
602 602 MongoDB `BSON doc <http://www.mongodb.org/display/DOCS/BSON>`_
603 603
604 604 To use one of these backends, you must set the :attr:`HubFactory.db_class` trait:
605 605
606 606 .. sourcecode:: python
607 607
608 608 # for a simple dict-based in-memory implementation, use dictdb
609 609 # This is the default and the fastest, since it doesn't involve the filesystem
610 610 c.HubFactory.db_class = 'IPython.parallel.controller.dictdb.DictDB'
611 611
612 612 # To use MongoDB:
613 613 c.HubFactory.db_class = 'IPython.parallel.controller.mongodb.MongoDB'
614 614
615 615 # and SQLite:
616 616 c.HubFactory.db_class = 'IPython.parallel.controller.sqlitedb.SQLiteDB'
617 617
618 618 When using the proper databases, you can actually allow for tasks to persist from
619 619 one session to the next by specifying the MongoDB database or SQLite table in
620 620 which tasks are to be stored. The default is to use a table named for the Hub's Session,
621 621 which is a UUID, and thus different every time.
622 622
623 623 .. sourcecode:: python
624 624
625 625 # To keep persistent task history in MongoDB:
626 626 c.MongoDB.database = 'tasks'
627 627
628 628 # and in SQLite:
629 629 c.SQLiteDB.table = 'tasks'
630 630
631 631
632 632 Since MongoDB servers can be running remotely or configured to listen on a particular port,
633 633 you can specify any arguments you may need to the PyMongo `Connection
634 634 <http://api.mongodb.org/python/1.9/api/pymongo/connection.html#pymongo.connection.Connection>`_:
635 635
636 636 .. sourcecode:: python
637 637
638 638 # positional args to pymongo.Connection
639 639 c.MongoDB.connection_args = []
640 640
641 641 # keyword args to pymongo.Connection
642 642 c.MongoDB.connection_kwargs = {}
643 643
644 644 .. _MongoDB: http://www.mongodb.org
645 645 .. _PyMongo: http://api.mongodb.org/python/1.9/
646 646
647 647 Configuring `ipengine`
648 648 -----------------------
649 649
650 650 The IPython Engine takes its configuration from the file :file:`ipengine_config.py`.
651 651
652 652 The Engine itself also has some amount of configuration. Most of this
653 653 has to do with initializing MPI or connecting to the controller.
654 654
655 655 To instruct the Engine to initialize with an MPI environment set up by
656 656 mpi4py, add:
657 657
658 658 .. sourcecode:: python
659 659
660 660 c.MPI.use = 'mpi4py'
661 661
662 662 In this case, the Engine will use our default mpi4py init script to set up
663 663 the MPI environment prior to execution. We have default init scripts for
664 664 mpi4py and pytrilinos. If you want to specify your own code to be run
665 665 at the beginning, specify `c.MPI.init_script`.
666 666
667 667 You can also specify a file or python command to be run at startup of the
668 668 Engine:
669 669
670 670 .. sourcecode:: python
671 671
672 672 c.IPEngineApp.startup_script = u'/path/to/my/startup.py'
673 673
674 674 c.IPEngineApp.startup_command = 'import numpy, scipy, mpi4py'
675 675
676 676 These commands/files will be run again after each engine restart.
677 677
678 678 It's also useful on systems with shared filesystems to run the engines
679 679 in some scratch directory. This can be set with:
680 680
681 681 .. sourcecode:: python
682 682
683 683 c.IPEngineApp.work_dir = u'/path/to/scratch/'
684 684
685 685
686 686
687 687 .. [MongoDB] MongoDB database http://www.mongodb.org
688 688
689 689 .. [PBS] Portable Batch System http://www.openpbs.org
690 690
691 691 .. [SSH] SSH-Agent http://en.wikipedia.org/wiki/ssh-agent
@@ -1,442 +1,442 b''
1 1 .. _parallel_task:
2 2
3 3 ==========================
4 4 The IPython task interface
5 5 ==========================
6 6
7 7 The task interface to the cluster presents the engines as a fault tolerant,
8 8 dynamic load-balanced system of workers. Unlike the multiengine interface, in
9 9 the task interface the user has no direct access to individual engines. By
10 10 allowing the IPython scheduler to assign work, this interface is simultaneously
11 11 simpler and more powerful.
12 12
13 13 Best of all, the user can use both of these interfaces running at the same time
14 14 to take advantage of their respective strengths. When the user can break up
15 15 their work into segments that do not depend on previous execution, the
16 16 task interface is ideal. But it also has more power and flexibility, allowing
17 17 the user to guide the distribution of jobs, without having to assign tasks to
18 18 engines explicitly.
19 19
20 20 Starting the IPython controller and engines
21 21 ===========================================
22 22
23 23 To follow along with this tutorial, you will need to start the IPython
24 24 controller and four IPython engines. The simplest way of doing this is to use
25 25 the :command:`ipcluster` command::
26 26
27 $ ipcluster start n=4
27 $ ipcluster start --n=4
28 28
29 29 For more detailed information about starting the controller and engines, see
30 30 our :ref:`introduction <ip1par>` to using IPython for parallel computing.
31 31
32 32 Creating a ``Client`` instance
33 33 ==============================
34 34
35 35 The first step is to import the IPython :mod:`IPython.parallel`
36 36 module and then create a :class:`.Client` instance, and we will also be using
37 37 a :class:`LoadBalancedView`, here called `lview`:
38 38
39 39 .. sourcecode:: ipython
40 40
41 41 In [1]: from IPython.parallel import Client
42 42
43 43 In [2]: rc = Client()
44 44
45 45
46 46 This form assumes that the controller was started on localhost with default
47 47 configuration. If not, the location of the controller must be given as an
48 48 argument to the constructor:
49 49
50 50 .. sourcecode:: ipython
51 51
52 52 # for a visible LAN controller listening on an external port:
53 53 In [2]: rc = Client('tcp://192.168.1.16:10101')
54 54 # or to connect with a specific profile you have set up:
55 55 In [3]: rc = Client(profile='mpi')
56 56
57 57 For load-balanced execution, we will make use of a :class:`LoadBalancedView` object, which can
58 58 be constructed via the client's :meth:`load_balanced_view` method:
59 59
60 60 .. sourcecode:: ipython
61 61
62 62 In [4]: lview = rc.load_balanced_view() # default load-balanced view
63 63
64 64 .. seealso::
65 65
66 66 For more information, see the in-depth explanation of :ref:`Views <parallel_details>`.
67 67
68 68
69 69 Quick and easy parallelism
70 70 ==========================
71 71
72 72 In many cases, you simply want to apply a Python function to a sequence of
73 73 objects, but *in parallel*. Like the multiengine interface, these can be
74 74 implemented via the task interface. The exact same tools can perform these
75 75 actions in load-balanced ways as well as multiplexed ways: a parallel version
76 76 of :func:`map` and the :func:`@parallel` function decorator. If one specifies the
77 77 argument `balanced=True`, then they are dynamically load balanced. Thus, if the
78 78 execution time per item varies significantly, you should use the versions in
79 79 the task interface.
80 80
81 81 Parallel map
82 82 ------------
83 83
84 84 To load-balance :meth:`map`, simply use a LoadBalancedView:
85 85
86 86 .. sourcecode:: ipython
87 87
88 88 In [62]: lview.block = True
89 89
90 90 In [63]: serial_result = map(lambda x:x**10, range(32))
91 91
92 92 In [64]: parallel_result = lview.map(lambda x:x**10, range(32))
93 93
94 94 In [65]: serial_result==parallel_result
95 95 Out[65]: True
96 96
97 97 Parallel function decorator
98 98 ---------------------------
99 99
100 100 Parallel functions are just like normal functions, but they can be called on
101 101 sequences and *in parallel*. The multiengine interface provides a decorator
102 102 that turns any Python function into a parallel function:
103 103
104 104 .. sourcecode:: ipython
105 105
106 106 In [10]: @lview.parallel()
107 107 ....: def f(x):
108 108 ....: return 10.0*x**4
109 109 ....:
110 110
111 111 In [11]: f.map(range(32)) # this is done in parallel
112 112 Out[11]: [0.0,10.0,160.0,...]
113 113
114 114 .. _parallel_dependencies:
115 115
116 116 Dependencies
117 117 ============
118 118
119 119 Often, pure atomic load-balancing is too primitive for your work. In these cases, you
120 120 may want to associate some kind of `Dependency` that describes when, where, or whether
121 121 a task can be run. In IPython, we provide two types of dependencies:
122 122 `Functional Dependencies`_ and `Graph Dependencies`_
123 123
124 124 .. note::
125 125
126 126 It is important to note that the pure ZeroMQ scheduler does not support dependencies,
127 127 and you will see errors or warnings if you try to use dependencies with the pure
128 128 scheduler.
129 129
130 130 Functional Dependencies
131 131 -----------------------
132 132
133 133 Functional dependencies are used to determine whether a given engine is capable of running
134 134 a particular task. This is implemented via a special :class:`Exception` class,
135 135 :class:`UnmetDependency`, found in `IPython.parallel.error`. Its use is very simple:
136 136 if a task fails with an UnmetDependency exception, then the scheduler, instead of relaying
137 137 the error up to the client like any other error, catches the error, and submits the task
138 138 to a different engine. This will repeat indefinitely, and a task will never be submitted
139 139 to a given engine a second time.
140 140
141 141 You can manually raise the :class:`UnmetDependency` yourself, but IPython has provided
142 142 some decorators for facilitating this behavior.
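
For instance, raising it manually might look like the following sketch (``has_gpu``
and ``do_gpu_work`` are hypothetical names, used only for illustration):

.. sourcecode:: ipython

    In [9]: from IPython.parallel.error import UnmetDependency

    In [10]: def gpu_task(data):
       ....:     # hypothetical capability check; when it fails, the scheduler
       ....:     # catches UnmetDependency and retries the task on another engine
       ....:     if not has_gpu():
       ....:         raise UnmetDependency()
       ....:     return do_gpu_work(data)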
143 143
144 144 There are two decorators and a class used for functional dependencies:
145 145
146 146 .. sourcecode:: ipython
147 147
148 148 In [9]: from IPython.parallel import depend, require, dependent
149 149
150 150 @require
151 151 ********
152 152
153 153 The simplest sort of dependency is requiring that a Python module is available. The
154 154 ``@require`` decorator lets you define a function that will only run on engines where names
155 155 you specify are importable:
156 156
157 157 .. sourcecode:: ipython
158 158
159 159 In [10]: @require('numpy', 'zmq')
160 160 ...: def myfunc():
161 161 ...: return dostuff()
162 162
163 163 Now, any time you apply :func:`myfunc`, the task will only run on a machine that has
164 164 numpy and pyzmq available, and when :func:`myfunc` is called, numpy and zmq will be imported.
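
Submitting such a function is no different from submitting any other task; the
scheduling restriction is handled for you. A minimal sketch, reusing the ``lview``
created earlier:

.. sourcecode:: ipython

    In [11]: ar = lview.apply(myfunc)   # only engines that can import numpy and zmq will run this

    In [12]: ar.get()                   # wait for a capable engine to return the result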
165 165
166 166 @depend
167 167 *******
168 168
169 169 The ``@depend`` decorator lets you decorate any function with any *other* function to
170 170 evaluate the dependency. The dependency function will be called at the start of the task,
171 171 and if it returns ``False``, then the dependency will be considered unmet, and the task
172 172 will be assigned to another engine. If the dependency returns *anything other than
173 173 ``False``*, the rest of the task will continue.
174 174
175 175 .. sourcecode:: ipython
176 176
177 177 In [10]: def platform_specific(plat):
178 178 ...: import sys
179 179 ...: return sys.platform == plat
180 180
181 181 In [11]: @depend(platform_specific, 'darwin')
182 182 ...: def mactask():
183 183 ...: do_mac_stuff()
184 184
185 185 In [12]: @depend(platform_specific, 'nt')
186 186 ...: def wintask():
187 187 ...: do_windows_stuff()
188 188
189 189 In this case, any time you apply ``mactask``, it will only run on an OSX machine.
190 190 ``@depend`` is just like ``apply``, in that it has a ``@depend(f,*args,**kwargs)``
191 191 signature.
192 192
193 193 dependents
194 194 **********
195 195
196 196 You don't have to use the decorators on your tasks. If, for instance, you want
197 197 to run tasks with a single function but varying dependencies, you can directly construct
198 198 the :class:`dependent` object that the decorators use:
199 199
200 200 .. sourcecode:: ipython
201 201
202 202 In [13]: def mytask(*args):
203 203 ...: dostuff()
204 204
205 205 In [14]: mactask = dependent(mytask, platform_specific, 'darwin')
206 206 # this is the same as decorating the declaration of mytask with @depend
207 207 # but you can do it again:
208 208
209 209 In [15]: wintask = dependent(mytask, platform_specific, 'nt')
210 210
211 211 # in general:
212 212 In [16]: t = dependent(f, g, *dargs, **dkwargs)
213 213
214 214 # is equivalent to:
215 215 In [17]: @depend(g, *dargs, **dkwargs)
216 216 ...: def t(a,b,c):
217 217 ...: # contents of f
218 218
219 219 Graph Dependencies
220 220 ------------------
221 221
222 222 Sometimes you want to restrict the time and/or location to run a given task as a function
223 223 of the time and/or location of other tasks. This is implemented via a subclass of
224 224 :class:`set`, called a :class:`Dependency`. A Dependency is just a set of `msg_ids`
225 225 corresponding to tasks, and a few attributes to guide how to decide when the Dependency
226 226 has been met.
227 227
228 228 The switches we provide for interpreting whether a given dependency set has been met:
229 229
230 230 any|all
231 231 Whether the dependency is considered met if *any* of the dependencies are done, or
232 232 only after *all* of them have finished. This is set by a Dependency's :attr:`all`
233 233 boolean attribute, which defaults to ``True``.
234 234
235 235 success [default: True]
236 236 Whether to consider tasks that succeeded as fulfilling dependencies.
237 237
238 238 failure [default : False]
239 239 Whether to consider tasks that failed as fulfilling dependencies.
240 240 using `failure=True,success=False` is useful for setting up cleanup tasks, to be run
241 241 only when tasks have failed.
242 242
243 243 Sometimes you want to run a task after another, but only if that task succeeded. In this case,
244 244 ``success`` should be ``True`` and ``failure`` should be ``False``. However sometimes you may
245 245 not care whether the task succeeds, and always want the second task to run, in which case you
246 246 should use `success=failure=True`. The default behavior is to only use successes.
247 247
248 248 There are other switches for interpretation that are made at the *task* level. These are
249 249 specified via keyword arguments to the client's :meth:`apply` method.
250 250
251 251 after,follow
252 252 You may want to run a task *after* a given set of dependencies have been run and/or
253 253 run it *where* another set of dependencies are met. To support this, every task has an
254 254 `after` dependency to restrict time, and a `follow` dependency to restrict
255 255 destination.
256 256
257 257 timeout
258 258 You may also want to set a time-limit for how long the scheduler should wait before a
259 259 task's dependencies are met. This is done via a `timeout`, which defaults to 0, which
260 260 indicates that the task should never timeout. If the timeout is reached, and the
261 261 scheduler still hasn't been able to assign the task to an engine, the task will fail
262 262 with a :class:`DependencyTimeout`.
263 263
264 264 .. note::
265 265
266 266 Dependencies only work within the task scheduler. You cannot instruct a load-balanced
267 267 task to run after a job submitted via the MUX interface.
268 268
269 269 The simplest form of Dependencies is with `all=True,success=True,failure=False`. In these cases,
270 270 you can skip using Dependency objects, and just pass msg_ids or AsyncResult objects as the
271 271 `follow` and `after` keywords to :meth:`client.apply`:
272 272
273 273 .. sourcecode:: ipython
274 274
275 275 In [14]: client.block=False
276 276
277 277 In [15]: ar = lview.apply(f, args, kwargs)
278 278
279 279 In [16]: ar2 = lview.apply(f2)
280 280
281 281 In [17]: ar3 = lview.apply_with_flags(f3, after=[ar,ar2])
282 282
283 283 In [17]: ar4 = lview.apply_with_flags(f3, follow=[ar], timeout=2.5)
284 284
285 285
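For the less common switch combinations you can construct the :class:`Dependency`
yourself. The following is a hedged sketch: it assumes :class:`Dependency` can be
imported from :mod:`IPython.parallel` (it is defined in
:mod:`IPython.parallel.controller.dependency`), that the constructor accepts the
``all``/``success``/``failure`` switches described above as keyword arguments, and
that ``cleanup`` is a hypothetical function:

.. sourcecode:: ipython

    In [18]: from IPython.parallel import Dependency

    # met if *any* of the earlier tasks finished unsuccessfully:
    In [19]: failed = Dependency(ar.msg_ids + ar2.msg_ids, all=False, success=False, failure=True)

    In [20]: ar5 = lview.apply_with_flags(cleanup, after=failed)
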
286 286 .. seealso::
287 287
288 288 Some parallel workloads can be described as a `Directed Acyclic Graph
289 289 <http://en.wikipedia.org/wiki/Directed_acyclic_graph>`_, or DAG. See :ref:`DAG
290 290 Dependencies <dag_dependencies>` for an example demonstrating how to use map a NetworkX DAG
291 291 onto task dependencies.
292 292
293 293
294 294
295 295
296 296 Impossible Dependencies
297 297 ***********************
298 298
299 299 The schedulers do perform some analysis on graph dependencies to determine whether they
300 300 are impossible to meet. If the scheduler discovers that a dependency cannot be
301 301 met, then the task will fail with an :class:`ImpossibleDependency` error. This way, if the
302 302 scheduler realizes that a task can never be run, the task won't sit indefinitely in the
303 303 scheduler clogging the pipeline.
304 304
305 305 The basic cases that are checked:
306 306
307 307 * depending on nonexistent messages
308 308 * `follow` dependencies were run on more than one machine and `all=True`
309 309 * any dependencies failed and `all=True,success=True,failure=False`
310 310 * all dependencies failed and `all=False,success=True,failure=False`
311 311
312 312 .. warning::
313 313
314 314 This analysis has not been proven to be rigorous, so it is possible for tasks
315 315 to become impossible to run in obscure situations; a timeout may therefore be a good choice.
316 316
317 317
318 318 Retries and Resubmit
319 319 ====================
320 320
321 321 Retries
322 322 -------
323 323
324 324 Another flag for tasks is `retries`. This is an integer, specifying how many times
325 325 a task should be resubmitted after failure. This is useful for tasks that should still run
326 326 if their engine was shutdown, or may have some statistical chance of failing. The default
327 327 is to not retry tasks.
328 328
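A small sketch of using it, assuming (as with the other flags shown in this document)
that ``retries`` can be set directly on the view; ``sometimes_fails`` is a hypothetical
function:

.. sourcecode:: ipython

    In [21]: lview.retries = 3              # resubmit a failed task up to 3 times

    In [22]: ar = lview.apply(sometimes_fails)
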
329 329 Resubmit
330 330 --------
331 331
332 332 Sometimes you may want to re-run a task. This could be because it failed for some reason, and
333 333 you have fixed the error, or because you want to restore the cluster to an interrupted state.
334 334 For this, the :class:`Client` has a :meth:`rc.resubmit` method. This simply takes one or more
335 335 msg_ids, and returns an :class:`AsyncHubResult` for the result(s). You cannot resubmit
336 336 a task that is pending - only those that have finished, either successful or unsuccessful.
337 337
338 338 .. _parallel_schedulers:
339 339
340 340 Schedulers
341 341 ==========
342 342
343 343 There are a variety of valid ways to determine where jobs should be assigned in a
344 344 load-balancing situation. In IPython, we support several standard schemes, and
345 345 even make it easy to define your own. The scheme can be selected via the ``scheme``
346 346 argument to :command:`ipcontroller`, or in the :attr:`TaskScheduler.schemename` attribute
347 347 of a controller config object.
348 348
349 349 The built-in routing schemes:
350 350
351 351 To select one of these schemes, simply do::
352 352
353 $ ipcontroller scheme=<schemename>
353 $ ipcontroller --scheme=<schemename>
354 354 for instance:
355 $ ipcontroller scheme=lru
355 $ ipcontroller --scheme=lru
356 356
357 357 lru: Least Recently Used
358 358
359 359 Always assign work to the least-recently-used engine. A close relative of
360 360 round-robin, it will be fair with respect to the number of tasks, agnostic
361 361 with respect to runtime of each task.
362 362
363 363 plainrandom: Plain Random
364 364
365 365 Randomly picks an engine on which to run.
366 366
367 367 twobin: Two-Bin Random
368 368
369 369 **Requires numpy**
370 370
371 371 Pick two engines at random, and use the LRU of the two. This is known to be better
372 372 than plain random in many cases, but requires a small amount of computation.
373 373
374 374 leastload: Least Load
375 375
376 376 **This is the default scheme**
377 377
378 378 Always assign tasks to the engine with the fewest outstanding tasks (LRU breaks tie).
379 379
380 380 weighted: Weighted Two-Bin Random
381 381
382 382 **Requires numpy**
383 383
384 384 Pick two engines at random using the number of outstanding tasks as inverse weights,
385 385 and use the one with the lower load.
386 386
387 387
388 388 Pure ZMQ Scheduler
389 389 ------------------
390 390
391 391 For maximum throughput, the 'pure' scheme is not Python at all, but a C-level
392 392 :class:`MonitoredQueue` from PyZMQ, which uses a ZeroMQ ``XREQ`` socket to perform all
393 393 load-balancing. This scheduler does not support any of the advanced features of the Python
394 394 :class:`.Scheduler`.
395 395
396 396 Disabled features when using the ZMQ Scheduler:
397 397
398 398 * Engine unregistration
399 399 Task farming will be disabled if an engine unregisters.
400 400 Further, if an engine is unregistered during computation, the scheduler may not recover.
401 401 * Dependencies
402 402 Since there is no Python logic inside the Scheduler, routing decisions cannot be made
403 403 based on message content.
404 404 * Early destination notification
405 405 The Python schedulers know which engine gets which task, and notify the Hub. This
406 406 allows graceful handling of Engines coming and going. There is no way to know
407 407 where ZeroMQ messages have gone, so there is no way to know what tasks are on which
408 408 engine until they *finish*. This makes recovery from engine shutdown very difficult.
409 409
410 410
411 411 .. note::
412 412
413 413 TODO: performance comparisons
414 414
415 415
416 416
417 417
418 418 More details
419 419 ============
420 420
421 421 The :class:`LoadBalancedView` has many more powerful features that allow quite a bit
422 422 of flexibility in how tasks are defined and run. The next places to look are
423 423 in the following classes:
424 424
425 425 * :class:`~IPython.parallel.client.view.LoadBalancedView`
426 426 * :class:`~IPython.parallel.client.asyncresult.AsyncResult`
427 427 * :meth:`~IPython.parallel.client.view.LoadBalancedView.apply`
428 428 * :mod:`~IPython.parallel.controller.dependency`
429 429
430 430 The following is an overview of how to use these classes together (a short sketch follows the list):
431 431
432 432 1. Create a :class:`Client` and :class:`LoadBalancedView`
433 433 2. Define some functions to be run as tasks
434 434 3. Submit your tasks using the :meth:`apply` method of your
435 435 :class:`LoadBalancedView` instance.
436 436 4. Use :meth:`Client.get_result` to get the results of the
437 437 tasks, or use the :meth:`AsyncResult.get` method of the results to wait
438 438 for and then receive the results.
439 439
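A minimal sketch of these four steps, assuming a running cluster:

.. sourcecode:: ipython

    In [1]: from IPython.parallel import Client

    In [2]: rc = Client()                  # 1. connect and build a view

    In [3]: lview = rc.load_balanced_view()

    In [4]: def double(x):                 # 2. define a task function
       ...:     return 2 * x
       ...:

    In [5]: ar = lview.apply(double, 21)   # 3. submit; returns an AsyncResult

    In [6]: ar.get()                       # 4. wait for and fetch the result
    Out[6]: 42
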
440 440 .. seealso::
441 441
442 442 A demo of :ref:`DAG Dependencies <dag_dependencies>` with NetworkX and IPython.
@@ -1,334 +1,334 b''
1 1 ============================================
2 2 Getting started with Windows HPC Server 2008
3 3 ============================================
4 4
5 5 .. note::
6 6
7 7 Not adapted to zmq yet
8 8
9 9 Introduction
10 10 ============
11 11
12 12 The Python programming language is an increasingly popular language for
13 13 numerical computing. This is due to a unique combination of factors. First,
14 14 Python is a high-level and *interactive* language that is well matched to
15 15 interactive numerical work. Second, it is easy (often trivial) to
16 16 integrate legacy C/C++/Fortran code into Python. Third, a large number of
17 17 high-quality open source projects provide all the needed building blocks for
18 18 numerical computing: numerical arrays (NumPy), algorithms (SciPy), 2D/3D
19 19 Visualization (Matplotlib, Mayavi, Chaco), Symbolic Mathematics (Sage, Sympy)
20 20 and others.
21 21
22 22 The IPython project is a core part of this open-source toolchain and is
23 23 focused on creating a comprehensive environment for interactive and
24 24 exploratory computing in the Python programming language. It enables all of
25 25 the above tools to be used interactively and consists of two main components:
26 26
27 27 * An enhanced interactive Python shell with support for interactive plotting
28 28 and visualization.
29 29 * An architecture for interactive parallel computing.
30 30
31 31 With these components, it is possible to perform all aspects of a parallel
32 32 computation interactively. This type of workflow is particularly relevant in
33 33 scientific and numerical computing where algorithms, code and data are
34 34 continually evolving as the user/developer explores a problem. The broad
35 35 trends in computing (commodity clusters, multicore, cloud computing, etc.)
36 36 make these capabilities of IPython particularly relevant.
37 37
38 38 While IPython is a cross platform tool, it has particularly strong support for
39 39 Windows based compute clusters running Windows HPC Server 2008. This document
40 40 describes how to get started with IPython on Windows HPC Server 2008. The
41 41 content and emphasis here is practical: installing IPython, configuring
42 42 IPython to use the Windows job scheduler and running example parallel programs
43 43 interactively. A more complete description of IPython's parallel computing
44 44 capabilities can be found in IPython's online documentation
45 45 (http://ipython.org/documentation.html).
46 46
47 47 Setting up your Windows cluster
48 48 ===============================
49 49
50 50 This document assumes that you already have a cluster running Windows
51 51 HPC Server 2008. Here is a broad overview of what is involved with setting up
52 52 such a cluster:
53 53
54 54 1. Install Windows Server 2008 on the head and compute nodes in the cluster.
55 55 2. Set up the network configuration on each host. Each host should have a
56 56 static IP address.
57 57 3. On the head node, activate the "Active Directory Domain Services" role
58 58 and make the head node the domain controller.
59 59 4. Join the compute nodes to the newly created Active Directory (AD) domain.
60 60 5. Set up user accounts in the domain with shared home directories.
61 61 6. Install the HPC Pack 2008 on the head node to create a cluster.
62 62 7. Install the HPC Pack 2008 on the compute nodes.
63 63
64 64 More details about installing and configuring Windows HPC Server 2008 can be
65 65 found on the Windows HPC Home Page (http://www.microsoft.com/hpc). Regardless
66 66 of what steps you follow to set up your cluster, the remainder of this
67 67 document will assume that:
68 68
69 69 * There are domain users that can log on to the AD domain and submit jobs
70 70 to the cluster scheduler.
71 71 * These domain users have shared home directories. While shared home
72 72 directories are not required to use IPython, they make it much easier to
73 73 use IPython.
74 74
75 75 Installation of IPython and its dependencies
76 76 ============================================
77 77
78 78 IPython and all of its dependencies are freely available and open source.
79 79 These packages provide a powerful and cost-effective approach to numerical and
80 80 scientific computing on Windows. The following dependencies are needed to run
81 81 IPython on Windows:
82 82
83 83 * Python 2.6 or 2.7 (http://www.python.org)
84 84 * pywin32 (http://sourceforge.net/projects/pywin32/)
85 85 * PyReadline (https://launchpad.net/pyreadline)
86 86 * pyzmq (http://github.com/zeromq/pyzmq/downloads)
87 87 * IPython (http://ipython.org)
88 88
89 89 In addition, the following dependencies are needed to run the demos described
90 90 in this document.
91 91
92 92 * NumPy and SciPy (http://www.scipy.org)
93 93 * Matplotlib (http://matplotlib.sourceforge.net/)
94 94
95 95 The easiest way of obtaining these dependencies is through the Enthought
96 96 Python Distribution (EPD) (http://www.enthought.com/products/epd.php). EPD is
97 97 produced by Enthought, Inc. and contains all of these packages and others in a
98 98 single installer and is available free for academic users. While it is also
99 99 possible to download and install each package individually, this is a tedious
100 100 process. Thus, we highly recommend using EPD to install these packages on
101 101 Windows.
102 102
103 103 Regardless of how you install the dependencies, here are the steps you will
104 104 need to follow:
105 105
106 106 1. Install all of the packages listed above, either individually or using EPD
107 107 on the head node, compute nodes and user workstations.
108 108
109 109 2. Make sure that :file:`C:\\Python27` and :file:`C:\\Python27\\Scripts` are
110 110 in the system :envvar:`%PATH%` variable on each node.
111 111
112 112 3. Install the latest development version of IPython. This can be done by
113 113 downloading the development version from the IPython website
114 114 (http://ipython.org) and following the installation instructions.
115 115
116 116 Further details about installing IPython or its dependencies can be found in
117 117 the online IPython documentation (http://ipython.org/documentation.html).
118 118 Once you are finished with the installation, you can try IPython out by
119 119 opening a Windows Command Prompt and typing ``ipython``. This will
120 120 start IPython's interactive shell and you should see something like the
121 121 following screenshot:
122 122
123 123 .. image:: ipython_shell.*
124 124
125 125 Starting an IPython cluster
126 126 ===========================
127 127
128 128 To use IPython's parallel computing capabilities, you will need to start an
129 129 IPython cluster. An IPython cluster consists of one controller and multiple
130 130 engines:
131 131
132 132 IPython controller
133 133 The IPython controller manages the engines and acts as a gateway between
134 134 the engines and the client, which runs in the user's interactive IPython
135 135 session. The controller is started using the :command:`ipcontroller`
136 136 command.
137 137
138 138 IPython engine
139 139 IPython engines run a user's Python code in parallel on the compute nodes.
140 140 Engines are started using the :command:`ipengine` command, as sketched below.
141 141
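A minimal sketch of starting both pieces by hand (on Windows HPC Server you will
normally let :command:`ipcluster` do this for you, as described below)::

    REM on the machine that will run the controller
    ipcontroller

    REM on each compute node
    ipengine
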
142 142 Once these processes are started, a user can run Python code interactively and
143 143 in parallel on the engines from within the IPython shell using an appropriate
144 144 client. This includes the ability to interact with, plot and visualize data
145 145 from the engines.
146 146
147 147 IPython has a command line program called :command:`ipcluster` that automates
148 148 all aspects of starting the controller and engines on the compute nodes.
149 149 :command:`ipcluster` has full support for the Windows HPC job scheduler,
150 150 meaning that :command:`ipcluster` can use this job scheduler to start the
151 151 controller and engines. In our experience, the Windows HPC job scheduler is
152 152 particularly well suited for interactive applications, such as IPython. Once
153 153 :command:`ipcluster` is configured properly, a user can start an IPython
154 154 cluster from their local workstation almost instantly, without having to log
155 155 on to the head node (as is typically required by Unix based job schedulers).
156 156 This enables a user to move seamlessly between serial and parallel
157 157 computations.
158 158
159 159 In this section we show how to use :command:`ipcluster` to start an IPython
160 160 cluster using the Windows HPC Server 2008 job scheduler. To make sure that
161 161 :command:`ipcluster` is installed and working properly, you should first try
162 162 to start an IPython cluster on your local host. To do this, open a Windows
163 163 Command Prompt and type the following command::
164 164
165 165 ipcluster start --n=2
166 166
167 167 You should see a number of messages printed to the screen, ending with
168 168 "IPython cluster: started". The result should look something like the following
169 169 screenshot:
170 170
171 171 .. image:: ipcluster_start.*
172 172
173 173 At this point, the controller and two engines are running on your local host.
174 174 This configuration is useful for testing and for situations where you want to
175 175 take advantage of multiple cores on your local computer.
176 176
177 177 Now that we have confirmed that :command:`ipcluster` is working properly, we
178 178 describe how to configure and run an IPython cluster on an actual compute
179 179 cluster running Windows HPC Server 2008. Here is an outline of the needed
180 180 steps:
181 181
182 182 1. Create a cluster profile using: ``ipython profile create --parallel --profile=mycluster``
183 183
184 184 2. Edit configuration files in the directory :file:`.ipython\\profile_mycluster`
185 185
186 186 3. Start the cluster using: ``ipcluster start --profile=mycluster --n=32``
187 187
188 188 Creating a cluster profile
189 189 --------------------------
190 190
191 191 In most cases, you will have to create a cluster profile to use IPython on a
192 192 cluster. A cluster profile is a name (like "mycluster") that is associated
193 193 with a particular cluster configuration. The profile name is used by
194 194 :command:`ipcluster` when working with the cluster.
195 195
196 196 Associated with each cluster profile is a cluster directory. This cluster
197 197 directory is a specially named directory (typically located in the
198 198 :file:`.ipython` subdirectory of your home directory) that contains the
199 199 configuration files for a particular cluster profile, as well as log files and
200 200 security keys. The naming convention for cluster directories is:
201 201 :file:`profile_<profile name>`. Thus, the cluster directory for a profile named
202 202 "foo" would be :file:`.ipython\\cluster_foo`.
203 203
204 204 To create a new cluster profile (named "mycluster") and the associated cluster
205 205 directory, type the following command at the Windows Command Prompt::
206 206
207 ipython profile create --parallel profile=mycluster
207 ipython profile create --parallel --profile=mycluster
208 208
209 209 The output of this command is shown in the screenshot below. Notice how
210 210 the command prints out the location of the newly created cluster
211 211 directory.
212 212
213 213 .. image:: ipcluster_create.*
214 214
215 215 Configuring a cluster profile
216 216 -----------------------------
217 217
218 218 Next, you will need to configure the newly created cluster profile by editing
219 219 the following configuration files in the cluster directory:
220 220
221 221 * :file:`ipcluster_config.py`
222 222 * :file:`ipcontroller_config.py`
223 223 * :file:`ipengine_config.py`
224 224
225 225 When :command:`ipcluster` is run, these configuration files are used to
226 226 determine how the engines and controller will be started. In most cases,
227 227 you will only have to set a few of the attributes in these files.
228 228
229 229 To configure :command:`ipcluster` to use the Windows HPC job scheduler, you
230 230 will need to edit the following attributes in the file
231 231 :file:`ipcluster_config.py`::
232 232
233 233 # Set these at the top of the file to tell ipcluster to use the
234 234 # Windows HPC job scheduler.
235 235 c.IPClusterStart.controller_launcher = \
236 236 'IPython.parallel.apps.launcher.WindowsHPCControllerLauncher'
237 237 c.IPClusterEngines.engine_launcher = \
238 238 'IPython.parallel.apps.launcher.WindowsHPCEngineSetLauncher'
239 239
240 240 # Set these to the host name of the scheduler (head node) of your cluster.
241 241 c.WindowsHPCControllerLauncher.scheduler = 'HEADNODE'
242 242 c.WindowsHPCEngineSetLauncher.scheduler = 'HEADNODE'
243 243
244 244 There are a number of other configuration attributes that can be set, but
245 245 in most cases these will be sufficient to get you started.
246 246
247 247 .. warning::
248 248 If any of your configuration attributes involve specifying the location
249 249 of shared directories or files, you must make sure that you use UNC paths
250 250 like :file:`\\\\host\\share`. It is also important that you specify
251 251 these paths using raw Python strings: ``r'\\host\share'`` to make sure
252 252 that the backslashes are properly escaped, as in the example below.
253 253
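For instance, a hypothetical attribute pointing at a shared directory would be
written like this (the attribute name is purely illustrative)::

    # the raw string keeps the backslashes of the UNC path intact
    c.SomeLauncher.work_dir = r'\\HEADNODE\share\ipython'
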
254 254 Starting the cluster profile
255 255 ----------------------------
256 256
257 257 Once a cluster profile has been configured, starting an IPython cluster using
258 258 the profile is simple::
259 259
260 ipcluster start profile=mycluster n=32
260 ipcluster start --profile=mycluster --n=32
261 261
262 262 The ``--n`` option tells :command:`ipcluster` how many engines to start (in
263 263 this case 32). Stopping the cluster is as simple as typing Control-C.
264 264
265 265 Using the HPC Job Manager
266 266 -------------------------
267 267
268 268 When ``ipcluster start`` is run the first time, :command:`ipcluster` creates
269 269 two XML job description files in the cluster directory:
270 270
271 271 * :file:`ipcontroller_job.xml`
272 272 * :file:`ipengineset_job.xml`
273 273
274 274 Once these files have been created, they can be imported into the HPC Job
275 275 Manager application. Then, the controller and engines for that profile can be
276 276 started using the HPC Job Manager directly, without using :command:`ipcluster`.
277 277 However, anytime the cluster profile is re-configured, ``ipcluster start``
278 278 must be run again to regenerate the XML job description files. The
279 279 following screenshot shows what the HPC Job Manager interface looks like
280 280 with a running IPython cluster.
281 281
282 282 .. image:: hpc_job_manager.*
283 283
284 284 Performing a simple interactive parallel computation
285 285 ====================================================
286 286
287 287 Once you have started your IPython cluster, you can start to use it. To do
288 288 this, open up a new Windows Command Prompt and start up IPython's interactive
289 289 shell by typing::
290 290
291 291 ipython
292 292
293 293 Then you can create a :class:`MultiEngineClient` instance for your profile and
294 294 use the resulting instance to do a simple interactive parallel computation. In
295 295 the code and screenshot that follows, we take a simple Python function and
296 296 apply it to each element of an array of integers in parallel using the
297 297 :meth:`MultiEngineClient.map` method:
298 298
299 299 .. sourcecode:: ipython
300 300
301 301 In [1]: from IPython.parallel import *
302 302
303 303 In [2]: mec = MultiEngineClient(profile='mycluster')
304 304
305 305 In [3]: mec.get_ids()
306 306 Out[3]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
307 307
308 308 In [4]: def f(x):
309 309 ...: return x**10
310 310
311 311 In [5]: mec.map(f, range(15)) # f is applied in parallel
312 312 Out[5]:
313 313 [0,
314 314 1,
315 315 1024,
316 316 59049,
317 317 1048576,
318 318 9765625,
319 319 60466176,
320 320 282475249,
321 321 1073741824,
322 322 3486784401L,
323 323 10000000000L,
324 324 25937424601L,
325 325 61917364224L,
326 326 137858491849L,
327 327 289254654976L]
328 328
329 329 The :meth:`map` method has the same signature as Python's builtin :func:`map`
330 330 function, but runs the calculation in parallel. More involved examples of using
331 331 :class:`MultiEngineClient` are provided in the examples that follow.
332 332
333 333 .. image:: mec_simple.*
334 334