Final work on the Win HPC whitepaper.
Brian Granger
@@ -1,17 +1,19 @@
1 1 .. _parallel_index:
2 2
3 3 ====================================
4 4 Using IPython for parallel computing
5 5 ====================================
6 6
7 7 .. toctree::
8 8 :maxdepth: 2
9 9
10 10 parallel_intro.txt
11 11 parallel_process.txt
12 12 parallel_multiengine.txt
13 13 parallel_task.txt
14 14 parallel_mpi.txt
15 15 parallel_security.txt
16 parallel_winhpc.txt
17 parallel_demos.txt
16 18
17 19
@@ -1,270 +1,282 @@
1 1 =================
2 2 Parallel examples
3 3 =================
4 4
5 5 In this section we describe two more involved examples of using an IPython
6 6 cluster to perform a parallel computation. In these examples, we will be using
7 7 IPython's "pylab" mode, which enables interactive plotting using the
8 8 Matplotlib package. IPython can be started in this mode by typing::
9 9
10 10 ipython -p pylab
11 11
12 12 at the system command line. If this prints an error message, you will
13 13 need to install the default profiles from within IPython by doing:
14 14
15 15 .. sourcecode:: ipython
16 16
17 17 In [1]: %install_profiles
18 18
19 19 and then restarting IPython.
20 20
21 21 150 million digits of pi
22 22 ========================
23 23
24 24 In this example we would like to study the distribution of digits in the
25 25 number pi (in base 10). While it is not known if pi is a normal number (a
26 26 number is normal in base 10 if 0-9 occur with equal likelihood), numerical
27 27 investigations suggest that it is. We will begin with a serial calculation on
28 28 10,000 digits of pi and then perform a parallel calculation involving 150
29 29 million digits.
30 30
31 31 In both the serial and parallel calculation we will be using functions defined
32 32 in the :file:`pidigits.py` file, which is available in the
33 33 :file:`docs/examples/kernel` directory of the IPython source distribution.
34 34 These functions provide basic facilities for working with the digits of pi and
35 35 can be loaded into IPython by putting :file:`pidigits.py` in your current
36 36 working directory and then doing:
37 37
38 38 .. sourcecode:: ipython
39 39
40 40 In [1]: run pidigits.py
41 41
42 42 Serial calculation
43 43 ------------------
44 44
45 45 For the serial calculation, we will use SymPy (http://www.sympy.org) to
46 46 calculate 10,000 digits of pi and then look at the frequencies of the digits
47 47 0-9. Out of 10,000 digits, we expect each digit to occur 1,000 times. While
48 48 SymPy is capable of calculating many more digits of pi, our purpose here is to
49 49 set the stage for the much larger parallel calculation.
50 50
51 51 In this example, we use two functions from :file:`pidigits.py`:
52 52 :func:`one_digit_freqs` (which calculates how many times each digit occurs)
53 53 and :func:`plot_one_digit_freqs` (which uses Matplotlib to plot the result).
54 54 Here is an interactive IPython session that uses these functions with
55 55 SymPy:
56 56
57 57 .. sourcecode:: ipython
58 58
59 59 In [7]: import sympy
60 60
61 61 In [8]: pi = sympy.pi.evalf(40)
62 62
63 63 In [9]: pi
64 64 Out[9]: 3.141592653589793238462643383279502884197
65 65
66 66 In [10]: pi = sympy.pi.evalf(10000)
67 67
68 68 In [11]: digits = (d for d in str(pi)[2:]) # create a sequence of digits
69 69
70 70 In [12]: run pidigits.py # load one_digit_freqs/plot_one_digit_freqs
71 71
72 72 In [13]: freqs = one_digit_freqs(digits)
73 73
74 74 In [14]: plot_one_digit_freqs(freqs)
75 75 Out[14]: [<matplotlib.lines.Line2D object at 0x18a55290>]
76 76
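In the session above, :func:`one_digit_freqs` does the counting. Here is a rough sketch of its behavior; this is an illustrative guess, not the actual code from :file:`pidigits.py`:

```python
import numpy as np

def one_digit_freqs(digits):
    # Count how many times each digit 0-9 occurs in an iterable of
    # digit characters (illustrative sketch, not the pidigits.py code).
    freqs = np.zeros(10, dtype=int)
    for d in digits:
        freqs[int(d)] += 1
    return freqs

print(list(one_digit_freqs("314159")))  # [0, 2, 0, 1, 1, 1, 0, 0, 0, 1]
```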
77 77 The resulting plot of the single digit counts shows that each digit occurs
78 78 approximately 1,000 times, but that with only 10,000 digits the
79 79 statistical fluctuations are still rather large:
80 80
81 81 .. image:: single_digits.*
82 82
83 83 It is clear that to reduce the relative fluctuations in the counts, we need
84 84 to look at many more digits of pi. That brings us to the parallel calculation.
85 85
86 86 Parallel calculation
87 87 --------------------
88 88
89 89 Calculating many digits of pi is a challenging computational problem in itself.
90 90 Because we want to focus on the distribution of digits in this example, we
91 91 will use pre-computed digits of pi from the website of Professor Yasumasa
92 92 Kanada at the University of Tokyo (http://www.super-computing.org). These
93 93 digits come in a set of text files (ftp://pi.super-computing.org/.2/pi200m/)
94 94 that each have 10 million digits of pi.
95 95
96 96 For the parallel calculation, we have copied these files to the local hard
97 97 drives of the compute nodes. A total of 15 of these files will be used, for a
98 98 total of 150 million digits of pi. To make things a little more interesting we
99 99 will calculate the frequencies of all two-digit sequences (00-99) and then plot
100 100 the result using a 2D matrix in Matplotlib.
101 101
102 102 The overall idea of the calculation is simple: each IPython engine will
103 103 compute the two digit counts for the digits in a single file. Then in a final
104 104 step the counts from each engine will be added up. To perform this
105 105 calculation, we will need two top-level functions from :file:`pidigits.py`:
106 106
107 107 .. literalinclude:: ../../examples/kernel/pidigits.py
108 108 :language: python
109 109 :lines: 34-49
110 110
111 111 We will also use the :func:`plot_two_digit_freqs` function to plot the
112 112 results. The code to run this calculation in parallel is contained in
113 113 :file:`docs/examples/kernel/parallelpi.py`. This code can be run in parallel
114 114 using IPython by following these steps:
115 115
116 116 1. Copy the text files with the digits of pi
117 117 (ftp://pi.super-computing.org/.2/pi200m/) to the working directory of the
118 118 engines on the compute nodes.
119 2. Use :command:`ipcluster` to start 15 engines. We used an 8 core cluster
120 with hyperthreading enabled which makes the 8 cores looks like 16 (1
121 controller + 15 engines) in the OS. However, the maximum speedup we can
122 observe is still only 8x.
119 2. Use :command:`ipcluster` to start 15 engines. We used an 8 core (2 quad
120 core CPUs) cluster with hyperthreading enabled, which makes the 8 cores
121 look like 16 (1 controller + 15 engines) in the OS. However, the maximum
122 speedup we can observe is still only 8x.
123 123 3. With the file :file:`parallelpi.py` in your current working directory, open
124 124 up IPython in pylab mode and type ``run parallelpi.py``.
125 125
126 126 When run on our 8 core cluster, we observe a speedup of 7.7x. This is slightly
127 127 less than linear scaling (8x) because the controller is also running on one of
128 128 the cores.
129 129
130 130 To emphasize the interactive nature of IPython, we now show how the
131 131 calculation can also be run by simply typing the commands from
132 132 :file:`parallelpi.py` interactively into IPython:
133 133
134 134 .. sourcecode:: ipython
135 135
136 136 In [1]: from IPython.kernel import client
137 137 2009-11-19 11:32:38-0800 [-] Log opened.
138 138
139 # The MultiEngineClient allows us to use the engines interactively
139 # The MultiEngineClient allows us to use the engines interactively.
140 # We simply pass MultiEngineClient the name of the cluster profile we
141 # are using.
140 142 In [2]: mec = client.MultiEngineClient(profile='mycluster')
141 143 2009-11-19 11:32:44-0800 [-] Connecting [0]
142 144 2009-11-19 11:32:44-0800 [Negotiation,client] Connected: ./ipcontroller-mec.furl
143 145
144 146 In [3]: mec.get_ids()
145 147 Out[3]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
146 148
147 149 In [4]: run pidigits.py
148 150
149 151 In [5]: filestring = 'pi200m-ascii-%(i)02dof20.txt'
150 152
153 # Create the list of files to process.
151 154 In [6]: files = [filestring % {'i':i} for i in range(1,16)]
152 155
153 156 In [7]: files
154 157 Out[7]:
155 158 ['pi200m-ascii-01of20.txt',
156 159 'pi200m-ascii-02of20.txt',
157 160 'pi200m-ascii-03of20.txt',
158 161 'pi200m-ascii-04of20.txt',
159 162 'pi200m-ascii-05of20.txt',
160 163 'pi200m-ascii-06of20.txt',
161 164 'pi200m-ascii-07of20.txt',
162 165 'pi200m-ascii-08of20.txt',
163 166 'pi200m-ascii-09of20.txt',
164 167 'pi200m-ascii-10of20.txt',
165 168 'pi200m-ascii-11of20.txt',
166 169 'pi200m-ascii-12of20.txt',
167 170 'pi200m-ascii-13of20.txt',
168 171 'pi200m-ascii-14of20.txt',
169 172 'pi200m-ascii-15of20.txt']
170 173
171 174 # This is the parallel calculation using the MultiEngineClient.map method
172 175 # which applies compute_two_digit_freqs to each file in files in parallel.
173 176 In [8]: freqs_all = mec.map(compute_two_digit_freqs, files)
174 177
175 178 # Add up the frequencies from each engine.
176 179 In [8]: freqs = reduce_freqs(freqs_all)
177 180
178 181 In [9]: plot_two_digit_freqs(freqs)
179 182 Out[9]: <matplotlib.image.AxesImage object at 0x18beb110>
180 183
181 184 In [10]: plt.title('2 digit counts of 150m digits of pi')
182 185 Out[10]: <matplotlib.text.Text object at 0x18d1f9b0>
183 186
184 187 The resulting plot generated by Matplotlib is shown below. The colors indicate
185 188 which two digit sequences are more (red) or less (blue) likely to occur in the
186 189 first 150 million digits of pi. We clearly see that the sequence "41" is
187 190 most likely and that "06" and "07" are least likely. Further analysis would
188 191 show that the relative size of the statistical fluctuations has decreased
189 192 compared to the 10,000 digit calculation.
190 193
191 194 .. image:: two_digit_counts.*
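The decrease in relative fluctuation follows from simple counting statistics. A quick back-of-the-envelope check, modeling each bin as a binomial count (a sketch for intuition, not code from the examples):

```python
import math

def rel_fluctuation(n_digits, n_bins):
    # Relative statistical fluctuation of a single bin count when each
    # digit (or digit pair) is uniform over n_bins outcomes.
    p = 1.0 / n_bins
    expected = n_digits * p
    sigma = math.sqrt(n_digits * p * (1 - p))
    return sigma / expected

print(rel_fluctuation(10000, 10))        # ~0.03, i.e. ~3% for the serial run
print(rel_fluctuation(150000000, 100))   # ~0.0008, i.e. ~0.08% in parallel
```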
192 195
193 To conclude this example, we summarize the key features of IPython's parallel
194 architecture that this example demonstrates:
195
196 * Serial code can be parallelized often with only a few extra lines of code.
197 In this case we have used :meth:`MultiEngineClient.map`; the
198 :class:`MultiEngineClient` class has a number of other methods that provide
199 more fine grained control of the IPython cluster.
200 * The resulting parallel code can be run without ever leaving the IPython's
201 interactive shell.
202 * Any data computed in parallel can be explored interactively through
203 visualization or further numerical calculations.
204
205 196
206 197 Parallel options pricing
207 198 ========================
208 199
209 200 An option is a financial contract that gives the buyer of the contract the
210 201 right to buy (a "call") or sell (a "put") a secondary asset (a stock for
211 202 example) at a particular date in the future (the expiration date) for a
212 203 pre-agreed upon price (the strike price). For this right, the buyer pays the
213 204 seller a premium (the option price). There are a wide variety of flavors of
214 205 options (American, European, Asian, etc.) that are useful for different
215 206 purposes: hedging against risk, speculation, etc.
216 207
217 208 Much of modern finance is driven by the need to price these contracts
218 209 accurately based on what is known about the properties (such as volatility) of
219 210 the underlying asset. One method of pricing options is to use a Monte Carlo
220 simulation of the underlying assets. In this example we use this approach to
221 price both European and Asian (path dependent) options for various strike
211 simulation of the underlying asset price. In this example we use this approach
212 to price both European and Asian (path dependent) options for various strike
222 213 prices and volatilities.
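As a concrete illustration of the Monte Carlo idea, here is a hedged sketch of pricing a European call under geometric Brownian motion. The function name and parameters here are illustrative; the actual :file:`mcpricer.py` implementation, which also handles the path-dependent Asian options, differs:

```python
import numpy as np

def euro_call_mc(S0, K, sigma, r, T, n_paths=100000, seed=42):
    # Simulate terminal asset prices under risk-neutral geometric
    # Brownian motion, then average the discounted call payoff.
    rng = np.random.RandomState(seed)
    Z = rng.standard_normal(n_paths)
    ST = S0 * np.exp((r - 0.5 * sigma ** 2) * T + sigma * np.sqrt(T) * Z)
    payoff = np.maximum(ST - K, 0.0)
    return np.exp(-r * T) * payoff.mean()

price = euro_call_mc(100.0, 100.0, 0.2, 0.05, 1.0)
print(round(price, 2))  # close to the Black-Scholes value of ~10.45
```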
223 214
224 215 The code for this example can be found in the :file:`docs/examples/kernel`
225 directory of the IPython source.
226
227 The function :func:`price_options`, calculates the option prices for a single
228 option (:file:`mcpricer.py`):
216 directory of the IPython source. The function :func:`price_options` in
217 :file:`mcpricer.py` implements the basic Monte Carlo pricing algorithm using
218 the NumPy package and is shown here:
229 219
230 220 .. literalinclude:: ../../examples/kernel/mcpricer.py
231 221 :language: python
232 222
233 To run this code in parallel, we will use IPython's :class:`TaskClient`, which
234 distributes work to the engines using dynamic load balancing. This client
235 can be used along side the :class:`MultiEngineClient` shown in the previous
236 example.
237
238 Here is the code that calls :func:`price_options` for a number of different
239 volatilities and strike prices in parallel:
223 To run this code in parallel, we will use IPython's :class:`TaskClient` class,
224 which distributes work to the engines using dynamic load balancing. This
225 client can be used alongside the :class:`MultiEngineClient` class shown in
226 the previous example. The parallel calculation using :class:`TaskClient` can
227 be found in the file :file:`mcdriver.py`. The code in this file creates a
228 :class:`TaskClient` instance and then submits a set of tasks using
229 :meth:`TaskClient.run` that calculate the option prices for different
230 volatilities and strike prices. The results are then plotted as a 2D contour
231 plot using Matplotlib.
240 232
241 233 .. literalinclude:: ../../examples/kernel/mcdriver.py
242 234 :language: python
243 235
244 To run this code in parallel, start an IPython cluster using
245 :command:`ipcluster`, open IPython in the pylab mode with the file
246 :file:`mcdriver.py` in your current working directory and then type:
236 To use this code, start an IPython cluster using :command:`ipcluster`, open
237 IPython in the pylab mode with the file :file:`mcdriver.py` in your current
238 working directory and then type:
247 239
248 240 .. sourcecode:: ipython
249 241
250 242 In [7]: run mcdriver.py
251 243 Submitted tasks: [0, 1, 2, ...]
252 244
253 245 Once all the tasks have finished, the results can be plotted using the
254 246 :func:`plot_options` function. Here we make contour plots of the Asian
255 call and Asian put as function of the volatility and strike price:
247 call and Asian put options as a function of the volatility and strike price:
256 248
257 249 .. sourcecode:: ipython
258 250
259 251 In [8]: plot_options(sigma_vals, K_vals, prices['acall'])
260 252
261 253 In [9]: plt.figure()
262 254 Out[9]: <matplotlib.figure.Figure object at 0x18c178d0>
263 255
264 256 In [10]: plot_options(sigma_vals, K_vals, prices['aput'])
265 257
266 The plots generated by Matplotlib will look like this:
258 These results are shown in the two figures below. On an 8 core cluster the
259 entire calculation (10 strike prices, 10 volatilities, 100,000 paths for each)
260 took 30 seconds in parallel, giving a speedup of 7.7x, which is comparable
261 to the speedup observed in our previous example.
267 262
268 263 .. image:: asian_call.*
269 264
270 265 .. image:: asian_put.*
266
267 Conclusion
268 ==========
269
270 To conclude these examples, we summarize the key features of IPython's
271 parallel architecture that have been demonstrated:
272
273 * Serial code can often be parallelized with only a few extra lines of code.
274 We have used the :class:`MultiEngineClient` and :class:`TaskClient` classes
275 for this purpose.
276 * The resulting parallel code can be run without ever leaving IPython's
277 interactive shell.
278 * Any data computed in parallel can be explored interactively through
279 visualization or further numerical calculations.
280 * We have run these examples on a cluster running Windows HPC Server 2008.
281 IPython's built-in support for the Windows HPC job scheduler makes it
282 easy to get started with IPython's parallel capabilities.
@@ -1,332 +1,333 @@
1 ========================================
2 Getting started
3 ========================================
1 ============================================
2 Getting started with Windows HPC Server 2008
3 ============================================
4 4
5 5 Introduction
6 6 ============
7 7
8 The Python programming language is increasingly popular language for numerical
9 computing. This is due to a unique combination of factors. First, Python is a
10 high-level and *interactive* language that is well matched for interactive
11 numerical work. Second, it is easy (often times trivial) to integrate legacy
12 C/C++/Fortran code into Python. Third, a large number of high-quality open
13 source projects provide all the needed building blocks for numerical
14 computing: numerical arrays (NumPy), algorithms (SciPy), 2D/3D Visualization
15 (Matplotlib, Mayavi, Chaco), Symbolic Mathematics (Sage, Sympy) and others.
8 The Python programming language is an increasingly popular language for
9 numerical computing. This is due to a unique combination of factors. First,
10 Python is a high-level and *interactive* language that is well matched to
11 interactive numerical work. Second, it is easy (often trivial) to
12 integrate legacy C/C++/Fortran code into Python. Third, a large number of
13 high-quality open source projects provide all the needed building blocks for
14 numerical computing: numerical arrays (NumPy), algorithms (SciPy), 2D/3D
15 Visualization (Matplotlib, Mayavi, Chaco), Symbolic Mathematics (Sage, Sympy)
16 and others.
16 17
17 18 The IPython project is a core part of this open-source toolchain and is
18 19 focused on creating a comprehensive environment for interactive and
19 20 exploratory computing in the Python programming language. It enables all of
20 21 the above tools to be used interactively and consists of two main components:
21 22
22 23 * An enhanced interactive Python shell with support for interactive plotting
23 24 and visualization.
24 25 * An architecture for interactive parallel computing.
25 26
26 27 With these components, it is possible to perform all aspects of a parallel
27 28 computation interactively. This type of workflow is particularly relevant in
28 29 scientific and numerical computing where algorithms, code and data are
29 30 continually evolving as the user/developer explores a problem. The broad
30 31 trends in computing (commodity clusters, multicore, cloud computing, etc.)
31 32 make these capabilities of IPython particularly relevant.
32 33
33 34 While IPython is a cross platform tool, it has particularly strong support for
34 35 Windows based compute clusters running Windows HPC Server 2008. This document
35 36 describes how to get started with IPython on Windows HPC Server 2008. The
36 37 content and emphasis here is practical: installing IPython, configuring
37 38 IPython to use the Windows job scheduler and running example parallel programs
38 39 interactively. A more complete description of IPython's parallel computing
39 40 capabilities can be found in IPython's online documentation
40 41 (http://ipython.scipy.org/moin/Documentation).
41 42
42 43 Setting up your Windows cluster
43 44 ===============================
44 45
45 46 This document assumes that you already have a cluster running Windows
46 47 HPC Server 2008. Here is a broad overview of what is involved with setting up
47 48 such a cluster:
48 49
49 50 1. Install Windows Server 2008 on the head and compute nodes in the cluster.
50 51 2. Setup the network configuration on each host. Each host should have a
51 52 static IP address.
52 53 3. On the head node, activate the "Active Directory Domain Services" role
53 54 and make the head node the domain controller.
54 55 4. Join the compute nodes to the newly created Active Directory (AD) domain.
55 56 5. Setup user accounts in the domain with shared home directories.
56 57 6. Install the HPC Pack 2008 on the head node to create a cluster.
57 58 7. Install the HPC Pack 2008 on the compute nodes.
58 59
59 60 More details about installing and configuring Windows HPC Server 2008 can be
60 61 found on the Windows HPC Home Page (http://www.microsoft.com/hpc). Regardless
61 62 of what steps you follow to set up your cluster, the remainder of this
62 63 document will assume that:
63 64
64 65 * There are domain users that can log on to the AD domain and submit jobs
65 66 to the cluster scheduler.
66 67 * These domain users have shared home directories. While shared home
67 68 directories are not required to use IPython, they make it much easier to
68 69 use IPython.
69 70
70 71 Installation of IPython and its dependencies
71 72 ============================================
72 73
73 74 IPython and all of its dependencies are freely available and open source.
74 75 These packages provide a powerful and cost-effective approach to numerical and
75 76 scientific computing on Windows. The following dependencies are needed to run
76 77 IPython on Windows:
77 78
78 79 * Python 2.5 or 2.6 (http://www.python.org)
79 80 * pywin32 (http://sourceforge.net/projects/pywin32/)
80 81 * PyReadline (https://launchpad.net/pyreadline)
81 82 * zope.interface and Twisted (http://twistedmatrix.com)
82 83 * Foolscap (http://foolscap.lothar.com/trac)
83 84 * pyOpenSSL (https://launchpad.net/pyopenssl)
84 85 * IPython (http://ipython.scipy.org)
85 86
86 87 In addition, the following dependencies are needed to run the demos described
87 88 in this document.
88 89
89 90 * NumPy and SciPy (http://www.scipy.org)
90 91 * wxPython (http://www.wxpython.org)
91 92 * Matplotlib (http://matplotlib.sourceforge.net/)
92 93
93 94 The easiest way of obtaining these dependencies is through the Enthought
94 95 Python Distribution (EPD) (http://www.enthought.com/products/epd.php). EPD is
95 96 produced by Enthought, Inc. and contains all of these packages and others in a
96 97 single installer and is available free for academic users. While it is also
97 98 possible to download and install each package individually, this is a tedious
98 99 process. Thus, we highly recommend using EPD to install these packages on
99 100 Windows.
100 101
101 102 Regardless of how you install the dependencies, here are the steps you will
102 103 need to follow:
103 104
104 105 1. Install all of the packages listed above, either individually or using EPD
105 106 on the head node, compute nodes and user workstations.
106 107
107 108 2. Make sure that :file:`C:\\Python25` and :file:`C:\\Python25\\Scripts` are
108 109 in the system :envvar:`%PATH%` variable on each node.
109 110
110 111 3. Install the latest development version of IPython. This can be done by
111 112 downloading the development version from the IPython website
112 113 (http://ipython.scipy.org) and following the installation instructions.
113 114
114 115 Further details about installing IPython or its dependencies can be found in
115 116 the online IPython documentation (http://ipython.scipy.org/moin/Documentation).
116 117 Once you are finished with the installation, you can try IPython out by
117 118 opening a Windows Command Prompt and typing ``ipython``. This will
118 119 start IPython's interactive shell and you should see something like the
119 120 following screenshot:
120 121
121 122 .. image:: ipython_shell.*
122 123
123 124 Starting an IPython cluster
124 125 ===========================
125 126
126 127 To use IPython's parallel computing capabilities, you will need to start an
127 128 IPython cluster. An IPython cluster consists of one controller and multiple
128 129 engines:
129 130
130 131 IPython controller
131 132 The IPython controller manages the engines and acts as a gateway between
132 133 the engines and the client, which runs in the user's interactive IPython
133 134 session. The controller is started using the :command:`ipcontroller`
134 135 command.
135 136
136 137 IPython engine
137 138 IPython engines run a user's Python code in parallel on the compute nodes.
137 138 Engines are started using the :command:`ipengine` command.
139 140
140 141 Once these processes are started, a user can run Python code interactively and
141 142 in parallel on the engines from within the IPython shell using an appropriate
142 143 client. This includes the ability to interact with, plot and visualize data
143 144 from the engines.
144 145
145 146 IPython has a command line program called :command:`ipcluster` that automates
146 147 all aspects of starting the controller and engines on the compute nodes.
147 148 :command:`ipcluster` has full support for the Windows HPC job scheduler,
148 149 meaning that :command:`ipcluster` can use this job scheduler to start the
149 150 controller and engines. In our experience, the Windows HPC job scheduler is
150 151 particularly well suited for interactive applications, such as IPython. Once
151 152 :command:`ipcluster` is configured properly, a user can start an IPython
152 153 cluster from their local workstation almost instantly, without having to log
153 154 on to the head node (as is typically required by Unix-based job schedulers).
154 155 This enables a user to move seamlessly between serial and parallel
155 156 computations.
156 157
157 158 In this section we show how to use :command:`ipcluster` to start an IPython
158 159 cluster using the Windows HPC Server 2008 job scheduler. To make sure that
159 160 :command:`ipcluster` is installed and working properly, you should first try
160 161 to start an IPython cluster on your local host. To do this, open a Windows
161 162 Command Prompt and type the following command::
162 163
163 164 ipcluster start -n 2
164 165
165 166 You should see a number of messages printed to the screen, ending with
166 167 "IPython cluster: started". The result should look something like the following
167 168 screenshot:
168 169
169 170 .. image:: ipcluster_start.*
170 171
171 172 At this point, the controller and two engines are running on your local host.
172 173 This configuration is useful for testing and for situations where you want to
173 174 take advantage of multiple cores on your local computer.
174 175
175 176 Now that we have confirmed that :command:`ipcluster` is working properly, we
176 177 describe how to configure and run an IPython cluster on an actual compute
177 178 cluster running Windows HPC Server 2008. Here is an outline of the needed
178 179 steps:
179 180
180 181 1. Create a cluster profile using: ``ipcluster create -p mycluster``
181 182
182 183 2. Edit configuration files in the directory :file:`.ipython\\cluster_mycluster`
183 184
184 185 3. Start the cluster using: ``ipcluster start -p mycluster -n 32``
185 186
186 187 Creating a cluster profile
187 188 --------------------------
188 189
189 190 In most cases, you will have to create a cluster profile to use IPython on a
190 191 cluster. A cluster profile is a name (like "mycluster") that is associated
191 192 with a particular cluster configuration. The profile name is used by
192 193 :command:`ipcluster` when working with the cluster.
193 194
194 195 Associated with each cluster profile is a cluster directory. This cluster
195 196 directory is a specially named directory (typically located in the
196 197 :file:`.ipython` subdirectory of your home directory) that contains the
197 198 configuration files for a particular cluster profile, as well as log files and
198 199 security keys. The naming convention for cluster directories is:
199 200 :file:`cluster_<profile name>`. Thus, the cluster directory for a profile named
200 201 "foo" would be :file:`.ipython\\cluster_foo`.
201 202
202 203 To create a new cluster profile (named "mycluster") and the associated cluster
203 204 directory, type the following command at the Windows Command Prompt::
204 205
205 206 ipcluster create -p mycluster
206 207
207 208 The output of this command is shown in the screenshot below. Notice how
208 209 :command:`ipcluster` prints out the location of the newly created cluster
209 210 directory.
210 211
211 212 .. image:: ipcluster_create.*
212 213
213 214 Configuring a cluster profile
214 215 -----------------------------
215 216
216 217 Next, you will need to configure the newly created cluster profile by editing
217 218 the following configuration files in the cluster directory:
218 219
219 220 * :file:`ipcluster_config.py`
220 221 * :file:`ipcontroller_config.py`
221 222 * :file:`ipengine_config.py`
222 223
223 224 When :command:`ipcluster` is run, these configuration files are used to
224 225 determine how the engines and controller will be started. In most cases,
225 226 you will only have to set a few of the attributes in these files.
226 227
227 228 To configure :command:`ipcluster` to use the Windows HPC job scheduler, you
228 229 will need to edit the following attributes in the file
229 230 :file:`ipcluster_config.py`::
230 231
231 232 # Set these at the top of the file to tell ipcluster to use the
232 233 # Windows HPC job scheduler.
233 234 c.Global.controller_launcher = \
234 235 'IPython.kernel.launcher.WindowsHPCControllerLauncher'
235 236 c.Global.engine_launcher = \
236 237 'IPython.kernel.launcher.WindowsHPCEngineSetLauncher'
237 238
238 239 # Set these to the host name of the scheduler (head node) of your cluster.
239 240 c.WindowsHPCControllerLauncher.scheduler = 'HEADNODE'
240 241 c.WindowsHPCEngineSetLauncher.scheduler = 'HEADNODE'
241 242
242 243 There are a number of other configuration attributes that can be set, but
243 244 in most cases these will be sufficient to get you started.
244 245
245 246 .. warning::
246 247 If any of your configuration attributes involve specifying the location
247 248 of shared directories or files, you must make sure that you use UNC paths
248 249 like :file:`\\\\host\\share`. It is also important that you specify
249 250 these paths using raw Python strings: ``r'\\host\share'`` to make sure
250 251 that the backslashes are properly escaped.
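The escaping issue in the warning above can be checked directly (the host and share names here are hypothetical):

```python
# Raw and escaped string literals denote the same UNC path; the raw form
# is less error-prone because the backslashes need no doubling.
share_raw = r'\\HEADNODE\ipython_share'
share_escaped = '\\\\HEADNODE\\ipython_share'
print(share_raw == share_escaped)  # True
```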
251 252
252 253 Starting the cluster profile
253 254 ----------------------------
254 255
255 256 Once a cluster profile has been configured, starting an IPython cluster using
256 257 the profile is simple::
257 258
258 259 ipcluster start -p mycluster -n 32
259 260
260 261 The ``-n`` option tells :command:`ipcluster` how many engines to start (in
261 262 this case 32). Stopping the cluster is as simple as typing Control-C.
262 263
263 264 Using the HPC Job Manager
264 265 -------------------------
265 266
266 267 When ``ipcluster start`` is run the first time, :command:`ipcluster` creates
267 268 two XML job description files in the cluster directory:
268 269
269 270 * :file:`ipcontroller_job.xml`
270 271 * :file:`ipengineset_job.xml`
271 272
272 273 Once these files have been created, they can be imported into the HPC Job
273 274 Manager application. Then, the controller and engines for that profile can be
274 275 started using the HPC Job Manager directly, without using :command:`ipcluster`.
275 276 However, anytime the cluster profile is re-configured, ``ipcluster start``
276 277 must be run again to regenerate the XML job description files. The
277 278 following screenshot shows what the HPC Job Manager interface looks like
278 279 with a running IPython cluster.
279 280
280 281 .. image:: hpc_job_manager.*
281 282
282 283 Performing a simple interactive parallel computation
283 284 ====================================================
284 285
285 286 Once you have started your IPython cluster, you can start to use it. To do
286 287 this, open up a new Windows Command Prompt and start up IPython's interactive
287 288 shell by typing::
288 289
289 290 ipython
290 291
291 292 Then you can create a :class:`MultiEngineClient` instance for your profile and
292 293 use the resulting instance to do a simple interactive parallel computation. In
293 294 the code and screenshot that follows, we take a simple Python function and
294 295 apply it to each element of an array of integers in parallel using the
295 296 :meth:`MultiEngineClient.map` method:
296 297
297 298 .. sourcecode:: ipython
298 299
299 300 In [1]: from IPython.kernel.client import *
300 301
301 302 In [2]: mec = MultiEngineClient(profile='mycluster')
302 303
303 304 In [3]: mec.get_ids()
304 305 Out[3]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
305 306
306 307 In [4]: def f(x):
307 308 ...: return x**10
308 309
309 310 In [5]: mec.map(f, range(15)) # f is applied in parallel
310 311 Out[5]:
311 312 [0,
312 313 1,
313 314 1024,
314 315 59049,
315 316 1048576,
316 317 9765625,
317 318 60466176,
318 319 282475249,
319 320 1073741824,
320 321 3486784401L,
321 322 10000000000L,
322 323 25937424601L,
323 324 61917364224L,
324 325 137858491849L,
325 326 289254654976L]
326 327
327 328 The :meth:`map` method has the same signature as Python's builtin :func:`map`
328 329 function, but runs the calculation in parallel. More involved examples of using
329 330 :class:`MultiEngineClient` are provided in the examples that follow.
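The equivalence with the builtin is easy to check serially (a sketch; the parallel call needs a connected cluster, so it is shown only as a comment):

```python
def f(x):
    return x ** 10

serial = list(map(f, range(15)))
# With a running cluster and a connected client `mec`, the parallel
# version is mec.map(f, range(15)): same values, computed on the engines.
print(serial[-1])  # 289254654976, matching the last entry of Out[5] above
```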
330 331
331 332 .. image:: mec_simple.*
332 333