Initial draft of Windows HPC documentation.
Brian Granger -
1 NO CONTENT: new file 100644, binary diff hidden
@@ -0,0 +1,126 b''
1 =================
2 Parallel examples
3 =================
4
5 In this section we describe a few more involved examples of using an IPython
6 cluster to perform a parallel computation.
7
8 150 million digits of pi
9 ========================
10
11 In this example we would like to study the distribution of digits in the
12 number pi. More specifically, we are going to study how often each
13 two-digit sequence occurs in the first 150 million digits of pi. If the digits
14 0-9 occur with equal probability, we expect that each two-digit sequence
15 (00, 01, ..., 99) will occur 1% of the time.
16
17 This example uses precomputed digits of pi from the website of Professor
18 Yasumasa Kanada at the University of Tokyo (http://www.super-computing.org).
19 These digits come in a set of ``.txt`` files
20 (ftp://pi.super-computing.org/.2/pi200m/) that each have 10 million digits of
21 pi. In the parallel computation, we will use the :meth:`MultiEngineClient.map`
22 method to have each engine compute the desired statistics on a subset of these
23 files. Before I started the parallel computation, I copied the data files
24 to the compute nodes so the engines have fast access to them.
25
26 Here are the Python functions for counting the frequencies of each two digit
27 sequence in serial::
28
29     def compute_two_digit_freqs(filename):
30         """
31         Compute the two digit frequencies from a single file.
32         """
33         d = txt_file_to_digits(filename)
34         freqs = two_digit_freqs(d)
35         return freqs
36
37     def txt_file_to_digits(filename, the_type=str):
38         """
39         Yield the digits of pi read from a .txt file.
40         """
41         with open(filename, 'r') as f:
42             for line in f.readlines():
43                 for c in line:
44                     if c != '\n' and c != ' ':
45                         yield the_type(c)
46
47     def two_digit_freqs(digits, normalize=False):
48         """
49         Consume digits of pi and compute two digit frequency counts.
50         """
51         freqs = np.zeros(100, dtype='i4')
52         # Pair each digit with its predecessor so that every overlapping
53         # two digit sequence, including the final one, is counted.
54         last = digits.next()
55         for this in digits:
56             freqs[int(last + this)] += 1
57             last = this
58         if normalize:
59             # Convert to float to avoid integer division in Python 2.
60             freqs = freqs/float(freqs.sum())
61         return freqs
62
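The counting step can also be tried without any data files. The following is a minimal sketch in plain Python 3, not the code from :file:`pidigits.py`: the helper name ``two_digit_counts`` and the short sample string are assumptions made for illustration only.

```python
def two_digit_counts(digits):
    """Count overlapping two-digit sequences in a string of decimal digits.

    Hypothetical helper for illustration; it takes a plain string rather
    than a generator over a file.
    """
    counts = [0] * 100
    # zip pairs each digit with its successor: "1415" -> 14, 41, 15
    for first, second in zip(digits, digits[1:]):
        counts[int(first + second)] += 1
    return counts


# The first few decimals of pi, used here only as sample input.
sample = "14159265358979"
counts = two_digit_counts(sample)

# A string of n digits contains n - 1 overlapping pairs.
print(sum(counts), len(sample) - 1)
```

Indexing the result by ``int(first + second)`` maps the sequence "00" to slot 0 and "99" to slot 99, which is the same layout as the 100-element array used above.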
63 These functions are defined in the file :file:`pidigits.py`. To perform the
64 calculation in parallel, we use an additional file: :file:`parallelpi.py`::
65
66     from IPython.kernel import client
67     from matplotlib import pyplot as plt
68     import numpy as np
69     from pidigits import *
70     from timeit import default_timer as clock
71
72     # Files with digits of pi (10m digits each)
73     filestring = 'pi200m-ascii-%(i)02dof20.txt'
74     files = [filestring % {'i':i} for i in range(1,16)]
75
76     # A function for reducing the frequencies calculated
77     # by different engines.
78     def reduce_freqs(freqlist):
79         allfreqs = np.zeros_like(freqlist[0])
80         for f in freqlist:
81             allfreqs += f
82         return allfreqs
83
84     # Connect to the IPython cluster
85     mec = client.MultiEngineClient(profile='mycluster')
86     mec.run('pidigits.py')
87
88     # Run 10m digits on 1 engine
89     mapper = mec.mapper(targets=0)
90     t1 = clock()
91     freqs10m = mapper.map(compute_two_digit_freqs, files[:1])[0]
92     t2 = clock()
93     digits_per_second1 = 10.0e6/(t2-t1)
94     print "Digits per second (1 core, 10m digits): ", digits_per_second1
95
96     # Run 150m digits on 15 engines (8 cores)
97     t1 = clock()
98     freqs_all = mec.map(compute_two_digit_freqs, files[:len(mec)])
99     freqs150m = reduce_freqs(freqs_all)
100     t2 = clock()
101     digits_per_second8 = 150.0e6/(t2-t1)
102     print "Digits per second (8 cores, 150m digits): ", digits_per_second8
103
104     print "Speedup: ", digits_per_second8/digits_per_second1
105
106     plot_two_digit_freqs(freqs150m)
107     plt.title("2 digit sequences in 150m digits of pi")
108
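The script follows a map-then-reduce pattern: each file is counted independently, and the per-file counts are then summed. The sketch below reproduces that pattern serially in plain Python 3, using the builtin ``map`` in place of ``mec.map`` and short strings in place of the data files. The names ``count_pairs`` and ``reduce_counts`` and the sample chunks are assumptions for illustration, not part of :file:`parallelpi.py`.

```python
def count_pairs(digits):
    """Count overlapping two-digit sequences in one chunk of digits."""
    counts = [0] * 100
    for first, second in zip(digits, digits[1:]):
        counts[int(first + second)] += 1
    return counts


def reduce_counts(count_lists):
    """Sum a list of 100-element count lists element by element."""
    total = [0] * 100
    for counts in count_lists:
        total = [t + c for t, c in zip(total, counts)]
    return total


# Stand-ins for the data files: three chunks of 10 digits each.
chunks = ["1415926535", "8979323846", "2643383279"]

# Map step: one independent result per chunk (serial here).
per_chunk = list(map(count_pairs, chunks))

# Reduce step: combine the partial results into a single histogram.
total = reduce_counts(per_chunk)
print(sum(total))
```

Because each chunk is processed on its own, sequences that straddle a boundary between two chunks are not counted. The same is true of the per-file approach in the script, where the effect is at most one pair per file boundary.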
109 To run this code on an IPython cluster:
110
111 1. Start an IPython cluster with 15 engines: ``ipcluster start -p mycluster -n 15``
112 2. Open IPython's interactive shell using the pylab profile
113 ``ipython -p pylab`` and type ``run parallelpi.py``.
114
115 At this point, the parallel calculation will begin. On a small 8 core
116 cluster, we observe a speedup of 7.7x. The resulting plot of the two digit
117 sequences is shown in the following screenshot.
118
119 .. image:: parallel_pi.*
120
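To put the reported speedup in context, it can be converted into a parallel efficiency, that is, the speedup divided by the number of cores. The sketch below is plain Python 3; the function names are hypothetical, and the only figures taken from the text are the 7.7x speedup and the 8 cores.

```python
def digits_per_second(n_digits, elapsed_seconds):
    """Throughput of a run, as computed in the script above."""
    return n_digits / elapsed_seconds


def speedup(rate_parallel, rate_serial):
    """Ratio of parallel throughput to single-engine throughput."""
    return rate_parallel / rate_serial


def parallel_efficiency(speedup_factor, cores):
    """Fraction of ideal linear scaling that was achieved."""
    return speedup_factor / cores


# Figures quoted in the text: 7.7x speedup on 8 cores.
efficiency = parallel_efficiency(7.7, 8)
print("Parallel efficiency: %.1f%%" % (100 * efficiency))
```

An efficiency close to 100% is plausible for this workload because the engines never need to communicate with each other while counting.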
121
122 Parallel option pricing
123 =======================
124
125 The example will be added at a later point.
126
1 NO CONTENT: new file 100644, binary diff hidden
@@ -0,0 +1,282 b''
1 ========================================
2 Getting started
3 ========================================
4
5 Introduction
6 ============
7
8 IPython is an open source project focused on interactive and exploratory
9 computing in the Python programming language. It consists of two
10 main components:
11
12 * An enhanced interactive Python shell with support for interactive plotting
13 and visualization.
14 * An architecture for interactive parallel computing.
15
16 With these components, it is possible to perform all aspects of a parallel
17 computation interactively. This document describes how to get started with
18 IPython on Windows HPC Server 2008. A more complete description of IPython's
19 parallel computing capabilities can be found in IPython's online documentation
20 (http://ipython.scipy.org/moin/Documentation).
21
22 Setting up your Windows cluster
23 ===============================
24
25 This document assumes that you already have a cluster running Windows
26 HPC Server 2008. Here is a broad overview of what is involved with setting up
27 such a cluster:
28
29 1. Install Windows Server 2008 on the head and compute nodes in the cluster.
30 2. Set up the network configuration on each host. Each host should have a
31    static IP address.
32 3. On the head node, activate the "Active Directory Domain Services" role
33    and make the head node the domain controller.
34 4. Join the compute nodes to the newly created Active Directory (AD) domain.
35 5. Set up user accounts in the domain with shared home directories.
36 6. Install the HPC Pack 2008 on the head node to create a cluster.
37 7. Install the HPC Pack 2008 on the compute nodes.
38
39 More details about installing and configuring Windows HPC Server 2008 can be
40 found on the Windows HPC Home Page (http://www.microsoft.com/hpc). Regardless
41 of what steps you go through to set up your cluster, the remainder of this
42 document will assume that:
43
44 * There are domain users that can log on to the AD domain and submit jobs
45 to the cluster scheduler.
46 * These domain users have shared home directories. While shared home
47 directories are not required to use IPython, they make it much easier to
48 use IPython.
49
50 Installation of IPython and its dependencies
51 ============================================
52
53 IPython and all of its dependencies are freely available and open source.
54 These packages provide a powerful and cost-effective approach to numerical and
55 scientific computing on Windows. The following dependencies are needed to run
56 IPython on Windows:
57
58 * Python 2.5 or 2.6 (http://www.python.org)
59 * pywin32 (http://sourceforge.net/projects/pywin32/)
60 * PyReadline (https://launchpad.net/pyreadline)
61 * zope.interface and Twisted (http://twistedmatrix.com)
62 * Foolscap (http://foolscap.lothar.com/trac)
63 * pyOpenSSL (https://launchpad.net/pyopenssl)
64 * IPython (http://ipython.scipy.org)
65
66 In addition, the following dependencies are needed to run the demos
67 described in this document.
68
69 * NumPy and SciPy (http://www.scipy.org)
70 * wxPython (http://www.wxpython.org)
71 * Matplotlib (http://matplotlib.sourceforge.net/)
72
73 The easiest way of obtaining these dependencies is through the Enthought
74 Python Distribution (EPD) (http://www.enthought.com/products/epd.php). EPD is
75 produced by Enthought, Inc. and contains all of these packages and others in a
76 single installer and is available free for academic users. While it is also
77 possible to download and install each package individually, this is a tedious
78 process. Thus, we highly recommend using EPD to install these packages on
79 Windows.
80
81 Regardless of how you install the dependencies, here are the steps you will
82 need to follow:
83
84 1. Install all of the packages listed above, either individually or using EPD,
85    on the head node, compute nodes and user workstations.
86
87 2. Make sure that :file:`C:\\Python25` and :file:`C:\\Python25\\Scripts` are
88 in the system :envvar:`%PATH%` variable on each node.
89
90 3. Install the latest development version of IPython. This can be done by
91 downloading the development version from the IPython website
92 (http://ipython.scipy.org) and following the installation instructions.
93
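Step 2 above can be checked on each node with a few lines of Python. The following is a sketch in plain Python 3, not a tool shipped with IPython: the helper ``missing_from_path`` and the example ``PATH`` value are assumptions for illustration. On a real node you would call the helper without the ``path_value`` argument so that it reads the actual environment.

```python
import os


def missing_from_path(required_dirs, path_value=None, sep=os.pathsep):
    """Return the directories from required_dirs that are not on PATH.

    Hypothetical helper. Comparison ignores case and trailing slashes,
    since Windows paths are case-insensitive.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    entries = set(
        entry.strip().rstrip("\\/").lower()
        for entry in path_value.split(sep)
        if entry.strip()
    )
    return [d for d in required_dirs if d.rstrip("\\/").lower() not in entries]


required = [r"C:\Python25", r"C:\Python25\Scripts"]

# Made-up PATH value in which the Scripts directory is missing.
example_path = r"C:\Windows\system32;C:\Python25"
print(missing_from_path(required, example_path, sep=";"))
```

An empty list means both directories were found.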
94 Further details about installing IPython or its dependencies can be found in
95 the online IPython documentation (http://ipython.scipy.org/moin/Documentation).
96 Once you are finished with the installation, you can try IPython out by
97 opening a Windows Command Prompt and typing :command:`ipython`. This will
98 start IPython's interactive shell and you should see something like the
99 following screenshot:
100
101 .. image:: ipython_shell.*
102
103 Starting an IPython cluster
104 ===========================
105
106 To use IPython's parallel computing capabilities, you will need to start an
107 IPython cluster. An IPython cluster consists of one controller and multiple
108 engines:
109
110 IPython controller
111     The IPython controller manages the engines and acts as a gateway between
112     the engines and the client, which runs in the user's interactive IPython
113     session. The controller is started using the :command:`ipcontroller`
114     command.
115
116 IPython engine
117     IPython engines run your Python code in parallel on the compute nodes.
118     Engines are started using the :command:`ipengine` command.
119
120 Once these processes are started, a user can run Python code interactively and
121 in parallel on the engines from within the IPython shell. This includes the
122 ability to interact with, plot and visualize data from the engines.
123
124 IPython has a command line program called :command:`ipcluster` that handles
125 all aspects of starting the controller and engines on the compute nodes.
126 :command:`ipcluster` has full support for the Windows HPC job scheduler,
127 meaning that :command:`ipcluster` can use this job scheduler to start the
128 controller and engines. In our experience, the Windows HPC job scheduler is
129 particularly well suited for interactive applications, such as IPython. Once
130 :command:`ipcluster` is configured properly, a user can start an IPython
131 cluster from their local workstation almost instantly, without having to log
132 on to the head node (as is typically required by Unix based job schedulers).
133 This enables a user to move seamlessly between serial and parallel
134 computations.
135
136 In this section we show how to use :command:`ipcluster` to start an IPython
137 cluster using the Windows HPC Server 2008 job scheduler. To make sure that
138 :command:`ipcluster` is installed and working properly, you should first try
139 to start an IPython cluster on your local host. To do this, open a Windows
140 Command Prompt and type the following command::
141
142     ipcluster start -n 2
143
144 You should see a number of messages printed to the screen, ending with
145 "IPython cluster: started". A screenshot of this follows.
146
147
148 .. image:: ipcluster_start.*
149
150 At this point, the controller and two engines are running on your local host.
151 This configuration is useful for testing and for situations where you
152 have multiple cores on your local computer.
153
154 Now that we have confirmed that :command:`ipcluster` is working properly, we
155 describe how to configure and run an IPython cluster on an actual cluster
156 running Windows HPC Server 2008. Here is an outline of the needed steps:
157
158 1. Create a cluster profile: ``ipcluster create -p mycluster``
159
160 2. Edit configuration files in :file:`.ipython\\cluster_mycluster`.
161
162 3. Start the cluster: ``ipcluster start -p mycluster -n 32``
163
164 Creating a cluster profile
165 --------------------------
166
167 In most cases, you will have to create and configure a cluster profile to use
168 IPython on a cluster. A cluster profile is a specially named directory
169 (typically located in the :file:`.ipython` subdirectory of your home
170 directory) that contains the configuration files for a particular IPython
171 cluster, as well as log files and security keys. The naming convention
172 for cluster directories is: "cluster_<profile name>". Thus, the cluster
173 directory for a profile named "foo" would be :file:`.ipython\\cluster_foo`.
174
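The naming convention is simple enough to express in a couple of lines. The sketch below is plain Python 3 and uses the standard ``ntpath`` module so that Windows-style separators are produced on any platform. The helper ``cluster_dir`` is hypothetical and is not how :command:`ipcluster` itself locates the directory.

```python
import ntpath


def cluster_dir(profile, ipython_dir=".ipython"):
    """Build the cluster directory path for a profile name.

    Hypothetical helper illustrating the "cluster_<profile name>"
    convention described above.
    """
    return ntpath.join(ipython_dir, "cluster_" + profile)


print(cluster_dir("foo"))
print(cluster_dir("mycluster"))
```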
175 To create a new cluster profile (named "mycluster"), type the following
176 command at the Windows Command Prompt::
177
178     ipcluster create -p mycluster
179
180 The output of this command is shown in the screenshot below. Notice how
181 :command:`ipcluster` prints out the location of the newly created cluster
182 directory.
183
184
185 .. image:: ipcluster_create.*
186
187
188 Configuring a cluster profile
189 -----------------------------
190
191 Next, you will need to configure the newly created cluster profile by editing
192 the following configuration files in the cluster directory:
193
194 * :file:`ipcluster_config.py`
195 * :file:`ipcontroller_config.py`
196 * :file:`ipengine_config.py`
197
198 When :command:`ipcluster` is run, these configuration files are used to
199 determine how the engines and controller will be started. In most cases,
200 you will only have to set a few of the attributes in these files.
201
202 To configure :command:`ipcluster` to use the Windows HPC job scheduler, you
203 will need to edit the following attributes in the file
204 :file:`ipcluster_config.py`::
205
206     # Set these at the top of the file to tell ipcluster to use the
207     # Windows HPC job scheduler.
208     c.Global.controller_launcher = \
209         'IPython.kernel.launcher.WindowsHPCControllerLauncher'
210     c.Global.engine_launcher = \
211         'IPython.kernel.launcher.WindowsHPCEngineSetLauncher'
212
213     # Set these to the host name of the scheduler (head node) of your cluster.
214     c.WindowsHPCControllerLauncher.scheduler = 'HEADNODE'
215     c.WindowsHPCEngineSetLauncher.scheduler = 'HEADNODE'
216
217 There are a number of other configuration attributes that can be set, but
218 in most cases these will be sufficient to get you started.
219
220 .. warning::
221     If any of your configuration attributes involve specifying the location
222     of shared directories or files, you must make sure that you use UNC paths
223     like :file:`\\\\host\\share`. It is also important that you specify
224     these paths using raw Python strings: ``r'\\host\share'``.
225
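The reason for preferring raw strings is that Python treats a backslash in an ordinary string literal as the start of an escape sequence. A short illustration in plain Python 3, using the same placeholder host and share names as the warning above:

```python
# In a raw string every backslash is kept literally.
raw_share = r"\\host\share"

# Without the r prefix, each literal backslash has to be doubled.
escaped_share = "\\\\host\\share"

# Both spellings produce the same 12-character UNC path.
print(raw_share == escaped_share)
print(len(raw_share))
```

The raw form matches what you would type in Windows Explorer, which makes configuration files easier to read and harder to get wrong.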
226 Starting the cluster profile
227 ----------------------------
228
229 Once a cluster profile has been configured, starting an IPython cluster using
230 the profile is simple::
231
232     ipcluster start -p mycluster -n 32
233
234 The ``-n 32`` option tells :command:`ipcluster` how many engines to start.
235 Stopping the cluster is as simple as typing Control-C.
236
237 Using the HPC Job Manager
238 -------------------------
239
240 When ``ipcluster start`` is run the first time, :command:`ipcluster` creates
241 two XML job description files in the cluster directory:
242
243 * :file:`ipcontroller_job.xml`
244 * :file:`ipengineset_job.xml`
245
246 Once these files have been created, they can be imported into the HPC Job
247 Manager application. Then, the controller and engines for that profile can be
248 started using the HPC Job Manager directly, without using :command:`ipcluster`.
249 However, anytime the cluster profile is re-configured, ``ipcluster start``
250 has to be run again to regenerate the XML job description files. The
251 following screenshot shows what the HPC Job Manager interface looks like
252 with a running IPython cluster.
253
254
255 .. image:: hpc_job_manager.*
256
257 Performing a simple interactive parallel computation
258 ====================================================
259
260 Once you have started your IPython cluster, you can start to use it. To do
261 this, start up IPython's interactive shell by typing::
262
263     ipython
264
265 at the Windows Command Prompt. Then you can create a :class:`MultiEngineClient`
266 instance for your profile and use the resulting instance to
267 have the cluster do a simple interactive parallel computation. In the
268 screenshot that follows, we take a simple Python function::
269
270     def f(x): return x**10
271
272 and apply it to each element of an array of integers in
273 parallel using the :meth:`MultiEngineClient.map` method::
274
275     mec.map(f, range(15))
276
277 The :meth:`map` method has the same signature as Python's builtin :func:`map`
278 function, but runs the calculation in parallel. More involved examples of using
279 :class:`MultiEngineClient` are provided in the examples that follow.
280
281 .. image:: mec_simple.*
282
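For comparison, the same computation can be run serially with the builtin :func:`map`, which is a convenient way to check a function locally before sending it to the engines. This is a plain Python 3 sketch that needs no cluster; the variable name ``serial_result`` is an assumption for illustration.

```python
def f(x):
    return x**10


# Serial equivalent of the parallel call: apply f to each of 0..14.
# In Python 3 the builtin map returns an iterator, so wrap it in list().
serial_result = list(map(f, range(15)))

print(len(serial_result))
print(serial_result[:4])
```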
@@ -0,0 +1,14 b''
1 ========================================
2 Using IPython on Windows HPC Server 2008
3 ========================================
4
5
6 Contents
7 ========
8
9 .. toctree::
10     :maxdepth: 1
11
12     parallel_winhpc.txt
13     parallel_demos.txt
14
@@ -3,37 +3,36 b''
3 3 """Run a Monte-Carlo options pricer in parallel."""
4 4
5 5 from IPython.kernel import client
6 import numpy as N
7 from mcpricer import MCOptionPricer
6 import numpy as np
7 from mcpricer import price_options
8 8
9 9
10 tc = client.TaskClient()
11 rc = client.MultiEngineClient()
10 tc = client.TaskClient(profile='default')
11 mec = client.MultiEngineClient(profile='default')
12 12
13 # Initialize the common code on the engines
14 rc.run('mcpricer.py')
15 13
16 # Push the variables that won't change
17 #(stock print, interest rate, days and MC paths)
18 rc.push(dict(S=100.0, r=0.05, days=260, paths=10000))
14 # Initialize the common code on the engines
15 mec.run('mcpricer.py')
19 16
20 task_string = """\
21 op = MCOptionPricer(S,K,sigma,r,days,paths)
22 op.run()
23 vp, ap, vc, ac = op.vanilla_put, op.asian_put, op.vanilla_call, op.asian_call
24 """
17 # Define the function that will do the calculation
18 def my_prices(K, sigma):
19 S = 100.0
20 r = 0.05
21 days = 260
22 paths = 10000
23 return price_options(S, K, sigma, r, days, paths)
25 24
26 25 # Create arrays of strike prices and volatilities
27 K_vals = N.linspace(90.0,100.0,5)
28 sigma_vals = N.linspace(0.0, 0.2,5)
26 nK = 5
27 nsigma = 5
28 K_vals = np.linspace(90.0, 100.0, nK)
29 sigma_vals = np.linspace(0.0, 0.2, nsigma)
29 30
30 31 # Submit tasks
31 32 taskids = []
32 33 for K in K_vals:
33 34 for sigma in sigma_vals:
34 t = client.StringTask(task_string,
35 push=dict(sigma=sigma,K=K),
36 pull=('vp','ap','vc','ac','sigma','K'))
35 t = client.MapTask(my_prices, args=(K, sigma))
37 36 taskids.append(tc.run(t))
38 37
39 38 print "Submitted tasks: ", taskids
@@ -45,27 +44,21 b' tc.barrier(taskids)'
45 44 results = [tc.get_task_result(tid) for tid in taskids]
46 45
47 46 # Assemble the result
48 vc = N.empty(K_vals.shape[0]*sigma_vals.shape[0],dtype='float64')
49 vp = N.empty(K_vals.shape[0]*sigma_vals.shape[0],dtype='float64')
50 ac = N.empty(K_vals.shape[0]*sigma_vals.shape[0],dtype='float64')
51 ap = N.empty(K_vals.shape[0]*sigma_vals.shape[0],dtype='float64')
52 for i, tr in enumerate(results):
53 ns = tr.ns
54 vc[i] = ns.vc
55 vp[i] = ns.vp
56 ac[i] = ns.ac
57 ap[i] = ns.ap
58 vc.shape = (K_vals.shape[0],sigma_vals.shape[0])
59 vp.shape = (K_vals.shape[0],sigma_vals.shape[0])
60 ac.shape = (K_vals.shape[0],sigma_vals.shape[0])
61 ap.shape = (K_vals.shape[0],sigma_vals.shape[0])
47 prices = np.empty(nK*nsigma,
48 dtype=[('vcall',float),('vput',float),('acall',float),('aput',float)]
49 )
50 for i, price_tuple in enumerate(results):
51 prices[i] = price_tuple
52 prices.shape = (nK, nsigma)
62 53
63 54
64 55 def plot_options(K_vals, sigma_vals, prices):
65 """Make a contour plot of the option prices."""
66 import pylab
67 pylab.contourf(sigma_vals, K_vals, prices)
68 pylab.colorbar()
69 pylab.title("Option Price")
70 pylab.xlabel("Volatility")
71 pylab.ylabel("Strike Price")
56 """
57 Make a contour plot of the option prices.
58 """
59 from matplotlib import pyplot as plt
60 plt.contourf(sigma_vals, K_vals, prices)
61 plt.colorbar()
62 plt.title("Option Price")
63 plt.xlabel("Volatility")
64 plt.ylabel("Strike Price")
@@ -1,43 +1,33 b''
1 import numpy as N
1 import numpy as np
2 2 from math import *
3 3
4 class MCOptionPricer(object):
5 def __init__(self, S=100.0, K=100.0, sigma=0.25, r=0.05, days=260, paths=10000):
6 self.S = S
7 self.K = K
8 self.sigma = sigma
9 self.r = r
10 self.days = days
11 self.paths = paths
12 self.h = 1.0/self.days
13 self.const1 = exp((self.r-0.5*self.sigma**2)*self.h)
14 self.const2 = self.sigma*sqrt(self.h)
15
16 def run(self):
17 stock_price = self.S*N.ones(self.paths, dtype='float64')
18 stock_price_sum = N.zeros(self.paths, dtype='float64')
19 for j in range(self.days):
20 growth_factor = self.const1*N.exp(self.const2*N.random.standard_normal(self.paths))
21 stock_price = stock_price*growth_factor
22 stock_price_sum = stock_price_sum + stock_price
23 stock_price_avg = stock_price_sum/self.days
24 zeros = N.zeros(self.paths, dtype='float64')
25 r_factor = exp(-self.r*self.h*self.days)
26 self.vanilla_put = r_factor*N.mean(N.maximum(zeros,self.K-stock_price))
27 self.asian_put = r_factor*N.mean(N.maximum(zeros,self.K-stock_price_avg))
28 self.vanilla_call = r_factor*N.mean(N.maximum(zeros,stock_price-self.K))
29 self.asian_call = r_factor*N.mean(N.maximum(zeros,stock_price_avg-self.K))
30 4
31
32 def main():
33 op = MCOptionPricer()
34 op.run()
35 print "Vanilla Put Price = ", op.vanilla_put
36 print "Asian Put Price = ", op.asian_put
37 print "Vanilla Call Price = ", op.vanilla_call
38 print "Asian Call Price = ", op.asian_call
5 def price_options(S=100.0, K=100.0, sigma=0.25, r=0.05, days=260, paths=10000):
6 """
7 Price vanilla and asian options using a Monte Carlo method.
8 """
9 h = 1.0/days
10 const1 = exp((r-0.5*sigma**2)*h)
11 const2 = sigma*sqrt(h)
12 stock_price = S*np.ones(paths, dtype='float64')
13 stock_price_sum = np.zeros(paths, dtype='float64')
14 for j in range(days):
15 growth_factor = const1*np.exp(const2*np.random.standard_normal(paths))
16 stock_price = stock_price*growth_factor
17 stock_price_sum = stock_price_sum + stock_price
18 stock_price_avg = stock_price_sum/days
19 zeros = np.zeros(paths, dtype='float64')
20 r_factor = exp(-r*h*days)
21 vanilla_put = r_factor*np.mean(np.maximum(zeros, K-stock_price))
22 asian_put = r_factor*np.mean(np.maximum(zeros, K-stock_price_avg))
23 vanilla_call = r_factor*np.mean(np.maximum(zeros, stock_price-K))
24 asian_call = r_factor*np.mean(np.maximum(zeros, stock_price_avg-K))
25 return (vanilla_call, vanilla_put, asian_call, asian_put)
39 26
40 27
41 28 if __name__ == '__main__':
42 main()
43
29 (vc, vp, ac, ap) = price_options()
30 print "Vanilla Put Price = ", vp
31 print "Asian Put Price = ", ap
32 print "Vanilla Call Price = ", vc
33 print "Asian Call Price = ", ac
@@ -162,10 +162,13 b" latex_font_size = '11pt'"
162 162 # Grouping the document tree into LaTeX files. List of tuples
163 163 # (source start file, target name, title, author, document class [howto/manual]).
164 164
165 latex_documents = [ ('index', 'ipython.tex', 'IPython Documentation',
166 ur"""The IPython Development Team""",
167 'manual', True),
168 ]
165 latex_documents = [
166 ('index', 'ipython.tex', 'IPython Documentation',
167 ur"""The IPython Development Team""", 'manual', True),
168 ('parallel/winhpc_index', 'winhpc_whitepaper.tex',
169 'Using IPython on Windows HPC Server 2008',
170 ur"Brian E. Granger", 'manual', True)
171 ]
169 172
170 173 # The name of an image file (relative to this directory) to place at the top of
171 174 # the title page.