add note about examples to head of parallel docs
MinRK -
@@ -1,275 +1,277 b''
1 .. _parallel_examples:
2
1 =================
3 =================
2 Parallel examples
4 Parallel examples
3 =================
5 =================
4
6
5 In this section we describe two more involved examples of using an IPython
7 In this section we describe two more involved examples of using an IPython
6 cluster to perform a parallel computation. In these examples, we will be using
8 cluster to perform a parallel computation. In these examples, we will be using
7 IPython's "pylab" mode, which enables interactive plotting using the
9 IPython's "pylab" mode, which enables interactive plotting using the
8 Matplotlib package. IPython can be started in this mode by typing::
10 Matplotlib package. IPython can be started in this mode by typing::
9
11
10 ipython --pylab
12 ipython --pylab
11
13
12 at the system command line.
14 at the system command line.
13
15
14 150 million digits of pi
16 150 million digits of pi
15 ========================
17 ========================
16
18
17 In this example we would like to study the distribution of digits in the
19 In this example we would like to study the distribution of digits in the
18 number pi (in base 10). While it is not known if pi is a normal number (a
20 number pi (in base 10). While it is not known if pi is a normal number (a
19 number is normal in base 10 if all digit sequences of a given length occur with equal likelihood) numerical
21 number is normal in base 10 if all digit sequences of a given length occur with equal likelihood) numerical
20 investigations suggest that it is. We will begin with a serial calculation on
22 investigations suggest that it is. We will begin with a serial calculation on
21 10,000 digits of pi and then perform a parallel calculation involving 150
23 10,000 digits of pi and then perform a parallel calculation involving 150
22 million digits.
24 million digits.
23
25
24 In both the serial and parallel calculation we will be using functions defined
26 In both the serial and parallel calculation we will be using functions defined
25 in the :file:`pidigits.py` file, which is available in the
27 in the :file:`pidigits.py` file, which is available in the
26 :file:`docs/examples/parallel` directory of the IPython source distribution.
28 :file:`docs/examples/parallel` directory of the IPython source distribution.
27 These functions provide basic facilities for working with the digits of pi and
29 These functions provide basic facilities for working with the digits of pi and
28 can be loaded into IPython by putting :file:`pidigits.py` in your current
30 can be loaded into IPython by putting :file:`pidigits.py` in your current
29 working directory and then doing:
31 working directory and then doing:
30
32
31 .. sourcecode:: ipython
33 .. sourcecode:: ipython
32
34
33 In [1]: run pidigits.py
35 In [1]: run pidigits.py
34
36
35 Serial calculation
37 Serial calculation
36 ------------------
38 ------------------
37
39
38 For the serial calculation, we will use `SymPy <http://www.sympy.org>`_ to
40 For the serial calculation, we will use `SymPy <http://www.sympy.org>`_ to
39 calculate 10,000 digits of pi and then look at the frequencies of the digits
41 calculate 10,000 digits of pi and then look at the frequencies of the digits
40 0-9. Out of 10,000 digits, we expect each digit to occur 1,000 times. While
42 0-9. Out of 10,000 digits, we expect each digit to occur 1,000 times. While
41 SymPy is capable of calculating many more digits of pi, our purpose here is to
43 SymPy is capable of calculating many more digits of pi, our purpose here is to
42 set the stage for the much larger parallel calculation.
44 set the stage for the much larger parallel calculation.
43
45
44 In this example, we use two functions from :file:`pidigits.py`:
46 In this example, we use two functions from :file:`pidigits.py`:
45 :func:`one_digit_freqs` (which calculates how many times each digit occurs)
47 :func:`one_digit_freqs` (which calculates how many times each digit occurs)
46 and :func:`plot_one_digit_freqs` (which uses Matplotlib to plot the result).
48 and :func:`plot_one_digit_freqs` (which uses Matplotlib to plot the result).
47 Here is an interactive IPython session that uses these functions with
49 Here is an interactive IPython session that uses these functions with
48 SymPy:
50 SymPy:
49
51
50 .. sourcecode:: ipython
52 .. sourcecode:: ipython
51
53
52 In [7]: import sympy
54 In [7]: import sympy
53
55
54 In [8]: pi = sympy.pi.evalf(40)
56 In [8]: pi = sympy.pi.evalf(40)
55
57
56 In [9]: pi
58 In [9]: pi
57 Out[9]: 3.141592653589793238462643383279502884197
59 Out[9]: 3.141592653589793238462643383279502884197
58
60
59 In [10]: pi = sympy.pi.evalf(10000)
61 In [10]: pi = sympy.pi.evalf(10000)
60
62
61 In [11]: digits = (d for d in str(pi)[2:]) # create a sequence of digits
63 In [11]: digits = (d for d in str(pi)[2:]) # create a sequence of digits
62
64
63 In [12]: run pidigits.py # load one_digit_freqs/plot_one_digit_freqs
65 In [12]: run pidigits.py # load one_digit_freqs/plot_one_digit_freqs
64
66
65 In [13]: freqs = one_digit_freqs(digits)
67 In [13]: freqs = one_digit_freqs(digits)
66
68
67 In [14]: plot_one_digit_freqs(freqs)
69 In [14]: plot_one_digit_freqs(freqs)
68 Out[14]: [<matplotlib.lines.Line2D object at 0x18a55290>]
70 Out[14]: [<matplotlib.lines.Line2D object at 0x18a55290>]
69
71
70 The resulting plot of the single digit counts shows that each digit occurs
72 The resulting plot of the single digit counts shows that each digit occurs
71 approximately 1,000 times, but that with only 10,000 digits the
73 approximately 1,000 times, but that with only 10,000 digits the
72 statistical fluctuations are still rather large:
74 statistical fluctuations are still rather large:
73
75
74 .. image:: figs/single_digits.*
76 .. image:: figs/single_digits.*
75
77
76 It is clear that to reduce the relative fluctuations in the counts, we need
78 It is clear that to reduce the relative fluctuations in the counts, we need
77 to look at many more digits of pi. That brings us to the parallel calculation.
79 to look at many more digits of pi. That brings us to the parallel calculation.
78
80
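As a rough illustration of the counting step, the sketch below tallies single digit frequencies using only the standard library. The name ``count_one_digit_freqs`` is hypothetical and is not the :func:`one_digit_freqs` implementation from :file:`pidigits.py`, which may differ; the sample digits are the 39 decimals shown in ``Out[9]`` above.

```python
from collections import Counter


def count_one_digit_freqs(digits):
    """Return a list of ten counts, one for each decimal digit 0-9.

    Hypothetical helper for illustration; pidigits.py may differ.
    """
    counts = Counter(int(d) for d in digits)
    return [counts.get(i, 0) for i in range(10)]


# The 39 decimals of pi printed by sympy.pi.evalf(40) in the session above.
sample = "141592653589793238462643383279502884197"
freqs = count_one_digit_freqs(sample)

# With N digits and a uniform distribution, each digit is expected N/10 times.
expected = len(sample) / 10
for digit, count in enumerate(freqs):
    # Print the digit, its count, and its deviation from the expected count.
    print(digit, count, round(count - expected, 1))
```

With so few digits the deviations are large relative to the expected count, which is the same effect the 10,000 digit plot shows on a smaller scale.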
79 Parallel calculation
81 Parallel calculation
80 --------------------
82 --------------------
81
83
82 Calculating many digits of pi is a challenging computational problem in itself.
84 Calculating many digits of pi is a challenging computational problem in itself.
83 Because we want to focus on the distribution of digits in this example, we
85 Because we want to focus on the distribution of digits in this example, we
84 will use pre-computed digits of pi from the website of Professor Yasumasa
86 will use pre-computed digits of pi from the website of Professor Yasumasa
85 Kanada at the University of Tokyo (http://www.super-computing.org). These
87 Kanada at the University of Tokyo (http://www.super-computing.org). These
86 digits come in a set of text files (ftp://pi.super-computing.org/.2/pi200m/)
88 digits come in a set of text files (ftp://pi.super-computing.org/.2/pi200m/)
87 that each have 10 million digits of pi.
89 that each have 10 million digits of pi.
88
90
89 For the parallel calculation, we have copied these files to the local hard
91 For the parallel calculation, we have copied these files to the local hard
90 drives of the compute nodes. A total of 15 of these files will be used, for a
92 drives of the compute nodes. A total of 15 of these files will be used, for a
91 total of 150 million digits of pi. To make things a little more interesting we
93 total of 150 million digits of pi. To make things a little more interesting we
92 will calculate the frequencies of all two-digit sequences (00-99) and then plot
94 will calculate the frequencies of all two-digit sequences (00-99) and then plot
93 the result using a 2D matrix in Matplotlib.
95 the result using a 2D matrix in Matplotlib.
94
96
95 The overall idea of the calculation is simple: each IPython engine will
97 The overall idea of the calculation is simple: each IPython engine will
96 compute the two digit counts for the digits in a single file. Then in a final
98 compute the two digit counts for the digits in a single file. Then in a final
97 step the counts from each engine will be added up. To perform this
99 step the counts from each engine will be added up. To perform this
98 calculation, we will need two top-level functions from :file:`pidigits.py`:
100 calculation, we will need two top-level functions from :file:`pidigits.py`:
99
101
100 .. literalinclude:: ../../examples/parallel/pi/pidigits.py
102 .. literalinclude:: ../../examples/parallel/pi/pidigits.py
101 :language: python
103 :language: python
102 :lines: 47-62
104 :lines: 47-62
103
105
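The map-then-reduce structure described above can be sketched in plain Python without a cluster. This is a minimal sketch under stated assumptions: the names ``two_digit_counts`` and ``sum_counts`` are hypothetical, the counting of overlapping pairs is an assumption about how two digit sequences are defined, and the two short strings stand in for the per-engine data files. Pairs that straddle a chunk boundary are not counted here.

```python
def two_digit_counts(digits):
    """Count overlapping two-digit sequences 00-99 in a string of digits.

    Hypothetical helper; the real per-file function lives in pidigits.py.
    """
    counts = [0] * 100
    for a, b in zip(digits, digits[1:]):
        counts[10 * int(a) + int(b)] += 1
    return counts


def sum_counts(all_counts):
    """Add up per-chunk count lists element by element (the reduce step)."""
    total = [0] * 100
    for counts in all_counts:
        for i, c in enumerate(counts):
            total[i] += c
    return total


# Stand-ins for the per-engine files: two small chunks of digits.
chunks = ["1415926535", "8979323846"]

# The "map" step: each chunk is processed independently, so on a cluster
# each call could run on a different engine.
per_chunk = [two_digit_counts(chunk) for chunk in chunks]

# The "reduce" step: combine the independent results on the client.
total = sum_counts(per_chunk)
print(total[14], sum(total))
```

Because each chunk is processed independently, the list comprehension is the only part that would need to change to run in parallel.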
104 We will also use the :func:`plot_two_digit_freqs` function to plot the
106 We will also use the :func:`plot_two_digit_freqs` function to plot the
105 results. The code to run this calculation in parallel is contained in
107 results. The code to run this calculation in parallel is contained in
106 :file:`docs/examples/parallel/parallelpi.py`. This code can be run in parallel
108 :file:`docs/examples/parallel/parallelpi.py`. This code can be run in parallel
107 using IPython by following these steps:
109 using IPython by following these steps:
108
110
109 1. Use :command:`ipcluster` to start 15 engines. We used 16 cores of an SGE Linux
111 1. Use :command:`ipcluster` to start 15 engines. We used 16 cores of an SGE Linux
110 cluster (1 controller + 15 engines).
112 cluster (1 controller + 15 engines).
111 2. With the file :file:`parallelpi.py` in your current working directory, open
113 2. With the file :file:`parallelpi.py` in your current working directory, open
112 up IPython in pylab mode and type ``run parallelpi.py``. This will download
114 up IPython in pylab mode and type ``run parallelpi.py``. This will download
113 the pi files via FTP the first time you run it, if they are not
115 the pi files via FTP the first time you run it, if they are not
114 present in the engines' working directory.
116 present in the engines' working directory.
115
117
116 When run on our 16 cores, we observe a speedup of 14.2x. This is slightly
118 When run on our 16 cores, we observe a speedup of 14.2x. This is slightly
117 less than linear scaling (16x) because the controller is also running on one of
119 less than linear scaling (16x) because the controller is also running on one of
118 the cores.
120 the cores.
119
121
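To make the scaling statement concrete, the following sketch computes the parallel efficiency implied by the reported 14.2x speedup, once relative to all 16 cores and once relative to the 15 engines that actually do the work. The function name ``parallel_efficiency`` is introduced here for illustration only.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear scaling achieved with the given worker count."""
    return speedup / workers


speedup = 14.2  # speedup reported above

# Relative to all 16 cores (one of which runs the controller).
print(parallel_efficiency(speedup, 16))

# Relative to the 15 engines that perform the computation.
print(parallel_efficiency(speedup, 15))
```

Measured against the 15 engines rather than the 16 cores, the efficiency is roughly 95% instead of roughly 89%.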
120 To emphasize the interactive nature of IPython, we now show how the
122 To emphasize the interactive nature of IPython, we now show how the
121 calculation can also be run by simply typing the commands from
123 calculation can also be run by simply typing the commands from
122 :file:`parallelpi.py` interactively into IPython:
124 :file:`parallelpi.py` interactively into IPython:
123
125
124 .. sourcecode:: ipython
126 .. sourcecode:: ipython
125
127
126 In [1]: from IPython.parallel import Client
128 In [1]: from IPython.parallel import Client
127
129
128 # The Client allows us to use the engines interactively.
130 # The Client allows us to use the engines interactively.
129 # We simply pass Client the name of the cluster profile we
131 # We simply pass Client the name of the cluster profile we
130 # are using.
132 # are using.
131 In [2]: c = Client(profile='mycluster')
133 In [2]: c = Client(profile='mycluster')
132 In [3]: v = c[:]
134 In [3]: v = c[:]
133
135
134 In [3]: c.ids
136 In [3]: c.ids
135 Out[3]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
137 Out[3]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
136
138
137 In [4]: run pidigits.py
139 In [4]: run pidigits.py
138
140
139 In [5]: filestring = 'pi200m.ascii.%(i)02dof20'
141 In [5]: filestring = 'pi200m.ascii.%(i)02dof20'
140
142
141 # Create the list of files to process.
143 # Create the list of files to process.
142 In [6]: files = [filestring % {'i':i} for i in range(1,16)]
144 In [6]: files = [filestring % {'i':i} for i in range(1,16)]
143
145
144 In [7]: files
146 In [7]: files
145 Out[7]:
147 Out[7]:
146 ['pi200m.ascii.01of20',
148 ['pi200m.ascii.01of20',
147 'pi200m.ascii.02of20',
149 'pi200m.ascii.02of20',
148 'pi200m.ascii.03of20',
150 'pi200m.ascii.03of20',
149 'pi200m.ascii.04of20',
151 'pi200m.ascii.04of20',
150 'pi200m.ascii.05of20',
152 'pi200m.ascii.05of20',
151 'pi200m.ascii.06of20',
153 'pi200m.ascii.06of20',
152 'pi200m.ascii.07of20',
154 'pi200m.ascii.07of20',
153 'pi200m.ascii.08of20',
155 'pi200m.ascii.08of20',
154 'pi200m.ascii.09of20',
156 'pi200m.ascii.09of20',
155 'pi200m.ascii.10of20',
157 'pi200m.ascii.10of20',
156 'pi200m.ascii.11of20',
158 'pi200m.ascii.11of20',
157 'pi200m.ascii.12of20',
159 'pi200m.ascii.12of20',
158 'pi200m.ascii.13of20',
160 'pi200m.ascii.13of20',
159 'pi200m.ascii.14of20',
161 'pi200m.ascii.14of20',
160 'pi200m.ascii.15of20']
162 'pi200m.ascii.15of20']
161
163
162 # download the data files if they don't already exist:
164 # download the data files if they don't already exist:
163 In [8]: v.map(fetch_pi_file, files)
165 In [8]: v.map(fetch_pi_file, files)
164
166
165 # This is the parallel calculation using the DirectView.map method
167 # This is the parallel calculation using the DirectView.map method
166 # which applies compute_two_digit_freqs to each file in files in parallel.
168 # which applies compute_two_digit_freqs to each file in files in parallel.
167 In [9]: freqs_all = v.map(compute_two_digit_freqs, files)
169 In [9]: freqs_all = v.map(compute_two_digit_freqs, files)
168
170
169 # Add up the frequencies from each engine.
171 # Add up the frequencies from each engine.
170 In [10]: freqs = reduce_freqs(freqs_all)
172 In [10]: freqs = reduce_freqs(freqs_all)
171
173
172 In [11]: plot_two_digit_freqs(freqs)
174 In [11]: plot_two_digit_freqs(freqs)
173 Out[11]: <matplotlib.image.AxesImage object at 0x18beb110>
175 Out[11]: <matplotlib.image.AxesImage object at 0x18beb110>
174
176
175 In [12]: plt.title('2 digit counts of 150m digits of pi')
177 In [12]: plt.title('2 digit counts of 150m digits of pi')
176 Out[12]: <matplotlib.text.Text object at 0x18d1f9b0>
178 Out[12]: <matplotlib.text.Text object at 0x18d1f9b0>
177
179
178 The resulting plot generated by Matplotlib is shown below. The colors indicate
180 The resulting plot generated by Matplotlib is shown below. The colors indicate
179 which two digit sequences are more (red) or less (blue) likely to occur in the
181 which two digit sequences are more (red) or less (blue) likely to occur in the
180 first 150 million digits of pi. We clearly see that the sequence "41" is
182 first 150 million digits of pi. We clearly see that the sequence "41" is
181 most likely and that "06" and "07" are least likely. Further analysis would
183 most likely and that "06" and "07" are least likely. Further analysis would
182 show that the relative size of the statistical fluctuations has decreased
184 show that the relative size of the statistical fluctuations has decreased
183 compared to the 10,000 digit calculation.
185 compared to the 10,000 digit calculation.
184
186
185 .. image:: figs/two_digit_counts.*
187 .. image:: figs/two_digit_counts.*
186
188
187
189
188 Parallel options pricing
190 Parallel options pricing
189 ========================
191 ========================
190
192
191 An option is a financial contract that gives the buyer of the contract the
193 An option is a financial contract that gives the buyer of the contract the
192 right to buy (a "call") or sell (a "put") a secondary asset (a stock for
194 right to buy (a "call") or sell (a "put") a secondary asset (a stock for
193 example) at a particular date in the future (the expiration date) for a
195 example) at a particular date in the future (the expiration date) for a
194 pre-agreed upon price (the strike price). For this right, the buyer pays the
196 pre-agreed upon price (the strike price). For this right, the buyer pays the
195 seller a premium (the option price). There are a wide variety of flavors of
197 seller a premium (the option price). There are a wide variety of flavors of
196 options (American, European, Asian, etc.) that are useful for different
198 options (American, European, Asian, etc.) that are useful for different
197 purposes: hedging against risk, speculation, etc.
199 purposes: hedging against risk, speculation, etc.
198
200
199 Much of modern finance is driven by the need to price these contracts
201 Much of modern finance is driven by the need to price these contracts
200 accurately based on what is known about the properties (such as volatility) of
202 accurately based on what is known about the properties (such as volatility) of
201 the underlying asset. One method of pricing options is to use a Monte Carlo
203 the underlying asset. One method of pricing options is to use a Monte Carlo
202 simulation of the underlying asset price. In this example we use this approach
204 simulation of the underlying asset price. In this example we use this approach
203 to price both European and Asian (path dependent) options for various strike
205 to price both European and Asian (path dependent) options for various strike
204 prices and volatilities.
206 prices and volatilities.
205
207
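As a minimal sketch of the Monte Carlo approach, the code below prices a European call by simulating only the terminal asset price. It assumes the asset follows geometric Brownian motion with a constant risk-free rate, which is a common modeling choice but is not confirmed by the text above. The function name, parameter names, and numerical values are illustrative, and this is not the :func:`price_options` implementation from :file:`mckernel.py`.

```python
import math
import random


def mc_european_call(s0, strike, sigma, r, t, n_paths, seed=0):
    """Estimate a European call price by Monte Carlo simulation.

    Assumes geometric Brownian motion for the asset price (an assumption
    made for this sketch). s0: initial price, strike: strike price,
    sigma: volatility, r: risk-free rate, t: time to expiration in years.
    """
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    payoff_sum = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)               # standard normal draw
        s_t = s0 * math.exp(drift + vol * z)  # simulated price at expiration
        payoff_sum += max(s_t - strike, 0.0)  # call payoff for this path
    # Discount the average payoff back to today.
    return math.exp(-r * t) * payoff_sum / n_paths


price = mc_european_call(s0=100.0, strike=100.0, sigma=0.25, r=0.05,
                         t=1.0, n_paths=20000)
print(round(price, 2))
```

An Asian option would use the average price along each simulated path rather than the terminal price, so it needs the whole path and is more expensive per sample. Each (strike, volatility) pair is an independent calculation, which is what makes the problem easy to distribute.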
206 The code for this example can be found in the :file:`docs/examples/parallel/options`
208 The code for this example can be found in the :file:`docs/examples/parallel/options`
207 directory of the IPython source. The function :func:`price_options` in
209 directory of the IPython source. The function :func:`price_options` in
208 :file:`mckernel.py` implements the basic Monte Carlo pricing algorithm using
210 :file:`mckernel.py` implements the basic Monte Carlo pricing algorithm using
209 the NumPy package and is shown here:
211 the NumPy package and is shown here:
210
212
211 .. literalinclude:: ../../examples/parallel/options/mckernel.py
213 .. literalinclude:: ../../examples/parallel/options/mckernel.py
212 :language: python
214 :language: python
213
215
214 To run this code in parallel, we will use IPython's :class:`LoadBalancedView` class,
216 To run this code in parallel, we will use IPython's :class:`LoadBalancedView` class,
215 which distributes work to the engines using dynamic load balancing. This
217 which distributes work to the engines using dynamic load balancing. This
216 view is a wrapper of the :class:`Client` class shown in
218 view is a wrapper of the :class:`Client` class shown in
217 the previous example. The parallel calculation using :class:`LoadBalancedView` can
219 the previous example. The parallel calculation using :class:`LoadBalancedView` can
218 be found in the file :file:`mcpricer.py`. The code in this file creates a
220 be found in the file :file:`mcpricer.py`. The code in this file creates a
219 :class:`LoadBalancedView` instance and then submits a set of tasks using
221 :class:`LoadBalancedView` instance and then submits a set of tasks using
220 :meth:`LoadBalancedView.apply` that calculate the option prices for different
222 :meth:`LoadBalancedView.apply` that calculate the option prices for different
221 volatilities and strike prices. The results are then plotted as a 2D contour
223 volatilities and strike prices. The results are then plotted as a 2D contour
222 plot using Matplotlib.
224 plot using Matplotlib.
223
225
224 .. literalinclude:: ../../examples/parallel/options/mcpricer.py
226 .. literalinclude:: ../../examples/parallel/options/mcpricer.py
225 :language: python
227 :language: python
226
228
227 To use this code, start an IPython cluster using :command:`ipcluster`, open
229 To use this code, start an IPython cluster using :command:`ipcluster`, open
228 IPython in the pylab mode with the file :file:`mckernel.py` in your current
230 IPython in the pylab mode with the file :file:`mckernel.py` in your current
229 working directory and then type:
231 working directory and then type:
230
232
231 .. sourcecode:: ipython
233 .. sourcecode:: ipython
232
234
233 In [7]: run mcpricer.py
235 In [7]: run mcpricer.py
234
236
235 Submitted tasks: 30
237 Submitted tasks: 30
236
238
237 Once all the tasks have finished, the results can be plotted using the
239 Once all the tasks have finished, the results can be plotted using the
238 :func:`plot_options` function. Here we make contour plots of the Asian
240 :func:`plot_options` function. Here we make contour plots of the Asian
239 call and Asian put options as a function of the volatility and strike price:
241 call and Asian put options as a function of the volatility and strike price:
240
242
241 .. sourcecode:: ipython
243 .. sourcecode:: ipython
242
244
243 In [8]: plot_options(sigma_vals, strike_vals, prices['acall'])
245 In [8]: plot_options(sigma_vals, strike_vals, prices['acall'])
244
246
245 In [9]: plt.figure()
247 In [9]: plt.figure()
246 Out[9]: <matplotlib.figure.Figure object at 0x18c178d0>
248 Out[9]: <matplotlib.figure.Figure object at 0x18c178d0>
247
249
248 In [10]: plot_options(sigma_vals, strike_vals, prices['aput'])
250 In [10]: plot_options(sigma_vals, strike_vals, prices['aput'])
249
251
250 These results are shown in the two figures below. On our 15 engines, the
252 These results are shown in the two figures below. On our 15 engines, the
251 entire calculation (15 strike prices, 15 volatilities, 100,000 paths for each)
253 entire calculation (15 strike prices, 15 volatilities, 100,000 paths for each)
252 took 37 seconds in parallel, giving a speedup of 14.1x, which is comparable
254 took 37 seconds in parallel, giving a speedup of 14.1x, which is comparable
253 to the speedup observed in our previous example.
255 to the speedup observed in our previous example.
254
256
255 .. image:: figs/asian_call.*
257 .. image:: figs/asian_call.*
256
258
257 .. image:: figs/asian_put.*
259 .. image:: figs/asian_put.*
258
260
259 Conclusion
261 Conclusion
260 ==========
262 ==========
261
263
262 To conclude these examples, we summarize the key features of IPython's
264 To conclude these examples, we summarize the key features of IPython's
263 parallel architecture that have been demonstrated:
265 parallel architecture that have been demonstrated:
264
266
265 * Serial code can often be parallelized with only a few extra lines of code.
267 * Serial code can often be parallelized with only a few extra lines of code.
266 We have used the :class:`DirectView` and :class:`LoadBalancedView` classes
268 We have used the :class:`DirectView` and :class:`LoadBalancedView` classes
267 for this purpose.
269 for this purpose.
268 * The resulting parallel code can be run without ever leaving IPython's
270 * The resulting parallel code can be run without ever leaving IPython's
269 interactive shell.
271 interactive shell.
270 * Any data computed in parallel can be explored interactively through
272 * Any data computed in parallel can be explored interactively through
271 visualization or further numerical calculations.
273 visualization or further numerical calculations.
272 * We have run these examples on a cluster running RHEL 5 and Sun GridEngine.
274 * We have run these examples on a cluster running RHEL 5 and Sun GridEngine.
273 IPython's built-in support for SGE (and other batch systems) makes it easy
275 IPython's built-in support for SGE (and other batch systems) makes it easy
274 to get started with IPython's parallel capabilities.
276 to get started with IPython's parallel capabilities.
275
277
@@ -1,295 +1,306 b''
1 .. _parallel_overview:
1 .. _parallel_overview:
2
2
3 ============================
3 ============================
4 Overview and getting started
4 Overview and getting started
5 ============================
5 ============================
6
6
7
8 Examples
9 ========
10
11 We have various example scripts and notebooks for using IPython.parallel in our
12 :file:`docs/examples/parallel` directory, or they can be found `on GitHub`__.
13 Some of these are covered in more detail in the :ref:`examples
14 <parallel_examples>` section.
15
16 .. __: https://github.com/ipython/ipython/tree/master/docs/examples/parallel
17
7 Introduction
18 Introduction
8 ============
19 ============
9
20
10 This section gives an overview of IPython's sophisticated and powerful
21 This section gives an overview of IPython's sophisticated and powerful
11 architecture for parallel and distributed computing. This architecture
22 architecture for parallel and distributed computing. This architecture
12 abstracts out parallelism in a very general way, which enables IPython to
23 abstracts out parallelism in a very general way, which enables IPython to
13 support many different styles of parallelism including:
24 support many different styles of parallelism including:
14
25
15 * Single program, multiple data (SPMD) parallelism.
26 * Single program, multiple data (SPMD) parallelism.
16 * Multiple program, multiple data (MPMD) parallelism.
27 * Multiple program, multiple data (MPMD) parallelism.
17 * Message passing using MPI.
28 * Message passing using MPI.
18 * Task farming.
29 * Task farming.
19 * Data parallel.
30 * Data parallel.
20 * Combinations of these approaches.
31 * Combinations of these approaches.
21 * Custom user defined approaches.
32 * Custom user defined approaches.
22
33
23 Most importantly, IPython enables all types of parallel applications to
34 Most importantly, IPython enables all types of parallel applications to
24 be developed, executed, debugged and monitored *interactively*. Hence,
35 be developed, executed, debugged and monitored *interactively*. Hence,
25 the ``I`` in IPython. The following are some example use cases for IPython:
36 the ``I`` in IPython. The following are some example use cases for IPython:
26
37
27 * Quickly parallelize algorithms that are embarrassingly parallel
38 * Quickly parallelize algorithms that are embarrassingly parallel
28 using a number of simple approaches. Many simple things can be
39 using a number of simple approaches. Many simple things can be
29 parallelized interactively in one or two lines of code.
40 parallelized interactively in one or two lines of code.
30
41
31 * Steer traditional MPI applications on a supercomputer from an
42 * Steer traditional MPI applications on a supercomputer from an
32 IPython session on your laptop.
43 IPython session on your laptop.
33
44
34 * Analyze and visualize large datasets (that could be remote and/or
45 * Analyze and visualize large datasets (that could be remote and/or
35 distributed) interactively using IPython and tools like
46 distributed) interactively using IPython and tools like
36 matplotlib/TVTK.
47 matplotlib/TVTK.
37
48
38 * Develop, test and debug new parallel algorithms
49 * Develop, test and debug new parallel algorithms
39 (that may use MPI) interactively.
50 (that may use MPI) interactively.
40
51
41 * Tie together multiple MPI jobs running on different systems into
52 * Tie together multiple MPI jobs running on different systems into
42 one giant distributed and parallel system.
53 one giant distributed and parallel system.
43
54
44 * Start a parallel job on your cluster and then have a remote
55 * Start a parallel job on your cluster and then have a remote
45 collaborator connect to it and pull back data into their
56 collaborator connect to it and pull back data into their
46 local IPython session for plotting and analysis.
57 local IPython session for plotting and analysis.
47
58
48 * Run a set of tasks on a set of CPUs using dynamic load balancing.
59 * Run a set of tasks on a set of CPUs using dynamic load balancing.
49
60
50 .. tip::
61 .. tip::
51
62
52 At the SciPy 2011 conference in Austin, Min Ragan-Kelley presented a
63 At the SciPy 2011 conference in Austin, Min Ragan-Kelley presented a
53 complete 4-hour tutorial on the use of these features, and all the materials
64 complete 4-hour tutorial on the use of these features, and all the materials
54 for the tutorial are now `available online`__. That tutorial provides an
65 for the tutorial are now `available online`__. That tutorial provides an
55 excellent, hands-on oriented complement to the reference documentation
66 excellent, hands-on oriented complement to the reference documentation
56 presented here.
67 presented here.
57
68
58 .. __: http://minrk.github.com/scipy-tutorial-2011
69 .. __: http://minrk.github.com/scipy-tutorial-2011
59
70
60 Architecture overview
71 Architecture overview
61 =====================
72 =====================
62
73
63 .. figure:: figs/wideView.png
74 .. figure:: figs/wideView.png
64 :width: 300px
75 :width: 300px
65
76
66
77
67 The IPython architecture consists of four components:
78 The IPython architecture consists of four components:
68
79
69 * The IPython engine.
80 * The IPython engine.
70 * The IPython hub.
81 * The IPython hub.
71 * The IPython schedulers.
82 * The IPython schedulers.
72 * The controller client.
83 * The controller client.
73
84
74 These components live in the :mod:`IPython.parallel` package and are
85 These components live in the :mod:`IPython.parallel` package and are
75 installed with IPython. They do, however, have additional dependencies
86 installed with IPython. They do, however, have additional dependencies
76 that must be installed. For more information, see our
87 that must be installed. For more information, see our
77 :ref:`installation documentation <install_index>`.
88 :ref:`installation documentation <install_index>`.
78
89
79 .. TODO: include zmq in install_index
90 .. TODO: include zmq in install_index
80
91
81 IPython engine
92 IPython engine
82 ---------------
93 ---------------
83
94
84 The IPython engine is a Python instance that takes Python commands over a
95 The IPython engine is a Python instance that takes Python commands over a
85 network connection. Eventually, the IPython engine will be a full IPython
96 network connection. Eventually, the IPython engine will be a full IPython
86 interpreter, but for now, it is a regular Python interpreter. The engine
97 interpreter, but for now, it is a regular Python interpreter. The engine
87 can also handle incoming and outgoing Python objects sent over a network
98 can also handle incoming and outgoing Python objects sent over a network
88 connection. When multiple engines are started, parallel and distributed
99 connection. When multiple engines are started, parallel and distributed
89 computing becomes possible. An important feature of an IPython engine is
100 computing becomes possible. An important feature of an IPython engine is
90 that it blocks while user code is being executed. Read on for how the
101 that it blocks while user code is being executed. Read on for how the
91 IPython controller solves this problem to expose a clean asynchronous API
102 IPython controller solves this problem to expose a clean asynchronous API
92 to the user.
103 to the user.
93
104
94 IPython controller
105 IPython controller
95 ------------------
106 ------------------
96
107
97 The IPython controller processes provide an interface for working with a set of engines.
108 The IPython controller processes provide an interface for working with a set of engines.
98 At a general level, the controller is a collection of processes to which IPython engines
109 At a general level, the controller is a collection of processes to which IPython engines
99 and clients can connect. The controller is composed of a :class:`Hub` and a collection of
110 and clients can connect. The controller is composed of a :class:`Hub` and a collection of
100 :class:`Schedulers`. These Schedulers are typically run in separate processes on the
111 :class:`Schedulers`. These Schedulers are typically run in separate processes on the
101 same machine as the Hub, but they can run anywhere, from local threads to remote machines.
112 same machine as the Hub, but they can run anywhere, from local threads to remote machines.
102
113
103 The controller also provides a single point of contact for users who wish to
114 The controller also provides a single point of contact for users who wish to
104 utilize the engines connected to the controller. There are different ways of
115 utilize the engines connected to the controller. There are different ways of
105 working with a controller. In IPython, all of these models are implemented via
116 working with a controller. In IPython, all of these models are implemented via
106 the :meth:`.View.apply` method, after
117 the :meth:`.View.apply` method, after
107 constructing :class:`.View` objects to represent subsets of engines. The two
118 constructing :class:`.View` objects to represent subsets of engines. The two
108 primary models for interacting with engines are:
119 primary models for interacting with engines are:
109
120
110 * A **Direct** interface, where engines are addressed explicitly.
121 * A **Direct** interface, where engines are addressed explicitly.
111 * A **LoadBalanced** interface, where the Scheduler is trusted with assigning work to
122 * A **LoadBalanced** interface, where the Scheduler is trusted with assigning work to
112 appropriate engines.
123 appropriate engines.
113
124
114 Advanced users can readily extend the View models to enable other
125 Advanced users can readily extend the View models to enable other
115 styles of parallelism.
126 styles of parallelism.
116
127
117 .. note::
128 .. note::
118
129
119 A single controller and set of engines can be used with multiple models
130 A single controller and set of engines can be used with multiple models
120 simultaneously. This opens the door for lots of interesting things.
131 simultaneously. This opens the door for lots of interesting things.
121
132
122
133
The Hub
*******

The center of an IPython cluster is the Hub. This is the process that keeps
track of engine connections, schedulers, and clients, as well as all task requests and
results. The primary role of the Hub is to facilitate queries of the cluster state and to
minimize the information required to establish the many connections involved in
connecting new clients and engines.


Schedulers
**********

All actions that can be performed on the engine go through a Scheduler. While the engines
themselves block when user code is run, the schedulers hide that from the user to provide
a fully asynchronous interface to a set of engines.


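
The idea of hiding a blocking worker behind an asynchronous interface can be
illustrated with the standard library alone. The following is only a sketch of
the concept, not how IPython's Schedulers are implemented: the thread pool
stands in for a set of engines, and ``blocking_task`` is a hypothetical piece
of user code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_task(x):
    """Hypothetical user code: the worker running it is busy until it returns."""
    time.sleep(0.05)
    return x * x


# The pool plays the role of the engines plus a scheduler (an analogy only).
with ThreadPoolExecutor(max_workers=4) as pool:
    # submit() returns immediately with a Future, even though each worker blocks.
    futures = [pool.submit(blocking_task, n) for n in range(8)]
    # The caller decides when to wait for the results.
    results = [f.result() for f in futures]
```

Each worker still blocks while it runs ``blocking_task``, but the submitting
code never does until it asks for a result.
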
IPython client and views
------------------------

There is one primary object, the :class:`~.parallel.Client`, for connecting to a cluster.
For each execution model, there is a corresponding :class:`~.parallel.View`. These views
allow users to interact with a set of engines through the interface. Here are the two default
views:

* The :class:`DirectView` class for explicit addressing.
* The :class:`LoadBalancedView` class for destination-agnostic scheduling.

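
As a rough sketch, both views can be obtained from a connected client: indexing
a ``Client`` yields a :class:`DirectView`, and ``load_balanced_view()`` yields a
:class:`LoadBalancedView`. The helper name ``make_views`` is hypothetical, and
the function assumes it is handed an already-connected ``Client`` for a running
cluster.

```python
def make_views(client):
    """Return a (direct, balanced) pair of views for a connected Client.

    ``make_views`` is a hypothetical helper, not part of IPython.
    """
    direct = client[:]                      # DirectView addressing every engine
    balanced = client.load_balanced_view()  # LoadBalancedView over the engines
    return direct, balanced
```

With a cluster running, ``direct.apply_sync(f)`` would run ``f`` on every
engine, while ``balanced.apply_sync(f)`` would run it once on whichever engine
the Scheduler selects.
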
Security
--------

IPython uses ZeroMQ for networking, which has provided many advantages, but
one of the drawbacks is its utter lack of security [ZeroMQ]_. By default, no IPython
connections are encrypted, but open ports only listen on localhost. The only
source of security for IPython is SSH tunneling. IPython supports both shell
(`openssh`) and `paramiko` based tunnels for connections. A key is required
to submit requests, but due to the lack of encryption, it does not provide
significant security if loopback traffic is compromised.

In our architecture, the controller is the only process that listens on
network ports, and is thus the main point of vulnerability. The standard model
for secure connections is to have the controller listen on
localhost, and to use SSH tunnels to connect clients and/or
engines.

To connect and authenticate to the controller, an engine or client needs
some information that the controller has stored in a JSON file.
Thus, the JSON files need to be copied to a location where
the clients and engines can find them. Typically, this is the
:file:`~/.ipython/profile_default/security` directory on the host where the
client/engine is running (which could be a different host than the controller).
Once the JSON files are copied over, everything should work fine.

Currently, there are two JSON files that the controller creates:

ipcontroller-engine.json
    This JSON file has the information necessary for an engine to connect
    to a controller.

ipcontroller-client.json
    The client's connection information. This may be identical to the engine's,
    but since the controller may listen on different ports for clients and
    engines, it is stored separately.

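
Before starting a client or engine on another host, it can be useful to check
that the connection files actually arrived. The sketch below assumes the
default IPython directory :file:`~/.ipython` (it may be configured differently
on your system) and the ``profile_<name>`` layout shown above; the function
names are hypothetical.

```python
from pathlib import Path

# The two connection files written by the controller.
CONNECTION_FILES = ("ipcontroller-engine.json", "ipcontroller-client.json")


def security_dir(profile="default", ipython_dir=None):
    """Return the security directory for a profile.

    Assumes ~/.ipython unless ipython_dir is given explicitly.
    """
    base = Path(ipython_dir) if ipython_dir else Path.home() / ".ipython"
    return base / f"profile_{profile}" / "security"


def missing_connection_files(profile="default", ipython_dir=None):
    """List the connection files that are not present for this profile."""
    directory = security_dir(profile, ipython_dir)
    return [name for name in CONNECTION_FILES if not (directory / name).is_file()]
```

An empty list from ``missing_connection_files()`` suggests that both files are
in place for the default profile.
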
ipcontroller-client.json will look something like this, under default localhost
circumstances:

.. sourcecode:: python

    {
      "url":"tcp:\/\/127.0.0.1:54424",
      "exec_key":"a361fe89-92fc-4762-9767-e2f0a05e3130",
      "ssh":"",
      "location":"10.19.1.135"
    }

If, however, you are running the controller on a work node on a cluster, you will likely
need to use ssh tunnels to connect clients from your laptop to it. You will also
probably need to instruct the controller to listen for engines coming from other work nodes
on the cluster. For example, starting the controller with::

    $> ipcontroller --ip=0.0.0.0 --ssh=login.mycluster.com

produces an ipcontroller-client.json like this:

.. sourcecode:: python

    {
      "url":"tcp:\/\/*:54424",
      "exec_key":"a361fe89-92fc-4762-9767-e2f0a05e3130",
      "ssh":"login.mycluster.com",
      "location":"10.0.0.2"
    }

More details of how these JSON files are used are given below.

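
Because the connection file is plain JSON, its fields can be inspected with
Python's standard ``json`` module. The sketch below parses the tunnelled
example as a string (a real script would read the file from disk); the variable
names are illustrative only.

```python
import json

# Contents copied from the tunnelled ipcontroller-client.json example.
raw = r"""
{
  "url":"tcp:\/\/*:54424",
  "exec_key":"a361fe89-92fc-4762-9767-e2f0a05e3130",
  "ssh":"login.mycluster.com",
  "location":"10.0.0.2"
}
"""

info = json.loads(raw)

# JSON's "\/" escape decodes to a plain "/".
url = info["url"]
# A non-empty "ssh" field names the server to tunnel through.
needs_tunnel = bool(info["ssh"])
```
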
A detailed description of the security model and its implementation in IPython
can be found :ref:`here <parallelsecurity>`.

.. warning::

    Even at its most secure, the Controller listens on ports on localhost, and
    every time you make a tunnel, you open a localhost port on the connecting
    machine that points to the Controller. If localhost on the Controller's
    machine, or the machine of any client or engine, is untrusted, then your
    Controller is insecure. There is no way around this with ZeroMQ.


Getting Started
===============

To use IPython for parallel computing, you need to start one instance of the
controller and one or more instances of the engine. Initially, it is best to
simply start a controller and engines on a single host using the
:command:`ipcluster` command. To start a controller and 4 engines on your
localhost, run::

    $ ipcluster start -n 4

More details about starting the IPython controller and engines can be found
:ref:`here <parallel_process>`.

Once you have started the IPython controller and one or more engines, you
are ready to use the engines to do something useful. To make sure
everything is working correctly, try the following commands:

.. sourcecode:: ipython

    In [1]: from IPython.parallel import Client

    In [2]: c = Client()

    In [3]: c.ids
    Out[3]: set([0, 1, 2, 3])

    In [4]: c[:].apply_sync(lambda : "Hello, World")
    Out[4]: [ 'Hello, World', 'Hello, World', 'Hello, World', 'Hello, World' ]


When a client is created with no arguments, the client tries to find the corresponding JSON file
in the local :file:`~/.ipython/profile_default/security` directory. If you are using a named
profile, pass it to the Client. This should cover most cases:

.. sourcecode:: ipython

    In [2]: c = Client(profile='myprofile')

If you have put the JSON file in a different location or it has a different name, create the
client like this:

.. sourcecode:: ipython

    In [2]: c = Client('/path/to/my/ipcontroller-client.json')

Remember, a client needs to be able to see the Hub's ports to connect. If the Hub is on a
different machine, you may need to use an ssh server to tunnel access to that machine.
In that case, you would connect with:

.. sourcecode:: ipython

    In [2]: c = Client('/path/to/my/ipcontroller-client.json', sshserver='me@myhub.example.com')

where 'myhub.example.com' is the hostname or IP address of the machine on
which the Hub process is running (or another machine that has direct access to the Hub's ports).

The SSH server may already be specified in ipcontroller-client.json, if the controller was
given one at launch time.

You are now ready to learn more about the :ref:`Direct
<parallel_multiengine>` and :ref:`LoadBalanced <parallel_task>` interfaces to the
controller.

.. [ZeroMQ] ZeroMQ. http://www.zeromq.org