@@ -1,275 +1,277 | |||
|
1 | .. _parallel_examples: | |
|
2 | ||
|
1 | 3 | ================= |
|
2 | 4 | Parallel examples |
|
3 | 5 | ================= |
|
4 | 6 | |
|
5 | 7 | In this section we describe two more involved examples of using an IPython |
|
6 | 8 | cluster to perform a parallel computation. In these examples, we will be using |
|
7 | 9 | IPython's "pylab" mode, which enables interactive plotting using the |
|
8 | 10 | Matplotlib package. IPython can be started in this mode by typing:: |
|
9 | 11 | |
|
10 | 12 | ipython --pylab |
|
11 | 13 | |
|
12 | 14 | at the system command line. |
|
13 | 15 | |
|
14 | 16 | 150 million digits of pi |
|
15 | 17 | ======================== |
|
16 | 18 | |
|
17 | 19 | In this example we would like to study the distribution of digits in the |
|
18 | 20 | number pi (in base 10). While it is not known if pi is a normal number (a |
|
19 | 21 | number is normal in base 10 if 0-9 occur with equal likelihood), numerical
|
20 | 22 | investigations suggest that it is. We will begin with a serial calculation on |
|
21 | 23 | 10,000 digits of pi and then perform a parallel calculation involving 150 |
|
22 | 24 | million digits. |
|
23 | 25 | |
|
24 | 26 | In both the serial and parallel calculation we will be using functions defined |
|
25 | 27 | in the :file:`pidigits.py` file, which is available in the |
|
26 | 28 | :file:`docs/examples/parallel` directory of the IPython source distribution. |
|
27 | 29 | These functions provide basic facilities for working with the digits of pi and |
|
28 | 30 | can be loaded into IPython by putting :file:`pidigits.py` in your current |
|
29 | 31 | working directory and then doing: |
|
30 | 32 | |
|
31 | 33 | .. sourcecode:: ipython |
|
32 | 34 | |
|
33 | 35 | In [1]: run pidigits.py |
|
34 | 36 | |
|
35 | 37 | Serial calculation |
|
36 | 38 | ------------------ |
|
37 | 39 | |
|
38 | 40 | For the serial calculation, we will use `SymPy <http://www.sympy.org>`_ to |
|
39 | 41 | calculate 10,000 digits of pi and then look at the frequencies of the digits |
|
40 | 42 | 0-9. Out of 10,000 digits, we expect each digit to occur 1,000 times. While |
|
41 | 43 | SymPy is capable of calculating many more digits of pi, our purpose here is to |
|
42 | 44 | set the stage for the much larger parallel calculation. |
|
43 | 45 | |
|
44 | 46 | In this example, we use two functions from :file:`pidigits.py`: |
|
45 | 47 | :func:`one_digit_freqs` (which calculates how many times each digit occurs) |
|
46 | 48 | and :func:`plot_one_digit_freqs` (which uses Matplotlib to plot the result). |
|
47 | 49 | Here is an interactive IPython session that uses these functions with |
|
48 | 50 | SymPy: |
|
49 | 51 | |
|
50 | 52 | .. sourcecode:: ipython |
|
51 | 53 | |
|
52 | 54 | In [7]: import sympy |
|
53 | 55 | |
|
54 | 56 | In [8]: pi = sympy.pi.evalf(40) |
|
55 | 57 | |
|
56 | 58 | In [9]: pi |
|
57 | 59 | Out[9]: 3.141592653589793238462643383279502884197 |
|
58 | 60 | |
|
59 | 61 | In [10]: pi = sympy.pi.evalf(10000) |
|
60 | 62 | |
|
61 | 63 | In [11]: digits = (d for d in str(pi)[2:]) # create a sequence of digits |
|
62 | 64 | |
|
63 | 65 | In [12]: run pidigits.py # load one_digit_freqs/plot_one_digit_freqs |
|
64 | 66 | |
|
65 | 67 | In [13]: freqs = one_digit_freqs(digits) |
|
66 | 68 | |
|
67 | 69 | In [14]: plot_one_digit_freqs(freqs) |
|
68 | 70 | Out[14]: [<matplotlib.lines.Line2D object at 0x18a55290>] |
|
69 | 71 | |
|
70 | 72 | The resulting plot of the single digit counts shows that each digit occurs |
|
71 | 73 | approximately 1,000 times, but that with only 10,000 digits the |
|
72 | 74 | statistical fluctuations are still rather large: |
|
73 | 75 | |
|
74 | 76 | .. image:: figs/single_digits.* |
|
75 | 77 | |
|
76 | 78 | It is clear that to reduce the relative fluctuations in the counts, we need |
|
77 | 79 | to look at many more digits of pi. That brings us to the parallel calculation. |
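As a rough illustration of the counting step, here is a minimal pure-Python sketch that tallies single-digit frequencies. The name ``count_single_digits`` is hypothetical and is not the :func:`one_digit_freqs` implementation from :file:`pidigits.py`; the sample string is just the 39 decimals shown in ``Out[9]`` above.

```python
from collections import Counter


def count_single_digits(digits):
    """Return a list where entry d is the number of times digit d occurs."""
    counts = Counter(int(ch) for ch in digits)
    return [counts.get(d, 0) for d in range(10)]


# The decimals of pi from the 40-significant-digit evaluation above.
sample = "141592653589793238462643383279502884197"
freqs = count_single_digits(sample)

# For a uniform distribution we would expect len(sample) / 10 of each digit.
expected = len(sample) / 10
deviations = [f - expected for f in freqs]
print(freqs)
```

With so few digits the deviations from the expected count are large relative to the count itself, which is the same effect the 10,000 digit plot shows on a smaller scale.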
|
78 | 80 | |
|
79 | 81 | Parallel calculation |
|
80 | 82 | -------------------- |
|
81 | 83 | |
|
82 | 84 | Calculating many digits of pi is a challenging computational problem in itself. |
|
83 | 85 | Because we want to focus on the distribution of digits in this example, we |
|
84 | 86 | will use pre-computed digits of pi from the website of Professor Yasumasa
|
85 | 87 | Kanada at the University of Tokyo (http://www.super-computing.org). These |
|
86 | 88 | digits come in a set of text files (ftp://pi.super-computing.org/.2/pi200m/) |
|
87 | 89 | that each have 10 million digits of pi. |
|
88 | 90 | |
|
89 | 91 | For the parallel calculation, we have copied these files to the local hard |
|
90 | 92 | drives of the compute nodes. A total of 15 of these files will be used, for a |
|
91 | 93 | total of 150 million digits of pi. To make things a little more interesting we |
|
92 | 94 | will calculate the frequencies of all two-digit sequences (00-99) and then plot
|
93 | 95 | the result using a 2D matrix in Matplotlib. |
|
94 | 96 | |
|
95 | 97 | The overall idea of the calculation is simple: each IPython engine will |
|
96 | 98 | compute the two digit counts for the digits in a single file. Then in a final |
|
97 | 99 | step the counts from each engine will be added up. To perform this |
|
98 | 100 | calculation, we will need two top-level functions from :file:`pidigits.py`: |
|
99 | 101 | |
|
100 | 102 | .. literalinclude:: ../../examples/parallel/pi/pidigits.py |
|
101 | 103 | :language: python |
|
102 | 104 | :lines: 47-62 |
|
103 | 105 | |
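The count-then-sum structure described above can be sketched without a cluster. This is only an illustration under stated assumptions: ``two_digit_counts`` and ``add_counts`` are hypothetical stand-ins (not the functions in :file:`pidigits.py`), pairs are assumed to be overlapping, and three short strings stand in for the 15 data files.

```python
def two_digit_counts(digits):
    """Count overlapping two-digit sequences; index 0-99 is the pair value."""
    counts = [0] * 100
    for a, b in zip(digits, digits[1:]):
        counts[int(a + b)] += 1
    return counts


def add_counts(all_counts):
    """Element-wise sum of the per-chunk count lists (the final reduce step)."""
    total = [0] * 100
    for counts in all_counts:
        for i, c in enumerate(counts):
            total[i] += c
    return total


# Each chunk plays the role of one data file handled by one engine.
chunks = ["14159265", "35897932", "38462643"]

# Serial stand-in for the step that the cluster would run in parallel.
per_chunk = [two_digit_counts(chunk) for chunk in chunks]
total = add_counts(per_chunk)
```

Because each chunk is counted independently, pairs that straddle a chunk boundary are not counted in this sketch. That independence is what makes the per-file step easy to distribute.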
|
104 | 106 | We will also use the :func:`plot_two_digit_freqs` function to plot the |
|
105 | 107 | results. The code to run this calculation in parallel is contained in |
|
106 | 108 | :file:`docs/examples/parallel/parallelpi.py`. This code can be run in parallel |
|
107 | 109 | using IPython by following these steps: |
|
108 | 110 | |
|
109 | 111 | 1. Use :command:`ipcluster` to start 15 engines. We used 16 cores of an SGE Linux
|
110 | 112 | cluster (1 controller + 15 engines). |
|
111 | 113 | 2. With the file :file:`parallelpi.py` in your current working directory, open |
|
112 | 114 | up IPython in pylab mode and type ``run parallelpi.py``. This will download |
|
113 | 115 | the pi files via ftp the first time you run it, if they are not |
|
114 | 116 | present in the Engines' working directory. |
|
115 | 117 | |
|
116 | 118 | When run on our 16 cores, we observe a speedup of 14.2x. This is slightly |
|
117 | 119 | less than linear scaling (16x) because the controller is also running on one of |
|
118 | 120 | the cores. |
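To make the scaling claim concrete, the following sketch computes parallel efficiency (speedup divided by the number of workers) from the figures quoted above. The helper name ``parallel_efficiency`` is an assumption introduced here for illustration.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear scaling achieved by the parallel run."""
    return speedup / workers


observed_speedup = 14.2

# Relative to all 16 cores (15 engines plus the controller).
eff_cores = parallel_efficiency(observed_speedup, 16)

# Relative to the 15 engines that actually process the files.
eff_engines = parallel_efficiency(observed_speedup, 15)

print("efficiency vs. 16 cores:   %.3f" % eff_cores)
print("efficiency vs. 15 engines: %.3f" % eff_engines)
```

Measured against the 15 engines rather than all 16 cores, the run is closer to linear scaling, which is consistent with the explanation that one core is occupied by the controller.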
|
119 | 121 | |
|
120 | 122 | To emphasize the interactive nature of IPython, we now show how the |
|
121 | 123 | calculation can also be run by simply typing the commands from |
|
122 | 124 | :file:`parallelpi.py` interactively into IPython: |
|
123 | 125 | |
|
124 | 126 | .. sourcecode:: ipython |
|
125 | 127 | |
|
126 | 128 | In [1]: from IPython.parallel import Client |
|
127 | 129 | |
|
128 | 130 | # The Client allows us to use the engines interactively. |
|
129 | 131 | # We simply pass Client the name of the cluster profile we |
|
130 | 132 | # are using. |
|
131 | 133 | In [2]: c = Client(profile='mycluster') |
|
132 | 134 | In [3]: v = c[:] |
|
133 | 135 | |
|
134 | 136 | In [3]: c.ids |
|
135 | 137 | Out[3]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14] |
|
136 | 138 | |
|
137 | 139 | In [4]: run pidigits.py |
|
138 | 140 | |
|
139 | 141 | In [5]: filestring = 'pi200m.ascii.%(i)02dof20' |
|
140 | 142 | |
|
141 | 143 | # Create the list of files to process. |
|
142 | 144 | In [6]: files = [filestring % {'i':i} for i in range(1,16)] |
|
143 | 145 | |
|
144 | 146 | In [7]: files |
|
145 | 147 | Out[7]: |
|
146 | 148 | ['pi200m.ascii.01of20', |
|
147 | 149 | 'pi200m.ascii.02of20', |
|
148 | 150 | 'pi200m.ascii.03of20', |
|
149 | 151 | 'pi200m.ascii.04of20', |
|
150 | 152 | 'pi200m.ascii.05of20', |
|
151 | 153 | 'pi200m.ascii.06of20', |
|
152 | 154 | 'pi200m.ascii.07of20', |
|
153 | 155 | 'pi200m.ascii.08of20', |
|
154 | 156 | 'pi200m.ascii.09of20', |
|
155 | 157 | 'pi200m.ascii.10of20', |
|
156 | 158 | 'pi200m.ascii.11of20', |
|
157 | 159 | 'pi200m.ascii.12of20', |
|
158 | 160 | 'pi200m.ascii.13of20', |
|
159 | 161 | 'pi200m.ascii.14of20', |
|
160 | 162 | 'pi200m.ascii.15of20'] |
|
161 | 163 | |
|
162 | 164 | # download the data files if they don't already exist: |
|
163 | 165 | In [8]: v.map(fetch_pi_file, files) |
|
164 | 166 | |
|
165 | 167 | # This is the parallel calculation using the DirectView.map method
|
166 | 168 | # which applies compute_two_digit_freqs to each file in files in parallel. |
|
167 | 169 | In [9]: freqs_all = v.map(compute_two_digit_freqs, files) |
|
168 | 170 | |
|
169 | 171 | # Add up the frequencies from each engine. |
|
170 | 172 | In [10]: freqs = reduce_freqs(freqs_all) |
|
171 | 173 | |
|
172 | 174 | In [11]: plot_two_digit_freqs(freqs) |
|
173 | 175 | Out[11]: <matplotlib.image.AxesImage object at 0x18beb110> |
|
174 | 176 | |
|
175 | 177 | In [12]: plt.title('2 digit counts of 150m digits of pi') |
|
176 | 178 | Out[12]: <matplotlib.text.Text object at 0x18d1f9b0> |
|
177 | 179 | |
|
178 | 180 | The resulting plot generated by Matplotlib is shown below. The colors indicate |
|
179 | 181 | which two digit sequences are more (red) or less (blue) likely to occur in the |
|
180 | 182 | first 150 million digits of pi. We clearly see that the sequence "41" is |
|
181 | 183 | most likely and that "06" and "07" are least likely. Further analysis would |
|
182 | 184 | show that the relative size of the statistical fluctuations has decreased
|
183 | 185 | compared to the 10,000 digit calculation. |
|
184 | 186 | |
|
185 | 187 | .. image:: figs/two_digit_counts.* |
|
186 | 188 | |
|
187 | 189 | |
|
188 | 190 | Parallel options pricing |
|
189 | 191 | ======================== |
|
190 | 192 | |
|
191 | 193 | An option is a financial contract that gives the buyer of the contract the |
|
192 | 194 | right to buy (a "call") or sell (a "put") a secondary asset (a stock for |
|
193 | 195 | example) at a particular date in the future (the expiration date) for a |
|
194 | 196 | pre-agreed upon price (the strike price). For this right, the buyer pays the |
|
195 | 197 | seller a premium (the option price). There are a wide variety of flavors of |
|
196 | 198 | options (American, European, Asian, etc.) that are useful for different |
|
197 | 199 | purposes: hedging against risk, speculation, etc. |
|
198 | 200 | |
|
199 | 201 | Much of modern finance is driven by the need to price these contracts |
|
200 | 202 | accurately based on what is known about the properties (such as volatility) of |
|
201 | 203 | the underlying asset. One method of pricing options is to use a Monte Carlo |
|
202 | 204 | simulation of the underlying asset price. In this example we use this approach |
|
203 | 205 | to price both European and Asian (path dependent) options for various strike |
|
204 | 206 | prices and volatilities. |
|
205 | 207 | |
|
206 | 208 | The code for this example can be found in the :file:`docs/examples/parallel/options` |
|
207 | 209 | directory of the IPython source. The function :func:`price_options` in |
|
208 | 210 | :file:`mckernel.py` implements the basic Monte Carlo pricing algorithm using |
|
209 | 211 | the NumPy package and is shown here: |
|
210 | 212 | |
|
211 | 213 | .. literalinclude:: ../../examples/parallel/options/mckernel.py |
|
212 | 214 | :language: python |
|
213 | 215 | |
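For readers who want the idea without NumPy, here is a much-reduced sketch of Monte Carlo pricing. It is not the :func:`price_options` code: it prices only European options, samples the terminal asset price directly under geometric Brownian motion instead of simulating daily paths, and every name and parameter value in it is an assumption chosen for illustration.

```python
import math
import random


def price_european(S0, K, sigma, r, T, paths, seed=0):
    """Monte Carlo estimate of (call, put) prices for a European option.

    S0: initial asset price, K: strike price, sigma: volatility,
    r: risk-free rate, T: time to expiration in years.
    """
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * T
    vol = sigma * math.sqrt(T)
    call_sum = 0.0
    put_sum = 0.0
    for _ in range(paths):
        # Terminal price of one simulated path.
        ST = S0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        call_sum += max(ST - K, 0.0)
        put_sum += max(K - ST, 0.0)
    discount = math.exp(-r * T)
    return discount * call_sum / paths, discount * put_sum / paths


call, put = price_european(S0=100.0, K=100.0, sigma=0.25, r=0.05,
                           T=1.0, paths=20000)
print("call=%.2f put=%.2f" % (call, put))
```

Each (strike, volatility) pair is an independent call to a function like this one, which is why the calculation suits task-based load balancing. Asian options depend on the average price along the path, so they need the full path simulation that this sketch omits.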
|
214 | 216 | To run this code in parallel, we will use IPython's :class:`LoadBalancedView` class, |
|
215 | 217 | which distributes work to the engines using dynamic load balancing. This |
|
216 | 218 | view is a wrapper of the :class:`Client` class shown in |
|
217 | 219 | the previous example. The parallel calculation using :class:`LoadBalancedView` can |
|
218 | 220 | be found in the file :file:`mcpricer.py`. The code in this file creates a |
|
219 | 221 | :class:`LoadBalancedView` instance and then submits a set of tasks using |
|
220 | 222 | :meth:`LoadBalancedView.apply` that calculate the option prices for different |
|
221 | 223 | volatilities and strike prices. The results are then plotted as a 2D contour |
|
222 | 224 | plot using Matplotlib. |
|
223 | 225 | |
|
224 | 226 | .. literalinclude:: ../../examples/parallel/options/mcpricer.py |
|
225 | 227 | :language: python |
|
226 | 228 | |
|
227 | 229 | To use this code, start an IPython cluster using :command:`ipcluster`, open |
|
228 | 230 | IPython in the pylab mode with the file :file:`mckernel.py` in your current |
|
229 | 231 | working directory and then type: |
|
230 | 232 | |
|
231 | 233 | .. sourcecode:: ipython |
|
232 | 234 | |
|
233 | 235 | In [7]: run mcpricer.py |
|
234 | 236 | |
|
235 | 237 | Submitted tasks: 30 |
|
236 | 238 | |
|
237 | 239 | Once all the tasks have finished, the results can be plotted using the |
|
238 | 240 | :func:`plot_options` function. Here we make contour plots of the Asian |
|
239 | 241 | call and Asian put options as a function of the volatility and strike price:
|
240 | 242 | |
|
241 | 243 | .. sourcecode:: ipython |
|
242 | 244 | |
|
243 | 245 | In [8]: plot_options(sigma_vals, strike_vals, prices['acall']) |
|
244 | 246 | |
|
245 | 247 | In [9]: plt.figure() |
|
246 | 248 | Out[9]: <matplotlib.figure.Figure object at 0x18c178d0> |
|
247 | 249 | |
|
248 | 250 | In [10]: plot_options(sigma_vals, strike_vals, prices['aput']) |
|
249 | 251 | |
|
250 | 252 | These results are shown in the two figures below. On our 15 engines, the |
|
251 | 253 | entire calculation (15 strike prices, 15 volatilities, 100,000 paths for each) |
|
252 | 254 | took 37 seconds in parallel, giving a speedup of 14.1x, which is comparable |
|
253 | 255 | to the speedup observed in our previous example. |
|
254 | 256 | |
|
255 | 257 | .. image:: figs/asian_call.* |
|
256 | 258 | |
|
257 | 259 | .. image:: figs/asian_put.* |
|
258 | 260 | |
|
259 | 261 | Conclusion |
|
260 | 262 | ========== |
|
261 | 263 | |
|
262 | 264 | To conclude these examples, we summarize the key features of IPython's |
|
263 | 265 | parallel architecture that have been demonstrated: |
|
264 | 266 | |
|
265 | 267 | * Serial code can often be parallelized with only a few extra lines of code.
|
266 | 268 | We have used the :class:`DirectView` and :class:`LoadBalancedView` classes |
|
267 | 269 | for this purpose. |
|
268 | 270 | * The resulting parallel code can be run without ever leaving IPython's
|
269 | 271 | interactive shell. |
|
270 | 272 | * Any data computed in parallel can be explored interactively through |
|
271 | 273 | visualization or further numerical calculations. |
|
272 | 274 | * We have run these examples on a cluster running RHEL 5 and Sun GridEngine. |
|
273 | 275 | IPython's built-in support for SGE (and other batch systems) makes it easy
|
274 | 276 | to get started with IPython's parallel capabilities. |
|
275 | 277 |
@@ -1,295 +1,306 | |||
|
1 | 1 | .. _parallel_overview: |
|
2 | 2 | |
|
3 | 3 | ============================ |
|
4 | 4 | Overview and getting started |
|
5 | 5 | ============================ |
|
6 | 6 | |
|
7 | ||
|
8 | Examples | |
|
9 | ======== | |
|
10 | ||
|
11 | We have various example scripts and notebooks for using IPython.parallel in our | |
|
12 | :file:`docs/examples/parallel` directory, or they can be found `on GitHub`__. | |
|
13 | Some of these are covered in more detail in the :ref:`examples | |
|
14 | <parallel_examples>` section. | |
|
15 | ||
|
16 | .. __: https://github.com/ipython/ipython/tree/master/docs/examples/parallel | |
|
17 | ||
|
7 | 18 | Introduction |
|
8 | 19 | ============ |
|
9 | 20 | |
|
10 | 21 | This section gives an overview of IPython's sophisticated and powerful |
|
11 | 22 | architecture for parallel and distributed computing. This architecture |
|
12 | 23 | abstracts out parallelism in a very general way, which enables IPython to |
|
13 | 24 | support many different styles of parallelism including: |
|
14 | 25 | |
|
15 | 26 | * Single program, multiple data (SPMD) parallelism. |
|
16 | 27 | * Multiple program, multiple data (MPMD) parallelism. |
|
17 | 28 | * Message passing using MPI. |
|
18 | 29 | * Task farming. |
|
19 | 30 | * Data parallel. |
|
20 | 31 | * Combinations of these approaches. |
|
21 | 32 | * Custom user defined approaches. |
|
22 | 33 | |
|
23 | 34 | Most importantly, IPython enables all types of parallel applications to |
|
24 | 35 | be developed, executed, debugged and monitored *interactively*. Hence, |
|
25 | 36 | the ``I`` in IPython. The following are some example use cases for IPython:
|
26 | 37 | |
|
27 | 38 | * Quickly parallelize algorithms that are embarrassingly parallel |
|
28 | 39 | using a number of simple approaches. Many simple things can be |
|
29 | 40 | parallelized interactively in one or two lines of code. |
|
30 | 41 | |
|
31 | 42 | * Steer traditional MPI applications on a supercomputer from an |
|
32 | 43 | IPython session on your laptop. |
|
33 | 44 | |
|
34 | 45 | * Analyze and visualize large datasets (that could be remote and/or |
|
35 | 46 | distributed) interactively using IPython and tools like |
|
36 | 47 | matplotlib/TVTK. |
|
37 | 48 | |
|
38 | 49 | * Develop, test and debug new parallel algorithms |
|
39 | 50 | (that may use MPI) interactively. |
|
40 | 51 | |
|
41 | 52 | * Tie together multiple MPI jobs running on different systems into |
|
42 | 53 | one giant distributed and parallel system. |
|
43 | 54 | |
|
44 | 55 | * Start a parallel job on your cluster and then have a remote |
|
45 | 56 | collaborator connect to it and pull back data into their |
|
46 | 57 | local IPython session for plotting and analysis. |
|
47 | 58 | |
|
48 | 59 | * Run a set of tasks on a set of CPUs using dynamic load balancing. |
|
49 | 60 | |
|
50 | 61 | .. tip:: |
|
51 | 62 | |
|
52 | 63 | At the SciPy 2011 conference in Austin, Min Ragan-Kelley presented a |
|
53 | 64 | complete 4-hour tutorial on the use of these features, and all the materials |
|
54 | 65 | for the tutorial are now `available online`__. That tutorial provides an |
|
55 | 66 | excellent, hands-on oriented complement to the reference documentation |
|
56 | 67 | presented here. |
|
57 | 68 | |
|
58 | 69 | .. __: http://minrk.github.com/scipy-tutorial-2011 |
|
59 | 70 | |
|
60 | 71 | Architecture overview |
|
61 | 72 | ===================== |
|
62 | 73 | |
|
63 | 74 | .. figure:: figs/wideView.png |
|
64 | 75 | :width: 300px |
|
65 | 76 | |
|
66 | 77 | |
|
67 | 78 | The IPython architecture consists of four components: |
|
68 | 79 | |
|
69 | 80 | * The IPython engine. |
|
70 | 81 | * The IPython hub. |
|
71 | 82 | * The IPython schedulers. |
|
72 | 83 | * The controller client. |
|
73 | 84 | |
|
74 | 85 | These components live in the :mod:`IPython.parallel` package and are |
|
75 | 86 | installed with IPython. They do, however, have additional dependencies |
|
76 | 87 | that must be installed. For more information, see our |
|
77 | 88 | :ref:`installation documentation <install_index>`. |
|
78 | 89 | |
|
79 | 90 | .. TODO: include zmq in install_index |
|
80 | 91 | |
|
81 | 92 | IPython engine |
|
82 | 93 | --------------- |
|
83 | 94 | |
|
84 | 95 | The IPython engine is a Python instance that takes Python commands over a |
|
85 | 96 | network connection. Eventually, the IPython engine will be a full IPython |
|
86 | 97 | interpreter, but for now, it is a regular Python interpreter. The engine |
|
87 | 98 | can also handle incoming and outgoing Python objects sent over a network |
|
88 | 99 | connection. When multiple engines are started, parallel and distributed |
|
89 | 100 | computing becomes possible. An important feature of an IPython engine is |
|
90 | 101 | that it blocks while user code is being executed. Read on for how the |
|
91 | 102 | IPython controller solves this problem to expose a clean asynchronous API |
|
92 | 103 | to the user. |
|
93 | 104 | |
|
94 | 105 | IPython controller |
|
95 | 106 | ------------------ |
|
96 | 107 | |
|
97 | 108 | The IPython controller processes provide an interface for working with a set of engines. |
|
98 | 109 | At a general level, the controller is a collection of processes to which IPython engines |
|
99 | 110 | and clients can connect. The controller is composed of a :class:`Hub` and a collection of |
|
100 | 111 | :class:`Schedulers`. These Schedulers are typically run in separate processes on the
|
101 | 112 | same machine as the Hub, but they can run anywhere, from local threads to remote machines.
|
102 | 113 | |
|
103 | 114 | The controller also provides a single point of contact for users who wish to |
|
104 | 115 | utilize the engines connected to the controller. There are different ways of |
|
105 | 116 | working with a controller. In IPython, all of these models are implemented via |
|
106 | 117 | the :meth:`.View.apply` method, after |
|
107 | 118 | constructing :class:`.View` objects to represent subsets of engines. The two |
|
108 | 119 | primary models for interacting with engines are: |
|
109 | 120 | |
|
110 | 121 | * A **Direct** interface, where engines are addressed explicitly. |
|
111 | 122 | * A **LoadBalanced** interface, where the Scheduler is trusted with assigning work to |
|
112 | 123 | appropriate engines. |
|
113 | 124 | |
|
114 | 125 | Advanced users can readily extend the View models to enable other |
|
115 | 126 | styles of parallelism. |
|
116 | 127 | |
|
117 | 128 | .. note:: |
|
118 | 129 | |
|
119 | 130 | A single controller and set of engines can be used with multiple models |
|
120 | 131 | simultaneously. This opens the door for lots of interesting things. |
|
121 | 132 | |
|
122 | 133 | |
|
123 | 134 | The Hub |
|
124 | 135 | ******* |
|
125 | 136 | |
|
126 | 137 | The center of an IPython cluster is the Hub. This is the process that keeps |
|
127 | 138 | track of engine connections, schedulers, clients, as well as all task requests and |
|
128 | 139 | results. The primary role of the Hub is to facilitate queries of the cluster state, and |
|
129 | 140 | minimize the information required to establish the many connections involved in
|
130 | 141 | connecting new clients and engines. |
|
131 | 142 | |
|
132 | 143 | |
|
133 | 144 | Schedulers |
|
134 | 145 | ********** |
|
135 | 146 | |
|
136 | 147 | All actions that can be performed on the engine go through a Scheduler. While the engines |
|
137 | 148 | themselves block when user code is run, the schedulers hide that from the user to provide |
|
138 | 149 | a fully asynchronous interface to a set of engines. |
|
139 | 150 | |
|
140 | 151 | |
|
141 | 152 | IPython client and views |
|
142 | 153 | ------------------------ |
|
143 | 154 | |
|
144 | 155 | There is one primary object, the :class:`~.parallel.Client`, for connecting to a cluster. |
|
145 | 156 | For each execution model, there is a corresponding :class:`~.parallel.View`. These views |
|
146 | 157 | allow users to interact with a set of engines through the interface. Here are the two default |
|
147 | 158 | views: |
|
148 | 159 | |
|
149 | 160 | * The :class:`DirectView` class for explicit addressing. |
|
150 | 161 | * The :class:`LoadBalancedView` class for destination-agnostic scheduling. |
|
151 | 162 | |
|
152 | 163 | Security |
|
153 | 164 | -------- |
|
154 | 165 | |
|
155 | 166 | IPython uses ZeroMQ for networking, which has provided many advantages, but |
|
156 | 167 | one of the setbacks is its utter lack of security [ZeroMQ]_. By default, no IPython |
|
157 | 168 | connections are encrypted, but open ports only listen on localhost. The only |
|
158 | 169 | source of security for IPython is via ssh-tunnel. IPython supports both shell |
|
159 | 170 | (`openssh`) and `paramiko` based tunnels for connections. There is a key necessary |
|
160 | 171 | to submit requests, but due to the lack of encryption, it does not provide |
|
161 | 172 | significant security if loopback traffic is compromised. |
|
162 | 173 | |
|
163 | 174 | In our architecture, the controller is the only process that listens on |
|
164 | 175 | network ports, and is thus the main point of vulnerability. The standard model |
|
165 | 176 | for secure connections is to designate that the controller listen on |
|
166 | 177 | localhost, and use ssh-tunnels to connect clients and/or |
|
167 | 178 | engines. |
|
168 | 179 | |
|
169 | 180 | To connect and authenticate to the controller an engine or client needs |
|
170 | 181 | some information that the controller has stored in a JSON file. |
|
171 | 182 | Thus, the JSON files need to be copied to a location where |
|
172 | 183 | the clients and engines can find them. Typically, this is the |
|
173 | 184 | :file:`~/.ipython/profile_default/security` directory on the host where the |
|
174 | 185 | client/engine is running (which could be a different host than the controller). |
|
175 | 186 | Once the JSON files are copied over, everything should work fine. |
|
176 | 187 | |
|
177 | 188 | Currently, there are two JSON files that the controller creates: |
|
178 | 189 | |
|
179 | 190 | ipcontroller-engine.json |
|
180 | 191 | This JSON file has the information necessary for an engine to connect |
|
181 | 192 | to a controller. |
|
182 | 193 | |
|
183 | 194 | ipcontroller-client.json |
|
184 | 195 | The client's connection information. This may not differ from the engine's, |
|
185 | 196 | but since the controller may listen on different ports for clients and |
|
186 | 197 | engines, it is stored separately. |
|
187 | 198 | |
|
188 | 199 | ipcontroller-client.json will look something like this, under default localhost |
|
189 | 200 | circumstances: |
|
190 | 201 | |
|
191 | 202 | .. sourcecode:: python |
|
192 | 203 | |
|
193 | 204 | { |
|
194 | 205 | "url":"tcp:\/\/127.0.0.1:54424", |
|
195 | 206 | "exec_key":"a361fe89-92fc-4762-9767-e2f0a05e3130", |
|
196 | 207 | "ssh":"", |
|
197 | 208 | "location":"10.19.1.135" |
|
198 | 209 | } |
|
199 | 210 | |
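As a small illustration of how such a file can be inspected, the sketch below parses the example above with the standard :mod:`json` module. The variable names are hypothetical, and reading an empty ``ssh`` field as "no ssh server recorded" is an assumption made for this example rather than documented client behaviour.

```python
import json

# The localhost example from above, held in a raw string so that the
# JSON escape sequence \/ reaches the parser unchanged.
client_info_text = r'''
{
  "url":"tcp:\/\/127.0.0.1:54424",
  "exec_key":"a361fe89-92fc-4762-9767-e2f0a05e3130",
  "ssh":"",
  "location":"10.19.1.135"
}
'''

info = json.loads(client_info_text)

# JSON allows "/" to be written as "\/", so the decoded URL is an ordinary one.
controller_url = info["url"]

# Assumption: an empty "ssh" entry means no ssh server was recorded at launch.
ssh_server_recorded = bool(info["ssh"])

print(controller_url, ssh_server_recorded)
```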
|
200 | 211 | If, however, you are running the controller on a work node on a cluster, you will likely |
|
201 | 212 | need to use ssh tunnels to connect clients from your laptop to it. You will also |
|
202 | 213 | probably need to instruct the controller to listen for engines coming from other work nodes |
|
203 | 214 | on the cluster. An example of ipcontroller-client.json, as created by:: |
|
204 | 215 | |
|
205 | 216 | $> ipcontroller --ip=0.0.0.0 --ssh=login.mycluster.com |
|
206 | 217 | |
|
207 | 218 | |
|
208 | 219 | .. sourcecode:: python |
|
209 | 220 | |
|
210 | 221 | { |
|
211 | 222 | "url":"tcp:\/\/*:54424", |
|
212 | 223 | "exec_key":"a361fe89-92fc-4762-9767-e2f0a05e3130", |
|
213 | 224 | "ssh":"login.mycluster.com", |
|
214 | 225 | "location":"10.0.0.2" |
|
215 | 226 | } |
|
216 | 227 | More details of how these JSON files are used are given below. |
|
217 | 228 | |
|
218 | 229 | A detailed description of the security model and its implementation in IPython |
|
219 | 230 | can be found :ref:`here <parallelsecurity>`. |
|
220 | 231 | |
|
221 | 232 | .. warning:: |
|
222 | 233 | |
|
223 | 234 | Even at its most secure, the Controller listens on ports on localhost, and |
|
224 | 235 | every time you make a tunnel, you open a localhost port on the connecting |
|
225 | 236 | machine that points to the Controller. If localhost on the Controller's |
|
226 | 237 | machine, or the machine of any client or engine, is untrusted, then your |
|
227 | 238 | Controller is insecure. There is no way around this with ZeroMQ. |
|
228 | 239 | |
|
229 | 240 | |
|
230 | 241 | |
|
231 | 242 | Getting Started |
|
232 | 243 | =============== |
|
233 | 244 | |
|
234 | 245 | To use IPython for parallel computing, you need to start one instance of the |
|
235 | 246 | controller and one or more instances of the engine. Initially, it is best to |
|
236 | 247 | simply start a controller and engines on a single host using the |
|
237 | 248 | :command:`ipcluster` command. To start a controller and 4 engines on your |
|
238 | 249 | localhost, just do:: |
|
239 | 250 | |
|
240 | 251 | $ ipcluster start -n 4 |
|
241 | 252 | |
|
242 | 253 | More details about starting the IPython controller and engines can be found |
|
243 | 254 | :ref:`here <parallel_process>`.
|
244 | 255 | |
|
245 | 256 | Once you have started the IPython controller and one or more engines, you |
|
246 | 257 | are ready to use the engines to do something useful. To make sure |
|
247 | 258 | everything is working correctly, try the following commands: |
|
248 | 259 | |
|
249 | 260 | .. sourcecode:: ipython |
|
250 | 261 | |
|
251 | 262 | In [1]: from IPython.parallel import Client |
|
252 | 263 | |
|
253 | 264 | In [2]: c = Client() |
|
254 | 265 | |
|
255 | 266 | In [4]: c.ids |
|
256 | 267 | Out[4]: set([0, 1, 2, 3]) |
|
257 | 268 | |
|
258 | 269 | In [5]: c[:].apply_sync(lambda : "Hello, World") |
|
259 | 270 | Out[5]: [ 'Hello, World', 'Hello, World', 'Hello, World', 'Hello, World' ] |
|
260 | 271 | |
|
261 | 272 | |
|
262 | 273 | When a client is created with no arguments, the client tries to find the corresponding JSON file |
|
263 | 274 | in the local `~/.ipython/profile_default/security` directory. Or if you specified a profile, |
|
264 | 275 | you can use that with the Client. This should cover most cases: |
|
265 | 276 | |
|
266 | 277 | .. sourcecode:: ipython |
|
267 | 278 | |
|
268 | 279 | In [2]: c = Client(profile='myprofile') |
|
269 | 280 | |
|
270 | 281 | If you have put the JSON file in a different location or it has a different name, create the |
|
271 | 282 | client like this: |
|
272 | 283 | |
|
273 | 284 | .. sourcecode:: ipython |
|
274 | 285 | |
|
275 | 286 | In [2]: c = Client('/path/to/my/ipcontroller-client.json') |
|
276 | 287 | |
|
277 | 288 | Remember, a client needs to be able to see the Hub's ports to connect. So if they are on a |
|
278 | 289 | different machine, you may need to use an ssh server to tunnel access to that machine.
|
279 | 290 | You would then connect to it with:
|
280 | 291 | |
|
281 | 292 | .. sourcecode:: ipython |
|
282 | 293 | |
|
283 | 294 | In [2]: c = Client('/path/to/my/ipcontroller-client.json', sshserver='me@myhub.example.com') |
|
284 | 295 | |
|
285 | 296 | Here, 'myhub.example.com' is the hostname or IP address of the machine on
|
286 | 297 | which the Hub process is running (or another machine that has direct access to the Hub's ports). |
|
287 | 298 | |
|
288 | 299 | The SSH server may already be specified in ipcontroller-client.json, if the controller was
|
289 | 300 | instructed to use one at launch time.
|
290 | 301 | |
|
291 | 302 | You are now ready to learn more about the :ref:`Direct |
|
292 | 303 | <parallel_multiengine>` and :ref:`LoadBalanced <parallel_task>` interfaces to the |
|
293 | 304 | controller. |
|
294 | 305 | |
|
295 | 306 | .. [ZeroMQ] ZeroMQ. http://www.zeromq.org |