update some parallel docs...
MinRK
@@ -1,284 +1,275 b''
1 1 =================
2 2 Parallel examples
3 3 =================
4 4
5 .. note::
6
7 Performance numbers from ``IPython.kernel``, not new ``IPython.parallel``.
8
9 5 In this section we describe two more involved examples of using an IPython
10 6 cluster to perform a parallel computation. In these examples, we will be using
11 7 IPython's "pylab" mode, which enables interactive plotting using the
12 8 Matplotlib package. IPython can be started in this mode by typing::
13 9
14 10 ipython --pylab
15 11
16 12 at the system command line.
17 13
18 14 150 million digits of pi
19 15 ========================
20 16
21 17 In this example we would like to study the distribution of digits in the
22 18 number pi (in base 10). While it is not known if pi is a normal number (a
23 19 number is normal in base 10 if 0-9 occur with equal likelihood) numerical
24 20 investigations suggest that it is. We will begin with a serial calculation on
25 21 10,000 digits of pi and then perform a parallel calculation involving 150
26 22 million digits.
27 23
28 24 In both the serial and parallel calculation we will be using functions defined
29 25 in the :file:`pidigits.py` file, which is available in the
30 26 :file:`docs/examples/parallel` directory of the IPython source distribution.
31 27 These functions provide basic facilities for working with the digits of pi and
32 28 can be loaded into IPython by putting :file:`pidigits.py` in your current
33 29 working directory and then doing:
34 30
35 31 .. sourcecode:: ipython
36 32
37 33 In [1]: run pidigits.py
38 34
39 35 Serial calculation
40 36 ------------------
41 37
42 38 For the serial calculation, we will use `SymPy <http://www.sympy.org>`_ to
43 39 calculate 10,000 digits of pi and then look at the frequencies of the digits
44 40 0-9. Out of 10,000 digits, we expect each digit to occur 1,000 times. While
45 41 SymPy is capable of calculating many more digits of pi, our purpose here is to
46 42 set the stage for the much larger parallel calculation.
47 43
48 44 In this example, we use two functions from :file:`pidigits.py`:
49 45 :func:`one_digit_freqs` (which calculates how many times each digit occurs)
50 46 and :func:`plot_one_digit_freqs` (which uses Matplotlib to plot the result).
51 47 Here is an interactive IPython session that uses these functions with
52 48 SymPy:
53 49
54 50 .. sourcecode:: ipython
55 51
56 52 In [7]: import sympy
57 53
58 54 In [8]: pi = sympy.pi.evalf(40)
59 55
60 56 In [9]: pi
61 57 Out[9]: 3.141592653589793238462643383279502884197
62 58
63 59 In [10]: pi = sympy.pi.evalf(10000)
64 60
65 61 In [11]: digits = (d for d in str(pi)[2:]) # create a sequence of digits
66 62
67 63 In [12]: run pidigits.py # load one_digit_freqs/plot_one_digit_freqs
68 64
69 65 In [13]: freqs = one_digit_freqs(digits)
70 66
71 67 In [14]: plot_one_digit_freqs(freqs)
72 68 Out[14]: [<matplotlib.lines.Line2D object at 0x18a55290>]
73 69
74 70 The resulting plot of the single digit counts shows that each digit occurs
75 71 approximately 1,000 times, but that with only 10,000 digits the
76 72 statistical fluctuations are still rather large:
77 73
78 74 .. image:: figs/single_digits.*
79 75
80 76 It is clear that to reduce the relative fluctuations in the counts, we need
81 77 to look at many more digits of pi. That brings us to the parallel calculation.
82 78
83 79 Parallel calculation
84 80 --------------------
85 81
86 82 Calculating many digits of pi is a challenging computational problem in itself.
87 83 Because we want to focus on the distribution of digits in this example, we
88 84 will use pre-computed digits of pi from the website of Professor Yasumasa
89 85 Kanada at the University of Tokyo (http://www.super-computing.org). These
90 86 digits come in a set of text files (ftp://pi.super-computing.org/.2/pi200m/)
91 87 that each have 10 million digits of pi.
92 88
93 89 For the parallel calculation, we have copied these files to the local hard
94 90 drives of the compute nodes. A total of 15 of these files will be used, for a
95 91 total of 150 million digits of pi. To make things a little more interesting we
96 92 will calculate the frequencies of all two-digit sequences (00-99) and then plot
97 93 the result using a 2D matrix in Matplotlib.
98 94
99 95 The overall idea of the calculation is simple: each IPython engine will
100 96 compute the two digit counts for the digits in a single file. Then in a final
101 97 step the counts from each engine will be added up. To perform this
102 98 calculation, we will need two top-level functions from :file:`pidigits.py`:
103 99
104 100 .. literalinclude:: ../../examples/parallel/pi/pidigits.py
105 101 :language: python
106 102 :lines: 47-62
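
If you do not have the IPython source checkout at hand, the following is a
simplified sketch of what these two functions do. It is a reimplementation for
illustration only, not the exact contents of :file:`pidigits.py`:

.. sourcecode:: python

    import numpy as np

    def compute_two_digit_freqs(filename):
        """Count each two-digit sequence (00-99) in one file of pi digits."""
        with open(filename) as f:
            digits = [c for c in f.read() if c.isdigit()]
        freqs = np.zeros(100, dtype=np.int64)
        for a, b in zip(digits[:-1], digits[1:]):
            freqs[int(a) * 10 + int(b)] += 1
        return freqs

    def reduce_freqs(freqlist):
        """Add up the per-file counts returned by the engines."""
        return sum(freqlist[1:], freqlist[0])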
107 103
108 104 We will also use the :func:`plot_two_digit_freqs` function to plot the
109 105 results. The code to run this calculation in parallel is contained in
110 106 :file:`docs/examples/parallel/parallelpi.py`. This code can be run in parallel
111 107 using IPython by following these steps:
112 108
113 1. Use :command:`ipcluster` to start 15 engines. We used an 8 core (2 quad
114 core CPUs) cluster with hyperthreading enabled which makes the 8 cores
115 looks like 16 (1 controller + 15 engines) in the OS. However, the maximum
116 speedup we can observe is still only 8x.
109 1. Use :command:`ipcluster` to start 15 engines. We used 16 cores of an SGE Linux
110 cluster (1 controller + 15 engines).
117 111 2. With the file :file:`parallelpi.py` in your current working directory, open
118 112 up IPython in pylab mode and type ``run parallelpi.py``. This will download
119 113 the pi files via ftp the first time you run it, if they are not
120 114 present in the Engines' working directory.
121 115
122 When run on our 8 core cluster, we observe a speedup of 7.7x. This is slightly
123 less than linear scaling (8x) because the controller is also running on one of
116 When run on our 16 cores, we observe a speedup of 14.2x. This is slightly
117 less than linear scaling (16x) because the controller is also running on one of
124 118 the cores.
125 119
126 120 To emphasize the interactive nature of IPython, we now show how the
127 121 calculation can also be run by simply typing the commands from
128 122 :file:`parallelpi.py` interactively into IPython:
129 123
130 124 .. sourcecode:: ipython
131 125
132 126 In [1]: from IPython.parallel import Client
133 127
134 128 # The Client allows us to use the engines interactively.
135 129 # We simply pass Client the name of the cluster profile we
136 130 # are using.
137 131 In [2]: c = Client(profile='mycluster')
138 In [3]: view = c.load_balanced_view()
132 In [3]: v = c[:]
139 133
140 134 In [3]: c.ids
141 135 Out[3]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
142 136
143 137 In [4]: run pidigits.py
144 138
145 139 In [5]: filestring = 'pi200m.ascii.%(i)02dof20'
146 140
147 141 # Create the list of files to process.
148 142 In [6]: files = [filestring % {'i':i} for i in range(1,16)]
149 143
150 144 In [7]: files
151 145 Out[7]:
152 146 ['pi200m.ascii.01of20',
153 147 'pi200m.ascii.02of20',
154 148 'pi200m.ascii.03of20',
155 149 'pi200m.ascii.04of20',
156 150 'pi200m.ascii.05of20',
157 151 'pi200m.ascii.06of20',
158 152 'pi200m.ascii.07of20',
159 153 'pi200m.ascii.08of20',
160 154 'pi200m.ascii.09of20',
161 155 'pi200m.ascii.10of20',
162 156 'pi200m.ascii.11of20',
163 157 'pi200m.ascii.12of20',
164 158 'pi200m.ascii.13of20',
165 159 'pi200m.ascii.14of20',
166 160 'pi200m.ascii.15of20']
167 161
168 162 # download the data files if they don't already exist:
169 163 In [8]: v.map(fetch_pi_file, files)
170 164
171 165 # This is the parallel calculation using the DirectView.map method
172 166 # which applies compute_two_digit_freqs to each file in files in parallel.
173 167 In [9]: freqs_all = v.map(compute_two_digit_freqs, files)
174 168
175 169 # Add up the frequencies from each engine.
176 170 In [10]: freqs = reduce_freqs(freqs_all)
177 171
178 172 In [11]: plot_two_digit_freqs(freqs)
179 173 Out[11]: <matplotlib.image.AxesImage object at 0x18beb110>
180 174
181 175 In [12]: plt.title('2 digit counts of 150m digits of pi')
182 176 Out[12]: <matplotlib.text.Text object at 0x18d1f9b0>
183 177
184 178 The resulting plot generated by Matplotlib is shown below. The colors indicate
185 179 which two digit sequences are more (red) or less (blue) likely to occur in the
186 180 first 150 million digits of pi. We clearly see that the sequence "41" is
187 181 most likely and that "06" and "07" are least likely. Further analysis would
188 182 show that the relative size of the statistical fluctuations has decreased
189 183 compared to the 10,000 digit calculation.
190 184
191 185 .. image:: figs/two_digit_counts.*
192 186
193 187
194 188 Parallel options pricing
195 189 ========================
196 190
197 191 An option is a financial contract that gives the buyer of the contract the
198 192 right to buy (a "call") or sell (a "put") a secondary asset (a stock for
199 193 example) at a particular date in the future (the expiration date) for a
200 194 pre-agreed upon price (the strike price). For this right, the buyer pays the
201 195 seller a premium (the option price). There are a wide variety of flavors of
202 196 options (American, European, Asian, etc.) that are useful for different
203 197 purposes: hedging against risk, speculation, etc.
204 198
205 199 Much of modern finance is driven by the need to price these contracts
206 200 accurately based on what is known about the properties (such as volatility) of
207 201 the underlying asset. One method of pricing options is to use a Monte Carlo
208 202 simulation of the underlying asset price. In this example we use this approach
209 203 to price both European and Asian (path dependent) options for various strike
210 204 prices and volatilities.
211 205
212 The code for this example can be found in the :file:`docs/examples/parallel`
206 The code for this example can be found in the :file:`docs/examples/parallel/options`
213 207 directory of the IPython source. The function :func:`price_options` in
214 :file:`mcpricer.py` implements the basic Monte Carlo pricing algorithm using
208 :file:`mckernel.py` implements the basic Monte Carlo pricing algorithm using
215 209 the NumPy package and is shown here:
216 210
217 .. literalinclude:: ../../examples/parallel/options/mcpricer.py
211 .. literalinclude:: ../../examples/parallel/options/mckernel.py
218 212 :language: python
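
For readers without the source tree handy, here is a simplified sketch of a
Monte Carlo kernel along these lines. The function name, signature and return
format are assumptions for illustration; the real code lives in
:file:`mckernel.py`:

.. sourcecode:: python

    def price_options(S=100.0, K=100.0, sigma=0.25, r=0.05, days=260, paths=10000):
        """Price European and Asian (average-price) calls and puts by Monte Carlo.

        A simplified sketch under geometric Brownian motion, not the exact
        contents of mckernel.py.
        """
        import numpy as np  # imported inside so the function can be shipped to engines
        h = 1.0 / days                            # one day as a fraction of a year
        drift = np.exp((r - 0.5 * sigma ** 2) * h)
        vol = sigma * np.sqrt(h)
        stock = S * np.ones(paths)
        avg_stock = np.zeros(paths)
        for day in range(days):
            stock *= drift * np.exp(vol * np.random.standard_normal(paths))
            avg_stock += stock
        avg_stock /= days
        disc = np.exp(-r * days * h)              # discount factor back to today
        return dict(
            ecall=disc * np.mean(np.maximum(stock - K, 0.0)),
            eput=disc * np.mean(np.maximum(K - stock, 0.0)),
            acall=disc * np.mean(np.maximum(avg_stock - K, 0.0)),
            aput=disc * np.mean(np.maximum(K - avg_stock, 0.0)),
        )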
219 213
220 214 To run this code in parallel, we will use IPython's :class:`LoadBalancedView` class,
221 215 which distributes work to the engines using dynamic load balancing. This
222 216 view is a wrapper of the :class:`Client` class shown in
223 217 the previous example. The parallel calculation using :class:`LoadBalancedView` can
224 218 be found in the file :file:`mcpricer.py`. The code in this file creates a
225 219 :class:`LoadBalancedView` instance and then submits a set of tasks using
226 220 :meth:`LoadBalancedView.apply` that calculate the option prices for different
227 221 volatilities and strike prices. The results are then plotted as a 2D contour
228 222 plot using Matplotlib.
229 223
230 .. literalinclude:: ../../examples/parallel/options/mckernel.py
224 .. literalinclude:: ../../examples/parallel/options/mcpricer.py
231 225 :language: python
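
The overall structure of such a driver might look roughly like the following.
The import of :func:`price_options`, the parameter grids and the values used
are assumptions for illustration; see :file:`mcpricer.py` for the real script:

.. sourcecode:: python

    import numpy as np
    from IPython.parallel import Client
    from mckernel import price_options  # the Monte Carlo kernel (assumed importable)

    c = Client(profile='mycluster')
    view = c.load_balanced_view()

    strike_vals = np.linspace(90.0, 110.0, 5)   # strike prices (illustrative grid)
    sigma_vals = np.linspace(0.1, 0.4, 5)       # volatilities (illustrative grid)

    # Submit one pricing task per (strike, volatility) pair.
    async_results = []
    for K in strike_vals:
        for sigma in sigma_vals:
            ar = view.apply_async(price_options, 100.0, K, sigma, 0.05, 260, 100000)
            async_results.append(ar)
    print "Submitted tasks:", len(async_results)

    # .get() blocks until each task completes; collect the result dicts.
    prices = [ar.get() for ar in async_results]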
232 226
233 227 To use this code, start an IPython cluster using :command:`ipcluster`, open
234 228 IPython in the pylab mode with the files :file:`mcpricer.py` and :file:`mckernel.py` in your current
235 229 working directory and then type:
236 230
237 231 .. sourcecode:: ipython
238 232
239 In [7]: run mckernel.py
240 Submitted tasks: [0, 1, 2, ...]
233 In [7]: run mcpricer.py
234
235 Submitted tasks: 30
241 236
242 237 Once all the tasks have finished, the results can be plotted using the
243 238 :func:`plot_options` function. Here we make contour plots of the Asian
244 239 call and Asian put options as function of the volatility and strike price:
245 240
246 241 .. sourcecode:: ipython
247 242
248 In [8]: plot_options(sigma_vals, K_vals, prices['acall'])
243 In [8]: plot_options(sigma_vals, strike_vals, prices['acall'])
249 244
250 245 In [9]: plt.figure()
251 246 Out[9]: <matplotlib.figure.Figure object at 0x18c178d0>
252 247
253 In [10]: plot_options(sigma_vals, K_vals, prices['aput'])
248 In [10]: plot_options(sigma_vals, strike_vals, prices['aput'])
254 249
255 These results are shown in the two figures below. On a 8 core cluster the
256 entire calculation (10 strike prices, 10 volatilities, 100,000 paths for each)
257 took 30 seconds in parallel, giving a speedup of 7.7x, which is comparable
250 These results are shown in the two figures below. On our 15 engines, the
251 entire calculation (15 strike prices, 15 volatilities, 100,000 paths for each)
252 took 37 seconds in parallel, giving a speedup of 14.1x, which is comparable
258 253 to the speedup observed in our previous example.
259 254
260 255 .. image:: figs/asian_call.*
261 256
262 257 .. image:: figs/asian_put.*
263 258
264 259 Conclusion
265 260 ==========
266 261
267 262 To conclude these examples, we summarize the key features of IPython's
268 263 parallel architecture that have been demonstrated:
269 264
270 265 * Serial code can often be parallelized with only a few extra lines of code.
271 266 We have used the :class:`DirectView` and :class:`LoadBalancedView` classes
272 267 for this purpose.
273 268 * The resulting parallel code can be run without ever leaving IPython's
274 269 interactive shell.
275 270 * Any data computed in parallel can be explored interactively through
276 271 visualization or further numerical calculations.
277 * We have run these examples on a cluster running Windows HPC Server 2008.
278 IPython's built in support for the Windows HPC job scheduler makes it
279 easy to get started with IPython's parallel capabilities.
280
281 .. note::
272 * We have run these examples on a cluster running RHEL 5 and Sun GridEngine.
273 IPython's built-in support for SGE (and other batch systems) makes it easy
274 to get started with IPython's parallel capabilities.
282 275
283 The new parallel code has never been run on Windows HPC Server, so the last
284 conclusion is untested.
@@ -1,761 +1,826 b''
1 1 .. _parallel_process:
2 2
3 3 ===========================================
4 4 Starting the IPython controller and engines
5 5 ===========================================
6 6
7 7 To use IPython for parallel computing, you need to start one instance of
8 8 the controller and one or more instances of the engine. The controller
9 9 and each engine can run on different machines or on the same machine.
10 10 Because of this, there are many different possibilities.
11 11
12 12 Broadly speaking, there are two ways of going about starting a controller and engines:
13 13
14 14 * In an automated manner using the :command:`ipcluster` command.
15 15 * In a more manual way using the :command:`ipcontroller` and
16 16 :command:`ipengine` commands.
17 17
18 18 This document describes both of these methods. We recommend that new users
19 19 start with the :command:`ipcluster` command as it simplifies many common usage
20 20 cases.
21 21
22 22 General considerations
23 23 ======================
24 24
25 25 Before delving into the details about how you can start a controller and
26 26 engines using the various methods, we outline some of the general issues that
27 27 come up when starting the controller and engines. These things come up no
28 28 matter which method you use to start your IPython cluster.
29 29
30 30 If you are running engines on multiple machines, you will likely need to instruct the
31 31 controller to listen for connections on an external interface. This can be done by specifying
32 32 the ``ip`` argument on the command-line, or the ``HubFactory.ip`` configurable in
33 33 :file:`ipcontroller_config.py`.
34 34
35 35 If your machines are on a trusted network, you can safely instruct the controller to listen
36 36 on all public interfaces with::
37 37
38 38 $> ipcontroller --ip=*
39 39
40 40 Or you can set the same behavior as the default by adding the following line to your :file:`ipcontroller_config.py`:
41 41
42 42 .. sourcecode:: python
43 43
44 44 c.HubFactory.ip = '*'
45 45
46 46 .. note::
47 47
48 48 Due to the lack of security in ZeroMQ, the controller will only listen for connections on
49 49 localhost by default. If you see Timeout errors on engines or clients, then the first
50 50 thing you should check is the ip address the controller is listening on, and make sure
51 51 that it is visible from the timing out machine.
52 52
53 53 .. seealso::
54 54
55 55 Our `notes <parallel_security>`_ on security in the new parallel computing code.
56 56
57 57 Let's say that you want to start the controller on ``host0`` and engines on
58 58 hosts ``host1``-``hostn``. The following steps are then required:
59 59
60 60 1. Start the controller on ``host0`` by running :command:`ipcontroller` on
61 61 ``host0``. The controller must be instructed to listen on an interface visible
62 62 to the engine machines, via the ``ip`` command-line argument or ``HubFactory.ip``
63 63 in :file:`ipcontroller_config.py`.
64 64 2. Move the JSON file (:file:`ipcontroller-engine.json`) created by the
65 65 controller from ``host0`` to hosts ``host1``-``hostn``.
66 66 3. Start the engines on hosts ``host1``-``hostn`` by running
67 67 :command:`ipengine`. This command has to be told where the JSON file
68 68 (:file:`ipcontroller-engine.json`) is located.
69 69
70 70 At this point, the controller and engines will be connected. By default, the JSON files
71 71 created by the controller are put into the :file:`~/.ipython/profile_default/security`
72 72 directory. If the engines share a filesystem with the controller, step 2 can be skipped as
73 73 the engines will automatically look at that location.
74 74
75 75 The final step required to actually use the running controller from a client is to move
76 76 the JSON file :file:`ipcontroller-client.json` from ``host0`` to any host where clients
77 77 will be run. If these files are put into the :file:`~/.ipython/profile_default/security`
78 78 directory of the client's host, they will be found automatically. Otherwise, the full path
79 79 to them has to be passed to the client's constructor.
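
For example, a client on a host without the shared profile directory could
connect by passing the path explicitly (the path shown is illustrative):

.. sourcecode:: ipython

    In [1]: from IPython.parallel import Client

    In [2]: rc = Client('/path/to/ipcontroller-client.json')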
80 80
81 81 Using :command:`ipcluster`
82 82 ===========================
83 83
84 84 The :command:`ipcluster` command provides a simple way of starting a
85 85 controller and engines in the following situations:
86 86
87 87 1. When the controller and engines are all run on localhost. This is useful
88 88 for testing or running on a multicore computer.
89 89 2. When engines are started using the :command:`mpiexec` command that comes
90 90 with most MPI [MPI]_ implementations.
91 91 3. When engines are started using the PBS [PBS]_ batch system
92 92 (or other `qsub` systems, such as SGE).
93 93 4. When the controller is started on localhost and the engines are started on
94 94 remote nodes using :command:`ssh`.
95 95 5. When engines are started using the Windows HPC Server batch system.
96 96
97 97 .. note::
98 98
99 99 Currently :command:`ipcluster` requires that the
100 100 :file:`~/.ipython/profile_<name>/security` directory live on a shared filesystem that is
101 101 seen by both the controller and engines. If you don't have a shared file
102 102 system you will need to use :command:`ipcontroller` and
103 103 :command:`ipengine` directly.
104 104
105 105 Under the hood, :command:`ipcluster` just uses :command:`ipcontroller`
106 106 and :command:`ipengine` to perform the steps described above.
107 107
108 108 The simplest way to use ipcluster requires no configuration, and will
109 109 launch a controller and a number of engines on the local machine. For instance,
110 110 to start one controller and 4 engines on localhost, just do::
111 111
112 112 $ ipcluster start -n 4
113 113
114 114 To see other command line options, do::
115 115
116 116 $ ipcluster -h
117 117
118 118
119 119 Configuring an IPython cluster
120 120 ==============================
121 121
122 122 Cluster configurations are stored as `profiles`. You can create a new profile with::
123 123
124 124 $ ipython profile create --parallel --profile=myprofile
125 125
126 126 This will create the directory :file:`IPYTHONDIR/profile_myprofile`, and populate it
127 127 with the default configuration files for the three IPython cluster commands. Once
128 128 you edit those files, you can continue to call ipcluster/ipcontroller/ipengine
129 129 with no arguments beyond ``--profile=myprofile``, and any configuration will be maintained.
130 130
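For example, once the files in :file:`IPYTHONDIR/profile_myprofile` have been
edited, a cluster using that configuration can be started with just::

    $ ipcluster start --profile=myprofile -n 16
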
131 131 There is no limit to the number of profiles you can have, so you can maintain a profile for each
132 132 of your common use cases. The default profile will be used whenever the
133 133 profile argument is not specified, so edit :file:`IPYTHONDIR/profile_default/*_config.py` to
134 134 represent your most common use case.
135 135
136 136 The configuration files are loaded with commented-out settings and explanations,
137 137 which should cover most of the available possibilities.
138 138
139 139 Using various batch systems with :command:`ipcluster`
140 140 -----------------------------------------------------
141 141
142 142 :command:`ipcluster` has a notion of Launchers that can start controllers
143 143 and engines with various remote execution schemes. Currently supported
144 144 models include :command:`ssh`, :command:`mpiexec`, PBS-style (Torque, SGE, LSF),
145 145 and Windows HPC Server.
146 146
147 147 In general, these are configured by the :attr:`IPClusterEngines.engine_set_launcher_class`,
148 148 and :attr:`IPClusterStart.controller_launcher_class` configurables, which can be the
149 149 fully specified object name (e.g. ``'IPython.parallel.apps.launcher.LocalControllerLauncher'``),
150 150 but if you are using IPython's builtin launchers, you can specify just the class name,
151 151 or even just the prefix, e.g.:
152 152
153 153 .. sourcecode:: python
154 154
155 155 c.IPClusterEngines.engine_launcher_class = 'SSH'
156 156 # equivalent to
157 157 c.IPClusterEngines.engine_launcher_class = 'SSHEngineSetLauncher'
158 158 # both of which expand to
159 159 c.IPClusterEngines.engine_launcher_class = 'IPython.parallel.apps.launcher.SSHEngineSetLauncher'
160 160
161 161 The shortest form is of particular use on the command line: all you need to do to
162 162 get an IPython cluster running with engines started with MPI is:
163 163
164 164 .. sourcecode:: bash
165 165
166 166 $> ipcluster start --engines=MPIExec
167 167
168 168 This assumes that the default MPI configuration is sufficient.
169 169
170 170 .. note::
171 171
172 172 shortcuts for builtin launcher names were added in 0.12, as was the ``_class`` suffix
173 173 on the configurable names. If you use the old 0.11 names (e.g. ``engine_set_launcher``),
174 174 they will still work, but you will get a deprecation warning that the name has changed.
175 175
176 176
177 177 .. note::
178 178
179 179 The Launchers and configuration are designed in such a way that advanced
180 180 users can subclass and configure them to fit systems that we do not yet
181 181 support out of the box (such as Condor).
182 182
183 183 Using :command:`ipcluster` in mpiexec/mpirun mode
184 --------------------------------------------------
184 -------------------------------------------------
185 185
186 186
187 187 The mpiexec/mpirun mode is useful if you:
188 188
189 189 1. Have MPI installed.
190 190 2. Your systems are configured to use the :command:`mpiexec` or
191 191 :command:`mpirun` commands to start MPI processes.
192 192
193 193 If these are satisfied, you can create a new profile::
194 194
195 195 $ ipython profile create --parallel --profile=mpi
196 196
197 197 and edit the file :file:`IPYTHONDIR/profile_mpi/ipcluster_config.py`.
198 198
199 199 There, instruct ipcluster to use the MPIExec launchers by adding the lines:
200 200
201 201 .. sourcecode:: python
202 202
203 203 c.IPClusterEngines.engine_launcher_class = 'MPIExecEngineSetLauncher'
204 204
205 205 If the default MPI configuration is correct, then you can now start your cluster, with::
206 206
207 207 $ ipcluster start -n 4 --profile=mpi
208 208
209 209 This does the following:
210 210
211 211 1. Starts the IPython controller on the current host.
212 212 2. Uses :command:`mpiexec` to start 4 engines.
213 213
214 214 If you have a reason to also start the Controller with MPI, you can specify:
215 215
216 216 .. sourcecode:: python
217 217
218 218 c.IPClusterStart.controller_launcher_class = 'MPIExecControllerLauncher'
219 219
220 220 .. note::
221 221
222 222 The Controller *will not* be in the same MPI universe as the engines, so there is not
223 223 much reason to do this unless sysadmins demand it.
224 224
225 225 On newer MPI implementations (such as OpenMPI), this will work even if you
226 226 don't make any calls to MPI or call :func:`MPI_Init`. However, older MPI
227 227 implementations actually require each process to call :func:`MPI_Init` upon
228 228 starting. The easiest way of having this done is to install the mpi4py
229 229 [mpi4py]_ package and then specify the ``c.MPI.use`` option in :file:`ipengine_config.py`:
230 230
231 231 .. sourcecode:: python
232 232
233 233 c.MPI.use = 'mpi4py'
234 234
235 235 Unfortunately, even this won't work for some MPI implementations. If you are
236 236 having problems with this, you will likely have to use a custom Python
237 237 executable that itself calls :func:`MPI_Init` at the appropriate time.
238 238 Fortunately, mpi4py comes with such a custom Python executable that is easy to
239 239 install and use. However, this custom Python executable approach will not work
240 240 with :command:`ipcluster` currently.
241 241
242 242 More details on using MPI with IPython can be found :ref:`here <parallelmpi>`.
243 243
244 244
245 245 Using :command:`ipcluster` in PBS mode
246 ---------------------------------------
246 --------------------------------------
247 247
248 248 The PBS mode uses the Portable Batch System (PBS) to start the engines.
249 249
250 250 As usual, we will start by creating a fresh profile::
251 251
252 252 $ ipython profile create --parallel --profile=pbs
253 253
254 254 And in :file:`ipcluster_config.py`, we will select the PBS launchers for the controller
255 255 and engines:
256 256
257 257 .. sourcecode:: python
258 258
259 259 c.IPClusterStart.controller_launcher_class = 'PBSControllerLauncher'
260 260 c.IPClusterEngines.engine_launcher_class = 'PBSEngineSetLauncher'
261 261
262 262 .. note::
263 263
264 264 Note that the configurable is IPClusterEngines for the engine launcher, and
265 265 IPClusterStart for the controller launcher. This is because the start command is a
266 266 subclass of the engine command, adding a controller launcher. Since it is a subclass,
267 267 any configuration made in IPClusterEngines is inherited by IPClusterStart unless it is
268 268 overridden.
269 269
270 270 IPython does provide simple default batch templates for PBS and SGE, but you may need
271 271 to specify your own. Here is a sample PBS script template:
272 272
273 273 .. sourcecode:: bash
274 274
275 275 #PBS -N ipython
276 276 #PBS -j oe
277 277 #PBS -l walltime=00:10:00
278 278 #PBS -l nodes={n/4}:ppn=4
279 279 #PBS -q {queue}
280 280
281 281 cd $PBS_O_WORKDIR
282 282 export PATH=$HOME/usr/local/bin
283 283 export PYTHONPATH=$HOME/usr/local/lib/python2.7/site-packages
284 284 /usr/local/bin/mpiexec -n {n} ipengine --profile-dir={profile_dir}
285 285
286 286 There are a few important points about this template:
287 287
288 288 1. This template will be rendered at runtime using IPython's :class:`EvalFormatter`.
289 289 This is simply a subclass of :class:`string.Formatter` that allows simple expressions
290 290 on keys.
291 291
292 292 2. Instead of putting in the actual number of engines, use the notation
293 293 ``{n}`` to indicate the number of engines to be started. You can also use
294 294 expressions like ``{n/4}`` in the template to indicate the number of nodes.
295 295 There will always be ``{n}`` and ``{profile_dir}`` variables passed to the formatter.
296 296 These allow the batch system to know how many engines, and where the configuration
297 297 files reside. The same is true for the batch queue, with the template variable
298 298 ``{queue}``. A quick interactive check of this rendering is shown after this list.
299 299
300 300 3. Any options to :command:`ipengine` can be given in the batch script
301 301 template, or in :file:`ipengine_config.py`.
302 302
303 303 4. Depending on the configuration of your system, you may have to set
304 304 environment variables in the script template.
305 305
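As a quick check of how these template expressions render, the formatter can be
tried interactively (this assumes :class:`EvalFormatter` is importable from
:mod:`IPython.utils.text`; the values are made up):

.. sourcecode:: ipython

    In [1]: from IPython.utils.text import EvalFormatter

    In [2]: f = EvalFormatter()

    In [3]: f.format("#PBS -l nodes={n/4}:ppn=4", n=64)
    Out[3]: '#PBS -l nodes=16:ppn=4'
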
306 306 The controller template should be similar, but simpler:
307 307
308 308 .. sourcecode:: bash
309 309
310 310 #PBS -N ipython
311 311 #PBS -j oe
312 312 #PBS -l walltime=00:10:00
313 313 #PBS -l nodes=1:ppn=4
314 314 #PBS -q {queue}
315 315
316 316 cd $PBS_O_WORKDIR
317 317 export PATH=$HOME/usr/local/bin
318 318 export PYTHONPATH=$HOME/usr/local/lib/python2.7/site-packages
319 319 ipcontroller --profile-dir={profile_dir}
320 320
321 321
322 322 Once you have created these scripts, save them with names like
323 323 :file:`pbs.engine.template`. Now you can load them into the :file:`ipcluster_config` with:
324 324
325 325 .. sourcecode:: python
326 326
327 327 c.PBSEngineSetLauncher.batch_template_file = "pbs.engine.template"
328 328
329 329 c.PBSControllerLauncher.batch_template_file = "pbs.controller.template"
330 330
331 331
332 332 Alternately, you can just define the templates as strings inside :file:`ipcluster_config`.
333 333
334 334 Whether you are using your own templates or our defaults, the extra configurables available are
335 335 the number of engines to launch (``{n}``) and the batch system queue to which the jobs are to be
336 336 submitted (``{queue}``). These are configurables, and can be specified in
337 337 :file:`ipcluster_config`:
338 338
339 339 .. sourcecode:: python
340 340
341 341 c.PBSLauncher.queue = 'veryshort.q'
342 342 c.IPClusterEngines.n = 64
343 343
344 344 Note that assuming you are running PBS on a multi-node cluster, the Controller's default behavior
345 345 of listening only on localhost is likely too restrictive. In this case, also assuming the
346 346 nodes are safely behind a firewall, you can simply instruct the Controller to listen for
347 347 connections on all its interfaces, by adding in :file:`ipcontroller_config`:
348 348
349 349 .. sourcecode:: python
350 350
351 351 c.HubFactory.ip = '*'
352 352
353 353 You can now run the cluster with::
354 354
355 355 $ ipcluster start --profile=pbs -n 128
356 356
357 357 Additional configuration options can be found in the PBS section of :file:`ipcluster_config`.
358 358
359 359 .. note::
360 360
361 361 Due to the flexibility of configuration, the PBS launchers work with simple changes
362 362 to the template for other :command:`qsub`-using systems, such as Sun Grid Engine,
363 363 and with further configuration in similar batch systems like Condor.
364 364
365 365
366 366 Using :command:`ipcluster` in SSH mode
367 ---------------------------------------
367 --------------------------------------
368 368
369 369
370 370 The SSH mode uses :command:`ssh` to execute :command:`ipengine` on remote
371 371 nodes and :command:`ipcontroller` can be run remotely as well, or on localhost.
372 372
373 373 .. note::
374 374
375 375 When using this mode it is highly recommended that you have set up SSH keys
376 376 and are using ssh-agent [SSH]_ for password-less logins.
377 377
378 378 As usual, we start by creating a clean profile::
379 379
380 380 $ ipython profile create --parallel --profile=ssh
381 381
382 382 To use this mode, select the SSH launchers in :file:`ipcluster_config.py`:
383 383
384 384 .. sourcecode:: python
385 385
386 386 c.IPClusterEngines.engine_launcher_class = 'SSHEngineSetLauncher'
387 387 # and if the Controller is also to be remote:
388 388 c.IPClusterStart.controller_launcher_class = 'SSHControllerLauncher'
389 389
390 390
391 391
392 392 The controller's remote location and configuration can be specified:
393 393
394 394 .. sourcecode:: python
395 395
396 396 # Set the user and hostname for the controller
397 397 # c.SSHControllerLauncher.hostname = 'controller.example.com'
398 398 # c.SSHControllerLauncher.user = os.environ.get('USER','username')
399 399
400 400 # Set the arguments to be passed to ipcontroller
401 401 # note that remotely launched ipcontroller will not get the contents of
402 402 # the local ipcontroller_config.py unless it resides on the *remote host*
403 403 # in the location specified by the `profile-dir` argument.
404 # c.SSHControllerLauncher.program_args = ['--reuse', '--ip=*', '--profile-dir=/path/to/cd']
404 # c.SSHControllerLauncher.controller_args = ['--reuse', '--ip=*', '--profile-dir=/path/to/cd']
405 405
406 406 .. note::
407 407
408 408 SSH mode does not do any file movement, so you will need to distribute configuration
409 409 files manually. To aid in this, the `reuse_files` flag defaults to True for ssh-launched
410 410 Controllers, so you will only need to do this once, unless you override this flag back
411 411 to False.
412 412
413 413 Engines are specified in a dictionary, by hostname and the number of engines to be run
414 414 on that host.
415 415
416 416 .. sourcecode:: python
417 417
418 418 c.SSHEngineSetLauncher.engines = { 'host1.example.com' : 2,
419 419 'host2.example.com' : 5,
420 420 'host3.example.com' : (1, ['--profile-dir=/home/different/location']),
421 421 'host4.example.com' : 8 }
422 422
423 423 * The `engines` dict, where the keys are the hosts on which we want to run engines and
424 424 the values are the number of engines to run on each host.
425 425 * On host3, the value is a tuple, where the number of engines is first, and the arguments
426 426 to be passed to :command:`ipengine` are the second element.
427 427
428 428 For engines without explicitly specified arguments, the default arguments are set in
429 429 a single location:
430 430
431 431 .. sourcecode:: python
432 432
433 433 c.SSHEngineSetLauncher.engine_args = ['--profile-dir=/path/to/profile_ssh']
434 434
435 435 Current limitations of the SSH mode of :command:`ipcluster` are:
436 436
437 437 * Untested on Windows. Would require a working :command:`ssh` on Windows.
438 438 Also, we are using shell scripts to set up and execute commands on remote
439 439 hosts.
440 440 * No file movement - This is a regression from 0.10, which moved connection files
441 around with scp. This will be improved, but not before 0.11 release.
441 around with scp. This will be improved; pull requests are welcome.
442
442 443
443 444 Using the :command:`ipcontroller` and :command:`ipengine` commands
444 ====================================================================
445 ==================================================================
445 446
446 447 It is also possible to use the :command:`ipcontroller` and :command:`ipengine`
447 448 commands to start your controller and engines. This approach gives you full
448 449 control over all aspects of the startup process.
449 450
450 451 Starting the controller and engine on your local machine
451 452 --------------------------------------------------------
452 453
453 454 To use :command:`ipcontroller` and :command:`ipengine` to start things on your
454 455 local machine, do the following.
455 456
456 457 First start the controller::
457 458
458 459 $ ipcontroller
459 460
460 461 Next, start however many instances of the engine you want using (repeatedly)
461 462 the command::
462 463
463 464 $ ipengine
464 465
465 466 The engines should start and automatically connect to the controller using the
466 467 JSON files in :file:`~/.ipython/profile_default/security`. You are now ready to use the
467 468 controller and engines from IPython.
468 469
469 470 .. warning::
470 471
471 472 The order of the above operations may be important. You *must*
472 473 start the controller before the engines, unless you are reusing connection
473 474 information (via ``--reuse``), in which case ordering is not important.
474 475
475 476 .. note::
476 477
477 478 On some platforms (OS X), to put the controller and engine into the
478 479 background you may need to give these commands in the form ``(ipcontroller
479 480 &)`` and ``(ipengine &)`` (with the parentheses) for them to work
480 481 properly.
481 482
482 483 Starting the controller and engines on different hosts
483 484 ------------------------------------------------------
484 485
485 486 When the controller and engines are running on different hosts, things are
486 487 slightly more complicated, but the underlying ideas are the same:
487 488
488 489 1. Start the controller on a host using :command:`ipcontroller`. The controller must be
489 490 instructed to listen on an interface visible to the engine machines, via the ``ip``
490 command-line argument or ``HubFactory.ip`` in :file:`ipcontroller_config.py`.
491 command-line argument or ``HubFactory.ip`` in :file:`ipcontroller_config.py`::
492
493 $ ipcontroller --ip=192.168.1.16
494
495 .. sourcecode:: python
496
497 # in ipcontroller_config.py
498 c.HubFactory.ip = '192.168.1.16'
499
491 500 2. Copy :file:`ipcontroller-engine.json` from :file:`~/.ipython/profile_<name>/security` on
492 501 the controller's host to the host where the engines will run.
493 502 3. Use :command:`ipengine` on the engine's hosts to start the engines.
494 503
495 504 The only thing you have to be careful of is to tell :command:`ipengine` where
496 505 the :file:`ipcontroller-engine.json` file is located. There are two ways you
497 506 can do this:
498 507
499 508 * Put :file:`ipcontroller-engine.json` in the :file:`~/.ipython/profile_<name>/security`
500 509 directory on the engine's host, where it will be found automatically.
501 510 * Call :command:`ipengine` with the ``--file=full_path_to_the_file``
502 511 flag.
503 512
504 513 The ``file`` flag works like this::
505 514
506 515 $ ipengine --file=/path/to/my/ipcontroller-engine.json
507 516
508 517 .. note::
509 518
510 519 If the controller's and engine's hosts all have a shared file system
511 520 (:file:`~/.ipython/profile_<name>/security` is the same on all of them), then things
512 521 will just work!
513 522
514 523 SSH Tunnels
515 524 ***********
516 525
517 526 If your engines are not on the same LAN as the controller, or you are on a highly
518 527 restricted network where your nodes cannot see each other's ports, then you can
519 528 use SSH tunnels to connect engines to the controller.
520 529
521 530 .. note::
522 531
523 532 This does not work in all cases. Manual tunnels may be an option, but are
524 533 highly inconvenient. Support for manual tunnels will be improved.
525 534
526 535 You can instruct all engines to use ssh, by specifying the ssh server in
527 536 :file:`ipcontroller-engine.json`:
528 537
529 538 .. I know this is really JSON, but the example is a subset of Python:
530 539 .. sourcecode:: python
531 540
532 541 {
533 542 "url":"tcp://192.168.1.123:56951",
534 543 "exec_key":"26f4c040-587d-4a4e-b58b-030b96399584",
535 544 "ssh":"user@example.com",
536 545 "location":"192.168.1.123"
537 546 }
538 547
539 548 This will be specified if you give the ``--enginessh=user@example.com`` argument when
540 549 starting :command:`ipcontroller`.
541 550
542 551 Or you can specify an ssh server on the command-line when starting an engine::
543 552
544 553 $> ipengine --profile=foo --ssh=my.login.node
545 554
546 555 For example, if your system is totally restricted, then all connections will actually be
547 556 loopback, and ssh tunnels will be used to connect engines to the controller::
548 557
549 558 [node1] $> ipcontroller --enginessh=node1
550 559 [node2] $> ipengine
551 560 [node3] $> ipcluster engines --n=4
552 561
553 562 Or if you want to start many engines on each node, the command `ipcluster engines --n=4`
554 563 without any configuration is equivalent to running ipengine 4 times.
555 564
565 An example using ipcontroller/engine with ssh
566 ---------------------------------------------
567
568 No configuration files are necessary to use ipcontroller/engine in an SSH environment
569 without a shared filesystem. You simply need to make sure that the controller is listening
570 on an interface visible to the engines, and move the connection file from the controller to
571 the engines.
572
573 1. start the controller, listening on an ip-address visible to the engine machines::
574
575 [controller.host] $ ipcontroller --ip=192.168.1.16
576
577 [IPControllerApp] Using existing profile dir: u'/Users/me/.ipython/profile_default'
578 [IPControllerApp] Hub listening on tcp://192.168.1.16:63320 for registration.
579 [IPControllerApp] Hub using DB backend: 'IPython.parallel.controller.dictdb.DictDB'
580 [IPControllerApp] hub::created hub
581 [IPControllerApp] writing connection info to /Users/me/.ipython/profile_default/security/ipcontroller-client.json
582 [IPControllerApp] writing connection info to /Users/me/.ipython/profile_default/security/ipcontroller-engine.json
583 [IPControllerApp] task::using Python leastload Task scheduler
584 [IPControllerApp] Heartmonitor started
585 [IPControllerApp] Creating pid file: /Users/me/.ipython/profile_default/pid/ipcontroller.pid
586 Scheduler started [leastload]
587
588 2. on each engine, fetch the connection file with scp::
589
590 [engine.host.n] $ scp controller.host:.ipython/profile_default/security/ipcontroller-engine.json ./
591
592 .. note::
593
594 The log output of ipcontroller above shows you where the json files were written.
595 They will be in :file:`~/.ipython` (or :file:`~/.config/ipython`) under
596 :file:`profile_default/security/ipcontroller-engine.json`
597
598 3. start the engines, using the connection file::
599
600 [engine.host.n] $ ipengine --file=./ipcontroller-engine.json
601
602 A couple of notes:
603
604 * You can avoid having to fetch the connection file every time by adding ``--reuse`` flag
605 to ipcontroller, which instructs the controller to read the previous connection file for
606 connection info, rather than generate a new one with randomized ports.
607
608 * In step 2, if you fetch the connection file directly into the security dir of a profile,
609 then you need not specify its path directly, only the profile (assumes the path exists,
610 otherwise you must create it first)::
611
612 [engine.host.n] $ scp controller.host:.ipython/profile_default/security/ipcontroller-engine.json ~/.ipython/profile_ssh/security/
613 [engine.host.n] $ ipengine --profile=ssh
614
615 Of course, if you fetch the file into the default profile, no arguments must be passed to
616 ipengine at all.
617
618 * Note that ipengine *did not* specify the ip argument. In general, it is unlikely for any
619 connection information to be specified at the command-line to ipengine, as all of this
620 information should be contained in the connection file written by ipcontroller.
556 621
557 622 Make JSON files persistent
558 623 --------------------------
559 624
560 625 At first glance it may seem that managing the JSON files is a bit
561 626 annoying. Going back to the house and key analogy, copying the JSON around
562 627 each time you start the controller is like having to make a new key every time
563 628 you want to unlock the door and enter your house. As with your house, you want
564 629 to be able to create the key (or JSON file) once, and then simply use it at
565 630 any point in the future.
566 631
567 632 To do this, the only thing you have to do is specify the `--reuse` flag, so that
568 633 the connection information in the JSON files remains accurate::
569 634
570 635 $ ipcontroller --reuse
571 636
572 637 Then, just copy the JSON files over the first time and you are set. You can
573 638 start and stop the controller and engines as many times as you want in the
574 639 future, just make sure to tell the controller to reuse the file.
575 640
576 641 .. note::
577 642
578 643 You may ask the question: what ports does the controller listen on if you
579 644 don't tell it to use specific ones? The default is to use high random port
580 645 numbers. We do this for two reasons: i) to increase security through
581 646 obscurity and ii) to allow multiple controllers on a given host to start and
582 647 automatically use different ports.
583 648
584 649 Log files
585 650 ---------
586 651
587 652 All of the components of IPython have log files associated with them.
588 653 These log files can be extremely useful in debugging problems with
589 654 IPython and can be found in the directory :file:`~/.ipython/profile_<name>/log`.
590 655 Sending the log files to us will often help us to debug any problems.
591 656
592 657
593 658 Configuring `ipcontroller`
594 659 ---------------------------
595 660
596 661 The IPython Controller takes its configuration from the file :file:`ipcontroller_config.py`
597 662 in the active profile directory.
598 663
599 664 Ports and addresses
600 665 *******************
601 666
602 667 In many cases, you will want to configure the Controller's network identity. By default,
603 668 the Controller listens only on loopback, which is the most secure but often impractical.
604 669 To instruct the controller to listen on a specific interface, you can set the
605 670 :attr:`HubFactory.ip` trait. To listen on all interfaces, simply specify:
606 671
607 672 .. sourcecode:: python
608 673
609 674 c.HubFactory.ip = '*'
610 675
611 676 When connecting to a Controller that is listening on loopback or behind a firewall, it may
612 677 be necessary to specify an SSH server to use for tunnels, and the external IP of the
613 678 Controller. If you specified that the HubFactory listen on loopback, or all interfaces,
614 679 then IPython will try to guess the external IP. If you are on a system with VM network
615 680 devices, or many interfaces, this guess may be incorrect. In these cases, you will want
616 681 to specify the 'location' of the Controller. This is the IP of the machine the Controller
617 682 is on, as seen by the clients, engines, or the SSH server used to tunnel connections.
618 683
619 684 For example, to set up a cluster with a Controller on a work node, using ssh tunnels
620 685 through the login node, an example :file:`ipcontroller_config.py` might contain:
621 686
622 687 .. sourcecode:: python
623 688
624 689 # allow connections on all interfaces from engines
625 690 # engines on the same node will use loopback, while engines
626 691 # from other nodes will use an external IP
627 692 c.HubFactory.ip = '*'
628 693
629 694 # you typically only need to specify the location when there are extra
630 695 # interfaces that may not be visible to peer nodes (e.g. VM interfaces)
631 696 c.HubFactory.location = '10.0.1.5'
632 697 # or to get an automatic value, try this:
633 698 import socket
634 699 ex_ip = socket.gethostbyname_ex(socket.gethostname())[-1][0]
635 700 c.HubFactory.location = ex_ip
636 701
637 702 # now instruct clients to use the login node for SSH tunnels:
638 703 c.HubFactory.ssh_server = 'login.mycluster.net'
639 704
640 705 After doing this, your :file:`ipcontroller-client.json` file will look something like this:
641 706
642 707 .. this can be Python, despite the fact that it's actually JSON, because it's
643 708 .. still valid Python
644 709
645 710 .. sourcecode:: python
646 711
647 712 {
648 713 "url":"tcp:\/\/*:43447",
649 714 "exec_key":"9c7779e4-d08a-4c3b-ba8e-db1f80b562c1",
650 715 "ssh":"login.mycluster.net",
651 716 "location":"10.0.1.5"
652 717 }
653 718
654 719 Then this file will be all you need for a client to connect to the controller, tunneling
655 720 SSH connections through login.mycluster.net.
656 721
657 722 Database Backend
658 723 ****************
659 724
660 725 The Hub stores all messages and results passed between Clients and Engines.
661 726 For large and/or long-running clusters, it would be unreasonable to keep all
662 727 of this information in memory. For this reason, we have two database backends:
663 728 [MongoDB]_ via PyMongo_, and SQLite with the stdlib :py:mod:`sqlite`.
664 729
665 730 MongoDB is our design target, and the dict-like model it uses has driven our design. As far
666 731 as we are concerned, BSON can be considered essentially the same as JSON, adding support
667 732 for binary data and datetime objects, and any new database backend must support the same
668 733 data types.
669 734
670 735 .. seealso::
671 736
672 737 MongoDB `BSON doc <http://www.mongodb.org/display/DOCS/BSON>`_
673 738
674 739 To use one of these backends, you must set the :attr:`HubFactory.db_class` trait:
675 740
676 741 .. sourcecode:: python
677 742
678 743 # for a simple dict-based in-memory implementation, use dictdb
679 744 # This is the default and the fastest, since it doesn't involve the filesystem
680 745 c.HubFactory.db_class = 'IPython.parallel.controller.dictdb.DictDB'
681 746
682 747 # To use MongoDB:
683 748 c.HubFactory.db_class = 'IPython.parallel.controller.mongodb.MongoDB'
684 749
685 750 # and SQLite:
686 751 c.HubFactory.db_class = 'IPython.parallel.controller.sqlitedb.SQLiteDB'
687 752
688 753 When using the proper databases, you can actually allow for tasks to persist from
689 754 one session to the next by specifying the MongoDB database or SQLite table in
690 755 which tasks are to be stored. The default is to use a table named for the Hub's Session,
691 756 which is a UUID, and thus different every time.
692 757
693 758 .. sourcecode:: python
694 759
695 760 # To keep persistent task history in MongoDB:
696 761 c.MongoDB.database = 'tasks'
697 762
698 763 # and in SQLite:
699 764 c.SQLiteDB.table = 'tasks'
700 765
701 766
702 767 Since MongoDB servers can be running remotely or configured to listen on a particular port,
703 768 you can specify any arguments you may need to the PyMongo `Connection
704 769 <http://api.mongodb.org/python/1.9/api/pymongo/connection.html#pymongo.connection.Connection>`_:
705 770
706 771 .. sourcecode:: python
707 772
708 773 # positional args to pymongo.Connection
709 774 c.MongoDB.connection_args = []
710 775
711 776 # keyword args to pymongo.Connection
712 777 c.MongoDB.connection_kwargs = {}
713 778
714 779 .. _MongoDB: http://www.mongodb.org
715 780 .. _PyMongo: http://api.mongodb.org/python/1.9/
716 781
717 782 Configuring `ipengine`
718 783 -----------------------
719 784
720 785 The IPython Engine takes its configuration from the file :file:`ipengine_config.py`.
721 786
722 787 The Engine itself also has some amount of configuration. Most of this
723 788 has to do with initializing MPI or connecting to the controller.
724 789
725 790 To instruct the Engine to initialize with an MPI environment set up by
726 791 mpi4py, add:
727 792
728 793 .. sourcecode:: python
729 794
730 795 c.MPI.use = 'mpi4py'
731 796
732 797 In this case, the Engine will use our default mpi4py init script to set up
733 798 the MPI environment prior to execution. We have default init scripts for
734 799 mpi4py and pytrilinos. If you want to specify your own code to be run
735 800 at the beginning, specify `c.MPI.init_script`.
736 801
737 802 You can also specify a file or python command to be run at startup of the
738 803 Engine:
739 804
740 805 .. sourcecode:: python
741 806
742 807 c.IPEngineApp.startup_script = u'/path/to/my/startup.py'
743 808
744 809 c.IPEngineApp.startup_command = 'import numpy, scipy, mpi4py'
745 810
746 811 These commands/files will be run again after each engine restart.
747 812
748 813 It's also useful on systems with shared filesystems to run the engines
749 814 in some scratch directory. This can be set with:
750 815
751 816 .. sourcecode:: python
752 817
753 818 c.IPEngineApp.work_dir = u'/path/to/scratch/'
754 819
755 820
756 821
757 822 .. [MongoDB] MongoDB database http://www.mongodb.org
758 823
759 824 .. [PBS] Portable Batch System http://www.openpbs.org
760 825
761 826 .. [SSH] SSH-Agent http://en.wikipedia.org/wiki/ssh-agent
@@ -1,332 +1,360 b''
1 1 ============================================
2 2 Getting started with Windows HPC Server 2008
3 3 ============================================
4 4
5 .. note::
6
7 Not adapted to zmq yet
8
9 5 Introduction
10 6 ============
11 7
12 8 The Python programming language is an increasingly popular language for
13 9 numerical computing. This is due to a unique combination of factors. First,
14 10 Python is a high-level and *interactive* language that is well matched to
15 11 interactive numerical work. Second, it is easy (often trivial) to
16 12 integrate legacy C/C++/Fortran code into Python. Third, a large number of
17 13 high-quality open source projects provide all the needed building blocks for
18 14 numerical computing: numerical arrays (NumPy), algorithms (SciPy), 2D/3D
19 15 Visualization (Matplotlib, Mayavi, Chaco), Symbolic Mathematics (Sage, Sympy)
20 16 and others.
21 17
22 18 The IPython project is a core part of this open-source toolchain and is
23 19 focused on creating a comprehensive environment for interactive and
24 20 exploratory computing in the Python programming language. It enables all of
25 21 the above tools to be used interactively and consists of two main components:
26 22
27 23 * An enhanced interactive Python shell with support for interactive plotting
28 24 and visualization.
29 25 * An architecture for interactive parallel computing.
30 26
31 27 With these components, it is possible to perform all aspects of a parallel
32 28 computation interactively. This type of workflow is particularly relevant in
33 29 scientific and numerical computing where algorithms, code and data are
34 30 continually evolving as the user/developer explores a problem. The broad
35 31 trends in computing (commodity clusters, multicore, cloud computing, etc.)
36 32 make these capabilities of IPython particularly relevant.
37 33
38 34 While IPython is a cross platform tool, it has particularly strong support for
39 35 Windows based compute clusters running Windows HPC Server 2008. This document
40 36 describes how to get started with IPython on Windows HPC Server 2008. The
41 37 content and emphasis here is practical: installing IPython, configuring
42 38 IPython to use the Windows job scheduler and running example parallel programs
43 39 interactively. A more complete description of IPython's parallel computing
44 40 capabilities can be found in IPython's online documentation
45 41 (http://ipython.org/documentation.html).
46 42
47 43 Setting up your Windows cluster
48 44 ===============================
49 45
50 46 This document assumes that you already have a cluster running Windows
51 47 HPC Server 2008. Here is a broad overview of what is involved with setting up
52 48 such a cluster:
53 49
54 50 1. Install Windows Server 2008 on the head and compute nodes in the cluster.
55 51 2. Set up the network configuration on each host. Each host should have a
56 52 static IP address.
57 53 3. On the head node, activate the "Active Directory Domain Services" role
58 54 and make the head node the domain controller.
59 55 4. Join the compute nodes to the newly created Active Directory (AD) domain.
60 56 5. Set up user accounts in the domain with shared home directories.
61 57 6. Install the HPC Pack 2008 on the head node to create a cluster.
62 58 7. Install the HPC Pack 2008 on the compute nodes.
63 59
64 60 More details about installing and configuring Windows HPC Server 2008 can be
65 61 found on the Windows HPC Home Page (http://www.microsoft.com/hpc). Regardless
66 62 of what steps you follow to set up your cluster, the remainder of this
67 63 document will assume that:
68 64
69 65 * There are domain users that can log on to the AD domain and submit jobs
70 66 to the cluster scheduler.
71 67 * These domain users have shared home directories. While shared home
72 68 directories are not required to use IPython, they make it much easier to
73 69 use IPython.
74 70
75 71 Installation of IPython and its dependencies
76 72 ============================================
77 73
78 74 IPython and all of its dependencies are freely available and open source.
79 75 These packages provide a powerful and cost-effective approach to numerical and
80 76 scientific computing on Windows. The following dependencies are needed to run
81 77 IPython on Windows:
82 78
83 79 * Python 2.6 or 2.7 (http://www.python.org)
84 80 * pywin32 (http://sourceforge.net/projects/pywin32/)
85 81 * PyReadline (https://launchpad.net/pyreadline)
86 82 * pyzmq (http://github.com/zeromq/pyzmq/downloads)
87 83 * IPython (http://ipython.org)
88 84
89 85 In addition, the following dependencies are needed to run the demos described
90 86 in this document.
91 87
92 88 * NumPy and SciPy (http://www.scipy.org)
93 89 * Matplotlib (http://matplotlib.sourceforge.net/)
94 90
95 91 The easiest way of obtaining these dependencies is through the Enthought
96 92 Python Distribution (EPD) (http://www.enthought.com/products/epd.php). EPD is
97 93 produced by Enthought, Inc. and contains all of these packages and others in a
98 94 single installer, and is available free for academic users. While it is also
99 95 possible to download and install each package individually, this is a tedious
100 96 process. Thus, we highly recommend using EPD to install these packages on
101 97 Windows.
102 98
103 99 Regardless of how you install the dependencies, here are the steps you will
104 100 need to follow:
105 101
106 102 1. Install all of the packages listed above, either individually or using EPD
107 103 on the head node, compute nodes and user workstations.
108 104
109 105 2. Make sure that :file:`C:\\Python27` and :file:`C:\\Python27\\Scripts` are
110 106 in the system :envvar:`%PATH%` variable on each node (a quick way to check this is shown after the list).
111 107
112 108 3. Install the latest development version of IPython. This can be done by
113 109 downloading it from the IPython website
114 110 (http://ipython.org) and following the installation instructions.
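
To confirm that the :envvar:`%PATH%` entries from step 2 are in effect, you
can ask Windows where it finds the interpreter and the :command:`ipython`
script. The paths below are only illustrative and assume the default
:file:`C:\\Python27` layout::

    Z:\>where python
    C:\Python27\python.exe

    Z:\>where ipython
    C:\Python27\Scripts\ipython.exe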
115 111
116 112 Further details about installing IPython or its dependencies can be found in
117 113 the online IPython documentation (http://ipython.org/documentation.html).
118 114 Once you are finished with the installation, you can try IPython out by
119 115 opening a Windows Command Prompt and typing ``ipython``. This will
120 116 start IPython's interactive shell and you should see something like the
121 following screenshot:
117 following::
118
119 Microsoft Windows [Version 6.0.6001]
120 Copyright (c) 2006 Microsoft Corporation. All rights reserved.
121
122 Z:\>ipython
123 Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]
124 Type "copyright", "credits" or "license" for more information.
125
126 IPython 0.12.dev -- An enhanced Interactive Python.
127 ? -> Introduction and overview of IPython's features.
128 %quickref -> Quick reference.
129 help -> Python's own help system.
130 object? -> Details about 'object', use 'object??' for extra details.
131
132 In [1]:
122 133
123 .. image:: figs/ipython_shell.*
124 134
125 135 Starting an IPython cluster
126 136 ===========================
127 137
128 138 To use IPython's parallel computing capabilities, you will need to start an
129 139 IPython cluster. An IPython cluster consists of one controller and multiple
130 140 engines:
131 141
132 142 IPython controller
133 143 The IPython controller manages the engines and acts as a gateway between
134 144 the engines and the client, which runs in the user's interactive IPython
135 145 session. The controller is started using the :command:`ipcontroller`
136 146 command.
137 147
138 148 IPython engine
139 149 IPython engines run a user's Python code in parallel on the compute nodes.
140 150 Engines are started using the :command:`ipengine` command.
141 151
142 152 Once these processes are started, a user can run Python code interactively and
143 153 in parallel on the engines from within the IPython shell using an appropriate
144 154 client. This includes the ability to interact with, plot and visualize data
145 155 from the engines.
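
For reference, these two processes can also be launched by hand, which can be
useful when debugging a configuration: run :command:`ipcontroller` in one
Command Prompt and :command:`ipengine` in one or more additional Command
Prompts (one per engine). A minimal sketch::

    Z:\>ipcontroller

    Z:\>ipengine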
146 156
147 157 IPython has a command line program called :command:`ipcluster` that automates
148 158 all aspects of starting the controller and engines on the compute nodes.
149 159 :command:`ipcluster` has full support for the Windows HPC job scheduler,
150 160 meaning that :command:`ipcluster` can use this job scheduler to start the
151 161 controller and engines. In our experience, the Windows HPC job scheduler is
152 162 particularly well suited for interactive applications, such as IPython. Once
153 163 :command:`ipcluster` is configured properly, a user can start an IPython
154 164 cluster from their local workstation almost instantly, without having to log
155 165 on to the head node (as is typically required by Unix based job schedulers).
156 166 This enables a user to move seamlessly between serial and parallel
157 167 computations.
158 168
159 169 In this section we show how to use :command:`ipcluster` to start an IPython
160 170 cluster using the Windows HPC Server 2008 job scheduler. To make sure that
161 171 :command:`ipcluster` is installed and working properly, you should first try
162 172 to start an IPython cluster on your local host. To do this, open a Windows
163 173 Command Prompt and type the following command::
164 174
165 ipcluster start n=2
175 ipcluster start -n 2
166 176
167 You should see a number of messages printed to the screen, ending with
168 "IPython cluster: started". The result should look something like the following
169 screenshot:
177 You should see a number of messages printed to the screen.
178 The result should look something like this::
170 179
171 .. image:: figs/ipcluster_start.*
180 Microsoft Windows [Version 6.1.7600]
181 Copyright (c) 2009 Microsoft Corporation. All rights reserved.
182
183 Z:\>ipcluster start --profile=mycluster
184 [IPClusterStart] Using existing profile dir: u'\\\\blue\\domainusers$\\bgranger\\.ipython\\profile_mycluster'
185 [IPClusterStart] Starting ipcluster with [daemon=False]
186 [IPClusterStart] Creating pid file: \\blue\domainusers$\bgranger\.ipython\profile_mycluster\pid\ipcluster.pid
187 [IPClusterStart] Writing job description file: \\blue\domainusers$\bgranger\.ipython\profile_mycluster\ipcontroller_job.xml
188 [IPClusterStart] Starting Win HPC Job: job submit /jobfile:\\blue\domainusers$\bgranger\.ipython\profile_mycluster\ipcontroller_job.xml /scheduler:HEADNODE
189 [IPClusterStart] Starting 15 engines
190 [IPClusterStart] Writing job description file: \\blue\domainusers$\bgranger\.ipython\profile_mycluster\ipengineset_job.xml
191 [IPClusterStart] Starting Win HPC Job: job submit /jobfile:\\blue\domainusers$\bgranger\.ipython\profile_mycluster\ipengineset_job.xml /scheduler:HEADNODE
192
172 193
173 194 At this point, the controller and two engines are running on your local host.
174 195 This configuration is useful for testing and for situations where you want to
175 196 take advantage of multiple cores on your local computer.
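
A quick way to confirm that the engines are up is to connect a client from a
second IPython session. This is only a sanity check; the ids reported will
match however many engines you started (two in this case):

.. sourcecode:: ipython

    In [1]: from IPython.parallel import Client

    In [2]: c = Client()   # connect to the cluster started with the default profile

    In [3]: c.ids          # one id per running engine
    Out[3]: [0, 1]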
176 197
177 198 Now that we have confirmed that :command:`ipcluster` is working properly, we
178 199 describe how to configure and run an IPython cluster on an actual compute
179 200 cluster running Windows HPC Server 2008. Here is an outline of the needed
180 201 steps:
181 202
182 1. Create a cluster profile using: ``ipython profile create --parallel profile=mycluster``
203 1. Create a cluster profile using: ``ipython profile create mycluster --parallel``
183 204
184 205 2. Edit configuration files in the directory :file:`.ipython\\profile_mycluster`
185 206
186 3. Start the cluster using: ``ipcluser start profile=mycluster n=32``
207 3. Start the cluster using: ``ipcluster start --profile=mycluster -n 32``
187 208
188 209 Creating a cluster profile
189 210 --------------------------
190 211
191 212 In most cases, you will have to create a cluster profile to use IPython on a
192 213 cluster. A cluster profile is a name (like "mycluster") that is associated
193 214 with a particular cluster configuration. The profile name is used by
194 215 :command:`ipcluster` when working with the cluster.
195 216
196 217 Associated with each cluster profile is a cluster directory. This cluster
197 218 directory is a specially named directory (typically located in the
198 219 :file:`.ipython` subdirectory of your home directory) that contains the
199 220 configuration files for a particular cluster profile, as well as log files and
200 221 security keys. The naming convention for cluster directories is:
201 222 :file:`profile_<profile name>`. Thus, the cluster directory for a profile named
202 223 "foo" would be :file:`.ipython\\profile_foo`.
203 224
204 225 To create a new cluster profile (named "mycluster") and the associated cluster
205 226 directory, type the following command at the Windows Command Prompt::
206 227
207 228 ipython profile create mycluster --parallel
208 229
209 230 The output of this command is shown below. Notice how
210 :command:`ipcluster` prints out the location of the newly created cluster
211 directory.
231 the command prints out the location of the newly created profile
232 directory::
233
234 Z:\>ipython profile create mycluster --parallel
235 [ProfileCreate] Generating default config file: u'\\\\blue\\domainusers$\\bgranger\\.ipython\\profile_mycluster\\ipython_config.py'
236 [ProfileCreate] Generating default config file: u'\\\\blue\\domainusers$\\bgranger\\.ipython\\profile_mycluster\\ipcontroller_config.py'
237 [ProfileCreate] Generating default config file: u'\\\\blue\\domainusers$\\bgranger\\.ipython\\profile_mycluster\\ipengine_config.py'
238 [ProfileCreate] Generating default config file: u'\\\\blue\\domainusers$\\bgranger\\.ipython\\profile_mycluster\\ipcluster_config.py'
239 [ProfileCreate] Generating default config file: u'\\\\blue\\domainusers$\\bgranger\\.ipython\\profile_mycluster\\iplogger_config.py'
212 240
213 .. image:: figs/ipcluster_create.*
241 Z:\>
214 242
215 243 Configuring a cluster profile
216 244 -----------------------------
217 245
218 246 Next, you will need to configure the newly created cluster profile by editing
219 247 the following configuration files in the cluster directory:
220 248
221 249 * :file:`ipcluster_config.py`
222 250 * :file:`ipcontroller_config.py`
223 251 * :file:`ipengine_config.py`
224 252
225 253 When :command:`ipcluster` is run, these configuration files are used to
226 254 determine how the engines and controller will be started. In most cases,
227 255 you will only have to set a few of the attributes in these files.
228 256
229 257 To configure :command:`ipcluster` to use the Windows HPC job scheduler, you
230 258 will need to edit the following attributes in the file
231 259 :file:`ipcluster_config.py`::
232 260
233 261 # Set these at the top of the file to tell ipcluster to use the
234 262 # Windows HPC job scheduler.
235 263 c.IPClusterStart.controller_launcher_class = 'WindowsHPCControllerLauncher'
236 264 c.IPClusterEngines.engine_launcher_class = 'WindowsHPCEngineSetLauncher'
237 265
238 266 # Set these to the host name of the scheduler (head node) of your cluster.
239 267 c.WindowsHPCControllerLauncher.scheduler = 'HEADNODE'
240 268 c.WindowsHPCEngineSetLauncher.scheduler = 'HEADNODE'
241 269
242 270 There are a number of other configuration attributes that can be set, but
243 271 in most cases these will be sufficient to get you started.
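
For example, if you always run with the same number of engines, that can be
recorded in the profile as well, so that the ``-n`` option can be omitted when
starting the cluster. This is optional, and the value ``32`` below is just a
placeholder for the size of your own cluster::

    # Also in ipcluster_config.py: the default number of engines to start
    # when ``ipcluster start`` is run without an explicit -n option.
    c.IPClusterEngines.n = 32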
244 272
245 273 .. warning::
246 274 If any of your configuration attributes involve specifying the location
247 275 of shared directories or files, you must make sure that you use UNC paths
248 like :file:`\\\\host\\share`. It is also important that you specify
276 like :file:`\\\\host\\share`. It is helpful to specify
249 277 these paths using raw Python strings: ``r'\\host\share'`` to make sure
250 278 that the backslashes are properly escaped.
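
For example, a hypothetical configuration value pointing at a shared
directory could be written in either of the following equivalent ways (the
share name is made up for illustration)::

    # The raw string form is easier to get right because the backslashes
    # do not need to be escaped by hand.
    shared_dir = r'\\HEADNODE\ipython_share'

    # The equivalent escaped form.
    shared_dir = '\\\\HEADNODE\\ipython_share'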
251 279
252 280 Starting the cluster profile
253 281 ----------------------------
254 282
255 283 Once a cluster profile has been configured, starting an IPython cluster using
256 284 the profile is simple::
257 285
258 286 ipcluster start --profile=mycluster -n 32
259 287
260 288 The ``-n`` option tells :command:`ipcluster` how many engines to start (in
261 289 this case 32). Stopping the cluster is as simple as typing Control-C.
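
If the Command Prompt running ``ipcluster start`` is no longer available, the
cluster can also be shut down from another Command Prompt using the ``stop``
subcommand with the same profile::

    ipcluster stop --profile=mycluster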
262 290
263 291 Using the HPC Job Manager
264 292 -------------------------
265
266 294 When ``ipcluster start`` is run the first time, :command:`ipcluster` creates
267 295 two XML job description files in the cluster directory:
268 296
269 297 * :file:`ipcontroller_job.xml`
270 298 * :file:`ipengineset_job.xml`
271 299
272 300 Once these files have been created, they can be imported into the HPC Job
273 301 Manager application. Then, the controller and engines for that profile can be
274 302 started using the HPC Job Manager directly, without using :command:`ipcluster`.
275 303 However, any time the cluster profile is reconfigured, ``ipcluster start``
276 304 must be run again to regenerate the XML job description files. The
277 305 following screenshot shows what the HPC Job Manager interface looks like
278 306 with a running IPython cluster.
279 307
280 308 .. image:: figs/hpc_job_manager.*
281 309
282 310 Performing a simple interactive parallel computation
283 311 ====================================================
284 312
285 313 Once you have started your IPython cluster, you can start to use it. To do
286 314 this, open up a new Windows Command Prompt and start up IPython's interactive
287 315 shell by typing::
288 316
289 317 ipython
290 318
291 Then you can create a :class:`MultiEngineClient` instance for your profile and
319 Then you can create a :class:`DirectView` instance for your profile and
292 320 use the resulting instance to do a simple interactive parallel computation. In
293 321 the code that follows, we take a simple Python function and
294 322 apply it to each element of an array of integers in parallel using the
295 :meth:`MultiEngineClient.map` method:
323 :meth:`DirectView.map` method:
296 324
297 325 .. sourcecode:: ipython
298 326
299 327 In [1]: from IPython.parallel import *
300 328
301 In [2]: c = MultiEngineClient(profile='mycluster')
329 In [2]: c = Client(profile='mycluster')
302 330
303 In [3]: mec.get_ids()
304 Out[3]: [0, 1, 2, 3, 4, 5, 67, 8, 9, 10, 11, 12, 13, 14]
331 In [3]: view = c[:]
305 332
306 In [4]: def f(x):
333 In [4]: c.ids
334 Out[4]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
335
336 In [5]: def f(x):
307 337 ...: return x**10
308 338
309 In [5]: mec.map(f, range(15)) # f is applied in parallel
310 Out[5]:
339 In [6]: view.map(f, range(15)) # f is applied in parallel
340 Out[6]:
311 341 [0,
312 342 1,
313 343 1024,
314 344 59049,
315 345 1048576,
316 346 9765625,
317 347 60466176,
318 348 282475249,
319 349 1073741824,
320 350 3486784401L,
321 351 10000000000L,
322 352 25937424601L,
323 353 61917364224L,
324 354 137858491849L,
325 355 289254654976L]
326 356
327 357 The :meth:`map` method has the same signature as Python's builtin :func:`map`
328 358 function, but runs the calculation in parallel. More involved examples of using
329 :class:`MultiEngineClient` are provided in the examples that follow.
330
331 .. image:: figs/mec_simple.*
359 :class:`DirectView` are provided in the examples that follow.
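
As a further illustration, :meth:`DirectView.apply_sync` calls a function once
on every engine and returns the list of results, which makes it easy to see
which compute node each engine is running on. This is only a sketch; the host
names shown are hypothetical:

.. sourcecode:: ipython

    In [7]: def hostname():
       ...:     import socket            # imported on the engine itself
       ...:     return socket.gethostname()
       ...:

    In [8]: view.apply_sync(hostname)    # one result per engine
    Out[8]: ['NODE01', 'NODE01', 'NODE02', ...]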
332 360