=================
Parallel examples
=================
In this section we describe two more involved examples of using an IPython
cluster to perform a parallel computation. In these examples, we will be using
IPython's "pylab" mode, which enables interactive plotting using the
Matplotlib package. IPython can be started in this mode by typing::

    ipython --pylab

at the system command line.

150 million digits of pi
========================
In this example we would like to study the distribution of digits in the
number pi (in base 10). While it is not known whether pi is a normal number (a
number is normal in base 10 if every finite sequence of digits occurs with the
same limiting frequency as any other sequence of the same length, so that in
particular the digits 0-9 are equally likely), numerical investigations suggest
that it is. We will begin with a serial calculation on 10,000 digits of pi and
then perform a parallel calculation involving 150 million digits.

In both the serial and parallel calculations we will be using functions defined
in the :file:`pidigits.py` file, which is available in the
:file:`docs/examples/parallel` directory of the IPython source distribution.
These functions provide basic facilities for working with the digits of pi and
can be loaded into IPython by putting :file:`pidigits.py` in your current
working directory and then running:

.. sourcecode:: ipython

    In [1]: run pidigits.py

Serial calculation
------------------
For the serial calculation, we will use `SymPy <http://www.sympy.org>`_ to
calculate 10,000 digits of pi and then look at the frequencies of the digits
0-9. Out of 10,000 digits, we expect each digit to occur about 1,000 times if
the digits are uniformly distributed. While SymPy is capable of calculating
many more digits of pi, our purpose here is to set the stage for the much
larger parallel calculation.

In this example, we use two functions from :file:`pidigits.py`:
:func:`one_digit_freqs` (which calculates how many times each digit occurs)
and :func:`plot_one_digit_freqs` (which uses Matplotlib to plot the result).
Here is an interactive IPython session that uses these functions with
SymPy:

.. sourcecode:: ipython

    In [7]: import sympy

    In [8]: pi = sympy.pi.evalf(40)

    In [9]: pi
    Out[9]: 3.141592653589793238462643383279502884197

    In [10]: pi = sympy.pi.evalf(10000)

    In [11]: digits = (d for d in str(pi)[2:])  # create a sequence of digits

    In [12]: run pidigits.py  # load one_digit_freqs/plot_one_digit_freqs

    In [13]: freqs = one_digit_freqs(digits)

    In [14]: plot_one_digit_freqs(freqs)
    Out[14]: [<matplotlib.lines.Line2D object at 0x18a55290>]

The resulting plot of the single digit counts shows that each digit occurs
approximately 1,000 times, but that with only 10,000 digits the
statistical fluctuations are still rather large:

.. image:: figs/single_digits.*

It is clear that to reduce the relative fluctuations in the counts, we need
to look at many more digits of pi. That brings us to the parallel calculation.

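The single digit counting used in the serial calculation can be sketched in a
few lines of NumPy. This is only an illustration of the idea, not the code in
:file:`pidigits.py`: the function name ``count_single_digits`` and the
50-digit test string are assumptions made for this sketch.

```python
import numpy as np


def count_single_digits(digits):
    """Return a length-10 array; entry d is how often digit d occurs.

    Hypothetical helper for illustration, not the one_digit_freqs
    function from pidigits.py.
    """
    counts = np.zeros(10, dtype=np.int64)
    for d in digits:
        counts[int(d)] += 1
    return counts


# The first 50 decimal digits of pi, used only as a tiny stand-in data set.
PI_50 = "14159265358979323846264338327950288419716939937510"

counts = count_single_digits(PI_50)
print("counts per digit:", counts)
print("total digits:", counts.sum())
```

With so few digits the counts are far from uniform, which is the same effect
the plot above shows on a larger scale.
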
Parallel calculation
--------------------
Calculating many digits of pi is a challenging computational problem in itself.
Because we want to focus on the distribution of digits in this example, we
will use pre-computed digits of pi from the website of Professor Yasumasa
Kanada at the University of Tokyo (http://www.super-computing.org). These
digits come in a set of text files (ftp://pi.super-computing.org/.2/pi200m/)
that each have 10 million digits of pi.

For the parallel calculation, we have copied these files to the local hard
drives of the compute nodes. A total of 15 of these files will be used, for a
total of 150 million digits of pi. To make things a little more interesting we
will calculate the frequencies of all two-digit sequences (00-99) and then plot
the result using a 2D matrix in Matplotlib.

The overall idea of the calculation is simple: each IPython engine will
compute the two digit counts for the digits in a single file. Then in a final
step the counts from each engine will be added up. To perform this
calculation, we will need two top-level functions from :file:`pidigits.py`:

.. literalinclude:: ../../examples/parallel/pi/pidigits.py
   :language: python
   :lines: 47-62

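The map-then-add structure of the calculation can also be tried without a
cluster or the data files. The sketch below is an assumption-laden stand-in
rather than the included code: ``two_digit_counts`` and ``sum_counts`` are
hypothetical names, short strings play the role of the 10 million digit files,
overlapping pairs are counted (the real functions may count differently), and
the built-in ``map`` stands in for the parallel map over engines.

```python
import numpy as np


def two_digit_counts(digits):
    """Count overlapping two-digit sequences (00-99) in a digit string."""
    values = np.frombuffer(digits.encode("ascii"), dtype=np.uint8) - ord("0")
    # Encode each adjacent pair (a, b) as the integer 10*a + b.
    pairs = 10 * values[:-1].astype(np.int64) + values[1:]
    return np.bincount(pairs, minlength=100)


def sum_counts(all_counts):
    """Add up the per-chunk count arrays element by element."""
    return np.sum(list(all_counts), axis=0)


# Three short chunks of digits stand in for three data files.
chunks = ["1415926535", "8979323846", "2643383279"]

per_chunk = map(two_digit_counts, chunks)  # one task per chunk
total = sum_counts(per_chunk)              # final reduction step

print("pairs counted:", total.sum())
print("count matrix shape:", total.reshape(10, 10).shape)
```

Because each chunk is processed independently, pairs that straddle two chunks
are not counted in this sketch.
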
We will also use the :func:`plot_two_digit_freqs` function to plot the
results. The code to run this calculation in parallel is contained in
:file:`docs/examples/parallel/parallelpi.py`. This code can be run in parallel
using IPython by following these steps:

1. Use :command:`ipcluster` to start 15 engines. We used 16 cores of an SGE Linux
   cluster (1 controller + 15 engines).
2. With the file :file:`parallelpi.py` in your current working directory, open
   up IPython in pylab mode and type ``run parallelpi.py``. This will download
   the pi files via FTP the first time you run it, if they are not
   present in the engines' working directory.

When run on our 16 cores, we observe a speedup of 14.2x. This is slightly
less than linear scaling (16x) because the controller is also running on one of
the cores, leaving 15 cores for the engines.

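To put the speedup figure in perspective, it can be converted into a parallel
efficiency, defined here as the observed speedup divided by the number of
workers. The short calculation below uses only the numbers quoted above; the
function name is an assumption made for this sketch.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal (linear) scaling achieved with `workers` workers."""
    return speedup / workers


observed_speedup = 14.2

# Relative to the 15 engines that actually do the work.
efficiency_per_engine = parallel_efficiency(observed_speedup, 15)

# Relative to all 16 cores, one of which runs the controller.
efficiency_per_core = parallel_efficiency(observed_speedup, 16)

print(f"efficiency over 15 engines: {efficiency_per_engine:.3f}")
print(f"efficiency over 16 cores:   {efficiency_per_core:.3f}")
```

Measured against the 15 engines the efficiency is about 95%, while measured
against all 16 cores it is about 89%.
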
To emphasize the interactive nature of IPython, we now show how the
calculation can also be run by simply typing the commands from
:file:`parallelpi.py` interactively into IPython:

.. sourcecode:: ipython

    In [1]: from IPython.parallel import Client

    # The Client allows us to use the engines interactively.
    # We simply pass Client the name of the cluster profile we
    # are using.
    In [2]: c = Client(profile='mycluster')

    # Create a DirectView on all of the engines.
    In [3]: v = c[:]

    In [4]: c.ids
    Out[4]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

    In [5]: run pidigits.py

    In [6]: filestring = 'pi200m.ascii.%(i)02dof20'

    # Create the list of files to process.
    In [7]: files = [filestring % {'i':i} for i in range(1,16)]

    In [8]: files
    Out[8]:
    ['pi200m.ascii.01of20',
     'pi200m.ascii.02of20',
     'pi200m.ascii.03of20',
     'pi200m.ascii.04of20',
     'pi200m.ascii.05of20',
     'pi200m.ascii.06of20',
     'pi200m.ascii.07of20',
     'pi200m.ascii.08of20',
     'pi200m.ascii.09of20',
     'pi200m.ascii.10of20',
     'pi200m.ascii.11of20',
     'pi200m.ascii.12of20',
     'pi200m.ascii.13of20',
     'pi200m.ascii.14of20',
     'pi200m.ascii.15of20']

    # Download the data files if they don't already exist.
    In [9]: v.map(fetch_pi_file, files)

    # This is the parallel calculation using the DirectView.map method,
    # which applies compute_two_digit_freqs to each file in files in parallel.
    In [10]: freqs_all = v.map(compute_two_digit_freqs, files)

    # Add up the frequencies from each engine.
    In [11]: freqs = reduce_freqs(freqs_all)

    In [12]: plot_two_digit_freqs(freqs)
    Out[12]: <matplotlib.image.AxesImage object at 0x18beb110>

    In [13]: plt.title('2 digit counts of 150m digits of pi')
    Out[13]: <matplotlib.text.Text object at 0x18d1f9b0>

The resulting plot generated by Matplotlib is shown below. The colors indicate
which two digit sequences occur more (red) or less (blue) often in the
first 150 million digits of pi. We clearly see that the sequence "41" is
most common and that "06" and "07" are least common. Further analysis would
show that the relative size of the statistical fluctuations has decreased
compared to the 10,000 digit calculation.

.. image:: figs/two_digit_counts.*

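The claim about shrinking fluctuations can be checked with a rough estimate.
The sketch below assumes, purely for illustration, that the digits behave like
independent uniform random draws (which is what normality would suggest but
is not proven for pi), so that each count follows a binomial distribution. It
also assumes roughly 150 million two-digit samples. The function name is
hypothetical.

```python
import math


def relative_fluctuation(n_samples, n_bins):
    """Expected standard deviation of one bin count divided by its mean.

    Assumes each sample falls into one of n_bins equally likely bins,
    independently of the others (a binomial model).
    """
    p = 1.0 / n_bins
    mean = n_samples * p
    std = math.sqrt(n_samples * p * (1.0 - p))
    return std / mean


# Serial case: 10,000 digits sorted into 10 single digit bins.
serial = relative_fluctuation(10_000, 10)

# Parallel case: about 150 million samples sorted into 100 two-digit bins.
parallel = relative_fluctuation(150_000_000, 100)

print(f"serial relative fluctuation:   {serial:.4%}")
print(f"parallel relative fluctuation: {parallel:.4%}")
```

Under these assumptions the expected relative fluctuation drops from about 3%
of the mean count in the serial case to below 0.1% in the parallel case.
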
Parallel options pricing
========================
An option is a financial contract that gives the buyer of the contract the
right to buy (a "call") or sell (a "put") a secondary asset (a stock for
example) at a particular date in the future (the expiration date) for a
pre-agreed price (the strike price). For this right, the buyer pays the
seller a premium (the option price). There are a wide variety of flavors of
options (American, European, Asian, etc.) that are useful for different
purposes: hedging against risk, speculation, etc.

Much of modern finance is driven by the need to price these contracts
accurately based on what is known about the properties (such as volatility) of
the underlying asset. One method of pricing options is to use a Monte Carlo
simulation of the underlying asset price. In this example we use this approach
to price both European and Asian (path dependent) options for various strike
prices and volatilities.

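As a rough illustration of Monte Carlo pricing, the sketch below simulates
asset price paths and averages the discounted payoffs of a European call and
an arithmetic-average Asian call. It is a minimal sketch under stated
assumptions and not the algorithm shipped with IPython: the function name
``mc_call_prices``, the parameter values, the geometric Brownian motion model
and the choice to average over the simulated steps only are all assumptions
made here.

```python
import numpy as np


def mc_call_prices(S0, K, sigma, r, T, n_steps, n_paths, seed=0):
    """Estimate European and Asian call prices by Monte Carlo simulation.

    Assumes the asset follows geometric Brownian motion with constant
    volatility `sigma` and risk-free rate `r`.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Log-returns for every path and time step.
    z = rng.standard_normal((n_paths, n_steps))
    log_returns = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    # Cumulative sums of log-returns give the price along each path.
    paths = S0 * np.exp(np.cumsum(log_returns, axis=1))
    discount = np.exp(-r * T)
    # European call: payoff depends only on the final price.
    european = discount * np.mean(np.maximum(paths[:, -1] - K, 0.0))
    # Asian call: payoff depends on the average price along the path.
    asian = discount * np.mean(np.maximum(paths.mean(axis=1) - K, 0.0))
    return european, asian


european, asian = mc_call_prices(
    S0=100.0, K=100.0, sigma=0.25, r=0.05, T=1.0, n_steps=50, n_paths=20_000
)
print(f"European call estimate: {european:.2f}")
print(f"Asian call estimate:    {asian:.2f}")
```

Each combination of strike price and volatility is an independent simulation
of this kind, which is what makes the problem easy to distribute over engines.
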
The code for this example can be found in the :file:`docs/examples/parallel/options`
directory of the IPython source. The function :func:`price_options` in
:file:`mckernel.py` implements the basic Monte Carlo pricing algorithm using
the NumPy package and is shown here:

.. literalinclude:: ../../examples/parallel/options/mckernel.py
   :language: python

To run this code in parallel, we will use IPython's :class:`LoadBalancedView` class,
which distributes work to the engines using dynamic load balancing. This
view is a wrapper of the :class:`Client` class shown in
the previous example. The parallel calculation using :class:`LoadBalancedView` can
be found in the file :file:`mcpricer.py`. The code in this file creates a
:class:`LoadBalancedView` instance and then submits a set of tasks using
:meth:`LoadBalancedView.apply` that calculate the option prices for different
volatilities and strike prices. The results are then plotted as a 2D contour
plot using Matplotlib.

.. literalinclude:: ../../examples/parallel/options/mcpricer.py
   :language: python

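The pattern of submitting one task per combination of strike price and
volatility and then collecting the results can be tried without a running
cluster. The sketch below deliberately swaps in
``concurrent.futures.ThreadPoolExecutor`` from the Python standard library as
a stand-in for a load-balanced view; ``pool.submit`` plays the role that task
submission plays in the description above. The function ``price_one`` and its
placeholder arithmetic are hypothetical and are not a pricing model.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def price_one(strike, sigma):
    """Hypothetical stand-in for one expensive pricing task."""
    return strike * sigma  # placeholder arithmetic only


strikes = [90.0, 100.0, 110.0]
sigmas = [0.1, 0.2, 0.3, 0.4]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Submit one task per (strike, sigma) pair; idle workers pick up
    # the next pending task, so the load balances itself.
    futures = {
        (strike, sigma): pool.submit(price_one, strike, sigma)
        for strike, sigma in itertools.product(strikes, sigmas)
    }
    print("Submitted tasks:", len(futures))
    # Block until every task has finished and gather the results.
    prices = {key: future.result() for key, future in futures.items()}

print("Collected results:", len(prices))
```

Keying the results by ``(strike, sigma)`` makes it straightforward to arrange
them into a 2D grid for a contour plot afterwards.
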
To use this code, start an IPython cluster using :command:`ipcluster`, open
IPython in the pylab mode with the file :file:`mckernel.py` in your current
working directory and then type:

.. sourcecode:: ipython

    In [7]: run mcpricer.py
    Submitted tasks: 30

Once all the tasks have finished, the results can be plotted using the
:func:`plot_options` function. Here we make contour plots of the Asian
call and Asian put options as functions of the volatility and strike price:

.. sourcecode:: ipython

    In [8]: plot_options(sigma_vals, strike_vals, prices['acall'])

    In [9]: plt.figure()
    Out[9]: <matplotlib.figure.Figure object at 0x18c178d0>

    In [10]: plot_options(sigma_vals, strike_vals, prices['aput'])

These results are shown in the two figures below. On our 15 engines, the
entire calculation (15 strike prices, 15 volatilities, 100,000 paths for each)
took 37 seconds in parallel, giving a speedup of 14.1x, which is comparable
to the speedup observed in our previous example.

.. image:: figs/asian_call.*

.. image:: figs/asian_put.*

Conclusion
==========
To conclude these examples, we summarize the key features of IPython's
parallel architecture that have been demonstrated:

* Serial code can often be parallelized with only a few extra lines of code.
  We have used the :class:`DirectView` and :class:`LoadBalancedView` classes
  for this purpose.
* The resulting parallel code can be run without ever leaving IPython's
  interactive shell.
* Any data computed in parallel can be explored interactively through
  visualization or further numerical calculations.
* We have run these examples on a cluster running RHEL 5 and Sun GridEngine.
  IPython's built-in support for SGE (and other batch systems) makes it easy
  to get started with IPython's parallel capabilities.