=================
Parallel examples
=================

In this section we describe two more involved examples of using an IPython
cluster to perform a parallel computation. In these examples, we will be using
IPython's "pylab" mode, which enables interactive plotting using the
Matplotlib package. IPython can be started in this mode by typing::

    ipython -p pylab

at the system command line. If this prints an error message, you will
need to install the default profiles from within IPython by doing:

.. sourcecode:: ipython

    In [1]: %install_profiles

and then restarting IPython.

150 million digits of pi
========================

In this example we would like to study the distribution of digits in the
number pi (in base 10). While it is not known whether pi is a normal number (a
number is normal in base 10 if every string of k digits occurs with limiting
frequency 10^-k; in particular, each of the digits 0-9 occurs one tenth of the
time), numerical investigations suggest that it is. We will begin with a serial
calculation on 10,000 digits of pi and then perform a parallel calculation
involving 150 million digits.

In both the serial and parallel calculations we will be using functions defined
in the :file:`pidigits.py` file, which is available in the
:file:`docs/examples/kernel` directory of the IPython source distribution.
These functions provide basic facilities for working with the digits of pi and
can be loaded into IPython by putting :file:`pidigits.py` in your current
working directory and then doing:

.. sourcecode:: ipython

    In [1]: run pidigits.py

Serial calculation
------------------

For the serial calculation, we will use SymPy (http://www.sympy.org) to
calculate 10,000 digits of pi and then look at the frequencies of the digits
0-9. Out of 10,000 digits, we expect each digit to occur 1,000 times. While
SymPy is capable of calculating many more digits of pi, our purpose here is to
set the stage for the much larger parallel calculation.
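
Counting how often each digit occurs takes only a few lines of Python. The
following minimal sketch uses the standard library's ``collections.Counter``;
the helper name ``count_digits`` is hypothetical, and this is not the
:func:`one_digit_freqs` function from :file:`pidigits.py`, whose
implementation may differ.

.. sourcecode:: python

    from collections import Counter


    def count_digits(digits):
        """Return 10 counts; entry d is how often the digit d occurs.

        ``digits`` may be any iterable of one-character strings. Characters
        that are not 0-9 (for example the "." in "3.14") are ignored.
        """
        counts = Counter(ch for ch in digits if ch in "0123456789")
        return [counts[str(d)] for d in range(10)]


    # The first 20 decimals of pi, typed in by hand for this illustration.
    sample = "14159265358979323846"
    freqs = count_digits(sample)
    print(freqs)  # [0, 2, 2, 3, 2, 3, 2, 1, 2, 3]

Applied to the 10,000 digits computed with SymPy, each of the ten entries
should come out close to 1,000.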

In this example, we use two functions from :file:`pidigits.py`:
:func:`one_digit_freqs` (which calculates how many times each digit occurs)
and :func:`plot_one_digit_freqs` (which uses Matplotlib to plot the result).
Here is an interactive IPython session that uses these functions with
SymPy:

.. sourcecode:: ipython

    In [7]: import sympy

    In [8]: pi = sympy.pi.evalf(40)

    In [9]: pi
    Out[9]: 3.141592653589793238462643383279502884197

    In [10]: pi = sympy.pi.evalf(10000)

    In [11]: digits = (d for d in str(pi)[2:]) # create a sequence of digits

    In [12]: run pidigits.py # load one_digit_freqs/plot_one_digit_freqs

    In [13]: freqs = one_digit_freqs(digits)

    In [14]: plot_one_digit_freqs(freqs)
    Out[14]: [<matplotlib.lines.Line2D object at 0x18a55290>]

The resulting plot of the single-digit counts shows that each digit occurs
approximately 1,000 times, but that with only 10,000 digits the
statistical fluctuations are still rather large:

.. image:: single_digits.*

It is clear that to reduce the relative fluctuations in the counts, we need
to look at many more digits of pi. That brings us to the parallel calculation.
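
The size of these fluctuations can be estimated in advance. If the digits
behaved like independent, uniformly distributed random draws (a modeling
assumption, not a proven property of pi), the count of one particular digit
among N digits would follow a binomial distribution with p = 1/10. Its standard
deviation is then sqrt(N p (1 - p)), so the fluctuation relative to the mean
N p is sqrt((1 - p) / (N p)), which shrinks like 1/sqrt(N). A short sketch of
that arithmetic:

.. sourcecode:: python

    import math


    def relative_fluctuation(n_digits, p=0.1):
        """Standard deviation of one digit's count divided by its mean.

        Assumes a binomial model: each of the n_digits is an independent draw
        in which the digit of interest appears with probability p.
        """
        mean = n_digits * p
        std = math.sqrt(n_digits * p * (1 - p))
        return std / mean


    for n in (10_000, 150_000_000):
        print(f"{n:>11,} digits: relative fluctuation ~ {relative_fluctuation(n):.2e}")

Under this model the expected relative fluctuation drops from about 3% for
10,000 digits to about 0.025% for 150 million digits.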

Parallel calculation
--------------------

Calculating many digits of pi is a challenging computational problem in itself.
Because we want to focus on the distribution of digits in this example, we
will use pre-computed digits of pi from the website of Professor Yasumasa
Kanada at the University of Tokyo (http://www.super-computing.org). These
digits come in a set of text files (ftp://pi.super-computing.org/.2/pi200m/),
each of which contains 10 million digits of pi.
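
Because each file holds 10 million digits, it makes sense to stream the digits
rather than load a whole file into one string. The generator below is a hedged
sketch: it assumes the files store the digits as plain ASCII text, possibly
broken up by spaces or line breaks, and simply skips every character that is
not 0-9. The name ``iter_file_digits`` is hypothetical and not part of
:file:`pidigits.py`.

.. sourcecode:: python

    import os
    import tempfile


    def iter_file_digits(path, chunk_size=1 << 16):
        """Yield the decimal digits stored in a text file, one character at a time.

        Assumes ASCII text; characters other than 0-9 (spaces, newlines) are
        skipped. The file is read in chunks so that a 10-million-digit file
        never has to be held in memory all at once.
        """
        with open(path, "r", encoding="ascii") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                for ch in chunk:
                    if "0" <= ch <= "9":
                        yield ch


    # Usage with a small temporary file standing in for one of the digit files.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("1415926535 8979323846\n2643383279\n")
    digits = "".join(iter_file_digits(tmp.name))
    os.remove(tmp.name)
    print(digits)  # 141592653589793238462643383279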

For the parallel calculation, we have copied these files to the local hard
drives of the compute nodes. Fifteen of these files will be used, for a
total of 150 million digits of pi. To make things a little more interesting, we
will calculate the frequencies of all two-digit sequences (00-99) and then plot
the result using a 2D matrix in Matplotlib.

The overall idea of the calculation is simple: each IPython engine will
compute the two-digit counts for the digits in a single file. Then in a final
step the counts from each engine will be added up. To perform this
calculation, we will need two top-level functions from :file:`pidigits.py`:

.. literalinclude:: ../../examples/kernel/pidigits.py
    :language: python
    :lines: 34-49
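
The included functions are the authoritative version. As a rough,
self-contained illustration of the same idea, the sketch below counts two-digit
sequences with a sliding window, so that the digits 1415 contribute the pairs
14, 41 and 15; whether :file:`pidigits.py` uses overlapping pairs in exactly
this way is an assumption here. It also shows the final step, in which the
per-file counts are added element-wise. Both function names are hypothetical.

.. sourcecode:: python

    def count_two_digit(digits):
        """Count overlapping two-digit sequences 00-99 in an iterable of digit chars.

        Returns a list of 100 counts; index 10*a + b holds the count of "ab".
        """
        counts = [0] * 100
        prev = None
        for ch in digits:
            d = ord(ch) - ord("0")  # assumes ch is one of "0".."9"
            if prev is not None:
                counts[10 * prev + d] += 1
            prev = d
        return counts


    def add_counts(all_counts):
        """Add a sequence of equal-length count lists element-wise."""
        return [sum(column) for column in zip(*all_counts)]


    part1 = count_two_digit("1415")  # pairs 14, 41, 15
    part2 = count_two_digit("9265")  # pairs 92, 26, 65
    total = add_counts([part1, part2])
    print(total[14], total[41], total[92], sum(total))  # 1 1 1 6

In this sketch, pairs that straddle two files (the last digit of one file and
the first digit of the next) are not counted; with 10 million digits per file,
that affects 14 pairs out of roughly 150 million.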

We will also use the :func:`plot_two_digit_freqs` function to plot the
results. The code to run this calculation in parallel is contained in
:file:`docs/examples/kernel/parallelpi.py`. This code can be run in parallel
using IPython by following these steps:

1. Copy the text files with the digits of pi
   (ftp://pi.super-computing.org/.2/pi200m/) to the working directory of the
   engines on the compute nodes.
2. Use :command:`ipcluster` to start 15 engines. We used an 8-core (2 quad-core
   CPUs) cluster with hyperthreading enabled, which makes the 8 cores
   look like 16 (1 controller + 15 engines) in the OS. However, because the
   hyperthreads share the 8 physical cores, the maximum speedup we can observe
   is still only 8x.
3. With the file :file:`parallelpi.py` in your current working directory, open
   up IPython in pylab mode and type ``run parallelpi.py``.
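
The parallel step in :file:`parallelpi.py` has a map-then-reduce shape:
:meth:`MultiEngineClient.map` applies one function to each file name on the
engines and returns the list of per-file results, which are then added
together. Without a cluster at hand, the same shape can be sketched on one
machine with the standard library's ``concurrent.futures``. This is a
swapped-in stand-in, not how IPython engines work, and the files are replaced
by short hypothetical in-memory strings. Threads keep the sketch short and
portable; for CPU-bound counting in CPython, a ``ProcessPoolExecutor`` would be
the realistic choice.

.. sourcecode:: python

    from collections import Counter
    from concurrent.futures import ThreadPoolExecutor
    from functools import reduce

    # Hypothetical stand-ins for the digit files: name -> contents.
    fake_files = {
        "part-01.txt": "1415926535",
        "part-02.txt": "8979323846",
        "part-03.txt": "2643383279",
    }


    def count_file(name):
        """Per-worker task: count single digits in one (fake) file."""
        return Counter(fake_files[name])


    with ThreadPoolExecutor(max_workers=3) as pool:
        # Map: one task per file; results come back in input order.
        per_file = list(pool.map(count_file, sorted(fake_files)))

    # Reduce: add the per-file counts together.
    total = reduce(lambda a, b: a + b, per_file, Counter())
    print(sum(total.values()), total["3"])  # 30 6

The reduction plays the same role as :func:`reduce_freqs` in the interactive
session later in this section.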

When run on our 8-core cluster, we observe a speedup of 7.7x. This is slightly
less than linear scaling (8x) because the controller is also running on one of
the cores.
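
Speedups are often easier to compare across machines when expressed as
parallel efficiency, the speedup divided by the number of physical cores. A
quick check of the numbers quoted above (the helper name is illustrative only):

.. sourcecode:: python

    def parallel_efficiency(speedup, cores):
        """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
        return speedup / cores


    eff = parallel_efficiency(7.7, 8)
    print(f"{eff:.2%}")  # 96.25%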

To emphasize the interactive nature of IPython, we now show how the
calculation can also be run by simply typing the commands from
:file:`parallelpi.py` interactively into IPython:

.. sourcecode:: ipython

    In [1]: from IPython.kernel import client
    2009-11-19 11:32:38-0800 [-] Log opened.

    # The MultiEngineClient allows us to use the engines interactively.
    # We simply pass MultiEngineClient the name of the cluster profile we
    # are using.
    In [2]: mec = client.MultiEngineClient(profile='mycluster')
    2009-11-19 11:32:44-0800 [-] Connecting [0]
    2009-11-19 11:32:44-0800 [Negotiation,client] Connected: ./ipcontroller-mec.furl

    In [3]: mec.get_ids()
    Out[3]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

    In [4]: run pidigits.py

    In [5]: filestring = 'pi200m-ascii-%(i)02dof20.txt'

    # Create the list of files to process.
    In [6]: files = [filestring % {'i':i} for i in range(1,16)]

    In [7]: files
    Out[7]:
    ['pi200m-ascii-01of20.txt',
     'pi200m-ascii-02of20.txt',
     'pi200m-ascii-03of20.txt',
     'pi200m-ascii-04of20.txt',
     'pi200m-ascii-05of20.txt',
     'pi200m-ascii-06of20.txt',
     'pi200m-ascii-07of20.txt',
     'pi200m-ascii-08of20.txt',
     'pi200m-ascii-09of20.txt',
     'pi200m-ascii-10of20.txt',
     'pi200m-ascii-11of20.txt',
     'pi200m-ascii-12of20.txt',
     'pi200m-ascii-13of20.txt',
     'pi200m-ascii-14of20.txt',
     'pi200m-ascii-15of20.txt']

    # This is the parallel calculation using the MultiEngineClient.map method,
    # which applies compute_two_digit_freqs to each file in files in parallel.
    In [8]: freqs_all = mec.map(compute_two_digit_freqs, files)

    # Add up the frequencies from each engine.
    In [9]: freqs = reduce_freqs(freqs_all)

    In [10]: plot_two_digit_freqs(freqs)
    Out[10]: <matplotlib.image.AxesImage object at 0x18beb110>

    In [11]: plt.title('2 digit counts of 150m digits of pi')
    Out[11]: <matplotlib.text.Text object at 0x18d1f9b0>

The resulting plot generated by Matplotlib is shown below. The colors indicate
which two-digit sequences occur more (red) or less (blue) often in the
first 150 million digits of pi. We clearly see that the sequence "41" occurs
most often and that "06" and "07" occur least often. Further analysis would
show that the relative size of the statistical fluctuations has decreased
compared to the 10,000-digit calculation.

.. image:: two_digit_counts.*
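
The 2D plot amounts to the 100 two-digit counts arranged as a 10 x 10 matrix,
with the first digit selecting the row and the second digit the column. The
NumPy sketch below shows that layout and how the most and least frequent
sequences can be picked out. The Poisson-distributed counts are placeholder
data, not the actual results for pi, and the index convention
(10 * first digit + second digit) is an assumption about how the counts are
stored.

.. sourcecode:: python

    import numpy as np

    # Placeholder data: 100 counts for the sequences 00..99 (NOT real pi counts).
    # 150 million digits / 100 sequences gives about 1.5 million per sequence.
    rng = np.random.default_rng(0)
    counts = rng.poisson(lam=1_500_000, size=100)

    # Row = first digit, column = second digit, so matrix[4, 1] counts "41".
    matrix = counts.reshape(10, 10)


    def as_pair(index):
        """Format a flat index 0..99 as a two-character digit sequence."""
        return f"{index:02d}"


    most = as_pair(int(np.argmax(counts)))
    least = as_pair(int(np.argmin(counts)))
    print("most frequent:", most, "least frequent:", least)

    # With Matplotlib, an image plot of this matrix gives a 2D picture, e.g.:
    #   import matplotlib.pyplot as plt; plt.imshow(matrix); plt.colorbar()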


Parallel options pricing
========================

An option is a financial contract that gives the buyer of the contract the
right to buy (a "call") or sell (a "put") an underlying asset (a stock, for
example) at a particular date in the future (the expiration date) for a
price agreed upon in advance (the strike price). For this right, the buyer pays
the seller a premium (the option price). There is a wide variety of flavors of
options (American, European, Asian, etc.) that are useful for different
purposes: hedging against risk, speculation, etc.
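
In code, the value of these contracts at expiration (the payoff) is a one-liner.
The sketch below, with hypothetical function names, gives the standard payoffs
for European calls and puts, which depend only on the final price, and for
arithmetic-average Asian options, which depend on the average price along the
path. Asian options can also be defined with a geometric average or an average
strike; this sketch does not cover those variants.

.. sourcecode:: python

    def european_call_payoff(final_price, strike):
        """Call: the right to buy at `strike`, worth max(S_T - K, 0)."""
        return max(final_price - strike, 0.0)


    def european_put_payoff(final_price, strike):
        """Put: the right to sell at `strike`, worth max(K - S_T, 0)."""
        return max(strike - final_price, 0.0)


    def asian_call_payoff(path, strike):
        """Arithmetic-average-price Asian call: max(mean(S) - K, 0)."""
        return max(sum(path) / len(path) - strike, 0.0)


    def asian_put_payoff(path, strike):
        """Arithmetic-average-price Asian put: max(K - mean(S), 0)."""
        return max(strike - sum(path) / len(path), 0.0)


    path = [100.0, 110.0, 90.0, 120.0]  # a made-up price path; average 105, final 120
    print(european_call_payoff(path[-1], 100.0))  # 20.0
    print(asian_call_payoff(path, 100.0))  # 5.0

Averaging along the path smooths out the final price, which is why Asian
options are path dependent and typically cheaper than their European
counterparts.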

Much of modern finance is driven by the need to price these contracts
accurately based on what is known about the properties (such as volatility) of
the underlying asset. One method of pricing options is to use a Monte Carlo
simulation of the underlying asset price. In this example we use this approach
to price both European and Asian (path-dependent) options for various strike
prices and volatilities.

The code for this example can be found in the :file:`docs/examples/kernel`
directory of the IPython source. The function :func:`price_options` in
:file:`mcpricer.py` implements the basic Monte Carlo pricing algorithm using
the NumPy package and is shown here:

.. literalinclude:: ../../examples/kernel/mcpricer.py
    :language: python
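
For readers who want a self-contained reference point, here is a minimal,
vectorized NumPy sketch of the same kind of Monte Carlo pricer. It is not the
code in :file:`mcpricer.py`: it assumes the asset follows geometric Brownian
motion under the risk-neutral measure, uses a fixed grid of time steps, and
discounts the average payoff at a constant interest rate ``r``. The name
``mc_price``, its parameters, and the ``"ecall"``/``"eput"`` keys are
hypothetical; the ``"acall"``/``"aput"`` keys merely mirror the names used in
the session below.

.. sourcecode:: python

    import numpy as np


    def mc_price(S0, K, sigma, r, T, n_paths=50_000, n_steps=50, seed=0):
        """Monte Carlo prices of European and arithmetic Asian calls and puts.

        Assumes geometric Brownian motion, dS = r S dt + sigma S dW, sampled
        exactly on n_steps equal time steps after t = 0.
        """
        rng = np.random.default_rng(seed)
        dt = T / n_steps
        # Log-price increments for every path and every time step.
        z = rng.standard_normal((n_paths, n_steps))
        increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
        paths = S0 * np.exp(np.cumsum(increments, axis=1))  # (n_paths, n_steps)

        final = paths[:, -1]
        average = paths.mean(axis=1)  # arithmetic average over the time grid
        discount = np.exp(-r * T)
        return {
            "ecall": discount * np.maximum(final - K, 0.0).mean(),
            "eput": discount * np.maximum(K - final, 0.0).mean(),
            "acall": discount * np.maximum(average - K, 0.0).mean(),
            "aput": discount * np.maximum(K - average, 0.0).mean(),
        }


    prices = mc_price(S0=100.0, K=100.0, sigma=0.25, r=0.05, T=1.0)
    for name, value in prices.items():
        print(f"{name}: {value:.3f}")

Because every step is an array operation over all paths at once, there is no
Python-level loop over paths; in a parallel version, each task could call such
a function for one (volatility, strike) pair.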

To run this code in parallel, we will use IPython's :class:`TaskClient` class,
which distributes work to the engines using dynamic load balancing. This
client can be used alongside the :class:`MultiEngineClient` class shown in
the previous example. The parallel calculation using :class:`TaskClient` can
be found in the file :file:`mcdriver.py`. The code in this file creates a
:class:`TaskClient` instance and then submits a set of tasks using
:meth:`TaskClient.run` that calculate the option prices for different
volatilities and strike prices. The results are then plotted as a 2D contour
plot using Matplotlib.

.. literalinclude:: ../../examples/kernel/mcdriver.py
    :language: python
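
:class:`TaskClient` hands each task to whichever engine becomes free next,
which is what dynamic load balancing means here. The same pattern can be
sketched locally with the standard library's ``concurrent.futures``, used
purely as a stand-in (this is not :class:`TaskClient`'s API): one task is
submitted per (volatility, strike) pair, and results are collected in whatever
order they complete. To keep the sketch fast, the per-task work is a
closed-form Black-Scholes call price swapped in for the Monte Carlo
simulation; the grid values are made up, and ``sigma_vals``/``K_vals`` only
mirror the names used in the session below.

.. sourcecode:: python

    import math
    from concurrent.futures import ThreadPoolExecutor, as_completed


    def normal_cdf(x):
        """Standard normal cumulative distribution function."""
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


    def bs_call(S0, K, sigma, r, T):
        """Placeholder task: closed-form Black-Scholes price of a European call."""
        d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
        d2 = d1 - sigma * math.sqrt(T)
        return S0 * normal_cdf(d1) - K * math.exp(-r * T) * normal_cdf(d2)


    sigma_vals = [0.1 + 0.05 * i for i in range(10)]  # 10 volatilities
    K_vals = [90.0 + 2.5 * j for j in range(10)]  # 10 strike prices

    prices = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Submit one independent task per grid point; each idle worker pulls
        # the next pending task as soon as it finishes its current one.
        futures = {
            pool.submit(bs_call, 100.0, K, sigma, 0.05, 1.0): (i, j)
            for i, sigma in enumerate(sigma_vals)
            for j, K in enumerate(K_vals)
        }
        for fut in as_completed(futures):
            prices[futures[fut]] = fut.result()

    print(len(prices), "prices computed")  # 100 prices computed

Load balancing of this kind matters when tasks take unequal amounts of time,
since no worker sits idle while others still have a long queue.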

To run :file:`mcdriver.py`, start an IPython cluster using :command:`ipcluster`,
open IPython in pylab mode with the file in your current working directory
and then type:

.. sourcecode:: ipython

    In [7]: run mcdriver.py
    Submitted tasks: [0, 1, 2, ...]

Once all the tasks have finished, the results can be plotted using the
:func:`plot_options` function. Here we make contour plots of the Asian
call and Asian put options as a function of the volatility and strike price:

.. sourcecode:: ipython

    In [8]: plot_options(sigma_vals, K_vals, prices['acall'])

    In [9]: plt.figure()
    Out[9]: <matplotlib.figure.Figure object at 0x18c178d0>

    In [10]: plot_options(sigma_vals, K_vals, prices['aput'])

These results are shown in the two figures below. On an 8-core cluster the
entire calculation (10 strike prices, 10 volatilities, 100,000 paths for each)
took 30 seconds in parallel, giving a speedup of 7.7x, which is comparable
to the speedup observed in our previous example.

.. image:: asian_call.*

.. image:: asian_put.*

Conclusion
==========

To conclude these examples, we summarize the key features of IPython's
parallel architecture that have been demonstrated:

* Serial code can often be parallelized with only a few extra lines of code.
  We have used the :class:`MultiEngineClient` and :class:`TaskClient` classes
  for this purpose.
* The resulting parallel code can be run without ever leaving IPython's
  interactive shell.
* Any data computed in parallel can be explored interactively through
  visualization or further numerical calculations.
* We have run these examples on a cluster running Windows HPC Server 2008.
  IPython's built-in support for the Windows HPC job scheduler makes it
  easy to get started with IPython's parallel capabilities.
