upstream/ipython Files · docs/source/parallel/parallel_demos.txt

Initial draft of Windows HPC documentation.

Brian Granger

File last commit:

r2344:8446fc51


		=================
		Parallel examples
		=================

		In this section we describe a few more involved examples of using an IPython
		cluster to perform a parallel computation.

		150 million digits of pi
		========================

		In this example we would like to study the distribution of digits in the
		number pi. More specifically, we are going to study how often each two-digit
		sequence occurs in the first 150 million digits of pi. If the digits
		0-9 occur independently and with equal probability, we expect that each
		two-digit sequence (00, 01, ..., 99) will occur 1% of the time.
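
		As a small illustration of this expectation, the sketch below counts the
		overlapping two-digit sequences in a short string and reports the uniform
		expectation of 1/100 per sequence. The helper name ``pair_frequencies`` and
		the 20-digit sample are assumptions made for this illustration only; they
		are not part of the example files described in this section.

```python
from collections import Counter


def pair_frequencies(digits):
    """Return the relative frequency of each overlapping two-digit sequence."""
    # A string of N digits contains N - 1 overlapping pairs.
    pairs = [digits[i:i + 2] for i in range(len(digits) - 1)]
    counts = Counter(pairs)
    total = len(pairs)
    return {pair: n / float(total) for pair, n in counts.items()}


# The first 20 decimal digits of pi, used only as a tiny sample.
sample = "14159265358979323846"
freqs = pair_frequencies(sample)

# Under the uniform hypothesis, each of the 100 sequences has probability 1/100.
expected = 1.0 / 100
print("overlapping pairs:", len(sample) - 1)
print("expected frequency per sequence:", expected)
```

		With only 19 pairs the observed frequencies are far from 1%; the point of
		using 150 million digits is to make the comparison meaningful.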

		This example uses precomputed digits of pi from the website of Professor
		Yasumasa Kanada at the University of Tokyo (http://www.super-computing.org).
		These digits come in a set of ``.txt`` files
		(ftp://pi.super-computing.org/.2/pi200m/) that each have 10 million digits of
		pi. In the parallel computation, we will use the :meth:`MultiEngineClient.map`
		method to have each engine compute the desired statistics on a subset of these
		files. Before I started the parallel computation, I copied the data files
		to the compute nodes so the engines have fast access to them.

		Here are the Python functions for counting the frequencies of each two-digit
		sequence in serial::

		    import numpy as np

		    def compute_two_digit_freqs(filename):
		        """
		        Compute the two digit frequencies from a single file.
		        """
		        d = txt_file_to_digits(filename)
		        freqs = two_digit_freqs(d)
		        return freqs

		    def txt_file_to_digits(filename, the_type=str):
		        """
		        Yield the digits of pi read from a .txt file.
		        """
		        with open(filename, 'r') as f:
		            # Iterate over the file lazily instead of reading it all at once.
		            for line in f:
		                for c in line:
		                    if c != '\n' and c != ' ':
		                        yield the_type(c)

		    def two_digit_freqs(digits, normalize=False):
		        """
		        Consume digits of pi and compute 2 digits freq. counts.
		        """
		        freqs = np.zeros(100, dtype='i4')
		        last = next(digits)
		        # Count every overlapping pair, including the final one.
		        for this in digits:
		            index = int(last + this)
		            freqs[index] += 1
		            last = this
		        if normalize:
		            # Float division, so integer counts are not truncated to zero.
		            freqs = freqs / float(freqs.sum())
		        return freqs
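
		The generator-to-counter pipeline above can be tried without the data files.
		The following is a minimal, dependency-free sketch that assumes an in-memory
		``io.StringIO`` object in place of a digits file and a plain list in place of
		the NumPy array. The names ``stream_digits`` and ``count_pairs`` are
		hypothetical stand-ins, not functions from :file:`pidigits.py`.

```python
import io


def stream_digits(fileobj):
    """Yield digit characters from a file-like object, skipping whitespace."""
    for line in fileobj:
        for c in line:
            if not c.isspace():
                yield c


def count_pairs(digits):
    """Consume an iterator of digit characters and count overlapping pairs."""
    counts = [0] * 100
    last = next(digits)
    for this in digits:
        counts[int(last + this)] += 1
        last = this
    return counts


# Stand-in for a digits file: 20 digits of pi split over two lines.
fake_file = io.StringIO(u"14159 26535\n89793 23846\n")
counts = count_pairs(stream_digits(fake_file))
print("pairs counted:", sum(counts))
```

		Because the digits are streamed one character at a time, a pair that spans a
		line break or a space (such as ``58`` in this sample) is still counted.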

		These functions are defined in the file :file:`pidigits.py`. To perform the
		calculation in parallel, we use an additional file: :file:`parallelpi.py`::

		    from IPython.kernel import client
		    from matplotlib import pyplot as plt
		    import numpy as np
		    from pidigits import *
		    from timeit import default_timer as clock

		    # Files with digits of pi (10m digits each)
		    filestring = 'pi200m-ascii-%(i)02dof20.txt'
		    files = [filestring % {'i':i} for i in range(1,16)]

		    # A function for reducing the frequencies calculated
		    # by different engines.
		    def reduce_freqs(freqlist):
		        allfreqs = np.zeros_like(freqlist[0])
		        for f in freqlist:
		            allfreqs += f
		        return allfreqs

		    # Connect to the IPython cluster
		    mec = client.MultiEngineClient(profile='mycluster')
		    mec.run('pidigits.py')

		    # Run 10m digits on 1 engine
		    mapper = mec.mapper(targets=0)
		    t1 = clock()
		    freqs10m = mapper.map(compute_two_digit_freqs, files[:1])[0]
		    t2 = clock()
		    digits_per_second1 = 10.0e6/(t2-t1)
		    print "Digits per second (1 core, 10m digits): ", digits_per_second1

		    # Run 150m digits on 15 engines (8 cores)
		    t1 = clock()
		    freqs_all = mec.map(compute_two_digit_freqs, files[:len(mec)])
		    freqs150m = reduce_freqs(freqs_all)
		    t2 = clock()
		    digits_per_second8 = 150.0e6/(t2-t1)
		    print "Digits per second (8 cores, 150m digits): ", digits_per_second8

		    print "Speedup: ", digits_per_second8/digits_per_second1

		    # plot_two_digit_freqs is expected to come from pidigits.py
		    # (it is not listed above).
		    plot_two_digit_freqs(freqs150m)
		    plt.title("2 digit sequences in 150m digits of pi")
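
		The map-then-reduce pattern used by this script can be sketched on a single
		machine. The version below is only an illustration: it swaps in the standard
		library's ``concurrent.futures.ThreadPoolExecutor`` for the IPython engines and
		short in-memory strings for the data files, and the names
		``count_pairs_in_chunk`` and ``reduce_counts`` are hypothetical. Threads are
		used here to show the structure of the computation, not to obtain a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def count_pairs_in_chunk(chunk):
    """Count overlapping two-digit sequences inside one chunk of digits."""
    counts = [0] * 100
    for i in range(len(chunk) - 1):
        counts[int(chunk[i:i + 2])] += 1
    return counts


def reduce_counts(count_lists):
    """Sum the per-chunk counts element by element."""
    total = [0] * 100
    for counts in count_lists:
        for i, n in enumerate(counts):
            total[i] += n
    return total


# Three 10-digit chunks stand in for the 10-million-digit files.
chunks = ["1415926535", "8979323846", "2643383279"]

# "Map": each worker counts pairs in its own chunk.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial = list(pool.map(count_pairs_in_chunk, chunks))

# "Reduce": combine the partial counts into one result.
total = reduce_counts(partial)
print("pairs counted:", sum(total))
```

		As in the per-file computation, each chunk is processed independently, so
		pairs that straddle a boundary between two chunks are not counted.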

		To run this code on an IPython cluster:

		1. Start an IPython cluster with 15 engines: ``ipcluster start -p mycluster -n 15``
		2. Open IPython's interactive shell using the pylab profile
		   (``ipython -p pylab``) and type ``run parallelpi.py``.

		At this point, the parallel calculation will begin. On a small 8-core
		cluster, we observe a speedup of 7.7x. The resulting plot of the two-digit
		sequence frequencies is shown in the following screenshot.

		.. image:: parallel_pi.*
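
		To put the reported speedup in context, parallel efficiency is commonly
		defined as the speedup divided by the number of cores. The short sketch below
		applies that definition to the figures quoted above (7.7x on 8 cores); the
		function name ``parallel_efficiency`` is an assumption made for illustration.

```python
def parallel_efficiency(speedup, cores):
    """Return the fraction of ideal linear speedup that was achieved."""
    return speedup / float(cores)


# Figures quoted in the text: a 7.7x speedup on an 8-core cluster.
eff = parallel_efficiency(7.7, 8)
print("Parallel efficiency: %.2f" % eff)
```

		An efficiency close to 1 indicates that the work divides almost evenly among
		the cores, which is expected here because each file is processed independently.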


		Parallel option pricing
		=======================

		This example will be added at a later point.
