=================
Parallel examples
=================
In this section we describe a few more involved examples of using an IPython
cluster to perform a parallel computation.

150 million digits of pi
========================

In this example we study the distribution of digits in the number pi. More
specifically, we count how often each two-digit sequence occurs in the first
150 million digits of pi. If the digits 0-9 occur independently and with equal
probability, we expect each two-digit sequence (00, 01, ..., 99) to occur 1%
of the time.

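To make the 1% expectation concrete, here is a minimal sketch that is not part of the original example. It counts overlapping two-digit sequences in a short string and compares the counts with the uniform expectation using a Pearson chi-square statistic. The helper names ``two_digit_counts`` and ``chi_square_uniform`` are hypothetical, and the sample is only the first 20 decimals of pi, which is far too short for a meaningful test.

```python
from collections import Counter


def two_digit_counts(digits):
    """Count overlapping two-digit sequences in a string of digits."""
    return Counter(a + b for a, b in zip(digits, digits[1:]))


def chi_square_uniform(counts, n_bins=100):
    """Pearson chi-square statistic against equal expected counts per bin."""
    total = sum(counts.values())
    expected = float(total) / n_bins  # 1% of all pairs when n_bins is 100
    keys = ["%02d" % k for k in range(n_bins)]
    return sum((counts.get(key, 0) - expected) ** 2 / expected for key in keys)


sample = "14159265358979323846"  # first 20 decimals of pi
counts = two_digit_counts(sample)
n_pairs = sum(counts.values())   # N digits contain N - 1 overlapping pairs

print("pairs: %d" % n_pairs)
print("expected count per sequence: %.2f" % (n_pairs / 100.0))
print("chi-square: %.2f" % chi_square_uniform(counts))
```

With 150 million digits the expected count per sequence is about 1.5 million, so even small relative deviations from 1% become visible in the statistic.
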
This example uses precomputed digits of pi from the website of Professor
Yasumasa Kanada at the University of Tokyo (http://www.super-computing.org).
These digits come in a set of ``.txt`` files
(ftp://pi.super-computing.org/.2/pi200m/) that each contain 10 million digits
of pi. In the parallel computation, we use the :meth:`MultiEngineClient.map`
method to have each engine compute the desired statistics on a subset of these
files. Before starting the parallel computation, we copied the data files to
the compute nodes so that the engines have fast access to them.

Here are the Python functions for counting the frequencies of each two-digit
sequence in serial::

    import numpy as np

    def compute_two_digit_freqs(filename):
        """
        Compute the two digit frequencies from a single file.
        """
        d = txt_file_to_digits(filename)
        freqs = two_digit_freqs(d)
        return freqs

    def txt_file_to_digits(filename, the_type=str):
        """
        Yield the digits of pi read from a .txt file.
        """
        with open(filename, 'r') as f:
            # Iterate over the file lazily instead of loading every line.
            for line in f:
                for c in line:
                    if c != '\n' and c != ' ':
                        yield the_type(c)

    def two_digit_freqs(digits, normalize=False):
        """
        Consume digits of pi and compute 2 digits freq. counts.

        The digits must be strings, since pairs are joined by concatenation.
        """
        freqs = np.zeros(100, dtype='i4')
        last = next(digits)
        this = next(digits)
        for d in digits:
            index = int(last + this)
            freqs[index] += 1
            last = this
            this = d
        # Count the final pair, which the loop above leaves pending.
        freqs[int(last + this)] += 1
        if normalize:
            # Convert to float to avoid integer division.
            freqs = freqs/float(freqs.sum())
        return freqs

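The ``the_type`` argument decides whether the digit stream yields strings or integers, and the expression ``int(last + this)`` only gives a valid pair index for strings. The following sketch illustrates the difference. It reads from an in-memory buffer instead of a file, the four-digit grouping of the sample text is made up, and ``stream_digits`` and ``pair_index`` are hypothetical helpers rather than part of :file:`pidigits.py`.

```python
import io


def stream_digits(f, the_type=str):
    """Yield the digits found in a file-like object, skipping spaces and newlines."""
    for line in f:
        for c in line:
            if c != '\n' and c != ' ':
                yield the_type(c)


def pair_index(last, this):
    """Map two consecutive digits to a bin in 0..99, for str or int digits."""
    return 10 * int(last) + int(this)


text = u"1415 9265\n3589 7932\n"  # made-up layout for illustration
as_str = list(stream_digits(io.StringIO(text)))
as_int = list(stream_digits(io.StringIO(text), the_type=int))

# With strings, '+' concatenates the two digits into a two-character number.
print(int(as_str[0] + as_str[1]))
# With integers, '+' adds the digits, which is not a pair index.
print(as_int[0] + as_int[1])
# An arithmetic index works for both digit types.
print(pair_index(as_int[0], as_int[1]))
```

If integer digits are wanted, an arithmetic index such as ``10 * last + this`` is one way to keep the counting code independent of the digit type.
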
These functions are defined in the file :file:`pidigits.py`. To perform the
calculation in parallel, we use an additional file, :file:`parallelpi.py`::

    from IPython.kernel import client
    from matplotlib import pyplot as plt
    import numpy as np
    from pidigits import *
    from timeit import default_timer as clock

    # Files with digits of pi (10m digits each)
    filestring = 'pi200m-ascii-%(i)02dof20.txt'
    files = [filestring % {'i':i} for i in range(1,16)]

    # A function for reducing the frequencies calculated
    # by different engines.
    def reduce_freqs(freqlist):
        allfreqs = np.zeros_like(freqlist[0])
        for f in freqlist:
            allfreqs += f
        return allfreqs

    # Connect to the IPython cluster
    mec = client.MultiEngineClient(profile='mycluster')
    # Define the pidigits functions on every engine
    mec.run('pidigits.py')

    # Run 10m digits on 1 engine
    mapper = mec.mapper(targets=0)
    t1 = clock()
    freqs10m = mapper.map(compute_two_digit_freqs, files[:1])[0]
    t2 = clock()
    digits_per_second1 = 10.0e6/(t2-t1)
    print "Digits per second (1 core, 10m digits): ", digits_per_second1

    # Run 150m digits on 15 engines (8 cores)
    t1 = clock()
    freqs_all = mec.map(compute_two_digit_freqs, files[:len(mec)])
    freqs150m = reduce_freqs(freqs_all)
    t2 = clock()
    digits_per_second8 = 150.0e6/(t2-t1)
    print "Digits per second (8 cores, 150m digits): ", digits_per_second8

    print "Speedup: ", digits_per_second8/digits_per_second1

    # plot_two_digit_freqs comes from pidigits.py (not shown above)
    plot_two_digit_freqs(freqs150m)
    plt.title("2 digit sequences in 150m digits of pi")

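The script follows a map/reduce pattern: each engine counts pairs in its own file, and the client sums the per-file counts. The sketch below shows the same pattern without an IPython cluster. It swaps in a thread pool from the standard library as a stand-in for ``MultiEngineClient.map``, uses short in-memory strings instead of the data files, and uses plain lists instead of NumPy arrays. The names ``count_pairs`` and ``reduce_counts`` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def count_pairs(digits):
    """Stand-in for compute_two_digit_freqs that works on a string of digits."""
    counts = [0] * 100
    for a, b in zip(digits, digits[1:]):
        counts[int(a + b)] += 1
    return counts


def reduce_counts(count_lists):
    """Element-wise sum of the per-chunk count lists."""
    return [sum(column) for column in zip(*count_lists)]


# First 30 decimals of pi, split into three chunks that play the role of files.
chunks = ["1415926535", "8979323846", "2643383279"]

# Map step: one task per chunk, results returned in input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    per_chunk = list(pool.map(count_pairs, chunks))

# Reduce step: combine the partial counts into one histogram.
total = reduce_counts(per_chunk)
print("pairs counted: %d" % sum(total))
```

As in the cluster version, each chunk is processed independently, so pairs that straddle a chunk boundary are not counted. With 10 million digits per file, this affects only a handful of pairs.
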
To run this code on an IPython cluster:

1. Start an IPython cluster with 15 engines: ``ipcluster start -p mycluster -n 15``
2. Open IPython's interactive shell using the pylab profile,
   ``ipython -p pylab``, and type ``run parallelpi.py``.

At this point, the parallel calculation will begin. On a small 8-core
cluster, we observe a speedup of 7.7x. The resulting plot of the two-digit
sequences is shown in the following screenshot.

.. image:: parallel_pi.*

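The reported speedup can be put in context by dividing it by the number of cores, which gives the parallel efficiency. The sketch below mirrors the throughput ratio computed in :file:`parallelpi.py`. The function names are hypothetical, and the only measured input is the 7.7x figure quoted above.

```python
def speedup(rate_parallel, rate_serial):
    """Ratio of parallel to serial throughput (digits per second)."""
    return rate_parallel / float(rate_serial)


def efficiency(speedup_factor, n_cores):
    """Fraction of the ideal linear speedup that was achieved."""
    return speedup_factor / float(n_cores)


observed_speedup = 7.7  # figure quoted in the text
n_cores = 8

print("Parallel efficiency: %.0f%%" % (100 * efficiency(observed_speedup, n_cores)))
```

A speedup of 7.7x on 8 cores corresponds to an efficiency of roughly 96%, which is consistent with a workload in which each file is processed independently and only the small per-file histograms are sent back to the client.
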
Parallel option pricing
=======================

This example will be added at a later point.