parallel_winhpc.txt
333 lines
| 13.9 KiB
| text/plain
|
TextLexer
Brian Granger
|
r2347 | ============================================ | ||
Getting started with Windows HPC Server 2008 | ||||
============================================ | ||||
Brian Granger
|
r2344 | |||
Introduction | ||||
============ | ||||
Brian Granger
|
r2347 | The Python programming language is an increasingly popular language for | ||
numerical computing. This is due to a unique combination of factors. First, | ||||
Python is a high-level and *interactive* language that is well matched to | ||||
interactive numerical work. Second, it is easy (often times trivial) to | ||||
integrate legacy C/C++/Fortran code into Python. Third, a large number of | ||||
high-quality open source projects provide all the needed building blocks for | ||||
numerical computing: numerical arrays (NumPy), algorithms (SciPy), 2D/3D | ||||
Visualization (Matplotlib, Mayavi, Chaco), Symbolic Mathematics (Sage, Sympy) | ||||
and others. | ||||
Brian Granger
|
r2345 | |||
The IPython project is a core part of this open-source toolchain and is | ||||
focused on creating a comprehensive environment for interactive and | ||||
exploratory computing in the Python programming language. It enables all of | ||||
the above tools to be used interactively and consists of two main components: | ||||
Brian Granger
|
r2344 | |||
* An enhanced interactive Python shell with support for interactive plotting | ||||
and visualization. | ||||
* An architecture for interactive parallel computing. | ||||
With these components, it is possible to perform all aspects of a parallel | ||||
Brian Granger
|
r2345 | computation interactively. This type of workflow is particularly relevant in | ||
scientific and numerical computing where algorithms, code and data are | ||||
continually evolving as the user/developer explores a problem. The broad | ||||
treads in computing (commodity clusters, multicore, cloud computing, etc.) | ||||
make these capabilities of IPython particularly relevant. | ||||
While IPython is a cross platform tool, it has particularly strong support for | ||||
Windows based compute clusters running Windows HPC Server 2008. This document | ||||
describes how to get started with IPython on Windows HPC Server 2008. The | ||||
content and emphasis here is practical: installing IPython, configuring | ||||
IPython to use the Windows job scheduler and running example parallel programs | ||||
interactively. A more complete description of IPython's parallel computing | ||||
capabilities can be found in IPython's online documentation | ||||
Brian Granger
|
r2344 | (http://ipython.scipy.org/moin/Documentation). | ||
Setting up your Windows cluster | ||||
=============================== | ||||
This document assumes that you already have a cluster running Windows | ||||
HPC Server 2008. Here is a broad overview of what is involved with setting up | ||||
such a cluster: | ||||
1. Install Windows Server 2008 on the head and compute nodes in the cluster. | ||||
2. Setup the network configuration on each host. Each host should have a | ||||
static IP address. | ||||
3. On the head node, activate the "Active Directory Domain Services" role | ||||
and make the head node the domain controller. | ||||
4. Join the compute nodes to the newly created Active Directory (AD) domain. | ||||
5. Setup user accounts in the domain with shared home directories. | ||||
6. Install the HPC Pack 2008 on the head node to create a cluster. | ||||
7. Install the HPC Pack 2008 on the compute nodes. | ||||
More details about installing and configuring Windows HPC Server 2008 can be | ||||
found on the Windows HPC Home Page (http://www.microsoft.com/hpc). Regardless | ||||
Brian Granger
|
r2345 | of what steps you follow to set up your cluster, the remainder of this | ||
Brian Granger
|
r2344 | document will assume that: | ||
* There are domain users that can log on to the AD domain and submit jobs | ||||
to the cluster scheduler. | ||||
* These domain users have shared home directories. While shared home | ||||
directories are not required to use IPython, they make it much easier to | ||||
use IPython. | ||||
Installation of IPython and its dependencies | ||||
============================================ | ||||
IPython and all of its dependencies are freely available and open source. | ||||
These packages provide a powerful and cost-effective approach to numerical and | ||||
scientific computing on Windows. The following dependencies are needed to run | ||||
IPython on Windows: | ||||
* Python 2.5 or 2.6 (http://www.python.org) | ||||
* pywin32 (http://sourceforge.net/projects/pywin32/) | ||||
* PyReadline (https://launchpad.net/pyreadline) | ||||
* zope.interface and Twisted (http://twistedmatrix.com) | ||||
* Foolcap (http://foolscap.lothar.com/trac) | ||||
* pyOpenSSL (https://launchpad.net/pyopenssl) | ||||
* IPython (http://ipython.scipy.org) | ||||
Brian Granger
|
r2345 | In addition, the following dependencies are needed to run the demos described | ||
in this document. | ||||
Brian Granger
|
r2344 | |||
* NumPy and SciPy (http://www.scipy.org) | ||||
* wxPython (http://www.wxpython.org) | ||||
* Matplotlib (http://matplotlib.sourceforge.net/) | ||||
The easiest way of obtaining these dependencies is through the Enthought | ||||
Python Distribution (EPD) (http://www.enthought.com/products/epd.php). EPD is | ||||
produced by Enthought, Inc. and contains all of these packages and others in a | ||||
single installer and is available free for academic users. While it is also | ||||
possible to download and install each package individually, this is a tedious | ||||
process. Thus, we highly recommend using EPD to install these packages on | ||||
Windows. | ||||
Regardless of how you install the dependencies, here are the steps you will | ||||
need to follow: | ||||
1. Install all of the packages listed above, either individually or using EPD | ||||
on the head node, compute nodes and user workstations. | ||||
2. Make sure that :file:`C:\\Python25` and :file:`C:\\Python25\\Scripts` are | ||||
in the system :envvar:`%PATH%` variable on each node. | ||||
3. Install the latest development version of IPython. This can be done by | ||||
downloading the the development version from the IPython website | ||||
(http://ipython.scipy.org) and following the installation instructions. | ||||
Further details about installing IPython or its dependencies can be found in | ||||
the online IPython documentation (http://ipython.scipy.org/moin/Documentation) | ||||
Once you are finished with the installation, you can try IPython out by | ||||
Brian Granger
|
r2345 | opening a Windows Command Prompt and typing ``ipython``. This will | ||
Brian Granger
|
r2344 | start IPython's interactive shell and you should see something like the | ||
following screenshot: | ||||
.. image:: ipython_shell.* | ||||
Starting an IPython cluster | ||||
=========================== | ||||
To use IPython's parallel computing capabilities, you will need to start an | ||||
IPython cluster. An IPython cluster consists of one controller and multiple | ||||
engines: | ||||
IPython controller | ||||
The IPython controller manages the engines and acts as a gateway between | ||||
the engines and the client, which runs in the user's interactive IPython | ||||
session. The controller is started using the :command:`ipcontroller` | ||||
command. | ||||
IPython engine | ||||
Brian Granger
|
r2345 | IPython engines run a user's Python code in parallel on the compute nodes. | ||
Brian Granger
|
r2344 | Engines are starting using the :command:`ipengine` command. | ||
Once these processes are started, a user can run Python code interactively and | ||||
Brian Granger
|
r2345 | in parallel on the engines from within the IPython shell using an appropriate | ||
client. This includes the ability to interact with, plot and visualize data | ||||
from the engines. | ||||
Brian Granger
|
r2344 | |||
Brian Granger
|
r2345 | IPython has a command line program called :command:`ipcluster` that automates | ||
Brian Granger
|
r2344 | all aspects of starting the controller and engines on the compute nodes. | ||
:command:`ipcluster` has full support for the Windows HPC job scheduler, | ||||
meaning that :command:`ipcluster` can use this job scheduler to start the | ||||
controller and engines. In our experience, the Windows HPC job scheduler is | ||||
particularly well suited for interactive applications, such as IPython. Once | ||||
:command:`ipcluster` is configured properly, a user can start an IPython | ||||
cluster from their local workstation almost instantly, without having to log | ||||
on to the head node (as is typically required by Unix based job schedulers). | ||||
This enables a user to move seamlessly between serial and parallel | ||||
computations. | ||||
In this section we show how to use :command:`ipcluster` to start an IPython | ||||
cluster using the Windows HPC Server 2008 job scheduler. To make sure that | ||||
:command:`ipcluster` is installed and working properly, you should first try | ||||
to start an IPython cluster on your local host. To do this, open a Windows | ||||
Command Prompt and type the following command:: | ||||
ipcluster start -n 2 | ||||
You should see a number of messages printed to the screen, ending with | ||||
Brian Granger
|
r2345 | "IPython cluster: started". The result should look something like the following | ||
screenshot: | ||||
Brian Granger
|
r2344 | |||
.. image:: ipcluster_start.* | ||||
At this point, the controller and two engines are running on your local host. | ||||
Brian Granger
|
r2345 | This configuration is useful for testing and for situations where you want to | ||
take advantage of multiple cores on your local computer. | ||||
Brian Granger
|
r2344 | |||
Now that we have confirmed that :command:`ipcluster` is working properly, we | ||||
Brian Granger
|
r2345 | describe how to configure and run an IPython cluster on an actual compute | ||
cluster running Windows HPC Server 2008. Here is an outline of the needed | ||||
steps: | ||||
Brian Granger
|
r2344 | |||
Brian Granger
|
r2345 | 1. Create a cluster profile using: ``ipcluster create -p mycluster`` | ||
Brian Granger
|
r2344 | |||
Brian Granger
|
r2345 | 2. Edit configuration files in the directory :file:`.ipython\\cluster_mycluster` | ||
Brian Granger
|
r2344 | |||
Brian Granger
|
r2345 | 3. Start the cluster using: ``ipcluser start -p mycluster -n 32`` | ||
Brian Granger
|
r2344 | |||
Creating a cluster profile | ||||
-------------------------- | ||||
Brian Granger
|
r2345 | In most cases, you will have to create a cluster profile to use IPython on a | ||
cluster. A cluster profile is a name (like "mycluster") that is associated | ||||
with a particular cluster configuration. The profile name is used by | ||||
:command:`ipcluster` when working with the cluster. | ||||
Associated with each cluster profile is a cluster directory. This cluster | ||||
directory is a specially named directory (typically located in the | ||||
:file:`.ipython` subdirectory of your home directory) that contains the | ||||
configuration files for a particular cluster profile, as well as log files and | ||||
security keys. The naming convention for cluster directories is: | ||||
:file:`cluster_<profile name>`. Thus, the cluster directory for a profile named | ||||
"foo" would be :file:`.ipython\\cluster_foo`. | ||||
Brian Granger
|
r2344 | |||
Brian Granger
|
r2345 | To create a new cluster profile (named "mycluster") and the associated cluster | ||
directory, type the following command at the Windows Command Prompt:: | ||||
Brian Granger
|
r2344 | |||
ipcluster create -p mycluster | ||||
The output of this command is shown in the screenshot below. Notice how | ||||
:command:`ipcluster` prints out the location of the newly created cluster | ||||
directory. | ||||
.. image:: ipcluster_create.* | ||||
Configuring a cluster profile | ||||
----------------------------- | ||||
Next, you will need to configure the newly created cluster profile by editing | ||||
the following configuration files in the cluster directory: | ||||
* :file:`ipcluster_config.py` | ||||
* :file:`ipcontroller_config.py` | ||||
* :file:`ipengine_config.py` | ||||
When :command:`ipcluster` is run, these configuration files are used to | ||||
determine how the engines and controller will be started. In most cases, | ||||
you will only have to set a few of the attributes in these files. | ||||
To configure :command:`ipcluster` to use the Windows HPC job scheduler, you | ||||
will need to edit the following attributes in the file | ||||
:file:`ipcluster_config.py`:: | ||||
# Set these at the top of the file to tell ipcluster to use the | ||||
# Windows HPC job scheduler. | ||||
c.Global.controller_launcher = \ | ||||
'IPython.kernel.launcher.WindowsHPCControllerLauncher' | ||||
c.Global.engine_launcher = \ | ||||
'IPython.kernel.launcher.WindowsHPCEngineSetLauncher' | ||||
# Set these to the host name of the scheduler (head node) of your cluster. | ||||
c.WindowsHPCControllerLauncher.scheduler = 'HEADNODE' | ||||
c.WindowsHPCEngineSetLauncher.scheduler = 'HEADNODE' | ||||
There are a number of other configuration attributes that can be set, but | ||||
in most cases these will be sufficient to get you started. | ||||
.. warning:: | ||||
If any of your configuration attributes involve specifying the location | ||||
of shared directories or files, you must make sure that you use UNC paths | ||||
like :file:`\\\\host\\share`. It is also important that you specify | ||||
Brian Granger
|
r2345 | these paths using raw Python strings: ``r'\\host\share'`` to make sure | ||
that the backslashes are properly escaped. | ||||
Brian Granger
|
r2344 | |||
Starting the cluster profile | ||||
---------------------------- | ||||
Once a cluster profile has been configured, starting an IPython cluster using | ||||
Brian Granger
|
r2345 | the profile is simple:: | ||
Brian Granger
|
r2344 | |||
ipcluster start -p mycluster -n 32 | ||||
Brian Granger
|
r2345 | The ``-n`` option tells :command:`ipcluster` how many engines to start (in | ||
this case 32). Stopping the cluster is as simple as typing Control-C. | ||||
Brian Granger
|
r2344 | |||
Using the HPC Job Manager | ||||
------------------------- | ||||
When ``ipcluster start`` is run the first time, :command:`ipcluster` creates | ||||
two XML job description files in the cluster directory: | ||||
* :file:`ipcontroller_job.xml` | ||||
* :file:`ipengineset_job.xml` | ||||
Once these files have been created, they can be imported into the HPC Job | ||||
Manager application. Then, the controller and engines for that profile can be | ||||
started using the HPC Job Manager directly, without using :command:`ipcluster`. | ||||
However, anytime the cluster profile is re-configured, ``ipcluster start`` | ||||
Brian Granger
|
r2345 | must be run again to regenerate the XML job description files. The | ||
Brian Granger
|
r2344 | following screenshot shows what the HPC Job Manager interface looks like | ||
with a running IPython cluster. | ||||
.. image:: hpc_job_manager.* | ||||
Performing a simple interactive parallel computation | ||||
==================================================== | ||||
Once you have started your IPython cluster, you can start to use it. To do | ||||
Brian Granger
|
r2345 | this, open up a new Windows Command Prompt and start up IPython's interactive | ||
shell by typing:: | ||||
Brian Granger
|
r2344 | |||
ipython | ||||
Brian Granger
|
r2345 | Then you can create a :class:`MultiEngineClient` instance for your profile and | ||
use the resulting instance to do a simple interactive parallel computation. In | ||||
the code and screenshot that follows, we take a simple Python function and | ||||
apply it to each element of an array of integers in parallel using the | ||||
:meth:`MultiEngineClient.map` method: | ||||
.. sourcecode:: ipython | ||||
In [1]: from IPython.kernel.client import * | ||||
In [2]: mec = MultiEngineClient(profile='mycluster') | ||||
In [3]: mec.get_ids() | ||||
Out[3]: [0, 1, 2, 3, 4, 5, 67, 8, 9, 10, 11, 12, 13, 14] | ||||
In [4]: def f(x): | ||||
...: return x**10 | ||||
In [5]: mec.map(f, range(15)) # f is applied in parallel | ||||
Out[5]: | ||||
[0, | ||||
1, | ||||
1024, | ||||
59049, | ||||
1048576, | ||||
9765625, | ||||
60466176, | ||||
282475249, | ||||
1073741824, | ||||
3486784401L, | ||||
10000000000L, | ||||
25937424601L, | ||||
61917364224L, | ||||
137858491849L, | ||||
289254654976L] | ||||
Brian Granger
|
r2344 | |||
The :meth:`map` method has the same signature as Python's builtin :func:`map` | ||||
function, but runs the calculation in parallel. More involved examples of using | ||||
:class:`MultiEngineClient` are provided in the examples that follow. | ||||
.. image:: mec_simple.* | ||||