upstream/ipython Commit - r4199:81d2608c

Update info in parallel docs.

Thomas Kluyver -

r4199:81d2608c

parent child

docs/source/parallel/parallel_mpi.txt

0 0 -5

              .. _parallelmpi:
              =======================
              Using MPI with IPython
              =======================
-             .. note::
-                 Not adapted to zmq yet
-                 This is out of date wrt ipcluster in general as well
              Often, a parallel algorithm will require moving data between the engines. One
              way of accomplishing this is by doing a pull and then a push using the
              multiengine client. However, this will be slow as all the data has to go
              through the controller to the client and then back through the controller, to
              its final destination.
              A much better way of moving data between engines is to use a message passing
              library, such as the Message Passing Interface (MPI) [MPI]_. IPython's
              parallel computing architecture has been designed from the ground up to
              integrate with MPI. This document describes how to use MPI with IPython.
              Additional installation requirements
              ====================================
              If you want to use MPI with IPython, you will need to install:
              * A standard MPI implementation such as OpenMPI [OpenMPI]_ or MPICH.
              * The mpi4py [mpi4py]_ package.
              .. note::
                  The mpi4py package is not a strict requirement. However, you need to
                  have *some* way of calling MPI from Python. You also need some way of
                  making sure that :func:`MPI_Init` is called when the IPython engines start
                  up. There are a number of ways of doing this and a good number of
                  associated subtleties. We highly recommend just using mpi4py as it
                  takes care of most of these problems. If you want to do something
                  different, let us know and we can help you get started.
              Starting the engines with MPI enabled
              =====================================
              To use code that calls MPI, there are typically two things that MPI requires.
 . The process that wants to call MPI must be started using
                 :command:`mpiexec` or a batch system (like PBS) that has MPI support.
 . Once the process starts, it must call :func:`MPI_Init`.
              There are a couple of ways that you can start the IPython engines and get
              these things to happen.
              Automatic starting using :command:`mpiexec` and :command:`ipcluster`
              --------------------------------------------------------------------
              The easiest approach is to use the `MPIExec` Launchers in :command:`ipcluster`,
              which will first start a controller and then a set of engines using
              :command:`mpiexec`::
                  $ ipcluster start --n=4 --elauncher=MPIExecEngineSetLauncher
              This approach is best as interrupting :command:`ipcluster` will automatically
              stop and clean up the controller and engines.
              Manual starting using :command:`mpiexec`
              ----------------------------------------
              If you want to start the IPython engines using the :command:`mpiexec`, just
              do::
                  $ mpiexec n=4 ipengine --mpi=mpi4py
              This requires that you already have a controller running and that the FURL
              files for the engines are in place. We also have built in support for
              PyTrilinos [PyTrilinos]_, which can be used (assuming is installed) by
              starting the engines with::
                  $ mpiexec n=4 ipengine --mpi=pytrilinos
              Automatic starting using PBS and :command:`ipcluster`
              ------------------------------------------------------
              The :command:`ipcluster` command also has built-in integration with PBS. For
              more information on this approach, see our documentation on :ref:`ipcluster
              <parallel_process>`.
              Actually using MPI
              ==================
              Once the engines are running with MPI enabled, you are ready to go. You can
              now call any code that uses MPI in the IPython engines. And, all of this can
              be done interactively. Here we show a simple example that uses mpi4py
              [mpi4py]_ version 1.1.0 or later.
              First, lets define a simply function that uses MPI to calculate the sum of a
              distributed array. Save the following text in a file called :file:`psum.py`:
              .. sourcecode:: python
                  from mpi4py import MPI
                  import numpy as np
                  def psum(a):
                      s = np.sum(a)
                      rcvBuf = np.array(0.0,'d')
                      MPI.COMM_WORLD.Allreduce([s, MPI.DOUBLE],
                          [rcvBuf, MPI.DOUBLE],
                          op=MPI.SUM)
                      return rcvBuf
              Now, start an IPython cluster::
                  $ ipcluster start --profile=mpi --n=4
              .. note::
                  It is assumed here that the mpi profile has been set up, as described :ref:`here
                  <parallel_process>`.
              Finally, connect to the cluster and use this function interactively. In this
              case, we create a random array on each engine and sum up all the random arrays
              using our :func:`psum` function:
              .. sourcecode:: ipython
                  In [1]: from IPython.parallel import Client
                  In [2]: %load_ext parallel_magic
                  In [3]: c = Client(profile='mpi')
                  In [4]: view = c[:]
                  In [5]: view.activate()
                  # run the contents of the file on each engine:
                  In [6]: view.run('psum.py')
                  In [6]: px a = np.random.rand(100)
                  Parallel execution on engines: [0,1,2,3]
                  In [8]: px s = psum(a)
                  Parallel execution on engines: [0,1,2,3]
                  In [9]: view['s']
                  Out[9]: [187.451545803,187.451545803,187.451545803,187.451545803]
              Any Python code that makes calls to MPI can be used in this manner, including
              compiled C, C++ and Fortran libraries that have been exposed to Python.
              .. [MPI] Message Passing Interface.  http://www-unix.mcs.anl.gov/mpi/
              .. [mpi4py] MPI for Python. mpi4py: http://mpi4py.scipy.org/
              .. [OpenMPI] Open MPI. http://www.open-mpi.org/
              .. [PyTrilinos] PyTrilinos. http://trilinos.sandia.gov/packages/pytrilinos/

docs/source/parallel/parallel_security.txt

0 0 -69

              .. _parallelsecurity:
              ===========================
              Security details of IPython
              ===========================
              .. note::
                  This section is not thorough, and IPython.zmq needs a thorough security
                  audit.
              IPython's :mod:`IPython.zmq` package exposes the full power of the
              Python interpreter over a TCP/IP network for the purposes of parallel
              computing. This feature brings up the important question of IPython's security
              model. This document gives details about this model and how it is implemented
              in IPython's architecture.
              Process and network topology
              ============================
              To enable parallel computing, IPython has a number of different processes that
              run. These processes are discussed at length in the IPython documentation and
              are summarized here:
              * The IPython *engine*.  This process is a full blown Python
                interpreter in which user code is executed.  Multiple
                engines are started to make parallel computing possible.
              * The IPython *hub*.  This process monitors a set of
                engines and schedulers, and keeps track of the state of the processes. It listens
                for registration connections from engines and clients, and monitor connections
                from schedulers.
              * The IPython *schedulers*. This is a set of processes that relay commands and results
                between clients and engines. They are typically on the same machine as the controller,
                and listen for connections from engines and clients, but connect to the Hub.
              * The IPython *client*.  This process is typically an
                interactive Python process that is used to coordinate the
                engines to get a parallel computation done.
              Collectively, these processes are called the IPython *cluster*, and the hub and schedulers
              together are referred to as the *controller*.
              These processes communicate over any transport supported by ZeroMQ (tcp,pgm,infiniband,ipc)
              with a well defined topology. The IPython hub and schedulers listen on sockets. Upon
              starting, an engine connects to a hub and registers itself, which then informs the engine
              of the connection information for the schedulers, and the engine then connects to the
              schedulers. These engine/hub and engine/scheduler connections persist for the
              lifetime of each engine.
              The IPython client also connects to the controller processes using a number of socket
              connections. As of writing, this is one socket per scheduler (4), and 3 connections to the
              hub for a total of 7. These connections persist for the lifetime of the client only.
              A given IPython controller and set of engines engines typically has a relatively
              short lifetime. Typically this lifetime corresponds to the duration of a single parallel
              simulation performed by a single user. Finally, the hub, schedulers, engines, and client
              processes typically execute with the permissions of that same user. More specifically, the
              controller and engines are *not* executed as root or with any other superuser permissions.
              Application logic
              =================
              When running the IPython kernel to perform a parallel computation, a user
              utilizes the IPython client to send Python commands and data through the
              IPython schedulers to the IPython engines, where those commands are executed
              and the data processed. The design of IPython ensures that the client is the
              only access point for the capabilities of the engines. That is, the only way
              of addressing the engines is through a client.
              A user can utilize the client to instruct the IPython engines to execute
              arbitrary Python commands. These Python commands can include calls to the
              system shell, access the filesystem, etc., as required by the user's
              application code. From this perspective, when a user runs an IPython engine on
              a host, that engine has the same capabilities and permissions as the user
              themselves (as if they were logged onto the engine's host with a terminal).
              Secure network connections
              ==========================
              Overview
              --------
              ZeroMQ provides exactly no security. For this reason, users of IPython must be very
              careful in managing connections, because an open TCP/IP socket presents access to
              arbitrary execution as the user on the engine machines. As a result, the default behavior
              of controller processes is to only listen for clients on the loopback interface, and the
              client must establish SSH tunnels to connect to the controller processes.
              .. warning::
                  If the controller's loopback interface is untrusted, then IPython should be considered
                  vulnerable, and this extends to the loopback of all connected clients, which have
                  opened a loopback port that is redirected to the controller's loopback port.
              SSH
              ---
              Since ZeroMQ provides no security, SSH tunnels are the primary source of secure
              connections. A connector file, such as `ipcontroller-client.json`, will contain
              information for connecting to the controller, possibly including the address of an
              ssh-server through with the client is to tunnel. The Client object then creates tunnels
              using either [OpenSSH]_ or [Paramiko]_, depending on the platform. If users do not wish to
              use OpenSSH or Paramiko, or the tunneling utilities are insufficient, then they may
              construct the tunnels themselves, and simply connect clients and engines as if the
              controller were on loopback on the connecting machine.
              .. note::
                  There is not currently tunneling available for engines.
              Authentication
              --------------
              To protect users of shared machines, [HMAC]_ digests are used to sign messages, using a
              shared key.
              The Session object that handles the message protocol uses a unique key to verify valid
              messages. This can be any value specified by the user, but the default behavior is a
              pseudo-random 128-bit number, as generated by `uuid.uuid4()`. This key is used to
              initialize an HMAC object, which digests all messages, and includes that digest as a
              signature and part of the message. Every message that is unpacked (on Controller, Engine,
              and Client) will also be digested by the receiver, ensuring that the sender's key is the
              same as the receiver's. No messages that do not contain this key are acted upon in any
              way. The key itself is never sent over the network.
              There is exactly one shared key per cluster - it must be the same everywhere. Typically,
              the controller creates this key, and stores it in the private connection files
              `ipython-{engine|client}.json`. These files are typically stored in the
              `~/.ipython/profile_<name>/security` directory, and are maintained as readable only by the
              owner, just as is common practice with a user's keys in their `.ssh` directory.
              .. warning::
                  It is important to note that the key authentication, as emphasized by the use of
                  a uuid rather than generating a key with a cryptographic library, provides a
                  defense against *accidental* messages more than it does against malicious attacks.
                  If loopback is compromised, it would be trivial for an attacker to intercept messages
                  and deduce the key, as there is no encryption.
              Specific security vulnerabilities
              =================================
              There are a number of potential security vulnerabilities present in IPython's
              architecture. In this section we discuss those vulnerabilities and detail how
              the security architecture described above prevents them from being exploited.
              Unauthorized clients
              --------------------
              The IPython client can instruct the IPython engines to execute arbitrary
              Python code with the permissions of the user who started the engines. If an
              attacker were able to connect their own hostile IPython client to the IPython
              controller, they could instruct the engines to execute code.
              On the first level, this attack is prevented by requiring access to the controller's
              ports, which are recommended to only be open on loopback if the controller is on an
              untrusted local network. If the attacker does have access to the Controller's ports, then
              the attack is prevented by the capabilities based client authentication of the execution
              key. The relevant authentication information is encoded into the JSON file that clients
              must present to gain access to the IPython controller. By limiting the distribution of
              those keys, a user can grant access to only authorized persons, just as with SSH keys.
              It is highly unlikely that an execution key could be guessed by an attacker
              in a brute force guessing attack. A given instance of the IPython controller
              only runs for a relatively short amount of time (on the order of hours). Thus
              an attacker would have only a limited amount of time to test a search space of
              size 2**128.  For added security, users can have arbitrarily long keys.
              .. warning::
                  If the attacker has gained enough access to intercept loopback connections on *either* the
                  controller or client, then a duplicate message can be sent. To protect against this,
                  recipients only allow each signature once, and consider duplicates invalid.  However,
                  the duplicate message could be sent to *another* recipient using the same key,
                  and it would be considered valid.
              Unauthorized engines
              --------------------
              If an attacker were able to connect a hostile engine to a user's controller,
              the user might unknowingly send sensitive code or data to the hostile engine.
              This attacker's engine would then have full access to that code and data.
              This type of attack is prevented in the same way as the unauthorized client
              attack, through the usage of the capabilities based authentication scheme.
              Unauthorized controllers
              ------------------------
              It is also possible that an attacker could try to convince a user's IPython
              client or engine to connect to a hostile IPython controller. That controller
              would then have full access to the code and data sent between the IPython
              client and the IPython engines.
              Again, this attack is prevented through the capabilities in a connection file, which
              ensure that a client or engine connects to the correct controller. It is also important to
              note that the connection files also encode the IP address and port that the controller is
              listening on, so there is little chance of mistakenly connecting to a controller running
              on a different IP address and port.
              When starting an engine or client, a user must specify the key to use
              for that connection. Thus, in order to introduce a hostile controller, the
              attacker must convince the user to use the key associated with the
              hostile controller. As long as a user is diligent in only using keys from
              trusted sources, this attack is not possible.
              .. note::
                  I may be wrong, the unauthorized controller may be easier to fake than this.
              Other security measures
              =======================
              A number of other measures are taken to further limit the security risks
              involved in running the IPython kernel.
              First, by default, the IPython controller listens on random port numbers.
              While this can be overridden by the user, in the default configuration, an
              attacker would have to do a port scan to even find a controller to attack.
              When coupled with the relatively short running time of a typical controller
              (on the order of hours), an attacker would have to work extremely hard and
              extremely *fast* to even find a running controller to attack.
              Second, much of the time, especially when run on supercomputers or clusters,
              the controller is running behind a firewall. Thus, for engines or client to
              connect to the controller:
              * The different processes have to all be behind the firewall.
              or:
              * The user has to use SSH port forwarding to tunnel the
                connections through the firewall.
              In either case, an attacker is presented with additional barriers that prevent
              attacking or even probing the system.
              Summary
              =======
              IPython's architecture has been carefully designed with security in mind. The
              capabilities based authentication model, in conjunction with SSH tunneled
              TCP/IP channels, address the core potential vulnerabilities in the system,
              while still enabling user's to use the system in open networks.
-             Other questions
-             ===============
-             .. note::
-                 this does not apply to ZMQ, but I am sure there will be questions.
-             About keys
-             ----------
-             Can you clarify the roles of the certificate and its keys versus the FURL,
-             which is also called a key?
-             The certificate created by IPython processes is a standard public key x509
-             certificate, that is used by the SSL handshake protocol to setup encrypted
-             channel between the controller and the IPython engine or client. This public
-             and private key associated with this certificate are used only by the SSL
-             handshake protocol in setting up this encrypted channel.
-             The FURL serves a completely different and independent purpose from the
-             key pair associated with the certificate. When we refer to a FURL as a
-             key, we are using the word "key" in the capabilities based security model
-             sense. This has nothing to do with "key" in the public/private key sense used
-             in the SSL protocol.
-             With that said the FURL is used as an cryptographic key, to grant
-             IPython engines and clients access to particular capabilities that the
-             controller offers.
-             Self signed certificates
-             ------------------------
-             Is the controller creating a self-signed certificate? Is this created for per
-             instance/session, one-time-setup or each-time the controller is started?
-             The Foolscap network protocol, which handles the SSL protocol details, creates
-             a self-signed x509 certificate using OpenSSL for each IPython process. The
-             lifetime of the certificate is handled differently for the IPython controller
-             and the engines/client.
-             For the IPython engines and client, the certificate is only held in memory for
-             the lifetime of its process. It is never written to disk.
-             For the controller, the certificate can be created anew each time the
-             controller starts or it can be created once and reused each time the
-             controller starts. If at any point, the certificate is deleted, a new one is
-             created the next time the controller starts.
-             SSL private key
-             ---------------
-             How the private key (associated with the certificate) is distributed?
-             In the usual implementation of the SSL protocol, the private key is never
-             distributed. We follow this standard always.
-             SSL versus Foolscap authentication
-             ----------------------------------
-             Many SSL connections only perform one sided authentication (the server to the
-             client). How is the client authentication in IPython's system related to SSL
-             authentication?
-             We perform a two way SSL handshake in which both parties request and verify
-             the certificate of their peer. This mutual authentication is handled by the
-             SSL handshake and is separate and independent from the additional
-             authentication steps that the CLIENT and SERVER perform after an encrypted
-             channel is established.
              .. [RFC5246] <http://tools.ietf.org/html/rfc5246>
              .. [OpenSSH] <http://www.openssh.com/>
              .. [Paramiko] <http://www.lag.net/paramiko/>
              .. [HMAC] <http://tools.ietf.org/html/rfc2104.html>

General Comments 0

Write
Preview

You need to be logged in to leave comments. Login now

No TODOs yet

	Site-wide shortcuts
/	Use quick search box
g h	Goto home page
g g	Goto my private gists page
g G	Goto my public gists page
g 0-9	Goto bookmarked items from 0-9
n r	New repository page
n g	New gist page

	Repositories
g s	Goto summary page
g c	Goto changelog page
g f	Goto files page
g F	Goto files page with file search activated
g p	Goto pull requests page
g o	Goto repository settings
g O	Goto repository access permissions settings
t s	Toggle sidebar on some pages