|
|
.. _parallelsecurity:
|
|
|
|
|
|
===========================
|
|
|
Security details of IPython
|
|
|
===========================
|
|
|
|
|
|
.. note::
|
|
|
|
|
|
This section is not thorough, and IPython.zmq needs a thorough security
|
|
|
audit.
|
|
|
|
|
|
IPython's :mod:`IPython.zmq` package exposes the full power of the
|
|
|
Python interpreter over a TCP/IP network for the purposes of parallel
|
|
|
computing. This feature brings up the important question of IPython's security
|
|
|
model. This document gives details about this model and how it is implemented
|
|
|
in IPython's architecture.
|
|
|
|
|
|
Processs and network topology
|
|
|
=============================
|
|
|
|
|
|
To enable parallel computing, IPython has a number of different processes that
|
|
|
run. These processes are discussed at length in the IPython documentation and
|
|
|
are summarized here:
|
|
|
|
|
|
* The IPython *engine*. This process is a full blown Python
|
|
|
interpreter in which user code is executed. Multiple
|
|
|
engines are started to make parallel computing possible.
|
|
|
* The IPython *hub*. This process monitors a set of
|
|
|
engines and schedulers, and keeps track of the state of the processes. It listens
|
|
|
for registration connections from engines and clients, and monitor connections
|
|
|
from schedulers.
|
|
|
* The IPython *schedulers*. This is a set of processes that relay commands and results
|
|
|
between clients and engines. They are typically on the same machine as the controller,
|
|
|
and listen for connections from engines and clients, but connect to the Hub.
|
|
|
* The IPython *client*. This process is typically an
|
|
|
interactive Python process that is used to coordinate the
|
|
|
engines to get a parallel computation done.
|
|
|
|
|
|
Collectively, these processes are called the IPython *kernel*, and the hub and schedulers
|
|
|
together are referred to as the *controller*.
|
|
|
|
|
|
.. note::
|
|
|
|
|
|
Are these really still referred to as the Kernel? It doesn't seem so to me. 'cluster'
|
|
|
seems more accurate.
|
|
|
|
|
|
-MinRK
|
|
|
|
|
|
These processes communicate over any transport supported by ZeroMQ (tcp,pgm,infiniband,ipc)
|
|
|
with a well defined topology. The IPython hub and schedulers listen on sockets. Upon
|
|
|
starting, an engine connects to a hub and registers itself, which then informs the engine
|
|
|
of the connection information for the schedulers, and the engine then connects to the
|
|
|
schedulers. These engine/hub and engine/scheduler connections persist for the
|
|
|
lifetime of each engine.
|
|
|
|
|
|
The IPython client also connects to the controller processes using a number of socket
|
|
|
connections. As of writing, this is one socket per scheduler (4), and 3 connections to the
|
|
|
hub for a total of 7. These connections persist for the lifetime of the client only.
|
|
|
|
|
|
A given IPython controller and set of engines engines typically has a relatively
|
|
|
short lifetime. Typically this lifetime corresponds to the duration of a single parallel
|
|
|
simulation performed by a single user. Finally, the hub, schedulers, engines, and client
|
|
|
processes typically execute with the permissions of that same user. More specifically, the
|
|
|
controller and engines are *not* executed as root or with any other superuser permissions.
|
|
|
|
|
|
Application logic
|
|
|
=================
|
|
|
|
|
|
When running the IPython kernel to perform a parallel computation, a user
|
|
|
utilizes the IPython client to send Python commands and data through the
|
|
|
IPython schedulers to the IPython engines, where those commands are executed
|
|
|
and the data processed. The design of IPython ensures that the client is the
|
|
|
only access point for the capabilities of the engines. That is, the only way
|
|
|
of addressing the engines is through a client.
|
|
|
|
|
|
A user can utilize the client to instruct the IPython engines to execute
|
|
|
arbitrary Python commands. These Python commands can include calls to the
|
|
|
system shell, access the filesystem, etc., as required by the user's
|
|
|
application code. From this perspective, when a user runs an IPython engine on
|
|
|
a host, that engine has the same capabilities and permissions as the user
|
|
|
themselves (as if they were logged onto the engine's host with a terminal).
|
|
|
|
|
|
Secure network connections
|
|
|
==========================
|
|
|
|
|
|
Overview
|
|
|
--------
|
|
|
|
|
|
ZeroMQ provides exactly no security. For this reason, users of IPython must be very
|
|
|
careful in managing connections, because an open TCP/IP socket presents access to
|
|
|
arbitrary execution as the user on the engine machines. As a result, the default behavior
|
|
|
of controller processes is to only listen for clients on the loopback interface, and the
|
|
|
client must establish SSH tunnels to connect to the controller processes.
|
|
|
|
|
|
.. warning::
|
|
|
|
|
|
If the controller's loopback interface is untrusted, then IPython should be considered
|
|
|
vulnerable, and this extends to the loopback of all connected clients, which have
|
|
|
opened a loopback port that is redirected to the controller's loopback port.
|
|
|
|
|
|
|
|
|
SSH
|
|
|
---
|
|
|
|
|
|
Since ZeroMQ provides no security, SSH tunnels are the primary source of secure
|
|
|
connections. A connector file, such as `ipcontroller-client.json`, will contain
|
|
|
information for connecting to the controller, possibly including the address of an
|
|
|
ssh-server through with the client is to tunnel. The Client object then creates tunnels
|
|
|
using either [OpenSSH]_ or [Paramiko]_, depending on the platform. If users do not wish to
|
|
|
use OpenSSH or Paramiko, or the tunneling utilities are insufficient, then they may
|
|
|
construct the tunnels themselves, and simply connect clients and engines as if the
|
|
|
controller were on loopback on the connecting machine.
|
|
|
|
|
|
.. note::
|
|
|
|
|
|
There is not currently tunneling available for engines.
|
|
|
|
|
|
Authentication
|
|
|
--------------
|
|
|
|
|
|
To protect users of shared machines, an execution key is used to authenticate all messages.
|
|
|
|
|
|
The Session object that handles the message protocol uses a unique key to verify valid
|
|
|
messages. This can be any value specified by the user, but the default behavior is a
|
|
|
pseudo-random 128-bit number, as generated by `uuid.uuid4()`. This key is checked on every
|
|
|
message everywhere it is unpacked (Controller, Engine, and Client) to ensure that it came
|
|
|
from an authentic user, and no messages that do not contain this key are acted upon in any
|
|
|
way.
|
|
|
|
|
|
There is exactly one key per cluster - it must be the same everywhere. Typically, the
|
|
|
controller creates this key, and stores it in the private connection files
|
|
|
`ipython-{engine|client}.json`. These files are typically stored in the
|
|
|
`~/.ipython/cluster_<profile>/security` directory, and are maintained as readable only by
|
|
|
the owner, just as is common practice with a user's keys in their `.ssh` directory.
|
|
|
|
|
|
.. warning::
|
|
|
|
|
|
It is important to note that the key authentication, as emphasized by the use of
|
|
|
a uuid rather than generating a key with a cryptographic library, provides a
|
|
|
defense against *accidental* messages more than it does against malicious attacks.
|
|
|
If loopback is compromised, it would be trivial for an attacker to intercept messages
|
|
|
and deduce the key, as there is no encryption.
|
|
|
|
|
|
|
|
|
|
|
|
Specific security vulnerabilities
|
|
|
=================================
|
|
|
|
|
|
There are a number of potential security vulnerabilities present in IPython's
|
|
|
architecture. In this section we discuss those vulnerabilities and detail how
|
|
|
the security architecture described above prevents them from being exploited.
|
|
|
|
|
|
Unauthorized clients
|
|
|
--------------------
|
|
|
|
|
|
The IPython client can instruct the IPython engines to execute arbitrary
|
|
|
Python code with the permissions of the user who started the engines. If an
|
|
|
attacker were able to connect their own hostile IPython client to the IPython
|
|
|
controller, they could instruct the engines to execute code.
|
|
|
|
|
|
|
|
|
On the first level, this attack is prevented by requiring access to the controller's
|
|
|
ports, which are recommended to only be open on loopback if the controller is on an
|
|
|
untrusted local network. If the attacker does have access to the Controller's ports, then
|
|
|
the attack is prevented by the capabilities based client authentication of the execution
|
|
|
key. The relevant authentication information is encoded into the JSON file that clients
|
|
|
must present to gain access to the IPython controller. By limiting the distribution of
|
|
|
those keys, a user can grant access to only authorized persons, just as with SSH keys.
|
|
|
|
|
|
It is highly unlikely that an execution key could be guessed by an attacker
|
|
|
in a brute force guessing attack. A given instance of the IPython controller
|
|
|
only runs for a relatively short amount of time (on the order of hours). Thus
|
|
|
an attacker would have only a limited amount of time to test a search space of
|
|
|
size 2**128.
|
|
|
|
|
|
.. warning::
|
|
|
|
|
|
If the attacker has gained enough access to intercept loopback connections on
|
|
|
*either* the controller or client, then the key is easily deduced from network
|
|
|
traffic.
|
|
|
|
|
|
|
|
|
Unauthorized engines
|
|
|
--------------------
|
|
|
|
|
|
If an attacker were able to connect a hostile engine to a user's controller,
|
|
|
the user might unknowingly send sensitive code or data to the hostile engine.
|
|
|
This attacker's engine would then have full access to that code and data.
|
|
|
|
|
|
This type of attack is prevented in the same way as the unauthorized client
|
|
|
attack, through the usage of the capabilities based authentication scheme.
|
|
|
|
|
|
Unauthorized controllers
|
|
|
------------------------
|
|
|
|
|
|
It is also possible that an attacker could try to convince a user's IPython
|
|
|
client or engine to connect to a hostile IPython controller. That controller
|
|
|
would then have full access to the code and data sent between the IPython
|
|
|
client and the IPython engines.
|
|
|
|
|
|
Again, this attack is prevented through the capabilities in a connection file, which
|
|
|
ensure that a client or engine connects to the correct controller. It is also important to
|
|
|
note that the connection files also encode the IP address and port that the controller is
|
|
|
listening on, so there is little chance of mistakenly connecting to a controller running
|
|
|
on a different IP address and port.
|
|
|
|
|
|
When starting an engine or client, a user must specify the key to use
|
|
|
for that connection. Thus, in order to introduce a hostile controller, the
|
|
|
attacker must convince the user to use the key associated with the
|
|
|
hostile controller. As long as a user is diligent in only using keys from
|
|
|
trusted sources, this attack is not possible.
|
|
|
|
|
|
.. note::
|
|
|
|
|
|
I may be wrong, the unauthorized controller may be easier to fake than this.
|
|
|
|
|
|
Other security measures
|
|
|
=======================
|
|
|
|
|
|
A number of other measures are taken to further limit the security risks
|
|
|
involved in running the IPython kernel.
|
|
|
|
|
|
First, by default, the IPython controller listens on random port numbers.
|
|
|
While this can be overridden by the user, in the default configuration, an
|
|
|
attacker would have to do a port scan to even find a controller to attack.
|
|
|
When coupled with the relatively short running time of a typical controller
|
|
|
(on the order of hours), an attacker would have to work extremely hard and
|
|
|
extremely *fast* to even find a running controller to attack.
|
|
|
|
|
|
Second, much of the time, especially when run on supercomputers or clusters,
|
|
|
the controller is running behind a firewall. Thus, for engines or client to
|
|
|
connect to the controller:
|
|
|
|
|
|
* The different processes have to all be behind the firewall.
|
|
|
|
|
|
or:
|
|
|
|
|
|
* The user has to use SSH port forwarding to tunnel the
|
|
|
connections through the firewall.
|
|
|
|
|
|
In either case, an attacker is presented with additional barriers that prevent
|
|
|
attacking or even probing the system.
|
|
|
|
|
|
Summary
|
|
|
=======
|
|
|
|
|
|
IPython's architecture has been carefully designed with security in mind. The
|
|
|
capabilities based authentication model, in conjunction with SSH tunneled
|
|
|
TCP/IP channels, address the core potential vulnerabilities in the system,
|
|
|
while still enabling user's to use the system in open networks.
|
|
|
|
|
|
Other questions
|
|
|
===============
|
|
|
|
|
|
.. note::
|
|
|
|
|
|
this does not apply to ZMQ, but I am sure there will be questions.
|
|
|
|
|
|
About keys
|
|
|
----------
|
|
|
|
|
|
Can you clarify the roles of the certificate and its keys versus the FURL,
|
|
|
which is also called a key?
|
|
|
|
|
|
The certificate created by IPython processes is a standard public key x509
|
|
|
certificate, that is used by the SSL handshake protocol to setup encrypted
|
|
|
channel between the controller and the IPython engine or client. This public
|
|
|
and private key associated with this certificate are used only by the SSL
|
|
|
handshake protocol in setting up this encrypted channel.
|
|
|
|
|
|
The FURL serves a completely different and independent purpose from the
|
|
|
key pair associated with the certificate. When we refer to a FURL as a
|
|
|
key, we are using the word "key" in the capabilities based security model
|
|
|
sense. This has nothing to do with "key" in the public/private key sense used
|
|
|
in the SSL protocol.
|
|
|
|
|
|
With that said the FURL is used as an cryptographic key, to grant
|
|
|
IPython engines and clients access to particular capabilities that the
|
|
|
controller offers.
|
|
|
|
|
|
Self signed certificates
|
|
|
------------------------
|
|
|
|
|
|
Is the controller creating a self-signed certificate? Is this created for per
|
|
|
instance/session, one-time-setup or each-time the controller is started?
|
|
|
|
|
|
The Foolscap network protocol, which handles the SSL protocol details, creates
|
|
|
a self-signed x509 certificate using OpenSSL for each IPython process. The
|
|
|
lifetime of the certificate is handled differently for the IPython controller
|
|
|
and the engines/client.
|
|
|
|
|
|
For the IPython engines and client, the certificate is only held in memory for
|
|
|
the lifetime of its process. It is never written to disk.
|
|
|
|
|
|
For the controller, the certificate can be created anew each time the
|
|
|
controller starts or it can be created once and reused each time the
|
|
|
controller starts. If at any point, the certificate is deleted, a new one is
|
|
|
created the next time the controller starts.
|
|
|
|
|
|
SSL private key
|
|
|
---------------
|
|
|
|
|
|
How the private key (associated with the certificate) is distributed?
|
|
|
|
|
|
In the usual implementation of the SSL protocol, the private key is never
|
|
|
distributed. We follow this standard always.
|
|
|
|
|
|
SSL versus Foolscap authentication
|
|
|
----------------------------------
|
|
|
|
|
|
Many SSL connections only perform one sided authentication (the server to the
|
|
|
client). How is the client authentication in IPython's system related to SSL
|
|
|
authentication?
|
|
|
|
|
|
We perform a two way SSL handshake in which both parties request and verify
|
|
|
the certificate of their peer. This mutual authentication is handled by the
|
|
|
SSL handshake and is separate and independent from the additional
|
|
|
authentication steps that the CLIENT and SERVER perform after an encrypted
|
|
|
channel is established.
|
|
|
|
|
|
.. [RFC5246] <http://tools.ietf.org/html/rfc5246>
|
|
|
|
|
|
.. [OpenSSH] <http://www.openssh.com/>
|
|
|
.. [Paramiko] <http://www.lag.net/paramiko/>
|
|
|
|