From 78f12f6cfef2c8971bd403b9b3fdb60b9d1affe6 2008-11-11 23:12:31
From: Brian Granger
Date: 2008-11-11 23:12:31
Subject: [PATCH] Small changes to the security documentation.

---
diff --git a/docs/source/parallel/parallel_security.txt b/docs/source/parallel/parallel_security.txt
new file mode 100644
index 0000000..bacca35
--- /dev/null
+++ b/docs/source/parallel/parallel_security.txt
@@ -0,0 +1,363 @@
+.. _parallelsecurity:
+
+
+===========================
+Security details of IPython
+===========================
+
+IPython's :mod:`IPython.kernel` package exposes the full power of the Python
+interpreter over a TCP/IP network for the purposes of parallel computing. This
+feature raises the important question of IPython's security model. This
+document gives details about this model and how it is implemented in IPython's
+architecture.
+
+Process and network topology
+============================
+
+To enable parallel computing, IPython runs a number of different processes.
+These processes are discussed at length in the IPython documentation and
+are summarized here:
+
+* The IPython *engine*. This process is a full-blown Python
+  interpreter in which user code is executed. Multiple
+  engines are started to make parallel computing possible.
+* The IPython *controller*. This process manages a set of
+  engines, maintaining a queue for each and presenting
+  an asynchronous interface to the set of engines.
+* The IPython *client*. This process is typically an
+  interactive Python process that is used to coordinate the
+  engines to get a parallel computation done.
+
+Collectively, these three processes are called the IPython *kernel*.
+
+These three processes communicate over TCP/IP connections with a well-defined
+topology. The IPython controller is the only process that listens on TCP/IP
+sockets. Upon starting, an engine connects to a controller and registers
+itself with the controller.
+These engine/controller TCP/IP connections persist
+for the lifetime of each engine.
+
+The IPython client also connects to the controller using one or more TCP/IP
+connections. These connections persist only for the lifetime of the client.
+
+A given IPython controller and set of engines typically have a relatively
+short lifetime, usually corresponding to the duration of a single parallel
+simulation performed by a single user. Finally, the controller, engines and
+client processes typically execute with the permissions of that same user.
+More specifically, the controller and engines are *not* executed as root or
+with any other superuser permissions.
+
+Application logic
+=================
+
+When running the IPython kernel to perform a parallel computation, a user
+uses the IPython client to send Python commands and data through the
+IPython controller to the IPython engines, where those commands are executed
+and the data processed. The design of IPython ensures that the client is the
+only access point for the capabilities of the engines. That is, the only way
+of addressing the engines is through a client.
+
+A user can use the client to instruct the IPython engines to execute
+arbitrary Python commands. These Python commands can call the system shell,
+access the filesystem, etc., as required by the user's application code.
+From this perspective, when a user runs an IPython engine on a host, that
+engine has the same capabilities and permissions as the user themselves (as
+if they were logged onto the engine's host with a terminal).
+
+Secure network connections
+==========================
+
+Overview
+--------
+
+All TCP/IP connections between the client and controller, as well as those
+between the engines and controller, are fully encrypted and authenticated.
+This section describes the details of the encryption and authentication
+approach used within IPython.
+
+IPython uses the `Foolscap `_ network
+protocol for all communications between processes. Thus, the details of
+IPython's security model are directly related to those of Foolscap, and much
+of the following discussion is really a discussion of the security that is
+built into Foolscap.
+
+Encryption
+----------
+
+For encryption purposes, IPython and Foolscap use the well-known Secure
+Sockets Layer (SSL) protocol and its successor, Transport Layer Security
+(TLS; version 1.2 is specified in `RFC5246 `_). We use
+the implementation of this protocol provided by the OpenSSL project through
+the `pyOpenSSL `_ Python bindings to OpenSSL.
+
+Authentication
+--------------
+
+IPython clients and engines must also authenticate themselves with the
+controller. This is handled in a `capabilities based security model
+`_. In this model, the
+controller creates a strong cryptographic key or token that represents each
+set of capabilities that the controller offers. Any party who has this key
+and presents it to the controller has full access to the corresponding
+capabilities of the controller. This model is analogous to using a physical
+key to gain access to physical items (capabilities) behind a locked door.
+
+For a capabilities-based authentication system to prevent unauthorized
+access, two things must be ensured:
+
+* The keys must be cryptographically strong. Otherwise attackers could gain
+  access by a simple brute-force key guessing attack.
+* The actual keys must be distributed only to authorized parties.
+
+The keys in Foolscap are called Foolscap URLs or FURLs. The following section
+gives details about how these FURLs are created in Foolscap. The IPython
+controller creates a number of FURLs for different purposes:
+
+* One FURL that grants IPython engines access to the controller. Also
+  implicit in this access is permission to execute code sent by an
+  authenticated IPython client.
+* Two or more FURLs that grant IPython clients access to the controller.
+  Implicit in this access is permission to give the controller's engines
+  code to execute.
+
+Upon starting, the controller creates these different FURLs and writes them
+to files in the user-read-only directory $HOME/.ipython/security. Thus, only
+the user who starts the controller has access to the FURLs.
+
+For an IPython client or engine to authenticate with a controller, it must
+present the appropriate FURL to the controller upon connecting. If the
+FURL matches what the controller expects for a given capability, access is
+granted. If not, access is denied. The exchange of FURLs is done after
+encrypted communications channels have been established to prevent attackers
+from capturing them.
+
+.. note::
+
+    The FURL is similar to an unencrypted (passphrase-less) private key in
+    SSH: anyone who obtains a copy can use it.
+
+Details of the Foolscap handshake
+---------------------------------
+
+In this section we detail the precise security handshake that takes place at
+the beginning of any network connection in IPython. For the purposes of this
+discussion, the SERVER is the IPython controller process and the CLIENT is the
+IPython engine or client process.
+
+Upon starting, all IPython processes do the following:
+
+1. Create a public key x509 certificate (ISO/IEC 9594).
+2. Create a hash of the contents of the certificate using the SHA-1 algorithm.
+   The base-32 encoded version of this hash is saved by the process as its
+   process id (in Foolscap this is actually the Tub id, but here we refer to
+   it as the process id).
+
+Upon starting, the IPython controller also does the following:
+
+1. Save the x509 certificate to disk in a secure location. The CLIENT
+   certificate is never saved to disk.
+2. Create a FURL for each capability that the controller has. There are
+   separate capabilities the controller offers for clients and engines.
+   The FURL is created from a) the process id of the SERVER, b) the IP
+   address and port the SERVER is listening on, and c) a 160-bit,
+   cryptographically secure string that represents the capability (the
+   "capability id").
+3. The FURLs are saved to disk in a secure location on the SERVER's host.
+
+For a CLIENT to be able to connect to the SERVER and access a capability of
+that SERVER, the CLIENT must have knowledge of the FURL for that SERVER's
+capability. This typically requires that the file containing the FURL be
+moved from the SERVER's host to the CLIENT's host. This is done by the end
+user who started the SERVER and wishes to have a CLIENT connect to the SERVER.
+
+When a CLIENT connects to the SERVER, the following handshake protocol takes
+place:
+
+1. The CLIENT tells the SERVER what process (or Tub) id it expects the SERVER
+   to have.
+2. If the SERVER has that process id, it notifies the CLIENT that it will now
+   enter encrypted mode. If the SERVER has a different id, the SERVER aborts.
+3. Both CLIENT and SERVER initiate the SSL handshake protocol.
+4. Both CLIENT and SERVER request the certificate of their peer and verify
+   that certificate. If this succeeds, all further communications are
+   encrypted.
+5. Both CLIENT and SERVER send a hello block containing connection parameters
+   and their process id.
+6. The CLIENT and SERVER check that their peer's stated process id matches the
+   hash of the x509 certificate the peer presented. If not, the connection is
+   aborted.
+7. The CLIENT verifies that the SERVER's stated id matches the id of the
+   SERVER the CLIENT intends to connect to. If not, the connection is
+   aborted.
+8. The CLIENT and SERVER elect a master who decides on the final connection
+   parameters.
+
+The public/private key pair associated with each process's x509 certificate
+is completely hidden from this handshake protocol. It is, however, used
+internally by OpenSSL as part of the SSL handshake (steps 3 and 4).
+Each process keeps its own private key hidden and sends its peer only the
+public key (embedded in the certificate).
+
+Finally, when the CLIENT requests access to a particular SERVER capability,
+the following happens:
+
+1. The CLIENT asks the SERVER for access to a capability by presenting that
+   capability's id.
+2. If the SERVER has a capability with that id, access is granted. If not,
+   access is denied.
+3. Once access has been gained, the CLIENT can use the capability.
+
+Specific security vulnerabilities
+=================================
+
+There are a number of potential security vulnerabilities present in IPython's
+architecture. In this section we discuss those vulnerabilities and detail how
+the security architecture described above prevents them from being exploited.
+
+Unauthorized clients
+--------------------
+
+The IPython client can instruct the IPython engines to execute arbitrary
+Python code with the permissions of the user who started the engines. If an
+attacker were able to connect their own hostile IPython client to the IPython
+controller, they could instruct the engines to execute arbitrary code.
+
+This attack is prevented by the capabilities-based client authentication
+performed after the encrypted channel has been established. The relevant
+authentication information is encoded into the FURL that clients must
+present to gain access to the IPython controller. By limiting the distribution
+of those FURLs, a user can grant access only to authorized persons.
+
+It is highly unlikely that a client FURL could be guessed by an attacker
+in a brute-force guessing attack. A given instance of the IPython controller
+only runs for a relatively short amount of time (on the order of hours). Thus
+an attacker would have only a limited amount of time to test a search space of
+size ``2**320`` (a 160-bit process id combined with a 160-bit capability id).
+Furthermore, even if a controller were to run for a longer amount
+of time, this search space is quite large (larger, for instance, than that
+of a typical username/password pair).
+
+Unauthorized engines
+--------------------
+
+If an attacker were able to connect a hostile engine to a user's controller,
+the user might unknowingly send sensitive code or data to the hostile engine.
+This attacker's engine would then have full access to that code and data.
+
+This type of attack is prevented in the same way as the unauthorized client
+attack, through the capabilities-based authentication scheme.
+
+Unauthorized controllers
+------------------------
+
+It is also possible that an attacker could try to convince a user's IPython
+client or engine to connect to a hostile IPython controller. That controller
+would then have full access to the code and data sent between the IPython
+client and the IPython engines.
+
+Again, this attack is prevented through the FURLs: the process id embedded in
+each FURL must match the hash of the certificate that the controller presents
+during the handshake, which ensures that a client or engine connects to the
+correct controller. The FURLs also encode the IP address and port that the
+controller is listening on, so there is little chance of mistakenly connecting
+to a controller running on a different IP address and port.
+
+When starting an engine or client, a user must specify which FURL to use
+for that connection. Thus, in order to introduce a hostile controller, the
+attacker must convince the user to use the FURLs associated with the
+hostile controller. As long as a user is diligent in only using FURLs from
+trusted sources, this attack is not possible.
+
+Other security measures
+=======================
+
+A number of other measures are taken to further limit the security risks
+involved in running the IPython kernel.
+
+First, by default, the IPython controller listens on randomly chosen ports.
+While this can be overridden by the user, in the default configuration an
+attacker would have to do a port scan to even find a controller to attack.
+When coupled with the relatively short running time of a typical controller
+(on the order of hours), an attacker would have to work extremely hard and
+extremely *fast* to even find a running controller to attack.
+
+Second, much of the time, especially when run on supercomputers or clusters,
+the controller is running behind a firewall. Thus, for engines or clients to
+connect to the controller:
+
+* The different processes have to all be behind the firewall.
+
+or:
+
+* The user has to use SSH port forwarding to tunnel the
+  connections through the firewall.
+
+In either case, an attacker is presented with additional barriers that
+prevent attacking or even probing the system.
+
+Summary
+=======
+
+IPython's architecture has been carefully designed with security in mind. The
+capabilities-based authentication model, in conjunction with the encrypted
+TCP/IP channels, addresses the core potential vulnerabilities in the system,
+while still enabling users to use the system in open networks.
+
+Other questions
+===============
+
+About keys
+----------
+
+Can you clarify the roles of the certificate and its keys versus the FURL,
+which is also called a key?
+
+The certificate created by IPython processes is a standard public key x509
+certificate that is used by the SSL handshake protocol to set up an encrypted
+channel between the controller and the IPython engine or client. The public
+and private keys associated with this certificate are used only by the SSL
+handshake protocol in setting up this encrypted channel.
+
+The FURL serves a completely different and independent purpose from the
+key pair associated with the certificate. When we refer to a FURL as a
+key, we are using the word "key" in the capabilities-based security model
+sense.
+This has nothing to do with "key" in the public/private key sense used
+in the SSL protocol.
+
+With that said, the FURL is used as a cryptographic key to grant
+IPython engines and clients access to particular capabilities that the
+controller offers.
+
+Self-signed certificates
+------------------------
+
+Is the controller creating a self-signed certificate? Is it created once per
+instance/session, once as a one-time setup, or each time the controller is
+started?
+
+The Foolscap network protocol, which handles the SSL protocol details, creates
+a self-signed x509 certificate using OpenSSL for each IPython process. The
+lifetime of the certificate is handled differently for the IPython controller
+and the engines/client.
+
+For the IPython engines and client, the certificate is only held in memory for
+the lifetime of its process. It is never written to disk.
+
+For the controller, the certificate can be created anew each time the
+controller starts, or it can be created once and reused each time the
+controller starts. If at any point the certificate is deleted, a new one is
+created the next time the controller starts.
+
+SSL private key
+---------------
+
+How is the private key (associated with the certificate) distributed?
+
+In the usual implementation of the SSL protocol, the private key is never
+distributed. IPython always follows this practice.
+
+SSL versus Foolscap authentication
+----------------------------------
+
+Many SSL connections only perform one-sided authentication (the server to the
+client). How is the client authentication in IPython's system related to SSL
+authentication?
+
+We perform a two-way SSL handshake in which both parties request and verify
+the certificate of their peer. This mutual authentication is handled by the
+SSL handshake and is separate and independent from the additional
+authentication steps that the CLIENT and SERVER perform after an encrypted
+channel is established.
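The separation described in this document, between the certificate-based identity checked during the handshake and the capability id that grants access, can be illustrated with a small, self-contained Python sketch. This is a minimal model under stated assumptions, not Foolscap's implementation or API: the names `Controller`, `tub_id_from_cert`, `new_capability_id`, `make_furl`, `parse_furl` and `connect` are hypothetical, the "certificate" is a placeholder byte string, the SSL handshake itself (steps 3 and 4) is not performed, and the `pb://TUBID@HOST:PORT/CAPABILITY-ID` layout is only modeled on Foolscap's FURL format for illustration. It shows the process id computed as the base-32 encoded SHA-1 hash of the certificate, a 160-bit random capability id, the check that the peer's certificate hash matches the process id in the FURL (steps 6 and 7), and the final capability lookup.

```python
import base64
import hashlib
import hmac
import secrets


def tub_id_from_cert(cert_bytes: bytes) -> str:
    """Process (Tub) id: base-32 encoding of the SHA-1 hash of the certificate."""
    digest = hashlib.sha1(cert_bytes).digest()  # 20 bytes = 160 bits
    # 20 bytes encode to exactly 32 base-32 characters (no padding).
    # Lowercasing is a cosmetic choice made for this sketch.
    return base64.b32encode(digest).decode("ascii").lower()


def new_capability_id() -> str:
    """A 160-bit, cryptographically secure random capability id."""
    return base64.b32encode(secrets.token_bytes(20)).decode("ascii").lower()


def make_furl(tub_id: str, host: str, port: int, cap_id: str) -> str:
    # Hypothetical layout modeled on Foolscap FURLs: pb://TUBID@HOST:PORT/CAPID
    return f"pb://{tub_id}@{host}:{port}/{cap_id}"


def parse_furl(furl: str):
    """Split a FURL into (tub_id, host, port, cap_id)."""
    if not furl.startswith("pb://"):
        raise ValueError("not a FURL: " + furl)
    tub_id, _, rest = furl[len("pb://"):].partition("@")
    location, _, cap_id = rest.partition("/")
    host, _, port = location.rpartition(":")
    return tub_id, host, int(port), cap_id


class Controller:
    """Toy SERVER: holds a certificate and a table of capabilities."""

    def __init__(self, cert_bytes: bytes, host: str, port: int):
        self.cert_bytes = cert_bytes  # stands in for the x509 certificate
        self.tub_id = tub_id_from_cert(cert_bytes)
        self.host, self.port = host, port
        self._capabilities = {}  # capability id -> capability object

    def register(self, capability) -> str:
        """Create a capability id for `capability` and return its FURL."""
        cap_id = new_capability_id()
        self._capabilities[cap_id] = capability
        return make_furl(self.tub_id, self.host, self.port, cap_id)

    def lookup(self, cap_id: str):
        """Return the capability for `cap_id`, or None if it is unknown."""
        for known_id, capability in self._capabilities.items():
            # Constant-time comparison so timing does not leak id prefixes.
            if hmac.compare_digest(known_id, cap_id):
                return capability
        return None


def connect(furl: str, controller: Controller):
    """CLIENT side: verify the controller's identity, then request access."""
    expected_tub_id, _host, _port, cap_id = parse_furl(furl)
    # Steps 1-2: the SERVER aborts if it does not claim the expected id.
    if controller.tub_id != expected_tub_id:
        raise ConnectionError("controller has a different process id")
    # Steps 3-4 (the SSL handshake) are not modeled; we assume the peer
    # presented `controller.cert_bytes` as its certificate.
    presented_cert = controller.cert_bytes
    # Steps 6-7: the certificate hash must match the id in the FURL.
    if tub_id_from_cert(presented_cert) != expected_tub_id:
        raise ConnectionError("certificate does not match the FURL")
    # Capability request: access is granted only for a known capability id.
    capability = controller.lookup(cap_id)
    if capability is None:
        raise PermissionError("unknown capability id")
    return capability


if __name__ == "__main__":
    controller = Controller(b"placeholder certificate", "127.0.0.1", 10105)
    engine_furl = controller.register("engine capability")
    print(connect(engine_furl, controller))  # prints: engine capability
```

In this model, a hostile controller that merely claims the expected process id (for example by overwriting its `tub_id` attribute) still fails the check for steps 6 and 7, because it cannot present a certificate whose hash matches the id in the FURL. A client holding a FURL with a guessed capability id reaches the controller but is refused at the capability lookup.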