IntroNumPy_orig.tex
2358 lines
| 73.3 KiB
| application/x-tex
|
TexLexer
Matthias BUSSONNIER
|
r9653 | |||
%% This file was auto-generated by IPython. | ||||
%% Conversion from the original notebook file: | ||||
%% tests/ipynbref/Gun_Data.orig.ipynb | ||||
%% | ||||
\documentclass[11pt,english]{article} | ||||
%% This is the automatic preamble used by IPython. Note that it does *not* | ||||
%% include a documentclass declaration, that is added at runtime to the overall | ||||
%% document. | ||||
\usepackage{amsmath} | ||||
\usepackage{amssymb} | ||||
\usepackage{graphicx} | ||||
\usepackage{ucs} | ||||
\usepackage[utf8x]{inputenc} | ||||
% needed for markdown enumerations to work | ||||
\usepackage{enumerate} | ||||
% Slightly bigger margins than the latex defaults | ||||
\usepackage{geometry} | ||||
\geometry{verbose,tmargin=3cm,bmargin=3cm,lmargin=2.5cm,rmargin=2.5cm} | ||||
% Define a few colors for use in code, links and cell shading | ||||
\usepackage{color} | ||||
\definecolor{orange}{cmyk}{0,0.4,0.8,0.2} | ||||
\definecolor{darkorange}{rgb}{.71,0.21,0.01} | ||||
\definecolor{darkgreen}{rgb}{.12,.54,.11} | ||||
\definecolor{myteal}{rgb}{.26, .44, .56} | ||||
\definecolor{gray}{gray}{0.45} | ||||
\definecolor{lightgray}{gray}{.95} | ||||
\definecolor{mediumgray}{gray}{.8} | ||||
\definecolor{inputbackground}{rgb}{.95, .95, .85} | ||||
\definecolor{outputbackground}{rgb}{.95, .95, .95} | ||||
\definecolor{traceback}{rgb}{1, .95, .95} | ||||
% Framed environments for code cells (inputs, outputs, errors, ...). The | ||||
% various uses of \unskip (or not) at the end were fine-tuned by hand, so don't | ||||
% randomly change them unless you're sure of the effect it will have. | ||||
\usepackage{framed} | ||||
% remove extraneous vertical space in boxes | ||||
\setlength\fboxsep{0pt} | ||||
% codecell is the whole input+output set of blocks that a Code cell can | ||||
% generate. | ||||
% TODO: unfortunately, it seems that using a framed codecell environment breaks | ||||
% the ability of the frames inside of it to be broken across pages. This | ||||
% causes at least the problem of having lots of empty space at the bottom of | ||||
% pages as new frames are moved to the next page, and if a single frame is too | ||||
% long to fit on a page, will completely stop latex from compiling the | ||||
% document. So unless we figure out a solution to this, we'll have to instead | ||||
% leave the codecell env. as empty. I'm keeping the original codecell | ||||
% definition here (a thin vertical bar) for reference, in case we find a | ||||
% solution to the page break issue. | ||||
%% \newenvironment{codecell}{% | ||||
%% \def\FrameCommand{\color{mediumgray} \vrule width 1pt \hspace{5pt}}% | ||||
%% \MakeFramed{\vspace{-0.5em}}} | ||||
%% {\unskip\endMakeFramed} | ||||
% For now, make this a no-op... | ||||
\newenvironment{codecell}{} | ||||
\newenvironment{codeinput}{% | ||||
\def\FrameCommand{\colorbox{inputbackground}}% | ||||
\MakeFramed{\advance\hsize-\width \FrameRestore}} | ||||
{\unskip\endMakeFramed} | ||||
\newenvironment{codeoutput}{% | ||||
\def\FrameCommand{\colorbox{outputbackground}}% | ||||
\vspace{-1.4em} | ||||
\MakeFramed{\advance\hsize-\width \FrameRestore}} | ||||
{\unskip\medskip\endMakeFramed} | ||||
\newenvironment{traceback}{% | ||||
\def\FrameCommand{\colorbox{traceback}}% | ||||
\MakeFramed{\advance\hsize-\width \FrameRestore}} | ||||
{\endMakeFramed} | ||||
% Use and configure listings package for nicely formatted code | ||||
\usepackage{listingsutf8} | ||||
\lstset{ | ||||
language=python, | ||||
inputencoding=utf8x, | ||||
extendedchars=\true, | ||||
aboveskip=\smallskipamount, | ||||
belowskip=\smallskipamount, | ||||
xleftmargin=2mm, | ||||
breaklines=true, | ||||
basicstyle=\small \ttfamily, | ||||
showstringspaces=false, | ||||
keywordstyle=\color{blue}\bfseries, | ||||
commentstyle=\color{myteal}, | ||||
stringstyle=\color{darkgreen}, | ||||
identifierstyle=\color{darkorange}, | ||||
columns=fullflexible, % tighter character kerning, like verb | ||||
} | ||||
% The hyperref package gives us a pdf with properly built | ||||
% internal navigation ('pdf bookmarks' for the table of contents, | ||||
% internal cross-reference links, web links for URLs, etc.) | ||||
\usepackage{hyperref} | ||||
\hypersetup{ | ||||
breaklinks=true, % so long urls are correctly broken across lines | ||||
colorlinks=true, | ||||
urlcolor=blue, | ||||
linkcolor=darkorange, | ||||
citecolor=darkgreen, | ||||
} | ||||
% hardcode size of all verbatim environments to be a bit smaller | ||||
\makeatletter | ||||
\g@addto@macro\@verbatim\small\topsep=0.5em\partopsep=0pt | ||||
\makeatother | ||||
% Prevent overflowing lines due to urls and other hard-to-break entities. | ||||
\sloppy | ||||
\begin{document} | ||||
\section{An Introduction to the Scientific Python Ecosystem} | ||||
While the Python language is an excellent tool for general-purpose | ||||
programming, with a highly readable syntax, rich and powerful data types | ||||
(strings, lists, sets, dictionaries, arbitrary length integers, etc) and | ||||
a very comprehensive standard library, it was not designed specifically | ||||
for mathematical and scientific computing. Neither the language nor its | ||||
standard library have facilities for the efficient representation of | ||||
multidimensional datasets, tools for linear algebra and general matrix | ||||
manipulations (an essential building block of virtually all technical | ||||
computing), nor any data visualization facilities. | ||||
In particular, Python lists are very flexible containers that can be | ||||
nested arbitrarily deep and which can hold any Python object in them, | ||||
but they are poorly suited to represent efficiently common mathematical | ||||
constructs like vectors and matrices. In contrast, much of our modern | ||||
heritage of scientific computing has been built on top of libraries | ||||
written in the Fortran language, which has native support for vectors | ||||
and matrices as well as a library of mathematical functions that can | ||||
efficiently operate on entire arrays at once. | ||||
\section{Scientific Python: a collaboration of projects built by scientists} | ||||
The scientific community has developed a set of related Python libraries | ||||
that provide powerful array facilities, linear algebra, numerical | ||||
algorithms, data visualization and more. In this appendix, we will | ||||
briefly outline the tools most frequently used for this purpose, that | ||||
make ``Scientific Python'' something far more powerful than the Python | ||||
language alone. | ||||
For reasons of space, we can only describe in some detail the central | ||||
Numpy library, but below we provide links to the websites of each | ||||
project where you can read their documentation in more detail. | ||||
First, let's look at an overview of the basic tools that most scientists | ||||
use in daily research with Python. The core of this ecosystem is | ||||
composed of: | ||||
\begin{itemize} | ||||
\item | ||||
Numpy: the basic library that most others depend on, it provides a | ||||
powerful array type that can represent multidmensional datasets of | ||||
many different kinds and that supports arithmetic operations. Numpy | ||||
also provides a library of common mathematical functions, basic linear | ||||
algebra, random number generation and Fast Fourier Transforms. Numpy | ||||
can be found at \href{http://numpy.scipy.org}{numpy.scipy.org} | ||||
\item | ||||
Scipy: a large collection of numerical algorithms that operate on | ||||
numpy arrays and provide facilities for many common tasks in | ||||
scientific computing, including dense and sparse linear algebra | ||||
support, optimization, special functions, statistics, n-dimensional | ||||
image processing, signal processing and more. Scipy can be found at | ||||
\href{http://scipy.org}{scipy.org}. | ||||
\item | ||||
Matplotlib: a data visualization library with a strong focus on | ||||
producing high-quality output, it supports a variety of common | ||||
scientific plot types in two and three dimensions, with precise | ||||
control over the final output and format for publication-quality | ||||
results. Matplotlib can also be controlled interactively allowing | ||||
graphical manipulation of your data (zooming, panning, etc) and can be | ||||
used with most modern user interface toolkits. It can be found at | ||||
\href{http://matplotlib.sf.net}{matplotlib.sf.net}. | ||||
\item | ||||
IPython: while not strictly scientific in nature, IPython is the | ||||
interactive environment in which many scientists spend their time. | ||||
IPython provides a powerful Python shell that integrates tightly with | ||||
Matplotlib and with easy access to the files and operating system, and | ||||
which can execute in a terminal or in a graphical Qt console. IPython | ||||
also has a web-based notebook interface that can combine code with | ||||
text, mathematical expressions, figures and multimedia. It can be | ||||
found at \href{http://ipython.org}{ipython.org}. | ||||
\end{itemize} | ||||
While each of these tools can be installed separately, in our opinion | ||||
the most convenient way today of accessing them (especially on Windows | ||||
and Mac computers) is to install the | ||||
\href{http://www.enthought.com/products/epd\_free.php}{Free Edition of | ||||
the Enthought Python Distribution} which contain all the above. Other | ||||
free alternatives on Windows (but not on Macs) are | ||||
\href{http://code.google.com/p/pythonxy}{Python(x,y)} and | ||||
\href{http://www.lfd.uci.edu/~gohlke/pythonlibs}{Christoph Gohlke's | ||||
packages page}. | ||||
These four `core' libraries are in practice complemented by a number of | ||||
other tools for more specialized work. We will briefly list here the | ||||
ones that we think are the most commonly needed: | ||||
\begin{itemize} | ||||
\item | ||||
Sympy: a symbolic manipulation tool that turns a Python session into a | ||||
computer algebra system. It integrates with the IPython notebook, | ||||
rendering results in properly typeset mathematical notation. | ||||
\href{http://sympy.org}{sympy.org}. | ||||
\item | ||||
Mayavi: sophisticated 3d data visualization; | ||||
\href{http://code.enthought.com/projects/mayavi}{code.enthought.com/projects/mayavi}. | ||||
\item | ||||
Cython: a bridge language between Python and C, useful both to | ||||
optimize performance bottlenecks in Python and to access C libraries | ||||
directly; \href{http://cython.org}{cython.org}. | ||||
\item | ||||
Pandas: high-performance data structures and data analysis tools, with | ||||
powerful data alignment and structural manipulation capabilities; | ||||
\href{http://pandas.pydata.org}{pandas.pydata.org}. | ||||
\item | ||||
Statsmodels: statistical data exploration and model estimation; | ||||
\href{http://statsmodels.sourceforge.net}{statsmodels.sourceforge.net}. | ||||
\item | ||||
Scikit-learn: general purpose machine learning algorithms with a | ||||
common interface; \href{http://scikit-learn.org}{scikit-learn.org}. | ||||
\item | ||||
Scikits-image: image processing toolbox; | ||||
\href{http://scikits-image.org}{scikits-image.org}. | ||||
\item | ||||
NetworkX: analysis of complex networks (in the graph theoretical | ||||
sense); \href{http://networkx.lanl.gov}{networkx.lanl.gov}. | ||||
\item | ||||
PyTables: management of hierarchical datasets using the | ||||
industry-standard HDF5 format; | ||||
\href{http://www.pytables.org}{www.pytables.org}. | ||||
\end{itemize} | ||||
Beyond these, for any specific problem you should look on the internet | ||||
first, before starting to write code from scratch. There's a good chance | ||||
that someone, somewhere, has written an open source library that you can | ||||
use for part or all of your problem. | ||||
\section{A note about the examples below} | ||||
In all subsequent examples, you will see blocks of input code, followed | ||||
by the results of the code if the code generated output. This output may | ||||
include text, graphics and other result objects. These blocks of input | ||||
can be pasted into your interactive IPython session or notebook for you | ||||
to execute. In the print version of this document, a thin vertical bar | ||||
on the left of the blocks of input and output shows which blocks go | ||||
together. | ||||
If you are reading this text as an actual IPython notebook, you can | ||||
press \texttt{Shift-Enter} or use the `play' button on the toolbar | ||||
(right-pointing triangle) to execute each block of code, known as a | ||||
`cell' in IPython: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
# This is a block of code, below you'll see its output | ||||
print "Welcome to the world of scientific computing with Python!" | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
Welcome to the world of scientific computing with Python! | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
\section{Motivation: the trapezoidal rule} | ||||
In subsequent sections we'll provide a basic introduction to the nuts | ||||
and bolts of the basic scientific python tools; but we'll first motivate | ||||
it with a brief example that illustrates what you can do in a few lines | ||||
with these tools. For this, we will use the simple problem of | ||||
approximating a definite integral with the trapezoid rule: | ||||
\[ | ||||
\int_{a}^{b} f(x)\, dx \approx \frac{1}{2} \sum_{k=1}^{N} \left( x_{k} - x_{k-1} \right) \left( f(x_{k}) + f(x_{k-1}) \right). | ||||
\] | ||||
Our task will be to compute this formula for a function such as: | ||||
\[ | ||||
f(x) = (x-3)(x-5)(x-7)+85 | ||||
\] | ||||
integrated between $a=1$ and $b=9$. | ||||
First, we define the function and sample it evenly between 0 and 10 at | ||||
200 points: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
def f(x): | ||||
return (x-3)*(x-5)*(x-7)+85 | ||||
import numpy as np | ||||
x = np.linspace(0, 10, 200) | ||||
y = f(x) | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\end{codecell} | ||||
We select $a$ and $b$, our integration limits, and we take only a few | ||||
points in that region to illustrate the error behavior of the trapezoid | ||||
approximation: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
a, b = 1, 9 | ||||
xint = x[logical_and(x>=a, x<=b)][::30] | ||||
yint = y[logical_and(x>=a, x<=b)][::30] | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\end{codecell} | ||||
Let's plot both the function and the area below it in the trapezoid | ||||
approximation: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
import matplotlib.pyplot as plt | ||||
plt.plot(x, y, lw=2) | ||||
plt.axis([0, 10, 0, 140]) | ||||
plt.fill_between(xint, 0, yint, facecolor='gray', alpha=0.4) | ||||
plt.text(0.5 * (a + b), 30,r"$\int_a^b f(x)dx$", horizontalalignment='center', fontsize=20); | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{center} | ||||
\includegraphics[width=0.7\textwidth]{_fig_13.pdf} | ||||
\par | ||||
\end{center} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
Compute the integral both at high accuracy and with the trapezoid | ||||
approximation | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
from scipy.integrate import quad, trapz | ||||
integral, error = quad(f, 1, 9) | ||||
trap_integral = trapz(yint, xint) | ||||
print "The integral is: %g +/- %.1e" % (integral, error) | ||||
print "The trapezoid approximation with", len(xint), "points is:", trap_integral | ||||
print "The absolute error is:", abs(integral - trap_integral) | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
The integral is: 680 +/- 7.5e-12 | ||||
The trapezoid approximation with 6 points is: 621.286411141 | ||||
The absolute error is: 58.7135888589 | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
This simple example showed us how, combining the numpy, scipy and | ||||
matplotlib libraries we can provide an illustration of a standard method | ||||
in elementary calculus with just a few lines of code. We will now | ||||
discuss with more detail the basic usage of these tools. | ||||
\section{NumPy arrays: the right data structure for scientific computing} | ||||
\section{Basics of Numpy arrays} | ||||
We now turn our attention to the Numpy library, which forms the base | ||||
layer for the entire `scipy ecosystem'. Once you have installed numpy, | ||||
you can import it as | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
import numpy | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\end{codecell} | ||||
though in this book we will use the common shorthand | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
import numpy as np | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\end{codecell} | ||||
As mentioned above, the main object provided by numpy is a powerful | ||||
array. We'll start by exploring how the numpy array differs from Python | ||||
lists. We start by creating a simple list and an array with the same | ||||
contents of the list: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
lst = [10, 20, 30, 40] | ||||
arr = np.array([10, 20, 30, 40]) | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\end{codecell} | ||||
Elements of a one-dimensional array are accessed with the same syntax as | ||||
a list: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
lst[0] | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
10 | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
arr[0] | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
10 | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
arr[-1] | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
40 | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
arr[2:] | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
array([30, 40]) | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
The first difference to note between lists and arrays is that arrays are | ||||
\emph{homogeneous}; i.e.~all elements of an array must be of the same | ||||
type. In contrast, lists can contain elements of arbitrary type. For | ||||
example, we can change the last element in our list above to be a | ||||
string: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
lst[-1] = 'a string inside a list' | ||||
lst | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
[10, 20, 30, 'a string inside a list'] | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
but the same can not be done with an array, as we get an error message: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
arr[-1] = 'a string inside an array' | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
--------------------------------------------------------------------------- | ||||
ValueError Traceback (most recent call last) | ||||
/home/fperez/teach/book-math-labtool/<ipython-input-13-29c0bfa5fa8a> in <module>() | ||||
----> 1 arr[-1] = 'a string inside an array' | ||||
ValueError: invalid literal for long() with base 10: 'a string inside an array' | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
The information about the type of an array is contained in its | ||||
\emph{dtype} attribute: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
arr.dtype | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
dtype('int32') | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
Once an array has been created, its dtype is fixed and it can only store | ||||
elements of the same type. For this example where the dtype is integer, | ||||
if we store a floating point number it will be automatically converted | ||||
into an integer: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
arr[-1] = 1.234 | ||||
arr | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
array([10, 20, 30, 1]) | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
Above we created an array from an existing list; now let us now see | ||||
other ways in which we can create arrays, which we'll illustrate next. A | ||||
common need is to have an array initialized with a constant value, and | ||||
very often this value is 0 or 1 (suitable as starting value for additive | ||||
and multiplicative loops respectively); \texttt{zeros} creates arrays of | ||||
all zeros, with any desired dtype: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
np.zeros(5, float) | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
array([ 0., 0., 0., 0., 0.]) | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
np.zeros(3, int) | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
array([0, 0, 0]) | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
np.zeros(3, complex) | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
array([ 0.+0.j, 0.+0.j, 0.+0.j]) | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
and similarly for \texttt{ones}: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
print '5 ones:', np.ones(5) | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
5 ones: [ 1. 1. 1. 1. 1.] | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
If we want an array initialized with an arbitrary value, we can create | ||||
an empty array and then use the fill method to put the value we want | ||||
into the array: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
a = empty(4) | ||||
a.fill(5.5) | ||||
a | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
array([ 5.5, 5.5, 5.5, 5.5]) | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
Numpy also offers the \texttt{arange} function, which works like the | ||||
builtin \texttt{range} but returns an array instead of a list: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
np.arange(5) | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
array([0, 1, 2, 3, 4]) | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
and the \texttt{linspace} and \texttt{logspace} functions to create | ||||
linearly and logarithmically-spaced grids respectively, with a fixed | ||||
number of points and including both ends of the specified interval: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
print "A linear grid between 0 and 1:", np.linspace(0, 1, 5) | ||||
print "A logarithmic grid between 10**1 and 10**4: ", np.logspace(1, 4, 4) | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
A linear grid between 0 and 1: [ 0. 0.25 0.5 0.75 1. ] | ||||
A logarithmic grid between 10**1 and 10**4: [ 10. 100. 1000. 10000.] | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
Finally, it is often useful to create arrays with random numbers that | ||||
follow a specific distribution. The \texttt{np.random} module contains a | ||||
number of functions that can be used to this effect, for example this | ||||
will produce an array of 5 random samples taken from a standard normal | ||||
distribution (0 mean and variance 1): | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
np.random.randn(5) | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
array([-0.08633343, -0.67375434, 1.00589536, 0.87081651, 1.65597822]) | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
whereas this will also give 5 samples, but from a normal distribution | ||||
with a mean of 10 and a variance of 3: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
norm10 = np.random.normal(10, 3, 5) | ||||
norm10 | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
array([ 8.94879575, 5.53038269, 8.24847281, 12.14944165, 11.56209294]) | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
\section{Indexing with other arrays} | ||||
Above we saw how to index arrays with single numbers and slices, just | ||||
like Python lists. But arrays allow for a more sophisticated kind of | ||||
indexing which is very powerful: you can index an array with another | ||||
array, and in particular with an array of boolean values. This is | ||||
particluarly useful to extract information from an array that matches a | ||||
certain condition. | ||||
Consider for example that in the array \texttt{norm10} we want to | ||||
replace all values above 9 with the value 0. We can do so by first | ||||
finding the \emph{mask} that indicates where this condition is true or | ||||
false: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
mask = norm10 > 9 | ||||
mask | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
array([False, False, False, True, True], dtype=bool) | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
Now that we have this mask, we can use it to either read those values or | ||||
to reset them to 0: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
print 'Values above 9:', norm10[mask] | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
Values above 9: [ 12.14944165 11.56209294] | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
print 'Resetting all values above 9 to 0...' | ||||
norm10[mask] = 0 | ||||
print norm10 | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
Resetting all values above 9 to 0... | ||||
[ 8.94879575 5.53038269 8.24847281 0. 0. ] | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
\section{Arrays with more than one dimension} | ||||
Up until now all our examples have used one-dimensional arrays. But | ||||
Numpy can create arrays of aribtrary dimensions, and all the methods | ||||
illustrated in the previous section work with more than one dimension. | ||||
For example, a list of lists can be used to initialize a two dimensional | ||||
array: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
lst2 = [[1, 2], [3, 4]] | ||||
arr2 = np.array([[1, 2], [3, 4]]) | ||||
arr2 | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
array([[1, 2], | ||||
[3, 4]]) | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
With two-dimensional arrays we start seeing the power of numpy: while a | ||||
nested list can be indexed using repeatedly the \texttt{{[} {]}} | ||||
operator, multidimensional arrays support a much more natural indexing | ||||
syntax with a single \texttt{{[} {]}} and a set of indices separated by | ||||
commas: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
print lst2[0][1] | ||||
print arr2[0,1] | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
2 | ||||
2 | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
Most of the array creation functions listed above can be used with more | ||||
than one dimension, for example: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
np.zeros((2,3)) | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
array([[ 0., 0., 0.], | ||||
[ 0., 0., 0.]]) | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
np.random.normal(10, 3, (2, 4)) | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
array([[ 11.26788826, 4.29619866, 11.09346496, 9.73861307], | ||||
[ 10.54025996, 9.5146268 , 10.80367214, 13.62204505]]) | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
In fact, the shape of an array can be changed at any time, as long as | ||||
the total number of elements is unchanged. For example, if we want a 2x4 | ||||
array with numbers increasing from 0, the easiest way to create it is: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
arr = np.arange(8).reshape(2,4) | ||||
print arr | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
[[0 1 2 3] | ||||
[4 5 6 7]] | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
With multidimensional arrays, you can also use slices, and you can mix | ||||
and match slices and single indices in the different dimensions (using | ||||
the same array as above): | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
print 'Slicing in the second row:', arr[1, 2:4] | ||||
print 'All rows, third column :', arr[:, 2] | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
Slicing in the second row: [6 7] | ||||
All rows, third column : [2 6] | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
If you only provide one index, then you will get an array with one less | ||||
dimension containing that row: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
print 'First row: ', arr[0] | ||||
print 'Second row: ', arr[1] | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
First row: [0 1 2 3] | ||||
Second row: [4 5 6 7] | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
Now that we have seen how to create arrays with more than one dimension, | ||||
it's a good idea to look at some of the most useful properties and | ||||
methods that arrays have. The following provide basic information about | ||||
the size, shape and data in the array: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
print 'Data type :', arr.dtype | ||||
print 'Total number of elements :', arr.size | ||||
print 'Number of dimensions :', arr.ndim | ||||
print 'Shape (dimensionality) :', arr.shape | ||||
print 'Memory used (in bytes) :', arr.nbytes | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
Data type : int32 | ||||
Total number of elements : 8 | ||||
Number of dimensions : 2 | ||||
Shape (dimensionality) : (2, 4) | ||||
Memory used (in bytes) : 32 | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
Arrays also have many useful methods, some especially useful ones are: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
print 'Minimum and maximum :', arr.min(), arr.max() | ||||
print 'Sum and product of all elements :', arr.sum(), arr.prod() | ||||
print 'Mean and standard deviation :', arr.mean(), arr.std() | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
Minimum and maximum : 0 7 | ||||
Sum and product of all elements : 28 0 | ||||
Mean and standard deviation : 3.5 2.29128784748 | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
For these methods, the above operations area all computed on all the | ||||
elements of the array. But for a multidimensional array, it's possible | ||||
to do the computation along a single dimension, by passing the | ||||
\texttt{axis} parameter; for example: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
print 'For the following array:\n', arr | ||||
print 'The sum of elements along the rows is :', arr.sum(axis=1) | ||||
print 'The sum of elements along the columns is :', arr.sum(axis=0) | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
For the following array: | ||||
[[0 1 2 3] | ||||
[4 5 6 7]] | ||||
The sum of elements along the rows is : [ 6 22] | ||||
The sum of elements along the columns is : [ 4 6 8 10] | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
As you can see in this example, the value of the \texttt{axis} parameter | ||||
is the dimension which will be \emph{consumed} once the operation has | ||||
been carried out. This is why to sum along the rows we use | ||||
\texttt{axis=0}. | ||||
This can be easily illustrated with an example that has more dimensions; | ||||
we create an array with 4 dimensions and shape \texttt{(3,4,5,6)} and | ||||
sum along the axis number 2 (i.e.~the \emph{third} axis, since in Python | ||||
all counts are 0-based). That consumes the dimension whose length was 5, | ||||
leaving us with a new array that has shape \texttt{(3,4,6)}: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
np.zeros((3,4,5,6)).sum(2).shape | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
(3, 4, 6) | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
Another widely used property of arrays is the \texttt{.T} attribute, | ||||
which allows you to access the transpose of the array: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
print 'Array:\n', arr | ||||
print 'Transpose:\n', arr.T | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
Array: | ||||
[[0 1 2 3] | ||||
[4 5 6 7]] | ||||
Transpose: | ||||
[[0 4] | ||||
[1 5] | ||||
[2 6] | ||||
[3 7]] | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
We don't have time here to look at all the methods and properties of | ||||
arrays, here's a complete list. Simply try exploring some of these | ||||
IPython to learn more, or read their description in the full Numpy | ||||
documentation: | ||||
\begin{verbatim} | ||||
arr.T arr.copy arr.getfield arr.put arr.squeeze | ||||
arr.all arr.ctypes arr.imag arr.ravel arr.std | ||||
arr.any arr.cumprod arr.item arr.real arr.strides | ||||
arr.argmax arr.cumsum arr.itemset arr.repeat arr.sum | ||||
arr.argmin arr.data arr.itemsize arr.reshape arr.swapaxes | ||||
arr.argsort arr.diagonal arr.max arr.resize arr.take | ||||
arr.astype arr.dot arr.mean arr.round arr.tofile | ||||
arr.base arr.dtype arr.min arr.searchsorted arr.tolist | ||||
arr.byteswap arr.dump arr.nbytes arr.setasflat arr.tostring | ||||
arr.choose arr.dumps arr.ndim arr.setfield arr.trace | ||||
arr.clip arr.fill arr.newbyteorder arr.setflags arr.transpose | ||||
arr.compress arr.flags arr.nonzero arr.shape arr.var | ||||
arr.conj arr.flat arr.prod arr.size arr.view | ||||
arr.conjugate arr.flatten arr.ptp arr.sort | ||||
\end{verbatim} | ||||
\section{Operating with arrays} | ||||
Arrays support all regular arithmetic operators, and the numpy library | ||||
also contains a complete collection of basic mathematical functions that | ||||
operate on arrays. It is important to remember that in general, all | ||||
operations with arrays are applied \emph{element-wise}, i.e., are | ||||
applied to all the elements of the array at the same time. Consider for | ||||
example: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
arr1 = np.arange(4) | ||||
arr2 = np.arange(10, 14) | ||||
print arr1, '+', arr2, '=', arr1+arr2 | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
[0 1 2 3] + [10 11 12 13] = [10 12 14 16] | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
Importantly, you must remember that even the multiplication operator is | ||||
by default applied element-wise, it is \emph{not} the matrix | ||||
multiplication from linear algebra (as is the case in Matlab, for | ||||
example): | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
print arr1, '*', arr2, '=', arr1*arr2 | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
[0 1 2 3] * [10 11 12 13] = [ 0 11 24 39] | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
While this means that in principle arrays must always match in their | ||||
dimensionality in order for an operation to be valid, numpy will | ||||
\emph{broadcast} dimensions when possible. For example, suppose that you | ||||
want to add the number 1.5 to \texttt{arr1}; the following would be a | ||||
valid way to do it: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
arr1 + 1.5*np.ones(4) | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
array([ 1.5, 2.5, 3.5, 4.5]) | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
But thanks to numpy's broadcasting rules, the following is equally | ||||
valid: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
arr1 + 1.5 | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
array([ 1.5, 2.5, 3.5, 4.5]) | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
In this case, numpy looked at both operands and saw that the first | ||||
(\texttt{arr1}) was a one-dimensional array of length 4 and the second | ||||
was a scalar, considered a zero-dimensional object. The broadcasting | ||||
rules allow numpy to: | ||||
\begin{itemize} | ||||
\item | ||||
\emph{create} new dimensions of length 1 (since this doesn't change | ||||
the size of the array) | ||||
\item | ||||
`stretch' a dimension of length 1 that needs to be matched to a | ||||
dimension of a different size. | ||||
\end{itemize} | ||||
So in the above example, the scalar 1.5 is effectively: | ||||
\begin{itemize} | ||||
\item | ||||
first `promoted' to a 1-dimensional array of length 1 | ||||
\item | ||||
then, this array is `stretched' to length 4 to match the dimension of | ||||
\texttt{arr1}. | ||||
\end{itemize} | ||||
After these two operations are complete, the addition can proceed as now | ||||
both operands are one-dimensional arrays of length 4. | ||||
This broadcasting behavior is in practice enormously powerful, | ||||
especially because when numpy broadcasts to create new dimensions or to | ||||
`stretch' existing ones, it doesn't actually replicate the data. In the | ||||
example above the operation is carried \emph{as if} the 1.5 was a 1-d | ||||
array with 1.5 in all of its entries, but no actual array was ever | ||||
created. This can save lots of memory in cases when the arrays in | ||||
question are large and can have significant performance implications. | ||||
The general rule is: when operating on two arrays, NumPy compares their | ||||
shapes element-wise. It starts with the trailing dimensions, and works | ||||
its way forward, creating dimensions of length 1 as needed. Two | ||||
dimensions are considered compatible when | ||||
\begin{itemize} | ||||
\item | ||||
they are equal to begin with, or | ||||
\item | ||||
one of them is 1; in this case numpy will do the `stretching' to make | ||||
them equal. | ||||
\end{itemize} | ||||
If these conditions are not met, a | ||||
\texttt{ValueError: frames are not aligned} exception is thrown, | ||||
indicating that the arrays have incompatible shapes. The size of the | ||||
resulting array is the maximum size along each dimension of the input | ||||
arrays. | ||||
This shows how the broadcasting rules work in several dimensions: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
b = np.array([2, 3, 4, 5]) | ||||
print arr, '\n\n+', b , '\n----------------\n', arr + b | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
[[0 1 2 3] | ||||
[4 5 6 7]] | ||||
+ [2 3 4 5] | ||||
---------------- | ||||
[[ 2 4 6 8] | ||||
[ 6 8 10 12]] | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
Now, how could you use broadcasting to say add \texttt{{[}4, 6{]}} along | ||||
the rows to \texttt{arr} above? Simply performing the direct addition | ||||
will produce the error we previously mentioned: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
c = np.array([4, 6]) | ||||
arr + c | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
--------------------------------------------------------------------------- | ||||
ValueError Traceback (most recent call last) | ||||
/home/fperez/teach/book-math-labtool/<ipython-input-45-62aa20ac1980> in <module>() | ||||
1 c = np.array([4, 6]) | ||||
----> 2 arr + c | ||||
ValueError: operands could not be broadcast together with shapes (2,4) (2) | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
According to the rules above, the array \texttt{c} would need to have a | ||||
\emph{trailing} dimension of 1 for the broadcasting to work. It turns | ||||
out that numpy allows you to `inject' new dimensions anywhere into an | ||||
array on the fly, by indexing it with the special object | ||||
\texttt{np.newaxis}: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
(c[:, np.newaxis]).shape | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
(2, 1) | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
This is exactly what we need, and indeed it works: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
arr + c[:, np.newaxis] | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
array([[ 4, 5, 6, 7], | ||||
[10, 11, 12, 13]]) | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
For the full broadcasting rules, please see the official Numpy docs, | ||||
which describe them in detail and with more complex examples. | ||||
As we mentioned before, Numpy ships with a full complement of | ||||
mathematical functions that work on entire arrays, including logarithms, | ||||
exponentials, trigonometric and hyperbolic trigonometric functions, etc. | ||||
Furthermore, scipy ships a rich special function library in the | ||||
\texttt{scipy.special} module that includes Bessel, Airy, Fresnel, | ||||
Laguerre and other classical special functions. For example, sampling | ||||
the sine function at 100 points between $0$ and $2\pi$ is as simple as: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
x = np.linspace(0, 2*np.pi, 100) | ||||
y = np.sin(x) | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\end{codecell} | ||||
\section{Linear algebra in numpy} | ||||
Numpy ships with a basic linear algebra library, and all arrays have a | ||||
\texttt{dot} method whose behavior is that of the scalar dot product | ||||
when its arguments are vectors (one-dimensional arrays) and the | ||||
traditional matrix multiplication when one or both of its arguments are | ||||
two-dimensional arrays: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
v1 = np.array([2, 3, 4]) | ||||
v2 = np.array([1, 0, 1]) | ||||
print v1, '.', v2, '=', v1.dot(v2) | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
[2 3 4] . [1 0 1] = 6 | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
Here is a regular matrix-vector multiplication, note that the array | ||||
\texttt{v1} should be viewed as a \emph{column} vector in traditional | ||||
linear algebra notation; numpy makes no distinction between row and | ||||
column vectors and simply verifies that the dimensions match the | ||||
required rules of matrix multiplication, in this case we have a | ||||
$2 \times 3$ matrix multiplied by a 3-vector, which produces a 2-vector: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
A = np.arange(6).reshape(2, 3) | ||||
print A, 'x', v1, '=', A.dot(v1) | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
[[0 1 2] | ||||
[3 4 5]] x [2 3 4] = [11 38] | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
For matrix-matrix multiplication, the same dimension-matching rules must | ||||
be satisfied, e.g.~consider the difference between $A \times A^T$: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
print A.dot(A.T) | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
[[ 5 14] | ||||
[14 50]] | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
and $A^T \times A$: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
print A.T.dot(A) | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
[[ 9 12 15] | ||||
[12 17 22] | ||||
[15 22 29]] | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
Furthermore, the \texttt{numpy.linalg} module includes additional | ||||
functionality such as determinants, matrix norms, Cholesky, eigenvalue | ||||
and singular value decompositions, etc. For even more linear algebra | ||||
tools, \texttt{scipy.linalg} contains the majority of the tools in the | ||||
classic LAPACK libraries as well as functions to operate on sparse | ||||
matrices. We refer the reader to the Numpy and Scipy documentations for | ||||
additional details on these. | ||||
\section{Reading and writing arrays to disk} | ||||
Numpy lets you read and write arrays into files in a number of ways. In | ||||
order to use these tools well, it is critical to understand the | ||||
difference between a \emph{text} and a \emph{binary} file containing | ||||
numerical data. In a text file, the number $\pi$ could be written as | ||||
``3.141592653589793'', for example: a string of digits that a human can | ||||
read, with in this case 15 decimal digits. In contrast, that same number | ||||
written to a binary file would be encoded as 8 characters (bytes) that | ||||
are not readable by a human but which contain the exact same data that | ||||
the variable \texttt{pi} had in the computer's memory. | ||||
The tradeoffs between the two modes are thus: | ||||
\begin{itemize} | ||||
\item | ||||
Text mode: occupies more space, precision can be lost (if not all | ||||
digits are written to disk), but is readable and editable by hand with | ||||
a text editor. Can \emph{only} be used for one- and two-dimensional | ||||
arrays. | ||||
\item | ||||
Binary mode: compact and exact representation of the data in memory, | ||||
can't be read or edited by hand. Arrays of any size and dimensionality | ||||
can be saved and read without loss of information. | ||||
\end{itemize} | ||||
First, let's see how to read and write arrays in text mode. The | ||||
\texttt{np.savetxt} function saves an array to a text file, with options | ||||
to control the precision, separators and even adding a header: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
arr = np.arange(10).reshape(2, 5) | ||||
np.savetxt('test.out', arr, fmt='%.2e', header="My dataset") | ||||
!cat test.out | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
# My dataset | ||||
0.00e+00 1.00e+00 2.00e+00 3.00e+00 4.00e+00 | ||||
5.00e+00 6.00e+00 7.00e+00 8.00e+00 9.00e+00 | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
And this same type of file can then be read with the matching | ||||
\texttt{np.loadtxt} function: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
arr2 = np.loadtxt('test.out') | ||||
print arr2 | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
[[ 0. 1. 2. 3. 4.] | ||||
[ 5. 6. 7. 8. 9.]] | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
For binary data, Numpy provides the \texttt{np.save} and | ||||
\texttt{np.savez} routines. The first saves a single array to a file | ||||
with \texttt{.npy} extension, while the latter can be used to save a | ||||
\emph{group} of arrays into a single file with \texttt{.npz} extension. | ||||
The files created with these routines can then be read with the | ||||
\texttt{np.load} function. | ||||
Let us first see how to use the simpler \texttt{np.save} function to | ||||
save a single array: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
np.save('test.npy', arr2) | ||||
# Now we read this back | ||||
arr2n = np.load('test.npy') | ||||
# Let's see if any element is non-zero in the difference. | ||||
# A value of True would be a problem. | ||||
print 'Any differences?', np.any(arr2-arr2n) | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
Any differences? False | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
Now let us see how the \texttt{np.savez} function works. You give it a | ||||
filename and either a sequence of arrays or a set of keywords. In the | ||||
first mode, the function will auotmatically name the saved arrays in the | ||||
archive as \texttt{arr\_0}, \texttt{arr\_1}, etc: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
np.savez('test.npz', arr, arr2) | ||||
arrays = np.load('test.npz') | ||||
arrays.files | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
['arr_1', 'arr_0'] | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
Alternatively, we can explicitly choose how to name the arrays we save: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
np.savez('test.npz', array1=arr, array2=arr2) | ||||
arrays = np.load('test.npz') | ||||
arrays.files | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
['array2', 'array1'] | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
The object returned by \texttt{np.load} from an \texttt{.npz} file works | ||||
like a dictionary, though you can also access its constituent files by | ||||
attribute using its special \texttt{.f} field; this is best illustrated | ||||
with an example with the \texttt{arrays} object from above: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
print 'First row of first array:', arrays['array1'][0] | ||||
# This is an equivalent way to get the same field | ||||
print 'First row of first array:', arrays.f.array1[0] | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
First row of first array: [0 1 2 3 4] | ||||
First row of first array: [0 1 2 3 4] | ||||
\end{verbatim} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
This \texttt{.npz} format is a very convenient way to package compactly | ||||
and without loss of information, into a single file, a group of related | ||||
arrays that pertain to a specific problem. At some point, however, the | ||||
complexity of your dataset may be such that the optimal approach is to | ||||
use one of the standard formats in scientific data processing that have | ||||
been designed to handle complex datasets, such as NetCDF or HDF5. | ||||
Fortunately, there are tools for manipulating these formats in Python, | ||||
and for storing data in other ways such as databases. A complete | ||||
discussion of the possibilities is beyond the scope of this discussion, | ||||
but of particular interest for scientific users we at least mention the | ||||
following: | ||||
\begin{itemize} | ||||
\item | ||||
The \texttt{scipy.io} module contains routines to read and write | ||||
Matlab files in \texttt{.mat} format and files in the NetCDF format | ||||
that is widely used in certain scientific disciplines. | ||||
\item | ||||
For manipulating files in the HDF5 format, there are two excellent | ||||
options in Python: The PyTables project offers a high-level, object | ||||
oriented approach to manipulating HDF5 datasets, while the h5py | ||||
project offers a more direct mapping to the standard HDF5 library | ||||
interface. Both are excellent tools; if you need to work with HDF5 | ||||
datasets you should read some of their documentation and examples and | ||||
decide which approach is a better match for your needs. | ||||
\end{itemize} | ||||
\section{High quality data visualization with Matplotlib} | ||||
The \href{http://matplotlib.sf.net}{matplotlib} library is a powerful | ||||
tool capable of producing complex publication-quality figures with fine | ||||
layout control in two and three dimensions; here we will only provide a | ||||
minimal self-contained introduction to its usage that covers the | ||||
functionality needed for the rest of the book. We encourage the reader | ||||
to read the tutorials included with the matplotlib documentation as well | ||||
as to browse its extensive gallery of examples that include source code. | ||||
Just as we typically use the shorthand \texttt{np} for Numpy, we will | ||||
use \texttt{plt} for the \texttt{matplotlib.pyplot} module where the | ||||
easy-to-use plotting functions reside (the library contains a rich | ||||
object-oriented architecture that we don't have the space to discuss | ||||
here): | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
import matplotlib.pyplot as plt | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\end{codecell} | ||||
The most frequently used function is simply called \texttt{plot}, here | ||||
is how you can make a simple plot of $\sin(x)$ for $x \in [0, 2\pi]$ | ||||
with labels and a grid (we use the semicolon in the last line to | ||||
suppress the display of some information that is unnecessary right now): | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
x = np.linspace(0, 2*np.pi) | ||||
y = np.sin(x) | ||||
plt.plot(x,y, label='sin(x)') | ||||
plt.legend() | ||||
plt.grid() | ||||
plt.title('Harmonic') | ||||
plt.xlabel('x') | ||||
plt.ylabel('y'); | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{center} | ||||
\includegraphics[width=0.7\textwidth]{_fig_134.pdf} | ||||
\par | ||||
\end{center} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
You can control the style, color and other properties of the markers, | ||||
for example: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
plt.plot(x, y, linewidth=2); | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{center} | ||||
\includegraphics[width=0.7\textwidth]{_fig_136.pdf} | ||||
\par | ||||
\end{center} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
plt.plot(x, y, 'o', markersize=5, color='r'); | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{center} | ||||
\includegraphics[width=0.7\textwidth]{_fig_137.pdf} | ||||
\par | ||||
\end{center} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
We will now see how to create a few other common plot types, such as a | ||||
simple error plot: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
# example data | ||||
x = np.arange(0.1, 4, 0.5) | ||||
y = np.exp(-x) | ||||
# example variable error bar values | ||||
yerr = 0.1 + 0.2*np.sqrt(x) | ||||
xerr = 0.1 + yerr | ||||
# First illustrate basic pyplot interface, using defaults where possible. | ||||
plt.figure() | ||||
plt.errorbar(x, y, xerr=0.2, yerr=0.4) | ||||
plt.title("Simplest errorbars, 0.2 in x, 0.4 in y"); | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{center} | ||||
\includegraphics[width=0.7\textwidth]{_fig_139.pdf} | ||||
\par | ||||
\end{center} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
A simple log plot | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
x = np.linspace(-5, 5) | ||||
y = np.exp(-x**2) | ||||
plt.semilogy(x, y); | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{center} | ||||
\includegraphics[width=0.7\textwidth]{_fig_141.pdf} | ||||
\par | ||||
\end{center} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
A histogram annotated with text inside the plot, using the \texttt{text} | ||||
function: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
mu, sigma = 100, 15 | ||||
x = mu + sigma * np.random.randn(10000) | ||||
# the histogram of the data | ||||
n, bins, patches = plt.hist(x, 50, normed=1, facecolor='g', alpha=0.75) | ||||
plt.xlabel('Smarts') | ||||
plt.ylabel('Probability') | ||||
plt.title('Histogram of IQ') | ||||
# This will put a text fragment at the position given: | ||||
plt.text(55, .027, r'$\mu=100,\ \sigma=15$', fontsize=14) | ||||
plt.axis([40, 160, 0, 0.03]) | ||||
plt.grid(True) | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{center} | ||||
\includegraphics[width=0.7\textwidth]{_fig_143.pdf} | ||||
\par | ||||
\end{center} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
\section{Image display} | ||||
The \texttt{imshow} command can display single or multi-channel images. | ||||
A simple array of random numbers, plotted in grayscale: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
from matplotlib import cm | ||||
plt.imshow(np.random.rand(5, 10), cmap=cm.gray, interpolation='nearest'); | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{center} | ||||
\includegraphics[width=0.7\textwidth]{_fig_146.pdf} | ||||
\par | ||||
\end{center} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
A real photograph is a multichannel image, \texttt{imshow} interprets it | ||||
correctly: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
img = plt.imread('stinkbug.png') | ||||
print 'Dimensions of the array img:', img.shape | ||||
plt.imshow(img); | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{verbatim} | ||||
Dimensions of the array img: (375, 500, 3) | ||||
\end{verbatim} | ||||
\begin{center} | ||||
\includegraphics[width=0.7\textwidth]{_fig_148.pdf} | ||||
\par | ||||
\end{center} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
\section{Simple 3d plotting with matplotlib} | ||||
Note that you must execute at least once in your session: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
from mpl_toolkits.mplot3d import Axes3D | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\end{codecell} | ||||
One this has been done, you can create 3d axes with the | ||||
\texttt{projection='3d'} keyword to \texttt{add\_subplot}: | ||||
\begin{verbatim} | ||||
fig = plt.figure() | ||||
fig.add_subplot(<other arguments here>, projection='3d') | ||||
\end{verbatim} | ||||
A simple surface plot: | ||||
\begin{codecell} | ||||
\begin{codeinput} | ||||
\begin{lstlisting} | ||||
from mpl_toolkits.mplot3d.axes3d import Axes3D | ||||
from matplotlib import cm | ||||
fig = plt.figure() | ||||
ax = fig.add_subplot(1, 1, 1, projection='3d') | ||||
X = np.arange(-5, 5, 0.25) | ||||
Y = np.arange(-5, 5, 0.25) | ||||
X, Y = np.meshgrid(X, Y) | ||||
R = np.sqrt(X**2 + Y**2) | ||||
Z = np.sin(R) | ||||
surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.jet, | ||||
linewidth=0, antialiased=False) | ||||
ax.set_zlim3d(-1.01, 1.01); | ||||
\end{lstlisting} | ||||
\end{codeinput} | ||||
\begin{codeoutput} | ||||
\begin{center} | ||||
\includegraphics[width=0.7\textwidth]{_fig_154.pdf} | ||||
\par | ||||
\end{center} | ||||
\end{codeoutput} | ||||
\end{codecell} | ||||
\section{IPython: a powerful interactive environment} | ||||
A key component of the everyday workflow of most scientific computing | ||||
environments is a good interactive environment, that is, a system in | ||||
which you can execute small amounts of code and view the results | ||||
immediately, combining both printing out data and opening graphical | ||||
visualizations. All modern systems for scientific computing, commercial | ||||
and open source, include such functionality. | ||||
Out of the box, Python also offers a simple interactive shell with very | ||||
limited capabilities. But just like the scientific community built Numpy | ||||
to provide arrays suited for scientific work (since Pytyhon's lists | ||||
aren't optimal for this task), it has also developed an interactive | ||||
environment much more sophisticated than the built-in one. The | ||||
\href{http://ipython.org}{IPython project} offers a set of tools to make | ||||
productive use of the Python language, all the while working | ||||
interactively and with immedate feedback on your results. The basic | ||||
tools that IPython provides are: | ||||
\begin{enumerate}[1.] | ||||
\item | ||||
A powerful terminal shell, with many features designed to increase the | ||||
fluidity and productivity of everyday scientific workflows, including: | ||||
\begin{itemize} | ||||
\item | ||||
rich introspection of all objects and variables including easy | ||||
access to the source code of any function | ||||
\item | ||||
powerful and extensible tab completion of variables and filenames, | ||||
\item | ||||
tight integration with matplotlib, supporting interactive figures | ||||
that don't block the terminal, | ||||
\item | ||||
direct access to the filesystem and underlying operating system, | ||||
\item | ||||
an extensible system for shell-like commands called `magics' that | ||||
reduce the work needed to perform many common tasks, | ||||
\item | ||||
tools for easily running, timing, profiling and debugging your | ||||
codes, | ||||
\item | ||||
syntax highlighted error messages with much more detail than the | ||||
default Python ones, | ||||
\item | ||||
logging and access to all previous history of inputs, including | ||||
across sessions | ||||
\end{itemize} | ||||
\item | ||||
A Qt console that provides the look and feel of a terminal, but adds | ||||
support for inline figures, graphical calltips, a persistent session | ||||
that can survive crashes (even segfaults) of the kernel process, and | ||||
more. | ||||
\item | ||||
A web-based notebook that can execute code and also contain rich text | ||||
and figures, mathematical equations and arbitrary HTML. This notebook | ||||
presents a document-like view with cells where code is executed but | ||||
that can be edited in-place, reordered, mixed with explanatory text | ||||
and figures, etc. | ||||
\item | ||||
A high-performance, low-latency system for parallel computing that | ||||
supports the control of a cluster of IPython engines communicating | ||||
over a network, with optimizations that minimize unnecessary copying | ||||
of large objects (especially numpy arrays). | ||||
\end{enumerate} | ||||
We will now discuss the highlights of the tools 1-3 above so that you | ||||
can make them an effective part of your workflow. The topic of parallel | ||||
computing is beyond the scope of this document, but we encourage you to | ||||
read the extensive | ||||
\href{http://ipython.org/ipython-doc/rel-0.12.1/parallel/index.html}{documentation} | ||||
and \href{http://minrk.github.com/scipy-tutorial-2011/}{tutorials} on | ||||
this available on the IPython website. | ||||
\section{The IPython terminal} | ||||
You can start IPython at the terminal simply by typing: | ||||
\begin{verbatim} | ||||
$ ipython | ||||
\end{verbatim} | ||||
which will provide you some basic information about how to get started | ||||
and will then open a prompt labeled \texttt{In {[}1{]}:} for you to | ||||
start typing. Here we type $2^{64}$ and Python computes the result for | ||||
us in exact arithmetic, returning it as \texttt{Out{[}1{]}}: | ||||
\begin{verbatim} | ||||
$ ipython | ||||
Python 2.7.2+ (default, Oct 4 2011, 20:03:08) | ||||
Type "copyright", "credits" or "license" for more information. | ||||
IPython 0.13.dev -- An enhanced Interactive Python. | ||||
? -> Introduction and overview of IPython's features. | ||||
%quickref -> Quick reference. | ||||
help -> Python's own help system. | ||||
object? -> Details about 'object', use 'object??' for extra details. | ||||
In [1]: 2**64 | ||||
Out[1]: 18446744073709551616L | ||||
\end{verbatim} | ||||
The first thing you should know about IPython is that all your inputs | ||||
and outputs are saved. There are two variables named \texttt{In} and | ||||
\texttt{Out} which are filled as you work with your results. | ||||
Furthermore, all outputs are also saved to auto-created variables of the | ||||
form \texttt{\_NN} where \texttt{NN} is the prompt number, and inputs to | ||||
\texttt{\_iNN}. This allows you to recover quickly the result of a prior | ||||
computation by referring to its number even if you forgot to store it as | ||||
a variable. For example, later on in the above session you can do: | ||||
\begin{verbatim} | ||||
In [6]: print _1 | ||||
18446744073709551616 | ||||
\end{verbatim} | ||||
We strongly recommend that you take a few minutes to read at least the | ||||
basic introduction provided by the \texttt{?} command, and keep in mind | ||||
that the \texttt{\%quickref} command at all times can be used as a quick | ||||
reference ``cheat sheet'' of the most frequently used features of | ||||
IPython. | ||||
At the IPython prompt, any valid Python code that you type will be | ||||
executed similarly to the default Python shell (though often with more | ||||
informative feedback). But since IPython is a \emph{superset} of the | ||||
default Python shell; let's have a brief look at some of its additional | ||||
functionality. | ||||
\textbf{Object introspection} | ||||
A simple \texttt{?} command provides a general introduction to IPython, | ||||
but as indicated in the banner above, you can use the \texttt{?} syntax | ||||
to ask for details about any object. For example, if we type | ||||
\texttt{\_1?}, IPython will print the following details about this | ||||
variable: | ||||
\begin{verbatim} | ||||
In [14]: _1? | ||||
Type: long | ||||
Base Class: <type 'long'> | ||||
String Form:18446744073709551616 | ||||
Namespace: Interactive | ||||
Docstring: | ||||
long(x[, base]) -> integer | ||||
Convert a string or number to a long integer, if possible. A floating | ||||
[etc... snipped for brevity] | ||||
\end{verbatim} | ||||
If you add a second \texttt{?} and for any oobject \texttt{x} type | ||||
\texttt{x??}, IPython will try to provide an even more detailed analsysi | ||||
of the object, including its syntax-highlighted source code when it can | ||||
be found. It's possible that \texttt{x??} returns the same information | ||||
as \texttt{x?}, but in many cases \texttt{x??} will indeed provide | ||||
additional details. | ||||
Finally, the \texttt{?} syntax is also useful to search | ||||
\emph{namespaces} with wildcards. Suppose you are wondering if there is | ||||
any function in Numpy that may do text-related things; with | ||||
\texttt{np.*txt*?}, IPython will print all the names in the \texttt{np} | ||||
namespace (our Numpy shorthand) that have `txt' anywhere in their name: | ||||
\begin{verbatim} | ||||
In [17]: np.*txt*? | ||||
np.genfromtxt | ||||
np.loadtxt | ||||
np.mafromtxt | ||||
np.ndfromtxt | ||||
np.recfromtxt | ||||
np.savetxt | ||||
\end{verbatim} | ||||
\textbf{Tab completion} | ||||
IPython makes the tab key work extra hard for you as a way to rapidly | ||||
inspect objects and libraries. Whenever you have typed something at the | ||||
prompt, by hitting the \texttt{\textless{}tab\textgreater{}} key IPython | ||||
will try to complete the rest of the line. For this, IPython will | ||||
analyze the text you had so far and try to search for Python data or | ||||
files that may match the context you have already provided. | ||||
For example, if you type \texttt{np.load} and hit the key, you'll see: | ||||
\begin{verbatim} | ||||
In [21]: np.load<TAB HERE> | ||||
np.load np.loads np.loadtxt | ||||
\end{verbatim} | ||||
so you can quickly find all the load-related functionality in numpy. Tab | ||||
completion works even for function arguments, for example consider this | ||||
function definition: | ||||
\begin{verbatim} | ||||
In [20]: def f(x, frobinate=False): | ||||
....: if frobinate: | ||||
....: return x**2 | ||||
....: | ||||
\end{verbatim} | ||||
If you now use the \texttt{\textless{}tab\textgreater{}} key after | ||||
having typed `fro' you'll get all valid Python completions, but those | ||||
marked with \texttt{=} at the end are known to be keywords of your | ||||
function: | ||||
\begin{verbatim} | ||||
In [21]: f(2, fro<TAB HERE> | ||||
frobinate= frombuffer fromfunction frompyfunc fromstring | ||||
from fromfile fromiter fromregex frozenset | ||||
\end{verbatim} | ||||
at this point you can add the \texttt{b} letter and hit | ||||
\texttt{\textless{}tab\textgreater{}} once more, and IPython will finish | ||||
the line for you: | ||||
\begin{verbatim} | ||||
In [21]: f(2, frobinate= | ||||
\end{verbatim} | ||||
As a beginner, simply get into the habit of using | ||||
\texttt{\textless{}tab\textgreater{}} after most objects; it should | ||||
quickly become second nature as you will see how helps keep a fluid | ||||
workflow and discover useful information. Later on you can also | ||||
customize this behavior by writing your own completion code, if you so | ||||
desire. | ||||
\textbf{Matplotlib integration} | ||||
One of the most useful features of IPython for scientists is its tight | ||||
integration with matplotlib: at the terminal IPython lets you open | ||||
matplotlib figures without blocking your typing (which is what happens | ||||
if you try to do the same thing at the default Python shell), and in the | ||||
Qt console and notebook you can even view your figures embedded in your | ||||
workspace next to the code that created them. | ||||
The matplotlib support can be either activated when you start IPython by | ||||
passing the \texttt{-{}-pylab} flag, or at any point later in your | ||||
session by using the \texttt{\%pylab} command. If you start IPython with | ||||
\texttt{-{}-pylab}, you'll see something like this (note the extra | ||||
message about pylab): | ||||
\begin{verbatim} | ||||
$ ipython --pylab | ||||
Python 2.7.2+ (default, Oct 4 2011, 20:03:08) | ||||
Type "copyright", "credits" or "license" for more information. | ||||
IPython 0.13.dev -- An enhanced Interactive Python. | ||||
? -> Introduction and overview of IPython's features. | ||||
%quickref -> Quick reference. | ||||
help -> Python's own help system. | ||||
object? -> Details about 'object', use 'object??' for extra details. | ||||
Welcome to pylab, a matplotlib-based Python environment [backend: Qt4Agg]. | ||||
For more information, type 'help(pylab)'. | ||||
In [1]: | ||||
\end{verbatim} | ||||
Furthermore, IPython will import \texttt{numpy} with the \texttt{np} | ||||
shorthand, \texttt{matplotlib.pyplot} as \texttt{plt}, and it will also | ||||
load all of the numpy and pyplot top-level names so that you can | ||||
directly type something like: | ||||
\begin{verbatim} | ||||
In [1]: x = linspace(0, 2*pi, 200) | ||||
In [2]: plot(x, sin(x)) | ||||
Out[2]: [<matplotlib.lines.Line2D at 0x9e7c16c>] | ||||
\end{verbatim} | ||||
instead of having to prefix each call with its full signature (as we | ||||
have been doing in the examples thus far): | ||||
\begin{verbatim} | ||||
In [3]: x = np.linspace(0, 2*np.pi, 200) | ||||
In [4]: plt.plot(x, np.sin(x)) | ||||
Out[4]: [<matplotlib.lines.Line2D at 0x9e900ac>] | ||||
\end{verbatim} | ||||
This shorthand notation can be a huge time-saver when working | ||||
interactively (it's a few characters but you are likely to type them | ||||
hundreds of times in a session). But we should note that as you develop | ||||
persistent scripts and notebooks meant for reuse, it's best to get in | ||||
the habit of using the longer notation (known as \emph{fully qualified | ||||
names} as it's clearer where things come from and it makes for more | ||||
robust, readable and maintainable code in the long run). | ||||
\textbf{Access to the operating system and files} | ||||
In IPython, you can type \texttt{ls} to see your files or \texttt{cd} to | ||||
change directories, just like you would at a regular system prompt: | ||||
\begin{verbatim} | ||||
In [2]: cd tests | ||||
/home/fperez/ipython/nbconvert/tests | ||||
In [3]: ls test.* | ||||
test.aux test.html test.ipynb test.log test.out test.pdf test.rst test.tex | ||||
\end{verbatim} | ||||
Furthermore, if you use the \texttt{!} at the beginning of a line, any | ||||
commands you pass afterwards go directly to the operating system: | ||||
\begin{verbatim} | ||||
In [4]: !echo "Hello IPython" | ||||
Hello IPython | ||||
\end{verbatim} | ||||
IPython offers a useful twist in this feature: it will substitute in the | ||||
command the value of any \emph{Python} variable you may have if you | ||||
prepend it with a \texttt{\$} sign: | ||||
\begin{verbatim} | ||||
In [5]: message = 'IPython interpolates from Python to the shell' | ||||
In [6]: !echo $message | ||||
IPython interpolates from Python to the shell | ||||
\end{verbatim} | ||||
This feature can be extremely useful, as it lets you combine the power | ||||
and clarity of Python for complex logic with the immediacy and | ||||
familiarity of many shell commands. Additionally, if you start the line | ||||
with \emph{two} \texttt{\$\$} signs, the output of the command will be | ||||
automatically captured as a list of lines, e.g.: | ||||
\begin{verbatim} | ||||
In [10]: !!ls test.* | ||||
Out[10]: | ||||
['test.aux', | ||||
'test.html', | ||||
'test.ipynb', | ||||
'test.log', | ||||
'test.out', | ||||
'test.pdf', | ||||
'test.rst', | ||||
'test.tex'] | ||||
\end{verbatim} | ||||
As explained above, you can now use this as the variable \texttt{\_10}. | ||||
If you directly want to capture the output of a system command to a | ||||
Python variable, you can use the syntax \texttt{=!}: | ||||
\begin{verbatim} | ||||
In [11]: testfiles =! ls test.* | ||||
In [12]: print testfiles | ||||
['test.aux', 'test.html', 'test.ipynb', 'test.log', 'test.out', 'test.pdf', 'test.rst', 'test.tex'] | ||||
\end{verbatim} | ||||
Finally, the special \texttt{\%alias} command lets you define names that | ||||
are shorthands for system commands, so that you can type them without | ||||
having to prefix them via \texttt{!} explicitly (for example, | ||||
\texttt{ls} is an alias that has been predefined for you at startup). | ||||
\textbf{Magic commands} | ||||
IPython has a system for special commands, called `magics', that let you | ||||
control IPython itself and perform many common tasks with a more | ||||
shell-like syntax: it uses spaces for delimiting arguments, flags can be | ||||
set with dashes and all arguments are treated as strings, so no | ||||
additional quoting is required. This kind of syntax is invalid in the | ||||
Python language but very convenient for interactive typing (less | ||||
parentheses, commans and quoting everywhere); IPython distinguishes the | ||||
two by detecting lines that start with the \texttt{\%} character. | ||||
You can learn more about the magic system by simply typing | ||||
\texttt{\%magic} at the prompt, which will give you a short description | ||||
plus the documentation on \emph{all} available magics. If you want to | ||||
see only a listing of existing magics, you can use \texttt{\%lsmagic}: | ||||
\begin{verbatim} | ||||
In [4]: lsmagic | ||||
Available magic functions: | ||||
%alias %autocall %autoindent %automagic %bookmark %c %cd %colors %config %cpaste | ||||
%debug %dhist %dirs %doctest_mode %ds %ed %edit %env %gui %hist %history | ||||
%install_default_config %install_ext %install_profiles %load_ext %loadpy %logoff %logon | ||||
%logstart %logstate %logstop %lsmagic %macro %magic %notebook %page %paste %pastebin | ||||
%pd %pdb %pdef %pdoc %pfile %pinfo %pinfo2 %pop %popd %pprint %precision %profile | ||||
%prun %psearch %psource %pushd %pwd %pycat %pylab %quickref %recall %rehashx | ||||
%reload_ext %rep %rerun %reset %reset_selective %run %save %sc %stop %store %sx %tb | ||||
%time %timeit %unalias %unload_ext %who %who_ls %whos %xdel %xmode | ||||
Automagic is ON, % prefix NOT needed for magic functions. | ||||
\end{verbatim} | ||||
Note how the example above omitted the eplicit \texttt{\%} marker and | ||||
simply uses \texttt{lsmagic}. As long as the `automagic' feature is on | ||||
(which it is by default), you can omit the \texttt{\%} marker as long as | ||||
there is no ambiguity with a Python variable of the same name. | ||||
\textbf{Running your code} | ||||
While it's easy to type a few lines of code in IPython, for any | ||||
long-lived work you should keep your codes in Python scripts (or in | ||||
IPython notebooks, see below). Consider that you have a script, in this | ||||
case trivially simple for the sake of brevity, named \texttt{simple.py}: | ||||
\begin{verbatim} | ||||
In [12]: !cat simple.py | ||||
import numpy as np | ||||
x = np.random.normal(size=100) | ||||
print 'First elment of x:', x[0] | ||||
\end{verbatim} | ||||
The typical workflow with IPython is to use the \texttt{\%run} magic to | ||||
execute your script (you can omit the .py extension if you want). When | ||||
you run it, the script will execute just as if it had been run at the | ||||
system prompt with \texttt{python simple.py} (though since modules don't | ||||
get re-executed on new imports by Python, all system initialization is | ||||
essentially free, which can have a significant run time impact in some | ||||
cases): | ||||
\begin{verbatim} | ||||
In [13]: run simple | ||||
First elment of x: -1.55872256289 | ||||
\end{verbatim} | ||||
Once it completes, all variables defined in it become available for you | ||||
to use interactively: | ||||
\begin{verbatim} | ||||
In [14]: x.shape | ||||
Out[14]: (100,) | ||||
\end{verbatim} | ||||
This allows you to plot data, try out ideas, etc, in a | ||||
\texttt{\%run}/interact/edit cycle that can be very productive. As you | ||||
start understanding your problem better you can refine your script | ||||
further, incrementally improving it based on the work you do at the | ||||
IPython prompt. At any point you can use the \texttt{\%hist} magic to | ||||
print out your history without prompts, so that you can copy useful | ||||
fragments back into the script. | ||||
By default, \texttt{\%run} executes scripts in a completely empty | ||||
namespace, to better mimic how they would execute at the system prompt | ||||
with plain Python. But if you use the \texttt{-i} flag, the script will | ||||
also see your interactively defined variables. This lets you edit in a | ||||
script larger amounts of code that still behave as if you had typed them | ||||
at the IPython prompt. | ||||
You can also get a summary of the time taken by your script with the | ||||
\texttt{-t} flag; consider a different script \texttt{randsvd.py} that | ||||
takes a bit longer to run: | ||||
\begin{verbatim} | ||||
In [21]: run -t randsvd.py | ||||
IPython CPU timings (estimated): | ||||
User : 0.38 s. | ||||
System : 0.04 s. | ||||
Wall time: 0.34 s. | ||||
\end{verbatim} | ||||
\texttt{User} is the time spent by the computer executing your code, | ||||
while \texttt{System} is the time the operating system had to work on | ||||
your behalf, doing things like memory allocation that are needed by your | ||||
code but that you didn't explicitly program and that happen inside the | ||||
kernel. The \texttt{Wall time} is the time on a `clock on the wall' | ||||
between the start and end of your program. | ||||
If \texttt{Wall \textgreater{} User+System}, your code is most likely | ||||
waiting idle for certain periods. That could be waiting for data to | ||||
arrive from a remote source or perhaps because the operating system has | ||||
to swap large amounts of virtual memory. If you know that your code | ||||
doesn't explicitly wait for remote data to arrive, you should | ||||
investigate further to identify possible ways of improving the | ||||
performance profile. | ||||
If you only want to time how long a single statement takes, you don't | ||||
need to put it into a script as you can use the \texttt{\%timeit} magic, | ||||
which uses Python's \texttt{timeit} module to very carefully measure | ||||
timig data; \texttt{timeit} can measure even short statements that | ||||
execute extremely fast: | ||||
\begin{verbatim} | ||||
In [27]: %timeit a=1 | ||||
10000000 loops, best of 3: 23 ns per loop | ||||
\end{verbatim} | ||||
and for code that runs longer, it automatically adjusts so the overall | ||||
measurement doesn't take too long: | ||||
\begin{verbatim} | ||||
In [28]: %timeit np.linalg.svd(x) | ||||
1 loops, best of 3: 310 ms per loop | ||||
\end{verbatim} | ||||
The \texttt{\%run} magic still has more options for debugging and | ||||
profiling data; you should read its documentation for many useful | ||||
details (as always, just type \texttt{\%run?}). | ||||
\section{The graphical Qt console} | ||||
If you type at the system prompt (see the IPython website for | ||||
installation details, as this requires some additional libraries): | ||||
\begin{verbatim} | ||||
$ ipython qtconsole | ||||
\end{verbatim} | ||||
instead of opening in a terminal as before, IPython will start a | ||||
graphical console that at first sight appears just like a terminal, but | ||||
which is in fact much more capable than a text-only terminal. This is a | ||||
specialized terminal designed for interactive scientific work, and it | ||||
supports full multi-line editing with color highlighting and graphical | ||||
calltips for functions, it can keep multiple IPython sessions open | ||||
simultaneously in tabs, and when scripts run it can display the figures | ||||
inline directly in the work area. | ||||
# % This cell is for the pdflatex output only | ||||
# \begin{figure}[htbp] | ||||
# \centering | ||||
# \includegraphics[width=3in]{ipython_qtconsole2.png} | ||||
# \caption{The IPython Qt console: a lightweight terminal for scientific exploration, with code, results and graphics in a soingle environment.} | ||||
# \end{figure} | ||||
The Qt console accepts the same \texttt{-{}-pylab} startup flags as the | ||||
terminal, but you can additionally supply the value | ||||
\texttt{-{}-pylab inline}, which enables the support for inline graphics | ||||
shown in the figure. This is ideal for keeping all the code and figures | ||||
in the same session, given that the console can save the output of your | ||||
entire session to HTML or PDF. | ||||
Since the Qt console makes it far more convenient than the terminal to | ||||
edit blocks of code with multiple lines, in this environment it's worth | ||||
knowing about the \texttt{\%loadpy} magic function. \texttt{\%loadpy} | ||||
takes a path to a local file or remote URL, fetches its contents, and | ||||
puts it in the work area for you to further edit and execute. It can be | ||||
an extremely fast and convenient way of loading code from local disk or | ||||
remote examples from sites such as the | ||||
\href{http://matplotlib.sourceforge.net/gallery.html}{Matplotlib | ||||
gallery}. | ||||
Other than its enhanced capabilities for code and graphics, all of the | ||||
features of IPython we've explained before remain functional in this | ||||
graphical console. | ||||
\section{The IPython Notebook} | ||||
The third way to interact with IPython, in addition to the terminal and | ||||
graphical Qt console, is a powerful web interface called the ``IPython | ||||
Notebook''. If you run at the system console (you can omit the | ||||
\texttt{pylab} flags if you don't need plotting support): | ||||
\begin{verbatim} | ||||
$ ipython notebook --pylab inline | ||||
\end{verbatim} | ||||
IPython will start a process that runs a web server in your local | ||||
machine and to which a web browser can connect. The Notebook is a | ||||
workspace that lets you execute code in blocks called `cells' and | ||||
displays any results and figures, but which can also contain arbitrary | ||||
text (including LaTeX-formatted mathematical expressions) and any rich | ||||
media that a modern web browser is capable of displaying. | ||||
# % This cell is for the pdflatex output only | ||||
# \begin{figure}[htbp] | ||||
# \centering | ||||
# \includegraphics[width=3in]{ipython-notebook-specgram-2.png} | ||||
# \caption{The IPython Notebook: text, equations, code, results, graphics and other multimedia in an open format for scientific exploration and collaboration} | ||||
# \end{figure} | ||||
In fact, this document was written as a Notebook, and only exported to | ||||
LaTeX for printing. Inside of each cell, all the features of IPython | ||||
that we have discussed before remain functional, since ultimately this | ||||
web client is communicating with the same IPython code that runs in the | ||||
terminal. But this interface is a much more rich and powerful | ||||
environment for maintaining long-term ``live and executable'' scientific | ||||
documents. | ||||
Notebook environments have existed in commercial systems like | ||||
Mathematica(TM) and Maple(TM) for a long time; in the open source world | ||||
the \href{http://sagemath.org}{Sage} project blazed this particular | ||||
trail starting in 2006, and now we bring all the features that have made | ||||
IPython such a widely used tool to a Notebook model. | ||||
Since the Notebook runs as a web application, it is possible to | ||||
configure it for remote access, letting you run your computations on a | ||||
persistent server close to your data, which you can then access remotely | ||||
from any browser-equipped computer. We encourage you to read the | ||||
extensive documentation provided by the IPython project for details on | ||||
how to do this and many more features of the notebook. | ||||
Finally, as we said earlier, IPython also has a high-level and easy to | ||||
use set of libraries for parallel computing, that let you control | ||||
(interactively if desired) not just one IPython but an entire cluster of | ||||
`IPython engines'. Unfortunately a detailed discussion of these tools is | ||||
beyond the scope of this text, but should you need to parallelize your | ||||
analysis codes, a quick read of the tutorials and examples provided at | ||||
the IPython site may prove fruitful. | ||||
\end{document} | ||||