|
|
%% This file was auto-generated by IPython.
|
|
|
%% Conversion from the original notebook file:
|
|
|
%% tests/ipynbref/IntroNumPy.orig.ipynb
|
|
|
%%
|
|
|
\documentclass[11pt,english]{article}
|
|
|
|
|
|
%% This is the automatic preamble used by IPython. Note that it does *not*
|
|
|
%% include a documentclass declaration, that is added at runtime to the overall
|
|
|
%% document.
|
|
|
|
|
|
\usepackage{amsmath}
|
|
|
\usepackage{amssymb}
|
|
|
\usepackage{graphicx}
|
|
|
\usepackage{ucs}
|
|
|
\usepackage[utf8x]{inputenc}
|
|
|
|
|
|
% needed for markdown enumerations to work
|
|
|
\usepackage{enumerate}
|
|
|
|
|
|
% Slightly bigger margins than the latex defaults
|
|
|
\usepackage{geometry}
|
|
|
\geometry{verbose,tmargin=3cm,bmargin=3cm,lmargin=2.5cm,rmargin=2.5cm}
|
|
|
|
|
|
% Define a few colors for use in code, links and cell shading
|
|
|
\usepackage{color}
|
|
|
\definecolor{orange}{cmyk}{0,0.4,0.8,0.2}
|
|
|
\definecolor{darkorange}{rgb}{.71,0.21,0.01}
|
|
|
\definecolor{darkgreen}{rgb}{.12,.54,.11}
|
|
|
\definecolor{myteal}{rgb}{.26, .44, .56}
|
|
|
\definecolor{gray}{gray}{0.45}
|
|
|
\definecolor{lightgray}{gray}{.95}
|
|
|
\definecolor{mediumgray}{gray}{.8}
|
|
|
\definecolor{inputbackground}{rgb}{.95, .95, .85}
|
|
|
\definecolor{outputbackground}{rgb}{.95, .95, .95}
|
|
|
\definecolor{traceback}{rgb}{1, .95, .95}
|
|
|
|
|
|
% Framed environments for code cells (inputs, outputs, errors, ...). The
|
|
|
% various uses of \unskip (or not) at the end were fine-tuned by hand, so don't
|
|
|
% randomly change them unless you're sure of the effect it will have.
|
|
|
\usepackage{framed}
|
|
|
|
|
|
% remove extraneous vertical space in boxes
|
|
|
\setlength\fboxsep{0pt}
|
|
|
|
|
|
% codecell is the whole input+output set of blocks that a Code cell can
|
|
|
% generate.
|
|
|
|
|
|
% TODO: unfortunately, it seems that using a framed codecell environment breaks
|
|
|
% the ability of the frames inside of it to be broken across pages. This
|
|
|
% causes at least the problem of having lots of empty space at the bottom of
|
|
|
% pages as new frames are moved to the next page, and if a single frame is too
|
|
|
% long to fit on a page, will completely stop latex from compiling the
|
|
|
% document. So unless we figure out a solution to this, we'll have to instead
|
|
|
% leave the codecell env. as empty. I'm keeping the original codecell
|
|
|
% definition here (a thin vertical bar) for reference, in case we find a
|
|
|
% solution to the page break issue.
|
|
|
|
|
|
%% \newenvironment{codecell}{%
|
|
|
%% \def\FrameCommand{\color{mediumgray} \vrule width 1pt \hspace{5pt}}%
|
|
|
%% \MakeFramed{\vspace{-0.5em}}}
|
|
|
%% {\unskip\endMakeFramed}
|
|
|
|
|
|
% For now, make this a no-op...
|
|
|
\newenvironment{codecell}{}
|
|
|
|
|
|
\newenvironment{codeinput}{%
|
|
|
\def\FrameCommand{\colorbox{inputbackground}}%
|
|
|
\MakeFramed{\advance\hsize-\width \FrameRestore}}
|
|
|
{\unskip\endMakeFramed}
|
|
|
|
|
|
\newenvironment{codeoutput}{%
|
|
|
\def\FrameCommand{\colorbox{outputbackground}}%
|
|
|
\vspace{-1.4em}
|
|
|
\MakeFramed{\advance\hsize-\width \FrameRestore}}
|
|
|
{\unskip\medskip\endMakeFramed}
|
|
|
|
|
|
\newenvironment{traceback}{%
|
|
|
\def\FrameCommand{\colorbox{traceback}}%
|
|
|
\MakeFramed{\advance\hsize-\width \FrameRestore}}
|
|
|
{\endMakeFramed}
|
|
|
|
|
|
% Use and configure listings package for nicely formatted code
|
|
|
\usepackage{listingsutf8}
|
|
|
\lstset{
|
|
|
language=python,
|
|
|
inputencoding=utf8x,
|
|
|
extendedchars=\true,
|
|
|
aboveskip=\smallskipamount,
|
|
|
belowskip=\smallskipamount,
|
|
|
xleftmargin=2mm,
|
|
|
breaklines=true,
|
|
|
basicstyle=\small \ttfamily,
|
|
|
showstringspaces=false,
|
|
|
keywordstyle=\color{blue}\bfseries,
|
|
|
commentstyle=\color{myteal},
|
|
|
stringstyle=\color{darkgreen},
|
|
|
identifierstyle=\color{darkorange},
|
|
|
columns=fullflexible, % tighter character kerning, like verb
|
|
|
}
|
|
|
|
|
|
% The hyperref package gives us a pdf with properly built
|
|
|
% internal navigation ('pdf bookmarks' for the table of contents,
|
|
|
% internal cross-reference links, web links for URLs, etc.)
|
|
|
\usepackage{hyperref}
|
|
|
\hypersetup{
|
|
|
breaklinks=true, % so long urls are correctly broken across lines
|
|
|
colorlinks=true,
|
|
|
urlcolor=blue,
|
|
|
linkcolor=darkorange,
|
|
|
citecolor=darkgreen,
|
|
|
}
|
|
|
|
|
|
% hardcode size of all verbatim environments to be a bit smaller
|
|
|
\makeatletter
|
|
|
\g@addto@macro\@verbatim\small\topsep=0.5em\partopsep=0pt
|
|
|
\makeatother
|
|
|
|
|
|
% Prevent overflowing lines due to urls and other hard-to-break entities.
|
|
|
\sloppy
|
|
|
|
|
|
\begin{document}
|
|
|
|
|
|
\section{An Introduction to the Scientific Python Ecosystem}
|
|
|
While the Python language is an excellent tool for general-purpose
|
|
|
programming, with a highly readable syntax, rich and powerful data types
|
|
|
(strings, lists, sets, dictionaries, arbitrary length integers, etc) and
|
|
|
a very comprehensive standard library, it was not designed specifically
|
|
|
for mathematical and scientific computing. Neither the language nor its
|
|
|
standard library have facilities for the efficient representation of
|
|
|
multidimensional datasets, tools for linear algebra and general matrix
|
|
|
manipulations (an essential building block of virtually all technical
|
|
|
computing), nor any data visualization facilities.
|
|
|
|
|
|
In particular, Python lists are very flexible containers that can be
|
|
|
nested arbitrarily deep and which can hold any Python object in them,
|
|
|
but they are poorly suited to represent efficiently common mathematical
|
|
|
constructs like vectors and matrices. In contrast, much of our modern
|
|
|
heritage of scientific computing has been built on top of libraries
|
|
|
written in the Fortran language, which has native support for vectors
|
|
|
and matrices as well as a library of mathematical functions that can
|
|
|
efficiently operate on entire arrays at once.
|
|
|
|
|
|
\subsection{Scientific Python: a collaboration of projects built by scientists}
|
|
|
The scientific community has developed a set of related Python libraries
|
|
|
that provide powerful array facilities, linear algebra, numerical
|
|
|
algorithms, data visualization and more. In this appendix, we will
|
|
|
briefly outline the tools most frequently used for this purpose, that
|
|
|
make ``Scientific Python'' something far more powerful than the Python
|
|
|
language alone.
|
|
|
|
|
|
For reasons of space, we can only describe in some detail the central
|
|
|
Numpy library, but below we provide links to the websites of each
|
|
|
project where you can read their documentation in more detail.
|
|
|
|
|
|
First, let's look at an overview of the basic tools that most scientists
|
|
|
use in daily research with Python. The core of this ecosystem is
|
|
|
composed of:
|
|
|
|
|
|
\begin{itemize}
|
|
|
\item
|
|
|
Numpy: the basic library that most others depend on, it provides a
|
|
|
powerful array type that can represent multidmensional datasets of
|
|
|
many different kinds and that supports arithmetic operations. Numpy
|
|
|
also provides a library of common mathematical functions, basic linear
|
|
|
algebra, random number generation and Fast Fourier Transforms. Numpy
|
|
|
can be found at \href{http://numpy.scipy.org}{numpy.scipy.org}
|
|
|
\item
|
|
|
Scipy: a large collection of numerical algorithms that operate on
|
|
|
numpy arrays and provide facilities for many common tasks in
|
|
|
scientific computing, including dense and sparse linear algebra
|
|
|
support, optimization, special functions, statistics, n-dimensional
|
|
|
image processing, signal processing and more. Scipy can be found at
|
|
|
\href{http://scipy.org}{scipy.org}.
|
|
|
\item
|
|
|
Matplotlib: a data visualization library with a strong focus on
|
|
|
producing high-quality output, it supports a variety of common
|
|
|
scientific plot types in two and three dimensions, with precise
|
|
|
control over the final output and format for publication-quality
|
|
|
results. Matplotlib can also be controlled interactively allowing
|
|
|
graphical manipulation of your data (zooming, panning, etc) and can be
|
|
|
used with most modern user interface toolkits. It can be found at
|
|
|
\href{http://matplotlib.sf.net}{matplotlib.sf.net}.
|
|
|
\item
|
|
|
IPython: while not strictly scientific in nature, IPython is the
|
|
|
interactive environment in which many scientists spend their time.
|
|
|
IPython provides a powerful Python shell that integrates tightly with
|
|
|
Matplotlib and with easy access to the files and operating system, and
|
|
|
which can execute in a terminal or in a graphical Qt console. IPython
|
|
|
also has a web-based notebook interface that can combine code with
|
|
|
text, mathematical expressions, figures and multimedia. It can be
|
|
|
found at \href{http://ipython.org}{ipython.org}.
|
|
|
\end{itemize}
|
|
|
While each of these tools can be installed separately, in our opinion
|
|
|
the most convenient way today of accessing them (especially on Windows
|
|
|
and Mac computers) is to install the
|
|
|
\href{http://www.enthought.com/products/epd\_free.php}{Free Edition of
|
|
|
the Enthought Python Distribution} which contain all the above. Other
|
|
|
free alternatives on Windows (but not on Macs) are
|
|
|
\href{http://code.google.com/p/pythonxy}{Python(x,y)} and
|
|
|
\href{http://www.lfd.uci.edu/~gohlke/pythonlibs}{Christoph Gohlke's
|
|
|
packages page}.
|
|
|
|
|
|
These four `core' libraries are in practice complemented by a number of
|
|
|
other tools for more specialized work. We will briefly list here the
|
|
|
ones that we think are the most commonly needed:
|
|
|
|
|
|
\begin{itemize}
|
|
|
\item
|
|
|
Sympy: a symbolic manipulation tool that turns a Python session into a
|
|
|
computer algebra system. It integrates with the IPython notebook,
|
|
|
rendering results in properly typeset mathematical notation.
|
|
|
\href{http://sympy.org}{sympy.org}.
|
|
|
\item
|
|
|
Mayavi: sophisticated 3d data visualization;
|
|
|
\href{http://code.enthought.com/projects/mayavi}{code.enthought.com/projects/mayavi}.
|
|
|
\item
|
|
|
Cython: a bridge language between Python and C, useful both to
|
|
|
optimize performance bottlenecks in Python and to access C libraries
|
|
|
directly; \href{http://cython.org}{cython.org}.
|
|
|
\item
|
|
|
Pandas: high-performance data structures and data analysis tools, with
|
|
|
powerful data alignment and structural manipulation capabilities;
|
|
|
\href{http://pandas.pydata.org}{pandas.pydata.org}.
|
|
|
\item
|
|
|
Statsmodels: statistical data exploration and model estimation;
|
|
|
\href{http://statsmodels.sourceforge.net}{statsmodels.sourceforge.net}.
|
|
|
\item
|
|
|
Scikit-learn: general purpose machine learning algorithms with a
|
|
|
common interface; \href{http://scikit-learn.org}{scikit-learn.org}.
|
|
|
\item
|
|
|
Scikits-image: image processing toolbox;
|
|
|
\href{http://scikits-image.org}{scikits-image.org}.
|
|
|
\item
|
|
|
NetworkX: analysis of complex networks (in the graph theoretical
|
|
|
sense); \href{http://networkx.lanl.gov}{networkx.lanl.gov}.
|
|
|
\item
|
|
|
PyTables: management of hierarchical datasets using the
|
|
|
industry-standard HDF5 format;
|
|
|
\href{http://www.pytables.org}{www.pytables.org}.
|
|
|
\end{itemize}
|
|
|
Beyond these, for any specific problem you should look on the internet
|
|
|
first, before starting to write code from scratch. There's a good chance
|
|
|
that someone, somewhere, has written an open source library that you can
|
|
|
use for part or all of your problem.
|
|
|
|
|
|
\subsection{A note about the examples below}
|
|
|
In all subsequent examples, you will see blocks of input code, followed
|
|
|
by the results of the code if the code generated output. This output may
|
|
|
include text, graphics and other result objects. These blocks of input
|
|
|
can be pasted into your interactive IPython session or notebook for you
|
|
|
to execute. In the print version of this document, a thin vertical bar
|
|
|
on the left of the blocks of input and output shows which blocks go
|
|
|
together.
|
|
|
|
|
|
If you are reading this text as an actual IPython notebook, you can
|
|
|
press \texttt{Shift-Enter} or use the `play' button on the toolbar
|
|
|
(right-pointing triangle) to execute each block of code, known as a
|
|
|
`cell' in IPython:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
# This is a block of code, below you'll see its output
|
|
|
print "Welcome to the world of scientific computing with Python!"
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
Welcome to the world of scientific computing with Python!
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
\section{Motivation: the trapezoidal rule}
|
|
|
In subsequent sections we'll provide a basic introduction to the nuts
|
|
|
and bolts of the basic scientific python tools; but we'll first motivate
|
|
|
it with a brief example that illustrates what you can do in a few lines
|
|
|
with these tools. For this, we will use the simple problem of
|
|
|
approximating a definite integral with the trapezoid rule:
|
|
|
|
|
|
\[
|
|
|
\int_{a}^{b} f(x)\, dx \approx \frac{1}{2} \sum_{k=1}^{N} \left( x_{k} - x_{k-1} \right) \left( f(x_{k}) + f(x_{k-1}) \right).
|
|
|
\]
|
|
|
|
|
|
Our task will be to compute this formula for a function such as:
|
|
|
|
|
|
\[
|
|
|
f(x) = (x-3)(x-5)(x-7)+85
|
|
|
\]
|
|
|
|
|
|
integrated between $a=1$ and $b=9$.
|
|
|
|
|
|
First, we define the function and sample it evenly between 0 and 10 at
|
|
|
200 points:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
def f(x):
|
|
|
return (x-3)*(x-5)*(x-7)+85
|
|
|
|
|
|
import numpy as np
|
|
|
x = np.linspace(0, 10, 200)
|
|
|
y = f(x)
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\end{codecell}
|
|
|
We select $a$ and $b$, our integration limits, and we take only a few
|
|
|
points in that region to illustrate the error behavior of the trapezoid
|
|
|
approximation:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
a, b = 1, 9
|
|
|
xint = x[logical_and(x>=a, x<=b)][::30]
|
|
|
yint = y[logical_and(x>=a, x<=b)][::30]
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\end{codecell}
|
|
|
Let's plot both the function and the area below it in the trapezoid
|
|
|
approximation:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
import matplotlib.pyplot as plt
|
|
|
plt.plot(x, y, lw=2)
|
|
|
plt.axis([0, 10, 0, 140])
|
|
|
plt.fill_between(xint, 0, yint, facecolor='gray', alpha=0.4)
|
|
|
plt.text(0.5 * (a + b), 30,r"$\int_a^b f(x)dx$", horizontalalignment='center', fontsize=20);
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{center}
|
|
|
\includegraphics[width=0.7\textwidth]{IntroNumPy_orig_files/IntroNumPy_orig_fig_00.pdf}
|
|
|
\par
|
|
|
\end{center}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
Compute the integral both at high accuracy and with the trapezoid
|
|
|
approximation
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
from scipy.integrate import quad, trapz
|
|
|
integral, error = quad(f, 1, 9)
|
|
|
trap_integral = trapz(yint, xint)
|
|
|
print "The integral is: %g +/- %.1e" % (integral, error)
|
|
|
print "The trapezoid approximation with", len(xint), "points is:", trap_integral
|
|
|
print "The absolute error is:", abs(integral - trap_integral)
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
The integral is: 680 +/- 7.5e-12
|
|
|
The trapezoid approximation with 6 points is: 621.286411141
|
|
|
The absolute error is: 58.7135888589
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
This simple example showed us how, combining the numpy, scipy and
|
|
|
matplotlib libraries we can provide an illustration of a standard method
|
|
|
in elementary calculus with just a few lines of code. We will now
|
|
|
discuss with more detail the basic usage of these tools.
|
|
|
|
|
|
\section{NumPy arrays: the right data structure for scientific computing}
|
|
|
\subsection{Basics of Numpy arrays}
|
|
|
We now turn our attention to the Numpy library, which forms the base
|
|
|
layer for the entire `scipy ecosystem'. Once you have installed numpy,
|
|
|
you can import it as
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
import numpy
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\end{codecell}
|
|
|
though in this book we will use the common shorthand
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
import numpy as np
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\end{codecell}
|
|
|
As mentioned above, the main object provided by numpy is a powerful
|
|
|
array. We'll start by exploring how the numpy array differs from Python
|
|
|
lists. We start by creating a simple list and an array with the same
|
|
|
contents of the list:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
lst = [10, 20, 30, 40]
|
|
|
arr = np.array([10, 20, 30, 40])
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\end{codecell}
|
|
|
Elements of a one-dimensional array are accessed with the same syntax as
|
|
|
a list:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
lst[0]
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
10
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
arr[0]
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
10
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
arr[-1]
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
40
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
arr[2:]
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
array([30, 40])
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
The first difference to note between lists and arrays is that arrays are
|
|
|
\emph{homogeneous}; i.e.~all elements of an array must be of the same
|
|
|
type. In contrast, lists can contain elements of arbitrary type. For
|
|
|
example, we can change the last element in our list above to be a
|
|
|
string:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
lst[-1] = 'a string inside a list'
|
|
|
lst
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
[10, 20, 30, 'a string inside a list']
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
but the same can not be done with an array, as we get an error message:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
arr[-1] = 'a string inside an array'
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{traceback}
|
|
|
\begin{verbatim}
|
|
|
---------------------------------------------------------------------------
|
|
|
ValueError Traceback (most recent call last)
|
|
|
/home/fperez/teach/book-math-labtool/<ipython-input-13-29c0bfa5fa8a> in <module>()
|
|
|
----> 1 arr[-1] = 'a string inside an array'
|
|
|
|
|
|
ValueError: invalid literal for long() with base 10: 'a string inside an array'
|
|
|
\end{verbatim}
|
|
|
\end{traceback}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
The information about the type of an array is contained in its
|
|
|
\emph{dtype} attribute:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
arr.dtype
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
dtype('int32')
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
Once an array has been created, its dtype is fixed and it can only store
|
|
|
elements of the same type. For this example where the dtype is integer,
|
|
|
if we store a floating point number it will be automatically converted
|
|
|
into an integer:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
arr[-1] = 1.234
|
|
|
arr
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
array([10, 20, 30, 1])
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
Above we created an array from an existing list; now let us now see
|
|
|
other ways in which we can create arrays, which we'll illustrate next. A
|
|
|
common need is to have an array initialized with a constant value, and
|
|
|
very often this value is 0 or 1 (suitable as starting value for additive
|
|
|
and multiplicative loops respectively); \texttt{zeros} creates arrays of
|
|
|
all zeros, with any desired dtype:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
np.zeros(5, float)
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
array([ 0., 0., 0., 0., 0.])
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
np.zeros(3, int)
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
array([0, 0, 0])
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
np.zeros(3, complex)
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
array([ 0.+0.j, 0.+0.j, 0.+0.j])
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
and similarly for \texttt{ones}:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
print '5 ones:', np.ones(5)
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
5 ones: [ 1. 1. 1. 1. 1.]
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
If we want an array initialized with an arbitrary value, we can create
|
|
|
an empty array and then use the fill method to put the value we want
|
|
|
into the array:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
a = empty(4)
|
|
|
a.fill(5.5)
|
|
|
a
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
array([ 5.5, 5.5, 5.5, 5.5])
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
Numpy also offers the \texttt{arange} function, which works like the
|
|
|
builtin \texttt{range} but returns an array instead of a list:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
np.arange(5)
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
array([0, 1, 2, 3, 4])
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
and the \texttt{linspace} and \texttt{logspace} functions to create
|
|
|
linearly and logarithmically-spaced grids respectively, with a fixed
|
|
|
number of points and including both ends of the specified interval:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
print "A linear grid between 0 and 1:", np.linspace(0, 1, 5)
|
|
|
print "A logarithmic grid between 10**1 and 10**4: ", np.logspace(1, 4, 4)
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
A linear grid between 0 and 1: [ 0. 0.25 0.5 0.75 1. ]
|
|
|
A logarithmic grid between 10**1 and 10**4: [ 10. 100. 1000. 10000.]
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
Finally, it is often useful to create arrays with random numbers that
|
|
|
follow a specific distribution. The \texttt{np.random} module contains a
|
|
|
number of functions that can be used to this effect, for example this
|
|
|
will produce an array of 5 random samples taken from a standard normal
|
|
|
distribution (0 mean and variance 1):
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
np.random.randn(5)
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
array([-0.08633343, -0.67375434, 1.00589536, 0.87081651, 1.65597822])
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
whereas this will also give 5 samples, but from a normal distribution
|
|
|
with a mean of 10 and a variance of 3:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
norm10 = np.random.normal(10, 3, 5)
|
|
|
norm10
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
array([ 8.94879575, 5.53038269, 8.24847281, 12.14944165, 11.56209294])
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
\subsection{Indexing with other arrays}
|
|
|
Above we saw how to index arrays with single numbers and slices, just
|
|
|
like Python lists. But arrays allow for a more sophisticated kind of
|
|
|
indexing which is very powerful: you can index an array with another
|
|
|
array, and in particular with an array of boolean values. This is
|
|
|
particluarly useful to extract information from an array that matches a
|
|
|
certain condition.
|
|
|
|
|
|
Consider for example that in the array \texttt{norm10} we want to
|
|
|
replace all values above 9 with the value 0. We can do so by first
|
|
|
finding the \emph{mask} that indicates where this condition is true or
|
|
|
false:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
mask = norm10 > 9
|
|
|
mask
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
array([False, False, False, True, True], dtype=bool)
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
Now that we have this mask, we can use it to either read those values or
|
|
|
to reset them to 0:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
print 'Values above 9:', norm10[mask]
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
Values above 9: [ 12.14944165 11.56209294]
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
print 'Resetting all values above 9 to 0...'
|
|
|
norm10[mask] = 0
|
|
|
print norm10
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
Resetting all values above 9 to 0...
|
|
|
[ 8.94879575 5.53038269 8.24847281 0. 0. ]
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
\subsection{Arrays with more than one dimension}
|
|
|
Up until now all our examples have used one-dimensional arrays. But
|
|
|
Numpy can create arrays of aribtrary dimensions, and all the methods
|
|
|
illustrated in the previous section work with more than one dimension.
|
|
|
For example, a list of lists can be used to initialize a two dimensional
|
|
|
array:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
lst2 = [[1, 2], [3, 4]]
|
|
|
arr2 = np.array([[1, 2], [3, 4]])
|
|
|
arr2
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
array([[1, 2],
|
|
|
[3, 4]])
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
With two-dimensional arrays we start seeing the power of numpy: while a
|
|
|
nested list can be indexed using repeatedly the \texttt{{[} {]}}
|
|
|
operator, multidimensional arrays support a much more natural indexing
|
|
|
syntax with a single \texttt{{[} {]}} and a set of indices separated by
|
|
|
commas:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
print lst2[0][1]
|
|
|
print arr2[0,1]
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
2
|
|
|
2
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
Most of the array creation functions listed above can be used with more
|
|
|
than one dimension, for example:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
np.zeros((2,3))
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
array([[ 0., 0., 0.],
|
|
|
[ 0., 0., 0.]])
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
np.random.normal(10, 3, (2, 4))
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
array([[ 11.26788826, 4.29619866, 11.09346496, 9.73861307],
|
|
|
[ 10.54025996, 9.5146268 , 10.80367214, 13.62204505]])
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
In fact, the shape of an array can be changed at any time, as long as
|
|
|
the total number of elements is unchanged. For example, if we want a 2x4
|
|
|
array with numbers increasing from 0, the easiest way to create it is:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
arr = np.arange(8).reshape(2,4)
|
|
|
print arr
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
[[0 1 2 3]
|
|
|
[4 5 6 7]]
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
With multidimensional arrays, you can also use slices, and you can mix
|
|
|
and match slices and single indices in the different dimensions (using
|
|
|
the same array as above):
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
print 'Slicing in the second row:', arr[1, 2:4]
|
|
|
print 'All rows, third column :', arr[:, 2]
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
Slicing in the second row: [6 7]
|
|
|
All rows, third column : [2 6]
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
If you only provide one index, then you will get an array with one less
|
|
|
dimension containing that row:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
print 'First row: ', arr[0]
|
|
|
print 'Second row: ', arr[1]
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
First row: [0 1 2 3]
|
|
|
Second row: [4 5 6 7]
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
Now that we have seen how to create arrays with more than one dimension,
|
|
|
it's a good idea to look at some of the most useful properties and
|
|
|
methods that arrays have. The following provide basic information about
|
|
|
the size, shape and data in the array:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
print 'Data type :', arr.dtype
|
|
|
print 'Total number of elements :', arr.size
|
|
|
print 'Number of dimensions :', arr.ndim
|
|
|
print 'Shape (dimensionality) :', arr.shape
|
|
|
print 'Memory used (in bytes) :', arr.nbytes
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
Data type : int32
|
|
|
Total number of elements : 8
|
|
|
Number of dimensions : 2
|
|
|
Shape (dimensionality) : (2, 4)
|
|
|
Memory used (in bytes) : 32
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
Arrays also have many useful methods, some especially useful ones are:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
print 'Minimum and maximum :', arr.min(), arr.max()
|
|
|
print 'Sum and product of all elements :', arr.sum(), arr.prod()
|
|
|
print 'Mean and standard deviation :', arr.mean(), arr.std()
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
Minimum and maximum : 0 7
|
|
|
Sum and product of all elements : 28 0
|
|
|
Mean and standard deviation : 3.5 2.29128784748
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
For these methods, the above operations area all computed on all the
|
|
|
elements of the array. But for a multidimensional array, it's possible
|
|
|
to do the computation along a single dimension, by passing the
|
|
|
\texttt{axis} parameter; for example:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
print 'For the following array:\n', arr
|
|
|
print 'The sum of elements along the rows is :', arr.sum(axis=1)
|
|
|
print 'The sum of elements along the columns is :', arr.sum(axis=0)
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
For the following array:
|
|
|
[[0 1 2 3]
|
|
|
[4 5 6 7]]
|
|
|
The sum of elements along the rows is : [ 6 22]
|
|
|
The sum of elements along the columns is : [ 4 6 8 10]
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
As you can see in this example, the value of the \texttt{axis} parameter
|
|
|
is the dimension which will be \emph{consumed} once the operation has
|
|
|
been carried out. This is why to sum along the rows we use
|
|
|
\texttt{axis=0}.
|
|
|
|
|
|
This can be easily illustrated with an example that has more dimensions;
|
|
|
we create an array with 4 dimensions and shape \texttt{(3,4,5,6)} and
|
|
|
sum along the axis number 2 (i.e.~the \emph{third} axis, since in Python
|
|
|
all counts are 0-based). That consumes the dimension whose length was 5,
|
|
|
leaving us with a new array that has shape \texttt{(3,4,6)}:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
np.zeros((3,4,5,6)).sum(2).shape
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
(3, 4, 6)
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
Another widely used property of arrays is the \texttt{.T} attribute,
|
|
|
which allows you to access the transpose of the array:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
print 'Array:\n', arr
|
|
|
print 'Transpose:\n', arr.T
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
Array:
|
|
|
[[0 1 2 3]
|
|
|
[4 5 6 7]]
|
|
|
Transpose:
|
|
|
[[0 4]
|
|
|
[1 5]
|
|
|
[2 6]
|
|
|
[3 7]]
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
We don't have time here to look at all the methods and properties of
|
|
|
arrays, here's a complete list. Simply try exploring some of these
|
|
|
IPython to learn more, or read their description in the full Numpy
|
|
|
documentation:
|
|
|
|
|
|
\begin{verbatim}
|
|
|
arr.T arr.copy arr.getfield arr.put arr.squeeze
|
|
|
arr.all arr.ctypes arr.imag arr.ravel arr.std
|
|
|
arr.any arr.cumprod arr.item arr.real arr.strides
|
|
|
arr.argmax arr.cumsum arr.itemset arr.repeat arr.sum
|
|
|
arr.argmin arr.data arr.itemsize arr.reshape arr.swapaxes
|
|
|
arr.argsort arr.diagonal arr.max arr.resize arr.take
|
|
|
arr.astype arr.dot arr.mean arr.round arr.tofile
|
|
|
arr.base arr.dtype arr.min arr.searchsorted arr.tolist
|
|
|
arr.byteswap arr.dump arr.nbytes arr.setasflat arr.tostring
|
|
|
arr.choose arr.dumps arr.ndim arr.setfield arr.trace
|
|
|
arr.clip arr.fill arr.newbyteorder arr.setflags arr.transpose
|
|
|
arr.compress arr.flags arr.nonzero arr.shape arr.var
|
|
|
arr.conj arr.flat arr.prod arr.size arr.view
|
|
|
arr.conjugate arr.flatten arr.ptp arr.sort
|
|
|
\end{verbatim}
|
|
|
|
|
|
|
|
|
\subsection{Operating with arrays}
|
|
|
Arrays support all regular arithmetic operators, and the numpy library
|
|
|
also contains a complete collection of basic mathematical functions that
|
|
|
operate on arrays. It is important to remember that in general, all
|
|
|
operations with arrays are applied \emph{element-wise}, i.e., are
|
|
|
applied to all the elements of the array at the same time. Consider for
|
|
|
example:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
arr1 = np.arange(4)
|
|
|
arr2 = np.arange(10, 14)
|
|
|
print arr1, '+', arr2, '=', arr1+arr2
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
[0 1 2 3] + [10 11 12 13] = [10 12 14 16]
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
Importantly, you must remember that even the multiplication operator is
|
|
|
by default applied element-wise, it is \emph{not} the matrix
|
|
|
multiplication from linear algebra (as is the case in Matlab, for
|
|
|
example):
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
print arr1, '*', arr2, '=', arr1*arr2
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
[0 1 2 3] * [10 11 12 13] = [ 0 11 24 39]
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
While this means that in principle arrays must always match in their
|
|
|
dimensionality in order for an operation to be valid, numpy will
|
|
|
\emph{broadcast} dimensions when possible. For example, suppose that you
|
|
|
want to add the number 1.5 to \texttt{arr1}; the following would be a
|
|
|
valid way to do it:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
arr1 + 1.5*np.ones(4)
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
array([ 1.5, 2.5, 3.5, 4.5])
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
But thanks to numpy's broadcasting rules, the following is equally
|
|
|
valid:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
arr1 + 1.5
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
array([ 1.5, 2.5, 3.5, 4.5])
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
In this case, numpy looked at both operands and saw that the first
|
|
|
(\texttt{arr1}) was a one-dimensional array of length 4 and the second
|
|
|
was a scalar, considered a zero-dimensional object. The broadcasting
|
|
|
rules allow numpy to:
|
|
|
|
|
|
\begin{itemize}
|
|
|
\item
|
|
|
\emph{create} new dimensions of length 1 (since this doesn't change
|
|
|
the size of the array)
|
|
|
\item
|
|
|
`stretch' a dimension of length 1 that needs to be matched to a
|
|
|
dimension of a different size.
|
|
|
\end{itemize}
|
|
|
So in the above example, the scalar 1.5 is effectively:
|
|
|
|
|
|
\begin{itemize}
|
|
|
\item
|
|
|
first `promoted' to a 1-dimensional array of length 1
|
|
|
\item
|
|
|
then, this array is `stretched' to length 4 to match the dimension of
|
|
|
\texttt{arr1}.
|
|
|
\end{itemize}
|
|
|
After these two operations are complete, the addition can proceed as now
|
|
|
both operands are one-dimensional arrays of length 4.
|
|
|
|
|
|
This broadcasting behavior is in practice enormously powerful,
|
|
|
especially because when numpy broadcasts to create new dimensions or to
|
|
|
`stretch' existing ones, it doesn't actually replicate the data. In the
|
|
|
example above the operation is carried \emph{as if} the 1.5 was a 1-d
|
|
|
array with 1.5 in all of its entries, but no actual array was ever
|
|
|
created. This can save lots of memory in cases when the arrays in
|
|
|
question are large and can have significant performance implications.
|
|
|
|
|
|
The general rule is: when operating on two arrays, NumPy compares their
|
|
|
shapes element-wise. It starts with the trailing dimensions, and works
|
|
|
its way forward, creating dimensions of length 1 as needed. Two
|
|
|
dimensions are considered compatible when
|
|
|
|
|
|
\begin{itemize}
|
|
|
\item
|
|
|
they are equal to begin with, or
|
|
|
\item
|
|
|
one of them is 1; in this case numpy will do the `stretching' to make
|
|
|
them equal.
|
|
|
\end{itemize}
|
|
|
If these conditions are not met, a
|
|
|
\texttt{ValueError: frames are not aligned} exception is thrown,
|
|
|
indicating that the arrays have incompatible shapes. The size of the
|
|
|
resulting array is the maximum size along each dimension of the input
|
|
|
arrays.
|
|
|
|
|
|
This shows how the broadcasting rules work in several dimensions:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
b = np.array([2, 3, 4, 5])
|
|
|
print arr, '\n\n+', b , '\n----------------\n', arr + b
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
[[0 1 2 3]
|
|
|
[4 5 6 7]]
|
|
|
|
|
|
+ [2 3 4 5]
|
|
|
----------------
|
|
|
[[ 2 4 6 8]
|
|
|
[ 6 8 10 12]]
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
Now, how could you use broadcasting to say add \texttt{{[}4, 6{]}} along
|
|
|
the rows to \texttt{arr} above? Simply performing the direct addition
|
|
|
will produce the error we previously mentioned:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
c = np.array([4, 6])
|
|
|
arr + c
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{traceback}
|
|
|
\begin{verbatim}
|
|
|
---------------------------------------------------------------------------
|
|
|
ValueError Traceback (most recent call last)
|
|
|
/home/fperez/teach/book-math-labtool/<ipython-input-45-62aa20ac1980> in <module>()
|
|
|
1 c = np.array([4, 6])
|
|
|
----> 2 arr + c
|
|
|
|
|
|
ValueError: operands could not be broadcast together with shapes (2,4) (2)
|
|
|
\end{verbatim}
|
|
|
\end{traceback}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
According to the rules above, the array \texttt{c} would need to have a
|
|
|
\emph{trailing} dimension of 1 for the broadcasting to work. It turns
|
|
|
out that numpy allows you to `inject' new dimensions anywhere into an
|
|
|
array on the fly, by indexing it with the special object
|
|
|
\texttt{np.newaxis}:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
(c[:, np.newaxis]).shape
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
(2, 1)
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
This is exactly what we need, and indeed it works:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
arr + c[:, np.newaxis]
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
array([[ 4, 5, 6, 7],
|
|
|
[10, 11, 12, 13]])
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
For the full broadcasting rules, please see the official Numpy docs,
|
|
|
which describe them in detail and with more complex examples.
|
|
|
|
|
|
As we mentioned before, Numpy ships with a full complement of
|
|
|
mathematical functions that work on entire arrays, including logarithms,
|
|
|
exponentials, trigonometric and hyperbolic trigonometric functions, etc.
|
|
|
Furthermore, scipy ships a rich special function library in the
|
|
|
\texttt{scipy.special} module that includes Bessel, Airy, Fresnel,
|
|
|
Laguerre and other classical special functions. For example, sampling
|
|
|
the sine function at 100 points between $0$ and $2\pi$ is as simple as:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
x = np.linspace(0, 2*np.pi, 100)
|
|
|
y = np.sin(x)
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\end{codecell}
|
|
|
\subsection{Linear algebra in numpy}
|
|
|
Numpy ships with a basic linear algebra library, and all arrays have a
|
|
|
\texttt{dot} method whose behavior is that of the scalar dot product
|
|
|
when its arguments are vectors (one-dimensional arrays) and the
|
|
|
traditional matrix multiplication when one or both of its arguments are
|
|
|
two-dimensional arrays:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
v1 = np.array([2, 3, 4])
|
|
|
v2 = np.array([1, 0, 1])
|
|
|
print v1, '.', v2, '=', v1.dot(v2)
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
[2 3 4] . [1 0 1] = 6
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
Here is a regular matrix-vector multiplication, note that the array
|
|
|
\texttt{v1} should be viewed as a \emph{column} vector in traditional
|
|
|
linear algebra notation; numpy makes no distinction between row and
|
|
|
column vectors and simply verifies that the dimensions match the
|
|
|
required rules of matrix multiplication, in this case we have a
|
|
|
$2 \times 3$ matrix multiplied by a 3-vector, which produces a 2-vector:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
A = np.arange(6).reshape(2, 3)
|
|
|
print A, 'x', v1, '=', A.dot(v1)
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
[[0 1 2]
|
|
|
[3 4 5]] x [2 3 4] = [11 38]
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
For matrix-matrix multiplication, the same dimension-matching rules must
|
|
|
be satisfied, e.g.~consider the difference between $A \times A^T$:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
print A.dot(A.T)
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
[[ 5 14]
|
|
|
[14 50]]
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
and $A^T \times A$:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
print A.T.dot(A)
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
[[ 9 12 15]
|
|
|
[12 17 22]
|
|
|
[15 22 29]]
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
Furthermore, the \texttt{numpy.linalg} module includes additional
|
|
|
functionality such as determinants, matrix norms, Cholesky, eigenvalue
|
|
|
and singular value decompositions, etc. For even more linear algebra
|
|
|
tools, \texttt{scipy.linalg} contains the majority of the tools in the
|
|
|
classic LAPACK libraries as well as functions to operate on sparse
|
|
|
matrices. We refer the reader to the Numpy and Scipy documentations for
|
|
|
additional details on these.
|
|
|
|
|
|
\subsection{Reading and writing arrays to disk}
|
|
|
Numpy lets you read and write arrays into files in a number of ways. In
|
|
|
order to use these tools well, it is critical to understand the
|
|
|
difference between a \emph{text} and a \emph{binary} file containing
|
|
|
numerical data. In a text file, the number $\pi$ could be written as
|
|
|
``3.141592653589793'', for example: a string of digits that a human can
|
|
|
read, with in this case 15 decimal digits. In contrast, that same number
|
|
|
written to a binary file would be encoded as 8 characters (bytes) that
|
|
|
are not readable by a human but which contain the exact same data that
|
|
|
the variable \texttt{pi} had in the computer's memory.
|
|
|
|
|
|
The tradeoffs between the two modes are thus:
|
|
|
|
|
|
\begin{itemize}
|
|
|
\item
|
|
|
Text mode: occupies more space, precision can be lost (if not all
|
|
|
digits are written to disk), but is readable and editable by hand with
|
|
|
a text editor. Can \emph{only} be used for one- and two-dimensional
|
|
|
arrays.
|
|
|
\item
|
|
|
Binary mode: compact and exact representation of the data in memory,
|
|
|
can't be read or edited by hand. Arrays of any size and dimensionality
|
|
|
can be saved and read without loss of information.
|
|
|
\end{itemize}
|
|
|
First, let's see how to read and write arrays in text mode. The
|
|
|
\texttt{np.savetxt} function saves an array to a text file, with options
|
|
|
to control the precision, separators and even adding a header:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
arr = np.arange(10).reshape(2, 5)
|
|
|
np.savetxt('test.out', arr, fmt='%.2e', header="My dataset")
|
|
|
!cat test.out
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
# My dataset
|
|
|
0.00e+00 1.00e+00 2.00e+00 3.00e+00 4.00e+00
|
|
|
5.00e+00 6.00e+00 7.00e+00 8.00e+00 9.00e+00
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
And this same type of file can then be read with the matching
|
|
|
\texttt{np.loadtxt} function:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
arr2 = np.loadtxt('test.out')
|
|
|
print arr2
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
[[ 0. 1. 2. 3. 4.]
|
|
|
[ 5. 6. 7. 8. 9.]]
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
For binary data, Numpy provides the \texttt{np.save} and
|
|
|
\texttt{np.savez} routines. The first saves a single array to a file
|
|
|
with \texttt{.npy} extension, while the latter can be used to save a
|
|
|
\emph{group} of arrays into a single file with \texttt{.npz} extension.
|
|
|
The files created with these routines can then be read with the
|
|
|
\texttt{np.load} function.
|
|
|
|
|
|
Let us first see how to use the simpler \texttt{np.save} function to
|
|
|
save a single array:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
np.save('test.npy', arr2)
|
|
|
# Now we read this back
|
|
|
arr2n = np.load('test.npy')
|
|
|
# Let's see if any element is non-zero in the difference.
|
|
|
# A value of True would be a problem.
|
|
|
print 'Any differences?', np.any(arr2-arr2n)
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
Any differences? False
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
Now let us see how the \texttt{np.savez} function works. You give it a
|
|
|
filename and either a sequence of arrays or a set of keywords. In the
|
|
|
first mode, the function will auotmatically name the saved arrays in the
|
|
|
archive as \texttt{arr\_0}, \texttt{arr\_1}, etc:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
np.savez('test.npz', arr, arr2)
|
|
|
arrays = np.load('test.npz')
|
|
|
arrays.files
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
['arr_1', 'arr_0']
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
Alternatively, we can explicitly choose how to name the arrays we save:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
np.savez('test.npz', array1=arr, array2=arr2)
|
|
|
arrays = np.load('test.npz')
|
|
|
arrays.files
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
['array2', 'array1']
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
The object returned by \texttt{np.load} from an \texttt{.npz} file works
|
|
|
like a dictionary, though you can also access its constituent files by
|
|
|
attribute using its special \texttt{.f} field; this is best illustrated
|
|
|
with an example with the \texttt{arrays} object from above:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
print 'First row of first array:', arrays['array1'][0]
|
|
|
# This is an equivalent way to get the same field
|
|
|
print 'First row of first array:', arrays.f.array1[0]
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
First row of first array: [0 1 2 3 4]
|
|
|
First row of first array: [0 1 2 3 4]
|
|
|
\end{verbatim}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
This \texttt{.npz} format is a very convenient way to package compactly
|
|
|
and without loss of information, into a single file, a group of related
|
|
|
arrays that pertain to a specific problem. At some point, however, the
|
|
|
complexity of your dataset may be such that the optimal approach is to
|
|
|
use one of the standard formats in scientific data processing that have
|
|
|
been designed to handle complex datasets, such as NetCDF or HDF5.
|
|
|
|
|
|
Fortunately, there are tools for manipulating these formats in Python,
|
|
|
and for storing data in other ways such as databases. A complete
|
|
|
discussion of the possibilities is beyond the scope of this discussion,
|
|
|
but of particular interest for scientific users we at least mention the
|
|
|
following:
|
|
|
|
|
|
\begin{itemize}
|
|
|
\item
|
|
|
The \texttt{scipy.io} module contains routines to read and write
|
|
|
Matlab files in \texttt{.mat} format and files in the NetCDF format
|
|
|
that is widely used in certain scientific disciplines.
|
|
|
\item
|
|
|
For manipulating files in the HDF5 format, there are two excellent
|
|
|
options in Python: The PyTables project offers a high-level, object
|
|
|
oriented approach to manipulating HDF5 datasets, while the h5py
|
|
|
project offers a more direct mapping to the standard HDF5 library
|
|
|
interface. Both are excellent tools; if you need to work with HDF5
|
|
|
datasets you should read some of their documentation and examples and
|
|
|
decide which approach is a better match for your needs.
|
|
|
\end{itemize}
|
|
|
|
|
|
\section{High quality data visualization with Matplotlib}
|
|
|
The \href{http://matplotlib.sf.net}{matplotlib} library is a powerful
|
|
|
tool capable of producing complex publication-quality figures with fine
|
|
|
layout control in two and three dimensions; here we will only provide a
|
|
|
minimal self-contained introduction to its usage that covers the
|
|
|
functionality needed for the rest of the book. We encourage the reader
|
|
|
to read the tutorials included with the matplotlib documentation as well
|
|
|
as to browse its extensive gallery of examples that include source code.
|
|
|
|
|
|
Just as we typically use the shorthand \texttt{np} for Numpy, we will
|
|
|
use \texttt{plt} for the \texttt{matplotlib.pyplot} module where the
|
|
|
easy-to-use plotting functions reside (the library contains a rich
|
|
|
object-oriented architecture that we don't have the space to discuss
|
|
|
here):
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
import matplotlib.pyplot as plt
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\end{codecell}
|
|
|
The most frequently used function is simply called \texttt{plot}, here
|
|
|
is how you can make a simple plot of $\sin(x)$ for $x \in [0, 2\pi]$
|
|
|
with labels and a grid (we use the semicolon in the last line to
|
|
|
suppress the display of some information that is unnecessary right now):
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
x = np.linspace(0, 2*np.pi)
|
|
|
y = np.sin(x)
|
|
|
plt.plot(x,y, label='sin(x)')
|
|
|
plt.legend()
|
|
|
plt.grid()
|
|
|
plt.title('Harmonic')
|
|
|
plt.xlabel('x')
|
|
|
plt.ylabel('y');
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{center}
|
|
|
\includegraphics[width=0.7\textwidth]{IntroNumPy_orig_files/IntroNumPy_orig_fig_01.pdf}
|
|
|
\par
|
|
|
\end{center}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
You can control the style, color and other properties of the markers,
|
|
|
for example:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
plt.plot(x, y, linewidth=2);
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{center}
|
|
|
\includegraphics[width=0.7\textwidth]{IntroNumPy_orig_files/IntroNumPy_orig_fig_02.pdf}
|
|
|
\par
|
|
|
\end{center}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
plt.plot(x, y, 'o', markersize=5, color='r');
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{center}
|
|
|
\includegraphics[width=0.7\textwidth]{IntroNumPy_orig_files/IntroNumPy_orig_fig_03.pdf}
|
|
|
\par
|
|
|
\end{center}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
We will now see how to create a few other common plot types, such as a
|
|
|
simple error plot:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
# example data
|
|
|
x = np.arange(0.1, 4, 0.5)
|
|
|
y = np.exp(-x)
|
|
|
|
|
|
# example variable error bar values
|
|
|
yerr = 0.1 + 0.2*np.sqrt(x)
|
|
|
xerr = 0.1 + yerr
|
|
|
|
|
|
# First illustrate basic pyplot interface, using defaults where possible.
|
|
|
plt.figure()
|
|
|
plt.errorbar(x, y, xerr=0.2, yerr=0.4)
|
|
|
plt.title("Simplest errorbars, 0.2 in x, 0.4 in y");
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{center}
|
|
|
\includegraphics[width=0.7\textwidth]{IntroNumPy_orig_files/IntroNumPy_orig_fig_04.pdf}
|
|
|
\par
|
|
|
\end{center}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
A simple log plot
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
x = np.linspace(-5, 5)
|
|
|
y = np.exp(-x**2)
|
|
|
plt.semilogy(x, y);
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{center}
|
|
|
\includegraphics[width=0.7\textwidth]{IntroNumPy_orig_files/IntroNumPy_orig_fig_05.pdf}
|
|
|
\par
|
|
|
\end{center}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
A histogram annotated with text inside the plot, using the \texttt{text}
|
|
|
function:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
mu, sigma = 100, 15
|
|
|
x = mu + sigma * np.random.randn(10000)
|
|
|
|
|
|
# the histogram of the data
|
|
|
n, bins, patches = plt.hist(x, 50, normed=1, facecolor='g', alpha=0.75)
|
|
|
|
|
|
plt.xlabel('Smarts')
|
|
|
plt.ylabel('Probability')
|
|
|
plt.title('Histogram of IQ')
|
|
|
# This will put a text fragment at the position given:
|
|
|
plt.text(55, .027, r'$\mu=100,\ \sigma=15$', fontsize=14)
|
|
|
plt.axis([40, 160, 0, 0.03])
|
|
|
plt.grid(True)
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{center}
|
|
|
\includegraphics[width=0.7\textwidth]{IntroNumPy_orig_files/IntroNumPy_orig_fig_06.pdf}
|
|
|
\par
|
|
|
\end{center}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
\subsection{Image display}
|
|
|
The \texttt{imshow} command can display single or multi-channel images.
|
|
|
A simple array of random numbers, plotted in grayscale:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
from matplotlib import cm
|
|
|
plt.imshow(np.random.rand(5, 10), cmap=cm.gray, interpolation='nearest');
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{center}
|
|
|
\includegraphics[width=0.7\textwidth]{IntroNumPy_orig_files/IntroNumPy_orig_fig_07.pdf}
|
|
|
\par
|
|
|
\end{center}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
A real photograph is a multichannel image, \texttt{imshow} interprets it
|
|
|
correctly:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
img = plt.imread('stinkbug.png')
|
|
|
print 'Dimensions of the array img:', img.shape
|
|
|
plt.imshow(img);
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{verbatim}
|
|
|
Dimensions of the array img: (375, 500, 3)
|
|
|
\end{verbatim}
|
|
|
\begin{center}
|
|
|
\includegraphics[width=0.7\textwidth]{IntroNumPy_orig_files/IntroNumPy_orig_fig_08.pdf}
|
|
|
\par
|
|
|
\end{center}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
\subsection{Simple 3d plotting with matplotlib}
|
|
|
Note that you must execute at least once in your session:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
from mpl_toolkits.mplot3d import Axes3D
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\end{codecell}
|
|
|
One this has been done, you can create 3d axes with the
|
|
|
\texttt{projection='3d'} keyword to \texttt{add\_subplot}:
|
|
|
|
|
|
\begin{verbatim}
|
|
|
fig = plt.figure()
|
|
|
fig.add_subplot(<other arguments here>, projection='3d')
|
|
|
\end{verbatim}
|
|
|
|
|
|
|
|
|
A simple surface plot:
|
|
|
|
|
|
\begin{codecell}
|
|
|
\begin{codeinput}
|
|
|
\begin{lstlisting}
|
|
|
from mpl_toolkits.mplot3d.axes3d import Axes3D
|
|
|
from matplotlib import cm
|
|
|
|
|
|
fig = plt.figure()
|
|
|
ax = fig.add_subplot(1, 1, 1, projection='3d')
|
|
|
X = np.arange(-5, 5, 0.25)
|
|
|
Y = np.arange(-5, 5, 0.25)
|
|
|
X, Y = np.meshgrid(X, Y)
|
|
|
R = np.sqrt(X**2 + Y**2)
|
|
|
Z = np.sin(R)
|
|
|
surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.jet,
|
|
|
linewidth=0, antialiased=False)
|
|
|
ax.set_zlim3d(-1.01, 1.01);
|
|
|
\end{lstlisting}
|
|
|
\end{codeinput}
|
|
|
\begin{codeoutput}
|
|
|
\begin{center}
|
|
|
\includegraphics[width=0.7\textwidth]{IntroNumPy_orig_files/IntroNumPy_orig_fig_09.pdf}
|
|
|
\par
|
|
|
\end{center}
|
|
|
\end{codeoutput}
|
|
|
\end{codecell}
|
|
|
\section{IPython: a powerful interactive environment}
|
|
|
A key component of the everyday workflow of most scientific computing
|
|
|
environments is a good interactive environment, that is, a system in
|
|
|
which you can execute small amounts of code and view the results
|
|
|
immediately, combining both printing out data and opening graphical
|
|
|
visualizations. All modern systems for scientific computing, commercial
|
|
|
and open source, include such functionality.
|
|
|
|
|
|
Out of the box, Python also offers a simple interactive shell with very
|
|
|
limited capabilities. But just like the scientific community built Numpy
|
|
|
to provide arrays suited for scientific work (since Pytyhon's lists
|
|
|
aren't optimal for this task), it has also developed an interactive
|
|
|
environment much more sophisticated than the built-in one. The
|
|
|
\href{http://ipython.org}{IPython project} offers a set of tools to make
|
|
|
productive use of the Python language, all the while working
|
|
|
interactively and with immedate feedback on your results. The basic
|
|
|
tools that IPython provides are:
|
|
|
|
|
|
\begin{enumerate}[1.]
|
|
|
\item
|
|
|
A powerful terminal shell, with many features designed to increase the
|
|
|
fluidity and productivity of everyday scientific workflows, including:
|
|
|
|
|
|
\begin{itemize}
|
|
|
\item
|
|
|
rich introspection of all objects and variables including easy
|
|
|
access to the source code of any function
|
|
|
\item
|
|
|
powerful and extensible tab completion of variables and filenames,
|
|
|
\item
|
|
|
tight integration with matplotlib, supporting interactive figures
|
|
|
that don't block the terminal,
|
|
|
\item
|
|
|
direct access to the filesystem and underlying operating system,
|
|
|
\item
|
|
|
an extensible system for shell-like commands called `magics' that
|
|
|
reduce the work needed to perform many common tasks,
|
|
|
\item
|
|
|
tools for easily running, timing, profiling and debugging your
|
|
|
codes,
|
|
|
\item
|
|
|
syntax highlighted error messages with much more detail than the
|
|
|
default Python ones,
|
|
|
\item
|
|
|
logging and access to all previous history of inputs, including
|
|
|
across sessions
|
|
|
\end{itemize}
|
|
|
\item
|
|
|
A Qt console that provides the look and feel of a terminal, but adds
|
|
|
support for inline figures, graphical calltips, a persistent session
|
|
|
that can survive crashes (even segfaults) of the kernel process, and
|
|
|
more.
|
|
|
\item
|
|
|
A web-based notebook that can execute code and also contain rich text
|
|
|
and figures, mathematical equations and arbitrary HTML. This notebook
|
|
|
presents a document-like view with cells where code is executed but
|
|
|
that can be edited in-place, reordered, mixed with explanatory text
|
|
|
and figures, etc.
|
|
|
\item
|
|
|
A high-performance, low-latency system for parallel computing that
|
|
|
supports the control of a cluster of IPython engines communicating
|
|
|
over a network, with optimizations that minimize unnecessary copying
|
|
|
of large objects (especially numpy arrays).
|
|
|
\end{enumerate}
|
|
|
We will now discuss the highlights of the tools 1-3 above so that you
|
|
|
can make them an effective part of your workflow. The topic of parallel
|
|
|
computing is beyond the scope of this document, but we encourage you to
|
|
|
read the extensive
|
|
|
\href{http://ipython.org/ipython-doc/rel-0.12.1/parallel/index.html}{documentation}
|
|
|
and \href{http://minrk.github.com/scipy-tutorial-2011/}{tutorials} on
|
|
|
this available on the IPython website.
|
|
|
|
|
|
\subsection{The IPython terminal}
|
|
|
You can start IPython at the terminal simply by typing:
|
|
|
|
|
|
\begin{verbatim}
|
|
|
$ ipython
|
|
|
\end{verbatim}
|
|
|
which will provide you some basic information about how to get started
|
|
|
and will then open a prompt labeled \texttt{In {[}1{]}:} for you to
|
|
|
start typing. Here we type $2^{64}$ and Python computes the result for
|
|
|
us in exact arithmetic, returning it as \texttt{Out{[}1{]}}:
|
|
|
|
|
|
\begin{verbatim}
|
|
|
$ ipython
|
|
|
Python 2.7.2+ (default, Oct 4 2011, 20:03:08)
|
|
|
Type "copyright", "credits" or "license" for more information.
|
|
|
|
|
|
IPython 0.13.dev -- An enhanced Interactive Python.
|
|
|
? -> Introduction and overview of IPython's features.
|
|
|
%quickref -> Quick reference.
|
|
|
help -> Python's own help system.
|
|
|
object? -> Details about 'object', use 'object??' for extra details.
|
|
|
|
|
|
In [1]: 2**64
|
|
|
Out[1]: 18446744073709551616L
|
|
|
\end{verbatim}
|
|
|
The first thing you should know about IPython is that all your inputs
|
|
|
and outputs are saved. There are two variables named \texttt{In} and
|
|
|
\texttt{Out} which are filled as you work with your results.
|
|
|
Furthermore, all outputs are also saved to auto-created variables of the
|
|
|
form \texttt{\_NN} where \texttt{NN} is the prompt number, and inputs to
|
|
|
\texttt{\_iNN}. This allows you to recover quickly the result of a prior
|
|
|
computation by referring to its number even if you forgot to store it as
|
|
|
a variable. For example, later on in the above session you can do:
|
|
|
|
|
|
\begin{verbatim}
|
|
|
In [6]: print _1
|
|
|
18446744073709551616
|
|
|
\end{verbatim}
|
|
|
|
|
|
|
|
|
We strongly recommend that you take a few minutes to read at least the
|
|
|
basic introduction provided by the \texttt{?} command, and keep in mind
|
|
|
that the \texttt{\%quickref} command at all times can be used as a quick
|
|
|
reference ``cheat sheet'' of the most frequently used features of
|
|
|
IPython.
|
|
|
|
|
|
At the IPython prompt, any valid Python code that you type will be
|
|
|
executed similarly to the default Python shell (though often with more
|
|
|
informative feedback). But since IPython is a \emph{superset} of the
|
|
|
default Python shell; let's have a brief look at some of its additional
|
|
|
functionality.
|
|
|
|
|
|
\textbf{Object introspection}
|
|
|
|
|
|
A simple \texttt{?} command provides a general introduction to IPython,
|
|
|
but as indicated in the banner above, you can use the \texttt{?} syntax
|
|
|
to ask for details about any object. For example, if we type
|
|
|
\texttt{\_1?}, IPython will print the following details about this
|
|
|
variable:
|
|
|
|
|
|
\begin{verbatim}
|
|
|
In [14]: _1?
|
|
|
Type: long
|
|
|
Base Class: <type 'long'>
|
|
|
String Form:18446744073709551616
|
|
|
Namespace: Interactive
|
|
|
Docstring:
|
|
|
long(x[, base]) -> integer
|
|
|
|
|
|
Convert a string or number to a long integer, if possible. A floating
|
|
|
|
|
|
[etc... snipped for brevity]
|
|
|
\end{verbatim}
|
|
|
If you add a second \texttt{?} and for any oobject \texttt{x} type
|
|
|
\texttt{x??}, IPython will try to provide an even more detailed analsysi
|
|
|
of the object, including its syntax-highlighted source code when it can
|
|
|
be found. It's possible that \texttt{x??} returns the same information
|
|
|
as \texttt{x?}, but in many cases \texttt{x??} will indeed provide
|
|
|
additional details.
|
|
|
|
|
|
Finally, the \texttt{?} syntax is also useful to search
|
|
|
\emph{namespaces} with wildcards. Suppose you are wondering if there is
|
|
|
any function in Numpy that may do text-related things; with
|
|
|
\texttt{np.*txt*?}, IPython will print all the names in the \texttt{np}
|
|
|
namespace (our Numpy shorthand) that have `txt' anywhere in their name:
|
|
|
|
|
|
\begin{verbatim}
|
|
|
In [17]: np.*txt*?
|
|
|
np.genfromtxt
|
|
|
np.loadtxt
|
|
|
np.mafromtxt
|
|
|
np.ndfromtxt
|
|
|
np.recfromtxt
|
|
|
np.savetxt
|
|
|
\end{verbatim}
|
|
|
|
|
|
|
|
|
\textbf{Tab completion}
|
|
|
|
|
|
IPython makes the tab key work extra hard for you as a way to rapidly
|
|
|
inspect objects and libraries. Whenever you have typed something at the
|
|
|
prompt, by hitting the \texttt{\textless{}tab\textgreater{}} key IPython
|
|
|
will try to complete the rest of the line. For this, IPython will
|
|
|
analyze the text you had so far and try to search for Python data or
|
|
|
files that may match the context you have already provided.
|
|
|
|
|
|
For example, if you type \texttt{np.load} and hit the key, you'll see:
|
|
|
|
|
|
\begin{verbatim}
|
|
|
In [21]: np.load<TAB HERE>
|
|
|
np.load np.loads np.loadtxt
|
|
|
\end{verbatim}
|
|
|
so you can quickly find all the load-related functionality in numpy. Tab
|
|
|
completion works even for function arguments, for example consider this
|
|
|
function definition:
|
|
|
|
|
|
\begin{verbatim}
|
|
|
In [20]: def f(x, frobinate=False):
|
|
|
....: if frobinate:
|
|
|
....: return x**2
|
|
|
....:
|
|
|
\end{verbatim}
|
|
|
If you now use the \texttt{\textless{}tab\textgreater{}} key after
|
|
|
having typed `fro' you'll get all valid Python completions, but those
|
|
|
marked with \texttt{=} at the end are known to be keywords of your
|
|
|
function:
|
|
|
|
|
|
\begin{verbatim}
|
|
|
In [21]: f(2, fro<TAB HERE>
|
|
|
frobinate= frombuffer fromfunction frompyfunc fromstring
|
|
|
from fromfile fromiter fromregex frozenset
|
|
|
\end{verbatim}
|
|
|
at this point you can add the \texttt{b} letter and hit
|
|
|
\texttt{\textless{}tab\textgreater{}} once more, and IPython will finish
|
|
|
the line for you:
|
|
|
|
|
|
\begin{verbatim}
|
|
|
In [21]: f(2, frobinate=
|
|
|
\end{verbatim}
|
|
|
As a beginner, simply get into the habit of using
|
|
|
\texttt{\textless{}tab\textgreater{}} after most objects; it should
|
|
|
quickly become second nature as you will see how helps keep a fluid
|
|
|
workflow and discover useful information. Later on you can also
|
|
|
customize this behavior by writing your own completion code, if you so
|
|
|
desire.
|
|
|
|
|
|
\textbf{Matplotlib integration}
|
|
|
|
|
|
One of the most useful features of IPython for scientists is its tight
|
|
|
integration with matplotlib: at the terminal IPython lets you open
|
|
|
matplotlib figures without blocking your typing (which is what happens
|
|
|
if you try to do the same thing at the default Python shell), and in the
|
|
|
Qt console and notebook you can even view your figures embedded in your
|
|
|
workspace next to the code that created them.
|
|
|
|
|
|
The matplotlib support can be either activated when you start IPython by
|
|
|
passing the \texttt{-{}-pylab} flag, or at any point later in your
|
|
|
session by using the \texttt{\%pylab} command. If you start IPython with
|
|
|
\texttt{-{}-pylab}, you'll see something like this (note the extra
|
|
|
message about pylab):
|
|
|
|
|
|
\begin{verbatim}
|
|
|
$ ipython --pylab
|
|
|
Python 2.7.2+ (default, Oct 4 2011, 20:03:08)
|
|
|
Type "copyright", "credits" or "license" for more information.
|
|
|
|
|
|
IPython 0.13.dev -- An enhanced Interactive Python.
|
|
|
? -> Introduction and overview of IPython's features.
|
|
|
%quickref -> Quick reference.
|
|
|
help -> Python's own help system.
|
|
|
object? -> Details about 'object', use 'object??' for extra details.
|
|
|
|
|
|
Welcome to pylab, a matplotlib-based Python environment [backend: Qt4Agg].
|
|
|
For more information, type 'help(pylab)'.
|
|
|
|
|
|
In [1]:
|
|
|
\end{verbatim}
|
|
|
Furthermore, IPython will import \texttt{numpy} with the \texttt{np}
|
|
|
shorthand, \texttt{matplotlib.pyplot} as \texttt{plt}, and it will also
|
|
|
load all of the numpy and pyplot top-level names so that you can
|
|
|
directly type something like:
|
|
|
|
|
|
\begin{verbatim}
|
|
|
In [1]: x = linspace(0, 2*pi, 200)
|
|
|
|
|
|
In [2]: plot(x, sin(x))
|
|
|
Out[2]: [<matplotlib.lines.Line2D at 0x9e7c16c>]
|
|
|
\end{verbatim}
|
|
|
instead of having to prefix each call with its full signature (as we
|
|
|
have been doing in the examples thus far):
|
|
|
|
|
|
\begin{verbatim}
|
|
|
In [3]: x = np.linspace(0, 2*np.pi, 200)
|
|
|
|
|
|
In [4]: plt.plot(x, np.sin(x))
|
|
|
Out[4]: [<matplotlib.lines.Line2D at 0x9e900ac>]
|
|
|
\end{verbatim}
|
|
|
This shorthand notation can be a huge time-saver when working
|
|
|
interactively (it's a few characters but you are likely to type them
|
|
|
hundreds of times in a session). But we should note that as you develop
|
|
|
persistent scripts and notebooks meant for reuse, it's best to get in
|
|
|
the habit of using the longer notation (known as \emph{fully qualified
|
|
|
names} as it's clearer where things come from and it makes for more
|
|
|
robust, readable and maintainable code in the long run).
|
|
|
|
|
|
\textbf{Access to the operating system and files}
|
|
|
|
|
|
In IPython, you can type \texttt{ls} to see your files or \texttt{cd} to
|
|
|
change directories, just like you would at a regular system prompt:
|
|
|
|
|
|
\begin{verbatim}
|
|
|
In [2]: cd tests
|
|
|
/home/fperez/ipython/nbconvert/tests
|
|
|
|
|
|
In [3]: ls test.*
|
|
|
test.aux test.html test.ipynb test.log test.out test.pdf test.rst test.tex
|
|
|
\end{verbatim}
|
|
|
Furthermore, if you use the \texttt{!} at the beginning of a line, any
|
|
|
commands you pass afterwards go directly to the operating system:
|
|
|
|
|
|
\begin{verbatim}
|
|
|
In [4]: !echo "Hello IPython"
|
|
|
Hello IPython
|
|
|
\end{verbatim}
|
|
|
IPython offers a useful twist in this feature: it will substitute in the
|
|
|
command the value of any \emph{Python} variable you may have if you
|
|
|
prepend it with a \texttt{\$} sign:
|
|
|
|
|
|
\begin{verbatim}
|
|
|
In [5]: message = 'IPython interpolates from Python to the shell'
|
|
|
|
|
|
In [6]: !echo $message
|
|
|
IPython interpolates from Python to the shell
|
|
|
\end{verbatim}
|
|
|
This feature can be extremely useful, as it lets you combine the power
|
|
|
and clarity of Python for complex logic with the immediacy and
|
|
|
familiarity of many shell commands. Additionally, if you start the line
|
|
|
with \emph{two} \texttt{\$\$} signs, the output of the command will be
|
|
|
automatically captured as a list of lines, e.g.:
|
|
|
|
|
|
\begin{verbatim}
|
|
|
In [10]: !!ls test.*
|
|
|
Out[10]:
|
|
|
['test.aux',
|
|
|
'test.html',
|
|
|
'test.ipynb',
|
|
|
'test.log',
|
|
|
'test.out',
|
|
|
'test.pdf',
|
|
|
'test.rst',
|
|
|
'test.tex']
|
|
|
\end{verbatim}
|
|
|
As explained above, you can now use this as the variable \texttt{\_10}.
|
|
|
If you directly want to capture the output of a system command to a
|
|
|
Python variable, you can use the syntax \texttt{=!}:
|
|
|
|
|
|
\begin{verbatim}
|
|
|
In [11]: testfiles =! ls test.*
|
|
|
|
|
|
In [12]: print testfiles
|
|
|
['test.aux', 'test.html', 'test.ipynb', 'test.log', 'test.out', 'test.pdf', 'test.rst', 'test.tex']
|
|
|
\end{verbatim}
|
|
|
Finally, the special \texttt{\%alias} command lets you define names that
|
|
|
are shorthands for system commands, so that you can type them without
|
|
|
having to prefix them via \texttt{!} explicitly (for example,
|
|
|
\texttt{ls} is an alias that has been predefined for you at startup).
|
|
|
|
|
|
\textbf{Magic commands}
|
|
|
|
|
|
IPython has a system for special commands, called `magics', that let you
|
|
|
control IPython itself and perform many common tasks with a more
|
|
|
shell-like syntax: it uses spaces for delimiting arguments, flags can be
|
|
|
set with dashes and all arguments are treated as strings, so no
|
|
|
additional quoting is required. This kind of syntax is invalid in the
|
|
|
Python language but very convenient for interactive typing (less
|
|
|
parentheses, commans and quoting everywhere); IPython distinguishes the
|
|
|
two by detecting lines that start with the \texttt{\%} character.
|
|
|
|
|
|
You can learn more about the magic system by simply typing
|
|
|
\texttt{\%magic} at the prompt, which will give you a short description
|
|
|
plus the documentation on \emph{all} available magics. If you want to
|
|
|
see only a listing of existing magics, you can use \texttt{\%lsmagic}:
|
|
|
|
|
|
\begin{verbatim}
|
|
|
In [4]: lsmagic
|
|
|
Available magic functions:
|
|
|
%alias %autocall %autoindent %automagic %bookmark %c %cd %colors %config %cpaste
|
|
|
%debug %dhist %dirs %doctest_mode %ds %ed %edit %env %gui %hist %history
|
|
|
%install_default_config %install_ext %install_profiles %load_ext %loadpy %logoff %logon
|
|
|
%logstart %logstate %logstop %lsmagic %macro %magic %notebook %page %paste %pastebin
|
|
|
%pd %pdb %pdef %pdoc %pfile %pinfo %pinfo2 %pop %popd %pprint %precision %profile
|
|
|
%prun %psearch %psource %pushd %pwd %pycat %pylab %quickref %recall %rehashx
|
|
|
%reload_ext %rep %rerun %reset %reset_selective %run %save %sc %stop %store %sx %tb
|
|
|
%time %timeit %unalias %unload_ext %who %who_ls %whos %xdel %xmode
|
|
|
|
|
|
Automagic is ON, % prefix NOT needed for magic functions.
|
|
|
\end{verbatim}
|
|
|
Note how the example above omitted the eplicit \texttt{\%} marker and
|
|
|
simply uses \texttt{lsmagic}. As long as the `automagic' feature is on
|
|
|
(which it is by default), you can omit the \texttt{\%} marker as long as
|
|
|
there is no ambiguity with a Python variable of the same name.
|
|
|
|
|
|
\textbf{Running your code}
|
|
|
|
|
|
While it's easy to type a few lines of code in IPython, for any
|
|
|
long-lived work you should keep your codes in Python scripts (or in
|
|
|
IPython notebooks, see below). Consider that you have a script, in this
|
|
|
case trivially simple for the sake of brevity, named \texttt{simple.py}:
|
|
|
|
|
|
\begin{verbatim}
|
|
|
In [12]: !cat simple.py
|
|
|
import numpy as np
|
|
|
|
|
|
x = np.random.normal(size=100)
|
|
|
|
|
|
print 'First elment of x:', x[0]
|
|
|
\end{verbatim}
|
|
|
The typical workflow with IPython is to use the \texttt{\%run} magic to
|
|
|
execute your script (you can omit the .py extension if you want). When
|
|
|
you run it, the script will execute just as if it had been run at the
|
|
|
system prompt with \texttt{python simple.py} (though since modules don't
|
|
|
get re-executed on new imports by Python, all system initialization is
|
|
|
essentially free, which can have a significant run time impact in some
|
|
|
cases):
|
|
|
|
|
|
\begin{verbatim}
|
|
|
In [13]: run simple
|
|
|
First elment of x: -1.55872256289
|
|
|
\end{verbatim}
|
|
|
Once it completes, all variables defined in it become available for you
|
|
|
to use interactively:
|
|
|
|
|
|
\begin{verbatim}
|
|
|
In [14]: x.shape
|
|
|
Out[14]: (100,)
|
|
|
\end{verbatim}
|
|
|
This allows you to plot data, try out ideas, etc, in a
|
|
|
\texttt{\%run}/interact/edit cycle that can be very productive. As you
|
|
|
start understanding your problem better you can refine your script
|
|
|
further, incrementally improving it based on the work you do at the
|
|
|
IPython prompt. At any point you can use the \texttt{\%hist} magic to
|
|
|
print out your history without prompts, so that you can copy useful
|
|
|
fragments back into the script.
|
|
|
|
|
|
By default, \texttt{\%run} executes scripts in a completely empty
|
|
|
namespace, to better mimic how they would execute at the system prompt
|
|
|
with plain Python. But if you use the \texttt{-i} flag, the script will
|
|
|
also see your interactively defined variables. This lets you edit in a
|
|
|
script larger amounts of code that still behave as if you had typed them
|
|
|
at the IPython prompt.
|
|
|
|
|
|
You can also get a summary of the time taken by your script with the
|
|
|
\texttt{-t} flag; consider a different script \texttt{randsvd.py} that
|
|
|
takes a bit longer to run:
|
|
|
|
|
|
\begin{verbatim}
|
|
|
In [21]: run -t randsvd.py
|
|
|
|
|
|
IPython CPU timings (estimated):
|
|
|
User : 0.38 s.
|
|
|
System : 0.04 s.
|
|
|
Wall time: 0.34 s.
|
|
|
\end{verbatim}
|
|
|
\texttt{User} is the time spent by the computer executing your code,
|
|
|
while \texttt{System} is the time the operating system had to work on
|
|
|
your behalf, doing things like memory allocation that are needed by your
|
|
|
code but that you didn't explicitly program and that happen inside the
|
|
|
kernel. The \texttt{Wall time} is the time on a `clock on the wall'
|
|
|
between the start and end of your program.
|
|
|
|
|
|
If \texttt{Wall \textgreater{} User+System}, your code is most likely
|
|
|
waiting idle for certain periods. That could be waiting for data to
|
|
|
arrive from a remote source or perhaps because the operating system has
|
|
|
to swap large amounts of virtual memory. If you know that your code
|
|
|
doesn't explicitly wait for remote data to arrive, you should
|
|
|
investigate further to identify possible ways of improving the
|
|
|
performance profile.
|
|
|
|
|
|
If you only want to time how long a single statement takes, you don't
|
|
|
need to put it into a script as you can use the \texttt{\%timeit} magic,
|
|
|
which uses Python's \texttt{timeit} module to very carefully measure
|
|
|
timig data; \texttt{timeit} can measure even short statements that
|
|
|
execute extremely fast:
|
|
|
|
|
|
\begin{verbatim}
|
|
|
In [27]: %timeit a=1
|
|
|
10000000 loops, best of 3: 23 ns per loop
|
|
|
\end{verbatim}
|
|
|
and for code that runs longer, it automatically adjusts so the overall
|
|
|
measurement doesn't take too long:
|
|
|
|
|
|
\begin{verbatim}
|
|
|
In [28]: %timeit np.linalg.svd(x)
|
|
|
1 loops, best of 3: 310 ms per loop
|
|
|
\end{verbatim}
|
|
|
The \texttt{\%run} magic still has more options for debugging and
|
|
|
profiling data; you should read its documentation for many useful
|
|
|
details (as always, just type \texttt{\%run?}).
|
|
|
|
|
|
\subsection{The graphical Qt console}
|
|
|
If you type at the system prompt (see the IPython website for
|
|
|
installation details, as this requires some additional libraries):
|
|
|
|
|
|
\begin{verbatim}
|
|
|
$ ipython qtconsole
|
|
|
\end{verbatim}
|
|
|
instead of opening in a terminal as before, IPython will start a
|
|
|
graphical console that at first sight appears just like a terminal, but
|
|
|
which is in fact much more capable than a text-only terminal. This is a
|
|
|
specialized terminal designed for interactive scientific work, and it
|
|
|
supports full multi-line editing with color highlighting and graphical
|
|
|
calltips for functions, it can keep multiple IPython sessions open
|
|
|
simultaneously in tabs, and when scripts run it can display the figures
|
|
|
inline directly in the work area.
|
|
|
|
|
|
% This cell is for the pdflatex output only
|
|
|
\begin{figure}[htbp]
|
|
|
\centering
|
|
|
\includegraphics[width=3in]{ipython_qtconsole2.png}
|
|
|
\caption{The IPython Qt console: a lightweight terminal for scientific exploration, with code, results and graphics in a soingle environment.}
|
|
|
\end{figure}
|
|
|
The Qt console accepts the same \texttt{-{}-pylab} startup flags as the
|
|
|
terminal, but you can additionally supply the value
|
|
|
\texttt{-{}-pylab inline}, which enables the support for inline graphics
|
|
|
shown in the figure. This is ideal for keeping all the code and figures
|
|
|
in the same session, given that the console can save the output of your
|
|
|
entire session to HTML or PDF.
|
|
|
|
|
|
Since the Qt console makes it far more convenient than the terminal to
|
|
|
edit blocks of code with multiple lines, in this environment it's worth
|
|
|
knowing about the \texttt{\%loadpy} magic function. \texttt{\%loadpy}
|
|
|
takes a path to a local file or remote URL, fetches its contents, and
|
|
|
puts it in the work area for you to further edit and execute. It can be
|
|
|
an extremely fast and convenient way of loading code from local disk or
|
|
|
remote examples from sites such as the
|
|
|
\href{http://matplotlib.sourceforge.net/gallery.html}{Matplotlib
|
|
|
gallery}.
|
|
|
|
|
|
Other than its enhanced capabilities for code and graphics, all of the
|
|
|
features of IPython we've explained before remain functional in this
|
|
|
graphical console.
|
|
|
|
|
|
\subsection{The IPython Notebook}
|
|
|
The third way to interact with IPython, in addition to the terminal and
|
|
|
graphical Qt console, is a powerful web interface called the ``IPython
|
|
|
Notebook''. If you run at the system console (you can omit the
|
|
|
\texttt{pylab} flags if you don't need plotting support):
|
|
|
|
|
|
\begin{verbatim}
|
|
|
$ ipython notebook --pylab inline
|
|
|
\end{verbatim}
|
|
|
IPython will start a process that runs a web server in your local
|
|
|
machine and to which a web browser can connect. The Notebook is a
|
|
|
workspace that lets you execute code in blocks called `cells' and
|
|
|
displays any results and figures, but which can also contain arbitrary
|
|
|
text (including LaTeX-formatted mathematical expressions) and any rich
|
|
|
media that a modern web browser is capable of displaying.
|
|
|
|
|
|
% This cell is for the pdflatex output only
|
|
|
\begin{figure}[htbp]
|
|
|
\centering
|
|
|
\includegraphics[width=3in]{ipython-notebook-specgram-2.png}
|
|
|
\caption{The IPython Notebook: text, equations, code, results, graphics and other multimedia in an open format for scientific exploration and collaboration}
|
|
|
\end{figure}
|
|
|
In fact, this document was written as a Notebook, and only exported to
|
|
|
LaTeX for printing. Inside of each cell, all the features of IPython
|
|
|
that we have discussed before remain functional, since ultimately this
|
|
|
web client is communicating with the same IPython code that runs in the
|
|
|
terminal. But this interface is a much more rich and powerful
|
|
|
environment for maintaining long-term ``live and executable'' scientific
|
|
|
documents.
|
|
|
|
|
|
Notebook environments have existed in commercial systems like
|
|
|
Mathematica(TM) and Maple(TM) for a long time; in the open source world
|
|
|
the \href{http://sagemath.org}{Sage} project blazed this particular
|
|
|
trail starting in 2006, and now we bring all the features that have made
|
|
|
IPython such a widely used tool to a Notebook model.
|
|
|
|
|
|
Since the Notebook runs as a web application, it is possible to
|
|
|
configure it for remote access, letting you run your computations on a
|
|
|
persistent server close to your data, which you can then access remotely
|
|
|
from any browser-equipped computer. We encourage you to read the
|
|
|
extensive documentation provided by the IPython project for details on
|
|
|
how to do this and many more features of the notebook.
|
|
|
|
|
|
Finally, as we said earlier, IPython also has a high-level and easy to
|
|
|
use set of libraries for parallel computing, that let you control
|
|
|
(interactively if desired) not just one IPython but an entire cluster of
|
|
|
`IPython engines'. Unfortunately a detailed discussion of these tools is
|
|
|
beyond the scope of this text, but should you need to parallelize your
|
|
|
analysis codes, a quick read of the tutorials and examples provided at
|
|
|
the IPython site may prove fruitful.
|
|
|
|
|
|
\end{document}
|
|
|
|