update ref texfile
Matthias BUSSONNIER -
@@ -1,2255 +1,2255 b''
1 1 %% This file was auto-generated by IPython, do NOT edit
2 2 %% Conversion from the original notebook file:
3 3 %% tests/ipynbref/IntroNumPy.orig.ipynb
4 4 %%
5 5 \documentclass[11pt,english]{article}
6 6
7 7 %% This is the automatic preamble used by IPython. Note that it does *not*
8 8 %% include a documentclass declaration, that is added at runtime to the overall
9 9 %% document.
10 10
11 11 \usepackage{amsmath}
12 12 \usepackage{amssymb}
13 13 \usepackage{graphicx}
14 14 \usepackage{ucs}
15 15 \usepackage[utf8x]{inputenc}
16 16
17 17 % needed for markdown enumerations to work
18 18 \usepackage{enumerate}
19 19
20 20 % Slightly bigger margins than the latex defaults
21 21 \usepackage{geometry}
22 22 \geometry{verbose,tmargin=3cm,bmargin=3cm,lmargin=2.5cm,rmargin=2.5cm}
23 23
24 24 % Define a few colors for use in code, links and cell shading
25 25 \usepackage{color}
26 26 \definecolor{orange}{cmyk}{0,0.4,0.8,0.2}
27 27 \definecolor{darkorange}{rgb}{.71,0.21,0.01}
28 28 \definecolor{darkgreen}{rgb}{.12,.54,.11}
29 29 \definecolor{myteal}{rgb}{.26, .44, .56}
30 30 \definecolor{gray}{gray}{0.45}
31 31 \definecolor{lightgray}{gray}{.95}
32 32 \definecolor{mediumgray}{gray}{.8}
33 33 \definecolor{inputbackground}{rgb}{.95, .95, .85}
34 34 \definecolor{outputbackground}{rgb}{.95, .95, .95}
35 35 \definecolor{traceback}{rgb}{1, .95, .95}
36 36
37 37 % Framed environments for code cells (inputs, outputs, errors, ...). The
38 38 % various uses of \unskip (or not) at the end were fine-tuned by hand, so don't
39 39 % randomly change them unless you're sure of the effect it will have.
40 40 \usepackage{framed}
41 41
42 42 % remove extraneous vertical space in boxes
43 43 \setlength\fboxsep{0pt}
44 44
45 45 % codecell is the whole input+output set of blocks that a Code cell can
46 46 % generate.
47 47
48 48 % TODO: unfortunately, it seems that using a framed codecell environment breaks
49 49 % the ability of the frames inside of it to be broken across pages. This
50 50 % causes at least the problem of having lots of empty space at the bottom of
51 51 % pages as new frames are moved to the next page, and if a single frame is too
52 52 % long to fit on a page, will completely stop latex from compiling the
53 53 % document. So unless we figure out a solution to this, we'll have to instead
54 54 % leave the codecell env. as empty. I'm keeping the original codecell
55 55 % definition here (a thin vertical bar) for reference, in case we find a
56 56 % solution to the page break issue.
57 57
58 58 %% \newenvironment{codecell}{%
59 59 %% \def\FrameCommand{\color{mediumgray} \vrule width 1pt \hspace{5pt}}%
60 60 %% \MakeFramed{\vspace{-0.5em}}}
61 61 %% {\unskip\endMakeFramed}
62 62
63 63 % For now, make this a no-op...
64 64 \newenvironment{codecell}{}
65 65
66 66 \newenvironment{codeinput}{%
67 67 \def\FrameCommand{\colorbox{inputbackground}}%
68 68 \MakeFramed{\advance\hsize-\width \FrameRestore}}
69 69 {\unskip\endMakeFramed}
70 70
71 71 \newenvironment{codeoutput}{%
72 72 \def\FrameCommand{\colorbox{outputbackground}}%
73 73 \vspace{-1.4em}
74 74 \MakeFramed{\advance\hsize-\width \FrameRestore}}
75 75 {\unskip\medskip\endMakeFramed}
76 76
77 77 \newenvironment{traceback}{%
78 78 \def\FrameCommand{\colorbox{traceback}}%
79 79 \MakeFramed{\advance\hsize-\width \FrameRestore}}
80 80 {\endMakeFramed}
81 81
82 82 % Use and configure listings package for nicely formatted code
83 83 \usepackage{listingsutf8}
84 84 \lstset{
85 85 language=python,
86 86 inputencoding=utf8x,
87 87 extendedchars=\true,
88 88 aboveskip=\smallskipamount,
89 89 belowskip=\smallskipamount,
90 90 xleftmargin=2mm,
91 91 breaklines=true,
92 92 basicstyle=\small \ttfamily,
93 93 showstringspaces=false,
94 94 keywordstyle=\color{blue}\bfseries,
95 95 commentstyle=\color{myteal},
96 96 stringstyle=\color{darkgreen},
97 97 identifierstyle=\color{darkorange},
98 98 columns=fullflexible, % tighter character kerning, like verb
99 99 }
100 100
101 101 % The hyperref package gives us a pdf with properly built
102 102 % internal navigation ('pdf bookmarks' for the table of contents,
103 103 % internal cross-reference links, web links for URLs, etc.)
104 104 \usepackage{hyperref}
105 105 \hypersetup{
106 106 breaklinks=true, % so long urls are correctly broken across lines
107 107 colorlinks=true,
108 108 urlcolor=blue,
109 109 linkcolor=darkorange,
110 110 citecolor=darkgreen,
111 111 }
112 112
113 113 % hardcode size of all verbatim environments to be a bit smaller
114 114 \makeatletter
115 115 \g@addto@macro\@verbatim\small\topsep=0.5em\partopsep=0pt
116 116 \makeatother
117 117
118 118 % Prevent overflowing lines due to urls and other hard-to-break entities.
119 119 \sloppy
120 120
121 121 \begin{document}
122 122
123 123 \section{An Introduction to the Scientific Python Ecosystem}
124 124 While the Python language is an excellent tool for general-purpose
125 125 programming, with a highly readable syntax, rich and powerful data types
126 126 (strings, lists, sets, dictionaries, arbitrary length integers, etc) and
127 127 a very comprehensive standard library, it was not designed specifically
128 128 for mathematical and scientific computing. Neither the language nor its
129 129 standard library have facilities for the efficient representation of
130 130 multidimensional datasets, tools for linear algebra and general matrix
131 131 manipulations (an essential building block of virtually all technical
132 132 computing), nor any data visualization facilities.
133 133
134 134 In particular, Python lists are very flexible containers that can be
135 135 nested arbitrarily deep and which can hold any Python object in them,
136 136 but they are poorly suited to represent efficiently common mathematical
137 137 constructs like vectors and matrices. In contrast, much of our modern
138 138 heritage of scientific computing has been built on top of libraries
139 139 written in the Fortran language, which has native support for vectors
140 140 and matrices as well as a library of mathematical functions that can
141 141 efficiently operate on entire arrays at once.
142 142
143 143 \subsection{Scientific Python: a collaboration of projects built by scientists}
144 144 The scientific community has developed a set of related Python libraries
145 145 that provide powerful array facilities, linear algebra, numerical
146 146 algorithms, data visualization and more. In this appendix, we will
147 147 briefly outline the tools most frequently used for this purpose, that
148 148 make ``Scientific Python'' something far more powerful than the Python
149 149 language alone.
150 150
151 151 For reasons of space, we can only describe in some detail the central
152 152 Numpy library, but below we provide links to the websites of each
153 153 project where you can read their documentation in more detail.
154 154
155 155 First, let's look at an overview of the basic tools that most scientists
156 156 use in daily research with Python. The core of this ecosystem is
157 157 composed of:
158 158
159 159 \begin{itemize}
160 160 \item
161 161 Numpy: the basic library that most others depend on. It provides a
162 162 powerful array type that can represent multidimensional datasets of
163 163 many different kinds and that supports arithmetic operations. Numpy
164 164 also provides a library of common mathematical functions, basic linear
165 165 algebra, random number generation and Fast Fourier Transforms. Numpy
166 166 can be found at \href{http://numpy.scipy.org}{numpy.scipy.org}.
167 167 \item
168 168 Scipy: a large collection of numerical algorithms that operate on
169 169 numpy arrays and provide facilities for many common tasks in
170 170 scientific computing, including dense and sparse linear algebra
171 171 support, optimization, special functions, statistics, n-dimensional
172 172 image processing, signal processing and more. Scipy can be found at
173 173 \href{http://scipy.org}{scipy.org}.
174 174 \item
175 175 Matplotlib: a data visualization library with a strong focus on
176 176 producing high-quality output, it supports a variety of common
177 177 scientific plot types in two and three dimensions, with precise
178 178 control over the final output and format for publication-quality
179 179 results. Matplotlib can also be controlled interactively allowing
180 180 graphical manipulation of your data (zooming, panning, etc) and can be
181 181 used with most modern user interface toolkits. It can be found at
182 182 \href{http://matplotlib.sf.net}{matplotlib.sf.net}.
183 183 \item
184 184 IPython: while not strictly scientific in nature, IPython is the
185 185 interactive environment in which many scientists spend their time.
186 186 IPython provides a powerful Python shell that integrates tightly with
187 187 Matplotlib and with easy access to the files and operating system, and
188 188 which can execute in a terminal or in a graphical Qt console. IPython
189 189 also has a web-based notebook interface that can combine code with
190 190 text, mathematical expressions, figures and multimedia. It can be
191 191 found at \href{http://ipython.org}{ipython.org}.
192 192 \end{itemize}
193 193 While each of these tools can be installed separately, in our opinion
194 194 the most convenient way today of accessing them (especially on Windows
195 195 and Mac computers) is to install the
196 196 \href{http://www.enthought.com/products/epd\_free.php}{Free Edition of
197 197 the Enthought Python Distribution}, which contains all of the above. Other
198 198 free alternatives on Windows (but not on Macs) are
199 199 \href{http://code.google.com/p/pythonxy}{Python(x,y)} and
200 200 \href{http://www.lfd.uci.edu/~gohlke/pythonlibs}{Christoph Gohlke's
201 201 packages page}.
202 202
203 203 These four `core' libraries are in practice complemented by a number of
204 204 other tools for more specialized work. We will briefly list here the
205 205 ones that we think are the most commonly needed:
206 206
207 207 \begin{itemize}
208 208 \item
209 209 Sympy: a symbolic manipulation tool that turns a Python session into a
210 210 computer algebra system. It integrates with the IPython notebook,
211 211 rendering results in properly typeset mathematical notation.
212 212 \href{http://sympy.org}{sympy.org}.
213 213 \item
214 214 Mayavi: sophisticated 3d data visualization;
215 215 \href{http://code.enthought.com/projects/mayavi}{code.enthought.com/projects/mayavi}.
216 216 \item
217 217 Cython: a bridge language between Python and C, useful both to
218 218 optimize performance bottlenecks in Python and to access C libraries
219 219 directly; \href{http://cython.org}{cython.org}.
220 220 \item
221 221 Pandas: high-performance data structures and data analysis tools, with
222 222 powerful data alignment and structural manipulation capabilities;
223 223 \href{http://pandas.pydata.org}{pandas.pydata.org}.
224 224 \item
225 225 Statsmodels: statistical data exploration and model estimation;
226 226 \href{http://statsmodels.sourceforge.net}{statsmodels.sourceforge.net}.
227 227 \item
228 228 Scikit-learn: general purpose machine learning algorithms with a
229 229 common interface; \href{http://scikit-learn.org}{scikit-learn.org}.
230 230 \item
231 231 Scikits-image: image processing toolbox;
232 232 \href{http://scikits-image.org}{scikits-image.org}.
233 233 \item
234 234 NetworkX: analysis of complex networks (in the graph theoretical
235 235 sense); \href{http://networkx.lanl.gov}{networkx.lanl.gov}.
236 236 \item
237 237 PyTables: management of hierarchical datasets using the
238 238 industry-standard HDF5 format;
239 239 \href{http://www.pytables.org}{www.pytables.org}.
240 240 \end{itemize}
241 241 Beyond these, for any specific problem you should look on the internet
242 242 first, before starting to write code from scratch. There's a good chance
243 243 that someone, somewhere, has written an open source library that you can
244 244 use for part or all of your problem.
245 245
246 246 \subsection{A note about the examples below}
247 247 In all subsequent examples, you will see blocks of input code, followed
248 248 by the results of the code if the code generated output. This output may
249 249 include text, graphics and other result objects. These blocks of input
250 250 can be pasted into your interactive IPython session or notebook for you
251 251 to execute. In the print version of this document, a thin vertical bar
252 252 on the left of the blocks of input and output shows which blocks go
253 253 together.
254 254
255 255 If you are reading this text as an actual IPython notebook, you can
256 256 press \texttt{Shift-Enter} or use the `play' button on the toolbar
257 257 (right-pointing triangle) to execute each block of code, known as a
258 258 `cell' in IPython:
259 259
260 260 \begin{codecell}
261 261 \begin{codeinput}
262 262 \begin{lstlisting}
263 263 # This is a block of code, below you'll see its output
264 264 print "Welcome to the world of scientific computing with Python!"
265 265 \end{lstlisting}
266 266 \end{codeinput}
267 267 \begin{codeoutput}
268 268 \begin{verbatim}
269 269 Welcome to the world of scientific computing with Python!
270 270 \end{verbatim}
271 271 \end{codeoutput}
272 272 \end{codecell}
273 273 \section{Motivation: the trapezoidal rule}
274 274 In subsequent sections we'll provide a basic introduction to the nuts
275 275 and bolts of the basic scientific python tools; but we'll first motivate
276 276 it with a brief example that illustrates what you can do in a few lines
277 277 with these tools. For this, we will use the simple problem of
278 278 approximating a definite integral with the trapezoid rule:
279 279
280 280 \[
281 281 \int_{a}^{b} f(x)\, dx \approx \frac{1}{2} \sum_{k=1}^{N} \left( x_{k} - x_{k-1} \right) \left( f(x_{k}) + f(x_{k-1}) \right).
282 282 \]
283 283
284 284 Our task will be to compute this formula for a function such as:
285 285
286 286 \[
287 287 f(x) = (x-3)(x-5)(x-7)+85
288 288 \]
289 289
290 290 integrated between $a=1$ and $b=9$.
291 291
292 292 First, we define the function and sample it evenly between 0 and 10 at
293 293 200 points:
294 294
295 295 \begin{codecell}
296 296 \begin{codeinput}
297 297 \begin{lstlisting}
298 298 def f(x):
299 299 return (x-3)*(x-5)*(x-7)+85
300 300
301 301 import numpy as np
302 302 x = np.linspace(0, 10, 200)
303 303 y = f(x)
304 304 \end{lstlisting}
305 305 \end{codeinput}
306 306 \end{codecell}
307 307 We select $a$ and $b$, our integration limits, and we take only a few
308 308 points in that region to illustrate the error behavior of the trapezoid
309 309 approximation:
310 310
311 311 \begin{codecell}
312 312 \begin{codeinput}
313 313 \begin{lstlisting}
314 314 a, b = 1, 9
315 315 xint = x[np.logical_and(x>=a, x<=b)][::30]
316 316 yint = y[np.logical_and(x>=a, x<=b)][::30]
317 317 \end{lstlisting}
318 318 \end{codeinput}
319 319 \end{codecell}
320 320 Let's plot both the function and the area below it in the trapezoid
321 321 approximation:
322 322
323 323 \begin{codecell}
324 324 \begin{codeinput}
325 325 \begin{lstlisting}
326 326 import matplotlib.pyplot as plt
327 327 plt.plot(x, y, lw=2)
328 328 plt.axis([0, 10, 0, 140])
329 329 plt.fill_between(xint, 0, yint, facecolor='gray', alpha=0.4)
330 330 plt.text(0.5 * (a + b), 30,r"$\int_a^b f(x)dx$", horizontalalignment='center', fontsize=20);
331 331 \end{lstlisting}
332 332 \end{codeinput}
333 333 \begin{codeoutput}
334 334 \begin{center}
335 \includegraphics[width=6in]{tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_00.pdf}
335 \includegraphics[width=6in]{tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_00.pdf}
336 336 \par
337 337 \end{center}
338 338 \end{codeoutput}
339 339 \end{codecell}
340 340 Compute the integral both at high accuracy and with the trapezoid
341 341 approximation:
342 342
343 343 \begin{codecell}
344 344 \begin{codeinput}
345 345 \begin{lstlisting}
346 346 from scipy.integrate import quad, trapz
347 347 integral, error = quad(f, 1, 9)
348 348 trap_integral = trapz(yint, xint)
349 349 print "The integral is: %g +/- %.1e" % (integral, error)
350 350 print "The trapezoid approximation with", len(xint), "points is:", trap_integral
351 351 print "The absolute error is:", abs(integral - trap_integral)
352 352 \end{lstlisting}
353 353 \end{codeinput}
354 354 \begin{codeoutput}
355 355 \begin{verbatim}
356 356 The integral is: 680 +/- 7.5e-12
357 357 The trapezoid approximation with 6 points is: 621.286411141
358 358 The absolute error is: 58.7135888589
359 359 \end{verbatim}
360 360 \end{codeoutput}
361 361 \end{codecell}
362 362 This simple example showed us how, by combining the numpy, scipy and
363 363 matplotlib libraries, we can illustrate a standard method
364 364 in elementary calculus with just a few lines of code. We will now
365 365 discuss the basic usage of these tools in more detail.
366 366
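The trapezoid formula above maps almost term by term onto array slices. The following is a minimal sketch and not part of the original notebook; the helper name \texttt{trapezoid\_sum} and the variable names are illustrative. It rebuilds the same sample grid and compares a coarse and a dense selection of points against the value of 680 reported above.

```python
import numpy as np

def f(x):
    return (x - 3) * (x - 5) * (x - 7) + 85

def trapezoid_sum(x, y):
    # 0.5 * sum_k (x_k - x_{k-1}) * (f(x_k) + f(x_{k-1}))
    return 0.5 * np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]))

a, b = 1, 9
x = np.linspace(0, 10, 200)
inside = np.logical_and(x >= a, x <= b)

x_coarse = x[inside][::30]   # every 30th sample inside [a, b]
x_dense = x[inside]          # all samples inside [a, b]

coarse_estimate = trapezoid_sum(x_coarse, f(x_coarse))
dense_estimate = trapezoid_sum(x_dense, f(x_dense))

exact = 680.0                # value of the integral reported above
coarse_error = abs(exact - coarse_estimate)
dense_error = abs(exact - dense_estimate)
```

Using every sample inside the interval shrinks the error considerably; for this grid, what remains comes largely from the samples not reaching the endpoints $a$ and $b$ exactly.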
367 367 \section{NumPy arrays: the right data structure for scientific computing}
368 368 \subsection{Basics of Numpy arrays}
369 369 We now turn our attention to the Numpy library, which forms the base
370 370 layer for the entire `scipy ecosystem'. Once you have installed numpy,
371 371 you can import it as
372 372
373 373 \begin{codecell}
374 374 \begin{codeinput}
375 375 \begin{lstlisting}
376 376 import numpy
377 377 \end{lstlisting}
378 378 \end{codeinput}
379 379 \end{codecell}
380 380 though in this book we will use the common shorthand
381 381
382 382 \begin{codecell}
383 383 \begin{codeinput}
384 384 \begin{lstlisting}
385 385 import numpy as np
386 386 \end{lstlisting}
387 387 \end{codeinput}
388 388 \end{codecell}
389 389 As mentioned above, the main object provided by numpy is a powerful
390 390 array. We'll start by exploring how the numpy array differs from Python
391 391 lists. We start by creating a simple list and an array with the same
392 392 contents as the list:
393 393
394 394 \begin{codecell}
395 395 \begin{codeinput}
396 396 \begin{lstlisting}
397 397 lst = [10, 20, 30, 40]
398 398 arr = np.array([10, 20, 30, 40])
399 399 \end{lstlisting}
400 400 \end{codeinput}
401 401 \end{codecell}
402 402 Elements of a one-dimensional array are accessed with the same syntax as
403 403 a list:
404 404
405 405 \begin{codecell}
406 406 \begin{codeinput}
407 407 \begin{lstlisting}
408 408 lst[0]
409 409 \end{lstlisting}
410 410 \end{codeinput}
411 411 \begin{codeoutput}
412 412 \begin{verbatim}
413 413 10
414 414 \end{verbatim}
415 415 \end{codeoutput}
416 416 \end{codecell}
417 417 \begin{codecell}
418 418 \begin{codeinput}
419 419 \begin{lstlisting}
420 420 arr[0]
421 421 \end{lstlisting}
422 422 \end{codeinput}
423 423 \begin{codeoutput}
424 424 \begin{verbatim}
425 425 10
426 426 \end{verbatim}
427 427 \end{codeoutput}
428 428 \end{codecell}
429 429 \begin{codecell}
430 430 \begin{codeinput}
431 431 \begin{lstlisting}
432 432 arr[-1]
433 433 \end{lstlisting}
434 434 \end{codeinput}
435 435 \begin{codeoutput}
436 436 \begin{verbatim}
437 437 40
438 438 \end{verbatim}
439 439 \end{codeoutput}
440 440 \end{codecell}
441 441 \begin{codecell}
442 442 \begin{codeinput}
443 443 \begin{lstlisting}
444 444 arr[2:]
445 445 \end{lstlisting}
446 446 \end{codeinput}
447 447 \begin{codeoutput}
448 448 \begin{verbatim}
449 449 array([30, 40])
450 450 \end{verbatim}
451 451 \end{codeoutput}
452 452 \end{codecell}
453 453 The first difference to note between lists and arrays is that arrays are
454 454 \emph{homogeneous}; i.e.~all elements of an array must be of the same
455 455 type. In contrast, lists can contain elements of arbitrary type. For
456 456 example, we can change the last element in our list above to be a
457 457 string:
458 458
459 459 \begin{codecell}
460 460 \begin{codeinput}
461 461 \begin{lstlisting}
462 462 lst[-1] = 'a string inside a list'
463 463 lst
464 464 \end{lstlisting}
465 465 \end{codeinput}
466 466 \begin{codeoutput}
467 467 \begin{verbatim}
468 468 [10, 20, 30, 'a string inside a list']
469 469 \end{verbatim}
470 470 \end{codeoutput}
471 471 \end{codecell}
472 472 but the same cannot be done with an array; we get an error message instead:
473 473
474 474 \begin{codecell}
475 475 \begin{codeinput}
476 476 \begin{lstlisting}
477 477 arr[-1] = 'a string inside an array'
478 478 \end{lstlisting}
479 479 \end{codeinput}
480 480 \begin{codeoutput}
481 481 \begin{traceback}
482 482 \begin{verbatim}
483 483 ---------------------------------------------------------------------------
484 484 ValueError Traceback (most recent call last)
485 485 /home/fperez/teach/book-math-labtool/<ipython-input-13-29c0bfa5fa8a> in <module>()
486 486 ----> 1 arr[-1] = 'a string inside an array'
487 487
488 488 ValueError: invalid literal for long() with base 10: 'a string inside an array'
489 489 \end{verbatim}
490 490 \end{traceback}
491 491 \end{codeoutput}
492 492 \end{codecell}
493 493 The information about the type of an array is contained in its
494 494 \emph{dtype} attribute:
495 495
496 496 \begin{codecell}
497 497 \begin{codeinput}
498 498 \begin{lstlisting}
499 499 arr.dtype
500 500 \end{lstlisting}
501 501 \end{codeinput}
502 502 \begin{codeoutput}
503 503 \begin{verbatim}
504 504 dtype('int32')
505 505 \end{verbatim}
506 506 \end{codeoutput}
507 507 \end{codecell}
508 508 Once an array has been created, its dtype is fixed and it can only store
509 509 elements of the same type. For this example where the dtype is integer,
510 510 if we store a floating point number it will be automatically converted
511 511 into an integer:
512 512
513 513 \begin{codecell}
514 514 \begin{codeinput}
515 515 \begin{lstlisting}
516 516 arr[-1] = 1.234
517 517 arr
518 518 \end{lstlisting}
519 519 \end{codeinput}
520 520 \begin{codeoutput}
521 521 \begin{verbatim}
522 522 array([10, 20, 30, 1])
523 523 \end{verbatim}
524 524 \end{codeoutput}
525 525 \end{codecell}
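When fractional values need to be stored, one option is to request a floating point dtype when the array is created, or to convert an existing array. A minimal sketch using the \texttt{dtype} argument and the \texttt{astype} method; the variable names are illustrative:

```python
import numpy as np

int_arr = np.array([10, 20, 30, 40])                 # integer dtype inferred from the values
float_arr = np.array([10, 20, 30, 40], dtype=float)  # dtype requested explicitly
converted = int_arr.astype(float)                    # new array; int_arr is unchanged

float_arr[-1] = 1.234   # stored without truncation
converted[-1] = 1.234
```

Note that \texttt{astype} returns a new array, so the original integer array keeps both its dtype and its contents.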
526 526 Above we created an array from an existing list; let us now see
527 527 other ways in which we can create arrays. A
528 528 common need is to have an array initialized with a constant value, and
529 529 very often this value is 0 or 1 (suitable as starting value for additive
530 530 and multiplicative loops respectively); \texttt{zeros} creates arrays of
531 531 all zeros, with any desired dtype:
532 532
533 533 \begin{codecell}
534 534 \begin{codeinput}
535 535 \begin{lstlisting}
536 536 np.zeros(5, float)
537 537 \end{lstlisting}
538 538 \end{codeinput}
539 539 \begin{codeoutput}
540 540 \begin{verbatim}
541 541 array([ 0., 0., 0., 0., 0.])
542 542 \end{verbatim}
543 543 \end{codeoutput}
544 544 \end{codecell}
545 545 \begin{codecell}
546 546 \begin{codeinput}
547 547 \begin{lstlisting}
548 548 np.zeros(3, int)
549 549 \end{lstlisting}
550 550 \end{codeinput}
551 551 \begin{codeoutput}
552 552 \begin{verbatim}
553 553 array([0, 0, 0])
554 554 \end{verbatim}
555 555 \end{codeoutput}
556 556 \end{codecell}
557 557 \begin{codecell}
558 558 \begin{codeinput}
559 559 \begin{lstlisting}
560 560 np.zeros(3, complex)
561 561 \end{lstlisting}
562 562 \end{codeinput}
563 563 \begin{codeoutput}
564 564 \begin{verbatim}
565 565 array([ 0.+0.j, 0.+0.j, 0.+0.j])
566 566 \end{verbatim}
567 567 \end{codeoutput}
568 568 \end{codecell}
569 569 and similarly for \texttt{ones}:
570 570
571 571 \begin{codecell}
572 572 \begin{codeinput}
573 573 \begin{lstlisting}
574 574 print '5 ones:', np.ones(5)
575 575 \end{lstlisting}
576 576 \end{codeinput}
577 577 \begin{codeoutput}
578 578 \begin{verbatim}
579 579 5 ones: [ 1. 1. 1. 1. 1.]
580 580 \end{verbatim}
581 581 \end{codeoutput}
582 582 \end{codecell}
583 583 If we want an array initialized with an arbitrary value, we can create
584 584 an empty array and then use the fill method to put the value we want
585 585 into the array:
586 586
587 587 \begin{codecell}
588 588 \begin{codeinput}
589 589 \begin{lstlisting}
590 590 a = np.empty(4)
591 591 a.fill(5.5)
592 592 a
593 593 \end{lstlisting}
594 594 \end{codeinput}
595 595 \begin{codeoutput}
596 596 \begin{verbatim}
597 597 array([ 5.5, 5.5, 5.5, 5.5])
598 598 \end{verbatim}
599 599 \end{codeoutput}
600 600 \end{codecell}
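Recent versions of Numpy also provide \texttt{np.full}, which creates and fills the array in a single call. A short sketch comparing the two approaches (the variable names are illustrative):

```python
import numpy as np

filled = np.empty(4)       # contents are uninitialized until we fill them
filled.fill(5.5)

direct = np.full(4, 5.5)   # same result in one step
```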
601 601 Numpy also offers the \texttt{arange} function, which works like the
602 602 builtin \texttt{range} but returns an array instead of a list:
603 603
604 604 \begin{codecell}
605 605 \begin{codeinput}
606 606 \begin{lstlisting}
607 607 np.arange(5)
608 608 \end{lstlisting}
609 609 \end{codeinput}
610 610 \begin{codeoutput}
611 611 \begin{verbatim}
612 612 array([0, 1, 2, 3, 4])
613 613 \end{verbatim}
614 614 \end{codeoutput}
615 615 \end{codecell}
616 616 and the \texttt{linspace} and \texttt{logspace} functions to create
617 617 linearly and logarithmically-spaced grids respectively, with a fixed
618 618 number of points and including both ends of the specified interval:
619 619
620 620 \begin{codecell}
621 621 \begin{codeinput}
622 622 \begin{lstlisting}
623 623 print "A linear grid between 0 and 1:", np.linspace(0, 1, 5)
624 624 print "A logarithmic grid between 10**1 and 10**4: ", np.logspace(1, 4, 4)
625 625 \end{lstlisting}
626 626 \end{codeinput}
627 627 \begin{codeoutput}
628 628 \begin{verbatim}
629 629 A linear grid between 0 and 1: [ 0. 0.25 0.5 0.75 1. ]
630 630 A logarithmic grid between 10**1 and 10**4: [ 10. 100. 1000. 10000.]
631 631 \end{verbatim}
632 632 \end{codeoutput}
633 633 \end{codecell}
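Two details of these grid functions are easy to check directly: \texttt{logspace} takes exponents (base 10 by default) rather than endpoint values, and \texttt{arange} leaves out its stop value while \texttt{linspace} includes it. A minimal sketch with illustrative variable names:

```python
import numpy as np

exponents = np.linspace(1, 4, 4)     # [1., 2., 3., 4.]
log_grid = np.logspace(1, 4, 4)      # [10., 100., 1000., 10000.]
same_grid = np.allclose(log_grid, 10.0 ** exponents)

# arange excludes its stop value, while linspace includes it by default
half_open = np.arange(0, 1, 0.25)    # [0., 0.25, 0.5, 0.75]
closed = np.linspace(0, 1, 5)        # [0., 0.25, 0.5, 0.75, 1.]
```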
634 634 Finally, it is often useful to create arrays with random numbers that
635 635 follow a specific distribution. The \texttt{np.random} module contains a
636 636 number of functions that can be used for this purpose. For example, this
637 637 will produce an array of 5 random samples taken from a standard normal
638 638 distribution (0 mean and variance 1):
639 639
640 640 \begin{codecell}
641 641 \begin{codeinput}
642 642 \begin{lstlisting}
643 643 np.random.randn(5)
644 644 \end{lstlisting}
645 645 \end{codeinput}
646 646 \begin{codeoutput}
647 647 \begin{verbatim}
648 648 array([-0.08633343, -0.67375434, 1.00589536, 0.87081651, 1.65597822])
649 649 \end{verbatim}
650 650 \end{codeoutput}
651 651 \end{codecell}
652 652 whereas this will also give 5 samples, but from a normal distribution
653 653 with a mean of 10 and a standard deviation of 3:
654 654
655 655 \begin{codecell}
656 656 \begin{codeinput}
657 657 \begin{lstlisting}
658 658 norm10 = np.random.normal(10, 3, 5)
659 659 norm10
660 660 \end{lstlisting}
661 661 \end{codeinput}
662 662 \begin{codeoutput}
663 663 \begin{verbatim}
664 664 array([ 8.94879575, 5.53038269, 8.24847281, 12.14944165, 11.56209294])
665 665 \end{verbatim}
666 666 \end{codeoutput}
667 667 \end{codecell}
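The second argument of \texttt{np.random.normal} is the scale of the distribution, that is, its standard deviation. One way to convince yourself is to draw a large sample and inspect its statistics. A minimal sketch; the seed and the sample size are arbitrary choices made for this illustration:

```python
import numpy as np

np.random.seed(0)                          # fixed seed so the run is repeatable
samples = np.random.normal(10, 3, 100000)

sample_mean = samples.mean()               # close to 10
sample_std = samples.std()                 # close to 3
sample_var = samples.var()                 # close to 3**2 = 9
```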
668 668 \subsection{Indexing with other arrays}
669 669 Above we saw how to index arrays with single numbers and slices, just
670 670 like Python lists. But arrays allow for a more sophisticated kind of
671 671 indexing which is very powerful: you can index an array with another
672 672 array, and in particular with an array of boolean values. This is
673 673 particularly useful to extract information from an array that matches a
674 674 certain condition.
675 675
676 676 Consider for example that in the array \texttt{norm10} we want to
677 677 replace all values above 9 with the value 0. We can do so by first
678 678 finding the \emph{mask} that indicates where this condition is true or
679 679 false:
680 680
681 681 \begin{codecell}
682 682 \begin{codeinput}
683 683 \begin{lstlisting}
684 684 mask = norm10 > 9
685 685 mask
686 686 \end{lstlisting}
687 687 \end{codeinput}
688 688 \begin{codeoutput}
689 689 \begin{verbatim}
690 690 array([False, False, False, True, True], dtype=bool)
691 691 \end{verbatim}
692 692 \end{codeoutput}
693 693 \end{codecell}
694 694 Now that we have this mask, we can use it to either read those values or
695 695 to reset them to 0:
696 696
697 697 \begin{codecell}
698 698 \begin{codeinput}
699 699 \begin{lstlisting}
700 700 print 'Values above 9:', norm10[mask]
701 701 \end{lstlisting}
702 702 \end{codeinput}
703 703 \begin{codeoutput}
704 704 \begin{verbatim}
705 705 Values above 9: [ 12.14944165 11.56209294]
706 706 \end{verbatim}
707 707 \end{codeoutput}
708 708 \end{codecell}
709 709 \begin{codecell}
710 710 \begin{codeinput}
711 711 \begin{lstlisting}
712 712 print 'Resetting all values above 9 to 0...'
713 713 norm10[mask] = 0
714 714 print norm10
715 715 \end{lstlisting}
716 716 \end{codeinput}
717 717 \begin{codeoutput}
718 718 \begin{verbatim}
719 719 Resetting all values above 9 to 0...
720 720 [ 8.94879575 5.53038269 8.24847281 0. 0. ]
721 721 \end{verbatim}
722 722 \end{codeoutput}
723 723 \end{codecell}
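Masks can also be combined, and \texttt{np.where} offers a way to build a modified copy without changing the original array. The sketch below uses a small fixed array (made-up values, not the random ones above) so that the results are reproducible; all names are illustrative:

```python
import numpy as np

values = np.array([8.9, 5.5, 8.2, 12.1, 11.6])

# Combine conditions element-wise with & (and), | (or), ~ (not);
# the parentheses are needed because & binds tighter than comparisons.
mask = (values > 8) & (values < 12)
selected = values[mask]                       # [8.9, 8.2, 11.6]

# np.where picks from the second or third argument depending on the condition
clipped = np.where(values > 9, 0.0, values)   # [8.9, 5.5, 8.2, 0., 0.]
```

Unlike the in-place assignment \texttt{norm10{[}mask{]} = 0} shown above, \texttt{np.where} leaves \texttt{values} untouched and returns a new array.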
724 724 \subsection{Arrays with more than one dimension}
725 725 Up until now all our examples have used one-dimensional arrays. But
726 726 Numpy can create arrays of arbitrary dimensions, and all the methods
727 727 illustrated in the previous section work with more than one dimension.
728 728 For example, a list of lists can be used to initialize a two dimensional
729 729 array:
730 730
731 731 \begin{codecell}
732 732 \begin{codeinput}
733 733 \begin{lstlisting}
734 734 lst2 = [[1, 2], [3, 4]]
735 735 arr2 = np.array([[1, 2], [3, 4]])
736 736 arr2
737 737 \end{lstlisting}
738 738 \end{codeinput}
739 739 \begin{codeoutput}
740 740 \begin{verbatim}
741 741 array([[1, 2],
742 742 [3, 4]])
743 743 \end{verbatim}
744 744 \end{codeoutput}
745 745 \end{codecell}
746 746 With two-dimensional arrays we start seeing the power of numpy: while a
747 747 nested list can be indexed by repeatedly using the \texttt{{[} {]}}
748 748 operator, multidimensional arrays support a much more natural indexing
749 749 syntax with a single \texttt{{[} {]}} and a set of indices separated by
750 750 commas:
751 751
752 752 \begin{codecell}
753 753 \begin{codeinput}
754 754 \begin{lstlisting}
755 755 print lst2[0][1]
756 756 print arr2[0,1]
757 757 \end{lstlisting}
758 758 \end{codeinput}
759 759 \begin{codeoutput}
760 760 \begin{verbatim}
761 761 2
762 762 2
763 763 \end{verbatim}
764 764 \end{codeoutput}
765 765 \end{codecell}
766 766 Most of the array creation functions listed above can be used with more
767 767 than one dimension, for example:
768 768
769 769 \begin{codecell}
770 770 \begin{codeinput}
771 771 \begin{lstlisting}
772 772 np.zeros((2,3))
773 773 \end{lstlisting}
774 774 \end{codeinput}
775 775 \begin{codeoutput}
776 776 \begin{verbatim}
777 777 array([[ 0., 0., 0.],
778 778 [ 0., 0., 0.]])
779 779 \end{verbatim}
780 780 \end{codeoutput}
781 781 \end{codecell}
782 782 \begin{codecell}
783 783 \begin{codeinput}
784 784 \begin{lstlisting}
785 785 np.random.normal(10, 3, (2, 4))
786 786 \end{lstlisting}
787 787 \end{codeinput}
788 788 \begin{codeoutput}
789 789 \begin{verbatim}
790 790 array([[ 11.26788826, 4.29619866, 11.09346496, 9.73861307],
791 791 [ 10.54025996, 9.5146268 , 10.80367214, 13.62204505]])
792 792 \end{verbatim}
793 793 \end{codeoutput}
794 794 \end{codecell}
795 795 In fact, the shape of an array can be changed at any time, as long as
796 796 the total number of elements is unchanged. For example, if we want a 2x4
797 797 array with numbers increasing from 0, the easiest way to create it is:
798 798
799 799 \begin{codecell}
800 800 \begin{codeinput}
801 801 \begin{lstlisting}
802 802 arr = np.arange(8).reshape(2,4)
803 803 print arr
804 804 \end{lstlisting}
805 805 \end{codeinput}
806 806 \begin{codeoutput}
807 807 \begin{verbatim}
808 808 [[0 1 2 3]
809 809 [4 5 6 7]]
810 810 \end{verbatim}
811 811 \end{codeoutput}
812 812 \end{codecell}
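The constraint on the total number of elements can be seen directly. In this minimal sketch (variable names are illustrative), one dimension is given as \texttt{-1} so that numpy infers it, and a reshape to an incompatible shape raises a \texttt{ValueError}:

```python
import numpy as np

flat = np.arange(8)
grid = flat.reshape(2, -1)    # -1 lets numpy infer the missing dimension (4 here)

try:
    flat.reshape(3, 3)        # 9 slots for 8 elements: not allowed
    reshape_failed = False
except ValueError:
    reshape_failed = True
```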
813 813 With multidimensional arrays, you can also use slices, and you can mix
814 814 and match slices and single indices in the different dimensions (using
815 815 the same array as above):
816 816
817 817 \begin{codecell}
818 818 \begin{codeinput}
819 819 \begin{lstlisting}
820 820 print 'Slicing in the second row:', arr[1, 2:4]
821 821 print 'All rows, third column :', arr[:, 2]
822 822 \end{lstlisting}
823 823 \end{codeinput}
824 824 \begin{codeoutput}
825 825 \begin{verbatim}
826 826 Slicing in the second row: [6 7]
827 827 All rows, third column : [2 6]
828 828 \end{verbatim}
829 829 \end{codeoutput}
830 830 \end{codecell}
831 831 If you only provide one index, then you will get an array with one less
832 832 dimension containing that row:
833 833
834 834 \begin{codecell}
835 835 \begin{codeinput}
836 836 \begin{lstlisting}
837 837 print 'First row: ', arr[0]
838 838 print 'Second row: ', arr[1]
839 839 \end{lstlisting}
840 840 \end{codeinput}
841 841 \begin{codeoutput}
842 842 \begin{verbatim}
843 843 First row: [0 1 2 3]
844 844 Second row: [4 5 6 7]
845 845 \end{verbatim}
846 846 \end{codeoutput}
847 847 \end{codecell}
848 848 Now that we have seen how to create arrays with more than one dimension,
849 849 it's a good idea to look at some of the most useful properties and
850 850 methods that arrays have. The following provide basic information about
851 851 the size, shape and data in the array:
852 852
853 853 \begin{codecell}
854 854 \begin{codeinput}
855 855 \begin{lstlisting}
856 856 print 'Data type :', arr.dtype
857 857 print 'Total number of elements :', arr.size
858 858 print 'Number of dimensions :', arr.ndim
859 859 print 'Shape (dimensionality) :', arr.shape
860 860 print 'Memory used (in bytes) :', arr.nbytes
861 861 \end{lstlisting}
862 862 \end{codeinput}
863 863 \begin{codeoutput}
864 864 \begin{verbatim}
865 865 Data type : int32
866 866 Total number of elements : 8
867 867 Number of dimensions : 2
868 868 Shape (dimensionality) : (2, 4)
869 869 Memory used (in bytes) : 32
870 870 \end{verbatim}
871 871 \end{codeoutput}
872 872 \end{codecell}
873 873 Arrays also have many useful methods; some especially useful ones are:
874 874
875 875 \begin{codecell}
876 876 \begin{codeinput}
877 877 \begin{lstlisting}
878 878 print 'Minimum and maximum :', arr.min(), arr.max()
879 879 print 'Sum and product of all elements :', arr.sum(), arr.prod()
880 880 print 'Mean and standard deviation :', arr.mean(), arr.std()
881 881 \end{lstlisting}
882 882 \end{codeinput}
883 883 \begin{codeoutput}
884 884 \begin{verbatim}
885 885 Minimum and maximum : 0 7
886 886 Sum and product of all elements : 28 0
887 887 Mean and standard deviation : 3.5 2.29128784748
888 888 \end{verbatim}
889 889 \end{codeoutput}
890 890 \end{codecell}
891 891 For these methods, the above operations are all computed on all the
892 892 elements of the array. But for a multidimensional array, it's possible
893 893 to do the computation along a single dimension, by passing the
894 894 \texttt{axis} parameter; for example:
895 895
896 896 \begin{codecell}
897 897 \begin{codeinput}
898 898 \begin{lstlisting}
899 899 print 'For the following array:\n', arr
900 900 print 'The sum of elements along the rows is :', arr.sum(axis=1)
901 901 print 'The sum of elements along the columns is :', arr.sum(axis=0)
902 902 \end{lstlisting}
903 903 \end{codeinput}
904 904 \begin{codeoutput}
905 905 \begin{verbatim}
906 906 For the following array:
907 907 [[0 1 2 3]
908 908 [4 5 6 7]]
909 909 The sum of elements along the rows is : [ 6 22]
910 910 The sum of elements along the columns is : [ 4 6 8 10]
911 911 \end{verbatim}
912 912 \end{codeoutput}
913 913 \end{codecell}
914 914 As you can see in this example, the value of the \texttt{axis} parameter
915 915 is the dimension that will be \emph{consumed} once the operation has
916 916 been carried out. This is why, to sum the elements within each row, we use
917 917 \texttt{axis=1}: it consumes the dimension of length 4 and leaves one sum per row.
918 918
919 919 This can be easily illustrated with an example that has more dimensions;
920 920 we create an array with 4 dimensions and shape \texttt{(3,4,5,6)} and
921 921 sum along the axis number 2 (i.e.~the \emph{third} axis, since in Python
922 922 all counts are 0-based). That consumes the dimension whose length was 5,
923 923 leaving us with a new array that has shape \texttt{(3,4,6)}:
924 924
925 925 \begin{codecell}
926 926 \begin{codeinput}
927 927 \begin{lstlisting}
928 928 np.zeros((3,4,5,6)).sum(2).shape
929 929 \end{lstlisting}
930 930 \end{codeinput}
931 931 \begin{codeoutput}
932 932 \begin{verbatim}
933 933 (3, 4, 6)
934 934 \end{verbatim}
935 935 \end{codeoutput}
936 936 \end{codecell}
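The same rule can be checked on the small 2x4 array used earlier. The following is a minimal sketch, assuming Python 3 and a recent NumPy; the names \texttt{col\_sums}, \texttt{row\_sums} and \texttt{kept} are illustrative and do not appear in the original notebook. It also shows the \texttt{keepdims} option, which retains the consumed axis with length 1:

```python
import numpy as np

arr = np.arange(8).reshape(2, 4)

# axis=0 consumes the first dimension (length 2): one sum per column.
col_sums = arr.sum(axis=0)
# axis=1 consumes the second dimension (length 4): one sum per row.
row_sums = arr.sum(axis=1)

print(col_sums.shape, col_sums)  # (4,) [ 4  6  8 10]
print(row_sums.shape, row_sums)  # (2,) [ 6 22]

# keepdims=True keeps the consumed axis in the result, with length 1.
kept = arr.sum(axis=1, keepdims=True)
print(kept.shape)                # (2, 1)
```

Keeping the reduced axis is convenient when the result will later be combined with the original array.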
937 937 Another widely used property of arrays is the \texttt{.T} attribute,
938 938 which allows you to access the transpose of the array:
939 939
940 940 \begin{codecell}
941 941 \begin{codeinput}
942 942 \begin{lstlisting}
943 943 print 'Array:\n', arr
944 944 print 'Transpose:\n', arr.T
945 945 \end{lstlisting}
946 946 \end{codeinput}
947 947 \begin{codeoutput}
948 948 \begin{verbatim}
949 949 Array:
950 950 [[0 1 2 3]
951 951 [4 5 6 7]]
952 952 Transpose:
953 953 [[0 4]
954 954 [1 5]
955 955 [2 6]
956 956 [3 7]]
957 957 \end{verbatim}
958 958 \end{codeoutput}
959 959 \end{codecell}
960 960 We don't have time here to look at all the methods and properties of
961 961 arrays, but here is a complete list. Try exploring some of these in
962 962 IPython to learn more, or read their descriptions in the full Numpy
963 963 documentation:
964 964
965 965 \begin{verbatim}
966 966 arr.T arr.copy arr.getfield arr.put arr.squeeze
967 967 arr.all arr.ctypes arr.imag arr.ravel arr.std
968 968 arr.any arr.cumprod arr.item arr.real arr.strides
969 969 arr.argmax arr.cumsum arr.itemset arr.repeat arr.sum
970 970 arr.argmin arr.data arr.itemsize arr.reshape arr.swapaxes
971 971 arr.argsort arr.diagonal arr.max arr.resize arr.take
972 972 arr.astype arr.dot arr.mean arr.round arr.tofile
973 973 arr.base arr.dtype arr.min arr.searchsorted arr.tolist
974 974 arr.byteswap arr.dump arr.nbytes arr.setasflat arr.tostring
975 975 arr.choose arr.dumps arr.ndim arr.setfield arr.trace
976 976 arr.clip arr.fill arr.newbyteorder arr.setflags arr.transpose
977 977 arr.compress arr.flags arr.nonzero arr.shape arr.var
978 978 arr.conj arr.flat arr.prod arr.size arr.view
979 979 arr.conjugate arr.flatten arr.ptp arr.sort
980 980 \end{verbatim}
981 981
982 982
983 983 \subsection{Operating with arrays}
984 984 Arrays support all regular arithmetic operators, and the numpy library
985 985 also contains a complete collection of basic mathematical functions that
986 986 operate on arrays. It is important to remember that in general, all
987 987 operations with arrays are applied \emph{element-wise}, i.e., are
988 988 applied to all the elements of the array at the same time. Consider for
989 989 example:
990 990
991 991 \begin{codecell}
992 992 \begin{codeinput}
993 993 \begin{lstlisting}
994 994 arr1 = np.arange(4)
995 995 arr2 = np.arange(10, 14)
996 996 print arr1, '+', arr2, '=', arr1+arr2
997 997 \end{lstlisting}
998 998 \end{codeinput}
999 999 \begin{codeoutput}
1000 1000 \begin{verbatim}
1001 1001 [0 1 2 3] + [10 11 12 13] = [10 12 14 16]
1002 1002 \end{verbatim}
1003 1003 \end{codeoutput}
1004 1004 \end{codecell}
1005 1005 Importantly, you must remember that even the multiplication operator is
1006 1006 by default applied element-wise; it is \emph{not} the matrix
1007 1007 multiplication from linear algebra (as is the case in Matlab, for
1008 1008 example):
1009 1009
1010 1010 \begin{codecell}
1011 1011 \begin{codeinput}
1012 1012 \begin{lstlisting}
1013 1013 print arr1, '*', arr2, '=', arr1*arr2
1014 1014 \end{lstlisting}
1015 1015 \end{codeinput}
1016 1016 \begin{codeoutput}
1017 1017 \begin{verbatim}
1018 1018 [0 1 2 3] * [10 11 12 13] = [ 0 11 24 39]
1019 1019 \end{verbatim}
1020 1020 \end{codeoutput}
1021 1021 \end{codecell}
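The distinction is easiest to see with two-dimensional arrays. The sketch below assumes Python 3 and a recent NumPy; the arrays \texttt{M} and \texttt{N} are illustrative and not part of the original notebook. It compares the element-wise product with the matrix product computed by the \texttt{dot} method:

```python
import numpy as np

M = np.array([[1, 2],
              [3, 4]])
N = np.array([[0, 1],
              [1, 0]])

# The * operator multiplies matching entries.
elementwise = M * N
# The dot method performs the linear-algebra matrix product.
matrix_product = M.dot(N)

print(elementwise)     # [[0 2]
                       #  [3 0]]
print(matrix_product)  # [[2 1]
                       #  [4 3]]
```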
1022 1022 While this means that in principle arrays must always match in their
1023 1023 dimensionality in order for an operation to be valid, numpy will
1024 1024 \emph{broadcast} dimensions when possible. For example, suppose that you
1025 1025 want to add the number 1.5 to \texttt{arr1}; the following would be a
1026 1026 valid way to do it:
1027 1027
1028 1028 \begin{codecell}
1029 1029 \begin{codeinput}
1030 1030 \begin{lstlisting}
1031 1031 arr1 + 1.5*np.ones(4)
1032 1032 \end{lstlisting}
1033 1033 \end{codeinput}
1034 1034 \begin{codeoutput}
1035 1035 \begin{verbatim}
1036 1036 array([ 1.5, 2.5, 3.5, 4.5])
1037 1037 \end{verbatim}
1038 1038 \end{codeoutput}
1039 1039 \end{codecell}
1040 1040 But thanks to numpy's broadcasting rules, the following is equally
1041 1041 valid:
1042 1042
1043 1043 \begin{codecell}
1044 1044 \begin{codeinput}
1045 1045 \begin{lstlisting}
1046 1046 arr1 + 1.5
1047 1047 \end{lstlisting}
1048 1048 \end{codeinput}
1049 1049 \begin{codeoutput}
1050 1050 \begin{verbatim}
1051 1051 array([ 1.5, 2.5, 3.5, 4.5])
1052 1052 \end{verbatim}
1053 1053 \end{codeoutput}
1054 1054 \end{codecell}
1055 1055 In this case, numpy looked at both operands and saw that the first
1056 1056 (\texttt{arr1}) was a one-dimensional array of length 4 and the second
1057 1057 was a scalar, considered a zero-dimensional object. The broadcasting
1058 1058 rules allow numpy to:
1059 1059
1060 1060 \begin{itemize}
1061 1061 \item
1062 1062 \emph{create} new dimensions of length 1 (since this doesn't change
1063 1063 the size of the array)
1064 1064 \item
1065 1065 `stretch' a dimension of length 1 that needs to be matched to a
1066 1066 dimension of a different size.
1067 1067 \end{itemize}
1068 1068 So in the above example, the scalar 1.5 is effectively:
1069 1069
1070 1070 \begin{itemize}
1071 1071 \item
1072 1072 first `promoted' to a 1-dimensional array of length 1
1073 1073 \item
1074 1074 then, this array is `stretched' to length 4 to match the dimension of
1075 1075 \texttt{arr1}.
1076 1076 \end{itemize}
1077 1077 After these two operations are complete, the addition can proceed as now
1078 1078 both operands are one-dimensional arrays of length 4.
1079 1079
1080 1080 This broadcasting behavior is in practice enormously powerful,
1081 1081 especially because when numpy broadcasts to create new dimensions or to
1082 1082 `stretch' existing ones, it doesn't actually replicate the data. In the
1083 1083 example above the operation is carried out \emph{as if} the 1.5 were a 1-d
1084 1084 array with 1.5 in all of its entries, but no actual array was ever
1085 1085 created. This can save lots of memory in cases when the arrays in
1086 1086 question are large and can have significant performance implications.
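
One way to observe that no data is replicated is to inspect the strides of a broadcast view. This is a minimal sketch, assuming Python 3 and a NumPy version that provides \texttt{np.broadcast\_to}; the names \texttt{scalar} and \texttt{stretched} are illustrative:

```python
import numpy as np

scalar = np.array(1.5)                     # a zero-dimensional array
stretched = np.broadcast_to(scalar, (4,))  # read-only view of shape (4,)

print(stretched)          # [1.5 1.5 1.5 1.5]
# A stride of 0 bytes means every index points at the same stored value.
print(stretched.strides)  # (0,)
```

The view behaves like an array of four entries, but stepping from one entry to the next moves zero bytes in memory.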
1087 1087
1088 1088 The general rule is: when operating on two arrays, NumPy compares their
1089 1089 shapes element-wise. It starts with the trailing dimensions, and works
1090 1090 its way forward, creating dimensions of length 1 as needed. Two
1091 1091 dimensions are considered compatible when
1092 1092
1093 1093 \begin{itemize}
1094 1094 \item
1095 1095 they are equal to begin with, or
1096 1096 \item
1097 1097 one of them is 1; in this case numpy will do the `stretching' to make
1098 1098 them equal.
1099 1099 \end{itemize}
1100 1100 If these conditions are not met, a
1101 1101 \texttt{ValueError: operands could not be broadcast together} exception
1102 1102 is raised, indicating that the arrays have incompatible shapes. The size of the
1103 1103 resulting array is the maximum size along each dimension of the input
1104 1104 arrays.
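
The rule just stated can be written out as a short function. This is only an illustrative sketch of the shape-matching logic, not NumPy's actual implementation; the helper \texttt{broadcast\_shape} is a hypothetical name, and Python 3 is assumed:

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Return the shape that broadcasting two shapes produces.

    Illustrative helper only; raises ValueError for incompatible shapes.
    """
    ndim = max(len(shape_a), len(shape_b))
    # Create leading dimensions of length 1 so both shapes have equal rank.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)   # equal, or b is stretched to match a
        elif dim_a == 1:
            result.append(dim_b)   # a is stretched to match b
        else:
            raise ValueError("incompatible shapes %s and %s"
                             % (shape_a, shape_b))
    return tuple(result)

print(broadcast_shape((2, 4), (4,)))    # (2, 4)
print(broadcast_shape((2, 4), (2, 1)))  # (2, 4)

# Cross-check against the shape NumPy itself computes.
print(np.broadcast(np.empty((3, 1, 5)), np.empty((4, 1))).shape)  # (3, 4, 5)
```

Calling \texttt{broadcast\_shape((2, 4), (2,))} raises \texttt{ValueError}, because the trailing dimensions 4 and 2 are neither equal nor 1.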
1105 1105
1106 1106 This shows how the broadcasting rules work in several dimensions:
1107 1107
1108 1108 \begin{codecell}
1109 1109 \begin{codeinput}
1110 1110 \begin{lstlisting}
1111 1111 b = np.array([2, 3, 4, 5])
1112 1112 print arr, '\n\n+', b , '\n----------------\n', arr + b
1113 1113 \end{lstlisting}
1114 1114 \end{codeinput}
1115 1115 \begin{codeoutput}
1116 1116 \begin{verbatim}
1117 1117 [[0 1 2 3]
1118 1118 [4 5 6 7]]
1119 1119
1120 1120 + [2 3 4 5]
1121 1121 ----------------
1122 1122 [[ 2 4 6 8]
1123 1123 [ 6 8 10 12]]
1124 1124 \end{verbatim}
1125 1125 \end{codeoutput}
1126 1126 \end{codecell}
1127 1127 Now, how could you use broadcasting to, say, add \texttt{{[}4, 6{]}} along
1128 1128 the rows to \texttt{arr} above? Simply performing the direct addition
1129 1129 will produce the error we previously mentioned:
1130 1130
1131 1131 \begin{codecell}
1132 1132 \begin{codeinput}
1133 1133 \begin{lstlisting}
1134 1134 c = np.array([4, 6])
1135 1135 arr + c
1136 1136 \end{lstlisting}
1137 1137 \end{codeinput}
1138 1138 \begin{codeoutput}
1139 1139 \begin{traceback}
1140 1140 \begin{verbatim}
1141 1141 ---------------------------------------------------------------------------
1142 1142 ValueError Traceback (most recent call last)
1143 1143 /home/fperez/teach/book-math-labtool/<ipython-input-45-62aa20ac1980> in <module>()
1144 1144 1 c = np.array([4, 6])
1145 1145 ----> 2 arr + c
1146 1146
1147 1147 ValueError: operands could not be broadcast together with shapes (2,4) (2)
1148 1148 \end{verbatim}
1149 1149 \end{traceback}
1150 1150 \end{codeoutput}
1151 1151 \end{codecell}
1152 1152 According to the rules above, the array \texttt{c} would need to have a
1153 1153 \emph{trailing} dimension of 1 for the broadcasting to work. It turns
1154 1154 out that numpy allows you to `inject' new dimensions anywhere into an
1155 1155 array on the fly, by indexing it with the special object
1156 1156 \texttt{np.newaxis}:
1157 1157
1158 1158 \begin{codecell}
1159 1159 \begin{codeinput}
1160 1160 \begin{lstlisting}
1161 1161 (c[:, np.newaxis]).shape
1162 1162 \end{lstlisting}
1163 1163 \end{codeinput}
1164 1164 \begin{codeoutput}
1165 1165 \begin{verbatim}
1166 1166 (2, 1)
1167 1167 \end{verbatim}
1168 1168 \end{codeoutput}
1169 1169 \end{codecell}
1170 1170 This is exactly what we need, and indeed it works:
1171 1171
1172 1172 \begin{codecell}
1173 1173 \begin{codeinput}
1174 1174 \begin{lstlisting}
1175 1175 arr + c[:, np.newaxis]
1176 1176 \end{lstlisting}
1177 1177 \end{codeinput}
1178 1178 \begin{codeoutput}
1179 1179 \begin{verbatim}
1180 1180 array([[ 4, 5, 6, 7],
1181 1181 [10, 11, 12, 13]])
1182 1182 \end{verbatim}
1183 1183 \end{codeoutput}
1184 1184 \end{codecell}
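The same trailing dimension of length 1 can be obtained by reshaping, and \texttt{np.newaxis} can be placed in any position. The following sketch assumes Python 3 and a recent NumPy; the result names are illustrative and not part of the original notebook:

```python
import numpy as np

arr = np.arange(8).reshape(2, 4)
c = np.array([4, 6])

# Two equivalent ways to turn c into a column of shape (2, 1).
with_newaxis = arr + c[:, np.newaxis]
with_reshape = arr + c.reshape(2, 1)

print(np.array_equal(with_newaxis, with_reshape))  # True

# Placing np.newaxis first gives a row of shape (1, 2) instead.
print(c[np.newaxis, :].shape)                      # (1, 2)
```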
1185 1185 For the full broadcasting rules, please see the official Numpy docs,
1186 1186 which describe them in detail and with more complex examples.
1187 1187
1188 1188 As we mentioned before, Numpy ships with a full complement of
1189 1189 mathematical functions that work on entire arrays, including logarithms,
1190 1190 exponentials, trigonometric and hyperbolic trigonometric functions, etc.
1191 1191 Furthermore, scipy ships a rich special function library in the
1192 1192 \texttt{scipy.special} module that includes Bessel, Airy, Fresnel,
1193 1193 Laguerre and other classical special functions. For example, sampling
1194 1194 the sine function at 100 points between $0$ and $2\pi$ is as simple as:
1195 1195
1196 1196 \begin{codecell}
1197 1197 \begin{codeinput}
1198 1198 \begin{lstlisting}
1199 1199 x = np.linspace(0, 2*np.pi, 100)
1200 1200 y = np.sin(x)
1201 1201 \end{lstlisting}
1202 1202 \end{codeinput}
1203 1203 \end{codecell}
1204 1204 \subsection{Linear algebra in numpy}
1205 1205 Numpy ships with a basic linear algebra library, and all arrays have a
1206 1206 \texttt{dot} method whose behavior is that of the scalar dot product
1207 1207 when its arguments are vectors (one-dimensional arrays) and the
1208 1208 traditional matrix multiplication when one or both of its arguments are
1209 1209 two-dimensional arrays:
1210 1210
1211 1211 \begin{codecell}
1212 1212 \begin{codeinput}
1213 1213 \begin{lstlisting}
1214 1214 v1 = np.array([2, 3, 4])
1215 1215 v2 = np.array([1, 0, 1])
1216 1216 print v1, '.', v2, '=', v1.dot(v2)
1217 1217 \end{lstlisting}
1218 1218 \end{codeinput}
1219 1219 \begin{codeoutput}
1220 1220 \begin{verbatim}
1221 1221 [2 3 4] . [1 0 1] = 6
1222 1222 \end{verbatim}
1223 1223 \end{codeoutput}
1224 1224 \end{codecell}
1225 1225 Here is a regular matrix-vector multiplication. Note that the array
1226 1226 \texttt{v1} should be viewed as a \emph{column} vector in traditional
1227 1227 linear algebra notation; numpy makes no distinction between row and
1228 1228 column vectors and simply verifies that the dimensions match the
1229 1229 required rules of matrix multiplication. In this case we have a
1230 1230 $2 \times 3$ matrix multiplied by a 3-vector, which produces a 2-vector:
1231 1231
1232 1232 \begin{codecell}
1233 1233 \begin{codeinput}
1234 1234 \begin{lstlisting}
1235 1235 A = np.arange(6).reshape(2, 3)
1236 1236 print A, 'x', v1, '=', A.dot(v1)
1237 1237 \end{lstlisting}
1238 1238 \end{codeinput}
1239 1239 \begin{codeoutput}
1240 1240 \begin{verbatim}
1241 1241 [[0 1 2]
1242 1242 [3 4 5]] x [2 3 4] = [11 38]
1243 1243 \end{verbatim}
1244 1244 \end{codeoutput}
1245 1245 \end{codecell}
1246 1246 For matrix-matrix multiplication, the same dimension-matching rules must
1247 1247 be satisfied, e.g.~consider the difference between $A \times A^T$:
1248 1248
1249 1249 \begin{codecell}
1250 1250 \begin{codeinput}
1251 1251 \begin{lstlisting}
1252 1252 print A.dot(A.T)
1253 1253 \end{lstlisting}
1254 1254 \end{codeinput}
1255 1255 \begin{codeoutput}
1256 1256 \begin{verbatim}
1257 1257 [[ 5 14]
1258 1258 [14 50]]
1259 1259 \end{verbatim}
1260 1260 \end{codeoutput}
1261 1261 \end{codecell}
1262 1262 and $A^T \times A$:
1263 1263
1264 1264 \begin{codecell}
1265 1265 \begin{codeinput}
1266 1266 \begin{lstlisting}
1267 1267 print A.T.dot(A)
1268 1268 \end{lstlisting}
1269 1269 \end{codeinput}
1270 1270 \begin{codeoutput}
1271 1271 \begin{verbatim}
1272 1272 [[ 9 12 15]
1273 1273 [12 17 22]
1274 1274 [15 22 29]]
1275 1275 \end{verbatim}
1276 1276 \end{codeoutput}
1277 1277 \end{codecell}
1278 1278 Furthermore, the \texttt{numpy.linalg} module includes additional
1279 1279 functionality such as determinants, matrix norms, Cholesky, eigenvalue
1280 1280 and singular value decompositions, etc. For even more linear algebra
1281 1281 tools, \texttt{scipy.linalg} contains the majority of the tools in the
1282 1282 classic LAPACK libraries as well as functions to operate on sparse
1283 1283 matrices. We refer the reader to the Numpy and Scipy documentations for
1284 1284 additional details on these.
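
As a brief illustration of \texttt{numpy.linalg}, the sketch below computes a determinant, the eigenvalues of a symmetric matrix and the solution of a small linear system. It assumes Python 3 and a recent NumPy; the matrix \texttt{B} and vector \texttt{rhs} are made-up example data:

```python
import numpy as np

B = np.array([[2.0, 1.0],
              [1.0, 3.0]])
rhs = np.array([3.0, 5.0])

det = np.linalg.det(B)               # determinant: 2*3 - 1*1 = 5
eigenvalues = np.linalg.eigvalsh(B)  # eigenvalues of a symmetric matrix
solution = np.linalg.solve(B, rhs)   # solves B x = rhs for x

print('Determinant:', det)
print('Eigenvalues:', eigenvalues)
print('Solution   :', solution)
# Verify the solution by substituting it back into the system.
print('Residual ok:', np.allclose(B.dot(solution), rhs))
```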
1285 1285
1286 1286 \subsection{Reading and writing arrays to disk}
1287 1287 Numpy lets you read and write arrays into files in a number of ways. In
1288 1288 order to use these tools well, it is critical to understand the
1289 1289 difference between a \emph{text} and a \emph{binary} file containing
1290 1290 numerical data. In a text file, the number $\pi$ could be written as
1291 1291 ``3.141592653589793'', for example: a string of digits that a human can
1292 1292 read, in this case with 15 decimal digits. In contrast, that same number
1293 1293 written to a binary file would be encoded as 8 characters (bytes) that
1294 1294 are not readable by a human but which contain the exact same data that
1295 1295 the variable \texttt{pi} had in the computer's memory.
1296 1296
1297 1297 The tradeoffs between the two modes are thus:
1298 1298
1299 1299 \begin{itemize}
1300 1300 \item
1301 1301 Text mode: occupies more space, precision can be lost (if not all
1302 1302 digits are written to disk), but is readable and editable by hand with
1303 1303 a text editor. Can \emph{only} be used for one- and two-dimensional
1304 1304 arrays.
1305 1305 \item
1306 1306 Binary mode: compact and exact representation of the data in memory,
1307 1307 can't be read or edited by hand. Arrays of any size and dimensionality
1308 1308 can be saved and read without loss of information.
1309 1309 \end{itemize}
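The precision tradeoff can be demonstrated directly by writing the same values in both modes and reading them back. This is a minimal sketch, assuming Python 3 and a recent NumPy; the file names and the array \texttt{values} are illustrative, and a temporary directory is used so that nothing is left on disk:

```python
import os
import tempfile
import numpy as np

values = np.array([np.pi, np.e, 1.0 / 3.0])

with tempfile.TemporaryDirectory() as tmpdir:
    text_path = os.path.join(tmpdir, 'values.txt')
    binary_path = os.path.join(tmpdir, 'values.npy')

    # Text mode: only three significant digits are written.
    np.savetxt(text_path, values, fmt='%.2e')
    # Binary mode: the in-memory bytes are stored as they are.
    np.save(binary_path, values)

    from_text = np.loadtxt(text_path)
    from_binary = np.load(binary_path)

text_is_exact = np.array_equal(values, from_text)
binary_is_exact = np.array_equal(values, from_binary)
max_text_error = np.abs(values - from_text).max()

print('Text round trip exact?  ', text_is_exact)    # False
print('Binary round trip exact?', binary_is_exact)  # True
print('Largest text-mode error:', max_text_error)
```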
1310 1310 First, let's see how to read and write arrays in text mode. The
1311 1311 \texttt{np.savetxt} function saves an array to a text file, with options
1312 1312 to control the precision, separators and even adding a header:
1313 1313
1314 1314 \begin{codecell}
1315 1315 \begin{codeinput}
1316 1316 \begin{lstlisting}
1317 1317 arr = np.arange(10).reshape(2, 5)
1318 1318 np.savetxt('test.out', arr, fmt='%.2e', header="My dataset")
1319 1319 !cat test.out
1320 1320 \end{lstlisting}
1321 1321 \end{codeinput}
1322 1322 \begin{codeoutput}
1323 1323 \begin{verbatim}
1324 1324 # My dataset
1325 1325 0.00e+00 1.00e+00 2.00e+00 3.00e+00 4.00e+00
1326 1326 5.00e+00 6.00e+00 7.00e+00 8.00e+00 9.00e+00
1327 1327 \end{verbatim}
1328 1328 \end{codeoutput}
1329 1329 \end{codecell}
1330 1330 And this same type of file can then be read with the matching
1331 1331 \texttt{np.loadtxt} function:
1332 1332
1333 1333 \begin{codecell}
1334 1334 \begin{codeinput}
1335 1335 \begin{lstlisting}
1336 1336 arr2 = np.loadtxt('test.out')
1337 1337 print arr2
1338 1338 \end{lstlisting}
1339 1339 \end{codeinput}
1340 1340 \begin{codeoutput}
1341 1341 \begin{verbatim}
1342 1342 [[ 0. 1. 2. 3. 4.]
1343 1343 [ 5. 6. 7. 8. 9.]]
1344 1344 \end{verbatim}
1345 1345 \end{codeoutput}
1346 1346 \end{codecell}
1347 1347 For binary data, Numpy provides the \texttt{np.save} and
1348 1348 \texttt{np.savez} routines. The first saves a single array to a file
1349 1349 with \texttt{.npy} extension, while the latter can be used to save a
1350 1350 \emph{group} of arrays into a single file with \texttt{.npz} extension.
1351 1351 The files created with these routines can then be read with the
1352 1352 \texttt{np.load} function.
1353 1353
1354 1354 Let us first see how to use the simpler \texttt{np.save} function to
1355 1355 save a single array:
1356 1356
1357 1357 \begin{codecell}
1358 1358 \begin{codeinput}
1359 1359 \begin{lstlisting}
1360 1360 np.save('test.npy', arr2)
1361 1361 # Now we read this back
1362 1362 arr2n = np.load('test.npy')
1363 1363 # Let's see if any element is non-zero in the difference.
1364 1364 # A value of True would be a problem.
1365 1365 print 'Any differences?', np.any(arr2-arr2n)
1366 1366 \end{lstlisting}
1367 1367 \end{codeinput}
1368 1368 \begin{codeoutput}
1369 1369 \begin{verbatim}
1370 1370 Any differences? False
1371 1371 \end{verbatim}
1372 1372 \end{codeoutput}
1373 1373 \end{codecell}
1374 1374 Now let us see how the \texttt{np.savez} function works. You give it a
1375 1375 filename and either a sequence of arrays or a set of keywords. In the
1376 1376 first mode, the function will automatically name the saved arrays in the
1377 1377 archive as \texttt{arr\_0}, \texttt{arr\_1}, etc:
1378 1378
1379 1379 \begin{codecell}
1380 1380 \begin{codeinput}
1381 1381 \begin{lstlisting}
1382 1382 np.savez('test.npz', arr, arr2)
1383 1383 arrays = np.load('test.npz')
1384 1384 arrays.files
1385 1385 \end{lstlisting}
1386 1386 \end{codeinput}
1387 1387 \begin{codeoutput}
1388 1388 \begin{verbatim}
1389 1389 ['arr_1', 'arr_0']
1390 1390 \end{verbatim}
1391 1391 \end{codeoutput}
1392 1392 \end{codecell}
1393 1393 Alternatively, we can explicitly choose how to name the arrays we save:
1394 1394
1395 1395 \begin{codecell}
1396 1396 \begin{codeinput}
1397 1397 \begin{lstlisting}
1398 1398 np.savez('test.npz', array1=arr, array2=arr2)
1399 1399 arrays = np.load('test.npz')
1400 1400 arrays.files
1401 1401 \end{lstlisting}
1402 1402 \end{codeinput}
1403 1403 \begin{codeoutput}
1404 1404 \begin{verbatim}
1405 1405 ['array2', 'array1']
1406 1406 \end{verbatim}
1407 1407 \end{codeoutput}
1408 1408 \end{codecell}
1409 1409 The object returned by \texttt{np.load} from an \texttt{.npz} file works
1410 1410 like a dictionary, though you can also access its constituent files by
1411 1411 attribute using its special \texttt{.f} field; this is best illustrated
1412 1412 with an example with the \texttt{arrays} object from above:
1413 1413
1414 1414 \begin{codecell}
1415 1415 \begin{codeinput}
1416 1416 \begin{lstlisting}
1417 1417 print 'First row of first array:', arrays['array1'][0]
1418 1418 # This is an equivalent way to get the same field
1419 1419 print 'First row of first array:', arrays.f.array1[0]
1420 1420 \end{lstlisting}
1421 1421 \end{codeinput}
1422 1422 \begin{codeoutput}
1423 1423 \begin{verbatim}
1424 1424 First row of first array: [0 1 2 3 4]
1425 1425 First row of first array: [0 1 2 3 4]
1426 1426 \end{verbatim}
1427 1427 \end{codeoutput}
1428 1428 \end{codecell}
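Because the object returned for an \texttt{.npz} file keeps the archive open, it can be convenient to use it as a context manager so the file is closed when you are done. The following is a sketch under that assumption, written for Python 3 and a recent NumPy; the names \texttt{first}, \texttt{second} and \texttt{path} are illustrative:

```python
import os
import tempfile
import numpy as np

first = np.arange(10).reshape(2, 5)
second = first.astype(float)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'test.npz')
    np.savez(path, array1=first, array2=second)

    # The archive is closed automatically at the end of the with block.
    with np.load(path) as arrays:
        names = sorted(arrays.files)
        first_row = arrays['array1'][0]  # read into memory here

print(names)      # ['array1', 'array2']
print(first_row)  # [0 1 2 3 4]
```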
1429 1429 This \texttt{.npz} format is a very convenient way to package a group of
1430 1430 related arrays that pertain to a specific problem into a single file,
1431 1431 compactly and without loss of information. At some point, however, the
1432 1432 complexity of your dataset may be such that the optimal approach is to
1433 1433 use one of the standard formats in scientific data processing that have
1434 1434 been designed to handle complex datasets, such as NetCDF or HDF5.
1435 1435
1436 1436 Fortunately, there are tools for manipulating these formats in Python,
1437 1437 and for storing data in other ways such as databases. A complete
1438 1438 survey of the possibilities is beyond the scope of this discussion,
1439 1439 but, of particular interest for scientific users, we at least mention the
1440 1440 following:
1441 1441
1442 1442 \begin{itemize}
1443 1443 \item
1444 1444 The \texttt{scipy.io} module contains routines to read and write
1445 1445 Matlab files in \texttt{.mat} format and files in the NetCDF format
1446 1446 that is widely used in certain scientific disciplines.
1447 1447 \item
1448 1448 For manipulating files in the HDF5 format, there are two excellent
1449 1449 options in Python: The PyTables project offers a high-level, object
1450 1450 oriented approach to manipulating HDF5 datasets, while the h5py
1451 1451 project offers a more direct mapping to the standard HDF5 library
1452 1452 interface. Both are excellent tools; if you need to work with HDF5
1453 1453 datasets you should read some of their documentation and examples and
1454 1454 decide which approach is a better match for your needs.
1455 1455 \end{itemize}
1456 1456
1457 1457 \section{High quality data visualization with Matplotlib}
1458 1458 The \href{http://matplotlib.sf.net}{matplotlib} library is a powerful
1459 1459 tool capable of producing complex publication-quality figures with fine
1460 1460 layout control in two and three dimensions; here we will only provide a
1461 1461 minimal self-contained introduction to its usage that covers the
1462 1462 functionality needed for the rest of the book. We encourage the reader
1463 1463 to read the tutorials included with the matplotlib documentation as well
1464 1464 as to browse its extensive gallery of examples that include source code.
1465 1465
1466 1466 Just as we typically use the shorthand \texttt{np} for Numpy, we will
1467 1467 use \texttt{plt} for the \texttt{matplotlib.pyplot} module where the
1468 1468 easy-to-use plotting functions reside (the library contains a rich
1469 1469 object-oriented architecture that we don't have the space to discuss
1470 1470 here):
1471 1471
1472 1472 \begin{codecell}
1473 1473 \begin{codeinput}
1474 1474 \begin{lstlisting}
1475 1475 import matplotlib.pyplot as plt
1476 1476 \end{lstlisting}
1477 1477 \end{codeinput}
1478 1478 \end{codecell}
1479 1479 The most frequently used function is simply called \texttt{plot}; here
1480 1480 is how you can make a simple plot of $\sin(x)$ for $x \in [0, 2\pi]$
1481 1481 with labels and a grid (we use the semicolon in the last line to
1482 1482 suppress the display of some information that is unnecessary right now):
1483 1483
1484 1484 \begin{codecell}
1485 1485 \begin{codeinput}
1486 1486 \begin{lstlisting}
1487 1487 x = np.linspace(0, 2*np.pi)
1488 1488 y = np.sin(x)
1489 1489 plt.plot(x,y, label='sin(x)')
1490 1490 plt.legend()
1491 1491 plt.grid()
1492 1492 plt.title('Harmonic')
1493 1493 plt.xlabel('x')
1494 1494 plt.ylabel('y');
1495 1495 \end{lstlisting}
1496 1496 \end{codeinput}
1497 1497 \begin{codeoutput}
1498 1498 \begin{center}
1499 \includegraphics[width=6in]{tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_01.pdf}
1499 \includegraphics[width=6in]{tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_01.pdf}
1500 1500 \par
1501 1501 \end{center}
1502 1502 \end{codeoutput}
1503 1503 \end{codecell}
1504 1504 You can control the style, color and other properties of the lines and markers,
1505 1505 for example:
1506 1506
1507 1507 \begin{codecell}
1508 1508 \begin{codeinput}
1509 1509 \begin{lstlisting}
1510 1510 plt.plot(x, y, linewidth=2);
1511 1511 \end{lstlisting}
1512 1512 \end{codeinput}
1513 1513 \begin{codeoutput}
1514 1514 \begin{center}
1515 \includegraphics[width=6in]{tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_02.pdf}
1515 \includegraphics[width=6in]{tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_02.pdf}
1516 1516 \par
1517 1517 \end{center}
1518 1518 \end{codeoutput}
1519 1519 \end{codecell}
1520 1520 \begin{codecell}
1521 1521 \begin{codeinput}
1522 1522 \begin{lstlisting}
1523 1523 plt.plot(x, y, 'o', markersize=5, color='r');
1524 1524 \end{lstlisting}
1525 1525 \end{codeinput}
1526 1526 \begin{codeoutput}
1527 1527 \begin{center}
1528 \includegraphics[width=6in]{tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_03.pdf}
1528 \includegraphics[width=6in]{tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_03.pdf}
1529 1529 \par
1530 1530 \end{center}
1531 1531 \end{codeoutput}
1532 1532 \end{codecell}
1533 1533 We will now see how to create a few other common plot types, such as a
1534 1534 simple error plot:
1535 1535
1536 1536 \begin{codecell}
1537 1537 \begin{codeinput}
1538 1538 \begin{lstlisting}
1539 1539 # example data
1540 1540 x = np.arange(0.1, 4, 0.5)
1541 1541 y = np.exp(-x)
1542 1542
1543 1543 # example variable error bar values
1544 1544 yerr = 0.1 + 0.2*np.sqrt(x)
1545 1545 xerr = 0.1 + yerr
1546 1546
1547 1547 # First illustrate basic pyplot interface, using defaults where possible.
1548 1548 plt.figure()
1549 1549 plt.errorbar(x, y, xerr=0.2, yerr=0.4)
1550 1550 plt.title("Simplest errorbars, 0.2 in x, 0.4 in y");
1551 1551 \end{lstlisting}
1552 1552 \end{codeinput}
1553 1553 \begin{codeoutput}
1554 1554 \begin{center}
1555 \includegraphics[width=6in]{tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_04.pdf}
1555 \includegraphics[width=6in]{tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_04.pdf}
1556 1556 \par
1557 1557 \end{center}
1558 1558 \end{codeoutput}
1559 1559 \end{codecell}
1560 1560 A simple log plot:
1561 1561
1562 1562 \begin{codecell}
1563 1563 \begin{codeinput}
1564 1564 \begin{lstlisting}
1565 1565 x = np.linspace(-5, 5)
1566 1566 y = np.exp(-x**2)
1567 1567 plt.semilogy(x, y);
1568 1568 \end{lstlisting}
1569 1569 \end{codeinput}
1570 1570 \begin{codeoutput}
1571 1571 \begin{center}
1572 \includegraphics[width=6in]{tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_05.pdf}
1572 \includegraphics[width=6in]{tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_05.pdf}
1573 1573 \par
1574 1574 \end{center}
1575 1575 \end{codeoutput}
1576 1576 \end{codecell}
1577 1577 A histogram annotated with text inside the plot, using the \texttt{text}
1578 1578 function:
1579 1579
1580 1580 \begin{codecell}
1581 1581 \begin{codeinput}
1582 1582 \begin{lstlisting}
1583 1583 mu, sigma = 100, 15
1584 1584 x = mu + sigma * np.random.randn(10000)
1585 1585
1586 1586 # the histogram of the data
1587 1587 n, bins, patches = plt.hist(x, 50, normed=1, facecolor='g', alpha=0.75)
1588 1588
1589 1589 plt.xlabel('Smarts')
1590 1590 plt.ylabel('Probability')
1591 1591 plt.title('Histogram of IQ')
1592 1592 # This will put a text fragment at the position given:
1593 1593 plt.text(55, .027, r'$\mu=100,\ \sigma=15$', fontsize=14)
1594 1594 plt.axis([40, 160, 0, 0.03])
1595 1595 plt.grid(True)
1596 1596 \end{lstlisting}
1597 1597 \end{codeinput}
1598 1598 \begin{codeoutput}
1599 1599 \begin{center}
1600 \includegraphics[width=6in]{tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_06.pdf}
1600 \includegraphics[width=6in]{tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_06.pdf}
1601 1601 \par
1602 1602 \end{center}
1603 1603 \end{codeoutput}
1604 1604 \end{codecell}
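The \texttt{normed=1} argument above asks for a normalized histogram, in which the areas of the bars sum to one; newer matplotlib releases spell this option \texttt{density=True}. The same quantity can be computed without plotting through \texttt{np.histogram}. This is a minimal sketch, assuming Python 3 and a recent NumPy; the fixed seed is an illustrative choice made only so the run is repeatable:

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed, chosen only for repeatability
mu, sigma = 100, 15
x = mu + sigma * rng.randn(10000)

# density=True scales the counts so the histogram integrates to 1.
density, bin_edges = np.histogram(x, bins=50, density=True)

# Area of each bar is height times width; the areas add up to 1.
area = (density * np.diff(bin_edges)).sum()
print('Number of bins:', len(density))
print('Total area    :', area)
```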
1605 1605 \subsection{Image display}
1606 1606 The \texttt{imshow} command can display single or multi-channel images.
1607 1607 A simple array of random numbers, plotted in grayscale:
1608 1608
1609 1609 \begin{codecell}
1610 1610 \begin{codeinput}
1611 1611 \begin{lstlisting}
1612 1612 from matplotlib import cm
1613 1613 plt.imshow(np.random.rand(5, 10), cmap=cm.gray, interpolation='nearest');
1614 1614 \end{lstlisting}
1615 1615 \end{codeinput}
1616 1616 \begin{codeoutput}
1617 1617 \begin{center}
1618 \includegraphics[width=6in]{tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_07.pdf}
1618 \includegraphics[width=6in]{tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_07.pdf}
1619 1619 \par
1620 1620 \end{center}
1621 1621 \end{codeoutput}
1622 1622 \end{codecell}
1623 1623 A real photograph is a multichannel image, and \texttt{imshow} interprets it
1624 1624 correctly:
1625 1625
1626 1626 \begin{codecell}
1627 1627 \begin{codeinput}
1628 1628 \begin{lstlisting}
1629 1629 img = plt.imread('stinkbug.png')
1630 1630 print 'Dimensions of the array img:', img.shape
1631 1631 plt.imshow(img);
1632 1632 \end{lstlisting}
1633 1633 \end{codeinput}
1634 1634 \begin{codeoutput}
1635 1635 \begin{verbatim}
1636 1636 Dimensions of the array img: (375, 500, 3)
1637 1637 \end{verbatim}
1638 1638 \begin{center}
1639 \includegraphics[width=6in]{tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_08.pdf}
1639 \includegraphics[width=6in]{tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_08.pdf}
1640 1640 \par
1641 1641 \end{center}
1642 1642 \end{codeoutput}
1643 1643 \end{codecell}
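The shape printed above follows the (rows, columns, channels) convention. As a minimal sketch of what that means, the following builds a synthetic RGB array of the same shape (a stand-in, since \texttt{stinkbug.png} is not assumed to be available) and extracts a single channel and a naive grayscale version:

```python
import numpy as np

# Synthetic stand-in for the photograph: 375 rows, 500 columns, 3 channels
img = np.zeros((375, 500, 3))
img[..., 0] = 1.0          # fill the first (red) channel everywhere

red = img[:, :, 0]         # a single channel is a plain 2d array
gray = img.mean(axis=2)    # naive grayscale: average over the channels

print(img.shape, red.shape, gray.shape)
```

A 2d array such as \texttt{gray} can be passed to \texttt{imshow} with \texttt{cmap=cm.gray}, as in the random-array example earlier in this section.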
1644 1644 \subsection{Simple 3d plotting with matplotlib}
1645 1645 Note that you must execute at least once in your session:
1646 1646
1647 1647 \begin{codecell}
1648 1648 \begin{codeinput}
1649 1649 \begin{lstlisting}
1650 1650 from mpl_toolkits.mplot3d import Axes3D
1651 1651 \end{lstlisting}
1652 1652 \end{codeinput}
1653 1653 \end{codecell}
1654 1654 Once this has been done, you can create 3d axes with the
1655 1655 \texttt{projection='3d'} keyword to \texttt{add\_subplot}:
1656 1656
1657 1657 \begin{verbatim}
1658 1658 fig = plt.figure()
1659 1659 fig.add_subplot(<other arguments here>, projection='3d')
1660 1660 \end{verbatim}
1661 1661
1662 1662
1663 1663 A simple surface plot:
1664 1664
1665 1665 \begin{codecell}
1666 1666 \begin{codeinput}
1667 1667 \begin{lstlisting}
1668 1668 from mpl_toolkits.mplot3d.axes3d import Axes3D
1669 1669 from matplotlib import cm
1670 1670
1671 1671 fig = plt.figure()
1672 1672 ax = fig.add_subplot(1, 1, 1, projection='3d')
1673 1673 X = np.arange(-5, 5, 0.25)
1674 1674 Y = np.arange(-5, 5, 0.25)
1675 1675 X, Y = np.meshgrid(X, Y)
1676 1676 R = np.sqrt(X**2 + Y**2)
1677 1677 Z = np.sin(R)
1678 1678 surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.jet,
1679 1679         linewidth=0, antialiased=False)
1680 1680 ax.set_zlim3d(-1.01, 1.01);
1681 1681 \end{lstlisting}
1682 1682 \end{codeinput}
1683 1683 \begin{codeoutput}
1684 1684 \begin{center}
1685 \includegraphics[width=6in]{tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_09.pdf}
1685 \includegraphics[width=6in]{tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_09.pdf}
1686 1686 \par
1687 1687 \end{center}
1688 1688 \end{codeoutput}
1689 1689 \end{codecell}
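The call to \texttt{np.meshgrid} in the example above is what turns the two 1d coordinate arrays into the 2d grids that \texttt{plot\_surface} expects. A small sketch, using a deliberately tiny grid with illustrative values, shows the shapes involved:

```python
import numpy as np

x = np.arange(3)        # 3 points along x: [0, 1, 2]
y = np.arange(2)        # 2 points along y: [0, 1]
X, Y = np.meshgrid(x, y)

# Both outputs have shape (len(y), len(x)); X varies along columns,
# Y varies along rows.
print(X)
print(Y)

# Any elementwise expression now yields one value per grid point
R = np.sqrt(X**2 + Y**2)
print(R.shape)
```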
1690 1690 \section{IPython: a powerful interactive environment}
1691 1691 A key component of the everyday workflow of most scientific computing
1692 1692 environments is a good interactive environment, that is, a system in
1693 1693 which you can execute small amounts of code and view the results
1694 1694 immediately, combining both printing out data and opening graphical
1695 1695 visualizations. All modern systems for scientific computing, commercial
1696 1696 and open source, include such functionality.
1697 1697
1698 1698 Out of the box, Python also offers a simple interactive shell with very
1699 1699 limited capabilities. But just as the scientific community built Numpy
1700 1700 to provide arrays suited for scientific work (since Python's lists
1701 1701 aren't optimal for this task), it has also developed an interactive
1702 1702 environment much more sophisticated than the built-in one. The
1703 1703 \href{http://ipython.org}{IPython project} offers a set of tools to make
1704 1704 productive use of the Python language, all the while working
1705 1705 interactively and with immediate feedback on your results. The basic
1706 1706 tools that IPython provides are:
1707 1707
1708 1708 \begin{enumerate}[1.]
1709 1709 \item
1710 1710 A powerful terminal shell, with many features designed to increase the
1711 1711 fluidity and productivity of everyday scientific workflows, including:
1712 1712
1713 1713 \begin{itemize}
1714 1714 \item
1715 1715 rich introspection of all objects and variables including easy
1716 1716 access to the source code of any function
1717 1717 \item
1718 1718 powerful and extensible tab completion of variables and filenames,
1719 1719 \item
1720 1720 tight integration with matplotlib, supporting interactive figures
1721 1721 that don't block the terminal,
1722 1722 \item
1723 1723 direct access to the filesystem and underlying operating system,
1724 1724 \item
1725 1725 an extensible system for shell-like commands called `magics' that
1726 1726 reduce the work needed to perform many common tasks,
1727 1727 \item
1728 1728 tools for easily running, timing, profiling and debugging your
1729 1729 codes,
1730 1730 \item
1731 1731 syntax highlighted error messages with much more detail than the
1732 1732 default Python ones,
1733 1733 \item
1734 1734 logging and access to all previous history of inputs, including
1735 1735 across sessions
1736 1736 \end{itemize}
1737 1737 \item
1738 1738 A Qt console that provides the look and feel of a terminal, but adds
1739 1739 support for inline figures, graphical calltips, a persistent session
1740 1740 that can survive crashes (even segfaults) of the kernel process, and
1741 1741 more.
1742 1742 \item
1743 1743 A web-based notebook that can execute code and also contain rich text
1744 1744 and figures, mathematical equations and arbitrary HTML. This notebook
1745 1745 presents a document-like view with cells where code is executed but
1746 1746 that can be edited in-place, reordered, mixed with explanatory text
1747 1747 and figures, etc.
1748 1748 \item
1749 1749 A high-performance, low-latency system for parallel computing that
1750 1750 supports the control of a cluster of IPython engines communicating
1751 1751 over a network, with optimizations that minimize unnecessary copying
1752 1752 of large objects (especially numpy arrays).
1753 1753 \end{enumerate}
1754 1754 We will now discuss the highlights of the tools 1-3 above so that you
1755 1755 can make them an effective part of your workflow. The topic of parallel
1756 1756 computing is beyond the scope of this document, but we encourage you to
1757 1757 read the extensive
1758 1758 \href{http://ipython.org/ipython-doc/rel-0.12.1/parallel/index.html}{documentation}
1759 1759 and \href{http://minrk.github.com/scipy-tutorial-2011/}{tutorials} on
1760 1760 this topic, available on the IPython website.
1761 1761
1762 1762 \subsection{The IPython terminal}
1763 1763 You can start IPython at the terminal simply by typing:
1764 1764
1765 1765 \begin{verbatim}
1766 1766 $ ipython
1767 1767 \end{verbatim}
1768 1768 which will provide you with some basic information about how to get started
1769 1769 and will then open a prompt labeled \texttt{In {[}1{]}:} for you to
1770 1770 start typing. Here we type $2^{64}$ and Python computes the result for
1771 1771 us in exact arithmetic, returning it as \texttt{Out{[}1{]}}:
1772 1772
1773 1773 \begin{verbatim}
1774 1774 $ ipython
1775 1775 Python 2.7.2+ (default, Oct 4 2011, 20:03:08)
1776 1776 Type "copyright", "credits" or "license" for more information.
1777 1777
1778 1778 IPython 0.13.dev -- An enhanced Interactive Python.
1779 1779 ? -> Introduction and overview of IPython's features.
1780 1780 %quickref -> Quick reference.
1781 1781 help -> Python's own help system.
1782 1782 object? -> Details about 'object', use 'object??' for extra details.
1783 1783
1784 1784 In [1]: 2**64
1785 1785 Out[1]: 18446744073709551616L
1786 1786 \end{verbatim}
1787 1787 The first thing you should know about IPython is that all your inputs
1788 1788 and outputs are saved. There are two variables named \texttt{In} and
1789 1789 \texttt{Out} which are filled as you work with your results.
1790 1790 Furthermore, all outputs are also saved to auto-created variables of the
1791 1791 form \texttt{\_NN} where \texttt{NN} is the prompt number, and inputs to
1792 1792 \texttt{\_iNN}. This allows you to quickly recover the result of a prior
1793 1793 computation by referring to its number even if you forgot to store it as
1794 1794 a variable. For example, later on in the above session you can do:
1795 1795
1796 1796 \begin{verbatim}
1797 1797 In [6]: print _1
1798 1798 18446744073709551616
1799 1799 \end{verbatim}
1800 1800
1801 1801
1802 1802 We strongly recommend that you take a few minutes to read at least the
1803 1803 basic introduction provided by the \texttt{?} command, and keep in mind
1804 1804 that the \texttt{\%quickref} command can be used at any time as a quick
1805 1805 reference ``cheat sheet'' of the most frequently used features of
1806 1806 IPython.
1807 1807
1808 1808 At the IPython prompt, any valid Python code that you type will be
1809 1809 executed similarly to the default Python shell (though often with more
1810 1810 informative feedback). But IPython is a \emph{superset} of the
1811 1811 default Python shell, so let's have a brief look at some of its additional
1812 1812 functionality.
1813 1813
1814 1814 \textbf{Object introspection}
1815 1815
1816 1816 A simple \texttt{?} command provides a general introduction to IPython,
1817 1817 but as indicated in the banner above, you can use the \texttt{?} syntax
1818 1818 to ask for details about any object. For example, if we type
1819 1819 \texttt{\_1?}, IPython will print the following details about this
1820 1820 variable:
1821 1821
1822 1822 \begin{verbatim}
1823 1823 In [14]: _1?
1824 1824 Type: long
1825 1825 Base Class: <type 'long'>
1826 1826 String Form:18446744073709551616
1827 1827 Namespace: Interactive
1828 1828 Docstring:
1829 1829 long(x[, base]) -> integer
1830 1830
1831 1831 Convert a string or number to a long integer, if possible. A floating
1832 1832
1833 1833 [etc... snipped for brevity]
1834 1834 \end{verbatim}
1835 1835 If you add a second \texttt{?} and for any object \texttt{x} type
1836 1836 \texttt{x??}, IPython will try to provide an even more detailed analysis
1837 1837 of the object, including its syntax-highlighted source code when it can
1838 1838 be found. It's possible that \texttt{x??} returns the same information
1839 1839 as \texttt{x?}, but in many cases \texttt{x??} will indeed provide
1840 1840 additional details.
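
Outside IPython, a rough approximation of what \texttt{x?} and \texttt{x??} display can be assembled from the standard library's \texttt{inspect} module. The sketch below is only an analogy, not IPython's implementation, and the helper name \texttt{describe} is hypothetical. It also illustrates why the detailed view sometimes adds nothing: builtins have no Python source to show.

```python
import inspect
import textwrap


def describe(obj, detailed=False):
    """Return a short text summary of obj, loosely like `obj?` / `obj??`."""
    lines = ["Type: " + type(obj).__name__]
    doc = inspect.getdoc(obj)
    if doc:
        lines.append("Docstring: " + doc.splitlines()[0])
    if detailed:
        try:
            lines.append("Source:\n" + inspect.getsource(obj))
        except (TypeError, OSError):
            # Builtins and C extensions have no Python source to show,
            # so the detailed view adds nothing in that case.
            pass
    return "\n".join(lines)


print(describe(len, detailed=True))              # no source available
print(describe(textwrap.dedent, detailed=True))  # pure-Python function
```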
1841 1841
1842 1842 Finally, the \texttt{?} syntax is also useful to search
1843 1843 \emph{namespaces} with wildcards. Suppose you are wondering if there is
1844 1844 any function in Numpy that may do text-related things; with
1845 1845 \texttt{np.*txt*?}, IPython will print all the names in the \texttt{np}
1846 1846 namespace (our Numpy shorthand) that have `txt' anywhere in their name:
1847 1847
1848 1848 \begin{verbatim}
1849 1849 In [17]: np.*txt*?
1850 1850 np.genfromtxt
1851 1851 np.loadtxt
1852 1852 np.mafromtxt
1853 1853 np.ndfromtxt
1854 1854 np.recfromtxt
1855 1855 np.savetxt
1856 1856 \end{verbatim}
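The same kind of wildcard search can be approximated in plain Python with \texttt{dir} and the standard library's \texttt{fnmatch} module. This is a sketch of the idea rather than IPython's implementation; the helper \texttt{search\_names} is hypothetical, and the \texttt{math} module is used only so that the example needs nothing beyond the standard library.

```python
import fnmatch
import math


def search_names(namespace, pattern):
    """Return the public names in `namespace` matching a shell-style pattern."""
    names = [n for n in dir(namespace) if not n.startswith("_")]
    return fnmatch.filter(names, pattern)


matches = search_names(math, "*log*")
print(matches)
```

With Numpy imported as \texttt{np}, a call such as \texttt{search\_names(np, '*txt*')} would list names like \texttt{loadtxt} and \texttt{savetxt}.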
1857 1857
1858 1858
1859 1859 \textbf{Tab completion}
1860 1860
1861 1861 IPython makes the tab key work extra hard for you as a way to rapidly
1862 1862 inspect objects and libraries. Whenever you have typed something at the
1863 1863 prompt, by hitting the \texttt{\textless{}tab\textgreater{}} key IPython
1864 1864 will try to complete the rest of the line. For this, IPython will
1865 1865 analyze the text you have typed so far and try to search for Python data or
1866 1866 files that may match the context you have already provided.
1867 1867
1868 1868 For example, if you type \texttt{np.load} and hit the \texttt{\textless{}tab\textgreater{}} key, you'll see:
1869 1869
1870 1870 \begin{verbatim}
1871 1871 In [21]: np.load<TAB HERE>
1872 1872 np.load np.loads np.loadtxt
1873 1873 \end{verbatim}
1874 1874 so you can quickly find all the load-related functionality in numpy. Tab
1875 1875 completion works even for function arguments; for example, consider this
1876 1876 function definition:
1877 1877
1878 1878 \begin{verbatim}
1879 1879 In [20]: def f(x, frobinate=False):
1880 1880    ....:     if frobinate:
1881 1881    ....:         return x**2
1882 1882    ....:
1883 1883 \end{verbatim}
1884 1884 If you now use the \texttt{\textless{}tab\textgreater{}} key after
1885 1885 having typed `fro' you'll get all valid Python completions, but those
1886 1886 marked with \texttt{=} at the end are known to be keywords of your
1887 1887 function:
1888 1888
1889 1889 \begin{verbatim}
1890 1890 In [21]: f(2, fro<TAB HERE>
1891 1891 frobinate= frombuffer fromfunction frompyfunc fromstring
1892 1892 from fromfile fromiter fromregex frozenset
1893 1893 \end{verbatim}
1894 1894 At this point you can add the \texttt{b} letter and hit
1895 1895 \texttt{\textless{}tab\textgreater{}} once more, and IPython will finish
1896 1896 the line for you:
1897 1897
1898 1898 \begin{verbatim}
1899 1899 In [21]: f(2, frobinate=
1900 1900 \end{verbatim}
1901 1901 As a beginner, simply get into the habit of using
1902 1902 \texttt{\textless{}tab\textgreater{}} after most objects; it should
1903 1903 quickly become second nature as you will see how it helps keep a fluid
1904 1904 workflow and discover useful information. Later on you can also
1905 1905 customize this behavior by writing your own completion code, if you so
1906 1906 desire.
1907 1907
1908 1908 \textbf{Matplotlib integration}
1909 1909
1910 1910 One of the most useful features of IPython for scientists is its tight
1911 1911 integration with matplotlib: at the terminal IPython lets you open
1912 1912 matplotlib figures without blocking your typing (which is what happens
1913 1913 if you try to do the same thing at the default Python shell), and in the
1914 1914 Qt console and notebook you can even view your figures embedded in your
1915 1915 workspace next to the code that created them.
1916 1916
1917 1917 The matplotlib support can be either activated when you start IPython by
1918 1918 passing the \texttt{-{}-pylab} flag, or at any point later in your
1919 1919 session by using the \texttt{\%pylab} command. If you start IPython with
1920 1920 \texttt{-{}-pylab}, you'll see something like this (note the extra
1921 1921 message about pylab):
1922 1922
1923 1923 \begin{verbatim}
1924 1924 $ ipython --pylab
1925 1925 Python 2.7.2+ (default, Oct 4 2011, 20:03:08)
1926 1926 Type "copyright", "credits" or "license" for more information.
1927 1927
1928 1928 IPython 0.13.dev -- An enhanced Interactive Python.
1929 1929 ? -> Introduction and overview of IPython's features.
1930 1930 %quickref -> Quick reference.
1931 1931 help -> Python's own help system.
1932 1932 object? -> Details about 'object', use 'object??' for extra details.
1933 1933
1934 1934 Welcome to pylab, a matplotlib-based Python environment [backend: Qt4Agg].
1935 1935 For more information, type 'help(pylab)'.
1936 1936
1937 1937 In [1]:
1938 1938 \end{verbatim}
1939 1939 Furthermore, IPython will import \texttt{numpy} with the \texttt{np}
1940 1940 shorthand, \texttt{matplotlib.pyplot} as \texttt{plt}, and it will also
1941 1941 load all of the numpy and pyplot top-level names so that you can
1942 1942 directly type something like:
1943 1943
1944 1944 \begin{verbatim}
1945 1945 In [1]: x = linspace(0, 2*pi, 200)
1946 1946
1947 1947 In [2]: plot(x, sin(x))
1948 1948 Out[2]: [<matplotlib.lines.Line2D at 0x9e7c16c>]
1949 1949 \end{verbatim}
1950 1950 instead of having to prefix each call with its namespace (as we
1951 1951 have been doing in the examples thus far):
1952 1952
1953 1953 \begin{verbatim}
1954 1954 In [3]: x = np.linspace(0, 2*np.pi, 200)
1955 1955
1956 1956 In [4]: plt.plot(x, np.sin(x))
1957 1957 Out[4]: [<matplotlib.lines.Line2D at 0x9e900ac>]
1958 1958 \end{verbatim}
1959 1959 This shorthand notation can be a huge time-saver when working
1960 1960 interactively (it's a few characters but you are likely to type them
1961 1961 hundreds of times in a session). But we should note that as you develop
1962 1962 persistent scripts and notebooks meant for reuse, it's best to get in
1963 1963 the habit of using the longer notation (known as \emph{fully qualified
1964 1964 names}), as it's clearer where things come from and it makes for more
1965 1965 robust, readable and maintainable code in the long run.
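
One concrete reason to prefer fully qualified names: a star import of Numpy, which is what makes the bare names available, brings in functions such as \texttt{sum} that share a name with a Python builtin but behave differently. The sketch below compares the two explicitly; the array is illustrative.

```python
import builtins
import numpy as np

data = np.ones((2, 3))

# The builtin iterates over the first axis and adds up the rows
row_sum = builtins.sum(data)

# NumPy's version adds up every element by default
total = np.sum(data)

print(row_sum)   # one value per column
print(total)     # a single number
```

With the prefix spelled out, a reader can tell at a glance which of the two behaviours a given call will have.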
1966 1966
1967 1967 \textbf{Access to the operating system and files}
1968 1968
1969 1969 In IPython, you can type \texttt{ls} to see your files or \texttt{cd} to
1970 1970 change directories, just like you would at a regular system prompt:
1971 1971
1972 1972 \begin{verbatim}
1973 1973 In [2]: cd tests
1974 1974 /home/fperez/ipython/nbconvert/tests
1975 1975
1976 1976 In [3]: ls test.*
1977 1977 test.aux test.html test.ipynb test.log test.out test.pdf test.rst test.tex
1978 1978 \end{verbatim}
1979 1979 Furthermore, if you use the \texttt{!} at the beginning of a line, any
1980 1980 commands you pass afterwards go directly to the operating system:
1981 1981
1982 1982 \begin{verbatim}
1983 1983 In [4]: !echo "Hello IPython"
1984 1984 Hello IPython
1985 1985 \end{verbatim}
1986 1986 IPython offers a useful twist in this feature: it will substitute in the
1987 1987 command the value of any \emph{Python} variable you may have if you
1988 1988 prepend it with a \texttt{\$} sign:
1989 1989
1990 1990 \begin{verbatim}
1991 1991 In [5]: message = 'IPython interpolates from Python to the shell'
1992 1992
1993 1993 In [6]: !echo $message
1994 1994 IPython interpolates from Python to the shell
1995 1995 \end{verbatim}
1996 1996 This feature can be extremely useful, as it lets you combine the power
1997 1997 and clarity of Python for complex logic with the immediacy and
1998 1998 familiarity of many shell commands. Additionally, if you start the line
1999 1999 with \emph{two} exclamation marks (\texttt{!!}), the output of the command will be
2000 2000 automatically captured as a list of lines, e.g.:
2001 2001
2002 2002 \begin{verbatim}
2003 2003 In [10]: !!ls test.*
2004 2004 Out[10]:
2005 2005 ['test.aux',
2006 2006 'test.html',
2007 2007 'test.ipynb',
2008 2008 'test.log',
2009 2009 'test.out',
2010 2010 'test.pdf',
2011 2011 'test.rst',
2012 2012 'test.tex']
2013 2013 \end{verbatim}
2014 2014 As explained above, you can now use this as the variable \texttt{\_10}.
2015 2015 If you directly want to capture the output of a system command to a
2016 2016 Python variable, you can use the syntax \texttt{=!}:
2017 2017
2018 2018 \begin{verbatim}
2019 2019 In [11]: testfiles =! ls test.*
2020 2020
2021 2021 In [12]: print testfiles
2022 2022 ['test.aux', 'test.html', 'test.ipynb', 'test.log', 'test.out', 'test.pdf', 'test.rst', 'test.tex']
2023 2023 \end{verbatim}
2024 2024 Finally, the special \texttt{\%alias} command lets you define names that
2025 2025 are shorthands for system commands, so that you can type them without
2026 2026 having to prefix them via \texttt{!} explicitly (for example,
2027 2027 \texttt{ls} is an alias that has been predefined for you at startup).
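
In a regular Python script, where the \texttt{!} syntax is not available, the standard library's \texttt{subprocess} module offers a comparable way to capture a command's output as a list of lines. The sketch below is an analogy, not what IPython does internally. To stay portable it runs the Python interpreter itself as the child command (an arbitrary choice for illustration), printing two of the file names seen above.

```python
import subprocess
import sys

# Child command: any program that writes lines to standard output would do
cmd = [sys.executable, "-c", "print('test.aux'); print('test.html')"]

# check_output raises CalledProcessError if the command fails;
# text=True decodes the bytes so we get str lines back
output = subprocess.check_output(cmd, text=True)
testfiles = output.splitlines()

print(testfiles)
```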
2028 2028
2029 2029 \textbf{Magic commands}
2030 2030
2031 2031 IPython has a system for special commands, called `magics', that let you
2032 2032 control IPython itself and perform many common tasks with a more
2033 2033 shell-like syntax: it uses spaces for delimiting arguments, flags can be
2034 2034 set with dashes and all arguments are treated as strings, so no
2035 2035 additional quoting is required. This kind of syntax is invalid in the
2036 2036 Python language but very convenient for interactive typing (fewer
2037 2037 parentheses, commas and quotes everywhere); IPython distinguishes the
2038 2038 two by detecting lines that start with the \texttt{\%} character.
2039 2039
2040 2040 You can learn more about the magic system by simply typing
2041 2041 \texttt{\%magic} at the prompt, which will give you a short description
2042 2042 plus the documentation on \emph{all} available magics. If you want to
2043 2043 see only a listing of existing magics, you can use \texttt{\%lsmagic}:
2044 2044
2045 2045 \begin{verbatim}
2046 2046 In [4]: lsmagic
2047 2047 Available magic functions:
2048 2048 %alias %autocall %autoindent %automagic %bookmark %c %cd %colors %config %cpaste
2049 2049 %debug %dhist %dirs %doctest_mode %ds %ed %edit %env %gui %hist %history
2050 2050 %install_default_config %install_ext %install_profiles %load_ext %loadpy %logoff %logon
2051 2051 %logstart %logstate %logstop %lsmagic %macro %magic %notebook %page %paste %pastebin
2052 2052 %pd %pdb %pdef %pdoc %pfile %pinfo %pinfo2 %pop %popd %pprint %precision %profile
2053 2053 %prun %psearch %psource %pushd %pwd %pycat %pylab %quickref %recall %rehashx
2054 2054 %reload_ext %rep %rerun %reset %reset_selective %run %save %sc %stop %store %sx %tb
2055 2055 %time %timeit %unalias %unload_ext %who %who_ls %whos %xdel %xmode
2056 2056
2057 2057 Automagic is ON, % prefix NOT needed for magic functions.
2058 2058 \end{verbatim}
2059 2059 Note how the example above omits the explicit \texttt{\%} marker and
2060 2060 simply uses \texttt{lsmagic}. When the `automagic' feature is on
2061 2061 (which it is by default), you can omit the \texttt{\%} marker as long as
2062 2062 there is no ambiguity with a Python variable of the same name.
2063 2063
2064 2064 \textbf{Running your code}
2065 2065
2066 2066 While it's easy to type a few lines of code in IPython, for any
2067 2067 long-lived work you should keep your codes in Python scripts (or in
2068 2068 IPython notebooks, see below). Suppose you have a script, in this
2069 2069 case trivially simple for the sake of brevity, named \texttt{simple.py}:
2070 2070
2071 2071 \begin{verbatim}
2072 2072 In [12]: !cat simple.py
2073 2073 import numpy as np
2074 2074
2075 2075 x = np.random.normal(size=100)
2076 2076
2077 2077 print 'First element of x:', x[0]
2078 2078 \end{verbatim}
2079 2079 The typical workflow with IPython is to use the \texttt{\%run} magic to
2080 2080 execute your script (you can omit the .py extension if you want). When
2081 2081 you run it, the script will execute just as if it had been run at the
2082 2082 system prompt with \texttt{python simple.py} (though since Python does not
2083 2083 re-execute modules that have already been imported, importing them again is
2084 2084 essentially free, which can noticeably reduce the run time in some
2085 2085 cases):
2086 2086
2087 2087 \begin{verbatim}
2088 2088 In [13]: run simple
2089 2089 First element of x: -1.55872256289
2090 2090 \end{verbatim}
2091 2091 Once it completes, all variables defined in it become available for you
2092 2092 to use interactively:
2093 2093
2094 2094 \begin{verbatim}
2095 2095 In [14]: x.shape
2096 2096 Out[14]: (100,)
2097 2097 \end{verbatim}
2098 2098 This allows you to plot data, try out ideas, etc, in a
2099 2099 \texttt{\%run}/interact/edit cycle that can be very productive. As you
2100 2100 start understanding your problem better you can refine your script
2101 2101 further, incrementally improving it based on the work you do at the
2102 2102 IPython prompt. At any point you can use the \texttt{\%hist} magic to
2103 2103 print out your history without prompts, so that you can copy useful
2104 2104 fragments back into the script.
2105 2105
2106 2106 By default, \texttt{\%run} executes scripts in a completely empty
2107 2107 namespace, to better mimic how they would execute at the system prompt
2108 2108 with plain Python. But if you use the \texttt{-i} flag, the script will
2109 2109 also see your interactively defined variables. This lets you edit in a
2110 2110 script larger amounts of code that still behave as if you had typed them
2111 2111 at the IPython prompt.
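
The difference between a fresh and a shared namespace can be imitated with the standard library's \texttt{runpy} module, whose \texttt{run\_path} function executes a script and returns the resulting namespace as a dictionary. The sketch below is only an analogy for \texttt{\%run} and \texttt{\%run -i}, not a description of IPython's implementation; the script contents and the variable \texttt{scale} are made up for illustration.

```python
import os
import runpy
import tempfile

# A tiny script that uses `scale` if the caller's namespace provides it
script = (
    "scale = globals().get('scale', 1)\n"
    "x = [scale * v for v in range(3)]\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "simple.py")
    with open(path, "w") as f:
        f.write(script)

    # Fresh namespace, similar in spirit to a plain %run
    fresh = runpy.run_path(path)

    # Pre-seeded namespace, similar in spirit to %run -i
    seeded = runpy.run_path(path, init_globals={"scale": 10})

print(fresh["x"])
print(seeded["x"])
```

In both cases the variables defined by the script come back to the caller, which mirrors how \texttt{x} became available at the prompt after \texttt{run simple}.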
2112 2112
2113 2113 You can also get a summary of the time taken by your script with the
2114 2114 \texttt{-t} flag; consider a different script \texttt{randsvd.py} that
2115 2115 takes a bit longer to run:
2116 2116
2117 2117 \begin{verbatim}
2118 2118 In [21]: run -t randsvd.py
2119 2119
2120 2120 IPython CPU timings (estimated):
2121 2121 User : 0.38 s.
2122 2122 System : 0.04 s.
2123 2123 Wall time: 0.34 s.
2124 2124 \end{verbatim}
2125 2125 \texttt{User} is the time spent by the computer executing your code,
2126 2126 while \texttt{System} is the time the operating system had to work on
2127 2127 your behalf, doing things like memory allocation that are needed by your
2128 2128 code but that you didn't explicitly program and that happen inside the
2129 2129 kernel. The \texttt{Wall time} is the time on a `clock on the wall'
2130 2130 between the start and end of your program.
2131 2131
2132 2132 If \texttt{Wall \textgreater{} User+System}, your code is most likely
2133 2133 waiting idle for certain periods. That could be waiting for data to
2134 2134 arrive from a remote source or perhaps because the operating system has
2135 2135 to swap large amounts of virtual memory. If you know that your code
2136 2136 doesn't explicitly wait for remote data to arrive, you should
2137 2137 investigate further to identify possible ways of improving the
2138 2138 performance profile.
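
The distinction between these timings can be reproduced with the standard library. In the sketch below, \texttt{time.sleep} stands in for code that waits idle (an illustrative assumption): the wall-clock time grows by the sleep duration while the CPU time barely changes.

```python
import time

wall_start = time.perf_counter()    # wall-clock time
cpu_start = time.process_time()     # user + system CPU time of this process

time.sleep(0.2)                     # idle wait: uses almost no CPU
total = sum(i * i for i in range(200000))  # actual computation: uses CPU

wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start

print("Wall time: %.2f s" % wall)
print("CPU time (user + system): %.2f s" % cpu)
```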
2139 2139
2140 2140 If you only want to time how long a single statement takes, you don't
2141 2141 need to put it into a script as you can use the \texttt{\%timeit} magic,
2142 2142 which uses Python's \texttt{timeit} module to very carefully measure
2143 2143 timing data; \texttt{timeit} can measure even short statements that
2144 2144 execute extremely fast:
2145 2145
2146 2146 \begin{verbatim}
2147 2147 In [27]: %timeit a=1
2148 2148 10000000 loops, best of 3: 23 ns per loop
2149 2149 \end{verbatim}
2150 2150 and for code that runs longer, it automatically adjusts so the overall
2151 2151 measurement doesn't take too long:
2152 2152
2153 2153 \begin{verbatim}
2154 2154 In [28]: %timeit np.linalg.svd(x)
2155 2155 1 loops, best of 3: 310 ms per loop
2156 2156 \end{verbatim}
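The \texttt{timeit} module that \texttt{\%timeit} relies on can also be used directly in a script. The sketch below mirrors the two behaviours described above: taking the best of several repeats, and letting \texttt{Timer.autorange} (available in Python 3.6 and later) choose the loop count. The statements being timed are illustrative.

```python
import timeit

# Best of 3 repeats of a very fast statement, 100000 loops per repeat
number = 100000
times = timeit.repeat("a = 1", repeat=3, number=number)
best_per_loop = min(times) / number
print("best of 3: %.1f ns per loop" % (best_per_loop * 1e9))

# For slower code, let timeit pick a loop count that takes at least ~0.2 s
timer = timeit.Timer("sorted(data)", setup="data = list(range(10000, 0, -1))")
loops, elapsed = timer.autorange()
print("%d loops, %.3g s per loop" % (loops, elapsed / loops))
```

Taking the minimum over the repeats, rather than the mean, reduces the influence of other processes that happen to be running during the measurement.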
2157 2157 The \texttt{\%run} magic has further options for debugging and
2158 2158 profiling; you should read its documentation for many useful
2159 2159 details (as always, just type \texttt{\%run?}).
2160 2160
2161 2161 \subsection{The graphical Qt console}
2162 2162 If you type at the system prompt (see the IPython website for
2163 2163 installation details, as this requires some additional libraries):
2164 2164
2165 2165 \begin{verbatim}
2166 2166 $ ipython qtconsole
2167 2167 \end{verbatim}
2168 2168 instead of opening in a terminal as before, IPython will start a
2169 2169 graphical console that at first sight appears just like a terminal, but
2170 2170 which is in fact much more capable than a text-only terminal. This is a
2171 2171 specialized terminal designed for interactive scientific work, and it
2172 2172 supports full multi-line editing with color highlighting and graphical
2173 2173 calltips for functions, it can keep multiple IPython sessions open
2174 2174 simultaneously in tabs, and when scripts run it can display the figures
2175 2175 inline directly in the work area.
2176 2176
2177 2177 % This cell is for the pdflatex output only
2178 2178 \begin{figure}[htbp]
2179 2179 \centering
2180 2180 \includegraphics[width=3in]{ipython_qtconsole2.png}
2181 2181 \caption{The IPython Qt console: a lightweight terminal for scientific exploration, with code, results and graphics in a single environment.}
2182 2182 \end{figure}
2183 2183 The Qt console accepts the same \texttt{-{}-pylab} startup flags as the
2184 2184 terminal, but you can additionally supply the value
2185 2185 \texttt{-{}-pylab inline}, which enables the support for inline graphics
2186 2186 shown in the figure. This is ideal for keeping all the code and figures
2187 2187 in the same session, given that the console can save the output of your
2188 2188 entire session to HTML or PDF.
2189 2189
2190 2190 Since the Qt console makes it far more convenient than the terminal to
2191 2191 edit blocks of code with multiple lines, in this environment it's worth
2192 2192 knowing about the \texttt{\%loadpy} magic function. \texttt{\%loadpy}
2193 2193 takes a path to a local file or remote URL, fetches its contents, and
2194 2194 puts it in the work area for you to further edit and execute. It can be
2195 2195 an extremely fast and convenient way of loading code from local disk or
2196 2196 remote examples from sites such as the
2197 2197 \href{http://matplotlib.sourceforge.net/gallery.html}{Matplotlib
2198 2198 gallery}.
2199 2199
2200 2200 Other than its enhanced capabilities for code and graphics, all of the
2201 2201 features of IPython we've explained before remain functional in this
2202 2202 graphical console.
2203 2203
2204 2204 \subsection{The IPython Notebook}
2205 2205 The third way to interact with IPython, in addition to the terminal and
2206 2206 graphical Qt console, is a powerful web interface called the ``IPython
2207 2207 Notebook''. If you run at the system console (you can omit the
2208 2208 \texttt{pylab} flags if you don't need plotting support):
2209 2209
2210 2210 \begin{verbatim}
2211 2211 $ ipython notebook --pylab inline
2212 2212 \end{verbatim}
2213 2213 IPython will start a process that runs a web server on your local
2214 2214 machine and to which a web browser can connect. The Notebook is a
2215 2215 workspace that lets you execute code in blocks called `cells' and
2216 2216 displays any results and figures, but which can also contain arbitrary
2217 2217 text (including LaTeX-formatted mathematical expressions) and any rich
2218 2218 media that a modern web browser is capable of displaying.
2219 2219
2220 2220 % This cell is for the pdflatex output only
2221 2221 \begin{figure}[htbp]
2222 2222 \centering
2223 2223 \includegraphics[width=3in]{ipython-notebook-specgram-2.png}
2224 2224 \caption{The IPython Notebook: text, equations, code, results, graphics and other multimedia in an open format for scientific exploration and collaboration.}
2225 2225 \end{figure}
2226 2226 In fact, this document was written as a Notebook, and only exported to
2227 2227 LaTeX for printing. Inside of each cell, all the features of IPython
2228 2228 that we have discussed before remain functional, since ultimately this
2229 2229 web client is communicating with the same IPython code that runs in the
2230 2230 terminal. But this interface is a much richer and more powerful
2231 2231 environment for maintaining long-term ``live and executable'' scientific
2232 2232 documents.
2233 2233
2234 2234 Notebook environments have existed in commercial systems like
2235 2235 Mathematica(TM) and Maple(TM) for a long time; in the open source world
2236 2236 the \href{http://sagemath.org}{Sage} project blazed this particular
2237 2237 trail starting in 2006, and now we bring all the features that have made
2238 2238 IPython such a widely used tool to a Notebook model.
2239 2239
2240 2240 Since the Notebook runs as a web application, it is possible to
2241 2241 configure it for remote access, letting you run your computations on a
2242 2242 persistent server close to your data, which you can then access remotely
2243 2243 from any browser-equipped computer. We encourage you to read the
2244 2244 extensive documentation provided by the IPython project for details on
2245 2245 how to do this and many more features of the notebook.
2246 2246
2247 2247 Finally, as we said earlier, IPython also has a high-level, easy-to-use
2248 2248 set of libraries for parallel computing that let you control
2249 2249 (interactively if desired) not just one IPython but an entire cluster of
2250 2250 `IPython engines'. Unfortunately a detailed discussion of these tools is
2251 2251 beyond the scope of this text, but should you need to parallelize your
2252 2252 analysis codes, a quick read of the tutorials and examples provided at
2253 2253 the IPython site may prove fruitful.
2254 2254
2255 2255 \end{document}