update ref rstfile
Matthias BUSSONNIER -
@@ -1,2077 +1,2077 b''
1 1 An Introduction to the Scientific Python Ecosystem
2 2 ==================================================
3 3
4 4 While the Python language is an excellent tool for general-purpose
5 5 programming, with a highly readable syntax, rich and powerful data types
6 6 (strings, lists, sets, dictionaries, arbitrary length integers, etc) and
7 7 a very comprehensive standard library, it was not designed specifically
8 8 for mathematical and scientific computing. Neither the language nor its
9 9 standard library have facilities for the efficient representation of
10 10 multidimensional datasets, tools for linear algebra and general matrix
11 11 manipulations (an essential building block of virtually all technical
12 12 computing), nor any data visualization facilities.
13 13
14 14 In particular, Python lists are very flexible containers that can be
15 15 nested arbitrarily deep and which can hold any Python object in them,
16 16 but they are poorly suited to efficiently representing common mathematical
17 17 constructs like vectors and matrices. In contrast, much of our modern
18 18 heritage of scientific computing has been built on top of libraries
19 19 written in the Fortran language, which has native support for vectors
20 20 and matrices as well as a library of mathematical functions that can
21 21 efficiently operate on entire arrays at once.
22 22
23 23 Scientific Python: a collaboration of projects built by scientists
24 24 ------------------------------------------------------------------
25 25
26 26 The scientific community has developed a set of related Python libraries
27 27 that provide powerful array facilities, linear algebra, numerical
28 28 algorithms, data visualization and more. In this appendix, we will
29 29 briefly outline the tools most frequently used for this purpose, that
30 30 make "Scientific Python" something far more powerful than the Python
31 31 language alone.
32 32
33 33 For reasons of space, we can only describe in some detail the central
34 34 Numpy library, but below we provide links to the websites of each
35 35 project where you can read their documentation in more detail.
36 36
37 37 First, let's look at an overview of the basic tools that most scientists
38 38 use in daily research with Python. The core of this ecosystem is
39 39 composed of:
40 40
41 41 - Numpy: the basic library that most others depend on; it provides a
42 42 powerful array type that can represent multidimensional datasets of
43 43 many different kinds and that supports arithmetic operations. Numpy
44 44 also provides a library of common mathematical functions, basic
45 45 linear algebra, random number generation and Fast Fourier Transforms.
46 46 Numpy can be found at `numpy.scipy.org <http://numpy.scipy.org>`_.
47 47
48 48 - Scipy: a large collection of numerical algorithms that operate on
49 49 numpy arrays and provide facilities for many common tasks in
50 50 scientific computing, including dense and sparse linear algebra
51 51 support, optimization, special functions, statistics, n-dimensional
52 52 image processing, signal processing and more. Scipy can be found at
53 53 `scipy.org <http://scipy.org>`_.
54 54
55 55 - Matplotlib: a data visualization library with a strong focus on
56 56 producing high-quality output, it supports a variety of common
57 57 scientific plot types in two and three dimensions, with precise
58 58 control over the final output and format for publication-quality
59 59 results. Matplotlib can also be controlled interactively allowing
60 60 graphical manipulation of your data (zooming, panning, etc) and can
61 61 be used with most modern user interface toolkits. It can be found at
62 62 `matplotlib.sf.net <http://matplotlib.sf.net>`_.
63 63
64 64 - IPython: while not strictly scientific in nature, IPython is the
65 65 interactive environment in which many scientists spend their time.
66 66 IPython provides a powerful Python shell that integrates tightly with
67 67 Matplotlib and with easy access to the files and operating system,
68 68 and which can execute in a terminal or in a graphical Qt console.
69 69 IPython also has a web-based notebook interface that can combine code
70 70 with text, mathematical expressions, figures and multimedia. It can
71 71 be found at `ipython.org <http://ipython.org>`_.
72 72
73 73 While each of these tools can be installed separately, in our opinion
74 74 the most convenient way today of accessing them (especially on Windows
75 75 and Mac computers) is to install the `Free Edition of the Enthought
76 76 Python Distribution <http://www.enthought.com/products/epd_free.php>`_
77 77 which contains all of the above. Other free alternatives on Windows (but not
78 78 on Macs) are `Python(x,y) <http://code.google.com/p/pythonxy>`_ and
79 79 `Christoph Gohlke's packages
80 80 page <http://www.lfd.uci.edu/~gohlke/pythonlibs>`_.
81 81
82 82 These four 'core' libraries are in practice complemented by a number of
83 83 other tools for more specialized work. We will briefly list here the
84 84 ones that we think are the most commonly needed:
85 85
86 86 - Sympy: a symbolic manipulation tool that turns a Python session into
87 87 a computer algebra system. It integrates with the IPython notebook,
88 88 rendering results in properly typeset mathematical notation.
89 89 `sympy.org <http://sympy.org>`_.
90 90
91 91 - Mayavi: sophisticated 3d data visualization;
92 92 `code.enthought.com/projects/mayavi <http://code.enthought.com/projects/mayavi>`_.
93 93
94 94 - Cython: a bridge language between Python and C, useful both to
95 95 optimize performance bottlenecks in Python and to access C libraries
96 96 directly; `cython.org <http://cython.org>`_.
97 97
98 98 - Pandas: high-performance data structures and data analysis tools,
99 99 with powerful data alignment and structural manipulation
100 100 capabilities; `pandas.pydata.org <http://pandas.pydata.org>`_.
101 101
102 102 - Statsmodels: statistical data exploration and model estimation;
103 103 `statsmodels.sourceforge.net <http://statsmodels.sourceforge.net>`_.
104 104
105 105 - Scikit-learn: general purpose machine learning algorithms with a
106 106 common interface; `scikit-learn.org <http://scikit-learn.org>`_.
107 107
108 108 - Scikits-image: image processing toolbox;
109 109 `scikits-image.org <http://scikits-image.org>`_.
110 110
111 111 - NetworkX: analysis of complex networks (in the graph theoretical
112 112 sense); `networkx.lanl.gov <http://networkx.lanl.gov>`_.
113 113
114 114 - PyTables: management of hierarchical datasets using the
115 115 industry-standard HDF5 format;
116 116 `www.pytables.org <http://www.pytables.org>`_.
117 117
118 118 Beyond these, for any specific problem you should look on the internet
119 119 first, before starting to write code from scratch. There's a good chance
120 120 that someone, somewhere, has written an open source library that you can
121 121 use for part or all of your problem.
122 122
123 123 A note about the examples below
124 124 -------------------------------
125 125
126 126 In all subsequent examples, you will see blocks of input code, followed
127 127 by the results of the code if the code generated output. This output may
128 128 include text, graphics and other result objects. These blocks of input
129 129 can be pasted into your interactive IPython session or notebook for you
130 130 to execute. In the print version of this document, a thin vertical bar
131 131 on the left of the blocks of input and output shows which blocks go
132 132 together.
133 133
134 134 If you are reading this text as an actual IPython notebook, you can
135 135 press ``Shift-Enter`` or use the 'play' button on the toolbar
136 136 (right-pointing triangle) to execute each block of code, known as a
137 137 'cell' in IPython:
138 138
139 139 In[71]:
140 140
141 141 .. code:: python
142 142
143 143 # This is a block of code, below you'll see its output
144 144 print "Welcome to the world of scientific computing with Python!"
145 145
146 146 .. parsed-literal::
147 147
148 148 Welcome to the world of scientific computing with Python!
149 149
150 150
151 151 Motivation: the trapezoidal rule
152 152 ================================
153 153
154 154 In subsequent sections we'll provide a basic introduction to the nuts
155 155 and bolts of the basic scientific python tools; but we'll first motivate
156 156 it with a brief example that illustrates what you can do in a few lines
157 157 with these tools. For this, we will use the simple problem of
158 158 approximating a definite integral with the trapezoid rule:
159 159
160 160 .. math::
161 161
162 162
163 163 \int_{a}^{b} f(x)\, dx \approx \frac{1}{2} \sum_{k=1}^{N} \left( x_{k} - x_{k-1} \right) \left( f(x_{k}) + f(x_{k-1}) \right).
164 164
165 165 Our task will be to compute this formula for a function such as:
166 166
167 167 .. math::
168 168
169 169
170 170 f(x) = (x-3)(x-5)(x-7)+85
171 171
172 172 integrated between :math:`a=1` and :math:`b=9`.
173 173
174 174 First, we define the function and sample it evenly between 0 and 10 at
175 175 200 points:
176 176
177 177 In[1]:
178 178
179 179 .. code:: python
180 180
181 181 def f(x):
182 182 return (x-3)*(x-5)*(x-7)+85
183 183
184 184 import numpy as np
185 185 x = np.linspace(0, 10, 200)
186 186 y = f(x)
187 187
188 188 We select :math:`a` and :math:`b`, our integration limits, and we take
189 189 only a few points in that region to illustrate the error behavior of the
190 190 trapezoid approximation:
191 191
192 192 In[2]:
193 193
194 194 .. code:: python
195 195
196 196 a, b = 1, 9
197 197 xint = x[np.logical_and(x>=a, x<=b)][::30]
198 198 yint = y[np.logical_and(x>=a, x<=b)][::30]
199 199
200 200 Let's plot both the function and the area below it in the trapezoid
201 201 approximation:
202 202
203 203 In[3]:
204 204
205 205 .. code:: python
206 206
207 207 import matplotlib.pyplot as plt
208 208 plt.plot(x, y, lw=2)
209 209 plt.axis([0, 10, 0, 140])
210 210 plt.fill_between(xint, 0, yint, facecolor='gray', alpha=0.4)
211 211 plt.text(0.5 * (a + b), 30,r"$\int_a^b f(x)dx$", horizontalalignment='center', fontsize=20);
212 212
213 .. image:: tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_00.svg
213 .. image:: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_00.svg
214 214
215 215 Compute the integral both at high accuracy and with the trapezoid
216 216 approximation:
217 217
218 218 In[4]:
219 219
220 220 .. code:: python
221 221
222 222 from scipy.integrate import quad, trapz
223 223 integral, error = quad(f, 1, 9)
224 224 trap_integral = trapz(yint, xint)
225 225 print "The integral is: %g +/- %.1e" % (integral, error)
226 226 print "The trapezoid approximation with", len(xint), "points is:", trap_integral
227 227 print "The absolute error is:", abs(integral - trap_integral)
228 228
229 229 .. parsed-literal::
230 230
231 231 The integral is: 680 +/- 7.5e-12
232 232 The trapezoid approximation with 6 points is: 621.286411141
233 233 The absolute error is: 58.7135888589
234 234
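The trapezoid formula given at the start of this section can also be written out directly with array slices. The following is a minimal sketch, not the implementation used by ``scipy.integrate``; the helper name ``trapezoid_rule`` and the choice of 50 evenly spaced sample points are assumptions made for illustration:

```python
import numpy as np

def f(x):
    return (x - 3) * (x - 5) * (x - 7) + 85

def trapezoid_rule(x, y):
    """Hypothetical helper: 0.5 * sum((x_k - x_{k-1}) * (f(x_k) + f(x_{k-1})))."""
    widths = x[1:] - x[:-1]      # x_k - x_{k-1}
    heights = y[1:] + y[:-1]     # f(x_k) + f(x_{k-1})
    return 0.5 * np.sum(widths * heights)

xs = np.linspace(1, 9, 50)       # assumed grid covering [a, b] = [1, 9]
approx = trapezoid_rule(xs, f(xs))
print(approx)
```

Comparing ``approx`` with the value reported by ``quad`` gives the error of the approximation for this particular grid.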
235 235
236 236 This simple example showed us how, by combining the numpy, scipy and
237 237 matplotlib libraries, we can illustrate a standard method
238 238 in elementary calculus with just a few lines of code. We will now
239 239 discuss in more detail the basic usage of these tools.
240 240
241 241 NumPy arrays: the right data structure for scientific computing
242 242 ===============================================================
243 243
244 244 Basics of Numpy arrays
245 245 ----------------------
246 246
247 247 We now turn our attention to the Numpy library, which forms the base
248 248 layer for the entire 'scipy ecosystem'. Once you have installed numpy,
249 249 you can import it as
250 250
251 251 In[5]:
252 252
253 253 .. code:: python
254 254
255 255 import numpy
256 256
257 257 though in this book we will use the common shorthand
258 258
259 259 In[6]:
260 260
261 261 .. code:: python
262 262
263 263 import numpy as np
264 264
265 265 As mentioned above, the main object provided by numpy is a powerful
266 266 array. We'll start by exploring how the numpy array differs from Python
267 267 lists. We start by creating a simple list and an array with the same
268 268 contents as the list:
269 269
270 270 In[7]:
271 271
272 272 .. code:: python
273 273
274 274 lst = [10, 20, 30, 40]
275 275 arr = np.array([10, 20, 30, 40])
276 276
277 277 Elements of a one-dimensional array are accessed with the same syntax as
278 278 a list:
279 279
280 280 In[8]:
281 281
282 282 .. code:: python
283 283
284 284 lst[0]
285 285
286 286 Out[8]:
287 287
288 288 .. parsed-literal::
289 289
290 290 10
291 291
292 292 In[9]:
293 293
294 294 .. code:: python
295 295
296 296 arr[0]
297 297
298 298 Out[9]:
299 299
300 300 .. parsed-literal::
301 301
302 302 10
303 303
304 304 In[10]:
305 305
306 306 .. code:: python
307 307
308 308 arr[-1]
309 309
310 310 Out[10]:
311 311
312 312 .. parsed-literal::
313 313
314 314 40
315 315
316 316 In[11]:
317 317
318 318 .. code:: python
319 319
320 320 arr[2:]
321 321
322 322 Out[11]:
323 323
324 324 .. parsed-literal::
325 325
326 326 array([30, 40])
327 327
328 328 The first difference to note between lists and arrays is that arrays are
329 329 *homogeneous*; i.e. all elements of an array must be of the same type.
330 330 In contrast, lists can contain elements of arbitrary type. For example,
331 331 we can change the last element in our list above to be a string:
332 332
333 333 In[12]:
334 334
335 335 .. code:: python
336 336
337 337 lst[-1] = 'a string inside a list'
338 338 lst
339 339
340 340 Out[12]:
341 341
342 342 .. parsed-literal::
343 343
344 344 [10, 20, 30, 'a string inside a list']
345 345
346 346 but the same can not be done with an array, as we get an error message:
347 347
348 348 In[13]:
349 349
350 350 .. code:: python
351 351
352 352 arr[-1] = 'a string inside an array'
353 353
354 354 ::
355 355
356 356 ---------------------------------------------------------------------------
357 357 ValueError Traceback (most recent call last)
358 358 /home/fperez/teach/book-math-labtool/<ipython-input-13-29c0bfa5fa8a> in <module>()
359 359 ----> 1 arr[-1] = 'a string inside an array'
360 360
361 361 ValueError: invalid literal for long() with base 10: 'a string inside an array'
362 362
363 363 The information about the type of an array is contained in its *dtype*
364 364 attribute:
365 365
366 366 In[14]:
367 367
368 368 .. code:: python
369 369
370 370 arr.dtype
371 371
372 372 Out[14]:
373 373
374 374 .. parsed-literal::
375 375
376 376 dtype('int32')
377 377
378 378 Once an array has been created, its dtype is fixed and it can only store
379 379 elements of the same type. For this example where the dtype is integer,
380 380 if we store a floating point number it will be automatically converted
381 381 into an integer, discarding the fractional part:
382 382
383 383 In[15]:
384 384
385 385 .. code:: python
386 386
387 387 arr[-1] = 1.234
388 388 arr
389 389
390 390 Out[15]:
391 391
392 392 .. parsed-literal::
393 393
394 394 array([10, 20, 30, 1])
395 395
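If fractional values need to be stored, the dtype can be chosen when the array is created, or a converted copy can be made with ``astype``. A small sketch (the variable names are only illustrative):

```python
import numpy as np

# Request a floating point dtype explicitly at creation time
arr_float = np.array([10, 20, 30, 40], dtype=float)
arr_float[-1] = 1.234            # stored as given, nothing is discarded
print(arr_float)

# astype returns a converted copy; the original integer array is unchanged
arr_int = np.array([10, 20, 30, 40])
arr_copy = arr_int.astype(float)
arr_copy[-1] = 1.234
print(arr_int)
print(arr_copy)
```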
396 396 Above we created an array from an existing list; let us now see other
397 397 ways in which we can create arrays. A
398 398 common need is to have an array initialized with a constant value, and
399 399 very often this value is 0 or 1 (suitable as starting value for additive
400 400 and multiplicative loops respectively); ``zeros`` creates arrays of all
401 401 zeros, with any desired dtype:
402 402
403 403 In[16]:
404 404
405 405 .. code:: python
406 406
407 407 np.zeros(5, float)
408 408
409 409 Out[16]:
410 410
411 411 .. parsed-literal::
412 412
413 413 array([ 0., 0., 0., 0., 0.])
414 414
415 415 In[17]:
416 416
417 417 .. code:: python
418 418
419 419 np.zeros(3, int)
420 420
421 421 Out[17]:
422 422
423 423 .. parsed-literal::
424 424
425 425 array([0, 0, 0])
426 426
427 427 In[18]:
428 428
429 429 .. code:: python
430 430
431 431 np.zeros(3, complex)
432 432
433 433 Out[18]:
434 434
435 435 .. parsed-literal::
436 436
437 437 array([ 0.+0.j, 0.+0.j, 0.+0.j])
438 438
439 439 and similarly for ``ones``:
440 440
441 441 In[19]:
442 442
443 443 .. code:: python
444 444
445 445 print '5 ones:', np.ones(5)
446 446
447 447 .. parsed-literal::
448 448
449 449 5 ones: [ 1. 1. 1. 1. 1.]
450 450
451 451
452 452 If we want an array initialized with an arbitrary value, we can create
453 453 an empty array and then use the fill method to put the value we want
454 454 into the array:
455 455
456 456 In[20]:
457 457
458 458 .. code:: python
459 459
460 460 a = np.empty(4)
461 461 a.fill(5.5)
462 462 a
463 463
464 464 Out[20]:
465 465
466 466 .. parsed-literal::
467 467
468 468 array([ 5.5, 5.5, 5.5, 5.5])
469 469
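Newer NumPy releases also provide ``np.full``, which allocates and fills an array in one step. This is an alternative to the approach above and assumes a NumPy version that includes ``np.full``; it is not available in very old releases:

```python
import numpy as np

# Same values as creating an empty array of length 4 and calling fill(5.5)
filled = np.full(4, 5.5)
print(filled)
```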
470 470 Numpy also offers the ``arange`` function, which works like the builtin
471 471 ``range`` but returns an array instead of a list:
472 472
473 473 In[21]:
474 474
475 475 .. code:: python
476 476
477 477 np.arange(5)
478 478
479 479 Out[21]:
480 480
481 481 .. parsed-literal::
482 482
483 483 array([0, 1, 2, 3, 4])
484 484
485 485 and the ``linspace`` and ``logspace`` functions to create linearly and
486 486 logarithmically-spaced grids respectively, with a fixed number of points
487 487 and including both ends of the specified interval:
488 488
489 489 In[22]:
490 490
491 491 .. code:: python
492 492
493 493 print "A linear grid between 0 and 1:", np.linspace(0, 1, 5)
494 494 print "A logarithmic grid between 10**1 and 10**4: ", np.logspace(1, 4, 4)
495 495
496 496 .. parsed-literal::
497 497
498 498 A linear grid between 0 and 1: [ 0. 0.25 0.5 0.75 1. ]
499 499 A logarithmic grid between 10**1 and 10**4: [ 10. 100. 1000. 10000.]
500 500
501 501
502 502 Finally, it is often useful to create arrays with random numbers that
503 503 follow a specific distribution. The ``np.random`` module contains a
504 504 number of functions that can be used to this effect, for example this
505 505 will produce an array of 5 random samples taken from a standard normal
506 506 distribution (0 mean and variance 1):
507 507
508 508 In[23]:
509 509
510 510 .. code:: python
511 511
512 512 np.random.randn(5)
513 513
514 514 Out[23]:
515 515
516 516 .. parsed-literal::
517 517
518 518 array([-0.08633343, -0.67375434, 1.00589536, 0.87081651, 1.65597822])
519 519
520 520 whereas this will also give 5 samples, but from a normal distribution
521 521 with a mean of 10 and a standard deviation of 3:
522 522
523 523 In[24]:
524 524
525 525 .. code:: python
526 526
527 527 norm10 = np.random.normal(10, 3, 5)
528 528 norm10
529 529
530 530 Out[24]:
531 531
532 532 .. parsed-literal::
533 533
534 534 array([ 8.94879575, 5.53038269, 8.24847281, 12.14944165, 11.56209294])
535 535
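The second argument of ``np.random.normal`` is the standard deviation (the *scale*), not the variance. A quick sanity check is to draw many samples and look at their statistics; in this sketch the seed and the sample count are arbitrary choices:

```python
import numpy as np

np.random.seed(0)                            # arbitrary seed, for repeatability
samples = np.random.normal(10, 3, 100000)    # mean 10, standard deviation 3
print(samples.mean())                        # close to 10
print(samples.std())                         # close to 3
print(samples.var())                         # close to 9, i.e. 3**2
```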
536 536 Indexing with other arrays
537 537 --------------------------
538 538
539 539 Above we saw how to index arrays with single numbers and slices, just
540 540 like Python lists. But arrays allow for a more sophisticated kind of
541 541 indexing which is very powerful: you can index an array with another
542 542 array, and in particular with an array of boolean values. This is
543 543 particularly useful to extract information from an array that matches a
544 544 certain condition.
545 545
546 546 Consider for example that in the array ``norm10`` we want to replace all
547 547 values above 9 with the value 0. We can do so by first finding the
548 548 *mask* that indicates where this condition is true or false:
549 549
550 550 In[25]:
551 551
552 552 .. code:: python
553 553
554 554 mask = norm10 > 9
555 555 mask
556 556
557 557 Out[25]:
558 558
559 559 .. parsed-literal::
560 560
561 561 array([False, False, False, True, True], dtype=bool)
562 562
563 563 Now that we have this mask, we can use it to either read those values or
564 564 to reset them to 0:
565 565
566 566 In[26]:
567 567
568 568 .. code:: python
569 569
570 570 print 'Values above 9:', norm10[mask]
571 571
572 572 .. parsed-literal::
573 573
574 574 Values above 9: [ 12.14944165 11.56209294]
575 575
576 576
577 577 In[27]:
578 578
579 579 .. code:: python
580 580
581 581 print 'Resetting all values above 9 to 0...'
582 582 norm10[mask] = 0
583 583 print norm10
584 584
585 585 .. parsed-literal::
586 586
587 587 Resetting all values above 9 to 0...
588 588 [ 8.94879575 5.53038269 8.24847281 0. 0. ]
589 589
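Indexing with an array of integers works as well: the result collects the elements at the listed positions. A short sketch with made-up values:

```python
import numpy as np

values = np.array([10.0, 20.0, 30.0, 40.0, 50.0])   # illustrative data
positions = np.array([0, 2, 4])

picked = values[positions]       # a new array with the selected elements
print(picked)

values[positions] = 0            # assignment through an index array
print(values)
```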
590 590
591 591 Arrays with more than one dimension
592 592 -----------------------------------
593 593
594 594 Up until now all our examples have used one-dimensional arrays. But
595 595 Numpy can create arrays of arbitrary dimensions, and all the methods
596 596 illustrated in the previous section work with more than one dimension.
597 597 For example, a list of lists can be used to initialize a two dimensional
598 598 array:
599 599
600 600 In[28]:
601 601
602 602 .. code:: python
603 603
604 604 lst2 = [[1, 2], [3, 4]]
605 605 arr2 = np.array([[1, 2], [3, 4]])
606 606 arr2
607 607
608 608 Out[28]:
609 609
610 610 .. parsed-literal::
611 611
612 612 array([[1, 2],
613 613 [3, 4]])
614 614
615 615 With two-dimensional arrays we start seeing the power of numpy: while a
616 616 nested list can be indexed by repeatedly using the ``[ ]`` operator,
617 617 multidimensional arrays support a much more natural indexing syntax with
618 618 a single ``[ ]`` and a set of indices separated by commas:
619 619
620 620 In[29]:
621 621
622 622 .. code:: python
623 623
624 624 print lst2[0][1]
625 625 print arr2[0,1]
626 626
627 627 .. parsed-literal::
628 628
629 629 2
630 630 2
631 631
632 632
633 633 Most of the array creation functions listed above can be used with more
634 634 than one dimension, for example:
635 635
636 636 In[30]:
637 637
638 638 .. code:: python
639 639
640 640 np.zeros((2,3))
641 641
642 642 Out[30]:
643 643
644 644 .. parsed-literal::
645 645
646 646 array([[ 0., 0., 0.],
647 647 [ 0., 0., 0.]])
648 648
649 649 In[31]:
650 650
651 651 .. code:: python
652 652
653 653 np.random.normal(10, 3, (2, 4))
654 654
655 655 Out[31]:
656 656
657 657 .. parsed-literal::
658 658
659 659 array([[ 11.26788826, 4.29619866, 11.09346496, 9.73861307],
660 660 [ 10.54025996, 9.5146268 , 10.80367214, 13.62204505]])
661 661
662 662 In fact, the shape of an array can be changed at any time, as long as
663 663 the total number of elements is unchanged. For example, if we want a 2x4
664 664 array with numbers increasing from 0, the easiest way to create it is:
665 665
666 666 In[32]:
667 667
668 668 .. code:: python
669 669
670 670 arr = np.arange(8).reshape(2,4)
671 671 print arr
672 672
673 673 .. parsed-literal::
674 674
675 675 [[0 1 2 3]
676 676 [4 5 6 7]]
677 677
678 678
679 679 With multidimensional arrays, you can also use slices, and you can mix
680 680 and match slices and single indices in the different dimensions (using
681 681 the same array as above):
682 682
683 683 In[33]:
684 684
685 685 .. code:: python
686 686
687 687 print 'Slicing in the second row:', arr[1, 2:4]
688 688 print 'All rows, third column :', arr[:, 2]
689 689
690 690 .. parsed-literal::
691 691
692 692 Slicing in the second row: [6 7]
693 693 All rows, third column : [2 6]
694 694
695 695
696 696 If you only provide one index, then you will get an array with one less
697 697 dimension containing that row:
698 698
699 699 In[34]:
700 700
701 701 .. code:: python
702 702
703 703 print 'First row: ', arr[0]
704 704 print 'Second row: ', arr[1]
705 705
706 706 .. parsed-literal::
707 707
708 708 First row: [0 1 2 3]
709 709 Second row: [4 5 6 7]
710 710
711 711
712 712 Now that we have seen how to create arrays with more than one dimension,
713 713 it's a good idea to look at some of the most useful properties and
714 714 methods that arrays have. The following provide basic information about
715 715 the size, shape and data in the array:
716 716
717 717 In[35]:
718 718
719 719 .. code:: python
720 720
721 721 print 'Data type :', arr.dtype
722 722 print 'Total number of elements :', arr.size
723 723 print 'Number of dimensions :', arr.ndim
724 724 print 'Shape (dimensionality) :', arr.shape
725 725 print 'Memory used (in bytes) :', arr.nbytes
726 726
727 727 .. parsed-literal::
728 728
729 729 Data type : int32
730 730 Total number of elements : 8
731 731 Number of dimensions : 2
732 732 Shape (dimensionality) : (2, 4)
733 733 Memory used (in bytes) : 32
734 734
735 735
736 736 Arrays also have many useful methods; some especially useful ones are:
737 737
738 738 In[36]:
739 739
740 740 .. code:: python
741 741
742 742 print 'Minimum and maximum :', arr.min(), arr.max()
743 743 print 'Sum and product of all elements :', arr.sum(), arr.prod()
744 744 print 'Mean and standard deviation :', arr.mean(), arr.std()
745 745
746 746 .. parsed-literal::
747 747
748 748 Minimum and maximum : 0 7
749 749 Sum and product of all elements : 28 0
750 750 Mean and standard deviation : 3.5 2.29128784748
751 751
752 752
753 753 For these methods, the above operations are all computed on all the
754 754 elements of the array. But for a multidimensional array, it's possible
755 755 to do the computation along a single dimension, by passing the ``axis``
756 756 parameter; for example:
757 757
758 758 In[37]:
759 759
760 760 .. code:: python
761 761
762 762 print 'For the following array:\n', arr
763 763 print 'The sum of elements along the rows is :', arr.sum(axis=1)
764 764 print 'The sum of elements along the columns is :', arr.sum(axis=0)
765 765
766 766 .. parsed-literal::
767 767
768 768 For the following array:
769 769 [[0 1 2 3]
770 770 [4 5 6 7]]
771 771 The sum of elements along the rows is : [ 6 22]
772 772 The sum of elements along the columns is : [ 4 6 8 10]
773 773
774 774
775 775 As you can see in this example, the value of the ``axis`` parameter is
776 776 the dimension which will be *consumed* once the operation has been
777 777 carried out. This is why to sum along the rows we use ``axis=1``.
778 778
779 779 This can be easily illustrated with an example that has more dimensions;
780 780 we create an array with 4 dimensions and shape ``(3,4,5,6)`` and sum
781 781 along the axis number 2 (i.e. the *third* axis, since in Python all
782 782 counts are 0-based). That consumes the dimension whose length was 5,
783 783 leaving us with a new array that has shape ``(3,4,6)``:
784 784
785 785 In[38]:
786 786
787 787 .. code:: python
788 788
789 789 np.zeros((3,4,5,6)).sum(2).shape
790 790
791 791 Out[38]:
792 792
793 793 .. parsed-literal::
794 794
795 795 (3, 4, 6)
796 796
797 797 Another widely used property of arrays is the ``.T`` attribute, which
798 798 allows you to access the transpose of the array:
799 799
800 800 In[39]:
801 801
802 802 .. code:: python
803 803
804 804 print 'Array:\n', arr
805 805 print 'Transpose:\n', arr.T
806 806
807 807 .. parsed-literal::
808 808
809 809 Array:
810 810 [[0 1 2 3]
811 811 [4 5 6 7]]
812 812 Transpose:
813 813 [[0 4]
814 814 [1 5]
815 815 [2 6]
816 816 [3 7]]
817 817
818 818
819 819 We don't have time here to look at all the methods and properties of
820 820 arrays, but here's a complete list. Simply try exploring some of these in
821 821 IPython to learn more, or read their description in the full Numpy
822 822 documentation:
823 823
824 824 ::
825 825
826 826 arr.T arr.copy arr.getfield arr.put arr.squeeze
827 827 arr.all arr.ctypes arr.imag arr.ravel arr.std
828 828 arr.any arr.cumprod arr.item arr.real arr.strides
829 829 arr.argmax arr.cumsum arr.itemset arr.repeat arr.sum
830 830 arr.argmin arr.data arr.itemsize arr.reshape arr.swapaxes
831 831 arr.argsort arr.diagonal arr.max arr.resize arr.take
832 832 arr.astype arr.dot arr.mean arr.round arr.tofile
833 833 arr.base arr.dtype arr.min arr.searchsorted arr.tolist
834 834 arr.byteswap arr.dump arr.nbytes arr.setasflat arr.tostring
835 835 arr.choose arr.dumps arr.ndim arr.setfield arr.trace
836 836 arr.clip arr.fill arr.newbyteorder arr.setflags arr.transpose
837 837 arr.compress arr.flags arr.nonzero arr.shape arr.var
838 838 arr.conj arr.flat arr.prod arr.size arr.view
839 839 arr.conjugate arr.flatten arr.ptp arr.sort
840 840
841 841
842 842 Operating with arrays
843 843 ---------------------
844 844
845 845 Arrays support all regular arithmetic operators, and the numpy library
846 846 also contains a complete collection of basic mathematical functions that
847 847 operate on arrays. It is important to remember that in general, all
848 848 operations with arrays are applied *element-wise*, i.e., are applied to
849 849 all the elements of the array at the same time. Consider for example:
850 850
851 851 In[40]:
852 852
853 853 .. code:: python
854 854
855 855 arr1 = np.arange(4)
856 856 arr2 = np.arange(10, 14)
857 857 print arr1, '+', arr2, '=', arr1+arr2
858 858
859 859 .. parsed-literal::
860 860
861 861 [0 1 2 3] + [10 11 12 13] = [10 12 14 16]
862 862
863 863
864 864 Importantly, you must remember that even the multiplication operator is
865 865 by default applied element-wise; it is *not* the matrix multiplication
866 866 from linear algebra (as is the case in Matlab, for example):
867 867
868 868 In[41]:
869 869
870 870 .. code:: python
871 871
872 872 print arr1, '*', arr2, '=', arr1*arr2
873 873
874 874 .. parsed-literal::
875 875
876 876 [0 1 2 3] * [10 11 12 13] = [ 0 11 24 39]
877 877
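For comparison, the inner (matrix-style) product of the same two arrays is available through ``np.dot``. A brief sketch, re-creating ``arr1`` and ``arr2`` so that it runs on its own:

```python
import numpy as np

arr1 = np.arange(4)
arr2 = np.arange(10, 14)

elementwise = arr1 * arr2        # one product per position
inner = np.dot(arr1, arr2)       # sum of those products, a single number
print(elementwise)
print(inner)                     # 0*10 + 1*11 + 2*12 + 3*13 = 74
```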
878 878
879 879 While this means that in principle arrays must always match in their
880 880 dimensionality in order for an operation to be valid, numpy will
881 881 *broadcast* dimensions when possible. For example, suppose that you want
882 882 to add the number 1.5 to ``arr1``; the following would be a valid way to
883 883 do it:
884 884
885 885 In[42]:
886 886
887 887 .. code:: python
888 888
889 889 arr1 + 1.5*np.ones(4)
890 890
891 891 Out[42]:
892 892
893 893 .. parsed-literal::
894 894
895 895 array([ 1.5, 2.5, 3.5, 4.5])
896 896
897 897 But thanks to numpy's broadcasting rules, the following is equally
898 898 valid:
899 899
900 900 In[43]:
901 901
902 902 .. code:: python
903 903
904 904 arr1 + 1.5
905 905
906 906 Out[43]:
907 907
908 908 .. parsed-literal::
909 909
910 910 array([ 1.5, 2.5, 3.5, 4.5])
911 911
912 912 In this case, numpy looked at both operands and saw that the first
913 913 (``arr1``) was a one-dimensional array of length 4 and the second was a
914 914 scalar, considered a zero-dimensional object. The broadcasting rules
915 915 allow numpy to:
916 916
917 917 - *create* new dimensions of length 1 (since this doesn't change the
918 918 size of the array)
919 919 - 'stretch' a dimension of length 1 that needs to be matched to a
920 920 dimension of a different size.
921 921
922 922 So in the above example, the scalar 1.5 is effectively:
923 923
924 924 - first 'promoted' to a 1-dimensional array of length 1
925 925 - then, this array is 'stretched' to length 4 to match the dimension of
926 926 ``arr1``.
927 927
928 928 After these two operations are complete, the addition can proceed as now
929 929 both operands are one-dimensional arrays of length 4.
930 930
931 931 This broadcasting behavior is in practice enormously powerful,
932 932 especially because when numpy broadcasts to create new dimensions or to
933 933 'stretch' existing ones, it doesn't actually replicate the data. In the
934 934 example above the operation is carried out *as if* the 1.5 was a 1-d array
935 935 with 1.5 in all of its entries, but no actual array was ever created.
936 936 This can save lots of memory in cases when the arrays in question are
937 937 large and can have significant performance implications.
938 938
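One way to see that no data is replicated is to inspect the strides of a broadcast view. The sketch below uses ``np.broadcast_to``, which is assumed to be available (it is only present in newer NumPy releases); a stride of 0 means that every position reads the same memory location:

```python
import numpy as np

stretched = np.broadcast_to(1.5, (4,))   # read-only view, the value is stored once
print(stretched)
print(stretched.strides)                 # a stride of 0 along the stretched axis
```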
939 939 The general rule is: when operating on two arrays, NumPy compares their
940 940 shapes element-wise. It starts with the trailing dimensions, and works
941 941 its way forward, creating dimensions of length 1 as needed. Two
942 942 dimensions are considered compatible when
943 943
944 944 - they are equal to begin with, or
945 945 - one of them is 1; in this case numpy will do the 'stretching' to make
946 946 them equal.
947 947
948 948 If these conditions are not met, a
949 949 ``ValueError: operands could not be broadcast together`` exception is
950 950 raised, indicating that the arrays have incompatible shapes. The size of
951 951 the resulting array is the maximum size along each dimension of the input arrays.
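The compatibility rule can be written out in a few lines of plain Python. This is only an illustrative sketch: the helper name ``broadcast_shape`` is ours and is not part of numpy, which performs this check internally.

```python
import numpy as np


def broadcast_shape(shape_a, shape_b):
    """Return the shape obtained by broadcasting two shapes together.

    Hypothetical helper written for illustration; raises ValueError
    when the shapes are incompatible.
    """
    # Create leading dimensions of length 1 so both shapes have equal rank.
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_a == 1 or dim_b == 1:
            # A dimension of length 1 is stretched to match the other one.
            result.append(dim_b if dim_a == 1 else dim_a)
        else:
            raise ValueError(
                "incompatible shapes {} and {}".format(shape_a, shape_b))
    return tuple(result)


# Cross-check against numpy's own answer for a compatible pair.
ours = broadcast_shape((2, 4), (4,))
numpys = np.broadcast(np.empty((2, 4)), np.empty(4)).shape
```

Calling ``broadcast_shape((2, 4), (2,))`` raises a ``ValueError``, because the trailing dimensions 4 and 2 differ and neither is 1.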
952 952
953 953 This shows how the broadcasting rules work in several dimensions:
954 954
955 955 In[44]:
956 956
957 957 .. code:: python
958 958
959 959 b = np.array([2, 3, 4, 5])
960 960 print arr, '\n\n+', b , '\n----------------\n', arr + b
961 961
962 962 .. parsed-literal::
963 963
964 964 [[0 1 2 3]
965 965 [4 5 6 7]]
966 966
967 967 + [2 3 4 5]
968 968 ----------------
969 969 [[ 2 4 6 8]
970 970 [ 6 8 10 12]]
971 971
972 972
973 973 Now, how could you use broadcasting to say add ``[4, 6]`` along the rows
974 974 to ``arr`` above? Simply performing the direct addition will produce the
975 975 error we previously mentioned:
976 976
977 977 In[45]:
978 978
979 979 .. code:: python
980 980
981 981 c = np.array([4, 6])
982 982 arr + c
983 983
984 984 ::
985 985
986 986 ---------------------------------------------------------------------------
987 987 ValueError Traceback (most recent call last)
988 988 /home/fperez/teach/book-math-labtool/<ipython-input-45-62aa20ac1980> in <module>()
989 989 1 c = np.array([4, 6])
990 990 ----> 2 arr + c
991 991
992 992 ValueError: operands could not be broadcast together with shapes (2,4) (2)
993 993
994 994 According to the rules above, the array ``c`` would need to have a
995 995 *trailing* dimension of 1 for the broadcasting to work. It turns out
996 996 that numpy allows you to 'inject' new dimensions anywhere into an array
997 997 on the fly, by indexing it with the special object ``np.newaxis``:
998 998
999 999 In[46]:
1000 1000
1001 1001 .. code:: python
1002 1002
1003 1003 (c[:, np.newaxis]).shape
1004 1004
1005 1005 Out[46]:
1006 1006
1007 1007 .. parsed-literal::
1008 1008
1009 1009 (2, 1)
1010 1010
1011 1011 This is exactly what we need, and indeed it works:
1012 1012
1013 1013 In[47]:
1014 1014
1015 1015 .. code:: python
1016 1016
1017 1017 arr + c[:, np.newaxis]
1018 1018
1019 1019 Out[47]:
1020 1020
1021 1021 .. parsed-literal::
1022 1022
1023 1023 array([[ 4, 5, 6, 7],
1024 1024 [10, 11, 12, 13]])
1025 1025
1026 1026 For the full broadcasting rules, please see the official Numpy docs,
1027 1027 which describe them in detail and with more complex examples.
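As a small complement, the sketch below reuses the arrays ``b`` and ``c`` from the examples above and shows that ``np.newaxis`` can be placed on either side of an existing dimension, and that combining a column with a row produces a full two-dimensional table:

```python
import numpy as np

b = np.array([2, 3, 4, 5])
c = np.array([4, 6])

column = c[:, np.newaxis]  # shape (2, 1): new trailing dimension
row = c[np.newaxis, :]     # shape (1, 2): new leading dimension

# Indexing with np.newaxis is equivalent to an explicit reshape,
# and np.newaxis itself is just an alias for None.
same_as_reshape = np.array_equal(column, c.reshape(2, 1))
newaxis_is_none = np.newaxis is None

# Shapes (2, 1) and (4,) broadcast to (2, 4): every element of c
# is added to every element of b.
table = column + b
```

Here ``table`` holds all pairwise sums of the elements of ``c`` and ``b``, without any explicit loop.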
1028 1028
1029 1029 As we mentioned before, Numpy ships with a full complement of
1030 1030 mathematical functions that work on entire arrays, including logarithms,
1031 1031 exponentials, trigonometric and hyperbolic trigonometric functions, etc.
1032 1032 Furthermore, scipy ships a rich special function library in the
1033 1033 ``scipy.special`` module that includes Bessel, Airy, Fresnel, Laguerre
1034 1034 and other classical special functions. For example, sampling the sine
1035 1035 function at 100 points between :math:`0` and :math:`2\pi` is as simple
1036 1036 as:
1037 1037
1038 1038 In[48]:
1039 1039
1040 1040 .. code:: python
1041 1041
1042 1042 x = np.linspace(0, 2*np.pi, 100)
1043 1043 y = np.sin(x)
1044 1044
1045 1045 Linear algebra in numpy
1046 1046 -----------------------
1047 1047
1048 1048 Numpy ships with a basic linear algebra library, and all arrays have a
1049 1049 ``dot`` method whose behavior is that of the scalar dot product when its
1050 1050 arguments are vectors (one-dimensional arrays) and the traditional
1051 1051 matrix multiplication when one or both of its arguments are
1052 1052 two-dimensional arrays:
1053 1053
1054 1054 In[49]:
1055 1055
1056 1056 .. code:: python
1057 1057
1058 1058 v1 = np.array([2, 3, 4])
1059 1059 v2 = np.array([1, 0, 1])
1060 1060 print v1, '.', v2, '=', v1.dot(v2)
1061 1061
1062 1062 .. parsed-literal::
1063 1063
1064 1064 [2 3 4] . [1 0 1] = 6
1065 1065
1066 1066
1067 1067 Here is a regular matrix-vector multiplication. Note that the array
1068 1068 ``v1`` should be viewed as a *column* vector in traditional linear
1069 1069 algebra notation; numpy makes no distinction between row and column
1070 1070 vectors and simply verifies that the dimensions match the required rules
1071 1071 of matrix multiplication. In this case we have a :math:`2 \times 3`
1072 1072 matrix multiplied by a 3-vector, which produces a 2-vector:
1073 1073
1074 1074 In[50]:
1075 1075
1076 1076 .. code:: python
1077 1077
1078 1078 A = np.arange(6).reshape(2, 3)
1079 1079 print A, 'x', v1, '=', A.dot(v1)
1080 1080
1081 1081 .. parsed-literal::
1082 1082
1083 1083 [[0 1 2]
1084 1084 [3 4 5]] x [2 3 4] = [11 38]
1085 1085
1086 1086
1087 1087 For matrix-matrix multiplication, the same dimension-matching rules must
1088 1088 be satisfied, e.g. consider the difference between :math:`A \times A^T`:
1089 1089
1090 1090 In[51]:
1091 1091
1092 1092 .. code:: python
1093 1093
1094 1094 print A.dot(A.T)
1095 1095
1096 1096 .. parsed-literal::
1097 1097
1098 1098 [[ 5 14]
1099 1099 [14 50]]
1100 1100
1101 1101
1102 1102 and :math:`A^T \times A`:
1103 1103
1104 1104 In[52]:
1105 1105
1106 1106 .. code:: python
1107 1107
1108 1108 print A.T.dot(A)
1109 1109
1110 1110 .. parsed-literal::
1111 1111
1112 1112 [[ 9 12 15]
1113 1113 [12 17 22]
1114 1114 [15 22 29]]
1115 1115
1116 1116
1117 1117 Furthermore, the ``numpy.linalg`` module includes additional
1118 1118 functionality such as determinants, matrix norms, Cholesky, eigenvalue
1119 1119 and singular value decompositions, etc. For even more linear algebra
1120 1120 tools, ``scipy.linalg`` contains the majority of the tools in the
1121 1121 classic LAPACK libraries as well as functions to operate on sparse
1122 1122 matrices. We refer the reader to the Numpy and Scipy documentation for
1123 1123 additional details on these.
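To give a flavor of ``numpy.linalg``, here is a brief sketch on a small symmetric positive-definite matrix. The matrix ``M`` and right-hand side ``rhs`` are arbitrary values chosen for illustration:

```python
import numpy as np

# Arbitrary symmetric positive-definite matrix, for illustration only.
M = np.array([[2.0, 1.0],
              [1.0, 3.0]])
rhs = np.array([3.0, 5.0])

det_M = np.linalg.det(M)             # determinant: 2*3 - 1*1 = 5
x = np.linalg.solve(M, rhs)          # solution of the linear system M x = rhs
eigenvalues = np.linalg.eigvalsh(M)  # real eigenvalues of a symmetric matrix
L = np.linalg.cholesky(M)            # lower-triangular factor with M = L L^T
U, s, Vt = np.linalg.svd(M)          # singular value decomposition

# Each factorization can be verified by rebuilding M from its factors.
cholesky_ok = np.allclose(L.dot(L.T), M)
svd_ok = np.allclose((U * s).dot(Vt), M)
```

Reconstructing the original matrix from its factors, as in the last two lines, is a cheap sanity check worth keeping in exploratory code.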
1124 1124
1125 1125 Reading and writing arrays to disk
1126 1126 ----------------------------------
1127 1127
1128 1128 Numpy lets you read and write arrays into files in a number of ways. In
1129 1129 order to use these tools well, it is critical to understand the
1130 1130 difference between a *text* and a *binary* file containing numerical
1131 1131 data. In a text file, the number :math:`\pi` could be written as
1132 1132 "3.141592653589793", for example: a string of digits that a human can
1133 1133 read, in this case with 15 decimal digits. In contrast, that same number
1134 1134 written to a binary file would be encoded as 8 characters (bytes) that
1135 1135 are not readable by a human but which contain the exact same data that
1136 1136 the variable ``pi`` had in the computer's memory.
1137 1137
1138 1138 The tradeoffs between the two modes are thus:
1139 1139
1140 1140 - Text mode: occupies more space, precision can be lost (if not all
1141 1141 digits are written to disk), but is readable and editable by hand
1142 1142 with a text editor. Can *only* be used for one- and two-dimensional
1143 1143 arrays.
1144 1144
1145 1145 - Binary mode: compact and exact representation of the data in memory,
1146 1146 can't be read or edited by hand. Arrays of any size and
1147 1147 dimensionality can be saved and read without loss of information.
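The precision tradeoff is easy to observe directly. The sketch below writes the same array in both modes and reads it back; the file names and the scratch directory are arbitrary choices made for this illustration:

```python
import os
import tempfile

import numpy as np

values = np.array([np.pi, np.e, 1.0 / 3.0])

# Scratch directory and file names are arbitrary, chosen for this example.
workdir = tempfile.mkdtemp()
text_path = os.path.join(workdir, 'values.txt')
binary_path = os.path.join(workdir, 'values.npy')

# Text mode: only three significant digits are written to disk.
np.savetxt(text_path, values, fmt='%.2e')
from_text = np.loadtxt(text_path)

# Binary mode: the in-memory bytes are stored as they are.
np.save(binary_path, values)
from_binary = np.load(binary_path)

text_is_exact = np.array_equal(values, from_text)      # digits were dropped
binary_is_exact = np.array_equal(values, from_binary)  # bit-for-bit identical
max_text_error = np.abs(values - from_text).max()
```

With a more generous format such as ``fmt='%.18e'`` the text round trip loses far less, at the cost of a larger file.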
1148 1148
1149 1149 First, let's see how to read and write arrays in text mode. The
1150 1150 ``np.savetxt`` function saves an array to a text file, with options to
1151 1151 control the precision, separators and even adding a header:
1152 1152
1153 1153 In[53]:
1154 1154
1155 1155 .. code:: python
1156 1156
1157 1157 arr = np.arange(10).reshape(2, 5)
1158 1158 np.savetxt('test.out', arr, fmt='%.2e', header="My dataset")
1159 1159 !cat test.out
1160 1160
1161 1161 .. parsed-literal::
1162 1162
1163 1163 # My dataset
1164 1164 0.00e+00 1.00e+00 2.00e+00 3.00e+00 4.00e+00
1165 1165 5.00e+00 6.00e+00 7.00e+00 8.00e+00 9.00e+00
1166 1166
1167 1167
1168 1168 And this same type of file can then be read with the matching
1169 1169 ``np.loadtxt`` function:
1170 1170
1171 1171 In[54]:
1172 1172
1173 1173 .. code:: python
1174 1174
1175 1175 arr2 = np.loadtxt('test.out')
1176 1176 print arr2
1177 1177
1178 1178 .. parsed-literal::
1179 1179
1180 1180 [[ 0. 1. 2. 3. 4.]
1181 1181 [ 5. 6. 7. 8. 9.]]
1182 1182
1183 1183
1184 1184 For binary data, Numpy provides the ``np.save`` and ``np.savez``
1185 1185 routines. The first saves a single array to a file with ``.npy``
1186 1186 extension, while the latter can be used to save a *group* of arrays into
1187 1187 a single file with ``.npz`` extension. The files created with these
1188 1188 routines can then be read with the ``np.load`` function.
1189 1189
1190 1190 Let us first see how to use the simpler ``np.save`` function to save a
1191 1191 single array:
1192 1192
1193 1193 In[55]:
1194 1194
1195 1195 .. code:: python
1196 1196
1197 1197 np.save('test.npy', arr2)
1198 1198 # Now we read this back
1199 1199 arr2n = np.load('test.npy')
1200 1200 # Let's see if any element is non-zero in the difference.
1201 1201 # A value of True would be a problem.
1202 1202 print 'Any differences?', np.any(arr2-arr2n)
1203 1203
1204 1204 .. parsed-literal::
1205 1205
1206 1206 Any differences? False
1207 1207
1208 1208
1209 1209 Now let us see how the ``np.savez`` function works. You give it a
1210 1210 filename and either a sequence of arrays or a set of keywords. In the
1211 1211 first mode, the function will automatically name the saved arrays in the
1212 1212 archive as ``arr_0``, ``arr_1``, etc:
1213 1213
1214 1214 In[56]:
1215 1215
1216 1216 .. code:: python
1217 1217
1218 1218 np.savez('test.npz', arr, arr2)
1219 1219 arrays = np.load('test.npz')
1220 1220 arrays.files
1221 1221
1222 1222 Out[56]:
1223 1223
1224 1224 .. parsed-literal::
1225 1225
1226 1226 ['arr_1', 'arr_0']
1227 1227
1228 1228 Alternatively, we can explicitly choose how to name the arrays we save:
1229 1229
1230 1230 In[57]:
1231 1231
1232 1232 .. code:: python
1233 1233
1234 1234 np.savez('test.npz', array1=arr, array2=arr2)
1235 1235 arrays = np.load('test.npz')
1236 1236 arrays.files
1237 1237
1238 1238 Out[57]:
1239 1239
1240 1240 .. parsed-literal::
1241 1241
1242 1242 ['array2', 'array1']
1243 1243
1244 1244 The object returned by ``np.load`` from an ``.npz`` file works like a
1245 1245 dictionary, though you can also access its constituent files by
1246 1246 attribute using its special ``.f`` field; this is best illustrated with
1247 1247 an example with the ``arrays`` object from above:
1248 1248
1249 1249 In[58]:
1250 1250
1251 1251 .. code:: python
1252 1252
1253 1253 print 'First row of first array:', arrays['array1'][0]
1254 1254 # This is an equivalent way to get the same field
1255 1255 print 'First row of first array:', arrays.f.array1[0]
1256 1256
1257 1257 .. parsed-literal::
1258 1258
1259 1259 First row of first array: [0 1 2 3 4]
1260 1260 First row of first array: [0 1 2 3 4]
1261 1261
1262 1262
1263 1263 This ``.npz`` format is a very convenient way to package compactly and
1264 1264 without loss of information, into a single file, a group of related
1265 1265 arrays that pertain to a specific problem. At some point, however, the
1266 1266 complexity of your dataset may be such that the optimal approach is to
1267 1267 use one of the standard formats in scientific data processing that have
1268 1268 been designed to handle complex datasets, such as NetCDF or HDF5.
1269 1269
1270 1270 Fortunately, there are tools for manipulating these formats in Python,
1271 1271 and for storing data in other ways such as databases. A complete
1272 1272 survey of the possibilities is beyond the scope of this discussion,
1273 1273 but we at least mention the following, which are of particular
1274 1274 interest for scientific users:
1275 1275
1276 1276 - The ``scipy.io`` module contains routines to read and write Matlab
1277 1277 files in ``.mat`` format and files in the NetCDF format that is
1278 1278 widely used in certain scientific disciplines.
1279 1279
1280 1280 - For manipulating files in the HDF5 format, there are two excellent
1281 1281 options in Python: The PyTables project offers a high-level, object
1282 1282 oriented approach to manipulating HDF5 datasets, while the h5py
1283 1283 project offers a more direct mapping to the standard HDF5 library
1284 1284 interface. Both are excellent tools; if you need to work with HDF5
1285 1285 datasets you should read some of their documentation and examples and
1286 1286 decide which approach is a better match for your needs.
1287 1287
1288 1288
1289 1289
1290 1290 High quality data visualization with Matplotlib
1291 1291 ===============================================
1292 1292
1293 1293 The `matplotlib <http://matplotlib.sf.net>`_ library is a powerful tool
1294 1294 capable of producing complex publication-quality figures with fine
1295 1295 layout control in two and three dimensions; here we will only provide a
1296 1296 minimal self-contained introduction to its usage that covers the
1297 1297 functionality needed for the rest of the book. We encourage the reader
1298 1298 to read the tutorials included with the matplotlib documentation as well
1299 1299 as to browse its extensive gallery of examples that include source code.
1300 1300
1301 1301 Just as we typically use the shorthand ``np`` for Numpy, we will use
1302 1302 ``plt`` for the ``matplotlib.pyplot`` module where the easy-to-use
1303 1303 plotting functions reside (the library contains a rich object-oriented
1304 1304 architecture that we don't have the space to discuss here):
1305 1305
1306 1306 In[59]:
1307 1307
1308 1308 .. code:: python
1309 1309
1310 1310 import matplotlib.pyplot as plt
1311 1311
1312 1312 The most frequently used function is simply called ``plot``. Here is how
1313 1313 you can make a simple plot of :math:`\sin(x)` for
1314 1314 :math:`x \in [0, 2\pi]` with labels and a grid (we use the semicolon in
1315 1315 the last line to suppress the display of some information that is
1316 1316 unnecessary right now):
1317 1317
1318 1318 In[60]:
1319 1319
1320 1320 .. code:: python
1321 1321
1322 1322 x = np.linspace(0, 2*np.pi)
1323 1323 y = np.sin(x)
1324 1324 plt.plot(x,y, label='sin(x)')
1325 1325 plt.legend()
1326 1326 plt.grid()
1327 1327 plt.title('Harmonic')
1328 1328 plt.xlabel('x')
1329 1329 plt.ylabel('y');
1330 1330
1331 .. image:: tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_01.svg
1331 .. image:: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_01.svg
1332 1332
1333 1333 You can control the style, color and other properties of the lines and markers,
1334 1334 for example:
1335 1335
1336 1336 In[61]:
1337 1337
1338 1338 .. code:: python
1339 1339
1340 1340 plt.plot(x, y, linewidth=2);
1341 1341
1342 .. image:: tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_02.svg
1342 .. image:: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_02.svg
1343 1343
1344 1344 In[62]:
1345 1345
1346 1346 .. code:: python
1347 1347
1348 1348 plt.plot(x, y, 'o', markersize=5, color='r');
1349 1349
1350 .. image:: tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_03.svg
1350 .. image:: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_03.svg
1351 1351
1352 1352 We will now see how to create a few other common plot types, such as a
1353 1353 simple error plot:
1354 1354
1355 1355 In[63]:
1356 1356
1357 1357 .. code:: python
1358 1358
1359 1359 # example data
1360 1360 x = np.arange(0.1, 4, 0.5)
1361 1361 y = np.exp(-x)
1362 1362
1363 1363 # example variable error bar values
1364 1364 yerr = 0.1 + 0.2*np.sqrt(x)
1365 1365 xerr = 0.1 + yerr
1366 1366
1367 1367 # First illustrate basic pyplot interface, using defaults where possible.
1368 1368 plt.figure()
1369 1369 plt.errorbar(x, y, xerr=0.2, yerr=0.4)
1370 1370 plt.title("Simplest errorbars, 0.2 in x, 0.4 in y");
1371 1371
1372 .. image:: tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_04.svg
1372 .. image:: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_04.svg
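The example above computes the arrays ``xerr`` and ``yerr`` but then passes constant error values to ``errorbar``. As a hedged variation, the sketch below passes the arrays themselves so that every point gets its own error bar. The ``Agg`` backend is selected only so that the code runs without a display, and ``fmt='o'`` is our choice of marker style:

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Same example data as in the cell above.
x = np.arange(0.1, 4, 0.5)
y = np.exp(-x)

# Variable error bar values, one per data point.
yerr = 0.1 + 0.2 * np.sqrt(x)
xerr = 0.1 + yerr

fig = plt.figure()
# Passing arrays instead of scalars gives point-by-point error bars.
container = plt.errorbar(x, y, xerr=xerr, yerr=yerr, fmt='o')
plt.title("Variable errorbars")
```

In an interactive session the backend line can be dropped, and the figure will appear as usual.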
1373 1373
1374 1374 A simple log plot:
1375 1375
1376 1376 In[64]:
1377 1377
1378 1378 .. code:: python
1379 1379
1380 1380 x = np.linspace(-5, 5)
1381 1381 y = np.exp(-x**2)
1382 1382 plt.semilogy(x, y);
1383 1383
1384 .. image:: tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_05.svg
1384 .. image:: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_05.svg
1385 1385
1386 1386 A histogram annotated with text inside the plot, using the ``text``
1387 1387 function:
1388 1388
1389 1389 In[65]:
1390 1390
1391 1391 .. code:: python
1392 1392
1393 1393 mu, sigma = 100, 15
1394 1394 x = mu + sigma * np.random.randn(10000)
1395 1395
1396 1396 # the histogram of the data
1397 1397 n, bins, patches = plt.hist(x, 50, normed=1, facecolor='g', alpha=0.75)
1398 1398
1399 1399 plt.xlabel('Smarts')
1400 1400 plt.ylabel('Probability')
1401 1401 plt.title('Histogram of IQ')
1402 1402 # This will put a text fragment at the position given:
1403 1403 plt.text(55, .027, r'$\mu=100,\ \sigma=15$', fontsize=14)
1404 1404 plt.axis([40, 160, 0, 0.03])
1405 1405 plt.grid(True)
1406 1406
1407 .. image:: tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_06.svg
1407 .. image:: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_06.svg
1408 1408
1409 1409 Image display
1410 1410 -------------
1411 1411
1412 1412 The ``imshow`` command can display single or multi-channel images. A
1413 1413 simple array of random numbers, plotted in grayscale:
1414 1414
1415 1415 In[66]:
1416 1416
1417 1417 .. code:: python
1418 1418
1419 1419 from matplotlib import cm
1420 1420 plt.imshow(np.random.rand(5, 10), cmap=cm.gray, interpolation='nearest');
1421 1421
1422 .. image:: tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_07.svg
1422 .. image:: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_07.svg
1423 1423
1424 1424 A real photograph is a multichannel image; ``imshow`` interprets it
1425 1425 correctly:
1426 1426
1427 1427 In[67]:
1428 1428
1429 1429 .. code:: python
1430 1430
1431 1431 img = plt.imread('stinkbug.png')
1432 1432 print 'Dimensions of the array img:', img.shape
1433 1433 plt.imshow(img);
1434 1434
1435 1435 .. parsed-literal::
1436 1436
1437 1437 Dimensions of the array img: (375, 500, 3)
1438 1438
1439 1439
1440 .. image:: tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_08.svg
1440 .. image:: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_08.svg
1441 1441
1442 1442 Simple 3d plotting with matplotlib
1443 1443 ----------------------------------
1444 1444
1445 1445 Note that you must execute at least once in your session:
1446 1446
1447 1447 In[68]:
1448 1448
1449 1449 .. code:: python
1450 1450
1451 1451 from mpl_toolkits.mplot3d import Axes3D
1452 1452
1453 1453 Once this has been done, you can create 3d axes with the
1454 1454 ``projection='3d'`` keyword to ``add_subplot``:
1455 1455
1456 1456 ::
1457 1457
1458 1458 fig = plt.figure()
1459 1459 fig.add_subplot(<other arguments here>, projection='3d')
1460 1460
1461 1461
1462 1462 A simple surface plot:
1463 1463
1464 1464 In[72]:
1465 1465
1466 1466 .. code:: python
1467 1467
1468 1468 from mpl_toolkits.mplot3d.axes3d import Axes3D
1469 1469 from matplotlib import cm
1470 1470
1471 1471 fig = plt.figure()
1472 1472 ax = fig.add_subplot(1, 1, 1, projection='3d')
1473 1473 X = np.arange(-5, 5, 0.25)
1474 1474 Y = np.arange(-5, 5, 0.25)
1475 1475 X, Y = np.meshgrid(X, Y)
1476 1476 R = np.sqrt(X**2 + Y**2)
1477 1477 Z = np.sin(R)
1478 1478 surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.jet,
1479 1479 linewidth=0, antialiased=False)
1480 1480 ax.set_zlim3d(-1.01, 1.01);
1481 1481
1482 .. image:: tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_09.svg
1482 .. image:: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_09.svg
1483 1483
1484 1484 IPython: a powerful interactive environment
1485 1485 ===========================================
1486 1486
1487 1487 A key component of the everyday workflow of most scientific computing
1488 1488 environments is a good interactive environment, that is, a system in
1489 1489 which you can execute small amounts of code and view the results
1490 1490 immediately, combining both printing out data and opening graphical
1491 1491 visualizations. All modern systems for scientific computing, commercial
1492 1492 and open source, include such functionality.
1493 1493
1494 1494 Out of the box, Python also offers a simple interactive shell with very
1495 1495 limited capabilities. But just like the scientific community built Numpy
1496 1496 to provide arrays suited for scientific work (since Python's lists
1497 1497 aren't optimal for this task), it has also developed an interactive
1498 1498 environment much more sophisticated than the built-in one. The `IPython
1499 1499 project <http://ipython.org>`_ offers a set of tools to make productive
1500 1500 use of the Python language, all the while working interactively and with
1501 1501 immediate feedback on your results. The basic tools that IPython provides
1502 1502 are:
1503 1503
1504 1504 1. A powerful terminal shell, with many features designed to increase
1505 1505 the fluidity and productivity of everyday scientific workflows,
1506 1506 including:
1507 1507
1508 1508 - rich introspection of all objects and variables including easy
1509 1509 access to the source code of any function
1510 1510 - powerful and extensible tab completion of variables and filenames,
1511 1511 - tight integration with matplotlib, supporting interactive figures
1512 1512 that don't block the terminal,
1513 1513 - direct access to the filesystem and underlying operating system,
1514 1514 - an extensible system for shell-like commands called 'magics' that
1515 1515 reduce the work needed to perform many common tasks,
1516 1516 - tools for easily running, timing, profiling and debugging your
1517 1517 codes,
1518 1518 - syntax highlighted error messages with much more detail than the
1519 1519 default Python ones,
1520 1520 - logging and access to all previous history of inputs, including
1521 1521 across sessions
1522 1522
1523 1523 2. A Qt console that provides the look and feel of a terminal, but adds
1524 1524 support for inline figures, graphical calltips, a persistent session
1525 1525 that can survive crashes (even segfaults) of the kernel process, and
1526 1526 more.
1527 1527
1528 1528 3. A web-based notebook that can execute code and also contain rich text
1529 1529 and figures, mathematical equations and arbitrary HTML. This notebook
1530 1530 presents a document-like view with cells where code is executed but
1531 1531 that can be edited in-place, reordered, mixed with explanatory text
1532 1532 and figures, etc.
1533 1533
1534 1534 4. A high-performance, low-latency system for parallel computing that
1535 1535 supports the control of a cluster of IPython engines communicating
1536 1536 over a network, with optimizations that minimize unnecessary copying
1537 1537 of large objects (especially numpy arrays).
1538 1538
1539 1539 We will now discuss the highlights of the tools 1-3 above so that you
1540 1540 can make them an effective part of your workflow. The topic of parallel
1541 1541 computing is beyond the scope of this document, but we encourage you to
1542 1542 read the extensive
1543 1543 `documentation <http://ipython.org/ipython-doc/rel-0.12.1/parallel/index.html>`_
1544 1544 and `tutorials <http://minrk.github.com/scipy-tutorial-2011/>`_ on this
1545 1545 available on the IPython website.
1546 1546
1547 1547 The IPython terminal
1548 1548 --------------------
1549 1549
1550 1550 You can start IPython at the terminal simply by typing:
1551 1551
1552 1552 ::
1553 1553
1554 1554 $ ipython
1555 1555
1556 1556 which will provide you some basic information about how to get started
1557 1557 and will then open a prompt labeled ``In [1]:`` for you to start typing.
1558 1558 Here we type :math:`2^{64}` and Python computes the result for us in
1559 1559 exact arithmetic, returning it as ``Out[1]``:
1560 1560
1561 1561 ::
1562 1562
1563 1563 $ ipython
1564 1564 Python 2.7.2+ (default, Oct 4 2011, 20:03:08)
1565 1565 Type "copyright", "credits" or "license" for more information.
1566 1566
1567 1567 IPython 0.13.dev -- An enhanced Interactive Python.
1568 1568 ? -> Introduction and overview of IPython's features.
1569 1569 %quickref -> Quick reference.
1570 1570 help -> Python's own help system.
1571 1571 object? -> Details about 'object', use 'object??' for extra details.
1572 1572
1573 1573 In [1]: 2**64
1574 1574 Out[1]: 18446744073709551616L
1575 1575
1576 1576 The first thing you should know about IPython is that all your inputs
1577 1577 and outputs are saved. There are two variables named ``In`` and ``Out``
1578 1578 which are filled as you work with your results. Furthermore, all outputs
1579 1579 are also saved to auto-created variables of the form ``_NN`` where
1580 1580 ``NN`` is the prompt number, and inputs to ``_iNN``. This allows you to
1581 1581 recover quickly the result of a prior computation by referring to its
1582 1582 number even if you forgot to store it as a variable. For example, later
1583 1583 on in the above session you can do:
1584 1584
1585 1585 ::
1586 1586
1587 1587 In [6]: print _1
1588 1588 18446744073709551616
1589 1589
1590 1590
1591 1591 We strongly recommend that you take a few minutes to read at least the
1592 1592 basic introduction provided by the ``?`` command, and keep in mind that
1593 1593 the ``%quickref`` command at all times can be used as a quick reference
1594 1594 "cheat sheet" of the most frequently used features of IPython.
1595 1595
1596 1596 At the IPython prompt, any valid Python code that you type will be
1597 1597 executed similarly to the default Python shell (though often with more
1598 1598 informative feedback). But IPython is a *superset* of the default
1599 1599 Python shell, so let's have a brief look at some of its additional
1600 1600 functionality.
1601 1601
1602 1602 **Object introspection**
1603 1603
1604 1604 A simple ``?`` command provides a general introduction to IPython, but
1605 1605 as indicated in the banner above, you can use the ``?`` syntax to ask
1606 1606 for details about any object. For example, if we type ``_1?``, IPython
1607 1607 will print the following details about this variable:
1608 1608
1609 1609 ::
1610 1610
1611 1611 In [14]: _1?
1612 1612 Type: long
1613 1613 Base Class: <type 'long'>
1614 1614 String Form:18446744073709551616
1615 1615 Namespace: Interactive
1616 1616 Docstring:
1617 1617 long(x[, base]) -> integer
1618 1618
1619 1619 Convert a string or number to a long integer, if possible. A floating
1620 1620
1621 1621 [etc... snipped for brevity]
1622 1622
1623 1623 If you add a second ``?`` and for any object ``x`` type ``x??``,
1624 1624 IPython will try to provide an even more detailed analysis of the
1625 1625 object, including its syntax-highlighted source code when it can be
1626 1626 found. It's possible that ``x??`` returns the same information as
1627 1627 ``x?``, but in many cases ``x??`` will indeed provide additional
1628 1628 details.
1629 1629
1630 1630 Finally, the ``?`` syntax is also useful to search *namespaces* with
1631 1631 wildcards. Suppose you are wondering if there is any function in Numpy
1632 1632 that may do text-related things; with ``np.*txt*?``, IPython will print
1633 1633 all the names in the ``np`` namespace (our Numpy shorthand) that have
1634 1634 'txt' anywhere in their name:
1635 1635
1636 1636 ::
1637 1637
1638 1638 In [17]: np.*txt*?
1639 1639 np.genfromtxt
1640 1640 np.loadtxt
1641 1641 np.mafromtxt
1642 1642 np.ndfromtxt
1643 1643 np.recfromtxt
1644 1644 np.savetxt
1645 1645
1646 1646
1647 1647 **Tab completion**
1648 1648
1649 1649 IPython makes the tab key work extra hard for you as a way to rapidly
1650 1650 inspect objects and libraries. Whenever you have typed something at the
1651 1651 prompt, by hitting the ``<tab>`` key IPython will try to complete the
1652 1652 rest of the line. For this, IPython will analyze the text you had so far
1653 1653 and try to search for Python data or files that may match the context
1654 1654 you have already provided.
1655 1655
1656 1656 For example, if you type ``np.load`` and hit the ``<tab>`` key, you'll see:
1657 1657
1658 1658 ::
1659 1659
1660 1660 In [21]: np.load<TAB HERE>
1661 1661 np.load np.loads np.loadtxt
1662 1662
1663 1663 so you can quickly find all the load-related functionality in numpy. Tab
1664 1664 completion works even for function arguments, for example consider this
1665 1665 function definition:
1666 1666
1667 1667 ::
1668 1668
1669 1669 In [20]: def f(x, frobinate=False):
1670 1670 ....: if frobinate:
1671 1671 ....: return x**2
1672 1672 ....:
1673 1673
1674 1674 If you now use the ``<tab>`` key after having typed 'fro' you'll get all
1675 1675 valid Python completions, but those marked with ``=`` at the end are
1676 1676 known to be keywords of your function:
1677 1677
1678 1678 ::
1679 1679
1680 1680 In [21]: f(2, fro<TAB HERE>
1681 1681 frobinate= frombuffer fromfunction frompyfunc fromstring
1682 1682 from fromfile fromiter fromregex frozenset
1683 1683
1684 1684 At this point you can add the ``b`` letter and hit ``<tab>`` once more,
1685 1685 and IPython will finish the line for you:
1686 1686
1687 1687 ::
1688 1688
1689 1689 In [21]: f(2, frobinate=
1690 1690
1691 1691 As a beginner, simply get into the habit of using ``<tab>`` after most
1692 1692 objects; it should quickly become second nature as you will see how it
1693 1693 helps keep a fluid workflow and discover useful information. Later on
1694 1694 you can also customize this behavior by writing your own completion
1695 1695 code, if you so desire.
1696 1696
1697 1697 **Matplotlib integration**
1698 1698
1699 1699 One of the most useful features of IPython for scientists is its tight
1700 1700 integration with matplotlib: at the terminal IPython lets you open
1701 1701 matplotlib figures without blocking your typing (which is what happens
1702 1702 if you try to do the same thing at the default Python shell), and in the
1703 1703 Qt console and notebook you can even view your figures embedded in your
1704 1704 workspace next to the code that created them.
1705 1705
1706 1706 The matplotlib support can be either activated when you start IPython by
1707 1707 passing the ``--pylab`` flag, or at any point later in your session by
1708 1708 using the ``%pylab`` command. If you start IPython with ``--pylab``,
1709 1709 you'll see something like this (note the extra message about pylab):
1710 1710
1711 1711 ::
1712 1712
1713 1713 $ ipython --pylab
1714 1714 Python 2.7.2+ (default, Oct 4 2011, 20:03:08)
1715 1715 Type "copyright", "credits" or "license" for more information.
1716 1716
1717 1717 IPython 0.13.dev -- An enhanced Interactive Python.
1718 1718 ? -> Introduction and overview of IPython's features.
1719 1719 %quickref -> Quick reference.
1720 1720 help -> Python's own help system.
1721 1721 object? -> Details about 'object', use 'object??' for extra details.
1722 1722
1723 1723 Welcome to pylab, a matplotlib-based Python environment [backend: Qt4Agg].
1724 1724 For more information, type 'help(pylab)'.
1725 1725
1726 1726 In [1]:
1727 1727
1728 1728 Furthermore, IPython will import ``numpy`` with the ``np`` shorthand,
1729 1729 ``matplotlib.pyplot`` as ``plt``, and it will also load all of the numpy
1730 1730 and pyplot top-level names so that you can directly type something like:
1731 1731
1732 1732 ::
1733 1733
1734 1734 In [1]: x = linspace(0, 2*pi, 200)
1735 1735
1736 1736 In [2]: plot(x, sin(x))
1737 1737 Out[2]: [<matplotlib.lines.Line2D at 0x9e7c16c>]
1738 1738
1739 1739 instead of having to prefix each call with its module shorthand (as we
1740 1740 have been doing in the examples thus far):
1741 1741
1742 1742 ::
1743 1743
1744 1744 In [3]: x = np.linspace(0, 2*np.pi, 200)
1745 1745
1746 1746 In [4]: plt.plot(x, np.sin(x))
1747 1747 Out[4]: [<matplotlib.lines.Line2D at 0x9e900ac>]
1748 1748
1749 1749 This shorthand notation can be a huge time-saver when working
1750 1750 interactively (it's only a few characters, but you are likely to type
1751 1751 them hundreds of times in a session). But we should note that as you
1752 1752 develop persistent scripts and notebooks meant for reuse, it's best to
1753 1753 get in the habit of using the longer notation (known as *fully qualified
1754 1754 names*), as it's clearer where things come from and it makes for more
1755 1755 robust, readable and maintainable code in the long run.
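
As a small illustration of why fully qualified names are more robust, here is a sketch that uses the standard library's ``math`` and ``cmath`` modules as stand-ins for ``numpy`` and ``pyplot`` (an assumption made only to keep the example self-contained). Both modules define ``sqrt``, so importing the bare name twice silently replaces the first definition:

```python
import math
import cmath

# Fully qualified names: it is obvious which sqrt is being called.
real_root = math.sqrt(4.0)        # a float
complex_root = cmath.sqrt(-4.0)   # a complex number

# Bare names: the second import silently replaces the first one.
from math import sqrt
from cmath import sqrt
shadowed_root = sqrt(4.0)         # this now calls cmath.sqrt

print(type(real_root).__name__)      # float
print(type(shadowed_root).__name__)  # complex
```

The same kind of collision can happen with the many names that ``--pylab`` loads, which is one reason to prefer the ``np.`` and ``plt.`` prefixes in code you intend to keep.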
1756 1756
1757 1757 **Access to the operating system and files**
1758 1758
1759 1759 In IPython, you can type ``ls`` to see your files or ``cd`` to change
1760 1760 directories, just like you would at a regular system prompt:
1761 1761
1762 1762 ::
1763 1763
1764 1764 In [2]: cd tests
1765 1765 /home/fperez/ipython/nbconvert/tests
1766 1766
1767 1767 In [3]: ls test.*
1768 1768 test.aux test.html test.ipynb test.log test.out test.pdf test.rst test.tex
1769 1769
1770 1770 Furthermore, if you use the ``!`` at the beginning of a line, any
1771 1771 commands you pass afterwards go directly to the operating system:
1772 1772
1773 1773 ::
1774 1774
1775 1775 In [4]: !echo "Hello IPython"
1776 1776 Hello IPython
1777 1777
1778 1778 IPython offers a useful twist in this feature: it will substitute in the
1779 1779 command the value of any *Python* variable you may have if you prepend
1780 1780 it with a ``$`` sign:
1781 1781
1782 1782 ::
1783 1783
1784 1784 In [5]: message = 'IPython interpolates from Python to the shell'
1785 1785
1786 1786 In [6]: !echo $message
1787 1787 IPython interpolates from Python to the shell
1788 1788
1789 1789 This feature can be extremely useful, as it lets you combine the power
1790 1790 and clarity of Python for complex logic with the immediacy and
1791 1791 familiarity of many shell commands. Additionally, if you start the line
1792 1792 with *two* exclamation marks (``!!``), the output of the command will be
1793 1793 automatically captured as a list of lines, e.g.:
1794 1794
1795 1795 ::
1796 1796
1797 1797 In [10]: !!ls test.*
1798 1798 Out[10]:
1799 1799 ['test.aux',
1800 1800 'test.html',
1801 1801 'test.ipynb',
1802 1802 'test.log',
1803 1803 'test.out',
1804 1804 'test.pdf',
1805 1805 'test.rst',
1806 1806 'test.tex']
1807 1807
1808 1808 As explained above, you can now use this as the variable ``_10``. If you
1809 1809 directly want to capture the output of a system command to a Python
1810 1810 variable, you can use the syntax ``=!``:
1811 1811
1812 1812 ::
1813 1813
1814 1814 In [11]: testfiles =! ls test.*
1815 1815
1816 1816 In [12]: print testfiles
1817 1817 ['test.aux', 'test.html', 'test.ipynb', 'test.log', 'test.out', 'test.pdf', 'test.rst', 'test.tex']
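
Outside of IPython, for instance in a standalone script, the ``!`` syntax is not available. A rough equivalent, sketched here with the standard library's ``subprocess`` module, captures the output of a command as a list of lines. The child command below is a hypothetical stand-in for ``ls test.*``; it simply prints two file names so that the example runs on any system:

```python
import subprocess
import sys

# Hypothetical stand-in for `ls test.*`: a child Python process that
# prints two file names. sys.executable keeps the sketch portable.
command = [sys.executable, "-c", "print('test.aux'); print('test.html')"]

# Run the command and capture its standard output as text.
output = subprocess.check_output(command, universal_newlines=True)

# Split into a list of lines, similar to what `!!` and `=!` give you.
testfiles = output.splitlines()
print(testfiles)
```

Passing the command as a list avoids shell quoting issues; this is a design choice of the sketch, not something IPython requires.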
1818 1818
1819 1819 Finally, the special ``%alias`` command lets you define names that are
1820 1820 shorthands for system commands, so that you can type them without having
1821 1821 to prefix them via ``!`` explicitly (for example, ``ls`` is an alias
1822 1822 that has been predefined for you at startup).
1823 1823
1824 1824 **Magic commands**
1825 1825
1826 1826 IPython has a system for special commands, called 'magics', that let you
1827 1827 control IPython itself and perform many common tasks with a more
1828 1828 shell-like syntax: it uses spaces for delimiting arguments, flags can be
1829 1829 set with dashes and all arguments are treated as strings, so no
1830 1830 additional quoting is required. This kind of syntax is invalid in the
1831 1831 Python language but very convenient for interactive typing (fewer
1832 1832 parentheses, commas and quoting everywhere); IPython distinguishes the
1833 1833 two by detecting lines that start with the ``%`` character.
1834 1834
1835 1835 You can learn more about the magic system by simply typing ``%magic`` at
1836 1836 the prompt, which will give you a short description plus the
1837 1837 documentation on *all* available magics. If you want to see only a
1838 1838 listing of existing magics, you can use ``%lsmagic``:
1839 1839
1840 1840 ::
1841 1841
1842 1842 In [4]: lsmagic
1843 1843 Available magic functions:
1844 1844 %alias %autocall %autoindent %automagic %bookmark %c %cd %colors %config %cpaste
1845 1845 %debug %dhist %dirs %doctest_mode %ds %ed %edit %env %gui %hist %history
1846 1846 %install_default_config %install_ext %install_profiles %load_ext %loadpy %logoff %logon
1847 1847 %logstart %logstate %logstop %lsmagic %macro %magic %notebook %page %paste %pastebin
1848 1848 %pd %pdb %pdef %pdoc %pfile %pinfo %pinfo2 %pop %popd %pprint %precision %profile
1849 1849 %prun %psearch %psource %pushd %pwd %pycat %pylab %quickref %recall %rehashx
1850 1850 %reload_ext %rep %rerun %reset %reset_selective %run %save %sc %stop %store %sx %tb
1851 1851 %time %timeit %unalias %unload_ext %who %who_ls %whos %xdel %xmode
1852 1852
1853 1853 Automagic is ON, % prefix NOT needed for magic functions.
1854 1854
1855 1855 Note how the example above omits the explicit ``%`` marker and simply
1856 1856 uses ``lsmagic``. When the 'automagic' feature is on (which it is by
1857 1857 default), you can omit the ``%`` marker as long as there is no
1858 1858 ambiguity with a Python variable of the same name.
1859 1859
1860 1860 **Running your code**
1861 1861
1862 1862 While it's easy to type a few lines of code in IPython, for any
1863 1863 long-lived work you should keep your code in Python scripts (or in
1864 1864 IPython notebooks, see below). Suppose you have a script, in this
1865 1865 case trivially simple for the sake of brevity, named ``simple.py``:
1866 1866
1867 1867 ::
1868 1868
1869 1869 In [12]: !cat simple.py
1870 1870 import numpy as np
1871 1871
1872 1872 x = np.random.normal(size=100)
1873 1873
1874 1874 print 'First element of x:', x[0]
1875 1875
1876 1876 The typical workflow with IPython is to use the ``%run`` magic to
1877 1877 execute your script (you can omit the .py extension if you want). When
1878 1878 you run it, the script will execute just as if it had been run at the
1879 1879 system prompt with ``python simple.py`` (though since Python does not
1880 1880 re-execute modules that have already been imported in the session,
1881 1881 repeated imports are essentially free, which can noticeably shorten run
1882 1882 time in some cases):
1883 1883
1884 1884 ::
1885 1885
1886 1886 In [13]: run simple
1887 1887 First element of x: -1.55872256289
1888 1888
1889 1889 Once it completes, all variables defined in it become available for you
1890 1890 to use interactively:
1891 1891
1892 1892 ::
1893 1893
1894 1894 In [14]: x.shape
1895 1895 Out[14]: (100,)
1896 1896
1897 1897 This allows you to plot data, try out ideas, etc., in a
1898 1898 ``%run``/interact/edit cycle that can be very productive. As you start
1899 1899 understanding your problem better you can refine your script further,
1900 1900 incrementally improving it based on the work you do at the IPython
1901 1901 prompt. At any point you can use the ``%hist`` magic to print out your
1902 1902 history without prompts, so that you can copy useful fragments back into
1903 1903 the script.
1904 1904
1905 1905 By default, ``%run`` executes scripts in a completely empty namespace,
1906 1906 to better mimic how they would execute at the system prompt with plain
1907 1907 Python. But if you use the ``-i`` flag, the script will also see your
1908 1908 interactively defined variables. This lets you edit in a script larger
1909 1909 amounts of code that still behave as if you had typed them at the
1910 1910 IPython prompt.
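
To make the difference between the two modes concrete, here is a loose analogy using the standard library's ``runpy`` module. This is only a sketch of the idea, not a description of how ``%run`` is implemented; the script name ``needs_scale.py`` and the variable ``scale`` are hypothetical:

```python
import os
import runpy
import tempfile

# Hypothetical script that relies on a name, `scale`, it never defines.
source = "x = [scale * i for i in range(4)]\n"

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "needs_scale.py")
    with open(path, "w") as handle:
        handle.write(source)

    # Fresh namespace (like plain %run): the script cannot see `scale`.
    try:
        runpy.run_path(path)
        fresh_ok = True
    except NameError:
        fresh_ok = False

    # Seeded namespace (loosely like %run -i): `scale` is supplied up front.
    result = runpy.run_path(path, init_globals={"scale": 10})

print(fresh_ok)
print(result["x"])
```

The first run fails with a ``NameError`` because nothing defines ``scale``, while the second run succeeds and returns the script's variables in a dictionary.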
1911 1911
1912 1912 You can also get a summary of the time taken by your script with the
1913 1913 ``-t`` flag; consider a different script ``randsvd.py`` that takes a bit
1914 1914 longer to run:
1915 1915
1916 1916 ::
1917 1917
1918 1918 In [21]: run -t randsvd.py
1919 1919
1920 1920 IPython CPU timings (estimated):
1921 1921 User : 0.38 s.
1922 1922 System : 0.04 s.
1923 1923 Wall time: 0.34 s.
1924 1924
1925 1925 ``User`` is the time spent by the computer executing your code, while
1926 1926 ``System`` is the time the operating system had to work on your behalf,
1927 1927 doing things like memory allocation that are needed by your code but
1928 1928 that you didn't explicitly program and that happen inside the kernel.
1929 1929 The ``Wall time`` is the time on a 'clock on the wall' between the start
1930 1930 and end of your program.
1931 1931
1932 1932 If ``Wall > User+System``, your code is most likely waiting idle for
1933 1933 certain periods. That could be waiting for data to arrive from a remote
1934 1934 source or perhaps because the operating system has to swap large amounts
1935 1935 of virtual memory. If you know that your code doesn't explicitly wait
1936 1936 for remote data to arrive, you should investigate further to identify
1937 1937 possible ways of improving the performance profile.
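
You can observe the same three quantities from plain Python. The sketch below uses ``os.times`` and ``time.perf_counter`` from the standard library (it does not reproduce how ``%run -t`` computes its estimates) and deliberately sleeps so that the wall time exceeds the CPU time:

```python
import os
import time

cpu_start = os.times()
wall_start = time.perf_counter()

total = sum(i * i for i in range(200000))  # CPU-bound work
time.sleep(0.2)                            # idle waiting, uses no CPU

cpu_end = os.times()
wall = time.perf_counter() - wall_start
user = cpu_end.user - cpu_start.user        # time spent running our code
system = cpu_end.system - cpu_start.system  # time spent in the kernel

print("User  : %.2f s" % user)
print("System: %.2f s" % system)
print("Wall  : %.2f s" % wall)
```

Because of the call to ``time.sleep``, the wall time here is larger than the sum of user and system time, which is the situation described above.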
1938 1938
1939 1939 If you only want to time how long a single statement takes, you don't
1940 1940 need to put it into a script as you can use the ``%timeit`` magic, which
1941 1941 uses Python's ``timeit`` module to very carefully measure timing data;
1942 1942 ``timeit`` can measure even short statements that execute extremely
1943 1943 fast:
1944 1944
1945 1945 ::
1946 1946
1947 1947 In [27]: %timeit a=1
1948 1948 10000000 loops, best of 3: 23 ns per loop
1949 1949
1950 1950 and for code that runs longer, it automatically adjusts so the overall
1951 1951 measurement doesn't take too long:
1952 1952
1953 1953 ::
1954 1954
1955 1955 In [28]: %timeit np.linalg.svd(x)
1956 1956 1 loops, best of 3: 310 ms per loop
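
The ``timeit`` module can also be called directly from a script. In this minimal sketch, ``Timer.autorange`` (available in Python 3.6 and later) picks a loop count so that one measurement takes at least about 0.2 seconds, which is similar in spirit to the automatic adjustment described above; the statement being timed is an arbitrary example:

```python
import timeit

timer = timeit.Timer("sum(range(100))")

# Let timeit choose how many loops are needed for a measurable duration.
loops, total = timer.autorange()

# Repeat the measurement three times and keep the best (smallest) total.
timings = timer.repeat(repeat=3, number=loops)
best_per_loop = min(timings) / loops

print("%d loops, best of 3: %.3g s per loop" % (loops, best_per_loop))
```

Taking the minimum rather than the mean follows the usual ``timeit`` advice: slower runs mostly reflect interference from other processes rather than the code being measured.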
1957 1957
1958 1958 The ``%run`` magic has still more options for debugging and profiling;
1959 1959 you should read its documentation for many useful details (as
1960 1960 always, just type ``%run?``).
1961 1961
1962 1962 The graphical Qt console
1963 1963 ------------------------
1964 1964
1965 1965 If you type at the system prompt (see the IPython website for
1966 1966 installation details, as this requires some additional libraries):
1967 1967
1968 1968 ::
1969 1969
1970 1970 $ ipython qtconsole
1971 1971
1972 1972 instead of opening in a terminal as before, IPython will start a
1973 1973 graphical console that at first sight appears just like a terminal, but
1974 1974 which is in fact much more capable than a text-only terminal. This is a
1975 1975 specialized terminal designed for interactive scientific work: it
1976 1976 supports full multi-line editing with color highlighting and graphical
1977 1977 calltips for functions, it can keep multiple IPython sessions open
1978 1978 simultaneously in tabs, and when scripts run it can display the figures
1979 1979 inline directly in the work area.
1980 1980
1981 1981 .. raw:: html
1982 1982
1983 1983 <center>
1984 1984
1985 1985 .. raw:: html
1986 1986
1987 1987 </center>
1988 1988
1989 1989
1990 1990 % This cell is for the pdflatex output only
1991 1991 \begin{figure}[htbp]
1992 1992 \centering
1993 1993 \includegraphics[width=3in]{ipython_qtconsole2.png}
1994 1994 \caption{The IPython Qt console: a lightweight terminal for scientific exploration, with code, results and graphics in a single environment.}
1995 1995 \end{figure}
1996 1996 The Qt console accepts the same ``--pylab`` startup flags as the
1997 1997 terminal, but you can additionally supply the value ``--pylab inline``,
1998 1998 which enables the support for inline graphics shown in the figure. This
1999 1999 is ideal for keeping all the code and figures in the same session, given
2000 2000 that the console can save the output of your entire session to HTML or
2001 2001 PDF.
2002 2002
2003 2003 Since the Qt console makes it far more convenient than the terminal to
2004 2004 edit blocks of code with multiple lines, in this environment it's worth
2005 2005 knowing about the ``%loadpy`` magic function. ``%loadpy`` takes a path
2006 2006 to a local file or remote URL, fetches its contents, and puts it in the
2007 2007 work area for you to further edit and execute. It can be an extremely
2008 2008 fast and convenient way of loading code from local disk or remote
2009 2009 examples from sites such as the `Matplotlib
2010 2010 gallery <http://matplotlib.sourceforge.net/gallery.html>`_.
2011 2011
2012 2012 Other than its enhanced capabilities for code and graphics, all of the
2013 2013 features of IPython we've explained before remain functional in this
2014 2014 graphical console.
2015 2015
2016 2016 The IPython Notebook
2017 2017 --------------------
2018 2018
2019 2019 The third way to interact with IPython, in addition to the terminal and
2020 2020 graphical Qt console, is a powerful web interface called the "IPython
2021 2021 Notebook". If you run at the system console (you can omit the ``pylab``
2022 2022 flags if you don't need plotting support):
2023 2023
2024 2024 ::
2025 2025
2026 2026 $ ipython notebook --pylab inline
2027 2027
2028 2028 IPython will start a process that runs a web server on your local
2029 2029 machine and to which a web browser can connect. The Notebook is a
2030 2030 workspace that lets you execute code in blocks called 'cells' and
2031 2031 displays any results and figures, but which can also contain arbitrary
2032 2032 text (including LaTeX-formatted mathematical expressions) and any rich
2033 2033 media that a modern web browser is capable of displaying.
2034 2034
2035 2035 .. raw:: html
2036 2036
2037 2037 <center>
2038 2038
2039 2039 .. raw:: html
2040 2040
2041 2041 </center>
2042 2042
2043 2043
2044 2044 % This cell is for the pdflatex output only
2045 2045 \begin{figure}[htbp]
2046 2046 \centering
2047 2047 \includegraphics[width=3in]{ipython-notebook-specgram-2.png}
2048 2048 \caption{The IPython Notebook: text, equations, code, results, graphics and other multimedia in an open format for scientific exploration and collaboration}
2049 2049 \end{figure}
2050 2050 In fact, this document was written as a Notebook, and only exported to
2051 2051 LaTeX for printing. Inside of each cell, all the features of IPython
2052 2052 that we have discussed before remain functional, since ultimately this
2053 2053 web client is communicating with the same IPython code that runs in the
2054 2054 terminal. But this interface is a much richer and more powerful
2055 2055 environment for maintaining long-term "live and executable" scientific
2056 2056 documents.
2057 2057
2058 2058 Notebook environments have existed in commercial systems like
2059 2059 Mathematica(TM) and Maple(TM) for a long time; in the open source world
2060 2060 the `Sage <http://sagemath.org>`_ project blazed this particular trail
2061 2061 starting in 2006, and now we bring all the features that have made
2062 2062 IPython such a widely used tool to a Notebook model.
2063 2063
2064 2064 Since the Notebook runs as a web application, it is possible to
2065 2065 configure it for remote access, letting you run your computations on a
2066 2066 persistent server close to your data, which you can then access remotely
2067 2067 from any browser-equipped computer. We encourage you to read the
2068 2068 extensive documentation provided by the IPython project for details on
2069 2069 how to do this and many more features of the notebook.
2070 2070
2071 2071 Finally, as we said earlier, IPython also has a high-level and
2072 2072 easy-to-use set of libraries for parallel computing that let you control
2073 2073 (interactively if desired) not just one IPython but an entire cluster of
2074 2074 'IPython engines'. Unfortunately a detailed discussion of these tools is
2075 2075 beyond the scope of this text, but should you need to parallelize your
2076 2076 analysis codes, a quick read of the tutorials and examples provided at
2077 2077 the IPython site may prove fruitful.