@@ -1,2077 +1,2077 b'' | |||
|
1 | 1 | An Introduction to the Scientific Python Ecosystem |
|
2 | 2 | ================================================== |
|
3 | 3 | |
|
4 | 4 | While the Python language is an excellent tool for general-purpose |
|
5 | 5 | programming, with a highly readable syntax, rich and powerful data types |
|
6 | 6 | (strings, lists, sets, dictionaries, arbitrary length integers, etc) and |
|
7 | 7 | a very comprehensive standard library, it was not designed specifically |
|
8 | 8 | for mathematical and scientific computing. Neither the language nor its |
|
9 | 9 | standard library has facilities for the efficient representation of |
|
10 | 10 | multidimensional datasets, tools for linear algebra and general matrix |
|
11 | 11 | manipulations (an essential building block of virtually all technical |
|
12 | 12 | computing), nor any data visualization facilities. |
|
13 | 13 | |
|
14 | 14 | In particular, Python lists are very flexible containers that can be |
|
15 | 15 | nested arbitrarily deep and which can hold any Python object in them, |
|
16 | 16 | but they are poorly suited to efficiently representing common mathematical |
|
17 | 17 | constructs like vectors and matrices. In contrast, much of our modern |
|
18 | 18 | heritage of scientific computing has been built on top of libraries |
|
19 | 19 | written in the Fortran language, which has native support for vectors |
|
20 | 20 | and matrices as well as a library of mathematical functions that can |
|
21 | 21 | efficiently operate on entire arrays at once. |
|
22 | 22 | |
|
23 | 23 | Scientific Python: a collaboration of projects built by scientists |
|
24 | 24 | ------------------------------------------------------------------ |
|
25 | 25 | |
|
26 | 26 | The scientific community has developed a set of related Python libraries |
|
27 | 27 | that provide powerful array facilities, linear algebra, numerical |
|
28 | 28 | algorithms, data visualization and more. In this appendix, we will |
|
29 | 29 | briefly outline the tools most frequently used for this purpose, that |
|
30 | 30 | make "Scientific Python" something far more powerful than the Python |
|
31 | 31 | language alone. |
|
32 | 32 | |
|
33 | 33 | For reasons of space, we can only describe in some detail the central |
|
34 | 34 | Numpy library, but below we provide links to the websites of each |
|
35 | 35 | project where you can read their documentation in more detail. |
|
36 | 36 | |
|
37 | 37 | First, let's look at an overview of the basic tools that most scientists |
|
38 | 38 | use in daily research with Python. The core of this ecosystem is |
|
39 | 39 | composed of: |
|
40 | 40 | |
|
41 | 41 | - Numpy: the basic library that most others depend on. It provides a |

42 | 42 | powerful array type that can represent multidimensional datasets of |
|
43 | 43 | many different kinds and that supports arithmetic operations. Numpy |
|
44 | 44 | also provides a library of common mathematical functions, basic |
|
45 | 45 | linear algebra, random number generation and Fast Fourier Transforms. |
|
46 | 46 | Numpy can be found at `numpy.scipy.org <http://numpy.scipy.org>`_. |
|
47 | 47 | |
|
48 | 48 | - Scipy: a large collection of numerical algorithms that operate on |
|
49 | 49 | numpy arrays and provide facilities for many common tasks in |
|
50 | 50 | scientific computing, including dense and sparse linear algebra |
|
51 | 51 | support, optimization, special functions, statistics, n-dimensional |
|
52 | 52 | image processing, signal processing and more. Scipy can be found at |
|
53 | 53 | `scipy.org <http://scipy.org>`_. |
|
54 | 54 | |
|
55 | 55 | - Matplotlib: a data visualization library with a strong focus on |
|
56 | 56 | producing high-quality output, it supports a variety of common |
|
57 | 57 | scientific plot types in two and three dimensions, with precise |
|
58 | 58 | control over the final output and format for publication-quality |
|
59 | 59 | results. Matplotlib can also be controlled interactively allowing |
|
60 | 60 | graphical manipulation of your data (zooming, panning, etc) and can |
|
61 | 61 | be used with most modern user interface toolkits. It can be found at |
|
62 | 62 | `matplotlib.sf.net <http://matplotlib.sf.net>`_. |
|
63 | 63 | |
|
64 | 64 | - IPython: while not strictly scientific in nature, IPython is the |
|
65 | 65 | interactive environment in which many scientists spend their time. |
|
66 | 66 | IPython provides a powerful Python shell that integrates tightly with |
|
67 | 67 | Matplotlib and with easy access to the files and operating system, |
|
68 | 68 | and which can execute in a terminal or in a graphical Qt console. |
|
69 | 69 | IPython also has a web-based notebook interface that can combine code |
|
70 | 70 | with text, mathematical expressions, figures and multimedia. It can |
|
71 | 71 | be found at `ipython.org <http://ipython.org>`_. |
|
72 | 72 | |
|
73 | 73 | While each of these tools can be installed separately, in our opinion |
|
74 | 74 | the most convenient way today of accessing them (especially on Windows |
|
75 | 75 | and Mac computers) is to install the `Free Edition of the Enthought |
|
76 | 76 | Python Distribution <http://www.enthought.com/products/epd_free.php>`_ |
|
77 | 77 | which contains all of the above. Other free alternatives on Windows (but not |
|
78 | 78 | on Macs) are `Python(x,y) <http://code.google.com/p/pythonxy>`_ and |
|
79 | 79 | `Christoph Gohlke's packages |
|
80 | 80 | page <http://www.lfd.uci.edu/~gohlke/pythonlibs>`_. |
|
81 | 81 | |
|
82 | 82 | These four 'core' libraries are in practice complemented by a number of |
|
83 | 83 | other tools for more specialized work. We will briefly list here the |
|
84 | 84 | ones that we think are the most commonly needed: |
|
85 | 85 | |
|
86 | 86 | - Sympy: a symbolic manipulation tool that turns a Python session into |
|
87 | 87 | a computer algebra system. It integrates with the IPython notebook, |
|
88 | 88 | rendering results in properly typeset mathematical notation. |
|
89 | 89 | `sympy.org <http://sympy.org>`_. |
|
90 | 90 | |
|
91 | 91 | - Mayavi: sophisticated 3d data visualization; |
|
92 | 92 | `code.enthought.com/projects/mayavi <http://code.enthought.com/projects/mayavi>`_. |
|
93 | 93 | |
|
94 | 94 | - Cython: a bridge language between Python and C, useful both to |
|
95 | 95 | optimize performance bottlenecks in Python and to access C libraries |
|
96 | 96 | directly; `cython.org <http://cython.org>`_. |
|
97 | 97 | |
|
98 | 98 | - Pandas: high-performance data structures and data analysis tools, |
|
99 | 99 | with powerful data alignment and structural manipulation |
|
100 | 100 | capabilities; `pandas.pydata.org <http://pandas.pydata.org>`_. |
|
101 | 101 | |
|
102 | 102 | - Statsmodels: statistical data exploration and model estimation; |
|
103 | 103 | `statsmodels.sourceforge.net <http://statsmodels.sourceforge.net>`_. |
|
104 | 104 | |
|
105 | 105 | - Scikit-learn: general purpose machine learning algorithms with a |
|
106 | 106 | common interface; `scikit-learn.org <http://scikit-learn.org>`_. |
|
107 | 107 | |
|
108 | 108 | - Scikits-image: image processing toolbox; |
|
109 | 109 | `scikits-image.org <http://scikits-image.org>`_. |
|
110 | 110 | |
|
111 | 111 | - NetworkX: analysis of complex networks (in the graph theoretical |
|
112 | 112 | sense); `networkx.lanl.gov <http://networkx.lanl.gov>`_. |
|
113 | 113 | |
|
114 | 114 | - PyTables: management of hierarchical datasets using the |
|
115 | 115 | industry-standard HDF5 format; |
|
116 | 116 | `www.pytables.org <http://www.pytables.org>`_. |
|
117 | 117 | |
|
118 | 118 | Beyond these, for any specific problem you should look on the internet |
|
119 | 119 | first, before starting to write code from scratch. There's a good chance |
|
120 | 120 | that someone, somewhere, has written an open source library that you can |
|
121 | 121 | use for part or all of your problem. |
|
122 | 122 | |
|
123 | 123 | A note about the examples below |
|
124 | 124 | ------------------------------- |
|
125 | 125 | |
|
126 | 126 | In all subsequent examples, you will see blocks of input code, followed |
|
127 | 127 | by the results of the code if the code generated output. This output may |
|
128 | 128 | include text, graphics and other result objects. These blocks of input |
|
129 | 129 | can be pasted into your interactive IPython session or notebook for you |
|
130 | 130 | to execute. In the print version of this document, a thin vertical bar |
|
131 | 131 | on the left of the blocks of input and output shows which blocks go |
|
132 | 132 | together. |
|
133 | 133 | |
|
134 | 134 | If you are reading this text as an actual IPython notebook, you can |
|
135 | 135 | press ``Shift-Enter`` or use the 'play' button on the toolbar |
|
136 | 136 | (right-pointing triangle) to execute each block of code, known as a |
|
137 | 137 | 'cell' in IPython: |
|
138 | 138 | |
|
139 | 139 | In[71]: |
|
140 | 140 | |
|
141 | 141 | .. code:: python |
|
142 | 142 | |
|
143 | 143 | # This is a block of code, below you'll see its output |
|
144 | 144 | print "Welcome to the world of scientific computing with Python!" |
|
145 | 145 | |
|
146 | 146 | .. parsed-literal:: |
|
147 | 147 | |
|
148 | 148 | Welcome to the world of scientific computing with Python! |
|
149 | 149 | |
|
150 | 150 | |
|
151 | 151 | Motivation: the trapezoidal rule |
|
152 | 152 | ================================ |
|
153 | 153 | |
|
154 | 154 | In subsequent sections we'll provide a basic introduction to the nuts |
|
155 | 155 | and bolts of the basic scientific python tools; but we'll first motivate |
|
156 | 156 | them with a brief example that illustrates what you can do in a few lines |
|
157 | 157 | with these tools. For this, we will use the simple problem of |
|
158 | 158 | approximating a definite integral with the trapezoid rule: |
|
159 | 159 | |
|
160 | 160 | .. math:: |
|
161 | 161 | |
|
162 | 162 | |
|
163 | 163 | \int_{a}^{b} f(x)\, dx \approx \frac{1}{2} \sum_{k=1}^{N} \left( x_{k} - x_{k-1} \right) \left( f(x_{k}) + f(x_{k-1}) \right). |
|
164 | 164 | |
|
165 | 165 | Our task will be to compute this formula for a function such as: |
|
166 | 166 | |
|
167 | 167 | .. math:: |
|
168 | 168 | |
|
169 | 169 | |
|
170 | 170 | f(x) = (x-3)(x-5)(x-7)+85 |
|
171 | 171 | |
|
172 | 172 | integrated between :math:`a=1` and :math:`b=9`. |
|
173 | 173 | |
|
174 | 174 | First, we define the function and sample it evenly between 0 and 10 at |
|
175 | 175 | 200 points: |
|
176 | 176 | |
|
177 | 177 | In[1]: |
|
178 | 178 | |
|
179 | 179 | .. code:: python |
|
180 | 180 | |
|
181 | 181 | def f(x): |
|
182 | 182 | return (x-3)*(x-5)*(x-7)+85 |
|
183 | 183 | |
|
184 | 184 | import numpy as np |
|
185 | 185 | x = np.linspace(0, 10, 200) |
|
186 | 186 | y = f(x) |
|
187 | 187 | |
|
188 | 188 | We select :math:`a` and :math:`b`, our integration limits, and we take |
|
189 | 189 | only a few points in that region to illustrate the error behavior of the |
|
190 | 190 | trapezoid approximation: |
|
191 | 191 | |
|
192 | 192 | In[2]: |
|
193 | 193 | |
|
194 | 194 | .. code:: python |
|
195 | 195 | |
|
196 | 196 | a, b = 1, 9 |
|
197 | 197 | xint = x[np.logical_and(x>=a, x<=b)][::30] |

198 | 198 | yint = y[np.logical_and(x>=a, x<=b)][::30] |
|
199 | 199 | |
|
200 | 200 | Let's plot both the function and the area below it in the trapezoid |
|
201 | 201 | approximation: |
|
202 | 202 | |
|
203 | 203 | In[3]: |
|
204 | 204 | |
|
205 | 205 | .. code:: python |
|
206 | 206 | |
|
207 | 207 | import matplotlib.pyplot as plt |
|
208 | 208 | plt.plot(x, y, lw=2) |
|
209 | 209 | plt.axis([0, 10, 0, 140]) |
|
210 | 210 | plt.fill_between(xint, 0, yint, facecolor='gray', alpha=0.4) |
|
211 | 211 | plt.text(0.5 * (a + b), 30,r"$\int_a^b f(x)dx$", horizontalalignment='center', fontsize=20); |
|
212 | 212 | |
|
213 |
.. image:: tests/ipynbref/IntroNumPy |
|
|
213 | .. image:: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_00.svg | |
|
214 | 214 | |
|
215 | 215 | Compute the integral both at high accuracy and with the trapezoid |
|
216 | 216 | approximation: |
|
217 | 217 | |
|
218 | 218 | In[4]: |
|
219 | 219 | |
|
220 | 220 | .. code:: python |
|
221 | 221 | |
|
222 | 222 | from scipy.integrate import quad, trapz |
|
223 | 223 | integral, error = quad(f, 1, 9) |
|
224 | 224 | trap_integral = trapz(yint, xint) |
|
225 | 225 | print "The integral is: %g +/- %.1e" % (integral, error) |
|
226 | 226 | print "The trapezoid approximation with", len(xint), "points is:", trap_integral |
|
227 | 227 | print "The absolute error is:", abs(integral - trap_integral) |
|
228 | 228 | |
|
229 | 229 | .. parsed-literal:: |
|
230 | 230 | |
|
231 | 231 | The integral is: 680 +/- 7.5e-12 |
|
232 | 232 | The trapezoid approximation with 6 points is: 621.286411141 |
|
233 | 233 | The absolute error is: 58.7135888589 |
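The trapezoid formula given at the start of this section can also be written out directly with array slices, which makes the link between the sum and the code explicit. The following is only a minimal sketch: the helper name ``trapezoid_rule`` and the finer grid ``xfine`` are illustrative choices, not part of the original example. Applied to the same coarse points it should reproduce the value printed above, and on a finer grid covering all of [1, 9] it should land much closer to 680.

```python
import numpy as np

def f(x):
    return (x - 3) * (x - 5) * (x - 7) + 85

def trapezoid_rule(x, y):
    """Composite trapezoid rule for samples y taken at the points x.

    Hypothetical helper written for this sketch; the points may be unevenly spaced.
    """
    widths = x[1:] - x[:-1]      # x_k - x_{k-1}
    heights = y[1:] + y[:-1]     # f(x_k) + f(x_{k-1})
    return 0.5 * np.sum(widths * heights)

# Same coarse sampling as in the example above: every 30th grid point in [1, 9]
x = np.linspace(0, 10, 200)
inside = (x >= 1) & (x <= 9)
xint = x[inside][::30]
coarse = trapezoid_rule(xint, f(xint))

# A finer grid (an assumption of this sketch) that spans the whole interval
xfine = np.linspace(1, 9, 801)
fine = trapezoid_rule(xfine, f(xfine))

print("coarse estimate: %.6f" % coarse)
print("fine estimate:   %.6f" % fine)
```

Note that the coarse points stop short of ``b = 9``, so part of the error in the coarse estimate comes from the uncovered end of the interval rather than from the trapezoid approximation itself.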
|
234 | 234 | |
|
235 | 235 | |
|
236 | 236 | This simple example showed us how, by combining the numpy, scipy and |

237 | 237 | matplotlib libraries, we can illustrate a standard method |

238 | 238 | in elementary calculus with just a few lines of code. We will now |

239 | 239 | discuss in more detail the basic usage of these tools. |
|
240 | 240 | |
|
241 | 241 | NumPy arrays: the right data structure for scientific computing |
|
242 | 242 | =============================================================== |
|
243 | 243 | |
|
244 | 244 | Basics of Numpy arrays |
|
245 | 245 | ---------------------- |
|
246 | 246 | |
|
247 | 247 | We now turn our attention to the Numpy library, which forms the base |
|
248 | 248 | layer for the entire 'scipy ecosystem'. Once you have installed numpy, |
|
249 | 249 | you can import it as |
|
250 | 250 | |
|
251 | 251 | In[5]: |
|
252 | 252 | |
|
253 | 253 | .. code:: python |
|
254 | 254 | |
|
255 | 255 | import numpy |
|
256 | 256 | |
|
257 | 257 | though in this book we will use the common shorthand |
|
258 | 258 | |
|
259 | 259 | In[6]: |
|
260 | 260 | |
|
261 | 261 | .. code:: python |
|
262 | 262 | |
|
263 | 263 | import numpy as np |
|
264 | 264 | |
|
265 | 265 | As mentioned above, the main object provided by numpy is a powerful |
|
266 | 266 | array. We'll start by exploring how the numpy array differs from Python |
|
267 | 267 | lists. We start by creating a simple list and an array with the same |
|
268 | 268 | contents as the list: |
|
269 | 269 | |
|
270 | 270 | In[7]: |
|
271 | 271 | |
|
272 | 272 | .. code:: python |
|
273 | 273 | |
|
274 | 274 | lst = [10, 20, 30, 40] |
|
275 | 275 | arr = np.array([10, 20, 30, 40]) |
|
276 | 276 | |
|
277 | 277 | Elements of a one-dimensional array are accessed with the same syntax as |
|
278 | 278 | a list: |
|
279 | 279 | |
|
280 | 280 | In[8]: |
|
281 | 281 | |
|
282 | 282 | .. code:: python |
|
283 | 283 | |
|
284 | 284 | lst[0] |
|
285 | 285 | |
|
286 | 286 | Out[8]: |
|
287 | 287 | |
|
288 | 288 | .. parsed-literal:: |
|
289 | 289 | |
|
290 | 290 | 10 |
|
291 | 291 | |
|
292 | 292 | In[9]: |
|
293 | 293 | |
|
294 | 294 | .. code:: python |
|
295 | 295 | |
|
296 | 296 | arr[0] |
|
297 | 297 | |
|
298 | 298 | Out[9]: |
|
299 | 299 | |
|
300 | 300 | .. parsed-literal:: |
|
301 | 301 | |
|
302 | 302 | 10 |
|
303 | 303 | |
|
304 | 304 | In[10]: |
|
305 | 305 | |
|
306 | 306 | .. code:: python |
|
307 | 307 | |
|
308 | 308 | arr[-1] |
|
309 | 309 | |
|
310 | 310 | Out[10]: |
|
311 | 311 | |
|
312 | 312 | .. parsed-literal:: |
|
313 | 313 | |
|
314 | 314 | 40 |
|
315 | 315 | |
|
316 | 316 | In[11]: |
|
317 | 317 | |
|
318 | 318 | .. code:: python |
|
319 | 319 | |
|
320 | 320 | arr[2:] |
|
321 | 321 | |
|
322 | 322 | Out[11]: |
|
323 | 323 | |
|
324 | 324 | .. parsed-literal:: |
|
325 | 325 | |
|
326 | 326 | array([30, 40]) |
|
327 | 327 | |
|
328 | 328 | The first difference to note between lists and arrays is that arrays are |
|
329 | 329 | *homogeneous*; i.e. all elements of an array must be of the same type. |
|
330 | 330 | In contrast, lists can contain elements of arbitrary type. For example, |
|
331 | 331 | we can change the last element in our list above to be a string: |
|
332 | 332 | |
|
333 | 333 | In[12]: |
|
334 | 334 | |
|
335 | 335 | .. code:: python |
|
336 | 336 | |
|
337 | 337 | lst[-1] = 'a string inside a list' |
|
338 | 338 | lst |
|
339 | 339 | |
|
340 | 340 | Out[12]: |
|
341 | 341 | |
|
342 | 342 | .. parsed-literal:: |
|
343 | 343 | |
|
344 | 344 | [10, 20, 30, 'a string inside a list'] |
|
345 | 345 | |
|
346 | 346 | but the same can not be done with an array, as we get an error message: |
|
347 | 347 | |
|
348 | 348 | In[13]: |
|
349 | 349 | |
|
350 | 350 | .. code:: python |
|
351 | 351 | |
|
352 | 352 | arr[-1] = 'a string inside an array' |
|
353 | 353 | |
|
354 | 354 | :: |
|
355 | 355 | |
|
356 | 356 | --------------------------------------------------------------------------- |
|
357 | 357 | ValueError Traceback (most recent call last) |
|
358 | 358 | /home/fperez/teach/book-math-labtool/<ipython-input-13-29c0bfa5fa8a> in <module>() |
|
359 | 359 | ----> 1 arr[-1] = 'a string inside an array' |
|
360 | 360 | |
|
361 | 361 | ValueError: invalid literal for long() with base 10: 'a string inside an array' |
|
362 | 362 | |
|
363 | 363 | The information about the type of an array is contained in its *dtype* |
|
364 | 364 | attribute: |
|
365 | 365 | |
|
366 | 366 | In[14]: |
|
367 | 367 | |
|
368 | 368 | .. code:: python |
|
369 | 369 | |
|
370 | 370 | arr.dtype |
|
371 | 371 | |
|
372 | 372 | Out[14]: |
|
373 | 373 | |
|
374 | 374 | .. parsed-literal:: |
|
375 | 375 | |
|
376 | 376 | dtype('int32') |
|
377 | 377 | |
|
378 | 378 | Once an array has been created, its dtype is fixed and it can only store |
|
379 | 379 | elements of the same type. For this example where the dtype is integer, |
|
380 | 380 | if we store a floating point number it will be automatically converted |
|
381 | 381 | into an integer: |
|
382 | 382 | |
|
383 | 383 | In[15]: |
|
384 | 384 | |
|
385 | 385 | .. code:: python |
|
386 | 386 | |
|
387 | 387 | arr[-1] = 1.234 |
|
388 | 388 | arr |
|
389 | 389 | |
|
390 | 390 | Out[15]: |
|
391 | 391 | |
|
392 | 392 | .. parsed-literal:: |
|
393 | 393 | |
|
394 | 394 | array([10, 20, 30, 1]) |
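If the silent truncation shown above is not what you want, one option is to ask for a floating point dtype when the array is created, or to convert an existing array with ``astype``. A small sketch, with illustrative variable names:

```python
import numpy as np

int_arr = np.array([10, 20, 30, 40])                  # integer dtype inferred from the data
float_arr = np.array([10, 20, 30, 40], dtype=float)   # dtype requested explicitly

int_arr[-1] = 1.234      # fractional part is discarded to fit the integer dtype
float_arr[-1] = 1.234    # stored as given

# astype returns a new array with the requested dtype; int_arr itself is unchanged
converted = int_arr.astype(float)

print(int_arr)
print(float_arr)
print(converted.dtype)
```

The dtype of ``int_arr`` stays an integer type throughout; only the new array returned by ``astype`` has a different one.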
|
395 | 395 | |
|
396 | 396 | Above we created an array from an existing list; let us now see |

397 | 397 | other ways in which we can create arrays. A |
|
398 | 398 | common need is to have an array initialized with a constant value, and |
|
399 | 399 | very often this value is 0 or 1 (suitable as starting value for additive |
|
400 | 400 | and multiplicative loops respectively); ``zeros`` creates arrays of all |
|
401 | 401 | zeros, with any desired dtype: |
|
402 | 402 | |
|
403 | 403 | In[16]: |
|
404 | 404 | |
|
405 | 405 | .. code:: python |
|
406 | 406 | |
|
407 | 407 | np.zeros(5, float) |
|
408 | 408 | |
|
409 | 409 | Out[16]: |
|
410 | 410 | |
|
411 | 411 | .. parsed-literal:: |
|
412 | 412 | |
|
413 | 413 | array([ 0., 0., 0., 0., 0.]) |
|
414 | 414 | |
|
415 | 415 | In[17]: |
|
416 | 416 | |
|
417 | 417 | .. code:: python |
|
418 | 418 | |
|
419 | 419 | np.zeros(3, int) |
|
420 | 420 | |
|
421 | 421 | Out[17]: |
|
422 | 422 | |
|
423 | 423 | .. parsed-literal:: |
|
424 | 424 | |
|
425 | 425 | array([0, 0, 0]) |
|
426 | 426 | |
|
427 | 427 | In[18]: |
|
428 | 428 | |
|
429 | 429 | .. code:: python |
|
430 | 430 | |
|
431 | 431 | np.zeros(3, complex) |
|
432 | 432 | |
|
433 | 433 | Out[18]: |
|
434 | 434 | |
|
435 | 435 | .. parsed-literal:: |
|
436 | 436 | |
|
437 | 437 | array([ 0.+0.j, 0.+0.j, 0.+0.j]) |
|
438 | 438 | |
|
439 | 439 | and similarly for ``ones``: |
|
440 | 440 | |
|
441 | 441 | In[19]: |
|
442 | 442 | |
|
443 | 443 | .. code:: python |
|
444 | 444 | |
|
445 | 445 | print '5 ones:', np.ones(5) |
|
446 | 446 | |
|
447 | 447 | .. parsed-literal:: |
|
448 | 448 | |
|
449 | 449 | 5 ones: [ 1. 1. 1. 1. 1.] |
|
450 | 450 | |
|
451 | 451 | |
|
452 | 452 | If we want an array initialized with an arbitrary value, we can create |
|
453 | 453 | an empty array and then use the fill method to put the value we want |
|
454 | 454 | into the array: |
|
455 | 455 | |
|
456 | 456 | In[20]: |
|
457 | 457 | |
|
458 | 458 | .. code:: python |
|
459 | 459 | |
|
460 | 460 | a = np.empty(4) |
|
461 | 461 | a.fill(5.5) |
|
462 | 462 | a |
|
463 | 463 | |
|
464 | 464 | Out[20]: |
|
465 | 465 | |
|
466 | 466 | .. parsed-literal:: |
|
467 | 467 | |
|
468 | 468 | array([ 5.5, 5.5, 5.5, 5.5]) |
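As a possible shortcut, NumPy releases newer than the one this text appears to target also provide ``np.full``, which allocates and fills in one step. The sketch below assumes such a version is installed and compares it with the ``empty`` plus ``fill`` approach:

```python
import numpy as np

# Two-step approach from the text: allocate, then fill in place
a = np.empty(4)
a.fill(5.5)

# One-step alternative (assumes a NumPy version that provides np.full)
b = np.full(4, 5.5)

# A shape tuple and an explicit dtype work the same way as for zeros and ones
c = np.full((2, 3), 7, dtype=int)

print(a)
print(b)
print(c)
```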
|
469 | 469 | |
|
470 | 470 | Numpy also offers the ``arange`` function, which works like the builtin |
|
471 | 471 | ``range`` but returns an array instead of a list: |
|
472 | 472 | |
|
473 | 473 | In[21]: |
|
474 | 474 | |
|
475 | 475 | .. code:: python |
|
476 | 476 | |
|
477 | 477 | np.arange(5) |
|
478 | 478 | |
|
479 | 479 | Out[21]: |
|
480 | 480 | |
|
481 | 481 | .. parsed-literal:: |
|
482 | 482 | |
|
483 | 483 | array([0, 1, 2, 3, 4]) |
|
484 | 484 | |
|
485 | 485 | and the ``linspace`` and ``logspace`` functions to create linearly and |
|
486 | 486 | logarithmically-spaced grids respectively, with a fixed number of points |
|
487 | 487 | and including both ends of the specified interval: |
|
488 | 488 | |
|
489 | 489 | In[22]: |
|
490 | 490 | |
|
491 | 491 | .. code:: python |
|
492 | 492 | |
|
493 | 493 | print "A linear grid between 0 and 1:", np.linspace(0, 1, 5) |
|
494 | 494 | print "A logarithmic grid between 10**1 and 10**4: ", np.logspace(1, 4, 4) |
|
495 | 495 | |
|
496 | 496 | .. parsed-literal:: |
|
497 | 497 | |
|
498 | 498 | A linear grid between 0 and 1: [ 0. 0.25 0.5 0.75 1. ] |
|
499 | 499 | A logarithmic grid between 10**1 and 10**4: [ 10. 100. 1000. 10000.] |
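The difference between the three grid functions is easiest to see side by side. In this short sketch (variable names are illustrative), ``arange`` takes a step and excludes the stop value, ``linspace`` takes a number of points and includes both ends, and ``logspace`` spaces the exponents linearly:

```python
import numpy as np

stepped = np.arange(0, 1, 0.25)    # start, stop, step: the stop value is excluded
gridded = np.linspace(0, 1, 5)     # start, stop, number of points: both ends included
logged = np.logspace(1, 4, 4)      # base-10 exponents run linearly from 1 to 4

# logspace is equivalent to raising 10 to a linear grid of exponents
by_hand = 10.0 ** np.linspace(1, 4, 4)

print(stepped)
print(gridded)
print(logged)
```

With a non-integer step, ``arange`` can be sensitive to rounding at the end of the range, so ``linspace`` is usually the safer choice when the number of points matters.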
|
500 | 500 | |
|
501 | 501 | |
|
502 | 502 | Finally, it is often useful to create arrays with random numbers that |
|
503 | 503 | follow a specific distribution. The ``np.random`` module contains a |
|
504 | 504 | number of functions that can be used to this effect, for example this |
|
505 | 505 | will produce an array of 5 random samples taken from a standard normal |
|
506 | 506 | distribution (0 mean and variance 1): |
|
507 | 507 | |
|
508 | 508 | In[23]: |
|
509 | 509 | |
|
510 | 510 | .. code:: python |
|
511 | 511 | |
|
512 | 512 | np.random.randn(5) |
|
513 | 513 | |
|
514 | 514 | Out[23]: |
|
515 | 515 | |
|
516 | 516 | .. parsed-literal:: |
|
517 | 517 | |
|
518 | 518 | array([-0.08633343, -0.67375434, 1.00589536, 0.87081651, 1.65597822]) |
|
519 | 519 | |
|
520 | 520 | whereas this will also give 5 samples, but from a normal distribution |
|
521 | 521 | with a mean of 10 and a standard deviation of 3 (i.e. a variance of 9): |
|
522 | 522 | |
|
523 | 523 | In[24]: |
|
524 | 524 | |
|
525 | 525 | .. code:: python |
|
526 | 526 | |
|
527 | 527 | norm10 = np.random.normal(10, 3, 5) |
|
528 | 528 | norm10 |
|
529 | 529 | |
|
530 | 530 | Out[24]: |
|
531 | 531 | |
|
532 | 532 | .. parsed-literal:: |
|
533 | 533 | |
|
534 | 534 | array([ 8.94879575, 5.53038269, 8.24847281, 12.14944165, 11.56209294]) |
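Five samples are too few to say much about the distribution they came from. The sketch below draws a much larger sample to check empirically that the second argument of ``np.random.normal`` is the standard deviation (so the variance is its square). The seed value and the sample size are arbitrary choices made only for this illustration:

```python
import numpy as np

np.random.seed(0)                           # fix the seed so the run is repeatable
sample = np.random.normal(10, 3, 100000)    # mean 10, standard deviation 3

print("mean:     %.3f" % sample.mean())
print("std:      %.3f" % sample.std())
print("variance: %.3f" % sample.var())
```

The printed mean, standard deviation and variance should come out close to 10, 3 and 9 respectively, up to sampling noise.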
|
535 | 535 | |
|
536 | 536 | Indexing with other arrays |
|
537 | 537 | -------------------------- |
|
538 | 538 | |
|
539 | 539 | Above we saw how to index arrays with single numbers and slices, just |
|
540 | 540 | like Python lists. But arrays allow for a more sophisticated kind of |
|
541 | 541 | indexing which is very powerful: you can index an array with another |
|
542 | 542 | array, and in particular with an array of boolean values. This is |
|
543 | 543 | particularly useful to extract information from an array that matches a |
|
544 | 544 | certain condition. |
|
545 | 545 | |
|
546 | 546 | Consider for example that in the array ``norm10`` we want to replace all |
|
547 | 547 | values above 9 with the value 0. We can do so by first finding the |
|
548 | 548 | *mask* that indicates where this condition is true or false: |
|
549 | 549 | |
|
550 | 550 | In[25]: |
|
551 | 551 | |
|
552 | 552 | .. code:: python |
|
553 | 553 | |
|
554 | 554 | mask = norm10 > 9 |
|
555 | 555 | mask |
|
556 | 556 | |
|
557 | 557 | Out[25]: |
|
558 | 558 | |
|
559 | 559 | .. parsed-literal:: |
|
560 | 560 | |
|
561 | 561 | array([False, False, False, True, True], dtype=bool) |
|
562 | 562 | |
|
563 | 563 | Now that we have this mask, we can use it to either read those values or |
|
564 | 564 | to reset them to 0: |
|
565 | 565 | |
|
566 | 566 | In[26]: |
|
567 | 567 | |
|
568 | 568 | .. code:: python |
|
569 | 569 | |
|
570 | 570 | print 'Values above 9:', norm10[mask] |
|
571 | 571 | |
|
572 | 572 | .. parsed-literal:: |
|
573 | 573 | |
|
574 | 574 | Values above 9: [ 12.14944165 11.56209294] |
|
575 | 575 | |
|
576 | 576 | |
|
577 | 577 | In[27]: |
|
578 | 578 | |
|
579 | 579 | .. code:: python |
|
580 | 580 | |
|
581 | 581 | print 'Resetting all values above 9 to 0...' |
|
582 | 582 | norm10[mask] = 0 |
|
583 | 583 | print norm10 |
|
584 | 584 | |
|
585 | 585 | .. parsed-literal:: |
|
586 | 586 | |
|
587 | 587 | Resetting all values above 9 to 0... |
|
588 | 588 | [ 8.94879575 5.53038269 8.24847281 0. 0. ] |
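Masks can also be combined. The following sketch uses a small fixed array (made-up values, so that the result does not depend on random numbers) and the elementwise operators ``&`` and ``~`` to select a band of values and reset everything outside it:

```python
import numpy as np

values = np.array([8.9, 5.5, 8.2, 12.1, 11.6])

# Elementwise "and"; the parentheses are needed because & binds tighter than > and <
in_band = (values > 6) & (values < 12)
outliers = ~in_band                      # elementwise "not"

clipped = values.copy()                  # work on a copy so `values` stays intact
clipped[outliers] = 0

count = in_band.sum()                    # True counts as 1, so this counts the matches

print(in_band)
print(clipped)
print(count)
```

Because a boolean mask has the same shape as the array it was computed from, it can be reused to index any other array of that shape.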
|
589 | 589 | |
|
590 | 590 | |
|
591 | 591 | Arrays with more than one dimension |
|
592 | 592 | ----------------------------------- |
|
593 | 593 | |
|
594 | 594 | Up until now all our examples have used one-dimensional arrays. But |
|
595 | 595 | Numpy can create arrays of arbitrary dimensions, and all the methods |
|
596 | 596 | illustrated in the previous section work with more than one dimension. |
|
597 | 597 | For example, a list of lists can be used to initialize a two dimensional |
|
598 | 598 | array: |
|
599 | 599 | |
|
600 | 600 | In[28]: |
|
601 | 601 | |
|
602 | 602 | .. code:: python |
|
603 | 603 | |
|
604 | 604 | lst2 = [[1, 2], [3, 4]] |
|
605 | 605 | arr2 = np.array([[1, 2], [3, 4]]) |
|
606 | 606 | arr2 |
|
607 | 607 | |
|
608 | 608 | Out[28]: |
|
609 | 609 | |
|
610 | 610 | .. parsed-literal:: |
|
611 | 611 | |
|
612 | 612 | array([[1, 2], |
|
613 | 613 | [3, 4]]) |
|
614 | 614 | |
|
615 | 615 | With two-dimensional arrays we start seeing the power of numpy: while a |
|
616 | 616 | nested list can be indexed by repeatedly applying the ``[ ]`` operator, |
|
617 | 617 | multidimensional arrays support a much more natural indexing syntax with |
|
618 | 618 | a single ``[ ]`` and a set of indices separated by commas: |
|
619 | 619 | |
|
620 | 620 | In[29]: |
|
621 | 621 | |
|
622 | 622 | .. code:: python |
|
623 | 623 | |
|
624 | 624 | print lst2[0][1] |
|
625 | 625 | print arr2[0,1] |
|
626 | 626 | |
|
627 | 627 | .. parsed-literal:: |
|
628 | 628 | |
|
629 | 629 | 2 |
|
630 | 630 | 2 |
|
631 | 631 | |
|
632 | 632 | |
|
633 | 633 | Most of the array creation functions listed above can be used with more |
|
634 | 634 | than one dimension, for example: |
|
635 | 635 | |
|
636 | 636 | In[30]: |
|
637 | 637 | |
|
638 | 638 | .. code:: python |
|
639 | 639 | |
|
640 | 640 | np.zeros((2,3)) |
|
641 | 641 | |
|
642 | 642 | Out[30]: |
|
643 | 643 | |
|
644 | 644 | .. parsed-literal:: |
|
645 | 645 | |
|
646 | 646 | array([[ 0., 0., 0.], |
|
647 | 647 | [ 0., 0., 0.]]) |
|
648 | 648 | |
|
649 | 649 | In[31]: |
|
650 | 650 | |
|
651 | 651 | .. code:: python |
|
652 | 652 | |
|
653 | 653 | np.random.normal(10, 3, (2, 4)) |
|
654 | 654 | |
|
655 | 655 | Out[31]: |
|
656 | 656 | |
|
657 | 657 | .. parsed-literal:: |
|
658 | 658 | |
|
659 | 659 | array([[ 11.26788826, 4.29619866, 11.09346496, 9.73861307], |
|
660 | 660 | [ 10.54025996, 9.5146268 , 10.80367214, 13.62204505]]) |
|
661 | 661 | |
|
662 | 662 | In fact, the shape of an array can be changed at any time, as long as |
|
663 | 663 | the total number of elements is unchanged. For example, if we want a 2x4 |
|
664 | 664 | array with numbers increasing from 0, the easiest way to create it is: |
|
665 | 665 | |
|
666 | 666 | In[32]: |
|
667 | 667 | |
|
668 | 668 | .. code:: python |
|
669 | 669 | |
|
670 | 670 | arr = np.arange(8).reshape(2,4) |
|
671 | 671 | print arr |
|
672 | 672 | |
|
673 | 673 | .. parsed-literal:: |
|
674 | 674 | |
|
675 | 675 | [[0 1 2 3] |
|
676 | 676 | [4 5 6 7]] |
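The requirement that the total number of elements stays the same can be checked directly. In this sketch (names are illustrative), a compatible shape works, ``-1`` asks NumPy to infer one of the lengths, and an incompatible shape raises a ``ValueError``:

```python
import numpy as np

flat = np.arange(8)
grid = flat.reshape(2, 4)     # 2 * 4 == 8 elements: allowed
tall = grid.reshape(4, -1)    # -1 lets NumPy infer the remaining length (here 2)

try:
    flat.reshape(3, 3)        # 3 * 3 == 9 != 8 elements: not allowed
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(grid.shape)
print(tall.shape)
print(reshape_failed)
```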
|
677 | 677 | |
|
678 | 678 | |
|
679 | 679 | With multidimensional arrays, you can also use slices, and you can mix |
|
680 | 680 | and match slices and single indices in the different dimensions (using |
|
681 | 681 | the same array as above): |
|
682 | 682 | |
|
683 | 683 | In[33]: |
|
684 | 684 | |
|
685 | 685 | .. code:: python |
|
686 | 686 | |
|
687 | 687 | print 'Slicing in the second row:', arr[1, 2:4] |
|
688 | 688 | print 'All rows, third column :', arr[:, 2] |
|
689 | 689 | |
|
690 | 690 | .. parsed-literal:: |
|
691 | 691 | |
|
692 | 692 | Slicing in the second row: [6 7] |
|
693 | 693 | All rows, third column : [2 6] |
|
694 | 694 | |
|
695 | 695 | |
|
696 | 696 | If you only provide one index, then you will get an array with one less |
|
697 | 697 | dimension containing that row: |
|
698 | 698 | |
|
699 | 699 | In[34]: |
|
700 | 700 | |
|
701 | 701 | .. code:: python |
|
702 | 702 | |
|
703 | 703 | print 'First row: ', arr[0] |
|
704 | 704 | print 'Second row: ', arr[1] |
|
705 | 705 | |
|
706 | 706 | .. parsed-literal:: |
|
707 | 707 | |
|
708 | 708 | First row: [0 1 2 3] |
|
709 | 709 | Second row: [4 5 6 7] |
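A useful rule of thumb when mixing indices and slices is that every single index removes a dimension from the result, while every slice keeps one. The sketch below checks this on the same 2x4 array, using illustrative variable names:

```python
import numpy as np

arr = np.arange(8).reshape(2, 4)

row = arr[1]            # one index: one dimension fewer, shape (4,)
col = arr[:, 2]         # slice and index: shape (2,)
col_2d = arr[:, 2:3]    # slice and slice: stays two-dimensional, shape (2, 1)
elem = arr[1, 2]        # two indices: a single element

print(row.shape)
print(col.shape)
print(col_2d.shape)
print(elem)
```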
|
710 | 710 | |
|
711 | 711 | |
|
712 | 712 | Now that we have seen how to create arrays with more than one dimension, |
|
713 | 713 | it's a good idea to look at some of the most useful properties and |
|
714 | 714 | methods that arrays have. The following provide basic information about |
|
715 | 715 | the size, shape and data in the array: |
|
716 | 716 | |
|
717 | 717 | In[35]: |
|
718 | 718 | |
|
719 | 719 | .. code:: python |
|
720 | 720 | |
|
721 | 721 | print 'Data type :', arr.dtype |
|
722 | 722 | print 'Total number of elements :', arr.size |
|
723 | 723 | print 'Number of dimensions :', arr.ndim |
|
724 | 724 | print 'Shape (dimensionality) :', arr.shape |
|
725 | 725 | print 'Memory used (in bytes) :', arr.nbytes |
|
726 | 726 | |
|
727 | 727 | .. parsed-literal:: |
|
728 | 728 | |
|
729 | 729 | Data type : int32 |
|
730 | 730 | Total number of elements : 8 |
|
731 | 731 | Number of dimensions : 2 |
|
732 | 732 | Shape (dimensionality) : (2, 4) |
|
733 | 733 | Memory used (in bytes) : 32 |
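The memory figure is simply the number of elements times the size of one element in bytes. Since the default integer type depends on the platform, the sketch below requests the dtype explicitly; ``arr32`` and ``arr64`` are illustrative names:

```python
import numpy as np

arr32 = np.arange(8, dtype=np.int32).reshape(2, 4)   # 4 bytes per element
arr64 = arr32.astype(np.float64)                     # 8 bytes per element

for a in (arr32, arr64):
    # nbytes is the element count multiplied by the per-element size
    print("%s: %d elements x %d bytes = %d bytes"
          % (a.dtype, a.size, a.itemsize, a.nbytes))
```

On a system where the default integer is 64 bits wide, the unqualified ``np.arange(8)`` used in the text would report 64 bytes instead of 32.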
|
734 | 734 | |
|
735 | 735 | |
|
736 | 736 | Arrays also have many useful methods; some especially useful ones are: |
|
737 | 737 | |
|
738 | 738 | In[36]: |
|
739 | 739 | |
|
740 | 740 | .. code:: python |
|
741 | 741 | |
|
742 | 742 | print 'Minimum and maximum :', arr.min(), arr.max() |
|
743 | 743 | print 'Sum and product of all elements :', arr.sum(), arr.prod() |
|
744 | 744 | print 'Mean and standard deviation :', arr.mean(), arr.std() |
|
745 | 745 | |
|
746 | 746 | .. parsed-literal:: |
|
747 | 747 | |
|
748 | 748 | Minimum and maximum : 0 7 |
|
749 | 749 | Sum and product of all elements : 28 0 |
|
750 | 750 | Mean and standard deviation : 3.5 2.29128784748 |
|
751 | 751 | |
|
752 | 752 | |
|
753 | 753 | For these methods, the above operations are all computed on all the |
|
754 | 754 | elements of the array. But for a multidimensional array, it's possible |
|
755 | 755 | to do the computation along a single dimension, by passing the ``axis`` |
|
756 | 756 | parameter; for example: |
|
757 | 757 | |
|
758 | 758 | In[37]: |
|
759 | 759 | |
|
760 | 760 | .. code:: python |
|
761 | 761 | |
|
762 | 762 | print 'For the following array:\n', arr |
|
763 | 763 | print 'The sum of elements along the rows is :', arr.sum(axis=1) |
|
764 | 764 | print 'The sum of elements along the columns is :', arr.sum(axis=0) |
|
765 | 765 | |
|
766 | 766 | .. parsed-literal:: |
|
767 | 767 | |
|
768 | 768 | For the following array: |
|
769 | 769 | [[0 1 2 3] |
|
770 | 770 | [4 5 6 7]] |
|
771 | 771 | The sum of elements along the rows is : [ 6 22] |
|
772 | 772 | The sum of elements along the columns is : [ 4 6 8 10] |
|
773 | 773 | |
|
774 | 774 | |
|
775 | 775 | As you can see in this example, the value of the ``axis`` parameter is |
|
776 | 776 | the dimension which will be *consumed* once the operation has been |
|
777 | 777 | carried out. This is why summing along the rows uses ``axis=1``: the dimension of length 4 is consumed, leaving one sum per row. |
|
778 | 778 | |
|
779 | 779 | This can be easily illustrated with an example that has more dimensions; |
|
780 | 780 | we create an array with 4 dimensions and shape ``(3,4,5,6)`` and sum |
|
781 | 781 | along the axis number 2 (i.e. the *third* axis, since in Python all |
|
782 | 782 | counts are 0-based). That consumes the dimension whose length was 5, |
|
783 | 783 | leaving us with a new array that has shape ``(3,4,6)``: |
|
784 | 784 | |
|
785 | 785 | In[38]: |
|
786 | 786 | |
|
787 | 787 | .. code:: python |
|
788 | 788 | |
|
789 | 789 | np.zeros((3,4,5,6)).sum(2).shape |
|
790 | 790 | |
|
791 | 791 | Out[38]: |
|
792 | 792 | |
|
793 | 793 | .. parsed-literal:: |
|
794 | 794 | |
|
795 | 795 | (3, 4, 6) |
|
796 | 796 | |
|
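As a small sketch of the same idea, the snippet below sums the 4-dimensional array over each axis in turn and records the resulting shapes. The variable names (``a``, ``shapes``, ``kept``) are ours, chosen only for illustration; the ``keepdims`` option is a standard keyword of ``sum`` in reasonably recent Numpy versions.

```python
import numpy as np

a = np.zeros((3, 4, 5, 6))

# Sum over each axis in turn and record the shape of the result;
# the summed axis disappears and the remaining ones keep their order.
shapes = [a.sum(axis=k).shape for k in range(a.ndim)]
# shapes == [(4, 5, 6), (3, 5, 6), (3, 4, 6), (3, 4, 5)]

# keepdims=True retains the consumed axis with length 1 instead.
kept = a.sum(axis=2, keepdims=True).shape
# kept == (3, 4, 1, 6)
```

Keeping the consumed axis with length 1 is convenient when the result has to be combined again with the original array.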
797 | 797 | Another widely used property of arrays is the ``.T`` attribute, which |
|
798 | 798 | allows you to access the transpose of the array: |
|
799 | 799 | |
|
800 | 800 | In[39]: |
|
801 | 801 | |
|
802 | 802 | .. code:: python |
|
803 | 803 | |
|
804 | 804 | print 'Array:\n', arr |
|
805 | 805 | print 'Transpose:\n', arr.T |
|
806 | 806 | |
|
807 | 807 | .. parsed-literal:: |
|
808 | 808 | |
|
809 | 809 | Array: |
|
810 | 810 | [[0 1 2 3] |
|
811 | 811 | [4 5 6 7]] |
|
812 | 812 | Transpose: |
|
813 | 813 | [[0 4] |
|
814 | 814 | [1 5] |
|
815 | 815 | [2 6] |
|
816 | 816 | [3 7]] |
|
817 | 817 | |
|
818 | 818 | |
|
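One detail the example above does not show, but which is standard Numpy behavior, is that ``.T`` returns a *view* of the same data rather than a copy. The following sketch (with illustrative names ``t`` and ``independent``) makes this visible by writing through the transpose:

```python
import numpy as np

arr = np.arange(8).reshape(2, 4)
t = arr.T                      # shape (4, 2); no data is copied

t[0, 1] = 100                  # write through the transpose ...
changed = arr[1, 0]            # ... and the original sees it: 100

independent = arr.T.copy()     # an explicit copy is decoupled
independent[0, 1] = -1
still = arr[1, 0]              # still 100
```

If you need a transposed array that you can modify freely, take an explicit ``copy`` as in the last lines.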
819 | 819 | We don't have time here to look at all the methods and properties of |
|
820 | 820 | arrays, but here is a complete list. Simply try exploring some of these in |
|
821 | 821 | IPython to learn more, or read their description in the full Numpy |
|
822 | 822 | documentation: |
|
823 | 823 | |
|
824 | 824 | :: |
|
825 | 825 | |
|
826 | 826 | arr.T arr.copy arr.getfield arr.put arr.squeeze |
|
827 | 827 | arr.all arr.ctypes arr.imag arr.ravel arr.std |
|
828 | 828 | arr.any arr.cumprod arr.item arr.real arr.strides |
|
829 | 829 | arr.argmax arr.cumsum arr.itemset arr.repeat arr.sum |
|
830 | 830 | arr.argmin arr.data arr.itemsize arr.reshape arr.swapaxes |
|
831 | 831 | arr.argsort arr.diagonal arr.max arr.resize arr.take |
|
832 | 832 | arr.astype arr.dot arr.mean arr.round arr.tofile |
|
833 | 833 | arr.base arr.dtype arr.min arr.searchsorted arr.tolist |
|
834 | 834 | arr.byteswap arr.dump arr.nbytes arr.setasflat arr.tostring |
|
835 | 835 | arr.choose arr.dumps arr.ndim arr.setfield arr.trace |
|
836 | 836 | arr.clip arr.fill arr.newbyteorder arr.setflags arr.transpose |
|
837 | 837 | arr.compress arr.flags arr.nonzero arr.shape arr.var |
|
838 | 838 | arr.conj arr.flat arr.prod arr.size arr.view |
|
839 | 839 | arr.conjugate arr.flatten arr.ptp arr.sort |
|
840 | 840 | |
|
841 | 841 | |
|
842 | 842 | Operating with arrays |
|
843 | 843 | --------------------- |
|
844 | 844 | |
|
845 | 845 | Arrays support all regular arithmetic operators, and the numpy library |
|
846 | 846 | also contains a complete collection of basic mathematical functions that |
|
847 | 847 | operate on arrays. It is important to remember that in general, all |
|
848 | 848 | operations with arrays are applied *element-wise*, i.e., are applied to |
|
849 | 849 | all the elements of the array at the same time. Consider for example: |
|
850 | 850 | |
|
851 | 851 | In[40]: |
|
852 | 852 | |
|
853 | 853 | .. code:: python |
|
854 | 854 | |
|
855 | 855 | arr1 = np.arange(4) |
|
856 | 856 | arr2 = np.arange(10, 14) |
|
857 | 857 | print arr1, '+', arr2, '=', arr1+arr2 |
|
858 | 858 | |
|
859 | 859 | .. parsed-literal:: |
|
860 | 860 | |
|
861 | 861 | [0 1 2 3] + [10 11 12 13] = [10 12 14 16] |
|
862 | 862 | |
|
863 | 863 | |
|
864 | 864 | Importantly, you must remember that even the multiplication operator is |
|
865 | 865 | by default applied element-wise; it is *not* the matrix multiplication |
|
866 | 866 | from linear algebra (as is the case in Matlab, for example): |
|
867 | 867 | |
|
868 | 868 | In[41]: |
|
869 | 869 | |
|
870 | 870 | .. code:: python |
|
871 | 871 | |
|
872 | 872 | print arr1, '*', arr2, '=', arr1*arr2 |
|
873 | 873 | |
|
874 | 874 | .. parsed-literal:: |
|
875 | 875 | |
|
876 | 876 | [0 1 2 3] * [10 11 12 13] = [ 0 11 24 39] |
|
877 | 877 | |
|
878 | 878 | |
|
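To make the contrast concrete, here is a minimal sketch comparing the element-wise product with the inner product computed by the ``dot`` method (which is covered in the linear algebra section of this document). The names ``elementwise`` and ``inner`` are only illustrative.

```python
import numpy as np

arr1 = np.arange(4)
arr2 = np.arange(10, 14)

elementwise = arr1 * arr2      # array([ 0, 11, 24, 39])
inner = arr1.dot(arr2)         # 0 + 11 + 24 + 39 = 74
```

The first result is an array of the same length as the operands, while the second is a single number.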
879 | 879 | While this means that in principle arrays must always match in their |
|
880 | 880 | dimensionality in order for an operation to be valid, numpy will |
|
881 | 881 | *broadcast* dimensions when possible. For example, suppose that you want |
|
882 | 882 | to add the number 1.5 to ``arr1``; the following would be a valid way to |
|
883 | 883 | do it: |
|
884 | 884 | |
|
885 | 885 | In[42]: |
|
886 | 886 | |
|
887 | 887 | .. code:: python |
|
888 | 888 | |
|
889 | 889 | arr1 + 1.5*np.ones(4) |
|
890 | 890 | |
|
891 | 891 | Out[42]: |
|
892 | 892 | |
|
893 | 893 | .. parsed-literal:: |
|
894 | 894 | |
|
895 | 895 | array([ 1.5, 2.5, 3.5, 4.5]) |
|
896 | 896 | |
|
897 | 897 | But thanks to numpy's broadcasting rules, the following is equally |
|
898 | 898 | valid: |
|
899 | 899 | |
|
900 | 900 | In[43]: |
|
901 | 901 | |
|
902 | 902 | .. code:: python |
|
903 | 903 | |
|
904 | 904 | arr1 + 1.5 |
|
905 | 905 | |
|
906 | 906 | Out[43]: |
|
907 | 907 | |
|
908 | 908 | .. parsed-literal:: |
|
909 | 909 | |
|
910 | 910 | array([ 1.5, 2.5, 3.5, 4.5]) |
|
911 | 911 | |
|
912 | 912 | In this case, numpy looked at both operands and saw that the first |
|
913 | 913 | (``arr1``) was a one-dimensional array of length 4 and the second was a |
|
914 | 914 | scalar, considered a zero-dimensional object. The broadcasting rules |
|
915 | 915 | allow numpy to: |
|
916 | 916 | |
|
917 | 917 | - *create* new dimensions of length 1 (since this doesn't change the |
|
918 | 918 | size of the array) |
|
919 | 919 | - 'stretch' a dimension of length 1 that needs to be matched to a |
|
920 | 920 | dimension of a different size. |
|
921 | 921 | |
|
922 | 922 | So in the above example, the scalar 1.5 is effectively: |
|
923 | 923 | |
|
924 | 924 | - first 'promoted' to a 1-dimensional array of length 1 |
|
925 | 925 | - then, this array is 'stretched' to length 4 to match the dimension of |
|
926 | 926 | ``arr1``. |
|
927 | 927 | |
|
928 | 928 | After these two operations are complete, the addition can proceed as now |
|
929 | 929 | both operands are one-dimensional arrays of length 4. |
|
930 | 930 | |
|
931 | 931 | This broadcasting behavior is in practice enormously powerful, |
|
932 | 932 | especially because when numpy broadcasts to create new dimensions or to |
|
933 | 933 | 'stretch' existing ones, it doesn't actually replicate the data. In the |
|
934 | 934 | example above the operation is carried out *as if* the 1.5 were a 1-d array |
|
935 | 935 | with 1.5 in all of its entries, but no actual array was ever created. |
|
936 | 936 | This can save lots of memory in cases when the arrays in question are |
|
937 | 937 | large and can have significant performance implications. |
|
938 | 938 | |
|
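One way to see that no data is replicated is to ask Numpy for the broadcast view explicitly. The sketch below assumes a Numpy version that provides ``np.broadcast_to`` (it is not available in very old releases); the names ``stretched`` and ``strides`` are ours.

```python
import numpy as np

arr1 = np.arange(4)

# View of the scalar 1.5 'stretched' to the shape of arr1, i.e. (4,).
stretched = np.broadcast_to(1.5, arr1.shape)

# A stride of 0 bytes means every index maps to the same memory location,
# so the four entries are one stored value, not four.
strides = stretched.strides        # (0,)

# Adding the stretched view gives the same result as adding the scalar.
same = np.array_equal(arr1 + stretched, arr1 + 1.5)   # True
```

In everyday code you would simply write ``arr1 + 1.5``; the explicit view is only useful for inspecting what broadcasting does.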
939 | 939 | The general rule is: when operating on two arrays, NumPy compares their |
|
940 | 940 | shapes element-wise. It starts with the trailing dimensions, and works |
|
941 | 941 | its way forward, creating dimensions of length 1 as needed. Two |
|
942 | 942 | dimensions are considered compatible when |
|
943 | 943 | |
|
944 | 944 | - they are equal to begin with, or |
|
945 | 945 | - one of them is 1; in this case numpy will do the 'stretching' to make |
|
946 | 946 | them equal. |
|
947 | 947 | |
|
948 | 948 | If these conditions are not met, a |
|
949 | 949 | ``ValueError: operands could not be broadcast together`` exception is raised, indicating |
|
950 | 950 | that the arrays have incompatible shapes. The size of the resulting |
|
951 | 951 | array is the maximum size along each dimension of the input arrays. |
|
952 | 952 | |
|
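The rule can be written down in a few lines of plain Python. The function ``broadcast_shape`` below is a hypothetical helper written for this illustration, not part of Numpy; it only mirrors the two compatibility conditions listed above, and we compare it against Numpy's own ``np.broadcast`` object as a sanity check.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: shape that results from broadcasting two shapes."""
    ndim = max(len(shape_a), len(shape_b))
    # Create leading dimensions of length 1 so both shapes have equal length.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for m, n in zip(a, b):
        if m == n or n == 1:
            result.append(m)       # equal, or n is stretched to m
        elif m == 1:
            result.append(n)       # m is stretched to n
        else:
            raise ValueError("incompatible shapes %r and %r"
                             % (shape_a, shape_b))
    return tuple(result)

ok = broadcast_shape((2, 4), (4,))          # (2, 4)
ok2 = broadcast_shape((2, 4), (2, 1))       # (2, 4)
reference = np.broadcast(np.empty((2, 4)), np.empty((4,))).shape

try:
    broadcast_shape((2, 4), (2,))
    failed = False
except ValueError:
    failed = True                            # 4 and 2 cannot be matched
```

Because the comparison starts from the trailing dimension, a shape ``(2,)`` is matched against the 4 of ``(2, 4)`` and rejected, while ``(2, 1)`` is accepted.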
953 | 953 | This shows how the broadcasting rules work in several dimensions: |
|
954 | 954 | |
|
955 | 955 | In[44]: |
|
956 | 956 | |
|
957 | 957 | .. code:: python |
|
958 | 958 | |
|
959 | 959 | b = np.array([2, 3, 4, 5]) |
|
960 | 960 | print arr, '\n\n+', b , '\n----------------\n', arr + b |
|
961 | 961 | |
|
962 | 962 | .. parsed-literal:: |
|
963 | 963 | |
|
964 | 964 | [[0 1 2 3] |
|
965 | 965 | [4 5 6 7]] |
|
966 | 966 | |
|
967 | 967 | + [2 3 4 5] |
|
968 | 968 | ---------------- |
|
969 | 969 | [[ 2 4 6 8] |
|
970 | 970 | [ 6 8 10 12]] |
|
971 | 971 | |
|
972 | 972 | |
|
973 | 973 | Now, how could you use broadcasting to, say, add ``[4, 6]`` along the rows |
|
974 | 974 | to ``arr`` above? Simply performing the direct addition will produce the |
|
975 | 975 | error we previously mentioned: |
|
976 | 976 | |
|
977 | 977 | In[45]: |
|
978 | 978 | |
|
979 | 979 | .. code:: python |
|
980 | 980 | |
|
981 | 981 | c = np.array([4, 6]) |
|
982 | 982 | arr + c |
|
983 | 983 | |
|
984 | 984 | :: |
|
985 | 985 | |
|
986 | 986 | --------------------------------------------------------------------------- |
|
987 | 987 | ValueError Traceback (most recent call last) |
|
988 | 988 | /home/fperez/teach/book-math-labtool/<ipython-input-45-62aa20ac1980> in <module>() |
|
989 | 989 | 1 c = np.array([4, 6]) |
|
990 | 990 | ----> 2 arr + c |
|
991 | 991 | |
|
992 | 992 | ValueError: operands could not be broadcast together with shapes (2,4) (2) |
|
993 | 993 | |
|
994 | 994 | According to the rules above, the array ``c`` would need to have a |
|
995 | 995 | *trailing* dimension of 1 for the broadcasting to work. It turns out |
|
996 | 996 | that numpy allows you to 'inject' new dimensions anywhere into an array |
|
997 | 997 | on the fly, by indexing it with the special object ``np.newaxis``: |
|
998 | 998 | |
|
999 | 999 | In[46]: |
|
1000 | 1000 | |
|
1001 | 1001 | .. code:: python |
|
1002 | 1002 | |
|
1003 | 1003 | (c[:, np.newaxis]).shape |
|
1004 | 1004 | |
|
1005 | 1005 | Out[46]: |
|
1006 | 1006 | |
|
1007 | 1007 | .. parsed-literal:: |
|
1008 | 1008 | |
|
1009 | 1009 | (2, 1) |
|
1010 | 1010 | |
|
1011 | 1011 | This is exactly what we need, and indeed it works: |
|
1012 | 1012 | |
|
1013 | 1013 | In[47]: |
|
1014 | 1014 | |
|
1015 | 1015 | .. code:: python |
|
1016 | 1016 | |
|
1017 | 1017 | arr + c[:, np.newaxis] |
|
1018 | 1018 | |
|
1019 | 1019 | Out[47]: |
|
1020 | 1020 | |
|
1021 | 1021 | .. parsed-literal:: |
|
1022 | 1022 | |
|
1023 | 1023 | array([[ 4, 5, 6, 7], |
|
1024 | 1024 | [10, 11, 12, 13]]) |
|
1025 | 1025 | |
|
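For reference, the sketch below collects a few equivalent ways of adding the extra dimension. It relies on the fact that ``np.newaxis`` is simply an alias for ``None``; the names ``col_a``, ``col_b``, ``col_c`` and ``row`` are ours.

```python
import numpy as np

arr = np.arange(8).reshape(2, 4)
c = np.array([4, 6])

col_a = c[:, np.newaxis]       # shape (2, 1)
col_b = c[:, None]             # np.newaxis is an alias for None
col_c = c.reshape(2, 1)        # an explicit reshape gives the same shape

result = arr + col_a           # [[ 4  5  6  7], [10 11 12 13]]

# Placing the new axis first yields a row of shape (1, 2) instead.
row = c[np.newaxis, :]
```

Which spelling to use is mostly a matter of taste; indexing with ``np.newaxis`` has the advantage that you do not need to know the length of ``c``.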
1026 | 1026 | For the full broadcasting rules, please see the official Numpy docs, |
|
1027 | 1027 | which describe them in detail and with more complex examples. |
|
1028 | 1028 | |
|
1029 | 1029 | As we mentioned before, Numpy ships with a full complement of |
|
1030 | 1030 | mathematical functions that work on entire arrays, including logarithms, |
|
1031 | 1031 | exponentials, trigonometric and hyperbolic trigonometric functions, etc. |
|
1032 | 1032 | Furthermore, scipy ships a rich special function library in the |
|
1033 | 1033 | ``scipy.special`` module that includes Bessel, Airy, Fresnel, Laguerre |
|
1034 | 1034 | and other classical special functions. For example, sampling the sine |
|
1035 | 1035 | function at 100 points between :math:`0` and :math:`2\pi` is as simple |
|
1036 | 1036 | as: |
|
1037 | 1037 | |
|
1038 | 1038 | In[48]: |
|
1039 | 1039 | |
|
1040 | 1040 | .. code:: python |
|
1041 | 1041 | |
|
1042 | 1042 | x = np.linspace(0, 2*np.pi, 100) |
|
1043 | 1043 | y = np.sin(x) |
|
1044 | 1044 | |
|
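Since these functions act on every element at once, they can be combined freely in array expressions. As a quick sketch (the names ``identity`` and ``roundtrip`` are only illustrative), we can check two familiar identities on the whole grid without writing a loop:

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# Each function is applied to every element; the result has the shape of x.
identity = np.sin(x)**2 + np.cos(x)**2
roundtrip = np.log(np.exp(x))

identity_ok = np.allclose(identity, 1.0)     # True up to rounding error
roundtrip_ok = np.allclose(roundtrip, x)     # True up to rounding error
```

We use ``np.allclose`` rather than exact equality because floating point results agree only up to rounding error.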
1045 | 1045 | Linear algebra in numpy |
|
1046 | 1046 | ----------------------- |
|
1047 | 1047 | |
|
1048 | 1048 | Numpy ships with a basic linear algebra library, and all arrays have a |
|
1049 | 1049 | ``dot`` method whose behavior is that of the scalar dot product when its |
|
1050 | 1050 | arguments are vectors (one-dimensional arrays) and the traditional |
|
1051 | 1051 | matrix multiplication when one or both of its arguments are |
|
1052 | 1052 | two-dimensional arrays: |
|
1053 | 1053 | |
|
1054 | 1054 | In[49]: |
|
1055 | 1055 | |
|
1056 | 1056 | .. code:: python |
|
1057 | 1057 | |
|
1058 | 1058 | v1 = np.array([2, 3, 4]) |
|
1059 | 1059 | v2 = np.array([1, 0, 1]) |
|
1060 | 1060 | print v1, '.', v2, '=', v1.dot(v2) |
|
1061 | 1061 | |
|
1062 | 1062 | .. parsed-literal:: |
|
1063 | 1063 | |
|
1064 | 1064 | [2 3 4] . [1 0 1] = 6 |
|
1065 | 1065 | |
|
1066 | 1066 | |
|
1067 | 1067 | Here is a regular matrix-vector multiplication. Note that the array |
|
1068 | 1068 | ``v1`` should be viewed as a *column* vector in traditional linear |
|
1069 | 1069 | algebra notation; numpy makes no distinction between row and column |
|
1070 | 1070 | vectors and simply verifies that the dimensions match the required rules |
|
1071 | 1071 | of matrix multiplication. In this case we have a :math:`2 \times 3` |
|
1072 | 1072 | matrix multiplied by a 3-vector, which produces a 2-vector: |
|
1073 | 1073 | |
|
1074 | 1074 | In[50]: |
|
1075 | 1075 | |
|
1076 | 1076 | .. code:: python |
|
1077 | 1077 | |
|
1078 | 1078 | A = np.arange(6).reshape(2, 3) |
|
1079 | 1079 | print A, 'x', v1, '=', A.dot(v1) |
|
1080 | 1080 | |
|
1081 | 1081 | .. parsed-literal:: |
|
1082 | 1082 | |
|
1083 | 1083 | [[0 1 2] |
|
1084 | 1084 | [3 4 5]] x [2 3 4] = [11 38] |
|
1085 | 1085 | |
|
1086 | 1086 | |
|
1087 | 1087 | For matrix-matrix multiplication, the same dimension-matching rules must |
|
1088 | 1088 | be satisfied, e.g. consider the difference between :math:`A \times A^T`: |
|
1089 | 1089 | |
|
1090 | 1090 | In[51]: |
|
1091 | 1091 | |
|
1092 | 1092 | .. code:: python |
|
1093 | 1093 | |
|
1094 | 1094 | print A.dot(A.T) |
|
1095 | 1095 | |
|
1096 | 1096 | .. parsed-literal:: |
|
1097 | 1097 | |
|
1098 | 1098 | [[ 5 14] |
|
1099 | 1099 | [14 50]] |
|
1100 | 1100 | |
|
1101 | 1101 | |
|
1102 | 1102 | and :math:`A^T \times A`: |
|
1103 | 1103 | |
|
1104 | 1104 | In[52]: |
|
1105 | 1105 | |
|
1106 | 1106 | .. code:: python |
|
1107 | 1107 | |
|
1108 | 1108 | print A.T.dot(A) |
|
1109 | 1109 | |
|
1110 | 1110 | .. parsed-literal:: |
|
1111 | 1111 | |
|
1112 | 1112 | [[ 9 12 15] |
|
1113 | 1113 | [12 17 22] |
|
1114 | 1114 | [15 22 29]] |
|
1115 | 1115 | |
|
1116 | 1116 | |
|
1117 | 1117 | Furthermore, the ``numpy.linalg`` module includes additional |
|
1118 | 1118 | functionality such as determinants, matrix norms, Cholesky, eigenvalue |
|
1119 | 1119 | and singular value decompositions, etc. For even more linear algebra |
|
1120 | 1120 | tools, ``scipy.linalg`` contains the majority of the tools in the |
|
1121 | 1121 | classic LAPACK libraries, while ``scipy.sparse.linalg`` provides functions to operate on sparse |
|
1122 | 1122 | matrices. We refer the reader to the Numpy and Scipy documentation for |
|
1123 | 1123 | additional details on these. |
|
1124 | 1124 | |
|
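As a brief sketch of what ``numpy.linalg`` offers, the following applies a few of its routines to the symmetric matrix :math:`A \times A^T` computed above. The variable names are ours, and only a handful of the available functions are shown.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)
M = A.dot(A.T)                      # [[ 5 14], [14 50]], symmetric

det = np.linalg.det(M)              # 5*50 - 14*14 = 54 (up to rounding)
eigenvalues = np.linalg.eigvalsh(M) # real eigenvalues of a symmetric matrix

# Cholesky factor L is lower triangular with L L^T = M
# (this requires M to be positive definite, which holds here).
L = np.linalg.cholesky(M)
rebuilt = L.dot(L.T)

# Solve M x = b without forming the inverse explicitly.
b = np.array([1.0, 2.0])
x = np.linalg.solve(M, b)
```

Using ``solve`` instead of computing an explicit inverse is generally both faster and numerically more accurate.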
1125 | 1125 | Reading and writing arrays to disk |
|
1126 | 1126 | ---------------------------------- |
|
1127 | 1127 | |
|
1128 | 1128 | Numpy lets you read and write arrays into files in a number of ways. In |
|
1129 | 1129 | order to use these tools well, it is critical to understand the |
|
1130 | 1130 | difference between a *text* and a *binary* file containing numerical |
|
1131 | 1131 | data. In a text file, the number :math:`\pi` could be written as |
|
1132 | 1132 | "3.141592653589793", for example: a string of digits that a human can |
|
1133 | 1133 | read, in this case with 15 decimal digits. In contrast, that same number |
|
1134 | 1134 | written to a binary file would be encoded as 8 characters (bytes) that |
|
1135 | 1135 | are not readable by a human but which contain the exact same data that |
|
1136 | 1136 | the variable ``pi`` had in the computer's memory. |
|
1137 | 1137 | |
|
1138 | 1138 | The tradeoffs between the two modes are thus: |
|
1139 | 1139 | |
|
1140 | 1140 | - Text mode: occupies more space, precision can be lost (if not all |
|
1141 | 1141 | digits are written to disk), but is readable and editable by hand |
|
1142 | 1142 | with a text editor. Can *only* be used for one- and two-dimensional |
|
1143 | 1143 | arrays. |
|
1144 | 1144 | |
|
1145 | 1145 | - Binary mode: compact and exact representation of the data in memory, |
|
1146 | 1146 | can't be read or edited by hand. Arrays of any size and |
|
1147 | 1147 | dimensionality can be saved and read without loss of information. |
|
1148 | 1148 | |
|
1149 | 1149 | First, let's see how to read and write arrays in text mode. The |
|
1150 | 1150 | ``np.savetxt`` function saves an array to a text file, with options to |
|
1151 | 1151 | control the precision, separators and even adding a header: |
|
1152 | 1152 | |
|
1153 | 1153 | In[53]: |
|
1154 | 1154 | |
|
1155 | 1155 | .. code:: python |
|
1156 | 1156 | |
|
1157 | 1157 | arr = np.arange(10).reshape(2, 5) |
|
1158 | 1158 | np.savetxt('test.out', arr, fmt='%.2e', header="My dataset") |
|
1159 | 1159 | !cat test.out |
|
1160 | 1160 | |
|
1161 | 1161 | .. parsed-literal:: |
|
1162 | 1162 | |
|
1163 | 1163 | # My dataset |
|
1164 | 1164 | 0.00e+00 1.00e+00 2.00e+00 3.00e+00 4.00e+00 |
|
1165 | 1165 | 5.00e+00 6.00e+00 7.00e+00 8.00e+00 9.00e+00 |
|
1166 | 1166 | |
|
1167 | 1167 | |
|
1168 | 1168 | And this same type of file can then be read with the matching |
|
1169 | 1169 | ``np.loadtxt`` function: |
|
1170 | 1170 | |
|
1171 | 1171 | In[54]: |
|
1172 | 1172 | |
|
1173 | 1173 | .. code:: python |
|
1174 | 1174 | |
|
1175 | 1175 | arr2 = np.loadtxt('test.out') |
|
1176 | 1176 | print arr2 |
|
1177 | 1177 | |
|
1178 | 1178 | .. parsed-literal:: |
|
1179 | 1179 | |
|
1180 | 1180 | [[ 0. 1. 2. 3. 4.] |
|
1181 | 1181 | [ 5. 6. 7. 8. 9.]] |
|
1182 | 1182 | |
|
1183 | 1183 | |
|
1184 | 1184 | For binary data, Numpy provides the ``np.save`` and ``np.savez`` |
|
1185 | 1185 | routines. The first saves a single array to a file with ``.npy`` |
|
1186 | 1186 | extension, while the latter can be used to save a *group* of arrays into |
|
1187 | 1187 | a single file with ``.npz`` extension. The files created with these |
|
1188 | 1188 | routines can then be read with the ``np.load`` function. |
|
1189 | 1189 | |
|
1190 | 1190 | Let us first see how to use the simpler ``np.save`` function to save a |
|
1191 | 1191 | single array: |
|
1192 | 1192 | |
|
1193 | 1193 | In[55]: |
|
1194 | 1194 | |
|
1195 | 1195 | .. code:: python |
|
1196 | 1196 | |
|
1197 | 1197 | np.save('test.npy', arr2) |
|
1198 | 1198 | # Now we read this back |
|
1199 | 1199 | arr2n = np.load('test.npy') |
|
1200 | 1200 | # Let's see if any element is non-zero in the difference. |
|
1201 | 1201 | # A value of True would be a problem. |
|
1202 | 1202 | print 'Any differences?', np.any(arr2-arr2n) |
|
1203 | 1203 | |
|
1204 | 1204 | .. parsed-literal:: |
|
1205 | 1205 | |
|
1206 | 1206 | Any differences? False |
|
1207 | 1207 | |
|
1208 | 1208 | |
|
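The precision tradeoff between text and binary files described earlier can be checked directly. The sketch below writes the same array both ways into a temporary directory (the file names and the choice of ``fmt='%.2e'`` are ours, picked to make the loss visible) and compares what comes back:

```python
import os
import shutil
import tempfile

import numpy as np

data = np.array([np.pi, np.e, 1.0 / 3.0])

tmpdir = tempfile.mkdtemp()          # scratch directory for the two files
txt_path = os.path.join(tmpdir, 'data.txt')
npy_path = os.path.join(tmpdir, 'data.npy')

np.savetxt(txt_path, data, fmt='%.2e')   # keeps only 3 significant digits
np.save(npy_path, data)                  # stores the exact 8-byte values

from_text = np.loadtxt(txt_path)
from_binary = np.load(npy_path)
shutil.rmtree(tmpdir)                    # clean up the scratch files

text_exact = np.array_equal(data, from_text)       # False: digits were dropped
binary_exact = np.array_equal(data, from_binary)   # True: identical values
max_text_error = np.abs(data - from_text).max()
```

With a format that writes enough digits the text file could also round-trip exactly, at the cost of a larger file.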
1209 | 1209 | Now let us see how the ``np.savez`` function works. You give it a |
|
1210 | 1210 | filename and either a sequence of arrays or a set of keywords. In the |
|
1211 | 1211 | first mode, the function will automatically name the saved arrays in the |
|
1212 | 1212 | archive as ``arr_0``, ``arr_1``, etc: |
|
1213 | 1213 | |
|
1214 | 1214 | In[56]: |
|
1215 | 1215 | |
|
1216 | 1216 | .. code:: python |
|
1217 | 1217 | |
|
1218 | 1218 | np.savez('test.npz', arr, arr2) |
|
1219 | 1219 | arrays = np.load('test.npz') |
|
1220 | 1220 | arrays.files |
|
1221 | 1221 | |
|
1222 | 1222 | Out[56]: |
|
1223 | 1223 | |
|
1224 | 1224 | .. parsed-literal:: |
|
1225 | 1225 | |
|
1226 | 1226 | ['arr_1', 'arr_0'] |
|
1227 | 1227 | |
|
1228 | 1228 | Alternatively, we can explicitly choose how to name the arrays we save: |
|
1229 | 1229 | |
|
1230 | 1230 | In[57]: |
|
1231 | 1231 | |
|
1232 | 1232 | .. code:: python |
|
1233 | 1233 | |
|
1234 | 1234 | np.savez('test.npz', array1=arr, array2=arr2) |
|
1235 | 1235 | arrays = np.load('test.npz') |
|
1236 | 1236 | arrays.files |
|
1237 | 1237 | |
|
1238 | 1238 | Out[57]: |
|
1239 | 1239 | |
|
1240 | 1240 | .. parsed-literal:: |
|
1241 | 1241 | |
|
1242 | 1242 | ['array2', 'array1'] |
|
1243 | 1243 | |
|
1244 | 1244 | The object returned by ``np.load`` from an ``.npz`` file works like a |
|
1245 | 1245 | dictionary, though you can also access its constituent files by |
|
1246 | 1246 | attribute using its special ``.f`` field; this is best illustrated with |
|
1247 | 1247 | an example with the ``arrays`` object from above: |
|
1248 | 1248 | |
|
1249 | 1249 | In[58]: |
|
1250 | 1250 | |
|
1251 | 1251 | .. code:: python |
|
1252 | 1252 | |
|
1253 | 1253 | print 'First row of first array:', arrays['array1'][0] |
|
1254 | 1254 | # This is an equivalent way to get the same field |
|
1255 | 1255 | print 'First row of first array:', arrays.f.array1[0] |
|
1256 | 1256 | |
|
1257 | 1257 | .. parsed-literal:: |
|
1258 | 1258 | |
|
1259 | 1259 | First row of first array: [0 1 2 3 4] |
|
1260 | 1260 | First row of first array: [0 1 2 3 4] |
|
1261 | 1261 | |
|
1262 | 1262 | |
|
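Because the loaded object keeps the archive file open, it can be convenient to copy the arrays into an ordinary dictionary and close the file. The following is a minimal sketch of that pattern, assuming a Numpy version in which the object returned by ``np.load`` can be used in a ``with`` statement; the temporary path and the name ``loaded`` are ours.

```python
import os
import shutil
import tempfile

import numpy as np

arr = np.arange(10).reshape(2, 5)
arr2 = arr.astype(float)

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'test.npz')
np.savez(path, array1=arr, array2=arr2)

# The with-block closes the underlying file when done; copying into a
# plain dict keeps the arrays available afterwards.
with np.load(path) as archive:
    names = sorted(archive.files)
    loaded = dict((name, archive[name]) for name in names)
shutil.rmtree(tmpdir)
```

We sort the names because the order of ``files`` is not something we want to rely on, as the two outputs above suggest.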
1263 | 1263 | This ``.npz`` format is a very convenient way to package compactly and |
|
1264 | 1264 | without loss of information, into a single file, a group of related |
|
1265 | 1265 | arrays that pertain to a specific problem. At some point, however, the |
|
1266 | 1266 | complexity of your dataset may be such that the optimal approach is to |
|
1267 | 1267 | use one of the standard formats in scientific data processing that have |
|
1268 | 1268 | been designed to handle complex datasets, such as NetCDF or HDF5. |
|
1269 | 1269 | |
|
1270 | 1270 | Fortunately, there are tools for manipulating these formats in Python, |
|
1271 | 1271 | and for storing data in other ways such as databases. A complete |
|
1272 | 1272 | survey of the possibilities is beyond the scope of this discussion, |
|
1273 | 1273 | but of particular interest for scientific users we at least mention the |
|
1274 | 1274 | following: |
|
1275 | 1275 | |
|
1276 | 1276 | - The ``scipy.io`` module contains routines to read and write Matlab |
|
1277 | 1277 | files in ``.mat`` format and files in the NetCDF format that is |
|
1278 | 1278 | widely used in certain scientific disciplines. |
|
1279 | 1279 | |
|
1280 | 1280 | - For manipulating files in the HDF5 format, there are two excellent |
|
1281 | 1281 | options in Python: The PyTables project offers a high-level, object |
|
1282 | 1282 | oriented approach to manipulating HDF5 datasets, while the h5py |
|
1283 | 1283 | project offers a more direct mapping to the standard HDF5 library |
|
1284 | 1284 | interface. Both are excellent tools; if you need to work with HDF5 |
|
1285 | 1285 | datasets you should read some of their documentation and examples and |
|
1286 | 1286 | decide which approach is a better match for your needs. |
|
1287 | 1287 | |
|
1288 | 1288 | |
|
1289 | 1289 | |
|
1290 | 1290 | High quality data visualization with Matplotlib |
|
1291 | 1291 | =============================================== |
|
1292 | 1292 | |
|
1293 | 1293 | The `matplotlib <http://matplotlib.sf.net>`_ library is a powerful tool |
|
1294 | 1294 | capable of producing complex publication-quality figures with fine |
|
1295 | 1295 | layout control in two and three dimensions; here we will only provide a |
|
1296 | 1296 | minimal self-contained introduction to its usage that covers the |
|
1297 | 1297 | functionality needed for the rest of the book. We encourage the reader |
|
1298 | 1298 | to read the tutorials included with the matplotlib documentation as well |
|
1299 | 1299 | as to browse its extensive gallery of examples that include source code. |
|
1300 | 1300 | |
|
1301 | 1301 | Just as we typically use the shorthand ``np`` for Numpy, we will use |
|
1302 | 1302 | ``plt`` for the ``matplotlib.pyplot`` module where the easy-to-use |
|
1303 | 1303 | plotting functions reside (the library contains a rich object-oriented |
|
1304 | 1304 | architecture that we don't have the space to discuss here): |
|
1305 | 1305 | |
|
1306 | 1306 | In[59]: |
|
1307 | 1307 | |
|
1308 | 1308 | .. code:: python |
|
1309 | 1309 | |
|
1310 | 1310 | import matplotlib.pyplot as plt |
|
1311 | 1311 | |
|
1312 | 1312 | The most frequently used function is simply called ``plot``; here is how |
|
1313 | 1313 | you can make a simple plot of :math:`\sin(x)` for |
|
1314 | 1314 | :math:`x \in [0, 2\pi]` with labels and a grid (we use the semicolon in |
|
1315 | 1315 | the last line to suppress the display of some information that is |
|
1316 | 1316 | unnecessary right now): |
|
1317 | 1317 | |
|
1318 | 1318 | In[60]: |
|
1319 | 1319 | |
|
1320 | 1320 | .. code:: python |
|
1321 | 1321 | |
|
1322 | 1322 | x = np.linspace(0, 2*np.pi) |
|
1323 | 1323 | y = np.sin(x) |
|
1324 | 1324 | plt.plot(x,y, label='sin(x)') |
|
1325 | 1325 | plt.legend() |
|
1326 | 1326 | plt.grid() |
|
1327 | 1327 | plt.title('Harmonic') |
|
1328 | 1328 | plt.xlabel('x') |
|
1329 | 1329 | plt.ylabel('y'); |
|
1330 | 1330 | |
|
1331 |
.. image:: tests/ipynbref/IntroNumPy |
|
|
1331 | .. image:: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_01.svg | |
|
1332 | 1332 | |
|
1333 | 1333 | You can control the style, color and other properties of the lines and markers, |
|
1334 | 1334 | for example: |
|
1335 | 1335 | |
|
1336 | 1336 | In[61]: |
|
1337 | 1337 | |
|
1338 | 1338 | .. code:: python |
|
1339 | 1339 | |
|
1340 | 1340 | plt.plot(x, y, linewidth=2); |
|
1341 | 1341 | |
|
1342 |
.. image:: tests/ipynbref/IntroNumPy |
|
|
1342 | .. image:: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_02.svg | |
|
1343 | 1343 | |
|
1344 | 1344 | In[62]: |
|
1345 | 1345 | |
|
1346 | 1346 | .. code:: python |
|
1347 | 1347 | |
|
1348 | 1348 | plt.plot(x, y, 'o', markersize=5, color='r'); |
|
1349 | 1349 | |
|
1350 |
.. image:: tests/ipynbref/IntroNumPy |
|
|
1350 | .. image:: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_03.svg | |
|
1351 | 1351 | |
|
1352 | 1352 | We will now see how to create a few other common plot types, such as a |
|
1353 | 1353 | simple error plot: |
|
1354 | 1354 | |
|
1355 | 1355 | In[63]: |
|
1356 | 1356 | |
|
1357 | 1357 | .. code:: python |
|
1358 | 1358 | |
|
1359 | 1359 | # example data |
|
1360 | 1360 | x = np.arange(0.1, 4, 0.5) |
|
1361 | 1361 | y = np.exp(-x) |
|
1362 | 1362 | |
|
1363 | 1363 | # example variable error bar values |
|
1364 | 1364 | yerr = 0.1 + 0.2*np.sqrt(x) |
|
1365 | 1365 | xerr = 0.1 + yerr |
|
1366 | 1366 | |
|
1367 | 1367 | # First illustrate basic pyplot interface, using defaults where possible. |
|
1368 | 1368 | plt.figure() |
|
1369 | 1369 | plt.errorbar(x, y, xerr=0.2, yerr=0.4) |
|
1370 | 1370 | plt.title("Simplest errorbars, 0.2 in x, 0.4 in y"); |
|
1371 | 1371 | |
|
1372 |
.. image:: tests/ipynbref/IntroNumPy |
|
|
1372 | .. image:: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_04.svg | |
|
1373 | 1373 | |
|
1374 | 1374 | A simple log plot: |
|
1375 | 1375 | |
|
1376 | 1376 | In[64]: |
|
1377 | 1377 | |
|
1378 | 1378 | .. code:: python |
|
1379 | 1379 | |
|
1380 | 1380 | x = np.linspace(-5, 5) |
|
1381 | 1381 | y = np.exp(-x**2) |
|
1382 | 1382 | plt.semilogy(x, y); |
|
1383 | 1383 | |
|
1384 |
.. image:: tests/ipynbref/IntroNumPy |
|
|
1384 | .. image:: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_05.svg | |
|
1385 | 1385 | |
|
1386 | 1386 | A histogram annotated with text inside the plot, using the ``text`` |
|
1387 | 1387 | function: |
|
1388 | 1388 | |
|
1389 | 1389 | In[65]: |
|
1390 | 1390 | |
|
1391 | 1391 | .. code:: python |
|
1392 | 1392 | |
|
1393 | 1393 | mu, sigma = 100, 15 |
|
1394 | 1394 | x = mu + sigma * np.random.randn(10000) |
|
1395 | 1395 | |
|
1396 | 1396 | # the histogram of the data |
|
1397 | 1397 | n, bins, patches = plt.hist(x, 50, normed=1, facecolor='g', alpha=0.75) |
|
1398 | 1398 | |
|
1399 | 1399 | plt.xlabel('Smarts') |
|
1400 | 1400 | plt.ylabel('Probability') |
|
1401 | 1401 | plt.title('Histogram of IQ') |
|
1402 | 1402 | # This will put a text fragment at the position given: |
|
1403 | 1403 | plt.text(55, .027, r'$\mu=100,\ \sigma=15$', fontsize=14) |
|
1404 | 1404 | plt.axis([40, 160, 0, 0.03]) |
|
1405 | 1405 | plt.grid(True) |
|
1406 | 1406 | |
|
1407 |
.. image:: tests/ipynbref/IntroNumPy |
|
|
1407 | .. image:: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_06.svg | |
|
1408 | 1408 | |
|
1409 | 1409 | Image display |
|
1410 | 1410 | ------------- |
|
1411 | 1411 | |
|
1412 | 1412 | The ``imshow`` command can display single or multi-channel images. A |
|
1413 | 1413 | simple array of random numbers, plotted in grayscale: |
|
1414 | 1414 | |
|
1415 | 1415 | In[66]: |
|
1416 | 1416 | |
|
1417 | 1417 | .. code:: python |
|
1418 | 1418 | |
|
1419 | 1419 | from matplotlib import cm |
|
1420 | 1420 | plt.imshow(np.random.rand(5, 10), cmap=cm.gray, interpolation='nearest'); |
|
1421 | 1421 | |
|
1422 |
.. image:: tests/ipynbref/IntroNumPy |
|
|
1422 | .. image:: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_07.svg | |
|
1423 | 1423 | |
|
1424 | 1424 | A real photograph is a multichannel image; ``imshow`` interprets it |
|
1425 | 1425 | correctly: |
|
1426 | 1426 | |
|
1427 | 1427 | In[67]: |
|
1428 | 1428 | |
|
1429 | 1429 | .. code:: python |
|
1430 | 1430 | |
|
1431 | 1431 | img = plt.imread('stinkbug.png') |
|
1432 | 1432 | print 'Dimensions of the array img:', img.shape |
|
1433 | 1433 | plt.imshow(img); |
|
1434 | 1434 | |
|
1435 | 1435 | .. parsed-literal:: |
|
1436 | 1436 | |
|
1437 | 1437 | Dimensions of the array img: (375, 500, 3) |
|
1438 | 1438 | |
|
1439 | 1439 | |
|
1440 |
.. image:: tests/ipynbref/IntroNumPy |
|
|
1440 | .. image:: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_08.svg | |
|
1441 | 1441 | |
|
1442 | 1442 | Simple 3d plotting with matplotlib |
|
1443 | 1443 | ---------------------------------- |
|
1444 | 1444 | |
|
1445 | 1445 | Note that you must execute the following at least once in your session: |
|
1446 | 1446 | |
|
1447 | 1447 | In[68]: |
|
1448 | 1448 | |
|
1449 | 1449 | .. code:: python |
|
1450 | 1450 | |
|
1451 | 1451 | from mpl_toolkits.mplot3d import Axes3D |
|
1452 | 1452 | |
|
1453 | 1453 | Once this has been done, you can create 3d axes with the |
|
1454 | 1454 | ``projection='3d'`` keyword to ``add_subplot``: |
|
1455 | 1455 | |
|
1456 | 1456 | :: |
|
1457 | 1457 | |
|
1458 | 1458 | fig = plt.figure() |
|
1459 | 1459 | fig.add_subplot(<other arguments here>, projection='3d') |
|
1460 | 1460 | |
|
1461 | 1461 | |
|
1462 | 1462 | A simple surface plot: |
|
1463 | 1463 | |
|
1464 | 1464 | In[72]: |
|
1465 | 1465 | |
|
1466 | 1466 | .. code:: python |
|
1467 | 1467 | |
|
1468 | 1468 | from mpl_toolkits.mplot3d.axes3d import Axes3D |
|
1469 | 1469 | from matplotlib import cm |
|
1470 | 1470 | |
|
1471 | 1471 | fig = plt.figure() |
|
1472 | 1472 | ax = fig.add_subplot(1, 1, 1, projection='3d') |
|
1473 | 1473 | X = np.arange(-5, 5, 0.25) |
|
1474 | 1474 | Y = np.arange(-5, 5, 0.25) |
|
1475 | 1475 | X, Y = np.meshgrid(X, Y) |
|
1476 | 1476 | R = np.sqrt(X**2 + Y**2) |
|
1477 | 1477 | Z = np.sin(R) |
|
1478 | 1478 | surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.jet, |
|
1479 | 1479 | linewidth=0, antialiased=False) |
|
1480 | 1480 | ax.set_zlim3d(-1.01, 1.01); |
|
1481 | 1481 | |
|
1482 |
.. image:: tests/ipynbref/IntroNumPy |
|
|
1482 | .. image:: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_09.svg | |
|
1483 | 1483 | |
|
1484 | 1484 | IPython: a powerful interactive environment |
|
1485 | 1485 | =========================================== |
|
1486 | 1486 | |
|
1487 | 1487 | A key component of the everyday workflow of most scientific computing |
|
1488 | 1488 | environments is a good interactive environment, that is, a system in |
|
1489 | 1489 | which you can execute small amounts of code and view the results |
|
1490 | 1490 | immediately, combining both printing out data and opening graphical |
|
1491 | 1491 | visualizations. All modern systems for scientific computing, commercial |
|
1492 | 1492 | and open source, include such functionality. |
|
1493 | 1493 | |
|
1494 | 1494 | Out of the box, Python also offers a simple interactive shell with very |
|
1495 | 1495 | limited capabilities. But just like the scientific community built Numpy |
|
1496 | 1496 | to provide arrays suited for scientific work (since Python's lists |
|
1497 | 1497 | aren't optimal for this task), it has also developed an interactive |
|
1498 | 1498 | environment much more sophisticated than the built-in one. The `IPython |
|
1499 | 1499 | project <http://ipython.org>`_ offers a set of tools to make productive |
|
1500 | 1500 | use of the Python language, all the while working interactively and with |
|
1501 | 1501 | immediate feedback on your results. The basic tools that IPython provides |
|
1502 | 1502 | are: |
|
1503 | 1503 | |
|
1504 | 1504 | 1. A powerful terminal shell, with many features designed to increase |
|
1505 | 1505 | the fluidity and productivity of everyday scientific workflows, |
|
1506 | 1506 | including: |
|
1507 | 1507 | |
|
1508 | 1508 | - rich introspection of all objects and variables including easy |
|
1509 | 1509 | access to the source code of any function, |
|
1510 | 1510 | - powerful and extensible tab completion of variables and filenames, |
|
1511 | 1511 | - tight integration with matplotlib, supporting interactive figures |
|
1512 | 1512 | that don't block the terminal, |
|
1513 | 1513 | - direct access to the filesystem and underlying operating system, |
|
1514 | 1514 | - an extensible system for shell-like commands called 'magics' that |
|
1515 | 1515 | reduce the work needed to perform many common tasks, |
|
1516 | 1516 | - tools for easily running, timing, profiling and debugging your |
|
1517 | 1517 | codes, |
|
1518 | 1518 | - syntax highlighted error messages with much more detail than the |
|
1519 | 1519 | default Python ones, |
|
1520 | 1520 | - logging and access to all previous history of inputs, including |
|
1521 | 1521 | across sessions. |
|
1522 | 1522 | |
|
1523 | 1523 | 2. A Qt console that provides the look and feel of a terminal, but adds |
|
1524 | 1524 | support for inline figures, graphical calltips, a persistent session |
|
1525 | 1525 | that can survive crashes (even segfaults) of the kernel process, and |
|
1526 | 1526 | more. |
|
1527 | 1527 | |
|
1528 | 1528 | 3. A web-based notebook that can execute code and also contain rich text |
|
1529 | 1529 | and figures, mathematical equations and arbitrary HTML. This notebook |
|
1530 | 1530 | presents a document-like view with cells where code is executed but |
|
1531 | 1531 | that can be edited in-place, reordered, mixed with explanatory text |
|
1532 | 1532 | and figures, etc. |
|
1533 | 1533 | |
|
1534 | 1534 | 4. A high-performance, low-latency system for parallel computing that |
|
1535 | 1535 | supports the control of a cluster of IPython engines communicating |
|
1536 | 1536 | over a network, with optimizations that minimize unnecessary copying |
|
1537 | 1537 | of large objects (especially numpy arrays). |
|
1538 | 1538 | |
|
1539 | 1539 | We will now discuss the highlights of the tools 1-3 above so that you |
|
1540 | 1540 | can make them an effective part of your workflow. The topic of parallel |
|
1541 | 1541 | computing is beyond the scope of this document, but we encourage you to |
|
1542 | 1542 | read the extensive |
|
1543 | 1543 | `documentation <http://ipython.org/ipython-doc/rel-0.12.1/parallel/index.html>`_ |
|
1544 | 1544 | and `tutorials <http://minrk.github.com/scipy-tutorial-2011/>`_ on this |
|
1545 | 1545 | topic available on the IPython website. |
|
1546 | 1546 | |
|
1547 | 1547 | The IPython terminal |
|
1548 | 1548 | -------------------- |
|
1549 | 1549 | |
|
1550 | 1550 | You can start IPython at the terminal simply by typing: |
|
1551 | 1551 | |
|
1552 | 1552 | :: |
|
1553 | 1553 | |
|
1554 | 1554 | $ ipython |
|
1555 | 1555 | |
|
1556 | 1556 | which will provide you some basic information about how to get started |
|
1557 | 1557 | and will then open a prompt labeled ``In [1]:`` for you to start typing. |
|
1558 | 1558 | Here we type :math:`2^{64}` and Python computes the result for us in |
|
1559 | 1559 | exact arithmetic, returning it as ``Out[1]``: |
|
1560 | 1560 | |
|
1561 | 1561 | :: |
|
1562 | 1562 | |
|
1563 | 1563 | $ ipython |
|
1564 | 1564 | Python 2.7.2+ (default, Oct 4 2011, 20:03:08) |
|
1565 | 1565 | Type "copyright", "credits" or "license" for more information. |
|
1566 | 1566 | |
|
1567 | 1567 | IPython 0.13.dev -- An enhanced Interactive Python. |
|
1568 | 1568 | ? -> Introduction and overview of IPython's features. |
|
1569 | 1569 | %quickref -> Quick reference. |
|
1570 | 1570 | help -> Python's own help system. |
|
1571 | 1571 | object? -> Details about 'object', use 'object??' for extra details. |
|
1572 | 1572 | |
|
1573 | 1573 | In [1]: 2**64 |
|
1574 | 1574 | Out[1]: 18446744073709551616L |
|
1575 | 1575 | |
|
1576 | 1576 | The first thing you should know about IPython is that all your inputs |
|
1577 | 1577 | and outputs are saved. There are two variables named ``In`` and ``Out`` |
|
1578 | 1578 | which are filled as you work with your results. Furthermore, all outputs |
|
1579 | 1579 | are also saved to auto-created variables of the form ``_NN`` where |
|
1580 | 1580 | ``NN`` is the prompt number, and inputs to ``_iNN``. This allows you to |
|
1581 | 1581 | recover quickly the result of a prior computation by referring to its |
|
1582 | 1582 | number even if you forgot to store it as a variable. For example, later |
|
1583 | 1583 | on in the above session you can do: |
|
1584 | 1584 | |
|
1585 | 1585 | :: |
|
1586 | 1586 | |
|
1587 | 1587 | In [6]: print _1 |
|
1588 | 1588 | 18446744073709551616 |
|
1589 | 1589 | |
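To make the numbering scheme concrete, here is a toy model of input/output caching written in plain Python. It is only a sketch of the idea and not IPython's actual implementation: the ``ToyHistory`` class and its ``run`` method are hypothetical names introduced for this illustration.

```python
class ToyHistory:
    """Toy model of numbered input/output caching (not IPython's own code)."""

    def __init__(self):
        self.In = [""]   # In[0] is a placeholder so that numbering starts at 1
        self.Out = {}    # maps prompt number -> result
        self.namespace = {"In": self.In, "Out": self.Out}

    def run(self, source):
        number = len(self.In)
        self.In.append(source)
        self.namespace["_i%d" % number] = source
        try:
            code = compile(source, "<input>", "eval")
        except SyntaxError:
            # Statements (assignments, imports, ...) produce no output value.
            exec(compile(source, "<input>", "exec"), self.namespace)
            return None
        result = eval(code, self.namespace)
        if result is not None:
            self.Out[number] = result
            self.namespace["_%d" % number] = result
        return result


session = ToyHistory()
session.run("2**64")          # stored as Out[1] and as the variable _1
session.run("big = _1 + 1")   # a later input can refer to _1 by number
print(session.In[1], "->", session.Out[1])
print(session.namespace["big"])
```

The second input produces no output, so only the first prompt number appears in ``Out``, while both inputs are recorded in ``In``.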
|
1590 | 1590 | |
|
1591 | 1591 | We strongly recommend that you take a few minutes to read at least the |
|
1592 | 1592 | basic introduction provided by the ``?`` command, and keep in mind that |
|
1593 | 1593 | the ``%quickref`` command can be used at any time as a quick reference |
|
1594 | 1594 | "cheat sheet" of the most frequently used features of IPython. |
|
1595 | 1595 | |
|
1596 | 1596 | At the IPython prompt, any valid Python code that you type will be |
|
1597 | 1597 | executed similarly to the default Python shell (though often with more |
|
1598 | 1598 | informative feedback). But IPython is a *superset* of the default |
|
1599 | 1599 | Python shell; let's have a brief look at some of its additional |
|
1600 | 1600 | functionality. |
|
1601 | 1601 | |
|
1602 | 1602 | **Object introspection** |
|
1603 | 1603 | |
|
1604 | 1604 | A simple ``?`` command provides a general introduction to IPython, but |
|
1605 | 1605 | as indicated in the banner above, you can use the ``?`` syntax to ask |
|
1606 | 1606 | for details about any object. For example, if we type ``_1?``, IPython |
|
1607 | 1607 | will print the following details about this variable: |
|
1608 | 1608 | |
|
1609 | 1609 | :: |
|
1610 | 1610 | |
|
1611 | 1611 | In [14]: _1? |
|
1612 | 1612 | Type: long |
|
1613 | 1613 | Base Class: <type 'long'> |
|
1614 | 1614 | String Form:18446744073709551616 |
|
1615 | 1615 | Namespace: Interactive |
|
1616 | 1616 | Docstring: |
|
1617 | 1617 | long(x[, base]) -> integer |
|
1618 | 1618 | |
|
1619 | 1619 | Convert a string or number to a long integer, if possible. A floating |
|
1620 | 1620 | |
|
1621 | 1621 | [etc... snipped for brevity] |
|
1622 | 1622 | |
|
1623 | 1623 | If you add a second ``?`` and for any object ``x`` type ``x??``, |
|
1624 | 1624 | IPython will try to provide an even more detailed analysis of the |
|
1625 | 1625 | object, including its syntax-highlighted source code when it can be |
|
1626 | 1626 | found. It's possible that ``x??`` returns the same information as |
|
1627 | 1627 | ``x?``, but in many cases ``x??`` will indeed provide additional |
|
1628 | 1628 | details. |
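Outside of IPython, the kind of information that ``x?`` and ``x??`` display can be approximated with the standard ``inspect`` module. The sketch below is an assumption about what is useful to collect, not a description of how IPython gathers it; ``describe`` is a hypothetical helper.

```python
import inspect
import json


def describe(obj, detailed=False):
    """Collect details similar in spirit to ``obj?`` (or ``obj??`` if detailed)."""
    info = {
        "type": type(obj).__name__,
        "string_form": repr(obj),
        "docstring": inspect.getdoc(obj),
    }
    if detailed:
        try:
            info["source"] = inspect.getsource(obj)
        except (TypeError, OSError):
            # Plain values and objects implemented in C have no Python source.
            info["source"] = None
    return info


number_info = describe(2**64, detailed=True)
function_info = describe(json.dumps, detailed=True)

print(number_info["type"], number_info["string_form"])
if function_info["source"] is not None:
    # Show only the first line of the function's source code.
    print(function_info["source"].splitlines()[0])
```

For the integer, no source can be found, which mirrors the case where ``x??`` shows nothing beyond ``x?``; for a function written in Python, the source is usually available.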
|
1629 | 1629 | |
|
1630 | 1630 | Finally, the ``?`` syntax is also useful to search *namespaces* with |
|
1631 | 1631 | wildcards. Suppose you are wondering if there is any function in Numpy |
|
1632 | 1632 | that may do text-related things; with ``np.*txt*?``, IPython will print |
|
1633 | 1633 | all the names in the ``np`` namespace (our Numpy shorthand) that have |
|
1634 | 1634 | 'txt' anywhere in their name: |
|
1635 | 1635 | |
|
1636 | 1636 | :: |
|
1637 | 1637 | |
|
1638 | 1638 | In [17]: np.*txt*? |
|
1639 | 1639 | np.genfromtxt |
|
1640 | 1640 | np.loadtxt |
|
1641 | 1641 | np.mafromtxt |
|
1642 | 1642 | np.ndfromtxt |
|
1643 | 1643 | np.recfromtxt |
|
1644 | 1644 | np.savetxt |
|
1645 | 1645 | |
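A similar wildcard search can be reproduced in plain Python by filtering the output of ``dir`` with the standard ``fnmatch`` module. This is a minimal sketch that uses the standard library's ``math`` module so that it runs without Numpy; ``search_namespace`` is a hypothetical helper name.

```python
import fnmatch
import math


def search_namespace(module, pattern):
    """Return the names in ``module`` that match a shell-style wildcard pattern."""
    return fnmatch.filter(dir(module), pattern)


# All names in the math module that contain 'log' anywhere.
print(search_namespace(math, "*log*"))
```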
|
1646 | 1646 | |
|
1647 | 1647 | **Tab completion** |
|
1648 | 1648 | |
|
1649 | 1649 | IPython makes the tab key work extra hard for you as a way to rapidly |
|
1650 | 1650 | inspect objects and libraries. Whenever you have typed something at the |
|
1651 | 1651 | prompt, by hitting the ``<tab>`` key IPython will try to complete the |
|
1652 | 1652 | rest of the line. For this, IPython will analyze the text you have typed so far |
|
1653 | 1653 | and try to search for Python data or files that may match the context |
|
1654 | 1654 | you have already provided. |
|
1655 | 1655 | |
|
1656 | 1656 | For example, if you type ``np.load`` and hit the key, you'll see: |
|
1657 | 1657 | |
|
1658 | 1658 | :: |
|
1659 | 1659 | |
|
1660 | 1660 | In [21]: np.load<TAB HERE> |
|
1661 | 1661 | np.load np.loads np.loadtxt |
|
1662 | 1662 | |
|
1663 | 1663 | so you can quickly find all the load-related functionality in numpy. Tab |
|
1664 | 1664 | completion works even for function arguments, for example consider this |
|
1665 | 1665 | function definition: |
|
1666 | 1666 | |
|
1667 | 1667 | :: |
|
1668 | 1668 | |
|
1669 | 1669 | In [20]: def f(x, frobinate=False): |
|
1670 | 1670 |    ....:     if frobinate: |
|
1671 | 1671 |    ....:         return x**2 |
|
1672 | 1672 |    ....: |
|
1673 | 1673 | |
|
1674 | 1674 | If you now use the ``<tab>`` key after having typed 'fro' you'll get all |
|
1675 | 1675 | valid Python completions, but those marked with ``=`` at the end are |
|
1676 | 1676 | known to be keywords of your function: |
|
1677 | 1677 | |
|
1678 | 1678 | :: |
|
1679 | 1679 | |
|
1680 | 1680 | In [21]: f(2, fro<TAB HERE> |
|
1681 | 1681 | frobinate= frombuffer fromfunction frompyfunc fromstring |
|
1682 | 1682 | from fromfile fromiter fromregex frozenset |
|
1683 | 1683 | |
|
1684 | 1684 | at this point you can add the ``b`` letter and hit ``<tab>`` once more, |
|
1685 | 1685 | and IPython will finish the line for you: |
|
1686 | 1686 | |
|
1687 | 1687 | :: |
|
1688 | 1688 | |
|
1689 | 1689 | In [21]: f(2, frobinate= |
|
1690 | 1690 | |
|
1691 | 1691 | As a beginner, simply get into the habit of using ``<tab>`` after most |
|
1692 | 1692 | objects; it should quickly become second nature as you will see how |
|
1693 | 1693 | it helps keep a fluid workflow and discover useful information. Later on |
|
1694 | 1694 | you can also customize this behavior by writing your own completion |
|
1695 | 1695 | code, if you so desire. |
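To get a feeling for how namespace-based completion works, the sketch below uses ``rlcompleter`` from the Python standard library. Note that this is the completer behind the default Python shell, not IPython's own (more capable) completion machinery, and ``complete_all`` is a hypothetical helper written for this example.

```python
import math
import rlcompleter


def complete_all(text, namespace):
    """Return every completion the standard completer offers for ``text``."""
    completer = rlcompleter.Completer(namespace)
    matches = []
    state = 0
    while True:
        match = completer.complete(text, state)
        if match is None:
            break
        matches.append(match)
        state += 1
    return matches


namespace = {"math": math, "frobinate": True}

# Names starting with 'fro': keywords, our own variable, and builtins.
print(complete_all("fro", namespace))

# Attribute completion on a module found in the namespace.
print(complete_all("math.lo", namespace))
```

The completer is queried repeatedly with an increasing ``state`` until it returns ``None``, which is the same protocol the ``readline`` library uses when you press the tab key.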
|
1696 | 1696 | |
|
1697 | 1697 | **Matplotlib integration** |
|
1698 | 1698 | |
|
1699 | 1699 | One of the most useful features of IPython for scientists is its tight |
|
1700 | 1700 | integration with matplotlib: at the terminal IPython lets you open |
|
1701 | 1701 | matplotlib figures without blocking your typing (which is what happens |
|
1702 | 1702 | if you try to do the same thing at the default Python shell), and in the |
|
1703 | 1703 | Qt console and notebook you can even view your figures embedded in your |
|
1704 | 1704 | workspace next to the code that created them. |
|
1705 | 1705 | |
|
1706 | 1706 | The matplotlib support can be either activated when you start IPython by |
|
1707 | 1707 | passing the ``--pylab`` flag, or at any point later in your session by |
|
1708 | 1708 | using the ``%pylab`` command. If you start IPython with ``--pylab``, |
|
1709 | 1709 | you'll see something like this (note the extra message about pylab): |
|
1710 | 1710 | |
|
1711 | 1711 | :: |
|
1712 | 1712 | |
|
1713 | 1713 | $ ipython --pylab |
|
1714 | 1714 | Python 2.7.2+ (default, Oct 4 2011, 20:03:08) |
|
1715 | 1715 | Type "copyright", "credits" or "license" for more information. |
|
1716 | 1716 | |
|
1717 | 1717 | IPython 0.13.dev -- An enhanced Interactive Python. |
|
1718 | 1718 | ? -> Introduction and overview of IPython's features. |
|
1719 | 1719 | %quickref -> Quick reference. |
|
1720 | 1720 | help -> Python's own help system. |
|
1721 | 1721 | object? -> Details about 'object', use 'object??' for extra details. |
|
1722 | 1722 | |
|
1723 | 1723 | Welcome to pylab, a matplotlib-based Python environment [backend: Qt4Agg]. |
|
1724 | 1724 | For more information, type 'help(pylab)'. |
|
1725 | 1725 | |
|
1726 | 1726 | In [1]: |
|
1727 | 1727 | |
|
1728 | 1728 | Furthermore, IPython will import ``numpy`` with the ``np`` shorthand, |
|
1729 | 1729 | ``matplotlib.pyplot`` as ``plt``, and it will also load all of the numpy |
|
1730 | 1730 | and pyplot top-level names so that you can directly type something like: |
|
1731 | 1731 | |
|
1732 | 1732 | :: |
|
1733 | 1733 | |
|
1734 | 1734 | In [1]: x = linspace(0, 2*pi, 200) |
|
1735 | 1735 | |
|
1736 | 1736 | In [2]: plot(x, sin(x)) |
|
1737 | 1737 | Out[2]: [<matplotlib.lines.Line2D at 0x9e7c16c>] |
|
1738 | 1738 | |
|
1739 | 1739 | instead of having to prefix each call with its module name (as we |
|
1740 | 1740 | have been doing in the examples thus far): |
|
1741 | 1741 | |
|
1742 | 1742 | :: |
|
1743 | 1743 | |
|
1744 | 1744 | In [3]: x = np.linspace(0, 2*np.pi, 200) |
|
1745 | 1745 | |
|
1746 | 1746 | In [4]: plt.plot(x, np.sin(x)) |
|
1747 | 1747 | Out[4]: [<matplotlib.lines.Line2D at 0x9e900ac>] |
|
1748 | 1748 | |
|
1749 | 1749 | This shorthand notation can be a huge time-saver when working |
|
1750 | 1750 | interactively (it's a few characters but you are likely to type them |
|
1751 | 1751 | hundreds of times in a session). But we should note that as you develop |
|
1752 | 1752 | persistent scripts and notebooks meant for reuse, it's best to get in |
|
1753 | 1753 | the habit of using the longer notation (known as *fully qualified names*), |
|
1754 | 1754 | as it's clearer where things come from and it makes for more robust, |
|
1755 | 1755 | readable and maintainable code in the long run. |
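One concrete reason to prefer fully qualified names is that loading all of a module's top-level names can silently replace Python builtins of the same name. The sketch below lists such collisions for a given module; it uses the standard ``os`` module only as an illustration, and ``star_import_collisions`` is a hypothetical helper.

```python
import builtins
import os


def star_import_collisions(module):
    """List the names a ``from module import *`` would bind over a builtin."""
    exported = getattr(module, "__all__", None)
    if exported is None:
        # Without __all__, a star import takes every public name.
        exported = [name for name in dir(module) if not name.startswith("_")]
    return sorted(name for name in exported if hasattr(builtins, name))


# 'open' is among the results: os.open would replace the builtin open.
print(star_import_collisions(os))
```

With ``import os`` and calls such as ``os.open(...)``, both functions stay available and it is always clear which one is meant.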
|
1756 | 1756 | |
|
1757 | 1757 | **Access to the operating system and files** |
|
1758 | 1758 | |
|
1759 | 1759 | In IPython, you can type ``ls`` to see your files or ``cd`` to change |
|
1760 | 1760 | directories, just like you would at a regular system prompt: |
|
1761 | 1761 | |
|
1762 | 1762 | :: |
|
1763 | 1763 | |
|
1764 | 1764 | In [2]: cd tests |
|
1765 | 1765 | /home/fperez/ipython/nbconvert/tests |
|
1766 | 1766 | |
|
1767 | 1767 | In [3]: ls test.* |
|
1768 | 1768 | test.aux test.html test.ipynb test.log test.out test.pdf test.rst test.tex |
|
1769 | 1769 | |
|
1770 | 1770 | Furthermore, if you use the ``!`` at the beginning of a line, any |
|
1771 | 1771 | commands you pass afterwards go directly to the operating system: |
|
1772 | 1772 | |
|
1773 | 1773 | :: |
|
1774 | 1774 | |
|
1775 | 1775 | In [4]: !echo "Hello IPython" |
|
1776 | 1776 | Hello IPython |
|
1777 | 1777 | |
|
1778 | 1778 | IPython offers a useful twist in this feature: it will substitute in the |
|
1779 | 1779 | command the value of any *Python* variable you may have if you prepend |
|
1780 | 1780 | it with a ``$`` sign: |
|
1781 | 1781 | |
|
1782 | 1782 | :: |
|
1783 | 1783 | |
|
1784 | 1784 | In [5]: message = 'IPython interpolates from Python to the shell' |
|
1785 | 1785 | |
|
1786 | 1786 | In [6]: !echo $message |
|
1787 | 1787 | IPython interpolates from Python to the shell |
|
1788 | 1788 | |
|
1789 | 1789 | This feature can be extremely useful, as it lets you combine the power |
|
1790 | 1790 | and clarity of Python for complex logic with the immediacy and |
|
1791 | 1791 | familiarity of many shell commands. Additionally, if you start the line |
|
1792 | 1792 | with *two* ``!`` signs (``!!``), the output of the command will be automatically |
|
1793 | 1793 | captured as a list of lines, e.g.: |
|
1794 | 1794 | |
|
1795 | 1795 | :: |
|
1796 | 1796 | |
|
1797 | 1797 | In [10]: !!ls test.* |
|
1798 | 1798 | Out[10]: |
|
1799 | 1799 | ['test.aux', |
|
1800 | 1800 | 'test.html', |
|
1801 | 1801 | 'test.ipynb', |
|
1802 | 1802 | 'test.log', |
|
1803 | 1803 | 'test.out', |
|
1804 | 1804 | 'test.pdf', |
|
1805 | 1805 | 'test.rst', |
|
1806 | 1806 | 'test.tex'] |
|
1807 | 1807 | |
|
1808 | 1808 | As explained above, you can now use this as the variable ``_10``. If you |
|
1809 | 1809 | directly want to capture the output of a system command to a Python |
|
1810 | 1810 | variable, you can use the syntax ``=!``: |
|
1811 | 1811 | |
|
1812 | 1812 | :: |
|
1813 | 1813 | |
|
1814 | 1814 | In [11]: testfiles =! ls test.* |
|
1815 | 1815 | |
|
1816 | 1816 | In [12]: print testfiles |
|
1817 | 1817 | ['test.aux', 'test.html', 'test.ipynb', 'test.log', 'test.out', 'test.pdf', 'test.rst', 'test.tex'] |
|
1818 | 1818 | |
|
1819 | 1819 | Finally, the special ``%alias`` command lets you define names that are |
|
1820 | 1820 | shorthands for system commands, so that you can type them without having |
|
1821 | 1821 | to prefix them via ``!`` explicitly (for example, ``ls`` is an alias |
|
1822 | 1822 | that has been predefined for you at startup). |
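For comparison, this is roughly what capturing the output of a system command as a list of lines looks like in a plain Python script, using the standard ``subprocess`` module. It is a sketch of an equivalent, not of how IPython implements ``!!`` or ``=!``; ``capture_lines`` is a hypothetical helper, and the child process is the Python interpreter itself so that the example does not depend on any particular shell command.

```python
import subprocess
import sys


def capture_lines(command):
    """Run a command and return its standard output as a list of lines."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout.splitlines()


message = "IPython interpolates from Python to the shell"

# Pass the Python variable to the child process as a command-line argument.
command = [
    sys.executable,
    "-c",
    "import sys; print(sys.argv[1]); print('second line')",
    message,
]
lines = capture_lines(command)
print(lines)
```

The IPython syntax saves you all of this bookkeeping when working interactively, but the explicit form is the better choice inside reusable scripts.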
|
1823 | 1823 | |
|
1824 | 1824 | **Magic commands** |
|
1825 | 1825 | |
|
1826 | 1826 | IPython has a system for special commands, called 'magics', that let you |
|
1827 | 1827 | control IPython itself and perform many common tasks with a more |
|
1828 | 1828 | shell-like syntax: it uses spaces for delimiting arguments, flags can be |
|
1829 | 1829 | set with dashes and all arguments are treated as strings, so no |
|
1830 | 1830 | additional quoting is required. This kind of syntax is invalid in the |
|
1831 | 1831 | Python language but very convenient for interactive typing (fewer |
|
1832 | 1832 | parentheses, commas and quotes everywhere); IPython distinguishes the |
|
1833 | 1833 | two by detecting lines that start with the ``%`` character. |
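The shell-like syntax described above can be illustrated with a small parser: it recognizes lines starting with ``%``, splits the rest on spaces, and keeps every argument as a string. This is a simplified sketch under our own assumptions, not IPython's real parsing code, and ``parse_magic_line`` is a hypothetical function.

```python
import shlex


def parse_magic_line(line):
    """Split a '%name -flag arg' line into (name, flags, arguments)."""
    if not line.startswith("%"):
        return None  # an ordinary line of Python code, not a magic
    name, _, rest = line[1:].partition(" ")
    tokens = shlex.split(rest)
    flags = [token for token in tokens if token.startswith("-")]
    arguments = [token for token in tokens if not token.startswith("-")]
    return name, flags, arguments


print(parse_magic_line("%run -t randsvd.py"))
print(parse_magic_line("x = 1"))
```

Note that ``randsvd.py`` needs no quotes here, whereas a regular Python function call would require ``run("randsvd.py", t=True)`` or similar.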
|
1834 | 1834 | |
|
1835 | 1835 | You can learn more about the magic system by simply typing ``%magic`` at |
|
1836 | 1836 | the prompt, which will give you a short description plus the |
|
1837 | 1837 | documentation on *all* available magics. If you want to see only a |
|
1838 | 1838 | listing of existing magics, you can use ``%lsmagic``: |
|
1839 | 1839 | |
|
1840 | 1840 | :: |
|
1841 | 1841 | |
|
1842 | 1842 | In [4]: lsmagic |
|
1843 | 1843 | Available magic functions: |
|
1844 | 1844 | %alias %autocall %autoindent %automagic %bookmark %c %cd %colors %config %cpaste |
|
1845 | 1845 | %debug %dhist %dirs %doctest_mode %ds %ed %edit %env %gui %hist %history |
|
1846 | 1846 | %install_default_config %install_ext %install_profiles %load_ext %loadpy %logoff %logon |
|
1847 | 1847 | %logstart %logstate %logstop %lsmagic %macro %magic %notebook %page %paste %pastebin |
|
1848 | 1848 | %pd %pdb %pdef %pdoc %pfile %pinfo %pinfo2 %pop %popd %pprint %precision %profile |
|
1849 | 1849 | %prun %psearch %psource %pushd %pwd %pycat %pylab %quickref %recall %rehashx |
|
1850 | 1850 | %reload_ext %rep %rerun %reset %reset_selective %run %save %sc %stop %store %sx %tb |
|
1851 | 1851 | %time %timeit %unalias %unload_ext %who %who_ls %whos %xdel %xmode |
|
1852 | 1852 | |
|
1853 | 1853 | Automagic is ON, % prefix NOT needed for magic functions. |
|
1854 | 1854 | |
|
1855 | 1855 | Note how the example above omitted the explicit ``%`` marker and simply |
|
1856 | 1856 | uses ``lsmagic``. As long as the 'automagic' feature is on (which it is |
|
1857 | 1857 | by default), you can omit the ``%`` marker as long as there is no |
|
1858 | 1858 | ambiguity with a Python variable of the same name. |
|
1859 | 1859 | |
|
1860 | 1860 | **Running your code** |
|
1861 | 1861 | |
|
1862 | 1862 | While it's easy to type a few lines of code in IPython, for any |
|
1863 | 1863 | long-lived work you should keep your codes in Python scripts (or in |
|
1864 | 1864 | IPython notebooks, see below). Suppose you have a script, in this |
|
1865 | 1865 | case trivially simple for the sake of brevity, named ``simple.py``: |
|
1866 | 1866 | |
|
1867 | 1867 | :: |
|
1868 | 1868 | |
|
1869 | 1869 | In [12]: !cat simple.py |
|
1870 | 1870 | import numpy as np |
|
1871 | 1871 | |
|
1872 | 1872 | x = np.random.normal(size=100) |
|
1873 | 1873 | |
|
1874 | 1874 | print 'First element of x:', x[0] |
|
1875 | 1875 | |
|
1876 | 1876 | The typical workflow with IPython is to use the ``%run`` magic to |
|
1877 | 1877 | execute your script (you can omit the .py extension if you want). When |
|
1878 | 1878 | you run it, the script will execute just as if it had been run at the |
|
1879 | 1879 | system prompt with ``python simple.py`` (though since modules don't get |
|
1880 | 1880 | re-executed on new imports by Python, all system initialization is |
|
1881 | 1881 | essentially free, which can have a significant run time impact in some |
|
1882 | 1882 | cases): |
|
1883 | 1883 | |
|
1884 | 1884 | :: |
|
1885 | 1885 | |
|
1886 | 1886 | In [13]: run simple |
|
1887 | 1887 | First element of x: -1.55872256289 |
|
1888 | 1888 | |
|
1889 | 1889 | Once it completes, all variables defined in it become available for you |
|
1890 | 1890 | to use interactively: |
|
1891 | 1891 | |
|
1892 | 1892 | :: |
|
1893 | 1893 | |
|
1894 | 1894 | In [14]: x.shape |
|
1895 | 1895 | Out[14]: (100,) |
|
1896 | 1896 | |
|
1897 | 1897 | This allows you to plot data, try out ideas, etc, in a |
|
1898 | 1898 | ``%run``/interact/edit cycle that can be very productive. As you start |
|
1899 | 1899 | understanding your problem better you can refine your script further, |
|
1900 | 1900 | incrementally improving it based on the work you do at the IPython |
|
1901 | 1901 | prompt. At any point you can use the ``%hist`` magic to print out your |
|
1902 | 1902 | history without prompts, so that you can copy useful fragments back into |
|
1903 | 1903 | the script. |
|
1904 | 1904 | |
|
1905 | 1905 | By default, ``%run`` executes scripts in a completely empty namespace, |
|
1906 | 1906 | to better mimic how they would execute at the system prompt with plain |
|
1907 | 1907 | Python. But if you use the ``-i`` flag, the script will also see your |
|
1908 | 1908 | interactively defined variables. This lets you edit in a script larger |
|
1909 | 1909 | amounts of code that still behave as if you had typed them at the |
|
1910 | 1910 | IPython prompt. |
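The difference between running a script in an empty namespace and running it with access to existing variables can be reproduced with the standard ``runpy`` module. This sketch is only an analogy to ``%run`` and ``%run -i`` (we are not claiming IPython uses ``runpy``); the script contents and the ``scale`` variable are made up for the example.

```python
import os
import runpy
import tempfile

# A tiny script that uses 'scale' if it already exists, else a default of 1.
script_source = """\
x = [i * i for i in range(5)]
try:
    scale
except NameError:
    scale = 1
y = [scale * value for value in x]
"""

with tempfile.TemporaryDirectory() as tmpdir:
    script_path = os.path.join(tmpdir, "simple.py")
    with open(script_path, "w") as handle:
        handle.write(script_source)

    # Fresh namespace: the script sees nothing we defined beforehand.
    fresh = runpy.run_path(script_path, run_name="__main__")

    # Pre-populated namespace: the script also sees our 'scale' variable.
    seeded = runpy.run_path(
        script_path, init_globals={"scale": 10}, run_name="__main__"
    )

# The returned dictionaries hold the variables the script defined.
print(fresh["y"])
print(seeded["y"])
```

In both cases the variables created by the script are available afterwards, which is what makes the run/interact/edit cycle possible.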
|
1911 | 1911 | |
|
1912 | 1912 | You can also get a summary of the time taken by your script with the |
|
1913 | 1913 | ``-t`` flag; consider a different script ``randsvd.py`` that takes a bit |
|
1914 | 1914 | longer to run: |
|
1915 | 1915 | |
|
1916 | 1916 | :: |
|
1917 | 1917 | |
|
1918 | 1918 | In [21]: run -t randsvd.py |
|
1919 | 1919 | |
|
1920 | 1920 | IPython CPU timings (estimated): |
|
1921 | 1921 | User : 0.38 s. |
|
1922 | 1922 | System : 0.04 s. |
|
1923 | 1923 | Wall time: 0.34 s. |
|
1924 | 1924 | |
|
1925 | 1925 | ``User`` is the time spent by the computer executing your code, while |
|
1926 | 1926 | ``System`` is the time the operating system had to work on your behalf, |
|
1927 | 1927 | doing things like memory allocation that are needed by your code but |
|
1928 | 1928 | that you didn't explicitly program and that happen inside the kernel. |
|
1929 | 1929 | The ``Wall time`` is the time on a 'clock on the wall' between the start |
|
1930 | 1930 | and end of your program. |
|
1931 | 1931 | |
|
1932 | 1932 | If ``Wall > User+System``, your code is most likely waiting idle for |
|
1933 | 1933 | certain periods. That could be waiting for data to arrive from a remote |
|
1934 | 1934 | source or perhaps because the operating system has to swap large amounts |
|
1935 | 1935 | of virtual memory. If you know that your code doesn't explicitly wait |
|
1936 | 1936 | for remote data to arrive, you should investigate further to identify |
|
1937 | 1937 | possible ways of improving the performance profile. |
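The three quantities can also be measured by hand with the standard library, which makes the distinction easy to explore. In this sketch, ``timed`` is a hypothetical helper, and the workload is a deliberate ``time.sleep`` call, so the process sits idle and the wall time exceeds the CPU time.

```python
import os
import time


def timed(func):
    """Run func() and report user, system and wall-clock time in seconds."""
    cpu_start = os.times()
    wall_start = time.perf_counter()
    func()
    wall_end = time.perf_counter()
    cpu_end = os.times()
    return {
        "user": cpu_end.user - cpu_start.user,
        "system": cpu_end.system - cpu_start.system,
        "wall": wall_end - wall_start,
    }


# Sleeping uses almost no CPU, so Wall > User + System here.
report = timed(lambda: time.sleep(0.2))
print(report)
```

Replacing the sleep with a CPU-bound computation should bring the user time much closer to the wall time.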
|
1938 | 1938 | |
|
1939 | 1939 | If you only want to time how long a single statement takes, you don't |
|
1940 | 1940 | need to put it into a script as you can use the ``%timeit`` magic, which |
|
1941 | 1941 | uses Python's ``timeit`` module to very carefully measure timing data; |
|
1942 | 1942 | ``timeit`` can measure even short statements that execute extremely |
|
1943 | 1943 | fast: |
|
1944 | 1944 | |
|
1945 | 1945 | :: |
|
1946 | 1946 | |
|
1947 | 1947 | In [27]: %timeit a=1 |
|
1948 | 1948 | 10000000 loops, best of 3: 23 ns per loop |
|
1949 | 1949 | |
|
1950 | 1950 | and for code that runs longer, it automatically adjusts so the overall |
|
1951 | 1951 | measurement doesn't take too long: |
|
1952 | 1952 | |
|
1953 | 1953 | :: |
|
1954 | 1954 | |
|
1955 | 1955 | In [28]: %timeit np.linalg.svd(x) |
|
1956 | 1956 | 1 loops, best of 3: 310 ms per loop |
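The same kind of measurement can be made directly with the ``timeit`` module in a script. The sketch below mimics the "best of 3" report by letting ``Timer.autorange`` pick the number of loops; the exact loop-selection strategy used by ``%timeit`` may differ from this.

```python
import timeit

timer = timeit.Timer("a = 1")

# Let timeit choose a loop count large enough for a reliable measurement.
loops, _ = timer.autorange()

# Repeat the measurement three times and keep the fastest run.
best_total = min(timer.repeat(repeat=3, number=loops))
best_per_loop = best_total / loops

print("%d loops, best of 3: %.1f ns per loop" % (loops, best_per_loop * 1e9))
```

Taking the minimum rather than the mean is the usual convention, since slower runs mostly reflect interference from other processes rather than the cost of the statement itself.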
|
1957 | 1957 | |
|
1958 | 1958 | The ``%run`` magic has further options for debugging and profiling |
|
1959 | 1959 | your code; you should read its documentation for many useful details (as |
|
1960 | 1960 | always, just type ``%run?``). |
|
1961 | 1961 | |
|
1962 | 1962 | The graphical Qt console |
|
1963 | 1963 | ------------------------ |
|
1964 | 1964 | |
|
1965 | 1965 | If you type at the system prompt (see the IPython website for |
|
1966 | 1966 | installation details, as this requires some additional libraries): |
|
1967 | 1967 | |
|
1968 | 1968 | :: |
|
1969 | 1969 | |
|
1970 | 1970 | $ ipython qtconsole |
|
1971 | 1971 | |
|
1972 | 1972 | instead of opening in a terminal as before, IPython will start a |
|
1973 | 1973 | graphical console that at first sight appears just like a terminal, but |
|
1974 | 1974 | which is in fact much more capable than a text-only terminal. This is a |
|
1975 | 1975 | specialized terminal designed for interactive scientific work, and it |
|
1976 | 1976 | supports full multi-line editing with color highlighting and graphical |
|
1977 | 1977 | calltips for functions, it can keep multiple IPython sessions open |
|
1978 | 1978 | simultaneously in tabs, and when scripts run it can display the figures |
|
1979 | 1979 | inline directly in the work area. |
|
1980 | 1980 | |
|
1981 | 1981 | .. raw:: html |
|
1982 | 1982 | |
|
1983 | 1983 | <center> |
|
1984 | 1984 | |
|
1985 | 1985 | .. raw:: html |
|
1986 | 1986 | |
|
1987 | 1987 | </center> |
|
1988 | 1988 | |
|
1989 | 1989 | |
|
1990 | 1990 | % This cell is for the pdflatex output only |
|
1991 | 1991 | \begin{figure}[htbp] |
|
1992 | 1992 | \centering |
|
1993 | 1993 | \includegraphics[width=3in]{ipython_qtconsole2.png} |
|
1994 | 1994 | \caption{The IPython Qt console: a lightweight terminal for scientific exploration, with code, results and graphics in a single environment.} |
|
1995 | 1995 | \end{figure} |
|
1996 | 1996 | The Qt console accepts the same ``--pylab`` startup flags as the |
|
1997 | 1997 | terminal, but you can additionally supply the value ``--pylab inline``, |
|
1998 | 1998 | which enables the support for inline graphics shown in the figure. This |
|
1999 | 1999 | is ideal for keeping all the code and figures in the same session, given |
|
2000 | 2000 | that the console can save the output of your entire session to HTML or |
|
2001 | 2001 | PDF. |
|
2002 | 2002 | |
|
2003 | 2003 | Since the Qt console makes it far more convenient than the terminal to |
|
2004 | 2004 | edit blocks of code with multiple lines, in this environment it's worth |
|
2005 | 2005 | knowing about the ``%loadpy`` magic function. ``%loadpy`` takes a path |
|
2006 | 2006 | to a local file or remote URL, fetches its contents, and puts it in the |
|
2007 | 2007 | work area for you to further edit and execute. It can be an extremely |
|
2008 | 2008 | fast and convenient way of loading code from local disk or remote |
|
2009 | 2009 | examples from sites such as the `Matplotlib |
|
2010 | 2010 | gallery <http://matplotlib.sourceforge.net/gallery.html>`_. |
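The fetching step that ``%loadpy`` performs can be sketched with the standard library as follows. This is an illustration of the idea only: ``fetch_source`` is a hypothetical function, it does not place the text in any work area, and the example exercises just the local-file branch so that it runs without network access.

```python
import os
import tempfile
import urllib.request


def fetch_source(location):
    """Return the text of a script given a local path or an http(s) URL."""
    if location.startswith(("http://", "https://")):
        with urllib.request.urlopen(location) as response:
            return response.read().decode("utf-8")
    with open(location, encoding="utf-8") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.py")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("print('loaded')\n")
    text = fetch_source(path)

# The text is only retrieved, not executed, so it can be reviewed first.
print(text)
```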
|
2011 | 2011 | |
|
2012 | 2012 | Other than its enhanced capabilities for code and graphics, all of the |
|
2013 | 2013 | features of IPython we've explained before remain functional in this |
|
2014 | 2014 | graphical console. |
|
2015 | 2015 | |
|
2016 | 2016 | The IPython Notebook |
|
2017 | 2017 | -------------------- |
|
2018 | 2018 | |
|
2019 | 2019 | The third way to interact with IPython, in addition to the terminal and |
|
2020 | 2020 | graphical Qt console, is a powerful web interface called the "IPython |
|
2021 | 2021 | Notebook". If you run at the system console (you can omit the ``pylab`` |
|
2022 | 2022 | flags if you don't need plotting support): |
|
2023 | 2023 | |
|
2024 | 2024 | :: |
|
2025 | 2025 | |
|
2026 | 2026 | $ ipython notebook --pylab inline |
|
2027 | 2027 | |
|
2028 | 2028 | IPython will start a process that runs a web server on your local |
|
2029 | 2029 | machine and to which a web browser can connect. The Notebook is a |
|
2030 | 2030 | workspace that lets you execute code in blocks called 'cells' and |
|
2031 | 2031 | displays any results and figures, but which can also contain arbitrary |
|
2032 | 2032 | text (including LaTeX-formatted mathematical expressions) and any rich |
|
2033 | 2033 | media that a modern web browser is capable of displaying. |
|
2034 | 2034 | |
|
2035 | 2035 | .. raw:: html |
|
2036 | 2036 | |
|
2037 | 2037 | <center> |
|
2038 | 2038 | |
|
2039 | 2039 | .. raw:: html |
|
2040 | 2040 | |
|
2041 | 2041 | </center> |
|
2042 | 2042 | |
|
2043 | 2043 | |
|
2044 | 2044 | % This cell is for the pdflatex output only |
|
2045 | 2045 | \begin{figure}[htbp] |
|
2046 | 2046 | \centering |
|
2047 | 2047 | \includegraphics[width=3in]{ipython-notebook-specgram-2.png} |
|
2048 | 2048 | \caption{The IPython Notebook: text, equations, code, results, graphics and other multimedia in an open format for scientific exploration and collaboration} |
|
2049 | 2049 | \end{figure} |
|
2050 | 2050 | In fact, this document was written as a Notebook, and only exported to |
|
2051 | 2051 | LaTeX for printing. Inside of each cell, all the features of IPython |
|
2052 | 2052 | that we have discussed before remain functional, since ultimately this |
|
2053 | 2053 | web client is communicating with the same IPython code that runs in the |
|
2054 | 2054 | terminal. But this interface is a much richer and more powerful |
|
2055 | 2055 | environment for maintaining long-term "live and executable" scientific |
|
2056 | 2056 | documents. |
|
2057 | 2057 | |
|
2058 | 2058 | Notebook environments have existed in commercial systems like |
|
2059 | 2059 | Mathematica(TM) and Maple(TM) for a long time; in the open source world |
|
2060 | 2060 | the `Sage <http://sagemath.org>`_ project blazed this particular trail |
|
2061 | 2061 | starting in 2006, and now we bring all the features that have made |
|
2062 | 2062 | IPython such a widely used tool to a Notebook model. |
|
2063 | 2063 | |
|
2064 | 2064 | Since the Notebook runs as a web application, it is possible to |
|
2065 | 2065 | configure it for remote access, letting you run your computations on a |
|
2066 | 2066 | persistent server close to your data, which you can then access remotely |
|
2067 | 2067 | from any browser-equipped computer. We encourage you to read the |
|
2068 | 2068 | extensive documentation provided by the IPython project for details on |
|
2069 | 2069 | how to do this and many more features of the notebook. |
|
2070 | 2070 | |
|
2071 | 2071 | Finally, as we said earlier, IPython also has a high-level and easy to |
|
2072 | 2072 | use set of libraries for parallel computing, that let you control |
|
2073 | 2073 | (interactively if desired) not just one IPython but an entire cluster of |
|
2074 | 2074 | 'IPython engines'. Unfortunately a detailed discussion of these tools is |
|
2075 | 2075 | beyond the scope of this text, but should you need to parallelize your |
|
2076 | 2076 | analysis codes, a quick read of the tutorials and examples provided at |
|
2077 | 2077 | the IPython site may prove fruitful. |