update ref pyfile
Matthias BUSSONNIER -
@@ -1,1181 +1,1181 b''
1 ## An Introduction to the Scientific Python Ecosystem
1 ## An Introduction to the Scientific Python Ecosystem
2
2
3 # While the Python language is an excellent tool for general-purpose programming, with a highly readable syntax, rich and powerful data types (strings, lists, sets, dictionaries, arbitrary length integers, etc.) and a very comprehensive standard library, it was not designed specifically for mathematical and scientific computing. Neither the language nor its standard library has facilities for the efficient representation of multidimensional datasets, tools for linear algebra and general matrix manipulations (an essential building block of virtually all technical computing), or any data visualization facilities.
3 # While the Python language is an excellent tool for general-purpose programming, with a highly readable syntax, rich and powerful data types (strings, lists, sets, dictionaries, arbitrary length integers, etc.) and a very comprehensive standard library, it was not designed specifically for mathematical and scientific computing. Neither the language nor its standard library has facilities for the efficient representation of multidimensional datasets, tools for linear algebra and general matrix manipulations (an essential building block of virtually all technical computing), or any data visualization facilities.
4 #
4 #
5 # In particular, Python lists are very flexible containers that can be nested arbitrarily deep and can hold any Python object, but they are poorly suited to representing common mathematical constructs like vectors and matrices efficiently. In contrast, much of our modern heritage of scientific computing has been built on top of libraries written in the Fortran language, which has native support for vectors and matrices as well as a library of mathematical functions that can efficiently operate on entire arrays at once.
5 # In particular, Python lists are very flexible containers that can be nested arbitrarily deep and can hold any Python object, but they are poorly suited to representing common mathematical constructs like vectors and matrices efficiently. In contrast, much of our modern heritage of scientific computing has been built on top of libraries written in the Fortran language, which has native support for vectors and matrices as well as a library of mathematical functions that can efficiently operate on entire arrays at once.
6
6
7 ### Scientific Python: a collaboration of projects built by scientists
7 ### Scientific Python: a collaboration of projects built by scientists
8
8
9 # The scientific community has developed a set of related Python libraries that provide powerful array facilities, linear algebra, numerical algorithms, data visualization and more. In this appendix, we will briefly outline the tools most frequently used for this purpose, that make "Scientific Python" something far more powerful than the Python language alone.
9 # The scientific community has developed a set of related Python libraries that provide powerful array facilities, linear algebra, numerical algorithms, data visualization and more. In this appendix, we will briefly outline the tools most frequently used for this purpose, that make "Scientific Python" something far more powerful than the Python language alone.
10 #
10 #
11 # For reasons of space, we can only describe in some detail the central Numpy library, but below we provide links to the websites of each project where you can read their documentation in more detail.
11 # For reasons of space, we can only describe in some detail the central Numpy library, but below we provide links to the websites of each project where you can read their documentation in more detail.
12 #
12 #
13 # First, let's look at an overview of the basic tools that most scientists use in daily research with Python. The core of this ecosystem is composed of:
13 # First, let's look at an overview of the basic tools that most scientists use in daily research with Python. The core of this ecosystem is composed of:
14 #
14 #
15 # * Numpy: the basic library that most others depend on. It provides a powerful array type that can represent multidimensional datasets of many different kinds and that supports arithmetic operations. Numpy also provides a library of common mathematical functions, basic linear algebra, random number generation and Fast Fourier Transforms. Numpy can be found at [numpy.scipy.org](http://numpy.scipy.org).
15 # * Numpy: the basic library that most others depend on. It provides a powerful array type that can represent multidimensional datasets of many different kinds and that supports arithmetic operations. Numpy also provides a library of common mathematical functions, basic linear algebra, random number generation and Fast Fourier Transforms. Numpy can be found at [numpy.scipy.org](http://numpy.scipy.org).
16 #
16 #
17 # * Scipy: a large collection of numerical algorithms that operate on numpy arrays and provide facilities for many common tasks in scientific computing, including dense and sparse linear algebra support, optimization, special functions, statistics, n-dimensional image processing, signal processing and more. Scipy can be found at [scipy.org](http://scipy.org).
17 # * Scipy: a large collection of numerical algorithms that operate on numpy arrays and provide facilities for many common tasks in scientific computing, including dense and sparse linear algebra support, optimization, special functions, statistics, n-dimensional image processing, signal processing and more. Scipy can be found at [scipy.org](http://scipy.org).
18 #
18 #
19 # * Matplotlib: a data visualization library with a strong focus on producing high-quality output. It supports a variety of common scientific plot types in two and three dimensions, with precise control over the final output and format for publication-quality results. Matplotlib can also be controlled interactively, allowing graphical manipulation of your data (zooming, panning, etc.), and can be used with most modern user interface toolkits. It can be found at [matplotlib.sf.net](http://matplotlib.sf.net).
19 # * Matplotlib: a data visualization library with a strong focus on producing high-quality output. It supports a variety of common scientific plot types in two and three dimensions, with precise control over the final output and format for publication-quality results. Matplotlib can also be controlled interactively, allowing graphical manipulation of your data (zooming, panning, etc.), and can be used with most modern user interface toolkits. It can be found at [matplotlib.sf.net](http://matplotlib.sf.net).
20 #
20 #
21 # * IPython: while not strictly scientific in nature, IPython is the interactive environment in which many scientists spend their time. IPython provides a powerful Python shell that integrates tightly with Matplotlib and with easy access to the files and operating system, and which can execute in a terminal or in a graphical Qt console. IPython also has a web-based notebook interface that can combine code with text, mathematical expressions, figures and multimedia. It can be found at [ipython.org](http://ipython.org).
21 # * IPython: while not strictly scientific in nature, IPython is the interactive environment in which many scientists spend their time. IPython provides a powerful Python shell that integrates tightly with Matplotlib and with easy access to the files and operating system, and which can execute in a terminal or in a graphical Qt console. IPython also has a web-based notebook interface that can combine code with text, mathematical expressions, figures and multimedia. It can be found at [ipython.org](http://ipython.org).
22 #
22 #
23 # While each of these tools can be installed separately, in our opinion the most convenient way today of accessing them (especially on Windows and Mac computers) is to install the [Free Edition of the Enthought Python Distribution](http://www.enthought.com/products/epd_free.php), which contains all of the above. Other free alternatives on Windows (but not on Macs) are [Python(x,y)](http://code.google.com/p/pythonxy) and [Christoph Gohlke's packages page](http://www.lfd.uci.edu/~gohlke/pythonlibs).
23 # While each of these tools can be installed separately, in our opinion the most convenient way today of accessing them (especially on Windows and Mac computers) is to install the [Free Edition of the Enthought Python Distribution](http://www.enthought.com/products/epd_free.php), which contains all of the above. Other free alternatives on Windows (but not on Macs) are [Python(x,y)](http://code.google.com/p/pythonxy) and [Christoph Gohlke's packages page](http://www.lfd.uci.edu/~gohlke/pythonlibs).
24 #
24 #
25 # These four 'core' libraries are in practice complemented by a number of other tools for more specialized work. We will briefly list here the ones that we think are the most commonly needed:
25 # These four 'core' libraries are in practice complemented by a number of other tools for more specialized work. We will briefly list here the ones that we think are the most commonly needed:
26 #
26 #
27 # * Sympy: a symbolic manipulation tool that turns a Python session into a computer algebra system. It integrates with the IPython notebook, rendering results in properly typeset mathematical notation. [sympy.org](http://sympy.org).
27 # * Sympy: a symbolic manipulation tool that turns a Python session into a computer algebra system. It integrates with the IPython notebook, rendering results in properly typeset mathematical notation. [sympy.org](http://sympy.org).
28 #
28 #
29 # * Mayavi: sophisticated 3d data visualization; [code.enthought.com/projects/mayavi](http://code.enthought.com/projects/mayavi).
29 # * Mayavi: sophisticated 3d data visualization; [code.enthought.com/projects/mayavi](http://code.enthought.com/projects/mayavi).
30 #
30 #
31 # * Cython: a bridge language between Python and C, useful both to optimize performance bottlenecks in Python and to access C libraries directly; [cython.org](http://cython.org).
31 # * Cython: a bridge language between Python and C, useful both to optimize performance bottlenecks in Python and to access C libraries directly; [cython.org](http://cython.org).
32 #
32 #
33 # * Pandas: high-performance data structures and data analysis tools, with powerful data alignment and structural manipulation capabilities; [pandas.pydata.org](http://pandas.pydata.org).
33 # * Pandas: high-performance data structures and data analysis tools, with powerful data alignment and structural manipulation capabilities; [pandas.pydata.org](http://pandas.pydata.org).
34 #
34 #
35 # * Statsmodels: statistical data exploration and model estimation; [statsmodels.sourceforge.net](http://statsmodels.sourceforge.net).
35 # * Statsmodels: statistical data exploration and model estimation; [statsmodels.sourceforge.net](http://statsmodels.sourceforge.net).
36 #
36 #
37 # * Scikit-learn: general purpose machine learning algorithms with a common interface; [scikit-learn.org](http://scikit-learn.org).
37 # * Scikit-learn: general purpose machine learning algorithms with a common interface; [scikit-learn.org](http://scikit-learn.org).
38 #
38 #
39 # * Scikits-image: image processing toolbox; [scikits-image.org](http://scikits-image.org).
39 # * Scikits-image: image processing toolbox; [scikits-image.org](http://scikits-image.org).
40 #
40 #
41 # * NetworkX: analysis of complex networks (in the graph theoretical sense); [networkx.lanl.gov](http://networkx.lanl.gov).
41 # * NetworkX: analysis of complex networks (in the graph theoretical sense); [networkx.lanl.gov](http://networkx.lanl.gov).
42 #
42 #
43 # * PyTables: management of hierarchical datasets using the industry-standard HDF5 format; [www.pytables.org](http://www.pytables.org).
43 # * PyTables: management of hierarchical datasets using the industry-standard HDF5 format; [www.pytables.org](http://www.pytables.org).
44 #
44 #
45 # Beyond these, for any specific problem you should look on the internet first, before starting to write code from scratch. There's a good chance that someone, somewhere, has written an open source library that you can use for part or all of your problem.
45 # Beyond these, for any specific problem you should look on the internet first, before starting to write code from scratch. There's a good chance that someone, somewhere, has written an open source library that you can use for part or all of your problem.
46
46
47 ### A note about the examples below
47 ### A note about the examples below
48
48
49 # In all subsequent examples, you will see blocks of input code, followed by the results of the code if the code generated output. This output may include text, graphics and other result objects. These blocks of input can be pasted into your interactive IPython session or notebook for you to execute. In the print version of this document, a thin vertical bar on the left of the blocks of input and output shows which blocks go together.
49 # In all subsequent examples, you will see blocks of input code, followed by the results of the code if the code generated output. This output may include text, graphics and other result objects. These blocks of input can be pasted into your interactive IPython session or notebook for you to execute. In the print version of this document, a thin vertical bar on the left of the blocks of input and output shows which blocks go together.
50 #
50 #
51 # If you are reading this text as an actual IPython notebook, you can press `Shift-Enter` or use the 'play' button on the toolbar (right-pointing triangle) to execute each block of code, known as a 'cell' in IPython:
51 # If you are reading this text as an actual IPython notebook, you can press `Shift-Enter` or use the 'play' button on the toolbar (right-pointing triangle) to execute each block of code, known as a 'cell' in IPython:
52
52
53 # In[71]:
53 # In[71]:
54 # This is a block of code, below you'll see its output
54 # This is a block of code, below you'll see its output
55 print "Welcome to the world of scientific computing with Python!"
55 print "Welcome to the world of scientific computing with Python!"
56
56
57 # Out[71]:
57 # Out[71]:
58 # Welcome to the world of scientific computing with Python!
58 # Welcome to the world of scientific computing with Python!
59 #
59 #
60 ## Motivation: the trapezoidal rule
60 ## Motivation: the trapezoidal rule
61
61
62 # In subsequent sections we'll provide a basic introduction to the nuts and bolts of the basic scientific python tools; but we'll first motivate it with a brief example that illustrates what you can do in a few lines with these tools. For this, we will use the simple problem of approximating a definite integral with the trapezoid rule:
62 # In subsequent sections we'll provide a basic introduction to the nuts and bolts of the basic scientific python tools; but we'll first motivate it with a brief example that illustrates what you can do in a few lines with these tools. For this, we will use the simple problem of approximating a definite integral with the trapezoid rule:
63 #
63 #
64 # $$
64 # $$
65 # \int_{a}^{b} f(x)\, dx \approx \frac{1}{2} \sum_{k=1}^{N} \left( x_{k} - x_{k-1} \right) \left( f(x_{k}) + f(x_{k-1}) \right).
65 # \int_{a}^{b} f(x)\, dx \approx \frac{1}{2} \sum_{k=1}^{N} \left( x_{k} - x_{k-1} \right) \left( f(x_{k}) + f(x_{k-1}) \right).
66 # $$
66 # $$
67 #
67 #
68 # Our task will be to compute this formula for a function such as:
68 # Our task will be to compute this formula for a function such as:
69 #
69 #
70 # $$
70 # $$
71 # f(x) = (x-3)(x-5)(x-7)+85
71 # f(x) = (x-3)(x-5)(x-7)+85
72 # $$
72 # $$
73 #
73 #
74 # integrated between $a=1$ and $b=9$.
74 # integrated between $a=1$ and $b=9$.
75 #
75 #
76 # First, we define the function and sample it evenly between 0 and 10 at 200 points:
76 # First, we define the function and sample it evenly between 0 and 10 at 200 points:
77
77
78 # In[1]:
78 # In[1]:
79 def f(x):
79 def f(x):
80     return (x-3)*(x-5)*(x-7)+85
80     return (x-3)*(x-5)*(x-7)+85
81
81
82 import numpy as np
82 import numpy as np
83 x = np.linspace(0, 10, 200)
83 x = np.linspace(0, 10, 200)
84 y = f(x)
84 y = f(x)
85
85
86 # We select $a$ and $b$, our integration limits, and we take only a few points in that region to illustrate the error behavior of the trapezoid approximation:
86 # We select $a$ and $b$, our integration limits, and we take only a few points in that region to illustrate the error behavior of the trapezoid approximation:
87
87
88 # In[2]:
88 # In[2]:
89 a, b = 1, 9
89 a, b = 1, 9
90 xint = x[np.logical_and(x>=a, x<=b)][::30]
90 xint = x[np.logical_and(x>=a, x<=b)][::30]
91 yint = y[np.logical_and(x>=a, x<=b)][::30]
91 yint = y[np.logical_and(x>=a, x<=b)][::30]
92
92
93 # Let's plot both the function and the area below it in the trapezoid approximation:
93 # Let's plot both the function and the area below it in the trapezoid approximation:
94
94
95 # In[3]:
95 # In[3]:
96 import matplotlib.pyplot as plt
96 import matplotlib.pyplot as plt
97 plt.plot(x, y, lw=2)
97 plt.plot(x, y, lw=2)
98 plt.axis([0, 10, 0, 140])
98 plt.axis([0, 10, 0, 140])
99 plt.fill_between(xint, 0, yint, facecolor='gray', alpha=0.4)
99 plt.fill_between(xint, 0, yint, facecolor='gray', alpha=0.4)
100 plt.text(0.5 * (a + b), 30,r"$\int_a^b f(x)dx$", horizontalalignment='center', fontsize=20);
100 plt.text(0.5 * (a + b), 30,r"$\int_a^b f(x)dx$", horizontalalignment='center', fontsize=20);
101
101
102 # Out[3]:
102 # Out[3]:
103 # image file: tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_00.svg
103 # image file: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_00.svg
104
104
105 # Compute the integral both at high accuracy and with the trapezoid approximation
105 # Compute the integral both at high accuracy and with the trapezoid approximation
106
106
107 # In[4]:
107 # In[4]:
108 from scipy.integrate import quad, trapz
108 from scipy.integrate import quad, trapz
109 integral, error = quad(f, 1, 9)
109 integral, error = quad(f, 1, 9)
110 trap_integral = trapz(yint, xint)
110 trap_integral = trapz(yint, xint)
111 print "The integral is: %g +/- %.1e" % (integral, error)
111 print "The integral is: %g +/- %.1e" % (integral, error)
112 print "The trapezoid approximation with", len(xint), "points is:", trap_integral
112 print "The trapezoid approximation with", len(xint), "points is:", trap_integral
113 print "The absolute error is:", abs(integral - trap_integral)
113 print "The absolute error is:", abs(integral - trap_integral)
114
114
115 # Out[4]:
115 # Out[4]:
116 # The integral is: 680 +/- 7.5e-12
116 # The integral is: 680 +/- 7.5e-12
117 # The trapezoid approximation with 6 points is: 621.286411141
117 # The trapezoid approximation with 6 points is: 621.286411141
118 # The absolute error is: 58.7135888589
118 # The absolute error is: 58.7135888589
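The trapezoid formula from the start of this section can also be written directly with array slicing. The following is a minimal sketch in Python 3 syntax, not the implementation used by `scipy.integrate`; the helper name `trapezoid_sum` is hypothetical, and the grid is rebuilt the same way as in cells 1 and 2.

```python
import numpy as np

def f(x):
    return (x - 3) * (x - 5) * (x - 7) + 85

def trapezoid_sum(y, x):
    """Hypothetical helper: 0.5 * sum((x_k - x_{k-1}) * (y_k + y_{k-1}))."""
    widths = x[1:] - x[:-1]     # x_k - x_{k-1}
    heights = y[1:] + y[:-1]    # f(x_k) + f(x_{k-1})
    return 0.5 * np.sum(widths * heights)

# Same coarse grid as in the cells above: every 30th sample inside [1, 9].
x = np.linspace(0, 10, 200)
inside = np.logical_and(x >= 1, x <= 9)
xint = x[inside][::30]
yint = f(xint)

trap_integral = trapezoid_sum(yint, xint)
print("Trapezoid sum over", len(xint), "points:", trap_integral)
print("Sampled range:", xint[0], "to", xint[-1])
```

The six sampled points end near x = 8.54, so the stretch up to b = 9 is not covered at all. This contributes to the large absolute error reported above, in addition to the coarseness of the grid.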
119 #
119 #
120 # This simple example showed how, by combining the numpy, scipy and matplotlib libraries, we can illustrate a standard method of elementary calculus in just a few lines of code. We will now discuss the basic usage of these tools in more detail.
120 # This simple example showed how, by combining the numpy, scipy and matplotlib libraries, we can illustrate a standard method of elementary calculus in just a few lines of code. We will now discuss the basic usage of these tools in more detail.
121
121
122 ## NumPy arrays: the right data structure for scientific computing
122 ## NumPy arrays: the right data structure for scientific computing
123
123
124 ### Basics of Numpy arrays
124 ### Basics of Numpy arrays
125
125
126 # We now turn our attention to the Numpy library, which forms the base layer for the entire 'scipy ecosystem'. Once you have installed numpy, you can import it as
126 # We now turn our attention to the Numpy library, which forms the base layer for the entire 'scipy ecosystem'. Once you have installed numpy, you can import it as
127
127
128 # In[5]:
128 # In[5]:
129 import numpy
129 import numpy
130
130
131 # though in this book we will use the common shorthand
131 # though in this book we will use the common shorthand
132
132
133 # In[6]:
133 # In[6]:
134 import numpy as np
134 import numpy as np
135
135
136 # As mentioned above, the main object provided by numpy is a powerful array. We'll start by exploring how the numpy array differs from Python lists. We start by creating a simple list and an array with the same contents of the list:
136 # As mentioned above, the main object provided by numpy is a powerful array. We'll start by exploring how the numpy array differs from Python lists. We start by creating a simple list and an array with the same contents of the list:
137
137
138 # In[7]:
138 # In[7]:
139 lst = [10, 20, 30, 40]
139 lst = [10, 20, 30, 40]
140 arr = np.array([10, 20, 30, 40])
140 arr = np.array([10, 20, 30, 40])
141
141
142 # Elements of a one-dimensional array are accessed with the same syntax as a list:
142 # Elements of a one-dimensional array are accessed with the same syntax as a list:
143
143
144 # In[8]:
144 # In[8]:
145 lst[0]
145 lst[0]
146
146
147 # Out[8]:
147 # Out[8]:
148 # 10
148 # 10
149
149
150
150
151 # In[9]:
151 # In[9]:
152 arr[0]
152 arr[0]
153
153
154 # Out[9]:
154 # Out[9]:
155 # 10
155 # 10
156
156
157
157
158 # In[10]:
158 # In[10]:
159 arr[-1]
159 arr[-1]
160
160
161 # Out[10]:
161 # Out[10]:
162 # 40
162 # 40
163
163
164
164
165 # In[11]:
165 # In[11]:
166 arr[2:]
166 arr[2:]
167
167
168 # Out[11]:
168 # Out[11]:
169 # array([30, 40])
169 # array([30, 40])
170
170
171
171
172 # The first difference to note between lists and arrays is that arrays are *homogeneous*; i.e. all elements of an array must be of the same type. In contrast, lists can contain elements of arbitrary type. For example, we can change the last element in our list above to be a string:
172 # The first difference to note between lists and arrays is that arrays are *homogeneous*; i.e. all elements of an array must be of the same type. In contrast, lists can contain elements of arbitrary type. For example, we can change the last element in our list above to be a string:
173
173
174 # In[12]:
174 # In[12]:
175 lst[-1] = 'a string inside a list'
175 lst[-1] = 'a string inside a list'
176 lst
176 lst
177
177
178 # Out[12]:
178 # Out[12]:
179 # [10, 20, 30, 'a string inside a list']
179 # [10, 20, 30, 'a string inside a list']
180
180
181
181
182 # but the same cannot be done with an array, as we get an error message:
182 # but the same cannot be done with an array, as we get an error message:
183
183
184 # In[13]:
184 # In[13]:
185 arr[-1] = 'a string inside an array'
185 arr[-1] = 'a string inside an array'
186
186
187 # Out[13]:
187 # Out[13]:
188 ---------------------------------------------------------------------------
188 ---------------------------------------------------------------------------
189 ValueError Traceback (most recent call last)
189 ValueError Traceback (most recent call last)
190 /home/fperez/teach/book-math-labtool/<ipython-input-13-29c0bfa5fa8a> in <module>()
190 /home/fperez/teach/book-math-labtool/<ipython-input-13-29c0bfa5fa8a> in <module>()
191 ----> 1 arr[-1] = 'a string inside an array'
191 ----> 1 arr[-1] = 'a string inside an array'
192
192
193 ValueError: invalid literal for long() with base 10: 'a string inside an array'
193 ValueError: invalid literal for long() with base 10: 'a string inside an array'
194
194
195 # The information about the type of an array is contained in its *dtype* attribute:
195 # The information about the type of an array is contained in its *dtype* attribute:
196
196
197 # In[14]:
197 # In[14]:
198 arr.dtype
198 arr.dtype
199
199
200 # Out[14]:
200 # Out[14]:
201 # dtype('int32')
201 # dtype('int32')
202
202
203
203
204 # Once an array has been created, its dtype is fixed and it can only store elements of the same type. For this example where the dtype is integer, if we store a floating point number it will be automatically converted into an integer:
204 # Once an array has been created, its dtype is fixed and it can only store elements of the same type. For this example where the dtype is integer, if we store a floating point number it will be automatically converted into an integer:
205
205
206 # In[15]:
206 # In[15]:
207 arr[-1] = 1.234
207 arr[-1] = 1.234
208 arr
208 arr
209
209
210 # Out[15]:
210 # Out[15]:
211 # array([10, 20, 30, 1])
211 # array([10, 20, 30, 1])
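When fractional values need to be kept, the dtype has to be chosen when the array is created, or a converted copy has to be made. A short sketch in Python 3 syntax (the variable names are illustrative, not from the text above):

```python
import numpy as np

int_arr = np.array([10, 20, 30, 40])
int_arr[-1] = 1.234                 # fractional part is dropped on assignment
print(int_arr.dtype.kind, int_arr)  # kind 'i' means a signed integer dtype

# Requesting a float dtype at creation time keeps the value as given.
float_arr = np.array([10, 20, 30, 40], dtype=float)
float_arr[-1] = 1.234
print(float_arr.dtype, float_arr)

# astype returns a new array with the requested dtype; int_arr is unchanged.
converted = int_arr.astype(float)
print(converted.dtype, converted)
```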
212
212
213
213
214 # Above we created an array from an existing list; let us now see other ways in which we can create arrays. A common need is to have an array initialized with a constant value, and very often this value is 0 or 1 (suitable as starting values for additive and multiplicative loops respectively); `zeros` creates arrays of all zeros, with any desired dtype:
214 # Above we created an array from an existing list; let us now see other ways in which we can create arrays. A common need is to have an array initialized with a constant value, and very often this value is 0 or 1 (suitable as starting values for additive and multiplicative loops respectively); `zeros` creates arrays of all zeros, with any desired dtype:
215
215
216 # In[16]:
216 # In[16]:
217 np.zeros(5, float)
217 np.zeros(5, float)
218
218
219 # Out[16]:
219 # Out[16]:
220 # array([ 0., 0., 0., 0., 0.])
220 # array([ 0., 0., 0., 0., 0.])
221
221
222
222
223 # In[17]:
223 # In[17]:
224 np.zeros(3, int)
224 np.zeros(3, int)
225
225
226 # Out[17]:
226 # Out[17]:
227 # array([0, 0, 0])
227 # array([0, 0, 0])
228
228
229
229
230 # In[18]:
230 # In[18]:
231 np.zeros(3, complex)
231 np.zeros(3, complex)
232
232
233 # Out[18]:
233 # Out[18]:
234 # array([ 0.+0.j, 0.+0.j, 0.+0.j])
234 # array([ 0.+0.j, 0.+0.j, 0.+0.j])
235
235
236
236
237 # and similarly for `ones`:
237 # and similarly for `ones`:
238
238
239 # In[19]:
239 # In[19]:
240 print '5 ones:', np.ones(5)
240 print '5 ones:', np.ones(5)
241
241
242 # Out[19]:
242 # Out[19]:
243 # 5 ones: [ 1. 1. 1. 1. 1.]
243 # 5 ones: [ 1. 1. 1. 1. 1.]
244 #
244 #
245 # If we want an array initialized with an arbitrary value, we can create an empty array and then use the fill method to put the value we want into the array:
245 # If we want an array initialized with an arbitrary value, we can create an empty array and then use the fill method to put the value we want into the array:
246
246
247 # In[20]:
247 # In[20]:
248 a = np.empty(4)
248 a = np.empty(4)
249 a.fill(5.5)
249 a.fill(5.5)
250 a
250 a
251
251
252 # Out[20]:
252 # Out[20]:
253 # array([ 5.5, 5.5, 5.5, 5.5])
253 # array([ 5.5, 5.5, 5.5, 5.5])
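More recent NumPy releases also provide `np.full`, which allocates and fills in a single call. A brief sketch comparing the two approaches, assuming a NumPy version that includes `np.full` (the variable names are illustrative):

```python
import numpy as np

# Two-step approach from the text: allocate, then fill in place.
a_filled = np.empty(4)
a_filled.fill(5.5)

# One-step alternative: shape first, then the fill value.
a_full = np.full(4, 5.5)

print(a_filled)
print(a_full)
print("Same contents:", np.array_equal(a_filled, a_full))
```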
254
254
255
255
256 # Numpy also offers the `arange` function, which works like the builtin `range` but returns an array instead of a list:
256 # Numpy also offers the `arange` function, which works like the builtin `range` but returns an array instead of a list:
257
257
258 # In[21]:
258 # In[21]:
259 np.arange(5)
259 np.arange(5)
260
260
261 # Out[21]:
261 # Out[21]:
262 # array([0, 1, 2, 3, 4])
262 # array([0, 1, 2, 3, 4])
263
263
264
264
265 # and the `linspace` and `logspace` functions to create linearly and logarithmically-spaced grids respectively, with a fixed number of points and including both ends of the specified interval:
265 # and the `linspace` and `logspace` functions to create linearly and logarithmically-spaced grids respectively, with a fixed number of points and including both ends of the specified interval:
266
266
267 # In[22]:
267 # In[22]:
268 print "A linear grid between 0 and 1:", np.linspace(0, 1, 5)
268 print "A linear grid between 0 and 1:", np.linspace(0, 1, 5)
269 print "A logarithmic grid between 10**1 and 10**4: ", np.logspace(1, 4, 4)
269 print "A logarithmic grid between 10**1 and 10**4: ", np.logspace(1, 4, 4)
270
270
271 # Out[22]:
271 # Out[22]:
272 # A linear grid between 0 and 1: [ 0. 0.25 0.5 0.75 1. ]
272 # A linear grid between 0 and 1: [ 0. 0.25 0.5 0.75 1. ]
273 # A logarithmic grid between 10**1 and 10**4: [ 10. 100. 1000. 10000.]
273 # A logarithmic grid between 10**1 and 10**4: [ 10. 100. 1000. 10000.]
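The difference between these grid functions is easiest to see side by side. The sketch below (Python 3 syntax, illustrative variable names) contrasts `linspace`, which fixes the number of points and includes the end point, with `arange`, which fixes the step and excludes it, and shows that `logspace` is a power of 10 applied to a linear grid of exponents:

```python
import numpy as np

lin = np.linspace(0, 1, 5)       # 5 points, both ends included
step = np.arange(0, 1, 0.25)     # fixed step of 0.25, end point excluded
log = np.logspace(1, 4, 4)       # exponents 1, 2, 3, 4 in base 10
same_log = 10.0 ** np.linspace(1, 4, 4)

print("linspace:", lin)
print("arange:  ", step)
print("logspace:", log)
print("logspace matches 10**linspace:", np.allclose(log, same_log))
```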
274 #
274 #
275 # Finally, it is often useful to create arrays with random numbers that follow a specific distribution. The `np.random` module contains a number of functions that can be used to this effect, for example this will produce an array of 5 random samples taken from a standard normal distribution (0 mean and variance 1):
275 # Finally, it is often useful to create arrays with random numbers that follow a specific distribution. The `np.random` module contains a number of functions that can be used to this effect, for example this will produce an array of 5 random samples taken from a standard normal distribution (0 mean and variance 1):
276
276
277 # In[23]:
277 # In[23]:
278 np.random.randn(5)
278 np.random.randn(5)
279
279
280 # Out[23]:
280 # Out[23]:
281 # array([-0.08633343, -0.67375434, 1.00589536, 0.87081651, 1.65597822])
281 # array([-0.08633343, -0.67375434, 1.00589536, 0.87081651, 1.65597822])
282
282
283
283
284 # whereas this will also give 5 samples, but from a normal distribution with a mean of 10 and a standard deviation of 3 (that is, a variance of 9):
284 # whereas this will also give 5 samples, but from a normal distribution with a mean of 10 and a standard deviation of 3 (that is, a variance of 9):
285
285
286 # In[24]:
286 # In[24]:
287 norm10 = np.random.normal(10, 3, 5)
287 norm10 = np.random.normal(10, 3, 5)
288 norm10
288 norm10
289
289
290 # Out[24]:
290 # Out[24]:
291 # array([ 8.94879575, 5.53038269, 8.24847281, 12.14944165, 11.56209294])
291 # array([ 8.94879575, 5.53038269, 8.24847281, 12.14944165, 11.56209294])
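The second argument of `np.random.normal` is the standard deviation (`scale`), not the variance, which is easy to check empirically with a large sample. The following sketch assumes a NumPy version that provides `np.random.default_rng` (1.17 or later); the seed and sample size are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)         # seeded generator, for reproducibility
samples = rng.normal(10, 3, 100000)    # loc=10, scale=3

sample_mean = samples.mean()
sample_std = samples.std()
sample_var = samples.var()

print("mean:", sample_mean)   # close to 10
print("std: ", sample_std)    # close to 3
print("var: ", sample_var)    # close to 9, the square of the scale
```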
292
292
293
293
294 ### Indexing with other arrays
294 ### Indexing with other arrays
295
295
296 # Above we saw how to index arrays with single numbers and slices, just like Python lists. But arrays allow for a more sophisticated kind of indexing which is very powerful: you can index an array with another array, and in particular with an array of boolean values. This is particularly useful to extract information from an array that matches a certain condition.
296 # Above we saw how to index arrays with single numbers and slices, just like Python lists. But arrays allow for a more sophisticated kind of indexing which is very powerful: you can index an array with another array, and in particular with an array of boolean values. This is particularly useful to extract information from an array that matches a certain condition.
297 #
297 #
298 # Consider for example that in the array `norm10` we want to replace all values above 9 with the value 0. We can do so by first finding the *mask* that indicates where this condition is true or false:
298 # Consider for example that in the array `norm10` we want to replace all values above 9 with the value 0. We can do so by first finding the *mask* that indicates where this condition is true or false:
299
299
300 # In[25]:
300 # In[25]:
301 mask = norm10 > 9
301 mask = norm10 > 9
302 mask
302 mask
303
303
304 # Out[25]:
304 # Out[25]:
305 # array([False, False, False, True, True], dtype=bool)
305 # array([False, False, False, True, True], dtype=bool)
306
306
307
307
308 # Now that we have this mask, we can use it to either read those values or to reset them to 0:
308 # Now that we have this mask, we can use it to either read those values or to reset them to 0:
309
309
310 # In[26]:
310 # In[26]:
311 print 'Values above 9:', norm10[mask]
311 print 'Values above 9:', norm10[mask]
312
312
313 # Out[26]:
313 # Out[26]:
314 # Values above 9: [ 12.14944165 11.56209294]
314 # Values above 9: [ 12.14944165 11.56209294]
315 #
315 #
316 # In[27]:
316 # In[27]:
317 print 'Resetting all values above 9 to 0...'
317 print 'Resetting all values above 9 to 0...'
318 norm10[mask] = 0
318 norm10[mask] = 0
319 print norm10
319 print norm10
320
320
321 # Out[27]:
321 # Out[27]:
322 # Resetting all values above 9 to 0...
322 # Resetting all values above 9 to 0...
323 # [ 8.94879575 5.53038269 8.24847281 0. 0. ]
323 # [ 8.94879575 5.53038269 8.24847281 0. 0. ]
324 #
324 #
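# As a minimal sketch of two common variations on the masking idea above: masks can be combined element-wise, and `np.where` builds a modified copy instead of changing the array in place. The array `values` and the thresholds are hypothetical stand-ins for `norm10`, chosen only for illustration.

```python
import numpy as np

# Hypothetical sample values standing in for `norm10` above.
values = np.array([8.9, 5.5, 8.2, 12.1, 11.6])

# Combine conditions element-wise with & (and), | (or), ~ (not);
# the parentheses are required because of operator precedence.
in_range = (values > 6) & (values < 10)
selected = values[in_range]          # a copy of the matching elements

# np.where returns a new array and leaves `values` untouched.
clipped = np.where(values > 9, 0, values)
```

# Indexing with a mask on the right-hand side always produces a copy, so modifying `selected` would not affect `values`; only an assignment of the form `values[mask] = ...` writes into the original array.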
325 ### Arrays with more than one dimension
325 ### Arrays with more than one dimension
326
326
327 # Up until now all our examples have used one-dimensional arrays. But Numpy can create arrays of arbitrary dimensions, and all the methods illustrated in the previous section work with more than one dimension. For example, a list of lists can be used to initialize a two-dimensional array:
327 # Up until now all our examples have used one-dimensional arrays. But Numpy can create arrays of arbitrary dimensions, and all the methods illustrated in the previous section work with more than one dimension. For example, a list of lists can be used to initialize a two-dimensional array:
328
328
329 # In[28]:
329 # In[28]:
330 lst2 = [[1, 2], [3, 4]]
330 lst2 = [[1, 2], [3, 4]]
331 arr2 = np.array(lst2)
331 arr2 = np.array(lst2)
332 arr2
332 arr2
333
333
334 # Out[28]:
334 # Out[28]:
335 # array([[1, 2],
335 # array([[1, 2],
336 # [3, 4]])
336 # [3, 4]])
337
337
338
338
339 # With two-dimensional arrays we start seeing the power of numpy: while a nested list can only be indexed by applying the `[ ]` operator repeatedly, multidimensional arrays support a much more natural indexing syntax with a single `[ ]` and a set of indices separated by commas:
339 # With two-dimensional arrays we start seeing the power of numpy: while a nested list can only be indexed by applying the `[ ]` operator repeatedly, multidimensional arrays support a much more natural indexing syntax with a single `[ ]` and a set of indices separated by commas:
340
340
341 # In[29]:
341 # In[29]:
342 print lst2[0][1]
342 print lst2[0][1]
343 print arr2[0,1]
343 print arr2[0,1]
344
344
345 # Out[29]:
345 # Out[29]:
346 # 2
346 # 2
347 # 2
347 # 2
348 #
348 #
349 # Most of the array creation functions listed above can be used with more than one dimension, for example:
349 # Most of the array creation functions listed above can be used with more than one dimension, for example:
350
350
351 # In[30]:
351 # In[30]:
352 np.zeros((2,3))
352 np.zeros((2,3))
353
353
354 # Out[30]:
354 # Out[30]:
355 # array([[ 0., 0., 0.],
355 # array([[ 0., 0., 0.],
356 # [ 0., 0., 0.]])
356 # [ 0., 0., 0.]])
357
357
358
358
359 # In[31]:
359 # In[31]:
360 np.random.normal(10, 3, (2, 4))
360 np.random.normal(10, 3, (2, 4))
361
361
362 # Out[31]:
362 # Out[31]:
363 # array([[ 11.26788826, 4.29619866, 11.09346496, 9.73861307],
363 # array([[ 11.26788826, 4.29619866, 11.09346496, 9.73861307],
364 # [ 10.54025996, 9.5146268 , 10.80367214, 13.62204505]])
364 # [ 10.54025996, 9.5146268 , 10.80367214, 13.62204505]])
365
365
366
366
367 # In fact, the shape of an array can be changed at any time, as long as the total number of elements is unchanged. For example, if we want a 2x4 array with numbers increasing from 0, the easiest way to create it is:
367 # In fact, the shape of an array can be changed at any time, as long as the total number of elements is unchanged. For example, if we want a 2x4 array with numbers increasing from 0, the easiest way to create it is:
368
368
369 # In[32]:
369 # In[32]:
370 arr = np.arange(8).reshape(2,4)
370 arr = np.arange(8).reshape(2,4)
371 print arr
371 print arr
372
372
373 # Out[32]:
373 # Out[32]:
374 # [[0 1 2 3]
374 # [[0 1 2 3]
375 # [4 5 6 7]]
375 # [4 5 6 7]]
376 #
376 #
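# The two properties of `reshape` mentioned above can be checked directly. This is a small sketch, not part of the original session: the names `flat` and `grid` are hypothetical, and the view behavior shown holds for a contiguous array such as the one `np.arange` returns.

```python
import numpy as np

flat = np.arange(8)
grid = flat.reshape(2, 4)

# For a contiguous array, reshape returns a view on the same data,
# so writing through `grid` is visible in `flat` as well.
grid[0, 0] = 99

# The total number of elements must stay the same: 3*3 != 8.
try:
    flat.reshape(3, 3)
except ValueError as err:
    message = str(err)
```

# If an independent array is needed, calling `.copy()` on the reshaped result avoids the shared-memory behavior.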
377 # With multidimensional arrays, you can also use slices, and you can mix and match slices and single indices in the different dimensions (using the same array as above):
377 # With multidimensional arrays, you can also use slices, and you can mix and match slices and single indices in the different dimensions (using the same array as above):
378
378
379 # In[33]:
379 # In[33]:
380 print 'Slicing in the second row:', arr[1, 2:4]
380 print 'Slicing in the second row:', arr[1, 2:4]
381 print 'All rows, third column :', arr[:, 2]
381 print 'All rows, third column :', arr[:, 2]
382
382
383 # Out[33]:
383 # Out[33]:
384 # Slicing in the second row: [6 7]
384 # Slicing in the second row: [6 7]
385 # All rows, third column : [2 6]
385 # All rows, third column : [2 6]
386 #
386 #
387 # If you only provide one index, then you will get an array with one less dimension containing that row:
387 # If you only provide one index, then you will get an array with one less dimension containing that row:
388
388
389 # In[34]:
389 # In[34]:
390 print 'First row: ', arr[0]
390 print 'First row: ', arr[0]
391 print 'Second row: ', arr[1]
391 print 'Second row: ', arr[1]
392
392
393 # Out[34]:
393 # Out[34]:
394 # First row: [0 1 2 3]
394 # First row: [0 1 2 3]
395 # Second row: [4 5 6 7]
395 # Second row: [4 5 6 7]
396 #
396 #
397 # Now that we have seen how to create arrays with more than one dimension, it's a good idea to look at some of the most useful properties and methods that arrays have. The following provide basic information about the size, shape and data in the array:
397 # Now that we have seen how to create arrays with more than one dimension, it's a good idea to look at some of the most useful properties and methods that arrays have. The following provide basic information about the size, shape and data in the array:
398
398
399 # In[35]:
399 # In[35]:
400 print 'Data type :', arr.dtype
400 print 'Data type :', arr.dtype
401 print 'Total number of elements :', arr.size
401 print 'Total number of elements :', arr.size
402 print 'Number of dimensions :', arr.ndim
402 print 'Number of dimensions :', arr.ndim
403 print 'Shape (dimensionality) :', arr.shape
403 print 'Shape (dimensionality) :', arr.shape
404 print 'Memory used (in bytes) :', arr.nbytes
404 print 'Memory used (in bytes) :', arr.nbytes
405
405
406 # Out[35]:
406 # Out[35]:
407 # Data type : int32
407 # Data type : int32
408 # Total number of elements : 8
408 # Total number of elements : 8
409 # Number of dimensions : 2
409 # Number of dimensions : 2
410 # Shape (dimensionality) : (2, 4)
410 # Shape (dimensionality) : (2, 4)
411 # Memory used (in bytes) : 32
411 # Memory used (in bytes) : 32
412 #
412 #
413 # Arrays also have many useful methods; some especially useful ones are:
413 # Arrays also have many useful methods; some especially useful ones are:
414
414
415 # In[36]:
415 # In[36]:
416 print 'Minimum and maximum :', arr.min(), arr.max()
416 print 'Minimum and maximum :', arr.min(), arr.max()
417 print 'Sum and product of all elements :', arr.sum(), arr.prod()
417 print 'Sum and product of all elements :', arr.sum(), arr.prod()
418 print 'Mean and standard deviation :', arr.mean(), arr.std()
418 print 'Mean and standard deviation :', arr.mean(), arr.std()
419
419
420 # Out[36]:
420 # Out[36]:
421 # Minimum and maximum : 0 7
421 # Minimum and maximum : 0 7
422 # Sum and product of all elements : 28 0
422 # Sum and product of all elements : 28 0
423 # Mean and standard deviation : 3.5 2.29128784748
423 # Mean and standard deviation : 3.5 2.29128784748
424 #
424 #
425 # In the example above, these methods are computed over all the elements of the array. But for a multidimensional array, it's possible to do the computation along a single dimension by passing the `axis` parameter; for example:
425 # In the example above, these methods are computed over all the elements of the array. But for a multidimensional array, it's possible to do the computation along a single dimension by passing the `axis` parameter; for example:
426
426
427 # In[37]:
427 # In[37]:
428 print 'For the following array:\n', arr
428 print 'For the following array:\n', arr
429 print 'The sum of elements along the rows is :', arr.sum(axis=1)
429 print 'The sum of elements along the rows is :', arr.sum(axis=1)
430 print 'The sum of elements along the columns is :', arr.sum(axis=0)
430 print 'The sum of elements along the columns is :', arr.sum(axis=0)
431
431
432 # Out[37]:
432 # Out[37]:
433 # For the following array:
433 # For the following array:
434 # [[0 1 2 3]
434 # [[0 1 2 3]
435 # [4 5 6 7]]
435 # [4 5 6 7]]
436 # The sum of elements along the rows is : [ 6 22]
436 # The sum of elements along the rows is : [ 6 22]
437 # The sum of elements along the columns is : [ 4 6 8 10]
437 # The sum of elements along the columns is : [ 4 6 8 10]
438 #
438 #
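439 # As you can see in this example, the value of the `axis` parameter is the dimension which will be *consumed* once the operation has been carried out. This is why summing the elements of each row uses `axis=1`: it consumes the dimension of length 4 and leaves one value per row, while `axis=0` consumes the dimension of length 2 and leaves one value per column.
439 # As you can see in this example, the value of the `axis` parameter is the dimension which will be *consumed* once the operation has been carried out. This is why summing the elements of each row uses `axis=1`: it consumes the dimension of length 4 and leaves one value per row, while `axis=0` consumes the dimension of length 2 and leaves one value per column.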
440 #
440 #
441 # This can be easily illustrated with an example that has more dimensions; we create an array with 4 dimensions and shape `(3,4,5,6)` and sum along the axis number 2 (i.e. the *third* axis, since in Python all counts are 0-based). That consumes the dimension whose length was 5, leaving us with a new array that has shape `(3,4,6)`:
441 # This can be easily illustrated with an example that has more dimensions; we create an array with 4 dimensions and shape `(3,4,5,6)` and sum along the axis number 2 (i.e. the *third* axis, since in Python all counts are 0-based). That consumes the dimension whose length was 5, leaving us with a new array that has shape `(3,4,6)`:
442
442
443 # In[38]:
443 # In[38]:
444 np.zeros((3,4,5,6)).sum(2).shape
444 np.zeros((3,4,5,6)).sum(2).shape
445
445
446 # Out[38]:
446 # Out[38]:
447 # (3, 4, 6)
447 # (3, 4, 6)
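# To make the "consumed dimension" idea concrete, here is a short sketch on the same 2x4 array. The `keepdims` argument is an assumption about the installed NumPy version (it is not used elsewhere in this text); it keeps the consumed axis with length 1 instead of dropping it.

```python
import numpy as np

arr = np.arange(8).reshape(2, 4)

row_sums = arr.sum(axis=1)   # consumes the axis of length 4 -> shape (2,)
col_sums = arr.sum(axis=0)   # consumes the axis of length 2 -> shape (4,)

# keepdims=True keeps the consumed axis with length 1, which is handy
# when the result has to be combined with the original array.
row_sums_2d = arr.sum(axis=1, keepdims=True)     # shape (2, 1)
fractions = arr / row_sums_2d.astype(float)      # each row now sums to 1
```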
448
448
449
449
450 # Another widely used property of arrays is the `.T` attribute, which allows you to access the transpose of the array:
450 # Another widely used property of arrays is the `.T` attribute, which allows you to access the transpose of the array:
451
451
452 # In[39]:
452 # In[39]:
453 print 'Array:\n', arr
453 print 'Array:\n', arr
454 print 'Transpose:\n', arr.T
454 print 'Transpose:\n', arr.T
455
455
456 # Out[39]:
456 # Out[39]:
457 # Array:
457 # Array:
458 # [[0 1 2 3]
458 # [[0 1 2 3]
459 # [4 5 6 7]]
459 # [4 5 6 7]]
460 # Transpose:
460 # Transpose:
461 # [[0 4]
461 # [[0 4]
462 # [1 5]
462 # [1 5]
463 # [2 6]
463 # [2 6]
464 # [3 7]]
464 # [3 7]]
465 #
465 #
466 # We don't have time here to look at all the methods and properties of arrays, so here's a complete list. Simply try exploring some of these in IPython to learn more, or read their description in the full Numpy documentation:
466 # We don't have time here to look at all the methods and properties of arrays, so here's a complete list. Simply try exploring some of these in IPython to learn more, or read their description in the full Numpy documentation:
467 #
467 #
468 # arr.T arr.copy arr.getfield arr.put arr.squeeze
468 # arr.T arr.copy arr.getfield arr.put arr.squeeze
469 # arr.all arr.ctypes arr.imag arr.ravel arr.std
469 # arr.all arr.ctypes arr.imag arr.ravel arr.std
470 # arr.any arr.cumprod arr.item arr.real arr.strides
470 # arr.any arr.cumprod arr.item arr.real arr.strides
471 # arr.argmax arr.cumsum arr.itemset arr.repeat arr.sum
471 # arr.argmax arr.cumsum arr.itemset arr.repeat arr.sum
472 # arr.argmin arr.data arr.itemsize arr.reshape arr.swapaxes
472 # arr.argmin arr.data arr.itemsize arr.reshape arr.swapaxes
473 # arr.argsort arr.diagonal arr.max arr.resize arr.take
473 # arr.argsort arr.diagonal arr.max arr.resize arr.take
474 # arr.astype arr.dot arr.mean arr.round arr.tofile
474 # arr.astype arr.dot arr.mean arr.round arr.tofile
475 # arr.base arr.dtype arr.min arr.searchsorted arr.tolist
475 # arr.base arr.dtype arr.min arr.searchsorted arr.tolist
476 # arr.byteswap arr.dump arr.nbytes arr.setasflat arr.tostring
476 # arr.byteswap arr.dump arr.nbytes arr.setasflat arr.tostring
477 # arr.choose arr.dumps arr.ndim arr.setfield arr.trace
477 # arr.choose arr.dumps arr.ndim arr.setfield arr.trace
478 # arr.clip arr.fill arr.newbyteorder arr.setflags arr.transpose
478 # arr.clip arr.fill arr.newbyteorder arr.setflags arr.transpose
479 # arr.compress arr.flags arr.nonzero arr.shape arr.var
479 # arr.compress arr.flags arr.nonzero arr.shape arr.var
480 # arr.conj arr.flat arr.prod arr.size arr.view
480 # arr.conj arr.flat arr.prod arr.size arr.view
481 # arr.conjugate arr.flatten arr.ptp arr.sort
481 # arr.conjugate arr.flatten arr.ptp arr.sort
482
482
483 ### Operating with arrays
483 ### Operating with arrays
484
484
485 # Arrays support all regular arithmetic operators, and the numpy library also contains a complete collection of basic mathematical functions that operate on arrays. It is important to remember that in general, all operations with arrays are applied *element-wise*, i.e., are applied to all the elements of the array at the same time. Consider for example:
485 # Arrays support all regular arithmetic operators, and the numpy library also contains a complete collection of basic mathematical functions that operate on arrays. It is important to remember that in general, all operations with arrays are applied *element-wise*, i.e., are applied to all the elements of the array at the same time. Consider for example:
486
486
487 # In[40]:
487 # In[40]:
488 arr1 = np.arange(4)
488 arr1 = np.arange(4)
489 arr2 = np.arange(10, 14)
489 arr2 = np.arange(10, 14)
490 print arr1, '+', arr2, '=', arr1+arr2
490 print arr1, '+', arr2, '=', arr1+arr2
491
491
492 # Out[40]:
492 # Out[40]:
493 # [0 1 2 3] + [10 11 12 13] = [10 12 14 16]
493 # [0 1 2 3] + [10 11 12 13] = [10 12 14 16]
494 #
494 #
495 # Importantly, you must remember that the multiplication operator is also applied element-wise by default; it is *not* the matrix multiplication from linear algebra (which is what `*` means in Matlab, for example):
495 # Importantly, you must remember that the multiplication operator is also applied element-wise by default; it is *not* the matrix multiplication from linear algebra (which is what `*` means in Matlab, for example):
496
496
497 # In[41]:
497 # In[41]:
498 print arr1, '*', arr2, '=', arr1*arr2
498 print arr1, '*', arr2, '=', arr1*arr2
499
499
500 # Out[41]:
500 # Out[41]:
501 # [0 1 2 3] * [10 11 12 13] = [ 0 11 24 39]
501 # [0 1 2 3] * [10 11 12 13] = [ 0 11 24 39]
502 #
502 #
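# The difference is easier to see with two-dimensional arrays. The following sketch uses two small hypothetical matrices, `M` and `N`, and compares `*` with the `dot` method:

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])
N = np.array([[0, 1], [1, 0]])

elementwise = M * N        # multiplies matching entries: [[0, 2], [3, 0]]
matrix_prod = M.dot(N)     # rows times columns:          [[2, 1], [4, 3]]
```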
503 # While this means that in principle arrays must always match in their dimensionality in order for an operation to be valid, numpy will *broadcast* dimensions when possible. For example, suppose that you want to add the number 1.5 to `arr1`; the following would be a valid way to do it:
503 # While this means that in principle arrays must always match in their dimensionality in order for an operation to be valid, numpy will *broadcast* dimensions when possible. For example, suppose that you want to add the number 1.5 to `arr1`; the following would be a valid way to do it:
504
504
505 # In[42]:
505 # In[42]:
506 arr1 + 1.5*np.ones(4)
506 arr1 + 1.5*np.ones(4)
507
507
508 # Out[42]:
508 # Out[42]:
509 # array([ 1.5, 2.5, 3.5, 4.5])
509 # array([ 1.5, 2.5, 3.5, 4.5])
510
510
511
511
512 # But thanks to numpy's broadcasting rules, the following is equally valid:
512 # But thanks to numpy's broadcasting rules, the following is equally valid:
513
513
514 # In[43]:
514 # In[43]:
515 arr1 + 1.5
515 arr1 + 1.5
516
516
517 # Out[43]:
517 # Out[43]:
518 # array([ 1.5, 2.5, 3.5, 4.5])
518 # array([ 1.5, 2.5, 3.5, 4.5])
519
519
520
520
521 # In this case, numpy looked at both operands and saw that the first (`arr1`) was a one-dimensional array of length 4 and the second was a scalar, considered a zero-dimensional object. The broadcasting rules allow numpy to:
521 # In this case, numpy looked at both operands and saw that the first (`arr1`) was a one-dimensional array of length 4 and the second was a scalar, considered a zero-dimensional object. The broadcasting rules allow numpy to:
522 #
522 #
523 # * *create* new dimensions of length 1 (since this doesn't change the size of the array)
523 # * *create* new dimensions of length 1 (since this doesn't change the size of the array)
524 # * 'stretch' a dimension of length 1 that needs to be matched to a dimension of a different size.
524 # * 'stretch' a dimension of length 1 that needs to be matched to a dimension of a different size.
525 #
525 #
526 # So in the above example, the scalar 1.5 is effectively:
526 # So in the above example, the scalar 1.5 is effectively:
527 #
527 #
528 # * first 'promoted' to a 1-dimensional array of length 1
528 # * first 'promoted' to a 1-dimensional array of length 1
529 # * then, this array is 'stretched' to length 4 to match the dimension of `arr1`.
529 # * then, this array is 'stretched' to length 4 to match the dimension of `arr1`.
530 #
530 #
531 # After these two operations are complete, the addition can proceed as now both operands are one-dimensional arrays of length 4.
531 # After these two operations are complete, the addition can proceed as now both operands are one-dimensional arrays of length 4.
532 #
532 #
533 # This broadcasting behavior is in practice enormously powerful, especially because when numpy broadcasts to create new dimensions or to 'stretch' existing ones, it doesn't actually replicate the data. In the example above the operation is carried out *as if* the 1.5 was a 1-d array with 1.5 in all of its entries, but no actual array was ever created. This can save lots of memory when the arrays in question are large, and it can have significant performance implications.
533 # This broadcasting behavior is in practice enormously powerful, especially because when numpy broadcasts to create new dimensions or to 'stretch' existing ones, it doesn't actually replicate the data. In the example above the operation is carried out *as if* the 1.5 was a 1-d array with 1.5 in all of its entries, but no actual array was ever created. This can save lots of memory when the arrays in question are large, and it can have significant performance implications.
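#
# One way to observe that no data is replicated is to look at the strides of a stretched array. This is only a sketch: it assumes a NumPy version that provides `np.broadcast_to`, which is not used anywhere else in this text, and the names `one` and `stretched` are hypothetical.

```python
import numpy as np

one = np.array([1.5])                     # shape (1,)
stretched = np.broadcast_to(one, (4,))    # read-only view with shape (4,)

# A stride of 0 bytes means every index maps to the same memory location,
# so the four "elements" all share the single stored value.
stride = stretched.strides

result = np.arange(4) + stretched         # same result as np.arange(4) + 1.5
```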
534 #
534 #
535 # The general rule is: when operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward, creating dimensions of length 1 as needed. Two dimensions are considered compatible when
535 # The general rule is: when operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward, creating dimensions of length 1 as needed. Two dimensions are considered compatible when
536 #
536 #
537 # * they are equal to begin with, or
537 # * they are equal to begin with, or
538 # * one of them is 1; in this case numpy will do the 'stretching' to make them equal.
538 # * one of them is 1; in this case numpy will do the 'stretching' to make them equal.
539 #
539 #
540 # If these conditions are not met, a `ValueError: operands could not be broadcast together` exception is raised, indicating that the arrays have incompatible shapes. The size of the resulting array is the maximum size along each dimension of the input arrays.
540 # If these conditions are not met, a `ValueError: operands could not be broadcast together` exception is raised, indicating that the arrays have incompatible shapes. The size of the resulting array is the maximum size along each dimension of the input arrays.
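#
# The rule can be written down as a few lines of plain Python. The helper `broadcast_shape` below is hypothetical (it is not a NumPy function); it pads the shorter shape with leading 1s, which is equivalent to comparing from the trailing dimension forward, and then applies the two compatibility conditions.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rule to two shape tuples."""
    # Pad the shorter shape with leading 1s so both have the same length.
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)        # equal, or b is stretched to match a
        elif da == 1:
            result.append(db)        # a is stretched to match b
        else:
            raise ValueError('incompatible shapes %s and %s'
                             % (shape_a, shape_b))
    return tuple(result)

# Cross-check against what NumPy itself computes for two real arrays.
expected = np.broadcast(np.empty((3, 1, 5)), np.empty((4, 1))).shape
computed = broadcast_shape((3, 1, 5), (4, 1))
```

# With this helper, `broadcast_shape((2, 4), (4,))` gives `(2, 4)`, while `broadcast_shape((2, 4), (2,))` raises a `ValueError` because the trailing dimensions 4 and 2 differ and neither is 1.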
541
541
542 # This shows how the broadcasting rules work in several dimensions:
542 # This shows how the broadcasting rules work in several dimensions:
543
543
544 # In[44]:
544 # In[44]:
545 b = np.array([2, 3, 4, 5])
545 b = np.array([2, 3, 4, 5])
546 print arr, '\n\n+', b , '\n----------------\n', arr + b
546 print arr, '\n\n+', b , '\n----------------\n', arr + b
547
547
548 # Out[44]:
548 # Out[44]:
549 # [[0 1 2 3]
549 # [[0 1 2 3]
550 # [4 5 6 7]]
550 # [4 5 6 7]]
551 #
551 #
552 # + [2 3 4 5]
552 # + [2 3 4 5]
553 # ----------------
553 # ----------------
554 # [[ 2 4 6 8]
554 # [[ 2 4 6 8]
555 # [ 6 8 10 12]]
555 # [ 6 8 10 12]]
556 #
556 #
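557 # Now, how could you use broadcasting to, say, add `[4, 6]` along the rows of `arr` above, so that 4 is added to every element of the first row and 6 to every element of the second? Simply performing the direct addition will produce the error we previously mentioned:
557 # Now, how could you use broadcasting to, say, add `[4, 6]` along the rows of `arr` above, so that 4 is added to every element of the first row and 6 to every element of the second? Simply performing the direct addition will produce the error we previously mentioned: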
558
558
559 # In[45]:
559 # In[45]:
560 c = np.array([4, 6])
560 c = np.array([4, 6])
561 arr + c
561 arr + c
562
562
563 # Out[45]:
563 # Out[45]:
564 ---------------------------------------------------------------------------
564 ---------------------------------------------------------------------------
565 ValueError Traceback (most recent call last)
565 ValueError Traceback (most recent call last)
566 /home/fperez/teach/book-math-labtool/<ipython-input-45-62aa20ac1980> in <module>()
566 /home/fperez/teach/book-math-labtool/<ipython-input-45-62aa20ac1980> in <module>()
567 1 c = np.array([4, 6])
567 1 c = np.array([4, 6])
568 ----> 2 arr + c
568 ----> 2 arr + c
569
569
570 ValueError: operands could not be broadcast together with shapes (2,4) (2)
570 ValueError: operands could not be broadcast together with shapes (2,4) (2)
571
571
572 # According to the rules above, the array `c` would need to have a *trailing* dimension of 1 for the broadcasting to work. It turns out that numpy allows you to 'inject' new dimensions anywhere into an array on the fly, by indexing it with the special object `np.newaxis`:
572 # According to the rules above, the array `c` would need to have a *trailing* dimension of 1 for the broadcasting to work. It turns out that numpy allows you to 'inject' new dimensions anywhere into an array on the fly, by indexing it with the special object `np.newaxis`:
573
573
574 # In[46]:
574 # In[46]:
575 (c[:, np.newaxis]).shape
575 (c[:, np.newaxis]).shape
576
576
577 # Out[46]:
577 # Out[46]:
578 # (2, 1)
578 # (2, 1)
579
579
580
580
581 # This is exactly what we need, and indeed it works:
581 # This is exactly what we need, and indeed it works:
582
582
583 # In[47]:
583 # In[47]:
584 arr + c[:, np.newaxis]
584 arr + c[:, np.newaxis]
585
585
586 # Out[47]:
586 # Out[47]:
587 # array([[ 4, 5, 6, 7],
587 # array([[ 4, 5, 6, 7],
588 # [10, 11, 12, 13]])
588 # [10, 11, 12, 13]])
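#
# For reference, a few equivalent spellings of the same operation are sketched below. The names `col_a`, `col_b`, `col_c` and `row` are hypothetical; `np.newaxis` is simply an alias for `None`, and `reshape(-1, 1)` asks NumPy to infer the first dimension.

```python
import numpy as np

arr = np.arange(8).reshape(2, 4)
c = np.array([4, 6])

col_a = c[:, np.newaxis]     # shape (2, 1)
col_b = c[:, None]           # identical: np.newaxis is None
col_c = c.reshape(-1, 1)     # -1 lets NumPy infer that dimension
row = c[np.newaxis, :]       # shape (1, 2): a new *leading* axis instead

total = arr + col_c          # broadcasts (2, 4) with (2, 1)
```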
589
589
590
590
591 # For the full broadcasting rules, please see the official Numpy docs, which describe them in detail and with more complex examples.
591 # For the full broadcasting rules, please see the official Numpy docs, which describe them in detail and with more complex examples.
592
592
593 # As we mentioned before, Numpy ships with a full complement of mathematical functions that work on entire arrays, including logarithms, exponentials, trigonometric and hyperbolic trigonometric functions, etc. Furthermore, scipy ships a rich special function library in the `scipy.special` module that includes Bessel, Airy, Fresnel, Laguerre and other classical special functions. For example, sampling the sine function at 100 points between $0$ and $2\pi$ is as simple as:
593 # As we mentioned before, Numpy ships with a full complement of mathematical functions that work on entire arrays, including logarithms, exponentials, trigonometric and hyperbolic trigonometric functions, etc. Furthermore, scipy ships a rich special function library in the `scipy.special` module that includes Bessel, Airy, Fresnel, Laguerre and other classical special functions. For example, sampling the sine function at 100 points between $0$ and $2\pi$ is as simple as:
594
594
595 # In[48]:
595 # In[48]:
596 x = np.linspace(0, 2*np.pi, 100)
596 x = np.linspace(0, 2*np.pi, 100)
597 y = np.sin(x)
597 y = np.sin(x)
598
598
599 ### Linear algebra in numpy
599 ### Linear algebra in numpy
600
600
601 # Numpy ships with a basic linear algebra library, and all arrays have a `dot` method whose behavior is that of the scalar dot product when its arguments are vectors (one-dimensional arrays) and the traditional matrix multiplication when one or both of its arguments are two-dimensional arrays:
601 # Numpy ships with a basic linear algebra library, and all arrays have a `dot` method whose behavior is that of the scalar dot product when its arguments are vectors (one-dimensional arrays) and the traditional matrix multiplication when one or both of its arguments are two-dimensional arrays:
602
602
603 # In[49]:
603 # In[49]:
604 v1 = np.array([2, 3, 4])
604 v1 = np.array([2, 3, 4])
605 v2 = np.array([1, 0, 1])
605 v2 = np.array([1, 0, 1])
606 print v1, '.', v2, '=', v1.dot(v2)
606 print v1, '.', v2, '=', v1.dot(v2)
607
607
608 # Out[49]:
608 # Out[49]:
609 # [2 3 4] . [1 0 1] = 6
609 # [2 3 4] . [1 0 1] = 6
610 #
610 #
611 # Here is a regular matrix-vector multiplication. Note that the array `v1` should be viewed as a *column* vector in traditional linear algebra notation; numpy makes no distinction between row and column vectors and simply verifies that the dimensions match the required rules of matrix multiplication. In this case we have a $2 \times 3$ matrix multiplied by a 3-vector, which produces a 2-vector:
611 # Here is a regular matrix-vector multiplication. Note that the array `v1` should be viewed as a *column* vector in traditional linear algebra notation; numpy makes no distinction between row and column vectors and simply verifies that the dimensions match the required rules of matrix multiplication. In this case we have a $2 \times 3$ matrix multiplied by a 3-vector, which produces a 2-vector:
612
612
613 # In[50]:
613 # In[50]:
614 A = np.arange(6).reshape(2, 3)
614 A = np.arange(6).reshape(2, 3)
615 print A, 'x', v1, '=', A.dot(v1)
615 print A, 'x', v1, '=', A.dot(v1)
616
616
617 # Out[50]:
617 # Out[50]:
618 # [[0 1 2]
618 # [[0 1 2]
619 # [3 4 5]] x [2 3 4] = [11 38]
619 # [3 4 5]] x [2 3 4] = [11 38]
620 #
620 #
621 # For matrix-matrix multiplication, the same dimension-matching rules must be satisfied, e.g. consider the difference between $A \times A^T$:
621 # For matrix-matrix multiplication, the same dimension-matching rules must be satisfied, e.g. consider the difference between $A \times A^T$:
622
622
623 # In[51]:
623 # In[51]:
624 print A.dot(A.T)
624 print A.dot(A.T)
625
625
626 # Out[51]:
626 # Out[51]:
627 # [[ 5 14]
627 # [[ 5 14]
628 # [14 50]]
628 # [14 50]]
629 #
629 #
630 # and $A^T \times A$:
630 # and $A^T \times A$:
631
631
632 # In[52]:
632 # In[52]:
633 print A.T.dot(A)
633 print A.T.dot(A)
634
634
635 # Out[52]:
635 # Out[52]:
636 # [[ 9 12 15]
636 # [[ 9 12 15]
637 # [12 17 22]
637 # [12 17 22]
638 # [15 22 29]]
638 # [15 22 29]]
639 #
639 #
640 # Furthermore, the `numpy.linalg` module includes additional functionality such as determinants, matrix norms, Cholesky, eigenvalue and singular value decompositions, etc. For even more linear algebra tools, `scipy.linalg` contains the majority of the tools in the classic LAPACK libraries, and `scipy.sparse.linalg` provides functions to operate on sparse matrices. We refer the reader to the Numpy and Scipy documentation for additional details on these.
640 # Furthermore, the `numpy.linalg` module includes additional functionality such as determinants, matrix norms, Cholesky, eigenvalue and singular value decompositions, etc. For even more linear algebra tools, `scipy.linalg` contains the majority of the tools in the classic LAPACK libraries, and `scipy.sparse.linalg` provides functions to operate on sparse matrices. We refer the reader to the Numpy and Scipy documentation for additional details on these.
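#
# As a brief taste of `numpy.linalg`, the sketch below computes a determinant, solves a small linear system and finds the eigenvalues of a symmetric matrix. The matrix `M` and right-hand side `rhs` are hypothetical values chosen so the results are easy to verify by hand.

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 3.0]])
rhs = np.array([3.0, 5.0])

det = np.linalg.det(M)                # determinant: 2*3 - 1*1 = 5
x = np.linalg.solve(M, rhs)           # solves M x = rhs without forming the inverse
eigenvalues = np.linalg.eigvalsh(M)   # eigenvalues of a symmetric matrix

# Sanity check: multiplying back should recover the right-hand side.
residual = M.dot(x) - rhs
```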
641
641
642 ### Reading and writing arrays to disk
642 ### Reading and writing arrays to disk
643
643
644 # Numpy lets you read and write arrays into files in a number of ways. In order to use these tools well, it is critical to understand the difference between a *text* and a *binary* file containing numerical data. In a text file, the number $\pi$ could be written as "3.141592653589793", for example: a string of digits that a human can read, in this case with 15 digits after the decimal point. In contrast, that same number written to a binary file would be encoded as 8 characters (bytes) that are not readable by a human but which contain the exact same data that the variable `pi` had in the computer's memory.
644 # Numpy lets you read and write arrays into files in a number of ways. In order to use these tools well, it is critical to understand the difference between a *text* and a *binary* file containing numerical data. In a text file, the number $\pi$ could be written as "3.141592653589793", for example: a string of digits that a human can read, in this case with 15 digits after the decimal point. In contrast, that same number written to a binary file would be encoded as 8 characters (bytes) that are not readable by a human but which contain the exact same data that the variable `pi` had in the computer's memory.
645 #
645 #
646 # The tradeoffs between the two modes are thus:
646 # The tradeoffs between the two modes are thus:
647 #
647 #
648 # * Text mode: occupies more space, precision can be lost (if not all digits are written to disk), but is readable and editable by hand with a text editor. Can *only* be used for one- and two-dimensional arrays.
648 # * Text mode: occupies more space, precision can be lost (if not all digits are written to disk), but is readable and editable by hand with a text editor. Can *only* be used for one- and two-dimensional arrays.
649 #
649 #
650 # * Binary mode: compact and exact representation of the data in memory, can't be read or edited by hand. Arrays of any size and dimensionality can be saved and read without loss of information.
650 # * Binary mode: compact and exact representation of the data in memory, can't be read or edited by hand. Arrays of any size and dimensionality can be saved and read without loss of information.
651 #
651 #
652 # First, let's see how to read and write arrays in text mode. The `np.savetxt` function saves an array to a text file, with options to control the precision, separators and even adding a header:
652 # First, let's see how to read and write arrays in text mode. The `np.savetxt` function saves an array to a text file, with options to control the precision, separators and even adding a header:
653
653
654 # In[53]:
654 # In[53]:
655 arr = np.arange(10).reshape(2, 5)
655 arr = np.arange(10).reshape(2, 5)
656 np.savetxt('test.out', arr, fmt='%.2e', header="My dataset")
656 np.savetxt('test.out', arr, fmt='%.2e', header="My dataset")
657 !cat test.out
657 !cat test.out
658
658
659 # Out[53]:
659 # Out[53]:
660 # # My dataset
660 # # My dataset
661 # 0.00e+00 1.00e+00 2.00e+00 3.00e+00 4.00e+00
661 # 0.00e+00 1.00e+00 2.00e+00 3.00e+00 4.00e+00
662 # 5.00e+00 6.00e+00 7.00e+00 8.00e+00 9.00e+00
662 # 5.00e+00 6.00e+00 7.00e+00 8.00e+00 9.00e+00
663 #
663 #
664 # And this same type of file can then be read with the matching `np.loadtxt` function:
664 # And this same type of file can then be read with the matching `np.loadtxt` function:
665
665
666 # In[54]:
666 # In[54]:
667 arr2 = np.loadtxt('test.out')
667 arr2 = np.loadtxt('test.out')
668 print arr2
668 print arr2
669
669
670 # Out[54]:
670 # Out[54]:
671 # [[ 0. 1. 2. 3. 4.]
671 # [[ 0. 1. 2. 3. 4.]
672 # [ 5. 6. 7. 8. 9.]]
672 # [ 5. 6. 7. 8. 9.]]
673 #
673 #
674 # For binary data, Numpy provides the `np.save` and `np.savez` routines. The first saves a single array to a file with `.npy` extension, while the latter can be used to save a *group* of arrays into a single file with `.npz` extension. The files created with these routines can then be read with the `np.load` function.
674 # For binary data, Numpy provides the `np.save` and `np.savez` routines. The first saves a single array to a file with `.npy` extension, while the latter can be used to save a *group* of arrays into a single file with `.npz` extension. The files created with these routines can then be read with the `np.load` function.
675 #
675 #
676 # Let us first see how to use the simpler `np.save` function to save a single array:
676 # Let us first see how to use the simpler `np.save` function to save a single array:
677
677
678 # In[55]:
678 # In[55]:
679 np.save('test.npy', arr2)
679 np.save('test.npy', arr2)
680 # Now we read this back
680 # Now we read this back
681 arr2n = np.load('test.npy')
681 arr2n = np.load('test.npy')
682 # Let's see if any element is non-zero in the difference.
682 # Let's see if any element is non-zero in the difference.
683 # A value of True would be a problem.
683 # A value of True would be a problem.
684 print 'Any differences?', np.any(arr2-arr2n)
684 print 'Any differences?', np.any(arr2-arr2n)
685
685
686 # Out[55]:
686 # Out[55]:
687 # Any differences? False
687 # Any differences? False
688 #
688 #
689 # Now let us see how the `np.savez` function works. You give it a filename and either a sequence of arrays or a set of keywords. In the first mode, the function will automatically name the saved arrays in the archive as `arr_0`, `arr_1`, etc:
689 # Now let us see how the `np.savez` function works. You give it a filename and either a sequence of arrays or a set of keywords. In the first mode, the function will automatically name the saved arrays in the archive as `arr_0`, `arr_1`, etc:
690
690
691 # In[56]:
691 # In[56]:
692 np.savez('test.npz', arr, arr2)
692 np.savez('test.npz', arr, arr2)
693 arrays = np.load('test.npz')
693 arrays = np.load('test.npz')
694 arrays.files
694 arrays.files
695
695
696 # Out[56]:
696 # Out[56]:
697 # ['arr_1', 'arr_0']
697 # ['arr_1', 'arr_0']
698
698
699
699
700 # Alternatively, we can explicitly choose how to name the arrays we save:
700 # Alternatively, we can explicitly choose how to name the arrays we save:
701
701
702 # In[57]:
702 # In[57]:
703 np.savez('test.npz', array1=arr, array2=arr2)
703 np.savez('test.npz', array1=arr, array2=arr2)
704 arrays = np.load('test.npz')
704 arrays = np.load('test.npz')
705 arrays.files
705 arrays.files
706
706
707 # Out[57]:
707 # Out[57]:
708 # ['array2', 'array1']
708 # ['array2', 'array1']
709
709
710
710
711 # The object returned by `np.load` from an `.npz` file works like a dictionary, though you can also access its constituent files by attribute using its special `.f` field; this is best illustrated with an example with the `arrays` object from above:
711 # The object returned by `np.load` from an `.npz` file works like a dictionary, though you can also access its constituent files by attribute using its special `.f` field; this is best illustrated with an example with the `arrays` object from above:
712
712
713 # In[58]:
713 # In[58]:
714 print 'First row of first array:', arrays['array1'][0]
714 print 'First row of first array:', arrays['array1'][0]
715 # This is an equivalent way to get the same field
715 # This is an equivalent way to get the same field
716 print 'First row of first array:', arrays.f.array1[0]
716 print 'First row of first array:', arrays.f.array1[0]
717
717
718 # Out[58]:
718 # Out[58]:
719 # First row of first array: [0 1 2 3 4]
719 # First row of first array: [0 1 2 3 4]
720 # First row of first array: [0 1 2 3 4]
720 # First row of first array: [0 1 2 3 4]
721 #
721 #
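# Because the loaded object behaves like a dictionary, it is easy to loop over everything stored in an archive. This is a small sketch, assuming Python 3 (or the `print_function` import shown) and a temporary file path used only for illustration; it also uses the loaded object as a context manager so the underlying file is closed when we are done:

```python
from __future__ import print_function

import os
import tempfile

import numpy as np

arr = np.arange(10).reshape(2, 5)
arr2 = arr.astype(float)

# Illustrative temporary path; any writable location works.
path = os.path.join(tempfile.mkdtemp(), 'test.npz')
np.savez(path, array1=arr, array2=arr2)

restored = {}
with np.load(path) as arrays:
    # .files lists the names stored in the archive.
    for name in sorted(arrays.files):
        # Indexing reads the array into memory, so it remains usable
        # after the archive is closed.
        restored[name] = arrays[name]
        print(name, restored[name].dtype, restored[name].shape)

print('array1 unchanged:', np.array_equal(arr, restored['array1']))
```

# Note that each array keeps its own dtype and shape, which is what "without loss of information" means in practice.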
722 # This `.npz` format is a very convenient way to package compactly and without loss of information, into a single file, a group of related arrays that pertain to a specific problem. At some point, however, the complexity of your dataset may be such that the optimal approach is to use one of the standard formats in scientific data processing that have been designed to handle complex datasets, such as NetCDF or HDF5.
722 # This `.npz` format is a very convenient way to package compactly and without loss of information, into a single file, a group of related arrays that pertain to a specific problem. At some point, however, the complexity of your dataset may be such that the optimal approach is to use one of the standard formats in scientific data processing that have been designed to handle complex datasets, such as NetCDF or HDF5.
723 #
723 #
724 # Fortunately, there are tools for manipulating these formats in Python, and for storing data in other ways such as databases. A complete treatment of the possibilities is beyond the scope of this discussion, but we mention at least the following, which are of particular interest to scientific users:
724 # Fortunately, there are tools for manipulating these formats in Python, and for storing data in other ways such as databases. A complete treatment of the possibilities is beyond the scope of this discussion, but we mention at least the following, which are of particular interest to scientific users:
725 #
725 #
726 # * The `scipy.io` module contains routines to read and write Matlab files in `.mat` format and files in the NetCDF format that is widely used in certain scientific disciplines.
726 # * The `scipy.io` module contains routines to read and write Matlab files in `.mat` format and files in the NetCDF format that is widely used in certain scientific disciplines.
727 #
727 #
728 # * For manipulating files in the HDF5 format, there are two excellent options in Python: The PyTables project offers a high-level, object oriented approach to manipulating HDF5 datasets, while the h5py project offers a more direct mapping to the standard HDF5 library interface. Both are excellent tools; if you need to work with HDF5 datasets you should read some of their documentation and examples and decide which approach is a better match for your needs.
728 # * For manipulating files in the HDF5 format, there are two excellent options in Python: The PyTables project offers a high-level, object oriented approach to manipulating HDF5 datasets, while the h5py project offers a more direct mapping to the standard HDF5 library interface. Both are excellent tools; if you need to work with HDF5 datasets you should read some of their documentation and examples and decide which approach is a better match for your needs.
729
729
730 ## High quality data visualization with Matplotlib
730 ## High quality data visualization with Matplotlib
731
731
732 # The [matplotlib](http://matplotlib.sf.net) library is a powerful tool capable of producing complex publication-quality figures with fine layout control in two and three dimensions; here we will only provide a minimal self-contained introduction to its usage that covers the functionality needed for the rest of the book. We encourage the reader to read the tutorials included with the matplotlib documentation as well as to browse its extensive gallery of examples that include source code.
732 # The [matplotlib](http://matplotlib.sf.net) library is a powerful tool capable of producing complex publication-quality figures with fine layout control in two and three dimensions; here we will only provide a minimal self-contained introduction to its usage that covers the functionality needed for the rest of the book. We encourage the reader to read the tutorials included with the matplotlib documentation as well as to browse its extensive gallery of examples that include source code.
733 #
733 #
734 # Just as we typically use the shorthand `np` for Numpy, we will use `plt` for the `matplotlib.pyplot` module where the easy-to-use plotting functions reside (the library contains a rich object-oriented architecture that we don't have the space to discuss here):
734 # Just as we typically use the shorthand `np` for Numpy, we will use `plt` for the `matplotlib.pyplot` module where the easy-to-use plotting functions reside (the library contains a rich object-oriented architecture that we don't have the space to discuss here):
735
735
736 # In[59]:
736 # In[59]:
737 import matplotlib.pyplot as plt
737 import matplotlib.pyplot as plt
738
738
739 # The most frequently used function is simply called `plot`. Here is how you can make a simple plot of $\sin(x)$ for $x \in [0, 2\pi]$ with labels and a grid (we use the semicolon in the last line to suppress the display of some information that is unnecessary right now):
739 # The most frequently used function is simply called `plot`. Here is how you can make a simple plot of $\sin(x)$ for $x \in [0, 2\pi]$ with labels and a grid (we use the semicolon in the last line to suppress the display of some information that is unnecessary right now):
740
740
741 # In[60]:
741 # In[60]:
742 x = np.linspace(0, 2*np.pi)
742 x = np.linspace(0, 2*np.pi)
743 y = np.sin(x)
743 y = np.sin(x)
744 plt.plot(x,y, label='sin(x)')
744 plt.plot(x,y, label='sin(x)')
745 plt.legend()
745 plt.legend()
746 plt.grid()
746 plt.grid()
747 plt.title('Harmonic')
747 plt.title('Harmonic')
748 plt.xlabel('x')
748 plt.xlabel('x')
749 plt.ylabel('y');
749 plt.ylabel('y');
750
750
751 # Out[60]:
751 # Out[60]:
752 # image file: tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_01.svg
752 # image file: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_01.svg
753
753
754 # You can control the style, color and other properties of the markers, for example:
754 # You can control the style, color and other properties of the markers, for example:
755
755
756 # In[61]:
756 # In[61]:
757 plt.plot(x, y, linewidth=2);
757 plt.plot(x, y, linewidth=2);
758
758
759 # Out[61]:
759 # Out[61]:
760 # image file: tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_02.svg
760 # image file: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_02.svg
761
761
762 # In[62]:
762 # In[62]:
763 plt.plot(x, y, 'o', markersize=5, color='r');
763 plt.plot(x, y, 'o', markersize=5, color='r');
764
764
765 # Out[62]:
765 # Out[62]:
766 # image file: tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_03.svg
766 # image file: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_03.svg
767
767
768 # We will now see how to create a few other common plot types, such as a simple error plot:
768 # We will now see how to create a few other common plot types, such as a simple error plot:
769
769
770 # In[63]:
770 # In[63]:
771 # example data
771 # example data
772 x = np.arange(0.1, 4, 0.5)
772 x = np.arange(0.1, 4, 0.5)
773 y = np.exp(-x)
773 y = np.exp(-x)
774
774
775 # example variable error bar values
775 # example variable error bar values
776 yerr = 0.1 + 0.2*np.sqrt(x)
776 yerr = 0.1 + 0.2*np.sqrt(x)
777 xerr = 0.1 + yerr
777 xerr = 0.1 + yerr
778
778
779 # First illustrate basic pyplot interface, using defaults where possible.
779 # First illustrate basic pyplot interface, using defaults where possible.
780 plt.figure()
780 plt.figure()
781 plt.errorbar(x, y, xerr=0.2, yerr=0.4)
781 plt.errorbar(x, y, xerr=0.2, yerr=0.4)
782 plt.title("Simplest errorbars, 0.2 in x, 0.4 in y");
782 plt.title("Simplest errorbars, 0.2 in x, 0.4 in y");
783
783
784 # Out[63]:
784 # Out[63]:
785 # image file: tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_04.svg
785 # image file: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_04.svg
786
786
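# The example above computes the arrays `xerr` and `yerr` but then passes constant error values to `errorbar`. As a sketch of the variable-error case, the arrays can be passed directly; the `Agg` backend and the `fmt='o'` marker style are assumptions made here so that the snippet runs without a display and draws points rather than a connected line:

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Same example data as above
x = np.arange(0.1, 4, 0.5)
y = np.exp(-x)

# One error value per data point
yerr = 0.1 + 0.2*np.sqrt(x)
xerr = 0.1 + yerr

fig = plt.figure()
# Passing arrays gives each point its own bar lengths in x and y.
container = plt.errorbar(x, y, xerr=xerr, yerr=yerr, fmt='o')
plt.title("Variable errorbars in x and y")
```

# The arrays must have the same length as `x` and `y`, since each entry sets the bar size of the corresponding point.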
787 # A simple log plot:
787 # A simple log plot:
788
788
789 # In[64]:
789 # In[64]:
790 x = np.linspace(-5, 5)
790 x = np.linspace(-5, 5)
791 y = np.exp(-x**2)
791 y = np.exp(-x**2)
792 plt.semilogy(x, y);
792 plt.semilogy(x, y);
793
793
794 # Out[64]:
794 # Out[64]:
795 # image file: tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_05.svg
795 # image file: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_05.svg
796
796
797 # A histogram annotated with text inside the plot, using the `text` function:
797 # A histogram annotated with text inside the plot, using the `text` function:
798
798
799 # In[65]:
799 # In[65]:
800 mu, sigma = 100, 15
800 mu, sigma = 100, 15
801 x = mu + sigma * np.random.randn(10000)
801 x = mu + sigma * np.random.randn(10000)
802
802
803 # the histogram of the data
803 # the histogram of the data
804 n, bins, patches = plt.hist(x, 50, normed=1, facecolor='g', alpha=0.75)
804 n, bins, patches = plt.hist(x, 50, normed=1, facecolor='g', alpha=0.75)
805
805
806 plt.xlabel('Smarts')
806 plt.xlabel('Smarts')
807 plt.ylabel('Probability')
807 plt.ylabel('Probability')
808 plt.title('Histogram of IQ')
808 plt.title('Histogram of IQ')
809 # This will put a text fragment at the position given:
809 # This will put a text fragment at the position given:
810 plt.text(55, .027, r'$\mu=100,\ \sigma=15$', fontsize=14)
810 plt.text(55, .027, r'$\mu=100,\ \sigma=15$', fontsize=14)
811 plt.axis([40, 160, 0, 0.03])
811 plt.axis([40, 160, 0, 0.03])
812 plt.grid(True)
812 plt.grid(True)
813
813
814 # Out[65]:
814 # Out[65]:
815 # image file: tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_06.svg
815 # image file: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_06.svg
816
816
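# The `normed=1` argument used above asks for a histogram rescaled so that the total area of the bars is one, which is why the vertical axis can be read as a probability density (newer matplotlib releases spell this option `density=True`). The sketch below checks that property with `np.histogram`, NumPy's non-plotting counterpart; the fixed random seed is an assumption added only to make the run repeatable:

```python
from __future__ import print_function

import numpy as np

# Fixed seed (illustrative) so repeated runs use the same sample.
rng = np.random.RandomState(0)

mu, sigma = 100, 15
x = mu + sigma * rng.randn(10000)

# density=True rescales the counts so that the bars integrate to one.
density, edges = np.histogram(x, bins=50, density=True)

# Area of each bar is height times width; edges has one more entry
# than density, so np.diff gives one width per bin.
widths = np.diff(edges)
area = np.sum(density * widths)

print('Number of bins:', len(density))
print('Total area under the histogram:', area)
```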
817 ### Image display
817 ### Image display
818
818
819 # The `imshow` command can display single or multi-channel images. A simple array of random numbers, plotted in grayscale:
819 # The `imshow` command can display single or multi-channel images. A simple array of random numbers, plotted in grayscale:
820
820
821 # In[66]:
821 # In[66]:
822 from matplotlib import cm
822 from matplotlib import cm
823 plt.imshow(np.random.rand(5, 10), cmap=cm.gray, interpolation='nearest');
823 plt.imshow(np.random.rand(5, 10), cmap=cm.gray, interpolation='nearest');
824
824
825 # Out[66]:
825 # Out[66]:
826 # image file: tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_07.svg
826 # image file: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_07.svg
827
827
828 # A real photograph is a multichannel image; `imshow` interprets it correctly:
828 # A real photograph is a multichannel image; `imshow` interprets it correctly:
829
829
830 # In[67]:
830 # In[67]:
831 img = plt.imread('stinkbug.png')
831 img = plt.imread('stinkbug.png')
832 print 'Dimensions of the array img:', img.shape
832 print 'Dimensions of the array img:', img.shape
833 plt.imshow(img);
833 plt.imshow(img);
834
834
835 # Out[67]:
835 # Out[67]:
836 # Dimensions of the array img: (375, 500, 3)
836 # Dimensions of the array img: (375, 500, 3)
837 #
837 #
838 # image file: tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_08.svg
838 # image file: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_08.svg
839
839
840 ### Simple 3d plotting with matplotlib
840 ### Simple 3d plotting with matplotlib
841
841
842 # Note that you must execute at least once in your session:
842 # Note that you must execute at least once in your session:
843
843
844 # In[68]:
844 # In[68]:
845 from mpl_toolkits.mplot3d import Axes3D
845 from mpl_toolkits.mplot3d import Axes3D
846
846
847 # Once this has been done, you can create 3d axes with the `projection='3d'` keyword to `add_subplot`:
847 # Once this has been done, you can create 3d axes with the `projection='3d'` keyword to `add_subplot`:
848 #
848 #
849 # fig = plt.figure()
849 # fig = plt.figure()
850 # fig.add_subplot(<other arguments here>, projection='3d')
850 # fig.add_subplot(<other arguments here>, projection='3d')
851
851
852 # A simple surface plot:
852 # A simple surface plot:
853
853
854 # In[72]:
854 # In[72]:
855 from mpl_toolkits.mplot3d.axes3d import Axes3D
855 from mpl_toolkits.mplot3d.axes3d import Axes3D
856 from matplotlib import cm
856 from matplotlib import cm
857
857
858 fig = plt.figure()
858 fig = plt.figure()
859 ax = fig.add_subplot(1, 1, 1, projection='3d')
859 ax = fig.add_subplot(1, 1, 1, projection='3d')
860 X = np.arange(-5, 5, 0.25)
860 X = np.arange(-5, 5, 0.25)
861 Y = np.arange(-5, 5, 0.25)
861 Y = np.arange(-5, 5, 0.25)
862 X, Y = np.meshgrid(X, Y)
862 X, Y = np.meshgrid(X, Y)
863 R = np.sqrt(X**2 + Y**2)
863 R = np.sqrt(X**2 + Y**2)
864 Z = np.sin(R)
864 Z = np.sin(R)
865 surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.jet,
865 surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.jet,
866 linewidth=0, antialiased=False)
866 linewidth=0, antialiased=False)
867 ax.set_zlim3d(-1.01, 1.01);
867 ax.set_zlim3d(-1.01, 1.01);
868
868
869 # Out[72]:
869 # Out[72]:
870 # image file: tests/ipynbref/IntroNumPy.orig_files/IntroNumPy.orig_fig_09.svg
870 # image file: tests/ipynbref/IntroNumPy_orig_files/IntroNumPy_orig_fig_09.svg
871
871
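# The surface example relies on `np.meshgrid` to turn two 1-d coordinate vectors into the 2-d coordinate arrays that `plot_surface` expects. A short sketch of what that call produces, using the same ranges as above (the lowercase names `x` and `y` for the 1-d vectors are introduced here for clarity):

```python
from __future__ import print_function

import numpy as np

x = np.arange(-5, 5, 0.25)
y = np.arange(-5, 5, 0.25)

# Each output has one row per y value and one column per x value.
X, Y = np.meshgrid(x, y)
print('1-d shapes:', x.shape, y.shape)
print('2-d shapes:', X.shape, Y.shape)

# X repeats the x vector along every row; Y repeats y down every column.
print('Rows of X all equal x:', np.all(X == x))
print('Columns of Y all equal y:', np.all(Y == y[:, np.newaxis]))

# Elementwise functions of X and Y then evaluate on the whole grid at once.
Z = np.sin(np.sqrt(X**2 + Y**2))
print('Z shape:', Z.shape)
```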
872 ## IPython: a powerful interactive environment
872 ## IPython: a powerful interactive environment
873
873
874 # A key component of the everyday workflow of most scientific computing environments is a good interactive environment, that is, a system in which you can execute small amounts of code and view the results immediately, combining both printing out data and opening graphical visualizations. All modern systems for scientific computing, commercial and open source, include such functionality.
874 # A key component of the everyday workflow of most scientific computing environments is a good interactive environment, that is, a system in which you can execute small amounts of code and view the results immediately, combining both printing out data and opening graphical visualizations. All modern systems for scientific computing, commercial and open source, include such functionality.
875 #
875 #
876 # Out of the box, Python also offers a simple interactive shell with very limited capabilities. But just as the scientific community built Numpy to provide arrays suited for scientific work (since Python's lists aren't optimal for this task), it has also developed an interactive environment much more sophisticated than the built-in one. The [IPython project](http://ipython.org) offers a set of tools to make productive use of the Python language, all the while working interactively and with immediate feedback on your results. The basic tools that IPython provides are:
876 # Out of the box, Python also offers a simple interactive shell with very limited capabilities. But just as the scientific community built Numpy to provide arrays suited for scientific work (since Python's lists aren't optimal for this task), it has also developed an interactive environment much more sophisticated than the built-in one. The [IPython project](http://ipython.org) offers a set of tools to make productive use of the Python language, all the while working interactively and with immediate feedback on your results. The basic tools that IPython provides are:
877 #
877 #
878 # 1. A powerful terminal shell, with many features designed to increase the fluidity and productivity of everyday scientific workflows, including:
878 # 1. A powerful terminal shell, with many features designed to increase the fluidity and productivity of everyday scientific workflows, including:
879 #
879 #
880 # * rich introspection of all objects and variables including easy access to the source code of any function
880 # * rich introspection of all objects and variables including easy access to the source code of any function
881 # * powerful and extensible tab completion of variables and filenames,
881 # * powerful and extensible tab completion of variables and filenames,
882 # * tight integration with matplotlib, supporting interactive figures that don't block the terminal,
882 # * tight integration with matplotlib, supporting interactive figures that don't block the terminal,
883 # * direct access to the filesystem and underlying operating system,
883 # * direct access to the filesystem and underlying operating system,
884 # * an extensible system for shell-like commands called 'magics' that reduce the work needed to perform many common tasks,
884 # * an extensible system for shell-like commands called 'magics' that reduce the work needed to perform many common tasks,
885 # * tools for easily running, timing, profiling and debugging your codes,
885 # * tools for easily running, timing, profiling and debugging your codes,
886 # * syntax highlighted error messages with much more detail than the default Python ones,
886 # * syntax highlighted error messages with much more detail than the default Python ones,
887 # * logging and access to all previous history of inputs, including across sessions
887 # * logging and access to all previous history of inputs, including across sessions
888 #
888 #
889 # 2. A Qt console that provides the look and feel of a terminal, but adds support for inline figures, graphical calltips, a persistent session that can survive crashes (even segfaults) of the kernel process, and more.
889 # 2. A Qt console that provides the look and feel of a terminal, but adds support for inline figures, graphical calltips, a persistent session that can survive crashes (even segfaults) of the kernel process, and more.
890 #
890 #
891 # 3. A web-based notebook that can execute code and also contain rich text and figures, mathematical equations and arbitrary HTML. This notebook presents a document-like view with cells where code is executed but that can be edited in-place, reordered, mixed with explanatory text and figures, etc.
891 # 3. A web-based notebook that can execute code and also contain rich text and figures, mathematical equations and arbitrary HTML. This notebook presents a document-like view with cells where code is executed but that can be edited in-place, reordered, mixed with explanatory text and figures, etc.
892 #
892 #
893 # 4. A high-performance, low-latency system for parallel computing that supports the control of a cluster of IPython engines communicating over a network, with optimizations that minimize unnecessary copying of large objects (especially numpy arrays).
893 # 4. A high-performance, low-latency system for parallel computing that supports the control of a cluster of IPython engines communicating over a network, with optimizations that minimize unnecessary copying of large objects (especially numpy arrays).
894 #
894 #
895 # We will now discuss the highlights of the tools 1-3 above so that you can make them an effective part of your workflow. The topic of parallel computing is beyond the scope of this document, but we encourage you to read the extensive [documentation](http://ipython.org/ipython-doc/rel-0.12.1/parallel/index.html) and [tutorials](http://minrk.github.com/scipy-tutorial-2011/) on this available on the IPython website.
895 # We will now discuss the highlights of the tools 1-3 above so that you can make them an effective part of your workflow. The topic of parallel computing is beyond the scope of this document, but we encourage you to read the extensive [documentation](http://ipython.org/ipython-doc/rel-0.12.1/parallel/index.html) and [tutorials](http://minrk.github.com/scipy-tutorial-2011/) on this available on the IPython website.
896
896
897 ### The IPython terminal
897 ### The IPython terminal
898
898
899 # You can start IPython at the terminal simply by typing:
899 # You can start IPython at the terminal simply by typing:
900 #
900 #
901 # $ ipython
901 # $ ipython
902 #
902 #
903 # which will provide you with some basic information about how to get started and will then open a prompt labeled `In [1]:` for you to start typing. Here we type $2^{64}$ and Python computes the result for us in exact arithmetic, returning it as `Out[1]`:
903 # which will provide you with some basic information about how to get started and will then open a prompt labeled `In [1]:` for you to start typing. Here we type $2^{64}$ and Python computes the result for us in exact arithmetic, returning it as `Out[1]`:
904 #
904 #
905 # $ ipython
905 # $ ipython
906 # Python 2.7.2+ (default, Oct 4 2011, 20:03:08)
906 # Python 2.7.2+ (default, Oct 4 2011, 20:03:08)
907 # Type "copyright", "credits" or "license" for more information.
907 # Type "copyright", "credits" or "license" for more information.
908 #
908 #
909 # IPython 0.13.dev -- An enhanced Interactive Python.
909 # IPython 0.13.dev -- An enhanced Interactive Python.
910 # ? -> Introduction and overview of IPython's features.
910 # ? -> Introduction and overview of IPython's features.
911 # %quickref -> Quick reference.
911 # %quickref -> Quick reference.
912 # help -> Python's own help system.
912 # help -> Python's own help system.
913 # object? -> Details about 'object', use 'object??' for extra details.
913 # object? -> Details about 'object', use 'object??' for extra details.
914 #
914 #
915 # In [1]: 2**64
915 # In [1]: 2**64
916 # Out[1]: 18446744073709551616L
916 # Out[1]: 18446744073709551616L
917 #
917 #
918 # The first thing you should know about IPython is that all your inputs and outputs are saved. There are two variables named `In` and `Out` which are filled as you work with your results. Furthermore, all outputs are also saved to auto-created variables of the form `_NN` where `NN` is the prompt number, and inputs to `_iNN`. This allows you to recover quickly the result of a prior computation by referring to its number even if you forgot to store it as a variable. For example, later on in the above session you can do:
918 # The first thing you should know about IPython is that all your inputs and outputs are saved. There are two variables named `In` and `Out` which are filled as you work with your results. Furthermore, all outputs are also saved to auto-created variables of the form `_NN` where `NN` is the prompt number, and inputs to `_iNN`. This allows you to recover quickly the result of a prior computation by referring to its number even if you forgot to store it as a variable. For example, later on in the above session you can do:
919 #
919 #
920 # In [6]: print _1
920 # In [6]: print _1
921 # 18446744073709551616
921 # 18446744073709551616
922
922
923 # We strongly recommend that you take a few minutes to read at least the basic introduction provided by the `?` command, and keep in mind that the `%quickref` command can be used at any time as a quick reference "cheat sheet" of the most frequently used features of IPython.
923 # We strongly recommend that you take a few minutes to read at least the basic introduction provided by the `?` command, and keep in mind that the `%quickref` command can be used at any time as a quick reference "cheat sheet" of the most frequently used features of IPython.
924 #
924 #
925 # At the IPython prompt, any valid Python code that you type will be executed similarly to the default Python shell (though often with more informative feedback). But IPython is a *superset* of the default Python shell, so let's have a brief look at some of its additional functionality.
925 # At the IPython prompt, any valid Python code that you type will be executed similarly to the default Python shell (though often with more informative feedback). But IPython is a *superset* of the default Python shell, so let's have a brief look at some of its additional functionality.
926
926
927 # **Object introspection**
927 # **Object introspection**
928 #
928 #
929 # A simple `?` command provides a general introduction to IPython, but as indicated in the banner above, you can use the `?` syntax to ask for details about any object. For example, if we type `_1?`, IPython will print the following details about this variable:
929 # A simple `?` command provides a general introduction to IPython, but as indicated in the banner above, you can use the `?` syntax to ask for details about any object. For example, if we type `_1?`, IPython will print the following details about this variable:
930 #
930 #
931 # In [14]: _1?
931 # In [14]: _1?
932 # Type: long
932 # Type: long
933 # Base Class: <type 'long'>
933 # Base Class: <type 'long'>
934 # String Form:18446744073709551616
934 # String Form:18446744073709551616
935 # Namespace: Interactive
935 # Namespace: Interactive
936 # Docstring:
936 # Docstring:
937 # long(x[, base]) -> integer
937 # long(x[, base]) -> integer
938 #
938 #
939 # Convert a string or number to a long integer, if possible. A floating
939 # Convert a string or number to a long integer, if possible. A floating
940 #
940 #
941 # [etc... snipped for brevity]
941 # [etc... snipped for brevity]
942 #
942 #
943 # If you add a second `?` and for any object `x` type `x??`, IPython will try to provide an even more detailed analysis of the object, including its syntax-highlighted source code when it can be found. It's possible that `x??` returns the same information as `x?`, but in many cases `x??` will indeed provide additional details.
943 # If you add a second `?` and for any object `x` type `x??`, IPython will try to provide an even more detailed analysis of the object, including its syntax-highlighted source code when it can be found. It's possible that `x??` returns the same information as `x?`, but in many cases `x??` will indeed provide additional details.
944 #
944 #
945 # Finally, the `?` syntax is also useful to search *namespaces* with wildcards. Suppose you are wondering if there is any function in Numpy that may do text-related things; with `np.*txt*?`, IPython will print all the names in the `np` namespace (our Numpy shorthand) that have 'txt' anywhere in their name:
945 # Finally, the `?` syntax is also useful to search *namespaces* with wildcards. Suppose you are wondering if there is any function in Numpy that may do text-related things; with `np.*txt*?`, IPython will print all the names in the `np` namespace (our Numpy shorthand) that have 'txt' anywhere in their name:
946 #
946 #
947 # In [17]: np.*txt*?
947 # In [17]: np.*txt*?
948 # np.genfromtxt
948 # np.genfromtxt
949 # np.loadtxt
949 # np.loadtxt
950 # np.mafromtxt
950 # np.mafromtxt
951 # np.ndfromtxt
951 # np.ndfromtxt
952 # np.recfromtxt
952 # np.recfromtxt
953 # np.savetxt
953 # np.savetxt
954
954
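# Outside of IPython, a rough equivalent of this wildcard search can be written in plain Python. The sketch below is not how IPython implements the feature; it simply combines the built-in `dir` with the standard library's `fnmatch` module to filter the names in the `np` namespace. The exact list printed depends on the installed Numpy version:

```python
from __future__ import print_function

import fnmatch

import numpy as np

# dir(np) lists the names in the namespace; fnmatch.filter keeps those
# matching the shell-style pattern, just like the `np.*txt*?` query.
matches = sorted(fnmatch.filter(dir(np), '*txt*'))

for name in matches:
    print('np.' + name)
```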
955 # **Tab completion**
955 # **Tab completion**
956 #
956 #
957 # IPython makes the tab key work extra hard for you as a way to rapidly inspect objects and libraries. Whenever you have typed something at the prompt, by hitting the `<tab>` key IPython will try to complete the rest of the line. For this, IPython will analyze the text you have typed so far and try to search for Python data or files that may match the context you have already provided.
957 # IPython makes the tab key work extra hard for you as a way to rapidly inspect objects and libraries. Whenever you have typed something at the prompt, by hitting the `<tab>` key IPython will try to complete the rest of the line. For this, IPython will analyze the text you have typed so far and try to search for Python data or files that may match the context you have already provided.
958 #
958 #
959 # For example, if you type `np.load` and hit the `<tab>` key, you'll see:
959 # For example, if you type `np.load` and hit the `<tab>` key, you'll see:
960 #
960 #
961 # In [21]: np.load<TAB HERE>
961 # In [21]: np.load<TAB HERE>
962 # np.load np.loads np.loadtxt
962 # np.load np.loads np.loadtxt
963 #
963 #
964 # so you can quickly find all the load-related functionality in numpy. Tab completion works even for function arguments. For example, consider this function definition:
964 # so you can quickly find all the load-related functionality in numpy. Tab completion works even for function arguments. For example, consider this function definition:
965 #
965 #
966 # In [20]: def f(x, frobinate=False):
966 # In [20]: def f(x, frobinate=False):
967 # ....: if frobinate:
967 # ....: if frobinate:
968 # ....: return x**2
968 # ....: return x**2
969 # ....:
969 # ....:
970 #
970 #
971 # If you now use the `<tab>` key after having typed 'fro' you'll get all valid Python completions, but those marked with `=` at the end are known to be keywords of your function:
971 # If you now use the `<tab>` key after having typed 'fro' you'll get all valid Python completions, but those marked with `=` at the end are known to be keywords of your function:
972 #
972 #
973 # In [21]: f(2, fro<TAB HERE>
973 # In [21]: f(2, fro<TAB HERE>
974 # frobinate= frombuffer fromfunction frompyfunc fromstring
974 # frobinate= frombuffer fromfunction frompyfunc fromstring
975 # from fromfile fromiter fromregex frozenset
975 # from fromfile fromiter fromregex frozenset
976 #
976 #
977 # at this point you can add the `b` letter and hit `<tab>` once more, and IPython will finish the line for you:
977 # at this point you can add the `b` letter and hit `<tab>` once more, and IPython will finish the line for you:
978 #
978 #
979 # In [21]: f(2, frobinate=
979 # In [21]: f(2, frobinate=
980 #
980 #
981 # As a beginner, simply get into the habit of using `<tab>` after most objects; it should quickly become second nature as you see how it helps you keep a fluid workflow and discover useful information. Later on you can also customize this behavior by writing your own completion code, if you so desire.
981 # As a beginner, simply get into the habit of using `<tab>` after most objects; it should quickly become second nature as you see how it helps you keep a fluid workflow and discover useful information. Later on you can also customize this behavior by writing your own completion code, if you so desire.
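#
# To make the idea of keyword-aware completion concrete, here is a minimal sketch in plain Python 3 (the transcripts above use Python 2). It is not IPython's implementation; the helper `keyword_completions` is a hypothetical name introduced only for illustration, and it relies on the standard `inspect.signature` function to read the parameter names of `f`.

```python
import inspect


def f(x, frobinate=False):
    """Same toy function as in the transcript above."""
    if frobinate:
        return x**2


def keyword_completions(func, prefix):
    """List 'name=' candidates for parameters of func that start with prefix."""
    parameters = inspect.signature(func).parameters
    return [name + "=" for name in parameters if name.startswith(prefix)]


# Candidates for the text 'fro' typed inside a call to f
print(keyword_completions(f, "fro"))
```

# A real completer would merge these candidates with the other names visible in the session, which is why the transcript also lists entries such as `frozenset`.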
982
982
983 # **Matplotlib integration**
983 # **Matplotlib integration**
984 #
984 #
985 # One of the most useful features of IPython for scientists is its tight integration with matplotlib: at the terminal IPython lets you open matplotlib figures without blocking your typing (which is what happens if you try to do the same thing at the default Python shell), and in the Qt console and notebook you can even view your figures embedded in your workspace next to the code that created them.
985 # One of the most useful features of IPython for scientists is its tight integration with matplotlib: at the terminal IPython lets you open matplotlib figures without blocking your typing (which is what happens if you try to do the same thing at the default Python shell), and in the Qt console and notebook you can even view your figures embedded in your workspace next to the code that created them.
986 #
986 #
987 # The matplotlib support can be either activated when you start IPython by passing the `--pylab` flag, or at any point later in your session by using the `%pylab` command. If you start IPython with `--pylab`, you'll see something like this (note the extra message about pylab):
987 # The matplotlib support can be either activated when you start IPython by passing the `--pylab` flag, or at any point later in your session by using the `%pylab` command. If you start IPython with `--pylab`, you'll see something like this (note the extra message about pylab):
988 #
988 #
989 # $ ipython --pylab
989 # $ ipython --pylab
990 # Python 2.7.2+ (default, Oct 4 2011, 20:03:08)
990 # Python 2.7.2+ (default, Oct 4 2011, 20:03:08)
991 # Type "copyright", "credits" or "license" for more information.
991 # Type "copyright", "credits" or "license" for more information.
992 #
992 #
993 # IPython 0.13.dev -- An enhanced Interactive Python.
993 # IPython 0.13.dev -- An enhanced Interactive Python.
994 # ? -> Introduction and overview of IPython's features.
994 # ? -> Introduction and overview of IPython's features.
995 # %quickref -> Quick reference.
995 # %quickref -> Quick reference.
996 # help -> Python's own help system.
996 # help -> Python's own help system.
997 # object? -> Details about 'object', use 'object??' for extra details.
997 # object? -> Details about 'object', use 'object??' for extra details.
998 #
998 #
999 # Welcome to pylab, a matplotlib-based Python environment [backend: Qt4Agg].
999 # Welcome to pylab, a matplotlib-based Python environment [backend: Qt4Agg].
1000 # For more information, type 'help(pylab)'.
1000 # For more information, type 'help(pylab)'.
1001 #
1001 #
1002 # In [1]:
1002 # In [1]:
1003 #
1003 #
1004 # Furthermore, IPython will import `numpy` with the `np` shorthand, `matplotlib.pyplot` as `plt`, and it will also load all of the numpy and pyplot top-level names so that you can directly type something like:
1004 # Furthermore, IPython will import `numpy` with the `np` shorthand, `matplotlib.pyplot` as `plt`, and it will also load all of the numpy and pyplot top-level names so that you can directly type something like:
1005 #
1005 #
1006 # In [1]: x = linspace(0, 2*pi, 200)
1006 # In [1]: x = linspace(0, 2*pi, 200)
1007 #
1007 #
1008 # In [2]: plot(x, sin(x))
1008 # In [2]: plot(x, sin(x))
1009 # Out[2]: [<matplotlib.lines.Line2D at 0x9e7c16c>]
1009 # Out[2]: [<matplotlib.lines.Line2D at 0x9e7c16c>]
1010 #
1010 #
1011 # instead of having to prefix each name with the module it comes from (as we have been doing in the examples thus far):
1011 # instead of having to prefix each name with the module it comes from (as we have been doing in the examples thus far):
1012 #
1012 #
1013 # In [3]: x = np.linspace(0, 2*np.pi, 200)
1013 # In [3]: x = np.linspace(0, 2*np.pi, 200)
1014 #
1014 #
1015 # In [4]: plt.plot(x, np.sin(x))
1015 # In [4]: plt.plot(x, np.sin(x))
1016 # Out[4]: [<matplotlib.lines.Line2D at 0x9e900ac>]
1016 # Out[4]: [<matplotlib.lines.Line2D at 0x9e900ac>]
1017 #
1017 #
1018 # This shorthand notation can be a huge time-saver when working interactively (it's a few characters but you are likely to type them hundreds of times in a session). But we should note that as you develop persistent scripts and notebooks meant for reuse, it's best to get in the habit of using the longer notation (known as *fully qualified names*), as it's clearer where things come from and it makes for more robust, readable and maintainable code in the long run.
1018 # This shorthand notation can be a huge time-saver when working interactively (it's a few characters but you are likely to type them hundreds of times in a session). But we should note that as you develop persistent scripts and notebooks meant for reuse, it's best to get in the habit of using the longer notation (known as *fully qualified names*), as it's clearer where things come from and it makes for more robust, readable and maintainable code in the long run.
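#
# The risk with short, unqualified names can be illustrated without numpy at all. The sketch below is our own example, not taken from pylab, and is written in Python 3: it uses the standard `math` and `cmath` modules, which both define `sqrt`, to show how a later import silently rebinds a short name while the fully qualified names stay unambiguous.

```python
import math
import cmath

from math import sqrt    # short name, bound to the real-valued function
from cmath import sqrt   # a later import silently rebinds the same name

print(sqrt is cmath.sqrt)   # the short name now refers to cmath's version
print(math.sqrt(4.0))       # fully qualified: always the real-valued sqrt
print(cmath.sqrt(-4.0))     # fully qualified: always the complex sqrt
```

# In a long script, a reader who sees only `sqrt(...)` cannot tell which of the two is being called without tracing every import.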
1019
1019
1020 # **Access to the operating system and files**
1020 # **Access to the operating system and files**
1021 #
1021 #
1022 # In IPython, you can type `ls` to see your files or `cd` to change directories, just like you would at a regular system prompt:
1022 # In IPython, you can type `ls` to see your files or `cd` to change directories, just like you would at a regular system prompt:
1023 #
1023 #
1024 # In [2]: cd tests
1024 # In [2]: cd tests
1025 # /home/fperez/ipython/nbconvert/tests
1025 # /home/fperez/ipython/nbconvert/tests
1026 #
1026 #
1027 # In [3]: ls test.*
1027 # In [3]: ls test.*
1028 # test.aux test.html test.ipynb test.log test.out test.pdf test.rst test.tex
1028 # test.aux test.html test.ipynb test.log test.out test.pdf test.rst test.tex
1029 #
1029 #
1030 # Furthermore, if you use the `!` at the beginning of a line, any commands you pass afterwards go directly to the operating system:
1030 # Furthermore, if you use the `!` at the beginning of a line, any commands you pass afterwards go directly to the operating system:
1031 #
1031 #
1032 # In [4]: !echo "Hello IPython"
1032 # In [4]: !echo "Hello IPython"
1033 # Hello IPython
1033 # Hello IPython
1034 #
1034 #
1035 # IPython offers a useful twist in this feature: it will substitute in the command the value of any *Python* variable you may have if you prepend it with a `$` sign:
1035 # IPython offers a useful twist in this feature: it will substitute in the command the value of any *Python* variable you may have if you prepend it with a `$` sign:
1036 #
1036 #
1037 # In [5]: message = 'IPython interpolates from Python to the shell'
1037 # In [5]: message = 'IPython interpolates from Python to the shell'
1038 #
1038 #
1039 # In [6]: !echo $message
1039 # In [6]: !echo $message
1040 # IPython interpolates from Python to the shell
1040 # IPython interpolates from Python to the shell
1041 #
1041 #
1042 # This feature can be extremely useful, as it lets you combine the power and clarity of Python for complex logic with the immediacy and familiarity of many shell commands. Additionally, if you start the line with *two* exclamation marks (`!!`), the output of the command will be automatically captured as a list of lines, e.g.:
1042 # This feature can be extremely useful, as it lets you combine the power and clarity of Python for complex logic with the immediacy and familiarity of many shell commands. Additionally, if you start the line with *two* exclamation marks (`!!`), the output of the command will be automatically captured as a list of lines, e.g.:
1043 #
1043 #
1044 # In [10]: !!ls test.*
1044 # In [10]: !!ls test.*
1045 # Out[10]:
1045 # Out[10]:
1046 # ['test.aux',
1046 # ['test.aux',
1047 # 'test.html',
1047 # 'test.html',
1048 # 'test.ipynb',
1048 # 'test.ipynb',
1049 # 'test.log',
1049 # 'test.log',
1050 # 'test.out',
1050 # 'test.out',
1051 # 'test.pdf',
1051 # 'test.pdf',
1052 # 'test.rst',
1052 # 'test.rst',
1053 # 'test.tex']
1053 # 'test.tex']
1054 #
1054 #
1055 # As explained above, you can now use this as the variable `_10`. If you directly want to capture the output of a system command to a Python variable, you can use the syntax `=!`:
1055 # As explained above, you can now use this as the variable `_10`. If you directly want to capture the output of a system command to a Python variable, you can use the syntax `=!`:
1056 #
1056 #
1057 # In [11]: testfiles =! ls test.*
1057 # In [11]: testfiles =! ls test.*
1058 #
1058 #
1059 # In [12]: print testfiles
1059 # In [12]: print testfiles
1060 # ['test.aux', 'test.html', 'test.ipynb', 'test.log', 'test.out', 'test.pdf', 'test.rst', 'test.tex']
1060 # ['test.aux', 'test.html', 'test.ipynb', 'test.log', 'test.out', 'test.pdf', 'test.rst', 'test.tex']
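#
# Outside IPython, for instance in a script meant for reuse, a similar result can be obtained with the standard `subprocess` module. The following is a rough Python 3 sketch under our own assumptions: `capture_lines` is a hypothetical helper, and the child process is the Python interpreter itself so that the example runs on any platform (in practice the argument list would name a system command such as `ls`).

```python
import subprocess
import sys


def capture_lines(argv):
    """Run a command and return its standard output as a list of lines."""
    output = subprocess.check_output(argv, universal_newlines=True)
    return output.splitlines()


# A Python value is passed to the child process as an ordinary argument,
# playing the role of `$message` in the IPython example above.
message = "IPython interpolates from Python to the shell"
lines = capture_lines(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", message]
)
print(lines)
```

# Passing the command as a list avoids shell quoting issues, at the cost of losing shell features such as the `test.*` wildcard used above.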
1061 #
1061 #
1062 # Finally, the special `%alias` command lets you define names that are shorthands for system commands, so that you can type them without having to prefix them via `!` explicitly (for example, `ls` is an alias that has been predefined for you at startup).
1062 # Finally, the special `%alias` command lets you define names that are shorthands for system commands, so that you can type them without having to prefix them via `!` explicitly (for example, `ls` is an alias that has been predefined for you at startup).
1063
1063
1064 # **Magic commands**
1064 # **Magic commands**
1065 #
1065 #
1066 # IPython has a system for special commands, called 'magics', that let you control IPython itself and perform many common tasks with a more shell-like syntax: it uses spaces for delimiting arguments, flags can be set with dashes and all arguments are treated as strings, so no additional quoting is required. This kind of syntax is invalid in the Python language but very convenient for interactive typing (fewer parentheses, commas and quotes everywhere); IPython distinguishes the two by detecting lines that start with the `%` character.
1066 # IPython has a system for special commands, called 'magics', that let you control IPython itself and perform many common tasks with a more shell-like syntax: it uses spaces for delimiting arguments, flags can be set with dashes and all arguments are treated as strings, so no additional quoting is required. This kind of syntax is invalid in the Python language but very convenient for interactive typing (fewer parentheses, commas and quotes everywhere); IPython distinguishes the two by detecting lines that start with the `%` character.
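#
# As a toy illustration of this shell-like convention, the Python 3 sketch below splits a line that starts with `%` into a command name and a list of string arguments. It is not how IPython parses magics internally; `parse_magic` is a hypothetical helper, and the standard `shlex.split` function stands in for whatever splitting a real magic performs.

```python
import shlex


def parse_magic(line):
    """Split '%name arg1 arg2' into (name, [args]); return None for other lines."""
    stripped = line.strip()
    if not stripped.startswith("%"):
        return None
    parts = shlex.split(stripped[1:])
    if not parts:
        return None
    return parts[0], parts[1:]


print(parse_magic("%run -t randsvd.py"))   # name plus string arguments
print(parse_magic("x = 3 % 2"))            # ordinary Python: not a magic
```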
1067 #
1067 #
1068 # You can learn more about the magic system by simply typing `%magic` at the prompt, which will give you a short description plus the documentation on *all* available magics. If you want to see only a listing of existing magics, you can use `%lsmagic`:
1068 # You can learn more about the magic system by simply typing `%magic` at the prompt, which will give you a short description plus the documentation on *all* available magics. If you want to see only a listing of existing magics, you can use `%lsmagic`:
1069 #
1069 #
1070 # In [4]: lsmagic
1070 # In [4]: lsmagic
1071 # Available magic functions:
1071 # Available magic functions:
1072 # %alias %autocall %autoindent %automagic %bookmark %c %cd %colors %config %cpaste
1072 # %alias %autocall %autoindent %automagic %bookmark %c %cd %colors %config %cpaste
1073 # %debug %dhist %dirs %doctest_mode %ds %ed %edit %env %gui %hist %history
1073 # %debug %dhist %dirs %doctest_mode %ds %ed %edit %env %gui %hist %history
1074 # %install_default_config %install_ext %install_profiles %load_ext %loadpy %logoff %logon
1074 # %install_default_config %install_ext %install_profiles %load_ext %loadpy %logoff %logon
1075 # %logstart %logstate %logstop %lsmagic %macro %magic %notebook %page %paste %pastebin
1075 # %logstart %logstate %logstop %lsmagic %macro %magic %notebook %page %paste %pastebin
1076 # %pd %pdb %pdef %pdoc %pfile %pinfo %pinfo2 %pop %popd %pprint %precision %profile
1076 # %pd %pdb %pdef %pdoc %pfile %pinfo %pinfo2 %pop %popd %pprint %precision %profile
1077 # %prun %psearch %psource %pushd %pwd %pycat %pylab %quickref %recall %rehashx
1077 # %prun %psearch %psource %pushd %pwd %pycat %pylab %quickref %recall %rehashx
1078 # %reload_ext %rep %rerun %reset %reset_selective %run %save %sc %stop %store %sx %tb
1078 # %reload_ext %rep %rerun %reset %reset_selective %run %save %sc %stop %store %sx %tb
1079 # %time %timeit %unalias %unload_ext %who %who_ls %whos %xdel %xmode
1079 # %time %timeit %unalias %unload_ext %who %who_ls %whos %xdel %xmode
1080 #
1080 #
1081 # Automagic is ON, % prefix NOT needed for magic functions.
1081 # Automagic is ON, % prefix NOT needed for magic functions.
1082 #
1082 #
1083 # Note how the example above omitted the explicit `%` marker and simply used `lsmagic`. As long as the 'automagic' feature is on (which it is by default), you can omit the `%` marker provided there is no ambiguity with a Python variable of the same name.
1083 # Note how the example above omitted the explicit `%` marker and simply used `lsmagic`. As long as the 'automagic' feature is on (which it is by default), you can omit the `%` marker provided there is no ambiguity with a Python variable of the same name.
1084
1084
1085 # **Running your code**
1085 # **Running your code**
1086 #
1086 #
1087 # While it's easy to type a few lines of code in IPython, for any long-lived work you should keep your code in Python scripts (or in IPython notebooks, see below). Suppose you have a script, in this case trivially simple for the sake of brevity, named `simple.py`:
1087 # While it's easy to type a few lines of code in IPython, for any long-lived work you should keep your code in Python scripts (or in IPython notebooks, see below). Suppose you have a script, in this case trivially simple for the sake of brevity, named `simple.py`:
1088 #
1088 #
1089 # In [12]: !cat simple.py
1089 # In [12]: !cat simple.py
1090 # import numpy as np
1090 # import numpy as np
1091 #
1091 #
1092 # x = np.random.normal(size=100)
1092 # x = np.random.normal(size=100)
1093 #
1093 #
1094 # print 'First element of x:', x[0]
1094 # print 'First element of x:', x[0]
1095 #
1095 #
1096 # The typical workflow with IPython is to use the `%run` magic to execute your script (you can omit the .py extension if you want). When you run it, the script will execute just as if it had been run at the system prompt with `python simple.py` (though since Python does not re-execute modules that have already been imported, the initialization cost is paid only once per session, which can reduce run time significantly in some cases):
1096 # The typical workflow with IPython is to use the `%run` magic to execute your script (you can omit the .py extension if you want). When you run it, the script will execute just as if it had been run at the system prompt with `python simple.py` (though since Python does not re-execute modules that have already been imported, the initialization cost is paid only once per session, which can reduce run time significantly in some cases):
1097 #
1097 #
1098 # In [13]: run simple
1098 # In [13]: run simple
1099 # First element of x: -1.55872256289
1099 # First element of x: -1.55872256289
1100 #
1100 #
1101 # Once it completes, all variables defined in it become available for you to use interactively:
1101 # Once it completes, all variables defined in it become available for you to use interactively:
1102 #
1102 #
1103 # In [14]: x.shape
1103 # In [14]: x.shape
1104 # Out[14]: (100,)
1104 # Out[14]: (100,)
1105 #
1105 #
1106 # This allows you to plot data, try out ideas, etc, in a `%run`/interact/edit cycle that can be very productive. As you start understanding your problem better you can refine your script further, incrementally improving it based on the work you do at the IPython prompt. At any point you can use the `%hist` magic to print out your history without prompts, so that you can copy useful fragments back into the script.
1106 # This allows you to plot data, try out ideas, etc, in a `%run`/interact/edit cycle that can be very productive. As you start understanding your problem better you can refine your script further, incrementally improving it based on the work you do at the IPython prompt. At any point you can use the `%hist` magic to print out your history without prompts, so that you can copy useful fragments back into the script.
1107 #
1107 #
1108 # By default, `%run` executes scripts in a completely empty namespace, to better mimic how they would execute at the system prompt with plain Python. But if you use the `-i` flag, the script will also see your interactively defined variables. This lets you edit in a script larger amounts of code that still behave as if you had typed them at the IPython prompt.
1108 # By default, `%run` executes scripts in a completely empty namespace, to better mimic how they would execute at the system prompt with plain Python. But if you use the `-i` flag, the script will also see your interactively defined variables. This lets you edit in a script larger amounts of code that still behave as if you had typed them at the IPython prompt.
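#
# The difference between an empty and a pre-populated namespace can be reproduced with the standard `runpy` module. This Python 3 sketch is only an analogy for what `%run` and `%run -i` do, not their implementation; the script contents and the variable names `offset` and `total` are made up for the example.

```python
import os
import runpy
import tempfile

SCRIPT = """\
try:
    total = offset + 1      # 'offset' exists only if the caller supplies it
except NameError:
    total = None
"""

# Write the hypothetical script to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
    handle.write(SCRIPT)
    path = handle.name

try:
    # Empty namespace, comparable to a plain %run
    fresh = runpy.run_path(path)
    # Namespace seeded by the caller, comparable in spirit to %run -i
    seeded = runpy.run_path(path, init_globals={"offset": 41})
finally:
    os.remove(path)

print(fresh["total"], seeded["total"])
```

# `runpy.run_path` returns the script's resulting global namespace as a dictionary, which mirrors how variables defined by a script become available after `%run` completes.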
1109 #
1109 #
1110 # You can also get a summary of the time taken by your script with the `-t` flag; consider a different script `randsvd.py` that takes a bit longer to run:
1110 # You can also get a summary of the time taken by your script with the `-t` flag; consider a different script `randsvd.py` that takes a bit longer to run:
1111 #
1111 #
1112 # In [21]: run -t randsvd.py
1112 # In [21]: run -t randsvd.py
1113 #
1113 #
1114 # IPython CPU timings (estimated):
1114 # IPython CPU timings (estimated):
1115 # User : 0.38 s.
1115 # User : 0.38 s.
1116 # System : 0.04 s.
1116 # System : 0.04 s.
1117 # Wall time: 0.34 s.
1117 # Wall time: 0.34 s.
1118 #
1118 #
1119 # `User` is the time spent by the computer executing your code, while `System` is the time the operating system had to work on your behalf, doing things like memory allocation that are needed by your code but that you didn't explicitly program and that happen inside the kernel. The `Wall time` is the time on a 'clock on the wall' between the start and end of your program.
1119 # `User` is the time spent by the computer executing your code, while `System` is the time the operating system had to work on your behalf, doing things like memory allocation that are needed by your code but that you didn't explicitly program and that happen inside the kernel. The `Wall time` is the time on a 'clock on the wall' between the start and end of your program.
1120 #
1120 #
1121 # If `Wall > User+System`, your code is most likely waiting idle for certain periods. That could be waiting for data to arrive from a remote source or perhaps because the operating system has to swap large amounts of virtual memory. If you know that your code doesn't explicitly wait for remote data to arrive, you should investigate further to identify possible ways of improving the performance profile.
1121 # If `Wall > User+System`, your code is most likely waiting idle for certain periods. That could be waiting for data to arrive from a remote source or perhaps because the operating system has to swap large amounts of virtual memory. If you know that your code doesn't explicitly wait for remote data to arrive, you should investigate further to identify possible ways of improving the performance profile.
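#
# A small Python 3 experiment makes the case `Wall > User+System` tangible. In this sketch, which is our own and not part of IPython, `time.process_time` reports the CPU time (user plus system) of the current process and `time.perf_counter` reports elapsed real time; a call to `time.sleep` stands in for code that waits idle.

```python
import time


def measure(func):
    """Return (cpu_seconds, wall_seconds) spent while calling func()."""
    cpu_start = time.process_time()    # user + system CPU time of this process
    wall_start = time.perf_counter()   # elapsed real time
    func()
    return time.process_time() - cpu_start, time.perf_counter() - wall_start


cpu, wall = measure(lambda: time.sleep(0.2))   # sleeping uses almost no CPU
print("CPU: %.3f s, wall: %.3f s" % (cpu, wall))
```

# The wall time is dominated by the sleep while the CPU time stays close to zero, the same signature you would see for code blocked on a network or disk.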
1122 #
1122 #
1123 # If you only want to time how long a single statement takes, you don't need to put it into a script as you can use the `%timeit` magic, which uses Python's `timeit` module to very carefully measure timing data; `timeit` can measure even short statements that execute extremely fast:
1123 # If you only want to time how long a single statement takes, you don't need to put it into a script as you can use the `%timeit` magic, which uses Python's `timeit` module to very carefully measure timing data; `timeit` can measure even short statements that execute extremely fast:
1124 #
1124 #
1125 # In [27]: %timeit a=1
1125 # In [27]: %timeit a=1
1126 # 10000000 loops, best of 3: 23 ns per loop
1126 # 10000000 loops, best of 3: 23 ns per loop
1127 #
1127 #
1128 # and for code that runs longer, it automatically adjusts so the overall measurement doesn't take too long:
1128 # and for code that runs longer, it automatically adjusts so the overall measurement doesn't take too long:
1129 #
1129 #
1130 # In [28]: %timeit np.linalg.svd(x)
1130 # In [28]: %timeit np.linalg.svd(x)
1131 # 1 loops, best of 3: 310 ms per loop
1131 # 1 loops, best of 3: 310 ms per loop
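#
# The same kind of measurement can be made in a plain Python 3 script with the standard `timeit` module directly. The sketch below is an approximation of the idea, not a reproduction of what `%timeit` does internally: `Timer.autorange` chooses a loop count that yields a measurable total time, and the best of three repeats is reported.

```python
import timeit

timer = timeit.Timer("a = 1")

# autorange() picks a loop count large enough for a measurable total time.
loops, total = timer.autorange()

# Repeat the measurement and keep the best (smallest) time per loop.
best = min(timer.repeat(repeat=3, number=loops)) / loops
print("%d loops, best of 3: %.1f ns per loop" % (loops, best * 1e9))
```

# Taking the minimum rather than the mean reduces the influence of other processes that happen to run during the measurement.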
1132 #
1132 #
1133 # The `%run` magic still has more options for debugging and profiling data; you should read its documentation for many useful details (as always, just type `%run?`).
1133 # The `%run` magic still has more options for debugging and profiling data; you should read its documentation for many useful details (as always, just type `%run?`).
1134
1134
1135 ### The graphical Qt console
1135 ### The graphical Qt console
1136
1136
1137 # If you type at the system prompt (see the IPython website for installation details, as this requires some additional libraries):
1137 # If you type at the system prompt (see the IPython website for installation details, as this requires some additional libraries):
1138 #
1138 #
1139 # $ ipython qtconsole
1139 # $ ipython qtconsole
1140 #
1140 #
1141 # instead of opening in a terminal as before, IPython will start a graphical console that at first sight appears just like a terminal, but which is in fact much more capable than a text-only terminal. This is a specialized terminal designed for interactive scientific work: it supports full multi-line editing with color highlighting and graphical calltips for functions, it can keep multiple IPython sessions open simultaneously in tabs, and when scripts run it can display the figures inline directly in the work area.
1141 # instead of opening in a terminal as before, IPython will start a graphical console that at first sight appears just like a terminal, but which is in fact much more capable than a text-only terminal. This is a specialized terminal designed for interactive scientific work: it supports full multi-line editing with color highlighting and graphical calltips for functions, it can keep multiple IPython sessions open simultaneously in tabs, and when scripts run it can display the figures inline directly in the work area.
1142 #
1142 #
1143 # <center><img src="ipython_qtconsole2.png" width=400px></center>
1143 # <center><img src="ipython_qtconsole2.png" width=400px></center>
1144
1144
1145 # % This cell is for the pdflatex output only
1145 # % This cell is for the pdflatex output only
1146 # \begin{figure}[htbp]
1146 # \begin{figure}[htbp]
1147 # \centering
1147 # \centering
1148 # \includegraphics[width=3in]{ipython_qtconsole2.png}
1148 # \includegraphics[width=3in]{ipython_qtconsole2.png}
1149 # \caption{The IPython Qt console: a lightweight terminal for scientific exploration, with code, results and graphics in a single environment.}
1149 # \caption{The IPython Qt console: a lightweight terminal for scientific exploration, with code, results and graphics in a single environment.}
1150 # \end{figure}
1150 # \end{figure}
1151
1151
1152 # The Qt console accepts the same `--pylab` startup flags as the terminal, but you can additionally supply the value `--pylab inline`, which enables the support for inline graphics shown in the figure. This is ideal for keeping all the code and figures in the same session, given that the console can save the output of your entire session to HTML or PDF.
1152 # The Qt console accepts the same `--pylab` startup flags as the terminal, but you can additionally supply the value `--pylab inline`, which enables the support for inline graphics shown in the figure. This is ideal for keeping all the code and figures in the same session, given that the console can save the output of your entire session to HTML or PDF.
1153 #
1153 #
1154 # Since the Qt console makes it far more convenient than the terminal to edit blocks of code with multiple lines, in this environment it's worth knowing about the `%loadpy` magic function. `%loadpy` takes a path to a local file or remote URL, fetches its contents, and puts it in the work area for you to further edit and execute. It can be an extremely fast and convenient way of loading code from local disk or remote examples from sites such as the [Matplotlib gallery](http://matplotlib.sourceforge.net/gallery.html).
1154 # Since the Qt console makes it far more convenient than the terminal to edit blocks of code with multiple lines, in this environment it's worth knowing about the `%loadpy` magic function. `%loadpy` takes a path to a local file or remote URL, fetches its contents, and puts it in the work area for you to further edit and execute. It can be an extremely fast and convenient way of loading code from local disk or remote examples from sites such as the [Matplotlib gallery](http://matplotlib.sourceforge.net/gallery.html).
1155 #
1155 #
1156 # Other than its enhanced capabilities for code and graphics, all of the features of IPython we've explained before remain functional in this graphical console.
1156 # Other than its enhanced capabilities for code and graphics, all of the features of IPython we've explained before remain functional in this graphical console.
1157
1157
1158 ### The IPython Notebook
1158 ### The IPython Notebook
1159
1159
1160 # The third way to interact with IPython, in addition to the terminal and graphical Qt console, is a powerful web interface called the "IPython Notebook". If you run at the system console (you can omit the `pylab` flags if you don't need plotting support):
1160 # The third way to interact with IPython, in addition to the terminal and graphical Qt console, is a powerful web interface called the "IPython Notebook". If you run at the system console (you can omit the `pylab` flags if you don't need plotting support):
1161 #
1161 #
1162 # $ ipython notebook --pylab inline
1162 # $ ipython notebook --pylab inline
1163 #
1163 #
1164 # IPython will start a process that runs a web server in your local machine and to which a web browser can connect. The Notebook is a workspace that lets you execute code in blocks called 'cells' and displays any results and figures, but which can also contain arbitrary text (including LaTeX-formatted mathematical expressions) and any rich media that a modern web browser is capable of displaying.
1164 # IPython will start a process that runs a web server in your local machine and to which a web browser can connect. The Notebook is a workspace that lets you execute code in blocks called 'cells' and displays any results and figures, but which can also contain arbitrary text (including LaTeX-formatted mathematical expressions) and any rich media that a modern web browser is capable of displaying.
1165 #
1165 #
1166 # <center><img src="ipython-notebook-specgram-2.png" width=400px></center>
1166 # <center><img src="ipython-notebook-specgram-2.png" width=400px></center>
1167
1167
1168 # % This cell is for the pdflatex output only
1168 # % This cell is for the pdflatex output only
1169 # \begin{figure}[htbp]
1169 # \begin{figure}[htbp]
1170 # \centering
1170 # \centering
1171 # \includegraphics[width=3in]{ipython-notebook-specgram-2.png}
1171 # \includegraphics[width=3in]{ipython-notebook-specgram-2.png}
1172 # \caption{The IPython Notebook: text, equations, code, results, graphics and other multimedia in an open format for scientific exploration and collaboration}
1172 # \caption{The IPython Notebook: text, equations, code, results, graphics and other multimedia in an open format for scientific exploration and collaboration}
1173 # \end{figure}
1173 # \end{figure}
1174
1174
1175 # In fact, this document was written as a Notebook, and only exported to LaTeX for printing. Inside each cell, all the features of IPython that we have discussed before remain functional, since ultimately this web client is communicating with the same IPython code that runs in the terminal. But this interface is a much richer and more powerful environment for maintaining long-term "live and executable" scientific documents.
1175 # In fact, this document was written as a Notebook, and only exported to LaTeX for printing. Inside each cell, all the features of IPython that we have discussed before remain functional, since ultimately this web client is communicating with the same IPython code that runs in the terminal. But this interface is a much richer and more powerful environment for maintaining long-term "live and executable" scientific documents.
1176 #
1176 #
1177 # Notebook environments have existed in commercial systems like Mathematica(TM) and Maple(TM) for a long time; in the open source world the [Sage](http://sagemath.org) project blazed this particular trail starting in 2006, and now we bring all the features that have made IPython such a widely used tool to a Notebook model.
1177 # Notebook environments have existed in commercial systems like Mathematica(TM) and Maple(TM) for a long time; in the open source world the [Sage](http://sagemath.org) project blazed this particular trail starting in 2006, and now we bring all the features that have made IPython such a widely used tool to a Notebook model.
1178 #
1178 #
1179 # Since the Notebook runs as a web application, it is possible to configure it for remote access, letting you run your computations on a persistent server close to your data, which you can then access remotely from any browser-equipped computer. We encourage you to read the extensive documentation provided by the IPython project for details on how to do this and many more features of the notebook.
1179 # Since the Notebook runs as a web application, it is possible to configure it for remote access, letting you run your computations on a persistent server close to your data, which you can then access remotely from any browser-equipped computer. We encourage you to read the extensive documentation provided by the IPython project for details on how to do this and many more features of the notebook.
1180 #
1180 #
1181 # Finally, as we said earlier, IPython also has a high-level, easy-to-use set of libraries for parallel computing that let you control (interactively if desired) not just one IPython but an entire cluster of 'IPython engines'. Unfortunately a detailed discussion of these tools is beyond the scope of this text, but should you need to parallelize your analysis code, a quick read of the tutorials and examples provided at the IPython site may prove fruitful.
1181 # Finally, as we said earlier, IPython also has a high-level, easy-to-use set of libraries for parallel computing that let you control (interactively if desired) not just one IPython but an entire cluster of 'IPython engines'. Unfortunately a detailed discussion of these tools is beyond the scope of this text, but should you need to parallelize your analysis code, a quick read of the tutorials and examples provided at the IPython site may prove fruitful.