@@ -1,12 +1,16 b'' | |||||
1 | { |
|
1 | { | |
2 |
"metadata": { |
|
2 | "metadata": { | |
|
3 | "name": "tutorial" | |||
|
4 | }, | |||
3 | "nbformat": 3, |
|
5 | "nbformat": 3, | |
|
6 | "nbformat_minor": 0, | |||
4 | "worksheets": [ |
|
7 | "worksheets": [ | |
5 | { |
|
8 | { | |
6 | "cells": [ |
|
9 | "cells": [ | |
7 | { |
|
10 | { | |
8 | "cell_type": "heading", |
|
11 | "cell_type": "heading", | |
9 | "level": 1, |
|
12 | "level": 1, | |
|
13 | "metadata": {}, | |||
10 | "source": [ |
|
14 | "source": [ | |
11 | "An Introduction to machine learning with scikit-learn" |
|
15 | "An Introduction to machine learning with scikit-learn" | |
12 | ] |
|
16 | ] | |
@@ -14,121 +18,134 b'' | |||||
14 | { |
|
18 | { | |
15 | "cell_type": "heading", |
|
19 | "cell_type": "heading", | |
16 | "level": 1, |
|
20 | "level": 1, | |
|
21 | "metadata": {}, | |||
17 | "source": [ |
|
22 | "source": [ | |
18 | "Section contents" |
|
23 | "Section contents" | |
19 | ] |
|
24 | ] | |
20 | }, |
|
25 | }, | |
21 | { |
|
26 | { | |
22 | "cell_type": "markdown", |
|
27 | "cell_type": "markdown", | |
|
28 | "metadata": {}, | |||
23 | "source": [ |
|
29 | "source": [ | |
24 | "In this section, we introduce the machine learning", |
|
30 | "In this section, we introduce the machine learning\n", | |
25 | "vocabulary that we use through-out scikit-learn and give a", |
|
31 | "vocabulary that we use through-out scikit-learn and give a\n", | |
26 | "simple learning example." |
|
32 | "simple learning example." | |
27 | ] |
|
33 | ] | |
28 | }, |
|
34 | }, | |
29 | { |
|
35 | { | |
30 | "cell_type": "heading", |
|
36 | "cell_type": "heading", | |
31 | "level": 2, |
|
37 | "level": 2, | |
|
38 | "metadata": {}, | |||
32 | "source": [ |
|
39 | "source": [ | |
33 | "Machine learning: the problem setting" |
|
40 | "Machine learning: the problem setting" | |
34 | ] |
|
41 | ] | |
35 | }, |
|
42 | }, | |
36 | { |
|
43 | { | |
37 | "cell_type": "markdown", |
|
44 | "cell_type": "markdown", | |
38 | "source": [ |
|
45 | "metadata": {}, | |
39 | "In general, a learning problem considers a set of n", |
|
46 | "source": [ | |
40 | "samples of", |
|
47 | "In general, a learning problem considers a set of n\n", | |
41 | "data and try to predict properties of unknown data. If each sample is", |
|
48 | "samples of\n", | |
42 | "more than a single number, and for instance a multi-dimensional entry", |
|
49 | "data and try to predict properties of unknown data. If each sample is\n", | |
43 | "(aka multivariate", |
|
50 | "more than a single number, and for instance a multi-dimensional entry\n", | |
44 | "data), is it said to have several attributes,", |
|
51 | "(aka multivariate\n", | |
|
52 | "data), is it said to have several attributes,\n", | |||
45 | "or features." |
|
53 | "or features." | |
46 | ] |
|
54 | ] | |
47 | }, |
|
55 | }, | |
48 | { |
|
56 | { | |
49 | "cell_type": "markdown", |
|
57 | "cell_type": "markdown", | |
|
58 | "metadata": {}, | |||
50 | "source": [ |
|
59 | "source": [ | |
51 | "We can separate learning problems in a few large categories:" |
|
60 | "We can separate learning problems in a few large categories:" | |
52 | ] |
|
61 | ] | |
53 | }, |
|
62 | }, | |
54 | { |
|
63 | { | |
55 | "cell_type": "markdown", |
|
64 | "cell_type": "markdown", | |
|
65 | "metadata": {}, | |||
56 | "source": [ |
|
66 | "source": [ | |
57 | "supervised learning,", |
|
67 | "supervised learning,\n", | |
58 | "in which the data comes with additional attributes that we want to predict", |
|
68 | "in which the data comes with additional attributes that we want to predict\n", | |
59 | "(:ref:`Click here <supervised-learning>`", |
|
69 | "(:ref:`Click here <supervised-learning>`\n", | |
60 | "to go to the Scikit-Learn supervised learning page).This problem", |
|
70 | "to go to the Scikit-Learn supervised learning page).This problem\n", | |
61 | "can be either:" |
|
71 | "can be either:" | |
62 | ] |
|
72 | ] | |
63 | }, |
|
73 | }, | |
64 | { |
|
74 | { | |
65 | "cell_type": "markdown", |
|
75 | "cell_type": "markdown", | |
66 | "source": [ |
|
76 | "metadata": {}, | |
67 | "classification:", |
|
77 | "source": [ | |
68 | "samples belong to two or more classes and we", |
|
78 | "classification:\n", | |
69 | "want to learn from already labeled data how to predict the class", |
|
79 | "samples belong to two or more classes and we\n", | |
70 | "of unlabeled data. An example of classification problem would", |
|
80 | "want to learn from already labeled data how to predict the class\n", | |
71 | "be the digit recognition example, in which the aim is to assign", |
|
81 | "of unlabeled data. An example of classification problem would\n", | |
72 | "each input vector to one of a finite number of discrete", |
|
82 | "be the digit recognition example, in which the aim is to assign\n", | |
|
83 | "each input vector to one of a finite number of discrete\n", | |||
73 | "categories." |
|
84 | "categories." | |
74 | ] |
|
85 | ] | |
75 | }, |
|
86 | }, | |
76 | { |
|
87 | { | |
77 | "cell_type": "markdown", |
|
88 | "cell_type": "markdown", | |
|
89 | "metadata": {}, | |||
78 | "source": [ |
|
90 | "source": [ | |
79 | "regression:", |
|
91 | "regression:\n", | |
80 | "if the desired output consists of one or more", |
|
92 | "if the desired output consists of one or more\n", | |
81 | "continuous variables, then the task is called regression. An", |
|
93 | "continuous variables, then the task is called regression. An\n", | |
82 | "example of a regression problem would be the prediction of the", |
|
94 | "example of a regression problem would be the prediction of the\n", | |
83 | "length of a salmon as a function of its age and weight." |
|
95 | "length of a salmon as a function of its age and weight." | |
84 | ] |
|
96 | ] | |
85 | }, |
|
97 | }, | |
86 | { |
|
98 | { | |
87 | "cell_type": "markdown", |
|
99 | "cell_type": "markdown", | |
88 | "source": [ |
|
100 | "metadata": {}, | |
89 | "unsupervised learning,", |
|
101 | "source": [ | |
90 | "in which the training data consists of a set of input vectors x", |
|
102 | "unsupervised learning,\n", | |
91 | "without any corresponding target values. The goal in such problems", |
|
103 | "in which the training data consists of a set of input vectors x\n", | |
92 | "may be to discover groups of similar examples within the data, where", |
|
104 | "without any corresponding target values. The goal in such problems\n", | |
93 | "it is called clustering,", |
|
105 | "may be to discover groups of similar examples within the data, where\n", | |
94 | "or to determine the distribution of data within the input space, known as", |
|
106 | "it is called clustering,\n", | |
95 | "density estimation, or", |
|
107 | "or to determine the distribution of data within the input space, known as\n", | |
96 | "to project the data from a high-dimensional space down to two or thee", |
|
108 | "density estimation, or\n", | |
97 | "dimensions for the purpose of visualization", |
|
109 | "to project the data from a high-dimensional space down to two or thee\n", | |
98 | "(:ref:`Click here <unsupervised-learning>`", |
|
110 | "dimensions for the purpose of visualization\n", | |
|
111 | "(:ref:`Click here <unsupervised-learning>`\n", | |||
99 | "to go to the Scikit-Learn unsupervised learning page)." |
|
112 | "to go to the Scikit-Learn unsupervised learning page)." | |
100 | ] |
|
113 | ] | |
101 | }, |
|
114 | }, | |
102 | { |
|
115 | { | |
103 | "cell_type": "heading", |
|
116 | "cell_type": "heading", | |
104 | "level": 2, |
|
117 | "level": 2, | |
|
118 | "metadata": {}, | |||
105 | "source": [ |
|
119 | "source": [ | |
106 | "Training set and testing set" |
|
120 | "Training set and testing set" | |
107 | ] |
|
121 | ] | |
108 | }, |
|
122 | }, | |
109 | { |
|
123 | { | |
110 | "cell_type": "markdown", |
|
124 | "cell_type": "markdown", | |
|
125 | "metadata": {}, | |||
111 | "source": [ |
|
126 | "source": [ | |
112 | "Machine learning is about learning some properties of a data set", |
|
127 | "Machine learning is about learning some properties of a data set\n", | |
113 | "and applying them to new data. This is why a common practice in", |
|
128 | "and applying them to new data. This is why a common practice in\n", | |
114 | "machine learning to evaluate an algorithm is to split the data", |
|
129 | "machine learning to evaluate an algorithm is to split the data\n", | |
115 | "at hand in two sets, one that we call a training set on which", |
|
130 | "at hand in two sets, one that we call a training set on which\n", | |
116 | "we learn data properties, and one that we call a testing set,", |
|
131 | "we learn data properties, and one that we call a testing set,\n", | |
117 | "on which we test these properties." |
|
132 | "on which we test these properties." | |
118 | ] |
|
133 | ] | |
119 | }, |
|
134 | }, | |
120 | { |
|
135 | { | |
121 | "cell_type": "heading", |
|
136 | "cell_type": "heading", | |
122 | "level": 2, |
|
137 | "level": 2, | |
|
138 | "metadata": {}, | |||
123 | "source": [ |
|
139 | "source": [ | |
124 | "Loading an example dataset" |
|
140 | "Loading an example dataset" | |
125 | ] |
|
141 | ] | |
126 | }, |
|
142 | }, | |
127 | { |
|
143 | { | |
128 | "cell_type": "markdown", |
|
144 | "cell_type": "markdown", | |
|
145 | "metadata": {}, | |||
129 | "source": [ |
|
146 | "source": [ | |
130 | "scikit-learn comes with a few standard datasets, for instance the", |
|
147 | "scikit-learn comes with a few standard datasets, for instance the\n", | |
131 | "iris and digits", |
|
148 | "iris and digits\n", | |
132 | "datasets for classification and the boston house prices dataset for regression.:" |
|
149 | "datasets for classification and the boston house prices dataset for regression.:" | |
133 | ] |
|
150 | ] | |
134 | }, |
|
151 | }, | |
@@ -136,28 +153,31 b'' | |||||
136 | "cell_type": "code", |
|
153 | "cell_type": "code", | |
137 | "collapsed": false, |
|
154 | "collapsed": false, | |
138 | "input": [ |
|
155 | "input": [ | |
139 | "from sklearn import datasets", |
|
156 | "from sklearn import datasets\n", | |
140 | "iris = datasets.load_iris()", |
|
157 | "iris = datasets.load_iris()\n", | |
141 | "digits = datasets.load_digits()" |
|
158 | "digits = datasets.load_digits()" | |
142 | ], |
|
159 | ], | |
143 | "language": "python", |
|
160 | "language": "python", | |
|
161 | "metadata": {}, | |||
144 | "outputs": [] |
|
162 | "outputs": [] | |
145 | }, |
|
163 | }, | |
146 | { |
|
164 | { | |
147 | "cell_type": "markdown", |
|
165 | "cell_type": "markdown", | |
|
166 | "metadata": {}, | |||
148 | "source": [ |
|
167 | "source": [ | |
149 | "A dataset is a dictionary-like object that holds all the data and some", |
|
168 | "A dataset is a dictionary-like object that holds all the data and some\n", | |
150 | "metadata about the data. This data is stored in the .data member,", |
|
169 | "metadata about the data. This data is stored in the .data member,\n", | |
151 | "which is a n_samples, n_features array. In the case of supervised", |
|
170 | "which is a n_samples, n_features array. In the case of supervised\n", | |
152 | "problem, explanatory variables are stored in the .target member. More", |
|
171 | "problem, explanatory variables are stored in the .target member. More\n", | |
153 | "details on the different datasets can be found in the :ref:`dedicated", |
|
172 | "details on the different datasets can be found in the :ref:`dedicated\n", | |
154 | "section <datasets>`." |
|
173 | "section <datasets>`." | |
155 | ] |
|
174 | ] | |
156 | }, |
|
175 | }, | |
157 | { |
|
176 | { | |
158 | "cell_type": "markdown", |
|
177 | "cell_type": "markdown", | |
|
178 | "metadata": {}, | |||
159 | "source": [ |
|
179 | "source": [ | |
160 | "For instance, in the case of the digits dataset, digits.data gives", |
|
180 | "For instance, in the case of the digits dataset, digits.data gives\n", | |
161 | "access to the features that can be used to classify the digits samples:" |
|
181 | "access to the features that can be used to classify the digits samples:" | |
162 | ] |
|
182 | ] | |
163 | }, |
|
183 | }, | |
@@ -168,13 +188,15 b'' | |||||
168 | "print digits.data # doctest: +NORMALIZE_WHITESPACE" |
|
188 | "print digits.data # doctest: +NORMALIZE_WHITESPACE" | |
169 | ], |
|
189 | ], | |
170 | "language": "python", |
|
190 | "language": "python", | |
|
191 | "metadata": {}, | |||
171 | "outputs": [] |
|
192 | "outputs": [] | |
172 | }, |
|
193 | }, | |
173 | { |
|
194 | { | |
174 | "cell_type": "markdown", |
|
195 | "cell_type": "markdown", | |
|
196 | "metadata": {}, | |||
175 | "source": [ |
|
197 | "source": [ | |
176 | "and digits.target gives the ground truth for the digit dataset, that", |
|
198 | "and digits.target gives the ground truth for the digit dataset, that\n", | |
177 | "is the number corresponding to each digit image that we are trying to", |
|
199 | "is the number corresponding to each digit image that we are trying to\n", | |
178 | "learn:" |
|
200 | "learn:" | |
179 | ] |
|
201 | ] | |
180 | }, |
|
202 | }, | |
@@ -185,21 +207,24 b'' | |||||
185 | "digits.target" |
|
207 | "digits.target" | |
186 | ], |
|
208 | ], | |
187 | "language": "python", |
|
209 | "language": "python", | |
|
210 | "metadata": {}, | |||
188 | "outputs": [] |
|
211 | "outputs": [] | |
189 | }, |
|
212 | }, | |
190 | { |
|
213 | { | |
191 | "cell_type": "heading", |
|
214 | "cell_type": "heading", | |
192 | "level": 2, |
|
215 | "level": 2, | |
|
216 | "metadata": {}, | |||
193 | "source": [ |
|
217 | "source": [ | |
194 | "Shape of the data arrays" |
|
218 | "Shape of the data arrays" | |
195 | ] |
|
219 | ] | |
196 | }, |
|
220 | }, | |
197 | { |
|
221 | { | |
198 | "cell_type": "markdown", |
|
222 | "cell_type": "markdown", | |
|
223 | "metadata": {}, | |||
199 | "source": [ |
|
224 | "source": [ | |
200 | "The data is always a 2D array, n_samples, n_features, although", |
|
225 | "The data is always a 2D array, n_samples, n_features, although\n", | |
201 | "the original data may have had a different shape. In the case of the", |
|
226 | "the original data may have had a different shape. In the case of the\n", | |
202 | "digits, each original sample is an image of shape 8, 8 and can be", |
|
227 | "digits, each original sample is an image of shape 8, 8 and can be\n", | |
203 | "accessed using:" |
|
228 | "accessed using:" | |
204 | ] |
|
229 | ] | |
205 | }, |
|
230 | }, | |
@@ -210,48 +235,54 b'' | |||||
210 | "digits.images[0]" |
|
235 | "digits.images[0]" | |
211 | ], |
|
236 | ], | |
212 | "language": "python", |
|
237 | "language": "python", | |
|
238 | "metadata": {}, | |||
213 | "outputs": [] |
|
239 | "outputs": [] | |
214 | }, |
|
240 | }, | |
215 | { |
|
241 | { | |
216 | "cell_type": "markdown", |
|
242 | "cell_type": "markdown", | |
|
243 | "metadata": {}, | |||
217 | "source": [ |
|
244 | "source": [ | |
218 | "The :ref:`simple example on this dataset", |
|
245 | "The :ref:`simple example on this dataset\n", | |
219 | "<example_plot_digits_classification.py>` illustrates how starting", |
|
246 | "<example_plot_digits_classification.py>` illustrates how starting\n", | |
220 | "from the original problem one can shape the data for consumption in", |
|
247 | "from the original problem one can shape the data for consumption in\n", | |
221 | "the scikit-learn." |
|
248 | "the scikit-learn." | |
222 | ] |
|
249 | ] | |
223 | }, |
|
250 | }, | |
224 | { |
|
251 | { | |
225 | "cell_type": "heading", |
|
252 | "cell_type": "heading", | |
226 | "level": 2, |
|
253 | "level": 2, | |
|
254 | "metadata": {}, | |||
227 | "source": [ |
|
255 | "source": [ | |
228 | "Learning and Predicting" |
|
256 | "Learning and Predicting" | |
229 | ] |
|
257 | ] | |
230 | }, |
|
258 | }, | |
231 | { |
|
259 | { | |
232 | "cell_type": "markdown", |
|
260 | "cell_type": "markdown", | |
|
261 | "metadata": {}, | |||
233 | "source": [ |
|
262 | "source": [ | |
234 | "In the case of the digits dataset, the task is to predict the value of a", |
|
263 | "In the case of the digits dataset, the task is to predict the value of a\n", | |
235 | "hand-written digit from an image. We are given samples of each of the 10", |
|
264 | "hand-written digit from an image. We are given samples of each of the 10\n", | |
236 | "possible classes on which we fit an", |
|
265 | "possible classes on which we fit an\n", | |
237 | "estimator to be able to predict", |
|
266 | "estimator to be able to predict\n", | |
238 | "the labels corresponding to new data." |
|
267 | "the labels corresponding to new data." | |
239 | ] |
|
268 | ] | |
240 | }, |
|
269 | }, | |
241 | { |
|
270 | { | |
242 | "cell_type": "markdown", |
|
271 | "cell_type": "markdown", | |
|
272 | "metadata": {}, | |||
243 | "source": [ |
|
273 | "source": [ | |
244 | "In scikit-learn, an estimator is just a plain Python class that", |
|
274 | "In scikit-learn, an estimator is just a plain Python class that\n", | |
245 | "implements the methods fit(X, Y) and predict(T)." |
|
275 | "implements the methods fit(X, Y) and predict(T)." | |
246 | ] |
|
276 | ] | |
247 | }, |
|
277 | }, | |
248 | { |
|
278 | { | |
249 | "cell_type": "markdown", |
|
279 | "cell_type": "markdown", | |
|
280 | "metadata": {}, | |||
250 | "source": [ |
|
281 | "source": [ | |
251 | "An example of estimator is the class sklearn.svm.SVC that", |
|
282 | "An example of estimator is the class sklearn.svm.SVC that\n", | |
252 | "implements Support Vector Classification. The", |
|
283 | "implements Support Vector Classification. The\n", | |
253 | "constructor of an estimator takes as arguments the parameters of the", |
|
284 | "constructor of an estimator takes as arguments the parameters of the\n", | |
254 | "model, but for the time being, we will consider the estimator as a black", |
|
285 | "model, but for the time being, we will consider the estimator as a black\n", | |
255 | "box:" |
|
286 | "box:" | |
256 | ] |
|
287 | ] | |
257 | }, |
|
288 | }, | |
@@ -259,35 +290,39 b'' | |||||
259 | "cell_type": "code", |
|
290 | "cell_type": "code", | |
260 | "collapsed": false, |
|
291 | "collapsed": false, | |
261 | "input": [ |
|
292 | "input": [ | |
262 | "from sklearn import svm", |
|
293 | "from sklearn import svm\n", | |
263 | "clf = svm.SVC(gamma=0.001, C=100.)" |
|
294 | "clf = svm.SVC(gamma=0.001, C=100.)" | |
264 | ], |
|
295 | ], | |
265 | "language": "python", |
|
296 | "language": "python", | |
|
297 | "metadata": {}, | |||
266 | "outputs": [] |
|
298 | "outputs": [] | |
267 | }, |
|
299 | }, | |
268 | { |
|
300 | { | |
269 | "cell_type": "heading", |
|
301 | "cell_type": "heading", | |
270 | "level": 2, |
|
302 | "level": 2, | |
|
303 | "metadata": {}, | |||
271 | "source": [ |
|
304 | "source": [ | |
272 | "Choosing the parameters of the model" |
|
305 | "Choosing the parameters of the model" | |
273 | ] |
|
306 | ] | |
274 | }, |
|
307 | }, | |
275 | { |
|
308 | { | |
276 | "cell_type": "markdown", |
|
309 | "cell_type": "markdown", | |
|
310 | "metadata": {}, | |||
277 | "source": [ |
|
311 | "source": [ | |
278 | "In this example we set the value of gamma manually. It is possible", |
|
312 | "In this example we set the value of gamma manually. It is possible\n", | |
279 | "to automatically find good values for the parameters by using tools", |
|
313 | "to automatically find good values for the parameters by using tools\n", | |
280 | "such as :ref:`grid search <grid_search>` and :ref:`cross validation", |
|
314 | "such as :ref:`grid search <grid_search>` and :ref:`cross validation\n", | |
281 | "<cross_validation>`." |
|
315 | "<cross_validation>`." | |
282 | ] |
|
316 | ] | |
283 | }, |
|
317 | }, | |
284 | { |
|
318 | { | |
285 | "cell_type": "markdown", |
|
319 | "cell_type": "markdown", | |
|
320 | "metadata": {}, | |||
286 | "source": [ |
|
321 | "source": [ | |
287 | "We call our estimator instance clf as it is a classifier. It now must", |
|
322 | "We call our estimator instance clf as it is a classifier. It now must\n", | |
288 | "be fitted to the model, that is, it must learn from the model. This is", |
|
323 | "be fitted to the model, that is, it must learn from the model. This is\n", | |
289 | "done by passing our training set to the fit method. As a training", |
|
324 | "done by passing our training set to the fit method. As a training\n", | |
290 | "set, let us use all the images of our dataset apart from the last", |
|
325 | "set, let us use all the images of our dataset apart from the last\n", | |
291 | "one:" |
|
326 | "one:" | |
292 | ] |
|
327 | ] | |
293 | }, |
|
328 | }, | |
@@ -298,13 +333,15 b'' | |||||
298 | "clf.fit(digits.data[:-1], digits.target[:-1])" |
|
333 | "clf.fit(digits.data[:-1], digits.target[:-1])" | |
299 | ], |
|
334 | ], | |
300 | "language": "python", |
|
335 | "language": "python", | |
|
336 | "metadata": {}, | |||
301 | "outputs": [] |
|
337 | "outputs": [] | |
302 | }, |
|
338 | }, | |
303 | { |
|
339 | { | |
304 | "cell_type": "markdown", |
|
340 | "cell_type": "markdown", | |
|
341 | "metadata": {}, | |||
305 | "source": [ |
|
342 | "source": [ | |
306 | "Now you can predict new values, in particular, we can ask to the", |
|
343 | "Now you can predict new values, in particular, we can ask to the\n", | |
307 | "classifier what is the digit of our last image in the digits dataset,", |
|
344 | "classifier what is the digit of our last image in the digits dataset,\n", | |
308 | "which we have not used to train the classifier:" |
|
345 | "which we have not used to train the classifier:" | |
309 | ] |
|
346 | ] | |
310 | }, |
|
347 | }, | |
@@ -315,40 +352,46 b'' | |||||
315 | "clf.predict(digits.data[-1])" |
|
352 | "clf.predict(digits.data[-1])" | |
316 | ], |
|
353 | ], | |
317 | "language": "python", |
|
354 | "language": "python", | |
|
355 | "metadata": {}, | |||
318 | "outputs": [] |
|
356 | "outputs": [] | |
319 | }, |
|
357 | }, | |
320 | { |
|
358 | { | |
321 | "cell_type": "markdown", |
|
359 | "cell_type": "markdown", | |
|
360 | "metadata": {}, | |||
322 | "source": [ |
|
361 | "source": [ | |
323 | "The corresponding image is the following:" |
|
362 | "The corresponding image is the following:" | |
324 | ] |
|
363 | ] | |
325 | }, |
|
364 | }, | |
326 | { |
|
365 | { | |
327 | "cell_type": "markdown", |
|
366 | "cell_type": "markdown", | |
|
367 | "metadata": {}, | |||
328 | "source": [ |
|
368 | "source": [ | |
329 | "As you can see, it is a challenging task: the images are of poor", |
|
369 | "As you can see, it is a challenging task: the images are of poor\n", | |
330 | "resolution. Do you agree with the classifier?" |
|
370 | "resolution. Do you agree with the classifier?" | |
331 | ] |
|
371 | ] | |
332 | }, |
|
372 | }, | |
333 | { |
|
373 | { | |
334 | "cell_type": "markdown", |
|
374 | "cell_type": "markdown", | |
|
375 | "metadata": {}, | |||
335 | "source": [ |
|
376 | "source": [ | |
336 | "A complete example of this classification problem is available as an", |
|
377 | "A complete example of this classification problem is available as an\n", | |
337 | "example that you can run and study:", |
|
378 | "example that you can run and study:\n", | |
338 | ":ref:`example_plot_digits_classification.py`." |
|
379 | ":ref:`example_plot_digits_classification.py`." | |
339 | ] |
|
380 | ] | |
340 | }, |
|
381 | }, | |
341 | { |
|
382 | { | |
342 | "cell_type": "heading", |
|
383 | "cell_type": "heading", | |
343 | "level": 2, |
|
384 | "level": 2, | |
|
385 | "metadata": {}, | |||
344 | "source": [ |
|
386 | "source": [ | |
345 | "Model persistence" |
|
387 | "Model persistence" | |
346 | ] |
|
388 | ] | |
347 | }, |
|
389 | }, | |
348 | { |
|
390 | { | |
349 | "cell_type": "markdown", |
|
391 | "cell_type": "markdown", | |
|
392 | "metadata": {}, | |||
350 | "source": [ |
|
393 | "source": [ | |
351 | "It is possible to save a model in the scikit by using Python's built-in", |
|
394 | "It is possible to save a model in the scikit by using Python's built-in\n", | |
352 | "persistence model, namely pickle:" |
|
395 | "persistence model, namely pickle:" | |
353 | ] |
|
396 | ] | |
354 | }, |
|
397 | }, | |
@@ -356,27 +399,29 b'' | |||||
356 | "cell_type": "code", |
|
399 | "cell_type": "code", | |
357 | "collapsed": false, |
|
400 | "collapsed": false, | |
358 | "input": [ |
|
401 | "input": [ | |
359 | "from sklearn import svm", |
|
402 | "from sklearn import svm\n", | |
360 | "from sklearn import datasets", |
|
403 | "from sklearn import datasets\n", | |
361 | "clf = svm.SVC()", |
|
404 | "clf = svm.SVC()\n", | |
362 | "iris = datasets.load_iris()", |
|
405 | "iris = datasets.load_iris()\n", | |
363 | "X, y = iris.data, iris.target", |
|
406 | "X, y = iris.data, iris.target\n", | |
364 | "clf.fit(X, y)", |
|
407 | "clf.fit(X, y)\n", | |
365 | "import pickle", |
|
408 | "import pickle\n", | |
366 | "s = pickle.dumps(clf)", |
|
409 | "s = pickle.dumps(clf)\n", | |
367 | "clf2 = pickle.loads(s)", |
|
410 | "clf2 = pickle.loads(s)\n", | |
368 | "clf2.predict(X[0])", |
|
411 | "clf2.predict(X[0])\n", | |
369 | "y[0]" |
|
412 | "y[0]" | |
370 | ], |
|
413 | ], | |
371 | "language": "python", |
|
414 | "language": "python", | |
|
415 | "metadata": {}, | |||
372 | "outputs": [] |
|
416 | "outputs": [] | |
373 | }, |
|
417 | }, | |
374 | { |
|
418 | { | |
375 | "cell_type": "markdown", |
|
419 | "cell_type": "markdown", | |
|
420 | "metadata": {}, | |||
376 | "source": [ |
|
421 | "source": [ | |
377 | "In the specific case of the scikit, it may be more interesting to use", |
|
422 | "In the specific case of the scikit, it may be more interesting to use\n", | |
378 | "joblib's replacement of pickle (joblib.dump & joblib.load),", |
|
423 | "joblib's replacement of pickle (joblib.dump & joblib.load),\n", | |
379 | "which is more efficient on big data, but can only pickle to the disk", |
|
424 | "which is more efficient on big data, but can only pickle to the disk\n", | |
380 | "and not to a string:" |
|
425 | "and not to a string:" | |
381 | ] |
|
426 | ] | |
382 | }, |
|
427 | }, | |
@@ -384,13 +429,15 b'' | |||||
384 | "cell_type": "code", |
|
429 | "cell_type": "code", | |
385 | "collapsed": false, |
|
430 | "collapsed": false, | |
386 | "input": [ |
|
431 | "input": [ | |
387 | "from sklearn.externals import joblib", |
|
432 | "from sklearn.externals import joblib\n", | |
388 | "joblib.dump(clf, 'filename.pkl') # doctest: +SKIP" |
|
433 | "joblib.dump(clf, 'filename.pkl') # doctest: +SKIP" | |
389 | ], |
|
434 | ], | |
390 | "language": "python", |
|
435 | "language": "python", | |
|
436 | "metadata": {}, | |||
391 | "outputs": [] |
|
437 | "outputs": [] | |
392 | } |
|
438 | } | |
393 | ] |
|
439 | ], | |
|
440 | "metadata": {} | |||
394 | } |
|
441 | } | |
395 | ] |
|
442 | ] | |
396 | } No newline at end of file |
|
443 | } |