##// END OF EJS Templates
Merge pull request #6529 from Carreau/marrijnhmirror...
Merge pull request #6529 from Carreau/marrijnhmirror codemirror repo moved, update links in comments

File last commit:

r17497:0df787b5
r17944:4f93283d merge
Show More
Using nbconvert as a Library.ipynb
1053 lines | 73.2 KiB | text/plain | TextLexer
/ examples / Notebook / Using nbconvert as a Library.ipynb

NbConvert, Python library

In this Notebook, I will introduce you to the programatic API of nbconvert to show you how to use it in various context.

For this I will use one of @jakevdp great blog post. I've explicitely chosen a post with no javascript tricks as Jake seem to be found of right now, for the reason that the becommings of embeding javascript in nbviewer, which is based on nbconvert is not fully decided yet.

This will not focus on using the command line tool to convert file. The attentive reader will point-out that no data are read from, or written to disk during the conversion process. Indeed, nbconvert as been though as much as possible to avoid IO operation and work as well in a database, or web-based environement.

Quick overview

 Warning, Do use 1.0 or 1.x branch and not master naming have changed.
 Warning, NbConvert is a Tech-Preview, API will change within the next 6 month.

Credit, Jonathan Freder (@jdfreder on github)

<center> nbca </center>

The main principle of nbconvert is to instanciate a Exporter that controle a pipeline through which each notebook you want to export with go through.

Let's start by importing what we need from the API, and download @jakevdp's notebook.

In [1]:
import requests
response = requests.get('http://jakevdp.github.com/downloads/notebooks/XKCD_plots.ipynb')
response.content[0:60]+'...'
Out[1]:
'{\n "metadata": {\n  "name": "XKCD_plots"\n },\n "nbformat": 3,\n...'

If you do not have request install downlad by hand, and read the file as usual.

We read the response into a slightly more convenient format which represent IPython notebook. There are not real advantages for now, except some convenient methods, but with time this structure should be able to guarantee that the notebook structure is valid. Note also that the in-memory format and on disk format can be slightly different. In particual, on disk, multiline strings might be spitted into list of string to be more version control friendly.

In [2]:
from IPython.nbformat import current as nbformat
jake_notebook = nbformat.reads_json(response.content)
jake_notebook.worksheets[0].cells[0]
Out[2]:
{u'cell_type': u'heading',
 u'level': 1,
 u'metadata': {},
 u'source': u'XKCD plots in Matplotlib'}

So we have here Jake's notebook in a convenient form, which is mainly a Super-Powered dict and list nested. You don't need to worry about the exact structure.

The nbconvert API exposes some basic exporter for common format and default options. We will start by using one of them. First we import it, instanciate an instance with most of the default parameters and fed it the downloaded notebook.

In [3]:
import IPython.nbconvert
In [4]:
from IPython.config import Config
from IPython.nbconvert import HTMLExporter

## I use `basic` here to have less boilerplate and headers in the HTML.
## we'll see later how to pass config to exporters.
exportHtml = HTMLExporter(config=Config({'HTMLExporter':{'default_template':'basic'}}))
In [5]:
(body, resources) = exportHtml.from_notebook_node(jake_notebook)

The exporter returns a tuple containing the body of the converted notebook, here raw HTML, as well as a resources dict. The resource dict contains (among many things) the extracted PNG, JPG [...etc] from the notebook when applicable. The basic HTML exporter does keep them as embeded base64 into the notebook, but one can do ask the figures to be extracted. Cf advance use. So for now the resource dict should be mostly empty, except for 1 key containing some css, and 2 others whose content will be obvious.

Exporter are stateless, you won't be able to extract any usefull information (except their configuration) from them. You can directly re-use the instance to convert another notebook. Each exporter expose for convenience a from_file and from_filename methods if you need.

In [6]:
print resources.keys()
print resources['metadata']
print resources['output_extension']
# print resources['inlining'] # too lng to be shown
['inlining', 'output_extension', 'metadata']
defaultdict(None, {'name': 'Notebook'})
html
In [7]:
# Part of the body, here the first Heading
start = body.index('<h1 id', )
print body[:400]+'...'
<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="XKCD-plots-in-Matplotlib">XKCD plots in Matplotlib<a class="anchor-link" href="#XKCD-plots-in-Matplotlib">&#182;</a></h1>
</div>

<div class="text_cell_render border-box-sizing rendered_html">
<p>This notebook originally appeared as a blog post at <a href="http://jakevdp.github.com/blog/2012/10/07/xkcd-style-plots-in-matplotli...

You can directly write the body into an HTML file if you wish, as you see it does not contains any body tag, or style declaration, but thoses are included in the default HtmlExporter if you do not pass it a config object as I did.

Extracting Figures

When exporting one might want to extract the base64 encoded figures to separate files, this is by default what does the RstExporter does, let see how to use it.

In [8]:
from IPython.nbconvert import RSTExporter

rst_export = RSTExporter()

(body,resources) = rst_export.from_notebook_node(jake_notebook)
In [9]:
print body[:970]+'...'
print '[.....]'
print body[800:1200]+'...'
XKCD plots in Matplotlib
========================


This notebook originally appeared as a blog post at `Pythonic
Perambulations <http://jakevdp.github.com/blog/2012/10/07/xkcd-style-plots-in-matplotlib/>`_
by Jake Vanderplas.

 *Update: the matplotlib pull request has been merged! See* `*This
post* <http://jakevdp.github.io/blog/2013/07/10/XKCD-plots-in-matplotlib/>`_
*for a description of the XKCD functionality now built-in to
matplotlib!*

One of the problems I've had with typical matplotlib figures is that
everything in them is so precise, so perfect. For an example of what I
mean, take a look at this figure:
In[1]:
.. code:: python

    from IPython.display import Image
    Image('http://jakevdp.github.com/figures/xkcd_version.png')





.. image:: output_3_0.png



Sometimes when showing schematic plots, this is the type of figure I
want to display. But drawing it by hand is a pain: I'd rather just use
matplotlib. The problem is, matplotlib is a bit...
[.....]
owing schematic plots, this is the type of figure I
want to display. But drawing it by hand is a pain: I'd rather just use
matplotlib. The problem is, matplotlib is a bit too precise. Attempting
to duplicate this figure in matplotlib leads to something like this:
In[2]:
.. code:: python

    Image('http://jakevdp.github.com/figures/mpl_version.png')





.. image:: output_5_0.png



It just doesn'...

Here we see that base64 images are not embeded, but we get what look like file name. Actually those are (Configurable) keys to get back the binary data from the resources dict we havent inspected earlier.

So when writing a Rst Plugin for any blogengine, Sphinx or anything else, you will be responsible for writing all those data to disk, in the right place. Of course to help you in this task all those naming are configurable in the right place.

let's try to see how to get one of these images

In [10]:
resources['outputs'].keys()
Out[10]:
[u'output_13_1.text',
 u'output_18_0.text',
 u'output_3_0.text',
 u'output_18_1.png',
 u'output_12_0.text',
 u'output_5_0.text',
 u'output_5_0.png',
 u'output_13_1.png',
 u'output_16_0.text',
 u'output_13_0.text',
 u'output_18_1.text',
 u'output_3_0.png',
 u'output_16_0.png']

We have extracted 5 binary figures, here pngs, but they could have been svg, and then wouldn't appear in the binary sub dict. keep in mind that a object having multiple repr will store all it's repr in the notebook.

Hence if you provide _repr_javascript_,_repr_latex_ and _repr_png_to an object, you will be able to determine at conversion time which representaition is the more appropriate. You could even decide to show all the representaition of an object, it's up to you. But this will require beeing a little more involve and write a few line of Jinja template. This will probably be the subject of another tutorial.

Back to our images,

In [11]:
from IPython.display import Image
Image(data=resources['outputs']['output_3_0.png'],format='png')
Out[11]:
No description has been provided for this image

Yep, this is indeed the image we were expecting, and I was able to see it without ever writing or reading it from disk. I don't think I'll have to show to you what to do with those data, as if you are here you are most probably familiar with IO.

Extracting figures with HTML Exporter ?

Use case:

> I write an awesome blog in HTML, and I want all but having base64 embeded images. Having one html file with all inside is nice to send to coworker, but I definitively want resources to be cached ! So I need an HTML exporter, and I want it to extract the figures !

Some theory

The process of converting a notebook to a another format with the nbconvert Exporters happend in a few steps:

  • Get the notebook data and other required files. (you are responsible for that)
  • Feed them to the exporter that will
    • sequentially feed the data to a number of Transformers. Transformer only act on the structure
    of the notebook, and have access to it all.
    • feed the notebook through the jinja templating engine
      • the use templates are configurable.
      • templates make use of configurable macros called filters.
  • The exporter return the converted notebook as well as other relevant resources as a tuple.
  • Write what you need to disk, or elsewhere. (You are responsible for it)

Here we'll be interested in the Transformers. Each Transformer is applied successively and in order on the notebook before going through the conversion process.

We provide some transformer that do some modification on the notebook structure by default. One of them, the ExtractOutputTransformer is responsible for crawling notebook, finding all the figures, and put them into the resources directory, as well as choosing the key (filename_xx_y.extension) that can replace the figure in the template.

The ExtractOutputTransformer is special in the fact that it should be availlable on all Exporters, but is just inactive by default on some exporter.

In [12]:
# second transformer shoudl be Instance of ExtractFigureTransformer
exportHtml._transformers # 3rd one shouel be <ExtractOutputTransformer>
Out[12]:
[<function IPython.nbconvert.transformers.coalescestreams.wrappedfunc>,
 <IPython.nbconvert.transformers.svg2pdf.SVG2PDFTransformer at 0x10d2a7490>,
 <IPython.nbconvert.transformers.extractoutput.ExtractOutputTransformer at 0x10d2a7ad0>,
 <IPython.nbconvert.transformers.csshtmlheader.CSSHTMLHeaderTransformer at 0x10d2a7b50>,
 <IPython.nbconvert.transformers.revealhelp.RevealHelpTransformer at 0x10d29dd90>,
 <IPython.nbconvert.transformers.latex.LatexTransformer at 0x10d29db50>,
 <IPython.nbconvert.transformers.sphinx.SphinxTransformer at 0x10d2a7b90>]

To enable it we will use IPython configuration/Traitlets system. If you are have already set some IPython configuration options, this will look pretty familiar to you. Configuration option are always of the form:

ClassName.attribute_name = value

A few ways exist to create such config, like reading a config file in your profile, but you can also do it programatically usign a dictionary. Let's create such a config object, and see the difference if we pass it to our HtmlExporter

In [13]:
from IPython.config import Config

c =  Config({
            'ExtractOutputTransformer':{'enabled':True}
            })

exportHtml = HTMLExporter()
exportHtml_and_figs = HTMLExporter(config=c)

(_, resources)          = exportHtml.from_notebook_node(jake_notebook)
(_, resources_with_fig) = exportHtml_and_figs.from_notebook_node(jake_notebook)

print 'resources without the "figures" key :'
print resources.keys()

print ''
print 'Here we have one more field '
print resources_with_fig.keys()
resources_with_fig['outputs'].keys() 
resources without the "figures" key :
['inlining', 'output_extension', 'metadata']

Here we have one more field 
['outputs', 'inlining', 'output_extension', 'metadata']
Out[13]:
[u'output_13_1.text',
 u'output_18_0.text',
 u'output_3_0.text',
 u'output_18_1.png',
 u'output_12_0.text',
 u'output_5_0.text',
 u'output_5_0.png',
 u'output_13_1.png',
 u'output_16_0.text',
 u'output_13_0.text',
 u'output_18_1.text',
 u'output_3_0.png',
 u'output_16_0.png']

So now you can loop through the dict and write all those figures to disk in the right place...

Custom transformer

Of course you can imagine many transformation that you would like to apply to a notebook. This is one of the reason we provide a way to register your own transformers that will be applied to the notebook after the default ones.

To do so you'll have to pass an ordered list of Transformers to the Exporter constructor.

But what is an transformer ? Transformer can be either decorated function for dead-simple Transformers that apply independently to each cell, for more advance transformation that support configurability You have to inherit from Transformer and define a call method as we'll see below.

All transforers have a magic attribute that allows it to be activated/disactivate from the config dict.

In [14]:
from IPython.nbconvert.transformers import Transformer
import IPython.config
print "Four relevant docstring"
print '============================='
print Transformer.__doc__
print '============================='
print Transformer.call.__doc__
print '============================='
print Transformer.transform_cell.__doc__
print '============================='
Four relevant docstring
=============================
 A configurable transformer

    Inherit from this class if you wish to have configurability for your
    transformer.

    Any configurable traitlets this class exposed will be configurable in profiles
    using c.SubClassName.atribute=value

    you can overwrite transform_cell to apply a transformation independently on each cell
    or __call__ if you prefer your own logic. See corresponding docstring for informations.

    Disabled by default and can be enabled via the config by
        'c.YourTransformerName.enabled = True'
    
=============================

        Transformation to apply on each notebook.
        
        You should return modified nb, resources.
        If you wish to apply your transform on each cell, you might want to 
        overwrite transform_cell method instead.
        
        Parameters
        ----------
        nb : NotebookNode
            Notebook being converted
        resources : dictionary
            Additional resources used in the conversion process.  Allows
            transformers to pass variables into the Jinja engine.
        
=============================

        Overwrite if you want to apply a transformation on each cell.  You 
        should return modified cell and resource dictionary.
        
        Parameters
        ----------
        cell : NotebookNode cell
            Notebook cell being processed
        resources : dictionary
            Additional resources used in the conversion process.  Allows
            transformers to pass variables into the Jinja engine.
        index : int
            Index of the cell being processed
        
=============================

We don't provide convenient method to be aplied on each worksheet as the data structure for worksheet will be removed. (not the worksheet functionnality, which is still on it's way)


Example

I'll now demonstrate a specific example requested while nbconvert 2 was beeing developped. The ability to exclude cell from the conversion process based on their index.

I'll let you imagin how to inject cell, if what you just want is to happend static content at the beginning/end of a notebook, plese refer to templating section, it will be much easier and cleaner.

In [15]:
from IPython.utils.traitlets import Integer
In [16]:
class PelicanSubCell(Transformer):
    """A Pelican specific transformer to remove somme of the cells of a notebook"""
    
    # I could also read the cells from nbc.metadata.pelican is someone wrote a JS extension
    # But I'll stay with configurable value. 
    start = Integer(0, config=True, help="first cell of notebook to be converted")
    end   = Integer(-1, config=True, help="last cell of notebook to be converted")
    
    def call(self, nb, resources):

        #nbc = deepcopy(nb)
        nbc = nb
        # don't print in real transformer !!!
        print "I'll keep only cells from ", self.start, "to ", self.end, "\n\n"
        for worksheet in nbc.worksheets :
            cells = worksheet.cells[:]
            worksheet.cells = cells[self.start:self.end]                    
        return nbc, resources
In [17]:
# I create this on the fly, but this could be loaded from a DB, and config object support merging...
c =  Config({
            'PelicanSubCell':{
                            'enabled':True,
                            'start':4,
                            'end':6,
                             }
            })

I'm creating a pelican exporter that take PelicanSubCell extra transformers and a config object as parameter. This might seem redundant, but with configuration system you'll see that one can register an inactive transformer on all exporters and activate it at will form its config files and command line.

In [18]:
pelican = RSTExporter(transformers=[PelicanSubCell], config=c)
In [19]:
print pelican.from_notebook_node(jake_notebook)[0]
I'll keep only cells from  4 to  6 



Sometimes when showing schematic plots, this is the type of figure I
want to display. But drawing it by hand is a pain: I'd rather just use
matplotlib. The problem is, matplotlib is a bit too precise. Attempting
to duplicate this figure in matplotlib leads to something like this:
In[2]:
.. code:: python

    Image('http://jakevdp.github.com/figures/mpl_version.png')





.. image:: output_5_0.png



Programatic example of extending templates / cutom filters

In [20]:
from IPython.nbconvert.filters.highlight import _pygment_highlight
from pygments.formatters import HtmlFormatter

from IPython.nbconvert.exporters import HTMLExporter
from IPython.config import Config

from IPython.nbformat import current as nbformat

Here we define a dustom 'highlight' filter that apply a custom class to code in css. We register this filter with a already existing name, so it will replace the default one.

In [21]:
def my_highlight(source, language='ipython'):
    formatter = HtmlFormatter(cssclass='highlight-ipynb')
    return _pygment_highlight(source, formatter, language)
        
c = Config({'CSSHtmlHeaderTransformer':
                    {'enabled':False, 'highlight_class':'highlight-ipynb'}})

exportHtml = HTMLExporter( config=c , filters={'highlight2html': my_highlight} )
(body,resources) = exportHtml.from_notebook_node(jake_notebook)
In [22]:
i = body.index('highlight-ipynb')
body[i-12:i+50]
Out[22]:
u'<div class="highlight-ipynb"><pre><span class="kn">from</span>'

Programatically make templates

In [23]:
from jinja2 import DictLoader

dl = DictLoader({'html_full.tpl': 
"""
{%- extends 'html_basic.tpl' -%} 

{% block footer %}
FOOOOOOOOTEEEEER
{% endblock footer %}
"""})


exportHtml = HTMLExporter( config=None , filters={'highlight': my_highlight}, extra_loaders=[dl] )
(body,resources) = exportHtml.from_notebook_node(jake_notebook)
for l in body.split('\n')[-4:]:
    print l
<p>This post was written entirely in an IPython Notebook: the notebook file is available for download <a href="http://jakevdp.github.com/downloads/notebooks/XKCD_plots.ipynb">here</a>. For more information on blogging with notebooks in octopress, see my <a href="http://jakevdp.github.com/blog/2012/10/04/blogging-with-ipython/">previous post</a> on the subject.</p>
</div>
FOOOOOOOOTEEEEER

Real World Use

@jakevdp use Pelican and IPython Notebook to blog. Pelican Will use nbconvert programatically to generate blog post. Have a look a Pythonic Preambulations for Jake blog post.

@damianavila Wrote a Nicholas Plugin to Write blog post as Notebook and is developping a js-extension to publish notebooks in one click from the web app.

<center>

</center>

And finaly, what you just did, is replicate what nbviewer does. WHich to fetch a notebook from url, convert it and send in back to you as a static html.

A few gotchas

Jinja blocks use {% %}by default which does not play nicely with $\LaTeX$, hence thoses are replaced by ((* *)) in latex templates.

In [ ]: