Markdown Pandoc Limitations.ipynb

WARNING: This document will not render correctly using nbviewer or nbconvert. To render this notebook correctly, open in IPython Notebook and run Cell->Run All from the menu bar.

Introduction

The IPython Notebook allows Markdown, HTML, and inline LaTeX in Markdown cells. The inline LaTeX is parsed with MathJax and the Markdown is parsed with marked. Any inline HTML is left to the web browser to parse. NBConvert is a utility that allows users to easily convert their notebooks to various formats. Pandoc is used to parse Markdown text in NBConvert. Since the notebook web interface supports a mix of Markdown, HTML, and LaTeX, Pandoc has trouble converting notebook Markdown. This results in incomplete representations of the notebook in nbviewer or in a compiled LaTeX PDF.

This isn't a Pandoc flaw; Pandoc isn't designed to parse and convert a mixed format document. Unfortunately, this means that Pandoc can only support a subset of the markup supported in the notebook web interface. This notebook compares output of Pandoc to the notebook web interface.

Changes:

05102013

  • heading anchors
  • note on remote images

06102013

  • remove strip_math_space filter
  • add lxml test

<style> .rendered_html xmp { white-space: pre-wrap; } </style>

Utilities

Define functions to render Markdown using the notebook and Pandoc.

In [1]:
from IPython.nbconvert.utils.pandoc import pandoc
from IPython.display import HTML, Javascript, display

from IPython.nbconvert.filters import citation2latex, strip_files_prefix, \
                                     markdown2html, markdown2latex

def pandoc_render(markdown):
    """Render Pandoc Markdown->LaTeX content."""
    
    ## Convert the markdown directly to latex.  This is what nbconvert does.
    #latex = pandoc(markdown, "markdown", "latex")
    #html = pandoc(markdown, "markdown", "html", ["--mathjax"])
    
    # nbconvert template conversions
    html = strip_files_prefix(markdown2html(markdown))
    latex = markdown2latex(citation2latex(markdown))
    display(HTML(data="<div style='display: inline-block; width: 30%; vertical-align: top;'>" \
                 "<div style='background: #AAFFAA; width: 100%;'>NBConvert Latex Output</div>" \
                 "<pre class='prettyprint lang-tex' style='background: #EEFFEE; border: 1px solid #DDEEDD;'><xmp>" + latex + "</xmp></pre>"\
                 "</div>" \
                 "<div style='display: inline-block; width: 2%;'></div>" \
                 "<div style='display: inline-block; width: 30%; vertical-align: top;'>" \
                 "<div style='background: #FFAAAA; width: 100%;'>NBViewer Output</div>" \
                 "<div style='display: inline-block; width: 100%;'>" + html + "</div>" \
                 "</div>"))
    javascript = """
    $.getScript("https://google-code-prettify.googlecode.com/svn/loader/run_prettify.js");
"""
    display(Javascript(data=javascript))

def notebook_render(markdown):
    javascript = """
var mdcell = new IPython.MarkdownCell();
mdcell.create_element();
mdcell.set_text('""" + markdown.replace("\\", "\\\\").replace("'", "\\'").replace("\n", "\\n") + """');
mdcell.render();
$(element).append(mdcell.element)
.removeClass()
.css('left', '66%')
.css('position', 'absolute')
.css('width', '30%')
mdcell.element.prepend(
    $('<div />')
    .removeClass()
    .css('background', '#AAAAFF')
    .css('width', '100%')
    .html('Notebook Output')

);
container.show()
"""
    display(Javascript(data=javascript))

    
def pandoc_html_render(markdown):
    """Render Markdown to LaTeX with Pandoc, then convert that LaTeX to HTML for display."""
    
    # Convert the markdown directly to latex.  This is what nbconvert does.
    latex = pandoc(markdown, "markdown", "latex")
    
    # Convert the pandoc generated latex to HTML so it can be rendered in 
    # the web browser.
    html = pandoc(latex, "latex", "html", ["--mathjax"])
    display(HTML(data="<div style='background: #AAFFAA; width: 40%;'>HTML Pandoc Output</div>" \
                 "<div style='display: inline-block; width: 40%;'>" + html + "</div>"))
    return html
    
def compare_render(markdown):
    notebook_render(markdown)
    pandoc_render(markdown)

Outputs

In [1]:
try:
    import lxml
    print 'LXML found!'
except:
    print 'Warning! No LXML found - the old citation2latex filter will not work'
LXML found!

General markdown

Heading level 6 is not supported by Pandoc's LaTeX output: it is emitted as plain text, as the LaTeX output below shows.

In [2]:
compare_render(r"""

# Heading 1 
## Heading 2 
### Heading 3 
#### Heading 4 
##### Heading 5 
###### Heading 6""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
\section{Heading 1}

\subsection{Heading 2}

\subsubsection{Heading 3}

\paragraph{Heading 4}

\subparagraph{Heading 5}

Heading 6
NBViewer Output

Heading 1

Heading 2

Heading 3

Heading 4

Heading 5
Heading 6
SandBoxed(IPython.core.display.Javascript object)
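
One possible workaround for the level 6 limitation is to rewrite such headings before conversion. The sketch below is an assumption, not part of nbconvert: `demote_h6` is a hypothetical helper that turns a level 6 ATX heading into a bold paragraph so that some emphasis survives in the LaTeX output.

```python
import re

# Hypothetical helper (not part of nbconvert): match a level 6 ATX heading,
# allowing optional trailing '#' characters and whitespace.
_H6 = re.compile(r"^######[ \t]+(.*?)[ \t]*#*[ \t]*$", re.MULTILINE)


def demote_h6(markdown):
    """Replace each level 6 heading with a bold paragraph."""
    return _H6.sub(lambda m: "**" + m.group(1) + "**", markdown)


converted = demote_h6("##### Heading 5\n###### Heading 6")
```

Headings of level 1 to 5 are left untouched, because the pattern requires exactly six `#` characters followed by whitespace.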

Headers may not be recognized by Pandoc (on Windows?) if there isn't a blank line above them.

In [3]:
compare_render(r"""
# Heading 1 
## Heading 2 
### Heading 3 
#### Heading 4 
##### Heading 5 
###### Heading 6 """)

print("\n"*10)
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
\section{Heading 1}

\subsection{Heading 2}

\subsubsection{Heading 3}

\paragraph{Heading 4}

\subparagraph{Heading 5}

Heading 6
NBViewer Output

Heading 1

Heading 2

Heading 3

Heading 4

Heading 5
Heading 6
SandBoxed(IPython.core.display.Javascript object)
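
If the missing blank line turns out to matter, the Markdown can be normalized before it reaches Pandoc. This is a minimal sketch under that assumption; `ensure_blank_line_before_headers` is a hypothetical name, and the fence tracking only handles plain triple-backtick fences.

```python
import re


def ensure_blank_line_before_headers(markdown):
    """Insert a blank line above every ATX header that lacks one."""
    out = []
    in_fence = False
    for line in markdown.split("\n"):
        # Do not treat '# comment' lines inside fenced code as headers.
        if line.startswith("```"):
            in_fence = not in_fence
        is_header = (not in_fence) and re.match(r"#{1,6}[ \t]", line) is not None
        if is_header and out and out[-1].strip() != "":
            out.append("")
        out.append(line)
    return "\n".join(out)


normalized = ensure_blank_line_before_headers("# Heading 1\n## Heading 2\ntext")
```

The function is idempotent: running it on already normalized text adds nothing.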










Internal links do not work in nbviewer or LaTeX, because the local link target does not exist there.

In [4]:
compare_render(r"""
[Link2Heading](http://127.0.0.1:8888/0a2d8086-ee24-4e5b-a32b-f66b525836cb#General-markdown)
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
\href{http://127.0.0.1:8888/0a2d8086-ee24-4e5b-a32b-f66b525836cb\#General-markdown}{Link2Heading}
NBViewer Output
SandBoxed(IPython.core.display.Javascript object)
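
One way to make such links less fragile is to keep only the fragment. The sketch below assumes that internal links have the form `http://127.0.0.1:<port>/<notebook-id>#<anchor>` (or `localhost`); `localize_links` is a hypothetical helper. Whether the exported document then resolves the fragment depends on the exporter generating matching heading anchors.

```python
import re

# Assumption: an internal link points at the local notebook server and
# carries the heading anchor in its fragment.
_LOCAL_LINK = re.compile(
    r"\(https?://(?:127\.0\.0\.1|localhost)(?::\d+)?/[^)#\s]*#([^)\s]+)\)"
)


def localize_links(markdown):
    """Rewrite links to the local notebook server as fragment-only links."""
    return _LOCAL_LINK.sub(r"(#\1)", markdown)


link = localize_links(
    "[Link2Heading](http://127.0.0.1:8888/0a2d8086-ee24-4e5b-a32b-f66b525836cb#General-markdown)"
)
```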

Basic Markdown bold and italic works.

In [5]:
compare_render(r"""
This is Markdown **bold** and *italic* text.
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
This is Markdown \textbf{bold} and \emph{italic} text.
NBViewer Output

This is Markdown bold and italic text.

SandBoxed(IPython.core.display.Javascript object)

Nested lists work as well

In [6]:
compare_render(r"""
- li 1
- li 2
    1. li 3
    1. li 4
- li 5
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
\begin{itemize}
\itemsep1pt\parskip0pt\parsep0pt
\item
  li 1
\item
  li 2

  \begin{enumerate}
  \def\labelenumi{\arabic{enumi}.}
  \itemsep1pt\parskip0pt\parsep0pt
  \item
    li 3
  \item
    li 4
  \end{enumerate}
\item
  li 5
\end{itemize}
NBViewer Output
  • li 1
  • li 2
    1. li 3
    2. li 4
  • li 5
SandBoxed(IPython.core.display.Javascript object)

Unicode support

In [7]:
compare_render(ur"""
überschuß +***^°³³ α β θ
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
überschuß +\emph{*}\^{}°³³ α β θ
NBViewer Output

überschuß +*^°³³ α β θ

SandBoxed(IPython.core.display.Javascript object)

Pandoc may produce invalid LaTeX, e.g. \sout is not allowed in headings.

In [8]:
compare_render(r"""

# Heading 1 ~~strikeout~~
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
\section{Heading 1 \sout{strikeout}}
NBViewer Output

Heading 1 strikeout

SandBoxed(IPython.core.display.Javascript object)
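
A simple way to avoid `\sout` in section titles is to drop the strikeout markers from heading lines before conversion. This is a sketch only; `strip_strikeout_in_headings` is a hypothetical helper and it handles ATX headings, not Setext ones.

```python
import re

_HEADING = re.compile(r"^(#{1,6}[ \t].*)$", re.MULTILINE)


def strip_strikeout_in_headings(markdown):
    """Remove ~~...~~ markers from ATX heading lines, keeping the text."""
    def clean(match):
        return re.sub(r"~~(.+?)~~", r"\1", match.group(1))
    return _HEADING.sub(clean, markdown)


cleaned = strip_strikeout_in_headings("# Heading 1 ~~strikeout~~\n\n~~kept~~")
```

Strikeout in body text is left alone, since only heading lines match the outer pattern.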

Horizontal lines work just fine

In [9]:
compare_render(r"""
above

--------

below
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
above

\begin{center}\rule{3in}{0.4pt}\end{center}

below
NBViewer Output

above


below

SandBoxed(IPython.core.display.Javascript object)

Extended markdown of pandoc

(maybe we should deactivate this)

In [10]:
compare_render(r"""
This is Markdown ~subscript~ and ^superscript^ text.
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
This is Markdown \textsubscript{subscript} and
\textsuperscript{superscript} text.
NBViewer Output

This is Markdown subscript and superscript text.

SandBoxed(IPython.core.display.Javascript object)

An underscore with no space before it behaves inconsistently (Pandoc extension: intraword_underscores - deactivate?)

In [11]:
compare_render(r"""
This is Markdown not_italic_.
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
This is Markdown not\_italic\_.
NBViewer Output

This is Markdown not_italic_.

SandBoxed(IPython.core.display.Javascript object)
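
Pandoc lets individual extensions be switched off by appending `-EXTENSION` to the input format name. The sketch below builds such a format string for the extensions discussed in this section; `pandoc_input_format` is a hypothetical helper, and passing the result as the source format of the `pandoc` utility imported above is an assumption about how that wrapper forwards its arguments.

```python
def pandoc_input_format(base="markdown", disabled=()):
    """Build a Pandoc format name with the given extensions disabled."""
    return base + "".join("-" + ext for ext in disabled)


fmt = pandoc_input_format(
    disabled=["intraword_underscores", "subscript", "superscript"]
)
# Possible use (assumption): pandoc(markdown, fmt, "latex")
```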

Pandoc allows TeX macros to be defined, and these are respected for all output formats; the notebook does not support this.

In [12]:
compare_render(r"""
\newcommand{\tuple}[1]{\langle #1 \rangle}

$\tuple{a, b, c}$
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
\newcommand{\tuple}[1]{\langle #1 \rangle}

$\tuple{a, b, c}$
NBViewer Output

\(\langle a, b, c \rangle\)

SandBoxed(IPython.core.display.Javascript object)

When the \newcommand is placed inside a math environment, it works in the notebook and nbviewer but produces invalid LaTeX (the \newcommand is only valid within the same math environment).

In [13]:
compare_render(r"""
$\newcommand{\foo}[1]{...:: #1 ::...}$
$\foo{bar}$
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
$\newcommand{\foo}[1]{...:: #1 ::...}$ $\foo{bar}$
NBViewer Output

\(\newcommand{\foo}[1]{...:: #1 ::...}\) \(\foo{bar}\)

SandBoxed(IPython.core.display.Javascript object)

HTML or LaTeX injections

Raw HTML gets dropped entirely when converting to $\LaTeX$.

In [14]:
compare_render(r"""
This is HTML <b>bold</b> and <i>italic</i> text.
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
This is HTML bold and italic text.
NBViewer Output

This is HTML bold and italic text.

SandBoxed(IPython.core.display.Javascript object)

Same for something like center

In [15]:
compare_render(r"""
<center>Center aligned</center>
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
Center aligned
NBViewer Output
Center aligned
SandBoxed(IPython.core.display.Javascript object)
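
Since the tags are dropped and only their text survives, simple inline HTML can be translated to Markdown before conversion so that the formatting reaches LaTeX. A minimal sketch, assuming only flat (non-nested) `<b>`, `<strong>`, `<i>` and `<em>` tags without attributes; `inline_html_to_markdown` is a hypothetical helper.

```python
import re

_MARK = {"b": "**", "strong": "**", "i": "*", "em": "*"}
_TAG = re.compile(r"<(b|strong|i|em)>(.*?)</\1>", re.DOTALL)


def inline_html_to_markdown(markdown):
    """Translate flat bold/italic HTML tags into Markdown emphasis."""
    def repl(match):
        mark = _MARK[match.group(1)]
        return mark + match.group(2) + mark
    return _TAG.sub(repl, markdown)


text = inline_html_to_markdown("This is HTML <b>bold</b> and <i>italic</i> text.")
```

Block-level tags such as `<center>` have no Markdown equivalent and are not handled here.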

Raw $\LaTeX$ gets dropped entirely when converted to HTML. (I don't know why the HTML output is cropped here???)

In [16]:
compare_render(r"""
This is \LaTeX \bf{bold} and \emph{italic} text.
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
This is \LaTeX \bf{bold} and \emph{italic} text.
NBViewer Output

This is

SandBoxed(IPython.core.display.Javascript object)

A combination of raw $\LaTeX$ and raw HTML

In [17]:
compare_render(r"""
**foo** $\left( \sum_{k=1}^n a_k b_k \right)^2 \leq$ <b>b\$ar</b> $$test$$ 
\cite{}
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
\textbf{foo} $\left( \sum_{k=1}^n a_k b_k \right)^2 \leq$ b\$ar \[test\]
\cite{}
NBViewer Output

foo \(\left( \sum_{k=1}^n a_k b_k \right)^2 \leq\) b$ar \[test\]

SandBoxed(IPython.core.display.Javascript object)

Tables

HTML tables render in the notebook, but not in Pandoc.

In [18]:
compare_render(r"""
<table>
    <tr>
        <td>a</td>
        <td>b</td>
    </tr>
    <tr>
        <td>c</td>
        <td>d</td>
    </tr>
</table>
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
a

b

c

d
NBViewer Output
a b
c d
SandBoxed(IPython.core.display.Javascript object)

Instead, Pandoc supports simple ASCII tables. Unfortunately, marked.js doesn't support them, so they are not rendered in the notebook.

In [19]:
compare_render(r"""
+---+---+
| a | b |
+---+---+
| c | d |
+---+---+
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
\begin{longtable}[c]{@{}ll@{}}
\hline\noalign{\medskip}
\begin{minipage}[t]{0.06\columnwidth}\raggedright
a
\end{minipage} &amp; \begin{minipage}[t]{0.06\columnwidth}\raggedright
b
\end{minipage}
\\\noalign{\medskip}
\begin{minipage}[t]{0.06\columnwidth}\raggedright
c
\end{minipage} &amp; \begin{minipage}[t]{0.06\columnwidth}\raggedright
d
\end{minipage}
\\\noalign{\medskip}
\hline
\end{longtable}
NBViewer Output

a

b

c

d

SandBoxed(IPython.core.display.Javascript object)

An alternative to basic ASCII tables is pipe tables. Pipe tables are recognized by Pandoc and supported by marked, so they are the best way to add tables.

In [20]:
compare_render(r"""
|Left |Center |Right|
|:----|:-----:|----:|
|Text1|Text2  |Text3|
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
\begin{longtable}[c]{@{}lcr@{}}
\hline\noalign{\medskip}
Left &amp; Center &amp; Right
\\\noalign{\medskip}
\hline\noalign{\medskip}
Text1 &amp; Text2 &amp; Text3
\\\noalign{\medskip}
\hline
\end{longtable}
NBViewer Output
Left Center Right
Text1 Text2 Text3
SandBoxed(IPython.core.display.Javascript object)
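
Pipe tables are easy to generate programmatically. The following sketch builds one from a header and a list of rows; `pipe_table` and its `align` codes (`l`, `c`, `r`) are hypothetical names chosen for illustration.

```python
def pipe_table(header, rows, align=None):
    """Return a Markdown pipe table; align holds 'l', 'c' or 'r' per column."""
    align = align or ["l"] * len(header)
    rule = {"l": ":---", "c": ":---:", "r": "---:"}
    lines = [
        "|" + "|".join(header) + "|",
        "|" + "|".join(rule[a] for a in align) + "|",
    ]
    for row in rows:
        lines.append("|" + "|".join(str(cell) for cell in row) + "|")
    return "\n".join(lines)


table = pipe_table(
    ["Left", "Center", "Right"],
    [["Text1", "Text2", "Text3"]],
    align=["l", "c", "r"],
)
```

Cells containing a literal `|` would need escaping, which this sketch does not attempt.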

Pandoc recognizes cell alignment in simple tables. Since marked.js doesn't recognize ascii tables, it can't render this table.

In [21]:
compare_render(r"""
Right Aligned Center Aligned Left Aligned
------------- -------------- ------------
          Why      does      this
     actually      work?     Who
        knows       ...
""")

print("\n"*5)
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
\begin{longtable}[c]{@{}lll@{}}
\hline\noalign{\medskip}
Right Aligned &amp; Center Aligned &amp; Left Aligned
\\\noalign{\medskip}
\hline\noalign{\medskip}
Why &amp; does &amp; this
\\\noalign{\medskip}
actually &amp; work? &amp; Who
\\\noalign{\medskip}
knows &amp; \ldots{} &amp;
\\\noalign{\medskip}
\hline
\end{longtable}
NBViewer Output
Right Aligned Center Aligned Left Aligned
Why does this
actually work? Who
knows ...
SandBoxed(IPython.core.display.Javascript object)





Images

Markdown images work in both the notebook and Pandoc. However, remote images are not allowed in $\LaTeX$. Maybe add a preprocessor to download these. The alternate text is displayed in nbviewer next to the image.

In [22]:
compare_render(r"""
![Alternate Text](https://ipython.org/_static/IPy_header.png)
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
\begin{figure}[htbp]
\centering
\includegraphics{https://ipython.org/_static/IPy_header.png}
\caption{Alternate Text}
\end{figure}
NBViewer Output
Alternate Text

Alternate Text

SandBoxed(IPython.core.display.Javascript object)
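
The download preprocessor suggested above could look roughly like the sketch below. It is not an existing nbconvert preprocessor: `localize_images` is a hypothetical function, and the actual download is delegated to a caller-supplied `fetch(url, path)` callable (for example a thin wrapper around `urlretrieve`), so the rewriting logic can be exercised without network access.

```python
import re

# Matches Markdown images whose target is an http(s) URL.
_REMOTE_IMAGE = re.compile(r"!\[([^\]]*)\]\((https?://[^)\s]+)\)")


def localize_images(markdown, fetch, directory="images"):
    """Download remote images via fetch() and point the Markdown at local copies."""
    def repl(match):
        alt, url = match.group(1), match.group(2)
        # Assumption: the last URL path segment is a usable, unique file name.
        path = directory + "/" + url.rsplit("/", 1)[-1]
        fetch(url, path)
        return "![%s](%s)" % (alt, path)
    return _REMOTE_IMAGE.sub(repl, markdown)


fetched = []
local = localize_images(
    "![Alternate Text](https://ipython.org/_static/IPy_header.png)",
    lambda url, path: fetched.append((url, path)),
)
```

Here the stand-in `fetch` only records what would be downloaded; a real one would write the file to `path`.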

HTML Images only work in the notebook.

In [23]:
compare_render(r"""
<img src="https://ipython.org/_static/IPy_header.png">
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
NBViewer Output

No description has been provided for this image

SandBoxed(IPython.core.display.Javascript object)
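
A possible workaround is to rewrite plain `<img>` tags as Markdown images before conversion, so they follow the Markdown image path shown earlier. This is a sketch with a hypothetical helper, `img_tags_to_markdown`; it assumes double-quoted attributes and ignores sizing or styling attributes.

```python
import re

_IMG = re.compile(r'<img\s+[^>]*?src="([^"]+)"[^>]*>', re.IGNORECASE)
_ALT = re.compile(r'alt="([^"]*)"', re.IGNORECASE)


def img_tags_to_markdown(markdown):
    """Replace <img src="..."> tags with Markdown image syntax."""
    def repl(match):
        alt = _ALT.search(match.group(0))
        return "![%s](%s)" % (alt.group(1) if alt else "", match.group(1))
    return _IMG.sub(repl, markdown)


image = img_tags_to_markdown('<img src="https://ipython.org/_static/IPy_header.png">')
```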

Math

Simple inline and displaystyle maths work fine

In [24]:
compare_render(r"""
My equation:
$$ 5/x=2y $$

It is inline $ 5/x=2y $ here.
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
My equation: \[ 5/x=2y \]

It is inline \$ 5/x=2y \$ here.
NBViewer Output

My equation: \[ 5/x=2y \]

It is inline $ 5/x=2y $ here.

SandBoxed(IPython.core.display.Javascript object)

If the first $ is on a new line, the equation is not captured by md2tex. If both $s are on a new line, md2html fails as well (note that the raw LaTeX is dropped), although the notebook renders it correctly.

In [25]:
compare_render(r"""
$5 \cdot x=2$

$
5 \cdot x=2$

$
5 \cdot x=2
$
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
$5 \cdot x=2$

\$ 5 \cdot x=2\$

\$ 5 \cdot x=2 \$
NBViewer Output

\(5 \cdot x=2\)

$ 5 x=2$

$ 5 x=2 $

SandBoxed(IPython.core.display.Javascript object)
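
Pandoc only treats `$...$` as inline math when the opening `$` is immediately followed by a non-space character and the closing `$` is immediately preceded by one. The sketch below removes whitespace (including newlines) just inside single-dollar delimiters; `tighten_inline_math` is a hypothetical helper, and it assumes the text contains no escaped or literal dollar signs.

```python
import re

# A single '$' (not part of '$$'), optional whitespace, the math body,
# optional whitespace, and a closing single '$'.
_INLINE_MATH = re.compile(r"(?<!\$)\$(?!\$)\s*([^$]+?)\s*\$(?!\$)")


def tighten_inline_math(markdown):
    """Strip whitespace directly inside $...$ so Pandoc sees inline math."""
    return _INLINE_MATH.sub(r"$\1$", markdown)


tight = tighten_inline_math("$\n5 \\cdot x=2\n$")
```

Display math written with `$$ ... $$` is left unchanged, because both lookarounds reject doubled dollar signs.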

MathJax permits some $\LaTeX$ math constructs without $s; of course, this raw $\LaTeX$ is stripped when converting to HTML. Moreover, the & characters are escaped by the lxml parsing (#4251).

In [26]:
compare_render(r"""
\begin{align}
a & b\\
d & c
\end{align}

\begin{eqnarray}
a & b \\
c & d
\end{eqnarray}
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
\begin{align}
a &amp; b\\
d &amp; c
\end{align}

\begin{eqnarray}
a &amp; b \\
c &amp; d
\end{eqnarray}
NBViewer Output
SandBoxed(IPython.core.display.Javascript object)
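
Until the escaping is fixed at its source, the entities could be reversed in the generated LaTeX as a stopgap. This is only a sketch of such a post-processing step, not the fix for #4251; `unescape_entities` is a hypothetical helper that handles just the three entities relevant here.

```python
def unescape_entities(latex):
    """Undo one level of HTML escaping for <, > and & in LaTeX output."""
    # '&amp;' is replaced last so that '&amp;lt;' becomes '&lt;', not '<'.
    return (
        latex.replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&amp;", "&")
    )


restored = unescape_entities("a &amp; b\\\\\nd &amp; c")
```

It cannot recover text that was removed rather than escaped.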

There is another lxml issue, #4283

In [27]:
compare_render(r"""
1<2 is true, but 3>4 is false.

$1<2$ is true, but $3>4$ is false.

1<2 it is even worse if it is alone in a line.
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
14 is false.

$14$ is false.

1
NBViewer Output

1<2 is true, but 3>4 is false.

\(1<2\) is true, but \(3>4\) is false.

1<2 it is even worse if it is alone in a line.

SandBoxed(IPython.core.display.Javascript object)

Listings, and Code blocks

In [28]:
compare_render(r"""
some source code

```
a = "test"
print(a)
```
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
some source code

\begin{verbatim}
a = "test"
print(a)
\end{verbatim}
NBViewer Output

some source code

a = "test"
print(a)
SandBoxed(IPython.core.display.Javascript object)

Language specific syntax highlighting by Pandoc requires additional dependencies to render correctly.

In [29]:
compare_render(r"""
some source code

```python
a = "test"
print(a)
```
""")
SandBoxed(IPython.core.display.Javascript object)
NBConvert Latex Output
some source code

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{a = }\StringTok{"test"}
\KeywordTok{print}\NormalTok{(a)}
\end{Highlighting}
\end{Shaded}
NBViewer Output

some source code

a = "test"
print(a)
SandBoxed(IPython.core.display.Javascript object)