Document the new input transformation API
Thomas Kluyver
@@ -0,0 +1,3 b''
1 * The API for transforming input before it is parsed as Python code has been
2 completely redesigned, and any custom input transformations will need to be
3 rewritten. See :doc:`/config/inputtransforms` for details of the new API.
@@ -1,193 +1,167 b''
1 1
2 2 ===========================
3 3 Custom input transformation
4 4 ===========================
5 5
6 6 IPython extends Python syntax to allow things like magic commands, and help with
7 7 the ``?`` syntax. There are several ways to customise how the user's input is
8 8 processed into Python code to be executed.
9 9
10 10 These hooks are mainly for other projects using IPython as the core of their
11 11 interactive interface. Using them carelessly can easily break IPython!
12 12
13 13 String based transformations
14 14 ============================
15 15
16 16 .. currentmodule:: IPython.core.inputtransforms
17 17
18 When the user enters a line of code, it is first processed as a string. By the
18 When the user enters code, it is first processed as a string. By the
19 19 end of this stage, it must be valid Python syntax.
20 20
21 These transformers all subclass :class:`IPython.core.inputtransformer.InputTransformer`,
22 and are used by :class:`IPython.core.inputsplitter.IPythonInputSplitter`.
23
24 These transformers act in three groups, stored separately as lists of instances
25 in attributes of :class:`~IPython.core.inputsplitter.IPythonInputSplitter`:
26
27 * ``physical_line_transforms`` act on the lines as the user enters them. For
28 example, these strip Python prompts from examples pasted in.
29 * ``logical_line_transforms`` act on lines as connected by explicit line
30 continuations, i.e. ``\`` at the end of physical lines. They are skipped
31 inside multiline Python statements. This is the point where IPython recognises
32 ``%magic`` commands, for instance.
33 * ``python_line_transforms`` act on blocks containing complete Python statements.
34 Multi-line strings, lists and function calls are reassembled before being
35 passed to these, but note that function and class *definitions* are still a
36 series of separate statements. IPython does not use any of these by default.
37
38 An InteractiveShell instance actually has two
39 :class:`~IPython.core.inputsplitter.IPythonInputSplitter` instances, as the
40 attributes :attr:`~IPython.core.interactiveshell.InteractiveShell.input_splitter`,
41 to tell when a block of input is complete, and
42 :attr:`~IPython.core.interactiveshell.InteractiveShell.input_transformer_manager`,
43 to transform complete cells. If you add a transformer, you should make sure that
44 it gets added to both, e.g.::
45
46     ip.input_splitter.logical_line_transforms.append(my_transformer())
47     ip.input_transformer_manager.logical_line_transforms.append(my_transformer())
21 .. versionchanged:: 7.0
22
23    The API for string and token-based transformations has been completely
24    redesigned. Any third-party code extending input transformation will need to
25    be rewritten. The new API is, hopefully, simpler.
26
27 String based transformations are managed by
28 :class:`IPython.core.inputtransformer2.TransformerManager`, which is attached to
29 the :class:`~IPython.core.interactiveshell.InteractiveShell` instance as
30 ``input_transformer_manager``. The manager passes the
31 input through a series of individual transformers. There are two kinds of
32 transformers, stored in three groups:
33
34 * ``cleanup_transforms`` and ``line_transforms`` are lists of functions. Each
35   function is called with a list of input lines (which include trailing
36   newlines) and returns a list in the same format. ``cleanup_transforms``
37   are run first; they strip prompts and leading indentation from input.
38   The only default transform in ``line_transforms`` processes cell magics.
39 * ``token_transformers`` is a list of :class:`IPython.core.inputtransformer2.TokenTransformBase`
40   subclasses (not instances). They recognise special syntax like
41   ``%line magics`` and ``help?``, and transform them to Python syntax. The
42   interface for these is more complex; see below.
48 43
49 44 These transformers may raise :exc:`SyntaxError` if the input code is invalid, but
50 45 in most cases it is clearer to pass unrecognised code through unmodified and let
51 46 Python's own parser decide whether it is valid.
52 47
53 48 .. versionchanged:: 2.0
54 49
55 50    Added the option to raise :exc:`SyntaxError`.
56 51
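The string stage described above can be modelled with plain functions. The sketch below is a simplified stand-in, not ``TransformerManager``'s own code: ``strip_classic_prompts``, ``special_command_lines``, ``run_line_stages`` and the ``specialcommand`` call in the generated code are all hypothetical names. Cleanup functions run before line functions, and each takes and returns a list of newline-terminated lines. The line transform raises :exc:`SyntaxError` only for a malformed form of its own ``¬`` syntax and passes every other line through unchanged; the final ``ast.parse`` call stands in for the later compile step.

```python
import ast


def strip_classic_prompts(lines):
    """Hypothetical cleanup transform: drop '>>> ' and '... ' prompts."""
    out = []
    for line in lines:
        for prompt in ('>>> ', '... '):
            if line.startswith(prompt):
                line = line[len(prompt):]
                break
        out.append(line)
    return out


def special_command_lines(lines):
    """Hypothetical line transform: '¬cmd' becomes specialcommand('cmd').

    Lines it does not recognise pass through unchanged, so Python's own
    parser decides whether they are valid.
    """
    new_lines = []
    for line in lines:
        body = line.lstrip()
        if body.startswith('¬'):
            indent = line[:len(line) - len(body)]
            command = body[1:].strip()  # also removes the trailing newline
            if not command:
                # Recognised syntax, but malformed: report it directly.
                raise SyntaxError('empty special command')
            line = indent + 'specialcommand(%r)\n' % command
        new_lines.append(line)
    return new_lines


def run_line_stages(cell, cleanup_transforms, line_transforms):
    """Simplified model of the string stage (not IPython's own code)."""
    if not cell.endswith('\n'):
        cell += '\n'
    lines = cell.splitlines(keepends=True)  # every line keeps its newline
    for transform in cleanup_transforms:     # cleanup runs first
        lines = transform(lines)
    for transform in line_transforms:
        lines = transform(lines)
    source = ''.join(lines)
    ast.parse(source)  # raises SyntaxError if the result is not valid Python
    return source


pasted = ">>> for name in ['a', 'b']:\n...     ¬echo name\n"
print(run_line_stages(pasted, [strip_classic_prompts], [special_command_lines]))
```

Here the pasted ``>>>`` prompts are stripped first, so the line transform sees ordinary indented code.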
57 Stateless transformations
58 -------------------------
52 Line based transformations
53 --------------------------
59 54
60 The simplest kind of transformations work one line at a time. Write a function
61 which takes a line and returns a line, and decorate it with
62 :meth:`StatelessInputTransformer.wrap`::
55 For example, imagine we want to obfuscate our code by reversing each line, so
56 we'd write ``)5(f =+ a`` instead of ``a += f(5)``. Here's how we could swap it
57 back the right way before IPython tries to run it::
63 58
64     @StatelessInputTransformer.wrap
65     def my_special_commands(line):
66         if line.startswith("¬"):
67             return "specialcommand(" + repr(line) + ")"
68         return line
59     def reverse_line_chars(lines):
60         new_lines = []
61         for line in lines:
62             chars = line[:-1]  # the newline needs to stay at the end
63             new_lines.append(chars[::-1] + '\n')
64         return new_lines
69 65
70 The decorator returns a factory function which will produce instances of
71 :class:`~IPython.core.inputtransformer.StatelessInputTransformer` using your
72 function.
66 To start using this::
73 67
74 Transforming a full block
75 -------------------------
76
77 .. warning::
68     ip = get_ipython()
69     ip.input_transformer_manager.line_transforms.append(reverse_line_chars)
78 70
79 Transforming a full block at once will break the automatic detection of
80 whether a block of code is complete in interfaces relying on this
81 functionality, such as terminal IPython. You will need to use a
82 shortcut to force-execute your cells.
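Because a line transform is just a function from a list of lines to a list of lines, it can be checked without a running shell. This sketch copies ``reverse_line_chars`` from the example above so that it runs on its own; the sample input is illustrative.

```python
def reverse_line_chars(lines):
    """Reverse the characters of each line, keeping the trailing newline."""
    new_lines = []
    for line in lines:
        chars = line[:-1]  # drop the trailing newline before reversing
        new_lines.append(chars[::-1] + '\n')  # then put it back at the end
    return new_lines


obfuscated = [')5(f =+ a\n', ')a(tnirp\n']
print(reverse_line_chars(obfuscated))  # ['a += f(5)\n', 'print(a)\n']
```

Applying the function twice gives back the original lines, which makes a convenient sanity check for this particular transform.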
71 Token based transformations
72 ---------------------------
83 73
84 Transforming a full block of python code is possible by implementing a
85 :class:`~IPython.core.inputtransformer.InputTransformer` and overriding the
86 ``push`` and ``reset`` methods. The reset method should send the full block of
87 transformed text. As an example, here is a transformer that reverses the lines
88 from last to first::
74 These recognise special syntax like ``%magics`` and ``help?``, and transform it
75 into valid Python code. Using tokens makes it easy to avoid transforming similar
76 patterns inside comments or strings.
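The following sketch, using only the standard library's :mod:`tokenize` module, shows the difference. ``percent_operator_positions`` is a hypothetical helper rather than part of IPython: it reports only the ``%`` characters that Python tokenises as operators, skipping those inside the comment and the string literal.

```python
import io
import tokenize


def percent_operator_positions(source):
    """Return (row, col) of every '%' that tokenize reports as an operator."""
    positions = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # '%' inside a COMMENT or STRING token never appears as an OP token
        if tok.type == tokenize.OP and tok.string == '%':
            positions.append(tok.start)
    return positions


source = "a = 5 % 2  # 100%\ns = '%d'\n%timeit a\n"
print(percent_operator_positions(source))  # [(1, 6), (3, 0)]
```

A real transformer would also need to check where the ``%`` sits, since ``5 % 2`` is ordinary Python; that is the job of its ``find()`` method.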
89 77
90     from IPython.core.inputtransformer import InputTransformer
78 The API for a token-based transformation looks like this:
91
92     class ReverseLineTransformer(InputTransformer):
80 .. class:: MyTokenTransformer
93
94         def __init__(self):
95             self.acc = []
82    .. classmethod:: find(tokens_by_line)
96
97         def push(self, line):
98             self.acc.append(line)
99             return None
84       Takes a list of lists of :class:`tokenize.TokenInfo` objects. Each sublist
85       contains the tokens from one logical Python line, which may span several
86       physical lines because of line continuations, multi-line strings or
87       expressions inside brackets. If it finds a pattern to transform, it returns
88       an instance of the class. Otherwise, it returns ``None``.
100 89
101         def reset(self):
102             ret = '\n'.join(self.acc[::-1])
103             self.acc = []
104             return ret
90    .. attribute:: start_line
91                   start_col
92                   priority
105 93
94       These attributes are used to select which transformation to run first.
95       ``start_line`` is 0-indexed (whereas the locations on
96       :class:`~tokenize.TokenInfo` use 1-indexed line numbers). If there are
97       multiple matches in the same location, the one with the smaller
98       ``priority`` number is used.
106 99
107 Coroutine transformers
108 ----------------------
100    .. method:: transform(lines)
109 101
110 More advanced transformers can be written as coroutines. The coroutine will be
111 sent each line in turn, followed by ``None`` to reset it. It can yield lines, or
112 ``None`` if it is accumulating text to yield at a later point. When reset, it
113 should give up any code it has accumulated.
102       This should transform the individual recognised pattern that was
103       previously found. As with line-based transforms, it takes a list of
104       lines as strings, and returns a similar list.
114 105
115 You may use :meth:`CoroutineInputTransformer.wrap` to simplify the creation of
116 such a transformer.
106 Because each transformation may affect the parsing of the code after it,
107 ``TransformerManager`` takes a careful approach. It calls ``find()`` on all
108 available transformers. If any find a match, the transformation which matched
109 closest to the start is run. Then it tokenises the transformed code and
110 repeats the process, until none of the transformers return a match. Each
111 transformation must therefore remove the pattern that its ``find()``
112 recognises; otherwise the manager will enter an infinite loop.
117 113
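The selection loop described above can be modelled in a few dozen lines. This is a simplified sketch, not ``TransformerManager``'s actual code: ``tokens_by_logical_line``, ``EchoCommand`` and ``run_token_transformers`` are hypothetical names, the toy transformer only mimics the ``find()``/``transform()`` interface of ``TokenTransformBase`` (without importing IPython), and ``max_rounds`` is an arbitrary safety limit that makes the infinite-loop risk visible.

```python
import io
import tokenize


def tokens_by_logical_line(lines):
    """Group tokens into logical lines (simplified, hypothetical helper)."""
    groups, current = [], []
    readline = io.StringIO(''.join(lines)).readline
    for tok in tokenize.generate_tokens(readline):
        current.append(tok)
        if tok.type in (tokenize.NEWLINE, tokenize.ENDMARKER):
            groups.append(current)
            current = []
    return groups


class EchoCommand:
    """Toy transformer: rewrites '%echo text' as print('text')."""
    priority = 10

    def __init__(self, start):
        self.start_line = start[0] - 1  # tokenize rows are 1-indexed
        self.start_col = start[1]

    @classmethod
    def find(cls, tokens_by_line):
        for line in tokens_by_line:
            ix = 0
            while line[ix].type in (tokenize.INDENT, tokenize.DEDENT):
                ix += 1
            tok = line[ix]
            # A '%' OP token is always followed by at least a NEWLINE token
            if (tok.type == tokenize.OP and tok.string == '%'
                    and line[ix + 1].string == 'echo'):
                return cls(tok.start)
        return None

    def transform(self, lines):
        line = lines[self.start_line]
        indent = line[:self.start_col]
        text = line[self.start_col + len('%echo'):].strip()
        new_line = indent + 'print(%r)\n' % text
        return lines[:self.start_line] + [new_line] + lines[self.start_line + 1:]


def run_token_transformers(lines, transformer_classes, max_rounds=100):
    """Repeat find/transform until nothing matches (simplified model)."""
    for _ in range(max_rounds):
        tokens = tokens_by_logical_line(lines)  # re-tokenise every round
        matches = [cls.find(tokens) for cls in transformer_classes]
        matches = [m for m in matches if m is not None]
        if not matches:
            return lines
        # The match closest to the start wins; priority breaks ties.
        best = min(matches, key=lambda m: (m.start_line, m.start_col, m.priority))
        lines = best.transform(lines)  # must remove the pattern it matched
    raise RuntimeError('transformers did not converge')


cell = ['x = 1\n', '%echo hi there\n', 'if x:\n', '    %echo bye\n']
print(''.join(run_token_transformers(cell, [EchoCommand])))
```

Each round handles one match and then re-tokenises, so a transformation that left its own ``%echo`` in place would exhaust ``max_rounds`` instead of finishing.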
118 Here is a simple :class:`CoroutineInputTransformer` that can be thought of
119 as the identity::
114 For example, here's a transformer which will recognise ``¬`` as a prefix for a
115 new kind of special command::
120 116
121     from IPython.core.inputtransformer import CoroutineInputTransformer
117     import tokenize
118     from IPython.core.inputtransformer2 import TokenTransformBase
122 119
123     @CoroutineInputTransformer.wrap
124     def noop():
125         line = ''
126         while True:
127             line = (yield line)
120     class MySpecialCommand(TokenTransformBase):
121         @classmethod
122         def find(cls, tokens_by_line):
123             """Find the first escaped command (¬foo) in the cell.
124             """
125             for line in tokens_by_line:
126                 ix = 0
127                 # Find the first token that's not INDENT/DEDENT
128                 while line[ix].type in {tokenize.INDENT, tokenize.DEDENT}:
129                     ix += 1
130                 if line[ix].string == '¬':
131                     return cls(line[ix].start)
128 132
129     ip = get_ipython()
133         def transform(self, lines):
134             indent = lines[self.start_line][:self.start_col]
135             content = lines[self.start_line][self.start_col+1:].rstrip('\n')  # text after '¬'
130 136
131     ip.input_splitter.logical_line_transforms.append(noop())
132     ip.input_transformer_manager.logical_line_transforms.append(noop())
137             lines_before = lines[:self.start_line]
138             call = "specialcommand(%r)" % content
139             new_line = indent + call + '\n'
140             lines_after = lines[self.start_line + 1:]
133 141
134 This code in IPython strips a constant amount of leading indentation from each
135 line in a cell::
142             return lines_before + [new_line] + lines_after
136 143
137     from IPython.core.inputtransformer import CoroutineInputTransformer
144 And here's how you'd use it::
138 145
139     @CoroutineInputTransformer.wrap
140     def leading_indent():
141         """Remove leading indentation.
146     ip = get_ipython()
147     ip.input_transformer_manager.token_transformers.append(MySpecialCommand)
142 148
143         If the first line starts with spaces or tabs, the same whitespace will be
144         removed from each following line until it is reset.
145         """
146         space_re = re.compile(r'^[ \t]+')
147         line = ''
148         while True:
149             line = (yield line)
150
151             if line is None:
152                 continue
153
154             m = space_re.match(line)
155             if m:
156                 space = m.group(0)
157                 while line is not None:
158                     if line.startswith(space):
159                         line = line[len(space):]
160                     line = (yield line)
161             else:
162                 # No leading spaces - wait for reset
163                 while line is not None:
164                     line = (yield line)
165
166
167 Token-based transformers
168 ------------------------
169
170 There is an experimental framework that takes care of tokenizing and
171 untokenizing lines of code. Define a function that accepts a list of tokens, and
172 returns an iterable of output tokens, and decorate it with
173 :meth:`TokenInputTransformer.wrap`. These should only be used in
174 ``python_line_transforms``.
175 149
176 150 AST transformations
177 151 ===================
178 152
179 153 After the code has been parsed as Python syntax, you can use Python's powerful
180 154 *Abstract Syntax Tree* tools to modify it. Subclass :class:`ast.NodeTransformer`,
181 155 and add an instance to ``shell.ast_transformers``.
182 156
183 157 This example wraps integer literals in an ``Integer`` class, which is useful for
184 158 mathematical frameworks that want to handle e.g. ``1/3`` as a precise fraction::
185 159
186 160
187 161     class IntegerWrapper(ast.NodeTransformer):
188 162         """Wraps all integers in a call to Integer()"""
189 163         def visit_Constant(self, node):
190 164             if isinstance(node.value, int) and not isinstance(node.value, bool):
191 165                 return ast.Call(func=ast.Name(id='Integer', ctx=ast.Load()),
192 166                                 args=[node], keywords=[])
193 167             return node
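To try the transformer outside IPython, the same class can be applied by hand. This is a sketch under stated assumptions: ``run_wrapped`` is a hypothetical helper (IPython applies ``ast_transformers`` itself when it runs a cell), and ``fractions.Fraction`` stands in for the ``Integer`` class a mathematical framework would provide. It uses ``visit_Constant``, since integer literals are ``ast.Constant`` nodes on Python 3.8 and later.

```python
import ast
from fractions import Fraction


class IntegerWrapper(ast.NodeTransformer):
    """Wraps all integers in a call to Integer()"""
    def visit_Constant(self, node):
        # bool is a subclass of int, so exclude True/False explicitly
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            return ast.Call(func=ast.Name(id='Integer', ctx=ast.Load()),
                            args=[node], keywords=[])
        return node


def run_wrapped(source, namespace):
    """Hypothetical helper: parse, transform, compile and evaluate an expression."""
    tree = ast.parse(source, mode='eval')
    tree = IntegerWrapper().visit(tree)
    ast.fix_missing_locations(tree)  # the new Call/Name nodes need positions
    return eval(compile(tree, '<wrapped>', 'eval'), namespace)


# Fraction stands in for a framework's Integer class
print(run_wrapped('1/3 + 1/6', {'Integer': Fraction}))  # 1/2
```

In a live session, following the text above, the transformer would instead be registered with ``get_ipython().ast_transformers.append(IntegerWrapper())``.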