Fix crash with 'x?' when input line had unusual patterns.
Fernando Perez -
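A minimal sketch of the failure mode this commit appears to address, based only on the changed lines at the end of the diff. The two function names are hypothetical stand-ins, not part of IPython, and the assumption is that some input line reaches the help translator with an escape parsed as neither '?' nor '??':

```python
def choose_pinfo_before(esc, line):
    """Hypothetical stand-in for the selection logic before this change."""
    if '*' in line:
        pinfo = 'psearch'
    elif esc == '?':
        pinfo = 'pinfo'
    elif esc == '??':
        pinfo = 'pinfo2'
    # If no branch matched, 'pinfo' was never assigned.
    return pinfo


def choose_pinfo_after(esc, line):
    """Hypothetical stand-in for the logic after this change."""
    pinfo = 'pinfo'  # default, always defined
    if '*' in line:
        pinfo = 'psearch'
    elif esc == '??':
        pinfo = 'pinfo2'
    return pinfo


# Assumed scenario: the escape came back empty for an unusual line.
try:
    choose_pinfo_before('', 'x = 1')
    crashed = False
except UnboundLocalError:
    crashed = True

print(crashed, choose_pinfo_after('', 'x = 1'))
```

Setting the default first means every path through the chain leaves the name bound, which is why the explicit '?' branch could be dropped.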
@@ -1,987 +1,986
1 1 """Analysis of text input into executable blocks.
2 2
3 3 The main class in this module, :class:`InputSplitter`, is designed to break
4 4 input from either interactive, line-by-line environments or block-based ones,
5 5 into standalone blocks that can be executed by Python as 'single' statements
6 6 (thus triggering sys.displayhook).
7 7
8 8 A companion, :class:`IPythonInputSplitter`, provides the same functionality but
9 9 with full support for the extended IPython syntax (magics, system calls, etc).
10 10
11 11 For more details, see the class docstring below.
12 12
13 13 Syntax Transformations
14 14 ----------------------
15 15
16 16 One of the main jobs of the code in this file is to apply all syntax
17 17 transformations that make up 'the IPython language', i.e. magics, shell
18 18 escapes, etc. All transformations should be implemented as *fully stateless*
19 19 entities, that simply take one line as their input and return a line.
20 20 Internally for implementation purposes they may be a normal function or a
21 21 callable object, but the only input they receive will be a single line and they
22 22 should only return a line, without holding any data-dependent state between
23 23 calls.
24 24
25 25 As an example, the EscapedTransformer is a class so we can more clearly group
26 26 together the functionality of dispatching to individual functions based on the
27 27 starting escape character, but the only method for public use is its call
28 28 method.
29 29
30 30
31 31 ToDo
32 32 ----
33 33
34 34 - Should we make push() actually raise an exception once push_accepts_more()
35 35 returns False?
36 36
37 37 - Naming cleanups. The tr_* names aren't the most elegant, though now they are
38 38 at least just attributes of a class so not really very exposed.
39 39
40 40 - Think about the best way to support dynamic things: automagic, autocall,
41 41 macros, etc.
42 42
43 43 - Think of a better heuristic for the application of the transforms in
44 44 IPythonInputSplitter.push() than looking at the buffer ending in ':'. Idea:
45 45 track indentation change events (indent, dedent, nothing) and apply them only
46 46 if the indentation went up, but not otherwise.
47 47
48 48 - Think of the cleanest way for supporting user-specified transformations (the
49 49 user prefilters we had before).
50 50
51 51 Authors
52 52 -------
53 53
54 54 * Fernando Perez
55 55 * Brian Granger
56 56 """
57 57 #-----------------------------------------------------------------------------
58 58 # Copyright (C) 2010 The IPython Development Team
59 59 #
60 60 # Distributed under the terms of the BSD License. The full license is in
61 61 # the file COPYING, distributed as part of this software.
62 62 #-----------------------------------------------------------------------------
63 63 from __future__ import print_function
64 64
65 65 #-----------------------------------------------------------------------------
66 66 # Imports
67 67 #-----------------------------------------------------------------------------
68 68 # stdlib
69 69 import codeop
70 70 import re
71 71 import sys
72 72
73 73 # IPython modules
74 74 from IPython.utils.text import make_quoted_expr
75 75
76 76 #-----------------------------------------------------------------------------
77 77 # Globals
78 78 #-----------------------------------------------------------------------------
79 79
80 80 # The escape sequences that define the syntax transformations IPython will
81 81 # apply to user input. These can NOT be just changed here: many regular
82 82 # expressions and other parts of the code may use their hardcoded values, and
83 83 # for all intents and purposes they constitute the 'IPython syntax', so they
84 84 # should be considered fixed.
85 85
86 86 ESC_SHELL = '!' # Send line to underlying system shell
87 87 ESC_SH_CAP = '!!' # Send line to system shell and capture output
88 88 ESC_HELP = '?' # Find information about object
89 89 ESC_HELP2 = '??' # Find extra-detailed information about object
90 90 ESC_MAGIC = '%' # Call magic function
91 91 ESC_QUOTE = ',' # Split args on whitespace, quote each as string and call
92 92 ESC_QUOTE2 = ';' # Quote all args as a single string, call
93 93 ESC_PAREN = '/' # Call first argument with rest of line as arguments
94 94
95 95 #-----------------------------------------------------------------------------
96 96 # Utilities
97 97 #-----------------------------------------------------------------------------
98 98
99 99 # FIXME: These are general-purpose utilities that later can be moved to the
100 100 # general ward. Kept here for now because we're being very strict about test
101 101 # coverage with this code, and this lets us ensure that we keep 100% coverage
102 102 # while developing.
103 103
104 104 # compiled regexps for autoindent management
105 105 dedent_re = re.compile(r'^\s+raise|^\s+return|^\s+pass')
106 106 ini_spaces_re = re.compile(r'^([ \t\r\f\v]+)')
107 107
108 108 # regexp to match pure comment lines so we don't accidentally insert 'if 1:'
109 109 # before pure comments
110 110 comment_line_re = re.compile('^\s*\#')
111 111
112 112
113 113 def num_ini_spaces(s):
114 114 """Return the number of initial spaces in a string.
115 115
116 116 Note that tabs are counted as a single space. For now, we do *not* support
117 117 mixing of tabs and spaces in the user's input.
118 118
119 119 Parameters
120 120 ----------
121 121 s : string
122 122
123 123 Returns
124 124 -------
125 125 n : int
126 126 """
127 127
128 128 ini_spaces = ini_spaces_re.match(s)
129 129 if ini_spaces:
130 130 return ini_spaces.end()
131 131 else:
132 132 return 0
133 133
134 134
135 135 def remove_comments(src):
136 136 """Remove all comments from input source.
137 137
138 138 Note: comments are NOT recognized inside of strings!
139 139
140 140 Parameters
141 141 ----------
142 142 src : string
143 143 A single or multiline input string.
144 144
145 145 Returns
146 146 -------
147 147 String with all Python comments removed.
148 148 """
149 149
150 150 return re.sub('#.*', '', src)
151 151
152 152
153 153 def get_input_encoding():
154 154 """Return the default standard input encoding.
155 155
156 156 If sys.stdin has no encoding, 'ascii' is returned."""
157 157 # There are strange environments for which sys.stdin.encoding is None. We
158 158 # ensure that a valid encoding is returned.
159 159 encoding = getattr(sys.stdin, 'encoding', None)
160 160 if encoding is None:
161 161 encoding = 'ascii'
162 162 return encoding
163 163
164 164 #-----------------------------------------------------------------------------
165 165 # Classes and functions for normal Python syntax handling
166 166 #-----------------------------------------------------------------------------
167 167
168 168 # HACK! This implementation, written by Robert K a while ago using the
169 169 # compiler module, is more robust than the other one below, but it expects its
170 170 # input to be pure python (no ipython syntax). For now we're using it as a
171 171 # second-pass splitter after the first pass transforms the input to pure
172 172 # python.
173 173
174 174 def split_blocks(python):
175 175 """ Split multiple lines of code into discrete commands that can be
176 176 executed singly.
177 177
178 178 Parameters
179 179 ----------
180 180 python : str
181 181 Pure, exec'able Python code.
182 182
183 183 Returns
184 184 -------
185 185 commands : list of str
186 186 Separate commands that can be exec'ed independently.
187 187 """
188 188
189 189 import compiler
190 190
191 191 # compiler.parse treats trailing spaces after a newline as a
192 192 # SyntaxError. This is different than codeop.CommandCompiler, which
193 193 # will compile the trailing spaces just fine. We simply strip any
194 194 # trailing whitespace off. Passing a string with trailing whitespace
195 195 # to exec will fail however. There seems to be some inconsistency in
196 196 # how trailing whitespace is handled, but this seems to work.
197 197 python_ori = python # save original in case we bail on error
198 198 python = python.strip()
199 199
200 200 # The compiler module does not like unicode. We need to
201 201 # encode it:
202 202 if isinstance(python, unicode):
203 203 # Use the utf-8-sig BOM so the compiler detects this as a UTF-8
204 204 # encoded string.
205 205 python = '\xef\xbb\xbf' + python.encode('utf-8')
206 206
207 207 # The compiler module will parse the code into an abstract syntax tree.
208 208 # This has a bug with str("a\nb"), but not str("""a\nb""")!!!
209 209 try:
210 210 ast = compiler.parse(python)
211 211 except:
212 212 return [python_ori]
213 213
214 214 # Uncomment to help debug the ast tree
215 215 # for n in ast.node:
216 216 # print n.lineno,'->',n
217 217
218 218 # Each separate command is available by iterating over ast.node. The
219 219 # lineno attribute is the line number (1-indexed) beginning the command's
220 220 # suite.
221 221 # lines ending with ";" yield a Discard Node that doesn't have a lineno
222 222 # attribute. These nodes can and should be discarded. But there are
223 223 # other situations that cause Discard nodes that shouldn't be discarded.
224 224 # We might eventually discover other cases where lineno is None and have
225 225 # to put in a more sophisticated test.
226 226 linenos = [x.lineno-1 for x in ast.node if x.lineno is not None]
227 227
228 228 # When we finally get the slices, we will need to slice all the way to
229 229 # the end even though we don't have a line number for it. Fortunately,
230 230 # None does the job nicely.
231 231 linenos.append(None)
232 232
233 233 # Same problem at the other end: sometimes the ast tree has its
234 234 # first complete statement not starting on line 0. In this case
235 235 # we might miss part of it. This fixes ticket 266993. Thanks Gael!
236 236 linenos[0] = 0
237 237
238 238 lines = python.splitlines()
239 239
240 240 # Create a list of atomic commands.
241 241 cmds = []
242 242 for i, j in zip(linenos[:-1], linenos[1:]):
243 243 cmd = lines[i:j]
244 244 if cmd:
245 245 cmds.append('\n'.join(cmd)+'\n')
246 246
247 247 return cmds
248 248
249 249
250 250 class InputSplitter(object):
251 251 """An object that can split Python source input in executable blocks.
252 252
253 253 This object is designed to be used in one of two basic modes:
254 254
255 255 1. By feeding it python source line-by-line, using :meth:`push`. In this
256 256 mode, it will return on each push whether the currently pushed code
257 257 could be executed already. In addition, it provides a method called
258 258 :meth:`push_accepts_more` that can be used to query whether more input
259 259 can be pushed into a single interactive block.
260 260
261 261 2. By calling :meth:`split_blocks` with a single, multiline Python string,
262 262 that is then split into blocks each of which can be executed
263 263 interactively as a single statement.
264 264
265 265 This is a simple example of how an interactive terminal-based client can use
266 266 this tool::
267 267
268 268 isp = InputSplitter()
269 269 while isp.push_accepts_more():
270 270 indent = ' '*isp.indent_spaces
271 271 prompt = '>>> ' + indent
272 272 line = indent + raw_input(prompt)
273 273 isp.push(line)
274 274 print 'Input source was:\n', isp.source_reset(),
275 275 """
276 276 # Number of spaces of indentation computed from input that has been pushed
277 277 # so far. This is the attribute callers should query to get the current
278 278 # indentation level, in order to provide auto-indent facilities.
279 279 indent_spaces = 0
280 280 # String, indicating the default input encoding. It is computed by default
281 281 # at initialization time via get_input_encoding(), but it can be reset by a
282 282 # client with specific knowledge of the encoding.
283 283 encoding = ''
284 284 # String where the current full source input is stored, properly encoded.
285 285 # Reading this attribute is the normal way of querying the currently pushed
286 286 # source code, that has been properly encoded.
287 287 source = ''
288 288 # Code object corresponding to the current source. It is automatically
289 289 # synced to the source, so it can be queried at any time to obtain the code
290 290 # object; it will be None if the source doesn't compile to valid Python.
291 291 code = None
292 292 # Input mode
293 293 input_mode = 'line'
294 294
295 295 # Private attributes
296 296
297 297 # List with lines of input accumulated so far
298 298 _buffer = None
299 299 # Command compiler
300 300 _compile = None
301 301 # Mark when input has changed indentation all the way back to flush-left
302 302 _full_dedent = False
303 303 # Boolean indicating whether the current block is complete
304 304 _is_complete = None
305 305
306 306 def __init__(self, input_mode=None):
307 307 """Create a new InputSplitter instance.
308 308
309 309 Parameters
310 310 ----------
311 311 input_mode : str
312 312
313 313 One of ['line', 'cell']; default is 'line'.
314 314
315 315 The input_mode parameter controls how new inputs are used when fed via
316 316 the :meth:`push` method:
317 317
318 318 - 'line': meant for line-oriented clients, inputs are appended one at a
319 319 time to the internal buffer and the whole buffer is compiled.
320 320
321 321 - 'cell': meant for clients that can edit multi-line 'cells' of text at
322 322 a time. A cell can contain one or more blocks that can be compiled in
323 323 'single' mode by Python. In this mode, each new input
324 324 completely replaces all prior inputs. Cell mode is thus equivalent
325 325 to prepending a full reset() to every push() call.
326 326 """
327 327 self._buffer = []
328 328 self._compile = codeop.CommandCompiler()
329 329 self.encoding = get_input_encoding()
330 330 self.input_mode = InputSplitter.input_mode if input_mode is None \
331 331 else input_mode
332 332
333 333 def reset(self):
334 334 """Reset the input buffer and associated state."""
335 335 self.indent_spaces = 0
336 336 self._buffer[:] = []
337 337 self.source = ''
338 338 self.code = None
339 339 self._is_complete = False
340 340 self._full_dedent = False
341 341
342 342 def source_reset(self):
343 343 """Return the input source and perform a full reset.
344 344 """
345 345 out = self.source
346 346 self.reset()
347 347 return out
348 348
349 349 def push(self, lines):
350 350 """Push one or more lines of input.
351 351
352 352 This stores the given lines and returns a status code indicating
353 353 whether the code forms a complete Python block or not.
354 354
355 355 Any exceptions generated in compilation are swallowed, but if an
356 356 exception was produced, the method returns True.
357 357
358 358 Parameters
359 359 ----------
360 360 lines : string
361 361 One or more lines of Python input.
362 362
363 363 Returns
364 364 -------
365 365 is_complete : boolean
366 366 True if the current input source (the result of the current input
367 367 plus prior inputs) forms a complete Python execution block. Note that
368 368 this value is also stored as a private attribute (_is_complete), so it
369 369 can be queried at any time.
370 370 """
371 371 if self.input_mode == 'cell':
372 372 self.reset()
373 373
374 374 # If the source code has leading blanks, add 'if 1:\n' to it
375 375 # this allows execution of indented pasted code. It is tempting
376 376 # to add '\n' at the end of source to run commands like ' a=1'
377 377 # directly, but this fails for more complicated scenarios
378 378
379 379 if not self._buffer and lines[:1] in [' ', '\t'] and \
380 380 not comment_line_re.match(lines):
381 381 lines = 'if 1:\n%s' % lines
382 382
383 383 self._store(lines)
384 384 source = self.source
385 385
386 386 # Before calling _compile(), reset the code object to None so that if an
387 387 # exception is raised in compilation, we don't mislead by having
388 388 # inconsistent code/source attributes.
389 389 self.code, self._is_complete = None, None
390 390
391 391 # Honor termination lines properly
392 392 if source.rstrip().endswith('\\'):
393 393 return False
394 394
395 395 self._update_indent(lines)
396 396 try:
397 397 self.code = self._compile(source)
398 398 # Invalid syntax can produce any of a number of different errors from
399 399 # inside the compiler, so we have to catch them all. Syntax errors
400 400 # immediately produce a 'ready' block, so the invalid Python can be
401 401 # sent to the kernel for evaluation with possible ipython
402 402 # special-syntax conversion.
403 403 except (SyntaxError, OverflowError, ValueError, TypeError,
404 404 MemoryError):
405 405 self._is_complete = True
406 406 else:
407 407 # Compilation didn't produce any exceptions (though it may not have
408 408 # given a complete code object)
409 409 self._is_complete = self.code is not None
410 410
411 411 return self._is_complete
412 412
413 413 def push_accepts_more(self):
414 414 """Return whether a block of interactive input can accept more input.
415 415
416 416 This method is meant to be used by line-oriented frontends, who need to
417 417 guess whether a block is complete or not based solely on prior and
418 418 current input lines. The InputSplitter considers it has a complete
419 419 interactive block and will not accept more input only when either a
420 420 SyntaxError is raised, or *all* of the following are true:
421 421
422 422 1. The input compiles to a complete statement.
423 423
424 424 2. The indentation level is flush-left (because if we are indented,
425 425 like inside a function definition or for loop, we need to keep
426 426 reading new input).
427 427
428 428 3. There is one extra line consisting only of whitespace.
429 429
430 430 Because of condition #3, this method should be used only by
431 431 *line-oriented* frontends, since it means that intermediate blank lines
432 432 are not allowed in function definitions (or any other indented block).
433 433
434 434 Block-oriented frontends that have a separate keyboard event to
435 435 indicate execution should use the :meth:`split_blocks` method instead.
436 436
437 437 If the current input produces a syntax error, this method immediately
438 438 returns False but does *not* raise the syntax error exception, as
439 439 typically clients will want to send invalid syntax to an execution
440 440 backend which might convert the invalid syntax into valid Python via
441 441 one of the dynamic IPython mechanisms.
442 442 """
443 443
444 444 # With incomplete input, unconditionally accept more
445 445 if not self._is_complete:
446 446 return True
447 447
448 448 # If we already have complete input and we're flush left, the answer
449 449 # depends. In line mode, we're done. But in cell mode, we need to
450 450 # check how many blocks the input so far compiles into, because if
451 451 # there's already more than one full independent block of input, then
452 452 # the client has entered full 'cell' mode and is feeding lines that
453 453 # are each complete. In this case we should then keep accepting.
454 454 # The Qt terminal-like console does precisely this, to provide the
455 455 # convenience of terminal-like input of single expressions, but
456 456 # allowing the user (with a separate keystroke) to switch to 'cell'
457 457 # mode and type multiple expressions in one shot.
458 458 if self.indent_spaces==0:
459 459 if self.input_mode=='line':
460 460 return False
461 461 else:
462 462 nblocks = len(split_blocks(''.join(self._buffer)))
463 463 if nblocks==1:
464 464 return False
465 465
466 466 # When input is complete, then termination is marked by an extra blank
467 467 # line at the end.
468 468 last_line = self.source.splitlines()[-1]
469 469 return bool(last_line and not last_line.isspace())
470 470
471 471 def split_blocks(self, lines):
472 472 """Split a multiline string into multiple input blocks.
473 473
474 474 Note: this method starts by performing a full reset().
475 475
476 476 Parameters
477 477 ----------
478 478 lines : str
479 479 A possibly multiline string.
480 480
481 481 Returns
482 482 -------
483 483 blocks : list
484 484 A list of strings, each possibly multiline. Each string corresponds
485 485 to a single block that can be compiled in 'single' mode (unless it
486 486 has a syntax error)."""
487 487
488 488 # This code is fairly delicate. If you make any changes here, make
489 489 # absolutely sure that you do run the full test suite and ALL tests
490 490 # pass.
491 491
492 492 self.reset()
493 493 blocks = []
494 494
495 495 # Reversed copy so we can use pop() efficiently and consume the input
496 496 # as a stack
497 497 lines = lines.splitlines()[::-1]
498 498 # Outer loop over all input
499 499 while lines:
500 500 #print 'Current lines:', lines # dbg
501 501 # Inner loop to build each block
502 502 while True:
503 503 # Safety exit from inner loop
504 504 if not lines:
505 505 break
506 506 # Grab next line but don't push it yet
507 507 next_line = lines.pop()
508 508 # Blank/empty lines are pushed as-is
509 509 if not next_line or next_line.isspace():
510 510 self.push(next_line)
511 511 continue
512 512
513 513 # Check indentation changes caused by the *next* line
514 514 indent_spaces, _full_dedent = self._find_indent(next_line)
515 515
516 516 # If the next line causes a dedent, it can be for two different
517 517 # reasons: either an explicit de-dent by the user or a
518 518 # return/raise/pass statement. These MUST be handled
519 519 # separately:
520 520 #
521 521 # 1. the first case is only detected when the actual explicit
522 522 # dedent happens, and that would be the *first* line of a *new*
523 523 # block. Thus, we must put the line back into the input buffer
524 524 # so that it starts a new block on the next pass.
525 525 #
526 526 # 2. the second case is detected in the line before the actual
527 527 # dedent happens, so we consume the line and we can break out
528 528 # to start a new block.
529 529
530 530 # Case 1, explicit dedent causes a break.
531 531 # Note: check that we weren't on the very last line, else we'll
532 532 # enter an infinite loop adding/removing the last line.
533 533 if _full_dedent and lines and not next_line.startswith(' '):
534 534 lines.append(next_line)
535 535 break
536 536
537 537 # Otherwise any line is pushed
538 538 self.push(next_line)
539 539
540 540 # Case 2, full dedent with full block ready:
541 541 if _full_dedent or \
542 542 self.indent_spaces==0 and not self.push_accepts_more():
543 543 break
544 544 # Form the new block with the current source input
545 545 blocks.append(self.source_reset())
546 546
547 547 #return blocks
548 548 # HACK!!! Now that our input is in blocks but guaranteed to be pure
549 549 # python syntax, feed it back a second time through the AST-based
550 550 # splitter, which is more accurate than ours.
551 551 return split_blocks(''.join(blocks))
552 552
553 553 #------------------------------------------------------------------------
554 554 # Private interface
555 555 #------------------------------------------------------------------------
556 556
557 557 def _find_indent(self, line):
558 558 """Compute the new indentation level for a single line.
559 559
560 560 Parameters
561 561 ----------
562 562 line : str
563 563 A single new line of non-whitespace, non-comment Python input.
564 564
565 565 Returns
566 566 -------
567 567 indent_spaces : int
568 568 New value for the indent level (it may be equal to self.indent_spaces
569 569 if indentation doesn't change).
570 570
571 571 full_dedent : boolean
572 572 Whether the new line causes a full flush-left dedent.
573 573 """
574 574 indent_spaces = self.indent_spaces
575 575 full_dedent = self._full_dedent
576 576
577 577 inisp = num_ini_spaces(line)
578 578 if inisp < indent_spaces:
579 579 indent_spaces = inisp
580 580 if indent_spaces <= 0:
581 581 #print 'Full dedent in text',self.source # dbg
582 582 full_dedent = True
583 583
584 584 if line[-1] == ':':
585 585 indent_spaces += 4
586 586 elif dedent_re.match(line):
587 587 indent_spaces -= 4
588 588 if indent_spaces <= 0:
589 589 full_dedent = True
590 590
591 591 # Safety
592 592 if indent_spaces < 0:
593 593 indent_spaces = 0
594 594 #print 'safety' # dbg
595 595
596 596 return indent_spaces, full_dedent
597 597
598 598 def _update_indent(self, lines):
599 599 for line in remove_comments(lines).splitlines():
600 600 if line and not line.isspace():
601 601 self.indent_spaces, self._full_dedent = self._find_indent(line)
602 602
603 603 def _store(self, lines):
604 604 """Store one or more lines of input.
605 605
606 606 If input lines are not newline-terminated, a newline is automatically
607 607 appended."""
608 608
609 609 if lines.endswith('\n'):
610 610 self._buffer.append(lines)
611 611 else:
612 612 self._buffer.append(lines+'\n')
613 613 self._set_source()
614 614
615 615 def _set_source(self):
616 616 self.source = ''.join(self._buffer).encode(self.encoding)
617 617
618 618
619 619 #-----------------------------------------------------------------------------
620 620 # Functions and classes for IPython-specific syntactic support
621 621 #-----------------------------------------------------------------------------
622 622
623 623 # RegExp for splitting line contents into pre-char//first word-method//rest.
624 624 # For clarity, each group is on one line.
625 625
626 626 line_split = re.compile("""
627 627 ^(\s*) # any leading space
628 628 ([,;/%]|!!?|\?\??) # escape character or characters
629 629 \s*(%?[\w\.\*]*) # function/method, possibly with leading %
630 630 # to correctly treat things like '?%magic'
631 631 (\s+.*$|$) # rest of line
632 632 """, re.VERBOSE)
633 633
634 634
635 635 def split_user_input(line):
636 636 """Split user input into early whitespace, esc-char, function part and rest.
637 637
638 638 This currently handles lines with '=' in them in a very inconsistent
639 639 manner.
640 640
641 641 Examples
642 642 ========
643 643 >>> split_user_input('x=1')
644 644 ('', '', 'x=1', '')
645 645 >>> split_user_input('?')
646 646 ('', '?', '', '')
647 647 >>> split_user_input('??')
648 648 ('', '??', '', '')
649 649 >>> split_user_input(' ?')
650 650 (' ', '?', '', '')
651 651 >>> split_user_input(' ??')
652 652 (' ', '??', '', '')
653 653 >>> split_user_input('??x')
654 654 ('', '??', 'x', '')
655 655 >>> split_user_input('?x=1')
656 656 ('', '', '?x=1', '')
657 657 >>> split_user_input('!ls')
658 658 ('', '!', 'ls', '')
659 659 >>> split_user_input(' !ls')
660 660 (' ', '!', 'ls', '')
661 661 >>> split_user_input('!!ls')
662 662 ('', '!!', 'ls', '')
663 663 >>> split_user_input(' !!ls')
664 664 (' ', '!!', 'ls', '')
665 665 >>> split_user_input(',ls')
666 666 ('', ',', 'ls', '')
667 667 >>> split_user_input(';ls')
668 668 ('', ';', 'ls', '')
669 669 >>> split_user_input(' ;ls')
670 670 (' ', ';', 'ls', '')
671 671 >>> split_user_input('f.g(x)')
672 672 ('', '', 'f.g(x)', '')
673 673 >>> split_user_input('f.g (x)')
674 674 ('', '', 'f.g', '(x)')
675 675 >>> split_user_input('?%hist')
676 676 ('', '?', '%hist', '')
677 677 >>> split_user_input('?x*')
678 678 ('', '?', 'x*', '')
679 679 """
680 680 match = line_split.match(line)
681 681 if match:
682 682 lspace, esc, fpart, rest = match.groups()
683 683 else:
684 684 # print "match failed for line '%s'" % line
685 685 try:
686 686 fpart, rest = line.split(None, 1)
687 687 except ValueError:
688 688 # print "split failed for line '%s'" % line
689 689 fpart, rest = line,''
690 690 lspace = re.match('^(\s*)(.*)', line).groups()[0]
691 691 esc = ''
692 692
693 693 # fpart has to be a valid python identifier, so it better be only pure
694 694 # ascii, no unicode:
695 695 try:
696 696 fpart = fpart.encode('ascii')
697 697 except UnicodeEncodeError:
698 698 lspace = unicode(lspace)
699 699 rest = fpart + u' ' + rest
700 700 fpart = u''
701 701
702 702 #print 'line:<%s>' % line # dbg
703 703 #print 'esc <%s> fpart <%s> rest <%s>' % (esc,fpart.strip(),rest) # dbg
704 704 return lspace, esc, fpart.strip(), rest.lstrip()
705 705
706 706
707 707 # The escaped translators ALL receive a line where their own escape has been
708 708 # stripped. Only '?' is valid at the end of the line, all others can only be
709 709 # placed at the start.
710 710
711 711 class LineInfo(object):
712 712 """A single line of input and associated info.
713 713
714 714 This is a utility class that mostly wraps the output of
715 715 :func:`split_user_input` into a convenient object to be passed around
716 716 during input transformations.
717 717
718 718 Includes the following as properties:
719 719
720 720 line
721 721 The original, raw line
722 722
723 723 lspace
724 724 Any early whitespace before actual text starts.
725 725
726 726 esc
727 727 The initial esc character (or characters, for double-char escapes like
728 728 '??' or '!!').
729 729
730 730 fpart
731 731 The 'function part', which is basically the maximal initial sequence
732 732 of valid python identifiers and the '.' character. This is what is
733 733 checked for alias and magic transformations, used for auto-calling,
734 734 etc.
735 735
736 736 rest
737 737 Everything else on the line.
738 738 """
739 739 def __init__(self, line):
740 740 self.line = line
741 741 self.lspace, self.esc, self.fpart, self.rest = \
742 742 split_user_input(line)
743 743
744 744 def __str__(self):
745 745 return "LineInfo [%s|%s|%s|%s]" % (self.lspace, self.esc,
746 746 self.fpart, self.rest)
747 747
748 748
749 749 # Transformations of the special syntaxes that don't rely on an explicit escape
750 750 # character but instead on patterns on the input line
751 751
752 752 # The core transformations are implemented as standalone functions that can be
753 753 # tested and validated in isolation. Each of these uses a regexp, we
754 754 # pre-compile these and keep them close to each function definition for clarity
755 755
756 756 _assign_system_re = re.compile(r'(?P<lhs>(\s*)([\w\.]+)((\s*,\s*[\w\.]+)*))'
757 757 r'\s*=\s*!\s*(?P<cmd>.*)')
758 758
759 759 def transform_assign_system(line):
760 760 """Handle the `files = !ls` syntax."""
761 761 m = _assign_system_re.match(line)
762 762 if m is not None:
763 763 cmd = m.group('cmd')
764 764 lhs = m.group('lhs')
765 765 expr = make_quoted_expr(cmd)
766 766 new_line = '%s = get_ipython().getoutput(%s)' % (lhs, expr)
767 767 return new_line
768 768 return line
769 769
770 770
771 771 _assign_magic_re = re.compile(r'(?P<lhs>(\s*)([\w\.]+)((\s*,\s*[\w\.]+)*))'
772 772 r'\s*=\s*%\s*(?P<cmd>.*)')
773 773
774 774 def transform_assign_magic(line):
775 775 """Handle the `a = %who` syntax."""
776 776 m = _assign_magic_re.match(line)
777 777 if m is not None:
778 778 cmd = m.group('cmd')
779 779 lhs = m.group('lhs')
780 780 expr = make_quoted_expr(cmd)
781 781 new_line = '%s = get_ipython().magic(%s)' % (lhs, expr)
782 782 return new_line
783 783 return line
784 784
785 785
786 786 _classic_prompt_re = re.compile(r'^([ \t]*>>> |^[ \t]*\.\.\. )')
787 787
788 788 def transform_classic_prompt(line):
789 789 """Handle inputs that start with '>>> ' syntax."""
790 790
791 791 if not line or line.isspace():
792 792 return line
793 793 m = _classic_prompt_re.match(line)
794 794 if m:
795 795 return line[len(m.group(0)):]
796 796 else:
797 797 return line
798 798
799 799
800 800 _ipy_prompt_re = re.compile(r'^([ \t]*In \[\d+\]: |^[ \t]*\ \ \ \.\.\.+: )')
801 801
802 802 def transform_ipy_prompt(line):
803 803 """Handle inputs that start with classic IPython prompt syntax."""
804 804
805 805 if not line or line.isspace():
806 806 return line
807 807 #print 'LINE: %r' % line # dbg
808 808 m = _ipy_prompt_re.match(line)
809 809 if m:
810 810 #print 'MATCH! %r -> %r' % (line, line[len(m.group(0)):]) # dbg
811 811 return line[len(m.group(0)):]
812 812 else:
813 813 return line
814 814
815 815
816 816 class EscapedTransformer(object):
817 817 """Class to transform lines that are explicitly escaped out."""
818 818
819 819 def __init__(self):
820 820 tr = { ESC_SHELL : self._tr_system,
821 821 ESC_SH_CAP : self._tr_system2,
822 822 ESC_HELP : self._tr_help,
823 823 ESC_HELP2 : self._tr_help,
824 824 ESC_MAGIC : self._tr_magic,
825 825 ESC_QUOTE : self._tr_quote,
826 826 ESC_QUOTE2 : self._tr_quote2,
827 827 ESC_PAREN : self._tr_paren }
828 828 self.tr = tr
829 829
830 830 # Support for syntax transformations that use explicit escapes typed by the
831 831 # user at the beginning of a line
832 832 @staticmethod
833 833 def _tr_system(line_info):
834 834 "Translate lines escaped with: !"
835 835 cmd = line_info.line.lstrip().lstrip(ESC_SHELL)
836 836 return '%sget_ipython().system(%s)' % (line_info.lspace,
837 837 make_quoted_expr(cmd))
838 838
839 839 @staticmethod
840 840 def _tr_system2(line_info):
841 841 "Translate lines escaped with: !!"
842 842 cmd = line_info.line.lstrip()[2:]
843 843 return '%sget_ipython().getoutput(%s)' % (line_info.lspace,
844 844 make_quoted_expr(cmd))
845 845
846 846 @staticmethod
847 847 def _tr_help(line_info):
848 848 "Translate lines escaped with: ?/??"
849 849 # A naked help line should just fire the intro help screen
850 850 if not line_info.line[1:]:
851 851 return 'get_ipython().show_usage()'
852 852
853 853 # There may be one or two '?' at the end, move them to the front so that
854 854 # the rest of the logic can assume escapes are at the start
855 855 l_ori = line_info
856 856 line = line_info.line
857 857 if line.endswith('?'):
858 858 line = line[-1] + line[:-1]
859 859 if line.endswith('?'):
860 860 line = line[-1] + line[:-1]
861 861 line_info = LineInfo(line)
862 862
863 863 # From here on, simply choose which level of detail to get, and
864 864 # special-case the psearch syntax
865 pinfo = 'pinfo' # default
865 866 if '*' in line_info.line:
866 867 pinfo = 'psearch'
867 elif line_info.esc == '?':
868 pinfo = 'pinfo'
869 868 elif line_info.esc == '??':
870 869 pinfo = 'pinfo2'
871 870
872 871 tpl = '%sget_ipython().magic("%s %s")'
873 872 return tpl % (line_info.lspace, pinfo,
874 873 ' '.join([line_info.fpart, line_info.rest]).strip())
875 874
876 875 @staticmethod
877 876 def _tr_magic(line_info):
878 877 "Translate lines escaped with: %"
879 878 tpl = '%sget_ipython().magic(%s)'
880 879 cmd = make_quoted_expr(' '.join([line_info.fpart,
881 880 line_info.rest]).strip())
882 881 return tpl % (line_info.lspace, cmd)
883 882
884 883 @staticmethod
885 884 def _tr_quote(line_info):
886 885 "Translate lines escaped with: ,"
887 886 return '%s%s("%s")' % (line_info.lspace, line_info.fpart,
888 887 '", "'.join(line_info.rest.split()) )
889 888
890 889 @staticmethod
891 890 def _tr_quote2(line_info):
892 891 "Translate lines escaped with: ;"
893 892 return '%s%s("%s")' % (line_info.lspace, line_info.fpart,
894 893 line_info.rest)
895 894
896 895 @staticmethod
897 896 def _tr_paren(line_info):
898 897 "Translate lines escaped with: /"
899 898 return '%s%s(%s)' % (line_info.lspace, line_info.fpart,
900 899 ", ".join(line_info.rest.split()))
901 900
902 901 def __call__(self, line):
903 902 """Class to transform lines that are explicitly escaped out.
904 903
905 904 This calls the above _tr_* static methods for the actual line
906 905 translations."""
907 906
908 907 # Empty lines just get returned unmodified
909 908 if not line or line.isspace():
910 909 return line
911 910
912 911 # Get line endpoints, where the escapes can be
913 912 line_info = LineInfo(line)
914 913
915 914 # If the escape is not at the start, only '?' needs to be special-cased.
916 915 # All other escapes are only valid at the start
917 916 if not line_info.esc in self.tr:
918 917 if line.endswith(ESC_HELP):
919 918 return self._tr_help(line_info)
920 919 else:
921 920 # If we don't recognize the escape, don't modify the line
922 921 return line
923 922
924 923 return self.tr[line_info.esc](line_info)
925 924
926 925
927 926 # A function-looking object to be used by the rest of the code. The purpose of
928 927 # the class in this case is to organize related functionality, more than to
929 928 # manage state.
930 929 transform_escaped = EscapedTransformer()
931 930
932 931
933 932 class IPythonInputSplitter(InputSplitter):
934 933 """An input splitter that recognizes all of IPython's special syntax."""
935 934
936 935 def push(self, lines):
937 936 """Push one or more lines of IPython input.
938 937 """
939 938 if not lines:
940 939 return super(IPythonInputSplitter, self).push(lines)
941 940
942 941 lines_list = lines.splitlines()
943 942
944 943 transforms = [transform_escaped, transform_assign_system,
945 944 transform_assign_magic, transform_ipy_prompt,
946 945 transform_classic_prompt]
947 946
948 947 # Transform logic
949 948 #
950 949 # We only apply the line transformers to the input if we have either no
951 950 # input yet, or complete input, or if the last line of the buffer ends
952 951 # with ':' (opening an indented block). This prevents the accidental
953 952 # transformation of escapes inside multiline expressions like
954 953 # triple-quoted strings or parenthesized expressions.
955 954 #
956 955 # The last heuristic, while ugly, ensures that the first line of an
957 956 # indented block is correctly transformed.
958 957 #
959 958 # FIXME: try to find a cleaner approach for this last bit.
960 959
961 960 # If we were in 'block' mode, since we're going to pump the parent
962 961 # class by hand line by line, we need to temporarily switch out to
963 962 # 'line' mode, do a single manual reset and then feed the lines one
964 963 # by one. Note that this only matters if the input has more than one
965 964 # line.
966 965 changed_input_mode = False
967 966
968 967 if len(lines_list)>1 and self.input_mode == 'cell':
969 968 self.reset()
970 969 changed_input_mode = True
971 970 saved_input_mode = 'cell'
972 971 self.input_mode = 'line'
973 972
974 973 try:
975 974 push = super(IPythonInputSplitter, self).push
976 975 for line in lines_list:
977 976 if self._is_complete or not self._buffer or \
978 977 (self._buffer and self._buffer[-1].rstrip().endswith(':')):
979 978 for f in transforms:
980 979 line = f(line)
981 980
982 981 out = push(line)
983 982 finally:
984 983 if changed_input_mode:
985 984 self.input_mode = saved_input_mode
986 985
987 986 return out
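
The only behavioural change in this diff is in _tr_help: 'pinfo' is now assigned as a default before the branch, and the explicit `elif line_info.esc == '?'` arm is dropped. A minimal sketch of why that matters, reading the diff on its own: the helper names choose_pinfo_old and choose_pinfo_new are hypothetical and exist only for illustration, and the `esc` argument stands in for `line_info.esc`. With the old branching, an escape value that is neither '?' nor '??' (and a line without '*') leaves the local name unbound.

```python
def choose_pinfo_old(line, esc):
    """Pre-fix branching (hypothetical standalone copy of the removed logic)."""
    if '*' in line:
        pinfo = 'psearch'
    elif esc == '?':
        pinfo = 'pinfo'
    elif esc == '??':
        pinfo = 'pinfo2'
    # If no branch matched, 'pinfo' was never assigned.
    return pinfo


def choose_pinfo_new(line, esc):
    """Post-fix branching: the default guarantees 'pinfo' is always bound."""
    pinfo = 'pinfo'  # default
    if '*' in line:
        pinfo = 'psearch'
    elif esc == '??':
        pinfo = 'pinfo2'
    return pinfo


if __name__ == '__main__':
    # An escape value the old code did not anticipate (assumed here to be '').
    try:
        choose_pinfo_old('x?', '')
    except UnboundLocalError as err:
        print('old logic fails:', err)
    print('new logic returns:', choose_pinfo_new('x?', ''))
```

Assigning the default first also makes the '?' case redundant, which is presumably why that branch was removed rather than kept alongside the default.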