Backport PR #2738: Unicode content crashes the pager (console)

We've run into an interesting bug in the astropy project. https://github.com/astropy/astropy/issues/600

When displaying a docstring that contains Unicode and is also long enough that it gets sent to the pager it fails since the docstring can't be sent to the pager as ascii. This crashes in the middle of sending content to the pager, so the shell ends up in an inconsistent state and stops echoing the keyboard etc.

The fix (attached) is merely to encode the content sent to the pager in the same encoding as the terminal (`sys.stdout.encoding`). Strictly speaking, this isn't always the right thing to do, since the pager may be configured to expect a different encoding than the terminal, but that is sort of an irrational way to configure a machine... ;) For example, `less`, in the absence of any special environment variables to tell it otherwise, uses the standard `LC*` environment variables to determine what to do, which should be the same mechanism the terminal also uses by default.

If anyone can suggest a better fix, I'm all for it. Perhaps it should be configurable, defaulting to `sys.stdout.encoding`?
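
The idea behind the fix can be sketched roughly as follows. This is a minimal illustration and not the actual patch from the PR: the helper name `encode_for_pager`, the UTF-8 fallback when the terminal reports no encoding, and the choice of `errors="replace"` are all assumptions made for this example.

```python
import sys


def encode_for_pager(text, encoding=None):
    """Hypothetical helper: encode text in the terminal's encoding.

    `encoding` defaults to sys.stdout.encoding, as the commit message
    suggests; the UTF-8 fallback is an assumption of this sketch.
    """
    if encoding is None:
        # sys.stdout may have been replaced by an object with no usable
        # `encoding` attribute, so guard against a missing or None value.
        encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
    # "replace" substitutes "?" for characters the target encoding cannot
    # represent, instead of raising UnicodeEncodeError mid-output.
    return text.encode(encoding, "replace")


if __name__ == "__main__":
    print(encode_for_pager(u"caf\u00e9", "utf-8"))
    print(encode_for_pager(u"caf\u00e9", "ascii"))
```

Making the encoding a parameter mirrors the closing suggestion in the commit message that the behavior could be configurable while defaulting to `sys.stdout.encoding`.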

splitinput.py
137 lines | 4.7 KiB | text/x-python | PythonLexer
# encoding: utf-8
"""
Simple utility for splitting user input. This is used by both inputsplitter and
prefilter.
Authors:
* Brian Granger
* Fernando Perez
"""
#-----------------------------------------------------------------------------
# Copyright (C) 2008-2011 The IPython Development Team
#
# Distributed under the terms of the BSD License. The full license is in
# the file COPYING, distributed as part of this software.
#-----------------------------------------------------------------------------
#-----------------------------------------------------------------------------
# Imports
#-----------------------------------------------------------------------------
import re
import sys
from IPython.utils import py3compat
from IPython.utils.encoding import get_stream_enc
#-----------------------------------------------------------------------------
# Main function
#-----------------------------------------------------------------------------
# RegExp for splitting line contents into pre-char//first word-method//rest.
# For clarity, each group is on one line.

# WARNING: update the regexp if the escapes in interactiveshell are changed, as
# they are hardwired in.

# Although it's not solely driven by the regex, note that:
# ,;/% only trigger if they are the first character on the line
# ! and !! trigger if they are first char(s) *or* follow an indent
# ? triggers as first or last char.

# A raw string keeps backslash sequences such as \s and \? intact for the
# regex engine instead of relying on Python passing unknown escapes through.
line_split = re.compile(r"""
             ^(\s*)               # any leading space
             ([,;/%]|!!?|\?\??)?  # escape character or characters
             \s*(%{0,2}[\w\.\*]*) # function/method, possibly with leading %
                                  # to correctly treat things like '?%magic'
             (.*?$|$)             # rest of line
             """, re.VERBOSE)

def split_user_input(line, pattern=None):
    """Split user input into initial whitespace, escape character, function part
    and the rest.
    """
    # We need to ensure that the rest of this routine deals only with unicode
    encoding = get_stream_enc(sys.stdin, 'utf-8')
    line = py3compat.cast_unicode(line, encoding)

    if pattern is None:
        pattern = line_split
    match = pattern.match(line)
    if not match:
        # print "match failed for line '%s'" % line
        try:
            ifun, the_rest = line.split(None, 1)
        except ValueError:
            # print "split failed for line '%s'" % line
            ifun, the_rest = line, u''
        pre = re.match(r'^(\s*)(.*)', line).groups()[0]
        esc = ""
    else:
        pre, esc, ifun, the_rest = match.groups()

    #print 'line:<%s>' % line # dbg
    #print 'pre <%s> ifun <%s> rest <%s>' % (pre,ifun.strip(),the_rest) # dbg
    return pre, esc or '', ifun.strip(), the_rest.lstrip()


class LineInfo(object):
    """A single line of input and associated info.

    Includes the following as properties:

    line
      The original, raw line

    continue_prompt
      Is this line a continuation in a sequence of multiline input?

    pre
      Any leading whitespace.

    esc
      The escape character(s) in pre or the empty string if there isn't one.
      Note that '!!' and '??' are possible values for esc. Otherwise it will
      always be a single character.

    ifun
      The 'function part', which is basically the maximal initial sequence
      of valid python identifiers and the '.' character. This is what is
      checked for alias and magic transformations, used for auto-calling,
      etc. In contrast to Python identifiers, it may start with "%" and contain
      "*".

    the_rest
      Everything else on the line.
    """
    def __init__(self, line, continue_prompt=False):
        self.line = line
        self.continue_prompt = continue_prompt
        self.pre, self.esc, self.ifun, self.the_rest = split_user_input(line)

        self.pre_char = self.pre.strip()
        if self.pre_char:
            self.pre_whitespace = ''  # No whitespace allowed before esc chars
        else:
            self.pre_whitespace = self.pre

    def ofind(self, ip):
        """Do a full, attribute-walking lookup of the ifun in the various
        namespaces for the given IPython InteractiveShell instance.

        Return a dict with keys: {found, obj, ospace, ismagic}

        Note: can cause state changes because of calling getattr, but should
        only be run if autocall is on and if the line hasn't matched any
        other, less dangerous handlers.

        Does cache the results of the call, so can be called multiple times
        without worrying about *further* damaging state.
        """
        return ip._ofind(self.ifun)

    def __str__(self):
        return "LineInfo [%s|%s|%s|%s]" % (self.pre, self.esc, self.ifun, self.the_rest)
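
To see how the splitting behaves on typical input, here is a minimal standalone sketch. It reproduces the `line_split` pattern using only the standard library, so it runs without IPython installed. The names `LINE_SPLIT` and `split_line` are hypothetical, and the sketch assumes the input is already a text (unicode) string, so it skips the `get_stream_enc` / `cast_unicode` step.

```python
import re

# Same pattern as line_split in the listing, written as a raw string.
LINE_SPLIT = re.compile(r"""
    ^(\s*)                 # any leading space
    ([,;/%]|!!?|\?\??)?    # escape character or characters
    \s*(%{0,2}[\w\.\*]*)   # function/method, possibly with leading %
    (.*?$|$)               # rest of line
    """, re.VERBOSE)


def split_line(line):
    """Hypothetical stand-in for split_user_input; expects a text string."""
    match = LINE_SPLIT.match(line)
    if not match:
        # Fallback for input the pattern rejects, e.g. embedded newlines.
        try:
            ifun, the_rest = line.split(None, 1)
        except ValueError:
            ifun, the_rest = line, ''
        pre = re.match(r'^(\s*)(.*)', line).groups()[0]
        esc = ''
    else:
        pre, esc, ifun, the_rest = match.groups()
    return pre, esc or '', ifun.strip(), the_rest.lstrip()


if __name__ == "__main__":
    for sample in ["!ls -la", "  %timeit x = 1", "?%magic", "obj.method arg"]:
        print(repr(sample), "->", split_line(sample))
```

Each result is a `(pre, esc, ifun, the_rest)` tuple, matching the four values that `LineInfo.__init__` unpacks.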