upstream/mercurial-mirror Files · contrib/import-checker.py


parsers: fail fast if Python has wrong minor version (issue4110)

This change causes an informative ImportError to be raised when importing the extension module parsers if the minor version of the currently-running Python interpreter doesn't match that of the Python that was used when compiling the extension module.

Here is an example of what the new error looks like:

    Traceback (most recent call last):
      File "test.py", line 1, in <module>
        import mercurial.parsers
    ImportError: Python minor version mismatch: The Mercurial extension
    modules were compiled with Python 2.7.6, but Mercurial is currently using
    Python with sys.hexversion=33883888: Python 2.5.6
    (r256:88840, Nov 18 2012, 05:37:10)
    [GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))]
     at: /opt/local/Library/Frameworks/Python.framework/Versions/2.5/Resources/
      Python.app/Contents/MacOS/Python

The reason for raising an error in this scenario is that Python's C API is known not to be compatible from minor version to minor version, even if sys.api_version is the same. See for example this Python bug report about incompatibilities between 2.5 and 2.6+:

    http://bugs.python.org/issue8118

These incompatibilities can cause Mercurial to break in mysterious, unforeseen ways. For example, when Mercurial compiled with Python 2.7 was run with 2.5, the following crash occurred when running "hg status":

    http://bz.selenic.com/show_bug.cgi?id=4110

After this crash was fixed, running with Python 2.5 no longer crashes, but the following puzzling behavior still occurs:

    $ hg status
      ...
      File ".../mercurial/changelog.py", line 123, in __init__
        revlog.revlog.__init__(self, opener, "00changelog.i")
      File ".../mercurial/revlog.py", line 251, in __init__
        d = self._io.parseindex(i, self._inline)
      File ".../mercurial/revlog.py", line 158, in parseindex
        index, cache = parsers.parse_index2(data, inline)
    TypeError: data is not a string

which can be reproduced more simply with:

    import mercurial.parsers as parsers
    parsers.parse_index2("", True)

Both the crash and the TypeError occurred because the Python C API's PyString_Check returns the wrong value when the C header files from Python 2.7 are run with Python 2.5. This is an example of an incompatibility of the sort mentioned in the Python bug report above.

Failing fast with an informative error message will result in a better user experience in cases like the above. The information in the ImportError will also simplify troubleshooting for those on Mercurial mailing lists, the bug tracker, etc.

This patch only adds the version check to parsers.c, which is sufficient to affect command-line commands like "hg status" and "hg summary". An idea for a future improvement is to move the version-checking C code to a more central location, and have it run when importing all Mercurial extension modules and not just parsers.c.
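The sys.hexversion value quoted in the error message packs the interpreter version into one integer, with the major and minor numbers in the top 16 bits. The Python sketch below illustrates how such a value can be decoded and how a minor-version comparison could look. It is not the C code this change adds to parsers.c; the helper names decode_hexversion and same_minor_version are hypothetical, and 0x020706F0 is simply the hexversion of a Python 2.7.6 final release, used here for illustration.

```python
def decode_hexversion(hexversion):
    """Split a sys.hexversion-style integer into its documented fields."""
    major = (hexversion >> 24) & 0xFF
    minor = (hexversion >> 16) & 0xFF
    micro = (hexversion >> 8) & 0xFF
    # Release level: 0xA alpha, 0xB beta, 0xC release candidate, 0xF final.
    release_level = (hexversion >> 4) & 0xF
    serial = hexversion & 0xF
    return major, minor, micro, release_level, serial


def same_minor_version(compiled_hexversion, running_hexversion):
    """Return True if both values share the same major.minor pair."""
    # The top 16 bits hold major and minor; micro and below are ignored.
    return (compiled_hexversion >> 16) == (running_hexversion >> 16)


# The value from the error message decodes to Python 2.5.6 final.
print(decode_hexversion(33883888))  # (2, 5, 6, 15, 0)

# Hypothetical check: module built with 2.7.6, interpreter is 2.5.6.
print(same_minor_version(0x020706F0, 33883888))  # False
```

Comparing only the top 16 bits means a micro-version difference, such as 2.7.3 against 2.7.6, is not treated as a mismatch.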

Augie Fackler

File last commit: r20038:c65a6937 default

r20155:21dafd85 default


import-checker.py | 221 lines | 6.8 KiB | text/x-python
            

      import ast
      import os
      import sys

      def dotted_name_of_path(path):
          """Given a relative path to a source file, return its dotted module name.

          >>> dotted_name_of_path('mercurial/error.py')
          'mercurial.error'
          """
          parts = path.split('/')
          parts[-1] = parts[-1][:-3] # remove .py
          return '.'.join(parts)

      def list_stdlib_modules():
          """List the modules present in the stdlib.

          >>> mods = set(list_stdlib_modules())
          >>> 'BaseHTTPServer' in mods
          True

          os.path isn't really a module, so it's missing:

          >>> 'os.path' in mods
          False

          sys requires special treatment, because it's baked into the
          interpreter, but it should still appear:

          >>> 'sys' in mods
          True

          >>> 'collections' in mods
          True

          >>> 'cStringIO' in mods
          True
          """
          for m in sys.builtin_module_names:
              yield m
          # These modules only exist on windows, but we should always
          # consider them stdlib.
          for m in ['msvcrt', '_winreg']:
              yield m
          # These get missed too
          for m in 'ctypes', 'email':
              yield m
          yield 'builtins' # python3 only
          for libpath in sys.path:
              # We want to walk everything in sys.path that starts with
              # either sys.prefix or sys.exec_prefix.
              if not (libpath.startswith(sys.prefix)
                      or libpath.startswith(sys.exec_prefix)):
                  continue
              if 'site-packages' in libpath:
                  continue
              for top, dirs, files in os.walk(libpath):
                  for name in files:
                      if name == '__init__.py':
                          continue
                      if not (name.endswith('.py') or name.endswith('.so')):
                          continue
                      full_path = os.path.join(top, name)
                      if 'site-packages' in full_path:
                          continue
                      rel_path = full_path[len(libpath) + 1:]
                      mod = dotted_name_of_path(rel_path)
                      yield mod

      stdlib_modules = set(list_stdlib_modules())

      def imported_modules(source, ignore_nested=False):
          """Given the source of a file as a string, yield the names
          imported by that file.

          Args:
            source: The python source to examine as a string.
            ignore_nested: If true, import statements that do not start in
                           column zero will be ignored.

          Returns:
            A list of module names imported by the given source.

          >>> sorted(imported_modules(
          ...         'import foo ; from baz import bar; import foo.qux'))
          ['baz.bar', 'foo', 'foo.qux']
          >>> sorted(imported_modules(
          ... '''import foo
          ... def wat():
          ...     import bar
          ... ''', ignore_nested=True))
          ['foo']
          """
          for node in ast.walk(ast.parse(source)):
              if ignore_nested and getattr(node, 'col_offset', 0) > 0:
                  continue
              if isinstance(node, ast.Import):
                  for n in node.names:
                      yield n.name
              elif isinstance(node, ast.ImportFrom):
                  prefix = node.module + '.'
                  for n in node.names:
                      yield prefix + n.name

      def verify_stdlib_on_own_line(source):
          """Given some python source, verify that stdlib imports are done
          in separate statements from relative local module imports.

          Observing this limitation is important as it works around an
          annoying lib2to3 bug in relative import rewrites:
          http://bugs.python.org/issue19510.

          >>> list(verify_stdlib_on_own_line('import sys, foo'))
          ['mixed stdlib and relative imports:\\n   foo, sys']
          >>> list(verify_stdlib_on_own_line('import sys, os'))
          []
          >>> list(verify_stdlib_on_own_line('import foo, bar'))
          []
          """
          for node in ast.walk(ast.parse(source)):
              if isinstance(node, ast.Import):
                  from_stdlib = {}
                  for n in node.names:
                      from_stdlib[n.name] = n.name in stdlib_modules
                  num_std = len([x for x in from_stdlib.values() if x])
                  if num_std not in (len(from_stdlib.values()), 0):
                      yield ('mixed stdlib and relative imports:\n   %s' %
                             ', '.join(sorted(from_stdlib.iterkeys())))

      class CircularImport(Exception):
          pass

      def cyclekey(names):
          return tuple(sorted(set(names)))

      def check_one_mod(mod, imports, path=None, ignore=None):
          if path is None:
              path = []
          if ignore is None:
              ignore = []
          path = path + [mod]
          for i in sorted(imports.get(mod, [])):
              if i not in stdlib_modules:
                  i = mod.rsplit('.', 1)[0] + '.' + i
              if i in path:
                  firstspot = path.index(i)
                  cycle = path[firstspot:] + [i]
                  if cyclekey(cycle) not in ignore:
                      raise CircularImport(cycle)
                  continue
              check_one_mod(i, imports, path=path, ignore=ignore)

      def rotatecycle(cycle):
          """Arrange a cycle so that the lexicographically first module is listed first.

          >>> rotatecycle(['foo', 'bar', 'foo'])
          ['bar', 'foo', 'bar']
          """
          lowest = min(cycle)
          idx = cycle.index(lowest)
          return cycle[idx:] + cycle[1:idx] + [lowest]

      def find_cycles(imports):
          """Find cycles in an already-loaded import graph.

          >>> imports = {'top.foo': ['bar', 'os.path', 'qux'],
          ...            'top.bar': ['baz', 'sys'],
          ...            'top.baz': ['foo'],
          ...            'top.qux': ['foo']}
          >>> print '\\n'.join(sorted(find_cycles(imports)))
          top.bar -> top.baz -> top.foo -> top.bar -> top.bar
          top.foo -> top.qux -> top.foo -> top.foo
          """
          cycles = {}
          for mod in sorted(imports.iterkeys()):
              try:
                  check_one_mod(mod, imports, ignore=cycles)
              except CircularImport, e:
                  cycle = e.args[0]
                  cycles[cyclekey(cycle)] = ' -> '.join(rotatecycle(cycle))
          return cycles.values()

      def _cycle_sortkey(c):
          return len(c), c

      def main(argv):
          if len(argv) < 2:
              # Fill in the program name; the original printed a literal '%s'.
              print 'Usage: %s file [file] [file] ...' % argv[0]
              return 1
          used_imports = {}
          any_errors = False
          for source_path in argv[1:]:
              f = open(source_path)
              modname = dotted_name_of_path(source_path)
              src = f.read()
              used_imports[modname] = sorted(
                  imported_modules(src, ignore_nested=True))
              for error in verify_stdlib_on_own_line(src):
                  any_errors = True
                  print source_path, error
              f.close()
          cycles = find_cycles(used_imports)
          if cycles:
              firstmods = set()
              for c in sorted(cycles, key=_cycle_sortkey):
                  first = c.split()[0]
                  # As a rough cut, ignore any cycle that starts with the
                  # same module as some other cycle. Otherwise we see lots
                  # of cycles that are effectively duplicates.
                  if first in firstmods:
                      continue
                  print 'Import cycle:', c
                  firstmods.add(first)
              any_errors = True
          # The result becomes the exit status, so return a true value only
          # when something went wrong ('not any_errors' inverted this).
          return any_errors

      if __name__ == '__main__':
          sys.exit(int(main(sys.argv)))
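The two core steps of the script, collecting top-level imports with ast and searching the resulting graph for cycles, can be tried in isolation. What follows is a minimal Python 3 sketch, not a port of the file above: the names top_level_imports and find_cycle are hypothetical, the module names in the sample data are invented, and the graph uses absolute module names rather than the script's rule of resolving non-stdlib names relative to the importing module's package.

```python
import ast


def top_level_imports(source):
    """Yield names imported by statements that start in column zero."""
    for node in ast.walk(ast.parse(source)):
        # Nested imports (inside functions or classes) start past column 0.
        if getattr(node, "col_offset", 0) > 0:
            continue
        if isinstance(node, ast.Import):
            for alias in node.names:
                yield alias.name
        elif isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                yield node.module + "." + alias.name


def find_cycle(graph, start, path=()):
    """Depth-first search: return the first cycle reachable from start."""
    path = path + (start,)
    for dep in sorted(graph.get(start, ())):
        if dep in path:
            # Close the loop by repeating the module where it began.
            return list(path[path.index(dep):]) + [dep]
        cycle = find_cycle(graph, dep, path)
        if cycle:
            return cycle
    return None


# Invented sample sources keyed by module name (assumption for the demo).
sources = {
    "top.foo": "import top.bar\nimport os\n",
    "top.bar": "import top.baz\n",
    "top.baz": "import top.foo\n\ndef lazy():\n    import top.qux\n",
}
graph = {name: sorted(top_level_imports(src)) for name, src in sources.items()}
cycle = find_cycle(graph, "top.foo")
print(" -> ".join(cycle))  # top.foo -> top.bar -> top.baz -> top.foo
```

The import inside lazy() is skipped, which mirrors the ignore_nested=True call in main: an import deferred into a function body is a common way to break a cycle on purpose.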

