upstream/mercurial-mirror Files · mercurial/filesetlang.py

deltas: set estimated compression upper bound to "3x" instead of "10x"

In practice, we very rarely observe compression better than "3x" on manifest deltas. Having a more aggressive estimate significantly helps our pathological use case on a private repository.

Here is a comparison of timings using different upper bounds.

Estimated compression | ø     | ×10  | ×5   | ×3   |
timing                | 14.11 | 2.61 | 1.96 | 1.53 |

We also tested the impact of this series on an array of public repositories. This showed no impact on either size or timing. The full data set is below for those interested.

Size
----

Regarding size, no significant impact has been noticed on either public or private repositories. Here are the numbers we gathered on public repositories:

zlib/upperbound | no            | 10x           | 5x            | 3x
mercurial       | 5 875 730     | 5 875 730     | 5 875 730     | 5 875 730
pypy            | 27 782 913    | 27 782 913    | 27 782 913    | 27 782 913
netbeans        | 159 161 207   | 159 161 207   | 159 161 207   | 159 959 879 (+0.5%)
mozilla-central | 323 841 642   | 323 841 642   | 323 841 642   | 319 867 519 (-2.5%)
mozilla-try     | 746 649 123   | 746 649 123   | 746 649 123   | 741 155 568 (-0.7%)
private-repo    | 1 485 287 294 | 1 485 287 294 | 1 485 287 294 | 1 409 248 382 (-5.1%)

zstd/upperbound | no            | 10x           | 5x            | 3x
mercurial       | 5 895 206     | 5 895 206     | 5 895 206     | 5 895 206
pypy            | 28 689 230    | 28 689 230    | 28 689 230    | 28 689 230
netbeans        | 157 636 387   | 157 636 387   | 157 636 387   | 159 692 678 (+1.3%)
mozilla-central | 317 650 281   | 317 650 281   | 317 650 281   | 319 613 603 (+0.6%)
mozilla-try     | 737 555 275   | 737 555 275   | 737 555 275   | 738 079 473 (+0.1%)
private-repo    | 1 352 362 982 | 1 352 362 982 | 1 346 961 880 | 1 361 327 384 (+0.7%)

Speed
-----

Timings gathered using `hg perfrevlogwrite -m`. Values are in seconds.

mercurial
zlib   | no        | 10x       | 5x        | 3x        |
total  | 65.551783 | 65.388887 | 65.260658 | 65.321199 |
max    | 0.034544  | 0.034571  | 0.034659  | 0.034521  |
99.99% | 0.034544  | 0.034571  | 0.034659  | 0.034521  |
zstd   | no        | 10x       | 5x        | 3x        |
total  | 49.118449 | 49.054062 | 48.753588 | 48.740230 |
max    | 0.009338  | 0.009239  | 0.009202  | 0.009178  |
99.99% | 0.007618  | 0.007639  | 0.007626  | 0.007621  |

pypy
zlib   | no         | 10x        | 5x         | 3x         |
total  | 560.865984 | 558.983817 | 559.083815 | 559.349152 |
max    | 0.219614   | 0.215922   | 0.218112   | 0.218107   |
99.99% | 0.219614   | 0.215922   | 0.218112   | 0.218107   |
zstd   | no         | 10x        | 5x         | 3x         |
total  | 349.393280 | 347.395819 | 347.185407 | 345.643985 |
max    | 0.084143   | 0.083536   | 0.081834   | 0.082178   |
99.99% | 0.039445   | 0.039639   | 0.039612   | 0.039175   |

netbeans
zlib   | no           | 10x          | 5x           | 3x           |
total  | 33103.327727 | 33314.932260 | 33211.745233 | 33345.891778 |
max    | 2.666852     | 2.672059     | 2.662453     | 2.662936     |
99.99% | 2.058772     | 2.070429     | 2.069569     | 2.064653     |
zstd   | no           | 10x          | 5x           | 3x           |
total  | 20112.102708 | 20095.879719 | 20083.390300 | 20123.221859 |
max    | 2.063482     | 2.062851     | 2.065229     | 2.060147     |
99.99% | 1.146647     | 1.143794     | 1.142933     | 1.146529     |

mozilla
zlib   | no           | 10x          | 5x           | 3x           |
total  | 41374.102138 | 41418.816773 | 41381.956370 | 41334.280732 |
max    | 3.383474     | 3.387400     | 3.405711     | 3.387316     |
99.99% | 1.006755     | 1.005954     | 1.007700     | 1.007373     |
zstd   | no           | 10x          | 5x           | 3x           |
total  | 24689.691520 | 24643.939662 | 24664.630027 | 24664.512714 |
max    | 1.460822     | 1.449640     | 1.439747     | 1.465304     |
99.99% | 0.527111     | 0.527377     | 0.527807     | 0.527226     |

Yuya Nishihara

File last commit: r38915:e79a69af (default)

r42669:4a3abb33 (default)

filesetlang.py · 330 lines · 10.2 KiB · text/x-python

Yuya Nishihara · r38841 · fileset: extract language processing part to new module (API)...
      # filesetlang.py - parser, tokenizer and utility for file set language
      #
      # Copyright 2010 Matt Mackall <mpm@selenic.com>
      #
      # This software may be used and distributed according to the terms of the
      # GNU General Public License version 2 or any later version.

      from __future__ import absolute_import

      from .i18n import _
      from . import (
          error,
          parser,
          pycompat,
      )

Yuya Nishihara · r38899 · fileset: introduce weight constants for readability...
      # common weight constants for static optimization
      # (see registrar.filesetpredicate for details)
      WEIGHT_CHECK_FILENAME = 0.5
      WEIGHT_READ_CONTENTS = 30
      WEIGHT_STATUS = 10
      WEIGHT_STATUS_THOROUGH = 50

Yuya Nishihara · r38841 · fileset: extract language processing part to new module (API)...
      elements = {
          # token-type: binding-strength, primary, prefix, infix, suffix
          "(": (20, None, ("group", 1, ")"), ("func", 1, ")"), None),
          ":": (15, None, None, ("kindpat", 15), None),
          "-": (5, None, ("negate", 19), ("minus", 5), None),
          "not": (10, None, ("not", 10), None, None),
          "!": (10, None, ("not", 10), None, None),
          "and": (5, None, None, ("and", 5), None),
          "&": (5, None, None, ("and", 5), None),
          "or": (4, None, None, ("or", 4), None),
          "|": (4, None, None, ("or", 4), None),
          "+": (4, None, None, ("or", 4), None),
          ",": (2, None, None, ("list", 2), None),
          ")": (0, None, None, None, None),
          "symbol": (0, "symbol", None, None, None),
          "string": (0, "string", None, None, None),
          "end": (0, None, None, None, None),
      }
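The `elements` table is consumed by `parser.parser`, a table-driven operator-precedence (Pratt-style) parser: each token carries a binding strength, an optional primary node type, and optional prefix/infix rules. The sketch below is a minimal, self-contained stand-in for that consumer, not Mercurial's code. It assumes a hypothetical subset of the table (`ELEMENTS`, with no suffix rules and no `func` infix rule) and a naive whitespace lexer (`lex`) in place of `tokenize()`, just to show how the binding strengths decide grouping.

```python
# Hypothetical subset of the fileset binding table, same tuple shape as
# `elements`: (binding-strength, primary, prefix, infix, suffix).
ELEMENTS = {
    "(": (20, None, ("group", 1, ")"), None, None),
    "not": (10, None, ("not", 10), None, None),
    "and": (5, None, None, ("and", 5), None),
    "or": (4, None, None, ("or", 4), None),
    ")": (0, None, None, None, None),
    "symbol": (0, "symbol", None, None, None),
    "end": (0, None, None, None, None),
}
OPERATORS = {"(", ")", "and", "or", "not"}


def lex(text):
    """Naive lexer: split on whitespace, treat parentheses as tokens."""
    words = text.replace("(", " ( ").replace(")", " ) ").split()
    toks = [(w, None, i) if w in OPERATORS else ("symbol", w, i)
            for i, w in enumerate(words)]
    toks.append(("end", None, len(words)))
    return toks


class TinyPratt(object):
    def __init__(self, elements, tokens):
        self.elements = elements
        self.tokens = iter(tokens)
        self.current = next(self.tokens)

    def advance(self):
        tok = self.current
        self.current = next(self.tokens, ("end", None, -1))
        return tok

    def operand(self, bind, closer=None):
        # Parse the right-hand side, then consume a closing token if required.
        expr = self.parse(bind)
        if closer:
            if self.current[0] != closer:
                raise SyntaxError("expected %r" % closer)
            self.advance()
        return expr

    def parse(self, bind=0):
        token, value, _pos = self.advance()
        _, primary, prefix, _, _ = self.elements[token]
        if primary:
            expr = (primary, value)
        elif prefix:
            expr = (prefix[0], self.operand(*prefix[1:]))
        else:
            raise SyntaxError("not a prefix: %s" % token)
        # Keep absorbing infix operators that bind tighter than `bind`.
        while bind < self.elements[self.current[0]][0]:
            token, value, _pos = self.advance()
            infix = self.elements[token][3]
            if not infix:
                raise SyntaxError("not an infix: %s" % token)
            expr = (infix[0], expr, self.operand(*infix[1:]))
        return expr


def parse(text):
    p = TinyPratt(ELEMENTS, lex(text))
    tree = p.parse()
    if p.current[0] != "end":
        raise SyntaxError("invalid token")
    return tree
```

With these strengths, `and` (5) binds tighter than `or` (4), and prefix `not` (10) binds tighter than both, which is why `a or b and c` groups as `a or (b and c)`.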

      keywords = {'and', 'or', 'not'}

      symbols = {}

      globchars = ".*{}[]?/\\_"

      def tokenize(program):
          pos, l = 0, len(program)
          program = pycompat.bytestr(program)
          while pos < l:
              c = program[pos]
              if c.isspace(): # skip inter-token whitespace
                  pass
              elif c in "(),-:|&+!": # handle simple operators
                  yield (c, None, pos)
              elif (c in '"\'' or c == 'r' and
                    program[pos:pos + 2] in ("r'", 'r"')): # handle quoted strings
                  if c == 'r':
                      pos += 1
                      c = program[pos]
                      decode = lambda x: x
                  else:
                      decode = parser.unescapestr
                  pos += 1
                  s = pos
                  while pos < l: # find closing quote
                      d = program[pos]
                      if d == '\\': # skip over escaped characters
                          pos += 2
                          continue
                      if d == c:
                          yield ('string', decode(program[s:pos]), s)
                          break
                      pos += 1
                  else:
                      raise error.ParseError(_("unterminated string"), s)
              elif c.isalnum() or c in globchars or ord(c) > 127:
                  # gather up a symbol/keyword
                  s = pos
                  pos += 1
                  while pos < l: # find end of symbol
                      d = program[pos]
                      if not (d.isalnum() or d in globchars or ord(d) > 127):
                          break
                      pos += 1
                  sym = program[s:pos]
                  if sym in keywords: # operator keywords
                      yield (sym, None, s)
                  else:
                      yield ('symbol', sym, s)
                  pos -= 1
              else:
                  raise error.ParseError(_("syntax error"), pos)
              pos += 1
          yield ('end', None, pos)
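To see the token stream `tokenize()` produces without Mercurial installed, here is a str-only port of the same loop. It assumes ASCII input, replaces `pycompat.bytestr` with plain `str`, swaps `parser.unescapestr` for Python's `unicode_escape` codec (the helper `unescape` is hypothetical), and defines a local `ParseError`; these substitutions belong to the sketch, not to Mercurial.

```python
import codecs

KEYWORDS = {"and", "or", "not"}
GLOBCHARS = ".*{}[]?/\\_"


class ParseError(Exception):
    """Local stand-in for mercurial.error.ParseError."""


def unescape(s):
    # Assumption: ASCII input; approximates parser.unescapestr for str.
    return codecs.decode(s, "unicode_escape")


def tokenize(program):
    pos, l = 0, len(program)
    while pos < l:
        c = program[pos]
        if c.isspace():  # skip inter-token whitespace
            pass
        elif c in "(),-:|&+!":  # single-character operators
            yield (c, None, pos)
        elif c in "\"'" or (c == "r" and program[pos:pos + 2] in ("r'", 'r"')):
            if c == "r":  # raw string: keep backslashes as-is
                pos += 1
                c = program[pos]
                decode = lambda x: x
            else:
                decode = unescape
            pos += 1
            s = pos
            while pos < l:  # find the closing quote
                d = program[pos]
                if d == "\\":  # skip the escaped character
                    pos += 2
                    continue
                if d == c:
                    yield ("string", decode(program[s:pos]), s)
                    break
                pos += 1
            else:
                raise ParseError("unterminated string", s)
        elif c.isalnum() or c in GLOBCHARS or ord(c) > 127:
            s = pos
            pos += 1
            while pos < l:  # find the end of the symbol
                d = program[pos]
                if not (d.isalnum() or d in GLOBCHARS or ord(d) > 127):
                    break
                pos += 1
            sym = program[s:pos]
            if sym in KEYWORDS:  # 'and', 'or', 'not' become operator tokens
                yield (sym, None, s)
            else:
                yield ("symbol", sym, s)
            pos -= 1  # undo the shared increment below
        else:
            raise ParseError("syntax error", pos)
        pos += 1
    yield ("end", None, pos)
```

A raw string such as `r'a\b'` keeps its backslash, while a plain quoted string is unescaped. The `pos -= 1` after a symbol compensates for the `pos += 1` shared by every branch at the bottom of the loop.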

      def parse(expr):
          p = parser.parser(elements)
          tree, pos = p.parse(tokenize(expr))
          if pos != len(expr):
              raise error.ParseError(_("invalid token"), pos)
          return parser.simplifyinfixops(tree, {'list', 'or'})
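`parse()` hands the raw tree to `parser.simplifyinfixops(tree, {'list', 'or'})`, which flattens left-nested chains of the same binary operator into one n-ary node, so `a or b or c` becomes a single `or` with three operands. The function below is an independent sketch of that idea (the name `flatten_infix` is hypothetical); it walks down the left spine iteratively because left-associative operators make the left subtree the deep one.

```python
def flatten_infix(tree, targets):
    """Flatten left-nested chains of `targets` operators into n-ary nodes."""
    if not isinstance(tree, tuple):
        return tree  # leaf values such as 'a' or None
    op = tree[0]
    if op not in targets:
        return (op,) + tuple(flatten_infix(x, targets) for x in tree[1:])
    # (or (or a b) c): collect right operands while walking left, then reverse.
    collected = []
    node = tree
    while isinstance(node, tuple) and node[0] == op:
        left, right = node[1:]
        collected.append(flatten_infix(right, targets))
        node = left
    collected.append(flatten_infix(node, targets))
    return (op,) + tuple(reversed(collected))
```

A `group` node stops the flattening, so `(a or b) or c` keeps its inner `or` until `_analyze()` removes the group.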

      def getsymbol(x):
          if x and x[0] == 'symbol':
              return x[1]
          raise error.ParseError(_('not a symbol'))

      def getstring(x, err):
          if x and (x[0] == 'string' or x[0] == 'symbol'):
              return x[1]
          raise error.ParseError(err)

      def getkindpat(x, y, allkinds, err):
          kind = getsymbol(x)
          pat = getstring(y, err)
          if kind not in allkinds:
              raise error.ParseError(_("invalid pattern kind: %s") % kind)
          return '%s:%s' % (kind, pat)

      def getpattern(x, allkinds, err):
          if x and x[0] == 'kindpat':
              return getkindpat(x[1], x[2], allkinds, err)
          return getstring(x, err)

      def getlist(x):
          if not x:
              return []
          if x[0] == 'list':
              return list(x[1:])
          return [x]

      def getargs(x, min, max, err):
          l = getlist(x)
          if len(l) < min or len(l) > max:
              raise error.ParseError(err)
          return l
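Predicate implementations use these helpers to validate their argument trees. The sketch below restates `getsymbol`, `getstring`, `getpattern` (with `getkindpat` inlined), `getlist` and `getargs` with a local `ParseError` so they run standalone; `getargs` takes `lo`/`hi` instead of `min`/`max` to avoid shadowing builtins, and the `KINDS` set is a hypothetical subset of pattern kinds, not Mercurial's list.

```python
class ParseError(Exception):
    """Local stand-in for mercurial.error.ParseError."""


def getsymbol(x):
    if x and x[0] == "symbol":
        return x[1]
    raise ParseError("not a symbol")


def getstring(x, err):
    # Bare symbols are accepted wherever a string is expected.
    if x and x[0] in ("string", "symbol"):
        return x[1]
    raise ParseError(err)


def getpattern(x, allkinds, err):
    if x and x[0] == "kindpat":  # 'kind:pattern' syntax
        kind, pat = getsymbol(x[1]), getstring(x[2], err)
        if kind not in allkinds:
            raise ParseError("invalid pattern kind: %s" % kind)
        return "%s:%s" % (kind, pat)
    return getstring(x, err)


def getlist(x):
    if not x:
        return []
    if x[0] == "list":  # comma-separated arguments
        return list(x[1:])
    return [x]


def getargs(x, lo, hi, err):
    args = getlist(x)
    if len(args) < lo or len(args) > hi:
        raise ParseError(err)
    return args


KINDS = {"glob", "re", "path"}  # hypothetical subset of pattern kinds
```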

Yuya Nishihara · r38862 · fileset: add phase to transform parsed tree...
      def _analyze(x):
          if x is None:
              return x

          op = x[0]
          if op in {'string', 'symbol'}:
              return x
          if op == 'kindpat':
              getsymbol(x[1])  # kind must be a symbol
              t = _analyze(x[2])
              return (op, x[1], t)
Yuya Nishihara · r38863 · fileset: drop 'group' node from tree to be evaluated...
          if op == 'group':
              return _analyze(x[1])
Yuya Nishihara · r38864 · fileset: reject 'negate' node early while transforming parsed tree...
          if op == 'negate':
              raise error.ParseError(_("can't use negate operator in this context"))
          if op == 'not':
Yuya Nishihara · r38862 · fileset: add phase to transform parsed tree...
              t = _analyze(x[1])
              return (op, t)
Yuya Nishihara · r38868 · fileset: optimize 'x and not y' to 'x - y'...
          if op == 'and':
Yuya Nishihara · r38862 · fileset: add phase to transform parsed tree...
              ta = _analyze(x[1])
              tb = _analyze(x[2])
              return (op, ta, tb)
Yuya Nishihara · r38868 · fileset: optimize 'x and not y' to 'x - y'...
          if op == 'minus':
              return _analyze(('and', x[1], ('not', x[2])))
Yuya Nishihara · r38862 · fileset: add phase to transform parsed tree...
          if op in {'list', 'or'}:
              ts = tuple(_analyze(y) for y in x[1:])
              return (op,) + ts
          if op == 'func':
              getsymbol(x[1])  # function name must be a symbol
              ta = _analyze(x[2])
              return (op, x[1], ta)
          raise error.ProgrammingError('invalid operator %r' % op)
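`_analyze()` is mostly a structural copy, but three cases rewrite the tree: `group` nodes disappear, `negate` is rejected, and `x - y` is desugared into `x and not y`. Below is a standalone sketch of the same cases; the function name `analyze_tree` and the local `ParseError` are assumptions, and unknown operators raise `ValueError` instead of `error.ProgrammingError`.

```python
class ParseError(Exception):
    """Local stand-in for mercurial.error.ParseError."""


def getsymbol(x):
    if x and x[0] == "symbol":
        return x[1]
    raise ParseError("not a symbol")


def analyze_tree(x):
    """Rewrite a raw parse tree into an evaluatable one (sketch)."""
    if x is None:
        return x
    op = x[0]
    if op in {"string", "symbol"}:
        return x
    if op == "kindpat":
        getsymbol(x[1])  # kind must be a symbol
        return (op, x[1], analyze_tree(x[2]))
    if op == "group":  # parentheses only affect parsing; drop the node
        return analyze_tree(x[1])
    if op == "negate":  # prefix '-' is not meaningful for file sets
        raise ParseError("can't use negate operator in this context")
    if op == "not":
        return (op, analyze_tree(x[1]))
    if op == "and":
        return (op, analyze_tree(x[1]), analyze_tree(x[2]))
    if op == "minus":  # 'x - y' is sugar for 'x and not y'
        return analyze_tree(("and", x[1], ("not", x[2])))
    if op in {"list", "or"}:
        return (op,) + tuple(analyze_tree(y) for y in x[1:])
    if op == "func":
        getsymbol(x[1])  # function name must be a symbol
        return (op, x[1], analyze_tree(x[2]))
    raise ValueError("invalid operator %r" % (op,))
```

For example, the raw tree for `a - (b)` becomes `('and', ('symbol', 'a'), ('not', ('symbol', 'b')))`.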

Yuya Nishihara · r38915 · fileset: insert hints where status should be computed...
      def _insertstatushints(x):
          """Insert hint nodes where status should be calculated (first pass)

          This works in a bottom-up way, summing up status names and inserting hint
          nodes at 'and' and 'or' as needed. Thus redundant hint nodes may be left.

          Returns (status-names, new-tree) at the given subtree, where status-names
          is a sum of status names referenced in the given subtree.
          """
          if x is None:
              return (), x

          op = x[0]
          if op in {'string', 'symbol', 'kindpat'}:
              return (), x
          if op == 'not':
              h, t = _insertstatushints(x[1])
              return h, (op, t)
          if op == 'and':
              ha, ta = _insertstatushints(x[1])
              hb, tb = _insertstatushints(x[2])
              hr = ha + hb
              if ha and hb:
                  return hr, ('withstatus', (op, ta, tb), ('string', ' '.join(hr)))
              return hr, (op, ta, tb)
          if op == 'or':
              hs, ts = zip(*(_insertstatushints(y) for y in x[1:]))
              hr = sum(hs, ())
              if sum(bool(h) for h in hs) > 1:
                  return hr, ('withstatus', (op,) + ts, ('string', ' '.join(hr)))
              return hr, (op,) + ts
          if op == 'list':
              hs, ts = zip(*(_insertstatushints(y) for y in x[1:]))
              return sum(hs, ()), (op,) + ts
          if op == 'func':
              f = getsymbol(x[1])
              # don't propagate 'ha' crossing a function boundary
              ha, ta = _insertstatushints(x[2])
              if getattr(symbols.get(f), '_callstatus', False):
                  return (f,), ('withstatus', (op, x[1], ta), ('string', f))
              return (), (op, x[1], ta)
          raise error.ProgrammingError('invalid operator %r' % op)
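The first pass needs to know which predicates call status; Mercurial reads a `_callstatus` attribute from the entry in `symbols`. In this sketch that table is a hypothetical dict of `Predicate` objects, and `insert_status_hints` restates the bottom-up logic (merging the `or` and `list` branches, which behave the same apart from the hint, and assuming function names are plain symbols). It shows that `modified() or added()` gets one outer `withstatus` node plus one per call.

```python
class Predicate(object):
    """Hypothetical stand-in for a registered fileset predicate."""
    def __init__(self, callstatus=False):
        self._callstatus = callstatus


# Hypothetical symbols table: only 'modified' and 'added' need status.
symbols = {"modified": Predicate(True), "added": Predicate(True),
           "binary": Predicate(False)}


def insert_status_hints(x):
    """Return (status-names, tree) with 'withstatus' hint nodes inserted."""
    if x is None:
        return (), x
    op = x[0]
    if op in {"string", "symbol", "kindpat"}:
        return (), x
    if op == "not":
        h, t = insert_status_hints(x[1])
        return h, (op, t)
    if op == "and":
        ha, ta = insert_status_hints(x[1])
        hb, tb = insert_status_hints(x[2])
        hr = ha + hb
        if ha and hb:  # both sides need status: hint it once above them
            return hr, ("withstatus", (op, ta, tb), ("string", " ".join(hr)))
        return hr, (op, ta, tb)
    if op in {"or", "list"}:
        hs, ts = zip(*(insert_status_hints(y) for y in x[1:]))
        hr = sum(hs, ())
        if op == "or" and sum(bool(h) for h in hs) > 1:
            return hr, ("withstatus", (op,) + ts, ("string", " ".join(hr)))
        return hr, (op,) + ts
    if op == "func":
        f = x[1][1]  # assumes ('symbol', name)
        _ha, ta = insert_status_hints(x[2])  # names don't cross the call
        if getattr(symbols.get(f), "_callstatus", False):
            return (f,), ("withstatus", (op, x[1], ta), ("string", f))
        return (), (op, x[1], ta)
    raise ValueError("invalid operator %r" % (op,))
```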

      def _mergestatushints(x, instatus):
          """Remove redundant status hint nodes (second pass)

          This is the top-down pass to eliminate inner hint nodes.
          """
          if x is None:
              return x

          op = x[0]
          if op == 'withstatus':
              if instatus:
                  # drop redundant hint node
                  return _mergestatushints(x[1], instatus)
              t = _mergestatushints(x[1], instatus=True)
              return (op, t, x[2])
          if op in {'string', 'symbol', 'kindpat'}:
              return x
          if op == 'not':
              t = _mergestatushints(x[1], instatus)
              return (op, t)
          if op == 'and':
              ta = _mergestatushints(x[1], instatus)
              tb = _mergestatushints(x[2], instatus)
              return (op, ta, tb)
          if op in {'list', 'or'}:
              ts = tuple(_mergestatushints(y, instatus) for y in x[1:])
              return (op,) + ts
          if op == 'func':
              # don't propagate 'instatus' crossing a function boundary
              ta = _mergestatushints(x[2], instatus=False)
              return (op, x[1], ta)
          raise error.ProgrammingError('invalid operator %r' % op)
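The second pass removes the inner hints left by the first. Feeding it the tree for `modified() or added()` (built by hand here) keeps only the outermost `withstatus`; a hint inside a function argument survives because `instatus` is reset at the call boundary. `merge_status_hints` is a standalone restatement for illustration, not Mercurial's function, and raises `ValueError` for unknown operators.

```python
def merge_status_hints(x, instatus):
    """Drop 'withstatus' nodes already covered by an enclosing hint."""
    if x is None:
        return x
    op = x[0]
    if op == "withstatus":
        if instatus:  # an outer hint already computes status here
            return merge_status_hints(x[1], instatus)
        return (op, merge_status_hints(x[1], instatus=True), x[2])
    if op in {"string", "symbol", "kindpat"}:
        return x
    if op == "not":
        return (op, merge_status_hints(x[1], instatus))
    if op == "and":
        return (op, merge_status_hints(x[1], instatus),
                merge_status_hints(x[2], instatus))
    if op in {"list", "or"}:
        return (op,) + tuple(merge_status_hints(y, instatus) for y in x[1:])
    if op == "func":  # a function argument is evaluated on its own
        return (op, x[1], merge_status_hints(x[2], instatus=False))
    raise ValueError("invalid operator %r" % (op,))
```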

Yuya Nishihara · r38862 · fileset: add phase to transform parsed tree...
      def analyze(x):
          """Transform raw parsed tree to evaluatable tree which can be fed to
Yuya Nishihara · r38865 · fileset: add stub for weight-based optimization...
          optimize() or getmatch()
Yuya Nishihara · r38862 · fileset: add phase to transform parsed tree...

          All pseudo operations should be mapped to real operations or functions
          defined in methods or symbols table respectively.
          """
Yuya Nishihara · r38915 · fileset: insert hints where status should be computed...
          t = _analyze(x)
          _h, t = _insertstatushints(t)
          return _mergestatushints(t, instatus=False)

Yuya Nishihara · r38862 · fileset: add phase to transform parsed tree...

Yuya Nishihara · r38868 · fileset: optimize 'x and not y' to 'x - y'...
      def _optimizeandops(op, ta, tb):
          if tb is not None and tb[0] == 'not':
              return ('minus', ta, tb[1])
          return (op, ta, tb)

Yuya Nishihara · r38901 · fileset: combine union of basic patterns into single matcher...
      def _optimizeunion(xs):
          # collect string patterns so they can be compiled into a single regexp
          ws, ts, ss = [], [], []
          for x in xs:
              w, t = _optimize(x)
              if t is not None and t[0] in {'string', 'symbol', 'kindpat'}:
                  ss.append(t)
                  continue
              ws.append(w)
              ts.append(t)
          if ss:
              ws.append(WEIGHT_CHECK_FILENAME)
              ts.append(('patterns',) + tuple(ss))
          return ws, ts
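`_optimizeunion()` pulls the plain-pattern operands of an `or` out into one `patterns` node so that, per the source comment, they can be compiled into a single regexp. This sketch injects the weight function as a parameter; `toy_optimize` is hypothetical and simply charges a flat 30 for anything that is not a basic pattern.

```python
WEIGHT_CHECK_FILENAME = 0.5
BASIC_PATTERNS = {"string", "symbol", "kindpat"}


def toy_optimize(x):
    """Hypothetical weight function: cheap for patterns, 30 for anything else."""
    if x is None:
        return 0, x
    if x[0] in BASIC_PATTERNS:
        return WEIGHT_CHECK_FILENAME, x
    return 30, x


def optimize_union(xs, optimize=toy_optimize):
    """Fold the basic-pattern operands of an 'or' into one 'patterns' node."""
    ws, ts, ss = [], [], []
    for x in xs:
        w, t = optimize(x)
        if t is not None and t[0] in BASIC_PATTERNS:
            ss.append(t)  # candidates for a single combined matcher
            continue
        ws.append(w)
        ts.append(t)
    if ss:
        ws.append(WEIGHT_CHECK_FILENAME)
        ts.append(("patterns",) + tuple(ss))
    return ws, ts
```

The folded `patterns` node is charged a single `WEIGHT_CHECK_FILENAME`, however many patterns it absorbs.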

Yuya Nishihara · r38865 · fileset: add stub for weight-based optimization...
      def _optimize(x):
          if x is None:
              return 0, x

          op = x[0]
Yuya Nishihara · r38915 · fileset: insert hints where status should be computed...
          if op == 'withstatus':
              w, t = _optimize(x[1])
              return w, (op, t, x[2])
Yuya Nishihara · r38865 · fileset: add stub for weight-based optimization...
          if op in {'string', 'symbol'}:
Yuya Nishihara · r38899 · fileset: introduce weight constants for readability...
              return WEIGHT_CHECK_FILENAME, x
Yuya Nishihara · r38865 · fileset: add stub for weight-based optimization...
          if op == 'kindpat':
              w, t = _optimize(x[2])
              return w, (op, x[1], t)
          if op == 'not':
              w, t = _optimize(x[1])
              return w, (op, t)
Yuya Nishihara · r38867 · fileset: reorder 'and' expression to evaluate basic patterns first...
          if op == 'and':
              wa, ta = _optimize(x[1])
              wb, tb = _optimize(x[2])
              if wa <= wb:
Yuya Nishihara · r38868 · fileset: optimize 'x and not y' to 'x - y'...
                  return wa, _optimizeandops(op, ta, tb)
Yuya Nishihara · r38867 · fileset: reorder 'and' expression to evaluate basic patterns first...
              else:
Yuya Nishihara · r38868 · fileset: optimize 'x and not y' to 'x - y'...
                  return wb, _optimizeandops(op, tb, ta)
Yuya Nishihara · r38865 · fileset: add stub for weight-based optimization...
          if op == 'or':
Yuya Nishihara · r38901 · fileset: combine union of basic patterns into single matcher...
              ws, ts = _optimizeunion(x[1:])
              if len(ts) == 1:
                  return ws[0], ts[0] # 'or' operation is fully optimized out
Yuya Nishihara · r38900 · fileset: reorder 'or' expression by weight
              ts = tuple(it[1] for it in sorted(enumerate(ts),
                                                key=lambda it: ws[it[0]]))
Yuya Nishihara · r38865 · fileset: add stub for weight-based optimization...
              return max(ws), (op,) + ts
          if op == 'list':
              ws, ts = zip(*(_optimize(y) for y in x[1:]))
              return sum(ws), (op,) + ts
          if op == 'func':
              f = getsymbol(x[1])
              w = getattr(symbols.get(f), '_weight', 1)
              wa, ta = _optimize(x[2])
              return w + wa, (op, x[1], ta)
          raise error.ProgrammingError('invalid operator %r' % op)
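The net effect of `_optimize()` is cost-based ordering: the cheaper operand of `and` runs first (and `x and not y` becomes `minus`), while `or` operands are sorted by weight. The sketch below reproduces those rules under stated simplifications: a hypothetical `FUNC_WEIGHTS` dict stands in for the `_weight` attribute read from `symbols`, function names are assumed to be plain symbols, and the `_optimizeunion()` pattern folding and `withstatus` handling are omitted.

```python
WEIGHT_CHECK_FILENAME = 0.5
# Hypothetical predicate costs; the real code reads `_weight` from the
# registered predicate in `symbols` and defaults to 1.
FUNC_WEIGHTS = {"grep": 30, "modified": 10}


def optimize_and_ops(op, ta, tb):
    # 'x and not y' is evaluated as 'x - y'
    if tb is not None and tb[0] == "not":
        return ("minus", ta, tb[1])
    return (op, ta, tb)


def optimize_tree(x):
    """Return (weight, tree) with cheaper operands moved first."""
    if x is None:
        return 0, x
    op = x[0]
    if op in {"string", "symbol"}:
        return WEIGHT_CHECK_FILENAME, x
    if op == "kindpat":
        w, t = optimize_tree(x[2])
        return w, (op, x[1], t)
    if op == "not":
        w, t = optimize_tree(x[1])
        return w, (op, t)
    if op == "and":
        wa, ta = optimize_tree(x[1])
        wb, tb = optimize_tree(x[2])
        if wa <= wb:
            return wa, optimize_and_ops(op, ta, tb)
        return wb, optimize_and_ops(op, tb, ta)
    if op == "or":
        # Stable sort by weight; pattern folding (_optimizeunion) omitted.
        pairs = sorted((optimize_tree(y) for y in x[1:]), key=lambda p: p[0])
        return max(w for w, _ in pairs), (op,) + tuple(t for _, t in pairs)
    if op == "list":
        ws, ts = zip(*(optimize_tree(y) for y in x[1:]))
        return sum(ws), (op,) + ts
    if op == "func":
        w = FUNC_WEIGHTS.get(x[1][1], 1)
        wa, ta = optimize_tree(x[2])
        return w + wa, (op, x[1], ta)
    raise ValueError("invalid operator %r" % (op,))
```

For `grep(foo) and not *.py`, the cheap `not *.py` test is moved ahead of `grep()`, so no `minus` node is produced; `minus` only appears when the negated side is the more expensive one, as in `*.py and not grep(foo)`.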

      def optimize(x):
          """Reorder/rewrite evaluatable tree for optimization

          All pseudo operations should be transformed beforehand.
          """
          _w, t = _optimize(x)
          return t

Yuya Nishihara · r38841 · fileset: extract language processing part to new module (API)...
      def prettyformat(tree):
          return parser.prettyformat(tree, ('string', 'symbol'))
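`prettyformat()` delegates to `parser.prettyformat`, treating `string` and `symbol` as leaf nodes that print on one line. The sketch below renders a tree as indented S-expressions in that spirit; it is an approximation for debugging output, and Mercurial's exact quoting and layout may differ.

```python
def prettyformat(tree, leafnodes=("string", "symbol"), level=0):
    """Render a parse tree as indented S-expression lines (sketch)."""
    indent = "  " * level
    if not isinstance(tree, tuple):
        return indent + repr(tree)  # e.g. a None argument
    if tree[0] in leafnodes:
        return "%s(%s %r)" % (indent, tree[0], tree[1])
    lines = [indent + "(" + tree[0]]
    for child in tree[1:]:
        lines.append(prettyformat(child, leafnodes, level + 1))
    lines[-1] += ")"  # close the node after its last child
    return "\n".join(lines)
```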

