The (?P<...>...) notation in Python's regular expressions can be viewed as a classification of matched tokens. The names of these classes can be used to dispatch tokens to appropriate handlers:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 | import re
scan = re.compile(flags=re.VERBOSE, pattern=r"""
(?P<number> \d+\.?\d* ) |
(?P<operator> \+ | - | \* | / ) |
(?P<eof> $ ) |
(?P<error> \S ) # catch all (except whitespace)
""").finditer
def parse(source):
parser = RPNParser()
for each in scan(source):
dispatch = getattr(parser, each.lastgroup)
dispatch(each.group())
return parser.stack[0]
class RPNParser(object):
def __init__(self):
self.stack = []
def number(self, token):
self.stack.append(token)
def operator(self, token):
right = self.stack.pop()
left = self.stack.pop()
self.stack.append(Expression(left, token, right))
def eof(self, token):
if len(self.stack) != 1:
raise Exception("source kaputt!")
def error(self, token):
raise Exception("unknown token %s!" % token)
class Expression(object):
def __init__(self, *args):
self.args = args
def __str__(self):
return "(%s %s %s)" % self.args
if __name__ == "__main__":
print parse("1.5 2 + 3.33 1.11 - *")
|
The example is a simple parser for arithmetic expressions in Reverse Polish Notation, hence the Name RPNParser. First, scan is defined as a tokenizer function and group names (?P<...>...) are defined for tokens matched by the subsequent regex patterns. The RPNParser class has correspondingly named methods which handle the respective tokens. The parse function uses getattr and re.MatchObject's lastgroup attribute to select an appropriate method to dispatch the token to. There an AST is built successively that represents the input string. Then the AST is returned.
More elaborate things are imaginable, e.g. I use the same approach with an ATN-ish state machine to parse source code in my toy Prolog interpreter. There, the handler methods primarily manage state transitions and delegate the actual creation of the Prolog-AST to a Builder object.