Welcome, guest | Sign In | My Account | Store | Cart

Pyparsing with Yacc-like parser definitions (top-down, no Forward required, even in recursive rules).

Python, 25 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
from pyparsing import *

class PyparsingProcessor(object):
    def __init__(self):
        self._identifiers = {}
       
    def __getitem__(self, key):
        try:
            return self._context[key]
        except KeyError:
            if self._identifiers.has_key(key):
                return self._identifiers[key]
            self._identifiers[key] = Forward()
            return self._identifiers[key]
   
    def __setitem__(self, key, value):
        try:
            self._identifiers[key] << value
            self._context[key] = self._identifiers[key]
        except Exception, e:
            self._context[key] = value
       
    def ProcessPyparsingModule(self, iterable, context):
        self._context = context
        exec iterable in context, self

I really enjoy using Paul McGuire's pyparsing module, and relish the ease with which I can quickly create sophisticated parsers in pure python.

About the only thing I miss from my yacc days is that fact that in yacc I don't have to forward-declare rules, which makes it very easy to specify grammars from the top down, which I find natural, and also makes it painless to declare recursive rules. This recipe is a lightweight of retrofit pyparsing to support the ability _not_ to have to forward-declare rules.

(Warning about the sample code which follows: you apparently can't have a left-shift symbol in this discussion text -- doing so garbles the text when it is viewed. Grr. To compensate, I've had to use LL to refer to the __lshift__ operator. Note to editor: is there a way to get the two adjacent less-than symbols to display correctly? They seem to be getting stripped out as though they were html tags)

Here's a simple pyparsing grammar:

B = Literal('b') C = Literal('c') A = B + C D = Literal('d') D_list = Forward() D_list LL D | (D + D_list)

Not bad at all, and except for the Forward() call, it looks a lot like a yacc grammar -- though in yacc, you'd probably specify it from the top down. Except that would look like this:

B = Forward() C = Forward() A = B + C B LL Literal('b') C LL Literal('c') D_list = Forward() D = Forward() D_list LL D | (D + D_list) D LL Literal('d')

Not very pretty. What we would really like to do is something like this:

A = B + C B = Literal('b') C = Literal('c') D_list = D | (D + D_list) D = Literal('d')

Clean and simple, but, unfortunately, it's not valid python. Try to run this, and you'll get "NameError: name 'B' is not defined".

In this recipe, instead of just passing your grammar to the interpreter, we process it with an instance of the PyparsingProcessor. If the above grammar were in a file named noforward_grammar.py, you could use your grammar like this:

pp = PyparsingProcessor() f = file('noforward_grammar.py', 'rt')

pp.ProcessPyparsingModule (f, globals())

A.parseString('bc') result = D_list.parseString('dddd')

Note that the rules A, B, C, and D_list are added to the globals.

This recipe works through a nifty little hook added in Python 2.4, where the last argument to exec or eval can be any object with a mapping interface (i.e., an object with __getitem__ and __setitem__), not just a dictionary. In otherwords, you can overload symbol lookup with arbitrary semantics.

I've extended this recipe a bit to include checks that all Forwards are eventually used later in the grammar, which is another nice gimme when using this approach

Created by Kevin Atkinson on Fri, 7 Sep 2007 (PSF)
Python recipes (4591)
Kevin Atkinson's recipes (1)

Required Modules

Other Information and Tasks