Python's re module contains a very nice but undocumented class, Scanner, for scanning text into tokens. The example is taken from the Python test suite: http://mail.python.org/pipermail/python-dev/2003-April/035075.html

Python, 16 lines
import re

def s_ident(scanner, token): return token
def s_operator(scanner, token): return "op%s" % token
def s_float(scanner, token): return float(token)
def s_int(scanner, token): return int(token)

scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", s_ident),
    (r"\d+\.\d*", s_float),        # listed before the int rule so "312.50" is read as one float
    (r"\d+", s_int),
    (r"=|\+|-|\*|/", s_operator),
    (r"\s+", None),                # None: skip whitespace
    ])

print(scanner.scan("sum = 3*foo + 312.50 + bar"))

First you create a Scanner object and initialize it with a list of tuples, each containing a regular expression and a function that handles the matched token. The function receives the scanner and the matched text, and its return value is appended to the result list. Instead of a function you may also pass None to skip the matched token.

The result looks like this:

(['sum', 'op=', 3, 'op*', 'foo', 'op+', 312.5, 'op+', 'bar'], '')
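
The second element of the returned tuple is the part of the input that was not scanned; it is empty above because every character matched some rule. The following is a minimal sketch of what happens otherwise, assuming a hypothetical reduced lexicon (not part of the original recipe) that has no rule for "?":

```python
import re

# Hypothetical mini-lexicon for illustration only; note there is no rule for "?".
scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", lambda s, tok: tok),   # identifiers
    (r"\d+", lambda s, tok: int(tok)),       # integers
    (r"=", lambda s, tok: "op" + tok),       # a single operator
    (r"\s+", None),                          # skip whitespace
])

# Scanning stops at the first position where no rule matches.
tokens, remainder = scanner.scan("x = 1 ? 2")

print(tokens)           # ['x', 'op=', 1]
print(repr(remainder))  # '? 2'
```

Checking that the remainder is empty is a simple way to detect input the scanner could not handle.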

I hope this makes it easier to write parsers based on regular expressions.


Fredrik Lundh 18 years, 4 months ago

Footnote. You might wish to mention somewhere that you didn't write that snippet yourself (it's copied from Python's test suite; see e.g. http://mail.python.org/pipermail/python-dev/2003-April/035075.html )

Cheers /F (the original author)

Dirk Holtwick (author) 18 years, 4 months ago

Copied from the test suite. Of course you're right. I will add the notice. With this recipe I just wanted to point out this hidden functionality in "re".

Nikos Kouremenos 18 years, 4 months ago

Thanks to both of you (one for writing it, the other for posting it here). Yet another example of why the Python docs suck, especially for the sre module. ;(

yariv 18 years, 4 months ago

Very useful. Thanks! Quite a few times I've needed a parser like that and was surprised I couldn't find one in Python.

Now we just need a special parser for each type (so we won't have to filter from a list), and have it in a standard Python library.

Josiah Carlson 18 years, 4 months ago

Using the sre scanner looks interesting, though I have the feeling that for real parsing activities, using a parser like DParser, Spark, SimpleParse, etc., would probably get one farther.

