ActiveState Code

Recipe 457664: Hidden Scanner functionality in re module


The developers of Python have hidden a very nice class in the re module for scanning text. The example is taken from the Python test suite: http://mail.python.org/pipermail/python-dev/2003-April/035075.html

Python
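import re  # the old "sre" module was merged into "re"; Scanner now lives there

# Each callback receives the Scanner instance and the matched text;
# its return value becomes the token in the result list.
def s_ident(scanner, token): return token
def s_operator(scanner, token): return "op%s" % token
def s_float(scanner, token): return float(token)
def s_int(scanner, token): return int(token)

# Patterns are tried in order, so the float pattern must precede the int pattern.
scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", s_ident),
    (r"\d+\.\d*", s_float),
    (r"\d+", s_int),
    (r"=|\+|-|\*|/", s_operator),
    (r"\s+", None),  # None: match and discard (whitespace)
    ])

print(scanner.scan("sum = 3*foo + 312.50 + bar"))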

Discussion

First you create a Scanner object and initialize it with a list of tuples, each pairing a regular expression with a function that handles the matched token. Instead of a function you may pass None to skip the matched token.

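The result is a tuple holding the list of tokens and whatever part of the input could not be scanned (empty here). It looks like this: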

(['sum', 'op=', 3, 'op*', 'foo', 'op+', 312.5, 'op+', 'bar'], '')
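The second element of that tuple matters when the input contains something no pattern matches: scanning stops there and the rest of the string is handed back. A minimal sketch of that behaviour, assuming a reduced two-pattern lexicon and an input string made up for illustration:

```python
import re

# A deliberately small lexicon: identifiers, integers, and skipped whitespace.
small_scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", lambda scanner, token: token),
    (r"\d+", lambda scanner, token: int(token)),
    (r"\s+", None),
])

# "?" matches none of the patterns, so scanning stops right before it.
tokens, remainder = small_scanner.scan("foo 42 ? bar")

print(tokens)     # ['foo', 42]
print(remainder)  # '? bar'
```

Checking that the remainder is empty is a simple way to detect input the scanner did not understand.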

I hope this makes it easier to write parsers using regular expressions.
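When building a parser on top of the scanner, it can help if every token carries its kind and position rather than just a converted value. The sketch below is one possible way to do that; the `tagged` helper and the kind names are hypothetical, and it relies on `scanner.match` (the current match object), which CPython's Scanner sets before calling a callback but which is an undocumented implementation detail.

```python
import re

def tagged(kind, convert=str):
    """Build a Scanner callback returning (kind, value, start offset)."""
    def action(scanner, token):
        # scanner.match is the match object for the current token
        # (assumption: undocumented CPython behaviour).
        return (kind, convert(token), scanner.match.start())
    return action

tagging_scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", tagged("IDENT")),
    (r"\d+\.\d*", tagged("FLOAT", float)),
    (r"\d+", tagged("INT", int)),
    (r"=|\+|-|\*|/", tagged("OP")),
    (r"\s+", None),
])

tagged_tokens, rest = tagging_scanner.scan("x = 3*y")
for tok in tagged_tokens:
    print(tok)
# ('IDENT', 'x', 0)
# ('OP', '=', 2)
# ('INT', 3, 4)
# ('OP', '*', 5)
# ('IDENT', 'y', 6)
```

A later parsing stage can then dispatch on the first element of each tuple instead of inspecting the Python type of the value.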

Comments

  1. At 4:45 a.m. on 1 Dec 2005, Fredrik Lundh said:

    Footnote. You might wish to mention somewhere that you didn't write that snippet yourself (it's copied from Python's test suite; see e.g. http://mail.python.org/pipermail/python-dev/2003-April/035075.html )

    Cheers /F (the original author)

  2. At 11:31 p.m. on 1 Dec 2005, Dirk Holtwick (the author) said:

    Copied from the test suite. Of course you're right. I will add the notice. With this recipe I just wanted to give a hint about this hidden functionality in "re".

  3. At 3:20 a.m. on 2 Dec 2005, Nikos Kouremenos said:

    Thanks to both guys (one for writing, the other for posting here). Yet another example of why the Python docs suck, especially for the sre module. ;(

  4. At 4:23 a.m. on 3 Dec 2005, Anonymous said:

    Very useful. Thanks! Quite a few times I'd need a parser like that and was surprised I couldn't find one in Python.

    Now we just need a special parser for each type (so we won't have to filter from a list), and have it in a standard Python library.

  5. At 5:45 p.m. on 6 Dec 2005, Josiah Carlson said:

    Using the sre scanner looks interesting, though I have the feeling that for real parsing activities, using a parser like DParser, Spark, SimpleParse, etc., would probably get one farther.
