
The developers of Python have hidden a very useful class, Scanner, in the re module for scanning text. The example is taken from the Python test suite: http://mail.python.org/pipermail/python-dev/2003-April/035075.html

Python, 16 lines
import re  # Scanner was originally reached through the old "sre" module

# Each callback receives the scanner and the matched text, and returns the token value.
def s_ident(scanner, token): return token
def s_operator(scanner, token): return "op%s" % token
def s_float(scanner, token): return float(token)
def s_int(scanner, token): return int(token)

scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", s_ident),
    (r"\d+\.\d*", s_float),
    (r"\d+", s_int),
    (r"=|\+|-|\*|/", s_operator),
    (r"\s+", None),  # None: skip whitespace
])

print(scanner.scan("sum = 3*foo + 312.50 + bar"))

First you create a Scanner object and initialize it with a list of tuples, each containing a regular expression and a function that handles the tokens it matches. Instead of a function you may also pass None to skip the matched token.
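As a minimal sketch of the same idea, the handler can also return a (kind, text) pair so that each token carries its type. The helper name tag and the kind labels are made up for this illustration, and re.Scanner is undocumented, so its behaviour is assumed from the CPython implementation:

```python
import re

def tag(kind):
    # Hypothetical helper: builds a handler that labels each token with its kind.
    def action(scanner, token):
        return (kind, token)
    return action

scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", tag("ident")),
    (r"\d+", tag("int")),
    (r"\s+", None),  # None: whitespace is matched but produces no token
])

tokens, remainder = scanner.scan("x 42 y")
print(tokens)
```

Because the kind travels with the token, later code can dispatch on it instead of inspecting the Python type of each list element.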

The result looks like this:

(['sum', 'op=', 3, 'op*', 'foo', 'op+', 312.5, 'op+', 'bar'], '')
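The second element of the returned tuple is the part of the input that could not be matched; it is empty above because every character was consumed. A small sketch, assuming the current CPython behaviour of the undocumented re.Scanner, shows what happens when the scanner meets a character that no pattern covers:

```python
import re

scanner = re.Scanner([
    (r"\d+", lambda scanner, token: int(token)),
    (r"\s+", None),  # skip whitespace
])

# "?" matches no pattern, so scanning stops there.
tokens, remainder = scanner.scan("1 2 ? 3")
print(tokens, repr(remainder))
```

Scanning stops at the first unmatched position rather than raising an exception, so checking that the remainder is empty is a simple way to detect invalid input.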

I hope this makes it easier to write parsers using regular expressions.


Fredrik Lundh 18 years, 4 months ago

Footnote. You might wish to mention somewhere that you didn't write that snippet yourself (it's copied from Python's test suite; see e.g. http://mail.python.org/pipermail/python-dev/2003-April/035075.html )

Cheers /F (the original author)

Dirk Holtwick (author) 18 years, 4 months ago

Copied from test suite. Of course you're right. I will add the notice. With this recipe I just wanted to give a hint about this hidden functionality in "re".

Nikos Kouremenos 18 years, 4 months ago

Thanks to both of you (one for writing it, the other for posting it here). Yet another example of why the Python docs suck, especially for the sre module. ;(

yariv 18 years, 4 months ago

Very useful. Thanks! Quite a few times I've needed a parser like that and was surprised I couldn't find one in Python.

Now we just need a special parser for each type (so we won't have to filter from a list), and have it in a standard Python library.

Josiah Carlson 18 years, 4 months ago

Using the sre scanner looks interesting, though I have the feeling that for real parsing activities, using a parser like DParser, Spark, SimpleParse, etc., would probably get one farther.

Created by Dirk Holtwick on Sun, 27 Nov 2005 (PSF)