Hidden Scanner functionality in re module (Python recipe) by Dirk Holtwick
ActiveState Code (http://code.activestate.com/recipes/457664/)

The re module contains an undocumented but very useful class for scanning text: Scanner. The example is taken from the Python test suite: http://mail.python.org/pipermail/python-dev/2003-April/035075.html

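import re

# Each handler receives the scanner and the matched text; its return value
# becomes the token that is appended to the result list.
def s_ident(scanner, token): return token
def s_operator(scanner, token): return "op%s" % token
def s_float(scanner, token): return float(token)
def s_int(scanner, token): return int(token)

# The float pattern is listed before the int pattern so that "312.50"
# is not split at the decimal point.
scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", s_ident),
    (r"\d+\.\d*", s_float),
    (r"\d+", s_int),
    (r"=|\+|-|\*|/", s_operator),
    (r"\s+", None),  # None: skip whitespace
    ])

print(scanner.scan("sum = 3*foo + 312.50 + bar"))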

First you create a Scanner object and initialize it with a list of tuples, each containing a regular expression and a function that handles the matched token. The function receives the scanner and the matched text, and its return value is appended to the result list. Instead of a function you may pass None to skip the matched token.
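As a minimal sketch of what a handler can do beyond converting the text: the CPython implementation stores the current match object on the scanner as scanner.match while the handler runs, so a handler can record where the token started. Scanner is undocumented, so treat that attribute as an implementation detail that may change. The tagged helper and the token kinds (IDENT, INT, OP) are hypothetical names chosen for this illustration.

```python
import re

def tagged(kind):
    """Build a handler that returns (kind, text, start offset)."""
    def action(scanner, token):
        # scanner.match is the match object for the current token
        # (an undocumented CPython detail, assumed here).
        return (kind, token, scanner.match.start())
    return action

scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", tagged("IDENT")),
    (r"\d+", tagged("INT")),
    (r"=", tagged("OP")),
    (r"\s+", None),  # skip whitespace
])

tokens, remainder = scanner.scan("x = 42")
print(tokens)
# [('IDENT', 'x', 0), ('OP', '=', 2), ('INT', '42', 4)]
```

Tagging each token with its kind means later code can dispatch on the tag instead of inspecting Python types.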

The result looks like this:

(['sum', 'op=', 3, 'op*', 'foo', 'op+', 312.5, 'op+', 'bar'], '')
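The second element of the returned tuple is the part of the input that could not be matched; it is empty above because the whole string was consumed. The sketch below shows what happens when scanning stops early, and wraps the call in a small tokenize function (a hypothetical helper, not part of re) that turns leftover input into an error.

```python
import re

scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", lambda s, tok: tok),
    (r"\d+\.\d*", lambda s, tok: float(tok)),
    (r"\d+", lambda s, tok: int(tok)),
    (r"=|\+|-|\*|/", lambda s, tok: "op%s" % tok),
    (r"\s+", None),
])

def tokenize(text):
    """Return the token list, or raise if some input was not matched."""
    tokens, remainder = scanner.scan(text)
    if remainder:
        offset = len(text) - len(remainder)
        raise ValueError("cannot tokenize input at offset %d: %r"
                         % (offset, remainder))
    return tokens

# "$" matches no pattern, so scanning stops there.
print(scanner.scan("sum = 3 $ 4"))
# (['sum', 'op=', 3], '$ 4')
```

Checking the remainder is worthwhile because scan does not raise on unmatched input; it simply stops and hands back what is left.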

I hope this makes it easier to write simple parsers using regular expressions.

Tags: text

5 comments

Fredrik Lundh 18 years, 4 months ago

Footnote. You might wish to mention somewhere that you didn't write that snippet yourself (it's copied from Python's test suite; see e.g. http://mail.python.org/pipermail/python-dev/2003-April/035075.html )

Cheers /F (the original author)

Dirk Holtwick (author) 18 years, 4 months ago

Copied from the test suite. Of course you're right. I will add the notice. With this recipe I just wanted to point out this hidden functionality in "re".

Nikos Kouremenos 18 years, 4 months ago

Thanks to both of you (one for writing it, the other for posting it here). Yet another example of why the Python docs suck, especially for the sre module. ;(

yariv 18 years, 4 months ago

Very useful. Thanks! Quite a few times I've needed a parser like that and was surprised I couldn't find one in Python.

Now we just need a special parser for each type (so we won't have to filter from a list), and have it in a standard Python library.

Josiah Carlson 18 years, 4 months ago

Using the sre scanner looks interesting, though I have the feeling that for real parsing activities, using a parser like DParser, Spark, SimpleParse, etc., would probably get one farther.

Created by Dirk Holtwick on Sun, 27 Nov 2005 (PSF)


Licensed under the PSF License
Revision 2 (updated 18 years ago)
