The developers of Python have hidden a very nice, undocumented class in the re module for scanning text: Scanner. The example is taken from the Python test suite: http://mail.python.org/pipermail/python-dev/2003-April/035075.html
import re  # the original snippet imported the old "sre" module name

# Each callback receives the scanner object and the matched text.
def s_ident(scanner, token): return token
def s_operator(scanner, token): return "op%s" % token
def s_float(scanner, token): return float(token)
def s_int(scanner, token): return int(token)

scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", s_ident),
    (r"\d+\.\d*", s_float),
    (r"\d+", s_int),
    (r"=|\+|-|\*|/", s_operator),
    (r"\s+", None),  # None: skip whitespace
])

print(scanner.scan("sum = 3*foo + 312.50 + bar"))
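First you create a Scanner object and initialize it with a list of tuples, each containing a regular expression and a function for handling the tokens it matches. The function receives the scanner and the matched text, and its return value is appended to the result list. Instead of a function you may also use None to skip the matched token.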
The result is a tuple of the token list and any unmatched remainder of the input:
(['sum', 'op=', 3, 'op*', 'foo', 'op+', 312.5, 'op+', 'bar'], '')
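The empty string at the end of that result is the part of the input that no rule matched. A minimal sketch of what happens when the input contains something unexpected (the rules and the input string here are made up for illustration, and re.Scanner is undocumented, so its behavior could change):

```python
import re

scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", lambda scanner, token: token),
    (r"\d+", lambda scanner, token: int(token)),
    (r"\s+", None),  # skip whitespace
])

# "?" matches none of the rules, so scanning stops there.
tokens, remainder = scanner.scan("foo 42 ? bar")

if remainder:
    print("could not scan: %r" % remainder)
```

Checking the remainder is a cheap way to detect input the rules do not cover, since scan() does not raise an error on its own.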
This way you may write parsers using regular expressions in an easier way, I hope.
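One detail worth knowing when writing your own rules: the rules are tried in the order given, and the first one that matches wins. That is why the float rule comes before the int rule in the recipe. A small sketch of the difference (the scanner names float_first and int_first are only for illustration):

```python
import re

def to_int(scanner, token): return int(token)
def to_float(scanner, token): return float(token)

# Same two rules, listed in opposite order.
float_first = re.Scanner([(r"\d+\.\d*", to_float), (r"\d+", to_int)])
int_first = re.Scanner([(r"\d+", to_int), (r"\d+\.\d*", to_float)])

good = float_first.scan("312.50")  # whole number scanned as one float
bad = int_first.scan("312.50")     # int rule grabs "312", then "." stops the scan
```

With the int rule first, the scan returns only the integer part and leaves ".50" as the unmatched remainder.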
Footnote. You might wish to mention somewhere that you didn't write that snippet yourself (it's copied from Python's test suite; see e.g. http://mail.python.org/pipermail/python-dev/2003-April/035075.html )
Cheers /F (the original author)
Copied from the test suite. Of course you're right. I will add the notice. With this recipe I just wanted to point out this hidden functionality in "re".
Thanks to both guys (one for writing it, the other for posting it here). Yet another example of why the Python docs suck, especially for the sre module. ;(
Very useful. Thanks! Quite a few times I've needed a scanner like that and was surprised I couldn't find one in Python.
Now we just need a special parser for each type (so we won't have to filter from a list), and have it in a standard Python library.
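Until something like that exists, one possible workaround is to have each callback return a (kind, value) pair, so tokens can be selected by kind instead of by Python type. This is only a sketch: the tagged() helper and the kind names are hypothetical, not part of the re module.

```python
import re

def tagged(kind, convert=str):
    # Hypothetical helper: builds a callback that returns a (kind, value) pair.
    def callback(scanner, token):
        return (kind, convert(token))
    return callback

scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", tagged("IDENT")),
    (r"\d+\.\d*", tagged("FLOAT", float)),
    (r"\d+", tagged("INT", int)),
    (r"=|\+|-|\*|/", tagged("OP")),
    (r"\s+", None),  # skip whitespace
])

tokens, remainder = scanner.scan("sum = 3*foo")

# Select tokens by their tag rather than by isinstance() checks.
identifiers = [value for kind, value in tokens if kind == "IDENT"]
```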
Using the sre scanner looks interesting, though I have the feeling that for real parsing activities, using a parser like DParser, Spark, SimpleParse, etc., would probably get one farther.