Popular recipes tagged "regular_expressions" but not "directory"

ActiveState recipe statistics (Python)

2011-06-02T14:52:50-07:00

Python recipe 577732 by Kaan Ozturk (html, regular_expressions, statistics, urllib2, web). Revision 2.

Downloads the "All Recipe Authors" pages from ActiveState and uses regular expressions to parse each author's name and recipe count on each page. Finally, it displays the recipe submission distribution (how many authors have submitted how many recipes).
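A minimal sketch of the parse-and-count step described above. The markup in SAMPLE_HTML, the pattern AUTHOR_RE and the function name recipe_distribution are assumptions made for illustration; the real page layout and the recipe's own code are not shown here, and the download step is left out.

```python
import re
from collections import Counter

# Hypothetical markup: the real ActiveState page layout is an assumption.
SAMPLE_HTML = """
<li><a href="/recipes/users/1/">Alice</a> (3 recipes)</li>
<li><a href="/recipes/users/2/">Bob</a> (1 recipe)</li>
<li><a href="/recipes/users/3/">Carol</a> (3 recipes)</li>
"""

# Captures the author name and the recipe count from one list entry.
AUTHOR_RE = re.compile(r'<a href="[^"]*">([^<]+)</a> \((\d+) recipes?\)')


def recipe_distribution(html):
    """Map a recipe count to the number of authors with that many recipes."""
    counts = [int(n) for _name, n in AUTHOR_RE.findall(html)]
    return Counter(counts)


distribution = recipe_distribution(SAMPLE_HTML)
for n_recipes, n_authors in sorted(distribution.items()):
    print(n_recipes, "recipe(s):", n_authors, "author(s)")
```

With the sample data, one author has 1 recipe and two authors have 3 recipes each.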

Formatting numbers with a state machine (implementation of a regex pattern) (Python)

2011-03-22T03:40:45-07:00

Python recipe 577618 by James Mills (formatting, regular_expressions).

I was once asked to explain how the following regular expression works; it formats any integer with commas separating every thousand (every group of 3 digits):

(\d)(?=(\d{3})+$)

Example:

>>> import re
>>> re.sub(r"(\d)(?=(\d{3})+$)", r"\1,", "1234")
'1,234'

So here is an implementation of the above regular expression (as best as I could over a lunch break) that will hopefully highlight how a regular expression engine and finite automata work.

Comments and feedback welcome!

--JamesMills / prologic
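The recipe's own code is not reproduced in this listing. As a rough illustration of the idea only, the following sketch (not the author's implementation) replaces the lookahead with a single counter state and scans the digits from right to left. The function name format_with_commas is hypothetical.

```python
import re


def format_with_commas(digits):
    """Insert a comma before every complete group of three trailing digits."""
    out = []
    state = 0  # digits emitted since the last comma, scanning right to left
    for ch in reversed(digits):
        if state == 3:
            # Three digits already sit to the right of this one: emit a comma.
            out.append(",")
            state = 0
        out.append(ch)
        state += 1
    return "".join(reversed(out))


# Cross-check the sketch against the regular expression it imitates.
for sample in ("7", "123", "1234", "1234567"):
    expected = re.sub(r"(\d)(?=(\d{3})+$)", r"\1,", sample)
    assert format_with_commas(sample) == expected

print(format_with_commas("1234567"))  # 1,234,567
```

Scanning from the right avoids the lookahead entirely: the counter plays the role of the (\d{3})+$ condition.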

Simple regex engine, elementary Python (Python)

2010-07-10T10:43:30-07:00

Python recipe 577251 by Joost Behrends (cached, parse, parsing, recursion, regular_expressions). Revision 40.

A short engine for testing text against a regex. It understands the three common quantifiers ?, + and * (all non-greedy) applied to characters, ., [...], [^...], \s, \S, bracketed patterns and group designators \N. It accepts unicode objects and fixed-width encoded strings (though comparisons involving trailing bytes of multi-byte UTF letters may cause problems). It captures up to 10 groups ((?:...) is implemented), which can be used for back-referencing and in xreplace(). Captured groups are accessible after the search in the global list xGroups. | is supported, but only inside groups and only with nested=True. With nested=False, '(' and ')' are treated as ordinary characters.

This recipe is not about Python or meant for Python, where it has little use alongside re. But considering that re needs about 6,000 lines, you might agree with the author that these 176 lines are powerful. That was the reason to publish it as a recipe: as a (fairly complete) minimal example of a regex tester, and as an example of corresponding recursive structures in data (TokenListCache) and code.

Working on this improved the author's understanding of regular expressions, especially of their possible "greed". "Greedy" quantifiers are a concept that has to be explained separately and that comes unexpectedly: whoever scans a text for '<.*>' wants to find SGML tags, not the whole text. Even with the star's "greediness", the code has to take care that '.*' doesn't eat the whole text and leave no match for '<.*>' at all. Thus the standard syntax with greedy quantifiers cannot be simpler to implement than this one, with its mere 3 lines (101, 111 and 121) preventing any greed. Perhaps greedy matching is faster; otherwise it is difficult to understand why the concept of "greed" exists at all.
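The difference the author describes can be seen with Python's standard re module, where * is greedy and *? is its non-greedy counterpart. The sample text below is made up for illustration.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: '.*' runs to the last '>' in the text, so one match spans everything.
greedy = re.findall(r"<.*>", text)

# Non-greedy: '.*?' stops at the first '>' that lets the pattern succeed.
lazy = re.findall(r"<.*?>", text)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```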

This engine might be useful now and then, in circumstances where nothing else is available. Its brevity eases translation to other languages, and it can work with arbitrary characters for STAR or PERHAPS (for example).
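To give a feel for how small a non-greedy matcher can be, here is a much reduced sketch. It is not the recipe's code: it supports only literal characters, '.', and the quantifiers ?, + and * on single characters, with no classes, groups or back references. The names match_here and search are hypothetical.

```python
def match_here(pattern, text, pos=0):
    """Return the end index of a match of pattern at text[pos:], or None."""
    if not pattern:
        return pos
    atom, rest = pattern[0], pattern[1:]
    quant = ""
    if rest and rest[0] in "?+*":
        quant, rest = rest[0], rest[1:]

    def atom_at(i):
        # True if the current atom matches the character at index i.
        return i < len(text) and (atom == "." or atom == text[i])

    min_count = 0 if quant in ("?", "*") else 1
    max_count = 1 if quant in ("", "?") else len(text) - pos

    count = 0
    while count < min_count:  # consume the mandatory occurrences
        if not atom_at(pos + count):
            return None
        count += 1
    while True:
        # Non-greedy: try the rest of the pattern before consuming more.
        end = match_here(rest, text, pos + count)
        if end is not None:
            return end
        if count < max_count and atom_at(pos + count):
            count += 1
        else:
            return None


def search(pattern, text):
    """Return the first (leftmost, shortest) matching substring, or None."""
    for start in range(len(text) + 1):
        end = match_here(pattern, text, start)
        if end is not None:
            return text[start:end]
    return None


print(search("<.*>", "<b>bold</b>"))  # <b>
print(search("ab+c", "xabbbc"))       # abbbc
```

The only place where "greed" is avoided is the second loop, which tries the remainder of the pattern first and consumes one more character only when that fails.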

Regular Expression for generic sequences of symbols (Python)

2009-06-13T09:51:38-07:00

Python recipe 576806 by Emanuele Ruffaldi (adt, re, regular_expressions, sequence). Revision 2.

Python regular expressions are very powerful and efficient, and they can be applied to recognizing different types of sequences. This recipe shows how to match sequences over a generic symbol set with the power of regular expressions. The code maps every entity to a character. The mapping is used both when encoding the sequence and when compiling the regular expression. When the symbol set is small, it is possible to use 8-bit strings efficiently instead of full unicode.
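The mapping idea can be sketched as follows. The symbol names, the brace-based template syntax and the helper names encode and compile_pattern are all assumptions for illustration, not the recipe's actual interface.

```python
import re

# Hypothetical symbol set; each symbol is mapped to one character.
SYMBOLS = ["START", "DATA", "ACK", "END"]
TO_CHAR = {sym: chr(ord("a") + i) for i, sym in enumerate(SYMBOLS)}


def encode(sequence):
    """Turn a sequence of symbols into a string, one character per symbol."""
    return "".join(TO_CHAR[sym] for sym in sequence)


def compile_pattern(template):
    """Compile a template such as '{START}({DATA}{ACK})+{END}'.

    Each {NAME} is replaced by the character assigned to that symbol;
    everything else is ordinary regular expression syntax.
    """
    regex = re.sub(
        r"\{(\w+)\}",
        lambda m: re.escape(TO_CHAR[m.group(1)]),
        template,
    )
    return re.compile(regex)


pattern = compile_pattern("{START}({DATA}{ACK})+{END}")

good = ["START", "DATA", "ACK", "DATA", "ACK", "END"]
bad = ["START", "END"]

print(pattern.fullmatch(encode(good)) is not None)  # True
print(pattern.fullmatch(encode(bad)) is not None)   # False
```

Because the same mapping is applied to the sequence and to the pattern, the regular expression engine never needs to know what the symbols mean.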

Multi-Regex: Single pass replace of multiple regexes (Python)

2009-04-03T13:38:39-07:00

Python recipe 576710 by Michael Palmer (regular_expressions, string_substitution). Revision 5.

Not really - all regexes first get combined into a single big disjunction. Then, for each match, the matching sub-regex is determined from a group name and the match object dispatched to a corresponding method, or simply replaced by a string.
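A minimal sketch of that approach, using named groups and the match object's lastgroup attribute to find out which sub-regex matched. The rule table and the names RULES, COMBINED, ACTIONS and multi_sub are hypothetical; the recipe itself dispatches to methods, which are replaced here by plain callables.

```python
import re

# Hypothetical rules: (group name, sub-regex, replacement string or callable).
RULES = [
    ("number", r"\d+", lambda m: str(int(m.group()) * 2)),
    ("cat", r"\bcat\b", "dog"),
]

# One big disjunction, each alternative wrapped in a named group.
COMBINED = re.compile(
    "|".join("(?P<%s>%s)" % (name, pat) for name, pat, _action in RULES)
)
ACTIONS = {name: action for name, _pat, action in RULES}


def multi_sub(text):
    """Apply all rules in one pass over the text."""
    def dispatch(match):
        # lastgroup names the alternative that produced this match.
        action = ACTIONS[match.lastgroup]
        return action(match) if callable(action) else action
    return COMBINED.sub(dispatch, text)


print(multi_sub("3 cat and 10 cats"))  # 6 dog and 20 cats
```

Since replaced text is never rescanned, the output of one rule cannot be picked up by another, which is the main difference from applying the rules one after the other.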