
Not really - all regexes are first combined into a single big disjunction. Then, for each match, the sub-regex that matched is identified by its group name, and the match object is either dispatched to the method of that name or simply replaced by a string.

Python, 63 lines
import re

class MultiRegex(object):
    flags = re.DOTALL
    regexes = ()

    def __init__(self):
        '''
        compile a disjunction of regexes, in order
        '''
        self._regex = re.compile("|".join(self.regexes), self.flags)

    def sub(self, s):
        return self._regex.sub(self._sub, s)

    def _sub(self, mo):
        '''
        determine which partial regex matched, and
        dispatch on self accordingly.
        '''
        for k, v in mo.groupdict().items():
            if v:  # skip groups that did not participate (None) or captured ''
                sub = getattr(self, k)
                if callable(sub):
                    return sub(mo)
                return sub
        raise AttributeError(
            'nothing captured, matching sub-regex could not be identified')


class TrivialExample(MultiRegex):
    regexes = (
        r'(?P<lower>[a-z]{2,})',
        r'(?P<upper>[A-Z]{2,})',
        r'(?P<mixed>[A-Za-z]+)'
    )

    def lower(self, mo):
        return 'lower:' + mo.group()

    upper = lambda self, mo: 'upper:' + mo.group()
    mixed = 'stuff'


class TrivialExample2(TrivialExample):
    '''
    this illustrates that the order of regexes is important
    '''
    regexes = (
        r'(?P<mixed>[a-zA-Z]+)',
        r'(?P<lower>[a-z]{2,})',
        r'(?P<upper>[A-Z]{2,})'
    )

a = 'That cake was AWESOME, dude!'
print(TrivialExample().sub(a))
print(TrivialExample2().sub(a))

'''
produces:                                                                      
stuff lower:cake lower:was upper:AWESOME, lower:dude!
stuff stuff stuff stuff, stuff!
'''

This recipe just adds a little bit of structure to the handling of matches from parts of a complex regex.

To make it work, each sub-regex must contain a named group (?P<groupname>...) whose name matches an attribute of the class (a method or a replacement string) and which actually captures a non-empty string - pure look-ahead or look-behind expressions will not work, because they capture nothing.
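The requirement above can be checked with a small experiment. The sketch below assumes a Python 3 port of the recipe's MultiRegex class; the subclasses Capturing and LookaheadOnly, their group names, and the sample string are hypothetical and exist only for illustration.

```python
import re


class MultiRegex:
    # Assumed Python 3 port of the recipe's class.
    flags = re.DOTALL
    regexes = ()

    def __init__(self):
        self._regex = re.compile("|".join(self.regexes), self.flags)

    def sub(self, s):
        return self._regex.sub(self._sub, s)

    def _sub(self, mo):
        for k, v in mo.groupdict().items():
            if v:  # None and '' are both skipped
                sub = getattr(self, k)
                if callable(sub):
                    return sub(mo)
                return sub
        raise AttributeError(
            'nothing captured, matching sub-regex could not be identified')


class Capturing(MultiRegex):
    # The named group captures the digits themselves.
    regexes = (r'(?P<digits>\d+)',)
    digits = '#'


class LookaheadOnly(MultiRegex):
    # The named group wraps a pure look-ahead, so it captures ''.
    regexes = (r'(?P<before_digit>(?=\d))',)
    before_digit = '>'


ok = Capturing().sub('room 42')

try:
    LookaheadOnly().sub('room 42')
    failed = False
except AttributeError:
    # The look-ahead matched, but no group held any text to dispatch on.
    failed = True

print(ok)
print(failed)
```

The second class fails even though its group is named, since the empty capture is indistinguishable from "no capture" in the truth test inside _sub.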

One drawback of this recipe is that, for a large regex, calling .groups() on a match object returns a long tuple with one entry per group. Most of the entries will be None and can be eliminated using filter(None, matchStrings), which also drops empty strings.
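As a rough illustration of that drawback, the sketch below compiles the three sub-regexes from TrivialExample by hand and inspects one match. It assumes Python 3, where filter returns an iterator, so a list comprehension is used instead; the variable names are made up for the example.

```python
import re

# Same three alternatives as TrivialExample, joined into one disjunction.
regexes = (
    r'(?P<lower>[a-z]{2,})',
    r'(?P<upper>[A-Z]{2,})',
    r'(?P<mixed>[A-Za-z]+)',
)
regex = re.compile("|".join(regexes), re.DOTALL)

mo = regex.search('AWESOME, dude!')

# One entry per group; only the alternative that matched is filled in.
all_groups = mo.groups()

# Keep only the groups that took part in the match.
captured = [g for g in all_groups if g is not None]

# The match object also records the name of the last matched group.
name = mo.lastgroup

print(all_groups)
print(captured)
print(name)
```

With flat alternatives like these, mo.lastgroup names the alternative that matched directly, which can be a convenient way to avoid scanning the full tuple.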

For the simpler case of substituting multiple strings, not regexes, have a look at recipe 81330.
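For comparison, the plain-string case can be sketched in a few lines. This is an illustrative sketch only, not the code of recipe 81330; the function name multi_replace and the sample mapping are hypothetical, and a non-empty mapping is assumed.

```python
import re


def multi_replace(text, mapping):
    # Hypothetical helper: replace every key of `mapping` found in `text`
    # with its value, scanning the text once. Assumes `mapping` is non-empty.
    # Longer keys go first so that a key wins over its own prefix.
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape makes each key match literally, not as a regex.
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda mo: mapping[mo.group()], text)


result = multi_replace('That cake was AWESOME, dude!',
                       {'cake': 'pie', 'dude': 'friend'})
print(result)
```

Because the keys are literal strings, the matched text itself serves as the lookup key, so no named groups or dispatch methods are needed.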

Created by Michael Palmer on Fri, 3 Apr 2009 (MIT)