
This function generates a regular expression that matches any integer in a given numeric range. You can also run this code online here: http://utilitymill.com/utility/Regex_For_Range

Python
#GPL3

def regex_for_range(min,max):
    """A recursive function to generate a regular expression that matches
    any number in the range between min and max inclusive.

    Usage / doctests:
    >>> regex_for_range(13,57)
    '4[0-9]|3[0-9]|2[0-9]|1[3-9]|5[0-7]'
    >>> regex_for_range(1983,2011)
    '200[0-9]|199[0-9]|198[3-9]|201[0-1]'
    >>> regex_for_range(99,112)
    '99|10[0-9]|11[0-2]'

    Note: doctests are order sensitive, while regular expression engines don't care.  So you may need to rewrite these
    doctests if making changes.
    """
    #overhead
    #assert (max>=min) and (min>=0)
    _min,_max=str(min),str(max)
    #calculations
    if min==max:
        return '%s' % str(max)
    if len(_max)>len(_min):
        #more digits in max than min, so we pare it down into sub ranges
        #that are the same number of digits.  If applicable we also create a pattern to
        #cover the cases of values with number of digits in between that of
        #max and min.
        re_middle_range=None
        if len(_max)>len(_min)+2:
            #digits more than 2 off, create mid range
            re_middle_range='[0-9]{%s,%s}' % (len(_min)+1,len(_max)-1)
        elif len(_max)>len(_min)+1:
            #digits more than 1 off, create mid range
            #assert len(_min)+1==len(_max)-1 #temp: remove
            re_middle_range='[0-9]{%s}' % (len(_min)+1)
        #pair off into sub ranges
        max_big=max
        min_big=int('1'+('0'*(len(_max)-1)))
        re_big=regex_for_range(min_big,max_big)
        max_small=int('9'*len(_min))
        min_small=min
        re_small=regex_for_range(min_small,max_small)
        if re_middle_range:
            return '|'.join([re_small,re_middle_range,re_big])
        else:
            return '|'.join([re_small,re_big])
    elif len(_max)==len(_min):
        def naive_range(min,max):
            """Simply matches min to max digit by digit.  Only guaranteed to
            create a correct regex when min and max have the same number of
            digits and differ only in the last digit."""
            _min,_max=str(min),str(max)
            pattern=''
            for i in range(len(_min)):
                if _min[i]==_max[i]:
                    pattern+=_min[i]
                else:
                    pattern+='[%s-%s]' % (_min[i],_max[i])
            return '%s' % pattern
        if len(_max)==1:
            patterns=[naive_range(min,max)]
        else:
            #this is probably the trickiest part so we'll follow the example of
            #1336 to 1821 through this section
            patterns=[]
            distance=str(max-min) #e.g., distance = 1821-1336 = 485
            increment=int('1'+('0'*(len(distance)-1))) #e.g., 100 when distance is 485
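            if increment==1 and min//10!=max//10:
                #a distance under 10 can still cross a 10s boundary (e.g., 102 to 110), so split there
                increment=10
            if increment==1:
                #naive_range is safe here (see its def): min and max differ only in the last digit
                patterns=[naive_range(min,max)]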
            else:
                #create a function to return a floor to the correct digit position
                #e.g., floor_digit_n(1336) => 1300 when increment is 100
                #(floor division, because / returns a float under Python 3 and round() can round up)
                floor_digit_n=lambda x:(x//increment)*increment
                #capture a safe middle range
                #e.g., create regex patterns to cover the range 1400 to 1799 inclusive
                #so in example we should get: 17[0-9]{2}|16[0-9]{2}|15[0-9]{2}|14[0-9]{2}
                for i in range(floor_digit_n(max)-increment,floor_digit_n(min),-increment):
                    len_end_to_replace=len(str(increment))-1
                    if len_end_to_replace==1:
                        pattern='%s[0-9]' % str(i)[:-(len_end_to_replace)]
                    else:
                        pattern='%s[0-9]{%s}' % (str(i)[:-(len_end_to_replace)],len_end_to_replace)
                    patterns.append(pattern)
                #split off ranges outside of increment digits, i.e., what isn't covered in last step.
                #low side: e.g., 1336 -> min=1336, max=1300+(100-1) = 1399
                patterns.append(regex_for_range(min,floor_digit_n(min)+(increment-1)))
                #high side: e.g., 1821 -> min=1800 max=1821
                patterns.append(regex_for_range(floor_digit_n(max),max))
        return '|'.join(patterns)
    else:
        raise ValueError('max value must have more or the same num digits as min')

This would be useful to search a document for a certain year range, say 1987-2005 (which is: (199[0-9]|198[7-9]|200[0-5])), or to restrict input in a form to a certain range. Can you think of other uses?
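
As a rough illustration of both uses, here is a minimal sketch built on the 1987-2005 pattern quoted above. The helper names find_years and is_valid_year and the sample sentence are made up for this example. The word boundaries are an assumption about what "search a document" should mean: without them the pattern would also match inside longer numbers.

```python
import re

# Pattern quoted in the text for the year range 1987-2005.
YEAR_RANGE = r'199[0-9]|198[7-9]|200[0-5]'

# Group the alternation and add word boundaries so only whole numbers
# match; otherwise "2005" would also be found inside "120059".
YEAR_RE = re.compile(r'\b(?:%s)\b' % YEAR_RANGE)

def find_years(text):
    """Return every standalone year from 1987 to 2005 found in text."""
    return YEAR_RE.findall(text)

def is_valid_year(field):
    """Form validation: the whole field must be a year in the range."""
    return re.fullmatch(YEAR_RANGE, field) is not None

sample = 'Founded 1986, renamed 1987, sold 2005, closed 2006 (ref 120059).'
print(find_years(sample))        # → ['1987', '2005']
print(is_valid_year('1999'))     # → True
print(is_valid_year('19990'))    # → False
```

For form input, re.fullmatch is probably the simpler choice, since it rejects anything before or after the number without extra anchors.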

It's a giant recursive function that goes through the range of values to find each sub-range that can be covered by one regular expression. Then it stitches those regular expressions together with the '|' symbol to make one giant regular expression to match all the numbers.
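
To make the sub-range idea concrete, the sketch below takes the output shown in the doctest for 13-57 and reports which numbers each '|'-separated piece covers. The helper coverage_by_alternative is hypothetical and not part of the recipe. Splitting on '|' only works because the generated patterns are flat, with no nested groups.

```python
import re

# Output quoted in the recipe's doctest for the range 13-57.
PATTERN = '4[0-9]|3[0-9]|2[0-9]|1[3-9]|5[0-7]'

def coverage_by_alternative(pattern, lo, hi):
    """Map each '|'-separated piece to the (first, last) number it
    matches among lo..hi, or None if it matches nothing there."""
    coverage = {}
    for piece in pattern.split('|'):
        hits = [n for n in range(lo, hi + 1) if re.fullmatch(piece, str(n))]
        coverage[piece] = (hits[0], hits[-1]) if hits else None
    return coverage

for piece, span in coverage_by_alternative(PATTERN, 0, 99).items():
    print(piece, span)
# 4[0-9] (40, 49)
# 3[0-9] (30, 39)
# 2[0-9] (20, 29)
# 1[3-9] (13, 19)
# 5[0-7] (50, 57)
```

The three full decades come from the "safe middle range" loop, while 13-19 and 50-57 are the low and high leftovers handled by the recursive calls.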

Please let me know if you find any cases where it doesn't work.
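
One way to hunt for such cases is a brute-force check: try every integer in and around the range and compare what the pattern says with what the range says. This is only a sketch. The name find_mismatches and the margin parameter are assumptions made for the example, not part of the recipe.

```python
import re

def find_mismatches(pattern, lo, hi, margin=50):
    """List every integer on which pattern disagrees with the range lo..hi.

    Tests max(0, lo - margin) through hi + margin, so false positives
    just outside the range are caught as well as gaps inside it.
    """
    compiled = re.compile(pattern)
    wrong = []
    for n in range(max(0, lo - margin), hi + margin + 1):
        matched = compiled.fullmatch(str(n)) is not None
        if matched != (lo <= n <= hi):
            wrong.append(n)
    return wrong

# Output quoted in the recipe's doctest for 99-112: no disagreements.
print(find_mismatches('99|10[0-9]|11[0-2]', 99, 112))   # → []

# Hypothetical digit-by-digit pattern for 13-57: it misses numbers like 18.
print(find_mismatches('[1-5][3-7]', 13, 57)[:3])        # → [18, 19, 20]
```

A pattern that is not even valid, such as one containing a reversed character class, makes re.compile raise re.error, which is also a useful signal that a range was handled incorrectly.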

6 comments

tt.ren.cn 15 years ago

If I try to cover 102-110, I get the following string: 1[0-1][2-0]. It is obviously incorrect. If you fix this bug, would you let me know? Thank you very much! My e-mail is tt.ren.cn@gmail.com.

vamshi 13 years, 2 months ago

Can I find this recipe in Java? Or can you explain in detail how the code works?

dimka665 11 years, 1 month ago

Mindaugas Dobilas 8 years, 3 months ago

I got an error about too many recursions when trying the range 1-200.

Dhwanit 7 years, 10 months ago

Created by greg p on Sun, 28 Oct 2007 (PSF)