This function generates a regular expression that matches any number in a given numeric range. You can also run this code online here: http://utilitymill.com/utility/Regex_For_Range
#GPL3
def regex_for_range(min,max):
    """A recursive function to generate a regular expression that matches
    any number in the range between min and max inclusive.

    Usage / doctests:
    >>> regex_for_range(13,57)
    '4[0-9]|3[0-9]|2[0-9]|1[3-9]|5[0-7]'
    >>> regex_for_range(1983,2011)
    '200[0-9]|199[0-9]|198[3-9]|201[0-1]'
    >>> regex_for_range(99,112)
    '99|10[0-9]|11[0-2]'

    Note: doctests are order sensitive, while regular expression engines don't care. So you may need to rewrite these
    doctests if making changes.
    """
    #overhead
    #assert (max>=min) and (min>=0)
    _min,_max=str(min),str(max)
    #calculations
    if min==max:
        return _max
    if len(_max)>len(_min):
        #more digits in max than min, so we pare it down into sub ranges
        #that are the same number of digits. If applicable we also create a pattern to
        #cover the cases of values with number of digits in between that of
        #max and min.
        re_middle_range=None
        if len(_max)>len(_min)+2:
            #digits more than 2 off, create mid range
            re_middle_range='[0-9]{%s,%s}' % (len(_min)+1,len(_max)-1)
        elif len(_max)>len(_min)+1:
            #digits exactly 2 off, create mid range
            re_middle_range='[0-9]{%s}' % (len(_min)+1)
        #pair off into sub ranges
        max_big=max
        min_big=int('1'+('0'*(len(_max)-1)))
        re_big=regex_for_range(min_big,max_big)
        max_small=int('9'*len(_min))
        min_small=min
        re_small=regex_for_range(min_small,max_small)
        if re_middle_range:
            return '|'.join([re_small,re_middle_range,re_big])
        else:
            return '|'.join([re_small,re_big])
    elif len(_max)==len(_min):
        def naive_range(min,max):
            """Simply matches min to max digit by digit. Only creates a
            valid regex when min and max have the same number of digits and
            differ in the last digit alone (same 10s place and above)."""
            _min,_max=str(min),str(max)
            pattern=''
            for i in range(len(_min)):
                if _min[i]==_max[i]:
                    pattern+=_min[i]
                else:
                    pattern+='[%s-%s]' % (_min[i],_max[i])
            return pattern
        if len(_max)==1:
            patterns=[naive_range(min,max)]
        else:
            #this is probably the trickiest part so we'll follow the example of
            #1336 to 1821 through this section
            patterns=[]
            distance=str(max-min) #e.g., distance = 1821-1336 = 485
            increment=int('1'+('0'*(len(distance)-1))) #e.g., 100 when distance is 485
            if min//10==max//10:
                #only the last digit differs, so a naive_range is safe (see its def)
                patterns=[naive_range(min,max)]
            else:
                if increment==1:
                    #less than 10 apart but crossing a 10s boundary, e.g., 102 to 110:
                    #split on 10s so that each piece shares its 10s place
                    increment=10
                #create a function to return a floor to the correct digit position
                #e.g., floor_digit_n(1336) => 1300 when increment is 100
                #(integer division, so the result is a true floor in Python 2 and 3)
                floor_digit_n=lambda x:(x//increment)*increment
                #capture a safe middle range
                #e.g., create regex patterns to cover range between 1400 to 1799 inclusive
                #so in example we should get: 17[0-9]{2}|16[0-9]{2}|15[0-9]{2}|14[0-9]{2}
                for i in range(floor_digit_n(max)-increment,floor_digit_n(min),-increment):
                    len_end_to_replace=len(str(increment))-1
                    if len_end_to_replace==1:
                        pattern='%s[0-9]' % str(i)[:-(len_end_to_replace)]
                    else:
                        pattern='%s[0-9]{%s}' % (str(i)[:-(len_end_to_replace)],len_end_to_replace)
                    patterns.append(pattern)
                #split off ranges outside of increment digits, i.e., what isn't covered in last step.
                #low side: e.g., 1336 -> min=1336, max=1300+(100-1) = 1399
                patterns.append(regex_for_range(min,floor_digit_n(min)+(increment-1)))
                #high side: e.g., 1821 -> min=1800 max=1821
                patterns.append(regex_for_range(floor_digit_n(max),max))
        return '|'.join(patterns)
    else:
        raise ValueError('max value must have more or the same num digits as min')
This is useful for searching a document for a certain year range, say 1987-2005 (which gives 199[0-9]|198[7-9]|200[0-5]), or for restricting form input to a certain range. Can you think of other uses?
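The generated pattern is a bare alternation with no anchors, so on its own it also matches inside longer digit strings. The sketch below shows one way to wrap the 1987-2005 pattern quoted above for the two uses mentioned. The names YEAR_PATTERN, find_years and is_valid_year are made up for this illustration and are not part of the recipe.

```python
import re

# Pattern quoted in the text for the years 1987-2005.
YEAR_PATTERN = '199[0-9]|198[7-9]|200[0-5]'

# Searching free text: the group keeps '|' from splitting the \b anchors,
# and \b stops the pattern from matching part of a longer number.
year_in_text = re.compile(r'\b(?:%s)\b' % YEAR_PATTERN)

# Validating a form field: the whole string must be one year.
year_only = re.compile(YEAR_PATTERN)


def find_years(text):
    """Return every 1987-2005 year that appears as a whole word in text."""
    return year_in_text.findall(text)


def is_valid_year(field):
    """Return True if field is exactly one year in 1987-2005."""
    return year_only.fullmatch(field) is not None


print(find_years('Released in 1986, 1987, 2005 and 2006; id 119990'))
# prints ['1987', '2005']
```

re.fullmatch needs Python 3.4 or later; on older versions an explicit group with \A and \Z anchors does the same job.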
It's a recursive function that splits the range into sub-ranges that can each be covered by a single pattern such as 198[7-9]. Then it stitches those patterns together with the '|' symbol to make one regular expression that matches every number in the range.
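To make the split-and-stitch idea concrete, here is a minimal, self-contained sketch. It is not the recipe's own procedure: it is a simpler greedy variant that walks the range from low to high, and the names split_range, piece_to_pattern and range_to_regex are invented for this illustration. It assumes non-negative integers written without leading zeros.

```python
def split_range(low, high):
    """Split low..high into (start, end, k) pieces.

    Each piece is a run of numbers that share their leading digits, vary
    over one digit class, and end in k free digits, so a single pattern
    can cover it.
    """
    if not 0 <= low <= high:
        raise ValueError('expected 0 <= low <= high')
    pieces = []
    start = low
    while start <= high:
        # Grow the number of trailing "any digit" positions while the block
        # stays aligned, fits under high, and keeps the same digit count.
        k = 0
        while (start % 10 ** (k + 1) == 0
               and start + 10 ** (k + 1) - 1 <= high
               and start >= 10 ** (k + 1)):
            k += 1
        block = 10 ** k
        # Widen the digit just left of the free positions as far as it fits.
        digit = (start // block) % 10
        whole_blocks = (high - start + 1) // block
        last_digit = min(9, digit + whole_blocks - 1)
        end = start + (last_digit - digit + 1) * block - 1
        pieces.append((start, end, k))
        start = end + 1
    return pieces


def piece_to_pattern(start, end, k):
    """Build the pattern for one piece returned by split_range."""
    block = 10 ** k
    # Leading digits shared by every number in the piece (may be empty).
    prefix = str(start // (block * 10)) if start >= block * 10 else ''
    first = (start // block) % 10
    last = (end // block) % 10
    digit = str(first) if first == last else '[%d-%d]' % (first, last)
    if k == 0:
        tail = ''
    elif k == 1:
        tail = '[0-9]'
    else:
        tail = '[0-9]{%d}' % k
    return prefix + digit + tail


def range_to_regex(low, high):
    """Join the per-piece patterns with '|'."""
    return '|'.join(piece_to_pattern(*piece) for piece in split_range(low, high))


print(split_range(13, 57))       # prints [(13, 19, 0), (20, 49, 1), (50, 57, 0)]
print(range_to_regex(13, 57))    # prints 1[3-9]|[2-4][0-9]|5[0-7]
print(range_to_regex(102, 110))  # prints 10[2-9]|110
```

Because this variant walks upward and merges neighbouring tens into one digit class, its output is ordered and grouped differently from the recipe's, but the two should accept the same numbers.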
Please let me know if you find any cases where it doesn't work.
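One way to hunt for failing cases is to test a generated pattern against every integer around the range. The helper below is a hypothetical sketch, not part of the recipe: find_mismatches and its margin parameter are names chosen here. It takes the pattern as a plain string, so it can check output from any generator.

```python
import re


def find_mismatches(pattern, low, high, margin=1000):
    """Return the integers on which pattern disagrees with low..high.

    Checks every n from 0 to high + margin and requires the whole decimal
    string to match. Raises ValueError if the pattern does not compile.
    """
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        raise ValueError('pattern does not compile: %s' % exc)
    mismatches = []
    for n in range(0, high + margin + 1):
        matched = compiled.fullmatch(str(n)) is not None
        if matched != (low <= n <= high):
            mismatches.append(n)
    return mismatches


print(find_mismatches('99|10[0-9]|11[0-2]', 99, 112))  # prints []
print(find_mismatches('10[0-9]|11[0-2]', 99, 112))     # prints [99]
```

A pattern containing a reversed character range such as [2-0] is rejected by Python's re module at compile time, so it is reported as a ValueError and never silently mis-matches.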
For the range 102-110, I get the following string: 1[0-1][2-0]. It is obviously incorrect. If you fix this bug, would you let me know? Thank you very much! My e-mail is tt.ren.cn@gmail.com.
Can I find this recipe in Java? Or can you explain in detail how the code works?
Sorry, but this is buggy. Another solution: http://kuxyz.blogspot.com/2011/03/generating-regexes-that-match-numeric.html
This one works correctly: https://github.com/dimka665/range-regex
I got an error about too many recursions when trying the range 1-200.
Use this gem: https://rubygems.org/gems/regex_for_range