Is it possible to create a simple command line program that I can use to quickly test my regular expressions?
Yes, it is. This simple regular expression tester uses Python's re and argparse modules to let you quickly test a regular expression against lines read from a file or typed at a prompt.
TODO:
- Add Support For Multiple Regular Expression Input
Recent Changes:
- Made the output prettier with a little more whitespace. More bytes, but at least it's easier to read!
"""
Test Regular Expressions
Written by Sunjay Varma -- www.sunjay.ca
"""
import sys, re, argparse, traceback as tb
__version__ = "1.0"
VALID_FLAGS = ['DOTALL', 'I', 'IGNORECASE', 'L', 'LOCALE', 'M', 'MULTILINE',
               'S', 'T', 'TEMPLATE', 'U', 'UNICODE', 'VERBOSE', 'X']
FLAG_SEP = "|"
def parse_argv(argv):
    """Parse the command line arguments (everything after the script name)."""
    parser = argparse.ArgumentParser(
        description="Test A Regular Expression Against Input",
        prog="Regular Expression Tester")
    parser.add_argument('-f', nargs="?", type=argparse.FileType('r'),
        default=None, dest="inputfile",
        help="The input file to use, if omitted, will use sys.stdin")
    parser.add_argument('-p', nargs="?", type=argparse.FileType('r'),
        default=None, dest="regfile",
        help="The input regular expression as a text from a file")
    parser.add_argument('pattern', type=str, help="The pattern to test",
        default="", nargs="?")
    parser.add_argument("-m", type=str, dest="flags", default="",
        help="Flags to include in the regular expression. (e.g. VERBOSE|I)")
    parser.add_argument("-i", action="store_true", dest="after_input",
        help="Run an input interpreter after reading the input file (if "
             "any). This happens automatically if there is no input file.")
    return parser.parse_args(argv)
def get_input(ifile):
    """Return the next line of ifile, or None if reading is interrupted."""
    try:
        return ifile.readline()
    except (EOFError, KeyboardInterrupt):
        return None
def prompt(ifile=sys.stdin, prompt="> "):
    """Print the prompt and return the next line typed, or None if interrupted."""
    try:
        print prompt,
        return ifile.readline()
    except (EOFError, KeyboardInterrupt):
        return None
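def test_line(prog, line=""):
    """Match line against the compiled pattern, print the result, return True on a match."""
    try:
        result = prog.match(line)
        if result:
            # the \b backs over the space left by the trailing-comma print in prompt()
            print "\bHere is the result:"
            for x in ["groups", "start", "end"]:
                print x.capitalize()+"=>", getattr(result, x)()
            print ""
            return True
        else:
            print "No Matches!\n"
            return False
    except Exception:
        # Exception rather than a bare except, so Ctrl-C still reaches main()
        tb.print_exc()
        print "\nAn error occurred!"
        return False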
def test_file(prog, input_file):
    """Test every line of input_file and return (matched, total)."""
    completed, total = 0, 0
    line = get_input(input_file)
    while line:
        print "Testing '%s'"%line.rstrip()
        completed += int(test_line(prog, line))
        total += 1
        line = get_input(input_file)
    print "\nCompleted %i Successful Tests Out of %i"%(completed, total)
    return completed, total
def main():
    args = parse_argv(sys.argv[1:])
    # retrieve the flags (names not in VALID_FLAGS are silently ignored)
    flags = 0
    for flag in (args.flags and args.flags.split(FLAG_SEP) or []):
        if flag in VALID_FLAGS:
            flags |= getattr(re, flag)
    # a pattern given on the command line takes precedence over the -p file
    pat = args.pattern
    if not pat and args.regfile:
        pat = args.regfile.read()
    try:
        prog = re.compile(pat, flags)
    except Exception:
        tb.print_exc()
        print "\nInvalid Pattern"
        sys.exit(1)  # non-zero exit status signals the failure
    # test the pattern
    print "Testing Pattern:",
    if args.regfile:
        print '\n"""\\\n'+pat, '"""\n'
    else:
        print repr(pat), "\n"
    try:
        if args.inputfile:
            test_file(prog, args.inputfile)
            args.inputfile.close()
        if args.after_input or not args.inputfile:
            if args.inputfile:
                print "\nContinuing with Input Testing..."
            completed, total = 0, 0
            line = prompt()
            while line:
                completed += int(test_line(prog, line))
                total += 1
                line = prompt()
            print "\nCompleted %i Successful Tests Out of %i"%(completed, total)
    except KeyboardInterrupt:
        raise SystemExit
if __name__ == "__main__":
    main()
Details:
The script is simple to use, and several command line arguments let you tailor it to your needs.
Here is the usage guideline:
usage: Regular Expression Tester [-h] [-f [INPUTFILE]] [-p [REGFILE]]
                                 [-m FLAGS] [-i]
                                 [pattern]

Test A Regular Expression Against Input

positional arguments:
  pattern         The pattern to test

optional arguments:
  -h, --help      show this help message and exit
  -f [INPUTFILE]  The input file to use, if omitted, will use sys.stdin
  -p [REGFILE]    The input regular expression as a text from a file
  -m FLAGS        Flags to include in the regular expression. (e.g. VERBOSE|I)
  -i              Run an input interpreter after reading the input file (if
                  any). This happens automatically if there is no input file.
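The -m option takes flag names joined with |, as in VERBOSE|I. Here is a minimal Python 3 sketch of how such a string can be folded into a single re flags value, mirroring the getattr(re, flag) loop in main(). The helper name combine_flags and the shortened ALLOWED list are hypothetical and exist only for this illustration; the recipe itself targets Python 2 and uses its own VALID_FLAGS list.

```python
import re

# Hypothetical allow-list for this sketch; the recipe's VALID_FLAGS is longer.
ALLOWED = ("I", "IGNORECASE", "M", "MULTILINE", "S", "DOTALL", "X", "VERBOSE")


def combine_flags(spec, sep="|"):
    """Turn a string such as 'VERBOSE|I' into a combined re flags value."""
    flags = 0
    for name in spec.split(sep):
        name = name.strip()
        if name in ALLOWED:  # unknown names are silently ignored, as in the recipe
            flags |= getattr(re, name)
    return flags


# With VERBOSE the spaces and the comment are ignored; with I, case is ignored.
prog = re.compile(r"hello   # a greeting", combine_flags("VERBOSE|I"))
print(bool(prog.match("HELLO")))
```

Silently skipping unknown names keeps the behaviour close to the recipe, though a stricter tool might prefer to report a misspelled flag.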
Create a batch (.bat) file and name it retest.bat, then place it somewhere on the system path. Add these lines to it:
@echo off
C:\path\to\re_t.py %*
Now you can use it by typing retest in the Command Line.
Example:
Let's say that you have a file full of test emails that you would like to test against a simple regular expression.
Your emails are stored in input_file.txt:
me@example.com
a.nonymous@example.com
name+tag@example.com
name\@tag@example.com (a valid email address containing two @ symbols; this note is not part of the file)
spaces\ are\ allowed@example.com
"spaces may be quoted"@example.com
!#$%&'+-/=.?^`{|}~@[1.0.0.127]
!#$%&'+-/=.?^`{|}~@[IPv6:0123:4567:89AB:CDEF:0123:4567:89AB:CDEF]
me@
@example.com
me.@example.com
.me@example.com
me@example..com
me.example@com
me\@example.com
Your pattern is stored in pattern.txt:
^ # start the pattern
([^@]+) # collect stuff before the @
@ # look for an @
(.+) # collect everything else
$ # end the pattern!
If both pattern.txt and input_file.txt are in the same directory, all you have to run from the command line in that directory is:
retest -p pattern.txt -m X -f input_file.txt
Where:
- retest is the name of the batch (.bat) file
- -p pattern.txt is the pattern file (the pattern could also have been given directly as an argument)
- -m X is the VERBOSE flag of the re module (needed for our commented, spaced-out expression to work)
- -f input_file.txt is the input filename to read emails from
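To see why -m X matters for this pattern, here is a small Python 3 sketch (not part of the recipe, which targets Python 2) that compiles the same pattern text with and without re.VERBOSE. Without the flag, the line breaks, spaces and # comments are treated as literal characters, so the address no longer matches.

```python
import re

# The same text as pattern.txt, inlined so the sketch is self-contained.
PATTERN = r"""
^           # start the pattern
([^@]+)     # collect stuff before the @
@           # look for an @
(.+)        # collect everything else
$           # end the pattern!
"""

verbose = re.compile(PATTERN, re.VERBOSE)  # what -m X asks for
plain = re.compile(PATTERN)                # what you get without -m X

m = verbose.match("me@example.com")
print(m.groups(), m.start(), m.end())
print(plain.match("me@example.com"))  # None: whitespace and comments are literal here
print(verbose.match("me@"))           # None: (.+) needs at least one character
```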
The output should look something like this:
Testing Pattern:
"""\
^ # start the pattern
([^@]+) # collect stuff before the @
@ # look for an @
(.+) # collect everything else
$ # end the pattern!
"""
Testing 'me@example.com'
Here is the result:
Groups=> ('me', 'example.com')
Start=> 0
End=> 14
Testing 'a.nonymous@example.com'
Here is the result:
Groups=> ('a.nonymous', 'example.com')
Start=> 0
End=> 22
Testing 'name+tag@example.com'
Here is the result:
Groups=> ('name+tag', 'example.com')
Start=> 0
End=> 20
Testing 'name\@tag@example.com'
Here is the result:
Groups=> ('name\\', 'tag@example.com')
Start=> 0
End=> 21
Testing 'spaces\ are\ allowed@example.com'
Here is the result:
Groups=> ('spaces\\ are\\ allowed', 'example.com')
Start=> 0
End=> 32
Testing '"spaces may be quoted"@example.com'
Here is the result:
Groups=> ('"spaces may be quoted"', 'example.com')
Start=> 0
End=> 34
Testing '!#$%&'+-/=.?^`{|}~@[1.0.0.127]'
Here is the result:
Groups=> ("!#$%&'+-/=.?^`{|}~", '[1.0.0.127]')
Start=> 0
End=> 30
Testing '!#$%&'+-/=.?^`{|}~@[IPv6:0123:4567:89AB:CDEF:0123:4567:89AB:CDEF]'
Here is the result:
Groups=> ("!#$%&'+-/=.?^`{|}~", '[IPv6:0123:4567:89AB:CDEF:0123:4567:89AB:CDEF]')
Start=> 0
End=> 65
Testing 'me@'
No Matches!
Testing '@example.com'
No Matches!
Testing 'me.@example.com'
Here is the result:
Groups=> ('me.', 'example.com')
Start=> 0
End=> 15
Testing '.me@example.com'
Here is the result:
Groups=> ('.me', 'example.com')
Start=> 0
End=> 15
Testing 'me@example..com'
Here is the result:
Groups=> ('me', 'example..com')
Start=> 0
End=> 15
Testing 'me.example@com'
Here is the result:
Groups=> ('me.example', 'com')
Start=> 0
End=> 14
Testing 'me\@example.com'
Here is the result:
Groups=> ('me\\', 'example.com')
Start=> 0
End=> 15
Completed 13 Successful Tests Out of 15
I made something similar too; try out Regex :). It is much less code and a lot simpler, though. It has just one command, COMPILE, which takes a regex, and then it tests any input against that regex. BTW, nice nonetheless.
If you want to play around with regular expressions in a GUI and see your work evaluated in real time, you might want to check out the "redemo.py" program in your "Python/Tools/Scripts" directory.
Thanks for posting this. It has me thinking.
1. Named match groups make regular expressions easier to read and more maintainable. This already works with named groups; just add groupdict to the list of stuff to be printed: for x in ["groups", "groupdict", "start", "end"]:
2. You use match, which only matches a pattern at the start of the string (here, the start of each input line). search finds a pattern anywhere in the line.
3. As written, this does not handle multiline matches like you often get using re.DOTALL. You could use re.finditer over the whole input to find the matches found now plus the multiline ones.
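The three points above can be sketched in a few lines of Python 3. The patterns and the sample text are made up for illustration and do not come from the recipe.

```python
import re

# 1. Named groups: groupdict() returns the captures keyed by group name.
email = re.compile(r"^(?P<local>[^@]+)@(?P<domain>.+)$")
print(email.match("me@example.com").groupdict())

# 2. match() only tries position 0; search() scans the whole string.
word = re.compile(r"example")
print(word.match("me@example.com"))          # None
print(word.search("me@example.com").span())  # (3, 10)

# 3. finditer() over the whole text finds matches that span line breaks,
#    which testing one line at a time cannot see.
text = "<b>one</b>\n<b>two\nlines</b>\n"
bold = re.compile(r"<b>(.*?)</b>", re.DOTALL)
print([m.group(1) for m in bold.finditer(text)])
```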
@Ray: You raise some interesting points, and for now, I'm just going to address number 3: I agree completely, but as it seems I didn't mention, this script is just for simple testing... You have got me thinking though, and maybe for that kind of purpose it would be better to add a --complex option or something which prints more verbose results.
I hope that answers that. Please reply; I love feedback. :)