
Is it possible to create a simple command line program that I can use to quickly test my regular expressions?

Yes, it is. This simple regular expression tester uses Python's re module, together with argparse for the command line, to let you test a regular expression against a file of sample input, interactive input, or both, in one go.

TODO:

  • Add Support For Multiple Regular Expression Input

Recent Changes:

  • Made the output prettier with a little more whitespace. More bytes, but at least it's easier to read!
Python, 117 lines
"""
Test Regular Expressions

Written by Sunjay Varma -- www.sunjay.ca
"""

import sys, re, argparse, traceback as tb

__version__ = "1.0"

VALID_FLAGS = ['DOTALL', 'I', 'IGNORECASE', 'L', 'LOCALE', 'M', 'MULTILINE', 
				'S', 'T', 'TEMPLATE', 'U', 'UNICODE', 'VERBOSE', 'X']
FLAG_SEP = "|"

def parse_argv(argv):
	parser = argparse.ArgumentParser(description="Test A Regular Expression \
Against Input", prog="Regular Expression Tester")
	parser.add_argument('-f', nargs="?", type=argparse.FileType('r'),
		default=None, help="The input file to use, if omitted, will use \
sys.stdin", dest="inputfile")
	parser.add_argument('-p', nargs="?", type=argparse.FileType('r'),
		default=None, help="The input regular expression as a text from a file",
		dest="regfile")
	parser.add_argument('pattern', type=str, help="The pattern to test", 
		default="", nargs="?")
	parser.add_argument("-m", type=str, help="Flags to include in the regular \
expression. (e.g. VERBOSE|I)", dest="flags", default="")
	parser.add_argument("-i", action="store_true", help="Run an input \
interpreter after reading the input file (if any). This happens automatically \
if there is no input file.", dest="after_input")
	return parser.parse_args(argv)

def get_input(ifile):
	try:
		return ifile.readline()
	except (EOFError, KeyboardInterrupt):
		return None
		
def prompt(ifile=sys.stdin, prompt="> "):
	try:
		print prompt,
		return ifile.readline()
	except (EOFError, KeyboardInterrupt):
		return None

def test_line(prog, line=""):
	try:
		result = prog.match(line)
		if result:
			print "\bHere is the result:"
			for x in ["groups", "start", "end"]:
				print x.capitalize()+"=>", getattr(result, x)()
			print ""
			return True
		else:
			print "No Matches!\n"
			return False
	except Exception: # let KeyboardInterrupt and SystemExit propagate
		tb.print_exc()
		print "\nAn error occurred!"
		return False

def test_file(prog, input_file):
	completed, total = 0, 0
	line = get_input(input_file)
	while line:
		print "Testing '%s'"%line.rstrip()
		completed += int(test_line(prog, line))
		
		total += 1
		line = get_input(input_file)
	
	print "\nCompleted %i Successful Tests Out of %i"%(completed, total)
	return completed, total

def main():
	args = parse_argv(sys.argv[1:])
	# retrieve the flags
	flags = 0
	for flag in (args.flags and args.flags.split(FLAG_SEP) or []):
		if flag in VALID_FLAGS:
			flags |= getattr(re, flag) 
	pat = args.pattern
	if not pat and args.regfile:
		pat = args.regfile.read()
	try:
		prog = re.compile(pat, flags)
	except re.error:
		tb.print_exc()
		print "\nInvalid Pattern"
		sys.exit(1) # non-zero exit status reports the failure
	# test the pattern
	print "Testing Pattern:",
	if args.regfile:
		print '\n"""\\\n'+pat, '"""\n'
	else:
		print repr(pat), "\n"
	try:
		if args.inputfile:
			test_file(prog, args.inputfile)
			args.inputfile.close()
		if args.after_input or not args.inputfile:
			if args.inputfile:
				print "\nContinuing with Input Testing..."
			completed, total = 0, 0
			line = prompt()
			while line:
				completed += int(test_line(prog, line))
				total += 1
				line = prompt()
			
			print "\nCompleted %i Successful Tests Out of %i"%(completed, total)
	except KeyboardInterrupt:
		raise SystemExit
	
if __name__ == "__main__": 
	main()

Details:

The script is simple to use, and its command line arguments let you tailor it to most regular expression testing needs.

Here is the usage message:

usage: Regular Expression Tester [-h] [-f [INPUTFILE]] [-p [REGFILE]]
                                 [-m FLAGS] [-i]
                                 [pattern]

Test A Regular Expression Against Input

positional arguments:
  pattern         The pattern to test

optional arguments:
  -h, --help      show this help message and exit
  -f [INPUTFILE]  The input file to use, if omitted, will use sys.stdin
  -p [REGFILE]    The input regular expression as a text from a file
  -m FLAGS        Flags to include in the regular expression. (e.g. VERBOSE|I)
  -i              Run an input interpreter after reading the input file (if
                  any). This happens automatically if there is no input file.
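
As a rough illustration of how a -m value such as VERBOSE|I is interpreted, the sketch below splits the string on "|" and ORs the matching re constants together, skipping unknown names. It is written for Python 3 (the recipe itself is Python 2), and the helper name parse_flags and the shortened KNOWN_FLAGS list are made up for this example.

```python
import re

# Hypothetical subset of flag names; the recipe's VALID_FLAGS list is longer.
KNOWN_FLAGS = ["I", "IGNORECASE", "M", "MULTILINE", "S", "DOTALL", "X", "VERBOSE"]

def parse_flags(text, sep="|"):
    """Combine names like 'VERBOSE|I' into one re flag value.

    Names that are not in KNOWN_FLAGS are silently skipped.
    """
    flags = 0
    for name in text.split(sep):
        if name in KNOWN_FLAGS:
            flags |= getattr(re, name)
    return flags

combined = parse_flags("VERBOSE|I")
print(combined == (re.VERBOSE | re.IGNORECASE))  # True
```

Because unknown names are skipped rather than reported, a typo such as VERBOS leaves the pattern compiled without that flag.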

On Windows, create a batch (.bat) file named retest.bat and place it somewhere on the system path. Add these lines to it, replacing the path with the location where you saved the script:

@echo off
C:\path\to\re_t.py %*

Now you can run it by typing retest at the command prompt.

Example:

Let's say that you have a file full of test emails that you would like to test against a simple regular expression.

Your email addresses are stored in input_file.txt:

me@example.com
a.nonymous@example.com
name+tag@example.com
name\@tag@example.com
spaces\ are\ allowed@example.com
"spaces may be quoted"@example.com
!#$%&'+-/=.?^`{|}~@[1.0.0.127]
!#$%&'+-/=.?^`{|}~@[IPv6:0123:4567:89AB:CDEF:0123:4567:89AB:CDEF]
me@
@example.com
me.@example.com
.me@example.com
me@example..com
me.example@com
me\@example.com

Your pattern is stored in pattern.txt:

^       # start the pattern
([^@]+) # collect stuff before the @
@       # look for an @
(.+)    # collect everything else
$       # end the pattern!

If both pattern.txt and input_file.txt are in the same directory, all you have to run from the command line in that directory is:

retest -p pattern.txt -m X -f input_file.txt

Where:

  • retest is the name of the batch (.bat) file
  • -p pattern.txt is the pattern file (the pattern could also have been passed directly as an argument)
  • -m X is the VERBOSE flag of the re module, needed here so that the whitespace and comments in the pattern are ignored
  • -f input_file.txt is the input filename to read emails from
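
To see why the X flag matters for this pattern, here is a small Python 3 sketch, separate from the recipe (which is Python 2). It compiles the same pattern text with and without re.VERBOSE and tries both against the first test address. The variable names are chosen just for this example.

```python
import re

# Same pattern text as pattern.txt, embedded as a raw string.
PATTERN = r"""
^       # start the pattern
([^@]+) # collect stuff before the @
@       # look for an @
(.+)    # collect everything else
$       # end the pattern!
"""

verbose = re.compile(PATTERN, re.VERBOSE)  # whitespace and comments ignored
plain = re.compile(PATTERN)                # whitespace and '#' text are literal

result = verbose.match("me@example.com")
print(result.groups())                 # ('me', 'example.com')
print(plain.match("me@example.com"))   # None
```

Without the flag, the newlines, spaces, and comment text are treated as literal characters to match, so ordinary addresses no longer match.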

The output should look something like this:

Testing Pattern:
"""\
^               # start the pattern
([^@]+) # collect stuff before the @
@               # look for an @
(.+)    # collect everything else
$               # end the pattern!
"""

Testing 'me@example.com'
Here is the result:
Groups=> ('me', 'example.com')
Start=> 0
End=> 14

Testing 'a.nonymous@example.com'
Here is the result:
Groups=> ('a.nonymous', 'example.com')
Start=> 0
End=> 22

Testing 'name+tag@example.com'
Here is the result:
Groups=> ('name+tag', 'example.com')
Start=> 0
End=> 20

Testing 'name\@tag@example.com'
Here is the result:
Groups=> ('name\\', 'tag@example.com')
Start=> 0
End=> 21

Testing 'spaces\ are\ allowed@example.com'
Here is the result:
Groups=> ('spaces\\ are\\ allowed', 'example.com')
Start=> 0
End=> 32

Testing '"spaces may be quoted"@example.com'
Here is the result:
Groups=> ('"spaces may be quoted"', 'example.com')
Start=> 0
End=> 34

Testing '!#$%&'+-/=.?^`{|}~@[1.0.0.127]'
Here is the result:
Groups=> ("!#$%&'+-/=.?^`{|}~", '[1.0.0.127]')
Start=> 0
End=> 30

Testing '!#$%&'+-/=.?^`{|}~@[IPv6:0123:4567:89AB:CDEF:0123:4567:89AB:CDEF]'
Here is the result:
Groups=> ("!#$%&'+-/=.?^`{|}~", '[IPv6:0123:4567:89AB:CDEF:0123:4567:89AB:CDEF]')
Start=> 0
End=> 65

Testing 'me@'
No Matches!

Testing '@example.com'
No Matches!

Testing 'me.@example.com'
Here is the result:
Groups=> ('me.', 'example.com')
Start=> 0
End=> 15

Testing '.me@example.com'
Here is the result:
Groups=> ('.me', 'example.com')
Start=> 0
End=> 15

Testing 'me@example..com'
Here is the result:
Groups=> ('me', 'example..com')
Start=> 0
End=> 15

Testing 'me.example@com'
Here is the result:
Groups=> ('me.example', 'com')
Start=> 0
End=> 14

Testing 'me\@example.com'
Here is the result:
Groups=> ('me\\', 'example.com')
Start=> 0
End=> 15

Completed 13 Successful Tests Out of 15
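
The two failures in this run follow directly from the pattern: [^@]+ and .+ each need at least one character, so an address with nothing before or after the @ cannot match. A minimal Python 3 sketch of the same tally is shown here (the recipe itself is Python 2, and the four-address sample list is picked just for illustration):

```python
import re

# Single-line form of the pattern from pattern.txt.
pattern = re.compile(r"^([^@]+)@(.+)$")

# Illustrative subset of the addresses in input_file.txt.
samples = ["me@example.com", "me@", "@example.com", r"name\@tag@example.com"]

passed = 0
for address in samples:
    hit = pattern.match(address)
    if hit:
        passed += 1
        print(address, "=>", hit.groups())
    else:
        print(address, "=> no match")

print("Completed %i Successful Tests Out of %i" % (passed, len(samples)))
```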

4 comments

Lost Protocol 13 years, 4 months ago

I made something similar too, tryout Regex :). Although much less code and a lot simpler. Has just one command called COMPILE that takes regex. Then it starts testing any input to match the regex. BTW nice nonetheless.

Stephen Chappell 13 years, 3 months ago

If you want to play around with regular expression in a GUI to see your work evaluated in real-time, you might want to check out the "redemo.py" program in your "Python/Tools/Scripts" directory.

Ray Barker 13 years, 3 months ago

Thanks for posting this. It has me thinking.

  1. Named match groups make regular expressions easier to read and more maintainable. This already works with named groups, just add groupdict to the list of stuff to be printed: for x in ["groups", "groupdict", "start", "end"]:

  2. You use re.match which only matches patterns at the start of lines. re.search finds patterns anywhere in the line.

  3. As written, this does not handle multiline matches like you often get using re.DOTALL. You could use re.finditer to find all the currently found matches plus multiline matches.
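
The three suggestions above can be sketched roughly as follows. This is a Python 3 illustration, not code from the thread or the recipe, and the patterns and sample strings are invented for the example.

```python
import re

# 1. Named groups: groupdict() maps each group name to its captured text.
named = re.compile(r"(?P<local>[^@]+)@(?P<domain>.+)")
m = named.match("me@example.com")
print(m.groupdict())   # {'local': 'me', 'domain': 'example.com'}

# 2. match() only tries at the start of the string; search() scans forward.
digits = re.compile(r"\d+")
print(digits.match("abc 123"))           # None
print(digits.search("abc 123").group())  # 123

# 3. finditer() over the whole text (rather than line by line) lets
#    DOTALL matches span line breaks.
block = re.compile(r"<<(.+?)>>", re.DOTALL)
text = "<<one>> and <<two\nlines>>"
spans = [hit.group(1) for hit in block.finditer(text)]
print(spans)           # ['one', 'two\nlines']
```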

Sunjay Varma (author) 13 years, 2 months ago

@Ray: You raise some interesting points, and for now, I'm just going to address number 3: I agree completely, but as it seems I didn't mention, this script is just for simple testing... You have got me thinking though, and maybe for that kind of purpose it would be better to add a --complex option or something which prints more verbose results.

I hope that answers that. Please reply, I love feedback. :)