A portable class to carry out all sorts of validation on strings. It uses regular expressions to carry out common validation procedures.

Python, 74 lines
import re

class StringValidator:
    # Compiled patterns are cached on the class and built on first use.
    RE_ALPHA = None
    RE_ALPHANUMERIC = None
    RE_NUMERIC = None
    RE_EMAIL = None

    validateString = ""
    # Custom patterns by name: pattern text until first use, then compiled.
    _patterns = {}

    def __init__(self, validateString):
        self.validateString = validateString

    def isAlpha(self):
        # Note: \D matches any non-digit, so spaces and punctuation pass too.
        if not self.__class__.RE_ALPHA:
            self.__class__.RE_ALPHA = re.compile(r"^\D+$")
        return self.checkStringAgainstRe(self.__class__.RE_ALPHA)

    def isAlphaNumeric(self):
        if not self.__class__.RE_ALPHANUMERIC:
            self.__class__.RE_ALPHANUMERIC = re.compile(r"^[a-zA-Z0-9]+$")
        return self.checkStringAgainstRe(self.__class__.RE_ALPHANUMERIC)

    def isNumeric(self):
        if not self.__class__.RE_NUMERIC:
            self.__class__.RE_NUMERIC = re.compile(r"^\d+$")
        return self.checkStringAgainstRe(self.__class__.RE_NUMERIC)

    def isEmail(self):
        if not self.__class__.RE_EMAIL:
            self.__class__.RE_EMAIL = re.compile(r"^.+@.+\..{2,3}$")
        return self.checkStringAgainstRe(self.__class__.RE_EMAIL)

    def isEmpty(self):
        return self.validateString == ""

    def definePattern(self, re_name, re_pat):
        self._patterns[re_name] = re_pat

    def isValidForPattern(self, re_name):
        if re_name not in self._patterns:
            raise KeyError("No pattern named '%s' stored." % re_name)
        # Compile on first use; later calls reuse the compiled object.
        if isinstance(self._patterns[re_name], str):
            self._patterns[re_name] = re.compile(self._patterns[re_name])
        return self.checkStringAgainstRe(self._patterns[re_name])

    # this method should be considered private (not to be used via the interface)

    def checkStringAgainstRe(self, regexObject):
        return regexObject.search(self.validateString) is not None

# example usage

sv1 = StringValidator("joe@testmail.com")
sv2 = StringValidator("rw__343")

if sv1.isEmail(): print(sv1.validateString + " is a valid e-mail address")
else: print(sv1.validateString + " is not a valid e-mail address")

if sv2.isAlphaNumeric(): print(sv2.validateString + " is a valid alpha-numeric string")
else: print(sv2.validateString + " is not a valid alpha-numeric string")

# note, this is basically the same as the e-mail checker, it just shows
# how to do a custom re; patterns are stored on the class, so one defined
# through sv2 is also visible to sv1
sv2.definePattern("custom_email", r"^.+@.+\..{2,3}$")

if sv1.isValidForPattern("custom_email"): print(sv1.validateString + " is a valid e-mail address")
else: print(sv1.validateString + " is an invalid e-mail address")

Programmers often put a validation engine inside each individual component they build (e.g. Form, Emailer, etc.). That can cause maintainability and consistency problems. A much better approach is to have a general validation object to which the task of performing validation can be delegated. That is exactly what this class attempts to provide.
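As a rough illustration of the delegation idea, the sketch below shows a component that holds no validation rules of its own. It is written for Python 3. `SignupForm` and its `errors` method are hypothetical names invented for this example, and the `StringValidator` here is a cut-down stand-in for the recipe's class, not the full listing.

```python
import re


class StringValidator:
    """Cut-down stand-in for the recipe's class (two checks only)."""

    RE_EMAIL = re.compile(r"^.+@.+\..{2,3}$")  # same pattern as the recipe

    def __init__(self, validateString):
        self.validateString = validateString

    def isEmpty(self):
        return self.validateString == ""

    def isEmail(self):
        return self.RE_EMAIL.search(self.validateString) is not None


class SignupForm:
    """Hypothetical component that delegates every check to the validator."""

    def __init__(self, name, email):
        self.name = name
        self.email = email

    def errors(self):
        problems = []
        if StringValidator(self.name).isEmpty():
            problems.append("name is required")
        if not StringValidator(self.email).isEmail():
            problems.append("email looks invalid")
        return problems


good_form = SignupForm("Joe", "joe@testmail.com")
bad_form = SignupForm("", "not-an-email")
print(good_form.errors())
print(bad_form.errors())
```

If the e-mail rule changes later, only the validator needs editing; a second component (say, an emailer) would pick up the change without any edits of its own.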

15 comments

Steven Cummings 22 years, 9 months ago

Why compile Rx's each time? Why not implement lazy caching of the actual objects produced by compile() and keep them in the validator instance?

Mark Nenadov (author) 22 years, 9 months ago

thanks. Thanks for the suggestion. That makes a lot of sense. It's definitely more efficient. I have made the change to the code; let me know if that's what you were thinking of.

Mark Nenadov (author) 22 years, 9 months ago

actually.. Actually, the other way would be more efficient if I knew I was only going to use the "is" methods once or twice, because it would only compile the Rx's it needs. However, assuming that we will call these "is" methods many times, the new way is more efficient because the Rx's won't be compiled every time.

Either way, I think the new way is better design.

Steven Cummings 22 years, 8 months ago

Lazy Rx compilations. So we could combine the ideas and reuse compiled regular expressions, but only compile them as they're needed. Each time a validity-testing method is invoked, just check whether the pattern has already been compiled: if so, use it; if not, compile it and save the result. I would also add support for arbitrary patterns stored by name that could be added at the class level. One last note: if you want to make the internal service method private, prepend "__" to its name.
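The combination described in this comment could look roughly like the following Python 3 sketch. `LazyValidator`, its `_patterns` table and the two double-underscore helpers are hypothetical names chosen for the example; they are not the recipe's own code.

```python
import re


class LazyValidator:
    # Hypothetical class-level table: name -> pattern text or compiled object.
    _patterns = {"numeric": r"^\d+$", "year": r"^\d{4}$"}

    def __init__(self, text):
        self.text = text

    def isValidForPattern(self, name):
        return self.__matches(self.__compiled(name))

    def __compiled(self, name):
        pattern = self._patterns[name]  # raises KeyError for unknown names
        if isinstance(pattern, str):
            # First use: compile once and keep the result for later calls.
            pattern = re.compile(pattern)
            self._patterns[name] = pattern
        return pattern

    def __matches(self, regex):
        return regex.search(self.text) is not None


validator = LazyValidator("2001")
print(validator.isValidForPattern("year"))
# Only the pattern that was actually used has been compiled so far.
print(isinstance(LazyValidator._patterns["year"], str))
print(isinstance(LazyValidator._patterns["numeric"], str))
# The leading double underscore triggers name mangling, so the helper is
# not reachable under its plain name from outside the class.
print(hasattr(validator, "__matches"))
```

Name mangling only renames the attribute (here to `_LazyValidator__matches`); it discourages outside use rather than preventing it.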

Mark Nenadov (author) 22 years, 8 months ago

Ok.. I implemented lazy compilations. Please review my code. Is there any way that I can directly detect whether a variable is a compiled re object? I tried "isinstance".. but it didn't seem to work.

In other words, are you saying you would like to see the ability to add arbitrary patterns at run time?
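On the question of detecting a compiled re object: one approach that should work regardless of what the pattern class is called in a given Python version is to take the type of an object that `re.compile` actually returns. The sketch below assumes Python 3; `PatternType`, `is_compiled` and `ensure_compiled` are hypothetical helper names.

```python
import re

# Derive the pattern class from a real compiled object instead of
# relying on a particular class name.
PatternType = type(re.compile(""))


def is_compiled(obj):
    """Return True if obj is a compiled regular expression."""
    return isinstance(obj, PatternType)


def ensure_compiled(pattern):
    """Accept pattern text or a compiled pattern; always return compiled."""
    if is_compiled(pattern):
        return pattern
    return re.compile(pattern)


numeric = ensure_compiled(r"^\d+$")
print(is_compiled(numeric))
print(is_compiled(r"^\d+$"))
print(ensure_compiled(numeric) is numeric)
```

Checking for "is it still a string" (as the recipe does) and checking for "is it already compiled" are two sides of the same test; the second also copes with callers who pass in a pre-compiled pattern.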

Steven Cummings 22 years, 8 months ago

Getting closer. Just about there... but I would set those class-level variables to None and then just test the variables like:

class StringValidator:
    RE_ALPHA = None
    ...

    def isAlpha(self):
        if not self.RE_ALPHA:
            self.RE_ALPHA = re.compile("^\D+$")
        return self.checkStringAgainstRe(self.RE_ALPHA)
    ...

Note that the return statement should NOT be under the else clause that you had before! With the return inside the else, a call that finds no pre-compiled Rx in the cache variable performs the compilation but skips the return statement, so the calling expression receives the default return value of None.

The idea I threw out about arbitrary patterns was just storing custom named patterns and lazy-compiling those as well. The user would then have to employ a method with the name as an argument or something similar:

StringValidator.definePattern("year", "^\d{4}$")

strvalidator = StringValidator("text text abc123...")
if strvalidator.validForPattern("year"):
    print("Is a valid year!")
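The return-under-else pitfall described in this comment can be reproduced in a few lines. The earlier revision of the recipe is not shown on this page, so `BuggyValidator` below is only a guess at its shape, and both class names are hypothetical. The sketch assumes Python 3.

```python
import re


class BuggyValidator:
    RE_ALPHA = None

    def __init__(self, text):
        self.text = text

    def isAlpha(self):
        if not BuggyValidator.RE_ALPHA:
            BuggyValidator.RE_ALPHA = re.compile(r"^\D+$")
        else:
            return BuggyValidator.RE_ALPHA.search(self.text) is not None
        # The first call compiles the pattern, then falls off the end of
        # the function, so it returns None.


class FixedValidator:
    RE_ALPHA = None

    def __init__(self, text):
        self.text = text

    def isAlpha(self):
        if not FixedValidator.RE_ALPHA:
            FixedValidator.RE_ALPHA = re.compile(r"^\D+$")
        # The return is outside the if, so every call reaches it.
        return FixedValidator.RE_ALPHA.search(self.text) is not None


buggy_results = [BuggyValidator("abc").isAlpha() for _ in range(2)]
fixed_results = [FixedValidator("abc").isAlpha() for _ in range(2)]
print(buggy_results)  # [None, True]
print(fixed_results)  # [True, True]
```

Because None is falsy, the buggy version makes the very first validation look like a failure, which is easy to miss in testing if the same check happens to be run twice.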

Chris Arndt 22 years, 8 months ago

isValidForPattern. You can't call a method on an uninstantiated class without an instance.

So here's a module function that stores the pattern in a class variable:

def storePattern(re_name, re_pat):
    StringValidator._patterns[re_name] = re_pat

and a method to test the instance against that stored pattern:

    def isValidForPattern(self, re_name):
        if re_name in self._patterns:
            if type(self._patterns[re_name]) == type(''):
                self._patterns[re_name] = re.compile(
                    self._patterns[re_name]
                )
            return self.checkStringAgainstRe(
                self._patterns[re_name])
        else:
            raise KeyError("No pattern named '%s' stored." % re_name)

Here's the testing code (BTW: all testing code should be placed in an

if __name__ == '__main__':

construct):

# test for stored patterns
storePattern('inquisition', '.*inquisition.*')
sv3 = StringValidator('spammish requisition')
print(sv3._patterns)

if sv3.isValidForPattern('inquisition'):
    print("I didn't expect the Spanish inquisition!")
else:
    print("Nobody expects the Spanish inquisition!")

BTW: testing whether each RE is still of type string allows the patterns to be defined as plain strings at the start of the class definition, which is clearer.

Chris Arndt 22 years, 8 months ago

isValidForPattern revisited. Of course you have to add a class variable '_patterns = {}' to the StringValidator class. Or you alter the storePattern method thus:

def storePattern(re_name, re_pat):
    if not hasattr(StringValidator, '_patterns'):
        StringValidator._patterns = {}
    StringValidator._patterns[re_name] = re_pat

Mark Nenadov (author) 22 years, 8 months ago

ok. Ok. I have implemented all recommended changes, please review the code.

nicolas 22 years, 8 months ago

Short email addresses. It is important to note that email addresses _can_ be shorter than 7 characters. Although domain names shorter than 2 letters are not allowed in .com, .net and .org, many ISO country domain registrars do allow one-letter domain names. Denmark, where I live, is one example; see http://www.n.dk

So a@n.dk is in fact a potential valid email address, which would not get through your regexp :-)

Alex Martelli 22 years, 6 months ago

a slight enhancement. Better to bind, e.g., self.__class__.RE_ALPHA = etc., rather than self.RE_ALPHA, so that all other instances of the same class transparently reuse the compiled re (kept per class, not per instance).
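The difference between the two bindings can be seen by inspecting where the compiled object ends up. This is a minimal Python 3 sketch; `PerInstance` and `PerClass` are hypothetical names, and each class carries only one check.

```python
import re


class PerInstance:
    RE_NUMERIC = None

    def isNumeric(self, text):
        if not self.RE_NUMERIC:
            # Assigning through self creates an attribute on this one
            # instance; the class attribute stays None.
            self.RE_NUMERIC = re.compile(r"^\d+$")
        return self.RE_NUMERIC.search(text) is not None


class PerClass:
    RE_NUMERIC = None

    def isNumeric(self, text):
        if not self.__class__.RE_NUMERIC:
            # Assigning through the class makes the compiled object
            # visible to every instance.
            self.__class__.RE_NUMERIC = re.compile(r"^\d+$")
        return self.__class__.RE_NUMERIC.search(text) is not None


first = PerInstance()
first.isNumeric("42")
print(PerInstance.RE_NUMERIC is None)   # True: nothing cached on the class
print("RE_NUMERIC" in vars(first))      # True: cached on the instance only

shared = PerClass()
shared.isNumeric("42")
print(PerClass.RE_NUMERIC is not None)  # True: cached on the class
print(vars(shared))                     # {}: the instance holds nothing
```

With the per-instance binding, each new validator object recompiles the pattern the first time it is asked; with the per-class binding, a later `PerClass()` finds the pattern already in place.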

Mark Nenadov (author) 22 years, 5 months ago

! Great idea! :]

Mark Nenadov (author) 22 years, 5 months ago

! Great idea! :]

Jeff Koncz 20 years, 9 months ago

RE_EMAIL regex. The regex for isEmail will return true even if the string being validated contains space characters. The following regex allows for any character (just as the original) except a space:

self.__class__.RE_EMAIL = re.compile("^[^\s]+@[^\s]+\.[^\s]{2,3}$")
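
A quick side-by-side check of the two patterns, as a Python 3 sketch. The sample addresses and the names `ORIGINAL`, `NO_SPACES` and `results` are made up for the illustration.

```python
import re

ORIGINAL = re.compile(r"^.+@.+\..{2,3}$")               # pattern in the recipe
NO_SPACES = re.compile(r"^[^\s]+@[^\s]+\.[^\s]{2,3}$")  # pattern suggested above

samples = [
    "joe@testmail.com",
    "joe smith@testmail.com",
    "joe@test mail.com",
]

# Map each sample to (accepted by ORIGINAL, accepted by NO_SPACES).
results = {}
for address in samples:
    results[address] = (
        ORIGINAL.search(address) is not None,
        NO_SPACES.search(address) is not None,
    )

for address, (original_ok, no_spaces_ok) in results.items():
    print("%-25s original=%s no_spaces=%s" % (address, original_ok, no_spaces_ok))
```

Both patterns still limit the final label to two or three characters, so neither accepts longer top-level domains.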

Terence MacDonald 19 years, 3 months ago

Classmethod. Why not make definePattern a classmethod i.e.

definePattern = classmethod(definePattern)

then you define patterns without requiring an instance i.e.

StringValidator.definePattern(....)
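
A minimal Python 3 sketch of this suggestion follows. It uses the `@classmethod` decorator, which is equivalent to the `definePattern = classmethod(definePattern)` assignment shown above. The class is a trimmed stand-in for the recipe (custom patterns only), and the "year" pattern is borrowed from an earlier comment.

```python
import re


class StringValidator:
    _patterns = {}

    def __init__(self, validateString):
        self.validateString = validateString

    @classmethod
    def definePattern(cls, re_name, re_pat):
        # Stored on the class, so no instance is needed to register a pattern.
        cls._patterns[re_name] = re_pat

    def isValidForPattern(self, re_name):
        if re_name not in self._patterns:
            raise KeyError("No pattern named '%s' stored." % re_name)
        pattern = self._patterns[re_name]
        if isinstance(pattern, str):
            # Lazy compilation: compile on first use and keep the result.
            pattern = self._patterns[re_name] = re.compile(pattern)
        return pattern.search(self.validateString) is not None


StringValidator.definePattern("year", r"^\d{4}$")

is_year = StringValidator("2001").isValidForPattern("year")
not_year = StringValidator("text text abc123...").isValidForPattern("year")
print(is_year, not_year)
```

A classmethod can still be called through an instance, so existing code such as `sv2.definePattern(...)` would keep working.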

Created by Mark Nenadov on Fri, 27 Jul 2001 (PSF)