ActiveState Code

Recipe 66439: StringValidator


A portable class for validating strings. It uses regular expressions for common checks (alphabetic, alphanumeric, numeric and e-mail). It can also test strings against custom named patterns.

Python
import re


class StringValidator:
    # Compiled regular expressions, cached on the class the first time each
    # check runs, so every instance shares them.
    RE_ALPHA = None
    RE_ALPHANUMERIC = None
    RE_NUMERIC = None
    RE_EMAIL = None

    # Custom named patterns, shared by all instances. A value starts out as a
    # pattern string and is replaced by its compiled form on first use.
    _patterns = {}

    def __init__(self, validateString):
        self.validateString = validateString

    def isAlpha(self):
        if not self.__class__.RE_ALPHA:
            self.__class__.RE_ALPHA = re.compile(r"^[a-zA-Z]+$")
        return self.checkStringAgainstRe(self.__class__.RE_ALPHA)

    def isAlphaNumeric(self):
        if not self.__class__.RE_ALPHANUMERIC:
            self.__class__.RE_ALPHANUMERIC = re.compile(r"^[a-zA-Z0-9]+$")
        return self.checkStringAgainstRe(self.__class__.RE_ALPHANUMERIC)

    def isNumeric(self):
        if not self.__class__.RE_NUMERIC:
            self.__class__.RE_NUMERIC = re.compile(r"^\d+$")
        return self.checkStringAgainstRe(self.__class__.RE_NUMERIC)

    def isEmail(self):
        if not self.__class__.RE_EMAIL:
            self.__class__.RE_EMAIL = re.compile(r"^.+@.+\..{2,3}$")
        return self.checkStringAgainstRe(self.__class__.RE_EMAIL)

    def isEmpty(self):
        return self.validateString == ""

    def definePattern(self, re_name, re_pat):
        self._patterns[re_name] = re_pat

    def isValidForPattern(self, re_name):
        if re_name not in self._patterns:
            raise KeyError("No pattern named '%s' stored." % re_name)
        # Lazy compilation: replace the stored string with its compiled form.
        if isinstance(self._patterns[re_name], str):
            self._patterns[re_name] = re.compile(self._patterns[re_name])
        # Return on every path, not only right after compiling.
        return self.checkStringAgainstRe(self._patterns[re_name])

    # This method should be considered private (not to be used via the interface).
    def checkStringAgainstRe(self, regexObject):
        return regexObject.search(self.validateString) is not None


# example usage

if __name__ == "__main__":
    sv1 = StringValidator("joe@testmail.com")
    sv2 = StringValidator("rw__343")

    if sv1.isEmail():
        print(sv1.validateString + " is a valid e-mail address")
    else:
        print(sv1.validateString + " is not a valid e-mail address")

    if sv2.isAlphaNumeric():
        print(sv2.validateString + " is a valid alpha-numeric string")
    else:
        print(sv2.validateString + " is not a valid alpha-numeric string")

    # This is essentially the same as the built-in e-mail check; it just shows
    # how to register a custom pattern. Patterns are stored on the class, so a
    # pattern defined through sv2 is also available to sv1.
    sv2.definePattern("custom_email", r"^.+@.+\..{2,3}$")

    if sv1.isValidForPattern("custom_email"):
        print(sv1.validateString + " is a valid e-mail address")
    else:
        print(sv1.validateString + " is not a valid e-mail address")

Discussion

Programmers often put a validation engine inside each individual component they build (e.g. Form, Emailer). That can cause maintainability and consistency problems. A better approach is a general validation object to which each component delegates the task of validation. That is exactly what this class attempts to provide.
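To illustrate that delegation, here is a minimal sketch in which a hypothetical `SignupForm` component hands its field checks to the validator instead of embedding its own regexes. `SignupForm`, its field names and its error messages are assumptions for illustration. The `StringValidator` here is a condensed stand-in with only `isEmail` and `isAlphaNumeric`, using the recipe's regexes, so the example runs on its own.

```python
import re


class StringValidator:
    # Condensed stand-in for the recipe's class: same regexes, but compiled
    # once when the class is defined (the recipe compiles them lazily instead).
    RE_ALPHANUMERIC = re.compile(r"^[a-zA-Z0-9]+$")
    RE_EMAIL = re.compile(r"^.+@.+\..{2,3}$")

    def __init__(self, validateString):
        self.validateString = validateString

    def isAlphaNumeric(self):
        return self.RE_ALPHANUMERIC.search(self.validateString) is not None

    def isEmail(self):
        return self.RE_EMAIL.search(self.validateString) is not None


class SignupForm:
    # Hypothetical component: it owns no regexes and delegates every check.
    def __init__(self, username, email):
        self.username = username
        self.email = email

    def errors(self):
        problems = []
        if not StringValidator(self.username).isAlphaNumeric():
            problems.append("username must be alphanumeric")
        if not StringValidator(self.email).isEmail():
            problems.append("email address looks invalid")
        return problems


if __name__ == "__main__":
    print(SignupForm("joe123", "joe@testmail.com").errors())
    print(SignupForm("rw__343", "not-an-address").errors())
```

With this arrangement, changing a rule such as the e-mail regex means editing one class instead of every component that validates e-mail addresses.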

Comments

  1. At 2:45 p.m. on 27 Jul 2001, Steven Cummings said:

    Why compile Rx's each time? Why not implement lazy caching of the actual objects produced by compile() and keep them in the validator instance?

  2. At 4:04 p.m. on 27 Jul 2001, Mark Nenadov (the author) said:

    Thanks for the suggestion. That makes a lot of sense, and it's definitely more efficient. I have made the change to the code; let me know if that's what you were thinking of.

  3. At 4:12 p.m. on 27 Jul 2001, Mark Nenadov (the author) said:

    Actually, the other way would be more efficient if I knew I was only going to call the "is" methods once or twice, because it would compile only the Rx's it needs. However, assuming that we will call these "is" methods many times, the new way is more efficient because the Rx's won't be compiled on every call.

    Either way, I think the new way is better design.

  4. At 3:06 p.m. on 31 Jul 2001, Steven Cummings said:

    Lazy Rx compilations. So we could combine the ideas: reuse compiled regular expressions, but compile them only as they're needed. Each time a validity-testing method is invoked, check whether its pattern has already been compiled; if so, use it; if not, compile it and save it. I would also add support for arbitrary named patterns that could be added at the class level. One last note: if you want to make the internal service method private, prefix its name with "__".

  5. At 11 p.m. on 3 Aug 2001, Mark Nenadov (the author) said:

    OK, I implemented lazy compilation. Please review my code. Is there any way that I can directly detect whether a variable is a compiled re object? I tried "isinstance", but it didn't seem to work.

    Regarding your other suggestion: are you saying you would like to see the ability to add arbitrary patterns at run time?

  6. At 10:37 a.m. on 6 Aug 2001, Steven Cummings said:

    Getting closer. Just about there... but I would set those class-level variables to None and then just test the variables like:

    class StringValidator:
        RE_ALPHA = None
        ...
    
        def isAlpha(self):
            if not self.RE_ALPHA:
                self.RE_ALPHA = re.compile("^\D+$")
            return self.checkStringAgainstRe(self.RE_ALPHA)
        ...
    

    Note that the return statement should NOT be under the else clause that you had before! With that placement, whenever no pre-compiled Rx was found in the cache variable, the method performed the compilation but skipped the return statement, so the calling expression received the default return value of None.

    The idea I threw out about arbitrary patterns was just storing custom named patterns and lazy-compiling those as well. The user would then call a method with the pattern name as an argument, or something similar:

    StringValidator.definePattern("year", "^\d{4}$")
    
    strvalidator = StringValidator("text text abc123...")
    if strvalidator.validForPattern("year"):
        print "Is a valid year!"
    


  7. At 9:30 a.m. on 13 Aug 2001, Chris Arndt said:

    isValidForPattern. You can't call an ordinary method on the class itself without an instance.

    So here's a module function that stores the pattern in a class variable:

    def storePattern(re_name, re_pat):
        StringValidator._patterns[re_name] = re_pat

    and a method to test the instance against that stored pattern:

    def isValidForPattern(self, re_name):
        if re_name in self._patterns:
            if type(self._patterns[re_name]) == type(''):
                self._patterns[re_name] = re.compile(self._patterns[re_name])
            return self.checkStringAgainstRe(self._patterns[re_name])
        else:
            raise KeyError("No pattern named '%s' stored." % re_name)

    Here's the testing code (BTW: all testing code should be placed in an "if __name__ == '__main__':" construct):

    # test for stored patterns
    storePattern('inquisition', '.*inquisition.*')
    sv3 = StringValidator('spammish requisition')
    print(sv3._patterns)

    if sv3.isValidForPattern('inquisition'):
        print("I didn't expect the Spanish Inquisition!")
    else:
        print("Nobody expects the Spanish Inquisition!")

    BTW: checking whether a stored RE is still a string lets you define the patterns as plain strings at the start of the class definition, for more clarity.

  8. At 9:36 a.m. on 13 Aug 2001, Chris Arndt said:

    isValidForPattern revisited. Of course, you also have to add a class variable '_patterns = {}' to the StringValidator class. Or you can alter the storePattern function thus:

    def storePattern(re_name, re_pat):
        if not hasattr(StringValidator, '_patterns'):
            StringValidator._patterns = {}
        StringValidator._patterns[re_name] = re_pat
    
  9. At 7:28 a.m. on 17 Aug 2001, Mark Nenadov (the author) said:

    OK. I have implemented all the recommended changes; please review the code.

  10. At 2:51 p.m. on 18 Aug 2001, Anonymous said:

    Short email addresses. It is important to note that email addresses _can_ be shorter than 7 characters. Although domain names shorter than 2 letters are not allowed in .com, .net and .org, many ISO country-code domain registrars do allow one-letter domain names, for example in Denmark, where I live (see http://www.n.dk).

    So a@n.dk is in fact a potential valid email address, which would not get through your regexp :-)

  11. At 12:35 a.m. on 15 Oct 2001, Alex Martelli said:

    A slight enhancement. It is better to bind, e.g., self.__class__.RE_ALPHA = ..., rather than self.RE_ALPHA, so that all other instances of the same class transparently reuse the compiled re (keeping it per-class, not per-instance).

  12. At 9:34 a.m. on 29 Oct 2001, Mark Nenadov (the author) said:

    ! Great idea! :]

  14. At 10:41 a.m. on 28 Jul 2003, Jeff Koncz said:

    RE_EMAIL regex. The regex for isEmail returns true even if the string being validated contains space characters. The following regex allows any character (just like the original) except whitespace:

    self.__class__.RE_EMAIL = re.compile("^[^\s]+@[^\s]+\.[^\s]{2,3}$")
    
  15. At 3:59 a.m. on 4 Jan 2005, Terence MacDonald said:

    Classmethod. Why not make definePattern a classmethod, i.e.:

    definePattern = classmethod(definePattern)

    Then you can define patterns without requiring an instance, i.e.:

    StringValidator.definePattern(....)
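On comment 4's note about privacy: prefixing a method name with two underscores triggers Python's name mangling. Inside `class StringValidator`, a method named `__checkStringAgainstRe` is stored as `_StringValidator__checkStringAgainstRe`. That hides it from casual callers and from accidental overriding in subclasses, but it is not real access control. In the sketch below, the renamed helper is hypothetical and not part of the recipe.

```python
import re


class StringValidator:
    RE_NUMERIC = re.compile(r"^\d+$")

    def __init__(self, validateString):
        self.validateString = validateString

    def isNumeric(self):
        # Inside the class body the short name works; Python rewrites it
        # to the mangled form automatically.
        return self.__checkStringAgainstRe(self.RE_NUMERIC)

    # Hypothetical "private" variant of the recipe's helper.
    def __checkStringAgainstRe(self, regexObject):
        return regexObject.search(self.validateString) is not None


sv = StringValidator("12345")
print(sv.isNumeric())                                          # True
print(hasattr(sv, "__checkStringAgainstRe"))                   # False
print(hasattr(sv, "_StringValidator__checkStringAgainstRe"))  # True
```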
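Comment 5 asks how to detect whether a variable already holds a compiled regular expression. `isinstance` works once you have the right type to test against. A portable way to get that type is `type(re.compile(""))`, and current Python 3 releases also expose it as `re.Pattern`. Below is a minimal sketch using the portable form; the helper name `compiledPattern` is hypothetical.

```python
import re

# The type of a compiled pattern object, obtained portably (on current
# Python 3 releases this is the same object as re.Pattern).
PatternType = type(re.compile(""))


def compiledPattern(pattern):
    # Hypothetical helper: compile a string pattern, pass compiled ones through.
    if isinstance(pattern, PatternType):
        return pattern
    return re.compile(pattern)


year = compiledPattern(r"^\d{4}$")
print(isinstance(year, PatternType))        # True
print(isinstance(r"^\d{4}$", PatternType))  # False
print(compiledPattern(year) is year)        # True: already compiled, reused as-is
```

This is the same test the recipe's `isValidForPattern` needs, expressed the other way round: instead of asking "is it still a string?", it asks "is it already compiled?".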
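Comment 11's advice to bind the compiled pattern on the class rather than on `self` can be checked directly. Assigning to `self.RE_ALPHA` creates an instance attribute that shadows the class-level `None`, so every new instance compiles again. Assigning to `self.__class__.RE_ALPHA` fills the cache once for all instances. The sketch below counts compile calls with a hypothetical counter; the class names `PerInstance` and `PerClass` are illustrative only.

```python
import re

compile_calls = {"per_instance": 0, "per_class": 0}  # hypothetical counters


class PerInstance:
    RE_ALPHA = None

    def __init__(self, s):
        self.s = s

    def isAlpha(self):
        if not self.RE_ALPHA:
            compile_calls["per_instance"] += 1
            self.RE_ALPHA = re.compile(r"^[a-zA-Z]+$")  # binds on this instance only
        return self.RE_ALPHA.search(self.s) is not None


class PerClass:
    RE_ALPHA = None

    def __init__(self, s):
        self.s = s

    def isAlpha(self):
        if not self.__class__.RE_ALPHA:
            compile_calls["per_class"] += 1
            self.__class__.RE_ALPHA = re.compile(r"^[a-zA-Z]+$")  # shared by all instances
        return self.__class__.RE_ALPHA.search(self.s) is not None


for word in ["abc", "def", "ghi"]:
    PerInstance(word).isAlpha()
    PerClass(word).isAlpha()

print(compile_calls)  # {'per_instance': 3, 'per_class': 1}
```

CPython's `re` module also keeps an internal cache of recently compiled patterns, so repeated `re.compile` calls on the same string are cheaper than a full compile. The class-level binding still avoids those calls altogether.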
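Comments 10 and 14 both probe the e-mail pattern. The sketch below runs the recipe's regex and comment 14's whitespace-excluding variant over a few sample addresses, which are illustrative assumptions. With the patterns as they appear on this page, both accept comment 10's short `a@n.dk`. Because `.{2,3}` caps the final label at three characters, both also reject a four-letter top-level domain such as `.info`. Only the variant rejects an address containing a space. Neither pattern is meant as a complete e-mail validator.

```python
import re

# The recipe's pattern and the variant proposed in comment 14.
RE_EMAIL = re.compile(r"^.+@.+\..{2,3}$")
RE_EMAIL_NO_SPACES = re.compile(r"^[^\s]+@[^\s]+\.[^\s]{2,3}$")

samples = [
    "joe@testmail.com",        # ordinary address
    "a@n.dk",                  # short address from comment 10
    "joe smith@testmail.com",  # contains a space
    "joe@example.info",        # four-letter top-level domain
]

for address in samples:
    original = RE_EMAIL.search(address) is not None
    variant = RE_EMAIL_NO_SPACES.search(address) is not None
    print("%-24s original=%s variant=%s" % (address, original, variant))
```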
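Comment 15's suggestion can be sketched in current Python with the `@classmethod` decorator, which is equivalent to the `definePattern = classmethod(definePattern)` assignment shown in the comment. This condensed version is an assumption, not the recipe's final code. Patterns are stored as strings in the shared class dictionary and compiled on first use. An unknown name raises `KeyError` with the name filled in. The `"year"` pattern is borrowed from comment 6.

```python
import re


class StringValidator:
    _patterns = {}  # name -> pattern string, replaced by its compiled form on first use

    def __init__(self, validateString):
        self.validateString = validateString

    @classmethod
    def definePattern(cls, re_name, re_pat):
        # Callable on the class itself; no instance needed.
        cls._patterns[re_name] = re_pat

    def isValidForPattern(self, re_name):
        if re_name not in self._patterns:
            raise KeyError("No pattern named '%s' stored." % re_name)
        pattern = self._patterns[re_name]
        if isinstance(pattern, str):
            # Lazy compilation: cache the compiled object back into the class dict.
            pattern = self._patterns[re_name] = re.compile(pattern)
        return pattern.search(self.validateString) is not None


StringValidator.definePattern("year", r"^\d{4}$")
print(StringValidator("2001").isValidForPattern("year"))                 # True
print(StringValidator("text text abc123...").isValidForPattern("year"))  # False
```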
