ActiveState Code

Recipe 413487: Quick Test If Strings Are Identifiers


Sometimes you need strings to be true identifiers, for example to represent symbolic names. Smalltalk offers the type 'Symbol' for this purpose. In Python, you need to test this explicitly. Here is a quick way.

Python
def AreStringsIdentifiers(*strings):
    # Let the interpreter validate the names: defining a class whose
    # __slots__ contains a non-identifier raises TypeError.
    try:
        class test(object): __slots__ = strings
    except TypeError:
        return False
    else:
        return True

if __name__ == '__main__':
    print(AreStringsIdentifiers('A', 'B')) # -> True
    print(AreStringsIdentifiers('A', '1B', 'x y')) # -> False

Discussion

This recipe takes advantage of the special treatment of the __slots__ attribute in new-style Python classes. The actual test is performed by the Python interpreter, which raises a TypeError if any slot name is not a true Python identifier. Need I say more?
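
To make the mechanism a little more concrete, here is a minimal sketch that checks the strings one at a time and reports the first one the interpreter rejects. The helper name `first_non_identifier` is hypothetical and not part of the recipe. It builds the throwaway class with the built-in `type()` instead of a `class` statement, on the assumption that both paths apply the same `__slots__` validation.

```python
def first_non_identifier(*strings):
    """Return the first string rejected as a slot name, or None if all pass."""
    for s in strings:
        try:
            # Creating a class with a single slot triggers the interpreter's
            # own identifier check for that one name.
            type('test', (object,), {'__slots__': (s,)})
        except TypeError:
            return s
    return None


if __name__ == '__main__':
    print(first_non_identifier('A', 'B'))
    print(first_non_identifier('A', '1B', 'x y'))
```

Checking one name per class is slower than the recipe's single class definition, so this variant is only worth it when you need to know which string failed.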

Cheers and happy checking for identifiers!

Comments

  1. At 1:51 a.m. on 7 May 2005, Alexander Semenov said:

    It isn't very fast. In Python, to say "it is fast" you must measure. In my experiments, this function:

    import re

    # A letter or underscore, followed by any number of word characters.
    reident = re.compile(r'^[a-zA-Z_]\w*$')
    def AreStringsIdentifiers2(*strings):
        matcher = reident.match
        for s in strings:
            if matcher(s) is None: return False
        return True

    is five times faster.

  2. At 12:48 a.m. on 8 May 2005, Paul Miller said:

    Does it need to be fast? I can't see this function being a bottleneck in any reasonable program.

  3. At 3:42 a.m. on 8 May 2005, Zoran Isailovski (the author) said:

    Speed's not crucial here. Speed is not of such great importance here, as the previous commenter noted. I had better remove the misleading part from the recipe.

  4. At 6:31 a.m. on 8 May 2005, Zoran Isailovski (the author) said:

    About credibility of measurements. "Don't take it for granted, do your own measurements" someone said. So I did:

    • When the very first string is not an identifier, the re-based code above is about 3.3 (not 5) times faster, and this factor does not seem to vary with the string length.

    • When the very last string is not an identifier, the re-based code is about 3.3 times SLOWER for 8-character strings, getting even worse for shorter strings.

    Usually, measurements are expected to increase the accuracy and credibility of a statement. But then, it depends on what and how you measure ...
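
For anyone who wants to repeat the comparison from the comments, here is a rough timing sketch using the standard `timeit` module. The names `by_slots`, `by_regex` and `measure`, the list of 20 eight-character names, and the repeat count are all assumptions chosen for illustration; they are not the setup either commenter used, and no particular ratio should be expected from it.

```python
import re
import timeit


def by_slots(*strings):
    # Same idea as the recipe: let class creation validate the slot names.
    try:
        type('test', (object,), {'__slots__': strings})
    except TypeError:
        return False
    return True


_ident = re.compile(r'^[a-zA-Z_]\w*$')


def by_regex(*strings):
    # Same idea as the regex variant from the first comment.
    for s in strings:
        if _ident.match(s) is None:
            return False
    return True


# Hypothetical test data: 20 valid 8-character names plus one bad string.
good = ['name%04d' % i for i in range(20)]
bad_first = ['1B'] + good   # failure detected at the first string
bad_last = good + ['1B']    # failure detected only at the last string


def measure(func, strings, number=2000):
    """Return the total seconds taken by `number` calls of func(*strings)."""
    return timeit.timeit(lambda: func(*strings), number=number)


if __name__ == '__main__':
    for label, strings in [('bad first', bad_first), ('bad last', bad_last)]:
        print('%-9s slots=%.4fs regex=%.4fs' % (
            label, measure(by_slots, strings), measure(by_regex, strings)))
```

Running both cases side by side shows how much the position of the offending string matters for each approach on your own interpreter.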
