One often needs unique ID strings to tag processes, threads, files or anything else that might be created in quantity. Traditionally these are created based on PIDs or similar system values. But it is not easy to visually recognise these strings, which makes debugging more difficult than it need be. This recipe creates readable and pronounceable ID strings that are much easier to work with.
```python
from random import choice

def GetFriendlyID():
    """
        Create an ID string we can recognise.
        (Think Italian or Japanese or Native American.)
    """
    v = 'aeiou'
    c = 'bdfghklmnprstvw'

    # even positions get a consonant, odd positions a vowel
    return ''.join([choice(v if i % 2 else c) for i in range(8)])

def GetUniqueFriendlyID(used_ids):
    """
        Return an ID that is not in our set of already used IDs,
        or an empty string if none turns up within LIMIT tries.
    """
    # trying infinitely is a bad idea
    LIMIT = 1000

    count = 0
    while count < LIMIT:
        new_id = GetFriendlyID()
        if new_id not in used_ids:
            break
        count += 1
        new_id = ''
    return new_id

if __name__ == '__main__':
    print('some sample unique IDs:')
    used_ids = set()
    for i in range(50):
        new_id = GetUniqueFriendlyID(used_ids)
        if not new_id:
            print('something broke')
            break
        used_ids.add(new_id)
        print(new_id)
```
I hate debugging systems by trying to track IDs that look like "kSbT73oQ". It's much easier on the brain to scan strings like "levokosi". These are also pronounceable, so one can communicate with other members on the team. "Hey dude, what's up with file sadofabi -- it appears to be locked."
In many cases one does not need billions of IDs, so restricting the strings to alternating consonants and vowels, and a reasonable length, gives us plenty of scope. The function GetFriendlyID() generates these of length 8, using the built-in Python pseudo-random generator. If this is not good enough, increase the length or use a stronger source of randomness. But before you do so, realise that the algorithm here allows over 31 million unique strings: with 15 consonants and 5 vowels, that is 15^4 × 5^4 = 31,640,625.
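To make the two knobs mentioned above concrete, here is a minimal sketch, not part of the original recipe, that takes the length as a parameter and draws letters with `secrets.choice` from the standard library instead of `random.choice`. The names `friendly_id` and `id_space` are made up for illustration, and the sketch assumes the same 15 consonants and 5 vowels as the recipe.

```python
import secrets

VOWELS = 'aeiou'
CONSONANTS = 'bdfghklmnprstvw'

def friendly_id(length=8, choose=secrets.choice):
    """Alternate consonants and vowels, starting with a consonant."""
    return ''.join(
        choose(VOWELS if i % 2 else CONSONANTS) for i in range(length)
    )

def id_space(length=8):
    """Count the distinct IDs friendly_id() can produce at this length."""
    n_consonants = (length + 1) // 2  # even positions: 0, 2, 4, ...
    n_vowels = length // 2            # odd positions: 1, 3, 5, ...
    return len(CONSONANTS) ** n_consonants * len(VOWELS) ** n_vowels

print(friendly_id())   # a random 8-letter ID
print(id_space(8))     # 31640625
print(id_space(10))    # 2373046875
```

Going from 8 to 10 letters multiplies the number of possible IDs by 75, which is usually a cheaper fix than changing the alphabet.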
Of course it is sometimes useful to have ID strings based on PIDs or whatever, so they can be tied back to the originating process. In those cases this recipe is not applicable. There are also cases where pronunciation is the main constraint. Here one could run into problems with these letter choices, since in some English accents most vowels sound the same. A word dictionary approach may be more appropriate in those cases, but first consider how likely it is that you will end up with IDs that sound similar.
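For the word dictionary approach, one possible shape is sketched below. It is only an illustration: the word list is a tiny hypothetical sample, and the function name `word_id` and the hyphen separator are assumptions of mine, not part of the recipe. A real list would need to be much longer and vetted so that no two words sound alike.

```python
import random

# Hypothetical word list, for illustration only. A real one should be far
# longer and checked for words that sound similar when spoken aloud.
WORDS = ['amber', 'birch', 'cedar', 'delta', 'ember', 'flint', 'grove', 'heron']

def word_id(n_words=2):
    """Join n_words randomly chosen dictionary words with hyphens."""
    return '-'.join(random.choice(WORDS) for _ in range(n_words))

print(word_id())    # a random two-word ID
print(word_id(3))   # a random three-word ID
```

With only 8 words and 2 words per ID there are just 64 combinations, so the uniqueness check matters even more here than with the letter-based scheme.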
The GetUniqueFriendlyID() function sketches out a typical implementation, where we want to ensure the generated IDs are, in fact, unique. (I have now changed this to use sets, for searches that scale better with large data sets.)
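To see why the uniqueness check is worth having, it helps to estimate how often duplicates turn up. The sketch below is a rough calculation, not part of the recipe. It assumes the 8-letter scheme with 15 consonants and 5 vowels and uniformly random draws, and the function names are hypothetical.

```python
import math

ID_SPACE = 15 ** 4 * 5 ** 4  # 31,640,625 possible 8-letter IDs

def retry_probability(n_used):
    """Chance that one fresh random ID is already among n_used IDs."""
    return n_used / ID_SPACE

def any_collision_probability(n_ids):
    """Birthday-style approximation of the chance that n_ids
    independent draws contain at least one repeat."""
    return 1.0 - math.exp(-n_ids * (n_ids - 1) / (2.0 * ID_SPACE))

for n in (50, 1000, 10000):
    print(n, retry_probability(n), any_collision_probability(n))
```

Under these assumptions a single retry is rare, but with around ten thousand IDs in play a repeat somewhere in the batch becomes more likely than not. Checking against a set catches it, and the retry limit should almost never be reached.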
The module-level code does a simple test so we can inspect some resulting strings. Baby name generator, anyone?