One often needs unique ID strings to tag processes, threads, files or anything else that might be created in quantity. Traditionally these are created based on PIDs or similar system values. But it is not easy to visually recognise these strings, which makes debugging more difficult than it need be. This recipe creates readable and pronounceable ID strings that are much easier to work with.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 | from random import choice
def GetFriendlyID():
"""
Create an ID string we can recognise.
(Think Italian or Japanese or Native American.)
"""
v = 'aeiou'
c = 'bdfghklmnprstvw'
return ''.join([choice(v if i%2 else c) for i in range(8)])
def GetUniqueFriendlyID(used_ids):
"""
Return an ID that is not in our list of already used IDs.
"""
# trying infinitely is a bad idea
LIMIT = 1000
count = 0
while count < LIMIT:
id = GetFriendlyID()
if id not in used_ids:
break
count += 1
id = ''
return id
if __name__ == '__main__':
from sets import Set
print 'some sample unique IDs:'
used_ids = Set()
for i in xrange(50):
id = GetUniqueFriendlyID(used_ids)
if not id:
print 'something broke'
break
used_ids.add(id)
print id
|
I hate debugging systems by trying to track IDs that look like "kSbT73oQ". It's much easier on the brain to scan strings like "levokosi". These are also pronounceable, so one can communicate with other members on the team. "Hey dude, what's up with file sadofabi -- it appears to be locked."
I many cases one does not need billions of IDs, so restricting the strings to alternating consonants and vowels, and a reasonable length, gives us plenty of scope. The function GetFriendlyID() generates these of length 8, using the built-in Python pseudo-random generator. If this is not good enough, increase the length or use a more random module. But before you do so, realise that the algorithm here allows over 40 million unique strings.
Of course it is sometimes useful to have ID strings based on PIDs or whatever, so they can be tied back to the originating process. In those cases this recipe is not applicable. There are also cases where pronunciation is the main constraint. Here one could run into problems with these letter choices, since in some English accents most vowels sound the same. A word dictionary approach may be more appropriate in those cases, but first consider how likely it is that you will end up with IDs that sound similar.
The GetUniqueFriendlyID() function sketches out a typical implementation, where we want to ensure the generated IDs are, in fact, unique. (I have now changed this to use sets, for searches that scale better with large data sets.)
The module-level code does a simple test so we can inspect some resulting strings. Baby name generator, anyone?
Nice idea. I'd drop the 'c' though, since there is a pronunciation overlapping with 'k' in e.g. 'ca' and 'ka'.
Oh, and you probably want a set for the used id container, since you want O(1) membership checking, especially if the used id set is big.
Simplifying. The basic function can be a one-liner, though this might be a little cryptic:
return ''.join([choice('aeiou' if i%2 else 'bcdfghklmnprstvw') for i in range(8)])
I agree. Will do that.
I've updated the code. This version requires Python 2.5 as it uses conditional expressions.
hidding used_ids, make life easier ;). I wrote similar functions early and think that user dont want to manage used id list manually (in case of generating one global id set):
</pre>
Whether a global makes life easier or not is implementation dependent. In the cases I normally use this recipe, the IDs are not saved in any sort of Python structure. Rather they must be looked up in a database (eg: which IDs have already been used for keys) or on disk (eg: which IDs have been used for file names). In these cases the use of the ID must be bound up in getting a new ID, in order to avoid race conditions.
Not much longer and no conditional expressions (compatible with older Python). ''.join([choice('bcdfghklmnprstvw')+choice('aeiou') for i in range(4)])