
One often needs unique ID strings to tag processes, threads, files or anything else that might be created in quantity. Traditionally these are created based on PIDs or similar system values. But it is not easy to visually recognise these strings, which makes debugging more difficult than it need be. This recipe creates readable and pronounceable ID strings that are much easier to work with.

Python, 40 lines
from random import choice

def GetFriendlyID():
	"""
		Create an ID string we can recognise.
		(Think Italian or Japanese or Native American.)
	"""
	v = 'aeiou'
	c = 'bdfghklmnprstvw'
	
	# even positions get a consonant, odd positions a vowel
	return ''.join([choice(v if i%2 else c) for i in range(8)])

def GetUniqueFriendlyID(used_ids):
	"""
		Return an ID that is not in our set of already used IDs,
		or an empty string if none is found within LIMIT attempts.
	"""
	# trying infinitely is a bad idea
	LIMIT = 1000

	id = ''
	count = 0
	while count < LIMIT:
		id = GetFriendlyID()
		if id not in used_ids:
			break
		count += 1
		id = ''
	return id

if __name__ == '__main__':
	# the built-in set replaces the old sets.Set
	print('some sample unique IDs:')
	used_ids = set()
	for i in range(50):
		id = GetUniqueFriendlyID(used_ids)
		if not id:
			print('something broke')
			break
		used_ids.add(id)
		print(id)
		

I hate debugging systems by trying to track IDs that look like "kSbT73oQ". It's much easier on the brain to scan strings like "levokosi". These are also pronounceable, so one can communicate with other members on the team. "Hey dude, what's up with file sadofabi -- it appears to be locked."

In many cases one does not need billions of IDs, so restricting the strings to alternating consonants and vowels, and a reasonable length, gives us plenty of scope. The function GetFriendlyID() generates these of length 8, using Python's built-in pseudo-random generator. If this is not good enough, increase the length or use a stronger source of randomness. But before you do so, realise that with the 15 consonants and 5 vowels used here, the algorithm allows 15^4 x 5^4 = 31,640,625 unique strings, that is, over 31 million.
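The size of the ID space can be worked out directly from the letter sets. The sketch below is not part of the recipe: `id_space` and `friendly_id` are hypothetical helper names, and the `length` and `rng` parameters are assumptions added for illustration. It shows how the number of possible IDs grows with length, and how `random.SystemRandom` from the standard library could be swapped in if the default generator is not good enough.

```python
import random

VOWELS = 'aeiou'
CONSONANTS = 'bdfghklmnprstvw'  # the 15 consonants used by the recipe

def id_space(length=8):
    """Number of distinct IDs of the given length (hypothetical helper)."""
    n_consonants = (length + 1) // 2   # even positions 0, 2, 4, ...
    n_vowels = length // 2             # odd positions 1, 3, 5, ...
    return len(CONSONANTS) ** n_consonants * len(VOWELS) ** n_vowels

def friendly_id(length=8, rng=random):
    """Like GetFriendlyID(), with length and random source as parameters."""
    return ''.join(rng.choice(VOWELS if i % 2 else CONSONANTS)
                   for i in range(length))

if __name__ == '__main__':
    for n in (6, 8, 10):
        print(n, id_space(n))
    # OS-backed randomness instead of the default pseudo-random generator
    print(friendly_id(10, rng=random.SystemRandom()))
```

Each extra consonant-vowel pair multiplies the ID space by 75, so going from length 8 to length 10 is usually enough before reaching for anything more elaborate.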

Of course it is sometimes useful to have ID strings based on PIDs or whatever, so they can be tied back to the originating process. In those cases this recipe is not applicable. There are also cases where pronunciation is the main constraint. Here one could run into problems with these letter choices, since in some English accents most vowels sound the same. A word dictionary approach may be more appropriate in those cases, but first consider how likely it is that you will end up with IDs that sound similar.
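As a rough illustration of the word dictionary approach, the sketch below joins randomly chosen words. The word list, the `word_id` name and the hyphen separator are all assumptions made for this example; a real list would be much longer and chosen so that no two words sound alike.

```python
from random import choice

# Hypothetical word list; a real one would hold hundreds or thousands of words
WORDS = ['apple', 'brick', 'cloud', 'delta', 'ember', 'flint',
         'grove', 'honey', 'ivory', 'jolly', 'koala', 'lemon']

def word_id(n_words=3, sep='-'):
    """Build an ID from n_words randomly chosen dictionary words."""
    return sep.join(choice(WORDS) for _ in range(n_words))

if __name__ == '__main__':
    print(word_id())
    # size of the ID space is len(WORDS) ** n_words
    print(len(WORDS) ** 3)
```

With only 12 words, three-word IDs give 12**3 = 1,728 combinations, so the list has to be far larger before this approach matches the letter-based scheme.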

The GetUniqueFriendlyID() function sketches out a typical implementation, where we want to ensure the generated IDs are, in fact, unique. (I have now changed this to use sets, for searches that scale better with large data sets.)
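To see why the uniqueness check matters, one can estimate how quickly random IDs start to collide. The sketch below uses the standard birthday-problem approximation; `collision_probability` is a hypothetical helper, and the ID space of 15**4 * 5**4 assumes the letter sets from GetFriendlyID().

```python
import math

ID_SPACE = 15 ** 4 * 5 ** 4   # 4 consonant slots and 4 vowel slots

def collision_probability(k, space=ID_SPACE):
    """Approximate chance that k random IDs contain at least one duplicate.

    Uses the birthday approximation 1 - exp(-k*(k-1) / (2*space)).
    """
    return -math.expm1(-k * (k - 1) / (2.0 * space))

if __name__ == '__main__':
    for k in (100, 1000, 10000):
        print(k, round(collision_probability(k), 4))
```

By this estimate a duplicate is very unlikely among a hundred IDs but more likely than not among ten thousand, which is why GetUniqueFriendlyID() checks each candidate against the used set.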

The module-level code does a simple test so we can inspect some resulting strings. Baby name generator, anyone?

8 comments

Matteo Dell'Amico 16 years, 8 months ago

Nice idea. I'd drop the 'c' though, since there is a pronunciation overlapping with 'k' in e.g. 'ca' and 'ka'.

Matteo Dell'Amico 16 years, 8 months ago

Oh, and you probably want a set for the used id container, since you want O(1) membership checking, especially if the used id set is big.

Robin Parmar (author) 16 years, 8 months ago

Simplifying. The basic function can be a one-liner, though this might be a little cryptic:

return ''.join([choice('aeiou' if i%2 else 'bcdfghklmnprstvw') for i in range(8)])

Robin Parmar (author) 16 years, 8 months ago

I agree. Will do that.

Robin Parmar (author) 16 years, 8 months ago

I've updated the code. This version requires Python 2.5 as it uses conditional expressions.

Denis Barmenkov 16 years, 8 months ago

Hiding used_ids makes life easier ;). I wrote similar functions earlier and think that users don't want to manage the used ID list manually (in the case of generating one global ID set):

...
__list_of_used_ids=list()

def GetUniqueFriendlyID():
    """
        Return an ID that is not in our list of already used IDs.
    """
    global __list_of_used_ids

    # trying infinitely is a bad idea
    LIMIT = 1000

    id = ''
    count = 0
    while count
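The rest of that code is cut off, so the following is not a reconstruction of it. It is only a minimal sketch of the same idea, keeping the used IDs out of the caller's hands, written here as a small class rather than a module-level global. The class name `FriendlyIDPool` and its `new_id` method are hypothetical.

```python
from random import choice

class FriendlyIDPool(object):
    """Hands out friendly IDs and remembers which ones it has issued."""

    LIMIT = 1000  # trying infinitely is a bad idea

    def __init__(self):
        self._used = set()

    def new_id(self):
        """Return an unused ID, or '' if none is found within LIMIT tries."""
        for _ in range(self.LIMIT):
            candidate = ''.join(
                choice('aeiou' if i % 2 else 'bdfghklmnprstvw')
                for i in range(8))
            if candidate not in self._used:
                self._used.add(candidate)
                return candidate
        return ''

if __name__ == '__main__':
    pool = FriendlyIDPool()
    print([pool.new_id() for _ in range(5)])
```

Using an instance instead of a global also allows several independent ID pools in one program.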

Robin Parmar (author) 16 years, 8 months ago

Whether a global makes life easier or not is implementation dependent. In the cases I normally use this recipe, the IDs are not saved in any sort of Python structure. Rather they must be looked up in a database (eg: which IDs have already been used for keys) or on disk (eg: which IDs have been used for file names). In these cases the use of the ID must be bound up in getting a new ID, in order to avoid race conditions.
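For the on-disk case, one way to bind the check and the claim together is to let the filesystem do both in one step. The sketch below is an assumption about how that might look, not the author's code: `friendly_id`, `claim_file_id` and its `directory` argument are hypothetical, and it relies on `os.open` with `O_CREAT | O_EXCL`, which fails if the file already exists.

```python
import errno
import os
import tempfile
from random import choice

def friendly_id():
    return ''.join(choice('aeiou' if i % 2 else 'bdfghklmnprstvw')
                   for i in range(8))

def claim_file_id(directory, limit=1000):
    """Create an empty file named after a fresh ID and return the ID.

    Returns '' if no free name is found within `limit` attempts.
    """
    for _ in range(limit):
        candidate = friendly_id()
        path = os.path.join(directory, candidate)
        try:
            # O_EXCL makes creation fail if the name is already taken,
            # so checking and claiming happen in a single step
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise
            continue
        os.close(fd)
        return candidate
    return ''

if __name__ == '__main__':
    scratch = tempfile.mkdtemp()
    print(claim_file_id(scratch))
```

A database could follow the same pattern with a unique key constraint: attempt the insert and retry with a new ID if it is rejected as a duplicate.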

N N 16 years, 8 months ago

Not much longer and no conditional expressions (compatible with older Python). ''.join([choice('bcdfghklmnprstvw')+choice('aeiou') for i in range(4)])

Created by Robin Parmar on Wed, 8 Aug 2007 (PSF)