Sometimes for testing purposes you need to fill a database with randomly generated user names. Or maybe you're just offering distinguishable anonymity to users for whatever reason. Or maybe your product needs a codename! This recipe describes a very simple way to get a bunch of "names".

Python, 19 lines
import random

num_dict_lines = 9900                # A-Z, no apostrophes, approximate!
size_hint = num_dict_lines * 10 * 8  # lines * avg word len * bytes/char
# Ubuntu location; adjust for your system.
with open('/etc/dictionaries-common/words') as names_file:
    rand_words = [ln for ln in names_file.readlines(size_hint) if "'" not in ln]

def gen_name():
    # Clamp the upper bound so a shorter word list cannot raise IndexError.
    last = min(num_dict_lines, len(rand_words) - 1)
    idx = random.randint(2, last)
    return rand_words[idx].strip()

# Generate a few samples.
for i in range(3):
    print(gen_name(), end=' ')

# Printed: Sister Frankfort Babbitt

This works because the dictionary file lists its capitalized proper names at the top. The script simply grabs the topmost lines and assumes that they are names, which is generally true, though you will see exceptions.

Assumptions and limitations:

  • you want it to be fast and local and don't care much about relevance/accuracy of names
  • hard-coded for Ubuntu word dictionary (adjust for location of yours)
  • your word dictionary may be shorter or longer (but it's of little consequence)
  • not all generated words are proper names
  • you could be more accurate by slurping the whole file and grabbing only capitalized words, but it would be slower
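
The last point can be sketched as follows. This is only a minimal sketch, not the recipe's method: the helper name `capitalized_words` is hypothetical, and it assumes the word list holds one word per line. It accepts any iterable of lines, so a small in-memory sample stands in for the dictionary file here.

```python
import random

def capitalized_words(lines):
    """Return the words that start with an uppercase letter.

    Hypothetical helper, not part of the recipe. Assumes one word per
    line; entries containing an apostrophe are skipped, as in the recipe.
    """
    names = []
    for ln in lines:
        word = ln.strip()
        if word and word[0].isupper() and "'" not in word:
            names.append(word)
    return names

# A small in-memory sample stands in for the dictionary file.
sample = ["Aaron\n", "Aaron's\n", "Babbitt\n", "apple\n", "zebra\n"]
names = capitalized_words(sample)
print(random.choice(names))  # one of the capitalized entries
```

With a real word list, an open file object can be passed in place of `sample`, since iterating over a file yields its lines. Every line is examined, which is the slower but more accurate trade-off the list describes.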

2 comments

sebastien.renard 15 years, 5 months ago

Hello,

Why do you bother with the number of lines and the word length?

names_file.readlines() is fine: it reads the whole file and returns it as a list of str.

Micah Elliott (author) 15 years, 5 months ago

I don't want to read the whole file; just the first 20%, which contains the names. The other 80,000 lines are just words. It would probably be cleaner to just call readline 18,000 times.
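
Reading only the first lines can also be done without estimating a byte count. A minimal sketch, assuming `itertools.islice` is an acceptable stand-in for repeated `readline` calls; the helper name `first_lines` is hypothetical, and `io.StringIO` stands in for the dictionary file so the example runs anywhere.

```python
import io
from itertools import islice

def first_lines(fileobj, count):
    """Return up to `count` stripped lines from the start of `fileobj`.

    Hypothetical helper: islice stops after `count` lines, so the rest
    of the file is never turned into list entries.
    """
    return [ln.strip() for ln in islice(fileobj, count)]

# An in-memory file stands in for the word list.
fake_dictionary = io.StringIO("Aaron\nBabbitt\nFrankfort\napple\nzebra\n")
print(first_lines(fake_dictionary, 3))  # ['Aaron', 'Babbitt', 'Frankfort']
```

If the file has fewer than `count` lines, the helper simply returns what is there, so an approximate line count is harmless.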

Created by Micah Elliott on Tue, 28 Oct 2008 (MIT)