Sometimes for testing purposes you need to fill a database with randomly generated user names. Or maybe you're just offering distinguishable anonymity to users for whatever reason. Or maybe your product needs a codename! This post describes a very simple way to get a bunch of "names".
```python
import random

num_dict_lines = 9900  # A-Z, no apostrophes, approximate!
size_hint = num_dict_lines * 10 * 8  # lines * avg word len * bytes/char

# readlines(hint) stops once roughly size_hint bytes have been read,
# so only the top part of the file is loaded.
with open('/etc/dictionaries-common/words') as names_file:
    rand_words = [ln for ln in names_file.readlines(size_hint) if "'" not in ln]

def gen_name():
    # Clamp the upper bound so the index stays valid for shorter dictionaries.
    idx = random.randint(2, min(num_dict_lines, len(rand_words) - 1))
    username = rand_words[idx]
    # print('last:', rand_words[num_dict_lines])
    return username.strip()

# Generate a few samples.
for i in range(3):
    print(gen_name(), end=' ')
# Printed: Sister Frankfort Babbitt
```
The simple reason this works is that a dictionary file has capitalized proper names listed at the top of the file. This script simply grabs the top-most lines and assumes that they're names (which is generally true, but you'll see exceptions).
Assumptions and limitations:
- you want it to be fast and local and don't care much about relevance/accuracy of names
- hard-coded for Ubuntu word dictionary (adjust for location of yours)
- your word dictionary may be shorter or longer (but it's of little consequence)
- not all generated words are proper names
- you could be more accurate by slurping the whole file and grabbing only capitalized words, but it would be slower
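The last point above (read everything, keep only capitalized words) could look roughly like the following. This is only a sketch, not a tested replacement for the script: the helper names `load_capitalized_words` and `pick_name` are made up for illustration, and a small inline sample stands in for the dictionary file so the snippet runs anywhere. Note that acronyms written in capitals would pass this filter too.

```python
import random

def load_capitalized_words(lines):
    """Keep entries that start with an uppercase letter and have no apostrophe."""
    words = []
    for line in lines:
        word = line.strip()
        if word and word[0].isupper() and "'" not in word:
            words.append(word)
    return words

def pick_name(names):
    """Return one random entry from the filtered list."""
    return random.choice(names)

# Inline sample standing in for the dictionary file (hypothetical data).
sample_lines = ["Aaron\n", "Aaron's\n", "Babbitt\n", "Frankfort\n",
                "aardvark\n", "zebra\n"]
names = load_capitalized_words(sample_lines)
print(names)  # ['Aaron', 'Babbitt', 'Frankfort']
print(pick_name(names))

# With a real word list, pass the open file instead of the sample, e.g.:
# with open('/etc/dictionaries-common/words') as f:
#     names = load_capitalized_words(f)
```

Because the filter looks at every line, it no longer depends on guessing how many lines at the top of the file are names, at the cost of reading the whole file once.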
Why do you bother with the number of lines and the word length?
names_file.readlines() is OK; it reads the whole file and returns it as a list of str.
I don't want to read the whole file; just the first 20%, which contains the names. The other 80,000 lines are just words. It would probably be cleaner to just call