Anagram Fetcher « Python recipes

Code for fetching Anagrams out of any given file that contains words seperated by new lines.

      import sys, string

# returns a signature that will be the same
# for a whole class of anagrams
def getSignature( item ):
	item_chars = []
	for i in range(len(item)):
		item_chars.append(item[i])
	item_chars.sort()
	return string.join(item_chars, "")

# goes through a list of strings and strips
# away the new lines at the end
def stripNewLines( list ):
	for i in range(len(list)):
		list[i] = list[i][0:len(list[i])-1]
		return list

# sort a list by a given field
def sortBy(list, n):
	nlist = map(lambda x, n=n: (x[n], x), list)
	nlist.sort()
	return map(lambda (key, x): x, nlist)  
 
if ( len(sys.argv) != 2 ):
	print "You must pass one input file to this program. That file"
	print "should contain lowercase words seperated by new lines."
	print "Example: python dict.txt"
	sys.exit()
 
# prepare our file
inFile = ""
try: 
	inFile = open(sys.argv[1], "r" )
except IOError:
	print "Error opening input file"
	sys.exit()

# read the dictionary and make a new one
# with each word and its signature 
dict = stripNewLines(inFile.readlines())
newDict = []

# go through the dictionary and and make
# a new one with signatures
for item in dict:
	newDict.append([item, getSignature(item)])
 
# sort the new dictionary by signature
newDict = sortBy(newDict, 1)

finalDict = {}

# group the dictionary entries by anagram
for item in newDict:
	if not (finalDict.has_key(item[1])):
		finalDict[item[1]] = [] 
		finalDict[item[1]].append(item[0])

# output only the anagrams
print "Anagrams Found:"

for group in finalDict.values():
	if ( len(group) > 1 ):
        	print group

      

This solution is fairly quick. This code took about six seconds to run on my computer with a input file of over 50,000 words.

Please note that this example is case senstive.

The inspiration for this program comes from a anagram problem in "Programming Pearls" by Jon Bentley.

Tags: algorithms

2 comments

Chris Gemignani 21 years, 11 months ago # | flag

Use a Dict? I've been enjoying Programming Pearls too.

I think a Python dictionary is a more natural structure for holding the anagram keys and associated word lists. Also, list comprehension is helpful assuming you're using Python 2.0 or later.

Dict={}
## list comprehension reads all the lines from inFile eliminating lines with only one char
## returns the full word and a list of letters suitable for key generation
for (word,letters) in [(word[0:len(word)-1],list(word[0:len(word)-1])) for word in inFile.readlines() if len(word)>1]:
    letters.sort()
    ## join the sorted letters to create a key
    ## Dict.setdefault returns [] if the key is not found
    Dict.setdefault(string.join(letters,""),[]).append(word)

## Use list comprehension and string joins to print formatted output
print "Anagrams found:\n"+string.join([string.join(Dict[key],", ") for key in Dict.keys() if len(Dict[key])>1],"\n")

Victor Kryukov 21 years, 3 months ago # | flag

Simplify... Note that word[0:len(word)-1] == word[:-1].

◄	Python recipes (4591)	►
◄	Mark Nenadov's recipes (12)	►

Anagram Fetcher (Python recipe) by Mark Nenadov
ActiveState Code (http://code.activestate.com/recipes/117237/)

2 comments

Tags

Required Modules

Other Information and Tasks

Accounts

Code Recipes

Feedback & Information

ActiveState

Anagram Fetcher (Python recipe) by Mark Nenadov ActiveState Code (http://code.activestate.com/recipes/117237/)

2 comments

Tags

Required Modules

Other Information and Tasks

Accounts

Code Recipes

Feedback & Information

ActiveState

Anagram Fetcher (Python recipe) by Mark Nenadov
ActiveState Code (http://code.activestate.com/recipes/117237/)