Welcome, guest | Sign In | My Account | Store | Cart

Code for fetching Anagrams out of any given file that contains words seperated by new lines.

Python, 65 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
import sys, string

# returns a signature that will be the same
# for a whole class of anagrams
def getSignature( item ):
	item_chars = []
	for i in range(len(item)):
		item_chars.append(item[i])
	item_chars.sort()
	return string.join(item_chars, "")

# goes through a list of strings and strips
# away the new lines at the end
def stripNewLines( list ):
	for i in range(len(list)):
		list[i] = list[i][0:len(list[i])-1]
		return list

# sort a list by a given field
def sortBy(list, n):
	nlist = map(lambda x, n=n: (x[n], x), list)
	nlist.sort()
	return map(lambda (key, x): x, nlist)  
 
if ( len(sys.argv) != 2 ):
	print "You must pass one input file to this program. That file"
	print "should contain lowercase words seperated by new lines."
	print "Example: python dict.txt"
	sys.exit()
 
# prepare our file
inFile = ""
try: 
	inFile = open(sys.argv[1], "r" )
except IOError:
	print "Error opening input file"
	sys.exit()

# read the dictionary and make a new one
# with each word and its signature 
dict = stripNewLines(inFile.readlines())
newDict = []

# go through the dictionary and and make
# a new one with signatures
for item in dict:
	newDict.append([item, getSignature(item)])
 
# sort the new dictionary by signature
newDict = sortBy(newDict, 1)

finalDict = {}

# group the dictionary entries by anagram
for item in newDict:
	if not (finalDict.has_key(item[1])):
		finalDict[item[1]] = [] 
		finalDict[item[1]].append(item[0])

# output only the anagrams
print "Anagrams Found:"

for group in finalDict.values():
	if ( len(group) > 1 ):
        	print group

This solution is fairly quick. This code took about six seconds to run on my computer with a input file of over 50,000 words.

Please note that this example is case senstive.

The inspiration for this program comes from a anagram problem in "Programming Pearls" by Jon Bentley.

2 comments

Chris Gemignani 21 years, 10 months ago  # | flag

Use a Dict? I've been enjoying Programming Pearls too.

I think a Python dictionary is a more natural structure for holding the anagram keys and associated word lists. Also, list comprehension is helpful assuming you're using Python 2.0 or later.

Dict={}
## list comprehension reads all the lines from inFile eliminating lines with only one char
## returns the full word and a list of letters suitable for key generation
for (word,letters) in [(word[0:len(word)-1],list(word[0:len(word)-1])) for word in inFile.readlines() if len(word)>1]:
    letters.sort()
    ## join the sorted letters to create a key
    ## Dict.setdefault returns [] if the key is not found
    Dict.setdefault(string.join(letters,""),[]).append(word)

## Use list comprehension and string joins to print formatted output
print "Anagrams found:\n"+string.join([string.join(Dict[key],", ") for key in Dict.keys() if len(Dict[key])>1],"\n")
Victor Kryukov 21 years, 2 months ago  # | flag

Simplify... Note that word[0:len(word)-1] == word[:-1].

Created by Mark Nenadov on Thu, 14 Mar 2002 (PSF)
Python recipes (4591)
Mark Nenadov's recipes (12)

Required Modules

Other Information and Tasks