Code for fetching Anagrams out of any given file that contains words seperated by new lines.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 | import sys, string
# returns a signature that will be the same
# for a whole class of anagrams
def getSignature( item ):
item_chars = []
for i in range(len(item)):
item_chars.append(item[i])
item_chars.sort()
return string.join(item_chars, "")
# goes through a list of strings and strips
# away the new lines at the end
def stripNewLines( list ):
for i in range(len(list)):
list[i] = list[i][0:len(list[i])-1]
return list
# sort a list by a given field
def sortBy(list, n):
nlist = map(lambda x, n=n: (x[n], x), list)
nlist.sort()
return map(lambda (key, x): x, nlist)
if ( len(sys.argv) != 2 ):
print "You must pass one input file to this program. That file"
print "should contain lowercase words seperated by new lines."
print "Example: python dict.txt"
sys.exit()
# prepare our file
inFile = ""
try:
inFile = open(sys.argv[1], "r" )
except IOError:
print "Error opening input file"
sys.exit()
# read the dictionary and make a new one
# with each word and its signature
dict = stripNewLines(inFile.readlines())
newDict = []
# go through the dictionary and and make
# a new one with signatures
for item in dict:
newDict.append([item, getSignature(item)])
# sort the new dictionary by signature
newDict = sortBy(newDict, 1)
finalDict = {}
# group the dictionary entries by anagram
for item in newDict:
if not (finalDict.has_key(item[1])):
finalDict[item[1]] = []
finalDict[item[1]].append(item[0])
# output only the anagrams
print "Anagrams Found:"
for group in finalDict.values():
if ( len(group) > 1 ):
print group
|
This solution is fairly quick. This code took about six seconds to run on my computer with a input file of over 50,000 words.
Please note that this example is case senstive.
The inspiration for this program comes from a anagram problem in "Programming Pearls" by Jon Bentley.
Tags: algorithms
Use a Dict? I've been enjoying Programming Pearls too.
I think a Python dictionary is a more natural structure for holding the anagram keys and associated word lists. Also, list comprehension is helpful assuming you're using Python 2.0 or later.
Simplify... Note that word[0:len(word)-1] == word[:-1].