I came up with this when I tried to implement some autocompletion feature. The problem is to find a common substring (beginning from the start) for all strings in a given list.
I couldn't find an existing recipe that is doing this task, but I'm wondering if I just didn't look hard enough
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
def getcommonletters(strlist): return ''.join([x for x in zip(*strlist) \ if reduce(lambda a,b:(a == b) and a or None,x)]) def findcommonstart(strlist): strlist = strlist[:] prev = None while True: common = getcommonletters(strlist) if common == prev: break strlist.append(common) prev = common return getcommonletters(strlist)
The code works as follows (commonletters):
the list of strings is converted to a list of tuples. Each tuple contains one character from each word from the same position. This is done by the builtin 'zip' function. Example: zip('hippie','hippo') == \ (('h','h'),('i','i'),('p','p'),('p','p'),('i','o'))
the result is iterated over and every tuple is checked if its elements are identical. This is done by the (equally aclaimed and hated) reduce function. If this test is successful, one tupleelement is added to the resultlist.
Now, we have a string, that contains all letters that are identical in all strings for every position.
Unfortunatelly, this is not enough (otherwise, this would be a cool one liner). If the strings in question had only identical letters at the beginning, we were done. But consider the strings ('hip','hop'). The result would be 'hp', which is clearly not the common start of both strings.
But, if we start over with the resultstring added to the stringlist as long as the resultstring changes, we'll get the one and only true result (a proof of that is left to the reader :-)
PLEASE note that the above statement was wrong with the old codeversion: <pre> def findcommonstart(strlist): strlist = strlist[:] common = getcommonletters(strlist) strlist.append(common) return getcommonletters(strlist) </pre>
The lambda function inside the reduce function does the same as the following regular function (and is, of course, more readable):
<pre> def myeq(op1,op2): if op1 == op2: return op1 </pre>
Please note that the algorithm does work equally well for all iterable types, not only strings (but remove the 'join' first)