Welcome, guest | Sign In | My Account | Store | Cart

Levenshtein-Distance (Python recipe) by Martin Schimmels
ActiveState Code (http://code.activestate.com/recipes/576874/)

calculates the Levenshtein-Distance of two strings.

see: http://en.wikipedia.org/wiki/Levenshtein_distance (english) and: http://de.wikipedia.org/wiki/Levenshtein-Distanz (deutsch)

        ''' Calculates the Levenshtein distance of 2 strings'''

  def printMatrix(m):
      print ' '
      for line in m:
          spTupel = ()
          breite = len(line)
          for column in line:
              spTupel = spTupel + (column, )
          print "%3i"*breite % spTupel

  s1 = raw_input('first word: ')
  s2 = raw_input('second word: ')

  def levenshtein(s1, s2):
    l1 = len(s1)
    l2 = len(s2)

    matrix = [range(l1 + 1)] * (l2 + 1)
    for zz in range(l2 + 1):
      matrix[zz] = range(zz,zz + l1 + 1)
    for zz in range(0,l2):
      for sz in range(0,l1):
        if s1[sz] == s2[zz]:
          matrix[zz+1][sz+1] = min(matrix[zz+1][sz] + 1, matrix[zz][sz+1] + 1, matrix[zz][sz])
        else:
          matrix[zz+1][sz+1] = min(matrix[zz+1][sz] + 1, matrix[zz][sz+1] + 1, matrix[zz][sz] + 1)
    print "That's the Levenshtein-Matrix:"
    printMatrix(matrix)
    return matrix[l2][l1]
      
  distance = levenshtein(s1, s2)		
  print 'The Levenshtein-Distance of ',s1, ' and ', s2, ' is ', distance

      

Tags: text

2 comments

Denis Barmenkov 14 years, 8 months ago # | flag

Wikipedia implementations:

English

http://en.wikibooks.org/wiki/Algorithm_implementation/Strings/Levenshtein_distance#Python

Russian

http://ru.wikibooks.org/wiki/%D0%A0%D0%B0%D1%81%D1%81%D1%82%D0%BE%D1%8F%D0%BD%D0%B8%D0%B5_%D0%9B%D0%B5%D0%B2%D0%B5%D0%BD%D1%88%D1%82%D0%B5%D0%B9%D0%BD%D0%B0#.D0.90.D0.BB.D0.B3.D0.BE.D1.80.D0.B8.D1.82.D0.BC_.D0.9B.D0.B5.D0.B2.D0.B5.D0.BD.D1.88.D1.82.D0.B5.D0.B9.D0.BD.D0.B0_.D0.BD.D0.B0_.D1.8F.D0.B7.D1.8B.D0.BA.D0.B5_Python

Chris S 10 years, 7 months ago # | flag

Ok This code is fantastic and counts the differences in my strings. The problem is that I can not find a way to print "what" it is comparing.

To be more clear. I am trying to compare two strings for phoneme error rates. What the matrix doesn't tell me, is WHICH fones ahve been substituted, delted or inserted for Which phonemes from the other string.

For example if I have

Expected Sentence- I have a baseball Obtained Sentence- I have the best ball Expected Phonemes- ei h a v ae b e s b a l Obtained Phonemes- ei h a v th i b e s t b a l

Ho can I get the LD to return the exact phonemes which were confused and whether they were substituted for another phoneme (and which one), deleted, or inserted?

Created by Martin Schimmels on Mon, 10 Aug 2009 (MIT)

◄	Python recipes (4591)	►
◄	Martin Schimmels's recipes (1)	►

Forks

Levenshtein, my love

Required Modules

(none specified)

Other Information and Tasks

Licensed under the MIT License
Viewed 33077 times
Forked 1 time
Revision 1

Accounts

Code Recipes

Feedback & Information

ActiveState

© 2024 ActiveState Software Inc. All rights reserved. ActiveState®, Komodo®, ActiveState Perl Dev Kit®, ActiveState Tcl Dev Kit®, ActivePerl®, ActivePython®, and ActiveTcl® are registered trademarks of ActiveState. All other marks are property of their respective owners.

Levenshtein-Distance (Python recipe) by Martin Schimmels ActiveState Code (http://code.activestate.com/recipes/576874/)