Welcome, guest | Sign In | My Account | Store | Cart

A string searching algorithm based upon Boyer-Moore string searching, which is considered one of the most efficient string searching algorithms. Boyer-Moore-Horspool only uses the bad-suffix window for matching and is therefore simpler to implement and faster than normal BM.

Python, 31 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
# bmh.py
#
# An implementation of Boyer-Moore-Horspool string searching.
#
# This code is Public Domain.
#
def BoyerMooreHorspool(pattern, text):
    m = len(pattern)
    n = len(text)
    if m > n: return -1
    skip = []
    for k in range(256): skip.append(m)
    for k in range(m - 1): skip[ord(pattern[k])] = m - k - 1
    skip = tuple(skip)
    k = m - 1
    while k < n:
        j = m - 1; i = k
        while j >= 0 and text[i] == pattern[j]:
            j -= 1; i -= 1
        if j == -1: return i + 1
        k += skip[ord(text[k])]
    return -1

if __name__ == '__main__':
    text = "this is the string to search in"
    pattern = "the"
    s = BoyerMooreHorspool(pattern, text)
    print 'Text:',text
    print 'Pattern:',pattern
    if s > -1:
        print 'Pattern \"' + pattern + '\" found at position',s

This algorithm is more efficient than KMP and has low overhead to implement. It is one of the few string searching algorithms that balances memory consumption and speed very well. There have been many comparisons and studies on this and other string searching algorithms in the field and charts can be found which prove the usefulness of this algorithm. According to Moore himself, this algorithm gets faster the larger the pattern.

Created by Nelson Rush on Thu, 7 Mar 2002 (PSF)
Python recipes (4591)
Nelson Rush's recipes (8)

Required Modules

  • (none specified)

Other Information and Tasks