ActiveState Code

Recipe 499314: Find All Indices of a SubString in a Given String


I needed a version of the string.index(sub) function that returns a list of the indices of ALL occurrences of a substring in the string.

Is there a better/shorter/more efficient way to do this? Please share.

Python
def allindices(string, sub, listindex, offset):
    # call as: l = allindices(string, sub, [], 0)
    i = string.find(sub)
    if i == -1:
        return listindex
    # offset is the position of this slice within the original string
    listindex.append(i + offset)
    # recurse on the remainder, starting one character past this match
    return allindices(string[i + 1:], sub, listindex, i + offset + 1)

Discussion

This can be used to build a string.replaceAll(sub1, sub2) sort of operation.
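
A rough sketch of how an index list could drive that kind of replacement. Both `find_all` and `replace_all` are hypothetical names made up for this illustration, not part of the recipe, and the choice to skip matches that overlap an already replaced one is an assumption:

```python
def find_all(string, sub):
    """Return the start index of every occurrence of sub, overlaps included."""
    indices = []
    i = string.find(sub)
    while i != -1:
        indices.append(i)
        i = string.find(sub, i + 1)
    return indices


def replace_all(string, old, new):
    """Hypothetical helper: rebuild string with each match of old replaced by new."""
    pieces = []
    last = 0  # end of the previously replaced match
    for i in find_all(string, old):
        if i < last:
            continue  # overlaps a match that was already replaced; skip it
        pieces.append(string[last:i])
        pieces.append(new)
        last = i + len(old)
    pieces.append(string[last:])
    return "".join(pieces)
```

For plain replacement the built-in str.replace() already does the job; the index list is mainly useful when the positions themselves are needed.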

Comments

  1. At 4:22 a.m. on 14 Dec 2006, Rogier Steehouder said:

    Non-recursive. How about:

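    def allindices(string, sub, listindex=None, offset=0):
        # Create a fresh list per call; a mutable default ([]) would be
        # shared between calls and keep accumulating results.
        if listindex is None:
            listindex = []
        i = string.find(sub, offset)
        while i >= 0:
            listindex.append(i)
            i = string.find(sub, i + 1)
        return listindex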
    

    I prefer non-recursive functions. Also, there is no need to copy the string, because find() and index() can search from an offset themselves.

  2. At 6:36 a.m. on 14 Dec 2006, Michael Foord said:

    No need to pass in an empty list or offset. How about:

    def allindices(string, sub, offset=0, listindex=None):
        # call as l = allindices(string, sub)
        if listindex is None:
            listindex = []
        if (string.find(sub) == -1):
            return listindex
        else:
            offset = string.index(sub) + offset
            listindex.append(offset)
            string = string[(string.index(sub) + 1):]
            # pass arguments in signature order: offset first, then listindex
            return allindices(string, sub, offset + 1, listindex)

  3. At 7:16 a.m. on 14 Dec 2006, Graham Fawcett said:

    Don't reinvent the wheel. You should look at the "re" module in the standard library. To find all of the starting positions of S in T:

    import re
    starts = [match.start() for match in re.finditer(re.escape(S), T)]
    

    Plus it supports substitutions, and lots of other goodness.

  4. At 3:08 a.m. on 25 Jan 2007, Kent Johnson said:

    finditer() won't find overlapping strings. The regular expression methods, including finditer(), find non-overlapping matches. To find overlapping matches you need a loop.
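
To make the overlap point from the last two comments concrete, here is a small sketch. The function names are invented for this example, and the lookahead pattern is one possible way to get overlapping matches out of the re module, offered as an alternative to an explicit find() loop rather than as anything the commenters proposed:

```python
import re


def non_overlapping_indices(text, sub):
    """Start indices as finditer() reports them: each match consumes its text."""
    return [m.start() for m in re.finditer(re.escape(sub), text)]


def overlapping_indices(text, sub):
    """Start indices of every occurrence of sub, overlaps included."""
    # A zero-width lookahead matches at a position without consuming any
    # characters, so the scan can report a match at every position where
    # sub begins.
    pattern = re.compile("(?=" + re.escape(sub) + ")")
    return [m.start() for m in pattern.finditer(text)]
```

With text "aaaa" and sub "aa", the first function gives [0, 2] and the second gives [0, 1, 2], which matches what the find()-based loops above return.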
