Welcome, guest | Sign In | My Account | Store | Cart

Given a list, find the indices used to get the elements from the list in sorted order

Python, 48 lines
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 # Based on a post by Peter Otten on comp.lang.python 2004/09/04. # This uses the 'key' option to sort and the 'sorted' function, both # new in Python 2.4. def permutation_indices(data): return sorted(range(len(data)), key = data.__getitem__) if __name__ == "__main__": import sys chars = "Sing a song of six-pence." print chars print "0 1 2 " print "01234567890123456789012345" print indices = permutation_indices(chars) for i in indices: print "%4s %r" % (i, chars[i]) # Sing a song of six-pence. # 0 1 2 # 01234567890123456789012345 # # 4 ' ' # 6 ' ' # 11 ' ' # 14 ' ' # 18 '-' # 24 '.' # 0 'S' # 5 'a' # 22 'c' # 20 'e' # 23 'e' # 13 'f' # 3 'g' # 10 'g' # 1 'i' # 16 'i' # 2 'n' # 9 'n' # 21 'n' # 8 'o' # 12 'o' # 19 'p' # 7 's' # 15 's' # 17 'x'

Sometimes you need to know how a list would be sorted but not actually sort it. In the original context on comp.lang.python there was a histogram function which returned a list where the length of the list was the number of colors in the color table and the elements the repeat count of the corresponding color. To list the color values in order you need to find the position of the largest item in the histogram and get the color with the same index, then the second largest, etc.

The standard solution for this is the "DSU" (for Decorate-Sort-Undecorate), also called the Schwartzian transform from the Perl community. The 'key' parameter was added to sort in Python 2.4 to make it easier to do DSU. The key insight by Peter Otten is that the list to decorate isn't the input list, it's the indices list, and the input list's __getitem__ can be used to get the key for that index.

We benchmarked several other alternatives (see the thread "list conversion question") and this was the fastest and simplest. Sridhar R 17 years, 3 months ago

O(n) access! Using indexing operation of lists is O(N). You should use dictionary instead.

I myself went into trouble, when manipulating a graph of over 3,00,000 vertices! Hamish Lawson 17 years, 3 months ago

I don't understand what's meant by the first sentence of the discussion. B. Smith-Mannschott 17 years, 3 months ago

O(n) list indexing? I think not. TEST

from time import time
short = *1000
medium = *100000
long = *10000000
for L in [short, medium, long]:
j = 0
t = time()
for i in xrange(1000000):
_ = L[j]
t1 = time()-t
j = len(L)-1
t = time()
for i in xrange(1000000):
_ = L[j]
t2 = time()-t
print "len=%08d, %f, %f" % (len(L), t1, t2)

OUTPUT

len=00001000, 1.901750, 1.935451
len=00100000, 1.913299, 1.920121
len=10000000, 1.960430, 1.937439

It doesn't seem to be O(n) for either n = length of list, or n = index value. Indeed, it would be quite mystifying if it were so as Python "Lists" are not lists in the LISP or "Linked List" sense but rather a sort of array that grows as necessary.

// Ben B. Smith-Mannschott 17 years, 3 months ago

(Big "O" notation) expresses... ...the efficiency of an operation (for example, the time required, or memory consumed) as a function of some measure of the size of the input.

O(n) means that the time require grows at the same rate as the size of the input. In this concrete example, it's unclear if the original poster meant n to reflect the size of the array or the index within the array being accessed.  Created by Andrew Dalke on Sat, 2 Oct 2004 (PSF)