ActiveState Code

Recipe 306862: list permutation order indices


Given a list, find the indices used to get the elements from the list in sorted order

Python
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
# Based on a post by Peter Otten on comp.lang.python 2004/09/04.
# This uses the 'key' option to sort and the 'sorted' function, both
# new in Python 2.4.

def permutation_indices(data):
     return sorted(range(len(data)), key = data.__getitem__)

if __name__ == "__main__":
     import sys
     chars = "Sing a song of six-pence."
     print chars
     print   "0         1         2     "
     print   "01234567890123456789012345"
     print
                    
     indices = permutation_indices(chars)
     for i in indices:
          print "%4s %r" % (i, chars[i])

# Sing a song of six-pence.
# 0         1         2     
# 01234567890123456789012345
# 
#    4 ' '
#    6 ' '
#   11 ' '
#   14 ' '
#   18 '-'
#   24 '.'
#    0 'S'
#    5 'a'
#   22 'c'
#   20 'e'
#   23 'e'
#   13 'f'
#    3 'g'
#   10 'g'
#    1 'i'
#   16 'i'
#    2 'n'
#    9 'n'
#   21 'n'
#    8 'o'
#   12 'o'
#   19 'p'
#    7 's'
#   15 's'
#   17 'x'

Discussion

Sometimes you need to know how a list would be sorted but not actually sort it. In the original context on comp.lang.python there was a histogram function which returned a list where the length of the list was the number of colors in the color table and the elements the repeat count of the corresponding color. To list the color values in order you need to find the position of the largest item in the histogram and get the color with the same index, then the second largest, etc.

The standard solution for this is the "DSU" (for Decorate-Sort-Undecorate), also called the Schwartzian transform from the Perl community. The 'key' parameter was added to sort in Python 2.4 to make it easier to do DSU. The key insight by Peter Otten is that the list to decorate isn't the input list, it's the indices list, and the input list's __getitem__ can be used to get the key for that index.

We benchmarked several other alternatives (see the thread "list conversion question") and this was the fastest and simplest.

Comments

  1. 1. At 3:10 a.m. on 3 oct 2004, Sridhar R said:

    O(n) access! Using indexing operation of lists is O(N). You should use dictionary instead.

    I myself went into trouble, when manipulating a graph of over 3,00,000 vertices!

  2. 2. At 5:22 a.m. on 4 oct 2004, Hamish Lawson said:

    I don't understand what's meant by the first sentence of the discussion.

  3. 3. At 2:42 a.m. on 6 oct 2004, B. Smith-Mannschott said:

    O(n) list indexing? I think not. TEST

    from time import time
    short = [1]*1000
    medium = [1]*100000
    long = [1]*10000000
    for L in [short, medium, long]:
        j = 0
        t = time()
        for i in xrange(1000000):
            _ = L[j]
        t1 = time()-t
        j = len(L)-1
        t = time()
        for i in xrange(1000000):
            _ = L[j]
        t2 = time()-t
        print "len=%08d, %f, %f" % (len(L), t1, t2)
    

    OUTPUT

    len=00001000, 1.901750, 1.935451
    len=00100000, 1.913299, 1.920121
    len=10000000, 1.960430, 1.937439
    

    It doesn't seem to be O(n) for either n = length of list, or n = index value. Indeed, it would be quite mystifying if it were so as Python "Lists" are not lists in the LISP or "Linked List" sense but rather a sort of array that grows as necessary.

    // Ben

  4. 4. At 5:03 a.m. on 6 oct 2004, B. Smith-Mannschott said:

    (Big "O" notation) expresses... ...the efficiency of an operation (for example, the time required, or memory consumed) as a function of some measure of the size of the input.

    O(n) means that the time require grows at the same rate as the size of the input. In this concrete example, it's unclear if the original poster meant n to reflect the size of the array or the index within the array being accessed.

    More information can be found at: http://en.wikipedia.org/wiki/Big_O_notation

  5. 5. At 9:32 a.m. on 6 oct 2004, Andrew Dalke (the author) said:

    fixed typos. "permutaton" -> "permutation" in the title, and fixed first sentence.

Sign in to comment