Welcome, guest | Sign In | My Account | Store | Cart
1

Python lists' .sort method is not guaranteed stable -- items that compare equal may or may not be in unchanged order. Ensuring stability is easy as one of the many application of the commom idiom decorate-sort-undecorate (aka "Schwartzian transform").

Python, 17 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
def stable_sorted_copy(alist, _indices=xrange(sys.maxint)):
    # the 'decorate' step: make a list such that each item
    # is the concatenation of sort-keys in order of decreasing
    # significance -- we'll sort this auxiliary-list
    decorated = zip(alist, _indices)

    # the 'sort' step: just builtin-sort the auxiliary list
    decorated.sort()

    # the 'undecorate' step: extract the items from the
    # decorated, and now correctly sorted, auxiliary list
    return [ item for item, index in decorated ]

def stable_sort_inplace(alist):
    # if "inplace" sorting is desired, simplest is to assign
    # to a slice-of-all-items of the original input list
    alist[:] = stable_sorted_copy(alist)

"decorate-sort-undecorate" is a general and common idiom that allows very flexible and speedy sorting of Python sequences. An auxiliary list is first built (the 'decorate' step) where each item is made up of all sort-keys (in descending order of significance) of the corresponding item of the input sequence (must include all of the information in the whole corresponding item, and/or an index to it so we can fetch it back [or reconstruct it] in the third step). This is then sorted by its builtin sort method without arguments. Finally, the desired sorted-list is extracted/reconstructed by "undecorating" the now-sorted auxiliary-list. Steps 1 and 3 can be performed in several ways, with map, zip, list comprehensions, and explicit loops, all possible. [This idiom is also known as "Schwartzian transform" by analogy with a similar Perl idiom (which, however, implies using map and grep and performing the whole sequence inside one single statement)].

This 'naturally' supplies a sorted _copy_, but if the input-sequence is a list we can just assign to its "include everything" slice to get in-place effect.

This recipe specifically demonstrates using this to achieve a stable sort (where items that compare equal keep the same relative order in the result list as they had in the input sequence). For this specific task, passing an argument to the built-in sort method is no use. More generally, the decorate-sort-undecorate idiom could sometimes be replaced by passing a comparison function argument to sort, but the d-s-u idiom tends to be much faster.

The speed comes from maximally accelerating (by using no Python-coded function an argument to .sort) the O(N log N) part, which dominates sorting time for sequences of substantial length N; the decoration and undecoration steps are both O(N), thus contribute negligible runtime if N is large enough, and reasonably little runtime anyway even for many practical values of N.

Note "optional argument" _indices -- this is used to ensure a single copy of the needed xrange object is generated, at function-definition time -- it can then be reused to decorate-with-indices any argument sequence. See also the recipe "Parallel loop on index and sequence-item" for more on this and related usages.

7 comments

David Goodger 15 years, 10 months ago  # | flag

why class Indices? Why go to the trouble of greating a local class "Indices" when this would do the same job more simply?

decorated = zip(alist, xrange(len(alist)))

Or was it simply out of habit? :-)

Dave Cross 13 years, 8 months ago  # | flag

Not a Schwartzian Transform. I just wanted to correct a couple of errors in your understanding of the Schwartzian Transform in Perl.

Firstly, the form of the Schwartzian Transform is

@sorted = map { ... } sort { ... } map { ... } @unsorted;

There is no use of "grep" as you imply.

Secondly, your example is actually of a more specialised version of the Schwartzian Transform where the middle sort has no associated code block

@sorted = map { ... } sort map { ... } @unsorted;

This special case is usually known as the Guttman-Rosler Transform after the people who first described in in their paper at http://www.sysarch.com/perl/sort_paper.html.

Of course, the original Schwartzian Transform was just a Perl translation of a very common Lisp idiom.

hth,

Dave...

Michael Palmer 12 years, 8 months ago  # | flag

Correction. There seems to be an error in this recipy. It should read like this:

def stable_sorted_copy(alist, _indices=xrange(sys.maxint)):
    decorated = zip(_indices, alist)
    decorated.sort()
    return [ item for index, item in decorated ]

Michael Palmer

Alex Martelli (author) 12 years, 6 months ago  # | flag

you MUST be joking, right, Michael...? It's pretty obvious that if the first items of decorated's pairs are the indices, as in your "correction", the whole function becomes nothing but a humorously complicated "no-op"! As it's unthinkable that you haven't even tried it out before posting, I do understand you're joking, but, don't do that, it might confuse some rank beginner. Presumably your point would be that this recipe's obsoleted by Python 2.3, whose built-in sort IS stable...

Alex

Nick Coghlan 12 years, 3 months ago  # | flag

Not needed for Python 2.3 or later. As Alex mentioned, list.sort in Python 2.3 is actually already stable.

Even better, as of Python 2.4, that artifact of the 2.3 implementation became a guaranteed property of the interpreter.

Connelly Barnes 9 years, 7 months ago  # | flag

How to avoid DSU. In Python 2.4 and above, one can avoid the DSU pattern by using the key argument to the sort() and sorted() builtin functions: simply pass a lambda function which computes the "decoration" part of DSU for the key argument.

Chris Smith 4 years, 6 months ago  # | flag

If you have multiple keys and want to apply them economically, see this recipe .

Add a comment

Sign in to comment

Created by Alex Martelli on Thu, 15 Mar 2001 (PSF)
Python recipes (4552)
Alex Martelli's recipes (27)

Required Modules

  • (none specified)

Other Information and Tasks