Python lists' .sort method is not guaranteed stable -- items that compare equal may or may not be in unchanged order. Ensuring stability is easy as one of the many application of the commom idiom decorate-sort-undecorate (aka "Schwartzian transform").
Python, 17 lines
"decorate-sort-undecorate" is a general and common idiom that allows very flexible and speedy sorting of Python sequences. An auxiliary list is first built (the 'decorate' step) where each item is made up of all sort-keys (in descending order of significance) of the corresponding item of the input sequence (must include all of the information in the whole corresponding item, and/or an index to it so we can fetch it back [or reconstruct it] in the third step). This is then sorted by its builtin sort method without arguments. Finally, the desired sorted-list is extracted/reconstructed by "undecorating" the now-sorted auxiliary-list. Steps 1 and 3 can be performed in several ways, with map, zip, list comprehensions, and explicit loops, all possible. [This idiom is also known as "Schwartzian transform" by analogy with a similar Perl idiom (which, however, implies using map and grep and performing the whole sequence inside one single statement)].
This 'naturally' supplies a sorted _copy_, but if the input-sequence is a list we can just assign to its "include everything" slice to get in-place effect.
This recipe specifically demonstrates using this to achieve a stable sort (where items that compare equal keep the same relative order in the result list as they had in the input sequence). For this specific task, passing an argument to the built-in sort method is no use. More generally, the decorate-sort-undecorate idiom could sometimes be replaced by passing a comparison function argument to sort, but the d-s-u idiom tends to be much faster.
The speed comes from maximally accelerating (by using no Python-coded function an argument to .sort) the O(N log N) part, which dominates sorting time for sequences of substantial length N; the decoration and undecoration steps are both O(N), thus contribute negligible runtime if N is large enough, and reasonably little runtime anyway even for many practical values of N.
Note "optional argument" _indices -- this is used to ensure a single copy of the needed xrange object is generated, at function-definition time -- it can then be reused to decorate-with-indices any argument sequence. See also the recipe "Parallel loop on index and sequence-item" for more on this and related usages.