Welcome, guest | Sign In | My Account | Store | Cart

sort a sequence by applying decoration only as needed

Python, 57 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
    def lazyDSU_sort(seq, keys, warn=False):
        """Return sorted seq, breaking ties by applying keys only when needed.

        If ``warn`` is True then an error will be raised if there were no
        keys remaining to break ties.

        Examples
        ========

        Here, sequences are sorted by length, then sum:

        >>> seq, keys = [[[1, 2, 1], [0, 3, 1], [1, 1, 3], [2], [1]], [
        ...    lambda x: len(x),
        ...    lambda x: sum(x)]]
        ...
        >>> lazyDSU_sort(seq, keys)
        [[1], [2], [1, 2, 1], [0, 3, 1], [1, 1, 3]]

        If ``warn`` is True, an error will be raised if there were not
        enough keys to break ties:

        >>> lazyDSU_sort(seq, keys, warn=True)
        Traceback (most recent call last):
        ...
        ValueError: not enough keys to break ties


        Notes
        =====

        The decorated sort is one of the fastest ways to sort a sequence for
        which special item comparison is desired: the sequence is decorated,
        sorted on the basis of the decoration (e.g. making all letters lower
        case) and then undecorated. If one wants to break ties for items that
        have the same decorated value, a second key can be used. But if the
        second key is expensive to compute then it is inefficient to decorate
        all items with both keys: only those items having identical first key
        values need to be decorated. This function applies keys successively
        only when needed to break ties.
        """
        from collections import defaultdict
        d = defaultdict(list)
        keys = list(keys)
        f = keys.pop(0)
        for a in seq:
            d[f(a)].append(a)
        seq = []
        for k in sorted(d.keys()):
            if len(d[k]) > 1:
                if keys:
                    d[k] = lazyDSU_sort(d[k], keys, warn)
                elif warn:
                  raise ValueError('not enough keys to break ties')
                seq.extend(d[k])
            else:
                seq.append(d[k][0])
        return seq

One problem with the DSU (decorate, sort, undecorate) idiom for sorting is that one must calculate the decoration -- all decorations, if multiple are being used -- before one knows how deep of a decoration is needed to break ties. This recipe applies additional decoration only when needed.

See also this recipe for discussion of DSU sorting.

Created by Chris Smith on Wed, 12 Sep 2012 (MIT)
Python recipes (4591)
Chris Smith's recipes (3)

Required Modules

  • (none specified)

Other Information and Tasks