Welcome, guest | Sign In | My Account | Store | Cart

Modeled after the key= argument to list.sort() and sorted().

Python, 63 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
from heapq import nlargest as _nlargest, nsmallest as _nsmallest
from __builtin__ import min as _min, max as _max
from itertools import tee, izip, imap, count
from operator import itemgetter

__all__ = ['nsmallest', 'nlargest', 'min', 'max']

def nsmallest(n, iterable, key=None):
    """Find the n smallest elements in a dataset.

    Equivalent to:  sorted(iterable, key=key)[:n]
    """
    if key is None:
        return _nsmallest(n, iterable)
    in1, in2 = tee(iterable)
    it = izip(imap(key, in1), count(), in2)                 # decorate
    result = _nsmallest(n, it)
    return map(itemgetter(2), result)                       # undecorate

def nlargest(n, iterable, key=None):
    """Find the n largest elements in a dataset.

    Equivalent to:  sorted(iterable, key=key, reverse=True)[:n]
    """
    if key is None:
        return _nlargest(n, iterable)
    in1, in2 = tee(iterable)
    it = izip(imap(key, in1), count(), in2)                 # decorate
    result = _nlargest(n, it)
    return map(itemgetter(2), result)                       # undecorate


def min(iterable, key=None):
    if key is None:
        return _min(iterable)
    it = iter(iterable)
    try:
        min_elem = it.next()
    except StopIteration:
        raise ValueError('min() arg is an empty sequence')
    min_k = key(min_elem)
    for elem in it:
        k = key(elem)
        if k < min_k:
            min_elem = elem
            min_k = k
    return min_elem

def max(iterable, key=None):
    if key is None:
        return _max(iterable)
    it = iter(iterable)
    try:
        max_elem = it.next()
    except StopIteration:
        raise ValueError('max() arg is an empty sequence')
    max_k = key(max_elem)
    for elem in it:
        k = key(elem)
        if k > max_k:
            max_elem = elem
            max_k = k
    return max_elem

The key= argument is especially useful with bag-style dictionaries where the values represent item counts:

nlargest(10, bag.iteritems(), key=attrgetter(1))   # ten most frequent keys

This code runs under Py2.4. Because it uses itertools, the extension runs at C speed (entirely avoiding the eval-loop). And, because iterators are used throughout, the extended functions retain the memory friendliness of the original functions.

The decorate step includes calls to count(). This serves as tie breaker when equal keys are encountered. As a result, the sorting is stable and there are no full record comparisons.

The new min() and max() functions use regular for-loops. Because the loops are so tight, the eval-loop overhead is small. Since overhead is less than the cost of decorating, comparing tuples, and undecorating, the itertools approach is not used here.