Welcome, guest | Sign In | My Account | Store | Cart

Minimizes the number of comparisons to compute the minimum and maximum of a dataset. Uses 1.5 compares for every element, improving on the more obvious solution that does 2 comparisons per element.

Python, 22 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
import itertools

def minmax(data):
    'Computes the minimum and maximum values in one-pass using only 1.5*len(data) comparisons'
    it = iter(data)
    try:
        lo = hi = next(it)
    except StopIteration:
        raise ValueError('minmax() arg is an empty sequence')
    for x, y in itertools.izip_longest(it, it, fillvalue=lo):
        if x > y:
            x, y = y, x
        if x < lo:
            lo = x
        if y > hi:
            hi = y
    return lo, hi

if __name__ == '__main__':
    import random
    data = [random.random() for i in range(1000)]
    print minmax(data)

Uses a two element presort to cut the search space for the min or the max in half.

The recipe is useful for lists of objects which have very expensive comparisons.

6 comments

Casper S. Jensen 12 years, 6 months ago  # | flag

If the goal of the given code snippet is to improve performance of Python programs, then a much simpler solution using the build-in min and max functions would be both much faster and easier to read. E.g. using cProfile on the suggested function and the following alternative...

def native(data):
    return min(data), max(data)

... I get the following running time when executed on a data array containing 10000000 elements.

Proposed solution: 5 function calls in 1.301 CPU seconds Native min max: 5 function calls in 0.877 CPU seconds

So even if this is a neat solution, it does not really make sense in Python when compared with the built-in functions.

Raymond Hettinger (author) 12 years, 6 months ago  # | flag

Casper, I think you missed the point. The recipe demonstrates the algorithm for minimizing the number of comparisons. It is only helpful when comparisons are expensive (such as with decimal objects). Also, with tools like PyPy, the builtin functions have no advantage over the ones coded in pure python.

Casper S. Jensen 12 years, 6 months ago  # | flag

You are correct, in the case of very expensive comparisons this would make sense - must have missed the last line of your recipe (my bad). The "Fast min/max" title mislead me into thinking otherwise ^^

Steven D'Aprano 12 years, 6 months ago  # | flag

This recipe is useful even in the case of cheap comparisons. Because it only requires a single pass over the data, it can work with iterators where separate calls to min and max will fail.

If your data is large enough that converting to a list is expensive, minmax is useful. On my PC running CPython 2.6, the above minmax gets to within 8% of the speed of calling

def minmax2(it):
    seq = list(it)
    return min(seq), max(seq)

for an iterator of 50000000 ints, and avoids building and garbage collecting a large temporary list.

François Boulogne 12 years, 5 months ago  # | flag

You can use numpy instead. On my computer, from an array size of 10000000 I get min and max in 0.19s with numpy instead of 4.62 with your solution. Numpy benefits of a C implementation.

Firstal Lastif 11 years, 4 months ago  # | flag

Compared to Python built-in min and max:

def minmax(data):
    return (min(data),max(data))

your proposal results in roughly 7% improvement in performance for a size 1e8 float data.