Minimizes the number of comparisons to compute the minimum and maximum of a dataset. Uses 1.5 compares for every element, improving on the more obvious solution that does 2 comparisons per element.
Python, 22 lines
import itertools

def minmax(data):
    'Computes the minimum and maximum values in one-pass using only 1.5*len(data) comparisons'
    it = iter(data)
    try:
        lo = hi = next(it)
    except StopIteration:
        raise ValueError('minmax() arg is an empty sequence')
    for x, y in itertools.izip_longest(it, it, fillvalue=lo):
        if x > y:
            x, y = y, x
        if x < lo:
            lo = x
        if y > hi:
            hi = y
    return lo, hi

if __name__ == '__main__':
    import random
    data = [random.random() for i in range(1000)]
    print minmax(data)
Uses a two-element presort to cut the search space for the min or the max in half.
The recipe is useful for lists of objects that have very expensive comparisons.
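To make the saving concrete, here is a minimal sketch: a Python 3 port of the recipe (zip_longest in place of izip_longest) plus a hypothetical Costly wrapper that counts comparisons. Costly and its counter are illustrative names, not part of the recipe.

```python
import itertools
import random

def minmax(data):
    # Python 3 port of the recipe above (zip_longest replaces izip_longest).
    it = iter(data)
    try:
        lo = hi = next(it)
    except StopIteration:
        raise ValueError('minmax() arg is an empty sequence')
    for x, y in itertools.zip_longest(it, it, fillvalue=lo):
        if x > y:          # one comparison presorts the pair
            x, y = y, x
        if x < lo:         # smaller element only competes for the min
            lo = x
        if y > hi:         # larger element only competes for the max
            hi = y
    return lo, hi

class Costly:
    # Illustrative wrapper that counts every comparison made on it.
    count = 0
    def __init__(self, v):
        self.v = v
    def __lt__(self, other):
        Costly.count += 1
        return self.v < other.v
    def __gt__(self, other):
        Costly.count += 1
        return self.v > other.v

data = [Costly(random.random()) for _ in range(1000)]

Costly.count = 0
lo, hi = minmax(data)
one_pass = Costly.count           # 3 comparisons per pair: ~1.5n total

Costly.count = 0
lo2, hi2 = min(data), max(data)
naive = Costly.count              # n-1 comparisons each: ~2n total

print(one_pass, naive)
```

For 1000 elements this counts 1500 comparisons for minmax versus 1998 for separate min and max calls, matching the 1.5n vs. 2n claim.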
If the goal of the given code snippet is to improve the performance of Python programs, then a much simpler solution using the built-in min and max functions would be both much faster and easier to read. E.g., using cProfile on the suggested function and the following alternative...
... I get the following running times when executed on a data array containing 10,000,000 elements.

Proposed solution: 5 function calls in 1.301 CPU seconds
Native min/max:    5 function calls in 0.877 CPU seconds
So even if this is a neat solution, it does not really make sense in Python when compared with the built-in functions.
Casper, I think you missed the point. The recipe demonstrates the algorithm for minimizing the number of comparisons. It is only helpful when comparisons are expensive (such as with decimal objects). Also, with tools like PyPy, the builtin functions have no advantage over the ones coded in pure python.
You are correct, in the case of very expensive comparisons this would make sense - I must have missed the last line of your recipe (my bad). The "Fast min/max" title misled me into thinking otherwise ^^
This recipe is useful even in the case of cheap comparisons. Because it only requires a single pass over the data, it can work with iterators where separate calls to min and max will fail.
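The single-pass property can be demonstrated with a one-shot generator. The sketch below assumes a Python 3 port of the recipe (zip_longest in place of izip_longest); the generator expression is just an arbitrary stream of values.

```python
import itertools

def minmax(data):
    # Python 3 port of the recipe (zip_longest instead of izip_longest).
    it = iter(data)
    try:
        lo = hi = next(it)
    except StopIteration:
        raise ValueError('minmax() arg is an empty sequence')
    for x, y in itertools.zip_longest(it, it, fillvalue=lo):
        if x > y:
            x, y = y, x
        if x < lo:
            lo = x
        if y > hi:
            hi = y
    return lo, hi

# A one-shot iterator: minmax handles it in a single pass.
gen = (x * x % 7919 for x in range(100000))
lo, hi = minmax(gen)

# Separate calls fail: min() exhausts the generator, so max() sees it empty.
gen = (x * x % 7919 for x in range(100000))
min(gen)
try:
    max(gen)
    max_failed = False
except ValueError:
    max_failed = True
```

Here max() raises ValueError because min() has already consumed the entire generator.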
If your data is large enough that converting to a list is expensive, minmax is useful. On my PC running CPython 2.6, the above minmax gets to within 8% of the speed of calling
for an iterator of 50000000 ints, and avoids building and garbage collecting a large temporary list.
You can use numpy instead. On my computer, for an array of size 10,000,000 I get the min and max in 0.19 s with numpy versus 4.62 s with your solution. NumPy benefits from a C implementation.
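A minimal sketch of the NumPy approach; the array size is reduced here for illustration, and the seeded generator is just to make the example reproducible.

```python
import numpy as np

# C-level reductions: two passes over the array, but each at native speed.
a = np.random.default_rng(0).random(1_000_000)
lo, hi = a.min(), a.max()
print(lo, hi)
```

Note this still makes two passes over the data, so it only applies when the data fits in an array, not to a one-shot iterator.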
Compared to the Python built-in min/max, your proposal results in roughly a 7% improvement in performance for 1e8 float data.