O(n) quicksort style algorithm for looking up data based on rank order. Useful for finding medians, percentiles, quartiles, and deciles. Equivalent to data[n] when the data is already sorted.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
import random def select(data, n): "Find the nth rank ordered element (the least value has rank 0)." data = list(data) if not 0 <= n < len(data): raise ValueError('not enough elements for the given rank') while True: pivot = random.choice(data) pcount = 0 under, over = ,  uappend, oappend = under.append, over.append for elem in data: if elem < pivot: uappend(elem) elif elem > pivot: oappend(elem) else: pcount += 1 if n < len(under): data = under elif n < len(under) + pcount: return pivot else: data = over n -= len(under) + pcount
The input data can be any iterable.<pre></pre> The randomization of pivots makes the algorithm perform consistently even with unfavorable data orderings (the same kind that wreak havoc on quicksort). Makes approximately lg2(N) calls to random.choice().<pre></pre> Revised to include the pivot counts after David Eppstein pointed out that the originally posted algorithm ran slowly when all the inputs were equal.