fork of http://code.activestate.com/recipes/269554-select-the-nth-smallest-element/ O(n) quicksort style algorithm for looking up data based on rank order. Useful for finding medians, percentiles, quartiles, and deciles. Equivalent to [data[n] for n in positions] when the data is already sorted.
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 | import random
def select(data, positions, start=0, end=None):
    '''For every n in *positions* find nth rank ordered element in *data*
        inplace select'''
    if not end: end = len(data) - 1
    if end < start:
        return []
    if end == start:
        return [data[start]]
    pivot_rand_i = random.randrange(start,end)
    pivot_rand = data[pivot_rand_i] # get random pivot
    data[end], data[pivot_rand_i] = data[pivot_rand_i], data[end]
    pivot_i = start
    for i in xrange(start, end): # partitioning about the pivot
        if data[i] < pivot_rand:
            data[pivot_i], data[i] = data[i], data[pivot_i]
            pivot_i += 1
    data[end], data[pivot_i] = data[pivot_i], data[end]
    under_positions, over_positions, mid_positions = [],[],[]
    for position in positions:
        if position == pivot_i:
            mid_positions.append(position)
        elif position < pivot_i:
            under_positions.append(position)
        else:
            over_positions.append(position)
    result = []
    if len(under_positions) > 0:
        result.extend(select(data, under_positions, start, pivot_i-1))
    if len(mid_positions) > 0:
        result.extend([data[position] for position in mid_positions])
    if len(over_positions) > 0:
        result.extend(select(data, over_positions, pivot_i+1, end))
    return result
 | 
The input data can be any iterable.<pre></pre> The randomization of pivots makes the algorithm perform consistently even with unfavorable data orderings (the same kind that wreak havoc on quicksort). Makes approximately lg2(N) calls to random.choice().<pre></pre> Revised to include the pivot counts after David Eppstein pointed out that the originally posted algorithm ran slowly when all the inputs were equal.

 Download
Download Copy to clipboard
Copy to clipboard