Welcome, guest | Sign In | My Account | Store | Cart

fork of http://code.activestate.com/recipes/269554-select-the-nth-smallest-element/ O(n) quicksort style algorithm for looking up data based on rank order. Useful for finding medians, percentiles, quartiles, and deciles. Equivalent to [data[n] for n in positions] when the data is already sorted.

Python, 37 lines
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37``` ```import random def select(data, positions, start=0, end=None): '''For every n in *positions* find nth rank ordered element in *data* inplace select''' if not end: end = len(data) - 1 if end < start: return [] if end == start: return [data[start]] pivot_rand_i = random.randrange(start,end) pivot_rand = data[pivot_rand_i] # get random pivot data[end], data[pivot_rand_i] = data[pivot_rand_i], data[end] pivot_i = start for i in xrange(start, end): # partitioning about the pivot if data[i] < pivot_rand: data[pivot_i], data[i] = data[i], data[pivot_i] pivot_i += 1 data[end], data[pivot_i] = data[pivot_i], data[end] under_positions, over_positions, mid_positions = [],[],[] for position in positions: if position == pivot_i: mid_positions.append(position) elif position < pivot_i: under_positions.append(position) else: over_positions.append(position) result = [] if len(under_positions) > 0: result.extend(select(data, under_positions, start, pivot_i-1)) if len(mid_positions) > 0: result.extend([data[position] for position in mid_positions]) if len(over_positions) > 0: result.extend(select(data, over_positions, pivot_i+1, end)) return result ```

The input data can be any iterable.<pre></pre> The randomization of pivots makes the algorithm perform consistently even with unfavorable data orderings (the same kind that wreak havoc on quicksort). Makes approximately lg2(N) calls to random.choice().<pre></pre> Revised to include the pivot counts after David Eppstein pointed out that the originally posted algorithm ran slowly when all the inputs were equal. Created by Teodor Kichatov on Tue, 30 Nov 2010 (PSF)