Welcome, guest | Sign In | My Account | Store | Cart

Sample generator using only r calls to random.random().

Python, 16 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
def sample(n, r):
    "Generate r randomly chosen, sorted integers from [0,n)"
    rand = random.random
    pop = n
    for samp in xrange(r, 0, -1):
        cumprob = 1.0
        x = rand()        
        while x < cumprob:
            cumprob -= cumprob * samp / pop
            pop -= 1
        yield n-pop-1


# Example call to select three samples in range(0,10)
>>> list(sample(10, 3))
[2, 7, 8]

Unlike random.sample() in Py2.3, this sampler requires no auxiliary memory and is guaranteed to make only r calls to random.random(), one for each sample.

Also, the results are returned in sorted order rather than selection order.

The downside is that the running time is proportional to O(n) instead of O(r).

This is was inspired by Knuth's Algorithm S in Seminumerical Algorithms section 3.4.2. The improvement here is that tracking a cumulative probability for each selection eliminates the need to make n calls to random.random().

A potential major improvement would be to find a direct formula giving the same result as the inner loop: given x, samp, and pop, compute the index of the first sample. Finding this formula would reduce the running time to O(r).