For taking k random samples (with replacement) from a population, where k may be greater than len(population).
1 2 3 4 5 6 7 8 9 10 11 12 | # credit author(s) of random.py
import random
def sample_wr(population, k):
"Chooses k random elements (with replacement) from a population"
n = len(population)
_random, _int = random.random, int # speed hack
result = [None] * k
for i in xrange(k):
j = _int(_random() * n)
result[i] = population[j]
return result
|
random.sample() lets you do random sampling without replacement. sample_wr() lets you sample with replacement.
Some simple examples:
tosses = sample_wr(('H', 'T'), 100) # simulate 100 coin tosses rolls = sample_wr(range(1,7), 100) # simulate 100 dice rolls
make a random string from 200 characters over a given alphabet
from string import letters as alphabet rstr = ''.join(sample_wr(alphabet, 200))
NOTE:
You could use
from random import choice sample = [choice(population) for i in xrange(k)]
but that is 2-4 times slower than sample_wr(population, k) for 10E3 <= k <= 10E6
from random import random n = len(population) sample = [population[int(random()*n)] for i in xrange(k)]
is better but still slower.
math.floor() beats int(). It is faster still to use _int=math.floor for this situation (rounding positive numbers downward).
Nix the previous comment. The float result still needs to be converted back to an integer at some point.
In Py2.4, pre-allocation no longer rules. In the next version of Python, list comprehensions have been super-optimized and cannot be beat by pre-allocating and using indices.
And you can go bit faster using itertools:
correction, accolades, and propositions. return [population[_int(_random() * n)] for i in ... ]
I've been following python-dev, so I'm aware of the optimizations you've been making. Congratulations on your results to date, and thank you for your time and efforts.
I wonder, do you suppose the developers would accept changing random.sample to allow for sampling with replacement?
replacement=False by default (backwards compatible)
random.sample(population, k, replacement=True)
Adding a replace=False option to random.sample. For several reasons, probably not.
The straight-forward list comp does the trick pretty well. Anything that someone can bang out without much thought is rarely a good candidate for building into the library unless the pattern is very general and the use cases very common. The reasoning is that it is typically easier to bang out a couple of lines than to learn and remember dozens of method variations. Part of the justification for the inclusion of sampling without replacement is that it took a great deal of skill and time to implement correctly.
The other issue is that there are plenty of use cases that just do not need the whole sample all a once (those are best served by a simple for-loop). In contrast, sampling without replacement requires some state memory between calls.
Also, adding a new method is typically preferred to adding a keyword switch (for example, see itertools ifilter() and ifilterfalse()). However, for the reasons listed above, the inclusion of this as a separate method is unlikely.
I put all of this here because it is useful to a wider audience. Feel free to email me for any further discussion.