ActiveState Code

Recipe 466177: Randomize an iterator


Extremely simple/straightforward implementation.

Posted mainly to receive comments on alternative implementations.

Python
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
def irandomize(iterable, seed=None):
    """
    Randomize the iterable.

    Note: this evaluates the entire iterable to a sequence in memory
    """
    if seed is None:
        from random import shuffle
    else:
        from random import Random
        shuffle = Random(seed).shuffle
    result = list(iterable)
    shuffle(result)
    return iter(result)

if __name__ == "__main__":
    print list(irandomize(range(10)))
    print list(irandomize(range(10)))
    print list(irandomize(range(10), seed=14))
    print list(irandomize(range(10), seed=14))

Discussion

While randomizing an iterable can be useful, this implementation is very basic.

So, does anyone have better ideas? :)

Comments

  1. 1. At 4:17 p.m. on 8 jan 2006, Steven Bethard said:

    What do you gain by this? The list() call loads everything into memory, so pretending like it's still an iterator but calling iter() seems misleading to me....

  2. 2. At 2:04 a.m. on 9 jan 2006, Ori Peleg (the author) said:

    Fitting in an existing iterator-centric scheme. I agree that there is no implementation benefit to the iter() call, but it conforms to the interface where iterator modifiers are expected.

    The scheme involved has a 'source' iterator, and chains together several iterator-modifying functions to receive a final iterator. A configuration controls which modifying functions are used at run-time.

    I know this implementation is a problem. I'd love to hear suggestions for a better one...

  3. 3. At 3:47 p.m. on 19 jan 2006, Steven Bethard said:

    more iterator-like implementation. Well, this isn't as random as random.shuffle -- items near the beginning of the original iterable will likely be near the beginning of the resulting iterable -- but at least it maintains the memory behavior of iterators.

    def irandomize(iterable):
        iterable = iter(iterable)
        items = []
        try:
            while True:
                for i in xrange(random.randint(1, 10)):
                    items.append(iterable.next())
                random.shuffle(items)
                for i in xrange(random.randint(1, 10)):
                    if items:
                        yield items.pop()
                    else:
                        break
        except StopIteration:
            random.shuffle(items)
            for item in items:
                yield item
            raise StopIteration
    

    In worst-case behavior, it could load all items from the iterable into memory at once, but on the average I believe it should only be storing about 10 at a time.

  4. 4. At 1:59 a.m. on 22 jan 2006, Christopher Dunn said:

    Length of unknown iterator has no upper bound. If you want to guarantee that memory limits are never exceeded, you will have to specify an upper bound somewhere.

    I think the best way to tackle this is to provide a buffer-length argument to irandomize():

    def irandomize(..., bufsize=1000):
    

    etc. Once the buffer is full, you choose a random element, replace it with the next element from the iterator, and yield the chosen element. "Shuffle" is never needed.

Sign in to comment