Randomize an iterator « Python recipes

Extremely simple/straightforward implementation.

Posted mainly to receive comments on alternative implementations.

      def irandomize(iterable, seed=None):
    """
    Randomize the iterable.

    Note: this evaluates the entire iterable to a sequence in memory
    """
    if seed is None:
        from random import shuffle
    else:
        from random import Random
        shuffle = Random(seed).shuffle
    result = list(iterable)
    shuffle(result)
    return iter(result)

if __name__ == "__main__":
    print list(irandomize(range(10)))
    print list(irandomize(range(10)))
    print list(irandomize(range(10), seed=14))
    print list(irandomize(range(10), seed=14))

      

While randomizing an iterable can be useful, this implementation is very basic.

So, does anyone have better ideas? :)

Tags: algorithms

4 comments

Steven Bethard 18 years, 3 months ago # | flag

What do you gain by this? The list() call loads everything into memory, so pretending like it's still an iterator but calling iter() seems misleading to me....

Ori Peleg (author) 18 years, 3 months ago # | flag

Fitting in an existing iterator-centric scheme. I agree that there is no implementation benefit to the iter() call, but it conforms to the interface where iterator modifiers are expected.

The scheme involved has a 'source' iterator, and chains together several iterator-modifying functions to receive a final iterator. A configuration controls which modifying functions are used at run-time.

I know this implementation is a problem. I'd love to hear suggestions for a better one...

Steven Bethard 18 years, 3 months ago # | flag

more iterator-like implementation. Well, this isn't as random as random.shuffle -- items near the beginning of the original iterable will likely be near the beginning of the resulting iterable -- but at least it maintains the memory behavior of iterators.

def irandomize(iterable):
    iterable = iter(iterable)
    items = []
    try:
        while True:
            for i in xrange(random.randint(1, 10)):
                items.append(iterable.next())
            random.shuffle(items)
            for i in xrange(random.randint(1, 10)):
                if items:
                    yield items.pop()
                else:
                    break
    except StopIteration:
        random.shuffle(items)
        for item in items:
            yield item
        raise StopIteration

In worst-case behavior, it could load all items from the iterable into memory at once, but on the average I believe it should only be storing about 10 at a time.

Christopher Dunn 18 years, 3 months ago # | flag

Length of unknown iterator has no upper bound. If you want to guarantee that memory limits are never exceeded, you will have to specify an upper bound somewhere.

I think the best way to tackle this is to provide a buffer-length argument to irandomize():

def irandomize(..., bufsize=1000):

etc. Once the buffer is full, you choose a random element, replace it with the next element from the iterator, and yield the chosen element. "Shuffle" is never needed.

◄	Python recipes (4591)	►
◄	Ori Peleg's recipes (15)	►

Randomize an iterator (Python recipe) by Ori Peleg
ActiveState Code (http://code.activestate.com/recipes/466177/)

4 comments

Tags

Required Modules

Other Information and Tasks

Accounts

Code Recipes

Feedback & Information

ActiveState

Randomize an iterator (Python recipe) by Ori Peleg ActiveState Code (http://code.activestate.com/recipes/466177/)

4 comments

Tags

Required Modules

Other Information and Tasks

Accounts

Code Recipes

Feedback & Information

ActiveState

Randomize an iterator (Python recipe) by Ori Peleg
ActiveState Code (http://code.activestate.com/recipes/466177/)