You want to get the items from a sequence (or other iterable) a batch at a time, including a short batch at the end if need be.
    from itertools import islice, chain

    def batch(iterable, size):
        sourceiter = iter(iterable)
        while True:
            batchiter = islice(sourceiter, size)
            yield chain([batchiter.next()], batchiter)

    seq = xrange(19)
    for batchiter in batch(seq, 3):
        print "Batch: ",
        for item in batchiter:
            print item,
        print

Output:

    Batch: 0 1 2
    Batch: 3 4 5
    Batch: 6 7 8
    Batch: 9 10 11
    Batch: 12 13 14
    Batch: 15 16 17
    Batch: 18
I wanted to be able to batch iterables that weren't materialized in memory or whose length was unknown. One of the goals of this recipe was therefore that it should require the source iterable to support only iteration, rather than indexing or a known length as in some other approaches to batching.
An earlier version of this recipe met that goal, but I was unhappy that it required memory for the list built for each batch. To keep memory requirements to a minimum, what I needed to yield instead was a size-bounded iterator over the original iterable. That's exactly what the itertools module's islice function provides.
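As a small illustration of that point, the sketch below shows how successive islice calls over one shared iterator each yield the next size-bounded slice without building a list in advance. It is written for Python 3 (an assumption; the recipe itself targets Python 2), and the variable names are illustrative only.

```python
from itertools import islice

# Any iterator will do; no indexing or len() is required of the source.
source = iter(range(7))

# Each islice is a lazy view of at most 3 items drawn from `source`.
first_slice = list(islice(source, 3))   # [0, 1, 2]

# A new islice over the same iterator continues where the last one stopped.
second_slice = list(islice(source, 3))  # [3, 4, 5]

# The final slice is simply shorter when the source runs out.
third_slice = list(islice(source, 3))   # [6]

print(first_slice, second_slice, third_slice)
```

The list() calls are there only to display the results; in the recipe the islice objects are handed to the caller unmaterialized.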
But knowing when we're done batching is the tricky part, as islice is happy to continue returning iterators (albeit zero-length ones) even when the source iterator is exhausted. The problem is that we can't tell in advance whether an iterator is of zero length. One approach would be to maintain an internal count for each iterator and terminate when the yielded iterator turned out, after the fact, to have a count of zero. But I'd prefer never to yield a zero-length iterator in the first place, as this makes things more complicated for the client.

Python's iterators have nothing like a hasNext method, so the only way to know whether an iterator can produce any items is to actually try to consume it, which is the approach used in the recipe. If the initial batchiter.next() call fails, then we know we are done: the StopIteration it raises propagates out of the generator and ends the batching. But if it succeeds, we yield the obtained value chained with the (partially consumed) islice object.

Note that each batch should be entirely consumed before proceeding to the next one. Every batch draws from the same underlying source iterator, so any items left unread in one batch will turn up at the start of the next, and the batch boundaries will shift.
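For readers on Python 3, here is a minimal sketch of the same technique. It is an adaptation, not the original listing. It assumes Python 3.7 or later, where a StopIteration escaping a generator body is turned into a RuntimeError (PEP 479), so the probe is wrapped in an explicit try/except. The second half demonstrates the consumption caveat described above.

```python
from itertools import chain, islice

def batch(iterable, size):
    """Yield successive iterators of up to `size` items from `iterable`."""
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        try:
            # Probe: try to pull one item to see whether anything is left.
            first = next(batchiter)
        except StopIteration:
            # Source exhausted, so stop without yielding an empty batch.
            return
        # Put the probed item back in front of the rest of the slice.
        yield chain([first], batchiter)

# Normal use: consume each batch fully before asking for the next.
batches = [list(b) for b in batch(range(19), 3)]
print(batches[0], batches[-1], len(batches))

# Caveat: reading only part of a batch shifts the later boundaries.
gen = batch(range(10), 3)
partial = next(gen)
head = next(partial)            # takes 0; items 1 and 2 stay in the source
shifted = list(next(gen))       # the next batch starts at the leftovers
print(head, shifted)
```

In the second half, the batch after the partially read one starts at 1 rather than 3, because the unread items were never drawn from the shared source iterator.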