A simple generator that accepts an iterable L and an integer N and yields a series of sub-generators, each of which in turn yields up to N items from L (the last one may be shorter).
class groupcount(object):
    """Accept a (possibly infinite) iterable and yield a succession
    of sub-iterators from it, each of which will yield N values.

    >>> gc = groupcount('abcdefghij', 3)
    >>> for subgroup in gc:
    ...     for item in subgroup:
    ...         print item,
    ...     print
    ...
    a b c
    d e f
    g h i
    j
    """
    def __init__(self, iterable, n=10):
        self.it = iter(iterable)
        self.n = n

    def __iter__(self):
        return self

    def next(self):
        return self._group(self.it.next())

    def _group(self, ondeck):
        yield ondeck
        for i in xrange(1, self.n):
            yield self.it.next()
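The recipe above is written for Python 2. As a rough sketch only, here is how the same idea might look in Python 3; the class name GroupCount is my own, and the explicit StopIteration handling is an adaptation (under PEP 479 a StopIteration that escapes a generator is turned into a RuntimeError), not part of the original.

```python
class GroupCount:
    """Python 3 sketch of the recipe's groupcount class (hypothetical port)."""

    def __init__(self, iterable, n=10):
        self.it = iter(iterable)
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        # Fetch the first item eagerly: if the input is exhausted,
        # StopIteration is raised here and ends the outer loop.
        return self._group(next(self.it))

    def _group(self, ondeck):
        yield ondeck
        for _ in range(1, self.n):
            try:
                yield next(self.it)
            except StopIteration:
                # End the short final group explicitly instead of letting
                # StopIteration escape the generator (PEP 479).
                return


groups = [list(sub) for sub in GroupCount("abcdefghij", 3)]
print(groups)  # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i'], ['j']]
```

As in the original, each sub-iterator reads from the shared underlying iterator, so a group should be consumed fully before the next one is requested.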
The task that prompted this recipe was to write out sitemap files (http://sitemaps.org/protocol.php) for a website containing 100,000 pages. Since the spec limits each sitemap.xml file to 10,000 URLs, I needed to go through a list of 100,000 items and break it into 10 batches, one per file.
The itertools.groupby function provides a general solution by grouping an iterable according to a supplied key function. Maybe it was just me, but using groupby to break up the input sequence by count became somewhat baroque:
<pre>
def countby(it, n=10):
    from itertools import groupby, imap
    # Key each (index, item) pair by index // n, so every run of n items shares a key.
    grouped = groupby(enumerate(it), lambda x: x[0] // n)
    # Drop the key, then strip the index from each pair.
    counted = imap(lambda x: x[1], grouped)
    return imap(lambda x: imap(lambda y: y[1], x), counted)
</pre>
Referring to http://docs.python.org/lib/itertools-functions.html, I adapted the Python-equivalent code shown for itertools.groupby.
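To make the groupby-by-count idea concrete, here is a minimal Python 3 sketch. It is an illustration rather than the recipe's code: it assumes the key is simply each item's index integer-divided by n, and it is written as a generator function instead of nested imap calls.

```python
from itertools import groupby


def countby(iterable, n=10):
    """Yield sub-iterators of up to n items, keyed by index // n."""
    for _, pairs in groupby(enumerate(iterable), key=lambda pair: pair[0] // n):
        # Strip the index that enumerate attached to each item.
        yield (item for _, item in pairs)


chunks = [list(chunk) for chunk in countby("abcdefg", 3)]
print(chunks)  # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
```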
The 'groupcount' generator delivers the same result in a more understandable (for me anyway) package.
Here is another way, from Raymond Hettinger: http://groups.google.com/group/comp.lang.python/browse_thread/thread/4696a3b3e1a6d691/

from itertools import izip, chain, repeat

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
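For Python 3, where izip no longer exists, a similar effect can be sketched with itertools.zip_longest. This swaps the chain/repeat padding trick for zip_longest's fillvalue argument, so it is a variant of the grouper idea and not the code from the linked thread.

```python
from itertools import zip_longest


def grouper(n, iterable, padvalue=None):
    """grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"""
    # n references to the SAME iterator, so each tuple pulls n consecutive items.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=padvalue)


padded = list(grouper(3, "abcdefg", "x"))
print(padded)  # [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'x', 'x')]
```

Unlike groupcount, this approach returns fixed-size tuples and pads the last one, which may or may not suit a task like writing sitemap files.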
We live, we learn. Someone else posted this much simpler way to accomplish the same task; I can't find his or her post now, or I would give proper credit.
from itertools import groupby, count
def batcher(seq, n):
    counter = count()
    return (y for (x, y) in groupby(iter(seq), lambda x: counter.next() // n))
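Note that each batch must be consumed before the next one is retrieved: the batches returned by groupby share one underlying iterator, so advancing to the next batch invalidates the previous one.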
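A small Python 3 sketch of that caveat, assuming a direct port of batcher (next(counter) in place of counter.next()). The variable names good and stale are only for illustration.

```python
from itertools import count, groupby


def batcher(seq, n):
    counter = count()
    # The key function ignores the item and advances the counter once per
    # element, so the key changes after each run of n items.
    return (group for _, group in groupby(seq, lambda _item: next(counter) // n))


# Consume each batch before asking for the next one.
good = [list(batch) for batch in batcher(range(7), 3)]
print(good)  # [[0, 1, 2], [3, 4, 5], [6]]

# Collecting all the batches first and reading them afterwards does not work:
# by then groupby has moved on, and the earlier batches have lost their items.
stale = [list(batch) for batch in list(batcher(range(7), 3))]
print(stale)
```

If the batches need to outlive the loop, materializing each one with list(batch) as it arrives is the simplest option.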