A simple generator that accepts an iterable L and an integer N and yields a series of sub-generators, each of which will in turn yield N items from L.
class groupcount(object):
    """Accept a (possibly infinite) iterable and yield a succession
    of sub-iterators from it, each of which will yield N values.

    >>> gc = groupcount('abcdefghij', 3)
    >>> for subgroup in gc:
    ...     for item in subgroup:
    ...         print item,
    ...     print
    ...
    a b c
    d e f
    g h i
    j
    """

    def __init__(self, iterable, n=10):
        self.it = iter(iterable)
        self.n = n

    def __iter__(self):
        return self

    def next(self):
        # Pull one value to prove the source isn't exhausted, then hand it
        # to a fresh sub-generator; StopIteration here ends the outer loop.
        return self._group(self.it.next())

    def _group(self, ondeck):
        # Yield the value already pulled, then up to n-1 more from the source.
        yield ondeck
        for i in xrange(1, self.n):
            yield self.it.next()
The task that prompted this recipe was to write out sitemap files (http://sitemaps.org/protocol.php) for a website containing 100,000 pages. Since the spec limits each sitemap.xml file to 10,000 URLs, I needed to go through a list of 100,000 items and break it into 10 files of 10,000 URLs each.
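For illustration, here is a minimal sketch of that use; the all_urls list, the file names, and the XML boilerplate are stand-ins, not the original script:

# Hypothetical sketch: write one sitemap file per group of 10,000 URLs.
# 'all_urls' stands in for the real list of 100,000 page URLs.
for filenum, batch in enumerate(groupcount(all_urls, 10000)):
    f = open('sitemap%d.xml' % filenum, 'w')
    f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for url in batch:
        f.write('  <url><loc>%s</loc></url>\n' % url)
    f.write('</urlset>\n')
    f.close()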
The itertools.groupby function provides a general solution by grouping an iterable according to a supplied key function. Maybe it was just me, but using groupby to break up the input sequence by count became somewhat baroque:

def countby(it, n=10):
    from itertools import groupby, imap
    grouped = groupby(enumerate(it), lambda x: int(x[0] / n))
    counted = imap(lambda x: x[1], grouped)
    return imap(lambda x: imap(lambda y: y[1], x), counted)

Referring to the itertools documentation (http://docs.python.org/lib/itertools-functions.html), I adapted the pure-Python equivalent code shown there for itertools.groupby.
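As a quick sanity check (my example, not from the original text), countby produces the same grouping as groupcount:

>>> for sub in countby('abcdefghij', 3):
...     print ' '.join(sub)
...
a b c
d e f
g h i
j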
The 'groupcount' generator delivers the same result in a more understandable (for me anyway) package.
Here is another way, from Raymond Hettinger (http://groups.google.com/group/comp.lang.python/browse_thread/thread/4696a3b3e1a6d691/):

from itertools import izip, chain, repeat

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))] * n)
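A quick check (my example) shows the behavioral difference: grouper yields tuples and pads the final group with padvalue instead of yielding a short sub-iterator:

>>> list(grouper(3, 'abcdefghij'))
[('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'h', 'i'), ('j', None, None)]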
We live, we learn. Someone else posted this much simpler way to accomplish the same task; I can't find his or her post now, or I would give proper credit.
from itertools import groupby, count

def batcher(seq, n):
    counter = count()
    return (y for (x, y) in groupby(iter(seq), lambda x: counter.next() // n))
Note that each batch must be consumed before the next is retrieved, since groupby shares a single underlying iterator.
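To illustrate that caveat (my example): skipping ahead without consuming a batch discards its remaining items, because groupby has already advanced past them:

>>> batches = batcher('abcdefghij', 3)
>>> first = batches.next()
>>> second = batches.next()   # groupby skips over the unread items of 'first'
>>> list(second)
['d', 'e', 'f']
>>> list(first)               # the earlier batch is now empty
[]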