
A simple iterator class that accepts an iterable L and an integer N and yields a series of sub-generators, each of which in turn yields up to N items from L (the last group may be shorter).

Python, 30 lines
class groupcount(object):
    """Accept a (possibly infinite) iterable and yield a succession
    of sub-iterators from it, each of which will yield N values.

    >>> gc = groupcount('abcdefghij', 3)
    >>> for subgroup in gc:
    ...     for item in subgroup:
    ...             print item,
    ...     print
    ...
    a b c
    d e f
    g h i
    j
    """

    def __init__(self, iterable, n=10):
        self.it = iter(iterable)
        self.n = n

    def __iter__(self):
        return self

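    def next(self):
        # Fetch the first item here so that StopIteration from an
        # exhausted source ends the outer loop instead of producing
        # an empty group.
        return self._group(self.it.next())

    def _group(self, ondeck):
        yield ondeck
        # The remaining n-1 items are pulled lazily from the shared
        # iterator, so consume each group before asking for the next.
        for i in xrange(1, self.n):
            yield self.it.next()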
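
The recipe above targets Python 2. As a rough sketch only, a Python 3 port might look like the following. The class name GroupCount and the use of itertools.islice are choices made for this illustration, not part of the original recipe. The islice call matters because, under PEP 479, a StopIteration escaping a generator body is turned into a RuntimeError, so the inner loop cannot rely on next() raising at the end of the input.

```python
from itertools import count, islice


class GroupCount:
    """Python 3 sketch of the groupcount recipe (hypothetical name)."""

    def __init__(self, iterable, n=10):
        self.it = iter(iterable)
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        # Pull one item eagerly: if the source is exhausted, the
        # StopIteration raised here ends the outer loop.
        return self._group(next(self.it))

    def _group(self, ondeck):
        yield ondeck
        # islice stops quietly when the source runs out, which avoids
        # the RuntimeError that PEP 479 would raise for a bare next().
        yield from islice(self.it, self.n - 1)


# Each group is consumed (via list) before the next one is requested.
groups = [list(g) for g in GroupCount('abcdefghij', 3)]

# Works on an infinite source too, as long as the outer loop is bounded.
first_three = [list(g) for g in islice(GroupCount(count(), 2), 3)]
```

As with the original, the groups share one underlying iterator, so each group should be exhausted before the next is requested.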

The task that prompted this recipe was to write out sitemap files (http://sitemaps.org/protocol.php) for a website containing 100,000 pages. Since the spec limits each sitemap.xml file to 10,000 URLs, I needed to go through a list of 100,000 items and break it into 10 files.
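
To make the arithmetic concrete, here is a minimal Python 3 sketch of that batching step. The function name sitemap_batches, the limit parameter, and the example.com URLs are all hypothetical, and unlike the recipe this version materializes each batch as a list rather than yielding lazy sub-generators.

```python
from itertools import islice


def sitemap_batches(urls, limit=10000):
    """Yield successive lists of at most `limit` URLs (illustrative helper)."""
    it = iter(urls)
    while True:
        batch = list(islice(it, limit))
        if not batch:
            # Source exhausted: stop without yielding an empty batch.
            return
        yield batch


# 100,000 made-up URLs, split using the 10,000-per-file figure quoted above.
urls = ("http://example.com/page/%d" % i for i in range(100000))
batches = list(sitemap_batches(urls))
```

Each element of batches could then be written to its own sitemap file.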

The itertools.groupby function provides a general solution by grouping an iterable according to a supplied function. Maybe it was just me, but using groupby to break up the input sequence by count became somewhat baroque:

    def countby(it, n=10):
        from itertools import groupby, imap
        grouped = groupby(enumerate(it), lambda x: int(x[0]/n))
        counted = imap(lambda x:x[1], grouped)
        return imap(lambda x: imap(lambda y: y[1], x), counted)

Referring to http://docs.python.org/lib/itertools-functions.html, I adapted the Python-equivalent code shown for itertools.groupby.
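
For readers on Python 3, where itertools.imap no longer exists, the same countby idea might be sketched with generator expressions as below. This is an illustrative rewrite rather than the author's code; it uses floor division (//) in the key function because / is true division in Python 3.

```python
from itertools import groupby


def countby(it, n=10):
    # Key each (index, item) pair by index // n so that runs of n
    # consecutive items share a key.
    grouped = groupby(enumerate(it), lambda pair: pair[0] // n)
    # Drop the key and the index, leaving only the original items.
    return ((item for _index, item in group) for _key, group in grouped)


rows = [list(group) for group in countby('abcdefghij', 3)]
```

The sub-iterators still come from a single groupby object, so each one needs to be consumed before moving on to the next.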

The 'groupcount' generator delivers the same result in a more understandable (for me anyway) package.

2 comments

lotr py 16 years, 2 months ago

Here is another way. Reference: Raymond Hettinger at http://groups.google.com/group/comp.lang.python/browse_thread/thread/4696a3b3e1a6d691/

from itertools import izip, chain, repeat

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
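
The grouper idiom relies on passing the same iterator object n times to the zip function, so each output tuple draws n consecutive items. A Python 3 sketch, assuming the built-in zip in place of izip (the local name padded is introduced here only for readability):

```python
from itertools import chain, repeat


def grouper(n, iterable, padvalue=None):
    # Pad with n-1 fill values so a trailing partial group is completed.
    padded = chain(iterable, repeat(padvalue, n - 1))
    # [padded] * n is n references to ONE iterator, so zip pulls
    # n consecutive items for every tuple it builds.
    return zip(*[padded] * n)


padded_groups = list(grouper(3, 'abcdefg', 'x'))
exact_groups = list(grouper(3, 'abcdef'))
```

Unlike groupcount, this approach yields fully built tuples and pads the final group instead of leaving it short.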

Wade Leftwich 14 years, 4 months ago

We live, we learn. Someone else posted this much simpler way to accomplish the same task; I can't find his or her post now, or I would give proper credit.

from itertools import groupby, count

def batcher(seq, n):
    counter = count()
    return (y for (x,y) in groupby(iter(seq), lambda x: counter.next() // n))

>>> for batch in batcher('abcdefgh', 3):
...     print list(batch)
...
['a', 'b', 'c']
['d', 'e', 'f']
['g', 'h']

Note that each batch must be consumed before the next is retrieved, because all batches share the same underlying groupby iterator.
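
The consumption caveat can be seen in a small Python 3 sketch of batcher (next(counter) replaces counter.next(); the variable names in_order and stale are made up for this illustration). Collecting all the batch iterators first and reading them afterwards loses data, because advancing groupby invalidates the earlier groups.

```python
from itertools import count, groupby


def batcher(seq, n):
    counter = count()
    # The key function ignores the item and keys on a running count.
    return (group for _key, group in
            groupby(seq, lambda _item: next(counter) // n))


# Correct use: read each batch before asking for the next one.
in_order = [list(batch) for batch in batcher('abcdefgh', 3)]

# Incorrect use: list() advances groupby past every batch first,
# so the earlier batch iterators have nothing left to yield.
stale = [list(batch) for batch in list(batcher('abcdefgh', 3))]
```

Here in_order holds the three expected batches, while the first entry of stale is empty.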

Created by Wade Leftwich on Tue, 19 Feb 2008 (PSF)