This recipe shows a generator that breaks an iterable into chunks of a fixed size. It addresses the general case of having to (or wanting to) limit the number of items processed at a time, for example because of resource limitations. It can easily wrap code that works on iterables: just replace <pre>process(all_items)</pre> with <pre>for some_items in iterblocks(all_items, 100): process(some_items)</pre>

Python, 42 lines
from itertools import islice, chain, repeat

def iterblocks(iterable, size, **kwds):
    '''Break an iterable into blocks of a given size.

    The optional keyword parameters determine the type of each block and what to
    do if the last block has smaller size (by default return it as is).

    @keyword blocktype: A callable f(iterable) for generating each block (tuple
        by default).
    @keyword truncate: If true, drop the last block if its length is less than
        `size`.
    @keyword pad: If given, the last block is padded with this object so that
        its length becomes equal to `size`.

    @returns: An iterator over blocks of the iterable.

    >>> list(iterblocks(xrange(7), 3))
    [(0, 1, 2), (3, 4, 5), (6,)]
    >>> list(iterblocks(xrange(7), 3, truncate=True))
    [(0, 1, 2), (3, 4, 5)]
    >>> list(iterblocks(xrange(7), 3, pad=None))
    [(0, 1, 2), (3, 4, 5), (6, None, None)]
    >>> list(iterblocks('abcdefg', 3, pad='-', blocktype=''.join))
    ['abc', 'def', 'g--']
    '''
    truncate = kwds.get('truncate',False)
    blocktype = kwds.get('blocktype',tuple)
    if truncate and 'pad' in kwds:
        raise ValueError("'truncate' must be false if 'pad' is given")
    iterator = iter(iterable)
    while True:
        block = blocktype(islice(iterator,size))
        if not block:
            break
        if len(block) < size:
            if 'pad' in kwds:
                block = blocktype(chain(block, repeat(kwds['pad'],
                                                      size-len(block))))
            elif truncate:
                break
        yield block
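
As a quick illustration of the wrapping pattern from the description, here is a minimal sketch written for Python 3. To stay self-contained it uses a simplified stand-in, `simple_blocks`, which covers only the default behavior of `iterblocks` (tuple blocks, a short last block returned as is). The names `simple_blocks`, `process`, and `calls` are hypothetical and exist only for this example.

```python
from itertools import islice

def simple_blocks(iterable, size):
    # Simplified stand-in for iterblocks: default behavior only
    # (tuple blocks, no padding, no truncation).
    iterator = iter(iterable)
    while True:
        block = tuple(islice(iterator, size))
        if not block:
            return
        yield block

calls = []  # records how many items each call to process() received

def process(items):
    # Hypothetical worker: a real one would do the expensive work here.
    calls.append(len(items))

all_items = range(250)

# Instead of process(all_items), hand over at most 100 items at a time.
for some_items in simple_blocks(all_items, 100):
    process(some_items)

print(calls)  # [100, 100, 50]
```

The worker never sees more than `size` items per call, which is the point of the recipe when memory or another resource is limited.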

1 comment

Tal Einat 12 years ago

Why the manual processing of keyword arguments? You should instead define the function as:

_DONT_PAD = object() # sentinel value
def iterblocks(iterable, size, blocktype=tuple, truncate=False, pad=_DONT_PAD):

... and when checking whether to pad, use if pad is not _DONT_PAD.

Also, you have some unused imports at the top.
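
The signature suggested in the comment above could be filled in roughly as follows. This is only a sketch, not the recipe author's or the commenter's code. It assumes Python 3 (so `range` replaces `xrange`) and keeps the recipe's logic unchanged apart from the explicit keyword parameters and the `_DONT_PAD` sentinel. The sentinel is needed because `None` is a legitimate padding value, so it cannot also mean "no padding requested".

```python
from itertools import chain, islice, repeat

_DONT_PAD = object()  # sentinel: distinguishes "no pad given" from pad=None

def iterblocks(iterable, size, blocktype=tuple, truncate=False, pad=_DONT_PAD):
    """Break an iterable into blocks of `size` items (explicit-keyword variant)."""
    if truncate and pad is not _DONT_PAD:
        raise ValueError("'truncate' must be false if 'pad' is given")
    iterator = iter(iterable)
    while True:
        block = blocktype(islice(iterator, size))
        if not block:
            return  # input exhausted
        if len(block) < size:
            if pad is not _DONT_PAD:
                # Fill the short last block up to `size` with the pad object.
                block = blocktype(chain(block, repeat(pad, size - len(block))))
            elif truncate:
                return  # drop the short last block
        yield block

print(list(iterblocks(range(7), 3)))            # [(0, 1, 2), (3, 4, 5), (6,)]
print(list(iterblocks(range(7), 3, pad=None)))  # [(0, 1, 2), (3, 4, 5), (6, None, None)]
```

With explicit parameters, a misspelled keyword such as `trunctate=True` raises a `TypeError` instead of being silently ignored, which the `**kwds` version cannot do.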