ActiveState Code

Recipe 542194: Iterating over fixed size blocks


This recipe shows a generator that breaks an iterable into blocks of a fixed size. It addresses the general case of having (or wanting) to limit the number of items processed at a time, for example because of resource limitations. It can easily wrap code that works on iterables: just replace

    process(all_items)

with

    for some_items in iterblocks(all_items, 100):
        process(some_items)

Python
from itertools import islice, chain, repeat

def iterblocks(iterable, size, **kwds):
    '''Break an iterable into blocks of a given size.

    The optional keyword parameters determine the type of each block and what to
    do if the last block has smaller size (by default return it as is).

    @keyword blocktype: A callable f(iterable) for generating each block (tuple
        by default).
    @keyword truncate: If true, drop the last block if its length is less than
        `size`.
    @keyword pad: If given, the last block is padded with this object so that
        its length becomes equal to `size`.

    @returns: An iterator over blocks of the iterable.

    >>> list(iterblocks(range(7), 3))
    [(0, 1, 2), (3, 4, 5), (6,)]
    >>> list(iterblocks(range(7), 3, truncate=True))
    [(0, 1, 2), (3, 4, 5)]
    >>> list(iterblocks(range(7), 3, pad=None))
    [(0, 1, 2), (3, 4, 5), (6, None, None)]
    >>> list(iterblocks('abcdefg', 3, pad='-', blocktype=''.join))
    ['abc', 'def', 'g--']
    '''
    truncate = kwds.get('truncate', False)
    blocktype = kwds.get('blocktype', tuple)
    if truncate and 'pad' in kwds:
        raise ValueError("'truncate' must be false if 'pad' is given")
    iterator = iter(iterable)
    while True:
        # Pull up to `size` items; blocktype's result must support len().
        block = blocktype(islice(iterator, size))
        if not block:
            # The iterable is exhausted.
            break
        if len(block) < size:
            # Short final block: pad it, drop it, or yield it as is.
            if 'pad' in kwds:
                block = blocktype(chain(block, repeat(kwds['pad'],
                                                      size - len(block))))
            elif truncate:
                break
        yield block
