This recipe shows a generator that breaks an iterable into chunks of a fixed size. It addresses the general case of having to (or wanting to) limit the number of items processed at a time, for example because of resource limitations. It can easily wrap blocks of code that work on iterables: just replace
<pre>process(all_items)</pre>
with
<pre>for some_items in iterblocks(all_items, 100):
    process(some_items)</pre>
from itertools import islice, chain, repeat

def iterblocks(iterable, size, **kwds):
    '''Break an iterable into blocks of a given size.

    The optional keyword parameters determine the type of each block and what to
    do if the last block has smaller size (by default return it as is).

    @keyword blocktype: A callable f(iterable) for generating each block (tuple
        by default).
    @keyword truncate: If true, drop the last block if its length is less than
        `size`.
    @keyword pad: If given, the last block is padded with this object so that
        its length becomes equal to `size`.
    @returns: An iterator over blocks of the iterable.

    >>> list(iterblocks(range(7), 3))
    [(0, 1, 2), (3, 4, 5), (6,)]
    >>> list(iterblocks(range(7), 3, truncate=True))
    [(0, 1, 2), (3, 4, 5)]
    >>> list(iterblocks(range(7), 3, pad=None))
    [(0, 1, 2), (3, 4, 5), (6, None, None)]
    >>> list(iterblocks('abcdefg', 3, pad='-', blocktype=''.join))
    ['abc', 'def', 'g--']
    '''
    truncate = kwds.get('truncate', False)
    blocktype = kwds.get('blocktype', tuple)
    if truncate and 'pad' in kwds:
        raise ValueError("'truncate' must be false if 'pad' is given")
    iterator = iter(iterable)
    while True:
        block = blocktype(islice(iterator, size))
        if not block:
            break
        if len(block) < size:
            if 'pad' in kwds:
                # Fill the short final block up to `size` with the pad object.
                block = blocktype(chain(block, repeat(kwds['pad'],
                                                      size - len(block))))
            elif truncate:
                break
        yield block
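As a rough illustration of the batching pattern from the description, the sketch below processes a sequence a few items at a time. Everything named here is an assumption made for the example: `process` is a hypothetical worker that just sums its input, and `iterblocks_minimal` is a stripped-down stand-in for the recipe's default behaviour (tuple blocks, short last block kept), included only so the snippet runs on its own.

```python
from itertools import islice


def iterblocks_minimal(iterable, size):
    """Stand-in for the recipe's default mode: tuple blocks, short last block kept."""
    iterator = iter(iterable)
    while True:
        block = tuple(islice(iterator, size))
        if not block:
            return
        yield block


def process(items):
    """Hypothetical worker: here it simply sums the items it receives."""
    return sum(items)


all_items = range(10)

# Process at most 4 items at a time instead of everything at once.
partial_results = [process(some_items)
                   for some_items in iterblocks_minimal(all_items, 4)]
```

Because the blocks are produced lazily, only one block of at most `size` items is held at a time, which is the point when the constraint is memory or a per-call limit.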
Why the manual processing of keyword arguments? You should instead define the function as:
... and when checking whether to pad, use
    if pad is not _DONT_PAD
Also, you have some unused imports at the top.
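The definition suggested in this comment did not survive on the page, so the following is only a sketch of what such a version might look like, not the commenter's actual code. It assumes `_DONT_PAD` is a module-level sentinel created with `object()`, and that `blocktype`, `truncate` and `pad` become ordinary keyword parameters; the sentinel is what lets `pad=None` remain a legitimate padding value.

```python
from itertools import chain, islice, repeat

# Assumed sentinel: a unique object, so that pad=None can still mean "pad with None".
_DONT_PAD = object()


def iterblocks(iterable, size, blocktype=tuple, truncate=False, pad=_DONT_PAD):
    """Break an iterable into blocks of `size` items (explicit-keyword variant)."""
    if truncate and pad is not _DONT_PAD:
        raise ValueError("'truncate' must be false if 'pad' is given")
    iterator = iter(iterable)
    while True:
        block = blocktype(islice(iterator, size))
        if not block:
            return
        if len(block) < size:
            if pad is not _DONT_PAD:
                # Fill the short final block up to `size` with the pad object.
                block = blocktype(chain(block, repeat(pad, size - len(block))))
            elif truncate:
                return
        yield block
```

With explicit parameters, a misspelled keyword such as `trunctae=True` raises a `TypeError` at call time instead of being silently ignored, which is one argument for this form over `**kwds`.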