This recipe is yet another combination generator, with an extra twist: the generated combinations can be yielded in blocks of a given size. This comes useful (or necessary) when processing arbitrarily long or infinite iterables in bounded-size blocks.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 | from itertools import islice, repeat, izip, cycle
# If PEP 3102 is accepted, the signature would be
#def iterCombinations(*iterables, blocksize=1):
def iterCombinations(*iterables, **kwds):
'''Generates the combinations of the given iterables.
@returns: An iterator over all combinations of the C{iterables}. Each yielded
combination is a list of size C{len(iterables)} where the i-th element
is drawn from the i-th iterable. B{Important note:} This list should not
be modified between iterations; if you need to modify it, copy it to a
new container.
@param iterables: One or more arbitrary iterables.
@param kwds: Currently only 'blocksize' allowed.
@keyword blocksize: Determines the order of the yielded combinations. By
default (C{blocksize=1}), the first iterable corresponds to the most outer
loop (slowest change) and the last iterable corresponds to the most inner
loop (fastest change).
For larger blocksize, each iterable is first partitioned into consecutive
blocks of size C{blocksize} (except perhaps for the last block which may
be shorter). Then each combination is yielded by first iterating over
each block combination C{B := (B1,B2,..Bn)} and then yielding each
combination from B.
More generally, C{blocksize} can be an iterable so that different
C{iterables} can be partitioned by different block size. In this case,
C{blocksize} is repeated as many time as necessary to match C{len(iterables)}.
For instance::
iterCombinations(range(4),range(6),range(8),'xyz', blocksize=(2,3))
partitions C{range(4)} and C{range(8)} with blocksize=2, while C{range(6)}
and 'xyz' are partitioned with blocksize=3.
'''
combo = [None] * len(iterables)
blocksize = kwds.get('blocksize', 1)
if isinstance(blocksize, int):
sizes = repeat(blocksize)
else:
sizes = cycle(iter(blocksize))
block_lists = [list(_iterblocks(it,sz)) for it,sz in izip(iterables,sizes)]
for block_combo in _iterCombinations(block_lists, [None] * len(iterables)):
for _ in _iterCombinations(block_combo, combo):
yield combo
def _iterCombinations(groups, combo_list, index=0):
# generate recursively all combinations of groups, updating combo_list
# *in-place* for each combination.
if index < len(groups)-1:
for x in groups[index]:
combo_list[index] = x
for _ in _iterCombinations(groups,combo_list,index+1):
yield combo_list
else: # optimization to avoid the last level of recursion
assert index == len(groups)-1
for x in groups[index]:
combo_list[index] = x
yield combo_list
def _iterblocks(iterable, blocksize, factory=tuple):
# split the iterable into blocks of blocksize
iterable = iter(iterable)
while True:
block = factory(islice(iterable,blocksize))
if not block: break
yield block
if len(block) < blocksize: break
# example
>>> for c in iterCombinations(range(5), 'abc'): print c
...
[0, 'a']
[0, 'b']
[0, 'c']
[1, 'a']
[1, 'b']
[1, 'c']
[2, 'a']
[2, 'b']
[2, 'c']
[3, 'a']
[3, 'b']
[3, 'c']
[4, 'a']
[4, 'b']
[4, 'c']
>>> for c in iterCombinations(range(5), 'abc', blocksize=2): print c
...
[0, 'a']
[0, 'b']
[1, 'a']
[1, 'b']
[0, 'c']
[1, 'c']
[2, 'a']
[2, 'b']
[3, 'a']
[3, 'b']
[2, 'c']
[3, 'c']
[4, 'a']
[4, 'b']
[4, 'c']
|
There are already several recipes on generating combinations and permutations out of iterables. This one is an efficient implementation with an additional feature: being able to break the input iterables into blocks. The combinations are generated in two stages: at the first level, generate all block combinations, and at the second level generate all combinations from the members of a given block combination.
The motivation was a real-world scenario where each iterable to be combined consisted of file chunks. The total size of the files was too large to fit in memory, so the computation had to be performed in bounded-size blocks. Using this generator, the number of blocks that had to be in memory at the same time was bounded and as a result, the memory swapping was reduced significantly compared to the initial naive generation.