ActiveState Code

Recipe 497007: zip_pad(), a lazy zip() that pads all but the longest iterable with None


On the rare occasion that you need zip() to pad the shorter sequences instead of truncating to the shortest one, at least use something fast. The padding value defaults to None; pass the keyword argument pad to use a different one.

Python
"""
>>> list(zip_pad([], [1], [1,2]))
[(None, 1, 1), (None, None, 2)]

>>> list(zip_pad([], [1], [1,2], pad=42))
[(42, 1, 1), (42, 42, 2)]

>>> list(zip_pad([], []))
[]

>>> list(zip_pad([1], [2]))
[(1, 2)]

>>> list(zip_pad([1,2], []))
[(1, None), (2, None)]

>>> list(zip_pad([1,2], [3,4]))
[(1, 3), (2, 4)]

>>> list(zip_pad([1,2], [10,20,30], [100,200,300,400]))
[(1, 10, 100), (2, 20, 200), (None, 30, 300), (None, None, 400)]
"""

from itertools import izip, chain

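def zip_pad(*iterables, **kw):
    # Python 2 has no keyword-only arguments, so "pad" is read from **kw.
    pad = kw.pop("pad", None)
    if kw:
        raise TypeError("zip_pad() got unexpected keyword arguments: %r" % sorted(kw))
    # Number of iterables that may still be padded. The one-element list is a
    # mutable cell shared by all pad_iter() generators.
    done = [len(iterables)-1]
    def pad_iter():
        # The body runs once per iterable, at the moment that iterable is exhausted.
        if not done[0]:
            # All other iterables are already exhausted: end this chain,
            # which in turn stops izip().
            return
        done[0] -= 1
        while 1:
            yield pad
    iterables = [chain(seq, pad_iter()) for seq in iterables]
    return izip(*iterables)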

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Discussion

The trick here is that the check for whether all iterables are exhausted runs only once per iterable (when its pad_iter() generator starts), whereas a naive implementation would run it once per output tuple. There are still per-iteration checks, of course, but they happen inside chain() and izip(), which benefit from the itertools module's fast C implementation.
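For readers on Python 3, the same once-per-iterable check can be sketched as follows. This is only an illustration, not the author's code: the name zip_pad3 is hypothetical, zip() stands in for izip(), and a nonlocal counter replaces the one-element list.

```python
from itertools import chain, repeat

def zip_pad3(*iterables, pad=None):
    """Python 3 sketch of the recipe's technique (hypothetical name)."""
    remaining = len(iterables) - 1   # iterables that may still be padded

    def pad_iter():
        # This body starts running only when the iterable chained in
        # front of it is exhausted, so the check below runs once per iterable.
        nonlocal remaining
        if not remaining:
            return                   # the last iterable just finished: stop zip()
        remaining -= 1
        yield from repeat(pad)       # pad forever; zip() decides when to stop

    return zip(*(chain(it, pad_iter()) for it in iterables))

print(list(zip_pad3([1, 2], [10, 20, 30], [100, 200, 300, 400])))
```

The per-tuple work is done entirely by zip(), chain() and repeat(); the Python-level code in pad_iter() is entered at most once for each input.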

This recipe is inspired by code written by Andrew Dalke, as posted on comp.lang.python: http://mail.python.org/pipermail/python-list/2005-July/292146.html.
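For contrast, the "naive implementation" mentioned in the Discussion might look something like the sketch below. It is written for Python 3, the name zip_pad_naive is hypothetical, and it is only one of several ways to write such a baseline. It tests for exhaustion in Python code on every output tuple, which is exactly the cost the recipe avoids.

```python
def zip_pad_naive(*iterables, pad=None):
    """Hypothetical baseline: checks for exhaustion on every output tuple."""
    iterators = [iter(it) for it in iterables]
    missing = object()               # private marker for "this iterator is done"
    while iterators:
        row = [next(it, missing) for it in iterators]
        # Per-iteration check, executed in Python rather than in C.
        if all(item is missing for item in row):
            return
        yield tuple(pad if item is missing else item for item in row)

print(list(zip_pad_naive([1, 2], [], pad=42)))
```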

Comments

  1. At 3:05 a.m. on 5 sep 2006, Zoran Isailovski said:

    map is a simpler, though less general alternative. I'd like to mention that "map(None, *iterables)" already does the job for most cases (padding with None). So, if the result does not have to be a lazy iterator (i.e. I am not dealing with huge amounts of data), I'd probably prefer using map.

    I am adding this since you mentioned you wanted an alternative to zip that pads values and is fast. "map(None, *iterables)" is both, and is readily available in Python.

    See also http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/410687

  2. At 2:31 p.m. on 5 sep 2006, Raymond Hettinger said:

    izip_longest(). FWIW, here is my version of a padding zipper:

    from itertools import chain, repeat

    def izip_longest(*args, **kwds):
        fillvalue = kwds.get('fillvalue')
        its = [chain(it, repeat(fillvalue)).next for it in args]
        term = [fillvalue] * len(args)
        while 1:
            result = [g() for g in its]
            if result == term: return
            yield tuple(result)
    
    
    print list(izip_longest('x', 'abc', 'ABCDEF', '1', fillvalue=999))
    
  3. At 10:53 a.m. on 6 sep 2006, Peter Otten (the author) said:

    This may fail... as it relies on a non-unique sentinel:

    >>> list(izip_longest('x-', 'a-bc', 'A-BCDEF', '1', fillvalue="-"))
    [('x', 'a', 'A', '1')]
    
  4. At 7:32 p.m. on 18 oct 2006, Raymond Hettinger said:

    A C-speed version using itertools.

    from itertools import izip, chain, repeat

    def izip_longest(*args, **kwds):
        ''' Alternate version of izip() that fills-in missing values rather than truncating
        to the length of the shortest iterable.  The fillvalue is specified as a keyword
        argument (defaulting to None if not specified).
    
        >>> list(izip_longest('a', 'def', 'ghi'))
        [('a', 'd', 'g'), (None, 'e', 'h'), (None, 'f', 'i')]
        >>> list(izip_longest('abc', 'def', 'ghi'))
        [('a', 'd', 'g'), ('b', 'e', 'h'), ('c', 'f', 'i')]
        >>> list(izip_longest('a', 'def', 'gh'))
        [('a', 'd', 'g'), (None, 'e', 'h'), (None, 'f', None)]
        '''
        fillvalue = kwds.get('fillvalue')
        def sentinel(counter=[fillvalue]*(len(args)-1)):
            yield counter.pop()     # raises IndexError when count hits zero
        iters = [chain(it, sentinel(), repeat(fillvalue)) for it in args]
        try:
            for tup in izip(*iters):
                yield tup
        except IndexError:
            pass
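
The sentinel problem raised in comment 3 can be reproduced on Python 3 with a direct port of the comment-2 version. The sketch below uses the hypothetical name izip_longest_v1 and compares it with itertools.zip_longest, the standard-library function that Python 3 ships for this job (Python 2.6 and later have it as itertools.izip_longest).

```python
from itertools import chain, repeat, zip_longest

def izip_longest_v1(*args, fillvalue=None):
    # Python 3 port of the comment-2 version (hypothetical name). It stops as
    # soon as a row consists only of fillvalue, even when that row is real data.
    its = [chain(it, repeat(fillvalue)) for it in args]
    term = [fillvalue] * len(args)
    while True:
        result = [next(g) for g in its]
        if result == term:
            return
        yield tuple(result)

args = ('x-', 'a-bc', 'A-BCDEF', '1')
print(list(izip_longest_v1(*args, fillvalue='-')))   # stops after the first row
print(list(zip_longest(*args, fillvalue='-')))       # continues to the longest input
```

The second row of this input is made up entirely of '-' characters, three of them genuine data and one of them padding, so the equality test cannot tell it apart from the "everything is exhausted" row.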
    
