
On the rare occasion that you want zip() to pad the shorter sequences with a padding value instead of truncating to the shortest one, at least use something fast. You can optionally specify a padding value other than the default, None.

Python, 50 lines
"""
>>> list(zip_pad([], [1], [1,2]))
[(None, 1, 1), (None, None, 2)]

>>> list(zip_pad([], [1], [1,2], pad=42))
[(42, 1, 1), (42, 42, 2)]

>>> list(zip_pad([], []))
[]

>>> list(zip_pad([1], [2]))
[(1, 2)]

>>> list(zip_pad([1,2], []))
[(1, None), (2, None)]

>>> list(zip_pad([1,2], [3,4]))
[(1, 3), (2, 4)]

>>> list(zip_pad([1,2], [10,20,30], [100,200,300,400]))
[(1, 10, 100), (2, 20, 200), (None, 30, 300), (None, None, 400)]
"""

from itertools import izip, chain

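def zip_pad(*iterables, **kw):
    # Python 2 has no keyword-only arguments, so "pad" is taken from **kw.
    if kw:
        assert len(kw) == 1
        pad = kw["pad"]
    else:
        pad = None
    # Number of iterables that may still run dry before iteration must stop;
    # kept in a list so the nested generator can modify it.
    done = [len(iterables)-1]
    def pad_iter():
        # Runs only once the iterable it is chained to is exhausted.
        if not done[0]:
            # The last iterable is exhausted: end the chain, which stops izip().
            return
        done[0] -= 1
        while 1:
            yield pad
    iterables = [chain(seq, pad_iter()) for seq in iterables]
    return izip(*iterables)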

if __name__ == "__main__":
    import doctest
    doctest.testmod()

The trick is that the check for whether all iterables are exhausted runs only once per iterable, whereas a naive implementation would check on every iteration. There are still per-iteration checks, of course, but they are hidden inside chain() and izip() and benefit from the itertools module's fast C implementation.
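
For readers on Python 3, where izip() no longer exists and the built-in zip() is already lazy, the same trick might be ported roughly as follows. This is a sketch and not the recipe's original code: the keyword-only pad argument and the name remaining are assumptions made for the port.

```python
from itertools import chain, repeat

def zip_pad(*iterables, pad=None):
    # How many iterables may still run dry before iteration has to stop.
    # Stored in a list so the nested generator can mutate it.
    remaining = [len(iterables) - 1]

    def pad_iter():
        # The body starts running only when the chained iterable is exhausted.
        if not remaining[0]:
            # This was the last live iterable: ending here stops zip().
            return
        remaining[0] -= 1
        yield from repeat(pad)

    return zip(*[chain(it, pad_iter()) for it in iterables])

print(list(zip_pad([], [1], [1, 2])))      # [(None, 1, 1), (None, None, 2)]
print(list(zip_pad([1, 2], [], pad=42)))   # [(1, 42), (2, 42)]
```

The exhaustion check in pad_iter() runs at most once per iterable; every other step is carried out inside chain(), repeat() and zip().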

This recipe is inspired by code written by Andrew Dalke, as posted on comp.lang.python: http://mail.python.org/pipermail/python-list/2005-July/292146.html.

4 comments

Zoran Isailovski 17 years, 7 months ago

map is a simpler, though less general alternative. I'd like to mention that "map(None, *iterables)" already does the job for most cases (padding with None). So, if the result does not have to be a lazy iterator (i.e. I am not dealing with huge amounts of data), I'd probably prefer using map.

I am adding this, since you mentioned you wanted an alternative for zip that pads values and is fast. "map(None, *iterables)" is both, and is readily available with Python.

See also http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/410687
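
Note that map(None, ...) is a Python 2 idiom; Python 3's map() no longer accepts None as the function. A rough Python 3 stand-in for the eager, None-padded behaviour described above could look like this. The helper name map_none is hypothetical, and the sketch only covers the case of two or more iterables.

```python
from itertools import zip_longest

def map_none(*iterables):
    # Eager, None-padded zip: builds the whole list at once, as the
    # Python 2 call map(None, a, b, ...) did for two or more iterables.
    return list(zip_longest(*iterables))

print(map_none([1, 2], [10, 20, 30]))   # [(1, 10), (2, 20), (None, 30)]
```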

Raymond Hettinger 17 years, 7 months ago

izip_longest(). FWIW, here is my version of a padding zipper:

from itertools import chain, repeat

def izip_longest(*args, **kwds):
    fillvalue = kwds.get('fillvalue')
    its = [chain(it, repeat(fillvalue)).next for it in args]
    term = [fillvalue] * len(args)
    while 1:
        result = [g() for g in its]
        if result == term: return
        yield tuple(result)


print list(izip_longest('x', 'abc', 'ABCDEF', '1', fillvalue=999))

Peter Otten (author) 17 years, 7 months ago

This may fail... as it relies on a non-unique sentinel:

>>> list(izip_longest('x-', 'a-bc', 'A-BCDEF', '1', fillvalue="-"))
[('x', 'a', 'A', '1')]
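
One possible way around the problem, sketched here in Python 3 and not taken from the thread, is to pad with a private object() that cannot collide with real data, and to substitute the fill value only when a row is emitted. The function name zip_longest_sentinel is hypothetical.

```python
from itertools import chain, repeat

def zip_longest_sentinel(*args, fillvalue=None):
    # A fresh object() is never equal or identical to any value in the input.
    sentinel = object()
    its = [chain(it, repeat(sentinel)) for it in args]
    for row in zip(*its):
        # Stop only when every column has run into the private sentinel.
        if all(v is sentinel for v in row):
            return
        yield tuple(fillvalue if v is sentinel else v for v in row)

rows = list(zip_longest_sentinel('x-', 'a-bc', 'A-BCDEF', '1', fillvalue='-'))
print(len(rows))   # 7
print(rows[1])     # ('-', '-', '-', '-')
```

This keeps the per-iteration termination check, so it addresses correctness only and gives up the speed advantage discussed in the recipe.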

Raymond Hettinger 17 years, 6 months ago

A C-speed version using itertools.

from itertools import chain, izip, repeat

def izip_longest(*args, **kwds):
    ''' Alternate version of izip() that fills-in missing values rather than truncating
    to the length of the shortest iterable.  The fillvalue is specified as a keyword
    argument (defaulting to None if not specified).

    >>> list(izip_longest('a', 'def', 'ghi'))
    [('a', 'd', 'g'), (None, 'e', 'h'), (None, 'f', 'i')]
    >>> list(izip_longest('abc', 'def', 'ghi'))
    [('a', 'd', 'g'), ('b', 'e', 'h'), ('c', 'f', 'i')]
    >>> list(izip_longest('a', 'def', 'gh'))
    [('a', 'd', 'g'), (None, 'e', 'h'), (None, 'f', None)]
    '''
    fillvalue = kwds.get('fillvalue')
    def sentinel(counter=[fillvalue]*(len(args)-1)):
        yield counter.pop()     # raises IndexError when count hits zero
    iters = [chain(it, sentinel(), repeat(fillvalue)) for it in args]
    try:
        for tup in izip(*iters):
            yield tup
    except IndexError:
        pass
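
For comparison, the same counter-based approach can be sketched in Python 3, where zip() replaces izip() and the standard library provides itertools.zip_longest() (izip_longest() in Python 2.6 and later). The name zip_longest_sketch is an assumption used to avoid shadowing the library function, and the counter is moved out of the default argument for readability.

```python
from itertools import chain, repeat, zip_longest

def zip_longest_sketch(*args, fillvalue=None):
    # One entry per iterable that may run dry before iteration must stop.
    counter = [fillvalue] * (len(args) - 1)

    def sentinel():
        # Yields one fill value; raises IndexError once the counter is empty,
        # i.e. when the last remaining iterable has been exhausted.
        yield counter.pop()

    iters = [chain(it, sentinel(), repeat(fillvalue)) for it in args]
    try:
        yield from zip(*iters)
    except IndexError:
        pass

print(list(zip_longest_sketch('a', 'def', 'gh')))
# [('a', 'd', 'g'), (None, 'e', 'h'), (None, 'f', None)]
print(list(zip_longest_sketch('a', 'def', 'gh')) == list(zip_longest('a', 'def', 'gh')))
# True
```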