Welcome, guest | Sign In | My Account | Store | Cart

Using zip(names, values) may inadvertently eat some of your data when there are, e. g., fewer values than names. This is easy to fix with assert len(names) == len(values) if the arguments' length is known, but not if they are arbitrary iterables. With zip_exc() no such glitches go unnoticed as list(zip_exc(names, values)) throws a LengthMismatch exception if the number of names and values differ.

Python, 69 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
"""
>>> list(zip_exc([]))
[]

>>> list(zip_exc((), (), ()))
[]

>>> list(zip_exc("abc", range(3)))
[('a', 0), ('b', 1), ('c', 2)]

>>> try:
...     list(zip_exc("", range(3)))
... except LengthMismatch:
...     print "mismatch"
mismatch

>>> try:
...     list(zip_exc(range(3), ()))
... except LengthMismatch:
...     print "mismatch"
mismatch

>>> try:
...     list(zip_exc(range(3), range(2), range(4)))
... except LengthMismatch:
...     print "mismatch"
mismatch

>>> items = zip_exc(range(3), range(2), range(4))
>>> items.next()
(0, 0, 0)
>>> items.next()
(1, 1, 1)
>>> try: items.next()
... except LengthMismatch: print "mismatch"
mismatch
"""

from itertools import chain, izip

class LengthMismatch(Exception):
    pass

def _throw():
    raise LengthMismatch
    yield None # unreachable

def _check(rest):
    for i in rest:
        try:
            i.next()
        except LengthMismatch:
            pass
        else:
            raise LengthMismatch
    return
    yield None # unreachable

def zip_exc(*iterables):
    """Like itertools.izip(), but throws a LengthMismatch exception if
    the iterables' lengths differ.
    """
    rest = [chain(i, _throw()) for i in iterables[1:]]
    first = chain(iterables[0], _check(rest))
    return izip(*[first] + rest)

if __name__ == "__main__":
    import doctest
    doctest.testmod()

My implementation looks a bit different than the straightforward approach used in http://mail.python.org/pipermail/python-3000/2006-March/000160.html, for example.

To keep the performance hit low, I've tried hard move as much of the work into code written in C (The chain() and izip() functions from the marvelous itertools module). I challenge you to come up with something faster in pure Python :-)

5 comments

Steven Bethard 17 years, 7 months ago  # | flag

Raymond Hettinger's solution. Raymond Hettinger as ususal provided a beautiful solution to this problem:

def izip_exact(*args):
    sentinel = object()
    iters = [chain(it, repeat(sentinel)) for it in args]
    for result in izip(*iters):
        if sentinel in result:
            if all(value==sentinel for value in result):
                return
            raise ValueError('sequences of different lengths')
        yield result

I believe the plan is to add something like this to the itertools recipes section in the not-so-distant future.

Peter Otten (author) 17 years, 7 months ago  # | flag

Mine is faster for "long" sequences. zip_exc() doesn't do expensive tests for every tuple. So I'm claiming "conceptual beauty", a term which I've just made up :-)

$ python2.5 -m timeit -s'from zips import zip_exc as zip; data = [range(1000)]3' 'for item in zip(data): pass'

1000 loops, best of 3: 255 usec per loop

$ python2.5 -m timeit -s'from zips import izip_exact as zip; data = [range(1000)]3' 'for item in zip(data): pass'

1000 loops, best of 3: 1.08e+03 usec per loop

On the other hand the setup for izip_exact() is a bit more lightweight. With three sequences, the break-even is at 20 items (i. e. for less than 20 3-tuples izip_exact() wins):

$ python2.5 -m timeit -s'from zips import izip_exact as zip; data = [range(20)]3' 'for item in zip(data): pass'

10000 loops, best of 3: 33.9 usec per loop

$ python2.5 -m timeit -s'from zips import zip_exc as zip; data = [range(20)]3' 'for item in zip(data): pass'

10000 loops, best of 3: 33.5 usec per loop

I used the release candidate of 2.5 for the measurements because it provides all() as a built-in.

PS: If you want to use Raymond's approach in 2.4 you can define

def all(items):
    return True in (True for item in items if item)
Raymond Hettinger 17 years, 7 months ago  # | flag

Alternate version without the O(n) sentinel search. If speed is the main concern, the O(n) sentinel search step can be replaced with a generator that tracks the number of calls to next():

def izip_exact(*args):
    count = []
    def counter():
        while 1:
            count.append(None)
            yield None
    sentinel = counter()
    iters = [chain(it, sentinel) for it in args]
    for result in izip(*iters):
        if count:
            if len(count) == len(args):
                return
            raise ValueError('sequences of different lengths')
        yield result
Raymond Hettinger 17 years, 7 months ago  # | flag

FYI, Peter's version is still the fastest.

Raymond Hettinger 17 years, 7 months ago  # | flag

Faster Py2.4 version of all().

def all(seq):
    for elem in ifilterfalse(None, seq):
        return False
    return True

Also, when using all() in the above recipe, use an "is" test instead of "==":

if all(value is sentinel for value in result):
    return