Using zip(names, values) may inadvertently eat some of your data when there are, e. g., fewer values than names. This is easy to fix with assert len(names) == len(values) if the arguments' length is known, but not if they are arbitrary iterables. With zip_exc() no such glitches go unnoticed as list(zip_exc(names, values)) throws a LengthMismatch exception if the number of names and values differ.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 | """
>>> list(zip_exc([]))
[]
>>> list(zip_exc((), (), ()))
[]
>>> list(zip_exc("abc", range(3)))
[('a', 0), ('b', 1), ('c', 2)]
>>> try:
... list(zip_exc("", range(3)))
... except LengthMismatch:
... print "mismatch"
mismatch
>>> try:
... list(zip_exc(range(3), ()))
... except LengthMismatch:
... print "mismatch"
mismatch
>>> try:
... list(zip_exc(range(3), range(2), range(4)))
... except LengthMismatch:
... print "mismatch"
mismatch
>>> items = zip_exc(range(3), range(2), range(4))
>>> items.next()
(0, 0, 0)
>>> items.next()
(1, 1, 1)
>>> try: items.next()
... except LengthMismatch: print "mismatch"
mismatch
"""
from itertools import chain, izip
class LengthMismatch(Exception):
pass
def _throw():
raise LengthMismatch
yield None # unreachable
def _check(rest):
for i in rest:
try:
i.next()
except LengthMismatch:
pass
else:
raise LengthMismatch
return
yield None # unreachable
def zip_exc(*iterables):
"""Like itertools.izip(), but throws a LengthMismatch exception if
the iterables' lengths differ.
"""
rest = [chain(i, _throw()) for i in iterables[1:]]
first = chain(iterables[0], _check(rest))
return izip(*[first] + rest)
if __name__ == "__main__":
import doctest
doctest.testmod()
|
My implementation looks a bit different than the straightforward approach used in http://mail.python.org/pipermail/python-3000/2006-March/000160.html, for example.
To keep the performance hit low, I've tried hard move as much of the work into code written in C (The chain() and izip() functions from the marvelous itertools module). I challenge you to come up with something faster in pure Python :-)
Raymond Hettinger's solution. Raymond Hettinger as ususal provided a beautiful solution to this problem:
I believe the plan is to add something like this to the itertools recipes section in the not-so-distant future.
Mine is faster for "long" sequences. zip_exc() doesn't do expensive tests for every tuple. So I'm claiming "conceptual beauty", a term which I've just made up :-)
$ python2.5 -m timeit -s'from zips import zip_exc as zip; data = [range(1000)]3' 'for item in zip(data): pass'
1000 loops, best of 3: 255 usec per loop
$ python2.5 -m timeit -s'from zips import izip_exact as zip; data = [range(1000)]3' 'for item in zip(data): pass'
1000 loops, best of 3: 1.08e+03 usec per loop
On the other hand the setup for izip_exact() is a bit more lightweight. With three sequences, the break-even is at 20 items (i. e. for less than 20 3-tuples izip_exact() wins):
$ python2.5 -m timeit -s'from zips import izip_exact as zip; data = [range(20)]3' 'for item in zip(data): pass'
10000 loops, best of 3: 33.9 usec per loop
$ python2.5 -m timeit -s'from zips import zip_exc as zip; data = [range(20)]3' 'for item in zip(data): pass'
10000 loops, best of 3: 33.5 usec per loop
I used the release candidate of 2.5 for the measurements because it provides all() as a built-in.
PS: If you want to use Raymond's approach in 2.4 you can define
Alternate version without the O(n) sentinel search. If speed is the main concern, the O(n) sentinel search step can be replaced with a generator that tracks the number of calls to next():
FYI, Peter's version is still the fastest.
Faster Py2.4 version of all().
Also, when using all() in the above recipe, use an "is" test instead of "==":