With lists, it is common to test whether the list is empty and run special code for the empty case. With iterators, this becomes awkward: testing whether the iterator is empty will use up the first item! The solution is an idiom based on itertools.chain().

    # First, as a demo, here is the idiom with a list:

    # Code here that creates a list 'my_list'
    if not my_list:
        pass  # Code here for the case where the list is empty
    else:
        pass  # Code here for the case where the list is NOT empty

    # Now, here is how to do the same thing for iterators.
    # (Note that this must be an ITERATOR, not an ITERABLE. So
    # it won't work correctly if 'my_iter' is of type list, but
    # works fine if we use 'my_iter = iter(my_list)'. For more on the
    # distinction between iterators and iterables, see the docs.)

    import itertools

    # Code here that creates an iterator 'my_iter'
    try:
        first = next(my_iter)
    except StopIteration:
        pass  # Code here for the case where the iterator is empty
    else:
        # Put the consumed first item back in front of the rest
        my_iter = itertools.chain([first], my_iter)
        # Code here for the case where the iterator is NOT empty

It is slightly awkward to encapsulate this in a helper function ('is_empty'), but writing out the idiom directly is straightforward enough for any reader who knows what itertools.chain() does and knows that StopIteration is used to signal the end of an iterator.
Obviously, you may not need both branches. For example, if you need to do special processing when the iterator is empty, but then proceed with the normal handling, then the 'else' clause would contain only the itertools.chain() line (which puts the first item back), and normal handling would be placed after the try-except statement.
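As a minimal sketch of that variation: the function name 'summarize' and the message strings below are invented for illustration and are not part of the recipe. The empty case adds a warning, and the same normal handling then runs in both cases. Note that the 'else' clause still has to put the first item back, or the normal handling would silently skip it.

```python
import itertools

def summarize(records):
    """Hypothetical example: build report lines from an ITERATOR of records."""
    lines = []
    try:
        first = next(records)
    except StopIteration:
        # Special processing for the empty case only
        lines.append("warning: no records")
    else:
        # Restore the item that the emptiness test consumed
        records = itertools.chain([first], records)
    # Normal handling, shared by both cases
    for record in records:
        lines.append(f"record: {record}")
    lines.append("done")
    return lines

empty_report = summarize(iter([]))
full_report = summarize(iter(["a", "b"]))
```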
Thanks and credit are due to Brian Roberts and Michele Simionato whose postings to c.l.py inspired me to post this recipe, and Fredrik Lundh who provided a better implementation.
Needs lots of memory for large iterators. Did you see the warning in the itertools documentation?
"Note, this member of the toolkit may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator is going to use most or all of the data before the other iterator, it is faster to use list() instead of tee(). New in version 2.4."
With your code, 'tee' ends up building a rather large internal data structure which will never be used. With a 10000000-item iterator, Python needs about 170 megabytes to run an empty loop over the 'active' iterator.
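To make the buffering visible without measuring memory, here is a small sketch. The generator 'counting_source' and the variable names are made up for this illustration. It shows that an unused tee() branch can still replay every item after the other branch has drained the source, so those items must have been kept somewhere, while a chain()-based check pulls only a single item up front.

```python
import itertools

def counting_source(n, pulled):
    """Yield 0..n-1, recording every item actually pulled from the source."""
    for i in range(n):
        pulled.append(i)
        yield i

# tee-based check: 'probe' is never advanced while 'active' runs to the end
tee_pulled = []
probe, active = itertools.tee(counting_source(5, tee_pulled))
for _ in active:
    pass
# The source is exhausted, yet 'probe' can still replay everything,
# which means tee() had to hold on to all of the items.
replayed = list(probe)

# chain-based check: only one item is pulled to decide emptiness
chain_pulled = []
source = counting_source(5, chain_pulled)
first = next(source)
pulled_after_check = len(chain_pulled)
restored_items = list(itertools.chain([first], source))
```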
Here's a more robust solution:
I posted the last comment, but it looks like the comment system doesn't like me. Let's see if this works better.
</F>
islice to the rescue? What's wrong with:
?
Bugaboo. Whoops. Change the islice call to islice(iterable, 1)
Confucius say, never code after 3 AM. The "corrected" code consumes the first element of the iterator, which is exactly what is not wanted. I withdraw my submission in favor of the effbot's superior python-fu.
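A quick sketch of why an islice()-based probe falls short (the variable names here are illustrative only): slicing one item off an iterator advances the underlying iterator, so that item is gone for whoever reads it next.

```python
from itertools import islice

my_iter = iter([1, 2, 3])

# Probing with islice pulls the first item out of the underlying iterator...
probe = list(islice(my_iter, 1))
is_empty = not probe

# ...so later consumers no longer see it.
remaining = list(my_iter)
```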
Of course, you're right! You're right, of course. Thanks. I'll change the recipe.
check the peek recipes. Note that this problem has a lot in common with trying to "peek" into an iterator (that is, trying to see the first element without removing it). Take a look at:
With this code, you could write it something like:
or
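As a rough sketch of the peeking idea only: the 'peek' helper below is a hypothetical name and is not taken from the recipes mentioned above. It returns the first item together with an iterator that still yields that item, using the same chain() trick as the recipe. It assumes the sentinel value never occurs as real data.

```python
import itertools

def peek(iterator, sentinel=None):
    """Hypothetical helper: return (first_item, iterator_with_first_restored).

    If the iterator is empty, return (sentinel, iterator) instead.
    """
    try:
        first = next(iterator)
    except StopIteration:
        return sentinel, iterator
    return first, itertools.chain([first], iterator)

first, my_iter = peek(iter([1, 2, 3]))
all_items = list(my_iter)

nothing, empty_iter = peek(iter([]))
no_items = list(empty_iter)
```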
Still incorrect. Note, Fred's version does not use itertools.tee(). So, the first line of the recipe needs to be removed:
Okay, fixed that too. Gee... with the help of the entire Python community, perhaps we'll eventually wind up with a correct and efficient snippet of code. I guess that's what the Cookbook is all about.
Why use iterators in the first place? In my experience, code based on Python iterators like this is one of the less wise things to do. Python iterators are the wrong abstraction for almost anything other than element-by-element forward iteration. If you need random access (peeking at the n-th value, which includes n=0), you obviously need another type of abstraction.
Why use iterators? If you believe that it is NEVER wise to use iterators, then I wonder where you are coming from... I find them a very useful tool for many situations. I tend to prefer them in nearly all cases where access will only be sequential (preferring lists when access will be random).
So if (to invent an example) the main task is to run through 1 million records in order and you've chosen to use an iterator because of that, you STILL have the problem of how to display the error message if there are NO records in the file. This recipe shows a simple idiom for solving THAT problem.
Pardon me? It is getting quite TIGHT here, especially for YOUR kind of WORDING...
Pardon me? (continued here due to lack of horizontal space in the actual thread above). So what do you mean by "you wonder where I'm coming from"? What does where I am coming from have to do with your recipe? Don't you think you are getting a bit too arrogant for someone who needed "the help of the entire Python community" to get 3+1/2 lines of moderate code working? Is DECENCY not a value where YOU come from, dude? How about COURTESY then?
Anyway, if you had read my comment carefully before reacting, you wouldn't have missed the point this sadly. And the point was ABSTRACTION (not such a novel or rare concept in computer science, you know, some might EXPECT stuff like that in software). In most situations - be they invented or real - there is an early point when you can either choose the RIGHT abstraction, or blandly take a plain iterator. A wise choice however - even the very act of attempting it - requires understanding about the value of a good abstraction. Pity enough though, an abstraction presented still you have not.
To conclude: After reading your explanation I am more than convinced that this recipe should be declared a bad code smell: Ever needing it is a clear sign of a looming design flaw. (I'll give you a hint: There is a difference between an iterable, which is a good abstraction for sequential access, and an iterator, which is a bad one. I will leave it up to you to conclude why.)
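For readers unfamiliar with the iterable/iterator distinction referred to here, a minimal sketch (variable names are illustrative): an iterable such as a list can be traversed repeatedly, an iterator obtained from it is used up after one pass, and next() works only on the iterator.

```python
numbers = [1, 2, 3]            # an ITERABLE: each traversal starts fresh

first_pass = list(numbers)
second_pass = list(numbers)

my_iter = iter(numbers)        # an ITERATOR: a one-shot cursor over the data

first_drain = list(my_iter)
second_drain = list(my_iter)   # nothing left the second time

# next() needs an iterator; calling it on the list itself raises TypeError
try:
    next(numbers)
    list_supports_next = True
except TypeError:
    list_supports_next = False
```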