This is a very simple wrapper of an iterator or iterable: the iterator can be consumed streamingly, without generating all (or any) of its elements up front, yet the object can still be iterated from the beginning as many times as wanted. In effect, it is a streamingly loaded list.
import itertools

class SavedIterable (object):
    """Wrap an iterable and cache it.

    The SavedIterable can be accessed streamingly, while still being
    incrementally cached. Later attempts to iterate it will access the
    whole of the sequence.

    When it has been cached to its full extent once, it reduces to a
    thin wrapper of a sequence iterator. The SavedIterable will pickle
    into a list.

    >>> s = SavedIterable(xrange(5))
    >>> iter(s).next()
    0
    >>> list(s)
    [0, 1, 2, 3, 4]
    >>> iter(s)   # doctest: +ELLIPSIS
    <listiterator object at 0x...>
    >>> import pickle
    >>> pickle.loads(pickle.dumps(s))
    [0, 1, 2, 3, 4]
    >>> u = SavedIterable(xrange(5))
    >>> one, two = iter(u), iter(u)
    >>> one.next(), two.next()
    (0, 0)
    >>> list(two)
    [1, 2, 3, 4]
    >>> list(one)
    [1, 2, 3, 4]
    >>> SavedIterable(range(3))
    [0, 1, 2]
    """

    def __new__(cls, iterable):
        # A list is already "fully cached": return it unchanged.
        if isinstance(iterable, list):
            return iterable
        return object.__new__(cls)

    def __init__(self, iterable):
        self.iterator = iter(iterable)
        self.data = []

    def __iter__(self):
        if self.iterator is None:
            return iter(self.data)
        return self._incremental_caching_iter()

    def _incremental_caching_iter(self):
        # Serve items from the cache first; fall back to the
        # underlying iterator, caching each new item.
        indices = itertools.count()
        while True:
            idx = indices.next()
            try:
                yield self.data[idx]
            except IndexError:
                pass
            else:
                continue
            if self.iterator is None:
                return
            try:
                x = self.iterator.next()
                self.data.append(x)
                yield x
            except StopIteration:
                self.iterator = None

    def __reduce__(self):
        # pickle into a list with __reduce__
        # (callable, args, state, listitems)
        return (list, (), None, iter(self))

if __name__ == '__main__':
    import doctest
    doctest.testmod()
The recipe is as simple as possible and no simpler. It addresses performance by returning a list iterator directly when the iterable is fully cached.
For my application, I pickle the SavedIterable to a list, which has the advantage that the pickles do not depend on this caching data structure at all. As another, completely optional, addition, the __new__ classmethod returns the list unchanged if a list is passed; this emphasises that the class is a simple means to "streamingly load" a sequence.
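For illustration, here is a minimal Python 3 sketch of that __reduce__ pattern (the class name StreamedExample is made up for this demonstration). The four-tuple tells pickle to reconstruct the object by calling list() with no arguments and then extending it with the items yielded by the fourth element, so the pickle contains only a plain list:

```python
import pickle

class StreamedExample:
    # Hypothetical minimal class demonstrating the recipe's
    # pickle-into-a-list trick via __reduce__.
    def __init__(self, iterable):
        self.data = list(iterable)

    def __iter__(self):
        return iter(self.data)

    def __reduce__(self):
        # (callable, args, state, listitems): pickle calls list(),
        # then appends the items yielded by this iterator.
        return (list, (), None, iter(self))

s = StreamedExample(range(3))
restored = pickle.loads(pickle.dumps(s))
print(type(restored).__name__)  # -> list
print(restored)                 # -> [0, 1, 2]
```

Note that unpickling does not even need the class to be importable, since the pickle only references the built-in list.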
There seems to be a problem when one iterator exhausts the iterable and there are still other live iterators:
[Also, just a stylistic note: __new__ is a classmethod, and its first argument is called 'cls' usually, not 'self']
Thank you for your comment, Gabriel. I had realized this problem but hadn't worried about it yet.
I think changing:
is enough to fix it.
That is addressed, but there is still a problem:
so it's not as simple as I thought it was.
I have fixed the recipe for concurrent iterators. We have to check both the cache and the iterator for that to work, so _incremental_caching_iter now looks different. Less pretty, but the concept is still very simple.
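To illustrate the fixed behaviour, here is a rough Python 3 port of the recipe (next(it) instead of it.next(); the port itself is an assumption, not part of the recipe) driven by two concurrent iterators, one of which exhausts the source first:

```python
import itertools

class SavedIterable:
    # Rough Python 3 port of the recipe above, for demonstration only.
    def __new__(cls, iterable):
        if isinstance(iterable, list):
            return iterable
        return object.__new__(cls)

    def __init__(self, iterable):
        self.iterator = iter(iterable)
        self.data = []

    def __iter__(self):
        if self.iterator is None:
            return iter(self.data)
        return self._incremental_caching_iter()

    def _incremental_caching_iter(self):
        # Check the cache first; fall back to the underlying iterator.
        for idx in itertools.count():
            try:
                yield self.data[idx]
                continue
            except IndexError:
                pass
            if self.iterator is None:
                return
            try:
                x = next(self.iterator)
            except StopIteration:
                self.iterator = None
                return
            self.data.append(x)
            yield x

u = SavedIterable(i * i for i in range(4))
one, two = iter(u), iter(u)
next(one)         # `one` advances first and caches 0
print(list(two))  # `two` exhausts the source: [0, 1, 4, 9]
print(list(one))  # `one` continues from the cache: [1, 4, 9]
```

Because each iterator re-checks the cache before touching the shared source iterator, `one` keeps working even though `two` exhausted the source in the meantime.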
Just a clarification: I didn't understand the problem at first (even though I said I did), so thank you Gabriel! This fixes a potential bug in my program.
Why not use itertools.tee() which does the caching for you?
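For comparison, a small Python 3 sketch of itertools.tee(). It also caches consumed items, but the branches must be created up front and the original iterator must not be used afterwards, whereas SavedIterable lets you call iter() on the same object at any later time:

```python
import itertools

def source():
    # A generator standing in for any one-shot iterable.
    for i in range(5):
        yield i

one, two = itertools.tee(source())
print(next(one))   # -> 0; tee buffers this item for `two`
print(list(two))   # -> [0, 1, 2, 3, 4]
print(list(one))   # -> [1, 2, 3, 4]
```

Once both tee branches are exhausted there is no way to start over from the beginning; with SavedIterable (or a plain list) there is. tee also only keeps the items that some branch has not yet consumed, so it can use less memory when the branches stay close together.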