Provides a mergeiter() function that can merge two iterators into a single iterator. Uses generators, and guarantees constant memory use.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 | #!/usr/bin/env python
"""An extended example of generators in action. Provides a function
called mergeiter that merges two iterators together.
Danny Yoo (dyoo@hkn.eecs.berkeley.edu)
"""
from __future__ import generators
def mergeiter(i1, i2, cmp=cmp):
"""Returns the "merge" of i1 and i2. i1 and i2 must be iteratable
objects, and we assume that i1 and i2 are both individually sorted.
"""
left, right = ExtendedIter(i1), ExtendedIter(i2)
while 1:
if not left.has_next():
while 1: yield ('r', right.next())
if not right.has_next():
while 1: yield ('l', left.next())
comparison = cmp(left.peek(), right.peek())
if comparison < 0:
yield ('l', left.next())
elif comparison == 0:
right.next() ; yield ('=', left.next())
else:
yield ('r', right.next())
class ExtendedIter:
"""An extended iterator that wraps around an existing iterators.
It provides extra methods:
has_next(): checks if we can still yield items.
peek(): returns the next element of our iterator, but doesn't
pass by it."""
def __init__(self, i):
self._myiter = iter(i)
self._next_element = None
self._has_next = 0
self._prime()
def has_next(self):
"""Returns true if we can call next() without raising a
StopException."""
return self._has_next
def peek(self):
"""Nonexhaustively returns the next element in our iterator."""
assert self.has_next()
return self._next_element
def next(self):
"""Returns the next element in our iterator."""
if not self._has_next:
raise StopIteration
result = self._next_element
self._prime()
return result
def _prime(self):
"""Private function to initialize the states of
self._next_element and self._has_next. We poke our
self._myiter to see if it's still alive and kicking."""
try:
self._next_element = self._myiter.next()
self._has_next = 1
except StopIteration:
self.next_element = None
self._has_next = 0
def _test():
for item in mergeiter([2, 4, 6, 8], [1, 3, 4, 7, 9, 10]):
print item
if __name__ == '__main__':
## _test()
import sys
f1, f2 = open(sys.argv[1]), open(sys.argv[2])
for item in mergeiter(f1, f2):
print item
|
An alternative approach to merging two sorted sequences into one sorted sequence can use dictionaries or lists. However, a container-based approach suffers if the input sequences are large, as it requires loading both into memory all at once. The merging approach takes advantage of the knowledge that both input sequences are already sorted, so we can avoid using a collection.
The code here evolved from discussions on the Python-Tutor mailing list:
http://mail.python.org/pipermail/tutor/2003-April/021955.html
http://mail.python.org/pipermail/tutor/2003-April/022085.html
There is another implementation of this task in the Python Cookbook:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/141934
that uses priority queues.