How to merge several iterable sequences and keep things ordered.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 | def xmerge(*ln):
""" Iterator version of merge.
Assuming l1, l2, l3...ln sorted sequences, return an iterator that
yield all the items of l1, l2, l3...ln in ascending order.
Input values doesn't need to be lists: any iterable sequence can be used.
"""
pqueue = []
for i in map(iter, ln):
try:
pqueue.append((i.next(), i.next))
except StopIteration:
pass
pqueue.sort()
pqueue.reverse()
X = max(0, len(pqueue) - 1)
while X:
d,f = pqueue.pop()
yield d
try:
# Insort in reverse order to avoid pop(0)
lo, hi, d = 0, X, f()
while lo < hi:
mid = (lo+hi)//2
if d > pqueue[mid][0]: hi = mid
else: lo = mid+1
pqueue.insert(lo, (d,f))
except StopIteration:
X-=1
if pqueue:
d,f = pqueue[0]
yield d
try:
while 1: yield f()
except StopIteration:pass
def merge(*ln):
""" Merge several sorted sequences into a sorted list.
Assuming l1, l2, l3...ln sorted sequences, return a list that contain
all the items of l1, l2, l3...ln in ascending order.
Input values doesn't need to be lists: any iterable sequence can be used.
"""
return list(xmerge(*ln))
|
As said Guido, this not something you need very often, but I couldn't stay with 13 recipes anymore ;-)
Implementation use a priority queue to find the next shorter item. Iterators wich have no more items to provide are progressively removed from the queue. The size of this queue is the stop condition of the loop. cf p178 'Algorithms'(Robert Sedgewick) 2d edition ISBN:0-201-06673-4
In the future, no need to roll your own priority queue. Looking into the future, Py2.3 comes with a wonderfully efficient priority queue implementation in a module called heapq. Applying heapq simplifies and speeds the above code considerably.
Recursive is faster. In spite of number of comparisons on random ordered sequences:
Where imerge_two - real merging iterator.
More readable. I was a bit stymied by the way iterators were used in the original code (frankly, I did not understand the code), I came up with one using heapq, but with easier to read, but producing the same effect. I am not sure of performance hit on large sequences due to the multiple maps, but the number of lines are reduced, which was my goal .
Using itertools. Using itertools, simplifies it further.
There's now a method in heapq that does exactly what you want:
[0, 0, 1, 4, 6, 9, 12, 16, 18, 24, 30, 36, 42, 48, 54]