Welcome, guest | Sign In | My Account | Store | Cart

How to merge several iterable sequences and keep things ordered.

Python, 45 lines
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45``` ```def xmerge(*ln): """ Iterator version of merge. Assuming l1, l2, l3...ln sorted sequences, return an iterator that yield all the items of l1, l2, l3...ln in ascending order. Input values doesn't need to be lists: any iterable sequence can be used. """ pqueue = [] for i in map(iter, ln): try: pqueue.append((i.next(), i.next)) except StopIteration: pass pqueue.sort() pqueue.reverse() X = max(0, len(pqueue) - 1) while X: d,f = pqueue.pop() yield d try: # Insort in reverse order to avoid pop(0) lo, hi, d = 0, X, f() while lo < hi: mid = (lo+hi)//2 if d > pqueue[mid][0]: hi = mid else: lo = mid+1 pqueue.insert(lo, (d,f)) except StopIteration: X-=1 if pqueue: d,f = pqueue[0] yield d try: while 1: yield f() except StopIteration:pass def merge(*ln): """ Merge several sorted sequences into a sorted list. Assuming l1, l2, l3...ln sorted sequences, return a list that contain all the items of l1, l2, l3...ln in ascending order. Input values doesn't need to be lists: any iterable sequence can be used. """ return list(xmerge(*ln)) ```

As said Guido, this not something you need very often, but I couldn't stay with 13 recipes anymore ;-)

Implementation use a priority queue to find the next shorter item. Iterators wich have no more items to provide are progressively removed from the queue. The size of this queue is the stop condition of the loop. cf p178 'Algorithms'(Robert Sedgewick) 2d edition ISBN:0-201-06673-4

Raymond Hettinger 19 years ago

In the future, no need to roll your own priority queue. Looking into the future, Py2.3 comes with a wonderfully efficient priority queue implementation in a module called heapq. Applying heapq simplifies and speeds the above code considerably.

``````from heapq import heapify, heappop, heapreplace

def xmerge(*ln):
pqueue = []
for i in map(iter, ln):
try:
pqueue.append((i.next(), i.next))
except StopIteration:
pass
heapify(pqueue)
while pqueue:
val, it = pqueue[0]
yield val
try:
heapreplace(pqueue, (it(), it))
except StopIteration:
heappop(pqueue)
``````
Vjacheslav Fyodorov 17 years ago

Recursive is faster. In spite of number of comparisons on random ordered sequences:

``````def imerge ( *ordered ) :
"""Merge ordered iterables.
"""
L = len(ordered)
if L == 0 : return ()
if L == 1 : return ordered[0]

if L == 2 : return imerge_two(ordered[0],ordered[1])
L /= 2
return imerge_two(imerge(*ordered[:L]),imerge(*ordered[L:]))
``````

Where imerge_two - real merging iterator.

Anand 15 years, 4 months ago

More readable. I was a bit stymied by the way iterators were used in the original code (frankly, I did not understand the code), I came up with one using heapq, but with easier to read, but producing the same effect. I am not sure of performance hit on large sequences due to the multiple maps, but the number of lines are reduced, which was my goal .

``````def xmerge2(*ln):

""" Uses a simplified syntax """

heap = []
map(lambda l: map(heap.append, l), ln)

while heap:
yield heappop(heap)
``````
Anand 15 years, 4 months ago

Using itertools. Using itertools, simplifies it further.

``````def xmerge3(*ln):

from itertools import chain

heap = []
for i in chain(*ln):
heap.append(i)

while heap:
yield heappop(heap)
``````
jeffamcgee 10 years, 11 months ago

There's now a method in heapq that does exactly what you want:

``````import heapq
l = heapq.merge((x*6 for x in xrange(10)),(x*x for x in xrange(5)))
print list(l)
``````

[0, 0, 1, 4, 6, 9, 12, 16, 18, 24, 30, 36, 42, 48, 54]

 Created by Sébastien Keim on Tue, 30 Jul 2002 (PSF)

### Tags

▶ Show machine tags (4)

### Required Modules

• (none specified)