ActiveState Code

Recipe 141934: Merging sorted sequences


How to merge several iterable sequences and keep things ordered.

Python
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
def xmerge(*ln):
     """ Iterator version of merge.
 
     Assuming l1, l2, l3...ln sorted sequences, return an iterator that
     yield all the items of l1, l2, l3...ln in ascending order.
     Input values doesn't need to be lists: any iterable sequence can be used.
     """
     pqueue = []
     for i in map(iter, ln):
         try:
             pqueue.append((i.next(), i.next))
         except StopIteration:
             pass
     pqueue.sort()
     pqueue.reverse()
     X = max(0, len(pqueue) - 1)
     while X:
         d,f = pqueue.pop()
         yield d
         try:
             # Insort in reverse order to avoid pop(0)
             lo, hi, d = 0, X, f()
             while lo < hi:
                 mid = (lo+hi)//2
                 if d > pqueue[mid][0]: hi = mid
                 else: lo = mid+1
             pqueue.insert(lo, (d,f))
         except StopIteration:
             X-=1
     if pqueue:
         d,f = pqueue[0]
         yield d
         try:
             while 1: yield f()
         except StopIteration:pass
 
 
def merge(*ln):
     """ Merge several sorted sequences into a sorted list.
     
     Assuming l1, l2, l3...ln sorted sequences, return a list that contain
     all the items of l1, l2, l3...ln in ascending order.
     Input values doesn't need to be lists: any iterable sequence can be used.
     """
     return list(xmerge(*ln))

Discussion

As said Guido, this not something you need very often, but I couldn't stay with 13 recipes anymore ;-)

Implementation use a priority queue to find the next shorter item. Iterators wich have no more items to provide are progressively removed from the queue. The size of this queue is the stop condition of the loop. cf p178 'Algorithms'(Robert Sedgewick) 2d edition ISBN:0-201-06673-4

Comments

  1. 1. At 1:25 a.m. on 11 nov 2002, Raymond Hettinger said:

    In the future, no need to roll your own priority queue. Looking into the future, Py2.3 comes with a wonderfully efficient priority queue implementation in a module called heapq. Applying heapq simplifies and speeds the above code considerably.

    from heapq import heapify, heappop, heapreplace
    
    def xmerge(*ln):
        pqueue = []
        for i in map(iter, ln):
            try:
                pqueue.append((i.next(), i.next))
            except StopIteration:
                pass
        heapify(pqueue)
        while pqueue:
            val, it = pqueue[0]
            yield val
            try:
                heapreplace(pqueue, (it(), it))
            except StopIteration:
                heappop(pqueue)
    
  2. 2. At 2:26 a.m. on 29 nov 2004, Vjacheslav Fyodorov said:

    Recursive is faster. In spite of number of comparisons on random ordered sequences:

    def imerge ( *ordered ) :
        """Merge ordered iterables.
        """
        L = len(ordered)
        if L == 0 : return ()
        if L == 1 : return ordered[0]
    
        if L == 2 : return imerge_two(ordered[0],ordered[1])
        L /= 2
        return imerge_two(imerge(*ordered[:L]),imerge(*ordered[L:]))
    

    Where imerge_two - real merging iterator.

  3. 3. At 1:42 a.m. on 16 jul 2006, Anand Balachandran Pillai said:

    More readable. I was a bit stymied by the way iterators were used in the original code (frankly, I did not understand the code), I came up with one using heapq, but with easier to read, but producing the same effect. I am not sure of performance hit on large sequences due to the multiple maps, but the number of lines are reduced, which was my goal .

    def xmerge2(*ln):
    
        """ Uses a simplified syntax """
    
        heap = []
        map(lambda l: map(heap.append, l), ln)
    
        while heap:
            yield heappop(heap)
    
  4. 4. At 2:34 a.m. on 16 jul 2006, Anand Balachandran Pillai said:

    Using itertools. Using itertools, simplifies it further.

    def xmerge3(*ln):
    
        from itertools import chain
    
        heap = []
        for i in chain(*ln):
            heap.append(i)
    
        while heap:
            yield heappop(heap)
    

Sign in to comment