Welcome, guest | Sign In | My Account | Store | Cart

How to merge several iterable sequences and keep things ordered.

Python, 45 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
def xmerge(*ln):
     """ Iterator version of merge.
 
     Assuming l1, l2, l3...ln sorted sequences, return an iterator that
     yield all the items of l1, l2, l3...ln in ascending order.
     Input values doesn't need to be lists: any iterable sequence can be used.
     """
     pqueue = []
     for i in map(iter, ln):
         try:
             pqueue.append((i.next(), i.next))
         except StopIteration:
             pass
     pqueue.sort()
     pqueue.reverse()
     X = max(0, len(pqueue) - 1)
     while X:
         d,f = pqueue.pop()
         yield d
         try:
             # Insort in reverse order to avoid pop(0)
             lo, hi, d = 0, X, f()
             while lo < hi:
                 mid = (lo+hi)//2
                 if d > pqueue[mid][0]: hi = mid
                 else: lo = mid+1
             pqueue.insert(lo, (d,f))
         except StopIteration:
             X-=1
     if pqueue:
         d,f = pqueue[0]
         yield d
         try:
             while 1: yield f()
         except StopIteration:pass
 
 
def merge(*ln):
     """ Merge several sorted sequences into a sorted list.
     
     Assuming l1, l2, l3...ln sorted sequences, return a list that contain
     all the items of l1, l2, l3...ln in ascending order.
     Input values doesn't need to be lists: any iterable sequence can be used.
     """
     return list(xmerge(*ln))

As said Guido, this not something you need very often, but I couldn't stay with 13 recipes anymore ;-)

Implementation use a priority queue to find the next shorter item. Iterators wich have no more items to provide are progressively removed from the queue. The size of this queue is the stop condition of the loop. cf p178 'Algorithms'(Robert Sedgewick) 2d edition ISBN:0-201-06673-4

5 comments

Raymond Hettinger 19 years ago  # | flag

In the future, no need to roll your own priority queue. Looking into the future, Py2.3 comes with a wonderfully efficient priority queue implementation in a module called heapq. Applying heapq simplifies and speeds the above code considerably.

from heapq import heapify, heappop, heapreplace

def xmerge(*ln):
    pqueue = []
    for i in map(iter, ln):
        try:
            pqueue.append((i.next(), i.next))
        except StopIteration:
            pass
    heapify(pqueue)
    while pqueue:
        val, it = pqueue[0]
        yield val
        try:
            heapreplace(pqueue, (it(), it))
        except StopIteration:
            heappop(pqueue)
Vjacheslav Fyodorov 17 years ago  # | flag

Recursive is faster. In spite of number of comparisons on random ordered sequences:

def imerge ( *ordered ) :
    """Merge ordered iterables.
    """
    L = len(ordered)
    if L == 0 : return ()
    if L == 1 : return ordered[0]

    if L == 2 : return imerge_two(ordered[0],ordered[1])
    L /= 2
    return imerge_two(imerge(*ordered[:L]),imerge(*ordered[L:]))

Where imerge_two - real merging iterator.

Anand 15 years, 4 months ago  # | flag

More readable. I was a bit stymied by the way iterators were used in the original code (frankly, I did not understand the code), I came up with one using heapq, but with easier to read, but producing the same effect. I am not sure of performance hit on large sequences due to the multiple maps, but the number of lines are reduced, which was my goal .

def xmerge2(*ln):

    """ Uses a simplified syntax """

    heap = []
    map(lambda l: map(heap.append, l), ln)

    while heap:
        yield heappop(heap)
Anand 15 years, 4 months ago  # | flag

Using itertools. Using itertools, simplifies it further.

def xmerge3(*ln):

    from itertools import chain

    heap = []
    for i in chain(*ln):
        heap.append(i)

    while heap:
        yield heappop(heap)
jeffamcgee 10 years, 11 months ago  # | flag

There's now a method in heapq that does exactly what you want:

import heapq
l = heapq.merge((x*6 for x in xrange(10)),(x*x for x in xrange(5)))
print list(l)

[0, 0, 1, 4, 6, 9, 12, 16, 18, 24, 30, 36, 42, 48, 54]

Created by Sébastien Keim on Tue, 30 Jul 2002 (PSF)
Python recipes (4591)
Sébastien Keim's recipes (24)
Python Cookbook Edition 2 (117)

Required Modules

  • (none specified)

Other Information and Tasks