Welcome, guest | Sign In | My Account | Store | Cart

Provides a data structure for a queue of integers whose get() method returns the median element. The interface is similar to the standard Queue module, with an added method top() to retrieve the median without removing it.

Python, 63 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
import Queue as queue # replace with 'import queue' if using Python 3

class MedianHeap:
    def __init__(self, numbers = None):
        self.median = None
        self.left = queue.PriorityQueue() # max-heap
        self.right = queue.PriorityQueue() # min-heap
        self.diff = 0 # difference in size between left and right
        
        if numbers:
            for n in numbers:
                self.put(n)

    def top(self):
        return self.median

    def put(self, n):
        if not self.median:
            self.median = n
        elif n <= self.median:
            self.left.put(-n)
            self.diff += 1
        else:
            self.right.put(n)
            self.diff -= 1

        if self.diff > 1:
            self.right.put(self.median)
            self.median = -self.left.get()
            self.diff = 0
        elif self.diff < -1:
            self.left.put(-self.median)
            self.median = self.right.get()
            self.diff = 0

    def get(self):
        median = self.median

        if self.diff > 0:
            self.median = -self.left.get()
            self.diff -= 1
        elif self.diff < 0:
            self.median = self.right.get()
            self.diff += 1
        elif not self.left.empty():
            self.median = -self.left.get()
            self.diff -= 1
        elif not self.right.empty():
            self.median = self.right.get()
            self.diff += 1
        else: # median was the only element
            self.median = None
        
        return median

    
if __name__ == '__main__':
    import sys
    
    if len(sys.argv) > 1:
        numbers = map(int, sys.argv[1:])
        m = MedianHeap(numbers)
        print "Median is ", m.get()

This is what I came up with, however there is another (I guess similar) method called a min-max-median heap which might be more efficient. In my implementation, the 'root' is the median of the set, while the left and right halves are stored respectively in a max-heap and min-heap (using Queue.PriorityQueue and a dirty trick on the sign of integers). Retrieving the median is constant-time, inserting a new element and removing the median both have logarithmic cost.

3 comments

Andy Barbour 10 years, 7 months ago  # | flag

Works fine for integers:

% Medians.py 1 2 3 4 5 
   Median is  3

But why is there an exception raised for floats?

% Medians.py 1.1 2.2 3.3 4.4 5.5
   Traceback (most recent call last):
       File "Medians.py", line 77, in <module>
           numbers = map(int, sys.argv[1:])
   ValueError: invalid literal for int() with base 10: '1.1'
jimmy2times (author) 10 years, 7 months ago  # | flag

It's because I'm parsing the command-line arguments as integers (look at the error code). You can replace "int" with "float" in line 61 and it should work fine.

Ankit Jaiswal 8 years, 9 months ago  # | flag

I need to add the remove operation as well to this program like put. Any pointers how that can be accomplished?