Provides a data structure for a queue of integers whose get() method returns the median element. The interface is similar to the standard Queue module, with an added method top() to retrieve the median without removing it.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 | import Queue as queue # replace with 'import queue' if using Python 3
class MedianHeap:
def __init__(self, numbers = None):
self.median = None
self.left = queue.PriorityQueue() # max-heap
self.right = queue.PriorityQueue() # min-heap
self.diff = 0 # difference in size between left and right
if numbers:
for n in numbers:
self.put(n)
def top(self):
return self.median
def put(self, n):
if not self.median:
self.median = n
elif n <= self.median:
self.left.put(-n)
self.diff += 1
else:
self.right.put(n)
self.diff -= 1
if self.diff > 1:
self.right.put(self.median)
self.median = -self.left.get()
self.diff = 0
elif self.diff < -1:
self.left.put(-self.median)
self.median = self.right.get()
self.diff = 0
def get(self):
median = self.median
if self.diff > 0:
self.median = -self.left.get()
self.diff -= 1
elif self.diff < 0:
self.median = self.right.get()
self.diff += 1
elif not self.left.empty():
self.median = -self.left.get()
self.diff -= 1
elif not self.right.empty():
self.median = self.right.get()
self.diff += 1
else: # median was the only element
self.median = None
return median
if __name__ == '__main__':
import sys
if len(sys.argv) > 1:
numbers = map(int, sys.argv[1:])
m = MedianHeap(numbers)
print "Median is ", m.get()
|
This is what I came up with, however there is another (I guess similar) method called a min-max-median heap which might be more efficient. In my implementation, the 'root' is the median of the set, while the left and right halves are stored respectively in a max-heap and min-heap (using Queue.PriorityQueue and a dirty trick on the sign of integers). Retrieving the median is constant-time, inserting a new element and removing the median both have logarithmic cost.
Works fine for integers:
But why is there an exception raised for floats?
It's because I'm parsing the command-line arguments as integers (look at the error code). You can replace "int" with "float" in line 61 and it should work fine.
I need to add the remove operation as well to this program like put. Any pointers how that can be accomplished?