Welcome, guest | Sign In | My Account | Store | Cart

The following code shows why you would want to use a heap instead of a list and the basics of using heapq in python with some additions for python 2.4( I've cleaned this up from earlier to make it hopefully clearer.)

Python, 59 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
an unordered list of numbers
the_list=[903, 10, 35, 69, 933, 485, 519, 379, 102, 402, 883, 1]

#standard list.sort() techniques
#to get lowest element which is 1, sort and pop
the_list.sort()
print the_list.pop(0)
>>> 1

#if you get more data, you need to sort again before popping
the_list.append(0)
print the_list.pop(0)
>>> 10 -- oops not zero, didn't sort

#a heap solves this problem, by always being ordered
#to create a heap you push with heapq.heappush onto a list
import heapq
the_heap = []
for i in the_list: heapq.heappush(the_heap, i)
print the_heap
#note how zero is first, but the heap isn't fully ordered
>>> [0, 35, 102, 379, 69, 485, 519, 883, 903, 933, 402]

#if you add some more zeros, the fact that it is not fully sorted
#becomes more obvious, look at where the zeros are at

heapq.heappush(the_heap,0)
heapq.heappush(the_heap,0)
print the_heap
>>> [0, 35, 0, 379, 69, 0, 519, 883, 903, 933, 402, 485, 102]

#But, you will still get data back in an ordered way when you pop
print heapq.heappop(the_heap)
>>> 0
print heapq.heappop(the_heap)
>>> 0
print the_heap
>>> [0, 35, 102, 379, 69, 485, 519, 883, 903, 933, 402]

#The method heapreplace is a combination of a pop and a push.
#In this case the smallest element 0 is popped off and 200 is inserted at
#some other place into the heap
print heapq.heapreplace(the_heap, 200) 
>>> 0
print the_heap
>>> [35, 69, 102, 379, 200, 485, 519, 883, 903, 933, 402]

#Ask for 5 largest or smallest-- does an actual sort for this
print heapq.nlargest(5,the_heap) #[933, 903, 883, 519, 485]
print heapq.nsmallest(5,the_heap) #[35, 69, 102, 200, 379]

#Popping everything off of the heap will give sorted results
while 1:
    try:
        print heapq.heappop(the_heap),
    except IndexError:
        break

>>> 35 69 102 200 379 402 485 519 883 903 933

If you want ordered data you can pay for it when you retrieve the data or pay for it when you add the data. The list's sort method is used for retrieveing ordered data. After adding data to an unsorted list, you then sort it if you want to be sure you can get data from lowest to highest for example. Paying for it on retrieval means you have to constantly resort or do full searches whenever you add new data, if you want to, for example, get the lowest item.

A different approach is to pay for the sort when you add the data. One way of doing this is to use a heap, a type of binary tree which in python than ensures that the parent is always less than it's children. The heap in python is a list managed with the heapq library. Because it is only a heap, the list created in python isn't fully sorted, however you can be sure than when you pop from the list, you always get the lowest item. You do not have to do an explicit sort. For many things, that may be enough, and in return you gain the efficieny of the heap: no overhead to do an sort on retrieval of data and good performance with no bad worst case senarios.

A good reason to use this is if you have a long running queue with new data arriving and always want to be able to get the most important item off of the queue without having to constantly resort your data or do a full search. If your data is small enough or you do not very many adds, the list sort is very efficient and you may not notice the difference. Otherwise, the heap has some strong advantages.

Created by John Nielsen on Mon, 13 Sep 2004 (PSF)
Python recipes (4591)
John Nielsen's recipes (36)

Required Modules

  • (none specified)

Other Information and Tasks