This data structure acts almost like a dictionary, with two modifications: First, D.smallest() returns the value x minimizing D[x]. For this to work correctly, all values D[x] stored in the dictionary must be comparable. Second, iterating "for x in D" finds and removes the items from D in sorted order. Each item is not removed until the next item is requested, so D[x] will still return a useful value until the next iteration of the for-loop.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 | # Priority dictionary using binary heaps
# David Eppstein, UC Irvine, 8 Mar 2002
from __future__ import generators
class priorityDictionary(dict):
def __init__(self):
'''Initialize priorityDictionary by creating binary heap
of pairs (value,key). Note that changing or removing a dict entry will
not remove the old pair from the heap until it is found by smallest() or
until the heap is rebuilt.'''
self.__heap = []
dict.__init__(self)
def smallest(self):
'''Find smallest item after removing deleted items from heap.'''
if len(self) == 0:
raise IndexError, "smallest of empty priorityDictionary"
heap = self.__heap
while heap[0][1] not in self or self[heap[0][1]] != heap[0][0]:
lastItem = heap.pop()
insertionPoint = 0
while 1:
smallChild = 2*insertionPoint+1
if smallChild+1 < len(heap) and \
heap[smallChild] > heap[smallChild+1]:
smallChild += 1
if smallChild >= len(heap) or lastItem <= heap[smallChild]:
heap[insertionPoint] = lastItem
break
heap[insertionPoint] = heap[smallChild]
insertionPoint = smallChild
return heap[0][1]
def __iter__(self):
'''Create destructive sorted iterator of priorityDictionary.'''
def iterfn():
while len(self) > 0:
x = self.smallest()
yield x
del self[x]
return iterfn()
def __setitem__(self,key,val):
'''Change value stored in dictionary and add corresponding
pair to heap. Rebuilds the heap if the number of deleted items grows
too large, to avoid memory leakage.'''
dict.__setitem__(self,key,val)
heap = self.__heap
if len(heap) > 2 * len(self):
self.__heap = [(v,k) for k,v in self.iteritems()]
self.__heap.sort() # builtin sort likely faster than O(n) heapify
else:
newPair = (val,key)
insertionPoint = len(heap)
heap.append(None)
while insertionPoint > 0 and \
newPair < heap[(insertionPoint-1)//2]:
heap[insertionPoint] = heap[(insertionPoint-1)//2]
insertionPoint = (insertionPoint-1)//2
heap[insertionPoint] = newPair
def setdefault(self,key,val):
'''Reimplement setdefault to call our customized __setitem__.'''
if key not in self:
self[key] = val
return self[key]
|
This type of data structure is normally called a priority queue, and there are a couple of priority queue recipes already in the cookbook, but I feel that a dictionary is a better metaphor for the priority-change operations needed in typical applications of priority queues such as Dijkstra's algorithm.
The implementation here uses binary heaps, taking logarithmic amortized time per operation -- the time is amortized not worst-case because when a heap value becomes invalid, we leave it there to be cleaned up by a later call to smallest(), so a single call can do a lot of cleanup. The other recipes I saw used either sequential search (linear time find-min) or insertion into a sorted list (linear time updates). I tested a simpler variant of this data structure using sequential search instead of the heap, and found that the heap is better for heap sizes of 60 items or more.
Missing update(). Because of way deletions are handled lazily, the only dictionary operations that need to be overloaded are the ones that add new items. I thought I had covered them all, but I missed one:
Alternatively, one could use the dictionary mixin from http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/117236 to make sure all dictionary methods are overloaded, but this would make the methods that only clear or read items unnecessarily inefficient.
Observations. I have found that using lists to hold the values is 10 to 20% faster than using a dictionary. I tend to do something like:
Which gives the ?best? of both worlds. This structure is very difficult to modify though so dictionaries might be appropriate for editable graphs.
I must admit that I am might be in the minority of preferring full blown graph structures. I tend to do a lot of funky matching on node and vertex objects and I find them easier to modify at a class level. This might also be a byproduct of dealing with chemistry related structures where nodes are atoms and edges are bonds.
My current standard is:
This means that when I am looking at an edge I can get to the nodes easily and vice versa. Plus I like writing things like:
Graph data structures. I'm not quite sure how a discussion of graph representation relates to this recipe, but I prefer dicts over lists for the inner part:
Modifiability is one reason, but more important to me is that this allows fast adjacency tests: 'C' in graph['A'] tests whether there is an edge 'A'-'C', not possible with your list-of-pairs representation, and for large graphs is faster with dicts than with lists. One can also easily loop over all neighbors of v (for w in graph[v]: ...) and associate useful information with the edge v-w (graph[v][w]).
thanks. Just wanted to let you know that I found this code very useful, after realizing I needed a full-on priority queue, this was a lot easier than writing my own (for like the 3rd time in as many languages).
i couldn't find any papers of yours describing this code (I understand from a CS perspective writing code isn't very paper-worthy) so I cited this as a webpage, you can see the paper here:
why don't you use the functions in Python's
heapq
module to manage the heap?An updated (well, 4 years old) version that uses
heapq
is available atRecipe 522995
.David,
I would like to use Matteo's derivative of your code in an AGPL licensed project I maintain. Unfortunately though, anything that says licensed under the PSF is licensed pretty ambiguously such that I'm sure anybody could make the case that the code is not freely licensed at all.
Because Matteo's code is a derivative of your own, you have a copyright claim on his work. So in order for him to relicense, you would have to first.
Would it be possible for you to relicense this code to remove the ambiguity, and so Matteo's code can be likewise relicensed? I suggested the unlicense to him, and I would suggest the same to you, unless of course there are particular rights you'd like to keep. I do not know if you wish to use the unlicense, but just in case I have prepared an example here. If you copy and paste the code there (minus my notice at the top) into a new gist you will have released your code under the unlicense.
I hope you have a great day,
John Sullivan
Ok, it's unlicensed at https://gist.github.com/4435950
Thank you, and happy new year!