Welcome, guest | Sign In | My Account | Store | Cart

This data structure acts almost like a dictionary, with two modifications: First, D.smallest() returns the value x minimizing D[x]. For this to work correctly, all values D[x] stored in the dictionary must be comparable. Second, iterating "for x in D" finds and removes the items from D in sorted order. Each item is not removed until the next item is requested, so D[x] will still return a useful value until the next iteration of the for-loop.

Python, 67 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
# Priority dictionary using binary heaps
# David Eppstein, UC Irvine, 8 Mar 2002

from __future__ import generators

class priorityDictionary(dict):
    def __init__(self):
        '''Initialize priorityDictionary by creating binary heap
of pairs (value,key).  Note that changing or removing a dict entry will
not remove the old pair from the heap until it is found by smallest() or
until the heap is rebuilt.'''
        self.__heap = []
        dict.__init__(self)

    def smallest(self):
        '''Find smallest item after removing deleted items from heap.'''
        if len(self) == 0:
            raise IndexError, "smallest of empty priorityDictionary"
        heap = self.__heap
        while heap[0][1] not in self or self[heap[0][1]] != heap[0][0]:
            lastItem = heap.pop()
            insertionPoint = 0
            while 1:
                smallChild = 2*insertionPoint+1
                if smallChild+1 < len(heap) and \
                        heap[smallChild] > heap[smallChild+1]:
                    smallChild += 1
                if smallChild >= len(heap) or lastItem <= heap[smallChild]:
                    heap[insertionPoint] = lastItem
                    break
                heap[insertionPoint] = heap[smallChild]
                insertionPoint = smallChild
        return heap[0][1]
	
    def __iter__(self):
        '''Create destructive sorted iterator of priorityDictionary.'''
        def iterfn():
            while len(self) > 0:
                x = self.smallest()
                yield x
                del self[x]
        return iterfn()
	
    def __setitem__(self,key,val):
        '''Change value stored in dictionary and add corresponding
pair to heap.  Rebuilds the heap if the number of deleted items grows
too large, to avoid memory leakage.'''
        dict.__setitem__(self,key,val)
        heap = self.__heap
        if len(heap) > 2 * len(self):
            self.__heap = [(v,k) for k,v in self.iteritems()]
            self.__heap.sort()  # builtin sort likely faster than O(n) heapify
        else:
            newPair = (val,key)
            insertionPoint = len(heap)
            heap.append(None)
            while insertionPoint > 0 and \
                    newPair < heap[(insertionPoint-1)//2]:
                heap[insertionPoint] = heap[(insertionPoint-1)//2]
                insertionPoint = (insertionPoint-1)//2
            heap[insertionPoint] = newPair
	
    def setdefault(self,key,val):
        '''Reimplement setdefault to call our customized __setitem__.'''
        if key not in self:
            self[key] = val
        return self[key]

This type of data structure is normally called a priority queue, and there are a couple of priority queue recipes already in the cookbook, but I feel that a dictionary is a better metaphor for the priority-change operations needed in typical applications of priority queues such as Dijkstra's algorithm.

The implementation here uses binary heaps, taking logarithmic amortized time per operation -- the time is amortized not worst-case because when a heap value becomes invalid, we leave it there to be cleaned up by a later call to smallest(), so a single call can do a lot of cleanup. The other recipes I saw used either sequential search (linear time find-min) or insertion into a sorted list (linear time updates). I tested a simpler variant of this data structure using sequential search instead of the heap, and found that the heap is better for heap sizes of 60 items or more.

9 comments

Missing update(). Because of way deletions are handled lazily, the only dictionary operations that need to be overloaded are the ones that add new items. I thought I had covered them all, but I missed one:

def update(self, other):
    for key in other.keys():
        self[key] = other[key]

Alternatively, one could use the dictionary mixin from http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/117236 to make sure all dictionary methods are overloaded, but this would make the methods that only clear or read items unnecessarily inefficient.

Brian Kelley 21 years, 12 months ago  # | flag

Observations. I have found that using lists to hold the values is 10 to 20% faster than using a dictionary. I tend to do something like:

graph = {'A': [('B',3), ('C',5)],
         'B': [('C', 2), ('D', 2)],
         'C': [('D', 1)],
         'D': [('C', 3)],
         'E': [('F', 8)],
         'F': [('C', 2)]}

Which gives the ?best? of both worlds. This structure is very difficult to modify though so dictionaries might be appropriate for editable graphs.

I must admit that I am might be in the minority of preferring full blown graph structures. I tend to do a lot of funky matching on node and vertex objects and I find them easier to modify at a class level. This might also be a byproduct of dealing with chemistry related structures where nodes are atoms and edges are bonds.

My current standard is:

class Node:
    def __init__(self, label):
        self.label = label
        self.edges = []    # connected edges
        self.anodes = []   # adjacent nodes (in order of edge)

class Edge:
    def __init__(self, label, node1, node2):
        self.label = label
        self.nodes = [node1, node2] # an edge can only have two nodes

        node1.edges.append(self)
        node1.anodes.append(node2)
        node2.edges.append(self)
        node2.anodes.append(node1)

class Graph:
    def __init__(self, nodes, edges):
        self.nodes = nodes
        self.edges = edges

This means that when I am looking at an edge I can get to the nodes easily and vice versa. Plus I like writing things like:

for node in g.nodes:
    for edge in node.edges:
        pass

as opposed to (pre python 2.2)

for node in g.keys():
    for node2, edge2 in g.items():
        pass

Although this could be fixed by subclassing Dict.
David Eppstein (author) 20 years, 6 months ago  # | flag

Graph data structures. I'm not quite sure how a discussion of graph representation relates to this recipe, but I prefer dicts over lists for the inner part:

graph = {'A': {'B':3, 'C':5},
         'B': {'C':2, 'D':2},
         'C': {'D':1},
         'D': {'C':3},
         'E': {'F':8},
         'F': {'C':2}}

Modifiability is one reason, but more important to me is that this allows fast adjacency tests: 'C' in graph['A'] tests whether there is an edge 'A'-'C', not possible with your list-of-pairs representation, and for large graphs is faster with dicts than with lists. One can also easily loop over all neighbors of v (for w in graph[v]: ...) and associate useful information with the edge v-w (graph[v][w]).

Ryan Coleman 17 years, 6 months ago  # | flag

thanks. Just wanted to let you know that I found this code very useful, after realizing I needed a full-on priority queue, this was a lot easier than writing my own (for like the 3rd time in as many languages).

i couldn't find any papers of yours describing this code (I understand from a CS perspective writing code isn't very paper-worthy) so I cited this as a webpage, you can see the paper here:

http://dx.doi.org/10.1016/j.jmb.2006.07.022
a 12 years, 8 months ago  # | flag

why don't you use the functions in Python's heapq module to manage the heap?

Matteo Dell'Amico 12 years, 7 months ago  # | flag

An updated (well, 4 years old) version that uses heapq is available at Recipe 522995.

John Sullivan 11 years, 2 months ago  # | flag

David,

I would like to use Matteo's derivative of your code in an AGPL licensed project I maintain. Unfortunately though, anything that says licensed under the PSF is licensed pretty ambiguously such that I'm sure anybody could make the case that the code is not freely licensed at all.

Because Matteo's code is a derivative of your own, you have a copyright claim on his work. So in order for him to relicense, you would have to first.

Would it be possible for you to relicense this code to remove the ambiguity, and so Matteo's code can be likewise relicensed? I suggested the unlicense to him, and I would suggest the same to you, unless of course there are particular rights you'd like to keep. I do not know if you wish to use the unlicense, but just in case I have prepared an example here. If you copy and paste the code there (minus my notice at the top) into a new gist you will have released your code under the unlicense.

I hope you have a great day,
John Sullivan

David Eppstein (author) 11 years, 2 months ago  # | flag

Ok, it's unlicensed at https://gist.github.com/4435950

John Sullivan 11 years, 2 months ago  # | flag

Thank you, and happy new year!