Everyone knows about LRU. Here is an implementation that takes O(1) time for inserts, updates, and deletes.
class Node(object):
    __slots__ = ['prev', 'next', 'me']
    def __init__(self, prev, me):
        self.prev = prev
        self.me = me
        self.next = None

class LRU:
    """
    Implementation of a length-limited O(1) LRU queue.
    Built for and used by PyPE:
    http://pype.sourceforge.net
    Copyright 2003 Josiah Carlson.
    """
    def __init__(self, count, pairs=[]):
        self.count = max(count, 1)
        self.d = {}              # key -> Node
        self.first = None        # least recently used
        self.last = None         # most recently used
        for key, value in pairs:
            self[key] = value

    def __contains__(self, obj):
        return obj in self.d

    def __getitem__(self, obj):
        # Re-insert the pair so that a read also moves it to the MRU end.
        a = self.d[obj].me
        self[a[0]] = a[1]
        return a[1]

    def __setitem__(self, obj, val):
        if obj in self.d:
            del self[obj]
        nobj = Node(self.last, (obj, val))
        if self.first is None:
            self.first = nobj
        if self.last:
            self.last.next = nobj
        self.last = nobj
        self.d[obj] = nobj
        if len(self.d) > self.count:
            if self.first == self.last:
                self.first = None
                self.last = None
                return
            # Evict the least recently used item.
            a = self.first
            a.next.prev = None
            self.first = a.next
            a.next = None
            del self.d[a.me[0]]
            del a

    def __delitem__(self, obj):
        nobj = self.d[obj]
        if nobj.prev:
            nobj.prev.next = nobj.next
        else:
            self.first = nobj.next
        if nobj.next:
            nobj.next.prev = nobj.prev
        else:
            self.last = nobj.prev
        del self.d[obj]

    def __iter__(self):
        # Yield values from least to most recently used.
        cur = self.first
        while cur is not None:
            cur2 = cur.next
            yield cur.me[1]
            cur = cur2

    def iteritems(self):
        # Yield (key, value) pairs from least to most recently used.
        cur = self.first
        while cur is not None:
            cur2 = cur.next
            yield cur.me
            cur = cur2

    def iterkeys(self):
        return iter(self.d)

    def itervalues(self):
        for i, j in self.iteritems():
            yield j

    def keys(self):
        return self.d.keys()
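A quick usage sketch (cache size and keys here are arbitrary):

    cache = LRU(2)
    cache['a'] = 1
    cache['b'] = 2
    cache['a']                                  # a read refreshes 'a'
    cache['c'] = 3                              # evicts 'b', the least recently used
    print('b' in cache)                         # False
    print([k for k, v in cache.iteritems()])    # ['a', 'c'], oldest first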
Having seen nothing of the sort during my undergraduate years, when I first wrote this in the summer of 2002 I was enthusiastic about its usefulness for database caching, or even operating system paging. Unfortunately, nobody seemed interested in it, which suggests it has been done before. So I'm offering it in the Python Cookbook before releasing a variant in my own Python editor PyPE (for keeping information about the 128 most recently opened documents), available at http://pype.sourceforge.net .
Use and enjoy.
Edit: Oh wow, you can edit recipes when there's a bug? Wow. Fixed the bugs.
Not quite O(1) or LRU. You are basically maintaining a dictionary and a linked list. Dictionary accesses are not O(1): I believe the Python implementation uses a hash table, which has O(n) worst-case behavior. The best you can hope for from a dictionary is O(log n).
Furthermore, your code is not correct and does not maintain LRU semantics: the linked list is not updated to move the most recently accessed node to the end of the list when __getitem__ is called. You are actually implementing a FIFO.
If you want good complexity behavior in a cache, you will have to use a priority-queue algorithm such as binomial heaps. I haven't tried the new priority queue support in Python 2.3 yet, but I have used the one at http://www.csse.monash.edu.au/hons/projects/1999/Andrew.Snare/ to good effect.
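For what it's worth, here is a rough sketch of what a priority-queue-backed cache looks like, using the standard library's heapq as a stand-in for the binomial/Fibonacci heaps mentioned (the class and names are hypothetical, not from the recipe):

    import heapq
    import itertools

    class PQCache:
        """Length-limited cache on a binary heap, with lazy invalidation.
        Every access costs O(log n), versus the recipe's claimed O(1)."""
        def __init__(self, count):
            self.count = count
            self.d = {}                  # key -> (tick, value)
            self.heap = []               # (tick, key) entries, possibly stale
            self.ticker = itertools.count()

        def __setitem__(self, key, value):
            tick = next(self.ticker)     # monotonically increasing "recency"
            self.d[key] = (tick, value)
            heapq.heappush(self.heap, (tick, key))
            while len(self.d) > self.count:
                tick, key = heapq.heappop(self.heap)   # smallest tick = oldest
                if key in self.d and self.d[key][0] == tick:
                    del self.d[key]      # live entry: evict it
                # otherwise the heap entry was stale; keep popping

        def __getitem__(self, key):
            tick, value = self.d[key]
            self[key] = value            # touching re-stamps the entry
            return value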
Over pedantry. While saying "the best you can hope for from a dict is O(log n)" may be pedantically correct (and I'm not even sure about that), in practice assuming dicts have O(1) access isn't going to get you into hot water.
Python's hash functions are pretty good.
Bugfixes that slipped my mind. Small bugfixes:
Should be replaced by:
Hashing and O(1). Check the Python distribution's source code, namely dictobject.c. If you look at the probe-sequence rules, it will not perform at O(n) unless every object has the same hash. In fact, with a table that is at most 2/3 full, the odds of colliding more than 10 times are under ~2%.
To me, 10 probes is good enough. Heck, at 20 probes we're talking about ~0.03%.
Plus it has the benefit of not having to deal with pointers for a tree structure.
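The arithmetic behind those figures, assuming each probe independently collides with probability at most 2/3 in a table that is at most 2/3 full:

    # chance of more than k consecutive collisions: at most (2/3) ** k
    for k in (10, 20):
        print(k, (2.0 / 3.0) ** k)   # 10 -> ~0.017, 20 -> ~0.0003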
It is an LRU (it was buggy). After the bugfix I posted, it is indeed an LRU. Why?
If you follow the flow of the code, it certainly always deletes the 'first' item, but that is the oldest item; all newer items are inserted as the 'last' item.
Here is a demonstration (after my bugfix given in another comment):
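(A minimal sketch against the fixed code above; the expected ordering is noted in the comments.)

    cache = LRU(3, [(1, 'one'), (2, 'two'), (3, 'three')])
    cache[1]                                    # reading 1 makes it most recently used
    cache[4] = 'four'                           # evicts 2, the least recently used
    print([k for k, v in cache.iteritems()])    # [3, 1, 4], oldest to newest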
Seems to work for me.
I see... That's what everyone is talking about.
I only updated the ordering on writes, but everyone cares about reads.
There is a bugfix in another comment.
Make LRU on read (as well as write). If you want to make it LRU on read as well as on write (writes were all I had been concerned about), change __getitem__ as sketched below.
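(The "after" snippet is the __getitem__ now in the recipe above; the "before" snippet is presumably the original plain read.) Change:

    def __getitem__(self, obj):
        # plain read: does not touch the list, so reads behave as FIFO
        return self.d[obj].me[1]

to:

    def __getitem__(self, obj):
        # re-insert the pair so a read also moves it to the most-recent end
        a = self.d[obj].me
        self[a[0]] = a[1]
        return a[1]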
A note on binomial heaps. I've implemented both a standard binary heap and a binomial heap in Python before. I was not impressed by my binomial heap implementation.
The author you link to claims to have created a Fibonacci heap that is fast... but how fast? The graphs for his distribution show that it is quite fast, but I wanted to test it myself, so I did.
Below are some test runs against my sample implementations. dict_heap and list_heap are identical, except that dict_heap uses a dictionary and list_heap uses a Python list (which is really an array). Notice that the dictionary is only ever about 33% slower than list_heap while removing all the elements.
PQ0 gets slower as the sizes get larger because deleting the first item in a list (as PQ0 does) is O(n).
PQueue just seems to be poorly implemented.
PQEquivBig seems to be consistently about 50% slower than list_heap during deletion, and much slower during insertion.
I'm not impressed.
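For anyone who wants to rerun this kind of comparison, a rough harness (timing the standard heapq binary heap rather than the implementations named above) might look like:

    import heapq
    import random
    import time

    def time_heap(n):
        """Time n pushes followed by n pops on a heapq-based heap."""
        data = [random.random() for _ in range(n)]
        heap = []
        start = time.time()
        for x in data:
            heapq.heappush(heap, x)
        mid = time.time()
        while heap:
            heapq.heappop(heap)
        end = time.time()
        return mid - start, end - mid

    for n in (10000, 100000):
        insert_time, remove_time = time_heap(n)
        print(n, insert_time, remove_time)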
Amortized O(1) when the dictionary is sparse enough. I was thinking about it for a bit; if Python's hash functions are good enough, and the dictionary is sparse enough, all accesses amortize to O(1).
Assume Python dictionaries use double hashing: one hash for the start location, a second for the probe-sequence increment. If the dictionary is half full, the probability of not colliding on the first probe is 1/2. For probes that do collide, each subsequent probe again has a 1/2 chance of not colliding. Summing the probabilities over probe number k, the expected number of probes is 1*(1/2) + 2*(1/4) + 3*(1/8) + 4*(1/16) + ..., which sums to 2.
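A quick numerical check of that sum:

    # expected number of probes at half load: sum of k * (1/2) ** k
    print(sum(k * 0.5 ** k for k in range(1, 60)))   # -> 2.0 to double precision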
Very nifty.
Needs to use weakref. This recipe looks great for my application, with a cache of about 1000 objects. However, the doubly linked Nodes form reference cycles and will not get garbage collected when the LRU instance is deleted. Using a weakref for Node.prev solves this. You could also use __slots__ on the Node class to save space, given that there will be a lot of them.
You make a good point about the use of __slots__. I've updated the recipe to reflect it.
As for the reference cycles: since 2.2, Python has had a cycle-detecting garbage collector, so these reference cycles aren't a big deal. Also, weakening only the previous pointers ignores the fact that the next pointers introduce reference cycles of their own. To show that the reference cycles aren't a problem, the following is a sample interaction using the LRU with Python 2.3 (as Python 2.2 is no longer supported by the PSF).
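A minimal sketch of such a session (gc.collect() returns the number of unreachable objects the cycle detector found):

    import gc

    cache = LRU(5, [(i, str(i)) for i in range(10)])
    del cache             # drop the only external reference
    print(gc.collect())   # positive: the cyclic Nodes were found and reclaimed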