Welcome, guest | Sign In | My Account | Store | Cart

Everyone knows about LRU. Here is an implementation that takes O(1) time to insert, update or delete.

Python, 78 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
class Node(object):
    __slots__ = ['prev', 'next', 'me']
    def __init__(self, prev, me):
        self.prev = prev
        self.me = me
        self.next = None

class LRU:
    """
    Implementation of a length-limited O(1) LRU queue.
    Built for and used by PyPE:
    http://pype.sourceforge.net
    Copyright 2003 Josiah Carlson.
    """
    def __init__(self, count, pairs=[]):
        self.count = max(count, 1)
        self.d = {}
        self.first = None
        self.last = None
        for key, value in pairs:
            self[key] = value
    def __contains__(self, obj):
        return obj in self.d
    def __getitem__(self, obj):
        a = self.d[obj].me
        self[a[0]] = a[1]
        return a[1]
    def __setitem__(self, obj, val):
        if obj in self.d:
            del self[obj]
        nobj = Node(self.last, (obj, val))
        if self.first is None:
            self.first = nobj
        if self.last:
            self.last.next = nobj
        self.last = nobj
        self.d[obj] = nobj
        if len(self.d) > self.count:
            if self.first == self.last:
                self.first = None
                self.last = None
                return
            a = self.first
            a.next.prev = None
            self.first = a.next
            a.next = None
            del self.d[a.me[0]]
            del a
    def __delitem__(self, obj):
        nobj = self.d[obj]
        if nobj.prev:
            nobj.prev.next = nobj.next
        else:
            self.first = nobj.next
        if nobj.next:
            nobj.next.prev = nobj.prev
        else:
            self.last = nobj.prev
        del self.d[obj]
    def __iter__(self):
        cur = self.first
        while cur != None:
            cur2 = cur.next
            yield cur.me[1]
            cur = cur2
    def iteritems(self):
        cur = self.first
        while cur != None:
            cur2 = cur.next
            yield cur.me
            cur = cur2
    def iterkeys(self):
        return iter(self.d)
    def itervalues(self):
        for i,j in self.iteritems():
            yield j
    def keys(self):
        return self.d.keys()

Having not seen anything of the sort during my undergraduate years, upon writing it initially in the summer of 2002, I was enthusiastic as to it's usefulness in database caching or even operating system paging. Unfortunately, it seems as though no one is even interested in this thing, which suggests that it has been done before or something else. As a result, I'm offering it in the Python Cookbook before releasing a variant in my own Python editor PyPE (for keeping information about the 128 most recently opened documents) available at http://pype.sourceforge.net .

Use and enjoy.

Edit: Oh wow, you can edit them when there's a bug? Wow. Fixed the bugs.

11 comments

Fazal Majid 17 years, 11 months ago  # | flag

Not quite O(1) or LRU. You basically maintain a dictionary and a linked list. Dictionary accesses are not O(1). I believe the Python implementation uses a hash table, which has O(N) worst case behavior. The best you can hope for in a dictionary is O(log(N)).

Furthermore, your code is not correct and does not maintain LRU semantics, as your linked list is not updated to move the most recently accessed node to the end of the list when __getitem__ is called. You are actually implementing a FIFO.

If you want to have good complexity behavior in a cache, you will have to use a priority queue algorithm like binomial heaps. I haven't tried the new Priority Queue class in Python 2.3 yet, but I have used the one at http://www.csse.monash.edu.au/hons/projects/1999/Andrew.Snare/ to good effect.

Michael Hudson 17 years, 11 months ago  # | flag

over pedantry. While saying "the best you can hope for from a dict is O(log n)" may be pedantically correct (and I'm not even sure about that), in practice assuming dicts have O(1) access isn't going to get you into hot water.

Python's hash functions are pretty good.

Josiah Carlson (author) 17 years, 11 months ago  # | flag

Bugfixes that slipped my mind. Small bugfixes:

self.first = a
a.next = None
del self.d[a[0]]

Should be replaced by:

self.first = a.next
a.next = None
del self.d[a.me[0]]
Josiah Carlson (author) 17 years, 11 months ago  # | flag

Hashing and O(1). Check the Python distribution source code, namely dictobject.c. If you notice the probe sequence rules, unless every object has the same hash, it will not perform O(n). In fact, with a maximally 2/3 filled table, the odds of colliding more than 10 times is less than ~1%.

To me, 10 probes is good enough. Heck, at 20 probes, we're talking about ~.01%.

Plus it has the benefit of not having to deal with pointers for a tree structure.

Josiah Carlson (author) 17 years, 11 months ago  # | flag

It is an LRU (it was buggy). After the bugfix I posted, it is indeed an LRU. Why?

If you follow the flow of code, certainly it always deletes the 'first' item, but that is the oldest item...all the newer items are inserted as the 'last' item.

Here is some output (after my bugfix given in another comment):

>>> a = LRU(4)
>>> a[1] = 1
>>> a[2] = 2
>>> a[3] = 3
>>> a[4] = 4
>>> for i in a.iteritems():
...     print i,
...
(1, 1) (2, 2) (3, 3) (4, 4)
>>> a[5] = 5
>>> for i in a.iteritems():
...     print i,
...
(2, 2) (3, 3) (4, 4) (5, 5)
>>> a[3] = 6
>>> for i in a.iteritems():
...     print i,
...
(2, 2) (4, 4) (5, 5) (3, 6)

Seems to work for me.

Josiah Carlson (author) 17 years, 11 months ago  # | flag

I see... That's what everyone is talking about.

I only update it when writing, everyone cares about reading.

There is a bugfix in another comment.

Josiah Carlson (author) 17 years, 11 months ago  # | flag

Make LRU on read (as well as write). If you want to make it LRU on read, as well as write (which is all I had been concerned about), change:

def __getitem__(self, obj):
    return self.d[obj].me[1]

To:

def __getitem__(self, obj):
    a = self.d[obj].me
    self[a[0]] = a[1]
    return a[1]
Josiah Carlson (author) 17 years, 11 months ago  # | flag

A note on binomial heaps. I've implemented both a standard binary heap, as well as a binomial heap in Python before. I was not impressed by my binomial heap implementation.

The author that you link to claims to have creted a fibonacci heap that is fast...but how fast? The graphs for his distribution show that it is quite fast, but I wanted to test it myself, so I did.

Below are some test runs against my sample implementations. dict_heap and list_heap are identical, except that dict_heap uses a dictionary and list_heap uses a Python list (which is really an array). Notice how the dictionary is only ever about 33% slower than list_heap during the removal of all the elements.

              init     popall
10000:
dict_heap     0.265    1.829
list_heap     0.141    1.406
BinomialHeap  0.484    4.25
PQ0           0.047    0.734
PQueue        0.359    1.594
PQEquivBig    1.062    2.172
20000:
dict_heap     0.563    3.968
list_heap     0.281    3.047
BinomialHeap  0.969    9.125
PQ0           0.11     2.765
PQueue        0.766    3.437
PQEquivBig    2.218    4.625
30000:
dict_heap     0.844    6.234
list_heap     0.422    4.75
BinomialHeap  1.469   14.313
PQ0           0.172    8.859
PQueue        1.125    5.407
PQEquivBig    3.313    7.141
40000:
dict_heap     1.141    8.562
list_heap     0.578    6.531
BinomialHeap  1.937   19.828
PQ0           0.234   16.094
PQueue        1.625    7.438
PQEquivBig    4.563    9.797

PQ0 gets slower as the sizes get larger because deleting the first item in a list (as PQ0 does) is O(n).

PQueue just seems to be poorly implemented.

PQEquivBig seems to be exactly 50% slower than list_heap during deletion, and much slower in insertion.

I'm not impressed.

Josiah Carlson (author) 17 years, 2 months ago  # | flag

Ammortized O(1) when the dictionary is sparse enough. I was thinking about it for a bit; if Python's hash functions are good enough, and the dictionary was sparse enough, one can ammortize all accesses to O(1).

Assume Python dictionaries use double hashing; first for the start location, second for a probe sequence increment. If the dictionary is 1/2 full, then the probability of you not colliding in your first shot is 1/2. For those that collide, subsequent tests have a 1/2 chance of not colliding. If we sum up the probabilities to get to probe number k... 11/2 + 21/4 + 31/8 + 41/16 + ..., which ends up summing to 2.

Very nifty.

Tim Mitchell 15 years, 6 months ago  # | flag

Needs to use weakref. This recipie looks great for my application with a cache of about 1000 objects. However the doubly linked Nodes form reference cycles and will not get garbage collected when the LRU instance is deleted. Using a weakref for the Node.prev solves this. You could also use __slots__ on the Node class to save space - given that that there will be alot of them.

Josiah Carlson (author) 15 years, 4 months ago  # | flag

You make a good point about the use of slots. I've updated the recipe to reflect it.

As for the reference cycles, since 2.2, Python has had a cycle-detecting garbage collector, so these reference cycles aren't a big deal. Also, only mangling the previous pointers ignores the fact that next pointers introduce their own reference cycle. To prove that reference cycles aren't a big deal, the following is a sample interaction using the LRU with Python 2.3 (as Python 2.2 is no longer supported by the PSF).

>>> import gc
>>> import lru
>>> x = lru.LRU(2000, zip(range(1000), range(1000)))
>>> del x
>>> gc.garbage
[]
>>> gc.collect()
3000
>>> gc.garbage
[]
>>>
Created by Josiah Carlson on Wed, 10 Dec 2003 (PSF)
Python recipes (4591)
Josiah Carlson's recipes (9)

Required Modules

  • (none specified)

Other Information and Tasks