Length-limited O(1) LRU Cache implementation

Everyone knows about LRU (least recently used) caches. Here is an implementation that takes O(1) time to insert, update, or delete an entry.

class Node(object):
    __slots__ = ['prev', 'next', 'me']
    def __init__(self, prev, me):
        self.prev = prev
        self.me = me
        self.next = None

class LRU:
    """
    Implementation of a length-limited O(1) LRU queue.
    Built for and used by PyPE:
    http://pype.sourceforge.net
    Copyright 2003 Josiah Carlson.
    """
    def __init__(self, count, pairs=()):
        self.count = max(count, 1)
        self.d = {}
        self.first = None
        self.last = None
        for key, value in pairs:
            self[key] = value
    def __contains__(self, obj):
        return obj in self.d
    def __getitem__(self, obj):
        a = self.d[obj].me
        self[a[0]] = a[1]
        return a[1]
    def __setitem__(self, obj, val):
        if obj in self.d:
            del self[obj]
        nobj = Node(self.last, (obj, val))
        if self.first is None:
            self.first = nobj
        if self.last:
            self.last.next = nobj
        self.last = nobj
        self.d[obj] = nobj
        if len(self.d) > self.count:
            # Over the limit: unlink the oldest node (the head) and forget its key.
            # count >= 1, so at least two nodes exist here and a.next is not None.
            a = self.first
            a.next.prev = None
            self.first = a.next
            a.next = None
            del self.d[a.me[0]]
    def __delitem__(self, obj):
        nobj = self.d[obj]
        if nobj.prev:
            nobj.prev.next = nobj.next
        else:
            self.first = nobj.next
        if nobj.next:
            nobj.next.prev = nobj.prev
        else:
            self.last = nobj.prev
        del self.d[obj]
    def __iter__(self):
        cur = self.first
        while cur is not None:
            cur2 = cur.next
            yield cur.me[1]
            cur = cur2
    def iteritems(self):
        cur = self.first
        while cur is not None:
            cur2 = cur.next
            yield cur.me
            cur = cur2
    def iterkeys(self):
        return iter(self.d)
    def itervalues(self):
        for i,j in self.iteritems():
            yield j
    def keys(self):
        return self.d.keys()

      

I had not seen anything of the sort during my undergraduate years, so when I first wrote this in the summer of 2002 I was enthusiastic about its usefulness for database caching or even operating system paging. Unfortunately, no one seems interested in it, which suggests that it has been done before. As a result, I'm offering it in the Python Cookbook before releasing a variant in my own Python editor, PyPE (where it keeps information about the 128 most recently opened documents), available at http://pype.sourceforge.net .

Use and enjoy.

Edit: Oh wow, you can edit them when there's a bug? Wow. Fixed the bugs.

Tags: algorithms

11 comments

Fazal Majid 20 years, 4 months ago

Not quite O(1) or LRU. You basically maintain a dictionary and a linked list. Dictionary accesses are not O(1). I believe the Python implementation uses a hash table, which has O(N) worst case behavior. The best you can hope for in a dictionary is O(log(N)).

Furthermore, your code is not correct and does not maintain LRU semantics, as your linked list is not updated to move the most recently accessed node to the end of the list when __getitem__ is called. You are actually implementing a FIFO.

If you want to have good complexity behavior in a cache, you will have to use a priority queue algorithm like binomial heaps. I haven't tried the new Priority Queue class in Python 2.3 yet, but I have used the one at http://www.csse.monash.edu.au/hons/projects/1999/Andrew.Snare/ to good effect.
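
The FIFO-versus-LRU distinction raised in this comment can be illustrated with a small sketch. It assumes Python 3 and uses collections.OrderedDict in place of the recipe's linked list; the function run, its touch_on_read flag, and the access trace are hypothetical names chosen for the illustration. The only difference between the two policies is whether a read moves the key to the "most recent" end.

```python
from collections import OrderedDict


def run(accesses, capacity, touch_on_read):
    """Replay (op, key) pairs and return the keys left in the cache."""
    cache = OrderedDict()
    for op, key in accesses:
        if op == "get":
            # LRU refreshes a key on every read; FIFO ignores reads.
            if key in cache and touch_on_read:
                cache.move_to_end(key)
        else:  # "set"
            cache[key] = True
            cache.move_to_end(key)
            if len(cache) > capacity:
                cache.popitem(last=False)   # drop the entry at the old end
    return list(cache)


trace = [("set", "a"), ("set", "b"), ("get", "a"), ("set", "c")]
fifo_keys = run(trace, 2, touch_on_read=False)
lru_keys = run(trace, 2, touch_on_read=True)
print(fifo_keys)  # ['b', 'c']: "a" was inserted first, so it is evicted
print(lru_keys)   # ['a', 'c']: reading "a" kept it alive, so "b" is evicted
```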

Michael Hudson 20 years, 4 months ago

Over-pedantry. While saying "the best you can hope for from a dict is O(log n)" may be pedantically correct (and I'm not even sure about that), in practice assuming dicts have O(1) access isn't going to get you into hot water.

Python's hash functions are pretty good.
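
Both sides of this exchange can be made concrete with a rough sketch, assuming CPython 3. The Key class and comparisons_for_lookup function are hypothetical helpers written for this illustration. The sketch counts how many equality comparisons one dictionary lookup triggers, first with ordinary hashes and then with every key deliberately forced into the same hash bucket.

```python
class Key:
    """Hypothetical key type whose hash can be forced to collide."""
    eq_calls = 0

    def __init__(self, value, collide):
        self.value = value
        self.collide = collide

    def __hash__(self):
        # With collide=True every key hashes to 0, the pathological case.
        return 0 if self.collide else hash(self.value)

    def __eq__(self, other):
        Key.eq_calls += 1
        return self.value == other.value


def comparisons_for_lookup(n, collide):
    """Build a dict of n keys and count __eq__ calls for a single lookup."""
    table = {Key(i, collide): i for i in range(n)}
    Key.eq_calls = 0                          # ignore the cost of building
    assert table[Key(n - 1, collide)] == n - 1
    return Key.eq_calls


good = comparisons_for_lookup(500, collide=False)
bad = comparisons_for_lookup(500, collide=True)
print("comparisons with ordinary hashes:", good)
print("comparisons with colliding hashes:", bad)
```

With well-distributed hashes the lookup needs only a handful of comparisons regardless of table size, while the all-colliding case degrades toward a linear scan. That is the worst case the first comment describes and the practical case the reply relies on.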