Python implementation of an adaptive replacement cache, as described in:
http://en.wikipedia.org/wiki/Adaptive_Replacement_Cache
Caveat: This may or may not be covered by an IBM patent, so beware of putting it in production code. I implemented it as an experiment, and since I'm in Europe I don't believe a software patent can apply to anything I write.
from collections import deque


class Cache(object):
    """
    >>> dec_cache = Cache(10)
    >>> @dec_cache
    ... def identity(f):
    ...     return f
    >>> dummy = [identity(x) for x in range(20) + range(11, 15) + range(20) +
    ...     range(11, 40) + [39, 38, 37, 36, 35, 34, 33, 32, 16, 17, 11, 41]]
    >>> dec_cache.t1
    deque([(41,)])
    >>> dec_cache.t2
    deque([(11,), (17,), (16,), (32,), (33,), (34,), (35,), (36,), (37,)])
    >>> dec_cache.b1
    deque([(31,), (30,)])
    >>> dec_cache.b2
    deque([(38,), (39,), (19,), (18,), (15,), (14,), (13,), (12,)])
    >>> dec_cache.p
    5
    """
    def __init__(self, size):
        self.cached = {}    # maps args -> result for entries in t1 and t2
        self.c = size       # total cache size
        self.p = 0          # adaptive target size for t1
        self.t1 = deque()   # entries seen exactly once recently
        self.t2 = deque()   # entries seen at least twice recently
        self.b1 = deque()   # ghost keys recently evicted from t1
        self.b2 = deque()   # ghost keys recently evicted from t2

    def replace(self, args):
        """Evict from t1 or t2, remembering the victim as a ghost key."""
        if self.t1 and (
                (args in self.b2 and len(self.t1) == self.p) or
                (len(self.t1) > self.p)):
            old = self.t1.pop()
            self.b1.appendleft(old)
        else:
            old = self.t2.pop()
            self.b2.appendleft(old)
        del self.cached[old]

    def __call__(self, func):
        def wrapper(*orig_args):
            """decorator function wrapper"""
            args = orig_args[:]
            if args in self.t1:
                # hit in t1: promote to the frequent list t2
                self.t1.remove(args)
                self.t2.appendleft(args)
                return self.cached[args]
            if args in self.t2:
                # hit in t2: move to the most-recently-used end
                self.t2.remove(args)
                self.t2.appendleft(args)
                return self.cached[args]
            result = func(*orig_args)
            self.cached[args] = result
            if args in self.b1:
                # ghost hit in b1: favor recency, grow the target for t1
                self.p = min(
                    self.c, self.p + max(len(self.b2) / len(self.b1), 1))
                self.replace(args)
                self.b1.remove(args)
                self.t2.appendleft(args)
                return result
            if args in self.b2:
                # ghost hit in b2: favor frequency, shrink the target for t1
                self.p = max(0, self.p - max(len(self.b1) / len(self.b2), 1))
                self.replace(args)
                self.b2.remove(args)
                self.t2.appendleft(args)
                return result
            # complete miss
            if len(self.t1) + len(self.b1) == self.c:
                # l1 (t1 + b1) is full
                if len(self.t1) < self.c:
                    self.b1.pop()
                    self.replace(args)
                else:
                    # b1 is empty, so t1 alone fills l1: drop its LRU
                    # entry outright (there is no room to remember it in b1)
                    del self.cached[self.t1.pop()]
            else:
                total = len(self.t1) + len(self.b1) + len(
                    self.t2) + len(self.b2)
                if total >= self.c:
                    if total == (2 * self.c):
                        self.b2.pop()
                    self.replace(args)
            self.t1.appendleft(args)
            return result
        return wrapper
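For example, the decorator can memoize an expensive recursive function. The fib function below is a hypothetical usage sketch, not part of the recipe itself:

@Cache(100)
def fib(n):
    # recursive calls go through the decorated wrapper,
    # so intermediate results are cached and reused
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print fib(30)  # -> 832040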
Tags: caching, decorators
Thanks for the nice, clean translation of the spec in the patent application.
As presented, the recipe uses collections.deque(), which is efficient for all of the operations except remove() and __contains__(), which are both O(n).
Instead of collections.deque(), it would be better to use an alternate deque implementation that supports fast searching and removal. This can easily be expressed using collections.OrderedDict(), which appeared in Py2.7 and Py3.1 and has an ASPN recipe for earlier versions. In the recipe above, just substitute Deque for deque and define it as:
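A minimal sketch of such a class, covering just the operations the recipe uses (appendleft(), pop(), remove(), len(), and in tests):

from collections import OrderedDict

class Deque(object):
    'Deque-like container with O(1) searching and removal'
    def __init__(self):
        self.od = OrderedDict()
    def appendleft(self, k):
        # the newest key goes to the end of the OrderedDict
        self.od[k] = None
    def pop(self):
        # evict the oldest key, i.e. the front of the OrderedDict
        return self.od.popitem(last=False)[0]
    def remove(self, k):
        del self.od[k]
    def __len__(self):
        return len(self.od)
    def __contains__(self, k):
        return k in self.od
    def __iter__(self):
        return reversed(self.od)
    def __repr__(self):
        return 'Deque(%r)' % (list(self),)

With this substitution the doctest output reads Deque(...) rather than deque(...), but the cache behavior should be unchanged.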
There is something I can't understand in this algorithm: when t1 is full (and t2, b1, b2 are empty) and we encounter a unique element that is not in t1, the algorithm will just remove the last item from t1 and push the new one onto t1, without populating b1. Is that correct behavior?
I don't think that's true, but I may be missing a subtle bug. The replace function always appends to b1 what it pops from t1:
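    def replace(self, args):
        if self.t1 and (
                (args in self.b2 and len(self.t1) == self.p) or
                (len(self.t1) > self.p)):
            old = self.t1.pop()
            self.b1.appendleft(old)
        else:
            old = self.t2.pop()
            self.b2.appendleft(old)
        del self.cached[old]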
Sorry for my inaccuracy; I mean the situation when we get to this place:
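            if len(self.t1) + len(self.b1) == self.c:
                if len(self.t1) < self.c:
                    self.b1.pop()
                    self.replace(args)
                else:
                    del self.cached[self.t1.pop()]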
And after this:
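            self.t1.appendleft(args)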
So no append to b1, as far as I see.

Ah, right: it has been a while, but I think this code triggers when the adaptive cache is shifting towards growing t2 rather than t1. I still find it strange that t1 is being popped and not b1, though, so that may indeed be a bug; I'll look at the original research paper again.
Ah: it's not a bug, though it could have used a comment, because the code is a little hard to follow there. In the else branch, len(t1) == c and len(b1) == 0, and the cache is trying to grow l2, so l1 (t1 + b1) must be shrunk. Since l1 == t1 in that case, we pop from t1 directly, without adding the victim to the recent-miss history b1, because there is no room for it there.
See the algorithm on page 123 of:
http://www.almaden.ibm.com/cs/people/dmodha/arcfast.pdf
Thank you very much for the explanation!