Python implementation of an adaptive replacement cache, as described in:
http://en.wikipedia.org/wiki/Adaptive_Replacement_Cache
Caveat: This may or may not be covered by an IBM patent, so beware of putting it in production code. I implemented it as an experiment, and since I'm in Europe I don't believe a software patent can apply to anything I write.
from collections import deque


class Cache(object):
    """
    >>> dec_cache = Cache(10)
    >>> @dec_cache
    ... def identity(f):
    ...     return f
    >>> dummy = [identity(x) for x in range(20) + range(11, 15) + range(20) +
    ...     range(11, 40) + [39, 38, 37, 36, 35, 34, 33, 32, 16, 17, 11, 41]]
    >>> dec_cache.t1
    deque([(41,)])
    >>> dec_cache.t2
    deque([(11,), (17,), (16,), (32,), (33,), (34,), (35,), (36,), (37,)])
    >>> dec_cache.b1
    deque([(31,), (30,)])
    >>> dec_cache.b2
    deque([(38,), (39,), (19,), (18,), (15,), (14,), (13,), (12,)])
    >>> dec_cache.p
    5
    """
    def __init__(self, size):
        self.cached = {}    # maps args -> result for entries in t1 and t2
        self.c = size       # total cache size
        self.p = 0          # adaptive target size for t1
        self.t1 = deque()   # entries seen exactly once recently
        self.t2 = deque()   # entries seen at least twice recently
        self.b1 = deque()   # ghost keys recently evicted from t1
        self.b2 = deque()   # ghost keys recently evicted from t2

    def replace(self, args):
        """Evict from t1 or t2, remembering the victim as a ghost key."""
        if self.t1 and (
                (args in self.b2 and len(self.t1) == self.p) or
                (len(self.t1) > self.p)):
            old = self.t1.pop()
            self.b1.appendleft(old)
        else:
            old = self.t2.pop()
            self.b2.appendleft(old)
        del self.cached[old]

    def __call__(self, func):
        def wrapper(*orig_args):
            """decorator function wrapper"""
            args = orig_args[:]
            if args in self.t1:
                # hit in t1: promote to the frequent list t2
                self.t1.remove(args)
                self.t2.appendleft(args)
                return self.cached[args]
            if args in self.t2:
                # hit in t2: move to the most-recently-used end
                self.t2.remove(args)
                self.t2.appendleft(args)
                return self.cached[args]
            result = func(*orig_args)
            self.cached[args] = result
            if args in self.b1:
                # ghost hit in b1: favor recency, grow the target for t1
                self.p = min(
                    self.c, self.p + max(len(self.b2) / len(self.b1), 1))
                self.replace(args)
                self.b1.remove(args)
                self.t2.appendleft(args)
                return result
            if args in self.b2:
                # ghost hit in b2: favor frequency, shrink the target for t1
                self.p = max(0, self.p - max(len(self.b1) / len(self.b2), 1))
                self.replace(args)
                self.b2.remove(args)
                self.t2.appendleft(args)
                return result
            # complete miss
            if len(self.t1) + len(self.b1) == self.c:
                # l1 (t1 + b1) is full
                if len(self.t1) < self.c:
                    self.b1.pop()
                    self.replace(args)
                else:
                    # b1 is empty, so t1 alone fills l1: drop its LRU
                    # entry outright (there is no room to remember it in b1)
                    del self.cached[self.t1.pop()]
            else:
                total = len(self.t1) + len(self.b1) + len(
                    self.t2) + len(self.b2)
                if total >= self.c:
                    if total == (2 * self.c):
                        self.b2.pop()
                    self.replace(args)
            self.t1.appendleft(args)
            return result
        return wrapper
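For example, the decorator can memoize an expensive recursive function. The fib function below is a hypothetical usage sketch, not part of the recipe itself:

@Cache(100)
def fib(n):
    # recursive calls go through the decorated wrapper,
    # so intermediate results are cached and reused
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print fib(30)  # -> 832040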
Tags: caching, decorators
Thanks for the nice, clean translation of the spec in the patent application.
As presented, the recipe uses collections.deque(), which is efficient for all of the operations except remove() and __contains__(), which are both O(n).
Instead of collections.deque(), it would be better to use an alternate deque implementation that supports fast searching and removal. This can easily be expressed using collections.OrderedDict(), which appeared in Py2.7 and Py3.1 and has an ASPN recipe for earlier versions. In the recipe above, just substitute Deque for deque and define it as:
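A minimal sketch of such a class, covering just the operations the recipe uses (appendleft(), pop(), remove(), len(), and in tests):

from collections import OrderedDict

class Deque(object):
    'Deque-like container with O(1) searching and removal'
    def __init__(self):
        self.od = OrderedDict()
    def appendleft(self, k):
        # the newest key goes to the end of the OrderedDict
        self.od[k] = None
    def pop(self):
        # evict the oldest key, i.e. the front of the OrderedDict
        return self.od.popitem(last=False)[0]
    def remove(self, k):
        del self.od[k]
    def __len__(self):
        return len(self.od)
    def __contains__(self, k):
        return k in self.od
    def __iter__(self):
        return reversed(self.od)
    def __repr__(self):
        return 'Deque(%r)' % (list(self),)

With this substitution the doctest output reads Deque(...) rather than deque(...), but the cache behavior should be unchanged.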
There is something I can't understand in this algorithm: when t1 is full (and t2, b1, b2 are empty) and we encounter a unique element that is not in t1, the algorithm will just remove the last item from t1 and push the new one onto t1, without populating b1. Is that correct behavior?
I don't think that's true, but I may be missing a subtle bug. The replace function always appends to b1 what it pops from t1:
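    def replace(self, args):
        if self.t1 and (
                (args in self.b2 and len(self.t1) == self.p) or
                (len(self.t1) > self.p)):
            old = self.t1.pop()
            self.b1.appendleft(old)
        else:
            old = self.t2.pop()
            self.b2.appendleft(old)
        del self.cached[old]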
Sorry for my inaccuracy; I mean the situation when we get to this place:
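            if len(self.t1) + len(self.b1) == self.c:
                if len(self.t1) < self.c:
                    self.b1.pop()
                    self.replace(args)
                else:
                    del self.cached[self.t1.pop()]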
And after this:
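            self.t1.appendleft(args)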
So no append to b1, as far as I see.

Ah, right: it has been a while, but I think this code triggers when the adaptive cache is shifting towards growing t2 rather than t1. I still find it strange that t1 is being popped and not b1, though, so that may indeed be a bug; I'll look at the original research paper again.
Ah: it's not a bug, though it could have used a comment, because the code is a little hard to follow there. In the else branch, len(t1) == c and len(b1) == 0, and the cache is trying to grow l2, so l1 (t1 + b1) must be shrunk. Since l1 == t1 in that case, we pop from t1 directly, without adding the victim to the recent-miss history b1, because there is no room for it there.
See the algorithm on page 123 of:
http://www.almaden.ibm.com/cs/people/dmodha/arcfast.pdf
Thank you very much for the explanation!