One-line decorator call adds caching to functions with hashable arguments and no keyword arguments. When the maximum size is reached, the least recently used entry is discarded -- appropriate for long-running processes which cannot allow caches to grow without bound. Includes built-in performance instrumentation.
| Python |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 | from collections import deque
def lru_cache(maxsize):
'''Decorator applying a least-recently-used cache with the given maximum size.
Arguments to the cached function must be hashable.
Cache performance statistics stored in f.hits and f.misses.
'''
def decorating_function(f):
cache = {} # mapping of args to results
queue = deque() # order that keys have been accessed
refcount = {} # number of times each key is in the access queue
def wrapper(*args):
# localize variable access (ugly but fast)
_cache=cache; _len=len; _refcount=refcount; _maxsize=maxsize
queue_append=queue.append; queue_popleft = queue.popleft
# get cache entry or compute if not found
try:
result = _cache[args]
wrapper.hits += 1
except KeyError:
result = _cache[args] = f(*args)
wrapper.misses += 1
# record that this key was recently accessed
queue_append(args)
_refcount[args] = _refcount.get(args, 0) + 1
# Purge least recently accessed cache contents
while _len(_cache) > _maxsize:
k = queue_popleft()
_refcount[k] -= 1
if not _refcount[k]:
del _cache[k]
del _refcount[k]
# Periodically compact the queue by duplicate keys
if _len(queue) > _maxsize * 4:
for i in [None] * _len(queue):
k = queue_popleft()
if _refcount[k] == 1:
queue_append(k)
else:
_refcount[k] -= 1
assert len(queue) == len(cache) == len(refcount) == sum(refcount.itervalues())
return result
wrapper.__doc__ = f.__doc__
wrapper.__name__ = f.__name__
wrapper.hits = wrapper.misses = 0
return wrapper
return decorating_function
if __name__ == '__main__':
@lru_cache(maxsize=20)
def f(x, y):
return 3*x+y
domain = range(5)
from random import choice
for i in range(1000):
r = f(choice(domain), choice(domain))
print f.hits, f.misses
|
Discussion
Since the cache has its own computational overhead, it is only useful where the underlying calculation or action is somewhat expensive.
The "cache" variable holds the mapping for input arguments to result values. The "queue" records successive accesses so we can which tell was the oldest. The queue can contain duplicates if the same arguments are called frequently. When pulling old items out of the queue, the "refcount" dictionary tracks whether other duplicate entries are still in the queue. An item isn't deleted until its refcount drops to zero. The invariants are len(cache)==len(refcount)==len(set(queue)) and sum(refcount.values)==len(queue).
Use the f.hits and f.misses instrumentation to tune the best setting for "maxsize" giving the best trade-off between hit-rate and space consumed.


Comments
question. why do you say:
instead of:
this implementation appears to be about 50% faster (using -m profile and your __main__ method as the performance test):
Problems with sasa sasa's version. "return cache[key]" needs to be indented.
Every time the memoized function is called _cursize is reset from cursize, so the LRU limit is never reached.
Even if the LRU limit were reached, it isn't really LRU. It gets rid of the the item that was inserted into the cache longest ago; it doesn't notice when the item was USED.
Nice. Since no one has made any positive comments on this recipe, I just want to say that I've read through it and think it's excellent work. I'm going to use it in an internal project.
Just what I was looking for. Thanks!
Instead of wrapper.__doc__ = f.__doc__ and wrapper.__name__ = f.__name__, use functools.wraps.
Sign in to comment