A one-line decorator call adds caching to functions whose arguments are hashable. When the maximum cache size is reached, the least recently used (or least frequently used) entry is discarded, which makes the decorators appropriate for long-running processes that cannot allow their caches to grow without bound. Built-in instrumentation records cache hits and misses.
import collections
import functools
from itertools import ifilterfalse
from heapq import nsmallest
from operator import itemgetter

class Counter(dict):
    'Mapping where default values are zero'
    def __missing__(self, key):
        return 0

def lru_cache(maxsize=100):
    '''Least-recently-used cache decorator.

    Arguments to the cached function must be hashable.
    Cache performance statistics stored in f.hits and f.misses.
    Clear the cache with f.clear().
    http://en.wikipedia.org/wiki/Cache_algorithms#Least_Recently_Used

    '''
    maxqueue = maxsize * 10
    def decorating_function(user_function,
            len=len, iter=iter, tuple=tuple, sorted=sorted, KeyError=KeyError):
        cache = {}                   # mapping of args to results
        queue = collections.deque()  # order that keys have been used
        refcount = Counter()         # times each key is in the queue
        sentinel = object()          # marker for looping around the queue
        kwd_mark = object()          # separate positional and keyword args

        # lookup optimizations (ugly but fast)
        queue_append, queue_popleft = queue.append, queue.popleft
        queue_appendleft, queue_pop = queue.appendleft, queue.pop

        @functools.wraps(user_function)
        def wrapper(*args, **kwds):
            # cache key records both positional and keyword args
            key = args
            if kwds:
                key += (kwd_mark,) + tuple(sorted(kwds.items()))

            # record recent use of this key
            queue_append(key)
            refcount[key] += 1

            # get cache entry or compute if not found
            try:
                result = cache[key]
                wrapper.hits += 1
            except KeyError:
                result = user_function(*args, **kwds)
                cache[key] = result
                wrapper.misses += 1

                # purge least recently used cache entry
                if len(cache) > maxsize:
                    key = queue_popleft()
                    refcount[key] -= 1
                    while refcount[key]:
                        key = queue_popleft()
                        refcount[key] -= 1
                    del cache[key], refcount[key]

            # periodically compact the queue by eliminating duplicate keys
            # while preserving order of most recent access
            if len(queue) > maxqueue:
                refcount.clear()
                queue_appendleft(sentinel)
                for key in ifilterfalse(refcount.__contains__,
                                        iter(queue_pop, sentinel)):
                    queue_appendleft(key)
                    refcount[key] = 1

            return result

        def clear():
            cache.clear()
            queue.clear()
            refcount.clear()
            wrapper.hits = wrapper.misses = 0

        wrapper.hits = wrapper.misses = 0
        wrapper.clear = clear
        return wrapper
    return decorating_function

def lfu_cache(maxsize=100):
    '''Least-frequently-used cache decorator.

    Arguments to the cached function must be hashable.
    Cache performance statistics stored in f.hits and f.misses.
    Clear the cache with f.clear().
    http://en.wikipedia.org/wiki/Least_Frequently_Used

    '''
    def decorating_function(user_function):
        cache = {}             # mapping of args to results
        use_count = Counter()  # times each key has been accessed
        kwd_mark = object()    # separate positional and keyword args

        @functools.wraps(user_function)
        def wrapper(*args, **kwds):
            key = args
            if kwds:
                key += (kwd_mark,) + tuple(sorted(kwds.items()))
            use_count[key] += 1

            # get cache entry or compute if not found
            try:
                result = cache[key]
                wrapper.hits += 1
            except KeyError:
                result = user_function(*args, **kwds)
                cache[key] = result
                wrapper.misses += 1

                # purge least frequently used cache entry
                if len(cache) > maxsize:
                    for key, _ in nsmallest(maxsize // 10,
                                            use_count.iteritems(),
                                            key=itemgetter(1)):
                        del cache[key], use_count[key]

            return result

        def clear():
            cache.clear()
            use_count.clear()
            wrapper.hits = wrapper.misses = 0

        wrapper.hits = wrapper.misses = 0
        wrapper.clear = clear
        return wrapper
    return decorating_function

if __name__ == '__main__':

    @lru_cache(maxsize=20)
    def f(x, y):
        return 3*x+y

    domain = range(5)
    from random import choice
    for i in range(1000):
        r = f(choice(domain), choice(domain))

    print(f.hits, f.misses)

    @lfu_cache(maxsize=20)
    def f(x, y):
        return 3*x+y

    domain = range(5)
    from random import choice
    for i in range(1000):
        r = f(choice(domain), choice(domain))

    print(f.hits, f.misses)
An LRU cache limits its size by flushing the least recently used keys. In contrast, an LFU cache flushes the least frequently used keys. Use an LRU cache when recent accesses are the best predictors of upcoming accesses -- that is, when the frequency distribution of calls changes over time. Use an LFU cache when the call frequency does not vary over time (i.e., the keys accessed most often last week are also the ones accessed most often this week).
A news server would likely use an LRU cache because the most popular articles change each day. In contrast, an encyclopedia likely has some articles that are always served more frequently than others; accordingly, it would use an LFU cache.
Since the cache has its own computational overhead, it is only useful where the underlying calculation or action is somewhat expensive.
The "cache" variable holds the mapping for input arguments to result values. The "queue" records successive accesses so we can which tell was the oldest. The queue can contain duplicates if the same arguments are called frequently. When pulling old items out of the queue, the "refcount" dictionary tracks whether other duplicate entries are still in the queue. An item isn't deleted until its refcount drops to zero. The invariants are len(cache)==len(refcount)==len(set(queue)) and sum(refcount.values)==len(queue).
Use the f.hits and f.misses instrumentation to tune "maxsize" to the best trade-off between hit rate and space consumed.
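For example, assuming lru_cache from the recipe above is in scope, the hit rate for a candidate maxsize can be read straight off the two counters (a small sketch; the function and workload are made up):

@lru_cache(maxsize=50)          # try several maxsize values and compare
def expensive(n):
    return sum(i * i for i in range(n))

for n in [10, 20, 10, 30, 10, 20]:
    expensive(n)

total = expensive.hits + expensive.misses
print("hit rate: %.1f%% (%d hits, %d misses)"
      % (100.0 * expensive.hits / total, expensive.hits, expensive.misses))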
Here's an alternative implementation using OrderedDict from Python 2.7 or 3.1:
import collections
import functools

def lru_cache(maxsize=100):
    '''Least-recently-used cache decorator.

    Arguments to the cached function must be hashable.
    Cache performance statistics stored in f.hits and f.misses.
    http://en.wikipedia.org/wiki/Cache_algorithms#Least_Recently_Used

    '''
    def decorating_function(user_function):
        cache = collections.OrderedDict()    # order: least recent to most recent

        @functools.wraps(user_function)
        def wrapper(*args, **kwds):
            key = args
            if kwds:
                key += tuple(sorted(kwds.items()))
            try:
                result = cache.pop(key)
                wrapper.hits += 1
            except KeyError:
                result = user_function(*args, **kwds)
                wrapper.misses += 1
                if len(cache) >= maxsize:
                    cache.popitem(0)          # purge least recently used cache entry
            cache[key] = result               # record recent use of this key
            return result
        wrapper.hits = wrapper.misses = 0
        return wrapper
    return decorating_function
Question: why do you say:
instead of:
This implementation appears to be about 50% faster (using -m profile and your __main__ method as the performance test).
Problems with sasa sasa's version. "return cache[key]" needs to be indented.
Every time the memoized function is called _cursize is reset from cursize, so the LRU limit is never reached.
Even if the LRU limit were reached, it isn't really LRU: it gets rid of the item that was inserted into the cache longest ago; it doesn't notice when the item was last used.
Nice. Since no one has made any positive comments on this recipe, I just want to say that I've read through it and think it's excellent work. I'm going to use it in an internal project.
Just what I was looking for. Thanks!
Instead of wrapper.__doc__ = f.__doc__ and wrapper.__name__ = f.__name__, use functools.wraps.
Great work! Anybody ever used this module in production? Is there an updated version somewhere?
I slightly modified lfu_cache to avoid a problem when everything in the cache has been called at least twice. In this scenario, a new function call will be added to the cache but immediately removed, because it has fewer hits than everything else.
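One way to do that (a sketch of the idea only, not the commenter's actual patch; the helper name and signature are invented for illustration) is to exclude the key that was just inserted from the purge candidates:

from heapq import nsmallest
from operator import itemgetter

def purge_lfu(cache, use_count, maxsize, current_key):
    '''Evict the least frequently used entries, but never the key that was
    just inserted, so a brand-new entry cannot be discarded by the very call
    that added it.  (Hypothetical helper; in the recipe this logic would sit
    inline in wrapper(), operating on its closure variables.)'''
    if len(cache) > maxsize:
        candidates = ((k, n) for k, n in use_count.items() if k != current_key)
        for k, _ in nsmallest(max(1, maxsize // 10), candidates, key=itemgetter(1)):
            del cache[k], use_count[k]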
Thanks for sharing this code.
We'll be using the original version in Apache™ Bloodhound. After running our test suite we have added two more things:
Modifications were suggested in this patch ...
https://issues.apache.org/bloodhound/attachment/ticket/440/t440_1456016_product_env_singleton.diff
... and will initially be added in this file:
http://svn.apache.org/viewvc/incubator/bloodhound/branches/bep_0003_multiproduct/bloodhound_multiproduct/multiproduct/cache.py?view=markup
If anyone has further comments about any of this (code, licensing, ...), please let us know.
Hi, first of all, thank you very much for this! I wanted to point out that the code has a bug when the cached function throws an exception. In that case, when the cache later tries to delete the corresponding key (for which no value was ever stored), it throws a KeyError. My solution was just to capture it, but that might not be the best way to go; perhaps it would be hiding some other error in the logic.
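For reference, here is a minimal sketch of the capture-the-error approach the commenter describes (the helper is hypothetical and the exact placement of the try/except is my guess, not the fix actually used):

def purge_oldest(cache, queue, refcount):
    '''Drop the least recently used key, tolerating keys that were enqueued by
    a call that raised and therefore never stored a result.  (Hypothetical
    helper; in the recipe this logic lives inline in wrapper().)'''
    key = queue.popleft()
    refcount[key] -= 1
    while refcount[key]:
        key = queue.popleft()
        refcount[key] -= 1
    try:
        del cache[key]
    except KeyError:
        pass               # no result was ever stored for this key
    del refcount[key]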
I was trying out these caching methods. I am implementing something like this:

class P4client(object):
    def __init__(self):
        pass

and I get this sort of error: AttributeError: 'str' object has no attribute '__module__'
Do you have an idea how to implement it with threads?
Yes. The technique is the same as for everything else with threads: use a mutex to make sure that no two threads are updating the LRU/LFU data structures at the same time.
Alternatively, you can put the locks inside the lru/lfu code. See the Py3.4 Lib/functools.py source code for a worked-out example: https://hg.python.org/cpython/file/463bbd862887/Lib/functools.py#l334
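For completeness, here is a minimal sketch of the mutex approach (my own illustration, not code from the recipe): wrap the already-decorated function in a lock so only one thread at a time touches the cache bookkeeping.

import threading
import functools

def synchronized(func):
    '''Serialize all calls to a cache-decorated function with a re-entrant
    lock.  (Coarse-grained illustration: it also serializes cache misses.)'''
    lock = threading.RLock()
    @functools.wraps(func)
    def locked(*args, **kwds):
        with lock:
            return func(*args, **kwds)
    return locked

# Usage sketch, assuming lru_cache from the recipe above is in scope:
#
#     @synchronized
#     @lru_cache(maxsize=100)
#     def f(x, y):
#         return 3 * x + y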