Welcome, guest | Sign In | My Account | Store | Cart

One-line decorator call adds caching to functions with hashable arguments and no keyword arguments. When the maximum size is reached, the least recently used entry or least frequently used entry is discarded -- appropriate for long-running processes which cannot allow caches to grow without bound. Includes built-in performance instrumentation.

Python, 160 lines
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
import collections
import functools
from itertools import ifilterfalse
from heapq import nsmallest
from operator import itemgetter

class Counter(dict):
    'Mapping where default values are zero'
    def __missing__(self, key):
        return 0

def lru_cache(maxsize=100):
    '''Least-recently-used cache decorator.

    Arguments to the cached function must be hashable.
    Cache performance statistics stored in f.hits and f.misses.
    Clear the cache with f.clear().
    http://en.wikipedia.org/wiki/Cache_algorithms#Least_Recently_Used

    '''
    maxqueue = maxsize * 10
    def decorating_function(user_function,
            len=len, iter=iter, tuple=tuple, sorted=sorted, KeyError=KeyError):
        cache = {}                  # mapping of args to results
        queue = collections.deque() # order that keys have been used
        refcount = Counter()        # times each key is in the queue
        sentinel = object()         # marker for looping around the queue
        kwd_mark = object()         # separate positional and keyword args

        # lookup optimizations (ugly but fast)
        queue_append, queue_popleft = queue.append, queue.popleft
        queue_appendleft, queue_pop = queue.appendleft, queue.pop

        @functools.wraps(user_function)
        def wrapper(*args, **kwds):
            # cache key records both positional and keyword args
            key = args
            if kwds:
                key += (kwd_mark,) + tuple(sorted(kwds.items()))

            # record recent use of this key
            queue_append(key)
            refcount[key] += 1

            # get cache entry or compute if not found
            try:
                result = cache[key]
                wrapper.hits += 1
            except KeyError:
                result = user_function(*args, **kwds)
                cache[key] = result
                wrapper.misses += 1

                # purge least recently used cache entry
                if len(cache) > maxsize:
                    key = queue_popleft()
                    refcount[key] -= 1
                    while refcount[key]:
                        key = queue_popleft()
                        refcount[key] -= 1
                    del cache[key], refcount[key]

            # periodically compact the queue by eliminating duplicate keys
            # while preserving order of most recent access
            if len(queue) > maxqueue:
                refcount.clear()
                queue_appendleft(sentinel)
                for key in ifilterfalse(refcount.__contains__,
                                        iter(queue_pop, sentinel)):
                    queue_appendleft(key)
                    refcount[key] = 1


            return result

        def clear():
            cache.clear()
            queue.clear()
            refcount.clear()
            wrapper.hits = wrapper.misses = 0

        wrapper.hits = wrapper.misses = 0
        wrapper.clear = clear
        return wrapper
    return decorating_function


def lfu_cache(maxsize=100):
    '''Least-frequenty-used cache decorator.

    Arguments to the cached function must be hashable.
    Cache performance statistics stored in f.hits and f.misses.
    Clear the cache with f.clear().
    http://en.wikipedia.org/wiki/Least_Frequently_Used

    '''
    def decorating_function(user_function):
        cache = {}                      # mapping of args to results
        use_count = Counter()           # times each key has been accessed
        kwd_mark = object()             # separate positional and keyword args

        @functools.wraps(user_function)
        def wrapper(*args, **kwds):
            key = args
            if kwds:
                key += (kwd_mark,) + tuple(sorted(kwds.items()))
            use_count[key] += 1

            # get cache entry or compute if not found
            try:
                result = cache[key]
                wrapper.hits += 1
            except KeyError:
                result = user_function(*args, **kwds)
                cache[key] = result
                wrapper.misses += 1

                # purge least frequently used cache entry
                if len(cache) > maxsize:
                    for key, _ in nsmallest(maxsize // 10,
                                            use_count.iteritems(),
                                            key=itemgetter(1)):
                        del cache[key], use_count[key]

            return result

        def clear():
            cache.clear()
            use_count.clear()
            wrapper.hits = wrapper.misses = 0

        wrapper.hits = wrapper.misses = 0
        wrapper.clear = clear
        return wrapper
    return decorating_function


if __name__ == '__main__':

    @lru_cache(maxsize=20)
    def f(x, y):
        return 3*x+y

    domain = range(5)
    from random import choice
    for i in range(1000):
        r = f(choice(domain), choice(domain))

    print(f.hits, f.misses)

    @lfu_cache(maxsize=20)
    def f(x, y):
        return 3*x+y

    domain = range(5)
    from random import choice
    for i in range(1000):
        r = f(choice(domain), choice(domain))

    print(f.hits, f.misses)

An LRU cache limits its size by flushing the least recently used keys. In contrast, an LFU cache flushes the least frequently used keys. Use an LRU cache when recent accesses are the best predictors of upcoming caches -- when the frequency distribution of calls changes over time. Use an LFU cache when the call frequency does not vary over time (i.e. what was the most frequent access last week is also the same this week).

An news server would likely use an LRU cache because the most popular articles change each day. In contrast, an encyclopedia likely has some articles that are always more frequently served than others; accordingly, in would use an LFU cache.

Since the cache has its own computational overhead, it is only useful where the underlying calculation or action is somewhat expensive.

The "cache" variable holds the mapping for input arguments to result values. The "queue" records successive accesses so we can which tell was the oldest. The queue can contain duplicates if the same arguments are called frequently. When pulling old items out of the queue, the "refcount" dictionary tracks whether other duplicate entries are still in the queue. An item isn't deleted until its refcount drops to zero. The invariants are len(cache)==len(refcount)==len(set(queue)) and sum(refcount.values)==len(queue).

Use the f.hits and f.misses instrumentation to tune the best setting for "maxsize" giving the best trade-off between hit-rate and space consumed.

Here's an alternative implementation using OrderedDict from Python 2.7 or 3.1:

import collections
import functools

def lru_cache(maxsize=100):
    '''Least-recently-used cache decorator.

    Arguments to the cached function must be hashable.
    Cache performance statistics stored in f.hits and f.misses.
    http://en.wikipedia.org/wiki/Cache_algorithms#Least_Recently_Used

    '''
    def decorating_function(user_function):
        cache = collections.OrderedDict()    # order: least recent to most recent

        @functools.wraps(user_function)
        def wrapper(*args, **kwds):
            key = args
            if kwds:
                key += tuple(sorted(kwds.items()))
            try:
                result = cache.pop(key)
                wrapper.hits += 1
            except KeyError:
                result = user_function(*args, **kwds)
                wrapper.misses += 1
                if len(cache) >= maxsize:
                    cache.popitem(0)    # purge least recently used cache entry
            cache[key] = result         # record recent use of this key
            return result
        wrapper.hits = wrapper.misses = 0
        return wrapper
    return decorating_function

13 comments

sasa sasa 14 years, 2 months ago  # | flag

question. why do you say:

for i in [None] * _len(queue):

instead of:

for i in xrange(_len(queue)):
sasa sasa 14 years, 2 months ago  # | flag

this implementation appears to be about 50% faster (using -m profile and your __main__ method as the performance test):

import time
import heapq

def memoize(maxsize):
    """decorator to 'memoize' a function - caching its results"""
    def decorating_function(f):
        cache = {}  # map from key to value
        heap = []   # list of keys, in LRU heap
        cursize = 0 # because len() is slow
        def wrapper(*args):
            key = repr(args)
            # performance crap
            _cache=cache
            _heap=heap
            _heappop = heapq.heappop
            _heappush = heapq.heappush
            _time = time.time
            _cursize = cursize
            _maxsize = maxsize
            if not _cache.has_key(key):
                if _cursize == _maxsize:
                    # pop oldest element
                    (_,oldkey) = _heappop(_heap)
                    _cache.pop(oldkey)
                else:
                    _cursize += 1
                # insert this element
                _cache[key] = f(*args)
                _heappush(_heap,(_time(),key))
                wrapper.misses += 1
            else:
                wrapper.hits += 1
        return cache[key]
        wrapper.__doc__ = f.__doc__
        wrapper.__name__ = f.__name__
        wrapper.hits = wrapper.misses = 0
        return wrapper
    return decorating_function
Gary Robinson 13 years, 8 months ago  # | flag

Problems with sasa sasa's version. "return cache[key]" needs to be indented.

Every time the memoized function is called _cursize is reset from cursize, so the LRU limit is never reached.

Even if the LRU limit were reached, it isn't really LRU. It gets rid of the the item that was inserted into the cache longest ago; it doesn't notice when the item was USED.

Gary Robinson 13 years, 8 months ago  # | flag

Nice. Since no one has made any positive comments on this recipe, I just want to say that I've read through it and think it's excellent work. I'm going to use it in an internal project.

Brien Voorhees 12 years, 10 months ago  # | flag

Just what I was looking for. Thanks!

Jason R. Coombs 12 years ago  # | flag

Instead of wrapper.__doc__ = f.__doc__ and wrapper.__name__ = f.__name__, use functools.wraps.

    @functools.wraps(f)
    def wrapper(*args):
        pass #implementation omitted
patate 11 years ago  # | flag

Great work! Anybody ever used this module in production? Is there an updated version somewhere?

Bradley Ayers 10 years, 8 months ago  # | flag

I slightly modified lfu_cache to avoid a problem when everything in the cache has been called at least twice. In this scenario, the new function call will be added to the cache, but will be immediately removed (as it has less hits than everything else).

def lfu_cache(maxsize=100):
    '''Least-frequenty-used cache decorator.

    Arguments to the cached function must be hashable.
    Cache performance statistics stored in f.hits and f.misses.
    Clear the cache with f.clear().
    http://en.wikipedia.org/wiki/Least_Frequently_Used

    '''
    def decorating_function(user_function):
        cache = {}                      # mapping of args to results
        use_count = Counter()           # times each key has been accessed
        kwarg_mark = object()             # separate positional and keyword args

        @functools.wraps(user_function)
        def wrapper(*args, **kwargs):
            key = args
            if kwargs:
                key += (kwarg_mark,) + tuple(sorted(kwargs.items()))

            # get cache entry or compute if not found
            try:
                result = cache[key]
                use_count[key] += 1
                wrapper.hits += 1
            except KeyError:
                # need to add something to the cache, make room if necessary
                if len(cache) == maxsize:
                    for k, _ in nsmallest(maxsize // 10 or 1,
                                            use_count.iteritems(),
                                            key=itemgetter(1)):
                        del cache[k], use_count[k]
                cache[key] = user_function(*args, **kwargs)
                result = cache[key]
                use_count[key] += 1
                wrapper.misses += 1
            return result

        def clear():
            cache.clear()
            use_count.clear()
            wrapper.hits = wrapper.misses = 0

        wrapper.hits = wrapper.misses = 0
        wrapper.clear = clear
        wrapper.cache = cache
        return wrapper
    return decorating_function
Olemis Lang 8 years, 8 months ago  # | flag

Thanks for sharing this code .

We'll be using the original version in Apacheā„¢ Bloodhound . After running our test suite we have added two more things:

  • Added keymap arg to build custom keys out of actual args
  • Keep cache consistency on user function failure (CRITICAL BUG)

Modifications were suggested in this patch ...

https://issues.apache.org/bloodhound/attachment/ticket/440/t440_1456016_product_env_singleton.diff

... and eventually will be initially added in this file

http://svn.apache.org/viewvc/incubator/bloodhound/branches/bep_0003_multiproduct/bloodhound_multiproduct/multiproduct/cache.py?view=markup

if anyone has further comments about all this (code , licensing , ...) , please let us know .

germank 8 years, 6 months ago  # | flag

Hi, First of all, thank you very much for this! I wanted to point that the code has a bug when the cached function throws an exception. In that case, when it later tries to delete the corresponding key (for which there was no stored value) it throws a KeyError exception. My solution to it was just to capture it, but it might not be the best way to go. Perhaps it would be hiding some other error in the logic.

Jonny Galloway 8 years, 3 months ago  # | flag

I was trying out these caching methods: I am implementing something like this and get an error:

<code> class P4client(object): def __init__(self): pass

    def checkFileStats(self, filePath):
        results = 'The filepath: {0}'.format(filePath)
        print results
        return results

p4client = P4client()

filePath = (r"C:\depot\r1dev\content\r1\materialsrc\bpTest\test01\tester.tga")

@lfu_cacheItem            
def p4checkFileStats(filePath):
    '''Will cache the fstats return'''
    p4checkResults = p4client.checkFileStats(filePath)
    return p4checkResults    

p4checkFileStats(filePath)

</code>

and I get this sort of error: AttributeError: 'str' object has no attribute '__module__'

test 6 years, 10 months ago  # | flag

You have an idea how to implement it with threads?

Raymond Hettinger (author) 6 years, 10 months ago  # | flag

You have an idea how to implement it with threads?

Yes. The technique is the same as everything else with threads, use a mutex to make sure that no two threads are updating the LRU/LFU data structure at the same time.

lock = threading.Lock()        # or perhaps an RLock

@lru_cache(128)
def slow_function(x):
      ...


with lock:
    slow_function(10)

Alternatively, you can put the locks inside the lru/lfu code. See the Py3.4 Lib/functools.py source code for worked-out example: https://hg.python.org/cpython/file/463bbd862887/Lib/functools.py#l334