ActiveState Code

Recipe 498245: LRU cache decorator


One-line decorator call adds caching to functions with hashable arguments and no keyword arguments. When the maximum size is reached, the least recently used entry is discarded -- appropriate for long-running processes which cannot allow caches to grow without bound. Includes built-in performance instrumentation.

Python
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
from collections import deque

def lru_cache(maxsize):
    '''Decorator applying a least-recently-used cache with the given maximum size.

    Arguments to the cached function must be hashable.
    Cache performance statistics stored in f.hits and f.misses.
    '''
    def decorating_function(f):
        cache = {}              # mapping of args to results
        queue = deque()         # order that keys have been accessed
        refcount = {}           # number of times each key is in the access queue
        def wrapper(*args):
            
            # localize variable access (ugly but fast)
            _cache=cache; _len=len; _refcount=refcount; _maxsize=maxsize
            queue_append=queue.append; queue_popleft = queue.popleft

            # get cache entry or compute if not found
            try:
                result = _cache[args]
                wrapper.hits += 1
            except KeyError:
                result = _cache[args] = f(*args)
                wrapper.misses += 1

            # record that this key was recently accessed
            queue_append(args)
            _refcount[args] = _refcount.get(args, 0) + 1

            # Purge least recently accessed cache contents
            while _len(_cache) > _maxsize:
                k = queue_popleft()
                _refcount[k] -= 1
                if not _refcount[k]:
                    del _cache[k]
                    del _refcount[k]
    
            # Periodically compact the queue by duplicate keys
            if _len(queue) > _maxsize * 4:
                for i in [None] * _len(queue):
                    k = queue_popleft()
                    if _refcount[k] == 1:
                        queue_append(k)
                    else:
                        _refcount[k] -= 1
                assert len(queue) == len(cache) == len(refcount) == sum(refcount.itervalues())

            return result
        wrapper.__doc__ = f.__doc__
        wrapper.__name__ = f.__name__
        wrapper.hits = wrapper.misses = 0
        return wrapper
    return decorating_function


if __name__ == '__main__':
         
    @lru_cache(maxsize=20)
    def f(x, y):
        return 3*x+y

    domain = range(5)
    from random import choice
    for i in range(1000):
        r = f(choice(domain), choice(domain))

    print f.hits, f.misses

Discussion

Since the cache has its own computational overhead, it is only useful where the underlying calculation or action is somewhat expensive.

The "cache" variable holds the mapping for input arguments to result values. The "queue" records successive accesses so we can which tell was the oldest. The queue can contain duplicates if the same arguments are called frequently. When pulling old items out of the queue, the "refcount" dictionary tracks whether other duplicate entries are still in the queue. An item isn't deleted until its refcount drops to zero. The invariants are len(cache)==len(refcount)==len(set(queue)) and sum(refcount.values)==len(queue).

Use the f.hits and f.misses instrumentation to tune the best setting for "maxsize" giving the best trade-off between hit-rate and space consumed.

Comments

  1. 1. At 1:07 p.m. on 2 oct 2007, sasa sasa said:

    question. why do you say:

    for i in [None] * _len(queue):
    

    instead of:

    for i in xrange(_len(queue)):
    
  2. 2. At 12:55 p.m. on 3 oct 2007, sasa sasa said:

    this implementation appears to be about 50% faster (using -m profile and your __main__ method as the performance test):

    import time
    import heapq
    
    def memoize(maxsize):
        """decorator to 'memoize' a function - caching its results"""
        def decorating_function(f):
            cache = {}  # map from key to value
            heap = []   # list of keys, in LRU heap
            cursize = 0 # because len() is slow
            def wrapper(*args):
                key = repr(args)
                # performance crap
                _cache=cache
                _heap=heap
                _heappop = heapq.heappop
                _heappush = heapq.heappush
                _time = time.time
                _cursize = cursize
                _maxsize = maxsize
                if not _cache.has_key(key):
                    if _cursize == _maxsize:
                        # pop oldest element
                        (_,oldkey) = _heappop(_heap)
                        _cache.pop(oldkey)
                    else:
                        _cursize += 1
                    # insert this element
                    _cache[key] = f(*args)
                    _heappush(_heap,(_time(),key))
                    wrapper.misses += 1
                else:
                    wrapper.hits += 1
            return cache[key]
            wrapper.__doc__ = f.__doc__
            wrapper.__name__ = f.__name__
            wrapper.hits = wrapper.misses = 0
            return wrapper
        return decorating_function
    
  3. 3. At 12:15 p.m. on 11 apr 2008, Gary Robinson said:

    Problems with sasa sasa's version. "return cache[key]" needs to be indented.

    Every time the memoized function is called _cursize is reset from cursize, so the LRU limit is never reached.

    Even if the LRU limit were reached, it isn't really LRU. It gets rid of the the item that was inserted into the cache longest ago; it doesn't notice when the item was USED.

  4. 4. At 10:34 a.m. on 12 apr 2008, Gary Robinson said:

    Nice. Since no one has made any positive comments on this recipe, I just want to say that I've read through it and think it's excellent work. I'm going to use it in an internal project.

  5. 5. At 2:55 p.m. on 4 feb 2009, Brien Voorhees said:

    Just what I was looking for. Thanks!

  6. 6. At 10:56 a.m. on 9 nov 2009, Jason R. Coombs said:

    Instead of wrapper.__doc__ = f.__doc__ and wrapper.__name__ = f.__name__, use functools.wraps.

        @functools.wraps(f)
        def wrapper(*args):
            pass #implementation omitted
    

Sign in to comment