
An O(1) LRU cache. Short, sweet, and fast. No bells and whistles.

Python, 44 lines
def cache_generator(original_function, maxsize):
    mapping = {}                  # key -> link
    mapping_get = mapping.get     # pre-bound method saves a lookup per call
    root = [None, None]           # sentinel of a circular doubly linked list
    root[:] = [root, root]
    value = None
    size = 0

    PREV, NEXT, KEY, VALUE = 0, 1, 2, 3
    while 1:
        key = yield value
        link = mapping_get(key, root)
        if link is root:          # cache miss
            value = original_function(key)
            if size < maxsize:
                size += 1
            else:                 # full: evict the least recently used link
                old_prev, old_next, old_key, old_value = root[NEXT]
                root[NEXT] = old_next
                old_next[PREV] = root
                del mapping[old_key]
            last = root[PREV]
            link = [last, root, key, value]
            mapping[key] = last[NEXT] = root[PREV] = link
        else:                     # cache hit: move link to the MRU position
            link_prev, link_next, key, value = link
            link_prev[NEXT] = link_next
            link_next[PREV] = link_prev
            last = root[PREV]
            last[NEXT] = root[PREV] = link
            link[PREV] = last
            link[NEXT] = root

def make_cache(original_function, maxsize=100):
    'Create a cache around a function that takes a single argument'
    c = cache_generator(original_function, maxsize)
    next(c)
    return c.send


if __name__ == '__main__':
    p = make_cache(ord, maxsize=3)
    for c in 'abcdecaeaa':
        print(c, p(c))

Simplified and highly optimized from the version that was added to the standard library.

The main optimization is to simplify the functionality (single argument only, no keyword arguments, no tracking of the hit/miss ratio, and no clear() method).

The next major optimization was to inline the relevant code from Python's OrderedDict implementation.

Another optimization is to use only one dictionary (the OrderedDict version uses two). It points at links organized as [PREV, NEXT, KEY, VALUE].
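The link layout can be seen in isolation in a small sketch (append_link is an illustrative name, not part of the recipe): each cell is a plain four-element list, and a sentinel root links to itself when the list is empty, so insertion and removal never need special cases.

```python
# Sketch of the link layout used above: a circular doubly linked list
# whose cells are plain 4-element lists [PREV, NEXT, KEY, VALUE].
PREV, NEXT, KEY, VALUE = 0, 1, 2, 3

# The sentinel root links to itself when the list is empty.
root = [None, None, None, None]
root[:] = [root, root, None, None]

def append_link(key, value):
    """Insert a new cell just before root (the most-recently-used slot)."""
    last = root[PREV]
    link = [last, root, key, value]
    last[NEXT] = root[PREV] = link
    return link

for k, v in [('a', 97), ('b', 98), ('c', 99)]:
    append_link(k, v)

# Walk forward from root: least recently used entry first.
link = root[NEXT]
order = []
while link is not root:
    order.append(link[KEY])
    link = link[NEXT]
print(order)  # ['a', 'b', 'c']
```

The one dictionary then maps each key straight to its cell, so a hit finds, unlinks, and relinks the cell in O(1).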

An interesting optimization is to roll all of the code into a generator and make calls using send(). This allows all of the state variables to be local variables, and it saves the cost of creating a new stack frame on every call. Besides making the code very fast, it also simplifies the code quite a bit (34 lines of dirt simple Python instructions operating on lists and dictionaries).
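The send() trick itself can be shown in isolation with a toy example (counter_gen and make_counter are illustrative names, not part of the recipe): the state lives in the generator's local variables, and callers invoke the bound send method directly, so no new Python stack frame is created per call.

```python
# Minimal illustration of the generator/send() pattern: state kept in
# the locals of a single long-running generator frame.
def counter_gen():
    total = 0          # lives in the generator frame; no attribute lookups
    value = None
    while True:
        n = yield value
        total += n
        value = total

def make_counter():
    g = counter_gen()
    next(g)            # prime the generator to its first yield
    return g.send      # hand out the bound send method as the callable

count = make_counter()
print(count(5))   # 5
print(count(3))   # 8
```

This is exactly the shape of make_cache below: prime with next(), return c.send.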

The instructions were selected to be among Python's fastest (for example, the eval-loop has a fast path for a binary subscript with an integer index into a list, and a similar optimization for integer comparisons).
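An unscientific way to get a feel for this is a quick timeit comparison of an integer-indexed list subscript against an equivalent dict lookup; absolute numbers and even the relative ordering vary by interpreter and version, so no claim is made about which wins on your machine.

```python
# Rough microbenchmark: link[3] (list, int index) vs cell['value'] (dict).
import timeit

link = [None, None, 'key', 'value']
cell = {'prev': None, 'next': None, 'key': 'key', 'value': 'value'}

t_list = timeit.timeit('link[3]', globals={'link': link}, number=1_000_000)
t_dict = timeit.timeit("cell['value']", globals={'cell': cell}, number=1_000_000)
print('list[int]: %.3fs' % t_list)
print("dict[str]: %.3fs" % t_dict)
```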

The code runs well on PyPy, Jython, and any version of CPython at or after Python 2.5 (on 2.5 itself, call c.next() in place of next(c), since the next() builtin only arrived in 2.6).

4 comments

Gunnlaugur Þór Briem 12 years, 4 months ago

Neat! maxsize is off-by-one though; size starts at 0 so if maxsize == 1, then size > maxsize is still False when one entry has been cached, so two entries get cached. Example:

def fnord(word):
    print 'called fnord(%r)' % word
    return ord(word)
p = make_cache(fnord, maxsize=2)
for c in 'abcabc':
    print(c, p(c))

... outputs:

called fnord('a')
('a', 97)
called fnord('b')
('b', 98)
called fnord('c')
('c', 99)
('a', 97)
('b', 98)
('c', 99)

so three entries are stored.

Sven Marnach 12 years, 3 months ago

Nice recipe. You could have items deleted from the cache garbage collected earlier if you replace line 21 (the `del mapping[old_key]` line) with

del mapping[old_key], old_key, old_value

Otherwise, the references in old_key and old_value will keep the deleted value alive.

Petr Viktorin 12 years, 3 months ago

Unfortunately, it's not reentrant, so it's not very useful for memoization.

def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

fib = make_cache(fib)

print([fib(n) for n in range(16)])

raises ValueError: generator already executing.
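One reentrant alternative is to write the same dict-plus-linked-list logic as an ordinary closure, so recursive calls each get their own frame instead of re-entering a running generator. This is a sketch, not the recipe (make_cache_reentrant is an illustrative name), and it gives up the per-call frame savings that motivated the generator in the first place:

```python
# Reentrant variant: same data structures, plain function instead of send().
def make_cache_reentrant(original_function, maxsize=100):
    PREV, NEXT, KEY, VALUE = 0, 1, 2, 3
    mapping = {}                        # key -> link
    root = [None, None, None, None]     # sentinel of the circular list
    root[:] = [root, root, None, None]

    def call(key):
        link = mapping.get(key)
        if link is None:                # miss
            value = original_function(key)  # may recurse into call() safely
            if len(mapping) >= maxsize:     # full: evict least recently used
                old = root[NEXT]
                root[NEXT] = old[NEXT]
                old[NEXT][PREV] = root
                del mapping[old[KEY]]
            last = root[PREV]
            link = [last, root, key, value]
            mapping[key] = last[NEXT] = root[PREV] = link
            return value
        # Hit: unlink the cell and move it to the most-recently-used slot.
        link_prev, link_next, _, value = link
        link_prev[NEXT] = link_next
        link_next[PREV] = link_prev
        last = root[PREV]
        last[NEXT] = root[PREV] = link
        link[PREV], link[NEXT] = last, root
        return value

    return call

def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

fib = make_cache_reentrant(fib)
print([fib(n) for n in range(16)])  # [0, 1, 1, 2, ..., 610]
```

Petr's fib example now works, because each recursive call runs in its own frame.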

Bernardo 11 years, 9 months ago

Oh, this code has a very big problem! Once an exception occurs in the cached function, the generator terminates, so every subsequent call raises StopIteration.

Possible solution: wrap the call in a try...except block, yield a (value, exception) pair, and re-raise the exception in the wrapper when it is not None.
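Bernardo's fix can be sketched as follows; to keep it short, this shows only the exception plumbing and omits the LRU logic (exception_demo_gen, make_cache_safe, and guarded are illustrative names):

```python
# Trap exceptions inside the generator so one failing call does not kill
# it, and re-raise them in a thin wrapper around send().
def exception_demo_gen(original_function):
    result = None
    while True:
        key = yield result
        try:
            result = (original_function(key), None)
        except Exception as exc:
            result = (None, exc)   # generator stays alive; report the error

def make_cache_safe(original_function):
    g = exception_demo_gen(original_function)
    next(g)
    def guarded(key):
        value, exc = g.send(key)
        if exc is not None:
            raise exc              # re-raise in the caller's frame
        return value
    return guarded

p = make_cache_safe(ord)
print(p('a'))                      # 97
try:
    p('no')                        # ord() rejects multi-character strings
except TypeError:
    pass
print(p('b'))                      # 98 -- the generator is still alive
```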

Created by Raymond Hettinger on Wed, 30 Nov 2011 (MIT)
