Welcome, guest | Sign In | My Account | Store | Cart

For functions which are called often, particulary recursive functions or functions which are intensive to calculate, memoizing (cacheing) the return values can dramatically improve performance.

Python, 52 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
# Functions can be memoised "by hand" using a dictionary to hold
# the return values when they are calculated:

# Here is a simple case, using the recursive fibonnaci function
#     f(n) = f(n-1) + f(n-2)

fib_memo = {}
def fib(n):
    if n < 2: return 1
    if not fib_memo.has_key(n):
        fib_memo[n] = fib(n-1) + fib(n-2)
    return fib_memo[n]

# To encapsulate this in a class, use the Memoize class:

class Memoize:
    """Memoize(fn) - an instance which acts like fn but memoizes its arguments
       Will only work on functions with non-mutable arguments
    """
    def __init__(self, fn):
        self.fn = fn
        self.memo = {}
    def __call__(self, *args):
        if not self.memo.has_key(args):
            self.memo[args] = self.fn(*args)
        return self.memo[args]

# And here is how to use this class to memoize fib(). Note that the definition
# for fib() is now the "obvious" one, without the cacheing code obscuring
# the algorithm.
def fib(n):
    if n < 2: return 1
    return fib(n-1) + fib(n-2)

fib = Memoize(fib)

# For functions taking mutable arguments, use the cPickle module, as
# in class MemoizeMutable:

class MemoizeMutable:
    """Memoize(fn) - an instance which acts like fn but memoizes its arguments
       Will work on functions with mutable arguments (slower than Memoize)
    """
    def __init__(self, fn):
        self.fn = fn
        self.memo = {}
    def __call__(self, *args):
        import cPickle
        str = cPickle.dumps(args)
        if not self.memo.has_key(str):
            self.memo[str] = self.fn(*args)
        return self.memo[str]

Note that when using the Memoize class, it is important that the value of fib is replaced by the memoized version. Storing the memoized version elsewhere, as in memoized_fib = Memoize(fib) will not work, because the recursive calls will call fib() directly, bypassing the cache.

Obviously, functions to be memoized must have no side effects, and return the same value whenever they are called with the same set of arguments.

More significantly, the Memoize class (and the inline version above) does not cater for functions which take mutable arguments, such as length() on a list. More precisely, the argument tuple must be suitable for use as a dictionary key. (Note that you can't memoize functions which change their mutable arguments, in any case). The MemoizeMutable class handles mutable arguments, by pickling the argument tuple into a constant string.

7 comments

Mitch Chapman 23 years ago  # | flag

Variation: Memoizing subprocesses. This recipe can be extended to cover output from costly subsidiary processes.

Suppose your application executes and uses output from a legacy program. Suppose that program has no command-line options but reads its input from a batch file. Then you can generate a (very probably) unique key for the input using a module such as md5 or sha and can store the output from the program in an application cache directory.

Here's an untested example. It assumes the function to be memoized takes as its sole argument a string to be used as input to the subprocess (with apologies for the double-spaced lines):

import sys, os, md5, string

class AppMemo:
    def __init__(self, cacheDir=None):
        cacheDir = cacheDir or "./.app_cache"
        if not os.path.isdir(cacheDir):
            os.makedirs(cacheDir)
        self.cacheDir = cacheDir

    def __getitem__(self, input):
        """Given the input data for a subprocess, find the corresponding output."""
        pathname = self._inputToPathname(input)
        if not os.path.isfile(pathname):
            raise KeyError("No output file for provided input.")
        content = open(pathname, "rb").read()
        return content

    def _inputToPathname(self, input):
        """Given input data for a subprocess, get the filename containing the corresponding output."""
        hasher = md5.new(`input`)
        digest = hasher.digest()
        hash = []
        for c in digest:
            hash.append("%02X" % c)
        hash = string.join(hash, "")
        pathname = os.path.join(self.cacheDir, hash)
        return pathname

...
class AppMemoize(Memoize):
    def __init__(self, fn, cacheDir=None):
        Memoize.__init__(self, fn)
        self.memo = AppMemo(cacheDir=cacheDir)

Mitch Chapman

Zander Lichstein 23 years ago  # | flag

Add LRU code. Should create another (heavier) class which implements a Least Recently Used algorithm if you want to limit the size of the cache. zander lichstein

Zander Lichstein 23 years ago  # | flag

expiry. How about a time-based expiry too? And a method to purge a key. zander lichstein

garrett rooney 23 years ago  # | flag

weak refs would help here. This is really cool, but it could be better if it used the new weak references from python 2.1.

The way it works now, the cached values will be stored forever, and will never be garbage collected. This is fine if they are small, but if they happen to be non trivial objects, the memory cost could be a problem. If they are refered to with weak references, they can still be garbage collected, as long as there are no strong references to them.

It's not necessary for all instances, but it should at least be mentioned.

see pep 205 for more details on weak references. garrett rooney

Scott David Daniels 22 years, 11 months ago  # | flag

Add keyword args for mutable case. Although dictionaries don't work as keys for immutable args, the MemoizeMutable version can handle keyword args with:

def __call__(self, *args, **kw):
    import cPickle
    str = cPickle.dumps((args, kw))
    if not self.memo.has_key(str):
        self.memo[str] = self.fn(*args, **kw)
    return self.memo[str]
Hannu Kankaanpää 20 years, 11 months ago  # | flag

How about this? Closures would be a neat solution here. And repr takes care of non-hashable arguments and the strong ref problem (..but assumes the arguments have a repr that exposes all about them uniquely..)

def memoize(fn):
    memo = {}
    def memoizer(*param1):
        key = repr(param1)
        if not memo.has_key(key):
            memo[key] = fn(*param1)
        return memo[key]
    return memoizer
Connelly Barnes 18 years, 5 months ago  # | flag

Clear cache on return. For a memoization decorator which clears the cache when the last copy of the function returns, see Recipe 440678. This can be a useful cache-clearing scheme.