
This tries to be a safer and faster memoize decorator; it works with mutable types and with keyword arguments too. The code combines ideas from many different versions found around the web. It isn't much tested yet, so use it with care.

Python, 63 lines
from cPickle import dumps, PicklingError # for memoize


class memoize(object):
    """Decorator that caches a function's return value each time it is called.
    If called later with the same arguments, the cached value is returned, and
    not re-evaluated. Slow for mutable types."""
    # Ideas from MemoizeMutable class of Recipe 52201 by Paul Moore and
    # from memoized decorator of http://wiki.python.org/moin/PythonDecoratorLibrary
    # For a version with timeout see Recipe 325905
    # For a self cleaning version see Recipe 440678
    # Weak references (a dict with weak values) can be used, like this:
    #   self._cache = weakref.WeakValueDictionary()
    #   but the values of such a dict must support weak references, so cached ints can't be stored
    def __init__(self, func):
        self.func = func
        self._cache = {}
    def __call__(self, *args, **kwds):
        key = args
        if kwds:
            items = kwds.items()
            items.sort()
            key = key + tuple(items)
        try:
            if key in self._cache:
                return self._cache[key]
            self._cache[key] = result = self.func(*args, **kwds)
            return result
        except TypeError:
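            # Unhashable arguments (e.g. a list) raise TypeError above; fall back
            # to a pickled string key. (A TypeError raised inside func lands here too.)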
            try:
                dump = dumps(key)
            except PicklingError:
                return self.func(*args, **kwds)
            else:
                if dump in self._cache:
                    return self._cache[dump]
                self._cache[dump] = result = self.func(*args, **kwds)
                return result


if __name__ == "__main__": # Some examples
    @memoize
    def fibonacci(n):
       "Return the n-th element of the Fibonacci series."
       if n < 3:
          return 1
       return fibonacci(n-1) + fibonacci(n-2)

    print [fibonacci(i) for i in xrange(1, 101)]

    @memoize
    def pow(x, p=2):
        if p==1:
            return x
        else:
            return x * pow(x, p=p-1)

    print [pow(3, p=i) for i in xrange(1, 6)]

    @memoize
    def printlist(l):
        print l
    printlist([1,2,3,4])

Memoizing isn't all that useful in practice; I have never actually needed it myself yet.

Weak references (a dict with weak values) can be used, like this: self._cache = weakref.WeakValueDictionary(), but the values of such a dict must support weak references, so cached ints and other built-in types can't be stored. Any suggestion is appreciated.
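As a quick check of why that doesn't work here (a minimal sketch, Python 2): built-in ints simply can't be weakly referenced, so a WeakValueDictionary could never hold cached int results.

import weakref

try:
    weakref.ref(42)
except TypeError, e:
    print e    # ints (and most other built-in types) don't support weak references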

V.1.1: key=itemgetter(0) was indeed useless (as suggested by Michael Palmer): since the keyword names are all different, sorting the (name, value) tuples never needs to compare the second items (which can be unsortable elements), so the key function isn't needed.

V.1.2: added PicklingError management, as suggested by Rogier Steehouder.

9 comments

Michael Palmer 18 years, 3 months ago

Very nice - can be simplified a bit. Apparently you don't need to sort kw items. The following code snippet:

from random import shuffle
a=list('Mona Lisa')
shuffle(a)
print a
b=dict.fromkeys(a)
shuffle(a)
print a
c=dict.fromkeys(a)

print b.items()==c.items()

prints 'True', but I must confess I'm not sure whether this is guaranteed. If you do want to sort, you don't need itemgetter, since sort does the 'right thing' on tuples anyway.

# simplified version

class memoize(object):
    def __init__(self, func):
        self.func = func
        self._cache = {}
    def __call__(self, *args, **kwds):
        t = (args, kwds.items())
        try:
            key = hash(t)
        except TypeError:
            key = dumps(t)
        if not key in self._cache:
            self._cache[key] = self.func(*args, **kwds)
        return result
Michael Palmer 18 years, 3 months ago

Correction to above comment. Obviously, the last line in the previous comment must be

return self._cache[key]
Aaron Bentley 18 years, 3 months ago

hash shouldn't be used. I don't believe that hash is appropriate here, because it considers floats and ints of the same value to be equivalent. But the result of 1/2 is different from the result of 1/2.0
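A quick check of the claim (Python 2 semantics):

print hash(1) == hash(1.0)    # True: 1 and 1.0 hash (and compare) equal
print 1/2, 1/2.0              # 0 0.5: yet integer and float division give different results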

Rogier Steehouder 18 years, 3 months ago

PicklingError. I would add the following around the "dump" bit, so if pickling fails, at least it will perform the function.

try:
    ....
except PicklingError:
    return self.func(*args, **kwds)
Michael Palmer 18 years, 3 months ago

hash _should_ be used. The comment about PicklingError is valid. However, the one about hash is not. While it is true that hash does not distinguish between floats and ints, the same is true about dict keys. Try:

a={}
a[1]= 'Clinton'
a[1.0] = 'Bush'
print a[1]

(prints 'Bush')

So, you cannot have separate entries in a dict for floats and ints of the same value.

The reason to use hash was not just brevity but also to avoid one problem with the original recipe: The

except TypeError

clause catches not only errors that originate from key hashing but also unhandled TypeErrors that originate inside func. However, one real problem with the proposed use of hash is the increased likelihood of key collisions, since we actually hash twice (first explicitly with 'hash' and then implicitly when using the result of hashing as the key). This is amended in the code below.

class memoize(object):
    def __init__(self, func):
        self.func = func
        self._cache = {}
    def __call__(self, *args, **kwds):
        t = (args, kwds.items())
        try:
            hash(t)
            key = t
        except TypeError:
            try:
                key = dumps(t)
            except PicklingError:
                return self.func(*args, **kwds)
        if not key in self._cache:
            self._cache[key] = self.func(*args, **kwds)
        return self._cache[key]
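As a rough sketch (not from the thread) of the point above: with the original recipe, a TypeError raised inside func itself is also swallowed by the bare except TypeError, so func ends up being called a second time through the pickle branch before the error finally surfaces. The revised version above only guards the hashing and pickling of the key, so such an error propagates immediately.

@memoize
def add(a, b):
    return a + b        # add(1, "x") raises TypeError *inside* the function

add(1, "x")
# Original recipe: the TypeError from a + b is caught, the key gets pickled,
# and add() is called a second time before the exception finally propagates.
# With the revised __call__ above, only hash()/dumps() sit inside try blocks,
# so the TypeError from add() is raised on the first call.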
Michael Palmer 18 years, 3 months ago

hashing again. Thinking of it, I'm not really sure whether hashing a hash really increases the likelihood of collisions. Anyway, the most recent version is equivalent in its use of keys to the original one, but still avoids catching exceptions from inside the memoized function (except for PicklingErrors, obviously).

Oleg Noga 17 years, 3 months ago

Thanks! __get__ is nontrivial. This example stores the cache in the object instance, not in a static (class-level) attribute.

class memoize(object):
    def __init__(self, function):
        self._function = function
        self._cacheName = '_cache__' + function.__name__

    def __get__(self, instance, cls=None):
        self._instance = instance
        return self

    def __call__(self, *args):
        cache = self._instance.__dict__.setdefault(self._cacheName, {})
        if cache.has_key(args):
            return cache[args]
        else:
            result = cache[args] = self._function(self._instance, *args)
            return result
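A hypothetical usage sketch for the method-caching variant above (the class and numbers are made up); the cache ends up in the instance's __dict__ under the name '_cache__area':

class Circle(object):
    def __init__(self, radius):
        self.radius = radius

    @memoize
    def area(self):
        print "computing"              # printed only on the first call per instance
        return 3.14159 * self.radius ** 2

c = Circle(2.0)
print c.area()    # computes and stores the result in c.__dict__['_cache__area']
print c.area()    # answered from the per-instance cache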
brent pedersen 17 years, 2 months ago

Persistent memoize. Nice! Changing __init__ to this, and importing shelve and os, you can get a persistent cache. Likely only useful if the function is quite slow.

cachedir = "/tmp/shelved/"
if not os.path.exists(cachedir): os.mkdir(cachedir)


class memoize(object)
    def __init__(self, func,mode='c'):
        self.func = func
        file = os.path.join(cachedir,"%s.shelve" % func.func_name )
        self._cache = shelve.open(file,mode)
        self._file = file
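A possible matching __call__ for this persistent version (a sketch, not part of the comment): shelve keys must be strings, so one option is to always use the pickled arguments as the key, reusing the dumps already imported from cPickle by the recipe.

    def __call__(self, *args, **kwds):
        # shelve requires string keys, so pickle the arguments to build one
        key = dumps((args, sorted(kwds.items())))
        if key not in self._cache:
            self._cache[key] = self.func(*args, **kwds)
        return self._cache[key]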
Daniel Miller 13 years, 5 months ago

I posted an improvement on Oleg Noga's adaptation. See the discussion for details.

Created by bearophile - on Thu, 19 Jan 2006 (PSF)