
This recipe addresses the following two needs:
- Object construction for some class is expensive.
- Objects of this class need to be instantiated across multiple runs of the program.

For example, object instantiation may involve reading one or more big files, connecting to a database or a network socket with considerable expected delay, etc.

The autopickle decorator deals with this problem by wrapping the __init__ of the class. The first time a specific instance is created, it is also pickled to a file. In all subsequent attempts to create the same instance, including those in later runs of the program, the pickled instance is loaded and returned instead. If unpickling the file is faster than creating the instance normally, every instantiation after the first is faster than normal construction.

The instance determines the path of the file to be pickled to by calling its getPickleFilename() method. This takes the same arguments given in __init__ and it has the responsibility to specify a valid and distinct path for the instance.
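As an illustration of what "valid and distinct" can mean in practice, here is a minimal Python 3 sketch of a getPickleFilename() for a class with several constructor arguments. The class name Report, its arguments, and the cache_dir attribute are hypothetical and not part of the recipe; the sketch also assumes that repr() of the arguments identifies them uniquely.

```python
import hashlib
import os
import tempfile


class Report:
    """Hypothetical class; only the filename logic matters here."""

    # Hypothetical cache directory, not part of the recipe.
    cache_dir = os.path.join(tempfile.gettempdir(), "autopickle_cache")

    def __init__(self, source, year, verbose=False):
        self.source, self.year, self.verbose = source, year, verbose

    def getPickleFilename(self, source, year, verbose=False):
        # Hash the arguments that define the instance so that different
        # (source, year) pairs never share a file. 'verbose' does not change
        # the resulting object, so it is left out of the key.
        key = repr((source, year)).encode("utf-8")
        digest = hashlib.sha1(key).hexdigest()
        return os.path.join(self.cache_dir, "Report-%s.pickle" % digest)
```

Hashing keeps the filename valid even when an argument contains characters that are not allowed in paths, at the cost of filenames that are no longer human-readable.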

import os
import cPickle as pickle

__all__ = ['autopickle', 'BINMODE']

# whether the pickled files are binary
BINMODE = False

def autopickle(__init__):
    """Decorator for instantiating pickled instances transparently."""

    def new__init__(self, *args, **kwds):
        picklename = self.getPickleFilename(*args, **kwds)
        if os.path.exists(picklename):
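            # read in the same mode the file was written in, and close it
            picklefile = open(picklename, BINMODE and 'rb' or 'r')
            try: newSelf = pickle.load(picklefile)
            finally: picklefile.close()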
            assert type(newSelf) is type(self)
            # copy newSelf to self
            if hasattr(newSelf, '__getstate__'):
                state = newSelf.__getstate__()
            else:
                state = newSelf.__dict__
            if hasattr(self, '__setstate__'):
                self.__setstate__(state)
            else:
                self.__dict__.update(state)
        else:
            __init__(self, *args, **kwds)
            picklefile = open(picklename, BINMODE and 'wb' or 'w')
            try: pickle.dump(self, picklefile, BINMODE)
            finally: picklefile.close()
    return new__init__


if __name__ == '__main__':

    class Foo(object):
        @autopickle
        def __init__(self, id):
            import time; time.sleep(2)
            self.id = id
        def getPickleFilename(self, id):
            return "%s.dat" % id

    print Foo(1)
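
The listing above targets Python 2 (cPickle, the print statement). For readers on Python 3, the same idea might look roughly like the sketch below. It is not the author's code: it always uses binary files instead of BINMODE, it copies only __dict__ (so it assumes plain classes without __slots__ or custom __getstate__/__setstate__), and the extra directory argument and the calls counter are hypothetical additions made for the demonstration.

```python
import functools
import os
import pickle


def autopickle(init):
    """Wrap __init__ so that instances are loaded from disk when possible."""
    @functools.wraps(init)
    def new_init(self, *args, **kwds):
        picklename = self.getPickleFilename(*args, **kwds)
        if os.path.exists(picklename):
            # Pickle data is bytes in Python 3, so always open in binary mode.
            with open(picklename, "rb") as picklefile:
                cached = pickle.load(picklefile)
            assert type(cached) is type(self)
            # Copy the cached attributes onto the freshly allocated instance.
            self.__dict__.update(cached.__dict__)
        else:
            init(self, *args, **kwds)
            with open(picklename, "wb") as picklefile:
                pickle.dump(self, picklefile, pickle.HIGHEST_PROTOCOL)
    return new_init


class Foo:
    calls = 0  # counts how often the real __init__ body ran

    @autopickle
    def __init__(self, id, directory="."):
        Foo.calls += 1
        self.id = id

    def getPickleFilename(self, id, directory="."):
        # 'directory' is a hypothetical extra argument for this sketch.
        return os.path.join(directory, "%s.dat" % id)
```

Creating Foo(1) twice with the same directory should run the real __init__ only once; the second instance gets its attributes from the pickle file.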

This recipe is similar to http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/413717 in that they both cache objects transparently. The main difference is that the autopickle decorator caches an instance on disk as a pickled file, while the earlier recipe was based on caching instances in memory, typically in a dictionary. A second difference is that this recipe is not completely transparent: the autopickled instances must have a method that determines the path of the file to be pickled to.

The autopickle decorator can be thought of as the simplest transparent persistence mechanism. It doesn't update the pickled file when the respective instance is modified, and it doesn't scale beyond a few thousand instances. For a much more complete and scalable solution to object persistence, use an OO database such as ZODB (http://www.zope.org/Wikis/ZODB/FrontPage).
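
Since the recipe never rewrites the pickle after the first instantiation, a caller who modifies an instance has to refresh the file explicitly. One possible way to do that is sketched below in Python 3. The repickle() helper and the Counter class are hypothetical and not part of the recipe; the helper assumes the caller still knows the arguments the instance was created with.

```python
import os
import pickle
import tempfile


def repickle(instance, *args, **kwds):
    """Overwrite the cached pickle of 'instance' with its current state.

    'args' and 'kwds' must be the arguments the instance was created with,
    because getPickleFilename() needs them to locate the file.
    """
    picklename = instance.getPickleFilename(*args, **kwds)
    # Write to a temporary file first, then rename it into place, so that a
    # crash in the middle of the write does not leave a truncated pickle.
    directory = os.path.dirname(picklename) or "."
    fd, tmpname = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as tmpfile:
        pickle.dump(instance, tmpfile, pickle.HIGHEST_PROTOCOL)
    os.replace(tmpname, picklename)
    return picklename


class Counter:
    """Hypothetical class used only to exercise repickle()."""

    def __init__(self, name, directory):
        self.name = name
        self.directory = directory
        self.count = 0

    def getPickleFilename(self, name, directory):
        return os.path.join(directory, "%s.pickle" % name)
```

This keeps the file in sync only when the caller remembers to call the helper; tracking modifications automatically is the kind of work an object database does for you.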

2 comments

brent pedersen 17 years, 2 months ago

Simple function decorator. That is quite useful. I've modded it to handle just a simple function. A lot of times, when developing or screen scraping, it's nice to have simple functions cache their results automatically.

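import os
import cPickle as pickle

def autopickle(f):
    """Decorator to persistently cache a function's result."""
    def new_f(*args, **kwds):
        # One cache file per function: the arguments are not part of the
        # key, and the "pickled" directory must already exist.
        filename = "pickled/%s.dat" % f.func_name
        if os.path.exists(filename):
            pf = open(filename, 'rb')
            try: return pickle.load(pf)
            finally: pf.close()
        else:
            res = f(*args, **kwds)
            pf = open(filename, 'wb')
            try: pickle.dump(res, pf, 1)
            finally: pf.close()
            return res
    return new_f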

USAGE:

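# Assumed context, not shown in the original comment: re, BeautifulSoup
# and a urllib-style module bound to the name U are imported, and rawurl
# is defined by the caller.
@autopickle
def get_station_list(rawurl):
    # The body runs (and prints) only on the first call; later calls
    # load the pickled result instead.
    print "getting from first time"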
    soup = BeautifulSoup(U.urlopen(rawurl).read())
    atags = soup.findAll('a',attrs={'href':re.compile('rawmain',re.I)})
    return [a['href'] for a in atags]

if __name__ == "__main__":
    atags = get_station_list(rawurl)
    print "\n".join(atags[:5])

brent pedersen 17 years, 2 months ago

Correction: the code in my previous comment should use 'filename' everywhere, not 'picklename'; otherwise it doesn't work.
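
One thing to keep in mind with the function version above is that the cache file depends only on the function name, so calls with different arguments share one cached result. A possible variant that keys the file on the arguments as well is sketched below in Python 3. The name autopickle_args and the directory parameter are hypothetical, and the sketch assumes that repr() of the arguments is stable and identifies them uniquely.

```python
import functools
import hashlib
import os
import pickle


def autopickle_args(directory):
    """Decorator factory: cache results on disk, one file per argument set."""
    def decorator(f):
        @functools.wraps(f)
        def new_f(*args, **kwds):
            # Derive the cache key from the call arguments.
            key = repr((args, sorted(kwds.items()))).encode("utf-8")
            digest = hashlib.sha1(key).hexdigest()
            filename = os.path.join(
                directory, "%s-%s.dat" % (f.__name__, digest))
            if os.path.exists(filename):
                with open(filename, "rb") as pf:
                    return pickle.load(pf)
            res = f(*args, **kwds)
            # Create the cache directory on first use.
            os.makedirs(directory, exist_ok=True)
            with open(filename, "wb") as pf:
                pickle.dump(res, pf, pickle.HIGHEST_PROTOCOL)
            return res
        return new_f
    return decorator
```

It would be applied as @autopickle_args("pickled") above the function definition; each distinct argument set then gets its own .dat file.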