A caching decorator that garbage collects in a separate thread (for performance), allows each cached function to (optionally) set a custom maximum age for entries, and allows individual cache entries to be selectively invalidated.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 | import time
import threading
class Cache:
"A cached function"
# a dict of sets, one for each instance of this class
__allInstances = set() # where the cached values are actually kept
maxAge = 3600 # the default max allowed age of a cache entry (in seconds)
collectionInterval = 2 # how long to wait between collection events
__stopCollecting = False
def __init__(self, func):
Cache.__allInstances.add(self)
self._store = {}
self.__func = func
def __del__(self):
if self in Cache.__allInstances:
Cache.__allInstances.remove(self)
def __call__(self, *args, **kw):
key = (args, tuple(sorted(kw.items())))
if self._store.has_key(key):
return self._store[key][1]
result = self.__func(*args, **kw)
self._store[key] = (time.time(), result)
return result
def invalidate(self):
"Invalidate all cache entries for this function"
self._store.clear()
def invalidate_one(self, *args, **kw):
"Invalidate the cache entry for a particular set of arguments for this function"
key = (args, tuple(sorted(kw.items())))
if self._store.has_key(key):
del self._store[key]
def collect(self):
"Clean out any cache entries in this store that are currently older than allowed"
now = time.time()
for key, v in self._store.items():
t, value = v # creation time, function output
if self.maxAge > 0 and now - t > self.maxAge: # max ages of zero mean don't collect
del self._store[key]
@classmethod
def collectAll(cls):
"Clean out all old cache entries in all functions being cached"
for instance in cls.__allInstances:
instance.collect()
@classmethod
def _startCollection(cls):
"Periodically clean up old entries until the stop flag is set"
while cls.__stopCollecting is not True:
time.sleep(cls.collectionInterval)
cls.collectAll()
@classmethod
def startCollection(cls):
"Start the automatic collection process in its own thread"
cls.collectorThread = threading.Thread(target=cls._startCollection)
cls.collectorThread.setDaemon(False)
cls.collectorThread.start()
@classmethod
def stopCollection(cls):
cls.__stopCollecting = True
# -------------------
# Example usage:
@Cache
def foo(arg):
print 'foo called with arg=%s' % arg
return arg*2
@Cache
def bar(arg1, arg2=3):
print 'bar called with arg1=%s arg2=%s' % (arg1, arg2)
return arg1 + arg2
foo(2) # cache misses, foo invoked, cache entry created for arg=2
foo(2) # cache hit, cached value retrieved
foo.invalidate() # all cache entries for foo are deleted (in this case, only 1)
foo(2) # cache misses, etc.
bar(1) # cache misses, cache entry created for arg1=1, arg2=3
bar(1, 3) # cache hit
bar(1, 2) # cache misses, cache entry created for arg1=1, arg2=2
bar.invalidate_one(1, 3)
bar(1, 3) # cache miss
bar(1, 2) # cache hit
Cache.collectionInterval = 1.5
Cache.startCollection() # starts cache collection for all funcs every 1.5 seconds
foo(2) # cache hit
foo.maxAge = 1 # set the max age for foo's (and only foo's) cache entries to 1 second
# wait a second
foo(2) # cache miss
Cache.stopCollection()
# ------------------
|
This kind of caching decorator is often useful in web development, when your pages are dynamic enough that you can't cache the whole page, but would still like to cache individual functions that are used to build the page. The selective invalidation is typically used when some action occurs that makes some of the cache results invalid; for example, if you're displaying message board posts, and a user posts a new message, you would want to invalidate the cache for a function that finds the most recent posts.
As with any caching system, there's some (pretty tiny) overhead, so you don't want to cache small, quick functions; you get the most bang for the buck on ones that are relatively expensive.
Also, as with any caching system, you gain speed at the expense of memory. So if the arguments the cached functions are called with are extremely variable, and you make lots of calls and have a relatively long collection interval, you could eat up large amounts of memory and even see your performance decrease (although it would take some pretty extreme usage for this to happen in the real world).
Because cache collection is done in its own thread, you can occasionally get a cached value back even when the value has theoretically expired. If a collection even occurs just before an entry expires, that entry will still be used all the way until the next collection event, whenever that is. Since the default collection interval is 2 seconds, thats the most any entry will be out of date, but if this is an issue, just set the maximum age and collection interval appropriately. In other words, to guarantee that no cache entry will ever be more than x seconds old, given that the collection interval is y seconds, set the maximum age for that cache instance to (x - y).