A persistent, lazy, caching dictionary that uses the anydbm module for persistence. Keys must be plain strings (this is an anydbm limitation) and values must be picklable objects.
import anydbm
import cPickle as pickle


class pdict(object):
    """ A persistent, lazy, caching, dictionary, using the anydbm module for
    persistence. Keys must be basic strings (this is an anydbm limitation) and
    values must be pickle-able objects. """

    def __init__(self, file, mode):
        """ Create new pdict using file. mode is passed to anydbm.open(). """
        self._cache = {}   # values already loaded or assigned in this session
        self._flush = {}   # values waiting to be written by sync()
        self._dbm = anydbm.open(file, mode)

    def __contains__(self, key):
        return key in self._cache or key in self._dbm

    def __getitem__(self, key):
        if key in self._cache:
            return self._cache[key]
        # Cache miss: unpickle the value from the database and remember it.
        return self._cache.setdefault(key, pickle.loads(self._dbm[key]))

    def __setitem__(self, key, value):
        # Writes stay in memory until sync() is called.
        self._cache[key] = self._flush[key] = value

    def __delitem__(self, key):
        # Unlike writes, deletion from the database happens immediately.
        found = False
        for data in (self._cache, self._flush, self._dbm):
            if key in data:
                del data[key]
                found = True
        if not found:
            raise KeyError(key)

    def keys(self):
        # Returns a set: the union of cached keys and keys stored in the database.
        keys = set(self._cache.keys())
        keys.update(self._dbm.keys())
        return keys

    def sync(self):
        # Pickle (protocol 2) every pending value into the database.
        for key, value in self._flush.iteritems():
            self._dbm[key] = pickle.dumps(value, 2)
        self._dbm.sync()
        self._flush = {}
This class is meant for storing large datasets. Values are loaded lazily from the anydbm database and cached in memory for performance. Similarly, modified values are not written back to the database until sync() is called.
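The deferred-write and lazy-read behaviour described above can be traced step by step without the class itself. The following is only a sketch: the names `cache` and `pending` stand in for `_cache` and `_flush`, the temporary file path is made up for the demo, and the dumb backend is used directly (an assumption made so that the snippet behaves the same everywhere; `dbm.dumb` is the Python 3 name for `dumbdbm`).

```python
import os
import pickle
import tempfile

try:
    import dumbdbm                      # Python 2
except ImportError:
    from dbm import dumb as dumbdbm     # Python 3 name for the same backend

# Hypothetical location for the demo database.
path = os.path.join(tempfile.mkdtemp(), "demo")
db = dumbdbm.open(path, "c")

cache = {}    # plays the role of pdict._cache
pending = {}  # plays the role of pdict._flush

# A write only touches the in-memory dictionaries.
cache["answer"] = pending["answer"] = [1, 2, 3]
on_disk_before_sync = "answer" in db

# sync(): pickle every pending value into the database, then empty the queue.
for key, value in pending.items():
    db[key] = pickle.dumps(value, 2)
db.sync()
pending = {}
on_disk_after_sync = "answer" in db

# A lazy read: with an empty cache, the value is unpickled from the database
# on first access and kept in memory afterwards.
cache = {}
if "answer" not in cache:
    cache["answer"] = pickle.loads(db["answer"])
loaded = cache["answer"]

db.close()
```

Until the flush loop runs, the new value exists only in memory, so a crash before sync() would lose it.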
The class is quite basic, and deliberately doesn't implement all dictionary behaviour: methods such as values() and items() would load every value at once, which defeats the purpose of lazy fetching.
Features that might be useful include a limit on the number of cached and unflushed elements to avoid excessive memory use, support for arbitrary key types by pickling the keys, and probably more. I think you get the idea.
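Two of those ideas could look roughly like this. It is a sketch, not part of the recipe: `BoundedCache`, `encode_key` and `decode_key` are hypothetical names, and least-recently-used eviction is just one possible policy for limiting the cache.

```python
import pickle
from collections import OrderedDict


class BoundedCache(object):
    """Hypothetical helper: holds at most `limit` items and evicts the
    least recently used one when the limit is exceeded."""

    def __init__(self, limit):
        self._limit = limit
        self._items = OrderedDict()

    def __contains__(self, key):
        return key in self._items

    def __len__(self):
        return len(self._items)

    def __getitem__(self, key):
        value = self._items.pop(key)    # raises KeyError on a miss
        self._items[key] = value        # re-insert to mark as most recent
        return value

    def __setitem__(self, key, value):
        self._items.pop(key, None)
        self._items[key] = value
        while len(self._items) > self._limit:
            self._items.popitem(last=False)   # drop the oldest entry


def encode_key(key):
    """Turn any picklable key into the byte string a dbm file accepts."""
    return pickle.dumps(key, 2)


def decode_key(raw):
    """Recover the original key from its pickled form."""
    return pickle.loads(raw)
```

Something like `BoundedCache` could only replace `_cache` directly. Entries in `_flush` have not been written yet, so a limit there would have to trigger a sync() rather than drop items. Pickled keys also have a catch: keys that compare equal, such as 1 and 1.0, produce different pickles and would be stored as separate entries.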
Thanks for the recipe.
I couldn't find a sync() method in anydbm (Python 2.7.8: https://docs.python.org/2/library/anydbm.html).
If I switch to dumbdbm it works just fine :)
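One way to cope with this is to call sync() only when the object returned by the backend actually has it. The helper below is a sketch under that assumption; `sync_if_supported` is a hypothetical name, not part of anydbm or of the recipe.

```python
def sync_if_supported(db):
    """Call db.sync() if the database object provides it.

    Returns True when sync() was called, False when the backend has no
    such method.
    """
    sync = getattr(db, "sync", None)
    if callable(sync):
        sync()
        return True
    return False
```

In pdict.sync() the line self._dbm.sync() could then become sync_if_supported(self._dbm). For a backend without sync(), closing the database when you are done is probably the safest way to make sure the data reaches the disk.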