dbdict: a dbm based on a dict subclass.
On open, loads the full file into memory. On close, writes the full dict back to disk (atomically). Supported output file formats are csv, json, and pickle. The input file format is discovered automatically.
Usable by the shelve module for fast access.
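The automatic format discovery can be illustrated in isolation. The sketch below is a standalone approximation, not the recipe's code: `load_any` and the demo file names are hypothetical, and it reopens the file for each attempt because pickle needs a binary stream while json and csv need text. Loaders are tried from most to least restrictive, since csv accepts almost any text.

```python
import csv
import json
import os
import pickle
import tempfile


def load_any(path):
    """Return (format_name, dict), trying pickle, then json, then csv.

    Hypothetical helper for illustration only.
    """
    # pickle is the strictest format and needs a binary stream
    try:
        with open(path, 'rb') as f:
            return 'pickle', dict(pickle.load(f))
    except Exception:
        pass
    # json must parse as one complete JSON document
    try:
        with open(path, 'r', encoding='utf-8') as f:
            return 'json', dict(json.load(f))
    except Exception:
        pass
    # csv goes last because nearly any text parses as rows;
    # each row must be a (key, value) pair for dict() to accept it
    try:
        with open(path, 'r', encoding='utf-8', newline='') as f:
            return 'csv', dict(csv.reader(f))
    except Exception:
        pass
    raise ValueError('File not in recognized format')


# Write the same mapping in all three formats, then detect each one.
demo_dir = tempfile.mkdtemp()
pickle_path = os.path.join(demo_dir, 'demo.pickle')
json_path = os.path.join(demo_dir, 'demo.json')
csv_path = os.path.join(demo_dir, 'demo.csv')

with open(pickle_path, 'wb') as f:
    pickle.dump({'abc': '123'}, f)
with open(json_path, 'w', encoding='utf-8') as f:
    json.dump({'abc': '123'}, f)
with open(csv_path, 'w', encoding='utf-8', newline='') as f:
    csv.writer(f).writerows([('abc', '123')])

for path in (pickle_path, json_path, csv_path):
    print(load_any(path))
```

Trying csv first could misread a JSON file as a table of rows, which is why the most permissive format is tried last.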
'''Alternate DB based on a dict subclass

Runs like gdbm's fast mode (all writes are delayed until close).
While open, the whole dict is kept in memory.  Start-up and
close times are potentially long because the whole dict must be
read from or written to disk.

Input file format is automatically discovered.
Output file format is selectable between pickle, json, and csv.
All three are backed by fast C implementations.

'''

import pickle, json, csv
import io, os

class DictDB(dict):

    def __init__(self, filename, flag=None, mode=None, format=None, *args, **kwds):
        self.flag = flag or 'c'             # r=readonly, c=create, or n=new
        self.mode = mode                    # None or octal triple like 0o666
        self.format = format or 'csv'       # csv, json, or pickle
        self.filename = filename
        if flag != 'n' and os.access(filename, os.R_OK):
            file = open(filename, 'rb')
            try:
                self.load(file)
            finally:
                file.close()
        self.update(*args, **kwds)

    def sync(self):
        if self.flag == 'r':
            return
        filename = self.filename
        tempname = filename + '.tmp'
        file = open(tempname, 'wb')
        try:
            self.dump(file)
        except Exception:
            file.close()
            os.remove(tempname)
            raise
        file.close()
        os.replace(tempname, filename)      # atomic commit; overwrites an existing file
        if self.mode is not None:
            os.chmod(self.filename, self.mode)

    def close(self):
        self.sync()

    def dump(self, file):
        # file is a binary stream; csv and json need a text layer on top of it
        if self.format in ('csv', 'json'):
            text = io.TextIOWrapper(file, encoding='utf-8', newline='')
            try:
                if self.format == 'csv':
                    csv.writer(text).writerows(self.items())
                else:
                    json.dump(self, text, separators=(',', ':'))
                text.flush()
            finally:
                text.detach()               # leave file open for the caller to close
        elif self.format == 'pickle':
            # dict views cannot be pickled, so dump a plain dict copy
            pickle.dump(dict(self), file, pickle.HIGHEST_PROTOCOL)
        else:
            raise NotImplementedError('Unknown format: %r' % self.format)

    def load(self, file):
        # try formats from most restrictive to least restrictive
        data = file.read()                  # bytes, since file is opened in binary mode
        loaders = (
            pickle.loads,
            lambda raw: json.loads(raw.decode('utf-8')),
            lambda raw: csv.reader(io.StringIO(raw.decode('utf-8'), newline='')),
        )
        for loader in loaders:
            try:
                return self.update(loader(data))
            except Exception:
                pass
        raise ValueError('File not in recognized format')


def dbopen(filename, flag=None, mode=None, format=None):
    return DictDB(filename, flag, mode, format)


if __name__ == '__main__':
    import random
    os.chdir('/dbm_sqlite/alt')             # the author's scratch directory; adjust as needed
    print(os.getcwd())
    s = dbopen('tmp.shl', 'c', format='json')
    print(s, 'start')
    s['abc'] = '123'
    s['rand'] = random.randrange(10000)
    s.close()
    f = open('tmp.shl', 'rb')
    print(f.read())
    f.close()
Discussion
Provides persistent dictionary support. Loads the full file into memory, leaves it there for full speed dict access, and then writes the full dict back on close (with an atomic commit).
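The atomic commit amounts to writing a temporary file and then moving it into place. The following is a minimal sketch of that pattern on its own, under stated assumptions: `atomic_write_json` is a hypothetical helper, not part of the recipe, and it uses `os.replace`, which is atomic on POSIX when the temporary file and the target are on the same filesystem.

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Write data to path so readers see either the old file or the new one.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the final rename
    # stays on one filesystem.
    fd, tempname = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', encoding='utf-8') as f:
            json.dump(data, f, separators=(',', ':'))
            f.flush()
            os.fsync(f.fileno())        # push the bytes to disk before renaming
        os.replace(tempname, path)      # swap the new file in over the old one
    except BaseException:
        # Best-effort cleanup so a failed write leaves no stray temp file.
        if os.path.exists(tempname):
            os.remove(tempname)
        raise


# Overwrite the same file twice and confirm only the final version remains.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'demo.json')
atomic_write_json(demo_path, {'abc': '123'})
atomic_write_json(demo_path, {'abc': '456'})

with open(demo_path, encoding='utf-8') as f:
    final = json.load(f)
leftovers = [name for name in os.listdir(demo_dir) if name.endswith('.tmp')]
print(final, leftovers)
```

If the write fails partway, the existing file is left untouched, because the target is only replaced after the temporary file has been written completely.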
Useful when lookup and mutation speed are more important than the time spent on the initial load and the final write-back.
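The recipe is described as usable by the shelve module. Here is a minimal sketch of that pairing, assuming a hypothetical pickle-only stand-in class, `PickleDictDB`, so that the example runs on its own; it is not the recipe's `DictDB`. `shelve.Shelf` stores encoded keys and pickled values as bytes, so pickle is the backing format assumed here.

```python
import os
import pickle
import shelve
import tempfile


class PickleDictDB(dict):
    """Hypothetical pickle-only stand-in for a dict-backed database."""

    def __init__(self, filename):
        super().__init__()
        self.filename = filename
        if os.path.exists(filename):
            with open(filename, 'rb') as f:
                self.update(pickle.load(f))

    def sync(self):
        # Write everything to a temp file, then swap it into place.
        tempname = self.filename + '.tmp'
        with open(tempname, 'wb') as f:
            pickle.dump(dict(self), f, pickle.HIGHEST_PROTOCOL)
        os.replace(tempname, self.filename)

    def close(self):
        self.sync()


path = os.path.join(tempfile.mkdtemp(), 'demo.shl')

# Shelf pickles each value and keeps the bytes in the in-memory dict;
# closing the shelf calls sync() and close() on the backing object.
shelf = shelve.Shelf(PickleDictDB(path))
shelf['point'] = {'x': 1, 'y': 2}
shelf.close()

# Reopen from disk and read the value back through a new Shelf.
reopened = shelve.Shelf(PickleDictDB(path))
restored = reopened['point']
reopened.close()

# The backing dict itself holds encoded (bytes) keys.
stored_keys = list(PickleDictDB(path))
print(restored, stored_keys)
```

All reads and writes between open and close are plain dict operations in memory; disk is touched only when the backing object loads or syncs.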
Similar to the "F" mode in the gdbm module: "The F flag opens the database in fast mode. Writes to the database will not be synchronized".


Comments
Nice clean implementation! Very nice :)