This module saves and reloads compressed representations of generic Python objects to and from the disk.
"""Generic object pickler and compressor

This module saves and reloads compressed representations of generic Python
objects to and from the disk.
"""

__author__ = "Bill McNeill <billmcn@speakeasy.net>"
__version__ = "1.0"

import pickle
import gzip

def save(object, filename, bin = 1):
    """Saves a compressed object to disk
    """
    file = gzip.GzipFile(filename, 'wb')
    file.write(pickle.dumps(object, bin))
    file.close()

def load(filename):
    """Loads a compressed object from disk
    """
    file = gzip.GzipFile(filename, 'rb')
    buffer = ""
    while 1:
        data = file.read()
        if data == "":
            break
        buffer += data
    object = pickle.loads(buffer)
    file.close()
    return object

if __name__ == "__main__":
    import sys
    import os.path

    class Object:
        x = 7
        y = "This is an object."

    filename = sys.argv[1]
    if os.path.isfile(filename):
        o = load(filename)
        print "Loaded %s" % o
    else:
        o = Object()
        save(o, filename)
        print "Saved %s" % o
About reading from the gzip'ed file: why the 'while 1: ... buffer += data' loop? Shouldn't a single 'read()' return all the data? For that matter, just use pickle.load to read from the file handle directly. You might also want to change "bin = 1" to -1, since the pickle interface changed slightly in Python 2.3.
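That suggestion reduces load() to a single pickle.load call on the gzip file handle. A minimal sketch in modern Python 3 (where pickle.load reads from any binary file object, so the manual read loop and in-memory buffer disappear):

```python
import gzip
import pickle

def load(filename):
    """Load a compressed object from disk.

    pickle.load reads directly from the file object, so no manual
    'while 1: ... buffer += data' loop or in-memory buffer is needed.
    """
    with gzip.open(filename, "rb") as f:
        return pickle.load(f)
```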
My instrumentation system (soft real-time, 100% Python) has a history object that uses 500 MB of RAM to capture 1 hour of detailed (but often repeated) data. The original code for save() crashed my system.
For large objects, pickle.dumps() consumes way too much memory (the entire pickle is first created as a string in memory, unlike pickle.dump() which writes directly to a file), and pickle itself is way too slow (compared to cPickle). For this need, it would be far better to use cPickle, and pass the file handle to the cPickle.dump() call.
The code below reliably pickles a 500 MB object to a 5 MB gzip file in under 30 seconds on a 1 GHz Celeron.
Similar changes to load() would have similar benefits. The above code includes a change suggested by Andrew Dalke in a prior comment.
Here is the same code, taking both comments into account (and fixing some typos in the latest save method as well):
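Taking both comments into account amounts to streaming the pickle through the gzip file handle in both directions. A minimal sketch of that revision in Python 3, where pickle already uses the C implementation that cPickle supplied in Python 2 (so this is an illustrative sketch, not the commenter's original code):

```python
import gzip
import pickle

def save(obj, filename, protocol=-1):
    """Save a compressed object to disk.

    pickle.dump streams straight to the gzip file handle, so the full
    pickle never exists as one big string in memory; protocol -1
    selects the highest available (binary) pickle protocol.
    """
    with gzip.open(filename, "wb") as f:
        pickle.dump(obj, f, protocol)

def load(filename):
    """Load a compressed object from disk."""
    with gzip.open(filename, "rb") as f:
        return pickle.load(f)
```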