
This module saves and reloads compressed representations of generic Python objects to and from the disk.

Python, 52 lines
"""Generic object pickler and compressor

This module saves and reloads compressed representations of generic Python
objects to and from the disk.
"""

__author__ = "Bill McNeill <billmcn@speakeasy.net>"
__version__ = "1.0"

import pickle
import gzip


def save(object, filename, bin = 1):
	"""Saves a compressed object to disk
	"""
	file = gzip.GzipFile(filename, 'wb')
	file.write(pickle.dumps(object, bin))
	file.close()


def load(filename):
	"""Loads a compressed object from disk
	"""
	file = gzip.GzipFile(filename, 'rb')
	buffer = ""
	while 1:
		data = file.read()
		if data == "":
			break
		buffer += data
	object = pickle.loads(buffer)
	file.close()
	return object


if __name__ == "__main__":
	import sys
	import os.path
	
	class Object:
		x = 7
		y = "This is an object."
	
	filename = sys.argv[1]
	if os.path.isfile(filename):
		o = load(filename)
		print "Loaded %s" % o
	else:
		o = Object()
		save(o, filename)
		print "Saved %s" % o

3 comments

Andrew Dalke 20 years, 8 months ago

Regarding reading from the gzip'ed file: why the 'while 1: ... buffer += data' loop? Shouldn't read() return all the data in one call? For that matter, just use pickle.load to read from the file handle. And you might want to change the "bin = 1" default to -1; the pickle interface changed slightly in 2.3.
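For illustration, a minimal sketch of those simplifications (same save/load names as the recipe; the obj parameter name is just an editorial choice to avoid shadowing the object builtin):

    import pickle
    import gzip

    def save(obj, filename, protocol=-1):
        """Save a compressed object; pickle.dump writes straight to the file handle."""
        file = gzip.GzipFile(filename, 'wb')
        pickle.dump(obj, file, protocol)
        file.close()

    def load(filename):
        """Load a compressed object; pickle.load reads from the handle, no manual read loop needed."""
        file = gzip.GzipFile(filename, 'rb')
        obj = pickle.load(file)
        file.close()
        return obj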

rcunningham 15 years, 6 months ago

My instrumentation system (soft real-time, 100% Python) has a history object that uses 500 MB of RAM to capture 1 hour of detailed (but often repeated) data. The original code for save() crashed my system.

For large objects, pickle.dumps() consumes way too much memory (the entire pickle is first created as a string in memory, unlike pickle.dump() which writes directly to a file), and pickle itself is way too slow (compared to cPickle). For this need, it would be far better to use cPickle, and pass the file handle to the cPickle.dump() call.

The code below reliably pickles a 500 MB object to a 5 MB gzip file in under 30 seconds on a 1 GHz Celeron.

import cPickle    # Faster than pickle
import gzip

def save(object, filename, protocol = -1):
    """Save an object to a compressed disk file.
       Works well with huge objects.
    """
    file = gzip.GzipFile(filename, 'wb')
    cPickle.dump(object, file, protocol))
    file.close()

Similar changes to load() would have similar benefits. The above code includes a change suggested by Andrew Dalke in a prior comment.

Zach Dwiel 10 years, 6 months ago

Here is the same code, taking both comments into account (and fixing some typos in the latest save method as well):

import cPickle
import gzip

def save(object, filename, protocol = -1):
    """Save an object to a compressed disk file.
       Works well with huge objects.
    """
    file = gzip.GzipFile(filename, 'wb')
    cPickle.dump(object, file, protocol)
    file.close()

def load(filename):
    """Loads a compressed object from disk
    """
    file = gzip.GzipFile(filename, 'rb')
    object = cPickle.load(file)
    file.close()

    return object
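
A quick usage example of the final pair (the filename 'history.pklz' is just an illustration):

    data = {'x': 7, 'y': "This is an object."}
    save(data, 'history.pklz')         # pickle and gzip in one step
    restored = load('history.pklz')    # decompress and unpickle
    assert restored == data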