This is a recipe for a Python "data object." It is similar in function to namedtuple (http://code.activestate.com/recipes/500261/) and recordtype (http://code.activestate.com/recipes/576555-records/) in that it is a simple container for data, but is designed to meet three specific goals:
- Easy to subclass data objects.
- Get/set speed comparable to a simple class.
- Minimal memory consumption per instance.
class PseudoStructure(object):
    """A fast, small-footprint, structure-like type suitable for use as
    a data transfer or parameter object.

    This class is not intended to be used directly. Subclasses should
    define the fields they care about as slots and (optionally) define
    an initializer for those fields.

    This class is compatible with both Python2 and Python3 (tested on
    2.4-3.3).

    """

    __slots__ = []

    @classmethod
    def define(cls, class_name, *field_names):
        r"""Dynamically create a class definition for a subclass of
        ``PseudoStructure``.

        Note that objects of a dynamically created class cannot be
        pickled. If pickling support is needed, the subclass must have
        a static definition.

        >>> X = PseudoStructure.define('Attribute', 'name', 'value')
        >>> X.__name__
        'Attribute'
        >>> issubclass(X, PseudoStructure)
        True
        >>> X.__slots__
        ('name', 'value')

        """
        return type(class_name, (cls,), {"__slots__": field_names})

    def __getattr__(self, name):
        r"""Lazily initialize the value for *name*.

        >>> class Attribute(PseudoStructure):
        ...     __slots__ = ['name', 'value']
        ...
        >>> attr = Attribute()
        >>> attr.name is None
        True
        >>> attr.description is None
        Traceback (most recent call last):
            ...
        AttributeError: 'Attribute' object has no attribute 'description'

        """
        # only called once, if a value has not yet been assigned to the slot;
        # returns None as a reasonable default; if the name is not a slot,
        # raises AttributeError as expected
        setattr(self, name, None)

    def __getstate__(self):
        r"""Build the state object for pickling this object.

        >>> class Attribute(PseudoStructure):
        ...     __slots__ = ['name', 'value']
        ...
        >>> attr = Attribute()
        >>> attr.name = 'test'
        >>> sorted(attr.__getstate__().items())
        [('name', 'test'), ('value', None)]

        """
        state = {}
        for cls in self.__class__.__mro__:
            slots = getattr(cls, "__slots__", [])
            if (isinstance(slots, str)):
                slots = [slots]
            for slot in slots:
                state[slot] = getattr(self, slot)
        return state

    def __setstate__(self, state):
        r"""Initialize this object from a pickling state object.

        >>> class Attribute(PseudoStructure):
        ...     __slots__ = ['name', 'value']
        ...
        >>> attr = Attribute()
        >>> attr.name = 'test'
        >>> state = attr.__getstate__()
        >>> attr2 = Attribute()
        >>> attr2.__setstate__(state)
        >>> attr2.name == 'test'
        True
        >>> attr2.value is None
        True

        """
        for (name, value) in state.items():
            setattr(self, name, value)

    def __eq__(self, other):
        r"""Return True if the internal states of *other* and self are
        equal.

        >>> class Sample(PseudoStructure):
        ...     __slots__ = ['value']
        ...
        >>> sample1 = Sample()
        >>> sample2 = Sample()
        >>> sample1 == sample2
        True
        >>> sample2.value = 'test'
        >>> sample1 == sample2
        False

        """
        return (self.__getstate__() == other.__getstate__())

    def __hash__(self):
        r"""Return the hash of this pseudo-structure's internal state.

        >>> class Sample(PseudoStructure):
        ...     __slots__ = ['value']
        ...
        >>> sample1 = Sample()
        >>> sample2 = Sample()
        >>> hash(sample1) == hash(sample2)
        True
        >>> sample2.value = 'test'
        >>> hash(sample1) == hash(sample2)
        False

        """
        return hash(tuple(sorted(self.__getstate__().items())))

    def __repr__(self):
        r"""Return an unambiguous description of this object.

        >>> class Sample(PseudoStructure):
        ...     __slots__ = ['value']
        ...
        >>> sample = Sample()
        >>> repr(sample)
        "<Sample {'value': None}>"

        """
        return "<%s %r>" % (self.__class__.__name__, self.__getstate__())


if (__name__ == "__main__"):
    import doctest
    import pickle

    doctest.testmod()

    class Test(PseudoStructure):
        __slots__ = ["name"]

    x = Test()
    x.name = "pickle test"
    y = pickle.loads(pickle.dumps(x))
    assert x == y
I created PseudoStruct (the PseudoStructure class above) after noticing that a project I was working on ran slower than I expected and consumed far more memory than I was comfortable with.
I had initially used a simple, traditional class-based approach for my data objects, but I had potentially thousands of these objects active at any given time. I needed a data object that met the following criteria:
- Must be mutable.
- Easy to define subclasses of data object types.
- Get/set speeds comparable to those of a simple class object.
- Memory consumption must be (significantly) less than simple class objects.
- Can be pickled.
Namedtuple instances are not mutable, so I needed a different approach. Recordtype has all of the required performance characteristics and can be pickled, but, as with namedtuple, subclassing a recordtype is not straightforward. (If not for that requirement, I would have chosen recordtype.)
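The memory requirement comes down to whether each instance carries its own `__dict__`. The following is a rough sketch of how that difference could be checked; the `PlainPoint` and `SlottedPoint` classes are hypothetical stand-ins, not part of the recipe, and the exact byte counts vary by Python version and platform.

```python
import sys


class PlainPoint(object):
    """Ordinary class: every instance owns a per-instance __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlottedPoint(object):
    """Slotted class: attributes live in fixed slots, with no __dict__."""

    __slots__ = ['x', 'y']

    def __init__(self, x, y):
        self.x = x
        self.y = y


plain = PlainPoint(1, 2)
slotted = SlottedPoint(1, 2)

# sys.getsizeof does not follow references, so the instance dict of the
# plain object has to be added explicitly.
plain_size = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slotted_size = sys.getsizeof(slotted)

print("plain:   %d bytes" % plain_size)
print("slotted: %d bytes" % slotted_size)
```

With thousands of live objects, the per-instance difference is multiplied accordingly, which is why the recipe builds on `__slots__`.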
PseudoStruct meets all of the performance requirements, is pickleable, and is easy to subclass. For example:
class Attribute(PseudoStructure):
    __slots__ = ['name', 'value']

class RichAttribute(Attribute):
    # has slots for name, value, and description
    __slots__ = ['description']
Like namedtuple and recordtype, PseudoStruct doesn't allow arbitrary fields to be created:
class Attribute(PseudoStructure):
    __slots__ = ['name', 'value']

attr = Attribute()
attr.arbitrary = 'not allowed'
Traceback (most recent call last):
    ...
AttributeError: 'Attribute' object has no attribute 'arbitrary'
Unlike namedtuple and recordtype, PseudoStruct does not define an __init__ method by default (nor convenience methods like __len__ and __iter__). However, it is trivial to add these methods in a subclass.
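Adding those methods might look like the following minimal sketch. The `PseudoStructure` here is a trimmed stand-in that keeps only the recipe's lazy-default behavior so that the block runs by itself; the fixed length in `__len__` and the explicit field order in `__iter__` are illustrative choices, not part of the recipe.

```python
class PseudoStructure(object):
    """Trimmed stand-in for the recipe's base class (lazy defaults only)."""

    __slots__ = []

    def __getattr__(self, name):
        # Same trick as the recipe: assign None on first access; setattr
        # raises AttributeError if *name* is not a slot.
        setattr(self, name, None)


class Attribute(PseudoStructure):
    __slots__ = ['name', 'value']

    def __init__(self, name=None, value=None):
        # Optional initializer; unset fields would default to None anyway.
        self.name = name
        self.value = value

    def __len__(self):
        # Number of fields defined by this class.
        return 2

    def __iter__(self):
        # Yield the field values in declaration order, which allows
        # tuple-style unpacking.
        yield self.name
        yield self.value


attr = Attribute('id', 42)
name, value = attr
```

A subclass that adds more slots would need to override `__len__` and `__iter__` as well, or compute them by walking the class hierarchy the way `__getstate__` does.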
Finally, with PseudoStruct I can make certain fields constant/immutable, while leaving the rest mutable:
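class TypedAttribute(Attribute):
    # has slots for type, name, and value
    __slots__ = ['type']

class StringAttribute(TypedAttribute):
    # an empty __slots__ keeps instances free of a __dict__; without it,
    # assigning to type would silently succeed on the instance
    __slots__ = []
    # type field is now constant/immutable for all StringAttribute objects,
    # but name and value are still mutable
    type = 'str'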
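A short sketch of how the constant field behaves at runtime is given below. It assumes that the subclass declares its own empty `__slots__`: a subclass with no `__slots__` gains a `__dict__`, and assignments to `type` would then succeed. The `PseudoStructure` here is a trimmed stand-in for the recipe's class so that the block runs by itself.

```python
class PseudoStructure(object):
    """Trimmed stand-in for the recipe's base class (lazy defaults only)."""

    __slots__ = []

    def __getattr__(self, name):
        setattr(self, name, None)


class Attribute(PseudoStructure):
    __slots__ = ['name', 'value']


class TypedAttribute(Attribute):
    __slots__ = ['type']


class StringAttribute(TypedAttribute):
    # Assumption: the empty __slots__ is what prevents a per-instance
    # __dict__ from being created.
    __slots__ = []
    # The class attribute shadows the inherited 'type' slot.
    type = 'str'


attr = StringAttribute()
attr.name = 'title'  # ordinary slots remain mutable

try:
    attr.type = 'int'  # the shadowed field rejects assignment
except AttributeError:
    outcome = 'rejected'
else:
    outcome = 'accepted'
```

One consequence to check before relying on this pattern: the recipe's `__setstate__` assigns every name that `__getstate__` returns, including `type`, so unpickling such an object would likely raise the same `AttributeError` unless `__setstate__` is overridden to skip the constant field.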
The latest version is available from BitBucket at https://bitbucket.org/mzipay/sandbox/src/tip/python/pseudostruct.