This is a recipe for a Python "data object." It is similar in function to namedtuple (http://code.activestate.com/recipes/500261/) and recordtype (http://code.activestate.com/recipes/576555-records/) in that it is a simple container for data, but is designed to meet three specific goals:

  1. Easy to subclass data objects.
  2. Get/set speed comparable to a simple class.
  3. Minimal memory consumption per instance.
class PseudoStructure(object):
    """A fast, small-footprint, structure-like type suitable for use as
    a data transfer or parameter object.

    This class is not intended to be used directly. Subclasses should
    define the fields they care about as slots and (optionally) define
    an initializer for those fields.

    This class is compatible with both Python2 and Python3 (tested on
    2.4-3.3).

    """

    __slots__ = []

    @classmethod
    def define(cls, class_name, *field_names):
        r"""Dynamically create a class definition for a subclass of
        ``PseudoStructure``.

        Note that objects of a dynamically created class cannot be
        pickled. If pickling support is needed, the subclass must have
        a static definition.

        >>> X = PseudoStructure.define('Attribute', 'name', 'value')
        >>> X.__name__
        'Attribute'
        >>> issubclass(X, PseudoStructure)
        True
        >>> X.__slots__
        ('name', 'value')

        """
        return type(class_name, (cls,), {"__slots__": field_names})

    def __getattr__(self, name):
        r"""Lazily initialize the value for *name*.

        >>> class Attribute(PseudoStructure):
        ...     __slots__ = ['name', 'value']
        ...
        >>> attr = Attribute()
        >>> attr.name is None
        True
        >>> attr.description is None
        Traceback (most recent call last):
        ...
        AttributeError: 'Attribute' object has no attribute 'description'

        """
        # only called once, if a value has not yet been assigned to the slot;
        # returns None as a reasonable default; if the name is not a slot,
        # raises AttributeError as expected
        setattr(self, name, None)

    def __getstate__(self):
        r"""Build the state object for pickling this object.

        >>> class Attribute(PseudoStructure):
        ...     __slots__ = ['name', 'value']
        ...
        >>> attr = Attribute()
        >>> attr.name = 'test'
        >>> sorted(attr.__getstate__().items())
        [('name', 'test'), ('value', None)]

        """
        state = {}
        for cls in self.__class__.__mro__:
            slots = getattr(cls, "__slots__", [])
            if (isinstance(slots, str)):
                slots = [slots]
            for slot in slots:
                state[slot] = getattr(self, slot)
        return state

    def __setstate__(self, state):
        r"""Initialize this object from a pickling state object.

        >>> class Attribute(PseudoStructure):
        ...     __slots__ = ['name', 'value']
        ...
        >>> attr = Attribute()
        >>> attr.name = 'test'
        >>> state = attr.__getstate__()
        >>> attr2 = Attribute()
        >>> attr2.__setstate__(state)
        >>> attr2.name == 'test'
        True
        >>> attr2.value is None
        True

        """
        for (name, value) in state.items():
            setattr(self, name, value)

    def __eq__(self, other):
        r"""Return True if the internal states of *other* and self are
        equal.

        >>> class Sample(PseudoStructure):
        ...     __slots__ = ['value']
        ...
        >>> sample1 = Sample()
        >>> sample2 = Sample()
        >>> sample1 == sample2
        True
        >>> sample2.value = 'test'
        >>> sample1 == sample2
        False

        """
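        # comparing against an unrelated type defers to the other operand
        # instead of failing on a missing __getstate__
        if not isinstance(other, PseudoStructure):
            return NotImplemented
        return (self.__getstate__() == other.__getstate__())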

    def __hash__(self):
        r"""Return the hash of this pseudo-structure's internal state.

        >>> class Sample(PseudoStructure):
        ...     __slots__ = ['value']
        ...
        >>> sample1 = Sample()
        >>> sample2 = Sample()
        >>> hash(sample1) == hash(sample2)
        True
        >>> sample2.value = 'test'
        >>> hash(sample1) == hash(sample2)
        False

        """
        return hash(tuple(sorted(self.__getstate__().items())))

    def __repr__(self):
        r"""Return an unambiguous description of this object.

        >>> class Sample(PseudoStructure):
        ...     __slots__ = ['value']
        ...
        >>> sample = Sample()
        >>> repr(sample)
        "<Sample {'value': None}>"

        """
        return "<%s %r>" % (self.__class__.__name__, self.__getstate__())


if (__name__ == "__main__"):
    import doctest
    import pickle

    doctest.testmod()

    class Test(PseudoStructure):
        __slots__ = ["name"]

    x = Test()
    x.name = "pickle test"
    y = pickle.loads(pickle.dumps(x))
    assert x == y

I created PseudoStruct (the PseudoStructure class above) after noticing that a project I was working on ran slower than I expected and consumed far more memory than I was comfortable with.

I had initially used plain classes for my data objects, but potentially thousands of those objects were active at any given time. I needed a data object that met the following criteria:

  1. Must be mutable.
  2. Easy to define subclasses of data object types.
  3. Get/set speeds comparable to a simple class object.
  4. Memory consumption must be (significantly) less than simple class objects.
  5. Can be pickled.
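
The memory criterion can be checked roughly with a sketch like the one below. It compares an ordinary class with a slotted one; the names PlainAttribute and SlottedAttribute are hypothetical and not part of the recipe, and the byte counts are specific to the interpreter and version, so treat the numbers as indicative only.

```python
import sys


class PlainAttribute(object):
    """Ordinary class: every instance carries its own __dict__."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


class SlottedAttribute(object):
    """Slotted class: fields live in fixed slots, with no per-instance __dict__."""

    __slots__ = ["name", "value"]

    def __init__(self, name, value):
        self.name = name
        self.value = value


plain = PlainAttribute("id", 42)
slotted = SlottedAttribute("id", 42)

# getsizeof() is shallow, so the plain instance's __dict__ is counted separately.
plain_bytes = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slotted_bytes = sys.getsizeof(slotted)

print("plain:   %d bytes" % plain_bytes)
print("slotted: %d bytes" % slotted_bytes)
```

With thousands of live objects, the per-instance difference is what adds up; the absolute figures matter less than the ratio.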

Namedtuple instances are not mutable, so I needed to use a different approach. Recordtype has all of the required performance characteristics and can be pickled, but like namedtuple it is not a straightforward matter to subclass a recordtype. (If not for that requirement, I would have chosen recordtype.)

PseudoStruct satisfies all performance characteristics, is pickleable, and is easy to subclass. For example:

class Attribute(PseudoStructure):
    __slots__ = ['name', 'value']

class RichAttribute(Attribute):
    # has slots for name, value, and description
    __slots__ = ['description']

Like namedtuple and recordtype, PseudoStruct doesn't allow arbitrary fields to be created:

class Attribute(PseudoStructure):
    __slots__ = ['name', 'value']

attr = Attribute()
attr.arbitrary = 'not allowed'
Traceback (most recent call last):
...
AttributeError: 'Attribute' object has no attribute 'arbitrary'

Unlike namedtuple and recordtype, PseudoStruct does not define an __init__ method by default (nor convenience methods like __len__ and __iter__). However, it is trivial to add these methods in a subclass.
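
As a rough illustration of adding those methods, the sketch below gives a subclass an initializer plus __iter__ and __len__. The PseudoStructure shown here is a condensed stand-in for the recipe's class (only the lazy None default is kept) so that the example runs on its own; the method bodies are one possible choice, not something the recipe prescribes.

```python
class PseudoStructure(object):
    """Condensed stand-in for the recipe's base class (lazy None default only)."""

    __slots__ = []

    def __getattr__(self, name):
        # Reached only when a slot is unset; raises AttributeError for non-slots.
        setattr(self, name, None)


class Attribute(PseudoStructure):
    __slots__ = ["name", "value"]

    def __init__(self, name=None, value=None):
        self.name = name
        self.value = value

    def __iter__(self):
        # Yield field values in slot order, much like iterating a namedtuple.
        for slot in self.__slots__:
            yield getattr(self, slot)

    def __len__(self):
        return len(self.__slots__)


attr = Attribute("color", "red")
name, value = attr  # unpacking relies on __iter__
```

Note that self.__slots__ only lists the slots declared on the most-derived class, so a subclass that adds fields would need to walk the MRO (as __getstate__ does in the recipe) to iterate over inherited fields as well.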

Finally, with PseudoStruct I can make certain fields constant/immutable, while leaving the rest mutable:

class TypedAttribute(Attribute):
    # has slots for type, name, and value
    __slots__ = ['type']

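class StringAttribute(TypedAttribute):
    # an empty __slots__ keeps instances free of a __dict__; without it,
    # assigning to type would silently succeed on the instance
    __slots__ = []
    # type field is now constant/immutable for all StringAttribute objects,
    # but name and value are still mutable
    type = 'str'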
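
The constant-field behavior can be demonstrated with a short sketch. The PseudoStructure below is again a condensed stand-in for the recipe's class so the example is self-contained. It assumes StringAttribute declares an empty __slots__: a subclass with no __slots__ at all gains a per-instance __dict__, and the assignment to type would then succeed instead of failing.

```python
class PseudoStructure(object):
    """Condensed stand-in for the recipe's base class (lazy None default only)."""

    __slots__ = []

    def __getattr__(self, name):
        # Reached only when a slot is unset; raises AttributeError for non-slots.
        setattr(self, name, None)


class Attribute(PseudoStructure):
    __slots__ = ["name", "value"]


class TypedAttribute(Attribute):
    __slots__ = ["type"]


class StringAttribute(TypedAttribute):
    __slots__ = []  # no new fields, and no per-instance __dict__
    type = "str"    # class attribute shadows the inherited 'type' slot


attr = StringAttribute()
attr.name = "title"  # ordinary slots remain mutable

try:
    attr.type = "int"  # the shadowing class attribute makes this read-only
    type_is_constant = False
except AttributeError:
    type_is_constant = True

print(attr.type, type_is_constant)
```

The assignment fails because attribute lookup finds the plain class attribute before the slot descriptor, and the instance has no __dict__ to fall back on.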

1 comment

Matthew Zipay (author) 11 years, 4 months ago

The latest version is available from BitBucket at https://bitbucket.org/mzipay/sandbox/src/tip/python/pseudostruct.