
Fast, lightweight attribute-style access to tuples.

Python, 147 lines
from operator import itemgetter as _itemgetter
from keyword import iskeyword as _iskeyword
import sys as _sys

def namedtuple(typename, field_names, verbose=False, rename=False):
    """Returns a new subclass of tuple with named fields.

    >>> Point = namedtuple('Point', 'x y')
    >>> Point.__doc__                   # docstring for the new class
    'Point(x, y)'
    >>> p = Point(11, y=22)             # instantiate with positional args or keywords
    >>> p[0] + p[1]                     # indexable like a plain tuple
    33
    >>> x, y = p                        # unpack like a regular tuple
    >>> x, y
    (11, 22)
    >>> p.x + p.y                       # fields also accessible by name
    33
    >>> d = p._asdict()                 # convert to a dictionary
    >>> d['x']
    11
    >>> Point(**d)                      # convert from a dictionary
    Point(x=11, y=22)
    >>> p._replace(x=100)               # _replace() is like str.replace() but targets named fields
    Point(x=100, y=22)

    """

    # Parse and validate the field names.  Validation serves two purposes,
    # generating informative error messages and preventing template injection attacks.
    if isinstance(field_names, basestring):
        field_names = field_names.replace(',', ' ').split() # names separated by whitespace and/or commas
    field_names = tuple(map(str, field_names))
    if rename:
        names = list(field_names)
        seen = set()
        for i, name in enumerate(names):
            # Test for an empty name first: min() raises on an empty sequence.
            if (not name or not min(c.isalnum() or c=='_' for c in name)
                or _iskeyword(name) or name[0].isdigit() or name.startswith('_')
                or name in seen):
                    names[i] = '_%d' % i
            seen.add(name)
        field_names = tuple(names)
    for name in (typename,) + field_names:
        if not min(c.isalnum() or c=='_' for c in name):
            raise ValueError('Type names and field names can only contain alphanumeric characters and underscores: %r' % name)
        if _iskeyword(name):
            raise ValueError('Type names and field names cannot be a keyword: %r' % name)
        if name[0].isdigit():
            raise ValueError('Type names and field names cannot start with a number: %r' % name)
    seen_names = set()
    for name in field_names:
        if name.startswith('_') and not rename:
            raise ValueError('Field names cannot start with an underscore: %r' % name)
        if name in seen_names:
            raise ValueError('Encountered duplicate field name: %r' % name)
        seen_names.add(name)

    # Create and fill-in the class template
    numfields = len(field_names)
    argtxt = repr(field_names).replace("'", "")[1:-1]   # tuple repr without parens or quotes
    reprtxt = ', '.join('%s=%%r' % name for name in field_names)
    template = '''class %(typename)s(tuple):
        '%(typename)s(%(argtxt)s)' \n
        __slots__ = () \n
        _fields = %(field_names)r \n
        def __new__(_cls, %(argtxt)s):
            return _tuple.__new__(_cls, (%(argtxt)s)) \n
        @classmethod
        def _make(cls, iterable, new=tuple.__new__, len=len):
            'Make a new %(typename)s object from a sequence or iterable'
            result = new(cls, iterable)
            if len(result) != %(numfields)d:
                raise TypeError('Expected %(numfields)d arguments, got %%d' %% len(result))
            return result \n
        def __repr__(self):
            return '%(typename)s(%(reprtxt)s)' %% self \n
        def _asdict(self):
            'Return a new dict which maps field names to their values'
            return dict(zip(self._fields, self)) \n
        def _replace(_self, **kwds):
            'Return a new %(typename)s object replacing specified fields with new values'
            result = _self._make(map(kwds.pop, %(field_names)r, _self))
            if kwds:
                raise ValueError('Got unexpected field names: %%r' %% kwds.keys())
            return result \n
        def __getnewargs__(self):
            return tuple(self) \n\n''' % locals()
    for i, name in enumerate(field_names):
        template += '        %s = _property(_itemgetter(%d))\n' % (name, i)
    if verbose:
        print template

    # Execute the template string in a temporary namespace
    namespace = dict(_itemgetter=_itemgetter, __name__='namedtuple_%s' % typename,
                     _property=property, _tuple=tuple)
    try:
        exec template in namespace
    except SyntaxError, e:
        raise SyntaxError(e.message + ':\n' + template)
    result = namespace[typename]

    # For pickling to work, the __module__ variable needs to be set to the frame
    # where the named tuple is created.  Bypass this step in environments where
    # sys._getframe is not defined (Jython for example) or sys._getframe is not
    # defined for arguments greater than 0 (IronPython).
    try:
        result.__module__ = _sys._getframe(1).f_globals.get('__name__', '__main__')
    except (AttributeError, ValueError):
        pass

    return result






if __name__ == '__main__':
    # verify that instances can be pickled
    from cPickle import loads, dumps
    Point = namedtuple('Point', 'x, y', True)
    p = Point(x=10, y=20)
    assert p == loads(dumps(p, -1))

    # test and demonstrate ability to override methods
    class Point(namedtuple('Point', 'x y')):
        @property
        def hypot(self):
            return (self.x ** 2 + self.y ** 2) ** 0.5
        def __str__(self):
            return 'Point: x=%6.3f y=%6.3f hypot=%6.3f' % (self.x, self.y, self.hypot)

    for p in Point(3,4), Point(14,5), Point(9./7,6):
        print p

    class Point(namedtuple('Point', 'x y')):
        'Point class with optimized _make() and _replace() without error-checking'
        _make = classmethod(tuple.__new__)
        def _replace(self, _map=map, **kwds):
            return self._make(_map(kwds.get, ('x', 'y'), self))

    print Point(11, 22)._replace(x=100)

    import doctest
    TestResults = namedtuple('TestResults', 'failed attempted')
    print TestResults(*doctest.testmod())

There has long been a need for named access to fields in records stored as tuples. In response, people have crafted many different versions of this recipe. I've combined the best of their approaches with a few ideas of my own. The resulting recipe has been well received, so it was proposed and accepted for inclusion in the collections module for Py2.6.

Docs and examples for the module can be found on the bottom of the page at: http://docs.python.org/dev/library/collections.html#namedtuple-factory-function-for-tuples-with-named-fields

The principal features are:

  • Easy to type/read/modify function signature: namedtuple('Person', 'name age sex height nationality').

  • C-speed attribute lookup using property() and itemgetter().

  • Named tuples have no instance dictionary, so their instances take no more space than a regular tuple (for example, casting thousands of SQL records to named tuples has zero memory overhead).

  • Nice docstring is helpful with an editor's tooltips.

  • Optional keywords in the constructor for readability and to allow arguments to be specified in arbitrary order: Person(name='susan', height=60, nationality='english', sex='f', age=30).

  • Key/Value style repr for clearer error messages and for usability at the interactive prompt.

  • Named tuples can be pickled.

  • Clean error messages for missing or misnamed arguments.

  • A method similar to str.replace() using a field name. The _replace() method is used instead of slicing for updating fields. For example, use t._replace(f=newval) instead of tuple concatenations like t[:2]+(newval,)+t[3:].

  • A _fields attribute exposes the field names for introspection.

  • An _asdict() method for converting to an equivalent dictionary. If desired an ordered dictionary can be substituted for the dict constructor.

  • Recipe runs on Py2.4 or later.

  • Option for automatic renaming of invalid field names to positional names (to support cases where field names are supplied externally).
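Several of the features listed above can be tried in a few lines. This is a minimal sketch that assumes Python 3 and the standard-library collections.namedtuple rather than the recipe code itself; the Person fields come from the examples above, while the Row type and its field names are made up for illustration.

```python
from collections import namedtuple

Person = namedtuple('Person', 'name age sex height nationality')

# Keyword arguments may be given in any order.
susan = Person(name='susan', height=60, nationality='english', sex='f', age=30)

print(susan)                      # key/value style repr
print(Person._fields)             # field names, for introspection
older = susan._replace(age=31)    # returns a new tuple; susan is unchanged

# rename=True swaps invalid or duplicate names for positional ones.
# 'class' is a keyword and the second 'id' is a duplicate (hypothetical input).
Row = namedtuple('Row', ['id', 'class', 'id'], rename=True)
print(Row._fields)
```

Instances still behave like plain tuples, so susan[1] and susan.age refer to the same value.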

Note, the idea for the __module__ attribute was suggested by Michele Simionato and Louis Riviere. That attribute makes pickling possible, and it lets help() correctly identify the module where the named tuple was defined.

Thanks to Peter Kovac for pointing out deficiencies in the keyword argument checking. Because of his comments, the recipe has evolved to its current exec-style where we get all of Python's high-speed builtin argument checking for free. The new style of building and exec-ing a template made both the __new__ and __repr__ functions faster and cleaner than in previous versions of this recipe.
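The template-and-exec idea can be illustrated with a much smaller sketch. The function tiny_namedtuple and the helper rejects below are hypothetical names invented for this illustration, written for Python 3; they leave out most of what the recipe does (no _make, _replace, repr or pickling support). The point is only that an explicit __new__ signature generated from the field names lets Python itself reject missing, extra, or misnamed arguments.

```python
from keyword import iskeyword
from operator import itemgetter

def tiny_namedtuple(typename, field_names):
    """Hypothetical, stripped-down illustration; not the recipe itself."""
    fields = tuple(field_names.replace(',', ' ').split())
    if not fields:
        raise ValueError('at least one field name is required')
    # Validate before building source text, so nothing unexpected is exec'd.
    for name in (typename,) + fields:
        if not name.isidentifier() or iskeyword(name) or name.startswith('_'):
            raise ValueError('invalid name: %r' % name)
    argtxt = ', '.join(fields)
    template = (
        'class %(typename)s(tuple):\n'
        '    __slots__ = ()\n'
        '    def __new__(_cls, %(argtxt)s):\n'
        '        return _tuple.__new__(_cls, (%(argtxt)s,))\n'
    ) % {'typename': typename, 'argtxt': argtxt}
    for i, name in enumerate(fields):
        template += '    %s = _property(_itemgetter(%d))\n' % (name, i)
    # Underscore-prefixed helpers cannot clash with the validated field names.
    namespace = {'_itemgetter': itemgetter, '_property': property, '_tuple': tuple}
    exec(template, namespace)
    return namespace[typename]

Pair = tiny_namedtuple('Pair', 'left right')
pair = Pair(1, right=2)

def rejects(*args, **kwds):
    """Return True if Pair refuses the call with a TypeError."""
    try:
        Pair(*args, **kwds)
    except TypeError:
        return True
    return False

print(rejects(1), rejects(1, 2, 3), rejects(1, bogus=2))
```

No argument-checking code was written by hand here; the TypeError in each rejected call comes from the ordinary function-call machinery.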

At the suggestion of Martin Blais and Robin Becker, the field name spec can be either a tuple or a string. When the field names are coming from a CSV header or other automatically generated source, it is simplest to pass in a tuple of fieldnames. When the field names are already known and written-out explicitly, the string form is best because it is easier to type, easier to read, and easier to rearrange when necessary. The string form works especially well with SQL use cases because the string can be cut-and-pasted from the field spec portion of an SQL query.
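The two ways of supplying field names can be compared side by side. This sketch assumes Python 3 and the standard-library collections.namedtuple; the CSV text, the Record type, and its column names are invented for the example.

```python
import csv
import io
from collections import namedtuple

# Hypothetical CSV data standing in for a real file.
data = io.StringIO('name,age,city\nsusan,30,London\nraj,41,Pune\n')
reader = csv.reader(data)

# Field names from an automatically generated source: pass the sequence as-is.
Record = namedtuple('Record', next(reader))
records = [Record(*row) for row in reader]

# Field names known in advance: the string form is easier to type and rearrange.
Point = namedtuple('Point', 'x, y')

print(records[0].name, records[1].city)
```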

The function signature with the typename separate from the field names was suggested by Louis Riviere and Jim Jewett.

The idea to use keyword arguments for _replace() was inspired by Issac Morlund's suggestion to be able to do multiple replacements in one pass.
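As a small illustration of multiple replacements in one pass, assuming Python 3 and the standard-library collections.namedtuple (Point3D is a made-up type for this example):

```python
from collections import namedtuple

Point3D = namedtuple('Point3D', 'x y z')
p = Point3D(1, 2, 3)

# Two fields replaced in a single call; p itself is left untouched.
q = p._replace(x=10, z=30)
print(p, q)
```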

The inspiration for the _make() classmethod came from Robin Becker and Giovanni Bajo who pointed out an important class of use cases where existing sequences need to be cast to named tuples.
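A short sketch of that use case, assuming Python 3 and the standard-library collections.namedtuple. The Employee type and the rows are hypothetical; in practice the rows might come from a database cursor or a csv.reader.

```python
from collections import namedtuple

Employee = namedtuple('Employee', 'name age')

# Hypothetical rows; any sequence or iterable of the right length will do.
rows = [('susan', 30), ['raj', 41]]
employees = [Employee._make(row) for row in rows]

# _make() checks the length of what it is given.
try:
    Employee._make(('only-a-name',))
    length_checked = False
except TypeError:
    length_checked = True

print(employees, length_checked)
```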

The idea for the _fields attribute was suggested by Robin Becker.

The idea for the _asdict() method was inspired by the thought that any class with name/value pairs is conceptually a mapping. Accordingly, instances of that class should be readily convertible to and from a dict.
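The round trip to and from a mapping might look like the following, assuming Python 3 and the standard-library collections.namedtuple (the exact mapping type returned by _asdict() varies between Python versions, so the sketch relies only on it behaving like a dict):

```python
from collections import namedtuple

Point = namedtuple('Point', 'x y')
p = Point(11, 22)

d = p._asdict()   # mapping of field name -> value
d['y'] = 99       # the mapping is mutable; the tuple is not
q = Point(**d)    # and back to a named tuple
print(p, q)
```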

21 comments

Yuce Tekol 17 years, 3 months ago  # | flag

IMO, instead of:

>> Person = NamedTuple("Person x y")

this one seems more pythonic:

>> Person = NamedTuple(x=0, y=0)

But of course, then the NamedTuple function should name the generated class automatically.

Also, if we really require the generated class to have the name we want, the function can have a kwarg to set the class's name.

>> Person = NamedTuple(x=0, y=0, name="Person")
Giovanni Bajo 17 years, 2 months ago  # | flag

Constructor should take an iterable. I suggest the generated __new__ should take keyword arguments or an iterable in its only supported positional argument. This matches behaviour with tuple(), list() and other containers.

Louis RIVIERE 17 years, 2 months ago  # | flag

Syntax proposal. How about:

def NamedTuple(typename, s):
Louis RIVIERE 17 years, 2 months ago  # | flag

Pickling error.

>>> from cPickle import dumps
>>> dumps(p)

added to the doctest fails.

Robin Becker 17 years, 2 months ago  # | flag

some minor mods. I like this, but :)

1) allow the fields to be defined directly rather than by split

ie

def NamedTuple(typename, field_names):

.......
    if isinstance(field_names,str):
        field_names = field_names.split()

.....

2) allow a constructor __from_iterable__ and a property __field_names__

ie

    def __from_iterable__(cls,arg):
        return cls.__new__(cls,*arg)

......
    m.update(
    ........
    __field_names__ = tuple(field_names),
    __from_iterable__=classmethod(__from_iterable__),
    )
Steve Anderson 17 years, 1 month ago  # | flag

Possible Mod? I really wanted to be able to take a tuple or list like Giovanni's suggestion, so I built upon Robin's suggestions and came up with this: (I'm not sure if it breaks any semantics or protocols you were trying to preserve)

def NamedTuple(typename, field_names):
    if isinstance(field_names,str):
        field_names = field_names.split()
    nargs = len(field_names)

    def __new__(cls, *args, **kwds):
        if (len(args) == 1) and (getattr(args[0], '__iter__', False)) and (isinstance(args[0][0], str)):
            args = tuple(name for name in args[0])
        if kwds:
            try:
                args += tuple(kwds[name] for name in field_names[len(args):])
            except KeyError, name:
                raise TypeError('%s missing required argument: %s' % (typename, name))
        if len(args) != nargs:
            raise TypeError('%s takes exactly %d arguments (%d given)' % (typename, nargs, len(args)))
        return tuple.__new__(cls, args)

    repr_template = '%s(%s)' % (typename, ', '.join('%s=%%r' % name for name in field_names))

    m = dict(vars(tuple))       # pre-lookup superclass methods (for faster lookup)
    m.update(__doc__= '%s(%s)' % (typename, ', '.join(field_names)),
             __slots__ = (),    # no per-instance dict (so instances are same size as tuples)
             __new__ = __new__,
             __repr__ = lambda self, _format=repr_template.__mod__: _format(self),
             __module__ = sys._getframe(1).f_globals['__name__'],
             __field_names__ = tuple(field_names),
             __from_iterable__=classmethod(__from_iterable__),
             )
    m.update((name, property(itemgetter(index))) for index, name in enumerate(field_names))

    return type(typename, (tuple,), m)
Steve Anderson 17 years, 1 month ago  # | flag

D'oh! Should have been just:

if (len(args) == 1) and (getattr(args[0], '__iter__', False)):
Louis RIVIERE 16 years, 11 months ago  # | flag

Little optimisation.

template += '\n'.join('\t%s = property(itemgetter(%d))' % (name, i) for i, name in enumerate(field_names))

This may be worth considering if many Nuples and/or many fields are needed.

The readability may suffer for it though ...

Ray Heasman 16 years, 8 months ago  # | flag

Variant that works in Python 2.4 and is Psyco-optimizable. This is a neat recipe. I need it to work in Python 2.4, so I modified __replace__. Also, I use Psyco extensively, and it optimizes list comprehensions well, but has a performance penalty for generators. Hence I changed the generator in __replace__ to a list comprehension.

Psyco can't optimize anything that looks like a closure, so I added another method __frep__ for "field replace" to be used in __replace__, rather than making it a sub-function of __replace__.

Here are the changed lines for:

def __frep__(self, a, field, value):
    if a==field:
        return value
    else:
        return getattr(self, a)
def __replace__(self, field, value):
    return %(typename)s(**dict([(a, self.__frep__(a, field, value)) for a in %(arglist)s]))
George Sakkis 16 years, 3 months ago  # | flag

Unpickling in different process. Say that you have a pickled Point instance as in the example. Is it possible to unpickle it in a different process that doesn't know about the Point class yet ? In other words, can unpickling somehow automatically run "Point = namedtuple('Point', 'x, y', True)" if there is no global Point class already ?

François Petitjean 16 years, 2 months ago  # | flag

change verbose=False to out=None. And in the code, instead of

    if verbose:
        print template

write

    if out:
        out.write(template)
        out.write('\n')

The beginning of the main part becomes:

if __name__ == '__main__':
    # verify that instances can be pickled
    from cPickle import loads, dumps
    from StringIO import StringIO
    out = StringIO()
    Point = namedtuple('Point', 'x, y', out)
    print out.getvalue()
    p = Point(x=10, y=20)

No module of the standard library should include raw print statements
(and there is no longer a print statement in Python 3.x).
If this modification is added to the implementation of namedtuple in Python 2.6, do not forget to update the documentation page.

Thank you for a very good recipe.

Andreas Nilsson 15 years, 7 months ago  # | flag

At the risk of being accused of painting bikesheds. From the discussion above: "There has long been a need for named access to fields in records stored as tuples." The fact that it inherits from / is stored as a tuple is an implementation detail, so why not call it what it is (a record)? You wouldn't call a Square ShapeWithStraightEdgesAndEqualSides, would you?

Michael Foord 15 years, 7 months ago  # | flag

This recipe works fine with IronPython 2 Beta 5, with one minor modification (there was an obscure bug in IronPython 2 Beta 4 preventing it from working).

In IronPython 'sys._getframe' exists but throws a NotImplementedError if you try and use it. The following would work:

if hasattr(_sys, '_getframe') and _sys.platform != 'cli':

Michael

martin goodson 15 years, 5 months ago  # | flag

I'm using this (great) recipe. If I define the namedtuple in a function, is there any way to work around the 'top-level' pickle constraint so that I can pickle the instances of the namedtuple?

Thanks a lot, Martin

Raymond Hettinger (author) 15 years, 2 months ago  # | flag

Martin, I don't see any way around the top-level pickle constraint. This issue isn't unique to namedtuples. It occurs anytime you have a class definition in a non-top-level namespace. The unpickler needs some way to create an instance of a class and it can't do that unless it has visibility to that class.

Y S 15 years, 1 month ago  # | flag

decorator style

def _namedtuple(func):
    return namedtuple(func.__name__, func.__code__.co_varnames)

@_namedtuple
def Point(x,y):
    pass

Nice.

Creative use of decorators.

Scott S-Allen 11 years, 11 months ago  # | flag

Sorry for waking an old, sleepy, thread but I was wondering why a template construct such as this:

def __getattr__(self, n):
    try:
        return self[n]
    except TypeError:
        return self[self._fields.index(n)]

Would not have suitably replaced the property-addition iterator?

for i, name in enumerate(field_names):
    template += '        %s = _property(_itemgetter(%d))\n' % (name, i)

It is as much my curiosity regarding potential breakage of something I have yet to comprehend (as I have used this elsewhere) and an offer of "hey, this works for 2.7, and seems a tad cleaner to me".

Regardless, I have learned a lot reading Mr. H's fine contributions and the recipe above is no exception!

PS. One of the other comments seems to break the spirit of this by introducing a dictionary back into the mix. My first look at __slots__ briefly fell off that wagon... thankfully it wasn't moving.

Martin Miller 11 years, 10 months ago  # | flag

@Scott S-Allen:

So what exactly would the advantage be of doing it the way you suggest? I don't think the usual EAFP vs LBYL debate applies here.

I can't speak for the author, of course, but personally prefer code designs where exceptions are for handling exceptional cases, not the rule -- even if handling them is relatively cheap performance-wise, in Python.

Tony Flury 11 years, 5 months ago  # | flag

I have to admit I am not as experienced as some on this site in using Python, but I am struggling to understand the point of this recipe, and what problem it might solve.

Reading through, the code is very clever (generating source on the fly for a new class, etc.), but why?

A data construct with named attributes: isn't that just a dictionary? I can't for the life of me work out a situation where I would need this. If I need a tuple then I can create one, and in all the code I have written I have never needed to slice or dice them, or needed to access them as named fields. If I ever need to, I know I can:

a,b,c,d = <original_tuple>
...some code modifying a,b,c & d.....
<new_tuple> = (a,b,c,d)
Brian Hendricks 10 years, 1 month ago  # | flag

Great code with one minor issue. When run as the main module the script runs doctest. Other than printing the results, the result is ignored. Continuous integration tools like Jenkins use the exit status to determine failure or success. Replacing the print statement in the last line with the following code sets the exit status on failure.

results = TestResults(*doctest.testmod())
print results
if results.failed != 0:
    _sys.exit(1)
_sys.exit(0)