Welcome, guest | Sign In | My Account | Store | Cart

Greg Stein's dtuple module deserves wider recognition. This recipe shows you how to use it with a simple list of database field names.

Python, 24 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
import dtuple
import mx.ODBC.Windows as odbc

flist = ["Name", "Num", "LinkText"]

descr = dtuple.TupleDescriptor([[n] for n in flist])

conn = odbc.connect("HoldenWebSQL") # Connect to a database
curs = conn.cursor()                # Create a cursor

sql = """SELECT %s FROM StdPage
            WHERE PageSet='Std' AND Num<25
            ORDER BY PageSet, Num""" % ", ".join(flist)
print sql
curs.execute(sql)
rows = curs.fetchall()

for row in rows:
    row = dtuple.DatabaseTuple(descr, row)
    print "Attribute: Name: %s Number: %d" % (row.Name, row.Num or 0)
    print "Subscript: Name: %s Number: %d" % (row[0], row[1] or 0)
    print "Mapping:   Name: %s Number: %d" % (row["Name"], row["Num"] or 0)

conn.close()

Novice Python programmers are often deterred from using databases because query results are presented by DB API-compliant modules as a list of tuples. Since these can only be numerically subscripted, code that makes use of the query results becomes opaque and difficult to maintain. Greg Stein's dtuple module -- available from http://www.lyra.org/greg/python/dtuple.py -- really helps by defining two classes, TupleDescriptor and DatabaseTuple. The TupleDescriptor creates a description of tuples from a list of sequences, the first element of which is a column name. It is often convenient to describe data with such sequences: for example in an interactive or forms-based application each column name might be followed by validation parameters such as datatype and allowable length. The TupleDescriptor's purpose is to allow the creation of DatabaseTuple objects (see below). In this particular application no other information is needed about the columns, so the required list of sequences is constructed from a list of field names using a list comprehension. Created from a TupleDescriptor and a tuple such as a database row, the DatabaseTuple is an object whose elements can be accessed by numeric subscript (like a tuple) or column name subscript (like a dictionary). If column names are legal Python names, you can also access the columns in your DatabaseTuple as attributes. To demonstrate the utility of the DatabaseTuple the simple test program above creates a TupleDescriptor and then uses it to convert each row retrieved from a SQL query into a DatabaseTuple. Because the sample uses the same field list to build both the TupleDescriptor and the SQL SELECT statement, it demonstrates how database code can be relatively easily parameterized. The program gives the results shown below: <pre> SELECT Name, Num, LinkText FROM StdPage WHERE PageSet='Std' AND Num<25 ORDER BY PageSet, Num Attribute: Name: consult Number: 10 Subscript: Name: consult Number: 10 Mapping: Name: consult Number: 10 Attribute: Name: python Number: 12 Subscript: Name: python Number: 12 Mapping: Name: python Number: 12 Attribute: Name: expertise Number: 20 Subscript: Name: expertise Number: 20 Mapping: Name: expertise Number: 20 </pre>

5 comments

Hamish Lawson 22 years, 2 months ago  # | flag

Fieldnames could be obtained from cursor. The fieldnames could be obtained dynamically from the cursor rather than being explicitly specified:

curs.execute(sql)
descr = dtuple.TupleDescriptor(curs.description)
flist = [d[0] for d in cursor.description]
Kevin Jacobs 21 years, 4 months ago  # | flag

A more general and higher performance alternative exists... The OPAL Group (my company) has made available a more general and higher performance result set implementation that does not allocate a Python dictionary per row returned from a query. It allows tuple-like, dict-like, and object-like access to fields. The implementation requires Python 2.2.1 or better, and includes both pure-Python and C extension module implementations. For more information see: http://opensource.theopalgroup.com/

Danny Adair 19 years, 6 months ago  # | flag

Here you are. I consider this lightweight: Take the list of tuples from fetchall(), the description from the cursor and create a dtuple.DatabaseTuple only when needed:

import dtuple

class DB_ROWS:
    def __init__(self, description = [], rows = []):
        self._descriptor = dtuple.TupleDescriptor([[f][0] for f in description])
        self._rows = rows

    def __getitem__(self, index):
        return dtuple.DatabaseTuple(self._descriptor, self._rows[index])

    def __len__(self):
        """Nice to have."""
        return len(self._rows)

For example:

connection = psycopg.connect(myDSN)
cursor = connection.cursor()
cursor.execute("""SELECT * FROM mytable""")
raw_rows = cursor.fetch_all()
description = cursor.description

rows = DB_ROWS(description, raw_rows)
print len(rows)
print rows[0].mycolumn

Btw, the column (field) names are in rows._descriptor.names

Greg Stein 18 years, 1 month ago  # | flag

some rationale. When I first started using databases from Python, we designed the DB-API to return data "as close to native" as possible. That made it easier for the extension developers, and made it faster by omitting (potentially) unnecessary work.

Next came ease-of-use at the application level: wanting to refer to a column as row.my_column_name. That didn't match well with the .fetchall() results. My first iteration produced a bunch of objects with attribute-based values.

Not fast.

Thus, the idea with dtuple is to provide a proxy object in between the accesses and the raw data. It stores just a couple references, and performs all of the mapping at access time.

The initial setup is fast, memory is low, and any "nice naming" cost is amortized over the entire duration of processing the rows (rather than up front).