
You need to store arbitrary objects in a PostgreSQL database without being intrusive to your classes (you don't want them to have to inherit from an 'Item' or 'Persistent' base class).

Python, 83 lines
from datetime import datetime
 
import psycopg2
from psycopg2.extensions import adapters, adapt

# Python 2.3 compatibility: provide sorted() if the builtin is missing
try:
    sorted
except NameError:
    def sorted(seq):
        seq = list(seq)
        seq.sort()
        return seq

# Here is the adapter for every object that we may ever need to 
# insert in the database. It receives the original object and does
# its job on that instance
class ObjectMapper(object):
    def __init__(self, orig):
        self.orig = orig
        self.tmp = {}
        self.items, self.fields = self._gatherState()
 
    def _gatherState(self):
        adaptee_name = self.orig.__class__.__name__
        fields = sorted([(field, getattr(self.orig, field))
                        for field in persistent_fields[adaptee_name]])
        items = []
        for item, value in fields:
            items.append(item)
        return items, fields
 
    def getTableName(self):
        return self.orig.__class__.__name__
 
    def getMappedValues(self):
        tmp = []
        for i in self.items:
            tmp.append("%%(%s)s"%i)
        return ", ".join(tmp)
 
    def getValuesDict(self):
        return dict(self.fields)
 
    def getFields(self):
        return self.items

    def generateInsert(self):
        qry = "INSERT INTO"
        qry += " " + self.getTableName() + " ("
        qry += ", ".join(self.getFields()) + ") VALUES ("
        qry += self.getMappedValues() + ")"
        return qry, self.getValuesDict()

# Here are the objects
class Album(object):    
    id = 0 
    def __init__(self):
        self.creation_time = datetime.now()
        self.album_id = self.id
        Album.id = Album.id + 1
        self.binary_data = buffer('12312312312121')
 
class Order(object):
    id = 0
    def __init__(self):
        self.items = ['rice', 'chocolate']
        self.price = 34
        self.order_id = self.id
        Order.id = Order.id + 1
 
adapters.update({Album: ObjectMapper, Order: ObjectMapper})
    
# Describe what is needed to save on each object
# This is actually just configuration, you can use xml with a parser if you
# like to have plenty of wasted CPU cycles ;P.
persistent_fields = {'Album': ['album_id', 'creation_time', 'binary_data'],
                     'Order': ['order_id', 'items', 'price']
                    }
 
print adapt(Album()).generateInsert()
print adapt(Album()).generateInsert()
print adapt(Album()).generateInsert()
print adapt(Order()).generateInsert()
print adapt(Order()).generateInsert()
print adapt(Order()).generateInsert()

Psycopg 2 has a great new feature: adaptation. The big thing about adaptation is that it enables the programmer to glue together most of the code out there without much difficulty.

This recipe focuses on a way to generate the SQL needed to insert completely new objects into a database. As you can see, the objects know nothing about the code that handles them; we simply list the fields to be saved for each class in the persistent_fields dict, as the sketch below shows.
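For example, here is a small, hedged sketch of how a third class could be wired in without touching it at all. The Customer class and its fields are made up for illustration and are not part of the recipe:

# Hypothetical class, e.g. coming from a third-party library we cannot modify.
class Customer(object):
    def __init__(self, name):
        self.customer_id = 1
        self.name = name

# Persistence is pure configuration: list the fields and register the mapper.
persistent_fields['Customer'] = ['customer_id', 'name']
adapters.update({Customer: ObjectMapper})

print adapt(Customer('Alice')).generateInsert()
# ('INSERT INTO Customer (customer_id, name) VALUES (%(customer_id)s, %(name)s)',
#  {'customer_id': 1, 'name': 'Alice'})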

The most important line of this recipe is this one: adapters.update({Album: ObjectMapper, Order: ObjectMapper})

In this line we notify the system that, when we call adapt with an Album instance as its argument, we want it to instantiate ObjectMapper, passing the Album instance as the argument (self.orig in the ObjectMapper class).

adapters is just a Python dict whose keys are the types we need to adapt from and whose values are the adapters that will adapt them to the wanted interface.

The output is something like this (for each call to generateInsert):

('INSERT INTO Album (album_id, binary_data, creation_time) VALUES (%(album_id)s, %(binary_data)s, %(creation_time)s)',
 {'binary_data': <read-only buffer ...>, 'creation_time': datetime.datetime(2004, 9, 10, 20, 48, 29, 633728), 'album_id': 1})

This is a (SQL_QUERY, FILLING_DICT) tuple, and all the quoting/converting work (from Python's datetime to PostgreSQL's timestamp, and from Python's buffer to PostgreSQL's bytea) is handled under the hood by the same adaptation process in psycopg2.
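In practice, the pair can be passed straight to a psycopg2 cursor, which performs the quoting at execution time. Continuing from the recipe code above, a minimal sketch (the connection parameters are assumptions, and the Album table is presumed to already exist):

conn = psycopg2.connect("dbname=test user=postgres")  # assumed connection parameters
cur = conn.cursor()

qry, values = adapt(Album()).generateInsert()
cur.execute(qry, values)   # psycopg2 adapts the datetime and the buffer here
conn.commit()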

Finally, notice that ObjectMapper works for both Album and Order instances without any glitches at all, and both classes could easily have come from closed-source libraries or C-coded ones (which are not easily modified), whereas a common pattern in today's ORMs and OODBs is to provide a base 'Persistent' object that already knows how to store itself in the database.

2 comments

Shavais Z 16 years, 1 month ago

Mapping arbitrary objects to a PostgreSQL database with psycopg2. Hello, I was an Oracle DBA for almost a decade, and have since continued to be involved in a lot of database application development on various platforms using various database engines. Here are a few notes from DBA-land..

This idea, I think, could work reasonably well in some situations, if a few things were done:

  1. This example needs to be modified to use stored plans (server-side prepared statements), so that the database doesn't have to reparse the SQL for every insert; a rough sketch follows after this list.

  2. The caller should always be sure to only commit transactions when it makes sense to commit them.. which is almost never after every insert. ("Autocommit" is an abomination - it kills performance not only for your application's users, but for everyone else using the database server, too. I'm not saying this code uses it, I'm just saying a caller should be sure they're not - it's an easy mistake to make / bad habit to fall into in the context of generated SQL.)

  3. A DBA or data architect should design database tables first, using an ERD tool, for example, so that the tables are normalized, or are in a star pattern, or are designed in a way that makes some sense relationally, in the context of the whole purpose or role of the particular database in the enterprise. Just dumping Python objects into a database, willy-nilly, is (obviously) a totally absurd idea. (Again, I'm not saying that the author proposes this, just saying it should definitely not be done, in case anyone was actually considering it.) Database tables and their primary keys, indexes, not-null constraints, foreign key constraints, defaults and so on and so forth (in the absence of any really comprehensive GUI front end for the purpose) should be created in SQL scripts, where they can be the most easily reviewed, maintained, and version-controlled.

  4. Then, for the purposes of interfacing with some very simple Python utility programs (maybe to allow the manual update of individual records of individual tables, for example) a set of Python objects could be created that make use of an SQL-generating adapter like this, such that they mirror the tables in the database. These modules should only be used for very basic views of the data that don't involve anything more complex than maybe a master-detail relationship between two tables in the most complex case. (Anything more complex than that should use customized SQL, not generated SQL.)
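Along the lines of points 1 and 2 (and, loosely, point 3), here is a rough, hedged sketch of how the recipe's generated values could be fed through a server-side prepared statement and committed once per batch instead of once per row. It reuses the recipe's Album class and adapter registration; the connection parameters, the DDL and the batch size are illustrative assumptions, not something the recipe prescribes:

import psycopg2

conn = psycopg2.connect("dbname=test user=postgres")   # assumed connection parameters
cur = conn.cursor()

# Point 3: the table comes from an explicit, reviewable DDL script.
cur.execute("""CREATE TABLE IF NOT EXISTS Album (
                   album_id      integer PRIMARY KEY,
                   creation_time timestamp NOT NULL,
                   binary_data   bytea)""")

# Point 1: prepare the INSERT once, so the server parses and plans it a single time.
cur.execute("PREPARE album_ins (integer, timestamp, bytea) AS "
            "INSERT INTO Album (album_id, creation_time, binary_data) "
            "VALUES ($1, $2, $3)")

# Point 2: many EXECUTEs, one commit for the whole batch.
albums = [Album() for i in range(100)]   # hypothetical batch
for album in albums:
    cur.execute("EXECUTE album_ins (%(album_id)s, %(creation_time)s, %(binary_data)s)",
                adapt(album).getValuesDict())
conn.commit()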

(comment continued...)

Shavais Z 16 years, 1 month ago

(...continued from previous comment)

Database application developers need to be aware that there is a well-known 80/20 rule when it comes to performance tuning. Only about 20% of the achievable gain comes from server and database engine tuning.. kernel tuning parameters, database engine tuning parameters, I/O layout designs and such. The other 80% comes from application tuning - designing tables (and their keys and indexes and such) better, and using better SQL. I couldn't begin to count the number of times I've helped a developer revamp their table design and their SQL, and made things run 10, 20, or sometimes even 100 times faster.
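As a small, hedged illustration of that kind of application tuning (the query, table and column names are assumptions, not something from the recipe), one way to see whether an index would help is to compare the plan before and after creating it:

import psycopg2

conn = psycopg2.connect("dbname=test user=postgres")   # assumed connection parameters
cur = conn.cursor()

# Look at the plan for a lookup the application runs constantly.
cur.execute("EXPLAIN ANALYZE SELECT * FROM Album WHERE creation_time > %s",
            ('2004-01-01',))
for (line,) in cur.fetchall():
    print line    # a sequential scan here usually hints at a missing index

# Add the index, then re-run the EXPLAIN to confirm an index scan is chosen.
cur.execute("CREATE INDEX album_creation_idx ON Album (creation_time)")
conn.commit()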

Relational database engines are incredible tools, and tuned, customized SQL is by far the best way to interact with one; however, there is some place for this kind of automation. Kudos to the author for coming up with a way of doing it that has the advantages they mentioned.