Welcome, guest | Sign In | My Account | Store | Cart

This recipe is a follow-up to #440653, which was easy to implement but very slow because the iteration required to read all the rows of a table

As suggested by Matteo Dell'Amico in a comment, it would be much better if we could write something like

query(r.name for r in plane_tbl if r.country == "France")

where the generator expression is first translated into an SQL select, so that the iteration on the instance of query only reads the rows selected by the SQL statement

The present recipe is an attempt to achieve this. The first problem is to get the source code of the generator expression. I use information from the stack frame to get the file name and the line number, then the tokenize module to read the elements of the generator expression in the source code

Then, to build the SQL statement, the source code must be parsed : this is done using the compiler package and "visitors" that walk the AST tree returned by compiler.parse and do operations on the nodes, depending on their type

Finally, once the SQL statement is built, the iteration on the query instance can start : for the first one, the SQL statement is executed ; then the iteration yields the selected rows one by one.

The items can be : - objects, with attribute names matching those in the generator expression, except that qualified names (table.name) are converted to table_name - dictionaries : the keys are the same as the attribute names above - lists

For instance : - iterating on query(name for r in plane_tbl) returns objects with an attribute name - iterating on query(r.name for r in plane_tbl) returns objects with an attribute r_name

This is because of iteration on tables which have the same field names

query((r.name,c.name) for r in plane_tbl for c in country_tbl if r.speed > 500 )

The type of the items is set by query.return_type = object, dict or list

Python, 310 lines
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
"""A wrapper around DBAPI-compliant databases to support iteration
and generator expression syntax for requests, instead of SQL

To get an iterator, initialize a connection to the database, then
set the cursor attribute of the query class to its cursor

Create an instance of Table for the tables you want to use

Then you can use the class query. You create an instance by passing
a generator expression as parameter. This instance translates the
generator expression in an SQL statement ; then you can iterate
on it to get the selected items as objects, dictionaries or lists

Supposing you call this module db_iterator.py, here is an example 
of use with sqlite :

    from pysqlite2 import dbapi2 as sqlite
    from db_iterator import query, Table

    conn = sqlite.connect('planes')
    query.cursor = conn.cursor()

    plane = Table()
    countries = Table()

    # all the items produced by iteration on query() are instances
    # of the Record class
    
    # simple requests
    # since no attribute of r is specified in the query, returns a list
    # of instances of Record with attributes matching all the field names
    print [ r.name for r in query(r for r in plane if r.country == 'France') ]
    
    # this request returns a list instances of Record with the attribute 
    # c_country (c.country with the . replaced by _)
    print [ country for country in query(c.country for c in countries 
            if c.continent == 'Europe') ]

    # request on two tables
    print [r.name for r in query (r for r in plane for c in countries 
            if r.country == c.country and c.continent == 'Europe')]

"""
import tokenize
import token
import compiler
import types

class ge_visitor:
    """Instances of ge_visitor are used as the visitor argument to 
    compiler.walk(tree,visitor) where tree is an AST tree built by
    compiler.parse
    The instance has a src attribute which looks like the source
    code from which the tree was built
    Only a few of the visitNodeType are implemented, those likely to appear
    in a database query. Can be easily extended
    """

    def __init__(self):
        self.src = ''

    def visitTuple(self,t):
        self.src += ','.join ( [ get_source(n) for n in t.nodes ])

    def visitList(self,t):
        self.src += ','.join ( [ get_source(n) for n in t.nodes ])

    def visitMul(self,t):
        self.src += '(%s)' %('*'.join([ get_source(n) for n in t]))

    def visitName(self,t):
        self.src += t.name

    def visitConst(self,t):
        if type(t.value) is str:
            # convert single quotes, SQL-style
            self.src += "'%s'" %t.value.replace("'","''")
        else:
            self.src += str(t.value)

    def visitAssName(self,t):
        self.src += t.name

    def visitGetattr(self,t):
        self.src += '%s.%s' %(get_source(t.expr),str(t.attrname))

    def visitGenExprFor(self,t):
        self.src += 'for %s in %s ' %(get_source(t.assign),
                get_source(t.iter))
        if t.ifs:
            self.src += ' if ' +''.join([ get_source(i) for i in t.ifs ])

    def visitGenExprIf(self,t):
        self.src += get_source(t.test)

    def visitCompare(self,t):
        compiler.walk(t.expr,self)
        self.src += ' '
        for o in t.ops:
            oper = o[0]
            if oper == '==':
                oper = '='
            self.src += oper + ' '
            compiler.walk(o[1],self)

    def visitAnd(self,t):
        self.src += '('
        self.src += ' AND '.join([ get_source(n) for n in t.nodes ])
        self.src+= ')'

    def visitOr(self,t):
        self.src += '('
        self.src += ' OR '.join([ get_source(n) for n in t.nodes ])
        self.src+= ')'

    def visitNot(self,t):
        self.src += '(NOT ' + get_source(t.expr) + ')'

def get_source(node):
    """Return the source code of the node, built by an instance of
    ge_visitor"""
    return compiler.walk(node,ge_visitor()).src

class genExprVisitor:
    """Visitor used to initialize GeneratorExpression objects
    Uses the visitor pattern. See the compiler.visitor module"""

    def __init__(self):
        self.GenExprs = []

    def visitGenExprInner(self,node):
        ge = GeneratorExpression()
        self.GenExprs.append(ge)
        for y in node.getChildren():
            if y.__class__ is compiler.ast.GenExprFor:
                ge.exprfor.append(y)
            else:
                ge.result = y

class GeneratorExpression:
    """A class for a Generator Expression"""
    def __init__(self):
        self.result = None
        self.exprfor = []
        
class Record(object):
    """A generic class for database records"""
    pass

class Table:
    """A basic iterable class to avoid syntax errors"""
    def __iter__(self):
        return self
    
class query:
    """Class used for database queries
    Instance is created with query(ge) where ge is a generator
    expression
    The __init__ method builds the SQL select expression matching the
    generator expression
    Iteration on the instance of query yields the items found by
    the SQL select, under the form specified by return_type : an object,
    a dictionary or a list"""

    cursor = None   # to be set to the cursor of the connection
    return_type = object    # can be set to dict or list

    def __init__(self,s):
        self._iterating = False # used in next()

        # First we must get the source code of the generator expression
        # I use an ugly hack with stack frame attributes and tokenize
        # If there's a cleaner and safer way, please tell me !
        readline = open(s.gi_frame.f_code.co_filename).readline
        first_line = s.gi_frame.f_code.co_firstlineno
        flag = False
        self.source = ''    # the source code
        for t in tokenize.generate_tokens(open(s.gi_frame.f_code.co_filename).readline):
            # check all tokens until the last parenthesis is closed
            t_type,t_string,(r_start,c_start),(r_end,c_end),line = t
            t_name = token.tok_name[t_type]
            if r_start == first_line:
                if t_name == 'NAME' and t_string=="query":
                    flag = True
                    res = t_string
                    start = 0 # number of parenthesis
                    continue
            if flag:
                self.source += ' '+t_string
                if t_name == 'OP':
                        if t_string=='(':
                            start += 1
                        elif t_string == ')':
                            start -= 1
                            if start == 0:
                                break
        # when the source has been found, build an AST tree from it
        ast = compiler.parse(self.source.strip())
        # use a visitor to find the generator expression(s) in the source
        visitor = genExprVisitor()
        compiler.walk(ast,visitor)
        # if there are nested generator expressions, it's too difficult
        # to handle : raise an exception
        if len(visitor.GenExprs)>1:
            raise Exception,'Invalid expression, found more ' \
                'than 1 generator expression'
        ge = visitor.GenExprs[0]
        self.sql = self.build_sql(ge)

    def build_sql(self,ge):
        """ Build the SQL select for the generator expression
        ge is an instance of GeneratorExpression
        The generator expression looks like
        (result) for x1 in table1 [ for x2 in table2] [ if condition ]
        It has 2 attributes :
        - result : an AST tree with the "result" part
        - exprfor : a list of AST trees, one for each "for ... in ..."
        """
        self.res = []
        if ge.result.__class__ is compiler.ast.Tuple:
            # more than one item in result
            self.res = ge.result.getChildren()
        else:
            self.res = [ge.result]
        results = [] # a list of strings = result part of the SQL expression
        for res in self.res:
            # a result can be a stand-alone name, or a "qualified" name,
            # with the table name first (table.field)
            if res.__class__ is compiler.ast.Name:
                results.append((res.name,None))
            elif res.__class__ is compiler.ast.Getattr:
                results.append((get_source(res.expr),res.attrname))
        self.results = results

        # "for x in y" produces an item in the dictionary recdefs :
        # recdef[x] = y
        recdefs = {}
        conditions = []
        for exprfor in ge.exprfor:
            recdefs[get_source(exprfor.assign)] = \
                get_source(exprfor.iter)
            if exprfor.ifs:
                # an AST tree for the condition
                conditions = exprfor.ifs

        # To build objects or dictionaries in the result set, we must
        # know the name of the fields in all the tables used in the
        # query. For this, make a simple select in each table and read
        # the information in cursor.description
        self.names={}
        for rec,table in recdefs.iteritems():
            self.cursor.execute('SELECT * FROM %s' %table)
            self.names[rec] = [ d[0] for d in self.cursor.description ]

        sql_res = [] # the way the field will appear in the SQL string
        rec_fields = [] # the name of the fields in the object or dictionary
        for (n1,n2) in results:
            if n2 is None:
                # "stand-alone" name
                if n1 in recdefs.keys():
                    sql_res += [ '%s.%s' %(n1,v) for v in self.names[n1] ]
                    rec_fields+=[ v for v in self.names[n1] ]
                else:
                    sql_res.append(n1)
                    rec_fields.append(n1)
            else:
                # "qualified" name, with the table name first
                sql_res.append('%s.%s' %(n1,n2))
                # in the result set, the object will have the attribute 
                # table_name (we can't set an attribute table.name, and
                # name alone could be ambiguous
                rec_fields.append('%s_%s' %(n1,n2))
        self.rec_fields = rec_fields
        
        # now we can build the actual SQL string
        sql = 'SELECT '+ ','.join(sql_res)
        sql += ' FROM '
        froms = []
        for (k,v) in recdefs.iteritems():
            froms.append('%s AS %s ' %(v,k))
        sql += ','.join(froms)
        if conditions:
            sql += 'WHERE '
        for c in conditions:
            sql += get_source(c)

        return sql
            
    def __iter__(self):
        return self
    
    def next(self):
        if not self._iterating:
            # begin iteration
            self.cursor.execute(self.sql)
            self._iterating = True
        row = self.cursor.fetchone()
        if row is not None:
            if self.return_type == object:
                # transform list into instance of Record
                # uses the rec_fields computed in build_sql()
                rec = Record()
                rec.__dict__ = dict(zip(self.rec_fields,row))
                return rec
            elif self.return_type == dict:
                return dict(zip(self.rec_fields,row))
            elif self.return_type == list:
                return row
        self._iterating = False
        raise StopIteration

There is a difference of speed between raw SQL statement passed in cursor.execute and the use of a generator expression, but it is very limited

I'm not sure at all if this hack works in all cases, the way I get the source code is ugly and probably bugged. Any help will be appreciated

2 comments

David Abrahams 16 years, 10 months ago  # | flag

This is really cool. I always thought this was impossible, but then I had no idea that the innards of list comprehensions were generator expressions. Python metaprogramming strikes again!

Matteo Dell'Amico 13 years, 3 months ago  # | flag

After a looooooong time: this is extremely cool. Thanks Pierre for the hack. I still wonder whether the inspection capabilities of Python could be extended enough to make this work without the need for parsing a text file (and, for example, working in the interpreter).