This recipe is a follow-up to #440653, which was easy to implement but very slow because the iteration required to read all the rows of a table
As suggested by Matteo Dell'Amico in a comment, it would be much better if we could write something like
query(r.name for r in plane_tbl if r.country == "France")
where the generator expression is first translated into an SQL select, so that the iteration on the instance of query only reads the rows selected by the SQL statement
The present recipe is an attempt to achieve this. The first problem is to get the source code of the generator expression. I use information from the stack frame to get the file name and the line number, then the tokenize module to read the elements of the generator expression in the source code
Then, to build the SQL statement, the source code must be parsed : this is done using the compiler package and "visitors" that walk the AST tree returned by compiler.parse and do operations on the nodes, depending on their type
Finally, once the SQL statement is built, the iteration on the query instance can start : for the first one, the SQL statement is executed ; then the iteration yields the selected rows one by one.
The items can be : - objects, with attribute names matching those in the generator expression, except that qualified names (table.name) are converted to table_name - dictionaries : the keys are the same as the attribute names above - lists
For instance : - iterating on query(name for r in plane_tbl) returns objects with an attribute name - iterating on query(r.name for r in plane_tbl) returns objects with an attribute r_name
This is because of iteration on tables which have the same field names
query((r.name,c.name) for r in plane_tbl for c in country_tbl if r.speed > 500 )
The type of the items is set by query.return_type = object, dict or list
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 | """A wrapper around DBAPI-compliant databases to support iteration
and generator expression syntax for requests, instead of SQL
To get an iterator, initialize a connection to the database, then
set the cursor attribute of the query class to its cursor
Create an instance of Table for the tables you want to use
Then you can use the class query. You create an instance by passing
a generator expression as parameter. This instance translates the
generator expression in an SQL statement ; then you can iterate
on it to get the selected items as objects, dictionaries or lists
Supposing you call this module db_iterator.py, here is an example
of use with sqlite :
from pysqlite2 import dbapi2 as sqlite
from db_iterator import query, Table
conn = sqlite.connect('planes')
query.cursor = conn.cursor()
plane = Table()
countries = Table()
# all the items produced by iteration on query() are instances
# of the Record class
# simple requests
# since no attribute of r is specified in the query, returns a list
# of instances of Record with attributes matching all the field names
print [ r.name for r in query(r for r in plane if r.country == 'France') ]
# this request returns a list instances of Record with the attribute
# c_country (c.country with the . replaced by _)
print [ country for country in query(c.country for c in countries
if c.continent == 'Europe') ]
# request on two tables
print [r.name for r in query (r for r in plane for c in countries
if r.country == c.country and c.continent == 'Europe')]
"""
import tokenize
import token
import compiler
import types
class ge_visitor:
"""Instances of ge_visitor are used as the visitor argument to
compiler.walk(tree,visitor) where tree is an AST tree built by
compiler.parse
The instance has a src attribute which looks like the source
code from which the tree was built
Only a few of the visitNodeType are implemented, those likely to appear
in a database query. Can be easily extended
"""
def __init__(self):
self.src = ''
def visitTuple(self,t):
self.src += ','.join ( [ get_source(n) for n in t.nodes ])
def visitList(self,t):
self.src += ','.join ( [ get_source(n) for n in t.nodes ])
def visitMul(self,t):
self.src += '(%s)' %('*'.join([ get_source(n) for n in t]))
def visitName(self,t):
self.src += t.name
def visitConst(self,t):
if type(t.value) is str:
# convert single quotes, SQL-style
self.src += "'%s'" %t.value.replace("'","''")
else:
self.src += str(t.value)
def visitAssName(self,t):
self.src += t.name
def visitGetattr(self,t):
self.src += '%s.%s' %(get_source(t.expr),str(t.attrname))
def visitGenExprFor(self,t):
self.src += 'for %s in %s ' %(get_source(t.assign),
get_source(t.iter))
if t.ifs:
self.src += ' if ' +''.join([ get_source(i) for i in t.ifs ])
def visitGenExprIf(self,t):
self.src += get_source(t.test)
def visitCompare(self,t):
compiler.walk(t.expr,self)
self.src += ' '
for o in t.ops:
oper = o[0]
if oper == '==':
oper = '='
self.src += oper + ' '
compiler.walk(o[1],self)
def visitAnd(self,t):
self.src += '('
self.src += ' AND '.join([ get_source(n) for n in t.nodes ])
self.src+= ')'
def visitOr(self,t):
self.src += '('
self.src += ' OR '.join([ get_source(n) for n in t.nodes ])
self.src+= ')'
def visitNot(self,t):
self.src += '(NOT ' + get_source(t.expr) + ')'
def get_source(node):
"""Return the source code of the node, built by an instance of
ge_visitor"""
return compiler.walk(node,ge_visitor()).src
class genExprVisitor:
"""Visitor used to initialize GeneratorExpression objects
Uses the visitor pattern. See the compiler.visitor module"""
def __init__(self):
self.GenExprs = []
def visitGenExprInner(self,node):
ge = GeneratorExpression()
self.GenExprs.append(ge)
for y in node.getChildren():
if y.__class__ is compiler.ast.GenExprFor:
ge.exprfor.append(y)
else:
ge.result = y
class GeneratorExpression:
"""A class for a Generator Expression"""
def __init__(self):
self.result = None
self.exprfor = []
class Record(object):
"""A generic class for database records"""
pass
class Table:
"""A basic iterable class to avoid syntax errors"""
def __iter__(self):
return self
class query:
"""Class used for database queries
Instance is created with query(ge) where ge is a generator
expression
The __init__ method builds the SQL select expression matching the
generator expression
Iteration on the instance of query yields the items found by
the SQL select, under the form specified by return_type : an object,
a dictionary or a list"""
cursor = None # to be set to the cursor of the connection
return_type = object # can be set to dict or list
def __init__(self,s):
self._iterating = False # used in next()
# First we must get the source code of the generator expression
# I use an ugly hack with stack frame attributes and tokenize
# If there's a cleaner and safer way, please tell me !
readline = open(s.gi_frame.f_code.co_filename).readline
first_line = s.gi_frame.f_code.co_firstlineno
flag = False
self.source = '' # the source code
for t in tokenize.generate_tokens(open(s.gi_frame.f_code.co_filename).readline):
# check all tokens until the last parenthesis is closed
t_type,t_string,(r_start,c_start),(r_end,c_end),line = t
t_name = token.tok_name[t_type]
if r_start == first_line:
if t_name == 'NAME' and t_string=="query":
flag = True
res = t_string
start = 0 # number of parenthesis
continue
if flag:
self.source += ' '+t_string
if t_name == 'OP':
if t_string=='(':
start += 1
elif t_string == ')':
start -= 1
if start == 0:
break
# when the source has been found, build an AST tree from it
ast = compiler.parse(self.source.strip())
# use a visitor to find the generator expression(s) in the source
visitor = genExprVisitor()
compiler.walk(ast,visitor)
# if there are nested generator expressions, it's too difficult
# to handle : raise an exception
if len(visitor.GenExprs)>1:
raise Exception,'Invalid expression, found more ' \
'than 1 generator expression'
ge = visitor.GenExprs[0]
self.sql = self.build_sql(ge)
def build_sql(self,ge):
""" Build the SQL select for the generator expression
ge is an instance of GeneratorExpression
The generator expression looks like
(result) for x1 in table1 [ for x2 in table2] [ if condition ]
It has 2 attributes :
- result : an AST tree with the "result" part
- exprfor : a list of AST trees, one for each "for ... in ..."
"""
self.res = []
if ge.result.__class__ is compiler.ast.Tuple:
# more than one item in result
self.res = ge.result.getChildren()
else:
self.res = [ge.result]
results = [] # a list of strings = result part of the SQL expression
for res in self.res:
# a result can be a stand-alone name, or a "qualified" name,
# with the table name first (table.field)
if res.__class__ is compiler.ast.Name:
results.append((res.name,None))
elif res.__class__ is compiler.ast.Getattr:
results.append((get_source(res.expr),res.attrname))
self.results = results
# "for x in y" produces an item in the dictionary recdefs :
# recdef[x] = y
recdefs = {}
conditions = []
for exprfor in ge.exprfor:
recdefs[get_source(exprfor.assign)] = \
get_source(exprfor.iter)
if exprfor.ifs:
# an AST tree for the condition
conditions = exprfor.ifs
# To build objects or dictionaries in the result set, we must
# know the name of the fields in all the tables used in the
# query. For this, make a simple select in each table and read
# the information in cursor.description
self.names={}
for rec,table in recdefs.iteritems():
self.cursor.execute('SELECT * FROM %s' %table)
self.names[rec] = [ d[0] for d in self.cursor.description ]
sql_res = [] # the way the field will appear in the SQL string
rec_fields = [] # the name of the fields in the object or dictionary
for (n1,n2) in results:
if n2 is None:
# "stand-alone" name
if n1 in recdefs.keys():
sql_res += [ '%s.%s' %(n1,v) for v in self.names[n1] ]
rec_fields+=[ v for v in self.names[n1] ]
else:
sql_res.append(n1)
rec_fields.append(n1)
else:
# "qualified" name, with the table name first
sql_res.append('%s.%s' %(n1,n2))
# in the result set, the object will have the attribute
# table_name (we can't set an attribute table.name, and
# name alone could be ambiguous
rec_fields.append('%s_%s' %(n1,n2))
self.rec_fields = rec_fields
# now we can build the actual SQL string
sql = 'SELECT '+ ','.join(sql_res)
sql += ' FROM '
froms = []
for (k,v) in recdefs.iteritems():
froms.append('%s AS %s ' %(v,k))
sql += ','.join(froms)
if conditions:
sql += 'WHERE '
for c in conditions:
sql += get_source(c)
return sql
def __iter__(self):
return self
def next(self):
if not self._iterating:
# begin iteration
self.cursor.execute(self.sql)
self._iterating = True
row = self.cursor.fetchone()
if row is not None:
if self.return_type == object:
# transform list into instance of Record
# uses the rec_fields computed in build_sql()
rec = Record()
rec.__dict__ = dict(zip(self.rec_fields,row))
return rec
elif self.return_type == dict:
return dict(zip(self.rec_fields,row))
elif self.return_type == list:
return row
self._iterating = False
raise StopIteration
|
There is a difference of speed between raw SQL statement passed in cursor.execute and the use of a generator expression, but it is very limited
I'm not sure at all if this hack works in all cases, the way I get the source code is ugly and probably bugged. Any help will be appreciated
This is really cool. I always thought this was impossible, but then I had no idea that the innards of list comprehensions were generator expressions. Python metaprogramming strikes again!
After a looooooong time: this is extremely cool. Thanks Pierre for the hack. I still wonder whether the inspection capabilities of Python could be extended enough to make this work without the need for parsing a text file (and, for example, working in the interpreter).