The user can define a query expression for (almost) any CSV file. (The first row assumed to contain column headers.) Then the code outputs all data rows matching to the given query expression.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 | # queryCSV.py
# FB - 201010111
# Query a CSV file.
# example CSV file: myData.csv
# id,code name,value
# 36,abc,7.6
# 40,def,3.6
# 9,ghi,6.3
# 76,def,99
# example query expression
query = 'id > 36 and code_name == "def" and value <= 100'
print 'Query:'
print query
print
import csv
csvData = csv.reader(open('myData.csv'))
csvTable = []
isHeader = True
for row in csvData:
if isHeader:
isHeader = False
headerRow = row
for i in range(len(headerRow)):
# replace spaces w/ underscores in column headers
headerRow[i] = headerRow[i].replace(' ', '_')
else:
csvTable.append(row)
# determine column types: string/int/float
colType = []
for i in range(len(headerRow)):
isFloat = True
isInt = True
for j in range(len(csvTable)):
try:
v = float(csvTable[j][i])
if not v == int(v):
isInt = False
except ValueError:
isFloat = False
isInt = False
colT = ''
if isInt:
colT = 'int'
elif isFloat:
colT = 'float'
else:
colT = 'string'
colType.append(colT)
# print headerRow[i], colT
# run the query
for j in range(len(csvTable)):
# assign the column variables
for i in range(len(headerRow)):
if colType[i] == 'string':
exec(headerRow[i] + '=' + '"' + csvTable[j][i] + '"')
elif colType[i] == 'float':
exec(headerRow[i] + '=' + 'float("' + csvTable[j][i] + '")')
elif colType[i] == 'int':
exec(headerRow[i] + '=' + 'int("' + csvTable[j][i] + '")')
# output the rows matching the query
if eval(query):
print headerRow
print csvTable[j]
print
|
Nice!
One small enhancement, there is no guarantee that header names are valid python variable names.
In this case, a small function to make names to valid python variables is needed. e.g. Replace spaces with underscores.
It's also a good example of a correct use of eval and exec. Too often you see
eval ("x=10") or equivalent rather than x=exec("10")
Nick
Added replacement of spaces w/ underscores in column headers.
Another improvement would be this: First put the whole CSV data into a 2d list. Check if a column contains only integers then assign all values for that column as integers. And if at least one value contains a radix point then assign all values for that column as floats. That would make using int() and float() in the expressions unnecessary.
Also, this code provides similar functionality to SQL select statements. I think functionality for SQL delete (remove all columns matching to the expression), and update (if a row matches to the expression then update the row according to another expression) can also be easily added.
And on top of those, modify the code the read select/delete/update commands from a text file and then it would be possible to use CSV files like database tables. :-)
Modified the code. Now there is no need to use int() or float() in the query expressions.
I found this one line code to read the csv file.
rows = list(csv.reader(open('filename.csv')))
after this you can do any operations on the variable "rows".
@Thin Rhino: Could you provide an example?