Welcome, guest | Sign In | My Account | Store | Cart

The user can define a query expression for (almost) any CSV file. (The first row assumed to contain column headers.) Then the code outputs all data rows matching to the given query expression.

Python, 73 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
# queryCSV.py
# FB - 201010111
# Query a CSV file.

# example CSV file: myData.csv
# id,code name,value
# 36,abc,7.6
# 40,def,3.6
# 9,ghi,6.3
# 76,def,99

# example query expression
query = 'id > 36 and code_name == "def" and value <= 100'

print 'Query:'
print query
print

import csv
csvData = csv.reader(open('myData.csv'))
csvTable = []
isHeader = True
for row in csvData:
    if isHeader:
        isHeader = False
        headerRow = row
        for i in range(len(headerRow)):
            # replace spaces w/ underscores in column headers
            headerRow[i] = headerRow[i].replace(' ', '_')
    else:
        csvTable.append(row)

# determine column types: string/int/float
colType = []
for i in range(len(headerRow)):
    isFloat = True
    isInt = True
    for j in range(len(csvTable)):
        try:
            v = float(csvTable[j][i])
            if not v == int(v):
                isInt = False
        except ValueError:
            isFloat = False
            isInt = False

    colT = ''
    if isInt:
        colT = 'int'
    elif isFloat:
        colT = 'float'
    else:
        colT = 'string'
    colType.append(colT)

    # print headerRow[i], colT
    
# run the query
for j in range(len(csvTable)):
    # assign the column variables
    for i in range(len(headerRow)):
        if colType[i] == 'string':
            exec(headerRow[i] + '=' + '"' + csvTable[j][i] + '"')
        elif colType[i] == 'float':
            exec(headerRow[i] + '=' + 'float("' + csvTable[j][i] + '")')
        elif colType[i] == 'int':
            exec(headerRow[i] + '=' + 'int("' + csvTable[j][i] + '")')

    # output the rows matching the query
    if eval(query):
        print headerRow
        print csvTable[j]
        print

6 comments

Nick 13 years, 6 months ago  # | flag

Nice!

One small enhancement, there is no guarantee that header names are valid python variable names.

In this case, a small function to make names to valid python variables is needed. e.g. Replace spaces with underscores.

It's also a good example of a correct use of eval and exec. Too often you see

eval ("x=10") or equivalent rather than x=exec("10")

Nick

FB36 (author) 13 years, 6 months ago  # | flag

Added replacement of spaces w/ underscores in column headers.

Another improvement would be this: First put the whole CSV data into a 2d list. Check if a column contains only integers then assign all values for that column as integers. And if at least one value contains a radix point then assign all values for that column as floats. That would make using int() and float() in the expressions unnecessary.

Also, this code provides similar functionality to SQL select statements. I think functionality for SQL delete (remove all columns matching to the expression), and update (if a row matches to the expression then update the row according to another expression) can also be easily added.

FB36 (author) 13 years, 6 months ago  # | flag

And on top of those, modify the code the read select/delete/update commands from a text file and then it would be possible to use CSV files like database tables. :-)

FB36 (author) 13 years, 6 months ago  # | flag

Modified the code. Now there is no need to use int() or float() in the query expressions.

Thin Rhino 13 years, 4 months ago  # | flag

I found this one line code to read the csv file.

rows = list(csv.reader(open('filename.csv')))

after this you can do any operations on the variable "rows".

Bjorn Madsen 13 years, 4 months ago  # | flag

@Thin Rhino: Could you provide an example?

Created by FB36 on Thu, 7 Oct 2010 (MIT)
Python recipes (4591)
FB36's recipes (148)

Required Modules

Other Information and Tasks