Script generates CREATE TABLE statements based on the width of data present in comma delimited (csv) test files. Setting the correct datatypes (other than VARCHAR), is still a manual adventure.
TODOs:
- Eliminate '#N/A', '@NA' from data
- Remove commas from numeric data
- Check for duplicate column names
- Create BCP format file or INSERT statements?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 | # csv2tbl.py - M.Keranen (mksql@yahoo.com) [10/14/2004]
# ------------------------------------------------------------------------
# Read a CSV file (exported from Excel), and generate an SQL statement
# to create a matching table structure (by field names and widths only).
# Expects sheet to start in cell A1, column names in first row:
# Suggested Excel export steps:
# 1) Delete blank leading rows and columns to bring start of sheet to A1
# 2) Select all cells, left justify, Format->Column->Auto Fit Selection
# 3) Save as file type .CSV
# -------------------------------------------------------------------------
import csv, os, string, sys
if len(sys.argv)<2:
print "\nUsage: csv2tbl.py path/datafile.csv (0,1,2,3 = column name format):"
print "\nFormat: 0 = TitleCasedWords"
print " 1 = Titlecased_Words_Underscored"
print " 2 = lowercase_words_underscored"
print " 3 = Words_underscored_only (leave case as in source)"
sys.exit()
else:
if len(sys.argv)==2:
dummy, datafile, = sys.argv
namefmt = '0'
else: dummy, datafile, namefmt = sys.argv
namefmt = int(namefmt)
#outfile = os.path.basename(datafile)
tblname = os.path.basename(datafile).split('.')[0]
outfile = os.path.dirname(datafile) + '\\' + tblname + '.sql'
# Create string translation tables
allowed = ' _01234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
delchars = ''
for i in range(255):
if chr(i) not in allowed: delchars = delchars + chr(i)
deltable = string.maketrans(' ','_')
# Create list of column [names],[widths]
reader = csv.reader(file(datafile),dialect='excel')
row = reader.next()
nc = len(row)
cols = []
for col in row:
# Format column name to remove unwanted chars
col = string.strip(col)
col = string.translate(col,deltable,delchars)
fmtcol = col
if namefmt < 3:
# Title case individual words, leaving original upper chars in place
fmtcol = ''
for i in range(len(col)):
if col.title()[i].isupper(): fmtcol = fmtcol + col[i].upper()
else: fmtcol = fmtcol + col[i]
if namefmt == 2: fmtcol = col.lower()
if namefmt == 0: fmtcol = string.translate(fmtcol,deltable,'_') # Remove underscores
d = 0
dupcol = fmtcol
while dupcol in cols:
d = d + 1
dupcol = fmtcol + '_' + str(d)
cols.append([dupcol,1])
# Determine max width of each column in each row
rc = 0
for row in reader:
rc = rc + 1
if len(row) == nc:
for i in range(len(row)):
fld = string.strip(row[i])
if len(fld) > cols[i][1]:
cols[i][1] = len(fld)
else: print 'Warning: Line %s ignored. Different width than header' % (rc)
print
sql = 'CREATE TABLE %s\n(' % (tblname)
for col in cols:
sql = sql + ('\n\t%s NVARCHAR(%s) NULL,' % (col[0],col[1]))
sql = sql[:len(sql)-1] + '\n)'
sqlfile = open(outfile,'w')
sqlfile.write(sql)
sqlfile.close
print '%s created.' % (outfile)
|
Tags: database
csv-db-import. http://furius.ca/pubcode/pub/conf/common/bin/csv-db-import.html
I think this program does something similar, and guesses the types automatically (you can also override with a header row).
please use
os.path.join(..., ...)
instead of the... + '\\' + ...
to have this usefull on unix or mac, too.thx
How can I contribute some suggested code changes such as the one from Susanne Oberhauser in comments and upgrades (use of open ... with, 'rbU' open file flag, etc)
Edmon: It looks like you can fork recipes now, or send the changes to mksql@yahoo.com.
I need to incorporate Susanne's suggestion, as this was written when I only used Windows, now I don't use it at all ;)