Welcome, guest | Sign In | My Account | Store | Cart

Script generates CREATE TABLE statements based on the width of data present in comma delimited (csv) test files. Setting the correct datatypes (other than VARCHAR), is still a manual adventure.

TODOs:
  • Eliminate '#N/A', '@NA' from data
  • Remove commas from numeric data
  • Check for duplicate column names
  • Create BCP format file or INSERT statements?
Python, 87 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
# csv2tbl.py - M.Keranen (mksql@yahoo.com) [10/14/2004]
# ------------------------------------------------------------------------
# Read a CSV file (exported from Excel), and generate an SQL statement
# to create a matching table structure (by field names and widths only).
# Expects sheet to start in cell A1, column names in first row:
# Suggested Excel export steps:
# 	1) Delete blank leading rows and columns to bring start of sheet to A1
# 	2) Select all cells, left justify, Format->Column->Auto Fit Selection
#	3) Save as file type .CSV
# -------------------------------------------------------------------------

import csv, os, string, sys

if len(sys.argv)<2:
	print "\nUsage: csv2tbl.py path/datafile.csv (0,1,2,3 = column name format):"
	print "\nFormat: 0 = TitleCasedWords"
	print "        1 = Titlecased_Words_Underscored"
	print "        2 = lowercase_words_underscored"
	print "        3 = Words_underscored_only (leave case as in source)"
	sys.exit()
else:
	if len(sys.argv)==2:
		dummy, datafile, = sys.argv
		namefmt = '0'
	else: dummy, datafile, namefmt = sys.argv

namefmt = int(namefmt)
#outfile = os.path.basename(datafile)
tblname = os.path.basename(datafile).split('.')[0]
outfile = os.path.dirname(datafile) + '\\' + tblname + '.sql'

# Create string translation tables
allowed = ' _01234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
delchars = ''
for i in range(255):
	if chr(i) not in allowed: delchars = delchars + chr(i)
deltable = string.maketrans(' ','_')

# Create list of column [names],[widths]
reader = csv.reader(file(datafile),dialect='excel')
row = reader.next()
nc = len(row)
cols = []
for col in row:
	# Format column name to remove unwanted chars
	col = string.strip(col)
	col = string.translate(col,deltable,delchars)
	fmtcol = col
	if namefmt < 3:
		# Title case individual words, leaving original upper chars in place
		fmtcol = ''
		for i in range(len(col)):
			if col.title()[i].isupper(): fmtcol = fmtcol + col[i].upper()
			else: fmtcol = fmtcol + col[i]
	if namefmt == 2: fmtcol = col.lower()
	if namefmt == 0: fmtcol = string.translate(fmtcol,deltable,'_')   # Remove underscores
	
	d = 0
	dupcol = fmtcol	
	while dupcol in cols:
		d = d + 1
		dupcol = fmtcol + '_' + str(d)
	cols.append([dupcol,1])

# Determine max width of each column in each row
rc = 0
for row in reader:
	rc = rc + 1
	if len(row) == nc:
		for i in range(len(row)):
			fld = string.strip(row[i])
			if len(fld) > cols[i][1]:
				cols[i][1] = len(fld)
	else: print 'Warning: Line %s ignored. Different width than header' % (rc)

print

sql = 'CREATE TABLE %s\n(' % (tblname)
for col in cols:
	sql = sql + ('\n\t%s NVARCHAR(%s) NULL,' % (col[0],col[1]))
sql = sql[:len(sql)-1] + '\n)'

sqlfile = open(outfile,'w')
sqlfile.write(sql)
sqlfile.close

print '%s created.' % (outfile)

4 comments

Martin Blais 16 years, 8 months ago  # | flag

csv-db-import. http://furius.ca/pubcode/pub/conf/common/bin/csv-db-import.html

I think this program does something similar, and guesses the types automatically (you can also override with a header row).

Susanne Oberhauser 12 years, 6 months ago  # | flag

please use os.path.join(..., ...) instead of the ... + '\\' + ... to have this usefull on unix or mac, too.

thx

Edmon Begoli 9 years, 11 months ago  # | flag

How can I contribute some suggested code changes such as the one from Susanne Oberhauser in comments and upgrades (use of open ... with, 'rbU' open file flag, etc)

Matt Keranen (author) 9 years, 11 months ago  # | flag

Edmon: It looks like you can fork recipes now, or send the changes to mksql@yahoo.com.

I need to incorporate Susanne's suggestion, as this was written when I only used Windows, now I don't use it at all ;)