Welcome, guest | Sign In | My Account | Store | Cart

Script generates CREATE TABLE statements based on the width of data present in comma delimited (csv) test files. Setting the correct datatypes (other than VARCHAR), is still a manual adventure.

TODOs:
  • Eliminate '#N/A', '@NA' from data
  • Remove commas from numeric data
  • Check for duplicate column names
  • Create BCP format file or INSERT statements?
Python, 87 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
# csv2tbl.py - M.Keranen (mksql@yahoo.com) [10/14/2004]
# ------------------------------------------------------------------------
# Read a CSV file (exported from Excel), and generate an SQL statement
# to create a matching table structure (by field names and widths only).
# Expects sheet to start in cell A1, column names in first row:
# Suggested Excel export steps:
# 	1) Delete blank leading rows and columns to bring start of sheet to A1
# 	2) Select all cells, left justify, Format->Column->Auto Fit Selection
#	3) Save as file type .CSV
# -------------------------------------------------------------------------

import csv, os, string, sys

if len(sys.argv)<2:
	print "\nUsage: csv2tbl.py path/datafile.csv (0,1,2,3 = column name format):"
	print "\nFormat: 0 = TitleCasedWords"
	print "        1 = Titlecased_Words_Underscored"
	print "        2 = lowercase_words_underscored"
	print "        3 = Words_underscored_only (leave case as in source)"
	sys.exit()
else:
	if len(sys.argv)==2:
		dummy, datafile, = sys.argv
		namefmt = '0'
	else: dummy, datafile, namefmt = sys.argv

namefmt = int(namefmt)
#outfile = os.path.basename(datafile)
tblname = os.path.basename(datafile).split('.')[0]
outfile = os.path.dirname(datafile) + '\\' + tblname + '.sql'

# Create string translation tables
allowed = ' _01234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
delchars = ''
for i in range(255):
	if chr(i) not in allowed: delchars = delchars + chr(i)
deltable = string.maketrans(' ','_')

# Create list of column [names],[widths]
reader = csv.reader(file(datafile),dialect='excel')
row = reader.next()
nc = len(row)
cols = []
for col in row:
	# Format column name to remove unwanted chars
	col = string.strip(col)
	col = string.translate(col,deltable,delchars)
	fmtcol = col
	if namefmt < 3:
		# Title case individual words, leaving original upper chars in place
		fmtcol = ''
		for i in range(len(col)):
			if col.title()[i].isupper(): fmtcol = fmtcol + col[i].upper()
			else: fmtcol = fmtcol + col[i]
	if namefmt == 2: fmtcol = col.lower()
	if namefmt == 0: fmtcol = string.translate(fmtcol,deltable,'_')   # Remove underscores
	
	d = 0
	dupcol = fmtcol	
	while dupcol in cols:
		d = d + 1
		dupcol = fmtcol + '_' + str(d)
	cols.append([dupcol,1])

# Determine max width of each column in each row
rc = 0
for row in reader:
	rc = rc + 1
	if len(row) == nc:
		for i in range(len(row)):
			fld = string.strip(row[i])
			if len(fld) > cols[i][1]:
				cols[i][1] = len(fld)
	else: print 'Warning: Line %s ignored. Different width than header' % (rc)

print

sql = 'CREATE TABLE %s\n(' % (tblname)
for col in cols:
	sql = sql + ('\n\t%s NVARCHAR(%s) NULL,' % (col[0],col[1]))
sql = sql[:len(sql)-1] + '\n)'

sqlfile = open(outfile,'w')
sqlfile.write(sql)
sqlfile.close

print '%s created.' % (outfile)

4 comments

Martin Blais 17 years, 6 months ago  # | flag

csv-db-import. http://furius.ca/pubcode/pub/conf/common/bin/csv-db-import.html

I think this program does something similar, and guesses the types automatically (you can also override with a header row).

Susanne Oberhauser 13 years, 5 months ago  # | flag

please use os.path.join(..., ...) instead of the ... + '\\' + ... to have this usefull on unix or mac, too.

thx

Edmon Begoli 10 years, 10 months ago  # | flag

How can I contribute some suggested code changes such as the one from Susanne Oberhauser in comments and upgrades (use of open ... with, 'rbU' open file flag, etc)

Matt Keranen (author) 10 years, 10 months ago  # | flag

Edmon: It looks like you can fork recipes now, or send the changes to mksql@yahoo.com.

I need to incorporate Susanne's suggestion, as this was written when I only used Windows, now I don't use it at all ;)