This module takes a list of equal-length lists and converts it into XML.
If the first sublist is a list of headings, they are used as the element names for the rest of the data; alternatively, the headings can be supplied in the function call. The root and "row" elements can also be named if required.
#LL2XML.py
"""
See http://www.outwardlynormal.com/python/ll2XML.htm for full documentation.
This module converts a list of lists into XML
(e.g. a parsed comma separated values file or whatever).
With the proper arguments, the XML output will be an HTML table.
(See the test function for an example.)
If you want to use a csv as input, you will first need to get
hold of a csv parser to create the list of lists.
Examples include those at:
http://tratt.net/laurie/python/asv/
and
http://www.object-craft.com.au/projects/csv/
"""
# set up exceptions
class Error(Exception):
    def __init__(self, errcode, heading_num = 0, sublist_length = 0):
        self.errcode = errcode
        if self.errcode == "Length Error - Sublists":
            self.message = ["All the sublists must be of uniform length."]
        elif self.errcode == "Heading Error - Empty Item":
            self.message = ["There is at least one empty heading item.\n",
                            "Please supply only non-empty headings."]
        elif self.errcode == "Heading Error - heading/sublist missmatch":
            # repr() replaces the old backtick syntax and works in Python 2 and 3
            self.message = ["Number of headings=", repr(heading_num), "\n",
                            "Number of elements in sublists=", repr(sublist_length), "\n",
                            "These numbers must be equal."]
            print(self.message)
        else:
            self.message = ""
        self.errmsg = "".join(self.message)
    def __str__(self):
        return (self.errmsg)
def escape(s):
    """Replace special characters '&', "'", '<', '>' and '"' by XML entities."""
    s = s.replace("&", "&amp;") # Must be done first!
    s = s.replace("'", "&apos;")
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    s = s.replace('"', "&quot;")
    return s
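def cleanString(s, ident):
    # Convert non-string values to their string representation
    if type(s) != type(""):
        s = repr(s)
    s = escape(s)
    # Element names are lower-cased and have spaces replaced by underscores
    if ident == "tag":
        s = s.lower()
        s = s.replace(" ", "_")
    return s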
def LL2XML(LL,headings_tuple = (), root_element = "rows", row_element = "row", xml_declared = "yes"):
    if headings_tuple == "table":
        td_list = []
        for item in LL[0]:
            td_list.append("td")
        headings_tuple = tuple(td_list)
        root_element = "table"
        row_element = "tr"
        xml_declared = "no"
    root_element = cleanString(root_element, "tag")
    row_element = cleanString(row_element, "tag")
    if headings_tuple == ():
        headings = [cleanString(s,"tag") for s in LL[0]]
        LL = LL[1:] # remove now redundant heading row
    else:
        headings = [cleanString(s,"tag") for s in headings_tuple]
    # Sublists all of the same length?
    if ['!' for sublist in LL if len(sublist) != len(LL[0])]:
        raise Error("Length Error - Sublists")
    #check headings
    heading_num = len(headings)
    if heading_num != len(LL[0]):
        raise Error("Heading Error - heading/sublist missmatch", heading_num, len(LL[0]))
    for item in headings:
        if not cleanString(item,"heading"):
            raise Error("Heading Error - Empty Item")
        else:
            pass
    # Do the conversion
    xml = ""
    if xml_declared == "yes":
        xml_declaration = '<?xml version="1.0" encoding="iso-8859-1"?>\n'
    else:
        xml_declaration = ""
    bits = []
    add_bit = bits.append
    add_bit(xml_declaration)
    add_bit('<')
    add_bit(root_element)
    add_bit('>')
    for sublist in LL:
        add_bit("\n <")
        add_bit(row_element)
        add_bit(">\n")
        i = 0
        for item in sublist:
            tag = headings[i]
            item = cleanString(item, "item")
            add_bit(" <")
            add_bit(tag)
            add_bit(">")
            add_bit(item)
            add_bit("</")
            add_bit(tag)
            add_bit(">\n")
            i = i+1
        add_bit(" </")
        add_bit(row_element)
        add_bit(">")
    add_bit("\n</")
    add_bit(root_element)
    add_bit(">")
    xml = "".join(bits)
    return xml
def test():
    LL = [['Login', 'First Name', 'Last Name', 'Job', 'Group', 'Office', 'Permission'],
          ['auser', 'Arnold', 'Atkins', 'Partner', 'Tax', 'London', 'read'],
          ['buser', 'Bill', 'Brown', 'Partner', 'Tax', 'New York', 'read'],
          ['cuser', 'Clive', 'Cutler', 'Partner', 'Management', 'Brussels', 'read'],
          ['duser', 'Denis', 'Davis', 'Developer', 'ISS', 'London', 'admin'],
          ['euser', 'Eric', 'Ericsson', 'Analyst', 'Analysis', 'London', 'admin'],
          ['fuser', 'Fabian', 'Fowles', 'Partner', 'IP', 'London', 'read']]
    LL_no_heads = [['auser', 'Arnold', 'Atkins', 'Partner', 'Tax', 'London', 'read'],
                   ['buser', 'Bill', 'Brown', 'Partner', 'Tax', 'New York', 'read'],
                   ['cuser', 'Clive', 'Cutler', 'Partner', 'Management', 'Brussels', 'read'],
                   ['duser', 'Denis', 'Davis', 'Developer', 'ISS', 'London', 'admin'],
                   ['euser', 'Eric', 'Ericsson', 'Analyst', 'Analysis', 'London', 'admin'],
                   ['fuser', 'Fabian', 'Fowles', 'IP', 'Partner', 'London', 'read']]
    #Example 1
    print("Example 1: Simple case, using defaults.\n")
    print(LL2XML(LL))
    print("\n")
    #Example 2
    print("""Example 2: LL has its headings in the first line, and we define our root and row element names.\n""")
    print(LL2XML(LL,(),"people","person"))
    print("\n")
    #Example 3
    print("""Example 3: headings supplied using the headings argument(tuple), using default root and row element names.\n""")
    print(LL2XML(LL_no_heads,("Login","First Name","Last Name","Job","Group","Office","Permission")))
    print("\n")
    #Example 4
    print("""Example 4: The special case where we ask for an HTML table as output by just giving the string "table" as the second argument.\n""")
    print(LL2XML(LL,"table"))
Parsers of tabular data or comma separated values (csv) files will usually output a list of lists. Converting these to XML allows them to be manipulated with XSLT and other XML tools.
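For illustration, here is one way such a list of lists might be produced, assuming a Python version that ships the standard library csv module. The helper name csv_text_to_rows and the sample data are made up for this sketch and are not part of the recipe.

```python
import csv
import io

def csv_text_to_rows(text):
    """Parse CSV text into a list of lists, one sublist per record."""
    # csv.reader accepts any iterable of lines; StringIO wraps the string as a file.
    return list(csv.reader(io.StringIO(text)))

# Illustrative sample data: a heading row followed by two records.
sample = "Login,First Name,Office\nauser,Arnold,London\nbuser,Bill,New York\n"
rows = csv_text_to_rows(sample)
print(rows)
```

The result has the shape the module expects: a heading row followed by data rows, all of the same length.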
Be sure to use the source from the text source link above. The source code contains entities which are interpreted by web browsers, so these do not display correctly on this page.
The text source is fine.
Escaped commas. I've just found out that the way commas are escaped in csv is to put the whole data element in quotes, not just the comma. (A much less sensible idea, IMHO). This means that the posted version will not work where one or more commas are present in the csv values.
Sorry about that. If anyone wants to post a fix, I'd be grateful. Otherwise, I'll fix and update this very shortly.
Use existing module to parse CSV? For the job of parsing CSV you may want to use one of the existing modules available at the Vaults of Parnassus:
http://www.vex.net/parnassus/apyllo.py?find=csv
They will probably already have solved the problem of dealing properly with commas and quotes.
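The comma problem described above can be seen in a small sketch. It assumes the standard library csv module is available (it is not one of the modules linked here), and the sample line is invented for the example.

```python
import csv

# A record whose second field contains a comma, so the whole field is quoted.
line = 'cuser,"Cutler, Clive",Brussels'

# Naive splitting on commas breaks the quoted field in two.
naive = line.split(",")

# csv.reader treats the quoted field as a single value and drops the quotes.
parsed = next(csv.reader([line]))

print(len(naive), naive)
print(len(parsed), parsed)
```

Splitting by hand yields four pieces, while the parser yields the three intended fields.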
I should have looked there first, shouldn't I? Yes, you are quite right. Well that's saved me a bit of work!
I'll simplify the script to just accept the pre-parsed data and repost the module shortly. Thanks!
csv to XML? Pah! I convert a list of lists! OK, I've slashed and burned and rejigged, and now the module seems to work fine. Send it a well formed list of lists (sublists all of equal length) and it will spit XML at you.
As before, use the plain text link, as the script as it appears on this page is broken.
Now I suppose it will be rejected for being too long.
Ah well.
Tidied it up. OK, I've stripped out the documentation (it's on my web site), used the Exception class to create my exceptions, and used the append method throughout, rather than use string manipulation (it's quicker).
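The append-then-join approach mentioned here looks roughly like the sketch below. The function name build_element is hypothetical; whether the approach is actually quicker than string concatenation depends on the Python version and the data, so measure with timeit before relying on it.

```python
def build_element(tag, text):
    """Build '<tag>text</tag>' by collecting fragments and joining them once."""
    bits = []
    add_bit = bits.append  # bind the bound method once, as the recipe does
    add_bit("<")
    add_bit(tag)
    add_bit(">")
    add_bit(text)
    add_bit("</")
    add_bit(tag)
    add_bit(">")
    # A single join at the end replaces many intermediate string copies.
    return "".join(bits)

print(build_element("login", "auser"))
```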
One nice additional feature: If you do LL2XML.LL2XML(LL,"table") - where LL is a list of equal length lists - the script returns an HTML table.
(As before, use the text source, or download from my site.)
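To show what the "table" shortcut amounts to, here is a stand-alone sketch of the same idea: every cell becomes a td, every sublist a tr, and the root a table, with no XML declaration. It is not the module's code; rows_to_html_table is a hypothetical name, the whitespace is a guess, and escaping is delegated to the standard library.

```python
from xml.sax.saxutils import escape

def rows_to_html_table(rows):
    """Render a list of equal-length lists as an HTML table string."""
    bits = ["<table>"]
    for row in rows:
        bits.append("\n <tr>\n")
        for cell in row:
            # str() handles non-string cells; escape() protects &, < and >.
            bits.append("  <td>%s</td>\n" % escape(str(cell)))
        bits.append(" </tr>")
    bits.append("\n</table>")
    return "".join(bits)

html = rows_to_html_table([["Login", "Office"], ["auser", "London"]])
print(html)
```

As in the recipe, a heading row passed this way is rendered as an ordinary row of td cells.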
Added a few minor fixes. See http://www.outwardlynormal.com/python/ll2XML.htm for details.
Just a minor thing. What about:
It should save some typing... :)
Yes. Quite right. The version of this script in the Cookbook book has this and several other improvements. I will synch the two versions soon. Out of interest, I will be checking the effect on performance of the much more compact code in the published version.
Unfortunately the O'Reilly published version has problems. Two of the entity replacements are broken. It looks like the editor didn't use the source code, but copied and pasted from a browser.
I'm trying to get this fixed in their online version of the module. Too late for the printed version, I expect (I've not seen it yet).
Ah well, good to be in there, anyway.
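One way to sidestep hand-typed entity strings altogether, sketched here as an alternative rather than as what the recipe does, is the standard library's xml.sax.saxutils.escape. It handles &, < and > by default, and the quote characters can be passed in as extra entities. The name escape_xml and the sample string are illustrative.

```python
from xml.sax.saxutils import escape

# Extra replacements for the two quote characters, which escape() skips by default.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}

def escape_xml(text):
    """Escape the five XML special characters in text."""
    # escape() replaces '&' first, so the entities added afterwards are not re-escaped.
    return escape(text, QUOTE_ENTITIES)

print(escape_xml('Fish & "Chips" <hot>'))
```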