This code serializes a Python data structure into XML.
This is one part of a trio of recipes:
For more information
'''
Py2XML - Python to XML serialization

This code transforms a Python data structure into an XML document

Usage:
    serializer = Py2XML()
    xml_string = serializer.parse( python_object )
    print( python_object )
    print( xml_string )
'''
class Py2XML():

    def __init__( self ):
        self.data = "" # where we store the processed XML string

    def parse( self, pythonObj, objName=None ):
        '''
        processes Python data structure into XML string
        needs objName if pythonObj is a List
        '''
        if pythonObj is None:
            return ""

        if isinstance( pythonObj, dict ):
            self.data = self._PyDict2XML( pythonObj )
        elif isinstance( pythonObj, list ):
            # we need name for List object
            self.data = self._PyList2XML( pythonObj, objName )
        else:
            self.data = "<%(n)s>%(o)s</%(n)s>" % { 'n':objName, 'o':str( pythonObj ) }

        return self.data

    def _PyDict2XML( self, pyDictObj, objName=None ):
        '''
        process Python Dict objects
        They can store XML attributes and/or children
        '''
        tagStr = ""     # XML string for this level
        attributes = {} # attribute key/value pairs
        attrStr = ""    # attribute string of this level
        childStr = ""   # XML string of this level's children

        for k, v in pyDictObj.items():
            if isinstance( v, dict ):
                # child tags, with attributes
                childStr += self._PyDict2XML( v, k )
            elif isinstance( v, list ):
                # child tags, list of children
                childStr += self._PyList2XML( v, k )
            else:
                # tag could have many attributes, let's save until later
                attributes.update( { k:v } )

        if objName is None:
            return childStr

        # create XML string for attributes
        for k, v in attributes.items():
            attrStr += " %s=\"%s\"" % ( k, v )

        # let's assemble our tag string
        if childStr == "":
            tagStr += "<%(n)s%(a)s />" % { 'n':objName, 'a':attrStr }
        else:
            tagStr += "<%(n)s%(a)s>%(c)s</%(n)s>" % { 'n':objName, 'a':attrStr, 'c':childStr }

        return tagStr

    def _PyList2XML( self, pyListObj, objName=None ):
        '''
        process Python List objects
        They have no attributes, just children
        Lists only hold Dicts or Strings
        '''
        tagStr = ""   # XML string for this level
        childStr = "" # XML string of children

        for childObj in pyListObj:
            if isinstance( childObj, dict ):
                # here's some Magic
                # we're assuming that List parent has a plural name of child:
                # eg, persons > person, so cut off last char
                # name-wise, only really works for one level, however
                # in practice, this is probably ok
                childStr += self._PyDict2XML( childObj, objName[:-1] )
            else:
                # anything else is appended as text
                childStr += str( childObj )

        if objName is None:
            return childStr

        tagStr += "<%(n)s>%(c)s</%(n)s>" % { 'n':objName, 'c':childStr }

        return tagStr


def main():
    python_object = {
        'documents': [ {
            'formats': [
                { 'info': { 'uri': 'http://www.python.org/newness-of-python.pdf', 'pages': '245' }, 'type': 'pdf' },
                { 'info': { 'uri': 'http://www.python.org/newness-of-python.html' }, 'type': 'web' }
            ],
            'copyright': { 'url': 'http://www.creativecommons.org/', 'date': 'June 24, 2009', 'type': 'CC' },
            'title': 'The Newness of Python',
            'date': 'June 6, 2009',
            'text': [ 'Python is very nice. Very, very nice.' ],
            'author': 'John Doe'
        } ]
    }

    serializer = Py2XML()
    xml_string = serializer.parse( python_object )
    print( python_object )
    print( xml_string )


if __name__ == '__main__':
    main()
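One caveat worth noting: the recipe inserts attribute values and text into the markup unchanged, so a value containing `&`, `<`, or `"` would produce malformed XML. The following is a minimal sketch of one possible fix using the standard library's `xml.sax.saxutils`. The helper names `format_attributes` and `format_text` are hypothetical and not part of the recipe.

```python
from xml.sax.saxutils import escape, quoteattr

def format_attributes( attributes ):
    '''
    build the attribute string for one tag (hypothetical helper)
    quoteattr() escapes the value and adds the surrounding quotes
    '''
    attrStr = ""
    for k, v in attributes.items():
        attrStr += " %s=%s" % ( k, quoteattr( str( v ) ) )
    return attrStr

def format_text( value ):
    '''
    escape &, < and > in element text (hypothetical helper)
    '''
    return escape( str( value ) )

print( format_attributes( { 'title': 'Q&A' } ) )
print( format_text( 'a < b' ) )
```

Something along these lines could replace the `" %s=\"%s\""` formatting in `_PyDict2XML` and the plain string concatenation in `_PyList2XML`.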
Hi, this code is great, I have tested it, and I have a question:
If I change python_object by adding one more element to a format's info, for example 'age':'15':
python_object = {'documents': [{'formats': [{'info': {'uri': 'http://www.python.org/newness-of-python.pdf', 'pages': '245', 'age':'15'}, 'type': 'pdf'}, {'info': {'uri': 'http://www.python.org/newness-of-python.html'}, 'type': 'web'}], 'copyright': {'url': 'http://www.creativecommons.org/', 'date': 'June 24, 2009', 'type': 'CC'}, 'title': 'The Newness of Python', 'date': 'June 6, 2009', 'text': ['Python is very nice. Very, very nice.'], 'author': 'John Doe'}]}
I get the following XML:
<documents>
  <document author="John Doe" date="June 6, 2009" title="The Newness of Python">
    <copyright date="June 24, 2009" type="CC" url="http://www.creativecommons.org/" />
    <text>Python is very nice. Very, very nice.</text>
    <formats>
      <format type="pdf">
        <info age="15" pages="245" uri="http://www.python.org/newness-of-python.pdf" />
      </format>
      <format type="web">
        <info uri="http://www.python.org/newness-of-python.html" />
      </format>
    </formats>
  </document>
</documents>
As you can see, age="15" comes first even though I wrote it at the end of my dict, and I notice that this code sorts the elements of the dicts alphabetically. My question is:
Is there a way to avoid this alphabetical ordering?
Two things to mention:
It appears that the dictionary method items() returns names in sorted order, though the official Python docs seem to imply that the order isn't guaranteed. If it didn't, I would update the code to sort the names so that the resulting XML is the same every time. This is necessary for testing purposes only.
I use a dict to store the XML attributes. A common trait of dicts is that the order of the items stored inside isn't guaranteed. This structure works for XML attributes because they can be listed in any order and it's still the same XML, so a dict is a good, simple fit.
If you want your XML attributes to retain their order, then another data structure would have to be used, such as a list.
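The deterministic ordering mentioned in the first point can be made explicit rather than left to whatever order `items()` happens to return. This is a minimal sketch; `attributes_to_string` is a hypothetical helper that mirrors the attribute loop in `_PyDict2XML`.

```python
def attributes_to_string( attributes ):
    '''
    build the attribute string with names in sorted order,
    so the same dict always yields the same XML (hypothetical helper)
    '''
    attrStr = ""
    for k, v in sorted( attributes.items() ):
        attrStr += " %s=\"%s\"" % ( k, v )
    return attrStr

info = { 'uri': 'http://www.python.org/newness-of-python.pdf', 'pages': '245', 'age': '15' }
print( "<info%s />" % attributes_to_string( info ) )
```

With the sort in place, two runs over the same data produce identical strings, which is what a test comparing against a fixed expected XML document needs.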
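As a rough illustration of that idea, attributes could be kept as a list of (name, value) pairs, which always iterates in the order written. This is only a sketch: `pairs_to_attr_string` is a hypothetical helper, and the recipe itself would need adapting, since `_PyList2XML` currently treats every list as a list of children.

```python
def pairs_to_attr_string( pairs ):
    '''
    build the attribute string from a list of ( name, value ) pairs,
    keeping the order in which the pairs were written (hypothetical helper)
    '''
    attrStr = ""
    for k, v in pairs:
        attrStr += " %s=\"%s\"" % ( k, v )
    return attrStr

info = [
    ( 'uri', 'http://www.python.org/newness-of-python.pdf' ),
    ( 'pages', '245' ),
    ( 'age', '15' ),
]
print( "<info%s />" % pairs_to_attr_string( info ) )
```

On Python 3.7 and later, plain dicts also keep insertion order as a language guarantee, so there the unmodified recipe should emit attributes in the order they were written.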