This is a simple interface to a collection of classes which serialize Python objects to XML and back.
This is one part of a trio of recipes; the other two provide the XML2Py and Py2XML classes that this module imports.
'''
Sirius.py
This is a simple interface to a collection of classes which
serialize Python objects to XML and back.
The system is very lightweight and was not intended for complex XML.
Usage:
    xmlString = Sirius.serialize( pythonObj )
    pythonObj = Sirius.deserialize( xmlString )
'''
from XML2Py import XML2Py
from Py2XML import Py2XML

def deserialize( xmlString ):
    # XML string -> Python data structure
    deserializer = XML2Py()
    return deserializer.parse( xmlString )

def serialize( pyObject, root=None ):
    # Python data structure -> XML string
    serializer = Py2XML()
    return serializer.parse( pyObject, root )

def main():
    test_xml = '''
    <documents>
        <document date="June 6, 2009" title="The Newness of Python" author="John Doe">
            <copyright type="CC" url="http://www.creativecommons.org/" date="June 24, 2009" />
            <text>Python is very nice. Very, very nice.</text>
            <formats>
                <format type="pdf">
                    <info uri="http://www.python.org/newness-of-python.pdf" pages="245" />
                </format>
                <format type="web">
                    <info uri="http://www.python.org/newness-of-python.html" />
                </format>
            </formats>
        </document>
    </documents>
    '''

    # This is not for use, just to see how XML compares to Python
    data_output = '''
    {'documents': [
        { 'title': 'The Newness of Python',
          'date': 'June 6, 2009',
          'author': 'John Doe',
          'copyright': {
              'url': 'http://www.creativecommons.org/',
              'date': 'June 24, 2009',
              'type': 'CC'},
          'text': ['Python is very nice. Very, very nice.'],
          'formats': [
              { 'type': 'pdf',
                'info': {
                    'uri': 'http://www.python.org/newness-of-python.pdf',
                    'pages': '245'}
              },
              { 'type': 'web',
                'info': {
                    'uri': 'http://www.python.org/newness-of-python.html'}
              }
          ]
        }
      ]
    }
    '''

    print( test_xml )

    # Round trip twice: XML -> Python -> XML -> Python -> XML
    deserialized1 = deserialize( test_xml )
    #print( deserialized1 )
    serialized1 = serialize( deserialized1 )
    #print( serialized1 )
    deserialized2 = deserialize( serialized1 )
    print( deserialized2 )
    serialized2 = serialize( deserialized2 )
    print( serialized2 )

    # compare using Python data structures
    if deserialized1 == deserialized2:
        print( "They are equal" )

if __name__ == '__main__':
    main()
Solution
The idea behind these three files was to create a simple way to move between Python data structures and XML: simple in terms of the API as well as the format of both the resulting Python and XML.
The original goal was to process XML data moving between a Flex UI and a backend powered by CherryPy and MongoDB. These classes provided the translation needed in the CherryPy layer.
This solution was intended for:
- simple XML ( i.e., no namespaces, comments, CDATA, etc. )
- small XML files ( in-memory processing with lxml )
Though the code is only intended to process relatively simple XML, it can work with any arbitrarily-nested XML which follows the guidelines for structure.
The Python data structure generated from XML was intended for storage in a database, not necessarily for programmatic access. Of course, this doesn't mean you can't access the elements in code; it just means that I didn't think about the cleanliness of how it would look.
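To give a feel for what programmatic access looks like, here is a small sketch. The `data` dict is copied by hand from the `data_output` sample in the listing (which is itself only an illustration), so treat the exact shape as an assumption rather than guaranteed output of `deserialize`.

```python
# Hand-copied from the data_output sample in the listing; this is an
# assumed shape, not captured output from Sirius.deserialize().
data = {
    'documents': [
        {
            'title': 'The Newness of Python',
            'date': 'June 6, 2009',
            'author': 'John Doe',
            'text': ['Python is very nice. Very, very nice.'],
            'formats': [
                {'type': 'pdf',
                 'info': {'uri': 'http://www.python.org/newness-of-python.pdf',
                          'pages': '245'}},
                {'type': 'web',
                 'info': {'uri': 'http://www.python.org/newness-of-python.html'}},
            ],
        }
    ]
}

# Container nodes are lists, so children are reached by index.
doc = data['documents'][0]

# Attributes are plain dict entries; values stay strings.
title = doc['title']
pdf_pages = doc['formats'][0]['info']['pages']

# Text nodes are lists of strings.
body = doc['text'][0]

# Collect one attribute across all children of a container.
uris = [fmt['info']['uri'] for fmt in doc['formats']]

print(title, pdf_pages, body, uris)
```

Note that attribute values such as `pages` come back as strings in this sample, so any numeric use would need an explicit conversion.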
Research
After looking over the Internet for something which would fill my needs I decided to write it myself. ( I'm sure that you've never heard that before )
Some of my inspiration came from recipe 440595, recipe 410469 and recipe 570085. And I also decided that I didn't want to deal with SAX parsing ( recipe 534109, recipe 415983 ), or override Python objects ( recipe 573463 ), or have extra info in the data structure ( recipe 570085 ), or require huge import lists ( recipe 415983 ), or be overly complex ( pyxser, xmlobject ).
Quirks / Caveats
I think there are two quirks with the chosen Python data format:
- XML nodes stored underneath a "container" node ( e.g., format as in <formats> <format /> </formats> ... ), do not have their name stored in the Python structure. It is instead re-constructed in XML by using the parent's name and removing the last character; it is assumed the name will have an 's' on the end to indicate plurality. That said, this also means that accessing elements programmatically works like it should by using array indexing: e.g., ...['documents'][0], ...['formats'][3]. Other than this, all other names are stored as attributes in a Dict.
- Any spans of text not in attributes must be contained within tags with no attributes ( e.g., <body>This is my text</body> ). These strings from nodes are stored as elements in a List. ( XML nodes with text were an afterthought as I didn't need them in my solution, but added the feature in the end to make it more general. )
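The container-naming rule from the first quirk can be sketched as follows. This is not the Py2XML code; it is a minimal illustration using the standard library's `xml.etree.ElementTree`, and the helper names `child_name` and `container_to_xml` are made up for this example.

```python
import xml.etree.ElementTree as ET

def child_name(container_name):
    # The rule described above: drop the last character of the parent's
    # name, on the assumption that it is a plural 's'.
    return container_name[:-1]

def container_to_xml(container_name, items):
    # Each dict in `items` becomes one child element whose tag is derived
    # from the container name and whose attributes come from the dict.
    container = ET.Element(container_name)
    for attrs in items:
        ET.SubElement(container, child_name(container_name), attrs)
    return ET.tostring(container, encoding="unicode")

formats_xml = container_to_xml("formats", [{"type": "pdf"}, {"type": "web"}])
print(formats_xml)

# The rule is purely mechanical, so irregular plurals do not round-trip
# to the name you might expect.
print(child_name("entries"))
```

The second call shows why container names should be simple plurals ending in 's': a name like `entries` would produce children named `entrie`.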
Example
An example can be found in the code listings.
Hi, sorry for my poor English. I have never worked with XML. I am trying to use this recipe, but I don't have an XML string; I have the URL of an XML file and I want to convert that XML to a Python data structure. How should I pass an XML string to your deserialize(...) method? Please help me out.
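One possible way to get from a URL to an XML string is to download the file first and then hand the result to `deserialize`. The sketch below assumes Python 3 and uses only the standard library; the function name `fetch_xml_string`, the UTF-8 default and the example URL are assumptions for illustration, not part of the recipe.

```python
from urllib.request import urlopen

def fetch_xml_string(url, encoding="utf-8", timeout=10):
    """Download the document at `url` and return its contents as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode(encoding)

# Usage (requires the Sirius module from this recipe and a reachable URL;
# the URL below is a placeholder):
#
#   import Sirius
#   xml_string = fetch_xml_string("http://example.com/documents.xml")
#   python_obj = Sirius.deserialize(xml_string)
```

Since the recipe parses with lxml, be aware that lxml rejects decoded text strings that carry an encoding declaration ( `<?xml version="1.0" encoding="..."?>` ); if the file starts with one, passing the undecoded bytes from `response.read()` may work better.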