Welcome, guest | Sign In | My Account | Store | Cart

This is a simple interface to a collection of classes which serialize Python objects to XML and back.

This is one part of a trio of recipes:

Python, 90 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
'''
Sirius.py

This is a simple interface to a collection of classes which
serialize Python objects to XML and back.

The system is very lightweight and was not intended for complex XML

Usage:
  xmlString = Sirius.serialize( pythonObj )
  pythonObj = Sirius.deserialize( xmlString )
'''

from XML2Py import XML2Py
from Py2XML import Py2XML


def deserialize( xmlString ):
    deserializer = XML2Py()
    return deserializer.parse( xmlString )

def serialize( pyObject, root=None ) :
    serializer = Py2XML()
    return serializer.parse( pyObject, root )


def main():

    test_xml = '''
    <documents>
      <document date="June 6, 2009" title="The Newness of Python" author="John Doe">
        <copyright type="CC" url="http://www.creativecommons.org/" date="June 24, 2009" />
        <text>Python is very nice. Very, very nice.</text>
        <formats>
          <format type="pdf">
            <info uri="http://www.python.org/newness-of-python.pdf" pages="245" />
          </format>
          <format type="web">
            <info uri="http://www.python.org/newness-of-python.html" />
          </format>
        </formats>
      </document>
    </documents>
    '''

    # This is not for use, just to see how XML compares to Python
    data_output = '''
    {'documents': [
          { 'title': 'The Newness of Python',
            'date': 'June 6, 2009',
            'author': 'John Doe',
            'copyright': {
                'url': 'http://www.creativecommons.org/',
                'date': 'June 24, 2009',
                'type': 'CC'},
            'text': ['Python is very nice. Very, very nice.'],
            'formats': [
                {   'type': 'pdf',
                    'info': {
                        'uri': 'http://www.python.org/newness-of-python.pdf',
                        'pages': '245'}
                },
                { 'type': 'web',
                    'info': {
                        'uri': 'http://www.python.org/newness-of-python.html'}
                }
            ]
          }
      ]
    }

    '''

    print test_xml
    deserialized1 = deserialize( test_xml )
    #print deserialized1
    serialized1 = serialize( deserialized1 )
    #print serialized1
    deserialized2 = deserialize( serialized1 )
    print deserialized2
    serialized2 = serialize( deserialized2 )
    print serialized2

    # compare using Python data structures
    if deserialized1 == deserialized2:
        print "They are equal"


if __name__ == '__main__':
    main()

Solution

The idea behind these three files was to create a simple way to move between Python data structures and XML. Simple in terms of the API as well as the format of both the resulting python and XML.

The original goal was to process XML data moving between a Flex UI and a backend powered by CherryPy and MongoDB. These classes provided the translation needed in the CherryPy layer.

This solution was intended for:

  • simple XML ( ie, no namespaces, comments, CDATA, etc. )
  • small XML files ( in-memory processing with lxml )

Though the code is only intended to process relatively simple XML, it can work with any arbitrarily-nested XML which follows the guidelines for structure.

The Python data structure generated from XML was intended for storage in a database, not necessarily for programmatic access. Of course this doesn't mean you can't access the elements in code, it just means that I didn't think about the cleanliness of how it would look.

Research

After looking over the Internet for something which would fill my needs I decided to write it myself. ( I'm sure that you've never heard that before )

Some of my inspiration came from recipe 440595, recipe 410469 and recipe 570085. And I also decided that I didn't want to deal with SAX parsing ( recipe 534109, recipe 415983 ), or override Python objects ( recipe 573463 ), or have extra info in the data structure ( recipe 570085 ), or require huge import lists ( recipe 415983 ), or be overly complex ( pyxser, xmlobject ).

Quirks / Caveats

I think there are two quirks with the chosen Python data format:

  1. XML nodes stored underneath a "container" node ( eg, format as in <formats> <format /> </formats> ... ), do not have their name stored in the Python structure. It is instead re-constructed in XML by using the parent's name and removing the last character - it is assumed the name will have an 's' on the end to indicate plurality. That said, this also means that accessing elements programmatically works like it should by using array indexing: eg, ...['documents'][0], ...['formats'][3]. Other than this, all other names are stored as attributes in a Dict.
  2. Any spans of text not in attributes must be contained within tags with no attributes. (eg, <body>This is my text</body>). These strings from nodes are stored as elements in a List. ( XML nodes with text were an afterthought as I didn't need them in my solution, but added the feature in the end to make it more general. )

Example

An example can be found in the code listings.

1 comment

Rupesh Patil 8 years, 5 months ago  # | flag

Hi, sorry to my poor english. I never work on xml, i am trying to use this recipe but i don't have xmlstring. I have url of xml file and i want to convert to xml to python data structure. how should i pass xmlstring to your deserialize(...) method. Please help me out.

Created by David McCuskey on Wed, 16 Jun 2010 (MIT)
Python recipes (4591)
David McCuskey's recipes (3)

Required Modules

  • (none specified)

Other Information and Tasks