This code serializes a Python data structure into XML.

This is one part of a trio of recipes:

For more information

See XML to Python data structure Recipe #577266

Python, 119 lines
```python
'''
Py2XML - Python to XML serialization

This code transforms a Python data structure into an XML document.

Usage:
    serializer = Py2XML()
    xml_string = serializer.parse( python_object )
    print( python_object )
    print( xml_string )
'''

# escape() handles &, < and > in text; quoteattr() also quotes attribute values
from xml.sax.saxutils import escape, quoteattr


class Py2XML( object ):

    def __init__( self ):

        self.data = "" # where we store the processed XML string

    def parse( self, pythonObj, objName=None ):
        '''
        processes Python data structure into XML string
        needs objName if pythonObj is a List or a plain value
        '''
        if pythonObj is None:
            return ""

        if isinstance( pythonObj, dict ):
            self.data = self._PyDict2XML( pythonObj )

        elif isinstance( pythonObj, list ):
            # we need name for List object
            self.data = self._PyList2XML( pythonObj, objName )

        else:
            self.data = "<%(n)s>%(o)s</%(n)s>" % { 'n':objName, 'o':escape( str( pythonObj ) ) }

        return self.data

    def _PyDict2XML( self, pyDictObj, objName=None ):
        '''
        process Python Dict objects
        They can store XML attributes and/or children
        '''
        tagStr = ""     # XML string for this level
        attributes = {} # attribute key/value pairs
        attrStr = ""    # attribute string of this level
        childStr = ""   # XML string of this level's children

        for k, v in pyDictObj.items():

            if isinstance( v, dict ):
                # child tags, with attributes
                childStr += self._PyDict2XML( v, k )

            elif isinstance( v, list ):
                # child tags, list of children
                childStr += self._PyList2XML( v, k )

            else:
                # tag could have many attributes, let's save until later
                attributes[k] = v

        if objName is None:
            return childStr

        # create XML string for attributes; quoteattr() adds the quotes
        # and escapes characters such as & and <
        for k, v in attributes.items():
            attrStr += " %s=%s" % ( k, quoteattr( str( v ) ) )

        # let's assemble our tag string
        if childStr == "":
            tagStr += "<%(n)s%(a)s />" % { 'n':objName, 'a':attrStr }
        else:
            tagStr += "<%(n)s%(a)s>%(c)s</%(n)s>" % { 'n':objName, 'a':attrStr, 'c':childStr }

        return tagStr

    def _PyList2XML( self, pyListObj, objName=None ):
        '''
        process Python List objects
        They have no attributes, just children
        Lists only hold Dicts or Strings
        '''
        tagStr = ""    # XML string for this level
        childStr = ""  # XML string of children

        for childObj in pyListObj:

            if isinstance( childObj, dict ):
                # here's some Magic
                # we're assuming that List parent has a plural name of child:
                # eg, persons > person, so cut off last char
                # name-wise, only really works for one level, however
                # in practice, this is probably ok
                childStr += self._PyDict2XML( childObj, objName[:-1] )
            else:
                # strings become the (escaped) text content of the list's tag
                childStr += escape( str( childObj ) )

        if objName is None:
            return childStr

        tagStr += "<%(n)s>%(c)s</%(n)s>" % { 'n':objName, 'c':childStr }

        return tagStr


def main():

    python_object = {
        'documents': [
            {
                'formats': [
                    { 'info': { 'uri': 'http://www.python.org/newness-of-python.pdf', 'pages': '245' }, 'type': 'pdf' },
                    { 'info': { 'uri': 'http://www.python.org/newness-of-python.html' }, 'type': 'web' },
                ],
                'copyright': { 'url': 'http://www.creativecommons.org/', 'date': 'June 24, 2009', 'type': 'CC' },
                'title': 'The Newness of Python',
                'date': 'June 6, 2009',
                'text': [ 'Python is very nice. Very, very nice.' ],
                'author': 'John Doe',
            }
        ]
    }

    serializer = Py2XML()
    xml_string = serializer.parse( python_object )
    print( python_object )
    print( xml_string )


if __name__ == '__main__':
    main()
```
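For comparison, the same mapping rules (scalar dict values become attributes, nested dicts and lists become child tags, and a plural list name such as `formats` yields singular children such as `format`) can be sketched on top of the standard library's `xml.etree.ElementTree`, which escapes text and attribute values itself. This is a swapped-in technique, not the author's code: the `py_to_element` name is hypothetical, it assumes Python 3 (for `encoding="unicode"`), it requires a root tag name because an ElementTree document has exactly one root element, and it only mirrors the recipe exactly for lists that hold either dicts or strings, not a mix of both.

```python
import xml.etree.ElementTree as ET


def py_to_element(name, value):
    """Hypothetical helper: build an Element for `value` using the recipe's rules.

    - dict: scalar values become attributes, dict/list values become child tags
    - list: dict items become children named name[:-1] (formats -> format),
      other items are appended to the tag's text
    - anything else becomes the text of a tag called `name`
    """
    elem = ET.Element(name)
    if isinstance(value, dict):
        for key, val in value.items():
            if isinstance(val, (dict, list)):
                elem.append(py_to_element(key, val))  # nested structure -> child tag
            else:
                elem.set(key, str(val))               # scalar -> attribute
    elif isinstance(value, list):
        for item in value:
            if isinstance(item, dict):
                elem.append(py_to_element(name[:-1], item))  # plural -> singular
            else:
                elem.text = (elem.text or "") + str(item)
    else:
        elem.text = str(value)
    return elem


doc = [{"title": "The Newness of Python", "text": ["Python is very nice."]}]
print(ET.tostring(py_to_element("documents", doc), encoding="unicode"))
# <documents><document title="The Newness of Python"><text>Python is very nice.</text></document></documents>
```

Letting ElementTree do the serialization means characters such as `&`, `<` and `"` in the data cannot produce malformed XML, at the cost of the single-root restriction noted above.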

2 comments

Diego Calzadilla 9 years, 8 months ago

Hi, this code is great. I have tested it, and I have a question:

If I change python_object by adding one more element to the format's info, for example 'age':'15':

python_object = {'documents': [{'formats': [{'info': {'uri': 'http://www.python.org/newness-of-python.pdf', 'pages': '245', 'age':'15'}, 'type': 'pdf'}, {'info': {'uri': 'http://www.python.org/newness-of-python.html'}, 'type': 'web'}], 'copyright': {'url': 'http://www.creativecommons.org/', 'date': 'June 24, 2009', 'type': 'CC'}, 'title': 'The Newness of Python', 'date': 'June 6, 2009', 'text': ['Python is very nice. Very, very nice.'], 'author': 'John Doe'}]}

I get the following XML:

<documents>
  <document author="John Doe" date="June 6, 2009" title="The Newness of Python">
    <copyright date="June 24, 2009" type="CC" url="http://www.creativecommons.org/" />
    <text>Python is very nice. Very, very nice.</text>
    <formats>
      <format type="pdf">
        <info age="15" pages="245" uri="http://www.python.org/newness-of-python.pdf" />
      </format>
      <format type="web">
        <info uri="http://www.python.org/newness-of-python.html" />
      </format>
    </formats>
  </document>
</documents>

As you can see, age="15" is the first attribute even though I wrote it at the end of my dict, and I notice that this code sorts the elements of the dicts alphabetically. My question is:

Is there a way to avoid this alphabetical order?

David McCuskey (author) 9 years, 7 months ago

Two things to mention:

  1. It appears that the dictionary method items() sorts names automatically, though the official Python docs state that the order is arbitrary and not guaranteed. If it didn't sort automatically, I would update the code so that the output was still alphabetical, so that the resulting XML is the same every time. This is necessary for testing purposes only.

  2. I use a dict to store the XML attributes. A common trait of dicts is that the order of the items stored inside isn't guaranteed. This works for XML attributes because attributes can be listed in any order and it's still the same XML, so a dict is a good, simple fit.

If you want your XML attributes to retain their order, then another data structure would have to be used, such as an array (a Python list) of name/value pairs.
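A note for current Python versions: since Python 3.7, dicts are guaranteed to preserve insertion order, so the recipe's items() loop emits attributes in the order they were written. The sketch below shows the two options from the reply side by side: sorting names for deterministic output (point 1) and passing a list of (name, value) pairs to keep a chosen order. The `attr_string` helper is hypothetical and not part of the recipe; it assumes attribute names are already valid XML names.

```python
from xml.sax.saxutils import quoteattr


def attr_string(attributes, sort=False):
    """Hypothetical helper: build the ' name="value"' part of a start tag.

    `attributes` may be a dict or a list of (name, value) pairs. Pairs are
    emitted in the order given unless sort=True, which orders them by name
    so the output is the same on every run.
    """
    pairs = attributes.items() if isinstance(attributes, dict) else attributes
    if sort:
        pairs = sorted(pairs)
    # quoteattr() escapes &, < and > and wraps the value in quotes
    return "".join(" %s=%s" % (name, quoteattr(str(value))) for name, value in pairs)


info = [("uri", "http://www.python.org/newness-of-python.pdf"), ("pages", "245"), ("age", "15")]
print("<info%s />" % attr_string(info))
# <info uri="http://www.python.org/newness-of-python.pdf" pages="245" age="15" />
print("<info%s />" % attr_string(dict(info), sort=True))
# <info age="15" pages="245" uri="http://www.python.org/newness-of-python.pdf" />
```

Either form yields equivalent XML, since attribute order carries no meaning in XML; the choice only affects how the output reads and whether it can be compared byte for byte in tests.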

Created by David McCuskey on Wed, 16 Jun 2010 (MIT)