
A generic script using expat to convert XML into objects

Python, 92 lines
"""
Borrowed from wxPython XML tree demo and modified.
"""

import string
from xml.parsers import expat

class Element:
    'A parsed XML element'
    def __init__(self,name,attributes):
        'Element constructor'
        # The element's tag name
        self.name = name
        # The element's attribute dictionary
        self.attributes = attributes
        # The element's cdata
        self.cdata = ''
        # The element's child element list (sequence)
        self.children = []
        
    def AddChild(self,element):
        'Add a reference to a child element'
        self.children.append(element)
        
    def getAttribute(self,key):
        'Get an attribute value'
        return self.attributes.get(key)
    
    def getData(self):
        'Get the cdata'
        return self.cdata
        
    def getElements(self,name=''):
        'Get a list of child elements'
        # If no tag name is specified, return all children
        if not name:
            return self.children
        else:
            # else return only those children with a matching tag name
            elements = []
            for element in self.children:
                if element.name == name:
                    elements.append(element)
            return elements

class Xml2Obj:
    'XML to Object'
    def __init__(self):
        self.root = None
        self.nodeStack = []
        
    def StartElement(self,name,attributes):
        'SAX start element event handler'
        # Instantiate an Element object (the name stays a text string so it
        # compares equal to the names passed to getElements)
        element = Element(name, attributes)
        
        # Push element onto the stack and make it a child of parent
        if len(self.nodeStack) > 0:
            parent = self.nodeStack[-1]
            parent.AddChild(element)
        else:
            self.root = element
        self.nodeStack.append(element)
        
    def EndElement(self,name):
        'SAX end element event handler'
        self.nodeStack.pop()

    def CharacterData(self,data):
        'SAX character data event handler'
        # Skip whitespace-only text between tags
        if data.strip():
            element = self.nodeStack[-1]
            element.cdata += data

    def Parse(self,filename):
        # Create a SAX parser
        Parser = expat.ParserCreate()

        # SAX event handlers
        Parser.StartElementHandler = self.StartElement
        Parser.EndElementHandler = self.EndElement
        Parser.CharacterDataHandler = self.CharacterData

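        # Parse the XML file; binary mode lets expat apply the document's
        # declared encoding, and the with block closes the file afterwards
        with open(filename, 'rb') as xmlFile:
            Parser.ParseFile(xmlFile)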
        
        return self.root
    
if __name__ == '__main__':
    # Example usage: parse sample.xml from the current directory
    parser = Xml2Obj()
    element = parser.Parse('sample.xml')

I saw Christoph Dietze's script, which turns the structure of an XML document into a combination of dictionaries and lists, and thought it was a good idea. xml2obj is a variation on that concept and differs in three ways:

  1. It uses expat. The DOM parser has a lot of overhead because it creates its own data structures, which that script then used only to build a second set of structures.

  2. It uses a stack to append child elements to their parent.

  3. It uses classes to make the elements easier to access.
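
The stack-based approach in point 2 can be sketched in a few lines. This is a minimal, self-contained illustration rather than the recipe itself: `Node`, `parse_string`, and `dump` are hypothetical names, and the XML is parsed from an in-memory string instead of sample.xml.

```python
from xml.parsers import expat


class Node:
    """Hypothetical stand-in for the recipe's Element class."""
    def __init__(self, name, attributes):
        self.name = name
        self.attributes = attributes
        self.cdata = ''
        self.children = []


def parse_string(text):
    """Build a Node tree from an XML string using expat callbacks."""
    stack = []   # currently open elements, innermost last
    roots = []   # receives the document root when it is first seen

    def start(name, attributes):
        node = Node(name, attributes)
        if stack:
            stack[-1].children.append(node)   # attach to the open parent
        else:
            roots.append(node)                # first element is the root
        stack.append(node)

    def end(name):
        stack.pop()                           # element is now closed

    def chars(data):
        if data.strip():                      # ignore whitespace-only runs
            stack[-1].cdata += data

    parser = expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(text, True)
    return roots[0]


def dump(node, level=0):
    """Return one indented line per element: name, attributes, text."""
    lines = ['%s%s %r %r' % ('  ' * level, node.name,
                             node.attributes, node.cdata)]
    for child in node.children:
        lines.extend(dump(child, level + 1))
    return lines


root = parse_string(
    '<books><book id="1">Dune</book><book id="2">Emma</book></books>')
print('\n'.join(dump(root)))
```

Because the innermost open element is always the last item on the stack, each new element can be attached to its parent without any searching.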

5 comments

C. Yu 18 years, 7 months ago

Output. How do you print out the parsed xml object? Do you have any examples? Thanks.

bob w 17 years, 4 months ago

I took this code and updated it to where it works pretty well for me now. First, I changed the Element to have its attributes in a dictionary. Then I added a few methods to provide easy access within the tree. Since I am very new to XML, I borrowed ideas from this recipe and a few notes by Uche Ogbuji. If you are interested, I would be happy to send the code with examples. I have never commented here nor posted any code so I really don't know how to approach it.

Doug Blank 15 years, 11 months ago

obj2xml: and back again. Add the method below to Element to recreate the XML, then print the result:

print(element.toString())

You could even name the method __str__ to make it easy.

def toString(self, level=0):
    retval = " " * level
    retval += "<%s" % self.name
    for attribute in self.attributes:
        retval += " %s=\"%s\"" % (attribute, self.attributes[attribute])
    c = ""
    for child in self.children:
        c += child.toString(level+1)
    if c == "":
        retval += "/>\n"
    else:
        retval += ">\n" + c + (" " * level) + ("</%s>\n" % self.name)
    return retval
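
Note that the method above writes element names and attributes but not the character data, and attribute values are not escaped. One possible variant is sketched here as a standalone function. It is not a drop-in replacement: `to_xml` and the small `Element` stand-in are hypothetical names, while `escape` and `quoteattr` come from the standard `xml.sax.saxutils` module.

```python
from xml.sax.saxutils import escape, quoteattr


class Element:
    """Minimal stand-in with the same fields as the recipe's Element."""
    def __init__(self, name, attributes=None, cdata=''):
        self.name = name
        self.attributes = attributes or {}
        self.cdata = cdata
        self.children = []


def to_xml(element, level=0):
    """Serialize an element tree, including text, with escaping."""
    indent = ' ' * level
    # quoteattr adds the surrounding quotes and escapes the value
    attrs = ''.join(' %s=%s' % (key, quoteattr(value))
                    for key, value in element.attributes.items())
    text = escape(element.cdata)
    if not element.children and not text:
        return '%s<%s%s/>\n' % (indent, element.name, attrs)
    if not element.children:
        # Text-only element: keep it on one line
        return '%s<%s%s>%s</%s>\n' % (indent, element.name, attrs,
                                      text, element.name)
    body = ''.join(to_xml(child, level + 1) for child in element.children)
    if text:
        # The recipe concatenates all text of an element into cdata, so the
        # original position of text in mixed content cannot be restored;
        # this sketch simply writes it before the children.
        body = '%s %s\n' % (indent, text) + body
    return '%s<%s%s>\n%s%s</%s>\n' % (indent, element.name, attrs,
                                      body, indent, element.name)


root = Element('books')
root.children.append(Element('book', {'title': 'R&D "notes"'}, 'a < b'))
root.children.append(Element('empty'))
out = to_xml(root)
print(out)
```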

david SHI 12 years, 11 months ago

Has anyone tried it on GML with CDATA sections to extract the data?

david SHI 12 years, 11 months ago

What I can see is that it takes in a sample.xml file. I have got a sample.xml file now, but how do I save data into a new file?

Basically, it seems to be a good idea but there is a lack of documentation/instruction on how to use it.
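
On saving data to a new file: the recipe only builds an in-memory tree, so writing output is left to the caller. One possible approach, sketched under the assumption that each node exposes `name`, `cdata`, and `children` as the recipe's Element does, is to flatten the tree into rows and write them with the standard `csv` module. The `node`, `rows`, and `save_csv` helpers and the two-column layout are all hypothetical choices for illustration.

```python
import csv
import os
import tempfile
from types import SimpleNamespace


def node(name, cdata='', children=()):
    """Hypothetical helper: an object shaped like the recipe's Element."""
    return SimpleNamespace(name=name, attributes={}, cdata=cdata,
                           children=list(children))


def rows(element, path=''):
    """Yield (path, text) pairs for every element that carries text."""
    here = path + '/' + element.name
    if element.cdata:
        yield here, element.cdata
    for child in element.children:
        yield from rows(child, here)


def save_csv(element, filename):
    """Write the flattened tree to a new CSV file."""
    with open(filename, 'w', newline='') as handle:
        writer = csv.writer(handle)
        writer.writerow(['path', 'text'])
        writer.writerows(rows(element))


# A hand-built tree stands in for the result of Xml2Obj().Parse(...)
root = node('books', children=[node('book', 'Dune'), node('book', 'Emma')])
out_path = os.path.join(tempfile.mkdtemp(), 'books.csv')
save_csv(root, out_path)
with open(out_path) as handle:
    print(handle.read())
```

With the real recipe, the root returned by Parse could be passed to `save_csv` in place of the hand-built tree.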