A generic script using expat to convert XML into objects
"""
Borrowed from wxPython XML tree demo and modified.
"""
from xml.parsers import expat


class Element:
    'A parsed XML element'

    def __init__(self, name, attributes):
        'Element constructor'
        # The element's tag name
        self.name = name
        # The element's attribute dictionary
        self.attributes = attributes
        # The element's cdata
        self.cdata = ''
        # The element's child element list (sequence)
        self.children = []

    def AddChild(self, element):
        'Add a reference to a child element'
        self.children.append(element)

    def getAttribute(self, key):
        'Get an attribute value'
        return self.attributes.get(key)

    def getData(self):
        'Get the cdata'
        return self.cdata

    def getElements(self, name=''):
        'Get a list of child elements'
        # If no tag name is specified, return all children
        if not name:
            return self.children
        # Otherwise return only those children with a matching tag name
        return [element for element in self.children if element.name == name]


class Xml2Obj:
    'XML to Object'

    def __init__(self):
        self.root = None
        self.nodeStack = []

    def StartElement(self, name, attributes):
        'SAX start element event handler'
        # Instantiate an Element object
        element = Element(name, attributes)
        # Make the element a child of its parent, or the root if there is none
        if self.nodeStack:
            parent = self.nodeStack[-1]
            parent.AddChild(element)
        else:
            self.root = element
        # Push the element onto the stack
        self.nodeStack.append(element)

    def EndElement(self, name):
        'SAX end element event handler'
        # Pop the element that just closed
        self.nodeStack.pop()

    def CharacterData(self, data):
        'SAX character data event handler'
        # Ignore chunks that consist only of whitespace
        if data.strip():
            element = self.nodeStack[-1]
            element.cdata += data

    def Parse(self, filename):
        # Create a SAX parser
        Parser = expat.ParserCreate()
        # SAX event handlers
        Parser.StartElementHandler = self.StartElement
        Parser.EndElementHandler = self.EndElement
        Parser.CharacterDataHandler = self.CharacterData
        # Parse the XML file; reading bytes lets expat honor the declared encoding
        with open(filename, 'rb') as xmlfile:
            Parser.Parse(xmlfile.read(), True)
        return self.root


parser = Xml2Obj()
element = parser.Parse('sample.xml')
I saw Christoph Dietze's script, which turns the structure of an XML document into a combination of dictionaries and lists, and thought it was a good idea. xml2obj is a variation on that concept and differs in three ways:
It uses expat. The DOM parser carries a lot of overhead because it builds its own data structures, which were then used only to create a second set of structures.
It uses a stack to attach child elements to their parent.
It uses classes to make the elements easier to access.
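The three points above can be sketched in compact form. This is a minimal illustration rather than the recipe itself: `Node` and `parse_string` are hypothetical names, and the sketch parses an in-memory string instead of a file.

```python
from xml.parsers import expat


class Node:
    """Hypothetical stand-in for the recipe's Element class."""

    def __init__(self, name, attributes):
        self.name = name
        self.attributes = attributes
        self.cdata = ''
        self.children = []


def parse_string(text):
    """Build a Node tree from XML text using expat callbacks and a stack."""
    stack = []   # currently open elements, innermost last
    roots = []   # receives the single document element

    def start(name, attributes):
        node = Node(name, attributes)
        # The element on top of the stack is the parent of the new one.
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)

    def end(name):
        # The innermost open element has just been closed.
        stack.pop()

    def chars(data):
        # Skip chunks that are only whitespace, as the recipe does.
        if data.strip():
            stack[-1].cdata += data

    parser = expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(text, True)
    return roots[0]


root = parse_string('<book id="1"><title>Expat</title><page/><page/></book>')
print(root.name, root.attributes['id'])
print(root.children[0].cdata)
print([child.name for child in root.children])
```

No intermediate DOM is built: each expat callback updates the object tree directly, and the stack always holds the chain of elements that are still open.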
Output. How do you print out the parsed XML object? Do you have any examples? Thanks.
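One way to print the parsed object is to walk the tree recursively. The sketch below assumes only that each node has the `name`, `attributes`, `cdata` and `children` members of the recipe's `Element`. `format_tree` is a hypothetical helper, and the small tree is built by hand so that the example runs on its own.

```python
class Element:
    """Same data members as the recipe's Element, without the methods."""

    def __init__(self, name, attributes):
        self.name = name
        self.attributes = attributes
        self.cdata = ''
        self.children = []


def format_tree(element, depth=0):
    """Return one text line per element, indented by nesting depth."""
    line = '  ' * depth + element.name
    # Sort the attributes so the output is stable.
    for key, value in sorted(element.attributes.items()):
        line += ' %s=%r' % (key, value)
    if element.cdata:
        line += ': ' + element.cdata
    lines = [line]
    for child in element.children:
        lines.extend(format_tree(child, depth + 1))
    return lines


# Hand-built tree standing in for the result of Xml2Obj().Parse(...)
root = Element('book', {'id': '1'})
title = Element('title', {})
title.cdata = 'Expat'
root.children.append(title)
root.children.append(Element('page', {'n': '1'}))

tree_lines = format_tree(root)
print('\n'.join(tree_lines))
```

With the recipe itself, the same helper could be applied to the object returned by `Xml2Obj().Parse('sample.xml')`.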
I took this code and updated it to the point where it works pretty well for me now. First, I changed the Element to have its attributes in a dictionary. Then I added a few methods to provide easy access within the tree. Since I am very new to XML, I borrowed ideas from this recipe and a few notes by Uche Ogbuji. If you are interested, I would be happy to send the code with examples. I have never commented here or posted any code, so I really don't know how to approach it.
obj2xml: and back again. Add this method to Element to recreate the XML:
Or you could even name this method __str__ to make it easy.
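The method this comment refers to is not reproduced in the text above. A minimal sketch of what such a method might look like follows; the name `toxml` and its output format are assumptions, not the commenter's code. It uses `escape` and `quoteattr` from `xml.sax.saxutils` so that special characters survive the round trip.

```python
from xml.sax.saxutils import escape, quoteattr


class Element:
    """Same data members as the recipe's Element, plus a serializer."""

    def __init__(self, name, attributes):
        self.name = name
        self.attributes = attributes
        self.cdata = ''
        self.children = []

    def toxml(self):
        """Hypothetical method: rebuild XML text from this element."""
        attrs = ''.join(' %s=%s' % (key, quoteattr(value))
                        for key, value in self.attributes.items())
        if not self.cdata and not self.children:
            return '<%s%s/>' % (self.name, attrs)
        # The recipe does not record where text sat relative to child
        # elements, so this sketch writes the text first.
        body = escape(self.cdata) + ''.join(c.toxml() for c in self.children)
        return '<%s%s>%s</%s>' % (self.name, attrs, body, self.name)

    # Aliasing the method lets str(element) and print(element) use it.
    __str__ = toxml


root = Element('book', {'id': '1'})
title = Element('title', {})
title.cdata = 'Tom & Jerry'
root.children.append(title)
root.children.append(Element('page', {}))

xml_text = str(root)
print(xml_text)
```

The resulting string can be written to a new file with an ordinary `open(..., 'w')` and `write` call. Because the original whitespace and the position of mixed text are not kept by the recipe, the output is equivalent to the input only for simple documents.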
Has anyone tried it on GML containing CDATA sections to extract the data?
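As far as expat is concerned, the text inside a CDATA section is delivered through the same `CharacterDataHandler` as ordinary text, so the recipe's `CharacterData` method should receive it. The sketch below checks this with expat alone; the GML-like snippet and the `collect_text` helper are made up for illustration.

```python
from xml.parsers import expat


def collect_text(xml_text):
    """Return all character data in the document as a single string."""
    chunks = []
    parser = expat.ParserCreate()
    # expat may deliver the text in several pieces, so collect and join them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_text, True)
    return ''.join(chunks)


# Hypothetical input: markup characters are protected by the CDATA section.
sample = '<coordinates><![CDATA[1,2 <3,4>]]></coordinates>'
text = collect_text(sample)
print(text)
```

One caveat: the recipe skips chunks that consist only of whitespace, so line breaks inside a multi-line CDATA section may be dropped from `cdata`.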
From what I can see, it takes a sample.xml file as input. I have a sample.xml file now, but how do I save the data into a new file?
Basically, it seems to be a good idea but there is a lack of documentation/instruction on how to use it.