ActiveState Code

Recipe 65248: Parsing an XML file with xml.parsers.expat


This is a reusable way to use "xml.parsers.expat" to parse an XML file. When re-using the "MyXML" class, all you need to define a new class, with "MyXML" as the parent. Once you have done that, all you have to do is overwrite the inherited XML handlers and you are ready to go.

Python
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
import xml.parsers.expat, sys

class MyXML:
        Parser = ""

        # prepare for parsing

        def __init__(self, xml_file):
                assert(xml_file != "")
                self.Parser = xml.parsers.expat.ParserCreate()

                self.Parser.CharacterDataHandler = self.handleCharData
                self.Parser.StartElementHandler = self.handleStartElement
                self.Parser.EndElementHandler = self.handleEndElement

        # parse the XML file
  
        def parse(self):
                try:
                        self.Parser.ParseFile(open(xml_file, "r"))
                except:
                        print "ERROR: Can't open XML file!"
                        sys.exit(0)

         # will be overwritten w/ implementation specific methods

        def handleCharData(self, data): pass
        def handleStartElement(self, name, attrs): pass
        def handleEndElement(self, name): pass

Discussion

I think this is a pretty good solution, it gives you a simple, consistant, and abstracted way to parse XML files.

Comments

  1. 1. At 2:18 a.m. on 7 dec 2007, Sebastian Hoehn said:

    Forgot to store xml_file. Just an addition, after 6 years ;)

    Just add self.xml_file = xml_file to the constructor and use self.xml_file in the parse function, otherwise you will not get the actual file!

    Regards,

    Sebastian

Sign in to comment