PyXML is a useful package for parsing XML. Its xmlval and xmldtd modules let you validate XML documents against an external DTD file. This simple recipe shows how to use those two modules for validated XML parsing.
from xml.parsers.xmlproc import xmlproc
from xml.parsers.xmlproc import xmlval
from xml.parsers.xmlproc import xmldtd

# code to handle XML parsing goes here
class MyApp(xmlproc.Application):
    def handle_start_tag(self,name,attrs):
        pass
    def handle_end_tag(self,name):
        pass
    def handle_data(self,data,start,end):
        pass
    def handle_comment(self,data):
        pass

# XML file and corresponding DTD definition
file = 'test.xml'
dtd = 'test.dtd'

# standard XML parsing, without validation against DTD
print 'Start XML Parsing (No DTD)'
p = xmlproc.XMLProcessor()
p.set_application(MyApp())
p.parse_resource(file)
print 'End XML Parsing (No DTD)'
print

# XML parsing, with validation against external DTD
# Since you are referencing an external DTD from
# test.xml, you'll need markers like:
#
# <?xml version="1.0"?>
# <!DOCTYPE base SYSTEM "test.dtd">
#
# (where 'base' is the root element of the XML doc)
# at the top of your XML doc
print 'Start XML Parsing (With DTD)'
d = xmldtd.load_dtd(dtd)
p = xmlval.XMLValidator()
p.set_application(MyApp())
p.parse_resource(file)
print 'End XML Parsing (With DTD)'
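To try the recipe you need a test.xml and test.dtd pair to parse. The helper below is a minimal sketch that writes such a pair using only the standard library. Only the DOCTYPE line and the 'base' root element come from the recipe's comments. The 'item' element, the sample text, and the name write_samples are made up for illustration.

```python
import os

# Hypothetical DTD: a 'base' root holding any number of 'item' elements.
# Only the root name 'base' comes from the recipe; 'item' is invented.
SAMPLE_DTD = (
    '<!ELEMENT base (item*)>\n'
    '<!ELEMENT item (#PCDATA)>\n'
)

# The first two lines are the markers the recipe's comments call for.
SAMPLE_XML = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE base SYSTEM "test.dtd">\n'
    '<base>\n'
    '  <item>first</item>\n'
    '  <item>second</item>\n'
    '</base>\n'
)

def write_samples(directory='.'):
    """Write test.xml and test.dtd into directory and return both paths."""
    xml_path = os.path.join(directory, 'test.xml')
    dtd_path = os.path.join(directory, 'test.dtd')
    with open(dtd_path, 'w') as f:
        f.write(SAMPLE_DTD)
    with open(xml_path, 'w') as f:
        f.write(SAMPLE_XML)
    return xml_path, dtd_path
```

Calling write_samples() in the directory where the recipe runs gives it matching files. Because the DOCTYPE uses a relative system identifier, the two files should sit side by side.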
Documentation on XML parsing in general, and xmlproc in particular, is easy enough to come by. However, I had to dig around a bit to find out how to perform validated parsing (against an external DTD) using xmlval and xmldtd. The above recipe provides pretty much all the information you need for doing this.
Users will want to round out the implementation of MyApp (or any other subclass of xmlproc.Application) so that its handlers perform the application-specific processing they need.
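As one illustration of rounding out the handlers, here is a rough sketch of an application class that counts start tags and collects non-blank text. The name TagCounter and its counts and text attributes are hypothetical. The sketch assumes that handle_data receives a buffer plus start and end offsets, so that data[start:end] is the text, as the handler signature in the recipe suggests. The fallback to object when PyXML is missing is there only so the handlers can be exercised without the package installed.

```python
try:
    # With PyXML installed, subclass the real application base class.
    from xml.parsers.xmlproc import xmlproc
    _Base = xmlproc.Application
except ImportError:
    # Fallback (an assumption for this sketch): lets the handlers be
    # called directly when PyXML is not available.
    _Base = object

class TagCounter(_Base):
    """Hypothetical handler set: counts start tags, keeps non-blank text."""

    def __init__(self):
        _Base.__init__(self)
        self.counts = {}   # element name -> number of start tags seen
        self.text = []     # non-blank character data chunks, in order

    def handle_start_tag(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

    def handle_end_tag(self, name):
        pass

    def handle_data(self, data, start, end):
        # Assumed contract: the text is the slice data[start:end].
        chunk = data[start:end]
        if chunk.strip():
            self.text.append(chunk)

    def handle_comment(self, data):
        pass
```

To use it with the recipe, create the instance first (app = TagCounter()), pass it to p.set_application(app) in place of MyApp(), and inspect app.counts and app.text after p.parse_resource returns.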