PyXML is a useful package for parsing XML. The xmlval and xmldtd modules let you validate XML docs against an external DTD file. This is a simple, straightforward recipe that illustrates how to use the xmlval and xmldtd modules for validated XML parsing.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 | from xml.parsers.xmlproc import xmlproc
from xml.parsers.xmlproc import xmlval
from xml.parsers.xmlproc import xmldtd
# code to handle XML parsing goes here
class MyApp(xmlproc.Application):
def handle_start_tag(self,name,attrs):
pass
def handle_end_tag(self,name):
pass
def handle_data(self,data,start,end):
pass
def handle_comment(self,data):
pass
# XML file and corresponding DTD definition
file = 'test.xml'
dtd = 'test.dtd'
# standard XML parsing, without validation against DTD
print 'Start XML Parsing (No DTD)'
p = xmlproc.XMLProcessor()
p.set_application(MyApp())
p.parse_resource(file)
print 'End XML Parsing (No DTD)'
print
# XML parsing, with validation against external DTD
# Since you are referencing an external DTD from
# test.xml, you'll need markers like:
#
# <?xml version="1.0"?>
# <!DOCTYPE base SYSTEM "test.dtd">
#
# (where 'base' is the root element of the XML doc)
# at the top of your XML doc
print 'Start XML Parsing (With DTD)'
d = xmldtd.load_dtd(dtd)
p = xmlval.XMLValidator()
p.set_application(MyApp())
p.parse_resource(file)
print 'End XML Parsing (With DTD)'
|
Documentation on xml parsing in general, and xmlproc in particular, is easy enough to come by. However, I had to dig around a bit to find out how perform validated parsing (against an external DTD) using xmlval and xmldtd. The above recipe provides pretty much all the information you need to know for doing this.
Users will want to "round out" the implementation of MyApp (or any other subclass of xmlproc.Application) to perform "application specific" parsing per their particular needs.
No need to load DTD. There is absolutely no need to load the DTD yourself.
If you have defined a DTD in your xml file using:
Python will find that DTD itself and load it to validate your XML file.
It would be very very odd if you would load a DTD in variable 'p', not use that variable in any way and python will magically know that that is the DTD you want to validate with.
Specifying the DTD explicitly. If you really want to specify the DTD explicitly (as opposed to specifying it in the DOCTYPE declaration), use the following code: