
PyXML is a useful package for parsing XML. Its xmlproc parser comes with the xmlval and xmldtd modules, which let you validate XML docs against an external DTD file. This is a simple, straightforward recipe that shows how to use xmlval and xmldtd for validated XML parsing.

from xml.parsers.xmlproc import xmlproc
from xml.parsers.xmlproc import xmlval
from xml.parsers.xmlproc import xmldtd

# code to handle XML parsing goes here
class MyApp(xmlproc.Application):
  def handle_start_tag(self,name,attrs):
    pass
  def handle_end_tag(self,name):
    pass
  def handle_data(self,data,start,end):
    pass
  def handle_comment(self,data):
    pass

# XML file and corresponding DTD definition
file = 'test.xml'
dtd  = 'test.dtd'

# standard XML parsing, without validation against DTD
print 'Start XML Parsing (No DTD)'
p = xmlproc.XMLProcessor()
p.set_application(MyApp())
p.parse_resource(file)
print 'End XML Parsing (No DTD)'
print

# XML parsing, with validation against external DTD
# Since test.xml references an external DTD, it needs
# a DOCTYPE declaration at the top, like:
#
#  <?xml version="1.0"?>
#  <!DOCTYPE base SYSTEM "test.dtd">
#
# (where 'base' is the root element of the XML doc)

print 'Start XML Parsing (With DTD)'
# Note: 'd' is never handed to the validator below; XMLValidator
# locates the DTD through the DOCTYPE declaration in test.xml.
d = xmldtd.load_dtd(dtd)
p = xmlval.XMLValidator()
p.set_application(MyApp())
p.parse_resource(file)
print 'End XML Parsing (With DTD)'

Documentation on XML parsing in general, and xmlproc in particular, is easy enough to come by. However, I had to dig around a bit to find out how to perform validated parsing (against an external DTD) using xmlval and xmldtd. The above recipe provides pretty much all the information you need for doing this.
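
To try the recipe you need a test.xml and a test.dtd that agree with each other. Here is a minimal sketch that writes such a pair and then uses the standard library's expat parser to confirm which DTD the DOCTYPE declaration points at. The element names (base, item), the helper names write_samples and declared_dtd, and the file contents are made up for illustration; only the file names test.xml and test.dtd come from the recipe. It does not need PyXML and does not validate anything itself.

```python
import os
import tempfile
from xml.parsers import expat

# Hypothetical sample DTD: a <base> root holding any number of <item>s.
SAMPLE_DTD = """<!ELEMENT base (item*)>
<!ELEMENT item (#PCDATA)>
<!ATTLIST item id CDATA #REQUIRED>
"""

# Matching document; the DOCTYPE line names the root element and the DTD file.
SAMPLE_XML = """<?xml version="1.0"?>
<!DOCTYPE base SYSTEM "test.dtd">
<base>
  <item id="1">first</item>
  <item id="2">second</item>
</base>
"""


def write_samples(directory):
    """Write test.dtd and test.xml into directory; return their paths."""
    paths = {}
    for name, text in (('test.dtd', SAMPLE_DTD), ('test.xml', SAMPLE_XML)):
        path = os.path.join(directory, name)
        with open(path, 'w') as handle:
            handle.write(text)
        paths[name] = path
    return paths


def declared_dtd(xml_path):
    """Return (root_name, system_id) from the DOCTYPE, or None if absent."""
    found = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found.append((name, system_id))

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    with open(xml_path, 'rb') as handle:
        parser.ParseFile(handle)
    return found[0] if found else None


if __name__ == '__main__':
    sample_dir = tempfile.mkdtemp()
    sample_paths = write_samples(sample_dir)
    print(declared_dtd(sample_paths['test.xml']))
```

If declared_dtd returns None for your own file, the validating parser has no DOCTYPE to follow, which is the first thing to check when validation seems to do nothing.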

Users will want to "round out" the implementation of MyApp (or any other subclass of xmlproc.Application) to perform "application specific" parsing per their particular needs.
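
As one possible way of rounding out the handlers, here is a small sketch of an application class that counts start tags and collects character data. The class name TagCounter and its attributes are invented for this example. It assumes, as I understand the xmlproc interface, that handle_data receives a buffer plus start and end offsets, so the text is data[start:end]. The import falls back to a plain object base so the class can be exercised by hand even where PyXML is not installed.

```python
try:
    # Use the real base class when PyXML is available.
    from xml.parsers.xmlproc import xmlproc
    _Base = xmlproc.Application
except ImportError:
    # Fallback so the sketch can be run without PyXML.
    _Base = object


class TagCounter(_Base):
    """Counts start tags by name and collects character data."""

    def __init__(self):
        if _Base is not object:
            _Base.__init__(self)
        self.counts = {}
        self.text = []

    def handle_start_tag(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

    def handle_end_tag(self, name):
        pass

    def handle_data(self, data, start, end):
        # Assumption: the character data is the slice data[start:end].
        self.text.append(data[start:end])

    def handle_comment(self, data):
        pass


if __name__ == '__main__':
    # Drive the handlers by hand, standing in for the parser.
    app = TagCounter()
    app.handle_start_tag('base', {})
    app.handle_start_tag('item', {'id': '1'})
    app.handle_data('<item id="1">first</item>', 13, 18)
    app.handle_end_tag('item')
    app.handle_end_tag('base')
    print(app.counts)
    print(''.join(app.text))
```

With PyXML installed, an instance would be passed to p.set_application() in place of MyApp() in the recipe.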

2 comments

aspn 19 years, 10 months ago

No need to load DTD. There is absolutely no need to load the DTD yourself.

If you have defined a DTD in your xml file using:

<!DOCTYPE root-node SYSTEM "/path/to/filename.dtd">

The validating parser will find that DTD itself and load it to validate your XML file.

It would be very odd if you could load a DTD into variable 'd', not use that variable in any way, and have Python magically know that this is the DTD you want to validate with.

Marius Gedminas 19 years, 9 months ago

Specifying the DTD explicitly. If you really want to specify the DTD explicitly (as opposed to specifying it in the DOCTYPE declaration), use the following code:

#!/usr/bin/env python
from xml.parsers.xmlproc import xmlproc
from xml.parsers.xmlproc import xmlval
from xml.parsers.xmlproc import xmldtd

def validate_xml(xml_filename, dtd_filename):
    """Validate a given XML file with a given external DTD.

    If the XML file is not valid, an error message will be printed
    to sys.stderr, and the program will be terminated with a non-zero
    exit code.  If the XML file is valid, nothing will be printed.
    """
    dtd = xmldtd.load_dtd(dtd_filename)
    parser = xmlproc.XMLProcessor()
    parser.set_application(xmlval.ValidatingApp(dtd, parser))
    parser.dtd = dtd
    parser.ent = dtd
    # If you want to override error handling, subclass
    # xml.parsers.xmlproc.xmlapp.ErrorHandler and do
    #   parser.set_error_handler(MyErrorHandler(parser))
    parser.parse_resource(xml_filename)
    # If you have xml data only as a string, use instead
    #   parser.feed(xml_data)
    #   parser.close()

if __name__ == '__main__':
    import sys
    validate_xml(sys.argv[1], sys.argv[2])
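
The error-handler hook mentioned in the code comments above could look roughly like the sketch below, which collects messages in a list instead of stopping at the first problem. The class name CollectingErrorHandler is invented here. The sketch assumes that xmlapp.ErrorHandler is constructed with a locator and exposes warning, error and fatal methods taking a message string; check that against your PyXML version before relying on it. The import falls back to a plain object base so the class can be tried without PyXML.

```python
try:
    # Use the real base class when PyXML is available.
    from xml.parsers.xmlproc.xmlapp import ErrorHandler as _Base
except ImportError:
    # Fallback so the sketch can be run without PyXML.
    _Base = object


class CollectingErrorHandler(_Base):
    """Records warnings and errors instead of acting on them."""

    def __init__(self, locator=None):
        if _Base is not object:
            _Base.__init__(self, locator)
        self.locator = locator
        self.messages = []

    def warning(self, msg):
        self.messages.append(('warning', msg))

    def error(self, msg):
        self.messages.append(('error', msg))

    def fatal(self, msg):
        self.messages.append(('fatal', msg))

    def is_clean(self):
        """True if no error or fatal message was recorded."""
        return not any(kind != 'warning' for kind, _ in self.messages)


if __name__ == '__main__':
    # Exercise the handler by hand, standing in for the parser.
    handler = CollectingErrorHandler()
    handler.warning('sample warning')
    handler.error('sample error')
    print(handler.messages)
    print(handler.is_clean())
```

Following the comment in validate_xml, an instance would be installed with parser.set_error_handler(CollectingErrorHandler(parser)) before calling parse_resource, and handler.messages inspected afterwards.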