
This is an example SAX application that can serve as the basis for any SAX application. It is also useful on its own when you want a sense of how often particular elements occur in an XML document.

Python, 16 lines
from xml.sax.handler import ContentHandler
import xml.sax
class countHandler(ContentHandler):
    def __init__(self):
        self.tags = {}  # element name -> number of start tags seen

    def startElement(self, name, attr):
        if name not in self.tags:  # dict.has_key() no longer exists in Python 3
            self.tags[name] = 0
        self.tags[name] += 1

parser = xml.sax.make_parser()
handler = countHandler()
parser.setContentHandler(handler)
parser.parse("test.xml")  # path of the XML file to scan
print(handler.tags)

When I start with a new XML content set, I like to get a sense of what elements are in it and how often they occur, and I use variants of this recipe to do that. Attributes are easy to collect as well, since startElement receives them as its attr argument. If you add a stack, you can keep track of which elements occur within other elements.
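Here is a rough sketch of one such variant, in case it helps. It is only an illustration: the class name NestingHandler, the attrs/paths/stack dictionaries, and the SAMPLE document are all made up for this example, and it parses an in-memory string rather than test.xml.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class NestingHandler(ContentHandler):
    """Counts elements, attribute names, and parent/child pairs."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.tags = {}    # element name -> count
        self.attrs = {}   # (element name, attribute name) -> count
        self.paths = {}   # (parent name, child name) -> count
        self.stack = []   # names of the currently open elements

    def startElement(self, name, attrs):
        self.tags[name] = self.tags.get(name, 0) + 1
        for attr_name in attrs.getNames():
            key = (name, attr_name)
            self.attrs[key] = self.attrs.get(key, 0) + 1
        # The top of the stack is the parent; None marks the root element.
        parent = self.stack[-1] if self.stack else None
        pair = (parent, name)
        self.paths[pair] = self.paths.get(pair, 0) + 1
        self.stack.append(name)

    def endElement(self, name):
        # Leaving an element: it is no longer a possible parent.
        self.stack.pop()


# Hypothetical sample document, parsed from bytes instead of a file.
SAMPLE = b"<doc><sec id='a'><p/><p class='x'/></sec><p/></doc>"

handler = NestingHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.tags)
print(handler.attrs)
print(handler.paths)
```

The stack only ever holds the chain of open elements, so memory use stays proportional to the nesting depth rather than to the document size.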

In fact, this little program shows the basic steps for implementing any SAX application: subclass ContentHandler, create a parser, attach the handler, and parse. Alternatives include xml.dom.pulldom and xml.dom.minidom versions, but they would be overkill for this simple job.
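For comparison, a minidom version might look roughly like the sketch below. The function name count_tags_dom and the SAMPLE string are invented for illustration. Note that minidom builds the whole tree in memory before any counting starts, which is the main reason it is heavier than the SAX handler for this job.

```python
from xml.dom import minidom

# Hypothetical sample document; a real run would read a file instead.
SAMPLE = "<doc><sec id='a'><p/><p class='x'/></sec><p/></doc>"


def count_tags_dom(xml_text):
    """Count element names by walking a fully built DOM tree."""
    dom = minidom.parseString(xml_text)
    tags = {}
    # "*" matches every element in the document, including the root.
    for node in dom.getElementsByTagName("*"):
        tags[node.tagName] = tags.get(node.tagName, 0) + 1
    dom.unlink()  # release the tree once the counts are collected
    return tags


print(count_tags_dom(SAMPLE))
```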

You can learn about other options for ContentHandler subclasses by reading the Python xml.sax.handler documentation.

I know that I could have used setdefault, but I'm kind of old-fashioned. :)
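For what it's worth, here is a small sketch of the counting step written both ways. The function names and the names list are made up for the comparison. With integer values, dict.setdefault only replaces the membership test; the increment still needs its own assignment.

```python
def count_if(names):
    """The recipe's style: test for the key, then increment."""
    tags = {}
    for name in names:
        if name not in tags:
            tags[name] = 0
        tags[name] += 1
    return tags


def count_setdefault(names):
    """Same result, with setdefault supplying the initial zero."""
    tags = {}
    for name in names:
        tags.setdefault(name, 0)  # inserts 0 only when the key is missing
        tags[name] += 1           # ints are immutable, so assign the new value
    return tags


# Hypothetical stream of element names, as startElement would see them.
names = ["doc", "sec", "p", "p", "p"]
print(count_if(names))
print(count_setdefault(names))
```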

2 comments

Sunil patil 20 years, 5 months ago

This is good. I was struggling with the SAX parser, but this article solved that problem.

Alex Martelli 20 years, 1 month ago

On the setdefault side note: setdefault is no use for immutable values (like, here, numbers). Rather, the elegant alternative idiom is:

adict[akey] = 1 + adict.get(akey,0)

Even more old-fashioned than Paul's choice - no += ...!-)