Title: Count tags in a document
Submitter: Paul Prescod
Last Updated: 2001/06/13
Version no: 1.2
Category: XML
Description:
This is an example SAX application and can be used as the basis for any SAX application. It is also somewhat useful in its own right when you want to get a sense of how frequently particular elements occur in an XML document.
Source:
from xml.sax.handler import ContentHandler
import xml.sax

class countHandler(ContentHandler):
    def __init__(self):
        ContentHandler.__init__(self)
        # Maps each element name to the number of times it has been seen.
        self.tags = {}

    def startElement(self, name, attr):
        # Called once for every opening tag; attr holds the tag's attributes.
        if name not in self.tags:
            self.tags[name] = 0
        self.tags[name] += 1

parser = xml.sax.make_parser()
handler = countHandler()
parser.setContentHandler(handler)
parser.parse("test.xml")
print(handler.tags)
Discussion:
When I start with a new XML content set, I like to get a sense of what elements are in it and how often they occur. I use variants of this recipe. I could collect attributes just as easily, since startElement receives them as its second argument. If you add a stack, you can keep track of which elements occur within other elements.
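The two variants mentioned above (collecting attributes and adding a stack) could look roughly like the sketch below. The class name NestingCountHandler, the dictionaries attrs and children, and the inline SAMPLE document are hypothetical names chosen for illustration, not part of the original recipe.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class NestingCountHandler(ContentHandler):
    """Count elements, attributes, and parent/child pairs (illustrative only)."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.tags = {}      # element name -> occurrences
        self.attrs = {}     # (element name, attribute name) -> occurrences
        self.children = {}  # (parent name, child name) -> occurrences
        self.stack = []     # names of the elements that are currently open

    def startElement(self, name, attrs):
        self.tags[name] = self.tags.get(name, 0) + 1
        # Count every attribute that appears on this element.
        for attr_name in attrs.getNames():
            key = (name, attr_name)
            self.attrs[key] = self.attrs.get(key, 0) + 1
        # The top of the stack, if any, is the enclosing element.
        if self.stack:
            pair = (self.stack[-1], name)
            self.children[pair] = self.children.get(pair, 0) + 1
        self.stack.append(name)

    def endElement(self, name):
        # Leaving the element: it is no longer a possible parent.
        self.stack.pop()


# Hypothetical sample document, parsed from memory instead of test.xml.
SAMPLE = b"<doc><sec id='a'><p/><p class='x'/></sec><sec><p/></sec></doc>"

handler = NestingCountHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.tags)
print(handler.attrs)
print(handler.children)
```

Keying the children dictionary on (parent, child) tuples keeps the handler flat; a nested dictionary per parent would work as well.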
In fact, this little program shows the basic steps for implementing any SAX application. Alternatives include pulldom and minidom versions. They would be overkill for this simple job, though.
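For comparison, here is a rough sketch of what the minidom and pulldom alternatives might look like. The function names count_with_minidom and count_with_pulldom and the SAMPLE string are made up for this illustration. The minidom version builds the whole tree in memory before counting, which is part of why it is more than this job needs.

```python
from xml.dom import minidom, pulldom

# Hypothetical sample document used in place of test.xml.
SAMPLE = "<doc><sec><p/><p/></sec><sec><p/></sec></doc>"


def count_with_minidom(text):
    """Build a full DOM tree, then count every element in it."""
    counts = {}
    document = minidom.parseString(text)
    # "*" matches all element names, including the root element.
    for node in document.getElementsByTagName("*"):
        counts[node.tagName] = counts.get(node.tagName, 0) + 1
    return counts


def count_with_pulldom(text):
    """Pull parsing events one at a time and count the start tags."""
    counts = {}
    for event, node in pulldom.parseString(text):
        if event == pulldom.START_ELEMENT:
            counts[node.tagName] = counts.get(node.tagName, 0) + 1
    return counts


print(count_with_minidom(SAMPLE))
print(count_with_pulldom(SAMPLE))
```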
You can learn about other options for ContentHandler subclasses by reading the Python xml.sax.handler documentation.
I know that I could have used setdefault, but I'm kind of old-fashioned. :)
Number of comments: 2
This is good, Sunil patil, 2001/06/14
I was struggling with the use of the SAX parser, but this article solved that problem.
on the setdefault side note, Alex Martelli, 2001/10/15
setdefault is of no use for immutable values (like, here, numbers). Rather, the elegant alternative idiom is:
adict[akey] = 1 + adict.get(akey,0)
Even more old-fashioned than Paul's choice - no += ...!-)
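To make the comparison concrete, the sketch below runs the counting idioms discussed here side by side on a made-up list of element names (the names list and all variable names are hypothetical). It also shows the kind of case where setdefault does pay off, namely when the stored value is mutable.

```python
# Hypothetical stream of element names, as startElement might see them.
names = ["doc", "sec", "p", "p", "sec", "p"]

# 1. Explicit membership test, as in the recipe.
explicit = {}
for name in names:
    if name not in explicit:
        explicit[name] = 0
    explicit[name] += 1

# 2. setdefault: it returns the stored integer, but integers are immutable,
#    so the result still has to be assigned back to the dictionary.
with_setdefault = {}
for name in names:
    with_setdefault[name] = with_setdefault.setdefault(name, 0) + 1

# 3. The get idiom from the comment above.
with_get = {}
for name in names:
    with_get[name] = 1 + with_get.get(name, 0)

# setdefault is a better fit for mutable values, such as a list of
# positions, because the returned list can be updated in place.
positions = {}
for index, name in enumerate(names):
    positions.setdefault(name, []).append(index)

print(explicit == with_setdefault == with_get)
print(positions)
```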