ASPN ActiveState Programmer Network  
ActiveState, a division of Sophos

Title: Count tags in a document
Submitter: Paul Prescod (other recipes)
Last Updated: 2001/06/13
Version no: 1.2
Category: XML

 

Description:

This is an example SAX application that can serve as the basis for any SAX application. It is also useful in its own right when you want a sense of how frequently particular elements occur in an XML document.

Source:

from xml.sax.handler import ContentHandler
import xml.sax

class countHandler(ContentHandler):
    def __init__(self):
        ContentHandler.__init__(self)
        # Maps each element name to the number of times it was opened.
        self.tags = {}

    def startElement(self, name, attr):
        # Called once for every start tag the parser encounters.
        if name not in self.tags:
            self.tags[name] = 0
        self.tags[name] += 1

parser = xml.sax.make_parser()
handler = countHandler()
parser.setContentHandler(handler)
parser.parse("test.xml")  # path to the document to analyze
print(handler.tags)

Discussion:

When I start with a new XML content set, I like to get a sense of what elements are in it and how often they occur. I use variants of this recipe. Attributes could be collected just as easily, since startElement receives them along with the element name. If you add a stack, you can keep track of which elements occur within other elements.
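The stack idea can be sketched roughly as follows. This is only an illustration, not part of the original recipe: the class name NestingHandler, the (parent, child) and (element, attribute) keys, and the inline sample document are all assumptions made for the example.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class NestingHandler(ContentHandler):
    """Hypothetical variant: counts (parent, child) pairs and attributes."""

    def __init__(self):
        super().__init__()
        self.stack = []        # names of the elements that are currently open
        self.pairs = {}        # (parent name or None, child name) -> count
        self.attributes = {}   # (element name, attribute name) -> count

    def startElement(self, name, attrs):
        # The innermost open element is the parent; the root has none.
        parent = self.stack[-1] if self.stack else None
        key = (parent, name)
        self.pairs[key] = self.pairs.get(key, 0) + 1
        for attr_name in attrs.getNames():
            attr_key = (name, attr_name)
            self.attributes[attr_key] = self.attributes.get(attr_key, 0) + 1
        self.stack.append(name)

    def endElement(self, name):
        # Leaving the element: it is no longer a candidate parent.
        self.stack.pop()


# Made-up sample document, parsed from memory instead of from a file.
sample = b"<doc><sec id='a'><p/><p/></sec><sec><p/></sec></doc>"
nesting = NestingHandler()
xml.sax.parseString(sample, nesting)
print(nesting.pairs)
print(nesting.attributes)
```

Keying the counts on (parent, child) tuples keeps everything in one flat dictionary; a nested dictionary per parent would work equally well if you prefer to look up all the children of one element at once.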

In fact, this little program shows the basic steps for implementing any SAX application. Alternatives include pulldom and minidom versions. They would be overkill for this simple job though.
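For comparison, a minidom version might look something like the sketch below. The function name count_tags_dom and the sample string are assumptions for illustration only. Note that this approach builds the whole document tree in memory before counting, which is what makes it heavier than the SAX handler.

```python
from xml.dom import minidom


def count_tags_dom(xml_text):
    """Hypothetical helper: count element names using a full DOM tree."""
    document = minidom.parseString(xml_text)
    counts = {}
    # "*" matches every element in the document, including the root.
    for element in document.getElementsByTagName("*"):
        counts[element.tagName] = counts.get(element.tagName, 0) + 1
    return counts


# Made-up sample document for demonstration.
dom_counts = count_tags_dom("<doc><p/><p/><note/></doc>")
print(dom_counts)
```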

You can learn about other options for ContentHandler subclasses by reading the Python xml.sax.handler documentation.

I know that I could have used setdefault, but I'm kind of old-fashioned. :)




Number of comments: 2

This is good, Sunil Patil, 2001/06/14
I was struggling with the SAX parser, but this article solved that problem.

On the setdefault side note, Alex Martelli, 2001/10/15
setdefault is no use for immutable values (like, here, numbers). Rather, the elegant alternative idiom is:

  adict[akey] = 1 + adict.get(akey,0)
Even more old-fashioned than Paul's choice - no += ...!-)



© 2006 ActiveState Software Inc. All rights reserved.