Count tags in a document (Python recipe) by Paul Prescod
ActiveState Code (http://code.activestate.com/recipes/65127/)

This is an example SAX application that can serve as the basis for other SAX applications. It is also useful on its own when you want a sense of how frequently particular elements occur in an XML document.

from xml.sax.handler import ContentHandler
import xml.sax

class countHandler(ContentHandler):
    """Counts how often each element name is opened in the document."""
    def __init__(self):
        ContentHandler.__init__(self)
        self.tags = {}

    def startElement(self, name, attr):
        # Called once per opening tag; attr holds that tag's attributes.
        if name not in self.tags:
            self.tags[name] = 0
        self.tags[name] += 1

parser = xml.sax.make_parser()
handler = countHandler()
parser.setContentHandler(handler)
parser.parse("test.xml")
print(handler.tags)

When I start with a new XML content set, I like to get a sense of what elements are in it and how often they occur. I use variants of this recipe. I could also collect attributes just as easily, since startElement receives them in its attr argument. If you add a stack, you can keep track of which elements occur within other elements.
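The attribute and stack variants mentioned above might look like the following. This is only a sketch, not part of the original recipe: the class name NestingHandler, the SAMPLE document, and the use of collections.Counter and xml.sax.parseString (instead of a file on disk) are assumptions made so the example is self-contained.

```python
import xml.sax
from collections import Counter
from xml.sax.handler import ContentHandler


class NestingHandler(ContentHandler):
    """Counts attributes per element and (parent, child) element pairs."""

    def __init__(self):
        super().__init__()
        self.attrs = Counter()  # (element, attribute) -> occurrences
        self.pairs = Counter()  # (parent, child) -> occurrences
        self.stack = []         # names of the currently open elements

    def startElement(self, name, attrs):
        for attr_name in attrs.getNames():
            self.attrs[(name, attr_name)] += 1
        # The top of the stack is the enclosing element; None marks the root.
        parent = self.stack[-1] if self.stack else None
        self.pairs[(parent, name)] += 1
        self.stack.append(name)

    def endElement(self, name):
        self.stack.pop()


# Hypothetical sample input, used in place of test.xml.
SAMPLE = b'<doc><sec id="a"><p/><p class="x"/></sec><sec id="b"><p/></sec></doc>'

nesting = NestingHandler()
xml.sax.parseString(SAMPLE, nesting)
print(dict(nesting.pairs))
print(dict(nesting.attrs))
```

The stack is pushed in startElement and popped in endElement, so it is empty again once a well-formed document has been parsed.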

In fact, this little program shows the basic steps for implementing any SAX application. Alternatives include versions built on xml.dom.pulldom or xml.dom.minidom, though they would be overkill for this simple job.
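For comparison, here is a rough sketch of what the minidom and pulldom alternatives could look like. The function names and the SAMPLE string are made up for illustration. Note that the minidom version builds the whole tree in memory before counting, which is part of why it is heavier than the SAX handler.

```python
from collections import Counter
from xml.dom import minidom, pulldom

# Hypothetical sample input, used in place of test.xml.
SAMPLE = "<doc><sec><p/><p/></sec><sec><p/></sec></doc>"


def count_tags_minidom(text):
    """Parse the whole document into a DOM tree, then count every element."""
    dom = minidom.parseString(text)
    return Counter(node.tagName for node in dom.getElementsByTagName("*"))


def count_tags_pulldom(text):
    """Pull parse events one at a time and count the START_ELEMENT ones."""
    events = pulldom.parseString(text)
    return Counter(
        node.tagName for event, node in events if event == pulldom.START_ELEMENT
    )


print(count_tags_minidom(SAMPLE))
print(count_tags_pulldom(SAMPLE))
```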

You can learn about other options for ContentHandler subclasses by reading the Python xml.sax.handler documentation.
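As a small taste of those other options, the sketch below overrides endElement and characters in addition to startElement. The class name TextStatsHandler and the sample input are assumptions for illustration only; the handler methods themselves are part of the standard ContentHandler interface.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextStatsHandler(ContentHandler):
    """Tracks the deepest nesting level and the amount of character data."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0
        self.text_chars = 0

    def startElement(self, name, attrs):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def endElement(self, name):
        self.depth -= 1

    def characters(self, content):
        # The parser may deliver one run of text in several chunks,
        # so accumulate instead of assuming a single call.
        self.text_chars += len(content)


stats = TextStatsHandler()
xml.sax.parseString(b"<doc><p>Hello</p><p>SAX <b>world</b></p></doc>", stats)
print(stats.max_depth, stats.text_chars)
```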

I know that I could have used setdefault but I'm kind of old-fashioned. :)
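For readers weighing the counting idioms, here is a brief sketch of three options. The names list is made-up sample data, and collections.Counter is a later standard-library addition that the recipe itself does not use.

```python
from collections import Counter

# Hypothetical element names, as startElement might receive them.
names = ["p", "sec", "p", "p"]

# dict.get supplies a default for a missing key, so no membership test is needed.
counts = {}
for name in names:
    counts[name] = counts.get(name, 0) + 1

# dict.setdefault pays off when the stored value is mutable, e.g. a list
# that is created on first use and then appended to in place.
positions = {}
for index, name in enumerate(names):
    positions.setdefault(name, []).append(index)

# collections.Counter does the tallying in one step.
tally = Counter(names)

print(counts, positions, tally)
```

With integer counts, setdefault alone cannot update the stored value in place; the result still has to be assigned back, which is why the get form reads more naturally here.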

Tags: xml

2 comments

Sunil patil 22 years, 10 months ago

This is good. I was struggling with the use of the SAX parser, but this article solved that problem.

Alex Martelli 22 years, 6 months ago

On the setdefault side note: setdefault is no use for immutable values (like, here, numbers). Rather, the elegant alternative idiom is:

adict[akey] = 1 + adict.get(akey,0)

Even more old-fashioned than Paul's choice - no += ...!-)

Created by Paul Prescod on Tue, 12 Jun 2001 (PSF)


Licensed under the PSF License
Viewed 13081 times
Revision 3 (updated 22 years ago)
