ASPN ActiveState Programmer Network  
ActiveState, a division of Sophos
Title: Extract text from XML document
Submitter: Paul Prescod (other recipes)
Last Updated: 2001/06/14
Version no: 1.1
Category: XML

 

Description:

People often ask how to extract the text from an XML document. This small program does it.

Source:

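from xml.sax.handler import ContentHandler
import xml.sax
import sys

class textHandler(ContentHandler):
    # Called by the parser for each run of character data; tags are ignored.
    def characters(self, ch):
        # ch is a Unicode string; encode it before writing to stdout.
        sys.stdout.write(ch.encode("Latin-1"))

parser = xml.sax.make_parser()
handler = textHandler()
parser.setContentHandler(handler)
# Parse the file, streaming its text content to stdout.
parser.parse("test.xml")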

Discussion:

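Sometimes you want to strip the XML tags from a document, for example to re-key it or to spell-check it. This recipe works with any well-formed XML document. It is quite efficient, because SAX streams the document through the handler rather than building a tree in memory. Note that the output is encoded as Latin-1, so text containing characters outside that encoding will raise an error when written. If the document isn't well-formed, you could try a solution based on the XML lexer described in another recipe called "XML lexing".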
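
The recipe above was written for Python 2. As a rough sketch of the same SAX technique under Python 3, the text can be collected into a string instead of being encoded and written to stdout. The names TextCollector and extract_text are hypothetical, chosen for this illustration, and are not part of the original recipe.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Accumulate character data and ignore all tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # SAX may deliver one run of text in several pieces, so keep them all.
        self.chunks.append(content)


def extract_text(xml_bytes):
    """Return the concatenated text of a well-formed XML document given as bytes."""
    handler = TextCollector()
    xml.sax.parseString(xml_bytes, handler)
    return "".join(handler.chunks)


if __name__ == "__main__":
    sample = b"<doc><p>Hello, </p><p>world</p></doc>"
    print(extract_text(sample))
```

Returning a string leaves the choice of output encoding to the caller, which avoids the Latin-1 restriction. As with the original, a document that is not well-formed makes the parser raise an exception rather than return partial text.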




Number of comments: 2

Direct link to author's "XML lexing", Bill Bell, 2004/03/21
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/65125

Another way, Bill Bell, 2004/03/21
from sgmllib import SGMLParser

class XMLJustText(SGMLParser):
    def handle_data(self, data):
        print data

XMLJustText().feed("text 1text 2")
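
The sgmllib module used in this comment was removed in Python 3. One possible substitute, swapped in here purely as an illustration, is html.parser.HTMLParser, which offers the same handle_data hook and also tolerates markup that is not well-formed. The names JustText and strip_tags are hypothetical. Bear in mind that HTMLParser is designed for HTML, so it applies HTML rules (for instance, special handling of script and style elements) that may not suit every XML document.

```python
from html.parser import HTMLParser


class JustText(HTMLParser):
    """Collect text between tags (hypothetical stand-in for the SGMLParser version)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text found between tags.
        self.chunks.append(data)


def strip_tags(markup):
    """Return the text content of a markup string, ignoring its tags."""
    parser = JustText()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at the end of input
    return "".join(parser.chunks)


if __name__ == "__main__":
    print(strip_tags("<a>text 1</a><b>text 2</b>"))
```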


