ASPN ActiveState Programmer Network  
ActiveState, a division of Sophos
/ Home / Perl / PHP / Python / Tcl / XSLT /
/ Safari / My ASPN /
Cookbooks | Documentation | Mailing Lists | Modules | News Feeds | Products | User Groups
Submit Recipe
My Recipes

All Recipes
All Cookbooks


View by Category

Title: xml2obj
Submitter: John Bair (other recipes)
Last Updated: 2002/09/11
Version no: 1.0
Category: XML

 

Not Rated yet


Description:

A generic script using expat to convert xml into objects

Source: Text Source

"""
Borrowed from wxPython XML tree demo and modified.
"""

import string
from xml.parsers import expat

class Element:
    'A parsed XML element'
    def __init__(self,name,attributes):
        'Element constructor'
        # The element's tag name
        self.name = name
        # The element's attribute dictionary
        self.attributes = attributes
        # The element's cdata
        self.cdata = ''
        # The element's child element list (sequence)
        self.children = []
        
    def AddChild(self,element):
        'Add a reference to a child element'
        self.children.append(element)
        
    def getAttribute(self,key):
        'Get an attribute value'
        return self.attributes.get(key)
    
    def getData(self):
        'Get the cdata'
        return self.cdata
        
    def getElements(self,name=''):
        'Get a list of child elements'
        #If no tag name is specified, return the all children
        if not name:
            return self.children
        else:
            # else return only those children with a matching tag name
            elements = []
            for element in self.children:
                if element.name == name:
                    elements.append(element)
            return elements

class Xml2Obj:
    'XML to Object'
    def __init__(self):
        self.root = None
        self.nodeStack = []
        
    def StartElement(self,name,attributes):
        'SAX start element even handler'
        # Instantiate an Element object
        element = Element(name.encode(),attributes)
        
        # Push element onto the stack and make it a child of parent
        if len(self.nodeStack) > 0:
            parent = self.nodeStack[-1]
            parent.AddChild(element)
        else:
            self.root = element
        self.nodeStack.append(element)
        
    def EndElement(self,name):
        'SAX end element event handler'
        self.nodeStack = self.nodeStack[:-1]

    def CharacterData(self,data):
        'SAX character data event handler'
        if string.strip(data):
            data = data.encode()
            element = self.nodeStack[-1]
            element.cdata += data
            return

    def Parse(self,filename):
        # Create a SAX parser
        Parser = expat.ParserCreate()

        # SAX event handlers
        Parser.StartElementHandler = self.StartElement
        Parser.EndElementHandler = self.EndElement
        Parser.CharacterDataHandler = self.CharacterData

        # Parse the XML File
        ParserStatus = Parser.Parse(open(filename,'r').read(), 1)
        
        return self.root
    
parser = Xml2Obj()
element = parser.Parse('sample.xml')

Discussion:

I saw Christoph Dietze's script to turn the structure of a XML-document into a combination of dictionaries and lists which is a good idea. xml2obj is a variation of the concept and differs by:

1. Using expat. The dom parser has a lot of overhead because it creates its own data structures, which were being used to create a second set of structures.

2. Uses a stack to append child elements to parent.

3. Uses classes to enhance access to the elements.



Add comment

Number of comments: 3

Output, C. Yu, 2003/05/15
How do you print out the parsed xml object? Do you have any examples? Thanks.
Add comment

obj2xml: and back again, Doug Blank, 2005/12/27
Add this method to Element to recreate the XML:

print element.toString()
or you could even call this method __str__ to make it easy.
    def toString(self, level=0):
        retval = " " * level
        retval += "<%s" % self.name
        for attribute in self.attributes:
            retval += " %s=\"%s\"" % (attribute, self.attributes[attribute])
        c = ""
        for child in self.children:
            c += child.toString(level+1)
        if c == "":
            retval += "/>\n"
        else:
            retval += ">\n" + c + ("</%s>\n" % self.name)
        return retval

Add comment

bob w, 2004/08/07
I took this code and updated it to where it works pretty well for me now. First, I changed the Element to have its attributes in a dictionary. Then I added a few methods to provide easy access within the tree. Since I am very new to XML, I borrowed ideas from this recipe and a few notes by Uche Ogbuji. If you are interested, I would be happy to send the code with examples. I have never commented here nor posted any code so I really don't know how to approach it.
Add comment



Highest rated recipes:

1. A simple XML-RPC server

2. Web service accessible ...

3. IPy Notify

4. Treat the Win32 Registry ...

5. a friendly mkdir()

6. Wrapping template engine ...

7. Assignment in expression

8. Changing return value ...

9. Implementation of sets ...

10. bag collection class




Privacy Policy | Email Opt-out | Feedback | Syndication
© 2006 ActiveState Software Inc. All rights reserved.