turn the structure of a XML-document into a combination of dictionaries and lists

I decided not to customize the XML parser to fit the structure of one particular XML document, but to write a parser that adapts to the structure of whatever document it reads. Once the document has been converted this way, access to its elements is simple and very little code needs to be customized.

==================================================
xmlreader.py:
==================================================
from xml.dom.minidom import parse


class NotTextNodeError(Exception):
    """Raised when a node has child nodes that are not text nodes."""
    pass


def getTextFromNode(node):
    """
    Scans through all children of node and gathers the
    text. If node has non-text child nodes, then
    NotTextNodeError is raised.
    """
    t = ""
    for n in node.childNodes:
        if n.nodeType == n.TEXT_NODE:
            t += n.nodeValue
        else:
            raise NotTextNodeError
    return t


def nodeToDic(node):
    """
    nodeToDic() scans through the children of node and makes a
    dictionary from the content.
    Three cases are differentiated:
    - if the node contains no other nodes, it is a text node
      and {nodeName: text} is merged into the dictionary.
    - if the node has the attribute "multiple" set to "true",
      then its children will be appended to a list and this
      list is merged into the dictionary in the form {nodeName: list}.
    - else, nodeToDic() will call itself recursively on
      the node's children (merging {nodeName: nodeToDic()} into
      the dictionary).
    """
    dic = {}
    for n in node.childNodes:
        if n.nodeType != n.ELEMENT_NODE:
            continue
        if n.getAttribute("multiple") == "true":
            # node with multiple children:
            # put them in a list
            l = []
            for c in n.childNodes:
                if c.nodeType != n.ELEMENT_NODE:
                    continue
                l.append(nodeToDic(c))
                dic.update({n.nodeName: l})
            continue

        try:
            text = getTextFromNode(n)
        except NotTextNodeError:
            # 'normal' node
            dic.update({n.nodeName: nodeToDic(n)})
            continue

        # text node
        dic.update({n.nodeName: text})
    return dic


def readConfig(filename):
    dom = parse(filename)
    return nodeToDic(dom)





def test():
    dic = readConfig("sample.xml")

    print(dic["Config"]["Name"])
    print()
    for item in dic["Config"]["Items"]:
        print("Item's Name:", item["Name"])
        print("Item's Value:", item["Value"])


if __name__ == "__main__":
    test()



==================================================
sample.xml:
==================================================
<?xml version="1.0" encoding="UTF-8"?>

<Config>
    <Name>My Config File</Name>
    
    <Items multiple="true">
        <Item>
            <Name>First Item</Name>
            <Value>Value 1</Value>
        </Item>
        <Item>
            <Name>Second Item</Name>
            <Value>Value 2</Value>
        </Item>
    </Items>

</Config>



==================================================
output:
==================================================
My Config File

Item's Name: First Item
Item's Value: Value 1
Item's Name: Second Item
Item's Value: Value 2

      

The big advantage of this recipe is that you never have to define the structure of the XML document; you just use it.

One thing that bothers me is that you must set the attribute multiple="true" on an element if you want its children to be put in a list.

Tags: xml

5 comments

John Bair 21 years, 7 months ago

An alternate solution. Good idea. See my xml2obj recipe which is a variation on the theme that uses the expat parser for lower overhead and a stack to keep track of parents.
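
To illustrate the idea of an expat parser combined with a parent stack, here is a minimal sketch. It is not the xml2obj recipe itself; the function name xml_to_dict and its behaviour (attributes are ignored, and a repeated sibling tag overwrites the earlier one) are assumptions made only for this example.

```python
from xml.parsers import expat


def xml_to_dict(xml_text):
    """Build nested dicts from XML using expat callbacks and a parent stack."""
    root = {}
    stack = [root]  # dictionaries of the elements that are currently open
    text = []       # character data seen since the last start or end tag

    def start(name, attrs):
        child = {}
        stack[-1][name] = child  # attach to the parent (a later sibling overwrites)
        stack.append(child)
        text.clear()

    def end(name):
        child = stack.pop()
        if not child:
            # no child elements were seen: store the collected text instead
            stack[-1][name] = "".join(text).strip()
        text.clear()

    def chars(data):
        text.append(data)

    parser = expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return root
```

Because expat reports events while it reads, no DOM tree is built; the stack always ends with the dictionary of the innermost open element, so each new element can be attached to its parent directly.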

Kevin Manley 21 years, 4 months ago

Check out pyRXP. pyRXP from ReportLab turns XML into a Python tuple tree and is extremely fast (http://www.reportlab.com/xml/pyrxp.html).

Chris Ryland 20 years, 11 months ago

buglet? Shouldn't the first

dic.update({n.nodeName:l})

be outdented one level? Otherwise, it's adding the partially-built list to the dictionary every time through the loop. (Or maybe it's late and I'm seeing double. ;-)
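
A simplified sketch of the outdented version, for comparison. To stay short and self-contained it records only the tag names of the list members instead of recursing, and the name collect_lists is hypothetical rather than part of the recipe.

```python
from xml.dom.minidom import parseString


def collect_lists(node):
    """Map each multiple="true" child of node to a list of its child tag names."""
    dic = {}
    for n in node.childNodes:
        if n.nodeType != n.ELEMENT_NODE:
            continue
        if n.getAttribute("multiple") == "true":
            items = []
            for c in n.childNodes:
                if c.nodeType != c.ELEMENT_NODE:
                    continue
                items.append(c.nodeName)
            # stored once, after the inner loop has finished
            dic[n.nodeName] = items
    return dic


doc = parseString('<Config><Items multiple="true"><Item/><Item/></Items>'
                  '<Empty multiple="true"/></Config>')
lists = collect_lists(doc.documentElement)
```

With the update outside the inner loop the list is stored exactly once, and an element marked multiple="true" that has no element children still appears in the dictionary, with an empty list.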

Pawel Zdziechowicz 20 years ago

Improvement? Great idea! Very nice for small config files. It is also good to add something like this:

tmp = nodeToDic(c)
if tmp != {}:
    l.append(tmp)
else:
    l.append(getTextFromNode(c))

E.g., for this piece of an XML file:

<Shared multiple="true">
  <Folder>c:\Mp3</Folder>
  <Folder>d:\Tmp</Folder>
</Shared>

without:  {.. u'Shared': [{}, {}] ..} ,
with: {.. u'Shared': [u'c:\\Mp3', u'd:\\Tmp'] ..}

Peter Neish 19 years ago

Another improvement? How about this as an alternative that lets it work without specifying the multiple attribute?

def nodeToDic(node):

    dic = {}
    multlist = {} # holds temporary lists where there are multiple children
    for n in node.childNodes:
        if n.nodeType != n.ELEMENT_NODE:
            continue

        # find out if there are multiple records; this is decided per child
        # so that a single element following a repeated one is not treated
        # as a list (getElementsByTagName also counts deeper descendants)
        multiple = len(node.getElementsByTagName(n.nodeName)) > 1
        if multiple and n.nodeName not in multlist:
            # set up the list to hold the values
            multlist[n.nodeName] = []

        try:
            # text node
            text = getTextFromNode(n)
        except NotTextNodeError:
            if multiple:
                # append to our list
                multlist[n.nodeName].append(nodeToDic(n))
                dic.update({n.nodeName: multlist[n.nodeName]})
                continue
            else:
                # 'normal' node
                dic.update({n.nodeName: nodeToDic(n)})
                continue

        # text node
        if multiple:
            multlist[n.nodeName].append(text)
            dic.update({n.nodeName: multlist[n.nodeName]})
        else:
            dic.update({n.nodeName: text})
    return dic
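
One caveat with the approach above is that getElementsByTagName() searches all descendants, so a tag that is reused deeper in the tree (such as Name in sample.xml) is also counted. A possible variation, sketched here under the assumption that only repeated direct children should become lists, counts sibling tag names instead. The names element_children and node_to_dic are hypothetical, and unlike the recipe this sketch treats any element without element children as text.

```python
from collections import Counter
from xml.dom.minidom import parseString


def element_children(node):
    """Return only the element children of node (skips text, comments, ...)."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


def node_to_dic(node):
    """Convert node's children to a dict; repeated sibling tags become lists."""
    children = element_children(node)
    counts = Counter(c.nodeName for c in children)  # direct children only
    dic = {}
    for n in children:
        if element_children(n):
            value = node_to_dic(n)  # nested element: recurse
        else:
            # text-only element: join its text and CDATA children
            value = "".join(t.nodeValue for t in n.childNodes
                            if t.nodeType in (t.TEXT_NODE, t.CDATA_SECTION_NODE))
        if counts[n.nodeName] > 1:
            dic.setdefault(n.nodeName, []).append(value)
        else:
            dic[n.nodeName] = value
    return dic
```

For sample.xml this puts the two items under dic["Config"]["Items"]["Item"], one level deeper than with the multiple attribute. A container that happens to hold only one Item comes back as a plain dictionary rather than a one-element list, so the attribute remains the more predictable choice when the number of children varies.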


turn the structure of a XML-document into a combination of dictionaries and lists (Python recipe) by Christoph Dietze
ActiveState Code (http://code.activestate.com/recipes/116539/)
