I decided not to customize the XML parser to fit the structure of one XML document, but to write a parser that adapts to the structure of the document. Converting the XML document into nested dictionaries and lists this way makes access to the elements simple and keeps code customization minimal.

Python, 120 lines
==================================================
xmlreader.py:
==================================================
from xml.dom.minidom import parse


class NotTextNodeError(Exception):
    """Raised when a node has children that are not text nodes."""


def getTextFromNode(node):
    """
    scans through all children of node and gathers the
    text. if node has non-text child-nodes, then
    NotTextNodeError is raised.
    """
    t = ""
    for n in node.childNodes:
        if n.nodeType == n.TEXT_NODE:
            t += n.nodeValue
        else:
            raise NotTextNodeError
    return t


def nodeToDic(node):
    """
    nodeToDic() scans through the children of node and makes a
    dictionary from the content.
    three cases are differentiated:
        - if the node contains no other nodes, it is a text-node
          and {nodeName:text} is merged into the dictionary.
        - if the node has the attribute "multiple" set to "true",
          then its children will be appended to a list and this
          list is merged to the dictionary in the form: {nodeName:list}.
        - else, nodeToDic() will call itself recursively on
          the node's children (merging {nodeName:nodeToDic()} to
          the dictionary).
    """
    dic = {}
    for n in node.childNodes:
        if n.nodeType != n.ELEMENT_NODE:
            continue
        if n.getAttribute("multiple") == "true":
            # node with multiple children:
            # put them in a list
            l = []
            for c in n.childNodes:
                if c.nodeType != c.ELEMENT_NODE:
                    continue
                l.append(nodeToDic(c))
            # add the finished list once, after the loop
            dic.update({n.nodeName: l})
            continue

        try:
            text = getTextFromNode(n)
        except NotTextNodeError:
            # 'normal' node
            dic.update({n.nodeName: nodeToDic(n)})
            continue

        # text node
        dic.update({n.nodeName: text})
    return dic


def readConfig(filename):
    dom = parse(filename)
    return nodeToDic(dom)


def test():
    dic = readConfig("sample.xml")

    # print() with a single string argument runs on Python 2 and 3
    print(dic["Config"]["Name"])
    print("")
    for item in dic["Config"]["Items"]:
        print("Item's Name: %s" % item["Name"])
        print("Item's Value: %s" % item["Value"])


if __name__ == "__main__":
    test()



==================================================
sample.xml:
==================================================
<?xml version="1.0" encoding="UTF-8"?>

<Config>
    <Name>My Config File</Name>

    <Items multiple="true">
        <Item>
            <Name>First Item</Name>
            <Value>Value 1</Value>
        </Item>
        <Item>
            <Name>Second Item</Name>
            <Value>Value 2</Value>
        </Item>
    </Items>

</Config>



==================================================
output:
==================================================
My Config File

Item's Name: First Item
Item's Value: Value 1
Item's Name: Second Item
Item's Value: Value 2

The big advantage of this recipe is that you never define the structure of the XML document; you just use it.
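
To make the "just use it" point concrete, here is a minimal sketch of the dictionary that results for the sample document. It is a condensed restatement of nodeToDic(), not the recipe's own code: the names node_to_dict and is_text_only are made up for this illustration, and the XML is parsed from a string with xml.dom.minidom.parseString so the example runs without a file on disk.

```python
from xml.dom.minidom import parseString

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<Config>
    <Name>My Config File</Name>
    <Items multiple="true">
        <Item><Name>First Item</Name><Value>Value 1</Value></Item>
        <Item><Name>Second Item</Name><Value>Value 2</Value></Item>
    </Items>
</Config>
"""


def is_text_only(node):
    # True if every child is a text node (this includes an empty element).
    return all(c.nodeType == c.TEXT_NODE for c in node.childNodes)


def node_to_dict(node):
    # Condensed variant of the recipe's nodeToDic(); same three cases.
    result = {}
    for child in node.childNodes:
        if child.nodeType != child.ELEMENT_NODE:
            continue
        if child.getAttribute("multiple") == "true":
            # Children of a multiple="true" element become a list of dicts.
            result[child.nodeName] = [
                node_to_dict(c) for c in child.childNodes
                if c.nodeType == c.ELEMENT_NODE
            ]
        elif is_text_only(child):
            result[child.nodeName] = "".join(
                c.nodeValue for c in child.childNodes)
        else:
            result[child.nodeName] = node_to_dict(child)
    return result


config = node_to_dict(parseString(SAMPLE))["Config"]
print(config["Name"])
print([item["Value"] for item in config["Items"]])
```

Nothing in the code mentions Config, Items or Value; the keys come straight from the element names, so a changed document only changes the lookups in the calling code.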

One thing that bothers me is that you must set the attribute multiple="true" on an element if you want its children to be put in a list.

5 comments

John Bair 21 years, 7 months ago  # | flag

An alternate solution. Good idea. See my xml2obj recipe which is a variation on the theme that uses the expat parser for lower overhead and a stack to keep track of parents.
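
For readers who have not seen the expat-plus-stack approach mentioned here, the following is a rough sketch of the general idea only. It is not the xml2obj recipe: the function name parse_to_dict is invented for this example, attributes are ignored, surrounding whitespace is stripped from text, and repeated sibling tags are simply collected into a list.

```python
import xml.parsers.expat


def parse_to_dict(xml_text):
    """Build nested dicts from XML using expat callbacks and a parent stack."""
    root = {}
    # Each stack entry is [children_dict, text_chunks] for one open element.
    stack = [[root, []]]

    def start(name, attrs):
        # A new element opens: push a fresh frame for its content.
        stack.append([{}, []])

    def chars(data):
        # Text may arrive in several pieces; collect them on the open element.
        stack[-1][1].append(data)

    def end(name):
        children, chunks = stack.pop()
        # An element with child elements becomes a dict, otherwise its text.
        value = children if children else "".join(chunks).strip()
        parent = stack[-1][0]
        if name in parent:
            # Repeated sibling tag: collect the values in a list.
            if not isinstance(parent[name], list):
                parent[name] = [parent[name]]
            parent[name].append(value)
        else:
            parent[name] = value

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)
    return root


doc = parse_to_dict("<Config><Name>My Config File</Name>"
                    "<Item>a</Item><Item>b</Item></Config>")
print(doc["Config"]["Name"])
print(doc["Config"]["Item"])
```

One consequence of this design is that a tag occurring once yields a plain value while a tag occurring twice yields a list, so calling code has to cope with both shapes.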

Kevin Manley 21 years, 4 months ago  # | flag

Check out pyRXP. pyRXP from Reportlab turns XML into a python tuple tree and is extremely fast. Check it out (http://www.reportlab.com/xml/pyrxp.html)

Chris Ryland 20 years, 11 months ago  # | flag

buglet? Shouldn't the first

dic.update({n.nodeName:l})

be outdented one level? Otherwise, it's adding the partially-built list to the dictionary every time through the loop. (Or maybe it's late and I'm seeing double. ;-)

Pawel Zdziechowicz 20 years ago  # | flag

Improvement? Great idea! Very nice for small config. It is also good to add something like this:

tmp = nodeToDic(c)
if tmp != {}:
  l.append(tmp)
else:
  l.append(getTextFromNode(c))

E.g., for this piece of an XML file:

<Shared multiple="true">
  <Folder>c:\Mp3</Folder>
  <Folder>d:\Tmp</Folder>
</Shared>

without:  {.. u'Shared': [{}, {}] ..} ,
with: {.. u'Shared': [u'c:\\Mp3', u'd:\\Tmp'] ..}
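
A self-contained sketch of this suggestion, assuming the rule "an element without child elements is represented by its text". The names has_element_children, text_of and node_to_value are hypothetical and written for this illustration; they restructure the recipe rather than patch it line by line.

```python
from xml.dom.minidom import parseString


def has_element_children(node):
    return any(c.nodeType == c.ELEMENT_NODE for c in node.childNodes)


def text_of(node):
    # Concatenate the text children of a node (other node types are skipped).
    return "".join(c.nodeValue for c in node.childNodes
                   if c.nodeType == c.TEXT_NODE)


def node_to_value(node):
    """Return a dict for an element with child elements, else its text."""
    if not has_element_children(node):
        return text_of(node)
    result = {}
    for child in node.childNodes:
        if child.nodeType != child.ELEMENT_NODE:
            continue
        if child.getAttribute("multiple") == "true":
            # List items go through the same dict-or-text decision.
            result[child.nodeName] = [
                node_to_value(c) for c in child.childNodes
                if c.nodeType == c.ELEMENT_NODE
            ]
        else:
            result[child.nodeName] = node_to_value(child)
    return result


xml_text = r"""<Config>
  <Shared multiple="true">
    <Folder>c:\Mp3</Folder>
    <Folder>d:\Tmp</Folder>
  </Shared>
</Config>"""

print(node_to_value(parseString(xml_text)))
```

With this rule the list under "Shared" holds the two folder strings instead of two empty dictionaries.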

Peter Neish 19 years ago  # | flag

Another improvement? How about this as an alternative to allow it to work without specifying the multiple attribute?

def nodeToDic(node):

    dic = {}
    multlist = {} # holds temporary lists where there are multiple children
    for n in node.childNodes:
        if n.nodeType != n.ELEMENT_NODE:
            continue

        # find out if there are multiple records; decide this per child,
        # otherwise the flag would stay set for every later element
        # (note: getElementsByTagName also counts nested descendants)
        multiple = len(node.getElementsByTagName(n.nodeName)) > 1
        if multiple and n.nodeName not in multlist:
            # set up the list to hold the values
            multlist[n.nodeName] = []

        try:
            #text node
            text = getTextFromNode(n)
        except NotTextNodeError:
            if multiple:
                # append to our list
                multlist[n.nodeName].append(nodeToDic(n))
                dic.update({n.nodeName:multlist[n.nodeName]})
                continue
            else:
                # 'normal' node
                dic.update({n.nodeName:nodeToDic(n)})
                continue

        # text node
        if multiple:
            multlist[n.nodeName].append(text)
            dic.update({n.nodeName:multlist[n.nodeName]})
        else:
            dic.update({n.nodeName:text})
    return dic
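
One caveat with detecting repeats via getElementsByTagName() is that it also matches nested descendants, so a tag that appears once at this level and again deeper down would be treated as multiple. A possible variant, sketched here under the assumption that only direct children should count, uses collections.Counter; the names element_children and node_to_value are made up for this example.

```python
from collections import Counter
from xml.dom.minidom import parseString


def element_children(node):
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


def node_to_value(node):
    """Dict for elements with child elements, text otherwise; tags that
    repeat among the direct children are collected into lists."""
    children = element_children(node)
    if not children:
        return "".join(c.nodeValue for c in node.childNodes
                       if c.nodeType == c.TEXT_NODE)
    # Count tag names among the direct children only.
    counts = Counter(c.nodeName for c in children)
    result = {}
    for child in children:
        value = node_to_value(child)
        if counts[child.nodeName] > 1:
            result.setdefault(child.nodeName, []).append(value)
        else:
            result[child.nodeName] = value
    return result


doc = parseString(
    "<Config><Name>cfg</Name>"
    "<Item><Name>a</Name></Item><Item><Name>b</Name></Item></Config>"
)
print(node_to_value(doc))
```

Here the top-level Name stays a plain string even though Name also occurs inside each Item, while the two Item elements end up in a list.
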
Created by Christoph Dietze on Tue, 26 Feb 2002 (PSF)