Reads an XML file into a Python dictionary of dictionaries (repeated elements are read in as lists). Modified from xmlreader.py by Christoph Dietze; it differs in not needing repeated elements to be tagged in the XML file.
"""
==================================================
xmlreader2.py:
Modified from: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/116539
contributed by Christoph Dietze.
Modified to allow it to work with repeating elements without having to specify the multiple attribute.
==================================================
"""
from xml.dom.minidom import parse


class NotTextNodeError(Exception):
    """Raised by getTextFromNode() when a node has non-text children."""


def getTextFromNode(node):
    """
    scans through all children of node and gathers the
    text. if node has non-text child-nodes, then
    NotTextNodeError is raised.
    """
    t = ""
    for n in node.childNodes:
        if n.nodeType == n.TEXT_NODE:
            t += n.nodeValue
        else:
            raise NotTextNodeError
    return t


def nodeToDic(node):
    """
    nodeToDic() scans through the children of node and makes a
    dictionary from the content.
    three cases are differentiated:
    - if the node contains no other nodes, it is a text-node
      and {nodeName:text} is merged into the dictionary.
    - if there is more than one child with the same name
      then these children will be appended to a list and this
      list is merged to the dictionary in the form: {nodeName:list}.
    - else, nodeToDic() will call itself recursively on
      the nodes children (merging {nodeName:nodeToDic()} to
      the dictionary).
    """
    dic = {}
    multlist = {}  # holds temporary lists where there are multiple children
    for n in node.childNodes:
        multiple = False
        if n.nodeType != n.ELEMENT_NODE:
            continue
        # find out if there are multiple records
        # (getElementsByTagName() also counts deeper descendants, which is
        # why Name comes back as a list in the sample output below)
        if len(node.getElementsByTagName(n.nodeName)) > 1:
            multiple = True
            # and set up the list to hold the values
            if n.nodeName not in multlist:
                multlist[n.nodeName] = []
        try:
            # text node
            text = getTextFromNode(n)
        except NotTextNodeError:
            if multiple:
                # append to our list
                multlist[n.nodeName].append(nodeToDic(n))
                dic.update({n.nodeName: multlist[n.nodeName]})
                continue
            else:
                # 'normal' node
                dic.update({n.nodeName: nodeToDic(n)})
                continue
        # text node
        if multiple:
            multlist[n.nodeName].append(text)
            dic.update({n.nodeName: multlist[n.nodeName]})
        else:
            dic.update({n.nodeName: text})
    return dic


def readConfig(filename):
    dom = parse(filename)
    return nodeToDic(dom)


def test():
    dic = readConfig("sample.xml")
    print(dic["Config"]["Name"])
    print()
    print("Item Type:", dic["Config"]["Items"]["Type"])
    for item in dic["Config"]["Items"]["Item"]:
        print("Item's Name:", item["Name"])
        print("Item's Value:", item["Value"])


"""
==================================================
sample.xml:
==================================================
<?xml version="1.0" encoding="UTF-8"?>
<Config>
    <Name>My Config File</Name>
    <Items>
        <Type>Item type</Type>
        <Item>
            <Name>First Item</Name>
            <Value>Value 1</Value>
        </Item>
        <Item>
            <Name>Second Item</Name>
            <Value>Value 2</Value>
        </Item>
    </Items>
</Config>
==================================================
output:
==================================================
['My Config File']

Item Type: Item type
Item's Name: First Item
Item's Value: Value 1
Item's Name: Second Item
Item's Value: Value 2
"""
Modified from a very useful script by Christoph Dietze. I've fixed the thing that troubled him: having to mark repeating elements with a 'multiple' attribute. It allows an XML file to be read into a Python dictionary. Any XML file can be used, and repeating elements are handled as lists. Repeating elements can be mixed with single elements.
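One thing worth knowing about the repeat detection: nodeToDic() calls node.getElementsByTagName(), which counts matching elements at any depth below node, not just its direct children. That is why Name under Config is returned as a one-item list in the sample output (the two Item/Name elements are counted as well). A minimal sketch of the difference follows; count_direct_children() is a hypothetical helper, not part of the recipe, and the XML is a trimmed version of sample.xml.

```python
from collections import Counter
from xml.dom.minidom import parseString

SAMPLE = """<Config>
  <Name>My Config File</Name>
  <Items>
    <Item><Name>First Item</Name></Item>
    <Item><Name>Second Item</Name></Item>
  </Items>
</Config>"""


def count_descendants(node, tag):
    """What the recipe checks: matching elements at any depth below node."""
    return len(node.getElementsByTagName(tag))


def count_direct_children(node, tag):
    """Hypothetical alternative: count only element children of node itself."""
    counts = Counter(
        child.nodeName
        for child in node.childNodes
        if child.nodeType == child.ELEMENT_NODE
    )
    return counts[tag]


config = parseString(SAMPLE).documentElement
print(count_descendants(config, "Name"))      # 3
print(count_direct_children(config, "Name"))  # 1
```

Swapping the direct-child count into nodeToDic() would make Config/Name a plain string, at the cost of changing the output shown in the recipe, so treat it as an option rather than a correction.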
No doubt there are faster ways to do this, but this works with the standard library and should be useful for small xml files like config info.
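For comparison, here is a rough sketch of the same idea using xml.etree.ElementTree, which is also in the standard library. The name element_to_dict() is made up for this example, and no speed claim is intended. Unlike the recipe, it only treats an element as repeated when it has siblings with the same tag, so a single child stays a plain value.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Config>
  <Name>My Config File</Name>
  <Items>
    <Type>Item type</Type>
    <Item><Name>First Item</Name><Value>Value 1</Value></Item>
    <Item><Name>Second Item</Name><Value>Value 2</Value></Item>
  </Items>
</Config>"""


def element_to_dict(elem):
    """Convert the children of elem into a dict; repeated tags become lists."""
    result = {}
    for child in elem:
        # Elements with children recurse; leaf elements contribute their text.
        value = element_to_dict(child) if len(child) else (child.text or "")
        if child.tag not in result:
            result[child.tag] = value
        elif isinstance(result[child.tag], list):
            result[child.tag].append(value)
        else:
            # Second occurrence of this tag: promote the entry to a list.
            result[child.tag] = [result[child.tag], value]
    return result


root = ET.fromstring(SAMPLE)
config = {root.tag: element_to_dict(root)}
print(config["Config"]["Name"])
for item in config["Config"]["Items"]["Item"]:
    print(item["Name"], item["Value"])
```

Like the recipe, this ignores attributes and is meant for small files such as config info.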
See the original script and discussion at http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/116539
Another alternative... I've also written one that handles lists:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/410469
Interesting but incomplete. Pardon my remark, but your code, when confronted with NewsML, doesn't work well with repeated tags that have properties. Example extracted from the Business Wire NewsML press release system:
By adding a few tests here and there I succeeded in producing a more interesting output, but it is not yet what I expected, and it is probably not very Pythonic.
Sorry for my misplaced comment; I mixed this page up with http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/410469. I also tried your method but had problems with tags that don't have text, like in the example above.
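To illustrate the kind of input described here: in the recipe above, getTextFromNode() returns an empty string for an element with no children, and attributes are never read, so repeated attribute-only tags end up as a list of empty strings. Below is a small sketch of one way to pull attributes out with minidom. The XML is a made-up stand-in (not the NewsML example), and element_summary() and its "attributes"/"text" keys are assumptions for this illustration only.

```python
from xml.dom.minidom import parseString

# Hypothetical input: repeated tags that carry attributes but no text.
SAMPLE = '<Root><Tag Name="a" Value="1"/><Tag Name="b" Value="2"/></Root>'


def element_summary(elem):
    """Return the attributes and direct text of a minidom element."""
    text = "".join(
        child.nodeValue
        for child in elem.childNodes
        if child.nodeType == child.TEXT_NODE
    )
    # NamedNodeMap.items() yields (name, value) pairs for each attribute.
    return {"attributes": dict(elem.attributes.items()), "text": text}


root = parseString(SAMPLE).documentElement
tags = [
    element_summary(child)
    for child in root.childNodes
    if child.nodeType == child.ELEMENT_NODE
]
print(tags)
```

Merging something like this into nodeToDic() would change the shape of the returned dictionary, so it is left as a separate helper here.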