ActiveState Code

Recipe 570085: XML to python dictionary of list


Another recipe for converting an XML file into a Python dictionary. This recipe uses lxml.

Python
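from pprint import pprint

from lxml import etree


def dictlist(node):
    """Convert the root element `node` into a nested dictionary of lists."""
    children = []
    xmltodict(node, children)
    # The root entry uses the key 'attribs'; nested entries use 'attributes'.
    return {node.tag: {'value': children,
                       'attribs': dict(node.attrib),
                       'tail': node.tail}}


def xmltodict(node, res):
    """Append dictionary entries describing `node` to the list `res`.

    If `node` has children, one entry per child is appended;
    otherwise a single entry for `node` itself is appended.
    """
    if len(node):
        for child in node:
            sub = []
            xmltodict(child, sub)
            if len(child):
                # Branch: `sub` holds the entries for the child's own children.
                value = {'value': sub,
                         'attributes': dict(child.attrib),
                         'tail': child.tail}
                res.append({child.tag: value})
            else:
                # Leaf: the recursive call produced exactly one entry.
                res.append(sub[0])
    else:
        # Leaf: store the element's text as its value.
        value = {'value': node.text,
                 'attributes': dict(node.attrib),
                 'tail': node.tail}
        res.append({node.tag: value})


def main():
    tree = etree.parse('test2.xml')
    res = dictlist(tree.getroot())
    pprint(res)


if __name__ == '__main__':
    main()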

Discussion

If you pass it an XML file like this one:

<?xml version="1.0"?>
<elements idc="002">
<element idl="0001">
<singleelem id="000"/>
bbbb
</element>
<dragon>
<moredragon type="fire" color="red">
	Enter the dragon
</moredragon>
</dragon>
</elements>

it produces a dictionary like this:

{'elements': {'attribs': {'idc': '002'},
              'tail': None,
              'value': [{'element': {'attributes': {'idl': '0001'},
                                     'tail': '\n',
                                     'value': [{'singleelem': {'attributes': {'id': '000'},
                                                               'tail': '\nbbbb\n',
                                                               'value': None}}]}},
                        {'dragon': {'attributes': {},
                                    'tail': '\n',
                                    'value': [{'moredragon': {'attributes': {'color': 'red', 'type': 'fire'},
                                                              'tail': '\n',
                                                              'value': '\n\tEnter the dragon\n'}}]}}]}}
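Because the result nests dictionaries inside lists, pulling out a single node by hand means chaining many index lookups. The following is a minimal sketch of one way to walk such a result instead. The helpers `iter_entries` and `attributes_of` are hypothetical names introduced here for illustration and are not part of the recipe; the `result` literal is copied from the sample output.

```python
# Sample result copied from the recipe's discussion.
result = {
    'elements': {
        'attribs': {'idc': '002'},
        'tail': None,
        'value': [
            {'element': {
                'attributes': {'idl': '0001'},
                'tail': '\n',
                'value': [
                    {'singleelem': {'attributes': {'id': '000'},
                                    'tail': '\nbbbb\n',
                                    'value': None}},
                ],
            }},
            {'dragon': {
                'attributes': {},
                'tail': '\n',
                'value': [
                    {'moredragon': {'attributes': {'color': 'red', 'type': 'fire'},
                                    'tail': '\n',
                                    'value': '\n\tEnter the dragon\n'}},
                ],
            }},
        ],
    },
}


def iter_entries(entry):
    """Yield (tag, info) for an entry and all nested entries, depth first.

    Hypothetical helper, not part of the recipe.
    """
    for tag, info in entry.items():
        yield tag, info
        # A list value holds child entries; text or None marks a leaf.
        if isinstance(info['value'], list):
            for child in info['value']:
                yield from iter_entries(child)


def attributes_of(info):
    """Return the attribute dict, accepting the root's 'attribs' key too."""
    return info.get('attributes', info.get('attribs', {}))


tags = [tag for tag, _ in iter_entries(result)]
fire = next(info for tag, info in iter_entries(result) if tag == 'moredragon')
print(tags)
print(attributes_of(fire)['type'], fire['value'].strip())
```

The `isinstance` check relies on the recipe's convention that only non-leaf nodes carry a list under 'value'.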

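Each node is represented as a key/value pair of this form:

{node.tag: {'value': text or list of child nodes, 'tail': tail of the node, 'attributes': dict of node attributes}}

The node tag becomes the key. Its value is a dictionary whose 'value' entry holds a list of child nodes or, when the node is a leaf, the node text. The same dictionary also holds the node's attributes and its tail, which is the text that follows the node's closing tag. Note that the root entry stores its attributes under 'attribs' rather than 'attributes'.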
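The per-node layout described here can also be written as a single recursive function. The sketch below is an illustration under stated assumptions, not the recipe's own code: it uses the standard library's `xml.etree.ElementTree` in place of lxml so that it runs without extra packages, `node_to_entry` is a hypothetical name, `SAMPLE` is a whitespace-stripped version of the XML from the discussion, and the root uses 'attributes' like every other node.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml (assumption)

# Whitespace-stripped version of the sample XML from the discussion.
SAMPLE = (
    '<elements idc="002">'
    '<element idl="0001"><singleelem id="000"/>bbbb</element>'
    '<dragon><moredragon type="fire" color="red">Enter the dragon</moredragon></dragon>'
    '</elements>'
)


def node_to_entry(node):
    """Return {tag: {'value', 'attributes', 'tail'}} for one element.

    Hypothetical helper illustrating the layout; not the recipe's own code.
    """
    if len(node):
        # Branch: the value is a list with one entry per child element.
        value = [node_to_entry(child) for child in node]
    else:
        # Leaf: the value is the element's text (None if it has none).
        value = node.text
    return {node.tag: {'value': value,
                       'attributes': dict(node.attrib),
                       'tail': node.tail}}


entry = node_to_entry(ET.fromstring(SAMPLE))
root = entry['elements']
print(root['attributes'])
print(root['value'][0]['element']['value'][0]['singleelem']['tail'])
```

As in the recipe, text that appears inside a non-leaf element before its first child is not kept; only leaf text and tails are recorded.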

Comments

  1. At 6:59 a.m. on 16 Apr 2008, Paddy McCarthy said:

    XML in comment gobbled.

    Hi, you need to escape your XML, as it did not show up.

    - Paddy

  2. At 7:20 a.m. on 16 Apr 2008, Vivek Khurana (the author) said:

    How do I escape the XML?

    How do I escape the XML? I tried putting the XML in a pre or code block, but no luck.
