
I use this for configuration. I hadn't intended to put it up anywhere, but there have been a couple of discussions lately about converting XML to Python dicts, so I feel obligated to share another approach, one that is based on Fredrik Lundh's ElementTree.

Python, 63 lines
# ElementTree ships with the standard library (the separate cElementTree package is obsolete)
from xml.etree import ElementTree

class XmlListConfig(list):
    def __init__(self, aList):
        for element in aList:
            if len(element):  # element has child elements
                # treat like dict
                if len(element) == 1 or element[0].tag != element[1].tag:
                    self.append(XmlDictConfig(element))
                # treat like list
                elif element[0].tag == element[1].tag:
                    self.append(XmlListConfig(element))
            elif element.text:
                text = element.text.strip()
                if text:
                    self.append(text)


class XmlDictConfig(dict):
    '''
    Example usage:

    >>> tree = ElementTree.parse('your_file.xml')
    >>> root = tree.getroot()
    >>> xmldict = XmlDictConfig(root)

    Or, if you want to use an XML string:

    >>> root = ElementTree.XML(xml_string)
    >>> xmldict = XmlDictConfig(root)

    And then use xmldict for what it is... a dict.
    '''
    def __init__(self, parent_element):
        if parent_element.items():
            self.update(dict(parent_element.items()))
        for element in parent_element:
            if len(element):  # element has child elements
                # treat like dict - we assume that if the first two tags
                # in a series are different, then they are all different.
                if len(element) == 1 or element[0].tag != element[1].tag:
                    aDict = XmlDictConfig(element)
                # treat like list - we assume that if the first two tags
                # in a series are the same, then the rest are the same.
                else:
                    # here, we put the list in dictionary; the key is the
                    # tag name the list elements all share in common, and
                    # the value is the list itself 
                    aDict = {element[0].tag: XmlListConfig(element)}
                # if the tag has attributes, add those to the dict
                if element.items():
                    aDict.update(dict(element.items()))
                self.update({element.tag: aDict})
            # this assumes that if you've got an attribute in a tag,
            # you won't be having any text. This may or may not be a 
            # good idea -- time will tell. It works for the way we are
            # currently doing XML configuration files...
            elif element.items():
                self.update({element.tag: dict(element.items())})
            # finally, if there are no child tags and no attributes, extract
            # the text
            else:
                self.update({element.tag: element.text})

This uses two simple classes to provide the machinery for XML conversion. See the comments and usage in the code for a detailed explanation.
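
As a rough illustration of the rule the code comments describe (compare the tags of the first two children to decide between a dict and a list), here is a minimal standalone sketch. The function name to_data and the sample XML are hypothetical and not part of the recipe, and attribute handling is left out for brevity.

```python
from xml.etree import ElementTree


def to_data(element):
    """Convert an element to nested dicts, lists and strings (simplified sketch)."""
    children = list(element)
    if not children:
        # Leaf element: keep its text.
        return element.text
    if len(children) > 1 and children[0].tag == children[1].tag:
        # First two tags match: assume a homogeneous series and build a list,
        # wrapped in a dict keyed by the shared tag name.
        return {children[0].tag: [to_data(child) for child in children]}
    # Otherwise assume all tags differ and build a dict keyed by tag.
    return {child.tag: to_data(child) for child in children}


# Hypothetical configuration, made up for this example.
xml_string = """
<config>
    <server>
        <host>localhost</host>
        <port>8080</port>
    </server>
    <paths>
        <path>/usr/lib</path>
        <path>/opt/lib</path>
    </paths>
</config>
"""

config = to_data(ElementTree.XML(xml_string))
print(config)
```

For this input the sketch yields a dict for server and a list, stored under the key path, for paths. The recipe's classes should produce the same shape for this input, but they differ from the sketch in other cases (attributes, nested lists, whitespace handling).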

I make constant use of ElementTree, but my efforts are inexpert at best. If anyone can share how to make its use more elegant, I'd love to see...

Update: Fredrik Lundh was kind enough to take a look at this recipe and offer suggestions for cleaning it up. These have been implemented and my thanks go out to him.

7 comments

Alex 14 years ago

I am using this recipe and found an issue with it: when an XML file contains multiple children with the same name, the dictionary ends up with a single key for them, and its value is taken from the LAST child with that name that was processed.

The XML file below results in a dictionary with a single key called "Purchase", whose value is {'PurchaseId': 'ccc1', 'PurchaseDate': 'ccc2', 'PurchaseOrigin': 'ccc3'}.

Example:

<OrderNotification>
<Purchase>
    <PurchaseId>aaa1</PurchaseId>
    <PurchaseDate>aaa2</PurchaseDate>
    <PurchaseOrigin>aaa3</PurchaseOrigin>
</Purchase>
<Purchase>
    <PurchaseId>ccc1</PurchaseId>
    <PurchaseDate>ccc2</PurchaseDate>
    <PurchaseOrigin>ccc3</PurchaseOrigin>
</Purchase>     
</OrderNotification>
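
The overwrite can be reproduced without the recipe's classes. The following is a minimal sketch, assuming only that each child is stored under its tag with dict.update, as the recipe does; the XML is a shortened version of the example above.

```python
from xml.etree import ElementTree

xml_string = """
<OrderNotification>
<Purchase>
    <PurchaseId>aaa1</PurchaseId>
</Purchase>
<Purchase>
    <PurchaseId>ccc1</PurchaseId>
</Purchase>
</OrderNotification>
"""

root = ElementTree.XML(xml_string)
result = {}
for element in root:
    # Each child is stored under its tag, so a later sibling with the
    # same tag replaces the earlier one.
    result.update({element.tag: {child.tag: child.text for child in element}})

print(result)
```

Only the second Purchase survives, because both siblings map to the same dictionary key.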
Alex 14 years ago

Here is the change that I made in the XmlDictConfig class. The additional code is separated from the original by empty lines:

def __init__(self, parent_element):
    childrenNames = []
    # iterate over the element directly; getchildren() was removed in Python 3.9
    for child in parent_element:
        childrenNames.append(child.tag)

    if parent_element.items(): #attributes
        self.update(dict(parent_element.items()))
    for element in parent_element:
        if len(element):  # element has child elements
            # treat like dict - we assume that if the first two tags
            # in a series are different, then they are all different.
            #print len(element), element[0].tag, element[1].tag
            if len(element) == 1 or element[0].tag != element[1].tag:
                aDict = XmlDictConfig(element)
            # treat like list - we assume that if the first two tags
            # in a series are the same, then the rest are the same.
            else:
                # here, we put the list in dictionary; the key is the
                # tag name the list elements all share in common, and
                # the value is the list itself
                aDict = {element[0].tag: XmlListConfig(element)}
            # if the tag has attributes, add those to the dict
            if element.items():
                aDict.update(dict(element.items()))

            if childrenNames.count(element.tag) > 1:
                try:
                    currentValue = self[element.tag]
                    currentValue.append(aDict)
                    self.update({element.tag: currentValue})
                except KeyError: #the first of its kind, an empty list must be created
                    self.update({element.tag: [aDict]}) #aDict is written in [], i.e. it will be a list

            else:
                self.update({element.tag: aDict})
        # this assumes that if you've got an attribute in a tag,
        # you won't be having any text. This may or may not be a
        # good idea -- time will tell. It works for the way we are
        # currently doing XML configuration files...
        elif element.items():
            self.update({element.tag: dict(element.items())})
        # finally, if there are no child tags and no attributes, extract
        # the text
        else:
            self.update({element.tag: element.text})

What it does is the following:

  • collect the tag names of all the children of the parent element
  • if a tag name occurs more than once among them:
      • for the first node with that name, create a new key in the dictionary and initialize it with a list, the first element of which is the new dictionary (aDict)
      • for each later node, the list is already there, so just append the new aDict to it
  • otherwise, store aDict directly under the tag name, as before

In the end, all the nodes will be accessible as elements of a list that corresponds to a key in the dictionary.
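
The steps above can also be sketched as a small standalone function. This is an illustrative variant, not the code from the comment: children_to_dict is a hypothetical name, collections.Counter replaces the repeated count() calls, setdefault replaces the try/except, and the Merchant element is made up to show the non-repeated case.

```python
from collections import Counter
from xml.etree import ElementTree


def children_to_dict(parent):
    """Map child tags to values, collecting repeated tags into lists."""
    tag_counts = Counter(child.tag for child in parent)
    result = {}
    for child in parent:
        value = {grandchild.tag: grandchild.text for grandchild in child}
        if tag_counts[child.tag] > 1:
            # Repeated tag: create the list on first sight, then append.
            result.setdefault(child.tag, []).append(value)
        else:
            # Unique tag: store the value directly.
            result[child.tag] = value
    return result


xml_string = """
<OrderNotification>
<Purchase>
    <PurchaseId>aaa1</PurchaseId>
</Purchase>
<Purchase>
    <PurchaseId>ccc1</PurchaseId>
</Purchase>
<Merchant>
    <Name>m1</Name>
</Merchant>
</OrderNotification>
"""

order = children_to_dict(ElementTree.XML(xml_string))
print(order)
```

Both purchases are kept, in document order, under the Purchase key, while the single Merchant stays a plain dict.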

p.s. my code is not very pythonic.

Alex 14 years ago

#childrenNames = []
#for child in parent_element.getchildren():
#   childrenNames.append(child.tag)

childrenNames = [child.tag for child in parent_element]

This is a prettier approach: the list of children is built using a list comprehension. Iterating over the element directly also avoids getchildren(), which was removed from ElementTree in Python 3.9.

bernardokyotoku 12 years, 3 months ago

OK, this code seems to be widely referenced, so I think it is worth saying that you can use the ElementTree that ships with the standard library:

from xml.etree import ElementTree

Matt Monroe 11 years, 9 months ago

I found that this doesn't work correctly when a child element has both attributes AND text, a limitation you mention in the code comments. To fix this I changed the following:

elif element.items():
    self.update({element.tag: dict(element.items())})
    #add the following line
    self[element.tag].update({"__Content__":element.text})

Hope this helps someone else if they need it. The text for the element will be in the dict with the attributes, under the key __Content__.
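
To show the resulting shape, here is a minimal standalone sketch of the same idea. The helper name leaf_to_dict, its content_key parameter and the sample element are hypothetical; unlike the one-line change above, this variant only adds the key when the element has non-whitespace text.

```python
from xml.etree import ElementTree


def leaf_to_dict(element, content_key="__Content__"):
    """Return the attributes of a leaf element, plus its text if it has any."""
    data = dict(element.items())
    text = (element.text or "").strip()
    if text:
        # Store the text next to the attributes under a reserved key.
        data[content_key] = text
    return data


# Hypothetical element with both an attribute and text.
element = ElementTree.XML('<timeout unit="seconds">30</timeout>')
print(leaf_to_dict(element))
```

A reserved key like this can collide with a real attribute of the same name, so it is worth choosing one that cannot appear in your XML.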

-Laserath

Luis Martin Gil 11 years, 8 months ago

I've been using your recipe, but it doesn't work well with lists. What about this solution?

Recursive, easier and less code:

from lxml import etree, objectify
def formatXML(parent):
    """
    Recursive operation which returns a tree formatted
    as dicts and lists.
    A list is created when the word 'List' appears
    in the current parent tag.
    """
    ret = {}
    if parent.items(): ret.update(dict(parent.items()))
    if parent.text: ret['__content__'] = parent.text
    if ('List' in parent.tag):
        ret['__list__'] = []
        for element in parent:
            ret['__list__'].append(formatXML(element))
    else:
        for element in parent:
            ret[element.tag] = formatXML(element)
    return ret

Luis Martin Gil luismartingil.com

Alecks Gates 10 years, 3 months ago

I've appended the following to XmlListConfig:

        elif element.items():
            self.append(OrderedDict(element.items()))

This handles the case where we have a list of items that have only attributes and no inner text. (OrderedDict needs to be imported from collections.)
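
A minimal standalone sketch of that case follows, assuming a made-up servers document. It uses a plain dict, which preserves insertion order in Python 3.7 and later, instead of OrderedDict, and it falls back to the attributes whenever the stripped text is empty, which is slightly more lenient than the elif chain above.

```python
from xml.etree import ElementTree

# Hypothetical list whose items carry only attributes.
xml_string = """
<servers>
    <server host="alpha" port="8080"/>
    <server host="beta" port="8081"/>
</servers>
"""

root = ElementTree.XML(xml_string)

items = []
for element in root:
    if len(element):
        continue  # items with child elements are outside this sketch
    text = (element.text or "").strip()
    if text:
        items.append(text)
    elif element.items():
        # Attribute-only item: keep its attributes as a dict.
        items.append(dict(element.items()))

print(items)
```

Without the attribute branch, both server items would be skipped and the list would come back empty.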

Created by Duncan McGreggor on Tue, 19 Apr 2005 (PSF)