A simple function to convert a headerless XML string into a dictionary using only simplejson and re.

Python, 12 lines
import re
import simplejson as json

def xml2dict(xml, jsonString=''):
    tags = re.findall('</?[A-Za-z0-9]+>', xml)          # attribute-free tags only
    keys = [re.sub('[</>]+', '', tag) for tag in tags]  # bare tag names
    for index in range(len(tags) - 1):                  # look at each adjacent pair of tags
        jsonString += {'<><>':   '"'+keys[index]+'": {',  # open, open: start a nested object
                       '<></>':  '"'+keys[index]+'": "'+xml[xml.find(tags[index])+len(tags[index]):xml.find(tags[index+1])]+'"',  # open, close: leaf text
                       '</><>':  ', ',                    # close, open: next sibling
                       '</></>': '}'}[tags[index].replace(keys[index],'')+tags[index+1].replace(keys[index+1],'')]  # close, close: end the object
    return json.loads('{%s}' % jsonString)

1 comment

Gabriel Genellina 14 years, 5 months ago

Either I don't get how this is supposed to work, or it simply doesn't work:

>>> xml2dict('<root><a>foo\nbar</a></root>')
Traceback (most recent call last):
...
ValueError: Invalid control character '\n' at: line 2 column 1 (char 20)

I'd expect something like {'a': 'foo\nbar'}. It cannot handle attributes either:

>>> xml2dict('<root><a attrib="text">foo</a></root>')
Traceback (most recent call last):
...
ValueError: Expecting , delimiter: line 1 column 21 (char 21)

I'd use ElementTree instead.
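
For illustration, here is a minimal sketch of what the ElementTree route could look like. It is not the recipe author's code, and the helper names `element_to_dict` and `xml_to_dict` are made up for this example. It assumes that attributes can be ignored, that repeated sibling tags should be collected into a list, and that any text sitting directly inside an element that also has child elements can be dropped.

```python
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Hypothetical helper: turn one Element into a string or a nested dict."""
    children = list(elem)
    if not children:
        # Leaf element: its text is the value (empty string if there is none).
        return elem.text or ''
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Assumption: repeated sibling tags are gathered into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result

def xml_to_dict(xml):
    """Hypothetical helper: parse an XML string and key the result by the root tag."""
    root = ET.fromstring(xml)
    return {root.tag: element_to_dict(root)}

print(xml_to_dict('<root><a>foo\nbar</a></root>'))
print(xml_to_dict('<root><a attrib="text">foo</a></root>'))
```

Because the parser hands back the text as a Python string and no JSON is assembled by hand, both inputs from the comment above parse without raising an error, though the attribute itself is discarded under the stated assumption.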