Welcome, guest | Sign In | My Account | Store | Cart

Convert CSV to XML.

Python, 40 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
# csv2xml.py
# FB - 201010107
# First row of the csv file must be header!

# example CSV file: myData.csv
# id,code name,value
# 36,abc,7.6
# 40,def,3.6
# 9,ghi,6.3
# 76,def,99

import csv

csvFile = 'myData.csv'
xmlFile = 'myData.xml'

csvData = csv.reader(open(csvFile))
xmlData = open(xmlFile, 'w')
xmlData.write('<?xml version="1.0"?>' + "\n")
# there must be only one top-level tag
xmlData.write('<csv_data>' + "\n")

rowNum = 0
for row in csvData:
    if rowNum == 0:
        tags = row
        # replace spaces w/ underscores in tag names
        for i in range(len(tags)):
            tags[i] = tags[i].replace(' ', '_')
    else: 
        xmlData.write('<row>' + "\n")
        for i in range(len(tags)):
            xmlData.write('    ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write('</row>' + "\n")
            
    rowNum +=1

xmlData.write('</csv_data>' + "\n")
xmlData.close()

9 comments

Judy 11 years, 10 months ago  # | flag

Is there a way to do a batch conversion from csv files to xml files?

FB36 (author) 11 years, 10 months ago  # | flag

Here this one converts all CSV files in a given folder to XML files. You can run this script from a batch file etc.

# csv2xml.py
# Convert all CSV files in a given (using command line argument) folder to XML.
# FB - 20120523
# First row of the csv files must be the header!

# example CSV file: myData.csv
# id,code name,value
# 36,abc,7.6
# 40,def,3.6
# 9,ghi,6.3
# 76,def,99

import sys
import os
import csv
if len(sys.argv) != 2:
    os._exit(1)
path=sys.argv[1] # get folder as a command line argument
os.chdir(path)
csvFiles = [f for f in os.listdir('.') if f.endswith('.csv') or f.endswith('.CSV')]
for csvFile in csvFiles:
    xmlFile = csvFile[:-4] + '.xml'
    csvData = csv.reader(open(csvFile))
    xmlData = open(xmlFile, 'w')
    xmlData.write('<?xml version="1.0"?>' + "\n")
    # there must be only one top-level tag
    xmlData.write('<csv_data>' + "\n")
    rowNum = 0
    for row in csvData:
        if rowNum == 0:
            tags = row
            # replace spaces w/ underscores in tag names
            for i in range(len(tags)):
                tags[i] = tags[i].replace(' ', '_')
        else: 
            xmlData.write('<row>' + "\n")
            for i in range(len(tags)):
                xmlData.write('    ' + '<' + tags[i] + '>' \
                              + row[i] + '</' + tags[i] + '>' + "\n")
            xmlData.write('</row>' + "\n")                
        rowNum +=1
    xmlData.write('</csv_data>' + "\n")
    xmlData.close()
Sam Khan 10 years, 9 months ago  # | flag

this is such a simple and elegant python code. thank you :)

Romain 10 years, 6 months ago  # | flag

Hi,

Thanks for those one! I'm not use to Python and other code, but I try, and have a situation with charmap and decode ... When I run ur script I've got this error: File "C:\Python33\lib\encodings\cp1252.py", line 23, in decode return codecs.charmap_decode(input,self.errors,decoding_table)[0] UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 6060: cha racter maps to <undefined>

(Didn't try on linux ... I will monday) The only way I found is to convert my .csv with "notepad++" (for exemple) to ANSI... It's not very conveniant ^^ And I would love to work in UTF-16 (Because that's what "they" are asking ...) I suppose this issue come from the CSV module ... but ... I'm pretty noob ;)

Any idea ?

(And sorry for my poor english)

FB36 (author) 10 years, 5 months ago  # | flag

This version does not use CSV module:

# csv2xml.py
# FB - 20131005
# First row of the csv file(s) must be the header!
import sys
import os
import glob

delimiter = "," # "\t" "|" # delimiter used in the CSV file(s)

# the optional command-line argument maybe a CSV file or a folder
if len(sys.argv) == 2:
    arg = sys.argv[1].lower()
    if arg.endswith('.csv'): # if a CSV file then convert only that file
        csvFiles = [arg]
    else: # if a folder path then convert all CSV files in the that folder
        os.chdir(arg)
        csvFiles = glob.glob('*.csv')
# if no command-line argument then convert all CSV files in the current folder
elif len(sys.argv) == 1:
    csvFiles = glob.glob('*.csv')
else:
    os._exit(1)

for csvFileName in csvFiles:
    xmlFile = csvFileName[:-4] + '.xml'
    # read the CSV file as binary data in case there are non-ASCII characters
    csvFile = open(csvFileName, 'rb')
    csvData = csvFile.readlines()
    csvFile.close()
    tags = csvData.pop(0).strip().replace(' ', '_').split(delimiter)
    xmlData = open(xmlFile, 'w')
    xmlData.write('<?xml version="1.0"?>' + "\n")
    # there must be only one top-level tag
    xmlData.write('<csv_data>' + "\n")
    for row in csvData:
        rowData = row.strip().split(delimiter)
        xmlData.write('<row>' + "\n")
        for i in range(len(tags)):
            xmlData.write('    ' + '<' + tags[i] + '>' \
                          + rowData[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write('</row>' + "\n")                
    xmlData.write('</csv_data>' + "\n")
    xmlData.close()
Niko Partanen 9 years, 5 months ago  # | flag

This is really good and useful code! I have just one question: How to make it batch process recursively all csv-files that are in the subfolders too?

Martin Blanchet-Pedersen 7 years, 9 months ago  # | flag

This is a really cool and flexible tool. I need to create a XML file for a external program that uses the following structure:

<?xml version="1.0" encoding="UTF-8"?><users>< <user field1="aa" field2="bb"/> <user field1="cc" field2="dd"/> <user field1="ee" field2="ff"/> </users>

(a lot of empty elements only with attributes).

And when i use the following code everything looks nice, but i can't read it as an xml?

--

xmlData = open(xmlFile, 'w')

xmlData.write('<?xml version="1.0" encoding="UTF-8"?>')

 # there must be only one top-level tag
xmlData.write('<users>' + "\n")

for row in csvData:
    rowData = row.strip().split(delimiter)
    xmlData.write('  ' + '<user')
        for i in range(len(tags)):
           xmlData.write('  ' + tags[i] + '="' + rowData[i] +'"')  
        xmlData.write('/>' + "\n")
xmlData.write('</users>' + "\n")

Any clues why chrome just reads <users></users> instead of listing all the <user> blocks?

Aiden 6 years, 10 months ago  # | flag

Hi, For some reason this code worked on other csv files but not my current one any idea why? """ This script reads a csv file on input and for each line it creates an xml file in output folder

Input file must be in the same folder as the script Filename of the the input file is set in INPUT_CSV_FILE variable Filename of the xml file is built using value in field with the name defined in FIELD_WITH_PRIMARY_KEY variable Output folder is the subfolder with the name defined in OUTPUT_SUBFOLDER variable

Notice that the script is simple and not monkey-proof, so if there are two rows with the same value in that field, the file generated when processing the second row will overwrite the file generated when processing the first one """

import csv from collections import OrderedDict import xml.etree.ElementTree import re import os.path import os

Settings

INPUT_CSV_FILE = "SNMMI_Plato_TEST.csv" FIELD_WITH_PRIMARY_KEY = "S.NO" OUTPUT_SUBFOLDER = "SNMMI XMLs"

def dict_to_xml(tag, d): """ Turn a simple dict of key/value pairs into XML """ elem = xml.etree.ElementTree.Element(tag) for key, val in d.items(): child = xml.etree.ElementTree.Element(key) child.text = str(val) elem.append(child) return elem

def xml_to_file(e, fn): """ Create XML file with the content of Element """ with open(fn, "wt", encoding="utf-8") as file: # encoding as utf-8 file.write("<?xml version='1.0' encoding='utf-8'?>") file.write(xml.etree.ElementTree.tostring(e, encoding="unicode"))

if not os.path.exists(OUTPUT_SUBFOLDER): os.makedirs(OUTPUT_SUBFOLDER)

with open(INPUT_CSV_FILE, "rt") as f: # Programatic error: encoding should be set... but we don't know the encoding! f_csv = csv.DictReader(f) headers_csv = f_csv.fieldnames # headers of csv headers_xml = [re.sub('[^a-zA-Z_]', '_', h) for h in headers_csv] # keep only letters for xml tag to be valid for row in f_csv: my_dict = OrderedDict() # empty ordered dictionary filename = 'filename_error.xml' # error in case field is not found for field in zip(headers_csv, headers_xml): my_dict[field[1]] = row[field[0]] if field[0] == FIELD_WITH_PRIMARY_KEY: filename = re.sub('[^0-9a-zA-Z_]', '_', row[field[0]]) + ".xml" elem = dict_to_xml('row', my_dict) xml_to_file(elem, os.path.join(OUTPUT_SUBFOLDER, filename))

John Doe 6 years, 9 months ago  # | flag

Thank you so much for posting this! It is extremely fast on huge files (about 1 second processing time per 10 MB of data) and outputs to great results. Much appreciated.

Created by FB36 on Mon, 11 Oct 2010 (MIT)
Python recipes (4591)
FB36's recipes (148)

Required Modules

Other Information and Tasks