Convert CSV to XML.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 | # csv2xml.py
# FB - 201010107
# First row of the csv file must be header!
# example CSV file: myData.csv
# id,code name,value
# 36,abc,7.6
# 40,def,3.6
# 9,ghi,6.3
# 76,def,99
import csv
csvFile = 'myData.csv'
xmlFile = 'myData.xml'
csvData = csv.reader(open(csvFile))
xmlData = open(xmlFile, 'w')
xmlData.write('<?xml version="1.0"?>' + "\n")
# there must be only one top-level tag
xmlData.write('<csv_data>' + "\n")
rowNum = 0
for row in csvData:
if rowNum == 0:
tags = row
# replace spaces w/ underscores in tag names
for i in range(len(tags)):
tags[i] = tags[i].replace(' ', '_')
else:
xmlData.write('<row>' + "\n")
for i in range(len(tags)):
xmlData.write(' ' + '<' + tags[i] + '>' \
+ row[i] + '</' + tags[i] + '>' + "\n")
xmlData.write('</row>' + "\n")
rowNum +=1
xmlData.write('</csv_data>' + "\n")
xmlData.close()
|
Is there a way to do a batch conversion from csv files to xml files?
Here this one converts all CSV files in a given folder to XML files. You can run this script from a batch file etc.
this is such a simple and elegant python code. thank you :)
Hi,
Thanks for those one! I'm not use to Python and other code, but I try, and have a situation with charmap and decode ... When I run ur script I've got this error: File "C:\Python33\lib\encodings\cp1252.py", line 23, in decode return codecs.charmap_decode(input,self.errors,decoding_table)[0] UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 6060: cha racter maps to <undefined>
(Didn't try on linux ... I will monday) The only way I found is to convert my .csv with "notepad++" (for exemple) to ANSI... It's not very conveniant ^^ And I would love to work in UTF-16 (Because that's what "they" are asking ...) I suppose this issue come from the CSV module ... but ... I'm pretty noob ;)
Any idea ?
(And sorry for my poor english)
This version does not use CSV module:
This is really good and useful code! I have just one question: How to make it batch process recursively all csv-files that are in the subfolders too?
This is a really cool and flexible tool. I need to create a XML file for a external program that uses the following structure:
<?xml version="1.0" encoding="UTF-8"?><users>< <user field1="aa" field2="bb"/> <user field1="cc" field2="dd"/> <user field1="ee" field2="ff"/> </users>
(a lot of empty elements only with attributes).
And when i use the following code everything looks nice, but i can't read it as an xml?
--
xmlData = open(xmlFile, 'w')
xmlData.write('</users>' + "\n")
Any clues why chrome just reads <users></users> instead of listing all the <user> blocks?
Hi, For some reason this code worked on other csv files but not my current one any idea why? """ This script reads a csv file on input and for each line it creates an xml file in output folder
Input file must be in the same folder as the script Filename of the the input file is set in INPUT_CSV_FILE variable Filename of the xml file is built using value in field with the name defined in FIELD_WITH_PRIMARY_KEY variable Output folder is the subfolder with the name defined in OUTPUT_SUBFOLDER variable
Notice that the script is simple and not monkey-proof, so if there are two rows with the same value in that field, the file generated when processing the second row will overwrite the file generated when processing the first one """
import csv from collections import OrderedDict import xml.etree.ElementTree import re import os.path import os
Settings
INPUT_CSV_FILE = "SNMMI_Plato_TEST.csv" FIELD_WITH_PRIMARY_KEY = "S.NO" OUTPUT_SUBFOLDER = "SNMMI XMLs"
def dict_to_xml(tag, d): """ Turn a simple dict of key/value pairs into XML """ elem = xml.etree.ElementTree.Element(tag) for key, val in d.items(): child = xml.etree.ElementTree.Element(key) child.text = str(val) elem.append(child) return elem
def xml_to_file(e, fn): """ Create XML file with the content of Element """ with open(fn, "wt", encoding="utf-8") as file: # encoding as utf-8 file.write("<?xml version='1.0' encoding='utf-8'?>") file.write(xml.etree.ElementTree.tostring(e, encoding="unicode"))
if not os.path.exists(OUTPUT_SUBFOLDER): os.makedirs(OUTPUT_SUBFOLDER)
with open(INPUT_CSV_FILE, "rt") as f: # Programatic error: encoding should be set... but we don't know the encoding! f_csv = csv.DictReader(f) headers_csv = f_csv.fieldnames # headers of csv headers_xml = [re.sub('[^a-zA-Z_]', '_', h) for h in headers_csv] # keep only letters for xml tag to be valid for row in f_csv: my_dict = OrderedDict() # empty ordered dictionary filename = 'filename_error.xml' # error in case field is not found for field in zip(headers_csv, headers_xml): my_dict[field[1]] = row[field[0]] if field[0] == FIELD_WITH_PRIMARY_KEY: filename = re.sub('[^0-9a-zA-Z_]', '_', row[field[0]]) + ".xml" elem = dict_to_xml('row', my_dict) xml_to_file(elem, os.path.join(OUTPUT_SUBFOLDER, filename))
Thank you so much for posting this! It is extremely fast on huge files (about 1 second processing time per 10 MB of data) and outputs to great results. Much appreciated.