How to read an input from .txt file in python and the output in .xml (Python recipe) by Varsha Holla
ActiveState Code (http://code.activestate.com/recipes/578985/)

Hey, I have a .txt file (contents below). I want to read it and write the output in .xml format. Any suggestions?

                Paper 1 / White Spaces are included
  Single Correct Answer Type

1. Text of question 1
  a) Option 1.a    b) Option 1.b
  c) Option 1.c    d) Option 1.d

2. Text of question 2
  a) This is an example of Option 2.a
  b) Option 2.b has a special char α
  c) Option 2.c
  d) Option 2.d

3. Text of question 3
  a) Option 3.a can span multiple
  lines.
  b) Option 3b
  c) Option 3c
  d) Option 3d

My code:

    from lxml import etree
    import csv

    root = etree.Element('data')
    #f = open('input1.txt','rb')
    rdr = csv.reader(open("input1.txt",newline='\n'))
    header = next(rdr)
    for row in rdr:
        eg = etree.SubElement(root, 'eg')
        for h, v in zip(header, row):
            etree.SubElement(eg, h).text = v

    f = open(r"C:\temp\input1.xml", "w")
    f.write(etree.tostring(root))
    f.close()

I'm getting an error like:

    Traceback (most recent call last):
      File "E:\python3.2\input1.py", line 11, in <module>
        etree.SubElement(eg, h).text = v
      File "lxml.etree.pyx", line 2995, in lxml.etree.SubElement (src\lxml\lxml.etree.c:69677)
      File "apihelpers.pxi", line 188, in lxml.etree._makeSubElement (src\lxml\lxml.etree.c:15691)
      File "apihelpers.pxi", line 1571, in lxml.etree._tagValidOrRaise (src\lxml\lxml.etree.c:29249)
    ValueError: Invalid tag name 'ï»¿    Paper 1'
    

      

I also want the whitespace in the file to be preserved. I'm using Python 3.2. Any suggestions?

Tags: python3_1

2 comments

Burak Tandogan 9 years, 4 months ago

Well, if you want to copy your .txt file's data into a .xml file:

    with open("filename.txt") as f:
        rd = f.readlines()
    with open("newfile.xml", "w") as v:
        for i in rd:
            print(i)
            v.write(i)

This prints each line of the .txt file, creates newfile.xml, and writes the contents of filename.txt into it unchanged.
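
The snippet above copies the text as-is, so the resulting file carries no XML markup. To get real elements, the question/option layout has to be parsed. Here is a minimal sketch of one way to do that. It assumes the layout shown in the question (numbered questions, options labelled a) to d), continuation lines without a label). The function name `text_to_xml` and the element names `paper`, `header`, `question`, `text` and `option` are made up for illustration, and it uses the standard library's `xml.etree.ElementTree` instead of lxml so that it runs without extra packages.

```python
import re
import xml.etree.ElementTree as ET

# A question line looks like "1. Text of question 1".
QUESTION = re.compile(r"^\s*(\d+)\.\s+(.*)$")
# An option marker looks like "a) "; several may share one line.
OPTION = re.compile(r"\b([a-d])\)\s+")

# Sample input in the layout from the question (hypothetical test data).
SAMPLE = "\n".join([
    " " * 16 + "Paper 1 / White Spaces are included",
    "  Single Correct Answer Type",
    "",
    "1. Text of question 1",
    "  a) Option 1.a    b) Option 1.b",
    "  c) Option 1.c    d) Option 1.d",
    "",
    "2. Text of question 2",
    "  a) This is an example of Option 2.a",
    "  b) Option 2.b has a special char α",
    "",
    "3. Text of question 3",
    "  a) Option 3.a can span multiple",
    "  lines.",
    "  b) Option 3b",
])


def text_to_xml(text):
    """Build an element tree from the question/option text layout."""
    root = ET.Element("paper")
    question = None   # the <question> currently being filled
    current = None    # element that unlabelled continuation lines extend
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        match = QUESTION.match(line)
        if match:
            question = ET.SubElement(root, "question", number=match.group(1))
            current = ET.SubElement(question, "text")
            current.text = match.group(2).strip()
            continue
        if question is None:
            # Lines before the first question are kept verbatim,
            # including their leading whitespace.
            ET.SubElement(root, "header").text = line
            continue
        # re.split with one capture group yields
        # [leading text, label, body, label, body, ...].
        parts = OPTION.split(line)
        leading = parts[0].strip()
        if leading:
            current.text += " " + leading  # continuation of previous item
        for label, body in zip(parts[1::2], parts[2::2]):
            current = ET.SubElement(question, "option", label=label)
            current.text = body.strip()
    return root


if __name__ == "__main__":
    print(ET.tostring(text_to_xml(SAMPLE), encoding="unicode"))
```

To save the tree, `ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)` writes the bytes itself, which avoids passing the bytes returned by `tostring` to a file opened in text mode. Text is put into element content rather than tag names, so spaces and characters such as α cause no trouble.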

Kunjesh Kaushik 9 years, 3 months ago

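Those funny characters you are getting are the UTF-8 BOM (byte order mark) at the start of the file, apparently decoded with the platform's default codec instead of UTF-8. Try opening the input with the utf-8-sig codec, which decodes UTF-8 and drops a leading BOM:

    import codecs
    # ...
    # 'utf-8-sig' strips the BOM; plain 'utf-8' would keep it as U+FEFF
    infile = codecs.open('input.txt', encoding='utf-8-sig')
    # ...
    outfile = codecs.open(r'C:\temp\input1.xml', 'w', encoding='utf-8')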

See the codecs library [1] and the Unicode HOWTO [2] for Python.

[1] https://docs.python.org/2/library/codecs.html
[2] https://docs.python.org/2/howto/unicode.html
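
To make the BOM point concrete, here is a small sketch that decodes the same bytes three ways. The byte string is a stand-in for the first line of the input file, and the variable names are only for illustration.

```python
import codecs

# Stand-in for the first bytes of a UTF-8 text file saved with a BOM.
raw = codecs.BOM_UTF8 + "    Paper 1".encode("utf-8")

# Wrong codec: the three BOM bytes turn into three visible characters.
as_cp1252 = raw.decode("cp1252")

# Right codec, but the BOM survives as the invisible character U+FEFF.
as_utf8 = raw.decode("utf-8")

# 'utf-8-sig' decodes UTF-8 and removes a leading BOM if one is present.
as_utf8_sig = raw.decode("utf-8-sig")

for name, value in [("cp1252", as_cp1252),
                    ("utf-8", as_utf8),
                    ("utf-8-sig", as_utf8_sig)]:
    # ascii() makes the otherwise invisible characters show up as escapes.
    print(name, ascii(value))
```

Even with the BOM gone, a string such as `'    Paper 1'` contains spaces and therefore is not a valid XML tag name, so the header line still cannot be used as an element name directly.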

Created by Varsha Holla on Fri, 19 Dec 2014 (MIT)

Other Information and Tasks

Licensed under the MIT License
Viewed 20690 times
Revision 3 (updated 9 years ago)
