
Hey, I have a .txt file that I want to read and convert to output in .xml format. Any suggestions?

          Paper 1 / White Spaces are included
  Single Correct Answer Type

1. Text of question 1
  a) Option 1.a    b) Option 1.b
  c) Option 1.c    d) Option 1.d

2. Text of question 2
  a) This is an example of Option 2.a
  b) Option 2.b has a special char α
  c) Option 2.c
  d) Option 2.d

3. Text of question 3
  a) Option 3.a can span multiple
  lines.
  b) Option 3b
  c) Option 3c
  d) Option 3d

My code:

    from lxml import etree
    import csv

    root = etree.Element('data')
    #f = open('input1.txt','rb')
    rdr = csv.reader(open("input1.txt",newline='\n'))
    header = next(rdr)
    for row in rdr:
        eg = etree.SubElement(root, 'eg')
        for h, v in zip(header, row):
            etree.SubElement(eg, h).text = v

    # etree.tostring() returns bytes, so the file is opened in binary mode
    f = open(r"C:\temp\input1.xml", "wb")
    f.write(etree.tostring(root))
    f.close()

I'm getting an error like:

    Traceback (most recent call last):
      File "E:\python3.2\input1.py", line 11, in <module>
        etree.SubElement(eg, h).text = v
      File "lxml.etree.pyx", line 2995, in lxml.etree.SubElement (src\lxml\lxml.etree.c:69677)
      File "apihelpers.pxi", line 188, in lxml.etree._makeSubElement (src\lxml\lxml.etree.c:15691)
      File "apihelpers.pxi", line 1571, in lxml.etree._tagValidOrRaise (src\lxml\lxml.etree.c:29249)
    ValueError: Invalid tag name '    Paper 1'
    

I also want the white space in the input to be preserved. I'm using Python 3.2. Any suggestions?
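
The traceback comes from using the first row of the file as element tag names: XML tag names cannot contain spaces, but element text can. Here is a minimal sketch of one way around that, assuming a fixed tag name `line` (a hypothetical choice, not something from the code above) and the standard-library `xml.etree.ElementTree` instead of lxml. Each input line becomes the text of an element, so leading white space is kept.

```python
import xml.etree.ElementTree as ET


def lines_to_xml(lines):
    """Wrap each text line in a <line> element (the tag name is an assumption)."""
    root = ET.Element("data")
    for raw in lines:
        # Strip only the line terminator so leading spaces survive.
        ET.SubElement(root, "line").text = raw.rstrip("\r\n")
    return root


sample = [
    "    Paper 1 / White Spaces are included\n",
    "1. Text of question 1\n",
]
root = lines_to_xml(sample)
xml_bytes = ET.tostring(root, encoding="utf-8")  # bytes, not str
```

Because `ET.tostring(..., encoding="utf-8")` returns bytes, the result would need to be written to a file opened with mode `"wb"`.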

2 comments

Burak Tandogan 9 years, 4 months ago

Well, if you want to copy your .txt file data into a .xml file:

    with open("filename.txt") as f:
        rd = f.readlines()
    with open("newfile.xml", "w") as v:
        for i in rd:
            print(i)
            v.write(i)

This prints each line of the .txt file and creates newfile.xml containing whatever is in filename.txt.
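
Note that copying the lines verbatim gives a text file with an .xml extension, not well-formed XML. If structured output is wanted, one possible approach is sketched below. It assumes the layout shown in the question (questions start with `N.`, options are labelled `a)` to `d)` and may wrap onto the next line); the element names `paper`, `question`, `text` and `option` and the function `parse_questions` are hypothetical. Unlike a line-by-line copy, this sketch collapses white space inside options, and it would misread a wrapped line that happens to start with a number and a dot.

```python
import re
import xml.etree.ElementTree as ET

QUESTION = re.compile(r"^\s*(\d+)\.\s+(.*)$")    # e.g. "1. Text of question 1"
OPTION = re.compile(r"(?:^|\s)([a-d])\)\s+")     # e.g. "a) " at start or after a space


def parse_questions(text):
    """Build a <paper> tree from numbered questions with a) to d) options."""
    blocks = []  # (number, question text, list of option lines)
    for line in text.splitlines():
        match = QUESTION.match(line)
        if match:
            blocks.append((match.group(1), match.group(2).strip(), []))
        elif blocks and line.strip():
            # Anything after a question line belongs to its options.
            blocks[-1][2].append(line.strip())

    root = ET.Element("paper")
    for number, question_text, option_lines in blocks:
        question = ET.SubElement(root, "question", id=number)
        ET.SubElement(question, "text").text = question_text
        # Joining first lets an option that wraps onto the next line stay whole.
        pieces = OPTION.split(" ".join(option_lines))
        for label, value in zip(pieces[1::2], pieces[2::2]):
            ET.SubElement(question, "option", id=label).text = value.strip()
    return root


SAMPLE = """\
          Paper 1 / White Spaces are included
  Single Correct Answer Type

1. Text of question 1
  a) Option 1.a    b) Option 1.b
  c) Option 1.c    d) Option 1.d

2. Text of question 2
  a) This is an example of Option 2.a
  b) Option 2.b has a special char α
  c) Option 2.c
  d) Option 2.d

3. Text of question 3
  a) Option 3.a can span multiple
  lines.
  b) Option 3b
  c) Option 3c
  d) Option 3d
"""
root = parse_questions(SAMPLE)
```

The header lines before the first question are ignored in this sketch; they could be stored in their own element if they matter.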

Kunjesh Kaushik 9 years, 3 months ago

Those funny characters you are getting are the UTF-8 BOM (byte order mark). Try:

    import codecs
    # ...
    # 'utf-8-sig' decodes UTF-8 and drops a leading BOM if one is present
    infile = codecs.open('input.txt', encoding='utf-8-sig')
    # ...
    outfile = codecs.open(r'C:\temp\input1.xml', 'w', encoding='utf-8')

See the Codecs library [1] and Unicode HOWTO [2] for python.

[1] https://docs.python.org/2/library/codecs.html
[2] https://docs.python.org/2/howto/unicode.html
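
To make the BOM point concrete, here is a small sketch using the built-in `open()` of Python 3, which accepts an `encoding` argument. The file name and its contents are made up for the demonstration. Decoding with `utf-8` keeps the BOM as an invisible U+FEFF character at the start of the first line, while `utf-8-sig` removes it.

```python
import os
import tempfile

# Create a small file that starts with a UTF-8 BOM, as some editors write it.
path = os.path.join(tempfile.mkdtemp(), "input1.txt")
with open(path, "wb") as f:
    f.write(b"\xef\xbb\xbfPaper 1\n")

with open(path, encoding="utf-8") as f:
    with_bom = f.readline()      # the BOM survives as U+FEFF

with open(path, encoding="utf-8-sig") as f:
    without_bom = f.readline()   # the BOM is stripped on decoding
```

A leftover U+FEFF at the start of the first field is enough to make an otherwise valid name unusable as an XML tag, so reading with `utf-8-sig` is usually the safer choice when the source of the file is unknown.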

Created by Varsha Holla on Fri, 19 Dec 2014 (MIT)