
Unicode and rdf

From: Paul Prescod <p...@prescod.net>
Wed, 10 Mar 2004 11:24:45 -0800
Richard West wrote:
 > I'm trying to parse the rdf dumps from dmoz.org (Open Directory
 > Project) and am having great difficulty just getting Python to read
 > the files.  The files are RDF in UTF-8 encoding according to the
 > dmoz.org web site, but I get the following error:
 >
 > UnicodeDecodeError: 'utf8' codec can't decode bytes in position
 > 52376-52378: invalid data


Perhaps you could try another XML parser or validator that is independent
of Python. I am 90% confident that it will report the same problem. For
instance, you could use "xmlwf", which comes with Expat:

http://sourceforge.net/projects/expat/
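
To see which bytes the codec is rejecting, it may also help to scan the raw file for invalid UTF-8 sequences. The following is only a rough sketch in present-day Python 3, not something taken from the dmoz tooling. The helper name find_invalid_utf8 and its limit parameter are made up for illustration. It relies on the start and end attributes of UnicodeDecodeError to work out the offset of each bad run.

```python
def find_invalid_utf8(data: bytes, limit: int = 10):
    """Return (offset, offending_bytes) for up to `limit` undecodable runs.

    Hypothetical helper for illustration: it retries the decode after each
    failure so that several bad spots can be listed in one pass.
    """
    problems = []
    pos = 0
    while pos < len(data) and len(problems) < limit:
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data is valid UTF-8
        except UnicodeDecodeError as err:
            # err.start / err.end are relative to the slice we decoded
            start = pos + err.start
            end = pos + err.end
            problems.append((start, data[start:end]))
            pos = end  # resume just past the offending bytes
    return problems


if __name__ == "__main__":
    # Small in-memory sample: a stray 0xff and a truncated two-byte sequence.
    sample = b"ok \xff caf\xc3\xa9 \xc3("
    for offset, bad in find_invalid_utf8(sample):
        print(offset, bad)
```

For a real dump, the bytes would come from reading the file in binary mode, e.g. open(path, "rb").read(), where path is wherever the RDF file was saved. If the offsets it reports match the positions in the traceback, the problem is in the data and not in Python.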

  Paul Prescod
