Unicode and rdf

From: Paul Prescod <p...@prescod.net>
Wed, 10 Mar 2004 11:24:45 -0800
Richard West wrote:
 > I'm trying to parse the rdf dumps from dmoz.org (Open Directory
 > Project) and am having great difficulty just getting Python to read
 > the files.  The files are RDF in UTF-8 encoding according to the
 > dmoz.org web site, but I get the following error:
 >
 > UnicodeDecodeError: 'utf8' codec can't decode bytes in position
 > 52376-52378: invalid data

Perhaps you could try another XML parser or validator unrelated to
Python. I am 90% confident that it will report the same problem, which
would mean the file itself contains bytes that are not valid UTF-8.
For instance, you could use "xmlwf", which comes with Expat.
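
If you would rather inspect the offending bytes from Python itself, something
like the sketch below may help. It assumes Python 3 and uses only the standard
codecs module; the names find_invalid_utf8 and "record_invalid_utf8" are made
up for illustration, and the sample bytes are not taken from the dmoz dump.

```python
import codecs


def find_invalid_utf8(data, context=20):
    """Return (start, end, nearby_bytes) for each undecodable span in data.

    Hypothetical helper: it decodes `data` as UTF-8 once and records every
    position where the decoder reports an error, instead of stopping at the
    first one.
    """
    problems = []

    def record(err):
        # Only decoding errors are expected here; re-raise anything else.
        if not isinstance(err, UnicodeDecodeError):
            raise err
        nearby = data[max(0, err.start - context):err.end + context]
        problems.append((err.start, err.end, nearby))
        # Substitute U+FFFD and resume decoding after the bad span.
        return ("\ufffd", err.end)

    # "record_invalid_utf8" is an arbitrary handler name chosen for this sketch.
    codecs.register_error("record_invalid_utf8", record)
    data.decode("utf-8", errors="record_invalid_utf8")
    return problems


# Small in-memory demonstration: 0xE9 is Latin-1 "e acute", not valid UTF-8 here.
sample = b"caf\xe9 au lait"
for start, end, nearby in find_invalid_utf8(sample):
    print(f"invalid bytes at {start}-{end - 1}: {nearby!r}")
```

For a real file, read it in binary mode (open(path, "rb").read()) and pass the
bytes in. If the reported offsets match the ones in the traceback, the data is
the problem rather than the parser.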


  Paul Prescod
