Unicode and RDF

From: Paul Prescod <p...@prescod.net>
Wed, 10 Mar 2004 11:24:45 -0800
Richard West wrote:
 > I'm trying to parse the rdf dumps from dmoz.org (Open Directory
 > Project) and am having great difficulty just getting Python to read
 > the files.  The files are RDF in UTF-8 encoding according to the
 > dmoz.org web site, but I get the following error:
 >
 > UnicodeDecodeError: 'utf8' codec can't decode bytes in position
 > 52376-52378: invalid data

Perhaps you could try using another XML parser or validator unrelated to
Python. I am 90% confident that it will report the same problem, which
would mean the file itself is not valid UTF-8. For instance, you could
use "xmlwf", which comes with Expat.
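If the other tool does report the same problem, the next step is to look at
the offending bytes themselves. Below is a minimal sketch of that check. It
assumes Python 3 (newer than this thread) and a dump small enough to read
into memory in one go. `find_invalid_utf8` and `report` are hypothetical
helper names for illustration, not part of any library. It relies only on
the standard `UnicodeDecodeError` attributes `start` and `end`, which give
the byte range the codec rejected.

```python
import sys


def find_invalid_utf8(data: bytes):
    """Return (start, end, bad_bytes) for the first invalid UTF-8
    sequence in data, or None if the whole buffer decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start/exc.end are byte offsets into data; this is where
        # messages like "position 52376-52378" get their numbers.
        return exc.start, exc.end, data[exc.start:exc.end]
    return None


def report(path: str, context: int = 40) -> None:
    # Read raw bytes so nothing is decoded before we inspect it.
    with open(path, "rb") as f:
        data = f.read()
    hit = find_invalid_utf8(data)
    if hit is None:
        print(f"{path}: valid UTF-8")
        return
    start, end, bad = hit
    # Count newlines before the bad byte to get a 1-based line number.
    line = data.count(b"\n", 0, start) + 1
    print(f"{path}: invalid UTF-8 at byte {start} (line {line}): {bad!r}")
    # Show the surrounding bytes so you can see what was actually written.
    print(data[max(0, start - context):end + context])


if __name__ == "__main__":
    for p in sys.argv[1:]:
        report(p)
```

Pass the path of the dump on the command line. If the bytes it prints look
like Latin-1 (for example a lone 0xE9 for "é"), the file was not produced
as clean UTF-8. No Python setting will make a strict UTF-8 parser accept it.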


  Paul Prescod

