Richard West wrote:
> I'm trying to parse the rdf dumps from dmoz.org (Open Directory
> Project) and am having great difficulty just getting Python to read
> the files. The files are RDF in UTF-8 encoding according to the
> dmoz.org web site, but I get the following error:
>
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position
> 52376-52378: invalid data
Perhaps you could try another XML parser or validator unrelated to
Python. I am 90% confident that it will report the same problem, which
would point to invalid data in the dump itself rather than to Python.
For instance, you could use "xmlwf", which comes with Expat:
http://sourceforge.net/projects/expat/
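
If it helps to see exactly which bytes the codec is rejecting, something
along these lines might do it. This is only a rough sketch: the function
name find_invalid_utf8 and the context parameter are made up for
illustration, and it reads the whole file into memory, which may not be
practical for a dump that size (reading in chunks would be the obvious
refinement).

```python
import sys


def find_invalid_utf8(data, context=20):
    """Return (start, end, nearby_bytes) for the first invalid UTF-8
    sequence in `data`, or None if everything decodes cleanly.

    Hypothetical helper for illustration only.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start and err.end are byte offsets of the rejected sequence.
        lo = max(0, err.start - context)
        return err.start, err.end, data[lo:err.end + context]
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Open in binary mode so that no decoding happens while reading.
    with open(sys.argv[1], "rb") as f:
        result = find_invalid_utf8(f.read())
    if result is None:
        print("no invalid UTF-8 found")
    else:
        start, end, nearby = result
        print("invalid bytes at offsets %d-%d" % (start, end - 1))
        print(repr(nearby))
```

Printing the repr of the surrounding bytes should make it easier to tell
whether the data is really UTF-8 or was written in some other encoding.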
Paul Prescod