Popular recipes by Rob Speer
http://code.activestate.com/recipes/users/4183271/
2016-09-28T02:33:32-07:00
ActiveState Code Recipes
Repair common Unicode mistakes after they've been made (obsoleted by ftfy package) (Python)
2016-09-28T02:33:32-07:00
Rob Speer
http://code.activestate.com/recipes/578243-repair-common-unicode-mistakes-after-theyve-been-m/
<p style="color: grey">
Python
recipe 578243
by <a href="/recipes/users/4183271/">Rob Speer</a>
(<a href="/recipes/tags/gremlins/">gremlins</a>, <a href="/recipes/tags/mojibake/">mojibake</a>, <a href="/recipes/tags/obsolete/">obsolete</a>, <a href="/recipes/tags/unicode/">unicode</a>, <a href="/recipes/tags/utf8/">utf8</a>).
Revision 4.
</p>
<p>Something you will find all over the place in real-world text is text that was encoded as UTF-8, decoded in some ugly format like Latin-1 or even Windows codepage 1252, and then mistakenly encoded as UTF-8 again.</p>
<p>This causes your perfectly good Unicode-aware code to end up with garbage text because someone else (or maybe "someone else") made a mistake.</p>
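<p>To make the failure concrete, here is a minimal sketch of how the garbage arises. The helper name <code>mangle</code> and its <code>wrong_encoding</code> parameter are made up for illustration and are not part of the recipe.</p>

```python
def mangle(text, wrong_encoding="latin-1"):
    """Simulate the mistake: UTF-8 bytes misread as a single-byte encoding.

    Hypothetical helper for illustration only.
    """
    utf8_bytes = text.encode("utf-8")          # correct UTF-8 bytes
    return utf8_bytes.decode(wrong_encoding)   # wrongly decoded, one char per byte


garbled = mangle("café")
# Each byte of the two-byte UTF-8 sequence for "é" has become its own character.
print(repr(garbled))

# Encoding the garbled text as UTF-8 again bakes the mistake into the data.
print(garbled.encode("utf-8"))
```

<p>Passing <code>"cp1252"</code> instead of <code>"latin-1"</code> behaves the same way for this input, but Python's cp1252 codec leaves a few byte values undefined, so some strings would raise an error there rather than produce garbage.</p>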
<p>This function looks for the evidence of that having happened and fixes it. It determines whether it should replace nonsense sequences of single-byte characters that were really meant to be UTF-8 characters, and if so, turns them into the Unicode characters that they were meant to represent.</p>
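<p>The core idea can be sketched in a few lines. This is not the recipe's code: <code>repair_mojibake</code> is a hypothetical name, and it assumes the only evidence needed is that the text round-trips cleanly (encodable in the wrong single-byte encoding, and the resulting bytes are valid UTF-8). A careful implementation would weigh the evidence more cautiously.</p>

```python
def repair_mojibake(text, wrong_encoding="latin-1"):
    """Undo one round of UTF-8 text having been decoded as `wrong_encoding`.

    Hypothetical sketch: returns the input unchanged when there is no
    evidence that the mistake happened.
    """
    if text.isascii():
        # Pure ASCII is identical in all the encodings involved; nothing to fix.
        return text
    try:
        # Recover the bytes the text was presumably decoded from...
        raw = text.encode(wrong_encoding)
        # ...and decode them the way they were meant to be decoded.
        return raw.decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text contains characters outside the single-byte
        # encoding, or the bytes are not valid UTF-8: leave it alone.
        return text


print(repair_mojibake("caf\u00c3\u00a9"))  # garbled input
print(repair_mojibake("café"))             # already correct, left unchanged
```

<p>The weak point of this sketch is false positives: a string that genuinely contains a sequence such as "Ã©" would be rewritten too. That is why deciding <em>whether</em> to replace matters as much as the replacement itself.</p>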