Popular recipes tagged "mojibake" (ActiveState Code Recipes)
http://code.activestate.com/recipes/tags/mojibake/popular/

Repair common Unicode mistakes after they've been made (obsoleted by ftfy package) (Python)
http://code.activestate.com/recipes/578243-repair-common-unicode-mistakes-after-theyve-been-m/
2016-09-28T02:33:32-07:00

<p style="color: grey"> Python recipe 578243 by <a href="/recipes/users/4183271/">Rob Speer</a> (<a href="/recipes/tags/gremlins/">gremlins</a>, <a href="/recipes/tags/mojibake/">mojibake</a>, <a href="/recipes/tags/obsolete/">obsolete</a>, <a href="/recipes/tags/unicode/">unicode</a>, <a href="/recipes/tags/utf8/">utf8</a>). Revision 4. </p>
<p>Real-world text is full of strings that were encoded as UTF-8, decoded as some ugly single-byte encoding like Latin-1 or even Windows codepage 1252, and then mistakenly encoded as UTF-8 again.</p>
<p>This leaves your perfectly good Unicode-aware code holding garbage text because someone else (or maybe "someone else") made a mistake.</p>
<p>This function looks for evidence that this has happened and fixes it. It decides whether nonsense sequences of single-byte characters were really meant to be UTF-8 byte sequences and, if so, replaces them with the Unicode characters they were meant to represent.</p>
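<p>The round trip described above can be illustrated with a minimal sketch. This is not the recipe's own code: the function name <code>fix_mojibake_sketch</code> is hypothetical, and the decision rule (accept a repair only when the text re-encodes cleanly as Latin-1 or Windows-1252 and the resulting bytes decode strictly as UTF-8) is an assumption that is much cruder than whatever heuristics the recipe or the ftfy package apply.</p>

```python
def fix_mojibake_sketch(text: str) -> str:
    """Hypothetical helper: undo one round of UTF-8 misread as a single-byte encoding."""
    # Pure ASCII looks the same in all three encodings, so nothing can be damaged.
    if text.isascii():
        return text
    # Assumption: the wrong decoder was Latin-1 or Windows-1252.
    for codec in ("latin-1", "cp1252"):
        try:
            # Recover the bytes the wrong decoder saw, then decode them as UTF-8.
            return text.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # Not representable in this codec, or the bytes are not valid UTF-8.
            continue
    # No consistent repair was found; leave the text untouched.
    return text


# Reproduce the mistake: UTF-8 bytes decoded with the wrong codec.
original = "caf\u00e9"                                   # "café"
damaged = original.encode("utf-8").decode("latin-1")     # "cafÃ©"
repaired = fix_mojibake_sketch(damaged)
```

<p>Correctly decoded non-ASCII text usually fails the strict UTF-8 step and is returned unchanged, but a string that legitimately contains a sequence such as "Ã©" would be altered by this sketch, which is why a real fixer needs a more careful test.</p>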