Top-rated recipes tagged "unicode"
http://code.activestate.com/recipes/tags/unicode/top/
2016-09-28T02:33:32-07:00
ActiveState Code Recipes
Repair common Unicode mistakes after they've been made (obsoleted by ftfy package) (Python)
2016-09-28T02:33:32-07:00 Rob Speer http://code.activestate.com/recipes/users/4183271/ http://code.activestate.com/recipes/578243-repair-common-unicode-mistakes-after-theyve-been-m/
<p style="color: grey">
Python
recipe 578243
by <a href="/recipes/users/4183271/">Rob Speer</a>
(<a href="/recipes/tags/gremlins/">gremlins</a>, <a href="/recipes/tags/mojibake/">mojibake</a>, <a href="/recipes/tags/obsolete/">obsolete</a>, <a href="/recipes/tags/unicode/">unicode</a>, <a href="/recipes/tags/utf8/">utf8</a>).
Revision 4.
</p>
<p>Something you will find all over the place in real-world text is text that was encoded as UTF-8, decoded as a single-byte encoding such as Latin-1 or even Windows code page 1252, and then mistakenly encoded as UTF-8 again.</p>
<p>This causes your perfectly good Unicode-aware code to end up with garbage text because someone else (or maybe "someone else") made a mistake.</p>
<p>This function looks for the evidence of that having happened and fixes it. It determines whether it should replace nonsense sequences of single-byte characters that were really meant to be UTF-8 characters, and if so, turns them into the correctly-encoded Unicode character that they were meant to represent.</p>
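<p>A minimal sketch of the underlying idea, not the recipe's actual code: re-encode the text with the single-byte encoding that was wrongly used, then decode the resulting bytes as UTF-8. The function name <code>fix_mojibake</code> and its <code>wrong_encoding</code> parameter are hypothetical, and the sketch undoes only one round of mis-decoding.</p>

```python
def fix_mojibake(text, wrong_encoding="cp1252"):
    """Undo one round of UTF-8 bytes having been decoded as a single-byte encoding.

    Hypothetical helper for illustration; it is not the recipe's implementation.
    """
    try:
        # Recover the original bytes, then decode them as the UTF-8 they really were.
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text does not look like mis-decoded UTF-8, so leave it unchanged.
        return text


print(fix_mojibake("cafÃ©"))  # café
```

<p>Text that round-trips cleanly by coincidence would be altered too, which is why a real tool needs the kind of evidence check the description mentions.</p>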
Rename non-ASCII filenames to readable ASCII, i.e. replace accented characters, etc (Python)
2011-11-03T17:50:55-07:00 ccpizza http://code.activestate.com/recipes/users/4170754/ http://code.activestate.com/recipes/577226-rename-non-ascii-filenames-to-readable-ascii-ie-re/
<p style="color: grey">
Python
recipe 577226
by <a href="/recipes/users/4170754/">ccpizza</a>
(<a href="/recipes/tags/conversion/">conversion</a>, <a href="/recipes/tags/renaming/">renaming</a>, <a href="/recipes/tags/unicode/">unicode</a>).
Revision 3.
</p>
<p>The script converts any accented characters in filenames to their ASCII equivalents.</p>
<p>Example:</p>
<pre class="prettyprint"><code>â > a
ä > a
à > a
á > a
é > e
í > i
ó > o
ú > u
ñ > n
ü > u
...
</code></pre>
<p>Before-and-after example:</p>
<pre class="prettyprint"><code>01_Antonín_Dvořák_Allegro.mp3 >>> 01_Antonin_Dvorak_Allegro.mp3
</code></pre>
<p>Usage:</p>
<pre class="prettyprint"><code>Running the script without arguments will rename all files in the current folder.
!!!WARNING!!! ***No*** backups are created.
</code></pre>
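<p>One common way to get this kind of mapping, shown here as a sketch rather than the script's own method, is to decompose each character with <code>unicodedata.normalize</code> and drop whatever is not ASCII. The helper name <code>to_ascii</code> is an assumption, and the sketch only converts a string; it does not rename anything on disk.</p>

```python
import unicodedata


def to_ascii(name):
    """Return `name` with accented letters reduced to their ASCII base letters.

    Hypothetical helper; characters with no ASCII decomposition are dropped.
    """
    # NFKD splits "ř" into "r" plus a combining caron, and so on.
    decomposed = unicodedata.normalize("NFKD", name)
    # Encoding to ASCII with "ignore" discards the combining marks.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("01_Antonín_Dvořák_Allegro.mp3"))  # 01_Antonin_Dvorak_Allegro.mp3
```

<p>Letters such as "ø" or "ß" have no decomposition and would simply disappear with this approach, so a lookup table is needed if they matter.</p>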
No more UnicodeDecodeErrors when printing (Python)
2013-12-17T23:02:37-08:00 Ádám Szieberth http://code.activestate.com/recipes/users/4188745/ http://code.activestate.com/recipes/578788-no-more-unicodedecodeerrors-when-printing/
<p style="color: grey">
Python
recipe 578788
by <a href="/recipes/users/4188745/">Ádám Szieberth</a>
(<a href="/recipes/tags/decode/">decode</a>, <a href="/recipes/tags/error/">error</a>, <a href="/recipes/tags/print/">print</a>, <a href="/recipes/tags/terminal/">terminal</a>, <a href="/recipes/tags/unicode/">unicode</a>).
Revision 2.
</p>
<p>I hate getting UnicodeDecodeErrors when I use print() for bugtracing or for some other reason. I decided to make a module which spares me the headache.</p>
Remove UTF-8 Byte Order Mark (BOM) from text files (Bash)
2011-10-18T06:37:52-07:00 Graham Poulter http://code.activestate.com/recipes/users/4172291/ http://code.activestate.com/recipes/577912-remove-utf-8-byte-order-mark-bom-from-text-files/
<p style="color: grey">
Bash
recipe 577912
by <a href="/recipes/users/4172291/">Graham Poulter</a>
(<a href="/recipes/tags/text/">text</a>, <a href="/recipes/tags/unicode/">unicode</a>, <a href="/recipes/tags/utf8/">utf8</a>).
</p>
<p>Shell script that removes the UTF-8 Byte Order Mark (BOM) from text files, where present.</p>
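<p>The recipe itself is a Bash script. As a rough illustration of the same check in Python, the sketch below strips the three BOM bytes only when they are present. The function name <code>strip_utf8_bom</code> is hypothetical.</p>

```python
import codecs


def strip_utf8_bom(data):
    """Return `data` (bytes) without a leading UTF-8 BOM, if there is one."""
    # codecs.BOM_UTF8 is the byte sequence EF BB BF.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


print(strip_utf8_bom(b"\xef\xbb\xbfhello"))  # b'hello'
```

<p>When the goal is only to read such a file as text, decoding with the <code>utf-8-sig</code> codec skips the BOM as well.</p>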
Correctly reading CSV files in arbitrary encodings (Python)
2011-07-25T07:29:08-07:00 Devin Jeanpierre http://code.activestate.com/recipes/users/4178508/ http://code.activestate.com/recipes/577778-correctly-reading-csv-files-in-arbitrary-encodings/
<p style="color: grey">
Python
recipe 577778
by <a href="/recipes/users/4178508/">Devin Jeanpierre</a>
(<a href="/recipes/tags/csv/">csv</a>, <a href="/recipes/tags/unicode/">unicode</a>).
Revision 4.
</p>
<p>Recipe for using unicode files (i.e. files opened with <code>codecs.open</code>) with the csv module.</p>
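<p>The recipe targets <code>codecs.open</code>. For comparison, here is a sketch of how the same goal is usually reached in Python 3, where the csv module accepts decoded text directly. The helper name <code>read_csv</code> and the sample data are assumptions made for illustration.</p>

```python
import csv
import io


def read_csv(binary_file, encoding):
    """Decode a binary file object with `encoding` and return its CSV rows."""
    # newline="" lets the csv module handle line endings inside quoted fields.
    text = io.TextIOWrapper(binary_file, encoding=encoding, newline="")
    return list(csv.reader(text))


sample = io.BytesIO("name,city\nZoë,Kraków\n".encode("utf-16"))
print(read_csv(sample, "utf-16"))  # [['name', 'city'], ['Zoë', 'Kraków']]
```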
Safe unicode representation (Python)
2011-03-17T23:28:40-07:00 Sridhar Ratnakumar http://code.activestate.com/recipes/users/4169511/ http://code.activestate.com/recipes/577614-safe-unicode-representation/
<p style="color: grey">
Python
recipe 577614
by <a href="/recipes/users/4169511/">Sridhar Ratnakumar</a>
(<a href="/recipes/tags/unicode/">unicode</a>).
</p>
<p>Safely convert any given string type (text or binary) to unicode. You won't get a UnicodeDecodeError, at the cost of ignoring undecodable bytes during conversion, which makes it useful for debugging and logging. This recipe requires the <a href="http://code.activestate.com/pypm/six/">six</a> package.</p>
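<p>A Python 3 sketch of the same trade-off, written without six and not taken from the recipe: bytes are decoded with <code>errors="ignore"</code>, so bad bytes vanish instead of raising. The name <code>safe_text</code> and the UTF-8 default are assumptions.</p>

```python
def safe_text(value, encoding="utf-8"):
    """Return `value` as str without ever raising UnicodeDecodeError.

    Hypothetical helper; undecodable bytes are silently dropped.
    """
    if isinstance(value, bytes):
        return value.decode(encoding, errors="ignore")
    # Text and any other object go through str() unchanged.
    return str(value)


print(safe_text(b"caf\xc3\xa9"))  # café
print(safe_text(b"caf\xe9!"))     # caf!
```

<p>Because data is lost silently, this suits log messages and debug output rather than anything that is stored or processed further.</p>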
Simple reverse converter of unicode code points string (Python)
2009-09-22T20:02:32-07:00 Ryan http://code.activestate.com/recipes/users/4171767/ http://code.activestate.com/recipes/576909-simple-reverse-converter-of-unicode-code-points-st/
<p style="color: grey">
Python
recipe 576909
by <a href="/recipes/users/4171767/">Ryan</a>
(<a href="/recipes/tags/code/">code</a>, <a href="/recipes/tags/points/">points</a>, <a href="/recipes/tags/prefix/">prefix</a>, <a href="/recipes/tags/reverse/">reverse</a>, <a href="/recipes/tags/str/">str</a>, <a href="/recipes/tags/string/">string</a>, <a href="/recipes/tags/u/">u</a>, <a href="/recipes/tags/unicode/">unicode</a>).
Revision 4.
</p>
<p>This is a simple recipe for converting a str string made up purely of Unicode code point escapes (e.g. string = <strong>"\u5982\u679c\u7231"</strong>) into a unicode string.
The result is the same as writing the literal with the <strong>'u'</strong> prefix. The difference is that the prefix only works on a constant, while this method also accepts a variable that holds the code point string.</p>
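<p>In Python 3 the same conversion can be sketched with the built-in <code>unicode_escape</code> codec. This is an illustration, not the recipe's code; the name <code>decode_code_points</code> is hypothetical, and the sketch assumes the input contains only ASCII characters.</p>

```python
def decode_code_points(escaped):
    """Turn a string of literal \\uXXXX escapes into the characters they name."""
    # encode("ascii") raises UnicodeEncodeError if the input is not pure ASCII.
    return escaped.encode("ascii").decode("unicode_escape")


code_points = r"\u5982\u679c\u7231"  # a variable, not a literal with escapes applied
print(decode_code_points(code_points))  # 如果爱
```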
safe print (Python)
2009-01-02T15:40:30-08:00 Trent Mick http://code.activestate.com/recipes/users/4173505/ http://code.activestate.com/recipes/576602-safe-print/
<p style="color: grey">
Python
recipe 576602
by <a href="/recipes/users/4173505/">Trent Mick</a>
(<a href="/recipes/tags/encode/">encode</a>, <a href="/recipes/tags/safe/">safe</a>, <a href="/recipes/tags/unicode/">unicode</a>).
</p>
<p>A replacement for "print" that will safely handle unicode conversion.</p>
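<p>A sketch of what such a replacement could look like in Python 3, under the assumption that "safe" means never raising on characters the output stream cannot encode. The <code>safe_print</code> signature below is hypothetical and is not taken from the recipe.</p>

```python
import io
import sys


def safe_print(text, stream=None):
    """Write `text` plus a newline, escaping characters the stream cannot encode."""
    if stream is None:
        stream = sys.stdout
    # Fall back to ASCII when the stream does not report an encoding.
    encoding = getattr(stream, "encoding", None) or "ascii"
    # Unencodable characters become backslash escapes instead of raising.
    printable = text.encode(encoding, "backslashreplace").decode(encoding)
    stream.write(printable + "\n")


buffer = io.StringIO()  # reports no encoding, so the ASCII fallback is used
safe_print("caf\u00e9", stream=buffer)
```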
Decorator for writing polymorphic functions (Python)
2010-09-21T06:08:48-07:00 Baptiste Carvello http://code.activestate.com/recipes/users/4175002/ http://code.activestate.com/recipes/577393-decorator-for-writing-polymorphic-functions/
<p style="color: grey">
Python
recipe 577393
by <a href="/recipes/users/4175002/">Baptiste Carvello</a>
(<a href="/recipes/tags/polymorphism/">polymorphism</a>, <a href="/recipes/tags/unicode/">unicode</a>).
Revision 2.
</p>
<p>Python 3 makes a clean separation between unicode text strings (str) and byte
strings (bytes). However, for some tasks (notably networking), it makes sense
to apply the same process to str and bytes, usually relying on the byte string
being encoded with an ASCII-compatible encoding.</p>
<p>In this context, a polymorphic function is one which will operate on unicode
strings (str) or bytes objects (bytes) depending on the type of the arguments.
The common difficulty is that string constants used in the function also have
to be of the right type. This decorator helps by letting you use a different
set of constants depending on the type of the argument.</p>
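<p>The idea can be sketched as follows. This is not the recipe's decorator: the name <code>polymorphic</code>, the way constants are passed in as a dictionary, and the example function <code>base_name</code> are all assumptions made for illustration.</p>

```python
import functools


def polymorphic(**constants):
    """Pass the decorated function constants matching its argument's type.

    Hypothetical decorator: constants are declared as str and converted to
    bytes when the first argument is a bytes object.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(arg, *rest):
            if isinstance(arg, bytes):
                # Assumes an ASCII-compatible encoding, as described above.
                typed = {k: v.encode("ascii") for k, v in constants.items()}
            else:
                typed = constants
            return func(typed, arg, *rest)
        return wrapper
    return decorator


@polymorphic(sep="/", dot=".")
def base_name(c, path):
    """Return the last path component without its extension."""
    name = path.rsplit(c["sep"], 1)[-1]
    return name.split(c["dot"], 1)[0]


print(base_name("a/b/file.txt"))   # file
print(base_name(b"a/b/file.txt"))  # b'file'
```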