Top-rated recipes tagged "unicode" http://code.activestate.com/recipes/tags/unicode/top/ 2016-09-28T02:33:32-07:00 ActiveState Code Recipes
Repair common Unicode mistakes after they've been made (obsoleted by ftfy package) (Python) 2016-09-28T02:33:32-07:00 Rob Speer http://code.activestate.com/recipes/users/4183271/ http://code.activestate.com/recipes/578243-repair-common-unicode-mistakes-after-theyve-been-m/ <p style="color: grey"> Python recipe 578243 by <a href="/recipes/users/4183271/">Rob Speer</a> (<a href="/recipes/tags/gremlins/">gremlins</a>, <a href="/recipes/tags/mojibake/">mojibake</a>, <a href="/recipes/tags/obsolete/">obsolete</a>, <a href="/recipes/tags/unicode/">unicode</a>, <a href="/recipes/tags/utf8/">utf8</a>). Revision 4. </p> <p>Something you will find all over the place, in real-world text, is text that's encoded as UTF-8, decoded in some ugly format like Latin-1 or even Windows codepage 1252, and mistakenly encoded as UTF-8 again.</p> <p>This causes your perfectly good Unicode-aware code to end up with garbage text because someone else (or maybe "someone else") made a mistake.</p> <p>This function looks for the evidence of that having happened and fixes it. It determines whether it should replace nonsense sequences of single-byte characters that were really meant to be UTF-8 characters, and if so, turns them into the correctly-encoded Unicode character that they were meant to represent.</p>
Rename non-ASCII filenames to readable ASCII, i.e. replace accented characters, etc (Python) 2011-11-03T17:50:55-07:00 ccpizza http://code.activestate.com/recipes/users/4170754/ http://code.activestate.com/recipes/577226-rename-non-ascii-filenames-to-readable-ascii-ie-re/ <p style="color: grey"> Python recipe 577226 by <a href="/recipes/users/4170754/">ccpizza</a> (<a href="/recipes/tags/conversion/">conversion</a>, <a href="/recipes/tags/renaming/">renaming</a>, <a href="/recipes/tags/unicode/">unicode</a>). Revision 3. 
</p> <p>The script converts any accented characters in filenames to their ASCII equivalents.</p> <p>Example:</p> <pre class="prettyprint"><code>â &gt; a
ä &gt; a
à &gt; a
á &gt; a
é &gt; e
í &gt; i
ó &gt; o
ú &gt; u
ñ &gt; n
ü &gt; u
...
</code></pre> <p>Before-and-after example:</p> <pre class="prettyprint"><code>01_Antonín_Dvořák_Allegro.mp3 &gt;&gt;&gt; 01_Antonin_Dvorak_Allegro.mp3
</code></pre> <p>Usage:</p> <pre class="prettyprint"><code>Running the script without arguments will rename all files in the current folder.
!!!WARNING!!! ***No*** backups are created.
</code></pre>
No more UnicodeDecodeErrors when printing (Python) 2013-12-17T23:02:37-08:00 Ádám Szieberth http://code.activestate.com/recipes/users/4188745/ http://code.activestate.com/recipes/578788-no-more-unicodedecodeerrors-when-printing/ <p style="color: grey"> Python recipe 578788 by <a href="/recipes/users/4188745/">Ádám Szieberth</a> (<a href="/recipes/tags/decode/">decode</a>, <a href="/recipes/tags/error/">error</a>, <a href="/recipes/tags/print/">print</a>, <a href="/recipes/tags/terminal/">terminal</a>, <a href="/recipes/tags/unicode/">unicode</a>). Revision 2. </p> <p>I hate getting UnicodeDecodeErrors when I use print() for bug tracing or for some other reason. I decided to make a module which spares me the headache.</p>
Remove UTF-8 Byte Order Mark (BOM) from text files (Bash) 2011-10-18T06:37:52-07:00 Graham Poulter http://code.activestate.com/recipes/users/4172291/ http://code.activestate.com/recipes/577912-remove-utf-8-byte-order-mark-bom-from-text-files/ <p style="color: grey"> Bash recipe 577912 by <a href="/recipes/users/4172291/">Graham Poulter</a> (<a href="/recipes/tags/text/">text</a>, <a href="/recipes/tags/unicode/">unicode</a>, <a href="/recipes/tags/utf8/">utf8</a>). 
</p> <p>Shell script that removes the UTF-8 Byte Order Mark (BOM) from text files, where present.</p>
Correctly reading CSV files in arbitrary encodings (Python) 2011-07-25T07:29:08-07:00 Devin Jeanpierre http://code.activestate.com/recipes/users/4178508/ http://code.activestate.com/recipes/577778-correctly-reading-csv-files-in-arbitrary-encodings/ <p style="color: grey"> Python recipe 577778 by <a href="/recipes/users/4178508/">Devin Jeanpierre</a> (<a href="/recipes/tags/csv/">csv</a>, <a href="/recipes/tags/unicode/">unicode</a>). Revision 4. </p> <p>Recipe for using unicode files (i.e. files opened with <code>codecs.open</code>) with the csv module.</p>
Safe unicode representation (Python) 2011-03-17T23:28:40-07:00 Sridhar Ratnakumar http://code.activestate.com/recipes/users/4169511/ http://code.activestate.com/recipes/577614-safe-unicode-representation/ <p style="color: grey"> Python recipe 577614 by <a href="/recipes/users/4169511/">Sridhar Ratnakumar</a> (<a href="/recipes/tags/unicode/">unicode</a>). </p> <p>Safely convert any given string type (text or binary) to unicode. You won't get a UnicodeDecodeError, at the cost of ignoring such errors during conversion; this is useful for debugging and logging. This recipe requires the <a href="http://code.activestate.com/pypm/six/">six</a> package.</p>
Simple reverse converter of unicode code points string (Python) 2009-09-22T20:02:32-07:00 Ryan http://code.activestate.com/recipes/users/4171767/ http://code.activestate.com/recipes/576909-simple-reverse-converter-of-unicode-code-points-st/ <p style="color: grey"> Python recipe 576909 by <a href="/recipes/users/4171767/">Ryan</a> (<a href="/recipes/tags/code/">code</a>, <a href="/recipes/tags/points/">points</a>, <a href="/recipes/tags/prefix/">prefix</a>, <a href="/recipes/tags/reverse/">reverse</a>, <a href="/recipes/tags/str/">str</a>, <a href="/recipes/tags/string/">string</a>, <a href="/recipes/tags/u/">u</a>, <a href="/recipes/tags/unicode/">unicode</a>). Revision 4. 
</p> <p>It's a simple recipe to convert a str type string containing only unicode code point escapes (e.g. string = <strong>"\u5982\u679c\u7231"</strong>) to a unicode type string. This method has the same effect as the <strong>'u'</strong> prefix, but unlike the prefix it also works on a variable holding a code point string, not only on a constant.</p>
safe print (Python) 2009-01-02T15:40:30-08:00 Trent Mick http://code.activestate.com/recipes/users/4173505/ http://code.activestate.com/recipes/576602-safe-print/ <p style="color: grey"> Python recipe 576602 by <a href="/recipes/users/4173505/">Trent Mick</a> (<a href="/recipes/tags/encode/">encode</a>, <a href="/recipes/tags/safe/">safe</a>, <a href="/recipes/tags/unicode/">unicode</a>). </p> <p>A replacement for "print" that will safely handle unicode conversion.</p>
Decorator for writing polymorphic functions (Python) 2010-09-21T06:08:48-07:00 Baptiste Carvello http://code.activestate.com/recipes/users/4175002/ http://code.activestate.com/recipes/577393-decorator-for-writing-polymorphic-functions/ <p style="color: grey"> Python recipe 577393 by <a href="/recipes/users/4175002/">Baptiste Carvello</a> (<a href="/recipes/tags/polymorphism/">polymorphism</a>, <a href="/recipes/tags/unicode/">unicode</a>). Revision 2. </p> <p>Python 3 makes a clean separation between unicode text strings (str) and byte strings (bytes). However, for some tasks (notably networking), it makes sense to apply the same process to str and bytes, usually relying on the byte string being encoded with an ASCII-compatible encoding.</p> <p>In this context, a polymorphic function is one which will operate on unicode strings (str) or bytes objects (bytes) depending on the type of the arguments. The common difficulty is that string constants used in the function also have to be of the right type. This decorator helps by letting the function use a different set of constants depending on the type of the argument.</p>
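<p>The idea of type-dependent constants described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code of recipe 577393: the names <code>polymorphic</code>, <code>split_host_port</code> and the namespace parameter <code>c</code> are hypothetical, and the sketch assumes every constant is pure ASCII and that the first positional argument decides the type.</p>

```python
import functools
from types import SimpleNamespace


def polymorphic(**constants):
    """Hypothetical decorator: hand the wrapped function str or bytes constants.

    Constants are declared once as ASCII-only str values. On each call, the
    type of the first positional argument selects which variant is passed in.
    """
    as_text = SimpleNamespace(**constants)
    # Assumption: all constants are pure ASCII, so encoding them cannot fail.
    as_bytes = SimpleNamespace(
        **{name: value.encode("ascii") for name, value in constants.items()}
    )

    def decorate(func):
        @functools.wraps(func)
        def wrapper(data, *args, **kwargs):
            if isinstance(data, str):
                consts = as_text
            elif isinstance(data, bytes):
                consts = as_bytes
            else:
                raise TypeError(
                    "expected str or bytes, got " + type(data).__name__
                )
            # The constants namespace is injected as the first argument.
            return func(consts, data, *args, **kwargs)

        return wrapper

    return decorate


@polymorphic(sep=":", default_port="80")
def split_host_port(c, netloc):
    """Split 'host:port' into (host, port) for either str or bytes input."""
    host, found, port = netloc.partition(c.sep)
    return host, (port if found else c.default_port)


if __name__ == "__main__":
    print(split_host_port("example.org:8080"))
    print(split_host_port(b"example.org"))
```

<p>Passing the constants as an explicit namespace keeps the function body identical for both types; a real implementation might instead choose the variant from several arguments or reject mixed str and bytes input.</p>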