Safe unicode representation (Python recipe) by Sridhar Ratnakumar
ActiveState Code (http://code.activestate.com/recipes/577614/)

Safely convert any given string type (text or binary) to unicode. The conversion never raises UnicodeDecodeError; the cost is that undecodable bytes are silently dropped, which is an acceptable trade-off for debugging and logging. This recipe requires the six package.

import six


def safe_unicode(obj):
    """Return the unicode/text representation of `obj` without raising UnicodeDecodeError.

    The returned value is only a *representation*, not necessarily identical to `obj`.
    """
    # Anything that is neither text nor bytes is first converted to text.
    if type(obj) not in (six.text_type, six.binary_type):
        obj = six.text_type(obj)
    if type(obj) is six.text_type:
        return obj
    else:
        # Decode with the default codec, silently dropping undecodable bytes.
        return obj.decode(errors='ignore')
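
On Python 3 alone, six.text_type is str and six.binary_type is bytes, so the same idea can be sketched without the six dependency. The name safe_text and the logging call below are illustrative assumptions, not part of the recipe, and the sketch relies on bytes.decode() defaulting to UTF-8.

```python
import logging


def safe_text(obj):
    """Python 3 sketch of the recipe: return text for `obj`, never raising UnicodeDecodeError."""
    # Objects that are neither str nor bytes are converted with str() first.
    if type(obj) not in (str, bytes):
        obj = str(obj)
    if type(obj) is str:
        return obj
    # bytes.decode() defaults to UTF-8; errors='ignore' drops undecodable bytes.
    return obj.decode(errors='ignore')


# Hypothetical usage: log a payload of unknown origin without risking a decode error.
payload = b'\x89PNG header'
logging.getLogger(__name__).warning("received: %s", safe_text(payload))
```

Because undecodable bytes are dropped rather than reported, the logged text may be shorter than the input; that is the trade-off the recipe description mentions.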

      

Here's the test case:

def test_safe_unicode():
    from applib.misc import safe_unicode
    import six

    abc_bytes = b'ab\nc'
    abc_text = 'ab\nc'

    assert safe_unicode(abc_bytes) == abc_text
    assert safe_unicode(abc_text) == abc_text

    foo_text = 'abc'  # note: \x89 is ignored.
    foo_bytes = b'\x89abc'

    assert safe_unicode(foo_text) == foo_text
    assert safe_unicode(foo_bytes) == foo_text
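
The test shows that the invalid byte \x89 simply disappears. As a hedged comparison, assuming Python 3.5 or later and an explicit UTF-8 codec, the standard error handlers differ in how much of the bad input they keep visible. The variable names are illustrative only.

```python
raw = b'\x89abc'  # same bytes as foo_bytes in the test above

# The recipe's behaviour: the undecodable byte is dropped without a trace.
dropped = raw.decode('utf-8', errors='ignore')

# U+FFFD (the replacement character) marks where the bad byte was.
marked = raw.decode('utf-8', errors='replace')

# The byte value stays readable as a backslash escape (decoding support since Python 3.5).
escaped = raw.decode('utf-8', errors='backslashreplace')

print(repr(dropped))
print(repr(marked))
print(repr(escaped))
```

For logging, 'replace' or 'backslashreplace' may be preferable when it matters that something was lost; 'ignore' gives the cleanest output.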

Tags: unicode

2 comments

ccpizza 13 years, 1 month ago

If you only care about representation, and not about decoding, then why not simply use my_nasty_unicode_string.encode('us-ascii','xmlcharrefreplace')? This will not generate exceptions in broken consoles like the ones on win32, and has the advantage that it will print properly on an HTML page.
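
A minimal sketch of the suggestion in this comment, assuming Python 3 and a text (str) input; the sample string is made up for illustration.

```python
# "café" followed by a snowman, written with escapes so the source stays ASCII-only.
text = 'caf\u00e9 \u2603'

# Characters outside ASCII become numeric XML character references.
ascii_bytes = text.encode('us-ascii', 'xmlcharrefreplace')

print(ascii_bytes)
```

Note that on Python 3 the result is a bytes object, and that bytes input has no encode() method, so this approach covers text input only; binary input would still need a decoding step such as the one in the recipe.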

Mike Ivanov 13 years ago

Refer to the object directly -- the more common method. Most of the time, you'll have some object in the template's context you want to attach the comment to; you can simply use that object.

For example, in a blog entry page that has a variable named entry, you could use the following to load the number of comments:

EMAIL_HOST = "mail.activestate.com"
EMAIL_SUBJECT_PREFIX = "[code3.as.com] "
SEND_BROKEN_LINK_EMAILS = False

NOTIFY_ON_NEW_COMMENTS = True
NOTIFICATION_FROM_EMAIL = "ActiveState Code Comment <no-reply@activestate.com>"

#---- Load local settings, if any.
# The first path that exists wins.
import sys
from os.path import dirname, exists, expanduser

_local_settings_paths = [
    expanduser("~/etc/sites/code3.activestate.com/local_settings.py"),
    "/etc/sites/code3.activestate.com/local_settings.py",
]
for path in _local_settings_paths:
    if exists(path):
        sys.path.insert(0, dirname(path))
        from local_settings import *
        del sys.path[0]
        break
del _local_settings_paths

Created by Sridhar Ratnakumar on Thu, 17 Mar 2011 (MIT)


Licensed under the MIT License
Viewed 7387 times
Revision 1
