Welcome, guest | Sign In | My Account | Store | Cart

Safely convert any given string type (text or binary) to unicode. You won't get UnicodeDecodeError error, at the cost of ignoring those errors during conversion, which is useful for debugging and logging. This recipe requires the six package.

Python, 11 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
def safe_unicode(obj):
    """Return the unicode/text representation of `obj` without throwing UnicodeDecodeError

    Returned value is only a *representation*, not necessarily identical.
    """
    if type(obj) not in (six.text_type, six.binary_type):
        obj = six.text_type(obj)
    if type(obj) is six.text_type:
        return obj
    else:
        return obj.decode(errors='ignore')

Here's the test case:

def test_safe_unicode():
    from applib.misc import safe_unicode
    import six

    abc_bytes = b'ab\nc'
    abc_text = 'ab\nc'

    assert safe_unicode(abc_bytes) == abc_text
    assert safe_unicode(abc_text) == abc_text

    foo_text = 'abc'  # note: \x89 is ignored.
    foo_bytes = b'\x89abc'

    assert safe_unicode(foo_text) == foo_text
    assert safe_unicode(foo_bytes) == foo_text

2 comments

ccpizza 13 years, 1 month ago  # | flag

If you only care about representation, and not about decoding, then why not simply use my_nasty_unicode_string.encode('us-ascii','xmlcharrefreplace')? This will not generate exceptions in broken consoles like the ones on win32, and has the advantage that it will print properly on an HTML page.

Mike Ivanov 13 years ago  # | flag

Refer to the object directly -- the more common method. Most of the time, you'll have some object in the template's context you want to attach the comment to; you can simply use that object.

For example, in a blog entry page that has a variable named entry, you could use the following to load the number of comments:

EMAIL_HOST = "mail.activestate.com"
EMAIL_SUBJECT_PREFIX = "[code3.as.com] "
SEND_BROKEN_LINK_EMAILS = False

NOTIFY_ON_NEW_COMMENTS = True
NOTIFICATION_FROM_EMAIL = "ActiveState Code Comment <no-reply@activestate.com>"  # long line .. comment sdfdsf sdfd f

#---- Load local settings, if any.
# For now the first one wins.

_local_settings_paths = [
    expanduser("~/etc/sites/code3.activestate.com/local_settings.py"),
    "/etc/sites/code3.activestate.com/local_settings.py",
]
for path in _local_settings_paths:
    if exists(path):
        sys.path.insert(0, dirname(path))
        from local_settings import *
        del sys.path[0]
        break
del _local_settings_paths