Safely convert any given string type (text or binary) to unicode. The function never raises UnicodeDecodeError; instead, bytes that cannot be decoded are silently dropped, so the result may differ from the input. That trade-off is acceptable for debugging and logging. This recipe requires the six package.
import six


def safe_unicode(obj):
    """Return the unicode/text representation of `obj` without raising UnicodeDecodeError.

    The returned value is only a *representation*, not necessarily identical to `obj`.
    """
    # Anything that is neither text nor bytes is first converted to text.
    if type(obj) not in (six.text_type, six.binary_type):
        obj = six.text_type(obj)
    if type(obj) is six.text_type:
        return obj
    else:
        # Bytes: decode with the default codec, dropping undecodable bytes.
        return obj.decode(errors='ignore')
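For readers on Python 3 only, the same idea can be sketched without six. This is a variant under stated assumptions, not the recipe's own code: `safe_text` is a hypothetical name, `str` and `bytes` stand in for `six.text_type` and `six.binary_type`, and the UTF-8 codec is named explicitly rather than left as the default.

```python
def safe_text(obj):
    """Python 3-only sketch of safe_unicode (hypothetical name, no six)."""
    if isinstance(obj, str):
        # Already text: return unchanged.
        return obj
    if not isinstance(obj, bytes):
        # Neither text nor bytes: fall back to the object's str() form.
        return str(obj)
    # Bytes: decode as UTF-8 (an assumption), dropping undecodable bytes.
    return obj.decode('utf-8', errors='ignore')


if __name__ == '__main__':
    print(safe_text(b'\x89abc'))  # abc
    print(safe_text(42))          # 42
```

Passing `errors='replace'` instead of `errors='ignore'` would keep a U+FFFD marker where bytes were dropped, which can make lossy conversions easier to spot in logs.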
Here's the test case:
def test_safe_unicode():
    from applib.misc import safe_unicode

    abc_bytes = b'ab\nc'
    abc_text = 'ab\nc'
    assert safe_unicode(abc_bytes) == abc_text
    assert safe_unicode(abc_text) == abc_text

    foo_text = 'abc'
    foo_bytes = b'\x89abc'  # note: \x89 is ignored.
    assert safe_unicode(foo_text) == foo_text
    assert safe_unicode(foo_bytes) == foo_text
Tags: unicode
If you only care about representation, and not about decoding, then why not simply use the following?

my_nasty_unicode_string.encode('us-ascii', 'xmlcharrefreplace')

This will not raise exceptions on broken consoles like the ones on win32, and has the advantage that the result will print properly on an HTML page.

Refer to the object directly -- the more common method. Most of the time, you'll have some object in the template's context you want to attach the comment to; you can simply use that object.
For example, in a blog entry page that has a variable named entry, you could use the following to load the number of comments: