I ran into UnicodeEncodeError while experimenting with different codepages in source code, files, and standard streams. It also shows up from time to time when a script is launched by a scheduler, or in a long-running batch job that parses unpredictable [alien ;)] HTML.
The console() function helps avoid these exceptions by converting the offending characters to their standard Python representation.
To do in the future: make a codec wrapper that is safe to use in statements like this:
sys.stdout=codecs.getwriter('cp866')(sys.stdout)
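The wrapper mentioned in the to-do could look roughly like the sketch below. It is not the finished codec, only an illustration of one option: the stream writer classes returned by codecs.getwriter() accept an error-handler name, and 'backslashreplace' substitutes Python-style escapes for characters the target codepage cannot encode. The helper name safe_writer and the in-memory io.BytesIO target are assumptions made for the example.

```python
import codecs
import io


def safe_writer(stream, encoding, errors='backslashreplace'):
    """Wrap a byte stream so that unencodable characters are written
    as Python escapes instead of raising UnicodeEncodeError.

    Hypothetical helper, not part of the original recipe.
    """
    return codecs.getwriter(encoding)(stream, errors)


# An in-memory byte buffer stands in for the real output stream.
buf = io.BytesIO()
out = safe_writer(buf, 'cp866')

# U+00AB has no cp866 equivalent; U+043F (Cyrillic "pe") does.
out.write(u'\xab test \u043f')
result = buf.getvalue()
```

In the statement above, this amounts to passing 'backslashreplace' as the second argument to the writer class that codecs.getwriter('cp866') returns.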
# -*- coding: Windows-1251 -*-
_goodchars = dict()

def console(msg):
    '''
    Author: Denis Barmenkov <denis.barmenkov@gmail.com>
    Date: 02-jul-2007

    Write string to stdout.
    On UnicodeEncodeError, all unsafe characters in the string
    are replaced by their Python representation.
    '''
    global _goodchars
    try:
        print msg
    except UnicodeEncodeError:
        # printing failed: rebuild the string character by character
        res = ''
        for i in list(msg):
            # try to put unknown characters through the print statement:
            if i not in _goodchars:
                try:
                    # try to print the character; this leaves some extra trash
                    # on screen for each unknown printable character
                    print i
                    _goodchars[i] = i  # safe character, save it as is
                except UnicodeEncodeError:
                    # format character as a Python string constant
                    code = ord(i)
                    if code < 256:
                        t = '\\x%02x' % code  # 8-bit value
                    elif code < 65536:
                        t = '\\u%04x' % code  # 16-bit Unicode value
                    else:
                        t = '\\U%08x' % code  # other values as 32-bit Unicode
                    _goodchars[i] = t  # or '.' for readability ;-)
            res += _goodchars[i]  # append to result
        print res

if __name__ == '__main__':
    import codecs
    import sys
    # site.py removes sys.setdefaultencoding at startup; reload(sys) restores it
    reload(sys)
    # prepare my encodings
    sys.setdefaultencoding('cp1251')  # set default encoding for source
    sys.stdout = codecs.getwriter('cp866')(sys.stdout)  # set DOS Cyrillic codepage
    test_string = '\xab'
    try:
        print 'Using print statement:', test_string
    except UnicodeEncodeError:
        print 'UnicodeEncodeError exception while using print!'
    print 'Using console():',
    console(test_string)
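The recipe finds unsafe characters by printing them one at a time, which is what leaves the extra trash on screen. A possible variation, sketched here under assumptions and not taken from the original recipe, is to test each character by encoding it against the target codepage and to return the escaped string without printing anything. The function name escape_unencodable and its encoding parameter are hypothetical; the escape formats are the same three used in console().

```python
def escape_unencodable(text, encoding):
    """Return text with every character that `encoding` cannot represent
    replaced by its Python escape (\\xNN, \\uNNNN or \\UNNNNNNNN).

    Hypothetical helper: checks characters with str.encode() rather than
    by printing them, so nothing is written to the screen.
    """
    parts = []
    for ch in text:
        try:
            ch.encode(encoding)      # raises if the codepage lacks this character
            parts.append(ch)         # safe character, keep it as is
        except UnicodeEncodeError:
            code = ord(ch)
            if code < 256:
                parts.append('\\x%02x' % code)   # 8-bit value
            elif code < 65536:
                parts.append('\\u%04x' % code)   # 16-bit Unicode value
            else:
                parts.append('\\U%08x' % code)   # other values as 32-bit Unicode
    return ''.join(parts)


# U+00AB and U+20AC are missing from cp866; U+043F is present.
escaped = escape_unencodable(u'\xab\u043f\u20ac', 'cp866')
```

The result can then be passed to a single print call, so the per-character cache in _goodchars is no longer needed for output, only as an optional speed-up.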