I ran into UnicodeEncodeError while experimenting with various codepages in source code, files, and standard streams. It also shows up occasionally when a script is launched via a scheduler, or in a long-running batch job that parses unpredictable [alien ;)] HTML.
The console() function helps avoid these exceptions by converting the offending characters to their standard Python escape representation.
To do in the future: make a codec wrapper that is safe to use in statements like this:
sys.stdout=codecs.getwriter('cp866')(sys.stdout)
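One way such a wrapper might look, offered as a sketch rather than the recipe's own solution: the stream writers returned by codecs.getwriter accept an errors argument, and the built-in 'backslashreplace' handler substitutes Python escapes for characters the codec cannot encode. The snippet below is written for Python 3 and uses an in-memory io.BytesIO as a stand-in for the real stdout; make_safe_writer is a hypothetical helper name, not part of the recipe.

```python
import codecs
import io


def make_safe_writer(stream, encoding):
    """Wrap a byte stream so unencodable characters become Python escapes.

    Hypothetical helper, not part of the original recipe.
    """
    return codecs.getwriter(encoding)(stream, errors='backslashreplace')


# BytesIO stands in for a real byte stream such as sys.stdout.buffer.
buffer = io.BytesIO()
writer = make_safe_writer(buffer, 'cp866')

# Neither e-acute (U+00E9) nor the left guillemet (U+00AB) exists in cp866,
# so both are written as backslash escapes instead of raising an error.
writer.write('caf\xe9 \xab')
output = buffer.getvalue()
```

With this approach no exception is raised at all, so the per-character fallback is not needed; the trade-off is that the escaping always follows the handler's format and cannot be swapped for something like '.'.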
# -*- coding: Windows-1251 -*-
_goodchars=dict()
def console(msg):
    '''
    Author: Denis Barmenkov <denis.barmenkov@gmail.com>
    Date: 02-jul-2007
    Write a string to stdout.
    On UnicodeEncodeError, every character that cannot be encoded
    is replaced by its Python escape representation.
    '''
    global _goodchars
    try:
        print msg
    except UnicodeEncodeError:
        # printing the whole string failed; rebuild it character by character
        res=''
        for i in list(msg):
            # probe characters not seen before by sending them through print:
            if i not in _goodchars:
                try:
                    print i # probe the character; side effect: every new
                            # printable character is echoed once on screen
                    _goodchars[i]=i # safe character, save it as is
                except UnicodeEncodeError:
                    # format character as python string constant
                    code=ord(i)
                    if code < 256:
                        t='\\x%02x' % code # 8-bit value
                    elif code < 65536:
                        t='\\u%04x' % code # 16-bit value unicode
                    else:
                        t='\\U%08x' % code # other values as 32-bit unicode
                    _goodchars[i]=t # or '.' for readability ;-)
            res+=_goodchars[i]  # append to result
        print res
if __name__=='__main__':
    import codecs
    import sys
    reload(sys)
    # prepare my encodings
    sys.setdefaultencoding('cp1251')                  # implicit str<->unicode conversions use cp1251
    sys.stdout=codecs.getwriter('cp866')(sys.stdout)  # set DOS cyrillic codepage
    test_string='\xab'
    try:
        print 'Using print statement:', test_string
    except UnicodeEncodeError:
        print 'UnicodeEncodeError exception while using print!'
        
    print 'Using console():',
    console(test_string)
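The probing print in console() echoes every new printable character once. A variant that avoids this side effect, sketched here for Python 3 under the assumption that the target encoding is known by name, tests each character with str.encode instead of printing it. The name escape_unencodable is hypothetical; the escape formats mirror the ones used in the recipe.

```python
def escape_unencodable(msg, encoding):
    """Return msg with characters that `encoding` cannot represent
    replaced by Python-style escapes (hypothetical helper)."""
    cache = {}  # plays the role of _goodchars, but is local to the call
    parts = []
    for ch in msg:
        if ch not in cache:
            try:
                ch.encode(encoding)  # probe without writing anything
                cache[ch] = ch       # encodable, keep as is
            except UnicodeEncodeError:
                code = ord(ch)
                if code < 256:
                    cache[ch] = '\\x%02x' % code  # 8-bit value
                elif code < 65536:
                    cache[ch] = '\\u%04x' % code  # 16-bit value
                else:
                    cache[ch] = '\\U%08x' % code  # 32-bit value
        parts.append(cache[ch])
    return ''.join(parts)


# The left guillemet (U+00AB) is not available in cp866.
escaped = escape_unencodable('a\xabb', 'cp866')
```

Because the function returns a string instead of printing, the caller decides where the text goes, for example by passing the console's encoding and printing the result.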